Login     Register

        Contact Us     Search

XLeratorDB/statistics Documentation

SQL Server LOGIT function


LOGITSUM

Updated: 27 February 2015


Use the table-valued function LOGITSUM to calculate the binary logistic regression coefficients from a table of grouped x-values (the independent variables) with counts of the successes and failures for each group. The successes and failures can appear in any column and the column number of the successes and failures is specified at input.

x1
x2
x3
xn
success
failure
x1,1
x1,2
x1,3
x1,n
success1
failure1
x2,1
x2,2
x2,3
x2,n
success2
failure2
 
xm,1
xm,2
xm,3
xm,n
successm
failurem

For dichotomous values, where the y-values are {0,1], use the LOGIT function.
Logistic regression estimates the probability of an event occurring. Unlike ordinary least squares (see LINEST and LINEST_q) which estimates the value of a dependent variable based on the independent variables, LOGIT measures the probability (p) of an event occurring (1) or not occurring (0). The probability is estimated as:



The LOGIT function works by finding the coefficient (ß) values that maximize the log-likelihood statistic, which is defined as:



using a method of iteratively re-weighted least squares.

Syntax

SELECT * FROM [wct].[LOGITSUM](
  <@Matrix_RangeQuery, nvarchar(max),>
 ,<@Success_ColumnNumber, int,>
 ,<@Failure_ColumnNumber, int,>)
@Matrix_RangeQuery
the SELECT statement, as a string, which, when executed, creates the resultant table of x-values including the counts of successes and failures.
@Success_ColumnNumber
the number of the column in the resultant table returned by @Matrix_RangeQuery which contains the count of successes. @Success_ColumnNumber must be of the type int or of a type that implicitly converts to int.
@Failure_ColumnNumber
the number of the column in the resultant table returned by @Matrix_RangeQuery which contains the count of failures. @Failure_ColumnNumber must be of the type int or of a type that implicitly converts to int.

Return Types
RETURNS TABLE(
       [stat_name] [nvarchar](10) NULL,
       [idx] [int] NULL,
       [stat_val] [float] NULL
)

Table Description
stat_name
Identifies the statistic being returned:

b
estimated coefficient for each independent variable plus the intercept
se
standard error of b
z
z statistic for b
pval
p-value (normal distribution) for the z statistic
Wald
Wald statistic
LL0
log-likelihood with just the intercept and no other coefficients
LLM
model log-likelihood
chisq
chi squared statistic
df
degrees of freedom
p_chisq
p-value of the chi-squared statistic
AIC
Akaike information criterion
BIC
Bayesian information criterion
Nobs
number of observations
rsql
log-linear ratio R2
rsqcs
Cox and Snell's R2
rsqn
Nagelkerke's R2
D
deviance
Iterations
number of iteration in iteratively re-weighted least squares
Converged
a bit value identifying whether a solution was found (1) or not (0)
AUROC
area under the ROC curve

For more information on how these statistics are calculated see the LOGIT documentation.
idx
Identifies the subscript for the estimated coefficient (b), standard error of the coefficient (se),
z statistics (z), p-value of the z statistic (pval), and the Wald statistic.
When the idx is 0, it is referring to the intercept, otherwise the idx identifies the column number of independent variable.
Descriptive statistics other than the ones mentioned above will have an idx of NULL.
stat_val
the calculated value of the statistic.
 
Remarks
·         If @Success_ColumnNumber is NULL thenthe second right-most column in the resultant table is assumed to contain the count of successes.
·         If @Failure_ColumnNumber is NUL then the right-most column in the resultant table is assumed to contain the count of failures.
·         @Matrix_RangeQuery must return at least 3 columns or an error will be returned.
·         If @Success_ColumnNumber is not NULL and @Success_ColumnNumber < 1 an error will be returned.
·         If @Failure_ColumnNumber is not NULL and @Success_ColumnNumber < 1 an error will be returned.
·         If @Success_ColumnNumber is not NULL and @Success_ColumnNumber greater than the number of columns returned by @Matrix_RangeQuery an error will be returned.
·         If @Failure_ColumnNumber is not NULL and @Failure_ColumnNumber greater than the number of columns returned by @Matrix_RangeQuery an error will be returned.

Examples
Example #1
Using the coronary heart disease data from LOGIT Example #1, we summarize the data into the 8 different age groups in Hosmer and use the summarized data in the logistic regression. This summarization can be handled entirely within the @MatrixRangeQuery string passed into the function.
SELECT
   *
INTO
   #chd
FROM (VALUES
    (20,0),(23,0),(24,0),(25,0),(25,1),(26,0),(26,0),(28,0),(28,0),(29,0)
    ,(30,0),(30,0),(30,0),(30,0),(30,0),(30,1),(32,0),(32,0),(33,0),(33,0)
    ,(34,0),(34,0),(34,1),(34,0),(34,0),(35,0),(35,0),(36,0),(36,1),(36,0)
    ,(37,0),(37,1),(37,0),(38,0),(38,0),(39,0),(39,1),(40,0),(40,1),(41,0)
    ,(41,0),(42,0),(42,0),(42,0),(42,1),(43,0),(43,0),(43,1),(44,0),(44,0)
    ,(44,1),(44,1),(45,0),(45,1),(46,0),(46,1),(47,0),(47,0),(47,1),(48,0)
    ,(48,1),(48,1),(49,0),(49,0),(49,1),(50,0),(50,1),(51,0),(52,0),(52,1)
    ,(53,1),(53,1),(54,1),(55,0),(55,1),(55,1),(56,1),(56,1),(56,1),(57,0)
    ,(57,0),(57,1),(57,1),(57,1),(57,1),(58,0),(58,1),(58,1),(59,1),(59,1)
    ,(60,0),(60,1),(61,1),(62,1),(62,1),(63,1),(64,0),(64,1),(65,1),(69,1)
   )n(age,chd)
 
SELECT
   *
FROM wct.LOGITSUM('
   SELECT
        a.age as [Age]
       ,SUM(c.chd) as Present
       ,COUNT(*) - SUM(c.chd) as Absent
   FROM
       #chd c
   CROSS APPLY (
       SELECT TOP 1
           age
       FROM (VALUES
           (20,1),(30,2),(35,3),(40,4)
          ,(45,5),(50,6),(55,7),(60,8)
          )n(age,grp)
       WHERE
          n.age <= c.age
       ORDER BY
          n.age DESC
       )a
   GROUP BY
        a.age'
   ,2
   ,3
   )
This produces the following result.
As you can see the results are similar but not exactly the same as those produced using the raw data in LOGIT. Using the summarized data, the fitted values are given by the equation:


We can compare the results of the two models for a 60-year old using the following SQL.
SELECT
        1/(1+EXP(-1*(-5.30945337391905 + 0.1109211422069 * 60))) as P_raw
       ,1/(1+EXP(-1*(-4.85868536330207 + 0.106379775834921 * 60))) as P_sum
This produces the following result.


Example #2
In this example we have 3 independent variables, a count of the number of successes and a count of the number of observations for each grouping. We will need to calculate the count of failures in @Matrix_RangeQuery.
SELECT
   *
INTO
   #t
FROM (VALUES
   (100,1,10,28,156)
   ,(150,1,10,33,144)
   ,(200,1,10,44,171)
   ,(250,1,10,56,196)
   ,(300,1,10,55,158)
   ,(350,1,10,44,100)
   ,(400,1,10,57,126)
   ,(450,1,10,77,166)
   ,(500,1,10,84,166)
   ,(100,2,10,23,153)
   ,(150,2,10,31,165)
   ,(200,2,10,40,179)
   ,(250,2,10,42,152)
   ,(300,2,10,55,181)
   ,(350,2,10,68,200)
   ,(400,2,10,59,148)
   ,(450,2,10,69,156)
   ,(500,2,10,75,157)
   ,(100,1,11,19,164)
   ,(150,1,11,23,147)
   ,(200,1,11,35,182)
   ,(250,1,11,46,196)
   ,(300,1,11,41,143)
   ,(350,1,11,60,189)
   ,(400,1,11,59,162)
   ,(450,1,11,75,187)
   ,(500,1,11,59,129)
   ,(100,2,11,9,105)
   ,(150,2,11,22,179)
   ,(200,2,11,30,182)
   ,(250,2,11,32,155)
   ,(300,2,11,41,164)
   ,(350,2,11,58,200)
   ,(400,2,11,60,181)
   ,(450,2,11,75,199)
   ,(500,2,11,59,141)
   )n(x1,x2,x3,success,N)
 
SELECT
   *
FROM
   wct.LOGITSUM('SELECT x1,x2,x3,success,n-success from #t',4,5)
This produces the following result.

 

See Also
·         LINEST - Linear regression
·         LOGIT - Logit regression

 



Copyright 2008-2017 Westclintech LLC         Privacy Policy        Terms of Service