# SQL Server CHISQ.TEST function

CHITESTN2_q

Updated: 6 August 2010

Use CHITESTN2_q to calculate the Pearson chi-square test for independence on normalized tables. CHITESTN2_q returns the value from the chi-square (χ2) distribution for the statistic and the appropriate degrees of freedom.  Calculate the chi-square statistic (χ2) directly using the CHISQN2 or the CHISQN2_q function.

The chi-square statistic is calculated by finding the difference between each observed and theoretical frequency for each possible outcome, squaring them, dividing each by the theoretical frequency, and taking the sum of the results. A second important part of determining the test statistic is to define the degrees of freedom of the test: this is essentially the number of squares errors involving the observed frequencies adjusted for the effect of using some of those observations to define the expected frequencies.

Given the test statistic and the degrees of freedom, the test value is returned by the regularized gamma function Q(a, x) where:

a             is the degrees of freedom divided by 2
x              is χ2 statistic divided by 2

CHITESTN2_q requires the expected results as input to the function but automatically calculates the degrees of freedom.

The value of the test statistic is

Where
r              is the number of rows
c              is the number of columns
O             is the Observed result
E              is the Expected result

Syntax
SELECT [wctStatistics].[wct].[CHITESTN2_q] (
<@Actual_range_RangeQuery, nvarchar(max),>
,<@Expected_range_RangeQuery, nvarchar(max),>
,<@CompareByKeys, bit,>)
Arguments
@Actual_range_RangeQuery
the select statement, as text, used to determine the actual, or observed, results to be used in the calculation.
@Expected_range_RangeQuery
the select statement, as text, used to determine the expected results to be used in the calculation.
@CompareByKeys
A bit value (True/False) that tells the function to use the relative position (False) or the column names (True) when comparing the observed and the expected results.
Return Types
float
Remarks
·         CHITESTN2_q is designed for normalized tables. For de-normalized tables, use the CHISTEST2_q function.
·         CHITESTN2_q requires the expected results as input. To perform the chi-squared test without entering the expected results, use the CHITESTN_q function.
·         For simpler queries, consider using the CHITESTN function.
·         CHITESTN2_q = CHIDIST(χ2, df), where df = (r-1)(c-1), r>1, c>1.
·         Use CHISQN2_q to calculate the test statistic.
·         No GROUP BY is required for this function even though it produces aggregated results.
Examples
In this hypothetical situation, we want to determine if there is an association between population density and the preference for a sport from among baseball, football, and basketball. We will use the CHITESTN_q function to perform the chi-squared test.
CREATE TABLE #O (
[Sport]     [varchar] (20) NOT NULL,
[Locale]    [varchar] (20) NOT NULL,
[Result]    [float] NOT NULL
)
INSERT INTO #O VALUES ('Basketball', 'Rural', 28)
INSERT INTO #O VALUES ('Basketball', 'Suburban', 35)
INSERT INTO #O VALUES ('Basketball', 'Urban', 54)
INSERT INTO #O VALUES ('Baseball', 'Rural', 60)
INSERT INTO #O VALUES ('Baseball', 'Suburban', 43)
INSERT INTO #O VALUES ('Baseball', 'Urban', 35)
INSERT INTO #O VALUES ('Football', 'Rural', 52)
INSERT INTO #O VALUES ('Football', 'Suburban', 48)
INSERT INTO #O VALUES ('Football', 'Urban', 28)

CREATE TABLE #E (
[Sport]     [varchar] (20) NOT NULL,
[Locale]    [varchar] (20) NOT NULL,
[Result]    [float] NOT NULL
)
INSERT INTO #E VALUES ('Basketball', 'Rural', 42.77)
INSERT INTO #E VALUES ('Basketball', 'Suburban', 38.49)
INSERT INTO #E VALUES ('Basketball', 'Urban', 35.74)
INSERT INTO #E VALUES ('Baseball', 'Rural', 50.44)
INSERT INTO #E VALUES ('Baseball', 'Suburban', 45.4)
INSERT INTO #E VALUES ('Baseball', 'Urban', 42.16)
INSERT INTO #E VALUES ('Football', 'Rural', 46.79)
INSERT INTO #E VALUES ('Football', 'Suburban', 42.11)
INSERT INTO #E VALUES ('Football', 'Urban', 39.1)

SELECT wct.CHITESTN2_q(
'SELECT * FROM #O'
,'SELECT * FROM #E'
,'True')

This produces the following result
----------------------
0.000162575660939674

(1 row(s) affected)