Login     Register

        Contact Us     Search

XLeratorDB/statistics Documentation

SQL Server chi-squared statistic for normalized data


CHISQN2

Updated: 6 August 2010

Note: This documentation is for the SQL2008 (and later) version of this XLeratorDB function, it is not compatible with SQL Server 2005.
Click here for the SQL2005 version of the CHISQN2 function


Use CHISQN2 to calculate the chi-square (χ2) statistic for normalized tables. This function calculates the chi-square statistic by finding the difference between each observed and theoretical frequency for each possible outcome, squaring them, dividing each by the theoretical frequency, and taking the sum of the results. A second important part of determining the test statistic is to define the degrees of freedom of the test: this is essentially the number of squares errors involving the observed frequencies adjusted for the effect of using some of those observations to define the expected frequencies.
 
CHISQN2 requires the expected results as input to the function.
 
The value of the chi-square statistic is:

 
Where
                r              is the number of rows
                c              is the number of columns
                O             is the Observed result
                E              is the Expected result
Syntax
SELECT [wctStatistics].[wct].[CHISQN2] (
  <@Actual_range_TableName, nvarchar(4000),>
 ,<@AR_Key1Columnname, nvarchar(4000),>
 ,<@AR_Key2ColumnName, nvarchar(4000),>
 ,<@AR_DataColumnName, nvarchar(4000),>
 ,<@AR_GroupedColumnName, nvarchar(4000),>
 ,<@AR_GroupedColumnValue, sql_variant,>
 ,<@Expected_range_TableName, nvarchar(4000),>
 ,<@ER_Key1Columnname, nvarchar(4000),>
 ,<@ER_Key2ColumnName, nvarchar(4000),>
 ,<@ER_DataColumnName, nvarchar(4000),>
 ,<@ER_GroupedColumnName, nvarchar(4000),>
 ,<@ER_GroupedColumnValue, sql_variant,>
 ,<@CompareByKeys, bit,>)
Arguments
@Actual_range_TableName
the name, as text, of the table or view that contains the observed values to be used in the calculation.
@AR_Key1Columnname
the name, as text, of the first column in the normalized table or view specified by @Actual_range_TableName that identifies the actual, or observed, results to be used in the calculation.
@AR_Key2ColumnName
the name, as text, of the second column in the normalized table or view specified by @Actual_range_TableName that identifies the actual, or observed, values to be used in the calculation.
@AR_GroupedColumnName
the name, as text, of the column in the table or view specified by @Actual_range_TableName which will be used for grouping the results.
@AR_GroupedColumnValue
the column value to do the grouping on.
@Expected_range_TableName
the name, as text, of the table or view that contains the expected values to be used in the calculation.
@ER_Key1ColumnName
the name, as text, of the first column in the normalized table or view specified by @Expected_range_TableName that identifies the expected results to be used in the calculation.
@ER_Key2Columnname
the name, as text, of the second column in the normalized table or view specified by @Expected_range_TableName that identifies the expected results to be used in the calculation.
@ER_GroupedColumnName
the name, as text, of the column in the table or view specified by @Expected_range_TableName which will be used for grouping the results.
@ER_GroupedColumnValue
the column value to do the grouping on.
@CompareByKeys
A bit value (True/False) that tells the function to use the relative position (False) or the column names (True) when comparing the observed and the expected results.
Return Types
float
Remarks
·         CHISQN2 is designed for normalized tables. For de-normalized tables, use the CHISQ2 function.
·         CHISQN2 requires the expected values as input. If you do not want to enter the expected values, use the CHISQN function.
·         For queries that are more complex, consider using the CHISQN2_q function.
·         No GROUP BY is required for this function even though it produces aggregated results.
Examples
In this hypothetical situation, we want to determine if there is an association between population density and the preference for a sport from among baseball, football, and basketball. We will use the CHISQ function to calculate the chi-squared statistic.
CREATE TABLE #O(
      [Sport]     [varchar] (20) NOT NULL,
      [Locale]    [varchar] (20) NOT NULL,
      [Result]    [float] NOT NULL
)
INSERT INTO #O VALUES ('Basketball', 'Rural', 28)
INSERT INTO #O VALUES ('Basketball', 'Suburban', 35)
INSERT INTO #O VALUES ('Basketball', 'Urban', 54)
INSERT INTO #O VALUES ('Baseball', 'Rural', 60)
INSERT INTO #O VALUES ('Baseball', 'Suburban', 43)
INSERT INTO #O VALUES ('Baseball', 'Urban', 35)
INSERT INTO #O VALUES ('Football', 'Rural', 52)
INSERT INTO #O VALUES ('Football', 'Suburban', 48)
INSERT INTO #O VALUES ('Football', 'Urban', 28)
 
CREATE TABLE #E(
      [Sport]     [varchar] (20) NOT NULL,
      [Locale]    [varchar] (20) NOT NULL,
      [Result]    [float] NOT NULL
)
INSERT INTO #E VALUES ('Basketball', 'Rural', 42.77)
INSERT INTO #E VALUES ('Basketball', 'Suburban', 38.49)
INSERT INTO #E VALUES ('Basketball', 'Urban', 35.74)
INSERT INTO #E VALUES ('Baseball', 'Rural', 50.44)
INSERT INTO #E VALUES ('Baseball', 'Suburban', 45.4)
INSERT INTO #E VALUES ('Baseball', 'Urban', 42.16)
INSERT INTO #E VALUES ('Football', 'Rural', 46.79)
INSERT INTO #E VALUES ('Football', 'Suburban', 42.11)
INSERT INTO #E VALUES ('Football', 'Urban', 39.1)
 
SELECT wct.CHISQN2(
 '#O'
,'Sport'
,'Locale'
,'Result'
,''
,NULL
,'#E'
,'Sport'
,'Locale'
,'Result'
,''
,NULL
,'True')
This produces the following result
----------------------
22.4562079299414
 

(1 row(s) affected)



Copyright 2008-2024 Westclintech LLC         Privacy Policy        Terms of Service