KSTEST_q
Updated: 27 June 2012
Use the scalar function KSTEST_q to compare a sample with a reference probability distribution. The Kolmogorov-Smirnov statistic qualifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference sample.
Syntax
SELECT [wctStatistics].[wct].[KSTEST_q](
<@XValues_RangeQuery, nvarchar(max),>
,<@Statistic, nvarchar(4000),>)
Arguments
@XValues_RangeQuery
a string which contains the SQL to select the sample values and the cumulative distiribution function. The resultant table must contain two columns containing the following values.
@x
the sample values. @x is an expression of type float or of a type that can be implicitly converted to float.
@cdf
the values of the cumulative distribution function. @cdf is an expression of type float or of a type that can be implicitly converted to float.
@statistic
identifies the return value as either the Kolmogorov-Smirnov statistic (k) or the p-value (p).
Return Types
float
Remarks
· If @x is null then @x and the corresponding @cdf are not included in the calculation.
· If @cdf is null then @cdf and the corresponding @x are not included in the calculation.
· To evaluate two samples, use KSTEST2 or KSTEST_q.
· For simpler queries, consider using KSTEST.
Examples
In this example, we compare the sample (-2,-1, 0, 1, 2, 3, 4) to the standard normal distribution.
/*Put data into a tenp table*/
SELECT *
INTO #n
FROM (VALUES (-2),(-1),(0),(1),(2),(3),(4)) n(x)
/*Calcuate the statistic and the p-value*/
SELECT wct.KSTEST_q('SELECT x,wct.NORMSDIST(x) FROM #n', 'k') as k_observed
,wct.KSTEST_q('SELECT x,wct.NORMSDIST(x) FROM #n', 'p') as p_value
/*Clean up the temp table*/
DROP TABLE #n
This produces the following result.
k_observed p_value
---------------------- ----------------------
0.412773317497114 0.135890127950776
In this example, we compare the sample x, consisting of 1,000 randomly generated numbers in the standard normal distribution, to the standard normal distribution.
/*Put data into a tenp table*/
SELECT k.seriesvalue as x
INTO #n
FROM wctMath.wct.SeriesFloat(0,1,NULL,1000,'N') k
/*Calcuate the statistic and the p-value*/
SELECT wct.KSTEST_q('SELECT x,wct.NORMSDIST(x) FROM #n', 'k') as k_observed
,wct.KSTEST_q('SELECT x,wct.NORMSDIST(x) FROM #n', 'p') as p_value
/*Clean up the temp table*/
DROP TABLE #n
This produces the following result. Your results will be different.
k_observed p_value
---------------------- ----------------------
0.0207079400233719 0.776329498203758
In this example, we compare the uniformaly random numbers to the gamma distribution with shape 1 and scale 1.
/*Put data into a tenp table*/
SELECT k.seriesvalue as x
INTO #n
FROM wctMath.wct.SeriesFloat(0,1,NULL,1000,'R') k
/*Calcuate the statistic and the p-value*/
SELECT wct.KSTEST_q('SELECT x,wct.GAMMADIST(x,1,1,1) FROM #n', 'k') as k_observed
,wct.KSTEST_q('SELECT x,wct.GAMMADIST(x,1,1,1) FROM #n', 'p') as p_value
/*Clean up the temp table*/
DROP TABLE #n
This produces the following result. Your results will be different.
k_observed p_value
---------------------- ----------------------
0.367902930881325 6.82167636320844E-122