Login    Register

XLeratorDB/statistics Documentation

PEARSON


PEARSON

Updated: 15 January 2011

Use the aggregate PEARSON function to calculate the correlation coefficient between two datasets. The equation for the correlation coefficient is
 
PEARSON function for SQL Server
 
Syntax
SELECT [wctStatistics].[wct].[PEARSON] (
     <@fmtPEARSON, nvarchar(4000),>)
 
The syntax has changed as of release 1.07 and the function has changed from a scalar to an aggregate. Please make the appropriate changes when you upgrade to 1.07.
Arguments
 
@fmtPEARSON
An nvarchar string formatted by the fmtPEARSON function containing the known-y and known-x values to be used by the PEARSON calculation.
Return Types
float
Remarks
·         PEARSON is an AGGREGATE function and follows the same conventions as all other AGGREGATE functions in SQL Server.
·         This function is a one-pass PEARSON solution. For a two-pass solution, use the PEARSON_q function.
Examples
In this example, we calculate the Pearson coefficient as correlation for a single set of x- and y-values
SELECT wct.PEARSON(wct.fmtPEARSON(y, x)) as PEARSON
FROM (
      SELECT 0.75, 1 UNION ALL
      SELECT 2.5, 2 UNION ALL
      SELECT 6.75, 3 UNION ALL
      SELECT 10, 4
      ) n(x,y)
 
This produces the following result
               PEARSON
----------------------
     0.988719187867937
 
(1 row(s) affected)
In this example, we will populate some temporary table with some historical financial information and then calculate the Pearson coefficient of correlation. First, create the table and put some data in it:
CREATE TABLE #c(
      SYM         NVARCHAR(5),
      YE          BIGINT,
      REV         FLOAT,
      GPROF       FLOAT,
      OPINC       FLOAT,
      NETINC      FLOAT
      )
     
INSERT INTO #c VALUES('YHOO',2009,6460.32,3588.57,386.69,597.99)
INSERT INTO #c VALUES('YHOO',2008,72.5,4185.14,12.96,418.92)
INSERT INTO #c VALUES('YHOO',2007,6969.27,4130.52,695.41,639.16)
INSERT INTO #c VALUES('YHOO',2006,6425.68,3749.96,940.97,751.39)
INSERT INTO #c VALUES('YHOO',2005,5257.67,3161.47,1107.73,1896.23)
INSERT INTO #c VALUES('GOOG',2009,23650.56,14806.45,8312.19,6520.45)
INSERT INTO #c VALUES('GOOG',2008,21795.55,13174.04,5537.21,4226.86)
INSERT INTO #c VALUES('GOOG',2007,16593.99,9944.9,54.44,4203.72)
INSERT INTO #c VALUES('GOOG',2006,10604.92,6379.89,3550,3077.45)
INSERT INTO #c VALUES('GOOG',2005,6138.56,3561.47,2017.28,1465.4)
INSERT INTO #c VALUES('MSFT',2010,62484,509,24167,18760)
INSERT INTO #c VALUES('MSFT',2009,58437,46282,21225,14569)
INSERT INTO #c VALUES('MSFT',2008,60420,48822,22271,17681)
INSERT INTO #c VALUES('MSFT',2007,51122,40429,18438,14065)
INSERT INTO #c VALUES('MSFT',2006,44282,36632,16064,12599)
INSERT INTO #c VALUES('ORCL',2010,26820,21056,9062,6135)
INSERT INTO #c VALUES('ORCL',2009,23252,18458,8321,5593)
INSERT INTO #c VALUES('ORCL',2008,22430,17449,7844,5521)
INSERT INTO #c VALUES('ORCL',2007,17996,13805,5974,4274)
INSERT INTO #c VALUES('ORCL',2006,14380,11145,4736,3381)
INSERT INTO #c VALUES('SAP',2009,10672,6980,2588,1748)
INSERT INTO #c VALUES('SAP',2008,11575,7370,2701,1847)
INSERT INTO #c VALUES('SAP',2007,10256,6631,2698,1906)
INSERT INTO #c VALUES('SAP',2006,9393,6064,2578,1871)
INSERT INTO #c VALUES('SAP',2005,8509,5460,2337,1496)
Now, calculate the correlation of the revenue (REV) againt the year (YE) for each company (SYM)
SELECT #c.SYM
,wct.PEARSON(wct.fmtPEARSON(REV,YE)) as PEARSON
FROM #c
GROUP BY SYM
This produces the following result.
SYM                  PEARSON
----- ----------------------
GOOG       0.988604792733014
MSFT        0.91861026921264
ORCL       0.983795721235544
SAP        0.873067973316442
YHOO      -0.219384585146667
 
(5 row(s) affected)

Let’s say we wanted to perform the same analysis as above, but we only want to return the results where the Pearson coefficient of correlation is positive.
SELECT #c.SYM
,wct.PEARSON(wct.fmtPEARSON(REV,YE)) as PEARSON
FROM #c
GROUP BY SYM
HAVING wct.PEARSON(wct.fmtPEARSON(REV,YE)) > 0
 
This produces the following result.
SYM                  PEARSON
----- ----------------------
GOOG       0.988604792733014
MSFT        0.91861026921264
ORCL       0.983795721235544
SAP        0.873067973316442
 
(4 row(s) affected)

In this example, we will calculate the Pearson coefficient of correlation of the operating income (OPINC) against the revenue (REV)
SELECT #c.SYM
,wct.PEARSON(wct.fmtPEARSON(OPINC,REV)) as PEARSON
FROM #c
GROUP BY SYM
 
This produces the following result.
SYM                  PEARSON
----- ----------------------
GOOG       0.651906713868849
MSFT       0.987612258172035
ORCL         0.9924157389967
SAP        0.844595495520328
YHOO       0.677389856742323
 

(5 row(s) affected)



  Comments
Add Comment
No Comments Yet


 |  View Topic History  |
Copyright 2010 WestClinTech LLC         Privacy Policy        Terms of Service