Correlation is measured on a scale from -1, meaning that whatever is being measured is perfectly negatively correlated, and 1, meaning perfectly correlated. What the values between -1 and 1 mean is subject to interpretation depending on the context. However, when constructing a correlation matrix we are more concerned with comparing the correlation coefficients than with drawing any inference about the actual value of the correlation coefficient. It lets us compare the degree of correlation.

We have used the closing stock prices over the last 90 or so business days as input for these examples. We created a table, PRICES, which stores the date and closing price for each stock symbol (ticker).

--Create a table in 3rd Normal form to store price history

CREATE TABLE PRICES(

tdate date,

ticker char(5),

price money,

PRIMARY KEY (tdate, ticker)

)

We will populate this table with data for the symbols AAPL, BIDU, FN, GOOG, LNKD, MSFT, TWTR, and YHOO.

Once you have inserted the data into your database, you can select the first 8 rows with the following SQL to make sure it agrees with the data in this example.

--Select the first few rows to make sure the data are correct

SELECT TOP 8

*

FROM

PRICES

ORDER BY

tdate, tdate

This produces the following result.

We use the table-valued function CORRM to calculate the correlation matrix.

--Calculate the Correlation Matrix using CORRM

SELECT

*

FROM

wct.CORRM('SELECT * FROM PRICES ORDER BY ticker, tdate','True')

This produces the following result.

As you can see, the results are returned in 3

--Put the tickers into a table so that we can label the rows and columns

SELECT

ROW_NUMBER()OVER (ORDER BY ticker) - 1 as rn,ticker

INTO

#vlookup

FROM(SELECT DISTINCT ticker FROM PRICES)n

--Create the output using the tickers rather the RowNum, ColNum

SELECT

y.ticker,x.ticker,k.ItemValue

FROM

wct.CORRM('SELECT * FROM PRICES ORDER BY ticker, tdate','True')k

CROSS APPLY(SELECT TOP(1) ticker FROM #vlookup WHERE colnum <= rn ORDER BY rn)x

CROSS APPLY(SELECT TOP(1) ticker FROM #vlookup WHERE rownum <= rn ORDER BY rn)y

This produces the following result.

If we wanted to see the results returned in a more traditional 'matrix' format, we simply use the SQL Server PIVOT function.

--Pivot the resultant table using tickers alphabetically

SELECT ticker2,AAPL,BIDU,FB,GOOG,LNKD,MSFT,TWTR,YHOO

FROM (

SELECT

y.ticker2, x.ticker1,k.ItemValue

FROM

wct.CORRM('SELECT * FROM PRICES ORDER BY ticker, tdate','True')k

CROSS APPLY(SELECT TOP(1) ticker FROM #vlookup WHERE colnum <= rn ORDER BY rn)x(ticker1)

CROSS APPLY(SELECT TOP(1) ticker FROM #vlookup WHERE rownum <= rn ORDER BY rn)y(ticker2)

)d

PIVOT(SUM(ItemValue) FOR Ticker1 IN(AAPL,BIDU,FB,GOOG,LNKD,MSFT,TWTR,YHOO))pvt

ORDER BY

ticker2

This produces the following result.

But, we are not really interested in looking at the correlation among 8 companies using the last 90 days worth of data. What could be interesting, as an example, is creating a correlation matrix with every company traded on the NYSE. Let's look at how that might work.

First, you should eliminate the data in the PRICES table.

TRUNCATE TABLE PRICES

There are approximately 2,800 companies that trade on the NYSE. We will randomly create 2,800 stock symbols and create a year's worth of price data, which we will skew ever so slightly to be correlated. We will only create prices for non-weekend days. This should create 2,800 * 262 = 733,600 rows of data.

SELECT

n2.ticker

,n2.mean

,ROUND(mean / wct.RANDBETWEEN(2,10),2) as sigma

INTO

#p1

FROM (

SELECT TOP 2800

ticker,

wct.RANDBETWEEN(50,150) as mean

FROM (

SELECT

wct.RANDBETWEEN(1,17576) as num

,CHAR(k1.SeriesValue) + CHAR(k2.SeriesValue) + CHAR(k3.SeriesValue) as ticker

FROM

wct.SeriesInt(UNICODE('A'), UNICODE('Z'),NULL,NULL,NULL)k1

CROSS APPLY

wct.SeriesInt(UNICODE('A'), UNICODE('Z'),NULL,NULL,NULL)k2

CROSS APPLY

wct.SeriesInt(UNICODE('A'), UNICODE('Z'),NULL,NULL,NULL)k3

)n1

ORDER BY

num ASC

)n2

SELECT

SeriesValue as tdate

,wct.RANDNORM(0,1) as normsinv

INTO

#d1

FROM

wct.SeriesDate(wct.EDATE(cast(GETDATE() as date),-12),cast(GETDATE() as date),1,NULL,NULL)k4

WHERE

DATEPART(dw, k4.SeriesValue) != 1 AND DATEPART(dw, k4.SeriesValue) != 7

INSERT INTO

PRICES

SELECT

p.ticker

,d.tdate

,ROUND(p.mean *(1+wct.RAND()) + p.sigma*d.normsinv,2) as price

FROM

#p1 p, #d1 d

DROP TABLE #p1

DROP TABLE #d1

Then we can run the following query which will create a table (#corr) with all the correlations between all the tickers.

SELECT

v1.ticker as ticker1

,v2.ticker as ticker2

,k.ItemValue as correlation

INTO

#corr

FROM

wct.CORRM('SELECT tdate, ticker, price FROM PRICES ORDER BY ticker, tdate','True')k, #vlookup v1, #vlookup v2

WHERE

k.rownum < k.ColNum

and k.RowNum = v1.rn

and k.ColNum = v2.rn

On my laptop computer running SQL Server 2012 this takes about 90 seconds and produces a table with 3,918,600 rows, which is the combination of ticker pairs.

Now, it's just a matter of selecting what's of interest. For example, we could select the top 10 pairs with the highest correlation.

SELECT TOP 10

*

FROM

#corr

ORDER BY

correlation DESC

This produces the following result (your results will be different).

Or we could select the 10 pairs with the lowest correlation.

SELECT TOP 10

*

FROM

#corr

ORDER BY

correlation ASC

This produces the following result (your results will be different).

We think that using the CORRM table-valued function in SQL Server is probably the best way to do analyses of this type. Using CORRM makes looking for correlations in large datasets simple and efficient, allowing you to gain insights into your data that might otherwise remain hidden. Let us know what you think.

If you think that CORRM might be useful, you can get the free 15-day trial by following this link.

If you want to find out more about our math library for SQL Server click here. If there is something you would like to see added to the almost 800 functions already included in XLeratorDB, just send us an e-mail to support@westclintech.com.

Archive

Monthly

Go

| |||||||||

Sun | Mon | Tue | Wed | Thu | Fri | Sat | |||
---|---|---|---|---|---|---|---|---|---|

26 | 27 | 28 | 1 | 2 | 3 | 4 | |||

5 | 6 | 7 | 8 | 9 | 10 | 11 | |||

12 | 13 | 14 | 15 | 16 | 17 | 18 | |||

19 | 20 | 21 | 22 | 23 | 24 | 25 | |||

26 | 27 | 28 | 29 | 30 | 31 | 1 | |||

2 | 3 | 4 | 5 | 6 | 7 | 8 |

Go