Calculating a Correlation Matrix in SQL Server
Mar
25
Written by:
Charles Flock
3/25/2015 10:24 AM
Diversification is the basis for any sound investment strategy and the heart of diversification is finding uncorrelated risk in different asset classes. In this article we show you how to do that using the XLeratorDB table-valued function CORRM.
Correlation is measured on a scale from -1, meaning that whatever is being measured is perfectly negatively correlated, and 1, meaning perfectly correlated. What the values between -1 and 1 mean is subject to interpretation depending on the context. However, when constructing a correlation matrix we are more concerned with comparing the correlation coefficients than with drawing any inference about the actual value of the correlation coefficient. It lets us compare the degree of correlation.
We have used the closing stock prices over the last 90 or so business days as input for these examples. We created a table, PRICES, which stores the date and closing price for each stock symbol (ticker).
--Create a table in 3rd Normal form to store price history
CREATE TABLE PRICES(
tdate date,
ticker char(5),
price money,
PRIMARY KEY (tdate, ticker)
)
We will populate this table with data for the symbols AAPL, BIDU, FN, GOOG, LNKD, MSFT, TWTR, and YHOO.
You can get the data here.
Once you have inserted the data into your database, you can select the first 8 rows with the following SQL to make sure it agrees with the data in this example.
--Select the first few rows to make sure the data are correct
SELECT TOP 8
*
FROM
PRICES
ORDER BY
tdate, tdate
This produces the following result.
We use the table-valued function CORRM to calculate the correlation matrix.
--Calculate the Correlation Matrix using CORRM
SELECT
*
FROM
wct.CORRM('SELECT * FROM PRICES ORDER BY ticker, tdate','True')
This produces the following result.
As you can see, the results are returned in 3rd-normal from using row and column indices to identify the correlations. In order to be useful to us we need to turn the row and column indices into stock symbols. Here's one way to do that.
--Put the tickers into a table so that we can label the rows and columns
SELECT
ROW_NUMBER()OVER (ORDER BY ticker) - 1 as rn,ticker
INTO
#vlookup
FROM(SELECT DISTINCT ticker FROM PRICES)n
--Create the output using the tickers rather the RowNum, ColNum
SELECT
y.ticker,x.ticker,k.ItemValue
FROM
wct.CORRM('SELECT * FROM PRICES ORDER BY ticker, tdate','True')k
CROSS APPLY(SELECT TOP(1) ticker FROM #vlookup WHERE colnum <= rn ORDER BY rn)x
CROSS APPLY(SELECT TOP(1) ticker FROM #vlookup WHERE rownum <= rn ORDER BY rn)y
This produces the following result.
If we wanted to see the results returned in a more traditional 'matrix' format, we simply use the SQL Server PIVOT function.
--Pivot the resultant table using tickers alphabetically
SELECT ticker2,AAPL,BIDU,FB,GOOG,LNKD,MSFT,TWTR,YHOO
FROM (
SELECT
y.ticker2, x.ticker1,k.ItemValue
FROM
wct.CORRM('SELECT * FROM PRICES ORDER BY ticker, tdate','True')k
CROSS APPLY(SELECT TOP(1) ticker FROM #vlookup WHERE colnum <= rn ORDER BY rn)x(ticker1)
CROSS APPLY(SELECT TOP(1) ticker FROM #vlookup WHERE rownum <= rn ORDER BY rn)y(ticker2)
)d
PIVOT(SUM(ItemValue) FOR Ticker1 IN(AAPL,BIDU,FB,GOOG,LNKD,MSFT,TWTR,YHOO))pvt
ORDER BY
ticker2
This produces the following result.
But, we are not really interested in looking at the correlation among 8 companies using the last 90 days worth of data. What could be interesting, as an example, is creating a correlation matrix with every company traded on the NYSE. Let's look at how that might work.
First, you should eliminate the data in the PRICES table.
TRUNCATE TABLE PRICES
There are approximately 2,800 companies that trade on the NYSE. We will randomly create 2,800 stock symbols and create a year's worth of price data, which we will skew ever so slightly to be correlated. We will only create prices for non-weekend days. This should create 2,800 * 262 = 733,600 rows of data.
SELECT
n2.ticker
,n2.mean
,ROUND(mean / wct.RANDBETWEEN(2,10),2) as sigma
INTO
#p1
FROM (
SELECT TOP 2800
ticker,
wct.RANDBETWEEN(50,150) as mean
FROM (
SELECT
wct.RANDBETWEEN(1,17576) as num
,CHAR(k1.SeriesValue) + CHAR(k2.SeriesValue) + CHAR(k3.SeriesValue) as ticker
FROM
wct.SeriesInt(UNICODE('A'), UNICODE('Z'),NULL,NULL,NULL)k1
CROSS APPLY
wct.SeriesInt(UNICODE('A'), UNICODE('Z'),NULL,NULL,NULL)k2
CROSS APPLY
wct.SeriesInt(UNICODE('A'), UNICODE('Z'),NULL,NULL,NULL)k3
)n1
ORDER BY
num ASC
)n2
SELECT
SeriesValue as tdate
,wct.RANDNORM(0,1) as normsinv
INTO
#d1
FROM
wct.SeriesDate(wct.EDATE(cast(GETDATE() as date),-12),cast(GETDATE() as date),1,NULL,NULL)k4
WHERE
DATEPART(dw, k4.SeriesValue) != 1 AND DATEPART(dw, k4.SeriesValue) != 7
INSERT INTO
PRICES
SELECT
p.ticker
,d.tdate
,ROUND(p.mean *(1+wct.RAND()) + p.sigma*d.normsinv,2) as price
FROM
#p1 p, #d1 d
DROP TABLE #p1
DROP TABLE #d1
Then we can run the following query which will create a table (#corr) with all the correlations between all the tickers.
SELECT
v1.ticker as ticker1
,v2.ticker as ticker2
,k.ItemValue as correlation
INTO
#corr
FROM
wct.CORRM('SELECT tdate, ticker, price FROM PRICES ORDER BY ticker, tdate','True')k, #vlookup v1, #vlookup v2
WHERE
k.rownum < k.ColNum
and k.RowNum = v1.rn
and k.ColNum = v2.rn
On my laptop computer running SQL Server 2012 this takes about 90 seconds and produces a table with 3,918,600 rows, which is the combination of ticker pairs.
Now, it's just a matter of selecting what's of interest. For example, we could select the top 10 pairs with the highest correlation.
SELECT TOP 10
*
FROM
#corr
ORDER BY
correlation DESC
This produces the following result (your results will be different).
Or we could select the 10 pairs with the lowest correlation.
SELECT TOP 10
*
FROM
#corr
ORDER BY
correlation ASC
This produces the following result (your results will be different).
We think that using the CORRM table-valued function in SQL Server is probably the best way to do analyses of this type. Using CORRM makes looking for correlations in large datasets simple and efficient, allowing you to gain insights into your data that might otherwise remain hidden. Let us know what you think.
If you think that CORRM might be useful, you can get the free 15-day trial by following this link.
If you want to find out more about our math library for SQL Server click here. If there is something you would like to see added to the almost 800 functions already included in XLeratorDB, just send us an e-mail to support@westclintech.com.