Window Functions for SQL Server 2005 and SQL Server 2008
Nov
8
Written by:
Charles Flock
11/8/2012 4:59 PM
SQL Server 2012 added more complete support for window functions, including enhanced support for window aggregate functions, as well as support for window offset and window distribution functions. In our latest function library, XLeratorDB / windowing, we make many of the same capabilities available for SQL Server 2005 and SQL Server 2008 users, as well as providing a bunch of functions that aren’t available in SQL Server 2012, but are available in Oracle.
One of the most frequent questions we get from our customers is “do you guys have anything that helps with the running total problem?” Up until recently our response has been that you needed to upgrade to SQL Server 2012. With our latest function library, XLeratorDB / windowing, we provide functions for running totals, moving averages, moving forecasts, lag, and many others. These are all scalar functions and eliminate the need for self-joins, stored procedures, cursors, common table expressions, or other methods for doing these types of calculations.
Generally, the ability to calculate cumulative values within a dataset or a partition is called windowing. In SQL Server 2012, this is completely straightforward. Let’s put some data in a table and then we can look at how to do this in SQL Server 2012 and in earlier versions of SQL Server using XLeratorDB / windowing.
CREATE TABLE #x (
rn int,
amt money,
PRIMARY KEY (rn)
)
INSERT INTO #x VALUES (1, 2756.73)
INSERT INTO #x VALUES (2,-622.46)
INSERT INTO #x VALUES (3,3593.53)
INSERT INTO #x VALUES (4,1942.06)
INSERT INTO #x VALUES (5,2795.85)
INSERT INTO #x VALUES (6,7659.16)
INSERT INTO #x VALUES (7,3681.59)
INSERT INTO #x VALUES (8,8214.8)
INSERT INTO #x VALUES (9,-220)
INSERT INTO #x VALUES (10,-529.62)
In this SELECT we will calculate the running total for the entire dataset in SQL Server 2012 using the SUM function with an OVER clause.
SELECT rn
,amt
,SUM(amt) OVER (ORDER BY rn) as Total
FROM #x
This produces the following result.
rn amt Total
----------- --------------------- ---------------------
1 2756.73 2756.73
2 -622.46 2134.27
3 3593.53 5727.80
4 1942.06 7669.86
5 2795.85 10465.71
6 7659.16 18124.87
7 3681.59 21806.46
8 8214.80 30021.26
9 -220.00 29801.26
10 -529.62 29271.64
Using the XLeratorDB / windowing function RunningSUM in SQL Server 2008, we can produce exactly the same result.
SELECT rn
,amt
,wct.RunningSUM(amt,2,rn,NULL) as Total
FROM #x
This produces the following result.
rn amt Total
----------- --------------------- ----------------------
1 2756.73 2756.73
2 -622.46 2134.27
3 3593.53 5727.8
4 1942.06 7669.86
5 2795.85 10465.71
6 7659.16 18124.87
7 3681.59 21806.46
8 8214.80 30021.26
9 -220.00 29801.26
10 -529.62 29271.64
The wct.RunningSUM(amt,2,rn,NULL) statement is equivalent to SUM(amt) OVER (ORDER BY rn).
In addition to being able to define a window, SQL Server 2012 allows you to define a frame, which I think of as a sliding window. In this example, which uses the same data, we calculate a moving average using SQL Server 2012.
SELECT rn
,amt
,AVG(amt) OVER (ORDER BY rn ROWS BETWEEN 5 preceding and current row) as [Moving Average]
FROM #x
This produces the following result.
rn amt Moving Average
----------- --------------------- ---------------------
1 2756.73 2756.73
2 -622.46 1067.135
3 3593.53 1909.2666
4 1942.06 1917.465
5 2795.85 2093.142
6 7659.16 3020.8116
7 3681.59 3174.955
8 8214.80 4647.8316
9 -220.00 4012.2433
10 -529.62 3600.2966
We can achieve the same result in SQL Server 2008 or 2005 with the following SQL.
SELECT rn
,amt
,wct.MovingAVG(amt,5, ROW_NUMBER() OVER (ORDER BY rn),NULL) as [Moving Average]
FROM #x
This produces the following result.
rn amt Moving Average
----------- --------------------- ----------------------
1 2756.73 2756.73
2 -622.46 1067.135
3 3593.53 1909.26666666667
4 1942.06 1917.465
5 2795.85 2093.142
6 7659.16 3020.81166666667
7 3681.59 3174.955
8 8214.80 4647.83166666667
9 -220.00 4012.24333333333
10 -529.62 3600.29666666667
The SQL Server 2012 window and framing capability is available for all the SQL Server built-in aggregate functions:
· AVG
· COUNT
· SUM
· MIN
· MAX
· STDEV
· STDEVP
· VAR
· VARP
The list of XLeratorDB functions that match the windowing and framing capabilities for the aggregates in SQL Server 2012 is:
The Running functions calculate a cumulative value from the beginning of a dataset or partition. The Moving functions calculate a cumulative value from the beginning of the frame to the end. We had thought about having a single SUM function, for example, that could do both windowing and framing, but after trying several different implementations we decided that it was better to have 2 functions that did different things rather than one function that could do two different things depending on the passed parameters.
Like the SQL Server 2012 implementation, it is possible to calculate multiple windows and/or frames in a single resultant table. For example, you might want to have multiple running totals in your resultant table or you may want to combine a running total with a moving average. For more about how to do this, please refer to the documentation.
In creating this library, we discovered that ORACLE has even more functions that support windowing and framing so we included the following functions to match some of the capabilities of ORACLE.
There are aggregate versions of these functions available in XLeratorDB / statistics and XLeratorDB / statistics 2008. For more about how these functions work, please refer to the documentation.
We also added some that let you calculate weighted moving averages and exponentially-weighted moving averages. Those functions are:
· RunningEWMA – calculates an exponentially-weighted moving average in a window,
· MovingEWMA – calculates an exponentially-weighted moving average in a frame,
· DEMA – daily exponentially-weighted moving average (automatic calculation of alpha), and
· DWMA – daily weighted moving average.
Finally, we did include a LAG function, which permits you to access numeric data from a previous row in the same result set without the use of a self-join.
We created over 300,000,000 cases in our test harness, which contains more than 40GB of data to test the windowing functions. For the SQL Server built-in aggregate we would run the test harness in SQL Server 2012 using the built-in TSQL syntax. We would then import the 2012 test harness into SQL Server 2008 and SQL Server 2005 and compare the results using the XLeratorDB functions. We also did the same for the LAG function.
For functions that are not supported in SQL Server 2012 (like SLOPE and INTERCEPT), we simulated those calculations using TSQL in SQL Server 2012 and then imported the data to SQL Server 2008 and SQL Server 2005.
We were not able to perform some calculations in SQL Server 2012 at all. The exponentially-weighted moving averages could not be calculated in SQL Server 2012 without resorting to cursors or common table expressions. We also tried to calculate daily weighted moving averages using the SQL Server 2012 LAG function, which is at least theoretically possible. For example, you might calculate a 5-day daily-weighted moving average using this SQL.
SELECT *
,(
LAG(price_closing * 1, 5, NULL) OVER (PARTITION by id ORDER BY id, date_trade) +
LAG(price_closing * 2, 4, NULL) OVER (PARTITION by id ORDER BY id, date_trade) +
LAG(price_closing * 3, 3, NULL) OVER (PARTITION by id ORDER BY id, date_trade) +
LAG(price_closing * 4, 2, NULL) OVER (PARTITION by id ORDER BY id, date_trade) +
LAG(price_closing * 5, 1, NULL) OVER (PARTITION by id ORDER BY id, date_trade)
) / 15 as DWMA_5
FROM equityprices
In this example, the table equityprices contains the closing prices for the last 10 years for each ticker in the S&P 500. The table contains 1,827,000 rows. This query took about 5 minutes 30 seconds to run on my 64-bit 2.5 GHz Pentium machine with 4GB of memory. When I added similar calculations for the 50-day moving average and the 200-day moving average, I had to kill the query that used native T-SQL after 36 hours!
Using the XLeratorDB DWMA function, however, allowed me to calculate the 5-day, 50-day and 200-day moving averages with the following statement.
SELECT *
,wct.DWMA(price_closing,5,1,ROW_NUMBER() OVER(PARTITION BY id ORDER BY id, date_trade), NULL) as DWMA_5
,wct.DWMA(price_closing,50,1,ROW_NUMBER() OVER(PARTITION BY id ORDER BY id, date_trade), 1) as DWMA_50
,wct.DWMA(price_closing,200,1,ROW_NUMBER() OVER(PARTITION BY id ORDER BY id, date_trade), 2) as DWMA_200
FROM equityprices
Running this XLeratorDB query on the same box as the 2012 T-SQL query returned all 1,827,000 rows with the respective moving averages in exactly 5 minutes.
In the course of our testing, we noticed that the performance in SQL Server 2005 and 2008 using XLeratorDB was generally as good as or better than the T-SQL performance in SQL Server 2012. Of course, it’s really hard to talk about performance, because every environment is different, so here we are really only talking about relative performance.
For example, we shut down our test server, restarted it, started SQL Server 2008 and ran the following query.
SELECT M.TRN
,M.TRANDATE
,F.CustomerName
,M.TRANAMT
,wct.RunningAVG(TRANAMT, Row_number() OVER (ORDER BY F.CustomerName, M.TRANDATE, M.TRN),NULL) as t0
,wct.RunningAVG(TRANAMT, Row_number() OVER (PARTITION BY F.CustomerName ORDER BY F.CustomerName, M.TRANDATE, M.TRN),1) as t1
,wct.RunningAVG(TRANAMT, Row_number() OVER (PARTITION BY F.CustomerName, M.TRANDATE ORDER BY F.CustomerName, M.TRANDATE, M.TRN),2) as t2
FROM MVMNT M
JOIN FEDWIRE F
ON F.id = M.FEDWIRE_ID
WHERE F.STATE = 'MA'
ORDER BY F.CustomerName, M.TRANDATE, M.TRN
That query produced the following statistics
We then started SQL Server 2012 on the same box and ran this query
SELECT M.TRN
,M.TRANDATE
,F.CustomerName
,M.TRANAMT
,AVG(cast(TRANAMT as float)) OVER (ORDER BY F.CustomerName, M.TRANDATE, M.TRN) as T0
,AVG(cast(TRANAMT as float)) OVER (PARTITION BY F.CustomerName ORDER BY F.CustomerName, M.TRANDATE, M.TRN) as T1
,AVG(cast(TRANAMT as float)) OVER (PARTITION BY F.CustomerName, M.TRANDATE ORDER BY F.CustomerName, M.TRANDATE, M.TRN) as T2
FROM MVMNT M
JOIN FEDWIRE F
ON F.id = M.FEDWIRE_ID
WHERE F.STATE = 'MA'
ORDER BY F.CustomerName, M.TRANDATE, M.TRN
That query produced the following statistics.
As you can see, the total execution time for SQL Server 2008 was 243,280 milliseconds. The total execution time in SQL Server 2012 was 524,821 milliseconds, or more than twice as long.
We think that the addition of window and framing capabilities is a great addition to SQL Server 2012. With XLeratorDB / windowing, SQL Server 2008 and 2005 users can perform these calculations on the database too and pick up about 2 dozen other calculations that are not directly available in SQL Server 2012. SQL Server 2012 users can use the XLeratorDB functions as well, particularly if they are interested in doing some basic time-series analysis on the database. You can download the 15-day free trial and let us know what you think. If there is something that you would like to see added to the library, just let us know at support@westclintech.com