Multi-input Aggregates in SQL Server 2005
Jan
8
Written by:
Charles Flock
1/8/2011 1:37 PM
Despite what Books-on-line and other Microsoft documentation tells you, it is possible to implement multi-input aggregates in SQL Server 2005, albeit with some restrictions.
In September 2010, we launched our XLeratorDB/statistics 2008 package, which contained new implementations of 14 existing XLeratorDB/statistics functions taking advantage of the multi-input aggregate features available in SQL Server 2008. Those functions are:
· CORREL
· COVAR
· DEVSQ
· FORECAST
· GROWTH
· INTERCEPT
· KURT
· PEARSON
· RSQ
· SKEW
· SLOPE
· STEYX
· TREND
· ZTEST
By re-implementing these functions as multi-input aggregates, we were able to achieve a more than tenfold increase in performance as well as making the functions syntactically simpler and more intuitive to use.
With that experience in mind, we realized that if we could find a way to wrap the multiple inputs into a single input when the aggregate is invoked, we could then unwrap the inputs in the aggregate function and process it exactly the same way that it was processed in SQL Server 2008. This possibility arose because each of these 14 functions had been re-implemented in SQL Server 2008 as single pass solutions; therefore the MaxByteSize limit of 8K in SQL Server 2005 was really no issue.
Syntactically the aggregate functions in SQL Server 2005 look almost the same as the aggregate function in SQL Server 2008, except that the values in SQL Server 2005 are embedded inside a wrapper function. For example, in SQL Server 2008, the GROWTH function looks like this:
SELECT wct.GROWTH(@Known_y, @Known_x, @new_x)
In SQL Server 2005, the GROWTH function looks like this:
SELECT wct.GROWTH(wct.fmtGROWTH(@Known_y, @Known_x, @new_x))
The difference being that @Known_y, @Known_x, and @new_x are wrapped in the function fmtGROWTH. The fmtGROWTH function has no other use except to package up the input for the GROWTH function in SQL Server 2005.
Of course, the wrapping and unwrapping of the input parameters makes the SQL Server 2005 versions slower than the SQL Server 2008 versions. But, they are still more than 5 times faster than the previous scalar implementations of the functions and are syntactically simpler and more intuitive to use. Of course, if you still need the precision of a two-pass solution, you can use the scalar equivalents of these functions (which are simply the function name with ‘_q’ appended).
Here’s a simple example. We will populate a table with some population information by region and then use the GROWTH function to predict the population in the year 2025.
CREATE TABLE #p(
continent nvarchar(50),
year bigint,
pop bigint
)
INSERT INTO #p VALUES ('Africa',1950,227270)
INSERT INTO #p VALUES ('Asia',1950,1402887)
INSERT INTO #p VALUES ('Europe',1950,547460)
INSERT INTO #p VALUES ('Latin America and the Caribbean',1950,167307)
INSERT INTO #p VALUES ('Northern America',1950,171615)
INSERT INTO #p VALUES ('Oceania',1950,12807)
INSERT INTO #p VALUES ('Africa',1975,418765)
INSERT INTO #p VALUES ('Asia',1975,2379374)
INSERT INTO #p VALUES ('Europe',1975,676207)
INSERT INTO #p VALUES ('Latin America and the Caribbean',1975,323323)
INSERT INTO #p VALUES ('Northern America',1975,242360)
INSERT INTO #p VALUES ('Oceania',1975,21286)
INSERT INTO #p VALUES ('Africa',2000,819462)
INSERT INTO #p VALUES ('Asia',2000,3698296)
INSERT INTO #p VALUES ('Europe',2000,726568)
INSERT INTO #p VALUES ('Latin America and the Caribbean',2000,521228)
INSERT INTO #p VALUES ('Northern America',2000,318654)
INSERT INTO #p VALUES ('Oceania',2000,31160)
INSERT INTO #p VALUES ('Africa',2010,1033043)
INSERT INTO #p VALUES ('Asia',2010,4166741)
INSERT INTO #p VALUES ('Europe',2010,732759)
INSERT INTO #p VALUES ('Latin America and the Caribbean',2010,588649)
INSERT INTO #p VALUES ('Northern America',2010,351659)
INSERT INTO #p VALUES ('Oceania',2010,35838)
SQL Server 2008
|
SQL Server 2005
|
SELECT continent
,ROUND(wct.GROWTH(pop, year, 2025), 0) as GROWTH
FROM #p
GROUP BY continent
|
SELECT continent
,ROUND(wct.GROWTH(wct.fmtGROWTH(pop, year, 2025)), 0) as GROWTH
FROM #p
GROUP BY continent
|
This produces the following result.
continent GROWTH
-------------------------------------------------- ----------------------
Africa 1523464
Asia 5698985
Europe 812097
Latin America and the Caribbean 856450
Northern America 427360
Oceania 47493
(6 row(s) affected)
On average, we found that the 2005 aggregates ran about 5 times faster than the scalar implementations. We also found that they ran in about twice the time as the SQL Server 2008 aggregates.
We think that implementing these functions as aggregates in SQL Server 2005 provides a clear benefit in terms of performance as well as making them simpler and easier to use. Let us know what you think.