Login     Register

        Contact Us     Search

Multi-input Aggregates in SQL Server 2005

Jan 8

Written by: Charles Flock
1/8/2011 1:37 PM  RssIcon

Despite what Books-on-line and other Microsoft documentation tells you, it is possible to implement multi-input aggregates in SQL Server 2005, albeit with some restrictions.
In September 2010, we launched our XLeratorDB/statistics 2008 package, which contained new implementations of 14 existing XLeratorDB/statistics functions taking advantage of the multi-input aggregate features available in SQL Server 2008. Those functions are:
·         CORREL
·         COVAR
·         DEVSQ
·         FORECAST
·         GROWTH
·         INTERCEPT
·         KURT
·         PEARSON
·         RSQ
·         SKEW
·         SLOPE
·         STEYX
·         TREND
·         ZTEST
By re-implementing these functions as multi-input aggregates, we were able to achieve a more than tenfold increase in performance as well as making the functions syntactically simpler and more intuitive to use.
With that experience in mind, we realized that if we could find a way to wrap the multiple inputs into a single input when the aggregate is invoked, we could then unwrap the inputs in the aggregate function and process it exactly the same way that it was processed in SQL Server 2008. This possibility arose because each of these 14 functions had been re-implemented in SQL Server 2008 as single pass solutions; therefore the MaxByteSize limit of 8K in SQL Server 2005 was really no issue.
Syntactically the aggregate functions in SQL Server 2005 look almost the same as the aggregate function in SQL Server 2008, except that the values in SQL Server 2005 are embedded inside a wrapper function. For example, in SQL Server 2008, the GROWTH function looks like this:
SELECT wct.GROWTH(@Known_y, @Known_x, @new_x)
In SQL Server 2005, the GROWTH function looks like this:
SELECT wct.GROWTH(wct.fmtGROWTH(@Known_y, @Known_x, @new_x))
The difference being that @Known_y, @Known_x, and @new_x are wrapped in the function fmtGROWTH. The fmtGROWTH function has no other use except to package up the input for the GROWTH function in SQL Server 2005.
Of course, the wrapping and unwrapping of the input parameters makes the SQL Server 2005 versions slower than the SQL Server 2008 versions. But, they are still more than 5 times faster than the previous scalar implementations of the functions and are syntactically simpler and more intuitive to use. Of course, if you still need the precision of a two-pass solution, you can use the scalar equivalents of these functions (which are simply the function name with ‘_q’ appended).
Here’s a simple example. We will populate a table with some population information by region and then use the GROWTH function to predict the population in the year 2025.
CREATE TABLE #p(
      continent   nvarchar(50),
      year        bigint,
      pop               bigint
      )
     
INSERT INTO #p VALUES ('Africa',1950,227270)
INSERT INTO #p VALUES ('Asia',1950,1402887)
INSERT INTO #p VALUES ('Europe',1950,547460)
INSERT INTO #p VALUES ('Latin America and the Caribbean',1950,167307)
INSERT INTO #p VALUES ('Northern America',1950,171615)
INSERT INTO #p VALUES ('Oceania',1950,12807)
INSERT INTO #p VALUES ('Africa',1975,418765)
INSERT INTO #p VALUES ('Asia',1975,2379374)
INSERT INTO #p VALUES ('Europe',1975,676207)
INSERT INTO #p VALUES ('Latin America and the Caribbean',1975,323323)
INSERT INTO #p VALUES ('Northern America',1975,242360)
INSERT INTO #p VALUES ('Oceania',1975,21286)
INSERT INTO #p VALUES ('Africa',2000,819462)
INSERT INTO #p VALUES ('Asia',2000,3698296)
INSERT INTO #p VALUES ('Europe',2000,726568)
INSERT INTO #p VALUES ('Latin America and the Caribbean',2000,521228)
INSERT INTO #p VALUES ('Northern America',2000,318654)
INSERT INTO #p VALUES ('Oceania',2000,31160)
INSERT INTO #p VALUES ('Africa',2010,1033043)
INSERT INTO #p VALUES ('Asia',2010,4166741)
INSERT INTO #p VALUES ('Europe',2010,732759)
INSERT INTO #p VALUES ('Latin America and the Caribbean',2010,588649)
INSERT INTO #p VALUES ('Northern America',2010,351659)
INSERT INTO #p VALUES ('Oceania',2010,35838)
 

SQL Server 2008
SQL Server 2005
SELECT continent
,ROUND(wct.GROWTH(pop, year, 2025), 0) as GROWTH
FROM #p
GROUP BY continent
 
SELECT continent
,ROUND(wct.GROWTH(wct.fmtGROWTH(pop, year, 2025)), 0) as GROWTH
FROM #p
GROUP BY continent
 

 
This produces the following result.
continent                                          GROWTH
-------------------------------------------------- ----------------------
Africa                                             1523464
Asia                                               5698985
Europe                                             812097
Latin America and the Caribbean                    856450
Northern America                                   427360
Oceania                                            47493
 
(6 row(s) affected)

On average, we found that the 2005 aggregates ran about 5 times faster than the scalar implementations. We also found that they ran in about twice the time as the SQL Server 2008 aggregates.
We think that implementing these functions as aggregates in SQL Server 2005 provides a clear benefit in terms of performance as well as making them simpler and easier to use. Let us know what you think.

Tags:
Categories:

Search Blogs

Copyright 2008-2024 Westclintech LLC         Privacy Policy        Terms of Service