We are happy to announce that we have re-implemented the following functions as multi-input aggregate functions in SQL Server 2008:
· CORREL
· COVAR
· DEVSQ
· FORECAST
· GROWTH
· INTERCEPT
· KURT
· PEARSON
· RSQ
· SKEW
· SLOPE
· STEYX
· TREND
· ZTEST
We found that after implementing these functions as aggregates, they performed up to twenty-five times faster than their scalar equivalents in our test harness.
As aggregate functions, they require a new syntax. For example, given the following table and data:
CREATE TABLE #c (
PROD nvarchar(10),
YR bigint,
SALES float)
INSERT INTO #c VALUES ('ABC',2005,10000)
INSERT INTO #c VALUES ('ABC',2006,11000)
INSERT INTO #c VALUES ('ABC',2007,12000)
INSERT INTO #c VALUES ('ABC',2008,13000)
INSERT INTO #c VALUES ('ABC',2009,14000)
INSERT INTO #c VALUES ('DEF',2005,10000)
INSERT INTO #c VALUES ('DEF',2006,12000)
INSERT INTO #c VALUES ('DEF',2007,14000)
INSERT INTO #c VALUES ('DEF',2008,16000)
INSERT INTO #c VALUES ('DEF',2009,18000)
INSERT INTO #c VALUES ('GHI',2005,10000)
INSERT INTO #c VALUES ('GHI',2006,12000)
INSERT INTO #c VALUES ('GHI',2007,14000)
INSERT INTO #c VALUES ('GHI',2008,12000)
INSERT INTO #c VALUES ('GHI',2009,10000)
in SQL Server 2005, we would enter the following SQL to calculate the SLOPE:
SELECT PROD,
wct.SLOPE('#c','sales','yr','PROD',PROD)
FROM #C
GROUP BY PROD
This produces the following result.
PROD SLOPE
---------- ----------------------
ABC 1000
DEF 2000
GHI 0
(3 row(s) affected)
In SQL Server 2008 the SQL looks like this:
SELECT PROD,
wct.SLOPE(SALES,YR) as SLOPE
FROM #C
GROUP BY PROD
which produces exactly the same result.
PROD SLOPE
---------- ----------------------
ABC 1000
DEF 2000
GHI 0
(3 row(s) affected)
A similar change occurs in the FORECAST function. In SQL Server 2005, this:
SELECT PROD,
wct.FORECAST(2010, '#c','sales','yr','PROD',PROD) as [2010 Sales]
FROM #C
GROUP BY PROD
produces the following result:
PROD 2010 Sales
---------- ----------------------
ABC 15000
DEF 20000
GHI 11600
(3 row(s) affected)
In SQL Server 2008, it becomes this:
SELECT PROD,
wct.FORECAST(2010,SALES,YR) as [2010 Sales]
FROM #C
GROUP BY PROD
which returns the same result:
PROD 2010 Sales
---------- ----------------------
ABC 15000
DEF 20000
GHI 11600
(3 row(s) affected)
Prior to making this change, XLeratorDB had implemented a type of scalar function that we called ‘range queries’. A range query lets a function consume many rows of multi-column input and return a single result. In SQL Server 2005, aggregate functions could only take a single input, and range queries overcame that limitation. Implemented as scalar functions, however, they could not take advantage of the aggregate function architecture. Many of these functions would collect multiple rows and columns of data, as well as scalar values, to perform complex financial, statistical, or mathematical calculations.
In SQL Server 2008, it became possible to re-implement some of these scalar range-query functions as multi-input aggregate functions. One of our clients was using the SLOPE function on a database containing 430,000 rows, and his query ran in under twelve seconds. He asked whether we could get the time down to around a second. To do that, we took a hard look at the new multi-input aggregate function feature in SQL Server 2008.
We realized almost immediately that functions like these required a ‘one-pass’ solution. The existing XLeratorDB implementations were ‘two-pass’ solutions, matching the precision of the comparable EXCEL functions, and initially we were worried that the potential loss of significance in a one-pass solution might be a problem. We therefore decided to make both options available, so XLeratorDB/statistics 2008 includes both one-pass and two-pass solutions. The two-pass implementation, however, is always slower.
The one-pass implementation lends itself very nicely to the aggregate function architecture: we do not need to keep track of the order in which rows arrive, and a simple calculation at the end of processing produces the result.
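To make the distinction concrete, here is a rough sketch of the two approaches using only built-in T-SQL aggregates against the #c table above. This illustrates the technique, not the actual XLeratorDB implementation:
-- One-pass form: accumulate n, sum(x), sum(y), sum(x*y), and sum(x*x)
-- in a single scan, then combine the sums at the end:
--   slope = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x*x) - sum(x)*sum(x))
-- (the real aggregate also handles NULLs and degenerate cases)
SELECT PROD,
(COUNT(*) * SUM(YR * SALES) - SUM(YR) * SUM(SALES))
/ (COUNT(*) * SUM(YR * YR) - SUM(YR) * SUM(YR)) as SLOPE
FROM #c
GROUP BY PROD

-- Two-pass form: compute the means first, then sum the centered products:
--   slope = sum((x - xbar)*(y - ybar)) / sum((x - xbar)*(x - xbar))
SELECT c.PROD,
SUM((c.YR - m.mx) * (c.SALES - m.my))
/ SUM((c.YR - m.mx) * (c.YR - m.mx)) as SLOPE
FROM #c c JOIN (
SELECT PROD, AVG(CAST(YR as float)) as mx, AVG(SALES) as my
FROM #c
GROUP BY PROD
) m ON m.PROD = c.PROD
GROUP BY c.PROD
The one-pass form is faster but can lose significance when the x-values are large relative to their spread, which is exactly the issue illustrated below.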
Is the one-pass vs. two-pass calculation really much of an issue? On a practical level, we do not think so. Here is an example taken from the EXCEL knowledge base.
CREATE TABLE #n(
y float,
x float
)
INSERT INTO #n VALUES (1, 1000000003)
INSERT INTO #n VALUES (2, 1000000004)
INSERT INTO #n VALUES (3, 1000000002)
INSERT INTO #n VALUES (4, 1000000005)
INSERT INTO #n VALUES (5, 1000000004)
INSERT INTO #n VALUES (6, 1000000007)
SELECT wct.SLOPE(y,x) as [One-Pass]
,wct.SLOPE_q('SELECT y, x from #n') as [Two-Pass]
from #n
In this example we use the one-pass multi-input aggregate SLOPE function and the two-pass range-query (scalar) SLOPE_q function, displaying the results side by side. This produces the following result.
One-Pass Two-Pass
---------------------- ----------------------
NULL 0.775280898876404
(1 row(s) affected)
The one-pass solution returns a NULL and the two-pass solution returns the same value as EXCEL. As the EXCEL documentation points out, this occurs when the data contains a high number of significant digits combined with relatively little variation between data values. If your data meet these criteria, you can simply use the SLOPE_q function, or, if you prefer, use the SLOPE function and adjust the x-values by the mean of x:
SELECT wct.SLOPE(y,x-c.mean) as SLOPE
FROM (
SELECT AVG(x) as mean
from #n
) c, #n
This produces the following result.
SLOPE
----------------------
0.775280898876405
(1 row(s) affected)
We picked these 14 functions because they lent themselves quite nicely to this type of implementation. Other range-query functions did not. AVEDEV, for example, averages the absolute deviations from the mean; the absolute value cannot be expanded algebraically into order-free running sums, so the mean must be known before the deviations can be accumulated, which makes it inherently two-pass.
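As a rough illustration (using the #n table from above, not the XLeratorDB function itself), AVEDEV is naturally expressed as two passes:
-- Pass 1 computes the mean; pass 2 averages the absolute deviations.
-- The ABS() is what prevents collapsing this into a single scan.
SELECT AVG(ABS(n.x - m.mean)) as AVEDEV
FROM #n n CROSS JOIN (
SELECT AVG(x) as mean
FROM #n
) m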
The improvement in performance in our environment was spectacular. We compared the performance of the scalar functions and the aggregate functions on a Dell Inspiron 5305 with an Intel Pentium Dual CPU E2160 processor @ 1.80GHz and 4 GB of RAM, running Windows Vista. The following table summarizes the performance improvements:
Aggregate    Scalar         Performance Multiple
----------   ------------   --------------------
CORREL       CORREL_q       9.6
COVAR        COVAR_q        25.3
DEVSQ        DEVSQ_q        15.6
FORECAST     FORECAST_q     19.2
GROWTH       GROWTH_q       14.9
INTERCEPT    INTERCEPT_q    23.4
KURT         KURT_q         14.9
PEARSON      PEARSON_q      11.3
RSQ          RSQ_q          11.8
SKEW         SKEW_q         15.6
SLOPE        SLOPE_q        24.3
STEYX        STEYX_q        20.2
TREND        TREND_q        19.5
ZTEST        ZTEST_q        11.0
Our tests were run against datasets containing anywhere from 1.5 million to 5.5 million rows, depending on the function, each producing 100,000 rows in the result set. The following table summarizes the average number of rows processed per second for each function:
Aggregate    Rows per second   Scalar         Rows per second
----------   ---------------   ------------   ---------------
CORREL       570,002           CORREL_q       59,540
COVAR        599,867           COVAR_q        23,685
DEVSQ        761,705           DEVSQ_q        48,685
FORECAST     376,916           FORECAST_q     19,615
GROWTH       264,807           GROWTH_q       17,717
INTERCEPT    475,366           INTERCEPT_q    20,342
KURT         709,185           KURT_q         47,577
PEARSON      514,062           PEARSON_q      45,420
RSQ          541,770           RSQ_q          45,890
SKEW         723,648           SKEW_q         46,417
SLOPE        488,503           SLOPE_q        20,140
STEYX        1,439,992         STEYX_q        71,226
TREND        391,245           TREND_q        20,019
ZTEST        514,829           ZTEST_q        46,937
We believe that for the first time, you will be able to perform these types of calculations on large datasets directly in T-SQL.
Here is one example: if you want to use the FORECAST function to calculate next year’s sales for every company listed on the NYSE, NASDAQ, FTSE, and any other exchange you can think of (assuming you have the historical financial data), now you can.
In about a second.
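As a sketch of what such a query might look like (the sales_history table and its columns are hypothetical; substitute your own historical data):
-- Hypothetical table of historical annual sales per ticker:
-- CREATE TABLE sales_history (ticker nvarchar(12), yr int, sales float)
SELECT ticker,
wct.FORECAST(2010, sales, yr) as [2010 Sales]
FROM sales_history
GROUP BY ticker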