WestClinTech - SQL Server Functions - Blog - What’s new in XLeratorDB Math 1.06

Home

XLeratorDB function packages for SQL Server
financial view documentation pricing
statistics view documentation pricing
math view documentation pricing
engineering view documentation pricing
strings view documentation pricing
financial-options view documentation pricing
windowing view documentation pricing

XLeratorDB Compilation packages for SQL Server
Suite incl: financial, statistics, math, engineering & strings pricing
Suite (Developer) requires SQL Server Developer Edition pricing
Suite (Subscription) One-year non-recurring license pricing

SuitePLUS incl: all Suite packages PLUS financial-options pricing
SuitePLUS (Developer) requires SQL Server Developer Edition, also incl: financial-options pricing
SuitePLUS (Subscription) One-year non-recurring license, also incl: financial-options pricing

XLeratorDLL function packages Microsoft .NET API Library
financial (DLL) view documentation pricing SQL Server not required

View All Product Pricing ...

Download Free 15 Day Trial ...

Documentation

Purchase

XLeratorDB function packages for SQL Server (2008 & later)
financial
statistics
math
engineering
strings
financial-options
windowing

XLeratorDB Compilation packages for SQL Server (2008 & later)
Suite
Suite (Developer)
Suite (Subscription)

SuitePLUS
SuitePLUS (Developer)
SuitePLUS (Subscription)

XLeratorDLL function packages Microsoft .NET API Library
financial (DLL)

Legacy XLeratorDB Packages for SQL Server 2005
financial for SQL Server 2005 only
statistics for SQL Server 2005 only
math for SQL Server 2005 only

Suite for SQL Server 2005 only
Suite (Developer) for SQL Server 2005 only
SuitePLUS for SQL Server 2005 only
SuitePLUS (Developer) for SQL Server 2005 only

Download Trial
Case Studies
Blog
Support

What’s new in XLeratorDB Math 1.06

Feb 20

Written by: Charles Flock
2/20/2012 2:59 PM

We have added some performance enhancements to the pseudo-random number generators, as well as a new function to generate random normal numbers.

We created scalar versions of the table-valued matrix functions, making the SQL much simpler and easier to follow. We also added a couple of new matrix functions for forward substitution and back substitution.

In the latest release of XLeratorDB/math and XLeratorDB/math 2008 we have added about two dozen scalar functions for linear algebra, many of which are comparable to the existing table-valued functions found in earlier math releases. The table-valued functions are still there, and in many cases will perform as well or better than the scalar implementations.

We have also tweaked the RAND and RANDBETWEEN functions so that they run a lot faster than in previous versions, and we added a RANDNORM function allowing you to generate pseudo-random numbers using the normal distribution. RANDNORM not only much easier than using the NORMINV and/or NORMSINV function, but is significantly faster. We also enhanced the SeriesFloat function to support random normal as an option in generating a series of random numbers.

Random number improvements

XLeratorDB has always included a RAND() function and a RANDBETWEEN() function, which use the .NET Random Class to return a result. Why do this, when SQL Server already has a built-in RAND() function? The following SQL demonstrates one of the shortcomings of the SQL Server built-in function:

with mycte as

(

select 1 as seq

union all

select seq + 1

from mycte

where seq < 15

)

select seq

,RAND() as [SQL Server RAND function]

FROM mycte

While the actual value for the random number will be different, you will see that the all 15 rows in the resultant table have the same value.

seq SQL Server RAND function

----------- ------------------------

1 0.83355267941491

2 0.83355267941491

3 0.83355267941491

4 0.83355267941491

5 0.83355267941491

6 0.83355267941491

7 0.83355267941491

8 0.83355267941491

9 0.83355267941491

10 0.83355267941491

11 0.83355267941491

12 0.83355267941491

13 0.83355267941491

14 0.83355267941491

15 0.83355267941491

Let’s compare the behavior of the built-in SQL Server function to XLeratorDB function.

with mycte as

(

select 1 as seq

union all

select seq + 1

from mycte

where seq < 15

)

select seq

,RAND() as [SQL Server RAND function]

,wct.RAND() as [XLeratorDB RAND function]

FROM mycte

Again, your random numbers will be different, but you will see that the XLeratorDB function is invoked for every row in the resultant table whereas the built-in SQL Server function in invoked once for the SELECT statement.

seq SQL Server RAND function XLeratorDB RAND function

----------- ------------------------ ------------------------

1 0.444139309225921 0.0687217502959403

2 0.444139309225921 0.876126038507577

3 0.444139309225921 0.0511527941534818

4 0.444139309225921 0.438098561318172

5 0.444139309225921 0.458887217966801

6 0.444139309225921 0.888243296179593

7 0.444139309225921 0.128806069278796

8 0.444139309225921 0.182973967730051

9 0.444139309225921 0.0980124961291151

10 0.444139309225921 0.146008610048462

11 0.444139309225921 0.0532174073695017

12 0.444139309225921 0.476973783652565

13 0.444139309225921 0.558249687159375

14 0.444139309225921 0.525847193682225

15 0.444139309225921 0.0362511808620208

The big improvement in XLeratorDB/math 1.06 is that the RAND function is much faster. Here are the results of some of our tests.

We created a numbers table and put a million rows in it.

CREATE TABLE [dbo].[numbers](

[num] [bigint] NOT NULL,

CONSTRAINT [PK_Numbers] PRIMARY KEY CLUSTERED (

[num] ASC)

)

INSERT INTO Numbers

SELECT seriesvalue

FROM wct.SeriesInt(1, 1000000, NULL, NULL, NULL)

Our perfromance test is then very simple. We selected the million rows we just put into the Numbers table and for each row, we calculate a random number, putting the result into a temp table. We then ran the following SQL to calculate the performance:

SET STATISTICS IO ON

SET STATISTICS TIME ON

select num

,wctMath.wct.RAND() as r

into #r

from Numbers

SET STATISTICS TIME OFF

SET STATISTICS IO OFF

DROP TABLE #r

This produced the following results.

Table 'Numbers'. Scan count 1, logical reads 2112, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 4400 ms, elapsed time = 4510 ms.

(1000000 row(s) affected)

1,000,000 rows in 4,510 milliseconds is a throughput of 221.7295 rows per millisecond.

When we perform the same test using XLeratorDB/math 1.05, we got the following results:

Table 'Numbers'. Scan count 1, logical reads 2112, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 166593 ms, elapsed time = 171909 ms.

(1000000 row(s) affected)

In this case, it takes 171,909 milliseconds to process 1,000,000 rows which is a throughput of 5.8170 rows per millisecond.

The XLeratorDB/math 1.06 version of RAND is slightly more than 38 times faster than the previous version. We see the same type of performance improvement in the RANDBETWEEN function.

New random number function

We have added a RANDNORM function in XLeratorDB/math 1.06. The RANDNORM function generates a pseudo-random number along a normal distribution defined by mean and a standard deviation. To use the standard normal distribution, set the mean to zero and that standard deviation to one.

Use the RANDNORM function when you don’t want to generate a uniform distribution of random numbers. Both the RAND function and the RANDBETWEEN function generate random numbers uniformly. What this means, for example, is that in the range -5 to 5, the RANDBETWEEN function is just as likely to generate a -4 as a one or a zero. However, if want to generate a random number along the standard normal distribution, we will generate many more results between -1 and 1 then we will anywhere else in the distribution. We can run the following SQL to and put the results into a graph to contrast the difference.

/*Generate a million Random Normal numbers*/

/*binned by .01 */

SELECT bin

,count(*)

FROM (

SELECT seriesvalue

,ROUND(wct.RANDNORM(0,1), 2) as bin

FROM wct.SeriesInt(1,1000000,NULL,NULL,NULL)

) n

GROUP BY bin

ORDER BY 1 ASC

/*Generate a million Random numbers between -5 and */

/*5 binned by .01 */

SELECT rnd

,COUNT(*)

FROM (

SELECT num

,ROUND(Cast(wct.RANDBETWEEN(-500,500) as float)/CAST(100 as float), 2) rnd

FROM Baseball.dbo.Numbers

) n

GROUP BY rnd

ORDER BY 1 ASC

Here is a comparison graph of the results.

New scalar linear algebra functions

Previous releases of XLeratorDB have included table-valued functions to perform some basic linear algebra computations like matrix multiplication, matrix inversions, the LU decomposition, and the QR decomposition. These functions are still there, and they let you perform quite sophisticated calculations, but the nature of the table-valued functions forces the materialization of interim results, which can make the calculations somewhat unwieldy.

For example, let’s assume that you wanted to do an ordinary least squares calculation using linear algebra. The OLS calculation is defined as:

And will return a series of coefficients that satisfies the equation:

Let’s look first at how we could perform this calculation in XLeratorDB/math 1.06. First, we need a table with some x- and y-values.

x1	x2	x3	x4	Y
2310	2	2	20	142000
2333	2	2	12	144000
2356	3	1.5	33	151000
2379	3	2	43	150000
2402	2	3	53	139000
2425	4	2	23	169000
2448	2	1.5	99	126000
2471	2	2	34	142900
2494	3	3	23	163000
2517	4	4	55	169000
2540	2	3	22	149000

In this table we have 4 columns of x-values and a column of y-values. Since we are going to calculate a y-intercept, we are going to get 5 coefficients returned from our formula.

One way that we could do this, is to put our data into 2 scalar variables, x and y.

DECLARE @x as varchar(max) = '1,2310,2,2,20;

1,2333,2,2,12;

1,2356,3,1.5,33;

1,2379,3,2,43;

1,2402,2,3,53;

1,2425,4,2,23;

1,2448,2,1.5,99;

1,2471,2,2,34;

1,2494,3,3,23;

1,2517,4,4,55;

1,2540,2,3,22'

DECLARE @y as varchar(max) = '142000;

144000;

151000;

150000;

139000;

169000;

126000;

142900;

163000;

169000;

149000'

We can then just do the calculation in a single SELECT statement:

SELECT *

FROM wct.MATRIX(wct.MATMULT(wct.MATINVERSE(wct.MATMULT(wct.TRANSPOSE(@x),@x)),wct.MATMULT(wct.TRANSPOSE(@x),@y)))

This produces the following result.

RowNum ColNum ItemValue

----------- ----------- ----------------------

0 0 52317.8305071266

1 0 27.6413873657511

2 0 12529.7681670853

3 0 2553.21066038766

4 0 -234.237164471067

The MATMULT, MATINVERSE, and TRANSPOSE are scalar functions which take a string representation of a matrix as input into the function and return a string representation of a matrix as output. The string representations use commas and column separators and semi-colons as row separators. The table-valued function MATRIX takes the string representation of the matrix and turns it into a third-normal form table representation. Since the ordinary least squares calculation returns a vector, the ColNum can be excluded from the results. With a slight change to the SQL, we can make the results line up with the OLS equation very nicely.

SELECT 'm' + CAST(RowNum as varchar(2)) as coefficient

,ItemValue as Val

FROM wct.MATRIX(wct.MATMULT(wct.MATINVERSE(wct.MATMULT(wct.TRANSPOSE(@x),@x)),wct.MATMULT(wct.TRANSPOSE(@x),@y)))

This produces the following result.

coefficient Val

----------- ----------------------

m0 52317.8305071266

m1 27.6413873657511

m2 12529.7681670853

m3 2553.21066038766

m4 -234.237164471067

Let’s contrast this by using the existing table-valued functions to perform the same calculation.

SELECT *

INTO #xy

FROM (VALUES

(0,2310,2,2,20,142000),

(1,2333,2,2,12,144000),

(2,2356,3,1.5,33,151000),

(3,2379,3,2,43,150000),

(4,2402,2,3,53,139000),

(5,2425,4,2,23,169000),

(6,2448,2,1.5,99,126000),

(7,2471,2,2,34,142900),

(8,2494,3,3,23,163000),

(9,2517,4,4,55,169000),

(10,2540,2,3,22,149000)

) n(r, x1, x2, x3, x4, y)

In this table we have 4 columns of x-values and a column of y-values. The r column is simply a way to uniquely identify the row. Since we are going to calculate a y-intercept, we are going to get 5 coefficients returned from our formula.

We will populate a table, #xprime, with the x-values, including an extra column of all ones, to account for the y-intercept.

SELECT r, c, xx

INTO #xprime

FROM (SELECT r,1 as x0,x1, x2, x3, x4 FROM #xy) n1

CROSS APPLY(VALUES

('0', x0),

('1', x1),

('2', x2),

('3', x3),

('4', x4)

) x(c, xx)

We then mulitply the transposed #xprime by #xprime and store that in #x_xprime.

SELECT *

into #x_xprime

FROM wct.MMULTN_q(

'SELECT c,r,xx FROM #xprime'

,'SELECT r,c,xx from #xprime'

)

We take the inverse of that and store it in #x_xprime_inverse.

SELECT *

into #x_xprime_inverse

FROM wct.MINVERSEN_q('SELECT RowNum, ColNum, ItemValue from #x_xprime')

We multiply the transposed #x_prime values by the y-values and store that in #xprime_y.

SELECT *

into #xprime_y

FROM wct.MMULTN_q(

'SELECT c,r,xx FROM #xprime'

,'SELECT r,0,y from #xy'

)

And now we can calculate the regression coefficients.

SELECT 'm' + Cast(RowNum as varchar) as coefficient

,ItemValue

FROM wct.MMULTN_q(

'SELECT RowNum,ColNum,ItemValue from #x_xprime_inverse'

,'SELECT RowNum,ColNum,ItemValue from #xprime_y'

)

This produces the following result.

coefficient ItemValue

------------------------------- ----------------------

m0 52317.8305073259

m1 27.6413873659951

m2 12529.7681670867

m3 2553.21066039095

m4 -234.237164471189

As you can see, there is no difference in the results. However, until now, it was not possible to do the calculation in a single select statement, and we had to create a number of temporary tables, which we then have to clean up when we are done.

In many situations you will need much less SQL with the scalar functions versus the table-valued functions. For larger matrices, however, the table-valued functions will perform better than the scalar functions, because converting to and from the string representation of the matrices is a relatively expensive operation. You can decide which works best in your environment.

We also created a number of functions to convert table-valued output into the string representation of the matrix: you can use MATRIX2STRING to convert a de-normalized table to a string and NMATRIX2STRING to convert a third-normal form table to a string. We also have created functions to materialize (as a string) the identity matrix (EYE), a matrix of ones (ONES), and matrix of zeroes (ZERO), a matrix of random numbers (MRAND), and a matrix of random normal numbers (MRANDN) .

There are functions that extract data from the string representation of the matrix. DIAG takes the diagonal of the matrix; MTRIL creates the lower triangular matrix of the input matrix; MTRIU takes the upper triangular matrix of the input matrix.

Finally, we created scalar versions of the QR decomposition (QR) and the LU decomposition (LU) as well as creating a matrix addition function (MATADD) and a matrix subtraction function (MATSUB) a forward substitution function (FWDSUB) and a backward substitution function (BKSUB).

If you are already using the math library, you can download the latest version for free. If you are not already using XLeratorDB, try the 15-day free trial today and discover how you can analyze big data in your SQL Server database.

Trackback Print

Tags:

Categories:

Location: Blogs Parent Separator

The WestClinTech Blog

Search Blogs

Keywords

Phrase

Blog Archives

Products

Support

Contact Us
FAQ’s
Blog
XLeratorDB Documentation
- Financial
- Financial-Options
- Statistics
- Math
- Engineering
- Strings
- Windowing
XLeratorDLL Documentation
- Financial-DLL
XLeratorDB Installation Guide

XLeratorDB function packages
for SQL Server

XLeratorDB Compilation packages
for SQL Server

XLeratorDLL function packages
Microsoft .NET API Library

XLeratorDB function packages for
SQL Server (2008 & later)

XLeratorDB Compilation packages for
SQL Server (2008 & later)

XLeratorDLL function packages
Microsoft .NET API Library

Legacy XLeratorDB Packages for
SQL Server 2005

What’s new in XLeratorDB Math 1.06

Search Blogs

Blog Archives

Products

Support

About

Pricing

Contact us

About us

FAQ

XLeratorDB function packagesfor SQL Server

XLeratorDB Compilation packagesfor SQL Server

XLeratorDLL function packagesMicrosoft .NET API Library

XLeratorDB function packages for SQL Server (2008 & later)

XLeratorDB Compilation packages for SQL Server (2008 & later)

XLeratorDLL function packagesMicrosoft .NET API Library

Legacy XLeratorDB Packages for SQL Server 2005

What’s new in XLeratorDB Math 1.06

Search Blogs

Blog Archives

Products

Support

About

Pricing

Contact us

About us

FAQ

XLeratorDB function packages
for SQL Server

XLeratorDB Compilation packages
for SQL Server

XLeratorDLL function packages
Microsoft .NET API Library

XLeratorDB function packages for
SQL Server (2008 & later)

XLeratorDB Compilation packages for
SQL Server (2008 & later)

XLeratorDLL function packages
Microsoft .NET API Library

Legacy XLeratorDB Packages for
SQL Server 2005