WestClinTech - SQL Server Functions - Blog - It aint over ‘til it’s over

Home

XLeratorDB function packages for SQL Server
financial view documentation pricing
statistics view documentation pricing
math view documentation pricing
engineering view documentation pricing
strings view documentation pricing
financial-options view documentation pricing
windowing view documentation pricing

XLeratorDB Compilation packages for SQL Server
Suite incl: financial, statistics, math, engineering & strings pricing
Suite (Developer) requires SQL Server Developer Edition pricing
Suite (Subscription) One-year non-recurring license pricing

SuitePLUS incl: all Suite packages PLUS financial-options pricing
SuitePLUS (Developer) requires SQL Server Developer Edition, also incl: financial-options pricing
SuitePLUS (Subscription) One-year non-recurring license, also incl: financial-options pricing

XLeratorDLL function packages Microsoft .NET API Library
financial (DLL) view documentation pricing SQL Server not required

View All Product Pricing ...

Download Free 15 Day Trial ...

Documentation

Purchase

XLeratorDB function packages for SQL Server (2008 & later)
financial
statistics
math
engineering
strings
financial-options
windowing

XLeratorDB Compilation packages for SQL Server (2008 & later)
Suite
Suite (Developer)
Suite (Subscription)

SuitePLUS
SuitePLUS (Developer)
SuitePLUS (Subscription)

XLeratorDLL function packages Microsoft .NET API Library
financial (DLL)

Legacy XLeratorDB Packages for SQL Server 2005
financial for SQL Server 2005 only
statistics for SQL Server 2005 only
math for SQL Server 2005 only

Suite for SQL Server 2005 only
Suite (Developer) for SQL Server 2005 only
SuitePLUS for SQL Server 2005 only
SuitePLUS (Developer) for SQL Server 2005 only

Download Trial
Case Studies
Blog
Support

It aint over ‘til it’s over

May 18

Written by: Charles Flock
5/18/2010 8:19 PM

When is it time to admit that your favorite baseball team is not going to play in the post-season?

Casey Stengel, the famous NY Yankee manager, once said of the baseball season: no matter what you do you will win 50 games and lose 50 games; it’s what you do with the other 50 games that really matters. (This was when the season was only 154 games, today he would say 55).

This week a Yankee announcer commented that the Boston Red Sox need to get moving before they fall too far behind division leaders TampaBay and NY. Since, MLB plays 162 games, I was struck by the idea that this early in the season, there would be some sense of urgency about having to win games. The major league baseball season is a little more than 1/5^th complete – let’s call it 33 games played – and I began to wonder what these early results might tell us about the play-off picture.

I thought it would be cool to see if we could use Monte Carlo in SQL Server to see if we can’t make some informed guesses about who might make the playoffs. I am going to keep this model very simple and make lots of simplifying assumptions.

Let’s start with the standings in the American League East. While it is early, you can see that the New York Yankees and the Tampa Bay Rays are in a close race for first place in their division, and Boston is already 5.5 games back. (To be fair, Boston has won its last three games).

Team	Wins	Losses	Pct	GB	WCGB
TampaBay	23	10	.697	-	-
New York	21	10	.677	1.0	-
Toronto	19	16	.543	5.0	4.0
Boston	18	16	.529	5.5	4.5
Baltimore	9	24	.273	14.0	13.0

The way that we frame the problem determines the kind of Monte Carlo simulation we will build. I am going to assume (for now) that if an American League East team wins 95 games they will make the playoffs either as division champion or as the wild card entry. Ninety-five wins equates to a .586 winning percentage.

With just this one assumption, we can build a model that will calculate the probability of a team winning 95 games given its current record, everything else being equal. We can use the SERIESINT function to randomly generate wins and losses for a specified number of games. The following statement will produce a 1 for a win and a zero for a loss.

SELECT seq

,CASE
WHEN SERIESVALUE < 587 THEN 1
ELSE 0
END as wins

FROM wct.SERIESINT(1,1000,1,33,'R')

We have specified in the SERIESINT table-valued function that we want to randomly generate 33 integer values between 1 and 1,000. In the SELECT statement we have specified that if the randomly-generated number is less than 587 (which is based on our .586 winning percentage for 162 games), return a 1, otherwise return a 0.

This will return 33 rows of 0 or 1. What we really want to know is how many wins. Moreover, we do not just want to run one simulation, we want to run 10,000 (or more).

This statement will record the results for 10,000 simulations of the first 33 games of the season:

SELECT num

,SUM(n.wins)as record

FROM(

SELECT wct.TRUNC((seq-1)/33, 0)as num

      ,CASE
            WHEN SERIESVALUE < 587 THEN 1
            ELSE 0
       END as wins

FROM wct.SERIESINT(1,1000,1,330000,'R')

) n

GROUPBY num

This statement uses a derived table, n, which has two columns, num and wins. The num column is a way of grouping the wins together for a particular trial and is calculated using the TRUNC function such that every 33 rows are grouped together. Running this statement returns 10,000 rows, so we are going to summarize the data further.

What we are really interested in is calculating the number of times a trial returns a specified number of wins. The following statement will do that (your results will vary because of the random numbers).

SELECT record

,COUNT(*)as [Number of Trials]

FROM(

SELECT num

,SUM(n.wins)as record

FROM(

SELECT wct.TRUNC((seq-1)/33, 0)as num

            ,CASE
                  WHEN SERIESVALUE < 587 THEN 1
                  ELSE 0
             END as wins

FROM wct.SERIESINT(1,1000,1,330000,'R')

) n

GROUP BY num

) m

GROUP BY record

ORDER BY record

The results will look something like this.

So, this looks nice, especially since it looks like the results are normally-distributed, but what exactly is it telling us? A team that has 9 wins after 33 games (so they have a 9 – 24 record), won exactly 95 games 1 time out 10,000 trials. A team that has 10 wins after 33 games won exactly 95 games 13 times out of 10,000 trials. However, a team that has 28 wins after 33 games won exactly 95 games 10 times out of 10,000 trials. So the probability for a team with a 28 – 5 record winning 95 games is less than the probability of a team that is 10 – 23?

What we have calculated is the probability of winning exactly 95 games. This is called the probability density function. The team with a 28 – 5 record will probably win more than 95 games. So, what we really want to calculate is the probability of a team winning at least 95 games. This is called the cumulative distribution function.

To calculate the cumulative distribution function, all we need to do is add up the probability density functions that come before it. This requires us to convert the number of wins to a probability, which we do simply by dividing by the number of trials, and then using the SUM function do the addition. To keep the SQL simple, I am going to insert my interim results into a temporary table and calculate the cumulative distribution from that.

SELECT record

,COUNT(*)as [Number of Trials]

INTO #w

FROM(

SELECT num

,SUM(n.wins)as record

FROM(

SELECT wct.TRUNC((seq-1)/33, 0)as num

            ,CASE
                 WHEN SERIESVALUE < 587 THEN 1
                 ELSE 0
             END as wins

FROM wct.SERIESINT(1,1000,1,330000,'R')

) n

GROUP BY num

) m

GROUP BY record

SELECT w1.record

,sum(w2.[Number of Trials])/cast(10000 as float) as [Cumulative Distribution]

FROM #w w1, #w w2

WHERE w2.record <= w1.record

GROUP BY w1.record,w1.[Number of Trials]

ORDER BY 1

As you can see, I inserted my results into a temporary table, #w, which I then used in the second SELECT to calculate the cumulative distribution function. This returned the following results.

Now we can see that a team with a 9 – 24 record (the Baltimore Orioles) has a probability of .0001 (.01%) of winning at least 95 games, whereas a team with a 23 – 10 record (the Tampa Bay Rays) has a probability of .9327 (93.27%) of winning at least 95 games, all things remaining the same.

How about the Yankees? They have played 31 games, not 33 games, so we need to change the number of games in our SELECT. We will take this opportunity to create some variables so that it is easier to run this SELECT for any number of games and any number of iterations.

DECLARE @num_iterations as float,

@num_games as float,

@season_wins as float,

@record as float

SET @num_iterations = 10000

SET @num_games = 31

SET @season_wins = 96

SET @record =Round(@season_wins/cast(162 as float), 3)* 1000 + 1

SELECT record

,COUNT(*)as [Number of Trials]

INTO #w

FROM(

SELECT num

,SUM(n.wins)as record

FROM(

SELECT wct.TRUNC((seq-1)/@num_games, 0) as num

            ,CASE
                 WHEN SERIESVALUE < @record THEN 1
                 ELSE 0
             END as wins

FROM wct.SERIESINT(1,1000,1,@num_games * @num_iterations,'R')

) n

GROUP BY num

) m

GROUP BY record

SELECT w1.record

,sum(w2.[Number of Trials])/cast(@num_iterations as float) as [Cumulative Distribution]

FROM #w w1, #w w2

WHERE w2.record <= w1.record

GROUP BY w1.record,w1.[Number of Trials]

ORDER BY 1

When I run this simulation, the Yankees have a better than .87 probability of winning 96 games based on their 21 – 10 record. The Red Sox, with an 18 – 16 record, have just a little less than a .28 probability of winning 96 games. In other words, the Yankees are 3 times as likely to win 96 games as the Red Sox and the Rays are 3.33 times as likely as the Red Sox to win 96 games, all things remaining the same.

The key phrase here, of course, is: all things remaining the same. The Orioles, with a probability of .0001 want to make some changes. Mostly, they just want their injured players to get healthy. The Rays, with a probability of .9327 want to keep everything the same. This being baseball, that is unlikely. Players get injured, traded, and return to health. Weather affects games and schedules. Lots of things don’t remain equal.

That’s what makes the game interesting. It is also what gives every die-hard fan and season ticket holders of lagging teams HOPE. . .that this will be the year the magic happens.

UPDATING THE MODEL

The following script will create a table-valued function that will allow you to run this analysis any time, adjusting the variables as you see fit, during the course of the season. If you are new to SQL Server, you can get SQL Server Express for free and XLeratorDB Express for free and do this analysis for yourself.

SET ANSI_NULLS ON

SET QUOTED_IDENTIFIER ON

-- =============================================

-- Author: Charles Flock

-- Create date: 05/12/2010

-- Description: Monte Carlo simulation for winning records in baseball

-- =============================================

CREATE FUNCTION [dbo].[fn_simWL]

(

-- Add the parameters for the function here

@num_iterations as float,

@num_games as float,

@season_wins as float

)

RETURNS @wlCDF

TABLE

(

-- Add the column definitions for the TABLE variable here

wins float,

losses float,

cdf float

)

BEGIN

-- Add the SELECT statement with parameter references here

DECLARE @record as float

DECLARE @w TABLE(

record float,

[Number of trials] float

)

SET @record =Round(@season_wins/cast(162 as float), 3)* 1000 + 1

INSERT INTO @w

SELECT record

,COUNT(*) as [Number of Trials]

FROM(

SELECT num

,SUM(n.wins) as record

FROM(

SELECT wct.TRUNC((seq-1)/@num_games, 0)as num

                  ,CASE
                       WHEN SERIESVALUE < @record THEN 1
                       ELSE 0
                   END as wins

FROM wct.SERIESINT(1,1000,1,@num_games * @num_iterations,'R')

) n

GROUP BY num

) m

GROUP BY record

INSERT INTO @wlCDF

SELECT w1.record

,@num_games - w1.record

,sum(w2.[Number of Trials])/cast(@num_iterations as float)

FROM @w w1, @w w2

WHERE w2.record <= w1.record

GROU PBY w1.record,w1.[Number of Trials]

RETURN

END

And now, I can just enter the following statement to run the Monte Carlo simulation 100,000 times:

SELECT *

FROM dbo.fn_simWL(100000,33,96)
ORDER BY wins

Trackback Print

Tags:

Categories:

Location: Blogs Parent Separator

The WestClinTech Blog

Search Blogs

Keywords

Phrase

Blog Archives

Products

Support

Contact Us
FAQ’s
Blog
XLeratorDB Documentation
- Financial
- Financial-Options
- Statistics
- Math
- Engineering
- Strings
- Windowing
XLeratorDLL Documentation
- Financial-DLL
XLeratorDB Installation Guide

XLeratorDB function packages
for SQL Server

XLeratorDB Compilation packages
for SQL Server

XLeratorDLL function packages
Microsoft .NET API Library

XLeratorDB function packages for
SQL Server (2008 & later)

XLeratorDB Compilation packages for
SQL Server (2008 & later)

XLeratorDLL function packages
Microsoft .NET API Library

Legacy XLeratorDB Packages for
SQL Server 2005

It aint over ‘til it’s over

Search Blogs

Blog Archives

Products

Support

About

Pricing

Contact us

About us

FAQ

XLeratorDB function packagesfor SQL Server

XLeratorDB Compilation packagesfor SQL Server

XLeratorDLL function packagesMicrosoft .NET API Library

XLeratorDB function packages for SQL Server (2008 & later)

XLeratorDB Compilation packages for SQL Server (2008 & later)

XLeratorDLL function packagesMicrosoft .NET API Library

Legacy XLeratorDB Packages for SQL Server 2005

It aint over ‘til it’s over

Search Blogs

Blog Archives

Products

Support

About

Pricing

Contact us

About us

FAQ

XLeratorDB function packages
for SQL Server

XLeratorDB Compilation packages
for SQL Server

XLeratorDLL function packages
Microsoft .NET API Library

XLeratorDB function packages for
SQL Server (2008 & later)

XLeratorDB Compilation packages for
SQL Server (2008 & later)

XLeratorDLL function packages
Microsoft .NET API Library

Legacy XLeratorDB Packages for
SQL Server 2005