WestClinTech - SQL Server Functions - Blog

Home

XLeratorDB function packages for SQL Server
financial view documentation pricing
statistics view documentation pricing
math view documentation pricing
engineering view documentation pricing
strings view documentation pricing
financial-options view documentation pricing
windowing view documentation pricing

XLeratorDB Compilation packages for SQL Server
Suite incl: financial, statistics, math, engineering & strings pricing
Suite (Developer) requires SQL Server Developer Edition pricing
Suite (Subscription) One-year non-recurring license pricing

SuitePLUS incl: all Suite packages PLUS financial-options pricing
SuitePLUS (Developer) requires SQL Server Developer Edition, also incl: financial-options pricing
SuitePLUS (Subscription) One-year non-recurring license, also incl: financial-options pricing

XLeratorDLL function packages Microsoft .NET API Library
financial (DLL) view documentation pricing SQL Server not required

View All Product Pricing ...

Download Free 15 Day Trial ...

Documentation

Purchase

XLeratorDB function packages for SQL Server (2008 & later)
financial
statistics
math
engineering
strings
financial-options
windowing

XLeratorDB Compilation packages for SQL Server (2008 & later)
Suite
Suite (Developer)
Suite (Subscription)

SuitePLUS
SuitePLUS (Developer)
SuitePLUS (Subscription)

XLeratorDLL function packages Microsoft .NET API Library
financial (DLL)

Legacy XLeratorDB Packages for SQL Server 2005
financial for SQL Server 2005 only
statistics for SQL Server 2005 only
math for SQL Server 2005 only

Suite for SQL Server 2005 only
Suite (Developer) for SQL Server 2005 only
SuitePLUS for SQL Server 2005 only
SuitePLUS (Developer) for SQL Server 2005 only

Download Trial
Case Studies
Blog
Support

What is Normal?

Feb 24

Written by: Charles Flock
2/24/2009 7:41 PM

One of the big problems that you may have in moving from the spreadsheet paradigm to the database paradigm is in ‘normalizing’ your data. In this article, I will attempt to explain what that means.

If you are like me and have been using spreadsheets since the days of Visicalc (and Lotus 1-2-3 and QuattroPro), it is quite normal to visualize data in terms of rows (horizontal slices of data) and columns (vertical slices of data). Just imagine financial data (or if you can’t, go to some financial site like Moneycentral or the Wall Street Journal). In some cases, you will find the column headings to be a date, usually representing the end of a reporting period, and the rows will be some piece of financial information to be reported on; sales (or revenue), cost of sales, operating expenses, net income, etc. Here’s a fairly standard representation of a cash flow statement:

2008

2007

2006

2005

2004

Period End Date

06/30/2008

06/30/2007

06/30/2006

06/30/2005

06/30/2004

Period Length

12 Months

Stmt Source

10-K

Stmt Source Date

07/31/2008

08/03/2007

08/25/2006

08/26/2005

Stmt Update Type

Updated

Restated

Net Income/Starting Line

17,681.0

14,065.0

12,599.0

12,254.0

8,168.0

Depreciation/Depletion

2,056.0

1,440.0

903.0

855.0

1,186.0

Amortization

0.0

Deferred Taxes

935.0

421.0

219.0

-179.0

-1,479.0

Non-Cash Items

787.0

1,181.0

1,356.0

2,589.0

5,538.0

Unusual Items

Other Non-Cash Items

Changes in Working Capital

153.0

689.0

-673.0

1,086.0

1,213.0

Accounts Receivable

Other Assets

Other Liabilities

Cash from Operating Activities

21,612.0

17,796.0

14,404.0

16,605.0

14,626.0

Capital Expenditures

-3,182.0

-2,264.0

-1,578.0

-812.0

-1,109.0

Purchase of Fixed Assets

Other Investing Cash Flow Items, Total

-1,405.0

8,353.0

9,581.0

15,839.0

-2,233.0

Acquisition of Business

Sale/Maturity of Investment

Purchase of Investments

Other Investing Cash Flow

Cash from Investing Activities

-4,587.0

6,089.0

8,003.0

15,027.0

-3,342.0

Financing Cash Flow Items

120.0

54.0

89.0

-18.0

0.0

Other Financing Cash Flow

Total Cash Dividends Paid

-4,015.0

-3,805.0

-3,545.0

-36,112.0

-1,729.0

Issuance (Retirement) of Stock, Net

-9,039.0

-20,793.0

-17,106.0

-4,948.0

-635.0

Issuance (Retirement) of Debt, Net

0.0

Cash from Financing Activities

-12,934.0

-24,544.0

-20,562.0

-41,078.0

-2,364.0

Foreign Exchange Effects

137.0

56.0

18.0

-7.0

27.0

Net Change in Cash

4,228.0

-603.0

1,863.0

-9,453.0

8,947.0

Net Cash - Beginning Balance

6,111.0

6,714.0

4,851.0

14,304.0

5,357.0

Net Cash - Ending Balance

10,339.0

6,111.0

6,714.0

4,851.0

14,304.0

It turns out, though, from a data base perspective, this starts to present some serious challenges. To overcome those challenges, the concept of normalization was developed.

The simplest explanation of normalization that I can offer is the following. A table is normalized when you can find the data that you want using the key, the whole key, and nothing but the key. If you use this simple formula, I can guarantee that your table will be sufficiently normalized.

In this case, if we want to put this data in database to do some further analysis, its structure presents us with some problems. Why put it in a database? What if I want to compare 20 companies? What about 200 or 2,000 or 20,000? At some point, it just becomes impractical to try and manage that kind of volume in a spreadsheet.

The first thing that we have to realize is that we want to represent all companies in the same way on the database. This allows us to calculate and compare information in a wide variety of ways, with the assurance that the data has been consistently rendered.

Secondly, we then want to store data in such a way, that we will not have to change the database structure as new data is provided. In the above example, when the 2009 results are recorded, we need add another column to store those results. While you can do this in a spreadsheet, doing it in a database is problematic; none of your existing queries, views, stored procedures, etc. will recognize this column, or, perhaps the column will turn up in unexpected places, if you have used ‘SELECT *’ anywhere.

Let’s get right to the issue, then, of how to normalize this data. First, as I pointed out previously, we need to store information about the company. We could do that any number of ways, by using the company name, by using a number, or by using a symbol of some type. It is important, though, that whatever you pick uniquely identifies the company, as you would not want to mix company results together. Just to keep things simple, we will use the company’s stock symbol, though in real life, it might make more sense to use a number and associate the number with a symbol or the company name.

Next, look at the column headings and see if there is any way to generalize the headings. To me, it seems straightforward that the columns represent fiscal years.

Now I look at the rows. It seems to me that there are 2 distinct types of data. At the top of the table, there is some information about the FY, consisting of Period End Date, Period Length, Stmt Source, and Stmt Update Type. Then there is actual financial data, which consist of a description of the financial activity (net income, depreciation, changes in working capital, etc.). For purposes of this exercise, let’s focus on the financial data.

I think that at this point we can safely say that we can uniquely the financial information that we are interested in as consisting of the symbol, fiscal year, description of the financial activity, and the amount. We can then implement a simple table which consists of 4 columns:

o SYMBOL

o FY

o FINANCIAL ACTIVITY

o AMOUNT

with first three fields being the key. This means that these fields uniquely identify the remaining field in this table; the key, the whole key and nothing but the key.

Our data would then go into the database looking like this (and I will only do the Cash from Operating Activities):

SYMBOL	FY	FINANCIAL ACTIVITY	AMOUNT
XYZ	2008	Net Income	17681.00
XYZ	2007	Net Income	14065.00
XYZ	2006	Net Income	12599.00
XYZ	2005	Net Income	12254.00
XYZ	2004	Net Income	8168.00
XYZ	2008	Depreciation	2056.00
XYZ	2007	Depreciation	1440.00
XYZ	2006	Depreciation	903.00
XYZ	2005	Depreciation	855.00
XYZ	2004	Depreciation	1186.00
XYZ	2008	Amortization	0.00
XYZ	2007	Amortization	0.00
XYZ	2006	Amortization	0.00
XYZ	2005	Amortization	0.00
XYZ	2004	Amortization	0.00
XYZ	2008	Deferred Taxes	935.00
XYZ	2007	Deferred Taxes	421.00
XYZ	2006	Deferred Taxes	219.00
XYZ	2005	Deferred Taxes	-179.00
XYZ	2004	Deferred Taxes	-1479.00
XYZ	2008	Non-Cash Items	787.00
XYZ	2007	Non-Cash Items	1181.00
XYZ	2006	Non-Cash Items	1356.00
XYZ	2005	Non-Cash Items	2589.00
XYZ	2004	Non-Cash Items	5538.00
XYZ	2008	Changes in Working Capital	153.00
XYZ	2007	Changes in Working Capital	689.00
XYZ	2006	Changes in Working Capital	-673.00
XYZ	2005	Changes in Working Capital	1086.00
XYZ	2004	Changes in Working Capital	1213.00

When data is ready for 2009 or beyond, there is no need to change the layout of the table. I can have as money companies as I want, for as many fiscal years as I want. If I want to compare results across many companies, it becomes very straightforward and easy. It now becomes possible, for example, to use the FORECAST function on the database to predict 2009 Revenue (for example), for all companies, selected companies, any combination of companies that you might think of, and get the result in less than a second.

Take the time to normalize when you move your data from your spreadsheet and into the database. It will pay big dividends in terms of consistency, reliability, and speed of processing.

Trackback Print

Tags:

Categories:

Location: Blogs Parent Separator

The WestClinTech Blog

Search Blogs

Keywords

Phrase

Blog Archives

Products

Support

Contact Us
FAQ’s
Blog
XLeratorDB Documentation
- Financial
- Financial-Options
- Statistics
- Math
- Engineering
- Strings
- Windowing
XLeratorDLL Documentation
- Financial-DLL
XLeratorDB Installation Guide

XLeratorDB function packages
for SQL Server

XLeratorDB Compilation packages
for SQL Server

XLeratorDLL function packages
Microsoft .NET API Library

XLeratorDB function packages for
SQL Server (2008 & later)

XLeratorDB Compilation packages for
SQL Server (2008 & later)

XLeratorDLL function packages
Microsoft .NET API Library

Legacy XLeratorDB Packages for
SQL Server 2005

What is Normal?

Search Blogs

Blog Archives

Products

Support

About

Pricing

Contact us

About us

FAQ

XLeratorDB function packagesfor SQL Server

XLeratorDB Compilation packagesfor SQL Server

XLeratorDLL function packagesMicrosoft .NET API Library

XLeratorDB function packages for SQL Server (2008 & later)

XLeratorDB Compilation packages for SQL Server (2008 & later)

XLeratorDLL function packagesMicrosoft .NET API Library

Legacy XLeratorDB Packages for SQL Server 2005

What is Normal?

Search Blogs

Blog Archives

Products

Support

About

Pricing

Contact us

About us

FAQ

XLeratorDB function packages
for SQL Server

XLeratorDB Compilation packages
for SQL Server

XLeratorDLL function packages
Microsoft .NET API Library

XLeratorDB function packages for
SQL Server (2008 & later)

XLeratorDB Compilation packages for
SQL Server (2008 & later)

XLeratorDLL function packages
Microsoft .NET API Library

Legacy XLeratorDB Packages for
SQL Server 2005