Login     Register

        Contact Us     Search

What is Normal?

Feb 24

Written by: Charles Flock
2/24/2009 7:41 PM  RssIcon

One of the big problems that you may have in moving from the spreadsheet paradigm to the database paradigm is in ‘normalizing’ your data. In this article, I will attempt to explain what that means.

If you are like me and have been using spreadsheets since the days of Visicalc (and Lotus 1-2-3 and QuattroPro), it is quite normal to visualize data in terms of rows (horizontal slices of data) and columns (vertical slices of data). Just imagine financial data (or if you can’t, go to some financial site like Moneycentral or the Wall Street Journal). In some cases, you will find the column headings to be a date, usually representing the end of a reporting period, and the rows will be some piece of financial information to be reported on; sales (or revenue), cost of sales, operating expenses, net income, etc. Here’s a fairly standard representation of a cash flow statement:
 
2008
2007
2006
2005
2004
Period End Date
06/30/2008
06/30/2007
06/30/2006
06/30/2005
06/30/2004
Period Length
12 Months
12 Months
12 Months
12 Months
12 Months
Stmt Source
10-K
10-K
10-K
10-K
10-K
Stmt Source Date
07/31/2008
08/03/2007
08/25/2006
08/26/2005
08/26/2005
Stmt Update Type
Updated
Updated
Updated
Updated
Restated
 
 
 
 
 
 
Net Income/Starting Line
17,681.0
14,065.0
12,599.0
12,254.0
8,168.0
Depreciation/Depletion
2,056.0
1,440.0
903.0
855.0
1,186.0
Amortization
0.0
0.0
0.0
0.0
0.0
Deferred Taxes
935.0
421.0
219.0
-179.0
-1,479.0
Non-Cash Items
787.0
1,181.0
1,356.0
2,589.0
5,538.0
Unusual Items
Other Non-Cash Items
Changes in Working Capital
153.0
689.0
-673.0
1,086.0
1,213.0
Accounts Receivable
Other Assets
Other Liabilities
Cash from Operating Activities
21,612.0
17,796.0
14,404.0
16,605.0
14,626.0
 
 
 
 
 
 
Capital Expenditures
-3,182.0
-2,264.0
-1,578.0
-812.0
-1,109.0
Purchase of Fixed Assets
Other Investing Cash Flow Items, Total
-1,405.0
8,353.0
9,581.0
15,839.0
-2,233.0
Acquisition of Business
Sale/Maturity of Investment
Purchase of Investments
Other Investing Cash Flow
Cash from Investing Activities
-4,587.0
6,089.0
8,003.0
15,027.0
-3,342.0
 
 
 
 
 
 
Financing Cash Flow Items
120.0
54.0
89.0
-18.0
0.0
Other Financing Cash Flow
Total Cash Dividends Paid
-4,015.0
-3,805.0
-3,545.0
-36,112.0
-1,729.0
Issuance (Retirement) of Stock, Net
-9,039.0
-20,793.0
-17,106.0
-4,948.0
-635.0
Issuance (Retirement) of Debt, Net
0.0
0.0
0.0
0.0
0.0
Cash from Financing Activities
-12,934.0
-24,544.0
-20,562.0
-41,078.0
-2,364.0
 
 
 
 
 
 
Foreign Exchange Effects
137.0
56.0
18.0
-7.0
27.0
Net Change in Cash
4,228.0
-603.0
1,863.0
-9,453.0
8,947.0
 
 
 
 
 
 
 
 
 
 
 
 
Net Cash - Beginning Balance
6,111.0
6,714.0
4,851.0
14,304.0
5,357.0
Net Cash - Ending Balance
10,339.0
6,111.0
6,714.0
4,851.0
14,304.0
 
It turns out, though, from a data base perspective, this starts to present some serious challenges. To overcome those challenges, the concept of normalization was developed.
The simplest explanation of normalization that I can offer is the following. A table is normalized when you can find the data that you want using the key, the whole key, and nothing but the key. If you use this simple formula, I can guarantee that your table will be sufficiently normalized.
In this case, if we want to put this data in database to do some further analysis, its structure presents us with some problems. Why put it in a database? What if I want to compare 20 companies? What about 200 or 2,000 or 20,000? At some point, it just becomes impractical to try and manage that kind of volume in a spreadsheet.
The first thing that we have to realize is that we want to represent all companies in the same way on the database. This allows us to calculate and compare information in a wide variety of ways, with the assurance that the data has been consistently rendered.
Secondly, we then want to store data in such a way, that we will not have to change the database structure as new data is provided. In the above example, when the 2009 results are recorded, we need add another column to store those results. While you can do this in a spreadsheet, doing it in a database is problematic; none of your existing queries, views, stored procedures, etc. will recognize this column, or, perhaps the column will turn up in unexpected places, if you have used ‘SELECT *’ anywhere.
Let’s get right to the issue, then, of how to normalize this data. First, as I pointed out previously, we need to store information about the company. We could do that any number of ways, by using the company name, by using a number, or by using a symbol of some type. It is important, though, that whatever you pick uniquely identifies the company, as you would not want to mix company results together. Just to keep things simple, we will use the company’s stock symbol, though in real life, it might make more sense to use a number and associate the number with a symbol or the company name.
Next, look at the column headings and see if there is any way to generalize the headings. To me, it seems straightforward that the columns represent fiscal years.
Now I look at the rows. It seems to me that there are 2 distinct types of data. At the top of the table, there is some information about the FY, consisting of Period End Date, Period Length, Stmt Source, and Stmt Update Type. Then there is actual financial data, which consist of a description of the financial activity (net income, depreciation, changes in working capital, etc.). For purposes of this exercise, let’s focus on the financial data.
I think that at this point we can safely say that we can uniquely the financial information that we are interested in as consisting of the symbol, fiscal year, description of the financial activity, and the amount. We can then implement a simple table which consists of 4 columns:
o        SYMBOL
o        FY
o        FINANCIAL ACTIVITY
o        AMOUNT
with first three fields being the key. This means that these fields uniquely identify the remaining field in this table; the key, the whole key and nothing but the key.
Our data would then go into the database looking like this (and I will only do the Cash from Operating Activities):
SYMBOL
FY
FINANCIAL ACTIVITY
AMOUNT
XYZ
2008
Net Income
17681.00
XYZ
2007
Net Income
14065.00
XYZ
2006
Net Income
12599.00
XYZ
2005
Net Income
12254.00
XYZ
2004
Net Income
8168.00
XYZ
2008
Depreciation
2056.00
XYZ
2007
Depreciation
1440.00
XYZ
2006
Depreciation
903.00
XYZ
2005
Depreciation
855.00
XYZ
2004
Depreciation
1186.00
XYZ
2008
Amortization
0.00
XYZ
2007
Amortization
0.00
XYZ
2006
Amortization
0.00
XYZ
2005
Amortization
0.00
XYZ
2004
Amortization
0.00
XYZ
2008
Deferred Taxes
935.00
XYZ
2007
Deferred Taxes
421.00
XYZ
2006
Deferred Taxes
219.00
XYZ
2005
Deferred Taxes
-179.00
XYZ
2004
Deferred Taxes
-1479.00
XYZ
2008
Non-Cash Items
787.00
XYZ
2007
Non-Cash Items
1181.00
XYZ
2006
Non-Cash Items
1356.00
XYZ
2005
Non-Cash Items
2589.00
XYZ
2004
Non-Cash Items
5538.00
XYZ
2008
Changes in Working Capital
153.00
XYZ
2007
Changes in Working Capital
689.00
XYZ
2006
Changes in Working Capital
-673.00
XYZ
2005
Changes in Working Capital
1086.00
XYZ
2004
Changes in Working Capital
1213.00
 
When data is ready for 2009 or beyond, there is no need to change the layout of the table. I can have as money companies as I want, for as many fiscal years as I want. If I want to compare results across many companies, it becomes very straightforward and easy. It now becomes possible, for example, to use the FORECAST function on the database to predict 2009 Revenue (for example), for all companies, selected companies, any combination of companies that you might think of, and get the result in less than a second.
Take the time to normalize when you move your data from your spreadsheet and into the database. It will pay big dividends in terms of consistency, reliability, and speed of processing.

Tags:
Categories:
Copyright 2008-2017 Westclintech LLC         Privacy Policy        Terms of Service