Generally the industry recognizes three distinct categories of data in a BI environment:
The distinction between Transaction and Master data is quite pervasive in the BI industry, as it is the basis for all OLAP (On-Line-Analytical-Processing). With OLAP, you divide your data in (numerical) Facts and Dimensions. Facts linked to multiple dimensions are 'put into a (virtual) cube', that can be navigated by slicing-and-dicing the cube. See the pivot-table of Excel for an example of how this works. The Facts are generally directly derived from the (numerical fields in the) transactional data, whereas the dimensions are constructed from relevant master data elements. All very neat. At the surface, that is.
Because if you examine the transaction- and master data categories more closely, you'll see that mapping these to Facts and Dimensions is not always obvious. This is particularly true for all data that does not describe an atomic transaction, but more a status that is valid during a time period. There are numerous examples to be found: the credit status of a customer, the status of an order (received, approved, shipped, delivered, paid, etc.), the (list)price of a product, the exchange rate of a currency, stock price, etc… Now, most of these data elements are rightfully considered master data, but can still be equally valuable for analysis as the transaction data. I could want to analyze price erosion by putting the list prices of all products in a category in a pivot table. I also could want to analyze a process by evaluating how long on average an order maintains a certain status.
Examining this closer learns that most (if not all) master data elements can actually change over time, even though many source systems do not track these changes and simply overwrite previous values. These changing master data elements have a lot in common with the status elements, and can be treated similarly in a data warehouse, if the changes in the master data are properly recorded. This means: including a validity period.
So, we are actually deciding what (master) data elements end up in Dimensions and what elements are mapped into Facts depending on our analysis needs. This is a very strong case against a data warehouse that is built from fact tables and dimension tables (often referred to as: the data warehouse as a collection of data marts, with conformed dimensions). In stead, the data warehouse should be oblivious of Facts and Dimensions. Better still: the data warehouse can be (and should be) oblivious of transaction data and master data as well: they should not be –fundamentally- treated differently. Ideally even the meta-data should be treated as 'ordinary data'. That would mean that one would be able to load, archive, analyze and report meta-data in the exact same way you'd expect to be able to do with transaction and master data.
This notion calls for a very generic and flexible data modeling technique. One that is still simple enough to quickly comprehend and scalable enough to be used in any data warehouse environment , from very small scale to multiple-petabytes. We believe such a modeling technique is the Data Vault, as designed and described by Dan Linstedt. We are so convinced of its implicit value, that we have created a company dedicated to developing and implementing Data Vault solutions, also supporting and maintaining them. Visit www.qosqo.nl for details.