Data Warehouse Architecture: data areas of our Corporate Data Warehouse

 

When designing the architecture of a data warehouse system we have to raise different environments for which data must pass en route to its Data mart or target cube. Given the amount of changes that have to do, and normally the DWH, besides fulfilling its function of support analytical requirements, performs a data integration function that will form the Corporate store and they will have to be also consulted the traditional way by the operational systems, it is recommended to create different data areas in the path between source systems and OLAP tools.

Each of these areas as characterized by the functions performed, how data is organized in the same, and what type of need can service. The area that is 'at the end of the road' is important, but it will not be the only store data that will exploit the reporting tools.

Nor is there a standard convention that covers exactly what each area, and compulsion to use each. Each project is a world, and influenced by many factors such as complexity, volume of information about it, if you really want to use the Data Warehouse as a corporate store or Master of Information System, or if there are real needs to support operational reporting.

 

Dataprix: arquitectura data warehouse: STG, ODS, DWH y DM

Staging Area

It is a temporary area where they collect the data needed for home systems. Collect the data strictly necessary for cargo, and applies minimal changes to them. There are no restrictions or integrity key is used, the data are treated as if the tables were flat files. This will offset the decline in home systems, the charge is as quick as possible to minimize the required time window, and is also reduced to a minimum the possibility of error.Once the data is transferred, the DWH is independent of source systems to the next load. The only thing that is often added a field that stores the date of loading.

Obviously these data will not serve any reporting application, are temporary data once they have served their purpose will be deleted, in fact in the logic diagram of the architecture often does not appear, because its function is purely operational.

Some consider that the Staging Area covers most of what I said, or even that this area encompasses the entire environment where ETL processes do, I opted for their use only as a temporary area.

ODS (Operational Data Store)

As its name suggests, this area is going to supporting operational systems.The data model of the operational data store is a relational structure and standardized so that any reporting tool or operating system able to access your data. It is within the data warehouse because it leverages the integration effort involved the creation of the Corporate Data Warehouse to also address operational needs, but not required, nor is this something specific to the Business Intelligence, the ODS predate we began to talk of BI and DWH.

It does not store historical data, shows the image of the moment, but that does not mean that you can not record the changes.

ODS data are collected from the Stage Area, and here it is done processing, data cleansing and referential integrity checks so that data are fully integrated into the standard relational model.

Keep in mind that the updating of the ODS will not be instantaneous, changes in the source system data will not be reflected until the end of the load. This means that data will be updated from time to time, which must be explained to users, because reports are launched against the ODS may rarely be as 'minute' like that exist in the source system. What we can do is to define a higher frequency of cargo for the ODS that for Warehousing Professionals. If necessary, you can refresh the ODS every 15 minutes, and the rest each day, for example.

 

Corporate Data Warehouse

The Corporate Data Warehouse does contain historical data, and is aimed at analytical exploitation of information it collects. DSS tools for reporting or analytical attack mainly the Data marts, but inquiries can be made directly against the Corporate Data Warehouse, especially when necessary to display both information found in different Datamarts.

It can store data that come from both the ODS and Staging Area. If we have made transformation and integration processes in the ODS are not going to repeat the same data to pass to the Corporate Warehouse. What can not collect from the ODS itself to be to go look at the Staging Area.

 

 

The scheme is similar to a standard relational model, but already denormalization techniques are implemented. It should not contain an excessive number of tables or relations because, for example, many hierarchical relationships in a standard format would be implemented with separate tables should be created here and in the same table, which then represent a dimension. Another peculiarity is that most of the tables must include date fields to control the loading date, the date on which an event occurs, or the period of validity of registration.

If the data warehouse is not too large, or the level of demand is not very high requirements in terms of 'operational', to simplify the structure may choose to eliminate ODS, and if necessary adjust the Corporate Data Warehouse for serve two types of reporting. In this case, the resulting area would be the Corporate DWH, but sometimes also called ODS. 

 

Data marts

And finally we reach the last area of data, which is where you create the data marts. These are obtained from information collected in the area of Corporate Warehouse. Each data mart is a subset of this store, but subject-oriented analysis, typically associated with a department of the company.

The Data marts are designed with multi-dimensional structure, each object of analysis is a fact table linked to various dimension tables. If designed according to the Star model will be practically a table for each dimension is the denormalized version. Under a model Snowflake dimension tables are less unnormalised and each dimension may be used several hierarchically linked table.

This area may be in the same database as the other if the tool is operating ROLAP type, or you can create either of the BD, the data structure itself generated MOLAP-style applications, better known as multidimensional cubes.

The previous step of this data area must be quite simple, which also provides some independence to the software that is used for analytical reporting. If for any reason you need to change the OLAP tool that would do little more than redefine the metadata and regenerate the cubes, and if the change is two ROLAP type even the latter would be necessary. In any case, the above areas need not be modified.

 

Talk About at forum