Data warehousing is the process of transforming data from an organization's many sources into a single format and consolidating them in one place. This integration of data at a single location gives business users an overall perspective on what is happening in the various parts of the organization. It also consolidates old and historical data, so that business users can make comparative studies of trends and patterns emerging over time. Managers and decision makers can then draw key insights from large volumes of data and take the necessary steps.
Definition of Data Warehouse
According to Bill Inmon, a data warehouse is "a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process". Ralph Kimball defines it as "a copy of transaction data specifically structured for query and analysis".
Need for Data Warehousing
Why do we build a data warehouse in the first place? Why don't we use the existing data sources themselves to locate whatever information is needed? This is an often-repeated question, and there are several reasons why a data warehouse is a necessity. First, the existing data sources are often non-uniform, i.e. they are stored in multiple formats, under multiple structures, using multiple nomenclatures, so it is no easy task to connect them and pull out the information as needed. Further, enterprises routinely work with current data, whereas past data is often stored in ways that are not readily accessible. For analysis of business performance, however, the study of past data is an absolute must. A business therefore needs a data warehouse where all historical records can be consolidated and stored for quick and easy access and analysis. A uniform format and consolidated history together make the retrieval of complex information much faster and easier. This in turn makes data presentation and reporting flexible, powerful and reliable.
Characteristics of a Data Warehouse
What characterizes a data warehouse? First and foremost, the data warehouse handles historical data, whereas transactional systems focus on current data. As a result, the data volume of a data warehouse is very high, and it caters to a much smaller number of users, who are concerned with monitoring business performance rather than managing routine operations. The data warehouse is thus an important organizational information asset that enhances the decision-making capabilities of business users.
Operational data stores
An Operational Data Store (ODS) is a type of database often used as an interim area for a data warehouse. Unlike a data warehouse, which contains static data, the contents of the ODS are updated in the course of business operations. An ODS is designed to quickly perform relatively simple queries on small amounts of data, rather than the complex queries on large amounts of data typical of the data warehouse. An ODS is like short-term memory in that it stores only recent and current information; the data warehouse is more like long-term memory, storing permanent information. The primary operation of an ODS is to collect data from various sources into a single format.
In the above diagram, data is extracted directly from the operational sources and then transformed, after which it is loaded into the data warehouse.
Architecture with Staging Area
In this case, data is first loaded into a staging area, where it is transformed into a single format, which makes the ETL process easier. Once the ETL process is completed, users can access the data warehouse to analyze the data and create reports.
Architecture with Data Mart
Here, in addition to the staging area, data marts are created. A data mart is a repository of data related to a specific group; for example, the data related to a specific department of an organization can be grouped together to form a data mart.
To conclude, a data warehouse provides some significant advantages for the organization. First and foremost, it gives decision makers better enterprise intelligence by providing accurate information on the status of the various business parameters across the organization. The key insights thus gained improve the decision makers' productivity by arming them with critical inputs, and this gives them a distinct advantage over the competition.
The main operations in the transformation phase are selection, splitting, joining, conversion, calculation, derivation and summarization. A transformation tool improves both efficiency and accuracy compared with manual techniques. Using a tool, one can set the parameters, data definitions and transformation rules needed to get accurate results. Such tools have made the transformation phase much simpler and less time-consuming. Some of the major transformation tools on the market are SQL Server Integration Services (SSIS) from Microsoft, Data Integrator from Business Objects, PowerCenter from Informatica and Oracle Warehouse Builder from Oracle.
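The transformation operations named above can be illustrated with a minimal Python sketch. It is not how any of the listed tools work internally; the rows, field names, the `stores` lookup and the "bulk" business rule are all hypothetical and exist only to show one possible instance of each operation.

```python
from collections import defaultdict

# Hypothetical extracted rows; every field name and value is made up.
orders = [
    {"customer": "Asha Rao", "store": "1023", "qty": "2", "unit_price": "4.50", "status": "OK"},
    {"customer": "Ben Lee", "store": "1023", "qty": "1", "unit_price": "10.00", "status": "CANCELLED"},
    {"customer": "Asha Rao", "store": "2047", "qty": "3", "unit_price": "2.00", "status": "OK"},
]
# Hypothetical second source used for the join.
stores = {"1023": "Bangalore", "2047": "Mysore"}


def transform(rows, store_lookup):
    """Apply row-level transformations and return the cleaned rows."""
    cleaned = []
    for row in rows:
        # Selection: keep only the rows relevant to the warehouse.
        if row["status"] != "OK":
            continue
        # Splitting: break one source field into several target fields.
        first_name, last_name = row["customer"].split(" ", 1)
        # Conversion: cast text fields to numeric types.
        qty = int(row["qty"])
        unit_price = float(row["unit_price"])
        # Calculation: compute a value from the converted fields.
        amount = qty * unit_price
        # Joining: enrich the row with data from another source.
        city = store_lookup[row["store"]]
        # Derivation: create a new field from an (assumed) business rule.
        order_size = "bulk" if qty >= 3 else "regular"
        cleaned.append({
            "first_name": first_name,
            "last_name": last_name,
            "city": city,
            "qty": qty,
            "amount": amount,
            "order_size": order_size,
        })
    return cleaned


def summarize(rows):
    """Summarization: aggregate detail rows into a total amount per city."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["city"]] += row["amount"]
    return dict(totals)


clean_rows = transform(orders, stores)
totals_by_city = summarize(clean_rows)
```

A dedicated tool would let these rules be configured rather than hand-coded, which is where the efficiency and accuracy gains come from.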
Metadata
Metadata is defined as data about data. The prefix "meta" originates from Greek and here indicates a nature of a higher order. Metadata is used routinely in daily life. To illustrate, consider the bibliographic description of a textbook, for example: Raj, 2006. DW Concepts. Bangalore: Tom marks. Here the first element is the author's name, followed by the year, title, city and publisher's name. Together they provide information about the book, which in turn contains information about DW concepts.
Metadata is one of the key factors responsible for the success of a data warehouse project. It captures the information necessary to extract, transform and load the data from a source system into a data warehouse, and it is an essential ingredient in the transformation of raw data into knowledge. For example, a line in the sales database may contain: 1023 k940 9483. This data remains meaningless until the metadata in the data directory tells us that 1023 is the store number, k940 is the product and 9483 is the total sales.
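The sales-record example can be sketched in Python. This assumes the data directory can be modelled as an ordered list of field descriptions; the names `SALES_RECORD_METADATA` and `interpret`, and the choice of types, are hypothetical and not taken from any particular product.

```python
# Hypothetical data-directory entry describing the layout of one sales record.
SALES_RECORD_METADATA = [
    {"name": "store_number", "type": int, "description": "Store number"},
    {"name": "product", "type": str, "description": "Product code"},
    {"name": "total_sales", "type": int, "description": "Total sales"},
]


def interpret(line, metadata):
    """Attach names and types to the raw fields of a whitespace-separated record."""
    values = line.split()
    if len(values) != len(metadata):
        raise ValueError(f"expected {len(metadata)} fields, got {len(values)}")
    return {field["name"]: field["type"](value)
            for field, value in zip(metadata, values)}


record = interpret("1023 k940 9483", SALES_RECORD_METADATA)
print(record)
# {'store_number': 1023, 'product': 'k940', 'total_sales': 9483}
```

Without the metadata list, the three raw tokens carry no meaning; with it, each value has a name, a type and a description that an ETL step can rely on.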
Access Tools
The final product of a data warehouse is the set of reports generated from the processed information. The process that starts with business analysis and continues through the ETL phase ends in reporting. Good formatting and easy navigation between different levels of information are the important factors in a report. Reporting tools are evaluated on the following points:
- Ability to connect to data sources
- Performance in scheduling and distribution
- Security of the tool
- Customization of features
- Ability to export the output to flat files and PDF
Some of the popular access tools on the market are Crystal Reports from Business Objects and ReportNet from Cognos.
Data marts
A data mart is a subset of a data warehouse that is designed for a particular department of the business. Sales, marketing, finance and production are a few examples of departments that may have their own data mart. The data warehouse is split into a number of simple data marts, each one related to a particular analysis activity. There are many reasons for splitting a data warehouse into data marts, the main ones being ease of creation and lower implementation cost. Data marts also provide easy access to data that is needed on a regular basis, with quick end-user response times.
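As a rough illustration of a data mart being a departmental subset of the warehouse, the sketch below partitions warehouse rows by department. The `department` tag, the metrics and the figures are hypothetical; a real data mart would normally be a separate set of tables rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical warehouse rows; the "department" tag and figures are made up.
warehouse = [
    {"department": "sales", "metric": "units_sold", "value": 120},
    {"department": "finance", "metric": "invoices_paid", "value": 45},
    {"department": "sales", "metric": "returns", "value": 7},
    {"department": "marketing", "metric": "campaigns_run", "value": 3},
]


def build_data_marts(rows):
    """Split warehouse rows into one data mart (a list of rows) per department."""
    marts = defaultdict(list)
    for row in rows:
        marts[row["department"]].append(row)
    return dict(marts)


marts = build_data_marts(warehouse)
sales_mart = marts["sales"]  # only the rows a sales analyst needs
```

Because each mart holds far fewer rows than the full warehouse, queries against it have less data to scan, which is one source of the quick response times mentioned above.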
Each data mart represents data by means of a star schema, which consists of a large fact table at the center and a set of smaller dimension tables arranged around it like the points of a star. In a simple data warehouse, data can be extracted directly from the operational systems into the data mart, whereas in a complex data warehouse the extraction has to take place from intermediate repositories known as operational data stores (ODS).
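A star schema can be sketched with Python's built-in `sqlite3` module. The table names, columns and sample rows below are assumptions chosen for illustration (they reuse the store and product codes from the metadata example); they are not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: small, descriptive lookup tables.
    CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_product (product_id TEXT PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Fact table: one row per measurement, with a key into each dimension.
    CREATE TABLE fact_sales (
        store_id    INTEGER REFERENCES dim_store(store_id),
        product_id  TEXT    REFERENCES dim_product(product_id),
        date_id     INTEGER REFERENCES dim_date(date_id),
        total_sales INTEGER
    );

    -- Hypothetical sample data.
    INSERT INTO dim_store VALUES (1023, 'Bangalore'), (2047, 'Mysore');
    INSERT INTO dim_product VALUES ('k940', 'stationery'), ('k951', 'books');
    INSERT INTO dim_date VALUES (20060101, 2006, 1), (20060201, 2006, 2);
    INSERT INTO fact_sales VALUES
        (1023, 'k940', 20060101, 9483),
        (1023, 'k951', 20060201, 1200),
        (2047, 'k940', 20060201, 500);
""")

# A typical star-schema query: join the fact table to a dimension and aggregate.
sales_by_city = conn.execute("""
    SELECT s.city, SUM(f.total_sales)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()
print(sales_by_city)
```

The fact table stays narrow (keys plus measures) while descriptive attributes live in the dimensions, so the same facts can be sliced by store, product or date with a single join each.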