
Data Warehousing
In an enterprise, data comes from multiple departments and is stored in multiple formats.

Data warehousing is the process of transforming all this data into a single format and consolidating it in one place. This integration of data at a single location helps business users gain an overall perspective of what is happening in the various parts of the organization. The process also consolidates old and historical data so that business users can make comparative studies of trends and patterns emerging over time. This in turn helps managers and decision makers gain key insights from large volumes of data and take the necessary steps.

Definition of Data Warehouse
According to Bill Inmon, "A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process." Ralph Kimball defines it as follows: "A data warehouse is a copy of transaction data specifically structured for query and analysis."

Need for Data Warehousing
Why do we build a data warehouse in the first place? Why don't we use the existing data sources themselves to locate whatever information is needed? This is an often-repeated question, and there are several reasons why a data warehouse is a necessity. First, the existing data sources are often non-uniform: they are stored in multiple formats, under multiple structures, using multiple nomenclatures, so it is no easy task to connect them and pull out information as needed. Further, enterprises routinely work with current data, whereas past data is often stored in ways that are not readily accessible. For analysis of business performance, however, the study of past data is an absolute must. A business therefore needs a data warehouse where all historical records can be consolidated and stored for quick and easy access and analysis. A uniform format and consolidated history together make the retrieval of complex information much faster and easier, which in turn makes data presentation and reporting flexible, powerful and reliable.

Characteristics of a Data Warehouse
What characterizes a data warehouse? First and foremost, the data warehouse handles historical data, whereas transactional systems focus on current data. As a result, the data volume of a data warehouse is very high, and it caters to a much smaller number of users, who are concerned with monitoring business performance rather than managing routine operations. The data warehouse is thus a very important organizational information asset that enhances the decision-making capabilities of business users.

Operational data stores
An Operational Data Store (ODS) is a type of database often used as an interim area for a data warehouse. Unlike a data warehouse, which contains static data, the contents of the ODS are updated in the course of business operations. An ODS is designed to quickly perform relatively simple queries on small amounts of data, rather than the complex queries on large amounts of data typical of the data warehouse. An ODS is similar to short-term memory in that it stores only recent and current information; in comparison, the data warehouse is more like long-term memory, storing permanent information. The primary operation of an ODS is to collect data from various sources into a single format.

Data warehouse architecture

Architecture without Staging Area
In this architecture, data is extracted directly from the operational sources and transformations are applied to the extracted data, after which the data is loaded into the data warehouse.

Architecture with Staging Area
In this case, data is first loaded into a staging area, where it is transformed into a single format; this makes the ETL process easier. Once the ETL process is completed, users can access the data warehouse to analyze data and create reports.

Architecture with Data Mart
Here, in addition to the staging area, data marts are created. A data mart is a repository of data related to a specific group; for example, data related to a specific department of an organization can be grouped together to form a data mart.

To conclude, a data warehouse provides some significant advantages for the organization. First and foremost, it gives decision makers better enterprise intelligence by providing accurate information on the status of the various business parameters across the organization. The key insights thus gained arm decision makers with critical inputs and improve their productivity, which in turn gives them a distinct advantage over the competition.
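
To make the staging-area architecture described above more concrete, here is a minimal Python sketch. It copies rows from two operational sources into a staging list unchanged, converts them into a single format, and only then loads them into the warehouse. The source systems, field names and sample values are all hypothetical and chosen only for illustration; a real pipeline would read from actual databases or files.

```python
from datetime import datetime

# Hypothetical operational sources with differing formats (illustrative data).
sales_system = [{"store": "1023", "amount": "94.83", "date": "2006-03-01"}]
billing_system = [{"STORE_NO": 2001, "AMT": 15.5, "DT": "01/03/2006"}]


def extract():
    """Copy raw rows into the staging area, tagging each with its source."""
    staging = []
    for row in sales_system:
        staging.append({"source": "sales", "raw": row})
    for row in billing_system:
        staging.append({"source": "billing", "raw": row})
    return staging


def transform(staging):
    """Convert every staged row into one common format."""
    unified = []
    for item in staging:
        raw = item["raw"]
        if item["source"] == "sales":
            day = datetime.strptime(raw["date"], "%Y-%m-%d").date()
            unified.append({"store": int(raw["store"]),
                            "amount": float(raw["amount"]),
                            "date": day.isoformat()})
        else:
            day = datetime.strptime(raw["DT"], "%d/%m/%Y").date()
            unified.append({"store": int(raw["STORE_NO"]),
                            "amount": float(raw["AMT"]),
                            "date": day.isoformat()})
    return unified


def load(rows, warehouse):
    """Append the transformed rows to the warehouse table."""
    warehouse.extend(rows)
    return warehouse


warehouse = load(transform(extract()), [])
```

Keeping the raw rows in the staging list means the transformation step can be rerun or corrected without going back to the operational systems.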

Data Warehousing Components


Data Warehouse Database
The data warehouse has a relational database holding historical and current data, along with data extracted from external data sources. The database is designed to support analysis of business dealings by categories and attributes. It is tuned to support large, complex queries that access a huge number of rows per table, and it provides simultaneous access to a small number of users. The database has a multidimensional data structure, as against the complex data structure of an OLTP database. It has a large number of indexes and a moderate number of joins. For better access speed and a simpler data model, the database of a data warehouse leans towards a denormalized structure, although a normalized structure can also be used.

Sourcing, Acquisition, Cleanup and transformation tools


Sourcing
The data warehouse contains data extracted from a variety of sources: OnLine Transaction Processing (OLTP) databases, historical data stores and external data sources. The concept and structure of the data sources, along with the dependencies between them, are closely examined before the data is pulled. This step helps build the data warehouse with quality data at a more rapid pace.

Acquisition
The acquisition of data, i.e. the extraction phase, first involves a team of people who determine what data should be extracted and from which part of the source. The success of the extraction process largely depends on a constant and efficient connection to the data source. The extraction phase covers the identification of the source applications and their structure; the method of extraction, which can be either manual or tool based; and the frequency at which extraction takes place, e.g. daily, weekly or quarterly.

Cleanup
In this stage, the data extracted from the various sources is scanned for errors and cleaned before being loaded into the data warehouse. Cleaning involves removing null values, checking the consistency of data formats, standardizing date formats, eliminating or adding titles for sets of records, and standardizing numerical values. Custom cleaning is also carried out depending on customer requirements.

Transformation tools
The data collected from the OLTP systems is transformed inside the data warehouse into information, which is further transformed into useful knowledge for decision makers. The extracted data is raw in nature and cannot be applied to the data warehouse directly; its quality has to be improved before it becomes usable. The transformation phase applies a series of rules to manipulate the extracted data to suit end-user requirements. Some of the basic tasks performed in the transformation phase are selection, splitting, joining, conversion, calculation, derivation and summarization. A transformation tool improves both efficiency and accuracy compared with manual techniques: using a tool, one can set the parameters, data definitions and transformation rules to get accurate results. Such tools have made the transformation phase much simpler and less time-consuming. Some of the major transformation tools in the market are SSIS from Microsoft, Data Integrator from Business Objects, PowerCenter from Informatica and Oracle Warehouse Builder from Oracle.
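
The cleanup and transformation steps described above can be sketched in plain Python. The sketch drops rows containing null values, standardizes dates and numeric values, and then performs one of the transformation tasks listed (summarization). The row layout, the two accepted date formats and the sample values are assumptions made for illustration, not taken from any particular tool.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical extracted rows; formats vary as they might across sources.
extracted = [
    {"store": "1023", "sale_date": "2006-03-01", "amount": "1,200.50"},
    {"store": "1023", "sale_date": "02/03/2006", "amount": "300"},
    {"store": "2001", "sale_date": "2006-03-01", "amount": None},  # null value
    {"store": "2001", "sale_date": "03/03/2006", "amount": "99.5"},
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed input formats


def standardize_date(text):
    """Return the date as ISO YYYY-MM-DD, trying each assumed format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def clean(rows):
    """Drop rows with null values and standardize dates and numbers."""
    cleaned = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue
        cleaned.append({
            "store": int(row["store"]),
            "sale_date": standardize_date(row["sale_date"]),
            "amount": float(row["amount"].replace(",", "")),
        })
    return cleaned


def summarize(rows):
    """Summarization: total sales amount per store."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["store"]] += row["amount"]
    return dict(totals)


cleaned = clean(extracted)
totals = summarize(cleaned)
```

Dropping rows with nulls is only one possible rule; depending on requirements, a cleanup step might instead fill in a default value or route such rows to a reject file.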

Metadata
Metadata is defined as "data about data". The term "meta" originates from a Greek word indicating a nature of a higher order. Metadata is often used in daily routine. To illustrate, consider the description of a textbook: Raj, 2006. DW Concepts. Bangalore: Tom marks. In this example, the first element is the author name, followed by the year, title, city and publisher name; together they provide information about a book that itself contains information about DW concepts. Metadata is one of the key factors responsible for the success of a data warehouse project. It captures the information necessary to extract, transform and load data from a source system into a data warehouse, and it is an essential ingredient in the transformation of raw data into knowledge. For example, a line in the sales database may contain: 1023 k940 9483. This data is meaningless until the metadata in the data directory explains that the store number is 1023, the product is k940 and the total sales figure is 9483.
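
The sales-line example can be expressed as a small Python sketch in which a data-directory entry gives meaning to the raw values. The structure of the metadata and the field names (`store_number`, `product_code`, `total_sales`) are assumptions for illustration; only the raw line and its interpretation come from the example above.

```python
# Hypothetical data-directory entry describing the layout of a sales record.
sales_metadata = [
    {"position": 0, "name": "store_number", "type": str},
    {"position": 1, "name": "product_code", "type": str},
    {"position": 2, "name": "total_sales", "type": int},
]


def interpret(line, metadata):
    """Turn a raw whitespace-separated line into a labeled record."""
    fields = line.split()
    return {m["name"]: m["type"](fields[m["position"]]) for m in metadata}


record = interpret("1023 k940 9483", sales_metadata)
# record maps each field name from the metadata to its value in the line.
```

Without `sales_metadata`, the three values are just strings; with it, the same line becomes a record that downstream ETL steps and reports can use.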

Access Tools
The final product of a data warehouse is the set of reports generated from the processed information. The process that starts with business analysis, followed by the ETL phase, ends in reporting. Good formatting and navigation between different levels of information are the important factors in a report. Reporting tools are evaluated on the following points: the ability to connect to data sources; performance in scheduling and distribution; how secure the tool is; customization of features; and the ability to export output to flat files and PDF.

Some of the popular access tools in the market are Crystal Reports from Business Objects and ReportNet from Cognos.

Data marts
A data mart is a subset of the data warehouse that is designed for a particular department of the business. Departments such as sales, marketing, finance and production are typical examples of areas served by a data mart. The data warehouse is split into a number of simple data marts, each related to a particular analysis activity. There are many reasons for splitting a data warehouse into data marts, the main ones being ease of creation and lower implementation cost. In addition, data marts provide easy access to data that is needed on a regular basis, with quick end-user response times.

Each data mart represents data by means of a star schema, which consists of a large fact table at the center and a set of smaller dimension tables arranged around it. In a simple data warehouse, data can be extracted directly from the operational systems into the data mart, whereas in a complex data warehouse the extraction has to take place from intermediate repositories known as operational data stores (ODS).
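
A star schema of this kind can be sketched with Python's built-in `sqlite3` module. The example creates one fact table and two dimension tables, then runs a typical analytical query that joins the fact table to its dimensions. The table names, column names and sample rows are hypothetical and serve only to illustrate the layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: small, descriptive.
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_product (product_id TEXT PRIMARY KEY, category TEXT);

-- Fact table: large, one row per sale, keyed to the dimensions.
CREATE TABLE fact_sales (
    store_id   INTEGER REFERENCES dim_store(store_id),
    product_id TEXT    REFERENCES dim_product(product_id),
    amount     REAL
);

INSERT INTO dim_store VALUES (1023, 'Bangalore'), (2001, 'Mysore');
INSERT INTO dim_product VALUES ('k940', 'Books'), ('k941', 'Stationery');
INSERT INTO fact_sales VALUES
    (1023, 'k940', 100.0), (1023, 'k941', 50.0), (2001, 'k940', 25.0);
""")

# Total sales by city and product category: fact table joined to dimensions.
rows = conn.execute("""
    SELECT s.city, p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_store   AS s ON s.store_id = f.store_id
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY s.city, p.category
    ORDER BY s.city, p.category
""").fetchall()
```

Every join in the query goes through the fact table, which is what gives the schema its star shape.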

Data warehousing administration and management


Once the first version of the data warehouse is deployed, administration and management tasks come into effect. In general these cover maintenance management and change management. As the number of releases grows, enhancements and revisions add to the task list. The tasks span areas such as data growth, storage, performance tuning, security and user support, among others. Data growth needs special attention, because a small percentage increase in data can result in substantial changes in the overall data warehouse. An increase in data volume increases the demand for storage space, which in turn means additional cost; hence proper storage management is a must. Performance tuning is done to improve system performance, and is intended to address the performance issues the system faces as the load increases.

Information delivery system


The information that is ready in the data warehouse is delivered according to the class of users. Users may have simple or complex requirements, but the process followed by the information delivery system is the same. Information is delivered in the form of reports and queries. Reporting gets a boost in a data warehouse because the information is structured, allowing easy navigation between different levels of information. Reports are user driven and come in different formats, with many formatting features, so that information can be presented clearly to end users. Reports are internally written using queries; hence queries, too, serve as a means of providing information. The data warehouse provides a rich set of querying features, including readily available templates, complex queries, faster processing and options to save the results.
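
As a rough illustration of a report built on a query template whose results can be saved, the Python sketch below uses the built-in `sqlite3` and `csv` modules. The `sales` table, its columns, the template name `SALES_BY_REGION` and the sample data are all hypothetical; actual reporting tools provide their own template and export mechanisms.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("North", 2005, 120.0), ("North", 2006, 150.0), ("South", 2006, 90.0),
])

# A reusable report template: the year is supplied when the report is run.
SALES_BY_REGION = """
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE year = ?
    GROUP BY region
    ORDER BY region
"""


def run_report(connection, template, params):
    """Execute a query template and return (column names, rows)."""
    cursor = connection.execute(template, params)
    columns = [col[0] for col in cursor.description]
    return columns, cursor.fetchall()


def save_as_csv(columns, rows):
    """Render the report as CSV text so the result can be saved."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


columns, rows = run_report(conn, SALES_BY_REGION, (2006,))
report_text = save_as_csv(columns, rows)
```

The same template can be rerun with a different year, which is the sense in which a template makes a report reusable.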
