Earlier systems attempted to automate established business processes, leveraging the power of computers to obtain significant improvements in efficiency and speed. However, in today’s business environment, efficiency and speed are not the only keys to competitiveness. Today, multinational companies and large organizations have operations in many places within their country of origin and in other parts of the world. Each place of operation may generate a large volume of data. For example, insurance companies may have data from thousands of local and external branches, large retail chains have data from hundreds or thousands of stores, and large manufacturing organizations with complex structures may generate different data from different locations or operational systems. Therefore, business in the 21st century is a competition between business models and in the ability to acquire, accumulate and effectively use the collective knowledge of the organization. It is flexibility and responsiveness that differentiate competitors in the new Web-enabled e-business economy. The success of the modern business will depend on an effective data management strategy of data warehousing and interactive data analysis capabilities that culminates in data mining. Data warehousing systems have emerged as one of the principal technological approaches to the development of newer, leaner, meaner and more profitable corporate organizations.
Data Warehousing
A Data Warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management’s decisions. This definition has the following components:
1. Subject Oriented
2. Integrated
3. Time Variant
4. Non Volatile
The primary goals of a Data Warehouse are the following:
1. Provide access to the data of an organization
2. Data consistency
3. Capacity to separate and combine data
4. Inclusion of a set of tools to query, analyze and present information
5. Publish used data
6. Drive business reengineering
Data Warehousing has triggered an era of information based management, which provides the following advantages to the end users:
1. A single information source
2. Distributed information availability
3. Providing information in a business context
4. Automated information delivery
5. Managing information quality and ownership
Data Warehousing typically delivers information to users in one of the following formats:
1. Query and reporting
2. Online Analytical Processing (OLAP)
3. Statistical Analysis
4. Data Mining
5. Graphical/Geographic System
Characteristics of Data Warehouses
Data Warehouses have the following distinct characteristics:
1. Multidimensional conceptual view
2. Unlimited dimensional and aggregation levels
3. Unrestricted cross-dimensional operations
4. Dynamic matrix handling
5. Client/Server Architecture
6. Multi user support
7. Accessibility
8. Transparency
9. Data Manipulation
10. Consistent reporting performance
11. Flexible reporting
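To make the first two characteristics more concrete, the following is a minimal sketch of a multidimensional view with several aggregation levels. The fact records, the dimension names and the roll_up helper are all hypothetical and chosen only for illustration; real warehouses would normally rely on an OLAP engine or SQL rather than plain Python.

```python
from collections import defaultdict

# Hypothetical fact records: (region, product, year, sales). Illustrative data only.
FACTS = [
    ("north", "motor", 2023, 10.0),
    ("north", "home", 2023, 5.0),
    ("south", "motor", 2023, 7.0),
    ("south", "motor", 2024, 3.0),
]
DIMENSIONS = ("region", "product", "year")


def roll_up(facts, keep):
    """Sum sales over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # the last field is the sales measure
    return dict(totals)


# The same facts viewed at three different aggregation levels.
by_region = roll_up(FACTS, ("region",))
by_product_year = roll_up(FACTS, ("product", "year"))
grand_total = roll_up(FACTS, ())

print(by_region)
print(by_product_year)
print(grand_total)
```

Choosing which dimensions to keep is what moves the view between detailed and summarized levels; the underlying facts are never modified.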
Benefits of Data Warehouses
1. High return on investment (ROI)
2. More cost-effective decision-making
3. Competitive advantage
4. Better enterprise intelligence
5. Increased productivity of corporate decision-makers
6. Enhanced customer service
7. Business and information re-engineering
Limitations of Data Warehouses
1. It is query intensive
2. Data warehouses themselves tend to be very large, possibly on the order of terabytes; as a result, performance tuning is hard
3. Scalability can be a problem
4. Hidden problems with various sources
5. Increased end user demands
6. High demand for resources
7. High maintenance
8. Complexity of integration
Main components of Data Warehouses
The following three components are supported by a data warehouse:
1. Data Acquisition
2. Data Storage
3. Data Access
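The three components listed above can be sketched end to end as follows. This is only an illustrative outline using Python's built-in sqlite3 module as a stand-in for a warehouse DBMS; the two source systems, their date formats, the premiums table and the function names acquire, store and access are all assumptions made for the example.

```python
import sqlite3

# Hypothetical operational records from two source systems (illustrative data only).
BRANCH_A = [("2024-01-05", "motor", "1200.50"), ("2024-01-06", "home", "830.00")]
BRANCH_B = [("05/01/2024", "MOTOR", "400"), ("07/01/2024", "life", "150.25")]


def acquire():
    """Data acquisition: extract rows from each source and convert them to one format."""
    rows = []
    for date, product, amount in BRANCH_A:
        rows.append(("A", date, product.lower(), float(amount)))
    for date, product, amount in BRANCH_B:
        day, month, year = date.split("/")  # source B is assumed to use DD/MM/YYYY
        rows.append(("B", f"{year}-{month}-{day}", product.lower(), float(amount)))
    return rows


def store(conn, rows):
    """Data storage: load the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS premiums ("
        "branch TEXT, sale_date TEXT, product TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO premiums VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def access(conn):
    """Data access: answer an analytical query over the integrated data."""
    cur = conn.execute(
        "SELECT product, SUM(amount) FROM premiums GROUP BY product ORDER BY product"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
store(conn, acquire())
totals = access(conn)
print(totals)
```

Keeping the three stages as separate functions mirrors the separation of components: the sources can change their formats without affecting how stored data is queried.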
Data Warehouse Architecture
A typical data warehouse architecture contains:
1. Operational and external data sources
2. Data Warehouse DBMS
3. Repository System
4. Data Marts
5. Application Tools
6. Management Platform
7. Information delivery system
Structure of a Data Warehouse
1. Physical Data warehouse
2. Logical Data warehouse
3. Data marts
Physical Data warehouse – The physical database in which all the data for the data warehouse are stored, along with metadata and processing logic for scrubbing, organizing, packaging and processing the detail data.
Logical Data warehouse – The logical data warehouse contains metadata, including enterprise rules and processing logic for scrubbing, organizing, packaging and processing the detail data, but does not contain actual data. Instead, it contains the information necessary to access the data wherever they reside.
Data Mart – A data mart is a subset of an enterprise-wide data warehouse which typically supports an enterprise element. As a part of an iterative data warehouse development process, an enterprise builds a series of physical or logical data marts over time and links them via an enterprise-wide logical data warehouse or feeds them from a single physical warehouse.
Data Warehousing Process Overview
The following are the major components of the data warehousing process:
1. Data Sources
2. Data extraction and transformation
3. Data loading
4. Comprehensive database
5. Metadata
6. Middleware tools
Data Warehousing Architectures
1. A Three-Tier Data Warehouse
2. A Two-Tier Data Warehouse
3. Web-based Data Warehouse
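The distinction drawn earlier between physical and logical data marts can be approximated in a small sketch. Here a database view stands in for a logical data mart (it holds only a definition of how to reach the data) and a copied table stands in for a physical data mart fed from the warehouse. The table names, columns and rows are hypothetical, and sqlite3 is used only as a convenient stand-in for a warehouse DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical enterprise-wide warehouse table (illustrative rows only).
conn.execute(
    "CREATE TABLE warehouse_sales ("
    "region TEXT, department TEXT, sale_year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("north", "retail", 2023, 100.0),
        ("north", "wholesale", 2023, 250.0),
        ("south", "retail", 2024, 75.0),
        ("south", "retail", 2023, 50.0),
    ],
)

# Logical data mart (assumption: modelled as a view). It stores no rows of its
# own, only the rule for reaching the retail subset of the warehouse.
conn.execute(
    "CREATE VIEW retail_mart_logical AS "
    "SELECT region, sale_year, amount FROM warehouse_sales "
    "WHERE department = 'retail'"
)

# Physical data mart (assumption: modelled as a copied table). It holds its own
# copy of the retail subset, loaded from the warehouse at this point in time.
conn.execute(
    "CREATE TABLE retail_mart_physical AS "
    "SELECT region, sale_year, amount FROM warehouse_sales "
    "WHERE department = 'retail'"
)

# A new row arrives in the warehouse after the physical mart was loaded.
conn.execute("INSERT INTO warehouse_sales VALUES ('north', 'retail', 2024, 20.0)")

logical_total = conn.execute("SELECT SUM(amount) FROM retail_mart_logical").fetchone()[0]
physical_total = conn.execute("SELECT SUM(amount) FROM retail_mart_physical").fetchone()[0]
print(logical_total, physical_total)
```

The view reflects the new warehouse row immediately, while the copied table stays as it was until it is fed again; this is the trade-off between linking marts logically and loading them physically.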
Representation of data in Data warehouse
Many variations of data warehouse architecture are possible. Whatever the architecture, the design of data representation in the data warehouse has always been based on the concept of dimensional modelling. Dimensional modelling is a retrieval-based system that supports high-volume query access. The representation and storage of data in a data warehouse should be designed in a way that not only accommodates but also speeds up the processing of complex multidimensional queries. The means by which dimensional modelling is implemented in data warehouses are:
1. Star Schema