maintained separately from the organization’s operational databases.

A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making.

- Companies, over the years, gathered huge volumes of data
- Data not suitable for ready usage: an information crisis
- Technology changes
- Inability of past DSS to provide information
- How to take strategic decisions?
- Can we analyze this data to get any competitive advantage?

Need for Strategic Decision Making
- Strategic information: information required to make strategic decisions
- Managers and executives use it to improve the business, grow market share, retain customers, etc.
- Operational data is good for day-to-day activities but not for strategic decision making

Characteristics of Strategic Information
- Integrated: must have a single, enterprise-wide view
- Data integrity: information must be accurate and must conform to business rules
- Accessible: easily accessible with intuitive access paths, and responsive for analysis
- Credible: every business factor must have one and only one value
- Timely: information must be available within the stipulated time frame

Operational Vs Decision Support Systems (DSS)

                    Operational (OLTP)            Informational (OLAP)
  Data Content      Current data                  Archived, derived, summarized
  Data Structure    Optimized for transactions    Optimized for complex queries
  Access Frequency  High                          Medium to low
  Access Type       Read, update, delete          Read
  Usage             Predictable, repetitive       Ad-hoc, random, heuristic
  Response Time     Sub-seconds                   Several seconds to minutes
  Users             Large number                  Relatively small number

- A data warehouse is designed around “subjects” rather than processes
- A company may have:
  - Retail Sales System
  - Outlet Sales System
  - Catalog Sales System
- The DW will have a single Sales subject area

- Heterogeneous source systems
- Little or no control over them
- Need to integrate source data
- Remove inconsistencies
  - For example, product codes could be different in different systems
  - Arrive at a common code in the DW
- Standardize various data elements:
  - Naming conventions
  - Codes
  - Data attributes
  - Measurements

- Most business analysis has a time component
- Trend analysis (historical data is required)

- Extract, Transform, Load (ETL) tools
- DW databases & DBMS tools
- Data marts
- Metadata
- DW administration & management tools
- Information delivery system

- Data extraction
- Data cleaning
- Data transformation: convert from legacy/host format to warehouse format
- Load: sort, summarize, consolidate, compute views, check integrity, build indexes, partition
- ETL consumes 70-80% of project time
- Challenges:
  - Heterogeneous source systems
  - Little or no control over source systems
  - Source systems scattered
  - Different currencies, measurement units
  - Ensuring data quality

- The staging area is a storage area where extracted data is cleaned, transformed and deduplicated
- Initial storage for data
- Need not be based on the relational model
- Mainly sorting and sequential processing
- Does not provide data access to users

- Commercial ETL tools: Oracle Warehouse Builder, Microsoft Data Transformation Services (DTS), DataStage, SAS ETL Server
- Typical functions: define source, query (run SQL), define transformation, define target, verify transformation, schedule run, audit report

- The DW database is almost always a relational DB: Oracle, DB2, Sybase, SQL Server
- New DB designs serve the special purposes of a DW (e.g., scale up, speed up, parallel processing)

- OLTP systems are data capture systems: “DATA IN” systems
- DWs are “DATA OUT” systems

- The design of the DW must directly reflect the way managers look at the business
- It should capture the measurements of importance along with the parameters by which these measurements are viewed
- It must facilitate data analysis, i.e., answering business questions

- ER modeling is a logical design technique that seeks to eliminate data redundancy
- Illuminates the microscopic relationships among data elements
- Perfect for OLTP systems
- Responsible for the success of transaction processing in relational databases

ER models are NOT suitable for DW?
- End users cannot understand or remember an ER model
- Many DWs have failed because of overly complex ER designs
- Not optimized for complex, ad-hoc queries
- Data retrieval becomes difficult due to normalization
- Browsing becomes difficult

- Facts are stored in FACT tables
- Dimensions are stored in DIMENSION tables
- Dimension tables contain textual descriptors of the business
- Fact and dimension tables form a star schema
- A “BIG” fact table in the center is surrounded by “SMALL” dimension tables

- Measures or facts
  - Facts are “numeric” & “additive”
  - For example: Sale Amount, Sale Units
- Factors or dimensions
- Star schemas
- Snowflake schemas & fact constellations

- Data mart = subset of the DW for a community of users, e.g. the accounting department
- Sometimes exists as a multidimensional database
- Info mart = summarized data + reports for a community of users
- Top down: data pulled from the DW
- Bottom up: data pushed to the DW

Data Warehouse Vs Data Mart

  Data Warehouse                     Data Mart
  Corporate / enterprise wide        Departmental
  Union of all data marts            A single business process
  Data received from staging area    Star-join (fact & dimensions)
  Organized on E-R model             Uses structure to suit departmental view

- Metadata is data about data
- Helps users understand content & locate data
- Types:
  1. Operational metadata: field, field length, data types, structure
  2. Extraction and transformation metadata: extraction frequencies, methods, business rules
  3. End user metadata: navigational map for the DW

- Security & priority
- Keep track of updates
- Quality control (QC)
- Purging & copying to data marts

- Security is a critical issue (users at many levels)
- Some security measures to protect a DW:
  - Views = limit users to seeing certain rows/columns
  - Access control = grant rights to specific users to access selected data (can be set up by the DBA through SQL commands such as GRANT/REVOKE)
  - Admin controls such as group access, firewall, encryption
  - Audit = track what users are doing

- Information delivery tools:
  - Query & reporting
  - OLAP
  - Data mining, visualization, segmentation, clustering
- New developments: text mining, web mining & personalization, mining multimedia data
- Commercial tools: Crystal Reports, Impromptu, WebFocus
- Increasingly common mode of delivery: web-enabled
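
Worked example (illustrative only)

The integration step (arriving at a common product code) and the star schema described earlier can be made concrete with a small sketch. It is not a definitive design: it assumes Python with the standard-library sqlite3 module and an in-memory database, and every name and value in it (RETAIL_SALES, CATALOG_SALES, CODE_MAP, dim_product, dim_channel, fact_sales, the sample rows) is hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical extracts from two source systems; each uses its own product codes.
# Row layout: (source_product_code, sale_date, sale_units, sale_amount)
RETAIL_SALES = [("R-100", "2024-01-05", 2, 40.0), ("R-200", "2024-01-06", 1, 15.0)]
CATALOG_SALES = [("CAT_A", "2024-01-05", 3, 60.0)]

# Assumed transformation rule: map every source code to one common warehouse code.
CODE_MAP = {"R-100": "P1", "CAT_A": "P1", "R-200": "P2"}
PRODUCT_NAMES = {"P1": "Widget", "P2": "Gadget"}


def build_warehouse():
    """Create a tiny star schema and load it from the two source extracts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Small dimension tables hold the textual descriptors.
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,
            product_code TEXT UNIQUE,
            product_name TEXT
        );
        CREATE TABLE dim_channel (
            channel_key  INTEGER PRIMARY KEY,
            channel_name TEXT UNIQUE
        );
        -- The central fact table holds numeric, additive measures.
        -- sale_date is kept inline for brevity; a fuller design would
        -- normally reference a separate date dimension.
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            channel_key INTEGER REFERENCES dim_channel(channel_key),
            sale_date   TEXT,
            sale_units  INTEGER,
            sale_amount REAL
        );
    """)
    for code, name in PRODUCT_NAMES.items():
        conn.execute(
            "INSERT INTO dim_product (product_code, product_name) VALUES (?, ?)",
            (code, name),
        )
    for channel, rows in (("Retail", RETAIL_SALES), ("Catalog", CATALOG_SALES)):
        cur = conn.execute(
            "INSERT INTO dim_channel (channel_name) VALUES (?)", (channel,)
        )
        channel_key = cur.lastrowid
        for source_code, sale_date, units, amount in rows:
            # Transform: replace the source-specific code with the common one.
            product_key = conn.execute(
                "SELECT product_key FROM dim_product WHERE product_code = ?",
                (CODE_MAP[source_code],),
            ).fetchone()[0]
            conn.execute(
                "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                (product_key, channel_key, sale_date, units, amount),
            )
    conn.commit()
    return conn


def sales_by_product(conn):
    """Star-join: aggregate the additive facts by a dimension attribute."""
    return conn.execute("""
        SELECT p.product_name, SUM(f.sale_units), SUM(f.sale_amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.product_name
        ORDER BY p.product_name
    """).fetchall()


if __name__ == "__main__":
    warehouse = build_warehouse()
    for row in sales_by_product(warehouse):
        print(row)
```

Because the retail code R-100 and the catalog code CAT_A both map to the common code P1, their sales roll up under a single product in the query result, which is the single enterprise-wide view that separate source systems cannot provide on their own.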