
Bachelor of Computer Application (BCA) Semester 6 BC0058 Data Warehousing

1. With a necessary diagram, explain the Data Warehouse Development Life Cycle.
Ans: The data warehouse life cycle covers two vital areas. The first is warehouse management and the second is data management. The former deals with defining the project activities and gathering requirements. The life cycle of data warehouse development is shown below.

[Diagram: Life Cycle Steps of a Data Warehouse]
Define the Project → Gather Requirements → Model the Warehouse → Validate the Model → Design the Warehouse → Validate the Design → Implementation


Managing the data warehouse project is an ongoing activity; it is not like a traditional system project. The data warehouse is concerned with the execution of the warehousing process and with the data, whereas transaction processing systems focus on automating the process and making it faster and more efficient.

2. What is Metadata? What is its use in Data Warehouse Architecture?


Ans: Metadata in a data warehouse is similar to the data dictionary or the data catalog in a database management system. In the data dictionary, you keep information about the logical data structures, about the files and addresses, about the indexes, and so on. The data dictionary contains data about the data in the database. The logical metadata repository can be centralized or distributed depending on the business needs and organizational requirements. It would most likely be a group of data stores consisting of objects and relational data. The active metadata manager would be the core component of the metadata architecture and would ideally consist of the following components:

Meta Data Capture: initial capture of metadata from a variety of sources.
Meta Data Synchronizer: processes to keep metadata up to date.
Meta Data Search Engine: the front end for users to search and access metadata.
Meta Data Results Manager: processes the results of the metadata search and allows the user to make an appropriate selection.
Meta Data Alerter: as part of a push technology, notifies subscribers about any new changes to the metadata contents, depending on the user profile.
Meta Data Query Trigger: triggers an appropriate query tool to get data from a data warehouse or any other source, based on the selection made by the user in the Meta Data Results Manager.
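To make these components concrete, here is a minimal, illustrative sketch of a metadata repository in Python; the class names, fields, and methods are assumptions made for this example only, not part of any standard tool:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class MetadataEntry:
    """One piece of metadata captured from a source system (illustrative only)."""
    name: str            # e.g. a table or column name
    source: str          # originating system
    description: str
    last_updated: datetime = field(default_factory=datetime.now)

class MetadataRepository:
    """Toy metadata store mirroring the capture, synchronize, and search components."""
    def __init__(self):
        self._entries = []

    def capture(self, entry: MetadataEntry):
        # Meta Data Capture: initial capture from a source
        self._entries.append(entry)

    def synchronize(self, name: str, **updates):
        # Meta Data Synchronizer: keep existing entries up to date
        for entry in self._entries:
            if entry.name == name:
                for key, value in updates.items():
                    setattr(entry, key, value)
                entry.last_updated = datetime.now()

    def search(self, keyword: str):
        # Meta Data Search Engine: simple keyword lookup for end users
        keyword = keyword.lower()
        return [e for e in self._entries
                if keyword in e.name.lower() or keyword in e.description.lower()]

repo = MetadataRepository()
repo.capture(MetadataEntry("customer_dim", "CRM", "Customer dimension table"))
print([e.name for e in repo.search("customer")])   # ['customer_dim']
```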

3. Write briefly about any four ETL tools. What is transformation? Briefly explain the basic transformation types.
Ans: Although ETL processes can be created using almost any programming language, creating them from scratch is quite complex, so companies buy ETL tools to help in the creation of ETL processes. A good ETL tool must be able to communicate with the many different relational databases and read the various file formats used throughout an organization. ETL tools have started to migrate into enterprise application integration, or even enterprise service bus, systems that now cover much more than just the extraction, transformation, and loading of data. Many ETL vendors now have data profiling, data quality, and metadata capabilities. Some ETL tools are:
PL/SQL
SAS Data Integrator / SAS Integration Studio
Ascential DataStage
Cognos DecisionStream
Microsoft DTS
Business Objects Data Integrator

Transformation: Data transformations are often the most complex and, in terms of processing time, the most costly part of the ETL process. They can range from simple data conversions to extremely complex data scrubbing techniques. The most common transformation types are:
Format revisions: You will come across these quite often. These revisions include changes to the data types and lengths of individual fields. In your source systems, product package types may be indicated by codes and names in which the fields are numeric and text data types, and the lengths of the package types may vary among the different source systems.
Decoding of fields: A common type of data transformation. When you deal with multiple source systems, you are bound to have the same data items described by a plethora of field values. For example, the coding for gender, with one source system using 1 and 2 for male and female and another system using M and F.
Calculated and derived values: The extracted data from the sales system contains sales amounts, sales units, and operating cost estimates by product. You will have to calculate the total cost and the profit margin before the data can be stored in the data warehouse.

Splitting of a single field: Earlier legacy systems stored the names and addresses of customers and employees in large text fields; the first name, middle initials, and last name were stored together as a single large text field.
Merging of information: This type of data transformation does not literally mean the merging of several fields to create a single field of data.
Character set conversion: This type of transformation relates to the conversion of character sets to an agreed standard character set for textual data in the data warehouse. If you have mainframe legacy systems as source systems, the source data from these systems will be in EBCDIC characters.
Conversion of units of measurement.
Date/time conversion.
Summarization.
De-duplication.
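The following is a small, hedged sketch of three of these transformation types in Python (decoding of fields, calculated/derived values, and splitting of a single field); the code mappings, field names, and figures are invented purely for illustration:

```python
# Decoding of fields: map source-specific gender codes to one warehouse standard
GENDER_MAP = {"1": "M", "2": "F", "M": "M", "F": "F"}

def decode_gender(code):
    return GENDER_MAP.get(str(code).strip().upper(), "U")  # "U" = unknown

# Calculated and derived values: total cost and profit margin from sales fields
def derive_profit_margin(sales_amount, units, unit_cost):
    total_cost = units * unit_cost
    return (sales_amount - total_cost) / sales_amount if sales_amount else 0.0

# Splitting of a single field: break a legacy full-name text field apart
def split_name(full_name):
    parts = full_name.split()
    first, last = parts[0], parts[-1]
    middle = " ".join(parts[1:-1])
    return first, middle, last

print(decode_gender("1"))                            # M
print(round(derive_profit_margin(1000, 40, 15), 2))  # 0.4
print(split_name("John Q Public"))                   # ('John', 'Q', 'Public')
```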

4. What are ROLAP, MOLAP and HOLAP? What is multidimensional analysis? How do we achieve it?
Ans: ROLAP (Relational Online Analytical Processing): These are the intermediate servers that stand in between a relational back-end server and client front-end tools. They use a relational or extended-relational DBMS to store and manage warehouse data, and OLAP middleware to support missing pieces. ROLAP servers include optimization for each DBMS back end, implementation of aggregation navigation logic, and additional tools and services. ROLAP technology tends to have greater scalability than MOLAP technology.
MOLAP (Multidimensional Online Analytical Processing): These servers support multidimensional views of data through array-based multidimensional storage engines. They map multidimensional views directly to data cube array structures. The advantage of using a data cube is that it allows fast indexing to precomputed summarized data. Notice that with multidimensional data stores, the storage utilization may be low if the data set is sparse.
HOLAP (Hybrid Online Analytical Processing): The hybrid OLAP approach combines ROLAP and MOLAP technology, benefiting from the greater scalability of ROLAP and the faster computation of MOLAP. A HOLAP server may allow large volumes of detail data to be stored in a relational database, while aggregations are kept in a separate MOLAP store.
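The difference between the two storage strategies can be illustrated with a small sketch: a ROLAP-style aggregation computed on demand with SQL, versus a MOLAP-style lookup against precomputed summaries. SQLite is used here only as a stand-in relational store, and the table and column names are made up:

```python
import sqlite3

# ROLAP-style: aggregate on demand against a relational store
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("East", 2023, 100.0), ("East", 2023, 50.0), ("West", 2023, 70.0)])
rolap_result = conn.execute(
    "SELECT region, year, SUM(amount) FROM sales GROUP BY region, year").fetchall()
print(rolap_result)   # e.g. [('East', 2023, 150.0), ('West', 2023, 70.0)]

# MOLAP-style: the same totals precomputed and keyed by dimension values,
# so lookups hit summarized data directly instead of re-aggregating
molap_cube = {("East", 2023): 150.0, ("West", 2023): 70.0}
print(molap_cube[("East", 2023)])   # 150.0
```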

Multidimensional analysis: It is a data analysis process that groups data into two or more categories: data dimensions and measurements. For example, a data set consisting of the number of wins for a single football team in each of several years is a single-dimensional (in this case, longitudinal) data set. A data set consisting of the number of wins for several football teams in a single year is also a single-dimensional (in this case, cross-sectional) data set. A data set consisting of the number of wins for several football teams over several years is a two-dimensional data set. Two-dimensional data sets are also called panel data. While, strictly speaking, two- and higher-dimensional data sets are multidimensional, the term multidimensional tends to be applied only to data sets with three or more dimensions.
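The football example can be made concrete with a tiny two-dimensional (panel) data set; this sketch assumes pandas is available and uses invented win counts:

```python
import pandas as pd

# Fact-like data: one measurement (wins) with two dimensions (team, year)
data = pd.DataFrame({
    "team": ["A", "A", "B", "B", "C", "C"],
    "year": [2022, 2023, 2022, 2023, 2022, 2023],
    "wins": [10, 12, 8, 9, 11, 7],
})

# Pivoting turns the flat records into the teams-by-years panel described above
panel = data.pivot(index="team", columns="year", values="wins")
print(panel)   # prints a teams-by-years table of win counts
```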

5. Explain the testing process for a Data Warehouse with a necessary diagram.
Ans: The testing process for a data warehouse is as follows.
Requirement Testing: The main aim of requirement testing is to check the stated requirements for completeness. The requirements are mostly around reporting, hence it becomes more important to verify whether these reporting requirements can be met using the data available. Successful requirements are those structured closely to business rules and addressing functionality and performance. These business rules and requirements provide a solid foundation to the data architects. Using the defined requirements and business rules, a high-level design of the data model is created.
Unit Testing: Unit testing for a data warehouse is white-box testing. It should check the ETL procedures and the reports developed, and the developers usually do this. It verifies whether the ETL is accessing and picking up the right data from the right source, that all data transformations are correct according to the business rules, and that the data warehouse is correctly populated with the transformed data. It also covers testing the rejected records that don't fulfil transformation rules, checking the source system connectivity, extracting the right data, and checking security permissions.
Regression Testing.
Integration Testing: System testing only includes testing within the ETL application; the endpoints for system testing are the input and output of the ETL code being tested. Integration testing shows how the application fits into the overall flow of all upstream and downstream applications. When creating integration test scenarios, consider how the overall process can break, and focus on touch points between applications rather than within one application. Integration testing will involve the following: sequence of ETL jobs in a batch; dependency and sequencing; job re-start ability; initial loading of records at a later date to verify the newly inserted or updated data; testing the rejected records that don't fulfil transformation rules; and error log generation.
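As an illustration of the unit-level checks described above, the sketch below tests one made-up transformation rule (rows with a negative sales amount are rejected); the rule, field names, and function are assumptions, not taken from any particular ETL tool:

```python
# Hypothetical transformation rule: rows with a negative sales amount are rejected
def transform(rows):
    accepted, rejected = [], []
    for row in rows:
        if row["sales_amount"] >= 0:
            accepted.append({**row, "sales_amount": round(row["sales_amount"], 2)})
        else:
            rejected.append(row)
    return accepted, rejected

def test_rejected_records_do_not_reach_the_warehouse():
    rows = [{"id": 1, "sales_amount": 100.456},
            {"id": 2, "sales_amount": -5.0}]
    accepted, rejected = transform(rows)
    assert [r["id"] for r in accepted] == [1]    # valid row is kept and transformed
    assert [r["id"] for r in rejected] == [2]    # invalid row lands in the reject set
    assert accepted[0]["sales_amount"] == 100.46

test_rejected_records_do_not_reach_the_warehouse()
```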

[Diagram: Process of Data Warehouse Testing]
Requirements Testing: QA team reviews the business requirements document (BRD) and high-level design for completeness; QA team builds the test plan.
Test Case Preparation: review of the HLD; develop test cases and SQL queries.
Test Execution: unit testing, functional testing, regression testing, performance testing.
User Acceptance Testing (UAT).


User Acceptance Testing: The main reason for building a data warehouse application is to make data available to business users. Users know the data best, and their participation in the testing effort is a key component of the success of a data warehouse implementation. User acceptance testing typically focuses on the data loaded into the data warehouse and any views that have been created on top of the tables, not on the mechanics of how the ETL application works. Use data that is either from production or as near to production data as possible. Test database views by comparing view contents to what is expected; it is important that users sign off and clearly understand how the views are created. Plan for the system test team to support users during UAT; the users will likely have questions about how the data is populated and need to understand the details of how the ETL works. Consider how the users would require the data loaded during UAT and negotiate how often the data will be refreshed.

6. What is testing? Differentiate between Data Warehouse testing and traditional software testing.
Ans: Testing for a data warehouse is quite different from testing the development of OLTP systems. The main areas of testing for OLTP include testing user input for valid data types, edge values, etc. Testing for a data warehouse, on the other hand, cannot and should not duplicate all of the error checks done in the source system. Even though there are some data quality improvements, such as making sure postal codes are associated with the correct city and state, that are practical to do, data warehouse implementations must pretty much take in what the OLTP system has produced. Testing for a data warehouse falls into three general categories: testing the ETL, testing that reports and other artifacts in the data warehouse provide correct answers, and lastly testing that the performance of all the data warehouse components is acceptable.

Here are some main areas of testing that should be done for the ETL process:

Making sure that all the records in the source system that should be brought into the data warehouse actually are extracted into the data warehouse: no more, no less.
Making sure that all of the components of the ETL process complete successfully.
All of the extracted source data is correctly transformed into dimension tables and fact tables.
All of the extracted and transformed data is successfully loaded into the data warehouse.
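A minimal sketch of the first check, reconciling record counts between the source system and the warehouse, might look like the following; the cursors follow the standard Python DB-API, and the table names are placeholders chosen for illustration:

```python
def reconcile_row_counts(source_cursor, warehouse_cursor,
                         source_table="orders", fact_table="fact_orders"):
    """Compare extracted vs. loaded row counts: no more, no less (illustrative)."""
    source_cursor.execute(f"SELECT COUNT(*) FROM {source_table}")
    source_count = source_cursor.fetchone()[0]

    warehouse_cursor.execute(f"SELECT COUNT(*) FROM {fact_table}")
    warehouse_count = warehouse_cursor.fetchone()[0]

    if source_count != warehouse_count:
        raise AssertionError(
            f"Row count mismatch: source={source_count}, warehouse={warehouse_count}")
    return source_count
```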
