
Course: BIS3214 Data Warehousing and Business Intelligence
Module leader: Joanna Loveday
Students:

Student Name Bethlehem Kibret (Group leader) Kenechukwu Ijeh Gideon Akintomiwa Ogunleye Ajibola Bolorunduro Adekunle Sheila Ouma

Student Number M00310028 M00428671 M00426634 M00428993 M00416330

Introduction

Data warehouse
A data warehouse is a system designed to support management decision making. It holds a wide selection of data that presents a coherent picture of the business. Developing a data warehouse involves building systems to extract data from operational systems and installing a warehouse database system that gives managers flexible access to the data. A data warehouse has four defining characteristics:
Subject-oriented: A data warehouse can be used to analyse a specific subject area in a company. For example, "finance" can be a subject area.
Integrated: A data warehouse integrates data from numerous data sources. For example, source X and source Y may have completely different ways of identifying a particular customer, but a data warehouse will have only one way of identifying the customer.
Time-variant: Historical data is kept in a data warehouse. For example, data that was stored a year ago can still be retrieved from the warehouse.
Non-volatile: Once data has been stored in a data warehouse, it is not altered or modified. For example, historical data is never changed in the warehouse.
A data warehouse is a copy of transaction data specifically structured for query and analysis.

Advantages and disadvantages of data warehousing
One advantage is that a data warehouse stores information in a computerised system, so the organisation that owns the information can analyse it. Finding historical patterns or connections in the data supports very important business decisions. Users can access very large amounts of information, which can be used to solve many problems and to increase the organisation's profits. Data in the data warehouse is also consistent, because it is well organised and appropriate.

With a data warehouse, data from all locations can be brought together and stored in one place that is easy for users to access. Because data is taken from many different sources and combined in one centralised location, a company can analyse it as a whole and reach solutions that would not emerge if the data were looked at separately. Another advantage of data warehouses is that a structure can be created to allow changes to be carried out within the stored data and then transferred back to the operational system.

There are also disadvantages. Before data can be stored in a data warehouse it needs to be extracted, cleaned and loaded, which takes a long time. Compatibility can also be a big problem. For example, a newly developed transaction system might not work with a system that is already in place, and it might be costly to train users to use the new system. There will also be security issues if the data warehouse allows users to access the system via the internet.

Data warehouses are important to companies nowadays. Companies that use them need detailed information about the transactions that the warehouse collects. When a company is able to analyse detailed information about its own operations, it can make strategic decisions more easily.

Data mining
Data mining is the computer-based process of going through and analysing huge amounts of data to extract meaningful information. Data mining tools forecast behaviours and future trends, allowing businesses to make proactive decisions. They can help answer business questions that would otherwise take a long time to resolve. They scour databases for hidden patterns, finding predictive information that specialists may miss because it lies outside their expectations.

Data mining consists of five major elements:

Extract, transform, and load transaction data onto the data warehouse system.
Store and manage the data in a multidimensional database system.
Provide data access to business analysts and information technology professionals.
Analyse the data by application software.
Present the data in a useful format, such as a graph or table.

Advantages and disadvantages of data mining
Marketing: Data mining helps to build models based on historical data to predict who will respond to new marketing campaigns such as online marketing and mailings. This prediction helps a company to sell profitably to a targeted group of customers while providing the best service and high customer satisfaction. Data mining also benefits most retail companies: through market analysis, a store can arrange its layout so that customers can shop conveniently and leave satisfied. It also helps retail companies to offer discounts on particular products that will attract more customers.
Finance: Data mining gives financial companies information on loans and credit reporting. By developing a model from previous customers' data, a bank or financial institution can estimate which loans are good or bad and what their risk level is. In addition, data mining can help banks to detect fraudulent credit card transactions and so protect card owners from losses.
Governments: Data mining helps government departments to analyse records of financial transactions and to develop patterns that can identify financial crime such as online fraud.
Privacy issues: People are afraid of their private information being collected and used in an unethical way, which can harm any business sector. For example, customers' details might be collected and the business might later fail or be taken over by another company. This is when privacy issues arise, because personal customer information might be sold to other companies or leaked.
Security issues: Security is another major issue, because most businesses store details of their employees and customers in the data warehouse, and it can be hacked and important information stolen.
Information misuse: Information gathered through data mining, for marketing purposes for example, can easily be misused. Unethical people can use this data to take advantage of vulnerable people or to discriminate against a group of people.

1. The business process of the company (5 marks)
1.1. Company name and type of business
British Airways (travel industry)
British Airways (BA) is an airline based in the United Kingdom, with its headquarters at Waterside, near its main hub at London Heathrow Airport. British Airways is the largest airline in the United Kingdom by international flights and destinations. It was also the largest passenger carrier until 2008, when it was overtaken by its rival easyJet. The British Airways Board was established in 1971 to control the two nationalised airline corporations, BOAC and BEA, and two smaller regional airlines, Cambrian Airways, from Cardiff, and Northeast Airlines, from Newcastle upon Tyne. British Airways is one of the world's premier airlines and a founding member of the oneworld alliance. The airline reaches 190 destinations in 90 countries from its London Heathrow hub.
Operations
British Airways is the largest airline based in the United Kingdom in terms of fleet size, international flights and international destinations and was, until 2008, the largest airline by passenger numbers as well. The airline carried 34.6 million passengers in 2008, but easyJet, a rival low-cost carrier, carried 44.5 million passengers that year, passing British Airways for the first time. British Airways holds a United Kingdom Civil Aviation Authority Type A Operating Licence, which permits it to carry passengers, cargo and mail on aircraft with 20 or more seats.
Travel classes
British Airways offers up to four cabin classes: First, Club World (Business Class), World Traveller Plus (Premium Economy) and World Traveller (Economy Class).

1.2. Company objective, vision and scope
Objectives
British Airways' main objective is to focus on customer service at every level of a passenger's journey. The main objective is divided into three parts: Global, meaning appealing to all passengers, whether for leisure or business travel, in order to create repeat customers; Premium, meaning ensuring that all passengers receive the highest quality of service; and Airline, meaning maintaining the focus on aviation with the latest equipment, products and services.
The strategic objective of British Airways provides five strategic goals:
Airline of Choice: remain the top choice for international flights for premium customers as well as cargo, economy and shorter flights.
Customers
Partners
Colleagues
Performance Excellence

Vision
The vision of British Airways is to become the world's most responsible airline, and its activities are centred on three key areas:
Workplace: making sure sustainable employment is offered to current employees.
Marketplace: building a more sustainable business by working with suppliers and customers.
Environment: reducing the airline's environmental impact in terms of climate change, air quality, noise and waste.
Scope
British Airways' scope is to provide the best possible service to its customers, as it is one of the world's most recognised airlines. The airline provides services to more than 170 destinations throughout Europe, North America, South America, Asia, Africa and Australia. For this project, the scope is as follows:
The data warehouse will help the company to carry out fast billing.
It will help to maintain a comprehensive database of all customers.
It will enable queries to be created quickly and easily.
It will be easy to maintain in future.

1.3. A short description of the existing system
Analysis of present system
It is important to study the system that will be improved or replaced, if the company has one in place. We analysed how this system uses hardware, software, the network and other resources to turn data, such as transaction data, into information, such as reports. We therefore documented how the information system activities are carried out: the input, processing, output, storage and control of the entire system.
Problems of the existing system
User friendliness: The existing system is not user friendly, as retrieving and storing data is slow and data is not maintained professionally.
Modifying data: It is difficult to modify and improve data, because huge amounts of data cannot be managed effectively and efficiently.
Paperwork: The existing system involves a lot of paperwork; even the smallest transaction requires many forms to be filled in. In addition, an incident such as a fire could destroy all of the company's records, leading to data loss.
Strategic competitive advantage: The existing system does not give the company any strategic competitive advantage.
Manual operator control: Many errors occur because the company relies on manual operator control.
Generating reports: The existing system either does not generate reports or takes a long time to create them.
Data sharing: Data cannot be shared in the existing system, so people in the company cannot use the same data at the same time.
Decision making: The existing system does not support decision making, so managers make decisions on gut instinct.

1.4. Department(s)/sector(s) that you have identified to carry out the project tasks
Booking department
The booking department is where all flight details, bookings and cancellations are handled. A passenger's flight booking represents a completed sale, which is also known as a reservation or transaction.
Finance department
The finance department is where the company's money is managed. It is very important in any company, because this is where planning, organising, auditing and accounting are carried out in order to control the company's finances. The finance department also produces the company's financial statements.
Marketing department
The marketing department acts as a guide to the other departments in the company. It makes sure that the best service is provided to customers, bearing in mind that communication is very important. The marketing department has a good understanding of the current market and of customers' needs.
Sales department
The sales department attracts and retains as many customers as possible. It needs to meet customer demand, which increases sales volume in a given period. It also helps the marketing department to meet the forecast sales volume.

1.5. Why do you think a Data Warehouse is a solution for their business?
There are a number of reasons why British Airways needs a data warehouse. The most important benefit is that a data warehouse stores and presents information in a way that makes decision making easy for the company. It allows business executives to look at the company as a whole, instead of department by department, and to handle all the information they hold.
Delivers improved business intelligence: British Airways managers will no longer need to make decisions based on little information. Instead, a data warehouse and business intelligence can be used to manage business processes in departments including sales, marketing and finance.
Saves time: Decisions can be made quickly, saving a lot of company time, because users have quick access to critical information from a number of different sources, all stored in one place.
Data quality and consistency: Data from a data warehouse can be accurate because data collected from different departments is standardised, so each department produces results that are in line with the others, and the data is formatted for quality and consistency.
Provides historical intelligence: A data warehouse stores massive amounts of historical data, so users can analyse different time periods and trends, which helps in making predictions for the company. Such data usually cannot be stored in a transactional database.
Generates more revenue: Organisations that implement a data warehouse and complementary business intelligence tend to generate more revenue, and save more money, than organisations that have not.
User friendly: The system developed will be user friendly, and training will be provided to make sure all users know how to use it. Data can be shared easily, decisions made quickly and reports generated fast.

2. Data design (20 marks)
The tables below are the operational tables for British Airways. The department we shall concentrate on is the sales department. We chose the sales department because this is where most transactions are carried out.

2.1. What are the current data sources?
Database: British Airways
OPERATIONAL TABLES
Employee Table: Employee No (auto-generated), First name, Last name, Department Id, Job title, Gender, DOB, Address, City, State, Country, Email, Phone, Date Hired, Salary, Commission, References
Flight Table: Flight Id, Flight Name, Flight from, Flight to, Flight departure, Flight arrival
Passenger Table: Passenger Id (auto-generated), Passenger full name, State, City, Country, Phone, Next of Kin
Office Location Table: Office Id, Office Name, Office Address, Phone, No. of Employees
Department Table: Department Id, Department Name, Manager Id
Pilot Table: Pilot No., Pilot Name, Pilot flight route, Pilot address, Pilot Phone, Pilot Schedule, Email
Planes Table: Plane Id, Plane name, Plane code, Regular Region, Plane type, Plane size
Ticket Table: Ticket Id, Ticket Name, Ticket Type, Ticket Duration, Ticket Expiration, Country
Sales Table: Transaction Id, Ticket Id, Sales No, Purchased Price, Flight Id, Plane Id, Region Id, Return Ticket
Insurance Table: Insurance no, Insurance type, Plane Id, Region Id
Region Table: Region Id, Region Zone, Location Id
Managers Table: Manager Id, Manager name, Address, Phone, Email
Promotion Table: Promotion Id, Promotion name, Promotion type, Period, Eligibility, Validity
Agency Table: Agent Id, Agent name, Address, Phone
Time Table: Time Id, Week, Month, Year


2.2. ERD(s) for the business


[Figure: entity-relationship diagram. The relationships are drawn as one-to-many (1..1 to 1..*). Entities and keys recoverable from the diagram:]
Passenger: Pass ID (PK), Ticket ID (FK)
Ticket: Ticket ID (PK), Region ID (FK)
Promotion: Ticket ID (PK), Ticket ID (FK)
Flight: Flight ID (PK), Ticket ID (FK), Planes ID (FK)
Planes: Planes ID (PK), Flight ID (FK)
Region: Region ID (PK), Ticket ID (FK)


2.3. Conceptual, logical and physical design of data models (your design should result in a set of star and/or snowflake schemas)
A data model can take the form of diagrams that show the relationships between data. Modelling the same data at several levels helps to ensure that all processes, entities, relationships and data flows have been identified. Below are the three approaches used for data modelling:
Conceptual data model: this model identifies the highest-level relationships between different entities.
Logical data model: this shows the particular entities, attributes and relationships involved in a business function.
Physical data model: this shows how the model will be created in the database. It shows the column names, column data types, column constraints, primary keys, foreign keys and relationships between tables.
A physical data model features the following:
Specification of all tables and columns.
Foreign keys are used to identify relationships between tables.
Physical considerations may cause the physical data model to differ considerably from the logical data model.
The physical data model will differ between relational database management systems. For example, the data type for a column may be different in MySQL and SQL Server.


Conceptual
[Figure: conceptual star schema with Sales Analysis at the centre, linked to Employee, Region, Time, Promotion, Passengers, Tickets and Flights.]


Logical
[Figure: logical star schema. Each dimension joins the Sales Analysis fact table through a one-to-many (1..1 to 1..*) relationship.]
Sales Analysis: Sales ID (PK), Employee ID (FK), Region Id (FK), Promotion Id (FK), Flight Id (FK), Passenger Id (FK), Plane ID (FK), Ticket ID (FK), Return Ticket, Purchased Price, Cost, Profit
Employee: Employee ID (PK), First name, Last name, Department Id, Job title, Gender, DOB, Address
Region: Region Id (PK), Region Zone, Location Id
Promotion: Promotion Id (PK), Promotion name, Promotion type, Period, Eligibility, Validity
Time: Time ID (PK), Hourly, Daily, Week, Month, Year
Planes: Plane Id (PK), Plane name, Plane code, Regular Region, Plane type, Plane size
Passenger: Passenger Id (PK), Passenger full name, State, City, Country, Phone, Next of Kin
Flight: Flight Id, Flight Name, Flight from, Flight to, Flight departure, Flight arrival
Ticket: Ticket ID (PK), Ticket Name, Ticket type, Ticket duration

Physical
[Figure: physical star schema. Column data types as far as they can be recovered from the diagram; columns listed without a type had none legible.]
Sales Analysis: Sales ID Integer, Employee ID Integer, Region Id Integer, Promotion Id Integer, Flight Id Integer, Passenger Id Integer, Plane ID Integer, Ticket ID Integer, Return Ticket Float, Purchased Price Float, Cost Float, Profit Float
Employee: Employee ID Integer, First name Varchar(50), Last name, Department Id Integer, Job title Varchar(50), Gender Varchar(50), DOB Varchar(50), Address Varchar(50)
Region: Region Id Integer, Region Zone Varchar(50), Location Id Integer
Promotion: Promotion Id Integer, Promotion name Varchar(50), Promotion type, Period Integer, Eligibility Varchar(50), Validity Varchar(50)
Time: Time ID Integer, Hourly, Daily Integer, Week Integer, Month Integer, Year
Planes: Plane Id Integer, Plane name Varchar(50), Plane code Integer, Regular Region, Plane type, Plane size Varchar(50)
Passenger: Passenger Id Integer, Passenger full name Varchar(50), State Varchar(50), City Varchar(50), Country Varchar(50), Phone Varchar(50), Next of Kin Varchar(50)
Flight: Flight Id Integer, Flight Name Varchar(50), Flight from Varchar(50), Flight to Varchar(50), Flight departure Varchar(50), Flight arrival Varchar(20)
Ticket: Ticket ID (PK) Integer, Ticket Name Varchar(50), Ticket type Varchar(50), Ticket duration Varchar(50)

Second star schema
Conceptual
[Figure: star schema with the fact table (Dim Promotion) at the centre, linked to Dim Promotion Name, Dim Eligibility and Dim Period.]

Logical
Dim Promotion (fact table): Prom No (PK), Prom ID (FK), Period ID (FK), Eligibility ID (FK), Cost, Profit
Dim Promotion Name: Prom ID (PK), Prom name, Prom Type
Dim Eligibility: Eligibility ID, City, Country
Dim Period: Period ID, Hourly, Week, Month, Year


Physical
Dim Promotion (fact table): Prom No (PK) Integer, Prom ID (FK) Integer, Period ID (FK) Integer, Eligibility ID (FK) Integer, Cost Varchar(20), Profit Varchar(20)
Dim Promotion Name: Prom ID (PK) Integer, Prom name Varchar(20), Prom Type Varchar(20)
Dim Period: Period ID Integer, Week Integer, Month Integer, Year Integer
Dim Eligibility: Eligibility ID Integer, City Varchar(20), Country Varchar(20)

Snowflake
Sales analysis: Ticket ID (PK), Passenger ID (FK), Time (FK), Region (FK), Quantity sold, Amount sold
Time: Date, Day, Week, Month, Year
Passenger: Passenger ID, Passenger full name, State, City, Country, Phone, Next of kin
Passenger type: Passenger ID, Passenger destination
Ticket: Ticket ID, Ticket name, Ticket type, Ticket duration
Region: Region ID, Region zone, Location
Location: Location ID, Location name
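As a rough illustration of the snowflake above, the MySQL sketch below splits Location out of Region and then joins through both tables to total sales per location. The table and column names (location_dim, region_dim, sales_analysis_fact and their columns) are assumptions made for this example only; they are not the group's final schema.

```sql
-- Hypothetical snowflake fragment: Location is normalised out of Region.
CREATE TABLE location_dim (
  location_id   INT NOT NULL,
  location_name VARCHAR(50),
  PRIMARY KEY (location_id)
) ENGINE=InnoDB;

CREATE TABLE region_dim (
  region_id   INT NOT NULL,
  region_zone VARCHAR(50),
  location_id INT NOT NULL,
  PRIMARY KEY (region_id),
  FOREIGN KEY (location_id) REFERENCES location_dim (location_id)
) ENGINE=InnoDB;

-- Simplified fact table: only the Region key and the two measures are kept.
CREATE TABLE sales_analysis_fact (
  ticket_id     INT NOT NULL,
  region_id     INT NOT NULL,
  quantity_sold INT,
  amount_sold   DECIMAL(10,2),
  PRIMARY KEY (ticket_id),
  FOREIGN KEY (region_id) REFERENCES region_dim (region_id)
) ENGINE=InnoDB;

-- Total sales per location: the query walks fact -> Region -> Location.
SELECT l.location_name,
       SUM(f.quantity_sold) AS tickets_sold,
       SUM(f.amount_sold)   AS total_amount
FROM sales_analysis_fact AS f
JOIN region_dim   AS r ON r.region_id   = f.region_id
JOIN location_dim AS l ON l.location_id = r.location_id
GROUP BY l.location_name;
```

The extra join to location_dim is the trade-off of a snowflake: less repeated data in the Region dimension, at the cost of one more join per query.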


2.4. Granularity of dimension and fact tables
Granularity
Granularity is the level of detail held in a fact table, that is, the lowest level of information that will be stored in it.
Which dimensions to include
For the sales fact table we shall choose time, region, passenger and ticket. A promotion programme might take place in which passengers provide some personal information in exchange for a reward, and the airline offers lower prices on flights to certain destinations for passengers who present a promotional code when booking. This will enable the company to track the passenger dimension.
What level within each dimension to include
British Airways might want to analyse sales at an hourly level, i.e. look at tickets sold in different hours of the day. If hourly analysis is needed, the lowest granularity in the time dimension will be the hour. If daily analysis is sufficient, the day can be used as the lowest level of granularity. The choice matters because the finer the level of detail, the larger the amount of data in the fact table.
There are three types of facts:

Additive: Additive facts can be added up using all of the dimensions in the fact table.

Semi-Additive: Semi-additive facts can be added up for some of the dimensions in the fact table, but not all.

Non-Additive: Non-additive facts cannot be added up for any of the dimensions present in the fact table.


Below is an illustration:
Sales analysis: Ticket ID (PK), Passenger ID (FK), Time (FK), Region (FK), Quantity sold, Amount sold

The purpose of the table above is to record the sales amount for each ticket purchased by a passenger in each region on a daily basis. Amount sold is the fact, and it is an additive fact because it can be added up along every dimension. For example, the sum of Amount sold over all 31 days of a month gives the total sales amount for that month.
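The additive behaviour described above can be sketched in MySQL as follows. The table daily_sales_demo, its columns and the three sample rows are invented for illustration; the point is only that the same measure can be summed along the time dimension and along the region dimension.

```sql
-- Hypothetical daily sales rows; names and values are for illustration only.
CREATE TABLE daily_sales_demo (
  sale_date   DATE NOT NULL,
  region_id   INT NOT NULL,
  amount_sold DECIMAL(10,2) NOT NULL
) ENGINE=InnoDB;

INSERT INTO daily_sales_demo (sale_date, region_id, amount_sold) VALUES
  ('2013-01-01', 1, 250.00),
  ('2013-01-01', 2, 400.00),
  ('2013-01-02', 1, 150.00);

-- Additive along time: daily amounts add up to a monthly total
-- (one row for January 2013 with monthly_amount = 800.00).
SELECT YEAR(sale_date)  AS sales_year,
       MONTH(sale_date) AS sales_month,
       SUM(amount_sold) AS monthly_amount
FROM daily_sales_demo
GROUP BY YEAR(sale_date), MONTH(sale_date);

-- Additive along region: the same fact also adds up per region
-- (region 1 = 400.00, region 2 = 400.00).
SELECT region_id,
       SUM(amount_sold) AS region_amount
FROM daily_sales_demo
GROUP BY region_id;
```

A semi-additive fact, such as the number of seats still unsold, could be summed across regions in the same way but would not give a meaningful result when summed across days.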


3. Implementation & data analysis (20 marks)
British Airways MySQL code
-- Employee dimension. Name columns are text, and departmentid is an
-- integer to match the physical model.
CREATE TABLE Employee (
  employeeid   INT NOT NULL,
  firstname    VARCHAR(25) NOT NULL,
  lastname     VARCHAR(25),
  departmentid INT,
  jobtitle     VARCHAR(20),
  gender       VARCHAR(20),
  dob          VARCHAR(20),
  address      VARCHAR(30),
  city         VARCHAR(35),
  state        VARCHAR(50),
  country      VARCHAR(50),
  email        VARCHAR(75),
  phone        VARCHAR(70),
  dateHired    VARCHAR(20),
  salary       VARCHAR(40),
  commission   VARCHAR(20),
  PRIMARY KEY (employeeid)
) ENGINE=InnoDB;


-- Flight dimension. FlightName holds text, so it is a VARCHAR rather than an INT.
CREATE TABLE Flight (
  FlightID        INT NOT NULL,
  FlightName      VARCHAR(50) NOT NULL,
  Flightfrom      VARCHAR(20),
  Flightto        VARCHAR(30),
  Flightdeparture VARCHAR(20),
  Flightarrival   VARCHAR(20),
  PRIMARY KEY (FlightID)
) ENGINE=InnoDB;

-- Passenger dimension. fullname holds text, so it is a VARCHAR rather than an INT.
CREATE TABLE Passenger (
  PassengerID INT NOT NULL,
  fullname    VARCHAR(50) NOT NULL,
  State       VARCHAR(20),
  City        VARCHAR(20),
  Country     VARCHAR(20),
  Phone       VARCHAR(20),
  PRIMARY KEY (PassengerID)
) ENGINE=InnoDB;


-- Planes dimension. Planename holds text, so it is a VARCHAR rather than an INT.
CREATE TABLE Planes (
  PlaneID       INT NOT NULL,
  Planename     VARCHAR(50) NOT NULL,
  Planecode     VARCHAR(20),
  RegularRegion VARCHAR(20),
  Planetype     VARCHAR(20),
  Planesize     VARCHAR(20),
  PRIMARY KEY (PlaneID)
) ENGINE=InnoDB;

-- Region dimension. LocationId is an integer to match the physical model.
CREATE TABLE Region (
  RegionID   INT NOT NULL,
  RegionZone VARCHAR(20),
  LocationId INT,
  PRIMARY KEY (RegionID)
) ENGINE=InnoDB;


-- Promotion dimension. Promotionname holds text, so it is a VARCHAR rather than an INT.
CREATE TABLE Promotion (
  PromotionID   INT NOT NULL,
  Promotionname VARCHAR(50) NOT NULL,
  Promotiontype VARCHAR(20),
  Period        VARCHAR(20),
  Eligibility   VARCHAR(20),
  Validity      VARCHAR(20),
  PRIMARY KEY (PromotionID)
) ENGINE=InnoDB;

-- Ticket dimension.
CREATE TABLE Ticket (
  TicketID       INT NOT NULL,
  TicketName     VARCHAR(25),
  Tickettype     VARCHAR(30),
  Ticketduration VARCHAR(20),
  PRIMARY KEY (TicketID)
) ENGINE=InnoDB;


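-- Time dimension. The table name is quoted with backticks because TIME is
-- also a MySQL keyword.
CREATE TABLE `Time` (
  TimeID INT NOT NULL,
  Week   VARCHAR(52),
  Month  VARCHAR(12),
  Year   VARCHAR(30),
  PRIMARY KEY (TimeID)
) ENGINE=InnoDB;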


-- Sales fact table: one foreign key per dimension plus the measures.
-- PurchasedPrice is numeric (Float in the physical model) so that it can be summed.
CREATE TABLE Sales (
  SalesID        INT NOT NULL,
  employeeid     INT NOT NULL,
  PromotionID    INT NOT NULL,
  FlightID       INT NOT NULL,
  PlaneID        INT NOT NULL,
  RegionID       INT NOT NULL,
  PassengerID    INT NOT NULL,
  TicketID       INT NOT NULL,
  TimeID         INT NOT NULL,
  ReturnTicket   VARCHAR(25),
  PurchasedPrice FLOAT,
  PRIMARY KEY (SalesID),
  FOREIGN KEY (employeeid)  REFERENCES Employee (employeeid),
  FOREIGN KEY (PromotionID) REFERENCES Promotion (PromotionID),
  FOREIGN KEY (FlightID)    REFERENCES Flight (FlightID),
  FOREIGN KEY (PlaneID)     REFERENCES Planes (PlaneID),
  FOREIGN KEY (RegionID)    REFERENCES Region (RegionID),
  FOREIGN KEY (PassengerID) REFERENCES Passenger (PassengerID),
  FOREIGN KEY (TicketID)    REFERENCES Ticket (TicketID),
  FOREIGN KEY (TimeID)      REFERENCES `Time` (TimeID)
) ENGINE=InnoDB;
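With a sales fact table joined to a time dimension, a roll-up and a drill-down could look roughly like the sketch below. It uses MySQL's GROUP BY ... WITH ROLLUP modifier. The tables time_demo and sales_demo, their columns and the sample rows are simplified stand-ins invented for this example, not the tables defined above.

```sql
-- Hypothetical, simplified copies of the time dimension and the sales fact.
CREATE TABLE time_demo (
  time_id    INT NOT NULL,
  sale_year  INT NOT NULL,
  sale_month INT NOT NULL,
  PRIMARY KEY (time_id)
) ENGINE=InnoDB;

CREATE TABLE sales_demo (
  sales_id        INT NOT NULL,
  time_id         INT NOT NULL,
  purchased_price DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (sales_id),
  FOREIGN KEY (time_id) REFERENCES time_demo (time_id)
) ENGINE=InnoDB;

INSERT INTO time_demo (time_id, sale_year, sale_month) VALUES
  (1, 2012, 12),
  (2, 2013, 1);

INSERT INTO sales_demo (sales_id, time_id, purchased_price) VALUES
  (1, 1, 300.00),
  (2, 2, 450.00),
  (3, 2, 120.00);

-- Roll-up: one row per month, a subtotal row per year (sale_month is NULL)
-- and a grand-total row (both grouping columns are NULL).
SELECT t.sale_year,
       t.sale_month,
       SUM(s.purchased_price) AS revenue
FROM sales_demo AS s
JOIN time_demo  AS t ON t.time_id = s.time_id
GROUP BY t.sale_year, t.sale_month WITH ROLLUP;

-- Drill-down: start from one year and break it into its months.
SELECT t.sale_month,
       SUM(s.purchased_price) AS revenue
FROM sales_demo AS s
JOIN time_demo  AS t ON t.time_id = s.time_id
WHERE t.sale_year = 2013
GROUP BY t.sale_month;
```

MySQL has no native materialised views, so if pre-computed aggregates were wanted, one option would be a summary table that the load job refreshes with the roll-up query.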


3.1. How will you implement the ETL process? (cf. Oracle9i Data Warehousing Guide online, Chapter 11) E.g. full or incremental, online or offline extraction?
ETL stands for extraction, transformation and loading. It is a planned data integration process that extracts data from different data sources, transforms the data into a suitable format, and loads it into a data warehouse for storage. ETL makes it possible to physically move data from a source to a target data store. The first stage, extraction, collects data from the different sources. The second stage transforms the data by converting, reformatting and cleansing it into a format that can be used in the target database. The third and last stage is loading, in which the transformed data is imported into a data warehouse or a data mart.
At British Airways, data will be extracted from external sources and converted into more suitable formats. The transformation stage will change the data into a standard format. For example, an employee name is split into first, middle and last names, and each employee is allocated the appropriate manager and assigned to either the inside sales department or the outside sales department. The load stage will take the resulting file and send it to the data warehouse, and reports will be generated showing the data loaded.
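The transform and load stages described above might be sketched in MySQL as follows, assuming an incremental approach. The staging table stg_employee, the target dim_employee, the updated_at column and the cut-off timestamp are all hypothetical. For brevity the sketch splits a name into first and last parts only; middle names would need extra handling.

```sql
-- Hypothetical staging table filled by the extraction step.
CREATE TABLE stg_employee (
  employee_id INT NOT NULL,
  full_name   VARCHAR(100) NOT NULL,
  updated_at  DATETIME NOT NULL,
  PRIMARY KEY (employee_id)
) ENGINE=InnoDB;

-- Hypothetical warehouse dimension that receives the cleaned rows.
CREATE TABLE dim_employee (
  employee_id INT NOT NULL,
  first_name  VARCHAR(50),
  last_name   VARCHAR(50),
  PRIMARY KEY (employee_id)
) ENGINE=InnoDB;

INSERT INTO stg_employee (employee_id, full_name, updated_at) VALUES
  (1, 'Jane Mary Smith', '2013-01-10 09:00:00'),
  (2, 'John Brown',      '2013-01-20 14:30:00');

-- Transform: first token becomes first_name, last token becomes last_name.
-- Incremental load: only rows changed since the previous load are taken
-- (the cut-off below is an assumed value), and existing rows are updated.
INSERT INTO dim_employee (employee_id, first_name, last_name)
SELECT employee_id,
       SUBSTRING_INDEX(TRIM(full_name), ' ', 1),
       SUBSTRING_INDEX(TRIM(full_name), ' ', -1)
FROM stg_employee
WHERE updated_at > '2013-01-15 00:00:00'
ON DUPLICATE KEY UPDATE
  first_name = VALUES(first_name),
  last_name  = VALUES(last_name);
```

A full extraction would simply drop the WHERE clause and reload every row. The incremental form moves less data per run but relies on the source system recording a trustworthy change timestamp.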

3.2. Sample analytical script(s), with e.g. cube roll-up and drill-down, materialised views, etc. Show codes

4. Future challenges (5 marks)
4.1. Possible recommendations on strategies to overcome future challenges, for example, a data mining approach
Data mining strategic approach to improve forecast accuracy in British Airways: The models developed by applying data mining techniques could be used to improve forecasting accuracy in the airline's business processes. To improve the revenue of a flight, the number of seats offered for sale is typically higher than the physical seat capacity (overbooking). To optimise the booking rate, an accurate estimate of no-show passengers (passengers who hold a valid booking but do not appear at the gate to board a flight) is essential. To tackle this, classification trees and logistic regression models should be applied to estimate the probability that a passenger turns out to be a no-show. Passenger information stored in the reservation system is either used directly as an explanatory variable or used to create attributes that influence that probability.
Software change: The software used to operate the data warehouse will be helpful, as there will be tasks to carry out such as generating reports. Many different software packages for managing a data warehouse are available on the market, so there should be one that matches the company's needs. Such software saves time and helps to maintain large amounts of data, but it also has the downsides listed below:
Loss of data or service: When the company changes software in the future, it needs to be careful. Any loss of service due to a computer outage could disrupt work, delaying or even preventing the input of new data or access to data already stored on the system.
Incorrect information: The information in a data warehouse is only as valid as the information put into the system. Since most data warehouses need some manual input of data, report results could be incorrect unless all input data is reviewed. If only the final reports or output of the system are reviewed, it may be difficult to find faulty information.
System configuration: As the company grows, the data warehouse may need to change, and this could cause considerable disruption, because information must be migrated and employees who use the system need new training.
Cost: A future disadvantage of a data warehouse might be the cost of running the system. The software is purchased once, but it also incurs the cost of maintenance, customisation and training.
Granularity: This is important in any company, as it helps to make decisions easily and quickly, but as the company grows quickly in the future, the level of granularity in the data sets will increase.


References
The Goals and Objectives of British Airways | eHow.com. http://www.ehow.com/list_7499971_goals-objectives-britishairways.html#ixzz2HIHu5amg
http://www.britishairways.com/cms/global/microsites/ba_reports/pdfs/13_CR_Intro.pdf
http://www.businessdictionary.com/definition/financedepartment.html#ixzz2HadxY5bt
http://money.howstuffworks.com/marketing-plan3.htm
http://www.dataminingtools.net/wiki/introdw.php
KHURANA, J., 2011. Billing system. Data warehouse, [Online]. 4, 5-8. Available at: http://www.iisjaipur.org/iiim-current-08/mca_iv_sem_pro_eva/04.projectbilling%20system.pdf [Accessed 22 January 2013].
http://www.businessdictionary.com/definition/granularity.html#ixzz2IoOQrAxE
