Acknowledgments
These notes are based (heavily) on those provided by the authors to accompany Data Mining: Concepts & Techniques by Jiawei Han and Micheline Kamber.
Some slides are also based on trainers' kits provided by SAS.
More information about the book is available at: www-sal.cs.uiuc.edu/~hanj/bk2/
Information on SAS is available at: www.sas.com
Contents
Motivation: Examples
What is business systems intelligence?
Motivation: Why business systems intelligence?
BI systems
BI application areas
Miscellanea
Course outline
Examples: Telecommunications
Transactional data (about each phone call)
Data on mobile phones, house-based phones, Internet, etc.
Other customer data (billing, personal information, etc.)
Additional data (network load, faults, etc.)
Questions:
Case study:
We can't do manual credit checks on each residential customer, so this saves a lot of time. We know what customers need to make deposits and who isn't a credit risk, so they don't need to have their service cut off if their payment is a few days late. It improves customer satisfaction.
- Pavel Vlasan, Head of Credit Risk and Collection
Examples: Health
Personal health records (at GPs, specialists, etc.)
Hospital data (e.g. admission data, midwives' data, surgery data)
Billing information (VHI, Bupa, etc.)
Questions:
Case study:
SAS allows us to make more accurate predictions so that we can present that information to the case managers in a very simple, user-friendly fashion.
- Howard Underwood, Head of Informatics and Quality Metrics
Examples: Finance
Credit card transactions
Direct debits
Loan applications
Retail financing deals
Questions:
Case study:
Examples: Retail
Every time you buy items using a loyalty card, a record is kept of this
On-line the situation is even more extreme: every time you even look at an item, a record is kept
There is a lot of information out there about what you like!
Questions:
What kind of special offers would you most likely respond to?
Which other customers are you most closely related to?
What kind of ads can we display to you while you browse?
Case study:
Amazon.com use data mining to predict the behaviour of their customers
While they don't use SAS software live on their web site, they use it to explore techniques they are interested in deploying
We work hard to refine our technology, which allows us to make recommendations that make shopping more convenient and enjoyable. SAS helps Amazon.com analyze the results of our ongoing efforts to improve personalization.
- Diane N. Lye, Amazon.com's Snr. Manager for Worldwide Data Mining
Business intelligence uses knowledge management, data warehouse[ing], data mining and business analysis to identify, track and improve key processes and data, as well as identify and monitor trends in corporate, competitor and market performance.
- bettermanagement.com
We will basically consider business systems intelligence to be:
Data Warehousing + Data Mining + Some Extra Stuff
ACHTUNG: A lot of these terms are used interchangeably
Data Mining: used in databases and business; in 2003 it had a bad image because of TIA
Knowledge Discovery in Databases (1989): used by the AI and Machine Learning community
Business Intelligence (1990): business management term
Also data archaeology, information harvesting, information discovery, knowledge extraction, data/pattern analysis, etc.
We are drowning in data, but starving for knowledge!
Solution: Data warehousing and data mining
Data warehousing and on-line analytical processing
Mining interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
DATA
KNOWLEDGE
1960s:
1970s:
Relational data model, relational DBMS implementation
1980s:
RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
Application-oriented DBMS (spatial, scientific, engineering, etc.)
1990s:
2000s:
Stream data management and mining
Data mining with a variety of applications
Web technology and global information systems
The BI Process
Knowledge
Other applications
Text mining (email, documents) and Web mining
Stream data mining
DNA and bio-data analysis
Target marketing
Find clusters of model customers who share the same characteristics
Determine customer purchasing patterns over time
Cross-market analysis
Associations/correlations between product sales, and prediction based on such associations
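As a rough illustration of what an association between product sales can look like, the sketch below counts how often pairs of items are bought together (support) and how often one item is bought given another (confidence). The baskets and the function names `pair_support` and `confidence` are made up for this example; real association mining algorithms are considerably more involved.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (illustrative data only).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "nappies"},
]


def pair_support(baskets):
    """Return the fraction of baskets that contain each unordered item pair."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Estimate how often `consequent` is bought when `antecedent` is bought."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


support = pair_support(baskets)
print(support[("bread", "milk")])              # share of baskets with both items
print(confidence(baskets, "butter", "bread"))  # share of butter baskets with bread
```

A pair with high support and high confidence is a candidate for a cross-selling offer, although whether it is actually useful is a business judgement.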
Customer profiling
Resource planning
Summarize and compare the resources and spending
Competition
Monitor competitors and market directions
Group customers into classes and set up a class-based pricing procedure
Set pricing strategy in a highly competitive market
Retail industry
Analysts estimate that 38% of retail shrink is due to dishonest employees
Anti-terrorism
Other Applications
Sports
IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain competitive advantage for New York Knicks and Miami Heat
Astronomy
JPL and the Palomar Observatory discovered 22 quasars with the help of data mining
Steps Of A BI Process
1) Learning the application domain
    Relevant prior knowledge and goals of application
2) Creating a target data set: data selection
3) Data cleaning and preprocessing
    May take 60% of effort!
Steps Of A BI Process
6) Choosing the mining algorithm(s)
7) Data mining: search for patterns of interest
8) Pattern evaluation and knowledge presentation
    Visualization, transformation, removing redundant patterns, etc.
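To make the flow of these steps a little more concrete, here is a minimal sketch that chains data selection, data cleaning, a toy mining step and pattern evaluation. The records, field names, thresholds and the functions `select`, `clean`, `mine` and `evaluate` are all assumptions invented for illustration. The steps not listed on these slides are not covered, and a real BI process would use proper tools rather than plain Python lists.

```python
from collections import Counter

# Hypothetical customer records; None marks a missing value.
raw_records = [
    {"customer": "A", "plan": "prepay", "late_payments": 0},
    {"customer": "B", "plan": "bill", "late_payments": 3},
    {"customer": "C", "plan": None, "late_payments": 1},
    {"customer": "D", "plan": "bill", "late_payments": 4},
    {"customer": "E", "plan": "prepay", "late_payments": None},
]


def select(records, fields):
    """Data selection: keep only the fields relevant to the question."""
    return [{f: r[f] for f in fields} for r in records]


def clean(records):
    """Data cleaning: drop records that have any missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def mine(records, threshold):
    """Toy mining step: per plan, the share of customers with at least
    `threshold` late payments."""
    totals, risky = Counter(), Counter()
    for r in records:
        totals[r["plan"]] += 1
        if r["late_payments"] >= threshold:
            risky[r["plan"]] += 1
    return {plan: risky[plan] / totals[plan] for plan in totals}


def evaluate(patterns, min_share):
    """Pattern evaluation: keep only patterns above an interestingness cut-off."""
    return {plan: share for plan, share in patterns.items() if share >= min_share}


target = select(raw_records, ["plan", "late_payments"])
cleaned = clean(target)
patterns = mine(cleaned, threshold=2)
interesting = evaluate(patterns, min_share=0.5)
print(interesting)
```

Even in this toy version, two of the five records are lost during cleaning, which hints at why data cleaning and preprocessing take such a large share of the effort.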
Layers (top to bottom):
Making Decisions
Data Presentation: Visualization Techniques
Data Mining: Information Discovery
Data Exploration: Statistical Analysis, Querying and Reporting
Data Warehouses / Data Marts: OLAP, MDA
Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
Roles (top to bottom): End User, Business Analyst, Data Analyst, DBA
Databases
Data Warehouse
Relational database
Data warehouse
Transactional database
Advanced database and information repository
    Object-relational database
    Spatial and temporal data
    Time-series data
    Stream data
    Multimedia database
    Text databases & WWW
Concept description
Cluster analysis
Outlier analysis
Outlier: a data object that does not comply with the general behavior of the data
Noise or exception? No! Outliers are useful in fraud detection and rare event analysis
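One simple way to flag values that do not comply with the general behavior of the data is a z-score test, sketched below. This is only one of many possible techniques and is not prescribed by these slides; the transaction amounts, the function name `find_outliers` and the threshold of 2.0 are assumptions chosen for the example.

```python
import statistics

# Hypothetical card transaction amounts (illustrative data only).
amounts = [20, 25, 22, 19, 24, 21, 23, 500]


def find_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` population standard
    deviations away from the mean."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        # All values are identical, so nothing can stand out.
        return []
    return [v for v in values if abs(v - mean) / spread > threshold]


print(find_outliers(amounts))
```

In a fraud detection setting the flagged values would be passed on for closer inspection rather than discarded as noise.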
AI
KDD
Major Issues In BI
Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web
Performance: efficiency, effectiveness, and scalability
Pattern evaluation: the interestingness problem
Incorporation of background knowledge
Handling noise and incomplete data
Parallel, distributed and incremental mining methods
Integration of the discovered knowledge with existing knowledge: knowledge fusion
User interaction
Summary
Business Systems Intelligence: Data Warehousing + Data Mining + Some Extra Stuff
We are drowning in data, but starving for knowledge
A BI process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
There are major steps yet to be made in BI and some major issues yet to be resolved
Miscellanea
Me: Dr. Brian Mac Namee
E-Mail: Brian.MacNamee@comp.dit.ie
Web Site: www.comp.dit.ie/bmacnamee
Lectures & Labs:
Monday 14:00 - 17:00 (A-3030)
Miscellanea (cont)
Assessment:
50% continuous assessment
Significant data mining assignment
Books etc:
Data Mining: Concepts & Techniques, J. Han & M. Kamber, Morgan Kaufmann, 2006
DON'T BUY IT YET!
Course Outline
Business Data Modelling
    Data, Information, Knowledge
    Modelling an activity
    Framing a business model
    Developing a model
    Deploying a model
Data Warehousing
    Introduction to data warehousing
    Characteristics of a data warehouse and how it differs from operational DBs, etc.
    Extracting and loading data into a data warehouse
    Dimensional modelling
    Data aggregation
Data Mining
    Introduction to data mining and applications of data mining
    Data mining lifecycles
    Data preparation
    Data association techniques
    Data classification techniques
    Data clustering techniques
    Data visualisation
    Data evaluation
Statistics
Conferences: Joint Stat. Meeting, etc.
Journals: Annals of Statistics, etc.
Visualization
Conference proceedings: CHI, ACM SIGGRAPH, etc.
Journals: IEEE Trans. Visualization and Computer Graphics, etc.
Questions
Disclaimer
Slides accompanying the book Data Mining: Concepts & Techniques
Slides from the SAS Introduction to SAS Business Intelligence Applications trainers' kit
Original slides by Brian Mac Namee
If there are problems with breach of copyright etc., please don't hesitate to contact me