Benvenuto in Scribd!

CSE P546 Data Mining Homework 1: Data Warehousing, OLAP, Decision Trees

Caricato da

Il 0% ha trovato utile questo documento (0 voti)

50 visualizzazioni2 pagine

This document outlines homework assignments for a data mining course. It includes two parts: [1] Part A focuses on data warehousing and OLAP with questions about drawing schema diagrams and performing OLAP operations on a university data warehouse. [2] Part B covers decision trees with questions about information gain, finding contradictions in training data, and reduced-error pruning of decision trees. The assignments are due on April 11th for Part A and April 18th for Part B.

Descrizione originale:

Titolo originale

hw1

Copyright

Formati disponibili

PDF, TXT o leggi online da Scribd

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Segnala questo documento

Copyright:

Formati disponibili

Scarica in formato PDF, TXT o leggi online su Scribd

Segnala contenuti inappropriati

Il 0% ha trovato utile questo documento (0 voti)

50 visualizzazioni2 pagine

CSE P546 Data Mining Homework 1: Data Warehousing, OLAP, Decision Trees

Caricato da

hiran

Copyright:

Formati disponibili

Scarica in formato PDF, TXT o leggi online su Scribd

Segnala contenuti inappropriati

Salta alla pagina

Sei sulla pagina 1di 2

Cerca all'interno del documento

CSE P546 Data Mining Homework 1

Due Date: 11th April for Part A, and 18th April for Part B. We would prefer
that you turn in a hard copy of your solutions at the start of class. However, you
can also email them to bhushan@cs. Your file must contain your name at the top,
and can be in any of pdf, Word or plaintext formats.

Part A: Data Warehousing & OLAP

1. (Han & Kamber 3.4) Suppose that a data warehouse for Big University con-
sists of the following four dimensions: student, course, semester, and in-
structor, and two measures count and avg-grade. When at the lowest con-
ceptual level (e.g., for a given student, course, semester, and instructor com-
bination), the avg-grade measure stores the actual course grade of the stu-
dent. At higher conceptual levels, avg-grade stores the average grade for the
given combination.

(a) Draw a snowflake schema diagram for the data warehouse.

(b) Starting with the base cuboid [student, course, semester, instructor],
what specific OLAP operations (e.g., roll-up from semester to year)
should one perform in order to list the average grade of CS courses for
each student.
(c) If each dimension has five levels (including all), such as “student <
major < status < university < all”, how many cuboids will this cube
contain (including the base and apex cuboids)?

2. (Han & Kamber 3.5 a, b) Suppose that a data warehouse consists of the four
dimensions, date, spectator, location, and game, and the two measures, count
and charge, where charge is the fare that a spectator pays when watching a
game on a given date. Spectators may be students, adults, or seniors, with
each category having its own charge rate.

(a) Draw a star schema diagram for the data warehouse.

1
(b) Starting with the base cuboid [date, spectator, location, game], what
specific OLAP operations should one perform in order to list the total
charge paid by student spectators at GM place in 2004?

Part B: Decision Trees

1. Mitchell 3.1

2. Mitchell 3.2

3. Suppose training set S has p positive and n negative instances. Suppose

attribute A splits S into the sets fS1 ; S2 g such that Si has pi positive and
ni negative instances. Show that the information gain for A, as defined in

Equation 3.4 in Mitchell, is always non-negative. Under what conditions

will it be zero.

4. Suppose there are n boolean attributes and 1 boolean class, and the training
set is composed of m distinct examples drawn uniformly from the set of
2n+1 possible examples, what is the probability of finding a contradiction
(i.e., the same example with different classes) in the data? Explain briefly
how you came up with your answer. There is no need to expand any binomial
coefficients that may occur in your answer.

5. Suppose that in reduced-error pruning of a decision tree, the validation-set

error rate of the subtree rooted at node N is greater than the error rate ob-
tained by converting N into a leaf. Is it possible that N is not a leaf in the
optimal pruned tree? Explain.

Potrebbero piacerti anche

Unit 2-Assignment-2-1
Documento1 pagina
Unit 2-Assignment-2-1
Snowflake
Nessuna valutazione finora
Assignment Data Warehousing - Odt
Documento3 pagine
Assignment Data Warehousing - Odt
missbannu7350
Nessuna valutazione finora
Dataware dp2
Documento1 pagina
Dataware dp2
Mamta Taneja
Nessuna valutazione finora
Cia1 Paper
Documento2 pagine
Cia1 Paper
vik
Nessuna valutazione finora
Kamal 8th Sem Mining 2074
Documento1 pagina
Kamal 8th Sem Mining 2074
kamalshrish
Nessuna valutazione finora
Homework Assignment 2: Total Points 80
Documento2 pagine
Homework Assignment 2: Total Points 80
samriddhi
Nessuna valutazione finora
Big University case study explores data warehouse design
Documento5 pagine
Big University case study explores data warehouse design
Bharat Goyal
100% (5)
STW124MS-Assignment I (CW - 1) 123
Documento6 pagine
STW124MS-Assignment I (CW - 1) 123
TV Serial Update
Nessuna valutazione finora
Winsem2012-13 Cp0535 Modqst Model QP
Documento4 pagine
Winsem2012-13 Cp0535 Modqst Model QP
thelionphoenix
Nessuna valutazione finora
Question Bank For Test1
Documento1 pagina
Question Bank For Test1
SANGRAM SINGH 1CD20IS093
Nessuna valutazione finora
Chapter 3
Documento4 pagine
Chapter 3
Ziad Al-otaibi
100% (3)
Internal Assessment 1 Question Bank 22-23
Documento1 pagina
Internal Assessment 1 Question Bank 22-23
tadasd
Nessuna valutazione finora
DW Assignment
Documento1 pagina
DW Assignment
trupti.kodinariya9810
Nessuna valutazione finora
ER Diagram Examples
Documento4 pagine
ER Diagram Examples
S Sarkar
75% (4)
Machine Learning: E0270 2015 Assignment 4: Due March 24 Before Class
Documento3 pagine
Machine Learning: E0270 2015 Assignment 4: Due March 24 Before Class
Mahesh Yada
Nessuna valutazione finora
CSC263 Winter 2021 Problem Set 1: Instructions
Documento4 pagine
CSC263 Winter 2021 Problem Set 1: Instructions
Codage Aider
Nessuna valutazione finora
ST102 Exercise 1
Documento4 pagine
ST102 Exercise 1
Jiang H
Nessuna valutazione finora
CS402 Data Mining Question Bank Modules 1-2
Documento6 pagine
CS402 Data Mining Question Bank Modules 1-2
Junaid M Faisal
Nessuna valutazione finora
Assignment 1 19
Documento2 pagine
Assignment 1 19
Yash Kothari
Nessuna valutazione finora
S in Reverse Order. S
Documento2 pagine
S in Reverse Order. S
creativeshuvo
Nessuna valutazione finora
Gujarat Technological University
Documento5 pagine
Gujarat Technological University
compiler&automata
Nessuna valutazione finora
CS5785 Homework 4: .PDF .Py .Ipynb
Documento5 pagine
CS5785 Homework 4: .PDF .Py .Ipynb
Al Tarino
Nessuna valutazione finora
Department of Computer Science and Engineering
Documento3 pagine
Department of Computer Science and Engineering
Md.Ashiqur Rahman
Nessuna valutazione finora
E-R Diagram for University Registrar's Office Data (40ch
Documento6 pagine
E-R Diagram for University Registrar's Office Data (40ch
Rajavati Nadar
100% (1)
A Detailed Lesson Plan in Mathematics 10: A. Preliminary/Routinary Activity
Documento12 pagine
A Detailed Lesson Plan in Mathematics 10: A. Preliminary/Routinary Activity
Wilson Morales
Nessuna valutazione finora
Midterm Exam Solution
Documento11 pagine
Midterm Exam Solution
ShelaRamos
Nessuna valutazione finora
Assignment DBDA Preeti
Documento12 pagine
Assignment DBDA Preeti
mounika
Nessuna valutazione finora
Mathematics: Quarter 3 - Module 4
Documento20 pagine
Mathematics: Quarter 3 - Module 4
Cillian Reeves
Nessuna valutazione finora
2020 Assignment1 PDF
Documento4 pagine
2020 Assignment1 PDF
Joaquín Villagra Pacheco
Nessuna valutazione finora
Data Warehousing and Mining Exam Questions
Documento2 pagine
Data Warehousing and Mining Exam Questions
Monika
Nessuna valutazione finora
Data Mining Major Exam
Documento2 pagine
Data Mining Major Exam
Ajay Kumar
Nessuna valutazione finora
Bangladesh University of Business & Technology (BUBT) : Department of Computer Science and Engineering
Documento2 pagine
Bangladesh University of Business & Technology (BUBT) : Department of Computer Science and Engineering
BD Entertainment
Nessuna valutazione finora
HW 1
Documento5 pagine
HW 1
calvinlam12100
Nessuna valutazione finora
BB113 Group Assignment
Documento4 pagine
BB113 Group Assignment
Kirito Ong
Nessuna valutazione finora
Project C: Dr. Shahin Tavakoli Applied Bayesian Statistics Project 1
Documento2 pagine
Project C: Dr. Shahin Tavakoli Applied Bayesian Statistics Project 1
方鑫然
Nessuna valutazione finora
COMP1942 Question Paper
Documento5 pagine
COMP1942 Question Paper
pakaMuziki
Nessuna valutazione finora
Tutor Test and Home Assignment Questions For de
Documento4 pagine
Tutor Test and Home Assignment Questions For de
achaparala4499
Nessuna valutazione finora
Problem 3:: Ref: Homework Assignment. The George Washington University (Csci 243 - Data Mining, Spring 2007)
Documento2 pagine
Problem 3:: Ref: Homework Assignment. The George Washington University (Csci 243 - Data Mining, Spring 2007)
slogeshwari
Nessuna valutazione finora
Mock Test
Documento13 pagine
Mock Test
mauryapaliya
Nessuna valutazione finora
Chapter 3 Exercises
Documento3 pagine
Chapter 3 Exercises
KidGhost
50% (2)
ML5
Documento5 pagine
ML5
Anonymous C5ACgu0
Nessuna valutazione finora
Mccoy Graduated Difficulty Lesson
Documento7 pagine
Mccoy Graduated Difficulty Lesson
api-256995276
Nessuna valutazione finora
2024 Honework 01 Questions
Documento3 pagine
2024 Honework 01 Questions
iu L (Lucky)
Nessuna valutazione finora
640005
Documento4 pagine
640005
Swetang Khatri
Nessuna valutazione finora
University of Michigan STATS 500 hw2 F2020
Documento1 pagina
University of Michigan STATS 500 hw2 F2020
wasabiwaffles
Nessuna valutazione finora
Statistics in Education
Documento7 pagine
Statistics in Education
Sharan Hans
Nessuna valutazione finora
DW - Compre Solution
Documento7 pagine
DW - Compre Solution
Divyanshu Tiwari
Nessuna valutazione finora
Data Warehouse and Data Mining Unit 1
Documento2 pagine
Data Warehouse and Data Mining Unit 1
shivam.13819011722
Nessuna valutazione finora
Ubd Lesson Plan1
Documento3 pagine
Ubd Lesson Plan1
api-317359280
Nessuna valutazione finora
Mock Final Examination Model Answer: Faculty of Computer Studies TM351 Data Management and Analysis
Documento9 pagine
Mock Final Examination Model Answer: Faculty of Computer Studies TM351 Data Management and Analysis
Christina Fington
Nessuna valutazione finora
1 Analytical Part (3 Percent Grade) : + + + 1 N I: y +1 I 1 N I: y 1 I
Documento5 pagine
1 Analytical Part (3 Percent Grade) : + + + 1 N I: y +1 I 1 N I: y 1 I
Muhammad Hur Rizvi
Nessuna valutazione finora
3013 THEORY and SET UP Student Notes Fall 17-18
Documento5 pagine
3013 THEORY and SET UP Student Notes Fall 17-18
alex61937
Nessuna valutazione finora
CS 5785 - Applied Machine Learning - Lec. 1: 1 Logistics
Documento7 pagine
CS 5785 - Applied Machine Learning - Lec. 1: 1 Logistics
Arindrima Datta
Nessuna valutazione finora
HW 3
Documento5 pagine
HW 3
Abbas
Nessuna valutazione finora
Csizg518 Nov25 An
Documento2 pagine
Csizg518 Nov25 An
TabithaDsouza
100% (1)
Vi Int368 Ml-I
Documento2 pagine
Vi Int368 Ml-I
Prabhat Jha
Nessuna valutazione finora
Learning Plan in Mathematics 6
Documento5 pagine
Learning Plan in Mathematics 6
Mark Jayson De Leon
Nessuna valutazione finora
Week Two Group Assignment 2A: Learning Progression: Numbers and Operations-Fractions Begins in 3rd Grade
Documento31 pagine
Week Two Group Assignment 2A: Learning Progression: Numbers and Operations-Fractions Begins in 3rd Grade
api-432465054
Nessuna valutazione finora
Modern Mathematical Statistics with Applications
Da Everand
Modern Mathematical Statistics with Applications
Jay L. Devore
Nessuna valutazione finora
Olympiad Sample Paper 3: Useful for Olympiad conducted at School, National & International levels
Da Everand
Olympiad Sample Paper 3: Useful for Olympiad conducted at School, National & International levels
EDITORIAL BOARD
Valutazione: 1 su 5 stelle
1/5 (1)
Water Resources
Documento26 pagine
Water Resources
hiran
Nessuna valutazione finora
Mineralresources - Final
Documento24 pagine
Mineralresources - Final
hiran
Nessuna valutazione finora
Land Resources - Final
Documento25 pagine
Land Resources - Final
hiran
Nessuna valutazione finora
FOREST RESOURCES CONSERVATION
Documento17 pagine
FOREST RESOURCES CONSERVATION
hiran
Nessuna valutazione finora
Tutorial 6 PDF
Documento2 pagine
Tutorial 6 PDF
hiran
Nessuna valutazione finora
Energy Resources - Final Send
Documento29 pagine
Energy Resources - Final Send
hiran
Nessuna valutazione finora
Tutorial 7 PDF
Documento2 pagine
Tutorial 7 PDF
hiran
Nessuna valutazione finora
Food Resources
Documento20 pagine
Food Resources
hiran
Nessuna valutazione finora
Chapter 4 Tutorial
Documento11 pagine
Chapter 4 Tutorial
hiran
Nessuna valutazione finora
Food Resources
Documento20 pagine
Food Resources
hiran
Nessuna valutazione finora
VRLA Instruction Manual
Documento11 pagine
VRLA Instruction Manual
ashja battery
Nessuna valutazione finora
7-Seater MPV: Kia Singapore
Documento16 pagine
7-Seater MPV: Kia Singapore
adi
Nessuna valutazione finora
Gen-6000-0mh0/0mhe Gen-6000-0mk0 Gen-6000-0ms0/0mse Gen-7500-0mh0/0mhe Gen-8000-0mk0/0mke Gen-8000-0ms0/0mse
Documento26 pagine
Gen-6000-0mh0/0mhe Gen-6000-0mk0 Gen-6000-0ms0/0mse Gen-7500-0mh0/0mhe Gen-8000-0mk0/0mke Gen-8000-0ms0/0mse
Ahmed Khodja Karim
Nessuna valutazione finora
Bildiri Sunum - 02
Documento12 pagine
Bildiri Sunum - 02
Orhan Veli Kazancı
Nessuna valutazione finora
Assessment Nursing Diagnosis Scientific Rationale Planning Intervention Rationale Evaluation
Documento9 pagine
Assessment Nursing Diagnosis Scientific Rationale Planning Intervention Rationale Evaluation
clydell joyce masiar
Nessuna valutazione finora
Cu Unjieng V Mabalacat
Documento6 pagine
Cu Unjieng V Mabalacat
Mp Cas
Nessuna valutazione finora
Homework No1. Kenner Pérez Turizo
Documento6 pagine
Homework No1. Kenner Pérez Turizo
Kenner Pérez
Nessuna valutazione finora
tmp1AE2 TMP
Documento8 pagine
tmp1AE2 TMP
Frontiers
Nessuna valutazione finora
Design Proposal For North Public & Suite Areas Decorative Lighting, Solaire Quezon City
Documento42 pagine
Design Proposal For North Public & Suite Areas Decorative Lighting, Solaire Quezon City
Richard Libunao Beldua
Nessuna valutazione finora
Imaging With FTK Imager
Documento9 pagine
Imaging With FTK Imager
Robert Leigh
Nessuna valutazione finora
Lista 30 Julio
Documento2 pagine
Lista 30 Julio
Max Bike Martinez
Nessuna valutazione finora
How COVID-19 Affects Corporate Financial Performance and Corporate Valuation in Bangladesh: An Empirical Study
Documento8 pagine
How COVID-19 Affects Corporate Financial Performance and Corporate Valuation in Bangladesh: An Empirical Study
International Journal of Innovative Science and Research Technology
Nessuna valutazione finora
Transforming City Governments For Successful Smart Cities
Documento194 pagine
Transforming City Governments For Successful Smart Cities
Tri Ramdani
100% (2)
RMAN Backup and Recovery Strategies with Oracle Database
Documento26 pagine
RMAN Backup and Recovery Strategies with Oracle Database
Cristiano Vasconcelos Barbosa
Nessuna valutazione finora
Using Social Stories With Students With Social Emotional and Behavioral Disabilities The Promise and The Perils (2019)
Documento17 pagine
Using Social Stories With Students With Social Emotional and Behavioral Disabilities The Promise and The Perils (2019)
Sarah
Nessuna valutazione finora
GCAF Online Inspector Practice Exam
Documento5 pagine
GCAF Online Inspector Practice Exam
camwills2
100% (1)
FEL
Documento71 pagine
FEL
Elimel Rome Rico
100% (4)
Appeal Tax Procedure (Malaysia)
Documento2 pagine
Appeal Tax Procedure (Malaysia)
Zati Ty
Nessuna valutazione finora
SC WD 1 WashHandsFlyerFormatted JacobHahn Report 1
Documento3 pagine
SC WD 1 WashHandsFlyerFormatted JacobHahn Report 1
jackson lee
Nessuna valutazione finora
Medical Parasitology
Documento33 pagine
Medical Parasitology
Alexander Luie Jhames Sarita
Nessuna valutazione finora
What Is Interpol
Documento5 pagine
What Is Interpol
Jimmy Jr Comahig Lape
Nessuna valutazione finora
Grid Infrastructure Installation and Upgrade Guide Ibm Aix Power Systems 64 Bit
Documento284 pagine
Grid Infrastructure Installation and Upgrade Guide Ibm Aix Power Systems 64 Bit
Antonio
Nessuna valutazione finora
Sciencedirect Sciencedirect Sciencedirect
Documento7 pagine
Sciencedirect Sciencedirect Sciencedirect
Mohamed Amine NZ
Nessuna valutazione finora
Natural Frequency of Axially Functionally Graded, Tapered Cantilever Beamswith Tip Masses Mahmoud
Documento9 pagine
Natural Frequency of Axially Functionally Graded, Tapered Cantilever Beamswith Tip Masses Mahmoud
Drm Amm
Nessuna valutazione finora
Science 7 Module1
Documento22 pagine
Science 7 Module1
Hector Panti
0% (1)
Tax - CIR Vs Cebu Toyo Digest
Documento3 pagine
Tax - CIR Vs Cebu Toyo Digest
Dyannah Alexa Marie Ramacho
Nessuna valutazione finora
Cambridge Ext2 Ch1 Complex Numbers IWEB
Documento62 pagine
Cambridge Ext2 Ch1 Complex Numbers IWEB
chen
Nessuna valutazione finora
Image Authentication ENFSI
Documento43 pagine
Image Authentication ENFSI
Iolanda Oprisan
Nessuna valutazione finora
Task and Link Analysis Assignment
Documento4 pagine
Task and Link Analysis Assignment
Azeem Nawaz
Nessuna valutazione finora
KD.7.1-WPS Office
Documento9 pagine
KD.7.1-WPS Office
Pratista Tyas
Nessuna valutazione finora