DATA MINING AND KNOWLEDGE DISCOVERY LAB MID TERM (ANSWER TEMPLATE)

NAME: Lim Paul Huah STUDENT ID: 09021379 SIGNATURE: DATE: 30/5/2013

Q1:

[15 marks] Description Answer Justification

Multiple Regression

Linear Regression

Logistic Regression

Q2:

[15 marks] Graphical presentation Explanation SAMPLE: The correlation between annual income and pizza expenditure shows that ..

SAMPLE

The histogram shows the x and y data for the selected target.

The pie chart shows exactly the same result as the histogram above, using a different plot type.

Q3:

[40 marks] Requirements Screen shots

2 marks per screen shot, 5 marks for explanation. Total: 15 marks

Create Project Airlines

1. Specify a name and a location for the new project so that it can be created on the server.

Create Library AirL

2. Creating a library connects SAS Enterprise Miner to the raw data source, giving a direct link to the server from which the dataset files are retrieved.

Creating Diagram AirD

3. Once the diagram is created, the process steps are displayed on screen, where they can be analysed and their properties changed for the project.

Inserting Airline Datasets

4. The Airlines dataset was created and imported. The variable types, roles, levels and order were then adjusted so that the measurements are correct when generating models (e.g. decision trees).

Variables assigned the Input and Target roles

5. In the Data Source Wizard, each variable can be assigned its required role, either input or target, so that the necessary analysis and model can be generated.

Explanation: The model can only be created once these settings are complete. If the data are not well defined when they are inserted, data quality suffers and the analysis may give inaccurate results, for example when missing values are not handled. It is therefore important to follow these steps when defining a model from the data.
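As an illustration of the missing-value point, the check below counts missing entries per variable before any model is built. It is a minimal Python sketch, not part of SAS Enterprise Miner; the function name `missing_counts` and the field names `distance` and `delay` are hypothetical and do not come from the Airlines dataset.

```python
def missing_counts(records):
    """Count None values per field across a list of dict records."""
    counts = {}
    for record in records:
        for field, value in record.items():
            # (value is None) is True/False, which adds as 1/0.
            counts[field] = counts.get(field, 0) + (value is None)
    return counts

# Hypothetical records; field names are illustrative only.
records = [
    {"distance": 500, "delay": 12},
    {"distance": None, "delay": 3},
    {"distance": 820, "delay": None},
    {"distance": None, "delay": 7},
]
print(missing_counts(records))  # {'distance': 2, 'delay': 1}
```

A variable with many missing entries is a candidate for imputation or exclusion before modelling.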

2 marks for screen shot, 3 marks with explanation. Total: 5 marks

Splitting the data into 50% training and 50% validation

Explanation: Recall that the training data are used to build the model, while the validation data are used to validate it. To split the data 50% each, drag the Data Partition node from the Sample tab of the SEMMA toolbar into the AirD diagram, then set the training and validation percentages to 50 each in the Properties panel.
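The idea behind the 50:50 partition can be sketched outside SAS Enterprise Miner as a seeded random split. This is a minimal Python illustration under stated assumptions: the function name `partition`, the seed value and the 100 stand-in row identifiers are all hypothetical, and the Data Partition node's own sampling method may differ.

```python
import random

def partition(rows, train_fraction=0.5, seed=12345):
    """Randomly split rows into (training, validation) lists."""
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical stand-in for the airline records: 100 row identifiers.
rows = list(range(100))
train, validation = partition(rows)
print(len(train), len(validation))  # 50 50
```

Every row lands in exactly one of the two sets, so the model is never validated on data it was trained on.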

2 marks for property screen shot, 3 marks for tree screen shot, 5 marks with explanation. Total: 10 marks

Decision Tree properties: the subtree assessment measure is changed to Average Square Error

Generated: decision tree for the target variable.

Explanation: Drag the Decision Tree node from the Model tab into the diagram and connect it to the Data Partition node, which was set earlier to a 50:50 split between training and validation. Then run the path to generate the analysis and display the tree shown above.

2 marks for screen shot 3 marks with explanation

2 marks for screen shot 3 marks with explanation Total 10 marks

Average Square Error property setting for the decision tree

Analysis diagram

Explanation: Setting the property to average square error makes the tree selection look for the least error, taken as the average (mean) over all the data. The resulting tree is then used to predict in a new analysis.
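To make the measure concrete, the sketch below computes a plain mean of squared differences and picks the candidate subtree with the smallest value on validation data. It is a Python illustration only: the target values, the two candidate subtrees and their predictions are hypothetical, and SAS Enterprise Miner's exact definition of average square error (for example for class targets) may differ from this simple form.

```python
def average_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical validation targets and predictions from two candidate subtrees.
actual = [1, 0, 1, 1]
candidates = {
    "3 leaves": [0.5, 0.5, 0.5, 0.5],
    "5 leaves": [1.0, 0.0, 0.5, 1.0],
}

scores = {name: average_squared_error(actual, preds)
          for name, preds in candidates.items()}
best = min(scores, key=scores.get)  # subtree with the least average squared error
print(scores)  # {'3 leaves': 0.25, '5 leaves': 0.0625}
print(best)    # 5 leaves
```

Scoring on the validation half rather than the training half is what keeps the chosen subtree from simply being the largest one.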
