
DATA MINING (STA555)

PROJECT REPORT

TITLE OF PROJECT: IDENTIFYING THE DETERMINANTS OF LEAVING WORK PREMATURELY
Contents

1.0 Introduction…………………………………………………….…………............... 2

2.0 Import Excel data to SAS………………………………………………….............. 3

3.0 Create new project………………………………………………………………..... 6

4.0 Insert data into project……………………………………………………………... 8

5.0 Data exploration………………………………………………………………......... 10

6.0 Decision tree…………………………………………………………...................... 16

7.0 Logistic regression……………………………………………………………......... 23

8.0 Neural network………………………………………………………………........... 33

9.0 Best model comparison………………………………………………………….…. 39

10.0 Output explanation for best model…………………………………………….…… 42

11.0 Conclusion……………………………………………………………..................... 54

1
1.0 Introduction

1.1 Problem statement

A worker may leave work prematurely because of many factors, such as satisfaction level, last evaluation, number of projects, work accidents, average monthly working hours, time spent in the company, promotion within the last five years, department (sales) and salary. In our study, the prediction task is to determine whether a worker will leave work prematurely based on these factors.

1.2 Objectives

The research objectives are:

 To develop and compare three predictive models which are Logistic Regression, Neural
Network, and Decision Tree Model.

 To find the best predictive model for predicting the status of employees leaving work
prematurely.

1.3 Scope and limitation

There are some limitations of this study that need to be discussed. We had a limited source of data for our research, since we used secondary data collected by other researchers. There are nine input variables and one target. However, we did not need to filter the data or impute missing values, since the data contains no missing values or outliers.

2
2.0 Import Excel Data to SAS

1. Open SAS 9.3. Then, go to the File tab and click 'Import Data'.

2. Choose ‘Microsoft Excel Workbook’ then click Next.

3
3. Choose the table that we want to import and click Next.

4. Then, under library selection, select SASUSER and name the member with ‘HR_DATA’ and
click Next.

4
5. Browse to where the file should be saved and click Finish.

6. Then a message shows that the data has been successfully imported.

5
3.0 Create New Project

1. Open SAS Enterprise Miner Station 14.1

2. Click New Project.

3. Name the project ‘PROJECT DM’ and browse the SAS server directory then click next.

6
4. Then, click Finish.

7
4.0 Insert Data Into Project

1. Right click on data source and choose Create Data Source.

2. Select SAS Table then click Next.

8
3. Browse HR_DATA in sasuser and click Next.

4. Then, click Next until Finish.

9
5.0 Data Exploration

Before we begin model building and prediction, we must explore the data and, where necessary, modify and correct the data source. The raw data may not be ready for model building: it may contain too many missing values, outliers, or too many categories for a nominal measurement. Some model types, such as Neural Networks and Logistic Regression, cannot handle missing values, so some manipulation and modification may have to be applied to the data source. To deal with missing values, we must impute or delete the affected records. We must also regroup nominal variables that have too many levels and look for outliers. A variable is rejected when it has too many missing values or too many categories for a nominal measurement. Below are the steps for exploring and manipulating our data.
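As a side note before the point-and-click steps, the checks just described (missing values per column, and the number of levels of each nominal variable) can be sketched in plain Python; the sample rows and values here are hypothetical, not the real HR data:

```python
# Sketch of the pre-modelling checks described above, on a small
# hypothetical sample of the HR data (values invented for illustration).
rows = [
    {"satisfaction_level": 0.38, "sales": "sales",     "salary": "low"},
    {"satisfaction_level": 0.80, "sales": "technical", "salary": "medium"},
    {"satisfaction_level": None, "sales": "support",   "salary": "low"},
]

columns = rows[0].keys()

# 1) Count missing values per column (candidates for imputation/filtering).
missing = {c: sum(1 for r in rows if r[c] is None) for c in columns}

# 2) Count distinct levels of each nominal variable (too many levels may
#    force a regroup, or the variable being rejected).
levels = {c: len({r[c] for r in rows if isinstance(r[c], str)})
          for c in ("sales", "salary")}

print(missing)   # satisfaction_level has 1 missing value in this sample
print(levels)    # sales has 3 levels, salary has 2
```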

1. Right click and click ‘Create Diagram’ to create a new diagram

2. Enter ‘EXPLORE’ as the Diagram Name then click OK.

10
3. Drag HR_DATA from data sources into workspace, right click and select edit variable.

4. Look at the histogram charts. Identify problems such as missing values, too many categories in a nominal variable, and typing errors.

11
5. Since there are no missing values, no variables with too many categories, no typing errors and no outliers, there is no need to impute or filter the data.

6. Click Explore tab, drag StatExplore to the diagram and connect the data to StatExplore.

7. Run and see the results. The results show the worth of each variable.

12
8. Click Explore tab, drag Multiplot node to the diagram and connect the data to Multiplot.

9. Run and see the results. The results show the train graphs for each variable.

10. Click Sample tab, drag Sample node to the diagram and connect data to the Sample node.
After that, click Explore tab and drag Graph Explore node to the diagram. Connect the
Sample node to the Graph Explore node.

13
11. Then, run and see the results.

12. Click Sample tab and drag Data Partition node.

14
13. Click Data Partition node and under Data Set Allocations, change the Training to 70,
Validation to 30 and Test to 0.

15
6.0 Decision Tree

1. Select the Model tab. Drag five Decision Tree nodes to the diagram and connect each to the Data Partition node.
Name each decision tree as:
 DT_Gini
 DT_Entropy
 DT_Logworth
 DT_Chaid
 DT_Cart

2. Click DT_GINI node and view the properties. At the properties bar, make sure the nominal
target criterion is changed to ‘Gini’.

16
3. Click DT_ENTROPY node and view the properties. At the properties bar, make sure the
nominal target criterion is changed to ‘Entropy’.

4. Click DT_LOGWORTH node and view the properties. At the properties bar, make sure the
nominal target criterion is changed to ‘ProbChisq’.

17
5. Click DT_CHAID node and view the properties. At the properties bar, change:
 Nominal target criterion to ‘ProbChisq’.
 Significance Level to 0.05.
 Maximum Branch to 5
 Leaf Size to 1
 Split Size to 2
 Method to largest
 Assessment Measure to Decision
 Time of Bonferroni Adjustment to After

18
6. Click DT_CART node and view the properties. At the properties bar, change:
 Nominal target criterion to ‘Gini’
 Missing values to Largest Branch
 Number of Surrogate Rules to 5
 Exhaustive to 2000000000
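The nominal target criteria configured above (Gini, Entropy, ProbChisq) are different ways of scoring candidate splits. As a sketch, the Gini and entropy impurity of a leaf can be computed from its class counts; the counts below are hypothetical:

```python
import math

# Impurity measures behind the Gini and Entropy split criteria.
def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

# counts of (left=0, left=1) in a hypothetical leaf
print(gini([50, 50]))      # 0.5  (maximally impure two-class leaf)
print(entropy([50, 50]))   # 1.0
print(gini([100, 0]))      # 0.0  (pure leaf)
```

A split is preferred when it lowers the weighted impurity of the child leaves; the ProbChisq (logworth) criterion instead scores the significance of a chi-square test on the split.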

19
7. Then, drag a Model Comparison node from the Assess tab. Connect all decision tree nodes to the
Model Comparison node. Right click on the Model Comparison node and click Run.

8. Then, we obtain the result.

20
9. From the Fit Statistics results, find the best model using Microsoft Excel: copy the data from
the Fit Statistics table and paste it into Excel.

10. Find the gap (valid - train) for the average square error (ASE), misclassification rate (MR) and
ROC index.

11. After finding the gaps, identify the presence of underfitting and overfitting. There is no
underfit model, as no model has a negative ASE or MR gap, or a positive ROC gap.

12. Overfitting is identified by examining the absolute gap between the train and valid results,
choosing the model that yields the largest gaps overall. Since DT_CHAID has the largest gap for
ASE, MR and ROC index, DT_CHAID is the overfit model.

13. To find the best decision tree model, we eliminate the overfit model, then look for the lowest
valid ASE and valid MR and the largest valid ROC index.

14. Since DT_CART has the lowest valid ASE and valid MR and the largest valid ROC index,
DT_CART is the best model for decision tree.
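The gap screening in steps 10-14 can be sketched as follows; the fit statistics here are hypothetical stand-ins for the numbers copied into Excel, chosen only to reproduce the decision logic:

```python
# Sketch of the gap (valid - train) screening: the model with the largest
# gaps is flagged as overfit, then the best model is picked from the rest
# by its validation statistics. All numbers below are hypothetical.
models = {
    "DT_Gini":  {"train_ase": 0.030, "valid_ase": 0.034, "valid_mr": 0.045},
    "DT_Chaid": {"train_ase": 0.010, "valid_ase": 0.031, "valid_mr": 0.040},
    "DT_Cart":  {"train_ase": 0.028, "valid_ase": 0.030, "valid_mr": 0.038},
}

gaps = {m: s["valid_ase"] - s["train_ase"] for m, s in models.items()}
overfit = max(gaps, key=gaps.get)                  # largest gap -> overfit

candidates = {m: s for m, s in models.items() if m != overfit}
best = min(candidates, key=lambda m: candidates[m]["valid_mr"])

print(overfit, best)   # DT_Chaid DT_Cart
```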

21
The DT_CART model is better at predicting the employees that did not leave work
prematurely (negative target), since its specificity is higher than its sensitivity.
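The sensitivity and specificity behind this remark come from the validation confusion matrix; a sketch with hypothetical counts:

```python
# Sensitivity and specificity from a hypothetical validation confusion
# matrix, with left=1 (leaves prematurely) as the positive class.
tp, fn = 80, 20    # actual leavers: predicted 1 / predicted 0
tn, fp = 290, 10   # actual stayers: predicted 0 / predicted 1

sensitivity = tp / (tp + fn)   # true positive rate (leavers caught)
specificity = tn / (tn + fp)   # true negative rate (stayers recognised)

print(sensitivity, specificity)   # here specificity > sensitivity
```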

22
7.0 Logistic Regression

1. Select the Model tab. Drag seven Logistic Regression nodes to the diagram and connect each to the
Data Partition node. Name each logistic regression as:

 Reg_Main
 Reg_Poly
 Reg_Int
 Reg_Main_Poly
 Reg_Main_Int
 Reg_Poly_Int
 Reg_Main_Poly_Int

23
2. Click on the Reg_Main node; under the Equation table, Main Effect should be Yes and the
others No.

3. Click on the Reg_Poly node; under the Equation table, Polynomial Terms should be Yes and
the others No.

4. Click on the Reg_Int node; under the Equation table, Two-Factor Interactions should be Yes
and the others No.

24
5. Click on the Reg_Main_Poly node; under the Equation table, Main Effect and Polynomial
Terms should be Yes and the others No.

6. Click on the Reg_Main_Int node; under the Equation table, Main Effect and Two-Factor
Interactions should be Yes and the others No.


7. Click on the Reg_Poly_Int node; under the Equation table, Two-Factor Interactions and
Polynomial Terms should be Yes and the others No.

25
8. Click on the Reg_Main_Poly_Int node; under the Equation table, Main Effect, Two-Factor
Interactions and Polynomial Terms should all be Yes.
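The three Equation settings toggled in steps 2-8 control which terms enter the regression. A sketch of how one observation's terms expand; the two interval inputs and their values are hypothetical:

```python
from itertools import combinations

# Sketch of the three kinds of equation terms: main effects, two-factor
# interactions, and (squared) polynomial terms, for two hypothetical
# interval inputs.
x = {"satisfaction_level": 0.4, "last_evaluation": 0.7}

main = dict(x)                                          # main effects
inter = {f"{a}*{b}": x[a] * x[b]
         for a, b in combinations(x, 2)}                # two-factor interactions
poly = {f"{a}^2": v ** 2 for a, v in x.items()}         # polynomial terms (degree 2)

# Reg_Main_Poly_Int uses the union of all three groups of terms.
design_row = {**main, **inter, **poly}
print(design_row)
```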

9. Click on Assess tab and drag Model Comparison node to the diagram and connect all logistic
regression nodes to the Model Comparison (2).

26
10. Right click on Model Comparison (2) and click run. Then, see the results.

11. Copy the data from the Fit Statistics table and paste it into Microsoft Excel, then use it to
find the best model.

27
12. Find the gap (valid-train) for average square error (ASE), mean square error (MSE),
misclassification rate (MR), and ROC index.

13. After finding the gaps, identify the presence of underfitting and overfitting. There is no
underfit model, as no model has a negative ASE, MSE or MR gap, or a positive ROC gap.

14. Overfitting is identified by examining the absolute gap between the train and valid results,
choosing the model that yields the largest gaps overall. Since Reg_Poly_Int has the largest gap
for the majority of measures (ASE, MSE and ROC index), Reg_Poly_Int is the overfit model.

15. To find the best model, we eliminate the overfit model, then look for the lowest valid ASE,
valid MSE and valid MR, and the largest valid ROC index.

16. Since Reg_Main_Poly_Int has the lowest valid ASE, valid MSE and valid MR, and the largest
valid ROC index, Reg_Main_Poly_Int is the best model.

28
17. Since Reg_Main_Poly_Int is the best model so far, we need to compare it with three
selection-method models. Therefore, select the Model tab, drag another three Logistic Regression
nodes to the diagram and connect them to the Data Partition node. Name each logistic regression as:

 Reg_Main_Poly_Int_Forward
 Reg_Main_Poly_Int_Backward
 Reg_Main_Poly_Int_Stepwise

18. Model selection for Reg_Main_Poly_Int is none.

19. Model selection for Reg_Main_Poly_Int_Forward is forward.

20. Model selection for Reg_Main_Poly_Int_Backward is backward.

21. Model selection for Reg_Main_Poly_Int_Stepwise is stepwise.
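A minimal sketch of how forward selection works (Backward and Stepwise are analogous: backward starts from the full model and drops terms, stepwise allows both moves). The score function below is a toy stand-in for the node's actual significance-based criterion, and the candidate terms are hypothetical:

```python
# Forward selection sketch: repeatedly add the candidate term that most
# improves the score, stopping when no addition helps.
def forward_select(candidates, score):
    chosen = []
    best = score(chosen)
    improved = True
    while improved:
        improved = False
        for term in [c for c in candidates if c not in chosen]:
            trial = score(chosen + [term])
            if trial > best:
                best, chosen, improved = trial, chosen + [term], True
    return chosen

# Toy score: rewards two useful terms, penalises model size slightly.
useful = {"satisfaction_level": 0.30, "time_spend_company": 0.10}
score = lambda terms: sum(useful.get(t, 0.0) for t in terms) - 0.02 * len(terms)

print(forward_select(["satisfaction_level", "salary", "time_spend_company"], score))
```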

29
22. Click on Assess tab and drag Model Comparison node to the diagram and connect all logistic
regression nodes to the Model Comparison (3).

23. Right click on Model Comparison (3) and click run. Then, see the results.

30
24. Copy the data from the Fit Statistics table and paste it into Microsoft Excel, then use it to
find the best model.

25. Find the gap (valid-train) for average square error (ASE), mean square error (MSE),
misclassification rate (MR), and ROC index.

26. After finding the gaps, identify the presence of underfitting and overfitting. There is no
underfit model, as no model has a negative ASE, MSE or MR gap, or a positive ROC gap.

27. Overfitting is identified by examining the absolute gap between the train and valid results,
choosing the model that yields the largest gaps overall. Here there is no overfit model, since no
single model has the largest gap for a majority of the measures.

28. To find the best model, we look for the lowest valid ASE, valid MSE and valid MR, and the
largest valid ROC index.

29. Since Reg_Main_Poly_Int has the lowest valid ASE and valid MSE, and the largest valid ROC
index, Reg_Main_Poly_Int is the best model for logistic regression.

31
The Reg_Main_Poly_Int model is better at predicting the employees that did not leave work
prematurely (negative target), since its specificity is higher than its sensitivity.

32
8.0 Neural Network
1. Select the Model tab. Drag three Neural Network nodes to the diagram and connect them to the data
partition. Name each neural network as:

 NN_2
 NN_5
 NN_7

2. Drag a Variable Selection node (under the Explore tab) to the diagram and connect it to the
data. Then, from the Sample tab, drag a Data Partition node and connect it to the Variable Selection node.

3. Click Data Partition (2) and under Data Set Allocations, change the Training to 70, Validation
to 30 and Test to 0.

4. Select the Model tab. Drag another three Neural Network nodes to the diagram and connect them to
Data Partition (2). Name each neural network as:

 VS_NN_2
 VS_NN_5
 VS_NN_7

33
5. For the NN_2 and VS_NN_2 nodes, go to the properties panel and, under Network, change the Number
of Hidden Units to 2.

6. For the NN_5 and VS_NN_5 nodes, go to the properties panel and, under Network, change the Number
of Hidden Units to 5.

34
7. For the NN_7 and VS_NN_7 nodes, go to the properties panel and, under Network, change the Number
of Hidden Units to 7.
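What the Number of Hidden Units property controls can be sketched as a one-hidden-layer network: n hidden units feeding one output that gives P(left=1). The weights below are hypothetical (SAS estimates them during training), and the tanh/logistic activations are common defaults rather than the node's exact settings:

```python
import math

# Sketch: a single hidden layer of tanh units feeding one logistic output.
def mlp_forward(inputs, hidden_w, output_w):
    # hidden_w: one (input weights..., bias) row per hidden unit
    hidden = [math.tanh(sum(w * x for w, x in zip(ws, inputs)) + b)
              for *ws, b in hidden_w]
    z = sum(w * h for w, h in zip(output_w[:-1], hidden)) + output_w[-1]
    return 1.0 / (1.0 + math.exp(-z))      # P(left = 1)

# NN_2: two hidden units over two (standardized) inputs.
hidden_w = [(0.8, -1.2, 0.1),
            (-0.5, 0.9, 0.0)]
output_w = (1.5, -1.0, -0.2)               # hidden-unit weights + output bias

p = mlp_forward((0.4, 0.7), hidden_w, output_w)
print(p)   # a probability strictly between 0 and 1
```

NN_5 and NN_7 differ only in the number of rows in hidden_w; more hidden units give a more flexible (and more easily overfit) model.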

8. Go to Assess tab, drag Model Comparison node and connect all Neural Network nodes to the
Model Comparison node.

35
9. Next, run and see the results.

10. From the Fit Statistics results, find the best model using Microsoft Excel: copy the data from
the Fit Statistics table and paste it into Excel.

36
11. Find the gap (valid-train) for misclassification rate (MR), average square error (ASE), mean
square error (MSE) and ROC index.

12. After finding the gaps, identify the presence of underfitting and overfitting. There is no
underfit model, as no model has a negative ASE, MSE or MR gap, or a positive ROC gap.

13. Overfitting is identified by examining the absolute gap between the train and valid results,
choosing the model that yields the largest gaps overall. Since NN_5 has the largest gap for the
majority of measures (ASE, MSE and MR), NN_5 is the overfit model.

14. To find the best neural network model, we eliminate the overfit model, then look for the lowest
valid ASE, valid MSE and valid MR, and the largest valid ROC index.

15. Since NN_7 has the lowest valid ASE, valid MSE and valid MR, NN_7 is the best model for
neural network.

37
The NN_7 model is better at predicting the employees that did not leave work prematurely
(negative target), since its specificity is higher than its sensitivity.

38
9.0 Best Model Comparison

1. Since DT_CART is the best decision tree model, Reg_Main_Poly_Int the best logistic regression
model and NN_7 the best neural network model, we now compare these three models to choose the best
model for this study.

2. Drag a Model Comparison node to the diagram, then connect DT_CART, Reg_Main_Poly_Int
and NN_7 to the Model Comparison node which is Model Comparison (5).

39
3. Right click Model Comparison (5) and click Run. After that, see the results.

4. Copy the data from the Fit Statistics results into Microsoft Excel and compute the gap
(valid - train) for the misclassification rate (MR), average square error (ASE), and ROC index.

40
5. After finding the gaps, identify the presence of underfitting and overfitting. There is no
underfit model, as no model has a negative ASE or MR gap, or a positive ROC gap.

6. Overfitting is identified by examining the absolute gap between the train and valid results,
choosing the model that yields the largest gaps overall. Since NN_7 has the largest gap for ASE, MR
and ROC, NN_7 is the overfit model.

7. To find the best model, we eliminate the overfit model, then look for the lowest valid ASE and
valid MR and the largest valid ROC index. Since DT_CART has the lowest valid ASE and valid MR and
the largest valid ROC index, DT_CART is the best model for predicting the status of employees
leaving work prematurely.
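The ROC index used throughout these comparisons is the area under the ROC curve. On a small set of hypothetical validation scores it can be computed directly as the probability that a randomly chosen leaver gets a higher predicted score than a randomly chosen stayer (ties counting half):

```python
# ROC index (AUC) as a rank statistic over hypothetical validation scores.
def roc_index(scores_pos, scores_neg):
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

leavers = [0.9, 0.8, 0.7]   # predicted P(left=1) for actual leavers
stayers = [0.6, 0.4, 0.2]   # predicted P(left=1) for actual stayers

print(roc_index(leavers, stayers))   # 1.0 -- perfect separation here
```

An index of 0.5 means the model ranks no better than chance, which is why a larger valid ROC index is preferred when choosing the best model.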

41
10.0 Output Explanation for DT_CART

Output 1

From output 1, we know that the most important variable is Satisfaction Level. There are 9 important variables, ranked by the value of the 'Importance' column. The average_montly_hours variable is used 11 times as a split in the decision tree model. The satisfaction_level and time_spend_company variables are each used 10 times as splits. The last_evaluation variable is used 9 times, the number_project variable 5 times, and the Work_accident variable 2 times. The other three variables are each used once as a split.

42
Output 2

From output 2, we know that there are 9 variables involved in building the decision tree and there are 29 rules, represented by the number of leaves. The depth of this decision tree is 6.

43
Output 3

*------------------------------------------------------------*
Node = 10
*------------------------------------------------------------*
if satisfaction_level < 0.115
AND number_project >= 2.5 or MISSING
then
Tree Node Identifier = 10
Number of Observations = 626
Predicted: left=1 = 1.00
Predicted: left=0 = 0.00

*------------------------------------------------------------*
Node = 13
*------------------------------------------------------------*
if time_spend_company < 4.5 or MISSING
AND satisfaction_level >= 0.465 or MISSING
AND average_montly_hours >= 290.5
then
Tree Node Identifier = 13
Number of Observations = 6
Predicted: left=1 = 1.00
Predicted: left=0 = 0.00

*------------------------------------------------------------*
Node = 14
*------------------------------------------------------------*
if time_spend_company >= 4.5
AND satisfaction_level >= 0.465 or MISSING
AND last_evaluation < 0.805
then
Tree Node Identifier = 14
Number of Observations = 550
Predicted: left=1 = 0.04
Predicted: left=0 = 0.96

*------------------------------------------------------------*
Node = 19
*------------------------------------------------------------*
if satisfaction_level < 0.465
AND number_project < 2.5
AND average_montly_hours >= 279
then
Tree Node Identifier = 19
Number of Observations = 5
Predicted: left=1 = 0.60
Predicted: left=0 = 0.40

*------------------------------------------------------------*
Node = 21
*------------------------------------------------------------*
if satisfaction_level < 0.465 AND satisfaction_level >= 0.115 or MISSING
AND number_project >= 6.5
then
Tree Node Identifier = 21
Number of Observations = 12
Predicted: left=1 = 1.00
Predicted: left=0 = 0.00

*------------------------------------------------------------*
Node = 28
*------------------------------------------------------------*
if satisfaction_level < 0.465
AND number_project < 2.5
AND last_evaluation < 0.575 or MISSING
AND average_montly_hours < 125.5
then

Tree Node Identifier = 28
Number of Observations = 16
Predicted: left=1 = 0.00
Predicted: left=0 = 1.00

*------------------------------------------------------------*
Node = 31
*------------------------------------------------------------*
if satisfaction_level < 0.465
AND sales IS ONE OF: SALES, PRODUCT_MNG or MISSING
AND number_project < 2.5
AND last_evaluation >= 0.575
AND average_montly_hours < 162 or MISSING
then
Tree Node Identifier = 31
Number of Observations = 21
Predicted: left=1 = 0.00
Predicted: left=0 = 1.00

*------------------------------------------------------------*
Node = 32
*------------------------------------------------------------*
if satisfaction_level < 0.465
AND number_project < 2.5
AND average_montly_hours < 241 AND average_montly_hours >= 162 or MISSING
then
Tree Node Identifier = 32
Number of Observations = 74
Predicted: left=1 = 0.01
Predicted: left=0 = 0.99

*------------------------------------------------------------*
Node = 34
*------------------------------------------------------------*
if satisfaction_level < 0.465 AND satisfaction_level >= 0.115 or MISSING
AND number_project < 6.5 AND number_project >= 2.5 or MISSING
AND average_montly_hours < 289 or MISSING
then
Tree Node Identifier = 34
Number of Observations = 1095
Predicted: left=1 = 0.06
Predicted: left=0 = 0.94

*------------------------------------------------------------*
Node = 35
*------------------------------------------------------------*
if satisfaction_level < 0.465 AND satisfaction_level >= 0.115 or MISSING
AND number_project < 6.5 AND number_project >= 2.5 or MISSING
AND average_montly_hours >= 289
then
Tree Node Identifier = 35
Number of Observations = 7
Predicted: left=1 = 1.00
Predicted: left=0 = 0.00

*------------------------------------------------------------*
Node = 38
*------------------------------------------------------------*
if time_spend_company < 4.5 or MISSING
AND satisfaction_level >= 0.465 or MISSING
AND sales IS ONE OF: TECHNICAL, SUPPORT, IT or MISSING
AND number_project >= 5.5
AND average_montly_hours < 290.5 or MISSING
then
Tree Node Identifier = 38
Number of Observations = 47
Predicted: left=1 = 0.15
Predicted: left=0 = 0.85

*------------------------------------------------------------*

Node = 45
*------------------------------------------------------------*
if time_spend_company >= 4.5
AND satisfaction_level >= 0.465 or MISSING
AND last_evaluation >= 0.995
AND average_montly_hours < 216.5
then
Tree Node Identifier = 45
Number of Observations = 5
Predicted: left=1 = 0.80
Predicted: left=0 = 0.20

*------------------------------------------------------------*
Node = 47
*------------------------------------------------------------*
if time_spend_company >= 6.5
AND satisfaction_level >= 0.465 or MISSING
AND last_evaluation >= 0.805 or MISSING
AND average_montly_hours >= 216.5 or MISSING
then
Tree Node Identifier = 47
Number of Observations = 43
Predicted: left=1 = 0.00
Predicted: left=0 = 1.00

*------------------------------------------------------------*
Node = 48
*------------------------------------------------------------*
if satisfaction_level < 0.465
AND number_project < 2.5
AND last_evaluation < 0.445
AND average_montly_hours < 162 AND average_montly_hours >= 125.5 or MISSING
then
Tree Node Identifier = 48
Number of Observations = 10
Predicted: left=1 = 0.00
Predicted: left=0 = 1.00

*------------------------------------------------------------*
Node = 49
*------------------------------------------------------------*
if satisfaction_level < 0.465
AND number_project < 2.5
AND last_evaluation < 0.575 AND last_evaluation >= 0.445 or MISSING
AND average_montly_hours < 162 AND average_montly_hours >= 125.5 or MISSING
then
Tree Node Identifier = 49
Number of Observations = 1094
Predicted: left=1 = 0.99
Predicted: left=0 = 0.01

*------------------------------------------------------------*
Node = 50
*------------------------------------------------------------*
if satisfaction_level < 0.32 or MISSING
AND sales IS ONE OF: TECHNICAL
AND number_project < 2.5
AND last_evaluation >= 0.575
AND average_montly_hours < 162 or MISSING
then
Tree Node Identifier = 50
Number of Observations = 5
Predicted: left=1 = 0.00
Predicted: left=0 = 1.00

*------------------------------------------------------------*
Node = 51
*------------------------------------------------------------*
if satisfaction_level < 0.465 AND satisfaction_level >= 0.32
AND sales IS ONE OF: TECHNICAL
AND number_project < 2.5
AND last_evaluation >= 0.575
AND average_montly_hours < 162 or MISSING
then
Tree Node Identifier = 51
Number of Observations = 5
Predicted: left=1 = 0.40
Predicted: left=0 = 0.60

*------------------------------------------------------------*
Node = 54
*------------------------------------------------------------*
if satisfaction_level < 0.465
AND number_project < 2.5
AND last_evaluation < 0.585
AND average_montly_hours < 279 AND average_montly_hours >= 241
then
Tree Node Identifier = 54
Number of Observations = 6
Predicted: left=1 = 0.50
Predicted: left=0 = 0.50

*------------------------------------------------------------*
Node = 55
*------------------------------------------------------------*
if satisfaction_level < 0.465
AND number_project < 2.5
AND last_evaluation >= 0.585 or MISSING
AND average_montly_hours < 279 AND average_montly_hours >= 241
then
Tree Node Identifier = 55
Number of Observations = 12
Predicted: left=1 = 0.00
Predicted: left=0 = 1.00

*------------------------------------------------------------*
Node = 58
*------------------------------------------------------------*
if time_spend_company < 3.5 or MISSING
AND satisfaction_level >= 0.465 or MISSING
AND number_project < 2.5
AND average_montly_hours < 290.5 or MISSING
then
Tree Node Identifier = 58
Number of Observations = 292
Predicted: left=1 = 0.04
Predicted: left=0 = 0.96

*------------------------------------------------------------*
Node = 59
*------------------------------------------------------------*
if time_spend_company < 3.5 or MISSING
AND satisfaction_level >= 0.465 or MISSING
AND number_project < 5.5 AND number_project >= 2.5 or MISSING
AND average_montly_hours < 290.5 or MISSING
then
Tree Node Identifier = 59
Number of Observations = 4814
Predicted: left=1 = 0.00
Predicted: left=0 = 1.00

*------------------------------------------------------------*
Node = 60
*------------------------------------------------------------*
if time_spend_company < 4.5 AND time_spend_company >= 3.5
AND satisfaction_level >= 0.465 or MISSING
AND sales IS ONE OF: HR, TECHNICAL
AND number_project < 5.5 or MISSING
AND average_montly_hours < 290.5 or MISSING
then

Tree Node Identifier = 60
Number of Observations = 222
Predicted: left=1 = 0.08
Predicted: left=0 = 0.92

*------------------------------------------------------------*
Node = 61
*------------------------------------------------------------*
if time_spend_company < 4.5 AND time_spend_company >= 3.5
AND satisfaction_level >= 0.465 or MISSING
AND sales IS ONE OF: SALES, ACCOUNTING, SUPPORT, IT, PRODUCT_MNG, MARKETING, MANAGEMENT, RANDD or MISSING
AND number_project < 5.5 or MISSING
AND average_montly_hours < 290.5 or MISSING
then
Tree Node Identifier = 61
Number of Observations = 685
Predicted: left=1 = 0.02
Predicted: left=0 = 0.98

*------------------------------------------------------------*
Node = 64
*------------------------------------------------------------*
if time_spend_company < 2.5
AND satisfaction_level >= 0.465 or MISSING
AND sales IS ONE OF: SALES, PRODUCT_MNG, RANDD
AND number_project >= 5.5
AND average_montly_hours < 290.5 or MISSING
then
Tree Node Identifier = 64
Number of Observations = 11
Predicted: left=1 = 0.18
Predicted: left=0 = 0.82

*------------------------------------------------------------*
Node = 65
*------------------------------------------------------------*
if time_spend_company < 4.5 AND time_spend_company >= 2.5 or MISSING
AND satisfaction_level >= 0.465 or MISSING
AND sales IS ONE OF: SALES, PRODUCT_MNG, RANDD
AND number_project >= 5.5
AND average_montly_hours < 290.5 or MISSING
then
Tree Node Identifier = 65
Number of Observations = 34
Predicted: left=1 = 0.00
Predicted: left=0 = 1.00

*------------------------------------------------------------*
Node = 68
*------------------------------------------------------------*
if time_spend_company >= 4.5
AND satisfaction_level >= 0.465 or MISSING
AND number_project < 2.5
AND last_evaluation < 0.995 AND last_evaluation >= 0.805 or MISSING
AND average_montly_hours < 216.5
then
Tree Node Identifier = 68
Number of Observations = 14
Predicted: left=1 = 0.29
Predicted: left=0 = 0.71

*------------------------------------------------------------*
Node = 69
*------------------------------------------------------------*
if time_spend_company >= 4.5
AND satisfaction_level >= 0.465 or MISSING
AND number_project >= 2.5 or MISSING
AND last_evaluation < 0.995 AND last_evaluation >= 0.805 or MISSING
AND average_montly_hours < 216.5
then

Tree Node Identifier = 69
Number of Observations = 149
Predicted: left=1 = 0.03
Predicted: left=0 = 0.97

*------------------------------------------------------------*
Node = 70
*------------------------------------------------------------*
if time_spend_company < 6.5 AND time_spend_company >= 4.5 or MISSING
AND satisfaction_level < 0.705 AND satisfaction_level >= 0.465
AND last_evaluation >= 0.805 or MISSING
AND average_montly_hours >= 216.5 or MISSING
then
Tree Node Identifier = 70
Number of Observations = 32
Predicted: left=1 = 0.25
Predicted: left=0 = 0.75

*------------------------------------------------------------*
Node = 71
*------------------------------------------------------------*
if time_spend_company < 6.5 AND time_spend_company >= 4.5 or MISSING
AND satisfaction_level >= 0.705 or MISSING
AND last_evaluation >= 0.805 or MISSING
AND average_montly_hours >= 216.5 or MISSING
then
Tree Node Identifier = 71
Number of Observations = 606
Predicted: left=1 = 0.95
Predicted: left=0 = 0.05

From output 3, there are 29 rules in the decision tree model. The distribution of the rules is as
follows:

 There are 8 rules for predicting the employees that leave work prematurely (Y=1)
 There are 20 rules for predicting the employees that do not leave work prematurely (Y=0)
 There is 1 rule that cannot be used for predicting the target Y

There are 10498 observations used to grow the tree, which is the size of the training data set.
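Each English rule in output 3 translates directly into a predicate on the input variables. For example, Node 10 (which predicts left=1 with probability 1.00) can be sketched as a function; the example inputs are invented:

```python
# Node 10 from output 3: satisfaction_level < 0.115
#                        AND number_project >= 2.5 or MISSING
def node_10(satisfaction_level, number_project):
    missing = number_project is None   # "or MISSING" lets a missing value pass
    return (satisfaction_level is not None
            and satisfaction_level < 0.115
            and (missing or number_project >= 2.5))

print(node_10(0.09, 4))     # True  -> predicted to leave prematurely
print(node_10(0.50, 4))     # False -> rule does not fire
```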

The profile for predicting the worker that left (Y=1):


 if satisfaction_level < 0.465
AND number_project < 2.5
AND last_evaluation < 0.575 or MISSING
AND average_montly_hours < 125.5

 if time_spend_company >= 4.5


AND satisfaction_level >= 0.465 or MISSING
AND last_evaluation < 0.805

49
 if satisfaction_level < 0.465
AND sales IS ONE OF: SALES, PRODUCT_MNG or MISSING
AND number_project < 2.5
AND last_evaluation >= 0.575
AND average_montly_hours < 162 or MISSING

 if satisfaction_level < 0.465


AND number_project < 2.5
AND average_montly_hours < 241 AND average_montly_hours >= 162 or MISSING

 if satisfaction_level < 0.465 AND satisfaction_level >= 0.115 or MISSING


AND number_project < 6.5 AND number_project >= 2.5 or MISSING
AND average_montly_hours < 289 or MISSING

 if time_spend_company < 4.5 or MISSING


AND satisfaction_level >= 0.465 or MISSING
AND sales IS ONE OF: TECHNICAL, SUPPORT, IT or MISSING
AND number_project >= 5.5
AND average_montly_hours < 290.5 or MISSING

 if time_spend_company >= 6.5


AND satisfaction_level >= 0.465 or MISSING
AND last_evaluation >= 0.805 or MISSING
AND average_montly_hours >= 216.5 or MISSING

 if satisfaction_level < 0.465


AND number_project < 2.5
AND last_evaluation < 0.445
AND average_montly_hours < 162 AND average_montly_hours >= 125.5 or MISSING

 if satisfaction_level < 0.32 or MISSING


AND sales IS ONE OF: TECHNICAL
AND number_project < 2.5
AND last_evaluation >= 0.575
AND average_montly_hours < 162 or MISSING

 if satisfaction_level < 0.465 AND satisfaction_level >= 0.32


AND sales IS ONE OF: TECHNICAL
AND number_project < 2.5
AND last_evaluation >= 0.575
AND average_montly_hours < 162 or MISSING

 if satisfaction_level < 0.465


AND number_project < 2.5
AND last_evaluation >= 0.585 or MISSING
AND average_montly_hours < 279 AND average_montly_hours >= 241

50
 if time_spend_company < 3.5 or MISSING
AND satisfaction_level >= 0.465 or MISSING
AND number_project < 2.5
AND average_montly_hours < 290.5 or MISSING

 if time_spend_company < 3.5 or MISSING


AND satisfaction_level >= 0.465 or MISSING
AND number_project < 5.5 AND number_project >= 2.5 or MISSING
AND average_montly_hours < 290.5 or MISSING

 if time_spend_company < 4.5 AND time_spend_company >= 3.5


AND satisfaction_level >= 0.465 or MISSING
AND sales IS ONE OF: HR, TECHNICAL
AND number_project < 5.5 or MISSING
AND average_montly_hours < 290.5 or MISSING

 if time_spend_company < 4.5 AND time_spend_company >= 3.5


AND satisfaction_level >= 0.465 or MISSING
AND sales IS ONE OF: SALES, ACCOUNTING, SUPPORT, IT, PRODUCT_MNG, MARKETING,
MANAGEMENT, RANDD or MISSING
AND number_project < 5.5 or MISSING
AND average_montly_hours < 290.5 or MISSING

 if time_spend_company < 2.5
AND satisfaction_level >= 0.465 or MISSING
AND sales IS ONE OF: SALES, PRODUCT_MNG, RANDD
AND number_project >= 5.5
AND average_montly_hours < 290.5 or MISSING

 if time_spend_company < 4.5 AND time_spend_company >= 2.5 or MISSING
AND satisfaction_level >= 0.465 or MISSING
AND sales IS ONE OF: SALES, PRODUCT_MNG, RANDD
AND number_project >= 5.5
AND average_montly_hours < 290.5 or MISSING

 if time_spend_company >= 4.5
AND satisfaction_level >= 0.465 or MISSING
AND number_project < 2.5
AND last_evaluation < 0.995 AND last_evaluation >= 0.805 or MISSING
AND average_montly_hours < 216.5
 if time_spend_company >= 4.5
AND satisfaction_level >= 0.465 or MISSING
AND number_project >= 2.5 or MISSING
AND last_evaluation < 0.995 AND last_evaluation >= 0.805 or MISSING
AND average_montly_hours < 216.5

 if time_spend_company < 6.5 AND time_spend_company >= 4.5 or MISSING
AND satisfaction_level < 0.705 AND satisfaction_level >= 0.465
AND last_evaluation >= 0.805 or MISSING
AND average_montly_hours >= 216.5 or MISSING

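Each exported leaf rule above is simply a conjunction of range checks, where a condition flagged "or MISSING" also accepts a missing value. As an illustration only (the function names below are hypothetical; SAS Enterprise Miner generates its own score code), the first Y=1 leaf could be read in Python as:

```python
# Illustrative translation of the first Y=1 leaf rule into Python.
# Names are hypothetical; this only shows how to read the exported rule.

def in_range(value, low=None, high=None):
    """True if low <= value < high; a missing value (None) always passes,
    mirroring the 'or MISSING' flag in the exported rules."""
    if value is None:
        return True
    if low is not None and value < low:
        return False
    if high is not None and value >= high:
        return False
    return True

def y1_leaf_1(satisfaction_level, number_project, average_montly_hours):
    """0.115 <= satisfaction_level < 0.465, 2.5 <= number_project < 6.5,
    average_montly_hours < 289 -- each condition also accepting MISSING."""
    return (in_range(satisfaction_level, 0.115, 0.465)
            and in_range(number_project, 2.5, 6.5)
            and in_range(average_montly_hours, high=289))

print(y1_leaf_1(0.30, 4, 200))      # matches the leaf -> True
print(y1_leaf_1(0.80, 4, 200))      # satisfaction_level too high -> False
print(y1_leaf_1(None, None, None))  # all inputs missing -> True
```

A worker falling in this leaf is predicted to leave prematurely (Y=1).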
The profile for predicting a worker who does not leave (Y=0):
 if satisfaction_level < 0.115
AND number_project >= 2.5 or MISSING

 if time_spend_company < 4.5 or MISSING
AND satisfaction_level >= 0.465 or MISSING
AND average_montly_hours >= 290.5

 if satisfaction_level < 0.465
AND number_project < 2.5
AND average_montly_hours >= 279

 if satisfaction_level < 0.465 AND satisfaction_level >= 0.115 or MISSING
AND number_project >= 6.5

 if satisfaction_level < 0.465 AND satisfaction_level >= 0.115 or MISSING
AND number_project < 6.5 AND number_project >= 2.5 or MISSING
AND average_montly_hours >= 289

 if time_spend_company >= 4.5
AND satisfaction_level >= 0.465 or MISSING
AND last_evaluation >= 0.995
AND average_montly_hours < 216.5

 if satisfaction_level < 0.465
AND number_project < 2.5
AND last_evaluation < 0.575 AND last_evaluation >= 0.445 or MISSING
AND average_montly_hours < 162 AND average_montly_hours >= 125.5 or MISSING

 if time_spend_company < 6.5 AND time_spend_company >= 4.5 or MISSING
AND satisfaction_level >= 0.705 or MISSING
AND last_evaluation >= 0.805 or MISSING
AND average_montly_hours >= 216.5 or MISSING

OUTPUT 4

The plot shows the average squared error (ASE) corresponding to each subtree as the data is
sequentially split. The optimal subtree is the one with the smallest ASE on the validation
data set; here, the optimal subtree has 29 leaves.
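The selection logic can be sketched numerically. The (leaves, ASE) pairs below are invented for illustration, not taken from this output; the pruning step keeps the subtree with the smallest validation ASE, preferring fewer leaves on ties:

```python
# Hypothetical (number_of_leaves, validation ASE) pairs for candidate subtrees;
# the real values come from the SAS Enterprise Miner subtree assessment plot.
subtrees = [(5, 0.085), (10, 0.052), (20, 0.031), (29, 0.024),
            (40, 0.024), (55, 0.026)]

# Smallest ASE wins; ties go to the simpler subtree (fewer leaves).
best_leaves, best_ase = min(subtrees, key=lambda t: (t[1], t[0]))
print(best_leaves, best_ase)  # -> 29 0.024
```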

OUTPUT 5

The plot shows the misclassification rate corresponding to each subtree as the data is
sequentially split. The optimal subtree is the one with the smallest misclassification rate
on the validation data set; here, it also has 29 leaves.
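For reference, the misclassification rate plotted here is simply the share of validation cases whose predicted class differs from the actual class; a minimal sketch with made-up labels:

```python
# Made-up actual vs. predicted 'left' labels for ten validation cases.
actual    = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]

# Misclassification rate = (# mismatches) / (# cases).
misclassification_rate = sum(a != p for a, p in zip(actual, predicted)) / len(actual)
print(misclassification_rate)  # -> 0.2
```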

11.0 Conclusion

First, among the five decision tree models, DT_GINI, DT_ENTROPY, DT_LOGWORTH,
DT_CHAID and DT_CART, the results from SAS Enterprise Miner show that the best decision
tree model is DT_CART.

Second, among the seven logistic regression models, Reg_Main, Reg_Poly, Reg_Int,
Reg_Main_Poly, Reg_Main_Int, Reg_Poly_Int and Reg_Main_Poly_Int, the best model is
Reg_Main_Poly_Int. We then re-fitted Reg_Main_Poly_Int using the forward, backward and
stepwise selection methods. Among the four resulting models, Reg_Main_Poly_Int,
Reg_Main_Poly_Int_Forward, Reg_Main_Poly_Int_Backward and Reg_Main_Poly_Int_Stepwise,
the best logistic regression model remains Reg_Main_Poly_Int.

Third, among the six neural network models, NN_2, NN_5, NN_7, VS_NN_2,
VS_NN_5 and VS_NN_7, the best neural network model is NN_7.

Lastly, comparing DT_CART, Reg_Main_Poly_Int and NN_7, we found that the best model
for predicting which employees leave work prematurely is DT_CART.
