
CERTIFICATE

ii
ACKNOWLEDGEMENT
iii
DECLARATION
iv
ABSTRACT
v
CONTENTS
vi
LIST OF FIGURES

LIST OF TABLES
xii

CONTENTS
1. Introduction

1.1 Motivation

1.2 Objectives

1.3 Scope

1.4 Limitations

1.5 Software and Hardware Requirements

1.6 Algorithms

2. Literature Review

2.1 Introduction

2.2 Data Pre-Processing

2.2.1 Data Cleaning

2.2.1.1 Problems in Data Cleaning

2.2.1.2 Popular Methods used

2.2.2 Data Reduction

2.2.2.1 Best Practices used

2.2.3 Data Integration

6
6

2.2.4 Data Transformation

2.3 Missing Data

2.3.1 Handling Missing Values

2.4 Attribute-Relation File Format (ARFF)

2.4.1 Types of Attributes

2.5 Kinds of Data

2.5.1 Discrete Data

2.5.2 Continuous Data

10

2.6 Clustering

10

2.6.1 K-Means Algorithm

10

2.6.1.1 Working of K-Means Algorithm


2.6.2 K-Medoid Algorithm

10
11

2.7 Information Gain

11

2.8 Decision Tree

12

2.9 Iterative Dichotomiser 3 (ID3) Algorithm

13

2.9.1 Working of ID3


2.10 Datasets

14

2.11 Training set and Test set

15

3. Weka

16

3.1 What constitutes Weka?

16

3.2 Implementing Weka

17

3.3 Preparing the Data

18

3.4 Studying the Explorer

20

3.5 Using the Explorer

20

3.6 Filtering Algorithms

21

3.6.1 Kinds of Filters

21

3.6.1.1 Unsupervised Attribute Filters

21

3.6.1.2 Unsupervised Instance Filters

24

3.6.1.3 Supervised Attribute Filters

25

3.6.1.4 Supervised Instance Filters

25

3.7 Clustering

25

3.7.1 Clustering Algorithms supported by Weka

25

3.7.2 Clustering Algorithms Used in Project

26

3.8 Classifiers

26

3.8.1 Tree Classifiers used in Project

26

3.9 Attribute Selection

27

3.9.1 Attribute Evaluation Methods for Attribute Selection supported by Weka

27

3.9.1.1 Attribute Subset Evaluator

27

3.9.1.2 Single Attribute Evaluator

28

3.9.2 Search Methods for Attribute Selection supported by Weka

28

3.9.3 Ranking Methods for Attribute Selection supported by Weka

28

3.9.4 Attribute Evaluator used in Project

29

3.9.5 Search Method used in Project

29

4. Project Analysis

30

4.1 Introduction

30

4.2 Steps followed in achieving the Goal of the Project

30

4.3 Screenshots showing various phases of our Project

32

5. Project Design

47

5.1 Unified Modeling Language (UML)

47

5.2 Sequence Diagram


6. Testing

48

6.1 Introduction

48

6.2 Testing Methods

48

6.2.1 Black box Testing

48

6.2.2 White box Testing

48

6.3 Testing can be done on the following levels

48

6.3.1 Unit Testing

48

6.3.2 Integration Testing

49

6.4 Final Testing

49

6.4.1 Alpha Testing

49

6.4.2 Beta Testing

49

6.5 Testing our Application

49

6.6 Result Analysis

49

6.6.1 Scope of Testing

49

6.6.2 Quantitative Grading

50
51

7. Conclusion and Future Scope


7.1 Conclusion

51

7.2 Future Scope

51

8. Bibliography

52

Appendix

List of Figures
1. Definition of Information Gain

12

2. Dataset on which Information gain is calculated

12

3. Decision based on Information Gain

12

4. Example showing Decision Tree classification

13

4.1 Result of the Dataset

13

5. Generalized Dataset

19

6. Test set used for Prediction

19

7. Raw data of crops cultivated across Guntur

32

8. Conversion of Excel sheet into ARFF format

33

9. Graphical User Interface

33

10. Selecting a Dataset

34

11. Selecting a dataset of required .arff File

34

12. Loading .arff file into GUI

35

13. GUI displaying the Clustering Algorithm K-Means

35

14. Implementing a Classification Algorithm (ID3) on a Dataset

36

15. Generating Tree for GUI Loading .arff File

36

16. Tree Generated For .arff File

37

17. Rules Generated For the .arff File

37

18. Tree Generation for Selected Attribute

38

19. Loading of .arff File for required attribute

38

20. Selecting the attribute Mandal for the given .arff File

39

21. Tree Generation for the attribute Mandal

39

22. Showing the Association rules

40

23. Showing the output after Classification

40

24. Showing the output after Classification

41

25. Options required to be selected for Displaying Prediction

41

26. Result after implementing Classification Algorithm

42

27. Result after implementing Classification Algorithm

42

28. Result after implementing Classification Algorithm

43

29. Result after implementing Classification Algorithm

43

30. Sequence Diagram showing how the User Interacts with the GUI

47


TABLES
1. Scope of Testing

50

2. Quantitative Grading

50

