2 0 2 4 4
Version: 1.00
Pre-requisite: None
Course Objectives:
To introduce the concept of Data Mining and Data Preprocessing
To provide the skills required to handle large data sets
To develop the knowledge needed to apply mining algorithms for association and clustering
To introduce the algorithms for mining data streams
To explain the features of recommendation engine
Expected Outcomes:
The student will be able to:
Design data mining algorithms for real-world applications
Evaluate the performance of various data mining algorithms
Analyze and leverage data for real-time decision making
SLO: 17
Project
# Generally a team project [3 to 4 members].
# Concepts studied in XXXX should be used.
# A down-to-earth application and an innovative idea should be attempted.
# A report in digital format, with all drawings prepared using a software package, is to be submitted. [Ex. 1. Design of a
traffic light system using sequential circuits OR 2. Design of a digital clock]
# Assessment is on a continuous basis, with a minimum of 3 reviews.
Available online data sources may be used for exploring the following projects,
for example: Kaggle, the UCI repository, KDnuggets, the UCR Time Series Archive, etc.
Projects may be given as group projects
Sample Projects:
1. Using a programming language that you are familiar with, such as C++ or Java, implement recent
frequent/closed/maximal itemset mining algorithms. Compare the performance of each
algorithm with various kinds of large data sets. Write a report to analyze the situations (e.g., data
size, data distribution, minimal support threshold setting, and pattern density) where one
algorithm may perform better than the others, and state why.
2. The DBLP data set (www.informatik.uni-trier.de/~ley/db/) consists of over one million entries
of research papers published in computer science conferences and journals. Among these entries,
there are a good number of authors that have coauthor relationships.
(a) Propose a method to efficiently mine a set of coauthor relationships that are closely
correlated (e.g., often coauthoring papers together).
(b) Based on the mining results and the pattern evaluation measures, discuss which measure may
convincingly uncover close collaboration patterns better than others.
(c) Based on the study in (a), develop a method that can roughly predict advisor and advisee
relationships and the approximate period for such advisory supervision.
3. Implement associative classification algorithms and compare the performance of each
algorithm with various kinds of large data sets. Write a report to analyze the situations (e.g., data
size, data distribution, minimal support threshold setting, and pattern density) where one
algorithm may perform better than the others, and state why.
4. Implement fuzzy clustering and probabilistic clustering methods and compare the performance
of each algorithm with various kinds of large data sets. Write a report to analyze the situations
(e.g., data size, data distribution, pattern density and cluster validity) where one algorithm may
perform better than the others, and state why.
5. Implement and compare different outlier detection methods/outlier factors on various kinds of
large data sets. Write a report to analyze the situations (e.g., data size, data distribution, pattern
density) where one algorithm may perform better than the others, and state why.
6. Using a programming language that you are familiar with, such as C++ or Java, implement recent
algorithms for intent mining. Compare the performance of each algorithm with various kinds of
large data sets. Write a report to analyze the results where one algorithm may perform better than
the others, and state why.
7. Design and implement a sentiment analysis algorithm for a Twitter data set. Experiment with the
proposed idea using different classifiers and identify the best classifier for the chosen data set
based on different performance measures.
8. Design and implement content-based, user-based, and collaborative filtering techniques on any
benchmark dataset to build a recommender system. Prepare a report based on the performance
of the different methods to justify the choice of the best recommender system.
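As a starting point for Sample Project 1, the following is a minimal sketch of a level-wise (Apriori-style) frequent itemset search, written in Python for brevity. It is not one of the recent algorithms the project asks for, only a simple baseline to compare against. The function name frequent_itemsets, the TRANSACTIONS toy data, and the use of an absolute support count are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search.

    Returns a dict mapping each frequent itemset (a frozenset) to its
    support count. min_support is an absolute count, not a fraction.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: scan the data once per candidate.
        current = {}
        for candidate in candidates:
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                current[candidate] = support
        result.update(current)
        k += 1
    return result


# Hypothetical toy data; a real study would load a large benchmark data set.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

if __name__ == "__main__":
    # Lowering the threshold increases the number of frequent itemsets,
    # which is one of the factors the report is asked to analyze.
    for threshold in (4, 3, 2):
        found = frequent_itemsets(TRANSACTIONS, threshold)
        print(f"min_support={threshold}: {len(found)} frequent itemsets")
```

Timing this baseline at several support thresholds and data sizes gives a reference curve against which closed or maximal itemset miners can be compared.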
Lab SLO: 14
Indicative List of Experiments: