
Synopsis

on
Multithreaded Apriori Algorithm on Different Multicore System.

BACHELOR OF ENGINEERING
SUBMITTED BY
TEJASWI B. GUNJAL [roll no:16]
REVTI K. DIMBAR [roll no:11]
NEHA C. GUPTA [roll no:33]
Under the guidance of
Prof. Halkarnikar
Department of Computer Engineering
PADMASHREE DR D Y PATIL INSTITUTE OF
ENGINEERING MANAGEMENT AND RESEARCH AKURDI
PUNE-411 044.

1. INTRODUCTION WITH PROBLEM IDENTIFICATION

1.1 PROBLEM STATEMENT:-

A huge amount of data is collected from society through many different sources,
yet it has rarely led to useful knowledge. Extracting that knowledge requires a
mining algorithm. Apriori is an algorithm for mining databases that reveals which
items are related to each other. Databases whose sizes run to gigabytes and
terabytes need fast processing, and multi-core processors can provide it: running
the mining work in parallel across cores reduces execution time and improves
performance, whereas serial mining is time-consuming. To address this, we propose
a system in which the mining load is balanced among the processor cores. In this
work we implement the Apriori algorithm in both serial and parallel form, using
multithreaded Java as the parallel programming technique, and compare the two on
the basis of execution time for varying support counts.

1.1.1 INTRODUCTION :-

TECHNICAL KEYWORDS
1. Data Mining, e-Commerce, Apriori algorithm, association rules, support,
confidence, retail sector, parallel processing, multicore processing

Relevance of Work:
In computer science and data mining, Apriori is a classic algorithm for learning
association rules. Apriori is designed to operate on databases containing transactions (for
example, collections of items bought by customers, or details of visits to a website).
As is common in association rule mining, given a set of itemsets (for instance, sets of
retail transactions, each listing the individual items purchased), the algorithm attempts to find
subsets that are common to at least a minimum number C of the itemsets. Apriori uses a
"bottom-up" approach, in which frequent subsets are extended one item at a time (a step known as
candidate generation) and groups of candidates are tested against the data. The algorithm
terminates when no further successful extensions are found.
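
The test "common to at least a minimum number C of the itemsets" can be illustrated with a minimal Java sketch. The class name SupportCount, the helper methods, and the sample transactions are assumptions made only for illustration; they are not taken from the project's implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SupportCount {

    // Counts the transactions that contain every item of the given itemset.
    static int supportCount(List<Set<String>> transactions, Set<String> itemset) {
        int count = 0;
        for (Set<String> t : transactions) {
            if (t.containsAll(itemset)) {
                count++;
            }
        }
        return count;
    }

    // An itemset is frequent when its support count reaches the minimum count C.
    static boolean isFrequent(List<Set<String>> transactions, Set<String> itemset, int minCount) {
        return supportCount(transactions, itemset) >= minCount;
    }

    // Small helper to build a set of items.
    static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    // Hypothetical sample data, used only for illustration.
    static List<Set<String>> sampleTransactions() {
        return Arrays.asList(
                setOf("bread", "milk"),
                setOf("bread", "butter", "milk"),
                setOf("butter", "milk"),
                setOf("bread", "butter"));
    }

    public static void main(String[] args) {
        List<Set<String>> db = sampleTransactions();
        System.out.println(supportCount(db, setOf("bread", "milk")));   // prints 2
        System.out.println(isFrequent(db, setOf("bread", "milk"), 2));  // prints true
        System.out.println(isFrequent(db, setOf("bread", "milk"), 3));  // prints false
    }
}
```

Each call scans the whole transaction list once, which is why the number of candidates tested has a direct effect on running time.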

Apriori uses breadth-first search and a tree structure to count candidate item sets
efficiently. It generates candidate item sets of length k from item sets of length k - 1. Then it
prunes the candidates which have an infrequent sub-pattern. According to the downward closure
lemma, the candidate set contains all frequent k-length item sets. After that, it scans the
transaction database to determine the frequent item sets among the candidates. In short, Apriori
finds frequent itemsets by candidate generation, relying on the Apriori property that all nonempty
subsets of a frequent itemset must also be frequent.
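
The candidate generation and pruning described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions: itemsets are represented as ascending lists of integer item ids, and the class and method names (CandidateGeneration, generateCandidates, allSubsetsFrequent) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CandidateGeneration {

    // Builds candidate k-itemsets from the frequent (k-1)-itemsets.
    // Assumption: every itemset is a non-empty list of item ids in ascending order.
    static List<List<Integer>> generateCandidates(Set<List<Integer>> frequentPrev) {
        List<List<Integer>> candidates = new ArrayList<>();
        for (List<Integer> a : frequentPrev) {
            for (List<Integer> b : frequentPrev) {
                int n = a.size();
                // Join: same first (k-2) items, and a's last item precedes b's last item.
                if (a.subList(0, n - 1).equals(b.subList(0, n - 1))
                        && a.get(n - 1) < b.get(n - 1)) {
                    List<Integer> candidate = new ArrayList<>(a);
                    candidate.add(b.get(n - 1));
                    // Prune: keep the candidate only if every (k-1)-subset is frequent.
                    if (allSubsetsFrequent(candidate, frequentPrev)) {
                        candidates.add(candidate);
                    }
                }
            }
        }
        return candidates;
    }

    // Checks the downward closure condition for one candidate.
    static boolean allSubsetsFrequent(List<Integer> candidate, Set<List<Integer>> frequentPrev) {
        for (int i = 0; i < candidate.size(); i++) {
            List<Integer> subset = new ArrayList<>(candidate);
            subset.remove(i); // removes the item at index i
            if (!frequentPrev.contains(subset)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical frequent 2-itemsets.
        Set<List<Integer>> frequent2 = new HashSet<>(Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(1, 3),
                Arrays.asList(2, 3), Arrays.asList(2, 4)));
        // {2,3,4} is joined but pruned because {3,4} is not frequent.
        System.out.println(generateCandidates(frequent2)); // prints [[1, 2, 3]]
    }
}
```

Pruning before the database scan matters because every surviving candidate costs one containment test per transaction.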

1.1.2 ADVANTAGES :-

1. This application can be used to perform mining operations (Apriori
algorithm) on databases.
2. Mining can be done efficiently in terms of both space and time
cost.
3. The speed of computation is increased considerably by using newly deployed
multicore systems.
4. Analysis results are produced together with timing information, which is
useful for market basket analysis.
5. The analysis results can be presented visually.

DISADVANTAGES:-

1. The system must have more than one core.

1.1.3 APPLICATIONS:
1. It is used in data marts.
2. It is used in the share market.
3. It is also used in development centers.

2. LITERATURE SURVEY:
Association rule mining tries to find frequent patterns,
associations, correlations, or causal structures among sets of items or
objects in transaction databases, relational databases, etc.; that is
to say, it finds the relation or dependency between the occurrence of
one item and the occurrence of other items. The Apriori
algorithm is a basic algorithm for association rule mining.
Consider a supermarket that wants to introduce bundled sales. It
needs to find the items that are frequently purchased together. This is a typical
market basket analysis problem. The process analyzes customer
buying habits by finding associations between the different items
that customers place in their shopping baskets. The result can
help retailers develop marketing strategies by revealing
which items are frequently purchased together by customers.

Apriori is a good solution to this association rule mining problem.
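
The dependency between items described above is commonly measured with support and confidence, both listed among the technical keywords. The following Java sketch shows one way to compute them for a rule A -> B. The class name RuleConfidence, the method names, and the sample baskets are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RuleConfidence {

    // Number of transactions containing all of the given items.
    static int supportCount(List<Set<String>> transactions, Set<String> items) {
        int count = 0;
        for (Set<String> t : transactions) {
            if (t.containsAll(items)) {
                count++;
            }
        }
        return count;
    }

    // Support of the rule A -> B: fraction of transactions containing A and B together.
    static double support(List<Set<String>> transactions, Set<String> a, Set<String> b) {
        Set<String> both = new HashSet<>(a);
        both.addAll(b);
        return (double) supportCount(transactions, both) / transactions.size();
    }

    // Confidence of the rule A -> B: of the transactions containing A, the fraction that also contain B.
    static double confidence(List<Set<String>> transactions, Set<String> a, Set<String> b) {
        Set<String> both = new HashSet<>(a);
        both.addAll(b);
        int countA = supportCount(transactions, a);
        return countA == 0 ? 0.0 : (double) supportCount(transactions, both) / countA;
    }

    static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    // Hypothetical shopping baskets, used only for illustration.
    static List<Set<String>> sampleBaskets() {
        return Arrays.asList(
                setOf("bread", "butter"),
                setOf("bread", "butter", "jam"),
                setOf("bread", "milk"),
                setOf("milk"));
    }

    public static void main(String[] args) {
        List<Set<String>> baskets = sampleBaskets();
        // Rule: bread -> butter. Two of four baskets hold both; two of three bread baskets hold butter.
        System.out.println(support(baskets, setOf("bread"), setOf("butter")));    // prints 0.5
        System.out.println(confidence(baskets, setOf("bread"), setOf("butter")));
    }
}
```

A retailer would typically keep only the rules whose support and confidence both exceed chosen thresholds.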

3. OBJECTIVES OF PROJECT:

Objectives :
To generate frequent itemsets using the Apriori algorithm. Our aim is to implement a
mining system using both a serial approach and a parallel approach, with a
focus on enhancing the performance of the Apriori algorithm.

Implementation Modules:
1. Authentication Module
2. User Interface Module
3. Serial Approach Module
3.1. Candidate Generation Module (Single-Threaded)
3.2. Frequent Item Calculation Module (Single-Threaded)
4. Parallel Approach Module
4.1. Candidate Generation Module (Multi-Threaded)
4.2. Frequent Item Calculation Module (Multi-Threaded)
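
The difference between the single-threaded and multi-threaded frequent item calculation modules can be sketched as follows. This is only an illustration under stated assumptions, not the project's implementation: the transaction list is split into equal slices, each slice is counted by one worker of a java.util.concurrent thread pool, and the partial counts are summed. The class and method names are hypothetical, and the lambda and Map.merge require Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSupportCount {

    // Single-threaded: support count of every candidate over the given transactions.
    static Map<Set<String>, Integer> countSerial(List<Set<String>> transactions,
                                                 List<Set<String>> candidates) {
        Map<Set<String>, Integer> counts = new HashMap<>();
        for (Set<String> c : candidates) {
            int count = 0;
            for (Set<String> t : transactions) {
                if (t.containsAll(c)) {
                    count++;
                }
            }
            counts.put(c, count);
        }
        return counts;
    }

    // Multi-threaded: each worker counts one slice, then the partial counts are summed.
    static Map<Set<String>, Integer> countParallel(List<Set<String>> transactions,
                                                   List<Set<String>> candidates,
                                                   int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Equal-sized slices give each worker roughly the same load.
            int chunk = Math.max(1, (transactions.size() + threads - 1) / threads);
            List<Future<Map<Set<String>, Integer>>> parts = new ArrayList<>();
            for (int start = 0; start < transactions.size(); start += chunk) {
                int end = Math.min(start + chunk, transactions.size());
                List<Set<String>> slice = transactions.subList(start, end);
                Callable<Map<Set<String>, Integer>> task = () -> countSerial(slice, candidates);
                parts.add(pool.submit(task));
            }
            Map<Set<String>, Integer> total = new HashMap<>();
            for (Future<Map<Set<String>, Integer>> part : parts) {
                for (Map.Entry<Set<String>, Integer> e : part.get().entrySet()) {
                    total.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    // Hypothetical sample data, used only for illustration.
    static List<Set<String>> sampleTransactions() {
        return Arrays.asList(
                setOf("bread", "milk"), setOf("bread", "butter", "milk"),
                setOf("butter", "milk"), setOf("bread", "butter"), setOf("milk"));
    }

    public static void main(String[] args) throws Exception {
        List<Set<String>> db = sampleTransactions();
        List<Set<String>> candidates = Arrays.asList(setOf("bread"), setOf("bread", "milk"));
        Map<Set<String>, Integer> serial = countSerial(db, candidates);
        Map<Set<String>, Integer> parallel = countParallel(db, candidates, 2);
        System.out.println(serial.equals(parallel)); // prints true
    }
}
```

The workers only read the shared candidate list and write to their own maps, so no locking is needed until the final merge, which runs on the calling thread.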

Project Scope
This project implements a parallel Apriori
algorithm on new-generation multicore
processors. We are going to implement the array-based
(bitmap-based) Apriori algorithm. In addition,
as specified in the system architecture, the master
node will interact with other distributed systems
and execute the mining operations on those systems
in parallel.
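
One possible reading of the array-based (bitmap-based) representation is sketched below in Java, assuming one bitmap per item in which bit t is set when transaction t contains that item. The support count of an itemset is then the number of bits left after AND-ing the bitmaps of its items. The use of java.util.BitSet, the class name BitmapSupport, and the sample data are assumptions; the synopsis does not specify the layout.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BitmapSupport {

    // Builds one bitmap per item: bit t is set when transaction t contains the item.
    static Map<String, BitSet> buildBitmaps(List<Set<String>> transactions) {
        Map<String, BitSet> bitmaps = new HashMap<>();
        for (int t = 0; t < transactions.size(); t++) {
            for (String item : transactions.get(t)) {
                bitmaps.computeIfAbsent(item, k -> new BitSet()).set(t);
            }
        }
        return bitmaps;
    }

    // Support count of an itemset: AND the item bitmaps and count the remaining bits.
    static int supportCount(Map<String, BitSet> bitmaps, Collection<String> itemset,
                            int transactionCount) {
        BitSet acc = new BitSet(transactionCount);
        acc.set(0, transactionCount); // start with every transaction selected
        for (String item : itemset) {
            BitSet bits = bitmaps.get(item);
            if (bits == null) {
                return 0; // the item never occurs
            }
            acc.and(bits);
        }
        return acc.cardinality();
    }

    static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    // Hypothetical sample data, used only for illustration.
    static List<Set<String>> sampleTransactions() {
        return Arrays.asList(
                setOf("bread", "milk"),
                setOf("bread", "butter", "milk"),
                setOf("butter", "milk"),
                setOf("bread", "butter"));
    }

    public static void main(String[] args) {
        List<Set<String>> db = sampleTransactions();
        Map<String, BitSet> bitmaps = buildBitmaps(db);
        System.out.println(supportCount(bitmaps, setOf("bread", "milk"), db.size()));           // prints 2
        System.out.println(supportCount(bitmaps, setOf("bread", "butter", "milk"), db.size())); // prints 1
    }
}
```

With this layout the database is read once to build the bitmaps, and each later support count becomes a sequence of bitwise AND operations rather than a rescan of the transactions.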

System Architecture

4. METHODOLOGY:
Step 1:
The prune step: The entire database is scanned to find the count of each candidate in Ck, where Ck
denotes the set of candidate k-itemsets. The count of each itemset in Ck is compared with a predefined
minimum support count to decide whether that itemset can be placed in the set of frequent k-itemsets Lk [1].

Step 2:
The join step: Lk is natural-joined with itself to obtain the next set of candidate (k+1)-itemsets, Ck+1.
The major step here is the prune step, which requires scanning the entire database to find the count of
each itemset in every set of candidate k-itemsets. If the database is large, finding all the frequent
itemsets in it therefore takes more time [1].
The Apriori algorithm is an influential algorithm for mining frequent itemsets for Boolean
association rules. The key concepts are as follows:

Frequent Itemsets: The sets of items that have minimum support (denoted by Li for the ith itemset).
Apriori Property: Any subset of a frequent itemset must be frequent.
Join Operation: To find Lk, a set of candidate k-itemsets is generated by joining Lk-1 with itself.
Join Step: Ck is generated by joining Lk-1 with itself.
Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset.
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k
L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
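
The loop above can be turned into a small single-threaded Java program. The sketch below is one possible rendering under stated assumptions: items are integer ids, itemsets are ascending lists, min_support is an absolute count, and the class name SerialApriori, its helper methods, and the sample database are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SerialApriori {

    // Returns every frequent itemset (an ascending list of item ids) with its support count.
    static Map<List<Integer>, Integer> apriori(List<Set<Integer>> db, int minSupport) {
        Map<List<Integer>, Integer> allFrequent = new HashMap<>();
        // C1: one candidate per distinct item.
        Set<Integer> items = new TreeSet<>();
        for (Set<Integer> t : db) {
            items.addAll(t);
        }
        List<List<Integer>> candidates = new ArrayList<>();
        for (Integer item : items) {
            candidates.add(Collections.singletonList(item));
        }
        while (!candidates.isEmpty()) {
            // Scan the database and keep the candidates that meet min_support (Lk).
            Map<List<Integer>, Integer> frequent = new HashMap<>();
            for (List<Integer> c : candidates) {
                int count = 0;
                for (Set<Integer> t : db) {
                    if (t.containsAll(c)) {
                        count++;
                    }
                }
                if (count >= minSupport) {
                    frequent.put(c, count);
                }
            }
            allFrequent.putAll(frequent);
            // Ck+1 = candidates generated from Lk.
            candidates = generateCandidates(frequent.keySet());
        }
        return allFrequent;
    }

    // Join Lk with itself, then drop candidates that have an infrequent k-subset.
    static List<List<Integer>> generateCandidates(Set<List<Integer>> frequent) {
        List<List<Integer>> next = new ArrayList<>();
        for (List<Integer> a : frequent) {
            for (List<Integer> b : frequent) {
                int n = a.size();
                if (a.subList(0, n - 1).equals(b.subList(0, n - 1))
                        && a.get(n - 1) < b.get(n - 1)) {
                    List<Integer> c = new ArrayList<>(a);
                    c.add(b.get(n - 1));
                    if (allSubsetsFrequent(c, frequent)) {
                        next.add(c);
                    }
                }
            }
        }
        return next;
    }

    static boolean allSubsetsFrequent(List<Integer> candidate, Set<List<Integer>> frequent) {
        for (int i = 0; i < candidate.size(); i++) {
            List<Integer> subset = new ArrayList<>(candidate);
            subset.remove(i); // removes the item at index i
            if (!frequent.contains(subset)) {
                return false;
            }
        }
        return true;
    }

    static Set<Integer> tx(Integer... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    // Hypothetical sample database of nine transactions.
    static List<Set<Integer>> sampleDb() {
        return Arrays.asList(tx(1, 2, 5), tx(2, 4), tx(2, 3), tx(1, 2, 4), tx(1, 3),
                tx(2, 3), tx(1, 3), tx(1, 2, 3, 5), tx(1, 2, 3));
    }

    public static void main(String[] args) {
        Map<List<Integer>, Integer> result = apriori(sampleDb(), 2);
        System.out.println(result.size());                      // prints 13
        System.out.println(result.get(Arrays.asList(1, 2, 3))); // prints 2
    }
}
```

The inner loop over the database is the part that dominates running time on large inputs, which makes it the natural target for the parallel approach.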

HARDWARE REQUIREMENTS:

Hardware         : Dual Core (minimum)
Speed            : 1.1 GHz
RAM              : 1 GB
Hard Disk        : 20 GB
Floppy Drive     : 1.44 MB
Keyboard         : Standard Windows Keyboard
Mouse            : Two or Three Button Mouse
Monitor          : SVGA

SOFTWARE REQUIREMENTS:

Operating System : Any (Windows/Linux)
Technology       : Java and J2EE
IDE              : MyEclipse
Java Version     : J2SDK1.5 or later

5. POSSIBLE OUTCOMES:
Approximate Output must be written in this.

6. REFERENCES:
1. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2nd edition, Morgan
Kaufmann Publishers, San Francisco, 2006.
2. N. Ricci, S. Guyer, and J. E. Moss, Elephant Tracks: Portable Production Complete and Precise
GC Traces, in ISMM, 2013.
3. S. Blackburn, R. Garner, C. Hoffmann, et al., The DaCapo Benchmarks: Java Benchmarking
Development and Analysis, in OOPSLA, 2006.
4. "The 6 biggest challenges retailer Face today", www.onStepRetail.com, retrieved June 2011.
5. Berry, M. J. A. and Linoff, G., Data mining techniques for marketing, sales and customer support,
USA: John Wiley and Sons, 1997.
6. Andre Bergmann, "Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction,
and Quality Control". Fayyad, U. M.; Piatetsky-Shapiro, G.; Smyth, P.; and Uthurusamy, R. 1996.
7. Advances in Knowledge Discovery and Data Mining. Menlo Park, Calif.: AAAI Press. Dr. Gary
Parker, vol 7, 2004, Data Mining: Modules in emerging fields, CD-ROM.
8. Jiawei Han and Micheline Kamber (2006), Data Mining: Concepts and Techniques, published by
Morgan Kaufmann, 2nd ed.
9. Literature Review: Data mining, http://nccur.lib.nccu.edu. twlbitstream/1 40.1 I 9/3523I/S/35603
I OS.pdf, retrieved June 2012.
10. H. Mahgoub, "Mining association rules from unstructured documents", in Proc. 3rd Int. Conf. on
Knowledge Mining, ICKM, Prague, Czech Republic, Aug. 25-27, 2006, pp. 167-172. S. Kannan
and R. Bhaskaran, "Association rule pruning based on interestingness measures with clustering",
International Journal of Computer Science Issues, IJCSI, 6(1), 2009, pp. 35-43.
11. M. Ashrafi, D. Taniar, and K. Smith, "A New Approach of Eliminating Redundant Association
Rules", Lecture Notes in Computer Science, Volume 3180, 2004, pp. 465-474.
12. http://wenku.baidu.com/view/972ef7c66137ee06eff91824.html
13. J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2000.
14. Christopher M. Bishop, Pattern Recognition and Machine Learning, 2006.
15. Data Mining by Dr. Hall (http://www.cse.usf.edu/~hall/dm/)
16. http://en.wikipedia.org/wiki/Apriori_algorithm

Conferences:

Sr. No | Name of Conference | Date | Location
1 | IFERP - International Conference on Emerging Trends in Engineering and Technology (ICET-15) | 6th September 2015 | Pune
2 | IRAJ - International Conference on Electrical Electronics & Communication Engineering (ICEECE) | 6th September 2015 | Pune
3 | IFERP - International Conference on Current Advances in Electronics, Electrical and Computer Science (ICEECS-15) | 13th September 2015 | Pune
4 | IFERP - International Conference on Current Advances in Electronics, Electrical and Computer Science (ICEECS-15) | 20th September 2015 | Pune
5 | IFERP - International Conference on Current Advances in Electronics, Electrical and Computer Science (ICEECS-15) | 27th September 2015 | Pune
6 | National Conference on Science, Technology, Engineering, Mathematics and Sustainability (CSTEMS 2015) | 4th October 2015 | Pune
7 | Symbiosis Institute of Management Studies Annual Research Conference (SIMSARC) 2015 | 11th December 2015 | Pune
8 | 2nd International Conference on Advances in Computing and Management (ICACM) 2016 | 15th January 2016 | Pune
9 | International Conference on Data Engineering and Communication Technology (ICDECT) by Aspire Research Foundation | 10th-11th March 2016 | Lavasa, Pune

(Guide and Project Co-ordinator)
