PROJECT GUIDE:
Prof. Akhilesh Tiwari
Department of CSE & IT, MITS, Gwalior
IBM
ASSIDUOUS GROUP
TEAM MEMBERS:
Version I 30-01-2012
Table of Contents
Description
1. Introduction
1.1 Purpose
1.2 Scope
1.4 References
1.6 Overview
2. Overall Description
Assiduous Group/MITS
1. Introduction:
1.1 Purpose:
The amount of data being collected in databases today far exceeds our ability to reduce
and analyze data without the use of automated analysis techniques. Many scientific and transactional business databases grow at a phenomenal rate. Knowledge discovery in
databases (KDD) is the field that is evolving to provide automated analysis solutions.
In view of the above, the purpose is to analyze market basket data for the extraction of hidden
1.2 Scope:
Suppose that, as the manager of an AllElectronics branch, you would like to learn the buying habits of your customers. Specifically, you wonder: "Which groups or sets of items are customers likely to purchase on a given trip to the store?" To answer this question, market basket analysis may be performed on the retail data of customer transactions at your store. The results may be used to plan marketing or advertising strategies, as well as catalog design. For instance, market basket analysis may help managers design different store layouts. In one strategy, items that are frequently purchased together can be placed in close proximity in order to further encourage the sale of such items together.
Although market basket analysis conjures up pictures of shopping carts and supermarket shoppers, it is important to realize that there are many other areas in which it can be applied. These include analysis of credit card purchases, analysis of telephone calling patterns, and identification of fraudulent medical insurance claims (consider cases where common rules are broken).
The following are the common data mining techniques: association rules, classification, clustering, sequence rules, and generalization and summarization.
Since the proposed project is related to Association Rule Mining, a brief description of
Market Basket Analysis Software Requirement Specification
1.3.3 Association Rule:
Mining association rules in transactional or relational databases has recently attracted a lot of attention in the database community. The task is to find interesting associations or correlations among a large set of data, i.e. to identify sets of items or predicates that frequently occur together and then formulate rules that characterize their relationship. For example, one may find, from a large set of transaction data, an association rule such as: if a customer buys "X", he/she usually buys "Y" in the same transaction. Here "X" and "Y" are individual items or sets of items. Retail stores frequently use association rules to assist marketing, advertising, floor management and inventory control. Although association rules are directly applicable to retail business, they can also be used for other purposes.
A formal statement of the association rule problem is given in [1]. Let $I = \{i_1, i_2, i_3, i_4, \ldots, i_m\}$ be a set of $m$ distinct literals called items. An association rule is an implication of the form $X \Rightarrow Y$, where $X, Y \subset I$ and $X \cap Y = \emptyset$. Here $X$ is called the antecedent or body and $Y$ is called the consequent or head of the rule.
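As a minimal sketch of the definition above, the snippet below models items as strings and a rule as a pair of item sets, and checks the two stated conditions (both sides drawn from the item set I, and no item shared between them). The class name `Rule`, the method `is_valid` and the item names are hypothetical, chosen only for illustration; they are not part of the proposed system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """An association rule X => Y (illustrative representation only)."""
    antecedent: frozenset  # X, the body of the rule
    consequent: frozenset  # Y, the head of the rule

    def is_valid(self, items: frozenset) -> bool:
        # X and Y must both be drawn from I and must not share any item.
        return (
            self.antecedent <= items
            and self.consequent <= items
            and self.antecedent.isdisjoint(self.consequent)
        )


# Hypothetical item universe I and two candidate rules.
ITEMS = frozenset({"bread", "butter", "milk"})
good = Rule(frozenset({"bread", "butter"}), frozenset({"milk"}))
bad = Rule(frozenset({"bread"}), frozenset({"bread", "milk"}))  # X and Y overlap
```

Here `good.is_valid(ITEMS)` holds, while `bad.is_valid(ITEMS)` does not, because "bread" appears on both sides of the rule.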
1.4 References:
[1] R. Agrawal, T. Imielinski, and A. Swami. Mining Association Rules between Sets of Items in Large Databases. In Proceedings of the ACM-SIGMOD International Conference on Management of Data, Washington D.C., May 1993, pp. 207-216.
[2] Abraham Silberschatz, Henry F. Korth and S. Sudarshan. Database System Concepts. McGraw-Hill, Fifth Edition, 2006.
[3] Margaret H. Dunham. Data Mining. Pearson Education, Seventh Edition, 2005.
J2EE (Servlet, JSP, JAX): Java Platform, Enterprise Edition (Java EE) is a widely used platform for server programming in the Java programming language. It differs from the Java Standard Edition Platform (Java SE) in that it adds libraries that provide functionality to deploy fault-tolerant, distributed, multi-tier Java software, based largely on modular components running on an application server.
1.6 Overview:
Overall Description: Processes during the tenure of the project
(i) Study of Apriori Algorithm
(ii) Data Collection
(iii) Implementation of Apriori Algorithm
(iv) Development of user interface
(v) Application of Apriori on collected market basket data
(vi) Analysis of results
Specific Requirements: Real-life dataset (Market Basket Data)
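The Apriori algorithm named in the process list above can be sketched as follows. This is a textbook-style illustration in Python, not the project's implementation: the function name `apriori`, the parameter `min_support_count` and the toy baskets are all assumptions made for the example. It finds frequent itemsets level by level, joining frequent (k-1)-itemsets into k-item candidates and pruning any candidate that has an infrequent subset.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count the transactions containing each candidate; keep the frequent ones.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support_count}

    # Level 1: frequent single items.
    singles = {frozenset([item]) for t in transactions for item in t}
    current = count(singles)
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy data: four baskets, minimum support count of 2.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
]
result = apriori(baskets, min_support_count=2)
```

On this toy data the frequent itemsets are the three single items bread, butter and milk, plus the pairs {bread, butter} and {bread, milk}. The candidate {bread, butter, milk} is pruned without counting because its subset {butter, milk} is not frequent.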
2. Overall Description:
Application Tier
This layer implements the business logic of the application. It is usually powered by a Java application server (WebSphere). There are several sub-layers within the application layer.
Data Tier
This is the layer that manages the persistence of application information. It is usually powered by a relational database server (MS SQL Server).
Stored Procedures and Functions are used to execute database server-side
processes pertinent to data integrity. Business logic processes should be part of
Front End Client: HTML, Dreamweaver
Web Server: Apache, Tomcat, WebSphere
Back End: DB2 9.7
Minimum: 128 MB
WebSphere: Processor: Intel Pentium III or AMD 800 MHz; RAM: 1 GB; Disk Space: 3.5 GB
Data Tier (DB2): Processor: Intel Pentium III or AMD 800 MHz; RAM: 256 MB; Disk Space: 500 MB
1. Specify input data: Define the data to be mined; the data may be in the form of a dataset file or any other file.
2. Process data/preprocess the input data.
3. Select technique/algorithm: Select the appropriate data mining algorithm.
4. Work on results: Select visualization tools to analyze the result.
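Steps 1 and 2 above might look like the following in practice. This is only a sketch under an assumed input format, one basket per line with comma-separated item names; the document does not specify the dataset layout, and the function name `load_transactions` is hypothetical. An in-memory string stands in for the dataset file.

```python
import csv
import io


def load_transactions(lines):
    """Parse one basket per row; items are comma-separated (assumed format)."""
    baskets = []
    for row in csv.reader(lines):
        # Preprocess: trim whitespace, lowercase, drop empty fields and duplicates.
        items = {field.strip().lower() for field in row if field.strip()}
        if items:  # skip blank rows
            baskets.append(frozenset(items))
    return baskets


# Hypothetical input standing in for a dataset file.
raw = io.StringIO("Bread, Butter,milk\n\nbread,BREAD,\njam\n")
baskets = load_transactions(raw)
```

The blank row is skipped and the repeated "bread"/"BREAD" entries collapse into a single item, so three baskets remain. Passing an open file object instead of the `io.StringIO` works the same way.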
2.5 User Characteristics:
Users can be characterized as:
1. General (Non-Technical) User: This category includes general users having no technical knowledge.
2. Technical User: This category includes users having technical knowledge.
3. Analyst: This category includes users having the ability to analyze the data as well as the results.
2.6 Constraints:
Proposed application requires user-specified Support and Confidence framework as
Rules originating from the same itemset have identical support but can have different confidence.
support = #tuples(LHS, RHS) / N, where N is the total number of transactions
Confidence: The other number is known as the confidence of the rule. Confidence is the ratio of the number of transactions that include all items in the consequent as well as the
antecedent (namely, the support) to the number of transactions that include all items in
the antecedent.
The confidence of the rule "B given A" is a measure of how likely it is that B occurs when A has occurred; 100% means that B always occurs if A has occurred.
confidence = #tuples(LHS, RHS) / #tuples(LHS)
Example: bread and butter => milk [confidence 90%, support 1%]
For example, if a database contains 100,000 transactions, out of which 2,000 include both items A and B and 800 of these include item C, the association rule "If A and B are purchased then C is purchased on the same trip" has a support of 800 transactions (alternatively 0.8% = 800/100,000) and a confidence of 40% (= 800/2,000). One way to think of support is that it is the probability that a randomly selected transaction from the database will contain all items in the antecedent and the consequent, whereas the confidence is the conditional probability that a randomly selected transaction will include all the items in the consequent given that the transaction includes all the items in the antecedent.
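The two ratios can be checked against the figures above with a short script. This is an illustrative sketch: the helper names `support_count`, `support` and `confidence` are hypothetical, and the dataset is synthetic, built so that 800 baskets contain A, B and C, another 1,200 contain only A and B, and the remaining 98,000 contain an unrelated item D.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(transactions, lhs, rhs):
    """support = #tuples(LHS, RHS) / N"""
    both = frozenset(lhs) | frozenset(rhs)
    return support_count(transactions, both) / len(transactions)


def confidence(transactions, lhs, rhs):
    """confidence = #tuples(LHS, RHS) / #tuples(LHS)"""
    both = frozenset(lhs) | frozenset(rhs)
    return support_count(transactions, both) / support_count(transactions, lhs)


# Synthetic data matching the figures in the text (item names are single letters).
transactions = (
    [frozenset("ABC")] * 800      # A, B and C together
    + [frozenset("AB")] * 1200    # A and B without C
    + [frozenset("D")] * 98000    # unrelated baskets
)
s = support(transactions, "AB", "C")     # 800 / 100,000
c = confidence(transactions, "AB", "C")  # 800 / 2,000
```

With these counts `s` comes out at 0.008 (0.8%) and `c` at 0.4 (40%), matching the worked example.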
system on which the database system runs. Database systems can be centralized, or client-server, where one server machine executes work on behalf of multiple client machines.
In the case of a three-tier architecture, the client machine acts merely as a front end and does not contain any direct database calls. Instead, the client end communicates with an application server, usually through a forms interface. The application server in turn communicates with a database system to access data. The business logic of the application, which says what actions to carry out under what conditions, is embedded in the application server, instead of being distributed across multiple clients. Three-tier applications are more appropriate for large applications, and for applications that run on the World Wide Web.
Client Tier
It implements the "look and feel" of an application. It is responsible for the presentation of data, receiving user events and controlling the user interface. Most e-commerce applications are web-based. The programming languages used are a combination of HTML, CSS and JavaScript. JSP or ASP is used for dynamic content.
HTML is a Web authoring markup language for defining content structures and rendering a web page.
JavaScript is commonly used for client-side validation. JavaScript also has some control over the look and feel of a page in dynamic HTML.
Application Tier
This layer implements the business logic of the application. It is usually powered by a Java application server (WebLogic or WebSphere). There are several sub-layers within the application layer.
Control Layer is the interface layer between presentation tier and application tier. The implementation of this layer is dependent on the languages used for
Transaction Layer usually implements business processes that may involve many business objects. In J2EE architecture, session beans are commonly used for implementing the transaction layer. Transaction Layer and Business Object Layer are not constrained by the programming languages for the presentation and the
Business Object Layer consists of objects that represent business entities which
Data Access Object (DAO) Layer is the interface between the application tier and persistence tier. Besides the methods for "creating", "retrieving", "updating" and "removing" a business object from database, DAO objects implement other
business-specific methods as well. Even with JDBC, DAO objects may not be 100% database independent.
Data Tier
Support: The support of a rule is the number of transactions that include all items in the antecedent and consequent parts of the rule. (The support is sometimes expressed as a percentage of the total number of records in the database.)
measure of how often the collection of items in an association occur together as a
bought
Special Thanks
We convey special thanks to our department and to our college. We also convey special thanks to all this software and these websites, which have helped a lot in