
LOGIC BASED PATTERN DISCOVERY

A Project Report Submitted In Partial Fulfillment of the Requirements for the Award Of

MASTER OF TECHNOLOGY IN SOFTWARE ENGINEERING BY E. SREE LAKSHMI (09C31D2503) UNDER THE GUIDANCE OF MRS. RAZIYA, Assoc. Prof.

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING BALAJI INSTITUTE OF TECHNOLOGY & SCIENCE NARSAMPET, WARANGAL - 506331 2011-2012

DEPARTMENT OF INFORMATION TECHNOLOGY BALAJI INSTITUTE OF TECHNOLOGY & SCIENCE NARSAMPET, WARANGAL- 506331

CERTIFICATE
This is to certify that E. SREE LAKSHMI, Roll No. 09C31D2503, of the M.Tech programme has satisfactorily completed the dissertation work entitled LOGIC BASED PATTERN DISCOVERY in partial fulfillment of the requirements of the M.Tech degree during the academic year 2011-2012.

Mr. D. Venkateshwarlu
Supervisor(s)

Mr. M. Srinivas
Head of the Department

External

Abstract
Previous studies have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent patterns but only the closed ones, because the latter leads not only to a more compact yet complete result set but also to better efficiency. However, most of the previously developed closed pattern mining algorithms work under the candidate maintenance-and-test paradigm, which is inherently costly in terms of runtime and space usage when the support threshold is low or the patterns become long. A new pattern mining algorithm is proposed to discover domain knowledge, reported as coherent rules, where coherent rules are discovered based on the properties of an inference analysis approach. In this approach, the Back Scan pruning technique is used.

CHAPTER 1 INTRODUCTION
Data is stored in databases, data warehouses and other information repositories. As the size of data increases, there is a pressing need for data mining, which is required to extract knowledge of interest for users. Thus, data mining is a process of extracting knowledge from information repositories by extracting interesting data patterns representing knowledge. We get these interesting data patterns by evaluating data patterns (task-relevant information) obtained from various databases and data warehouses. Data mining is carried out using various data mining functionalities, among which Association Rule Mining (ARM) is commonly used to extract interesting data patterns. It is used by marketing and retail communities to find interesting association rules between frequent itemsets, which can boost the sales of an itemset in the market in order to make profits. Mining association rules is useful for discovering relationships among items in large databases. Association rule mining deals with market basket data analysis for finding frequent itemsets and generating valid and important association rules from them.

Association Rule Mining finds interesting data patterns based on the association relationship between various items of a data set, by using association rules which specify that relationship, for example {milk, bread} ⇒ {butter}; in this association rule there is a correlation between two itemsets, {milk, bread} and {butter}. A frequent itemset is a set of items that appear together frequently in a data set. Only the itemsets whose frequency of occurrence is >= the min_support threshold value given by the domain experts are considered to be frequent patterns. Hence, Association Rule Mining is also called Frequent Pattern Mining.

An association between these frequent itemsets is interesting if it satisfies two interestingness measures called support and confidence. By using a min_support value given by a domain expert we lose some interesting association rules, as this threshold value cannot always be correct. So we should shift from the existing support and confidence framework to a framework that uses a logic principle to check the interestingness of an association rule. As this framework relies completely on logic instead of the min_support threshold value given by a domain expert, all the interesting association rules are discovered. The framework is further enhanced by applying a pruning technique to reduce the search space, so that time and space complexity are reduced.

1.1 OBJECTIVE
To eliminate the need for a minimum support threshold when discovering interesting association rules, by obtaining association rules based on their support value (i.e., their frequency of occurrence) as observed in the transactional data set and then evaluating these rules against certain logic principles. This process is followed in order to get only strong and interesting association rules and to completely eliminate the uninteresting association rules that we get when mining is performed based on a minimum support threshold value given by a domain expert. This process is further enhanced in order to reduce space and time complexity.

1.2 PROBLEM STATEMENT
The use of a minimum support threshold generally assumes that:
A domain expert can provide this threshold value accurately, which is not always the case.
The knowledge of interest, i.e., an interesting data pattern in the form of an interesting association rule, can be obtained within this threshold value.
This single threshold value is enough to get the knowledge of interest required by the user.
Because of these assumptions, we have the following disadvantages:
Loss of association rules involving frequently observed items.
Loss of association rules involving infrequently observed items.
To overcome these disadvantages, we need to use a framework other than the existing support and confidence framework, one that discovers all interesting association rules.

1.3 DEFINITIONS
Data mining: - It is a process of discovering knowledge from various information repositories by extracting interesting data patterns representing knowledge. These interesting data patterns are obtained by evaluating data patterns (i.e., task-relevant information) obtained from various data sources like databases and data warehouses. Data pattern: - Task-relevant information obtained from various data sources like databases and data warehouses is called a data pattern. Association Rule Mining: - It is a data mining functionality which is used to find interesting data patterns based on association or correlation relationships between various items of a transactional data set. Association Rule: -

It is a rule used to specify the association relationship between items of a frequent itemset obtained from a transactional data set. Let I = {i1, i2, ..., im} be a set of items. Let D, the task-relevant data, be a set of database transactions where each transaction T is a set of items such that T ⊆ I. Each transaction is associated with an identifier, called a TID. Let A be a set of items. A transaction T is said to contain A if and only if A ⊆ T. An association rule is an implication of the form A ⇒ B, where A ⊂ I, B ⊂ I, and A ∩ B = ∅. The rule A ⇒ B holds in the transaction set D with support s, where s is the percentage of transactions in D that contain A ∪ B (i.e., the union of sets A and B, or, say, both A and B). This is taken to be the probability P(A ∪ B). The rule A ⇒ B has confidence c in the transaction set D, where c is the percentage of transactions in D containing A that also contain B. This is taken to be the conditional probability P(B | A).

Support(A ⇒ B) = P(A ∪ B) ......... (1.3.1)
Confidence(A ⇒ B) = P(B | A) ......... (1.3.2)
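To make equations (1.3.1) and (1.3.2) concrete, the following is a minimal sketch in Java (the language used in this project) of how support and confidence can be computed over a list of transactions; the class and method names are illustrative and not part of the project code.

import java.util.*;

public class RuleMeasures {

    // Support(A => B): fraction of transactions containing A union B.
    static double support(List<Set<String>> d, Set<String> a, Set<String> b) {
        Set<String> ab = new HashSet<String>(a);
        ab.addAll(b);
        int n = 0;
        for (Set<String> t : d) {
            if (t.containsAll(ab)) n++;
        }
        return (double) n / d.size();
    }

    // Confidence(A => B): fraction of transactions containing A
    // that also contain B, i.e. P(B | A).
    static double confidence(List<Set<String>> d, Set<String> a, Set<String> b) {
        Set<String> ab = new HashSet<String>(a);
        ab.addAll(b);
        int nAB = 0, nA = 0;
        for (Set<String> t : d) {
            if (t.containsAll(a)) nA++;
            if (t.containsAll(ab)) nAB++;
        }
        return nA == 0 ? 0.0 : (double) nAB / nA;
    }
}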

Itemset: - A frequent itemset refers to a set of items that appear together frequently in a transactional data set. For ex: - {milk, bread}. Subsequence: - A data pattern that appears in sequential order in a data set is called a frequent sequential pattern or a frequently occurring subsequence. For ex: - a pattern showing that customers tend to purchase first a PC, followed by a digital camera, and then a memory card is a frequent sequential pattern or subsequence.

Implication: - If an association rule meets certain logic principles based on the values of a truth table, then it is called an implication. An implication is formed using two propositions p and q, from which we have four implications: p → q, p → ¬q, ¬p → q and ¬p → ¬q. The symbol → is used to describe the relation between p and q, and the symbol ¬p means a false proposition. Thus, an association rule X ⇒ Y is mapped to p → q iff both X and Y are observed; here X is mapped to p and Y is mapped to q.

Equivalence: - An equivalence is a mode of implication, where an implication has to satisfy the following condition in order to qualify as an equivalence:

p ≡ q iff ¬(p xor q) ......... (1.3.3)

Here p and q are propositions and ≡ is the equivalence symbol. The below given truth table is the truth table for logical equivalence.

Table 1.3.1 Truth Table for Logical Equivalence
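The contents of this table follow standard propositional logic: p ≡ q is true exactly when p and q have the same truth value.

p | q | p ≡ q
T | T |   T
T | F |   F
F | T |   F
F | F |   T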

Thus, the association rule whose implication satisfies the equivalence condition given above is considered to be an interesting association rule.

CHAPTER 2 LITERATURE SURVEY


This section contains information about the knowledge discovery process, the architecture of a data mining system, data warehouses, association rule mining, the Apriori algorithm and the FP-Growth algorithm.

2.1 DATA MINING
It is the process of extracting knowledge from large amounts of data. Data mining is an essential step in the process of knowledge discovery from data, called KDD, which is shown as follows: -

Figure 2.1.1 KNOWLEDGE DISCOVERY FROM DATA (KDD)

The essential steps involved in data mining are:
Data preprocessing: - Data cleaning and data integration are the steps involved in data preprocessing. In data cleaning, noisy data (data errors) is removed, and in data integration, data from multiple data sources is merged into a single unified format.
Data selection: - Data relevant to the analysis task are retrieved from the database.
Data transformation: - Data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations.
Data mining: - An essential process where intelligent methods are applied in order to extract data patterns.
Pattern evaluation: - In this step we identify the truly interesting patterns representing knowledge based on some interestingness measures.
Knowledge presentation: - Here visualization and knowledge representation techniques are used to present the mined knowledge to the user.

Figure 2.1.2 Architecture of a Typical Data Mining System

Database, data warehouse and other information repositories: - This is a set of databases, data warehouses and other kinds of information repositories. Data cleaning and data integration techniques are performed on this data.

Database or data warehouse server: - The database or data warehouse server is responsible for fetching the relevant data, based on the user's data mining request.

Knowledge base: - This is the domain knowledge that is used to evaluate the interestingness of resulting patterns. Such knowledge can include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction.

Data mining engine: - This is essential to the data mining system and ideally consists of a set of functional modules for tasks such as characterization, association and correlation analysis, classification, prediction, cluster analysis, outlier analysis, and evolution analysis.

Pattern evaluation module: - This component uses interestingness measures and interacts with the data mining modules so as to focus the search towards interesting patterns. It uses interestingness thresholds to filter out discovered patterns.

User interface: - This module communicates between users and the data mining system, allowing the user to interact with the system by specifying a data mining query or task, providing information to help focus the search, and performing exploratory data mining based on the intermediate data mining results. In addition, this component allows the user to browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms.

2.2 DATA WAREHOUSE
A data warehouse is a subject-oriented, integrated, time-variant and nonvolatile collection of data which supports the managerial decision-making process. The four keywords in this definition can be described as follows:
Subject-Oriented: - A data warehouse provides a simple and concise view of particular subject issues by excluding data which is not useful for the decision-making process. Thus, it is specially designed to focus on the modeling and analysis of data for decision makers. For ex: - a data warehouse is organized around major subjects like customer, supplier, product and sales, rather than concentrating on day-to-day operations.
Integrated: - A data warehouse is constructed by integrating multiple heterogeneous sources like relational databases and data warehouses.
Time-Variant: - Data are stored to provide information from a historical perspective, for example a period of 5 to 10 years.
Non-Volatile: - A data warehouse is a permanent store of data; it is a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery and concurrency control mechanisms. It requires only two operations on data: initial loading of data and access of data.

Figure 2.2.1 Three-Tier Data Warehouse Architecture

1) Bottom Tier: - It is the data warehouse server. Back-end tools are used to feed data into the bottom tier from operational databases or other external sources. These tools perform data extraction, cleaning, integration and transformation. The data is extracted using application program interfaces known as gateways, which are supported by the underlying DBMS and allow client programs to generate SQL code to be executed at a server. For ex: - ODBC (Open Database Connectivity).
2) Middle Tier: - The middle tier is an OLAP server, implemented using either a relational OLAP model, i.e., an extended relational DBMS that maps operations on multidimensional data to standard relational operations, or a multidimensional OLAP model, that is, a special-purpose server that directly implements multidimensional data and operations.
3) Top Tier: - The top tier is a front-end client layer, which contains query and reporting tools, analysis tools, and data mining tools.

2.3 ASSOCIATION RULE MINING
It is a data mining functionality or method used to find interesting data patterns based on association or correlation relationships among a large set of data items, by using association rules which specify the association relationship among the data items. For ex: - {milk, bread} ⇒ {butter}. Only the itemsets whose frequency of occurrence is greater than or equal to the min_support count threshold given by domain experts are considered to be frequent patterns or frequent itemsets. This threshold value is provided to start the pattern discovery process. Thus, ARM is also called frequent pattern mining. An association or correlation between the items of these frequent itemsets is said to be interesting if it satisfies two interestingness measures called support and confidence, which are used to evaluate the interestingness of an association rule.

Figure 2.3.1 Example Of Association Rule Mining

For the association rule A ⇒ B:
Support = support_count(A ∪ B) / total no. of transactions = 2 / 4 = 50%
Confidence = support_count(A ∪ B) / support_count(A) = 2 / 3 = 66.6%
So this rule is considered an interesting one. The above example shows that the confidence of rule A ⇒ B can be easily derived from the support counts of A and A ∪ B. That is, once the support counts of A, B and A ∪ B are known, it is straightforward to derive the corresponding association rules A ⇒ B and B ⇒ A and check whether they are strong. Thus the problem of mining association rules can be reduced to that of mining frequent itemsets. Association rule mining can be viewed as a two-step process:
1. Find all frequent itemsets: - Each of these itemsets will occur at least as frequently as a predetermined minimum support count (min_sup).
2. Generate strong association rules from the frequent itemsets: - These rules must satisfy minimum support and minimum confidence; only then are they considered to be interesting association rules.
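As a minimal sketch of step 2, assuming step 1 has already produced the frequent itemsets and a supportCount map of itemset frequencies (the helper properNonEmptySubsets is hypothetical, shown only to keep the sketch short):

// Emit A => (F \ A) for every proper non-empty subset A of frequent itemset F
// whose confidence meets minConf. Assumes java.util imports.
static void generateRules(Set<String> f, Map<Set<String>, Integer> supportCount, double minConf) {
    for (Set<String> a : properNonEmptySubsets(f)) { // hypothetical helper
        double conf = (double) supportCount.get(f) / supportCount.get(a);
        if (conf >= minConf) {
            Set<String> b = new HashSet<String>(f);
            b.removeAll(a);
            System.out.println(a + " => " + b + " (confidence = " + conf + ")");
        }
    }
}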

Types of Association Rules: The different types of association rules used in operational and relational databases are: 1. Quantitative Association rules 2. Single-Dimensional Association rules 3. Multi-Dimensional Association rules 4. Multi level Association rules

1) Quantitative Association Rules: - This approach treats individual numerical attributes as quantities, so such rules are also called dynamic multidimensional association rules. Here, Aquan1 ∧ Aquan2 ⇒ Acat, where Aquan1 and Aquan2 are tests on quantitative attribute intervals (where the intervals are dynamically determined), and Acat tests a categorical attribute from the task-relevant data. Such rules have been referred to as two-dimensional quantitative association rules. For ex: - age(X, "30...39") ∧ income(X, "42K...48K") ⇒ buys(X, "HDTV")

2) Single-Dimensional Association Rules: - Such a rule contains only a single dimension (predicate), which may be used multiple times. For ex: - buys(X, "laptop") ⇒ buys(X, "HP printer"), where the single predicate buys appears repeatedly.

3) Multi-Dimensional Association Rules: - Association rules that involve two or more dimensions or predicates can be referred to as multidimensional association rules. For ex: - age(X, "20...29") ∧ occupation(X, "student") ⇒ buys(X, "laptop")

4) Multilevel Association Rules: - When data mining is performed at multiple levels of abstraction, the rules extracted are referred to as multilevel association rules. This is done by using a concept hierarchy.

CHAPTER 3 PROBLEM ANALYSIS
3.1 PROBLEM DESCRIPTION
Previous frequent pattern mining algorithms such as Apriori and FP-Growth use a minimum support threshold to find frequent itemsets in order to discover interesting association rules. These algorithms are based on the following assumptions:
The threshold value provided by the domain expert is very accurate.
The frequent patterns (frequent itemsets) have occurred at least as frequently as the threshold.
Because of these assumptions, we have the following disadvantages:
Loss of association rules involving frequently observed items.
Loss of association rules involving infrequently observed items.
No consideration of negative association rules.

Loss of Association Rules involving Frequently Observed Items: -
Use of a minimum support threshold assumes that an ideal minimum support threshold exists for frequent patterns, and that a user can identify this threshold accurately. But it is unclear how to find this threshold, as there is no universal standard for setting its value. Different minimum support thresholds result in inconsistent mining results, even when the mining process is performed on the same data set: a lower minimum support threshold results in more unnecessary association rules being found, and a higher minimum support threshold results in fewer association rules being found. We consider this situation a case of losing association rules involving frequent items. The problem of losing frequent association rules can be solved only by lifting (removing) the minimum support threshold requirement.

Loss of Association Rules involving Infrequently Observed Items: -
Typically, a data set contains items that appear frequently while other items rarely occur. For ex: - in a retail fruit market, fruits are frequently observed but occasionally bread is also observed. Some items are rare in nature or infrequently found in a data set. These items are called rare items. If a single minimum support threshold is used and is set high, the association rules involving rare items will not be discovered. Use of a single, lower minimum support threshold, on the other hand, results in too many uninteresting association rules involving those rare items. This is called the rare item problem.

No consideration of negative association rules: -
Algorithms like Apriori and FP-Growth give no importance to the absence of items within a transactional data set. They can discover only positive association rules. For ex: - a negative association rule such as ¬{milk, bread} ⇒ ¬{butter}, which tells about the absence of both the antecedent and consequent parts of an association rule, is not discovered during the mining process.

3.2 EXISTING SYSTEM
The existing system can implement either the Apriori algorithm or the FP-Growth algorithm. The main input parameter given to the existing system is the minimum support threshold value, used to get frequent itemsets. An association or correlation between these frequent itemsets is said to be interesting if it satisfies two interestingness measures called support and confidence, which are used to evaluate the interestingness of an association rule. In this way we can discover interesting association rules. The demerits of the existing system are:
It is assumed that the domain expert provides an accurate minimum support threshold value.
It is also assumed that the frequent patterns or frequent itemsets have occurred at least as frequently as the threshold.
Negative association rules are not given any importance.
Loss of association rules involving frequently observed items.
Loss of association rules involving infrequently observed items.

3.3 PROPOSED SYSTEM
A novel framework is proposed which removes the demerits of the existing system by removing the need for a minimum support threshold value. Here associations are discovered based on logical implications. The principle of this approach is that an association rule should be reported only when there is enough logical evidence for it in the data set. To do this, we consider both the presence and absence of items during the mining process. For ex: - an association such as A ⇒ B will be reported only when there are fewer occurrences of (A, ¬B) but more occurrences of (A, B).

Figure 3.3.1 Framework of Association Rules Based on Pseudo Implications

In the first step, the association rules that are observed in the data set are mapped to their implications based on a comparison between their support count values (i.e., their frequencies of occurrence). The implications obtained in this way are called pseudo implications.

In the second step, these pseudo implications are mapped to a mode of implication called equivalence, based on certain conditions. The pseudo implications which satisfy all these conditions are called pseudo implications of equivalence. If a pair of pseudo implications satisfies the same conditions, then together they form a coherent rule.

Coherent rule: - If a pair of pseudo implications satisfies all four conditions of equivalence, then those two pseudo implications form a coherent rule. The four conditions are:

S(X, Y) > S(X, ¬Y)
S(X, Y) > S(¬X, Y)
S(¬X, ¬Y) > S(X, ¬Y)
S(¬X, ¬Y) > S(¬X, Y)
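These four conditions translate directly into a predicate over the four co-occurrence counts. A minimal sketch, using q1 = S(X, Y), q2 = S(X, ¬Y), q3 = S(¬X, Y) and q4 = S(¬X, ¬Y), mirrors the check performed later in PowerSet.doCalculation; the method name is illustrative:

// A coherent rule exists when the (X,Y) and (notX,notY) cells dominate
// the (X,notY) and (notX,Y) cells of the co-occurrence table.
static boolean isCoherentRule(int q1, int q2, int q3, int q4) {
    return (q1 > q2 && q1 > q3) && (q4 > q2 && q4 > q3);
}

For instance, with the reptile counts used in Section 3.3 (q1 = 3, q2 = 1, q3 = 2, q4 = 95), the predicate holds and the pair of pseudo implications is reported as a coherent rule.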

Association rules decoupled from coherent rules are interesting association rules, as they are related to true implications based only on logic and not on domain knowledge. So coherent rules don't need users to preset the minimum support threshold to get frequent patterns, as they can be identified via truth table values.

Pseudo implications: - Association rules mapped to implications based on a comparison between their support count values are called pseudo implications. These implications are called pseudo because they resemble real implications. A pseudo implication is judged true or false based on a comparison between supports, but an implication is judged true or false based on binary truth values (1 or 0).

Mapping association rules to equivalences
The association rules can be mapped to equivalences in two ways: -

Mapping an association rule to equivalence using a single transaction record
Mapping an association rule to equivalence using multiple transaction records

Mapping an association rule to equivalence using a single transaction record
The steps involved in this process are:
Itemsets of an association rule are mapped to propositions of an implication: the presence of an itemset X is mapped to proposition p = T iff X is observed, and the absence of an itemset X is mapped to p = F (i.e., ¬p) iff X is not observed. Thus, itemsets X and Y are mapped to p = T and q = T iff both X and Y are observed.

Mapping association rules to implications:
For ex: - an association rule X ⇒ ¬Y is mapped to the implication p → ¬q iff X is observed but Y is not observed.

Mapping association rules to equivalences:
Association rules are mapped to equivalences based on the truth table values (T, F, F, T) for the implications p → q, p → ¬q, ¬p → q and ¬p → ¬q. For ex: - an association rule X ⇒ Y is mapped to p ≡ q iff X ⇒ Y is true (as p = T and q = T, therefore p ≡ q = T), X ⇒ ¬Y is false, ¬X ⇒ Y is false and ¬X ⇒ ¬Y is true.

Mapping an association rule to equivalence using multiple transaction records
In multiple transaction records, an itemset X is observed in many transaction records. So based on a comparison between the presence and absence of an itemset, each itemset can be mapped to propositions p and q as follows: if S(X) > S(¬X), then itemset X is mapped to p = T; X is considered interesting as it is mostly observed in the data set. But if a union of itemsets such as (X, Y) is involved, then X ⇒ Y can be mapped to an implication p → q only when the union of itemsets, i.e., (X, Y), is observed in more transactions compared to (X, ¬Y), (¬X, Y) and (¬X, ¬Y).

Thus, association rules that are mapped to their implications by comparing their support count values are called pseudo implications. These pseudo implications can be mapped to equivalence when the following conditions are met:

S(X, Y) > S(X, ¬Y)
S(X, Y) > S(¬X, Y)
S(¬X, ¬Y) > S(X, ¬Y)
S(¬X, ¬Y) > S(¬X, Y)

Therefore, any association rule which is mapped to equivalence is called a pseudo implication of equivalence. Not all pseudo implications can be mapped to equivalence using itemsets X and Y, due to variance in their support count values, but if two pseudo implications (for ex: - X ⇒ Y and ¬X ⇒ ¬Y) satisfy the conditions given above for equivalence, then they form a coherent rule. This process of finding coherent rules is called coherent rule mining.

Example of Coherent Rule Mining
Let us consider a data set of 101 different zoo animals which are categorized into 7 classes as follows:
Mammals
Birds
Fishes
Insects
Invertebrates
Amphibians
Reptiles
This classification is done based on 18 different attributes, of which 15 are Boolean attributes; each Boolean attribute is assigned either 1 or 0 based on the presence or absence of that attribute for an animal. Classification is done based on these 18 attributes:
Name of the animal
Hair (Boolean attribute)
Feathers (Boolean attribute)
Eggs (Boolean attribute)
Milk (Boolean attribute)
Airborne (Boolean attribute)
Aquatic (Boolean attribute)

Predator (Boolean attribute)
Toothed (Boolean attribute)
Backbone (Boolean attribute)
Breathes (Boolean attribute)
Venomous (Boolean attribute)
Fins (Boolean attribute)
Legs (numeric attribute, integer value range: [0, 2, 4, 5, 6, 8])
Tail (Boolean attribute)
Domestic (Boolean attribute)
Catsize (Boolean attribute)
Type (numeric attribute, integer value range: [1, 2, 3, 4, 5, 6, 7], which represents each class of animals)

Table 3.3.1 Total Frequency of Occurrence for Each Class of Animals

Let the minimum support threshold be 5%. The classes of animals whose frequency of occurrence is greater than the minimum support threshold are considered frequent, and the classes of animals whose frequency of occurrence is less than the minimum support threshold are considered infrequent. Thus, the classes of animals Mammals, Birds, Fishes, Insects and Invertebrates are considered frequent, and the classes Amphibians and Reptiles are considered infrequent.

Problem with infrequent association rules
The rules that involve infrequent items, i.e., infrequent classes of animals like Amphibians and Reptiles, are not discovered via the Apriori approach, as their support count is less than the minimum support threshold value, even though they are interesting. But by using the coherent rule mining approach we can discover these kinds of association rules. Let us consider an association rule:

{eggs(1), toothed(1), breathes(1), tail(1)} ⇒ {Reptile(1)}

Let X = {eggs(1), toothed(1), breathes(1), tail(1)} and Y = {Reptile(1)}. Now the association rule X ⇒ Y is reported as an interesting association rule only when there is enough logical evidence about it in the data set, i.e., only if the association rule satisfies all four conditions for equivalence:

S(X, Y) > S(X, ¬Y)
S(X, Y) > S(¬X, Y)
S(¬X, ¬Y) > S(X, ¬Y)
S(¬X, ¬Y) > S(¬X, Y)

This can be shown with the help of the table given below:

Table 3.3.2 Frequency of co-occurrences

                                                           | Consequent Y = {Reptile(1)} | Not Y = {Reptile(0)} | Total
Antecedent X = {eggs(1), toothed(1), breathes(1), tail(1)} |              3              |           1          |    4
Not X = {eggs(0), toothed(0), breathes(0), tail(0)}        |              2              |          95          |   97
Total                                                      |              5              |          96          |  101

Thus, the table given above shows that:
S(X ∪ Y) > S(X ∪ ¬Y) (3 > 1)
S(X ∪ Y) > S(¬X ∪ Y) (3 > 2)
S(¬X ∪ ¬Y) > S(X ∪ ¬Y) (95 > 1)
S(¬X ∪ ¬Y) > S(¬X ∪ Y) (95 > 2)
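A minimal sketch of how the four cells of such a table can be tallied from the zoo records, assuming each record is represented as a map from attribute name to value; hasAll is a hypothetical helper that tests whether a record matches every attribute-value pair of an itemset:

// Tally the co-occurrence table cells q1..q4 for antecedent X and consequent Y.
static int[] tally(List<Map<String, Integer>> records, Map<String, Integer> x, Map<String, Integer> y) {
    int q1 = 0, q2 = 0, q3 = 0, q4 = 0;
    for (Map<String, Integer> r : records) {
        boolean hasX = hasAll(r, x), hasY = hasAll(r, y); // hasAll is hypothetical
        if (hasX && hasY) q1++;          // S(X, Y)
        else if (hasX && !hasY) q2++;    // S(X, notY)
        else if (!hasX && hasY) q3++;    // S(notX, Y)
        else q4++;                       // S(notX, notY)
    }
    return new int[] { q1, q2, q3, q4 };
}

Running this over the 101 zoo records with the X and Y above would yield the four cell values used in the comparisons above.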

Thus the coherent rule formed is:
{eggs(1), toothed(1), breathes(1), tail(1)} ⇒ {Reptile(1)}
¬{eggs(1), toothed(1), breathes(1), tail(1)} ⇒ ¬{Reptile(1)}

Thus, the given association rule is considered to be interesting, as its interestingness is based on pure logic, i.e., it is logically correct. This coherent rule specifies that an animal which lays eggs, has teeth, breathes through lungs and has a tail is a reptile, but an animal which does not have all four of these attributes is not a reptile.

Problem with Frequent Association Rules

Let us consider a comparison between the two approaches, coherent rule mining and Apriori.

Table 3.3.3 Frequent rules found for class Mammal using the two approaches

Using the coherent rule mining approach, we get 5 coherent rules, from which we get 10 association rules that are logically correct. But using the Apriori approach, we get some unnecessary association rules that are not interesting. For example, with the Apriori approach we get the association rule Domestic(1) ⇒ Mammal(1) (support = 7.9%, confidence = 61.5%), which can be examined through the table given below.

Table 3.3.4 Frequency of co-occurrences for the rule Domestic(1) ⇒ Mammal(1)

                             | Consequent Y = {Mammal(1)} | Not Y = {Mammal(0)} | Total
Antecedent X = {Domestic(1)} |              8             |           5         |   13
Not X = {Domestic(0)}        |             33             |          55         |   88
Total                        |             41             |          60         |  101

The above table shows that the association rule does not satisfy one condition for equivalence:
S(X ∪ Y) < S(¬X ∪ Y) (i.e., 8 < 33)
So this association rule is not considered interesting, as it does not satisfy the equivalence condition S(X ∪ Y) > S(¬X ∪ Y). We can also observe that 33 out of 41 mammals, i.e., 80.5% of mammals, are not domestic; this fact is ignored when a weak association rule like Domestic(1) ⇒ Mammal(1) is reported, which, when used in a business application, leads to wrong decisions.

Enhancement to the Proposed System

Here we use the concept of pruning. Pruning is the process of removing the supersets of the itemsets that do not satisfy any one of the four conditions for equivalence. For ex: - the association rule Domestic(1) ⇒ Mammal(1) is not considered interesting, as it is logically incorrect; therefore all its supersets are pruned. This is called the downward closure property.

This downward closure property is used within the Forecast To Prune technique, where we calculate the coherent rule measure H for an itemset. By considering an opening window value w% (i.e., the minimum support threshold), we calculate a moving window value mv, where mv = H − (H × w%). If the H value of a superset is not within the range (mv, H) of the itemset, then that superset is pruned.

We calculate the coherent rule measure H of an association rule as follows:

H = [min(CovY, mCovY) − min(q1, q2) − min(q3, q4)] / min(CovY, mCovY)

Here CovY = q1 + q3 and mCovY = q2 + q4, where q1 = S(X ∪ Y), q2 = S(X ∪ ¬Y), q3 = S(¬X ∪ Y) and q4 = S(¬X ∪ ¬Y).

Table 3.3.5 Frequency of co-occurrences for the association rule Milk(1) ⇒ Mammal(1)

                         | Consequent Y = {Mammal(1)} | Not Y = {Mammal(0)} | Total
Antecedent X = {milk(1)} |             41             |           0         |   41
Not X = {milk(0)}        |              0             |          60         |   60
Total                    |             41             |          60         |  101

We calculate the coherent rule measure H of the above association rule, as per the table, as follows:

H = [min(41, 60) − min(41, 0) − min(0, 60)] / min(41, 60) = 1

and mv = H − (H × 5%) = 1 − (1 × 5%) = 1 − 0.05 = 0.95

Now, we consider a superset of the itemset {milk(1), mammal(1)}, which is {milk(1), feathers(0), mammal(1)}, whose association rule is Milk(1), feathers(0) ⇒ Mammal(1) and whose H value is 1. This is within the range (mv, H), i.e., (0.95, 1). Therefore it is not pruned.
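A minimal sketch of this forecast-to-prune computation, following the formula above (CovY = q1 + q3, mCovY = q2 + q4); the method names are illustrative:

// Coherent rule measure H, as defined in the text.
static double measureH(int q1, int q2, int q3, int q4) {
    double base = Math.min(q1 + q3, q2 + q4);   // min(CovY, mCovY)
    return (base - Math.min(q1, q2) - Math.min(q3, q4)) / base;
}

// A superset is pruned when its H value falls outside the window (mv, H]
// of its parent itemset, where mv = H - H * w and w is the opening
// window value expressed as a fraction (e.g. 0.05 for 5%).
static boolean prune(double hParent, double hSuperset, double w) {
    double mv = hParent - hParent * w;
    return !(hSuperset > mv && hSuperset <= hParent);
}

For the milk example, measureH(41, 0, 0, 60) = (41 − 0 − 0) / 41 = 1, mv = 0.95, and the superset with H = 1 lies within (0.95, 1], so it is not pruned.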

CHAPTER 4 SYSTEM STUDY


4.1 Feasibility Study
4.1.1 Technical Feasibility
Evaluating technical feasibility is the trickiest part of a feasibility study. This is because, at this point in time, not much detailed design of the system exists, making it difficult to assess issues like performance and costs (on account of the kind of technology to be deployed). A number of issues have to be considered while doing a technical analysis.
i) Understand the different technologies involved in the proposed system: before commencing the project, we have to be very clear about the technologies that are required for the development of the new system.
ii) Find out whether the organization currently possesses the required technologies: is the required technology available with the organization? If so, is the capacity sufficient? For instance, will the current printer be able to handle the new reports and forms required for the new system?

4.1.2 Operational Feasibility

A proposed project is beneficial only if it can be turned into an information system that will meet the organization's operating requirements. Simply stated, this test of feasibility asks whether the system will work when it is developed and installed, and whether there are major barriers to implementation. Here are questions that help test the operational feasibility of a project:
Is there sufficient support for the project from management and from users? If the current system is well liked and used to the extent that people will not see reasons for change, there may be resistance.
Are the current business methods acceptable to the users? If they are not, users may welcome a change that will bring about a more operational and useful system.
Have the users been involved in the planning and development of the project? Early involvement reduces the chances of resistance to the system and, in general, increases the likelihood of a successful project.
Since the proposed system helps reduce the hardships encountered in the existing manual system, the new system was considered operationally feasible.

4.1.3 Economical Feasibility
Economic feasibility attempts to weigh the costs of developing and implementing a new system against the benefits that would accrue from having the new system in place. This feasibility study gives the top management the economic justification for the new system. A simple economic analysis which gives the actual comparison of costs and benefits is much more meaningful in this case. In addition, this proves to be a useful point of reference to compare actual costs as the project progresses. There could be various types of intangible benefits on account of automation. These could include increased customer satisfaction, improvement in product quality, better decision making, timeliness of information, expediting of activities, improved accuracy of operations, better documentation and record keeping, faster retrieval of information, and better employee morale.

The system is completely based on the Model View Controller (MVC) architecture. This architecture defines a pattern in which three individual components work together. The model contains all the business logic, the view covers the user interface design, and the controller transfers data between the model and the view.

In our system the view is designed using Java Swing components provided with the Java programming language. The model and controller are developed using pure core Java classes.

The following block diagram shows the MVC architecture.

CHAPTER 5 REQUIREMENT ANALYSIS


5.1 Functional Requirements
Inputs: The input to the system is a dataset. The zoo dataset has been taken as the input dataset in this project. The inputs are as follows:
Select type of animal: select one animal type among the given seven animal types.
Processing: The input data, i.e., the zoo data, is processed by the model.
Output: The output is the set of coherent rules which satisfy the propositional logic.

Performance requirements
Due to the high scope of the software, the performance requirements are high. The speed at which the software is required to operate is nominal.

Error message design
The design of error messages is an important part of the user interface design. As a user is bound to commit some errors while using a system, the system should be designed to be helpful by providing the user with information regarding the error he/she has committed.

Error detection
Even though every effort is made to avoid the occurrence of errors, a small portion of errors is still likely to occur; these types of errors can be discovered by using validations to check the input data. The system is designed to be user friendly. In other words, the system has been designed to communicate effectively with the user, and it has been designed with buttons.

5.2 Non Functional Requirements


The major non-functional requirements of the system are as follows:
Usability: The system is designed as a completely automated process, hence there is little or no user intervention.
Reliability: The system is more reliable because of the qualities that are inherited from the chosen platform, Java. Code built using Java is more reliable.
Performance: This system is developed in a high-level language using advanced front-end and back-end technologies, so it gives a response to the end user on the client system within very little time.
Supportability: The system is designed to be cross-platform supportable. The system is supported on a wide range of hardware and on any software platform which has a JVM built into the system.

5.3 Hardware Requirements
The hardware used for the development of the project is:
PROCESSOR : Core 2 Duo CPU
RAM : 2 GB
MONITOR : 17" COLOR
HARD DISK : 80 GB

5.4 Software Requirements
The software used for the development of the project is:
OPERATING SYSTEM : Any OS
USER INTERFACE : AWT and Swing
PROGRAMMING LANGUAGE : Java
IDE/WORKBENCH : MyEclipse 6.0

CHAPTER 6 SYSTEM DESIGN

Design is a multi-step process that focuses on data structures, software architecture, procedural details (algorithms, etc.) and interfaces between modules. The design process also translates the requirements into a presentation of the software that can be assessed for quality before coding begins. Computer software design changes continuously as new methods, better analysis and broader understanding evolve. Software design is at a relatively early stage in its evolution. Therefore, software design methodology lacks the depth, flexibility and quantitative nature that are normally associated with more classical engineering disciplines. However, techniques for software design do exist, criteria for design quality are available, and design notation can be applied.

6.1 Modules:
The system, after careful analysis, has been identified to be presented with the following modules:
User Interface Module
Mapping Association Rules
Deriving Coherent Rules from Mapped Association Rules

6.1.1 User Interface Module: A rich user interface is developed in order to select the type of animal from the drop-down list, with a button for generating the coherent rules of that type.

6.1.2 Mapping Association Rules: In this module we derive the approach of mapping an association rule to equivalence. A complete mapping between the two is realized in three progressive steps, each of which depends on the success of the previous step. In the first step, itemsets are mapped to propositions in an implication. Itemsets can be either observed or not observed in an association rule. Similarly, a proposition can be either true or false in an implication. Analogously, the presence of an itemset can be mapped to a true proposition because this itemset can be observed in transactional records.

6.1.3 Deriving Coherent Rules from Mapped Association Rules: The pseudo implications of equivalences can be further refined into a concept called coherent rules. We highlight that not all pseudo implications of equivalences can be created using itemsets X and Y. Nonetheless, if one pseudo implication of equivalence can be created, then another pseudo implication of equivalence also coexists. Two pseudo implications of equivalences always exist as a pair because they are created from the same conditions. Coherent rules meet the necessary and sufficient conditions and have the truth table values of logical equivalence; by definition, a coherent rule consists of a pair of pseudo implications of equivalences that have higher support values compared to the other two pseudo implications. Each pseudo implication of equivalence is an association rule with the additional property that it can be mapped to a logical equivalence.

6.2 Module Diagrams:
6.2.1 UML Diagrams

Use Case Diagram

Sequence Diagram:

Activity Diagram:

6.3 Algorithm Used: Search Algorithm (ChSearch)

We propose to search for coherent rules by exploiting the antimonotone property found on the condition S(X, Y) > S(¬X, Y), targeting a preselected consequent itemset Y.

6.3.1 Distinct Features of ChSearch
We list some features of ChSearch compared to Apriori. Unlike Apriori, ChSearch:
Does not require a preset minimum support threshold. ChSearch does not require a preset minimum support threshold to find association rules. Coherent rules are found based on mapping to logical equivalences. From the coherent rules, we can decouple the pair into two pseudo implications of equivalences. The latter can be used as association rules with the property that each rule can be further mapped to a logical equivalence.
Does not need to generate frequent itemsets. ChSearch does not need to generate frequent itemsets, nor does it need to generate the association rules within each itemset. Instead, ChSearch finds coherent rules directly. Coherent rules are found within the small number of candidate coherent rules allowed through its constraints.
Identifies negative association rules. ChSearch, by default, also identifies negative association rules. Given a set of transaction records that does not indicate item absence, Apriori cannot identify negative association rules. ChSearch finds the negative pseudo implications of equivalences and uses them to complement both the positive and negative rules found.

6.3.2 Quality of Logic-Based Association Rules
Coherent rules are defined based on logic. This improves the quality of the association rules discovered, because no association rules are missed due to threshold setting. A user can discover all association rules that are logically correct without having to know the domain knowledge. This is fundamental to various application domains. For example, one can discover the relations in a retail business without having to study the possible relations among items. Any association rule that is not captured by coherent rules can be denied its importance. These rules are either in contradiction with others (among the positive and negative association rules) or less stringent compared to the definition of logical equivalences. As an example, consider a non-logic-based association rule found within 100 transaction records between item i1 and item i2, with confidence at 75 percent and support at 30 percent. This association rule is not important if the absence of the same item i1 (i.e., ¬i1) is found associated with item i2 with a higher confidence at 85 percent and a higher support at 51 percent. Without further analysis, the first discovery misleads decision makers to conclude that item i1 is associated with item i2, whereas the relation involving ¬i1 is, in fact, stronger. Coherent rules avoid this problem altogether, based on logic.
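A hedged sketch of how the antimonotone condition from the beginning of Section 6.3 can gate the search, for a fixed consequent itemset Y; this is illustrative, not the exact ChSearch implementation:

// If S(X, Y) <= S(notX, Y), the first coherent rule condition already fails,
// and by the antimonotone property on this condition the supersets of X
// need not be explored for the chosen consequent Y.
static boolean worthExtending(int sXY, int sNotXY) {
    return sXY > sNotXY;
}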

CHAPTER 7 IMPLEMENTATION
Implementation is the most crucial stage in achieving a successful system and in giving users confidence that the new system is workable and effective. Here, a modified application is implemented to replace an existing one. This type of conversion is relatively easy to handle, provided there are no major changes in the system. Each program is tested individually at the time of development using sample data, and it is verified that the programs link together in the way specified in the program specifications; the computer system and its environment are tested to the satisfaction of the user. The system that has been developed is accepted and proved to be satisfactory for the user, and so the system is going to be implemented very soon. A simple operating procedure is included so that the user can understand the different functions clearly and quickly.

Initially, as a first step, the executable form of the application is created and loaded on the common server machine which is accessible to all the users, and the server is connected to a network. The final stage is to document the entire system, covering the components and the operating procedures of the system.

7.1 SCREEN SHOTS

Fig. 1. Animal Table
Screen description: The above figure represents the table that is used in this project, which is used for the retrieval and comparison of attributes.

Fig.2. Main Window

Screen description: The above figure represents the main window of this project, through which we can select the animal type, that is, mammal, reptile, etc.

Fig.3. Selecting Mammal Type

Screen description: The above figure represents the mammal type being selected from the drop-down list.

Fig.4. Coherent rules generated for Mammal type

Screen description: The above figure represents the output (coherent rules) generated for the mammal type.

7.2 SAMPLE CODE


OrderedPowerSet.java

package coherent;

import java.util.ArrayList; import java.util.Iterator; import java.util.StringTokenizer;

public class OrderedPowerSet {

    private ArrayList<String> list = new ArrayList();
    Iterator it1 = null;
    ArrayList result = new ArrayList();

    // Builds the ordered power set of the source attribute names.
    public ArrayList getSet(String[] src) {
        result.add(" ");
        int source[] = new int[src.length];
        for (int var = 0; var < src.length; var++) {
            int a = var + 1;
            source[var] = a;
            result.add("" + src[var] + "");
            list.add("" + a + "");
        }
        Iterator it1 = list.iterator();
        while (it1.hasNext()) {
            ArrayList<String> list1 = getSetList(list, source, src);
            list = new ArrayList();
            list = list1;
            list1 = new ArrayList();
            it1.next();
        }
        return result;
    }

    // Extends each index combination in list by every larger index.
    public ArrayList<String> getSetList(ArrayList<String> list, int[] source, String[] src) {
        ArrayList res = new ArrayList();
        it1 = list.iterator();
        while (it1.hasNext()) {
            String s = (String) it1.next();
            String ss = s;
            int x = Integer.parseInt(getLastToken(s, ","));
            for (int i = x; i < source.length; i++) {
                String s1 = ss + "," + source[i];
                res.add(s1);
                addToResult(src, s1);
            }
        }
        return res;
    }

    // Translates an index combination such as "1,3" back to attribute names.
    public void addToResult(String src[], String str) {
        StringTokenizer st = new StringTokenizer(str);
        StringBuffer sb = new StringBuffer();
        while (st.hasMoreTokens()) {
            int loc = Integer.parseInt(st.nextToken(","));
            if (st.hasMoreTokens()) {
                sb = sb.append(src[loc - 1] + ",");
            } else {
                sb = sb.append(src[loc - 1]);
            }
        }
        String r = new String(sb);
        result.add(r);
    }

    // Returns the substring after the last occurrence of token.
    private String getLastToken(String strValue, String token) {
        String strlttoken = null;
        String[] strArray = strValue.split(token);
        strlttoken = strArray[strArray.length - 1];
        return strlttoken;
    }
}

PowerSet.java

package coherent;

import java.io.FileWriter;
import java.io.IOException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.*;
import java.util.logging.Level;
import java.util.logging.Logger;

public class PowerSet {

    ArrayList list;
    int values[] = new int[4];

    // Generates the ordered power set of the selected attributes and
    // writes it to a file in the current working directory.
    public void getSet(String args[], String filename) throws IOException {
        list = new ArrayList();
        OrderedPowerSet ops = new OrderedPowerSet();
        list = ops.getSet(args);
        FileWriter fw = new FileWriter(filename);
        Iterator itr = list.iterator();
        StringBuffer s = new StringBuffer();
        int powercount = 0;
        while (itr.hasNext()) {
            powercount += 1;
            String item = itr.next().toString().replace("{", "[").replace("}", "]");
            s = s.append("{" + item + "},");
        }
        String result = new String(s);
        fw.write(result);
        fw.flush();
        fw.close();
        System.out.println(powercount);
        System.out.println("With " + filename + " ,PowerSet is generated in current working directory");
        s = new StringBuffer();
        setList(list);
        result = null;
    }

    int len;
    Connection con = null;
    Statement stmt = null;
    CompareTable ct = new CompareTable();

    // Tallies q1..q4 for each candidate antecedent and keeps only the
    // combinations that satisfy the coherent rule conditions.
    public void doCalculation(ArrayList list1, ArrayList list2, int

selectedatr) throws SQLException{ ct.connectionEstablish(); Iterator it2=list2.iterator(); String qryatr2=new String(); while(it2.hasNext()){ String s=it2.next().toString(); s=s.replace('[', ' '); s=s.replace(']', ' '); s=s.trim(); if(!s.equals("")){ qryatr2=new String(s); } } DBConnection db=new DBConnection(); int q1=0,q2=0,q3=0,q4=0; try {

con = db.getConnection(); stmt=con.createStatement(); } catch (ClassNotFoundException ex) { Logger.getLogger(PowerSet.class.getName()).log(Level.SEVERE, null, ex); } int totalcount=0; Iterator it1=list1.iterator(); int lines=0; while(it1.hasNext()) { String s=it1.next().toString(); s=s.replace('[', ' '); s=s.replace(']',' '); s=s.trim(); if(!s.equals("")) { StringTokenizer st=new StringTokenizer(s,","); int i=0; len=0; while(st.hasMoreTokens()){ st.nextToken(); len=len+1; } StringTokenizer st1=new StringTokenizer(s,",");

String qryatrs1[]=new String[len]; boolean legbo=false; while(st1.hasMoreTokens()) { qryatrs1[i]=st1.nextToken().trim(); if(qryatrs1[i].equals("LEGS")) { legbo=true; } i=i+1; } int legatr[]={0,2,4,5,6,8}; if(legbo) { for(int j=0;j<legatr.length;j++) { String qry1=ct.prepareQuery(qryatrs1, qryatr2, 1,selectedatr,legatr[j],

true,true); String qry2=ct.prepareQuery(qryatrs1, qryatr2, 1, selectedatr,legatr[j],

true,false); String qry3=ct.prepareQuery(qryatrs1, qryatr2, 1, selectedatr,legatr[j],

false,true); String qry4=ct.prepareQuery(qryatrs1, qryatr2, 1, selectedatr,legatr[j],

false,false); String qry=qry1+" UNION ALL "+qry2+" UNION ALL "+qry3+" UNION ALL "+qry4; ResultSet rs=stmt.executeQuery(qry);

rs.next(); q1=rs.getInt(1); rs.next(); q2=rs.getInt(1); rs.next(); q3=rs.getInt(1); rs.next(); q4=rs.getInt(1); if(((q1>q2)&&(q1>q3))&&((q4>q2)&&(q4>q3))) { totalcount+=1; String rel1=ct.displayOutput1ForLeg(qryatrs1,qryatr2,legatr[j]); System.out.println(q1+" "+q2+" "+q3+" "+q4); System.out.println(rel1); values[0]=q1; values[1]=q2; values[2]=q3; values[3]=q4; this.setValues(values); this.setResult(rel1);

} } }

else { String qry1=ct.prepareQuery(qryatrs1, qryatr2, 1, selectedatr,true,true); String qry2=ct.prepareQuery(qryatrs1, qryatr2, 1, selectedatr, true, false); String qry3=ct.prepareQuery(qryatrs1, qryatr2, 1, selectedatr, false, true); String qry4=ct.prepareQuery(qryatrs1, qryatr2, 1, selectedatr, false, false); String qry=qry1+" UNION ALL "+qry2+" UNION ALL "+qry3+" UNION ALL "+qry4; ResultSet rs=stmt.executeQuery(qry); rs.next(); q1=rs.getInt(1); rs.next(); q2=rs.getInt(1); rs.next(); q3=rs.getInt(1); rs.next(); q4=rs.getInt(1); if(((q1>q2)&&(q1>q3))&&((q4>q2)&&(q4>q3))) { String re1=ct.displayOutput1(qryatrs1,qryatr2); System.out.println(q1+" "+q2+" "+q3+" "+q4); System.out.println(re1); values[0]=q1; values[1]=q2; values[2]=q3;

values[3]=q4; this.setValues(values); this.setResult(re1); this.setX(qryatrs1); this.setY(qryatr2); totalcount+=1; } } } } System.out.println("Total Count = "+totalcount); ct.closeConnection(); } public void setList(ArrayList list) { this.list=list; } public ArrayList getList() { return list; } public void setValues(int[] values) { this.values=values;

} public int[] getValues() { return values; } private String[] x; private String y; private String result;

public String getResult() { return result; } public void setResult(String result) { this.result = result; } public String[] getX() { return x; } public void setX(String[] x) { this.x = x; } public String getY() { return y; }

public void setY(String y) { this.y = y; } }

CompareTable.java

package coherent;

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CompareTable {

    Connection con = null;
    Statement stmt = null;
    ResultSet rs = null;

    CompareTable() {
    }

    // Opens a connection through the project's DBConnection helper.
    public void connectionEstablish() throws SQLException {
        try {
            con = DBConnection.getConnection();
        } catch (ClassNotFoundException ex) {
        }
    }

    public void closeConnection() throws SQLException {
        con.close();
    }

    // Builds a COUNT(*) query for one cell of the contingency table:
    // b1 selects X present/absent, b2 selects Y (TYPE=j) present/absent.
    public String prepareQuery(String atr1[], String atr2, int i, int j, boolean b1, boolean b2) throws SQLException {
        StringBuffer sb = new StringBuffer();
        if (!b1) {
            sb = sb.append("NOT(");
        } else {
            sb = sb.append("(");
        }
        for (int k = 0; k < atr1.length; k++) {
            if (k > 0) {
                sb = sb.append(" and ");
            }
            sb = sb.append(atr1[k] + "=" + i);
        }
        sb = sb.append(")");

if(b2) { sb=sb.append(" and (TYPE="+j+")"); } else { sb=sb.append(" and NOT(TYPE="+j+")"); } String str=new String(sb); String qry="select count(*) from animal where "+str; return qry; } public String displayOutput1(String atr1[],String atr2) { StringBuffer s=new StringBuffer(); s=s.append("{"); for(int i=0;i<atr1.length;i++) { s=s.append(atr1[i]+"(1) , ");

} s=s.append(" } "); s=s.append("==> { "+atr2+"(1) }\n"); s=s.append("Not{ ");

        for (int i = 0; i < atr1.length; i++) {
            s = s.append(atr1[i] + "(1) , ");
        }
        s = s.append(" } ");
        s = s.append("==> Not{" + atr2 + "(1) }\n");
        return (new String(s));
    }

public String displayOutput1ForLeg(String atr1[],String atr2,int legatr) { StringBuffer s=new StringBuffer(); s=s.append("{"); for(int i=0;i<atr1.length;i++) { if(atr1[i].equals("LEGS")) {

s=s.append(atr1[i]+"("+legatr+") , "); } else { s=s.append(atr1[i]+"(1) , ");

} } s=s.append(" } "); s=s.append("==> { "+atr2+"(1) }\n"); s=s.append("Not{ "); for(int i=0;i<atr1.length;i++) { if(atr1[i].equals("LEGS")) { s=s.append(atr1[i]+"("+legatr+") , "); } else { s=s.append(atr1[i]+"(1) , "); } } s=s.append(" } "); s=s.append("==> Not{"+atr2+"(1) }\n"); return new String(s); }

public String prepareQuery(String[] atr1, String atr2, int i, int j, int legatr, boolean b1, boolean b2) throws SQLException { StringBuffer sb=new StringBuffer();

if(!b1) { sb=sb.append("NOT("); } else { sb=sb.append("("); } for(int k=0;k<atr1.length;k++) { if(k>0) { sb=sb.append(" and "); } if(atr1[k].equals("LEGS")) { sb=sb.append(atr1[k]+"="+legatr); } else { sb=sb.append(atr1[k]+"="+i); } } sb=sb.append(")");

if(b2) { sb=sb.append(" and (TYPE="+j+")"); } else { sb=sb.append(" and NOT(TYPE="+j+")"); } String str=new String(sb); String qry="select count(*) from animal where "+str; return qry; } }

CHAPTER 8 SCOPE FOR FUTURE DEVELOPMENT


Every application has its own merits and demerits. The project has covered almost all the requirements. Further requirements and improvements can easily be accommodated, since the coding is mainly structured or modular in nature. Improvements can be appended by changing the existing modules or adding new modules.

Further enhancements:
Further enhancements can be made to the application so that the windows application functions in a more attractive and useful manner than at present. We applied the logic to the zoo dataset; the same logic can be applied to any transactional dataset with slight modifications.

CHAPTER 9 CONCLUSION
We used mapping to logical equivalences according to propositional logic to discover all interesting association rules without loss. These association rules include itemsets that are frequently and infrequently observed in a set of transaction records. In addition to a complete set of rules being considered, these association rules can also be reasoned about as logical implications, because they inherit propositional logic properties. Having considered infrequent items, as well as being implicational, these newly discovered association rules are distinguished from typical association rules. These new association rules reduce the risks associated with using an incomplete set of association rules for decision making, as follows: our new set of association rules avoids reporting that item A is associated with item B if there is a stronger association between item A and the absence of item B. Using prior association rules that do not consider this situation could lead a user to erroneous conclusions about the relationships among items in a data set. Again, identifying the strongest rule among the same items promotes information correctness and appropriate decision making. The risks associated with incomplete rules are reduced fundamentally because our association rules are created without the user having to identify a minimum support threshold. Among the large number of association rules, only those that can be mapped to logical equivalences according to propositional logic are considered interesting and reported.

CHAPTER 10 BIBLIOGRAPHY
Books:
1. Java 2: The Complete Reference, Herbert Schildt.
2. Software Engineering: A Practitioner's Approach, 6th Edition, Tata McGraw-Hill.
3. Software Testing: Principles and Practices, Srinivasan Desikan, Gopalaswamy Ramesh, Pearson Education, India.
4. The Unified Modeling Language User Guide, 2nd Edition, Grady Booch, James Rumbaugh, Ivar Jacobson (for UML concepts and models).

References:
Alex Tze Hiang Sim, Maria Indrawan, Samar Zutshi and Bala Srinivasan, "Logic-Based Pattern Discovery".
R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules between Sets of Items in Large Databases".
A.T.H. Sim, M. Indrawan, and B. Srinivasan, "Mining Infrequent and Interesting Rules from Transaction Records".
S. Brin, R. Motwani, J.D. Ullman, and S. Tsur, "Dynamic Itemset Counting and Implication Rules for Market Basket Data".
X. Wu, C. Zhang and S. Zhang, "Mining Both Positive and Negative Association Rules".

CHAPTER 11 APPENDIX
11.1 List of Symbols
S.NO | SYMBOL NAME        | DESCRIPTION
1    | Class              | Classes represent a collection of similar entities grouped together.
2    | Association        | Association represents a static relationship between classes.
3    | Aggregation        | Aggregation is a form of association. It aggregates several classes into a single class.
4    | Actor              | Actors are the users of the system and other external entities that interact with the system.
5    | Use Case           | A use case is an interaction between the system and the external environment.
6    | Relation (Uses)    | It is used for additional process communication.
7    | Communication      | It is the communication between various use cases.
8    | State              | It represents the state of a process. Each state goes through various flows.
9    | Initial State      | It represents the initial state of the object.
10   | Final State        | It represents the final state of the object.
11   | Control Flow       | It represents the various control flows between the states.
12   | Decision Box       | It represents the decision-making process from a constraint.
13   | Node               | Deployment diagrams use nodes for representing physical modules, which are collections of components.
14   | Data Process/State | A circle in a DFD represents a state or process which has been triggered due to some event or action.
15   | External Entity    | It represents any external entity, such as a keyboard or sensors, used in the system.
16   | Transition         | It represents any communication that occurs between the processes.
17   | Object Lifeline    | Objects' vertical lifelines represent the dimension along which the objects communicate.
18   | Message            | It represents the messages exchanged.

11.2 List of Abbreviations

S.NO | ABBREVIATION | DESCRIPTION
1    | DFD          | Data Flow Diagram
2    | API          | Application Programming Interface
3    | UML          | Unified Modelling Language
4    | GUI          | Graphical User Interface
5    | IDE          | Integrated Development Environment
6    | LBPD         | Logic Based Pattern Discovery
7    | AR           | Association Rule
8    | PD           | Pattern Discovery
