
ABSTRACT

Online shopping is becoming more and more common in our daily lives.
Understanding users’ interests and behaviour is essential in order to adapt e-commerce websites to
customers’ requirements. The information about users’ behaviour is stored in the web server logs. The
analysis of such information has focused on applying data mining techniques where a rather static
characterization is used to model users’ behaviour and the sequence of the actions performed by them is not
usually considered. Therefore, incorporating a view of the process followed by users during a session can be
of great interest to identify more complex behavioural patterns. To address this issue, this paper proposes a
linear-temporal logic model checking approach for the analysis of structured e-commerce web logs.

By defining a common way of mapping log records according to the e-commerce structure, web logs can be easily converted into event logs where the behaviour of users is captured. Then, different predefined queries can be performed to identify behavioural patterns that consider the different actions performed by a user during a session. Finally, the usefulness of the proposed approach has been studied by applying it to a real case study of a Spanish e-commerce website. The results have identified interesting findings that have made it possible to propose some improvements in the website design with the aim of increasing its efficiency.

TABLE OF CONTENTS

Page No.:

1. INTRODUCTION 7

1.1 Objectives 8

2. LITERATURE SURVEY 11

2.1 Introduction 11

2.2 Review of Literature 12

2.3 Research Gap 20

3. REQUIREMENTS AND SPECIFICATIONS 23

3.1 Hardware Specification 23

3.2 Software Specification 23

3.3 Non-Functional Requirements 23

3.4 Input and Output Designs 25

3.5 System Environment 26

4. SYSTEM ANALYSIS 39

4.1 Existing System 39

4.2 Proposed System 39

4.3 Module Description 41

4.4 Logic Model 42

5. SYSTEM DESIGN 44

5.1 Clustering Algorithm 44

5.2 Architecture Design 44

5.3 UML Diagrams 44

5.4 ER Diagrams 46

5.5 Data Flow Diagrams 48

5.6 Sequence Diagrams 50

6. CODING 53

7. SYSTEM TESTING 58

7.1 Testing Methodologies 58

7.2 Types of Tests 59

7.3 Test Cases 62

7.4 Unit Testing 63

7.5 Integration Testing 64

7.6 Acceptance Testing 64

8. IMPLEMENTATION 66

9. CONCLUSION 76

10. BIBLIOGRAPHY 78

1. INTRODUCTION

In today's ever-connected world, the way people shop has changed. People are buying more and more over the Internet instead of shopping in traditional stores. E-commerce provides customers with the opportunity of browsing endless product catalogues, comparing prices, being continuously informed, creating wish lists and enjoying a better service based on their individual interests. This growing electronic market is highly competitive, and a customer can easily move from one e-commerce site to another when their needs are not met. As a consequence, e-commerce business analysts need to know and understand consumers' behavior as they navigate through the website, and to identify the reasons that motivated them to purchase, or not purchase, a product. This behavioral knowledge allows e-commerce websites to deliver a more personalized service, retaining customers and increasing profits.

However, discovering customers' behavior and the reasons that guide their buying process is a very complex task. E-commerce websites provide customers with a wide variety of navigational options and actions: users can freely move through different product categories, follow multiple navigational paths to visit a specific product, or use different mechanisms to buy products. Usually, these user activities are recorded in the web server logs. Web server logs store, in an ordered way, the sequence of web events generated by each user (commonly known as click-streams). Very valuable information about users' behavior is hidden in these logs, and it must be discovered and analyzed. A correct analysis can subsequently be used to improve the website contents and structure, to adapt and personalize contents, to recommend products, or to understand the interest of users in specific products.
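The idea of recovering ordered click-streams from raw server logs can be sketched as follows. This is a minimal illustration, not the preprocessing used in this work: the Common-Log-Format-style lines, the grouping key (client IP rather than a proper session cookie) and the URLs are all assumptions.

```python
import re
from collections import defaultdict

# Hypothetical access-log lines; real e-commerce logs carry more
# fields (session cookies, referrers, user agents, POST bodies).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "GET (?P<url>\S+) HTTP/1\.\d"'
)

def extract_traces(lines):
    """Group click-stream events per client, preserving their order."""
    traces = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            traces[m.group("ip")].append(m.group("url"))
    return dict(traces)

log = [
    '1.2.3.4 - - [10/Oct/2018:13:55:36 +0100] "GET /category/paper HTTP/1.1"',
    '1.2.3.4 - - [10/Oct/2018:13:56:02 +0100] "GET /product/glitter-pack HTTP/1.1"',
    '5.6.7.8 - - [10/Oct/2018:13:57:10 +0100] "GET /cart/add HTTP/1.1"',
]
print(extract_traces(log))
```

In practice, sessions would be delimited by a session identifier or an inactivity timeout rather than by IP address alone.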

Data mining techniques have proved their usefulness for discovering patterns in log files (when applied to the analysis of web server logs, the term web usage mining is used). Their main goal is to discover usage patterns that explain users' interests. Different techniques have been successfully used in the field of e-commerce, such as classification techniques, clustering, association rules and sequential patterns. In many application domains these techniques are used in conjunction with process mining techniques. Such techniques are part of the business intelligence domain and apply specific algorithms to discover hidden patterns and relationships in large data sets.

An e-commerce website is an open system where almost any customer behavior is possible. This flexibility makes the discovery of a process-oriented model representing customers' behavior a difficult task: there are so many different possible interactions that the final process model is either an overfitting "spaghetti" model or an underfitting "flower" model, from which no useful analysis can be done. As a consequence, data mining techniques have been preferred for the analysis of e-commerce websites. Nevertheless, today's data mining techniques and tools have some constraints from the analysis point of view. On the one hand, they do not work directly with the sequences of events (the click-stream and all the data associated with each click) generated during the user's navigation through the website, but with an abstraction of such sequences: a kind of global photograph that ignores causality relations. Such an abstraction describes what happened during a customer's session by means of a set of summarized data, such as the number of visited web pages, the frequency with which each product category was visited, or the time customers spend on a web page or category. On the other hand, most techniques are only able to classify these abstractions or discover simple relationships among certain high-level events of interest.

In this paper we propose the use of Temporal Logic and model checking techniques as an alternative to data mining techniques. Such techniques have proved their applicability to open systems. We propose here a methodology for using them in structured e-commerce websites. The goal is to analyze the usage of e-commerce websites and to discover customers' complex behavioral patterns by checking temporal logic formulas describing such behaviors against the log model. First, web server logs are preprocessed to extract the detailed traces (sequences of events of a user session). Events can be user or system actions performed when a client visits a product or product-category page, when he or she adds a product to the wish list, when the search engine is used, etc. The business analyst can use a set of (predefined) temporal logic patterns to formulate queries that help discover and understand the way clients use the website. Considering the website structure and contents as well as the different types of user actions, these queries can check the existence of complex causality relationships between events contained in the client sessions. From the tool point of view, the need to control the way the checking algorithms are applied, as well as the disappointing performance we obtained when using some model checking tools at our disposal (mainly against big models), led us to develop a specific model checking tool. We built it using the SPOT libraries for LTL model checking.
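As a simplified illustration of the kind of query involved, the LTL response pattern G(trigger → F response) ("every trigger is eventually followed by a response") can be checked directly on finite traces with a few lines of code. This sketch does not use the SPOT API and the event names are invented; it only shows the shape of the analysis:

```python
def holds_response(trace, trigger, response):
    """Check the LTL response pattern G(trigger -> F response) on a
    single finite trace: every `trigger` event must eventually be
    followed by a `response` event."""
    pending = False
    for event in trace:
        if event == response:
            pending = False
        if event == trigger:
            pending = True
    return not pending

# Invented event names: does every "add_to_wishlist" lead to a "purchase"?
sessions = [
    ["view_product", "add_to_wishlist", "view_product", "purchase"],
    ["add_to_wishlist", "view_product"],  # wish list abandoned
]
satisfied = sum(holds_response(s, "add_to_wishlist", "purchase") for s in sessions)
print(f"{satisfied}/{len(sessions)} sessions satisfy the pattern")  # → 1/2
```

The tool described here performs this kind of check with automata built by the SPOT libraries rather than with ad hoc loops, which scales to arbitrary LTL formulas.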

As a use case of the proposed approach we describe the analysis carried out for the Up & Scrap e-commerce website, an important online Spanish provider of scrapbooking products. The case study describes the way raw logs have been processed, how the traces have been extracted, and how users' behavioral patterns have been formulated and checked against the log. We also provide some possible interpretations of the results obtained for the queries, as well as some possible actions that could help in re-designing the website to improve it.

1.1 OBJECTIVES:

• Input design is the process of converting a user-oriented description of the input into a computer-based system. This design is important to avoid errors in the data-input process and to show the management the correct direction for getting accurate information from the computerized system.
• It is achieved by creating user-friendly screens for data entry that can handle large volumes of data. The goal of designing input is to make data entry easier and error-free. The data entry screen is designed in such a way that all data manipulations can be performed, and it also provides record-viewing facilities.
• When data is entered, it is checked for validity. Data can be entered with the help of screens, and appropriate messages are provided as and when needed so that the user is never left confused. Thus the objective of input design is to create an input layout that is easy to follow.
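The validity checks described above can be sketched as a simple validation routine. The field names and rules here are hypothetical, not taken from the actual system:

```python
def validate_record(record):
    """Return a list of error messages for a hypothetical data-entry record."""
    errors = []
    # Required free-text field must be non-empty after trimming.
    if not record.get("name", "").strip():
        errors.append("Name is required.")
    # Numeric field must be a positive integer.
    qty = record.get("quantity")
    if not isinstance(qty, int) or qty <= 0:
        errors.append("Quantity must be a positive integer.")
    # Very loose email sanity check.
    if "@" not in record.get("email", ""):
        errors.append("Email address looks invalid.")
    return errors

print(validate_record({"name": "Ana", "quantity": 3, "email": "ana@example.com"}))  # → []
print(validate_record({"name": "", "quantity": 0, "email": "nope"}))
```

Each returned message corresponds to one of the "appropriate messages" shown to the user at data-entry time.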

2. LITERATURE SURVEY

Paper one gave an approach for a recommendation system in an e-commerce environment. Paper two studied different classification algorithms for consumer online shopping attitudes and behavior. Paper three gives a way of doing customer relationship management with the help of data mining. Paper four provides a quality management framework for e-commerce websites: for evaluating the quality of an e-commerce application, an E-commerce Total Quality Management Framework (E-TQMF) is proposed which takes into consideration the quality aspect from both the customer's and the quality expert's perspective. Paper five describes the process, methods and specific applications of data mining on e-commerce sites. Paper six makes clear how traditional e-commerce has changed and how to grow and scale a successful e-commerce business with new technology. Paper seven conceptualizes and strategizes customer knowledge management. Paper eight implements and unifies e-commerce analytics related to products, transactions, customers, merchandising and marketing; more effectively measures performance associated with customer acquisition, conversion, outcomes and business impact; and uses analytics to identify the tactics that will create the most value and execute them more effectively. Paper nine notes that there is always an ever-growing list of requirements when designing an e-commerce application, which needs to be flexible enough for easy adaptation; the MEAN stack allows those requirements to be met on time while building responsive applications using JavaScript. Paper ten observes that e-commerce is a multi-disciplinary area, which should be developed in co-operation with existing fields such as Information Systems and Technology; Marketing, Finance and Supply Chain Management; Business Strategy and Management; Public Policy; Computer Science and Telecommunications; and Legal Studies, soliciting papers on current technologies from these areas as well as on completely new topics.

2.1 Introduction:
In the last several years, many research activities dealing with Web usage mining and Web personalization have been conducted. Much of the work emphasizes retrieving useful patterns and rules with the help of data mining approaches in order to recognize users' access behavior, so that conclusions can be drawn about website reformation or updating. Users navigate through a site, and a recommendation engine assists them as they do so. Some further enhanced web-based systems provide much more functionality, offering ways of dynamically modifying a site's structure. Most research works combine more than one of the available methods in Web personalization, such as content management, user profiling techniques, publishing methods and Web usage mining techniques. In what follows we give brief details of the important research works in the Web mining and personalization area.

One of the earliest efforts to take advantage of the information that can be collected by observing a user's navigation through a Web site was Letizia, a client-side agent that monitors the user's browsing behavior and then searches for potentially interesting pages to recommend. Using a best-first search with a heuristic approach, the agent also keeps track of neighboring pages, thus deducing interest from the user's navigational behavior and providing the user with suggestions.
2.2 Review of Literature:
Though personalization of user data is a recent phenomenon driven by the rise of Internet usage, over the years Web mining techniques have developed fully fledged support for providing a personalized experience to users. A lot of work has been done to improve Web personalization.
• Denis Parra and Peter Brusilovsky (2015) investigated the role of the user's ability to control a personalized system by implementing and analyzing a new interactive recommender interface, SetFusion. They checked whether allowing the user to control the process of integrating multiple algorithms resulted in increased engagement and a better user experience. They provided an interactive visualization, a Venn diagram combined with sliders, which offers an efficient visual model for information filtering. The authors also provide a three-dimensional evaluation of the user experience: objective metrics, subjective user perception, and behavioral measures. Through the analysis of these metrics they demonstrated results such as the effect of trusting tendency on accepting recommendations, and uncovered the importance of features such as being a native speaker (Denis Parra, Peter Brusilovsky, 2015).
• Haoyuan Feng, Jin Tian, Harry Jiannan Wang and Minqiang Li (2015) note that capturing and understanding user interests is a key part of social media analytics. Users of social media sites generally belong to multiple interest communities, and their interests change constantly over time. The authors proposed a new solution to this research problem by developing a temporal overlapping community detection method based on time-weighted association rule mining. They conducted experiments using the MovieLens and Netflix datasets, and the experimental results show that the proposed approach outperforms several existing methods in recommendation precision and diversity (Haoyuan Feng, Jin Tian, Harry Jiannan Wang, Minqiang Li, 2015).
• Ahmad Hawalah and Maria Fasli (2015) note that web personalization systems improve the user experience by providing tailor-made services based on the user's interests and preferences, which are usually stored in user profiles; for such systems to remain effective, the profiles need to be able to adapt to and reflect the users' changing behavior. The authors introduce a set of techniques designed to capture and track user interests and to maintain dynamic user profiles within a personalization system. User interests are characterized as ontological concepts, created by mapping the web pages traversed by a user to a reference ontology and later used to learn short-term and long-term interests. A multi-agent system assists in and coordinates the capture, storage, management and adaptation of user interests. They also propose a search system that uses the dynamic user profile to provide a personalized search experience (Ahmad Hawalah, Maria Fasli, 2015).
• Haolong Fan, Farookh Khadeer Hussain, Muhammad Younas and Omar Khadeer Hussain (2015) introduce a fully designed, cloud-oriented personalization framework to ease the collection of preferences and the delivery of corresponding Software as a Service (SaaS) services. The approach they adopted in the design and development of the proposed framework is to combine various models and techniques in an original way. The objective is to provide an integrated and structured background wherein SaaS services can be delivered with enhanced personalization quality and performance (Haolong Fan, Farookh Khadeer Hussain, Muhammad Younas, Omar Khadeer Hussain, 2015).
• Preeti, Ankit and Purnima (2014) emphasize visualization in recommendation systems. They propose an argumentation-based recommender framework that uses a hybrid approach in which two results, reliable clients and the arguments that occur between client agents, are visualized using the D3 tool. This visualization of the argumentation graph is a better way to understand the reasoning provided by the recommender system (Preeti, Rajpal, Ankit and Khurana, Purnima, 2014).
• Lei Li, Li Zheng, Fan Yang and Tao Li (2014) made an experimental observation about the evolution of user interests in real-world news recommender systems, and then proposed a recommendation approach in which users' long-term and short-term preferences are smoothly combined when recommending news articles. Given a hierarchy of currently published news, groups of news the user might prefer are identified using the long-term profile; then, within each chosen group, a list of news items is selected as the suggested candidates based on the short-term user profile (Li, Lei, Zheng, Li, Yang, Fan and Li, Tao, 2014).
• Ji-Hong Park (2014) identifies that in SNSs, personalization is represented by the updating and maintenance of profile pages. His study assumes that personalization affects the continued use of SNSs through two factors: switching cost, an extrinsic factor, and satisfaction, an intrinsic factor. He conducted a web-based survey with samples of SNS users from six universities in the US, and also conducted in-person interviews with university scholars to extract their thoughts on SNSs. The quantitative analysis was carried out by testing the expected model with five hypotheses using a structural equation modeling (SEM) technique; the transcribed interview data was analyzed following the constant comparative technique. The results indicate that personalization increases both switching cost and satisfaction, which in turn affect the further use of SNSs. These results suggest that it is necessary to combine both extrinsic and intrinsic factors of user perception when adding personalization features to SNSs (Ji-Hong Park, 2014).

• Onur Yilmaz (2013) presented a tag-based website recommendation method in which similarity measures are combined with the semantic connections of tags. The methodology performs well at recommending new websites and at capturing the client's current interests. However, there is no control over the tags provided by clients in this framework: even though clients do not intend to mislead the method while tagging websites, different purposes of labeling can create confusion (Yilmaz, Onur, 2013).
• Su, Chang and Tseng (2013) present a music recommendation framework that uses social media tags rather than client ratings to compute the similarity between music pieces. Through the proposed tag-based similarity, the client preferences hidden in the tags can be inferred effectively. Empirical evaluations on real social media datasets reveal that the proposed approach, which uses social tags, beats existing approaches that use only ratings at predicting the client's musical preferences (Su, Ja-Hwung, Chang, Wei-Yi and S. Tseng, Vincent, 2013).
• Hong and Jeon (2013) proposed a Personalized Research Paper Recommendation System (PRPRS) that extensively designs and implements a user-profile-based algorithm for keyword extraction and keyword inference. PRPRS estimates the similarity between a given topic and the collected papers using cosine similarity, which is used to suggest initial papers for each topic in information retrieval (Hong, Kwanghee, Jeon, Hocheol and Jeon, Changho, 2013).
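Cosine similarity between a topic and a paper, as used above, can be illustrated with a small sketch. Raw term counts are used here for simplicity; the actual system's weighting scheme may differ (TF-IDF is typical):

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between two texts using raw term counts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

topic = "temporal logic model checking"
paper = "model checking of temporal logic formulas over event logs"
print(round(cosine_similarity(topic, paper), 3))  # → 0.667
```

A similarity of 1.0 means identical term distributions; 0.0 means no shared terms.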
• Lee, Lee and Kim (2013) proposed a Personalized Academic Research Paper Recommendation System (PARPRS) using collaborative filtering, which suggests related articles that may be of interest to each researcher. First they used a web crawler to retrieve research papers from the web; then text similarity is used to define the similarity between two papers; finally, they introduce a recommender system developed with collaborative filtering methods. Evaluation results demonstrate that the system recommends good-quality research papers (Lee, Joonseok, Lee, Kisung and G. Kim, Jennifer, 2013).
• Bedi and Agarwal (2013) present work on an Aspect-Oriented Trust-Based Mobile Recommender System (AOTMRS) that uses the ideas of trust and Aspect-Oriented Programming for guidance-seeking and decision-making procedures, as in real life. The proposed system develops a mobility aspect and generates reliable proposals based on the user's preferences and demographic data such as location, time, etc. (Bedi, Punam and Agarwal, Sumit Kumar, 2013).
• Pan and Chen (2013) gave a new Group Bayesian Personalized Ranking (GBPR) method for one-class collaborative filtering, or collaborative ranking, which directly formulates users' ranking-related preferences (Pan, Weike and Chen, Li, 2013).
• S. Geetha Rani (2013) proposes a click-count and link-click oriented ranking technique. In this approach the clicks on each query are counted and the link in the submitted query is also evaluated, so that the relevance between the query and the obtained information can be analyzed, evaluated and ranked (Rani, S. Geetha, 2013).
One of the subareas of web personalization is e-learning. Personalization of e-learning is accepted as a solution for accommodating the wealth of individual differences and the varying abilities for knowledge acquisition. To apply a predefined personalization technique to a course, some of the student's characteristics have to be considered. Fathi, Leila and Mohammad (2013) have considered solutions to the question of how to make e-learning personalization dynamic according to an appropriate strategy. Their study combines the automatic evaluation, selection and use of personalization techniques.
• D. Suresh and S. Prakasam (2013) suggest that instructors use an e-learning system based on a rank-based clustering algorithm to obtain consistency in the delivery of content, quality content in learning documents, student self-learning and improvement in test performance (Suresh, D. and Prakasam, S., 2013). Diana Butucea (2013) provides a theoretical structure, through a technical implementation concept, relative to the e-learning context for visually impaired persons. The solution arising from the theoretical structure proposes an analytical approach to the computer-aided learning system: it defines the approach of personalized learning and gives an example implementation of a software system, which then provides support and help for visually challenged computer users (Butucea, Diana, 2013).
• Yarandi, Jahankhani and Tawil (2013) gave an ontology-based technique to develop an adaptive e-learning system. It is based on the design of semantic content, learner and domain models to tailor the teaching process to individual learners' requirements. The adaptive e-learning system is able to support personalization based on the learner's ability, learning approach, preferences and knowledge level (Yarandi, Maryam, Jahankhani, Hossein and Tawil, Abdel-Rahman H., 2013).
• Cakula and Sedleniece (2013) aimed to identify overlapping points between KM and e-learning steps in order to upgrade the website structure and the transfer of personalized course knowledge with the help of effective ontology and metadata norms. This research gives a knowledge management implementation for a personalized e-learning system (Cakula, Sarma and Sedleniece, Maija, 2013).
Another main application area of personalization is e-commerce. Personalized recommendation technology in e-commerce is widely used to solve the problem of information overload. However, with the future growth in the number of e-commerce clients and products, the original recommendation algorithms and systems face many new challenges, such as representing users' interests more accurately, providing more diverse recommendation modes, and supporting large-scale expansion. To face these challenges, Dong et al. (2013) designed and implemented a personalized hybrid recommendation system that can support huge data sets using Cloud technology. The World Wide Web has witnessed three generations, from the Information Web to the Social Web to the Semantic Web; the journey is now towards the fourth generation, termed the Wisdom Web. At present the Internet has become an important part of our lives: users want the Web to identify their requirements and interests and deliver content as needed. Search engines play a vital role in information extraction and delivery. Aarti Singh and Basim Alhadidi presented a framework for a knowledge-oriented personalized search engine that may be able to give a personalized environment to its users. This framework gives ideas for the next generation of the WWW and contributes towards the Wisdom Web. In another research paper, Aarti Singh and Dr. Singh (2013) highlight the technologies contributing towards the next generation of the WWW and also suggest future directions for Web personalization.
Another new area is suggested by Hannak et al. (2013). They proposed an approach for measuring personalization in Web search results, applied the methodology to a number of users (approx. 200) on Google Web Search, and extracted the results. The causes of personalization on Google Web Search are also identified. Their effort is a step towards understanding the scope and effects of personalization in Web search engines (Hannak, Aniko, Sapieżyński, Piotr, Molavi Kakhki, Arash, Krishnamurthy, Balachander, Lazer, David, Mislove, Alan and Wilson, Christo, 2013).
• Ibrahim F. Moawad, Hanaa Talha, Ehab Hosny and Mohamed Hashim (2012) proposed a system that builds a user profile from initial information and maintains it through implicit feedback from the user, in order to keep the profile complete, dynamic and up to date. In the web search process, the model semantically optimizes the user query in two steps: first using the user profile preferences, and then using the WordNet ontology. The model builds on the strengths of search engines by using them to retrieve the web search results (Ibrahim F. Moawad, Hanaa Talha, Ehab Hosny, Mohamed Hashim, 2012).
One of the analyses, by Chhavi Rana (2012), gives a precise and broad understanding of the work done in this area. The paper reviews the usefulness of different approaches along with the projects associated with some of the techniques. Since then there has been a lot of improvement in the area of Web personalization, and this survey tries to summarize the work done in its various subfields after 2012. With the dramatic increase in the number of websites on the Internet, tagging has become popular for finding related, personal and important documents (Rana, Chhavi, 2012).
• A. Vaishnavi (2011) proposed a technique for developing a Web personalization system using Modified Fuzzy Probabilistic C-Means (MFPCM). She claims that this approach increases the possibility that the URLs given to a user will be of interest to him or her (A. Vaishnavi, 2011). Another area explored by many researchers is research paper recommendation systems, and different researchers have used different approaches.

• Namita Mittal, Richi Nayak, MC Govil and KC Jain (2010) suggest an approach to personalized web search which uses an ontology to represent the context of the client's needs, a dynamic client profile updated over time, and recommendations received collectively from similar users. The proposed method, combining the ontology, the dynamic client profile and collaborative filtering, improves the accuracy of information retrieval (Mittal, Namita, Nayak, Richi, Govil, MC and Jain, KC, 2010).
• Xia Min-jie and Zhang Jin-ge (2010) present a User ID–URL association matrix built from log data; the strategy is simple and easy to apply to enhance the algorithm. It computes the User ID–URL association matrix and a distance matrix to cluster users into groups, and thereby recommends personalized goods that may be preferred by the user more effectively (Min-jie, Xia and Jin-ge, Zhang, 2010).
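A toy version of the User ID–URL matrix idea might look as follows; the click data, the binary encoding and the zero-distance grouping rule are simplifying assumptions, not the authors' exact algorithm:

```python
# Hypothetical click data: (user_id, url) pairs extracted from log records.
clicks = {
    ("u1", "/paper"), ("u1", "/glue"),
    ("u2", "/paper"), ("u2", "/glue"),
    ("u3", "/stamps"),
}

urls = sorted({url for _, url in clicks})
users = sorted({uid for uid, _ in clicks})
# Binary User ID x URL matrix: 1 if the user visited the URL, else 0.
matrix = {u: [1 if (u, url) in clicks else 0 for url in urls] for u in users}

def distance(a, b):
    """Hamming distance between two users' visit vectors."""
    return sum(x != y for x, y in zip(a, b))

# Greedy grouping: a user joins the first group whose leader is at distance 0.
groups = []
for u in users:
    for g in groups:
        if distance(matrix[u], matrix[g[0]]) == 0:
            g.append(u)
            break
    else:
        groups.append([u])
print(groups)  # → [['u1', 'u2'], ['u3']]
```

Users grouped together can then be shown items that others in the same group preferred.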
• Teerakorn Bounoy and Aranya Walairacht (2010) proposed a technique to learn user profiles from users' search histories, which can be collected without direct user involvement. The user profiles are created dynamically from the user's search history and upgraded with a common profile automatically retrieved from a general category hierarchy, which is used to improve retrieval effectiveness in Web search based on a new click-through interpretation. The categories that may be useful to the user are deduced from his or her query and the two profiles. The experimental results show that the accuracy of using both profiles is better than using only the user profile or only the general profile, and also indicate that the technique reduces the number of documents retrieved by the search engine by almost one third (Bounoy, Teerakorn and Walairacht, Aranya, 2010).
• Qingyan Yang, Ju Fan, Jianyong Wang and Lizhu Zhou (2010) provide a personalized Web page recommendation model called PIGEON (personalized web page recommendation). It is an iterative algorithm based on graph theory. It uses collaborative filtering and a topic-aware Markov model to discover users' topics of interest, which are used to measure user similarities. The topic-aware Markov model studies users' navigation behaviour, capturing both the temporal and topical relevance of pages so that topically coherent pages can be recommended (Yang, Qingyan, Fan, Ju, Wang, Jianyong and Zhou, Lizhu, 2010).
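A plain first-order Markov model over page visits, which a topic-aware model extends, can be sketched as follows. The session data here is invented for illustration:

```python
from collections import Counter, defaultdict

def train_markov(sessions):
    """Estimate first-order transition probabilities P(next | current)
    from a list of ordered page-visit sessions."""
    counts = defaultdict(Counter)
    for s in sessions:
        for cur, nxt in zip(s, s[1:]):
            counts[cur][nxt] += 1
    return {cur: {nxt: c / sum(nc.values()) for nxt, c in nc.items()}
            for cur, nc in counts.items()}

sessions = [
    ["home", "news", "sports"],
    ["home", "news", "news"],
    ["home", "sports"],
]
model = train_markov(sessions)
# From "home", "news" was the next page 2 times out of 3.
print(model["home"])
```

Recommending the highest-probability next page from the current one is the simplest use of such a model; PIGEON additionally conditions on topics.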
• Yuewu Dong and Jiangtao Li (2010) aim to implement personalized interfaces and services for each user according to their individual behavior. A practical Web-mining-based architecture for a personalized learning system is suggested, providing the learner with a personalized learning environment. The Web mining process in this architecture includes four steps:
• Data collection,
• Data pretreatment,
• Data analysis,
• Computation and generation of personalized output.
They applied the proposed architecture to an education system and obtained acceptable results (Dong, Yuewu and Li, Jiangtao, 2010).
• Jie Yu and Fangfang Liu (2010) note that personalized Web search provides an effective means of giving different users specific results when they submit the same query. Real-time information about the user's needs is a major concern in personalized search. Most methods focus on the site structure and on user profiles based on Web pages, which affects the efficiency of the search engine, while the dynamics of the user profile are generally ignored. They introduced a system that correctly captures users' preferences for effective personalized search (Yu, Jie and Liu, Fangfang, 2010).
• Irene Garrigós, Jaime Gomez and Geert-Jan Houben (2010) identify personalization of websites as an important matter in Web modeling methods due to their large and diverse groups of users. However, because the same design concepts are represented with too many different details in different methodologies, personalization specifications cannot be used outside the scope of a single tool or method. In some cases personalization is not defined as a separate dimension, which makes it difficult to maintain and update. They give a solution to the identified problems by presenting a generic modeling technique to obtain the specification of the personalization; such specifications can be reused among multiple websites and different development environments (Irene Garrigós, Jaime Gomez, Geert-Jan Houben, 2010).
 Alan L. Montgomery and Michael D. Smith (2009) identify personalization as a main component of
an interactive marketing strategy. Its purpose is to tailor a standard product or service to an individual
customer's needs, with the goal of increased profit for the producer and increased value for the
consumer. Applications of personalization have advanced considerably with the growth of the
Internet, since it provides an information-rich environment that is well suited to interactivity
(Alan L. Montgomery, Michael D. Smith, 2009).
 R. Forsati, M. R. Meybodi and A. Ghari Neiat (2009) introduced a new website personalization
system based on their proposed Weighted Association Rule (WAR) model. Association rule mining
is performed after assigning a weight to each page based on the time each Web user spends on the
page and the number of times the page is visited. The proposed weighting measure determines the
weight of a page for a user, giving more weight to the pages that are more useful to that user, in order
to capture the user's information need more precisely and to recommend more useful pages
(Forsati, Meybodi and Neiat, A. Ghari, 2009).
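The weighting idea can be sketched as follows. The exact WAR formula is not reproduced above, so this Java sketch simply combines normalised time-on-page with normalised visit count; the page names and statistics are invented for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of page weighting from time spent and visit counts.
// This is not the paper's exact WAR formula, only the general idea.
public class PageWeighting {

    // weight(p) = (timeOnPage / totalTime) * (visits / totalVisits)
    static Map<String, Double> weigh(Map<String, long[]> stats) {
        long totalTime = 0, totalVisits = 0;
        for (long[] s : stats.values()) { totalTime += s[0]; totalVisits += s[1]; }
        Map<String, Double> weights = new LinkedHashMap<>();
        for (Map.Entry<String, long[]> e : stats.entrySet()) {
            double w = ((double) e.getValue()[0] / totalTime)
                     * ((double) e.getValue()[1] / totalVisits);
            weights.put(e.getKey(), w);
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<String, long[]> stats = new LinkedHashMap<>(); // page -> {seconds, visits}
        stats.put("/home", new long[]{30, 10});
        stats.put("/product", new long[]{240, 5});
        stats.put("/checkout", new long[]{90, 2});
        System.out.println(weigh(stats));
    }
}
```

Pages a user dwells on and revisits thus receive larger weights, and those weights can then drive the association rule mining.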
 WANG Xiao-Gang and LI Yue (2009) proposed an intelligent Web recommender system known as
WAPPS, based on sequential Web access patterns. In the given system, a sequential Web access
pattern mining algorithm (CS-mine) is used to extract frequent sequential Web access patterns. The
mined patterns are stored in a Pattern-tree, which is then used for matching and generating Web links
for online recommendations (Xiao-Gang, WANG and Yue, LI, 2009).
 Magdalini Eirinaki and Michalis Vazirgiannis (2005) choose a Markov chain to summarize the
navigational tree (NTG). They present initial results on the impact of incorporating their proposed
method into the Markov chain prediction model. More specifically, they select the top-n most
popular paths, as derived from the Web logs; for these paths they compute a recommendation set
using two variations of Markov chains, and then compare the recommendation sets with the actual
paths followed by the users (Eirinaki, Magdalini and Vazirgiannis, Michalis, 2005).
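A first-order Markov chain predictor of the kind discussed here can be sketched as follows; the session paths are invented, and the paper's two specific variations are not reproduced:

```java
import java.util.*;

// Minimal first-order Markov chain next-page predictor: transition counts are
// estimated from logged paths, and the most frequent successor is recommended.
public class MarkovRecommender {

    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Record every consecutive page pair of a session path.
    void train(List<String> path) {
        for (int i = 0; i + 1 < path.size(); i++) {
            counts.computeIfAbsent(path.get(i), k -> new HashMap<>())
                  .merge(path.get(i + 1), 1, Integer::sum);
        }
    }

    // Recommend the page with the highest transition count from 'page'.
    String recommend(String page) {
        Map<String, Integer> next = counts.get(page);
        if (next == null) return null;
        return Collections.max(next.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        MarkovRecommender m = new MarkovRecommender();
        m.train(Arrays.asList("/home", "/shoes", "/cart"));
        m.train(Arrays.asList("/home", "/shoes", "/shirts"));
        m.train(Arrays.asList("/home", "/shoes", "/cart"));
        System.out.println(m.recommend("/shoes")); // most users go to /cart next
    }
}
```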
 Zhao et al. (2005) put forward a unified model of multilevel Web personalization that differs from
existing approaches. They identify that the two dimensions, their hybrid, and the underlying
customization can be situated within an information retrieval model. The model is based on
hierarchical theory: each concept can be understood from a variety of views, and every view contains
many levels. To allow better personalization, they suggest applying the model to multilevel service
structures and multilevel user structures; the work uses these multilevel structures to build more
accurate Web recommendation and personalization systems (Zhao, Yan, Yao, Yiyu and Zhong,
Ning, 2005).
 Sutheera Puntheeranurak and Hidekazu Tsuji (2005) developed an approach that uses information
learned from users' Web log data to create accurate, complete individual profiles. One division of a
profile contains facts about the user, while the other division contains rules that describe the user's
behaviour; Web usage mining is used to derive these behavioural rules from the data
(Puntheeranurak, Sutheera and Tsuji, Hidekazu, 2005).
One of the more advanced systems is WebPersonalizer, proposed by Mobasher
et al. (2001), which provides a framework for mining Web log files and deriving recommendations for
current users from the browsing behaviour of previous users. WebPersonalizer relies exclusively on
anonymous Web usage data provided by Web logs and on the hypertext structure of a site. After data
assembly and preprocessing (converting the content, usage and structure information available in the
different data sources into various data representations), data mining techniques such as clustering,
pattern discovery, association rules and classification are used to discover interesting usage patterns.
The results are then used to form aggregated usage profiles, expressed as decision rules. A
recommendation engine matches each user's activity against these profiles and provides the user with
a list of suggested hypertext links. The framework has since been extended [Mobasher et al.] to
integrate content profiles into the recommendation process as a way to improve the effectiveness of
personalization. Content and usage profiles are represented as weighted collections of pageview
records: pages with similar content can be grouped, and this grouping can be performed over the
content profiles in various ways. The overall objective is a consistent representation for both content
and usage profiles so that they can be integrated smoothly. The system is divided into two modules:
an offline component comprising data preparation and the specific Web mining tasks, and an online
module, the real-time recommendation engine (Mobasher, Bamshad, Dai, Honghua, Luo, Tao and
Nakagawa, Miki, 2002).
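As a rough illustration of such a recommendation engine's matching step, the Java sketch below treats the active session and each aggregated usage profile as weighted pageview vectors and compares them by cosine similarity. The profiles and the similarity measure are our own illustrative choices, not WebPersonalizer's exact method:

```java
import java.util.*;

// Sketch: match a session's weighted pageviews against aggregated usage
// profiles using cosine similarity. Profile contents are invented.
public class ProfileMatcher {

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na  += e.getValue() * e.getValue();
        }
        for (double v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Return the name of the profile most similar to the current session.
    static String bestProfile(Map<String, Double> session,
                              Map<String, Map<String, Double>> profiles) {
        String best = null;
        double bestSim = -1;
        for (Map.Entry<String, Map<String, Double>> p : profiles.entrySet()) {
            double sim = cosine(session, p.getValue());
            if (sim > bestSim) { bestSim = sim; best = p.getKey(); }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> profiles = new LinkedHashMap<>();
        profiles.put("bargain-hunters", Map.of("/sale", 0.9, "/coupons", 0.7));
        profiles.put("researchers", Map.of("/reviews", 0.8, "/specs", 0.9));
        Map<String, Double> session = Map.of("/sale", 1.0, "/reviews", 0.2);
        System.out.println(bestProfile(session, profiles)); // bargain-hunters
    }
}
```

The pages of the winning profile that the user has not yet visited would then be the candidate links to recommend.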
 Cooley et al. and Srivastava et al. (2000) define Web usage mining as a three-phase process
comprising preprocessing, pattern discovery, and pattern analysis. Their prototype system, WebSIFT,
performs intelligent cleaning and preprocessing to identify users and server sessions and to infer
cached page references using the referrer field, and it also performs content and structure
preprocessing [Cooley et al.]. Pattern discovery is achieved using data mining techniques such as
clustering, sequential pattern analysis, association rules and classification, as well as general
statistical algorithms. Finally, the results are evaluated through a simple knowledge query
mechanism, a visualization tool, or the information filter, which uses the preprocessed data and
structure information to automatically filter the results of the knowledge discovery algorithms
(Mobasher, B., Cooley, R. and Srivastava, J., 2000).
 Armstrong et al. (1995) proposed WebWatcher, one of the most prominent frameworks from the
beginnings of Web usage mining. The idea is a tour-guide agent that gives navigation hints to the
user through a given Web collection, based on its knowledge of the user's interests, the location and
relevance of the various items in the collection, and the way in which other users have interacted
with the collection in the past. The framework begins by profiling the user, acquiring information
about her interests. Every time the user requests a page, the request is routed through a proxy server
in order to easily track the user's session across the Web. Its strategy for giving recommendations is
based on input from previous visits. A comparable framework is Personal WebWatcher, which is
structured to track a single user and model his interests. It stores only the addresses of the pages
requested by the user and highlights interesting hyperlinks without involving the user in its learning
process, i.e. without requesting keywords or opinions about pages as WebWatcher does
(Armstrong, R., Freitag, D., Joachims, T. and Mitchell, T., 1995).
2.3 Research Gap:


From the research papers reviewed above on the Web personalization problem in Web usage
mining, we can draw the following conclusions. One of the main application areas of personalization is
e-commerce, where personalized recommendation technology is widely used to address the problem of
information overload. However, with the continuing growth in the number of e-commerce customers and
products, existing recommendation algorithms and systems face many new challenges, such as
representing users' interests more accurately, providing more varied recommendation modes, and
supporting large-scale expansion. Another subarea of Web personalization is e-learning: personalization
of e-learning is accepted as a solution for accommodating the richness of individual differences and the
differing abilities involved in knowledge communication. To apply a predefined personalization
technique to a course, certain student characteristics have to be considered. Another area explored by
many researchers is research paper recommendation systems, where different researchers have used
different approaches. Haoyuan Feng, Jin Tian, Harry Jiannan Wang and Minqiang Li (2015) suggest that
capturing and understanding user interests is a central part of social media analytics: users of social
media sites generally belong to multiple interest communities, and their interests change constantly over
time. Ahmad Hawalah and Maria Fasli (2015) suggest that Web personalization systems improve the
user experience by providing tailor-made services based on the user's interests and preferences, which
are usually stored in user profiles; for such systems to remain effective, the profiles must be able to
adapt to and reflect the users' changing behaviour. Teerakorn Bounoy and Aranya Walairacht (2010)
proposed a technique to learn user profiles from users' search histories, which can be collected without
direct user involvement. The user profile is created dynamically from the user's search history and is
augmented by a common profile automatically retrieved from a general category hierarchy, and it is
used to improve retrieval effectiveness in Web search based on a new click-through interpretation.
Sutheera Puntheeranurak and Hidekazu Tsuji (2005) developed an approach that uses information
learned from users' Web log data to create accurate, complete individual profiles.
REQUIREMENTS AND SPECIFICATIONS

3. REQUIREMENTS AND SPECIFICATIONS

3.1 Hardware Specifications:-

 Processor - Pentium –III


 Speed - 1.1 GHz
 RAM - 256 MB (min)
 Hard Disk - 20 GB
 Key Board - Standard Windows Keyboard
 Mouse - Two or Three Button Mouse
 Monitor - SVGA

3.2 Software Specification:-


 Operating System : Windows 95/98/2000/XP /7

 Application Server : Tomcat5.0/6.X /8.X

 Front End : HTML, Java, Jsp

 Scripts : JavaScript, jquery, ajax

 Server side Script : Java Server Pages.

 Database Connectivity : Mysql.

3.3 Non Functional Requirements:

 Secure access of confidential data (user’s details). SSL can be used.


 24 X 7 availability.
 Better component design to achieve better performance at peak times
 A flexible, service-based architecture is highly desirable for future extension

N-Tier Architecture:

Simply stated, an n-tier application helps us distribute the overall functionality into various tiers or layers:

Presentation Layer, Business Rules Layer, Data Access Layer. Each layer can be developed independently
of the other provided that it adheres to the standards and communicates with the other layers as per the
specifications.

This is one of the biggest advantages of the n-tier application: each layer can potentially treat the other
layers as a 'black box'. In other words, each layer does not care how another layer processes the data, as
long as it receives the right data in the correct format.
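The layering described above can be sketched in Java; the class and method names are invented for illustration:

```java
import java.util.*;

// Toy illustration of n-tier layering. The interface lets the business layer
// treat data access as a black box: it depends only on the contract, not on
// how or from where the data is fetched.
interface ProductDao {                             // Data Access Layer contract
    double priceOf(String productId);
}

class InMemoryProductDao implements ProductDao {   // one possible implementation
    private final Map<String, Double> table = Map.of("P1", 10.0, "P2", 4.5);
    public double priceOf(String id) { return table.getOrDefault(id, 0.0); }
}

class OrderService {                               // Business Rules Layer
    private final ProductDao dao;
    OrderService(ProductDao dao) { this.dao = dao; }
    double total(List<String> items) {             // business rule: sum item prices
        return items.stream().mapToDouble(dao::priceOf).sum();
    }
}

public class NTierDemo {                           // Presentation Layer stand-in
    public static void main(String[] args) {
        OrderService svc = new OrderService(new InMemoryProductDao());
        System.out.println(svc.total(List.of("P1", "P2"))); // prints 14.5
    }
}
```

Swapping `InMemoryProductDao` for, say, a MySQL-backed implementation would leave `OrderService` and the presentation code untouched, which is the point of the layering.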

The Presentation Layer:

Also called the client layer, this comprises the components dedicated to presenting data to the
user: for example, Windows/Web forms and buttons, edit boxes, text boxes, labels, grids, etc.

The Business Rules Layer:

This layer encapsulates the business rules or business logic of the application. Having a
separate layer for business logic is a great advantage, because any change in the business rules can
be handled entirely within this layer. As long as the interface between the layers remains the same, changes
to the functionality/processing logic in this layer can be made without impacting the others. Many client-
server applications failed precisely because changing the business logic was such a painful process.

The Data Access Layer:

This layer comprises the components that access the database. Used correctly, this
layer provides a level of abstraction over the database structures: simply put, changes made to the database,
tables, etc. do not affect the rest of the application. The other application
layers send their data requests to this layer and receive its responses.

The Database Layer:

This layer comprises the database components such as DB files, tables, views, etc. The actual
database could be created using SQL Server, Oracle, flat files, etc. In an n-tier application, the entire
application can be implemented so that it is independent of the actual database. For instance, you
could change the database location with minimal changes to the Data Access Layer; the rest of the
application should remain unaffected.

3.4 Input and Output Design:

3.4.1 Input Design:

Inaccurate input data is the most common cause of errors in data processing. Errors introduced by data
entry operators can be controlled through input design. "Input design is the process of converting user-
originated inputs to computer-based formats". It consists of developing specifications and procedures for
data preparation.

Objectives of input design:

The main objectives of input design are:

 Controlling amount of input: Due to so many reasons, design should control the quantity of data
for input. Reducing the data requirement can lower cost by reducing labor expenses. By reducing
input requirement, the analyst can speed the entire process from data capture to providing results to
the users.
 Avoiding delay: A processing delay resulting from data preparation or data entry operations is
called a bottleneck. Avoiding bottlenecks should always be an objective of the analyst while
designing input.
 Avoiding errors in data: The rate at which errors occurs depends on the quantity of data, i.e. smaller
the amount of data to input the fewer the opportunities for errors.
 Keeping the process simple: Simplicity works and is accepted by the users. Complexity should be
avoided when there are simple alternatives.

3.4.2 Output Design:

The term output refers to information printed or displayed by an information system.
The following activities are carried out in the output design stage: identification of the specific outputs
required to meet the information requirements; selection of methods for processing outputs; and design of
the reports, formats or other documents that act as carriers of information.

3.5 System Environment:

3.5.1 Java Technology


Java technology is both a programming language and a platform.

3.5.1.1 The Java Programming Language


The Java programming language is a high-level language that can be characterized by all of the
following buzzwords:
 Simple
 Architecture neutral
 Object oriented
 Portable
 Distributed
 High performance
 Interpreted
 Multithreaded
 Robust
 Dynamic
 Secure
With most programming languages, you either compile or interpret a program so that you can run it
on your computer. The Java programming language is unusual in that a program is both compiled and
interpreted. With the compiler, first you translate a program into an intermediate language called Java byte
codes —the platform-independent codes interpreted by the interpreter on the Java platform. The interpreter
parses and runs each Java byte code instruction on the computer. Compilation happens just once;
interpretation occurs each time the program is executed. The following figure illustrates how this works.

You can think of Java byte codes as the machine code instructions for the Java Virtual Machine
(Java VM). Every Java interpreter, whether it’s a development tool or a Web browser that can run applets, is
an implementation of the Java VM. Java byte codes help make “write once, run anywhere” possible. You
can compile your program into byte codes on any platform that has a Java compiler. The byte codes can then
be run on any implementation of the Java VM. That means that as long as a computer has a Java VM, the

same program written in the Java programming language can run on Windows 2000, a Solaris workstation,
or on an iMac.

3.5.2 The Java Platform


A platform is the hardware or software environment in which a program runs. We’ve already
mentioned some of the most popular platforms like Windows 2000, Linux, Solaris, and MacOS.
Most platforms can be described as a combination of the operating system and hardware. The Java
platform differs from most other platforms in that it’s a software-only platform that runs on top of
other hardware-based platforms.

The Java platform has two components:


 The Java Virtual Machine (Java VM)
 The Java Application Programming Interface (Java API)
You’ve already been introduced to the Java VM. It’s the base for the Java platform and is ported
onto various hardware-based platforms.

The Java API is a large collection of ready-made software components that provide many useful
capabilities, such as graphical user interface (GUI) widgets. The Java API is grouped into libraries of
related classes and interfaces; these libraries are known as packages. The next section, What Can
Java Technology Do?, highlights the functionality that some of the packages in the Java API provide.
The following figure depicts a program that’s running on the Java platform. As the figure shows,
the Java API and the virtual machine insulate the program from the hardware.

Native code is code that, once compiled, runs on a specific hardware platform. As a
platform-independent environment, the Java platform can be somewhat slower than native
code. However, smart compilers, well-tuned interpreters, and just-in-time byte code compilers can
bring performance close to that of native code without sacrificing portability.

3.5.3 What Can Java Technology Do?


The most common types of programs written in the Java programming language are applets
and applications. If you’ve surfed the Web, you’re probably already familiar with applets. An applet
is a program that adheres to certain conventions that allow it to run within a Java-enabled browser.

However, the Java programming language is not just for writing cute, entertaining applets for the
Web. The general-purpose, high-level Java programming language is also a powerful software
platform. Using the generous API, you can write many types of programs.
An application is a standalone program that runs directly on the Java platform. A special kind of
application known as a server serves and supports clients on a network. Examples of servers are Web
servers, proxy servers, mail servers, and print servers. Another specialized program is a servlet. A
servlet can almost be thought of as an applet that runs on the server side. Java Servlets are a popular
choice for building interactive web applications, replacing the use of CGI scripts. Servlets are similar
to applets in that they are runtime extensions of applications. Instead of working in browsers, though,
servlets run within Java Web servers, configuring or tailoring the server.
How does the API support all these kinds of programs? It does so with packages of software
components that provides a wide range of functionality. Every full implementation of the Java
platform gives you the following features:
 The essentials: Objects, strings, threads, numbers, input and output, data structures, system
properties, date and time, and so on.
 Applets: The set of conventions used by applets.
 Networking: URLs, TCP (Transmission Control Protocol) and UDP (User Datagram Protocol)
sockets, and IP (Internet Protocol) addresses.
 Internationalization: Help for writing programs that can be localized for users worldwide.
Programs can automatically adapt to specific locales and be displayed in the appropriate
language.
 Security: Both low level and high level, including electronic signatures, public and private key
management, access control, and certificates.
 Software components: Known as JavaBeansTM, can plug into existing component architectures.
 Object serialization: Allows lightweight persistence and communication via Remote Method
Invocation (RMI).
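As a small illustration of the object serialization feature just listed, the sketch below writes a `Serializable` object to a byte stream and reconstructs it, the same mechanism RMI uses to move arguments and results across the network:

```java
import java.io.*;

// Round-trip a Serializable object through a byte array using the standard
// java.io object streams.
public class SerializationDemo {

    static class Message implements Serializable {
        private static final long serialVersionUID = 1L;
        final String text;
        Message(String text) { this.text = text; }
    }

    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);                     // serialize to the buffer
        }
        return bos.toByteArray();
    }

    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();                   // reconstruct the object
        }
    }

    public static void main(String[] args) throws Exception {
        Message copy = (Message) fromBytes(toBytes(new Message("hello")));
        System.out.println(copy.text);                // prints hello
    }
}
```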

3.5.4 Java Database Connectivity (JDBC™): Provides uniform access to a wide range of relational
databases.
The Java platform also has APIs for 2D and 3D graphics, accessibility, servers, collaboration,
telephony, speech, animation, and more. The following figure depicts what is included in the Java 2
SDK.

3.5.5 How Will Java Technology Change My Life?


We can’t promise you fame, fortune, or even a job if you learn the Java programming language.
Still, it is likely to make your programs better and to require less effort than other languages. We
believe that Java technology will help you do the following:
 Get started quickly: Although the Java programming language is a powerful object-oriented
language, it’s easy to learn, especially for programmers already familiar with C or C++.
 Write less code: Comparisons of program metrics (class counts, method counts, and so on)
suggest that a program written in the Java programming language can be four times smaller
than the same program in C++.
 Write better code: The Java programming language encourages good coding practices, and
its garbage collection helps you avoid memory leaks. Its object orientation, its JavaBeans
component architecture, and its wide-ranging, easily extendible API let you reuse other
people’s tested code and introduce fewer bugs.
 Develop programs more quickly: Your development time may be as much as twice as fast as
writing the same program in C++. Why? You write fewer lines of code, and Java is a
simpler programming language than C++.
 Avoid platform dependencies with 100% Pure Java: You can keep your program portable
by avoiding the use of libraries written in other languages. The 100% Pure Java TM Product
Certification Program has a repository of historical process manuals, white papers, brochures,
and similar materials online.
 Write once, run anywhere: Because 100% Pure Java programs are compiled into machine-
independent byte codes, they run consistently on any Java platform.
 Distribute software more easily: You can upgrade applets easily from a central server.
Applets take advantage of the feature of allowing new classes to be loaded “on the fly,”
without recompiling the entire program.
3.5.6 ODBC
Microsoft Open Database Connectivity (ODBC) is a standard programming interface for application
developers and database systems providers. Before ODBC became a de facto standard for Windows
programs to interface with database systems, programmers had to use proprietary languages for each
database they wanted to connect to. Now, ODBC has made the choice of the database system almost
irrelevant from a coding perspective, which is as it should be. Application developers have much more
important things to worry about than the syntax that is needed to port their program from one database to
another when business needs suddenly change.
Through the ODBC Administrator in Control Panel, you can specify the particular database that is
associated with a data source that an ODBC application program is written to use. Think of an ODBC data
source as a door with a name on it. Each door will lead you to a particular database. For example, the data
source named Sales Figures might be a SQL Server database, whereas the Accounts Payable data source
could refer to an Access database. The physical database referred to by a data source can reside anywhere on
the LAN.
The ODBC system files are not installed on your system by Windows 95. Rather, they are installed
when you set up a separate database application, such as SQL Server Client or Visual Basic 4.0. When the
ODBC icon is installed in Control Panel, it uses a file called ODBCINST.DLL. It is also possible to
administer your ODBC data sources through a stand-alone program called ODBCADM.EXE. There is a
16-bit and a 32-bit version of this program, and each maintains a separate list of ODBC data sources.

From a programming perspective, the beauty of ODBC is that the application can be written to use
the same set of function calls to interface with any data source, regardless of the database vendor. The source
code of the application doesn’t change whether it talks to Oracle or SQL Server. We only mention these two
as an example. There are ODBC drivers available for several dozen popular database systems. Even Excel
spreadsheets and plain text files can be turned into data sources. The operating system uses the Registry
information written by ODBC Administrator to determine which low-level ODBC drivers are needed to talk
to the data source (such as the interface to Oracle or SQL Server). The loading of the ODBC drivers is
transparent to the ODBC application program. In a client/server environment, the ODBC API even handles
many of the network issues for the application programmer.
The advantages of this scheme are so numerous that you are probably thinking there must be some
catch. The only disadvantage of ODBC is that it isn’t as efficient as talking directly to the native database
interface. ODBC has had many detractors make the charge that it is too slow. Microsoft has always claimed
that the critical factor in performance is the quality of the driver software that is used. In our humble opinion,
this is true. The availability of good ODBC drivers has improved a great deal recently. And anyway, the
criticism about performance is somewhat analogous to those who said that compilers would never match the
speed of pure assembly language. Maybe not, but the compiler (or ODBC) gives you the opportunity to write
cleaner programs, which means you finish sooner. Meanwhile, computers get faster every year.

3.5.7 JDBC
In an effort to set an independent database standard API for Java; Sun Microsystems developed Java
Database Connectivity, or JDBC. JDBC offers a generic SQL database access mechanism that provides a
consistent interface to a variety of RDBMSs. This consistent interface is achieved through the use of “plug-
in” database connectivity modules, or drivers. If a database vendor wishes to have JDBC support, he or she
must provide the driver for each platform that the database and Java run on.
To gain a wider acceptance of JDBC, Sun based JDBC’s framework on ODBC. As you discovered
earlier in this chapter, ODBC has widespread support on a variety of platforms. Basing JDBC on ODBC will
allow vendors to bring JDBC drivers to market much faster than developing a completely new connectivity
solution.
JDBC was announced in March of 1996. It was released for a 90 day public review that ended June
8, 1996. Because of user input, the final JDBC v1.0 specification was released soon after.
The remainder of this section will cover enough information about JDBC for you to know what it is about
and how to use it effectively. This is by no means a complete overview of JDBC. That would fill an entire
book.

 JDBC Goals
Few software packages are designed without goals in mind, and JDBC is no exception: its many
goals drove the development of the API. These goals, in conjunction with early reviewer feedback, have
shaped the JDBC class library into a solid framework for building database applications in Java.
The goals that were set for JDBC are important. They will give you some insight as to why certain
classes and functionalities behave the way they do. The eight design goals for JDBC are as follows:

 SQL-level API
The designers felt that their main goal was to define a SQL interface for Java. Although not the
lowest database interface level possible, it is at a low enough level for higher-level tools and APIs to be
created. Conversely, it is at a high enough level for application programmers to use it confidently.
Attaining this goal allows for future tool vendors to “generate” JDBC code and to hide many of JDBC’s
complexities from the end user.
 SQL Conformance
SQL syntax varies as you move from database vendor to database vendor. In an effort to support a
wide variety of vendors, JDBC will allow any query statement to be passed through it to the underlying
database driver. This allows the connectivity module to handle non-standard functionality in a manner
that is suitable for its users.

 JDBC must be implementable on top of common database interfaces


The JDBC SQL API must “sit” on top of other common SQL level APIs. This goal allows JDBC
to use existing ODBC level drivers by the use of a software interface. This interface would translate
JDBC calls to ODBC and vice versa.
 Provide a Java interface that is consistent with the rest of the Java system
Because of Java’s acceptance in the user community thus far, the designers feel that they should not
stray from the current design of the core Java system.

 Keep it simple
This goal probably appears in all software design goal listings. JDBC is no exception. Sun felt that
the design of JDBC should be very simple, allowing for only one method of completing a task per
mechanism. Allowing duplicate functionality only serves to confuse the users of the API.

 Use strong, static typing wherever possible


Strong typing allows for more error checking to be done at compile time; consequently, fewer errors
appear at runtime.

 Keep the common cases simple


Because, more often than not, the usual SQL calls used by the programmer are simple SELECTs,
INSERTs, DELETEs and UPDATEs, these queries should be simple to perform with JDBC. However,
more complex SQL statements should also be possible.
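A minimal sketch of such a common case follows, against the MySQL database named in the software specification. The host, schema and `users` table are invented, and a MySQL driver must be on the classpath for a real connection, so `main` only builds the JDBC URL while the query method is shown but not executed:

```java
import java.sql.*;

// Hedged sketch of the typical simple-SELECT flow with JDBC: a parameterised
// PreparedStatement and a ResultSet, using only java.sql standard classes.
public class JdbcSketch {

    static String jdbcUrl(String host, String database) {
        return "jdbc:mysql://" + host + "/" + database;
    }

    // Typical usage (requires a live connection, e.g. from
    // DriverManager.getConnection(jdbcUrl(...), user, password)):
    static String findEmail(Connection con, String userId) throws SQLException {
        String sql = "SELECT email FROM users WHERE id = ?";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, userId);                 // bind the parameter safely
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("email") : null;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl("localhost", "shop"));
    }
}
```

The same code would run unchanged against any vendor whose driver is installed; only the URL prefix changes, which is precisely the portability JDBC aims for.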

Finally, we decided to proceed with the implementation using Java networking, and for dynamically
updating the cache table we use an MS Access database.


3.5.8 Networking

 TCP/IP stack

The TCP/IP stack is shorter than the OSI one:

TCP is a connection-oriented protocol; UDP (User Datagram Protocol) is a connectionless
protocol

 IP datagrams

The IP layer provides a connectionless and unreliable delivery system. It considers each
datagram independently of the others. Any association between datagrams must be supplied by the
higher layers. The IP layer supplies a checksum that covers its own header, which includes
the source and destination addresses. The IP layer handles routing through an Internet. It is also
responsible for breaking up large datagrams into smaller ones for transmission and reassembling
them at the other end.
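The header checksum mentioned above is the standard Internet checksum of RFC 1071. As an illustrative sketch only (the class name and sample bytes below are ours, not part of the project), it can be computed over a byte array like this:

```java
// Minimal sketch of the Internet checksum (RFC 1071) used by IP-family headers.
public class InternetChecksum {

    // Returns the 16-bit one's-complement checksum of the data.
    public static int checksum(byte[] data) {
        long sum = 0;
        int i = 0;
        // Sum the data as 16-bit big-endian words.
        while (i < data.length - 1) {
            sum += ((data[i] & 0xFF) << 8) | (data[i + 1] & 0xFF);
            i += 2;
        }
        // Pad an odd trailing byte with zero.
        if (i < data.length) {
            sum += (data[i] & 0xFF) << 8;
        }
        // Fold the carries back into the low 16 bits.
        while ((sum >> 16) != 0) {
            sum = (sum & 0xFFFF) + (sum >> 16);
        }
        // One's complement of the folded sum.
        return (int) (~sum & 0xFFFF);
    }

    public static void main(String[] args) {
        byte[] sample = {0x45, 0x00, 0x00, 0x1C};
        // Words 0x4500 + 0x001C = 0x451C; complement = 0xBAE3.
        System.out.printf("checksum = 0x%04X%n", checksum(sample));
    }
}
```

A receiver recomputes the same sum over the received header (including the checksum field) and expects zero, which is how corruption is detected.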

 UDP

UDP is also connectionless and unreliable. What it adds to IP is a checksum for the contents of
the datagram and port numbers. These are used to give a client/server model - see later.

 TCP

TCP supplies logic to give a reliable connection-oriented protocol above IP. It provides a
virtual circuit that two processes can use to communicate.

 Internet addresses

In order to use a service, you must be able to find it. The Internet uses an address scheme so
that machines can be located. The address is a 32-bit integer, the IP address, which encodes a
network ID and further addressing. The network ID falls into various classes according to the
size of the network address.

 Network address

Class A uses 8 bits for the network address, leaving 24 bits for other addressing. Class B
uses 16-bit network addressing, Class C uses 24-bit network addressing, and Class D addresses are reserved for multicast.

 Subnet address

Internally, the UNIX network is divided into subnetworks. Building 11 is currently on one
subnetwork, which uses 10-bit addressing, allowing 1024 different hosts.

 Host address

8 bits are finally used for host addresses within our subnet. This places a limit of 256 machines that can be on the
subnet.
 Total address

The 32-bit address is usually written as four integers separated by dots.
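For example, the 32-bit value 0xC0A80101 is written as 192.168.1.1. A small hypothetical sketch (class and method names are ours) showing the dotted notation and the classful rules described above:

```java
// Sketch: mapping a 32-bit IPv4 address to dotted-decimal notation, and
// determining its classful class from the leading bits of the first octet.
public class Ipv4Address {

    // Renders a 32-bit address as four dot-separated integers.
    public static String toDotted(long address) {
        return ((address >> 24) & 0xFF) + "." + ((address >> 16) & 0xFF) + "."
                + ((address >> 8) & 0xFF) + "." + (address & 0xFF);
    }

    // Classful addressing: the first octet decides the class.
    public static char classOf(long address) {
        int first = (int) ((address >> 24) & 0xFF);
        if (first < 128) return 'A';   // 0xxxxxxx
        if (first < 192) return 'B';   // 10xxxxxx
        if (first < 224) return 'C';   // 110xxxxx
        return 'D';                    // 1110xxxx (multicast) and above
    }

    public static void main(String[] args) {
        long addr = 0xC0A80101L;  // 192.168.1.1
        System.out.println(toDotted(addr) + " is class " + classOf(addr));
    }
}
```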

 Port addresses

A service exists on a host, and is identified by its port. This is a 16 bit number. To send a
message to a server, you send it to the port for that service of the host that it is running on. This is
not location transparency! Certain of these ports are "well known".

 Sockets

A socket is a data structure maintained by the system to handle network connections. A socket
is created using the socket call. It returns an integer that is like a file descriptor. In fact, under
Windows, this handle can be used with the ReadFile and WriteFile functions.

#include <sys/types.h>
#include <sys/socket.h>
int socket(int family, int type, int protocol);

Here "family" will be AF_INET for IP communications, protocol will be zero, and type will
depend on whether TCP or UDP is used. Two processes wishing to communicate over a network
create a socket each. These are similar to two ends of a pipe - but the actual pipe does not yet exist.
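The call above is the C interface, but the same two-ends-of-a-pipe model is what Java's socket classes wrap. The following self-contained sketch (class and method names are ours) runs a one-shot TCP echo server and its client in a single process over the loopback interface:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch of the two-socket model: a server socket and a client socket
// forming the two ends of the "pipe".
public class EchoDemo {

    // Starts a one-shot echo server on a free port, sends msg from a client
    // socket, and returns the server's reply.
    public static String runEcho(String msg) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {  // port 0: any free port
            Thread serverThread = new Thread(() -> {
                // Server end: accept one connection, echo one line back.
                try (Socket conn = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(conn.getInputStream()));
                     PrintWriter out = new PrintWriter(conn.getOutputStream(), true)) {
                    out.println("echo: " + in.readLine());
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            serverThread.start();

            // Client end: connect to the server's port and exchange one line.
            try (Socket client = new Socket("localhost", server.getLocalPort());
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream()))) {
                out.println(msg);
                String reply = in.readLine();
                serverThread.join();
                return reply;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runEcho("hello"));  // prints "echo: hello"
    }
}
```

Until `accept` and the client `connect` meet, the "pipe" does not yet exist, exactly as described for the C sockets above.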

3.5.9 JFree Chart

JFreeChart is a free 100% Java chart library that makes it easy for developers to display
professional quality charts in their applications. JFreeChart's extensive feature set includes:

A consistent and well-documented API, supporting a wide range of chart types;

A flexible design that is easy to extend, targeting both server-side and client-side applications;

Support for many output types, including Swing components, image files (including
PNG and JPEG), and vector graphics file formats (including PDF, EPS and SVG);

JFreeChart is "open source" or, more specifically, free software. It is distributed under the
terms of the GNU Lesser General Public License (LGPL), which permits use in proprietary
applications.

3.5.10 Map Visualizations


Charts showing values that relate to geographical areas. Some examples include: (a)
population density in each state of the United States, (b) income per capita for each country in
Europe, (c) life expectancy in each country of the world. The tasks in this project include:

Sourcing freely redistributable vector outlines for the countries of the world, states/provinces
in particular countries (USA in particular, but also other areas);

Creating an appropriate dataset interface (plus a default implementation) and a renderer, and
integrating these with the existing XYPlot class in JFreeChart;

Testing, documenting, testing some more, documenting some more.

 Time Series Chart Interactivity


Implement a new (to JFreeChart) feature for interactive time series charts --- to display a separate control
that shows a small version of ALL the time series data, with a sliding "view" rectangle that allows you to
select the subset of the time series data to display in the main chart.

 Dashboards
There is currently a lot of interest in dashboard displays. Create a flexible dashboard mechanism that
supports a subset of JFreeChart chart types (dials, pies, thermometers, bars, and lines/time series) that
can be delivered easily via both Java Web Start and an applet.

 Property Editors
The property editor mechanism in JFreeChart only handles a small subset of the properties that can
be set for charts. Extend (or reimplement) this mechanism to provide greater end-user control over the
appearance of the charts.

3.5.11 Tomcat 6.0 web server


Tomcat is an open source web server developed by the Apache Group. Apache Tomcat is the servlet
container that is used in the official Reference Implementation for the Java Servlet and JavaServer Pages
technologies. The Java Servlet and JavaServer Pages specifications are developed by Sun under the Java
Community Process. Web servers like Apache Tomcat support only web components, while an application
server supports web components as well as business components (BEA's WebLogic is one of the popular
application servers). To develop a web application with JSP/servlets, install a web server such as JRun or
Tomcat to run your application.

4. SYSTEM ANALYSIS

4.1 Existing System:

With the rapid development of the Internet in China, the industry's business model has changed. At
present, great progress has been made in Web e-commerce platforms because of their convenience and fast
transactions. Competition for users is the key factor for e-commerce businesses in an increasingly fierce
market. A business that can grasp customer needs and develop targeted business activities can not only provide
a convenient trading mode and a wide choice for customers, but also retain customers better.
One of the solutions is Web data mining technology. We can obtain user behaviour from the browsing
behaviour of customers on the Web and analyse it further to find a solution. This allows sellers to know
more about their customers' needs, to personalize according to customer preferences, and thus to obtain
a competitive advantage.

4.1.1 Disadvantages:
 Time consumption is high
 No security for the data
 Data loss may happen
 Data transfer is difficult
4.2 Proposed System:
In today’s ever-connected world, the way people shop has changed. People are buying more and
more over the Internet instead of going shopping in the traditional way. E-commerce provides customers with the
opportunity of browsing endless product catalogues, comparing prices, being continuously informed,
creating wish lists and enjoying a better service based on their individual interests. This growing electronic
market is highly competitive, featuring the possibility for a customer to easily move from one e-commerce
site to another when their needs are not satisfied. As a consequence, e-commerce business analysts need to know
and understand consumers’ behaviour when they navigate through the website, as well as to identify
the reasons that motivated them to purchase, or not, a product. Getting this behavioural knowledge will
allow e-commerce websites to deliver a more personalized service to customers, retaining customers and
increasing profits.

 Some characterizations contain the web browser used by the customer, the number of visited
web pages, the time the customer spent on each page, or the keywords used in the search engine; others focus
on the users’ interest in the different product categories, and their characterization consists of the list of
visited categories and the frequency of such visits.

 Unlike the previous approaches, another line of work uses text mining techniques to discover the most
frequent words contained in the web pages a customer visits, generating the session characterization from these
words. This solution tries to identify the user’s interests from the contents of the visited pages.
 Clustering algorithms are generally used to discover the sets of sessions showing a similar behaviour
or some common interests.
 This information can subsequently be used to improve the website contents and structure, to adapt
and personalize contents, to recommend products, to understand customers’ behaviour related to the
buying process, or to understand the interest of users in specific products.
 Other researchers apply alternative mining techniques to predict the user’s behaviour. They extract the
users’ navigational sequences to create statistical and probabilistic models able to predict the user's
next click. These models are represented as Markov chains. Nevertheless, these approaches present
some drawbacks: the process of creating these models is computationally very expensive and,
besides, this type of model responds to very short-term reasoning (the model does not have
information about how the current navigational state has been reached or how future states
representing long-term goals can be reached). The combination of clustering algorithms and Markov
chains improves the predictions of these statistical models. The idea is to first group
user sessions applying some clustering algorithm and then to generate a specific Markov chain for
each of the obtained clusters.
 Currently, there are powerful commercial tools for analyzing the logs of e-commerce websites,
Google Analytics being one of the main ones. Google Analytics monitors network traffic, collects
information about user sessions (first and last web page visited, pages visited, time spent on each
page, etc.), and displays reports synthesizing users’ behavior. These traffic-based data can also be
combined with other personal and geographic information about users. Google Analytics is not able to
import the web server logs of a website; instead, it analyzes information collected by means of
page tagging techniques. These techniques have some disadvantages with respect to log-based
analysis, such as dependence on JavaScript and cookies, the necessity of adding page tags to every
page, the complexity of tag-based implementations, the fact that customers may consequently
experience a change in the download time of the website, and privacy concerns.
Nevertheless, Google reports are rich in data that, in turn, require experts in the problem domain to
exploit them. In any case, the conclusions of the analysis can be used to improve the website design,
to design advertising and marketing campaigns, to analyze customers' demographic information, or to
monitor real-time traffic. Similar commercial tools are Clicky, Piwik, Adobe Analytics, and the
W3Counter web analytics tool.
 The methodology and tool proposed in this work try to overcome some of the drawbacks of the
previous approaches, providing the possibility of obtaining a very accurate interpretation of users’
behavior.
 In comparison to the clustering approaches and the commercial tools discussed above, the advantage of
our mining technique is that it provides causal relations among the events of a user trace, instead of
only a global view of the whole session. Besides, it avoids the need to tag the web pages.
 With respect to those approaches whose main objective is predicting the next possible events (as in
the case of Markov models, for instance), our approach allows a global view of the sessions,
making a global analysis of the user behavior easier, giving hints and facilitating the redesign of the
website for a better adaptation to user necessities.
 An interesting feature of the approach followed in this paper is that it properly fits the open nature of
the use of e-commerce websites, where there are very few constraints on how users navigate among
site web pages.
 Another interesting feature of the mining approach followed is its ability to analyze
sequences of detailed events. Considering the causal relations of events inside a user
session, and thus looking for intra-session patterns (and not only patterns repeated in different
sessions), can provide analysts with a much more detailed perspective of user behavior.

4.2.1 Advantages:
 A secure control system distributes appropriate resources to be utilized on different occasions.
 The data stored in cloud systems needs a mechanism to ensure it is not lost or modified by
unauthorized users.

4.3 Module Description:


 Clustering Module
 Behavioural Module

4.3.1 Clustering Module: Clustering algorithms are generally used to discover the sets of sessions showing
a similar behaviour or some common interests.

4.3.2 Behavioural Module:

 Behavioural knowledge will allow e-commerce websites to deliver a more personalized
service to customers, retaining customers and increasing profits.

 The goal is to analyze the usage of e-commerce websites and to discover customers’ complex
behavioural patterns by means of checking temporal logic formulas describing such behaviour
against the log model.

4.4 Logic Model:


4.4.1 Linear- temporal logic model:

Linear temporal logic (LTL), or linear-time temporal logic, is a modal temporal logic with modalities

referring to time. In LTL, one can encode formulae about the future of paths, e.g., that a condition will eventually
be true, or that a condition will be true until another fact becomes true. LTL is a fragment of the more complex
CTL*, which additionally allows branching time and quantifiers. LTL is sometimes
called propositional temporal logic (PTL); it is a fragment of S1S, the monadic second-
order logic of one successor.
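As an illustration of these modalities (not the checker used in the paper), the "eventually" (F), "always" (G) and "until" (U) operators can be evaluated over a finite session trace; the event names such as "addToCart" below are hypothetical:

```java
import java.util.List;
import java.util.function.Predicate;

// Illustrative evaluator for three basic LTL modalities over a finite user
// session, where each event is a simple string label.
public class LtlOnTrace {

    // F p: p eventually holds at some position of the trace.
    public static boolean eventually(List<String> trace, Predicate<String> p) {
        return trace.stream().anyMatch(p);
    }

    // G p: p holds at every position of the trace.
    public static boolean always(List<String> trace, Predicate<String> p) {
        return trace.stream().allMatch(p);
    }

    // p U q: p holds at every position until a position where q holds.
    public static boolean until(List<String> trace, Predicate<String> p, Predicate<String> q) {
        for (String event : trace) {
            if (q.test(event)) return true;   // q reached: formula satisfied
            if (!p.test(event)) return false; // p failed before q was reached
        }
        return false;  // q never held on this finite trace
    }

    public static void main(String[] args) {
        List<String> session = List.of("search", "viewProduct", "addToCart", "purchase");
        // "The user eventually purchases" holds on this session.
        System.out.println(eventually(session, e -> e.equals("purchase")));
        // "The user only browses until adding to cart" also holds.
        System.out.println(until(session,
                e -> e.equals("search") || e.equals("viewProduct"),
                e -> e.equals("addToCart")));
    }
}
```

Queries over the event log in the paper follow this spirit: a formula is checked against each session trace, and the sessions satisfying it form a behavioural pattern.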

5. SYSTEM DESIGN

5.1 Clustering algorithm:


Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the
same group (called a cluster) are more similar (in some sense or another) to each other than to those in other
groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data
analysis, used in many fields, including machine learning, pattern recognition, image analysis, information
retrieval, bioinformatics, data compression, and computer graphics.
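As a hedged sketch of the idea (not the algorithm actually used in the project), a one-dimensional k-means grouping sessions by a single numeric feature, such as session length, might look like this:

```java
import java.util.Arrays;

// Hypothetical minimal 1-D k-means: groups points (e.g. session lengths)
// into k clusters around k centroids.
public class KMeans1D {

    // Runs k-means for a fixed number of iterations; returns the final
    // centroids, sorted ascending.
    public static double[] cluster(double[] points, double[] initialCentroids, int iterations) {
        double[] centroids = initialCentroids.clone();
        for (int it = 0; it < iterations; it++) {
            double[] sum = new double[centroids.length];
            int[] count = new int[centroids.length];
            // Assignment step: each point joins its nearest centroid.
            for (double p : points) {
                int best = 0;
                for (int c = 1; c < centroids.length; c++) {
                    if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[best])) best = c;
                }
                sum[best] += p;
                count[best]++;
            }
            // Update step: move each centroid to the mean of its points.
            for (int c = 0; c < centroids.length; c++) {
                if (count[c] > 0) centroids[c] = sum[c] / count[c];
            }
        }
        Arrays.sort(centroids);
        return centroids;
    }

    public static void main(String[] args) {
        // Two obvious groups of session lengths (minutes).
        double[] sessions = {1, 2, 2, 3, 20, 21, 22};
        double[] result = cluster(sessions, new double[]{0, 30}, 10);
        System.out.println(Arrays.toString(result));  // [2.0, 21.0]
    }
}
```

Real session clustering works the same way but over multi-dimensional feature vectors (pages visited, time per page, categories browsed) and with a convergence criterion rather than a fixed iteration count.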

5.2 Architecture Design:

5.3 UML Diagram:


A UML diagram is a diagram based on the UML (Unified Modeling Language), with the purpose of
visually representing a system along with its main actors, roles, actions, artifacts or classes, in order to better
understand, alter, maintain, or document information about the system.

5.3.1 Admin activity diagram:

5.3.2 User activity diagram:

5.4 ER Diagram:

ER modeling is a top-down approach to database design that begins with identifying the important data,
called entities, and the relationships between them that must be represented in the model.
ER modeling is an important technique for any database designer to master and forms the basis of the
methodology.

5.4.1 User:

5.4.2 Admin:

5.4.3 Class diagram:

5.5 Data Flow Diagram:
A data flow diagram (DFD) is a graphical representation of the "flow" of data through an information
system. DFDs can also be used for the visualization of data processing (structured design).

5.5.1 Admin flow diagram:

5.5.2 User flow Diagram:

5.6 Sequence Diagram:
A sequence diagram describes an interaction among a set of objects participating in a collaboration (or
scenario), arranged in chronological order; it shows the objects participating in the interaction by their
"lifelines" and the messages that they send to each other.

5.6.1 User sequence Diagram:

5.6.2 Admin sequence Diagram:

6. CODING

adminlogin.jsp:
<!DOCTYPE html>

<html lang="en">

<head>

<title>Home</title>

<meta charset="utf-8">

<meta name = "format-detection" content = "telephone=no" />

<link rel="icon" href="images/favicon.ico">

<link rel="shortcut icon" href="images/favicon.ico" />

<link rel="stylesheet" href="css/animation.css">

<link rel="stylesheet" href="css/camera.css">

<link rel="stylesheet" href="css/contact-form.css">

<link rel="stylesheet" href="css/touchTouch.css">

<link rel="stylesheet" href="css/style.css">

<script src='//maps.googleapis.com/maps/api/js?v=3.exp&amp;sensor=false'></script>

<script src="js/jquery.js"></script>

<script src="js/jquery-migrate-1.1.1.js"></script>

<script src="js/jquery.easing.1.3.js"></script>

<script src="js/script.js"></script>

<script src="js/jquery.ui.totop.js"></script>

<script src="js/touchTouch.jquery.js"></script>

<script src="js/isotope.pkgd.js"></script>

<script src="js/TMForm.js"></script>

<script src="js/modal.js"></script>

<script src="js/camera.js"></script>

<!--[if (gt IE 9)|!(IE)]><!-->

<script src="js/jquery.mobile.customized.min.js"></script>
<!--<![endif]-->

<style>

.top {
background-image: url(images/po.jpg);
background-size: 100% 100%;
border-radius: 25px;
border: 2px solid white;
margin-top: 30px;
margin-left: 30px;
height: 260px;
width: 500px;
background-color: white;
float: left;
overflow: hidden;
}

.last {
border-radius: 25px;
border: 2px solid white;
margin-top: 25px;
margin-left: 110px;
height: 260px;
width: 450px;
background-color: white;
float: left;
overflow: hidden;
}

.last table {
margin-top: 60px;
margin-left: 90px;
}

</style>

<script>

$(window).load(function(){

$().UItoTop({ easingType: 'easeOutQuart' });

$('.gallery .gall_item').touchTouch();

});

$(document).ready(function(){

jQuery('#camera_wrap').camera({

loader: false,

pagination: true ,

minHeight: '500',

thumbnails: false,

height: '44.42708333333333%',

caption: true,

navigation: false,

fx: 'mosaic'

});

$('.gallery .gall-item').touchTouch();

});

</script>

</head>

<body>

<section id="portfolio" class="page">

<div class="container_12">

<div class="grid_12">

<h2 class="color2"><font color="yellow">Admin Login Page</font></h2>

<div id="filters" class="button-group">

<a href="#skills" class="btn" data-filter="userlogin.jsp">Back</a>

<a href="index.html" class="btn" data-filter=".photo">logout</a>

<!-- <a href="index.html#portfolio" class="btn" data-filter=".design">Back</a> -->

</div>

</div>

<div class="top">

</div>

<div class="last">

<form name="" action="admindb.jsp" method="post">

<table align="center">

<tr><td><font color="blue"> Name :</font></td>

<td><input type="text" name="uname" style="height:20px;" autocomplete="off"></td></tr>

<tr><td><font color="blue">Password :</font></td>

<td><input type="password" name="pwd" style="height:20px;"></td></tr>

<tr>

<td colspan="2" align="center">

<button type="submit">Login</button>

<button type="reset">Clear</button></td></tr>

</table>

</form>

</div>

<div class="clear"></div>

</div>

</section>

</body>

</html>

7. SYSTEM TESTING
The purpose of testing is to discover errors. Testing is the process of trying to discover every
conceivable fault or weakness in a work product. It provides a way to check the functionality of components,
sub-assemblies, assemblies and/or a finished product. It is the process of exercising software with the intent
of ensuring that the software system meets its requirements and user expectations and does not fail in an
unacceptable manner. There are various types of test, and each test type addresses a specific testing requirement.

7.1 Testing Methodologies

Testing is the process of finding differences between the expected behavior specified by the system
models and the observed behavior of the implemented system. From a modeling point of view, testing is the
attempt to falsify the system with respect to the system models. The goal of testing is to design tests that
exercise defects in the system and reveal problems. The process of executing a program with the intent of
finding errors is called testing. During testing, the program to be tested is executed with a set of test cases,
and the output of the program for the test cases is evaluated to determine whether the program is performing as
expected. Testing forms the first step in determining the errors in a program. The success of testing in
revealing errors depends critically on the test cases.

7.1.1 Strategic Approach to Software Testing:

The software engineering process can be viewed as a spiral. Initially, system engineering defines the
role of software and leads to software requirements analysis, where the information domain, functions,
behavior, performance, constraints and validation criteria for software are established. Moving inward
along the spiral, we come to design and finally to coding. To develop computer software, we spiral inward along
streamlines that decrease the level of abstraction at each turn.

A strategy for software testing may also be viewed in the context of the spiral. Unit testing begins at
the vertex of the spiral and concentrates on each unit of the software as implemented in source code. Testing
progresses by moving outward along the spiral to integration testing, where the focus is on the design and
the construction of the software architecture. Taking another turn outward on the spiral, we encounter
validation testing, where requirements established as part of software requirements analysis are validated
against the software that has been constructed. Finally, we arrive at system testing, where the software and
other system elements are tested as a whole.
7.2 TYPES OF TESTS:

[Figure: levels of testing — unit testing of modules and components, integration testing of sub-systems, system testing, and acceptance (user) testing.]

[Figure: each level of testing verifies a development artifact — client needs are verified by acceptance testing, requirements by system testing, design by integration testing, and code by unit testing.]
 TESTING ACTIVITIES

Different levels of testing are used in the testing process; each level of testing aims to test different aspects
of the system. The basic levels are:
 Unit testing
 Integration testing
 System testing
 Acceptance testing

 Unit testing:
Unit testing involves the design of test cases that validate that the internal program logic is
functioning properly and that program inputs produce valid outputs. All decision branches and internal code
flow should be validated. It is the testing of individual software units of the application, and is done after the
completion of an individual unit before integration. This is structural testing that relies on knowledge of the unit's
construction and is invasive. Unit tests perform basic tests at the component level and test a specific business
process, application, and/or system configuration. Unit tests ensure that each unique path of a business
process performs accurately to the documented specifications and contains clearly defined inputs and
expected results.
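As a hypothetical sketch of the idea, here is a small validation routine like the one behind the login pages, with plain-Java assertions as its unit test (a framework such as JUnit would normally be used instead; the class and method names are ours):

```java
// Sketch: a unit and its test in one class, plain Java.
public class LoginValidatorTest {

    // Unit under test: both login fields are mandatory.
    static boolean isValid(String userName, String password) {
        return userName != null && !userName.isEmpty()
                && password != null && !password.isEmpty();
    }

    public static void main(String[] args) {
        // Valid input must be accepted.
        if (!isValid("Admin", "Admin")) throw new AssertionError("valid input rejected");
        // Empty or missing mandatory fields must be rejected.
        if (isValid("", "Admin")) throw new AssertionError("empty user name accepted");
        if (isValid("Admin", null)) throw new AssertionError("null password accepted");
        System.out.println("all unit tests passed");
    }
}
```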

 Integration Testing:

Integration tests are designed to test integrated software components to determine whether they actually run
as one program. Testing is event driven and is more concerned with the basic outcome of screens or fields.
Integration tests demonstrate that although the components were individually satisfactory, as shown by
successful unit testing, the combination of components is correct and consistent. Integration testing is
specifically aimed at exposing the problems that arise from the combination of components.

 Functional Test:
Functional tests provide systematic demonstrations that functions tested are available as specified by
the business and technical requirements, system documentation, and user manuals.

Functional testing is centered on the following items:

Valid Input : identified classes of valid input must be accepted.

Invalid Input : identified classes of invalid input must be rejected.

Functions : identified functions must be exercised.

Output : identified classes of application outputs must be exercised.

Systems/Procedures: interfacing systems or procedures must be invoked.

Organization and preparation of functional tests is focused on requirements, key functions, or special
test cases. In addition, systematic coverage pertaining to identified business process flows (data fields,
predefined processes, and successive processes) must be considered for testing. Before functional testing
is complete, additional tests are identified and the effective value of current tests is determined.

 System Test:
System testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system testing is the configuration
oriented system integration test. System testing is based on process descriptions and flows, emphasizing pre-
driven process links and integration points.

 White Box Testing:


White box testing is testing in which the software tester has knowledge of the inner
workings, structure and language of the software, or at least its purpose. It is used to test
areas that cannot be reached from a black box level.

 Black Box Testing:


Black box testing is testing the software without any knowledge of the inner workings, structure or
language of the module being tested. Black box tests, as most other kinds of tests, must be written from a
definitive source document, such as a specification or requirements document. It is testing in which the
software under test is treated as a black box: you cannot “see” into it. The test provides inputs and
responds to outputs without considering how the software works.

7.3 TEST CASES


A test case is a set of input data and an expected result that exercises the component with the purpose
of causing failures and detecting faults. Test cases are classified into black box tests and white box tests. Black
box tests focus on the input/output behavior of the component; white box tests focus on the internal structure of
the component.

7.3.1 TEST CASE EXECUTION AND ANALYSIS

7.3.1.1 Admin Login:

Test Case | Condition Being Checked | Expected Behaviour | Observed Behaviour | Status
User name = "Admin" | Validation check | Display a "wrong user name" message | Display a "wrong user name" message | Pass
Password = "Admin" | Existence of mandatory fields | Display a message to fill all empty mandatory fields | Display a message to fill all empty mandatory fields | Pass
User name = "Admin", Password = "Admin" | Existence of mandatory fields | If the user is valid, display the home page; otherwise an error message | If the user is valid, display the home page; otherwise an error message | Pass

7.3.1.2 User Login:

Test Case | Condition Being Checked | Expected Behaviour | Observed Behaviour | Status
User name = "nani" | Validation check | Display a "wrong user name" message | Display a "wrong user name" message | Pass
Password = "nani" | Existence of mandatory fields | Display a message to fill all empty mandatory fields | Display a message to fill all empty mandatory fields | Pass
User name = "nani", Password = "nani" | Existence of mandatory fields | If the user is valid, display the home page; otherwise an error message | If the user is valid, display the home page; otherwise an error message | Pass

7.4 Unit Testing:

Unit testing is usually conducted as part of a combined code and unit test phase of the software
lifecycle, although it is not uncommon for coding and unit testing to be conducted as two distinct phases.

 Test strategy and approach


Field testing will be performed manually and functional tests will be written in detail.

 Test objectives

 All field entries must work properly.


 Pages must be activated from the identified link.
 The entry screen, messages and responses must not be delayed.
 Features to be tested

 Verify that the entries are of the correct format


 No duplicate entries should be allowed
 All links should take the user to the correct page.
7.5 Integration Testing:
Software integration testing is the incremental integration testing of two or more integrated software
components on a single platform to produce failures caused by interface defects.

The task of the integration test is to check that components or software applications, e.g. components
in a software system or – one step up – software applications at the company level – interact without error.

Test Results: All the test cases mentioned above passed successfully. No defects encountered.

7.6 Acceptance Testing:


User Acceptance Testing is a critical phase of any project and requires significant participation by the
end user. It also ensures that the system meets the functional requirements.

Test Results: All the test cases mentioned above passed successfully. No defects encountered.

8. IMPLEMENTATION

Home Page:

Admin page:

Admin Login Page:

Admin View:

Adding Product:

User Login:

User Registration:

User View:

Searching Product:

Searching Product:

Payment Option:

Delivery Details:

Traversal:

Admin Checking:

User Transactions:

User Visit List:

Graph:
9. CONCLUSION

In the case of open systems, where the sequences of interactions (stored as system logs) are not constrained by a workflow, process mining techniques whose objective is to extract a process model will usually yield either overfitting "spaghetti" models or underfitting "flower" models, from which little interesting information can be extracted. A more flexible approach is required. In this paper we apply LTL-based model checking techniques to analyze e-commerce web logs. To enable this analysis, we have proposed a common way of representing event types and attributes that considers the e-commerce web structure, the product categorization and the ways users can navigate through the website according to that organization. From this structural point of view, the analysis carried out has allowed us to identify several issues and to propose improvements regarding the product categorization and the organization of some of the website sections, which have been transferred to the enterprise managers.

Although the paper is strongly related to that website, the proposed approach is general and the methodology is applicable to any structured e-commerce website. The first phase of the methodology, the preprocessing phase, is the only one that is specific to each e-commerce website, since it depends on the particular system log; the analysis technique and the queries can be completely reused. The analysis can also be executed in parallel, deploying several servers that each process a part of the log and run the queries concurrently.

We also plan to extend the set of studied patterns in order to analyze more behavioural patterns and to facilitate their automatic discovery. For that, side-by-side work with specialists in the problem domain is required in order to define a set of interesting queries as wide as possible. Additionally, extending the web server logs with information about users or with online customer reviews will be studied: user information would allow us to study multi-session patterns and to correlate the results with demographic information, while online reviews would allow us to analyze customer feedback in order to recommend products.
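The kind of query referred to above can be sketched as a finite-trace check. The snippet below evaluates the LTL response pattern G(AddToCart -> F Purchase) over session traces; the event names and the sessions are invented for illustration and do not come from the case-study log.

```python
def holds_response(trace, trigger, response):
    """Finite-trace reading of the LTL response pattern
    G(trigger -> F response): every occurrence of `trigger`
    must eventually be followed by `response`."""
    for i, event in enumerate(trace):
        if event == trigger and response not in trace[i + 1:]:
            return False
    return True


# Illustrative sessions mapped from web-log records to event traces.
sessions = {
    "s1": ["Home", "Category", "Product", "AddToCart", "Purchase"],
    "s2": ["Home", "Search", "Product", "AddToCart", "Home"],
}

# Sessions violating the pattern correspond to abandoned carts.
abandoned = [sid for sid, trace in sessions.items()
             if not holds_response(trace, "AddToCart", "Purchase")]
print(abandoned)  # -> ['s2']
```

A real query engine would compile the LTL formula into an automaton (e.g., with Spot [25]) instead of scanning the list naively, but the per-session verdict is the same.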

BIBLIOGRAPHY

10. BIBLIOGRAPHY
[1] J. B. Schafer, J. A. Konstan, and J. Riedl, “E-commerce recommendation applications.” Hingham, MA, USA:
Kluwer Academic Publishers, Jan. 2001, vol. 5, no. 1-2, pp. 115–153.

[2] N. Poggi, D. Carrera, R. Gavalda, J. Torres, and E. Ayguade, “Characterization of workload and resource
consumption for an online travel and booking site,” in Workload Characterization (IISWC), 2010 IEEE International
Symposium on. IEEE, 2010, pp. 1–10.

[3] R. Kohavi, “Mining e-commerce data: the good, the bad, and the ugly,” in Proceedings of the seventh ACM
SIGKDD international conference on Knowledge discovery and data mining. ACM, 2001, pp. 8–13.

[4] W. W. Moe and P. S. Fader, “Dynamic conversion behavior at ecommerce sites,” Management Science, vol. 50,
no. 3, pp. 326–335, 2004.

[5] G. Liu, T. T. Nguyen, G. Zhao, W. Zha, J. Yang, J. Cao, M. Wu, P. Zhao, and W. Chen, "Repeat buyer prediction for e-commerce," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '16. New York, NY, USA: ACM, 2016, pp. 155–164.

[6] J. D. Xu, “Retaining customers by utilizing technology-facilitated chat: Mitigating website anxiety and task
complexity,” Information & Management, vol. 53, no. 5, pp. 554 – 569, 2016.

[7] Y. S. Kim and B.-J. Yum, "Recommender system based on click stream data using association rule mining," Expert Systems with Applications, vol. 38, no. 10, pp. 13320–13327, 2011.

[8] R. Kosala and H. Blockeel, “Web mining research: A survey,” SIGKDD Explor. Newsl., vol. 2, no. 1, pp. 1–15,
Jun. 2000.

[9] F. M. Facca and P. L. Lanzi, “Mining interesting knowledge from weblogs: a survey,” Data & Knowledge
Engineering, vol. 53, no. 3, pp. 225–241, 2005.

[10] C. J. Carmona, S. Ramírez-Gallego, F. Torres, E. Bernal, M. J. del Jesus, and S. García, "Web usage mining to improve the design of an e-commerce website: Orolivesur.com," Expert Systems with Applications, vol. 39, no. 12, pp. 11243–11249, 2012.

[11] Q. Song and M. Shepperd, “Mining web browsing patterns for ecommerce,” Computers in Industry, vol. 57, no.
7, pp. 622–630, 2006.

[12] O. Arbelaitz, I. Gurrutxaga, A. Lojo, J. Muguerza, J. M. Prez, and I. Perona, “Web usage and content mining to
extract knowledge for modeling the users of the bidasoa turismo website and to adapt it.” Expert Syst. Appl., vol. 40,
no. 18, pp. 7478–7491, 2013.

[13] J. K. Gerrikagoitia, I. Castander, F. Rebón, and A. Alzua-Sorzabal, "New trends of intelligent e-marketing based on web mining for e-shops," Procedia - Social and Behavioral Sciences, vol. 175, pp. 75–83, 2015.

[14] Y. H. Cho and J. K. Kim, “Application of web usage mining and product taxonomy to collaborative
recommendations in e-commerce,” Expert Systems with Applications, vol. 26, no. 2, pp. 233 – 246, 2004.

[15] K.-J. Kim and H. Ahn, "A recommender system using GA K-means clustering in an online shopping market," Expert Systems with Applications, vol. 34, no. 2, pp. 1200–1209, 2008.

[16] Q. Su and L. Chen, “A method for discovering clusters of e-commerce interest patterns using click-stream data,”
Electronic Commerce Research and Applications, vol. 14, no. 1, pp. 1 – 13, 2015.

[17] J. Srivastava, R. Cooley, M. Deshpande, and P.-N. Tan, "Web usage mining: Discovery and applications of usage patterns from web data," SIGKDD Explor. Newsl., vol. 1, no. 2, pp. 12–23, Jan. 2000.

[18] Q. Zhang and R. S. Segall, “Web mining: a survey of current research, techniques, and software,” International
Journal of Information Technology & Decision Making, vol. 7, no. 04, pp. 683–720, 2008.

[19] B. Singh and H. K. Singh, “Web data mining research: a survey,” in Computational Intelligence and Computing
Research (ICCIC), 2010 IEEE International Conference on. IEEE, 2010, pp. 1–10.

[20] W. M. P. van der Aalst, Process Mining: Discovery, Conformance and Enhancement of Business Processes, 1st
ed. Springer Publishing Company, Incorporated, 2011.

[21] N. Poggi, V. Muthusamy, D. Carrera, and R. Khalaf, “Business process mining from e-commerce web logs,” in
Proceedings of the 11th International Conference on Business Process Management, ser. BPM’13. Berlin, Heidelberg:
Springer-Verlag, 2013, pp. 65–80.

[22] F. M. Maggi, R. P. J. C. Bose, and W. M. P. van der Aalst, Efficient Discovery of Understandable Declarative Process Models from Event Logs. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 270–285.

[23] M. Räim, C. Di Ciccio, F. M. Maggi, M. Mecella, and J. Mendling, "Log-based understanding of business processes through temporal logic query checking," in On the Move to Meaningful Internet Systems: OTM 2014 Conferences: Confederated International Conferences: CoopIS, and ODBASE 2014, Amantea, Italy, October 27-31, 2014, Proceedings, R. Meersman, H. Panetto, T. Dillon, M. Missikoff, L. Liu, O. Pastor, A. Cuzzocrea, and T. Sellis, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014, pp. 75–92.

[24] A. Burattin, M. Cimitile, F. M. Maggi, and A. Sperduti, “Online discovery of declarative process models from
event streams,” IEEE Transactions on Services Computing, vol. 8, no. 6, pp. 833–846, 2015.

[25] A. Duret-Lutz, A. Lewkowicz, A. Fauchille, T. Michaud, E. Renault, and L. Xu, “Spot 2.0 — a framework for
LTL and ω-automata manipulation,” in Proceedings of the 14th International Symposium on Automated Technology
for Verification and Analysis (ATVA’16), ser. Lecture Notes in Computer Science, vol. 9938. Springer, Oct. 2016, pp.
122–129.

[26] R.-S. Wu and P.-H. Chou, “Customer segmentation of multiple category data in e-commerce using a soft-
clustering approach,” Electronic Commerce Research and Applications, vol. 10, no. 3, pp. 331–341, May 2011.

[27] L. G. Vasconcelos, R. D. C. Santos, and L. A. Baldochi, “Exploiting client logs to support the construction of
adaptive e-commerce applications,” in 2016 IEEE 13th International Conference on e-Business Engineering (ICEBE),
2016, pp. 164–169.

[28] Y.-L. Chen, M. H. Kuo, S. Y. Wu, and K. Tang, "Discovering recency, frequency, and monetary (RFM) sequential patterns from customers' purchasing data," Electronic Commerce Research and Applications, vol. 8, no. 5, pp. 241–251, Oct. 2009.

[29] S. Kim, J. Yeo, E. Koh, and N. Lipka, “Purchase influence mining: Identifying top-k items attracting purchase of
target item,” in Proceedings of the 25th International Conference Companion on World Wide Web, ser. WWW ’16
Companion. International World Wide Web Conferences Steering Committee, 2016, pp. 57–58.

[30] S. D. Bernhard, C. K. Leung, V. J. Reimer, and J. Westlake, "Click stream prediction using sequential stream mining techniques with Markov chains," in Proceedings of the 20th International Database Engineering & Applications Symposium, ser. IDEAS '16. New York, NY, USA: ACM, 2016, pp. 24–33.

[31] L. Lu, M. Dunham, and Y. Meng, "Mining significant usage patterns from click stream data," in Proceedings of the 7th International Conference on Knowledge Discovery on the Web: Advances in Web Mining and Web Usage Analysis, ser. WebKDD'05. Berlin, Heidelberg: Springer-Verlag, 2006, pp. 1–17.

[32] W. M. P. van der Aalst, M. Pesic, and H. Schonenberg, "Declarative workflows: Balancing between flexibility and support," Computer Science Research and Development, vol. 23, no. 2, pp. 99–113, 2009.

[33] A. Burattin, F. M. Maggi, and A. Sperduti, “Conformance checking based on multi-perspective declarative
process models,” Expert Systems with Applications, vol. 65, pp. 194 – 211, 2016.

[34] (2017) Google Analytics. Accessed 22nd May 2017. [Online] Available: https://analytics.google.com/analytics/web/

[35] (2017) Clicky. Accessed 22nd May 2017. [Online] Available: https://clicky.com

[36] (2017) Piwik open-source analytics platform. Accessed 22nd May 2017. [Online] Available: https://piwik.org

[37] (2017) Adobe analytics. Accessed 22nd May 2017. [Online] Available:
https://analytics.google.com/analytics/web/

[38] (2017) W3counter. Accessed 22nd May 2017. [Online] Available: https://www.w3counter.com

[39] Z. Manna and A. Pnueli, The Temporal Logic of Reactive and Concurrent Systems: Specification. New York, NY, USA: Springer-Verlag, 1992.

[40] E. M. Clarke, E. A. Emerson, and A. P. Sistla, "Automatic Verification of Finite-State Concurrent Systems Using Temporal Logic Specifications: A Practical Approach," in Proceedings of the 10th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL 1983), Austin, Texas, January 24-26, 1983. New York, NY, USA: ACM, 1983, pp. 117–126.

[41] E. Clarke, O. Grumberg, and D. Long, "Verification tools for finite-state concurrent systems," in Workshop/School/Symposium of the REX Project (Research and Education in Concurrent Systems). Springer, 1993, pp. 124–175.

[42] J. Couvreur, “On-the-fly verification of linear temporal logic,” in Proceedings of Formal Methods: World
Congress on Formal Methods in the Development of Computing Systems, Toulouse (France), September, 1999, pp.
253–271.

[43] W. van der Aalst and M. Pesic, "DecSerFlow: Towards a truly declarative service flow language," in The Role of Business Processes in Service Oriented Architectures, ser. Dagstuhl Seminar Proceedings, F. Leymann, W. Reisig, S. R. Thatte, and W. van der Aalst, Eds., no. 06291. Dagstuhl, Germany: Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss Dagstuhl, Germany, 2006.

[44] A. Bauer and P. Haslum, "LTL goal specifications revisited," in Proceedings of the 2010 Conference on ECAI 2010: 19th European Conference on Artificial Intelligence. Amsterdam, The Netherlands: IOS Press, 2010, pp. 881–886.

[45] G. De Giacomo, R. De Masellis, and M. Montali, "Reasoning on LTL on finite traces: Insensitivity to infiniteness," in Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, ser. AAAI'14. AAAI Press, 2014, pp. 1027–1033.

[46] P. Álvarez, J. Fabra, S. Hernández, and J. Ezpeleta, "Alignment of teacher's plan and students' use of LMS resources. Analysis of Moodle logs," in 2016 15th International Conference on Information Technology Based Higher Education and Training (ITHET), Sept 2016, pp. 1–8.

[47] (1995) Common Log Format (CLF). The World Wide Web Consortium (W3C). [Online] Available: http://www.w3.org/Daemon/User/Config/Logging.html#common-logfile-format

[48] M. Srivastava, R. Garg, and P. Mishra, “Preprocessing techniques in web usage mining: A survey,” International
Journal of Computer Applications, vol. 97, no. 18, 2014.

[49] K. S. Reddy, M. K. Reddy, and V. Sitaramulu, “An effective data preprocessing method for web usage mining,”
in Information Communication and Embedded Systems (ICICES), 2013 International Conference on. IEEE, 2013, pp.
7–10.

[50] G. Neelima and S. Rodda, “Predicting user behavior through sessions using the web log mining,” in 2016
International Conference on Advances in Human Machine Interaction (HMI), 2016, pp. 1–5.

[51] R. Y. Lau, J. L. Zhao, G. Chen, and X. Guo, "Big data commerce," Information & Management, vol. 53, no. 8, pp. 929–933, 2016.

[52] J. Qi, Z. Zhang, S. Jeon, and Y. Zhou, “Mining customer requirements from online reviews: A product
improvement perspective,” Information & Management, vol. 53, no. 8, pp. 951 – 963, 2016.

[53] Y. Kang and L. Zhou, "RubE: Rule-based methods for extracting product features from online consumer reviews," Information & Management, vol. 54, no. 2, pp. 166–176, 2017.
