1. INTRODUCTION
Nowadays, the web plays a significant role in delivering information to users’ fingertips. A
web page is located by a fixed URL and displays its content as a time-varying snapshot.
Among common web behaviors, web revisitation is the re-finding of previously viewed web
pages: not only the page URL, but also the page snapshot at the access timestamp [1]. A 6-week
user study with 23 participants showed that nearly 58% of web accesses were revisitations
[2]. Another 1-year user study involving 114 participants revealed that around 40% of queries were
re-finding requests [3]. According to [4], on average, every second page loaded had already been
visited by the same user, and the ratio of revisited pages among all visits ranges between
20% and 72%.
Psychological studies show that humans rely on both episodic memory and semantic memory to
recall information or events from the past. Episodic memory receives and stores
temporally dated episodes or events, together with their spatial-temporal relations, while
semantic memory is a structured record of the facts, meanings, concepts,
and skills that one has acquired from the external world. Semantic information is derived from
accumulated episodic memory. Episodic memory can be thought of as a “map” that ties together
items in semantic memory. The two memories make up a person’s declarative
memory and work together in information recollection [5]. Thus, when a user
revisits the web, s/he tends to use episodic memory, interwoven with
semantic memory, to recall the previously focused pages. Here, semantic memory
holds the content information of previously focused pages, and episodic memory keeps
these pages’ access context (e.g., time, location, concurrent activities) [6], [7].
Inspired by the psychological findings, this project explores how to leverage our natural recall
process of using episodic and semantic memory cues to facilitate personal web revisitation.
Considering the differences of users in memorizing previous access context and page content
cues, a relevance feedback mechanism is involved to enhance personal web revisitation
performance.
1.1. EXISTING SYSTEM
In the literature, a number of techniques and tools, such as bookmarks, history tools, search
engines, metadata annotation and exploitation, and contextual recall systems, have been
developed to support personal web revisitation. The work most closely related to this study is the
Memento system [8], which unifies context and content to aid web revisitation. It defined the
context of a web page as the other pages in the browsing session that immediately precede or
follow the current page, and then extracted topic-phrases from these browsed pages based on
the Wikipedia topic list. In comparison, the context information considered in this work
includes access time, location, and concurrent activities automatically inferred from the user’s
computer programs. Instead of extracting content items from the full web page as done in [8],
we extract them from the page segments displayed on the screen in the user’s view, and assign a
probabilistic value to each extracted term based on the user’s page browsing behaviors (i.e.,
dwell time and highlighting), as well as the page’s subject headings and term frequency-inverse
document frequency (tf-idf). This value reflects the user’s impression of the term and the
likelihood of using it as a recall content cue. Other closely related work such as [9], [10], [11]
enabled users to search for contextually related activities (e.g., time, location, concurrent
activities, meetings, music playing, an interrupting phone call, or even other files or web sites
that were open at the same time), and find a target piece of information (often not semantically
related) that was accessed in that context. This body of research emphasizes episodic context cues
in page recall. How to capture potentially memorable semantic content cues from the user’s page
access behaviors, and use them to facilitate recall, is not discussed. To adapt to an individual’s
web revisitation characteristics, as well as the natural degradation of human context and content
memory, this study presents methods to dynamically tune influential parameters in building and
maintaining probabilistic context and content memories for recall.
When a user accesses a web page, which is of potential to be revisited later by the user (i.e.,
page access time is over a threshold), the context acquisition and management module
captures the current access context (i.e., time, location, activities inferred from the currently
running computer programs) into a probabilistic context tree. Meanwhile, the content
extraction and management module performs the unigram based extraction from the
displayed page segments and obtains a list of probabilistic content terms. The probabilities of
acquired context instances and extracted content terms reflect how likely the user will refer to
them as memory cues to get back to the previously focused page.
Later, when a user requests to get back to a previously focused page through context and/or
content keywords, the re-access by context keywords module and re-access by content
keywords module search the probabilistic context tree repository and probabilistic term list
repository, respectively. The result generation and feedback adjustment module combines the
two search results and returns to the user a ranked list of visited page URLs. The relevance
feedback mechanism dynamically tunes influential parameters (including memories’ decay
rates, page reading time threshold, interleaved window size threshold, weight vectors in
computing the association and impression scores), which are critical to the construction and
management of context and content memories for personal web revisitation.
The main contributions of our project thus lie in the following three aspects:
• We present a personal web revisitation technique, called WebPagePrev, that allows users to
get back to their previously focused pages through access context and page content keywords.
Underlying techniques for context and content memories’ acquisition, storage, and utilization
for web page recall are discussed.
• Dynamic tuning strategies to tailor to individual’s memorization strength and recall habits
based on relevance feedback (e.g., weight preference calculation, decay rate adjustment, etc.)
are developed for performance improvement.
• We evaluate the effectiveness of the proposed technique WebPagePrev, and report the
findings (e.g., the importance of context and content factors) in web revisitation through a 6-
month user study with 21 participants.
1.3. REQUIREMENT SPECIFICATION
Software Requirements:
• Operating system : Windows XP/7
• Coding language : Java/J2EE
• Database : MySQL
Hardware Components:
• System : Pentium IV, 2.4 GHz
• Hard disk : 40 GB
• Floppy drive : 1.44 MB
• Monitor : 15 VGA Colour
• Mouse : Logitech
• RAM : 512 MB
2. LITERATURE SURVEY
People often repeat Web searches, both to find new information on topics they have
previously explored and to re-find information they have seen in the past. The query
associated with a repeat search may differ from the initial query but can nonetheless lead to
clicks on the same results. This project explores repeat search behavior through the analysis
of a one-year Web query log of 114 anonymous users and a separate controlled survey of an
additional 119 volunteers. Our study demonstrates that as many as 40% of all queries are re-
finding queries. Re-finding appears to be an important behavior for search engines to
explicitly support, and we explore how this can be done. We demonstrate that changes to
search engine results can hinder re-finding, and provide a way to automatically detect repeat
searches and predict repeat clicks.
Millions of web pages are visited and revisited every day. On average, every second page
loaded was already visited before by the same user; individual means for recurrence rates
range between 20% and 72%. People revisit pages within a session or between
parallel ones, they reuse web-based tools habitually, monitor specific content or resume
interrupted sessions, and they want to re-find content after longer periods of time. Current
history tools that support such revisits show unique and severe shortcomings. Often, revisits
are cumbersome, more than necessary. This survey summarizes existing knowledge about
revisitations on the web, and surveys the potential of graphic-based web history tools. A
taxonomy of revisit-types distinguishes between short-, medium-, and long-term revisits, but
also intra-and inter-session revisits. Assisted by a clear nomenclature this provides more
clarity to the current discussion. The potential use of graphic-based tools is analyzed and
discussed with respect to the found categories. The value of the current, mainly non-graphical
history tools, such as back button, bookmarks, history list, search engines, and search bars is
examined and related to the potential offered by graphic-based tools. The survey provides
summaries of key studies and bodies of research for those who are interested in improving
the web users' experience by simplifying the processes of going back to resources visited
seconds, minutes, hours, weeks, or even months ago. It is meant for developers and
researchers, browser and search engine producers, web usability professionals, and those who
feel an irresistible urge to creatively innovate the web. The time has come to design and offer
more appropriate history support. This survey aims at providing a foundation, as well as
valuable ideas for doing so.
Current tools (such as browser histories) only provide users with basic information such as
the date of the last visit and title of the page visited. In this project, we describe a system that
provides users with descriptive topic-phrases that aid re-finding. Unlike prior work, our
system considers both the content of a webpage and the context in which the page was
visited. Preliminary evaluation of this system suggests users find this approach of combining
content with context useful.
Taking advantage of access context (such as time, location, and concurrent activity), context-based
search of previously accessed web pages is also being investigated, because the
context under which information is accessed tends to be easier to remember than the
content. To mimic users’ memory recall, we present a way to automatically capture the user’s
access context from the user’s concurrent computer programs, and manage it in a probabilistic
context tree for each accessed web page over a life cycle.
Table 2.1: Literature Survey on Personal Web Revisitation
S.No | Year | Author | Title | Technique | Advantage | Disadvantage
4 | 2017 | M. Mayer | Web History Tools And Revisitation Support: A Survey Of Existing Approaches And Directions | This summarizes existing knowledge about revisitations on the web, and surveys the potential of graphic-based web history tools. | The value of the current, mainly non-graphical history tools is examined and related to the potential offered by graphic-based tools. | Tool support for re-visitation is lacking.
3. PERSONAL WEB REVISITATION BY CONTEXT AND CONTENT KEYWORDS WITH RELEVANCE FEEDBACK
3.1. ARCHITECTURE
Fig. 3.1 shows our personal web revisitation framework with relevance feedback. It consists of
two main phases.
(1) Preparation for web revisitation: When a user accesses a web page, which is of potential
to be revisited later by the user (i.e., page access time is over a threshold), the context
acquisition and management module captures the current access context (i.e., time, location,
activities inferred from the currently running computer programs) into a probabilistic context
tree. Meanwhile, the content extraction and management module performs the unigram based
extraction from the displayed page segments and obtains a list of probabilistic content terms.
The probabilities of acquired context instances and extracted content terms reflect how likely
the user will refer to them as memory cues to get back to the previously focused page.
(2) Web revisitation: Later, when a user requests to get back to a previously focused page
through context and/or content keywords, the re-access by context keywords module and re-
access by content keywords module search the probabilistic context tree repository and
probabilistic term list repository, respectively. The result generation and feedback adjustment
module combines the two search results and returns to the user a ranked list of visited page
URLs. The relevance feedback mechanism dynamically tunes influential parameters
(including memories’ decay rates, page reading time threshold, interleaved window size
threshold, weight vectors in computing the association and impression scores), which are
critical to the construction and management of context and content memories for personal
web revisitation.
3.2. MODULES
Context Acquisition
Three kinds of access context are captured: access time, access location, and concurrent
activities. Access time is known directly, while access location can be derived from
the IP address of the user’s computing device. By calling a public IP localization API, we can
map the IP address (e.g., “166.111.71.131”) to a region (e.g., “Beijing, Tsinghua
University”). To get a high-precision location, we further build an IP region
geocoding database, which translates a static IP address to a concrete place such as “Lab
Building, Room 216”. If the user’s GPS information is available, a public GPS localization
application can also help localize the user to a Point of Interest (POI) in the region. The user’s
concurrent activities are inferred from the computer programs running before and after
the page access. We continuously monitor changes of the user’s focused program window,
which can be a web page, a Word file, a chat program window, etc., during the user’s
interaction with the computer. Once a user visits a web page for longer than a threshold τc,
the computer programs that run interleaved with the current web access program for over τc
are taken as the associated computer programs (i.e., context activities).
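The selection of associated programs can be sketched as follows. This is only an illustration under stated assumptions: the focus log format, the function name associated_programs, the window parameter (standing in for the interleaved window size) and the rule "total focused time inside the window exceeds τc" are all hypothetical, not the report's exact definition.

```python
from collections import defaultdict

def associated_programs(focus_log, page_start, page_end, window, tau_c):
    """Return programs focused for more than tau_c seconds inside
    [page_start - window, page_end + window] (hypothetical rule)."""
    lo, hi = page_start - window, page_end + window
    totals = defaultdict(float)
    for program, start, end in focus_log:
        # Length of the part of this focus interval that lies in the window.
        overlap = min(end, hi) - max(start, lo)
        if overlap > 0:
            totals[program] += overlap
    return {program for program, secs in totals.items() if secs > tau_c}

# Hypothetical focus log: (program, focus start, focus end), in seconds.
focus_log = [
    ("word", 0, 120),    # focused before the page visit
    ("chat", 120, 130),  # brief interruption, below the threshold
    ("word", 400, 520),  # focused again after the page visit
]
activities = associated_programs(focus_log, page_start=130, page_end=400,
                                 window=300, tau_c=30)
```

Here only the word processor accumulates enough focused time around the page visit to count as a context activity.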
Construction of Probabilistic Context Trees
Access context (i.e., time, location, and concurrent computer program activities) is
organized in a probabilistic context tree to support generalized revisit queries, reflecting
users’ cognitive understanding and progressive memory decay during learning and recall
[30]. Each leaf node is bound to a score in [0,1], stating the likelihood that this context
node is used as a contextual cue. In the activity subtree, leaf nodes’ scores are the association
scores defined in Def. 2. As time and location are deterministic, leaf nodes in the time and
location subtrees are set to 1.0. With the scores of all the independent leaf nodes available,
we can compute the scores of their parent nodes through the Jordan formula [31], which gives
the probability of the union of n random events based on the inclusion-exclusion principle.
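As a minimal sketch of this step, assuming the child scores are probabilities of independent events, the union probability can be computed either term by term with inclusion-exclusion or with the equivalent closed form 1 - Π(1 - p_i). The function names below are illustrative only.

```python
from itertools import combinations
from math import prod

def parent_score(child_scores):
    """Probability that at least one independent child cue is used."""
    return 1.0 - prod(1.0 - p for p in child_scores)

def parent_score_inclusion_exclusion(child_scores):
    """Same value, expanded with the inclusion-exclusion principle."""
    total = 0.0
    for k in range(1, len(child_scores) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        # For independent events, P(intersection) is the product of the p_i.
        total += sign * sum(prod(group) for group in combinations(child_scores, k))
    return total

activity_children = [0.5, 0.4, 0.2]
score = parent_score(activity_children)
```

A parent with any child scored 1.0 (as in the time and location subtrees) therefore also scores 1.0.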
Busy, as a context node, is a general activity status indicating that the associated
computer programs concern working or learning. In contrast, relaxed
describes the status of entertainment and leisure. We apply the Dewey encoding scheme to
probabilistic context trees based on [32], [33], [34]. Dewey code is a widely used coding
scheme for tree structures, where each node is assigned a Dewey number to represent the path
from the root to the node. Each component of the path represents the local order of an
ancestor node. For example, a tree node n encoded as n1.n2...nk is a descendant of tree node
m encoded as m1.m2...mf iff k > f and n1.n2...nf = m1.m2...mf. In our probabilistic context
trees, the Dewey number of the root is the tree id. For each node in a probabilistic
context tree, we build a Trie-based index according to its keywords.
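The descendant test on Dewey numbers can be sketched directly from the definition above. The dotted-string representation and the helper names are assumptions made for illustration.

```python
def parse_dewey(code):
    """Turn a dotted Dewey number such as '7.1.2' into a tuple of ints."""
    return tuple(int(part) for part in code.split("."))

def is_descendant(n_code, m_code):
    """True iff node n is a proper descendant of node m (k > f, shared prefix)."""
    n, m = parse_dewey(n_code), parse_dewey(m_code)
    return len(n) > len(m) and n[:len(m)] == m

def tree_id(code):
    """The first component is the root's Dewey number, i.e. the tree id."""
    return parse_dewey(code)[0]
```

Comparing integer components rather than raw strings avoids treating "7.10.2" as a descendant of "7.1".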
The obtained probabilistic context trees evolve dynamically over their life cycles to reflect the
gradual degradation of human episodic memory as well as the context keywords that
users will use for recall. That is, for each node in the probabilistic context tree, its association
score progressively decays with time. A psychological study [35] showed that the
memorization status of a value v can be expressed as an exponential function of the
square root of the elapsed time (also called age).
For different hierarchical values in the probabilistic context tree, specific values at lower
levels usually degrade faster in human memory than general ones at upper levels, so different
decay rates are assigned in line with the Ebbinghaus Forgetting Curve, a graph illustrating
how we forget information over time. It was formulated in 1885 by psychologist Hermann
Ebbinghaus, who conducted experiments on himself to understand how long the human mind
retains information. Ebbinghaus examined his own
capacity to recollect information by creating a set of 2,300 three-letter, meaningless words to
memorize. He studied multiple lists of these words and tested his recall of them at different
time intervals over a period of one year. Ebbinghaus discovered that 58.2% was remembered
after 20 minutes, 44.2% after 1 hour, 35.8% after 8-9 hours, 33.7% after 1 day, 27.8% after 2
days, and 25.4% after 6 days. Fitting the formula to these experimental values, we can
calculate seven different decay rates, whose average is approximately
0.05. Further memorization experiments on meaningful essays and poems
demonstrate similar exponential decay patterns in the square root of the elapsed time, whose
corresponding decay rates exhibit linear relationships with that on words. Based on these
findings, we initialize the decay rates at different hierarchical levels.
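The fitting step can be illustrated with a short sketch. It assumes the retention model r(t) = exp(-λ·√t) with t measured in minutes, which is one reading of "exponential in the square root of the elapsed time"; the exact formula and time unit are assumptions. It uses only the six retention values listed above, with 8.5 hours taken as an assumed midpoint of "8-9 hours".

```python
import math

# (elapsed minutes, fraction remembered), taken from the values listed above.
RETENTION = [
    (20, 0.582),
    (60, 0.442),
    (510, 0.358),   # assumed midpoint of "8-9 hours"
    (1440, 0.337),
    (2880, 0.278),
    (8640, 0.254),
]

def decay_rate(minutes, retained):
    """Solve retained = exp(-rate * sqrt(minutes)) for rate."""
    return -math.log(retained) / math.sqrt(minutes)

def decayed_score(initial, rate, minutes):
    """Score remaining after `minutes` under the assumed decay model."""
    return initial * math.exp(-rate * math.sqrt(minutes))

rates = [decay_rate(t, r) for t, r in RETENTION]
average_rate = sum(rates) / len(rates)
```

Under these assumptions the per-point rates shrink as the interval grows, and their average lands in the neighbourhood of the 0.05 quoted in the text.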
Apart from memory degradation, the probabilistic context tree may also experience
reinforcement due to the user’s revisit queries. That is, if the user types in a context value in the
context tree, its possibly degraded association score is reset to the original one, and all its
ancestors’ scores (if degraded) are re-computed based on this original value. The decay
starting time for the level where the value is located is meanwhile reset to the current time.
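A minimal sketch of decay with reinforcement for a single node is given below. The class name ContextNode, its fields, and the decay model exp(-rate·√age) are assumptions for illustration; re-computing ancestor scores and keeping one decay start time per tree level are omitted for brevity.

```python
import math

class ContextNode:
    """Hypothetical context-tree node with a decaying association score."""

    def __init__(self, keyword, original_score, decay_rate, created_at):
        self.keyword = keyword
        self.original_score = original_score
        self.decay_rate = decay_rate
        self.decay_start = created_at  # decay is measured from this time

    def score(self, now):
        """Association score after decaying since the last reset."""
        age = max(0.0, now - self.decay_start)
        return self.original_score * math.exp(-self.decay_rate * math.sqrt(age))

    def reinforce(self, now):
        """User typed this value in a revisit query: restart the decay."""
        self.decay_start = now

node = ContextNode("word", original_score=0.8, decay_rate=0.05, created_at=0)
before = node.score(100)
node.reinforce(100)
after = node.score(100)
```

Resetting the decay start time is what restores the original score, so the original value itself never needs to be overwritten.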
Apart from access context, users may also get back to previously viewed pages through
content keywords. Instead of extracting content terms from the full web page, we only
consider the page segments shown on the screen. There are many term weighting schemes in
the information retrieval field. The most generic one is term frequency-inverse
document frequency (tf-idf) [36]. For personalized web revisitation, merely counting the
occurrences of a term in the presented page segment is not enough. The user’s web page
browsing behaviors (e.g., visitation time length and whether the term was highlighted), as well
as the page’s subject headings, are also counted as indicators of the user’s impression and
potential interest for later recall. In a similar manner to access context, we bind an impression
score to each extracted content term d, showing how likely the user will refer to it for recall
based on the four normalized features.
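One plausible way to combine the four normalized features is a weighted sum, sketched below. The feature names, the weight values and the weighted-sum form itself are assumptions for illustration; the report states only that a weight vector is used and later tuned by relevance feedback.

```python
FEATURES = ("tfidf", "dwell_time", "highlighted", "in_heading")

def impression_score(features, weights):
    """Weighted sum of normalized features; each feature lies in [0, 1]."""
    for name in FEATURES:
        if not 0.0 <= features[name] <= 1.0:
            raise ValueError(f"feature {name!r} is not normalized")
    return sum(weights[name] * features[name] for name in FEATURES)

# Hypothetical weight vector that sums to 1, so the score stays in [0, 1].
weights = {"tfidf": 0.4, "dwell_time": 0.3, "highlighted": 0.2, "in_heading": 0.1}
term_features = {"tfidf": 0.5, "dwell_time": 1.0, "highlighted": 0.0, "in_heading": 1.0}
score = impression_score(term_features, weights)
```

Keeping the weights normalized makes the impression score directly comparable with the context scores, which are also bounded by 1.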
To gain the speed benefits of indexing at retrieval time, we use a Trie tree to organize the
extracted term lists based on the longest common prefix. For each term in the Trie tree, an
inverted index is built in advance to store a mapping from the term to the extracted term lists
that contain it. Within a target
page collection, we assume that each page has a unique serial number, known as the page
identifier (PageID). During index construction, the input is the term lists of the web pages, and we
insert the terms into the Trie tree. Meanwhile, instances of the same term are grouped
together, and the result is split into a dictionary and postings as shown in the right column.
The dictionary records some statistics, such as the number of web pages that contain each
term (Pagefreq), which also corresponds to the length of each postings list. The postings list
stores a list of pairs of impression score dIs(w,d,t) and PageID for a term d. Here, the time
complexity of building content term lists is O(nd · |d|), where nd is the number of extracted
terms, and |d| is the average term length.
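The index structure described above can be sketched as a trie whose terminal nodes carry postings lists. The class and method names are hypothetical; each insertion walks one node per character, which matches the O(nd · |d|) construction cost.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.postings = []  # (impression score, PageID) pairs for this term

class TermIndex:
    """Trie over content terms with an inverted postings list per term."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, term, impression_score, page_id):
        node = self.root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
        node.postings.append((impression_score, page_id))

    def lookup(self, term):
        """Postings for an exact term, highest impression score first."""
        node = self.root
        for ch in term:
            node = node.children.get(ch)
            if node is None:
                return []
        return sorted(node.postings, reverse=True)

    def page_freq(self, term):
        """Number of distinct pages whose term list contains the term."""
        return len({page_id for _, page_id in self.lookup(term)})

index = TermIndex()
index.add("trie", 0.4, page_id=1)
index.add("tree", 0.7, page_id=1)
index.add("trie", 0.9, page_id=2)
```

Terms sharing a prefix, such as "trie" and "tree", share the nodes for "tr", which is where the space saving of the trie comes from.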
Now each web page w accessed by the user is bound to a probabilistic context tree (denoted
as w#tree) and a probabilistic term list (denoted as w#list). Let W be the set of the user’s
previously accessed web pages. A revisit query posted by the user at time t is expressed as Q = (Qc, Qd),
where Qc is a set of context keywords, Qd is a set of content keywords, and the answer Wm is a
ranked list of matched web pages from W.
The detailed procedure is illustrated in Algorithm 1. By scanning the inverted index, the
candidate matched page set Wc is determined from the context trees and
term lists that match a revisit query Q. To compute the context ranking, the procedure first
splits the matched context tree into multiple satisfactory subtrees, then traverses the matched
nodes to merge ancestor nodes with child nodes along the same hierarchical path. The matched
web pages’ ranking score is then the product of the context ranking and the content ranking.
Finally, the matched pages with lower ranking scores are removed, where the parameter δ is
initially set to 0.2 and dynamically tuned based on relevance feedback.
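The final combination and filtering step can be sketched as follows. Two simplifications are assumed here and are not confirmed by the text: δ is treated as a minimum ranking score, and only pages matched by both the context and the content search are kept. The function name rank_pages and the dictionary inputs are hypothetical.

```python
def rank_pages(context_ranking, content_ranking, delta=0.2):
    """Combine per-page context and content scores into a ranked page list.

    Both arguments map a page id to a score in [0, 1]. The ranking score is
    the product of the two scores, and pages scoring below delta are dropped.
    """
    candidates = context_ranking.keys() & content_ranking.keys()
    scored = [(context_ranking[p] * content_ranking[p], p) for p in candidates]
    kept = [(s, p) for s, p in scored if s >= delta]
    # Highest score first; ties are broken by page id for a stable order.
    return [p for s, p in sorted(kept, key=lambda sp: (-sp[0], sp[1]))]

context_ranking = {"page_a": 0.9, "page_b": 0.5, "page_c": 0.3}
content_ranking = {"page_a": 0.8, "page_b": 0.3, "page_d": 0.9}
strict = rank_pages(context_ranking, content_ranking)
lenient = rank_pages(context_ranking, content_ranking, delta=0.1)
```

Lowering δ admits more weakly matched pages, which is the kind of trade-off the relevance feedback is meant to tune per user.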
Relevance feedback is an interactive approach that has been shown to work particularly well
in classical information retrieval and, more recently, in the web search domain [36]. When a user
interacts with WebPagePrev during the web revisitation phase, s/he can either manually enter
context keywords or pick suggested values from contextual hierarchies by clicking
the left side buttons of the time, location, and activity bars. Each contextual hierarchy is
dynamically maintained by analyzing the user’s clicking behaviors and the statistical
frequencies of captured context instances. Frequently accessed context items are listed at the
top of the corresponding contextual hierarchy. Typos in the user’s re-finding requests are
automatically corrected by the system based on its indexed content and context keywords.
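The report does not say how typos are corrected. As one possible illustration, a swapped-in technique rather than the system's actual method, a query keyword can be matched against the indexed vocabulary with the standard library's difflib.get_close_matches. The vocabulary, the cutoff value and the function name are assumptions.

```python
import difflib

# Hypothetical vocabulary drawn from indexed context and content keywords.
VOCABULARY = ["beijing", "tsinghua", "probabilistic", "revisitation"]

def correct_keyword(word, vocabulary, cutoff=0.8):
    """Return the closest indexed keyword, or the word itself if none is close."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word

fixed = correct_keyword("revistation", VOCABULARY)
```

A fairly high cutoff keeps unrelated keywords from being rewritten, at the cost of missing heavier misspellings.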
3.3.1 Use Case Diagram
A use case diagram in the Unified Modeling Language (UML) is a type of behavioral
diagram defined by and created from a Use-case analysis. Its purpose is to present a graphical
overview of the functionality provided by a system in terms of actors, their goals (represented
as use cases), and any dependencies between those use cases. The main purpose of a use case
diagram is to show what system functions are performed for which actor. Roles of the actors
in the system can be depicted.
3.3.2 Sequence Diagram
3.3.3 Activity Diagram
3.3.4 Class Diagram
In software engineering, a class diagram in the Unified Modeling Language (UML) is a type
of static structure diagram that describes the structure of a system by showing the system's
classes, their attributes, operations (or methods), and the relationships among the classes. It
explains which class contains information.
The data flow diagram (DFD) is also called a bubble chart. It is a simple graphical formalism
that can be used to represent a system in terms of the input data to the system, the various
processing carried out on this data, and the output data generated by the system. The DFD is one of
the most important modeling tools. It is used to model the system components. These
components are the system process, the data used by the process, the external entities that
interact with the system, and the information flows in the system.
The DFD shows how information moves through the system and how it is modified by a series
of transformations. It is a graphical technique that depicts information flow and the
transformations that are applied as data moves from input to output.
A DFD may be used to represent a system at any level of
abstraction. It may be partitioned into levels that represent increasing information flow
and functional detail.
Fig 3.6(a) Data Flow Diagram of Admin
Fig 3.6(b) Data Flow Diagram of User
4. PERSONAL WEB REVISITATION SYSTEM TESTING AND RESULTS
Software Validation
Validation is the process of examining whether or not the software satisfies the user
requirements. It is carried out at the end of the SDLC. If the software matches the requirements
for which it was made, it is validated.
Software Verification
Verification is the process of confirming that the software meets the business
requirements and is developed in adherence to the proper specifications and methodologies.
● Errors - These are actual coding mistakes made by developers. In addition, any
difference between the output of the software and the desired output is considered an error.
● Fault - When an error exists, a fault occurs. A fault, also known as a bug, is the result of
an error that can cause the system to fail.
● Failure - Failure is the inability of the system to perform the desired
task. Failure occurs when a fault exists in the system.
● Manual - This testing is performed without the help of automated testing tools. The
software tester prepares test cases for different sections and levels of the code,
executes the tests and reports the results to the manager.
Manual testing is time and resource consuming. The tester needs to confirm whether
or not the right test cases are used. A major portion of testing involves manual testing.
● Automated - This testing is done with the aid of automated testing
tools. The limitations of manual testing can be overcome using automated test
tools.
For example, a test may need to check whether a webpage can be opened in Internet Explorer.
This can easily be done with manual testing. But checking whether the web server can take the
load of 1 million users is practically impossible to do manually.
There are software and hardware tools that help testers conduct load testing, stress
testing, and regression testing.
Testing Approaches
Tests can be conducted based on two approaches –
● Functionality testing
● Implementation testing
When functionality is tested without regard to the actual implementation, it is
known as black-box testing. The other approach is known as white-box testing, where not only
the functionality is tested but the way it is implemented is also analyzed.
Exhaustive testing is the ideal method for perfect testing: every single possible value
in the range of the input and output values is tested. In a real-world scenario, however, it is
not possible to test every value if the range of values is large.
Black-box testing
It is carried out to test the functionality of the program. It is also called ‘Behavioral’ testing. The
tester in this case has a set of input values and the respective desired results. On providing input,
if the output matches the desired results, the program is tested ‘ok’, and problematic
otherwise.
In this testing method, the design and structure of the code are not known to the tester, and
testing engineers and end users conduct this test on the software.
● Equivalence class - The input is divided into similar classes. If one element of a
class passes the test, it is assumed that the whole class passes.
● Boundary values - The input is divided into higher and lower end values. If these
values pass the test, it is assumed that all values in between may pass too.
● Cause-effect graphing - In both previous methods, only one input value at a time
is tested. Cause (input) – Effect (output) is a testing technique where
combinations of input values are tested in a systematic way.
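The first two techniques in the list above can be illustrated with a small sketch. The unit under test, is_valid_score, is a hypothetical validator for node scores in [0, 1]; the chosen inputs are one representative per equivalence class plus the values at and just outside each boundary.

```python
def is_valid_score(value):
    """Hypothetical unit under test: a node score must lie in [0, 1]."""
    return 0.0 <= value <= 1.0

# One representative input per equivalence class: below, inside, above.
equivalence_cases = {-5.0: False, 0.5: True, 7.0: False}

# Values at each boundary and just outside it.
boundary_cases = {-0.001: False, 0.0: True, 1.0: True, 1.001: False}

failures = [
    value
    for cases in (equivalence_cases, boundary_cases)
    for value, expected in cases.items()
    if is_valid_score(value) != expected
]
```

An empty failures list means the validator behaved as expected on every class and boundary, without the tester looking at how it is implemented.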
White-box testing
It is conducted to test the program and its implementation, in order to improve code efficiency or
structure. It is also known as ‘Structural’ testing.
In this testing method, the design and structure of the code are known to the tester.
Programmers of the code conduct this test on the code.
● Control-flow testing - The purpose of control-flow testing is to set up test cases
that cover all statements and branch conditions. The branch conditions are
tested for both being true and false, so that all statements can be covered.
● Data-flow testing - This testing technique emphasizes covering all the data
variables included in the program. It tests where the variables were declared
and defined and where they were used or changed.
Testing Levels
Testing itself may be defined at various levels of the SDLC. The testing process runs parallel to
software development. Before moving on to the next stage, a stage is tested, validated and
verified.
Testing separately is done just to make sure that there are no hidden bugs or issues left in the
software. Software is tested at various levels -
Unit Testing
While coding, the programmer performs some tests on that unit of the program to know if it is
error free. Testing is performed under the white-box testing approach. Unit testing helps
developers confirm that individual units of the program are working as per requirements and
are error free.
Integration Testing
Even if the units of software are working fine individually, there is a need to find out whether the
units, when integrated together, would also work without errors, for example in argument passing
and data updating.
System Testing
The software is compiled as a product and then tested as a whole. This can be
accomplished using one or more of the following tests:
● Performance testing - This test shows how efficient the software is. It tests the
effectiveness and average time taken by the software to do the desired task.
Performance testing is done by means of load testing and stress testing, where the
software is put under high user and data load under various environmental
conditions.
● Security & Portability - These tests are done when the software is meant to work
on various platforms and be accessed by a number of persons.
Acceptance Testing
When the software is ready to be handed over to the customer, it has to go through the last phase of
testing, where it is tested for user interaction and response. This is important because even if
the software matches all user requirements, it may be rejected if the user does not like the way
it appears or works.
● Alpha testing - The team of developers themselves perform alpha testing by using
the system as if it were being used in a work environment. They try to find out how a
user would react to some action in the software and how the system should respond
to inputs.
● Beta testing - After the software is tested internally, it is handed over to the users
to use it in their production environment for testing purposes only. This is not
yet the delivered product. Developers expect that users at this stage will surface
minor problems that were overlooked.
Regression Testing
Whenever a software product is updated with new code, feature or functionality, it is tested
thoroughly to detect if there is any negative impact of the added code. This is known as
regression testing.
Testing Documentation
Testing documents are prepared at different stages -
Before Testing
Testing starts with test case generation. The following documents are needed for reference –
● Test Policy document - This describes how far testing should take place before
releasing the product.
● Test Strategy document - This describes the detailed aspects of the test team, the responsibility
matrix, and the rights/responsibilities of the test manager and test engineer.
matrix. These matrices help testers know the source of requirement. They can be
traced forward and backward.
● Test description - This document is a detailed description of all test cases and
procedures to execute them.
● Test case report - This document contains test case report as a result of the test.
● Test logs - This document contains test logs for every test case report.
After Testing
The following documents may be generated after testing:
● Test summary - The test summary is a collective analysis of all test reports and
logs. It summarizes the results and concludes whether the software is ready to be launched. The
software is released under a version control system if it is ready to launch.
4.2. OUTPUT SCREENS
Fig 4.1 shows the home page of the Personal Web Revisitation website. The page offers
the options Home, User, Admin and Sign Up.
Fig 4.2 shows the new user registration form. To gain access to the website, a new user
fills in this form, which consists of the fields name, email, password,
street, state, country, pincode and choose file.
The choose file field in fig 4.2 is used to select a display picture for the
user. It is a .jpg or .png file. The selection of the file is shown in fig 4.3.
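The restriction to .jpg and .png files could also be checked on the server before the picture is stored. The sketch below is an assumption for illustration: the class UploadCheck and the method isAllowedImage are hypothetical names, and the report does not state whether the Registration servlet performs such a check.

```java
import java.util.Locale;

final class UploadCheck {

    // Accepts only file names whose extension is jpg or png (case-insensitive),
    // the two formats named for the display picture.
    static boolean isAllowedImage(String fileName) {
        if (fileName == null) {
            return false;
        }
        String lower = fileName.trim().toLowerCase(Locale.ROOT);
        int dot = lower.lastIndexOf('.');
        if (dot <= 0) {
            return false; // no extension, or nothing before the dot
        }
        String extension = lower.substring(dot + 1);
        return extension.equals("jpg") || extension.equals("png");
    }

    public static void main(String[] args) {
        System.out.println(isAllowedImage("photo.JPG"));
        System.out.println(isAllowedImage("notes.pdf"));
    }
}
```

A check on the file name alone does not prove that the content is an image, so it is best treated as a first filter rather than a complete validation.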
Fig 4.4 Admin Login
Fig 4.4 shows the admin login page. It is a separate login that grants the admin
special privileges. The user name and password fields authenticate
the admin.
Fig 4.5 shows the admin home page, which offers the options home,
make site, view content, user content, user details, search history and logout. The logout
option lets the admin log out securely.
Fig 4.5 Admin Home Page
Fig 4.6 shows how to create a site. The activity for the site is selected
from the options busy or relaxed. A site name is also given, together with
a few keywords with which the file can be accessed in the future.
Fig 4.7 shows the selection of the files. This field is used to select the files that are
displayed as the particular site.
Fig 4.8 Website Details Table
Fig 4.8 shows the website details in a table containing the id, site name, keyword,
updated date and web link. The admin can use this table to check all the
available sites.
Fig 4.9 displays user details such as user id, user name, email, state, country and image. The
admin can use this table to check which users are registered.
Fig 4.9 User Details Table
Fig 4.10 User Search History Table
Fig 4.10 displays the search history of the pages that users have visited before. It
displays the user id, user name, country, file keywords and web links of each user's search
history. This gives all the information about the searches of each user.
Fig 4.11 shows the user login page, where the user enters a user name and password
to log in. This authenticates the user before he/she accesses his/her information.
Fig 4.11 User Login
Fig 4.12 displays the user home page, where the user can access information and
log out when needed. It consists of the tabs search, profile, history, revisitation and
logout.
Fig 4.13 shows a query search using context and content keywords, where the user enters
a query and then clicks the search button.
Fig 4.14 Search Result of a Query
Fig 4.14 displays the search result of the query, which consists of links to previously visited web
pages with location, time and date. Fig 4.15 and fig 4.16 continue the
previous screenshot. The search result is a list with the most probable site first. The
last access time is the time at which the page was most recently accessed, and the result also gives the
location of the last access.
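The ordering of such a result list can be sketched as a sort by descending probability. This is an illustration under stated assumptions: the class SearchRanking, the Result fields and the tie-break on the more recent access are hypothetical and are not taken from the project code.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

final class SearchRanking {

    // Hypothetical result row; the field names are illustrative only.
    static final class Result {
        final String webLink;
        final double probability;
        final LocalDateTime lastAccess;
        final String location;

        Result(String webLink, double probability, LocalDateTime lastAccess, String location) {
            this.webLink = webLink;
            this.probability = probability;
            this.lastAccess = lastAccess;
            this.location = location;
        }
    }

    // Returns a new list with the most probable result first.
    // Assumption: equal probabilities are ordered by the more recent access.
    static List<Result> rank(List<Result> results) {
        List<Result> sorted = new ArrayList<>(results);
        sorted.sort((a, b) -> {
            int byProbability = Double.compare(b.probability, a.probability);
            if (byProbability != 0) {
                return byProbability;
            }
            return b.lastAccess.compareTo(a.lastAccess);
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<Result> results = new ArrayList<>();
        results.add(new Result("a.jsp", 0.4, LocalDateTime.of(2018, 3, 1, 10, 0), "Office"));
        results.add(new Result("b.jsp", 0.9, LocalDateTime.of(2018, 1, 5, 9, 30), "Home"));
        for (Result r : rank(results)) {
            System.out.println(r.webLink + " " + r.probability + " " + r.lastAccess + " " + r.location);
        }
    }
}
```

The input list is copied before sorting, so the caller's list keeps its original order.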
Fig 4.15 Search Result of a Query
Fig 4.17 Accessing a Website
Once all the search results are displayed, the user can access a particular site just by clicking it.
Fig 4.17 is an example. The content of the website is displayed as shown.
Fig 4.18 shows how a search is performed using the search bar. After typing the query,
press Enter to get the results.
The page displayed after pressing Enter is shown in fig 4.19. This search also gives the last
access time and location of all the sites.
Fig 4.20 displays the user profile, which consists of the details the user gave during
registration. The details include name, state and location. It also has a ‘back’ tab that
returns to the previous page.
Fig 4.20 User Profile
Fig 4.21 shows the search history of the user. The search history table contains the
information searched by the user. It also gives the country from which the search was performed
and the keywords that were used in the query. The last column contains the
web link used to perform the search.
Fig 4.23 displays the search history by context and content keywords, where the user can
search for previously accessed pages.
Fig 4.23 Search History by using Context and Content Keyword
Fig 4.23 shows the search history tab, with which the user can search the access history
for pages that contain the query. Fig 4.24 is the result of the search. It gives a table
containing the country of access, file keywords and the web link.
5. CONCLUSION
Drawing on the characteristics of human brain memory in organizing and exploiting episodic
events and semantic words in information recall, this project presents a personal web
revisitation technique based on context and content keywords. Context instances and page
content are respectively organized as probabilistic term lists.
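A probabilistic term list can be pictured as a mapping from each term to a probability. The sketch below is only an illustration: the class TermList is hypothetical, and scoring a query as the product of its term probabilities is an assumption made here, not the scoring formula of the project.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TermList {

    // Each term is stored with a probability between 0 and 1.
    private final Map<String, Double> probabilityByTerm = new LinkedHashMap<>();

    void put(String term, double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        probabilityByTerm.put(term, probability);
    }

    // Assumed scoring rule: product of the probabilities of all query terms.
    // A term that is not in the list contributes 0, and an empty query scores 0.
    double score(List<String> queryTerms) {
        if (queryTerms.isEmpty()) {
            return 0.0;
        }
        double score = 1.0;
        for (String term : queryTerms) {
            score *= probabilityByTerm.getOrDefault(term, 0.0);
        }
        return score;
    }

    public static void main(String[] args) {
        TermList pageTerms = new TermList();
        pageTerms.put("java", 0.8);
        pageTerms.put("servlet", 0.5);
        System.out.println(pageTerms.score(Arrays.asList("java", "servlet")));
    }
}
```

One such list could hold content keywords of a page and another could hold context terms such as a location or an activity, with both scores combined when results are ordered.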
When a user re-finds a page, s/he usually has a certain purpose in mind, such as preparing a
project proposal or writing code. WebPagePrev strives to help users re-find what
they accessed through the previous access time, location, concurrent activities, and content
keywords.
FUTURE SCOPE
Beyond that, more user-centric context factors (e.g., access purpose, expertise, background,
interest, etc.), as well as social context factors (e.g., external events, surrounding people, etc.),
could be inferred from the user’s profile, agenda, and external service providers, and bound
to the accessed pages. In this way, not only could the user him/herself benefit from such
rich contextual cues during the re-finding process, but other users with a similar access
purpose and background could also share the more directed page access.
REFERENCES
RESEARCH PROJECTS
[1] L. Jin, G. Liu, C. Wang, and L. Feng. Personal web revisitation by context and content
keywords with relevance feedback. IEEE Transactions on Knowledge and Data Engineering,
29(7), 2017.
[2] L. Tauscher and S. Greenberg. How people revisit web pages: empirical findings and
implications for the design of history systems. International Journal of Human Computer
Studies, 47(1):97–137, 1997.
[4] M. Mayer. Web history tools and revisitation support: a survey of existing approaches and
directions. Foundations and Trends in HCI, 2(3):173–278, 2009.
[5] L. C. Wiggs, J. Weisberg, and A. Martin. Neural correlates of semantic and episodic
memory retrieval. Neuropsychologia, pages 103–118, 1999.
[8] C. E. Kulkarni, S. Raju, and R. Udupa. Memento: unifying content and context to aid
webpage re-visitation. In UIST, pages 435–436, 2010.
[10] T. Deng, L. Zhao, H. Wang, Q. Liu, and L. Feng. Refinder: a context-based information
re-finding system. IEEE TKDE, 25(9):2119–2132, 2013.
[11] T. Deng, L. Zhao, and L. Feng. Enhancing web revisitation by contextual keywords. In
ICWE, pages 323–337, 2013.
[12] H. Takano and T. Winograd. Dynamic bookmarks for the WWW. In HYPERTEXT,
pages 297–298, 1998.
[13] S. Kaasten and S. Greenberg. Integrating back, history and bookmarks in web browsers.
In HCI, pages 379–380, 2001.
[15] R. Kawase, G. Papadakis, E. Herder, and W. Nejdl. Beyond the usual suspects:
context-aware revisitation support. In HT, pages 27–36, 2011.
[16] D. Morris, M. R. Morris, and G. Venolia. Searchbar: a search-centric web history for task
resumption and information re-finding. In CHI, pages 1207–1216, 2008.
[19] S. S. Won, J. Jin, and J. I. Hong. Contextual web history: using visual and contextual
cues to improve web browser history. In CHI, pages 1457–1466, 2009.
[20] T. V. Do and R. A. Ruddle. The design of a visual history tool to help users refind
information within a website. In ECIR, pages 459–462, 2012.
[21] F. Rizzo, F. Daniel, M. Matera, S. Albertario, and A. Nibioli. Evaluating the semantic
memory of web interactions in the xmem project. In AVI, pages 185–192, 2006.
[23] S. Tyler and J. Teevan. Large scale query log analysis of re-finding. In WSDM, pages
191–200, 2010.
[24] J. Teevan. The re:search engine: simultaneous support for finding and re-finding. In
UIST, pages 23–32, 2007.
[25] E. Adar, J. Teevan, and S. T. Dumais. Large scale analysis of web revisitation patterns.
In CHI, pages 1197–1206, 2008.
[28] S. Dumais, E. Cutrell, J. Cadiz, G. Jancke, R. Sarin, and D. C. Robbins. Stuff i’ve seen:
a system for personal information retrieval and re-use. In SIGIR, 2003.
[29] R. Sorabji. Aristotle on memory. University of Chicago Press, 2nd edition, 2006.
[30] H. C. Ellis and R. R. Hunt. Fundamentals of human memory and cognition. William C.
Brown, 3rd edition, 1983.
[31] R. Durrett. Probability: theory and examples. Cambridge University Press, 4th edition,
2010.
[32] L. Guo, F. Shao, C. Botev, and J. Shanmugasundaram. XRANK: ranked keyword
search over xml documents. In SIGMOD, pages 16–27, 2003.
[33] J. Li, C. Liu, R. Zhou, and W. Wang. Top-k keyword search over probabilistic xml data.
In ICDE, pages 673–684, 2011.
[34] H. Georgiadis and V. Vassalos. Improving the efficiency of xpath execution on relational
systems. In EDBT, pages 570–587, 2006.
[36] I. Ruthven and M. Lalmas. A survey on the use of relevance feedback for information
access systems. Knowledge Engineering Review, 18(2):95–145, 2003.
APPENDIX
//User signup
//reg.jsp
<!DOCTYPE html>
<html>
<%@page contentType="text/html" pageEncoding="UTF-8"%>
<head>
<title>Personal Web Revisitation</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta charset="utf-8">
<meta name="keywords" content="Megacorp a Responsive web template, Bootstrap Web
Templates, Flat Web Templates, Android Compatible web template,
Smartphone Compatible web template, free webdesigns for Nokia, Samsung, LG,
SonyEricsson, Motorola web design" />
<script>
addEventListener("load", function () {
setTimeout(hideURLbar, 0);
}, false);
function hideURLbar() {
window.scrollTo(0, 1);
}
</script>
<link href="css/bootstrap.css" rel='stylesheet' type='text/css' /> <!-- setting the style of the page -->
<link href="css/easy-responsive-tabs.css" rel='stylesheet' type='text/css' />
<link href="css/style.css" rel='stylesheet' type='text/css' />
<link href="css/mail.css" rel="stylesheet" type='text/css' media="all" />
<link href="css/font-awesome.css" rel="stylesheet">
<link
href="//fonts.googleapis.com/css?family=Roboto+Mono:300,300i,400,400i,500,500i,700"
rel="stylesheet">
</head>
<body>
<!--Header-->
<div class="header" id="home">
<!--top-bar-->
<div class="top-bar_w3agileits">
<div class="header-nav">
<div class="inner-nav_wthree_agileits"> <!-- setting the toggle for better mobile display -->
<nav class="navbar navbar-default">
<!-- Brand and toggle get grouped for better mobile display -->
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
</div>
<!-- Collect the nav links, forms, and other content for toggling -->
<div class="collapse navbar-collapse nav-wil" id="bs-example-navbar-collapse-1">
<nav>
<ul class="nav navbar-nav">
<li><a href="index.jsp">Home</a></li> <!-- navigation bar for the user sign up page -->
<li><a href="user.jsp">User</a></li>
<li><a href="admin.jsp">Admin</a></li>
<li><a href="reg.jsp" class="active">Sign Up</a></li>
</ul>
</nav>
</div>
</nav>
<!-- <div class="search">
<div class="cd-main-header">
<ul class="cd-header-buttons">
<li><a class="cd-search-trigger" href="#cd-search"> <span></span></a></li>
</ul>
cd-header-buttons
</div>
<div id="cd-search" class="cd-search">
<form action="#" method="post">
<input name="Search" type="search" placeholder="Click enter after typing...">
</form>
</div>
</div>-->
<div class="clearfix"></div>
</div>
</div>
</div>
</div>
<!--//Header-->
<!--/inner_banner-->
<div class="inner_banner">
</div>
<!--//inner_banner-->
<!--/short-->
<div class="services-breadcrumb"> <!-- breadcrumb shown above the user signup fields -->
<div class="inner_breadcrumb">
<ul class="short">
<li><a href="user.jsp">Login</a><span>|</span></li>
<li>Registration</li>
</ul>
</div>
</div>
<!--//short-->
<!-- /inner_content -->
<div class="banner_bottom">
<div class="container">
<div class="mail_form">
<h3 class="tittle mail">New User <span>Registration </span></h3> <!-- title for new user registration -->
<div class="inner_sec_info_wthree_agile">
<form method="post" action="Registration" enctype="multipart/form-data">
<center>
<br>
<span class="input input--chisato"> <!-- code for name field -->
<input class="input__field input__field--chisato" name="Name" type="text" id="input-13" placeholder="Name" required="" />
<label class="input__label input__label--chisato" for="input-13">
<span class="input__label-content input__label-content--chisato" data-content=""></span>
</label>
</span>
<span class="input input--chisato"> <!-- code for email field -->
<input class="input__field input__field--chisato" name="Email" type="email" id="input-14" placeholder="Email " required="" />
<label class="input__label input__label--chisato" for="input-14">
<span class="input__label-content input__label-content--chisato" data-content=""></span>
</label>
</span>
<br>
<span class="input input--chisato"> <!-- code for password field -->
<input class="input__field input__field--chisato" name="Password" type="password" id="input-15" placeholder="Password " required="" />
<label class="input__label input__label--chisato" for="input-15">
<span class="input__label-content input__label-content--chisato" data-content="Password"></span>
</label>
</span>
<span class="input input--chisato"> <!-- code for street field; unique id so the label points to this input -->
<input class="input__field input__field--chisato" name="Street" type="text" id="input-16" placeholder="Street " required="" />
<label class="input__label input__label--chisato" for="input-16">
<span class="input__label-content input__label-content--chisato" data-content="Street"></span>
</label>
</span>
<br>
<span class="input input--chisato"> <!-- code for state field; unique id so the label points to this input -->
<input class="input__field input__field--chisato" name="State" type="text" id="input-17" placeholder="State " required="" />
<label class="input__label input__label--chisato" for="input-17">
<span class="input__label-content input__label-content--chisato" data-content="State"></span>
</label>
</span>
<span class="input input--chisato"> <!-- code for country field; unique id so the label points to this input -->
<input class="input__field input__field--chisato" name="Country" type="text" id="input-18" placeholder="Country " required="" />
<label class="input__label input__label--chisato" for="input-18">
<span class="input__label-content input__label-content--chisato" data-content="Country"></span>
</label>
</span>
<br>
<span class="input input--chisato"> <!-- code for pincode field; unique id so the label points to this input -->
<input class="input__field input__field--chisato" name="Pincode" type="text" id="input-19" placeholder="Pincode " required="" />
<label class="input__label input__label--chisato" for="input-19">
<span class="input__label-content input__label-content--chisato" data-content="Pincode"></span>
</label>
</span>
<span class="input input--chisato"> <!-- code for uploading a photo; unique id so the label points to this input -->
<input class="input__field input__field--chisato" name="Photo" type="file" id="input-20" placeholder="Photo " required="" />
<label class="input__label input__label--chisato" for="input-20">
<span class="input__label-content input__label-content--chisato" data-content="Photo"></span>
</label>
</span>
<br>
<input type="submit" value="Submit"> <!-- code for submitting the form -->
</center>
</form>
</div>
</div>
<div class="clearfix"> </div>
</div> <!-- //container -->
</div> <!-- //banner_bottom -->
<!--footer -->
<!-- //footer -->
<!-- copyright -->
<div class="copyright">
<div class="container">
<div class="copyrighttop">
<ul>
<li>
<h4>Follow us on:</h4> <!-- code for the footer bar -->
</li>
<li><a class="facebook" href="#"><i class="fa fa-facebook" aria-hidden="true"></i></a></li>
<li><a class="facebook" href="#"><i class="fa fa-twitter" aria-hidden="true"></i></a></li>
<li><a class="facebook" href="#"><i class="fa fa-google-plus" aria-hidden="true"></i></a></li>
<li><a class="facebook" href="#"><i class="fa fa-linkedin" aria-hidden="true"></i></a></li>
</ul>
</div>
<div class="copyrightbottom">
<!--<p>© 2018 | Design By <a href="#">Jp</a></p>-->
</div>
<div class="clearfix"></div>
</div>
</div>
<!-- //copyright -->
<!-- js -->
<script type="text/javascript" src="js/jquery-2.2.3.min.js"></script>
<!-- //js -->
<!--search-bar-->
<script src="js/main.js"></script>
<!--//search-bar-->
<script>
$('ul.dropdown-menu li').hover(function () {
$(this).find('.dropdown-menu').stop(true, true).delay(200).fadeIn(500);
}, function () {
$(this).find('.dropdown-menu').stop(true, true).delay(200).fadeOut(500);
});
</script>
<!-- start-smoth-scrolling -->
<script type="text/javascript" src="js/move-top.js"></script>
<script type="text/javascript" src="js/easing.js"></script>
<script type="text/javascript">
jQuery(document).ready(function ($) {
$(".scroll").click(function (event) {
event.preventDefault();
$('html,body').animate({
scrollTop: $(this.hash).offset().top
}, 900);
});
});
</script>
<!-- start-smoth-scrolling -->
<script type="text/javascript">
$(document).ready(function () {
/*
var defaults = { //to move to the top of the screen
containerID: 'toTop', // fading element id
containerHoverID: 'toTopHover', // fading element hover id
scrollSpeed: 1200,
easingType: 'linear'
};
*/
$().UItoTop({
easingType: 'easeOutQuart'
});
});
</script>
<a href="#" id="toTop" style="display: block;"> <span id="toTopHover" style="opacity: 1;"> </span></a>
<script type="text/javascript" src="js/bootstrap-3.1.1.min.js"></script>
</body>
</html>
//User search history
<script>
addEventListener("load", function () {
setTimeout(hideURLbar, 0);
}, false);
function hideURLbar() {
window.scrollTo(0, 1);
}
</script> <!-- setting the style of the page -->
<link href="css/bootstrap.css" rel='stylesheet' type='text/css' />
<link href="css/easy-responsive-tabs.css" rel='stylesheet' type='text/css' />
<link href="css/style.css" rel='stylesheet' type='text/css' />
<link href="css/mail.css" rel="stylesheet" type='text/css' media="all" />
<link href="css/font-awesome.css" rel="stylesheet">
<link
href="//fonts.googleapis.com/css?family=Roboto+Mono:300,300i,400,400i,500,500i,700"
rel="stylesheet">
</head>
<body>
<!--Header-->
<div class="header" id="home">
<!--top-bar-->
<div class="top-bar_w3agileits">
<!--//short-->
<!-- /inner_content -->
<div class="banner_bottom">
<div class="container">
<div class="mail_form"> <!-- a table to display user search history and its alignment -->
<h3 class="tittle mail">User Search History</h3>
<!-- start body--->
<br><table border="2" style="text-align: center; margin-left: 40px; border-color: black">
<tr>
<th style="text-align: center;width: 100px; font-size: 16px; color: black">User ID</th>
<th style="text-align: center;width: 100px; font-size: 16px; color: black">User Name</th>
<th style="text-align: center;width: 200px; font-size: 16px; color: black">Country</th>
<th style="text-align: center;width: 200px; font-size: 16px; color: black">File Keywords</th>
<th style="text-align: center;width: 300px; font-size: 16px; color: black">Web Link</th>
</tr>
<tr>
<%
</tr>
<%
}
} catch (Exception ex) {
ex.printStackTrace();
}
%>
</table>
<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>
<br>
<!---End body---->
<li><a class="facebook" href="#"><i class="fa fa-twitter" aria-hidden="true"></i></a></li>
<li><a class="facebook" href="#"><i class="fa fa-google-plus" aria-hidden="true"></i></a></li>
<li><a class="facebook" href="#"><i class="fa fa-linkedin" aria-hidden="true"></i></a></li>
<!-- //copyright -->
<!-- js -->
<script type="text/javascript" src="js/jquery-2.2.3.min.js"></script>
<!-- //js -->
<!--search-bar-->
<script src="js/main.js"></script>
<!--//search-bar-->
<script>
$('ul.dropdown-menu li').hover(function () {
$(this).find('.dropdown-menu').stop(true, true).delay(200).fadeIn(500);
}, function () {
$(this).find('.dropdown-menu').stop(true, true).delay(200).fadeOut(500);
});
</script>
<!-- start-smoth-scrolling -->
<script type="text/javascript" src="js/move-top.js"></script> <!-- scroll bar for accessing the whole table -->
<script type="text/javascript" src="js/easing.js"></script>
<script type="text/javascript">
jQuery(document).ready(function ($) {
$(".scroll").click(function (event) {
event.preventDefault();
$('html,body').animate({
scrollTop: $(this.hash).offset().top
}, 900);
});
});
</script>
<!-- start-smoth-scrolling -->
<script type="text/javascript">
$(document).ready(function () {
/*
var defaults = {
containerID: 'toTop', // fading element id
containerHoverID: 'toTopHover', // fading element hover id
scrollSpeed: 1200,
easingType: 'linear'
};
*/
$().UItoTop({
easingType: 'easeOutQuart'
});
});
</script>
<a href="#" id="toTop" style="display: block;"> <span id="toTopHover" style="opacity: 1;"> </span></a>
<script type="text/javascript" src="js/bootstrap-3.1.1.min.js"></script>
</body>
</html>