
88 (IJCNS) International Journal of Computer and Network Security,

Vol. 2, No. 10, 2010

Veracity Finding From Information Provided on the Web
D. Vijayakumar¹, B. Srinivasarao², M. Ananda Ranjit Kumar³

¹ ² JNTU University, P.V.P.S.I.T, C.S.E Dept., Vijayawada, A.P., India
¹ dvk_mathematics@yahoo.com, ² buragasrinivasarao@gmail.com

³ Asst. Professor in C.S.E Dept., L.B.R. College of Engineering, Vijayawada, A.P., India
anandranjit@gmail.com

Abstract: The quality of information on the web has always been a major concern for internet users. In the machine learning approach, the quality of a web page is defined by human preference. Two approaches, PageRank and Authority-Hub analysis, are used to find pages with high authority. Unfortunately, the popularity of a web page does not necessarily lead to accuracy of information. In this paper, we propose “TruthFinder”, which utilizes the relationships between web sites and their information, i.e., a web site is trustworthy if it provides many pieces of true information, and a piece of information is likely to be true if it is provided by many trustworthy web sites. Our experiments show that TruthFinder is more successful than PageRank and Authority-Hub analysis at finding high-quality information, and that it identifies trustworthy web sites better than popular search engines.

Keywords: PageRank, hub analysis, trustworthy.

1. Introduction

The World Wide Web has become a necessary part of our lives and may have become the most important information source for most people. Every day, people retrieve all kinds of information from the Web. For example, when shopping online, people find product specifications on websites like Amazon.com or ShopZilla.com. When looking for interesting DVDs, they get information from websites such as NetFlix.com. Unfortunately, the popularity of web pages does not necessarily lead to accuracy of information. Two observations are made in our experiments: 1) even the most popular websites may contain many errors, whereas some comparatively less popular websites may provide more accurate information; 2) more accurate information can be inferred by using many different websites instead of relying on a single website.

2. Problem Definition

In this paper, we propose a new problem called the Veracity problem, which is formulated as follows: given a large amount of conflicting information about many objects, which is provided by multiple websites, how can we discover the true fact about each object? We use the word “fact” to represent something that is claimed as a fact by some website, and such a fact can be either true or false. In this paper, we only study facts that are either properties of objects or relationships between two objects. We also require that the facts can be parsed from the web pages. There are often conflicting facts on the Web, such as different sets of authors for a book. There are also many websites, some of which are more trustworthy than others. A fact is likely to be true if it is provided by trustworthy websites. A website is trustworthy if most facts it provides are true. We therefore compute both iteratively: at each iteration, the probabilities of facts being true and the trustworthiness of websites are inferred from each other. This iterative procedure is rather different from Authority-Hub analysis. First, the trustworthiness of a website depends on the accuracy of the facts it provides, not on how many facts it provides. Thus, we cannot compute the trustworthiness of a website by adding up the weights of its facts, as is done in Authority-Hub analysis, nor can we compute the probability of a fact being true by adding up the trustworthiness of the websites providing it; instead, we have to resort to probabilistic computation. Second, and more importantly, different facts influence each other. For example, if a website says that a book is written by “Jessamyn Wendell” and another says “Jessamyn Burns Wendell,” then these two websites actually support each other although they provide slightly different facts. We incorporate such influences between facts into our computational model.

In summary, we make three major contributions in this paper. First, we formulate the Veracity problem of how to discover true facts from conflicting information. Second, we propose a framework to solve this problem by defining the trustworthiness of websites, the confidence of facts, and the influences between facts. Finally, we propose an algorithm called TruthFinder for identifying true facts using iterative methods. Our experiments show that TruthFinder achieves very high accuracy in discovering true facts, and that it can select more trustworthy websites than authority-based search engines such as Google.

3. Basic Definitions

Confidence of facts: The confidence of a fact f is the probability of f being correct, according to the best of our knowledge, and is denoted by s(f).

Trustworthiness of websites: The trustworthiness of a website w is the expected confidence of the facts provided by w and is denoted by t(w).

Our Problem Setting

• Each object has a set of conflicting facts
  • E.g., different author names for a book
• And each web site provides some facts
• How to find the true fact for each object?

Figure 1. Input of TruthFinder (web sites, e.g., w1, and the facts they provide, e.g., f1)

3.1 Trustworthiness of the Web

The trustworthiness problem of the web: according to a survey on the credibility of web sites,

• 54% of Internet users trust news web sites most of the time
• 26% trust web sites that sell products
• 12% trust blogs

Given a large amount of conflicting information about many objects, provided by multiple web sites, how can we discover the true fact about each object? Different websites often provide conflicting information on a subject, e.g., the authors of “Rapid Contextual Design” (Table 1).

Table 1: Conflicting Information about Book Authors

Online Store         Authors
Powell’s books       Holtzblatt, Karen
Barnes & Noble       Karen Holtzblatt, Jessamyn Wendell, Shelley Wood
A1 Books             Karen Holtzblatt, Jessamyn Burns Wendell, Shelley Wood
Cornwall books       Holtzblatt-Karen, Wendell-Jessamyn Burns, Wood
Mellon’s books       Wendell, Jessamyn
Lakeside books       Wendell, Jessamynholtzblatt, Karenwood, Shelley
Blackwell online     Wendell, Jessamyn, Holtzblatt, Karen, Wood, Shelley

4.3 Basic Heuristics for Problem Solving

There is usually only one true fact for a property of an object. This true fact appears to be the same or similar on different web sites, e.g., “Jennifer Widom” vs. “J. Widom”. The false facts on different web sites are less likely to be the same or similar, because false facts are often introduced by random factors. A web site that provides mostly true facts for many objects will likely provide true facts for other objects.

4.4 Overview of Our Method

3.3.1 Confidence of facts ↔ Trustworthiness of web sites

A fact has high confidence if it is provided by (many) trustworthy web sites. A web site is trustworthy if it provides many facts with high confidence.

Our method, TruthFinder: initially, each web site is equally trustworthy. Based on the above four heuristics, we infer fact confidence from web site trustworthiness, and then infer web site trustworthiness back from fact confidence. We repeat this until a stable state is reached.

Figure 2. Facts ↔ Authorities, Web sites ↔ Hubs (web sites with high trustworthiness as hubs, facts with high confidence as authorities)

3.3.2 Difference from authority-hub analysis

Linear summation cannot be used: a web site is trustworthy if it provides accurate facts, not merely many facts. Confidence is the probability of being true. Different facts of the same object influence each other.

Figure 3. Context Diagram (User, TruthFinder System, User)

4. Modules

Collection data: First, we collect the specific data about an object from different websites. The collected data is stored in a related database: a table is created for the specific object, and the facts about that object are stored in it.

Data search: This module searches for the related data link according to the user's input. In this module, the user retrieves the specific data about an object. The user can search the data in three ways: 1. Normal search, 2. Page rank search, 3. Truth finder search.

Truth finder search: We design a general framework for the Veracity problem and invent an algorithm called

TruthFinder. It utilizes the relationships between web sites and their information: a web site is trustworthy if it provides many pieces of true information, and a piece of information is likely to be true if it is provided by many trustworthy web sites.

Result calculation: For each response to the query, we calculate the performance. Using the calculated count, we find the best link and show it as the output.

Figure 4. System Architecture (Truth Discovery with Multiple Conflicting Information; components: Home, Login, Login Validation, Query Process, Search Engine, Conflicting Web Pages, TruthFinder, TruthFinder Web Pages, Output)

5. Computational Model

A website is trustworthy if it provides facts with high confidence. We can see that website trustworthiness and fact confidence are determined by each other, so we can use an iterative method to compute both. Because true facts are more consistent than false facts, we introduce the following model of iterative computation.

5.1 Computation Model (1): t(w) and s(f)

We compute the trustworthiness of a web site w, t(w), as the average confidence of the facts it provides:

$$ t(w) = \frac{\sum_{f \in F(w)} s(f)}{|F(w)|} $$

where the numerator is the sum of fact confidence and F(w) is the set of facts provided by w.

We compute the confidence of a fact f, s(f), as one minus the probability that all web sites providing f are wrong:

$$ s(f) = 1 - \prod_{w \in W(f)} \bigl(1 - t(w)\bigr) $$

where 1 - t(w) is the probability that w is wrong and W(f) is the set of websites providing f.

5.2 Computation Model (2)

Influence between related facts. Example: for a certain book B,
w1: B is written by “Jennifer Widom” (fact f1)
w2: B is written by “J. Widom” (fact f2)
f1 and f2 support each other. If several other trustworthy web sites say this book is written by “Jeffrey Ullman”, then f1 and f2 are likely to be wrong.

5.3 Computation Model (3)

A user may provide an “influence function” between related facts (e.g., f1 and f2), e.g., the similarity between people's names. The confidence of related facts is adjusted according to the influence function.

Figure 5. Computing confidence of a fact (web sites w1, w2, w3 with trustworthiness t(w1), t(w2), t(w3) provide facts f1, f2 about object o1, with confidence s(f1), s(f2))

Experiments: Finding Truth of Facts

Determining authors of books
• Dataset contains 1265 books listed on abebooks.com
• We analyze 100 random books (using book images)
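The iteration of Section 5 can be sketched in Python as follows. This is a minimal sketch, assuming facts are opaque hashable labels and that every website starts with the same hypothetical trustworthiness of 0.9 (the text only says sites are initially equally trustworthy). It alternates the two formulas of Section 5.1 until no t(w) changes by more than a tolerance; the function name `truthfinder`, its parameters and the stopping rule are illustrative assumptions, not the paper's implementation.

```python
from math import prod


def truthfinder(claims, initial_trust=0.9, max_iter=100, tol=1e-6):
    """Alternate between fact confidence s(f) and website trustworthiness t(w).

    claims maps each website w to the set of facts F(w) it provides.
    initial_trust, max_iter and tol are illustrative (hypothetical) choices.
    """
    # Invert F(w) to get W(f), the set of websites providing each fact.
    providers = {}
    for w, facts in claims.items():
        for f in facts:
            providers.setdefault(f, set()).add(w)

    # Initially, every website is equally trustworthy.
    trust = {w: initial_trust for w in claims}
    conf = {}
    for _ in range(max_iter):
        # s(f) = 1 - product over w in W(f) of (1 - t(w))
        conf = {
            f: 1 - prod(1 - trust[w] for w in ws)
            for f, ws in providers.items()
        }
        # t(w) = average of s(f) over the facts F(w) that w provides;
        # a website with no facts keeps its previous value.
        new_trust = {
            w: sum(conf[f] for f in facts) / len(facts) if facts else trust[w]
            for w, facts in claims.items()
        }
        # Stop once no trustworthiness value moves by more than tol.
        delta = max((abs(new_trust[w] - trust[w]) for w in claims), default=0.0)
        trust = new_trust
        if delta < tol:
            break
    return trust, conf


example_claims = {
    "w1": {"f1", "f2"},  # w1 provides two facts
    "w2": {"f1"},        # w2 agrees with w1 on f1
    "w3": {"f3"},        # w3 provides a fact nobody else confirms
}
trust, conf = truthfinder(example_claims)
print(trust)
print(conf)
```

With these two formulas alone, s(f) is never smaller than t(w) for any website w that provides f, so trustworthiness can only rise from one iteration to the next, and a site providing wrong facts never loses trust. The influence between related facts described in Sections 5.2 and 5.3 is what lets competing facts about the same object count against each other.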

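Section 5.3 leaves the influence function to the user. The Python sketch below shows one hypothetical choice: two author names count as similar when their last tokens match (a crude stand-in for the "similarity between people's names"), and each fact's s(f) is shifted by a weighted sum of the other facts' confidences, upward for similar facts and downward for dissimilar ones. The functions `surname_similarity` and `adjust_confidence`, the weight `rho`, the neutral similarity `base_sim` and the confidence values are all assumptions for illustration, not the paper's formula.

```python
def surname_similarity(a, b):
    """Hypothetical influence measure: 1.0 if the last name tokens match, else 0.0.

    Assumes names are written "First Last", e.g. "J. Widom".
    """
    return 1.0 if a.split()[-1].lower() == b.split()[-1].lower() else 0.0


def adjust_confidence(conf, similarity=surname_similarity, rho=0.3, base_sim=0.5):
    """Adjust the confidence s(f) of competing facts about ONE object.

    conf maps each fact to its current s(f). rho (influence weight) and
    base_sim (similarity treated as neutral) are hypothetical parameters.
    """
    adjusted = {}
    for f, s in conf.items():
        # Facts more similar than base_sim support f; less similar ones count against it.
        influence = sum(
            s_other * (similarity(other, f) - base_sim)
            for other, s_other in conf.items()
            if other != f
        )
        # Clamp so the adjusted value stays a valid probability.
        adjusted[f] = min(1.0, max(0.0, s + rho * influence))
    return adjusted


# Book B from Section 5.2, with made-up confidence values.
book_b = {"Jennifer Widom": 0.6, "J. Widom": 0.5, "Jeffrey Ullman": 0.9}
with_support = adjust_confidence(book_b)
without_support = adjust_confidence({"Jennifer Widom": 0.6, "Jeffrey Ullman": 0.9})
print(with_support)
print(without_support)
```

Adding “J. Widom” raises the adjusted confidence of “Jennifer Widom” compared with the run without it, while the high-confidence “Jeffrey Ullman” fact pulls both Widom facts below their starting values, mirroring the example in Section 5.2. In a full run, an adjustment like this could be applied to each object's facts between the s(f) and t(w) steps of the iteration.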
Table 2: Comparison of the Results of Voting, TruthFinder, and Barnes & Noble

Case                         Voting   TruthFinder   Barnes & Noble
Correct                      71       85            64
Miss author(s)               12       2             4
Incomplete names             18       5             6
Wrong first/middle names     1        1             3
Has redundant names          0        2             23
Add incorrect names          1        5             5
No information               0        0             2

Experiments: Trustable Info Providers

Finding trustworthy information sources: the most trustworthy bookstores found by TruthFinder vs. the top-ranked bookstores by Google (query “bookstore”).

Table 3: Comparison of the Accuracies of Top Bookstores by TruthFinder and by Google

TruthFinder
Bookstore             Trustworthiness   #Books   Accuracy
TheSaintBookstore     0.971             28       0.959
MildredsBooks         0.969             10       1.0
Alphacraze.com        0.968             13       0.947

Google
Bookstore             Google rank       #Books   Accuracy
Barnes & Noble        1                 97       0.865
Powell’s books        3                 42       0.654

6. Conclusion

In this paper, we introduce and formulate the Veracity problem, which aims at resolving conflicting facts from multiple websites and finding the true facts among them. We propose TruthFinder, an approach that utilizes the interdependency between website trustworthiness and fact confidence to find trustable websites and true facts. Experiments show that TruthFinder achieves high accuracy at finding true facts and at the same time identifies websites that provide more accurate information.

Author Profile

D. Vijayakumar received the B.Sc. degree from ANU University in 2002 and the M.Sc. degree in mathematics from ANU University in 2004. He is pursuing an M.Tech in Computer Science & Engineering at P.V.P.S.I.T, Vijayawada, Andhra Pradesh, India. He is currently working as a lecturer at Sri Viveka Institute of Technology,
