
Truth Discovery with Multiple Conflict Information

Dept of CSE

A Project report on

TRUTH DISCOVERY WITH MULTIPLE CONFLICT INFORMATION


Mini Project report submitted in partial fulfillment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE & ENGINEERING
by
1) SASIKANTH.DRONAVALLI (09NF1A0513)
2) APPARAO.ETCHERLA (09NF1A0514)

Under the Esteemed Guidance of Dr.UMA DEVI


Assoc. Prof of the C.S.E Department

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


UNIVERSAL COLLEGE OF ENGINEERING AND TECHNOLOGY
(Approved by A.I.C.T.E, Affiliated to J.N.T.U, Kakinada)

DOKIPARRU, PERECHERLA, GUNTUR-522438, AP (2009-2013)


Universal College of Engineering & Technology Page 1


UNIVERSAL COLLEGE OF ENGINEERING & TECHNOLOGY


(Affiliated to JNTU, Kakinada & Approved by AICTE, New Delhi) DOKIPARRU (P), PERICHARLA (M), GUNTUR-522438.

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

CERTIFICATE
This is to certify that the project report entitled TRUTH DISCOVERY WITH MULTIPLE CONFLICT INFORMATION, being carried out by Mr. D.Sasikanth and Mr. E.Apparao in partial fulfillment for the award of the degree of Bachelor of Technology in COMPUTER SCIENCE & ENGINEERING to Jawaharlal Nehru Technological University, is a record of bona fide work carried out by them under my guidance and supervision. The results embodied in this project have not been submitted to any other University or Institute for the award of any Degree or Diploma.

Guide Name Dr.UMA DEVI Asst.prof of C.S.E

Head of the department Mr.Srinivasulu,M.Tech DeptProf of C.S.E Dept

External


ACKNOWLEDGEMENT
I feel immense pleasure in expressing my sincere thanks and profound sense of gratitude to all those who played a valuable role in the successful completion of my project through their invaluable suggestions and advice.

I am thankful to our principal, Dr.S.SREENADH REDDY GARU, for permitting and encouraging me to do this project.

I am very thankful to the secretary and correspondent of our college, Fr.L.LOURDHU REDDY GARU, for his encouragement and motivation to complete my project.

I am deeply indebted to Mr.V.SRINIVASULU GARU, Prof. & HOD of the C.S.E Department, whose motivation and constant encouragement led me to pursue a project in the field of software development.

I am very much obliged and thankful to my internal guide, Dr.UMA DEVI GARU, Asst. Prof of the C.S.E Department, for providing this opportunity and for constant encouragement during the course. I am grateful for the valuable guidance and suggestions I received during my project.

My parents have put me ahead of themselves. Because of their hard work and dedication, I have had opportunities beyond my wildest dreams. My heartfelt thanks to them for giving me all I ever needed to be a successful student and individual.

Finally, I express my thanks to all my other professors, classmates, friends, neighbors and family members who helped me complete my project, and without whose infinite love and patience this would never have been possible.


ABSTRACT

The World Wide Web has become the most important information source for most of us. Unfortunately, there is no guarantee for the correctness of information on the web. Moreover, different web sites often provide conflicting information on a subject, such as different specifications for the same product. In this paper we propose a new problem called Veracity, that is, conformity to truth, which studies how to find true facts from a large amount of conflicting information on many subjects that is provided by various web sites. We design a general framework for the Veracity problem and invent an algorithm called Truth Finder, which utilizes the relationships between web sites and their information: a web site is trustworthy if it provides many pieces of true information, and a piece of information is likely to be true if it is provided by many trustworthy web sites. Our experiments show that Truth Finder successfully finds true facts among conflicting information, and identifies trustworthy web sites better than the popular search engines.


TABLE OF CONTENTS

1. Introduction
2. Literature Survey
3. Feasibility Study
4. System Specifications
   4.1 Hardware Requirements
   4.2 Software Requirements
5. Language Specifications
   5.1 Overview of HTML
   5.2 ****************
   5.3 ****************
   5.4 ****************
   5.5 ****************
   5.6 ****************
6. Software Design
   6.1 Design of Database
   6.2 UML Diagrams
7. System Testing
8. Output Screens
   Screen Shots
9. Conclusion
10. References / Bibliography


INTRODUCTION


ABOUT THE PROJECT


The World Wide Web has become a necessary part of our lives and may have become the most important information source for most people. Every day, people retrieve all kinds of information from the Web. For example, when shopping online, people find product specifications on websites like Amazon.com or ShopZilla.com. When looking for interesting DVDs, they get information and read movie reviews on websites such as NetFlix.com or IMDB.com. When they want to know the answer to a certain question, they go to Ask.com or Google.com. Is the World Wide Web always trustworthy? Unfortunately, the answer is no. There is no guarantee for the correctness of information on the Web. Even worse, different websites often provide conflicting information.

PROPOSED SYSTEM

First, we formulate the Veracity problem: how to discover true facts from conflicting information.

Second, we propose a framework to solve this problem by defining the trustworthiness of websites, the confidence of facts, and the influences between facts.

Finally, we propose an algorithm called Truth Finder that identifies true facts using iterative methods.


Modules of the project

1. Collection of unrelated data
2. Data search
3. Truth Finder algorithm
4. Result calculation

Collection of data
First, we collect the specific data about an object and store it in the related database. A table is created for each object, and the facts about that object are stored in it.

Data search
This module searches for the data links related to the user's input. Through it, the user retrieves the specific data about an object.

Truth Finder algorithm
We design a general framework for the Veracity problem and invent an algorithm called Truth Finder, which utilizes the relationships between web sites and their information: a web site is trustworthy if it provides many pieces of true information, and a piece of information is likely to be true if it is provided by many trustworthy web sites.

Result calculation
For each response to the query we calculate the performance. Using the calculated count, we find the best link and show it as the output.
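The mutual dependence between site trustworthiness and fact confidence can be sketched in Java as follows. This is a simplified illustration and not the project's implementation. It assumes the basic update rules usually given for this kind of algorithm: a site's trustworthiness is the average confidence of the facts it provides, and a fact's confidence is one minus the probability that every site providing it is wrong. It leaves out the influence between facts. The class name TruthFinderSketch, the input shape and the starting trust of 0.9 are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class TruthFinderSketch {

    /**
     * claims maps a website name to the set of facts it provides (assumed input shape).
     * Returns the confidence of each fact after the given number of iterations.
     */
    static Map<String, Double> factConfidence(Map<String, Set<String>> claims, int iterations) {
        // Every site starts with the same trustworthiness (0.9 is an assumed starting value).
        Map<String, Double> trust = new HashMap<>();
        for (String site : claims.keySet()) {
            trust.put(site, 0.9);
        }

        Map<String, Double> confidence = new HashMap<>();
        for (int round = 0; round < iterations; round++) {
            // Step 1: for each fact, multiply together (1 - trust) of the sites providing it.
            Map<String, Double> allWrong = new HashMap<>();
            for (Map.Entry<String, Set<String>> entry : claims.entrySet()) {
                double siteWrong = 1.0 - trust.get(entry.getKey());
                for (String fact : entry.getValue()) {
                    allWrong.merge(fact, siteWrong, (a, b) -> a * b);
                }
            }
            // Confidence of a fact = 1 - probability that all of its providers are wrong.
            confidence.clear();
            for (Map.Entry<String, Double> entry : allWrong.entrySet()) {
                confidence.put(entry.getKey(), 1.0 - entry.getValue());
            }

            // Step 2: a site's trustworthiness is the average confidence of its facts.
            for (Map.Entry<String, Set<String>> entry : claims.entrySet()) {
                double sum = 0.0;
                for (String fact : entry.getValue()) {
                    sum += confidence.get(fact);
                }
                if (!entry.getValue().isEmpty()) {
                    trust.put(entry.getKey(), sum / entry.getValue().size());
                }
            }
        }
        return confidence;
    }
}
```

With this sketch, if two sites agree on one value and a third site gives a different value, the value backed by two sites ends with the higher confidence.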

EXISTING SYSTEM

PageRank and Authority-Hub analysis use hyperlinks to find pages with high authority.

These two approaches identify important web pages that users are interested in. Unfortunately, the popularity of a web page does not necessarily imply that its information is accurate.
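The link-based ranking used by these existing approaches can be sketched as a short power iteration. This is the textbook PageRank update and not code from the project. The class name PageRankSketch, the adjacency-array input and the assumption that every page has at least one outgoing link are choices made for this illustration.

```java
import java.util.Arrays;

class PageRankSketch {

    /**
     * links[i] lists the pages that page i links to.
     * Assumption: every page has at least one outgoing link.
     */
    static double[] rank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);              // start from a uniform distribution

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n);   // share given to every page
            for (int page = 0; page < n; page++) {
                // Each page passes its rank on in equal parts to the pages it links to.
                double share = damping * rank[page] / links[page].length;
                for (int target : links[page]) {
                    next[target] += share;
                }
            }
            rank = next;
        }
        return rank;
    }
}
```

The scores depend only on who links to whom. Nothing in the computation looks at whether the content of a page is correct.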

DRAWBACKS OF EXISTING SYSTEM

The popularity of web pages does not necessarily lead to accuracy of information.

Even the most popular website may contain many errors, whereas some comparatively less popular websites may provide more accurate information.


LITERATURE SURVEY

Data quality: Data are of high quality "if they are fit for their intended uses in operations, decision making and planning". Alternatively, data are deemed of high quality if they correctly represent the real-world construct to which they refer. These two views can often be in disagreement, even about the same set of data used for the same purpose.

Web Mining: Web mining is the integration of information gathered by traditional data mining methodologies and techniques with information gathered over the World Wide Web. (Mining means extracting something useful or valuable from a baser substance, such as mining gold from the earth.) Web mining is used to understand customer behavior, evaluate the effectiveness of a particular Web site, and help quantify the success of a marketing campaign.

Link Analysis: Analysis is the separation of an intellectual or material whole into its constituent parts for individual study, and the study of those parts and their interrelationships in making up the whole. The term also covers a spoken or written presentation of such a study. In link analysis, the constituent parts are web pages and the hyperlinks that connect them.

Universal College of Engineering & Technology

Page 12

Truth Discovery with Multiple Conflict Information

Dept of CSE

FEASIBILITY STUDY


All projects are feasible given unlimited resources and infinite time! Unfortunately, the development of a computer-based system or product is more likely to be plagued by a scarcity of resources and difficult delivery dates. It is both necessary and prudent to evaluate the feasibility of a project at the earliest possible time. Months or years of effort, thousands or millions of dollars, and untold professional embarrassment can be averted if an ill-conceived system is recognized early in the definition phase.

Technical Feasibility:
The project is technically feasible: all the modules described in the module description can be created using an HTML front end and an MS-Access back-end database.

Economic Feasibility:
Economic analysis is the most frequently used method for evaluating the effectiveness of a proposed system. It is an ongoing effort that improves in accuracy at each phase of the system life cycle. As the necessary software and hardware are already available at internet centers, the initial investment for the proposed system is very low. The proposed system minimizes the time and effort needed, from data collection to answering queries, which results in significant savings in operating costs. Thus, the proposed system is economically feasible.


Operational Feasibility:
In our application the front end is developed as a GUI, so it is very easy for the user to enter the necessary information. However, the user needs some knowledge of using web applications before using our application.

Social Feasibility:
It is a determination of whether the people will accept a proposed project or not.

Management Feasibility:
It determines whether the proposed project will be acceptable to the management.

Legal Feasibility:
The concern about the legalities is satisfied.

Time Feasibility:
It determines whether a proposed project can be implemented fully within the stipulated time.

We strongly feel that the proposed system is feasible in all respects.


SYSTEM SPECIFICATIONS


The project is a standalone application. Its hardware and software requirements are as follows.

Hardware:

PROCESSOR        : PENTIUM IV 2.6 GHz
RAM              : 512 MB DD RAM
MONITOR          : 15" COLOR
HARD DISK        : 20 GB
CD DRIVE         : LG 52X
KEYBOARD         : STANDARD 102 KEYS
MOUSE            : 3 BUTTONS

Software:

FRONT END        : Java, J2ee (JSP)
TOOL USED        : JFrameBuilder
OPERATING SYSTEM : Windows Xp
BACK END         : Sql Server 2000



LANGUAGE SPECIFICATIONS


JAVA OVERVIEW:


Java is a powerful but lean object-oriented programming language. It has generated a lot of excitement because it makes it possible to program for the Internet by creating applets, programs that can be embedded in a web page. The content of an applet is limited only by one's imagination. For example, an applet can be an animation with sound, an interactive game or a ticker tape with constantly updated stock prices. Applets can be just little decorations to liven up a web page, or they can be serious applications like word processors or spreadsheets.

But Java is more than a programming language for writing applets. It is being used more and more for writing standalone applications as well. It is becoming so popular that many people believe it will become the standard language for both general-purpose and Internet programming. There are many buzzwords associated with Java, but because of its spectacular growth in popularity, a new buzzword has appeared: ubiquitous. Indeed, all indications are that it will soon be everywhere.

Java builds on the strengths of C++. It has taken the best features of C++ and discarded the more problematic and error-prone parts. To this lean core it has added garbage collection (automatic memory management), multithreading (the capacity for one program to do more than one thing at a time) and security capabilities. The result is that Java is simple, elegant, powerful and easy to use.


Java is actually a platform consisting of three components:
1. The Java programming language.
2. The Java library of classes and interfaces.
3. The Java Virtual Machine.

The following sections will say more about these components:

JAVA IS PORTABLE
One of the biggest advantages Java offers is that it is portable. An application written in Java will run on all the major platforms. Any computer with a Java-enabled browser can run the applications or applets written in the Java programming language. A programmer no longer has to write one program to run on a Macintosh, another program to run on a Windows machine, still another to run on a UNIX machine, and so on. In other words, with Java, developers write their programs only once.

The Virtual Machine is what gives Java its cross-platform capabilities. With other languages, the program code is compiled into machine language, which is different for each operating system and computer architecture, so computers with a different machine instruction set cannot understand it. Java code, on the other hand, is compiled into byte codes rather than a machine language. These byte codes go to the Java Virtual Machine, which executes them directly or translates them into the language that is understood by the machine running it.

With the JDBC API extending Java, a programmer writing Java code can access all the major relational databases on any platform that supports the Java Virtual Machine.

JAVA IS OBJECT-ORIENTED
The Java programming language is object oriented, which makes program design focus on what you are dealing with rather than on how you are going to do something. This makes it more useful for programming in sophisticated projects because one can break things down into understandable components. A big benefit is that these components can then be reused.

Object-oriented languages use the paradigm of classes. In the simplest terms, a class includes both the data and the functions that operate on the data. You can create an instance of a class, also called an object, which will have all the data members and functionality of its class. Because of this, you can think of a class as being like a template, with each object being a specific instance of a particular type of class.

The class paradigm allows one to encapsulate data so that those using the data cannot see the specific data values or the function implementation. Encapsulation makes it possible to make changes in code without breaking other programs that use that code. If, for example, the implementation of a function is changed, the change is invisible to another programmer who invokes that function, and it does not affect his/her program, except hopefully to improve it.

Java includes inheritance, or the ability to derive new classes from existing classes. The derived class, also called a subclass, inherits all the data and the functions of the existing class, referred to as the parent class. A subclass can add new data members to those inherited from the parent class. As far as methods are concerned, the subclass can reuse the inherited methods as they are, change them, and/or add its own new methods.
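The class, encapsulation and inheritance ideas described above can be illustrated with a small sketch. The classes Website and ReviewedWebsite and their fields are hypothetical names chosen for this example and are not taken from the project code.

```java
class Website {
    // Encapsulated data: callers cannot read or change these fields directly.
    private final String url;
    private final double trust;

    Website(String url, double trust) {
        this.url = url;
        this.trust = trust;
    }

    String getUrl() {
        return url;
    }

    // The implementation can change later without affecting callers.
    String describe() {
        return url + " (trust " + trust + ")";
    }
}

// Subclass: inherits url, trust, getUrl() and describe() from Website.
class ReviewedWebsite extends Website {
    private final String reviewer;   // new data member added by the subclass

    ReviewedWebsite(String url, double trust, String reviewer) {
        super(url, trust);
        this.reviewer = reviewer;
    }

    // Changes an inherited method while reusing the parent's version.
    @Override
    String describe() {
        return super.describe() + ", reviewed by " + reviewer;
    }
}
```

A call such as new ReviewedWebsite("example.org", 0.8, "editor").describe() uses the subclass version of describe(), while getUrl() is reused from the parent class unchanged.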

JAVA MAKES IT EASY TO WRITE CORRECT CODE


In addition to being portable and object oriented, Java facilitates writing correct code. Programmers spend less time writing Java code and a lot less time debugging it. In fact, developers have reported slashing development time by as much as two thirds. The following is a list of some of Java's features that make it easier to write correct code.

GARBAGE COLLECTION:
Automatically takes care of allocating and deallocating memory, a huge potential source of errors. If an object is no longer being used (has no references to it), then it is automatically removed from memory, or "garbage collected". Programmers don't have to keep track of what has been allocated and deallocated themselves, which makes their job a lot easier and, more importantly, helps prevent memory leaks.


NO POINTERS:

Eliminates a big source of errors. By using object references instead of memory pointers, problems with pointer arithmetic are eliminated, and problems with inadvertently accessing the wrong memory address are greatly reduced.

STRONG TYPING:
Cuts down on runtime errors. Because Java enforces strong type checking, many errors are caught when code is compiled. Dynamic binding is possible and often very useful, but static binding with strict type checking is used when possible.
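A short sketch can show both halves of this point: type errors rejected at compile time, and dynamic binding resolved at run time. The class TypingDemo and its methods are hypothetical names used only for illustration.

```java
import java.util.List;

class TypingDemo {

    // The parameter type is checked at compile time: only a List<String> is accepted.
    static int totalLength(List<String> words) {
        int total = 0;
        for (String word : words) {
            total += word.length();
        }
        return total;
    }

    // Dynamic binding: which toString() runs is decided at run time
    // from the actual class of the object passed in.
    static String describe(Object value) {
        return value.toString();
    }

    // The following lines would be rejected by the compiler, not at run time:
    //   int n = "forty-two";
    //   totalLength(java.util.Arrays.asList(1, 2, 3));
}
```

Here describe() accepts any object, and the toString() method of the object's own class is the one that runs.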

SIMPLICITY:
Java is made easier to learn and use correctly. Java keeps it simple by having just one way to do something instead of having several alternatives, as in some languages. Java also stays lean by not including multiple inheritance, which eliminates the errors and ambiguity that arise when you create a subclass that inherits from two or more classes. To replace the capabilities that multiple inheritance provides, Java lets you add functionality to a class through the use of interfaces.
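A class can gain several capabilities by implementing more than one interface. In the sketch below, the interfaces Searchable and Rankable and the class FactEntry are hypothetical names used only for illustration.

```java
interface Searchable {
    int MAX_RESULTS = 10;            // interface fields are implicitly constants
    boolean matches(String query);   // prototype only, no implementation
}

interface Rankable {
    double score();
}

// One class, two interfaces: no ambiguity, because the class
// supplies the only implementation of each method.
class FactEntry implements Searchable, Rankable {
    private final String text;
    private final double confidence;

    FactEntry(String text, double confidence) {
        this.text = text;
        this.confidence = confidence;
    }

    @Override
    public boolean matches(String query) {
        return text.toLowerCase().contains(query.toLowerCase());
    }

    @Override
    public double score() {
        return confidence;
    }
}
```

A FactEntry can then be passed wherever a Searchable or a Rankable is expected.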


JAVA INCLUDES A LIBRARY OF CLASSES AND INTERFACES:


The Java platform includes an extensive class library so that programmers can use existing classes as they are, create subclasses to modify existing classes, or implement interfaces to augment the capabilities of classes.

Both classes and interfaces contain data members (fields) and functions (methods), but there are major differences. In a class, fields may be either variable or constant, and methods are fully implemented. In an interface, fields must be constants, and methods are just prototypes with no implementation. The prototypes give the method signature (the return type, the function name, and the number of parameters, with the type of each parameter), but the programmer must supply the implementations.

To use an interface, a programmer defines a class, declares that it implements the interface, and then implements all the methods in that interface as part of the class. These methods are implemented in a way that is appropriate for the class in which they are being used. Interfaces let one add functionality to a class and give a great deal of flexibility in doing it. In other words, interfaces provide most of the advantages of multiple inheritance without its disadvantages.

A package is a collection of related Java classes and interfaces. The following list, though not complete, gives examples of some Java packages and what they cover.

java.lang -- This package is so basic that it is automatically included in any Java program. It includes classes dealing with numerics, strings, objects, runtime, security and threads.

java.io -- Classes that manage reading data from input streams and writing data to output streams.

java.util -- Miscellaneous utility classes, including generic data structures, bit sets, time, date, string manipulation, random number generation, system properties, notification and enumeration of data structures.

java.net -- Classes for network support.

java.awt -- Classes that manage user interface components such as windows, dialog boxes, buttons, check boxes, lists, menus, scrollbars, and text fields; "AWT" stands for Abstract Window Toolkit.

java.awt.image -- Classes for managing image data, including color models, cropping, color filtering, setting pixel values, and grabbing snapshots.

java.applet -- The Applet class, which provides the ability to write applets. This package also includes several interfaces that connect an applet to its document and to resources for playing audio.

java.sql -- The JDBC API, classes and interfaces that access databases and send SQL statements.


HTML
To publish information for global distribution, one needs a universally understood language, a kind of publishing mother tongue that all computers may potentially understand. The publishing language used by the World Wide Web is HTML (Hyper Text Markup Language). HTML gives authors the means to:

Publish online documents with headings, text, tables, lists, photos, etc.
Retrieve online information via hypertext links, at the click of a button.
Design forms for conducting transactions with remote services, for use in searching for information, making reservations, ordering products, etc.
Include spreadsheets, video clips, sound clips, and other applications directly in their documents.

A brief history of HTML:


HTML was originally developed by Tim Berners-Lee while at CERN, and popularized by the Mosaic browser developed at NCSA. During the course of the 1990s it blossomed with the explosive growth of the Web. During this time, HTML has been extended in a number of ways. The Web depends on Web page authors and vendors sharing the same conventions for HTML. This has motivated joint work on specifications for HTML.

HTML 2.0 (November 1995) was developed under the aegis of the Internet Engineering Task Force (IETF) to codify common practice in late 1994. HTML+ (1993) and HTML 3.0 (1995) proposed much richer versions of HTML. Despite never receiving consensus in standards discussions, these drafts led to the adoption of new features. The efforts of the World Wide Web Consortium's HTML working group to codify common practice in 1996 resulted in HTML 3.2 (January 1997).

Most people agree that HTML documents should work well across different browsers and platforms. Achieving interoperability lowers costs to content providers since they must develop only one version of a document. If the effort is not made, there is much greater risk that the Web will devolve into a proprietary world of incompatible formats, ultimately reducing the Web's commercial potential for all participants.

Tables:
Authors now have greater control over structure and layout (e.g. column groups). The ability of designers to recommend column widths allows user agents to display table data incrementally (as it arrives) rather than waiting for the entire table before rendering.

Compound documents:
HTML now offers a standard mechanism for embedding generic media objects and applications in HTML documents. The OBJECT element (together with its more specific ancestor elements IMG and APPLET) provides a mechanism for including images, video, sound, mathematics, specialized applications, and other objects in a document. It also allows authors to specify a hierarchy of alternate renderings for user agents that don't support a specific rendering.


Style sheets:
Style sheets simplify HTML markup and largely relieve HTML of the responsibilities of presentation. They give both authors and users control over the presentation of documents: font information, alignment, colors, etc.

Style information can be specified for specific elements or groups of elements, either within an HTML document or in separate style sheets. The mechanism for associating a style sheet with a document is independent of the style sheet language. Before the advent of style sheets, authors had limited control over rendering. HTML 3.2 included a number of attributes and elements offering control over alignment, font size, and text color. Authors also exploited tables and images as a means for laying out pages. The relatively long time it takes for users to upgrade their browsers means that these features will continue to be used for some time. However, since style sheets offer more powerful presentation mechanisms, the World Wide Web Consortium will eventually phase out many of HTML's presentation elements and attributes.

Scripting:
Through scripts, authors may create smart forms that react as users fill them out. Scripting allows designers to create dynamic Web pages and to use HTML as a means to build networked applications. The mechanisms provided to associate HTML with scripts are independent of particular scripting languages.


Printing:
Sometimes, authors will want to make it easy for users to print more than just the current document. When documents form part of a larger work, the relationships between them can be described using the HTML LINK element or using W3C's Resource Description Language.

Well formed HTML File:


Be sure that your source HTML files (to which the templates will be applied) are well formed. This means that the HTML is correctly formed with an opening <HTML> tag, a head section enclosed by <HEAD></HEAD> tags, a body section enclosed by <BODY></BODY> tags, and a closing </HTML> tag. The template servlet's parser is very strict; if you don't have proper markup, you may see pages with no content. The default template file goes in the document base directory. In general, there needs to be only one default template file per document set. However, additional default template files can be used in subdirectories to override attributes present in the parent default template file.

Use of HTML Frames:


The template file cannot read in a frame set from a source file because only the contents of the HEAD section and the BODY section are copied. However, there is nothing to stop you from using files processed by the template servlet as content in your own frame set.


JAVASCRIPT
JavaScript Facts:

JavaScript Is Embedded Into HTML:
JavaScript code is usually embedded in an HTML document and is executed within it. By itself JavaScript has no user interface. It relies on HTML to provide the means of interaction with the user. Most JavaScript objects correspond to HTML tags. JavaScript extends the capabilities of HTML by attaching events to HTML tags and providing event-driven code that runs when those events occur.

JavaScript Is Browser Dependent:


JavaScript depends on the Web browser to support it. If the browser does not support it, JavaScript code will be ignored. Internet Explorer 3.0 and Netscape Navigator 2.0 onwards support JavaScript.

JavaScript Is an Interpreted Language:


JavaScript is interpreted at runtime by the browser as it is executed. It is not compiled into a separate program like a .exe but remains part of the HTML file.

JavaScript Is a Loosely Typed Language:

JavaScript is very flexible compared to Java. You need not specify the data type of a variable while declaring it. You also need not declare variables explicitly; it is perfectly legal to declare variables as and when you require them.

JavaScript Is an Object-Based Language:


JavaScript is an object-based language. You can work with objects that encapsulate data (properties) and behavior (methods). However, JavaScript's object model is instance-based and has no class-based inheritance. This is the basic difference between an object-oriented and an object-based language.

JavaScript Is Event-Driven:
HTML objects such as buttons are enhanced to support event handlers. For example, you can specify what a button does when it is clicked.

JavaScript Is Not Java:


A Java applet is stored in a separate file and connected to an HTML file through the <applet> tag, and Java is a strongly typed, object-oriented, compiled language. JavaScript is a loosely typed, object-based, interpreted language meant to create scripts.

JavaScript Is Multifunctional:
JavaScript can be used to:
Enhance HTML pages
Develop client-side applications
Build, to a certain extent, client/server Web applications
Create extensions to a Web server
Provide database connectivity without using CGI

JAVA DATABASE CONNECTIVITY (JDBC)
JDBC OVERVIEW:

JDBC is a Java API for executing SQL statements. (JDBC is a trademarked name and is not an acronym; nevertheless, JDBC is often thought of as standing for "Java Database Connectivity".) It consists of a set of classes and interfaces written in the Java programming language. JDBC provides a standard API for tool/database developers and makes it possible to write database applications using a pure Java API.

Using JDBC, it is easy to send SQL statements to virtually any relational database. In other words, with the JDBC API, it is not necessary to write one program to access a Sybase database, another program to access an Oracle database, another program to access an Informix database, and so on. One can write a single program using the JDBC API, and the program will be able to send SQL statements to the appropriate database. And with an application written in the Java programming language, one also doesn't have to worry about writing different applications to run on different platforms. The combination of Java and JDBC lets a programmer write it once and run it anywhere.
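The idea of one program talking to any database through the same API can be sketched as follows. This is a minimal illustration, not the project's actual code: the table `facts`, its columns `fact` and `object_name`, and the class name `JdbcSketch` are hypothetical. Only the JDBC URL changes from one database product to another; the query code stays the same.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class JdbcSketch {

    /** Runs one parameterized query and returns the first column of every row. */
    public static List<String> findFacts(Connection conn, String objectName) throws SQLException {
        // Hypothetical table and column names, chosen only for illustration.
        String sql = "SELECT fact FROM facts WHERE object_name = ?";
        List<String> facts = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, objectName);          // bind the parameter instead of concatenating
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    facts.add(rs.getString(1));
                }
            }
        }
        return facts;
    }

    /**
     * Opens a connection for the given JDBC URL and runs the query.
     * Returns an empty list if no driver accepts the URL or the database
     * cannot be reached (a simplification made for this sketch).
     */
    public static List<String> findFacts(String jdbcUrl, String objectName) {
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            return findFacts(conn, objectName);
        } catch (SQLException e) {
            return Collections.emptyList();
        }
    }
}
```

Switching from one database to another would then mean passing a different URL and placing the matching driver on the classpath, with no change to `findFacts`.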

Java, being robust, secure, easy to understand, and automatically downloadable over a network, is an excellent language basis for database applications. What is needed is a way for Java applications to talk to a variety of different databases. JDBC is the mechanism for doing this.

JDBC extends what can be done in Java. For example, with Java and the JDBC API, it is possible to publish a web page containing an applet that uses information obtained from a remote database. Or an enterprise can use JDBC to connect all its employees (even if they are using a conglomeration of Windows, Macintosh, and Unix machines) to one or more internal databases via an intranet. With more and more programmers using the Java programming language, the need for easy database access from Java is continuing to grow.

MIS managers like the combination of Java and JDBC because it makes disseminating information easy and economical. Businesses can continue to use their installed databases and access information easily even if it is stored on different database management systems. Development time for new applications is short. Installation and version control are greatly simplified. A programmer can write an application or an update once, put it on the server, and everybody has access to the latest version. And for businesses selling information services, Java and JDBC offer a better way of getting out information updates to external customers.

Introduction to J2SDK
The J2SDK package contains Sun's Java development environment. This is useful for developing Java programs and provides the runtime environment necessary to run Java programs. It also includes a plug-in for browsers so that they can be Java aware. The JDK comes in two flavors, a precompiled binary and a source package. Previously, the plugin included in the JDK binary package was unusable on LFS owing to incompatibilities with GCC-3 compiled browsers. This is not the case anymore. The source package requires registration at the Sun developer site and accepting the Sun Community Source License. The source code cannot be downloaded from some countries, so for users from those countries, the binary may be the only option. Even if you plan on compiling the JDK source, you will need to download the binary version to bootstrap the JDK build. Follow the link below to download both source and binary packages. When downloading the source (two files required), also download the Mozilla headers package available at the same location. To build from source, you'll end up downloading a total of four files.

SQL SERVER 2005


SQL Server 2005 is designed to help enterprises address their data management challenges. It is Microsoft's next-generation data management and analysis solution that will deliver increased security, scalability, and availability to enterprise data and analytical applications while making them easier to create, deploy, and manage. Building on the strengths of SQL Server 2000, SQL Server 2005 will provide an integrated data management and analysis solution that will help organizations of any size to:

Build and deploy enterprise applications that are more secure, scalable, and reliable.

Maximize the productivity of IT by reducing the complexity of creating, deploying, and managing database applications.

Empower developers through a rich, flexible, modern development environment for creating more secure database applications.

Share data across multiple platforms, applications, and devices to make it easier to connect internal and external systems.

Deliver robust, integrated business intelligence solutions that help drive informed business decisions and increase productivity across your entire organization.

Control costs without sacrificing performance, availability, or scalability.

Read on to learn more about the advancements SQL Server 2005 will deliver in three key areas: enterprise data management, developer productivity, and business intelligence.

Enterprise Data Management


In today's connected world, data and the systems that manage that data must always be available to your users. With SQL Server 2005, users and IT professionals across your organization will benefit from reduced application downtime, increased scalability and performance, and tight security controls. SQL Server 2005 will include enhancements to enterprise data management in the following areas:

Availability. Investments in high availability technologies, additional backup and restore capabilities, and replication enhancements will enable enterprises to build and deploy highly reliable applications.

Scalability. Scalability advancements such as partitioning, snapshot isolation, and 64-bit support will enable you to build and deploy your most demanding applications using SQL Server 2005.

Security. Enhancements such as secure by default settings and an enhanced security model will help provide a high level of security for your enterprise data.

Manageability. A new management tool suite, expanded self-tuning capabilities, and a powerful new programming model will increase the productivity of database administrators.

Interoperability. Through deep support for industry standards, Web services, and the Microsoft .NET Framework, SQL Server 2005 will support interoperability with multiple platforms, applications, and devices.

Developer Productivity

One of the key barriers to developer productivity has been the lack of integrated tools for database development and debugging. SQL Server 2005 will provide advancements that fundamentally change the way that database applications are developed and deployed. Enhancements for developer productivity will include:

Improved tools. Developers will be able to utilize one development tool for Transact-SQL, XML, Multidimensional Expression (MDX), and XML for Analysis (XML/A).

Expanded language support. With the common language runtime (CLR) hosted in the database engine, developers will be able to choose from a variety of familiar languages to develop database applications, including Transact-SQL, Microsoft Visual Basic .NET, and Microsoft Visual C#.NET.

XML and Web services. SQL Server 2005 will support both relational and XML data natively, so enterprises can store, manage, and analyze data in the format that best suits their needs.

Support for existing and emerging open standards such as Hypertext Transfer Protocol (HTTP), XML, Simple Object Access Protocol (SOAP), XQuery, and XML Schema Definition (XSD) will also facilitate communication across extended enterprise systems.

Business Intelligence
The challenge and promise of business intelligence revolve around providing employees with the right information at the right time. Accomplishing this vision demands a business intelligence solution that is comprehensive, secure, integrated with operational systems, and available all day, every day. SQL Server 2005 will help companies to achieve this goal. Business intelligence advancements will include:

Integrated platform. SQL Server 2005 will deliver an end-to-end business intelligence platform with integrated analytics including online analytical processing (OLAP); data mining; extract, transformation, and load (ETL) tools; data warehousing; and reporting functionality.

Improved decision making. Advancements to existing business intelligence features, such as OLAP and data mining, and the introduction of a new reporting server will provide enterprises with the ability to transform information into better business decisions at all organizational levels.

Security and availability. Scalability, availability, and security enhancements will help to provide users with uninterrupted access to business intelligence applications and reports.

Enterprise-wide analytical capabilities. An improved ETL tool will enable organizations to more easily integrate and analyze data from multiple heterogeneous information sources. By analyzing data across a wide array of operational systems, organizations may gain a competitive edge through a holistic understanding of their business.

PLATFORM/PACKAGES SELECTED
WINDOWS XP:
Windows XP is an operating system produced by Microsoft for use on personal computers, including home and business desktops, laptops and media centers. First released to computer manufacturers on August 24, 2001, it is the second most popular version of Windows, based on installed user base. The name "XP" is short for "experience", highlighting the enhanced user experience.

Windows XP, the successor to Windows 2000 and Windows ME, was the first consumer-oriented operating system produced by Microsoft to be built on the Windows NT kernel. Windows XP was released worldwide for retail sale on October 25, 2001, and over 400 million copies were in use in January 2006. It was succeeded by Windows Vista in January 2007. Direct OEM and retail sales of Windows XP ceased on June 30, 2008. Microsoft continued to sell Windows XP through their System Builders (smaller OEMs who sell assembled computers) program until January 31, 2009. On April 10, 2012, Microsoft reaffirmed that extended support for Windows XP and Office 2003 would end on April 8, 2014 and suggested that administrators begin preparing to migrate to a newer OS.

The NT-based versions of Windows, which are programmed in C, C++, and assembly, are known for their improved stability and efficiency over the 9x versions of Microsoft Windows. Windows XP presented a significantly redesigned graphical user interface, a change Microsoft promoted as more user-friendly than previous versions of Windows. A new software management facility called side-by-side assembly was introduced to ameliorate the "DLL hell" that plagued 9x versions of Windows. It is also the first version of Windows to use product activation to combat illegal copying. During Windows XP's development, the project was codenamed "Whistler", after Whistler, British Columbia, as many Microsoft employees skied at the Whistler-Blackcomb ski resort.
According to web analytics data generated by W3Schools, from September 2003 to July 2011, Windows XP was the most widely used operating system for accessing the W3Schools website, which they claim is consistent with statistics from other websites. As of October 2012, Windows XP market share is at 22.1% after having peaked at 76.1% in January 2007.

Windows XP featured a new task-based GUI (graphical user interface). The Start menu and taskbar were updated and many visual effects were added, including:

A translucent blue selection rectangle in Windows Explorer
Drop shadows for icon labels on the desktop
Task-based sidebars in Explorer windows ("common tasks")
The ability to group the taskbar buttons of the windows of one application into one button, with a popup menu listing the window titles
The ability to lock the taskbar to prevent accidental changes (Windows 2000 with Internet Explorer 6 installed had the ability to lock Windows Explorer and Internet Explorer toolbars, but not the taskbar)
The highlighting of recently added programs on the Start menu
Shadows under menus (Windows 2000 had shadows under mouse pointers, but not menus)

SYSTEM DESIGN

System Design
Once the analysis stage is completed, the next stage is to determine in broad outline how the problem might be solved. During system design, we begin to move from the logical to the physical level. System design involves the architectural and detailed design of the system. Architectural design involves identifying software components, decomposing them into processing modules and conceptual data structures, and specifying the interconnections among components. Detailed design is concerned with how to package processing modules and how to implement the processing algorithms, data structures and interconnections; it includes the selection of standard algorithms, invention of new algorithms, design of data representations, and packaging of the software product. Two kinds of approaches are available:
Top-down approach
Bottom-up approach

Design of Database:
The design of the database includes decisions about the nature and content of each table, such as whether it is to be used for storing current information, reference information, or historical data. The decisions cover:
Which data entities to include in each table and which entity is to be the key.
Characteristics of each entity, such as name, data type and length.
Relationships between the data entities of different tables and databases.
Normalization techniques to be adopted in designing the database.

The data flow diagrams have been drawn and analyzed and the relationships between the entities have been identified. The objective in the development of database technology has been to treat data as an organizational resource and as an integrated whole. A database is an integrated collection of data. The database for the DFMS (Distributed File Maintenance System) consists of data organized to achieve three major objectives: data integrity, data consistency and data independence. The flow of data through the system is shown below in the UML diagrams.

Design of code:
Since information systems projects are designed with space, time and cost saving in mind, coding methods in which conditions, words, or ideas are represented by codes are used to control errors and speed up the entire process. The purpose of a code is to facilitate the identification and retrieval of information. A code is an ordered collection of symbols designed to provide unique identification of an entity or an attribute.

Design of Input:
Design of input involves the following decisions, taken with economic feasibility in mind:
Input data
Input medium
The way data should be arranged or coded
Validation needed to detect errors
Steps to follow when an error occurs

Input controls provide ways to ensure that only authorized users access the system, guarantee that transactions are valid, validate the data for accuracy, and determine whether any necessary data has been omitted. The primary input medium chosen is the display. Screens have been developed for data input using HTML. Validation of all important inputs is handled through various events using JSP controls.

Design of Output:
Design of output involves the following decisions:
Information to present
Output medium
Output layout

Output of this system is given in an easily understandable, user-friendly manner. The layout of the output was decided through discussions with the different users.

Design of control:
The system should offer the means of detecting and handling errors. Input controls provide ways to:
Ensure that only valid transactions are accepted.
Validate the accuracy of data.
Ensure that all mandatory data have been captured.

All entries to the system are validated, and tables are updated only for valid entries. If incorrect entries have been entered into the system by chance, means have been provided to edit and correct them.
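The control checks described above (mandatory data captured, data accurate, only valid entries accepted) can be sketched in Java as follows. This is an illustrative sketch, not the project's actual validation code: the field names `userId` and `query` and the rule that a user id is 3 to 20 letters or digits are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class InputValidator {

    /** Returns a list of error messages; an empty list means the entry is valid. */
    public static List<String> validate(Map<String, String> form) {
        List<String> errors = new ArrayList<>();

        // Mandatory data: every required field must be present and non-blank.
        // The field names are hypothetical.
        for (String field : new String[] {"userId", "query"}) {
            String value = form.get(field);
            if (value == null || value.trim().isEmpty()) {
                errors.add(field + " is required");
            }
        }

        // Accuracy: assumed rule that a user id is 3 to 20 letters or digits.
        String userId = form.get("userId");
        if (userId != null && !userId.trim().isEmpty()
                && !userId.matches("[A-Za-z0-9]{3,20}")) {
            errors.add("userId must be 3-20 letters or digits");
        }
        return errors;
    }
}
```

A caller would update the tables only when the returned list is empty, and otherwise show the messages so that the user can correct the entry.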

Design for user interface:


The user interface is very important in any kind of system development. The primary interface of the system with the user is through menus. Menus are ideal for functions that can be identified during the analysis phase and are not expected to change in the future. The entire system has been developed with user-friendliness in mind. The screens have been designed so as to allow the user to input data in an easy manner.

UML DIAGRAMS:
INTRODUCTION:
UML is a notation that resulted from the unification of the Object Modeling Technique (OMT), the Booch method, and Object-Oriented Software Engineering (OOSE). UML has been designed for a broad range of applications. Hence, it provides constructs for a broad range of systems and activities.

AN OVERVIEW OF UML WITH SIX DIAGRAMS

1. USE CASE DIAGRAMS


Use cases are used during requirements elicitation and analysis to represent the functionality of the system. Use cases focus on the behavior of the system from an external point of view. The actors are outside the boundary of the system, whereas the use cases are inside the boundary of the system.

[Figure: use case diagram for truth discovery, with the labels user, home, search, truthfinder, and display the details]

Fig 6.3e Use case diagram

2. CLASS DIAGRAMS
Class diagrams are used to describe the structure of the system. Classes are abstractions that specify the common structure and behavior of a set of objects. Class diagrams describe the system in terms of objects, classes, attributes, operations and their associations.

Fig 6.3f Class Diagram

3. SEQUENCE DIAGRAMS
Sequence diagrams are used to formalize the behavior of the system and to visualize the communication among objects. They are useful for identifying additional objects that participate in the use cases. A sequence diagram represents the interactions that take place among these objects.

[Figure: sequence diagram with the lifelines user, home page, Search, Truth Finder, and Output, and the messages enters login details, enter uid, invalid, valid, Query, conflicting information, and true facts]

Fig 6.3g Sequence diagram

4. COLLABORATION DIAGRAMS

A collaboration diagram emphasizes the organization of the objects that participate in an interaction.

Fig 6.3h Collaboration diagram

5. STATECHART DIAGRAMS
State chart diagrams describe the behavior of an individual object as a number of states and transitions between these states. A state represents a particular set of values for an object. Whereas the sequence diagram focuses on the messages exchanged between objects, the state chart diagram focuses on the transitions between states.

[Figure: state chart with the states Home, Login, Query Process, Database, Conflicting Information, Truthfinder, and Result]

Fig 6.3i State chart diagram
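The states named in the figure can be written down as a small state machine. The sketch below is only an illustration: the state names come from the figure, but the strictly linear order of the transitions and the class name `SearchFlow` are assumptions, since the arrows of the original diagram are not available in the text.

```java
public class SearchFlow {

    /** States as named in Fig 6.3i; the linear ordering is an assumption. */
    public enum State {
        HOME, LOGIN, QUERY_PROCESS, DATABASE, CONFLICTING_INFORMATION, TRUTHFINDER, RESULT;

        /** Returns the next state, or RESULT itself once the flow has finished. */
        public State next() {
            return this == RESULT ? RESULT : values()[ordinal() + 1];
        }
    }

    /** Counts the transitions needed to get from the given state to RESULT. */
    public static int transitionsToResult(State start) {
        int steps = 0;
        for (State s = start; s != State.RESULT; s = s.next()) {
            steps++;
        }
        return steps;
    }
}
```

An enum keeps the set of states closed, so an object can never be in a state the diagram does not show.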

6. ACTIVITY DIAGRAMS

An activity diagram describes a system in terms of activities. Activities are states that represent the execution of a set of operations. Activity diagrams are similar to flowcharts and data flow diagrams.

Fig 6.3j Activity diagram

SYSTEM TESTING

SOFTWARE TESTING TECHNIQUES


Software testing is a critical element of software quality assurance and represents the ultimate review of specification, design and coding.

TESTING OBJECTIVES:
Testing is the process of executing a program with the intent of finding an error. A good test case is one that has a high probability of finding an as yet undiscovered error. A successful test is one that uncovers an as yet undiscovered error. These objectives imply a dramatic change in viewpoint. Testing cannot show the absence of defects; it can only show that software errors are present.

TEST CASE DESIGNS:


Any engineering product can be tested in one of two ways.

White Box Testing: This testing is also called glass box testing. In this testing, by knowing the internal operation of a product, tests can be conducted to ensure that all gears mesh, that is, that the internal operation performs according to specification and all internal components have been adequately exercised. It is a test case design method that uses the control structure of the procedural design to derive test cases. Basis path testing is a white box testing technique.

Basis Path Testing:


1. Flow graph notation
2. Cyclomatic complexity
3. Deriving test cases
4. Graph matrices
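Cyclomatic complexity, the second item above, can be computed from a flow graph as V(G) = E - N + 2, where E is the number of edges and N the number of nodes of a connected flow graph. The sketch below assumes the flow graph is given as an adjacency map from each node to its successor nodes; the class name and the representation are choices made for this example only.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Cyclomatic {

    /**
     * Computes V(G) = E - N + 2 for a connected flow graph given as
     * an adjacency map (node name -> list of successor node names).
     */
    public static int complexity(Map<String, List<String>> flowGraph) {
        Set<String> nodes = new HashSet<>(flowGraph.keySet());
        int edges = 0;
        for (List<String> successors : flowGraph.values()) {
            nodes.addAll(successors);   // nodes that only appear as targets
            edges += successors.size();
        }
        return edges - nodes.size() + 2;
    }
}
```

For a single if-else (nodes cond, then, else, join and four edges) the value is 4 - 4 + 2 = 2, which matches the two independent paths that basis path testing would have to cover.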

Control Structure Testing:

1. Condition testing
2. Data flow testing
3. Loop testing

Black Box Testing:


In this testing, by knowing the specified function that a product has been designed to perform, tests can be conducted that demonstrate each function is fully operational while at the same time searching for errors in each function. It fundamentally focuses on the functional requirements of the software. The methods used in black box test case design are:
i. Graph based testing methods
ii. Equivalence partitioning
iii. Boundary value analysis
iv. Comparison testing
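Boundary value analysis, listed above, picks test inputs at and around the edges of a valid range, because errors tend to cluster there. The sketch below is a hedged illustration: the function under test, `isValidLength`, and its assumed valid range of 1 to 100 characters are hypothetical and are not taken from the project.

```java
import java.util.Arrays;
import java.util.List;

public class BoundaryValues {

    /** Inputs just outside, on, and just inside each end of the valid range [min, max]. */
    public static List<Integer> forRange(int min, int max) {
        return Arrays.asList(min - 1, min, min + 1, max - 1, max, max + 1);
    }

    /** Hypothetical function under test: accepts query lengths from 1 to 100. */
    public static boolean isValidLength(int length) {
        return length >= 1 && length <= 100;
    }
}
```

Calling `forRange(1, 100)` gives the six inputs 0, 1, 2, 99, 100 and 101; the first and last belong to the invalid equivalence classes and the other four to the valid one.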

SOFTWARE TESTING STRATEGIES:


A software testing strategy provides a road map for the software developer. Testing is a set of activities that can be planned in advance and conducted systematically. For this reason, a template for software testing, that is, a set of steps into which we can place specific test case design methods, should be defined for the software engineering process. Any software testing strategy should have the following characteristics:
1. Testing begins at the module level and works outward toward the integration of the entire computer-based system.
2. Different testing techniques are appropriate at different points in time.
3. The developer of the software and an independent test group conduct testing.
4. Testing and debugging are different activities, but debugging must be accommodated in any testing strategy.

Unit testing
Unit testing focuses verification efforts on the smallest unit of software design. It covers:
1. Unit test considerations
2. Unit test procedures

Integration testing:
Integration testing is a systematic technique for constructing the program structure while conducting tests to uncover errors associated with interfacing. There are two types of integration testing, and regression testing is applied alongside them:
1. Top-Down Integration: Top-down integration is an incremental approach to the construction of program structure. Modules are integrated by moving downward through the control hierarchy, beginning with the main control module.
2. Bottom-Up Integration: Bottom-up integration, as its name implies, begins construction and testing with atomic modules, that is, the modules at the lowest levels of the program structure.
3. Regression Testing: In the context of an integration test strategy, regression testing is the re-execution of some subset of tests that have already been conducted to ensure that changes have not propagated unintended side effects.

VALIDATION TESTING:
At the culmination of integration testing, software is completely assembled as a package, interfacing errors have been uncovered and corrected, and a final series of software tests, validation testing, may begin. Validation can be defined in many ways, but a simple definition is that validation succeeds when the software functions in a manner that can be reasonably expected by the customer. Reasonable expectations are defined in the software requirements specification, a document that describes all user-visible attributes of the software. The specification contains a section titled Validation Criteria. Information contained in that section forms the basis for a validation testing approach.

VALIDATION TEST CRITERIA:


Software validation is achieved through a series of black-box tests that demonstrate conformity with requirements. A test plan outlines the classes of tests to be conducted, and a test procedure defines specific test cases that will be used in an attempt to uncover errors in conformity with requirements. Both the plan and procedure are designed to ensure that all functional requirements are satisfied, all performance requirements are achieved, documentation is correct and human-engineered, and other requirements are met.

After each validation test case has been conducted, one of two possible conditions exists: (1) the function or performance characteristics conform to specification and are accepted, or (2) a deviation from specification is uncovered and a deficiency list is created. A deviation or error discovered at this stage in a project can rarely be corrected prior to scheduled completion. It is often necessary to negotiate with the customer to establish a method for resolving deficiencies.

Configuration Review
An important element of the validation process is a configuration review. The intent of the review is to ensure that all the elements of the software configuration have been properly developed, are catalogued, and have the necessary detail to support the maintenance phase of the software life cycle. The configuration review is sometimes called an audit.

Alpha and Beta Testing


It is virtually impossible for a software developer to foresee how the customer will really use a program. Instructions for use may be misinterpreted, strange combinations of data may be regularly used, and output that seemed clear to the tester may be unintelligible to a user in the field.

When custom software is built for one customer, a series of acceptance tests is conducted to enable the customer to validate all requirements. Conducted by the end user rather than the system developer, an acceptance test can range from an informal test drive to a planned and systematically executed series of tests. In fact, acceptance testing can be conducted over a period of weeks or months, thereby uncovering cumulative errors that might degrade the system over time.

If software is developed as a product to be used by many customers, it is impractical to perform formal acceptance tests with each one. Most software product builders use a process called alpha and beta testing to uncover errors that only the end user seems able to find.

A customer conducts the alpha test at the developer's site. The software is used in a natural setting with the developer looking over the shoulder of the user and recording errors and usage problems. Alpha tests are conducted in a controlled environment.

The beta test is conducted at one or more customer sites by the end user of the software. Unlike alpha testing, the developer is generally not present. Therefore, the beta test is a live application of the software in an environment that cannot be controlled by the developer. The customer records all the problems that are encountered during beta testing and reports these to the developer at regular intervals. As a result of problems reported during beta tests, the software developer makes modifications and then prepares for release of the software product to the entire customer base.

CONCLUSION
In this project, we introduce and formulate the Veracity problem, which aims at resolving conflicting facts from multiple websites and finding the true facts among them. We propose TRUTHFINDER, an approach that utilizes the interdependency between website trustworthiness and fact confidence to find trustworthy websites and true facts. Experiments show that TRUTHFINDER achieves high accuracy at finding true facts and at the same time identifies websites that provide more accurate information.
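The interdependency between website trustworthiness and fact confidence can be illustrated with a small iterative computation. The sketch below is a simplified illustration and not the project's implementation: it scores a fact by summing -ln(1 - t(w)) over the websites w that state it, converts the sum back to a confidence with 1 - e^(-score), and sets each website's trustworthiness t(w) to the average confidence of its facts. It leaves out the influence between related facts and any dampening of the scores. The starting trustworthiness of 0.9, the cap just below 1, and all names in the code are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class TruthFinderSketch {

    /**
     * Alternates between scoring facts from site trust and scoring sites
     * from fact confidence, and returns the confidence of each fact.
     * siteFacts maps a website name to the set of facts it states.
     */
    public static Map<String, Double> factConfidence(Map<String, Set<String>> siteFacts,
                                                     int iterations) {
        Map<String, Double> trust = new HashMap<>();
        for (String site : siteFacts.keySet()) {
            trust.put(site, 0.9);                       // assumed initial trustworthiness
        }
        Map<String, Double> confidence = new HashMap<>();
        for (int i = 0; i < iterations; i++) {
            // Fact score: sum of -ln(1 - t(w)) over the sites w that state the fact.
            Map<String, Double> score = new HashMap<>();
            for (Map.Entry<String, Set<String>> e : siteFacts.entrySet()) {
                double tau = -Math.log(1.0 - trust.get(e.getKey()));
                for (String fact : e.getValue()) {
                    score.merge(fact, tau, Double::sum);
                }
            }
            // Fact confidence: 1 - e^(-score).
            confidence.clear();
            for (Map.Entry<String, Double> e : score.entrySet()) {
                confidence.put(e.getKey(), 1.0 - Math.exp(-e.getValue()));
            }
            // Site trustworthiness: average confidence of the facts it states,
            // capped below 1 so that the logarithm above stays finite.
            for (Map.Entry<String, Set<String>> e : siteFacts.entrySet()) {
                double sum = 0.0;
                for (String fact : e.getValue()) {
                    sum += confidence.get(fact);
                }
                double avg = e.getValue().isEmpty() ? 0.0 : sum / e.getValue().size();
                trust.put(e.getKey(), Math.min(avg, 0.999999));
            }
        }
        return confidence;
    }
}
```

With two websites stating one author for a book and a third stating a different author, the fact backed by two sites ends up with the higher confidence, and those two sites in turn become more trusted than the third.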

SCOPE FOR FUTURE ENHANCEMENT


In real-time use, this project shows the best results for every search object. The user can download and upload data as required after registration. Larger data sets can be loaded into the database.

REFERENCES
http://www.java.sun.com
http://www.java2s.com
http://www.w3schools.com
http://www.microsoft.com
