PROJECT NAME
Project Coordinator: Dr T.V Ananthan
Guide Name: Golda selia
Batch: CSE C
Group members:
NISHIKANT (REG NO-91061101068 CSE-B IV yr)
SURAJ KUMAR (REG NO-91061101113 CSE-C IV yr)
UPENDRA KUMAR (REG NO-91061101114 CSE-C IV yr)
PROJECT TITLE
INTRODUCTION
Data mining is an important part of the knowledge discovery
process: it analyzes very large data sets and yields previously unknown, hidden, and useful information and knowledge.
A major objective of this project is to provide automated
query generation components so that casual users do not have to learn the query language in order to perform extraction.
In medical applications it will develop a tool that can help
METHODOLOGY
In this project we describe a novel approach for
information extraction in which extraction needs are expressed in the form of DATABASE QUERIES, which are evaluated and optimized by database systems.
Using database queries for information extraction
enables generic extraction and minimizes reprocessing of data: incremental extraction identifies which part of the data is affected by a change in components or goals, so only that part is processed again.
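The idea of incremental extraction can be illustrated with a minimal sketch. It assumes each processing component stores its output separately, so deploying a new version of one component recomputes only that component's output. The class `IncrementalStore`, the toy parser, and the two lexicon-based recognizers are hypothetical names introduced for illustration and are not part of the project.

```python
from typing import Callable, Dict, List


class IncrementalStore:
    """Keeps the stored output of every component, keyed by component name."""

    def __init__(self, sentences: List[str]) -> None:
        self.sentences = list(sentences)
        self.results: Dict[str, List[object]] = {}
        self.runs: Dict[str, int] = {}  # how many times each component ran

    def deploy(self, name: str, component: Callable[[str], object]) -> None:
        """Run (or rerun) one component; other stored outputs are left untouched."""
        self.results[name] = [component(s) for s in self.sentences]
        self.runs[name] = self.runs.get(name, 0) + 1


def toy_parser(sentence: str) -> List[str]:
    # Stand-in for an expensive syntactic parser (assumption).
    return sentence.split()


def make_ner(lexicon: set) -> Callable[[str], List[str]]:
    # Stand-in for a named entity recognizer driven by a word list (assumption).
    return lambda sentence: [w for w in sentence.split() if w in lexicon]


store = IncrementalStore(["aspirin inhibits cox1"])
store.deploy("parser", toy_parser)
store.deploy("ner", make_ner({"aspirin"}))
# A new recognizer is deployed: only the "ner" output is recomputed.
store.deploy("ner", make_ner({"aspirin", "cox1"}))
```

After the second recognizer is deployed, the parser has still run only once, which is the saving that incremental extraction aims for.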
The generated syntactic parse trees and semantic entity tags of the processed text are stored in a relational database, called the parse tree database (PTDB).
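As a rough sketch of how parse trees and entity tags could be kept in a relational table, the example below uses Python's built-in `sqlite3` module. The table name `nodes`, its columns, and the toy parse of "Aspirin inhibits COX1" are assumptions made for illustration; the actual PTDB schema is not given in these slides.

```python
import sqlite3


def build_ptdb(conn: sqlite3.Connection) -> None:
    # One row per parse tree node; parent_id links a node to its parent (assumed layout).
    conn.execute(
        """CREATE TABLE IF NOT EXISTS nodes (
               sent_id   INTEGER,
               node_id   INTEGER,
               parent_id INTEGER,
               label     TEXT,   -- syntactic label, e.g. S, NP, VP
               word      TEXT,   -- surface word for leaf nodes, else NULL
               entity    TEXT,   -- semantic entity tag, else NULL
               PRIMARY KEY (sent_id, node_id)
           )"""
    )


def store_parse(conn: sqlite3.Connection, sent_id: int, nodes: list) -> None:
    # Each node is (node_id, parent_id, label, word, entity).
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?, ?, ?, ?)",
        [(sent_id, *node) for node in nodes],
    )


conn = sqlite3.connect(":memory:")
build_ptdb(conn)
store_parse(conn, 1, [
    (0, None, "S", None, None),
    (1, 0, "NP", "Aspirin", "DRUG"),
    (2, 0, "VP", None, None),
    (3, 2, "V", "inhibits", None),
    (4, 2, "NP", "COX1", "PROTEIN"),
])

tagged = conn.execute(
    "SELECT word, entity FROM nodes WHERE entity IS NOT NULL ORDER BY node_id"
).fetchall()
```

Storing the parent link on every node lets later queries follow the tree structure with ordinary joins.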
Extraction Phase
Extraction is then achieved by issuing database queries to PTDB. To express extraction patterns, we designed and implemented a query language called parse tree query language (PTQL) that is suitable for generic extraction.
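To show what extraction by database query might look like, the sketch below expresses a "drug, verb, protein" pattern directly in SQL over a toy parse tree table. PTQL syntax is not shown in these slides, so plain SQL is used instead; the `nodes` table, its columns, and the sample sentences are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (sent_id INTEGER, node_id INTEGER, parent_id INTEGER, "
    "label TEXT, word TEXT, entity TEXT)"
)
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?, ?, ?)", [
    # Sentence 1: "Aspirin inhibits COX1"
    (1, 0, None, "S", None, None),
    (1, 1, 0, "NP", "Aspirin", "DRUG"),
    (1, 2, 0, "VP", None, None),
    (1, 3, 2, "V", "inhibits", None),
    (1, 4, 2, "NP", "COX1", "PROTEIN"),
    # Sentence 2: "Patients took aspirin" (the drug is not the subject)
    (2, 0, None, "S", None, None),
    (2, 1, 0, "NP", "Patients", None),
    (2, 2, 0, "VP", None, None),
    (2, 3, 2, "V", "took", None),
    (2, 4, 2, "NP", "aspirin", "DRUG"),
])

# Pattern: an NP tagged DRUG whose sibling VP contains a verb and an NP tagged PROTEIN.
PATTERN_SQL = """
SELECT subj.word, verb.word, obj.word
FROM nodes AS subj
JOIN nodes AS vp   ON vp.sent_id = subj.sent_id
                  AND vp.parent_id = subj.parent_id AND vp.label = 'VP'
JOIN nodes AS verb ON verb.sent_id = vp.sent_id
                  AND verb.parent_id = vp.node_id AND verb.label = 'V'
JOIN nodes AS obj  ON obj.sent_id = vp.sent_id
                  AND obj.parent_id = vp.node_id AND obj.label = 'NP'
WHERE subj.label = 'NP' AND subj.entity = 'DRUG' AND obj.entity = 'PROTEIN'
ORDER BY subj.sent_id
"""

relations = conn.execute(PATTERN_SQL).fetchall()
```

Only the first sentence matches, because the pattern requires the drug mention to sit beside the verb phrase rather than inside it.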
ARCHITECTURAL DIAGRAM
In the database module, we enter the database name (Medline), view the tables in that database, and select one table, drug, from the list to use in the project. The table is then added to the information retrieval (IR) engine, and pressing the process button processes the information in the selected table
SEARCH and CLEAR. It shows a pipeline of text processing modules that perform relationship extraction. These include:
Sentence splitting: identifies sentences from a paragraph of text.
Tokenization: identifies word tokens from sentences.
Named entity recognition: identifies mentions of entity types of interest.
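The three pipeline stages can be sketched in a few lines of Python. This is a simplified illustration, not the project's implementation: the regular expressions are naive, and the `LEXICON` dictionary is a hypothetical stand-in for a real named entity recognizer.

```python
import re

# Hypothetical entity lexicon; a real recognizer would be far more elaborate.
LEXICON = {"aspirin": "DRUG", "cox1": "PROTEIN"}


def split_sentences(text: str) -> list:
    # Split after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list:
    # Words are runs of word characters; each punctuation mark is its own token.
    return re.findall(r"\w+|[^\w\s]", sentence)


def tag_entities(tokens: list) -> list:
    # Keep only tokens found in the lexicon, paired with their entity type.
    return [(t, LEXICON[t.lower()]) for t in tokens if t.lower() in LEXICON]


text = "Aspirin inhibits COX1. It is widely used."
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
entities = [tag_entities(t) for t in tokens]
```

Each stage consumes the output of the previous one, which is why a change to one stage in a traditional pipeline forces everything after it to run again.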
MODULE 2
Parser module:
sentences, and obtains relationships based on a set of extraction patterns. The extraction patterns over parse trees can be expressed in the proposed parse tree query language.
3. PARSER MODULE
PTQL query and transforms it into keyword-based queries and SQL queries, which are evaluated by the underlying RDBMS and information retrieval (IR) engine. It provides automated query generation components so that casual users do not have to learn the query language in order to perform extraction.
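The translation step can be pictured with a small sketch that turns one extraction pattern into a keyword query (to narrow down documents) and a parameterized SQL query (to pull out the relation). The pattern here is a plain Python dictionary, not PTQL, whose syntax is not shown in these slides; the function names, the flat `nodes` table, and the sample documents are all assumptions.

```python
import sqlite3


def translate(pattern: dict):
    """Turn a hypothetical pattern dict into (keywords, sql, params)."""
    keywords = [pattern["verb"]]
    sql = (
        "SELECT s.word, v.word, o.word FROM nodes AS s "
        "JOIN nodes AS v ON v.doc_id = s.doc_id "
        "JOIN nodes AS o ON o.doc_id = s.doc_id "
        "WHERE s.doc_id = ? AND s.entity = ? AND v.word = ? AND o.entity = ?"
    )
    params = (pattern["subject_entity"], pattern["verb"], pattern["object_entity"])
    return keywords, sql, params


def keyword_search(docs: dict, keywords: list) -> list:
    # Toy stand-in for the IR engine: documents containing every keyword.
    return sorted(
        doc_id for doc_id, text in docs.items()
        if all(k in text.lower().split() for k in keywords)
    )


docs = {1: "Aspirin inhibits COX1", 2: "Patients took aspirin"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (doc_id INTEGER, word TEXT, entity TEXT)")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", [
    (1, "Aspirin", "DRUG"), (1, "inhibits", None), (1, "COX1", "PROTEIN"),
    (2, "Patients", None), (2, "took", None), (2, "aspirin", "DRUG"),
])

pattern = {"subject_entity": "DRUG", "verb": "inhibits", "object_entity": "PROTEIN"}
keywords, sql, params = translate(pattern)

hits = keyword_search(docs, keywords)  # documents worth querying at all
extracted = []
for doc_id in hits:
    extracted.extend(conn.execute(sql, (doc_id, *params)).fetchall())
```

Filtering by keyword first means the SQL query runs only on documents that can possibly match, which keeps the database work small.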
RESULT COMPARISON
Our experiments show that in the event of deployment
of a new module, our incremental extraction approach reduces the processing time by 89.64 percent as compared to a traditional pipeline approach. By applying our methods to a corpus of 17 million biomedical abstracts, our experiments show that the query performance is efficient for real-time applications. Our experiments also revealed that our approach achieves high quality extraction results.
THANK YOU