Assignment No. 03, Semester Fall 2011
CS614 - Data Warehousing
Asad Amanat (BC070400930)
Total Marks: 20    Due Date: 27/12/2011

Question 1: [ 13 marks ] How are FUZZY-FINGERPRINTING and LOCALITY-SENSITIVE HASHING used in this paper to search text? Which of these two hashing techniques is better in your view? Justify your answer with reasons.

Solution:

The paper introduces two quite different construction principles for hash-based indexing, derived from fuzzy-fingerprinting and locality-sensitive hashing (LSH), respectively. It analyzes both hashing approaches to show their applicability to the near-duplicate detection task and to the similarity search task, and compares them in terms of precision and recall. The results show that fuzzy-fingerprinting outperforms locality-sensitive hashing in near-duplicate detection, and that in the similarity search task fuzzy-fingerprinting achieves clearly higher precision than locality-sensitive hashing.

The two techniques are designed for different settings. Locality-sensitive hashing is a generic technique that handles many kinds of high-dimensional, vector-based object representations. Fuzzy-fingerprinting targets domains whose objects can be characterized by a small set of discriminative features.

In my opinion fuzzy-fingerprinting is the better technique. The paper's analysis starts from the theoretical similarity properties of the hash functions and then focuses on their practical implementation: it quantifies the relation between the determinants of fuzzy-fingerprinting and the retrieval performance achieved, which is exactly what is needed to build a hash index for a special-purpose retrieval task. Together with its stronger experimental results, this is why the authors apply fuzzy-fingerprinting as the main technology in their text-based plagiarism analysis.
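To make the contrast concrete, here is a minimal Python sketch of both hashing schemes. It is not the paper's implementation, and every name in it (`terms`, `expected_distribution`, `fuzzy_fingerprint`, `tf_vector`, `PStableLSH`) is hypothetical. The fuzzy-fingerprinting part assumes that a prefix class is simply a term's first letter, that the expected class distribution comes from a small reference corpus with add-one smoothing, and that each relative deviation is fuzzified into three intervals using an illustrative threshold of 0.5. The LSH part assumes the p-stable (Gaussian) projection family, h(x) = floor((a·x + b) / r), with k such functions concatenated into one bucket key; k, r and the seed are illustrative values. In both schemes, documents with equal keys land in the same hash bucket and become retrieval candidates for each other.

```python
import math
import random
import re
from collections import Counter
from string import ascii_lowercase


def terms(text):
    """Lower-cased alphabetic tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


# --- Fuzzy-fingerprinting (simplified; all parameters are hypothetical) ---

def expected_distribution(reference_texts):
    """Expected share of each prefix class, estimated from a reference corpus.

    A prefix class is simply a term's first letter here; add-one smoothing
    keeps every expected share above zero.
    """
    counts = Counter(t[0] for text in reference_texts for t in terms(text))
    total = sum(counts.values()) + len(ascii_lowercase)
    return [(counts[c] + 1) / total for c in ascii_lowercase]


def fuzzy_fingerprint(text, expected, threshold=0.5):
    """Fuzzified deviation of a document's prefix-class distribution.

    Each class maps to -1 (clearly below expected), 0 (about as expected)
    or +1 (clearly above). The resulting tuple is usable as a hash key.
    """
    tokens = terms(text)
    counts = Counter(t[0] for t in tokens)
    total = len(tokens) or 1
    fingerprint = []
    for c, exp in zip(ascii_lowercase, expected):
        dev = (counts[c] / total - exp) / exp  # relative deviation
        fingerprint.append(-1 if dev < -threshold else 1 if dev > threshold else 0)
    return tuple(fingerprint)


# --- Locality-sensitive hashing (p-stable projections; hypothetical k, r) ---

def tf_vector(text, vocabulary):
    """Term-frequency vector of a document over a fixed vocabulary."""
    counts = Counter(terms(text))
    return [counts[w] for w in vocabulary]


class PStableLSH:
    """k concatenated hash functions h(x) = floor((a . x + b) / r), a ~ N(0, 1)."""

    def __init__(self, dim, k=4, r=4.0, seed=0):
        rng = random.Random(seed)
        self.r = r
        self.funcs = [
            ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, r))
            for _ in range(k)
        ]

    def key(self, vec):
        """Bucket key: one interval index per random projection."""
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, vec)) + b) / self.r)
            for a, b in self.funcs
        )
```

In this sketch the fuzzy fingerprint depends only on 26 coarse features, so it stays short and stable under small edits, while the LSH key is computed from the full term vector, which is what lets LSH work with any high-dimensional representation.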
Question 2: [ 7 marks ] This paper discusses three fundamental text retrieval tasks where hash-based indexing can be applied: (i) grouping, (ii) similarity search, and (iii) classification.

Which task(s) are most suitable for text-based search retrieval? Provide reasons to support your answer.

Solution:

Grouping, similarity search, and classification are the three fundamental text retrieval tasks to which hash-based indexing can be applied. Grouping and classification are similar in their functionality: both are used to refine large result sets, prepare them for visual presentation, and clean them of duplicates. Classification works with a small number of classes.

Of the three, similarity search is the task best suited to text-based retrieval. Its most common form is the term query, which large search engines such as Google, Yahoo, and AltaVista use to satisfy this kind of information need. To identify similar documents, appropriate keywords are extracted from the query document, a number k of term queries is formed from these keywords, and the respective result sets are compared with the query document. The effort of this procedure can assume dramatic proportions, and the situation is aggravated if an application such as plagiarism analysis requires the query document to be segmented so that a similarity search is run for one paragraph at a time. The term query is therefore not as suitable for document-to-document similarity search as it is for short user queries. Hash-based indexing addresses this weakness: it improves both search quality and search efficiency, and it can be applied to a large document collection while storing only a very small amount of data. We can say that this scheme outperforms standard textual similarity search on the inverted-index representation in terms of both quality and efficiency.
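The term-query procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's setup: the stopword list, the keyword extractor (simply the most frequent terms), the way the k term queries are formed (disjoint chunks of the keyword list, each run as an AND query against an inverted index), and all function and parameter names (`term_query_search`, `k`, `terms_per_query`, `n_keywords`) are hypothetical. It shows where the cost comes from: every document returned by any of the k queries must still be compared with the full query document.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}


def terms(text):
    """Lower-cased alphabetic tokens without stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for t in set(terms(text)):
            index.setdefault(t, set()).add(doc_id)
    return index


def keywords(text, n):
    """Naive keyword extraction: the n most frequent terms."""
    return [t for t, _ in Counter(terms(text)).most_common(n)]


def cosine(a, b):
    """Cosine similarity of two texts' term-frequency vectors."""
    ca, cb = Counter(terms(a)), Counter(terms(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def term_query_search(query_doc, docs, index, k=2, terms_per_query=2, n_keywords=4):
    """Run k conjunctive term queries, then score every retrieved candidate."""
    kws = keywords(query_doc, n_keywords)
    # Form k term queries from disjoint chunks of the keyword list.
    queries = [kws[i * terms_per_query:(i + 1) * terms_per_query] for i in range(k)]
    candidates = set()
    for q in queries:
        if q:
            # AND query: documents that contain every term of q.
            candidates |= set.intersection(*(index.get(t, set()) for t in q))
    # Each candidate must still be compared with the whole query document.
    scored = sorted(((cosine(query_doc, docs[d]), d) for d in candidates), reverse=True)
    return scored, len(candidates)
```

When plagiarism analysis segments the query document, this whole procedure runs once per paragraph, so the number of queries and candidate comparisons grows with the number of segments; a hash-based index instead retrieves candidates by looking up each segment's hash key.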