Assessment - 3

CSE 3024: Web Mining          Slot: L13 + L14, L45 + L46

Online Submission Deadline: 11th March 2020

TF-IDF, SNA, Page Rank and HITS

[3 + 2 + 3 + 2]

• Upload your code and results as a single PDF file in VTOP.
• The file should contain:
    • Question
    • Input data
    • Code
    • Result / output screen
_________________________________________________________________________

1. Write a program to extract the contents (excluding any tags) from the following five
websites:
https://en.wikipedia.org/wiki/Web_mining
https://en.wikipedia.org/wiki/Data_mining
https://en.wikipedia.org/wiki/Artificial_intelligence
https://en.wikipedia.org/wiki/Machine_learning
https://en.wikipedia.org/wiki/Mining
Save the content in five separate .doc files. Using a vector space model, perform
the following operations for the query “Mining of large data”:
• Bag-of-Words (Document set)
• TF (Document set)
• IDF (Document set)
• TF-IDF (Document set)
• TF-IDF (Query)
• Normalized (Query)
• Normalized TF-IDF (Document set)
• Cosine Similarity
• Euclidean Distance
• Document Ranking (Display Order)
• Document Similarity (Among Documents)
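
The vector space operations listed above can be sketched as follows. This is only a minimal illustration, not a model answer: the whitespace tokenizer, the raw-count TF, the log10(N / df) IDF variant, and all function names are assumptions, and the three short strings in the usage note stand in for the extracted page text.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def tf_idf_vectors(docs):
    """Return (vocabulary, idf, TF-IDF vectors) for a list of document strings."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = {w: sum(1 for doc in tokenized if w in doc) for w in vocab}
    # Assumed IDF variant: log10(N / df); other variants are equally valid.
    idf = {w: math.log10(n_docs / df[w]) for w in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)  # raw term frequency (bag of words)
        vectors.append([counts[w] * idf[w] for w in vocab])
    return vocab, idf, vectors

def query_vector(query, vocab, idf):
    """TF-IDF vector of the query over the document vocabulary."""
    counts = Counter(tokenize(query))
    return [counts[w] * idf[w] for w in vocab]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank(query, docs):
    """Document indices ordered by decreasing cosine similarity to the query."""
    vocab, idf, vectors = tf_idf_vectors(docs)
    q = query_vector(query, vocab, idf)
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]
```

For example, `rank("Mining of large data", ["mining of large data sets", "coal mining in large pits", "neural networks learn"])` orders the documents by how many informative query terms they share. Document-to-document similarity can reuse `cosine` on pairs of document vectors.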

2. Compute the different types of centrality (degree, betweenness, closeness) and prestige
(degree, proximity) for the graph dataset available at the following link:
http://snap.stanford.edu/data/wiki-Vote.txt.gz
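
As a starting point for the degree-based and closeness measures, a small sketch is given below. It assumes the dataset is an edge list with one "source target" pair per line and "#" comment lines, and it uses one common convention: out-degree / (n - 1) for degree centrality, in-degree / (n - 1) for degree prestige, and reached nodes divided by the sum of BFS distances for closeness. Betweenness and proximity prestige are not covered here, and all function names are hypothetical.

```python
from collections import defaultdict, deque

def parse_edges(lines):
    """Parse 'src dst' pairs, skipping blank lines and '#' comment lines."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        edges.append((int(src), int(dst)))
    return edges

def build_graph(edges):
    """Return (nodes, successors, predecessors); self-loops are ignored."""
    nodes, succ, pred = set(), defaultdict(set), defaultdict(set)
    for src, dst in edges:
        nodes.update((src, dst))
        if src != dst:
            succ[src].add(dst)
            pred[dst].add(src)
    return nodes, succ, pred

def degree_centrality(nodes, succ):
    """Out-degree divided by (n - 1)."""
    scale = max(len(nodes) - 1, 1)
    return {n: len(succ.get(n, ())) / scale for n in nodes}

def degree_prestige(nodes, pred):
    """In-degree divided by (n - 1)."""
    scale = max(len(nodes) - 1, 1)
    return {n: len(pred.get(n, ())) / scale for n in nodes}

def closeness(succ, source):
    """Reached nodes / sum of BFS distances along out-links (0.0 if none reached)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in succ.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0
```

With the real file, the lines could be read through `gzip.open(path, "rt")` and passed to `parse_edges`; the path is left to the reader.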

3. Write a program to display the page rank of the given directed graph, which represents
a web of six pages, with a damping factor of 0.9. The input to the program must be the
adjacency matrix or adjacency list of the given web graph, along with the damping factor
and threshold value (stopping criterion: ε = 0.05). The program must print the result
after each of the following steps:

a. Handling the nodes with no outgoing links
b. Stochastic matrix formation
c. Page rank of all the seven nodes after each iteration
d. Total number of iterations until the stopping criterion is met
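
Steps a to d can be sketched as below. The web graph figure is not reproduced in this text, so the adjacency list in the usage note is hypothetical. Two further assumptions are made: nodes with no outgoing links are treated as linking to every page with equal probability, and iteration stops when the sum of absolute rank changes falls below ε. The function names are illustrative only.

```python
def stochastic_matrix(adj, n):
    """Row-stochastic transition matrix built from an adjacency list.

    Assumption: a node with no outgoing links (dangling node) is treated
    as linking to all n pages with equal probability 1/n.
    """
    matrix = []
    for i in range(n):
        targets = set(adj.get(i, []))
        if not targets:
            matrix.append([1.0 / n] * n)
        else:
            weight = 1.0 / len(targets)
            matrix.append([weight if j in targets else 0.0 for j in range(n)])
    return matrix

def pagerank(adj, n, damping=0.9, eps=0.05):
    """Power iteration; returns (ranks, number of iterations)."""
    matrix = stochastic_matrix(adj, n)
    ranks = [1.0 / n] * n
    iterations = 0
    while True:
        iterations += 1
        new_ranks = [
            (1.0 - damping) / n
            + damping * sum(ranks[i] * matrix[i][j] for i in range(n))
            for j in range(n)
        ]
        # Assumed stopping rule: total absolute change below eps.
        delta = sum(abs(a - b) for a, b in zip(new_ranks, ranks))
        ranks = new_ranks
        print(f"iteration {iterations}: {[round(r, 4) for r in ranks]}")
        if delta < eps:
            return ranks, iterations
```

A call such as `pagerank({0: [1, 2], 1: [2], 2: [0], 3: []}, 4, damping=0.9, eps=0.05)` prints the ranks after every iteration and returns the final ranks together with the iteration count. Node 3 in this made-up graph is a dangling node.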

4. Write a program to implement the HITS algorithm for the graph shown in Question No. 3
and display the final authority score and hub score of all the nodes once the stopping
criterion is met. (Note: Use the same criterion as given for Question No. 3.)
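
One possible shape for the HITS iteration is sketched below. It assumes scores start at 1.0, are normalized by their Euclidean norm after every update, and that iteration stops when the total absolute change in both score vectors drops below ε. The `max_iter` cap and the function names are additions for illustration, and the graph in the usage note is hypothetical.

```python
import math

def normalize(values):
    """Scale a vector to unit Euclidean length (unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm else list(values)

def hits(adj, n, eps=0.05, max_iter=100):
    """Return (authority scores, hub scores, iterations used)."""
    hubs = [1.0] * n
    auths = [1.0] * n
    for iteration in range(1, max_iter + 1):
        # Authority update: sum of hub scores of pages linking in.
        new_auths = [0.0] * n
        for src, targets in adj.items():
            for dst in set(targets):
                new_auths[dst] += hubs[src]
        # Hub update: sum of (updated) authority scores of pages linked to.
        new_hubs = [
            sum(new_auths[dst] for dst in set(adj.get(src, [])))
            for src in range(n)
        ]
        new_auths = normalize(new_auths)
        new_hubs = normalize(new_hubs)
        # Assumed stopping rule: total absolute change below eps.
        delta = sum(abs(a - b) for a, b in zip(new_auths, auths))
        delta += sum(abs(a - b) for a, b in zip(new_hubs, hubs))
        auths, hubs = new_auths, new_hubs
        if delta < eps:
            return auths, hubs, iteration
    return auths, hubs, max_iter
```

For the toy graph `{0: [1, 2], 1: [2], 2: []}`, node 2 receives the most links and ends with the highest authority score, while node 0 links out the most and ends with the highest hub score.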
