[3 + 2 + 3 + 2]
1. Write a program to extract the text content (excluding all HTML tags) from the following five web pages:
https://en.wikipedia.org/wiki/Web_mining
https://en.wikipedia.org/wiki/Data_mining
https://en.wikipedia.org/wiki/Artificial_intelligence
https://en.wikipedia.org/wiki/Machine_learning
https://en.wikipedia.org/wiki/Mining
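A minimal sketch of the extraction step, using only Python's standard library (`urllib.request` and `html.parser`). It assumes that the text inside `<script>` and `<style>` elements should be dropped along with the tags, and it joins text fragments with single spaces. The names `TextExtractor`, `html_to_text` and `fetch_and_save` are hypothetical. `fetch_and_save` writes plain UTF-8 text; giving the file a `.doc` extension does not make it a Microsoft Word document, so a genuine Word file would need a separate library (for example python-docx, which writes `.docx`).

```python
import re
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the bodies of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Strip all tags and collapse runs of whitespace into single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def fetch_and_save(url, path):
    """Download a page and write its tag-free text to `path` as plain text."""
    # Wikipedia may reject requests that carry no User-Agent header.
    req = urllib.request.Request(url, headers={"User-Agent": "web-mining-lab/0.1"})
    with urllib.request.urlopen(req) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    with open(path, "w", encoding="utf-8") as f:
        f.write(html_to_text(html))
```

For example, `fetch_and_save("https://en.wikipedia.org/wiki/Web_mining", "doc1.doc")` would save the first page, and looping over the five URLs produces the five files. Because fragments are joined with spaces, a word split across inline tags (such as `min<b>ing</b>`) comes out as two tokens.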
Save the content in five separate .doc files. Using the vector space model, perform the following operations for the query “Mining of large data”:
Bag-of-Words (Document set)
TF (Document set)
IDF (Document set)
TF-IDF (Document set)
TF-IDF (Query)
Normalized TF-IDF (Query)
Normalized TF-IDF (Document set)
Cosine Similarity
Euclidean Distance
Document Ranking (Display Order)
Document Similarity (Among Documents)
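The operations listed above can be sketched as a single Python function over plain dictionaries. Several choices are assumptions, since the task does not fix them: TF is the raw count divided by document length, IDF is `ln(N / df)` computed over the document set only (query terms absent from every document get IDF 0), the small `STOP_WORDS` list is hypothetical (it removes "of" from the query), normalization is L2, and the Euclidean distance is taken between the normalized vectors. The function names are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the assignment does not specify one.
STOP_WORDS = {"of", "the", "a", "an", "and", "in", "is", "to"}


def tokenize(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tf(counts):
    # Term frequency as count / document length (one of several common variants).
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def idf(bags, vocab):
    # idf(t) = ln(N / df(t)) over the document set.
    n = len(bags)
    df = {t: sum(1 for b in bags if t in b) for t in vocab}
    return {t: math.log(n / df[t]) for t in vocab}


def normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else dict(vec)


def cosine(u, v):
    # Both vectors are assumed to be L2-normalized already.
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def euclidean(u, v):
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in keys))


def vsm(docs, query):
    bags = [Counter(tokenize(d)) for d in docs]             # Bag-of-Words
    vocab = sorted(set().union(*bags))
    tfs = [tf(b) for b in bags]                             # TF
    idfs = idf(bags, vocab)                                 # IDF
    doc_tfidf = [{t: w * idfs[t] for t, w in d.items()} for d in tfs]
    q_tfidf = {t: w * idfs.get(t, 0.0)
               for t, w in tf(Counter(tokenize(query))).items()}
    q_norm = normalize(q_tfidf)
    doc_norm = [normalize(d) for d in doc_tfidf]
    cos = [cosine(q_norm, d) for d in doc_norm]
    dist = [euclidean(q_norm, d) for d in doc_norm]
    ranking = sorted(range(len(docs)), key=lambda i: cos[i], reverse=True)
    doc_sim = [[cosine(a, b) for b in doc_norm] for a in doc_norm]
    return {"bow": bags, "tf": tfs, "idf": idfs, "tfidf": doc_tfidf,
            "query_tfidf": q_tfidf, "query_norm": q_norm,
            "doc_norm": doc_norm, "cosine": cos, "euclidean": dist,
            "ranking": ranking, "doc_sim": doc_sim}
```

Calling `vsm(texts, "Mining of large data")` with the five extracted texts returns every intermediate table; `ranking` lists document indices by decreasing cosine similarity. For unit-length vectors the squared Euclidean distance equals 2 minus twice the cosine, so ranking by increasing distance gives the same order.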
2. Compute the different types of centrality (degree, betweenness, closeness) and prestige (degree, proximity) for the graph dataset available at the following link:
http://snap.stanford.edu/data/wiki-Vote.txt.gz
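A standard-library sketch of these measures, under hypothetical choices: centrality is computed on the undirected version of the vote graph and prestige on the directed graph, reading each line `u v` of the file as a link from u to v. Closeness uses only the nodes reachable from each node, since the graph need not be connected. Proximity prestige is taken as (|I_i| / (n - 1)) divided by the average distance to i from the nodes in I_i, the set of nodes that can reach i. Betweenness uses Brandes' algorithm, normalized by (n - 1)(n - 2) / 2. All function names are illustrative.

```python
import gzip
from collections import deque


def load_edges(path="wiki-Vote.txt.gz"):
    """Read 'FromNodeId ToNodeId' pairs, skipping '#' comment lines."""
    with gzip.open(path, "rt") as f:
        return [tuple(map(int, line.split())) for line in f
                if line.strip() and not line.startswith("#")]


def build(edges):
    """Return (in-link lists, undirected adjacency), ignoring self-loops."""
    nodes = {x for e in edges for x in e}
    in_adj = {v: set() for v in nodes}
    und = {v: set() for v in nodes}
    for u, v in edges:
        if u != v:
            in_adj[v].add(u)
            und[u].add(v)
            und[v].add(u)
    return in_adj, und


def bfs(adj, s):
    """Hop distances from s to every node reachable along adj."""
    dist, queue = {s: 0}, deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def betweenness(adj):
    """Brandes' algorithm for an undirected, unweighted graph (normalized)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma, dist = dict.fromkeys(adj, 0), dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Each pair is counted twice; dividing by (n-1)(n-2) halves the sum and
    # normalizes by the (n-1)(n-2)/2 pairs that exclude the node itself.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: b * scale for v, b in bc.items()}


def measures(edges):
    in_adj, und = build(edges)
    n = len(und)
    closeness, proximity = {}, {}
    for v in und:
        d = bfs(und, v)                      # undirected view for centrality
        total = sum(d.values())
        closeness[v] = (len(d) - 1) / total if total else 0.0
        d_in = bfs(in_adj, v)                # reversed edges: who can reach v
        reach, total_in = len(d_in) - 1, sum(d_in.values())
        proximity[v] = (reach / (n - 1)) / (total_in / reach) if reach else 0.0
    return {"degree": {v: len(und[v]) / (n - 1) for v in und},
            "closeness": closeness,
            "betweenness": betweenness(und),
            "degree_prestige": {v: len(in_adj[v]) / (n - 1) for v in in_adj},
            "proximity_prestige": proximity}
```

With the file downloaded into the working directory, `measures(load_edges())` computes all five tables. Exact betweenness on the roughly 7,000 nodes of this graph will be slow in pure Python; if NetworkX is available, `networkx.betweenness_centrality(G, k=...)` estimates it from a sample of k source nodes.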
3. Write a program to display the PageRank of each page in the given directed graph, which represents a web of six pages, using a damping factor of 0.9. The input to the program must be the adjacency matrix or adjacency list of the web graph, along with the damping factor.
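The directed graph of six pages is not reproduced in this text, so the sketch below runs on a hypothetical six-page adjacency list, `EXAMPLE_GRAPH`. It uses the power-iteration form PR(i) = (1 - d)/n + d * sum over j -> i of PR(j)/out(j), with d = 0.9, spreads the rank of pages without out-links evenly over all pages (an assumption), and stops when the L1 change between iterations drops below `tol` (an assumed stopping criterion). The hypothetical helper `matrix_to_list` converts an adjacency-matrix input in which entry [i][j] is nonzero when page i links to page j.

```python
def matrix_to_list(matrix, labels=None):
    """Convert an adjacency matrix (row i links to column j) to an adjacency list."""
    labels = labels or list(range(len(matrix)))
    return {labels[i]: [labels[j] for j, x in enumerate(row) if x]
            for i, row in enumerate(matrix)}


def pagerank(adj, d=0.9, tol=1e-8, max_iter=1000):
    """Power iteration; every link target must also be a key of `adj`."""
    nodes = list(adj)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by pages with no out-links is spread evenly (assumption).
        dangling = sum(pr[v] for v in nodes if not adj[v])
        new = {v: (1 - d) / n + d * dangling / n for v in nodes}
        for u in nodes:
            for v in adj[u]:
                new[v] += d * pr[u] / len(adj[u])
        change = sum(abs(new[v] - pr[v]) for v in nodes)
        pr = new
        if change < tol:  # assumed stopping criterion
            break
    return pr


# Hypothetical six-page web; the graph in the assignment is not reproduced here.
EXAMPLE_GRAPH = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
    "E": ["D", "F"],
    "F": ["E", "C"],
}

if __name__ == "__main__":
    for page, score in sorted(pagerank(EXAMPLE_GRAPH, d=0.9).items()):
        print(f"{page}: {score:.4f}")
```

This normalized form keeps the scores summing to 1; some textbooks use the unnormalized variant (1 - d) + d * sum(...), which scales every score by n but yields the same ordering.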
4. Write a program to implement the HITS algorithm for the graph shown in Question No. 3 and display the final authority and hub scores of all nodes once the stopping criterion is met. (Note: use the same stopping criterion as in Question No. 3.)
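A sketch of the HITS iteration on an adjacency list. Each round sets every authority score to the sum of the hub scores of the pages linking to it, then every hub score to the sum of the new authority scores of the pages it links to, and L2-normalizes both vectors. Because the stopping criterion of Question No. 3 is not stated in this text, the sketch assumes iteration stops once the total L1 change in both vectors falls below `tol`; `hits`, `_normalize` and the example graph are hypothetical.

```python
import math


def _normalize(scores):
    """Scale a score vector to unit L2 norm (left unchanged if all zero)."""
    norm = math.sqrt(sum(s * s for s in scores.values()))
    return {k: s / norm for k, s in scores.items()} if norm else dict(scores)


def hits(adj, tol=1e-8, max_iter=1000):
    """Return (authority, hub) dictionaries for an adjacency list."""
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    auth = dict.fromkeys(nodes, 1.0)
    hub = dict.fromkeys(nodes, 1.0)
    for _ in range(max_iter):
        # Authority of i: sum of hub scores of the pages that link to i.
        new_auth = dict.fromkeys(nodes, 0.0)
        for u, targets in adj.items():
            for v in targets:
                new_auth[v] += hub[u]
        # Hub of i: sum of the new authority scores of the pages i links to.
        new_hub = {u: sum(new_auth[v] for v in adj.get(u, ())) for u in nodes}
        new_auth, new_hub = _normalize(new_auth), _normalize(new_hub)
        change = sum(abs(new_auth[v] - auth[v]) + abs(new_hub[v] - hub[v])
                     for v in nodes)
        auth, hub = new_auth, new_hub
        if change < tol:  # assumed stopping criterion
            break
    return auth, hub


if __name__ == "__main__":
    # Hypothetical graph; substitute the six-page graph from Question No. 3.
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"],
             "D": ["C"], "E": ["D", "F"], "F": ["E", "C"]}
    authority, hub = hits(graph)
    for page in sorted(graph):
        print(f"{page}: authority={authority[page]:.4f} hub={hub[page]:.4f}")
```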