
A Recommender Evaluation: Towards a


Production Recommender System

A Report Submitted
In Partial Fulfillment of the Requirements for the Degree of
Bachelor of Technology in Computer Science & Engineering
Rishabh Tyagi (11-1-5-098)
Nikhil Kharode (11-1-5-001)
Nabajyoti Hazarika (11-1-5-088)
Rajiv Mandal (10-1-5-038)

Under The Guidance of

Prabhakar Sarma Neog

Department of Computer Science & Engineering


NATIONAL INSTITUTE OF TECHNOLOGY SILCHAR
2015

DECLARATION

Report Title: A Recommender Evaluation: Towards a Production


Recommender System
Degree for which the Report is submitted: Bachelor of Technology
I declare that the presented report represents largely my own ideas and
work in my own words. Where others' ideas or words have been
included, I have adequately cited and listed them in the reference materials.
The report has been prepared without resorting to plagiarism. I have
adhered to all principles of academic honesty and integrity. No falsified
or fabricated data have been presented in the report. I understand that
any violation of the above will be cause for disciplinary action by the
Institute, including revoking the conferred degree, if conferred, and can
also invoke penal action from the sources which have not been properly
cited or from whom proper permission has not been taken.

...........................................
Date:

Rishabh Tyagi (11-1-5-098)


Nikhil Kharode (11-1-5-001)
Nabajyoti Hazarika (11-1-5-088)
Rajiv Mandal (10-1-5-038)

CERTIFICATE

This is to certify that the project work entitled "A Recommender Evaluation:
Towards a Production Recommender System" submitted by Rishabh
Tyagi (11-1-5-098), Nikhil Kharode (11-1-5-001), Nabajyoti Hazarika
(11-1-5-088) and Rajiv Mandal (10-1-5-038) in partial fulfillment of the award of
the degree of Bachelor of Technology in Computer Science & Engineering at
National Institute of Technology, Silchar, was done under the guidance and
supervision of Prabhakar Sarma Neog. The matter presented in the report
has not been submitted for the award of any other degree of this or any
other institute/university. I wish them all success in life.

.........................................

Prabhakar Sarma Neog


Department of Computer Science and Engineering

Date

ABSTRACT

Designing a recommender system involves many considerations. Apart from
producing accurate recommendations, a production-quality recommender
system must be able to work in real time under constraints such as memory,
latency and so on. The structure of the data set and the availability of
domain-specific information also play a major part in choosing the appropriate
algorithms and their implementations. We have conducted many experiments to
study the application and outcome of various assemblies of algorithms towards
the design of a recommender system for a given domain.


ACKNOWLEDGMENTS

We take this occasion to render our deep sense of gratitude and tribute to our
supervisor, Prabhakar Sarma Neog, Assistant Professor, for his constant and
valuable guidance in the truest sense throughout the course of the work. It was
his encouragement and support from the initial to the final level that enabled us
to develop an understanding of the subject. Every time we had a problem, we
rushed to him for advice, and he never let us down. His timely suggestions
helped us to circumvent all sorts of hurdles that we faced throughout our
work. We are deeply indebted to him for his inspiration, motivation and guidance.

We wish to acknowledge the continuous support and blessings of our parents
and family, which made this report possible. Although they were physically far
away from us, their immense faith and good wishes are gratefully acknowledged.

Finally, we believe this research experience will greatly benefit our
careers in the future.

Rishabh Tyagi
Nikhil Kharode
Nabajyoti Hazarika
Rajiv Mandal


Contents

ABSTRACT
ACKNOWLEDGMENTS
List of Figures
List of Tables

1 Introduction
  1.1 Motivation
  1.2 Objective

2 Literature Review

3 Background
  3.1 Content-based Recommendation
  3.2 Collaborative Filtering (CF)
    3.2.1 Memory-based Collaborative Filtering
    3.2.2 Model-based Collaborative Filtering
  3.3 Exploring Similarity Measures
  3.4 User neighborhood

4 Issues in Recommender System
  4.1 Cold-start problem
  4.2 Evaluation
  4.3 Privacy and Trust
  4.4 Trust-based Recommender Systems

5 Singular Value Decomposition (SVD++) and Latent Factors
  5.1 Baseline Estimates
  5.2 Latent Factor Model
  5.3 Model Learning

6 Data sets
  6.1 MovieLense - MovieLens Dataset (100k)
  6.2 Jester data set
  6.3 MS Web data set

7 Implementation
  7.1 About Apache Mahout
  7.2 Introduction to recommendation in Apache Mahout
  7.3 Installation of Apache Mahout
    7.3.1 Java and IDE
    7.3.2 Installing Maven
    7.3.3 Building a Recommender Engine

8 Evaluation Metrics
  8.1 Predictive Accuracy Metrics
  8.2 Classification Accuracy Metrics
  8.3 Rank Accuracy Metrics

9 Results and Explanation

10 Conclusion

A Netflix Prize Competition

List of Figures

3.1 Ted.com uses a top-k item recommendation approach to rank items
3.2 User-based collaborative filtering
5.1 MovieLense dataset
5.2 Jester dataset
5.3 MSweb dataset
7.1 Diagrammatic representation of precision-recall
8.1 User-based vs. item-based with varying training data set size
8.2 Evaluation for similarity methods
8.3 Evaluation for nearest neighborhood
8.4 Evaluation for threshold neighborhood
8.5 Performance of UBCF and IBCF against Jester
8.6 Performance of UBCF and IBCF against Jester
8.7 Performance of UBCF and IBCF against MSWeb
8.8 Performance of UBCF and IBCF against MSWeb

List of Tables

3.1 Movie rating scenario, user ratings on a 1-5 scale
3.2 Item-based collaborative filtering
3.3 Five users have given ratings (from 1 to 5) on a three-item set
3.4 Rearranging the rating table from lower to higher rank
3.5 Correlation of user 1 with other users
3.6 Tanimoto similarity of other users to user 1
6.1 Recommender System software freely available for research
7.1 Confusion matrix of two classes when considering the retrieval of documents
8.1 Comparison of different similarity metrics using nearest neighborhood
8.2 Comparison of different similarity metrics using threshold neighborhood

Chapter 1
Introduction
1.1 Motivation
Today we are living in the era of the Internet and digitization. Together, these
are unifying the world across boundaries. Digital media and technology is one
of the fastest growing fields in the world. It has changed the way we do
just about everything. Due to this digital revolution and web technologies, the
amount of information in the world is increasing with high volume, velocity and
variety. Processing this overloaded data is very necessary due to the limited
intake capacity of humans and the limited time available to make decisions.
Recommender systems try to process this overloaded data in a way personalized
to the particular interests of a user. Put another way, recommender systems are
intelligent systems which can capture the decisions of some available users to
help a large community of other users make their own decisions quickly. In this
way they also help users locate possible items of interest more quickly by
filtering and ranking them in a personalized way. Some of these systems provide
the end user not only with such a personalized item list but also with an
explanation which describes why a specific item is recommended and why the
system supposes that the user will like it [1]. We feel this field has both the
power of research and the potential to support people.
In recent years the popularity of recommender systems has been increasing day
by day, as they are incorporated into a variety of applications such as movies,
music [2], news, books, research articles [3], search queries, social tags [4],
and products in general. Nowadays recommender systems are also applied to
experts [5], jokes, restaurants [6], financial services, life insurance, persons
(online dating), and Twitter [7].
Another accelerator for research in the field of recommender systems was the
Netflix Prize competition, started in October 2006. The competition was held by
Netflix, an online DVD rental service, and sought to improve the accuracy of
predictions about how much someone is going to like a movie based on their
preferences. Many scientists, students, engineers and enthusiasts were attracted
by the freely accessible large-scale data set and the publicly announced
$1,000,000 prize money for the winner of the competition [8].

1.2 Objective
Our first objective is to study the current research trends in recommender
systems. We investigate the existing recommendation algorithms, which are
mainly collaborative filtering methods, along with their mechanisms. For
experimentation we developed some tools and techniques. We analyzed the
similarity methods mathematically and implemented them in recommendation
algorithms. These analyses were done to find an effective method which may
fit a given problem domain.
After implementing the recommendation algorithms we obtained some prediction
results. To measure the accuracy of our predictions we evaluated them using
some prediction-based as well as information retrieval (IR) methods. So our
second objective is the evaluation of traditional similarity algorithms.

Chapter 2
Literature Review
Herbert Simon, in his book "The Sciences of the Artificial", mentioned the
necessity of recommender systems as follows: "As of the mid-1990s the
lesson has still not been learned. An information superhighway is proclaimed
without any concern about the traffic jams it can produce or the
parking spaces it will require. Nothing in the new technology increases the
number of hours in the day or the capacities of human beings to absorb
information. The real design problem is not to provide more information to
people but to allocate the time they have available for receiving information
so that they will get only the information that is most important and relevant
to the decisions they will make. The task is not to design information-distributing
systems but intelligent information-filtering systems." [9; 10]
In the year 1992, Goldberg et al. [11] built an experimental mail system
named Tapestry. Their main aim was to tackle the overloaded amount of
documents in electronic mail systems, which were hugely popular at that time.
Other mail systems at that time used the contents of a document as a measure
for filtering. But Goldberg and his colleagues used a new method, alongside
content-based filtering, in which human judgment is involved. They named that
method collaborative filtering. They defined it as follows: "Collaborative
filtering simply means that people collaborate to help one another perform
filtering by recording their reactions to documents they read." However, other
names were also suggested in the beginning of recommender system research,
such as social filtering [12] and social information filtering [13]. As mentioned,
the more general term recommender system was coined by Resnick and
Varian in 1997 and it subsequently became the most popular term for these
systems [10]. The Tapestry system has a limitation, as it was designed for small
workgroups whose members are more or less known to each other.

GroupLens [14; 15] first introduced an automated collaborative
filtering system using a neighborhood-based algorithm. GroupLens
provided personalized predictions for Usenet news articles. To find
similar users they used several similarity methods, such as Pearson
correlations. The Ringo music recommender [13] and the Bellcore Video
Recommender [12] expanded upon the original GroupLens algorithm.
Shardanand and Maes (1995) tested four possible algorithms in Ringo. They
set aside 20% of the ratings in each profile to form a target set (ratings that
they attempted to predict) while the remaining 80% constituted the source set.
The criteria for evaluation were the mean absolute error (to be minimized), the
standard deviation of the errors (also to be minimized), and T, the percentage
of target values for which the algorithm can compute predictions (to be
maximized). The base case against which the algorithms were compared was
the average of all ratings received by an artist in the data set (used as a proxy
for the predicted score for the artist) [16].
The four algorithms that Shardanand and Maes (1995) evaluated were the
mean squared differences algorithm, the Pearson algorithm (using the Pearson r
correlation coefficient to measure similarity between profiles), the constrained
Pearson r algorithm, and the artist-artist algorithm. The first three are what
today would be called user-based collaborative filtering, while the last
represents an item-based collaborative filtering approach. The best algorithm
overall, considering both accuracy and the percentage of target values that can
be predicted, was the constrained Pearson r. Incidentally, the Pearson r
correlation coefficient used by GroupLens was not very efficient according to
the tests by Shardanand and Maes (1995) [16].
In the year 1997, "Fab" [6] was created at Stanford University; it is
a hybrid recommendation system for web content. The authors define
that in Fab a document is recommended to a user u either
because it corresponds to the profile of user u (content-based
filtering) or because it has been appreciated by a user who has a
similar profile to u (collaborative filtering). Fab uses the vector space
model (Salton and McGill, 1983) as the representation of items.
DailyLearner [17] is a news recommender service which came into the
picture in the year 1998. It is based on a centralized content-based
recommender system called Adaptive Information Server (AIS), also
developed by the authors. Web users can, after reading a news article,
send feedback to the system using three classes: "interesting", "not interesting"
and "known". Moreover, if a user requests more information on a news item,
that item is automatically classified as "interesting". [16]
Herlocker (2000) describes a recommender system as a system that
predicts which items a user will find interesting or useful. This definition of a
recommender system is broader than the definition of Resnick and Varian;
it does not imply that opinions of other people have to be used to
recommend items; it also allows recommender systems to use other
mechanisms to predict what users find interesting [18].
As research efforts on recommender systems started to take place in
academia, e-commerce sites, led by the pioneering efforts of Amazon.com,
which went online in 1995, also saw their potential and started to implement
them to help their users find items of interest out of the huge number of items
available [10].

In the survey of [19] it is found that the authors of [20] first proposed the
prediction of missing values in the U-I matrix by using the side information of
items (e.g., the title and genre of a movie) and then deploying user-based CF
to generate recommendations. [21] focused on tags as side information.
Geotags have also been used to extend memory-based CF for personalized
location prediction [22], restaurant recommendation [23], etc. Tagommenders
[24] were proposed as a group of tag-based recommendation algorithms that
utilize the inferred preferences for tags to predict users' preferences for items.
Social networks can naturally provide a neighborhood for each user, which can
be used as a replacement for a similarity-based neighborhood. [25] gives the
example of trustees in a trust network. TidalTrust [26] and MoleTrust [27] are
the two most famous models that predict a user's rating on an item by
aggregating the ratings from the user's trustees.

Chapter 3
Background
Generation of information is boundless these days. Though it is helpful, in
some cases people feel more pain than gain. To overcome the pain caused by
the countless information around us, people take the help of machines, and
machine learning tools have been built to assist them. Content-based
techniques deal with the contents of documents and produce a categorized
list out of them. This makes it somewhat easier to deal with large amounts of
data. Search engines, where a query can be issued to find some information,
are examples of content-based filtering. Another solution related to the
content-based approach is top-k recommendation. In this approach a list is
maintained of the most popular items, common to all users. For example,
www.ted.com is a website where the most popular talks can be found in a
top-k list. Users can sort items based on different criteria, such as overall
popularity (most viewed), popularity in the past week (most emailed this week),
or popularity in the past month (most popular this month), among others [28].
A figure of this system is given (Figure 3.1). The main problem is that in both
cases these recommendations are not customized to a user's interests.
In the mid-1990s the first paper on collaborative filtering appeared; the field
later became better known as recommender systems. Traditional collaborative
filtering techniques take earlier user-to-item relations, in the form of ratings,
to predict the future interest of those users or others. Content-based filtering
methodologies use some type of similarity score to match a query describing
the content with the individual titles or items, and then present the user with a
ranked list of suggestions [29]. On the other hand, collaborative filtering
methodologies do not use any information regarding the actual content (e.g.,
words, authors, description) of the items, but are rather based on the usage or
preference patterns of other users [30; 29]. Collaborative filtering is based on
a user-item matrix data structure, containing users, items and their rating
scores.

Next we visit various techniques used in recommender systems under the
following headings:
1. Content based
2. Collaborative
(a) Memory based
i. User based
ii. Item based
(b) Model based

3.1 Content-based Recommendation

The problem of recommending items from some fixed database has
been studied extensively, and two main paradigms have emerged: one
is content-based and the other is collaborative filtering. In a content-based
recommender system, keywords or attributes are used to describe items.
A user profile is built with these attributes (when a user upvotes or "likes"
something, his profile is updated, and vice versa). Items are ranked by how
closely they match the user's attribute profile, and the best matches are
recommended.
Keyword matching or the Vector Space Model (VSM) is the simplest and
fastest method for content-based recommendation; it uses the TF-IDF (Term
Frequency-Inverse Document Frequency) term-weighting scheme. Let
$D = \{d_1, d_2, \ldots, d_N\}$ denote a set of documents or items, and
$T = \{t_1, t_2, \ldots, t_n\}$ the set of words in the corpus, i.e., the
attributes of items [31]. T is obtained by applying some standard natural
language processing operations, such as tokenization, stopword removal, and
stemming [31]. Each document $d_j$ is represented as a vector in an
n-dimensional vector space, so $d_j = \{w_{1j}, w_{2j}, \ldots, w_{nj}\}$,
where $w_{kj}$ is the weight of term $t_k$ in document $d_j$ [28].

Figure 3.1: Ted.com uses a top-k item recommendation approach to rank items.

The TF-IDF weight is defined as

\[ \mathrm{TF\text{-}IDF}(t_k, d_j) = \frac{f_{k,j}}{\max_z f_{z,j}} \cdot \log\frac{N}{n_k} \tag{3.1} \]

where N is the total number of documents or items, $n_k$ is the number of
documents or items in which term $t_k$ appears at least once, and the maximum
in the denominator is computed over the frequencies $f_{z,j}$ of all terms
$t_z$ that occur in document $d_j$. In order for the weights to fall in the
[0,1] interval and for the documents to be represented by vectors of equal
length, the weights obtained by Equation (3.1) are usually normalized by
cosine normalization [28]:

\[ w_{k,j} = \frac{\mathrm{TF\text{-}IDF}(t_k, d_j)}{\sqrt{\sum_{s=1}^{n} \mathrm{TF\text{-}IDF}(t_s, d_j)^2}} \tag{3.2} \]
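To make the weighting concrete, here is a minimal Java sketch of TF-IDF with
cosine normalization as given by Equations (3.1) and (3.2). The class and
method names are our own illustration; a production system would typically use
a library such as Lucene instead.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class TfIdfSketch {

    /** Returns, for each document, a map from term to cosine-normalized TF-IDF weight. */
    static List<Map<String, Double>> weigh(List<List<String>> docs) {
        int numDocs = docs.size();                       // N: total number of documents
        Map<String, Integer> docFreq = new HashMap<>();  // n_k: documents containing t_k
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        List<Map<String, Double>> weights = new ArrayList<>();
        for (List<String> doc : docs) {
            Map<String, Integer> freq = new HashMap<>(); // f_{k,j}: term frequency in d_j
            for (String term : doc) {
                freq.merge(term, 1, Integer::sum);
            }
            int maxFreq = freq.values().stream().max(Integer::compare).orElse(1);
            Map<String, Double> w = new HashMap<>();
            double sumSq = 0.0;
            for (Map.Entry<String, Integer> e : freq.entrySet()) {
                double tf = (double) e.getValue() / maxFreq;                       // TF of Eq. (3.1)
                double idf = Math.log((double) numDocs / docFreq.get(e.getKey())); // IDF of Eq. (3.1)
                double tfidf = tf * idf;
                w.put(e.getKey(), tfidf);
                sumSq += tfidf * tfidf;
            }
            final double norm = Math.sqrt(sumSq);        // cosine normalization, Eq. (3.2)
            if (norm > 0) {
                w.replaceAll((term, v) -> v / norm);
            }
            weights.add(w);
        }
        return weights;
    }
}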

3.2 Collaborative Filtering (CF)

Collaborative filtering (CF) is the current mainstream approach used to
build web-based recommender systems. The goal of collaborative
filtering (CF) is to suggest new items by predicting the preference for a
certain item for a particular user, based on the user's previous likes and
dislikes and the opinions of other like-minded users [32].
Problem Definition: In a standard setting of CF, there is a set of users (e.g.,
M users) and a set of items (e.g., N items). The preferences of users for
individual items can be denoted by a U-I matrix R, in which the value $R_{ij}$
denotes the preference of user i for item j, if $R_{ij} > 0$. The users'
preferences can be expressed either directly, such as by ratings, or indirectly,
using binary values indicating whether the user has clicked, viewed, or
purchased the items. Notably, the preferences of users for items are usually
very limited, which makes the matrix R sparse. $R_{ij} = ?$ is used to denote
an unknown preference of user i for item j [33].
Under this setting, the problem can be defined as: given a user-item matrix R
that represents a known set of M users' preferences for N items, recommend
to each user a list of items that are ranked in descending order of relevance
to the user's interest. GroupLens [14; 15] first introduced an automated
collaborative filtering system using a neighborhood-based algorithm. GroupLens
provided personalized predictions for Usenet news articles.

User    Titanic   Inception   Toystory   Taken   Skyfall   Matrix
Alice   5         ?           3          ?       ?         1
Bob     ?         1           ?          4       ?         ?
Jim     2         4           ?          ?       ?         5
Kate    ?         2           ?          ?       3         ?

Table 3.1: Movie rating scenario, user ratings on a 1-5 scale

3.2.1 Memory-based Collaborative Filtering


In a memory-based [34] approach, the recommender system aims to
predict the missing ratings based on either similarity between users
or similarity between items. The collaborative filtering algorithms that
use similarities among users are called user-based collaborative
filtering [35]. The hypothesis behind user-based collaborative filtering is
that similar users have similar tastes. Hence, to make a reasonable
recommendation, it finds similar users, then uses these users' tastes
to make a recommendation for the target user.
In the other approach, where similarity between items is considered for
missing-rating prediction, it is known as item-based collaborative filtering
[32]. The second approach is built upon the consistency of a user's taste:
if a user liked a product, she will like similar products as well.
User Based CF

In a user-based approach, the recommender ranks users based on the
similarity among them, and uses suggestions provided by the most similar
users to recommend new items [75]. The user-based approach to
collaborative filtering is not as preferred as an item-based approach, due to
the instability of the relationships between users. For a system which handles
a large user base, even the smallest change in the user data is likely to reset
the entire group of similar users. An example of user-based collaborative
filtering is shown in Figure 3.2: Alice, Bob and Joe are similar to Jack based
on their history (the movies on the left-hand side). Now we want to recommend
a new movie from the list on the right-hand side to Jack. Based on the past
preferences of Alice, Bob and Joe, "The Da Vinci Code" is liked by all three
of them, while "Black Swan" is liked by only two of them. Therefore "The Da
Vinci Code" is likely to be recommended to Jack.

Figure 3.2: User-based collaborative filtering.
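In formula form, the usual user-based prediction is
$\hat{r}_{a,i} = \bar{r}_a + \frac{\sum_{u \in N(a)} sim(a,u)\,(r_{u,i} - \bar{r}_u)}{\sum_{u \in N(a)} |sim(a,u)|}$.
Below is a minimal Java sketch of that computation; the method and its
map-based signature are our own illustration, not any particular library's API.

import java.util.Map;

public class UserBasedPrediction {

    /**
     * Predicts user a's rating of one item from a's neighborhood:
     * a's mean plus the similarity-weighted average of the neighbors'
     * mean-centered ratings of the item.
     *
     * @param neighborRatings the rating each similar user gave to the item
     * @param similarities    sim(a, u) for the same user IDs
     * @param neighborMeans   each neighbor's mean rating over all items
     * @param meanA           user a's mean rating
     */
    static double predict(Map<Long, Double> neighborRatings,
                          Map<Long, Double> similarities,
                          Map<Long, Double> neighborMeans,
                          double meanA) {
        double num = 0.0, den = 0.0;
        for (Map.Entry<Long, Double> e : neighborRatings.entrySet()) {
            double sim = similarities.get(e.getKey());
            num += sim * (e.getValue() - neighborMeans.get(e.getKey()));
            den += Math.abs(sim);
        }
        return den == 0.0 ? meanA : meanA + num / den; // fall back to a's mean
    }
}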
Item Based CF

Item-based CF [32; 36] is a model-based approach which produces
recommendations based on the relationships between items inferred from the
rating matrix. The assumption behind this approach is that users will prefer
items that are similar to other items they like. The model-building step consists
of calculating a similarity matrix containing all item-to-item similarities using a
given similarity measure. Popular choices are again Pearson correlation and
cosine similarity. All pairwise similarities are stored in an n x n similarity
matrix S. To reduce the model size to n x k with k << n, for each item only a
list of the k most similar items and their similarity values is stored. The k
items which are most similar to item i are denoted by the set S(i), which can
be seen as the neighborhood of size k of the item. Retaining only k similarities
per item improves the space and time complexity significantly but potentially
sacrifices some recommendation quality [32].

To make a recommendation based on the model we use the similarities to
calculate a weighted sum of the user's ratings for related items.

S          i1    i2    i3    i4    i5    i6    i7    i8
i1         -     0.1   0     0.3   0.2   0.4   0     0.1
i2         0.1   -     0.8   0.9   0     0.2   0.1   0
i3         0     0.8   -     0     0.7   0.1   0.3   0.9
i4         0.3   0.9   0     -     0     0.3   0     0.1
i5         0.2   0     0.4   0     -     0.1   0     0
i6         0.4   0.2   0.1   0.3   0.2   -     0     0.1
i7         0     0.1   0.3   0     0.1   0     -     0
i8         0.1   0     0.5   0.1   0     0.1   0     -

u_a        2     ?     ?     ?     4     ?     ?     5
estimate   -     0.0   4.6   2.8   -     2.7   0.0   -

Table 3.2: Item-based collaborative filtering

Table 3.2 shows an example for n = 8 items with k = 3. For the similarity
matrix S only the k = 3 largest entries are stored per row (in the original,
these entries are marked in bold face). For the example we assume that we
have ratings from the active user u_a for items i1, i5 and i8. The rows
corresponding to these items are highlighted in the item similarity matrix. We
can now compute the weighted sum using the similarities (only the reduced
matrix with the k = 3 highest similarities per row is used) and the user's
ratings. The result (the "estimate" row below the matrix) shows that i3 has
the highest estimated rating for the active user.
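As a sketch of the computation behind Table 3.2 (under our reading of the
reduced matrix), the estimate for a candidate item is the similarity-weighted
mean of the active user's ratings over the k retained neighbors of that item.
For i3, the reduced row keeps 0.8 (i2), 0.7 (i5) and 0.9 (i8); with ratings
i5 = 4 and i8 = 5 this gives (0.7 * 4 + 0.9 * 5) / (0.7 + 0.9), which is
approximately 4.6, matching the table. The class and method names below are
our own illustration.

public class ItemBasedScore {

    /**
     * Estimates the active user's rating of item i from its reduced
     * similarity row S(i): entries are 0 where an item is not among the
     * k most similar, and userRatings[j] is null where the user has not
     * rated item j.
     */
    static double estimate(double[] reducedSimRow, Double[] userRatings) {
        double num = 0.0, den = 0.0;
        for (int j = 0; j < reducedSimRow.length; j++) {
            if (reducedSimRow[j] > 0 && userRatings[j] != null) {
                num += reducedSimRow[j] * userRatings[j];
                den += reducedSimRow[j];
            }
        }
        return den == 0.0 ? 0.0 : num / den; // 0.0 when no rated neighbor exists
    }
}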

3.2.2 Model-based Collaborative Filtering


These methods rely on latent factors of users and items contributing to the
rating matrix. The rating matrix is decomposed into two lower-rank matrices
comprising the feature vectors of the users and items respectively. Prediction
is done by minimizing an objective function representative of the features of
the users and items described by their feature vectors.


3.3 Exploring Similarity Measures

As memory-based methods act directly on the rating matrix, in which ratings
are expressed by users, some similarity measures are used to quantify the
similarity between two users. Several similarity algorithms have been used in
collaborative filtering recommendation algorithms [29; 32]: Pearson
correlation, cosine vector similarity, adjusted cosine vector similarity, mean
squared difference and Spearman correlation.

Euclidean distance similarity: This similarity method measures the distance
between users. Users are considered as points in a space of many dimensions
(the dimensions correspond to the items). This method computes the Euclidean
distance d between two such user points. This value alone doesn't constitute
a valid similarity metric, because larger values would mean more-distant, and
therefore less similar, users: the distance is smaller when users are more
similar, while a similarity value should be larger. Therefore 1 / (1 + d) is
used. When the distance is 0 it indicates that the users have identical
preferences and the result is 1, decreasing towards 0 as d increases. This
similarity method never returns a negative value, and larger values mean more
similarity [37].
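A minimal Java sketch of this measure, assuming the two arrays hold the
users' ratings over their co-rated items only:

public class EuclideanSimilarity {

    /** 1 / (1 + d): 1.0 for identical preferences, approaching 0 as d grows. */
    static double similarity(double[] a, double[] b) {
        double sumSq = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sumSq += diff * diff;       // squared Euclidean distance
        }
        return 1.0 / (1.0 + Math.sqrt(sumSq));
    }
}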

Pearson correlation based similarity: This algorithm is an alternative approach
to measuring similarity between user profiles, based on the standard Pearson
r correlation coefficient. The possible values of the Pearson coefficient range
from -1 to +1, including 0. Values near -1 indicate a negative correlation while
values close to +1 indicate a positive correlation; a value of 0 shows no
correlation at all. Once the Pearson coefficient has been calculated, the
recommendation can be done as before by averaging the values of the most
similar profiles. One important characteristic of this algorithm is that it takes
into account not only positive correlation but also negative correlation when
making predictions.
There are two issues with computing similarity between users using Pearson
correlation. One is the question of what to do with items that one user has
rated but the other has not. The straightforward, statistically correct way to
handle this is to consider only items that both users have rated, and to do
this consistently. This results in the following formula, where $I_u$ is the
set of items rated by user u:

\[ sim(a, u) = \frac{\sum_{i \in I_a \cap I_u} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{\sqrt{\sum_{i \in I_a \cap I_u} (r_{a,i} - \bar{r}_a)^2} \, \sqrt{\sum_{i \in I_a \cap I_u} (r_{u,i} - \bar{r}_u)^2}} \]

Here $\bar{r}_u$ should be computed just over the ratings in $I_a \cap I_u$.

The other issue is the fact that users with few rated items in common will
have very high similarities. The Pearson correlation over a single pair of
ratings is undefined (division by 0); over 2 pairs it is 1. But if each of the
users has rated many items, the fact that they have rated two of the same
items is not a good basis for saying that they are perfectly similar. The
typical way to deal with this is significance weighting [35]: multiply the
similarity by

\[ \frac{\min(|I_u \cap I_v|, 50)}{50} \]

decreasing the similarity linearly until the users have at least 50 rated items
in common. This is somewhat ad hoc, but improves the performance of
recommenders using Pearson correlation.
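The following Java sketch combines the co-rated-items Pearson formula above
with the min(n, 50)/50 significance weighting. The arrays are assumed to hold
only the co-rated items, and the class and method names are our own
illustration.

public class PearsonSimilarity {

    /** Pearson correlation over co-rated items, damped by significance weighting. */
    static double similarity(double[] a, double[] b) {
        int n = a.length;                 // number of co-rated items
        double meanA = 0.0, meanB = 0.0;
        for (int i = 0; i < n; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= n;
        meanB /= n;
        double cov = 0.0, varA = 0.0, varB = 0.0;
        for (int i = 0; i < n; i++) {
            double da = a[i] - meanA, db = b[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        if (varA == 0.0 || varB == 0.0) return 0.0;   // correlation undefined
        double r = cov / Math.sqrt(varA * varB);
        return r * Math.min(n, 50) / 50.0;            // significance weighting
    }
}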
Cosine vector similarity: Here the preferences of users are considered as
points in a multidimensional space (of items). The cosine value ranges from -1
to 1. The most similar users produce a smaller angle, with a cosine value near
1, and dissimilar users produce a large angle (180 degrees), with a cosine
value near -1. In the Mahout implementation this concept is combined with the
Pearson correlation [37].
Spearman correlation: The Spearman correlation is an interesting variant of
the Pearson correlation for our purposes. Rather than computing a correlation
based on the original preference values, it computes a correlation based on
the relative ranks of preference values. Imagine that, for each user, their
least-preferred item's preference value is overwritten with a 1. Then the
next-least-preferred item's preference value is changed to 2, and so on. To
illustrate this, imagine that you were rating movies and gave your
least-preferred movie one star, the next-least favorite two stars, and so on.
Then, a Pearson correlation is computed on the transformed values. This is
the Spearman correlation.
This process loses some information. Although it preserves the essence of
the preference values (their ordering), it removes information about exactly
how much more each item was liked than the last. This may or may not be a
good idea; it's somewhere between keeping preference values and forgetting
them entirely, two possibilities that we've explored before [37]. Example:

Procedure to calculate the Spearman rank correlation:

User     Item 101   Item 102   Item 103
User 1   3.0        2.0        1.0
User 2   1.0        2.0        3.0
User 3   1.0
User 4   2.0                   1.0
User 5   3.0        2.0        1.0

Table 3.3: Five users have given ratings (from 1 to 5) on a three-item set

1. Sort the data by the first column (X_i). Create a new column (x_i) and
assign it the ranked values 1, 2, 3, ..., n.
2. Next, sort the data by the second column (Y_i). Create the fourth column
(y_i) and similarly assign it ranked values 1, 2, 3, ..., n.
3. Create a column (d_i) to hold the differences between the two rank
columns (x_i and y_i).
4. Create a column to hold d_i^2.

User 1 (X_i)     User 2 (Y_i)     Rank (x_i)   Rank (y_i)   d_i = x_i - y_i   d_i^2
Item 103 (1.0)   Item 103 (3.0)   1            3            -2                4
Item 102 (2.0)   Item 102 (2.0)   2            2            0                 0
Item 101 (3.0)   Item 101 (1.0)   3            1            2                 4

Table 3.4: Rearranging the rating table from lower to higher rank

With the d_i^2 found, add them to get $\sum d_i^2 = 8$. Here n is the number
of samples or items; in this example the number of items is 3. These values
are substituted into the equation

\[ \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \cdot 8}{3(3^2 - 1)} = -1 \]

so the correlation between user 1 and user 2 is -1.


User     Item 101   Item 102   Item 103   Correlation to user 1
User 1   3.0        2.0        1.0        1.0
User 2   1.0        2.0        3.0        -1.0
User 3   1.0                              -
User 4   2.0                   1.0        1.0
User 5   3.0        2.0        1.0        1.0

Table 3.5: Correlation of user 1 with other users
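A small Java sketch of the procedure above: rank each user's co-rated items,
then apply the rho formula. On the user 1 / user 2 data it returns -1, as
computed above. Tie handling is omitted for brevity, and the names are our
own illustration.

public class SpearmanCorrelation {

    /** rho = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)) over two users' co-rated items. */
    static double correlation(double[] a, double[] b) {
        double[] ra = ranks(a), rb = ranks(b);
        int n = a.length;
        double sumD2 = 0.0;
        for (int i = 0; i < n; i++) {
            double d = ra[i] - rb[i];   // d_i
            sumD2 += d * d;
        }
        return 1.0 - 6.0 * sumD2 / (n * ((double) n * n - 1.0));
    }

    /** Rank 1 = least preferred, as in the worked example (no tie handling). */
    static double[] ranks(double[] values) {
        double[] r = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            int rank = 1;
            for (double v : values) {
                if (v < values[i]) rank++;
            }
            r[i] = rank;
        }
        return r;
    }
}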

Tanimoto coefficient similarity: There are also user similarity methods that
ignore preference values entirely. They don't consider whether a user
expresses a high or low preference for an item, only that the user expresses
a preference at all. TanimotoCoefficientSimilarity is a class in Mahout that
uses the Tanimoto coefficient method. This value is also known as the Jaccard
coefficient. Basically it is the ratio of the size of the overlap to the size
of the union of two users' preferred item sets. When two users' choices
completely match, it gives a similarity of 1.0; when they have nothing in
common, a similarity of 0.0. This method never produces a negative value [37],
but using some simple math the result range can be expanded to [-1, 1]:

similarity' = 2 * similarity - 1.

This method should be used only when the data set contains boolean
preferences, such as yes or no, and no preference values to begin with.

Item     101   102   103   104   105   106   107   Similarity to user 1
User 1   X     X     X     -     -     -     -     1.0
User 2   X     X     X     X     -     -     -     0.75
User 3   X     -     -     X     X     -     X     0.17
User 4   X     -     X     X     -     X     -     0.4
User 5   X     X     X     X     X     X     -     0.5

Table 3.6: Tanimoto similarity of other users to user 1
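A minimal Java sketch of the coefficient, operating on the sets of item IDs
each user has expressed any preference for. For example, two users sharing
three items out of a four-item union get 3/4 = 0.75, the user 2 value in
Table 3.6. The names are our own illustration.

import java.util.HashSet;
import java.util.Set;

public class TanimotoSimilarity {

    /** |A intersect B| / |A union B| over the two users' preferred-item sets. */
    static double similarity(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<Long> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
    }
}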


3.4 User neighborhood


Two ways to consider neighborhood selection:
Fixed-size neighborhoods
This defines a neighborhood of most similar users by picking a fixed
number of closest neighbors. For a given data set this value should be
set at an optimal level so that good evaluation results are obtained.
Threshold-based neighborhoods
The threshold should be between -1 and 1, because all similarity
metrics return similarity values in this range. A higher threshold value
means fewer neighbors.
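In Apache Mahout (used in Chapter 7) these two strategies correspond to the
NearestNUserNeighborhood and ThresholdUserNeighborhood classes. A short
sketch, where the neighborhood size 25 and the threshold 0.7 are arbitrary
illustrative choices:

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class Neighborhoods {

    /** Fixed-size neighborhood: the 25 most similar users, however weakly similar. */
    static UserNeighborhood fixedSize(UserSimilarity sim, DataModel model)
            throws TasteException {
        return new NearestNUserNeighborhood(25, sim, model);
    }

    /** Threshold-based neighborhood: every user with similarity of at least 0.7. */
    static UserNeighborhood thresholded(UserSimilarity sim, DataModel model) {
        return new ThresholdUserNeighborhood(0.7, sim, model);
    }
}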


Chapter 4
Issues in Recommender System
Some of the common issues of recommender systems are discussed here;
this report does not provide solutions for these problems. These are some
of the open issues that researchers can explore in the future.

4.1 Cold-start problem


The basic problem faced by most recommender systems is the cold-start
problem [38]. This problem arises due to the lack of information about users
and items needed to generate good recommendations. There are three types
of cold-start problems: new user, new item and new system.

Cold-start problem for users - Whenever a new user enters a system, most
of the information about the user is unknown. Such situations make it
difficult to create accurate predictions for the user, which hampers the
performance of a recommender system.
Cold-start problem for items - When a new item enters the system, it is
unlikely that collaborative filtering systems will recommend it to many
users, because very few users have yet rated or purchased that item.
Cold-start problem for systems - A new system may face both of the
above-mentioned problems, and so this is named the system cold start.


4.2 Evaluation
To date, in the field of recommender systems it has been challenging to
identify the best algorithm for a given purpose, as researchers disagree on
which attributes should be measured, and on which metrics should be used
for each attribute's evaluation. In the paper by Herlocker [39], the authors
discuss this problem broadly and make three points. According to them,
different algorithms may be better or worse on different data sets. The
second point is that the goals for which evaluation is performed may differ.
The final point is that it is challenging to decide what combination of
measures to use in comparative evaluation.

4.3 Privacy and Trust


Recommender systems use personal information of users, such as user
characteristics and their interests and opinions in the form of ratings for
items, to provide services that better meet the needs of each individual user.
"The other edge of the sword is that recommender systems provide perfect
tools for marketers and others to invade users' privacy" [40]. Hence,
according to [40], privacy is an important issue to take into account, not only
from a moral viewpoint, but also from a legal viewpoint; users must know or
be able to trust that their privacy is guaranteed by a recommender system.
[41] also notes that people may have personal demands concerning their
privacy. [18]
Security is an important factor in ensuring that personal data remains
personal and is only used by those who have a right to access and use that
data. [40] discusses several security methods to allow users to remain
anonymous and to assure that personal data is secure during transport. Even
though a recommender system can have the best security methods in place,
it is still the user who must decide whether to trust the recommender system
and the people behind the system; trust that security methods are indeed
implemented and that these methods are capable of keeping their personal
data safe. [18]


4.4 Trust-based Recommender Systems


Recommender systems allow people to find the resources they need by making
use of the experiences and opinions of their nearest neighbours. Costly
annotations by experts are replaced by a distributed process in which the users
take the initiative. While the collaborative approach enables the collection of a
vast amount of data, a new issue arises: quality assessment. The elicitation of
trust values among users, termed a web of trust, allows a twofold enhancement
of recommender systems. Firstly, the filtering process can be informed by the
reputation of users, which can be computed by propagating trust. Secondly, the
trust metrics can help to solve a problem associated with the usual method of
similarity assessment: its reduced computability. An empirical evaluation on
the Epinions.com dataset shows that trust propagation can increase the
coverage of recommender systems while preserving the quality of predictions.
The greatest improvements are achieved for users who have provided few
ratings.

Problems associated with Collaborative Filtering

RSs based on CF suffer from some inherent weaknesses:
1. Data sparsity causes the first serious weakness of collaborative filtering.
2. Cold-start users (new users).
3. What if black-hat users become similar? An attacker can copy the ratings
of a target user and fool the system into thinking that the attacker is in fact
the most similar user to the target user.
4. User similarity is computable only against few users.

The first step suffers another problem. In order to be able to create
good-quality recommendations, RSs should be able to compare the current
user against every other user, with the goal of selecting the best neighbours
with the most relevant item ratings. This step is mandatory and its accuracy
affects the overall system accuracy: failing to find good neighbours will lead
to poor-quality recommendations. However, since the ratings matrix is usually
very sparse, because users tend to rate few of the millions of items, it is often
the case that two users don't share the minimum number of items rated in
common required by user similarity metrics for computing similarity. For this
reason, the system is forced to choose neighbours from the small portion of
comparable users and will miss other non-comparable but relevant users. This
problem is not as serious for users with hundreds of ratings, but it is for users
with few ratings. However, it can be argued that it is more important (and
harder) for an RS to provide a good recommendation to a user with few
ratings, in order to invite her to provide more ratings and keep using the
system, than to a user with many ratings who is probably already using the
system regularly.
Easy attacks by malicious insiders. Recommender systems are often used in
e-commerce sites (for example, Amazon.com). In those contexts, being able
to influence recommendations could be very attractive: for example, an author
may want to force Amazon.com to always recommend the book she wrote.
However, subverting standard CF techniques is very easy [10]. The simplest
attack is the copy-profile attack: the attacker can copy the ratings of the
target user, and the system will think the attacker is the most similar user to
the target user. In this way, every additional item the attacker rates highly
will probably be recommended to the target user. Since current RSs are
mainly centralized servers, creating a fake identity is a time-consuming
activity, and hence these attacks are not currently heavily carried out or
studied. However, we believe that as soon as the publishing of ratings and
opinions becomes more decentralized (for example, with Semantic Web
formats such as RVW [2] or FOAF [3]), these types of attacks will become
more and more of an issue. Basically, creating such attacks will become as
widespread as spam is today, or at least as easy.

Web Of Trust
The webs of trust of all the users can be aggregated into a global trust
network, or social network (Figure 1), and a graph-walking algorithm can be
used to predict the importance of a certain node of the network. This intuition
is exploited, for example, by PageRank [11], one of the algorithms powering
the search engine Google.com. According to this analysis, the Web is a
network of content without centralized quality control, and PageRank tries to
infer the authority of every single page by examining the structure of the
network. PageRank follows a simple idea: if a link from page A to page B
represents a positive vote issued by A about B, then the global rank of a page
depends on the number (and quality) of the incoming links. The same intuition
can be extended from web pages to users: if users are allowed to cast trust
values on other users, then these values can be used to predict the
trustworthiness of unknown users. For example, the consumer opinion site
Epinions.com, where users can express opinions and ratings on items, also
allows users to express their degree of trust in other users. Precisely, the
Epinions.com FAQ suggests a user should add to her web of trust reviewers
whose reviews and ratings they have consistently found to be valuable.

Figure 1: Trust network. Nodes are users and edges are trust statements. The
dotted edge is one of the undefined but predictable trust statements.

Trust metrics [3,14,8] have precisely the goal of predicting, given a certain
user, trust in unknown users, based on the complete trust network. For
example, in Figure 1, a trust metric can predict the level of trust of A in D.
Trust metrics can be divided into local and global. Local trust metrics take
into account the very personal and subjective views of the users and end up
predicting different values of trust in other users for every single user.
Instead, global trust metrics predict a global reputation value that
approximates how the community as a whole considers a certain user. In this
way, they don't take into account the subjective opinions of each user but
average them into standardized global values. PageRank [11], for example,
is a global metric. However, in general, local trust metrics are computationally
more expensive because they must be computed for each user, whereas
global ones are just run once for the whole community.
In the following, we argue that trust-awareness can overcome all these
weaknesses. Precisely, trust propagation allows us to compute a relevance
measure, alternative to user similarity, that can be used as an additional or
complementary weight when calculating recommendation predictions. In [9]
we have shown how this predicted trust value, thanks to trust propagation, is
computable for many more users than the user similarity value. CF systems
have problems scaling up because calculating the neighbours set requires
computing the user similarity of the current user against every other user.
However, we can significantly reduce the number of users which the RS has
to consider by pre-filtering users based on their predicted trust value. For
example, it would be possible to consider only users at a small distance in the
social network from the current user, or to consider only users with a
predicted trust higher than a certain threshold. Moreover, trust metrics can be
attack-resistant [8], i.e., they can be used to spot malicious users and to take
into account only reliable users and their ratings. It should be kept in mind,
however, that there isn't a global view of which user is reliable or trustworthy,
so that, for example, a user can be considered trustworthy by one user and
untrustworthy by another.

Trust-aware Recommender System architecture

In this section we present the architecture of our proposed solution:
trust-aware recommender systems. Figure 2 shows the different modules
(black boxes) as well as the input and output matrices of each of them (white
boxes). The overall system takes as input the trust matrix (representing all
the community trust statements) and the ratings matrix (representing all the
ratings given by users to items) and produces, as output, a matrix of predicted
ratings that the users would assign to the items. This matrix is used by the RS
for recommending the most liked items to the user: precisely, the RS selects,
from the row of predicted ratings relative to the user, the items with the
highest values. Of course, the final output matrix could be somewhat sparse,
i.e., have some cells with missing values, when the system is not able to
predict the rating that the user would give to the item. Actually, the quantity
of predictable ratings is one of the evaluation strategies.

Chapter 5
Singular Value Decomposition (SVD++) and Latent Factors
5.1 Baseline Estimates
Typical CF data exhibit large user and item effects, i.e., systematic tendencies
for some users to give higher ratings than others, and for some items to
receive higher ratings than others. It is customary to adjust the data by
accounting for these effects, which we encapsulate within the baseline
estimates. Denote by $\mu$ the overall average rating. A baseline estimate
for an unknown rating $r_{ui}$ is denoted by $b_{ui}$ and accounts for the
user and item effects:

\[ b_{ui} = \mu + b_u + b_i \]

The parameters $b_u$ and $b_i$ indicate the observed deviations of user u
and item i, respectively, from the average. For example, suppose that we
want a baseline estimate for the rating of the movie Titanic by user Joe. Now,
say that the average rating over all movies, $\mu$, is 3.7 stars. Furthermore,
Titanic is better than an average movie, so it tends to be rated 0.5 stars above
the average. On the other hand, Joe is a critical user, who tends to rate 0.3
stars lower than the average. Thus, the baseline estimate for Titanic's rating
by Joe would be 3.9 stars, calculated as 3.7 - 0.3 + 0.5. In order to estimate
$b_u$ and $b_i$ one can solve the least squares problem:

\[ \min_{b_*} \sum_{(u,i) \in \mathcal{K}} (r_{ui} - \mu - b_u - b_i)^2 + \lambda \Big( \sum_u b_u^2 + \sum_i b_i^2 \Big) \]

where $\mathcal{K}$ is the set of (u, i) pairs for which $r_{ui}$ is known
and the second term regularizes the parameters.

5.2 Latent Factor Model


Suppose that a recommender system includes m users and n items. Let
$R = [r_{u,i}]_{m \times n}$ denote the user-item rating matrix, where each
entry $r_{u,i}$ represents the rating given by user u to item i. For clarity,
we preserve the symbols u, v for users and i, j for items. Let $I_u$ denote
the set of items rated by user u. Let $p_u$ and $q_i$ be d-dimensional
latent feature vectors of user u and item i, respectively. The essence of
matrix factorization is to find two low-rank matrices, a user-feature matrix
$P \in \mathbb{R}^{d \times m}$ and an item-feature matrix
$Q \in \mathbb{R}^{d \times n}$, that can adequately recover the rating
matrix R, i.e., $R \approx P^{\top} Q$, where $P^{\top}$ is the transpose
of matrix P. Hence, the rating on item j by user u can be predicted by the
inner product of the user-specific vector $p_u$ and the item-specific vector
$q_j$, i.e., $\hat{r}_{u,j} = q_j^{\top} p_u$. In this regard, the main task
of recommendation is to predict the rating $\hat{r}_{u,j}$ as close as
possible to the ground truth $r_{u,j}$. Formally, we can learn the user- and
item-feature matrices by minimizing the following loss (objective) function:

\[ \mathcal{L} = \frac{1}{2} \sum_{u=1}^{m} \sum_{i \in I_u} (r_{u,i} - q_i^{\top} p_u)^2 + \frac{\lambda}{2} \big( \|P\|_F^2 + \|Q\|_F^2 \big) \tag{1} \]

where $\|\cdot\|_F$ is the Frobenius norm and $\lambda$ controls the
regularization.

5.3 Model Learning


To obtain a local minimum of the objective function given by Equation 1, we
perform the following gradient descent updates for all users and items:

\[ p_u \leftarrow p_u + \gamma \, (e_{u,i} \, q_i - \lambda \, p_u), \qquad q_i \leftarrow q_i + \gamma \, (e_{u,i} \, p_u - \lambda \, q_i) \]

where $e_{u,i} = r_{u,i} - q_i^{\top} p_u$ is the prediction error and
$\gamma$ is the learning rate.
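A minimal Java sketch of these updates on a dense rating matrix, with
unknown entries marked NaN. The class name and the parameters gamma,
lambda and the number of epochs are illustrative choices, not a prescribed
configuration:

public class LatentFactorSgd {

    /**
     * Stochastic gradient descent for R ~ P^T Q:
     *   p_u <- p_u + gamma * (e_ui * q_i - lambda * p_u)
     *   q_i <- q_i + gamma * (e_ui * p_u - lambda * q_i)
     * where e_ui = r_ui - q_i^T p_u. Here p[u] and q[i] are the
     * d-dimensional factor vectors (initialize them to small random values).
     */
    static void train(double[][] r, double[][] p, double[][] q,
                      double gamma, double lambda, int epochs) {
        int d = p[0].length;
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int u = 0; u < r.length; u++) {
                for (int i = 0; i < r[u].length; i++) {
                    if (Double.isNaN(r[u][i])) continue;     // unknown rating
                    double pred = 0.0;
                    for (int k = 0; k < d; k++) pred += p[u][k] * q[i][k];
                    double e = r[u][i] - pred;               // e_ui
                    for (int k = 0; k < d; k++) {
                        double pu = p[u][k], qi = q[i][k];
                        p[u][k] += gamma * (e * qi - lambda * pu);
                        q[i][k] += gamma * (e * pu - lambda * qi);
                    }
                }
            }
        }
    }
}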

Chapter 6
Data sets
6.1 MovieLense - MovieLens Dataset (100k)
Description
The 100k MovieLense ratings data set. The data was collected through the
MovieLens web site (movielens.umn.edu) during the seven-month period from
September 19th, 1997 through April 22nd, 1998. The data set contains about
100,000 ratings (1-5) from 943 users on 1664 movies. In R:

R> MovieLense
943 x 1664 rating matrix of class 'realRatingMatrix' with 99392 ratings.

Source: Herlocker, J., Konstan, J., Borchers, A., Riedl, J. An Algorithmic
Framework for Performing Collaborative Filtering. Proceedings of the 1999
Conference on Research and Development in Information Retrieval, Aug. 1999.

(a) Raw rating distribution for MovieLense. (b) Normalized rating distribution for MovieLense.

Figure 5.1: MovieLense dataset

6.2 Jester data set


The Jester sample data set contains anonymous ratings from 5000 users,
collected by the Jester Online Joke Recommender System between
April 1999 and May 2003. The data set contains ratings for 100 jokes on a
scale of -10 to +10. Every user has rated 36 or more jokes. The histogram
shows an interesting distribution, where all negative values occur with an
almost identical frequency while positive ratings are more frequent, with a
steady decline towards the rating 10. After normalizing by row centering we
get an approximately normal distribution. [42]

(a) Raw rating distribution for the Jester data set. (b) Normalized rating distribution for the Jester data set.

Figure 5.2: Jester dataset

6.3 MS Web data set

In the paper [43] the authors used the MS Web dataset. This dataset stores
individual users' visits to various areas (vroots) of the Microsoft corporate
web site in a one-week timeframe in February 1998. It is an implicit-rating
database: each vroot was characterized as being visited (a vote of one) or
not (no vote). The total numbers of users and titles (items) are 32710 and
285. Using the R language, the description of the dataset is as below:

R> MSWeb
32710 x 285 rating matrix of class 'binaryRatingMatrix' with 98653 ratings.

We took a portion of the dataset for evaluation:

R> MSWeb10 <- sample(MSWeb[rowCounts(MSWeb) > 10], 100)
100 x 285 rating matrix of class 'binaryRatingMatrix' with 1381 ratings.
R> hist(rowCounts(MSWeb10))
R> hist(colCounts(MSWeb10))

(a) Row counts in MSweb. (b) Column counts in MSweb.

Figure 5.3: MSweb dataset

Chapter 7
Implementation
Many organizations have developed different platforms for implementing
recommender systems. Some of the open source projects are listed in the
following table. These platforms give researchers an easy path for studying,
building and experimenting with different recommender system algorithms.
Among them, Apache Mahout (The Apache Software Foundation 2014a) and
recommenderlab (Hahsler 2014) are used by us.

Software         Description                                   Language   URL
Apache Mahout    Machine learning library, includes            Java       http://mahout.apache.org/
                 collaborative filtering
Cofi             Collaborative filtering library               Java       http://www.nongnu.org/cofi/
Crab             Components to create recommender              Python     https://github.com/muricoca/crab
                 systems
easyrec          Recommender for Web pages                     Java       http://easyrec.org/
LensKit          Collaborative filtering algorithms            Java       http://lenskit.grouplens.org/
                 from GroupLens Research
recommenderlab   Testing and developing environment            R          http://R-Forge.R-project.org/projects/recommenderlab/

Table 6.1: Recommender System software freely available for research.

7.1 About Apache Mahout

Mahout is an open source machine learning library from Apache. It is scalable
in nature and performs well at large data sizes. Implementations have to be
written in Java, as it is a Java library, and Hadoop distributed projects can
also be integrated with it.

7.2 Introduction to recommendation in Apache Mahout

A detailed introduction to Apache Mahout can be found on its website (The
Apache Software Foundation 2014a). Some of the points are mentioned in this
section. The recommender engine within Apache Mahout is provided by Taste,
a formerly separate project written by Sean Owen and Sebastian Schelter (The
Apache Software Foundation 2014a). Now Taste can be regarded as a flexible,
mature and fairly independent component inside Mahout. It not only supports
the basic user-based and item-based CF approaches, but also provides
extendable interfaces to connect and conduct users' customized
recommendations. Compared to the currently prevalent Hadoop technology,
Taste is focused on dealing with single-machine tasks.
Taste has five package interfaces as key abstractions for conducting
recommendations. DataModel is a connector to extract the information on user
preferences from the data source; JDBCDataModel and FileDataModel make
it possible to access and read the information from a database and from files,
respectively. UserSimilarity and ItemSimilarity are the package interfaces for
figuring out similar users or items for specific users or items, namely the
neighborhood. The similarity algorithm is the core of a CF recommendation
engine; Taste packages many popular similarity algorithms, like Pearson
correlation similarity, Euclidean distance similarity and Spearman correlation.
The remaining key abstractions, UserNeighborhood and Recommender, are
described in Section 7.3.3.

7.3 Installation of Apache Mahout


7.3.1 Java and IDE
Our experimentation environments are Windows 7 (32-bit) and CentOS 7.
Mahout requires Java 6. We installed Eclipse (http://www.eclipse.org), the
most popular free Java IDE, in its recent version Luna, which supports Java 8.

7.3.2 Installing Maven

Mahout's build and release system is built around Maven. Maven is a
command-line tool that manages dependencies, compiles code, packages
releases, generates documentation, and publishes formal releases. To use
Mahout, Maven should be installed; in Eclipse Luna, Maven is embedded by
default.
First of all we have to create a Maven project. All the jar files required for
experimentation can be added automatically or manually. We added them
automatically by setting the proxy. For setting the proxy we add a settings.xml
file under the C:\Users\lakhya\.m2 path. This .m2 directory is created
whenever we try to create a new Maven project in Eclipse. To add a
dependency, we open the existing pom.xml file and add the recent dependency
from the (https://mahout.apache.org/general/downloads.html) site. But
whenever we try to create a new Maven project we can get an error like
"Could not resolve archetype
org.apache.maven.archetypes:maven-archetype-quickstart:RELEASE from any
of the configured repositories." This is because a built-in repository comes
along with Maven. To remove this error we have to delete the repository
folder in the .m2 directory first and then create a new Maven project again.
This time all the jar files and dependencies are automatically added to our
build path. Now we are ready to write the code of the recommender system.

7.3.3 Building A Recommender Engine


To build a Recommender System on a specific domain we first need some
predefined data file as an input. This data file can contain predefined ratings
for different items by different users. In this case item to user ratio is not equal.
Either items or users can be large. For experimentation purpose we can
prepare such small data file by ourself. Our aim is to predict the ratings for
unrated items by a user and prepare a top -k list as an recommendation.
27

For data input we used a DataModel, which gives us the facility to store and access all the preference, item, and user data needed in the computation. Next, we need a UserSimilarity implementation, which provides the different similarity-measure techniques used to compute user similarity in collaborative filtering. A UserNeighborhood implementation captures the concept of a group of users most similar to a given user. Finally, a Recommender implementation combines all these concepts to recommend items to users.

In our experimentation we use the MovieLens 100k data set. MovieLens data sets of various sizes are available, but the files come in .dat format; to feed them into our FileDataModel we convert the .dat files to .csv (comma-separated values) format.
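Putting these pieces together, a minimal user-based recommender over such a CSV file can be sketched as follows. The class names are from the Taste API; the file name ml-100k.csv, the choice of Pearson similarity, the 50-user neighborhood, and the top-5 list are illustrative assumptions, not fixed choices:

    import java.io.File;
    import java.util.List;

    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public class UserBasedExample {
      public static void main(String[] args) throws Exception {
        // each line of the file: userID,itemID,rating (the converted MovieLens data)
        DataModel model = new FileDataModel(new File("ml-100k.csv"));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // neighborhood of the 50 users most similar to the target user
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(50, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
        // top-5 recommendations for user 1
        List<RecommendedItem> items = recommender.recommend(1, 5);
        for (RecommendedItem item : items) {
          System.out.println(item.getItemID() + " : " + item.getValue());
        }
      }
    }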


Chapter 8
Evaluation Metrics
In the literature, different accuracy measures for recommender systems can be found since 1994 [15]. Some of the most popular methods are classified into three classes by the authors of [39], namely: predictive accuracy metrics, classification accuracy metrics, and rank accuracy metrics.

8.1 Predictive Accuracy Metrics


These metrics measure the closeness between the system's predicted ratings and the true user ratings. In the context of a recommender system, the formula looks as given below.

Mean Absolute Error (MAE)

Mean Absolute Error measures error using the predictive-accuracy principle. The MAE is defined as the average absolute deviation between predicted ratings and true ratings; according to [44], this metric measures the average absolute deviation between each predicted rating P(u,i) and each user's real rating R(u,i):

\[ \mathrm{MAE} = \frac{1}{N} \sum_{(u,i)} \bigl| P_{u,i} - R_{u,i} \bigr| \]

Here N is the number of observations available, where item i must have been rated by user u.

This metric has been used for the evaluation of several recommender systems [43; 45; 13]. Mean Squared Error, Root Mean Squared Error, and Normalized Mean Absolute Error [46] are the other variations of MAE.

8.2 Classification Accuracy Metrics


According to the authors of [39], these metrics measure the frequency of correct or incorrect decisions about the goodness of recommended items. That is why these metrics are suitable for true binary preference data.

Problems may occur when these metrics are applied to real data sets, due to data sparsity. To handle this problem in collaborative filtering, items which have no rating are left out of the top list generated for recommendation.

Another approach to evaluating sparse data sets is to assume default ratings [43]. The drawback of this approach is that the default rating may be very different from the true rating.
A third approach found in the literature is to compute how many of the highly rated items appear in the generated recommendation list. The Precision-Recall and Receiver Operating Characteristic (ROC) methods come into this category. In research articles these measures are considered Information Retrieval measures; Cleverdon was the first to propose them as key metrics, in 1968 [47], and they have been used for the evaluation of recommender systems by [17; 48; 49].

A confusion matrix table is presented as Table 7.1. The diagonal numbers a and d count the correct decisions: retrieve a document when it is relevant, and do not retrieve it when it is non-relevant. The numbers b and c count the incorrect cases.

                   Relevant    Non-relevant
  Retrieved           a             b
  Not retrieved       c             d

Table 7.1: Confusion matrix of two classes when considering the retrieval of documents

Precision and Recall measure the degree to which the system presents relevant information: Precision is defined as the ratio of relevant items selected to the number of items selected, while Recall is the ratio of relevant items selected to the total number of relevant items available.
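Written out in terms of the confusion-matrix entries above, these ratios, together with the fallout used by the ROC analysis below, are:

\[ \mathrm{Precision} = \frac{a}{a+b}, \qquad \mathrm{Recall} = \frac{a}{a+c}, \qquad \mathrm{Fallout} = \frac{b}{b+d} \]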

A ROC curve plots recall against fallout. The objective of ROC curve analysis is to return all of the relevant documents without returning the irrelevant ones; it does so (see Fig. ) by maximizing recall (called the true positive rate) while minimizing fallout (the false positive rate).

The F-measure is also often used to combine precision and recall [39].

8.3 Rank Accuracy Metrics


These metrics are used in recommenders based on the display of an ordered list of elements, where the highest-ranked items are preferred.

Figure 7.1: Diagrammatic representation of precision-recall


Chapter 9
Results and Explanation
Comparison of User-based and Item-based methods

Figure 8.1: User based Vs Item based varying training data set size

Figure 8.1 is a comparison graph between user-based and item-based recommendation on the MovieLens data set. The graph shows the prediction error, measured as the Average Absolute Difference, for increasing training-set sizes. When there is very little training data, both the user-based and the item-based method give a high error value; the significance is that both recommendation methods suffer from the cold-start problem at the initial stage. But as the training-set size increases, both methods give apparently good results (less error). It is noticeable that the item-based method gets saturated after some point, while the user-based method can still improve its performance.

Comparison of different similarity measures

Figure 8.2: Evaluation for similarity methods

Six different similarity methods are compared here at different threshold-neighborhood values. The Tanimoto similarity method cannot compute any prediction beyond the 0.2 threshold value, so its curve coincides with the x-axis. Euclidean similarity produces the least error and is the best among all. The LogLikelihood and Pearson methods give almost the same error rate, but at high thresholds LogLikelihood produces less error. We know that Pearson correlation cannot produce a recommendation when there is only a single co-rated value; since there is sparsity of rating values in the MovieLens data set, this may be the reason behind its worse performance. Spearman correlation produces the highest error rate, and its error rate increases along with the threshold-neighborhood size. Uncentered cosine similarity produces less error than Spearman correlation.

Effect of Nearest Neighborhood on CF recommendation


Here we compare the Average Absolute Difference for different similarity metrics using a nearest neighborhood in user-based CF. Some values are not a number, or undefined, and are denoted by Java's NaN symbol. In user-based collaborative filtering there are two approaches to forming the neighborhood: a fixed number of nearest neighbors, considered here, and a similarity threshold, considered in the next section. In Table 8.1 we can see the different error values for varying fixed-size neighborhoods applied to three different similarity methods. At a neighborhood size of 1 no similarity method can provide a prediction; therefore the implementation returns "not a number".

  Similarity       n=1   n=3   n=5   n=7   n=9   n=11  n=13  n=15  n=17  n=19
  Pearson          NaN   0.89  0.83  0.92  0.77  0.90  0.80  0.87  0.90  0.87
  Loglikelihood    NaN   1.02  0.83  0.84  0.83  0.88  0.91  0.84  0.86  0.82
  Euclidean        NaN   0.68  0.78  0.86  0.77  0.83  0.85  0.80  0.71  0.76

Table 8.1: Comparison with different similarity metrics using nearest-neighborhood

Figure 8.3: Evaluation for Nearest Neighborhood

In the graph that space is likewise blank: at neighborhood = 1, user-based collaborative filtering faces the cold-start problem. But as the number of neighbors increases we get error values, meaning that user-based collaborative filtering starts predicting. At a neighborhood size of seventeen we get the lowest error value with Euclidean-based similarity. The Euclidean distance similarity metric may be a little better than Pearson, though their results are quite similar. It also appears that using a small neighborhood is better than a large one; the best evaluations occur when using a neighborhood of seventeen people. From this we can also consider that in the MovieLens data set users' preferences may be truly quite personal, and incorporating too many others in the computation does not help.
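The error values in Table 8.1 can be produced with Taste's built-in evaluator. The following is a minimal sketch, assuming the converted CSV file from before; AverageAbsoluteDifferenceRecommenderEvaluator computes the average absolute difference, the 0.9 argument trains on 90% of each user's ratings, and 1.0 evaluates on all users:

    import java.io.File;

    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
    import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
    import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public class EvaluatorExample {
      public static void main(String[] args) throws Exception {
        DataModel model = new FileDataModel(new File("ml-100k.csv"));
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        RecommenderBuilder builder = new RecommenderBuilder() {
          @Override
          public Recommender buildRecommender(DataModel model) throws TasteException {
            UserSimilarity similarity = new EuclideanDistanceSimilarity(model);
            // n = 17 gave the lowest error in Table 8.1
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(17, similarity, model);
            return new GenericUserBasedRecommender(model, neighborhood, similarity);
          }
        };
        // average absolute difference between predicted and held-out true ratings
        double score = evaluator.evaluate(builder, null, model, 0.9, 1.0);
        System.out.println("AAD = " + score);
      }
    }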

Effect of Threshold Neighborhood on CF recommendation


  Similarity       t=0.1  t=0.2  t=0.3  t=0.4  t=0.5  t=0.6  t=0.7  t=0.8  t=0.9
  Pearson          0.79   0.80   0.79   0.78   0.81   0.81   0.87   0.89   0.87
  Loglikelihood    0.80   0.80   0.81   0.80   0.81   0.80   0.82   0.80   0.80
  Euclidean        0.79   0.79   0.81   0.74   0.74   0.75   0.87   0.86   0.84

Table 8.2: Comparison with different similarity metrics using threshold neighborhood

Table 8.2 holds the error values produced by the Average Absolute Difference at different threshold neighborhoods using user-based collaborative filtering on the MovieLens data set. It is noticeable that at threshold values 0.4 and 0.5 the Euclidean distance similarity method produces less error than the other metrics.

Fig. 8.4 shows the graphical representation of the values present in Table 8.2. The line representing Euclidean distance similarity dips at the 0.4 to 0.5 positions, meaning that at these threshold neighborhoods it gives good prediction values. But as the neighborhood increases, all three similarity methods produce higher error rates, i.e. inaccurate prediction values. LogLikelihood generates a roughly constant error rate, and the prediction values of Pearson correlation are not as good as those of Euclidean distance similarity. We also notice that taking a larger neighborhood does not give better results. A threshold neighborhood can be swapped in with a one-line change to the evaluator sketch above.
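ThresholdUserNeighborhood is the stock Taste implementation for this; the fragment below replaces the NearestNUserNeighborhood line inside buildRecommender above (the 0.4 cutoff mirrors the best-performing column of Table 8.2):

    // keep every user whose similarity to the target user is at least 0.4
    // (import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood)
    UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.4, similarity, model);
    return new GenericUserBasedRecommender(model, neighborhood, similarity);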

Figure 8.4: Evaluation for Threshold Neighborhood

Performance of UBCF in Jester data set

Figure 8.5: Performance of UBCF and IBCF against Jester

Four algorithms are applied to the Jester data set: user-based collaborative filtering, item-based collaborative filtering, popular (top-k) recommendation, and random recommendation. Here we consider six given-item counts (jokes = 1, 3, 5, 10, 15, 20) and evaluate them using the splitting method with 90% training data and goodRating = 5.

Figure 8.6: Performance of UBCF and IBCF against Jester

In user-based collaborative filtering, cosine similarity with a nearest neighborhood of 50 is considered. Since the rating range is +10 to -10, for evaluation purposes we have to consider a goodRating value of 5. Again, user-based collaborative filtering gives a higher precision-recall value than popular and item-based collaborative filtering. The structure of the Jester and MovieLens data sets is similar in nature, but their rating ranges are dissimilar. So the inference is that user-based collaborative filtering again gives a higher precision-recall value than the other methods, and the dissimilarity of the rating ranges makes no difference to its performance. Random recommendation is the worst of the four, while popular recommendation can be placed right after UBCF: jokes rated highly are usually liked by everyone and are safe recommendations, though this might be different for other data sets. IBCF finds what is good between the items, but it may fail for users whose taste differs from the general one. Another point is that the ROC curve cannot explain all viewpoints: UBCF does better but is more expensive at recommendation time, since it uses the whole rating matrix, whereas IBCF stores only the k closest items per item and does not need to keep the whole matrix. But if we want serendipity, UBCF does the better job.

Performance of UBCF in MSWeb data set

Figure 8.7: Performance of UBCF and IBCF against MSWeb

Here we take the MSWeb data set, a 32710 x 285 rating matrix of class binaryRatingMatrix with 98653 ratings. A portion of the data set, a 100 x 285 rating matrix of class binaryRatingMatrix with 1374 ratings, is taken for evaluation. We evaluate three given items with four-fold cross-validation, for four algorithms: user-based collaborative filtering, item-based collaborative filtering, popular (top-k), and random.

This is a binary rating matrix, and user-based collaborative filtering gives a high precision-recall value. We notice that in the precision-recall curve IBCF, POPULAR, and UBCF lie very close to each other. The reason is that this is a binary rating data set: users visit a web site only if they are interested in it, and there is very little chance that uninterested people visit it.


Figure 8.8: Performance of UBCF and IBCF against MSWeb


Chapter 10
Conclusion
Selecting appropriate algorithms for building a recommender system is a tricky job: there is no single way to apply these algorithms in a general manner.

Recommender systems are applied in various contexts and applications. If one method performs better on one domain, it is not guaranteed to perform the same on another system, because several factors influence the performance of the method, and sometimes the information required for prediction may not be available. Therefore, a series of experiments should be done to find the best method for a specific domain. We have done such experiments on our data sets and found the better methods for prediction, but even better methods may be available for these data sets.

For collaborative filtering, the rating matrix is the most prominent source of information. Along with the ratings, some data sets also provide side information (e.g. the gender and age of users, or the title and actors of movies). If all this extra information can be incorporated into the prediction mechanism, we will surely get better prediction values; there is a lot of future scope to improve collaborative filtering techniques.

Our evaluation was obtained on a single-node machine, but there is also provision to integrate the recommender engine into a distributed environment with the help of Hadoop.


Appendix A
Netflix Prize Competition
In 2006, the online DVD rental company Netflix announced a contest to improve the state of its recommender system. To enable this, the company released a training set of more than 100 million ratings spanning about 500,000 anonymous customers and their ratings on more than 17,000 movies, each movie being rated on a scale of 1 to 5 stars. Participating teams submit predicted ratings, and Netflix calculates a root-mean-square error (RMSE) based on the held-out truth. The first team to improve on the Netflix algorithm's RMSE performance by 10 percent or more wins a $1 million prize; if no team reaches the 10 percent goal, Netflix gives a $50,000 Progress Prize to the team in first place after each year of the competition.
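For reference, the RMSE over the N held-out ratings is the standard quantity (the symbols here are generic: p_i is a submitted prediction and r_i the corresponding true rating):

\[ \mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( p_i - r_i \right)^2 } \]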

The contest created a buzz within the collaborative filtering field. Until this point, the only publicly available data for collaborative filtering research was orders of magnitude smaller. The release of this data and the competition's allure spurred a burst of energy and activity. According to the contest website (www.netflixprize.com), more than 48,000 teams from 182 different countries have downloaded the data [50].


Bibliography

[1] F. Gedikli, D. Jannach, and M. Ge, "How should I explain? A comparison of different explanation types for recommender systems," Int. J. Hum.-Comput. Stud., vol. 72, no. 4, pp. 367–382, Apr. 2014. [Online]. Available: http://dx.doi.org/10.1016/j.ijhcs.2013.12.007

[2] D. Eck, P. Lamere, T. Bertin-Mahieux, and S. Green, "Automatic generation of social tags for music recommendation," in Advances in Neural Information Processing Systems 20, J. Platt, D. Koller, Y. Singer, and S. Roweis, Eds. Curran Associates, Inc., 2008, pp. 385–392. [Online]. Available: http://papers.nips.cc/paper/3370-automatic-generation-of-social-tags-for-music-recommendation.pdf

[3] M. Gori and A. Pucci, "Research paper recommender systems: A random-walk based approach," in Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on, Dec 2006, pp. 778–781.

[4] V. Zanardi and L. Capra, "Social ranking: Uncovering relevant content using tag-based recommender systems," in Proceedings of the 2008 ACM Conference on Recommender Systems, ser. RecSys '08. New York, NY, USA: ACM, 2008, pp. 51–58. [Online]. Available: http://doi.acm.org/10.1145/1454008.1454018

[5] X. Amatriain, N. Lathia, J. M. Pujol, H. Kwak, and N. Oliver, "The wisdom of the few: A collaborative filtering approach based on expert opinions from the web," in Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR '09. New York, NY, USA: ACM, 2009, pp. 532–539. [Online]. Available: http://doi.acm.org/10.1145/1571941.1572033

[6] M. Balabanovic and Y. Shoham, "Fab: Content-based, collaborative recommendation," Commun. ACM, vol. 40, no. 3, pp. 66–72, Mar. 1997. [Online]. Available: http://doi.acm.org/10.1145/245108.245124

[7] P. Gupta, A. Goel, J. Lin, A. Sharma, D. Wang, and R. Zadeh, "WTF: The who to follow service at Twitter," in Proceedings of the 22nd International Conference on World Wide Web, ser. WWW '13. Republic and Canton of Geneva, Switzerland: International World Wide Web Conferences Steering Committee, 2013, pp. 505–514. [Online]. Available: http://dl.acm.org/citation.cfm?id=2488388.2488433

[8] S. Spiegel, Master's thesis.

[9] H. A. Simon, The Sciences of the Artificial. MIT Press, 1996, vol. 136.

[10] J. Leino, "User factors in recommender systems: Case studies in e-commerce, news recommending, and e-learning," 2014.

[11] D. Goldberg, D. Nichols, B. M. Oki, and D. Terry, "Using collaborative filtering to weave an information tapestry," Communications of the ACM, vol. 35, no. 12, pp. 61–70, 1992.

[12] W. Hill, L. Stead, M. Rosenstein, and G. Furnas, "Recommending and evaluating choices in a virtual community of use," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM Press/Addison-Wesley Publishing Co., 1995, pp. 194–201.

[13] U. Shardanand and P. Maes, "Social information filtering: Algorithms for automating word of mouth," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM Press/Addison-Wesley Publishing Co., 1995, pp. 210–217.

[14] J. A. Konstan, B. N. Miller, D. Maltz, J. L. Herlocker, L. R. Gordon, and J. Riedl, "GroupLens: Applying collaborative filtering to Usenet news," Communications of the ACM, vol. 40, no. 3, pp. 77–87, 1997.

[15] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, "GroupLens: An open architecture for collaborative filtering of netnews," in Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work. ACM, 1994, pp. 175–186.

[16] F. Meyer, "Recommender systems in industrial contexts," arXiv preprint arXiv:1203.4487, 2012.

[17] D. Billsus and M. J. Pazzani, "Learning collaborative information filters," in ICML, vol. 98, 1998, pp. 46–54.

[18] M. Van Setten, "Supporting people in finding information: Hybrid recommender systems and goal-based structuring," 2005.

[19] Y. Shi, M. Larson, and A. Hanjalic, "Collaborative filtering beyond the user-item matrix: A survey of the state of the art and future challenges," ACM Computing Surveys (CSUR), vol. 47, no. 1, p. 3, 2014.

[20] P. Melville, R. J. Mooney, and R. Nagarajan, "Content-boosted collaborative filtering for improved recommendations," in AAAI/IAAI, 2002, pp. 187–192.

[21] K. H. Tso-Sutter, L. B. Marinho, and L. Schmidt-Thieme, "Tag-aware recommender systems by fusion of collaborative filtering algorithms," in Proceedings of the 2008 ACM Symposium on Applied Computing. ACM, 2008, pp. 1995–1999.

[22] M. Clements, P. Serdyukov, A. P. De Vries, and M. J. Reinders, "Using Flickr geotags to predict user travel behaviour," in Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2010, pp. 851–852.

[23] T. Horozov, N. Narasimhan, and V. Vasudevan, "Using location for personalized POI recommendations in mobile environments," in Applications and the Internet, 2006. SAINT 2006. International Symposium on. IEEE, 2006, 6 pp.

[24] S. Sen, J. Vig, and J. Riedl, "Tagommenders: Connecting users to items through tags," in Proceedings of the 18th International Conference on World Wide Web. ACM, 2009, pp. 671–680.

[25] P. Massa and B. Bhattacharjee, "Using trust in recommender systems: An experimental analysis," in Trust Management. Springer, 2004, pp. 221–235.

[26] J. A. Golbeck, "Computing and applying trust in web-based social networks," 2005.

[27] P. Massa and P. Avesani, "Trust-aware recommender systems," in Proceedings of the 2007 ACM Conference on Recommender Systems. ACM, 2007, pp. 17–24.

[28] M. A. Abbasi, J. Tang, and H. Liu, "Trust-aware recommender systems."

[29] J. S. Breese, D. Heckerman, and C. Kadie, "Empirical analysis of predictive algorithms for collaborative filtering," in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, ser. UAI'98. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1998, pp. 43–52. [Online]. Available: http://dl.acm.org/citation.cfm?id=2074094.2074100

[30] P. Resnick and H. R. Varian, "Recommender systems," Commun. ACM, vol. 40, no. 3, pp. 56–58, Mar. 1997. [Online]. Available: http://doi.acm.org/10.1145/245108.245121

[31] P. Lops, M. De Gemmis, and G. Semeraro, "Content-based recommender systems: State of the art and trends," in Recommender Systems Handbook. Springer, 2011, pp. 73–105.

[32] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Item-based collaborative filtering recommendation algorithms," in Proceedings of the 10th International Conference on World Wide Web. ACM, 2001, pp. 285–295.

[33] Y. Shi, M. Larson, and A. Hanjalic, "Collaborative filtering beyond the user-item matrix: A survey of the state of the art and future challenges," ACM Comput. Surv., vol. 47, no. 1, pp. 3:1–3:45, May 2014. [Online]. Available: http://doi.acm.org/10.1145/2556270

[34] G. Adomavicius and A. Tuzhilin, "Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[35] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl, "An algorithmic framework for performing collaborative filtering," in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR '99. New York, NY, USA: ACM, 1999, pp. 230–237. [Online]. Available: http://doi.acm.org/10.1145/312624.312682

[36] M. Deshpande and G. Karypis, "Item-based top-N recommendation algorithms," ACM Trans. Inf. Syst., vol. 22, no. 1, pp. 143–177, Jan. 2004. [Online]. Available: http://doi.acm.org/10.1145/963770.963776

[37] S. Owen, R. Anil, T. Dunning, and E. Friedman, Mahout in Action. Manning, 2011.

[38] D. Maltz and K. Ehrlich, "Pointing the way: Active collaborative filtering," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM Press/Addison-Wesley Publishing Co., 1995, pp. 202–209.

[39] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl, "Evaluating collaborative filtering recommender systems," ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[40] S. Braynov, "Personalization and customization technologies," The Internet Encyclopedia, 2003.

[41] J. Schreck, Security and Privacy in User Modeling. Springer Science & Business Media, 2003, vol. 2.

[42] M. Hahsler, "recommenderlab: A framework for developing and testing recommendation algorithms," Nov. 2011.

[43] J. S. Breese, D. Heckerman, and C. Kadie, "Empirical analysis of predictive algorithms for collaborative filtering," in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., 1998, pp. 43–52.

[44] F. H. del Olmo and E. Gaudioso, "Evaluation of recommender systems: A new approach," Expert Systems with Applications, vol. 35, no. 3, pp. 790–804, 2008.

[45] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl, "An algorithmic framework for performing collaborative filtering," in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 1999, pp. 230–237.

[46] K. Goldberg, T. Roeder, D. Gupta, and C. Perkins, "Eigentaste: A constant time collaborative filtering algorithm," Information Retrieval, vol. 4, no. 2, pp. 133–151, 2001.

[47] J. Mills et al., "Factors determining the performance of indexing systems," Volume I: Design, Volume II: Test Results, ASLIB Cranfield Project; reprinted in Sparck Jones & Willett, Readings in Information Retrieval, 1966.

[48] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Analysis of recommendation algorithms for e-commerce," in Proceedings of the 2nd ACM Conference on Electronic Commerce. ACM, 2000, pp. 158–167.

[49] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Application of dimensionality reduction in recommender system - a case study," DTIC Document, Tech. Rep., 2000.

[50] Y. Koren, R. Bell, and C. Volinsky, "Matrix factorization techniques for recommender systems," Computer, no. 8, pp. 30–37, 2009.
