Recommendation Systems
Antonio Bahamonde
Centro de Inteligencia Artificial
Universidad de Oviedo at Gijón, Spain
Spanish Association for Artificial Intelligence
From 9/2012 to 7/2013:
Visiting Professor at Cornell University
Contents
Recommendation Systems
RS: Definitions
RS help to match users with items
"RS are software agents that elicit the interests and preferences of individual consumers [...] and make recommendations accordingly."
Different system designs / paradigms:
Based on availability of exploitable data
Implicit and explicit user feedback
Domain characteristics
Retrieval perspective
Recommend items from the Long Tail: users did not know about their existence
Predict to what degree users like an item
Most popular evaluation scenario in research
Interaction perspective
Give users a "good feeling"
Educate users about the product domain
Popular RS
Recently back in the spotlight, since Apple suggested using it instead of the failed new Maps in iOS 6.
Bestseller lists
Top 40 music lists
The recent returns shelf at the library
Unmarked but well-used paths through the woods
Most popular lists in newspapers
Many weblogs
Read any good books lately?
Collaborative Filtering
The most prominent approach to RS
used by large, commercial e-commerce sites
well understood; various algorithms and variations exist
applicable in many domains (books, movies, DVDs, ...)
Approach
use the "wisdom of the crowd" to recommend items (crowdsourcing)
users give ratings to catalog items (implicitly or explicitly)
customers who had similar tastes in the past will have similar tastes in the future
Collaborative Filtering
Approaches:
neighborhood
latent factor
The running example (user-user and item-item neighborhoods use the same data): a user-item rating matrix, where "?" marks the rating to predict:

         item1  item2  item3  item4  item5
alice      5      3      4      4      ?
user1      3      1      2      3      3
user2      4      3      4      3      5
user3      3      3      1      5      4
user4      1      5      5      2      1

user-user: compare users (rows); item-item: compare items (columns).
cosine similarity:

cos(a, b) = (a · b) / (‖a‖ ‖b‖)

user-user prediction (a similar formula holds for item-item):

b(u, i) = r̄(u) + Σ_{u'} sim(u, u') · (r(u', i) − r̄(u')) / Σ_{u'} sim(u, u')

[slide shows computed similarity values: 0.853, 0.707, 0, −0.792]

Worked example, predicting alice's rating for item5 from the two most similar users:

b(alice, item5) = 3.25 + ((5 − 3.2) · 0.969 + (4 − 3.4) · 0.582) / (0.969 + 0.582) ≈ 4.6
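The worked example above can be sketched in Python. This is a minimal mean-centered user-user predictor; the similarity weights (0.969, 0.582) and the user means (3.25, 3.2, 3.4) are taken straight from the slide's example, and the function name is mine.

```python
def predict_user_user(r_bar_u, neighbors):
    """Mean-centered user-user CF prediction.

    r_bar_u:   the target user's mean rating
    neighbors: list of (similarity, neighbor's rating of the item,
               neighbor's mean rating) triples
    """
    num = sum(sim * (r - r_bar) for sim, r, r_bar in neighbors)
    den = sum(sim for sim, _, _ in neighbors)
    return r_bar_u + num / den

# alice's mean is 3.25; the two most similar users rated item5 with 5 and 4.
b = predict_user_user(3.25, [(0.969, 5, 3.2), (0.582, 4, 3.4)])
print(round(b, 2))  # 4.6
```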
Scalability
user-user is a memory-based method
with millions of users it does not scale for most real-world scenarios
item-item
is model-based
models learned offline are stored for predictions at run-time
allows explanations
no cold start problem
not all neighbors should be taken into account (similarity thresholds)
not all items are rated (similarities are computed over co-rated items only)
the loss function is not involved in fitting the model
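One way the co-rating and threshold points might look in code; a sketch assuming cosine similarity over co-rated entries only (the matrix is the slides' running example; the threshold value and function name are illustrative):

```python
import numpy as np

# Rating matrix from the slides; np.nan marks the missing rating.
R = np.array([[5, 3, 4, 4, np.nan],
              [3, 1, 2, 3, 3],
              [4, 3, 4, 3, 5],
              [3, 3, 1, 5, 4],
              [1, 5, 5, 2, 1]], dtype=float)

def item_similarity(R, i, j, threshold=0.0):
    """Cosine similarity between items i and j, using only the users
    who rated both (co-rated). Returns None below the threshold."""
    co = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
    a, b = R[co, i], R[co, j]
    sim = float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return sim if sim > threshold else None
```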
Netflix Prize
(Sep 21, 2009): "Netflix Awards $1 Million Prize and Starts a New Contest" [...] try to predict what movies particular customers would prefer. "Accurately predicting the movies Netflix members will love is a key component of our service", said Neil Hunt, chief product officer (Netflix).
Netflix Prize
The Netflix dataset
more than 100 million movie ratings (1-5 stars)
collected between Nov 11, 1999 and Dec 31, 2005
480,189 users and n = 17,770 movies
99% of the possible ratings are missing
each movie has on average 5,600 ratings
each user rates on average 208 movies
training and quiz (test-prize) data
Netflix Prize
The loss function: root mean squared error (RMSE)
RMSE = sqrt( (1 / |Quiz|) · Σ_{(u,i)∈Quiz} (r(u, i) − b(u, i))² )
Netflix had its own system, Cinematch, which achieved RMSE 0.9514. The prize was awarded to a system that reached RMSE below 0.8563 (a 10% improvement).
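The RMSE above is straightforward to compute; a small sketch (the rating pairs in the example call are made up for illustration):

```python
import math

def rmse(pairs):
    """pairs: iterable of (actual rating r(u,i), predicted rating b(u,i))."""
    pairs = list(pairs)
    return math.sqrt(sum((r - b) ** 2 for r, b in pairs) / len(pairs))

# Example quiz set: four (actual, predicted) pairs.
print(round(rmse([(5, 4.5), (3, 3.5), (4, 4.0), (1, 2.0)]), 4))  # 0.6124
```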
Netflix Prize
Final leaderboard:
   Team                                   RMSE    Date
1  BellKor's Pragmatic Chaos              0.8567  26/07/09
2  The Ensemble                           0.8567  26/07/09
3  Grand Prize Team                       0.8582  10/07/09
4  Opera Solutions and Vandelay United    0.8588  10/07/09
Yehuda Koren, Robert M. Bell: Advances in Collaborative Filtering. Recommender Systems Handbook 2011: 145-186.
3 major components: baseline predictors, temporal dynamics / implicit feedback, and matrix factorization
Baseline approach
Average rating in Netflix: 3.7
Joe is critical: rates 0.3 less than the average
Titanic: rated 0.5 more than the average (over all users)
The parameters bu and bi indicate the observed deviations of user u and item i, respectively, from the average. For example, suppose we want a baseline prediction for the rating of the movie Titanic by user Joe. Say the average rating over all movies, μ, is 3.7 stars; Titanic is better than an average movie and tends to be rated 0.5 stars above the average; Joe is a critical user who tends to rate 0.3 stars lower than the average. The baseline prediction for Titanic's rating by Joe is then 3.7 − 0.3 + 0.5 = 3.9 stars.

The equations sound appealing, but Koren and Bell propose to estimate bu and bi by solving a least squares problem:

min_b Σ_{(u,i)∈K} (rui − μ − bu − bi)² + λ1 (Σ_u bu² + Σ_i bi²)

where K = {(u, i) | rui is known} (in the Netflix data 99% of the possible ratings are missing, because each user rated only a small portion of the movies) and the last term is a regularization term to avoid overfitting. R(u) denotes the set of items rated by user u; likewise, R(i) denotes the set of users who rated item i; N(u) denotes the set of items for which user u provided an implicit preference (items that he rented/purchased/watched, etc.).
Baseline approach
In the Netflix data
the average rating (μ) = 3.6
the learned user biases (bu) average 0.044; their absolute value (|bu|) averages 0.32
the item (movie) biases (bi) average −0.26 (s.d. 0.48); (|bi|) averages 0.43
RMSE: baseline 0.9799, Cinematch 0.9514, Prize 0.8567
Improvements
temporal dynamics (temporal drift of preferences)
implicit feedback (movie rental history was not available; rated movies were used instead!)
                              RMSE
baseline                      0.9799
baseline + temporal           0.9771
baseline + temporal (spline)  0.9603
Cinematch                     0.9514
Prize                         0.8567
Matrix factorization
Tries to capture the relationships between users and items
Based on a well-known algebraic decomposition of matrices, used before in Information Retrieval (LSI)
Intended idea: consider latent variables
As implemented by Koren and Bell, this approach won the Netflix Prize
Matrix factorization
Transforms both items and users into a feature space of lower dimensionality (k), the latent space
Tries to explain ratings by characterizing both items and users on factors automatically learned from data
Factors might measure aspects such as comedy, drama, amount of action, ...
Efficient offline implementation
Admits improvements for temporal drift and implicit feedback
The SVD of the rating matrix M:

M = U S Itemsᵀ,  with S = diag(s1, ..., sn),  s1 ≥ s2 ≥ ... ≥ sn ≥ 0

Writing √S = diag(√s1, ..., √sn), then

M = (U √S) (√S Itemsᵀ)

Rank-k approximation (keep only the k largest singular values; Uk is #Users × k, Itemsk is #Items × k):

Mk = Uk Sk Itemskᵀ = (Uk √Sk) (√Sk Itemskᵀ)
Matrix factorization
% fill missing values with 0
% fix k <= rank(M)
U_rep = Uk * sqrt(Sk);          % call pu the u-th row of U_rep
Items_rep = Itemsk * sqrt(Sk);  % call qi the i-th row of Items_rep

Adding in the aforementioned baseline predictors, a rating is predicted by the rule

r̂ui = μ + bi + bu + qiᵀ pu
Matrix factorization
In this case M (with the learned biases; μ = 3.20):

         item1  item2  item3  item4  item5 |   bu
alice      5      3      4      4      ?   |  0.80
user1      3      1      2      3      3   | -0.80
user2      4      3      4      3      5   |  0.60
user3      3      3      1      5      4   |  0.00
user4      1      5      5      2      1   | -0.40
bi        0.00  -0.20   0.00   0.20   0.00
Matrix factorization
M = rui - μ - bu - bi;  (the missing entry set to 0)

         item1  item2  item3  item4  item5
alice     1.0   -0.8    0.0   -0.2    0.0
user1     0.6   -1.2   -0.4    0.4    0.6
user2     0.2   -0.6    0.2   -1.0    1.2
user3    -0.2    0.0   -2.2    1.6    0.8
user4    -1.8    2.4    2.2   -1.0   -1.8
Matrix factorization
[U S Items] = svd(M);

[slides show the full factor matrices U, S and Items]
Matrix factorization
For k = 2

Uk =
  -0.143   0.348
  -0.290   0.185
  -0.097   0.478
  -0.406  -0.762
   0.849  -0.188

Sk =
   4.971   0.000
   0.000   2.591

Itemsk =
  -0.359   0.403
   0.515  -0.478
   0.575   0.496
  -0.300  -0.581
  -0.431   0.159
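The factorization above can be reproduced with NumPy's SVD on the bias-corrected matrix (the missing entry filled with 0; the columns beyond item1 are derived from the slides' biases). Note that SVD factors are unique only up to sign, so the computed factors may differ from the slides' by a column-wise sign flip.

```python
import numpy as np

# Bias-corrected ratings r - mu - bu - bi from the slides (missing -> 0).
M = np.array([[ 1.0, -0.8,  0.0, -0.2,  0.0],
              [ 0.6, -1.2, -0.4,  0.4,  0.6],
              [ 0.2, -0.6,  0.2, -1.0,  1.2],
              [-0.2,  0.0, -2.2,  1.6,  0.8],
              [-1.8,  2.4,  2.2, -1.0, -1.8]])

U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
Mk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # best rank-k approximation of M

# Split the singular values between the two factors: Mk = P @ Q.T
P = U[:, :k] * np.sqrt(s[:k])             # user factors p_u (rows)
Q = Vt[:k].T * np.sqrt(s[:k])             # item factors q_i (rows)
```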
Matrix factorization
k = 2

[slide: 2-D plot of the latent space, showing the users (Alice, U1..U4) and the items (It1..It5) as points in the plane]
Matrix factorization
However, the SVD requires a complete matrix. Earlier works relied on imputation:
it increases enormously the amount of data to be handled
data is distorted due to inaccurate imputations
Matrix factorization

The model is created by also adding in the aforementioned baseline predictors that depend on the user or item. Thus, a rating is predicted by the rule

r̂ui = μ + bi + bu + qiᵀ pu

where pu and qi are vectors of k components, inspired by the rows and columns of the SVD full-matrix approach. To avoid the problem of missing values, Koren and Bell set an optimization problem that admits an efficient solution: to learn the model parameters (bu, bi, pu and qi) we minimize the regularized squared error

min_{b,q,p} Σ_{(u,i)∈K} (rui − μ − bi − bu − qiᵀ pu)² + λ4 (bi² + bu² + ‖qi‖² + ‖pu‖²)

The constant λ4, which controls the extent of regularization, is usually determined by cross validation. Minimization is typically performed by either stochastic gradient descent or alternating least squares.
Matrix factorization
The optimization problem can be solved
Alternating least squares techniques rotate between
fixing the pu's to solve for the qi's, and
fixing the qi's to solve for the pu's
(each is a quadratic problem that can be optimally solved)
Stochastic Gradient Descent
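A sketch of the alternating least squares idea described above, with the biases omitted for brevity; the toy data, λ value and function names are illustrative. Each half-step solves a small ridge regression exactly, so the regularized objective never increases.

```python
import numpy as np

def als_half_step(R, known, P, Q, lam):
    """Solve exactly for each row of P, holding Q fixed (ridge regression)."""
    k = P.shape[1]
    for u in range(R.shape[0]):
        idx = np.nonzero(known[u])[0]          # items rated by u
        A = Q[idx].T @ Q[idx] + lam * np.eye(k)
        P[u] = np.linalg.solve(A, Q[idx].T @ R[u, idx])
    return P

def objective(R, known, P, Q, lam):
    E = (R - P @ Q.T)[known]                   # error on known ratings only
    return float((E ** 2).sum() + lam * ((P ** 2).sum() + (Q ** 2).sum()))

rng = np.random.default_rng(0)
R = np.array([[5, 3, 4, 4, 0],                 # 0 marks the missing rating
              [3, 1, 2, 3, 3],
              [4, 3, 4, 3, 5],
              [3, 3, 1, 5, 4],
              [1, 5, 5, 2, 1]], dtype=float)
known = R > 0
P = rng.normal(size=(5, 2))
Q = rng.normal(size=(5, 2))
before = objective(R, known, P, Q, 0.1)
for _ in range(10):
    P = als_half_step(R, known, P, Q, 0.1)     # fix Q, solve for P
    Q = als_half_step(R.T, known.T, Q, P, 0.1) # fix P, solve for Q
after = objective(R, known, P, Q, 0.1)
```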
Matrix factorization
The optimization problem can be solved
Stochastic Gradient Descent
γ = 0.005;  λ4 = 0.02
for rui ∈ K do
    r̂ui = μ + bi + bu + qiᵀ pu
    eui = rui − r̂ui
    bu ← bu + γ (eui − λ4 bu)
    bi ← bi + γ (eui − λ4 bi)
    qi ← qi + γ (eui pu − λ4 qi)
    pu ← pu + γ (eui qi − λ4 pu)
end for
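The SGD loop can be sketched in Python on the running toy matrix; γ and λ4 follow the slide's values, while the epoch count, the initialization scale and the variable names are my choices.

```python
import numpy as np

rng = np.random.default_rng(0)
R = np.array([[5, 3, 4, 4, np.nan],            # toy matrix from the slides
              [3, 1, 2, 3, 3],
              [4, 3, 4, 3, 5],
              [3, 3, 1, 5, 4],
              [1, 5, 5, 2, 1]], dtype=float)
known = list(zip(*np.nonzero(~np.isnan(R))))   # the set K of known ratings
mu = np.nanmean(R)

k = 2
bu = np.zeros(R.shape[0])
bi = np.zeros(R.shape[1])
P = 0.1 * rng.normal(size=(R.shape[0], k))     # p_u rows
Q = 0.1 * rng.normal(size=(R.shape[1], k))     # q_i rows
gamma, lam = 0.005, 0.02                       # slide's gamma and lambda4

def train_rmse():
    errs = [R[u, i] - (mu + bu[u] + bi[i] + Q[i] @ P[u]) for u, i in known]
    return float(np.sqrt(np.mean(np.square(errs))))

before = train_rmse()
for _ in range(200):
    for u, i in known:
        e = R[u, i] - (mu + bu[u] + bi[i] + Q[i] @ P[u])
        bu[u] += gamma * (e - lam * bu[u])
        bi[i] += gamma * (e - lam * bi[i])
        qi_old = Q[i].copy()                   # update both with old values
        Q[i] += gamma * (e * P[u] - lam * Q[i])
        P[u] += gamma * (e * qi_old - lam * P[u])
after = train_rmse()
```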
Considering the set R(u) of items that each user u has rated as an implicit feedback:

γ = 0.007;  λ5 = 0.005;  λ6 = 0.015
repeat
    for rui ∈ K do
        r̂ui = μ + bi + bu + qiᵀ (pu + |R(u)|^(−1/2) Σ_{j∈R(u)} yj)
        eui = rui − r̂ui
        bu ← bu + γ (eui − λ5 bu)
        bi ← bi + γ (eui − λ5 bi)
        qi ← qi + γ (eui (pu + |R(u)|^(−1/2) Σ_{j∈R(u)} yj) − λ6 qi)
        pu ← pu + γ (eui qi − λ6 pu)
        for j ∈ R(u) do
            yj ← yj + γ (eui |R(u)|^(−1/2) qi − λ6 yj)
        end for
    end for
    γ ← 0.9 γ
until convergence   % around 30 iterations
This model, with the implicit-feedback terms yj, is known as SVD++.
RMSE = 0.8567
Remember that the second team (The Ensemble) reached the same score. The victory was awarded to BellKor's Pragmatic Chaos (the team of Koren and Bell) since their results were submitted 20 minutes earlier.
Playlists:
Shuo Chen, Joshua Moore, Douglas Turnbull, Thorsten Joachims, Playlist Prediction via Metric Embedding, ACM Conference on Knowledge Discovery and Data Mining (KDD), 2012.
References
Yehuda Koren, Robert M. Bell: Advances in Collaborative Filtering. Recommender Systems Handbook 2011: 145-186.
William W. Cohen: Collaborative Filtering: A Tutorial. Carnegie Mellon University.
Dietmar Jannach, Gerhard Friedrich: Tutorial: Recommender Systems. International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, July 17, 2011. (TU Dortmund, Alpen-Adria-Universität Klagenfurt)