
Analysis of Strategies for Item Discovery in Social Sharing on the Web

Ching-man Au Yeung
NTT Communication Science Laboratories, 2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0237, Japan

auyeung@cslab.kecl.ntt.co.jp

ABSTRACT
Social sharing sites, where users share their favourite items online, have attracted great attention and created large online communities in recent years. Besides allowing users to organise and share their favourite resources, these sites also offer users a platform for exploratory search and resource discovery. While these Web sites usually present users with a list of popular or recently added items, it is not clear whether users benefit from the current mechanisms of item presentation. In this paper, we frame the above issue from the perspective of the users. Instead of asking how a Web site should present the shared items, we ask what kind of strategies a user should take in order to maximise the chance of discovering new and interesting items. In particular, we focus on how users can treat other users as information filters, and examine how a group of users can be selected such that one can benefit from their collected items. We carry out simulation experiments on datasets collected from a popular social sharing site, and show that there are strategies, other than just tracking the popular items, that expose a user to new and interesting items at an earlier time. We also discuss the implications and significance of our study for social sharing on the Web and collaborative filtering in general.

exploratory search and resource discovery, which are actions not necessarily driven by information needs. In a social sharing site, users rely on other users to discover new and interesting items. In most cases, this may involve searching by using user-specified tags, browsing popular or recent items, browsing the collections of other users, or subscribing to users who have similar interests. In general, it is common for a system to present a list of items that have received much attention recently.
However, from the perspective of the users, it is not clear whether presenting them with a list of popular items would facilitate the discovery of new and interesting items. As users are the ones who actively introduce items to the system and make them visible to other users, it is reasonable to expect that the notion of users as information filters in social sharing should be given more consideration. We frame the above issue of item discovery in social sharing sites from the perspective of the users. In other words, instead of asking how a Web site should present the shared items, we ask what kind of strategies a user should take in order to maximise the chance of discovering new and interesting items. In particular, we focus on how users can treat other users as information filters, and examine how a group of users can be selected such that one can benefit from their collected items. We are concerned with whether a particular user would be able to come across an interesting item as early as possible given different strategies. In a more general sense, this is related to the problem of suggesting potential friends to a user such that he will be able to benefit from the shared items of these friends in the future. In order to study this problem, we compare several different methods of selecting a group of users that are potentially good filters for other users. Then, by making use of a large dataset collected from Delicious, the popular social bookmarking site, we carry out simulations and study which method performs the best in bringing new and interesting items to the attention of the users as early as possible. The experiment shows that users can benefit from strategies such as following the recently active users or users that have attracted a lot of followers, rather than simply monitoring the list of popular items.
From a practical point of view, it is understandable that it would be difficult for users to identify a certain group of users to follow, and hence our results mean that a social sharing system should support users in the identification and selection of such a group of users. The rest of this paper is structured as follows. In the next section, we briefly introduce social sharing sites and discuss related work on facilitating item discovery. In Section

Keywords
Social sharing, item discovery, item adoption

1.

INTRODUCTION

Social sharing sites such as Delicious¹, Flickr² and YouTube³, where users share their favourite items online, have attracted great attention and created large online communities in recent years. Unlike traditional Web search engines, which answer user queries by employing algorithms that rank pages according to their content or link structures, these social media Web sites aggregate the preferences of a large number of users and present items based primarily on their popularity. Besides allowing users to organise and share their favourite resources on the Web, they also offer users a platform for
¹ Delicious: http://www.delicious.com/
² Flickr: http://www.flickr.com/
³ YouTube: http://www.youtube.com/


Copyright is held by the authors. Web Science Conf. 2010, April 26-27, 2010, Raleigh, NC, USA.

2.

ONLINE SOCIAL SHARING

Many online social sharing sites have become popular platforms for social interactions. Examples include Delicious for sharing bookmarks, CiteULike⁴ for sharing academic references, and LibraryThing⁵ for sharing books. These Web sites allow users to express their preferences, organise their favourite items on the Web, and share the items with other users. As the number of users grows, these Web sites become places where users can discover new and interesting items through the contributions of other users. Users make use of social sharing sites to perform exploratory search [2, 7] and resource discovery even when they do not have specific information needs. Mathes [9] remarks that the benefit of serendipity is one of the reasons why social bookmarking systems like Delicious have become so popular. In fact, in order to attract more visitors, social sharing sites do make good use of the fact that users constantly contribute new items to the system. The most straightforward method to determine whether an item is interesting is by counting how many users have adopted the item. In Delicious, URLs bookmarked by a large number of users recently will be featured on the page of popular bookmarks (although the exact mechanism is unknown).⁶ In LibraryThing, there is a list of books that received the largest number of reviews in the past.⁷ For other multimedia content, as in YouTube, a system can also present items that are currently being viewed by other users. This kind of presentation captures the zeitgeist of the online community, which is assumed to be of interest to most of the users. However, given that the number of items in a social sharing site rises very quickly, a list of popular items would usually contain only a very tiny portion of all the items a user is interested in, and the list is usually narrow and biased towards popular topics. Hence, other methods of presenting items to the users are desirable.
Of course, there have already been many proposals to tackle this problem, as can be seen in the large body of research on generating recommendations in social sharing sites [3, 8, 10, 14]. However, recommendation algorithms in general may suffer from data sparsity and scalability, which are common problems in social sharing sites that can involve a huge number of users and items. On the other hand, besides generating recommendations based on similar interests, there are also research works focusing on the asymmetric relations between users in a social network, i.e. whether a user would influence other users when they make their decision to adopt an item [5, 11, 12]. Hence, identifying users who are influential to a target user can also be crucial in improving the accuracy of recommendations. In this paper, instead of trying to improve precision and recall of recommendations, we are interested in how we can

3, we discuss item discovery and describe several strategies that would be useful in assisting users to discover new and interesting items. In Section 4, we present our empirical study using data collected from Delicious, a popular social bookmarking site. Finally, we give conclusions in Section 5.

[Figure 1: plot for Delicious of cumulated bookmarks (y-axis) against the timeline in days (x-axis, 0 to 1200).]

Figure 1: The number of bookmarks added on a set of URLs over time. These URLs all first appeared in Delicious in the first half of 2004.

make items which a user is interested in visible to the user as early as possible, by focusing on identifying groups of users that act as information filters. This represents something between computationally expensive collaborative filtering algorithms and simple methods such as ranking of items by their popularity. In fact, coming up with a list of popular items is only one of the many ways to aggregate the collective opinions of the users. Instead of considering the contributions of all users equal, we can focus on a particular group of users that would potentially allow us to get to know a different set of items, and we will discuss different possibilities in the next section.

3.

ITEM DISCOVERY

3.1

Delay in Item Adoption

⁴ CiteULike: http://www.citeulike.org
⁵ LibraryThing: http://www.librarything.com/
⁶ http://delicious.com/popular/
⁷ http://www.librarything.com/zeitgeist

Since in most cases we only know whether a user has adopted an item or not, we consider that an item is interesting to a user if the user adopts the item at some time. This is reasonable because when a user cannot assign any rating to an item, the fact that the user keeps the item in his collection suggests that it is interesting to him and he would like to refer to it again in the future. In some cases where ratings are allowed, such as in Epinions⁸, whether an item is interesting can further be judged by the scores given by the users. To facilitate item discovery, a system should try to present these interesting items to a user as soon as these items appear in the system. From the perspective of the users, this is equivalent to asking where they should look for these items. Currently, as we have discussed above, this is usually done by checking a list of popular items, or browsing all the items in the system at random. Before we discuss the other possibilities, it is useful to first take a look at how much room there is for improvement. Figure 1 shows the growth of the number of bookmarks on a set of about 60,000 URLs that were first introduced to Delicious in the first half of 2004. We can see that the number of bookmarks continues to rise even 3 years after the URLs first appeared in the system. Among these bookmarks, some are of course made by users who only joined Delicious at a later time, and some are probably made by users who only found some of these URLs useful
⁸ Epinions: http://www.epinions.com/

or interesting such that they bookmarked them at the time they did instead of earlier. However, it is also possible that many users bookmarked these URLs long after they first appeared in the system because they had not come across these URLs earlier. In fact, if we count the number of days between the day when a bookmark was made and when the URL was first introduced to Delicious (or when the specific user joined Delicious, whichever was later), about 35% of the adoptions involve a delay of more than 100 days. It is possible that URLs bookmarked on Delicious have long-lasting value, such that users only bookmark them whenever they need the information provided by the URLs. However, the fact that they have bookmarked the URLs also suggests that they would like to keep them for future reference. Therefore, we believe it is desirable to let the users come across items they are interested in at an earlier time, and the above findings suggest that there can be a lot of room for improvement. In the following, we present different methods of how this can be done by treating users as information filters.
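The delay measurement described above can be sketched roughly as follows. This is only an illustration under stated assumptions: adoptions are given as hypothetical `(user, item, day)` tuples with integer day numbers, the day an item "first appeared" is approximated by its earliest adoption in the data, and the optional `join_day` mapping and the function names are introduced here for the example and are not part of the original study.

```python
def adoption_delays(adoptions, join_day=None):
    """Return the delay in days for each (user, item, day) adoption.

    The delay is counted from the later of (a) the day the item was first
    adopted by anyone in the data and (b) the day the user joined, if known.
    """
    join_day = join_day or {}  # assumption: user -> day the user joined
    first_seen = {}
    for _user, item, day in adoptions:
        first_seen[item] = min(day, first_seen.get(item, day))
    delays = []
    for user, item, day in adoptions:
        start = max(first_seen[item], join_day.get(user, first_seen[item]))
        delays.append(day - start)
    return delays


def share_delayed(delays, threshold=100):
    """Fraction of adoptions whose delay exceeds `threshold` days."""
    if not delays:
        return 0.0
    return sum(d > threshold for d in delays) / len(delays)
```

Applied to a full bookmarking log, `share_delayed(adoption_delays(log))` would give the kind of proportion quoted in the text, subject to the approximations noted above.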

activeness of a user. Formally, the activeness of a user in a period $[t_0, t_1)$ is defined as follows:

\[ \mathrm{activeness}(u, t_0, t_1) = |\{\, i \mid (u, i, t) \in A \wedge t_0 \le t < t_1 \,\}|. \qquad (1) \]
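As a minimal sketch of Equation (1), assuming the adoption set A is held as a list of hypothetical `(user, item, time)` tuples, the activeness of a user and a simple ranking of the most active users in a recent window might look like this. The helper `most_active`, its parameters, and its alphabetical tie-breaking are illustrative assumptions rather than details given in the paper.

```python
def activeness(adoptions, user, t0, t1):
    """Number of distinct items `user` adopted at a time t with t0 <= t < t1."""
    return len({item for u, item, t in adoptions if u == user and t0 <= t < t1})


def most_active(adoptions, t, window=7, k=100):
    """Up to k users with the highest activeness in [t - window, t).

    Ties are broken by user name, which is an arbitrary choice made here.
    """
    items_by_user = {}
    for u, item, day in adoptions:
        if t - window <= day < t:
            items_by_user.setdefault(u, set()).add(item)
    ranked = sorted(items_by_user, key=lambda u: (-len(items_by_user[u]), u))
    return ranked[:k]
```

Restricting the count to a short window before the decision time reflects the point made in the text that past activity does not guarantee future activity.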

3.2.2

Following users with many followers

3.2

Strategies

The advantage of having a large community of users expressing their preferences over a set of shared items is that one user (the target user) can get to know about new and interesting items through other users. From the perspective of a user of a social sharing system, other users can be considered as filters of the large volume of information available in the system. Paying attention to different groups of users would then result in exposure to a different set of items. Of course, collaborative filtering is a technique that is based on the notion of treating users as filters. However, usually only similarity between users is considered when selecting a group of users. Here, we consider selecting a particular group of users to follow as a strategy for discovering new items. Of course, whether a user is able to identify such groups of users to follow depends on whether the social sharing site provides such information. However, we choose to view this problem from the perspective of the users as it allows a more intuitive discussion of the process of selecting a group of users as information filters. In the following we describe several strategies. First, we define some notation so that we can describe the details mathematically. Let $U$ be a set of users and $I$ be a set of items. The fact that a user adopts an item at time $t$ is represented by a tuple $(u, i, t) \in A$, where $u \in U$, $i \in I$ and $A$ is the set of adoption patterns.

The number of users who have followed the action of a particular user is another aspect by which we can judge whether we will benefit from this user. In systems in which there is an explicit social network, this can simply be the number of friends/subscribers of a particular user. For example, in the case of Epinions, we can select a group of users that are trusted by the largest number of other users. On the other hand, in systems in which an explicit social network is not present, we can consider the number of users who have adopted the same item later than the particular user. Users whose actions attract more followers should have a greater impact on the system and are thus more likely to help us discover new items. One problem with this strategy is that early users of the system would have an advantage over users who started at a later time, because the former have had a longer time to accumulate followers. Hence, we can again consider a certain time frame and only count the followers attracted by a particular user within the period. Formally, the number of followers of a user in a period $[t_0, t_1)$ is defined as follows:

\[ \mathrm{followers}(u, t_0, t_1) = \sum_{u' \in U} |\{\, u' \mid (u', i, t_x), (u, i, t_y) \in A \wedge t_x > t_y \wedge t_0 \le t_x < t_1 \,\}|. \qquad (2) \]
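One way to read Equation (2) in systems without an explicit social network is sketched below. The sketch assumes hypothetical `(user, item, time)` tuples, assumes each user adopts a given item at most once, and interprets the count as the number of distinct other users who adopted at least one of the user's items later than the user did, within the period. The paper's formula could also be read as counting follow events per item, so this interpretation is an assumption.

```python
def followers(adoptions, user, t0, t1):
    """Distinct users who adopted one of `user`'s items after `user` did,
    with the later adoption falling at a time tx where t0 <= tx < t1."""
    # Time at which `user` adopted each item (assumes one adoption per item).
    adopted_at = {item: t for u, item, t in adoptions if u == user}
    return len({
        u
        for u, item, tx in adoptions
        if u != user
        and item in adopted_at
        and tx > adopted_at[item]
        and t0 <= tx < t1
    })
```

Restricting `tx` to a window such as the last 7 days limits the advantage that early users of the system would otherwise have, as discussed above.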

3.2.3

Following predecessors

The above two types of strategies imply that whenever a strategy is chosen the same group of users would be selected regardless of which user we are focusing on. However, it is possible to perform some personalisation and come up with a group of users that is different for different target users. In particular, we can consider the set of predecessors of the target user. By predecessors, we refer to users who adopted some items before the target user did. The reason behind considering these predecessors is that since the target user is interested in items that have been adopted by these users, the items that these users adopt in the future are also likely to be interesting to the target user. Formally, the set of predecessors at time $t_0$ is defined as follows:

\[ \mathrm{predecessors}(u, t_0) = \{\, u' \mid (u', i, t_x), (u, i, t_y) \in A \wedge t_x < t_y \wedge t_y < t_0 \,\}, \qquad (3) \]

3.2.1

Following active users

and the score of a predecessor $u'$ (the number of times he precedes the target user) is defined as follows:

\[ \mathrm{score}_p(u, u', t_0) = |\{\, i \mid (u', i, t_x), (u, i, t_y) \in A \wedge t_x < t_y \wedge t_y < t_0 \,\}|. \qquad (4) \]
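Equations (3) and (4) can be computed together in one pass, as in the following sketch. It assumes hypothetical `(user, item, time)` tuples with at most one adoption per user and item; the function name and the returned dictionary layout are illustrative choices, not part of the paper.

```python
def predecessor_scores(adoptions, user, t0):
    """Map each predecessor u' of `user` at time t0 to score_p(u, u', t0).

    The keys of the result form the predecessor set of Equation (3); each
    value is the number of items u' adopted before `user` did (Equation (4)).
    """
    # Items the target user adopted before t0, with their adoption times.
    adopted_at = {item: t for u, item, t in adoptions if u == user and t < t0}
    preceded = {}
    for u, item, tx in adoptions:
        if u != user and item in adopted_at and tx < adopted_at[item]:
            preceded.setdefault(u, set()).add(item)
    return {u: len(items) for u, items in preceded.items()}
```

Sorting the result by score and keeping the top entries would give one possible personalised group of users to follow.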

The activeness of a user in a social sharing site can be measured by how many items he has adopted. The reason that an active user should be followed is obvious: since he is active, one would be more likely to encounter new items in his collection. However, the activeness of a user can be misleading if only the number of items of the user is considered. For example, a spammer who introduces a large number of uninteresting items to the system would be considered an active user. In addition, the fact that a user was active in the past does not imply that he will still be active in the future. To solve the latter problem, we can restrict the measurement of activeness to a short period of time immediately before a decision has to be made, instead of considering the all-time

3.2.4

Following like-minded users

In a system which involves interactions between a group of users and a set of items, recommendations can be generated by using collaborative filtering techniques. In particular, user-based collaborative filtering involves identifying a group of users (neighbours) whose adoption histories are similar (correlated) to that of the target user. The items from these neighbours are believed to be interesting to the target user as well because they share similar interests. For a particular

user, the set of like-minded users should overlap the set of predecessors. This is because if a user is a predecessor of the target user it implies that they share similar adoption histories. The main difference between the two groups is that for predecessors we consider the order of adoption, while for the like-minded neighbours we only care about whether two users have adopted the same items. Formally, if we have each user $u$ characterised by an item vector $v_u$ whose elements $v_{ui}$ indicate whether $u$ has adopted $i$, the similarity can be calculated using the cosine similarity measure:

\[ \mathrm{sim}(u, u') = \frac{v_u \cdot v_{u'}}{\|v_u\| \, \|v_{u'}\|}, \qquad (5) \]

[Figure 2: bar chart of Percentage of Users (y-axis, 0 to 50) for strategies A, F, PR, CF and PO, in three panels: Items=50, Items=100, Items=500.]

and the similarity can be used to weight the items that have been adopted by the respective users. The objective of selecting a group of users is to treat them as filters such that the target user can focus on the items associated with these users to find something new and interesting, instead of searching or browsing through all the items in the system. In practice, given the limited attention of a user, he may only be able to keep track of a certain number of users [4], and for each user a certain number of items (e.g. the 100 most recent items). As a result, presentation of the items would benefit from some simple ranking algorithms, such as ordering items based on how recent or how popular they are among the selected group of users, or weighting items by the relevant scores of the users depending on the strategy chosen.
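For binary item vectors, the cosine similarity of Equation (5) reduces to the size of the intersection of two item sets divided by the square root of the product of their sizes. The sketch below uses that identity and then ranks unseen items by the summed similarity of the neighbours who hold them. The weighting scheme, the function names and the parameters `k` and `n` are assumptions made for illustration; the text only states that similarity can be used to weight items.

```python
import math


def cosine_similarity(items_u, items_v):
    """Cosine similarity between two users given their sets of adopted items."""
    if not items_u or not items_v:
        return 0.0
    return len(items_u & items_v) / math.sqrt(len(items_u) * len(items_v))


def rank_items(target, collections, k=100, n=50):
    """Rank items not yet held by `target`, using the k most similar users.

    `collections` maps each user to the set of items the user has adopted.
    Each candidate item is scored by the summed similarity of the neighbours
    holding it; ties are broken by name, an arbitrary choice made here.
    """
    mine = collections[target]
    sims = {u: cosine_similarity(mine, items)
            for u, items in collections.items() if u != target}
    neighbours = sorted(sims, key=lambda u: (-sims[u], u))[:k]
    scores = {}
    for u in neighbours:
        for item in collections[u] - mine:
            scores[item] = scores.get(item, 0.0) + sims[u]
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]
```

The same ranking function could be reused for the other strategies by replacing the similarity weights with activeness, follower counts or predecessor scores.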

Figure 2: The percentage of users to whom a strategy presented the highest number of interesting items in advance. The three blocks correspond to different numbers of items obtained. Here, A stands for the activeness-based strategy, F for followers-based, PR for predecessor-based, CF for collaborative filtering, and PO for simple popularity.

common ground for comparing the different strategies mentioned above. In the dataset, we observe that activities in Delicious show a periodic rise and drop with a cycle of 7 days, which is probably because users tend to use Delicious more frequently over the weekends. Hence, in our simulation we divide the timeline into intervals of 7 days, and thus generate new groups of users and carry out the above measurement every 7 days. In particular, based on the description above, we come up with the following 5 types of strategies of following different groups of users for item discovery at a particular timestamp $t$: (1) following the recently active users (using $\mathrm{activeness}(u, t-7, t)$); (2) following users with the highest number of followers in the last 7 days (using $\mathrm{followers}(u, t-7, t)$); (3) following predecessors; (4) following like-minded users (collaborative filtering); and (5) monitoring the list of popular items. To avoid data sparsity, we carry out simulations on 20 subsets of the whole dataset. Each subset corresponds to one of the 20 most popular tags in Delicious. For each of the subsets, we randomly select 1,000 users who have adopted more than 50 items in a selected 100-day period. Then we carry out the above procedures for each user, recording the number of interesting items exposed to the user under different strategies. Then we count the number of times each strategy returns the highest number of interesting items in advance for the selected users. Each of the first 4 cases has two parameters, namely the number of users (neighbours) selected (how many users to follow), and the number of items monitored (how many items we obtain through the selected group of users).
As for the last one, we only have one parameter, which is the number of popular items to monitor. In Figure 2, for each strategy we plot the percentage of all selected users to whom the strategy presented the highest number of items interesting to the users earlier than when

4.

EMPIRICAL STUDIES

In order to compare the different strategies described above, we carry out simulations on a large dataset collected from Delicious. The dataset, described in [13], is distributed by the authors for research purposes. It contains the bookmarking activities of over 950,000 users, involving over 50 million items (URLs) and spanning the period from Sep 2003 to Dec 2007.⁹ In fact, Delicious allows users to subscribe to other users and establish a social network. This would allow a strategy of following a group of friends in one's social network. However, the dataset does not contain this information, and since it is anonymised we are not able to collect the corresponding data ourselves. Hence, in this paper we only focus on the strategies mentioned in the previous section. During the simulation, at a particular time and for a particular user, we use one of the methods described above to select a group of users, and generate a list of items from the collections of these users using simple weighting methods (e.g. by how active they are, how many followers they have, or how similar they are to the user in question, depending on the strategy under consideration). We then check whether in this list there exist items that are interesting to but not yet adopted by this user at this time. The larger the number of such items in the list, the more suitable the method is judged to be for presenting items to the users. Note that, as in many experiments on recommendation algorithms, the items that are defined to be interesting to a user are biased towards items that the user has actually adopted. There may well be other items that the user would find interesting but we can never know. However, we believe the data provides a
⁹ http://www.dai-labor.de/index.php?id=1726

[Figure 3: bar chart of Percentage of Interesting Items Found (y-axis, 0 to 40) for strategies A, F, PR, CF and PO, in three panels: Items=50, Items=100, Items=500.]

Figure 3: The average percentage of interesting items returned by different strategies for the users in the simulation.

they would have adopted the items. For example, when the number of items is 50, the activeness-based strategy returns more interesting items in advance than other strategies for 29% of the users. In Figure 3, we plot the average percentage of interesting items returned by each strategy for different numbers of items. In the experiment, the number of users to follow is set to 100. We can see that when we focus on a relatively small number of items, following active users or users with many followers would allow more users to come across interesting items earlier than if they only check the popular items. Following users who are active in the recent period seems to be very useful in helping users to discover new and interesting items, even when compared to personalised methods such as the predecessor-based strategy and collaborative filtering. Monitoring popular items only becomes better for more users when the number of items to monitor becomes very large. We believe that this result is probably due to the diversity of the list of items returned by the activeness-based strategy. Users who are recently very active would probably collect items or even introduce new items that are interesting to more users in the system. These users are also likely to adopt some items that are yet to attract the attention required to be placed in the list of popular items, and therefore it becomes possible to discover new items earlier than by monitoring the popular items. On the other hand, following users who have many followers is not as good. This may be because users with a lot of followers do not necessarily collect a lot of items; they may just happen to be a little earlier than most other users in adopting an item that would become very popular. The personalised strategies perform rather poorly, and this is probably due to the fact that data in Delicious is very sparse.
In addition, recently popular items in Delicious are much more visible than the average items, such that items interesting to the users appear in the list of popular items earlier than in the collections of a user's similar neighbours. From this experiment, we can also see that different strategies actually help different users. For example, referring to Figure 2, while the popularity-based strategy attains the highest percentage when the number of items to monitor equals 500, the activeness-based strategy is still better for about 40% of the users. This suggests that, in order to facilitate item discovery, social sharing sites should provide users with more

different ways, in particular methods that treat active users as information filters, to explore items contributed by other users. Furthermore, we are aware of the fact that spamming activities can be quite common in social sharing sites [6]. The activeness-based strategy is probably the most vulnerable to spamming because spammers (or bots deployed by spammers) can perform a lot of actions in a short time such that they can easily be selected as active users. In fact, if we consider the all-time active users, it is usually the case that we find some spammers in the list of selected users. However, if we measure activeness only in a recent period, the chance of having a spammer on the list is greatly reduced. This is because while spammers perform a lot of adoptions within a short period of time, they are not found to do so constantly. On the other hand, since a spammer is much less likely to attract many followers, to be a predecessor or to be a neighbour of any user in collaborative filtering, the other strategies are much less prone to spamming activities. However, we look forward to studying the relations between spamming activities and different strategies more thoroughly in our future work. Lastly, a rather unexpected result is that following predecessors instead of similar neighbours actually allows a user to come across more interesting items earlier. Intuitively, predecessors should be a subset of neighbours because predecessors are users who adopt similar sets of items earlier than the target user, while neighbours include all users with similar sets of items regardless of whether they came before or after the target user. Such a result suggests that there might be some asymmetric influence among the users such that some users have the tendency to adopt items from their predecessors.
In fact, we investigate this issue in another paper [1], and find that inferring influence among users helps us predict item adoption more accurately than using nearest-neighbour collaborative filtering techniques.
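The per-user measurement used in the simulation in this section (counting the items in a strategy's list that the user has not yet adopted at time t but adopts later) can be sketched as follows. This is an assumption-laden illustration: adoptions are hypothetical `(user, item, day)` tuples, the candidate list is whatever a given strategy returns at time t, and the helper `best_strategies` is introduced here only to show how the tally behind Figure 2 might be formed.

```python
def hits_in_advance(adoptions, user, candidates, t):
    """Count candidate items that `user` has not adopted before day t
    but does adopt on or after day t (i.e. items presented in advance)."""
    past = {item for u, item, day in adoptions if u == user and day < t}
    future = {item for u, item, day in adoptions if u == user and day >= t}
    return len((set(candidates) & future) - past)


def best_strategies(hits_by_strategy):
    """Names of the strategies with the highest hit count for one user.

    More than one name is returned when strategies tie.
    """
    if not hits_by_strategy:
        return []
    top = max(hits_by_strategy.values())
    return sorted(name for name, hits in hits_by_strategy.items() if hits == top)
```

In a full run, these counts would be accumulated every 7 days over the simulated timeline and over all selected users, following the setup described above.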

5.

CONCLUSIONS

In this paper, we approached the issue of item discovery from the perspective of the users, and compared several different strategies a user can adopt to increase their chances of coming across interesting items at an earlier time. While popularity has been commonly used to filter items, we find that it is not always the best way for users to discover new and interesting items. Methods such as treating active users or users with a lot of followers as filters would allow many users to come across items they are interested in at an earlier time. In other words, instead of either treating the opinion of each user as equal and presenting an aggregated list of popular items, or suggesting that users just follow their friends in a sharing network, a social sharing site can present items by filtering based on some well-defined user characteristics. While we approach the issue from the perspective of the users and discuss strategies for item discovery, from the perspective of system design these methods can be easily implemented in a social sharing site. We believe our study provides valuable insight into what a social sharing system can do to support users in item discovery. Social sharing sites, or social media sites in general, have provided new channels for both the promotion and discovery of new information. However, whether users can benefit from the contributions of other users depends significantly on how information is aggregated and presented. We believe

that the way information is presented should become more user-centric. In other words, more emphasis should be put on who contributes the items, and it is desirable to identify users who are not only active but also consistent, credible and influential on other users.

6.

ACKNOWLEDGEMENT

The author would like to thank Adam Jatowt for his valuable comments and discussions.

7.

REFERENCES

[1] C.-m. Au Yeung and T. Iwata. Capturing implicit user influence in online social sharing. In Proceedings of 21st ACM Conference on Hypertext and Hypermedia, Toronto, June 13-16, New York, NY, USA, 2010. ACM.
[2] L. J. Bannon, I. Wagner, C. Gutwin, R. H. R. Harper, and K. Schmidt. Social bookmarking and exploratory search. In Proceedings of the 10th European Conference on Computer-Supported Cooperative Work, pages 21-40. Springer, 2007.
[3] T. Bogers and A. van den Bosch. Recommending scientific articles using citeulike. In RecSys '08: Proceedings of the 2008 ACM conference on Recommender systems, pages 287-290, New York, NY, USA, 2008. ACM.
[4] H. Chun, H. Kwak, Y.-H. Eom, Y.-Y. Ahn, S. Moon, and H. Jeong. Comparison of online social relations in volume vs interaction: a case study of cyworld. In IMC '08: Proceedings of the 8th ACM SIGCOMM conference on Internet measurement, pages 57-70, New York, NY, USA, 2008. ACM.
[5] D. Crandall, D. Cosley, D. Huttenlocher, J. Kleinberg, and S. Suri. Feedback effects between similarity and social influence in online communities. In KDD '08: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 160-168, New York, NY, USA, 2008. ACM.
[6] P. Heymann, G. Koutrika, and H. Garcia-Molina. Fighting spam on social web sites: A survey of approaches and future challenges. IEEE Internet Computing, 11(6):36-45, 2007.
[7] Y. Kammerer, R. Nairn, P. Pirolli, and E. H. Chi. Signpost from the masses: learning effects in an exploratory social tag search browser. In CHI '09: Proceedings of the 27th international conference on Human factors in computing systems, pages 625-634, New York, NY, USA, 2009. ACM.
[8] I. Konstas, V. Stathopoulos, and J. M. Jose. On social networks and collaborative recommendation. In SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pages 195-202, New York, NY, USA, 2009. ACM.
[9] A. Mathes. Folksonomies - cooperative classification and communication through shared metadata, December 2004.
[10] A. Shepitsen, J. Gemmell, B. Mobasher, and R. Burke. Personalized recommendation in social tagging systems using hierarchical clustering. In RecSys '08: Proceedings of the 2008 ACM conference on Recommender systems, pages 259-266, New York, NY, USA, 2008. ACM.
[11] X. Song, Y. Chi, K. Hino, and B. L. Tseng. Information flow modeling based on diffusion rate for prediction and ranking. In WWW '07: Proceedings of the 16th international conference on World Wide Web, pages 191-200, New York, NY, USA, 2007. ACM.
[12] X. Song, B. L. Tseng, C.-Y. Lin, and M.-T. Sun. Personalized recommendation driven by information flow. In SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 509-516, New York, NY, USA, 2006. ACM.
[13] R. Wetzker, C. Zimmermann, and C. Bauckhage. Analyzing social bookmarking systems: A del.icio.us cookbook. In Proc. of Mining Social Data Workshop, collocated with ECAI 2008, pages 26-30, 2008.
[14] X. Xin, I. King, H. Deng, and M. R. Lyu. A social recommendation framework based on multi-scale continuous conditional random fields. In CIKM '09: Proceeding of the 18th ACM conference on Information and knowledge management, pages 1247-1256, New York, NY, USA, 2009. ACM.
