Recommender systems are an active area of research, and many different approaches have emerged. In the context of this year's ECML PKDD Discovery Challenge, BibSonomy's tag recommendations were provided by 14 different recommender systems from 10 research teams in 7 countries over the last five weeks. The challenge consisted of three tasks: the first two dealt with fixed datasets obtained from BibSonomy, while the third task was to provide tag recommendations to users in the running system.
Yesterday, during the ECML PKDD Discovery Challenge Workshop, the challenge's participants presented their recommender systems and discussed the different approaches, still unaware of the third task's winning team, which was finally announced in the evening during the conference's opening session.
Rating the Systems
Algorithms for tag recommendation are typically evaluated by computing some performance measure in an "off-line" setting, that is, by iterating over the posts of a dataset derived from a social bookmarking system and presenting only the user and the resource to the recommender system. For each post, the set of suggested tags can then be compared with the tags the user actually assigned. Participants in Task 1 and Task 2 were evaluated in such a setting.
However, these "off-line" settings not only ignore constraints of real-life applications (e.g., CPU usage and memory consumption), they also cannot capture the effect of presenting a set of recommended tags to the user. To evaluate these effects, we set up Task 3, where recommender systems were integrated into BibSonomy and had to deliver their tag recommendations within a timeout of 1000 ms.
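Such a deadline can be enforced on the integration side without trusting each recommender to be fast. The sketch below is a minimal illustration of the idea, not BibSonomy's actual mechanism; the recommender function and its arguments are hypothetical placeholders.

```python
import concurrent.futures

TIMEOUT_SECONDS = 1.0  # Task 3 allowed 1000 ms per recommendation request


def example_recommender(user, resource):
    # Hypothetical stand-in for an arbitrary (possibly slow) recommender.
    return ["web", "tools", "search"]


def recommend_with_timeout(recommender, user, resource, timeout=TIMEOUT_SECONDS):
    """Return the recommender's tags, or an empty list if it misses the deadline."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(recommender, user, resource)
        try:
            return future.result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            # A late result is simply discarded; the user sees no suggestions.
            return []


print(recommend_with_timeout(example_recommender, "alice", "http://example.org"))
```

In a production system one would additionally isolate the recommender in its own process so that a hung call cannot tie up the serving thread.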
To evaluate the different recommender systems (in the off-line settings as well as in Task 3), we calculated precision and recall for each system. Precision measures how many of the recommended tags were adequate, while recall measures how many of the tags the user actually assigned to the resource were recommended.
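For a single post, both measures reduce to a comparison of two tag sets. A minimal sketch (the example tags are invented for illustration):

```python
def precision_recall(recommended, assigned):
    """Precision and recall of a recommended tag list against the user's tags."""
    recommended, assigned = set(recommended), set(assigned)
    hits = len(recommended & assigned)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(assigned) if assigned else 0.0
    return precision, recall


# Suppose the user tagged a post with {web, search, tools, folksonomy}
# and the recommender suggested five tags, two of which match:
p, r = precision_recall(["web", "tools", "java", "blog", "music"],
                        ["web", "search", "tools", "folksonomy"])
print(p, r)  # precision 2/5 = 0.4, recall 2/4 = 0.5
```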
Figure 2 shows the final results of the on-line challenge (which is available here). For each recommender system, we calculated precision and recall, considering only the first n recommended tags (for n = 1, 2, ..., 5), and averaged over all posts. The top blue graph, for example, shows that of the corresponding recommender system's five recommended tags (the rightmost point), around 18% were chosen by the user (precision 0.18), and around 23% of the tags the user finally assigned to the resource were "predicted" by the recommender (recall 0.23).
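The points of such a precision/recall curve can be computed by truncating each recommendation list to its first n tags and averaging per-post precision and recall. A sketch under the assumption that each post is a pair of (recommended tags in rank order, tags the user assigned):

```python
def averaged_precision_recall_at_n(posts, max_n=5):
    """One (n, avg precision, avg recall) point per cutoff n = 1..max_n."""
    curve = []
    for n in range(1, max_n + 1):
        precisions, recalls = [], []
        for recommended, assigned in posts:
            top_n, assigned = set(recommended[:n]), set(assigned)
            hits = len(top_n & assigned)
            precisions.append(hits / len(top_n) if top_n else 0.0)
            recalls.append(hits / len(assigned) if assigned else 0.0)
        curve.append((n,
                      sum(precisions) / len(precisions),
                      sum(recalls) / len(recalls)))
    return curve


# Two toy posts: (ranked recommendations, tags the user actually assigned).
posts = [(["a", "b", "c"], ["a", "c"]),
         (["x", "y"], ["y", "z"])]
print(averaged_precision_recall_at_n(posts, max_n=2))
```

Plotting recall against precision for n = 1, ..., 5 yields one curve per recommender, as in Figure 2.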
The winning teams are:
- Task 1: Marek Lipczak, Yeming Hu, Yael Kollet, and Evangelos Milios (Paper)
- Task 2: Steffen Rendle and Lars Schmidt-Thieme (Paper)
- Task 3: Marek Lipczak, Yeming Hu, Yael Kollet, and Evangelos Milios (Paper)
We are happy to say that it was an interesting challenge which gave substantial insight into the performance of different approaches to the task of tag recommendation. We'd like to thank everybody who contributed to this challenge - last but not least each of BibSonomy's users.