Supervisor: Francesco Ricci
(October 3, 2008)
In the classical collaborative filtering (CF) approach to recommendation, rating prediction is based on the similarity between the active user, to whom a recommendation has to be made, and the other users. This similarity is computed by comparing the ratings that the two users have given to a common set of products. In many cases, however, two users have co-rated only a small set of products. This is a major problem for the CF algorithm, as the reliability of the similarity assessment depends strongly on the number of co-rated products. Two users may not have rated exactly the same products, yet they may have rated similar products. If the user-to-user similarity computation could exploit this, the number of users whose similarity with the active user can be computed would increase. The objective of this thesis is to investigate this issue and to determine an effective new similarity function that better exploits the available user profile information (ratings). The final objective is to increase the prediction accuracy of the CF method. A second objective of this research is to identify extensions of the CF method that make it possible to recommend generalized products. Once a concept of product similarity has been defined, the products can be grouped according to this similarity (hence forming clusters of products), and the goal of the recommender would be to identify which clusters could be recommended to a user.
Recommending a cluster of products, e.g., products sharing some common characteristic, could help the process in many ways: the user can better understand the rationale of a recommendation, as the recommended cluster is characterized by a small set of common features; the user can better perceive the richness of the catalogue without extensive browsing; and the user is not pushed towards a non-negotiable recommendation, but has room to choose the best option among a set of suggested products, i.e., those belonging to the cluster.
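The co-rating problem described above can be illustrated with a minimal sketch. It assumes Pearson correlation as the user-to-user similarity measure (a common choice in CF, but not one named in this proposal), and the function names, the min_overlap parameter and the sample ratings are all hypothetical. The point is only that the similarity is undefined when two users share too few rated products.

```python
from math import sqrt


def co_rated(ratings_a, ratings_b):
    """Return the products rated by both users (hypothetical helper)."""
    return sorted(set(ratings_a) & set(ratings_b))


def pearson_similarity(ratings_a, ratings_b, min_overlap=2):
    """Pearson correlation computed on co-rated products only.

    Returns None when fewer than `min_overlap` products are co-rated,
    or when one user's co-rated ratings have no variance.
    `min_overlap` is an assumed threshold, not taken from the text.
    """
    common = co_rated(ratings_a, ratings_b)
    if len(common) < min_overlap:
        return None
    mean_a = sum(ratings_a[p] for p in common) / len(common)
    mean_b = sum(ratings_b[p] for p in common) / len(common)
    num = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common)
    den = sqrt(sum((ratings_a[p] - mean_a) ** 2 for p in common)) * sqrt(
        sum((ratings_b[p] - mean_b) ** 2 for p in common)
    )
    if den == 0:
        return None
    return num / den


# Illustrative (made-up) user profiles: product id -> rating.
alice = {"p1": 5, "p2": 3, "p3": 1}
bob = {"p1": 4, "p2": 3, "p3": 2, "p4": 5}
carol = {"p9": 4}  # shares no product with alice

print(pearson_similarity(alice, bob))    # defined: three co-rated products
print(pearson_similarity(alice, carol))  # None: no co-rated products
```

A similarity function that also exploits ratings on similar, but not identical, products would return a value for pairs such as alice and carol; designing such a function is the subject of the thesis and is not attempted here.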
Trip@dvice is a recommendation methodology based on Case-Based Reasoning (Ricci et al., 2006). Trip@dvice exploits content features, user preferences and the choices made by users in the past to select and rank, in a personalized way, the most suitable tourist products. Systems built on this Case-Based Reasoning methodology represent the knowledge necessary to support recommendation functionalities as a set of cases, where each case is a hierarchical XML document containing all the relevant information acquired during an interaction session with the user, such as the user's preferences and selected products. This technology has been applied in several operational web sites, including www.visiteurope.com and www.atl.biella.it.
In the past we performed only simple evaluations of the goodness of the recommendations, and it would be interesting to measure how the quality of the ranking produced by the system is influenced by the quality/quantity of the case base. In this project the goal is to try to correlate various measures that characterize the case base with the quality of the recommendation. The case base can be measured with respect to: the number of cases, the average similarity of cases, the average size of the cases, the diversity of cases, the quality of cases (external judge), and some others. The quality of the recommendation can be measured with subjective measures (surveys), and with objective measures: time to complete the task, number of page views, rate of success, or the position of the selected items in the displayed ranked list.
Hence, to complete this project, the student must identify the important measures that characterize the case base and develop procedures to extract these measures from the log data of the web application. These measures must then be correlated with the performance of the recommender system by running some evaluation sessions. The outcome must be a report describing the state of the cases and the impact of the cases on the system performance. The report should be understandable to the service provider, so that the provider can gain insight into the behavior of the system, its users and their preferences. The project will use real data coming from two portals and will be developed in collaboration with www.ectrlsolutions.com.
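As a rough illustration of how a case-base measure could be correlated with a recommendation-quality measure, the sketch below computes a Spearman rank correlation in plain Python. The choice of Spearman correlation is an assumption (the proposal does not prescribe a statistic), and the function names and sample numbers are hypothetical, not results from the portals.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; None if either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs)) * sqrt(sum((y - my) ** 2 for y in ys))
    return num / den if den else None


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up example: number of cases in the case base vs. the average
# position of the selected item in the ranked list (lower is better).
num_cases = [100, 200, 400, 800]
avg_position = [9.0, 6.5, 4.0, 2.0]
print(spearman(num_cases, avg_position))
```

With real log data, one such coefficient would be computed for each pair of case-base measure and quality measure, and the strongest relationships reported to the service provider.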
Trip@dvice is a recommendation methodology based on Case-Based Reasoning (Ricci et al., 2006). Trip@dvice exploits content features, user preferences and the choices made by users in the past to select and rank, in a personalized way, the most suitable tourist products. Systems built on this Case-Based Reasoning methodology represent the knowledge necessary to support recommendation functionalities as a set of cases, where each case is a hierarchical XML document containing all the relevant information acquired during an interaction session with the user, such as the user's preferences and selected products. This technology has been applied in several operational web sites, including www.visiteurope.com and www.atl.biella.it.
MapMobyRek is a J2ME-based conversational mobile recommender system that integrates a preference acquisition technology based on “critiquing” with map visualization technologies. It can effectively and intuitively support travelers in finding their desired products and services. The system has been developed and tested in previous projects.
The goal of this project is to provide some of the Trip@dvice and MapMobyRek functionalities to a mobile user. The major constraint is that the client of this new service (VisitFinland) wants to offer it to visitors of Finland on the largest possible number of mobile phone types. Hence the system designer decided to rely on a WAP/XHTML application architecture. This means that the new service will be implemented server-side (in a Java-based web application server) and the (thin) client must deal mainly with visualization. The student must understand both the Trip@dvice and MapMobyRek recommendation methodologies and software technologies, and work side by side with the main software architect to design this new mobile service, including its functionality and GUI. The most important functions that will be developed are: visualization of the travel plan, completion of the travel plan, and revision of the travel plan in response to new context-dependent events.
Case-based recommender systems rank items/cases using a similarity function that assigns a single numeric score to each case in the library, given a partially defined case as input, i.e., a query case. Research has focused on similarity learning, i.e., methods to adapt the similarity function to obtain a better retrieval set, i.e., a set of top-ranked items that satisfy as much as possible the user preferences modeled in the query case. A separate line of research has investigated methods to combine/aggregate a collection of rankings on a set of common items (typically web pages) to produce a common ranking that is as close as possible to the individual rankings. In fact, the two problems are closely related, and one can view similarity-based ranking as an instance of the general problem of rank aggregation. The goal of this research project is to experimentally compare similarity-based ranking and rank aggregation in some real recommendation problems. The hypothesis is that rank aggregation can increase user satisfaction, as the user can manually control the aggregation of the rankings produced by different conditions (features), and that better conversational systems can be built using this approach. The rank aggregation problem we are addressing is similar to that used to cope with the word association problem, where the goal is to retrieve (sort) the documents that are associated with the largest number of query keywords. To measure the quality of the different rankings we shall use the method proposed by Joachims, which relies on the comparison of the clicks received by items in two rankings presented to the user in a merged form.
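The difference between the two approaches can be sketched as follows. The example assumes a weighted sum of per-feature similarity scores for the similarity-based ranking and a weighted Borda count for the rank aggregation; Borda count is only one of many aggregation methods and is not named in this proposal. All function names, feature names, scores and weights are hypothetical.

```python
def rank_by_score(scores):
    """Order items by descending score; ties are broken by item name."""
    return sorted(scores, key=lambda item: (-scores[item], item))


def weighted_similarity_ranking(feature_scores, weights):
    """Similarity-based ranking: one global score per item, obtained as
    the weighted sum of its per-feature similarity scores."""
    items = list(next(iter(feature_scores.values())))
    total = {
        item: sum(weights[f] * feature_scores[f][item] for f in feature_scores)
        for item in items
    }
    return rank_by_score(total)


def borda_aggregation(feature_scores, weights):
    """Rank aggregation: each feature yields its own ranking, and an item
    earns (n - 1 - position) points per ranking, scaled by the weight."""
    items = list(next(iter(feature_scores.values())))
    n = len(items)
    points = {item: 0.0 for item in items}
    for feature, scores in feature_scores.items():
        for position, item in enumerate(rank_by_score(scores)):
            points[item] += weights[feature] * (n - 1 - position)
    return rank_by_score(points)


# Made-up local similarities of three items to the query case.
feature_scores = {
    "price": {"hotelA": 1.0, "hotelB": 0.2, "hotelC": 0.1},
    "location": {"hotelA": 0.3, "hotelB": 0.5, "hotelC": 0.4},
    "rating": {"hotelA": 0.3, "hotelB": 0.5, "hotelC": 0.4},
}
weights = {"price": 1.0, "location": 1.0, "rating": 1.0}

print(weighted_similarity_ranking(feature_scores, weights))
print(borda_aggregation(feature_scores, weights))
```

In this example the weighted sum places hotelA first because of its very high price similarity, whereas the Borda count places hotelB first because it is ranked well by every feature. Changing the entries of weights is one way a user could manually control the aggregation.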
Generation of Semi-Synthetic Context Enriched Rating Data (Bachelor)
Recommender systems are powerful tools helping on-line users to
overcome information overload. One way to improve the accuracy of the
system is to exploit contextual information related to the user and the
item. Contextual data may include information such as location, time,
weather, needs and preferences, traffic condition, etc. Contextual data
varies greatly according to the type of items.
The goal of this project is to design, develop and validate a component
for generating semi-synthetic context enriched rating data for a travel
planning recommender system. The component would combine content-based
and knowledge-based recommender system approaches to generate precise
ratings and therefore enable the recommender system to achieve good
accuracy and scalability. The users of the system will specify their
preferences about the features of some Places of Interest (POIs) in a
given context, and this information will be used to generate (predict)
their ratings for POIs they have not yet experienced, in different
contexts. The challenge here is to find a meaningful way to model how
ratings depend on as yet unseen contexts and to integrate expert
knowledge into the prediction process.
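
One possible shape for such a rating generator is sketched below. It is purely an illustrative assumption, not the model the project will develop: the user's feature preferences are taken to be weights in [0, 1], expert knowledge is represented as hypothetical multipliers per (context, feature) pair, and the rating is the adjusted share of preferred features that the POI offers, mapped to a 1 to 5 scale.

```python
def predict_rating(poi_features, preferences, context, expert_rules,
                   scale=(1.0, 5.0)):
    """Generate a rating for a POI in a given context.

    poi_features: set of feature names describing the POI
    preferences:  dict feature -> weight in [0, 1], elicited from the user
    expert_rules: dict (context, feature) -> multiplier (assumed format);
                  pairs not listed leave the preference unchanged
    Returns None when no preference remains relevant in the context.
    """
    matched = 0.0
    total = 0.0
    for feature, weight in preferences.items():
        adjusted = weight * expert_rules.get((context, feature), 1.0)
        total += adjusted
        if feature in poi_features:
            matched += adjusted
    if total == 0:
        return None
    low, high = scale
    return low + (high - low) * matched / total


# Made-up data: one user's preferences and a single expert rule saying
# that outdoor features do not count when it rains.
preferences = {"outdoor": 1.0, "museum": 0.5}
expert_rules = {("rainy", "outdoor"): 0.0}
garden = {"outdoor"}
museum = {"museum", "indoor"}

for context in ("sunny", "rainy"):
    print(context,
          predict_rating(garden, preferences, context, expert_rules),
          predict_rating(museum, preferences, context, expert_rules))
```

Here the expert rule is what allows a rating to be generated for a context in which the user never stated preferences, which is the dependency the project has to model in a principled way.
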
The generated data will be used to bootstrap a POI recommender system
for the city of Bolzano. This is an ongoing project, which aims to build a
technology for real-time revision of the recommendation list in the tourism
domain. Moreover, the data will also be used for benchmarking
context-aware Collaborative Filtering (CF) systems.
The outcome of the thesis would be i) a context-sensitive rating
prediction model for POIs, ii) a web-based system for collecting
on-line user preferences, and iii) an analysis and evaluation of the
data generation (rating prediction) procedure, through both a user study
and off-line experiments on system scalability and flexibility.