Several approaches to collaborative filtering have been studied, but seldom have studies been reported for large (several million users and items) and dynamic (the underlying item set is continually changing) settings.
Research scientist Mayur Datar describes our approach to collaborative filtering for generating personalized recommendations for users of Google News.
We generate recommendations using three approaches: collaborative filtering using MinHash clustering, Probabilistic Latent Semantic Indexing (PLSI), and covisitation counts. We combine recommendations from different algorithms using a linear model.
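The following is a simplified sketch of two of these approaches, MinHash clustering and covisitation counts, plus their linear combination. It assumes click histories are sets of story IDs. The function names, the MinHash parameters `p` and `q`, the use of SHA-256 as a seeded hash, and the blend weights are hypothetical choices made for illustration; this is not Google's implementation. PLSI is omitted, and covisitation here is a plain co-click count with no time window or decay.

```python
import hashlib
from collections import defaultdict
from itertools import permutations


def minhash(items, seed):
    """One MinHash value: the smallest seeded hash over a user's clicked items."""
    return min(
        int(hashlib.sha256(f"{seed}:{item}".encode()).hexdigest(), 16) for item in items
    )


def minhash_clusters(histories, p=2, q=3):
    """Put each user into q clusters; each cluster key concatenates p MinHash values.

    Two users get the same single MinHash value with probability equal to the
    Jaccard similarity of their click histories, so similar users tend to collide.
    """
    clusters = defaultdict(set)
    for user, items in histories.items():
        for j in range(q):
            key = (j,) + tuple(minhash(items, j * p + i) for i in range(p))
            clusters[key].add(user)
    return clusters


def minhash_score(user, item, histories, clusters):
    """Sum over the user's clusters of the fraction of other members who clicked item."""
    score = 0.0
    for members in clusters.values():
        others = members - {user}
        if user in members and others:
            score += sum(item in histories[o] for o in others) / len(others)
    return score


def covisitation_counts(histories):
    """Count, for each ordered pair of items, how many users clicked both."""
    counts = defaultdict(int)
    for items in histories.values():
        for a, b in permutations(set(items), 2):
            counts[(a, b)] += 1
    return counts


def covisitation_score(user, item, histories, counts):
    """Sum of co-click counts between a candidate item and the user's history."""
    return sum(counts.get((seen, item), 0) for seen in histories[user])


def blended_score(scores, weights):
    """Linear model: weighted sum of the per-algorithm scores."""
    return sum(weights[name] * value for name, value in scores.items())


histories = {
    "alice": {"n1", "n2", "n3"},
    "bob": {"n1", "n2", "n3", "n4"},
    "carol": {"n5"},
    "dave": {"n1", "n2", "n3"},
}
clusters = minhash_clusters(histories)
counts = covisitation_counts(histories)
scores = {
    "minhash": minhash_score("alice", "n4", histories, clusters),
    "covisitation": covisitation_score("alice", "n4", histories, counts),
}
# Hypothetical weights, chosen only for illustration.
print(blended_score(scores, {"minhash": 1.0, "covisitation": 0.1}))
```

Users with identical histories, such as alice and dave, always land in the same clusters. Concatenating `p` values makes each cluster stricter, while repeating `q` times gives each user several chances to meet similar users.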
Our approach is content-agnostic and consequently domain-independent, making it easily adaptable for other applications and languages with minimal effort. Datar describes our algorithms and system setup in detail and reports results of running the recommendation engine on Google News. (Association for Computing Machinery)
Mayur Datar, who holds a PhD in computer science from Stanford, is a research scientist at Google Inc. and has authored publications on a range of computer software topics.
Google research scientist Mayur Datar gives an overview of what Google News is and how it works. He explains that Google News aggregates stories about each event from thousands of sources, so readers can see multiple viewpoints on any given news item.
When users sign in to Google News with their account, they can allow Google to track their searches; in return, they get a personalized home page featuring the news stories they are likely to find most interesting.
Google research scientist Mayur Datar describes the software engineering challenges for Google News. Like many other sites, Google News must serve millions of items to millions of daily users.
What sets Google News apart is its rate of item churn: users expect news stories to be updated every 10 to 15 minutes.
Google research scientist Mayur Datar compares content-based filtering with collaborative filtering.
Content-based filtering analyzes the content of each item a user likes and recommends other items with similar content. For Google News, this would mean analyzing stories by their keywords.
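To make the contrast concrete, here is a minimal sketch of keyword-based content filtering. The bag-of-words tokenizer, cosine similarity, and function names are hypothetical stand-ins chosen for illustration; Google News did not adopt this approach.

```python
import re
from collections import Counter
from math import sqrt


def keywords(text):
    """Bag of lowercase words: a crude stand-in for real keyword extraction."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two keyword-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend_by_content(liked_texts, candidates, k=2):
    """Rank candidate stories by similarity to the keywords of stories the user liked."""
    profile = Counter()
    for text in liked_texts:
        profile.update(keywords(text))
    return sorted(
        candidates,
        key=lambda cid: cosine(profile, keywords(candidates[cid])),
        reverse=True,
    )[:k]


liked = ["Central bank raises interest rates", "Interest rates and inflation outlook"]
candidates = {
    "a": "Bank signals more interest rate rises",
    "b": "Local team wins championship final",
    "c": "Inflation falls as interest rates stay high",
}
print(recommend_by_content(liked, candidates))  # ['c', 'a']
```

The method needs the text of every story. It also misses near matches such as "rate" versus "rates" unless the keyword extraction normalizes them.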
Instead, Google News has chosen to use collaborative filtering, which recommends items to a user based on what other users who liked the same items have also liked. Datar explains that the team chose this method because one of the unique strengths of Google News is its massive number of users.
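The core idea of collaborative filtering can be sketched in a few lines of Python. This sketch uses a generic user-based scheme with exact Jaccard similarity, assumed here for illustration. It is not the MinHash, PLSI, or covisitation method Google News uses, and the function names are hypothetical. The code never looks at story text, only at who clicked what.

```python
from collections import defaultdict


def jaccard(a, b):
    """Overlap between two click histories (sets of item IDs)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_collaborative(user, histories, k=3):
    """Score items the user has not seen by the similarity of the users who clicked them."""
    seen = histories[user]
    scores = defaultdict(float)
    for other, items in histories.items():
        if other == user:
            continue
        similarity = jaccard(seen, items)
        if similarity == 0:
            continue
        for item in items - seen:
            scores[item] += similarity
    # Highest score first; ties broken by item ID so the output is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


histories = {
    "alice": {"n1", "n2"},
    "bob": {"n1", "n2", "n3"},
    "carol": {"n2", "n4"},
    "dave": {"n5"},
}
print(recommend_collaborative("alice", histories))  # ['n3', 'n4']
```

Every score comes from users whose histories overlap with the target user's. A method like this therefore has more to work with as the user base grows, which is consistent with Datar's rationale.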