GroupLens: Extensible Aggregation-based Filtering

In comparison to the moderator-based approach, Tapestry is more sophisticated in that it considers the input of multiple users, but it is harder to use and less efficient, since the user must consider, and the system must store, all the ratings attached to a message. It is perhaps for this reason that Tapestry restricts the raters to a single site. GroupLens [] shows that it is possible to attain the benefits of both approaches.

Like Tapestry, it allows multiple users to rate an article. Unlike Tapestry, it allows these users to be at arbitrary sites, and for each new reader of the article it aggregates all the ratings of the message into a single number that predicts how well that reader would like the message.

How does it guess a reader's preferences? The idea is simple and intuitive: how well a reader likes a message is correlated with how well other readers with similar preferences have liked it, and how similar the preferences of two readers are can be determined by correlating the ratings they have given to the same messages.

To be more precise, given two users, A and B, who have both rated some messages M1 to Mn, the correlation coefficient CAB can be computed as follows:

$$
C_{AB} = \frac{\sum_{i=1}^{n} (A_i - A_{\mathrm{mean}})(B_i - B_{\mathrm{mean}})}{\sqrt{\sum_{i=1}^{n} (A_i - A_{\mathrm{mean}})^2 \, \sum_{i=1}^{n} (B_i - B_{\mathrm{mean}})^2}}
$$

where Ai and Bi are the ratings given by A and B to message Mi, and Amean and Bmean are the means of the ratings given by A and B to the n messages.
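
The coefficient above can be sketched in a few lines of Python. This is only an illustration of the formula, not code from GroupLens: the function name `correlation` and the choice to return 0.0 when one user has given every message the same rating (which makes the denominator zero) are assumptions.

```python
from math import sqrt


def correlation(a_ratings, b_ratings):
    """Correlation coefficient CAB for two users' ratings of the same n messages.

    a_ratings[i] and b_ratings[i] must be the ratings A and B gave to message Mi.
    """
    if len(a_ratings) != len(b_ratings) or not a_ratings:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(a_ratings)
    a_mean = sum(a_ratings) / n
    b_mean = sum(b_ratings) / n
    numerator = sum((a - a_mean) * (b - b_mean)
                    for a, b in zip(a_ratings, b_ratings))
    denominator = sqrt(sum((a - a_mean) ** 2 for a in a_ratings)
                       * sum((b - b_mean) ** 2 for b in b_ratings))
    if denominator == 0:
        # Assumption: a user whose ratings never vary is treated as uncorrelated.
        return 0.0
    return numerator / denominator
```

Two users whose ratings rise and fall together get a coefficient near 1, and two users who consistently disagree get a coefficient near -1.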

Now, given some set of users S who have rated an article i that A has not read, we can predict A's rating of i as:

$$
A_{\mathrm{mean}} + \frac{\sum_{B \in S} (B_i - B_{\mathrm{mean}}) \, C_{AB}}{\sum_{B \in S} C_{AB}}
$$
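
A minimal sketch of this prediction step follows, assuming the correlation coefficients have already been computed. The function name `predict_rating` and the tuple layout of `neighbours` are hypothetical, and falling back to A's mean when the coefficients sum to zero is an assumption the formula itself does not address.

```python
def predict_rating(a_mean, neighbours):
    """Predict A's rating of an article from the users in S who rated it.

    neighbours is a list of (b_i, b_mean, c_ab) tuples, one per user B in S:
    B's rating of the article, B's mean rating, and the coefficient CAB.
    """
    numerator = sum((b_i - b_mean) * c_ab for b_i, b_mean, c_ab in neighbours)
    denominator = sum(c_ab for _, _, c_ab in neighbours)
    if denominator == 0:
        # Assumption: with no usable weight, fall back to A's own mean rating.
        return a_mean
    return a_mean + numerator / denominator


# One strongly similar user who liked the article more than usual, and one
# weakly similar user who liked it less than usual.
prediction = predict_rating(3.0, [(5, 3.0, 1.0), (2, 4.0, 0.5)])
```

Because the denominator sums signed coefficients, it can become very small when S mixes positively and negatively correlated users; dividing by the sum of absolute values would avoid that, but it is not what the formula above states.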

Users can attach pseudonyms rather than their real names to their ratings, an idea borrowed from the refereeing process typically used to evaluate conference research papers, in which each reviewer is assigned a unique pseudonym.

The GroupLens system shows how this idea can be implemented in an extensible manner on top of the current news architecture. It adds to the news clients and servers another process, called a "better bit bureau," which is responsible for implementing the aggregation semantics. A modified news client displays aggregated news ratings to its user and collects ratings from the user. It sends/receives the message directly to/from the news server and the associated rating to/from a better bit bureau. The better bit bureau strips the rating from the message and sends it to the news server. In addition, it sends the ratings in a separate message to a special newsgroup, from which other better bit bureaus can retrieve and aggregate the ratings before giving them to the news clients connected to them. To reduce message traffic, they batch all ratings produced by a user in one news session into a single message. (Is the space required to save ratings significant? If so, can we alleviate this problem, perhaps by reducing the functionality?)
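
The batching of one session's ratings into a single message can be sketched as follows. The text format used here (a `rater:` header followed by one `message-id rating` pair per line) and the names `RatingSession` and `parse_batch_message` are invented for illustration; the actual format posted to the ratings newsgroup is not described in these notes.

```python
from dataclasses import dataclass, field


@dataclass
class RatingSession:
    """Collects the ratings one pseudonymous user produces in a news session."""
    pseudonym: str
    ratings: dict = field(default_factory=dict)  # message id -> rating

    def rate(self, message_id, rating):
        # A later rating of the same message replaces the earlier one.
        self.ratings[message_id] = rating

    def to_batch_message(self):
        """Serialize the whole session as one message body (hypothetical format)."""
        lines = [f"rater: {self.pseudonym}"]
        lines += [f"{mid} {rating}" for mid, rating in sorted(self.ratings.items())]
        return "\n".join(lines)


def parse_batch_message(text):
    """Inverse of to_batch_message, as another bureau might apply on retrieval."""
    header, *rows = text.splitlines()
    pseudonym = header.split(": ", 1)[1]
    ratings = {}
    for row in rows:
        message_id, rating = row.rsplit(" ", 1)
        ratings[message_id] = int(rating)
    return pseudonym, ratings


session = RatingSession("reader-17")
session.rate("<a@example.org>", 4)
session.rate("<b@example.org>", 2)
batch = session.to_batch_message()
```

However many articles the user rates, the session produces one message, and only the pseudonym travels with the ratings.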

This approach provides a nice partitioning of function in a news system: news servers are responsible for message distribution, news clients for the user interface, and better bit bureaus for aggregation. It does not require changes to existing news servers, but it does require some changes to the news clients so that they can customize the filtering user interface for their users. By implementing the semantics of aggregation in better bit bureaus, it allows all news clients to share a common implementation of these semantics. Moreover, users can provide their own better bit bureaus to change the aggregation semantics. Thus, this system is completely extensible and is designed to reuse the message distribution facilities of existing systems.
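
One way to picture this extensibility is a bureau whose aggregation rule is a replaceable function. The sketch below is a toy in-memory model, not the GroupLens implementation: the class `BetterBitBureau`, its methods, and the `plain_average` default are all hypothetical names, and no news protocol is involved.

```python
from statistics import mean
from typing import Callable, Dict, Optional

# An aggregator maps (reader, {pseudonym: rating}) to one predicted score.
Aggregator = Callable[[str, Dict[str, int]], float]


def plain_average(reader: str, ratings_by_user: Dict[str, int]) -> float:
    """Simplest possible rule: ignore the reader and average all ratings."""
    return mean(ratings_by_user.values())


class BetterBitBureau:
    """Stores ratings per message and delegates aggregation to a pluggable rule."""

    def __init__(self, aggregate: Aggregator = plain_average):
        self.aggregate = aggregate
        self.ratings: Dict[str, Dict[str, int]] = {}  # message id -> ratings

    def record(self, message_id: str, pseudonym: str, rating: int) -> None:
        self.ratings.setdefault(message_id, {})[pseudonym] = rating

    def predict(self, reader: str, message_id: str) -> Optional[float]:
        ratings = self.ratings.get(message_id)
        if not ratings:
            return None  # nobody has rated this message yet
        return self.aggregate(reader, ratings)


default_bureau = BetterBitBureau()
default_bureau.record("m1", "reader-17", 4)
default_bureau.record("m1", "reader-42", 2)

# A user-supplied bureau that changes the semantics: trust the highest rating.
optimistic_bureau = BetterBitBureau(lambda reader, ratings: max(ratings.values()))
optimistic_bureau.record("m1", "reader-17", 4)
optimistic_bureau.record("m1", "reader-42", 2)
```

The clients and the message distribution path are the same in both cases; only the function passed to the bureau differs, which mirrors the division of responsibility described above.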

Prasun Dewan
Tue Jan 28 17:46:09 EST 1997