Wednesday, April 29, 2009

Jiminy: A Scalable Incentive-Based Architecture for Improving Rating Quality (Kotsovinos et al., 2006)

This paper was one of a set I printed out as part of my look at social psychology-inspired approaches to online communities. It turns out it is not itself particularly inspired by social psychology, but rather just references a few other papers that are.

The authors take a mathematical approach to try to determine the honesty of users contributing to a ratings system, and they test their approach on the GroupLens data set of a million movie ratings. The paper is also concerned with the scalability of their algorithm, but I am less interested in that side of the paper at the moment.

The authors define a "nose length" in their Pinocchio model, which is an indication of how honest a user is being in their ratings. The "nose length" is calculated from the log-likelihood of a user's ratings of films, given all the other ratings for the same films by other users. Basically, the more you diverge from how others have rated things, the more likely you are to be classified as dishonest. This "nose length", or Z value, is calculated for all users in the GroupLens data set. The authors show that ratings inserted for made-up dishonest users clearly fall outside the bulk of nose lengths for real users, and go on to assert that this demonstrates that their model allows them to detect dishonesty.
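To make the idea concrete for myself, here is a rough sketch of how I read the calculation; the leave-one-out counting, the Laplace smoothing and the averaging are my own guesses at the details rather than the paper's exact formulation:

```python
# Toy sketch of a Pinocchio-style "nose length" (Z value), NOT the paper's exact math.
# Assumes ratings are integers 1-5 arriving as (user_id, item_id, rating) tuples.
from collections import defaultdict
import math
import statistics

def nose_lengths(ratings, levels=(1, 2, 3, 4, 5), smoothing=1.0):
    """Return a {user: z_score} map; a large |z| means the user rates unlike everyone else."""
    # Count how often each item received each rating level.
    item_counts = defaultdict(lambda: defaultdict(float))
    for user, item, r in ratings:
        item_counts[item][r] += 1.0

    # Average log-likelihood of each user's ratings under the "other users" distribution.
    loglik = defaultdict(list)
    for user, item, r in ratings:
        counts = item_counts[item]
        total = sum(counts.values()) - 1.0   # leave this user's own rating out
        others = counts[r] - 1.0
        p = (others + smoothing) / (total + smoothing * len(levels))
        loglik[user].append(math.log(p))
    scores = {u: sum(ls) / len(ls) for u, ls in loglik.items()}

    # Standardize across users so the result is a Z value.
    mean = statistics.mean(scores.values())
    sd = statistics.pstdev(scores.values()) or 1.0
    return {u: (s - mean) / sd for u, s in scores.items()}
```

In this toy version a shill account that always hands out fives regardless of the consensus would land in the tail of the Z distribution, which is in line with the paper's observation that the injected dishonest users fall outside the bulk of real users' nose lengths.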

I think this is an interesting paper, but I am not convinced that the model is necessarily detecting dishonesty. It seems clear that it can be used to detect robot-like behaviour and distinguish that from the human data set, but the authors have no real information about which humans were behaving honestly and which were not. The real test of this system would be to see how it operated with a real community of raters, but the evaluation presented in this paper is of scalability with mock data. Of course, getting to play with a real community is the trick, so this shouldn't be seen as too serious a flaw in the paper, but I think the authors should be careful about declaring that they can distinguish honesty from dishonesty.

One of the main things the paper made me think about was the way communities of raters provide a data set that is much more amenable to mathematical analysis. In an online community with discussions and opinions it would be much more challenging to automatically detect levels of honesty, and one might question the benefit of doing so. It makes me think that the field of rating community research (collaborative filtering?) is quite different from the field of online community research - arguably a subset, but then I think it also falls into the machine learning camp. In terms of providing evidence for the validity of one online community design pattern over another, this paper doesn't help me much. It presents a design pattern, but the empirical validation is missing in terms of what effect it would have on a real community.

Cited by 2 at time of post according to Google Scholar

2 comments:

Sam Joseph said...

One other thing I did notice was that this paper cited the Erickson paper on Social Translucence as well as the Resnick paper that turned me on to social psychology-inspired approaches to online communities. The Autonomous Agents paper also cited Resnick, and my Google Scholar lookups certainly suggest it is an important paper to read.

Also makes me think about doing other things with a Google Scholar API, such as pasting in references from multiple papers and doing analysis to find ones that they all cite. Of course this is what Peter Bergstrom's PaperCube is all about - it would be awesome to see that plugged into live Google Scholar data.
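Just to sketch what I mean (there is no official Google Scholar API, so assume you have already pasted each paper's reference list out by hand; the keys and reference titles below are made-up placeholders):

```python
# Toy example: find references shared by every paper in a set.
# The reference lists are assumed to have been extracted manually (e.g. pasted
# from the PDFs); no Google Scholar API is involved.
from functools import reduce

reference_lists = {
    "Jiminy (Kotsovinos et al., 2006)": {"resnick2000reputation", "erickson2000social"},
    "Some other paper": {"resnick2000reputation", "another2003paper"},
}

common = reduce(set.intersection, reference_lists.values())
print(common)  # -> {'resnick2000reputation'}
```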

Sam Joseph said...

And it turns out someone has already developed a paper-based equivalent of Google's PageRank, called CiteRank, though only for physics-related papers:
http://www.cmth.bnl.gov/%7Emaslov/citerank/