In the last few years I have been a rather loyal Netflix user. Their streaming selection is weak, but their DVD mail service is very convenient. One thing strikes me as odd though: their celebrated movie recommendations—based on collaborative filtering—are typically worthless. Most recently I was trying to figure out their three star prediction for Prometheus, which in my humble opinion is really the slime at the bottom of the Hollywood barrel. And then it occurred to me: maybe it’s my younger self’s fault?
When I’m asked to rate a silly movie like Indiana Jones or Terminator, which I haven’t seen in a very long time, I get nostalgic. As a kid I used to love these movies, but as a kid I also thought the funniest joke in the world was this thing (admittedly it’s a bit funnier in Hebrew). My rating for those movies is based on fond and distant memories, which may not be consistent with my adult point of view. Is this what Netflix “wants me to do”? This is perhaps the wrong question to ask; the right question is, how would another person with a similar taste profile rate one of these silly movies that we both loved as kids but haven’t watched since?
To make things more complicated, consider this. I’m in my early thirties, so as a kid I watched movies that were released in the late eighties and early nineties. In contrast, I don’t remember watching, as a kid, any movies that came out, say, in the fifties; I evaluate those movies based on my adult taste. Now, someone who is in his sixties could have taste identical to mine, but may evaluate fifties movies based on his childhood taste and eighties movies based on his adult taste.
Ever the diligent scientist, I sampled Netflix’s rating queries (“rate what you’ve seen to discover suggestions for you”). Among the twenty that were displayed, four (20%) fell into the problematic category of “movies that I watched as a kid but haven’t seen since”: Three Men and a Little Lady, Parenthood, Under Siege (Steven Seagal exterminates bad guys), and Sneakers (which actually falls into the more embarrassing subcategory of “movies that I watched in the cinema with my mom”). I would love to see the results of the following experiment: take people’s ratings and restrict them to movies that they watched in the last five years; would that make Netflix’s disappointing predictions more or less (in)accurate? It’s hard to predict (no pun intended). Yep, research was like a box of chocolates: you never know what you’re gonna get.
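For what it’s worth, the filtering step of that experiment is easy to sketch. Here is a minimal Python version using made-up ratings data (the titles are just the examples above, and the dates are invented; the real experiment would of course need Netflix’s actual rating logs):

```python
from datetime import date, timedelta

# Hypothetical ratings: (movie, stars, date_watched). The dates are
# invented for illustration; real logs would come from Netflix.
ratings = [
    ("Three Men and a Little Lady", 4, date(1991, 6, 1)),   # childhood memory
    ("Under Siege",                 4, date(1993, 3, 1)),   # childhood memory
    ("Sneakers",                    5, date(1992, 9, 1)),   # cinema, with mom
    ("Prometheus",                  1, date(2012, 8, 1)),   # watched recently
]

def recent_only(ratings, today, years=5):
    """Keep only ratings for movies watched in the last `years` years."""
    cutoff = today - timedelta(days=365 * years)
    return [r for r in ratings if r[2] >= cutoff]

filtered = recent_only(ratings, today=date(2012, 12, 1))
print([movie for movie, _, _ in filtered])  # prints ['Prometheus']
```

The nostalgic childhood ratings are dropped, and only the recently watched movies remain as training data for the recommender.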
I was born and raised in America, and my way of using Netflix is to rate only movies I have seen via Netflix. Furthermore, almost all of the movies I watch are foreign language movies. By now I find that if Netflix predicts I will like a movie at the four-star level or higher, I am nearly always glad I saw it. I do occasionally watch an American movie (mostly on TV) and find relatively few of them to my liking.
This is interesting. On the one hand it is encouraging that you only rate movies shortly after seeing them, but on the other hand your taste in movies seems to be very different from mine, unless by “foreign language movies” you mean “Israeli movies” 🙂
I do especially like foreign films from the Middle East – in particular Israeli movies. Even though I have been lucky enough to travel outside the US a fair amount (though not to Israel), I think foreign movies have an advantage: even if their plots are as weak as those of many American movies (and by the way, I also like Canadian and British “foreign movies” a lot), they let me see locales, clothes, and ways of life somewhat different from my own. This may result in higher ratings for foreign films.
The problem of having one person’s or a group of people’s ratings of films be of use to other people has some similarities with trying to find strong papers in mathematical modeling competitions, which I have had some experience with “grading.” Again, if one uses some numerical scale for ranking the papers, how is one sure that X’s 4 points really indicates a superior paper (movie) to Y’s 3 points, since, as with movies, the numbers people assign are complicated amalgams on different scales of value? In these competitions, since typically there is no time for all the judges to see all the papers, a series of “triage” rounds occurs to eliminate the weaker submissions. Since a judge cannot go back and re-grade papers after seeing later ones, there is a natural tendency to be “harsher” on the first papers one sees, because one wants to leave “room” for giving a truly remarkable paper a better score. This may be less of an issue for grading movies.
Getting back to Netflix, a peculiarity I have noticed is that when recommending items in a particular category, I would have thought Netflix would list first those films that are given more stars for me, but that does not seem to be the case. Very often the items with more stars are not the earlier items in the list they show me.
I think the problem may be that the one-dimensional scale just doesn’t give you enough information. A score of 3 may mean “far from brilliant, but will satisfy fans of the genre”, but also “the writing is very good, but the mediocre direction and acting hold it back”. These are quite different, but the system cannot distinguish between them.
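A toy illustration of this point: two per-aspect evaluations (the aspect names and numbers are invented for illustration) that are quite different, yet collapse to the same single star rating once averaged:

```python
# "Far from brilliant, but will satisfy fans of the genre":
genre_fare = {"writing": 3, "direction": 3, "acting": 3}
# "The writing is very good, but mediocre direction and acting hold it back":
uneven_film = {"writing": 5, "direction": 2, "acting": 2}

def to_stars(aspects):
    """Collapse per-aspect scores into a single star rating (rounded mean)."""
    return round(sum(aspects.values()) / len(aspects))

print(to_stars(genre_fare), to_stars(uneven_film))  # prints: 3 3
```

Both movies come out as a 3, so anything downstream of the one-dimensional score, collaborative filtering included, cannot tell the two cases apart.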
Netflix appears to default to a rating of 3 stars for an unrated (or too-small-sample-size) movie or show. So it’s best to think of a 3 as “not yet rated by the Netflix community.”
Also, I rate movies on Netflix based on whether I want Netflix to show me more of the same kind of movie/show or fewer – not on my real objective assessment of the show’s quality. Since offering me recommendations is really the main use to which they put my ratings, that’s the spirit in which I rate things. So, for example, even if I see a fairly poor anime on Netflix, I might rate it high so they will keep showing me anime.
Good point, but your first paragraph probably does not explain the Prometheus prediction, as it now has 241k+ ratings (with an average of 3.5) and it presumably had almost as many when I watched it recently.