Cornell researchers working to scrub fake reviews off the Web

By Mike Flacy August 20, 2011

A team of researchers at Cornell recently finished a paper describing a computer algorithm that determines whether a review is fake or authentic. After the research was published, many specialty travel sites approached the group to ask how the algorithm could be used to weed out paid reviews. Some brands and companies covertly use Amazon’s Mechanical Turk, Fiverr and other freelance sites to pay for a library of positive reviews. Because these freelancer meccas are designed to pump out results quickly, they are exploited to churn out a plethora of bloated 5-star reviews that inflate the perceived quality of products or services.

To identify common elements of fake reviews, the Cornell team was authorized to set up an mTurk task commissioning 400 positive reviews of Chicago hotels; the only stipulation was that each review be fake. After combing through TripAdvisor, the researchers chose 400 reviews that they believed to be genuine and mixed them with the fake entries. When the combined set was shown to a group of judges, the judges couldn’t tell the authentic reviews from the fake ones.

After some analysis, the team built a computer algorithm that correctly separates fake reviews from authentic ones about 90 percent of the time. According to the results, fake reviews tended to be vague stories focused on the experience in the city rather than on specifics about the hotel under review. Fake reviewers also overused first-person words such as “me” and “I”, apparently in an effort to establish credibility.
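To make those two cues concrete, here is a minimal sketch of a toy heuristic that scores a review on them. This is not the Cornell team’s algorithm, which the article does not describe in detail; the word lists (`FIRST_PERSON`, `HOTEL_DETAILS`), the function names and the thresholds are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical cue lists for illustration only; not taken from the Cornell study.
FIRST_PERSON = {"i", "me", "my", "myself"}
HOTEL_DETAILS = {"room", "bed", "bathroom", "lobby", "staff", "floor", "shower", "desk"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cue_rates(text: str) -> tuple[float, float]:
    """Return (first-person rate, hotel-detail rate) as fractions of all tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0, 0.0
    n = len(tokens)
    first = sum(t in FIRST_PERSON for t in tokens) / n
    detail = sum(t in HOTEL_DETAILS for t in tokens) / n
    return first, detail


def looks_suspicious(text: str, first_max: float = 0.08, detail_min: float = 0.03) -> bool:
    """Flag a review that leans heavily on "I"/"me" and says little about the hotel.

    The default thresholds are arbitrary placeholders, not values from the study.
    """
    first, detail = cue_rates(text)
    return first > first_max and detail < detail_min


vague = ("I loved Chicago! My husband and I went shopping and I had "
         "the best time of my life. I will come back.")
specific = ("The room on the tenth floor was quiet, the bed was firm, and the "
            "front desk staff fixed a leaky shower within an hour.")
print(looks_suspicious(vague), looks_suspicious(specific))
```

A real detector would learn which words matter, and how much, from labeled examples rather than relying on hand-picked lists and cutoffs; the sketch only shows how “vague, city-focused, first-person-heavy” can be turned into measurable features.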

Beyond fake positive reviews, companies also have to contend with rival businesses posting fake negative reviews, and freelancers advertise their services for writing negative reviews on sites like Yelp. Yelp uses an algorithm of its own to filter out both overly positive and overly negative reviews that seem untrue. The filtered reviews are still linked at the bottom of the main business page, but they don’t count toward the overall score.