Extended Epinions dataset

One of the Epinions datasets.

This dataset was given directly by Epinions staff to Paolo Massa. As a consequence, the dataset also contains the distrust lists (which users are distrusted by which users), which are not shown on the site but kept private.

If you use one of these datasets and you appreciate my effort in collecting and releasing them, please cite my paper Trust-aware Bootstrapping of Recommender Systems (see below) or another of my papers analyzing Epinions. Thanks! --Paolo Massa 09:25, 15 February 2011 (PST)

Massa, P., & Avesani, P. (2006). Trust-aware bootstrapping of recommender systems. Proceedings of ECAI 2006 Workshop on Recommender Systems (pp. 29-33).

Note that this is not a typical collaborative filtering dataset, since the ratings are about articles rather than items: a rating represents how helpful a certain user found a textual article (i.e. a review) written by another user.
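Because each rating targets an article rather than a user, a user-to-user rating has to be derived by mapping every rated article to its author (AUTHOR_ID in mc.txt). A minimal Python sketch, assuming the author map and the (article, rater, rating) triples are already in memory; the function name and the choice of averaging multiple ratings are illustrative assumptions, not part of the dataset:

```python
from collections import defaultdict
from statistics import mean


def user_to_user_ratings(article_author, article_ratings):
    """Aggregate article ratings into {(rater_id, author_id): mean rating}.

    Hypothetical helper: article_author maps article ID -> author ID (as in
    mc.txt); article_ratings yields (article_id, rater_id, rating) triples
    (as in rating.txt). Averaging is one possible aggregation, not a given.
    """
    per_pair = defaultdict(list)
    for article_id, rater_id, rating in article_ratings:
        author_id = article_author.get(article_id)
        if author_id is None:
            continue  # rated article with no known author: skip it
        per_pair[(rater_id, author_id)].append(rating)
    return {pair: mean(values) for pair, values in per_pair.items()}


# Toy IDs, not taken from the dataset: user 99 rates two articles by
# user 10 and one article by user 20.
authors = {1: 10, 2: 10, 3: 20}
ratings = [(1, 99, 5), (2, 99, 3), (3, 99, 4)]
print(user_to_user_ratings(authors, ratings))  # {(99, 10): 4, (99, 20): 4}
```

Other aggregations (median, last rating, counting only "Very Helpful" and above) are equally possible; the mean is just the simplest to illustrate.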

The dataset contains
 * ~132,000 users, who issued
 * 841,372 statements (717,667 trusts and 123,705 distrusts).
 * ~85,000 users received at least one statement.
 * 1,560,144 articles
 * 13,668,319 article ratings
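The statement counts above can be checked against user_rating.txt once it is parsed; as a consistency check, 717,667 trusts + 123,705 distrusts = 841,372 statements. A minimal sketch, assuming the statements are already loaded as (MY_ID, OTHER_ID, VALUE) tuples; the function name is hypothetical:

```python
def statement_stats(statements):
    """Summarise (source, target, value) tuples, where value is 1 (trust) or -1 (distrust)."""
    trusts = sum(1 for _, _, v in statements if v == 1)
    distrusts = sum(1 for _, _, v in statements if v == -1)
    issuers = {src for src, _, _ in statements}  # distinct users who made a statement
    receivers = {dst for _, dst, _ in statements}  # distinct users who received one
    return {
        "statements": len(statements),
        "trusts": trusts,
        "distrusts": distrusts,
        "issuers": len(issuers),
        "receivers": len(receivers),
    }


# Toy data, not from the dataset.
print(statement_stats([(1, 2, 1), (1, 3, -1), (2, 3, 1)]))
```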

Users and items are represented by anonymized numeric identifiers.

The dataset consists of 3 files.

= Files =

== Trust/distrust information ==
user_rating.txt.gz (4.7 Megabytes): Trust is the mechanism by which a user states that they like the content or the behavior of a particular user and would like to see more of what that user does on the site. Distrust is the opposite: the user states that they want to see less of what that user does.

Column Details:


 * 1) MY_ID The ID of the member who is making the trust/distrust statement
 * 2) OTHER_ID The ID of the member being trusted/distrusted
 * 3) VALUE 1 for trust and -1 for distrust
 * 4) CREATION The date on which the trust/distrust statement was made

First 4 lines:
 3287060356     232085  -1      2001/01/10
 3288305540     709420  1       2001/01/10
 3290337156     204418  -1      2001/01/10
 3294138244     269243  -1      2001/01/10
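A minimal Python sketch for reading user_rating.txt.gz directly, assuming the columns are separated by whitespace (tabs, judging from the sample) and that there is no header line; the function names are hypothetical:

```python
import gzip
from datetime import date


def parse_trust_line(line):
    """Parse 'MY_ID OTHER_ID VALUE CREATION' into typed fields."""
    my_id, other_id, value, creation = line.split()  # splits on any whitespace
    year, month, day = map(int, creation.split("/"))  # CREATION is YYYY/MM/DD
    return int(my_id), int(other_id), int(value), date(year, month, day)


def load_trust(path="user_rating.txt.gz"):
    """Yield (my_id, other_id, value, creation) tuples from the gzipped file."""
    with gzip.open(path, "rt") as f:
        for line in f:
            if line.strip():  # ignore blank lines, e.g. a trailing newline
                yield parse_trust_line(line)


print(parse_trust_line("3287060356\t232085\t-1\t2001/01/10"))
# (3287060356, 232085, -1, datetime.date(2001, 1, 10))
```

Filtering the tuples on VALUE (1 or -1) separates the trust network from the distrust network.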

== Article Author information ==
mc.txt.gz (15 Megabytes): Each article is written by a user.

Column Details:


 * 1) CONTENT_ID The object ID of the article.
 * 2) AUTHOR_ID The ID of the user who wrote the article
 * 3) SUBJECT_ID The ID of the subject that the article is supposed to be about

First 4 lines:
 1445594|718357|149002425217
 1445595|220568|149003604865
 1445596|717325|5303145344
 1445597|360156|192620893057
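Unlike user_rating.txt, this file is pipe-separated. A minimal sketch that builds an article-to-author lookup, assuming no header line and one article per line; the function names are hypothetical:

```python
import gzip


def parse_mc_line(line):
    """Parse 'CONTENT_ID|AUTHOR_ID|SUBJECT_ID' into three integers."""
    content_id, author_id, subject_id = line.strip().split("|")
    return int(content_id), int(author_id), int(subject_id)


def load_article_authors(path="mc.txt.gz"):
    """Return a dict mapping CONTENT_ID -> AUTHOR_ID."""
    authors = {}
    with gzip.open(path, "rt") as f:
        for line in f:
            if line.strip():  # skip blank lines
                content_id, author_id, _subject_id = parse_mc_line(line)
                authors[content_id] = author_id
    return authors


print(parse_mc_line("1445594|718357|149002425217"))  # (1445594, 718357, 149002425217)
```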

== Article Ratings information ==
rating.txt.gz (85 Megabytes): Ratings are quantified statements made by users about the quality of content on the site. Ratings are the basis on which content is sorted and filtered.

Column Details:


 * 1) OBJECT_ID The ID of the object being rated. At present the only valid objects are the content_id values of the member_content table, which means that this table only stores ratings of reviews and essays.
 * 2) MEMBER_ID The ID of the member who is rating the object
 * 3) RATING The member's rating of the object on a 1-5 scale (1 = Not Helpful, 2 = Somewhat Helpful, 3 = Helpful, 4 = Very Helpful, 5 = Most Helpful) [There are some 6s; treat them as 5]
 * 4) STATUS The display status of the rating: 1 means the member has chosen not to show their rating of the object, and 0 means the member does not mind showing their name beside the rating.
 * 5) CREATION The date on which the member first rated this object
 * 6) LAST_MODIFIED The latest date on which the member modified their rating of the object (missing if the rating has never been changed after creation)
 * 7) TYPE If and when we allow more than just content ratings to be stored in this table, this column will store the type of the object being rated.
 * 8) VERTICAL_ID Vertical_id of the review.

First 4 lines:
 139431556      591156          5       0       2001/01/10              1       2518365
 139431556      1312460676      5       0       2001/01/10              1       2518365
 139431556      204358          5       0       2001/01/10              1       2518365
 139431556      368725          5       0       2001/01/10              1       2518365
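The wider gap after the date in these lines appears to be the empty LAST_MODIFIED column, so splitting on runs of whitespace would shift the later columns. A minimal sketch, assuming the file is tab-separated with exactly eight columns per line; the class and function names are hypothetical. It also applies the rule of treating a RATING of 6 as 5:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


def parse_date(text):
    """Convert a 'YYYY/MM/DD' string into a datetime.date."""
    year, month, day = map(int, text.strip().split("/"))
    return date(year, month, day)


@dataclass
class ArticleRating:
    object_id: int
    member_id: int
    rating: int
    status: int
    creation: date
    last_modified: Optional[date]  # None when the rating was never modified
    object_type: int  # the TYPE column
    vertical_id: int


def parse_rating_line(line):
    """Parse one rating.txt line, assuming 8 tab-separated columns."""
    fields = line.rstrip("\r\n").split("\t")  # keeps empty fields in place
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}: {line!r}")
    obj, member, rating, status, created, modified, rtype, vertical = fields
    return ArticleRating(
        object_id=int(obj),
        member_id=int(member),
        rating=min(int(rating), 5),  # treat the occasional 6 as 5
        status=int(status),
        creation=parse_date(created),
        last_modified=parse_date(modified) if modified.strip() else None,
        object_type=int(rtype),
        vertical_id=int(vertical),
    )


sample = "139431556\t591156\t5\t0\t2001/01/10\t\t1\t2518365"
print(parse_rating_line(sample).last_modified)  # None
```

Reading rating.txt.gz line by line with gzip.open(path, "rt") and parsing each line as it arrives avoids holding all of the roughly 13.7 million ratings in memory at once.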

== How to download files ==
Download the .txt.gz files to your hard disk, then run the following from the command line of your GNU/Linux shell:
 gunzip name_of_file.txt.gz
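On systems without gunzip, Python's standard gzip module can do the same job. A minimal sketch; the function name is hypothetical and the file names in the example are placeholders:

```python
import gzip
import shutil


def gunzip_file(src, dst):
    """Decompress src into dst, like `gunzip -c src > dst`."""
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)  # copies in chunks, not all at once


# Example: gunzip_file("rating.txt.gz", "rating.txt")
```

Alternatively, gzip.open(path, "rt") reads a compressed file line by line without writing a decompressed copy to disk.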

Some people have reported that under Windows the files seem to be doubly zipped.

''When you unzip the files, you'll get a .txt file which is not really a text file. It's still a zip file. Change the extension to .zip and unzip the file again. Then you are done. Let me know if you have any problem.'' --PaoloMassa 01:30, 20 March 2008 (PDT)
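Whether a downloaded file is gzip, zip, or plain text can be checked from its first bytes rather than its extension: gzip files start with the bytes 1f 8b and zip archives with PK\x03\x04. A minimal sketch; the function names are hypothetical. Python's zipfile module recognizes archives by content, so the file does not need to be renamed to .zip first:

```python
import zipfile


def detect_container(path):
    """Guess 'gzip', 'zip' or 'text' from the first bytes of path."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic[:2] == b"\x1f\x8b":
        return "gzip"
    if magic == b"PK\x03\x04":  # local file header of a zip archive
        return "zip"
    return "text"  # neither signature found: presumably plain text


def unzip_into(path, dest_dir):
    """Extract a zip archive, whatever its extension, and return the member names."""
    with zipfile.ZipFile(path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()
```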

= Papers analyzing Epinions dataset =


 * Controversial users demand local trust metrics: an experimental study on epinions.com community
 * Trust metrics on controversial users: balancing between tyranny of the majority and echo chambers