Downloaded Epinions dataset

One of the Epinions datasets.

If you use one of these datasets and you appreciate my effort in collecting and releasing them, please cite my paper Trust-aware Bootstrapping of Recommender Systems (see below) or another of my papers analyzing Epinions, thanks! --Paolo Massa 09:25, 15 February 2011 (PST)

Massa, P., & Avesani, P. (2006). Trust-aware bootstrapping of recommender systems. Proceedings of ECAI 2006 Workshop on Recommender Systems (pp. 29-33).

The dataset was collected by Paolo Massa in a 5-week crawl (November/December 2003) from the Epinions.com Web site.

The dataset contains
 * 49,290 users who rated a total of
 * 139,738 different items at least once, writing
 * 664,824 reviews and issuing
 * 487,181 trust statements.

Users and items are represented by anonymized numeric identifiers.

The dataset consists of 2 files.

= Files =

Ratings data
ratings_data.txt.bz2 (2.5 Megabytes): it contains the ratings given by users to items.

Every line has the following format:

user_id item_id rating_value

For example, the line

23 387 5

represents the fact "user 23 has rated item 387 as 5"

Ranges:
 * user_id is in [1,49290]
 * item_id is in [1,139738]
 * rating_value is in [1,5]
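
As an illustration of the format above, here is a minimal Python sketch for reading the ratings file. It assumes the fields are separated by whitespace and that the file has no header line. The function names parse_ratings and read_ratings are hypothetical and are not part of the dataset or of the original crawler.

```python
import bz2
from typing import Iterable, Iterator, Tuple


def parse_ratings(lines: Iterable[str]) -> Iterator[Tuple[int, int, int]]:
    """Yield (user_id, item_id, rating_value) tuples from text lines.

    Assumption: fields are whitespace-separated, as in "23 387 5".
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        user_id, item_id, rating = (int(f) for f in fields)
        if not 1 <= rating <= 5:
            # The documented range for rating_value is [1,5].
            raise ValueError(f"rating out of range: {line!r}")
        yield user_id, item_id, rating


def read_ratings(path: str = "ratings_data.txt.bz2") -> Iterator[Tuple[int, int, int]]:
    """Stream ratings directly from the compressed file."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        yield from parse_ratings(handle)
```

Because read_ratings is a generator, the file is decompressed and parsed lazily, for example with `for user_id, item_id, rating in read_ratings(): ...`.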

Trust data
trust_data.txt.bz2 (1.7 Megabytes): it contains the trust statements issued by users.

Every line has the following format:

source_user_id target_user_id trust_statement_value

For example, the line

22605 18420 1

represents the fact "user 22605 has expressed a positive trust statement on user 18420"

Ranges:
 * source_user_id and target_user_id are in [1,49290]
 * trust_statement_value is always 1, since the dataset contains only positive trust statements and no negative ones (distrust).

Note: the dataset contains only trust statements (the web of trust) and no distrust statements (the block list), because the block list is kept private and is not shown on the site.
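
The trust statements can be loaded as a directed graph. The following Python sketch is only one possible way to do it: it assumes whitespace-separated fields and no header line, and the name build_trust_graph is hypothetical. Each source user is mapped to the set of users he or she trusts.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_trust_graph(lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Map each source_user_id to the set of target_user_ids it trusts.

    Assumption: fields are whitespace-separated, as in "22605 18420 1".
    """
    graph: Dict[int, Set[int]] = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        source, target, value = (int(f) for f in fields)
        if value != 1:
            # The dataset is documented to contain only positive statements.
            raise ValueError(f"unexpected trust value: {line!r}")
        graph[source].add(target)  # directed edge: source trusts target
    return dict(graph)
```

To use it on the compressed file, the lines can be supplied by `bz2.open("trust_data.txt.bz2", "rt")` from the Python standard library. Edges are directed, so "A trusts B" does not imply "B trusts A".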

Data collection procedure
The data were collected using a crawler, written in Perl.

It was the first program I (Paolo Massa) ever wrote in Perl (and an excuse for learning Perl), so the code is probably very ugly. Anyway, I release the code under the GNU General Public License (GPL) so that other people can use it if they so wish.

epinionsRobot_pl.txt is the version I used. This version parses the HTML and saves minimal information as Perl objects. Later on, I saw this was not a wise choice: for example, I didn't save demographic information about users, which might have been useful for testing whether the users trusted by user A come from the same city or region. So I later created a version that saves the original HTML pages (epinionsRobot_downloadHtml_pl.txt), but I didn't test it. Feel free to let me know if it works. Both Perl files are released under the GNU General Public License (GPL); see the first lines of the files. --PaoloMassa

Be aware that the script was working in 2003. I haven't checked, but it is very likely that the format of the HTML pages has changed significantly in the meantime, so the script might need some adjustments. Luckily, the code is released as open source, so you can modify it. --Paolo Massa 11:34, 16 July 2010 (UTC)

= Papers analyzing Epinions dataset =


 * Trust-aware Recommender Systems