Downloaded Epinions dataset

From TrustLet, a free, collaborative project for collecting and analyzing information about trust metrics.


One of the Epinions datasets.


The dataset was collected by Paolo Massa in a 5-week crawl (November/December 2003) from the Epinions.com Web site.

The dataset contains:

  • 49,290 users
  • 139,738 different items, each rated at least once
  • 664,824 reviews
  • 487,181 trust statements

Users and items are represented by anonymized numeric identifiers.

The dataset consists of 2 files.

Files

Ratings data

ratings_data.txt.bz2 (2.5 Megabytes): it contains the ratings given by users to items.

Every line has the following format:

user_id item_id rating_value

For example,

23 387 5

represents the fact "user 23 has rated item 387 as 5"

Ranges:

  • user_id is in [1,49290]
  • item_id is in [1,139738]
  • rating_value is in [1,5]
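
The line format above can be read with a few lines of Python. This is a minimal sketch, not part of the original release: it assumes the fields are separated by whitespace and that blank or malformed lines can be skipped, and the names `Rating`, `parse_ratings` and `read_ratings` are made up for illustration.

```python
import bz2
from typing import Iterable, Iterator, NamedTuple


class Rating(NamedTuple):
    user_id: int
    item_id: int
    value: int


def parse_ratings(lines: Iterable[str]) -> Iterator[Rating]:
    """Yield one Rating per 'user_id item_id rating_value' line."""
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # assumption: blank or malformed lines are skipped
        user_id, item_id, value = (int(field) for field in fields)
        if not 1 <= value <= 5:
            raise ValueError(f"rating outside [1,5]: {line!r}")
        yield Rating(user_id, item_id, value)


def read_ratings(path: str) -> Iterator[Rating]:
    """Stream ratings from the compressed file without unpacking it first."""
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from parse_ratings(handle)
```

For instance, `list(parse_ratings(["23 387 5"]))` gives a single `Rating(user_id=23, item_id=387, value=5)`, and `read_ratings("ratings_data.txt.bz2")` would iterate over the whole file if it sits in the current directory.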

Trust data

trust_data.txt.bz2 (1.7 Megabytes): it contains the trust statements issued by users.

Every line has the following format:

source_user_id target_user_id trust_statement_value

For example, the line

22605 18420 1

represents the fact "user 22605 has expressed a positive trust statement on user 18420"

Ranges:

  • source_user_id and target_user_id are in [1,49290]
  • trust_statement_value is always 1, because the dataset contains only positive trust statements and no negative ones (distrust).

Note: the dataset contains only trust statements (the web of trust) and no distrust statements (the block list), because the block list is kept private and is not shown on the site.
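
Since every line is a directed "source trusts target" statement, the file maps naturally onto a directed graph. The following is a rough sketch of one way to load it, assuming whitespace-separated fields; the function name `build_web_of_trust` is hypothetical and not taken from the crawler code.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set


def build_web_of_trust(lines: Iterable[str]) -> DefaultDict[int, Set[int]]:
    """Map each source_user_id to the set of target_user_ids it trusts."""
    trusts: DefaultDict[int, Set[int]] = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # assumption: blank or malformed lines are skipped
        source, target, value = (int(field) for field in fields)
        if value != 1:
            # The dataset is documented as holding positive statements only.
            raise ValueError(f"unexpected trust value: {line!r}")
        trusts[source].add(target)
    return trusts
```

With the example line above, `build_web_of_trust(["22605 18420 1"])[22605]` is `{18420}`. The edges are kept directed because a trust statement from one user to another says nothing about trust in the opposite direction.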

Data collection procedure

The data were collected using a crawler written in Perl.

It was the first program I (Paolo Massa) ever wrote in Perl (and an excuse for learning Perl), so the code is probably very ugly. Anyway, I release the code under the GNU General Public License (GPL) so that other people can use it if they wish.

epinionsRobot_pl.txt is the version I used; it parses the HTML and saves minimal information as Perl objects. Later on, I saw that this was not a wise choice: for example, I didn't save demographic information about users, which might have been useful for testing whether the users trusted by user A come from the same city or region. So I later created a version that saves the original HTML pages (epinionsRobot_downloadHtml_pl.txt), but I didn't test it. Feel free to let me know if it works. Both Perl files are released under the GNU General Public License (GPL); see the first lines of the files. --PaoloMassa

Papers analyzing Epinions dataset