Trust network datasets

From Trust metrics wiki - Trustlet, a free, collaborative project for collecting and analyzing information about trust metrics.
Revision as of 06:21, 4 November 2011 by PaoloMassa


On this page we collect trust network datasets (and, more generally, social network datasets). You might also want to check other repositories of datasets hosted elsewhere.

Trust network datasets (social network datasets) are datasets that contain entities (users, peers, servers, robots, ...) and social relationships, each connecting two of these entities.

The goal is to collect as many datasets as possible in one single place (this wiki) and to release them in standard formats, under a reasonable license, for easy use with the software also collected in this wiki.


Our effort shares the vision of the Science Commons project, which tries to remove unnecessary legal and technical barriers to scientific collaboration and innovation and to foster Open Access to data.


Released datasets


Ideas or suggestions of datasets to be collected here

Feel free to move things around and create new (sub)categories, but consider checking Talk:Trust network datasets first.

Highly Internet-related


Wikipedia social network


Data for 2.7 million users, 10 million tweets, and 58 million edges (i.e. connections between users)

The rest

  • PGP network
  • Peer-to-peer networks: relationships between users and between nodes. Datasets that are easier to obtain: Tribler, Gnutella, ...
  • Networks of who replies to whom in mailing lists. See (Data were collected and visualizations were generated for all people who replied to at least one message or received at least one reply during the study. From the raw data we constructed behavioral visualizations and network data sets based on reply relationships.)
  • Networks of imports in Git. Git is a distributed version control system in which every user is free to import from any other user. If Linus imports the Linux code from Mary's copy, this strongly suggests that Linus trusts Mary a lot. See Linus's presentation and check whether it is possible to get information about who imported from whom, for example
  • CRAWDAD is the Community Resource for Archiving Wireless Data At Dartmouth, a wireless network data resource for the research community. This archive has the capacity to store wireless trace data from many contributing locations, and staff to develop better tools for collecting, anonymizing, and analyzing the data. We work with community leaders to ensure that the archive meets the needs of the research community.
  • The DIMES project has been publishing AS graphs of the Internet for use by the research community as of December 2006. The datasets are published monthly; each set contains only items found in that month and seen at least twice:
    • ASNodes: AS-level nodes
    • ASEdges: AS-level edges
    • Nodes: IP-level nodes
    • Edges: IP-level edges
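The mailing-list item above describes network data sets "based on reply relationships". Below is a minimal Python sketch of how such a network can be derived, assuming each message has already been reduced to a hypothetical (message_id, sender, in_reply_to) record, for example from the Message-ID and In-Reply-To email headers. The sample data and the reply_network function are illustrative only and are not taken from the cited study.

```python
from collections import Counter

# Hypothetical message records: (message_id, sender, in_reply_to).
# in_reply_to is None for messages that start a thread.
messages = [
    ("m1", "alice", None),
    ("m2", "bob", "m1"),
    ("m3", "carol", "m1"),
    ("m4", "alice", "m2"),
    ("m5", "bob", "m4"),
]

def reply_network(msgs):
    """Return a Counter mapping (replier, original_author) -> number of replies."""
    author_of = {mid: sender for mid, sender, _ in msgs}
    edges = Counter()
    for mid, sender, parent in msgs:
        # Skip thread starters and replies to messages missing from the archive.
        if parent is None or parent not in author_of:
            continue
        target = author_of[parent]
        if target != sender:  # ignore self-replies
            edges[(sender, target)] += 1
    return edges

print(reply_network(messages))
```

Edge weights count how many times one person replied to another; dropping the counts gives the unweighted "who replies to whom" graph.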

Web social networks

  • YouTube, LiveJournal, Flickr, ... datasets at
  • YouTube dataset available at
  • Flickr
  • Digg
  • Who listens to what and who is friends with whom. See Audioscrobbler API
  • Slashdot Zoo
    • The social network of technology news site Slashdot with "friend" and "foe" relations. About 78,000 users and 510,000 relationships, of which about a quarter are of the "foe" type.
    • Dataset available on request at
    • Associated paper: Kunegis, Lommatzsch & Bauckhage, The Slashdot Zoo: Mining a Social Network with Negative Edges, WWW 2009. pdf
  • Club Nexus (see the paper A social network caught in the Web by Lada A. Adamic, Orkut Buyukkokten, and Eytan Adar, in which the Club Nexus dataset is used). Note that the dataset has weighted relationships!
  • Friendster
  • Facebook
    • Tastes, Ties and Time: Facebook data release. The dataset comprises machine-readable files of virtually all the information posted on approximately 1,700 Facebook profiles by an entire cohort of students at an anonymous, northeastern American university. Profiles were sampled at one-year intervals, beginning in 2006. This first wave covers first-year profiles, and three additional waves of data will be added over time, one for each year of the cohort's college career. Dataset at
    • Facebook social graph dataset. This dataset was collected during April-May 2009 and contains two representative samples of users Facebook-wide, each ~1 million nodes, with a few annotated properties: for each sampled user, it includes the friend list, privacy settings and network membership.
    • Facebook applications dataset. This dataset was collected during February 2008 and contains a list of installed applications for ~300K Facebook users. In total, ~13K applications are included. Additionally, the dataset contains the number of active users and total application installations for every Facebook application, recorded daily over a period of 120 days.
  • Epinions
  • Computer networks as social networks
  • Nitle crawled blog network
  • From
    • Virgilio Almeida, UFMG, 2007: OSN data spans orders of magnitude
    • YouTube 1.6 million-node, Flickr 1.8 million-node, LiveJournal 5.2 million-node and Orkut 3 million-node [Cha 2007, Mislove 2007]
    • 33 million blog requests to 210,738 blogs in a blogosphere [Almeida 2007]
    • 30 billion conversations among 240 million people: network of all IM communication over one month on Microsoft Instant Messenger [Leskovec and Horvitz 2007]
    • Citation network with 1736 nodes, actor collaboration with 392340 nodes... [Barabasi et al. 2005]
    • Email network with 59812 nodes with emails of 5165 students [Ebel et al. 2002]
    • Time evolution of a social network comprising 43,553 students. [Kossinets and Watts, 2006]
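Datasets such as the Slashdot Zoo mix positive ("friend") and negative ("foe") relations, so a signed, directed edge list is a natural representation. The Python sketch below assumes a hypothetical plain-text format with one "source target sign" triple per line; this is not necessarily the format in which the Slashdot Zoo is actually distributed. It reads the edges and computes the share of negative ones, the statistic that the Slashdot Zoo description above puts at about a quarter.

```python
import io

# Hypothetical whitespace-separated edge list: "source target sign",
# where sign is +1 for "friend" and -1 for "foe".
SAMPLE = """\
# source target sign
1 2 1
1 3 -1
2 3 1
3 1 1
"""

def read_signed_edges(stream):
    """Parse a signed, directed edge list into a list of (src, dst, sign) tuples."""
    edges = []
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        src, dst, sign = line.split()
        edges.append((src, dst, int(sign)))
    return edges

def foe_fraction(edges):
    """Fraction of edges with a negative sign (0.0 for an empty list)."""
    if not edges:
        return 0.0
    return sum(1 for _, _, s in edges if s < 0) / len(edges)

edges = read_signed_edges(io.StringIO(SAMPLE))
print(len(edges), foe_fraction(edges))
```

Keeping the sign as an integer rather than a label makes it easy to reuse the same edge list with trust metrics that propagate positive and negative statements.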

Collaboration in Free and Open Source Software


  • eBay networks about who bought/sold from whom.
  • networks of recommendations: who recommended what to whom. See Patterns of Influence in a Recommendation Network
  • Product Space Properties: using a network representation for the products space we can not only see which products are close to each other and the groups they form, but also their classifications and values. However, the network representation is nothing more than a powerful visualization technique and we still need to study the space properties using the entire proximity matrix complemented.
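The product-space item notes that a network is only a view of the full proximity matrix. One simple way to derive such a view, assumed here for illustration and not necessarily the method used in the product-space work, is to keep only the pairs whose proximity reaches a threshold. The matrix, product names and threshold_network function below are hypothetical.

```python
# Hypothetical symmetric proximity matrix between four products
# (values in [0, 1]; the diagonal is ignored).
products = ["A", "B", "C", "D"]
proximity = [
    [1.0, 0.8, 0.1, 0.4],
    [0.8, 1.0, 0.3, 0.6],
    [0.1, 0.3, 1.0, 0.7],
    [0.4, 0.6, 0.7, 1.0],
]

def threshold_network(names, matrix, threshold):
    """Keep an undirected edge (i, j, weight) when proximity >= threshold."""
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only
            if matrix[i][j] >= threshold:
                edges.append((names[i], names[j], matrix[i][j]))
    return edges

print(threshold_network(products, proximity, 0.5))
```

Every pair below the threshold is dropped, which is exactly the information loss the item warns about when it says the space must still be studied using the entire matrix.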

Monetary systems and social lending

  • Ripple: the Ripple network could be a peer-to-peer distributed social network service with a monetary honor system based on trust that already exists between people in real-world social networks.
  • Prosper: Prosper is a people-to-people social lending marketplace. Other examples of social lending are Zopa and LendingClub; see this article on TechCrunch and this research. Kiva is a P2P microcredit service and is interesting as well.
  • Donations and networks of donors: The Epidemics of Donations: Logistic Growth and Power-Laws - This paper demonstrates that collective social dynamics resulting from individual donations can be well described by an epidemic model. It captures the herding behavior in donations as a non-local interaction between individuals via a time-dependent mean field representing the mass media. Our study is based on the statistical analysis of a unique dataset obtained before and after the tsunami disaster of 2004. We find a power-law behavior for the distributions of donations with similar exponents for different countries. Even more remarkably, we show that these exponents are the same before and after the tsunami, which accounts for some kind of universal behavior in donations independent of the actual event. We further show that the time-dependent change of both the number and the total amount of donations after the tsunami follows a logistic growth equation. As a new element, a time-dependent scaling factor appears in this equation which accounts for the growing lack of public interest after the disaster. The results of the model are underpinned by the data analysis and thus also allow for a quantification of the media influence.
  • Fuzzy Local Currency Based on Social Network Analysis for Promoting Community Businesses.
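For reference, the standard textbook logistic growth equation and its closed-form solution are given below, without the time-dependent scaling factor that the donations paper adds. Here N(t) stands for the cumulative number (or amount) of donations, r for a growth rate and K for the saturation level; these symbols are chosen for illustration and are not taken from the paper.

```latex
\frac{dN}{dt} = r\,N(t)\left(1 - \frac{N(t)}{K}\right),
\qquad
N(t) = \frac{K}{1 + \dfrac{K - N_0}{N_0}\,e^{-r t}},
\qquad N(0) = N_0 .
```

Growth is roughly exponential while N(t) is much smaller than K and flattens out as N(t) approaches K, which matches the pattern of a surge in donations after an event followed by saturation.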


Social capital

  • Bowling Alone, Revisited: Social Capital and Skill Acquisition: This paper uses micro-level data on friendship networks in middle and secondary schools to estimate effects of social capital (as measured by connections to and from other agents) on skill acquisition outcomes and to investigate the association between ethnic fractionalization and connectedness. (The most interesting aspect of the Add Health Survey, for the purpose at hand, is the data on friendship networks.)
  • Suicide and Friendships Among American Adolescents: We analyzed friendship data on 13 465 adolescents from the National Longitudinal Survey of Adolescent Health to explore the relationship between friendship and suicidal ideation and suicide attempts.
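The first paper above measures social capital through "connections to and from other agents" in friendship-nomination data. A minimal way to obtain such measures is to count, for each student, outgoing nominations (out-degree) and incoming nominations (in-degree). The Python sketch below uses hypothetical nomination data and a hypothetical degree_counts helper; the paper's actual measures may be more elaborate.

```python
from collections import Counter

# Hypothetical friendship nominations: (nominator, nominee).
nominations = [
    ("s1", "s2"),
    ("s1", "s3"),
    ("s2", "s1"),
    ("s3", "s2"),
    ("s4", "s2"),
]

def degree_counts(edges):
    """Return (out_degree, in_degree) Counters for a directed edge list."""
    out_deg, in_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1  # connections "from" the student
        in_deg[dst] += 1   # connections "to" the student
    return out_deg, in_deg

out_deg, in_deg = degree_counts(nominations)
print(out_deg["s1"], in_deg["s2"], in_deg["s4"])
```

Students who are never nominated (such as s4 here) have no entry in in_deg, and Counter returns 0 for them, so isolated or unpopular students are still handled correctly.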




  • Networks of citations between laws. See for instance the paper The Web of Law. Interestingly, these datasets are usually in the public domain.


  • Anonymized networks of telephone calls (not easy to obtain)
  • The Reality Mining project represents the largest mobile phone experiment ever attempted in academia.



  • sex networks
  • Disease transmission, virus transmission (also computer viruses which are easier to track)


To be sorted

Also check Online systems that collect trust information from users