Trust network datasets

From TrustLet, a free, collaborative project for collecting and analyzing information about trust metrics.


Trust network datasets are datasets that contain entities (users, peers, servers, robots, ...) and social relationships, each connecting two of these entities.
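
As a minimal sketch of this definition, a trust network can be held as a list of directed edges between named entities. The `TrustEdge` class, its `weight` field, and the sample names are illustrative assumptions, not a format used by the released datasets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustEdge:
    """One relationship: `source` expresses trust in `target`."""
    source: str
    target: str
    weight: float = 1.0  # hypothetical strength of the relationship


def out_neighbors(edges, entity):
    """Return the sorted names of the entities that `entity` points to."""
    return sorted({e.target for e in edges if e.source == entity})


# Hypothetical sample data: three entities, three relationships.
edges = [
    TrustEdge("alice", "bob", 0.9),
    TrustEdge("bob", "carol", 0.5),
    TrustEdge("alice", "carol", 0.2),
]

print(out_neighbors(edges, "alice"))
```

An edge list like this maps directly onto most network file formats, which is why it is a convenient intermediate form.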

The goal is to collect as many datasets as possible in a single place (this wiki) and to release them, under a reasonable license, in standard formats that are easy to use with the software also collected in this wiki.

Our effort shares the vision of the Science Commons project, which tries to remove unnecessary legal and technical barriers to scientific collaboration and innovation and to foster Open Access to data.


We are going to achieve this goal in two ways: by contacting people who maintain repositories of datasets to see if we can agree on collecting those datasets here as well, and by writing crawlers and scripts for downloading information from online resources.



Released datasets

You can find files for released datasets at http://trustlet.org/datasets/

Ideas or suggestions of datasets to be collected here

Feel free to move things around and to create new (sub)categories. You may want to check Talk:Trust network datasets first, though.

Highly internet related

Wikipedia

Wikipedia social network

The rest

  • PGP network
  • Peer-to-peer networks: relationships between users and between nodes. Datasets that are easier to obtain: Tribler, Gnutella, ...
  • Network of who replies to whom in Mailing lists. See http://www.cmu.edu/joss/content/articles/volume8/Welser/ (Data were collected and visualizations were generated for all people who replied to at least one message or received at least one reply during the study. From the raw data we constructed behavioral visualizations and network data sets based on reply relationships.)
  • Networks of imports in Git. Git is a distributed version control system in which every user is free to import from any other user. If Linus imports the Linux code from Mary's copy, this suggests that Linus trusts Mary a lot. See Linus's presentation and check whether it is possible to get the information about who imported from whom; for example, http://kerneltrap.org/node/5014
  • CRAWDAD is the Community Resource for Archiving Wireless Data At Dartmouth, a wireless network data resource for the research community. This archive has the capacity to store wireless trace data from many contributing locations, and staff to develop better tools for collecting, anonymizing, and analyzing the data. We work with community leaders to ensure that the archive meets the needs of the research community. http://crawdad.cs.dartmouth.edu/index.php
  • The DIMES project is publishing AS graphs of the Internet for use by the research community, as of December 2006. The datasets are published monthly and contain the following, each restricted to items that were found in that month and seen at least twice: ASNodes (AS-level nodes), ASEdges (AS-level edges), Nodes (IP-level nodes), and Edges (IP-level edges). http://www.netdimes.org/DIMESControlCenter/MonthlyData.jsp
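
The mailing-list idea in the list above amounts to turning reply relationships into a directed, weighted network. The following is a minimal Python sketch, assuming each message is already available as a record with the hypothetical fields `id`, `sender`, and `in_reply_to`; a real archive would first need its headers parsed into this shape, and the cited study may have built its networks differently.

```python
from collections import Counter


def reply_network(messages):
    """Count who replies to whom.

    `messages` is an iterable of dicts with the assumed keys
    "id", "sender", and "in_reply_to" (None for thread starters).
    Returns a Counter mapping (replier, original_author) to a reply count.
    """
    messages = list(messages)  # allow two passes over any iterable
    sender_of = {m["id"]: m["sender"] for m in messages}

    edges = Counter()
    for m in messages:
        parent = m.get("in_reply_to")
        if parent is None or parent not in sender_of:
            continue  # thread starter, or reply to a message outside the data
        replier, author = m["sender"], sender_of[parent]
        if replier != author:  # ignore people replying to themselves
            edges[(replier, author)] += 1
    return edges


# Hypothetical sample thread.
sample = [
    {"id": "m1", "sender": "ann", "in_reply_to": None},
    {"id": "m2", "sender": "bob", "in_reply_to": "m1"},
    {"id": "m3", "sender": "ann", "in_reply_to": "m2"},
    {"id": "m4", "sender": "bob", "in_reply_to": "m1"},
    {"id": "m5", "sender": "cat", "in_reply_to": "m0"},  # parent not in data
]
network = reply_network(sample)
for (replier, author), count in sorted(network.items()):
    print(replier, "->", author, count)
```

Replies whose parent message is missing from the data are skipped rather than guessed, so the edge counts are a lower bound on the real reply activity.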

Web social networks

Collaboration in Free and Open Source Software

Economy

  • eBay networks of who bought from or sold to whom.
  • networks of recommendations: who recommended what to whom. See Patterns of Influence in a Recommendation Network
  • Product Space Properties: using a network representation for the products space we can not only see which products are close to each other and the groups they form, but also their classifications and values. However, the network representation is nothing more than a powerful visualization technique and we still need to study the space properties using the entire proximity matrix complemented.

Monetary systems and social lending

  • Ripple: the Ripple network could be a peer-to-peer distributed social network service with a monetary honor system based on trust that already exists between people in real-world social networks.
  • Prosper: Prosper is a people-to-people social lending marketplace. Other examples of social lending are Zopa and LendingClub; see this article on TechCrunch and this research. Riva ( http://riva.org ) is a P2P microcredit service and is interesting as well.
  • Donations and networks of donors: The Epidemics of Donations: Logistic Growth and Power-Laws - This paper demonstrates that collective social dynamics resulting from individual donations can be well described by an epidemic model. It captures the herding behavior in donations as a non-local interaction between individuals via a time-dependent mean field representing the mass media. Our study is based on the statistical analysis of a unique dataset obtained before and after the tsunami disaster of 2004. We find a power-law behavior for the distributions of donations with similar exponents for different countries. Even more remarkably, we show that these exponents are the same before and after the tsunami, which accounts for some kind of universal behavior in donations independent of the actual event. We further show that the time-dependent change of both the number and the total amount of donations after the tsunami follows a logistic growth equation. As a new element, a time-dependent scaling factor appears in this equation which accounts for the growing lack of public interest after the disaster. The results of the model are underpinned by the data analysis and thus also allow for a quantification of the media influence.

Companies


Social capital

  • Bowling Alone, Revisited: Social Capital and Skill Acquisition: This paper uses micro-level data on friendship networks in middle and secondary schools to estimate effects of social capital (as measured by connections to and from other agents) on skill acquisition outcomes and to investigate the association between ethnic fractionalization and connectedness. (The most interesting aspect of the Add Health Survey, for the purpose at hand, is the data on friendship networks.)
  • Suicide and Friendships Among American Adolescents: We analyzed friendship data on 13 465 adolescents from the National Longitudinal Survey of Adolescent Health to explore the relationship between friendship and suicidal ideation and suicide attempts.

Science

Politics

Laws

  • networks of citations between laws. See for instance the paper The Web of Law. What makes these interesting is that the datasets are usually in the public domain.

Phone

  • anonymized networks of telephone calls (not easy to obtain)
  • The Reality Mining project represents the largest mobile phone experiment ever attempted in academia. http://reality.media.mit.edu/dataset.php

Biological

  • sex networks
  • Disease transmission and virus transmission (including computer viruses, which are easier to track)

Animals

To be sorted

Also check Online systems that collect trust information from users