Trust network datasets
On this page we collect trust network datasets (and, more generally, social network datasets). You might also want to check other dataset repositories hosted elsewhere.
Trust network datasets (social network datasets) are datasets in which there are entities (users, peers, servers, robots, ...) and social relationships, each connecting a pair of these entities.
The goal is to collect as many datasets as possible in a single place (this wiki) and to release them, under a reasonable license, in standard formats that are easy to use with the software also collected in this wiki.
Our effort shares the vision of the Science Commons project, which tries to remove unnecessary legal and technical barriers to scientific collaboration and innovation and to foster Open Access to data.
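The definition above (entities plus pairwise relationships) can be sketched as a directed, weighted edge list. This is only an illustration: the sample names, the 0 to 1 trust scale, and the whitespace-separated output layout are assumptions made for the example, not the format actually used for the datasets released on this wiki.

```python
from collections import defaultdict

# Hypothetical sample data: (source, target, trust level) triples.
# The names and the 0..1 weight scale are illustrative assumptions.
EDGES = [
    ("alice", "bob", 1.0),
    ("bob", "carol", 0.6),
    ("alice", "carol", 0.8),
    ("carol", "alice", 0.4),
]


def build_trust_network(edges):
    """Return a dict mapping each source entity to a {target: weight} dict."""
    network = defaultdict(dict)
    for source, target, weight in edges:
        # Trust is directed: alice -> bob says nothing about bob -> alice.
        network[source][target] = weight
    return dict(network)


def to_edge_list(network):
    """Serialize as one 'source target weight' line per relationship."""
    lines = []
    for source in sorted(network):
        for target in sorted(network[source]):
            lines.append(f"{source} {target} {network[source][target]}")
    return "\n".join(lines)


network = build_trust_network(EDGES)
print(to_edge_list(network))
```

A plain edge list like this is easy to load into most graph tools, which is one reason such a layout is a plausible candidate for a common release format.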
List of released datasets
- Epinions dataset
- Advogato dataset
- Kaitiaki dataset
- SqueakFoundation dataset
- Robots.net dataset
- WikiPedia dataset
Ideas or suggestions of datasets to be collected here
- Feel free to move things around and create new (sub)categories, but consider checking Talk:Trust network datasets first.
Data for 2.7 million users, 10 million tweets, and 58 million edges (i.e. connections between users)
- friendfeed http://larica.uniurb.it/sigsna/data/
- XFN networks
- FOAF networks
- email networks: for example the Enron Email Dataset or these ones
- Metafilter data http://stuff.metafilter.com/infodump/
- blog networks (check for instance Splog Blog Dataset)
- Livejournal network (see LJNet: LiveJournal Social Network Browser)
- PGP network
- Trust Networks in online 3D worlds such as SecondLife (this and ratepoint) and World of Warcraft (warcraftsocial.com and rupture.com)
- Peer-to-peer networks: relationships between users and between nodes. Easier to get datasets: Tribler, Gnutella, ...
- Network of routers and how they propagate traffic in the internet. See also The Social Life of Routers: Applying Knowledge of Human Networks to the Design of Computer Networks
- Internet traffic. See http://www.caida.org/data/
- Network of who replies to whom in Mailing lists. See http://www.cmu.edu/joss/content/articles/volume8/Welser/ (Data were collected and visualizations were generated for all people who replied to at least one message or received at least one reply during the study. From the raw data we constructed behavioral visualizations and network data sets based on reply relationships.)
- Networks of imports in Git. Git is a distributed version control system in which every user is free to import from any other user. If Linus imports the Linux code from Mary's copy, this obviously means that Linus trusts Mary a lot. See Linus's presentation and check whether it is possible to get the information about who imported from whom, for example http://kerneltrap.org/node/5014
- CRAWDAD is the Community Resource for Archiving Wireless Data At Dartmouth, a wireless network data resource for the research community. This archive has the capacity to store wireless trace data from many contributing locations, and staff to develop better tools for collecting, anonymizing, and analyzing the data. We work with community leaders to ensure that the archive meets the needs of the research community. http://crawdad.cs.dartmouth.edu/index.php
- The DIMES project is publishing AS graphs of the Internet for usage by the research community, as of December 2006. The datasets are published on a monthly basis, and contain the following: ASNodes - a set of AS level nodes which were found at that month and were seen at least twice, ASEdges - a set of AS level edges which were found at that month and were seen at least twice, Nodes - a set of IP level nodes which were found at that month and were seen at least twice, Edges - a set of IP level edges which were found at that month and were seen at least twice. http://www.netdimes.org/DIMESControlCenter/MonthlyData.jsp
- Youtube, livejournal, flickr, ... dataset at http://socialnetworks.mpi-sws.org
- Youtube Dataset available at http://an.kaist.ac.kr/traces/IMC2007.html
- Last.fm: who listens to what and who is friends with whom. See Audioscrobbler API
- Slashdot Zoo
- The social network of technology news site Slashdot with "friend" and "foe" relations. About 78,000 users and 510,000 relationships, of which about a quarter are of the "foe" type.
- Dataset available on request at http://www.dai-labor.de/en/competence_centers/irml/datasets/
- Associated paper: Kunegis, Lommatzsch & Bauckhage, The Slashdot Zoo: Mining a Social Network with Negative Edges, WWW 2009. pdf
- Club Nexus (see the paper A social network caught in the Web by Lada A. Adamic, Orkut Buyukkokten, and Eytan Adar, in which the Club Nexus dataset is used). Note that the dataset has weighted relationships!
- Tastes, Ties and Time: Facebook data release. The dataset comprises machine-readable files of virtually all the information posted on approximately 1,700 Facebook profiles by an entire cohort of students at an anonymous, northeastern American university. Profiles were sampled at one-year intervals, beginning in 2006. This first wave covers first-year profiles, and three additional waves of data will be added over time, one for each year of the cohort's college career. Dataset at http://dvn.iq.harvard.edu/dvn/dv/t3
- Facebook social graph dataset. This dataset was collected during April-May 2009 and contains two representative samples of users Facebook-wide, each ~1 million nodes, with a few annotated properties: for each sampled user, it includes the friend list, privacy settings and network membership.
- Facebook applications dataset. This dataset was collected during February 2008 and contains a list of installed applications for ~300K Facebook users. In total ~13K applications are included. Additionally the dataset contains number of active users and total application installations for every Facebook application daily over a period of 120 days.
- Computer networks as social networks
- Nitle crawled blog network
- From http://research.microsoft.com/en-us/um/people/thomkar/osn/virgilio_panel.pdf
- Virgilio Almeida, UFMG, 2007: OSN data spans orders of magnitude
- YouTube 1.6 million-node, Flickr 1.8 million-node, LiveJournal 5.2 million-node and Orkut 3 million-node [Cha 2007, Mislove 2007]
- 33 million blog requests to 210,738 blogs in a blogosphere [Almeida 2007]
- 30 billion conversations among 240 million people: network of all IM communication over one month on Microsoft Instant Messenger [Leskovec and Horvitz 2007]
- Citation network with 1736 nodes, actor collaboration with 392340 nodes... [Barabasi et al. 2005]
- Email network with 59812 nodes with emails of 5165 students [Ebel et al. 2002]
- Time evolution of a social network comprising 43,553 students. [Kossinets and Watts, 2006]
Collaboration in Free and Open Source Software
- network of profiles at Advogato.
- SourceForge networks (between projects, between developers, etc.). See for example SourceForge.net Research Data by Greg Madey
- data from ossmole, such as Sourceforge networks
- networks of CVS commits to code repositories
- eBay networks about who bought/sold from whom.
- networks of recommendations: who recommended what to whom. See Patterns of Influence in a Recommendation Network
- Product Space Properties: using a network representation for the products space we can not only see which products are close to each other and the groups they form, but also their classifications and values. However, the network representation is nothing more than a powerful visualization technique and we still need to study the space properties using the entire proximity matrix complemented.
- Ripple: the Ripple network could be a peer-to-peer distributed social network service with a monetary honor system based on trust that already exists between people in real-world social networks.
- Prosper: Prosper is a people-to-people social lending marketplace. Other examples of social lending are Zopa and LendingClub; see this article on TechCrunch and this research. Riva ( http://riva.org ) is P2P microcredit and is interesting as well.
- Donations and networks of donors: The Epidemics of Donations: Logistic Growth and Power-Laws - This paper demonstrates that collective social dynamics resulting from individual donations can be well described by an epidemic model. It captures the herding behavior in donations as a non-local interaction between individuals via a time-dependent mean field representing the mass media. Our study is based on the statistical analysis of a unique dataset obtained before and after the tsunami disaster of 2004. We find a power-law behavior for the distributions of donations with similar exponents for different countries. Even more remarkably, we show that these exponents are the same before and after the tsunami, which accounts for some kind of universal behavior in donations independent of the actual event. We further show that the time-dependent change of both the number and the total amount of donations after the tsunami follows a logistic growth equation. As a new element, a time-dependent scaling factor appears in this equation which accounts for the growing lack of public interest after the disaster. The results of the model are underpinned by the data analysis and thus also allow for a quantification of the media influence.
- Fuzzy Local Currency Based on Social Network Analysis for Promoting Community Businesses.
- network of directors of companies
- Networks of funding relations between groups of candidates and donors. See http://www.visualcomplexity.com/vc/project.cfm?id=478
- Bowling Alone, Revisited: Social Capital and Skill Acquisition: This paper uses micro-level data on friendship networks in middle and secondary schools to estimate effects of social capital (as measured by connections to and from other agents) on skill acquisition outcomes and to investigate the association between ethnic fractionalization and connectedness. (The most interesting aspect of the Add Health Survey, for the purpose at hand, is the data on friendship networks.)
- Suicide and Friendships Among American Adolescents: We analyzed friendship data on 13,465 adolescents from the National Longitudinal Survey of Adolescent Health to explore the relationship between friendship and suicidal ideation and suicide attempts.
- Citations network between researchers
- Citations network between journals
- use of citeulike for saving and tagging scientific papers http://www.citeulike.org/faq/data.adp
- use of bibsonomy http://www.bibsonomy.org/faq#faq-dataset-1 and download: http://www.kde.cs.uni-kassel.de/bibsonomy/dumps/2007-12-31.tgz
- Erdos numbers datasets. At http://www.oakland.edu/enp/thedata.html
- networks of pharmacologists. See “Six degrees of pharmacology: Game ranks researchers by proximity to field’s founder”
- network of citation between patents. At http://www.nber.org/patents/
- network of inventors http://www.visualcomplexity.com/vc/project.cfm?id=546 (evolution in time!!)
- LittleSis http://littlesis.org/features At its core, LittleSis offers profiles of powerful people and organizations in the public and private sectors. Profiles detail a wealth of information vital to any investigation of the ways power and money guide the formulation of public policy, from board memberships to campaign contributions, old school ties to government contracts. LittleSis integrates a wide range of data currently scattered across print and electronic resources that are disconnected and often difficult to find. In addition to a rich catalog of profiles, LittleSis provides users with sophisticated tools for searching, exploring, and analyzing the network of data at their disposal. LittleSis allows users to map social networks, meticulously follow the money, and highlight relationships overlooked in breaking news stories. Core data sets are maintained by the LittleSis staff, but most fields in LittleSis's database will be open to contribution and editing, while keeping the process of revision transparent and accountable. The overall breadth and detail of LittleSis's data will rely on an active and collaborative user community.
- networks of wars and conflicts among countries. Many interesting datasets (covering from 1816 to the present!) at http://cow2.la.psu.edu/ Used for example in Learning and Reputation in International Conflict
- Linkage dataset. tons of datasets about everything (societies around the world mainly). Also look at the paper Multimode Ring Cohesion Theory. Douglas R. White – work agenda with a lot of examples of already studied networks.
- networks of committee and subcommittee assignments in the United States House of Representatives from the 101st--108th Congresses. See http://arxiv.org/abs/physics/0602033
- network of political bloggers in US. See The Political Blogosphere and the 2004 U.S. Election: Divided They Blog (pdf) by Lada Adamic and Natalie Glance
- network of blogs that supported the 2007 French presidential candidate, Segolene Royal, divided by political party and geographically placed throughout the French territory. See http://www.visualcomplexity.com/vc/project.cfm?id=473
- Networks in NGO Communities. See http://www.commphd.com/ann/Projects.php and http://www.mande.co.uk/networkmodels.htm#Publicly_available_data_sets_ and http://www.ingentaconnect.com/content/klu/volu/2006/00000017/00000004/00009022?crawler=true
- Networks of activists. See Myth and the Zapatista movement: exploring a network identity, Online activist community, Mapping Networks of Support for the Zapatista Movement: Applying Social Networks Analysis to Study Contemporary Social Movements, "Cyberlinks between human rights NGOs: A network analysis", Social Networks and Social Movements: A Microstructural Approach to Differential Recruitment
- The Italian Extreme Right On-line Network: An Exploratory Study Using an Integrated Social Network Analysis and Content Analysis Approach and cited by this: White supremacist networks on the internet, L'extrême droite sur internet.
- networks between countries. See Reputation and Interstate Conflict (Of Friends and Foes) based on Polity IV dataset.
- http://www.wooster.edu/polisci/mkrain/qm/qmdata.html ANNOTATED BIBLIOGRAPHY OF POLITICAL SCIENCE DATA SETS
- networks of citations between laws. See for instance the paper The Web of Law. What is interesting is that datasets are usually in the public domain.
- anonymized networks of telephone calls (not easy to get)
- The Reality Mining project represents the largest mobile phone experiment ever attempted in academia. http://reality.media.mit.edu/dataset.php
- sex networks
- Disease transmission, virus transmission (also computer viruses which are easier to track)
- 80 small networks of animals used here (size from four colobus monkeys to 73 high school boys). The ties composing the networks also vary from advice relations and friendship ties to victories in agonistic encounters
- Social Networking for Zebras - Scientists are developing a new branch of network theory to understand zebra communities http://www.sciencenews.org/articles/20071201/mathtrek.asp
- Network of pigeons http://www.visualcomplexity.com/vc/project_details.cfm?id=541&index=541&domain=
- Networks of African elephants. Matriarchs As Repositories of Social Knowledge in African Elephants http://www.sciencemag.org/cgi/content/abstract/292/5516/491
To be sorted
- vizster, Visualizing online social networks by jeffrey heer and danah boyd
- Network of superheroes in the Marvel universe http://bioinfo.uib.es/~joemiro/marvel.html
- Network of classical mythology gods
- Social Networking Software Tracks Zebras and Consumers: social networks of zebras (the animals)
- street networks? Google Cartography uses the Google Search API to build a visual representation of the interconnectivity of streets in an area.
- social networks in sport. See http://ie.tamu.edu/people/faculty/butenko/papers/nba_graph.pdf
- Six Degrees of Kevin Bacon: any actor can be linked, through their film roles, to actor Kevin Bacon.
- The Oracle of Kasparov shows the shortest path by which a chess player beats Kasparov. Based on Chessbase Megabase.
- The Social Network of the Planetary Data System? See http://ocw.mit.edu/NR/rdonlyres/Engineering-Systems-Division/ESD-342Spring-2006/CEA02613-5FB1-44AA-A289-2D6359D92881/0/rep_planet_data.pdf
- ? http://ailab.uta.edu/subdue/download.htm
- the friendship problem. See http://www.cs.cmu.edu/~schneide/JKa2003.ps
- CiteULike or Connotea networks of "who subscribes to whom". Same for del.icio.us. See also possible experiment about correlation between citations and bookmarking
- diffusion of innovation, buzz, viral marketing. See http://www-personal.umich.edu/~ladamic/
- social epidemics (see http://en.wikipedia.org/wiki/The_Tipping_Point_%28book%29 )
- social network of drugs
- political blogosphere. See http://www-personal.umich.edu/~ladamic/
- network of (fights between) dolphins. http://www.tethys.org/projects/IDP/idpk_home.htm#dataset
- http://papers.ssrn.com/sol3/papers.cfm?abstract_id=269289 Check the dataset
- Trust and stress in the workspace: Testing alternative hypotheses
- networks of words: check WordNet and many more datasets
- The Steroids Social, baseball players and through whom they got introduced to drugs (steroids): players are connected to the person who first introduced them to the Mitchell report's star witness, Mets batboy turned personal trainer Kirk Radomski.
- add more!
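Several entries above (Erdos numbers, Six Degrees of Kevin Bacon, the Oracle of Kasparov) rest on the same idea: the length of the shortest path from each node to a chosen central node. A minimal sketch of that computation with breadth-first search follows; the node names and links are made-up placeholders, not records from any of the listed datasets, and the links are assumed to be undirected and unweighted.

```python
from collections import deque

# Hypothetical collaboration pairs; the names are placeholders only.
COLLABORATIONS = [
    ("hub", "a"),
    ("a", "b"),
    ("b", "c"),
    ("hub", "d"),
    ("e", "f"),  # a component that is not connected to "hub"
]


def build_graph(pairs):
    """Build an undirected adjacency map from (u, v) pairs."""
    graph = {}
    for u, v in pairs:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def separation_numbers(graph, center):
    """Breadth-first search: shortest hop count from center to each reachable node."""
    distances = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


graph = build_graph(COLLABORATIONS)
numbers = separation_numbers(graph, "hub")
print(numbers)
```

Nodes that cannot be reached from the center (here "e" and "f") are simply absent from the result, which corresponds to an undefined or infinite number in the usual convention.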