Repositories of datasets

This is an overview of repositories of datasets. The page "Trust network datasets" contains a list of released social network datasets.


 * TODO: analyze the individual datasets contained in the following repositories and add them to the list of datasets.


 * http://konect.uni-koblenz.de/ KONECT contains over a hundred network datasets of various types and shows statistics and plots for each of them.
 * http://www.graph-database.org/
 * http://kevinchai.net/datasets/
 * http://casci.umd.edu/Social_Network_Analysis_Datasets
 * http://netwiki.amath.unc.edu/SharedData/SharedData
 * http://snap.stanford.edu/data/ Many datasets from the Stanford SNAP project.
 * http://infochimps.org/ Share and download datasets
 * http://www.casos.cs.cmu.edu/computational_tools/data2.php
 * http://ipr1.hsc.usc.edu/networks/ The Empirical Networks Project provides a dataset that compiles 720 empirical networks from 8 studies as a benchmark for social network parameters.
 * http://theinfo.org/ This is a site for large data sets and the people who love them: the scrapers and crawlers who collect them, the academics and geeks who process them, the designers and artists who visualize them. It's a place where they can exchange tips and tricks, develop and share tools together, and begin to integrate their particular projects.
 * http://www.ckan.net/package/list
 * http://search.icpsr.umich.edu/ICPSR/query.html?col=abstract&col=series&op0=%2B&fl0=&ty0=w&tx0=social+network&op1=%2B&fl1=&ty1=w&tx1=&op2=%2B&fl2=&ty2=w&tx2=&op3=%2B&fl3=&ty3=w&tx3=&op4=-&tx4=restricted&ty4=w&fl4=availability%3A&nh=25&rf=0&ht=0&qp=&lk=1&qm=0&st=1&rq=0
 * http://www.fsd.uta.fi/english/data/background/social_capital.html
 * http://www.datawrangling.com/some-datasets-available-on-the-web.html
 * http://datamob.org
 * http://del.icio.us/pskomoroch/dataset
 * http://del.icio.us/judell/publicdata http://del.icio.us/tag/publicdata
 * http://www.daniel-lemire.com/blog/data-for-data-mining/
 * http://stat.gamma.rug.nl/siena.html
 * http://voson.anu.edu.au/about.html
 * http://law.dsi.unimi.it/index.php?option=com_include&Itemid=65
 * http://www.graphdrawing.org/data/index.html
 * TODO: analyze https://nwb.slis.indiana.edu/community/?n=Datasets.HomePage (also a wiki).
 * Check the entire project at http://farrall.org/webgraph/research/index.html (the datasets are in the right column)
 * Many Eyes by IBM
 * Many networks can be found at http://www-personal.umich.edu/~mejn/netdata/ (TODO: move them here gradually, in an organized way). The page also contains a list of other repositories.
 * Networks can also be found at http://www.googlesyndicatedsearch.com/u/umbcsearch?q=dataset&domains=ebiquity.umbc.edu&sitesearch=ebiquity.umbc.edu&x=0&y=0
 * http://www.infovis-wiki.net/index.php/Data_Libraries (possibly not related to trust networks)
 * http://vw.indiana.edu/07netsci/entries/
 * http://www.commetrix.de/Enron
 * http://vlado.fmf.uni-lj.si/pub/networks/data/esna/default.htm Pajek datasets
 * http://www.infovis-wiki.net/index.php/Social_Network_Generation


 * InfoVis CyberInfrastructure: the project is setting up a PostgreSQL database that will provide access to books, journals, proceedings, doctoral and masters theses, technical reports, patents, and grants, covering both cross-disciplinary research and specific knowledge domains.
 * http://web.archive.org/web/20060425224634/http://www.cosin.org/data.html and http://web.archive.org/web/20060502031821/www.cosin.org/extra/data/data.html
 * del.icio.us "dataset network"
 * http://www.cs.helsinki.fi/u/tsaparas/MACN2006/data-code.html
 * http://netwiki.amath.unc.edu/SharedData/SharedData (a wiki, still to be checked; its recent-changes page shows that it has not been edited recently)
 * http://www.caida.org/data/
 * http://vlado.fmf.uni-lj.si/pub/networks/data/default.htm and http://vlado.fmf.uni-lj.si/pub/networks/pajek/data/gphs.htm
 * http://www.casos.cs.cmu.edu/computational_tools/data2.php and http://www.casos.cs.cmu.edu/computational_tools/data.php
 * http://www.weizmann.ac.il/mcb/UriAlon/groupNetworksData.html
 * http://www.ssfnet.org/Exchange/gallery/index.html
 * TODO: check the datasets used in http://www.citebase.org/search?submit=1&author=Battiston%2C+Stefano and http://www.sg.ethz.ch/research
 * http://www.socqrl.niu.edu/DatasetsINTRO.htm
 * http://www.insna.org/INSNA/data_inf.html
 * http://www.wjh.harvard.edu/soc_help/datafac.htm
 * http://www.uwe.ac.uk/library/resources/soc/stats.htm
 * http://www.cardiff.ac.uk/insrv/bysubject/sociology/datasets.html
 * Sociology Datasets, Search Engines, and other Resources at washington.edu
 * UC Santa Cruz - Sociology - Sociology Datasets and other Resources
 * Software and Datasets for Sociology and Demography
 * Georgia Tech Library :: Sociology :: Datasets
 * Sociology at George Washington University
 * International Networks Archive
 * http://www.cs.cmu.edu/~jkubica/code/linkds.html
 * simile.mit.edu/repository/datasets/
 * DIMES Public data repository
 * jung / jung / src / samples / datasets (containing citeseer_authors.net, which is no longer included in the current release)
 * Network Datasets for course Algorithms for Information Networks, Carnegie Mellon, Spring 2005 and from another course
 * http://www.cs.toronto.edu/~tsap/experiments/download/download.html
 * http://www.cs.toronto.edu/~tsap/experiments/datasets/index.html
 * Datasets used in http://www-personal.umich.edu/~ladamic/si708w07/
 * http://www.socqrl.niu.edu/DatasetsOTHERS.htm
 * ideas in this paper
 * http://mobblog.cs.ucl.ac.uk/datasets/


 * http://www.cise.ufl.edu/research/sparse/matrices/ (visualizations at http://www.research.att.com/~yifanhu/GALLERY/GRAPHS/ ) University of Florida Sparse Matrix Collection: Maintained by Tim Davis. The University of Florida Sparse Matrix Collection is a large, widely available, and actively growing set of sparse matrices that arise in real applications. Its matrices cover a wide spectrum of problem domains, both those arising from problems with underlying 2D or 3D geometry: structural engineering, computational fluid dynamics, model reduction, electromagnetics, semiconductor devices, thermodynamics, materials, acoustics, computer graphics/vision, robotics/kinematics, and other discretizations; and those that typically do not have such geometry: optimization, circuit simulation, networks and graphs (including web connectivity matrices), economic and financial modeling, theoretical and quantum chemistry, chemical process simulation, mathematics and statistics, and power networks. The collection meets a vital need that artificially-generated matrices cannot meet, and is widely used by the sparse matrix algorithms community for the development and performance evaluation of sparse matrix algorithms. The collection includes software for accessing and managing the collection, from MATLAB, Fortran, and C. As of May 2008, it contains 1890 problems (some of which are sequences of dozens of matrices). The smallest is 5-by-5 with 19 nonzero entries. The largest has dimension 9.8 million, and the matrix with the most nonzeros has 99.2 million entries. The matrices are available in three formats: MATLAB mat-file, Rutherford-Boeing, and Matrix Market. The size of the collection in each format is about 9 GB. Note that the MATLAB mat-files can only be read by MATLAB 7.0 or later.
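
For readers without MATLAB, the Matrix Market files mentioned above are plain text and can be read with very little code. The following is only a minimal sketch in Python, not the software shipped with the collection: it assumes the common "matrix coordinate" layout (a `%%MatrixMarket` banner, optional `%` comment lines, a size line, then one `row column [value]` entry per line with 1-based indices) and handles only the `real`, `integer`, and `pattern` fields with `general` or `symmetric` symmetry. The function name `read_matrix_market` and the sample data are hypothetical, chosen for illustration.

```python
import io


def read_matrix_market(stream):
    """Parse a Matrix Market 'matrix coordinate' file.

    Returns (rows, cols, entries), where entries maps 0-based (i, j)
    index pairs to float values. This is an illustrative sketch that
    covers only a subset of the format.
    """
    banner = stream.readline().split()
    if len(banner) != 5 or banner[0] != "%%MatrixMarket":
        raise ValueError("missing %%MatrixMarket banner")
    obj, fmt, field, symmetry = (word.lower() for word in banner[1:])
    if (obj, fmt) != ("matrix", "coordinate"):
        raise ValueError("only 'matrix coordinate' files are handled here")
    if field not in ("real", "integer", "pattern"):
        raise ValueError("unsupported field: " + field)
    if symmetry not in ("general", "symmetric"):
        raise ValueError("unsupported symmetry: " + symmetry)

    # Skip comment lines (starting with '%') and blank lines
    # that precede the size line.
    line = stream.readline()
    while line and (line.startswith("%") or not line.strip()):
        line = stream.readline()
    if not line:
        raise ValueError("missing size line")
    rows, cols, nnz = (int(token) for token in line.split())

    entries = {}
    count = 0
    for line in stream:
        tokens = line.split()
        if not tokens:
            continue
        # Indices in the file are 1-based; store them 0-based.
        i, j = int(tokens[0]) - 1, int(tokens[1]) - 1
        # 'pattern' files list positions only, so use 1.0 as the value.
        value = 1.0 if field == "pattern" else float(tokens[2])
        entries[(i, j)] = value
        # Symmetric files store one triangle; mirror off-diagonal entries.
        if symmetry == "symmetric" and i != j:
            entries[(j, i)] = value
        count += 1
    if count != nnz:
        raise ValueError("expected %d entries, found %d" % (nnz, count))
    return rows, cols, entries


# Hypothetical 3-by-3 example, not taken from the collection.
SAMPLE = """%%MatrixMarket matrix coordinate real symmetric
% a tiny made-up matrix
3 3 3
1 1 2.0
2 1 -1.0
3 3 4.5
"""

rows, cols, entries = read_matrix_market(io.StringIO(SAMPLE))
print(rows, cols, len(entries))
```

The dictionary-of-entries representation is chosen only to keep the sketch dependency-free; for matrices of the sizes listed above, a dedicated sparse matrix library would be the more practical choice.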


 * http://odysseas.calit2.uci.edu/doku.php/public:online_social_networks#available_datasets Facebook social graph and applications.