Wikipedia dataset
From TrustLet, a free, collaborative project for collecting and analyzing information about trust metrics.
Wikipedia datasets are available here. The dataset is in c2 format (see the cache v2 page for more information) and is organized into a hierarchical directory tree, sorted by language and date.
Contents of the c2 dataset
Its contents are:
- the graph of users
- the list of bots
- the list of blocked users
The graph of users is the network itself. The list of bots and the list of blocked users are used to exclude bot accounts and blocked users from that network.
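The exclusion step can be sketched as follows. This is only an illustration on toy data: the function name `filter_users` and the representation of the graph as a dict mapping `(source, target)` pairs to weights are assumptions made for this example, not the structures Trustlet actually uses.

```python
def filter_users(edges, bots, blocked):
    """Return a copy of `edges` without any edge touching a bot or blocked user.

    `edges` maps (source, target) pairs to weights (hypothetical layout).
    """
    excluded = set(bots) | set(blocked)
    return {
        (src, dst): weight
        for (src, dst), weight in edges.items()
        if src not in excluded and dst not in excluded
    }


# Toy data, invented for this example.
edges = {("alice", "bob"): 3, ("bob", "SpamBot"): 1, ("troll", "alice"): 2}
filtered = filter_users(edges, bots=["SpamBot"], blocked=["troll"])
```

Only the edge between the two ordinary users survives, because every other edge has a bot or a blocked user at one end.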
Why do we save graphs in c2 format?
We save graphs in c2 format (Cache v2, based on the Python pickle package) because our code is written in Python, and it is much more efficient to save and load files directly in a format native to Python.
However, it is very easy to write a three-line Python script that loads a graph saved in c2 format and saves it in the format you prefer. Check the code or contact us, for example through the discussion page of this wiki page.
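A conversion script of this kind might look roughly like the sketch below. It assumes, purely for illustration, that the file is a plain pickle holding a list of `(source, target, weight)` triples. The real c2 layout is described on the cache v2 page and may well differ, and the function name `pickle_to_tsv` and the tab-separated output are hypothetical choices.

```python
import csv
import os
import pickle
import tempfile


def pickle_to_tsv(src_path, dst_path):
    """Load a pickled list of (source, target, weight) triples and write it as TSV.

    Assumption: the pickle holds a plain list of triples; a real c2 file may not.
    Only unpickle files you trust, since pickle can execute code on load.
    """
    with open(src_path, "rb") as f:
        edges = pickle.load(f)
    with open(dst_path, "w", newline="") as f:
        csv.writer(f, delimiter="\t").writerows(edges)
    return len(edges)


# Round trip on toy data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "toy.pickle")
    dst = os.path.join(tmp, "toy.tsv")
    with open(src, "wb") as f:
        pickle.dump([("alice", "bob", 3), ("bob", "alice", 1)], f)
    written = pickle_to_tsv(src, dst)
    with open(dst, newline="") as f:
        rows = list(csv.reader(f, delimiter="\t"))
```

The TSV writer stores weights as text, so a reader has to convert them back to numbers.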
Trustlet software can manipulate and work with the Wikipedia network. A Wikipedia dataset is a c2 file created with the wikixml2graph script. This conversion script works only with Wikipedia datasets in XML format downloaded from [[1]].
Warnings
For large Wikipedia networks, such as en-wiki or it-wiki, which take several GB of disk space, wikixml2graph may need a very long time to create the c2 dataset. The main difficulty in manipulating wiki networks is RAM usage, because some datasets can take several GB of memory.
Problems
The algorithm that assigns weights to edges is very simple. For every pair of users in the wiki network, we only count how many times the first wrote on the personal page of the second, and this number is the weight of the edge from the first to the second. This weighting may not be meaningful; if you want to change it, you can modify the wikixml2graph script or send us an email.
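The counting rule described above can be sketched as follows. This is not the wikixml2graph code. The function name `edge_weights` and the input format, a sequence of `(author, page_owner)` pairs with one pair per edit, are assumptions made for this example.

```python
from collections import Counter


def edge_weights(page_edits):
    """Count, for each ordered pair of users, how often the first wrote on the second's page.

    `page_edits` is an iterable of (author, page_owner) pairs, one per edit
    (hypothetical input format).
    """
    weights = Counter()
    for author, owner in page_edits:
        weights[(author, owner)] += 1
    return dict(weights)


# Toy edit log, invented for this example.
page_edits = [("alice", "bob"), ("alice", "bob"), ("bob", "alice")]
weights = edge_weights(page_edits)
```

Because the pairs are ordered, the edge from the first user to the second and the edge in the opposite direction get independent weights.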
Other information
For more information about the Wikipedia social network, see Wikipedia social network.

