Examples
From TrustLet, a free, collaborative project for collecting and analyzing information about trust metrics.
When using trustlet in an interactive way (e.g. in IPython) it's convenient (but not very clean) to import everything into the main namespace:
from trustlet import *
Creating your own dataset
It's easy to create your own dataset. Consider these examples:
# Set W.path and the nodes/edges of the graph directly (for example, if you have saved a dataset before).
W = WeightedNetwork(
    base_path='your_home_directory',  # optional
    filepath='path_to_file_in_c2_format_on_which_you_want_to_save/load_dataset',
    from_graph=a_graph_in_an_XDiGraph_class,  # optional
    cachedict=dictionary_used_as_key_for_graph)
# To create a new 'generic' network:
W = WeightedNetwork()
# To set only the path (where the network saves itself and the information about itself):
W = WeightedNetwork(base_path='your_home_directory')
# To load your network directly from a c2 file at init time, set the filepath and cachedict parameters (or only filepath).
# See WeightedNetwork.__doc__ and Network.__doc__ for more info.
NB: if the 'cachedict' parameter is needed on the network itself, it must be set at init time; there is no other method to set it.
Once 'W' from the previous example is set, you can build your own network with the functions add_node/add_edge or, if you want to copy the nodes/edges from another class, paste_graph. Before you run your tests, call:
W.info()
# or
W.info(cachedict={'network':'your_network_name'})
If you want to save your dataset, simply type:
W.save_c2()
If you haven't set the 'cachedict' parameter at init time:
W.save_c2(cachedict={'network':'your_network_name'})
If you want to load your dataset, simply type:
W.load_c2()  # the parameters are similar to those of save_c2
See W.load_c2.__doc__ and W.save_c2.__doc__ for more info.
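The idea of using a dictionary such as {'network':'your_network_name'} as the key under which a dataset is saved and loaded can be sketched in plain Python. This is only an illustration of the lookup idea: it does not use trustlet, it is not the c2 file format (which is not described on this page), and the helper names dict_key, save_entry and load_entry are hypothetical.

```python
import os
import pickle
import tempfile

def dict_key(cachedict):
    """Turn a dictionary into a hashable, order-independent key."""
    return tuple(sorted(cachedict.items()))

def save_entry(path, cachedict, value):
    """Store value under the given dictionary key, keeping existing entries."""
    entries = {}
    if os.path.exists(path):
        with open(path, 'rb') as handle:
            entries = pickle.load(handle)
    entries[dict_key(cachedict)] = value
    with open(path, 'wb') as handle:
        pickle.dump(entries, handle)

def load_entry(path, cachedict):
    """Return the value stored under the dictionary key, or None if absent."""
    if not os.path.exists(path):
        return None
    with open(path, 'rb') as handle:
        return pickle.load(handle).get(dict_key(cachedict))

# Usage: store a tiny edge list under a network name, then read it back.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_cache.pickle')
save_entry(demo_path, {'network': 'your_network_name'}, [('alice', 'bob', 0.8)])
loaded = load_entry(demo_path, {'network': 'your_network_name'})
```

A plain dict cannot be used as a dictionary key in Python, which is why the sketch converts it to a sorted tuple of items first.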
Loading datasets
It's easy to start with a dataset and get some information about it:
# create dummy dataset
D = DummyNetwork()
D.info()  # show information
For several advogato-style websites trustlet will download the most recent dataset automatically:
# create Kaitiaki dataset
K = KaitiakiNetwork()
K.info()  # show information
# create SqueakFoundation dataset
S = SqueakFoundationNetwork()
S.info()
# create Advogato dataset
A = AdvogatoNetwork()
A.info()
# create Wikipedia dataset
w = WikiNetwork( lang='your_lang', date='your_date' )
where:
your_lang: it, en, fur, la, ... or any other language whose downloaded Wikipedia dataset you have converted into a c2 file with the wikixml2graph.py script (see scripts for more info)
your_date: a date in the format yyyy-mm-dd
You can also specify a date in the past:
# create Advogato dataset as it was on a certain date.
# The .dot file is taken from http://www.trustlet.org/datasets/advogato/ looking for the correctly dated file
AD = AdvogatoNetwork(date="2007-10-13")
AD.info()
Generate graphics for an arbitrary number of trust metrics on controversial nodes
To generate these graphs, you need to compute the type of error that you want to plot for each controversiality level. The first step is to create the network:
# base_path is needed only if the dataset directory isn't in your home
IdentifierNetwork = AdvogatoNetwork( date="year-month-day", base_path="your/dataset/directory" )
Next, create the trust metrics that you want to plot.
TM = TrustMetric( IdentifierNetwork, trust_metric_function )
# and/or
TM1 = PageRankTM( IdentifierNetwork )
# and/or ...
You now have a certain number of trust metrics (call them tm1, tm2, ..., tmN). To plot them, we must calculate (or read, if it was already calculated) the PredGraph for each one (a network whose edges carry both the original trust and the predicted trust) and invoke the graphcontroversiality method on it. (For the documentation of the method, open python, import trustlet, and run "print PredGraph.graphcontroversiality.__doc__".)
PredictedNetwork1 = PredGraph( tm1 )
PredictedNetwork2 = PredGraph( tm2 )
. . .
PredictedNetworkN = PredGraph( tmN )
Now we are ready to calculate the points to be plotted. The graphcontroversiality method returns them as a list of tuples:
ListOfPoints1 = PredictedNetwork1.graphcontroversiality( maxcontroversiality, step, typeOfError [, NumberOfYourProcessor] )
ListOfPoints2 = PredictedNetwork2.graphcontroversiality( maxcontroversiality, step, typeOfError [, NumberOfYourProcessor] )
. . .
ListOfPointsN = PredictedNetworkN.graphcontroversiality( maxcontroversiality, step, typeOfError [, NumberOfYourProcessor] )
Now ( finally ;-) ) we have all the data needed to plot the graph. We use the prettyplot function. (See its documentation by running "print prettyplot.__doc__".)
prettyplot( [ListOfPoints1, ListOfPoints2, ..., ListOfPointsN], 'path/to/img/to/save.png',
    legend=('Short comment List1', 'Short comment List2', ..., 'Short comment ListN')[, ...other parameters] )
If you have done everything correctly, this command shows a graph with the data that you have selected and saves it to the path that you have specified.
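To make the shape of the plotted data more concrete, here is a minimal, self-contained sketch of how a list of (controversiality level, error) tuples could be computed. It does not use trustlet. It assumes that the controversiality of a node is the standard deviation of the trust it receives and that the error is the mean absolute error on edges pointing to nodes at or above each level; the definitions actually used by graphcontroversiality may differ, and the function name controversiality_points is hypothetical.

```python
from statistics import pstdev

def controversiality_points(edges, max_level, step):
    """edges: list of (source, target, original_trust, predicted_trust).
    Returns [(level, mean_abs_error), ...] for levels 0, step, ..., max_level."""
    # Group the original trust values received by each target node.
    received = {}
    for _source, target, original, _predicted in edges:
        received.setdefault(target, []).append(original)
    # ASSUMPTION: controversiality = population std dev of received trust.
    controversy = {node: pstdev(values) for node, values in received.items()}

    points = []
    count = int(round(max_level / step)) + 1
    for i in range(count):
        level = i * step
        errors = [abs(original - predicted)
                  for _s, target, original, predicted in edges
                  if controversy[target] >= level]
        if errors:  # skip levels for which no edge is left
            points.append((level, sum(errors) / len(errors)))
    return points

# Node 'x' receives unanimous trust, node 'y' receives conflicting trust.
edges = [
    ('a', 'x', 1.0, 0.9),
    ('b', 'x', 1.0, 0.8),
    ('a', 'y', 1.0, 0.5),
    ('b', 'y', 0.0, 0.5),
]
points = controversiality_points(edges, max_level=0.5, step=0.25)
```

In this toy data the error grows once only the edges into the controversial node 'y' remain, which is the kind of trend such a graph is meant to show.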
Evaluating trust metrics
# load the Advogato network dataset
A = AdvogatoNetwork()
# create a trust metric based on MoleTrust with horizon 3 and threshold 0.5
moletrust3 = TrustMetric(A, moletrust_generator(horizon=3))
# use the trust metric to predict all the present trust edges (leave-one-out technique)
pred_graph = PredGraph(moletrust3)
# compute some error measures on the trust values predicted for trust edges
pred_graph.abs_error()
pred_graph.coverage()
# write something about generating a pred_graph only on edges satisfying some conditions
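The leave-one-out technique mentioned above can be sketched without trustlet. Each edge is hidden in turn, a trust metric predicts it from the remaining edges, and the prediction is compared with the original value. The predictor used here (the average trust the target receives from other nodes) is only a stand-in, not MoleTrust, and the abs_error and coverage functions are assumed definitions (mean absolute error and fraction of predictable edges) that may differ from the PredGraph methods of the same name.

```python
def predict_average_incoming(edges, source, target):
    """Stand-in trust metric: mean trust that other nodes give to target.
    Returns None when there is no information (the edge is not covered)."""
    values = [w for (s, t), w in edges.items() if t == target and s != source]
    if not values:
        return None
    return sum(values) / len(values)

def leave_one_out(edges, predict):
    """edges: dict mapping (source, target) -> trust.
    Returns {(source, target): (original, predicted_or_None)}."""
    results = {}
    for edge, original in edges.items():
        # Hide the edge under evaluation, then predict it from the rest.
        remaining = {e: w for e, w in edges.items() if e != edge}
        results[edge] = (original, predict(remaining, edge[0], edge[1]))
    return results

def abs_error(results):
    """Mean absolute error over the edges that could be predicted."""
    errors = [abs(o - p) for o, p in results.values() if p is not None]
    return sum(errors) / len(errors) if errors else None

def coverage(results):
    """Fraction of edges for which a prediction was possible."""
    covered = sum(1 for _o, p in results.values() if p is not None)
    return covered / len(results) if results else 0.0

edges = {
    ('a', 'c'): 1.0,
    ('b', 'c'): 0.5,
    ('a', 'b'): 1.0,
}
results = leave_one_out(edges, predict_average_incoming)
```

Here the edge ('a', 'b') cannot be predicted because nobody else rates 'b', so it lowers the coverage without contributing to the error.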
Network Evolution
The package trustlet.netevolution provides some tools to compare different snapshots of the same network in order to study its evolution.
USAGE: ./netevolution.py startdate enddate dataset_path save_path [debug file] [-s step]
- startdate and enddate are dates such as 2008-05-12.
- dataset_path is the folder in which all the datasets are stored (e.g. /home/.../datasets/AdvogatoNetwork/, because it contains all the datasets).
- save_path is the path where the .png image of the graph and the .gnuplot text file used to create it are saved.
- debug file is the file in which debug information is stored. Passing this parameter automatically enables debug mode and stores all information in this file. If the file does not exist, it will be created.
- -s step specifies the distance between the networks that are calculated and plotted. To calculate a network only every 10 days, specify -s 10.
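As an illustration of how startdate, enddate and the step interact, the following sketch lists the snapshot dates that would be visited. It is not taken from netevolution.py: the function name snapshot_dates is hypothetical, and treating enddate as inclusive and defaulting the step to one day are assumptions.

```python
from datetime import datetime, timedelta

def snapshot_dates(startdate, enddate, step=1):
    """Return the yyyy-mm-dd dates from startdate to enddate, every step days."""
    start = datetime.strptime(startdate, '%Y-%m-%d').date()
    end = datetime.strptime(enddate, '%Y-%m-%d').date()
    if step < 1:
        raise ValueError('step must be at least 1')
    dates = []
    current = start
    while current <= end:  # ASSUMPTION: enddate is inclusive
        dates.append(current.isoformat())
        current += timedelta(days=step)
    return dates

# One snapshot every 10 days, as with "-s 10".
dates = snapshot_dates('2008-05-12', '2008-06-12', step=10)
```

Parsing with strptime also rejects dates that are not in the yyyy-mm-dd format, which catches malformed arguments early.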
Some functions calculated by netevolution
- edgespernode
- shows the average number of votes for each user
- trustAverage
- shows the average trust of the network
- usersgrown
- shows the growth of the network
. . .
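A rough idea of what these three measures compute can be given on a toy network stored as a dictionary of weighted edges. This is a sketch under assumptions, not the netevolution code: the function names are hypothetical, only nodes that appear in at least one edge are counted, and the exact definitions used by the package may differ.

```python
def nodes_of(edges):
    """Collect every node that appears as the source or target of an edge."""
    nodes = set()
    for source, target in edges:
        nodes.add(source)
        nodes.add(target)
    return nodes

def edges_per_node(edges):
    """Average number of edges (votes) per user."""
    nodes = nodes_of(edges)
    return len(edges) / len(nodes) if nodes else 0.0

def trust_average(edges):
    """Average trust value over all edges of the network."""
    return sum(edges.values()) / len(edges) if edges else 0.0

def users_grown(snapshots):
    """Number of users in each snapshot, in the order given."""
    return [len(nodes_of(edges)) for edges in snapshots]

# Two snapshots of the same toy network; the second one has a new user 'd'.
first = {('a', 'b'): 1.0, ('b', 'c'): 0.5, ('c', 'a'): 0.0}
second = dict(first)
second[('d', 'a')] = 1.0

growth = users_grown([first, second])
```

Computing each measure once per snapshot date and plotting the values over time gives the evolution curves.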
wikixml2graph
This script generates .c2 files representing the Wikipedia-Network instance (plus lists of special users).
USAGE:
wikixml2graph xml_file [base_path] [--current] [--distrust] [--threshold value|-t value] [--no-lists]
Default base_path = home-dir/shared_datasets
If --current isn't set, the history xml is used
If xml_file is no-graph, only the lists of users are inserted into the .c2 file
distrust: force distrust graph creation (input file must be pages-meta-history)
threshold: remove an edge if its weight is less than value
no-lists: don't download the lists of users from wikipedia.org
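The effect of the threshold option can be illustrated on a plain dictionary of weighted edges. This is a sketch of the filtering rule stated above, not the script's own code; the function name apply_threshold and the integer weights are assumptions.

```python
def apply_threshold(edges, threshold):
    """Keep only the edges whose weight is not less than threshold."""
    return {edge: weight for edge, weight in edges.items() if weight >= threshold}

edges = {
    ('alice', 'bob'): 5,
    ('bob', 'carol'): 1,
    ('carol', 'alice'): 3,
}
filtered = apply_threshold(edges, 3)
```

An edge whose weight equals the threshold is kept, since only weights strictly less than the value are removed.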
Download the compressed dataset from http://download.wikipedia.org/backup-index.html. We need pages-meta-current.xml.bz2 (or .7z if p7zip is installed) and stub-meta-history.xml.gz.
$ wikixml2graph /mnt/data/datasetwiki/itwiki-20080626-stub-meta-history.xml --history
$ wikixml2graph /mnt/data/datasetwiki/itwiki-20080626-pages-meta-current.xml --current

