Wikipedia social network

See the paper Social Networks of Wikipedia.

What we want to collect and study

 * network of Wikipedia users
 * internal messages: when user A edits the "discussion" page of user B, user A is in fact sending a message to user B, so we can create the network of who speaks with whom. Possible services on it: using a trust metric, it is possible to suggest to user X some unknown users she might want to contact, for example for editing a wiki page together or for maintaining and watching a Wikipedia portal page. It is also possible to test if and how memes (for example new words or the use of a certain category, which are easy to detect automatically) spread on the Wikipedia social network. It is also possible to compute some sort of global trust (reputation) based on the aggregated trust network. It is even possible to show a user only (or mainly) the text inserted by trustworthy users and to hide the other text (maybe only on pages under an edit war). In this way we would create a personalized Wikipedia ("daily me"?) in which each page (for example, the page on Palestine) might look different for different users (boundaries of relativism?!?).
 * coediting: there is a trust edge (undirected but weighted) when A and B coedit pages; the edge could even be directed, with a different weight in each direction (A edits pages with B (weight=0.3), B edits pages with A (weight=0.5)). Possible service on it (based on a trust metric): suggest to A to collaborate with some trustworthy users unknown to A.
 * edit war: there is an edge between A and B when A and B had an edit war over a page. This would be a distrust network: very interesting but challenging!
 * network of Wikipedia articles (see how to find shortest path between 2 articles, also here with code)
 * network of Wikipedia categories
 * bipartite network of which Wikipedia users edit which Wikipedia articles: very interesting to find clusters, compute user similarities, compute article similarities (articles edited by the same users can be "similar" even if they are not linked), ...
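A minimal Python sketch of the coediting and bipartite ideas above, assuming the dumps have already been reduced to (user, article) edit pairs. The directed coediting weight used here, w(A→B) = |articles(A) ∩ articles(B)| / |articles(A)|, is a hypothetical rule: it is one way to obtain asymmetric weights like the 0.3 and 0.5 above (A edited 10 articles, B edited 6, and they share 3). Article similarity is plain Jaccard similarity of the editor sets. All function names are illustrative.

```python
from collections import defaultdict
from itertools import permutations


def build_bipartite(edits):
    """Index (user, article) edit pairs in both directions of the bipartite network."""
    pages_of = defaultdict(set)    # user -> articles they edited
    editors_of = defaultdict(set)  # article -> users who edited it
    for user, article in edits:
        pages_of[user].add(article)
        editors_of[article].add(user)
    return pages_of, editors_of


def coediting_weights(pages_of):
    """Directed weight A->B = share of A's articles that B also edited (hypothetical rule)."""
    weights = {}
    for a, b in permutations(pages_of, 2):
        shared = len(pages_of[a] & pages_of[b])
        if shared:
            weights[(a, b)] = shared / len(pages_of[a])
    return weights


def jaccard(s, t):
    """Jaccard similarity of two sets; 0.0 if both are empty."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def article_similarity(editors_of, x, y):
    """Articles edited by the same users score high even if they do not link to each other."""
    return jaccard(editors_of.get(x, set()), editors_of.get(y, set()))


if __name__ == "__main__":
    # A edited p0..p9 (10 articles), B edited p7..p12 (6 articles); they share p7, p8, p9.
    edits = [("A", f"p{i}") for i in range(10)] + [("B", f"p{i}") for i in range(7, 13)]
    pages_of, editors_of = build_bipartite(edits)
    w = coediting_weights(pages_of)
    print(w[("A", "B")], w[("B", "A")])  # 0.3 0.5
```

Looping over all user pairs is quadratic, which is fine for a small wiki like the Friulian one; for larger dumps it would be cheaper to loop over each article's editor set, so that only pairs who actually share an article are touched.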

How to collect Wikipedia networks
How to download data from Wikipedia?

Possible solution
At http://meta.wikimedia.org/wiki/Data_dumps#What.27s_available.3F the available dumps include:

 * pages-articles.xml
   o Contains current version of all article pages, templates, and other pages
   o Excludes discussion pages ('Talk:') and user "home" pages ('User:')
   o Recommended for republishing of content.
 * pages-meta-current.xml
   o Contains current version of all pages, including discussion and user "home" pages.
 * pages-meta-history.xml
   o Contains complete text of every revision of every page (can be very large!)
   o Recommended for research and archives.

pages-meta-history.xml is what we need! I made a test with the Wikipedia in Furlan (Friulian): I downloaded the file http://download.wikimedia.org/furwiki/20080519/furwiki-20080519-pages-meta-history.xml.bz2 (6.2 MB). I found the list of all dumps at http://download.wikimedia.org/backup-index.html.

I looked for "talks" to user Tocaibon (see http://fur.wikipedia.org/wiki/Discussion_utent:Tocaibon). By searching the text file for "propite une buine" (a phrase contained in that page) I found many revisions of this page! So the info is in there! Good! Relevant piece of information (the XML tags are stripped here; the fields appear to be page title, page id, revision id, timestamp, contributor username and id, then the revision text):

Discussion utent:Tocaibon 2586 5902 2006-05-24T18:26:17Z Klenje 1 == Nons gjeografics: cemût scriviu? == Mandi, e je propite une buine idee, si scugne cjatâ un standard par regjons, flums, citâts e vie indevant (par dì ancje Liste di Stâts dal mont e je dome une propueste). In chest fin setemane o provi a creâ une pagjine su Vichipedie:Toponims par furlan
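The record above comes from the MediaWiki XML export, so the "who writes on whose talk page" network can be extracted by streaming the compressed dump. A minimal sketch using only the Python standard library, assuming user talk pages are recognized by their title prefix ("Discussion utent:" in the Friulian Wikipedia, "User talk:" in the English one) and that one edge is emitted per revision. Skipping anonymous edits and replies on one's own talk page are choices of this sketch, not requirements.

```python
import bz2
import sys
import xml.etree.ElementTree as ET

# Title prefix of user talk pages in the Friulian Wikipedia (other languages use other prefixes).
TALK_PREFIX = "Discussion utent:"


def local(tag):
    """Drop the XML namespace: '{http://www.mediawiki.org/xml/export-0.3/}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def talk_edges(stream, prefix=TALK_PREFIX):
    """Yield (editor, talk_page_owner, timestamp) for each revision of a user talk page."""
    title = None
    for _, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "revision":
            if title and title.startswith(prefix):
                owner = title[len(prefix):].split("/")[0]  # fold subpages into the owner
                editor = timestamp = None
                for child in elem.iter():
                    cname = local(child.tag)
                    if cname == "username":
                        editor = child.text
                    elif cname == "timestamp":
                        timestamp = child.text
                # Anonymous edits carry <ip> instead of <username>; replies on one's own page are skipped.
                if editor and editor != owner:
                    yield editor, owner, timestamp
            elem.clear()  # drop the revision text, the bulk of a history dump
        elif name == "page":
            title = None
            elem.clear()


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python talk_edges.py furwiki-20080519-pages-meta-history.xml.bz2
    with bz2.open(sys.argv[1], "rb") as dump:
        for editor, owner, timestamp in talk_edges(dump):
            print(f"{editor}\t{owner}\t{timestamp}")
```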

Check WikiXRay (http://meta.wikimedia.org/wiki/WikiXRay), a Python tool for automatically processing Wikipedia's XML dumps for research purposes. It also includes a more complete parser to extract metadata for all revisions and pages in a Wikipedia XML dump, compressed with 7zip (or any other version). See the WikiXRay page on Meta for more info.

Papers about Wikipedia
TODO: add comments after reading them!

For an incomplete list of academic conference presentations, peer-reviewed papers and other types of academic writing which focus on Wikipedia as their subject, see http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_in_academic_studies and http://en.wikipedia.org/wiki/Academic_Research_on_Wikipedia (see the part about Growth of academic interest in Wikipedia: number of publications by year).

Wikipedias: Collaborative web-based encyclopedias as complex networks
Link http://scitation.aip.org/getabs/servlet/GetabsServlet?prog=normal&id=PLEEE8000074000001016115000001&idtype=cvips&gifs=yes

V. Zlatic, M. Bozicevic, H. Stefancic, and M. Domazet

Wikipedia is a popular web-based encyclopedia edited freely and collaboratively by its users. In this paper we present an analysis of Wikipedias in several languages as complex networks. The hyperlinks pointing from one Wikipedia article to another are treated as directed links while the articles represent the nodes of the network. We show that many network characteristics are common to different language versions of Wikipedia, such as their degree distributions, growth, topology, reciprocity, clustering, assortativity, path lengths, and triad significance profiles. These regularities, found in the ensemble of Wikipedias in different languages and of different sizes, point to the existence of a unique growth process. We also compare Wikipedias to other previously studied networks. (reciprocity between Wikipedia pages!)
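Reciprocity in a directed link network can be sketched as the fraction of links i→j for which j→i also exists. The normalized variant below, ρ = (r − a)/(1 − a) with a the link density (Garlaschelli and Loffredo), compares this against a random graph of the same density. Which definition the paper actually uses is not stated in these notes, so treat both functions as illustrative.

```python
def reciprocity(edges):
    """Fraction of directed links (i, j) whose reverse link (j, i) also exists.

    Self-loops and duplicate links are ignored.
    """
    links = {(i, j) for i, j in edges if i != j}
    if not links:
        return 0.0
    mutual = sum(1 for i, j in links if (j, i) in links)
    return mutual / len(links)


def normalized_reciprocity(edges):
    """Garlaschelli-Loffredo rho = (r - a) / (1 - a), where a is the link density.

    rho > 0 means more reciprocity than in a random graph with the same density.
    Only nodes that appear in at least one link are counted.
    """
    links = {(i, j) for i, j in edges if i != j}
    n = len({node for link in links for node in link})
    if n < 2:
        return 0.0
    density = len(links) / (n * (n - 1))
    if density == 1:
        return 1.0
    return (reciprocity(links) - density) / (1 - density)


if __name__ == "__main__":
    # Toy article graph: Rome <-> Italy is reciprocated, Rome -> Tiber is not.
    links = [("Rome", "Italy"), ("Italy", "Rome"), ("Rome", "Tiber")]
    print(reciprocity(links), normalized_reciprocity(links))
```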

Measuring wiki viability. An empirical assessment of the social dynamics of a large sample of wikis.
http://nitens.org/docs/wikidyn.pdf

Roth, C., Taraborelli, D., and Gilbert, N. (2008)

Proceedings of the 4th International Symposium on Wikis - WikiSym 2008, Porto, September 8-10, 2008.

NOT ABOUT SOCIAL NETWORKS but about various stats on user population and content of different wikis.

This paper assesses the content- and population-dynamics of a large sample of wikis, over a timespan of several months, in order to identify basic features that may predict or induce different types of fate. We analyze and discuss, in particular, the correlation of various macroscopic indicators, structural features and governance policies with wiki growth patterns. While recent analyses of wiki dynamics have mostly focused on popular projects such as Wikipedia, we suggest research directions towards a more general theory of the dynamics of such communities.

Network Analysis of Collaboration Structure in Wikipedia
In this paper we give models and algorithms to describe and analyze the collaboration among authors of Wikipedia from a network analytical perspective. The edit network encodes who interacts how with whom when editing an article; it significantly extends previous network models that code author communities in Wikipedia. Several characteristics summarizing some aspects of the organization process and allowing the analyst to identify certain types of authors can be obtained from the edit network. Moreover, we propose several indicators characterizing the global network structure and methods to visualize edit networks. It is shown that the structural network indicators are correlated with quality labels of the associated Wikipedia articles.

COMMENT: 2 users are connected when they edit the same page. In the future work section they mention directly analyzing who edits whose talk pages.

Ph.D. thesis "Wikipedia: A Quantitative Analysis"
http://libresoft.es/Members/jfelipe/phd-thesis

This doctoral thesis offers a quantitative analysis of the top ten language editions of Wikipedia, from different perspectives. The main goal has been to trace the evolution in time of key descriptive and organizational parameters of Wikipedia and its community of authors. The analysis is focused on logged authors (those editors who created a personal account to participate in the project). The comparative study encompasses general evolution parameters, a detailed analysis of the inner social structure and stratification of the Wikipedia community of logged authors, a study of the inequality level of contributions (among authors and articles), a demographic study of the Wikipedia community and some basic metrics to analyze the quality of Wikipedia articles and the trustworthiness level of individual authors. This work concludes with the study of the influence of the main findings presented in this thesis for the future sustainability of Wikipedia in the following years.

As far as we know, this is the first research work implementing a comparative analysis, from a quantitative point of view, of the top ten language editions of Wikipedia, presenting complementary results from different research perspectives. Therefore, we expect that this contribution will help the scientific community to enhance their understanding of the rich, complex and fascinating working mechanisms and behavioral patterns of the Wikipedia project and its community of authors. Likewise, we hope that WikiXRay will facilitate the hard task of developing empirical analyses on any language version of the encyclopaedia, boosting in this way the number of comparative studies like this one in many other scientific disciplines.

WikiXRay code is Python and open source! http://meta.wikimedia.org/wiki/WikiXRay

Very interesting: the "Analysis of inequalities"! From the thesis: "The analysis of the inequalities found in terms of the effort spent by every author in each of the Wikipedia communities is a central point of this thesis. I have studied the distribution of inequalities by means of the Lorenz curve and the Gini coefficient. The ineq package in GNU R provides extensive support for these and other statistical tools to measure inequality distribution."
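The thesis used the ineq package in R; the same two measures can be sketched in plain Python from a list of per-author edit counts (the data below is hypothetical). The Gini coefficient uses the standard formula for values sorted in ascending order, G = 2·Σ i·x_i / (n·Σ x_i) − (n + 1)/n.

```python
def lorenz_curve(values):
    """Lorenz curve points (share of authors, share of edits), authors sorted by edit count."""
    xs = sorted(values)
    total = sum(xs)
    if not xs or total == 0:
        raise ValueError("need at least one author with a positive edit count")
    points, running = [(0.0, 0.0)], 0
    for k, x in enumerate(xs, start=1):
        running += x
        points.append((k / len(xs), running / total))
    return points


def gini(values):
    """Gini coefficient of per-author edit counts.

    0 means every author made the same number of edits; the maximum, (n - 1) / n,
    means a single author made all of them.
    """
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    edits_per_author = [1, 1, 2, 3, 50]  # hypothetical edit counts
    print(round(gini(edits_per_author), 3))
    print(lorenz_curve(edits_per_author))
```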

Power of the few vs. wisdom of the crowd: Wikipedia and the rise of the bourgeoisie
http://www.viktoria.se/altchi/submissions/submission_edchi_1.pdf

A. Kittur, E. Chi, B. A. Pendleton, B. Suh, and T. Mytkowicz.

Wikipedia has been a resounding success story as a collaborative system with a low cost of online participation. However, it is an open question whether the success of Wikipedia results from a "wisdom of crowds" type of effect in which a large number of people each make a small number of edits, or whether it is driven by a core group of "elite" users who do the lion's share of the work. In this study we examined how the influence of "elite" vs. "common" users changed over time in Wikipedia. The results suggest that although Wikipedia was driven by the influence of "elite" users early on, more recently there has been a dramatic shift in workload to the "common" user. We also show the same shift in del.icio.us, a very different type of social collaborative knowledge system. We discuss how these results mirror the dynamics found in more traditional social collectives, and how they can influence the design of new collaborative knowledge systems.
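The elite vs. common shift can be measured, in a simplified way, as the share of each period's edits made by high-volume users. A minimal sketch, assuming edits are available as (user, period) pairs; the edit-count threshold that defines an "elite" user is an illustrative assumption, not the grouping used in the paper.

```python
from collections import Counter, defaultdict


def elite_share_by_period(edits, threshold):
    """Fraction of each period's edits made by 'elite' users.

    edits: iterable of (user, period) pairs, e.g. ("Alice", "2004").
    A user counts as elite if their total edit count is >= threshold;
    this cutoff is an illustrative assumption, not the paper's exact grouping.
    """
    edits = list(edits)
    totals = Counter(user for user, _ in edits)
    elite = {user for user, count in totals.items() if count >= threshold}
    counts = defaultdict(lambda: [0, 0])  # period -> [elite edits, all edits]
    for user, period in edits:
        counts[period][1] += 1
        if user in elite:
            counts[period][0] += 1
    return {period: e / total for period, (e, total) in sorted(counts.items())}
```

Here "elite" is decided from total counts over the whole history; deciding it per period instead would give a different picture.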

Investigations into Trust for Collaborative Information Repositories: A Wikipedia Case Study
http://ebiquity.umbc.edu/paper/html/id/307/Investigations-into-Trust-for-Collaborative-Information-Repositories-A-Wikipedia-Case-Study

Authors: Deborah L. McGuinness, Honglei Zeng, Paulo Pinheiro da Silva, Li Ding, Dhyanesh Narayanan, and Mayukh Bhaowal

Book Title: Proceedings of the Workshop on Models of Trust for the Web

Date: May 21, 2006

Abstract: As collaborative repositories grow in popularity and use, issues concerning the quality and trustworthiness of information grow. Some current popular repositories contain contributions from a wide variety of users, many of which will be unknown to a potential end user. Additionally the content may change rapidly and information that was previously contributed by a known user may be updated by an unknown user. End users are now faced with more challenges as they evaluate how much they may want to rely on information that was generated and updated in this manner. A trust management layer has become an important requirement for the continued growth and acceptance of collaboratively developed and maintained information resources. In this paper, we will describe our initial investigations into designing and implementing an extensible trust management layer for collaborative and/or aggregated repositories of information. We leverage our work on the Inference Web explanation infrastructure and exploit and expand the Proof Markup Language to handle a simple notion of trust. Our work is designed to support representation, computation, and visualization of trust information. We have grounded our work in the setting of Wikipedia. In this paper, we present our vision, expose motivations, relate work to date on trust representation, and present a trust computation algorithm with experimental results. We also discuss some issues encountered in our work that we found interesting.

Assigning Trust to Wikipedia Content
B.T. Adler, K. Chatterjee, L. de Alfaro, M. Faella, I. Pye, V. Raman.

In WikiSym 2008: International Symposium on Wikis.

Check the slides at http://trust.cse.ucsc.edu/Publications_and_Talks?action=AttachFile&do=get&target=wikisym08.pdf and you'll understand the idea in a few minutes.

Great presentation on "analyzing large wikis" at http://wikitrust.soe.ucsc.edu/talks-and-papers/wikisym09.pdf?attredirects=0

The code is open source (OCaml)!

http://wikitrust.soe.ucsc.edu/wikitrust-batch-mode
WikiTrust can be used in batch mode to analyze wiki information dumps, computing among other things:
 * Amount of user contribution
 * User-to-user social collaboration graphs
 * Revision-to-revision information reuse metrics
 * Episodes of collaboration and edit wars

See http://wikitrust.soe.ucsc.edu/: WikiTrust is an open-source MediaWiki extension that computes the origin and author of every word of a wiki, as well as a measure of text trust that indicates the extent to which text has been revised. To use WikiTrust, you click on a special wikitrust tab added by the extension. In the resulting view, the portions of a page that changed recently appear with an orange background; the less a change has been revised, the more intense the orange. By clicking on any word, you can determine who inserted the word, and you can examine the precise context in which the word was inserted. It is possible to install WikiTrust in such a way that the tab appears only for registered users who choose to activate the extension. We have both an informal description of our algorithms, and talks and papers that present the details. You may also wish to read the frequently asked questions.

Wikipedia networks at Stanford
http://snap.stanford.edu/data/wiki-Vote.html In order for a user to become an administrator a Request for adminship (RfA) is issued and the Wikipedia community via a public discussion or a vote decides who to promote to adminship. Using the latest complete dump of Wikipedia page edit history (from January 3 2008) we extracted all administrator elections and vote history data. This gave us 2,794 elections with 103,663 total votes and 7,066 users participating in the elections (either casting a vote or being voted on). Out of these, 1,235 elections resulted in a successful promotion, while 1,559 elections did not result in the promotion. About half of the votes in the dataset are by existing admins, while the other half comes from ordinary Wikipedia users. The network contains all the users and discussion from the inception of Wikipedia till January 2008. Nodes in the network represent Wikipedia users and a directed edge from node i to node j represents that user i voted on user j.

http://snap.stanford.edu/data/wiki-Talk.html Wikipedia is a free encyclopedia written collaboratively by volunteers around the world. Each registered user has a talk page, that she and other users can edit in order to communicate and discuss updates to various articles on Wikipedia. Using the latest complete dump of Wikipedia page edit history (from January 3 2008) we extracted all user talk page changes and created a network. The network contains all the users and discussion from the inception of Wikipedia till January 2008. Nodes in the network represent Wikipedia users and a directed edge from node i to node j represents that user i at least once edited a talk page of user j.

Source code for measuring different metrics http://snap.stanford.edu/snap/download.html#samples
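Both SNAP files are plain edge lists (comment lines starting with '#', then one 'FromNodeId<TAB>ToNodeId' pair per line), so basic statistics can also be computed without the SNAP library. A minimal Python sketch; the downloaded file name (e.g. wiki-Vote.txt.gz) is an assumption. In wiki-Vote the in-degree of a user is the number of votes received; in wiki-Talk it is the number of users who edited that user's talk page.

```python
import gzip
import sys
from collections import Counter


def read_snap_edges(lines):
    """Parse a SNAP edge list: lines starting with '#' are comments, others 'FromNodeId<TAB>ToNodeId'."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        yield int(src), int(dst)


def degree_summary(edges):
    """Return (node count, edge count, in-degree Counter, out-degree Counter)."""
    indeg, outdeg, nodes, m = Counter(), Counter(), set(), 0
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
        nodes.update((src, dst))
        m += 1
    return len(nodes), m, indeg, outdeg


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python snap_degrees.py wiki-Vote.txt.gz  (assumed name of the downloaded file)
    with gzip.open(sys.argv[1], "rt") as f:
        n, m, indeg, outdeg = degree_summary(read_snap_edges(f))
    print(n, "nodes,", m, "edges")
    print("highest in-degree:", indeg.most_common(5))
```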

Wikis as Social Networks: Evolution and Dynamics
http://www-i5.informatik.rwth-aachen.de/i5new/staff/klamma/DA/Wiki-Culture.pdf The Culture of Wikis - A Dynamic Network Analysis Approach

http://www.slideshare.net/klamma/kdd-sna08-presentation Wikis as Social Networks: Evolution and Dynamics

Wikiwatcher project, see http://www.google.it/search?q=wikiwatcher+aachen&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a

Proposal by Jimmy Wales
Proposal by Jimmy Wales: write EXPLICITLY whom you trust on a user subpage (see User:Phauly/Trust)

Syntax:

Didn't take off: 12 users!!!

http://www.gnuband.org/2007/06/26/wikipedia_trust_network

Us vs. Them: Understanding Social Dynamics in Wikipedia with Revert Graph Visualizations

Bongwon Suh, Ed H Chi, Bryan A Pendleton, Aniket Kittur

Nodes = users; edges = reverts between users, weighted by the number of reverts

When the edit at time tN is a revert, there is a revert relation between the user who edits the page at tN and the users who edited it at t1..tN-1 → the user who edits the page at tN does not trust the users who edited it at t1..tN-1.
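To build the revert graph, reverts first have to be detected in each page history. A minimal sketch using a common heuristic that is not necessarily the one used by Suh et al.: a revision is an "identity revert" if its text is identical (same SHA-1) to an earlier revision. Unlike the t1..tN-1 wording above, this sketch only links the reverter to the users whose revisions were actually undone (those between the restored revision and the revert).

```python
import hashlib
from collections import Counter


def revert_edges(revisions):
    """Build weighted revert edges for one page.

    revisions: chronological list of (user, text) pairs.
    A revision is treated as a revert when its text is identical (same SHA-1)
    to an earlier revision; the reverting user gets an edge to each other user
    whose revisions in between were undone.
    Returns a Counter mapping (reverter, reverted_user) -> number of reverts.
    """
    last_index = {}  # text hash -> index of the latest revision with that text
    edges = Counter()
    for n, (user, text) in enumerate(revisions):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        restored = last_index.get(digest)
        if restored is not None and restored < n - 1:
            undone_users = {u for u, _ in revisions[restored + 1:n]}
            for victim in undone_users - {user}:
                edges[(user, victim)] += 1
        last_index[digest] = n
    return edges
```

Edges from several pages can be merged by adding the Counters (total += revert_edges(page_history)), which gives the number of reverts between two users as the edge weight.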

Once the revert graph is built, a trust graph can be created by repeating these steps for each user U:

1. make a set of all the users that U distrusts (called Uset)

2. make a set of all the users that distrust at least one of the users in Uset (called TrustSet)

3. for each user W in TrustSet, create an edge between U and W, with weight equal to the cardinality of the intersection between the TrustSet of U and the TrustSet of W
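The three steps above, sketched in Python under the assumption that the revert graph is given as a dict mapping each user to the set of users they reverted. Excluding U from its own TrustSet is an assumption the steps leave implicit (otherwise every user would get a self-loop); the function name is illustrative.

```python
from collections import defaultdict


def trust_graph(distrusts):
    """Derive a weighted trust graph from a revert (distrust) graph.

    distrusts: dict mapping each user to the set of users they reverted.
    For each user U:
      1. Uset = distrusts[U]
      2. TrustSet(U) = users who distrust at least one user in Uset (U itself excluded)
      3. edge U-W for each W in TrustSet(U), weight = |TrustSet(U) & TrustSet(W)|
    Returns {(U, W): weight}; the relation is symmetric, so both (U, W) and (W, U) appear.
    """
    distrusted_by = defaultdict(set)  # inverted revert graph: user -> who reverted them
    for user, targets in distrusts.items():
        for target in targets:
            distrusted_by[target].add(user)

    trust_sets = {}
    for user, uset in distrusts.items():
        trust_set = set()
        for target in uset:
            trust_set |= distrusted_by[target]
        trust_set.discard(user)
        trust_sets[user] = trust_set

    return {
        (u, w): len(trust_sets[u] & trust_sets[w])
        for u, tset in trust_sets.items()
        for w in tset
    }
```

Taken literally, the steps can produce edges of weight 0 (two users who distrust the same user but whose TrustSets have nothing in common), so it may be worth deciding whether to keep them.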

GOAL: Understanding Social Dynamics in Wikipedia

Studying cooperation and conflict between authors with history flow visualizations
(2004) Viegas, Wattenberg, Dave MIT/IBM

"SuggestBot: Using Intelligent Task Routing to Help People Find Work in Wikipedia."
(2007) D. Cosley, D. Frankowski, L. Terveen, J. Riedl.

SuggestBot uses broadly applicable strategies of text analysis, collaborative filtering, and hyperlink following to recommend tasks.

Evaluation: SuggestBot's intelligent task routing increases the number of edits by roughly four times compared to suggesting random articles.

GOAL: recommend Wikipedia articles using (also) the Wikipedia social structure

GOAL: visualization

Six degrees of Wikipedia?
Cool? http://www.netsoc.tcd.ie/~mu/wiki/

See also http://en.wikipedia.org/wiki/Six_degrees_of_Wikipedia
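Finding the "degrees of separation" between two articles is a shortest-path problem on the article link graph; since links are unweighted, breadth-first search is enough. A minimal sketch, assuming the link graph has already been loaded into a dict from article title to linked titles; the titles in the test data are illustrative.

```python
from collections import deque


def shortest_path(links, start, goal):
    """Breadth-first search over article links; returns a list of titles, or None if unreachable.

    links: dict mapping a title to an iterable of titles it links to.
    """
    if start == goal:
        return [start]
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt in parent:
                continue
            parent[nxt] = page
            if nxt == goal:
                # Walk back through the parents to rebuild the path.
                path = [goal]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None
```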

Evaluating authoritative sources using social networks: an insight from Wikipedia
Nikolaos Th. Korfiatis, Marios Poulos, George Bokos

Purpose – The purpose of this paper is to present an approach to evaluating contributions in collaborative authoring environments and in particular wikis using social network measures. Design / methodology / approach – A social network model for Wikipedia has been constructed and metrics of importance such as centrality have been defined. Data have been gathered from articles belonging to the same topic using a web crawler in order to evaluate the outcome of the social network measures in the articles. Originality / Value – This work tries to develop a network approach to the evaluation of wiki contributions and approaches the problem of quality of Wikipedia content from a social network point of view. Practical Implications – We believe that the approach presented here could be used to improve the authoritativeness of content found in Wikipedia and similar sources.

Comment

1. When a contributor edits content submitted by someone else, he/she establishes a tie with that person. This is captured by an acceptance factor, which represents the percentage of the previous contributor's content that is still visible afterwards.

2. Every contributor who has a single contribution, or more, to the article establishes a relational tie with the other content contributors. Evidence of participation in common projects strengthens this tie. (Cointerest factor)
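A rough sketch of the acceptance factor from point 1, assuming it can be approximated at the word level: the share of the words a contributor added that still appear in a later revision. The paper's exact procedure may differ; this bag-of-words comparison ignores word order and may overestimate acceptance when other contributors happen to use the same words.

```python
from collections import Counter


def acceptance_factor(contributed_text, later_text):
    """Share of a contributor's words that are still present in a later revision.

    Words are compared as multisets (order ignored), so text that was moved
    rather than deleted still counts as accepted. Multiply by 100 for a percentage.
    """
    contributed = Counter(contributed_text.split())
    remaining = Counter(later_text.split())
    total = sum(contributed.values())
    if total == 0:
        raise ValueError("the contributor added no words")
    kept = sum((contributed & remaining).values())  # multiset intersection
    return kept / total
```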

They compute "Contributor Degree Centrality"

Evaluated by hand on 10 articles

Early work: contributors with less authority tend to have their content erased or objected to by contributors with more authority. A number of contributors whose submissions receive objections are therefore situated on the periphery, whereas contributors with accepted contributions (authorities) tend to be in the centre.

GOAL: use SNA indices on the Wikipedia network

Studying cooperation and conflict between authors with history flow visualizations
(2004) Viegas, Wattenberg, Dave MIT/IBM

GOAL is visualization

People at Polimi (Politecnico di Milano)
http://airwiki.elet.polimi.it/mediawiki/index.php/Techniques_to_analyze_the_Wikipedia_Social_Network

http://airwiki.elet.polimi.it/mediawiki/index.php/Wikipedia_Social_Network

http://airwiki.elet.polimi.it/mediawiki/index.php/Mining_the_Network_of_Coordination_Interactions_in_Wikipedia

"Many papers have analyzed Wikipedia's content and editing patterns, but just a few have studied the social network formed by its users."

We contacted them

Assessing the Value of Cooperation in Wikipedia
by: Dennis M. Wilkinson, Bernardo A. Huberman

Since its inception six years ago, the online encyclopedia Wikipedia has accumulated 6.40 million articles and 250 million edits, contributed in a predominantly undirected and haphazard fashion by 5.77 million unvetted volunteers. Despite the apparent lack of order, the 50 million edits by 4.8 million contributors to the 1.5 million articles in the English-language Wikipedia follow certain strong overall regularities. We show that the accretion of edits to an article is described by a simple stochastic mechanism, resulting in a heavy tail of highly visible articles with a large number of edits. We also demonstrate a crucial correlation between article quality and number of edits, which validates Wikipedia as a successful collaborative effort.

Network Analysis for Wikipedia
Bellomi, F. and Bonato, R. (2005). Network Analysis for Wikipedia. Proceedings of Wikimania.

About content and NOT social network.

Measuring Wikipedia
http://eprints.rclis.org/archive/00003610/01/MeasuringWikipedia2005.pdf

Table 1 contains some values that show the diversity of different Wikipedias. In the German Wikipedia there is a talk page for around 19% of all articles, so commenting on articles is probably more usual there. The ratio of user talk pages to user pages in the Japanese Wikipedia is remarkably high: there are more than twice as many of the former as of the latter. Presumably Japanese Wikipedians prefer notifying and discussing with each other to presenting themselves. A statistical comparison of all Wikipedias and of the change of namespace fractions over time, as well as of the number of edits in each namespace, should give more precise results.

Table 1: Some comparisons based on namespace counting

                                 German (de)   Japanese (ja)   Danish (da)   Croatian (hr)
number of articles                   188,408          93,561        22,513           5,118
talk pages by articles                 0.192           0.139          0.93            0.29
redirects by articles                  0.495           0.424         0.355           0.911
user talk pages by user pages           0.94            2.51          0.88            0.74

Wikipedias: Collaborative web-based encyclopedias as complex networks
Phys. Rev. E 74, 016115 (2006) [9 pages]

V. Zlatic, M. Bozicevic, H. Stefancic, and M. Domazet


To be read

 * http://www.citeulike.org/group/382/tag/social-network
 * http://citeulike.com/user/mattlandau/article/1094045
 * http://www.depthreporting.com/2005/12/social-network-analysis-and-wikipedia.html