Wik2dict

wik2dict is a tool written in Python that converts MediaWiki SQL dumps into the DICT format. The script is available under the GNU General Public License. It is also capable of downloading Wikipedia, Wiktionary, Wikiquote, Wikinews and Wikibooks SQL dumps. See also htdict.

It uses dictdlib, available here.

In 2004, converting the English Wikipedia took about 1 hour on a 1.9 GHz P4 processor. By 2008 the English Wikipedia had grown so large that converting it will probably take more than a day.

Usage reports, bug reports and patches are welcome. See also TODO.txt and CHANGELOG.txt for more information.

guaka finally set up a repository at http://code.google.com/p/wik2dict/ (04:43, 9 January 2008 (PST))

XML database dumps
Currently there is a beta version that supports XML database dumps. It works, but has no fancy features yet, not even support for categories.
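For readers who want a feel for what reading an XML database dump involves, here is a minimal sketch in Python. It is not taken from wik2dict; the function names `iter_pages` and `local_name` and the tiny sample dump are made up for illustration. It assumes the usual MediaWiki export layout of `<page>` elements containing a `<title>` and a `<text>`, and it streams the file with `iterparse` so that the whole dump never has to sit in memory at once.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) for each <page> element found in the dump.

    `source` is a file name or a binary file object.
    """
    title, text = "", ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = "", ""
            # Drop the children of the finished page to keep memory use low.
            elem.clear()


# Made-up two-page sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page><title>Foo</title><revision><text>Foo is a word.</text></revision></page>
  <page><title>Bar</title><revision><text>Bar is another.</text></revision></page>
</mediawiki>"""

for page_title, page_text in iter_pages(io.BytesIO(SAMPLE)):
    print(page_title, "->", page_text)
```

Matching on the local tag name keeps the sketch independent of the export schema version in the namespace. For a really big dump the emptied `<page>` elements still accumulate under the root element, so a production version would need to release those as well.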

Unicode
There is support for Unicode, but it can sometimes cause trouble. wik2dict.py contains a list of Wikipedias that still use latin-1; all others are assumed to be utf-8. Problems here can lead to either mangled characters or a crashing dict server.
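The encoding choice described above can be sketched roughly as follows. This is an illustration only, not the code in wik2dict.py: the set `LATIN1_WIKIS`, its placeholder language codes and the helper names are all hypothetical. The last lines show how picking the wrong encoding produces mangled characters.

```python
# Hypothetical stand-in for the list kept in wik2dict.py.
# "xx" and "yy" are placeholder codes, not real entries from that list.
LATIN1_WIKIS = {"xx", "yy"}


def dump_encoding(lang_code):
    """Return the encoding assumed for the dump of the given wiki."""
    return "latin-1" if lang_code in LATIN1_WIKIS else "utf-8"


def decode_article(raw, lang_code):
    """Decode raw article bytes, replacing bytes that cannot be decoded."""
    # errors="replace" substitutes U+FFFD instead of raising an exception,
    # so one broken article does not abort the whole conversion.
    return raw.decode(dump_encoding(lang_code), errors="replace")


utf8_bytes = "café".encode("utf-8")
print(decode_article(utf8_bytes, "en"))  # decoded as utf-8: café
print(decode_article(utf8_bytes, "xx"))  # decoded as latin-1: cafÃ©
```

Replacing undecodable bytes is a design choice made for this sketch; whether wik2dict does the same is not stated here.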

Logs
wik2dict creates a log file for every conversion, in which you can find possible errors in the MediaWiki articles (or in wik2dict itself). So far it reports things like unclosed links, e.g. [[non-finished link].
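A check for unclosed links of this kind could look roughly like the sketch below. It is not wik2dict's actual logging code; the function name `find_unclosed_links` is made up, and it assumes that a link never spans more than one line.

```python
def find_unclosed_links(wikitext):
    """Return (line number, snippet) for every '[[' without a matching ']]'.

    A '[[' counts as unclosed if no ']]' follows it on the same line
    before the next '[[' starts.
    """
    problems = []
    for lineno, line in enumerate(wikitext.splitlines(), start=1):
        pos = 0
        while True:
            start = line.find("[[", pos)
            if start == -1:
                break
            end = line.find("]]", start + 2)
            next_open = line.find("[[", start + 2)
            if end == -1 or (next_open != -1 and next_open < end):
                # Keep a short snippet so the log entry is easy to locate.
                problems.append((lineno, line[start:start + 30]))
                pos = start + 2
            else:
                pos = end + 2
    return problems


article = "See [[good link]] and [[non-finished link]."
for lineno, snippet in find_unclosed_links(article):
    print("line %d: unclosed link near %r" % (lineno, snippet))
```

In a real conversion the messages would go to the per-conversion log file instead of being printed.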

Download
--- None of these downloads work (at least not for me). I already tried several times last year and thought, "Well, maybe the project is not maintained anymore." But since this page has been updated recently and I am very interested in converting Wikimedia dumps to DICT format, I think you may just not be aware of it.

Daniel ---

It's currently in the TrustLet code, more specifically here.

Nope, it's not there!

0.4.1
Wijnand Post did some good work on improving the speed and reducing the memory overhead of XML parsing. 0.4.1 will be available for download here one of these days.

0.4.0BETA
http://www.industree.org/download/wik2dict0.4BETA.tar.bz2
 * Rough beta version, with support for XML database dumps
 * UPDATE: doesn't work with big database dumps (like the English Wikipedia, see also talk page)

0.3.9
wik2dict-0.3.9.tar.gz
 * several thingies
 * (I see I forgot to put 0.3.8 online)

0.3.7
wik2dict-0.3.7.tar.gz
 * another bugfix for files >100 MB
 * several other thingies

0.3.6
wik2dict-0.3.6.tar.gz, Debian package
 * fixed gzip.closed problem occurring with big files (>100 MB, which are processed twice)

0.3.5
wik2dict-0.3.5.tar.gz
 * Version bump needed for Wikimedia's new database dump; older versions can no longer download dumps.
 * Some cosmetic thingies.
 * Now working with new database dump format

Links

 * At http://dict.aioe.org you can access some converted Wikipedias and Wiktionaries