Datasets license
From TrustLet, a free, collaborative project for collecting and analyzing information about trust metrics.
We need to decide on a license for the datasets.
The Databases and Creative Commons FAQ at ScienceCommons contains very useful information about this topic. The following excerpts come from this FAQ, which is released under a Creative Commons Attribution license. The entire FAQ is worth a careful read.
- In short, databases usually are comprised of at least four elements: (1) a set of field names identifying the data; (2) a structure (or model), which includes the organization of fields and relations among them; (3) data sheets; and (4) data.
- In most databases, items (2) and (3), the structure and the data sheets, will reflect sufficient creativity for copyright to apply. The rest of the information (the field names and the data themselves) generally does not. Facts are, in general, free (as in freedom), so database users should be able to use the factual information contained in a database without restriction.
- However, it is possible to impose additional conditions on the use of the factual data contained in a database through a "Terms of Use".
Why should we bother at all? There are several possible reasons. One goal could be to increase the visibility of this wiki, so that more people find it and contribute, both with more datasets and with more evaluations and contributions to the wiki. Another goal is to give credit to the people who collected the dataset, an activity that is often time-consuming and expensive. One possible way of doing so is to add a "Terms of Use" saying something like:
You are free to use this dataset as you wish. However, in order to give credit to the effort put into collecting it and making it available, we kindly ask you to reproduce this text in full if you redistribute the dataset. Moreover, we kindly ask you to include the following text in any document reporting experiments in which this dataset was used.
The <<short name of the dataset>> dataset was collected by <<collector>>. This dataset was made available by <<name of this project>> and can be found at <<short URL for this dataset>>.
In any case, we could put a Science Commons logo on the dataset page, since our effort is fully in line with the goals of Science Commons.
An additional point is worth further discussion. If we build a dataset by collecting information published on a web site, how can we release it? Opinions differ, ranging from "this data is entirely copyrighted by the web site, so you cannot redistribute it" to "researchers are free to collect information, just as they are free to, for instance, collect information by observing people's behaviour in public spaces". Either way, we need to decide on a policy. The best option is probably to ask the web site owners for permission to release the data in anonymized form.
What do you think?

