Share your Knowledge is a project by Lettera 27 that supports the use of Creative Commons licences by NGOs and other organizations. It helps organizations open up their content and use it to create and expand encyclopaedic entries.
Lettera 27 asked us to join the project's pool of evaluators to monitor the project's influence on the online presence of each organization. We use two sets of data. The first comes from each organization's website analytics and shows how people reach the site: where they come from, which keywords they use, and which sites refer them. We complement this data with automated scrapers that periodically check the relations between the organizations involved and the rest of the web.

Together with the project coordinator, Cristina Perillo, we defined a series of general questions: How much debate is there about Creative Commons and the organizations? How many Creative Commons works are related to each organization? How much Creative Commons content is produced by each organization?
To answer these kinds of questions, it is important to set up a robust method to collect and analyse the data. The difficult part is finding a way to analyse the content of each web page and decide whether it is relevant to our analysis. This task, quite simple for a human, becomes very difficult for a computer, which requires a series of standardized instructions. Since we have no access to reliable semantic analysis software for the Italian language, we decided to put the questions to Google and analyse the results. This method may seem less accurate, and in some ways it is, but it lets us analyse a large number of pages through a robust process, minimizing the bias of the research method, and it also permits
Starting from the Google search engine, we defined four different queries, each one addressing a specific task.

Aggregating the results by their root domain, we start collecting quantitative data; joining them to each organization, we obtain a network that is useful for identifying the role of each organization. This process is automated with a Processing script that runs each month. You can find the process description below.
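To make the aggregation step more concrete, here is a minimal sketch in Python (not the Processing script we actually run). It assumes the search results are already available as (organization, URL) pairs. The function names, the sample data, and the "last two labels" rule for the root domain are all assumptions made for illustration; that rule is deliberately naive and would mishandle suffixes such as .co.uk.

```python
from collections import Counter
from urllib.parse import urlparse


def root_domain(url):
    """Return a naive root domain: the last two labels of the host name."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def build_network(results):
    """Count search results per (organization, root domain) pair.

    `results` is an iterable of (organization, url) tuples. Each counted
    pair can be read as a weighted edge between an organization and a domain.
    """
    edges = Counter()
    for organization, url in results:
        domain = root_domain(url)
        if domain:  # skip URLs without a recognizable host
            edges[(organization, domain)] += 1
    return edges


# Hypothetical sample data, not taken from the project.
sample_results = [
    ("Org A", "http://www.example.org/page1"),
    ("Org A", "http://blog.example.org/post"),
    ("Org B", "https://example.org/about"),
    ("Org B", "https://news.example.com/item"),
]
network = build_network(sample_results)
```

In this sketch both organizations end up linked to the same domain, example.org, which is the kind of shared node that lets the network show how organizations relate to each other.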

The last part is the analysis of the impact on Wikipedia. Our idea is to follow, month by month, the pages created within the project, the number of edits, and the size of each edit.
Talking with the coordinator of this part of the project at Lettera 27, we found a way to automatically collect all pages created or modified as part of the project. Starting from this list, we use Wikipedia's history function to collect data about all the edits to a page, and with a list of users we can recognize which edits were made by project participants. In this case too, we used Processing to automate the process.
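The edit analysis can be sketched along these lines, again in Python rather than Processing. The sketch assumes the page history has already been downloaded as a list of records holding the user, the timestamp, and the page size after the edit; one possible source is the MediaWiki API (`prop=revisions` with `rvprop=user|timestamp|size`), but that choice, the function names, and the sample data are assumptions for illustration. It also assumes the history is complete from the creation of the page, otherwise the first size change is overstated.

```python
from collections import Counter


def summarize_edits(revisions, participants):
    """Compute the size change of each edit and flag participant edits.

    `revisions` is a list of dicts with "user", "timestamp" (ISO 8601,
    which sorts correctly as a string) and "size" (bytes after the edit).
    """
    ordered = sorted(revisions, key=lambda rev: rev["timestamp"])
    summary = []
    previous_size = 0  # a newly created page starts from zero bytes
    for rev in ordered:
        summary.append({
            "user": rev["user"],
            "timestamp": rev["timestamp"],
            "delta": rev["size"] - previous_size,
            "by_participant": rev["user"] in participants,
        })
        previous_size = rev["size"]
    return summary


def participant_edits_per_month(summary):
    """Count participant edits per month, keyed by "YYYY-MM"."""
    return Counter(
        edit["timestamp"][:7] for edit in summary if edit["by_participant"]
    )


# Hypothetical sample data, not taken from the project.
revisions = [
    {"user": "Alice", "timestamp": "2011-03-02T10:00:00Z", "size": 1200},
    {"user": "Bob", "timestamp": "2011-02-15T09:30:00Z", "size": 800},
    {"user": "Alice", "timestamp": "2011-03-20T18:45:00Z", "size": 1100},
]
participants = {"Alice"}

summary = summarize_edits(revisions, participants)
monthly = participant_edits_per_month(summary)
```

A negative size change, as in the last sample edit, simply means that the edit removed more text than it added.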
All results will be available on our Issuu page.