Blog

posted by Martina Elisa Cecchi
Wednesday, July 30th, 2014

Who visualizes Wikipedia?

Overview of the academic studies applying data visualization in Wikipedia analysis

Hi! I’m Martina, a student who, after attending the Integrated Course Final Synthesis Studio 2012/13 and realizing the project “Unborn Discussion”, developed her Master’s degree thesis in the DensityDesign Lab. The aim of my research was to analyze the “Family Planning” article of the English version of Wikipedia as a case study to understand how controversies develop on the web.

Thanks to its increasing popularity and relevance as a new form of knowledge generated by its users, but also to its fully accessible database, Wikipedia has received much attention in recent years from the scientific and academic community. The starting point of my research was thus the collection and categorization of all the studies that addressed Wikipedia data analysis and visualization. Some researchers investigated page growth rates, others the motivation and quality of users’ contributions, and others the ways in which users collaborate to build article contents. A group of the considered studies also focused on conflict and controversy detection in Wikipedia rather than on consensus and agreement.

Nevertheless, among the 37 considered studies, only 9 used data visualization as an integrated method to investigate and convey knowledge. The scheme above summarizes these studies grouped by the visualization method they used, highlighting what they call their methods, which part of the Wikipedia page was analyzed, and the aims of each study. Moreover, studies that focused on the same part of a Wikipedia article are linked together with the same type of line. Do you think I could add other interesting studies that used data visualization in Wikipedia analysis?

posted by Giorgio Uboldi
Monday, July 7th, 2014

The making of the “Seven days of carsharing” project

This post was originally posted on Visual Loop

As a research lab, we are always committed to exploring new research fields, new data sources and new ways to analyze and visualize complex social, organizational and urban phenomena.
Sometimes these self-initiated explorations become small side projects that we develop in our spare time and publish after a while. This is what happened in the past with RAW and other side projects.

In this kind of work we think it is important to keep things simple and proceed step by step, in order to have a flexible, iterative process and the chance to experiment with new tools and visual models that we can reuse later.
With this in mind, in the last few months we decided to work on a small project we called “Seven days of carsharing”. The rise and growth of many car sharing services around the world has been an important factor in changing the way people move inside the city.
Visualizing and analyzing data from these services and other forms of urban mobility allows for a better understanding of how the city is used and helps to discover the most prominent mobility patterns.
The website is structured around a series of visual explorations of data collected over the time span of one week from one of the main car sharing services in Milan called Enjoy.
We started this project as a small experiment to investigate through different techniques how urban traffic patterns evolve day by day and the main characteristics of use of the service.

Data Scraping

Since Enjoy doesn’t provide any information about the routes, but just the position of the available cars, one of the biggest technical challenges was to find a way to explore the individual routes.
Inspired by this interesting project by Mappable, we decided to process the data using an open routing service (Open Source Routing Machine) to estimate route geometries for each rent. The data was then translated into a GeoJSON file that we used for two visualizations.
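As an illustration of this step, here is a minimal JavaScript sketch (run with a recent Node.js against a local OSRM instance exposing the current /route/v1 HTTP API). The rent fields and the endpoint are assumptions for the example; the OSRM API we actually queried back in 2014 differed slightly.

```javascript
// Sketch: estimate a route geometry for one rent with a local OSRM server
// (assumes a recent OSRM instance exposing the /route/v1 HTTP API).
async function routeForRent(rent) {
  const coords = `${rent.start.lon},${rent.start.lat};${rent.end.lon},${rent.end.lat}`;
  const url = `http://localhost:5000/route/v1/driving/${coords}?overview=full&geometries=geojson`;
  const res = await fetch(url);
  const json = await res.json();
  return {
    type: 'Feature',
    geometry: json.routes[0].geometry,          // GeoJSON LineString
    properties: { id: rent.id, start: rent.startTime, end: rent.endTime }
  };
}

// Collect all rents into a single FeatureCollection to feed the visualizations.
async function buildGeojson(rents) {
  const features = [];
  for (const rent of rents) features.push(await routeForRent(rent));
  return { type: 'FeatureCollection', features };
}
```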

Visualizations

From the beginning the idea was to combine different kinds of visualizations, both geographical and non-geographical, in order to explore the different dimensions of the phenomenon; the biggest challenge was to combine them into a linear story able to convey some insights about mobility in Milan.
For this reason we decided to build a single-page website divided into five sections. Each one enables the user to explore one dimension of the phenomenon and offers some interesting insights.
The first visualization, created with d3.js, is the entry point to the topic and represents an overview of the total number of rents. Every step is a car and every rent is represented by a line that connects the pick-up moment and the drop-off. Consequently, the length of the line represents how long a single car has been rented. In this way it’s possible to discover when the service is most used and how the patterns evolve depending on the day of the week and the hour.
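For readers curious about how such a chart is put together, here is a minimal d3 sketch using the d3 v3 API that was current at the time. The container, scales and data fields (`svg`, `rents`, `carIds`, etc.) are illustrative placeholders, not the project’s actual code.

```javascript
// Sketch: one row per car, one horizontal segment per rent (d3 v3 API).
var x = d3.time.scale().domain([weekStart, weekEnd]).range([0, width]);
var y = d3.scale.ordinal().domain(carIds).rangePoints([0, height]);

svg.selectAll('line.rent')
    .data(rents)                     // assumed shape: [{car, pickup, dropoff}, ...]
  .enter().append('line')
    .attr('class', 'rent')
    .attr('x1', function(d) { return x(d.pickup); })
    .attr('x2', function(d) { return x(d.dropoff); })   // segment length = rent duration
    .attr('y1', function(d) { return y(d.car); })
    .attr('y2', function(d) { return y(d.car); });
```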

The second section of the website is focused on the visualization of the routes we extracted using the routing service. The route data was then visualized with Processing, using the Unfolding library to create a video, and with Tilemill and Mapbox.js to create the interactive map and the map tiles. Each rent has a start and an end time, and could hence be displayed in its own timeframe. In addition, the position of the car along the path was computed by interpolating its coordinates along the route with respect to the total duration and length of the rent.
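The interpolation itself is straightforward. Below is a minimal JavaScript version of the idea (the project itself used Processing), assuming constant speed along the route and planar coordinates, which is good enough for a visualization at city scale.

```javascript
// Sketch: place a car along its route at time t, assuming constant speed.
// `coords` is the [lon, lat] array of the route, `start`/`end` the rent times.
function positionAt(coords, start, end, t) {
  var fraction = (t - start) / (end - start);          // share of the rent elapsed
  if (fraction <= 0) return coords[0];
  if (fraction >= 1) return coords[coords.length - 1];

  // cumulative length of each segment (planar approximation)
  var lengths = [0];
  for (var i = 1; i < coords.length; i++) {
    var dx = coords[i][0] - coords[i - 1][0];
    var dy = coords[i][1] - coords[i - 1][1];
    lengths.push(lengths[i - 1] + Math.sqrt(dx * dx + dy * dy));
  }
  var target = fraction * lengths[lengths.length - 1];

  // find the segment containing the target distance and interpolate inside it
  for (var j = 1; j < lengths.length; j++) {
    if (lengths[j] >= target) {
      var seg = lengths[j] - lengths[j - 1] || 1;
      var s = (target - lengths[j - 1]) / seg;
      return [
        coords[j - 1][0] + s * (coords[j][0] - coords[j - 1][0]),
        coords[j - 1][1] + s * (coords[j][1] - coords[j - 1][1])
      ];
    }
  }
  return coords[coords.length - 1];
}
```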

The resulting routes represent the most likely way to go from the start point to the end point in the city. Obviously, the main streets (especially the rings of the city) are the most visible. It should be noted that this is also partly an effect of the service we used to compute the routes, which tends to privilege the shortest path instead of the quickest one and doesn’t take into account other factors like traffic and rush hours.

In the last sections of the website we decided to focus on the availability of cars and their position during the week to understand which areas of the city are more active and when.

In the first visualization we used a Voronoi diagram built in D3.js to visualize both the position of the cars, represented by yellow dots, and the area “covered” by each car. The area surrounding each car in fact contains all the points on the map closer to that car than to any other.
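For reference, this is roughly how such a diagram is built with the d3 v3 geometry helpers; the projection, container and styling are simplified assumptions rather than the published code.

```javascript
// Sketch: Voronoi cells around projected car positions (d3 v3 API).
var points = cars.map(function(d) { return projection([d.lon, d.lat]); });

var voronoi = d3.geom.voronoi().clipExtent([[0, 0], [width, height]]);

svg.selectAll('path.cell')
    .data(voronoi(points))           // one polygon per car
  .enter().append('path')
    .attr('class', 'cell')
    .attr('d', function(d) { return 'M' + d.join('L') + 'Z'; });

svg.selectAll('circle.car')
    .data(points)
  .enter().append('circle')
    .attr('class', 'car')
    .attr('cx', function(d) { return d[0]; })
    .attr('cy', function(d) { return d[1]; })
    .attr('r', 2);
```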

To better understand the patterns we decided to plot, alongside the maps, the number of available cars for each of the 88 neighbourhoods of Milan using a streamgraph. The streams show the number of cars available every hour for each neighbourhood, sorted by the total number of cars available during the whole week.
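A minimal d3 v3 sketch of such a streamgraph, with an assumed per-neighbourhood data shape and pre-built `x`/`y` scales, could look like this; it is an illustration of the technique, not the production code.

```javascript
// Sketch: streamgraph of available cars per neighbourhood (d3 v3 API).
// `series` is assumed to be [{name, values: [{hour, count}, ...]}, ...].
var stack = d3.layout.stack()
    .offset('wiggle')                               // the "stream" look
    .values(function(d) { return d.values; })
    .x(function(d) { return d.hour; })
    .y(function(d) { return d.count; });

var layers = stack(series);                         // adds y0/y to each value

var area = d3.svg.area()
    .x(function(d) { return x(d.hour); })
    .y0(function(d) { return y(d.y0); })
    .y1(function(d) { return y(d.y0 + d.y); });

svg.selectAll('path.layer')
    .data(layers)
  .enter().append('path')
    .attr('class', 'layer')
    .attr('d', function(d) { return area(d.values); });
```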

General Remarks

We really enjoyed working on this project for many reasons.
First of all, working on small and self-initiated side projects like this gives us the opportunity to experiment with a wide range of tools, in particular open mapping tools like Tilemill and Unfolding. In this way we could better understand the limits and advantages of different technologies and visual models.

Another important and positive aspect of this project was the possibility to involve different people from our lab, from software engineers to interns, experimenting with flexible and agile workflows that allowed us to test multiple options for each visualization and that can be reused in other projects in the future.

posted by Giorgio Uboldi
Friday, July 4th, 2014

A preliminary map of the creative professionals at work for EXPO 2015

Over 90% of the world’s population will be represented in Milan in 2015 for the Universal Exhibition. But how many creative professionals all over the world are working with Expo 2015 to give form and content to this event? The answer is many and the number is growing day by day.
The map we created for the exhibition Innesti/Grafting, curated by Cino Zucchi and with the visual identity by StudioFM, is intended to provide an initial, certainly not exhaustive, response to this question.
The visualization, which you can find printed on a 6 by 3 metre panel inside the section of the Italian Pavilion dedicated to Milan and EXPO2015, represents all the architectural and design projects that will be realized for EXPO2015 and all the creative professionals and countries involved, weighted by their importance and involvement in the project.

The visualization, based on data provided by EXPO2015 S.p.A., was created with NodeBox 3, an open-source tool ideal for rapid data visualization and generative design.

Check out the high resolution version of the visualization here.
Photos courtesy of Roberto Galasso.

posted by Giorgio Uboldi
Tuesday, July 1st, 2014

Seven days of carsharing: exploring and visualizing the Enjoy carsharing service in Milan

As a research lab, we are always committed to exploring new research fields and new data sources.
Sometimes these self-initiated explorations become small side projects that we develop in our spare time and publish after a while. This is what happened in the past with RAW and other side projects. In this kind of work we think it is important to keep things simple and proceed step by step, in order to have a flexible, iterative process and the chance to experiment with new tools and visual models that we can reuse later.

The rise and growth of many car sharing services has been an important factor in changing the way people move inside the city.
Visualizing and analyzing data from carsharing services and other forms of urban mobility allows for a better understanding of how the city is used and helps to discover the most prominent mobility patterns.
The website is structured around a series of visual explorations of data collected over the time span of one week from one of the main car sharing services in Milan, Enjoy.
We started this project as a small experiment to investigate through different techniques how urban traffic patterns evolve day by day and the main characteristics of use of the service.
In February 2014 we started scraping data directly from the Enjoy website, recording the position of all the available cars every 2 minutes. We collected more than 1,700,000 data points, corresponding to more than 20,000 rents and 800 days of usage.
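As a rough sketch of this collection setup (Node.js, using the fetch built into recent versions), the loop below polls a placeholder endpoint every 2 minutes and appends timestamped positions to a CSV file. The URL and the response shape are hypothetical stand-ins; Enjoy’s real, undocumented endpoint is not shown here.

```javascript
// Sketch of the polling idea: every 2 minutes, fetch the list of available
// cars and append a timestamped snapshot to disk.
const fs = require('fs');

const CARS_URL = 'https://example.com/enjoy/available-cars';   // hypothetical endpoint

async function snapshot() {
  const res = await fetch(CARS_URL);
  const cars = await res.json();                 // assumed: [{plate, lat, lon}, ...]
  const rows = cars.map(c =>
    [new Date().toISOString(), c.plate, c.lat, c.lon].join(','));
  fs.appendFileSync('positions.csv', rows.join('\n') + '\n');
}

snapshot();
setInterval(snapshot, 2 * 60 * 1000);            // poll every 2 minutes
```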
From the beginning the idea was to combine different kinds of visualizations, both geographical and non-geographical, in order to explore the different dimensions of the phenomenon; the biggest challenge was to combine them into a linear story able to convey some insights about mobility in Milan.
For this reason we decided to build a single-page website divided into five sections. Each one enables the user to explore one dimension of the phenomenon and offers some interesting insights.

Since Enjoy doesn’t provide any information about the routes, but just the position of the available cars, one of the biggest technical challenges was to find a way to explore the individual routes.
Inspired by this interesting project by Mappable, we decided to process the data using an open routing service (Open Source Routing Machine) to estimate route geometries for each rent. The data was then translated into a GeoJSON file and visualized with Processing, using the Unfolding library, to create a video, and with Tilemill and Mapbox.js to create an interactive map.

To see all the visualizations, discover our process and some insights please visit the project here.

posted by Michele Mauri
Thursday, May 8th, 2014

An interview with Richard Rogers:
repurposing the web for social and cultural research

During my time as a visiting researcher at the University of Amsterdam I had the opportunity to interview Richard Rogers, professor of New Media and Digital Culture. He calls himself a “Web epistemologist”, and since 2007 he has been director of the Digital Methods Initiative, a contribution to doing research into the “natively digital”. Working with web data, they strive to repurpose dominant devices for social research enquiries. He is also the author of the “Digital Methods” book. With him I tried to explore the concept of Digital Methods and its relationship with design.

Richard Rogers at Internet Festival 2013, photo by Andrea Scarfò

Let’s begin with your main activity, the Digital Methods. What are they, and what are their aims?

The aims of Digital Methods are to learn from and repurpose devices for social and cultural research. It’s important to mention that the term itself is meant to be a counterpoint to another term, “Virtual Methods”. This distinction is elaborated in a small book, “The end of the virtual“.

In this text I tried to make a distinction between “Virtual Methods” and “Digital Methods”, whereby with Virtual Methods, what one is doing is translating existing social science methods — surveys, questionnaires et cetera — and migrating them onto the web. Digital Methods is in some sense the study of methods of the medium, that thrive in the medium.

With virtual methods you’re adjusting in minor but crucial detail existing social science methods whereas digital methods are methods that are written to work online. That is why the term native is used. They run native online.

Are virtual methods still used?

Yes, and virtual methods and digital methods could become conflated in the near future and used interchangeably, for the distinction I’m making between the two is not necessarily widely shared.

In the UK there is a research program called digital methods as mainstream methodology which tries to move the term outside of a niche. Virtual methods, on the other hand, has been more established, with a large Sage publication edited by Christine Hine.

Digital Methods, the book, was awarded the “2014 Outstanding Book of the Year” by the International Communication Association, which gives it recognition by the field, so now the argument could be in wider circulation.

Today, many websites use the term Digital Methods. I was curious to know if you were the first one using it or not.

Yes, the term originated here, at least for the study of the web. The term itself already existed but I haven’t created a lineage or really looked into it deeply. I coined or re-coined it in 2007.

If you look at digitalmethods.net you can find the original wiki entry which situates digital methods as the study of digital culture that does not lean on the notion of remediation, or merely redoing online what already exists in other media.

How do digital methods work?

There is a process, a procedure that is not so much unlike making a piece of software. Digital Methods really borrows from web software applications and web cultural practices. What you do is create a kind of inventory, or stock-taking, of the digital objects that are available to you: links, tags, Wikipedia edits, timestamps, likes, shares.

You first see what’s available to you, and then you look at how dominant devices use those objects. What is Google doing with hyperlinks? What is Facebook doing with likes? And then you seek to repurpose these methods for social research.

You’re redoing online methods for different purposes to those intended. This is the general Digital Methods protocol.

What is your background, and how did you get the idea of Digital Methods?

My background is originally in political science and international relations. But most of the work behind Digital Methods comes from a later period and that is from the late ’90s, early 2000s when we founded the govcom.org foundation. With it, we made the Issue Crawler and other tools, and a graphical visual language too for issue mapping.
That’s combined in the book, “Preferred placement”, and it includes the first issue map that we made: a map of the GM food debate. You can see link maps and a kind of a visual language that begins to describe what we referred to at the time as the “politics of associations” in linking.

Genetically Modified food debate map (Preferred Placement, 2000)

It began with a group of people working in the media and design fellowship at the Jan van Eyck Academy in Maastricht, but it also came out of some previous work that I had done at the Royal College of Art, in computer-related design.
Those early works were based on manual work as well. Our very first map was on a blackboard with colored chalk, manually mapping out links between websites. There’s a picture somewhere of that very first map.

So, you created the first map without any software?

Yes. And then we made the Issue Crawler, which was first called the “De-pluralizing engine”.

It was a commentary on the web as a debate space, back in the ‘90s when the web was young. New, pluralistic politics were projected onto the web, but with the De-pluralizing Engine we wanted to show hierarchies where some websites received more links than others.

The Issue Crawler first came online in a sort of vanilla version in 2001 and in the designed version in 2004. The work comes from my science and technology studies background and in part from scientometrics and citation analysis.

That area, in some sense, informed the study of links. In citation analysis you study which article references other articles. Similarly, with link analysis you’re studying which other websites are linked to.

Nanotechnology policy map, made with Issue Crawler

Reading your book (Digital Methods), sometimes the research is on the medium, and in other studies it is through it.

Indeed, Digital Methods do both kinds of research, it is true. There’s research into online culture and culture via online data. It’s often the case that we try to do one or the other, but mainly we simultaneously do both.

With Digital Methods one of the key points is that you cannot take the content out of the medium, and merely analyze the content. You have to analyze the medium together with the content. It’s crucial to realize that there are medium effects, when striving to do any kind of social and cultural research project with web data.

You need to know what a device artifact is, which search results are ‘Google artifacts,’ for example. We would like to undertake research, as the industry term would call it, with organic results, so as to study societal dynamics. But there’s nothing organic about engine results.

And the question is, how do you deal with it? How explicit do you make it? So we try to make it explicit.

I think the goal has always been trying to do social research with web data, but indeed we do both and we also strive to discuss when a project is aligning with one type of research or to the other.

Digital Methods (2013), by Richard Rogers.

On the web, the medium is changing very quickly. Does this affect your research? Is it a problem?

Well, it’s something we addressed from the beginning, because one of the prescripts of Digital Methods, one of the slogans, has been to follow the medium, and the reason is that the medium changes. You cannot expect to do standard longitudinal research.

You do not receive the same output out of Facebook nowadays that you had three years ago, or five years ago. The output changed. You can go back in time and harvest Facebook data from five years ago, but Facebook was in many respects a different platform. There was no like button. Similarly, you need to know something about when Google performed a major algorithm update, in order to be able to compare engine results over time.

We are working with what some people call “unstable media”. We embrace that, and of course there have been times when our research projects became interrupted or affected by changes in advanced search features, for example in a project created by govcom.org called “elFriendo”. It is an interesting piece of software where you can use MySpace to do a number of things: create a new profile from scratch, check the compatibility of interests and users, and do a profile makeover.

And this worked very well until MySpace eliminated an advanced search feature: you can no longer search for other users with a given interest. So that project ended, but it nevertheless remains a conceptual contribution, which we refer to as an approach to the study of social media called post-demographics. This means that you study profiles and interests as opposed to people’s ego or social networks. This project opened up a particular digital methods approach to social media.

Presenting diagrams made by DMI or based on your methods, I sometimes encounter skepticism. The most common objections are: you cannot prove that the web represents society; when looking at people, you cannot define which portion of the population you are following; when crawling websites, you don’t know what kind of information is missing. Do you receive critiques on DM reliability? How do you answer them?

There is a lot of scepticism toward research that has to do with online culture.

Normally it’s thought that if you’re studying the web you’re studying online culture, but we are trying to do more than that.

A second criticism or concern is that online data is messy, unstructured, incomplete, and it doesn’t really meet the characteristics of good data.

And then the third critique is that even if you make findings with online data you need to ground these findings in the offline, to make them stronger. Working with online data, Digital Methods necessarily needs to be part of a mixed methods approach. This is the larger critique.

How do I answer these critiques? Well, I agree with the spirit of them, but the question that I would like to pose in return is: how do we do Internet research?

One could argue that what you sketched out as critiques applies more to Virtual Methods than to Digital Methods, because the various expectations to be met are the expectations that Virtual Methods are trying to deal with, while Digital Methods is a rather different approach from the start.

We use the web in a kind of opportunistic manner for research. Given what’s there, what can we do? That’s the starting point of Digital Methods.

The starting point is not how do we make a statistical sample of parties to a public debate online. That would be a Virtual Methods concern.

One common word used today is Digital Humanities. Are Digital Methods part of it?

To me, Digital Humanities largely work with digitized materials, while Digital Methods work with natively digital data. And Digital Humanities often use standard computational methods, while Digital Methods may come from computational methods but are written for the web and digital culture.

So the difference between Digital Methods and Digital Humanities is that the latter work with digitized material using standard computational methods.

What’s the difference in using a digitized archive (e.g. digitized letters from 1700) and an archive of born-digital data?

If you work with the web, archiving is different, in the sense that the web is no longer live yet is digital, or what Niels Bruegger calls re-born digital.

So web archives are peculiar in that sense. We could talk more specifically about individual web archives.

Let’s talk about the Wayback Machine and the Internet Archive, for example, which I wrote about in the “Digital Methods” book. It was built in 1996 and reflects its time period, in that it has a kind of surfing mentality built into it as opposed to searching.

Wayback Machine homepage

Apparently, in 2010 there was nothing interesting to record on our website.

It’s also a web-native archive, and is quite different from the national libraries’ web archives: they take the web and put it offline. If you want to explore them, you have to go to the library; they’ve been turned into a sort of institutionalized archive, one in the realm of the library and librarians.

So it is a very different project from the Internet Archive. You can tell that one is far webbier than the other, right?

Another widely used word is big data. Sometimes it is used as a synonym for web data. Is it something related to what you do or not?

As you know, I’m one of the editors of the “Big Data & Society” journal, so I’m familiar with the discourse.

Digital methods are not necessarily born in that; they are an approach to social research with web data, so the question is, what’s the size of that web data? Can digital methods handle it?

Increasingly we have to face larger amounts of data. When would one start to think of the work as big data? Is it when you need clusters and cloud services? I think when you reach those two thresholds you’re in the realm of big data, and we’re nearly there.

The final chapter of my book deals with this, and I think it is important to consider what kind of analysis one does with big data.

Generally speaking, big data call for pattern seeking, so you have a particular type of analytical paradigm, which then precludes a lot of other interpretative ones which are finer grained and close-reading.

Digital Methods are neither only distant reading nor close reading, but can be either. So Digital Methods do not preclude the opportunities associated with big data, but they certainly do not deal exclusively with big data.

You created a considerable number of tools. Some of them are meant to collect data, others contain a visual layer, and some others are meant purely for visualization. How much importance do you give to the visual layer in your research? How do you use it?

Our flagship tool, the Issue Crawler, and a lot of subsequent Digital Methods tools, did a number of things. The idea from the beginning was that the tool would ideally collect, analyze and visualize data. Each tool would have a specific method, and a specific narrative, for the output.

The purpose of digital methods tools is not generic; rather it is specific or, in fact, situated for a particular piece of research. Most of the tools come from actual research projects: tools are made in order to perform a particular piece of research, and not to do research in general. We don’t build tools without a specific purpose.

DMI tools page

Analysing links in a network, navigating Amazon's recommendation networks, scraping Google: here you can find all the tools you need.

The second answer is that designers have always been important; the work that I mentioned comes from a confluence of, on one hand, science studies and citation analysis and, on the other hand, computer-related design.

I was teaching in science studies at the University of Amsterdam and in computer related design at the Royal College of Art, specifically on mapping, and a number of projects resulted from my course, for example theyrule.net.

My research always had a political attitude as well: with analytical techniques and design techniques we’re mapping social issues.

And we map social issues not only for academia; our audience has also been issue professionals, people working in issue areas and in need of maps, graphical artifacts to show and tell before their various issue publics and issue audiences. We’ve always been in the issue communication business as well.

For which public are the visualizations you produce meant?

We have a series of publics: academics, issue professionals, issue advocates, activists, journalists, broadly speaking, and artists. It isn’t necessarily a corporate audience.

Each of those audiences of course has very different cultures, communication styles and needs.

So we try to make tools that are quite straightforward and simple, with simple input and simple output. That’s really the case for the Lippmannian device, also known as Google Scraper, where there are few input fields and you get a single output.

It’s also important for us to try to lower the threshold of use. The Issue Crawler has 9,000 registered users. Obviously they don’t use it all the time.

Generally speaking the tools are open to use, and that’s also part of the design.

In the last summer schools you invited some former DensityDesign students.
Were you already used to inviting designers?

Yes, govcom.org as well as DMI has always been a collaboration, maybe I should have mentioned this from the beginning, between analysts, programmers, and designers. Sometimes there is more of one than another, but we have always created a communication culture where the disciplines can talk to each other.

Oftentimes the problem, when working on an interdisciplinary project, is that people don’t speak each other’s language. What we’re trying to do is create a culture where you learn to speak the other’s language. So if you go to a programmer and say ‘this software is not working’, they will probably ask you to ‘define not working’.

Similarly you won’t go to a designer and just talk about colors, you need a more holistic understanding of design.

It is a research space where the various kinds of professions learn to talk about each other’s practice. It’s something that people in Digital Methods are encouraged to embrace. That has always been the culture here.

DensityDesign students during the 2012 summer school: "Reality mining, and the limits of digital methods". Photo by Anne Helmond

You’ve lots of contacts with design universities. Why did you invite designers from DensityDesign?

Well, because Milan students are already trained in Digital Methods, and I didn’t know that until someone showed me the work by some of you in Milan, using our tools, doing something that we also do, but differently.

What we found so rewarding in Milan is the emphasis on visualizing the research process and the research protocols.

If you look at some of our earlier work, it’s precisely something we would do (for example in “Leaky Content: An Approach to Show Blocked Content on Unblocked Sites in Pakistan – The Baloch Case” (2006)). It is an example of Internet censorship research.

And from the research question you show step by step how you do this particular piece of work, to find out if websites telling a different version of events from the official one are all blocked. So when I saw that DensityDesign was largely doing what we have always naturally done but didn’t really spell it out in design, I thought it was a great fit.

Is there design literature on Digital Methods?

Our work is included, for example, in Manuel Lima’s Visual Complexity, and earlier than that it was taken up in Janet Abrams’s book Else/Where: Mapping. She’s also a curator and design thinker I worked with previously at the Netherlands Design Institute, which no longer exists; it was a think-and-do-tank run by John Thackara, a leading design thinker.

In some sense the work that we’ve done has been a part of the design landscape for quite some time, but more peripherally. We can say that our work is not cited in the design discourse, but is occasionally included.

IDEO, a famous design firm, published a job opportunity called “Design Researcher, Digital Methods”. This is an example of how Digital Methods are becoming relevant for design. Is their definition coherent with your idea?

No, but that’s ok. It coheres with a new MA program in Milan, which grew out of digital online ethnography.

Digital Methods in Amsterdam has had little to do with online ethnography, so this idea of online ethnography doesn’t really match with Digital Methods here, but does match with DM done there and elsewhere. Online ethnography comes more from this (showing Virtual Methods book).

IDEO’s job description is not fully incompatible, but it’s more a collection of the online work that digital agencies in fact do. This particular job description asks people to build tools, expertise and capacities that will be sold to digital agencies. So these are core competencies for working with online materials.

Is it surprising for you that this job offer uses the term ‘Digital Methods’?

The first thing you learn in science studies is that terms are often appropriated.

How I mean Digital Methods isn’t necessarily how other people would use it, and this appropriation is something that should be welcomed, because when people look up what Digital Methods are, where they came from, when they discover this particular school of thought, hopefully they’ll get something out of the groundwork we’ve done here.

We worked together during the EMAPS project. How do you evaluate the DensityDesign approach?

I think that the DensityDesign contribution to EMAPS has been spectacular.

Generally speaking I don’t have criticisms of the DensityDesign contribution, but I have questions and they have to do maybe more generally with design thinking and design research.

Oftentimes designers think from the format first. Often design starts with a project brief, and with it there is already a choice of format for the output, because you need to have constraints, otherwise people could do anything and comparison would be difficult. It’s indeed the element of the project brief that we do differently. So maybe it’s worth documenting the differences.

The Digital Methods approach, in terms of working practice, relies on the current analytical needs of subject matter experts, whereby those needs in some sense drive the project.

Working with web data, the standard questions that we ask a subject matter expert are: “What’s the state of the art of your field? What are your analytical needs? And what do you think the Internet could add?”

We let the subject matter expert guide our initial activities and provide the constraints. That’s the brief, which is a different way of working from the idea that, for example, this project’s output will be a video.

Another comment I would add is that the more Digital Methods have become attuned to the field of information visualization, and the more DM has learnt from this field, the more standardized the visualizations have become. Whereas in the past we were wild, and we made things that did not necessarily fit the standards.

One of the questions that has been asked in a project I’m working on is: “are good big data visualizations possible?” But similarly one could ask: “in data visualization, is innovation possible?” Because what we’re currently seeing is increasing standardization.

So then, what is innovation in data visualization? These are the questions I would pose across the board.

Because when designers are working with project partners, I think they learn more about the analysis than about data visualization.

So is imagination now driven by data analysis? The challenge is to think about processes or setups which make possible innovation.

posted by Michele Mauri
Friday, February 21st, 2014

Contropedia: 1st Hackathon results

Last week, during a hackathon in Amsterdam, we realized the first working prototype of the Contropedia platform, an application meant for the real-time analysis and visualization of controversies in Wikipedia.

It was a great experience, thanks to the interesting mix of different expertise involved.

Giovanni and I worked together with researchers from the Médialab (Sciences Po), the Digital Methods Initiative (UvA) and Barcelona Media to realize this early prototype.

The goal of the hackathon was to refine and correlate some metrics previously identified by Barcelona Media and DMI, and to provide a static visualization of each metric. We’re quite proud to have exceeded that goal by realizing a working prototype at the end of the week.

There is still a lot of work to be done, but here you can find a preview of the visualizations and a brief description of the design process.

Results

Let’s start with the results. Below you can see a static mockup of the application (the working prototype will be available on the project website as soon as possible).

Up to now we have two groups of visualizations: the first one is meant for the exploration of controversial elements within the Wikipedia page; the second one is a more analytical exploration of such elements, giving details on the involved users and the list of revisions for each element.

Preliminary Work

Before the hackathon, each institution prepared some preliminary work.

Médialab, with its expertise in controversy mapping, prepared a user profile and some use scenarios for the platform, defining the application goals.

DMI was already working on the identification of the most edited (and therefore controversial) items in the page.

DensityDesign had already worked on the visualization of such items, and provided a brief description of all the available variables.

Barcelona Media brought the study of reply chains on discussion pages.

User scenario

Synthesizing the long document prepared by Médialab, the user is:

  • A person who sometimes uses Wikipedia as an information source
  • Wants to know something more about the debate (without having any prior hypothesis about it)
  • Knows roughly how Wikipedia works (ability to edit, existence of talk pages)
  • Finds the current structure too fragmented to get a global idea of the debate

The document presents the user needs in discursive form. Here is a list of the main ones:

  • What is controversial in a Wikipedia page?
  • How much is a given part discussed?
  • How many people are discussing it?
  • Which are the most debated sections?
  • How controversial are the linked pages?
  • How many controversial pages exist about the topic?
  • Who are the main actors (see Actor-Network Theory)?
  • What is an actor on Wikipedia?
  • How does controversiality change over time?
  • Does the debate move across languages?
  • What is the temporal trajectory of controversiality?

Metrics

At the beginning of the hackathon some useful metrics had already been identified.

The first one concerns the controversial elements within a page. For each wiki-link, image and external link in the page, edits are counted and normalized by the size of each edit.

It is then possible to evaluate the controversiality level of each element through time, as some of them have been deleted from the latest revision of the page.
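To make the metric concrete, here is a minimal JavaScript sketch of one possible reading of it. The inverse-size weighting and the data fields are our assumptions for illustration; the exact normalization used in Contropedia may differ.

```javascript
// Sketch: score an element by summing its edits in a time span, weighting
// each edit by how small it is (a targeted change to one link counts more
// than a bulk rewrite). Assumed edit shape: {element, timestamp, size}.
function controversiality(element, edits, from, to) {
  return edits
    .filter(function(e) {
      return e.element === element && e.timestamp >= from && e.timestamp <= to;
    })
    .reduce(function(score, e) { return score + 1 / Math.max(e.size, 1); }, 0);
}
```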

The second one concerns discussions. Wikipedia talk pages have a tree structure, and by analysing the depth of the tree, its width (number of branches) and the number of users involved it is possible to define its controversiality.

There is no explicit link between threads and the page, even if, by reading them, it is possible to identify which part of the page they are talking about.

It is also possible to extract the revert network between users: each time a user reverts another user’s edit, the link between them is reinforced.

Finally, it is possible to define a user network starting from the threads: each time a user replies to another, the link between them is reinforced.
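Both networks can be built with the same simple aggregation. The sketch below (plain JavaScript, with an assumed list of {from, to} interaction pairs) shows the idea of reinforcing a link at each revert or reply.

```javascript
// Sketch: build a weighted user-user network from reverts (or replies).
// Each interaction reinforces the link between the two users involved.
function buildNetwork(events) {                    // events: [{from, to}, ...]
  var weights = {};
  events.forEach(function(e) {
    var key = e.from < e.to ? e.from + '|' + e.to : e.to + '|' + e.from;
    weights[key] = (weights[key] || 0) + 1;        // reinforce the link
  });
  return Object.keys(weights).map(function(key) {
    var users = key.split('|');
    return { source: users[0], target: users[1], weight: weights[key] };
  });
}
```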

Working groups

We divided into three main groups, with different tasks:

  • The identification of links between threads and page sections (which section is the most debated?)
  • The identification of groups of users starting from the threads and the revert network. In both networks, in fact, a link represents opposition between users, while common metrics are based on the opposite paradigm (a link means agreement or at least proximity).
  • The identification of a visual layout able to enrich the Wikipedia page with the new metrics.

Data objects

Starting from the descriptions given by the DMI and Barcelona Media researchers, we drew a first schema of the possible data objects we could use.

This kind of schema was useful to identify which objects we wanted to focus on, and their hierarchy.

Some objects were already known: the page itself, the user, the single discussion thread, the page items (links, images, templates).

Others were not: is there an intermediate element between the whole page and the single item?

Some others were conceptual issues. In particular, what is an “actor” on a Wikipedia page?

Design challenges

The project presented several design challenges. At the beginning of the sprint we were aware of three main kinds of analysis (thread chains, controversial elements, revert networks), which covered only part of the user requirements.

While we knew how the analysis was performed, no data sample was yet available.

Some words used in the user description were ambiguous: when talking about a controversial topic, how do you define a topic on Wikipedia? Is it a single page? Is it a group of pages? If the latter, how do you define which pages describe a topic? And again, when talking about the involved actors (Latour), what is an actor on Wikipedia? How do you extract this information?

Without knowing how these entities would be modelled as data objects, it was difficult for us to imagine how to visualize them.

Mockup evolution

As there were lots of open questions, instead of trying to create one coherent application we decided to create the most suitable visualization for each kind of analysis. The goal was to use the visualizations to understand the relevance of each analysis and how to combine them in a single app.

We started working on a prototype created by DMI on the ‘Climate change’ Wikipedia page. It was the original page with controversial elements marked in different colours according to their controversiality level.

We had already worked on that kind of data for another project (EMAPS), so we started from there for the new mockup.

The idea was to keep the page layout with the marked words while adding some features. As the user description asks for an overall view of the page, we decided to insert a minified version of the page, similar to the minimap sometimes used in text editors like Sublime Text.

In the first static mockup of the page, the minified version was a fixed column on the right representing the whole page.

Working with the real data, we noticed that most of the controversial items have a very low controversiality value. To make them easier to identify, we chose a colour scale moving between two colours instead of using different opacities.

As controversiality should be the primary information, we chose to remove any other colour from the page, including images. We also decided to use dark grey instead of black as the text colour, to emphasise the controversial elements.

Creating a minified version of the page raised the need to find an element in between the controversial items and the whole page. Empirically, page sections seemed the best solution: the number of sections per page is not too high, their size doesn’t vary too much – Wikipedia guidelines recommend avoiding sections that are too short or too long – and each section can be seen as a minimal thematic unit.

While drawing it we found that sections were also useful for linking other information, like related discussion threads. To make it even simpler for the user to identify where the controversy is, we suggested defining a measure of the overall controversiality of each section. Below, the first mockup:

Using real data coming from Wikipedia, it became clear that it was not possible to use the screen height as the scale for the whole page.

We also realized that, if the user’s first interaction is with the minified version, it was not useful to show the full text.

We focused on the minified version of the page, imagining a ‘folded’ interaction where, in each section, the user can switch between the visualization and the full text.

To quickly create a mockup, we decided to just hack the CSS, using the Block font to replace words. It worked better than we thought. Having an (almost) working prototype was really useful to identify possible issues with the visualization and correct them.

Working with CSS was also useful to quickly switch between the minified and extended versions of the page.

From the beginning, we decided to insert a timeline bar chart as the first visualization, representing the overall activity on the page (to be chosen among the number of edits, the number of active users and the number of reverts). By interacting with the timeline, the user can choose the time period they want to explore.

Reflecting with the other researchers, we understood that the temporal evolution is not just about which revision to show, but also about defining over which period to analyze the controversy. The same item could have a different controversiality at different times.

The timeline thus became the tool to select a time span; the controversy-level indicators are computed over that period.
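As an illustration of this interaction, a minimal sketch with the d3 v3 brush component could look like the following; the scale, container and callback names are hypothetical placeholders.

```javascript
// Sketch: a time brush on the overview bar chart (d3 v3 API); selecting a
// span triggers a recomputation of the controversy indicators for that period.
var brush = d3.svg.brush()
    .x(xTimeScale)
    .on('brushend', function() {
      var extent = brush.extent();                        // [startDate, endDate]
      recomputeControversiality(extent[0], extent[1]);    // hypothetical callback
    });

timelineSvg.append('g')
    .attr('class', 'brush')
    .call(brush)
  .selectAll('rect')
    .attr('height', timelineHeight);
```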

Meanwhile, Barcelona Media and Sciences Po found a way to join discussion threads to page sections. We decided to insert that information in the main visualization as well, representing the threads as coloured dots, each one representing a single discussion and its controversiality.

The user can open the discussion panel and see a detailed description of the threads.

At the time, it was difficult to identify what kind of details we had to show, as the analysis was ongoing.

One solution was to show the full-text thread on the same page. The risk was to create something too rich (and complex).

In their original publication, the Barcelona Media researchers analyzed the full discussion as a tree, visualizing it with a tree structure. Below, an image taken from the presentation:

Since threads have a tree structure, are studied as trees, and had previously been visualized as trees, the ‘duck test’ persuaded us to represent them as trees.

Also, D3.js has a nice function to do that.
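As an illustration, a thread laid out with the d3 v3 tree layout could look roughly like this; the nested thread structure, sizes and colour scale are assumptions for the example.

```javascript
// Sketch: laying out a discussion thread as a tree (d3 v3 API).
// `thread` is assumed to be nested as {author, controversiality, children: [...]}.
var tree = d3.layout.tree().size([threadWidth, threadHeight]);
var nodes = tree.nodes(thread);
var links = tree.links(nodes);

svg.selectAll('path.link')
    .data(links)
  .enter().append('path')
    .attr('class', 'link')
    .attr('d', d3.svg.diagonal());

svg.selectAll('circle.comment')
    .data(nodes)
  .enter().append('circle')
    .attr('class', 'comment')
    .attr('cx', function(d) { return d.x; })
    .attr('cy', function(d) { return d.y; })
    .attr('r', 3)
    .style('fill', function(d) { return colour(d.controversiality); });
```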

With this visualization it is possible for the user to understand some of the discussion dynamics. Colour shows the thread’s controversiality. Each visualization is a link to the full thread on the discussion page.

With the described interface a big part of the user needs was addressed. Still, it was difficult to see elements that were controversial in the past but were later deleted from the page. To solve this issue we created a page listing all the elements involved in the selected time span and visualizing their properties. Working with sample data about Climate Change, we identified a possible visual solution for this information.

Also in this case we tried to use a folded structure, allowing the user to get more details about some variables.

The timeline allows the user to see all the edits regarding an actor, marking the deleted and added parts for each one.

The bar chart showing user types opens a detail view on users. We proposed this solution using the available data, without knowing how relevant it would be to the user. After a discussion with the others we decided to show relationships among users instead (still in development).

Conclusions

A hackathon is a great format for this kind of project, especially with such a rich mix of different expertise.

From our point of view, the main issue was the identification of the workflow. As all the parts of the project – data collection, data analysis and data visualization – are performed together in a very short time, it is difficult to set up a proper workflow and to understand how each part can influence the others.

Lots of questions came to our minds during the hackathon:

Should we discard some kinds of analysis if they are not coherent with the overall application? Is it better to keep in mind the overall structure of the app, or to focus on single visualizations, even very different from one another? Can designers imagine what kind of data could be useful and ask for it to be produced? Or should it be the other way around?

Sciences Po’s work on the user scenario was a fundamental tool to speed up our choices; we used it to identify the most relevant visualizations without the risk of doing something interesting for us but not for the project.

Due to the lack of time, instead of devising new visualizations for each element we started to work on the other researchers’ ideas, refining them and assessing how suitable they actually were. Even if this process was forced by time constraints, it turned out to be a functional co-design practice for the development of the interface.

Another key factor in the hackathon’s success was the presence of great and quick developers: each idea from our side was accepted and quickly realized. It is essential to test visualizations with real data in order to evaluate them; we discarded lots of ideas after seeing them realized with real data. Without the developers’ support, this validation process would have been much slower. In this kind of project at least one designer should have a basic knowledge of coding (both to work with data and to visualize it). Even if it is possible to imagine apps without technical knowledge, it makes communication with the other roles harder, and especially in a hackathon it could make the process much slower.

What’s next?

We will work to create a minimum working prototype of the application and test it on different controversial pages, possibly with issue experts. The aim is to identify which issues to address during the next hackathon in Barcelona.

posted by Stefania Guerra
Friday, November 29th, 2013

Visualizing Climate Change and Conflict

Last October a small delegation of former DensityDesign students participated in the Fall Data Sprint held by the Digital Methods Initiative at the University of Amsterdam, a workshop that is part of the EMAPS project (Electronic Maps to Assist Public Science).
The DMI is developing an ongoing mapping project focused on climate change and the leading approaches to (and phases of) its study, namely climate change skepticism, mitigation and adaptation. In this workshop they moved towards what could be interpreted as a fourth phase: climate change and conflict.

The workshop envisaged projects concerning the actors and issues, future scenarios and climate fictions as well as the places of conflict. Are there leading scenarios about the coming conflicts (e.g., having to do with refugees, water, and other sources of distress), and whose scenarios are these? Who are liable to be the victims? Where do these conflicts take place? We were also interested in the part played by so-called cli-fi, or climate change fiction. To what extent is fiction, and the imagined futures, organising the issue space?

We took part as visual designers in two of the three projects realized.

Climate Fiction (Cli-Fi)

Full project on DMI website

The first project explores cli-fi, fiction about climate change, in order to understand and categorize fictional scenarios about the future and the role of human actors in those scenarios. The project uses digital methods to gather data from Amazon and Google Books, in conjunction with manual classification, in order to understand the current zeitgeist of climate change in fiction.

Mainstream discourse suggests that the cli-fi genre aims to humanize the apocalyptic scenarios associated with climate change, and make relatable their potential outcomes:
“Most of the authors seek, at least in part, to warn, translating graphs and scientific jargon into experience and emotion…The novels discussed here do something valuable, too. They refashion myths for our age, appropriating time-honored narratives to accord with our knowledge and our fears. Climate change is unprecedented and extraordinary, forcing us to rethink our place in the world.” (Dissent Magazine)
We chose to investigate these two claims: what kind of scenarios does climate fiction depict? What kind of personal, aesthetic and emotional experiences is cli-fi putting forward, and what ‘myths’ does it refashion?

In order to answer these questions we visualized each cli-fi book’s cover in terms of its scenario and associated issues. The outcome will be an ‘atlas’ of cli-fi places in terms of their scenarios.

Figure One: Cli-fi Scenarios

Figure Two: Histogram

When clustering the blurbs of climate fiction books, “global warming” and “climate change” were central and seemed to be drivers of the narrative. This calls into question the claim about the normalization of climate change and its being backgrounded in the narratives.
The books appear to share not the details of what these future scenarios look like, but are closer in terms of the personal narratives they introduce. A further step would be to identify and classify the archetypes of these narratives using a framework (the journey back home, the search for the lost land).
In terms of the scenarios depicted, there were common themes: global warming, destroyed cities and floods.
Exploring what characters in the books tend to remember, recurring elements included cities, easier times when fuel was available, and the everyday geography that is gone in their present.

The second project we took part in dealt with climate conflict vulnerability and victims.

Mapping Climate Conflict Vulnerability and Victims

Full project on DMI website

What are the countries most and least vulnerable to conflict as a result of climate change?

How prominent are these countries in the online issue space of climate change and in those of its sub-issues (demarcated through their vulnerability indicators)? How does this resonance vary across online issue spaces, looking at a country’s resonance within climate change on Google.com, Google News, Twitter (a set of one year of climate tweets, 22 Oct 2012 – 22 Oct 2013), and within UN General Assembly reports on climate change (dating from 2002 to 2012)?

How does the issue imagery of climate change (using Google Image results) address these vulnerability indicators? Do we see adapted or vulnerable landscapes? And is the issue humanized (showing people, animals and landscapes) or quantified (in scientific stats and graphs) in imagery?

The first step to address these questions consisted in collecting lists of countries ranked by their vulnerability to climate change. For this, we used three indexes with recent data: DARA’s CVM (2012, data from 2011), Germanwatch (2013, data up to 2011) and the GAIN index (2012, data from 2011). We triangulated the lists and found the countries most and least affected by climate change. For GAIN and Germanwatch, we selected the top and bottom 50 countries. For DARA, we used ‘acute’ for the most vulnerable and ‘moderate’ for the least vulnerable. Subsequently, we created a world map indicating the least and most vulnerable countries.
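As a sketch of the triangulation step (plain JavaScript, with placeholder names for the three shortlists), counting in how many indexes each country appears yields the groups used on the map.

```javascript
// Sketch: count in how many of the three indexes each country appears among
// the most (or least) vulnerable. The list variables are placeholders for the
// DARA, Germanwatch and GAIN shortlists.
function triangulate(lists) {                    // lists: array of arrays of country names
  var counts = {};
  lists.forEach(function(list) {
    list.forEach(function(country) {
      counts[country] = (counts[country] || 0) + 1;
    });
  });
  // countries appearing in all three lists form the core group shown on the map
  return Object.keys(counts).filter(function(c) { return counts[c] === 3; });
}

var mostVulnerable = triangulate([daraAcute, germanwatchMost50, gainMost50]);
```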

Figure Three: Climate Vulnerability World Map

On the world map (Figure Three), the most vulnerable countries (in purple, occurring in all three lists) are located either in Africa or Asia. Very vulnerable countries (in red, occurring in two of the lists) are also located in the same regions, mostly West Africa and Southern Asia. Other vulnerable countries (in pink, appearing in at least one list) are more spread out: from South America through Central Europe to Southern Asia.
The most resilient countries (in blue, also appearing in all lists) are also relatively dispersed: Northern Europe, Western Europe, Southern Europe, North Africa and the Middle East. Other resilient countries (in green, occurring in two of the lists) seem to be mostly confined to Northern and Western Europe, but a few of them are also located in South America, Africa and Asia. Another group of resilient countries (in yellow, appearing in at least one list) is also quite diverse, to be found in regions such as Russia, Southeastern Europe and Western Asia, but also South Africa and Latin America.

The country profiles have been visualized on a separate chart (Figure Four), in which each sub-issue is visualized as a ring, resized according to the resonance of the shortlisted countries within that specific sub-issue. The map shows an aggregate value for each sub-issue on the top right. Each country then is profiled according to its resonance within sub-issues, which are sorted from highest resonance to lowest resonance.
The diamond-shaped labels indicate whether the country is considered vulnerable (red) or resilient (green).

Figure Four: Country Profiles

The profiles demonstrate that the countries resonate most within the sub-issue space of Infrastructure and Human habitat. Food and Water are other consistent sub-issues across countries. Health seems to be specific to Afghanistan and Israel, whereas Ecosystem is specific only to Iceland.

Subsequently, for each country the resonance in the various issue spaces (Google, Google News, Twitter and UN documents) is visualized in a ‘resonance bar’. These resonance bars are placed on a footer that is either green (resilient) or red (vulnerable). With Gephi, a network graph is made of all issues and countries, where only the strongest links are retained to position the countries in relation to their ‘ties’ to a sub-issue. The diameter of a sub-issue shows the cumulative resonance of the set of countries for that sub-issue. The relative position of the countries refers to the commitment of a country to a particular issue (Figure Five).

Figure Five: Resonance Map

Overall, it shows that Infrastructure is a concern of resilient countries, while Human habitat is a concern of vulnerable countries. Furthermore, positioning the sub-issues based on the cumulative resonance reveals the close relation of Health and Food to the sub-issue of Human habitat.

The Tree Maps that follow visualize the results of the climate-change-related Google Images searches according to the indexes listed above.
The most dominant subsets of climate-change-related images on Google Images are ‘Landscapes’, ‘Science’ and ‘People’. The most prominent subcategory within Landscapes was imagery related to disasters, followed by land & sea (floating icebergs).

Figure Six: Images Quantity Overall

Quantities of images for the most climate vulnerable countries

In addition to quantifying the number of images in the Landscapes, Science, People, Animals, Cartoons, Literature, Metaphors and Other categories, we decided to zoom in on the Landscape imagery. This was translated, visually, into an Issue Landscape: a panoramic representation of each indicator, with the images grouped according to the subject matter they show. The category counts behind the tree maps are sketched below.
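For illustration, here is a minimal sketch of that quantification, assuming the manually coded image results were stored in a CSV with one row per retrieved image (the file name and column names are hypothetical):

import pandas as pd

# Hypothetical file: one row per image returned by the climate change queries,
# with the country it was retrieved for and the category it was coded into.
images = pd.read_csv("google_images_coded.csv")

CATEGORIES = ["Landscapes", "Science", "People", "Animals",
              "Cartoons", "Literature", "Metaphors", "Other"]

# Count images per country and category; this table can then feed a treemap layout.
counts = (images[images["category"].isin(CATEGORIES)]
          .groupby(["country", "category"])
          .size()
          .reset_index(name="n_images"))

print(counts.sort_values("n_images", ascending=False).head())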

Here, further analysis could be done of the resonance of countries per online issue space and per sub-issue. The resonance per sub-issue could then be compared to the ‘hot spots’ defined by the vulnerability indexes per variable (as listed in GAIN and DARA). Furthermore, additional sources could be used, such as blogs.
Another issue that deserves continued research concerns the countries that occurred as both vulnerable and resilient in the triangulated lists (see above, in a lined pattern on the vulnerability map). Such countries – the US, Brazil, Germany, Italy and Australia, among others – could have been scored both negatively and positively because of the very different indicators used by the sample indexes. For instance, Germanwatch focused on the effects of extreme weather, as quantified through human and financial loss, while GAIN and DARA captured other factors, such as health, infrastructure, habitat and ecosystems. Thus, it would be interesting to see, per sub-issue, why a country can rank both low and high, and also whether this contradiction is reflected on the Web.

Team: Federica Bardelli, Gabriele Colombo, Carlo de Gaetano, Stefania Guerra, Tommaso Renzini

posted by Michele Mauri
Tuesday, November 19th, 2013

“What the Frack is going on?” at Toulouse Novela Science Festival

“What the Frack is going on?” or, more briefly, “What the Frack?”, is a visualization project developed during the Integrated Course Final Synthesis Studio A.Y. 2012-13, mapping the controversies around soil degradation and hydraulic fracturing. The authors were awarded the opportunity to present the whole work at La Novela – Fête Connaissance, a science festival that took place in Toulouse from September 28th to October 12th, 2013.

Our students’ presentation was part of the International Prize of Cartography of Controversies, promoted by Bruno Latour, whose aim is to collect and publicly present the best projects of the year around topical controversies in order to provide citizens with tools to understand the debates and take a stand within them.

Chiara Andreossi, Massimo Guizzetti, Cristina Palamini, Giulia Peretti and Silvia Recalcati presented on Monday October 7th, at 6pm, in the Salle du Sénéchal in front of students from French (Telecom Paris Tech, Les Ponts Paris Tech, Les Mines Paris Tech, Sciences Po) and American (Princeton University School of Architecture) institutions.

Have a look at their website www.whatthefrack.eu to see all the project outputs and to find out more about the topic!

posted by Michele Mauri
Thursday, November 14th, 2013

A short interview on Raw

Recently, “The Why Axis” published a post on Raw. They asked us some interesting questions; here we post the complete answers.

Q: How did you decide to transition Raw from an internal to a public tool? How did that evolve?

As a research lab, we are always committed to exploring the potential of data visualization. In order to obtain reproducible visualizations and facilitate our design process, we have developed numerous scripts over the last years. As one-shot scripts, conceived and optimized for very specific tasks and contexts, they are usually very difficult to share and reuse, often even for us. Being deeply involved in educational activities, both at the Politecnico di Milano and in other universities and institutions, we have seen how time-consuming the visualization process can be for students, forcing them to concentrate their efforts on the development stage instead of exploring and experimenting with the data and with new ways of visualizing it.

For these reasons, we tried to capitalize on our efforts by making scripts and tools more reusable. Raw is the result of this process. It is open source because we hope to involve the data visualization community in collaborating on and implementing the best visualization techniques available.

Q: How did it fit into your workflow at DensityDesign?

Most of our work has an exploratory nature, and often even the domain experts we work with do not know how to make sense of the data. Raw is first of all a data exploration tool that allows us to quickly produce, discuss and edit visualizations in order to better understand the data. When we find a promising storyline we do most of the data refinement and manipulation with tools like OpenRefine, Excel or Google Spreadsheets, and partly with custom scripts (mostly in Python, JavaScript or Processing). Since Raw lets us visualize datasets very quickly and easily, we usually try different layouts (which sometimes prompts us to go back to the data and rethink our process). Once we are happy with a visualization, we export the SVG and edit it. At this point we do most of the graphic refinements using vector graphics tools (e.g. Adobe Illustrator), according to the medium the visualization is built for.

Q: How did you decide which layouts to include in the first release? Were there any reasons you avoided simpler layouts to start?

Many of the layouts come from work we have done over the last years (e.g. “Design Research Map”, “Link#10: Decode or die”, or the work we did for “La Lettura – Corriere della Sera”). These layouts allow us to perform visual analyses that are not possible with techniques already available in software such as Microsoft Excel or Adobe Illustrator. Some others (e.g. dendrograms or sunbursts) come directly from d3.js (or its community), and we decided to include them as well, partly to test how difficult it would be to add new layouts to Raw. We avoided “simple layouts” like bar charts or pie charts simply because there are plenty of tools that already allow you to create them in simple and effective ways.

Q: What are the plans for the future of RAW? How can the larger data visualization community be a part of these plans?

As for the near future, we are currently working on a major redesign of Raw’s architecture, in order to provide APIs to easily create and customize visual layouts and data models. At the same time, we are consolidating the documentation, allowing other developers to understand and improve the code. Our hope is to create a community around the tool, in order to gather new ideas and discuss uses (and misuses) of visual layouts. Moreover, we would like to add new layouts (you can find more information about this here: https://github.com/densitydesign/raw/wiki/Layouts) and understand if and how we can extend the tool to create visualizations based on more than one dataset, such as graphs or geographical data visualization.

Our long-term goal is to understand how to build a sustainable research model around Raw. As we stated, Raw was born as an internal response to our own needs, and we had no idea what kind of reactions this tool would generate. So far, reactions have been extremely positive and we would like to spend more time and resources on this project. However, being a self-funded research project, and knowing the amount of effort this kind of tool requires, we are aware that the main issue will be providing continuous maintenance and support. We are still in the process of understanding and evaluating the possible solutions, so suggestions and/or collaborations are very welcome!

posted by Michele Mauri
Thursday, October 24th, 2013

“Around the world: the atlas for today” featuring our students’ works

Feature Density Design - Around the world: the atlas for today

We just received and unpacked “Around the world: the atlas for today”, published by Gestalten. The book features some of our students’ works, made during the Integrated Course Final Synthesis Studio A.Y. 2012-13.

The posters were created during a six-month course on the visual representation of complex phenomena, in which students analysed controversial topics and also developed visual reports, short animations and interactive applications.

Below you’ll find the list of published works and a link to each project page: have a look if you’re interested in the making-of or want to learn more about each topic!

“I can choose, right?”, part of the “Unborn discussion” project, by Alberto Barone, Maria Luisa Bertazzoni, Martina Elisa Cecchi, Elisabetta Ghezzi, Alberto Grammatico.

“The cradle of change”, part of “The morning after pill in Italy”, by Viviana Ferro, Ilaria Pagin, Sara Pandini, Federica Sciuto, Elisa Zamarian.

“Meat or threat?”, part of “The palm pattern” project, by Irene Cantoni, Claudio Cardamone, Sara De Donno, Fabio Matteo Dozio, Arianna Pirola.

“The energy decision change”, part of “Every light has its shadow” project, by Giulio Bertolotti, Elia Bozzato, Gabriele Calvi, Stefano Lari, Gianluca Rossi.

Also “Cover Mania”, originally published in La Lettura #8, has been featured.