Transparency, algorithms and data

Notes from the Data Transparency Lab 2017 in Barcelona

As part of Open Knowledge Finland, I attended the Data Transparency Lab (DTL) Conference in Barcelona last December. Unfortunately I only managed to attend the second day, but since the sessions were recorded, I will be watching the rest online soon. Even so, the second day had a lot of interesting talks and people, and I thought I’d note some of the highlights here.

The sessions seemed to have an emphasis on transparency in times of algorithmic complexity, which was great to listen to, even more so because the topic was covered from multiple angles: policy, design and ethics, as well as practical demos, each in its own session. In total, the day was a mix of high-level discussion and practical applications. In this post I’ll try to summarise a few of the talks, give some links to the people when possible, and add a few of my own thoughts as well.

The day started with a keynote from Isabella de Michelis from ErnieApp, who described their startup’s point of view on companies and personal data. They argue that once users are aware of how they generate value for companies, they would be more willing to shift their permissions towards a vendor-type relationship rather than a customer one. For example, a household that owns an IoT-connected fridge should be able to get a discount on its electricity bill (since information about the fridge’s contents would be shared, e.g. with supermarkets, for profit). An interesting position for sure, and one that arguably clashes with a later presentation from Ilaria Liccardi.

Image from Liccardi’s research on user permission sharing. http://people.csail.mit.edu/ilaria/papers/ShihCHI15.pdf

Ilaria Liccardi, as part of the ‘How to foster transparency’ session, presented her research from MIT on how the privacy permissions people grant to the apps they use change depending on the information provided. The results are not as obvious as one might expect. They found that people are much more willing to give permission for the use of their data when no indication of how it will be used is given. However, they are less willing to give permission when the reason for use is vaguely worded, and yet somewhat permissive when the company provides more detailed information. The full research can be found on her page.

These are interesting findings. They imply that for the general public to understand and consent to giving out personal data for company profit, there needs to be both an initial motivator from the company side and a good balance of sufficient and clear information on its use.

Namely, if people are more permissive when knowing less, then it is possible that the path to transparency won’t be as user-driven as expected.

Nozha Boujemaa from the DATAIA institute (who was also the chair of that session) nicely put it that ‘data-driven does not mean objective’, which is something I can imagine myself repeating endlessly, especially in the context of data-driven journalism. A personal favourite entry point to the topic is the work on feminist data visualisation by Catherine D’Ignazio, who explains how data and datasets can be wrongfully coupled with the idea that they are objective or present the truth. Nozha Boujemaa also discussed what computational decision-making systems would need in order to be considered transparent. She noted, for example, that many Machine Learning (ML) algorithms are open-sourced yet not ‘open-data-ed’, meaning that the data they have been trained on is proprietary. Since the decisions are based on exactly these trained models, the openness of the algorithms alone is less useful.

The CNNum, the French national digital council, is trying to figure out how to incorporate and implement these principles of accountability in the French regulatory system. Moreover, their approach seemed to be really aware of the environment they are trying to regulate, in terms of its diversity and the speed of its change cycles, and they made a good point about how difficult it is to assess the impact of the power asymmetry caused by algorithms.

Jun Huan (NSF) with the building blocks to model transparency holistically. Photo by author.

From the National Science Foundation (USA), Jun Huan went a step beyond accountability for algorithmic systems, towards their explainability and interpretability. In their research they are creating ways for ML systems to identify, and therefore indicate, when they are ‘learning new tasks’, inspired by the constructivist theory of human learning. It definitely sounded promising, though due to my shallow knowledge of the topic, I am prone to algo-sensationalism!

Simone Fischer-Hübner, professor at Karlstad University, chaired the session on ‘Discrimination and data Ethics’. She presented real-world cases of discrimination and bias, such as racial discrimination in online ad serving, price discrimination based on the operating system used, predictive policing, and a case of gender discrimination in bibliometric profiling based on biased input data. The last example is especially interesting because it highlights that when we talk about biased computer decision-making systems, we tend to mean the computational side of the system. In bibliometric profiling in academia, however, women are less likely to self-cite, so the ranking already starts from a biased sample of reality.
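To make the self-citation point a bit more concrete, here is a minimal sketch of how a ranking that never looks at group membership can still reproduce a bias that is already in its input. Everything in it is an assumption made for illustration: the authors, the two groups, the citation counts and the function names are hypothetical, and it is not the profiling method discussed in the talk.

```python
from dataclasses import dataclass


@dataclass
class Author:
    name: str
    group: str               # hypothetical group label; never used for ranking
    external_citations: int  # made-up citations received from other people
    self_citations: int      # made-up citations to one's own work

    @property
    def total_citations(self) -> int:
        return self.external_citations + self.self_citations


def rank_by_total(authors):
    """Rank by raw citation count (self-citations included), highest first."""
    return sorted(authors, key=lambda a: a.total_citations, reverse=True)


def rank_by_external(authors):
    """Rank by citations from others only, highest first."""
    return sorted(authors, key=lambda a: a.external_citations, reverse=True)


# Hypothetical data: group A self-cites at roughly 30% of its external
# citations, group B at roughly 10%.
authors = [
    Author("a1", "A", external_citations=100, self_citations=30),
    Author("b1", "B", external_citations=105, self_citations=10),
    Author("a2", "A", external_citations=80, self_citations=24),
    Author("b2", "B", external_citations=90, self_citations=9),
]

for label, ranking in [("total", rank_by_total(authors)),
                       ("external", rank_by_external(authors))]:
    print(label, [f"{a.name}({a.group})" for a in ranking])
```

With these invented numbers, each group B author is cited more often by others than the group A counterpart, yet ranks below them once self-citations are counted. The ranking code itself is neutral; the skew comes entirely from the input.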

Julia Stoyanovich explaining why we should assess data science in its whole lifecycle.

Julia Stoyanovich argued that we should be assessing fairness, accountability and transparency throughout the full data science lifecycle in order to draw valid conclusions about such systems. Her research with Data Responsibly is centered around this topic as well. Last but not least, Gemma Galdon Clavell from Eticas presented their approach to research and consulting on ethical issues arising from technology, and stressed the importance of creating criteria for assessing the impact of technologies on society.

DTL funds projects globally, many of them awarded to university research groups, to create tools on the theme of data transparency. Part of the sessions was devoted to the tools developed during 2017. Most of the demos seemed to be browser add-ons (and an app, if I recall correctly) that inform users about privacy leaks and ad targeting. Relevant topics for sure, though I admit I did catch myself pondering the irony of most privacy-related add-ons being developed for Chrome.

The way I see it, these subjects should eventually be discussed with even wider audiences, since they will affect most of us in our daily lives. It is therefore great to hear first-hand from the researchers, dedicated circles and organisations who are actively working on these topics.

Keep it open!