The politics of image search - A conversation with Sebastian Schmieg [Part II]
I’m struck by the pedagogical quality of your project Search by Image: when people see it, they immediately get an understanding – not necessarily a technical one – of how many things are mediated by algorithms. We started Part One of the interview reflecting on the different educational disciplines and contexts you have traversed, and I wonder: what is the importance of a pedagogical dimension in the work you’re doing?
This is a good question; a question that I’m concerned with all the time. I’m not an artist who is situated in the art world. For me it is more of a process of trying to understand things and turn them into a practice while working with people who are thinking about the same things and maybe contributing to what they are thinking about.
I hope my work is not pedagogic. I’m always trying to keep some kind of ambiguity. I’m not trying to say, look, this is good or this is bad. But obviously, I’m really interested in criticality: I would say that I’m concerned first with critical and conceptual problems and only secondly with the visual.
On your question regarding the different backgrounds, I would say that I’m always going back and forth, between two poles. I’m not a computer scientist, but I have some understanding of the field. Having a programming background and understanding coding was more important in the days when Search by Image (2011) was conceived. Today, I think that not having an understanding is also an advantage, as the code can only lead you so far.
There is a dimension of ‘reverse engineering’ in your work that is very strong – in Search by Image, for example, you probe the search algorithm in order to understand how it works. I’m intrigued by this practice as the more I read about machine learning algorithms, the more I notice computer scientists are quite puzzled by the behaviour of their algorithmic creations, because they cannot understand what’s happening by simply looking at the code.
Technologists have now developed a method called “adversarial imaging”, where they will feed images instead of code into a program to understand how it works. I was thinking about how you uploaded a transparent image to Google in a version of Search by Image, I wondered if this was already a form of “adversarial imaging”? Perhaps there is now scope for the computer sciences to fully embrace the importance of visual practices of computing not just as a hobby, but as core to their discipline.
Absolutely. I recently started reading The Human Condition by Hannah Arendt, a book from the 1950s, which has relevance to this discussion. In the prologue of the book, Arendt explains that we have come to a point where humans have the logic and the math to create machines, and can prove that the math works. However, what we lack is the precise language to describe what the machine is and what it does. And even though it’s our own creation, we might not understand it any more.
Today we find ourselves needing machines to do our thinking, seeing and speaking. So we get into the position where we have to follow our own creations. Or, as Arendt says, we have “indeed become the helpless slaves, not so much of our machines as of our know-how, thoughtless creatures at the mercy of every gadget which is technically possible, no matter how murderous it is”. Perhaps, then, the role of visual practice is to help grasp what we don’t understand anymore.
With respect to your point about reverse engineering, I guess initially I would have said OK, Search by Image utilises this as a strategy, to some degree. But then you would need to reverse engineer my project to understand how I used Google Search, so it’s messy. Perhaps a better question is: what would I find if I reverse engineered Google Image Search – would I find a system that could be used in a way that I might find more interesting?
What do you think? Because I wonder whether reverse engineering liberates something that has the potential to be used differently. Or perhaps the process reveals an existing power that we might not want to use any further?
What is really interesting about your work, and its continued relevance, is the fact that it’s open ended. It’s connected to what Geoff Cox calls “the performativity of code”. Before code executes you don’t really know what its agency is, so it’s not just about making a statement but also making ways to create a relationship with this other dimension.
But I do have the impression that your work is pedagogical, not in the sense that you tell people what they should think, but by engaging them in a practice of learning. It’s also probably because (as you mentioned earlier) your work is also a tool that you build, which allows you to learn yourself. So it’s very much about the process of learning – not just about working out what is happening, but also about how and what the algorithm is learning about us. These are concepts that fold into each other, which makes it very rich, and pleasurable too, to engage with such a project over a longer time. It’s not just about making a few clicks and getting the joke…
Regarding the performativity of code… The thing I realised when viewing Search by Image on the Media Wall at The Photographers’ Gallery is that it cannot be shared on social media, because the work moves so fast that it can’t be photographed. Of course, I love that people take pictures of it, but it makes no sense to take a picture of it: you are simply taking a picture of a picture that momentarily appears on a screen. It is something that only makes sense over time, it has to ‘perform’, and people have to spend some time with it.
How much importance do you give to the fact that it is live?
To me it is very important to exhibit it live, so that it can evolve from day to day. You don’t see it when you interact with Google search online, but its search index is very fragile, it’s re-linking itself all the time, and to me it was important to make this visible. Also, I really wanted to have this alternation between the images of Lena and Fabio. Lena is the name of an iconic test image used in computer science since the early 1970s. It is a cropped photo of Swedish model Lena Söderberg that appeared in Playboy magazine in 1972. Fabio, on the other hand, is a test image depicting actor and model Fabio Lanzoni. Lanzoni’s image was introduced by Deanna Needell a few years ago to address gender issues in the scientific community as well as in society at large. When considering this story, it becomes clear that it’s not just artists who are thinking about these things. It’s also people inside computer science who are addressing the history around Lena as a problematic thing.
That’s interesting because Lena appears at a point where there were no distinctions between those who created algorithms and those who created the datasets. An early paper by David Marr, for example, proudly shows his visual data set of … six images. So, it was a very specific and small community of people who were, at the time, involved in the work of image annotation, and constituting training sets in machine vision. And they were doing this with their own bias, their own pre-conceptions about gender, their own ideas about race and so on. But today, this work is generally outsourced to a group of workers subject to a specific set of working and economic conditions – for example, those on Amazon’s Mechanical Turk, who could be paid a few cents per annotation, working without any social protection.
We are in a paradoxical situation. On the one hand, annotation is increasingly considered as what fuels machine intelligence and makes algorithms relevant. And, on the other hand, the conditions in which these annotations are produced force the workers to make them in a hurry, paying minimal attention.
I totally agree with that. For example, there is an algorithm based on data generated by Amazon's Mechanical Turk, which can predict depression based on your Instagram photos. Apparently, it's more reliable than a general practitioner, although not very reliable overall, and dependent on what filter you use. Anyway, the interesting part was that the research revealed that many of the Mechanical Turkers who supplied the data are actually depressed. So that made me think… what do people create under these conditions?
To me it’s like when you have a nice pullover: it might be wonderfully cosy, but it could have been made by a child in hazardous conditions – and as a consumer it might be impossible to know. I think that’s really what is at the core of this discussion around Artificial Intelligence – the whole AI thing is a very capitalistic setup. However, it’s not just pullovers being made under poor working conditions but also services and products, which are increasingly driven by AI.
You mentioned that you are discussing these ideas with others. I was wondering, who are the people you are in conversations with? Are these people living around you, or groups of artists, thinkers, or computer scientists?
I’m doing a lot of collaborations in general and this topic has relevance to the work that I do with Silvio Lorusso, for instance. We produced the book Networked Optimization, which was about making visible the micro-labour that you do when you read an e-book, which generates data for Amazon. In this respect it’s important for us to acknowledge it’s co-authored by us and by Amazon Kindle readers.
Another project which is important to me, and which I’m still trying to find a way to resolve, is the piece I made in 2015, ‘How to appear offline forever’. It addresses efforts by Facebook and Google to build swarms of solar powered drones and balloons that would allow more people to connect to the Internet. The piece is about visibility, labour and colonialism in a networked world.
It involved working with two writers in Sri Lanka, and it turned out that the only way for me to work with them was to use an outsourcing platform, which I felt very uncomfortable with. I had a good budget from Abandon Normal Devices, who commissioned the project; however, I ended up spending almost all of it on outsourcing platforms. As embarrassed as I was, at least I wanted to pay them well. Later I got the opportunity to travel to Sri Lanka and talk about the work at Colomboscope. I invited one of the two ladies I worked with, and had a conversation with her. It was very important for me to involve her perspective on it and to make clear that this is a shared project. It is easy to talk about bad working conditions and so on, but it is very different to hear about it from people who are actually subject to it, but who still think the platform is great.
That brings me, maybe, to the last question. I’m interested in the connections between the user who exchanges images on social media, and the outsourced worker who annotates images on Amazon Mechanical Turk. What kind of connections can we make between somebody who is working all day, describing and processing images, trying to make as much money as they can… and people updating and annotating their Facebook or Flickr accounts? I can see that there are clicks produced by both groups, but are these two contexts so different? For instance, with Google, when do they need professional image annotators, and when do they need data generated by internet users ‘in the wild’?
It seems to me that it’s both different and not different at the same time. I was wondering if you had any thoughts around these relations, because for me it is really puzzling.
First of all, I think the connection between a person uploading a picture, and then another person looking at that picture and annotating it – who is absolutely not the intended audience – is an interesting one. There is this really weird connection between the image annotator and other people who are actually living their life, sharing nice photographic moments.
When you speculate about how many professionals Google requires, I’m beginning to think that Google doesn’t actually need any professionals. For instance, for the past five years Silvio Lorusso and I have been collecting all the CAPTCHAs that we solved, so I think I have a pretty good understanding of how these systems evolve and how they put the user to work.
Interestingly, the very latest reCAPTCHAs ask users to draw a line around street signs – not a rectangle, but a proper outline, like in a segmentation network. This is what you or I do when we have to convince Google that we are not a robot. So apparently for them, it’s work that anybody can do. They just need to generate enough data.
So can we compare our clicks to those who actually make a living, or try to make a living, through Mechanical Turk?
That’s why I was so puzzled by the conversation with the lady in Sri Lanka – she was so happy with the outsourcing platform, and I thought it was terrible. The employer who hires somebody on Mechanical Turk can view the Turkers’ screen anytime, to make sure they actually work. The lady really thought this surveillance was justified in the name of quality, because she believed other people don’t work, but act as if they do. I find this really messy, but what I’m always trying to do is to make a connection. It is not just ‘them’, it is all of us, and the system itself, which is exploitative. Artificial Intelligence is built on quite an exploitative system, and is dehumanising as well. It’s dehumanising the workers and it is dehumanising the rest of us. I really hope stuff like platform co-operativism has a future where we all together – workers and users – own the platforms and the infrastructure in general. After all, we're the ones building and maintaining it.
So you think that with fairer conditions this kind of AI “training” method which requires millions of clickworkers could be interesting or…?
That’s what I was trying to get at with your question about reverse engineering. At its core, is such a system already exploitative by design or not? I did a talk recently about these topics, and I got asked whether I think neural networks are exploitative. And I said: “No, it’s just an algorithm”. On reflection, I still think it’s true but the whole setup is problematic and I don’t see how it could become a positive thing any time soon.
I’m struggling with that myself. I’ve been doing some reading exploring where the method of “training” used in neural networks comes from. It is linked to a certain practice in psychology, of testing animal intelligence, where lab animals were either rewarded or punished. It seems that this historical background cannot be taken lightly, because when you see how “training” is implemented via platforms such as Mechanical Turk, the history of training in the lab comes to mind. The Turkers are not just doing their work; they are opening up their cognitive processes to scrutiny. They have to respond to challenges very fast and are rewarded or not by the requester, with no further obligation on their part to explain why they accept or refuse to pay for the work done. Just as the lab animal needs to find its way through the maze, the Turker needs to figure out what triggers the reward or the punishment: which answer to give in order to avoid the requester's rejection, without any possibility of establishing contact or negotiation. In the case of an image description task, for example, what is an appropriate label? Which one corresponds to the requester's expectation?
Behind every choice of words, there is the perspective of gaining or losing the task's reward. And in turn, the algorithm is implemented with similar methods. The more the algorithm approximates the rules that are in the training data, the more it is rewarded and the more the processes that lead to this successful approximation are reinforced in the programme. If the learning in machine learning is reduced to reward and punishment, we can wonder what kind of intelligence is produced there. It’s true that it is really important to pay the Turkers better. But is it enough to really change what is wrong with the system? Where should we look for a more fruitful approach to training and learning?
Yes, absolutely. I think the connection to training an animal, or to making these experiments, is spot on. I’m also thinking of the time-motion study technique, a method to determine the most efficient movement of an industrial worker, so it can be extrapolated to all workers in the name of optimisation and efficiency. And in the case of machine vision, as we’ve been discussing, the task is to extract little bits of knowledge, bit by bit… So, on the one hand you have the training of the system, but on the other hand the misery of the continuous draining of cognitive resources. You suck it out bit by bit, and this doesn’t change when you pay better. Ultimately, it is about what you can extract, until, in the end, you don’t need the person any more – as in Vonnegut's “Player Piano”.
This was my point in generating photos based on the “This is the Problem, the Solution, the Past and The Future” dataset for my online commission at The Photographers’ Gallery. Of course these images won’t represent the future 100%, but I’m also saying – in a very unsuccessful way – “Look! I’ve got all your photographers in my computer now, I can make my own photos through your previous work…!”
Arendt, H. (1998) The Human Condition. Chicago and London: University of Chicago Press. 2nd edition. p.3
Suggested Citation: Malevé, N. & Schmieg, S. (2017) 'The politics of image search - A conversation with Sebastian Schmieg [Part II]', The Photographers’ Gallery: Unthinking Photography. Available at: https://unthinking.photography/articles/the-politics-of-image-search-a-conversation-with-sebastian-schmieg-part-ii