Unevenly Distributed

Florian A. Schmidt is a professor of Design and Media Theory at the University of Applied Sciences HTW Dresden.


The Future Is Here!, the title of Mimi Onuoha’s video project reflecting the human side of crowdsourced image labelling, is spot on. The stories I have been told by crowdworkers from across the globe doing this work full-time indeed often have an eerily Gibsonian ring to them. Especially the stories from Venezuela.

As a researcher specialising in digital labour, I had been following the value chain of German car companies backwards, trying to figure out who is doing the image labelling that underpins the ambitious (and potentially over-ambitious) goal of producing fully autonomous vehicles. (This investigation was funded by the trade-union-affiliated Hans-Böckler-Foundation and published in 2019 as a report in German as well as in a condensed English version.)

As it turned out, the supposedly self-learning algorithms for the supposedly self-driving cars still need a myriad of helping hands from humans to get things straight. And because the hype for AI in general, and for autonomous vehicles in particular, coincided with the collapse of the Venezuelan economy, people from this country accounted for up to 75 percent of the workforce on some of the largest platforms specialising in crowdsourced image annotation in 2018 and 2019.

Typically, these images are stills taken from videos shot in traffic, which are then manually annotated so that a machine can recognise each and every object within the frame. Humans have to, for example, draw so-called bounding boxes around cars or assign descriptive labels to every pixel in the video frame. The latter, so-called semantic segmentation maps, are currently the most common and most time-consuming of the various forms of image labelling.
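To make the difference between the two annotation types concrete, here is a minimal sketch in Python. It is an illustration only, not the format of any particular platform: the class ids, the toy 4x6 "frame" and the helper `bounding_box` are all hypothetical. A segmentation map stores one label per pixel, whereas a bounding box is just four coordinates enclosing an object.

```python
# Hypothetical class ids; real annotation projects define their own label sets.
ROAD, CAR, PERSON = 0, 1, 2

# A toy 4x6 "frame": a semantic segmentation map assigns one class id to every pixel.
segmentation_map = [
    [ROAD, ROAD, ROAD, ROAD, ROAD, ROAD],
    [ROAD, CAR,  CAR,  CAR,  ROAD, PERSON],
    [ROAD, CAR,  CAR,  CAR,  ROAD, PERSON],
    [ROAD, ROAD, ROAD, ROAD, ROAD, ROAD],
]


def bounding_box(label_map, class_id):
    """Return (x_min, y_min, x_max, y_max) enclosing all pixels of class_id, or None."""
    coords = [
        (x, y)
        for y, row in enumerate(label_map)
        for x, value in enumerate(row)
        if value == class_id
    ]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


car_box = bounding_box(segmentation_map, CAR)        # (1, 1, 3, 2)
person_box = bounding_box(segmentation_map, PERSON)  # (5, 1, 5, 2)
print("car:", car_box, "person:", person_box)
```

The box needs four numbers per object, while the map needs a decision for every single pixel, which is one way to see why full segmentation takes so much longer. Note that this simple helper would merge several cars into one box; telling individual objects apart requires additional instance labels.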

The annotations have to be as detailed as possible so that the algorithm can learn to recognise objects, eventually without supervision, and learn to predict how these objects – vehicles or people – are going to behave in traffic. The reliability of the predictive machine-learning models is directly dependent on the precision of this 'ground truth' or training data and the production of this can only partly be automated.

A full semantic segmentation of an image can take a human up to two hours to complete, so for the automotive companies who need this data in bulk, fast and with high precision, this work can quickly get very expensive, especially if done in-house. This is why the car companies use specialised outsourcing platforms such as Scale, Hive, Playment, Figure Eight and, until recently, Mighty AI. Older platforms such as Amazon Mechanical Turk, which are also used for image annotation in other fields, are not precise enough for the demands of the car industry, which only buys data with a guaranteed accuracy of over 99%.

At Scale.com, the semantic segmentation of a single image with 99.2% accuracy costs clients $6.40 – and they usually need hundreds of thousands of these images – which explains both why the industry is attracting a lot of capital and why it needs so many crowdworkers. In effect, the influx of capital is conjuring the crowd-based workforce into existence, rapidly creating the oversupply of labour needed to buffer the ebbs and flows in demand. The platforms translate the tasks into the language of the labour market they want to access, be it India, China or the Spanish-speaking part of the world, and the workers find their way to the tasks on offer, for example through special forums on Reddit. For the car AI tasks, hardly any cultural knowledge is needed, just the ability to focus for long hours.
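A rough back-of-the-envelope calculation shows the scale involved. The price per image and the upper bound of two hours per full segmentation are the figures cited in this text; the batch size of 200,000 images and the 2,000 working hours per person-year are assumptions chosen purely for illustration.

```python
# Figures taken from the text.
PRICE_PER_IMAGE_CENTS = 640   # $6.40 per segmented image
MAX_HOURS_PER_IMAGE = 2       # "up to two hours" per full segmentation (upper bound)

# Assumptions for illustration only, not reported figures.
ASSUMED_IMAGES = 200_000                 # one reading of "hundreds of thousands"
ASSUMED_HOURS_PER_WORKER_YEAR = 2_000    # roughly 40 hours x 50 weeks

# Work in integer cents to avoid floating-point rounding on money.
total_cost_dollars = PRICE_PER_IMAGE_CENTS * ASSUMED_IMAGES // 100
max_labour_hours = MAX_HOURS_PER_IMAGE * ASSUMED_IMAGES
max_worker_years = max_labour_hours / ASSUMED_HOURS_PER_WORKER_YEAR

print(f"Client cost:        ${total_cost_dollars:,}")      # $1,280,000
print(f"Labour upper bound: {max_labour_hours:,} hours")   # 400,000 hours
print(f"Equivalent to:      {max_worker_years:,.0f} full-time person-years")  # 200
```

Under these assumptions, a single batch would cost a client well over a million dollars and could absorb up to a couple of hundred person-years of labour, which gives a sense of why both capital and crowdworkers flow into this niche.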

In contrast to content moderators working for US social media platforms, who, as a legacy of colonialism, are often based in the Philippines (see the new book by Sarah T. Roberts or the 2018 documentary The Cleaners), the workers doing image annotation for the car industry don’t suffer from the nature of the work as such. It is exhausting work, but not soul-crushing, and it can give workers a rewarding sense of mastery to finish an image with high accuracy. The workers do, however, suffer from the extreme precariousness of the work – the constant worry whether an algorithm or management will allow them to work the next day. In a good week, on a good platform, they can make 50 dollars, in a bad week only 10, which is still more than what they can earn offline with a regular job in Venezuela these days.

At first glance, the people training the AI systems are normal knowledge workers, sitting at their laptops, clicking away on the Internet all day like most freelancers. Their home office, as Mimi Onuoha’s project shows, might look similar to yours. Yet, to make ends meet, they not only have to subject themselves to algorithmic management and gamification regimes – more importantly, they have to navigate highly complex, volatile and opaque global markets for the production of ground truth data.

They work as freelance sub-sub-contractors, switching back and forth between different platforms that funnel the work from supranational corporations to people in the Global South, while the car companies try their best to stay anonymous. "We don’t know who we are actually working for", says Jose, a 26-year-old civil engineer from Venezuela, "but we would feel better, more part of the project if we did. We used to see quite a lot of images from Germany, we assumed from Volkswagen and Porsche, but not anymore; recently it was more California, San Diego, that’s why we thought Tesla". Ironically, the sub-sub-outsourcing goes so far that the car companies and crowdsourcing platforms can’t be sure either who is actually doing the work in the end, because the workers, in turn, rent out their accounts to others when they are not using them. Both ends of the tenuous sub-outsourcing chain are unknown to each other.

While the Venezuelan workers are hyper-connected via international groups on Slack and Discord – official channels run by the companies and unofficial private channels to help each other with the tasks and rent out accounts – they are often physically stuck in abject poverty with outdated computing equipment. Their livelihood is constantly threatened by blackouts, corruption, organised crime and food shortages on one side of the screen and by the capriciousness of algorithmic management, venture capital flows and geopolitical sanctions on the other.

Douglas, a 21-year-old engineering student from Venezuela, explained to me via Skype: "The situation is better than five years ago, mainly because – this may sound crazy to you – most of the criminals have fled the country. The crime rate is still really high, but it is more secure now to go outside without getting robbed. But there is not much for me out there anyway, because I do almost everything here on the computer". However, staying at home didn’t protect Douglas from getting robbed: "During one of the blackouts, people climbed into my courtyard, where, right under my window, I keep some livestock for extra food. They took a few chickens and climbed back. Behind my house, there is a kind of wasteland with improvised settlements and from there people must have observed that I have chickens here".

Douglas and Jose both used to work for the Seattle-based Spare5, by far the most popular of about a dozen crowdsourcing platforms specialising in data annotation for the automotive industry. Until recently, the company was known to its corporate clients as Mighty AI. It has become a common phenomenon that these crowdsourcing firms appear Janus-faced, with complementary brand names and websites: they have a customer-facing front, emphasising their AI prowess, and a worker-facing back entrance, emphasising the opportunity to quickly earn a handful of dollars. (Scale AI is actually Remotasks; The Hive AI is actually Hive Work.)

In the summer of 2019, Uber ATG acquired Mighty AI, discontinued that side of the brand and began restructuring how work is organised on the accompanying worker platform Spare5, which at that time had a freelance workforce of half a million people. Three quarters of these so-called 'Fives' came from Venezuela. Immediately after the acquisition and without notice or proper explanation, Uber geo-blocked all Venezuelan workers from accessing the platform.

Andrea, a long-time member of the 'Super5s', an elite group of a few hundred of the most productive and accurate workers, hitherto always in close contact with management, recalls: "When we realised that us losing access was not a technical error, but an intentional action to leave us on the sidelines for an indefinite time, it was as if the floor under our feet disappeared". At first, it seemed that Uber’s exclusion of Venezuelan workers was due to US sanctions against Maduro, because the workers got paid only three weeks later and only after signing a statement that they were not affiliated with the government.

One of the few viable alternatives for Venezuelan online workers is to grind for virtual gold or 'dragon bones' in the old-school online role-playing game RuneScape, or to level up characters in League of Legends. This is a type of labour done for clients in the Global North, who prefer to outsource the drudgery of their hobby in order to skip a few rungs of the ladder in their make-believe status game. All sides in this situation engage in arbitrage on the extremely uneven distribution of wealth, time and power in a hyper-connected global market for information and labour. Since the beginning of 2020, Venezuelan workers have been allowed back in at Spare5, while other countries are now shut out without explanation. At the time of writing (February 2020), 67% of the web traffic to Spare5 comes from Venezuela again.

The crowdsourced production of ground truth image data is likely to grow even further over the next few years. Yet, from the workers’ perspective, it remains an inherently unsustainable and unpredictable line of work that is threatened at every point by additional layers of automation and sub-outsourcing. Machines will be able to do more and more of the image annotation tasks, but will have to be continuously trained by humans for new tasks and new edge cases. Paradoxically, while it is hard to automate the training of the machines, it is easy to automate the training of the people who are then going to train the machines.

Within these systems, the crowdworkers are just another cognitive processing layer within a much larger automation and outsourcing apparatus. Humans and machines form a cybernetic organism, built from layers of artificial and human intelligence. Interestingly, the new specialist platforms try to gain a competitive advantage over each other by experimenting with different stacking orders: alternating successions of humans and algorithms. In this, the platforms not only resemble what Benjamin Bratton has described as 'The Stack' (Bratton 2016), but also show a structural self-similarity reminiscent of the various hidden processing layers within neural networks.

Because of the complexity, opacity, and contingency of these global image processing stacks, only raw input and annotated output can be observed; what happens in between remains practically unknowable. It is important to keep in mind that all the major car companies train their neural networks on different and differently produced data sets, with varying human-machine layers and varying classes of objects involved. If we get a future with fully autonomous vehicles, different car brands might show varyingly erratic behaviour in edge cases, depending on the diet of images they have been fed. Authorities responsible for traffic safety would either have to conduct extensive behaviourist tests with the cars, of course without ever being able to check all edge cases, or demand a fully transparent and standardised image annotation and training process from all brands, or even allow only one standardised universal system that all brands contribute to. Most likely, however, they will only allow these vehicles in controlled special zones, in which all edge cases caused by humans or animals can be excluded. For now, the future struggles with a present reality that is far messier than can be accounted for.