History and environmental impact of digital image formats
In the first century-and-a-half after Louis Daguerre announced his invention in 1839, photography was an analogue, chemical process. It involved plastic film coated with an emulsion of gelatin and silver halide crystals; color film stacks several such layers, with filter layers in between. Developing film requires yet more chemicals and a lot of water. All in all, it’s not a particularly eco-friendly technology: every single photo leads to some amount of toxic waste.
Digital photography is by comparison a lot more sustainable. There is no single-use film and no physical waste at all. Getting a photo printed still requires ink and paper, of course, but physical prints have become relatively rare: most photos taken today are never printed. They are mostly viewed on digital screens, such as a laptop, a mobile phone, a projector or a digital photo frame. In that regard too, photography has become more sustainable.
Or, has it?
While the environmental impact per photo has dropped dramatically, we also take many more photos than ever before. Most of us have a phone camera with us at all times. We are currently taking 1.6 trillion new photos per year. 90% of these are taken on phones and most end up in some form of cloud storage: either personal storage like Google Photos or Apple iCloud, or social media platforms like Instagram or Facebook. The total number of photos stored in one way or another is estimated to be 10 trillion, a number that is growing exponentially. This means that almost all photos ever created were taken in the last decade.
When digital photography first started in the 1990s, digital storage was still costly and limited in capacity. Even though images were of very low resolution by today’s standards, you could barely fit a single uncompressed photo on the storage media of the time.
The major breakthrough was lossy image compression. On a single 4-megabyte CompactFlash card, you could store either one uncompressed image or ten high-quality JPEG images. Lossy compression turned what would otherwise have been an impractical technology into a promising new way of taking photos.
Storage capacities grew exponentially, following Moore’s law, and so did the resolution of image sensors. It didn’t take long for digital cameras to match and then exceed the fidelity of analogue cameras, and the number of photos that fit on a single flash card soon exceeded that of a film roll. Moreover, photos could be instantly previewed on the camera itself, and bad ones could be removed to make room for better ones. In the first decade of this millennium, digital cameras completely overtook analogue cameras.
Lossy compression
Digital images consist of pixels, which are samples of the intensities of red, green and blue at a particular spatial position in the image — these are typically organized in a rectangular grid. The first image formats were uncompressed: samples were represented with a specific precision, typically 8 bit, and an image file was simply a long list of sample values. An uncompressed image with 8-bit precision for R, G and B takes three bytes per pixel, so 3 MB per megapixel. One way to reduce the size is by reducing the precision: for example using 5 bits for red and blue and 6 bits for green would reduce it to 2 MB per megapixel, or using a palette of only 256 colors in total allows representing a pixel in a single byte, reducing the size to 1 MB per megapixel. Still, uncompressed images are quite large.
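The per-megapixel figures above follow directly from the number of bits spent on each pixel. The short sketch below reproduces that arithmetic; it assumes decimal units (one megapixel is 1,000,000 pixels, one MB is 1,000,000 bytes) and ignores file headers, row padding and the palette table itself. The function and dictionary names are purely illustrative.

```python
# Back-of-the-envelope sizes for uncompressed images.
# Assumptions: 1 megapixel = 1,000,000 pixels, 1 MB = 1,000,000 bytes;
# headers, row padding and the palette table are not counted.

BITS_PER_PIXEL = {
    "RGB 8-8-8": 8 + 8 + 8,        # one byte each for red, green, blue
    "RGB 5-6-5": 5 + 6 + 5,        # reduced precision, two bytes in total
    "256-color palette": 8,        # one palette index byte per pixel
}


def uncompressed_megabytes(megapixels, bits_per_pixel):
    """Return the size of the raw pixel data in megabytes."""
    pixels = megapixels * 1_000_000
    return pixels * bits_per_pixel / 8 / 1_000_000


for layout, bits in BITS_PER_PIXEL.items():
    size = uncompressed_megabytes(1, bits)
    print(f"{layout}: {size:g} MB per megapixel")
```

Calling `uncompressed_megabytes(12, 24)` in the same way gives the raw size of a 12-megapixel photo at full 8-bit precision.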
JPEG dramatically changed this by introducing the concept of lossy compression. It takes 8x8 blocks of pixels, applies some mathematical sorcery, and represents the block in a way that ends up requiring only a few bits per pixel, as opposed to 24. In the process, information is lost, but in a way that visually makes little difference. When done right, lossy compression can produce images that are indistinguishable from the uncompressed image, in only a tenth of the byte size. This made it possible not only to store more photos, but also to share them more easily over the Internet.
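In baseline JPEG, the core of that “sorcery” is a discrete cosine transform (DCT) of each 8x8 block followed by quantization of the resulting coefficients. The sketch below illustrates only that idea and is not a JPEG encoder: it uses a single uniform quantization step (an assumption made here for simplicity, where real JPEG uses a perceptually tuned table) and leaves out the color transform, chroma subsampling and entropy coding. All function names are illustrative.

```python
import math

N = 8  # JPEG works on 8x8 blocks


def _scale(k):
    # Normalisation that makes the transform orthonormal.
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct_1d(values):
    """Forward DCT (type II) of a sequence of N samples."""
    return [
        _scale(k) * sum(values[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                        for n in range(N))
        for k in range(N)
    ]


def idct_1d(coeffs):
    """Inverse of dct_1d (DCT type III with the same normalisation)."""
    return [
        sum(_scale(k) * coeffs[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for k in range(N))
        for n in range(N)
    ]


def _transpose(block):
    return [list(row) for row in zip(*block)]


def _apply_2d(block, transform):
    # Separable 2D transform: first along rows, then along columns.
    rows_done = [transform(row) for row in block]
    cols_done = [transform(col) for col in _transpose(rows_done)]
    return _transpose(cols_done)


def compress_block(block, step):
    """DCT the block, then quantize: this rounding is where information is lost."""
    coeffs = _apply_2d(block, dct_1d)
    return [[round(c / step) for c in row] for row in coeffs]


def decompress_block(quantized, step):
    """Undo the quantization scaling and transform back to pixel values."""
    coeffs = [[q * step for q in row] for row in quantized]
    return [[round(v) for v in row] for row in _apply_2d(coeffs, idct_1d)]


# A smooth gradient, typical of sky or skin in a photo.
block = [[100 + 4 * x + 2 * y for x in range(N)] for y in range(N)]
STEP = 16  # hypothetical quantization step; larger means smaller files, lower quality

quantized = compress_block(block, STEP)
restored = decompress_block(quantized, STEP)

nonzero = sum(1 for row in quantized for q in row if q != 0)
max_error = max(abs(a - b) for ra, rb in zip(block, restored) for a, b in zip(ra, rb))
print(f"non-zero coefficients: {nonzero} of {N * N}")
print(f"largest pixel error after the round trip: {max_error}")
```

For smooth content most quantized coefficients come out as zero, and long runs of zeros are what the later entropy-coding stage stores very cheaply. The quantization step plays the role of the quality setting.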
Lossy compression comes with a ‘knob’ to adjust the trade-off between quality and file size. Of course, when taken too far, lossy compression does lead to visible compression artifacts. We are all familiar with the ‘blocky’ look of a low-quality JPEG image. Lossy compression also leads to generation loss, similar to what happens when repeatedly making photocopies of photocopies: artifacts accumulate and the quality degrades when repeatedly applying lossy compression, even when using relatively high quality settings.
The ecological footprint of digital images
Per photo, digital photography is undeniably more eco-friendly than analogue film photography. The problem is that people are also taking orders of magnitude more photos than in analogue times. Digital storage does come at an ecological cost. If we assume a typical digital photo to be around 5 megabytes (a typical size for a high-quality, high-resolution JPEG), then the 1.6 trillion photos humanity is currently producing per year require about 8 exabytes of storage, or roughly 10 exabytes as a round figure. An exabyte is one thousand petabytes or one million terabytes.
The electricity required to store 10 exabytes for a year can be conservatively estimated to be at least 20 terawatt hours — 20 million MWh. That is equivalent to five million electric cars driving 20,000 kilometers each. And that’s just the cost of storage.
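The estimate can be retraced in a few lines. The sketch below uses only the figures quoted in the text and derives what they imply; the per-terabyte and per-kilometer values are computed from those figures and are not independent measurements.

```python
# Figures quoted in the text.
PHOTOS_PER_YEAR = 1_600_000_000_000   # 1.6 trillion
BYTES_PER_PHOTO = 5_000_000           # 5 MB
ROUNDED_STORAGE_EB = 10               # the round figure used in the text
STORAGE_ENERGY_TWH = 20               # the text's conservative estimate
CARS = 5_000_000
KM_PER_CAR = 20_000

# Unit conversions (decimal units).
BYTES_PER_EXABYTE = 10**18
TERABYTES_PER_EXABYTE = 10**6
KWH_PER_TWH = 10**9

# Storage needed for one year of photos, before rounding.
exact_storage_eb = PHOTOS_PER_YEAR * BYTES_PER_PHOTO / BYTES_PER_EXABYTE

# What the 20 TWh estimate implies per terabyte stored for a year.
energy_kwh = STORAGE_ENERGY_TWH * KWH_PER_TWH
implied_kwh_per_tb_year = energy_kwh / (ROUNDED_STORAGE_EB * TERABYTES_PER_EXABYTE)

# What the car comparison implies per kilometer driven.
implied_kwh_per_km = energy_kwh / (CARS * KM_PER_CAR)

print(f"storage per year: {exact_storage_eb:g} EB (rounded to {ROUNDED_STORAGE_EB} EB)")
print(f"implied energy per TB stored for a year: {implied_kwh_per_tb_year:g} kWh")
print(f"implied electric car consumption: {implied_kwh_per_km:g} kWh per km")
```

Changing any of the constants shows how sensitive the comparison is to the underlying assumptions, such as the typical photo size.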
Transferring photos over the internet also comes at a cost. Estimates of this cost vary considerably, not least because the technology keeps evolving. We can assume that every photo is transferred at least a few times: say, once to back it up to cloud storage, and a few times to retrieve it again. Some photos end up being shared a lot, for instance because they end up on a highly visited website or go viral on social media. Not all of these transfers involve the full-resolution, high-quality original; often a downscaled, lower-quality version is sent instead. Still, even with optimistic estimates, the total cost of transfer could end up being similar to the cost of storage: another five million cars driving 20,000 kilometers each.
Obviously, the ecological footprint of the hardware itself needs to be considered too: namely, the production of cameras and phones, cloud servers and hard drives. Being mindful of this, without even attempting to estimate it, it’s safe to say that digital photography has a significant environmental impact.
New image formats
JPEG was introduced in 1992. Since then, numerous new image formats have been created.
JPEG 2000, introduced in (you guessed it) the year 2000, is perhaps not very well known in the consumer market, but it is widely used in medical imaging (replacing analogue MRI or CT scans with digital ones) and in digital cinema. Both of these use cases require a higher fidelity than JPEG could offer. Misdiagnosis because of compression artifacts would obviously not be acceptable. JPEG 2000 is a rather complex codec, and at least for the first decade, there were no fully compliant open source implementations. For consumer-grade photos, the compression improvements over “good old JPEG” were not that impressive. Unlike the ubiquitous JPEG, its successor JPEG 2000 never obtained widespread adoption.
In 2006, Microsoft announced the HD Photo format, standardized in 2009 as JPEG XR. Positioned somewhere in between JPEG and JPEG 2000, both in terms of complexity and compression performance, it too failed to obtain widespread adoption.
For professional photography, camera vendors created “raw” image formats that represent the sensor data directly and with higher precision than JPEG offers, making it possible to preserve the full dynamic range of the samples without any compression artifacts. Adobe made an attempt to standardise the various raw formats and created the DNG (Digital Negative) format.
On the web, pages went from mostly text-based to a more image-rich experience as network connections became fast enough to allow this. Both the size of the images and the number of images on web pages kept growing. By 2016, the average web page transfer size was larger than the original 1993 Doom game, and it has doubled again since then, mostly due to images. Better image compression became key to web performance optimisation.
Video codecs
While digital cinema uses an image format, specifically JPEG 2000, to store the frames of a video individually (“intra coding”), broadcast and web streaming require more aggressive compression to keep bandwidth consumption feasible and economical. Video codecs exploit the redundancy between frames by predicting a frame from other frames with similar content (“inter coding”), for example using motion estimation. This leads to a whole new type of compression artifact, but allows much more extreme compression ratios.
For still images, JPEG (and JPEG 2000) were considered “good enough” in terms of quality and compression, so the research focus shifted to video codecs. New techniques were invented for both inter and intra coding, and in particular various kinds of “deblocking filters” made it possible to dial back the quality settings further without introducing the obvious block artifacts you would get with JPEG.
In the same way that a video is a sequence of still images, a still image is a single-frame video. Since the main innovation was happening in video codecs, it made sense to repurpose video codecs for still images, especially on the web where this could help to reduce the weight and improve performance of web pages. WebP, HEIC and AVIF are three image formats derived directly from a video codec — VP8/WebM, HEVC and AV1, respectively. Google is the main proponent of WebP and AVIF, which are both royalty-free codecs (no patent license fees have to be paid in order to use these formats). HEIC, a heavily patent-encumbered codec, is essentially only available in the Apple ecosystem; Apple used it to replace JPEG as a capture format on the iPhone, although for interoperability, they convert HEIC to JPEG when needed.
Video codecs are optimized for a lower quality range than still-image codecs; after all, video frames are only seen for 1/30th of a second. In the lower quality range, video-codec based image formats achieve much better compression than JPEG. At the higher quality end though, they are not really doing much better than JPEG — they sometimes can’t even reach “camera quality” compression levels and look overly smooth even at high quality settings.
Computing power
Compression is all about making trade-offs. Every compression algorithm trades computation time for space — spending processing energy in return for reduced storage and transfer size.
This trade-off is generally a good idea. Especially for “old” formats like JPEG, the computing power required is quite small while the savings in storage and transfer are substantial.
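The trade-off can be observed with any compressor that exposes an effort setting. The sketch below uses Python's standard `zlib` module, a general-purpose lossless compressor and not an image codec, purely as an analogy: higher levels spend more effort searching for redundancy. The test data is synthetic (an assumption for illustration) and real photos behave differently, so the printed numbers only show the shape of the trade-off.

```python
import time
import zlib


def make_test_data(width=1024, height=512):
    """Synthetic 8-bit 'image': horizontal ramps that shift every 8 rows."""
    rows = (bytes((x + y // 8) % 256 for x in range(width)) for y in range(height))
    return b"".join(rows)


def measure(data, level):
    """Compress at the given zlib level; return (compressed size, seconds spent)."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Lossless: decompressing must give back exactly the original bytes.
    assert zlib.decompress(packed) == data
    return len(packed), elapsed


data = make_test_data()
results = {level: measure(data, level) for level in (1, 6, 9)}

print(f"original size: {len(data)} bytes")
for level, (size, seconds) in results.items():
    print(f"level {level}: {size} bytes in {seconds * 1000:.2f} ms")
```

Timings vary between machines and runs, so they are best read as relative effort and not as absolute figures.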
Newer formats like the ones based on video codecs tend to require a lot more computing power. Video codecs generally have dedicated hardware decoders (and sometimes also encoders), which helps to reduce power consumption, compared to doing the processing on the general-purpose CPU. [1] Hardware decoding is a necessity: otherwise watching videos on a phone or tablet would make it heat up and deplete its battery quickly.
It takes time for new hardware to be developed, so newer codecs often have to be processed in software at first. In practice, encoding is often done in software even when hardware is available, since hardware encoders are designed for fast but not optimal compression. The ecological footprint of images and video is, therefore, not only determined by storage and transfer, but also by the energy required to do encoding and decoding.
JPEG XL
The most recent image format, standardised in 2022, is called JPEG XL (think “to excel”, not “extra large”). Like its predecessors, it is designed to be a successor to the old JPEG, but compared to previous attempts to dethrone JPEG, it does some things differently.
First, it is “legacy friendly”: existing JPEG images can be converted to JPEG XL without any loss (no generation loss), which cannot be done in any other new image format. The resulting files are 20% smaller and can be converted back to the exact same JPEG file. Second, it is royalty-free and has a good, complete, free and open source reference implementation, so applications can relatively easily add support. [2] Third, it also supports lossless compression. It is good at it too, even for non-photographic images where PNG is currently widely used. Fourth, it can do progressive decoding, showing a preview of an image while it is still being transferred, which is an important feature for web delivery. Fifth, it can do high-fidelity lossy compression and is fully ready for the fidelity offered by cameras and display devices of today and of the future: high dynamic range, wide color gamut, high resolution, and new kinds of sensor data like depth maps and thermal imagery. At the high-fidelity end of the quality spectrum, it produces files about 60% smaller than JPEG, while some other new formats struggle to even match JPEG in that range, being optimised mostly for the typical quality range of web video. Finally, JPEG XL has a relatively low computational cost: images can be encoded quickly, with low energy consumption.
With JPEG XL, storage and transfer costs for photos could both be reduced by 60%. If at present we require the equivalent of ten million electric cars driving 20,000 km each, switching to JPEG XL would reduce it to four million cars.
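The car comparison is a straightforward percentage calculation. The sketch below retraces it and applies the same helper to the 20% saving quoted for losslessly recompressed JPEG files; it assumes, as the text does, that energy use scales in proportion to the number of bytes stored and transferred. The helper name is illustrative.

```python
def after_saving(amount, saving_percent):
    """Return what remains of `amount` after a saving of `saving_percent` percent."""
    return amount * (100 - saving_percent) / 100


# Storage plus transfer, expressed as electric cars driving 20,000 km each.
CARS_NOW = 10_000_000
LOSSY_SAVING_PERCENT = 60      # JPEG XL versus JPEG at high fidelity
LOSSLESS_SAVING_PERCENT = 20   # existing JPEG files recompressed without loss

cars_with_jxl = after_saving(CARS_NOW, LOSSY_SAVING_PERCENT)
recompressed_photo_mb = after_saving(5, LOSSLESS_SAVING_PERCENT)  # a 5 MB JPEG

print(f"cars after switching to JPEG XL: {cars_with_jxl:,.0f}")
print(f"a 5 MB JPEG after lossless recompression: {recompressed_photo_mb:g} MB")
```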
In other words, JPEG XL is a very promising new format. It could be used across the entire workflow: from image capture and authoring to delivery and archiving. When preliminary support was added to the Chrome browser, many web developers and companies like Adobe, Meta, and Intel were enthusiastic about the prospect of finally having a worthy successor to JPEG on the web. It came as a big surprise when the Chrome developers announced on Halloween 2022 that they intended to drop their support. [3] Even though many stakeholders consider the case for JPEG XL quite strong, it looks like Chrome’s decision was based on testing performed by AVIF engineers. Some of these tests were performed and interpreted in questionable ways, leading to incorrect conclusions. But in another unexpected plot twist, Apple recently announced support for JPEG XL in all of their products, which might prompt the Chrome developers to revert their decision.
The future
It is impossible to predict the future of digital image formats. So far, no new format has managed to be as successful and ubiquitous as JPEG. Whether there will be a successor is not only a matter of technical merit. Overcoming the inertia when existing solutions are widely seen as “good enough” is not easy. All too often, office politics and the gatekeeping power of monopolistic companies are what makes or breaks a new format.
As the ecological footprint of photography shifted from film rolls and developing chemicals to digital storage, network transfer and processing power, I see only three ways to reduce our footprint: making fewer pictures, reducing their quality, or using better image formats. Which of these options do you prefer?