
AI may finally cure us of our data fetish


Column The rise of large language models (LLMs) built on huge stores of data and driven by artificial intelligence may seem frightening. Paradoxically, it may be the best thing in decades for the progress of human intelligence.

To understand why, consider that the paperless office revolution took a leisurely four decades to become reality. I haven’t had a working printer in years, needing to beg print jobs from friends about once a year – generally when I need some documentation for international travel.

Immigration demands documentation, but remains unevenly divided between paper and bits. On the way into Australia, authorities still hand out arrival cards with details that need to be written in ink on a slab of paper with nearly the same dimensions as an old-timey punched card. By way of contrast, a few days before my last trip to Indonesia, I filled out a web form with those same details, then got a QR code that I scanned at the immigration counter in Bali – and wondered why Australia seemed so backward.

Paper-based forms are cheap to print, but they demand human attention – an expensive resource. The reward for that expense is a certain flexibility – people can write between the lines and in the margins, adding information that may not be captured in whatever’s requested on the form itself. In turn, a human can read, interpret and respond to information beyond the confines of the lines and boxes.

Web forms, PDFs, and other fully digitized mechanisms for data entry automate data collection, but also amplify the burden on the person filling out the form. What if a name can’t be typed in Roman characters, or the address doesn’t obey the format given? What if a person prefers not to identify with a particular gender? Forms force people into form – that’s part of their function: to make us all regular, recordable and computable.

French philosopher Michel Foucault made this observation sixty years ago, in The Birth of the Clinic: that doctors treat people based on their medical records, rather than the reality of what’s going on in their bodies. As organizations well beyond the field of medicine twigged to Foucault’s brilliant and subversive critique of the way they operated, most retreated into a fantasy. They believed that the problem lay not in an over-dependence on data, but in a lack of the right data. Businesses, governments and research think-tanks everywhere became data hungry, looking for that elusive datum that would help them make the right decision at the right moment for the right body – or customer, or market.

In other words, doing more of the wrong thing should make it all come right – right?

The pursuit of that fantasy means we have more forms to fill out than ever before, amplifying the paradox of data: even as we acquire more and more of it, the insights we seek become more elusive.

I recently interviewed someone who fervently believed that the solution for autonomous vehicles – to get them to be something more than death robots on wheels – lay in millimeter-wave scans that record beneath the surface of every roadway. We should create a high-definition model of the road surface and road bed, which the vehicle could then use in its decision-making. Petabytes of data were seen as a talisman, to ward off the growing realization that robo-driving a car is far harder than it appears.

It’s these edges where automation fails that reveal an inconvenient truth: data isn’t everything. In fact, it may not be the right thing at all.

Perhaps it makes sense to fight data with data. A colleague recently explained how he used ChatGPT – this moment’s bête noire – to help him comply with a documentation requirement imposed by his organization’s bureaucracy. ChatGPT spat out a convincing template for a requirements document for the ISO/IEC/IEEE 29148:2011 software lifecycle standard. He read the spec himself, then watched a longish video on YouTube. That briefed him enough to detect whether ChatGPT spat out a nonsense response. It didn’t. ChatGPT provided everything he needed – saving hours of time writing out all that compliance boilerplate, leaving him to focus on the specifics.

I reckon my colleague is onto something. We could be piping every request for documentation into LLMs capable of complying with this rising sea of bureaucratic demands.

Perhaps that’s Microsoft’s motivation to buy 49 percent of OpenAI, the outfit behind ChatGPT. Redmond would see it as closing the loop – finally satisfying our fetish for data, by letting the computers talk amongst themselves, and leaving people free to wrestle with the interesting bits. ®


