
The idea that large language models (LLMs) have been commoditised might seem absurd. Big tech is pouring billions of dollars into developing gargantuan models like ChatGPT or Llama. Google reportedly spent $191m training Gemini Ultra alone.
If we define a commodity as a product or good with minimal differentiation, it’s easy to see how wheat, olive oil, or a toothbrush fits that definition. It’s harder to see how an AI model that has billions of parameters, needs thousands of megawatt-hours of power to train, and can turn a few prompts into an extensive report or a highly realistic video, fits it too.
Yet at Appian’s user conference in May, co-founder and CTO Mike Beckley said that the key LLMs offer minimal differences in performance or features. Any advantage a new model has is short-lived. At the same time, he said, the costs of using LLMs were falling dramatically, making it easier for customers to experiment with the technology – for better or worse.
This is perhaps not totally surprising. As Akamai CTO Robert Blumofe points out, “I think pretty much every LLM out there is using the same neural net architecture, the transformer. They are pretraining on pretty much the exact same dataset, which is basically everything you can hoover up off the web.”
The differences come in the fine-tuning, or reinforcement learning from human feedback, he says. “All of which,” Blumofe argues, “supports a case that the core LLM functionality is, indeed, becoming commoditized.”
Of course, beyond the LLMs pushed by the likes of OpenAI, Google, Meta, Microsoft and Anthropic, there are hundreds, even thousands, of models companies can potentially choose from, including open-source variants. But as Silvia Lehnis, consulting director for AI and data at UBDS Digital, explains, in a new, extremely fast-moving market, companies adopting AI are looking for security.
“It’s much easier for developers but also for managers to choose something that’s well known,” she says. “Not least as it’s more defensible to the board if something goes wrong.”
Trust the process?
If LLMs are becoming more commoditised, Beckley tells Tech Monitor, the “scaffolding” around them becomes much more important. “The value,” he says, lies “in how you’re able to apply a pipeline of data in a meaningful process that does useful work reliably and safely.”
For Appian, that means applying a process or workflow, and being crystal clear about what problems LLMs could and should be set to work on. After all, he says, we’re 15 years on from the subprime mortgage crisis, which was in large part based on foolhardy automation of decision making and a lack of human intervention. “That,” says Beckley, “was a fascinating example of how a toxic algorithm can run amok and almost destroy the world.”
In finance and insurance today, he says, GenAI’s biggest impact is often registered in “completely boring ways,” such as onboarding customers faster or simply scanning and validating the widest possible variety of documents.
Blumofe agrees that the opportunity for differentiation really lies in how LLMs are integrated with other tools – whether to check or test results, access other datasets or impose guardrails. He adds that minimal differentiation between LLM vendors has broader benefits for end users. Feature parity makes it easier to take a ‘pick and mix’ attitude to tooling – assuming customers have made sure they’re not locked in by other means.
“Increasingly, there is interest in running LLMs on-prem and on-device,” says Blumofe. “If you’re not overly tied to a particular vendor, that opens up much more opportunity to run on the infrastructure that you want to run on.”
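In practice, that vendor-agnosticism is less about the models themselves than about how applications talk to them. The sketch below is a rough illustration only – the class names and endpoints are hypothetical, not any particular vendor’s API – but it shows the idea: if application code depends on nothing more than a thin completion interface, a hosted model can be swapped for an on-prem or on-device one by changing configuration rather than rewriting code.

```python
# A minimal, hypothetical sketch of a provider-agnostic LLM interface.
from dataclasses import dataclass
from typing import Protocol


class LLMClient(Protocol):
    def complete(self, prompt: str) -> str: ...


@dataclass
class HostedLLM:
    api_url: str   # a vendor's completion endpoint (hypothetical)
    api_key: str

    def complete(self, prompt: str) -> str:
        # Call the hosted vendor's API here; the details vary by provider.
        raise NotImplementedError


@dataclass
class LocalLLM:
    model_path: str  # e.g. a quantised model served on-prem or on-device

    def complete(self, prompt: str) -> str:
        # Call the locally hosted inference server here.
        raise NotImplementedError


def summarise(document: str, llm: LLMClient) -> str:
    # Application code never names a vendor; it only needs complete().
    return llm.complete(f"Summarise the following document:\n\n{document}")
```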
There may even be a case for building your own model from scratch. Lehnis points out that the mainstream models being used most often are not necessarily delivering the best results. In “a lot of the results that we see from businesses,” she says, companies “are disappointed in how [their] copilots [are] performing.”
One solution is to develop a specialised model, she says, whether for an individual company or even for an industry – say, covering all the regulations for the insurance industry. “If you’re taking a generic model,” says Lehnis, “even if you’re trying to inspire it through various techniques on your own data, you’re still getting a lot of the generic content coming through and paying for the cost of the compute of that generic model, which doesn’t really answer that very specialized question.”
It’s not just enterprises that need to make these choices. Nutanix CEO Rajiv Ramaswami says the company deliberately sought to provide its customers with choices when it came to which models they integrate with. For internal use, says Ramaswami, “We tend to rely more on open-source models.” These are good enough for simple use cases, he says, such as productivity enhancements or document summarisation – areas where the consequences of an LLM making a mistake are hardly going to be apocalyptic.
But when it comes to critical applications, says Ramaswami, model fidelity becomes much more important. If an agent were diagnosing a condition in a patient, then prescribing a drug, for example, “that’s potentially life impacting, and you want to have absolute fidelity in that before you let that go.”
Chatbots are not the answer
Organisations, then, need to be very clear about which models they trust, the problems they target them at, and the systems, guardrails and safeguards they build around them. And while LLMs are undoubtedly powerful tools that are becoming ever cheaper to use, prospective users need to be clear about the limits of the technology.
The problem is, Beckley says, that customers are going to the AI providers and saying, “‘We think AI is great, what can we do with it?’ And the provider says, ‘Well, you can build a chatbot’.”
“They’re better chatbots,” he says, but “they’re still chatbots.” And companies are typically spending far more on them than on the previous generation.
The danger is the level of trust being placed in these systems, he continues. “Instead of telling me, ‘Should I approve this loan?’,” says Beckley, the end user is letting the model “just go ahead and approve a loan and then pay the money.”
This, he argues, is “terrifying” given LLMs’ ongoing and so far irremediable tendency to hallucinate. “They’ll come up with bizarre strategies that work around what you tell them to do,” says Beckley. “They’re not trustworthy agents.”
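In code, the distinction Beckley draws looks something like the sketch below – a rough, entirely hypothetical illustration in which the model can recommend a loan decision, but the approval and payment steps sit behind an explicit human gate. None of the function names correspond to a real system.

```python
# A minimal sketch of a human-in-the-loop guardrail for loan decisions.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Recommendation:
    decision: str   # e.g. "approve", "decline" or "refer"
    rationale: str  # model-generated reasoning for a human to read


def recommend(application: str, complete: Callable[[str], str]) -> Recommendation:
    # The LLM only produces advice; it has no access to payment systems.
    answer = complete(f"Assess this loan application and explain your reasoning:\n{application}")
    decision = "approve" if "approve" in answer.lower() else "refer"
    return Recommendation(decision, answer)


def approve_loan(application: str) -> None:
    pass  # hypothetical call into the lender's core systems


def pay_out(application: str) -> None:
    pass  # hypothetical payment step


def process(application: str,
            complete: Callable[[str], str],
            human_approves: Callable[[Recommendation], bool]) -> None:
    rec = recommend(application, complete)
    # Guardrail: nothing is approved or paid without an explicit human decision.
    if rec.decision == "approve" and human_approves(rec):
        approve_loan(application)
        pay_out(application)
```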
Blumofe says that while LLMs have caught the popular imagination, there has been so much investment in the technology that many end users have developed more than a vested interest in the success of the models they’ve acquired. The broader field of machine learning and AI, he argues, “at some point needs to garner more attention.”
Likewise, argues Beckley, the fascination with LLMs sucks up the resources needed to develop meaningful AI and unlock innovation in general. “We need to keep our hands on the wheel, and we do that through setting the goals, setting the victory conditions, setting the allowable actions and then allowing our LLMs to improvise.”
If we get it wrong, he says, broader disenchantment with AI and the squandering of investment will make it harder for all of us to apply both LLMs and other AI technology to really important problems, like the next pandemic, climate change, national security, or securing our food supply.
“Everyone’s on pins and needles waiting for the next LLM to start telling us what to do, and running for president, launching the missiles, or taking over all of our jobs,” he says. “You know, the quiet revolution is [that] generative AI can actually accurately read all the mail.”