AI is getting expensive, but relief is on the way – just not for you

Generative AI apps and services are getting more expensive by the day as model devs grapple with surging infrastructure costs. A new generation of GPUs and AI accelerators promises relief from rising inference demand, but you won’t see the savings.

After years and billions spent building bigger and better models, the great AI houses are beginning to find tangible use cases for the technology beyond chatbots and image generators.

Claude Code, Codex, GitHub Copilot, and the slew of other code assistants have arguably become AI’s biggest success story to date, but history tells us they won’t be the last.

But success is a double-edged sword. The bit barns built with borrowed money to train the Sonnets, GPTs, and Geminis at the heart of these apps and services were never meant to serve them at this scale. Inference and training are very different beasts.

Those selling the shovels of the AI boom are now racing to bring new hardware to market that is better suited to serving these models. Nvidia pulled $20 billion from its war chest to acquihire AI chip startup Groq for this very reason.

And it’s not alone; everyone, from AMD and AWS to Intel and Google, is rearchitecting their GPUs, AI accelerators, and systems to drive down the cost per token.

Cheaper tokens mean better inference economics and higher margins, and the venture capitalists fanning the flames hope that OpenAI, Anthropic, and all the others might actually drag themselves out of the red one day.

Your AI addiction is their opportunity

There’s just one little problem. All that AI-optimized hardware isn’t quite ready yet. Much of it is promised for the second half of this year, but it takes time to work out the kinks and ramp supply chains, which means the bulk of these new systems won’t see widespread deployment until early to mid 2027.

But here lies a fleeting opportunity for the flag-bearers to see how addictive their products have become, and just how much the market will bear.

If Nvidia and AMD are the arms dealers of the AI age, the model devs are the drug dealers: the first hit’s free, the next ones are cheap, and before long you’re hooked.

We’re already seeing this play out. With the launch of GPT-5.5, OpenAI doubled its prices to $5 (input), $0.50 (cached input), and $30 (output) per million tokens.

It didn’t take long for Google to follow suit. The Chocolate Factory’s newly launched Gemini Flash 3.5 is between 3x and 6x more expensive than Gemini 3.1 Flash-Lite and Gemini 3 Flash Preview.

These price hikes are further compounded by the fact that the agent harnesses being built atop these models are burning through tokens orders of magnitude faster than a typical chatbot.

Flat-rate pricing makes a lot of sense when the majority of your customers aren’t running up against usage caps. It makes a lot less sense when customers are spending $200 a month on $5,000 worth of tokens.

Microsoft seems to have figured this out. It outright abandoned seat-based pricing for GitHub Copilot and began transitioning its customers to usage-based pricing.

Anthropic appears to be rethinking its pricing model as well, but rather than moving to a pure usage-based pricing model, it’s considering watering down its subscription features.

AI isn’t the payroll paradise execs were promised

Executives who thought AI was going to replace a full-time employee for pennies on the dollar are in for a rude awakening.

That’s not happening and it probably never will. Not when Anthropic, Google, or OpenAI can charge the equivalent of $30 an hour in tokens and make the case it’s still cheaper than paying an employee $40 an hour plus benefits and unemployment insurance.

Just wait: before long, AI pricing will be marketed in dollars per full-time equivalent ($/FTE) instead of dollars per million tokens.

AI may not be the sweet deal execs might have hoped for, but that hasn’t stopped large tech firms from laying off thousands in pursuit of the technology.

The FOMO has never been higher, and, if there’s anything big tech loves, it’s leading by example. So far this month, we’ve learned:

Meta is laying off about 10 percent of its global workforce, closing around 6,000 open positions, and reassigning some 7,000 workers to AI-focused divisions.

Cloudflare is cutting more than 1,100 workers, citing increased reliance on AI.

Cisco is letting about 4,000 workers go because, as its CEO Chuck Robbins put it, “The companies that will win in the AI era will be those with focus, urgency, and the discipline to continuously shift investment toward the areas where demand and long-term value creation are strongest.”

Even New Zealand has revealed plans to use AI to sack around 9,000 government workers.

Competition won’t save us

Competition, it’s said, is the cure for high prices, but for that to happen, there has to be a profit margin to shave, and so far the top model devs are all running deep in the red.

Hyperscalers have the advantage here. They can lose billions on AI investments for years while leaning on other product divisions to keep their shareholders from staging a riot.

But that is probably not the death knell for Sam Altman’s hypemaxing or Dario Amodei’s sanctimonious posturing. Someone still has to build the models. Microsoft, Meta, and AWS are dabbling in model training, but have yet to show they can compete with OpenAI or Anthropic in any meaningful way.

Google is really the standout in this respect. Gemini routinely trades blows with GPT and Claude, and after this week’s I/O, it’ll be practically inescapable.

If history tells us anything, the AI boom and inevitable bust will follow a familiar trajectory. Competition abounds in a bubble, but once it bursts, consolidation is inevitable. ®
