The AI landscape has fundamentally shifted from an era of generous subsidies to one of palpable token scarcity, revealing hidden economic and strategic consequences for businesses and developers alike. This transition, marked by enterprise sticker shock and a scramble for compute, signals a new phase where efficiency, affordability, and strategic access to AI resources will define competitive advantage. Those who grasp the implications of this scarcity, and in particular that immediate costs now dictate long-term viability, will be best positioned to navigate the evolving AI ecosystem. This analysis is aimed at business leaders, product managers, and engineers who need to build sustainable AI strategies beyond the subsidized honeymoon period.
The Unraveling of the AI Subsidy Era
The explosive growth of AI adoption, particularly with the advent of agentic capabilities, has rapidly outpaced the economic models that initially fueled it. For months, the prevailing wisdom was that AI companies could afford to offer generous subsidies, with the cost of tokens for power users far exceeding their subscription fees. This "let her rip" mentality, while fostering rapid experimentation and innovation, has hit a critical inflection point. The realization that these subsidies are unsustainable, coupled with companies burning through multi-year AI budgets in mere months, has forced a dramatic business model recalibration.
The shift from a "seat-based" model, where revenue is capped by the number of users, to a "token-based" model, driven by actual usage, has unleashed staggering revenue growth for foundation model providers like OpenAI and Anthropic. However, this revenue explosion also highlights the underlying cost structure and the inevitable move towards scarcity. As providers such as GitHub (for Copilot) and Google announce usage limits and shifts to usage-based billing, the era of virtually unlimited, subsidized AI access is definitively over. This transition is not merely an economic adjustment; it’s a strategic imperative. The ability to manage and optimize token consumption, rather than simply maximizing usage, will become a key differentiator.
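The difference between the two billing models can be made concrete with a little arithmetic. The sketch below is illustrative only: the user profiles, the seat price, and the per-million-token price and serving cost are all hypothetical numbers chosen for the example, not figures from any provider.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    tokens_per_month: int  # hypothetical monthly token consumption


def total_millions(users):
    """Total monthly consumption across users, in millions of tokens."""
    return sum(u.tokens_per_month for u in users) / 1_000_000


def seat_revenue(users, seat_price):
    """Seat-based: revenue is capped at (number of seats * price), whatever the usage."""
    return len(users) * seat_price


def token_revenue(users, price_per_million):
    """Token-based: revenue scales with actual consumption."""
    return total_millions(users) * price_per_million


def subsidy(users, seat_price, cost_per_million):
    """Serving cost minus seat revenue; a positive value means usage is being subsidized."""
    return total_millions(users) * cost_per_million - seat_revenue(users, seat_price)


# Hypothetical mix: one power user dominates consumption.
users = [
    User("light", 2_000_000),
    User("typical", 20_000_000),
    User("power", 400_000_000),
]

print(seat_revenue(users, seat_price=20))            # flat, independent of usage
print(token_revenue(users, price_per_million=5.0))   # grows with usage
print(subsidy(users, seat_price=20, cost_per_million=3.0))
```

Under these assumed numbers, a single heavy user is enough to push serving cost far above flat seat revenue, which is the gap that usage-based billing closes.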
"My argument was, of course, about the value of experimentation and in fact, the necessity of experimentation in a period where no one knows the best way to use these tools. But it would be very clear very quickly that there would be consequences of this idea of token maxing that would rear their ugly heads soon."
The implications of this shift are profound. What was once a free-for-all for experimentation is now constrained by budget realities. Companies that embraced "token maxing" (incentivizing employees to use as many tokens as possible) are now facing the harsh reality of diminishing returns and questionable ROI, as exemplified by Uber's experience. This necessitates a move from measuring inputs (tokens consumed) to outputs (actual business value generated), a difficult but essential pivot. The challenge for enterprises will be to foster continued innovation while imposing discipline on resource allocation, a balance that requires strategic foresight and a deep understanding of AI’s true operational costs.
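One way to picture the move from input metrics to output metrics is to rank teams twice: once by tokens consumed and once by value generated per token. This is a minimal sketch; the team names, token counts, and dollar values are hypothetical, and how "business value" is estimated in practice is left as an open assumption.

```python
def value_per_million_tokens(tokens, value):
    """Output-oriented metric: estimated business value per million tokens consumed."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return value / (tokens / 1_000_000)


# Hypothetical monthly figures; 'value' is an assumed business-value estimate.
teams = {
    "team_a": {"tokens": 500_000_000, "value": 10_000.0},
    "team_b": {"tokens": 50_000_000, "value": 5_000.0},
}

# Input metric ("token maxing"): whoever consumes the most ranks first.
by_input = sorted(teams, key=lambda t: teams[t]["tokens"], reverse=True)

# Output metric: whoever generates the most value per token ranks first.
by_output = sorted(
    teams,
    key=lambda t: value_per_million_tokens(teams[t]["tokens"], teams[t]["value"]),
    reverse=True,
)

print(by_input)
print(by_output)
```

With these assumed figures the two rankings disagree: the heaviest consumer leads on the input metric, while the lighter team leads on value per token.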
The Verticalization of AI Infrastructure and the Compute Scramble
As token costs rise and availability becomes a concern, the AI infrastructure landscape is undergoing a significant verticalization. Companies are no longer solely relying on generalized cloud offerings; instead, specialized providers are emerging to address specific needs in compute, inference, and model orchestration. This trend is driven by the fundamental constraint: compute power. The demand for AI, particularly for agentic systems, has created a compute bottleneck, leading to massive investments in specialized infrastructure.
The strategic alliance between SpaceX and Anthropic is a prime example of this verticalization and the scramble for compute. SpaceX's move into providing significant compute capacity to Anthropic, a severely compute-constrained company, transforms SpaceX into a de facto "neo-cloud" provider. This not only addresses Anthropic's immediate needs but also positions SpaceX for a potentially massive IPO by highlighting its role in a critical, high-demand sector of the AI supply chain. This move underscores a broader industry realization: controlling or accessing compute is becoming paramount. The surge in AI memory stocks and Meta’s exploration of becoming a cloud business further illustrate this point. The ability to secure and optimize compute resources is no longer a secondary concern but a primary driver of competitive advantage.
"The most notable of these undoubtedly, and something that happened this May that I think will have fairly dramatic implications for the industry as a whole, is Elon shifting into a very different type of role vis-à-vis the AI industry."
This verticalization extends beyond raw compute. Providers like OpenRouter, which lets developers dynamically switch between models based on cost and performance, are gaining traction alongside specialized inference providers. The implication is that the AI ecosystem is fragmenting into specialized layers, each with its own economic dynamics and strategic importance. Companies that can effectively navigate this complex infrastructure landscape, securing access to necessary compute and optimizing for cost-effective inference, will gain a significant edge. The days of easily accessible, commoditized AI infrastructure are waning, replaced by a more stratified and competitive market.
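Switching between models based on cost and performance can be sketched as a simple selection rule: pick the cheapest model that clears a quality bar and fits the budget. The code below is an illustration of that idea only. The model names, prices, and quality scores are invented, and it does not use or represent any real routing provider's API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str                       # hypothetical model identifier
    cost_per_million_tokens: float  # assumed price
    quality: float                  # assumed score in [0, 1] from some internal eval


def pick_model(
    options: Sequence[ModelOption],
    min_quality: float,
    budget_per_million: Optional[float] = None,
) -> Optional[ModelOption]:
    """Return the cheapest option meeting the quality bar and budget, or None."""
    eligible = [
        m
        for m in options
        if m.quality >= min_quality
        and (budget_per_million is None or m.cost_per_million_tokens <= budget_per_million)
    ]
    return min(eligible, key=lambda m: m.cost_per_million_tokens, default=None)


# Hypothetical catalog.
CATALOG = [
    ModelOption("small", 0.5, 0.60),
    ModelOption("mid", 3.0, 0.80),
    ModelOption("large", 15.0, 0.95),
]

routine = pick_model(CATALOG, min_quality=0.75)
hard_on_budget = pick_model(CATALOG, min_quality=0.90, budget_per_million=10.0)
print(routine.name if routine else None)
print(hard_on_budget.name if hard_on_budget else None)
```

Returning None when nothing qualifies forces the caller to make the trade-off explicit: relax the quality bar or raise the budget.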
The Maturation of Models and the Rise of Harnesses
While the pace of foundational model releases continues, the narrative is shifting. The emphasis is moving from the incremental improvements of individual models to the sophistication and utility of the "harnesses" or platforms that enable their use. As one observer noted, "Unless it's a major breakthrough in model capability, I'm much more excited for super app updates like Codex and Claude Desktop. There's so much to be unlocked by making those surfaces better." This suggests that the era of revolutionary model leaps, akin to the early days of GPT-3 or even GPT-4, may be giving way to a more iterative cycle, similar to smartphone releases.
The real innovation now lies in how these powerful models are integrated into workflows and made accessible for complex tasks. Features like dynamic workflows and "slash commands" (such as slash goals) are becoming critical primitives, enabling users to orchestrate multi-step processes and achieve more sophisticated outcomes. This is particularly true for knowledge work, where agents can now automate tasks that were previously manual and time-consuming, such as generating meeting prep documents or assembling proposals.
"The thing that actually matters right now is what's happening around the models. Claude Code shipped dynamic workflows this same week, and that genuinely changes what one person