Recursive Self-Improvement Enables Startups to Outpace Frontier Models
This conversation with Ian Fischer, co-founder and CEO of Poetic, reveals a profound shift in how startups can leverage artificial intelligence. The core thesis is that recursive self-improvement, a capability previously confined to massive, expensive AI research labs, is now accessible to small teams. This isn't just about using the latest models; it's about building systems that continuously enhance themselves on top of existing AI, creating a durable competitive advantage. The implication is that traditional approaches like fine-tuning are becoming obsolete: they burn capital and time only to be outpaced by the next model release. Startups and engineers who grasp this shift can gain a significant edge by building "reasoning harnesses" that consistently outperform off-the-shelf AI, even as foundational models evolve. This offers a path to state-of-the-art performance without the prohibitive costs of frontier AI development.
The Stilts of Self-Improvement: Outpacing Frontier Models
The rapid advancement of large language models (LLMs) presents a classic dilemma for startups: invest heavily in fine-tuning a model that will soon be surpassed, or risk falling behind. Ian Fischer, co-founder of Poetic, argues that this dilemma is a trap. His company's approach, recursive self-improvement, offers a way to build "reasoning harnesses" that consistently outperform base models, effectively putting startups on "stilts" to stand taller than the latest AI releases. This isn't about competing with the frontier model developers; it's about leveraging their advancements while building a layer of intelligence that continuously optimizes itself.
The conventional path for a startup aiming to improve AI performance typically involves collecting vast datasets and spending significant capital on fine-tuning. As Fischer points out, this effort is often rendered obsolete by the next model release.
"I know that I want to take advantage of whatever the next model is, but the second you're in fine-tuning land, I'm spending millions to hundreds of millions of dollars, and then guess what? I just lit it on fire because the next version of the frontier model comes out, and I'll never catch up."
This highlights a critical consequence: the immense cost and time invested in fine-tuning creates a fragile advantage. The "bitter lesson" of AI development, which Fischer alludes to, is that foundational models improve so rapidly that bespoke fine-tuning quickly becomes a losing race. Poetic's system offers an alternative. Instead of building a static, fine-tuned model, they create dynamic harnesses that can be applied to any underlying LLM. When a new, more powerful model emerges, the harness remains compatible, and the system can be further optimized to extract even greater performance, all without the massive expense of retraining. This creates a sustainable advantage, a "lasting moat," by focusing on the optimization layer rather than the foundational model itself.
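To make the architecture concrete, here is a minimal sketch of what such a model-agnostic harness could look like, assuming only a simple `complete()` interface on the base model. The class and method names are hypothetical illustrations of the idea, not Poetic's actual implementation.

```python
# Hypothetical sketch of a model-agnostic "reasoning harness". The base LLM is
# treated as an interchangeable component, so upgrading to a newer frontier
# model swaps the engine without discarding the optimization layer.
from typing import Callable, Protocol


class LLM(Protocol):
    def complete(self, prompt: str) -> str:
        """Return the model's completion for a prompt."""
        ...


class ReasoningHarness:
    def __init__(self, model: LLM, strategy: Callable[[LLM, str], str]):
        self.model = model        # swappable base model
        self.strategy = strategy  # reasoning strategy encoded as code, not just a prompt

    def solve(self, task: str) -> str:
        # The harness, not the raw model, decides how the task is decomposed and checked.
        return self.strategy(self.model, task)

    def upgrade_model(self, new_model: LLM) -> None:
        # When a stronger model ships, only the engine changes; the accumulated
        # optimization layer carries over unchanged.
        self.model = new_model
```

Calling `upgrade_model(...)` in place of a retraining run is, in miniature, the "stilts" argument: the investment lives in the layer above the model.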
Beyond Fine-Tuning: The Automated Optimization Engine
The true power of Poetic's approach lies in its automation of what were once manual, insight-driven processes. Traditionally, achieving high performance on complex tasks involved a combination of prompt engineering, context stuffing, and reasoning strategies--all requiring significant human expertise and iteration. Fischer explains how their "Poetic meta-system" automates this, treating the underlying LLM as a component rather than the end product.
"The core technology that we've developed at Poetic is recursive self-improvement. We have a recursively self-improving system, which we call the Poetic meta-system. The output of that system is systems that solve hard problems, where a hard problem is something that if you gave it to GPT-5.2, it would struggle to give you a reliable, robust result, just to use an example."
This automated optimization process is what allows Poetic to achieve state-of-the-art results on challenging benchmarks like ARC-AGI and Humanity's Last Exam, often at a fraction of the cost of the base models they leverage. For instance, they achieved 54% on ARC-AGI V2 using Gemini 3 Pro as the base model, outperforming DeepMind's own Gemini 3 score of 45%, despite relying on a significantly cheaper underlying model. Similarly, on Humanity's Last Exam, they reached 55%, surpassing Anthropic's Claude Opus 4.6. The key differentiator is that these gains come from the harness and its self-optimizing capabilities, not from the raw power of the base LLM alone.
The implication here is a fundamental shift in how performance is achieved. Instead of relying on human intuition to craft the "perfect" prompt or reasoning chain, the AI itself learns to optimize these elements. This is particularly evident in how the system handles data and prompts. Fischer notes that the meta-system can generate its own examples, even if they are imperfect, and that the output prompts are often unlike what a human would write. This suggests that the AI is discovering novel and effective ways to interact with itself and the problem space, pushing performance beyond human-designed limitations. This is a crucial distinction: it’s not just about making the LLM smarter; it’s about building a system that makes the LLM work smarter for a specific, complex task.
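The loop such a meta-system might run can be sketched in a few lines: propose candidate prompts or strategies (often using the LLM itself), score them against a task set, and keep whatever performs best. The function below is a generic, hedged illustration of this generate-evaluate-select pattern, not Poetic's meta-system; the function names and the `{task}` placeholder convention are assumptions.

```python
# Hypothetical generate-evaluate-select loop, not Poetic's actual meta-system.
# Assumes `llm` is a callable(prompt) -> completion, prompts contain a {task}
# placeholder, and `propose_variant` is typically implemented with the LLM
# itself (e.g., "rewrite this prompt to fix these failures").
def evaluate(llm, prompt, tasks, score):
    """Average score of a prompt over (input, expected) task pairs."""
    results = [score(llm(prompt.format(task=x)), y) for x, y in tasks]
    return sum(results) / len(results)


def optimize_strategy(llm, tasks, score, propose_variant, seed_prompt, rounds=10):
    """Iteratively improve a prompt/strategy without touching model weights."""
    best_prompt = seed_prompt
    best_score = evaluate(llm, best_prompt, tasks, score)

    for _ in range(rounds):
        feedback = f"current average score: {best_score:.2f}"
        candidate = propose_variant(best_prompt, feedback)   # LLM proposes a variant
        candidate_score = evaluate(llm, candidate, tasks, score)
        if candidate_score > best_score:                      # greedy selection;
            best_prompt, best_score = candidate, candidate_score  # real systems are richer
    return best_prompt, best_score
```

Because the loop optimizes against measured task performance rather than human intuition, the winning prompts and strategies can look nothing like what an engineer would have written by hand.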
The Long Game: Delayed Payoffs and Competitive Moats
The promise of recursive self-improvement is not just about immediate performance gains; it’s about creating durable competitive advantages through delayed payoffs. While fine-tuning offers a quick, albeit temporary, boost, Poetic's approach builds a system that compounds value over time. This is where systems thinking becomes paramount. The "stilts" metaphor is apt: as the foundational models rise, so does the performance of the system built upon them. This creates a strategic advantage because the investment is in the optimization layer, which remains valuable regardless of which LLM is currently at the top.
Fischer elaborates on the difference between manual prompt optimization and their approach:
"That will get you some performance improvements, but it's very far from everything that you can get if you actually think about these reasoning strategies that are really going to be written in code rather than in just better prompts."
This distinction is critical. Reasoning strategies, embedded in code as part of the harness, represent a deeper, more robust form of optimization than simply tweaking prompts. These strategies, developed through recursive self-improvement, are harder for competitors to replicate and provide a more significant performance uplift, especially on complex tasks. The payoff for this deeper investment is delayed but far more substantial. It requires a willingness to embrace complexity and a long-term perspective, which is often at odds with the pressure for rapid, visible results in the startup world.
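A hedged illustration of the difference: in the sketch below, the reasoning strategy (decompose, attempt, verify, revise) is ordinary control flow in code, with the LLM called at each step. The prompts and function names are hypothetical; a prompt-only approach would have to compress all of this structure into a single instruction.

```python
# Hedged sketch of a reasoning strategy expressed as control flow rather than a
# single prompt. `llm` is assumed to be a callable(prompt) -> str; the prompts
# and the decompose/verify/revise structure are illustrative only.
def solve_with_verification(llm, task: str, max_attempts: int = 3) -> str:
    # Ask the model to decompose the task into explicit sub-steps.
    plan = llm(f"Break this task into numbered sub-steps:\n{task}")

    answer = ""
    for _ in range(max_attempts):
        # Attempt a solution that follows the current plan.
        answer = llm(f"Task: {task}\nPlan:\n{plan}\nProduce a final answer.")

        # Have the model check its own output; accept only verified answers.
        verdict = llm(f"Task: {task}\nAnswer: {answer}\nIs this answer correct? Reply YES or NO.")
        if verdict.strip().upper().startswith("YES"):
            return answer

        # Revise the plan based on the rejected attempt and try again.
        plan = llm(f"The answer {answer!r} was rejected. Improve this plan:\n{plan}")

    return answer  # best effort after max_attempts
```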
The consequence of ignoring this is being perpetually behind. Startups that continue to rely on traditional fine-tuning will find themselves in a constant cycle of expensive upgrades, only to be outpaced by the next generation of foundational models. Those that adopt a recursive self-improvement strategy, however, build a system that not only keeps pace but actively outpaces the market. This requires a shift in mindset: viewing the LLM as a powerful but interchangeable engine, and focusing on building the sophisticated chassis and navigation system around it that truly defines the vehicle's capabilities and performance. This is where true, defensible competitive advantage is built--in the systems that can adapt and improve long after the initial components have been superseded.
Key Action Items
- Embrace the "Stilts" Mentality: Shift focus from fine-tuning base LLMs to building optimization layers or "reasoning harnesses" on top of them.
- Prioritize Automated Optimization: Invest in or develop systems that can recursively self-improve prompts, reasoning strategies, and data handling, rather than relying solely on manual iteration.
- Evaluate Base Models as Components: Treat LLMs as interchangeable parts. When a new, superior model is released, integrate it into your existing harness rather than undertaking costly retraining.
- Focus on Durable Advantage (12-18 months+): Recognize that true competitive moats are built on systems that compound value over time, not on temporary performance boosts from fine-tuning.
- Experiment Daily with AI (Immediate): As Ian Fischer advises, actively try new AI tools and techniques daily to stay abreast of rapid advancements and identify potential applications for your work.
- Seek Expertise in Agentic Systems (Over the next quarter): If building complex AI agents, consider how to incorporate advanced reasoning strategies beyond simple prompt engineering, potentially exploring services like Poetic.
- Develop Deep Knowledge Extraction Capabilities (12-18 months): Focus on building systems that can deeply extract and utilize knowledge from LLMs, a capability highlighted by Poetic's benchmark successes.