Surge AI's Elite Team Achieves Billion-Dollar Revenue by Rejecting Growth Playbook
TL;DR
- Surge AI achieved over $1 billion in revenue with under 100 employees by focusing on a super-elite team and rejecting the typical Silicon Valley growth playbook.
- The company's success stems from an obsessive focus on data quality, defining "good" not by simple checkbox criteria but by nuanced, subjective standards -- for instance, whether a poem rises to Nobel Prize-winning quality, not merely whether it has the right line count.
- Benchmarks are largely untrustworthy, as models can "hill climb" on flawed, objective metrics rather than solving complex, ambiguous real-world problems.
- Reinforcement learning environments, simulating real-world scenarios, are the next frontier for AI training, enabling models to learn end-to-end tasks and complex interactions.
- AI models will become increasingly differentiated by the values and "taste" of the companies building them, leading to distinct behaviors rather than commoditization.
- The "Silicon Valley machine" of constant pivoting and blitzscaling hinders the creation of truly important, novel companies; founders should focus on their unique vision.
- Building AI that genuinely advances humanity requires focusing on complex, hard-to-measure objective functions rather than simplistic proxies like engagement or superficial benchmarks.
Deep Dive
Edwin Chen's Surge AI has achieved unprecedented revenue growth by rejecting traditional Silicon Valley growth-at-all-costs tactics, instead prioritizing a small, elite team and a contrarian focus on product quality. This approach has led to over $1 billion in revenue with fewer than 100 employees, operating entirely bootstrapped and profitably from day one. The core of Surge AI's success lies in its obsessive dedication to defining and achieving high-quality data for AI training, moving beyond superficial metrics to cultivate nuanced, human-like judgment within AI models. This focus on true quality, rather than just benchmark performance, is shaping the future of AI development and offering a model for building impactful companies.
The emphasis on quality data is not merely about collecting more information, but about deeply understanding the subjective nuances of tasks, whether it's crafting Nobel Prize-winning poetry or writing efficient code. Surge AI employs a complex system of thousands of signals, analyzing worker performance, expertise, and output quality to train models that exhibit sophisticated judgment. This contrasts with the common industry practice of simply meeting basic objective criteria, which often leads to models that are impressive on paper but lack true utility or deeper understanding. This meticulous approach means investing heavily in technology to measure subtle qualities, moving beyond mere content moderation to discover and cultivate the "best of the best" AI behaviors.
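The "thousands of signals" idea can be made concrete with a minimal sketch. This is an illustrative toy, not Surge AI's actual system: the signal names, weights, and the simple weighted average are all assumptions for the sake of the example.

```python
from dataclasses import dataclass

@dataclass
class AnnotatorSignals:
    """Hypothetical per-worker quality signals (illustrative names,
    not Surge AI's real schema), each normalized to [0, 1]."""
    agreement_rate: float    # agreement with gold-standard labels
    expertise_score: float   # estimated domain expertise
    review_quality: float    # downstream reviewer ratings

def quality_score(s: AnnotatorSignals,
                  weights: tuple = (0.5, 0.3, 0.2)) -> float:
    """Combine the signals into one weighted score in [0, 1].
    A production system would use far more signals and a learned model."""
    parts = (s.agreement_rate, s.expertise_score, s.review_quality)
    return sum(w * p for w, p in zip(weights, parts))

worker = AnnotatorSignals(agreement_rate=0.92,
                          expertise_score=0.8,
                          review_quality=0.75)
print(round(quality_score(worker), 3))  # 0.85
```

The point of the sketch is the shape of the problem: quality is not one checkbox but an aggregate of many noisy measurements, which is why defining the signals (and their weights) is where the real judgment lives.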
Anthropic's Claude, for example, has excelled due to a similar commitment to data quality and a refined sense of "taste" in its training, prioritizing real-world task performance over easily gamed academic benchmarks. This highlights a critical tension in AI development: the allure of impressive benchmark scores versus the pursuit of genuine advancement. Edwin Chen argues that many AI labs are misguidedly optimizing for superficial metrics and engagement, akin to chasing dopamine or tabloid readership, which can lead to models that are more distracting than beneficial. This "vibe coding" and leaderboard chasing, he contends, mask fundamental failures and push AI development in a direction that may not truly serve humanity's long-term interests.
Looking ahead, Chen predicts increasing differentiation among AI models, driven by the distinct values and objective functions of the companies building them. This contrasts with an earlier expectation of commoditization, suggesting that AI will evolve to reflect the unique principles of its creators, much like different search engines or social media platforms reflect their parent companies' philosophies. Reinforcement learning (RL) environments are identified as a crucial next frontier, moving beyond static benchmarks to simulate complex, end-to-end real-world tasks. These environments allow models to learn through trial and error, developing the capacity to handle ambiguity, longer time horizons, and the cascading effects of their actions--skills essential for tackling real-world problems.
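The contrast between static benchmarks and RL environments can be sketched with a toy environment. Everything here is an assumption for illustration -- the task, the reward scheme, and the interface (a gym-style reset/step pattern) -- but it shows the key properties Chen describes: multi-step episodes, mistakes whose effects cascade, and a sparse end-of-task reward rather than per-question grading.

```python
class ToyTaskEnv:
    """Illustrative RL environment: a multi-step task where reward
    arrives only at the end. A sketch of the interface pattern,
    not a real training environment."""

    def __init__(self, num_steps: int = 5):
        self.num_steps = num_steps
        self.step_idx = 0
        self.mistakes = 0

    def reset(self) -> int:
        self.step_idx = 0
        self.mistakes = 0
        return self.step_idx  # observation: current step index

    def step(self, action: int):
        # A wrong action does not end the episode; its effect
        # cascades into the final outcome, as in real end-to-end tasks.
        if action != self.step_idx % 2:  # the "correct" action alternates
            self.mistakes += 1
        self.step_idx += 1
        done = self.step_idx >= self.num_steps
        # Sparse reward: success only if every step was handled correctly.
        reward = (1.0 if self.mistakes == 0 else 0.0) if done else 0.0
        return self.step_idx, reward, done

env = ToyTaskEnv()
obs = env.reset()
done, total = False, 0.0
while not done:
    action = obs % 2  # a toy "policy" that happens to act correctly
    obs, reward, done = env.step(action)
    total += reward
print(total)  # 1.0
```

Unlike a benchmark question with one gradable answer, the agent here is scored on the whole trajectory, so it must learn to handle the intermediate steps it is never directly graded on -- the ambiguity and longer time horizons the paragraph above describes.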
Surge AI's own research team plays a vital role, not only collaborating with clients to advance their models but also developing new benchmarks and training techniques to counter the industry's misaligned incentives. This research-driven ethos, akin to a research lab rather than a typical startup, allows Surge to focus on long-term impact and intellectual rigor, unswayed by short-term market pressures. Chen advocates for founders to build companies embodying their unique insights and values, rejecting the Silicon Valley dogma of constant pivoting and blitzscaling. He emphasizes that true innovation comes from a deep belief in a singular mission and the courage to pursue it, even when difficult.
Ultimately, the work of Surge AI and companies like it is framed not just as data provision but as a profound responsibility in shaping the future of humanity. By focusing on complex, meaningful objective functions--helping AI advance human flourishing rather than merely consuming our time--they aim to guide AI's development toward genuinely beneficial outcomes, treating AI training much like raising a child with care, values, and a focus on long-term well-being.
Action Items
- Audit AI benchmarks: Identify 3-5 common flaws (e.g., incorrect reference answers, over-reliance on narrowly objective questions) and their impact on model direction.
- Design RL environments: Create 2-3 simulations mimicking real-world tasks (e.g., AWS outage, financial analysis) for model training.
- Measure model taste differentiation: For 3-5 models, analyze objective functions and identify 2-3 distinct behavioral patterns.
- Evaluate AI's impact on humanity: Define 2-3 metrics for advancing humanity beyond engagement proxies.
- Develop runbook template: Define the required sections (e.g., setup, common failures, rollback, monitoring) to prevent knowledge silos.
Key Quotes
"we hit over a billion of revenue last year with under 100 people and i think we're going to see companies with even crazier ratios like 100 million per employee in the next few years ai is just going to get better and better and make things more efficient so that ratio just becomes inevitable"
Edwin Chen argues that the efficiency gains from AI will lead to unprecedented revenue-per-employee ratios in companies. This suggests a future where smaller, highly productive teams can achieve massive financial success. Chen's observation implies a fundamental shift in how companies are built and scaled.
"we basically never wanted to play the silicon valley game and like i always thought it was ridiculous like what did you dream of doing when you were a kid was it building a company from scratch yourself and getting into a wework or coding in your product every day or was it explaining your decisions to vcs and getting on this giant pr and fundraising hamster wheel"
Edwin Chen explains his contrarian approach to company building, rejecting the typical Silicon Valley playbook of constant pitching and fundraising. He contrasts this with a focus on building a product through dedicated effort. Chen's perspective highlights a desire for founders to prioritize technology and product development over the "hamster wheel" of venture capital.
"i think most people don't understand what quality even means in this space they think you can just throw bodies at a problem and good good data and that's completely wrong let me let me give you an example so imagine you wanted to train a model to write an eight line poem about the moon what makes it a good high quality poem if you don't think deeply about quality you'll be like is this a poem does it contain eight lines does it contain the word moon you check all of these boxes and if those sure yeah you say it's a good poem but that's completely different from what we want we are looking for nobel prize winning poetry"
Edwin Chen emphasizes that true data quality in AI training goes far beyond superficial checks. He uses the example of poetry to illustrate that high quality requires depth, nuance, and artistic merit, not just adherence to basic rules. Chen's point is that many in the field misunderstand this complexity, leading to suboptimal AI development.
"i'm worried that instead of building ai that will actually advance us as a species curing cancer solving poverty understanding the universe all these big grand questions we are optimizing for ai swap instead like we're basically teaching our models to chase dopamine instead of truth"
Edwin Chen expresses concern that the AI industry is prioritizing superficial engagement over genuine progress. He argues that current benchmarks and optimization strategies are leading AI models to chase "dopamine" -- user attention and superficial metrics -- rather than pursuing truth and solving significant global challenges. Chen believes this focus is misdirecting AI development away from its potential to advance humanity.
"i think one of the things that's going to happen in the next few years is that the models are actually going to become increasingly differentiated because of the personalities and behaviors that the different labs have and the kind of objective functions that they are optimizing the models for"
Edwin Chen predicts that AI models will become more distinct from each other due to the unique values and goals of the companies developing them. He suggests that instead of commoditization, we will see models with varied "personalities" and behaviors, shaped by the specific objectives their creators pursue. Chen's insight points to a future where AI differentiation is driven by underlying company principles.
"i definitely wish i'd known that you could build a company by being heads down and doing great research and simply building something amazing and not by constantly tweeting and hyping and fundraising"
Edwin Chen reflects that he would have started his company sooner if he had realized that building a successful company could be achieved through focused research and product excellence, rather than constant external promotion and fundraising. He contrasts this with the conventional Silicon Valley approach, suggesting that genuine quality can cut through the noise. Chen's statement offers a counter-narrative for founders seeking a different path.
Resources
External Resources
Books
- "Stories of Your Life and Others" by Ted Chiang - Mentioned as an all-time favorite short story and inspiration for the movie Arrival.
- "The Myth of Sisyphus" by Albert Camus - Mentioned as a book the author finds inspiring, particularly the final chapters.
- "Le Ton Beau de Marot: In Praise of the Music of Language" by Douglas Hofstadter - Mentioned for its exploration of translation and its resonance with ideas about data and quality in LLMs.
- "Gödel, Escher, Bach: An Eternal Golden Braid" by Douglas Hofstadter - Mentioned as a more famous book by the author, related to "Le Ton Beau de Marot."
Videos & Documentaries
- "Interstellar" on Prime Video - Mentioned as a favorite film.
- "Arrival" on Prime Video - Mentioned as a movie based on the short story "Story of Your Life."
- "Travelers" on Netflix - Mentioned as a favorite TV show about travelers from the future sent back in time to prevent an apocalypse.
- "Contact" - Mentioned as an all-time favorite movie involving scientists deciphering alien communication.
Research & Studies
- The Bitter Lesson (URL: http://www.incompleteideas.net/IncIdeas/BitterLesson.html) - Referenced in the context of Richard Sutton's view that LLMs might be a dead end.
- Richard Sutton--Father of RL thinks LLMs are a dead end (Source: dwarkesh.com) - Discussed in relation to the idea that LLMs might plateau.
Tools & Software
- Waymo (URL: https://waymo.com) - Mentioned as a magical product that exceeded expectations and felt like living in the future.
- Claude Code (URL: https://www.claude.com/product/claude-code) - Discussed as a product that was significantly better at coding and writing than other models for a long time.
- Gemini 3 (URL: https://aistudio.google.com/models/gemini-3) - Mentioned in the context of benchmarks and how models are trained to perform on them.
- Sora (URL: https://openai.com/sora) - Mentioned in the context of AI products and what they reveal about company values.
- Grok (URL: https://grok.com) - Mentioned as an example of an AI with a distinct personality and approach to answering questions.
- Coda (URL: https://coda.io/lenny) - Mentioned as a collaborative workspace used for managing podcasts and communities.
- Vanta (URL: https://vanta.com/lenny) - Mentioned as a service that automates compliance and simplifies security.
- WorkOS (URL: https://workos.com/lenny) - Mentioned as a modern identity platform for B2B SaaS.
Articles & Papers
- "The 100-person AI lab that became Anthropic and Google's secret weapon | Edwin Chen (Surge AI)" (Source: Lenny's Podcast) - The episode title itself, providing context for the discussion.
- "Surge AI" (Source: lennysnewsletter.com) - Transcript of the podcast episode.
- "OpenAI’s CPO on how AI changes must-have skills, moats, coding, startup playbooks, more | Kevin Weil (CPO at OpenAI, ex-Instagram, Twitter)" (Source: Lenny's Newsletter) - Mentioned in the context of discussing AI's impact on product teams.
- "Anthropic’s CPO on what comes next | Mike Krieger (co-founder of Instagram)" (Source: Lenny's Newsletter) - Mentioned in the context of discussing AI's impact on product teams.
People
- Edwin Chen - Founder and CEO of Surge AI, guest on the podcast discussing AI, company building, and data quality.
- Noam Chomsky - Mentioned as an influence at MIT, related to the author's early fascination with language.
- Richard Sutton - Famous AI researcher who discussed the idea that LLMs might be a dead end.
- Terrence Rohan - A VC mentioned as collaborating on a post about finding generational companies.
- Brian Armstrong - Founder of Coinbase, mentioned for his unique background that enabled him to start the company.
- Kevin Weil - CPO at OpenAI, mentioned in a discussion about AI's impact on product teams.
- Mike Krieger - Co-founder of Instagram and CPO at Anthropic, mentioned in a discussion about AI's impact on product teams.
- Warren Buffett (X: https://x.com/WarrenBuffett) - Mentioned in the context of company values and long-term incentives.
Organizations & Institutions
- Surge AI (URL: https://surgehq.ai) - The company founded by Edwin Chen, focused on AI data and training.
- Google - Mentioned as a former employer of Edwin Chen and a company involved in AI research.
- Facebook - Mentioned as a former employer of Edwin Chen and a company involved in AI research.
- Twitter - Mentioned as a former employer of Edwin Chen and a company involved in AI research.
- MIT - Mentioned as the author's alma mater for studying mathematics, computer science, and linguistics.
- Anthropic - Mentioned as a company with a principled approach to AI development.
- OpenAI - Mentioned in the context of AI development and its CPO's appearance on the podcast.
- NFL (National Football League) - Mentioned in the context of sports analytics.
- New England Patriots - Mentioned as an example team for performance analysis.
- Pro Football Focus (PFF) - Mentioned as a data source for player grading.
- DeepMind - Mentioned as a research company that inspired the author's view on building companies.
Websites & Online Resources
- Lenny's Newsletter (URL: https://www.lennysnewsletter.com) - The platform hosting the podcast and associated content.
- Surge's blog (URL: https://surgehq.ai/blog) - A resource for content from Surge AI.
- LM Arena (URL: https://lmsys.org/blog/2023-03-30-arena-leaderboard/) - A popular online leaderboard for AI models, discussed critically.
- FlowingData (URL: https://flowingdata.com/2012/07/09/soda-versus-pop-on-twitter) - Mentioned for a dataset on soda versus pop terminology.
Other Resources
- Reinforcement Learning (RL) - Discussed as a method for training models to reach a reward by simulating environments.
- RL Environments - Simulations of the world used to train models on end-to-end tasks.
- Supervised Fine-Tuning (SFT) - A post-training method for models, analogous to mimicking a master.
- RLHF (Reinforcement Learning from Human Feedback) - A post-training method, analogous to learning from feedback on multiple essays.
- Rubrics and Verifiers (Evals) - Post-training methods involving detailed feedback, analogous to being graded.
- AGI (Artificial General Intelligence) - Discussed in terms of timelines and the direction of AI development.
- Soda versus Pop - A topic discussed in relation to regional language differences and a map created by the guest.