Specialized AI Inference: Infrastructure, Performance, and Developer Experience

Resources

Here are the external resources mentioned in the podcast episode:

Companies & Organizations

  • Baseten: The company discussed in the episode, focused on AI inference.
  • OpenAI: Mentioned as a provider of large language models that some companies initially use.
  • Waymo: Mentioned as a company founded around the same time as Baseten.
  • Cloudflare: Mentioned by the host as an example of a company with a long journey before rapid growth.
  • Patreon: Mentioned as a company experimenting with Whisper for subtitles.
  • Nvidia: Mentioned as a provider of GPUs and their technology (Hopper, B200s, B300s, GB series).
  • AMD: Mentioned as a vendor whose chips Baseten has worked with.
  • Anthropic: Mentioned as a provider of large language models.
  • Google: Mentioned in the context of large AI model providers.

Products & Services

  • AI Inference: The core technology and market discussed in the episode.
  • Large Language Models (LLMs): General term for the AI models discussed.
  • Whisper: OpenAI's speech-recognition model, mentioned as the tool Patreon uses for generating subtitles.
  • DALL-E 2: Mentioned as a benchmark for image generation models.
  • Stable Diffusion: Mentioned as an influential open-source image generation model.
  • Riffusion: A project mentioned that used a fine-tuned Stable Diffusion model to generate music.
  • GPT (Generative Pre-trained Transformer): Mentioned implicitly through the discussion of OpenAI and ChatGPT.
  • ChatGPT: Mentioned as a product that set consumer expectations for AI.
  • LLaMA: Meta's open-weight model family, mentioned as an example of a commodity offering.
  • TFLM (TensorFlow Lite for Microcontrollers): Mentioned as an optimization framework for AI models.
  • DLN (Deep Learning Neural Network): Mentioned as a type of framework.
  • SGLang: Mentioned as an open-source framework for AI inference.
  • CUDA: Mentioned as Nvidia's parallel computing platform and its importance in the AI ecosystem.
  • Hopper: Nvidia's data-center GPU architecture preceding Blackwell (the H100 generation).
  • B200s and B300s: Nvidia's Blackwell-generation data-center GPUs.
  • GB series: Nvidia's Grace Blackwell platform, which pairs a Grace CPU with Blackwell GPUs.
  • H100: Nvidia's Hopper-generation data-center GPU.

Concepts & Technologies

  • AI Inference: The process of using a trained AI model to make predictions.
  • Application Layer: The layer where user-facing applications are built.
  • Model Deployment: The process of making AI models available for use.
  • Model Serving: The process of delivering AI model predictions upon request.
  • Scalability: The ability of a system to handle increasing workloads.
  • Runtime: The software environment in which a model runs.
  • GPU (Graphics Processing Unit): Hardware used for accelerating AI computations.
  • KV Cache: A cache of the attention keys and values for tokens already processed, reused during decoding so they are not recomputed for every new token.
  • SLAs (Service Level Agreements): Agreements on performance and availability.
  • Tokens per second: A metric for the overall generation speed of a language model.
  • Time to first token: The latency from sending a request until the first output token arrives, largely determined by the prefill phase (see the first sketch after this list).
  • Time per output token: The average time to generate each subsequent token during decoding.
  • Throughput: The rate at which a system can process requests or tokens in aggregate.
  • Memory bandwidth: The rate at which data can be read from or written to memory.
  • Quantization: A technique that stores model weights (and sometimes activations) at lower numerical precision to reduce memory use and increase speed (see the second sketch after this list).
  • Cost per token: The cost associated with generating each token.
  • Prefill: The first phase of LLM inference, which processes the entire input prompt in parallel and populates the KV cache.
  • Decode: The second phase, which generates output tokens one at a time, reading from the KV cache at each step.
  • Flash Attention: An attention implementation that restructures the computation to reduce reads and writes to GPU memory.
  • Continuous Batching: A serving technique that adds and removes requests from the active batch at each decoding step, keeping the GPU busy instead of waiting for the longest request in a batch to finish.
  • AGI (Artificial General Intelligence): Hypothetical future AI with human-like cognitive abilities.
  • Reinforcement Learning (RL): A type of machine learning.
  • Tool Use: The ability of AI models to use external tools.
  • Sandbox: An isolated environment for executing code.
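
To make the latency metrics above concrete, here is a minimal Python sketch (not from the episode) that times a streaming generation request and derives time to first token, time per output token, and tokens per second. The `stream_tokens` iterable and the `fake_stream` generator are hypothetical stand-ins for whatever streaming client an inference service exposes; only the metric arithmetic is the point.

```python
import time
from typing import Iterable


def measure_generation(stream_tokens: Iterable[str]) -> dict:
    """Time a streaming generation and derive common inference latency metrics."""
    start = time.perf_counter()
    first_token_at = None
    n_tokens = 0

    for _token in stream_tokens:
        now = time.perf_counter()
        if first_token_at is None:
            first_token_at = now  # end of prefill, start of decode
        n_tokens += 1

    end = time.perf_counter()
    if first_token_at is None:
        raise ValueError("stream produced no tokens")

    ttft = first_token_at - start                 # time to first token
    decode_time = end - first_token_at            # time spent in decode
    tpot = decode_time / max(n_tokens - 1, 1)     # time per output token
    tokens_per_second = n_tokens / (end - start)  # overall generation speed
    return {
        "time_to_first_token_s": ttft,
        "time_per_output_token_s": tpot,
        "tokens_per_second": tokens_per_second,
        "output_tokens": n_tokens,
    }


def fake_stream(n: int = 50):
    """Hypothetical stream: ~0.5 s of prefill, then ~20 ms per decoded token."""
    time.sleep(0.5)
    for i in range(n):
        time.sleep(0.02)
        yield f"token{i}"


if __name__ == "__main__":
    print(measure_generation(fake_stream()))
```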
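
Similarly, as a rough illustration of quantization, the sketch below applies symmetric per-tensor int8 quantization to a toy weight matrix and reports the storage savings and reconstruction error. Real inference stacks use more elaborate schemes (per-channel scales, FP8, activation quantization), so this is the idea rather than a production recipe.

```python
import numpy as np


def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization: store int8 weights plus one scale."""
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct approximate float weights from the int8 values and scale."""
    return q.astype(np.float32) * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(1024, 1024)).astype(np.float32)  # toy weight matrix

    q, scale = quantize_int8(w)
    w_hat = dequantize(q, scale)

    # int8 storage is 4x smaller than float32; error stays small for well-behaved weights.
    print("bytes fp32:", w.nbytes, "bytes int8:", q.nbytes)
    print("mean abs error:", float(np.abs(w - w_hat).mean()))
```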

People

  • Tuhin Srivastava: Co-founder and CEO of Baseten.
  • Lucas Ball: Host of the podcast.
  • Sam: A friend of the interviewee who developed Riffusion.
  • Sarah: A board member at Baseten.

---
Handpicked links, AI-assisted summaries. Human judgment, machine efficiency.
This content is a personally curated review and synopsis derived from the original podcast episode.