
Building AI Native Software by Composing Multiple Models

Resources

Books

  • "Writing for Software Developers" - Philip Kiely's book, discussed on a previous Software Engineering Radio episode.

Videos & Documentaries

  • N/A

Research & Studies

  • N/A

Tools & Software

  • GPT-5 - Mentioned as an example of a capable, off-the-shelf AI model.
  • Gemini - Mentioned as an example of a capable, off-the-shelf AI model.
  • Claude - Mentioned as an example of a capable, off-the-shelf AI model.
  • Llama - Mentioned as an example of an open-source large language model.
  • Qwen - Mentioned as an example of an open-source large language model.
  • Mistral - Mentioned as an example of an open-source large language model.
  • Descript - Mentioned as an example of a product that uses multiple AI models for content creation.
  • Sourcegraph - Mentioned as a company building code editors that integrate codebase context with AI.
  • Zed - Mentioned as a company building code editors that integrate codebase context with AI.
  • Grafana - Mentioned as a standard observability tool used for dashboards and alerts.
  • Transformers - Hugging Face library mentioned as an underlying technology for running generative models.
  • Diffusers - Hugging Face library mentioned as an underlying technology for image models.
  • vLLM - Mentioned as an example of an inference engine for running models.
  • SGLang - Mentioned as an example of an inference engine for running models.
  • TensorRT-LLM - Mentioned as an example of an inference engine for running models.

Articles & Papers

  • N/A

People Mentioned

  • Itamar Friedman - Guest on a previous episode discussing automated testing with generative AI (Episode 633).
  • Rishi Singh - Guest on a previous episode discussing using GenAI for test code generation (Episode 603).
  • Ipek Ozkaya - Guest on a previous episode discussing GenAI for software architecture (Episode 626).
  • Simon Willison - Mentioned as someone who has written about prompt injection.

Organizations & Institutions

  • Baseten - Philip Kiely's employer, an inference platform company.
  • IEEE Computer Society - Sponsor of Software Engineering Radio.
  • IEEE Software Magazine - Sponsor of Software Engineering Radio.
  • OpenAI - Mentioned in the context of their models and services.
  • Nvidia - Mentioned in the context of their GTC conference.

Courses & Educational Resources

  • N/A

Websites & Online Resources

  • se-radio.net - Website for Software Engineering Radio.
  • computer.org - Website associated with the IEEE Computer Society.
  • Hugging Face - Mentioned as a place to download open-source model weights.
  • se-radio.slack.com - Slack channel for Software Engineering Radio.

Other Resources

  • Multi-Agent AI - The primary topic of the episode, focusing on composing multiple AI models.
  • AI Native Software - Software built from the ground up with AI capabilities.
  • Function Calling / Tool Use - Technical implementation for agentic AI to interact with tools.
  • Retrieval Augmented Generation (RAG) - A technique to introduce new context into models dynamically.
  • Embedding Models - Used for RAG to encode semantic meaning.
  • Prompt Injection - A security vulnerability where prompts can alter a model's intended behavior.
  • Evals - Quality benchmarks created for specific products or domains.
  • Alignment - The philosophical and technical challenge of ensuring AI models are helpful, harmless, and honest.
  • Inference - The phase where a trained model is used to generate responses to user queries.
  • Training - The phase where data is fed to a model to improve its performance.
  • Weights (Model Weights) - The parameters within a neural network that determine its behavior.
  • Parameters - Individual numbers within a model that influence its output.
  • GPU - Graphics Processing Unit, hardware optimized for parallel computations essential for AI inference.
  • CPU - Central Processing Unit, the primary processor in a computer, less suited for large-scale AI parallel processing.
  • Tensor Cores - Specialized processing units within GPUs designed for matrix math.
  • Quantization - A technique to reduce the precision of model weights to decrease memory usage and improve speed.
  • Speculative Decoding - An inference algorithm that predicts future tokens ahead of time so they can be cheaply verified, speeding up generation.
  • Batch Sizes - The number of samples processed by the model at once during inference.
  • Sequence Lengths - The number of tokens the model considers at once.
  • Temperature - A parameter that controls the randomness of model output.
  • Observability - The practice of monitoring and understanding the internal state of a system.
  • Multi Cloud - Utilizing services from multiple cloud providers.
  • Multi Region - Deploying applications across different geographical regions for resilience and performance.
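The Temperature entry above can be illustrated with a minimal, self-contained sketch. This is plain Python with hypothetical logit values, not any particular inference engine's implementation:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/temperature before applying softmax.
    Lower temperature sharpens the distribution toward the
    highest-scoring token; higher temperature flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical token scores

cold = softmax_with_temperature(logits, temperature=0.5)
hot = softmax_with_temperature(logits, temperature=2.0)

# At low temperature the top token takes most of the probability mass;
# at high temperature the distribution moves toward uniform.
print(cold[0] > hot[0])  # True
```

Setting temperature near zero makes generation nearly deterministic (always picking the top token), while higher values produce more varied output.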

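The Quantization entry above can likewise be sketched with a toy example. This is a simplified symmetric int8 scheme over made-up weight values, not any specific library's method:

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats onto the integer
    range [-127, 127] using a single per-tensor scale factor."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.03, -0.51, 0.27, 1.0, -0.98]  # toy weight values

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each int8 value fits in one byte instead of fp32's four, at the
# cost of a small rounding error bounded by the quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(max_err < scale)  # True
```

The memory saving (here 4x versus fp32) is why quantization reduces GPU memory usage and can improve inference speed, at the cost of slight precision loss.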
---
Handpicked links, AI-assisted summaries. Human judgment, machine efficiency.
This content is a personally curated review and synopsis derived from the original podcast episode.