
Building AI Native Software by Composing Multiple Models

Resources

Books

  • "Writing for Software Developers" - Philip Kiely's book, discussed on a previous Software Engineering Radio episode.

Videos & Documentaries

  • N/A

Research & Studies

  • N/A

Tools & Software

  • GPT-5 - Mentioned as an example of a capable, off-the-shelf AI model.
  • Gemini - Mentioned as an example of a capable, off-the-shelf AI model.
  • Claude - Mentioned as an example of a capable, off-the-shelf AI model.
  • Llama - Mentioned as an example of an open-source large language model.
  • Qwen - Mentioned as an example of an open-source large language model.
  • Mistral - Mentioned as an example of an open-source large language model.
  • Descript - Mentioned as an example of a product that uses multiple AI models for content creation.
  • Sourcegraph - Mentioned as a company building code editors that integrate codebase context with AI.
  • Zed - Mentioned as a company building code editors that integrate codebase context with AI.
  • Grafana - Mentioned as a standard observability tool used for dashboards and alerts.
  • Transformers - Hugging Face library mentioned as an underlying technology for running generative models.
  • Diffusers - Hugging Face library mentioned as an underlying technology for image models.
  • vLLM - Mentioned as an example of an inference engine for running models.
  • SGLang - Mentioned as an example of an inference engine for running models.
  • TensorRT-LLM - Mentioned as an example of an inference engine for running models.

Articles & Papers

  • N/A

People Mentioned

  • Itamar Friedman - Guest on a previous episode discussing automated testing with generative AI (Episode 633).
  • Rishi Singh - Guest on a previous episode discussing using GenAI for test code generation (Episode 603).
  • Ipek Ozkaya - Guest on a previous episode discussing GenAI for software architecture (Episode 626).
  • Simon Willison - Mentioned as someone who has written about prompt injection.

Organizations & Institutions

  • Baseten - Philip Kiely's employer, an inference platform company.
  • IEEE Computer Society - Sponsor of Software Engineering Radio.
  • IEEE Software Magazine - Sponsor of Software Engineering Radio.
  • OpenAI - Mentioned in the context of their models and services.
  • Nvidia - Mentioned in the context of their GTC conference.

Courses & Educational Resources

  • N/A

Websites & Online Resources

  • se-radio.net - Website for Software Engineering Radio.
  • computer.org - Website associated with the IEEE Computer Society.
  • Hugging Face - Mentioned as a place to download open-source model weights.
  • se-radio.slack.com - Slack channel for Software Engineering Radio.

Other Resources

  • Multi-Agent AI - The primary topic of the episode, focusing on composing multiple AI models.
  • AI Native Software - Software built from the ground up with AI capabilities.
  • Function Calling / Tool Use - Technical implementation for agentic AI to interact with tools.
  • Retrieval Augmented Generation (RAG) - A technique to introduce new context into models dynamically.
  • Embedding Models - Used for RAG to encode semantic meaning.
  • Prompt Injection - A security vulnerability where prompts can alter a model's intended behavior.
  • Evals - Quality benchmarks created for specific products or domains.
  • Alignment - The philosophical and technical challenge of ensuring AI models are helpful, harmless, and honest.
  • Inference - The phase where a trained model is used to generate responses to user queries.
  • Training - The phase where data is fed to a model to improve its performance.
  • Weights (Model Weights) - The parameters within a neural network that determine its behavior.
  • Parameters - Individual numbers within a model that influence its output.
  • GPU - Graphics Processing Unit, hardware optimized for parallel computations essential for AI inference.
  • CPU - Central Processing Unit, the primary processor in a computer, less suited for large-scale AI parallel processing.
  • Tensor Cores - Specialized processing units within GPUs designed for matrix math.
  • Quantization - A technique to reduce the precision of model weights to decrease memory usage and improve speed.
  • Speculative Decoding - An inference algorithm that predicts future tokens ahead of time so they can be cheaply verified, speeding up generation.
  • Batch Sizes - The number of samples processed by the model at once during inference.
  • Sequence Lengths - The number of tokens the model considers at once.
  • Temperature - A parameter that controls the randomness of model output.
  • Observability - The practice of monitoring and understanding the internal state of a system.
  • Multi Cloud - Utilizing services from multiple cloud providers.
  • Multi Region - Deploying applications across different geographical regions for resilience and performance.
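The Temperature entry above can be illustrated with a minimal, self-contained sketch. This is plain Python with hypothetical logit values, not any particular inference engine's implementation:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/temperature before applying softmax.
    Lower temperature sharpens the distribution toward the
    highest-scoring token; higher temperature flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical token scores

cold = softmax_with_temperature(logits, temperature=0.5)
hot = softmax_with_temperature(logits, temperature=2.0)

# At low temperature the top token takes most of the probability mass;
# at high temperature the distribution moves toward uniform.
print(cold[0] > hot[0])  # True
```

Setting temperature near zero makes generation nearly deterministic (always picking the top token), while higher values produce more varied output.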

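The Quantization entry above can likewise be sketched with a toy example. This is a simplified symmetric int8 scheme over made-up weight values, not any specific library's method:

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats onto the integer
    range [-127, 127] using a single per-tensor scale factor."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.03, -0.51, 0.27, 1.0, -0.98]  # toy weight values

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each int8 value fits in one byte instead of fp32's four, at the
# cost of a small rounding error bounded by the quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(max_err < scale)  # True
```

The memory saving (here 4x versus fp32) is why quantization reduces GPU memory usage and can improve inference speed, at the cost of slight precision loss.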
---
Handpicked links, AI-assisted summaries. Human judgment, machine efficiency.
This content is a personally curated review and synopsis derived from the original podcast episode.