AI's Reasoning Advances and Trust Challenges in Health, Content, and Math
TL;DR
- DeepSeek's interleaved thinking feature, which intersperses reasoning throughout dialogue and assesses information credibility, enables more accurate results and better reasoning logic from AI models.
- Axiom Math's architecture, coupling a language modeling kernel with formal proof systems, aims to overcome LLM limitations in pure mathematics by using verification-driven training signals.
- The Withings Body Scan 2 uses AI to provide medical-grade health monitoring at home, measuring over 60 biomarkers previously only available clinically.
- AI-powered wearables like the Vochi smart ring offer voice-activated note-taking and transcription, functioning as cognitive assistants to aid memory and accessibility.
- Authenticity in content creation is shifting from polished production to genuine, unscripted moments, with community and social proof becoming key trust signals.
- As AI-generated content becomes easier to produce, consumers need to take proactive steps, such as identity verification, to establish trust and combat deepfakes.
Deep Dive
The increasing sophistication of AI models, particularly in areas like recursive inference and specialized mathematical reasoning, presents both opportunities and challenges. While AI can now perform complex multi-step research and generate highly accurate mathematical proofs, concerns about its potential for hallucination, misuse, and the erosion of trust are becoming paramount. This necessitates a shift towards AI systems that prioritize verifiable accuracy and user authentication, alongside a societal re-evaluation of authenticity in digital interactions.
AI's advancing reasoning capabilities are evident in DeepSeek's interleaved thinking feature, which lets models intersperse reasoning throughout a dialogue and assess information credibility between actions. Alongside a surge in DeepSeek's monthly active users to 132 million, this points to growing demand for more nuanced AI interactions beyond simple question-and-answer formats. Meanwhile, a startup named Axiom Math is developing AI mathematicians that tightly couple language modeling with formal proof systems, aiming to overcome the hallucination problem prevalent in general-purpose LLMs. Mathematician Joel David Hamkins' critique of current AI's unreliability in pure mathematics exemplifies that problem.
In the health sector, AI is enabling sophisticated personal health monitoring that was previously confined to clinical settings. Stanford's SleepFM can predict over 130 health conditions from sleep data, and devices like the Withings Body Scan 2 offer comprehensive biomarker analysis, including cardiac function and cellular health, powered by AI for personalized insights. These technologies promise early disease detection and proactive health management, yet they also raise significant privacy and data security concerns, particularly the possibility that insurers or other entities could access health data. The conversation underscores a societal tension between the desire for personalized health insights and the inherent risks of data commodification and misuse.
A critical emerging theme is the struggle to maintain authenticity and trust in an AI-saturated digital landscape. The ease with which AI can generate realistic fake content, from deepfakes to non-consensual imagery, necessitates new methods for verifying identity and content origin. This is driving a push for verified online presences, such as identity verification through passports for online interactions, to establish trust and combat misinformation. The underlying principle is that as AI becomes more capable of mimicking human output, the value shifts to demonstrable human origin and established community trust, moving beyond superficial content production.
The implications of these trends are far-reaching. For content creators and public figures, authenticity is becoming a key differentiator, requiring more than just a "low-fi" aesthetic; it demands verifiable human origin and a demonstrated community. This evolving landscape suggests a future where online interactions will be increasingly scrutinized for authenticity, with a potential gravitation towards in-person experiences built on established digital trust. Furthermore, the pervasive issue of AI-generated abuse, particularly non-consensual imagery, highlights the urgent need for robust ethical frameworks and technological solutions to safeguard individuals and uphold societal trust in digital spaces. The challenge lies in balancing the benefits of AI with the imperative to protect privacy, prevent misuse, and ensure that human connection remains at the core of our interactions.
Action Items
- Audit AI model outputs: For 3-5 core reasoning tasks, compare LLM performance against human expert evaluations to identify systematic reasoning errors.
- Implement identity verification: For 100% of public-facing community events, require passport-level verification (e.g., via NotBot) to prevent deepfakes and ensure participant authenticity.
- Develop content authenticity framework: Define 5-7 criteria for distinguishing human-generated content from AI-generated content, focusing on unique personal perspectives and community engagement signals.
- Establish personal baseline health metrics: Utilize a comprehensive health monitoring device (e.g., Withings Body Scan 2) to collect daily physiological data for at least 3 months to establish individual baselines.
- Create a secure content creation workflow: For all outgoing content, integrate digital watermarking or verifiable signatures to provide a mechanism for public verification of authenticity.
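The first action item above (auditing model outputs against expert evaluations) could be sketched as follows. This is a minimal illustration, not a method described in the episode: the task names, the records, the exact-match comparison, and the 0.5 threshold are all hypothetical choices made for the example.

```python
from collections import defaultdict

# Hypothetical audit records: (task name, model answer, expert answer).
RECORDS = [
    ("unit conversion", "12.5", "12.5"),
    ("unit conversion", "40", "4.0"),
    ("date arithmetic", "Tuesday", "Tuesday"),
    ("date arithmetic", "Friday", "Friday"),
    ("proof checking", "valid", "invalid"),
    ("proof checking", "valid", "invalid"),
    ("proof checking", "invalid", "invalid"),
]


def error_rates(records):
    """Return {task: fraction of items where the model disagrees with the expert}."""
    totals = defaultdict(int)
    misses = defaultdict(int)
    for task, model_answer, expert_answer in records:
        totals[task] += 1
        # Assumption: a case-insensitive exact match counts as agreement.
        if model_answer.strip().lower() != expert_answer.strip().lower():
            misses[task] += 1
    return {task: misses[task] / totals[task] for task in totals}


def flag_systematic(rates, threshold=0.5):
    """Return tasks whose disagreement rate is at or above the threshold, sorted by name."""
    return sorted(task for task, rate in rates.items() if rate >= threshold)


rates = error_rates(RECORDS)
flagged = flag_systematic(rates)
print(rates)
print(flagged)
```

A task that is flagged repeatedly across audit rounds is a candidate for a systematic reasoning error rather than a one-off mistake. In practice, exact matching would likely need to be replaced by an expert-graded rubric.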
Key Quotes
"what we're seeing in the advancement of reasoning in ai models is this idea of recursive uh inference so that the loop we discussed this yesterday and this concept of looping like the ability to achieve with a smaller model or with even a large model more accurate results and better reasoning logic by doing recursive inference returning loops with some kind of memory that it acts on"
Andy explains the concept of "recursive inference" in AI models, highlighting its potential to improve accuracy and reasoning logic through iterative loops with memory. This suggests a more sophisticated approach to AI problem-solving beyond single-pass processing.
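The looping-with-memory idea Andy describes can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `toy_model` stands in for a model call, `critic` stands in for whatever feedback the loop acts on, and the number-guessing task is only a placeholder for a real reasoning problem. A real system would not have an oracle critic like this one.

```python
from typing import Callable, List, Tuple

# Memory holds every earlier attempt together with the feedback it received.
Memory = List[Tuple[int, str]]


def toy_model(memory: Memory, low: int = 0, high: int = 100) -> int:
    """Stand-in for a model call: propose an answer consistent with all stored feedback."""
    for answer, feedback in memory:
        if feedback == "too low":
            low = max(low, answer + 1)
        elif feedback == "too high":
            high = min(high, answer - 1)
    return (low + high) // 2


def recursive_inference(
    model: Callable[[Memory], int],
    critic: Callable[[int], str],
    max_loops: int = 10,
) -> Tuple[int, Memory]:
    """Loop: answer, collect feedback, store it in memory, answer again."""
    memory: Memory = []
    answer = model(memory)
    for _ in range(max_loops):
        feedback = critic(answer)
        if feedback == "ok":
            break
        memory.append((answer, feedback))
        answer = model(memory)
    # If max_loops is exhausted, the last (possibly unaccepted) answer is returned.
    return answer, memory


def critic(answer: int, target: int = 37) -> str:
    """Hypothetical feedback source that knows the target value."""
    if answer < target:
        return "too low"
    if answer > target:
        return "too high"
    return "ok"


final_answer, trace = recursive_inference(toy_model, critic)
print(final_answer, trace)
```

The point of the sketch is that each pass reads the accumulated memory rather than starting from scratch, which is what distinguishes the loop from repeated single-pass sampling.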
"mathematicians are working on proofs and developing new theorems in the sort of the ethereal world of mathematics llms are not very impressive and one of the world's biggest mathematicians a fellow named joel david hamkins has slammed ai models used for solving mathematics and calls them zero and garbage adding he doesn't find them useful at all he highlighted ai's frustrating tendency to confidently assert incorrect you know conclusions and resist correction"
Andy relays mathematician Joel David Hamkins' strong criticism of current AI models for pure mathematics, noting their tendency to produce incorrect conclusions with unwarranted confidence. This points to a significant gap between AI's capabilities in general tasks and its reliability in highly specialized, abstract domains like advanced mathematics.
"its core architectural idea is to move from generic next token prediction which creates hallucinations as we know in llms um and instead use a stack that tightly couples a language modeling sort of kernel with formal proof systems and programmatic reasoning from mathematics so it's not trained on the broad web and conversational data so it's not going to spin out and you can't kind of jailbreak it and have it talk about you know politics or anything"
Andy describes Axiom Math's innovative approach, which moves away from standard LLM "next token prediction" to a system that integrates formal proof systems and programmatic reasoning. This architectural shift aims to eliminate hallucinations common in LLMs by grounding the AI in mathematical logic and specific domains, rather than broad web data.
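The coupling of a proposer with a formal checker can be sketched as a propose-and-verify loop. This is not Axiom Math's implementation, which the episode does not detail. In this toy version an exact arithmetic check stands in for a formal proof system, a fixed list stands in for model proposals, and the 1/0 reward is a hypothetical example of a verification-driven training signal.

```python
from typing import Iterable, List, Optional, Tuple

Candidate = Tuple[int, int]


def verify_factorization(n: int, candidate: Candidate) -> bool:
    """Stand-in for a formal checker: accept only exact, nontrivial factorizations of n."""
    a, b = candidate
    return a > 1 and b > 1 and a * b == n


def propose_and_verify(
    n: int, proposals: Iterable[Candidate]
) -> Tuple[Optional[Candidate], List[Tuple[Candidate, int]]]:
    """Check each proposal; return the first verified one plus a reward log.

    The reward log (1 for verified, 0 for rejected) is the kind of signal
    that could be fed back into training.
    """
    rewards: List[Tuple[Candidate, int]] = []
    for candidate in proposals:
        ok = verify_factorization(n, candidate)
        rewards.append((candidate, 1 if ok else 0))
        if ok:
            return candidate, rewards
    return None, rewards


# Hypothetical proposals, two of them wrong, as an unverified model might produce.
result, reward_log = propose_and_verify(91, [(9, 10), (1, 91), (7, 13)])
print(result, reward_log)
```

Because nothing is returned unless the checker accepts it, a confidently wrong proposal is filtered out instead of being presented as an answer.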
"65 000 participants uh their data was 600 000 hours of their data from those participants analyzing brain waves heart activity breathing and muscle signals so that sounds like this is something that would come from the sensors on your head"
Beth introduces SleepFM, a new foundation model from Stanford that analyzes extensive sleep data from 65,000 participants. This model's ability to process brain waves, heart activity, breathing, and muscle signals indicates a comprehensive approach to understanding health through sleep patterns.
"it adds ai personalization using the collective network of the withings ecosystem eventually where you would have billions of measurements from hundreds of thousands or millions of users of this system and it would establish your individual physiological baseline personally but it would also compare that to all of the others that it's collecting so that you can see deviations not only in your own baseline but deviations from the expected you know measurements that would come from other people of similar you know age and health levels and so on"
Andy explains how the Withings Body Scan 2 utilizes AI personalization by comparing an individual's health metrics against a vast dataset from its user ecosystem. This feature allows for the establishment of a personal baseline and comparison against aggregated data from similar individuals, offering insights into deviations from expected health norms.
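The two comparisons Andy describes, against a personal baseline and against similar users, can be illustrated with z-scores. This is a minimal sketch, not Withings' method, which the episode does not specify. The resting heart rate readings, the cohort values, and the threshold of 2.0 are all made up for the example.

```python
from statistics import mean, stdev


def z_score(value, reference):
    """Number of sample standard deviations that value lies from the reference mean."""
    spread = stdev(reference)
    if spread == 0:
        raise ValueError("reference values have no variation")
    return (value - mean(reference)) / spread


def is_deviation(z, threshold=2.0):
    """Hypothetical rule: flag readings at least `threshold` deviations from the mean."""
    return abs(z) >= threshold


# Made-up resting heart rate readings in beats per minute.
personal_history = [62, 64, 63, 61, 65, 63, 62]
cohort = [58, 66, 70, 62, 74, 68, 64, 72]
today = 71

personal_z = z_score(today, personal_history)
cohort_z = z_score(today, cohort)
print(is_deviation(personal_z), is_deviation(cohort_z))
```

With these numbers the reading is unusual for the individual but unremarkable for the cohort, which is the case a personal baseline catches and a population norm alone would miss.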
"the memo from adam messery who i have historically really appreciated his takes on things is is you know that content is just absolute commodity you want to make an authentic video versus a polished video remember like how instagram you had your perfect grids and everybody was on vacation and everybody was a supermodel that transition to more tiktok like vibes still a little bit produced you know and now it's a little less produced"
Anne references a memo from Adam Mosseri, suggesting that authentic, less polished content is becoming more valuable than highly produced content. This shift, observed in the evolution from Instagram's curated feeds to TikTok's more spontaneous style, indicates a move towards content that feels more genuine and less manufactured.
"but it's about the footprint that you've created and the community that you've created and the social proof that you have that you're a real person and that people like you and trust you and want to hang out with you like those are the signals that we should do business if we're thinking about it just from a business let alone social"
Anne elaborates on Mosseri's perspective, arguing that true authenticity in content creation is demonstrated through an established footprint, community, and social proof of being a real, trustworthy person. These elements, rather than just unpolished presentation, are presented as the key signals for building business and social connections in the current landscape.
"the other thing right only you can create that content only anne andy beth jude quinn jennifer how only you what are the people only you like your perspective right because like ai can do everything else and i and i think people maybe people say hey this is hard it's like well one because it's hard people put a premium on it yeah true it actually is easier because yeah you know what it's easy to just roll it just start talking about random stuff it is actually easier but you still have to be coherent like it still has to kind of make sense"
Carl highlights that the unique perspective and coherence of a human creator are what AI cannot replicate, even as AI becomes capable of producing polished content. He suggests that while AI can handle many aspects of content creation, the genuine, personal viewpoint and the ability to make sense of random thoughts remain distinctly human contributions.
Resources
External Resources
Books
- "The Handmaid's Tale" - Mentioned as an example of a cautionary societal outcome related to data privacy and control.
Articles & Papers
- "The Problem With AI Benchmarks" (The Daily AI Show) - Episode title that frames the discussion.
Research & Studies
- Sleep Study - Mentioned in relation to the development of the SleepFM sleep foundation model and personal health monitoring.
- Stanford Sleep Study - Referenced by a participant regarding their personal experience.
People
- Beth - Co-host of The Daily AI Show.
- Andy - Co-host of The Daily AI Show.
- Jude - Chat participant.
- Jeff - Chat participant.
- Joel David Hamkins - Mathematician who criticized AI models for mathematical proofs.
- Karina Hong - Founder of Axiom Math.
- Adam Mosseri - Head of Instagram, whose memo on creator authenticity was discussed.
- Nate Jones - Content creator whose style is cited as an example of authentic presentation.
- Andrea - Wife of Carl, with whom he discussed content trends.
- Sam Altman - CEO of OpenAI, whose speaking style and attire are theorized to be intentionally basic for AI replication.
- Brian - Mentioned as a returning guest.
Organizations & Institutions
- DeepSeek - Chinese AI startup that added an interleaved thinking feature to its chatbot.
- Hugging Face - Platform where AI models can be accessed.
- Poe - Platform where AI models can be accessed.
- OpenAI - Organization whose forums require identity verification.
- Anthropic - AI company contrasted for its ethical stance on content generation.
- Greylock - Venture capital firm investing in Axiom Math.
- Menlo Ventures - Venture capital firm investing in Axiom Math.
- Stanford - Institution that published the SleepFM sleep foundation model.
- Withings - Company that released the Withings Body Scan 2 health monitoring platform.
- Apple - Company whose health platform the Withings Body Scan 2 integrates with.
- Gygés Labs - Company that showcased the Vochi smart ring.
- SwitchBot - Company that launched a voice recording wearable.
- X (formerly Twitter) - Platform where AI image generation usage is discussed.
- xAI - Elon Musk's AI company, discussed in relation to problematic image generation.
- She Leads AI - Community prioritizing a safe space for on-screen interactions.
- Instagram - Social media platform whose head, Adam Mosseri, wrote the memo on authenticity that was discussed.
- TikTok - Social media platform whose content style is contrasted with Instagram.
- Snapchat - Social media platform used by Gen Z for informal sharing.
- LinkedIn - Professional networking platform where bot comments are noted and where a speaker found business-mode content performed better.
Tools & Software
- Axiom Math - AI mathematician startup.
- SleepFM (sleep foundation model) - AI foundation model from Stanford that predicts health conditions from sleep data.
- Withings Body Scan 2 - Health monitoring platform disguised as a bathroom scale.
- Vochi Smart Ring - AI-powered wearable for voice note-taking and health tracking.
- NotBot - Company used for identity verification via passport.
Websites & Online Resources
- DeepSeek Website - Direct access point for the DeepSeek model.
Podcasts & Audio
- The Daily AI Show - Podcast where the discussion took place.
Other Resources
- Recursive Inference - Concept of looping with memory to improve AI model accuracy and reasoning.
- Interleaved Thinking - Feature in DeepSeek's chatbot for multi-step research with interspersed reasoning.
- Just-in-time distribution - Mentioned in the context of delivery issues.
- AI Benchmarks - Topic of discussion regarding AI model evaluation.
- Large Language Models (LLMs) - AI models discussed in relation to spreadsheets, financial analysis, and mathematics.
- Formal Proof Systems - Component of Axiom Math's architecture.
- Programmatic Reasoning - Component of Axiom Math's architecture.
- Next Token Prediction - Generic AI model training method contrasted with Axiom Math's approach.
- Hallucinations - AI output errors discussed in relation to LLMs.
- Sleep Study Data - Information analyzed by the SleepFM sleep foundation model.
- Brain Waves - Physiological signal measured in sleep studies.
- Heart Activity - Physiological signal measured in sleep studies.
- Breathing - Physiological signal measured in sleep studies.
- Muscle Signals - Physiological signal measured in sleep studies.
- CPAP System - Medical device for sleep apnea.
- Obstructive Sleep Apnea - Medical condition.
- BiPAP Systems - Medical devices for sleep apnea.
- Hypopnea Events - Sleep-related breathing disturbances.
- Biomarkers - Health indicators measured by the Withings Body Scan 2.
- Cardiovascular Function - Health metric.
- Cellular Health - Health metric.
- Metabolic Efficiency - Health metric.
- Impedance Cardiography System - Technology for measuring cardiac pumping efficiency.
- Six-Lead Electrocardiogram (ECG) - Technology for measuring heart electrical activity.
- Sinus Cardiac Rhythm - Desired heart rhythm.
- Ultra-High Frequency Bioimpedance Spectroscopy - Technology for assessing cellular health and vascular age.
- Vascular Age Assessment - Health metric.
- Glycemic Regulation - Health metric.
- Hypertension Risk Notifications - Health alert.
- Arterial Stiffness - Health metric.
- Glycemic Monitor - Device for measuring blood sugar.
- GDPR Compliance - Data privacy standard.
- HIPAA Compliance - Data privacy standard for health information.
- ISO 27001 and 27701 Certifications - Information security and privacy management standards.
- Apple Health - Health data aggregation platform.
- Personal Health Data - Information governed by regulations like HIPAA.
- Anonymized Data - Data stripped of personal identifiers.
- Period Trackers - Apps used for menstrual cycle tracking.
- Hand Gestures - Input method for the Vochi smart ring.
- Physical Button - Input method for the Vochi smart ring.
- Titanium Rim - Material of the Vochi smart ring.
- Voice Activated Note Taking - Functionality of the Vochi smart ring.
- Transcribes Conversations - Feature of the Vochi smart ring.
- Speaker Identification - Feature of the Vochi smart ring.
- Encryption - Security measure for recordings.
- Cognitive Assistance - Space encompassing AI wearables for note-taking and other functions.
- Declining Cognition Abilities - Health concern addressed by cognitive assistance tools.
- Dementia - Cognitive condition.
- Parkinson's Disease - Neurological disorder.
- Speech Analysis - Method for detecting health conditions.
- X-ray of Lung - Medical imaging.
- Cancer Markers - Indicators of potential cancer.
- Breast Cancer Markers - Indicators of potential breast cancer.
- Pre-Dementia Markers - Indicators of potential cognitive decline.
- Personal Baseline - Individual health reference point.
- Group Signal - Aggregated health data for comparison.
- Vibe Coding - Mentioned as a future topic.
- Cursor - Mentioned as a future topic.
- Claude Code - Mentioned as a future topic.
- AI Valuation - Discussion of company valuations in the AI sector.
- Nvidia Backing - Investment in AI companies.
- Series E Funding - Stage of investment.
- Qatar's Sovereign Wealth Fund - Investor in AI companies.
- Non-Consensual Imagery - Problematic AI-generated content.
- Grok (transcribed as "Gck") - AI tool used for generating images.
- CSAM (Child Sexual Abuse Material) - Illegal and harmful content.
- Deep Fake - AI-generated synthetic media.
- Revenge Porn - Non-consensual sharing of intimate images.
- Nudity - Content discussed in relation to AI generation.
- Government ID - Requirement for accessing consensual pornography.
- Free Website - Access point for generating AI content.
- Elon Musk - Figure whose platforms (X and xAI) are discussed in relation to AI ethics.
- Claude - AI model from Anthropic.
- Colossus - AI being built, described as powerful.
- Zoom Link - Access method for online meetings.
- Socials - Social media profiles.
- Pronouns - Personal identifiers discussed in online interactions.
- Passport - Document used for identity verification.
- QR Code - Visual code for digital information.
- Verified Identity - Confirmation of an individual's identity.
- Content Creators - Individuals who produce online content.
- Public Figures - Individuals with a public profile.
- Bots - Automated accounts on social media.
- Slack Channel - Communication platform for groups.
- Social Proof - Evidence of authenticity and trustworthiness.
- In-Person Experiences - Real-world interactions.
- Online Dating - Mentioned as a context where AI impacts authenticity.
- Parasocial Relationships - One-sided relationships with online personalities.
- Feed - Content displayed on social media platforms.
- One-off Videos - Individual video content pieces.
- Basic Attire - Clothing style.
- Garanimals Outfits (transcribed as "Per Animals") - Clothing style.
- Shirt, Pants, Shoes - Basic clothing items.
- Content - Material produced for consumption.
- Authentic Video - Unpolished video content.
- Polished Video - Highly produced video content.
- TikTok-like Vibes - Informal content style.
- Lighting - Aspect of video production.
- Leaky or Blurry Camera - Imperfect video quality.
- Footprint - Digital presence and history.
- Community - Group of people with shared interests.
- Self Identity - Sense of self.
- Quality of Life - Standard of living.
- Longevity - Lifespan.
- Trust Dry Spell - Period of low trust.
- Unpaid Time - Leisure time.
- Gen Z - Demographic group.
- Unflattering Selfies - Self-portraits that are not aesthetically pleasing.
- Shame - Feeling of embarrassment.
- Haggsfield - Likely a misstatement of a technical term or place.
- AI Replication - AI's ability to imitate.
- Business - Commercial activity.
- Social - Relating to society or its organization.
- Bots in Comments - Automated comments on social media.
- Real People in Comments - Genuine user engagement.