Beyond Bounding Boxes: High-Skill Data for Frontier AI

Resources

Books

  • "The Art of Computer Programming" by Donald Knuth - Mentioned as an example of a foundational work in computer science, implying a similar depth of understanding is needed for advanced AI data research.

Videos & Documentaries

  • None

Research & Studies

  • Llama 2 Paper - Referenced for its findings on the effectiveness of RLHF over SFT data in training language models.

Tools & Software

  • CrowdFlower - Mentioned as a company the host previously founded in the data collection business.

Articles & Papers

  • None

People Mentioned

  • Donald Knuth (Author of "The Art of Computer Programming") - Cited as a figure associated with foundational computer science knowledge.
  • Terence Tao - Mentioned as an example of someone whose exceptional skill would make no difference when drawing a bounding box, highlighting the low complexity of such tasks.
  • Albert Einstein - Mentioned as an example of someone whose exceptional skill would make no difference when drawing a bounding box, highlighting the low complexity of such tasks.
  • Hemingway - Mentioned as an example of a great writer who did not have a PhD.
  • Emily Dickinson - Mentioned as an example of a great poet who did not have a PhD.

Organizations & Institutions

  • Twitter - Mentioned as a previous employer of the guest where he encountered data collection issues.
  • Google - Mentioned as a company where the guest experienced similar data collection problems.
  • Facebook - Mentioned as a company where the guest experienced similar data collection problems.
  • Airbnb - Mentioned as an early customer of Surge.
  • Meta - Mentioned in the context of potentially making its open-source Llama models closed-source.
  • OpenAI - Mentioned as a major AI lab with a distinct culture and model.
  • Anthropic - Mentioned as a major AI lab with a distinct culture and model.
  • Elon Musk's xAI - Mentioned as a major AI lab with a distinct culture and model.
  • Stanford University - Mentioned as an example of where a professor might be working on frontier research.

Courses & Educational Resources

  • None

Websites & Online Resources

  • LMArena - Discussed as a popular LLM leaderboard that the guest believes has harmed the industry through flawed evaluation methods.

Other Resources

  • LiDAR model - Used as an example of a specific application where understanding underlying principles helps in data collection.
  • Claude Code - Mentioned as an example of a coding collaborator.
  • AWS - Mentioned as a service that could go down in a simulated RL environment.
  • Gmail - Mentioned as a communication tool within a simulated RL environment.
  • Slack - Mentioned as a communication tool within a simulated RL environment.
  • Jira tickets - Mentioned as a tool within a simulated RL environment.
  • GitHub PRs - Mentioned as a tool within a simulated RL environment.
  • Codebases - Mentioned as a component within a simulated RL environment.
  • SAT-style math problems - Used as an example of narrow, synthetic problems that models can become good at, but at the expense of broader capabilities.
  • RL environments - Discussed as a new method for data collection, involving simulated universes with complex tasks.

---
This content is a personally curated review and synopsis derived from the original podcast episode.