Beyond OCR: AI Unlocks Document Structure and Vision
Resources & Recommendations
Tools & Software
- Tesseract - A classic OCR engine, cited as an example of a typical OCR model.
- PaddleOCR - Another example of a typical OCR model.
- Docling (IBM) - A toolkit and concept for document structure models, used to predict the structure of a document rather than just extracting text.
- MarkItDown (Microsoft) - A toolkit similar to Docling, used in retrieval-augmented generation (RAG) systems to preserve document structure.
- Qwen 2.5 VL - A specific vision language model that the speakers have used and consider a good model.
- DeepSeek OCR - A newer model that processes documents by splitting them into high-resolution image tokens combined with a global full-resolution view, aiming to preserve more detail than traditional vision language models.
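The "local tiles plus a global view" idea in the DeepSeek OCR entry can be sketched in a few lines. This is only an illustration of the general tiling concept, not DeepSeek's actual preprocessing: the names `Box`, `tile_boxes`, and `page_views` and the 640-pixel tile size are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A crop rectangle in pixel coordinates (right and bottom are exclusive)."""
    left: int
    top: int
    right: int
    bottom: int


def tile_boxes(width: int, height: int, tile: int) -> list[Box]:
    """Cover a width x height page with tile x tile crops.

    Tiles on the right and bottom edges are clipped to the page,
    so they may be smaller than tile x tile.
    """
    if width <= 0 or height <= 0 or tile <= 0:
        raise ValueError("width, height, and tile must be positive")
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            boxes.append(
                Box(left, top, min(left + tile, width), min(top + tile, height))
            )
    return boxes


def page_views(width: int, height: int, tile: int = 640) -> dict:
    """Return one global box for the whole page plus the local tile boxes.

    The tile size of 640 is an illustrative default, not a value from the source.
    """
    return {
        "global": Box(0, 0, width, height),
        "tiles": tile_boxes(width, height, tile),
    }


views = page_views(1280, 1000, tile=640)
```

In a real pipeline each box would be cropped from the page image and encoded separately, with the global view giving the model layout context that the individual tiles lack.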
Organizations & Institutions
- IBM - Developed the Docling toolkit for document processing.
- Hugging Face - Released a smaller Docling model suitable for constrained environments.
- Microsoft - Developed the MarkItDown toolkit, used in RAG systems.
Websites & Online Resources
- practicalai.fm - The podcast's official website.
- LinkedIn - A platform to connect with the podcast for updates and insights.
- X - A platform to connect with the podcast for updates and insights.
- Bluesky - A platform to connect with the podcast for updates and insights.
- predictionguard.com - Website for Prediction Guard, a partner providing operational support for the show.