floatingpoint

W24 · Pivot 2 of 2

Active

106° · Near Reinvention

Before

Open source API service to parse complex documents

After

Post-training data to teach models document work

Full description — before

Battle-tested, highly modular vision infrastructure to convert PDF, PPT, Word, Excel, PNG, and JPEG files into LLM-ready data. We started by building lumina.sh, where we needed to parse ~600M pages of scientific literature. The researchers didn't care, but devs wanted our ingestion pipeline, so we built chunkr instead. We offer high-quality layout analysis, OCR, bounding boxes, granular VLM controls, semantic chunking, and all the last-mile engineering that goes into building standout AI applications. Common use cases include RAG and automating document workflows such as invoices or medical reports -> database.

Full description — after

Floatingpoint builds off-the-shelf post-training datasets that teach models how to do real work with documents. We discover valuable tasks where models fall short and build datasets to close the gap. The datasets are human-crafted from real-world sources, with synthetic expansions on top, and validated through in-house training cycles.

Category shift

AI Legal Automation -> AI-Powered Education

Summary

The company moved from providing an API service for document parsing (infrastructure and tools for developers) to selling pre-built post-training datasets that teach models document tasks (a data product for ML teams). This is a notable shift in the core offering, although both products support AI document workflows.

Detected 3 months ago · 2026-03-20

Company journey — 2 pivots

Current

Post-training data to teach models document work (viewing)

105.7° · Near Reinvention · 2026-03-20

Open source API service to parse complex documents

87.4° · Major Pivot · 2025-02-13

Started as

AI Search Engine for Research