Why we rebuilt document intelligence from scratch
Every RegTech platform claims to "read" documents. Almost none of them actually do. Here's what we learned after parsing 600-page prospectuses — and why we had to build our own pipeline from the ground up.
The bar nobody measures against
When we started Mercurium, every vendor in the space pitched us the same demo: a 2-page invoice, a driver's licence, maybe a utility bill. OCR extracts the name and address. "Look how clean!"
That is not compliance. Compliance is a 640-page prospectus with 21 chapters, 358 defined terms, 492 internal cross-references, and 31 referenced external documents. It is a Mandarin board resolution scanned at 150 DPI with a red seal overlapping the signature. It is a subscription agreement where §6.3 amends a definition in §2.1(b)(iv), which was itself amended by a side letter you only see on page 312.
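The §6.3 example above is, at heart, an amendment-resolution problem: the effective text of a defined term is the original plus every later amendment, including ones buried in side letters. A minimal sketch of that idea, with invented section numbers and text (none of this is Mercurium's actual data model):

```python
# Toy illustration: resolving a chain of amendments to a defined term.
# Sections, texts, and sources are invented for the example.

definitions = {"2.1(b)(iv)": "Eligible Investor means a person under Annex A."}

# (section_amended, new_text, source), in document order
amendments = [
    ("2.1(b)(iv)", "Eligible Investor means a person under Annex A or B.", "§6.3"),
    ("2.1(b)(iv)", "Eligible Investor means a person approved in writing.", "side letter, p. 312"),
]

def effective_definition(section):
    """Apply every amendment targeting `section`, keeping an audit trail."""
    text = definitions[section]
    trail = ["original"]
    for target, new_text, source in amendments:
        if target == section:
            text = new_text
            trail.append(source)
    return text, trail

text, trail = effective_definition("2.1(b)(iv)")
print(trail)  # ['original', '§6.3', 'side letter, p. 312']
```

A reader that misses the side letter on page 312 returns the wrong effective definition, which is why cross-references have to be resolved, not just extracted.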
No off-the-shelf OCR, no generic LLM, no Document AI vendor handles that. They all quietly assume you will throw the messy 18% of documents back to a human. But the messy 18% is the whole job.
Three things that had to change
1. Layout as a first-class citizen
Text in isolation is almost useless. What matters is where the text sits: which column, which heading it's under, which table cell, which footnote. We rebuilt the layout model to preserve that structure in a canonical form — every paragraph knows its chapter, every number knows its table, every citation knows its target.
2. Images that mean something
A signature is not decoration — it's the compliance evidence. A corporate structure diagram is not decoration — it's the UBO chain. A stamp is not decoration — it's the regulator's authorisation. We train dedicated models to recognise and interpret these — not to discard them as "noise" like most pipelines do.
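The output contract this implies can be sketched as follows. The role names and the keyword rules in `classify_region` are toy stand-ins for the dedicated vision models described above — the real point is that every detected image region gets a compliance role instead of being dropped:

```python
# Sketch: tag non-text regions with a semantic role instead of discarding
# them. A real pipeline would run trained vision models; these keyword
# rules only illustrate the shape of the output.

COMPLIANCE_ROLES = ["signature", "stamp", "structure_diagram"]

def classify_region(label: str) -> str:
    for role in COMPLIANCE_ROLES:
        if role.split("_")[0] in label:
            return role
    return "decoration"

regions = ["red seal stamp", "director signature", "org structure chart", "logo"]
print([classify_region(r) for r in regions])
# ['stamp', 'signature', 'structure_diagram', 'decoration']
```

Only the last region is safe to ignore; the first three are evidence a compliance check has to consume.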
3. One universal format downstream
After the reader runs, every document lands in the same Mercurium Document Format — whether it came in as a scanned PDF, an encrypted .doc, or an XLSX. Everything downstream — classification, extraction, cross-coherence checks, RAG — consumes that single schema. One integration, one debugging surface, one place where quality is measured.
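A minimal sketch of the "one format in, one format out" idea — the names (`UniversalDoc`, `Block`, the stub readers) are assumptions for illustration, not the actual Mercurium Document Format:

```python
from dataclasses import dataclass

@dataclass
class Block:
    kind: str      # "paragraph", "table", "image", ...
    text: str
    page: int

@dataclass
class UniversalDoc:
    source_format: str   # "pdf-scan", "doc", "xlsx", ...
    blocks: list

# One reader per input format; every reader emits the same UniversalDoc.
def read_xlsx_stub(rows):
    return UniversalDoc("xlsx", [Block("table", "\t".join(r), 1) for r in rows])

def read_pdf_stub(pages):
    return UniversalDoc("pdf-scan",
                        [Block("paragraph", t, i + 1) for i, t in enumerate(pages)])

def count_blocks(doc: UniversalDoc) -> int:
    # Downstream stages (classification, extraction, RAG) see only
    # UniversalDoc — never the original file format.
    return len(doc.blocks)

docs = [read_xlsx_stub([["a", "b"]]), read_pdf_stub(["p1 text", "p2 text"])]
print([count_blocks(d) for d in docs])  # [1, 2]
```

The payoff is exactly the one named above: downstream code is written once against one schema, and every quality question ("why did extraction miss this?") has a single place to look.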
What it unlocks
The biggest thing this buys us isn't accuracy on a leaderboard. It's the right to build everything else. Our cross-coherence checks, our adaptive questionnaire, our M&A Q&A — all of it only works because the foundation is solid. Without the rebuild, we'd be where our competitors are: stuck demoing 2-page invoices and explaining why anything more complex is "roadmap".
It was the hard path. It's also the only one that leads anywhere real.