Gen-AI Document Translation

TL;DR

I led the delivery of a Gen-AI translation feature that lets researchers translate regulatory documents in-context inside the content platform. Researchers can add documents to findings, trigger a one-click translation, review translated text side-by-side with the original, download translations (or markdown fallbacks), and interactively surface VAT-relevant sections. The feature replaced manual copy-paste and third-party workflows, saved research time, and improved accuracy and collaboration across international tax teams.

Problem We Set Out to Solve

International VAT and EMEA returns researchers faced repeated friction:

Manual translation overhead

Teams manually translated about 1,200 documents a year at 5–15 minutes each, often splitting large PDFs to fit the limits of free services, which lost context and wasted effort on reformatting.

Large/complex documents

Some regulatory notices exceed 300 pages and can't be reliably or affordably translated by off-the-shelf tools without splitting.

Lack of integrated workflows

Researchers had to leave the content platform to translate and reattach documents, breaking context and audit trails.

Discoverability

Researchers needed more than a raw translation: they needed translated content with the areas that indicate VAT changes highlighted.

Goal:

Provide an integrated Gen-AI translation tool that automates translation, preserves context & formatting (or provides markdown fallbacks), highlights VAT-relevant areas, supports large documents, persists translations in findings, and enables downloading and re-translation.

My Remit & Constraints

  • Run UAT with international VAT and EMEA returns teams and ship a production-grade feature
  • Preserve human-in-the-loop: researchers must be able to review and re-translate; sensitive automation is controlled
  • Maintain auditability: translations persist, are downloadable, and do not overwrite the source
  • Support multiple formats (PDF, HTML, other document types) and large documents
  • Deliver an in-platform translation experience available in Notice, Finding and Extraction contexts

Core Features

One-Click Translation

Translate button on document viewer with background processing

Multi-Format Support

PDF, HTML, and other document types with markdown fallbacks

VAT Highlighting

Automatic highlighting of VAT-relevant sections with interactive queries

Download Options

Original, Translated, or All documents downloadable

Re-translate Capability

Request re-run via Gen-AI or cloud service if needed

Persistent & Shared

Translations persist per document and across all assigned users

Approach & Execution

01

Defined scope and acceptance criteria

Produced a one-page objective and a definition-of-done (DoD) checklist that stakeholders signed off on. The checklist covered where translations are available (notice, finding, extraction), persistence, language auto-detection, download options, long-document behavior, and the UX for the Translate and Show Original toggles.

02

Addressed long-document & formatting challenges

Designed a hybrid approach: Gen-AI (LLM) for semantic translation and summarization, with cloud translation services as a fallback for formats or languages where preserving layout was hard. Implemented a markdown fallback for languages and documents where faithful layout retention is impractical.
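
The routing decision behind this hybrid approach can be sketched roughly as follows. This is an illustration only, not the production logic: the names `Engine`, `OutputFormat`, `TranslationPlan` and `plan_translation` are hypothetical, and the two boolean inputs stand in for whatever capability checks the real system performs per format and language.

```python
from dataclasses import dataclass
from enum import Enum


class Engine(Enum):
    GEN_AI = "gen_ai"
    CLOUD_SERVICE = "cloud_service"


class OutputFormat(Enum):
    ORIGINAL_LAYOUT = "original_layout"
    MARKDOWN = "markdown"


@dataclass(frozen=True)
class TranslationPlan:
    engine: Engine
    output_format: OutputFormat


def plan_translation(gen_ai_keeps_layout: bool, cloud_keeps_layout: bool) -> TranslationPlan:
    """Pick an engine and output format for one document.

    Assumed order of preference (hypothetical):
    1. Gen-AI with the original layout, when layout can be preserved.
    2. Cloud translation service as the fallback that still keeps layout.
    3. Markdown output when neither path can retain the layout faithfully.
    """
    if gen_ai_keeps_layout:
        return TranslationPlan(Engine.GEN_AI, OutputFormat.ORIGINAL_LAYOUT)
    if cloud_keeps_layout:
        return TranslationPlan(Engine.CLOUD_SERVICE, OutputFormat.ORIGINAL_LAYOUT)
    # Assumption: the markdown fallback is produced by the Gen-AI path.
    return TranslationPlan(Engine.GEN_AI, OutputFormat.MARKDOWN)
```

Keeping the decision in one pure function makes it easy to show the chosen path to the researcher and to route a re-translate request to a different engine.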

03

Built an integrated, asynchronous UX

Added a one-click Translate button to the document viewer that starts translation and shows a progress indicator. Translation runs in the background, so the user can leave the task and return. Translations persist per document and are shared across users.
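
A minimal sketch of the background-job pattern described here, assuming an in-memory registry and a plain worker thread. `TranslationJob`, `JobRegistry` and the `translate` callable are hypothetical names for illustration; a production system would more likely use a task queue and a database.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TranslationJob:
    document_id: str
    total_chunks: int
    done_chunks: int = 0
    status: str = "queued"  # queued -> running -> done | failed
    output: List[str] = field(default_factory=list)

    @property
    def progress(self) -> float:
        """Fraction of chunks translated so far, for the progress indicator."""
        return self.done_chunks / self.total_chunks if self.total_chunks else 1.0


class JobRegistry:
    """Keeps one job per document so any user can look up its state."""

    def __init__(self) -> None:
        self._jobs: Dict[str, TranslationJob] = {}
        self._lock = threading.Lock()

    def start(self, document_id: str, chunks: List[str],
              translate: Callable[[str], str]) -> threading.Thread:
        job = TranslationJob(document_id, total_chunks=len(chunks))
        with self._lock:
            self._jobs[document_id] = job
        worker = threading.Thread(
            target=self._run, args=(job, chunks, translate), daemon=True
        )
        worker.start()  # returns immediately; the caller is free to navigate away
        return worker

    def _run(self, job: TranslationJob, chunks: List[str],
             translate: Callable[[str], str]) -> None:
        with self._lock:
            job.status = "running"
        try:
            for chunk in chunks:
                translated = translate(chunk)  # slow call happens outside the lock
                with self._lock:
                    job.output.append(translated)
                    job.done_chunks += 1
            with self._lock:
                job.status = "done"
        except Exception:
            with self._lock:
                job.status = "failed"

    def get(self, document_id: str) -> TranslationJob:
        with self._lock:
            return self._jobs[document_id]
```

The viewer would call `start(...)` when Translate is clicked and poll `get(document_id).progress` to drive the progress indicator; because the job is keyed by document rather than by user, a returning user or a colleague sees the same state.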

04

Interaction & highlighting

Added interactive querying and automatic highlighting of VAT-relevant sections. The UI highlights matching regions in the translated or original document and supports switching between multiple documents attached to the finding.
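
To illustrate what the UI needs in order to highlight regions, here is a simplified sketch that returns character spans for a list of query terms. Plain term matching is a stand-in I am using for illustration; it is not the Gen-AI detection used in the feature, and `Highlight` and `find_highlights` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Sequence


class Highlight(NamedTuple):
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    text: str


def find_highlights(document_text: str, terms: Sequence[str]) -> List[Highlight]:
    """Return the spans in document_text that match any term, case-insensitively."""
    cleaned = {term for term in terms if term}
    if not cleaned:
        return []
    # Longest terms first so a phrase wins over a shorter term it contains.
    ordered = sorted(cleaned, key=len, reverse=True)
    # Word boundaries stop "VAT" from matching inside words such as "private".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [
        Highlight(match.start(), match.end(), match.group(0))
        for match in pattern.finditer(document_text)
    ]
```

Returning offsets rather than marked-up text lets the same spans be rendered over either the translated or the original view.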

05

Operational decisions & UX

Disabled the Translate button for HTML docs and surfaced guidance to use the browser's built-in translation where fidelity is known to be poor. For very long documents, implemented chunking with overlap between chunks to preserve page context. Built a re-translate capability.

06

UAT & feedback loop

Ran user acceptance testing with the International VAT and EMEA Returns teams. Their feedback refined the UX (persistence behavior, download options, markdown output) and the validation flows (confirming highlighted VAT changes).

Key UX Decisions

Show Original / Show Translated Toggle

Easy switching between original and translated versions for side-by-side review.

Background Processing

The user can leave the task and return; translation continues in the background with a progress indicator.

Chunking with Context Overlap

For very long documents, chunks preserve page context with overlap to avoid losing cross-page references.
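
A minimal sketch of page-based chunking with overlap, assuming the document has already been split into per-page text. The function name, the `Chunk` type and the default sizes are illustrative assumptions, not the values used in production.

```python
from typing import List, NamedTuple


class Chunk(NamedTuple):
    first_page: int  # 1-based, inclusive
    last_page: int   # 1-based, inclusive
    pages: List[str]


def chunk_pages(pages: List[str], pages_per_chunk: int = 20, overlap: int = 1) -> List[Chunk]:
    """Split pages into chunks whose edges share `overlap` pages."""
    if pages_per_chunk <= 0:
        raise ValueError("pages_per_chunk must be positive")
    if not 0 <= overlap < pages_per_chunk:
        raise ValueError("overlap must be >= 0 and smaller than pages_per_chunk")

    step = pages_per_chunk - overlap
    chunks: List[Chunk] = []
    start = 0
    while start < len(pages):
        end = min(start + pages_per_chunk, len(pages))
        chunks.append(Chunk(start + 1, end, pages[start:end]))
        if end == len(pages):
            break  # last chunk reached; avoid emitting a tail that is pure overlap
        start += step
    return chunks
```

With ten pages, four pages per chunk and one page of overlap, this yields pages 1–4, 4–7 and 7–10. Each overlapping page is translated twice, so the stitching step needs to keep only one copy of it.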

Markdown Fallback

For languages/documents where faithful layout retention is impractical, markdown preserves headings, lists and inline emphasis.
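
As a rough illustration of the fallback output, the sketch below renders a list of extracted blocks as markdown. The `Block` structure and `to_markdown` function are hypothetical; it assumes an earlier step has already classified each block and that inline emphasis is carried in the translated text itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Block:
    kind: str       # "heading", "list_item", or anything else for a paragraph
    text: str       # translated text; inline emphasis such as **bold** passes through
    level: int = 1  # heading depth; ignored for other kinds


def to_markdown(blocks: List[Block]) -> str:
    """Render blocks as markdown, keeping headings and lists recognizable."""
    lines: List[str] = []
    previous_kind: Optional[str] = None
    for block in blocks:
        if block.kind == "heading":
            depth = min(max(block.level, 1), 6)  # markdown supports levels 1 to 6
            line = "#" * depth + " " + block.text
        elif block.kind == "list_item":
            line = "- " + block.text
        else:
            line = block.text
        # Consecutive list items stay together; everything else gets a blank line.
        if lines and not (block.kind == "list_item" and previous_kind == "list_item"):
            lines.append("")
        lines.append(line)
        previous_kind = block.kind
    return "\n".join(lines)
```

The result stays readable as plain text and opens in any markdown viewer, which is why it works as a download format when the original layout cannot be kept.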

Outcomes & Impact

1,200+
Documents per year previously translated manually, now automated

5–15 min
Saved per document versus manual copy-paste workflows

300+
Pages per document supported through chunking

100%
Audit trail preserved through persistent translations