AI Doc Engine POC
Multi-Tenant Extraction

Dev Status and Delivery Roadmap

Project health against the POC proposal, with a phase-based execution plan for RAG and advanced ML.

Track Assessment

Yes, on the right track

Current Mode

POC extraction + structuring

RAG Status

Planned next phase

LLM Training Status

Not training yet (correct for POC)

3 Core Features (Project USP)

1) Document Reader / Extractor

Upload any supported file, run OCR/parsing, extract structured fields, validate output, and store tenant-scoped records.

2) RAG Pipeline

Reader/loader, chunking, embeddings, vector DB indexing, retrieval + rerank, and citation-grounded LLM responses.

3) ML Training Lifecycle

Benchmarking, correction-driven dataset growth, evaluation loops, and optional fine-tuning when prompt+RAG plateau is reached.

Project USP: RAG + ML Intelligence Layer

Core platform value is document extraction and structuring. Competitive differentiation comes from tenant-aware RAG, grounded answers with citations, and measurable ML quality improvement loops.

Tenant-aware RAG

Semantic retrieval constrained by tenant/user authorization and metadata filtering.

Grounded AI responses

Answers linked to retrieved source chunks with citations to minimize hallucinations.

Configurable by domain

Medical, billing, finance, and utility extraction rules can run on one common platform.

ML quality loop

Confidence tracking, human corrections, and benchmark-driven continuous improvements.

Are We Correct and On the Right Track?

Core POC objective alignment

Completed

Upload -> OCR/parsing -> LLM structuring -> validation -> DB -> dashboard is implemented and running.

Configurable extraction by department

Completed

Single-dropdown department model is live, with one JSON template per department and prepopulated field rules.

Operational reliability for POC

In progress

Error tracking and delete flows are done; queue workers, retries, and migration discipline are next.

Cloud deployment readiness (GCP)

In progress

The backend went live on Cloud Run on March 5, 2026; frontend Cloud Run deployment and the final OAuth callback cutover are in progress.

RAG and semantic retrieval capability

Planned

Not yet implemented; this is the next major capability block.

Custom LLM fine-tuning over tenant data

Planned

Not started and intentionally deferred until RAG + evaluation baseline is stable.

Tech Stack and Why We Use It

Category | Tool | Version | Reason
Frontend | Next.js | 16.1.6 | Fast product UI delivery, routing, and scalable component architecture.
Frontend | React | 19.2.3 | Reusable dashboard components and modal workflows.
Frontend | Tailwind CSS | 4.x | Quick and consistent UI styling for POC velocity.
Backend | FastAPI | 0.135.1 | Typed APIs, speed, and maintainable service boundaries.
Backend | Uvicorn | 0.41.0 | ASGI runtime used in Cloud Run container startup.
Backend | Python | 3.11 | Best ecosystem for OCR, NLP, parsing, and orchestration.
Database | Cloud SQL PostgreSQL | 18 | Managed PostgreSQL runtime for production-ready GCP deployment.
OCR | pdfplumber + pytesseract | >=0.11.0 / >=0.3.10 | Native extraction first, OCR fallback for scanned/image docs.
Doc parsing | python-docx + antiword | >=1.1.2 | DOCX and legacy DOC handling for mixed enterprise formats.
LLM | Claude API + Anthropic SDK | Claude 4.x fallback / SDK 0.84.0 | Prompt-based structured extraction without custom model training.
Orchestration | LangChain (planned path) | TBD | RAG chains, retrievers, runnables, output parsing standardization.

Phase Plan and Task Breakdown

Phase 1 (Completed): Core Extraction Foundation

Prove end-to-end structured extraction feasibility in a multi-tenant app.

Completed
  • Multi-format upload support (PDF, DOC, DOCX, images)
  • Text acquisition layer (native parse + OCR fallback)
  • LLM-based JSON structuring
  • Medical/field normalization and DB persistence
  • UI visualization and extraction results views
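The text acquisition layer above (native parse first, OCR fallback for scans) can be sketched as a small routing function. This is a minimal illustration, not the actual implementation: the extractor callables and the 50-character threshold are assumptions; in the real pipeline the native path would wrap pdfplumber and the fallback pytesseract.

```python
def acquire_text(native_extract, ocr_extract, path, min_chars=50):
    """Return (text, method): native parsing first, OCR fallback for sparse results.

    native_extract/ocr_extract are injected callables (hypothetical names);
    min_chars is an illustrative threshold for "usable native text".
    """
    text = (native_extract(path) or "").strip()
    if len(text) >= min_chars:          # native parse yielded usable text
        return text, "native"
    return ocr_extract(path), "ocr"     # scanned/image doc: fall back to OCR
```

In practice `native_extract` could call pdfplumber's `page.extract_text()` per page, and `ocr_extract` could rasterize pages and run `pytesseract.image_to_string`.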

Phase 2 (Completed): POC Reliability & UX Controls

Make failures actionable and data manageable by document.

Completed
  • Per-document `error_message` tracking and UI display
  • Per-document delete endpoint (file + related DB rows)
  • Extracted-data deep-linking by document
  • Status-driven interaction improvements (view error vs view results)
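The per-document delete flow (file plus related DB rows) reduces to the cascade sketched below, with the file store and tables modeled as dicts for illustration. All names are hypothetical stand-ins for the real storage and ORM layers.

```python
def delete_document(doc_id, file_store, documents, extracted_rows):
    """Remove one document's stored file and every DB row tied to it.

    Stores are modeled as dicts here; the real flow would delete the GCS
    object and cascade over tenant-scoped tables.
    """
    file_store.pop(doc_id, None)     # delete the uploaded file
    documents.pop(doc_id, None)      # delete the document record
    # delete dependent extraction rows for this document only
    for row_id in [r for r, row in extracted_rows.items() if row["doc_id"] == doc_id]:
        del extracted_rows[row_id]
```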

Phase 3 (Completed): Multi-Domain Department Isolation

Make the platform reusable across departments with isolated data/config execution.

Completed
  • Department field added to document and tenant config models
  • Department-aware config save/load and upload/list filtering
  • Single department dropdown with auto-prepopulated template rules
  • One JSON template file per department category
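The "one JSON template per department" model can be sketched as a lookup that prepopulates field rules when a department is selected. The template contents below are invented examples, not the real department schemas.

```python
import json

# Illustrative templates only -- the actual files hold the department field rules.
TEMPLATES = {
    "medical": {"fields": ["patient_name", "dob", "diagnosis"]},
    "billing": {"fields": ["invoice_no", "amount_due", "due_date"]},
}

def load_department_template(department):
    """Return an independent copy of the department's field rules."""
    template = TEMPLATES.get(department)
    if template is None:
        raise KeyError(f"no template for department: {department}")
    # JSON round-trip gives a deep copy, so UI edits never mutate the source
    return json.loads(json.dumps(template))
```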

Phase 4 (In Progress): Cloud Deployment and Production Baseline

Move backend to GCP with secure runtime configuration and storage.

In progress
  • Backend deployed on Cloud Run: https://ml-docintel-be-dev-1007650653347.us-central1.run.app/
  • Runtime env and secret contract finalized (OAuth + Claude + DB)
  • GCP bootstrap steps documented (Artifact Registry, SQL, bucket, IAM)
  • Pending: frontend Cloud Run deployment + OAuth redirect URI switch to frontend /auth/callback

Phase 5 (Planned): Evaluation and Quality Baseline

Measure extraction quality before introducing advanced ML complexity.

Planned
  • Build labeled dataset (starting with current 3 PDF validation set)
  • Track field-level precision/recall and failure taxonomy
  • Add confidence thresholds and human-review queue
  • Add regression tests for prompts and parsing
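Field-level precision/recall for one document can be computed as sketched below. Matching is exact-string here purely for illustration; the real scorer would likely normalize values (casing, dates, units) first.

```python
def field_scores(expected, extracted):
    """Precision/recall over field values, exact match per field name.

    expected/extracted are {field_name: value} dicts (ground truth vs model output).
    """
    correct = sum(1 for k, v in extracted.items() if expected.get(k) == v)
    precision = correct / len(extracted) if extracted else 0.0  # of emitted fields, how many right
    recall = correct / len(expected) if expected else 0.0       # of labeled fields, how many found
    return {"precision": precision, "recall": recall}
```

Aggregating these per field name and per document type yields the failure taxonomy the phase calls for.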

Phase 6 (Planned): RAG Platform Enablement

Enable retrieval-augmented answering and contextual search.

Planned
  • Chunking pipeline with metadata (tenant, user, doc type, date)
  • Embeddings generation and vector indexing
  • Hybrid retrieval (semantic + metadata filters + reranking)
  • LangChain runnable chains for Q and A + citations
  • Answer groundedness checks and citation enforcement
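The chunking step above can be sketched as a sliding window that attaches the tenant/doc metadata later used for filtered retrieval. Chunk size and overlap values are placeholders, not tuned settings.

```python
def chunk_document(text, tenant_id, doc_id, size=500, overlap=100):
    """Split text into overlapping chunks, each carrying retrieval metadata."""
    chunks, start, index = [], 0, 0
    while start < len(text):
        chunks.append({
            "tenant_id": tenant_id,   # scopes retrieval to one tenant
            "doc_id": doc_id,
            "chunk_index": index,
            "text": text[start:start + size],
        })
        start += size - overlap       # slide the window, keeping overlap for context
        index += 1
    return chunks
```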

Phase 7 (Planned): Production Hardening

Move from POC behavior to production-grade reliability and security.

Planned
  • Queue workers (Celery/Redis or equivalent) replacing in-process background jobs
  • Alembic migrations for schema lifecycle control
  • Audit logs, RBAC enforcement, and compliance controls
  • Performance scaling and cost observability dashboards

Phase 8 (Planned, Conditional): Fine-tuning / Custom Training

Only if evaluation proves prompt+RAG ceiling is reached.

Planned
  • Build high-quality supervised dataset from reviewed ground truth
  • Run controlled fine-tuning experiments with holdout evaluation
  • Governance for PII/PHI usage in training data
  • Deploy only if measurable gains justify complexity and risk

Quick Execution Plan (Small Tasks + Easy Testing)

P0: Go-Live Baseline (Backend on GCP)

Completed

Task: Create Cloud SQL instance, GCS bucket, and deploy service account IAM bindings.

Test: Backend /health endpoint on Cloud Run returns database status ok.

Task: Configure GitHub secrets for OAuth, Claude, DB, and bucket.

Test: GitHub Action deploy succeeds from develop branch without manual edits.

Task: Set exact OAuth redirect URI for deployed frontend callback.

Test: Google sign-in completes without redirect_uri_mismatch.

P0: Frontend Go-Live (Cloud Run + OAuth Cutover)

In progress

Task: Deploy Next.js frontend Cloud Run service using new deploy workflow and Dockerfile.

Test: Frontend URL loads login/dashboard and calls backend via NEXT_PUBLIC_API_URL.

Task: Update Google OAuth Authorized origin + redirect to frontend URL.

Test: OAuth round-trip ends at /auth/callback and sets app auth token.

Task: Update backend GOOGLE_OAUTH_REDIRECT_URI to frontend callback and redeploy backend.

Test: No redirect_uri_mismatch and login succeeds end-to-end.

P1: Stabilize Extraction Quality

In progress

Task: Expand current 3-PDF validation batch into 30-50 labeled documents across departments.

Test: Run extraction and compare expected vs extracted fields for each sample.

Task: Add field-level scoring report endpoint.

Test: Verify precision/recall report is generated per field and per document type.

Task: Capture top 10 failure patterns and add prompt/rule fixes.

Test: Re-run benchmark set and confirm net error count drops release-over-release.

P2: RAG Foundation

Planned

Task: Add chunk table + metadata schema (tenant, doc, section, timestamp).

Test: Validate each uploaded document creates chunks with correct tenant scoping.

Task: Generate embeddings and store in pgvector.

Test: Run similarity query and verify top results come from expected documents.

Task: Build retriever API with metadata filters + top-k.

Test: Use test queries and confirm retrieval excludes other-tenant data.
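The tenant-scoping requirement in the tests above can be illustrated with an in-memory retriever: apply a hard tenant filter before ranking by cosine similarity. In the pgvector setup this same filter would be a WHERE clause alongside the vector distance ordering; everything here is a simplified sketch.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, chunks, tenant_id, k=3):
    """Rank only this tenant's chunks by similarity; return top-k."""
    scoped = [c for c in chunks if c["tenant_id"] == tenant_id]  # hard tenant filter first
    return sorted(scoped, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)[:k]
```

Filtering before ranking (rather than after) is the property the cross-tenant test should assert: no other tenant's chunk can ever appear, regardless of similarity score.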

P2: Grounded QA + Citations

Planned

Task: Create RAG answer chain with strict citation format.

Test: Every answer must include source document/chunk references.

Task: Add unsupported-claim guardrail.

Test: Inject irrelevant queries and verify system responds with uncertainty instead of hallucination.

Task: Track groundedness KPI.

Test: Measure the percentage of answers with valid supporting evidence.
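One possible shape for the groundedness KPI: an answer counts as grounded only if it has at least one citation and every citation points at a chunk actually retrieved for that answer. The record layout is an assumption.

```python
def groundedness_rate(answers):
    """Fraction of answers whose citations are non-empty and all retrieved.

    answers: list of {"citations": [...], "retrieved": [...]} dicts (assumed shape).
    """
    def grounded(a):
        return bool(a["citations"]) and set(a["citations"]) <= set(a["retrieved"])
    if not answers:
        return 0.0
    return sum(grounded(a) for a in answers) / len(answers)
```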

P3: Clinical Progress Analytics

Planned

Task: Implement trend engine for repeated parameters over time.

Test: Graph values across historical reports and verify trend direction logic.
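The trend-direction logic above could be as simple as a least-squares slope over the time-ordered values, compared against a flatness threshold. The threshold and labels are illustrative assumptions.

```python
def trend_direction(values, flat_eps=1e-6):
    """Classify a time-ordered series as "rising", "falling", or "stable"."""
    n = len(values)
    if n < 2:
        return "stable"
    mean_x, mean_y = (n - 1) / 2, sum(values) / n
    # least-squares slope over indices 0..n-1
    slope = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values)) / \
            sum((x - mean_x) ** 2 for x in range(n))
    if slope > flat_eps:
        return "rising"
    if slope < -flat_eps:
        return "falling"
    return "stable"
```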

Task: Add target-to-achieve recommendation rules.

Test: For each severity class (normal/borderline/high/critical), validate expected target text appears.
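The severity-class rules could be a simple lookup, as sketched below. The target texts are invented placeholders for illustration only, not clinical guidance.

```python
# Placeholder recommendation texts keyed by severity class (invented examples).
TARGETS = {
    "normal": "Maintain current levels; recheck at next routine visit.",
    "borderline": "Aim to return the value to the normal range within 3 months.",
    "high": "Reduce toward the normal range; recheck in 4-6 weeks.",
    "critical": "Immediate clinical follow-up recommended.",
}

def target_text(severity):
    """Return the recommendation text for a severity class."""
    return TARGETS[severity]  # unknown classes raise KeyError deliberately
```

The validation test then just asserts that each severity class produces its expected target text.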

Task: Enable doctor-ready report export.

Test: Export includes summary, parameter statuses, trends, and timestamp.

P4: Production Hardening

Planned

Task: Move background processing to queue workers.

Test: Upload burst test and confirm no dropped jobs on API restart.

Task: Add Alembic migrations and release checklist.

Test: Run fresh environment migration from zero to latest successfully.

Task: Add observability for latency/cost/errors.

Test: Dashboard shows per-stage timings and failure rates.

RAG and ML Guidance

Correct current decision

Do not start custom LLM training yet. First stabilize extraction quality and introduce RAG.

RAG priority order

Ingestion -> chunking -> embeddings -> vector store -> retrieval -> rerank -> grounded answer with citations.

Fine-tuning trigger

Only after evaluation proves prompt + RAG cannot meet required accuracy KPIs.

Next Execution Priorities

Priority 1: Quality benchmark

Build labeled dataset and baseline metrics for extraction reliability.

Priority 2: Vector strategy

Use PostgreSQL + pgvector first for operational simplicity, add dedicated vector DB only if scale demands it.

Priority 3: Pipeline hardening

Move background work to queue workers and introduce full migration/versioning discipline.

USP KPI Panel (Track and Report)

Extraction Accuracy

Field-level precision/recall by document type and tenant.

Retrieval Quality

Recall@k and nDCG for semantic retrieval relevance.
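Both retrieval metrics can be computed from binary relevance judgments as sketched here: `relevant` is the set of truly relevant doc ids and `ranked` is the retrieval order. A simplified sketch; real evaluations often use graded relevance.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of relevant docs appearing in the top-k results."""
    return len(set(ranked[:k]) & relevant) / len(relevant) if relevant else 0.0

def ndcg_at_k(ranked, relevant, k):
    """DCG of the ranking over the DCG of an ideal ranking (binary gains)."""
    dcg = sum(1 / math.log2(i + 2) for i, d in enumerate(ranked[:k]) if d in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0
```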

Groundedness

Percent of answers with valid supporting citations.

Human Correction Rate

Manual override frequency and trend over time.

Turnaround Time

Document-to-structured-output SLA performance.

Cost per Document

OCR + LLM + retrieval cost observability by tenant.

Operational Readiness Notes

Process mode

POC-ready; production queueing and retries are still pending.

Data model

Tenant-scoped document persistence with explicit error and delete lifecycle support.

Compliance path

Need audit logs, stronger RBAC policies, and data governance controls for production environments.

Updated roadmap aligned with your POC proposal and future AI/ML expansion