Dev Status and Delivery Roadmap
Project health against your POC proposal, with a phase-based execution plan for RAG and advanced ML.
Track Assessment
Yes, on right track
Current Mode
POC extraction + structuring
RAG Status
Planned next phase
LLM Training Status
Not training yet (correct for POC)
3 Core Features (Project USP)
1) Document Reader / Extractor
Upload any supported file, run OCR/parsing, extract structured fields, validate output, and store tenant-scoped records.
2) RAG Pipeline
Reader/loader, chunking, embeddings, vector DB indexing, retrieval + rerank, and citation-grounded LLM responses.
3) ML Training Lifecycle
Benchmarking, correction-driven dataset growth, evaluation loops, and optional fine-tuning when prompt+RAG plateau is reached.
Project USP: RAG + ML Intelligence Layer
Core platform value is document extraction and structuring. Competitive differentiation comes from tenant-aware RAG, grounded answers with citations, and measurable ML quality improvement loops.
Tenant-aware RAG
Semantic retrieval constrained by tenant/user authorization and metadata filtering.
Grounded AI responses
Answers linked to retrieved source chunks with citations to minimize hallucinations.
Configurable by domain
Medical, billing, finance, and utility extraction rules can run on one common platform.
ML quality loop
Confidence tracking, human corrections, and benchmark-driven continuous improvements.
Are We Correct and On the Right Track?
Core POC objective alignment
Completed: Upload -> OCR/parsing -> LLM structuring -> validation -> DB -> dashboard is implemented and running.
Configurable extraction by department
Completed: The single-dropdown department model is live, with one JSON template per department and prepopulated field rules.
Operational reliability for POC
In progress: Error tracking and delete flows are done; queue workers, retries, and migration discipline are next.
Cloud deployment readiness (GCP)
In progress: The backend is live on Cloud Run as of March 5, 2026; the frontend Cloud Run deployment and final OAuth callback cutover are in progress.
RAG and semantic retrieval capability
Planned: Not yet implemented; this is the next major capability block.
Custom LLM fine-tuning over tenant data
Planned: Not started, and intentionally deferred until the RAG + evaluation baseline is stable.
Tech Stack and Why We Use It
| Category | Tool | Version | Reason |
|---|---|---|---|
| Frontend | Next.js | 16.1.6 | Fast product UI delivery, routing, and scalable component architecture. |
| Frontend | React | 19.2.3 | Reusable dashboard components and modal workflows. |
| Frontend | Tailwind CSS | 4.x | Quick and consistent UI styling for POC velocity. |
| Backend | FastAPI | 0.135.1 | Typed APIs, speed, and maintainable service boundaries. |
| Backend | Uvicorn | 0.41.0 | ASGI runtime used in Cloud Run container startup. |
| Backend | Python | 3.11 | Best ecosystem for OCR, NLP, parsing, and orchestration. |
| Database | Cloud SQL PostgreSQL | 18 | Managed PostgreSQL runtime for production-ready GCP deployment. |
| OCR | pdfplumber + pytesseract | >=0.11.0 / >=0.3.10 | Native extraction first, OCR fallback for scanned/image docs. |
| Doc parsing | python-docx + antiword | >=1.1.2 | DOCX and legacy DOC handling for mixed enterprise formats. |
| LLM | Claude API + Anthropic SDK | Claude 4.x fallback / SDK 0.84.0 | Prompt-based structured extraction without custom model training. |
| Orchestration | LangChain (planned path) | TBD | RAG chains, retrievers, runnables, output parsing standardization. |
Phase Plan and Task Breakdown
Phase 1 (Completed): Core Extraction Foundation
Prove end-to-end structured extraction feasibility in a multi-tenant app.
- Multi-format upload support (PDF, DOC, DOCX, images)
- Text acquisition layer (native parse + OCR fallback)
- LLM-based JSON structuring
- Medical/field normalization and DB persistence
- UI visualization and extraction results views
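The text acquisition layer's native-parse-then-OCR fallback can be sketched as a small dispatcher. Here `native_extract` and `ocr_extract` are hypothetical stand-ins for the real pdfplumber and pytesseract calls; the `min_chars` threshold is an illustrative heuristic, not the project's actual rule.

```python
def acquire_text(native_extract, ocr_extract, min_chars=10):
    """Try native parsing first; fall back to OCR when the native
    result is empty or too short (typical for scanned documents)."""
    def _acquire(path):
        text = (native_extract(path) or "").strip()
        if len(text) >= min_chars:
            return text, "native"
        return (ocr_extract(path) or "").strip(), "ocr"
    return _acquire

# Hypothetical extractors standing in for pdfplumber / pytesseract:
extract = acquire_text(
    native_extract=lambda p: "" if p.endswith(".scan.pdf") else "Invoice #123",
    ocr_extract=lambda p: "OCR text",
)
```

The returned source tag ("native" vs "ocr") is worth persisting per document, since OCR-sourced text tends to need looser validation downstream.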
Phase 2 (Completed): POC Reliability & UX Controls
Make failures actionable and data manageable by document.
- Per-document `error_message` tracking and UI display
- Per-document delete endpoint (file + related DB rows)
- Extracted-data deep-linking by document
- Status-driven interaction improvements (view error vs view results)
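The "view error vs view results" behavior reduces to a status-to-action mapping. A minimal sketch, assuming status values `failed`/`completed` and a `/documents/{id}/extracted` deep-link route (both illustrative, not the project's confirmed names):

```python
def primary_action(doc):
    """Map a document's processing status to the dashboard action,
    mirroring the 'view error vs view results' behavior."""
    status = doc.get("status")
    if status == "failed":
        # error_message is stored per document and surfaced in the UI
        return {"label": "View error", "detail": doc.get("error_message")}
    if status == "completed":
        return {"label": "View results",
                "detail": f"/documents/{doc['id']}/extracted"}
    return {"label": "Processing", "detail": None}
```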
Phase 3 (Completed): Multi-Domain Department Isolation
Make the platform reusable across departments with isolated data/config execution.
- Department field added to document and tenant config models
- Department-aware config save/load and upload/list filtering
- Single department dropdown with auto-prepopulated template rules
- One JSON template file per department category
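Per-department prepopulation amounts to loading the department's template and layering tenant overrides on top. The template contents and field names below are invented for illustration only:

```python
import json

# One template per department category (illustrative content only).
TEMPLATES = {
    "medical": {"fields": ["patient_name", "test_date", "hemoglobin"]},
    "billing": {"fields": ["invoice_no", "amount_due", "due_date"]},
}

def prepopulate_rules(department, overrides=None):
    """Load the department template and apply tenant-level overrides,
    so one platform serves every department from shared config."""
    template = json.loads(json.dumps(TEMPLATES[department]))  # deep copy
    template.update(overrides or {})
    return template
```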
Phase 4 (In Progress): Cloud Deployment and Production Baseline
Move backend to GCP with secure runtime configuration and storage.
- Backend deployed on Cloud Run: https://ml-docintel-be-dev-1007650653347.us-central1.run.app/
- Runtime env and secret contract finalized (OAuth + Claude + DB)
- GCP bootstrap steps documented (Artifact Registry, SQL, bucket, IAM)
- Pending: frontend Cloud Run deployment + OAuth redirect URI switch to frontend /auth/callback
Phase 5 (Planned): Evaluation and Quality Baseline
Measure extraction quality before introducing advanced ML complexity.
- Build labeled dataset (starting with current 3 PDF validation set)
- Track field-level precision/recall and failure taxonomy
- Add confidence thresholds and human-review queue
- Add regression tests for prompts and parsing
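Field-level precision/recall can be computed directly from labeled vs extracted field maps. A minimal sketch, assuming exact-match scoring (real scoring may want normalization or fuzzy matching per field type):

```python
def field_scores(expected, extracted):
    """Field-level precision/recall: a field counts as a true positive
    only when the extracted value exactly matches the labeled value."""
    tp = sum(1 for f, v in extracted.items() if expected.get(f) == v)
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(expected) if expected else 0.0
    return {"precision": precision, "recall": recall}
```

Aggregating these per document type and per field gives the failure taxonomy the phase calls for.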
Phase 6 (Planned): RAG Platform Enablement
Enable retrieval-augmented answering and contextual search.
- Chunking pipeline with metadata (tenant, user, doc type, date)
- Embeddings generation and vector indexing
- Hybrid retrieval (semantic + metadata filters + reranking)
- LangChain runnable chains for Q&A with citations
- Answer groundedness checks and citation enforcement
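The chunking step above can be sketched as fixed-size windows with overlap, each carrying the metadata that later constrains retrieval. Chunk size, overlap, and metadata keys here are illustrative defaults, not decided values:

```python
def chunk_document(text, tenant_id, doc_id, size=500, overlap=50):
    """Split text into overlapping chunks, attaching the metadata
    (tenant, document, position) used for scoped retrieval later."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append({
            "text": text[start:start + size],
            "tenant_id": tenant_id,
            "doc_id": doc_id,
            "offset": start,
        })
        start += size - overlap
    return chunks
```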
Phase 7 (Planned): Production Hardening
Move from POC behavior to production-grade reliability and security.
- Queue workers (Celery/Redis or equivalent) replacing in-process background jobs
- Alembic migrations for schema lifecycle control
- Audit logs, RBAC enforcement, and compliance controls
- Performance scaling and cost observability dashboards
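The shift from in-process background jobs to queue workers is about jobs outliving any single request handler. A stdlib-only sketch of the pattern (a real deployment would use Celery/Redis or an equivalent durable broker, not an in-memory queue):

```python
import queue
import threading

def run_worker(job_queue, handler, results):
    """Minimal stand-in for a queue worker: jobs wait in the queue
    instead of dying with the request process."""
    while True:
        job = job_queue.get()
        if job is None:          # sentinel: shut the worker down
            break
        try:
            results.append(handler(job))
        finally:
            job_queue.task_done()

jobs, results = queue.Queue(), []
worker = threading.Thread(target=run_worker, args=(jobs, str.upper, results))
worker.start()
for doc in ("a.pdf", "b.pdf"):
    jobs.put(doc)
jobs.put(None)
worker.join()
```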
Phase 8 (Planned, Conditional): Fine-tuning / Custom Training
Only if evaluation proves the prompt+RAG ceiling has been reached.
- Build high-quality supervised dataset from reviewed ground truth
- Run controlled fine-tuning experiments with holdout evaluation
- Governance for PII/PHI usage in training data
- Deploy only if measurable gains justify complexity and risk
Quick Execution Plan (Small Tasks + Easy Testing)
P0: Go-Live Baseline (Backend on GCP)
Task: Create Cloud SQL instance, GCS bucket, and deploy service account IAM bindings.
Test: Backend /health endpoint on Cloud Run returns database status ok.
Task: Configure GitHub secrets for OAuth, Claude, DB, and bucket.
Test: GitHub Action deploy succeeds from develop branch without manual edits.
Task: Set exact OAuth redirect URI for deployed frontend callback.
Test: Google sign-in completes without redirect_uri_mismatch.
P0: Frontend Go-Live (Cloud Run + OAuth Cutover)
Task: Deploy Next.js frontend Cloud Run service using new deploy workflow and Dockerfile.
Test: Frontend URL loads login/dashboard and calls backend via NEXT_PUBLIC_API_URL.
Task: Update Google OAuth Authorized origin + redirect to frontend URL.
Test: OAuth round-trip ends at /auth/callback and sets app auth token.
Task: Update backend GOOGLE_OAUTH_REDIRECT_URI to frontend callback and redeploy backend.
Test: No redirect_uri_mismatch and login succeeds end-to-end.
P1: Stabilize Extraction Quality
Task: Expand current 3-PDF validation batch into 30-50 labeled documents across departments.
Test: Run extraction and compare expected vs extracted fields for each sample.
Task: Add field-level scoring report endpoint.
Test: Verify precision/recall report is generated per field and per document type.
Task: Capture top 10 failure patterns and add prompt/rule fixes.
Test: Re-run benchmark set and confirm net error count drops release-over-release.
P2: RAG Foundation
Task: Add chunk table + metadata schema (tenant, doc, section, timestamp).
Test: Validate each uploaded document creates chunks with correct tenant scoping.
Task: Generate embeddings and store in pgvector.
Test: Run similarity query and verify top results come from expected documents.
Task: Build retriever API with metadata filters + top-k.
Test: Use test queries and confirm retrieval excludes other-tenant data.
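The tenant-exclusion property being tested here comes from filtering before ranking. A pure-Python sketch of that retriever behavior (pgvector would do the same in SQL with a `WHERE tenant_id = ...` clause plus a distance-ordered `LIMIT`):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, chunks, tenant_id, k=3):
    """Top-k semantic retrieval with a hard tenant filter applied
    *before* similarity ranking, so cross-tenant data can never rank."""
    scoped = [c for c in chunks if c["tenant_id"] == tenant_id]
    scoped.sort(key=lambda c: cosine(query_vec, c["embedding"]), reverse=True)
    return scoped[:k]

chunks = [
    {"id": 1, "tenant_id": "t1", "embedding": [1.0, 0.0]},
    {"id": 2, "tenant_id": "t1", "embedding": [0.0, 1.0]},
    {"id": 3, "tenant_id": "t2", "embedding": [1.0, 0.0]},  # other tenant
]
```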
P2: Grounded QA + Citations
Task: Create RAG answer chain with strict citation format.
Test: Every answer must include source document/chunk references.
Task: Add unsupported-claim guardrail.
Test: Inject irrelevant queries and verify system responds with uncertainty instead of hallucination.
Task: Track groundedness KPI.
Test: Measure the percentage of answers with valid supporting evidence.
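The groundedness KPI can be defined as the share of answers whose every cited chunk actually appears in that answer's retrieved evidence set. A minimal sketch with assumed field names (`citations`, `retrieved_ids`):

```python
def groundedness_rate(answers):
    """Share of answers whose every cited chunk id exists in the
    retrieved evidence set; uncited answers count as ungrounded."""
    def grounded(a):
        return bool(a["citations"]) and set(a["citations"]) <= set(a["retrieved_ids"])
    ok = sum(1 for a in answers if grounded(a))
    return ok / len(answers) if answers else 0.0
```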
P3: Clinical Progress Analytics
Task: Implement trend engine for repeated parameters over time.
Test: Graph values across historical reports and verify trend direction logic.
Task: Add target-to-achieve recommendation rules.
Test: For each severity class (normal/borderline/high/critical), validate expected target text appears.
Task: Enable doctor-ready report export.
Test: Export includes summary, parameter statuses, trends, and timestamp.
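Trend-direction logic for a repeated parameter can be sketched as a last-vs-first comparison with a relative tolerance band; the 5% tolerance is an assumed placeholder, and real clinical thresholds would come from the domain rules:

```python
def trend_direction(points, tolerance=0.05):
    """Classify the trend of a repeated parameter across dated
    reports: compare last vs first value with a tolerance band."""
    values = [v for _, v in sorted(points)]
    first, last = values[0], values[-1]
    if first == 0:
        return "rising" if last > 0 else ("falling" if last < 0 else "stable")
    change = (last - first) / abs(first)
    if change > tolerance:
        return "rising"
    if change < -tolerance:
        return "falling"
    return "stable"
```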
P4: Production Hardening
Task: Move background processing to queue workers.
Test: Upload burst test and confirm no dropped jobs on API restart.
Task: Add Alembic migrations and release checklist.
Test: Run fresh environment migration from zero to latest successfully.
Task: Add observability for latency/cost/errors.
Test: Dashboard shows per-stage timings and failure rates.
RAG and ML Guidance
Correct current decision
Do not start custom LLM training yet. First stabilize extraction quality and introduce RAG.
RAG priority order
Ingestion -> chunking -> embeddings -> vector store -> retrieval -> rerank -> grounded answer with citations.
Fine-tuning trigger
Only after evaluation proves prompt + RAG cannot meet required accuracy KPIs.
Next Execution Priorities
Priority 1: Quality benchmark
Build labeled dataset and baseline metrics for extraction reliability.
Priority 2: Vector strategy
Use PostgreSQL + pgvector first for operational simplicity, add dedicated vector DB only if scale demands it.
Priority 3: Pipeline hardening
Move background work to queue workers and introduce full migration/versioning discipline.
USP KPI Panel (Track and Report)
Extraction Accuracy
Field-level precision/recall by document type and tenant.
Retrieval Quality
Recall@k and nDCG for semantic retrieval relevance.
Groundedness
Percent of answers with valid supporting citations.
Human Correction Rate
Manual override frequency and trend over time.
Turnaround Time
Document-to-structured-output SLA performance.
Cost per Document
OCR + LLM + retrieval cost observability by tenant.
Operational Readiness Notes
Process mode
POC-ready; production queueing and retries are still pending.
Data model
Tenant-scoped document persistence with explicit error and delete lifecycle support.
Compliance path
Need audit logs, stronger RBAC policies, and data governance controls for production environments.
Updated roadmap aligned with your POC proposal and future AI/ML expansion