Integrated Large Language Models (LLMs) and Vision-Language Models (VLMs) into a production document extraction system, architecting an end-to-end AI solution serving enterprise clients.
Management imposed unrealistic requirements (customer-trainable AI, complex error management). Needed to process 1.2M+ multi-page documents monthly with a cost-effective solution deployable at scale. Training data derived from user inputs carried a ~3% error rate.
Led team of 4 engineers in architecting a VLM-based extraction pipeline. Designed a self-validation approach: trained the model, then evaluated it on its own training set to surface discrepancies between model predictions and user-entered labels. Built a custom labeling tool to correct the flagged inconsistencies. Advocated for requirement changes to make the AI solution feasible. Built scalable infrastructure on Azure GPU VMs with 2x A100 cards, integrating vLLM for efficient model serving.
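The self-validation pass can be sketched as follows. This is a minimal illustration, not the production code: the function name, the dict-of-dicts data shape, and the sample invoice fields are all hypothetical, standing in for the real prediction and label stores.

```python
def find_discrepancies(predictions, training_labels):
    """Run the trained model's predictions back against the training set
    and return (doc_id, field, predicted, labeled) tuples wherever the
    model disagrees with the original user-entered label. Flagged rows
    go to the labeling tool for manual correction."""
    flagged = []
    for doc_id, fields in training_labels.items():
        for field, labeled_value in fields.items():
            predicted = predictions.get(doc_id, {}).get(field)
            if predicted is not None and predicted != labeled_value:
                flagged.append((doc_id, field, predicted, labeled_value))
    return flagged

# Hypothetical example: one mislabeled date field surfaces for review.
labels = {"doc-1": {"total": "100.00", "date": "2024-01-05"}}
preds = {"doc-1": {"total": "100.00", "date": "2024-01-06"}}
print(find_discrepancies(preds, labels))
# [('doc-1', 'date', '2024-01-06', '2024-01-05')]
```

The key idea is that a model trained on noisy labels tends to converge toward the majority behavior, so the ~3% of inconsistent user inputs show up as disagreements between the model and its own training labels.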
Deployed to production in 2025, serving all customers. Processes 1.2 million multi-page documents monthly on 2x A100 GPUs. Achieved 30-70% error reduction across fields compared to manual clerk processing, with a higher automation rate from handling documents clerks would otherwise skip. Generated high-quality training data from the company's document archive. Delivered despite the unrealistic initial requirements and management challenges.
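As a back-of-envelope check on the stated scale, the monthly volume translates to a modest sustained throughput per GPU (assuming a 30-day month and uniform arrival; real traffic is burstier):

```python
# Capacity sketch for 1.2M documents/month on 2x A100.
MONTHLY_DOCS = 1_200_000
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000
GPUS = 2

docs_per_sec = MONTHLY_DOCS / SECONDS_PER_MONTH
print(f"{docs_per_sec:.2f} docs/sec sustained")       # 0.46 docs/sec
print(f"{docs_per_sec / GPUS:.2f} docs/sec per GPU")  # 0.23 docs/sec
```

At well under one document per second per GPU, continuous batching in a serving engine such as vLLM keeps the hardware utilized even though each multi-page document may require several model calls.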