Integrated Large Language Models (LLMs) and Vision-Language Models (VLMs) into a production document extraction system, architecting an end-to-end AI solution serving enterprise clients.
Management imposed unrealistic requirements (customer-trainable AI, complex error management). Needed to process 1.2M+ multi-page documents monthly with a cost-effective solution deployable at scale. Training data derived from user inputs carried a ~3% error rate.
Led team of 4 engineers in architecting a VLM-based extraction pipeline. Designed a self-validation approach: trained the model, then evaluated it on its own training set to surface discrepancies between model predictions and user-entered labels. Built a custom labeling tool to correct the flagged inconsistencies. Advocated for requirement changes to make the AI solution feasible. Built scalable infrastructure on Azure GPU VMs with 2x A100 cards, integrating vLLM for efficient model serving.
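The self-validation pass can be sketched as follows. This is a minimal illustration, not the production code: the function name, the dict-of-dicts data shape, and the sample invoice fields are all hypothetical, standing in for the real prediction and label stores.

```python
def find_discrepancies(predictions, training_labels):
    """Run the trained model's predictions back against the training set
    and return (doc_id, field, predicted, labeled) tuples wherever the
    model disagrees with the original user-entered label. Flagged rows
    go to the labeling tool for manual correction."""
    flagged = []
    for doc_id, fields in training_labels.items():
        for field, labeled_value in fields.items():
            predicted = predictions.get(doc_id, {}).get(field)
            if predicted is not None and predicted != labeled_value:
                flagged.append((doc_id, field, predicted, labeled_value))
    return flagged

# Hypothetical example: one mislabeled date field surfaces for review.
labels = {"doc-1": {"total": "100.00", "date": "2024-01-05"}}
preds = {"doc-1": {"total": "100.00", "date": "2024-01-06"}}
print(find_discrepancies(preds, labels))
# [('doc-1', 'date', '2024-01-06', '2024-01-05')]
```

The key idea is that a model trained on noisy labels tends to converge toward the majority behavior, so the ~3% of inconsistent user inputs show up as disagreements between the model and its own training labels.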
Deployed to production in 2025, serving all customers. Processes 1.2 million multi-page documents monthly on 2x A100 GPUs. Achieved 30-70% error reduction across fields compared to manual clerk processing, with a higher automation rate from handling documents clerks would otherwise skip. Generated high-quality training data from the company's document archive. Delivered despite the unrealistic initial requirements and management challenges.
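As a back-of-envelope check on the stated scale, the monthly volume translates to a modest sustained throughput per GPU (assuming a 30-day month and uniform arrival; real traffic is burstier):

```python
# Capacity sketch for 1.2M documents/month on 2x A100.
MONTHLY_DOCS = 1_200_000
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000
GPUS = 2

docs_per_sec = MONTHLY_DOCS / SECONDS_PER_MONTH
print(f"{docs_per_sec:.2f} docs/sec sustained")       # 0.46 docs/sec
print(f"{docs_per_sec / GPUS:.2f} docs/sec per GPU")  # 0.23 docs/sec
```

At well under one document per second per GPU, continuous batching in a serving engine such as vLLM keeps the hardware utilized even though each multi-page document may require several model calls.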