Developed a high-throughput automation engine for mPower to digitize Cambridge curriculum at scale. Architected an end-to-end extraction-to-DB pipeline with AI-driven quality control.
Situation
Manual entry of Cambridge past-paper questions was slow and error-prone, creating a bottleneck for scaling content.
Task
•Designed and rapidly developed a multi-stage pipeline: PDF extraction (OCR/Vision) -> CDN asset mapping -> PostgreSQL import.
•Implemented LLM-based automated QC for question verification, explanation generation, and metadata tagging.
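The stages above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the production code: the Question fields, CDN_BASE, the mcq table layout, and the sample paper name are all hypothetical. The extraction stage is assumed to have already produced structured questions, the QC step is a plain structural check standing in for the LLM reviewer, and sqlite3 stands in for PostgreSQL so the sketch runs without a server.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Tuple

CDN_BASE = "https://cdn.example.com/papers"  # hypothetical CDN root


@dataclass
class Question:
    paper: str                       # made-up paper identifier
    number: int
    stem: str
    options: Dict[str, str]          # e.g. {"A": "...", "B": "..."}
    answer: str                      # key into options
    image: Optional[str] = None      # local file name from extraction, if any
    image_url: Optional[str] = None  # set by the CDN mapping stage


def map_assets(q: Question) -> Question:
    """CDN mapping stage: point an extracted diagram at its assumed CDN path."""
    if q.image:
        q.image_url = f"{CDN_BASE}/{q.paper}/{q.image}"
    return q


def basic_qc(q: Question) -> bool:
    """QC stand-in: structural checks that an LLM reviewer would extend."""
    return bool(q.stem.strip()) and len(q.options) >= 2 and q.answer in q.options


def run_pipeline(
    questions: Iterable[Question],
    conn: sqlite3.Connection,
    qc: Callable[[Question], bool] = basic_qc,
) -> Tuple[int, int]:
    """Map assets, gate on QC, then import. Returns (imported, rejected)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mcq ("
        "paper TEXT, number INTEGER, stem TEXT, answer TEXT, image_url TEXT, "
        "PRIMARY KEY (paper, number))"
    )
    imported = rejected = 0
    for q in map(map_assets, questions):
        if not qc(q):
            rejected += 1
            continue
        # Parameterized insert; re-running a paper overwrites earlier rows.
        conn.execute(
            "INSERT OR REPLACE INTO mcq VALUES (?, ?, ?, ?, ?)",
            (q.paper, q.number, q.stem, q.answer, q.image_url),
        )
        imported += 1
    conn.commit()
    return imported, rejected


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = [
        Question("sample_paper_1", 1, "2 + 2 = ?", {"A": "3", "B": "4"}, "B", image="q1.png"),
        Question("sample_paper_1", 2, "", {"A": "x"}, "C"),  # fails QC
    ]
    print(run_pipeline(sample, conn))
```

Passing the QC step in as a callable keeps the import logic unchanged if the structural check is swapped for a model-backed reviewer. Against a real PostgreSQL driver, the placeholder style and upsert syntax would differ from the sqlite3 forms shown here.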
Result
•Scaled the database to 50,000+ AI-verified MCQs and flashcards with 99% automation.
•Transformed content operations from high-latency manual entry to near-instant digital ingestion.
My Contribution
•Designed the end-to-end extraction-to-sync workflow for exam content.
•Integrated multiple LLMs for pedagogical content generation, formatting, and quality control.
•Rapidly prototyped and deployed the core automation engine.
Key Results
< 5 Minutes
Reduced Workflow Time
Cut content generation for a 40-question A/O-level paper from 4-5 hours of work by 2 executives to under 5 minutes with zero human intervention; consolidated a 4-platform process into one agentic workflow.
Question to Flashcard
End-to-End Automation
Automated extraction of questions and their images/diagrams, then generated explanations for right and wrong answers plus practice flashcards.
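The question-to-flashcard step might look something like the sketch below. Everything here is an assumption for illustration: the Flashcard shape, the prompt wording, and the explain callable, which stands in for whichever LLM call produces the explanation.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Flashcard:
    front: str  # the question stem
    back: str   # keyed answer followed by the explanation


def build_explanation_prompt(stem: str, options: Dict[str, str], answer: str) -> str:
    """Ask why the keyed answer is right and why each other option is wrong."""
    lines = [f"Question: {stem}"]
    lines += [f"{key}. {text}" for key, text in options.items()]
    lines.append(f"Correct answer: {answer}")
    lines.append(
        "Explain why the correct option is right and why each other option is wrong."
    )
    return "\n".join(lines)


def to_flashcard(
    stem: str,
    options: Dict[str, str],
    answer: str,
    explain: Callable[[str], str],  # hypothetical LLM wrapper: prompt in, text out
) -> Flashcard:
    """Turn one MCQ into a flashcard using the supplied explanation function."""
    if answer not in options:
        raise ValueError(f"answer {answer!r} is not one of the options")
    explanation = explain(build_explanation_prompt(stem, options, answer))
    return Flashcard(front=stem, back=f"{answer}. {options[answer]}\n{explanation}")


if __name__ == "__main__":
    # A canned explanation replaces the model call so the sketch runs offline.
    card = to_flashcard(
        "2 + 2 = ?", {"A": "3", "B": "4"}, "B", lambda prompt: "Because 2 + 2 is 4."
    )
    print(card.front)
    print(card.back)
```

Keeping the model call behind a plain callable lets the card-building logic be tested with a canned response.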