Thinking Machines Data Science
Data Engineer II | Aug 2024 - Present
A large Singaporean investment holding company
Investment data platform
- Owned backend feature development across more than 10 interconnected microservices powering deal discovery, due diligence, and portfolio monitoring workflows.
- Built backend services and data pipelines with FastAPI, Dagster, Kubernetes, and Snowflake, processing terabytes of data from vendor platforms including Sustainalytics, PitchBook, Bloomberg, and MSCI.
- Improved platform reliability and observability through production monitoring workflows using Kibana and Grafana.
Enterprise document intelligence platform
- Iterated on Snowflake Cortex AI agent workflows with the Knowledge Graph team so investment officers could extract data from long-form documents, run contextual queries, and analyze portfolio information at scale.
- Owned agent tooling, prompt engineering, evaluation, workflow tuning, and production incident triage across the document intelligence stack.
- Partnered with a roughly 15-person cross-functional team spanning frontend, backend, LLM workflows, MLOps, DevOps, and QA.
A major Philippine bank
Enterprise data products
- Led the data products workstream in a year-long engagement, partnering with C-suite leaders, directors, and business units to define and ship priority data products on Azure Databricks.
- Designed and productionized a daily Single Customer View pipeline that consolidated roughly 15 million records from four enterprise systems into about 7 million unique customer keys used across more than 10 business units.
- Built a probabilistic record-linkage engine with Splink, surfaced 748,000 candidate duplicate pairs missed by exact matching, and designed a five-tier confidence framework validated at more than 99.9 percent accuracy.
- Cut Single Customer View runtime from more than four hours to under one hour.
- Co-designed a Next Best Product recommendation pipeline covering 12 product types and four customer segments.
A Philippine education enterprise
Student-at-risk prediction platform
- Spearheaded the infrastructure and ingestion workstream for a student-at-risk prediction platform on Azure Databricks.
- Provisioned and managed platform infrastructure with Terraform, built API ingestion pipelines across three source systems, and migrated Excel-based linear regression models into production Databricks jobs.
- Accelerated team delivery by integrating AI coding agents into the development workflow using custom agent skills, hooks, and command layers.
A large Singaporean enterprise
Data ingestion and transformation
- Led the ingestion and transformation workstream, building end-to-end pipelines from four source systems in SQL Server and MongoDB into BigQuery.
- Designed surrogate key strategies to resolve identifier collisions between member and visitor systems and modeled production reporting tables for attendance analytics.
- Resolved memory failures in legacy Airflow DAGs ahead of production rollout by implementing chunked CSV processing.
A Philippine airline
Enablement curriculum design
- Designed and delivered, in four days, a six-course enablement curriculum covering Python and Power BI tracks from beginner to advanced levels.
Internal Contributions
- Authored a standardized data ingestion scoping framework RFC for the engineering consulting team.
- Created post-exam study guides that supported certification preparation across the company.
- Built a proof-of-concept integration connecting ChatGPT to Databricks through an MCP server.