$ muhammad.husnain @portfolio
location Lahore, PK · UTC+5
Status Available for new opportunities
Availability PST / EST · GMT / BST · CET / CEST

Muhammad Husnain

Senior Data Engineer
building multi-tenant data
platforms on GCP & AWS.

I design and operate ingestion, orchestration, and warehousing systems for marketing analytics, geospatial ML pipelines, and SaaS data products. 6+ years across five companies. 200+ production pipelines shipped. Comfortable owning every layer, from API ingestion through dbt models to Terraform-managed infrastructure.

// pipelines.shipped
200+
// ad_spend.under_analytics
$50M+
// infra.cost_reduction
~58%
// refresh.min_cadence
15 min
// years.industry
6+

About

I work at the boundary between data engineering and infrastructure — the part where pipelines need to be reliable, cheap to run, and observable by the people downstream from them.

At Making Science I led the build-out of a multi-channel marketing platform pulling from 15+ ad APIs into per-client BigQuery projects. The architecture was hub-and-spoke: a shared ingestion layer feeding isolated customer warehouses, orchestrated by Airflow on Cloud Composer, transformed through layered dbt models. The interesting problems were rarely about ingestion itself; they were about cost, tenancy, and standing up new customers without engineering toil. The Terraform module library I built brought new-pipeline deployment from about an hour down to 10–15 minutes.

Before that I worked on geospatial ML pipelines at LiveEO (satellite imagery, Anyscale, spot EC2) and on AWS warehousing at Ryan-Miranda. My early years were spent building a low-code data platform on Flask, NiFi, and a stack of custom connectors, which is where I learned that the boring parts of data work (auth, retries, schema drift, idempotency) are usually the ones that matter most.

note Currently studying for the GCP Associate Data Practitioner cert and open to senior data engineering roles — remote, or relocation to the right team.

Experience

Most recent first.

2023.08 → 2026.03 Making Science Senior Data Engineer
  • Designed a multi-channel marketing data platform with a team of 6, integrating 15+ ad sources (CM360, DV360, Google Ads, Facebook Ads, Criteo, Snapchat) into BigQuery. Supports analytics on $50M+ in ad spend across 2,000+ campaigns and 400+ clients, at cadences from daily down to 15 minutes.
  • Architected a hub-and-spoke BigQuery model with centralized ingestion and per-client project isolation; built layered dbt models for spend, impressions, clicks, conversions, and ROAS.
  • Built and orchestrated 20+ Airflow DAGs on Cloud Composer with dependency management, retry strategies, and SLA monitoring.
  • Developed 200+ Cloud Scheduler and Cloud Functions workflows for parameterized sub-daily processing.
  • Migrated legacy Dataproc workloads to serverless Cloud Functions across 20+ customers, cutting infra costs ~58% (~$12 → ~$5 per customer/day).
  • Led company-wide Terraform adoption — reusable modules for 8 GCP services (Pub/Sub, Cloud Functions, Cloud Run, Cloud Scheduler, BigQuery, IAM, Secret Manager, GCS). Reduced pipeline deployment from ~1 hour to 10–15 minutes.
2023.01 → 2023.08 LiveEO Data Engineer
  • Built 8 production Python workflows consolidating data across AWS environments and third-party providers — feeding ML teams for vegetation management, infrastructure monitoring, and deforestation models. 20 GB to 100+ GB per run.
  • Worked with the Anyscale team to design distributed compute on a mix of on-demand and spot EC2, cutting monthly compute by ~80% (~$500 → <$100/month).
  • Built CI/CD pipelines in GitLab CI for three ML teams, deploying containerized workflows to AWS (S3, ECR) via Prefect.
  • Owned end-to-end data delivery SLAs for three downstream ML teams.
2022.06 → 2023.01 Ryan-Miranda Partners Data Engineer
  • ETL pipelines processing 10–50+ GB of customer behavioral and transactional data from GA4 BigQuery exports, Postgres, and MySQL — cadences from 30 minutes to daily.
  • AWS-based ingestion and transformation with Python and Amazon MWAA, replacing manual spreadsheet reporting across 3 client environments.
  • Provisioned AWS infra (EC2, S3, RDS, Redshift, Client VPN) via Terraform, reducing deployment from ~2 days to under 2 hours.
  • Built and optimized Redshift transformation layers for analytics and dashboards.
2020.12 → 2022.05 Youpal Group Data Engineer
  • ETL pipelines in Python and SQL powering data and ML workflows for an AI job-matching platform — 50+ businesses, 10k+ candidates across Scandinavia.
  • Daily incremental ingestion of candidate profiles, CVs, job descriptions, and CRM data; 100+ GB historical dataset processed.
  • GCP pipelines integrating 10+ third-party video data sources for object tracking and logo detection across live video streams.
  • Migrated on-premises datasets to AWS and GCP, reducing maintenance overhead.
2019.06 → 2020.11 Binary Tech Data Engineer
  • Flask APIs powering the platform frontend — 20+ client companies configuring data sources, building workflows, and triggering ML pipelines through a low-code interface.
  • Custom connectors for ingesting customer data from PostgreSQL, SQL Server, REST APIs, and third-party platforms.
  • Drag-and-drop ML model integrations for the visual workflow builder.
  • Contributed to the platform's visual ETL component built on Apache NiFi as part of a 5-person scrum team.

Selected projects

Production systems I designed, built, and operated.

01

Multi-tenant marketing analytics platform

Hub-and-spoke BigQuery architecture ingesting 15+ ad APIs into per-client projects. Cloud Functions and Pub/Sub for serverless ingestion, Cloud Composer for orchestration, dbt for the transformation layer. Serves 400+ clients with cadences down to 15 minutes.

stack
BigQuery · Cloud Composer · Cloud Functions · Pub/Sub · dbt · Terraform
scale
$50M+ ad spend · 2,000+ campaigns · 400+ clients
role
Architecture, ingestion layer, IaC modules
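
As an illustration only, the hub-and-spoke routing idea can be sketched in a few lines of Python. Everything named here (`AdRecord`, `client_project`, `route_to_spokes`, the `analytics-<client>` naming scheme, and the sample clients) is hypothetical and not taken from the production system; the sketch only shows records from a shared ingestion batch being grouped by the isolated per-client project they would land in.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AdRecord:
    client_id: str  # hypothetical tenant identifier
    source: str     # ad platform the row came from, e.g. "google_ads"
    spend: float


def client_project(client_id: str, prefix: str = "analytics") -> str:
    """Map a tenant to its isolated project ID (naming scheme is assumed)."""
    return f"{prefix}-{client_id}"


def route_to_spokes(records):
    """Group records from the shared hub by destination client project."""
    spokes = defaultdict(list)
    for record in records:
        spokes[client_project(record.client_id)].append(record)
    return dict(spokes)


# Sample batch as it might leave the shared ingestion layer (made-up data).
hub_batch = [
    AdRecord("acme", "google_ads", 120.0),
    AdRecord("globex", "dv360", 75.5),
    AdRecord("acme", "facebook_ads", 30.0),
]
routed = route_to_spokes(hub_batch)
```

In a real deployment each group would be written to a dataset inside that client's own project, so tenant isolation is enforced by project boundaries rather than by row-level filters.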
02

Reusable Terraform module library

Company-wide modules for 8 GCP services (Pub/Sub, Cloud Functions, Cloud Run, Cloud Scheduler, BigQuery, IAM, Secret Manager, GCS). Standardized naming, IAM, and observability. Cut new-pipeline deployment from ~1 hour to 10–15 minutes.

stack
Terraform · GCP · IAM · CI/CD
impact
~4–6× faster deploys, eliminated manual provisioning errors
role
Lead designer, internal advocate, reviewer
03

Dataproc → Serverless migration

Migrated 20+ customers from Dataproc clusters to Cloud Functions, removing always-on cluster cost so compute is paid for only while jobs run. Per-customer infra cost dropped ~58% (~$12 → ~$5 per day).

stack
Cloud Functions · Python · Pub/Sub · BigQuery
impact
~58% per-customer cost reduction across 20+ tenants
role
Migration design, pilot, rollout
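
The ~58% figure follows directly from the per-customer numbers above. A quick check in Python, using the approximate $12 and $5 daily costs and taking 20 customers as a lower bound (the real count is "20+", so the fleet-wide saving shown is a floor, not a measured total):

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative cost reduction, as a percentage of the original cost."""
    return (before - after) / before * 100


before_per_day = 12.0  # approximate Dataproc cost per customer per day (USD)
after_per_day = 5.0    # approximate Cloud Functions cost per customer per day (USD)
customers = 20         # lower bound; the migration covered 20+ customers

reduction = percent_reduction(before_per_day, after_per_day)
daily_saving = (before_per_day - after_per_day) * customers

print(f"{reduction:.1f}% reduction, ~${daily_saving:.0f}/day across {customers} customers")
# prints "58.3% reduction, ~$140/day across 20 customers"
```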
04

Satellite imagery ML pipelines

Distributed compute on Anyscale + EC2 spot/on-demand for vegetation, infrastructure, and deforestation models. 20–100+ GB per run. Spot-heavy scheduling cut monthly compute ~80%.

stack
Anyscale · AWS EC2 · S3 · Prefect · Python
impact
~80% compute cost reduction
role
Pipeline ownership, infra design with Anyscale team
05

Customer analytics warehouse

GA4 + Postgres + MySQL ingestion orchestrated by MWAA into Redshift, with Terraform-managed dev and prod environments. Replaced manual spreadsheet reporting across 3 client environments.

stack
MWAA · Redshift · Terraform · Python
scale
10–50+ GB datasets · 30-min to daily cadences
role
End-to-end build and ownership
06

Visual ML workflow builder

Flask backend, drag-and-drop ML integrations, and an Apache NiFi ETL layer that let 20+ client companies compose pipelines without code.

stack
Flask · Apache NiFi · PostgreSQL · Python
scale
20+ client companies on the platform
role
Backend APIs, connector authoring, ETL layer

Stack & certs

What I reach for daily, and what I keep current.

languages
Python · SQL
data & orchestration
Apache Airflow (Cloud Composer, MWAA) · BigQuery · dbt · Redshift
gcp
Cloud Functions · Cloud Run · Pub/Sub · Cloud Scheduler · Cloud Logging
aws
S3 · Lambda · EC2 · MWAA · RDS · Redshift
infra & devops
Terraform · Docker · GitHub Actions · GitLab CI · Bitbucket Pipelines

Certifications

  • GCP Professional Data Engineer
  • GCP Cloud Digital Leader
  • GCP Generative AI Leader
  • AWS Certified Cloud Practitioner
  • AWS Certified AI Practitioner
  • Azure AI-900 · DP-900 · AZ-900
  • HashiCorp Terraform Associate
  • Snowflake SnowPro Core
degree
B.Sc. Electrical Engineering
institution
University of Engineering and Technology, Lahore
years
2015 — 2019

Get in touch

The best way to reach me is by email. I read every message and reply within a day or two.

location Lahore, Pakistan · UTC+5
Availability Open to senior data engineering roles — remote or relocation