location Lahore, PK · UTC+5

Status ● Available for new opportunities

Availability PST / EST · GMT / BST · CET / CEST

Muhammad Husnain

Senior Data Engineer
building multi-tenant data
platforms on GCP & AWS.

I design and operate ingestion, orchestration, and warehousing systems for marketing analytics, geospatial ML pipelines, and SaaS data products. 6+ years across five companies. 200+ production pipelines shipped. Comfortable owning the full path from API ingestion through dbt models to Terraform-managed infrastructure.

→ muhammad.husnain0996@gmail.com · ↓ Muhammad Husnain Resume - Senior Data Engineer.pdf · ↗ linkedin

// pipelines.shipped

200+

// ad_spend.under_analytics

$50M+

// infra.cost_reduction

~58%

// refresh.min_cadence

15 min

// years.industry

6+

§ 01

About

I work at the boundary between data engineering and infrastructure — the part where pipelines need to be reliable, cheap to run, and observable by the people downstream from them.

At Making Science I led the build-out of a multi-channel marketing platform pulling from 15+ ad APIs into per-client BigQuery projects. The architecture was hub-and-spoke: a shared ingestion layer feeding isolated customer warehouses, orchestrated by Airflow on Cloud Composer, transformed through layered dbt models. The interesting problems were rarely about ingestion itself; they were about cost, tenancy, and standing up new customers without engineering toil. The Terraform module library I built cut new-pipeline deployment from about an hour to 10–15 minutes.

Before that I worked on geospatial ML pipelines at LiveEO (satellite imagery, Anyscale, spot EC2) and on AWS warehousing at Ryan-Miranda. My early years were spent building a low-code data platform on Flask, NiFi, and a stack of custom connectors, which is where I learned that the boring parts of data work (auth, retries, schema drift, idempotency) are usually the ones that matter most.

note Currently studying for the GCP Associate Data Practitioner cert and open to senior data engineering roles — remote, or relocation to the right team.

§ 02

Experience

Most recent first.

2023.08 → 2026.03 Making Science Senior Data Engineer

Designed a multi-channel marketing data platform with a team of 6, integrating 15+ ad sources (CM360, DV360, Google Ads, Facebook Ads, Criteo, Snapchat) into BigQuery. Supports analytics on $50M+ ad spend across 2,000+ campaigns and 400+ clients with cadences from daily to 15 minutes.
Architected a hub-and-spoke BigQuery model with centralized ingestion and per-client project isolation; built layered dbt models for spend, impressions, clicks, conversions, and ROAS.
Built and orchestrated 20+ Airflow DAGs on Cloud Composer with dependency management, retry strategies, and SLA monitoring.
Developed 200+ Cloud Scheduler and Cloud Functions workflows for parameterized sub-daily processing.
Migrated legacy Dataproc workloads to serverless Cloud Functions across 20+ customers, cutting infra costs ~58% (~$12 → ~$5 per customer/day).
Led company-wide Terraform adoption — reusable modules for 8 GCP services (Pub/Sub, Cloud Functions, Cloud Run, Cloud Scheduler, BigQuery, IAM, Secret Manager, GCS). Reduced pipeline deployment from ~1 hour to 10–15 minutes.

2023.01 → 2023.08 LiveEO Data Engineer

Built 8 production Python workflows consolidating data across AWS environments and third-party providers — feeding ML teams for vegetation management, infrastructure monitoring, and deforestation models. 20 GB to 100+ GB per run.
Worked with the Anyscale team to design distributed compute on a mix of on-demand and spot EC2, cutting monthly compute by ~80% (~$500 → <$100/month).
Built CI/CD pipelines in GitLab CI for three ML teams, deploying containerized workflows to AWS (S3, ECR) via Prefect.
Owned end-to-end data delivery SLAs for three downstream ML teams.

2022.06 → 2023.01 Ryan-Miranda Partners Data Engineer

ETL pipelines processing 10–50+ GB of customer behavioral and transactional data from GA4 BigQuery exports, Postgres, and MySQL — cadences from 30 minutes to daily.
AWS-based ingestion and transformation with Python and Amazon MWAA, replacing manual spreadsheet reporting across 3 client environments.
Provisioned AWS infra (EC2, S3, RDS, Redshift, Client VPN) via Terraform, reducing deployment from ~2 days to under 2 hours.
Built and optimized Redshift transformation layers for analytics and dashboards.

2020.12 → 2022.05 Youpal Group Data Engineer

ETL pipelines in Python and SQL powering data and ML workflows for an AI job-matching platform — 50+ businesses, 10k+ candidates across Scandinavia.
Daily incremental ingestion of candidate profiles, CVs, job descriptions, and CRM data; 100+ GB historical dataset processed.
GCP pipelines integrating 10+ third-party video data sources for object tracking and logo detection across live video streams.
Migrated on-premises datasets to AWS and GCP, reducing maintenance overhead.

2019.06 → 2020.11 Binary Tech Data Engineer

Flask APIs powering the platform frontend — 20+ client companies configuring data sources, building workflows, and triggering ML pipelines through a low-code interface.
Custom connectors for ingesting customer data from PostgreSQL, SQL Server, REST APIs, and third-party platforms.
Drag-and-drop ML model integrations for the visual workflow builder.
Contributed to the platform's visual ETL component built on Apache NiFi as part of a 5-person scrum team.

§ 03

Selected projects

Production systems I designed, built, and operated.

01

Multi-tenant marketing analytics platform

Hub-and-spoke BigQuery architecture ingesting 15+ ad APIs into per-client projects. Cloud Functions and Pub/Sub for serverless ingestion, Cloud Composer for orchestration, dbt for the transformation layer. Serves 400+ clients with cadences down to 15 minutes.

stack: BigQuery · Cloud Composer · Cloud Functions · Pub/Sub · dbt · Terraform
scale: $50M+ ad spend · 2,000+ campaigns · 400+ clients
role: Architecture, ingestion layer, IaC modules
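
The hub-and-spoke idea can be illustrated with a small routing sketch. This is not the production code: the project naming template, dataset name, table suffix, and function names below are hypothetical, chosen only to show how rows from a shared ingestion layer might be grouped by per-client destination.

```python
from collections import defaultdict

# Hypothetical naming convention; the real project and dataset
# names are not given in this document.
PROJECT_TEMPLATE = "analytics-{client_id}"
DATASET = "marketing"


def table_id(client_id: str, source: str) -> str:
    """Build a project.dataset.table ID for one client and one ad source."""
    project = PROJECT_TEMPLATE.format(client_id=client_id)
    return f"{project}.{DATASET}.{source}_raw"


def route_rows(rows):
    """Group rows from the shared hub by their per-client (spoke) table."""
    batches = defaultdict(list)
    for row in rows:
        batches[table_id(row["client_id"], row["source"])].append(row)
    return dict(batches)


# Made-up sample rows, for illustration only.
rows = [
    {"client_id": "acme", "source": "google_ads", "spend": 120.0},
    {"client_id": "acme", "source": "criteo", "spend": 40.0},
    {"client_id": "globex", "source": "google_ads", "spend": 75.5},
]

for destination, batch in sorted(route_rows(rows).items()):
    print(destination, len(batch))
```

Keeping the destination a pure function of client and source means each tenant's data lands only in its own project, which is the isolation property the architecture relies on.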

02

Reusable Terraform module library

Company-wide modules for 8 GCP services (Pub/Sub, Cloud Functions, Cloud Run, Cloud Scheduler, BigQuery, IAM, Secret Manager, GCS). Standardized naming, IAM, and observability. Cut new-pipeline deployment from ~1 hour to 10–15 minutes.

stack: Terraform · GCP · IAM · CI/CD
impact: ~4–6× faster deploys, eliminated manual provisioning errors
role: Lead designer, internal advocate, reviewer

03

Dataproc → Serverless migration

Migrated 20+ customers from Dataproc clusters to Cloud Functions, removing always-on cluster cost and improving cold-start economics. Per-customer infra cost dropped ~58% (~$12 → ~$5 per day).

stack: Cloud Functions · Python · Pub/Sub · BigQuery
impact: ~58% per-customer cost reduction across 20+ tenants
role: Migration design, pilot, rollout
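
As a quick check on the arithmetic, the quoted ~58% follows from the approximate per-customer daily figures above. The helper name is illustrative, and the inputs are the rounded values from this page, not exact billing data.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the percentage drop from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Approximate per-customer daily costs quoted above (USD).
BEFORE_PER_DAY = 12.0
AFTER_PER_DAY = 5.0

reduction = percent_reduction(BEFORE_PER_DAY, AFTER_PER_DAY)
print(f"{reduction:.1f}% per customer")  # prints "58.3% per customer"
```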

04

Satellite imagery ML pipelines

Distributed compute on Anyscale + EC2 spot/on-demand for vegetation, infrastructure, and deforestation models. 20–100+ GB per run. Spot-heavy scheduling cut monthly compute ~80%.

stack: Anyscale · AWS EC2 · S3 · Prefect · Python
impact: ~80% compute cost reduction
role: Pipeline ownership, infra design with Anyscale team

05

Customer analytics warehouse

GA4 + Postgres + MySQL ingestion orchestrated by MWAA into Redshift, with Terraform-managed dev and prod environments. Replaced manual spreadsheet reporting across 3 client environments.

stack: MWAA · Redshift · Terraform · Python
scale: 10–50+ GB datasets · 30-min to daily cadences
role: End-to-end build and ownership

06

Visual ML workflow builder

Flask backend, drag-and-drop ML integrations, and an Apache NiFi ETL layer that let 20+ client companies compose pipelines without code.

stack: Flask · Apache NiFi · PostgreSQL · Python
scale: 20+ client companies on the platform
role: Backend APIs, connector authoring, ETL layer

§ 04

Stack & certs

What I reach for daily, and what I keep current.

languages

Python · SQL

data & orchestration

Apache Airflow (Cloud Composer, MWAA) · BigQuery · dbt · Redshift

gcp

Cloud Functions · Cloud Run · Pub/Sub · Cloud Scheduler · Cloud Logging

aws

S3 · Lambda · EC2 · MWAA · RDS · Redshift

infra & devops

Terraform · Docker · GitHub Actions · GitLab CI · Bitbucket Pipelines

Certifications

GCP Professional Data Engineer
GCP Cloud Digital Leader
GCP Generative AI Leader
AWS Certified Cloud Practitioner
AWS Certified AI Practitioner
Azure AI-900 · DP-900 · AZ-900
HashiCorp Terraform Associate
Snowflake SnowPro Core

degree

B.Sc. Electrical Engineering

institution

University of Engineering and Technology, Lahore

years

2015 — 2019

§ 05

Get in touch

Best way to reach me is email. I read every message and reply within a day or two.

Email muhammad.husnain0996@gmail.com

LinkedIn linkedin.com/in/mohusnain

Resume Muhammad Husnain Resume - Senior Data Engineer.pdf (download)

location Lahore, Pakistan · UTC+5

Availability ● Open to senior data engineering roles — remote or relocation
