Projects

Real repos behind real platform decisions.

This page is built for developers who want more than a project title. Each section explains what was built, why it matters, and which repositories are worth opening first.

Flagship case study

Enterprise Data Platform on AWS

This is the strongest body of work on the site so far. It is a multi-repo platform built around an e-commerce source system, CDC ingestion, Bronze/Silver/Gold data shaping, analytics serving, and disciplined session-based operations.

  • PostgreSQL plus AWS DMS landing raw CDC files into Bronze S3
  • Six Glue PySpark jobs reconciling raw changes into current-state Silver tables
  • dbt transformations building the Gold analytics layer on Athena
  • Analytics Agent running on ECS Fargate for browser and Slack access
  • GitHub Actions session orchestrator for startup, use, and teardown
  • Terraform-managed infrastructure across networking, IAM, storage, serving, and monitoring

Good for platform engineers

Multi-repo architecture with real boundaries

Study how infrastructure, pipeline logic, orchestration, and analytics serving are split without losing the end-to-end shape.

Good for data engineers

CDC to medallion layers with practical developer workflows

Follow the flow from raw database changes to stakeholder-ready outputs, with validation, quarantine handling, and cost-aware operations.

Public repos worth opening

Each one teaches a different part of the system.

terraform-platform-infra-live

Private network, data lake buckets, IAM, serving infrastructure, monitoring, and the full AWS platform skeleton.

Terraform · Network and IAM · Serving layer
Open repo

platform-glue-jobs

PySpark jobs and shared libraries for CDC reconciliation, Silver modeling, freshness metrics, and quarantine handling.

PySpark · CDC · Data quality
Open repo
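
To make the reconciliation idea concrete, here is a minimal sketch in plain Python rather than PySpark, and it is not code from the repo. It assumes each CDC row carries an `Op` column with `I`, `U`, or `D` (the convention AWS DMS uses for S3 targets), plus a primary key `id` and an ordering column `commit_ts`; those two column names, and the quarantine rules, are hypothetical choices for illustration.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]
VALID_OPS = {"I", "U", "D"}


def reconcile(
    changes: Iterable[Row],
    key: str = "id",                # hypothetical primary-key column
    op_col: str = "Op",             # DMS-style operation flag
    order_col: str = "commit_ts",   # hypothetical ordering column
) -> Tuple[Dict[Any, Row], List[Row]]:
    """Collapse raw change rows into current state; set aside bad rows."""
    valid: List[Row] = []
    quarantined: List[Row] = []

    # Rows without a key, a known operation, or an ordering value cannot be
    # applied safely, so they go to quarantine instead of failing the run.
    for row in changes:
        if (
            row.get(key) is None
            or row.get(op_col) not in VALID_OPS
            or row.get(order_col) is None
        ):
            quarantined.append(row)
        else:
            valid.append(row)

    # Apply changes in commit order: inserts and updates overwrite the row,
    # deletes remove it. sorted() is stable, so ties keep arrival order.
    current: Dict[Any, Row] = {}
    for row in sorted(valid, key=lambda r: r[order_col]):
        if row[op_col] == "D":
            current.pop(row[key], None)
        else:
            current[row[key]] = {k: v for k, v in row.items() if k != op_col}

    return current, quarantined


changes = [
    {"Op": "I", "id": 1, "commit_ts": 1, "status": "new"},
    {"Op": "U", "id": 1, "commit_ts": 3, "status": "shipped"},
    {"Op": "I", "id": 2, "commit_ts": 2, "status": "new"},
    {"Op": "D", "id": 2, "commit_ts": 4, "status": "new"},
    {"Op": "X", "id": 3, "commit_ts": 5, "status": "?"},    # unknown op
    {"Op": "I", "id": None, "commit_ts": 6, "status": "new"},  # missing key
]
current, quarantined = reconcile(changes)
```

In this sample, order 1 survives in its latest state, order 2 is removed by its delete, and the two malformed rows end up in `quarantined` for later inspection.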

platform-dbt-analytics

dbt transformations shaping the Gold analytics layer from Silver inputs, with a cleaner business-facing serving contract.

dbt · Athena · Gold layer
Open repo

platform-analytics-agent

Natural-language analytics over Gold data with SQL guardrails, charts, PDF reports, and stakeholder-friendly answers.

FastAPI · Streamlit · NL to SQL
Open repo
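
As a rough illustration of what a SQL guardrail can look like, the sketch below validates a generated query before it is run. It is an assumption about the general approach, not the agent's actual implementation: the function name `check_sql`, the keyword list, the default row limit, and the table name in the usage lines are all hypothetical.

```python
import re

# Keywords that indicate a write or DDL statement. This check is deliberately
# crude: it would also reject a harmless column that happens to be named
# "update", which is an acceptable trade-off for a read-only analytics path.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|merge|grant|revoke)\b",
    re.IGNORECASE,
)


def check_sql(sql: str, max_limit: int = 1000) -> str:
    """Return a safe, row-limited version of the query or raise ValueError."""
    statement = sql.strip().rstrip(";").strip()
    if not statement:
        raise ValueError("empty query")
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(?i)^(select|with)\b", statement):
        raise ValueError("only SELECT queries are allowed")
    if FORBIDDEN.search(statement):
        raise ValueError("write or DDL keyword found")
    # Append a LIMIT when the query does not already end with one.
    if not re.search(r"(?i)\blimit\s+\d+\s*$", statement):
        statement = f"{statement} LIMIT {max_limit}"
    return statement


safe = check_sql("SELECT * FROM gold.orders;")   # hypothetical table name
already_limited = check_sql("select 1 limit 5")
```

A production guardrail would more likely parse the SQL properly and check table names against an allowlist; the regex version only shows the shape of the check.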

platform-session-orchestrator

GitHub Actions workflows that start a working platform session and tear it down afterwards to keep costs under control.

GitHub Actions · Lifecycle · Cost aware
Open repo

platform-orchestration-mwaa-airflow

Airflow DAGs for the same pipeline when a visual orchestration path is needed in MWAA.

Airflow · MWAA · DAG orchestration
Open repo

GCP translation

Private GCP rebuild with the same enterprise standards.

The GCP work stays private because it is the active internal workspace where the architecture is still being shaped. The goal is not a superficial copy but a deep understanding of how the same platform ideas translate into GCP: organization and folder hierarchy, project-per-environment design, Workload Identity Federation, BigLake governance, BigQuery serving, and cleaner internal skills.

Current private repos

`platform-docs-gcp`, `terraform-bootstrap-gcp`, `terraform-foundation-gcp`

These are the working repos building the private GCP platform foundation from the ground up.

Other projects

Additional work across other stacks and warehouse shapes.

01

Airflow, dbt, BigQuery, and GCS healthcare pipeline

A more focused GCP analytics pipeline outside the private enterprise rebuild.

Open repo

02

Real estate valuation on Snowflake and dbt

A different warehouse shape that still reflects the same interest in clean modeling and business-facing outputs.

Open repo

03

Databricks real estate pipeline

A medallion architecture implementation in a Databricks shape, useful for comparing platform thinking across ecosystems.

Open repo