Five projects across health research, energy, and business analytics.
All built in production at UCT, Amandla Africa Energy, and related organisations.
All built with real data, real tools, real constraints.
01
Production
Python, SQL, Health Research, Pandas, PostgreSQL, Excel
Paediatric Clinical Data Pipeline
UCT Department of Paediatrics — Cape Town
The Problem
UCT researchers were collecting paediatric health data across multiple clinical
systems — each with different formats, field names, and standards. No unified pipeline existed.
Merging datasets for publication meant hours of manual reconciliation, with errors slipping through into
research outputs.
What I Built
A Python-based validation and cleaning pipeline that ingests
multi-source clinical data, standardises schema, detects missing values and duplicates, applies
longitudinal consistency checks, and outputs publication-ready datasets. Paired with structured Excel
reporting suites for non-technical researchers.
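The checks described above could be sketched roughly as follows. This is an illustration rather than the pipeline's actual code: the column names, the `COLUMN_MAP` mapping, and the 20% weight-drop rule are all hypothetical stand-ins for whatever the real clinical sources required.

```python
import pandas as pd

# Hypothetical mapping from source-specific column names to one shared schema.
COLUMN_MAP = {
    "PatientID": "patient_id",
    "pid": "patient_id",
    "VisitDate": "visit_date",
    "date_of_visit": "visit_date",
    "Weight_kg": "weight_kg",
    "wt": "weight_kg",
}


def standardise(frame: pd.DataFrame) -> pd.DataFrame:
    """Rename columns to the shared schema and parse visit dates."""
    out = frame.rename(columns=COLUMN_MAP)
    out["visit_date"] = pd.to_datetime(out["visit_date"])
    return out


def validate(frames: list[pd.DataFrame]) -> tuple[pd.DataFrame, dict[str, int]]:
    """Combine sources, drop duplicate patient visits, and count issues."""
    merged = pd.concat([standardise(f) for f in frames], ignore_index=True)
    deduped = merged.drop_duplicates(subset=["patient_id", "visit_date"])
    deduped = deduped.sort_values(["patient_id", "visit_date"]).reset_index(drop=True)

    # Longitudinal check (illustrative rule only): flag a visit when the
    # recorded weight is more than 20% below the same patient's previous visit.
    previous = deduped.groupby("patient_id")["weight_kg"].shift(1)
    deduped["weight_drop_flag"] = deduped["weight_kg"] < 0.8 * previous

    report = {
        "duplicates_removed": len(merged) - len(deduped),
        "missing_weight": int(deduped["weight_kg"].isna().sum()),
        "weight_drop_flags": int(deduped["weight_drop_flag"].sum()),
    }
    return deduped, report
```

Returning the issue counts alongside the cleaned frame keeps the checks auditable: the same numbers can be written into a reporting sheet for researchers who never see the code.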
500,000+ records processed with 100% data integrity. Datasets fed
directly into peer-reviewed academic publications. Reporting time for researchers cut by over
60%. Zero data-related revision requests from journal reviewers.
Amandla's operations team managed 10M+ energy data points spread across
disconnected spreadsheets and reporting formats. Planning decisions were made on stale, manually compiled
data — introducing lag and compounding errors at scale.
What I Built
A Tableau dashboard connected to a centralised SQL data model, surfacing
real-time resource utilisation, demand forecasting, and anomaly detection across multiple energy sites.
Automated weekly Excel reports via VBA macros replaced 5 hours of manual compilation per week.
Dashboard adopted as the primary tool for all operational planning.
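The text does not say which anomaly detection method sat behind the dashboard. As one plausible approach, the sketch below flags sites whose average utilisation is far below their peers using a modified z-score built on the median absolute deviation (MAD). The function name, the input shape, and the 3.5 threshold are assumptions made for illustration.

```python
from statistics import median


def flag_underperforming(
    utilisation: dict[str, list[float]], threshold: float = 3.5
) -> list[str]:
    """Return sites whose mean utilisation is unusually low versus peers.

    Uses a modified z-score based on the median and MAD, which are less
    distorted by the outliers the check is trying to find than mean/stdev.
    """
    means = {site: sum(vals) / len(vals) for site, vals in utilisation.items()}
    centre = median(means.values())
    mad = median(abs(m - centre) for m in means.values())
    if mad == 0:
        return []  # no spread between sites, so nothing to compare against

    flagged = []
    for site, site_mean in means.items():
        score = 0.6745 * (site_mean - centre) / mad
        if score < -threshold:  # only the low side counts as under-performing
            flagged.append(site)
    return sorted(flagged)
```

In a SQL-backed setup the same calculation could run as a scheduled query, with the flagged sites written to a table the dashboard reads.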
Manual reporting time cut by 40%. Anomaly detection surfaced 3 under-performing sites
that had gone undetected for 2+ months — enabling corrective action worth significant resource savings.
Using WHO's Global Health Observatory dataset, this project analyses
under-5 child mortality trends across Sub-Saharan Africa from 2000–2023, correlating outcomes with
healthcare access, GDP, and maternal education data from World Bank sources.
The Approach
Multi-source data merge in Python, exploratory analysis in R, and an interactive
Plotly dashboard that lets users filter by country, year, and indicator. Regression modelling
to identify the strongest predictors of mortality reduction.
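A minimal sketch of the merge and regression steps, assuming both sources have already been reshaped to one row per country and year. The column names (`country`, `year`, and whatever target and predictor names are passed in) are hypothetical, and plain least squares via NumPy stands in for whichever regression model the project ends up using.

```python
import numpy as np
import pandas as pd


def merge_sources(mortality: pd.DataFrame, indicators: pd.DataFrame) -> pd.DataFrame:
    """Join mortality and indicator tables on country and year."""
    return mortality.merge(
        indicators,
        on=["country", "year"],
        how="inner",
        validate="one_to_one",  # raise if either side repeats a country-year
    )


def fit_ols(data: pd.DataFrame, target: str, predictors: list[str]) -> dict[str, float]:
    """Fit ordinary least squares and return the intercept and slopes by name."""
    columns = [np.ones(len(data))] + [data[p].to_numpy(dtype=float) for p in predictors]
    design = np.column_stack(columns)
    response = data[target].to_numpy(dtype=float)
    coefs, *_ = np.linalg.lstsq(design, response, rcond=None)
    return dict(zip(["intercept"] + predictors, coefs.tolist()))
```

The `validate="one_to_one"` argument is worth keeping in a multi-source merge: duplicate country-year rows would otherwise multiply silently and bias the regression.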
Why It Matters
This project demonstrates the full analyst stack: data wrangling, statistical
analysis, visualisation, and storytelling with public health data — the exact workflow used in
global health organisations like PATH, WHO, and UNICEF.
Expected Output
Interactive Plotly dashboard, Jupyter notebook with documented methodology, written
findings summary. All code published on GitHub.
A production-style data warehouse built using dbt (data build tool) on
top of a synthetic healthcare claims dataset — modelling patient journeys, claim outcomes, and provider
performance metrics across a simulated insurance environment.
The Approach
Raw claims data → staging models → intermediate joins → mart-layer
tables ready for BI consumption. Full dbt documentation, tests, and lineage graphs. Deployed on
BigQuery with a Power BI layer on top.
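dbt models are written as SQL files that reference each other, so the snippet below is only an analogy: it mimics the staging, intermediate, and mart layering with SQLite views from Python's standard library. The table names, columns, and three sample rows are invented for illustration and are not part of the project.

```python
import sqlite3

LAYERS = """
-- Raw layer: synthetic claims as they might arrive from a source system.
CREATE TABLE raw_claims (claim_id TEXT, provider TEXT, amount TEXT, status TEXT);
INSERT INTO raw_claims VALUES
  ('c1', ' Clinic A ', '120.50', 'PAID'),
  ('c2', 'Clinic A',   '80.00',  'denied'),
  ('c3', 'Clinic B',   '200.00', 'paid');

-- Staging: rename, trim, and cast; no business logic yet.
CREATE VIEW stg_claims AS
SELECT claim_id,
       TRIM(provider)       AS provider_name,
       CAST(amount AS REAL) AS claim_amount,
       LOWER(status)        AS claim_status
FROM raw_claims;

-- Intermediate: reusable business logic built on staging.
CREATE VIEW int_claims_flagged AS
SELECT *, CASE WHEN claim_status = 'paid' THEN 1 ELSE 0 END AS is_paid
FROM stg_claims;

-- Mart: one row per provider, shaped for a BI tool.
CREATE VIEW mart_provider_performance AS
SELECT provider_name,
       COUNT(*)                      AS claim_count,
       SUM(is_paid) * 1.0 / COUNT(*) AS paid_rate,
       SUM(claim_amount)             AS total_claimed
FROM int_claims_flagged
GROUP BY provider_name;
"""


def build_provider_mart() -> list[tuple]:
    """Create the layered views in memory and return the mart rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(LAYERS)
        query = "SELECT * FROM mart_provider_performance ORDER BY provider_name"
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

Each view reads only from the layer directly beneath it, which is the property that makes lineage graphs and per-layer tests meaningful in a real dbt project.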
Why It Matters
dbt is the industry standard for analytics engineering. This project signals readiness
for senior analyst and analytics engineer roles at healthtech companies (Sanlam,
Discovery Health, CCHP) and remote data teams globally.
Expected Output
Full dbt project on GitHub with staging/intermediate/mart layers, test coverage report,
dbt docs site, and a Power BI summary dashboard.
Using the MIMIC-III open clinical dataset (MIT), this project builds a machine
learning model that predicts 30-day hospital readmission risk from patient demographics, diagnoses, and
prior admission history, a high-value problem in health operations.
The Approach
Feature engineering on clinical data, XGBoost classifier with
hyperparameter tuning, SHAP values for model explainability, and ROC/AUC evaluation. Focused on
interpretability — a model clinicians can actually trust and act on.
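To make the evaluation step concrete, here is a small dependency-free sketch of what ROC AUC measures: the probability that a randomly chosen readmitted patient receives a higher risk score than a randomly chosen non-readmitted one. It does not call XGBoost or SHAP, the function name is hypothetical, and a real project would more likely use a library metric.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the share of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Pairwise comparison is O(n^2), fine for illustration but not for
    large cohorts.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Read this way, an AUC of 0.5 means the scores rank patients no better than chance, and 1.0 means every readmitted patient was scored above every non-readmitted one.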
Why It Matters
Predictive analytics in healthcare is the frontier of data science work. This project
demonstrates applied ML in a clinical context — exactly what health data teams at
Discovery Health, Medidata, and global NGOs are looking for in senior hires.
Expected Output
Fully documented Jupyter notebook, model card, SHAP visualisations, and a write-up
explaining findings to a non-technical audience. All code published on GitHub.