
Paediatric Clinical
Data Pipeline

UCT Department of Paediatrics — Cape Town

The Problem

UCT researchers were collecting paediatric health data across multiple clinical systems — each with different formats, field names, and standards. No unified pipeline existed. Merging datasets for publication meant hours of manual reconciliation, with errors slipping through into research outputs.

What I Built

A Python-based validation and cleaning pipeline that ingests multi-source clinical data, standardises schemas, detects missing values and duplicates, applies longitudinal consistency checks, and outputs publication-ready datasets. Paired with structured Excel reporting suites for non-technical researchers.
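A minimal sketch of the schema-standardisation and duplicate/missing-value checks, assuming hypothetical column names (the real UCT schemas are confidential):

```python
import pandas as pd

# Hypothetical source-to-standard column mapping — illustrative names only.
SCHEMA_MAP = {"PatientID": "patient_id", "dob": "date_of_birth", "DOB": "date_of_birth"}

def clean_batch(df: pd.DataFrame) -> tuple:
    """Standardise the schema, drop exact duplicates, and flag missing DOB values."""
    df = df.rename(columns=SCHEMA_MAP)
    before = len(df)
    df = df.drop_duplicates()
    report = {
        "duplicates_removed": before - len(df),
        "missing_dob": int(df["date_of_birth"].isna().sum()),
    }
    return df, report

# Illustrative batch with one duplicate row and one missing DOB.
batch = pd.DataFrame({
    "PatientID": [1, 1, 2, 3],
    "DOB": ["2019-03-01", "2019-03-01", None, "2020-07-15"],
})
cleaned, report = clean_batch(batch)
```

In the real pipeline each batch report would feed the Excel reporting suite rather than a Python dict.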

Tools & Stack

Python (Pandas, NumPy), PostgreSQL, Excel (structured templates), custom validation rule engine, automated flagging for outlier detection.
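The automated outlier flagging can be sketched as a simple z-score rule — a standard technique, shown here with illustrative values and a hypothetical threshold:

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=3.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) > threshold * sigma]

# A lower threshold suits small illustrative samples.
flagged = flag_outliers([10, 11, 9, 10, 10, 55], threshold=1.5)
```

The production rule engine layered domain-specific checks (e.g. clinically implausible ranges) on top of statistical flags like this.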

The Outcome

500,000+ records processed with 100% data integrity. Datasets fed directly into peer-reviewed academic publications. Reporting time for researchers cut by over 60%. Zero data-related revision requests from journal reviewers.

500K+
Records Cleaned
100%
Integrity Rate
60%
Less Reporting Time
0
Journal Errors
View on GitHub ⚠ Data confidential — UCT IRB restricted
Dashboard mockup — Paediatric Data Pipeline (UCT): live counters (500K records, 100% integrity, 7 sources, 2,341 errors flagged); pipeline stages ingest → validate → clean → standardise → output; validation log sample (record_batch_001.csv — 72,441 rows, 0 critical errors; clinical_labs_q3.xlsx — 14 duplicates removed; demographics_source3.csv — 841 missing DOB values flagged); 500,612 records ready for analysis.

Energy Resource
Optimisation Dashboard

Amandla Africa Energy — Cape Town

The Problem

Amandla's operations team managed 10M+ energy data points spread across disconnected spreadsheets and reporting formats. Planning decisions were made on stale, manually compiled data — introducing lag and compounding errors at scale.

What I Built

A Tableau dashboard connected to a centralised SQL data model, surfacing real-time resource utilisation, demand forecasting, and anomaly detection across multiple energy sites. Automated weekly Excel reports via VBA macros replaced 5 hours of manual compilation per week.
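The anomaly detection behind the dashboard can be sketched as a trailing-baseline rule: flag any site-day whose output drops well below its recent average. The window and threshold below are illustrative, not the production values:

```python
def detect_anomalies(readings, window=7, drop_pct=0.15):
    """Flag indices whose reading falls more than drop_pct below the trailing-window mean."""
    anomalies = []
    for i in range(window, len(readings)):
        baseline = sum(readings[i - window:i]) / window
        if readings[i] < baseline * (1 - drop_pct):
            anomalies.append(i)
    return anomalies

# A steady site that suddenly drops 20% below baseline.
alerts = detect_anomalies([100] * 7 + [80])
```

In production this logic ran as SQL aggregation feeding Tableau, rather than in-process Python.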

Tools & Stack

Tableau Desktop, MySQL, Python (for ETL pre-processing), Excel VBA macros, SQL stored procedures for aggregation.

The Outcome

Dashboard adopted as the primary tool for all operational planning. Manual reporting time cut by 40%. Anomaly detection surfaced 3 under-performing sites that had gone undetected for 2+ months — enabling corrective action worth significant resource savings.

10M+
Data Points
40%
Less Manual Work
3
Sites Recovered
5hrs
Saved Weekly
SQL & ETL Scripts View Dashboard
Dashboard mockup — Amandla Energy resource monitor: total output 84,291 MWh (↑6.2%); 24 of 27 sites active, 3 anomalies; efficiency 91.4% (↑2.1% MoM); 10.2M data points processed; 12-month output trend; site anomaly list (Site 07 Bellville — LOW, Site 14 Mitchells Pk — WARN, Site 22 Paarl — LOW; 24 sites nominal).

Child Mortality &
Health Inequality Analysis

WHO Global Health Observatory — Open Dataset

The Brief

Using WHO's Global Health Observatory dataset, this project analyses under-5 child mortality trends across Sub-Saharan Africa from 2000–2023, correlating outcomes with healthcare access, GDP, and maternal education data from World Bank sources.

The Approach

Multi-source data merge in Python, exploratory analysis in R, and an interactive Plotly dashboard that lets users filter by country, year, and indicator. Regression modelling to identify the strongest predictors of mortality reduction.

Why It Matters

This project demonstrates the full analyst stack: data wrangling, statistical analysis, visualisation, and storytelling with public health data — the exact workflow used in global health organisations like PATH, WHO, and UNICEF.

Expected Output

Interactive Plotly dashboard, Jupyter notebook with documented methodology, written findings summary. All code published on GitHub.

GitHub — in progress Live dashboard
Concept mockup — Child Mortality, Sub-Saharan Africa: choropleth of mortality rate by country; under-5 mortality trend 2000–2023 (ZAF, ZWE, NGA); average reduction 2000–23 of –52%; top predictor maternal education (R² = 0.81); 46 countries analysed (WHO SSA region).

Healthcare Claims
Data Warehouse

Personal Project — dbt + PostgreSQL + BigQuery

The Brief

A production-style data warehouse built using dbt (data build tool) on top of a synthetic healthcare claims dataset — modelling patient journeys, claim outcomes, and provider performance metrics across a simulated insurance environment.

The Approach

Raw claims data → staging models → intermediate joins → mart-layer tables ready for BI consumption. Full dbt documentation, tests, and lineage graphs. Deployed on BigQuery with a Power BI layer on top.
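The staging → mart flow above can be illustrated outside dbt with a small pandas equivalent — synthetic data and hypothetical column names throughout:

```python
import pandas as pd

# Synthetic raw claims, mimicking a messy source table.
claims_raw = pd.DataFrame({
    "ClaimID": [1, 2, 3, 4],
    "MemberID": [10, 10, 11, 12],
    "Amount": [250.0, 90.0, 1200.0, 40.0],
    "Status": ["PAID", "DENIED", "PAID", "PAID"],
})

# Staging layer (stg_claims): standardise naming, nothing else.
stg_claims = claims_raw.rename(columns=str.lower).rename(
    columns={"claimid": "claim_id", "memberid": "member_id"}
)

# Mart layer (mart_kpi): per-member KPIs ready for BI consumption.
mart_kpi = (
    stg_claims[stg_claims["status"] == "PAID"]
    .groupby("member_id", as_index=False)
    .agg(paid_claims=("claim_id", "count"), paid_amount=("amount", "sum"))
)
```

In the actual project each layer is a dbt SQL model with its own tests and documentation; the point here is only the shape of the transformation.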

Why It Matters

dbt is the industry standard for analytics engineering. This project signals readiness for senior analyst and analytics engineer roles at healthtech companies (Sanlam, Discovery Health, CCHP) and remote data teams globally.

Expected Output

Full dbt project on GitHub with staging/intermediate/mart layers, test coverage report, dbt docs site, and a Power BI summary dashboard.

GitHub — in progress dbt docs
Concept mockup — dbt healthcare claims lineage: sources (claims_raw, patients_raw, providers_raw, icd_codes_raw) → staging (stg_claims, stg_patients, stg_providers) → intermediate (int_claims_enriched, int_patient_journey) → marts (fct_claims, dim_patients, mart_kpi); dbt test results: 84 tests, 84 passed, 0 failed, all not_null and relationship constraints passing.

Hospital Readmission
Risk Predictor

MIMIC-III Open Clinical Dataset — ML Project

The Brief

Using the MIMIC-III open clinical dataset (MIT), build a machine learning model that predicts 30-day hospital readmission risk from patient demographics, diagnoses, and prior admission history — a high-value problem in health operations.

The Approach

Feature engineering on clinical data, XGBoost classifier with hyperparameter tuning, SHAP values for model explainability, and ROC/AUC evaluation. Focused on interpretability — a model clinicians can actually trust and act on.
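The ROC/AUC evaluation mentioned above reduces to a simple rank statistic: AUC is the probability that a randomly chosen readmitted patient scores higher than a randomly chosen non-readmitted one. A minimal sketch with toy labels and scores:

```python
def roc_auc(y_true, scores):
    """AUC via the Mann-Whitney U statistic: the fraction of positive/negative
    pairs where the positive outranks the negative (ties count half)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: every readmitted patient outranks every non-readmitted one.
auc = roc_auc([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4])
```

The project itself uses XGBoost predictions as the scores and pairs the AUC with SHAP plots so the ranking is explainable, not just accurate.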

Why It Matters

Predictive analytics in healthcare is the frontier of data science work. This project demonstrates applied ML in a clinical context — exactly what health data teams at Discovery Health, Medidata, and global NGOs are looking for in senior hires.

Expected Output

Fully documented Jupyter notebook, model card, SHAP visualisations, and a write-up explaining findings to a non-technical audience. GitHub published.

GitHub — in progress View Notebook
Concept mockup — readmission risk model results: ROC curve with AUC = 0.847; SHAP feature importance (prior admissions 0.41, length of stay 0.33, diagnosis count 0.24, age 0.17, insurance type 0.12, procedure count 0.08); accuracy 82.4% (XGBoost), precision 79.1% on the high-risk class; dataset of 46K MIMIC-III patient admissions.

Like what you see?
Let's build something together.

Get in touch Download CV