Data Engineer & Data Scientist with a strong foundation
in Python, web scraping and data extraction — building
production-grade pipelines and machine learning systems
that take data from raw files all the way to executive
dashboards and predictive models.
I design and build end-to-end data infrastructure that
businesses can actually depend on — automated, tested,
monitored and documented.

What I build:
→ Batch ETL pipelines using Apache Airflow, dbt and
Snowflake on AWS — with data quality tests, layered
transformation models and real-time alerting
→ Real-time streaming pipelines using Apache Kafka and
Apache Spark — processing live events continuously
into Snowflake
→ Machine learning prediction models for classification
and regression, including Random Forest, deployed as
interactive web apps using Streamlit
→ Analytics dashboards using Power BI connected
directly to Snowflake mart tables
→ Cloud infrastructure on AWS — EC2, S3, SSM, SNS, IAM
→ Containerised deployments using Docker and
Docker Compose

Recent Data Engineering projects:
Netflix ETL Pipeline — Automated batch pipeline that
ingests 87,000+ rows of data daily from AWS S3,
transforms it through 6 layered dbt models in
Snowflake, and delivers Slack and email alerts on
success or failure. Fully containerised with Docker
on AWS EC2.
github.com/EkeminiImeOtu/netflix-etl-pipeline
Real-Time Streaming Pipeline — Live event processing
system using Apache Kafka and PySpark Structured
Streaming. Generates and processes Netflix viewing
events in near real time, in 15-second micro-batches
written directly into Snowflake.
github.com/EkeminiImeOtu/netflix-streaming-pipeline
Olist E-Commerce Analytics — End-to-end analytics
platform built on 1.5 million rows of real e-commerce
data. Python ingestion from S3, 11 dbt models across
staging and mart layers, Power BI executive dashboard
showing revenue trends, top categories, delivery
performance and payment breakdown.
github.com/EkeminiImeOtu/olist-ecommerce-pipeline

Machine Learning projects (AI Saturdays Abeokuta,
part of the global AI6 movement):
Diabetes Prediction Web App — ML classification model
deployed as an interactive Streamlit web application
for early diabetes risk assessment.
github.com/EkeminiImeOtu/diabetes-web-app-prediction-using-Machine-Learning
Multiple Disease Prediction System — Predicts multiple
diseases from patient data using classification
algorithms.
github.com/EkeminiImeOtu/multiple_disease_prediction
Hotel Bookings Cancellation Prediction — Predicts
cancellation likelihood to help hotels optimise revenue.
github.com/EkeminiImeOtu/hotel-bookings-cancellation-prediction
Flight Price Prediction — Forecasts flight prices using
multiple machine learning algorithms.
github.com/EkeminiImeOtu/flight-prediction-using-machine-learning-algorithms
Flowers Classification — Image classification system
using Random Forest.
github.com/EkeminiImeOtu/Flowers-classification-using-random-forest
My background in Python automation, web scraping, API
integrations and machine learning means I can handle
the full data journey — from collecting raw data, to
engineering reliable pipelines, to building predictive
models, all the way to delivering clean, actionable
insights.
I also teach programming (Python, Java, Scratch, MIT
App Inventor) to kids, teens and adults, and publish
tutorials on YouTube, so I am used to explaining
complex technical concepts clearly to both technical
and non-technical stakeholders.
Tools: Python • Apache Airflow • dbt • Snowflake •
Apache Kafka • Apache Spark • AWS • Power BI • Docker •
SQL • Pandas • NumPy • Scikit-learn • Streamlit •
Random Forest • Boto3

Certifications:
ALX Africa — Data Analytics (6-Month Programme)
ALX Africa — Python Programming (8-Week Training)
ALX Africa — Professional Foundations
DataTalks.Club Data Engineering Zoomcamp (In progress)
Available immediately for remote roles worldwide.
Let's turn your raw data into decisions that move your
business forward.