Retail Sales Analysis – Data Engineering Pipeline
Overview:
- Developed an end‑to‑end data engineering pipeline to analyze retail sales data on Google Cloud Platform.
- Automated infrastructure provisioning with Terraform: created GCS buckets, BigQuery datasets, and a Dataproc cluster, and configured IAM roles.
- Ingested raw and incremental data into staging buckets and managed lifecycle policies.
- Processed data using Dataproc and transformed it with Delta Live Tables and dbt to produce cleansed and aggregated datasets.
- Scheduled workflows and ETL jobs with Apache Airflow for continuous ingestion and transformation.
- Containerized jobs with Docker and visualized key metrics using Power BI dashboards.
- Implemented best practices for cost optimization, data quality, and security.
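As a rough illustration of the "cleansed and aggregated datasets" step above, here is a minimal pure-Python sketch. It is not the repository's code: the actual pipeline runs on Dataproc, Delta Live Tables, and dbt, and the field names (`store_id`, `date`, `quantity`, `unit_price`) and validation rules are assumptions chosen only for this example.

```python
from collections import defaultdict

# Hypothetical record layout: the real schema is not given in this overview.
# Each raw row is a dict with keys "store_id", "date", "quantity", "unit_price".


def cleanse(rows):
    """Keep rows that have a store id, a date, a positive quantity,
    and a non-negative unit price; drop everything else."""
    cleaned = []
    for row in rows:
        try:
            quantity = int(row["quantity"])
            unit_price = float(row["unit_price"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric values
        if not row.get("store_id") or not row.get("date"):
            continue  # missing identifiers
        if quantity <= 0 or unit_price < 0:
            continue  # implausible values (assumed rule)
        cleaned.append(
            {
                "store_id": row["store_id"],
                "date": row["date"],
                "quantity": quantity,
                "unit_price": unit_price,
            }
        )
    return cleaned


def aggregate_daily_revenue(rows):
    """Sum quantity * unit_price per (store_id, date) over cleansed rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["store_id"], row["date"])] += row["quantity"] * row["unit_price"]
    return dict(totals)


if __name__ == "__main__":
    raw = [
        {"store_id": "S1", "date": "2024-01-01", "quantity": "2", "unit_price": "10.0"},
        {"store_id": "S1", "date": "2024-01-01", "quantity": 1, "unit_price": 5.0},
        {"store_id": "S2", "date": "2024-01-01", "quantity": "x", "unit_price": "3"},
    ]
    print(aggregate_daily_revenue(cleanse(raw)))
```

In the pipeline described here, the same two stages would more plausibly be expressed as Spark jobs or dbt models; the sketch only shows the shape of the logic.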
GitHub repository: https://github.com/Abdou240/Retail-Sales-Analysis