Sales Analysis – Data Engineering Pipeline
Overview:
- Designed and implemented an end‑to‑end data engineering pipeline for sales analysis.
- Used Google Cloud Storage (GCS) for data storage and BigQuery for data warehousing.
- Provisioned the infrastructure with Terraform, including GCS staging buckets, BigQuery datasets (raw, staging, transformed), Dataproc clusters, IAM roles, and service accounts.
- Employed Dataproc for scalable data processing with Spark.
- Used Delta Live Tables and dbt to cleanse the raw data and transform it into curated datasets.
- Orchestrated workflows and scheduled ETL jobs with Apache Airflow running in Docker.
- Built interactive dashboards and reports with Power BI to deliver insights to stakeholders.
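The raw, staging and transformed layering listed above can be illustrated with a small, standard-library-only Python sketch. In the project itself this work is described as running in Spark on Dataproc and in Delta Live Tables/dbt; here the field names (`order_id`, `region`, `amount`), the validation rules and the per-region revenue aggregate are hypothetical, chosen only to show what each layer is responsible for.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical raw rows; the repository's real schema may differ.
RAW_ROWS = [
    {"order_id": "1", "region": "North", "amount": "120.50"},
    {"order_id": "2", "region": "south ", "amount": "80"},
    {"order_id": "2", "region": "south ", "amount": "80"},  # exact duplicate
    {"order_id": "3", "region": "", "amount": "15.00"},     # missing region
    {"order_id": "4", "region": "North", "amount": "n/a"},  # non-numeric amount
]


@dataclass(frozen=True)
class SaleRecord:
    """A typed, validated row as it would sit in the staging layer."""
    order_id: int
    region: str
    amount: float


def to_staging(raw_rows):
    """Raw -> staging: cast types, normalise text, drop invalid rows and duplicates."""
    seen = set()
    staged = []
    for row in raw_rows:
        region = row.get("region", "").strip().title()
        try:
            order_id = int(row["order_id"])
            amount = float(row["amount"])
        except (KeyError, ValueError):
            continue  # reject rows whose fields cannot be typed
        if not region or order_id in seen:
            continue  # reject rows with no region, and repeated order IDs
        seen.add(order_id)
        staged.append(SaleRecord(order_id, region, amount))
    return staged


def to_transformed(staged):
    """Staging -> transformed: aggregate revenue per region for reporting."""
    totals = defaultdict(float)
    for rec in staged:
        totals[rec.region] += rec.amount
    return dict(sorted(totals.items()))


staged = to_staging(RAW_ROWS)
curated = to_transformed(staged)
print(curated)
```

Running it prints `{'North': 120.5, 'South': 80.0}`: the duplicate order, the row with an empty region and the row with a non-numeric amount are all filtered out in the staging step. Keeping validation in staging means the transformed layer only aggregates rows that are already clean, which is the usual motivation for separating the two datasets.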
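Airflow runs each task only after its upstream tasks have finished, and tasks with no dependency on each other can run concurrently. The sketch below uses Python's standard-library `graphlib` module (Python 3.9+) to show that scheduling idea for a pipeline shaped like the one described above. It is not Airflow code, and the task names are hypothetical; in an actual Airflow DAG the same edges would be declared between operators with the `>>` operator.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph; the repository's real DAG may be structured differently.
# Each key is a task; its value is the set of tasks that must finish before it starts.
PIPELINE = {
    "upload_orders_to_gcs": set(),
    "upload_customers_to_gcs": set(),
    "load_raw_to_bigquery": {"upload_orders_to_gcs", "upload_customers_to_gcs"},
    "spark_cleanse_on_dataproc": {"load_raw_to_bigquery"},
    "dbt_build_transformed": {"spark_cleanse_on_dataproc"},
    "refresh_power_bi_dataset": {"dbt_build_transformed"},
}


def parallel_batches(dag):
    """Group tasks into waves; tasks in the same wave have no dependency on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for a stable display order
        batches.append(ready)
        sorter.done(*ready)  # mark the wave finished so its dependents become ready
    return batches


if __name__ == "__main__":
    for wave, tasks in enumerate(parallel_batches(PIPELINE), start=1):
        print(f"wave {wave}: {', '.join(tasks)}")
```

The two uploads land in the first wave because neither depends on the other; every later wave holds a single task because the remaining steps form a chain.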
GitHub repository: https://github.com/Abdou240/Sales-Analysis