I am a Data Engineer passionate about turning complex data into actionable insights. I have experience designing and maintaining automated ETL pipelines that process millions of records daily, optimizing Spark transformations to reduce runtime, and building dashboards in Power BI that support real-time decision-making for multiple teams. My expertise spans Python, SQL, PySpark, Azure, and cloud-based analytics solutions, with a strong focus on ensuring data reliability, observability, and governance.
I have worked in both startup and corporate environments, including Mozilla.ai, where I built scalable pipelines for AI telemetry and safety analytics, and Symufolk, where I optimized Azure-based data pipelines. I thrive in fast-paced, ambiguous environments, taking ownership of problems end-to-end, from data ingestion and cleaning to modeling and serving. I enjoy designing data architectures, implementing validation rules, and enforcing schemas, always treating data as a product that delivers value to users.
My projects include building a NASA APOD automated ETL pipeline, processing large-scale regional flood impact datasets using Spark and SQL, and creating dashboards and monitoring layers for AI model evaluation. I am also experienced in integrating third-party APIs, implementing feedback loops for AI systems, and ensuring low-latency, reliable data delivery.
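As a small illustration of the validation and schema-enforcement work described above, here is a minimal sketch of one ETL step in the style of the NASA APOD pipeline. The field names follow NASA's public APOD API; the specific validation rules are illustrative assumptions, not the pipeline's actual logic.

```python
# Hypothetical validation step for an APOD-style ETL pipeline.
# Field names (date, title, url, media_type) match NASA's public APOD API;
# the rules themselves are illustrative assumptions.
from datetime import date

REQUIRED_FIELDS = {"date", "title", "url", "media_type"}

def validate_record(record: dict) -> dict:
    """Enforce a simple schema before loading: required fields present,
    date parseable as ISO 8601, media_type from a known set."""
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    date.fromisoformat(record["date"])  # raises ValueError on malformed dates
    if record["media_type"] not in {"image", "video"}:
        raise ValueError(f"unexpected media_type: {record['media_type']}")
    return record

# A well-formed record passes through unchanged; a malformed one
# fails fast before reaching the load stage.
sample = {
    "date": "2024-01-15",
    "title": "Example Nebula",
    "url": "https://apod.nasa.gov/apod/image/example.jpg",
    "media_type": "image",
}
validate_record(sample)
```

Failing fast at this boundary keeps bad records out of downstream tables, which is what "treating data as a product" looks like in practice.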
I am highly motivated by opportunities to work with AI-driven analytics and modern data infrastructure. I enjoy collaborating with ML engineers, analysts, and business stakeholders to design pipelines and workflows that are robust, efficient, and scalable. I am eager to bring my technical skills, problem-solving mindset, and startup experience to a role where I can contribute to building the next generation of data-driven products.