We are seeking a Senior Data Engineer to design, build, and scale modern cloud-native data platforms supporting real-time analytics, enterprise reporting, and high-volume streaming workloads. This role requires strong hands-on expertise in Snowflake, Databricks, Spark, Kafka, Airflow, and dbt, along with proven experience building reliable pipelines that handle multi-terabyte datasets and millions to billions of events per day.
You will partner with engineering, product, analytics, and business teams to deliver scalable pipelines, optimize performance, improve cost efficiency, and implement DataOps best practices across cloud environments (AWS/GCP).
- Design and build scalable batch and streaming data pipelines using Kafka, Spark Structured Streaming, Databricks, and Snowflake
- Develop and optimize ELT workflows using dbt and orchestration frameworks like Airflow
- Build enterprise-grade data models (star schema, dimensional modeling) for analytics and BI workloads
- Optimize Snowflake performance using clustering, workload isolation, materialized views, and query tuning
- Implement automated testing and validation frameworks using Great Expectations and pytest-based unit tests
- Implement CI/CD for data pipelines using GitHub Actions, GitLab CI/CD, or Jenkins
- Support cloud migrations from legacy systems into AWS/GCP cloud-native architectures
- Build metadata, lineage, and governance frameworks (IAM, encryption, audit logging, catalog tools like Amundsen)
- Improve data reliability through SLA monitoring, observability, schema enforcement, and pipeline alerting
- Partner with analytics teams to enable self-service dashboards in Tableau, Power BI, and Looker
- Mentor junior engineers and contribute to architecture reviews and platform best practices
- 7+ years of experience as a Data Engineer / Senior Data Engineer
- Strong expertise in SQL, Python, and PySpark
- Hands-on experience with Snowflake and/or Databricks
- Strong experience with Apache Spark (batch and streaming)
- Proven experience building real-time pipelines using Kafka (Confluent is a plus)
- Experience with workflow orchestration using Airflow
- Experience with dbt and modern ELT modeling practices
- Strong knowledge of cloud data services in AWS and/or GCP
- Strong performance tuning experience (Spark, Snowflake, and Redshift or BigQuery)
- Familiarity with CI/CD and Infrastructure-as-Code tools (Terraform is a plus)
- Experience processing billions of events/day or multi-terabyte datasets
- Experience with healthcare or regulated environments (HIPAA, HITECH)
- Experience with data governance under compliance and privacy frameworks (SOC 2, GDPR)
- Familiarity with Redshift, BigQuery, Delta Lake, EMR, Glue, Kinesis
- Experience implementing observability and monitoring for data pipelines
- Experience building fraud detection or near real-time analytics systems
Snowflake, Databricks, Spark, Kafka, Airflow, dbt, Python, SQL, AWS (S3, Glue, Kinesis, EMR, Lambda), GCP, Terraform, GitHub Actions/GitLab/Jenkins, Great Expectations, Tableau/Power BI/Looker