Data Engineering Projects
1. Data pipelines with Apache Airflow
2. Data Lakes with Apache Spark
1. Data pipelines with Apache Airflow
Automate the Data Warehouse ETL process with Apache Airflow: Automation is at the heart of data engineering, and Apache Airflow makes it possible to build reusable, production-grade data pipelines that cater to the needs of Data Scientists. In this project, I took on the role of a Data Engineer to:
Develop a data pipeline that automates the data warehouse ETL by building Airflow operators that handle the extraction, transformation, validation, and loading of data from S3 -> Redshift -> S3.
Build a reusable, production-grade data pipeline that incorporates data quality checks and allows for easy backfills (a sketch of such a quality-check operator follows below).
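Below is a minimal sketch of what such a data-quality operator can look like. The connection ID, check format, and class name are illustrative assumptions, not necessarily the exact ones used in the project:

```python
# Sketch of a custom Airflow operator that runs SQL-based data quality
# checks against Redshift. Names ("redshift" conn ID, check tuples) are
# illustrative assumptions, not the project's exact implementation.
from airflow.models import BaseOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook


class DataQualityOperator(BaseOperator):
    """Fail the task if any SQL check does not return its expected value."""

    def __init__(self, redshift_conn_id="redshift", checks=None, **kwargs):
        super().__init__(**kwargs)
        self.redshift_conn_id = redshift_conn_id
        self.checks = checks or []  # list of (sql, expected_value) pairs

    def execute(self, context):
        hook = PostgresHook(postgres_conn_id=self.redshift_conn_id)
        for sql, expected in self.checks:
            records = hook.get_records(sql)
            if not records or not records[0] or records[0][0] != expected:
                raise ValueError(f"Data quality check failed: {sql}")
            self.log.info("Data quality check passed: %s", sql)
```

An operator like this runs after the load tasks in the DAG, so a failed check stops the run before downstream consumers see bad data.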
Keywords: Apache Airflow, AWS Redshift, Python, ETL, Data Engineering
2. Data Lakes with Apache Spark
Develop an ETL pipeline for a Data Lake: As a data engineer, I was tasked with building an ETL pipeline that extracts data from S3, processes it using Spark, and loads it back into S3 as a set of dimensional tables, allowing Data Scientists to continue finding insights from the data stored in the Data Lake.
Developed Python scripts that use PySpark to wrangle the data loaded from S3.
Designed a star schema and stored the transformed data back in S3 as partitioned Parquet files (see the sketch below).
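Below is a minimal sketch of the extract-transform-load flow such a job follows. The bucket paths, column names, and table are illustrative assumptions, not the project's actual schema:

```python
# Sketch of a Spark data-lake ETL step: read raw JSON from S3, build a
# deduplicated dimension table, and write it back to S3 as partitioned
# Parquet. Paths and columns are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("data-lake-etl").getOrCreate()

# Extract: load raw JSON files from the input bucket.
df = spark.read.json("s3a://input-bucket/song_data/*/*/*/*.json")

# Transform: project the dimension columns and drop duplicate keys.
songs = (
    df.select("song_id", "title", "artist_id", "year", "duration")
      .dropDuplicates(["song_id"])
)

# Load: write back to S3 as Parquet, partitioned for efficient pruning.
(songs.write
      .mode("overwrite")
      .partitionBy("year", "artist_id")
      .parquet("s3a://output-bucket/songs/"))
```

Partitioning by query-relevant columns lets downstream Spark jobs prune whole directories instead of scanning the full table.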
Keywords: AWS EMR, Data Lakes, PySpark, Python, Data Wrangling, Data Engineering