KOLAGATLA CHAITANYA
Hyderabad
DATA ENGINEER
SUMMARY
Results-driven data engineer with a proven track record of designing and implementing scalable data
solutions, optimizing data pipelines, and enabling data-driven decision-making.
EXPERIENCE
GEP
Data Engineer
05/2022 - present
Worked as a backend engineer on the ETL tool we built in-house.
Extensive experience working with various data formats, including JSON, Parquet, Excel, and CSV, ensuring data compatibility and integrity throughout the processing lifecycle using PySpark/Spark.
Took a proactive approach to monitoring, managing, and scaling Spark clusters on Databricks, resulting in optimized resource utilization and enhanced overall system performance.
Used Spark 3.0 to leverage Adaptive Query Execution (AQE) and other newer features.
Created several UDFs.
Created and optimized notebooks in both PySpark and Scala, automating data cleaning, transformation, and enrichment processes, significantly reducing manual intervention and enhancing productivity.
Fine-tuned PySpark SQL queries and DataFrame operations to optimize performance. Utilized techniques such as broadcast
joins and predicate pushdown, reducing query execution time by 40% and enhancing overall system efficiency.
Implemented partitioning, caching, and optimization techniques to enhance the performance of PySpark jobs, reducing processing times by 30%.
Utilized advanced PySpark functionalities to transform raw, complex data into structured and meaningful formats, enabling
streamlined analytics and reporting processes.
Created data pipelines using Airflow and stored all pipeline metadata in MongoDB.
Worked with file formats such as Parquet and with Delta tables in Databricks.
Scheduled user-created workflows from the ETL tool using Airflow.
Implemented and maintained a Docker-based deployment of Airflow, covering the setup, management, and continuous operation of the Airflow platform within the Docker environment.
Wrote dynamic Python code that generates Airflow DAGs.
Created Azure containers and used Blob Storage for storage (via Microsoft Azure Storage Explorer).
Completed proofs of concept on Azure Managed Airflow (ADF) and Amazon MWAA.
Collaborated with cross-functional teams to design and implement scalable data solutions, ensuring alignment with business
objectives and data governance standards.
TCS
Big Data Developer - Client: Telefonica
06/2019 - 05/2022
Developed Spark jobs.
Used Parquet and other data formats to store data in HDFS.
Developed Spark applications using Scala.
Converted SQL queries to Spark transformations.
Implemented partitions in Hive and monitored Apache NiFi workflows.
Data Engineer / Python Developer - Paclife Investments
Developed a metadata-driven Python ingestion framework supporting multiple sources.
Used the Python framework to ingest data from source systems.
Used Snowflake as the data warehouse, organized into RAW, Integration, and Presentation layers.
Built a dbt framework and wrote test cases in dbt.
Promoted data to higher layers using dbt.
Built Docker images, pushed them to an Amazon ECR repository, and ran ingestion via AWS Batch jobs.
Used Control-M to create and orchestrate workflows.
Control-M invoked AWS Batch job definitions to run the ingestion and dbt frameworks.
Used Azure DevOps for CI/CD pipelines.
Used Azure Repos to store code.
Snowflake Developer - Client: Juniper Networks
Developed complex views, external tables, stored procedures, file formats, tasks, and streams.
Developed SnowSQL code of low, medium, and high complexity with a focus on efficiency and performance.
Created and used file formats, Time Travel, and different types of tables and views.
Performed performance and process tuning for existing SnowSQL jobs.
Developed Matillion jobs to trigger SnowSQL statements and scheduled the jobs.
IDQ Developer
Restructured and recreated mappings for better data cleansing and fine-tuning.
Used Postman to read API data and for API testing.
Implemented DNB integration in mappings and transformations.
Worked on a procedure to automate batch loading (due to API call limits).
EDUCATION
2015 - 2019
GEETHANJALI COLLEGE OF ENGINEERING AND TECHNOLOGY
B.TECH
CERTIFICATES
August 2021
AZURE DATA FUNDAMENTALS DP-900
72%
TECHNICAL STACK AND TOOLS
Cloud Tools
AWS
S3
ECR (image repository)
AWS Batch job definitions
Parameter Store
CloudWatch (logs)
Azure
Data Lake Storage Gen2
Databricks
Azure Repos
Orchestration tools
Apache Airflow
Matillion
Control-M
Informatica (IDQ)
Azure Pipelines (CI/CD)
Database/Data warehouse
Delta Tables
Snowflake
MongoDB (basic)
Languages
Python
Scala (Spark)
SQL
DECLARATION
I hereby declare that the information furnished above is authentic to the best of my knowledge.
Place: Hyderabad
K CHAITANYA