KOLAGATLA CHAITANYA
Hyderabad
DATA ENGINEER
SUMMARY
Results-driven data engineer with a proven track record of designing and implementing scalable data
solutions, optimizing data pipelines, and enabling data-driven decision-making.
EXPERIENCE
GEP
Data Engineer
05/2022 - present
Worked as a backend engineer on the ETL tool we built in-house.
Extensive experience working with various data formats, including JSON, Parquet, Excel, and CSV, ensuring data compatibility and integrity throughout the processing lifecycle using PySpark/Spark.
Took a proactive approach to monitoring, managing, and scaling Spark clusters on Databricks, resulting in optimized resource utilization and enhanced overall system performance.
Used Spark 3.0 to leverage Adaptive Query Execution (AQE) and other newer features.
Created several UDFs.
Created and optimized notebooks in both PySpark and Scala, automating data cleaning, transformation, and enrichment processes, significantly reducing manual intervention and enhancing productivity.
Fine-tuned PySpark SQL queries and DataFrame operations to optimize performance. Utilized techniques such as broadcast
joins and predicate pushdown, reducing query execution time by 40% and enhancing overall system efficiency.
Implemented partitioning, caching, and optimization techniques to enhance the performance of PySpark jobs, reducing processing times by 30%.
Utilized advanced PySpark functionalities to transform raw, complex data into structured and meaningful formats, enabling
streamlined analytics and reporting processes.
Created data pipelines using Airflow and stored all pipeline metadata in MongoDB.
Worked with file formats such as Parquet and with Delta tables in Databricks.
Scheduled user-created workflows from the ETL tool using Airflow.
Implemented and maintained a Docker-based deployment of Airflow, covering the setup, management, and continuous operation of the Airflow platform within the Docker environment.
Wrote dynamic Python code that generates Airflow DAGs.
Created Azure containers and used Blob Storage for storage (via Microsoft Azure Storage Explorer).
Completed proofs of concept on Azure Managed Airflow (ADF) and Amazon MWAA.
Collaborated with cross-functional teams to design and implement scalable data solutions, ensuring alignment with business
objectives and data governance standards.
TCS
Big Data Developer - Client: Telefonica
06/2019 - 05/2022
Developed Spark jobs.
Used Parquet and other data formats to store data in HDFS.
Developed Spark applications using Scala.
Converted SQL queries to Spark transformations.
Implemented partitions in Hive and monitored Apache NiFi workflows.
Data Engineer / Python Developer - Paclife Investments
Developed a metadata-driven Python ingestion framework supporting multiple sources.
Used the Python framework to ingest data from source systems.
Used Snowflake as the data warehouse, organized into RAW, Integration, and Presentation layers.
Built a dbt framework and wrote test cases in dbt.
Promoted data to higher layers using dbt.
Built Docker images, pushed them to an Amazon ECR repository, and ran ingestion via AWS Batch jobs.
Used Control-M to create and orchestrate workflows.
Control-M invoked AWS Batch job definitions to run the ingestion and dbt frameworks.
Used Azure DevOps for CI/CD pipelines.
Used Azure Repos to store code.
Snowflake Developer - Client: Juniper Networks
Developed complex views, external tables, stored procedures, file formats, tasks, and streams.
Developed SnowSQL code of low, medium, and high complexity with a focus on efficiency and performance.
Created and used file formats, Time Travel, and different types of tables and views.
Performed performance and process tuning for existing SnowSQL jobs.
Developed Matillion jobs to trigger SnowSQL statements and scheduled the jobs.
IDQ Developer
Restructured and recreated mappings for better data cleansing and fine-tuning.
Used Postman to read API data and for API testing.
Implemented DNB integration in mappings and transformations.
Worked on a procedure to automate batch loading (due to API call limits).
EDUCATION
2015 - 2019
GEETHANJALI COLLEGE OF ENGINEERING AND TECHNOLOGY
B.TECH
CERTIFICATES
August 2021
AZURE DATA FUNDAMENTALS DP-900
72%
TECHNICAL STACK AND TOOLS
Cloud Tools
AWS
S3
ECR (image repository)
AWS Batch job definitions
Parameter Store
CloudWatch (logs)
Azure
Data Lake Storage Gen2
Databricks
Azure Repos
Orchestration tools
Apache Airflow
Matillion
Control-M
Informatica (IDQ)
Azure Pipelines (CI/CD)
Database/Data warehouse
Delta Tables
Snowflake
MongoDB (basic)
Languages
Python
Scala (Spark)
SQL
DECLARATION
I hereby declare that the information furnished above is authentic to the best of my knowledge.
Place: Hyderabad
K CHAITANYA