Kamlesh Gupta
Bangalore
Profile Summary
Over 8.5 years of experience in Data Integration, Data Engineering, and the build of large-scale Data Warehouse and Data Lake implementations using various ETL, ELT, Big Data, and Cloud technologies.
Core Competencies
PySpark, AWS Glue, Redshift, Core Java, Kafka, WhereScape, AWS CloudFormation, Scala, Java, Hive, SQL, Athena, Data Vault 2.0, Databricks
Professional Experience
Current organization: Sept 2018 - till date
Project - Data Vault Data Warehouse Implementation for FMCG client
Technologies: AWS Glue, Redshift, WhereScape 3D and RED, S3, Athena, Spark, AWS CodePipeline, AWS CloudFormation, AWS Lambda, DynamoDB, AWS Step Functions, Airflow
Business Functions:
The customer required migration of on-premises data from different business units, along with on-prem Hadoop-based data processing and reports. The target was a robust technical and functional model that can append new data to the data product without altering the existing models.
Developed Data Vault 2.0 Data Warehouse on Redshift.
Performed data modeling and developed data processing pipelines using AWS big data solutions (EMR, Glue, Step Functions, Lambda).
Developed a Data Lake and various data ingestion pipelines using the Glue Data Catalog and Athena.
Created AWS Glue Spark Jobs for data transformations of formats like CSV, JSON,
and XML.
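For illustration, a minimal sketch of such a Glue job, converting a catalog-registered CSV source and an S3 JSON source to Parquet (the database, table, and bucket names are hypothetical, not the project's actual ones):

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# CSV source registered in the Glue Data Catalog (hypothetical names)
csv_dyf = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="sales_csv"
)

# JSON source read directly from S3 (hypothetical path); XML can be read
# similarly with format="xml" and a rowTag format option
json_dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-bucket/raw/json/"]},
    format="json",
)

# Simple transformation: drop an unused field and rename a column
cleaned = csv_dyf.drop_fields(["unused_col"]).rename_field("amt", "amount")

# Write the transformed data back to S3 as Parquet (hypothetical paths)
glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/sales/"},
    format="parquet",
)
glue_context.write_dynamic_frame.from_options(
    frame=json_dyf,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/events/"},
    format="parquet",
)

job.commit()
```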
Optimized the AWS Glue processes and Redshift queries.
Implemented CI/CD for AWS Glue using AWS CloudFormation, AWS CodeBuild, and AWS CodeDeploy.
Used DynamoDB for config storage and retrieval.
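As an illustration of that pattern, a small boto3 sketch for storing and fetching pipeline configuration (the table name, key, and attributes are hypothetical):

```python
import boto3

# Hypothetical config table keyed by pipeline name
dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
config_table = dynamodb.Table("etl_pipeline_config")

def get_pipeline_config(pipeline_name: str) -> dict:
    """Fetch the stored configuration item for a pipeline, or an empty dict."""
    response = config_table.get_item(Key={"pipeline_name": pipeline_name})
    return response.get("Item", {})

def put_pipeline_config(pipeline_name: str, settings: dict) -> None:
    """Store or overwrite the configuration item for a pipeline."""
    config_table.put_item(Item={"pipeline_name": pipeline_name, **settings})

if __name__ == "__main__":
    put_pipeline_config("sales_ingest", {"source_bucket": "example-bucket", "batch_size": 500})
    print(get_pipeline_config("sales_ingest"))
```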
Migrated the existing scheduling process to Airflow.
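A minimal Airflow DAG sketch for one such migrated schedule, using the AWS provider's Glue operator (the DAG id, job name, schedule, and region are assumptions, and the exact import path and DAG arguments depend on the Airflow and provider versions):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

# Hypothetical daily schedule replacing a legacy scheduler entry
with DAG(
    dag_id="daily_sales_load",
    start_date=datetime(2022, 1, 1),
    schedule_interval="0 2 * * *",
    catchup=False,
) as dag:
    run_glue_job = GlueJobOperator(
        task_id="run_sales_transform",
        job_name="sales_transform_job",  # hypothetical Glue job name
        region_name="us-east-1",
        wait_for_completion=True,
    )
```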
Built WhereScape 3D Data Vault 2.0 model mappings to incorporate new changes, created mappings and other changes in the Information Hub Console to add new functionality, and was involved in the Redshift warehouse implementations for this strategy.
Built WhereScape RED based pipelines for automated delta load and change capture, and automated the deployment of 3D based models.
Used WhereScape RED processes for deployment and automation of Data Vault entities such as Load, Stage, Hubs, Links, Satellites, SOCV, and SOV.
Project - Data analysis platform and bot development for a Telecom client
Technologies: Scala, Shell Scripts, Kafka, Elasticsearch, AWS EMR, Athena, Spark, AWS CodePipeline, AWS CloudFormation, AWS Lambda, DynamoDB, AWS Step Functions
The basic function of this application is to migrate processing from Teradata to AWS. The motive is to use S3 for storing and processing the files, explicitly using Spark (Scala), Athena, and Elasticsearch.
Developed data processing pipelines using AWS big data solutions (EMR, Glue, Step Functions, Lambda).
Developed AWS EMR jobs to create the processing pipelines.
Responsible for writing custom UDFs in Spark for handling XML and JSON data.
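Those UDFs were written in Scala on EMR; purely as an illustration of the idea, an equivalent JSON-parsing UDF in PySpark could look like this (the column and field names are hypothetical):

```python
import json

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf_example").getOrCreate()

@udf(returnType=StringType())
def extract_account_id(raw_json: str) -> str:
    """Pull a nested field out of a raw JSON string, tolerating bad records."""
    try:
        return json.loads(raw_json).get("account", {}).get("id")
    except (TypeError, ValueError, AttributeError):
        return None

# Hypothetical input: a DataFrame with one raw JSON string column
df = spark.createDataFrame(
    [('{"account": {"id": "A123"}}',), ("not-json",)], ["payload"]
)
df.withColumn("account_id", extract_account_id("payload")).show()
```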
Involved in requirement gathering, design, development, and testing. Defined strategies and frameworks for implementation across different data sources and data types.
Calculated the percentage difference of monthly, yearly, and seasonal sales.
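For example, a month-over-month percentage difference of this kind can be computed with a lag window function; a small PySpark sketch with made-up column names:

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sales_pct_diff").getOrCreate()

# Hypothetical monthly sales aggregates
sales = spark.createDataFrame(
    [("2021-01", 100.0), ("2021-02", 120.0), ("2021-03", 90.0)],
    ["month", "total_sales"],
)

# Compare each month against the previous one
w = Window.orderBy("month")
result = sales.withColumn("prev_sales", F.lag("total_sales").over(w)).withColumn(
    "pct_change",
    F.round((F.col("total_sales") - F.col("prev_sales")) / F.col("prev_sales") * 100, 2),
)
result.show()
```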
Read and wrote data in various file formats.
Integrated Elasticsearch with a streaming Kafka consumer to load user account data for optimized search. Implemented a REST auth module along with the end-to-end pipeline using Java and Spark, and implemented AES-256 encryption and X.509 certificate custom modules using the Elasticsearch Java APIs.
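The project modules were built with Java and the Elasticsearch Java APIs; as a simplified illustration of the ingestion pattern only, a Python sketch using kafka-python and the elasticsearch client (topic, index, and hosts are hypothetical):

```python
import json

from elasticsearch import Elasticsearch, helpers
from kafka import KafkaConsumer

# Hypothetical endpoints and names
consumer = KafkaConsumer(
    "user-accounts",                       # topic
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)
es = Elasticsearch(["http://localhost:9200"])

def to_actions(messages):
    """Turn consumed Kafka records into Elasticsearch bulk-index actions."""
    for msg in messages:
        yield {
            "_index": "user_accounts",
            "_id": msg.value.get("account_id"),
            "_source": msg.value,
        }

# Index consumed messages in small bulk batches
batch = []
for message in consumer:
    batch.append(message)
    if len(batch) >= 100:
        helpers.bulk(es, to_actions(batch))
        batch = []
```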
Implemented CI/CD for AWS Glue using AWS CloudFormation, AWS CodeBuild, and AWS CodeDeploy.
Project - Data Onboarding for a US biopharmaceutical company on Azure
Technologies: PySpark, Azure Synapse Analytics, Cosmos DB, Databricks, NetApp, Power BI, Spark, SQL
Designed the Azure platform solution for data processing and analysis of various file formats using Databricks.
Set up Azure SFTP and automated file onboarding to the server.
Developed ETL pipelines for onboarding and analysis of files of various types and formats, such as SAS7BDAT.
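One possible shape of that onboarding step in a Databricks notebook, sketched with pandas as the SAS reader and a conversion to a Spark DataFrame (the path and table names are hypothetical; a dedicated Spark SAS reader could equally have been used):

```python
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sas_onboarding").getOrCreate()

# Hypothetical mounted path to a vendor-delivered SAS file
sas_path = "/dbfs/mnt/raw/study_001/adsl.sas7bdat"

# pandas can read SAS7BDAT directly; the encoding may need adjusting per vendor
pdf = pd.read_sas(sas_path, format="sas7bdat", encoding="latin-1")

# Convert to a Spark DataFrame and register it for downstream SQL/ETL steps
# (the "raw" database is assumed to exist)
sdf = spark.createDataFrame(pdf)
sdf.write.mode("overwrite").saveAsTable("raw.study_001_adsl")
```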
Developed Databricks jobs for processing files related to various studies and from various vendors.
Utilized Cosmos DB for data quality management and for storing process configurations.
Developed the Azure Synapse serverless data warehouse.
Managed the security of blinded and unblinded files across all studies.
Developed Clean Patient Tracker common data models related to various studies and programs.
DXC Technology (formerly HPE), February 2017 - Sept 2018
Project: Analytics Platform development, Feb 2017 - Sept 2018
Technologies: Sqoop, Hive, Spark, Scala, Oozie, Shell Scripts
The basic functions of this application are:
Migration to Azure Data Lake using Azure Data Factory V2.
A log analyzer for logs generated by different applications, covering log file transformation and analysis in Spark (Scala, PySpark) and generation of Hive tables or other desired output formats for later process mining.
Migration of data from SQL Server and Alteryx-generated files to Hadoop, making it analytically presentable for BI; the motive is to use Hadoop for storing and processing the files for growth analytics development.
Key Roles and Responsibilities:
Imported survey data files from different providers and stored them in Hadoop using Slurper and PyNotify.
Worked on the Spark DataFrame API and Spark SQL as part of transformations.
Transformations and processing were done primarily in the Parquet and ORC file formats.
Configured Azure cloud resources for the Azure Data Factory pipelines.
Provided Hive structure to the files and merged them with base and historical files for processing.
Performed analysis on the processed data as per business analytics requirements and generated KPIs.
Scheduled Oozie jobs on defined timeframes, monitored them, and resolved query errors.
Infosys Technologies, Bangalore, February 2014 – January 2017
Project: Teradata and Streaming Data to Hadoop Offloading
Technologies: Sqoop, Hive, Spark, Scala, Kafka, Flume, Oozie, Shell Scripts
Converted existing Teradata ODI code into corresponding big data solutions, using the Cloudera platform for development and delivery. The idea was to reverse engineer the code to understand it and replace it with corresponding big data solutions using Spark SQL, Java, Hive, and Sqoop, thereby utilizing faster data processing to enhance ticket booking and reduce booking failures in near real time.
Key Roles and Responsibilities:
Worked on the Spark DataFrame API and Spark SQL as part of transformations.
Worked on Sqoop to move historical data from Teradata to Hadoop.
Involved in writing Hive queries and improved Hive query performance by implementing partitioning and bucketing based on different term levels.
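In the project these ran as Hive queries; as an illustration only, the DDL for a partitioned, bucketed ORC table and a partition load, issued here through PyHive (host, table, and column names are hypothetical):

```python
from pyhive import hive

# Hypothetical HiveServer2 endpoint
conn = hive.Connection(host="hive-server.example.com", port=10000, database="default")
cursor = conn.cursor()

# Partitioning by booking date lets range queries prune whole directories;
# bucketing by customer_id reduces shuffle for joins/aggregations on that key.
cursor.execute("""
    CREATE TABLE IF NOT EXISTS bookings_curated (
        booking_id  STRING,
        customer_id STRING,
        amount      DOUBLE
    )
    PARTITIONED BY (booking_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Load a single partition from a staging table (hypothetical source);
# on Hive 1.x, SET hive.enforce.bucketing=true would be needed before the INSERT
cursor.execute("""
    INSERT OVERWRITE TABLE bookings_curated PARTITION (booking_date = '2016-05-01')
    SELECT booking_id, customer_id, amount
    FROM bookings_staging
    WHERE booking_date = '2016-05-01'
""")
```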
Analyzed structured data and created tables using Hive.
Persisted Kafka streaming data analyzed by Spark into Cassandra.
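A sketch of that persistence step with the spark-cassandra-connector DataFrame API (keyspace, table, and columns are hypothetical; the connector package must be supplied, e.g. via --packages):

```python
from pyspark.sql import SparkSession

# Assumes the spark-cassandra-connector package is on the classpath
spark = (
    SparkSession.builder.appName("kafka_to_cassandra")
    .config("spark.cassandra.connection.host", "127.0.0.1")
    .getOrCreate()
)

# Hypothetical DataFrame produced by the Spark analysis of the Kafka stream
events = spark.createDataFrame(
    [("A123", "2016-05-01T10:00:00", 42.0)],
    ["account_id", "event_time", "score"],
)

# Append the analyzed records into a Cassandra table
(
    events.write.format("org.apache.spark.sql.cassandra")
    .options(table="account_events", keyspace="analytics")
    .mode("append")
    .save()
)
```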
Monitored the Spark History Server for performance optimization of Spark DataFrame jobs.
Project: Data Migration and Analysis for a Telecom Client
The basic function of this application is to migrate processing from Teradata to Hadoop. The motive is to use Hadoop for storing and processing the files, explicitly using Hive.
Key Roles and Responsibilities:
o Defined strategies and frameworks for implementation across different data sources and data types.
o Involved in gathering requirements, design, development, and testing.
o Designed Hive tables, both partitioned and non-partitioned.
o Calculated the percentage difference of monthly, yearly, seasonal, and functional sales.
o Read and wrote data in various file formats.
o Proficient in Relational Database Management Systems (RDBMS).
o Extensive knowledge of SQL (DDL, DML, DCL) and of the design and normalization of database tables.
o Extensive experience in writing stored procedures, triggers, functions, indexes, and views.
o Extensive knowledge of advanced query concepts (e.g., GROUP BY, HAVING clause, UNION, and so on).