Rahul Kumar Singh
Hyderabad, IN
Apache Hadoop & Spark Developer
SUMMARY
Result-oriented Apache Spark & Hadoop Developer with 3.2 years of overall experience, including 2.9 years of big data expertise, and a proven track record in software development using Hadoop, Apache Spark & Python. Proficient in processing structured and semi-structured data and in deploying Apache Spark to analyze huge data sets. Adept at exporting and importing data using Hadoop clusters. Highly skilled at Hadoop data management and capacity planning for end-to-end data management and performance optimization.
PROFESSIONAL EXPERIENCE
Infosys Limited
Apache Spark & Hadoop Developer
Aug '16 - Present
Infosys is a global leader in next-generation digital services and consulting. We enable clients in 45
countries to navigate their digital transformation.
Professional Synopsis
Experience in the Hadoop framework and in script design using Spark and Python.
Expertise in the analysis, design, development, implementation and support of data warehousing using big data technologies.
Hands-on experience in data transformation using Python, pandas, PySpark and Hive, and in identifying and resolving performance issues at various levels, such as Hive query performance and Spark job performance.
Performs data processing using Hadoop, Spark, Python, Hive and Sqoop.
Sound exposure to working directly with clients to understand and analyze business requirements and propose effective solutions.
Independently developed multiple Spark scripts using Python and pandas to process data.
Involved in creating Hive tables, loading data and writing Hive queries that run internally through PySpark.
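A minimal sketch of this pattern, assuming a Hive-enabled SparkSession (database, table and column names are illustrative, not from the project):

    # Run HiveQL through PySpark: the query executes on Spark's engine
    # rather than classic MapReduce. Names below are illustrative.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    df = spark.sql("SELECT policy_id, premium FROM claims_db.policies WHERE load_year = 2018")
    df.show()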
Strong knowledge of Hadoop, Hive and Hive's analytical functions.
Created schemas in Hive with performance optimization using bucketing and partitioning.
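A minimal sketch of such a schema, issued through Spark SQL (table, column names and the bucket count are illustrative assumptions):

    # Hive table with partitioning (prunes scans by date) and bucketing
    # (speeds up joins and sampling on the key). Names are illustrative.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    spark.sql("""
        CREATE TABLE IF NOT EXISTS claims_db.policies (
            policy_id STRING,
            premium   DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        CLUSTERED BY (policy_id) INTO 32 BUCKETS
        STORED AS ORC
    """)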
Solid understanding of the Hadoop Distributed File System (HDFS).
Hands-on experience with IDE tools such as Eclipse, PyCharm and Jupyter Notebook.
Key Achievements
Analyzed log files and conducted root cause analysis to diagnose and resolve 50+ issues as part of problem management.
Designed an automated table validation framework in Python, Spark, pandas and Hive to validate table loads against schedule and generate a daily e-mail report for the support team and SMEs.
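A minimal sketch of the validation idea behind that framework (the table list, addresses and SMTP host are illustrative assumptions, not the production setup):

    # Compare per-table row counts against a minimum and e-mail a daily report.
    import smtplib
    from email.message import EmailMessage
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    TABLES = ["claims_db.policies", "claims_db.payments"]  # illustrative

    lines = []
    for t in TABLES:
        n = spark.table(t).count()
        lines.append(f"{t}: {n} rows, {'OK' if n > 0 else 'EMPTY'}")

    msg = EmailMessage()
    msg["Subject"] = "Daily table load validation"
    msg["From"] = "etl@example.com"            # illustrative addresses
    msg["To"] = "support-team@example.com"
    msg.set_content("\n".join(lines))
    with smtplib.SMTP("mail.example.com") as s:
        s.send_message(msg)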
Selected among the top 5% of performers for extraordinary performance during the confirmation period.
KEY SKILLS
• Data Processing • Big Data Hadoop • Apache Spark Framework • Python Programming • AWS • Cloud Migration, Configuration & Testing • Client Relationship Management • Project Management • Quality Assurance • Research & Documentation • Team Management • Strategy
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, Hive, Apache Spark, HBase, AWS, Ambari, Hue
Tools and Platforms: Autosys, ServiceNow, TeamTrack, Jenkins, Bitbucket, PyCharm, Eclipse
Languages: Python (pandas), SQL
TRAINING & CERTIFICATIONS
Big Data and Hadoop Developer | Edureka | '17
Hyderabad, IN
INTERNSHIPS
Infosys Limited
Internship
Mysuru, IN
Jan '16 - May '16
Training: Big Data, Python and SQL
Project: Transport Management, with the objective of automating and speeding up the slow process of NOC (No Objection Certificate) generation at the RTO (Regional Transport Office).
EDUCATION
Galgotias University
B.Tech - Computer Science
Greater Noida, IN
Aug '12 - May '16
CGPA: 7.62 / 10
PROJECTS
PROJECT 1: Hadoop Ingestion
Client: American Insurance
Brief: The ELF framework is a customized, metadata-driven PySpark framework developed to extract and load data received through different ingestion patterns into HDFS.
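A minimal sketch of the metadata-driven idea the brief describes (the control-table layout, connection string and column names are assumptions, not the actual ELF internals):

    # Read per-source control rows from Oracle, then dispatch each source
    # to a generic loader. Names are illustrative.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    meta = (spark.read.format("jdbc")
            .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
            .option("dbtable", "ELF_CONTROL")
            .option("user", "etl_user")
            .option("password", "***")       # placeholder credential
            .load())

    for row in meta.collect():
        df = spark.read.format(row["SOURCE_FORMAT"]).load(row["SOURCE_PATH"])
        df.write.mode("append").save(row["TARGET_HDFS_PATH"])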
Overcame challenges of storing and processing data using the Hadoop framework and Apache PySpark.
Automated and scheduled Sqoop jobs using Python scripts.
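A minimal sketch of such a wrapper (the connection details and table names are illustrative assumptions):

    # Wrap a Sqoop import so a scheduler (e.g. Autosys) can call one
    # Python script per source table.
    import subprocess
    import sys

    def sqoop_import(table: str, target_dir: str) -> None:
        cmd = [
            "sqoop", "import",
            "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",
            "--username", "etl_user",
            "--password-file", "/user/etl/.password",
            "--table", table,
            "--target-dir", target_dir,
            "--num-mappers", "4",
        ]
        if subprocess.run(cmd).returncode != 0:
            sys.exit(f"Sqoop import failed for {table}")

    sqoop_import("POLICIES", "/data/landing/policies")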
Filtered out bad records based on requirements.
Performed validation at different levels and ingested data into Hive tables.
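A minimal sketch of the filter-validate-ingest step (the source path, column names and validity rule are illustrative assumptions):

    # Drop bad records, then append the remainder to an existing Hive table.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    raw = spark.read.option("header", True).csv("/data/landing/policies/")
    clean = raw.filter(
        F.col("policy_id").isNotNull() & (F.col("premium").cast("double") > 0)
    )
    # insertInto assumes the target table already exists with a matching schema.
    clean.write.mode("append").insertInto("claims_db.policies")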
Created replicas of Hive tables based on security requirements (different levels of Hive tables).
Ingested 70+ sources into HDFS using the ELF framework.
Environment: HDFS (for storage), Hive and Spark SQL (for transformation), Python, Oracle (for metadata storage), Sqoop (for data ingestion), Autosys (for job scheduling), Bitbucket (version control repository).
PROJECT 2: ADW Transformations
Client: American Insurance
Brief: Worked collaboratively with the clients and the onsite team to move data from the raw layer to the transformation layer, applying the necessary transformation operations and modifications. Responsibilities included design and development.
Wrote PySpark scripts to extract data from staging/raw tables.
Created the transformation layer from multiple source tables based on policy type.
Worked on the Hadoop framework to process data at multiple layers based on client requirements.
Deployed Apache Spark and Python scripts for data processing and Hive for data storage.
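A minimal sketch of a raw-to-transformation-layer step of this kind (table, column and layer names are illustrative assumptions):

    # Read a raw-layer Hive table, transform, and write the result to the
    # transformation layer partitioned by policy type.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    raw = spark.table("raw_db.policies")
    transformed = (raw
        .filter(F.col("policy_status") == "ACTIVE")
        .withColumn("premium_usd", F.col("premium") * F.col("fx_rate")))

    (transformed.write
        .mode("overwrite")
        .partitionBy("policy_type")
        .saveAsTable("adw_db.policies_transformed"))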
Environment: Spark, Hive, Python, HBase, HDFS (for storage).
Key Achievements
Designed solutions and code using the Hadoop framework to create the classic layer and the transformation layer, with a PIT table acting as the bridge.
Independently designed frameworks such as "DQ, Security and MD5 generation for struct as well as flat data types" using Python and Spark.