Rahul Kumar Singh
Hyderabad, IN
Apache Hadoop & Spark Developer
SUMMARY
Result-oriented Apache Spark & Hadoop Developer with 3.2 years of overall experience, including 2.9 years of big data expertise, and a proven track record in software development using Hadoop, Apache Spark & Python. Proficient in processing structured and semi-structured data and in deploying Apache Spark to analyze huge data sets. Adept at exporting and importing data using Hadoop clusters. Highly skilled at Hadoop data management and capacity planning for end-to-end data management and performance optimization.
PROFESSIONAL EXPERIENCE
Infosys Limited
Apache Spark & Hadoop Developer
Aug '16 - Present
Infosys is a global leader in next-generation digital services and consulting. We enable clients in 45
countries to navigate their digital transformation.
Professional Synopsis
Experience in the Hadoop framework and in script design using Spark and Python.
Expertise in the analysis, design, development, implementation and support of data warehousing using big data technologies.
Hands-on experience in data transformation using Python, pandas, PySpark and Hive, and in identifying and resolving performance issues at various levels, such as Hive query performance and Spark job performance.
Performs data processing using Hadoop, Spark, Python, Hive and Sqoop.
Sound exposure to working directly with clients to understand and analyze business requirements and propose effective solutions.
Independently developed multiple Spark scripts using Python and pandas to process data.
Involved in creating Hive tables, loading data and writing Hive queries that run internally through PySpark.
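A minimal sketch of this pattern, assuming a Hive-enabled SparkSession (database, table and column names are illustrative, not from the project):

    # Run HiveQL through PySpark: the query executes on Spark's engine
    # rather than classic MapReduce. Names below are illustrative.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    df = spark.sql("SELECT policy_id, premium FROM claims_db.policies WHERE load_year = 2018")
    df.show()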
Strong knowledge of Hadoop, Hive and Hive's analytical functions.
Created schemas in Hive with performance optimization using bucketing and partitioning.
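A minimal sketch of such a schema, issued through Spark SQL (table, column names and the bucket count are illustrative assumptions):

    # Hive table with partitioning (prunes scans by date) and bucketing
    # (speeds up joins and sampling on the key). Names are illustrative.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    spark.sql("""
        CREATE TABLE IF NOT EXISTS claims_db.policies (
            policy_id STRING,
            premium   DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        CLUSTERED BY (policy_id) INTO 32 BUCKETS
        STORED AS ORC
    """)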
Solid understanding of the Hadoop Distributed File System (HDFS).
Hands-on experience with IDE tools such as Eclipse, PyCharm and Jupyter Notebook.
Key Achievements
Analyzed log files and conducted root cause analysis to diagnose and resolve 50+ issues as part of problem management.
Designed an automated table validation framework in Python, Spark, pandas and Hive to validate table loads against schedule and generate a daily e-mail report for the support team and SMEs.
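A minimal sketch of the validation idea behind that framework (the table list, addresses and SMTP host are illustrative assumptions, not the production setup):

    # Compare per-table row counts against a minimum and e-mail a daily report.
    import smtplib
    from email.message import EmailMessage
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    TABLES = ["claims_db.policies", "claims_db.payments"]  # illustrative

    lines = []
    for t in TABLES:
        n = spark.table(t).count()
        lines.append(f"{t}: {n} rows, {'OK' if n > 0 else 'EMPTY'}")

    msg = EmailMessage()
    msg["Subject"] = "Daily table load validation"
    msg["From"] = "etl@example.com"            # illustrative addresses
    msg["To"] = "support-team@example.com"
    msg.set_content("\n".join(lines))
    with smtplib.SMTP("mail.example.com") as s:
        s.send_message(msg)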
Selected among the top 5% of performers for extraordinary performance during the confirmation period.
KEY SKILLS
• Data Processing • Big Data Hadoop • Apache Spark Framework • Python Programming • AWS • Cloud Migration, Configuration & Testing • Client Relationship Management • Project Management • Quality Assurance • Research & Documentation • Team Management • Strategy
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, Hive, Apache Spark, HBase, AWS, Ambari, Hue
Tools and Platforms: Autosys, ServiceNow, TeamTrack, Jenkins, Bitbucket, PyCharm, Eclipse
Languages: Python (pandas), SQL
TRAINING & CERTIFICATIONS
Big Data and Hadoop Developer | Edureka | '17
Hyderabad, IN
INTERNSHIPS
Infosys Limited
Internship
Mysuru, IN
Jan '16 - May '16
Training: Big Data, Python and SQL
Project: Transport Management, with the objective of automating and speeding up the slow process of NOC (No Objection Certificate) generation at the RTO (Regional Transport Office).
EDUCATION
Galgotias University
B.Tech - Computer Science
Greater Noida, IN
Aug '12 - May '16
CGPA: 7.62 / 10
PROJECTS
PROJECT 1: Hadoop Ingestion
Client: American Insurance
Brief: The ELF framework is a customized, metadata-driven PySpark framework developed to extract and load data received through different ingestion patterns into HDFS.
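A minimal sketch of the metadata-driven idea the brief describes (the control-table layout, connection string and column names are assumptions, not the actual ELF internals):

    # Read per-source control rows from Oracle, then dispatch each source
    # to a generic loader. Names are illustrative.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    meta = (spark.read.format("jdbc")
            .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
            .option("dbtable", "ELF_CONTROL")
            .option("user", "etl_user")
            .option("password", "***")       # placeholder credential
            .load())

    for row in meta.collect():
        df = spark.read.format(row["SOURCE_FORMAT"]).load(row["SOURCE_PATH"])
        df.write.mode("append").save(row["TARGET_HDFS_PATH"])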
Overcame challenges of storing and processing data using the Hadoop framework and Apache PySpark.
Automated and scheduled Sqoop jobs using Python scripts.
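A minimal sketch of such a wrapper (the connection details and table names are illustrative assumptions):

    # Wrap a Sqoop import so a scheduler (e.g. Autosys) can call one
    # Python script per source table.
    import subprocess
    import sys

    def sqoop_import(table: str, target_dir: str) -> None:
        cmd = [
            "sqoop", "import",
            "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",
            "--username", "etl_user",
            "--password-file", "/user/etl/.password",
            "--table", table,
            "--target-dir", target_dir,
            "--num-mappers", "4",
        ]
        if subprocess.run(cmd).returncode != 0:
            sys.exit(f"Sqoop import failed for {table}")

    sqoop_import("POLICIES", "/data/landing/policies")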
Filtered out bad records based on requirements.
Performed validation at different levels and ingested data into Hive tables.
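A minimal sketch of the filter-validate-ingest step (the source path, column names and validity rule are illustrative assumptions):

    # Drop bad records, then append the remainder to an existing Hive table.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    raw = spark.read.option("header", True).csv("/data/landing/policies/")
    clean = raw.filter(
        F.col("policy_id").isNotNull() & (F.col("premium").cast("double") > 0)
    )
    # insertInto assumes the target table already exists with a matching schema.
    clean.write.mode("append").insertInto("claims_db.policies")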
Created replicas of Hive tables based on security requirements (different levels of Hive tables).
Ingested 70+ sources into HDFS using the ELF framework.
Environment: HDFS (for storage), Hive and Spark SQL (for transformation), Python, Oracle (for metadata storage), Sqoop (for data ingestion), Autosys (for job scheduling), Bitbucket (version control repository).
PROJECT 2: ADW Transformations
Client: American Insurance
Brief: Worked collaboratively with the clients and the onsite team to move data from the raw layer to the transformation layer, applying the necessary transformation operations and modifications. Responsibilities included design and development.
Wrote PySpark scripts to extract data from staging/raw tables.
Created the transformation layer from multiple source tables based on policy type.
Worked on the Hadoop framework to process data at multiple layers based on client requirements.
Deployed Apache Spark and Python scripts for data processing and Hive for data storage.
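A minimal sketch of a raw-to-transformation-layer step of this kind (table, column and layer names are illustrative assumptions):

    # Read a raw-layer Hive table, transform, and write the result to the
    # transformation layer partitioned by policy type.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    raw = spark.table("raw_db.policies")
    transformed = (raw
        .filter(F.col("policy_status") == "ACTIVE")
        .withColumn("premium_usd", F.col("premium") * F.col("fx_rate")))

    (transformed.write
        .mode("overwrite")
        .partitionBy("policy_type")
        .saveAsTable("adw_db.policies_transformed"))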
Environment: Spark, Hive, Python, HBase, HDFS (for storage).
Key Achievements
Designed solutions and code using the Hadoop framework to create the classic layer and the transformation layer, with a PIT table acting as the bridge.
Independently designed frameworks such as "DQ, Security and MD5 generation for struct as well as flat data types" using Python and Spark.