Hari Krishna
Senior Staff Engineer | Technical Lead
Email: -
Phone: -
Location: Bangalore, India
Professional Summary:
A highly motivated Technical Lead with 13+ years of experience in Chaos Engineering, Site
Reliability Engineering (SRE), Cloud Operations, Implementation, and Support. Proven
expertise in leading cross-functional teams to deliver mission-critical projects, optimizing
operational efficiency, and ensuring system reliability. Extensive experience working on
large-scale, high-impact projects for NPCI (National Payments Corporation of India), focusing
on resiliency engineering and ITIL processes. Adept at managing stakeholder relationships,
driving process improvements, and delivering solutions that align with business objectives.
Core Competencies:
• Project Management & Leadership: End-to-end project planning, risk management,
  and resource allocation.
• Site Reliability Engineering (SRE): Expertise in ensuring high availability, fault
  tolerance, and incident response.
• Service-Level Objectives (SLOs) and Indicators (SLIs):
  ▪ Define and measure SLOs and SLIs to track system health and performance.
  ▪ Ensure alignment with Service Level Agreements (SLAs) to meet business and
    customer expectations.
• Chaos Engineering & Resiliency: Specialized in identifying vulnerabilities and ensuring
  system robustness through chaos experiments.
• Monitoring Tools: Experience with Datadog, BMC, and LogicMonitor.
• Cloud & Infrastructure Management: Strong experience in AWS, Docker,
  Kubernetes, and other cloud platforms.
• DevOps & Automation: Extensive experience with CI/CD pipelines, automation tools,
  and improving deployment processes.
• ITIL Framework: Skilled in Incident, Problem, and Change Management, improving
  service reliability and reducing downtime.
• Stakeholder & Client Management: Proven track record of managing complex
  relationships with internal and external stakeholders.
Professional Experience:
Senior Staff Engineer | Team Lead (Chaos Engineering & SRE)
Nagarro – Bangalore, India | June 2022 – Present
• Led critical projects for NPCI, the umbrella organization for operating retail payments
  and settlement systems in India and an initiative of the Reserve Bank of India (RBI).
  The projects covered the AEPS, IMPS, RuPay, and UPI applications, ensuring end-to-end
  availability and high performance of mission-critical services.
• Supervised a team of Site Reliability Engineers (SREs), ensuring application stability,
  developing operational processes, and driving performance improvements.
• Incident & Problem Management:
  o Incident Handling: Handle on-call duties, troubleshoot incidents, and restore
    service during outages.
  o Post-Incident Reviews: Conduct postmortems to identify gaps and make
    systems more resilient.
  o Problem Management: Identify recurring issues and implement long-term
    solutions.
• Spearheaded the design and execution of chaos engineering experiments,
  simulating real-world failures to identify system vulnerabilities and enhance overall
  resilience.
• Chaos Engineering:
  o Resilience Testing: Conduct chaos experiments to simulate failures and
    identify weak points in the system.
  o Improve Fault Tolerance: Implement changes to improve system resilience
    based on findings from chaos tests.
• Collaborated with development teams to analyze system architectures and eliminate
  single points of failure, reducing incident recurrence by 30%.
• Own end-to-end availability and performance of mission-critical services, and
  contribute to system design and architecture.
• Analyze system architectures to identify single points of failure and other
  resiliency gaps.
• Develop software to automate chaos and resiliency test cases that simulate failures in
  a financial data processing system.
• Establish a process for defining steady-state hypotheses and simulating real-world
  events. Identify top errors and reliability issues, and drive root cause analysis to
  prevent repeat incidents.
• Execute targeted chaos failures, such as causing an outage in a specific service or
  component, to verify that recovery works as designed or to determine how best to
  mitigate the impact of major chaos scenarios.
• Perform extensive debugging and root cause analysis of failures, with a heavy emphasis
  on application and network analytics. Execute load scenarios that recreate real-world
  conditions using tools, small scripts, and automation.
• Achievements:
  o Reduced incident response times by 25% through proactive monitoring and
    enhanced automation.
  o Increased system uptime to 99.99%, delivering greater reliability for financial
    services.
  o Implemented a real-time error monitoring system, leading to a 40% reduction in
    critical failures.
Cloud Engineer | DevOps Specialist
Ellucian India – Bangalore, India | November 2018 – June 2022
• Led cloud operations for SaaS, PaaS, and IaaS solutions, ensuring high availability and
  optimizing infrastructure for cost efficiency.
• Owned the release management process, overseeing major, minor, and emergency
  releases, and ensuring seamless deployment.
• Introduced automation solutions for repetitive tasks, reducing manual effort and
  improving efficiency by 30%.
• Collaborated with R&D teams to implement automation tools and CI/CD pipelines,
  improving deployment times and reducing errors.
• Conducted incident postmortems, identifying key areas for improvement and ensuring
  continuous service improvement.
Cloud Support Engineer
Tekion India Private Limited – Bangalore, India | September 2017 – November 2018
Cloud Application Engineer
SDL Technologies – Bangalore, India | May 2015 – September 2017
Application Support Engineer
SLK Technologies – Bangalore, India | June 2014 – May 2015
Associate Technical Analyst
Aptean – Bangalore, India | January 2013 – June 2014
Implementation & Production Support Engineer
Wipro Infotech – Bangalore, India | August 2011 – January 2013
Roles & Responsibilities:
• Implemented the eHelpline tool globally for multiple clients.
• Resolved incidents and service requests on time without breaching SLAs.
• Achievements:
  o Successfully led a major cloud migration project, improving system
    performance and reducing costs by 20%.
  o Implemented cloud automation scripts, resulting in a 30% increase in
    operational efficiency.
Education:
Bachelor of Technology (B.Tech) in Electronics & Communication Engineering
Jawaharlal Nehru Technological University, Anantapur | May 2010
Certifications:
• Harness Chaos Engineering Practitioner – Harness/Litmus
• ITIL Foundation Certification
Technical Skills:
• Cloud Platforms: AWS
• DevOps Tools: Jenkins, Docker, Kubernetes, Ansible
• Monitoring & Logging: LogicMonitor, Datadog, Grafana, BMC Remedy
• Operating Systems: Windows, Linux
• Version Control: Git
Languages Known:
✓ English, Telugu, Hindi, Kannada
Leadership & Soft Skills:
• Team Leadership: Led cross-functional teams to deliver complex projects, managing
  resources and ensuring timely delivery.
• Communication: Excellent verbal and written communication skills, adept at liaising
  with stakeholders at all levels.
• Problem-Solving: Strong analytical skills, able to diagnose complex technical issues
  and deliver innovative solutions.
• Agile & Scrum Methodologies: Experience in managing projects using Agile, ensuring
  iterative progress and continuous feedback.