Hari Krishna | Freelancer Resume

Hari Krishna
Senior Staff Engineer | Technical Lead
Email:
Phone:
Location: Bangalore, India

Professional Summary:
A highly motivated Technical Lead with 13+ years of experience in Chaos Engineering, Site Reliability Engineering (SRE), Cloud Operations, Implementation, and Support. Proven expertise in leading cross-functional teams to deliver mission-critical projects, optimizing operational efficiency, and ensuring system reliability. Extensive experience on large-scale, high-impact projects for NPCI (National Payments Corporation of India), focusing on resiliency engineering and ITIL processes. Adept at managing stakeholder relationships, driving process improvements, and delivering solutions that align with business objectives.

Core Competencies:
• Project Management & Leadership: End-to-end project planning, risk management, and resource allocation.
• Site Reliability Engineering (SRE): Expertise in ensuring high availability, fault tolerance, and incident response.
• Service-Level Objectives (SLOs) and Indicators (SLIs):
  ▪ Define and measure SLOs and SLIs to track system health and performance.
  ▪ Ensure alignment with Service Level Agreements (SLAs) to meet business and customer expectations.
• Chaos Engineering & Resiliency: Specialized in identifying vulnerabilities and ensuring system robustness through chaos experiments.
• Monitoring: Experience with monitoring tools such as Datadog, BMC, and LogicMonitor.
• Cloud & Infrastructure Management: Strong experience with AWS and with container technologies such as Docker and Kubernetes.
• DevOps & Automation: Extensive experience with CI/CD pipelines, automation tools, and improving deployment processes.
• ITIL Framework: Skilled in Incident, Problem, and Change Management, improving service reliability and reducing downtime.
• Stakeholder & Client Management: Proven track record of managing complex relationships with internal and external stakeholders.

Professional Experience:
Senior Staff Engineer | Team Lead (Chaos Engineering & SRE)
Nagarro – Bangalore, India | June 2022 – Present
• Led critical projects for NPCI (National Payments Corporation of India), an initiative of the Reserve Bank of India (RBI) and the umbrella organization for operating retail payment and settlement systems in India, covering the AEPS, IMPS, RuPay, and UPI applications and ensuring end-to-end availability and high performance of mission-critical services.
• Supervised a team of Site Reliability Engineers (SREs), ensuring application stability, developing operational processes, and driving performance improvements.
• Incident & Problem Management:
  o Incident Handling: Handle on-call duties, troubleshoot incidents, and restore service during outages.
  o Post-Incident Reviews: Conduct postmortems to identify gaps and make systems more resilient.
  o Problem Management: Identify recurring issues and implement long-term solutions.
• Spearheaded the design and execution of chaos engineering experiments, simulating real-world failures to identify system vulnerabilities and enhance overall resilience.
• Chaos Engineering:
  o Resilience Testing: Conduct chaos experiments to simulate failures and identify weak points in the system.
  o Improve Fault Tolerance: Implement changes that improve system resilience based on findings from chaos tests.
• Collaborated with development teams to analyze system architectures and eliminate single points of failure, reducing incident recurrence by 30%.
• Own end-to-end availability and performance of mission-critical services, and contribute to the design and architecture of the system.
• Analyze system architectures to identify single points of failure and other potential resiliency gaps.
• Develop software to automate chaos and resiliency test cases that simulate failures in a financial data processing system.
• Establish a process for defining a steady-state hypothesis and simulating real-world events.
• Identify top errors and reliability issues, and drive root-cause analysis to prevent repeat incidents.
• Execute targeted chaos failures, such as causing an outage in a specific service or component, to verify that recovery works as designed or to determine how best to mitigate the impact of major chaos scenarios.
• Perform extensive debugging and root-cause analysis of failures, with a heavy emphasis on both application and network analytics.
• Execute load scenarios that recreate real-world conditions, using tools along with small scripts and automation.
• Achievements:
  o Reduced incident response times by 25% through proactive monitoring and enhanced automation.
  o Increased system uptime to 99.99%, delivering greater reliability for financial services.
  o Implemented a real-time error monitoring system, leading to a 40% reduction in critical failures.

Cloud Engineer | DevOps Specialist
Ellucian India – Bangalore, India | November 2018 – June 2022
• Led cloud operations for SaaS, PaaS, and IaaS solutions, ensuring high availability and optimizing infrastructure for cost efficiency.
• Owned the release management process, overseeing major, minor, and emergency releases and ensuring seamless deployment.
• Introduced automation solutions for repetitive tasks, reducing manual effort and improving efficiency by 30%.
• Collaborated with R&D teams to implement automation tools and CI/CD pipelines, improving deployment times and reducing errors.
• Conducted incident postmortems, identifying key areas for improvement and driving continuous service improvement.

Cloud Support Engineer
Tekion India Private Limited – Bangalore, India | 09/2017 – 11/2018

Cloud Application Engineer
SDL Technologies – Bangalore, India | 05/2015 – 09/2017

Application Support Engineer
SLK Technologies – Bangalore, India | 06/2014 – 05/2015

Associate Technical Analyst
Aptean – Bangalore, India | 01/2013 – 06/2014

Implementation & Production Support Engineer
WIPRO INFOTECH – Bangalore, India | 08/2011 – 01/2013
Roles & Responsibilities:
• Implemented the eHelpline tool globally for multiple clients.
• Resolved incidents and service requests on time without breaching any SLA.
• Achievements:
  o Successfully led a major cloud migration project, improving system performance and reducing costs by 20%.
  o Implemented cloud automation scripts, resulting in a 30% increase in operational efficiency.

Education:
Bachelor of Technology (B.Tech) in Electronics & Communication Engineering
Jawaharlal Nehru Technological University, Anantapur | May 2010

Certifications:
• Harness Chaos Engineering Practitioner (Harness/Litmus)
• ITIL Foundation Certification

Technical Skills:
• Cloud Platforms: AWS
• DevOps Tools: Jenkins, Docker, Kubernetes, Ansible
• Monitoring & Logging: LogicMonitor, Datadog, Grafana, BMC Remedy
• Operating Systems: Windows, Linux
• Version Control: Git

Languages Known:
✓ English, Telugu, Hindi, Kannada

Leadership & Soft Skills:
• Team Leadership: Led cross-functional teams to deliver complex projects, managing resources and ensuring timely delivery.
• Communication: Excellent verbal and written communication skills, adept at liaising with stakeholders at all levels.
• Problem-Solving: Strong analytical skills, able to diagnose complex technical issues and deliver innovative solutions.
• Agile & Scrum Methodologies: Experience in managing projects using Agile, ensuring iterative progress and continuous feedback.