As a Senior DevOps Engineer, you will play a pivotal role in deploying and automating
infrastructure, building secure analytics architectures, and enhancing DevOps practices. Your deep
expertise in GCP and AWS production environments, strong focus on automation, and collaborative
mindset will empower engineering teams and ensure our systems operate efficiently and reliably.
This is an exciting opportunity to work with cutting-edge technologies while contributing to high-
impact projects in a fast-paced, innovative environment.
What You Will Do
Ensure Infrastructure Uptime: Own and maintain the reliability and uptime of infrastructure
across GCP and AWS.
Production Support: Provide on-call support for production infrastructure and deliver
permanent solutions to recurring issues.
Collaborate Across Teams: Work closely with engineering teams to implement solutions for
performance, scalability, and security.
Enhance Deployment Processes: Troubleshoot and improve the deployment and release
processes for infrastructure on AWS, Kubernetes, and Jenkins.
Build Core Infrastructure: Design and maintain core infrastructure components to support
growth and scalability.
Real-Time Monitoring: Identify critical business metrics and implement monitoring, alerting,
trending, and dashboarding solutions.
Optimize Systems: Identify and address high-ROI opportunities for remediation, cost savings,
and margin improvement.
Qualifications
Cloud Expertise: Minimum of 5 years of experience with GCP or AWS production
environments.
Infrastructure as Code (IaC): Strong skills with tools like Terraform.
CI/CD Pipelines: Experience building and defining CI/CD using tools like GitLab, GitHub
Actions, or Buildkite.
Container Orchestration: Expertise in designing and managing containerized services using
technologies like Kubernetes.
Scripting Skills: Proficient in Python, TypeScript, or shell scripting.
Monitoring and Logging: Proficiency in tools such as Datadog and CloudWatch.
Automation-Focused: Demonstrated ability to automate processes to improve efficiency and
reliability.
SRE Principles: Strong understanding of SLA/SLO concepts and site reliability engineering
principles.
Linux Expertise: Hands-on experience working with Linux environments.
Security and Cost Awareness: Knowledge of infrastructure cost optimization and security
best practices.
Good to Have
Familiarity with other cloud platforms such as Aliyun or Azure.
Experience with serverless architectures (e.g., AWS Lambda).
Knowledge of configuration management tools like Ansible or Chef.
Advanced experience with logging and monitoring tools.
Background in network security and compliance standards.
AWS or GCP certification is a plus but not mandatory.