Senior DevOps Engineer (Full-time)

Mercans | HQ: Dubai, United Arab Emirates | Remote | Jun 10

The Senior DevOps Engineer is responsible for executing Mercans’ AI-native infrastructure strategy by building, automating, and operating secure, scalable private cloud platforms and DevSecOps pipelines. Reporting to the CTO, the role collaborates with Product, Engineering, Data Science, MLOps, and SRE teams to accelerate software delivery, support AI workloads, and ensure resilient, compliant operations. The position focuses on Kubernetes-based infrastructure, GitLab DevSecOps, CI/CD automation, observability, security, and AI platform enablement.

Platform Automation & CI/CD

  • Design, implement, and maintain GitLab CI/CD pipelines for payroll, HR, and AI services.
  • Support feature flagging, canary releases, blue-green deployments, and progressive delivery.
  • Integrate DevSecOps controls including SAST, DAST, dependency scanning, container scanning, IaC scanning, and secret detection.
  • Develop self-service deployment templates, reusable pipeline libraries, and environment provisioning capabilities.
  • Enable rapid and secure product releases while reducing manual deployment effort.

Private Cloud Operations

  • Automate infrastructure provisioning using Infrastructure as Code (Terraform, Ansible, Kubernetes manifests, or similar).
  • Manage and optimize Kubernetes clusters, storage platforms, HCI infrastructure, and GPU-enabled environments.
  • Support AI model training and inference workloads through efficient scheduling, autoscaling, and resource optimization.
  • Ensure platform scalability, performance, and operational efficiency.

Observability & Reliability

  • Implement logging, monitoring, tracing, and alerting solutions.
  • Contribute to SRE practices, service availability objectives, incident response, and resilience engineering.
  • Participate in chaos testing, postmortems, and continuous reliability improvements.
  • Support active-active and multi-datacenter deployment strategies.

Security & Compliance

  • Embed security policies into CI/CD pipelines and infrastructure workflows.
  • Implement RBAC, secrets management, vulnerability remediation, and compliance controls.
  • Collaborate with engineering, security, and governance teams to ensure secure software delivery.
  • Support compliance requirements in regulated and data-sensitive environments.

MLOps & Product Enablement

  • Integrate AI model deployment workflows with GitLab pipelines and model registries.
  • Standardize release processes for AI models and application features.
  • Collaborate with Product, Data Science, and MLOps teams to accelerate deployment velocity while maintaining governance controls.
  • Support feature management and experimentation frameworks.

Documentation & Knowledge Sharing

  • Develop runbooks, operational procedures, deployment standards, and platform documentation.
  • Contribute to internal Centers of Excellence for DevOps, SRE, and AI Engineering.
  • Deliver training sessions and technical knowledge-sharing activities.

Requirements

  • 4–6+ years of experience in DevOps, SRE, Platform Engineering, or similar roles.
  • Strong experience operating Kubernetes-based production environments.
  • Hands-on experience with Infrastructure as Code tools such as Terraform and Ansible.
  • Experience managing containerized AI/ML workloads and GPU-enabled infrastructure.
  • Proficiency in at least one programming language such as Python or Go.
  • Experience with CI/CD automation and GitLab DevSecOps.
  • Knowledge of observability platforms, incident management, SLIs/SLOs, and reliability engineering.
  • Experience with secure software delivery, secrets management, and compliance-focused environments.
  • Familiarity with GitOps workflows, deployment automation, and self-service platform engineering.
  • Strong communication, documentation, and collaboration skills.

Success Metrics

  • Achieve 98%+ CI/CD pipeline success rates and reduce deployment lead time to under two hours.
  • Reduce infrastructure costs by 15% while increasing average GPU and node utilization to 70% or higher.
  • Improve service availability toward 99.99%+ uptime and reduce critical incidents by 30%.
  • Enable deployment of AI models and product features within 24 hours for prioritized use cases.
  • Publish at least 10 operational runbooks and conduct 6+ technical knowledge-sharing sessions annually.
  • Support continuous improvement of DevOps, MLOps, security, and platform engineering practices.

Availability: Full-time (40 hrs/week)
Rate: Negotiable