SRE/Observability Engineer Roles

Confidential Employer

This job listing has expired

Job Overview

Location: Lagos, Lagos, Nigeria
Employment Type: Full-time
Work Arrangement: Hybrid
Sector: Information Technology & Software
Experience Level: Senior (5-8 years)

Job Description

We are actively seeking highly skilled SRE/Observability Engineers, at both Senior and Junior levels, to join our client's team. The role covers overseeing, deploying, and continuously improving enterprise monitoring systems across customer-facing and internal operational environments, with a strong focus on using the Elastic Stack and Dynatrace to deliver robust observability.

As an SRE/Observability Engineer, you will work closely with delivery teams, customers, and key stakeholders to ensure that all systems remain highly available, perform well, and are fully measurable. You will conduct thorough root cause analyses of incidents and implement preventive measures to keep them from recurring. You will also manage cloud infrastructure across AWS, Azure, and GCP to support scalable, resilient applications. You will join a friendly, experienced, and open-minded engineering team that follows agile methodologies and values collaboration.

Required Skills

Python, Java, Go, Ruby, AWS, Azure, GCP, Docker, Kubernetes, Jenkins, GitLab CI, CircleCI, Prometheus, Grafana, ELK Stack, OpenTelemetry, Networking, Security, Infrastructure Best Practices, Problem-solving, Communication, Collaboration, Microservices Architecture, Distributed Systems, MySQL, PostgreSQL, MongoDB, Agile Methodologies

Key Responsibilities

Oversee, deploy, and enhance enterprise monitoring across customer-facing and internal environments, with a strong focus on Elastic Stack and Dynatrace.
Coordinate effectively with other teams to ensure seamless operations and communication.
Work closely with delivery teams, customers, and stakeholders to ensure system availability, performance, and measurability.
Conduct root cause analysis of incidents and implement preventive measures to avoid recurrence.
Manage cloud infrastructure (AWS, Azure, GCP) to support scalable applications.

Qualifications

Bachelor's degree in Computer Science, Engineering, or a related field.
Minimum of 5 years of experience in software development, system administration, or DevOps.
Proficiency in programming languages such as Python, Java, Go, or Ruby.
Strong experience with cloud platforms (AWS, Azure, GCP).
Expertise in containerization and orchestration tools (Docker, Kubernetes).
Familiarity with CI/CD tools (Jenkins, GitLab CI, CircleCI).
Experience with monitoring and observability tools (Prometheus, Grafana, ELK stack).
Knowledge of OpenTelemetry for instrumentation and observability.
Solid understanding of networking, security, and infrastructure best practices.
Excellent problem-solving skills and ability to work under pressure.
Strong communication and collaboration skills.
Certifications in cloud technologies (e.g., AWS Certified Solutions Architect, Google Cloud Professional Engineer) (Good-to-have).
Experience with microservices architecture and distributed systems (Good-to-have).
Knowledge of database management systems (MySQL, PostgreSQL, MongoDB) (Good-to-have).
Understanding of agile development methodologies and practices (Good-to-have).

This pivotal role ensures the continuous availability, optimal performance, and comprehensive measurability of critical systems. You will be instrumental in deploying and enhancing enterprise monitoring solutions across both customer-facing and internal environments. Day-to-day tasks involve rigorous root cause analysis of incidents, implementing robust preventive measures, and managing cloud infrastructure to support scalable applications. Success in this position means maintaining highly reliable and performant systems, contributing to a stable and efficient operational landscape.