Site Reliability Engineer/Infrastructure

Confidential Employer

Job Overview

Location: Lagos, Lagos, Nigeria
Employment Type: Full-time
Work Arrangement: On-site
Sector: Information Technology & Software
Experience Level: Senior (5-8 years)
Application Deadline: June 28, 2026

Job Description

We are seeking an experienced Site Reliability Engineer to join our team and enhance the reliability of our Platform API.

In this role, you will collaborate closely with DevOps on infrastructure and observability initiatives. You will also partner with backend engineers to integrate reliability into our services from inception.

You will draw on deep knowledge of distributed systems and hands-on Go coding to define SLOs, lead incident response, build automation, and embed resilience patterns into our codebase.

Required Skills

Go, AWS, GCP, Infrastructure as Code, Observability, Prometheus, Grafana, OpenTelemetry, Distributed Systems, Microservices, Event-Driven Architectures, SLOs, Incident Response, Automation, Resilience Patterns

Key Responsibilities

Engineer the reliability of the Platform API.
Work with DevOps on infrastructure and observability.
Partner with backend engineers to build reliability into services.
Define SLOs, lead incident response, and build automation.
Embed resilience patterns directly into the codebase.

Qualifications

Minimum of 4 years of experience in SRE or Backend Engineering.
Proficiency in Go, including the ability to read, write, and review production Go code.
Deep understanding of distributed systems architecture and design patterns.
Strong command of microservices fundamentals, event-driven architectures, and scaling principles.
Hands-on experience with AWS (ECS, RDS, CloudWatch, Lambda) or GCP.
Proficiency in infrastructure as code.
Experience running production workloads and troubleshooting infrastructure issues.
Experience designing and implementing observability strategies using Prometheus, Grafana, OpenTelemetry, or similar tools.
Ability to instrument code for proper monitoring and alerting.

How to Apply

To apply for this role, click the Apply button on this page and follow the instructions.

The Nigerian tech landscape is growing, particularly in platform infrastructure. This role is central to the reliability of our Platform API, a critical component for scalability and performance. You will apply deep expertise in distributed systems, microservices, and event-driven architectures. Key technical areas include defining Service Level Objectives (SLOs), implementing observability strategies with tools like Prometheus and Grafana, and embedding resilience patterns directly into the codebase. By keeping core services highly available and performant, your work will help the business scale its operations.