Reliability Engineer
Job summary
We are looking for a proactive, hands-on Reliability Engineer to join our team. You will play a key role in keeping our core services stable, scalable, and efficient.
Job descriptions & requirements
Responsibilities:
- Closely monitor system health, performance, and availability using tools like Grafana, Prometheus, Datadog, or New Relic.
- Respond to and resolve incidents.
- Lead and document post-incident reviews to identify root causes and preventive actions.
- Write scripts (Python, Bash) and use configuration management tools to automate operational tasks, deployments, and recovery procedures.
- Build the internal platforms and tools that make reliability a default for every engineering team: self-healing systems, automated canary analysis, and performance tracing at scale.
- Work with software teams to define Service Level Objectives (SLOs) and Error Budgets.
- Implement improvements to reduce manual toil, improve system resilience, and prevent recurring issues.
- Manage and optimize cloud resources (AWS, Google Cloud, or Azure) to ensure cost-effectiveness and performance.
- Implement Infrastructure as Code (IaC) principles.
- Lead the design and implementation of chaos engineering practices, disaster recovery automation, and capacity planning.
Requirements:
- 3-5 years of experience in a DevOps, SRE, Linux System Administration, or Backend Engineering role.
- Proficiency in a scripting or programming language such as Python or Go.
- Solid experience with cloud platforms: Azure, Google Cloud, AWS, etc.
- Experience with containerization and orchestration (Docker, Kubernetes).
- Practical knowledge of monitoring/observability tools.
- Familiarity with CI/CD pipelines (GitLab CI, Jenkins, GitHub Actions).
Core Skills:
- Excellent problem-solving and troubleshooting skills under pressure.
- Strong understanding of network fundamentals (TCP/IP, DNS, HTTP/S).
- Knowledge of database performance and reliability (PostgreSQL, MySQL, MongoDB).
- A systematic approach to automation and a desire to eliminate manual work.
- Good communication skills to collaborate with both technical and non-technical teams.
- Understanding of security best practices in infrastructure.
Location: Remote
Remuneration: NGN 400,000 (Non-negotiable)