Tenth Revolution Group is working with a rapidly expanding tech scale-up that is seeking a Sydney-based Site Reliability Engineer (SRE) to join their dynamic team. This role is pivotal in ensuring the reliability, scalability, and performance of their AWS cloud infrastructure while driving automation and operational excellence.
About the Role:
As a Site Reliability Engineer, you will be responsible for enhancing system reliability, performance, and efficiency through automation, monitoring, and proactive incident response. You will work closely with development teams to optimise infrastructure and streamline operations for a high-availability platform.
Your primary responsibilities will include:
* Designing, building, and maintaining highly available and scalable cloud infrastructure, proactively identifying and resolving reliability risks.
* Developing and maintaining automation scripts and tools to improve efficiency, reduce manual intervention, and enforce best practices.
* Establishing and refining incident response processes, conducting root cause analysis, and implementing solutions to minimise downtime.
* Implementing and managing robust monitoring and alerting systems to provide real-time insights into system health and performance.
* Developing and maintaining dashboards, alerts, metrics, logs, traces, and automated monitoring tools such as Prometheus, Grafana, Datadog, New Relic, ELK Stack, or AWS CloudWatch.
* Ensuring security best practices are followed, managing access controls, and maintaining compliance with industry standards.
* Working cross-functionally with DevOps, software engineering, and security teams to refine processes, introduce innovative solutions, and improve system resilience.
About You:
* 4+ years of experience in Site Reliability Engineering, DevOps, or cloud infrastructure management
* Strong expertise in AWS services (ECS, RDS, Lambda, S3, CloudFormation)
* Hands-on experience with AWS networking (VPC, Load Balancers, Peering Links)
* Proficiency in Infrastructure as Code (Terraform, CloudFormation, or Ansible)
* Experience with CI/CD pipelines, monitoring tools (Prometheus, Grafana, Datadog), and scripting (Shell, Python, or Go)
* Strong problem-solving skills with a proactive approach to incident resolution and performance tuning
* Excellent communication and collaboration skills, with the ability to work in a fast-paced, dynamic environment
* AWS certifications (e.g., AWS Certified DevOps Engineer, Solutions Architect)
Why You'll Love This Role:
* Be part of a cutting-edge company driving innovation in a critical industry
* Play a key role in ensuring platform reliability, scalability, and security
* Work in a collaborative, DevOps-driven environment that values learning and growth
* Contribute to mission-driven projects that make a real-world impact
This position is interviewing now and is available for an immediate start.
*Please mention your visa status/working rights in your resume.*
Olivia Beyer
(03) 9088 3704
o.beyer@tenthrevolution.com
