Site Reliability Engineering Manager

AS Watson

New Territories

Description

Come and join a Winning Team

Be remarkable. Be yourself.

Why Should You Join Us?

At ASW, we believe in our people, in teamwork and in the importance of your personal growth. If you are looking for the opportunity to join our award-winning international family with over 17,000 stores across 31 markets in Asia and Europe, the ASW family welcomes you.

You can enjoy:

  • Convenient office location, less than a 5-minute walk from the MTR
  • Free round-trip lunchtime shuttle bus services to Shatin
  • Comprehensive medical and life insurance coverage for you, your spouse and your children
  • Well-equipped Gym inside our office building
  • Onsite Clinic and Lactation Room

Why This Role Matters:

This role is pivotal in shaping highly resilient, scalable systems that power seamless customer experiences across our O+O (Offline + Online) ecosystem. By leveraging observability and automation, you will help ensure systems are reliable, insights are proactive, and incidents are resolved swiftly, ultimately putting a smile on customers’ faces.

Working at the intersection of engineering and operations, you will drive a customer-focused, data-driven approach to reliability, enabling the business to innovate faster while maintaining operational excellence.

What You’ll Be Doing:

  • Design and own end-to-end observability frameworks (metrics, logs, traces) to deliver deep system visibility and insights
  • Architect highly resilient, scalable systems to support a seamless O+O business model
  • Champion observability best practices and embed monitoring into infrastructure and application layers
  • Develop automation solutions to reduce manual effort, improve reliability, and optimise resource utilisation
  • Leverage data, AI and anomaly detection to proactively identify and resolve system issues
  • Collaborate with cross-functional teams and technical partners to evaluate tools, run PoCs, and implement new capabilities
  • Lead, coach and mentor team members, fostering a strong culture of collaboration, innovation and continuous improvement
  • Drive a scientific, data-driven approach to system performance analysis and decision-making

What You’ll Bring:

  • Strong hands-on experience across SRE, DevOps, or system architecture environments with successful project delivery
  • Practical expertise in observability tools (e.g. Dynatrace, Prometheus, Grafana, ELK Stack)
  • Experience with automation frameworks (e.g. Ansible, Jenkins) and scripting/programming for tooling and process automation
  • Solid understanding of AI/ML-driven observability, including predictive analytics and anomaly detection
  • Strong problem-solving mindset with a focus on data-driven decision making
  • Ability to communicate clearly and effectively with both technical and non-technical stakeholders
  • Collaborative approach with a strong sense of ownership and continuous improvement
  • Proficiency in spoken and written Cantonese and English

How You Will Make An Impact:

  • Improve system reliability and reduce downtime through proactive monitoring and automation
  • Accelerate incident detection and resolution, enhancing customer experience across digital and physical touchpoints
  • Enable smarter, faster engineering decisions through real-time insights and analytics
  • Strengthen cross-team collaboration and drive consistent observability practices across the organisation
  • Increase operational efficiency by reducing manual work and optimising system performance
  • Build a future-ready, intelligent infrastructure that supports innovation at scale

What is holding you back?

Don’t miss out on this great chance to shape your life!

Apply now!

We are an equal opportunity employer and welcome applications from all qualified candidates. The information provided will be treated in strict confidence and used only to consider your application for relevant or similar posts within the AS Watson Group.

Applicants who do not hear from us within 6 weeks of the date of advertisement may consider their applications unsuccessful. All personal data of unsuccessful applicants will be destroyed within 12 months of the date of application.