Singapore
Enterprise outages are rarely caused by complexity. More often, they are the result of weak governance and poor change discipline.

Over the past 20+ years, I’ve led and governed 24x7 IT operations across telecommunications, government, healthcare, and global enterprise environments. From transforming large-scale service desk functions, to leading enterprise Command Centres, to establishing secure Data Centre operations within classified infrastructure programs, my focus has remained consistent: engineer resilience through structure, scrutiny, and disciplined execution.

My recent work centers on:
• Major Incident governance and executive war-room leadership
• Serving as CAB approver to mitigate operational risk and prevent avoidable disruption
• Monitoring modernization and structured automation initiatives
• Governing offshore managed service delivery with measurable SLA accountability
• Designing resilient, scalable 24x7 operating models

Operational stability is not accidental; it is a leadership decision. As organizations modernize and scale, governance maturity becomes a competitive advantage. Strong monitoring integrity and disciplined change control are foundational to sustained reliability. I remain committed to building accountable, high-performing service organizations that protect mission-critical environments at enterprise scale.

#ITOperations #ITSM #ServiceGovernance #OperationalResilience #Leadership
Lead enterprise 24x7 Monitoring, Response & Recovery operations within a structured service governance framework supporting mission-critical healthcare systems.
• Govern service delivery performance, ensuring SLA adherence and disciplined recovery execution
• Strengthen monitoring integrity to eliminate detection blind spots and reduce preventable disruptions
• Drive automation initiatives to standardize repeatable recovery workflows and reduce manual dependency
• Leverage monitoring analytics and operational data to identify recurring failure patterns and implement preventive control improvements
• Standardize and enhance Monitoring & Recovery SOP frameworks, improving response consistency and contributing to measurable reductions in avoidable incidents
• Advance alert optimization and exception governance to improve accountability and cost awareness
• Reinforce high-severity escalation discipline across internal and technology stakeholders
As the leader of Command Centre operations, I oversaw 24x7 operations, managing major incidents and root cause analysis. This involved coordinating incident processes, chairing calls, and keeping stakeholders updated to maintain high customer satisfaction. I prioritized major incident resolution and root cause analysis for critical applications, and emphasized team skill development to improve support and prevent future incidents. In problem management, I identified and resolved recurring incidents through root cause analysis, implemented preventive solutions, and evaluated their effectiveness in minimizing future occurrences. I ensured end-to-end 24x7 monitoring of critical applications, supported by SOPs. Team and vendor management were crucial: I ensured proper staffing and training for incident resolution and problem management. Overseeing the Change Advisory Board, I assessed, reviewed, and approved changes, minimizing risks and disruptions to the IT environment and aligning changes with organizational objectives. I was accountable for KPIs and continuous service improvement. I developed incident reduction strategies aimed at reducing both the number of major incidents and their resolution time, and I prioritized the continuous assessment, planning, and execution of service improvements through data analysis and change implementation. Throughout, collaboration with various teams kept service operations within SLAs, in pursuit of operational excellence, incident reduction, and continuous service quality enhancement.
In this organization, I manage 24x7 infrastructure, command centre, and data centre operations. I also lead and oversee a team of Network Operation Centre (NOC) and Cyber Security Operation Centre (SOC) Specialists who monitor all IT infrastructure and network devices for availability and performance. This includes all servers, storage, LAN, WAN, critical appliances, applications, and backup jobs. I collaborate with the various towers on fine-tuning monitoring parameters and thresholds, perform tech refresh and life-cycle management, and participate in Event and Impact Analysis. Monitor and observe system events, alerts, and any out-of-the-ordinary system behaviour. Follow incident isolation procedures to identify and verify faults before applying problem resolution scripts to fix the issue or escalating. Manage incident handling by validating, evaluating, planning, defining, and implementing solutions to minimize impact and ensure optimal uptime of critical systems and services across distributed environments. Perform post-change system health checks to verify that changes did not cause any undesirable behaviour. Manage the life cycle of incident tickets from creation through update and escalation to closure. Review and verify the accuracy of operational scripts. Enforce the IT security framework and management processes, which include security policies, procedures, standards, and guidelines, as well as the administration and maintenance of security monitoring and incident response. Perform risk assessments and propose mitigating measures. Review security processes and procedures so that they are kept up to date with actual practices. Lead incident response actions during security incidents. ... Please contact me so that I can provide the full details.
As part of the Global Infrastructure Operation organization, I lead a team of Systems/Application and NOC Monitoring Specialists who monitor all global IT infrastructure and network devices for availability and performance. This includes all servers, blades, storage, LAN, WAN, VoIP, business-critical applications, batch jobs, and backup jobs. I work with the Tools teams on fine-tuning monitoring parameters and thresholds, perform life-cycle management of deployed tools, participate in Event and Impact Analysis, and identify opportunities for improvement and automation. Technical competency is critical in this position, as I provide technical guidance to the Specialists and also perform Level 1 and Level 2 support within the IOC. The most critical responsibilities of this position are identifying system and application abnormalities, faults, and performance degradation, isolating problems, escalating properly, and following incidents through to resolution.
Supervise, monitor, and maintain a large enterprise network serving more than 80 government agencies. Manage various capability teams, including the Wintel Operations, Security Operations, Back-up and Storage Operations, and Messaging and Collaboration Operations teams, which serve as the focal point of monitoring for network troubleshooting, software distribution and updating, router and domain name management, performance monitoring, and coordination with affiliated networks.
- Ensure that alerts are monitored and acknowledged by each capability team, and that incident tickets are logged on time with accurate data. Manage incidents by evaluating, planning, defining, and implementing innovative solutions to minimise impact, enhance quality, and ensure optimal uptime of critical systems and services across distributed environments.
- Manage critical incidents in a timely manner, ensuring that issues are escalated across the production environment and that potential risk and impact to the Key Production Environments (KPEs) are identified, and interface with global and other teams to ensure resolution meets the established Service Level Agreements (SLAs). Initiate RToP (Ready to Operate)/Situation bridges with management to discuss the resolution action plan and timeline.
- Prepare and distribute Operations Communications Advisories to affected agencies via email and SMS, so that stakeholders are periodically notified of High Severity incidents or Scheduled Maintenance that could lead to the unavailability of a service.
- Assist and support agencies with the Lotus-to-Exchange 2007 migration, WAN and LAN migrations, and DR exercise implementation timelines. Coordinate and support the agencies' “technology refresh” program.
- Ensure daily Operation Progress Reports are delivered on time.
- Assist in change management activities. Evaluate and ensure user readiness, and track and report issues.
- Perform periodic operational audit checks.