Hillsboro, Oregon, United States
A proven tactical and strategic leader in site reliability, quality assurance, continuous integration, technical operations, risk and incident management, and SaaS (Software as a Service) solutions. A dynamic and motivated technical leader, thought leader and coach with over 20 years of experience in the ever-evolving field of technology and SaaS for hosted and cloud applications.
• Designed, developed and maintained site reliability, resilience and quality systems, tools and processes.
• Oversaw global operations services for event and incident response for a 24/7 Operations Center, using a follow-the-sun model across geo-distributed teams in the US, EMEA, India and Manila.
• Designed, implemented and oversaw the Incident Response, Incident Management, After Action Review and Problem Management processes. Outcomes included reduced time to detection, reduced time to repair, a higher first-time resolution rate and fewer escalations.
• Designed a career ladder, individual career plans and annual evaluations. Hired, trained and mentored less experienced co-workers.
• Created and defined several roles in the software development cycle to shorten time to delivery and improve the quality of builds delivered to QA, overseeing the continuous integration process (daily builds) as well as continuous deployment, maintenance and the everyday run state.
• Has excellent analytical and design skills combined with a very strong ability to communicate and present ideas. Highly creative and well organized. Created financial analyses and prepared budgets.
• Embraces a challenge, copes well with ambiguity, and is not afraid of failure.
As Manager III at AWS Premium Support, I lead a specialized team of network engineers focused on resolving complex connectivity, performance, and scalability challenges across AWS's global infrastructure. My role encompasses building and developing high-performing technical teams that provide expert-level support for enterprise customers' network architectures, while establishing operational excellence mechanisms and driving continuous improvement initiatives. I am responsible for creating and maintaining a customer-obsessed culture that balances tactical support excellence with strategic technical leadership, ensuring our team delivers consistently high-quality solutions while meeting aggressive SLAs and maintaining exemplary customer satisfaction metrics. Key responsibilities include developing and executing strategic initiatives to enhance our network support capabilities, mentoring engineers in complex problem-solving methodologies, and driving cross-team collaboration to improve service delivery. I establish and monitor key performance indicators, implement process improvements, and lead technical deep-dives for critical customer escalations. Additionally, I oversee the development of technical documentation, automation solutions, and best practices that enable scalable support operations. My role requires maintaining strong partnerships with AWS service teams to influence product improvements based on customer feedback, while ensuring my team stays current with evolving network technologies and AWS services. I also focus on talent development, creating growth opportunities for team members, and building a diverse, inclusive team culture that promotes innovation and technical excellence.
Responsible for overseeing the development, deployment, and maintenance of the core technical systems and infrastructure that power the company's sports content production, delivery, and operations. This includes managing a team of software engineers, systems administrators, and other IT professionals to ensure the reliability, scalability, and security of the underlying applications, databases, servers, and network components. A key focus is driving technology strategy and roadmaps to enable new features, improve operational efficiency, and adapt to the evolving needs of the sports content business. This involves collaborating cross-functionally with other department leaders, monitoring system performance, troubleshooting issues, and implementing preventive measures to minimize disruptions. Also responsible for budgeting, resource planning, and vendor management for the technology stack supporting the sports division.
Responsible for overseeing the day-to-day operations and maintenance of the critical technical infrastructure and systems that power the company's various products and services. This involves managing a team of systems administrators, site reliability engineers, and other technical specialists to ensure the stability, scalability, and performance of the underlying computing resources, networks, databases, and software applications. A key focus of this role is proactively monitoring system health, identifying and resolving issues, and implementing preventive measures to minimize downtime and disruptions. This requires close collaboration with cross-functional teams, including software developers, product managers, and customer support, to understand business priorities and align technology solutions accordingly. Additionally, I am responsible for capacity planning, change management, incident response, and continuous improvement initiatives to enhance the overall operational efficiency and resilience of Amazon's technical ecosystem.
Responsible for leading a team of highly skilled site reliability engineers (SREs) focused on ensuring the availability, scalability, and performance of the company's critical production systems and services. This involves developing and implementing robust operational practices, tooling, and automation to proactively detect, mitigate, and prevent issues before they impact customers. A key focus of this role is instilling reliability engineering principles across the organization, such as distributed system design, fault tolerance, observability, and incident response. This requires collaborating closely with software engineering teams to embed reliability best practices into the development lifecycle, as well as working with product managers and other stakeholders to align reliability targets with business objectives. Additionally, I am responsible for leading continuous improvement initiatives, driving technical strategy, and mentoring the SRE team to further enhance the resilience and scalability of the company's infrastructure and applications.
Responsible for overseeing the end-to-end incident management and risk mitigation processes for the company's critical technical infrastructure and systems. This involves leading a team of incident response coordinators, site reliability engineers, and risk analysts to ensure rapid identification, escalation, and resolution of issues that could impact service availability, data integrity, or security. A key focus of this role is developing and continually refining the incident management playbooks, runbooks, and risk mitigation strategies to enable a highly coordinated and effective response to both anticipated and unforeseen incidents. This requires close collaboration with cross-functional teams, including software engineering, security, and business continuity, to understand potential failure modes, threat vectors, and recovery capabilities. Additionally, I am responsible for maintaining comprehensive risk registers, driving data-driven risk assessments, and implementing proactive measures to enhance the overall resilience and recoverability of the company's technology stack.
Responsible for overseeing the operational health and performance of the company's critical business applications and the underlying technical infrastructure that supports them. This involves managing a team of systems administrators, site reliability engineers, and application support specialists to ensure the seamless delivery, scalability, and availability of these mission-critical systems. A key focus of this role is establishing and continuously improving the operational processes, monitoring tools, and incident response procedures to proactively detect and resolve issues before they impact end users. This requires close collaboration with application development teams, product owners, and other stakeholders to align operational objectives with business priorities. Additionally, I am responsible for capacity planning, change management, release coordination, and driving continuous improvement initiatives to enhance the overall reliability, efficiency, and agility of the application delivery lifecycle.
High-level integration engineer assigned to integrate complex production software, with increased responsibility for independent planning and implementation. Gained progressive recognition as a specialist. Assumed increased responsibility in the Continuous Integration process, providing technical and integration strategies to the group. Participated in the analysis and conceptual design for integrating and integration testing of complex projects, drawing on professional experience to develop plans, specifications and alternative solutions.