Michael Nguyen

IT Consultant at CSUF | SRE/DevOps | AWS Certified SAA

Irvine, California, United States

About

I have over 10 years of technical experience, ranging from building Drupal/CiviCRM websites for clients to system/network administration and working in a NOC/DevOps environment. I have always been passionate about technology, and I continue to learn new things and find ways to challenge myself. My goal is to find DevOps/SRE-minded roles.

Specialties and Interests: Linux/Windows Administration, Terraform, SCM, Monitoring, Kubernetes/Docker, Virtualization, Scripting/Automation, AWS, Cloud Computing, CDN Management, Networking, ITIL/DevOps Methodologies

Experience

IT Consultant II - Research Computing at California State University, Fullerton
Sep 2025 - Present · 10 mos
Owner/Operator at Self-employed
Dec 2023 - Present · 2 yrs 7 mos
• While seeking my next full-time role, I have been running an online e-commerce shop as a side hustle for additional income
• Sourcing vendors from China and the US to produce unique products to be sold online
• Acting as my own product manager: reviewing proofs and creating samples for approval before mass production
• Sourcing various artists to help make unique designs
• Using BigCartel as the e-commerce platform and Porkbun.com for the domain name
• Using Fera.ai for customer reviews and automated discount codes
• Using Mailchimp to send newsletters to subscribed customers
• Using payment processors such as PayPal and Stripe
• Using Adobe Photoshop and Canva to create templates for social media posts
• Selling products on other platforms such as Etsy, eBay, and Mercari
• Running ads on Instagram/Facebook to attract new followers/customers
• Processed 1000+ orders online
• Over 3,000 followers on Instagram
BlackBerry (Hybrid)
- Site Reliability Engineering Specialist II
  Dec 2022 - Feb 2024 · 1 yr 3 mos
  • Migrated an internal Java service from BlackBerry's on-prem environment to AWS Staging and Prod environments. Chef 12 ran on the instance, and the service interacted with a PostgreSQL database and other services. Deployed the EC2 instance using Terragrunt, pushed the cookbook, data bags, and configs to S3, and deployed an AWS RDS database.
  • Kept all of our Terraform/Terragrunt infrastructure, configuration, and other code in GitLab, along with our Chef cookbooks, data bags, and env JSON files. Used SOPS to encrypt/decrypt secrets. Ran GitLab pipelines to deploy the latest release for one of the services we supported. Ran git rebase master so that our branches picked up the latest changes, and fixed any merge conflicts.
  • Helped manage and support many K8s clusters using the K9s tool. Ran various kubectl commands and API calls for tasks such as rolling restarts, deployments, and reviewing logs when troubleshooting. Upgraded EKS clusters using a script and Terragrunt, and performed failovers as necessary.
  • Used mainly Prometheus/Grafana, AWS CloudWatch, and Alertmanager for monitoring and alerting. These were deployed either within K8s clusters or, when monitoring EC2 instances, on AWS Fargate. Used Helm to redeploy our alert YAML files and upgrade the Prometheus Helm chart. Used AWS CloudTrail audit logs to assist with troubleshooting.
  • Supported AWS OpenSearch by applying updates, adding disk space, and running queries to help troubleshoot. Helped create Kafka topics for our devs in AWS MSK and performed updates.
  • Used Jira for ticketing and Confluence for wikis, runbooks, and operational procedures. For every production change, created a change ticket and notified the NOC and other teams as necessary.
  • Served in a 24/7 on-call rotation. When an issue affected customers, the NOC called out and created a bridge and an Incident ticket for us to investigate and resolve within SLA timeframes.
- DevOps NOC Engineer II
  Feb 2021 - Dec 2022 · 1 yr 11 mos
  • Played an active role in the POC and migration of all of our Cylance monitoring from CloudWisdom/Metricly to open-source Prometheus and Grafana, saving the company over $350k a year.
  • Deployed the stack as IaC using Terragrunt on ECS Fargate, per region per AWS account. The repo was stored in Bitbucket, config and YAML files were pushed to and stored in S3, and metric data was stored in EFS.
  • Used various exporters to scrape metrics from CloudWatch, cAdvisor, RabbitMQ, SQL, etc. Puppet installed Node Exporter on all of our EC2 instances (Windows/Linux) to collect hardware-related metrics such as CPU, memory, and disk.
  • Used Alertmanager to manage and send out alerts; owners were notified via email, MS Teams, and OpsGenie. Critical alerts were also linked to our 24/7 Global NOC, which escalated them and followed incident management procedures.
  • Used Grafana for data visualization and dashboards.
  • Performed cert updates on the nginx server side and the client side for Prometheus.
- NOC Engineer
  Sep 2019 - Feb 2021 · 1 yr 6 mos
  • Used CloudWisdom for performance monitoring and cost optimization, and AlertSite for synthetic monitoring that records and mimics customer actions. These monitors were tied to OpsGenie, which paged the appropriate on-call rotation, whether DevOps or one of our many other engineering teams.
  • Ensured NOC notifications were accurate and sent out within the published SLA. Created outage and RCA tickets in Jira.
  • Assisted in creating/updating operational runbooks in Confluence.
  • Tasked with taking over the Cloud Custodian tagging project. Cloud Custodian enforces tags on our AWS resources: if a resource is not tagged properly, an email is sent to its owner warning that the resource will be stopped/deleted within 7 days. ECS/Fargate tasks were deployed with Terraform. When a PR was merged into master in Bitbucket, a Jenkins job built the Docker image and pushed it to ECR in our centralized AWS account for the task definitions to pull from.
  • Set up automated Windows patching on our SQL Witness servers by creating PowerShell scripts that ran on Task Scheduler. The NOC received email notifications of the logs via the SendGrid API.
  • Helped set up AWS CloudWatch alerts to invoke SNS topics, and created Lambda functions with Python and KMS to post messages in our various Slack channels.
  • Automated the creation of Jira issues from AWS EC2 health events for instances going into retirement, using CloudWatch rules, Lambda with Python, and Jira's REST API.
  • Automated the daily refresh of the monitors on our NOC wall screens using a Python script with the Selenium chromedriver. The script opened each monitoring site, authenticated, and automatically resized and moved each window to the correct NOC TV screen.
  • Working knowledge of various AWS services: EC2, ELB, ASG, Athena, CloudTrail, CloudWatch, Lambda, S3, KMS, SNS, SQS, IAM roles, etc.
  • Our infrastructure was mainly in AWS, and we monitored 5000+ nodes, mostly Linux-based.
Network Systems Administrator at Ladera Lending
Aug 2019 - Sep 2019 · 2 mos
• Migrated Cisco Meraki switches, MX firewall, WAPs, desktops, and VoIP phones (Cisco 303/Grandstream GXP2135) from the old office to the new office in Lake Forest.
• Troubleshot, diagnosed, and resolved all network-related issues, including LAN, WAN, wireless network, and QoS, with Cisco Meraki, COX, and YTEL (phone vendor) support.
• Created a brand-new SharePoint company intranet with Office 365.
• Helped with imaging laptops and desktops using WDS.
• Added printers to the network and shared them through Active Directory.
• Helped with racking servers and re-cabling/organizing the server room.
• Managed IT asset inventory with the open-source software Snipe-IT.
• Provided onsite and remote helpdesk support using Meraki Remote Control, RDP, and RemotePC.
• Created an IT maintenance workflow, responded to IT outages, and created reports.
• Helped with IT-related onboarding/offboarding procedures using Active Directory and Microsoft Office 365.
• Monitored/updated firewalls, switches, Synology NAS DS918+, and internet security.
• Created, updated, and maintained documentation of network scope, findings, changes, solutions, and SOPs.
NOC Engineer at iHerb, LLC
Sep 2017 - Mar 2019 · 1 yr 7 mos
• Assisted cross-functional technical teams with website-related projects, custom monitoring, and performance analysis for web services and infrastructure resources.
• Experience with Content Delivery Network (CDN) management using Cloudflare for failing over Web/API traffic, as well as using rate limiting to mitigate attacks and block potential threats.
• Experience with various AWS services such as VPC, EC2, ELB, AMI, CloudWatch, CloudTrail, Elasticsearch, and Auto Scaling configuration. Also working knowledge of the Aliyun and GCP cloud providers.
• Troubleshot issues by analyzing logs in Windows/Linux/Kubernetes environments and by running MSSQL queries.
• Familiarity with the ELK stack: used Kibana to analyze logs for issues and set up alerts based on Kibana REST API queries provided by Dev teams.
• Managed and maintained monitoring of corporate IT infra (servers, containers, storage, applications, network).
• Experience using the VMware vSphere client to help manage servers and assist with drive expansions.
• Experience working in Kubernetes environments (AWS/Aliyun/GCP) to help analyze/resolve issues, as well as deploying Twistlock for security, Datadog for monitoring, and Fluentd to collect logs.
• Helped deploy .apk files on Linux image servers for the China Mobile Android iHerb app.
• Experience with Windows PowerShell/Linux Bash scripting to clean up logs, restart services, delete pods, etc.
• Ran Ansible playbooks to automate the restart of critical services across multiple servers, ensuring HA and minimizing downtime.
• Used the KACE ticket system, Jira/TFS for project management, and Confluence to document illustrative SOPs.
• Used monitoring software such as PRTG, Datadog, Site24x7, Dynatrace, and AppDynamics.
• The NOC operated 24/7/365, served as a technical resource hub for other departments, and adhered to ITIL change and incident management processes.