Post by Michael Leveille, PhD

Failure Analysis & Reliability Engineer | I design targeted stress tests and multi-modal characterization to isolate mechanisms fast and drive corrective actions that reduce risk.

Calibration failures are silent schedule killers.

At Sandia National Laboratories, I worked on reliability studies where the testing pipeline depended on a few critical instruments. One day a DMA (dynamic mechanical analyzer) refused to calibrate. Field service was involved. They were stumped.

The problem wasn’t just “the tool is down.” The real risk was what happens next:

• Test throughput stalls
• Confidence in existing data drops
• Program timelines start slipping quietly

So I treated it like a failure analysis problem.

First step: define what “failure” actually meant. What specifically changed in the calibration behavior compared to normal? From there, I isolated likely causes through basic checks and controlled steps instead of random adjustments.

Once a fix was implemented, the important part wasn’t just getting the instrument running again. It was confirming stable calibration performance and documenting what changed so the issue wouldn’t repeat.

The DMA returned to service and testing continued without losing the thread of the program.

My takeaway: tool ownership is reliability work. If the test system isn’t reliable, the conclusions aren’t reliable either.

What’s your go-to move when a critical tool fails mid-program?
