Stress Characterization

5/12/2022

When hardware or software architectural components are stressed to their limits under carefully controlled test conditions, we can quantify and visualize their breaking points. One result of such testing is a set of characterization canopies (another is the causality graphs of failures). These canopies, together with the system's operating point within a parameter space, can be used to calculate the available engineering margins empirically.
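To make the margin idea concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that one edge of a canopy has been reduced to a list of (load, breaking point) pairs measured under test, and that the margin is the distance between the interpolated breaking point and the current operating point. The function names, the data layout, and the numbers are hypothetical and are not taken from the actual experiments.

```python
from bisect import bisect_left


def breaking_point(canopy, load):
    """Linearly interpolate the breaking point at a given load.

    canopy is a hypothetical list of (load, breaking_point) pairs,
    sorted by load, as measured during stress testing.
    """
    loads = [point[0] for point in canopy]
    if not loads[0] <= load <= loads[-1]:
        # Do not extrapolate beyond what was actually characterized.
        raise ValueError("load is outside the characterized range")
    i = bisect_left(loads, load)
    if loads[i] == load:
        return canopy[i][1]
    (x0, y0), (x1, y1) = canopy[i - 1], canopy[i]
    return y0 + (y1 - y0) * (load - x0) / (x1 - x0)


def margin(canopy, load, stress):
    """Remaining headroom: breaking point minus the current stress level."""
    return breaking_point(canopy, load) - stress


# Illustrative numbers only (arbitrary units).
example_canopy = [(0.0, 100.0), (0.5, 80.0), (1.0, 40.0)]
print(margin(example_canopy, load=0.75, stress=50.0))  # 10.0
```

A negative margin in this sketch would mean the operating point already lies beyond the measured breaking point. Refusing to extrapolate outside the characterized range is a deliberate choice here, since the canopy says nothing about regions that were never tested.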

Continuing with our structural engineering analogy, this is similar to varying the application's environmental parameters, such as temperature and humidity, discovering what application stress levels (and/or vibrations) induce a failure, and estimating margins.

Shown is an example of a characterization canopy for a processor component: the last-level cache (LLC) in a multi-core processor. The figure shows observed core utilization (Z axis) as each core is loaded (X axis), together with its sensitivity to a third parameter (Y axis), in this case the application read-write mix. This 3D graph has been colored to visualize the safe operating regions within this slice of the parameter space. For this graph, the other parameters (including the memory demand and nominal processor load) are fixed at particular values, but they are varied in the overall experiment. Many such slices can be created from data obtained from millions of experiments.
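The slicing step described above can be sketched as follows. This is a rough illustration in Python, assuming each experiment is stored as a flat record of parameter values plus the observed result. The field names (cores_loaded, read_fraction, memory_demand, core_utilization) are hypothetical, and averaging repeated runs at the same grid point is an assumption made for the sketch, not necessarily how the original data was reduced.

```python
def canopy_slice(records, fixed, x_key, y_key, z_key):
    """Build one 2D slice of the parameter space.

    Keeps only the records whose parameters match every entry in `fixed`,
    then returns {(x, y): mean z} over the remaining runs.
    """
    sums = {}
    for rec in records:
        if all(rec[key] == value for key, value in fixed.items()):
            point = (rec[x_key], rec[y_key])
            total, count = sums.get(point, (0.0, 0))
            sums[point] = (total + rec[z_key], count + 1)
    return {point: total / count for point, (total, count) in sums.items()}


# Hypothetical experiment records; real data would hold millions of rows.
experiments = [
    {"cores_loaded": 1, "read_fraction": 0.5, "memory_demand": "low", "core_utilization": 0.2},
    {"cores_loaded": 1, "read_fraction": 0.5, "memory_demand": "low", "core_utilization": 0.4},
    {"cores_loaded": 2, "read_fraction": 0.5, "memory_demand": "low", "core_utilization": 0.6},
    {"cores_loaded": 2, "read_fraction": 0.5, "memory_demand": "high", "core_utilization": 0.9},
]

low_demand_slice = canopy_slice(
    experiments,
    fixed={"memory_demand": "low"},
    x_key="cores_loaded",
    y_key="read_fraction",
    z_key="core_utilization",
)
```

Each key of the resulting dictionary is an (X, Y) grid point and each value is the Z height of the canopy at that point. Changing the `fixed` dictionary selects a different slice from the same set of experiments.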

Assuming ‘correct’ application logic (a topic for another day), these tests allow a careful study of stress saturation in the system operating environment, either during system development or during events such as component replacement or cyber-attacks that may overwhelm resources.