How do you know when a chip is running correctly? Any idea when or why it might fail?
Many variables determine how well a chip is operating; it doesn’t simply come down to the results of some computation or how well the chip is controlling a motor right now. Artificial intelligence and machine learning (AI/ML) are being used to check the health of the motor, but what about the health of the chip? That requires a different set of sensors. This is where proteanTecs comes into play, with on-chip support that delivers status information for AI analysis (Fig. 1).
I talked with Noam Brousard, Vice President Solutions Engineering at proteanTecs, about the company’s on-chip telemetry solution.
The Challenge of Adding Chip Telemetry
Developers often use JTAG for debugging, and event-tracking features like Arm’s Embedded Trace Macrocell (ETM) can provide information about the computations being performed. However, these typically lack details such as chip temperature and other measurements that would help gauge overall chip health. Arm’s Performance Monitoring Unit (PMU) is also focused on the computational units, which are only a part of the system.
proteanTecs’ support works in a similar fashion: it’s included as additional IP, called Agents, within the chip design, but it targets different details that can provide insight into the health of the chip (Fig. 2). This information is pulled in a fashion similar to JTAG, ETM, or PMU data, so it can be uploaded to the cloud for analysis. AI models can be trained on this data to detect defects as well as potential or future problems.
Over a dozen standard Agents can be added to a chip design. These are tied together with proteanTecs’ network and support logic that delivers captured data for analysis.
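To make the analysis step more concrete, here is a minimal sketch of how captured readings might be screened for anomalies once they leave the chip. It is not proteanTecs’ API or its models: the Reading type, the agent names, the sample values, and the find_outliers function are all hypothetical, and a simple statistical threshold stands in for the trained AI models described above.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Reading:
    agent_id: str  # hypothetical identifier for an on-chip Agent
    value: float   # hypothetical measurement, e.g. a timing margin


def find_outliers(baseline, new_readings, threshold=3.0):
    """Return readings more than `threshold` standard deviations
    away from the mean of the baseline values."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A flat baseline: any deviation at all counts as an outlier.
        return [r for r in new_readings if r.value != mu]
    return [r for r in new_readings if abs(r.value - mu) / sigma > threshold]


# Made-up baseline values collected while the chip was known to be healthy.
baseline = [50.0, 51.0, 49.0, 50.5, 49.5, 50.0, 50.2, 49.8]

# Made-up new readings from two hypothetical Agents.
incoming = [
    Reading("margin_agent_0", 50.1),
    Reading("margin_agent_1", 42.0),
]

flagged = find_outliers(baseline, incoming)
for reading in flagged:
    print(f"{reading.agent_id}: {reading.value} is outside the expected range")
```

With these made-up numbers, only the second reading is flagged. A production system would likely replace the fixed threshold with models trained on data from many chips, but the flow is the same: collect readings, compare them against expected behavior, and report deviations.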
