4. Monitoring Mode
4.1. Monitoring Items
| Monitoring category | Anomaly | Anomaly log message |
|---|---|---|
| Power monitoring | Power limit exceeded | Device (xxx) power exceeds limit. |
| | Failed to get device power | Device (xxx) get power failed. |
| Over-temperature monitoring | Over-temperature | Device (xxx) vr temperature exceeds limit.<br>Device (xxx) chip temperature exceeds limit.<br>Device (xxx) board temperature exceeds limit. |
| | CTF | Device (xxx) chip temperature fault.<br>Device (xxx) board temperature fault. |
| | Failed to get device temperature | Device (xxx) get temperature failed. |
| PCC monitoring | PCC occurred | Device (xxx) warning Pcc<br>Device (xxx) critical Pcc<br>warning: the pcc counter share per unit time does not exceed 3%.<br>critical: the pcc counter share per unit time exceeds 3%. |
| | Counter operation failed | Device (xxx) set counter failed: Pcc<br>Device (xxx) get counter failed: Pcc |
| | Counter cleared unexpectedly | Device (xxx) cleared counter unexpectedly: Pcc |
| Power brake monitoring | Power brake occurred | Device (xxx) warning Pwrbrk<br>Device (xxx) critical Pwrbrk<br>warning: the pwrbrk counter share per unit time does not exceed 3%.<br>critical: the pwrbrk counter share per unit time exceeds 3%. |
| | Counter operation failed | Device (xxx) set counter failed: Pwrbrk<br>Device (xxx) get counter failed: Pwrbrk |
| | Counter cleared unexpectedly | Device (xxx) cleared counter unexpectedly: Pwrbrk |
| DI/DT monitoring | DI/DT occurred | Device (xxx) warning Didt<br>Device (xxx) critical Didt<br>warning: the didt counter share per unit time does not exceed 3%.<br>critical: the didt counter share per unit time exceeds 3%. |
| | Counter operation failed | Device (xxx) set counter failed: Didt<br>Device (xxx) get counter failed: Didt |
| | Counter cleared unexpectedly | Device (xxx) cleared counter unexpectedly: Didt |
| Power state (deepsleep) monitoring | Power state anomaly | Device (xxx) critical power state error. |
| | Failed to get device clocks | Device (xxx) get clocks failed. |
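When post-processing the logs, it can help to map each message back to its monitoring category. The following is a minimal Python sketch, not part of mx-diagease. It assumes a log line contains the message text exactly as listed above, possibly preceded by other text such as a timestamp. The category labels, the function names `parse_log_line` and `counter_severity`, and the way the counter share is computed (active samples divided by total samples) are all assumptions made for illustration; the tool's own calculation is not documented here.

```python
import re

# Matches "Device (<id>) <message>" anywhere in a line (assumed layout).
LINE_RE = re.compile(r"Device \((?P<device>[^)]+)\)\s+(?P<message>.+?)\s*$")

def classify(message):
    """Map a message from the table to a category label (labels are illustrative)."""
    if message in ("power exceeds limit", "get power failed"):
        return "power"
    if "temperature" in message:
        return "temperature"
    if message in ("critical power state error", "get clocks failed"):
        return "power_state"
    for name in ("Pcc", "Pwrbrk", "Didt"):
        if message.endswith(name):
            return name.lower()
    return "unknown"

def parse_log_line(line):
    """Return (device, category, message) or None if the line has no device message."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    message = match.group("message").rstrip(".")
    return match.group("device"), classify(message), message

def counter_severity(active, total):
    """Apply the 3% rule: None if no events, 'warning' up to 3%, else 'critical'.

    'active' and 'total' are hypothetical sample counts for one time window.
    """
    if total <= 0 or active <= 0:
        return None
    return "critical" if active / total > 0.03 else "warning"
```

For example, `parse_log_line("Device (3) warning Didt")` yields the device id `"3"` with the category `"didt"`, and `counter_severity(4, 100)` falls on the critical side of the 3% boundary.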
4.2. Monitoring Command
mx-diagease -m -t <time>
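Running this command requires sudo privileges.
-t specifies the monitoring duration and accepts either [seconds] or [hh:mm:ss]. If it is omitted, monitoring runs until you press Ctrl+C to exit mx-diagease, at which point the summary is displayed.
The command continuously monitors the board's power module, counter data, and related metrics. Any anomaly is printed in real time as it occurs. Logs are written to the mxdiag-log folder under the directory from which mx-diagease was run.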
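A wrapper script may want to validate the -t value before launching the tool. The sketch below is one possible way to do that in Python, under stated assumptions: the helper names `parse_duration` and `build_command` are hypothetical, and the stricter checks (minutes and seconds below 60, no negative values) are this sketch's own choices, since the validation mx-diagease itself performs is not documented here. Only the `-m` and `-t` flags and the sudo requirement come from the text above.

```python
def parse_duration(value):
    """Convert a '[seconds]' or '[hh:mm:ss]' string to a number of seconds."""
    parts = value.strip().split(":")
    if not all(p.isdigit() for p in parts):
        raise ValueError(f"not a valid duration: {value!r}")
    if len(parts) == 1:
        return int(parts[0])
    if len(parts) == 3:
        hours, minutes, seconds = (int(p) for p in parts)
        # Assumption of this sketch: minutes and seconds stay below 60.
        if minutes >= 60 or seconds >= 60:
            raise ValueError(f"minutes/seconds out of range: {value!r}")
        return hours * 3600 + minutes * 60 + seconds
    raise ValueError(f"expected [seconds] or [hh:mm:ss]: {value!r}")

def build_command(duration=None):
    """Build the argument list for the monitoring command (not executed here)."""
    command = ["sudo", "mx-diagease", "-m"]
    if duration is not None:
        parse_duration(duration)  # raises ValueError on malformed input
        command += ["-t", duration]
    return command
```

Passing the resulting list to a process launcher avoids shell quoting issues. Calling `build_command()` with no argument corresponds to the open-ended mode that ends with Ctrl+C.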
Output
If the monitoring result is healthy, the following is displayed after exit:
    MetaX Diagnostic tool
    Version: X.X.XX
    Product : C500
    Kmd version : X.X.X
    Bios version : X.XX.X.X
    Maca version : X.XX.X.X
    ^C
    ------------------ Result -----------------
    Device xxx
    Device xxx is healthy
Anomalies are printed in real time as they occur. After exit, a summary is displayed, for example:
    MetaX Diagnostic tool
    Version: X.X.XX
    Product : C500
    Kmd version : X.X.X
    Bios version : X.XX.X.X
    Maca version : X.XX.X.X
    ------------------ Result -----------------
    Device 0
    Device 0 is healthy
    Device 1
    WARNING, power exceeds limit
    CAUTION, warning Didt
    CAUTION, get temperature info failed
    Device 2
    CRITICAL, critical power state error
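If the summary needs to be consumed by a script, it could be turned into a per-device structure. The following Python sketch is illustrative only. It assumes the summary is printed one entry per line: a `Result` separator line, then a `Device <id>` line for each device, followed by either a `Device <id> is healthy` line or one `LEVEL, message` line per finding. The function name `parse_summary` is hypothetical, and the real layout may differ between tool versions.

```python
import re

DEVICE_RE = re.compile(r"^Device (\S+)$")                       # "Device 1"
FINDING_RE = re.compile(r"^(?P<level>[A-Z]+), (?P<message>.+)$")  # "WARNING, ..."

def parse_summary(text):
    """Return {device_id: [(level, message), ...]}; an empty list means healthy."""
    results = {}
    current = None
    in_result = False
    for raw in text.splitlines():
        line = raw.strip()
        if not in_result:
            # Skip the version banner until the "---- Result ----" separator.
            in_result = line.startswith("-") and "Result" in line
            continue
        device = DEVICE_RE.match(line)
        if device:
            current = device.group(1)
            results[current] = []
            continue
        finding = FINDING_RE.match(line)
        if finding and current is not None:
            results[current].append((finding["level"], finding["message"]))
    return results
```

Applied to the second example above, this sketch would report no findings for device 0, three findings for device 1, and one CRITICAL finding for device 2.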