This question was originally posted on DCIM Support by Nigel Fanning on 2020-01-29
I am not that experienced in setting thresholds in DCE, something I need to address. However I have an interesting question posed recently by a combination of a petulant PDU and an inquisitive data centre manager (not to be confused).
A PDU recently stopped providing data for power and temperature. This is nothing new; however, the PDU did not go offline. It was rebooted manually without any trouble and has been giving us normal service since.
I would like to know, though, whether a threshold can be written to detect 'flatlining'; in other words, the PDU is online but streaming exactly the same temperature and power values for a given period. The trace below shows the culprit. I should mention that the red trace is a different PDU in the same rack.
This answer was originally posted on DCIM Support by spezialist on 2020-01-30
Dear Nigel Fanning,
Unfortunately, what you want cannot be implemented using thresholds in DCE software.
If I understand you correctly, all other sensors of this problematic PDU (everything except power and temperature) displayed normal readings. Is that right? If so, I can only recommend that you look for other ways to solve this problem. Perhaps you should simply replace the controller of the problem PDU? Or look for another way to make it clear in real time that this PDU has started working incorrectly and needs to be rebooted automatically.
If you have any more questions, please ask.
This comment was originally posted on DCIM Support by Nigel Fanning on 2020-01-30
Thanks, but all the sensors (all sockets and environmentals) showed the same flatline readings until the reboot. Unfortunately there are no controllers for these units, only the units themselves.
With so many PDUs to monitor, it would be better to automate the monitoring using DCE and not have to do it manually.
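As a rough illustration of what automated flatline detection could look like outside DCE thresholds, here is a minimal Python sketch. It assumes the readings have already been collected as (timestamp, value) pairs by some other means (for example an export or a separate poller). It does not use any DCE API, and the names `is_flatlined`, `pdu_flatlined`, `window` and `min_samples` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

Reading = Tuple[datetime, float]  # (timestamp, value); assumed input format


def is_flatlined(readings: Iterable[Reading],
                 window: timedelta,
                 min_samples: int = 3) -> bool:
    """True if every reading in the most recent `window` has exactly the same value."""
    ordered = sorted(readings)
    if not ordered:
        return False
    cutoff = ordered[-1][0] - window
    # Require history that spans the whole window, otherwise we cannot judge.
    if ordered[0][0] > cutoff:
        return False
    recent = [value for ts, value in ordered if ts >= cutoff]
    if len(recent) < min_samples:
        return False
    return all(value == recent[0] for value in recent)


def pdu_flatlined(sensors: Dict[str, Iterable[Reading]],
                  window: timedelta) -> bool:
    """True only if ALL sensors of one PDU are flat over the window."""
    return bool(sensors) and all(
        is_flatlined(readings, window) for readings in sensors.values()
    )


if __name__ == "__main__":
    start = datetime(2020, 1, 29)
    stuck = [(start + timedelta(minutes=5 * i), 21.5) for i in range(25)]
    print(pdu_flatlined({"temp": stuck, "power": stuck}, timedelta(hours=1)))
```

A single sensor can legitimately sit at one value for a while, so requiring every sensor on the PDU to be flat at once (as happened here) should reduce false alarms. The window length is a judgment call that depends on how noisy the readings normally are.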
This comment was originally posted on DCIM Support by spezialist on 2020-01-30
Dear Nigel Fanning,
However, I can assure you that DCE software is not suitable for solving this problem.
May I ask you the vendor/model/firmware of your PDU(s)?
Also, is it a problem with just one PDU or many PDUs (or all)?