This question was originally posted on DCIM Support by Thomas Price on 2018-09-12
So we observed something very odd. At multiple sites on our network, in different states, our head units (NetBotz 450) are alerting that they have been unplugged or have malfunctioned. All the devices reply to ping but cannot be reached from either the web interface or Advanced View (4.5.3). A cold reboot resolved this. The question remains, however: what might cause this? Has anyone ever seen this across an entire organization? I have seen various network scans trigger alerts on UPS management for things like SNMP, but this is a first.
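For anyone trying to confirm the same symptom (device answers ping but the management interface is dead), a small TCP check can separate "host is up" from "service is accepting connections". This is only a sketch: the function name `tcp_port_open`, the address, and the ports 80 and 443 are assumptions for illustration, not values taken from the NetBotz documentation.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs the full TCP handshake, unlike ICMP ping
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


if __name__ == "__main__":
    appliance = "192.0.2.10"  # hypothetical address (documentation range)
    for port in (80, 443):  # assumed web interface ports
        state = "open" if tcp_port_open(appliance, port) else "not reachable"
        print(f"{appliance}:{port} {state}")
```

A unit that still replies to ping while this returns False on its management ports would match the hung state described above.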
(CID:134030850)
This answer was originally posted on DCIM Support by Thomas Price on 2018-10-15
So I think I nailed down the cause. A review of the logs shows this error occurring around NTP time sync errors. The NetBotz head units seem to try to reboot after many failed attempts to time sync. It turns out our recently installed time server appliance is having issues, causing the NTP time sync errors and, in turn, causing the head units to reboot, in some cases to the point of needing a cold boot. While a cold boot hasn't been needed recently, the reboots (unplugged state) still occur from time to time. All seem to be linked to our time server issues, which affect all systems at all sites.
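If you suspect the same thing, one way to sanity-check a time server independently of the appliances is to send it a basic SNTP request and look at the reply. The sketch below uses only the Python standard library; the function names and the host name `timeserver.example.com` are made up for illustration, and the "healthy" rule (server mode, leap indicator not 3, stratum between 1 and 15) is a simple heuristic based on the NTP header fields, not how the NetBotz firmware decides a sync has failed.

```python
import socket
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_sntp_request() -> bytes:
    # First byte 0x1B: leap = 0, version = 3, mode = 3 (client); rest is zero
    return b"\x1b" + 47 * b"\x00"


def parse_sntp_response(packet: bytes) -> dict:
    """Decode the header fields and transmit timestamp of an NTP reply."""
    if len(packet) < 48:
        raise ValueError("NTP packet shorter than 48 bytes")
    leap = packet[0] >> 6      # 3 means the server clock is unsynchronized
    mode = packet[0] & 0x07    # 4 means a server reply
    stratum = packet[1]        # 0 or 16 and above are not usable
    tx_seconds, tx_fraction = struct.unpack("!II", packet[40:48])
    unix_time = tx_seconds - NTP_EPOCH_OFFSET + tx_fraction / 2**32
    healthy = mode == 4 and leap != 3 and 1 <= stratum <= 15
    return {"leap": leap, "mode": mode, "stratum": stratum,
            "unix_time": unix_time, "healthy": healthy}


def query_time_server(host: str, timeout: float = 2.0) -> dict:
    """Send one SNTP request to host on UDP port 123 and parse the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_sntp_request(), (host, 123))
        packet, _ = sock.recvfrom(512)
    return parse_sntp_response(packet)


if __name__ == "__main__":
    try:
        print(query_time_server("timeserver.example.com"))  # hypothetical host
    except OSError as exc:
        # socket.timeout and DNS failures are both OSError subclasses
        print("no usable NTP reply:", exc)
```

Running this on a schedule from a separate machine, and comparing the failures with the times the head units report as unplugged, would help confirm or rule out the correlation.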
(CID:134684080)
This comment was originally posted on DCIM Support by spezialist on 2018-09-12
Dear Thomas Price,
Maybe this is because of power supply problems?
How are all your NetBotz-450 connected to the power supply on all sites?
With respect.
(CID:134030991)
This answer was originally posted on DCIM Support by Steven Marchetti on 2018-09-13
Hi Thomas,
As spezialist mentioned, power issues can cause such things.
In addition to normal power issues, a surge on a network line can also cause networked devices to have issues from freezing to rebooting to actual damage.
The only other option I can think of is if you also have StruxureWare DCE and someone tried to push some configuration to all devices. I've never seen that cause what you're talking about, but other than network-related issues as you've suggested, I'm not sure what else could be common among all of the appliances. I don't know of any specific scan that would cause this either.
Unless you have some of the devices pushing their logs to a syslog server, the appliance log that might show us something is wiped clean when the device is rebooted.
Steve
(CID:134031505)
This comment was originally posted on DCIM Support by Thomas Price on 2018-09-13
Steve,
Thanks for posting a reply. I don't think power was the issue, since it occurred at different sites in different states. The only other time I saw something of this nature was when a vendor was doing network pen testing that caused the iLO ports on our HP servers to go down. I did make enquiries to that point but came up blank. I am now in the process of updating the devices to ver. 4.6.2 (latest). Fingers crossed that we don't have a repeat.
Thanks again, Tom
(CID:134031671)
This comment was originally posted on DCIM Support by Thomas Price on 2018-09-13
Hi,
We run all our data room equipment on APC UPS power with backup generators. We had no alerts from the UPSs that were out of the ordinary.
Thanks for your comment.
(CID:134031672)
This comment was originally posted on DCIM Support by spezialist on 2018-10-07
Dear Thomas Price,
Tell us, please, did you solve your problem or not?
With respect.
(CID:134680026)