This question was originally posted on DCIM Support by Garry Priestland on 2016-05-06
We have a DCO 750 server on which we run monthly energy usage reports on behalf of a customer. This month I noticed that the figure seemed wrong - way too low.
I changed the period to be last week and got a report saying no energy was used at all for the last week. Eventually managed to narrow it down to the 8th April when DCO stops reporting energy usage in the energy report. All the data for a daily report on 7th April seems correct, all the data for a daily report for 8th April is zero. I can find no evidence of any event that occurred on or around those dates.
I also noted that some of the racks are missing entirely from last week's report - they don't even appear on the report with zero energy usage. The customer removed some racks in April, so these are showing lost comms in DCE at the moment, but these removed racks DO still appear in the report.
Here is what I have tried so far.
We also have another DCO / DCE server combination for the same customer, but on a different network, and this is behaving normally as expected.
Anybody got any ideas what the problem is?
This comment was originally posted on DCIM Support by Jef Faridi on 2016-05-06
Hi Garry, It is hard to say what might be the problem, but I would gladly study your data if it could be shared with me (via a shared box). In addition to your excellent description, the following data would be really appreciated:
1. a copy of the latest DCO backup file
2. DCO server logs
3. screen captures illustrating the issue
I will send you an invite to my =S= box shortly, in case you would kindly share the data with me, thanks. Kind regards
This comment was originally posted on DCIM Support by Jef Faridi on 2016-05-09
Hi Garry, Many thanks for sharing the data with me, I will study the files and will get back to you asap, thanks. Kind regards
This answer was originally posted on DCIM Support by Jef Faridi on 2016-05-11
Many thanks for providing the data - it seems there have been no power measurements since 2016-04-08.
The logs contain error messages such as "Unable to determine device power measurements", "Connection failed for external system: . DCE ." and "Read timed out".
I'm not sure what might have stopped the readings, but you might want to try to reschedule the readings from DCE and increase the timeout settings.
Here is how to reschedule the readings from external system (DCE):
Go to System Setup > External System Configuration > and Edit the server "StruxureWare Data Center Expert" > Next > and then increase the "Timeout" settings to higher values (currently 5 seconds, seems to be very low).
You might want to increase the values to 60 seconds or even higher.
After rescheduling/increasing the timeouts, you might want to let the server (DCO) run for at least 1 hr or so. Then, if you wish, please collect a new set of server logs and share them with me, so we can verify whether there are readings, thanks.
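To compare a new set of logs with the old one, it can help to tally how often the notifications quoted in this answer appear. The following is only a minimal sketch: the function name `count_error_phrases` is hypothetical, and it assumes the logs are plain text with one entry per line, which may not match the actual DCO log layout.

```python
from collections import Counter

# Phrases quoted in this thread. Matching is a plain substring test,
# so the exact log line layout does not matter (assumption: one entry per line).
ERROR_PHRASES = (
    "Unable to determine device power measurements",
    "Connection failed for external system",
    "Read timed out",
)


def count_error_phrases(lines, phrases=ERROR_PHRASES):
    """Count how many lines contain each phrase (case-insensitive)."""
    counts = Counter({phrase: 0 for phrase in phrases})
    for line in lines:
        lowered = line.lower()
        for phrase in phrases:
            if phrase.lower() in lowered:
                counts[phrase] += 1
    return counts
```

Passing an open log file (or any iterable of lines) to `count_error_phrases` before and after a timeout change gives a rough indication of whether the change made any difference.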
This comment was originally posted on DCIM Support by Garry Priestland on 2016-05-13
Hi Jef. I omitted to mention that the time-outs were not originally 5 seconds. I had been playing with this setting to try to see if it was a comms issue between DCO and DCE. They were originally 120 seconds but have been as high as 900 seconds. It seemed to make no difference to the data. I have changed them back to 300 seconds and will wait to see what happens over the weekend, then send you the logs again. By the way - reducing to 5 seconds did cause the occasional error about DCO not being able to collect data from DCE. However, as this was only an occasional error (maybe a couple of times in 8 hours), I presumed this not to be the problem. Regards
This comment was originally posted on DCIM Support by Jef Faridi on 2016-05-20
Hi Garry, Many thanks for the additional info (that's puzzling) - I was wondering how it has gone since the latest rescheduling, and whether there have been any readings since last week? Otherwise please share a new set of server logs with me (same box location as last time), thanks. Kind regards
This comment was originally posted on DCIM Support by Garry Priestland on 2016-05-24
Hi Jef. Still the same issues I am afraid - all racks in all rooms are reporting zero energy usage. The new logs and backup are uploading to you now...
This comment was originally posted on DCIM Support by Jef Faridi on 2016-05-26
Hi Garry, many thanks for sharing the files - I will look into this and will get back to you asap, thanks. Kind regards
This comment was originally posted on DCIM Support by Jef Faridi on 2016-05-31
Hi Garry, Unfortunately I'm still seeing the same errors in the log files: error messages complaining about 'unable to determine device power measurements' and read timeouts. That is usually an indication of network-related issues, and/or perhaps the DCE server is very slow to respond (I'm only guessing). I'm not sure if increasing the time-outs to 1200 s could help; it might also be worth (if possible) rebooting the DCE server to see if that helps. Kind regards
This comment was originally posted on DCIM Support by Garry Priestland on 2016-05-31
Thanks Jef. How does DCO communicate with DCE and vice versa? The connection test from DCO to DCE (from within DCO) always passes instantly, and recently some new equipment was discovered on DCE, which appeared within minutes on DCO, so it seems strange that there should be time-outs in the logs. I can't shut down the DCE server without some prior planning with the customer but will get it scheduled anyway.
This comment was originally posted on DCIM Support by Jef Faridi on 2016-06-06
Hi Garry, There are basically three different jobs/services running to get the readings from the external system/DCE, namely: power measurement retrieval, device group retrieval, and alarm retrieval. The "power measurement retrieval" is a more costly and more complex process (compared to the other two) - for some reason it seems DCO is not receiving the power measurements (from DCE).
As a test, you might want to try to set device info and alarm synchronization as high as possible, e.g.:
Update device information every: 9999 seconds
Full alarm synchronization every: 9999 seconds
and then check the eth0 network bandwidth on DCO to see if data is really being transferred between DCE and DCO.
I would also suggest checking the "date" on both DCO and DCE, to make sure that the servers are talking the same language, especially concerning the power measurements (in the worst case one of the servers might be running in the future compared to the other one). Kind regards
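One rough way to run the eth0 check suggested above is to sample the interface byte counters twice and divide by the elapsed time. This is a minimal sketch, not a DCO tool: it assumes shell access to a Linux host that exposes `/proc/net/dev`, and the function names `parse_net_dev` and `sample_rates` are hypothetical.

```python
import time


def parse_net_dev(text, iface="eth0"):
    """Return (rx_bytes, tx_bytes) for iface from /proc/net/dev-style text."""
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        # Header lines contain no colon and are skipped.
        if sep and name.strip() == iface:
            fields = rest.split()
            # Field 0 is received bytes, field 8 is transmitted bytes.
            return int(fields[0]), int(fields[8])
    raise ValueError(f"interface {iface!r} not found")


def sample_rates(iface="eth0", interval=5.0, path="/proc/net/dev"):
    """Return (rx, tx) in bytes per second averaged over interval seconds."""
    with open(path) as f:
        rx1, tx1 = parse_net_dev(f.read(), iface)
    time.sleep(interval)
    with open(path) as f:
        rx2, tx2 = parse_net_dev(f.read(), iface)
    return (rx2 - rx1) / interval, (tx2 - tx1) / interval
```

Calling `sample_rates("eth0", 60)` repeatedly over a polling cycle should show whether traffic rises when a retrieval job is expected to run.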
This comment was originally posted on DCIM Support by Garry Priestland on 2016-06-07
Hi Jef, the clocks on the 2 servers are synced to UTC on the host VM hardware and are within a second or so of each other. I am beginning to wonder if data is being transferred too. There seems to be a peak of traffic every 10 minutes but the collection interval is 5 minutes (300s). I would have expected a peak every 5 mins from this. Full alarm retrieval is set for 3600 seconds. I have made the adjustments you suggest and will check the bandwidth usage graphs on the server again tomorrow. One thing I have noticed in DCE is that the free swap memory is very low <1%. Not sure if it is relevant as the swap memory free was around 40% on the date we stopped getting measurements. I still have not yet managed to schedule a downtime. PS. is there a prize for the longest thread here?
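The mismatch between the observed traffic peaks and the configured collection interval can be quantified from the peak timestamps read off the bandwidth graph. The sketch below is illustrative only: the function names are hypothetical, and the peak times would have to be entered by hand, in seconds.

```python
from statistics import median


def median_peak_spacing(peak_times_s):
    """Return the median gap, in seconds, between consecutive peaks."""
    ordered = sorted(peak_times_s)
    if len(ordered) < 2:
        raise ValueError("need at least two peak timestamps")
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return median(gaps)


def intervals_per_peak(peak_times_s, collection_interval_s):
    """Return how many configured collection intervals elapse per observed peak."""
    return median_peak_spacing(peak_times_s) / collection_interval_s
```

A result close to 1 would mean every scheduled collection produces traffic; a result close to 2, as with peaks every 10 minutes against a 300 second interval, would suggest that only every other cycle transfers data or that each run overruns its slot.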
This comment was originally posted on DCIM Support by Garry Priestland on 2016-06-28
Hi Jef. The data is transferring, I believe. I changed the transfer rate to 9999 seconds as you suggested on 7th June and today changed it back to 3600 seconds. You can see the peaks correspond to these times on the attached graphs. With the settings at 9999 seconds we do now seem to be getting power data into DCO. The values are no longer zero in the energy reports, but I have not yet had a chance to cross-reference them against data from DCE to see if they are correct. I will check later whether the data is still being transferred with the settings at 3600 seconds. Any ideas what the issue is? Regards
This comment was originally posted on DCIM Support by Jef Faridi on 2016-06-30
Hi Garry, Yeah, it is strange that it is apparently taking so long to get the power readings (from DCE). If I could have your server logs, I can take a look and see if there is anything helpful. As a wild guess, perhaps the DCE server itself is overloaded, or maybe it needs to be rebooted (sometimes rebooting the server helps, especially after network interruptions/cut-offs). By the way, it is also recommended to upgrade the DCO server to at least 4 CPUs (I think it is currently running with 2 CPUs). Kind regards