The customer is having issues with a DCE server reporting to DCO: requests for certain sensors' data are timing out.
He explains in detail:
"We have started getting error messages in the server.log file on DCO about retrieving power data from the DCE server. We are using Modbus devices (ABB CMS700) with around 1500 sensors in total, and DCE appears to read these without any issues. DCO generates Java errors showing SocketTimeoutException, even though we have increased the timeout to 7200 sec (2 h). We test-ran the DCE API command getPeakDataForSensors against one device with more than 40 sensors, and it took only 2.5 sec (period: 1 day). Based on this test, retrieving all devices and sensors should not take very long.
We copied the production data to the test server, where we removed the associations to the Modbus devices one by one, but this didn’t help. DCO still generated SocketTimeout error messages even when every device was unassociated… perhaps DCO was still looking up devices for the historical power trend. If there are invalid associations or something else wrong in the room modelled in DCO, it seems we cannot find the root cause ourselves. Can you help us solve the issue? When DCO doesn’t get new measurements, we are afraid that all racks will move to the ‘no power load’ state after a certain period, and we hope to avoid this in time.
We are using DCO 8.2.7 and DCE 7.6.0 versions in prod. Our test environment has newer versions DCO 8.3.2 and DCE 7.7.1."
A capture log from DCE and the DCO log are available in the Box folder on request.
To clarify, did you change the settings below in the external system configuration entry for the DCE server in DCO?
I assume you increased the "stop waiting for data after" value to 7200? Please make sure you also increase the "update power information" value: if the DCE server takes a while to return data to DCO, you don't want DCO asking for that data more frequently than DCE can provide it.
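The relationship between these two settings can be expressed as a simple sanity check. This is a minimal Python sketch under stated assumptions: `PowerSyncSettings` and `check_power_settings` are hypothetical names, not DCO configuration keys or APIs, and the rule it encodes (the update interval should be at least as long as the timeout) is the guidance given above, not a documented DCO constraint.

```python
from dataclasses import dataclass


# Hypothetical model of the two DCO external-system settings discussed above.
# Field names are illustrative only; they are NOT real DCO configuration keys.
@dataclass
class PowerSyncSettings:
    update_interval_s: int  # "update power information" interval, in seconds
    timeout_s: int          # "stop waiting for data after" value, in seconds


def check_power_settings(s: PowerSyncSettings) -> list[str]:
    """Return warnings for combinations where polling may outpace the timeout."""
    warnings: list[str] = []
    if s.update_interval_s <= 0 or s.timeout_s <= 0:
        warnings.append("both values must be positive")
    elif s.update_interval_s < s.timeout_s:
        # A new request may be due before the previous one has been given up on.
        warnings.append(
            f"update interval ({s.update_interval_s}s) is shorter than the timeout "
            f"({s.timeout_s}s); a new request may start before the previous one finishes"
        )
    return warnings
```

With the customer's 7200 s timeout, any update interval shorter than 7200 s produces one warning, for example `check_power_settings(PowerSyncSettings(900, 7200))`.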
Based on your description, I assume alarms and device group data are syncing properly in DCO, i.e. you are seeing new alarms from the DCE server?
DCO asks DCE for average and peak power information based on the "update power every" interval in the above window. It does not matter whether the DCE devices are associated with items in DCO; the query covers all DCE devices visible to DCO, which explains why you saw no difference when you unassociated devices on your test server.
Does your DCE server have a long history of measurements for the Modbus devices (let's say over a year)? If so, would you be willing to purge some of that history to see whether that improves the situation?
I've received the following reply from the customer:
Deleting history data older than 1 year didn’t solve the issue; DCO still gives a timeout error when reading power data from DCE.
We updated the external system configuration entry for the DCE server in DCO, and it now looks like this:
We know that the device we are using, the CMS700, is not the best choice for measuring and modelling, especially since every device can have 96 sensors for power, plus energy, current, etc. This can make reading all sensors problematic; we have a total of 13173 sensors in 38 CMS devices on DCE-1F.
When DCO fetches measurements, does it send one request containing all devices, or one request per device? I did some tests and noticed that when asking for the peak power of all sensors of one device, the DCE API returns values within a few seconds. When the request contains more devices, performance drops quickly. For instance, when the DCE API was called with a merged request for 20 devices, it took more than 20 minutes to process. Can this be the root cause here?
Please advise 🙂
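The numbers reported above suggest that the cost of a merged request grows much faster than the number of devices. Assuming roughly 2.5 s per device (as in the single-device test), querying 20 devices one at a time would take about 20 × 2.5 s = 50 s, whereas the merged request took more than 20 minutes (1200 s), over 24 times longer. For the customer's own API testing, a minimal Python sketch of querying per device or in small batches might look like the following. `fetch` is a hypothetical placeholder for whatever client code wraps the DCE API (for example, a wrapper around getPeakDataForSensors); its signature is an assumption, not the real DCE API, and this does not change how DCO itself queries DCE.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], size: int) -> list[list[T]]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def fetch_in_batches(
    device_ids: Sequence[str],
    fetch: Callable[[list[str]], dict[str, R]],
    batch_size: int = 1,
) -> dict[str, R]:
    """Call `fetch` once per batch of devices and merge the per-device results.

    `fetch` is a HYPOTHETICAL stand-in for a client call that requests peak
    data for a list of devices; it is NOT a real DCE API signature.
    """
    results: dict[str, R] = {}
    for batch in chunked(device_ids, batch_size):
        results.update(fetch(batch))
    return results
```

Timing runs with `batch_size=1` against larger values (e.g. with `time.perf_counter`) would show at which batch size the slowdown starts.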
DCO makes web service call(s) to the external system (DCE) requesting device and sensor measurements/updates. It does not request one specific device at a time; it makes a general call. The "error collecting data" message usually occurs when DCO is unable to receive data from the integrated external system (DCE). There could be several reasons for the server response being cut off:
Network infrastructure between the two servers (DCO and DCE), especially on a slow network and/or if the servers are far apart, may affect the communication between them.
The DCE server may be under too much stress to run the necessary background jobs.
Reboot the DCE server (to re-initiate the jobs) and see if that helps.
If the DCE server is stressed because it is monitoring too many devices, additional purchases could perhaps be considered (just a thought).
In addition, if you could provide the complete server logs, I will check whether they contain anything useful.
DCO 8.3.2 server logs can be downloaded from the web client, Administration > Download log files
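Once the logs are available, one quick check is whether the timeouts cluster around the polling schedule, by counting SocketTimeoutException occurrences per hour. A minimal Python sketch, assuming each log entry begins with a `YYYY-MM-DD HH:MM:SS` timestamp (the actual DCO server.log format may differ and the pattern should be adjusted) and attributing stack-trace continuation lines to the most recent timestamp seen:

```python
import re
from collections import Counter
from typing import Iterable

# ASSUMPTION: log entries start with a timestamp like "2024-05-14 10:32:07".
# Group 1 captures the date and hour. Adjust to the real server.log format.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}")


def timeouts_per_hour(lines: Iterable[str]) -> Counter:
    """Count lines mentioning SocketTimeoutException, grouped by hour.

    Lines without a timestamp (e.g. stack-trace lines) are attributed to the
    most recent timestamped line; before any timestamp they count as "unknown".
    """
    counts: Counter = Counter()
    current_hour = "unknown"
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            current_hour = match.group(1) + ":00"
        if "SocketTimeoutException" in line:
            counts[current_hour] += 1
    return counts
```

For example, `with open("server.log", encoding="utf-8", errors="replace") as f: print(sorted(timeouts_per_hour(f).items()))`. It counts matching lines rather than distinct errors, so a stack trace that mentions the exception twice is counted twice.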
I will send you an invite to my =S= Box folder shortly so the data can be shared safely. Thanks.