The Gandi Community

[Resolved] Storage slowdown

One of our storage units is currently experiencing a slowdown. Our teams are working on a solution.

Update (09:45 GMT): There were significant slowdowns between 05:00 and 06:50 GMT. The situation improved between 07:00 and 08:00 GMT.

Update (January 25th 09:00 GMT): One of our storage units is currently experiencing a slowdown. This incident is similar to yesterday's. Our technical team is working to resolve the issue.

Update (January 25th 10:00 GMT): I/O performance has improved. Our technical team is still working on a complete fix for the issue.

Update (January 25th 10:22 GMT): One of our storage units is currently experiencing a slowdown. This incident is similar to the one this morning. Our technical team is working to resolve the issue.

Update (January 26th 11:26 GMT): I/O performance has improved. Our technical team is still working on a complete fix for the issue.

Update (January 27th 19:11 GMT): One of our storage units is currently experiencing a slowdown. This incident is similar to the earlier incidents this week. Our technical team is working to resolve the issue.

Update (January 27th 22:00 GMT): I/O performance has now stabilized. Our technical team is still working on a complete fix for the issue.

Update (February 2nd 03:30 GMT): Another incident is affecting one of our storage units. We are now rebooting the faulty equipment. We have recently identified several corrective actions that we will soon be able to apply to resolve this type of issue.

Update (February 2nd 20:19 GMT): Another incident occurred and caused a slowdown. The situation is currently stable.

Update (February 6th 02:09 GMT): One of our storage units is experiencing a slowdown. Our teams are working on it.

These incidents affect two storage units and consist of isolated slowdowns in read/write operations. We suspect a two-fold cause: a software problem (operations becoming blocked) and a hardware problem (some disk models are unusually slow).

When these slowdowns occur, the iSCSI implementation that connects your servers to their disks may stop working correctly. The result is an artificially high “I/O wait” (100%), even after the storage has returned to normal speed.
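For readers who want to check the “I/O wait” figure on their own server: on a Linux system, tools such as top compute it from the iowait counter on the aggregate “cpu” line of /proc/stat, as the share of CPU time spent idle while I/O was outstanding between two readings. The Python below is a minimal sketch under that assumption (a Linux server exposing /proc/stat). It is not a Gandi tool, and the function names are hypothetical.

```python
import os
import time


def parse_cpu_line(line):
    """Split the aggregate 'cpu' line of /proc/stat into integer tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    # Keep user, nice, system, idle, iowait, irq, softirq, steal.
    # guest/guest_nice are already included in user/nice, so they are skipped.
    return [int(x) for x in fields[1:9]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two /proc/stat snapshots."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    deltas = [new - old for new, old in zip(a, b)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat (the aggregate 'cpu' line)."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu_line()
    time.sleep(1)  # sample over a one-second window
    second = read_cpu_line()
    print(f"iowait: {iowait_percent(first, second):.1f}%")
```

A reading near 100% only means the CPUs were idle while I/O requests were pending. It does not by itself measure how fast the disk is, which is consistent with the behaviour described above.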

We are working on all three problems (blocked operations, slow disks, and the iSCSI layer), with priority given to our system's ability to restore service after a slowdown.