
Case study – System timeouts for an unknown reason

A client complained that his system was frequently timing out and freezing for an unknown reason.
The client could not identify the cause of the problem, so he turned to us with a mixture of hope and despair.
We logged into AimBetter and opened the Query tab, which revealed that the timeout occurred at midnight:

[Screenshot 1: Query tab showing the timeout at midnight]

The client's application had its timeout set to 20 seconds, but the interfering process ran for 24 seconds, which caused the timeout.
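An application timeout leaves no error in SQL Server's error log: when the client gives up, it sends an attention signal and the server cancels the running request. As a minimal sketch of how such timeouts could be captured without a monitoring tool, the Extended Events session below records the sqlserver.attention event together with the statement text. The session name ClientTimeouts and the folder C:\XE\ are hypothetical; the folder must exist and be writable by the SQL Server service account.

```sql
-- Hypothetical session capturing client cancellations and timeouts (attention events)
CREATE EVENT SESSION [ClientTimeouts] ON SERVER
ADD EVENT sqlserver.attention (
    ACTION (
        sqlserver.client_app_name,  -- application that gave up
        sqlserver.session_id,       -- session whose request was cancelled
        sqlserver.sql_text,         -- text of the cancelled batch
        sqlserver.username
    )
)
ADD TARGET package0.event_file (
    SET filename = N'C:\XE\ClientTimeouts.xel'  -- assumed path; the folder must exist
);
GO

ALTER EVENT SESSION [ClientTimeouts] ON SERVER STATE = START;
GO

-- Read the captured events back as XML
SELECT CAST(event_data AS xml) AS event_xml
FROM sys.fn_xe_file_target_read_file(N'C:\XE\ClientTimeouts*.xel', NULL, NULL, NULL);
```

Attention events also fire when a user cancels a query manually, so not every captured event is a timeout; the client_app_name and the timestamps help tell them apart.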
We saw that the query shown in the first screenshot was in a suspended state, waiting for a disk resource.
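The same wait information can also be read straight from SQL Server's dynamic management views. This is a minimal sketch, assuming the login has VIEW SERVER STATE permission: it lists every request that is currently suspended, what it is waiting for, how long it has waited, and which session, if any, is blocking it.

```sql
-- Requests that are currently suspended, what they wait on, and who blocks them
SELECT
    r.session_id,
    r.status,                     -- 'suspended' = waiting for a resource
    r.wait_type,                  -- e.g. PAGEIOLATCH_SH (disk read) or LCK_M_S (lock)
    r.wait_time AS wait_time_ms,  -- milliseconds spent in the current wait
    r.wait_resource,
    r.blocking_session_id,        -- session holding the resource, if any
    DB_NAME(r.database_id) AS database_name,
    t.text AS sql_text
FROM sys.dm_exec_requests AS r
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.status = N'suspended'
ORDER BY r.wait_time DESC;
```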
The database is located on drive D, so we checked what was happening on that drive.
We found that the stall was caused by a resource overload, with drive D 100% busy while the interfering process was running:

[Screenshot 2: drive D at 100% busy during the interfering process]
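SQL Server keeps cumulative I/O statistics for every database file, so a drive that is struggling usually shows up as high average latency on the files stored there. The sketch below aggregates those statistics per drive letter. It assumes the files sit on lettered drives (not mount points or UNC paths) and that the counters, which accumulate since the last restart, cover a period that includes the problem. It only measures SQL Server's own files: I/O from other processes on the same drive is not counted directly, but it shows up as higher latency for the database files that share the drive.

```sql
-- Average I/O latency per drive since the instance last started (cumulative counters)
SELECT
    LEFT(mf.physical_name, 2) AS drive,  -- e.g. 'D:'; assumes lettered drives
    SUM(vfs.num_of_reads)  AS reads,
    SUM(vfs.num_of_writes) AS writes,
    -- whole milliseconds per operation; NULLIF avoids division by zero
    SUM(vfs.io_stall_read_ms)  / NULLIF(SUM(vfs.num_of_reads), 0)  AS avg_read_ms,
    SUM(vfs.io_stall_write_ms) / NULLIF(SUM(vfs.num_of_writes), 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
   AND mf.file_id     = vfs.file_id
GROUP BY LEFT(mf.physical_name, 2)
ORDER BY avg_write_ms DESC;
```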

We also found that another user had run a heavy query:

[Screenshot 3: heavy query run by another user]

Digging deeper, we discovered that a week earlier an employee had started synchronizing PDF files on drive D for backup purposes, and that this synchronization was causing the overloads:

[Screenshot 4: PDF synchronization on drive D]

We then checked the write speed on drive D and saw that it was high:

[Screenshot 5: write speed on drive D]

[Screenshot 6: write speed on drive D]
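The I/O counters in sys.dm_io_virtual_file_stats are cumulative since the instance started, so current write throughput is easier to judge from two samples taken a few seconds apart. A minimal sketch of that delta approach, with an assumed 10-second sampling window:

```sql
-- Snapshot of write counters for every database file
DECLARE @before TABLE (
    database_id       int,
    file_id           int,
    bytes_written     bigint,
    io_stall_write_ms bigint
);

INSERT INTO @before (database_id, file_id, bytes_written, io_stall_write_ms)
SELECT database_id, file_id, num_of_bytes_written, io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL);

WAITFOR DELAY '00:00:10';  -- assumed 10-second window; keep in sync with the 10.0 below

-- Difference between the second snapshot and the first, per drive
SELECT
    LEFT(mf.physical_name, 2) AS drive,
    SUM(v.num_of_bytes_written - b.bytes_written) / 10.0 / 1048576 AS write_mb_per_sec,
    SUM(v.io_stall_write_ms - b.io_stall_write_ms) AS write_stall_ms_in_window
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS v
JOIN @before AS b
    ON b.database_id = v.database_id AND b.file_id = v.file_id
JOIN sys.master_files AS mf
    ON mf.database_id = v.database_id AND mf.file_id = v.file_id
GROUP BY LEFT(mf.physical_name, 2)
ORDER BY write_mb_per_sec DESC;
```

This only sees SQL Server's own writes. Writes made by other processes on the same drive would be visible in Windows Performance Monitor counters such as LogicalDisk\Disk Write Bytes/sec and LogicalDisk\% Disk Time.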

The code that blocked the other process (shown in the first screenshot) was the following:

-- Part of an INSERT trigger: "Inserted" holds the rows that were just inserted
INSERT INTO Yoman_Tipul (
    Msd_Yoman
    ,Msd_Tipul
    ,Y_Datetime
)
SELECT Msd
    ,0
    ,Y_Datetime
FROM Inserted;

It blocked the query shown in the third screenshot:

SELECT * FROM Yoman WHERE Bakara = 0 ORDER BY Msd DESC
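To see such a blocking pair directly on the server, the sketch below matches each blocked request with the session that is blocking it. The blocking session may not be executing anything at that moment (it can hold locks inside an open transaction), so its text is taken from the most recent batch on its connection, via sys.dm_exec_connections.most_recent_sql_handle, rather than from sys.dm_exec_requests. It assumes VIEW SERVER STATE permission.

```sql
-- Pair each blocked request with the last batch sent by the session blocking it
SELECT
    r.session_id          AS blocked_session,
    r.wait_type,
    r.wait_time           AS wait_time_ms,
    blocked.text          AS blocked_sql,   -- in this case, the SELECT on Yoman
    r.blocking_session_id AS blocking_session,
    blocker.text          AS blocking_sql   -- most recent batch on the blocker's connection
FROM sys.dm_exec_requests AS r
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS blocked
JOIN sys.dm_exec_connections AS c
    ON c.session_id = r.blocking_session_id
OUTER APPLY sys.dm_exec_sql_text(c.most_recent_sql_handle) AS blocker
WHERE r.blocking_session_id > 0;  -- excludes 0/NULL (not blocked) and special negative IDs
```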

The technical explanation of why the resource overload caused the problem is as follows:

1. A query is suspended when it requests access to a resource that is not currently available.
2. This can be a logical resource, such as a locked row, or a physical resource, such as a data page that has to be read from disk into memory.
3. The query resumes running once the resource becomes available.
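The two kinds of resource in the list above can be seen in SQL Server's cumulative wait statistics. In the sketch below, lock waits (LCK_M_*) stand for logical resources, while PAGEIOLATCH_* waits mean a session was suspended until a data page was read from disk into memory. The grouping, including which other disk-related waits are counted, is a simplification chosen for illustration, not a complete classification of wait types.

```sql
-- Cumulative waits since startup, grouped into the resource kinds described above
SELECT
    k.resource_kind,
    SUM(ws.waiting_tasks_count) AS waits,
    SUM(ws.wait_time_ms)        AS total_wait_ms,
    SUM(ws.wait_time_ms) / NULLIF(SUM(ws.waiting_tasks_count), 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats AS ws
CROSS APPLY (
    SELECT CASE
        WHEN ws.wait_type LIKE N'LCK[_]M[_]%'     THEN N'logical: lock'
        WHEN ws.wait_type LIKE N'PAGEIOLATCH[_]%' THEN N'physical: data page read from disk'
        WHEN ws.wait_type IN (N'WRITELOG', N'IO_COMPLETION', N'ASYNC_IO_COMPLETION')
                                                  THEN N'physical: other disk I/O'
        ELSE NULL  -- all other wait types are left out of this simplified view
    END AS resource_kind
) AS k
WHERE k.resource_kind IS NOT NULL
GROUP BY k.resource_kind
ORDER BY total_wait_ms DESC;
```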

In the second screenshot you can see that the bottleneck was the availability of drive D (top left corner of the screenshot). Because drive D was saturated, the INSERT stayed suspended on disk I/O and ran for 24 seconds while holding its locks, so the query it blocked could not finish within the application's 20-second timeout.
With AimBetter we were able to detect the cause of the problem within minutes and advise the client on how to fix it. Once the PDF synchronization was moved from drive D to another drive, the system resumed normal operation without locks or slowdowns.