Troubleshooting a slow system

Problem

A server or computer is running slowly, but you can still log in to it. You want to find out what is causing the problem.

Troubleshooting

Here are the commands and steps you can use to troubleshoot a machine that is running slowly or freezing constantly.

uptime

uptime shows how long the system has been running since it was last booted. This is not necessarily how long it has been on the network, only how long the local system has been up. uptime also shows the number of logged-in users and, most usefully for troubleshooting, the load averages (see below).

[Screenshot: uptime output on my Mac]

On my Mac, uptime is showing load averages of 1.81, 1.60 and 1.54. These are the 1-, 5- and 15-minute load averages, in that order. Each number is the average number of processes that were running or waiting to run during that period (on Linux, processes waiting on disk I/O are counted as well). Taken together, the three numbers show how the load has changed over time.
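If you want to collect these numbers from a script instead of reading them by eye, the sketch below pulls the three load averages out of one line of uptime output. It assumes the two formats shown in the comments (Linux prints "load average:" with commas, macOS prints "load averages:" without); other systems or locales may format the line differently. The function names are hypothetical. On Unix-like systems, Python's os.getloadavg() returns the same three values directly.

```python
import os
import re
from typing import Tuple

# Assumed formats: "load average: 0.52, 0.58, 0.59" (Linux) and
# "load averages: 1.81 1.60 1.54" (macOS).
_LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_uptime(line: str) -> Tuple[float, float, float]:
    """Return the (1-min, 5-min, 15-min) load averages from one line of uptime output."""
    match = _LOAD_RE.search(line)
    if match is None:
        raise ValueError(f"no load averages found in: {line!r}")
    one, five, fifteen = (float(g) for g in match.groups())
    return one, five, fifteen


def current_load() -> Tuple[float, float, float]:
    """Read the same three numbers straight from the OS (Unix-like systems only)."""
    return os.getloadavg()
```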

In the screenshot above, the three numbers are close together, which means my MacBook had been under a fairly steady load for some time. If the output instead showed 3.04, 0.17, 0.09, we could conclude that the spike in processes demanding resources was recent, because only the 1-minute average is high. Flip the numbers to 0.09, 0.17, 3.04 and the picture reverses: the system was under heavy load earlier in the 15-minute window, but that load has since subsided.
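The rule of thumb above (compare the short window with the long one) can be sketched as a tiny function. The name load_trend and the 25% tolerance are arbitrary choices made for illustration, not standard values; the 5-minute number is left out because comparing the two ends of the window is enough to tell a recent spike from one that has faded.

```python
def load_trend(one: float, fifteen: float, tolerance: float = 0.25) -> str:
    """Classify the load as 'rising', 'falling' or 'steady'.

    `tolerance` is an illustrative threshold: the 1-minute average must differ
    from the 15-minute average by more than this fraction to count as a change.
    """
    if one > fifteen * (1 + tolerance):
        return "rising"   # recent spike: the short window is well above the long one
    if one < fifteen * (1 - tolerance):
        return "falling"  # earlier spike that has since subsided
    return "steady"       # roughly the same load across the whole window
```

With the examples from the text, (3.04, 0.09) comes out as "rising", (0.09, 3.04) as "falling", and my Mac's (1.81, 1.54) as "steady".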

The numbers only mean something relative to the number of CPUs. On a single-CPU system, a load average of 1 means the CPU is busy all the time. A load average of 4 on the same system means it has four times as much work as it can handle: on average, one process is running and the other three are waiting for the CPU. On a four-CPU system, the same load of 4 would simply mean every CPU is fully busy.
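The arithmetic above (divide by the CPU count; anything beyond the CPU count is waiting) is easy to script. This is a minimal sketch with a hypothetical function name. It assumes os.cpu_count() is a fair measure of capacity, although that call counts logical CPUs (hyperthreads included), and on Linux some of the "waiting" processes may be blocked on disk I/O rather than on the CPU.

```python
import os
from typing import Optional, Tuple


def load_per_cpu(load: float, cpus: Optional[int] = None) -> Tuple[float, float]:
    """Return (load divided by CPU count, average number of processes waiting).

    If `cpus` is not given, os.cpu_count() is used (logical CPUs).
    """
    if cpus is None:
        cpus = os.cpu_count() or 1
    ratio = load / cpus                 # 1.0 means every CPU is busy all the time
    waiting = max(0.0, load - cpus)     # runnable processes beyond the CPUs available
    return ratio, waiting
```

For the single-CPU example in the text, load_per_cpu(4.0, 1) gives (4.0, 3.0); the same load on four CPUs gives (1.0, 0.0).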

Running uptime several times lets you see how long the machine has been under stress and whether the load is ongoing. Because each number covers a different window (1, 5 and 15 minutes), comparing several readings also helps you work out how long ago a spike happened.
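Taking repeated readings can be automated with a short loop. In this sketch, sample_load is a hypothetical helper; by default it reads the load with os.getloadavg() (Unix-like systems only), and the reader parameter exists only so a fake source can be swapped in for testing.

```python
import os
import time
from typing import Callable, List, Tuple

LoadReader = Callable[[], Tuple[float, float, float]]


def sample_load(count: int, interval: float,
                reader: LoadReader = os.getloadavg) -> List[Tuple[float, float, float]]:
    """Take `count` readings of the 1-, 5- and 15-minute load, `interval` seconds apart."""
    samples = []
    for i in range(count):
        samples.append(reader())
        if i < count - 1:
            time.sleep(interval)  # no need to wait after the final reading
    return samples
```

For example, sample_load(5, 60) takes one reading a minute for five minutes, which is enough to see whether the 1-minute number is climbing or falling back.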

  • Here is where it gets tricky: what exactly is a high load average?
    • My personal definition of a system under high or heavy load is a machine that is behaving slowly, lagging, or constantly freezing.

top

top shows the same load averages as uptime, but it refreshes them in real time and adds much more detail, such as CPU and memory usage for each process. See my posts on the top command and its shortcuts to better understand how to use its output.

I/O Stat

iostat (input/output statistics) reports how busy your disks or other storage devices are. Its output is not real-time by default: on Linux, the first report shows averages since the system booted, so ask iostat to repeat at an interval to see current activity. The output looks similar to uptime's.

The iostat output shows transfers per second (tps) and throughput for each disk, along with the percentage of CPU time spent in user mode, system mode and idle. On macOS it also includes the same 1-, 5- and 15-minute load averages. In my case, the disks are under some load, but not much. If disk activity is low, you can rule the disks out as the cause of a performance problem. If it is high, one possible cause is heavy swapping, which means the system has run short of RAM. Swap space is disk space used as overflow for RAM, so paging memory in and out competes with normal reads and writes, and disk performance suffers.

[Screenshot: iostat output]
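To see where a figure like tps comes from, here is a Linux-only sketch that computes roughly the same per-disk rates from two snapshots of /proc/diskstats. It assumes the field layout documented for the kernel (device name in the third column, then reads completed, reads merged, sectors read, time reading, writes completed, writes merged, sectors written) and 512-byte sectors. Newer iostat versions may also count discards, so treat this as an approximation; the function names are hypothetical.

```python
from typing import Dict, Tuple

SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors


def _parse_diskstats(text: str) -> Dict[str, Tuple[int, int, int, int]]:
    """Map device name -> (reads completed, sectors read, writes completed, sectors written)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        stats[fields[2]] = (int(fields[3]), int(fields[5]), int(fields[7]), int(fields[9]))
    return stats


def disk_rates(before: str, after: str, interval: float) -> Dict[str, Tuple[float, float, float]]:
    """Return device -> (transfers/s, kB read/s, kB written/s) between two snapshots."""
    old, new = _parse_diskstats(before), _parse_diskstats(after)
    rates = {}
    for dev, (r1, rs1, w1, ws1) in new.items():
        if dev not in old:
            continue  # device appeared between snapshots
        r0, rs0, w0, ws0 = old[dev]
        tps = ((r1 - r0) + (w1 - w0)) / interval
        read_kb = (rs1 - rs0) * SECTOR_BYTES / 1024 / interval
        write_kb = (ws1 - ws0) * SECTOR_BYTES / 1024 / interval
        rates[dev] = (tps, read_kb, write_kb)
    return rates
```

On a live Linux system you would read /proc/diskstats, sleep for the interval, read it again, and pass both snapshots to disk_rates.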

Diagnose Memory Issues

When it comes to RAM, remember that Linux keeps recently read files cached in memory in case they are needed again. Reading a file from the cache is much faster than reading it from disk, for example when a program is reopened shortly after it was closed. The kernel gives this cache back as soon as programs need the memory, so when you check RAM usage in top, subtract the cached memory (buffers/cache) from the used figure. Newer versions of top and free report the result directly as available memory.
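The subtraction described above can be done from /proc/meminfo on Linux. This sketch treats only the Buffers and Cached fields as reclaimable cache, which is a simplification (the free command's own calculation differs slightly between versions), and falls back to an estimate if MemAvailable is missing. It also reports swap in use, which ties back to the swapping discussed under iostat. The function names are hypothetical; all values are in kB.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Map each /proc/meminfo field name to its numeric value (kB for memory sizes)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        values[key.strip()] = int(rest.split()[0])
    return values


def memory_summary(text: str) -> Dict[str, int]:
    """Summarize RAM and swap usage in kB, treating buffers and page cache as free."""
    m = parse_meminfo(text)
    cache = m.get("Buffers", 0) + m.get("Cached", 0)
    return {
        "total": m["MemTotal"],
        "used_including_cache": m["MemTotal"] - m["MemFree"],
        "used_excluding_cache": m["MemTotal"] - m["MemFree"] - cache,
        # Older kernels lack MemAvailable; free + cache is a rough stand-in.
        "available": m.get("MemAvailable", m["MemFree"] + cache),
        "swap_used": m.get("SwapTotal", 0) - m.get("SwapFree", 0),
    }
```

On a Linux machine, memory_summary(open("/proc/meminfo").read()) shows how much of the "used" RAM is really just cache.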

top is also useful for finding which programs or processes are using large amounts of RAM: look at the %MEM column (MEM in the macOS version of top). From there, you can kill those processes or investigate why they are using so many resources.

[Screenshot: memory usage in top]
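Outside of top, the same "who is using the memory" question can be answered from a script. This sketch assumes ps -A -o pid,%mem,comm, which I expect to work with both the Linux and macOS versions of ps, and that the first line of its output is a column header. top_memory and current_top_memory are hypothetical names.

```python
import subprocess
from typing import List, Tuple


def top_memory(ps_output: str, n: int = 5) -> List[Tuple[int, float, str]]:
    """Return the n processes with the highest %MEM from `ps -A -o pid,%mem,comm` output."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the column header
        parts = line.split(None, 2)          # command names may contain spaces
        if len(parts) < 3:
            continue
        pid, mem, command = parts
        rows.append((int(pid), float(mem), command.strip()))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]


def current_top_memory(n: int = 5) -> List[Tuple[int, float, str]]:
    """Run ps on this machine and return its heaviest memory users."""
    result = subprocess.run(
        ["ps", "-A", "-o", "pid,%mem,comm"],
        capture_output=True, text=True, check=True,
    )
    return top_memory(result.stdout, n)
```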

SYSSTAT

sar (part of the sysstat package): "The sar command writes to standard output the contents of selected cumulative activity counters in the operating system."

Memory

With the -r option, sar reports memory utilization, including free, used, buffered and cached memory.

CPU

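With the -u option, sar reports CPU utilization. Each line starts with the time, followed by the percentage of CPU time spent in user space (%user), on niced processes (%nice), in the kernel (%system), waiting for I/O (%iowait), stolen by a hypervisor (%steal, on recent versions) and idle (%idle). You can also give an interval and a count: for example, sar -u 4 5 reports CPU usage every 4 seconds, 5 times, and then prints the average.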
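The percentages sar -u prints are derived from cumulative CPU time counters. As a Linux-only illustration (not sysstat's actual code), this sketch computes similar figures from two snapshots of the aggregate "cpu" line in /proc/stat. It assumes the documented column order (user, nice, system, idle, iowait, irq, softirq, steal) and, following the sar man page, folds interrupt time into %system. Guest time is left out of the total because the kernel already counts it inside user time. The function names are hypothetical.

```python
from typing import Dict

# Column order of the aggregate "cpu" line in /proc/stat (Linux).
_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def _cpu_times(line: str) -> Dict[str, int]:
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return dict(zip(_FIELDS, (int(v) for v in parts[1:1 + len(_FIELDS)])))


def cpu_percentages(before: str, after: str) -> Dict[str, float]:
    """Approximate sar -u style percentages between two /proc/stat 'cpu' lines."""
    t0, t1 = _cpu_times(before), _cpu_times(after)
    delta = {k: t1.get(k, 0) - t0.get(k, 0) for k in _FIELDS}  # missing columns count as 0
    total = sum(delta.values()) or 1  # avoid dividing by zero if no time elapsed
    return {
        "%user": 100.0 * delta["user"] / total,
        "%nice": 100.0 * delta["nice"] / total,
        # sar -u includes hardware and software interrupt time in %system
        "%system": 100.0 * (delta["system"] + delta["irq"] + delta["softirq"]) / total,
        "%iowait": 100.0 * delta["iowait"] / total,
        "%steal": 100.0 * delta["steal"] / total,
        "%idle": 100.0 * delta["idle"] / total,
    }
```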

Disk

Like the CPU and memory reports, sar can report disk activity with the -d option. It shows transfers per second (tps) for each device, along with the amount of data read and written per second. Keep in mind that if a system is slowing down, it is worth checking disk I/O first: heavy disk activity can be a sign that the system is swapping.
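One way to confirm whether busy disks really come from swapping is to measure swap traffic directly. On Linux, the kernel keeps running totals of pages swapped in and out as pswpin and pswpout in /proc/vmstat (sar -W reports the same activity as pswpin/s and pswpout/s). This sketch, with hypothetical function names, turns two snapshots into per-second rates; sustained non-zero values suggest the system is short on RAM.

```python
from typing import Dict, Tuple


def _vmstat(text: str) -> Dict[str, int]:
    """Parse /proc/vmstat ("name value" on each line) into a dict."""
    out = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            out[parts[0]] = int(parts[1])
    return out


def swap_rates(before: str, after: str, interval: float) -> Tuple[float, float]:
    """Return (pages swapped in per second, pages swapped out per second)."""
    old, new = _vmstat(before), _vmstat(after)
    swap_in = (new["pswpin"] - old["pswpin"]) / interval
    swap_out = (new["pswpout"] - old["pswpout"]) / interval
    return swap_in, swap_out
```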

Finally, sar can save its data for later. The -o option followed by a path and file name writes the collected data to a file, and -f followed by the same file reads it back. This can be extremely useful when writing reports, or for future troubleshooting.
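If sysstat is not installed, a rough substitute for saving readings for a report is to log them yourself. This sketch is an alternative to sar -o/-f, not a way of reading sar's own binary files: it appends timestamped load averages to a plain CSV file and reads them back. record_sample, read_samples and the CSV layout are all hypothetical choices.

```python
import csv
import time
from typing import List, Optional, Tuple


def record_sample(path: str, load: Tuple[float, float, float],
                  timestamp: Optional[float] = None) -> None:
    """Append one (timestamp, 1-min, 5-min, 15-min) row to a CSV file."""
    if timestamp is None:
        timestamp = time.time()
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([timestamp, *load])


def read_samples(path: str) -> List[Tuple[float, ...]]:
    """Return every saved row as a tuple of floats."""
    with open(path, newline="") as f:
        return [tuple(float(v) for v in row) for row in csv.reader(f) if row]
```

Calling record_sample(path, os.getloadavg()) from a cron job, for example, builds up a history you can review later.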
