In this second tutorial on server load, we outline the steps to take when investigating where server load originates and what may be causing your server to become overloaded. As noted in Part 1 of our series, excessive use of any app or service can cause load issues. Here are the four main areas of concern:

  • CPU
  • Memory (including swap)
  • Disk I/O
  • Networking

Historical Load

Typically, when troubleshooting server load, we are notified of a high load error or receive an email informing us about an event that has already passed. Unless an admin can log in while the load is occurring, pinning down where the load originated is difficult. That said, several tools can identify the timeframe and the load averages recorded during a load event. The first step uses the sar -q command to locate the timeframe in which the load occurred.

SAR Command

Using the sar command, we can review the past load history to pin down when the server experienced elevated load. If we see a pattern of high load times, say between 1:00 am and 2:00 am on the 22nd of the month, we can move forward and use the sar command to view the load between the times noted. Below is example output of the sar command using the -q flag.

root@host [~]# sar -q -f /var/log/sa/sa22
Linux 3.10.0-1127.19.1.el7.x86_64 ( 01/22/2021 _x86_64_  (4 CPU)

12:00:01 AM runq-sz  plist-sz ldavg-1   ldavg-5  ldavg-15   blocked
12:10:01 AM  5        567      0.06      0.11      0.12         0
12:20:01 AM  9        567      0.03      0.04      0.08         0
12:30:01 AM   4       576      0.24      0.14      0.11         8
12:40:01 AM   7       576      0.04      0.07      0.11         0
12:50:01 AM   7       570      0.08      0.06      0.09         0
01:00:02 AM   8       578      0.03      0.07      0.08         0
01:10:01 AM   5       570      0.00      0.04      0.05         0
01:20:01 AM   7       577      0.00      0.02      0.05         1
01:30:02 AM   3       573      0.06      0.03      0.05         0
01:40:01 AM   6       566      0.06      0.05      0.05         0
01:50:01 AM   6       563      0.07      0.08      0.05         0
02:00:01 AM   16      589      0.14      0.06      0.06         2
02:10:02 AM   1       577      1.10      1.16      0.69         4
02:20:01 AM   10      581      1.87      1.76      1.27         2
02:30:01 AM   6       571      0.04      0.40      0.81         1
02:40:01 AM   4       567      0.03      0.10      0.46         2
02:50:01 AM   8       568      0.10      0.08      0.27         1
03:00:01 AM   10      647      0.14      0.11      0.20         0
03:10:01 AM   4       638      0.04      0.09      0.16         0
03:20:01 AM   8       640      0.06      0.09      0.13         0
03:30:01 AM   9       651      0.04      0.25      0.22         0
03:40:01 AM   8       629      0.10      0.16      0.20         1

The command’s full output covers 12:00:01 AM to 11:50:01 PM for the 22nd of January. Note that the sar logs rotate each month and overwrite the log from a month prior. The sar command can also review CPU, memory, swap, and I/O data using various flags.
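To pick the busy intervals out of a full day of sar -q output, a small filter can help. The following is a minimal sketch rather than part of the original procedure: the function name high_load_rows and the default threshold of 1.0 are assumptions chosen for illustration.

```shell
# Print sar -q sample rows whose 1-minute load average exceeds a threshold.
#   $1  threshold (default: 1.0)
# Reads sar -q output on stdin.
high_load_rows() {
    threshold="${1:-1.0}"
    awk -v limit="$threshold" '
        # Locate the ldavg-1 column from the header row so the filter works
        # with both 12-hour (AM/PM) and 24-hour timestamps.
        /ldavg-1/ { for (i = 1; i <= NF; i++) if ($i == "ldavg-1") col = i; next }
        # Skip the banner, blank lines, and the trailing Average row.
        col && NF >= col && $1 != "Average:" && $col + 0 > limit + 0 { print }
    '
}
```

For example, sar -q -f /var/log/sa/sa22 | high_load_rows 1.0 would keep only the 02:10:02 AM and 02:20:01 AM rows from the listing above. sar also accepts -s and -e to restrict a report to a start and end time.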

Current Load

If an admin can log in right away, several of the tools noted below are excellent at deducing high server load.


The htop command is an interactive process viewer, process manager, and system monitor designed as an alternative to the top command. To install it, we use the command below.

root@host [~]# yum install htop

To view the current statistics, simply run the htop command.

Using the menu, we can sort, filter, and search for information broken down by multiple factors, including PID, user, priority, state, time, and the percentage of CPU and memory being used. Assuming the load is not too high, this is an excellent tool we can use to locate and stop a high load in its tracks. 
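When an interactive view is not practical, for example inside a script, a one-shot check of load against core count gives a quick sense of how busy the server is. This is a minimal sketch under stated assumptions: the function name load_per_core is hypothetical, and it reads the standard Linux /proc/loadavg file rather than anything htop provides.

```shell
# Print the 1-minute load average divided by the number of CPU cores.
# Both arguments are optional and exist mainly to make the function testable:
#   $1  file in /proc/loadavg format (default: /proc/loadavg)
#   $2  core count (default: output of nproc)
load_per_core() {
    loadavg_file="${1:-/proc/loadavg}"
    cores="${2:-$(nproc)}"
    awk -v cores="$cores" '{ printf "%.2f\n", $1 / cores }' "$loadavg_file"
}
```

Running load_per_core with no arguments reports the current ratio. A value near or above 1.00 suggests that, on average, every core has work queued.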


In a prior knowledge base article, we reviewed PCP, or Performance Co-Pilot. PCP is a performance assessment and evaluation toolkit used to collect a wide range of server metrics while examining current and prior operational data. As a side note, on recent Red Hat releases the older Dstat tool is provided through PCP as pcp-dstat. Dstat was a versatile replacement for the following Linux toolsets:

  • vmstat
  • iostat
  • netstat 
  • ifstat

Red Hat also added features such as additional counters and increased flexibility over those older tools. To install PCP, visit the PCP homepage. Another useful feature of dstat is that it can be run from cron to take a snapshot of your server's processes every X seconds or minutes. It can then record that information in .csv format, which we can download and later import into Excel or Google Sheets for review.
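The snapshot-to-CSV idea can be illustrated without dstat at all. The sketch below is a stand-in, not dstat's own mechanism (dstat writes CSV through its --output option): the function name log_load_snapshot, the column names, and every path shown are assumptions for illustration only.

```shell
# Append one CSV row: timestamp plus the 1-, 5-, and 15-minute load averages.
#   $1  CSV file to append to
#   $2  file in /proc/loadavg format (default: /proc/loadavg)
log_load_snapshot() {
    out_file="$1"
    loadavg_file="${2:-/proc/loadavg}"
    # Write a header row the first time the file is created.
    [ -s "$out_file" ] || echo "timestamp,ldavg-1,ldavg-5,ldavg-15" >> "$out_file"
    awk -v ts="$(date '+%Y-%m-%d %H:%M:%S')" \
        '{ printf "%s,%s,%s,%s\n", ts, $1, $2, $3 }' "$loadavg_file" >> "$out_file"
}
```

Saved to a hypothetical file such as /usr/local/bin/load-snapshot.sh, it could be scheduled once a minute with a crontab entry along these lines:

* * * * * . /usr/local/bin/load-snapshot.sh; log_load_snapshot /var/log/load-history.csv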

Solving High Load Issues

We can break down the high server load issue once we determine where the load originates.


Typically, issues revolving around a high CPU load indicate that we need additional cores to handle the system's extra workload. The other option is to work with the application itself: optimize it, reduce its usage, or scale it horizontally. As noted in Part 1, there are several common causes of increased CPU usage, which we cover below.

Here is a command we can use to gather information about CPU processes and queue usage.

root@host [~]# resize; clear; date; echo "Top 10 Processes";echo "";ps -eo user,%cpu,%mem,\
rsz,args|sort -rnk2|awk 'BEGIN {printf "%s\t%s\t%s\t%s\t%s\n","USER","%CPU",\
"%MEM","RSZ","COMMAND"}{printf "%s\t%g'%'\t%g'%'\t%d MB\t%-10s\n",$1,$2,$3,\
$4/1024,$5}'|head -n10;echo "";sar -u 2 5;echo "";sar -q 2 5

This command provides output like so.

Wed Jan 27 14:26:50 EST 2021
Top 10 Processes

mysql   0.3%    3.9%    146 MB  /usr/sbin/mysqld
cpanelc+        0.2%    0%      2 MB    /usr/local/cpanel/3rdparty/sbin/p0f
wp-tool+        0%      0.1%    5 MB    /usr/bin/sw-engine
wp-tool+        0%      0.1%    4 MB    /usr/bin/sw-engine
USER    0%      0%      0 MB    COMMAND
systuser        0%      0.5%    20 MB   Sonar

Linux 3.10.0-1127.19.1.el7.x86_64 (   01/27/2021      _x86_64_        (4 CPU)

02:26:50 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
02:26:52 PM     all      0.12      0.00      0.12      0.00      0.37     99.38
02:26:54 PM     all      0.00      0.00      0.00      0.00      0.25     99.75
02:26:56 PM     all      0.00      0.00      0.12      0.00      0.50     99.38
02:26:58 PM     all      0.00      0.00      0.00      0.00      0.37     99.63
02:27:00 PM     all      0.37      0.00      0.25      0.00      1.36     98.01
Average:        all      0.10      0.00      0.10      0.00      0.57     99.23
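Per-process figures like those above can hide a user whose many small processes add up. As a hedged sketch (the function name cpu_by_user is an assumption, not part of the original command), the totals can be grouped per user:

```shell
# Sum %CPU per user from two-column "user %cpu" input and list the
# heaviest users first.
cpu_by_user() {
    awk '{ cpu[$1] += $2 }
         END { for (u in cpu) printf "%s\t%.1f\n", u, cpu[u] }' |
        sort -t "$(printf '\t')" -k2,2 -rn
}
```

A typical invocation would be ps -eo user,%cpu --no-headers | cpu_by_user | head -n 5. The --no-headers option keeps the ps header row out of the sorted data.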


If a PHP script is overused or poorly coded, it will cause an excessive CPU load. The script should be reviewed and optimized by a web developer. Generally, these scripts perform a specific function on a website like file manipulation, content management, handling multimedia duties, or other utility roles used to improve the website’s usability or functionality.

Apache/Background Processes

Background processes like malware scans or increased Apache processes can have a severe impact on available resources. If enough of these processes occur simultaneously, the available RAM is consumed, and the server begins running into swap issues. To solve this, a user should identify the specific processes related to the increased load using a command like this to find Apache’s top requests.

root@host [~]# cut -d' ' -f7 /usr/local/apache/logs/access_log | sort | uniq -c | sort -rn | head -20
2886972 /
   1107 /.env
   1016 /robots.txt
    688 /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php
    582 /index.php
    470 /favicon.ico
    453 /TP/public/index.php
    449 /config/getuser?index=0
    427 /manager/html
    424 /TP/index.php
    411 /thinkphp/html/public/index.php
    410 /public/index.php
    398 /html/public/index.php
root@host [~]#

Next, we can also check if the website is being hit by too many requests from an IP address.

root@host [~]# cut -d' ' -f1 /usr/local/apache/logs/access_log | sort | uniq -c | sort -rn | head -25 | column -t

root@host [~]#

Too many connections may indicate a bot, web scraper, or someone just hitting the site too often. 
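The two access log checks can be combined to see which addresses are behind a suspicious request such as /.env. This is a minimal sketch, assuming the same log layout the cut commands above rely on (client IP in field 1, request path in field 7); the function name ips_for_path is hypothetical.

```shell
# List the client IPs that requested a given path, busiest first.
#   $1  request path to match exactly (field 7 of the access log)
#   $2  access log file
ips_for_path() {
    awk -v path="$1" '$7 == path { hits[$1]++ }
        END { for (ip in hits) print hits[ip], ip }' "$2" |
        sort -rn
}
```

For example, ips_for_path /.env /usr/local/apache/logs/access_log | head -n 10 would print a request count followed by each IP address.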

Lastly, here is a command that shows the top ten processes by memory utilization (with child processes aggregated under each process name).

root@host [~]# ps axo rss,comm,pid | awk '{ proc_list[$2]++; proc_list[$2 "," 1] += $1; } END { for (proc in proc_list) { printf("%d\t%s\n", proc_list[proc "," 1],proc); }}' | sort -n | tail -n 10
13796   dockerd
21520   SonarPush
25512   lfd
39888   python3
44220   php-fpm
51128   cpsrvd
56232   php
106356  httpd
156492  mysqld
523904  clamd
root@host [~]#


Malformed MySQL queries can also have a significant impact on load. If a user is interacting with a website and trying to pull specific information from an existing database, backed-up MySQL processes will delay that data's return. If enough of those processes run concurrently, they will force the load higher. The mysqladmin processlist command helps identify the current workload of MySQL.

root@host [~]# mysqladmin processlist
| Id | User        | Host | db | Command | Time | State                    | Info | Progress |
| 1  | system user |      |    | Daemon  |      | InnoDB purge worker      |      | 0.000    |
| 2  | system user |      |    | Daemon  |      | InnoDB purge coordinator |      | 0.000    |
| 4  | system user |      |    | Daemon  |      | InnoDB purge worker      |      | 0.000    |
| 3  | system user |      |    | Daemon  |      | InnoDB purge worker      |      | 0.000    |
| 5  | system user |      |    | Daemon  |      | InnoDB shutdown handler  |      | 0.000    |

Another tool that is useful for identifying load from MySQL is Mytop. Mytop is an open-source, command-line tool used for monitoring MySQL performance. It is a clone of the top command and features a terminal-based interface to monitor the overall performance of MySQL. Using this method, we can see how queries from a database are performing. 

Once the malformed queries are located, we can then have a developer adjust them to run more efficiently. Otherwise, they should be rewritten or removed if they are a primary cause of load. 
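On a busy server the process list can be long, so it helps to show only the queries that have been running for a while. The following is a rough sketch, assuming the usual mysqladmin processlist column order (Id, User, Host, db, Command, Time, State, Info); the function name slow_queries and the 10-second default are assumptions.

```shell
# Print processlist rows for client queries running at least N seconds.
#   $1  minimum runtime in seconds (default: 10)
# Reads "mysqladmin processlist" table output on stdin.
slow_queries() {
    awk -F'|' -v limit="${1:-10}" '
        {
            command = $6; seconds = $7
            gsub(/^ +| +$/, "", command)   # trim padding around the cell
            gsub(/ /, "", seconds)
        }
        command == "Query" && seconds + 0 >= limit + 0 { print }
    '
}
```

Usage would look like mysqladmin processlist | slow_queries 30. Because the rows are split on the | character, a query whose text itself contains | could be misread, so treat the result as a starting point rather than a definitive list.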


Locating the cause of high load related to memory or RAM consumption can be difficult without the proper tools and information. Luckily, we can use commands like this to identify a memory usage overview using the terminal.

The output of this command looks like this. 

Wed Jan 27 14:43:31 EST 2021
awk: cmd. line:1: BEGIN {FS=" "}{printf \"\nAvail\tActive\tTotal\tPercent Avail\n%sMB\t%sMB\t%sMB\t%s\n\n",$4+$5,$6,\$4+$5+$6,($4+$5)/($4+$5+$6)*100}
awk: cmd. line:1:                       ^ backslash not last character on line
awk: cmd. line:1: BEGIN {FS=" "}{printf \"\nAvail\tActive\tTotal\tPercent Avail\n%sMB\t%sMB\t%sMB\t%s\n\n",$4+$5,$6,\$4+$5+$6,($4+$5)/($4+$5+$6)*100}
awk: cmd. line:1:                       ^ syntax error
    USER   %CPU   %MEM      RSZ COMMAND
    root    0.0   15.1   555.68 MB /usr/local/cpanel/3rdparty/bin/clamd
   mysql    0.3    3.9  146.953 MB /usr/sbin/mysqld
g33kinfo    9.7    2.1  77.5586 MB php-fpm:
g33kinfo    4.6    1.5  58.4375 MB php-fpm:
    root    1.0    1.4  54.9141 MB /opt/alt/php74-imunify/usr/bin/php
    root    2.4    1.0  39.1133 MB /opt/alt/python35/bin/python3
    root    1.7    0.9  36.0352 MB /opt/alt/python35/bin/python3
    root    0.0    0.7   26.918 MB cpsrvd
    root    0.0    0.6  25.3125 MB lfd

Linux 3.10.0-1127.19.1.el7.x86_64 (   01/27/2021      _x86_64_        (4 CPU)

02:43:31 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
02:43:33 PM    309532   3457452     91.78    215880   1616540   5070160     87.19   1404328   1709384       344
02:43:35 PM    390092   3376892     89.64    215880   1620340   4951008     85.14   1362512   1669656       412
02:43:37 PM    388080   3378904     89.70    215880   1621152   4973132     85.52   1364784   1670468       420
02:43:38 PM    361204   3405780     90.41    215892   1622888   4999824     85.98   1390580   1672156       484
Average:       362227   3404757     90.38    215883   1620230   4998531     85.96   1380551   1680416       415

Linux 3.10.0-1127.19.1.el7.x86_64 (   01/27/2021      _x86_64_        (4 CPU)

02:43:38 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
02:43:40 PM     all      1.13      0.00      0.13     24.69      0.63     73.43
02:43:42 PM     all      3.99      0.00      2.58     23.07      2.32     68.04
02:43:44 PM     all      5.06      0.00      1.17     22.70      1.82     69.26
02:43:46 PM     all      0.12      0.00      0.25     24.72      0.50     74.41
02:43:48 PM     all      3.82      0.00      0.51     25.19      0.89     69.59
Average:        all      2.80      0.00      0.92     24.08      1.22     70.98
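For a quick available-versus-total figure, the kernel's own counters can be read directly. This is a minimal sketch and not the command that produced the output above; the function name mem_summary is hypothetical, and it assumes a kernel that reports MemAvailable in /proc/meminfo.

```shell
# Summarize total and available memory in MB, plus the percentage available.
#   $1  file in /proc/meminfo format (default: /proc/meminfo)
mem_summary() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END {
             if (total == 0) exit 1   # no MemTotal line, so avoid dividing by zero
             printf "Total\tAvail\tPercent Avail\n"
             printf "%dMB\t%dMB\t%.1f%%\n", total / 1024, avail / 1024, avail * 100 / total
         }' "${1:-/proc/meminfo}"
}
```

Calling mem_summary with no arguments reads the live values. MemAvailable already accounts for reclaimable cache, so it is usually a better guide than the raw free figure.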

A more exhaustive command that shows memory usage can be seen here.

The nicely formatted output is below. 

== Server Time: ==
2021-01-27 02:50:06 PM

== Memory Utilization Information: ==
Total Memory    Active Memory   Percentage Used
3678M           0M              0.00%

== Current Swap Usage: ==
DEVICE     USED      TOTAL     PERCENT USED
/dev/vda2  1499.74M  2000.00M  74.99%

== Top 10 Processes by Memory Usage: ==
USER    PID    %MEM  RSZ     COMMAND
root    2215   14.9  563148  /usr/local/cpanel/3rdparty/bin/clamd
mysql   1191   3.9   150480  /usr/sbin/mysqld
root    15187  1.4   56232   /opt/alt/php74-imunify/usr/bin/php
root    12576  1.0   40084   /opt/alt/python35/bin/python3
root    13002  0.7   27564   cpsrvd
root    47801  0.6   25920   lfd
nobody  99176  0.6   25212   /usr/sbin/httpd
nobody  99324  0.6   25080   /usr/sbin/httpd
nobody  40054  0.6   24152   /usr/sbin/httpd
root    15420  0.6   23912   /usr/local/cpanel/3rdparty/bin/perl

== Top 10 Processes By Swap Usage: ==
2215   clamd       633.36M
1364   named       292.50M
977    rsyslogd    149.05M
1191   mysqld      109.23M
52159  php-fpm     72.66M
1360   dockerd     35.03M
52164  php-fpm     28.88M
1025   containerd  28.64M
939    tuned       12.85M
1056   run-script  12.51M

== Top 10 Kernel Slab Caches: ==
106.41M  ext4_inode_cache
51.34M   radix_tree_node
21.59M   kmalloc-2048
14.27M   dentry
12.16M   inode_cache
9.45M    buffer_head
9.06M    kmalloc-512
6.25M    kmalloc-192
5.88M    kmalloc-256
3.84M    kmalloc-1024

== Last 30 Minutes Memory Usage: ==
02:10:01 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
02:20:01 PM    394892   3372092     89.52    186972   1791996   4502996     77.44   1702664   1330604       140
02:30:01 PM    278952   3488032     92.59    182660   1844708   4583800     78.83   1763660   1382000      1072
02:40:01 PM    478732   3288252     87.29    211216   1584016   4980100     85.64   1468992   1467576       144
02:50:01 PM    516000   3250984     86.30    260268   1414912   5030872     86.52   1259140   1617076       828
Average:       417144   3349840     88.93    210279   1658908   4774442     82.11   1548614   1449314       546

== Last 30 Minutes Paging/Swap Statistics: ==
02:10:01 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
02:20:01 PM    293.67     82.92   5881.40      4.30   3205.94     50.32      0.00     48.36     96.09
02:30:01 PM    183.14     60.00   4211.40      3.46   2441.86     27.97      0.00     27.47     98.21
02:40:01 PM    886.77    109.05   4756.09      1.31   3219.39     54.63      0.00     51.24     93.79
02:50:01 PM   1987.77    146.82   9831.37      5.97   7140.46    173.18     48.66    218.12     98.33
Average:       838.22     99.71   6171.33      3.76   4002.99     76.56     12.18     86.34     97.30

== Current 1 Second Memory Usage Statistics (10 Count): ==
02:50:10 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
02:50:11 PM    561296   3205688     85.10    260344   1436072   4820100     82.89   1210256   1630628       396
02:50:12 PM    581888   3185096     84.55    260356   1444944   4769172     82.02   1180624   1639364       644
02:50:13 PM    578084   3188900     84.65    260356   1449008   4769176     82.02   1180404   1643412       828
02:50:14 PM    577960   3189024     84.66    260356   1449164   4769176     82.02   1180404   1643504       848
02:50:15 PM    573992   3192992     84.76    260356   1453148   4769176     82.02   1180412   1647512       872
02:50:16 PM    550716   3216268     85.38    260356   1453304   4813672     82.78   1204252   1647616       892
02:50:17 PM    569768   3197216     84.87    260356   1457372   4769168     82.02   1180928   1651716       916
02:50:18 PM    569752   3197232     84.88    260364   1457448   4769168     82.02   1180448   1651752       940
02:50:19 PM    567892   3199092     84.92    260364   1459156   4769168     82.02   1180456   1653500       960
02:50:20 PM    567892   3199092     84.92    260364   1459272   4769168     82.02   1180492   1653556       984
Average:       569924   3197060     84.87    260357   1451889   4778714     82.18   1185868   1646256       828

== Current 1 Second Paging/Swap Statistics (10 Count): ==
02:50:10 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
02:50:11 PM   4096.00      0.00  11152.00      0.00    372.00      0.00      0.00      0.00      0.00
02:50:12 PM   8852.00    476.00  18888.00      3.00  19614.00      0.00      0.00      0.00      0.00
02:50:13 PM   4096.00      0.00   1823.00      0.00   1055.00      0.00      0.00      0.00      0.00
02:50:14 PM      0.00      0.00     34.00      0.00     49.00      0.00      0.00      0.00      0.00
02:50:15 PM   4096.00      0.00     38.00      0.00     54.00      0.00      0.00      0.00      0.00
02:50:16 PM      0.00      0.00  11140.59      0.00   1865.35      0.00      0.00      0.00      0.00
02:50:17 PM   4096.00      0.00  15499.00      0.00  16205.00      0.00      0.00      0.00      0.00
02:50:18 PM      0.00     60.00     34.00      0.00     49.00      0.00      0.00      0.00      0.00
02:50:19 PM   1780.00      0.00     34.00      0.00     50.00      0.00      0.00      0.00      0.00
02:50:20 PM      0.00      0.00     40.00      0.00     66.00      0.00      0.00      0.00      0.00
Average:      2698.90     53.55   5873.53      0.30   3935.86      0.00      0.00      0.00      0.00
root@host [~]#

Granted, this is a massive amount of output data, but having tools like this only serves to bolster our overall memory usage view. 
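One part of that report, the per-process swap listing, can be approximated from the VmSwap field in each process's status file. The sketch below is an illustration under assumptions rather than the command used above: the function name swap_by_process is hypothetical, and the optional directory argument exists only so the function can be pointed at test data.

```shell
# List the ten processes using the most swap, as "PID NAME SIZE".
#   $1  proc filesystem root (default: /proc)
swap_by_process() {
    proc_dir="${1:-/proc}"
    # Each status file lists Name and Pid before VmSwap; kernel threads
    # have no VmSwap line and are skipped automatically.
    awk '/^Name:/   { name = $2 }
         /^Pid:/    { pid = $2 }
         /^VmSwap:/ && $2 > 0 { printf "%s\t%s\t%.2fM\n", pid, name, $2 / 1024 }' \
        "$proc_dir"/[0-9]*/status 2>/dev/null |
        sort -k3,3 -rn | head -n 10
}
```

Run as root so that every process's status file is readable. Processes that exit while the scan is running are ignored.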

Disk I/O

In searching for load issues caused by disk I/O, access time is the driving factor. Disk I/O is the input/output operation on a physical disk drive. When the server reads data from or writes data to a file on disk, the processor waits for that operation to finish. Since older hard drives are mechanical, the server must also wait for the disk to rotate to the required sector.

Typically, the easiest method to observe high disk I/O throughput is to use the iotop command. Running this provides an easy readout displaying where disk I/O may be originating. 

Total DISK READ :       0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
     1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % systemd --switched-root --system --deserialize 22
     2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
     4 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/0:0H]
     6 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
     7 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/0]
     8 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_bh]
     9 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_sched]
    10 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [lru-add-drain]
    11 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/0]
    12 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/1]
    13 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/1]

Another command-line tool that can show disk I/O is atop. The atop command provides a more comprehensive view of the hardware.


Looking at the DSK row, we can see the following info.

DSK |  vda |  busy 0% |  read 0 |  write 5 |  KiB/r  0 |  KiB/w  33  | MBr/s 0.0  | MBw/s    0.0  | avq     1.00  | avio 0.20 ms

One final command, which measures disk write throughput, can be seen here.

root@host [~]# dd if=/dev/zero of=diskbench bs=1M count=1024 conv=fdatasync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 2.57465 s, 417 MB/s
root@host [~]#

This command performs an actual write test against the disk to measure throughput. It writes 1 GiB of zeros to a file named diskbench, flushes the data to disk (conv=fdatasync), and then reports how long the transfer took and the resulting speed. In this case, it wrote at 417 MB/s.
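The dd test leaves its diskbench file behind, which is easy to forget on a server that is already under pressure. A hedged sketch of a wrapper that cleans up after itself follows; the function name disk_write_test and the temporary file naming are assumptions, and it relies on GNU dd for bs=1M and conv=fdatasync.

```shell
# Write COUNT mebibytes of zeros to a temporary file in DIR, print the dd
# summary line, and remove the file afterwards.
#   $1  directory on the disk to test (default: current directory)
#   $2  number of 1 MiB blocks to write (default: 1024, about 1 GiB)
disk_write_test() {
    dir="${1:-.}"
    count="${2:-1024}"
    tmp=$(mktemp "$dir/diskbench.XXXXXX") || return 1
    # conv=fdatasync makes dd flush data to disk before reporting, so the
    # figure is not just the speed of the page cache.
    LC_ALL=C dd if=/dev/zero of="$tmp" bs=1M count="$count" conv=fdatasync 2>&1 |
        tail -n 1
    rm -f "$tmp"
}
```

For example, disk_write_test /home 1024 would test the filesystem that holds /home. Keep in mind that the test itself generates heavy disk I/O, so it is best run once the load event has passed.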

To address load caused by I/O wait, we must lower the number of reads and writes, tune the configuration of I/O-heavy services (like MySQL), or use a faster disk drive. Modern SSDs reduce this issue significantly. More advanced techniques can be used but require a developer or systems administrator to implement.


Typically, only a few factors define load caused by networking issues. These include network saturation, incorrectly configured networking protocols, and malicious traffic.

Symptoms of network saturation include, but are not limited to, dropped packets, unreachable websites, and increased server load (from trying to digest the influx of inbound traffic). Additionally, suppose a web application firewall or software firewall is in place, and firewall rules are misconfigured (such as trying to block a significant number of IPs from multiple countries). In that case, the load can occur due to the increased workload from that application.

With the rise in streaming, running multiple large video streams from a site can also stress the outbound network connections. This stress can reduce the overall flow of traffic, increasing load considerably.


One of the tools we can use to examine the amount of traffic we are receiving is IPTraf. 


The iptraf-ng command brings up a menu-driven, text-based interface that offers multiple options for viewing the network connections. In the menu, first select IP traffic monitor and then All interfaces to show all traffic on eth0, lo, docker, and other available interfaces.

When selecting “General interface statistics,” we see the following info. 

iptraf-ng 1.1.4
┌ Iface ── Total ── IPv4 ── IPv6 ── NonIP ── BadIP ── Activity 
│ lo       68       68      0       0        0        6.98 kbps          
│ eth0     541      507     34      0        0        23.22 kbps         
│ docker0  0        0       0       0        0        0.00 kbps  

When selecting “Statistical breakdowns,” we see this information. 

iptraf-ng 1.1.4
┌ Proto/Port ─ Pkts ─ Bytes ─ PktsTo ─ BytesTo ─ PktsFrom ─ BytesFrom
│ TCP/443       528    226088  289     109761     239       116327
│ UDP/53        6      563     3       204        3         359
│ TCP/80        22     2475    13      1289       9         1186
│ UDP/546       6      820     0       0          6         820
│ UDP/547       6      820     6       820        0         0
│ TCP/21        18     586     10      488        8         1098
│ TCP/110       16     904     10      536        6         368

This tool provides a unique view of the amount of traffic moving in and out of the server. It can help us track down network bottlenecks by showing how much traffic we are receiving on each interface and port.

To lower the load caused by these issues, we must lower or limit the type or amount of traffic. We must also ensure that our network configurations are not contributing to this issue. 
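Before limiting traffic, it helps to know which remote addresses hold the most open connections. The following is a minimal sketch, assuming the default column layout of ss -tn (state in the first column, peer address and port in the fifth); the function name connections_per_ip is hypothetical.

```shell
# Count established TCP connections per remote address, busiest first.
# Reads "ss -tn" output on stdin.
connections_per_ip() {
    awk '$1 == "ESTAB" {
             peer = $5
             sub(/:[0-9]+$/, "", peer)   # drop the trailing :port
             count[peer]++
         }
         END { for (ip in count) print count[ip], ip }' |
        sort -rn
}
```

A typical invocation would be ss -tn | connections_per_ip | head -n 20. An address far above the rest is a candidate for rate limiting or a firewall rule.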


A server will always have a slight load, which is expected. Linux can manage most load concerns on its own, but we must step in when needed to locate and lower high load. The tools above provide a solid base for tracking down load problems no matter where they originate.

