I used Laravel Forge to launch 3 servers on Amazon AWS.
2 servers are t2.micro
1 server is on a t2.nano
All 3 servers use all of the CPU available to them, causing the sites to be sluggish.
In addition, while the t2.micro instances keep running, the t2.nano actually crashes.
According to top, the kswapd0 process is using all of the available CPU.
All 3 servers experience the same issue.
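Since kswapd0 is the kernel thread that reclaims memory pages, a quick look at memory and swap on each instance seems relevant for anyone trying to reproduce this. A minimal sketch, assuming a Linux /proc filesystem and the procps version of ps; the function names are my own and not part of any tool:

```shell
#!/bin/sh
# Print total/available memory and total/free swap in MiB, read from /proc/meminfo.
mem_summary() {
    awk '/^(MemTotal|MemAvailable|SwapTotal|SwapFree):/ { printf "%s %d MiB\n", $1, $2 / 1024 }' /proc/meminfo
}

# List the five processes with the largest resident memory (plus the header line).
top_memory_processes() {
    ps -eo pid,rss,comm --sort=-rss | head -n 6
}

mem_summary
top_memory_processes
```

A SwapTotal of 0 MiB together with a small MemAvailable would at least show whether the instance is simply short on memory.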
I started researching and I found that there are a few forum discussions on the internet about this issue:
- https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457
- https://askubuntu.com/questions/259739/kswapd0-is-taking-a-lot-of-cpu
The first link suggests that there was a bug in the kernel. However, that bug was fixed in 4.4.0-43. My servers are on 4.4.0-78, so it shouldn't be this bug.
Here is the result of running uname -a on one of the servers:
Linux frosty-abyss 4.4.0-78-generic #99-Ubuntu SMP Thu Apr 27 15:29:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
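The comparison I am making here is just a version-order check. A minimal sketch, assuming GNU sort (for -V) and that the fix landed in 4.4.0-43 as the bug discussion says; version_at_least is a helper name I made up:

```shell
#!/bin/sh
# True when version $1 sorts at or after version $2 (relies on GNU sort -V).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strip the flavour suffix, e.g. 4.4.0-78-generic -> 4.4.0-78.
running=$(uname -r | sed 's/-[a-z].*$//')
fixed_in="4.4.0-43"   # taken from the Launchpad bug discussion, not verified by me

if version_at_least "$running" "$fixed_in"; then
    echo "kernel $running is at or after $fixed_in"
else
    echo "kernel $running is older than $fixed_in"
fi
```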
Even though my kernel is newer than the one with the bug, my servers show many of the same symptoms.
For example, if I run:
sudo sh -c "sync && echo 3 > /proc/sys/vm/drop_caches"
the issue goes away temporarily.
People affected by the bug mentioned above reported a similar experience.
Unfortunately, this is not a permanent solution. After some time, the problem comes back.
So, I got quite frustrated and scheduled a task to run this command (sync && echo 3 > /proc/sys/vm/drop_caches) once a day. This worked for a couple of weeks, and then the problem came back even though the task was still running successfully.
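For anyone who wants to reproduce the scheduled task, here is a rough sketch of what it amounts to. The script path in the crontab comments and the optional target argument are assumptions for illustration, not what Forge generates:

```shell
#!/bin/sh
# Flush dirty pages to disk, then ask the kernel to drop the page cache,
# dentries and inodes. Must run as root against the real /proc file.
# The optional argument is a hypothetical override of the target file,
# which allows a dry run against a scratch file.
drop_caches() {
    target="${1:-/proc/sys/vm/drop_caches}"
    sync && echo 3 > "$target"
}

# Only touch the real file when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    drop_caches
fi

# Example root crontab entries (the script path is an assumption):
#   0 3 * * *     /usr/local/bin/drop-caches.sh --run   # once a day at 03:00
#   */15 * * * *  /usr/local/bin/drop-caches.sh --run   # every 15 minutes
```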
I rebooted the server and everything seemed okay again. So, I updated the schedule to run every 15 minutes (I was really desperate). To my surprise, it happened again after a while. I went back to research mode and, after digging through the comments on the kernel bug report, I found a suggestion to do this:
touch /etc/udev/rules.d/40-vm-hotadd.rules
reboot
I have no clue what that actually does, but in desperation I tried it.
I don't know yet whether it has resolved the issue, but I am hoping it has.
I would bet that it hasn't, though: I don't think this bug is what is causing my problem, and as far as I can tell that change only disables something related to that specific bug.
Obviously, dropping the cache on a schedule was never the right way to solve the problem, and I think it also hurts my sites' performance since the cache keeps getting thrown away.
So what are my next steps?
My servers are incredibly simple. I just have one site on each of the servers.
One runs a WordPress site.
One runs a complex Laravel site.
One runs a ridiculously simple Laravel site (the t2.nano).
I don't think it is relevant, but I use Let's Encrypt SSL certificates on 2 of the 3 servers.
The Laravel sites auto-deploy from a GitHub repository (but I haven't touched the code for a while).
I use PHP 5.6 on the WordPress server.
I use PHP 7.1 on the Laravel servers.
It really is a very standard setup so I am quite confused why this is happening. Is anyone else facing this issue? I find it hard to believe that Forge can't properly provision a simple server but I can't see what I might be doing incorrectly either.