At RisingStack, we have used Ghost from the very beginning, and we love it! As of today, we have more than 125 blog posts, thousands of unique visitors every day, and 1.5 million pageviews in 2016 overall.
In this post I’m going to share the story of how we discovered a Node.js memory leak in ghost@0.9.0, and what role Trace played in the process of detecting and fixing it.
UPDATE: This article mentions Trace, RisingStack’s Node.js monitoring platform, several times. In October 2017, Trace was merged with Keymetrics’ APM solution. Click here to give it a try!
What’s Ghost?
Just a blogging platform
Ghost is a fully open-source publishing platform written entirely in JavaScript. It uses Node.js for the backend, Ember.js for the admin side and Handlebars.js to power the rendering.
Ghost is actively developed – in the last 30 days, it had 10 authors with 66 commits to the master branch. The project’s roadmap can be found here: https://trello.com/b/EceUgtCL/ghost-roadmap.
You can open an account at https://ghost.org/ and start writing instantly – or you can host your own version of Ghost, just like we do.
Our Ghost Deployment
Firstly, I’d like to give you a quick overview of how we deploy and use Ghost in production at RisingStack. We use Ghost as an npm module, required into a bigger project, something like this:
// adding Trace to monitor the blog
require('@risingstack/trace')
const path = require('path')
const ghost = require('ghost')

ghost({
  config: path.join(__dirname, 'config.js')
}).then(function (ghostServer) {
  ghostServer.start()
})
Deployments are done using Circle CI, which creates a Docker image, pushes it to a Docker registry and deploys it to a staging environment. If everything looks good, the updates are moved to the production blog you are reading now. As a backing database, the blog uses PostgreSQL.
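The config.js required in the snippet above is Ghost’s standard configuration file. As an illustration, a trimmed-down, hypothetical production config pointing at PostgreSQL could look something like this (the URL, hostnames and credentials below are placeholders, not our actual values):

// config.js – a minimal, hypothetical production config for Ghost 0.x
// every value below is a placeholder, not our real setup
var path = require('path')

module.exports = {
  production: {
    url: 'https://blog.example.com',
    database: {
      client: 'pg', // use the PostgreSQL driver
      connection: {
        host: process.env.POSTGRES_HOST,
        user: process.env.POSTGRES_USER,
        password: process.env.POSTGRES_PASSWORD,
        database: 'ghost',
        port: 5432
      }
    },
    server: {
      host: '0.0.0.0', // listen on all interfaces inside the container
      port: 2368
    },
    paths: {
      contentPath: path.join(__dirname, '/content/')
    }
  }
}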
The Node.js Memory Leak
As we like to keep our dependencies up-to-date, we updated to ghost@0.9.0
as soon as it came out. Once we did this, our alerts started to fire, as memory usage started to grow:
Luckily, we had alerts set up for memory usage in Trace, which notified us that something was not right. As Trace integrates seamlessly with Opsgenie and PagerDuty, we could have set up alerts for those channels as well.
We set up alerts for the blog service at 180 MB and 220 MB, because it usually consumes around 150 MB when everything’s all right.
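To give you a sense of what such an alert watches, here is a minimal, conceptual sketch of a heap threshold check using Node’s built-in process.memoryUsage() – Trace collects this metric and does the alerting for us, so the snippet is only an illustration, and the 180 MB limit simply mirrors the lower threshold we used:

// conceptual sketch only – Trace does this collection and alerting for us
const MB = 1024 * 1024

setInterval(function () {
  const heapUsedMb = process.memoryUsage().heapUsed / MB

  if (heapUsedMb > 180) {
    // this is where an alert (and, in our setup, an automatic heapdump) would fire
    console.warn('High memory usage: ' + heapUsedMb.toFixed(0) + ' MB')
  }
}, 10 * 1000).unref() // don't keep the process alive just for this check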
What was even better is that the alerting was set up in a way that triggered actions on the collector level. What does this mean? It means that Trace could create a memory heapdump automatically, without human intervention. Once we started to investigate the issue, the heapdump was already in the Profiler section of Trace, in the format supported by the Google Chrome DevTools.
This enabled us to start looking at the problem instantly, exactly as it happened in the production system, rather than trying to reproduce the issue in a local development environment.
Also, as we could take multiple heapdumps from the application itself, we could compare them using the comparison view of the DevTools.
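If you are not using Trace but want to take heapdumps from a running process yourself – for example, to capture a healthy baseline you can compare against later – the heapdump npm module is a common way to do it. A minimal sketch, assuming it’s fine to write the snapshot to the working directory:

// minimal sketch using the heapdump npm module (not part of our Trace setup)
const heapdump = require('heapdump')

// writes a .heapsnapshot file that can be loaded into the Chrome DevTools Profiler
heapdump.writeSnapshot('./' + Date.now() + '.heapsnapshot', function (err, filename) {
  if (err) {
    return console.error('Could not write heap snapshot:', err)
  }
  console.log('Heap snapshot written to', filename)
})

The module also supports triggering a snapshot externally by sending the process a USR2 signal on Linux, which is handy when you don’t want to touch the application code at all.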
How do you use the comparison view to find the source of a problem? In the picture above, you can see that I compared the heapdump Trace collected automatically when the alert was triggered with a heapdump that had been requested earlier, when everything was fine with the service.
What you have to look for is the #Delta column, which shows +772 in our case. This means that at the time our high memory usage alert was triggered, the heapdump contained an extra 772 objects. At the bottom of the picture you can see what these elements were, and that they had something to do with the lodash module.
Figuring this out otherwise would be extremely challenging since you’d have to reproduce the issue in a local environment – which is tricky if you don’t even know what caused it.
Should I update? Well..
The final cause of the leak was found by Katharina Irrgang, a core Ghost contributor. To check out the whole thread, take a look at the GitHub issue: https://github.com/TryGhost/Ghost/issues/7189. A fix was shipped in 0.10.1 – but updating to it caused another issue: slow response times.
Slow Response Times
Once we upgraded to the new version, we ran into a new problem – our blog’s response time started to degrade. The 95th percentile grew from 100 ms to almost 300 ms. It instantly triggered the alerts we had set for response times.
To investigate the slow response times, we started taking CPU profiles using Trace. For now, we are still looking into the exact reason, but so far we suspect something is off with how moment.js is used.
We will update the post once we have found out why it happens.
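If you want to capture a comparable CPU profile yourself, without Trace, the v8-profiler npm module is one way to do it. A rough sketch, assuming a 30-second window is enough to catch the slow requests (the profile name, file name and duration below are arbitrary):

// rough sketch using the v8-profiler npm module – not our exact setup
const profiler = require('v8-profiler')
const fs = require('fs')

// start sampling the running process
profiler.startProfiling('slow-responses', true)

setTimeout(function () {
  // stop after 30 seconds and export a .cpuprofile readable by Chrome DevTools
  const profile = profiler.stopProfiling('slow-responses')

  profile.export()
    .pipe(fs.createWriteStream('slow-responses.cpuprofile'))
    .on('finish', function () {
      profile.delete() // free the profile's memory in the V8 heap
      console.log('CPU profile saved')
    })
}, 30 * 1000)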
Conclusion
I hope this article helped you to figure out what to do in case you’re experiencing memory leaks in your Node.js applications. If you’d like to get memory heapdumps automatically in a case like this, connect your services with Trace and enable alerting just like we did earlier.
If you have any additional questions, you can reach me in the comments section!