At RisingStack, we have been using Ghost from the very beginning, and we love it! As of today, we have more than 125 blog posts, thousands of unique visitors every day, and 1.5 million pageviews in 2016 overall.

In this post I’m going to share the story of how we discovered a Node.js memory leak in Ghost, and what role Trace played in the process of detecting and fixing it.

What's Ghost?

Just a blogging platform

Node.js Memory Leak - The Ghost blogging platform's logo

Ghost is a fully open-source publishing platform written entirely in JavaScript. It uses Node.js for the backend, Ember.js for the admin side and Handlebars.js to power the rendering.

Ghost is actively developed - in the last 30 days, it had 10 authors with 66 commits to the master branch. The project's roadmap can be found here: https://trello.com/b/EceUgtCL/ghost-roadmap.

You can open an account at https://ghost.org/ and start writing instantly - or you can host your own version of Ghost, just like we do.

Our Ghost Deployment

First, I'd like to give you a quick overview of how we deploy and use Ghost in production at RisingStack. We use Ghost as an npm module, required into a bigger project, something like this:

// adding Trace to monitor the blog
require('@risingstack/trace')
const path = require('path')
const ghost = require('ghost')

ghost({
  config: path.join(__dirname, 'config.js')
}).then(function (ghostServer) {
  ghostServer.start()
})

Deployments are done using CircleCI, which creates a Docker image, pushes it to a Docker registry and deploys it to a staging environment. If everything looks good, the updates are moved to the production blog you are reading now. As a backing database, the blog uses PostgreSQL.

The Node.js Memory Leak

As we like to keep our dependencies up-to-date, we updated to the new Ghost release as soon as it came out. Once we did this, our alerts started to fire, as memory usage started to grow:

Node.js Memory leak in ghost - Trace memory metrics

Luckily, we had alerts set up for memory usage in Trace, which notified us that something was not right. As Trace integrates seamlessly with OpsGenie and PagerDuty, we could also have routed these alerts to those channels.

We set up alerts for the blog service at 180 MB and 220 MB, because it usually consumes around 150 MB when everything’s all right.
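To make the two thresholds concrete, here is a minimal sketch of the same idea done by hand inside a Node.js process. It is not how Trace implements alerting; the function names are made up for illustration, and it assumes the metric being watched is the resident set size (RSS) reported by `process.memoryUsage()`.

```javascript
// Thresholds taken from the figures quoted above; everything else here is
// a hypothetical illustration, not Trace's implementation.
const MB = 1024 * 1024
const WARNING_THRESHOLD_MB = 180
const CRITICAL_THRESHOLD_MB = 220

// Map a resident set size in bytes to an alert level.
function memoryAlertLevel (rssBytes) {
  const rssMb = rssBytes / MB
  if (rssMb >= CRITICAL_THRESHOLD_MB) return 'critical'
  if (rssMb >= WARNING_THRESHOLD_MB) return 'warning'
  return 'ok'
}

// Sample the current process and report its level.
function checkMemory () {
  const { rss } = process.memoryUsage()
  return { rssMb: Math.round(rss / MB), level: memoryAlertLevel(rss) }
}

console.log(checkMemory())
```

Having two levels leaves room to react to the warning before the critical threshold is reached.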

Setting up alerts for Node.js memory leaks in Trace

Even better, the alerting was set up so that it triggered actions on the collector level. What does this mean? It means that Trace could create a memory heapdump automatically, without human intervention. By the time we started to investigate the issue, the heapdump was already in the Profiler section of Trace, in the format supported by Google Chrome DevTools.

This enabled us to start looking at the problem instantly, and to see it as it happened in the production system instead of trying to reproduce the issue in a local development environment.

Also, as we could take multiple heapdumps from the application itself, we could compare them using the comparison view of the DevTools.
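If you want to experiment with the idea of an alert-triggered heapdump without Trace, something along these lines may work. This is only a sketch under assumptions: `createHeapdumpTrigger` is a hypothetical helper, the threshold is checked against RSS, and the snapshot is written with Node's built-in `v8.writeHeapSnapshot()` (available in Node.js 11.13 and later), which is not necessarily what the Trace collector does.

```javascript
const v8 = require('v8')

const MB = 1024 * 1024

// Returns a checker that writes at most one heap snapshot once RSS crosses
// the threshold. `writeSnapshot` is injectable so the logic can be exercised
// without actually dumping the heap.
function createHeapdumpTrigger (thresholdMb, writeSnapshot = () => v8.writeHeapSnapshot()) {
  let alreadyDumped = false
  return function check (rssBytes = process.memoryUsage().rss) {
    if (alreadyDumped || rssBytes < thresholdMb * MB) return null
    alreadyDumped = true
    // v8.writeHeapSnapshot() returns the name of the generated
    // .heapsnapshot file, which Chrome DevTools can load.
    return writeSnapshot()
  }
}
```

A service could call the returned `check` function from a timer, for example `setInterval(check, 30000).unref()`. Keep in mind that writing a snapshot is synchronous, blocks the event loop while it runs, and needs a fair amount of extra memory, so limiting it to one dump per process is a deliberate choice here.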

Memory heapdump comparison with Trace and Chrome DevTools

How do you use the comparison view to find the source of a problem? In the picture above, you can see that I compared the heapdump that Trace automatically collected when the alert was triggered with a heapdump that was requested earlier, when everything was OK with the service.

What you have to look for is the #Delta column, which shows +772 in our case. This means that at the time our high memory usage alert was triggered, the heapdump had an extra 772 objects in it. At the bottom of the picture you can see what these objects were, and that they have something to do with the lodash module.

Figuring this out otherwise would be extremely challenging since you’d have to reproduce the issue in a local environment - which is tricky if you don’t even know what caused it.
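Conceptually, the #Delta column is the net change in object count per constructor between two snapshots. The sketch below illustrates that arithmetic on simplified summaries. It does not parse real `.heapsnapshot` files, the `countDeltas` helper is hypothetical, and the counts are made up for illustration rather than taken from our heapdumps.

```javascript
// Compute the per-constructor object-count delta between two summaries.
// Each summary is a plain object: { constructorName: objectCount }.
function countDeltas (before, after) {
  const names = new Set([...Object.keys(before), ...Object.keys(after)])
  const deltas = []
  for (const name of names) {
    const delta = (after[name] || 0) - (before[name] || 0)
    if (delta !== 0) deltas.push({ name, delta })
  }
  // Largest growth first, which is where a leak usually shows up.
  return deltas.sort((a, b) => b.delta - a.delta)
}

// Made-up counts for illustration only.
const baseline = { Object: 1000, Array: 400, closure: 250 }
const alerted = { Object: 1772, Array: 410, closure: 250 }

console.log(countDeltas(baseline, alerted))
// → [ { name: 'Object', delta: 772 }, { name: 'Array', delta: 10 } ]
```

Sorting by growth mirrors how you would read the comparison view: start with the rows that gained the most objects and follow their retainers from there.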

Should I update? Well...

The root cause of the leak was found by Katharina Irrgang, a core Ghost contributor. To read the whole thread, take a look at the GitHub issue: https://github.com/TryGhost/Ghost/issues/7189. A fix was shipped with 0.10.1, but updating to it caused another issue for us: slow response times.

Slow Response Times

Once we upgraded to the new version, we ran into a new problem: our blog's response time started to degrade. The 95th percentile grew from 100 ms to almost 300 ms, which instantly triggered our alerts set for response times.
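For readers less familiar with percentiles: the 95th percentile is the response time that 95% of requests stay at or below. Here is a minimal sketch using the nearest-rank method. Trace may well calculate it differently, and the sample values are hypothetical.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile (samples, p) {
  if (samples.length === 0) throw new Error('no samples')
  const sorted = [...samples].sort((a, b) => a - b)
  const rank = Math.ceil((p / 100) * sorted.length)
  return sorted[Math.max(rank, 1) - 1]
}

// Hypothetical response times in milliseconds.
const responseTimesMs = [80, 85, 90, 95, 100, 100, 105, 110, 120, 300]

console.log(percentile(responseTimesMs, 50)) // → 100
console.log(percentile(responseTimesMs, 95)) // → 300
```

The median barely moves when a few requests become slow, while the 95th percentile picks them up, which is why it is a common choice for response time alerts.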

Slow response time graph from Trace

For the slow response time we started to take CPU profiles using Trace. For now, we are still investigating the exact reason, but so far we suspect something is off with how moment.js is used.

CPU profile analysis with Trace

We will update the post once we have found out why it happens.

Conclusion

I hope this article helped you to figure out what to do in case you're experiencing memory leaks in your Node.js applications. If you'd like to get memory heapdumps automatically in a case like this, connect your services with Trace and enable alerting just like we did earlier.

If you have any additional questions, you can reach me in the comments section!