
Node.js Post-Mortem Diagnostics & Debugging

Post-mortem diagnostics & debugging come into the picture when you want to figure out what went wrong with your Node.js application in production.

In this chapter of Node.js at Scale, we will take a look at node-report, a core project which aims to help you do post-mortem diagnostics & debugging.

The node-report diagnostics module

The purpose of the module is to produce a human-readable diagnostics summary file. It is meant to be used in both development and production environments.

The generated report includes:

  • JavaScript and native stack traces,
  • heap statistics,
  • system information,
  • resource usage,
  • loaded libraries.

Currently node-report supports Node.js v4, v6, and v7 on AIX, Linux, MacOS, SmartOS, and Windows.

Adding it to your project just takes an npm install and require:

npm install node-report --save  
//index.js
require('node-report')  

Once you add node-report to your application, it will automatically listen for uncaught exceptions and fatal error events, and trigger a report generation. Report generation can also be triggered by sending a USR2 signal to the Node.js process.
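
For example, once you know the process id of your running application, you can request a report from another terminal (the pid below is illustrative):

# find the pid of the Node.js process, then signal it
ps aux | grep node
kill -USR2 20988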

Use cases of node-report

Diagnostics of exceptions

For the sake of simplicity, imagine you have the following endpoint in one of your applications:

function myListener(request, response) {  
  switch (request.url) {
  case '/exception':
    throw new Error('*** exception.js: uncaught exception thrown from function myListener()');
  }
}

This code simply throws an exception once the /exception route handler is called. To make sure we get the diagnostics information, we have to add the node-report module to our application, as shown previously.

require('node-report')
const http = require('http')

function myListener(request, response) {
  switch (request.url) {
    case '/exception':
      throw new Error('*** exception.js: uncaught exception thrown from function myListener()')
  }
}

// a minimal server around the endpoint - the port is illustrative
http.createServer(myListener).listen(8080)

Let's see what happens once the endpoint gets called! Our report just got written into a file:

Writing Node.js report to file: node-report.20170506.100759.20988.001.txt  
Node.js report completed  



The header

Once you open the file, you'll get something like this:

=================== Node Report ===================

Event: exception, location: "OnUncaughtException"  
Filename: node-report.20170506.100759.20988.001.txt  
Dump event time:  2017/05/06 10:07:59  
Module load time: 2017/05/06 10:07:53  
Process ID: 20988  
Command line: node demo/exception.js

Node.js version: v6.10.0  
(ares: 1.10.1-DEV, http_parser: 2.7.0, icu: 58.2, modules: 48, openssl: 1.0.2k, 
 uv: 1.9.1, v8: 5.1.281.93, zlib: 1.2.8)

node-report version: 2.1.2 (built against Node.js v6.10.0, 64 bit)

OS version: Darwin 16.4.0 Darwin Kernel Version 16.4.0: Thu Dec 22 22:53:21 PST 2016; root:xnu-3789.41.3~3/RELEASE_X86_64

Machine: Gergelys-MacBook-Pro.local x86_64  

You can think of this part as a header for your diagnostics summary - it includes:

  • the event that triggered the report creation,
  • how the Node.js application was started (node demo/exception.js),
  • what Node.js version was used,
  • the host operating system,
  • and the version of node-report itself.

The stack traces

The next part of the report includes the captured stack traces, both for JavaScript and the native part:

=================== JavaScript Stack Trace ===================
Server.myListener (/Users/gergelyke/Development/risingstack/node-report/demo/exception.js:19:5)  
emitTwo (events.js:106:13)  
Server.emit (events.js:191:7)  
HTTPParser.parserOnIncoming [as onIncoming] (_http_server.js:546:12)  
HTTPParser.parserOnHeadersComplete (_http_common.js:99:23)  

In the JavaScript part, you can see:

  • the stack trace (which function called which one with line numbers),
  • and where the exception occurred.

In the native part, you can see the same thing - just on a lower level, in the native code of Node.js:

=================== Native Stack Trace ===================
 0: [pc=0x103c0bd50] nodereport::OnUncaughtException(v8::Isolate*) [/Users/gergelyke/Development/risingstack/node-report/api.node]
 1: [pc=0x10057d1c2] v8::internal::Isolate::Throw(v8::internal::Object*, v8::internal::MessageLocation*) [/Users/gergelyke/.nvm/versions/node/v6.10.0/bin/node]
 2: [pc=0x100708691] v8::internal::Runtime_Throw(int, v8::internal::Object**, v8::internal::Isolate*) [/Users/gergelyke/.nvm/versions/node/v6.10.0/bin/node]
 3: [pc=0x3b67f8092a7] 
 4: [pc=0x3b67f99ab41] 
 5: [pc=0x3b67f921533] 

Heap and garbage collector metrics

You can see in the heap metrics how each heap space performed during the creation of the report:

  • new space,
  • old space,
  • code space,
  • map space,
  • large object space.

These metrics include:

  • memory size,
  • committed memory size,
  • capacity,
  • used size,
  • available size.

To better understand how memory handling in Node.js works, check out our earlier articles on the topic.

=================== JavaScript Heap and GC ===================
Heap space name: new_space  
    Memory size: 2,097,152 bytes, committed memory: 2,097,152 bytes
    Capacity: 1,031,680 bytes, used: 530,736 bytes, available: 500,944 bytes
Heap space name: old_space  
    Memory size: 3,100,672 bytes, committed memory: 3,100,672 bytes
    Capacity: 2,494,136 bytes, used: 2,492,728 bytes, available: 1,408 bytes

Total heap memory size: 8,425,472 bytes  
Total heap committed memory: 8,425,472 bytes  
Total used heap memory: 4,283,264 bytes  
Total available heap memory: 1,489,426,608 bytes

Heap memory limit: 1,501,560,832  
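
If you want to inspect similar numbers at runtime, the built-in v8 module exposes them (v8.getHeapSpaceStatistics is available from Node.js v6.0.0):

const v8 = require('v8')

// overall heap numbers, similar to the report's totals
console.log(v8.getHeapStatistics())

// per-space numbers (new_space, old_space, ...), like the report's breakdown
console.log(v8.getHeapSpaceStatistics())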

Resource usage

The resource usage section includes metrics on:

  • CPU usage,
  • the resident set size,
  • information on page faults,
  • and the file system activity.

=================== Resource usage ===================
Process total resource usage:  
  User mode CPU: 0.119704 secs
  Kernel mode CPU: 0.020466 secs
  Average CPU Consumption : 2.33617%
  Maximum resident set size: 21,965,570,048 bytes
  Page faults: 13 (I/O required) 5461 (no I/O required)
  Filesystem activity: 0 reads 3 writes

System information

The system information section includes:

  • environment variables,
  • resource limits (like open files, CPU time, or max memory size),
  • and loaded libraries.

Diagnostics of fatal errors

The node-report module can also help once you hit a fatal error, like your application running out of memory.

By default, you will get an error message like this:

<--- Last few GCs --->

   23249 ms: Mark-sweep 1380.3 (1420.7) -> 1380.3 (1435.7) MB, 695.6 / 0.0 ms [allocation failure] [scavenge might not succeed].
   24227 ms: Mark-sweep 1394.8 (1435.7) -> 1394.8 (1435.7) MB, 953.4 / 0.0 ms (+ 8.3 ms in 231 steps since start of marking, biggest step 1.2 ms) [allocation failure] [scavenge might not succeed].

On its own, this information is not that helpful. You don't know the context or the state of the application. With node-report, it gets better.

First of all, in the generated post-mortem diagnostics summary you will have a more descriptive event:

Event: Allocation failed - JavaScript heap out of memory, location: "MarkCompactCollector: semi-space copy, fallback in old gen"  

Secondly, you will get the native stack trace - that can help you better understand why the allocation failed.

Diagnostics of blocking operations

Imagine you have the following loops which block your event loop. This is a performance nightmare.

// a simple constructor so the sample is runnable
function MyRecord () {
  this.name = 'foo'
  this.id = 128
  this.account = 98454324
}

var list = []
for (let i = 0; i < 10000000000; i++) {
  for (let j = 0; j < 1000; j++) {
    list.push(new MyRecord())
  }
  for (let j = 0; j < 1000; j++) {
    list[j].id += 1
    list[j].account += 2
  }
  for (let j = 0; j < 1000; j++) {
    list.pop()
  }
}

With node-report, you can request reports even when your process is busy, by sending the USR2 signal. Once you do that, you will receive the stack trace, and you will quickly see where your application spends its time.

(Examples are taken from the node-report repository.)

The API of node-report

Triggering report generation programmatically

The creation of the report can also be triggered using the JavaScript API. This way your report will be saved in a file, just like when it was triggered automatically.

const nodeReport = require('node-report')  
nodeReport.triggerReport()  

Getting the report as a string

Using the JavaScript API, the report can also be retrieved as a string.

const nodeReport = require('node-report')  
const report = nodeReport.getReport()  

Using without automatic triggering

If you don't want to use the automatic triggers (like the fatal error or the uncaught exception), you can opt out of them by requiring the API itself. The file name of the report can be specified as well:

const nodeReport = require('node-report/api')  
nodeReport.triggerReport('name-of-the-report')  

Contribute

If you feel like making Node.js even better, please consider joining the Postmortem Diagnostics working group, where you can contribute to the module.

The Postmortem Diagnostics working group is dedicated to the support and improvement of postmortem debugging for Node.js. It seeks to elevate the role of postmortem debugging for Node, to assist in the development of techniques and tools, and to make techniques and tools known and available to Node.js users.

In the next chapter of the Node.js at Scale series, we will discuss profiling Node.js Applications. If you have any questions, please let me know in the comments section below.

How to Debug Node.js with the Best Tools Available

Debugging - the process of finding and fixing defects in software - can be a challenging task to do in all languages. Node.js is no exception.

Luckily, the tooling for finding these issues has improved a lot in the past few years. Let's take a look at what options you have to find and fix bugs in your Node.js applications!

We will dive into two different aspects of debugging Node.js applications. The first one is logging, so you can keep an eye on production systems and see what events happen there. After logging, we will take a look at how you can debug your applications in development environments.

Logging in Node.js

Logging takes place during the execution of your application, providing an audit trail that can be used to understand the activity of the system and to diagnose problems.

For logging purposes, you have lots of options when building Node.js applications. Some npm modules ship with built-in logging that can be turned on when needed using the debug module. For your own applications, you have to pick a logger too! We will take a look at pino.

Before jumping into logging libraries, let's take a look at the requirements they have to fulfil:

  • timestamps - it is crucial to know which event happened when,
  • formatting - log lines must be easily understandable by humans, and straightforward to parse for applications,
  • log destination - it should always be the standard output/error; applications should not concern themselves with log routing,
  • log levels - log events have different severity levels, in most cases, you won't be interested in debug or info level events.

The debug module of Node.js

Recommendation: use for modules published on npm

Let's see how it makes your life easier! Imagine that you have a Node.js module that serves incoming requests, as well as sends out some of its own.

// index.js
const debugHttpIncoming = require('debug')('http:incoming')  
const debugHttpOutgoing = require('debug')('http:outgoing')

let outgoingRequest = {  
  url: 'https://risingstack.com'
}

// sending some request
debugHttpOutgoing('sending request to %s', outgoingRequest.url)

let incomingRequest = {  
  body: '{"status": "ok"}'
}

// serving some request
debugHttpIncoming('got JSON body %s', incomingRequest.body)  

Once you have it, start your application this way:

DEBUG=http:incoming,http:outgoing node index.js  

The output will be something like this:

(Screenshot: the colorized output of the debug module)
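
In the terminal, each enabled namespace prints as a colorized, prefixed line with a time diff appended - roughly like this (illustrative):

http:outgoing sending request to https://risingstack.com +0ms
http:incoming got JSON body {"status": "ok"} +2ms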

The debug module also supports wildcards with the * character. To get the same result as before, we could simply start our application with DEBUG=http:* node index.js.

What's really nice about the debug module is that a lot of modules on npm (like Express or Koa) ship with it - as of the time of writing this article, more than 14,000 modules.

The pino logger module

Recommendation: use for your applications when performance is key

Pino is an extremely fast Node.js logger, inspired by bunyan. In many cases, pino is over 6x faster than alternatives like bunyan or winston:

benchWinston*10000:     2226.117ms  
benchBunyan*10000:      1355.229ms  
benchDebug*10000:       445.291ms  
benchLogLevel*10000:    322.181ms  
benchBole*10000:        291.727ms  
benchPino*10000:        269.109ms  
benchPinoExtreme*10000: 102.239ms  

Getting started with pino is straightforward:

const pino = require('pino')()

pino.info('hello pino')  
pino.info('the answer is %d', 42)  
pino.error(new Error('an error'))  

The above snippet produces the following log lines:

{"pid":28325,"hostname":"Gergelys-MacBook-Pro.local","level":30,"time":1492858757722,"msg":"hello pino","v":1}
{"pid":28325,"hostname":"Gergelys-MacBook-Pro.local","level":30,"time":1492858757724,"msg":"the answer is 42","v":1}
{"pid":28325,"hostname":"Gergelys-MacBook-Pro.local","level":50,"time":1492858757725,"msg":"an error","type":"Error","stack":"Error: an error\n    at Object.<anonymous> (/Users/gergelyke/Development/risingstack/node-js-at-scale-debugging/pino.js:5:12)\n    at Module._compile (module.js:570:32)\n    at Object.Module._extensions..js (module.js:579:10)\n    at Module.load (module.js:487:32)\n    at tryModuleLoad (module.js:446:12)\n    at Function.Module._load (module.js:438:3)\n    at Module.runMain (module.js:604:10)\n    at run (bootstrap_node.js:394:7)\n    at startup (bootstrap_node.js:149:9)\n    at bootstrap_node.js:509:3","v":1}

The Built-in Node.js Debugger module

Node.js ships with an out-of-process debugging utility, accessible via a TCP-based protocol and built-in debugging client. You can start it using the following command:

$ node debug index.js

This is not a fully featured debugger - you won't have a fancy user interface; however, simple inspections are possible.

You can add breakpoints to your code by adding the debugger statement into your codebase:

const express = require('express')  
const app = express()

app.get('/', (req, res) => {  
  debugger
  res.send('ok')
})

// start the server so the route can be hit
app.listen(process.env.PORT || 3000)

This way the execution of your script will be paused at that line, then you can start using the commands exposed by the debugging agent:

  • cont or c - continue execution,
  • next or n - step next,
  • step or s - step in,
  • out or o - step out,
  • repl - to evaluate script's context.

V8 Inspector Integration for Node.js

The V8 inspector integration allows attaching Chrome DevTools to Node.js instances for debugging by using the Chrome Debugging Protocol.

V8 Inspector can be enabled by passing the --inspect flag when starting a Node.js application:

$ node --inspect index.js

In most cases, it makes sense to stop the execution of the application at the very first line of your codebase and continue from there. This way you won't miss anything that runs during startup.

$ node --inspect-brk index.js


How to Debug Node.js with Visual Studio Code

Most modern IDEs have some support for debugging applications - so does VS Code. It has built-in debugging support for Node.js.

What you can see below is the debugging interface of VS Code - with the context variables, watched expressions, call stack, and breakpoints.

(Screenshot: the VS Code debugging layout. Image credit: Visual Studio Code)

One of the most valuable features of the integrated Visual Studio Code debugger is the ability to add conditional breakpoints. With conditional breakpoints, the breakpoint will only be hit when the expression evaluates to true.

If you need more advanced settings, VS Code comes with a configuration file, .vscode/launch.json, which describes how the debugger should be launched. The default launch.json looks something like this:

{
    "version": "0.2.0",
    "configurations": [
        {
            "type": "node",
            "request": "launch",
            "name": "Launch Program",
            "program": "${workspaceRoot}/index.js"
        },
        {
            "type": "node",
            "request": "attach",
            "name": "Attach to Port",
            "address": "localhost",
            "port": 5858
        }
    ]
}

For advanced configuration settings of launch.json go to https://code.visualstudio.com/docs/editor/debugging#_launchjson-attributes.

For more information on debugging with Visual Studio Code, visit the official site: https://code.visualstudio.com/docs/editor/debugging.

Next Up

If you have any questions about debugging, please let me know in the comments section.

In the next episode of the Node.js at Scale series, we are going to talk about Node.js Post-Mortem Diagnostics & Debugging.

The Definitive Guide for Monitoring Node.js Applications

In the previous chapters of Node.js at Scale we learned how you can get Node.js testing and TDD right, and how you can use Nightwatch.js for end-to-end testing.

In this article, we will learn about running and monitoring Node.js applications in Production. Let's discuss these topics:

  • What is monitoring?
  • What should be monitored?
  • Open-source monitoring solutions
  • SaaS and On-premise monitoring offerings

What is Node.js Monitoring?

Monitoring means observing the quality of software over time. The available products and tools in this industry usually go by the term Application Performance Monitoring, or APM for short.

If you have a Node.js application in a staging or production environment, you can (and should) do monitoring on different levels:

You can monitor

  • regions,
  • zones,
  • individual servers, and
  • of course, the Node.js software that runs on them.

In this guide, we will deal with the software components only - if you run in a cloud environment, the others are usually taken care of for you.

What should be monitored?

Each application you write in Node.js produces a lot of data about its behavior.

There are different layers an APM tool should collect data from. The more of them are covered, the more insight you'll get into your system's behavior.

  • Service level
  • Host level
  • Instance (or process) level

The list you can find below collects the most crucial problems you'll run into while you maintain a Node.js application in production. We'll also discuss how monitoring helps to solve them and what kind of data you'll need to do so.

Problem 1.: Service Downtimes

If your application is unavailable, your customers can't spend money on your sites. If your APIs are down, your business partners and the services depending on them will fail as well - because of you.

We all know how cringeworthy it is to apologize for service downtimes.

Your topmost priority should be preventing failures and providing 100% availability for your application.

Running a production app comes with great responsibility.

Node.js APMs can easily help you detect and prevent downtimes, since they usually collect service-level metrics.

This data can show whether your application handles requests properly, although it won't always tell you if your public sites or APIs are available.

To have proper coverage on downtimes, we recommend setting up a pinger as well, which can emulate user behavior and provide foolproof data on availability. If you want to cover everything, don't forget to include different regions like the US, Europe, and Asia too.
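
A minimal sketch of such a pinger, using Node's built-in https module (the URL and interval are illustrative):

// pinger.js - periodically checks whether a site responds
const https = require('https')

function ping (url) {
  https.get(url, (res) => {
    console.log(`${new Date().toISOString()} ${url} -> HTTP ${res.statusCode}`)
    res.resume() // drain the response so the socket is freed
  }).on('error', (err) => {
    console.error(`${new Date().toISOString()} ${url} is DOWN: ${err.message}`)
  })
}

setInterval(() => ping('https://example.com/healthcheck'), 30 * 1000)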

Problem 2.: Slow Services, Terrible Response Times

Slow response times have a huge impact on conversion rate, as well as on product usage. The faster your product is, the more customers and the higher user satisfaction you'll have.

Usually, all Node.js APMs can show if your services are slowing down, but interpreting that data requires further analysis.

I recommend doing two things to find the real reasons for slowing services.

  • Collect data on a process level too. Check out each instance of a service to figure out what happens under the hood.
  • Request CPU profiles when your services slow down and analyze them to find the faulty functions (see the sketch below).
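
A rough sketch of capturing such a profile on demand, here with the v8-profiler npm module (assuming its startProfiling/stopProfiling/export API; the duration and file name are illustrative):

const profiler = require('v8-profiler')
const fs = require('fs')

// record CPU activity for a while, then write a .cpuprofile file
function captureCpuProfile (durationMs) {
  profiler.startProfiling('slowdown')
  setTimeout(() => {
    const profile = profiler.stopProfiling('slowdown')
    profile.export()
      .pipe(fs.createWriteStream(`cpu-${Date.now()}.cpuprofile`))
      .on('finish', () => profile.delete())
  }, durationMs)
}

captureCpuProfile(10 * 1000)

The resulting file can be loaded into Chrome DevTools to find the hot functions.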

Eliminating performance bottlenecks enables you to scale your software more efficiently and also to optimize your budget.

Problem 3.: Solving Memory Leaks is Hard

Our Node.js Consulting & Development expertise has allowed us to build huge enterprise systems and help developers make them better.

What we see constantly is that memory leaks in Node.js applications are quite frequent, and that finding out what causes them is among the greatest struggles Node developers face.

This impression is backed by data as well. Our Node.js Developer Survey showed that memory leaks cause a lot of headaches for even the best engineers.

To find memory leaks, you have to know exactly when they happen.

Some APMs collect memory usage data, which can be used to recognize a leak. What you should look for is the steady growth of memory usage that ends in a service crash & restart (as Node runs out of memory after about 1.4 gigabytes by default).

(Screenshot: a Node.js memory leak shown in Trace, the Node.js monitoring tool)

If your APM collects data on the garbage collector as well, you can look for the same pattern. As extra objects pile up in a Node app's memory, the time spent on garbage collection increases simultaneously. This is a great indicator of a memory leak.

After figuring out that you have a leak, request a memory heapdump and look for the extra objects!

This sounds easy in theory but can be challenging in practice.

What you can do is request two heapdumps from your production system with a monitoring tool, and analyze these dumps with Chrome's DevTools. If you look for the extra objects in comparison mode, you'll end up seeing what piles up in your app's memory.
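
Capturing those dumps can be as simple as this sketch with the heapdump npm module (the file path is illustrative):

const heapdump = require('heapdump')

// call this twice, some time apart, while the suspected leak is growing
function dump () {
  heapdump.writeSnapshot(`/tmp/${Date.now()}.heapsnapshot`, (err, filename) => {
    if (err) console.error('could not write snapshot:', err)
    else console.log('heap snapshot written to', filename)
  })
}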

If you'd like a more detailed rundown on these steps, I wrote one article about finding a Node.js memory leak in Ghost, where I go into more details.

Problem 4.: Depending on Code Written by Anonymous Developers

Most Node.js applications heavily rely on npm. We can end up with a lot of dependencies written by developers of unknown expertise and intentions.

Roughly 76% of Node shops use vulnerable packages, while open source projects regularly grow stale, neglecting to fix security flaws.

There are a couple of possible steps to lower the security risks of using npm packages.

  1. Audit your modules with the Node Security Platform CLI
  2. Look for unused dependencies with the depcheck tool
  3. Use the npm stats API, or browse historic stats on npm-stat.com, to find out if others are using a package
  4. Use the npm view <pkg> maintainers command to avoid packages maintained by only a few
  5. Use the npm outdated command or Greenkeeper to learn whether you're using the latest version of a package (a couple of these commands are shown below).
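
For illustration, here is how two of these checks look in the terminal (the package name is just an example):

# list the maintainers of a package
npm view express maintainers

# list dependencies that have newer versions available
npm outdated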

Going through these steps can consume a lot of your time, so picking a Node.js Monitoring Tool which can warn you about insecure dependencies is highly recommended.

Problem 6.: Email Alerts often go Unnoticed

Let's be honest. We are developers who like spending time writing code - not going through our email accounts every 10 minutes.

In my experience, email alerts usually go unread, and it's very easy to miss a major outage or problem if we depend only on them.

Email is a subpar method to learn about issues in production.

I guess that you also don't want to watch dashboards for potential issues 24/7. This is why it is important to look for an APM with great alerting capabilities.

What I recommend is to use pager systems like OpsGenie or PagerDuty to learn about critical issues. Pair up the monitoring solution of your choice with one of these systems if you'd like to know about your alerts instantly.

A few alerting best-practices we follow at RisingStack:

  • Always keep alerting simple and alert on symptoms
  • Aim to have as few alerts as possible - associated with end-user pain
  • Alert on high response time and error rates as high up in the stack as possible

Problem 7.: Finding Crucial Errors in the Code

If a feature is broken on your site, it can prevent customers from achieving their goals. Sometimes it can be a sign of bad code quality. Make sure you have proper test coverage for your codebase and a good QA process (preferably automated).

If you use an APM that collects errors from your app then you'll be able to find the ones which occur more often.

The more data your APM can access, the better the chances of finding and fixing critical issues. We recommend using a monitoring tool which collects and visualises stack traces as well - so you'll be able to find the root causes of errors in a distributed system.


In the next part of the article, I will show you one open-source, and one SaaS / on-premises Node.js monitoring solution that will help you operate your applications.

Prometheus - an Open-Source, General Purpose Monitoring Platform

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud.

Prometheus was started in 2012, and since then, many companies and organizations have adopted the tool. It is a standalone open-source project, maintained independently of any company.

In 2016, Prometheus joined the Cloud Native Computing Foundation, right after Kubernetes.

The most important features of Prometheus are:

  • a multi-dimensional data model (time series identified by metric name and key/value pairs),
  • a flexible query language to leverage this dimensionality,
  • time series collection happens via a pull model over HTTP by default,
  • pushing time series is supported via an intermediary gateway.

(Illustration: Node.js monitoring with Prometheus)

As you can see from the features above, Prometheus is a general-purpose monitoring solution, so you can use it with any language or technology you prefer.

Check out the official Prometheus getting started pages if you'd like to give it a try.

Before you start monitoring your Node.js services, you need to add instrumentation to them via one of the Prometheus client libraries.

For this, there is a Node.js client module, which you can find here. It supports histograms, summaries, gauges and counters.

Essentially, all you have to do is require the Prometheus client, then expose its output at an endpoint:

const Prometheus = require('prom-client')  
const server = require('express')()

server.get('/metrics', (req, res) => {  
  res.end(Prometheus.register.metrics())
})

server.listen(process.env.PORT || 3000)  

This endpoint produces output that Prometheus can consume - something like this:

# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1490433285  
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 33046528  
# HELP nodejs_eventloop_lag_seconds Lag of event loop in seconds.
# TYPE nodejs_eventloop_lag_seconds gauge
nodejs_eventloop_lag_seconds 0.000089751  
# HELP nodejs_active_handles_total Number of active handles.
# TYPE nodejs_active_handles_total gauge
nodejs_active_handles_total 4  
# HELP nodejs_active_requests_total Number of active requests.
# TYPE nodejs_active_requests_total gauge
nodejs_active_requests_total 0  
# HELP nodejs_version_info Node.js version info.
# TYPE nodejs_version_info gauge
nodejs_version_info{version="v4.4.2",major="4",minor="4",patch="2"} 1  

Of course, these are just the default metrics collected by the module we used - you can extend them with your own. In the example below, we collect the number of requests served:

const Prometheus = require('prom-client')  
const server = require('express')()

const PrometheusMetrics = {  
  requestCounter: new Prometheus.Counter('throughput', 'The number of requests served')
}

server.use((req, res, next) => {  
  PrometheusMetrics.requestCounter.inc()
  next()
})

server.get('/metrics', (req, res) => {  
  res.end(Prometheus.register.metrics())
})

server.listen(3000)  

Once you run it, the /metrics endpoint will include the throughput metrics as well:

# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1490433805  
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 25120768  
# HELP nodejs_eventloop_lag_seconds Lag of event loop in seconds.
# TYPE nodejs_eventloop_lag_seconds gauge
nodejs_eventloop_lag_seconds 0.144927586  
# HELP nodejs_active_handles_total Number of active handles.
# TYPE nodejs_active_handles_total gauge
nodejs_active_handles_total 0  
# HELP nodejs_active_requests_total Number of active requests.
# TYPE nodejs_active_requests_total gauge
nodejs_active_requests_total 0  
# HELP nodejs_version_info Node.js version info.
# TYPE nodejs_version_info gauge
nodejs_version_info{version="v4.4.2",major="4",minor="4",patch="2"} 1  
# HELP throughput The number of requests served
# TYPE throughput counter
throughput 5  

Once you have exposed all your metrics, you can start querying and visualizing them - for that, please refer to the official Prometheus query documentation and the visualization documentation.
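
For instance, a PromQL query over the throughput counter from the example above could look like this (a sketch - adjust the window to your scrape interval):

# requests per second, averaged over the last 5 minutes
rate(throughput[5m])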

As you can imagine, instrumenting your codebase can take quite some time - since you have to create your dashboards and alerts to make sense of the data. While these solutions can sometimes provide greater flexibility for your use-case than hosted solutions, implementing them can take months - and then you have to deal with operating them as well.

If you have the time to dig deep into the topic, you'll be fine with it.

Meet Trace - our SaaS, and On-premises Node.js Monitoring Tool

As we just discussed, running your own solution requires domain knowledge, as well as expertise on how to do proper monitoring. You have to figure out what aggregation to use for what kind of metrics, and so on.

This is why it can make a lot of sense to go with a hosted monitoring solution - whether it is a SaaS product or an on-premises offering.

At RisingStack, we are developing our own Node.js Monitoring Solution, called Trace. We built all the experience into Trace which we gained through the years of providing professional Node services.

What's nice about Trace is that you get all the metrics you need by adding only a single line of code to your application - so it really takes only a few seconds to get started.

require('@risingstack/trace')  

After this, the Trace collector automatically gathers your application's performance data and visualizes it for you in an easy-to-understand way.

Just a few things Trace is capable of doing with your production Node app:

  1. Send alerts about Downtimes, Slow services & Bad Status Codes.
  2. Ping your websites and APIs with an external service + show APDEX metrics.
  3. Collect data on service, host and instance levels as well.
  4. Automatically create a (10 second-long) CPU profile in a production environment in case of a slowdown.
  5. Collect data on memory consumption and garbage collection.
  6. Create memory heapdumps automatically in case of a Memory Leak in production.
  7. Show errors and stack traces from your application.
  8. Visualize whole transaction call-chains in a distributed system.
  9. Show how your services communicate with each other on a live map.
  10. Automatically detect npm packages with security vulnerabilities.
  11. Mark new deployments and measure their effectiveness.
  12. Integrate with Slack, Pagerduty, and Opsgenie - so you'll never miss an alert.

Although Trace is currently a SaaS solution, we'll soon make an on-premises version available as well.

It will be able to do exactly the same as the cloud version, but it will run on Amazon VPC or in your own datacenter. If you're interested in it, let's talk!

Summary

I hope that in this chapter of Node.js at Scale I was able to give useful advice about monitoring your Node.js application. In the next article, you will learn how to debug Node.js applications in an easy way.

Node.js End-to-End Testing with Nightwatch.js

In this article, we are going to take a look at how you can do end-to-end testing with Node.js, using Nightwatch.js, a Node.js powered end-to-end testing framework.

In the previous chapter of Node.js at Scale, we discussed Node.js Testing and Getting TDD Right. If you did not read that article, or if you are unfamiliar with unit testing and TDD (test-driven development), I recommend checking that out before continuing with this article.

What is Node.js end-to-end testing?

Before jumping into example codes and learning to implement end-to-end testing for a Node.js project, it's worth exploring what end-to-end tests really are.

First of all, end-to-end testing is part of the black-box testing toolbox. This means that as a test writer, you are examining functionality without any knowledge of the internal implementation - without seeing any source code.

Secondly, end-to-end testing can also be used as user acceptance testing, or UAT. UAT is the process of verifying that the solution actually works for the user. This process does not focus on finding small typos, but on issues that can crash the system or make it dysfunctional for the user.

Enter Nightwatch.js

Nightwatch.js enables you to "write end-to-end tests in Node.js quickly and effortlessly that run against a Selenium/WebDriver server".

Nightwatch is shipped with the following features:

  • a built-in test runner,
  • can control the Selenium server,
  • support for hosted Selenium providers, like BrowserStack or SauceLabs,
  • CSS and XPath selectors.

Installing Nightwatch

To run Nightwatch locally, we have to do a little bit of extra work - we will need a standalone Selenium server, as well as a WebDriver, so we can use Chrome/Firefox to test our applications locally.

With these three tools, we are going to implement the flow the diagram below shows.

(Diagram: the Node.js end-to-end testing flow with Nightwatch.js. Photo credit: nightwatchjs.org)

STEP 1: Add Nightwatch

You can add Nightwatch to your project simply by running npm install nightwatch --save-dev.

This places the Nightwatch executable in your ./node_modules/.bin folder, so you don't have to install it globally.

STEP 2: Download Selenium

Selenium is a suite of tools to automate web browsers across many platforms.

Prerequisite: make sure you have the JDK installed, at least version 7. If you don't have it, you can grab it from here.

The Selenium server is a Java application which is used by Nightwatch to connect to various browsers. You can download the binary from here.

Once you have downloaded the JAR file, create a bin folder inside your project and place it there. We will set up Nightwatch to use it, so you don't have to start the Selenium server manually.

STEP 3: Download Chromedriver

ChromeDriver is a standalone server which implements the W3C WebDriver wire protocol for Chromium.

To grab the executable, head over to the downloads section, and place it in the same bin folder.

STEP 4: Configuring Nightwatch.js

The basic Nightwatch configuration happens through a JSON configuration file.

Let's create a nightwatch.json file, and fill it with:

{
  "src_folders" : ["tests"],
  "output_folder" : "reports",

  "selenium" : {
    "start_process" : true,
    "server_path" : "./bin/selenium-server-standalone-3.3.1.jar",
    "log_path" : "",
    "port" : 4444,
    "cli_args" : {
      "webdriver.chrome.driver" : "./bin/chromedriver"
    }
  },

  "test_settings" : {
    "default" : {
      "launch_url" : "http://localhost",
      "selenium_port"  : 4444,
      "selenium_host"  : "localhost",
      "desiredCapabilities": {
        "browserName": "chrome",
        "javascriptEnabled": true,
        "acceptSslCerts": true
      }
    }
  }
}

With this configuration file, we told Nightwatch where it can find the binary of the Selenium server and the ChromeDriver, as well as the location of the tests we want to run.




Quick Recap

So far, we have installed Nightwatch, downloaded the standalone Selenium server, as well as the Chromedriver. With these steps, you have all the necessary tools to create end-to-end tests using Node.js and Selenium.

Writing your first Nightwatch Test

Let's add a new file in the tests folder, called homepage.js.

We are going to take the example from the Nightwatch getting started guide. Our test script will go to Google, search for Rembrandt, and check the Wikipedia page:

module.exports = {  
  'Demo test Google' : function (client) {
    client
      .url('http://www.google.com')
      .waitForElementVisible('body', 1000)
      .assert.title('Google')
      .assert.visible('input[type=text]')
      .setValue('input[type=text]', 'rembrandt van rijn')
      .waitForElementVisible('button[name=btnG]', 1000)
      .click('button[name=btnG]')
      .pause(1000)
      .assert.containsText('ol#rso li:first-child',
        'Rembrandt - Wikipedia')
      .end()
  }
}

The only thing left to do is to run Nightwatch itself! For that, I recommend adding a new script into our package.json's scripts section:

"scripts": {
  "test-e2e": "nightwatch"
}

The very last thing you have to do is to run the tests using this command:

npm run test-e2e  

If everything goes well, your test will open up Chrome, then Google and Wikipedia.

Nightwatch.js in Your Project

Now that you understand what end-to-end testing is, and how you can set up Nightwatch, it is time to start adding it to your project.

For that, you have to consider some aspects - but please note that there are no silver bullets here. Depending on your business needs, you may answer the following questions differently:

  • Where should I run the tests? On staging? On production? When I build my containers?
  • What are the test scenarios I want to test?
  • When and who should write end-to-end tests?

Summary & Next Up

In this chapter of Node.js at Scale we have learned:

  • how to set up Nightwatch,
  • how to configure it to use a standalone Selenium server,
  • and how to write basic end-to-end tests.

In the next chapter, we are going to explore how you can monitor production Node.js infrastructures.

Getting Node.js Testing and TDD Right

Making changes to a large codebase and making sure it works is a huge deal in software development. We've already talked about a few great features of Node.js testing before, and it is very important to emphasize how crucial it is to have your code tested before you release it to your users.

It can be tedious to have proper test coverage when you have to focus on pushing out all the new features, but think about your future self, would you like to work on code that's not tested properly? If not, read this guide on getting testing and TDD (test-driven development) right.

Getting Test-Driven Development (TDD) Right

When new people join the project, you'll have to make sure that whenever they make a breaking change to the codebase, your tests will indicate it by failing. I have to admit that it is hard to determine what a breaking change is, but there is one thing that I've found really handy: TDD.

Test-driven development is a methodology where you write the tests for a given module first, and the actual implementation afterward. Writing your tests before your application code saves you from the cognitive load of keeping all the implementation details in mind while writing your tests. At least for me, these are the two best things about it. I always found it hard to remember all the nitty-gritty details about the code that I had to test later.

With TDD, I can focus more on the current step that I'm taking. It consists of three steps:

  • writing failing tests
  • writing code that satisfies our tests
  • and refactor.

It's that simple and I'd like to encourage you to give it a try. I'll guide you through the steps I usually take when I write a new module, and I'll also introduce you to advanced testing principles and tools that we use at RisingStack.

Step 1: Creating a New Module

This module will be responsible for creating and fetching users from our database, PostgreSQL. For that, we're going to use knex.

First, let's create a new module:

npm init -y  

And install the tools required for testing

npm install mocha chai --save-dev  

Don't forget to add the following lines to the package.json:

"scripts": {
  "test": "mocha lib/**/**.spec.js"
},

Step 2: Creating the first test file

Let's create the first test file for our module:

'use strict'

const User = require('./User')  
const expect = require('chai').expect

describe('User module', () => {  
  describe('"up"', () => {
    it('should export a function', () => {
      expect(User.up).to.be.a('function')
    })
  })
})

I always like to create a function called "up" that encapsulates the creation of the table. All I currently care about is being able to call this function. So I expect it to be a function; let's run the tests now:

AssertionError: expected undefined to be a function  
   at Context.it (lib/User.spec.js:9:29)

This is our first failing test, let's fix it.

'use strict'

function up () {  
}

module.exports = {  
  up
}

This is enough to satisfy the current requirements. We have so little code that there is nothing to refactor just yet, so let's write the next test. I want the up function to run asynchronously; I prefer Promises to callbacks, so I'm going to use them in my example.

Step 3: Creating a Node.js test case

What I want is the up function to return a Promise, let's create a test case for it:

it('should return a Promise', () => {  
  const usersUpResult = User.up()
  expect(usersUpResult.then).to.be.a('Function')
  expect(usersUpResult.catch).to.be.a('Function')
})

It will fail again, to fix it we can just simply return a Promise from it.

function up () {  
  return new Promise(function (resolve) {
    resolve()
  })
}

You see my point now. Always take a small step towards your goal by writing your tests, then write code that satisfies them. It is not only good for documenting your code: when its API changes for some reason in the future, the tests will be clear about what is wrong. If someone changes the up function to use callbacks instead of promises, our test will fail.

Advanced Testing

The next step is to actually create tables. For that, we will need knex installed.

npm install pg knex --save  

For the next step I'm going to create a database called nodejs_at_scale with the following command in the terminal:

createdb nodejs_at_scale  

And create a database.js file to have the connection to my database in a single place.

'use strict'

const createKnex = require('knex')

const knex = createKnex({  
  client: 'pg',
  connection: 'postgres://user:pass@localhost:5432/nodejs_at_scale' // adjust for your setup
})

module.exports = knex  

Next, let's write a test case that expects our up function to create a table named "users":

it('should create a table named "users"', () => {  
  return User.up()
    .then(() => db.schema.hasTable('users'))
    .then((hasUsersTable) => expect(hasUsersTable).to.be.true)
})

The actual implementation that satisfies this test:

'use strict'

const db = require('./database')

const tableName = 'users'

function up () {  
  return db.schema.createTableIfNotExists(tableName, (table) => {
    table.increments()
    table.string('name')
    table.timestamps()
  })
}

module.exports = {  
  up
}

We could go more in-depth by expecting all of the fields on the table, but I'll leave that up to your imagination.

Now we are at the refactor stage, and you can already feel that this might not be the cleanest code we've written so far. It can get a bit funky with huge promise chains, so let's make it a little bit easier to deal with. We are great fans of generators and the co module here at RisingStack; we rely on them heavily on a day-to-day basis. Let's throw in some syntactic sugar.

npm install co-mocha --save-dev  

Let's shake up that boring test script with our new module.

{
  "test": "mocha --require co-mocha lib/**/**.spec.js"
}

Now everything is in place let's refactor:

it('should create a table named "users"', function * () {  
  yield User.up()
  const hasUsersTable = yield db.schema.hasTable('users')

  expect(hasUsersTable).to.be.true
})

Co-mocha allows us to write our it blocks as generator functions and use the yield keyword to wait for Promises; more on this topic in our Node.js Async Best Practices article.

There is even one more thing that can make it less cluttered. There is a module called chai-as-promised.

npm install chai-as-promised --save-dev  

It extends the regular chai assertions with expectations about promises. Since db.schema.hasTable('users') returns a promise, we can refactor the test to the following:

'use strict'

const User = require('./User')

const chai = require('chai')  
const chaiAsPromised = require('chai-as-promised')

const db = require('./database')

chai.use(chaiAsPromised)  
const expect = chai.expect

describe('User module', () => {  
  describe('"up"', () => {
    // ...
    it('should create a table named "users"', function * () {
      yield User.up()

      return expect(db.schema.hasTable('users'))
        .to.eventually.be.true
    })
  })
})

If you look at the example above, you'll see that we can use the yield keyword to extract the resolved value out of the promise, or we can return the assertion (at the end of the function) and mocha will wait for it. These are some nice patterns you can use in your codebase to have cleaner tests. Remember, our goal is to express our intentions; pick whichever feels closer to yours.

Let's clean up before and after our tests in before and after blocks.

'use strict'

const User = require('./User')

const chai = require('chai')  
const chaiAsPromised = require('chai-as-promised')

const db = require('./database')

chai.use(chaiAsPromised)  
const expect = chai.expect

describe('User module', () => {  
  describe('"up"', () => {
    function cleanUp () {
      return db.schema.dropTableIfExists('users')
    }

    before(cleanUp)
    after(cleanUp)

    it('should export a function', () => {
      expect(User.up).to.be.a('Function')
    })

    it('should return a Promise', () => {
      const usersUpResult = User.up()
      expect(usersUpResult.then).to.be.a('Function')
      expect(usersUpResult.catch).to.be.a('Function')
    })

    it('should create a table named "users"', function * () {
      yield User.up()

      return expect(db.schema.hasTable('users'))
        .to.eventually.be.true
    })
  })
})

This should be enough for the "up" function, let's continue with creating a fetch function for our User model.

After expecting the exported and the returned types, we can move on to the actual implementation. When I'm testing modules with a database, I usually create an extra describe block for those functions that need test data inserted. Within that extra describe block, I can add a beforeEach hook to insert data before each test. It is also important to create a before hook that creates the table before testing.

describe('fetch', () => {  
  it('should export a function', () => {
    expect(User.fetch).to.be.a('Function')
  })

  it('should return a Promise', () => {
    const usersFetchResult = User.fetch()
    expect(usersFetchResult.then).to.be.a('Function')
    expect(usersFetchResult.catch).to.be.a('Function')
  })

  describe('with inserted rows', () => {
    const testName = 'Peter'

    before(() => User.up())
    beforeEach(() =>
      Promise.all([
        db.insert({
          name: testName
        }).into('users'),
        db.insert({
          name: 'John'
        }).into('users')
      ])
    )

    it('should return the users by their name', () =>
      expect(
        User.fetch(testName)
          .then(_.map(
            _.omit(['id', 'created_at', 'updated_at'])))
      ).to.eventually.be.eql([{
        name: 'Peter'
      }])
    )
  })
})

Notice that I've used lodash to omit those fields that are dynamically added by the database and would be hard (or even impossible) to inspect otherwise. We can also use Promises to extract the first value and inspect its keys with the following code:

it('should return users with timestamps and id', () =>  
  expect(
    User.fetch(testName)
      .then((users) => users[0])
  ).to.eventually.have.keys('created_at', 'updated_at', 'id', 'name')
)

Testing Internal Functions

Let's move forward with testing some internals of our functions. When you're writing proper tests, only the functionality of the current function should be tested. To achieve this, you have to ignore external function calls. For this, there are some utility functions provided by a module called sinon. The sinon module allows us to do three things:

  • Stubbing: the function that you stub won't be called; instead, you can provide an implementation. If you don't provide one, it will be replaced with an empty function (function () {}).
  • Spying: a function spy will be called with its original implementation, but you can make assertions about it.
  • Mocking: basically the same as stubbing, but for whole objects, not only single functions.

To demonstrate the use of spies, let's introduce a logger module into our codebase: winston. Guess what the code is doing by looking at its test:

it('should call winston if name is all lowercase', function * () {  
  sinon.spy(logger, 'info')
  yield User.fetch(testName.toLocaleLowerCase())

  expect(logger.info).to.have.been.calledWith('lowercase parameter supplied')
  logger.info.restore()
})

And at last let's make this one pass too:

const logger = require('winston')

function fetch (name) {  
  if (name === name.toLocaleLowerCase()) {
    logger.info('lowercase parameter supplied')
  }

  return db.select('*')
    .from('users')
    .where({ name })
}

This is great, our tests pass but let's check the output:

with inserted rows  
info: lowercase parameter supplied  
    ✓ should return users with timestamps and id
info: lowercase parameter supplied  
    ✓ should return the users by their name
info: lowercase parameter supplied  
    ✓ should call winston if name is all lowercase

The logger was called - we even verified it through our tests - but it is also visible in the test output. It is generally not a good thing to have your test output cluttered with text like that. Let's clean that up; to do that, we have to replace the spy with a stub. Remember, I've mentioned that stubs will not call the function that you apply them to.

it('should call winston if name is all lowercase', function * () {  
  sinon.stub(logger, 'info')
  yield User.fetch(testName.toLocaleLowerCase())

  expect(logger.info).to.have.been.calledWith('lowercase parameter supplied')
  logger.info.restore()
})

This paradigm can also be applied if you don't want your functions to call the database; you can stub out all of the functions one by one on the db object, like this:

it('should build the query properly', function * () {  
  const fakeDb = {
    from: sinon.spy(function () {
      return this
    }),
    where: sinon.spy(function () {
      return Promise.resolve()
    })
  }

  sinon.stub(db, 'select', () => fakeDb)
  sinon.stub(logger, 'info')

  yield User.fetch(testName.toLocaleLowerCase())

  expect(db.select).to.have.been.calledOnce
  expect(fakeDb.from).to.have.been.calledOnce
  expect(fakeDb.where).to.have.been.calledOnce

  db.select.restore()
  logger.info.restore()
})

As you can see, it is already a bit tedious to restore all of the stubs by hand at the end of every test case. For this problem, sinon has a nice solution called sandboxing. Sinon sandboxes allow you to define a sandbox at the beginning of the test, and when you're done, you can restore all of the stubs and spies on it at once. Check out how easy it is:

it('should build the query properly', function * () {  
  const sandbox = sinon.sandbox.create()

  const fakeDb = {
    from: sandbox.spy(function () {
      return this
    }),
    where: sandbox.spy(function () {
      return Promise.resolve()
    })
  }

  sandbox.stub(db, 'select', () => fakeDb)
  sandbox.stub(logger, 'info')

  yield User.fetch(testName.toLocaleLowerCase())

  expect(db.select).to.have.been.calledOnce
  expect(fakeDb.from).to.have.been.calledOnce
  expect(fakeDb.where).to.have.been.calledOnce

  sandbox.restore()
})

To take it a step further, you can move the sandbox creation into a beforeEach block:

beforeEach(function () {  
  this.sandbox = sinon.sandbox.create()
})
afterEach(function () {  
  this.sandbox.restore()
})

There is one last refactor to make on these tests: instead of stubbing each property on the fake object, we can use a mock. It makes our intentions a little bit clearer and our code more compact. To mimic the chaining function call behavior in tests, we can use the returnsThis method.

it('should build the query properly', function * () {  
  const mock = sinon.mock(db)
  mock.expects('select').once().returnsThis()
  mock.expects('from').once().returnsThis()
  mock.expects('where').once().returns(Promise.resolve())

  yield User.fetch(testName.toLocaleLowerCase())

  mock.verify()
})

Preparing for Failures

These tests are great if everything goes according to plan, but sadly, we also have to prepare for failures. The database can sometimes fail, and knex will throw an error. It is really hard to mimic this behavior properly, so I'm going to stub one of the functions and expect it to throw.

it('should log and rethrow database errors', function * () {  
  this.sandbox.stub(logger, 'error')
  const mock = sinon.mock(db)
  mock.expects('select').once().returnsThis()
  mock.expects('from').once().returnsThis()
  mock.expects('where').once().returns(Promise.reject(new Error('database has failed')))

  let err
  try {
    yield User.fetch(testName.toLocaleLowerCase())
  } catch (ex) {
    err = ex
  }
  mock.verify()

  expect(logger.error).to.have.been.calledOnce
  expect(logger.error).to.have.been.calledWith('database has failed')
  expect(err.message).to.be.eql('database has failed')
})

With this pattern, you can test errors that appear in your applications. When possible, try to avoid try-catch blocks, as they are considered an anti-pattern. With a more functional approach, the test can be rewritten as the following:

it('should log and rethrow database errors', function * () {  
  this.sandbox.stub(logger, 'error')
  const mock = sinon.mock(db)
  mock.expects('select').once().returnsThis()
  mock.expects('from').once().returnsThis()
  mock.expects('where').once().returns(Promise.reject(new Error('database has failed')))

  return expect(User.fetch(testName.toLocaleLowerCase()))
    .to.be.rejectedWith('database has failed')
})

Conclusion

While this guide covers most of what we do here at RisingStack on testing, there is a lot more to learn - for us and for you - from these projects' excellent documentation.

If you have made it this far, congratulations, you are now a 5-dan test-master in theory. Your last assignment is to go and fill your codebase with the knowledge you have learned, and create well-documented test cases for your code in TDD style! :)

CQRS Explained

What is CQRS?

CQRS is an architectural pattern, where the acronym stands for Command Query Responsibility Segregation. We can talk about CQRS when the data read operations are separated from the data write operations, and they happen on a different interface.

In most CQRS systems, read and write operations use different data models, and sometimes even different data stores. This kind of segregation makes it easier to scale read and write operations and to control security - but it adds extra complexity to your system.



The level of segregation can vary in CQRS systems:

  • a single data store with separate models for reading and updating data
  • separate data stores with separate models for reading and updating data

In the simplest form of data store separation, we can use read-only replicas to achieve segregation.
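
A minimal sketch of this idea (the model and store names here are made up for illustration): the write model funnels commands through the business logic against the master database, while the read model serves queries from a read-optimized store.

// write side: commands go through the full business logic
const userWriteModel = {
  createUser (data) {
    validateUser(data) // complex validation and business rules
    return masterDb.insert('user', data)
  }
}

// read side: queries only touch a read-optimized store, e.g. a read replica
const userReadModel = {
  getUserProfile (id) {
    return readReplicaDb.findOne('user_profile', id)
  }
}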

Why and when to use CQRS?

In a typical data management system, all CRUD (Create Read Update Delete) operations are executed on the same interface of the entities, in a single data store - like creating, updating, querying and deleting table rows in an SQL database via the same model.

CQRS really shines compared to the traditional approach (using a single model) when you build complex data models to validate and fulfil your business logic during data manipulation. Read operations are often very different from write and update operations - or much simpler, like accessing only a subset of your data.

Real world example

In our Node.js Monitoring Tool, we use CQRS to segregate saving and representing the data. For example, when you see a distributed tracing visualization on our UI, the data behind it arrived in smaller chunks from our customers' application agents to our public collector API.

In the collector API, we only do a thin validation and send the data to a messaging queue for processing. On the other end of the queue, workers are consuming messages and resolving all the necessary dependencies via other services. These workers are also saving the transformed data to the database.

If any issue happens, we requeue the message with exponential backoff and a maximum retry limit. Compared to this complex data writing flow, on the representation side we only query a read-replica database and visualize the result for our customers.

Trace by RisingStack: data processing with CQRS in a microservices architecture
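
A sketch of the worker-side retry logic described above - the queue client, message format, and function names here are assumptions for illustration, not our actual implementation:

// requeue with exponential backoff and a maximum retry limit
const MAX_RETRIES = 5

function handleMessage (message) {
  return processMessage(message)
    .catch((err) => {
      const retryCount = (message.retryCount || 0) + 1
      if (retryCount > MAX_RETRIES) {
        // give up and park the message for manual inspection
        return deadLetterQueue.publish(message)
      }
      // wait 2s, 4s, 8s, ... before the next attempt
      const delay = Math.pow(2, retryCount) * 1000
      return queue.publish(Object.assign({}, message, { retryCount }), { delay })
    })
}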

CQRS and Event Sourcing

I've seen many times that people confuse these two concepts. Both are heavily used in event driven infrastructures, like event driven microservices, but they mean very different things: CQRS separates the read and write interfaces of a system, while Event Sourcing stores every state change as a sequence of events.

To read more about Event Sourcing with Examples, check out our previous Node.js at Scale article.


Reporting database - Denormalizer

In some event driven systems, CQRS is implemented in a way that the system contains one or multiple Reporting databases.

A Reporting database is an entirely different read-only storage that models and persists the data in the best format for representing it. It's okay to store it in a denormalized format to optimize it for the client needs. In some cases, the reporting database contains only derived data, even from multiple data sources.

In a microservices architecture, we call a service the Denormalizer if it listens for some events and maintains a Reporting Database based on these. The client is reading the denormalized service's reporting database.

An example: the user profile service emits a user.edit event with the payload { id: 1, name: 'John Doe', state: 'churn' }. The Denormalizer service listens to it, but only stores { name: 'John Doe' } in its Reporting Database, because the client is not interested in the internal churn state of the user.
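
A minimal sketch of such a Denormalizer (the event bus and reporting store interfaces are assumed for the example):

// the Denormalizer persists only the fields the clients need
eventBus.on('user.edit', (payload) => {
  // payload: { id: 1, name: 'John Doe', state: 'churn' }
  return reportingDb.upsert('user_profile', payload.id, {
    name: payload.name // the internal "state" field is intentionally dropped
  })
})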

It can be hard to keep a Reporting Database in sync. Usually, we can only aim to eventual consistency.

A CQRS Node.js Example Repo

For our CQRS with Denormalizer Node.js example visit our cqrs-example GitHub repository.


Outro

CQRS is a powerful architectural pattern to segregate read and write operations and their interfaces, but it also adds extra complexity to your system. In most cases, you shouldn't use CQRS for the whole system, only for the specific parts where the complexity and scalability make it necessary.

To read more about CQRS and Reporting databases, I recommend checking out these resources:

In the next chapter of the Node.js at Scale series we'll discuss Node.js Testing and Getting TDD Right. Read on! :)

I’m happy to answer your CQRS related questions in the comments section!

Event Sourcing with Examples in Node.js

Event Sourcing with Examples in Node.js

Event Sourcing is a powerful architectural pattern to handle complex application states that may need to be rebuilt, re-played, audited or debugged.

From this article you can learn what Event Sourcing is, and when you should use it. We'll also take a look at some Event Sourcing examples with code snippets.


Event Sourcing

Event Sourcing is a software architecture pattern which makes it possible to reconstruct past states (as well as the latest state). This is achieved by storing every state change as a sequence of events.

The state of your application is, for example, a user's account balance or subscription at a particular time. This current state may only exist in memory.

A good example of Event Sourcing is a version control system, which stores current state as diffs: the current state is your latest source code, and the events are your commits.

Why is Event Sourcing useful?

In our hypothetical example, you are working on an online money transfer site where every customer has an account balance. Imagine that you just started working on a beautiful Monday morning, when it suddenly turns out that you made a mistake and used the wrong currency exchange rate for the whole past week. In this case, every account which sent or received money in the last seven days is in a corrupt state.

With event sourcing, there’s no need to panic!

If your site uses event sourcing, you can revert the account balances to their previous, uncorrupted state, fix the exchange rate, and replay all the events up until now. That's it - your job and reputation are saved!




Other use-cases

You can use events to audit or debug state changes in your system. They can also be useful for handling SaaS subscriptions. In a usual subscription based system, your users can buy a plan, upgrade it, downgrade it, pro-rate a current price, cancel a plan, apply a coupon, and so on. A good event log can be very useful for figuring out what happened.

So with event sourcing you can:

  • Rebuild states completely
  • Replay states from a specific time
  • Reconstruct the state of a specific moment for a temporary query

What is an Event?

An Event is something that happened in the past. An Event is not a snapshot of a state at a specific time; it's the action itself with all the information that's necessary to replay it.

Events should be simple objects which describe some action that occurred. They should be immutable and stored in an append-only way. Their immutable, append-only nature also makes them suitable for use as audit logs.

This is what makes it possible to undo and redo events, or even replay them from a specific timestamp.

Be careful with External Systems!

Like any software pattern, Event Sourcing can be challenging at some points as well.

The external systems that your application communicates with are usually not prepared for event sourcing, so you should be careful when you replay your events. I’m sure that you don’t wish to charge your customers twice or send all welcome emails again.

To solve this challenge, you should handle replays in your communication layers!
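
One common way to do this - sketched here with assumed interfaces - is to mark replayed events, so the communication layer can rebuild state without repeating the side effects:

// guard in the communication layer: no emails go out during a replay
function sendWelcomeEmail (user, event) {
  if (event.isReplay) {
    return Promise.resolve() // state gets rebuilt, side effect is skipped
  }
  return mailer.send(user.email, 'welcome')
}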

Command Sourcing

Command Sourcing is a different approach from Event Sourcing - make sure you don’t mix ‘em up by accident!

Event Sourcing:

  • Persist only changes in state
  • Replay can be side-effect free

Command Sourcing:

  • Persist Commands
  • Replay may trigger side-effects
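
To make the difference concrete, here is a hypothetical pair: the command expresses an intent that may still fail or trigger side effects, while the event records a fact that has already happened.

// a command: a request to do something - replaying it would re-trigger the transfer
const command = { type: 'transferMoney', fromId: 'account1', toId: 'account2', amount: 50 }

// an event: a record of a completed state change - safe to replay into the state
const event = { type: 'transfer', fromId: 'account1', toId: 'account2', amount: 50, time: 2 }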

Example for Event Sourcing

In this simple example, we will apply Event Sourcing for our accounts:

// current account states (how it looks in our DB now)
const accounts = {  
  account1: { balance: 100 },
  account2: { balance: 50 }
}
// past events (should be persisted somewhere, for example in a DB)
const events = [  
  { type: 'open', id: 'account1', balance: 150, time: 0 },
  { type: 'open', id: 'account2', balance: 0, time: 1 },
  { type: 'transfer', fromId: 'account1', toId: 'account2', amount: 50, time: 2 }
]

Let's rebuild the latest state from scratch, using our event log:

// complete rebuild
const accounts = events.reduce((accounts, event) => {
  if (event.type === 'open') {
    accounts[event.id] = { balance: event.balance }
  } else if (event.type === 'transfer') {
    accounts[event.fromId].balance -= event.amount
    accounts[event.toId].balance += event.amount
  }
  return accounts
}, {})

Undo the latest event:

// undo last event
// splice(-1) removes the last event from the log and returns it;
// we apply its inverse to the current accounts state
const updatedAccounts = events.splice(-1).reduce((accounts, event) => {
  if (event.type === 'open') {
    delete accounts[event.id]
  } else if (event.type === 'transfer') {
    accounts[event.fromId].balance += event.amount
    accounts[event.toId].balance -= event.amount
  }
  return accounts
}, accounts)

Query accounts state at a specific time:

// query specific time
function getAccountsAtTime (time) {
  return events.reduce((accounts, event) => {
    // ignore events that happened after the queried time
    if (event.time > time) {
      return accounts
    }

    if (event.type === 'open') {
      accounts[event.id] = { balance: event.balance }
    } else if (event.type === 'transfer') {
      accounts[event.fromId].balance -= event.amount
      accounts[event.toId].balance += event.amount
    }
    return accounts
  }, {})
}

const accounts = getAccountsAtTime(1)  


Learning more

For more detailed examples, you can check out our Event Sourcing Example repository.

For a more general and deeper understanding of Event Sourcing, I recommend reading these articles:

In the next part of the Node.js at Scale series, we’ll learn about Command Query Responsibility Segregation. Make sure you check back in a week!

If you have any questions on this topic, please let me know in the comments section below!

Node.js Async Best Practices & Avoiding the Callback Hell

Node.js Async Best Practices & Avoiding the Callback Hell

In this post, we cover what tools and techniques you have at your disposal when handling Node.js asynchronous operations: async.js, promises, generators and async functions.

After reading this article, you’ll know how to avoid the despised callback hell!


Asynchronous programming in Node.js

Previously we gathered a strong knowledge of asynchronous programming in JavaScript, and understood how the Node.js event loop works.

If you did not read these articles, I highly recommend them as introductions!

The Problem with Node.js Async

Node.js itself is single threaded, but some tasks can run in parallel - thanks to its asynchronous nature.

But what does running in parallel mean in practice?

Since we program a single threaded VM, it is essential that we do not block execution by waiting for I/O, but handle operations concurrently with the help of Node.js's event driven APIs.

Let’s take a look at some fundamental patterns, and learn how we can write resource efficient, non-blocking code, with the built-in solutions of Node.js and some third-party libraries.

The Classical Approach - Callbacks

Let's take a look at these simple async operations. They do nothing special, just fire a timer and call a function once the timer has finished.

function fastFunction (done) {  
  setTimeout(function () {
    done()
  }, 100)
}

function slowFunction (done) {  
  setTimeout(function () {
    done()
  }, 300)
}

Seems easy, right?

Our higher-order functions can be executed sequentially or in parallel with the basic "pattern" of nesting callbacks - but using this method can lead to an untameable callback hell.

function runSequentially (callback) {  
  fastFunction((err, data) => {
    if (err) return callback(err)
    console.log(data)   // results of fastFunction

    slowFunction((err, data) => {
      if (err) return callback(err)
      console.log(data) // results of slowFunction

      // here you can continue running more tasks
    })
  })
}
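
The example above runs them sequentially. Running the same two functions in parallel with plain callbacks requires a hand-rolled counter - a minimal sketch (note that a production version would also guard against invoking the callback twice when multiple tasks fail):

function runParallel (callback) {
  let pending = 2

  function done (err) {
    if (err) return callback(err)
    pending -= 1
    if (pending === 0) {
      callback()
    }
  }

  fastFunction(done)
  slowFunction(done)
}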


Avoiding Callback Hell with Control Flow Managers

To become an efficient Node.js developer, you have to avoid the constantly growing indentation level, produce clean and readable code and be able to handle complex flows.

Let me show you some of the libraries we can use to organize our code in a nice and maintainable way!




#1: Meet the Async Module

Async is a utility module which provides straight-forward, powerful functions for working with asynchronous JavaScript.

Async contains some common patterns for asynchronous flow control, all respecting the error-first callback convention.

Let's see what our previous example looks like using async!

async.waterfall([fastFunction, slowFunction], () => {  
  console.log('done')
})

What kind of witchcraft just happened?

Actually, there is no magic to reveal. You can easily implement your own async job-runner which can run tasks in parallel and wait for each of them to be ready.

Let's take a look at what async does under the hood!

// taken from https://github.com/caolan/async/blob/master/lib/waterfall.js
function(tasks, callback) {  
    callback = once(callback || noop);
    if (!isArray(tasks)) return callback(new Error('First argument to waterfall must be an array of functions'));
    if (!tasks.length) return callback();
    var taskIndex = 0;

    function nextTask(args) {
        if (taskIndex === tasks.length) {
            return callback.apply(null, [null].concat(args));
        }

        var taskCallback = onlyOnce(rest(function(err, args) {
            if (err) {
                return callback.apply(null, [err].concat(args));
            }
            nextTask(args);
        }));

        args.push(taskCallback);

        var task = tasks[taskIndex++];
        task.apply(null, args);
    }

    nextTask([]);
}

Essentially, a new callback is injected into the functions, and this is how async knows when a function is finished.

#2: Using co - generator based flow-control for Node.js

If you'd rather not stick to the solid callback protocol, then co can be a good choice for you.

co is a generator based control flow tool for Node.js and the browser, using promises, letting you write non-blocking code in a nice-ish way.

co is a powerful alternative which takes advantage of generator functions tied with promises without the overhead of implementing custom iterators.

const fastPromise = new Promise((resolve, reject) => {  
  fastFunction(resolve)
})

const slowPromise = new Promise((resolve, reject) => {  
  slowFunction(resolve)
})

co(function * () {  
  yield fastPromise
  yield slowPromise
}).then(() => {
  console.log('done')
})

As of now, I suggest going with co, since the long-awaited Node.js async/await functionality is only available in the nightly, unstable v7.x builds. But if you are already using Promises, switching from co to async functions will be easy.

This syntactic sugar on top of Promises and Generators will eliminate the problem of callbacks, and even helps you build nice flow control structures. It's almost like writing synchronous code, right?

Stable Node.js branches will receive this update in the near future, so you will be able to remove co and do the same with async/await.
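
Once that happens, the co example above can be rewritten with barely any structural change: the wrapper disappears and yield becomes await.

async function run () {
  await fastPromise
  await slowPromise
}

run().then(() => {
  console.log('done')
})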

Flow Control in Practice

Now that we have learned several tools and tricks for handling async operations, it is time to practice with some fundamental control flows to make our code more efficient and clean.

Let’s take an example and write a route handler for our web app, where the request can be resolved after 3 steps: validateParams, dbQuery and serviceCall.

If you'd like to write them without any helper, you'd most probably end up with something like this. Not so nice, right?

// validateParams, dbQuery, serviceCall are higher-order functions
// DON'T
function handler (done) {  
  validateParams((err) => {
    if (err) return done(err)
    dbQuery((err, dbResults) => {
      if (err) return done(err)
      serviceCall((err, serviceResults) => {
        done(err, { dbResults, serviceResults })
      })
    })
  })
}

Instead of the callback-hell, we can use the async library to refactor our code, as we have already learned:

// validateParams, dbQuery, serviceCall are higher-order functions
function handler (done) {  
  async.waterfall([validateParams, dbQuery, serviceCall], done)
}

Let's take it a step further! Rewrite it to use Promises:

// validateParams, dbQuery, serviceCall are thunks
function handler () {  
  return validateParams()
    .then(dbQuery)
    .then(serviceCall)
    .then((result) => {
      console.log(result)
      return result
    })
}

Also, you can use co powered generators with Promises:

// validateParams, dbQuery, serviceCall are thunks
const handler = co.wrap(function * () {  
  yield validateParams()
  const dbResults = yield dbQuery()
  const serviceResults = yield serviceCall()
  return { dbResults, serviceResults }
})

It feels like "synchronous" code, but it is still doing the async jobs one after the other.

Let's see how this snippet looks with async/await.

// validateParams, dbQuery, serviceCall are thunks
async function handler () {  
  await validateParams()
  const dbResults = await dbQuery()
  const serviceResults = await serviceCall()
  return { dbResults, serviceResults }
}


Takeaway rules for Node.js & Async

Fortunately, Node.js eliminates the complexities of writing thread-safe code. You just have to stick to these rules to keep things smooth:

  • As a rule of thumb, prefer the async APIs over the sync ones, because the non-blocking approach gives superior performance over the synchronous scenario.

  • Always use the best fitting flow control pattern, or a mix of them, in order to reduce the time spent waiting for I/O to complete.

You can find all of the code from this article in this repository.

If you have any questions or suggestions for the article, please let me know in the comments!

In the next part of the Node.js at Scale series, we take a look at Event Sourcing with Examples.

JavaScript Clean Coding Best Practices

JavaScript Clean Coding Best Practices

Writing clean code is something you must know and do in order to call yourself a professional developer. There is no reasonable excuse for doing anything less than your best.

In this blog post, we will cover general clean coding principles for naming and using variables & functions, as well as some JavaScript specific clean coding best practices.

“Even bad code can function. But if the code isn’t clean, it can bring a development organization to its knees.” — Robert C. Martin (Uncle Bob)



First of all, what does clean coding mean?

Clean coding means that, first and foremost, you write code for your later self and for your co-workers - not for the machine.

Your code must be easily understandable for humans.

"Write code for your later self and for your co-workers in the first place - not for the machine." via @RisingStack

Click To Tweet

You know you are working with clean code when every routine you read turns out to be pretty much what you expected.

JavaScript Clean Coding: the only valid measurement of code quality is WTFs/minute

JavaScript Clean Coding Best Practices

Now that we know what every developer should aim for, let’s go through the best practices!

How should I name my variables?

Use intention-revealing names and don't worry if you have long variable names instead of saving a few keyboard strokes.

If you follow this practice, your names become searchable, which helps a lot when you do refactors or you are just looking for something.

// DON'T
let d  
let elapsed  
const ages = arr.map((i) => i.age)

// DO
let daysSinceModification  
const agesOfUsers = users.map((user) => user.age)  

Also, make meaningful distinctions and don't add extra, unnecessary nouns to your variable names, like their type (Hungarian notation).

// DON'T
let nameString  
let theUsers

// DO
let name  
let users  

Make your variable names easy to pronounce, because it takes less effort for the human mind to process them.

When you are doing code reviews with your fellow developers, these names are easier to reference.

// DON'T
let fName, lName  
let cntr

let full = false  
if (cart.size > 100) {  
  full = true
}

// DO
let firstName, lastName  
let counter

const MAX_CART_SIZE = 100  
// ...
const isFull = cart.size > MAX_CART_SIZE  

In short, don't cause extra mental mapping with your names.

How should I write my functions?

Your functions should do one thing only on one level of abstraction.

Functions should do one thing. They should do it well. They should do it only. — Robert C. Martin (Uncle Bob)

// DON'T
function getUserRouteHandler (req, res) {  
  const { userId } = req.params
  // inline SQL query
  knex('user')
    .where({ id: userId })
    .first()
    .then((user) => res.json(user))
}

// DO
// User model (eg. models/user.js)
const tableName = 'user'  
const User = {  
  getOne (userId) {
    return knex(tableName)
      .where({ id: userId })
      .first()
  }
}

// route handler (eg. server/routes/user/get.js)
function getUserRouteHandler (req, res) {  
  const { userId } = req.params
  User.getOne(userId)
    .then((user) => res.json(user))
}

After you have written your functions properly, you can check how well you did with CPU profiling, which helps you find bottlenecks.


Use long, descriptive names

A function name should be a verb or a verb phrase, and it needs to communicate its intent, as well as the order and intent of the arguments.

A long descriptive name is way better than a short, enigmatic name or a long descriptive comment.

// DON'T
/**
 * Invite a new user with its email address
 * @param {String} user email address
 */
function inv (user) { /* implementation */ }

// DO
function inviteUser (emailAddress) { /* implementation */ }  


Avoid long argument lists

Use a single object parameter and destructuring assignment instead. It also makes handling optional parameters much easier.

// DON'T
function getRegisteredUsers (fields, include, fromDate, toDate) { /* implementation */ }  
getRegisteredUsers(['firstName', 'lastName', 'email'], ['invitedUsers'], '2016-09-26', '2016-12-13')

// DO
function getRegisteredUsers ({ fields, include, fromDate, toDate }) { /* implementation */ }  
getRegisteredUsers({  
  fields: ['firstName', 'lastName', 'email'],
  include: ['invitedUsers'],
  fromDate: '2016-09-26',
  toDate: '2016-12-13'
})

Reduce side effects

Use pure functions without side effects, whenever you can. They are really easy to use and test.

// DON'T
function addItemToCart (cart, item, quantity = 1) {  
  const alreadyInCart = cart.get(item.id) || 0
  cart.set(item.id, alreadyInCart + quantity)
  return cart
}

// DO
// not modifying the original cart
function addItemToCart (cart, item, quantity = 1) {  
  const cartCopy = new Map(cart)
  const alreadyInCart = cartCopy.get(item.id) || 0
  cartCopy.set(item.id, alreadyInCart + quantity)
  return cartCopy
}

// or by inverting the method location
// you can expect that the original object will be mutated
// addItemToCart(cart, item, quantity) -> cart.addItem(item, quantity)
const cart = new Map()  
Object.assign(cart, {  
  addItem (item, quantity = 1) {
    const alreadyInCart = this.get(item.id) || 0
    this.set(item.id, alreadyInCart + quantity)
    return this
  }
})


Organize your functions in a file according to the stepdown rule

Higher level functions should be on top and lower levels below. It makes it natural to read the source code.

// DON'T
// "I need the full name for something..."
function getFullName (user) {  
  return `${user.firstName} ${user.lastName}`
}

function renderEmailTemplate (user) {  
  // "oh, here"
  const fullName = getFullName(user)
  return `Dear ${fullName}, ...`
}

// DO
function renderEmailTemplate (user) {  
  // "I need the full name of the user"
  const fullName = getFullName(user)
  return `Dear ${fullName}, ...`
}

// "I use this for the email template rendering"
function getFullName (user) {  
  return `${user.firstName} ${user.lastName}`
}


Query or modification

Functions should either do something (modify) or answer something (query), but not both.
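
For example, following the DON'T / DO pattern used above:

let counter = 0

// DON'T - modifies state and answers a question at the same time
function incrementAndGetCounter () {
  counter += 1
  return counter
}

// DO - one function modifies (command), another one answers (query)
function incrementCounter () {
  counter += 1
}

function getCounter () {
  return counter
}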


Everyone likes to write JavaScript differently, what to do?

As JavaScript is dynamic and loosely typed, it is especially prone to programmer errors.

Use project- or company-wide linter rules and formatting style.

The stricter the rules, the less effort will go into pointing out bad formatting in code reviews. It should cover things like consistent naming, indentation size, whitespace placement and even semicolons.

"The stricter the linter rules, the less effort needed to point out bad formatting in code reviews." by @RisingStack

Click To Tweet

The standard JS style is quite nice to start with, but in my opinion it isn't strict enough. I agree with most of the rules in the Airbnb style.

How to write nice async code?

Use Promises whenever you can.

Promises are natively available from Node 4. Instead of writing nested callbacks, you can have chainable Promise calls.

// AVOID
asyncFunc1((err, result1) => {  
  asyncFunc2(result1, (err, result2) => {
    asyncFunc3(result2, (err, result3) => {
      console.log(result3)
    })
  })
})

// PREFER
asyncFuncPromise1()  
  .then(asyncFuncPromise2)
  .then(asyncFuncPromise3)
  .then((result) => console.log(result))
  .catch((err) => console.error(err))

Most of the libraries out there have both callback and promise interfaces - prefer the latter. You can even convert callback APIs into promise-based ones by wrapping them, using packages like es6-promisify.

// AVOID
const fs = require('fs')

function readJSON (filePath, callback) {  
  fs.readFile(filePath, (err, data) => {
    if (err) {
      return callback(err)
    }

    try {
      callback(null, JSON.parse(data))
    } catch (ex) {
      callback(ex)
    }
  })
}

readJSON('./package.json', (err, pkg) => { console.log(err, pkg) })

// PREFER
const fs = require('fs')  
const promisify = require('es6-promisify')

const readFile = promisify(fs.readFile)  
function readJSON (filePath) {  
  return readFile(filePath)
    .then((data) => JSON.parse(data))
}

readJSON('./package.json')  
  .then((pkg) => console.log(pkg))
  .catch((err) => console.error(err))

The next step would be to use async/await (≥ Node 7) or generators with co (≥ Node 4) to achieve synchronous-looking control flows for your asynchronous code.

const request = require('request-promise-native')

function getExtractFromWikipedia (title) {  
  return request({
    uri: 'https://en.wikipedia.org/w/api.php',
    qs: {
      titles: title,
      action: 'query',
      format: 'json',
      prop: 'extracts',
      exintro: true,
      explaintext: true
    },
    method: 'GET',
    json: true
  })
    .then((body) => Object.keys(body.query.pages).map((key) => body.query.pages[key].extract))
    .then((extracts) => extracts[0])
    .catch((err) => {
      console.error('getExtractFromWikipedia() error:', err)
      throw err
    })
} 

// PREFER
async function getExtractFromWikipedia (title) {  
  let body
  try {
    body = await request({ /* same parameters as above */ })
  } catch (err) {
    console.error('getExtractFromWikipedia() error:', err)
    throw err
  }

  const extracts = Object.keys(body.query.pages).map((key) => body.query.pages[key].extract)
  return extracts[0]
}

// or
const co = require('co')

const getExtractFromWikipedia = co.wrap(function * (title) {  
  let body
  try {
    body = yield request({ /* same parameters as above */ })
  } catch (err) {
    console.error('getExtractFromWikipedia() error:', err)
    throw err
  }

  const extracts = Object.keys(body.query.pages).map((key) => body.query.pages[key].extract)
  return extracts[0]
})

getExtractFromWikipedia('Robert Cecil Martin')  
  .then((robert) => console.log(robert))


How should I write performant code?

In the first place, you should write clean code, then use profiling to find performance bottlenecks.

Never try to write performant and smart code first. Instead, optimize the code when you need to, and refer to the true impact instead of micro-benchmarks.

"Write clean code first and optimize it when you need to. Refer to true impact instead of micro-benchmarks!"

Click To Tweet

There are, however, some straightforward scenarios, like eagerly initializing whatever you can (e.g. joi schemas in route handlers, which are used in every request and add serious overhead if recreated every time) and using asynchronous instead of blocking code.
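
Sticking to the joi example (the handler and schema names here are hypothetical):

const joi = require('joi')

// DON'T - the schema is recreated on every single request
function getTweetsHandler (req, res) {
  const querySchema = joi.object({ limit: joi.number().max(100) })
  // ... validate req.query against querySchema, then respond
}

// DO - the schema is built once, at module load time
const querySchema = joi.object({ limit: joi.number().max(100) })

function getTweetsHandler (req, res) {
  // ... validate req.query against querySchema, then respond
}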


Next up in Node.js at Scale

In the next episode of this series, we’ll discuss advanced Node.js async best practices and avoiding the callback hell!

If you have any questions regarding clean coding, don’t hesitate and let me know in the comments!


Advanced Node.js Project Structure Tutorial

Advanced Node.js Project Structure Tutorial

Project structuring is an important topic because the way you bootstrap your application can determine the whole development experience throughout the life of the project.

In this Node.js project structure tutorial I’ll answer some of the most common questions we receive at RisingStack about structuring advanced Node applications, and help you with structuring a complex project.

These are the goals that we are aiming for:

  • Writing an application that is easy to scale and maintain.
  • The config is well separated from the business logic.
  • Our application can consist of multiple process types.


The Node.js Project Structure

Our example application is listening on Twitter tweets and tracks certain keywords. In case of a keyword match, the tweet will be sent to a RabbitMQ queue, which will be processed and saved to Redis. We will also have a REST API exposing the tweets we have saved.

You can take a look at the code on GitHub. The file structure for this project looks like the following:

.
|-- config
|   |-- components
|   |   |-- common.js
|   |   |-- logger.js
|   |   |-- rabbitmq.js
|   |   |-- redis.js
|   |   |-- server.js
|   |   `-- twitter.js
|   |-- index.js
|   |-- social-preprocessor-worker.js
|   |-- twitter-stream-worker.js
|   `-- web.js
|-- models
|   |-- redis
|   |   |-- index.js
|   |   `-- redis.js
|   |-- tortoise
|   |   |-- index.js
|   |   `-- tortoise.js
|   `-- twitter
|       |-- index.js
|       `-- twitter.js
|-- scripts
|-- test
|   `-- setup.js
|-- web
|   |-- middleware
|   |   |-- index.js
|   |   `-- parseQuery.js
|   |-- router
|   |   |-- api
|   |   |   |-- tweets
|   |   |   |   |-- get.js
|   |   |   |   |-- get.spec.js
|   |   |   |   `-- index.js
|   |   |   `-- index.js
|   |   `-- index.js
|   |-- index.js
|   `-- server.js
|-- worker
|   |-- social-preprocessor
|   |   |-- index.js
|   |   `-- worker.js
|   `-- twitter-stream
|       |-- index.js
|       `-- worker.js
|-- index.js
`-- package.json

In this example we have 3 processes:

  • twitter-stream-worker: The process is listening on Twitter for keywords and sends the tweets to a RabbitMQ queue.
  • social-preprocessor-worker: The process is listening on the RabbitMQ queue and saves the tweets to Redis and removes old ones.
  • web: The process is serving a REST API with a single endpoint: GET /api/v1/tweets?limit&offset.

We will get to what differentiates a web and a worker process, but let's start with the config.


How to handle different environments and configurations?

Load your deployment specific configurations from environment variables and never add them to the codebase as constants. These are the configurations that can vary between deployments and runtime environments, like CI, staging or production. Basically, you can have the same code running everywhere.

A good test for whether the config is correctly separated from the application internals is that the codebase could be made public at any moment. This means you are protected from accidentally leaking secrets or committing credentials to version control.


The environment variables can be accessed via the process.env object. Keep in mind that all of these values have the type String, so you might need to use type conversions.

// config/config.js
'use strict'

// required environment variables
const requiredEnvVars = ['NODE_ENV', 'PORT']

requiredEnvVars.forEach((name) => {
  if (!process.env[name]) {
    throw new Error(`Environment variable ${name} is missing`)
  }
})

const config = {  
  env: process.env.NODE_ENV,
  logger: {
    level: process.env.LOG_LEVEL || 'info',
    enabled: process.env.LOGGER_ENABLED ? process.env.LOGGER_ENABLED.toLowerCase() === 'true' : false
  },
  server: {
    port: Number(process.env.PORT)
  }
  // ...
}

module.exports = config  

Config validation

Validating environment variables is also a quite useful technique. It can help you catch configuration errors at startup, before your application does anything else. You can read more about the benefits of early configuration error detection in Adrian Colyer's blog post.

This is how our improved config file looks with schema validation, using the joi validator:

// config/config.js
'use strict'

const joi = require('joi')

const envVarsSchema = joi.object({  
  NODE_ENV: joi.string()
    .allow(['development', 'production', 'test', 'provision'])
    .required(),
  PORT: joi.number()
    .required(),
  LOGGER_LEVEL: joi.string()
    .allow(['error', 'warn', 'info', 'verbose', 'debug', 'silly'])
    .default('info'),
  LOGGER_ENABLED: joi.boolean()
    .truthy('TRUE')
    .truthy('true')
    .falsy('FALSE')
    .falsy('false')
    .default(true)
}).unknown()
  .required()

const { error, value: envVars } = joi.validate(process.env, envVarsSchema)  
if (error) {  
  throw new Error(`Config validation error: ${error.message}`)
}

const config = {  
  env: envVars.NODE_ENV,
  isTest: envVars.NODE_ENV === 'test',
  isDevelopment: envVars.NODE_ENV === 'development',
  logger: {
    level: envVars.LOGGER_LEVEL,
    enabled: envVars.LOGGER_ENABLED
  },
  server: {
    port: envVars.PORT
  }
  // ...
}

module.exports = config  


Config splitting

Splitting the configuration by components can be a good solution to avoid a single, ever-growing config file.

// config/components/logger.js
'use strict'

const joi = require('joi')

const envVarsSchema = joi.object({  
  LOGGER_LEVEL: joi.string()
    .allow(['error', 'warn', 'info', 'verbose', 'debug', 'silly'])
    .default('info'),
  LOGGER_ENABLED: joi.boolean()
    .truthy('TRUE')
    .truthy('true')
    .falsy('FALSE')
    .falsy('false')
    .default(true)
}).unknown()
  .required()

const { error, value: envVars } = joi.validate(process.env, envVarsSchema)  
if (error) {  
  throw new Error(`Config validation error: ${error.message}`)
}

const config = {  
  logger: {
    level: envVars.LOGGER_LEVEL,
    enabled: envVars.LOGGER_ENABLED
  }
}

module.exports = config  

Then in the config.js file we only need to combine the components.

// config/config.js
'use strict'

const common = require('./components/common')  
const logger = require('./components/logger')  
const redis = require('./components/redis')  
const server = require('./components/server')

module.exports = Object.assign({}, common, logger, redis, server)  

You should never group your config together into "environment" specific files, like config/production.js for production. It doesn't scale well as your app expands into more deployments over time.


How to organize a multi-process application?

The process is the main building block of a modern application. An app can have multiple stateless processes, just like in our example. HTTP requests can be handled by a web process, and long-running or scheduled background tasks by a worker. They are stateless because any data that needs to be persisted is stored in a stateful database. For this reason, adding more concurrent processes is very simple. These processes can be scaled independently, based on load or other metrics.

In the previous section, we saw how to break down the config into components. This comes in very handy when you have different process types. Each type can have its own config, requiring only the components it needs, without expecting unused environment variables.

In the config/index.js file:

// config/index.js
'use strict'

const processType = process.env.PROCESS_TYPE

let config  
try {  
  config = require(`./${processType}`)
} catch (ex) {
  if (ex.code === 'MODULE_NOT_FOUND') {
    throw new Error(`No config for process type: ${processType}`)
  }

  throw ex
}

module.exports = config  

In the root index.js file we start the process selected with the PROCESS_TYPE environment variable:

// index.js
'use strict'

const processType = process.env.PROCESS_TYPE

if (processType === 'web') {  
  require('./web')
} else if (processType === 'twitter-stream-worker') {
  require('./worker/twitter-stream')
} else if (processType === 'social-preprocessor-worker') {
  require('./worker/social-preprocessor')
} else {
  throw new Error(`${processType} is an unsupported process type. Use one of: 'web', 'twitter-stream-worker', 'social-preprocessor-worker'!`)
}

The nice thing about this is that we still have just one application, but we have managed to split it into multiple, independent processes. Each of them can be started and scaled individually, without influencing the other parts. You can achieve this without sacrificing your DRY codebase, because parts of the code, like the models, can be shared between the different processes.

How to organize your test files?

Place your test files next to the tested modules using some kind of naming convention, like <module_name>.spec.js and <module_name>.e2e.spec.js. Your tests should live together with the tested modules, keeping them in sync. It would be really hard to find and maintain the tests and the corresponding functionality when the test files are completely separated from the business logic.


A separate /test folder can hold all the additional test setup and utilities not used by the application itself.

Where to put your build and script files?

We tend to create a /scripts folder where we put our bash and node scripts for database synchronization, front-end builds and so on. This folder separates them from your application code and prevents you from putting too many script files into the root directory. List them in your npm scripts for easier usage.
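
For example (the script names and files below are hypothetical):

// package.json
"scripts": {
  "build": "./scripts/build-frontend.sh",
  "db:sync": "node scripts/sync-database.js"
}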


Conclusion

I hope you enjoyed this article on project structuring. I highly recommend checking out our previous article on the subject, where we laid out the 5 fundamentals of Node.js project structuring.

If you have any questions, please let me know in the comments. In the next chapter of the Node.js at Scale series, we’re going to dive deep into JavaScript clean coding. See you next week!