Advanced Node.js Project Structure Tutorial - Node.js at Scale

Project structuring is an important topic because the way you bootstrap your application can determine the whole development experience throughout the life of the project.

In this Node.js project structure tutorial I’ll answer some of the most common questions we receive at RisingStack about structuring advanced Node applications, and help you with structuring a complex project.

These are the goals that we are aiming for:

  • Writing an application that is easy to scale and maintain.
  • The config is well separated from the business logic.
  • Our application can consist of multiple process types.

Node.js at Scale is a collection of articles focusing on the needs of companies with bigger Node.js installations and advanced Node developers.


The Node.js Project Structure

Our example application listens to tweets on Twitter and tracks certain keywords. When a keyword matches, the tweet is sent to a RabbitMQ queue, where it gets processed and saved to Redis. We also have a REST API exposing the tweets we have saved.

You can take a look at the code on GitHub. The file structure for this project looks like the following:

.
|-- config
|   |-- components
|   |   |-- common.js
|   |   |-- logger.js
|   |   |-- rabbitmq.js
|   |   |-- redis.js
|   |   |-- server.js
|   |   `-- twitter.js
|   |-- index.js
|   |-- social-preprocessor-worker.js
|   |-- twitter-stream-worker.js
|   `-- web.js
|-- models
|   |-- redis
|   |   |-- index.js
|   |   `-- redis.js
|   |-- tortoise
|   |   |-- index.js
|   |   `-- tortoise.js
|   `-- twitter
|       |-- index.js
|       `-- twitter.js
|-- scripts
|-- test
|   `-- setup.js
|-- web
|   |-- middleware
|   |   |-- index.js
|   |   `-- parseQuery.js
|   |-- router
|   |   |-- api
|   |   |   |-- tweets
|   |   |   |   |-- get.js
|   |   |   |   |-- get.spec.js
|   |   |   |   `-- index.js
|   |   |   `-- index.js
|   |   `-- index.js
|   |-- index.js
|   `-- server.js
|-- worker
|   |-- social-preprocessor
|   |   |-- index.js
|   |   `-- worker.js
|   `-- twitter-stream
|       |-- index.js
|       `-- worker.js
|-- index.js
`-- package.json

In this example we have 3 processes:

  • twitter-stream-worker: listens to Twitter for keywords and sends the tweets to a RabbitMQ queue.
  • social-preprocessor-worker: listens on the RabbitMQ queue, saves the tweets to Redis and removes old ones.
  • web: serves a REST API with a single endpoint: GET /api/v1/tweets?limit&offset.

We will get to what differentiates a web and a worker process, but let's start with the config.

How to handle different environments and configurations?

Load your deployment specific configurations from environment variables and never add them to the codebase as constants. These are the configurations that can vary between deployments and runtime environments, like CI, staging or production. Basically, you can have the same code running everywhere.

A good test for whether the config is correctly separated from the application internals is that the codebase could be made public at any moment. This means that you can be protected from accidentally leaking secrets or compromising credentials on version control.


The environment variables can be accessed via the process.env object. Keep in mind that all the values have a type of String, so you might need to use type conversions.

// config/config.js
'use strict'

// required environment variables
[
  'NODE_ENV',
  'PORT'
].forEach((name) => {
  if (!process.env[name]) {
    throw new Error(`Environment variable ${name} is missing`)
  }
})

const config = {  
  env: process.env.NODE_ENV,
  logger: {
    level: process.env.LOG_LEVEL || 'info',
    enabled: process.env.LOGGER_ENABLED ? process.env.LOGGER_ENABLED.toLowerCase() === 'true' : false
  },
  server: {
    port: Number(process.env.PORT)
  }
  // ...
}

module.exports = config  





Config validation

Validating environment variables is also a quite useful technique. It can help you catch configuration errors on startup, before your application does anything else. You can read more about the benefits of early configuration error detection in this blog post by Adrian Colyer.

This is what our improved config file looks like with schema validation, using the joi validator:

// config/config.js
'use strict'

const joi = require('joi')

const envVarsSchema = joi.object({  
  NODE_ENV: joi.string()
    .allow(['development', 'production', 'test', 'provision'])
    .required(),
  PORT: joi.number()
    .required(),
  LOGGER_LEVEL: joi.string()
    .allow(['error', 'warn', 'info', 'verbose', 'debug', 'silly'])
    .default('info'),
  LOGGER_ENABLED: joi.boolean()
    .truthy('TRUE')
    .truthy('true')
    .falsy('FALSE')
    .falsy('false')
    .default(true)
}).unknown()
  .required()

const { error, value: envVars } = joi.validate(process.env, envVarsSchema)  
if (error) {  
  throw new Error(`Config validation error: ${error.message}`)
}

const config = {  
  env: envVars.NODE_ENV,
  isTest: envVars.NODE_ENV === 'test',
  isDevelopment: envVars.NODE_ENV === 'development',
  logger: {
    level: envVars.LOGGER_LEVEL,
    enabled: envVars.LOGGER_ENABLED
  },
  server: {
    port: envVars.PORT
  }
  // ...
}

module.exports = config  


Config splitting

Splitting the configuration by components can be a good solution to forego a single, growing config file.

// config/components/logger.js
'use strict'

const joi = require('joi')

const envVarsSchema = joi.object({  
  LOGGER_LEVEL: joi.string()
    .allow(['error', 'warn', 'info', 'verbose', 'debug', 'silly'])
    .default('info'),
  LOGGER_ENABLED: joi.boolean()
    .truthy('TRUE')
    .truthy('true')
    .falsy('FALSE')
    .falsy('false')
    .default(true)
}).unknown()
  .required()

const { error, value: envVars } = joi.validate(process.env, envVarsSchema)  
if (error) {  
  throw new Error(`Config validation error: ${error.message}`)
}

const config = {  
  logger: {
    level: envVars.LOGGER_LEVEL,
    enabled: envVars.LOGGER_ENABLED
  }
}

module.exports = config  

Then in the config.js file we only need to combine the components.

// config/config.js
'use strict'

const common = require('./components/common')  
const logger = require('./components/logger')  
const redis = require('./components/redis')  
const server = require('./components/server')

module.exports = Object.assign({}, common, logger, redis, server)  

You should never group your config into environment-specific files, like config/production.js for production. Doing so doesn't scale well as your app expands into more deployments over time.
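To make the contrast concrete, here is a minimal sketch of the anti-pattern (the file names and values are hypothetical). Every new deployment target means another file and another code change, while the environment variable approach above needs none:

// config/production.js - the anti-pattern: deployment details live in the codebase
module.exports = {
  server: { port: 8080 },
  logger: { level: 'error', enabled: true }
}

// config/staging.js - and every new environment needs yet another copy
module.exports = {
  server: { port: 8081 },
  logger: { level: 'debug', enabled: true }
}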


How to organize a multi-process application?

The process is the main building block of a modern application. An app can have multiple stateless processes, just like in our example. HTTP requests can be handled by a web process and long-running or scheduled background tasks by a worker. They are stateless because any data that needs to be persisted is stored in a stateful database. For this reason, adding more concurrent processes is very simple. These processes can be independently scaled based on load or other metrics.

In the previous section we saw how to break down the config into components. This comes in very handy when you have different process types. Each type can have its own config, requiring only the components it needs, without expecting unused environment variables.

In the config/index.js file:

// config/index.js
'use strict'

const processType = process.env.PROCESS_TYPE

let config  
try {  
  config = require(`./${processType}`)
} catch (ex) {
  if (ex.code === 'MODULE_NOT_FOUND') {
    throw new Error(`No config for process type: ${processType}`)
  }

  throw ex
}

module.exports = config  

In the root index.js file we start the process selected with the PROCESS_TYPE environment variable:

// index.js
'use strict'

const processType = process.env.PROCESS_TYPE

if (processType === 'web') {  
  require('./web')
} else if (processType === 'twitter-stream-worker') {
  require('./worker/twitter-stream')
} else if (processType === 'social-preprocessor-worker') {
  require('./worker/social-preprocessor')
} else {
  throw new Error(`${processType} is an unsupported process type. Use one of: 'web', 'twitter-stream-worker', 'social-preprocessor-worker'!`)
}

The nice thing about this is that we still have one application, but we have managed to split it into multiple independent processes. Each of them can be started and scaled individually without influencing the other parts. You can achieve this without sacrificing your DRY codebase, because parts of the code, like the models, can be shared between the different processes.

How to organize your test files?

Place your test files next to the tested modules using some kind of naming convention, like <module_name>.spec.js and <module_name>.e2e.spec.js. Your tests should live together with the tested modules, keeping them in sync. It would be really hard to find and maintain the tests and the corresponding functionality when the test files are completely separated from the business logic.
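For example, a spec living next to a hypothetical sum.js module could look like the following, assuming a mocha-style runner that provides global describe/it functions:

// sum.spec.js - sits right next to sum.js
const assert = require('assert')
const sum = require('./sum')

describe('sum', () => {
  it('adds two numbers', () => {
    assert.strictEqual(sum(1, 2), 3)
  })
})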


A separated /test folder can hold all the additional test setup and utilities not used by the application itself.

Where to put your build and script files?

We tend to create a /scripts folder where we put our bash and node scripts for database synchronization, front-end builds and so on. This folder separates them from your application code and prevents you from putting too many script files into the root directory. List them in your npm scripts for easier usage.
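A hypothetical scripts section of the package.json wiring these up could look like this (the script names and file paths are made up for illustration):

{
  "scripts": {
    "start": "node .",
    "test": "mocha test/setup.js 'web/**/*.spec.js'",
    "db:sync": "node scripts/sync-db.js"
  }
}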

Conclusion

I hope you enjoyed this article on project structuring. I highly recommend checking out our previous article on the subject, where we laid out the 5 fundamentals of Node.js project structuring.

If you have any questions, please let me know in the comments. In the next chapter of the Node.js at Scale series, we’re going to dive deep into JavaScript clean coding. See you next week!


Writing Native Node.js Modules - Node.js at Scale

There are times when the performance of JavaScript is not enough, so you have to depend more on native Node.js modules.

While native extensions are definitely not a beginner topic, I'd recommend this article for every Node.js developer to get a bit of knowledge on how they work.


With Node.js at Scale we are creating a collection of articles focusing on the needs of companies with bigger Node.js installations, and developers who already learned the basics of Node.



Common use cases of native Node.js Modules

Knowledge of native modules comes in handy when you're adding a native extension as a dependency - which you may have already done without knowing!

Just take a look at the list of a few popular modules using native extensions. You’re using at least one of them, right?

There are a few reasons why one would consider writing native Node.js modules, including but not limited to:

  • Performance-critical applications: Let's be honest, Node.js is great for doing asynchronous I/O operations, but when it comes to real number-crunching, it's not that great of a choice.

  • Hooking into lower level (e.g.: operating-system) APIs

  • Creating a bridge between C or C++ libraries and Node.js

What are native modules?

Node.js Addons are dynamically-linked shared objects, written in C or C++, that can be loaded into Node.js using the require() function, and used just as if they were an ordinary Node.js module. - From the Node.js documentation

This means that (if done right) the quirks of C/C++ can be hidden from the module’s consumer. What they will see instead is that your module is a Node.js module - just like if you had written it in JavaScript.

As we've learned from previous blog posts, Node.js runs on the V8 JavaScript Engine, which is a C program on its own. We can write code that interacts directly with this C program in its own language, which is great because we can avoid a lot of expensive serialization and communication overhead.

Also, in a previous blogpost we've learnt about the cost of the Node.js Garbage Collector. Although garbage collection can be completely avoided if you decide to manage memory yourself (because C/C++ have no GC concept), doing so makes it much easier to create memory issues.

Writing native extensions requires knowledge of C/C++ and the V8 API, both of which have excellent documentation. If you're getting into this field, I'd recommend reading them.

Without further ado, let's begin:

Prerequisites

Linux:

  • python (v2.7 recommended, v3.x.x is not supported)
  • make
  • A proper C/C++ compiler toolchain, like GCC

Mac:

  • Xcode installed: make sure you not only install it, but also start it at least once and accept its terms and conditions - otherwise it won't work!

Windows:

  • Run cmd.exe as administrator and type npm install --global --production windows-build-tools - which will install everything for you.

    OR

  • Install Visual Studio (it has all the C/C++ build tools preconfigured)

    OR

  • Use the Linux subsystem provided by the latest Windows build. With that, follow the Linux instructions above.





Creating our native Node.js extension

Let’s create our first file for the native extension. We can either use the .cc extension, which means it’s C with classes, or the .cpp extension, which is the default for C++. The Google Style Guide recommends .cc, so I’m going to stick with it.

First, let’s see the file in whole, and after that, I’m going to explain it to you line-by-line!

#include <node.h>

const int maxValue = 10;  
int numberOfCalls = 0;

void WhoAmI(const v8::FunctionCallbackInfo<v8::Value>& args) {  
  v8::Isolate* isolate = args.GetIsolate();
  auto message = v8::String::NewFromUtf8(isolate, "I'm a Node Hero!");
  args.GetReturnValue().Set(message);
}

void Increment(const v8::FunctionCallbackInfo<v8::Value>& args) {  
  v8::Isolate* isolate = args.GetIsolate();

  if (!args[0]->IsNumber()) {
    isolate->ThrowException(v8::Exception::TypeError(
          v8::String::NewFromUtf8(isolate, "Argument must be a number")));
    return;
  }

  double argsValue = args[0]->NumberValue();
  if (numberOfCalls + argsValue > maxValue) {
    isolate->ThrowException(v8::Exception::Error(
          v8::String::NewFromUtf8(isolate, "Counter went through the roof!")));
    return;
  }

  numberOfCalls += argsValue;

  auto currentNumberOfCalls =
    v8::Number::New(isolate, static_cast<double>(numberOfCalls));

  args.GetReturnValue().Set(currentNumberOfCalls);
}

void Initialize(v8::Local<v8::Object> exports) {  
  NODE_SET_METHOD(exports, "whoami", WhoAmI);
  NODE_SET_METHOD(exports, "increment", Increment);
}

NODE_MODULE(module_name, Initialize)  

Now let's go through the file line-by-line!

#include <node.h>

#include in C++ is like require() in JavaScript: it pulls in everything from the given file. But instead of linking directly to the source, in C++ we have the concept of header files.

We can declare the exact interface in a header file without the implementation, and then include the implementations by their header file. The C++ linker will take care of linking the two together. Think of a header file as documentation of a file's contents that can be reused from your code.
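As a tiny illustration (not part of the addon above), a header file declares the interface and the implementation file includes it; the linker joins the two:

// greeter.h - declaration only
#ifndef GREETER_H
#define GREETER_H

const char* Greet();

#endif

// greeter.cc - implementation, compiled separately and linked together
#include "greeter.h"

const char* Greet() {
  return "Hello from C++!";
}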

void WhoAmI(const v8::FunctionCallbackInfo<v8::Value>& args) {  
  v8::Isolate* isolate = args.GetIsolate();
  auto message = v8::String::NewFromUtf8(isolate, "I'm a Node Hero!");
  args.GetReturnValue().Set(message);
}

Because this is going to be a native extension, the v8 namespace is available to use. Note the v8:: notation, which is used to access V8's interface. If you don’t want to include v8:: before every V8 type, you can add using namespace v8; to the top of the file. Then you can omit all the v8:: namespace specifiers from your types, but this can introduce name collisions in the code, so be careful. To be 100% clear, I’m going to use the v8:: notation for all of the V8 types in my code.

In our example code, we have access to the arguments the function was called with (from JavaScript), via the args object that also provides us with all of the call related information.

With v8::Isolate* we're gaining access to the current JavaScript scope for our function. Scopes work just like in JavaScript: we can assign variables and tie them to the lifetime of that specific code. We don't have to worry about deallocating these chunks of memory, because we allocate them as we would in JavaScript, and the Garbage Collector will automatically take care of them.

function () {  
 var a = 1;
} // SCOPE

Via args.GetReturnValue() we get access to the return value of our function. We can set it to anything we'd like, as long as it is from the v8:: namespace.

C++ has built-in types for storing integers and strings, but JavaScript only understands its own v8:: type objects. As long as we are in the scope of the C++ world, we are free to use the types built into C++, but when we're dealing with JavaScript objects and interoperability with JavaScript code, we have to transform C++ types into ones that are understood by the JavaScript context. These are the types that are exposed in the v8:: namespace, like v8::String or v8::Object.


Let’s look at the second method in our file, which increments a counter by a provided argument up to a cap of 10.

This function also accepts a parameter from JavaScript. When you’re accepting parameters from JavaScript, you have to be careful because they are loosely typed objects. (You’re probably already used to this in JavaScript.)

The arguments array contains v8::Objects, so they are all JavaScript objects - but be careful with them, because in this context we can never be sure what they might contain. We have to explicitly check the types of these objects. Fortunately, there are helper methods added to these classes to determine their type before typecasting.

To maintain compatibility with existing JavaScript code, we have to throw an error if the argument's type is wrong. To throw a type error, we have to create an Error object with the v8::Exception::TypeError() constructor. The following block will throw a TypeError if the first argument is not a number.

if (!args[0]->IsNumber()) {  
  isolate->ThrowException(v8::Exception::TypeError(
        v8::String::NewFromUtf8(isolate, "Argument must be a number")));
  return;
}

In JavaScript that snippet would look like:

if (typeof arguments[0] !== 'number') {
  throw new TypeError('Argument must be a number')
}

We also have to handle the case when our counter goes out of bounds. We can create a custom exception just like we would in JavaScript: new Error('error message'). With the v8 API it looks like this: v8::Exception::Error(v8::String::NewFromUtf8(isolate, "Counter went through the roof!")), where isolate is the current scope, a reference we first have to get via v8::Isolate* isolate = args.GetIsolate();.

double argsValue = args[0]->NumberValue();  
if (numberOfCalls + argsValue > maxValue) {  
  isolate->ThrowException(v8::Exception::Error(
        v8::String::NewFromUtf8(isolate, "Counter went through the roof!")));
  return;
}

After we've handled everything that could go wrong, we add the argument to the counter variable that's available in our C++ scope. That looks just like JavaScript code. To return the new value to the JavaScript code, we have to convert the C++ integer to a v8::Number that we can access from JavaScript: first we cast our integer to double with static_cast<double>(), then we pass the result to the v8::Number constructor.

auto currentNumberOfCalls =  
  v8::Number::New(isolate, static_cast<double>(numberOfCalls));

NODE_SET_METHOD is a macro that we use to assign a method on the exports object. This is the very same exports object that we're used to in JavaScript. That is the equivalent of:

exports.whoami = WhoAmI  

In fact, all Node.js addons must export an initialization function following this pattern:

void Initialize(v8::Local<v8::Object> exports);  
NODE_MODULE(module_name, Initialize)  

All C++ modules have to register themselves into the Node.js module system. Without these lines, you won’t be able to access your module from JavaScript. If you accidentally forget to register your module, it will still compile, but when you try to access it from JavaScript you’ll get the following exception:

module.js:597  
  return process.dlopen(module, path._makeLong(filename));
                 ^

Error: Module did not self-register.  

From now on when you see this error you’ll know what to do.

Compiling our native Node.js module

Now we have the skeleton of a C++ Node.js module ready, so let's compile it! The build tool we have to use is called node-gyp, and it comes bundled with npm by default. All we have to do is add a binding.gyp file which looks like this:

{
  "targets": [
    {
      "target_name": "addon",
      "sources": [ "example.cc" ]
    }
  ]
}

npm install will take care of the rest. You can also use node-gyp on its own by installing it globally on your system with npm install node-gyp -g.
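With a global install, you can also run the two build steps by hand:

$ node-gyp configure   # generates the platform-specific build files from binding.gyp
$ node-gyp build       # compiles the addon into build/Release/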

Now that we have the C++ part ready, the only thing remaining is to get it working from within our Node.js code. Calling these addons is seamless thanks to node-gyp. It’s just a require away.

const myAddon = require('./build/Release/addon')  
console.log(myAddon.whoami())  

This approach works, but it can get a little tedious to specify paths every time, and we all know that relative paths are hard to work with. There is a module to help us deal with this problem.

The bindings module is built to make require even less work for us. First, let's install it with npm install bindings --save, then make a small adjustment in the code snippet above. We can require the bindings module, and it will expose all the .node native extensions that we've specified in the binding.gyp file's target_name.

const myAddon = require('bindings')('addon')  
console.log(myAddon.whoami())  

These two ways of using the binding are equivalent.

This is how you create native bindings to Node.js and bridge it to JavaScript code. But there is one small problem: Node.js is constantly evolving, and the interface just tends to break a lot! This means that targeting a specific version might not be a good idea because your addon will go out of date fast.

Think ahead and use Native Abstractions for Node.js (NaN).

The NaN library started out as a third-party module written by independent individuals, but since late 2015 it has been an incubated project of the Node.js Foundation.

"Think ahead and use Native Abstractions for Node.js (NaN) in your native #nodejs modules" via @RisingStack

Click To Tweet

NaN provides us with a layer of abstraction on top of the Node.js API and creates a common interface across all versions. It’s considered a best practice to use NaN instead of the native Node.js interface, so you can always stay ahead of the curve.

To use NaN, we have to rewrite parts of our application. Let’s install it first with npm install nan --save. Then we have to add the following lines to the targets field in our binding.gyp. This makes it possible to include the NaN header file in our program and use NaN’s functions.

{
  "targets": [
    {
      "include_dirs" : [
        "<!(node -e \"require('nan')\")"
      ],
      "target_name": "addon",
      "sources": [ "example.cc" ]
    }
  ]
}

We can replace some of the v8 types with NaN’s abstractions in our sample application. It provides us with helper methods on the call arguments and makes working with v8 types a much better experience.

The first thing you’ll probably notice is that we no longer need explicit access to the JavaScript scope via v8::Isolate* isolate = args.GetIsolate(); - NaN handles that automatically for us. Its types hide the bindings to the current scope, so we don’t have to bother with them.

#include <nan.h>

const int maxValue = 10;  
int numberOfCalls = 0;

void WhoAmI(const Nan::FunctionCallbackInfo<v8::Value>& args) {  
  auto message = Nan::New<v8::String>("I'm a Node Hero!").ToLocalChecked();
  args.GetReturnValue().Set(message);
}

void Increment(const Nan::FunctionCallbackInfo<v8::Value>& args) {  
  if (!args[0]->IsNumber()) {
    Nan::ThrowError("Argument must be a number");
    return;
  }

  double argsValue = args[0]->NumberValue();
  if (numberOfCalls + argsValue > maxValue) {
    Nan::ThrowError("Counter went through the roof!");
    return;
  }

  numberOfCalls += argsValue;

  auto currentNumberOfCalls =
    Nan::New<v8::Number>(numberOfCalls);

  args.GetReturnValue().Set(currentNumberOfCalls);
}

void Initialize(v8::Local<v8::Object> exports) {  
  exports->Set(Nan::New("whoami").ToLocalChecked(),
      Nan::New<v8::FunctionTemplate>(WhoAmI)->GetFunction());
  exports->Set(Nan::New("increment").ToLocalChecked(),
      Nan::New<v8::FunctionTemplate>(Increment)->GetFunction());
}

NODE_MODULE(addon, Initialize)  

Now we have a working and idiomatic example of what a Node.js native extension should look like.

First we learned about structuring the code, then about the compilation process, then we went through the code itself line by line to understand every small piece of it. At the end, we looked at NaN’s abstractions over the v8 API.

There is one more small tweak we can make, and that is to use the provided macros of NaN.

Macros are snippets of code that the compiler will expand when compiling the code. More on macros can be found in this documentation. We had already been using one of these macros, NODE_MODULE, but NaN has a few others that we can include as well. These macros will save us a bit of time when creating our native extensions.

#include <nan.h>

const int maxValue = 10;  
int numberOfCalls = 0;

NAN_METHOD(WhoAmI) {  
  auto message = Nan::New<v8::String>("I'm a Node Hero!").ToLocalChecked();
  info.GetReturnValue().Set(message);
}

NAN_METHOD(Increment) {  
  if (!info[0]->IsNumber()) {
    Nan::ThrowError("Argument must be a number");
    return;
  }

  double infoValue = info[0]->NumberValue();
  if (numberOfCalls + infoValue > maxValue) {
    Nan::ThrowError("Counter went through the roof!");
    return;
  }

  numberOfCalls += infoValue;

  auto currentNumberOfCalls =
    Nan::New<v8::Number>(numberOfCalls);

  info.GetReturnValue().Set(currentNumberOfCalls);
}

NAN_MODULE_INIT(Initialize) {  
  NAN_EXPORT(target, WhoAmI);
  NAN_EXPORT(target, Increment);
}

NODE_MODULE(addon, Initialize)  

The first macro is NAN_METHOD, which saves us the burden of typing the long method signature; the compiler includes it for us when it expands the macro. Take note that if you use macros, you’ll have to use the naming provided by the macro itself - so instead of args, the arguments object is now called info, and we have to change that everywhere.

The next macro we used is NAN_MODULE_INIT, which provides the initialization function; instead of exports, it names its argument target, so we have to change that one as well.

The last macro is NAN_EXPORT, which sets our module's interface. Note that we cannot specify the objects' keys in this macro; it assigns them their respective names.

That would look like this in modern JavaScript:

module.exports = {  
  Increment,
  WhoAmI
}

If you’d like to use this with our previous example, make sure you change the function names to uppercase, like this:

'use strict'

const addon = require('./build/Release/addon.node')

console.log(`native addon whoami: ${addon.WhoAmI()}`)

for (let i = 0; i < 6; i++) {  
  console.log(`native addon increment: ${addon.Increment(i)}`)
}

For further documentation, refer to NaN’s GitHub page.

Example Repository

I’ve created a repository with all the code included in this post. The repository is under Git version control and available on GitHub. Each of the steps has its own branch: master is the first example, nan is the second one, and the final step’s branch is called macros.


Conclusion

I hope you had as much fun following along as I had writing about this topic. I’m not a C/C++ expert, but I’ve been doing Node.js long enough to be interested in writing my own super fast native addons and experimenting with a great language, namely C.

I’d highly recommend getting into at least a bit of C/C++ to understand the lower levels of the platform itself. You’ll surely find something of your interest. :)

As you can see, it’s not as scary as it looks at first glance, so go ahead and build something in C++, and tweet about it using @risingstack if you need help from us, or drop a comment below!

In the next part of the Node.js at Scale series, we'll take a look at Advanced Node.js Project Structuring.


Node.js Tutorial Videos: Debugging, Async, Memory Leaks, CPU Profiling

At RisingStack, we're continuously working on delivering Node.js tutorials to help developers overcome their biggest obstacles, and become even better, week-by-week.

In our recent Node.js survey we were told that debugging, understanding/using async programming, handling callbacks and memory leaks are amongst the greatest pain points one faces on the journey to become a Node Hero.

This is why we came up with the idea of a new video tutorial series called Owning Node.js.

In this three-part video series, we're going through all of these topics in a detailed way - by showing and explaining the actual coding process to you.

All of the videos are captioned, so you'll have no problem with understanding what's going on by enabling the subtitles!

So, let's start Owning Node.js together!


Node.js Debugging Made Easy

In this very first video, I'm going to show you how to use the debug module, the built-in debugger, and Chrome DevTools to find and fix issues easily!


Node.js Async Programming Done Right

In the second Node.js tutorial video, I'm going to show you how you can handle asynchronous operations easily, and how you can do performant applications in Node.js using them!


So, we are going to take a look at error handling with asynchronous operations, and learn how you can use the async module to handle multiple callbacks at the same time.


CPU and Memory Profiling with Node.js

In the 3rd Node.js tutorial of the series I teach you how to create CPU profiles and memory heapdumps, and how to analyze them in the Chrome DevTools profiler. You'll learn how to detect memory leaks and bottlenecks easily.


More Node.js tutorials: Announcing the Node Hero Program

I hope these videos made things clearer! If you'd like to keep getting better, I've got good news for you!

We're launching the NODE HERO program as of today, which contains further webinars and screencasts, live-coding sessions and access to our Node.js Debugging and Monitoring solution called Trace.

I highly recommend checking it out if you'd like to become an even better Node.js developer! See you there!


Node.js Garbage Collection Explained - Node.js at Scale

In this article, you are going to learn how Node.js garbage collection works, what happens in the background when you write code and how memory is freed up for you.

Ancient garbage collector in action

With Node.js at Scale we are creating a collection of articles focusing on the needs of companies with bigger Node.js installations, and developers who already learned the basics of Node.



Memory Management in Node.js Applications

Every application needs memory to work properly. Memory management provides ways to dynamically allocate memory chunks for programs when they request it, and free them when they are no longer needed - so that they can be reused.

Application-level memory management can be manual or automatic. The automatic memory management usually involves a garbage collector.

The following code snippet shows how memory can be allocated in C, using manual memory management:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {

   char name[20];
   char *description;

   strcpy(name, "RisingStack");

   // memory allocation
   description = malloc( 30 * sizeof(char) );

   if( description == NULL ) {
      fprintf(stderr, "Error - unable to allocate required memory\n");
   } else {
      strcpy( description, "Trace by RisingStack is an APM.");
   }

   printf("Company name = %s\n", name );
   printf("Description: %s\n", description );

   // release memory
   free(description);
}

In manual memory management, it is the responsibility of the developer to free up the unused memory portions. Managing your memory this way can introduce several major bugs to your applications:

  • Memory leaks when the used memory space is never freed up.
  • Wild/dangling pointers appear when an object is deleted, but the pointer is reused. Serious security issues can be introduced when other data structures are overwritten or sensitive information is read.

Luckily for you, Node.js comes with a garbage collector, and you don't need to manually manage memory allocation.

The Concept of the Garbage Collector

Garbage collection is a way of managing application memory automatically. The job of the garbage collector (GC) is to reclaim memory occupied by unused objects (garbage). It was first used in LISP in 1959, invented by John McCarthy.

The way the GC knows that objects are no longer in use is that no other object holds a reference to them.

"A garbage collector was first used in LISP in 1959, invented by John McCarthy." via @RisingStack

Click To Tweet

Memory before the garbage collection

The following diagram shows what the memory can look like when you have objects with references to each other, along with some objects that nothing references. These are the objects that can be collected by a garbage collector run.

Memory state before Node.js garbage collection

Memory after the garbage collection

Once the garbage collector has run, the unreachable objects get deleted and the memory space is freed up.

Memory state after Node.js garbage collection

The Advantages of Using a Garbage Collector

  • it prevents wild/dangling pointer bugs,
  • it won't try to free up space that was already freed up,
  • it will protect you from some types of memory leaks.

Of course, using a garbage collector doesn't solve all of your problems, and it’s not a silver bullet for memory management. Let's take a look at things that you should keep in mind!

"Using a garbage collector doesn't solve all of your memory management problems with #nodejs!" via @RisingStack

Click To Tweet


Things to Keep in Mind When Using a Garbage Collector

  • performance impact - in order to decide what can be freed up, the GC consumes computing power
  • unpredictable stalls - modern GC implementations try to avoid "stop-the-world" collections

Node.js Garbage Collection & Memory Management in Practice

The easiest way of learning is by doing - so I am going to show you what happens in the memory with different code snippets.

The Stack

The stack contains local variables and pointers to objects on the heap or pointers defining the control flow of the application.

In the following example, both a and b will be placed on the stack.

function add (a, b) {  
  return a + b
}

add(4, 5)  





The Heap

The heap is dedicated to storing reference-type objects, like strings or objects.

The Car object created in the following snippet is placed on the heap.

function Car (opts) {  
  this.name = opts.name
}

const LightningMcQueen = new Car({name: 'Lightning McQueen'})  

After this, the memory would look something like this:

Node.js Garbage Collection First Step - Object Placed in the Memory Heap

Let's add more cars, and see what our memory would look like!

function Car (opts) {  
  this.name = opts.name
}

const LightningMcQueen = new Car({name: 'Lightning McQueen'})  
const SallyCarrera = new Car({name: 'Sally Carrera'})  
const Mater = new Car({name: 'Mater'})  

Node.js Garbage Collection Second Step - More elements added to the heap

If the GC ran now, nothing could be freed up, as the root has a reference to every object.

Let's make it a little bit more interesting, and add some parts to our cars!

function Engine (power) {  
  this.power = power
}

function Car (opts) {  
  this.name = opts.name
  this.engine = new Engine(opts.power)
}

let LightningMcQueen = new Car({name: 'Lightning McQueen', power: 900})  
let SallyCarrera = new Car({name: 'Sally Carrera', power: 500})  
let Mater = new Car({name: 'Mater', power: 100})  

Node.js Garbage Collection - Assigning values to the objects in the heap

What would happen if we no longer used Mater, but redefined it and assigned some other value, like Mater = undefined?

Node.js Garbage Collection - Redefining values

As a result, the original Mater object cannot be reached from the root object, so on the next garbage collector run it will be freed up:

Node.js Garbage Collection - Freeing up the unreachable object

Now that we understand the expected behaviour of the garbage collector, let's take a look at how it is implemented in V8!

Garbage Collection Methods

In one of our previous articles we dealt with how the Node.js garbage collection methods work, so I strongly recommend reading that article.

Here are the most important things you’ll learn there:

New Space and Old Space

The heap has two main segments, the New Space and the Old Space. The New Space is where new allocations happen; it is fast to collect garbage here, and it has a size of ~1-8MB. Objects living in the New Space are called the Young Generation.

The Old Space is where the objects that survived collection in the New Space are promoted - they are called the Old Generation. Allocation in the Old Space is fast, but collection is expensive, so it is performed infrequently.

Young Generation

Usually, ~20% of the Young Generation survives into the Old Generation. Collection in the Old Space only commences once it is getting exhausted. To do so, the V8 engine uses two different collection algorithms.

Scavenge and Mark-Sweep collection

Scavenge collection is fast and runs on the Young Generation, while the slower Mark-Sweep collection runs on the Old Generation.
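If you'd like to watch these collections happening in your own application, Node.js exposes V8's --trace-gc flag; every Scavenge and Mark-sweep run is then logged with heap sizes and the time the collection took (app.js stands for your entry point):

$ node --trace-gc app.js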

A Real-Life Example - The Meteor Case-Study

In 2013, the creators of Meteor announced their findings about a memory leak they ran into. The problematic code snippet was the following:

var theThing = null  
var replaceThing = function () {  
  var originalThing = theThing
  var unused = function () {
    if (originalThing)
      console.log("hi")
  }
  theThing = {
    longStr: new Array(1000000).join('*'),
    someMethod: function () {
      console.log(someMessage)
    }
  };
};
setInterval(replaceThing, 1000)  

Well, the typical way that closures are implemented is that every function object has a link to a dictionary-style object representing its lexical scope. If both functions defined inside replaceThing actually used originalThing, it would be important that they both get the same object, even if originalThing gets assigned to over and over, so both functions share the same lexical environment. Now, Chrome's V8 JavaScript engine is apparently smart enough to keep variables out of the lexical environment if they aren't used by any closures - from the Meteor blog.
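One way to fix it - the one described in the Meteor post - is to break the reference explicitly at the end of replaceThing, so the previous theThing becomes unreachable and collectible (a sketch, with the logged message simplified):

var theThing = null
var replaceThing = function () {
  var originalThing = theThing
  var unused = function () {
    if (originalThing)
      console.log("hi")
  }
  theThing = {
    longStr: new Array(1000000).join('*'),
    someMethod: function () {
      console.log("hi")
    }
  }
  // break the closure's reference to the previous theThing,
  // so the old longStr chain can be garbage collected
  originalThing = null
}
setInterval(replaceThing, 1000)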



Next up

In the next chapter of the Node.js at Scale tutorial series we will take a deep dive into writing native Node.js modules.

In the meantime, let us know in the comments sections if you have any questions!


Graceful shutdown with Node.js and Kubernetes

This article helps you understand what graceful shutdown is, what its main benefits are, and how you can set up the graceful shutdown of a Kubernetes application. We’ll discuss how you can validate and benchmark this process, and which common mistakes you should avoid.

Graceful shutdown

We can speak about the graceful shutdown of our application when all of the resources it used and all of the traffic and/or data processing it handled are closed and released properly.

It means that no database connection remains open and no ongoing request fails because we stop our application.

"Graceful shutdown: when all of the resources & data processing are closed and released properly." via @RisingStack

Click To Tweet

Possible scenarios for a graceful web server shutdown:

  1. App gets notification to stop (receives SIGTERM)
  2. App lets the load balancer know that it’s not ready for new requests
  3. App serves all the ongoing requests
  4. App releases all of the resources correctly: DB, queue, etc.
  5. App exits with "success" status code (process.exit())

This article goes deep into shutting down web servers properly, but you should also apply these techniques to your worker processes: it’s highly recommended to stop consuming queues on SIGTERM and finish the current task/job.

Why is it important?

If we don't stop our application correctly, we are wasting resources like DB connections and we may also break ongoing requests. An HTTP request doesn't recover automatically - if we fail to serve it, then we simply missed it.

"If we don't stop our app correctly, we're wasting resources & may also break ongoing requests." via @RisingStack

Click To Tweet

Graceful start

We should only start our application when all of the dependencies and database connections are ready to handle our traffic.

Possible scenarios for a graceful web server start (a minimal sketch follows the list):

  1. App starts (npm start)
  2. App opens DB connections
  3. App listens on port
  4. App tells the load balancer that it’s ready for requests
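A minimal sketch of that startup order might look like this, where db is a hypothetical module exposing a promise-returning connect():

'use strict'

const http = require('http')
const db = require('./db') // hypothetical module with a promise-returning connect()

// open the DB connection before accepting any traffic
db.connect()
  .then(() => {
    // only start listening once the dependencies are ready
    const server = http.createServer((req, res) => res.end('ok'))
    server.listen(process.env.PORT, () => {
      // from this point on, the readiness probe can report "ready"
      console.info('App is listening and ready for requests')
    })
  })
  .catch((err) => {
    console.error('Could not connect to the database', err)
    process.exit(1)
  })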

Graceful shutdown in a Node.js application

First of all, you need to listen for the SIGTERM signal and catch it:

process.on('SIGTERM', function onSigterm () {  
  console.info('Got SIGTERM. Graceful shutdown start', new Date().toISOString())
  // start graceful shutdown here
  shutdown()
})

After that, you can close your server, then close your resources and exit the process:

function shutdown() {  
  server.close(function onServerClosed (err) {
    if (err) {
      console.error(err)
      process.exit(1)
    }

    closeMyResources(function onResourcesClosed (err) {
      // error handling
      process.exit()
    })
  })
}

Sounds easy right? Maybe a little bit too easy.

What about the load balancer? How will it know that your app is no longer ready to receive requests? What about keep-alive connections? Will they keep the server open for a longer time? What if my app gets a SIGKILL in the meantime?

Graceful shutdown with Kubernetes

If you’d like to learn a little bit about Kubernetes, you can read our Moving a Node.js app from PaaS to Kubernetes Tutorial. For now, let's just focus on the shutdown.

Kubernetes comes with a resource called Service. Its job is to route traffic to your pods (~instances of your app). Kubernetes also comes with a thing called Deployment that describes how your applications should behave during exit, scale and deploy - and you can also define a health check here. We will combine these resources for the perfect graceful shutdown and handover during new deploys at high traffic.

We would like to see throughput charts like below with consistent rpm and no deployment side effects at all:

Throughput metrics shown in Trace - no change at deploy

Ok, let's see how to solve this challenge.

Setting up graceful shutdown

In Kubernetes, for a proper graceful shutdown we need to add a readinessProbe to our application’s Deployment yaml and let the Service’s load balancer know during the shutdown that we will not serve more requests, so it should stop sending them. We can close the server, tear down the DB connections and exit only after that.
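As a sketch, the relevant part of the Deployment's container spec might look like the following; the /health path matches the health endpoint used later in this article, while the port and the probe timings are assumptions you should tune to your own app:

readinessProbe:
  httpGet:
    path: /health
    port: 3000
  periodSeconds: 2
  failureThreshold: 2
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  periodSeconds: 10
  failureThreshold: 3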

How does it work?

Kubernetes graceful shutdown flowchart

  1. Pod receives a SIGTERM signal because Kubernetes wants to stop it - because of a deploy, scale, etc.
  2. App (pod) starts to return 500 for GET /health to let the readinessProbe (Service) know that it's not ready to receive more requests.
  3. Kubernetes readinessProbe checks GET /health and after (failureThreshold * periodSeconds) it stops redirecting traffic to the app (because it continuously returns 500)
  4. App waits (failureThreshold * periodSeconds) before it starts to shut down - to make sure that the Service is getting notified via the readinessProbe failure
  5. App starts graceful shutdown
  6. App first closes the server with the live working DB connections
  7. App closes databases after the server is closed
  8. App exits the process
  9. Kubernetes force-kills the application after 30s (SIGKILL) if it's still running (in an optimal case it doesn't happen)

In our case, the Kubernetes livenessProbe won't kill the app before the graceful shutdown happens, because it needs to wait (failureThreshold * periodSeconds) to do it. This means that the livenessProbe threshold should be larger than the readinessProbe threshold: this way the graceful stop happens around 4 seconds after SIGTERM, while the force kill would happen 30 seconds after it.





How to achieve it?

For this, we need to do two things. First, we need to let the readinessProbe know after SIGTERM that we are not ready anymore:

'use strict'

const db = require('./db')  
const promiseTimeout = require('./promiseTimeout')  
const state = { isShutdown: false }  
const TIMEOUT_IN_MILLIS = 900

process.on('SIGTERM', function onSigterm () {  
  state.isShutdown = true
})

function get (req, res) {  
  // SIGTERM already happened
  // app is not ready to serve more requests
  if (state.isShutdown) {
    res.writeHead(500)
    return res.end('not ok')
  }

  // something cheap but tests the required resources
  // timeout because we would like to log before livenessProbe KILLS the process
  promiseTimeout(db.ping(), TIMEOUT_IN_MILLIS)
    .then(() => {
      // success health
      res.writeHead(200)
      return res.end('ok')
    })
    .catch(() => {
      // broken health
      res.writeHead(500)
      return res.end('not ok')
    })
}

module.exports = {  
  get: get
}

The second thing is that we have to delay the teardown process - as a sane default, you can use the time needed for two failed readinessProbe checks: failureThreshold: 2 * periodSeconds: 2 = 4 seconds.

const READINESS_PROBE_DELAY = 2 * 2 * 1000 // failureThreshold: 2, periodSeconds: 2 (see above)

process.on('SIGTERM', function onSigterm () {
  console.info('Got SIGTERM. Graceful shutdown start', new Date().toISOString())

  // Wait a little bit to give enough time for the Kubernetes readiness probe to fail
  // (we are not ready to serve more traffic)
  // Don't worry, the livenessProbe won't kill it until (failureThreshold: 3) => 30s
  setTimeout(gracefulStop, READINESS_PROBE_DELAY)
})

You can find the full example here:
https://github.com/RisingStack/kubernetes-graceful-shutdown-example

How to validate it?

Let's test our graceful shutdown by sending high traffic to our pods and releasing a new version in the meantime (recreating all of the pods).

Test case

$ ab -n 100000 -c 20 http://localhost:myport

Other than this, you need to change an environment variable in the Deployment to re-create all the pods during the ab benchmark.

AB output

Document Path:          /  
Document Length:        3 bytes

Concurrency Level:      20  
Time taken for tests:   172.476 seconds  
Complete requests:      100000  
Failed requests:        0  
Total transferred:      7800000 bytes  
HTML transferred:       300000 bytes  
Requests per second:    579.79 [#/sec] (mean)  
Time per request:       34.495 [ms] (mean)  
Time per request:       1.725 [ms] (mean, across all concurrent requests)  
Transfer rate:          44.16 [Kbytes/sec] received  

Application log output

Got SIGTERM. Graceful shutdown start 2016-10-16T18:54:59.208Z  
Request after sigterm: / 2016-10-16T18:54:59.217Z  
Request after sigterm: / 2016-10-16T18:54:59.261Z  
...
Request after sigterm: / 2016-10-16T18:55:00.064Z  
Request after sigterm: /health?type=readiness 2016-10-16T18:55:00.820Z  
HEALTH: NOT OK  
Request after sigterm: /health?type=readiness 2016-10-16T18:55:02.784Z  
HEALTH: NOT OK  
Request after sigterm: /health?type=liveness 2016-10-16T18:55:04.781Z  
HEALTH: NOT OK  
Request after sigterm: /health?type=readiness 2016-10-16T18:55:04.800Z  
HEALTH: NOT OK  
Server is shutting down... 2016-10-16T18:55:05.210Z  
Successful graceful shutdown 2016-10-16T18:55:05.212Z  

Benchmark result

Success!

Zero failed requests: you can see in the app log that the Service stopped sending traffic to the pod before we disconnected from the DB and killed the app.

Common gotchas

The following mistakes can still prevent your app from doing a proper graceful shutdown:

Keep-alive connections

Kubernetes doesn't handover keep-alive connections properly. :/

This means that requests from agents with a keep-alive header will still be routed to the pod.

It tricked me at first when I benchmarked with autocannon or Google Chrome (both use keep-alive connections).

Keep-alive connections prevent your server from closing in time. To force connections closed, you can use the server-destroy module; once it has run, you can be sure that all the ongoing requests are served. Alternatively, you can add timeout logic to your server.close(cb).
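A sketch of wiring in server-destroy (the port and handler are illustrative):

const http = require('http')
const enableDestroy = require('server-destroy')

const server = http.createServer((req, res) => res.end('ok'))
server.listen(3000)

// patch the server with a destroy() method that tracks open sockets
enableDestroy(server)

process.on('SIGTERM', () => {
  // destroy() closes the listener and tears down idle keep-alive
  // connections too, so shutdown isn't held up by lingering agents
  server.destroy(() => process.exit(0))
})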

Docker signaling

It’s quite possible that your application doesn't receive the signals correctly if it runs in a Docker container.

For example in our Alpine image: CMD ["node", "src"] works, CMD ["npm", "start"] doesn't. It simply doesn't pass the SIGTERM to the node process. The issue is probably related to this PR: https://github.com/npm/npm/pull/10868

An alternative is to use dumb-init to fix broken Docker signaling.

Takeaway

Always make sure that your application stops correctly: it releases all of the resources and helps hand over the traffic to the new version of your app.

Check out our example repository with Node.js and Kubernetes:
https://github.com/RisingStack/kubernetes-graceful-shutdown-example

"An app stops correctly if it releases all resources & hands over the traffic to your new app." via @RisingStack

Click To Tweet

If you have any questions or thoughts about this topic, find me in the comment section below!


How the module system, CommonJS & require works - Node.js at Scale

In the third chapter of Node.js at Scale, you are about to learn how the Node.js module system & CommonJS work, and what require does under the hood.

With Node.js at Scale we are creating a collection of articles focusing on the needs of companies with bigger Node.js installations, and developers who already learned the basics of Node.



CommonJS to the rescue

The JavaScript language didn’t have a native way of organizing code before the ES2015 standard. Node.js filled this gap with the CommonJS module format. In this article we will learn how the Node.js module system works, how you can organize your modules, and what the new ES standard means for the future of Node.js.

"#JavaScript didn't have a mature module system before #nodejs. That gap was filled with #commonjs" via @RisingStack

Click To Tweet

What is the module system?

Modules are the fundamental building blocks of the code structure. The module system allows you to organize your code, hide information and only expose the public interface of a component using module.exports. Every time you use the require call, you are loading another module.

The simplest example can be the following using CommonJS:

// add.js
function add (a, b) {  
  return a + b
}

module.exports = add  

To use the add module we have just created, we have to require it.

// index.js
const add = require('./add')

console.log(add(4, 5))  
// 9

Under the hood, add.js is wrapped by Node.js this way:

(function (exports, require, module, __filename, __dirname) {
  function add (a, b) {
    return a + b
  }

  module.exports = add
})

This is why you can access the global-like variables like require and module. It also ensures that your variables are scoped to your module rather than the global object.

"Modules are the fundamental building blocks of the code structure." via @RisingStack #nodejs

Click To Tweet

How does require work?

The module loading mechanism in Node.js caches modules on the first require call. This means that every time you use require('awesome-module') you get the same instance of awesome-module, which ensures that modules are singleton-like and have the same state across your application.
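You can observe this with a few lines of code - using the add module from above, both require calls hand back the very same object:

const add1 = require('./add')
const add2 = require('./add')

console.log(add1 === add2) // true - the same cached instance

// the cache itself is exposed, keyed by the resolved file path
console.log(require.cache[require.resolve('./add')].exports === add1) // true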

You can load native modules and path references from your file system, or installed modules. If the identifier passed to the require function is not a native module or a file reference (beginning with /, ../, ./ or similar), then Node.js will look for installed modules. It walks your file system looking for the referenced module in the node_modules folders, starting from the parent directory of your current module and moving up until it finds the right module or the root of the file system is reached.
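You can inspect this lookup path from any module via the module.paths array; the exact entries depend on where your file lives on disk:

// prints the node_modules directories Node.js would search, innermost first
console.log(module.paths)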





Require under the hood - module.js

The module dealing with module loading in the Node core is called module.js, and can be found in lib/module.js in the Node.js repository.

The most important functions to check here are the _load and _compile functions.

Module._load

This function checks whether the module is in the cache already - if so, it returns the exports object.

If the module is native, it calls the NativeModule.require() with the filename and returns the result.

Otherwise, it creates a new module for the file and saves it to the cache. Then it loads the file contents before returning its exports object.

Module._compile

The compile function runs the file contents in the correct scope or sandbox, and exposes helper variables like require, module or exports to the file.

How Require Works - from James M. Snell

How to organize the code?

In our applications, we need to find the right balance of cohesion and coupling when creating modules. The desirable scenario is to achieve high cohesion and loose coupling of the modules.

A module must be focused only on a single part of the functionality to have high cohesion. Loose coupling means that the modules should not have a global or shared state. They should only communicate by passing parameters, and they are easily replaceable without touching your broader codebase.

"The desirable scenario is to achieve high cohesion and loose coupling of the modules." via @RisingStack #nodejs

Click To Tweet

We usually export named functions or constants in the following way:

'use strict'

const CONNECTION_LIMIT = 0

function connect () { /* ... */ }

module.exports = {  
  CONNECTION_LIMIT,
  connect
}

What’s in your node_modules?

The node_modules folder is the place where Node.js looks for modules. npm v2 and npm v3 install your dependencies differently. You can find out what version of npm you are using by executing:

npm --version  

npm v2

npm v2 installs dependencies in a nested way: each of your primary dependencies gets its own node_modules folder containing its secondary dependencies.

npm v3

npm v3 attempts to flatten these secondary dependencies and install them in the root node_modules folder. As a consequence, you can't tell by looking at node_modules which packages are your explicit and which are your transitive dependencies. The installation order can even change your folder structure, because npm v3 is non-deterministic in this regard.
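
For example, if your app depends on package A, and A depends on B, the trees look like this:

npm v2 (nested):

.
`-- node_modules
    `-- A
        `-- node_modules
            `-- B

npm v3 (flattened):

.
`-- node_modules
    |-- A
    `-- B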


You can make sure that your node_modules directory is always the same by installing packages only from a package.json: npm then installs your dependencies in alphabetical order, so you always get the same folder tree. This matters because modules are cached using their resolved path as the lookup key, and each package can have its own child node_modules folder - which may result in multiple instances of the same package, and of the same module, in one process.

How to handle your modules?

There are two main ways of wiring modules. One of them is using hard coded dependencies, explicitly loading one module into another with a require call. The other is the dependency injection pattern, where we pass the components as parameters, or have a global container (known as an IoC, or Inversion of Control container) that centralizes the management of the modules.

With hard coded module loading, we let Node.js manage the modules' life cycle. It organizes your packages in an intuitive way, which makes understanding and debugging easy.

Dependency injection is rarely used in a Node.js environment, although it is a useful concept. The DI pattern can result in improved decoupling of the modules: instead of a module explicitly defining its dependencies, it receives them from the outside, so they can easily be replaced with other modules implementing the same interface.

Let’s see an example for DI modules using the factory pattern:

class Car {  
  constructor (options) {
    this.engine = options.engine
  }

  start () {
    this.engine.start()
  }
}

function create (options) {  
  return new Car(options)
}

module.exports = create  
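
The caller decides which engine to inject - for example a real engine in production and a fake one in tests. A small sketch, assuming the factory above is saved as car.js and the engine module names are hypothetical:

// in production code
const createCar = require('./car')
const engine = require('./v8-engine')

const car = createCar({ engine })
car.start()

// in a test, inject anything with the same interface
const testCar = createCar({ engine: { start: () => console.log('fake start') } })
testCar.start() // logs 'fake start' without a real engine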

The ES2015 module system

As we saw above, the CommonJS module system uses runtime evaluation of the modules, wrapping them in a function before execution. ES2015 modules don't need to be wrapped, since the import/export bindings are created before the module is evaluated. This incompatibility is one reason why, at the time of writing, no JavaScript runtime supports ES modules natively. There has been a lot of discussion about the topic, and a proposal is in DRAFT state, so hopefully we will get support in future Node versions.
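
For comparison, here is what the add module from the beginning of this chapter would look like with the standard ES2015 syntax - a sketch only, since, as noted above, you would need a transpiler to run it today:

// add.js - ES2015 module syntax
export function add (a, b) {
  return a + b
}

// index.js
import { add } from './add'

console.log(add(4, 5)) // 9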

To read an in-depth explanation of the biggest differences between CommonJS and ESM, read the following article by James M. Snell.


Next up

I hope this article contained valuable information about the module system and how require works. If you have any questions or insights on the topic, please share them in the comments. In the next chapter of the Node.js at Scale series, we are going to take a deep dive and learn about the event loop.


Experimenting With async/await in Node.js 7 Nightly

A couple of months ago async/await landed in V8, the JavaScript engine. In the meantime, V8 was updated multiple times in Node.js, and the latest nightly build finally added the V8 version that supports the async/await functionality to Node.js.

Disclaimer: the async/await functionality is only available in the nightly, unstable version of Node.js. Do not use it in production for now.

What's async/await?

First, let's see how you do async operations with Promises! This little example shows how you can fetch data using the Fetch API and Promises.

function getTrace () {  
  return fetch('https://trace.risingstack.com', {
    method: 'get'
  })
}

getTrace()
  .then((response) => console.log(response))
  .catch((err) => console.error(err))

With async/await, you can await Promises. await pauses the execution of the async function in a non-blocking way: it waits for the Promise to settle and returns the resolved value. If the Promise is rejected, the rejected value is thrown instead, so it can be caught with a try/catch block.

The previous example rewritten with async/await would look something like this:

async function getTrace () {  
  let pageContent
  try {
    pageContent = await fetch('https://trace.risingstack.com', {
      method: 'get'
    })
  } catch (ex) {
    console.error(ex)
  }

  return pageContent
}

getTrace()
  .then((pageContent) => console.log(pageContent))

For more information on async/await, I recommend reading the following resources:

  • Using async/await without transpilers

Installing Node 7

To get started, you have to get the latest build of Node.js first. To do so, head over to the Nightly builds page and grab the latest v7 build.

Once you have downloaded it, unpack it - and you are ready to use it!


If you are using nvm, you can try to install it this way:

NVM_NODEJS_ORG_MIRROR=https://nodejs.org/download/nightly  
nvm install 7  
nvm use 7  

Running files with async/await

Let's create a simple JavaScript file that delays the execution of a function by wrapping a setTimeout call in a Promise, so it can be awaited.

// app.js
const timeout = function (delay) {
  return new Promise((resolve) => {
    setTimeout(resolve, delay)
  })
}

async function timer () {
  console.log('timer started')
  await timeout(100)
  console.log('timer finished')
}

timer()

Once you have this file, you could try running it with:

node app.js  

However, it won't work. The async/await support is still behind a flag. To run it, you have to use:

node --harmony-async-await app.js  





Building a web server with async/await

As of v2, Koa supports async functions as middleware. Previously, this was only possible with transpilers - now it works out of the box!

You can simply pass an async function as a Koa middleware:

// app.js
const Koa = require('koa')  
const app = new Koa()

app.use(async (ctx, next) => {  
  const start = new Date()
  await next()
  const ms = new Date() - start
  console.log(`${ctx.method} ${ctx.url} - ${ms}ms`)
})

app.use(ctx => {  
  ctx.body = 'Hello Koa'
})

app.listen(3000)  
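
Since the middlewares are plain async functions, you can also handle errors from downstream middlewares with an ordinary try/catch. A minimal sketch of an error-handler middleware, registered first so it wraps everything after it:

// error handler: catches whatever the downstream middlewares throw
app.use(async (ctx, next) => {
  try {
    await next() // run the downstream middlewares
  } catch (err) {
    ctx.status = err.status || 500
    ctx.body = 'Internal Server Error'
  }
})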

Once you have a working server written using Koa, you can simply start it with:

node --harmony-async-await app.js  

When to start using it?

Node.js v8, the next stable version shipping the V8 release that enables async/await, is expected in April 2017. Until then, you can experiment with it using the unstable Node.js v7 branch.

"#nodejs 8 will enable JavaScript V8 async/await operations. It will be released in April 2017." via @RisingStack

Click To Tweet


JavaScript Garbage Collection Improvements - Orinoco

This article summarizes the three main effects of the new V8 upgrade (5.0) on JavaScript garbage collection.

In the latest releases of Node.js, the V8 JavaScript engine was upgraded to version 5.0. Along with new ES2015 features, it brings three major improvements for the garbage collector. These changes lay the groundwork for a new garbage collector in V8, code-named Orinoco.

V8 implements a generational garbage collector - meaning it has a memory segment called new space for the young generation, and has an old space for the old generation. New objects are allocated in the new space, and if they survive two garbage collections in the new space, they are moved to the old space.
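
You can take a peek at these spaces from Node.js itself with the built-in v8 module - a small sketch; the numbers will vary between runs:

// heap-spaces.js
const v8 = require('v8')

v8.getHeapSpaceStatistics()
  .filter((space) => ['new_space', 'old_space'].includes(space.space_name))
  .forEach((space) => {
    console.log(`${space.space_name}: ${space.space_used_size} bytes used`)
  })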

#1: Parallelised JavaScript Garbage Collection

The problem with having two spaces is that moving objects between them is expensive: they need to be copied over, and the pointers to them must be updated. Moving objects out of the young generation is called evacuation, while moving objects within the old generation to defragment it is called compaction.

Since there are no dependencies between young generation evacuation and old generation compaction, the garbage collector can perform them in parallel, reducing compaction time by 75%: from ~7ms to under 2ms on average.

(Figure: parallelized garbage collection - from the JavaScript Garbage Collection post)

"The Parallelised Garbage Collector can reduce compaction time by 75%" via @RisingStack #javascript #nodejs

Click To Tweet

#2: Track Pointers Improvement

When an object is moved on the heap, the garbage collector has to update all the pointers to it - but first, it has to find all the pointers to the old location. For this, V8 uses a data structure called a remembered set to keep track of interesting pointers in the heap. A pointer is classified as interesting if the object it points to may be moved by the next garbage collection run:

  • pointers to objects in the young generation, which may be moved to the old generation,
  • pointers to objects in fragmented pages, as objects may be moved to other pages during compaction

In older versions of V8, remembered sets were implemented using store buffers, which contain the addresses of all incoming pointers. The problem is that this may result in duplicated entries: a store buffer may end up containing a pointer multiple times, and two store buffers may hold the same pointer. This would make parallelizing the pointer updates really complex.





Instead of dealing with this extra complexity, Orinoco removes it by reorganizing the remembered set: each page now stores the offsets of the interesting pointers originating from that page. With this technique, parallel updates of the pointers become feasible.

This has a huge performance impact: it can reduce the maximum pause time of compacting garbage collection by 40%.

#3: Black Allocation

Black allocation, introduced by Orinoco, is part of the marking phase of the garbage collector. With it, objects allocated in the old generation are marked black instantly, meaning they are "live" objects. The rationale is that objects allocated in the old space are likely to live longer, so they should survive the next garbage collection. Objects colored black are not visited by the garbage collector - they are placed on black pages of the old space.

It speeds up garbage collection thanks to faster marking progress, as well as less garbage collection work in general.


For more information on V8 updates, follow the V8 project's blog.

Let me know if you have any questions or additional thoughts in the comments!


Moving a Node.js app from PaaS to Kubernetes Tutorial

From this Kubernetes tutorial, you can learn how to move a Node.js app from a PaaS provider while achieving lower response times, improving security and reducing costs.


Before we jump into the story of why and how we migrated our services to Kubernetes, it's important to mention that there is nothing wrong with using a PaaS. PaaS is perfect to start building a new product, and it can also turn out to be a good solution as an application advances - it always depends on your requirements and resources.

PaaS

Trace by RisingStack, our Node.js monitoring solution, was running on one of the biggest PaaS providers for more than half a year. We chose a PaaS over other solutions because we wanted to focus on the product instead of the infrastructure.

Our requirements were simple; we wanted to have:

  • fast deploys,
  • simple scaling,
  • zero-downtime deployments,
  • rollback capabilities,
  • environment variable management,
  • various Node.js versions,
  • and "zero" DevOps.

What we didn't want to have, but got as a side effect of using PaaS:

  • big network latencies between services,
  • lack of VPC,
  • response time peaks because of the multitenancy,
  • larger bills (pay for every single process, no matter how small it is: clock, internal API, etc.).

Since Trace is developed as a group of microservices, you can imagine how quickly the network latency and the billing started to hurt us.

Kubernetes tutorial

From our PaaS experience, we knew that we were looking for a solution that requires very little DevOps effort but provides a similar flow for our developers. We didn't want to lose any of the advantages I mentioned above - however, we wanted to fix the outstanding issues.

We were looking for an infrastructure that is configuration-based, which anyone on the team can modify.

Kubernetes, with its configuration-focused, container-based and microservices-friendly nature, convinced us.

"Kubernetes convinced us with its configuration-focused, microservices friendly nature" via @RisingStack #kubernetes

Click To Tweet

Let me show you what I mean by these "buzzwords" in the upcoming sections.

What is Kubernetes?

Kubernetes is an open-source system for automating deployments, scaling, and management of containerized applications - kubernetes.io

I don't want to give a very deep intro about the Kubernetes elements here, but you need to know the basic ones for the upcoming parts of this post.

My definitions won't be 100% correct, but you can think of it as a PaaS to Kubernetes dictionary:

  • pod: your running containerized application together with its environment variables, disks, etc.; pods are born and die quickly, e.g. at deploys,

    • in PaaS: ~the currently running app
  • deployment: the configuration of your application, describing the state you need (CPU, memory, env. vars, Docker image version, disks, number of running instances, deploy strategy, etc.) - see the minimal manifest sketch after this list:

    • in PaaS: ~app settings
  • secret: lets you separate your credentials from the environment variables,

    • in PaaS: doesn't exist; think of it as shared, separately stored secret environment variables for DB credentials etc.
  • service: exposes your running pods by label(s) to other apps or to the outside world on the desired IP and port

    • in PaaS: a built-in, non-configurable load balancer
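
To make the deployment analogy concrete, here is a minimal, hypothetical manifest sketch - the names and values are ours, and the apiVersion reflects Kubernetes at the time of writing:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: foo
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: foo
    spec:
      containers:
      - name: foo
        image: foo:b37d759
        env:
        - name: PROCESS_TYPE
          value: server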

How to set up a running Kubernetes cluster?

You have several options here. The easiest one is to create a Container Engine cluster in Google Cloud, which is hosted Kubernetes. It's also well integrated with other Google Cloud components, like load balancers and disks.

You should also know that Kubernetes can run anywhere: AWS, DigitalOcean, Azure, etc. For more information, check out the CoreOS Kubernetes tools.

Running the application

First, we have to prepare our application to work well with Kubernetes in a Docker environment.

If you are looking for a tutorial on how to start an app from scratch with Kubernetes check out their 101 tutorial.

Node.js app in Docker container

Kubernetes is Docker-based, so first we need to containerize our application. If you are not sure how to do that, check out our previous post: Dockerizing Your Node.js Application

If you are a private NPM user, you will also find this one helpful: Using the Private NPM Registry from Docker

"Procfile" in Kubernetes

We create one Docker image for every application (Git repository). If the repository contains multiple processes, like a server, a worker and a clock, we choose between them with an environment variable - a minimal sketch of this dispatch follows below. You may find this strange, but we don't want to build and push multiple Docker images from the very same source code; it would slow down our CI.
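
A minimal sketch of that dispatch, assuming a PROCESS_TYPE environment variable and one entry file per process (the names are illustrative):

// index.js - one image, multiple process types
switch (process.env.PROCESS_TYPE) {
  case 'server':
    require('./server')
    break
  case 'worker':
    require('./worker')
    break
  case 'clock':
    require('./clock')
    break
  default:
    throw new Error(`${process.env.PROCESS_TYPE} is not a supported process type`)
}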

Environments, rollback, and service-discovery

Staging, production

During our PaaS period, we named our services like trace-foo and trace-foo-staging; the only difference between the staging and the production application was the name prefix and the environment variables. In Kubernetes, it's possible to define namespaces. Namespaces are completely independent of each other and don't share any resources like secrets, configs, etc.

$ kubectl create namespace production
$ kubectl create namespace staging

Application versions

In a containerized infrastructure, each application version should be a different container image with a tag. We use the Git short hash as a Docker image tag.

foo:b37d759  
foo:f53a7cb  

To deploy a new version of your application, you only need to change the image tag in your application's deployment config; Kubernetes will do the rest.

(Figure: the deploy flow)

Any change in your deployment file is versioned, and you can roll back to any revision anytime.

$ kubectl rollout history deployment/foo
deployments "foo":  
REVISION    CHANGE-CAUSE  
1           kubectl set image deployment/foo foo=foo:b37d759  
2           kubectl set image deployment/foo foo=foo:f53a7cb  
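
Rolling back to one of these revisions is a single built-in command:

$ kubectl rollout undo deployment/foo --to-revision=1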

During our deploy process, we only replace Docker image tags, which is quite fast - it only takes a couple of seconds.

Service discovery

Kubernetes has a simple, built-in service discovery solution: the created services expose their hostname and port as environment variables to each pod.

const fooServiceUrl = `http://${process.env.FOO_SERVICE_HOST}:${process.env.FOO_SERVICE_PORT}`  

If you don't need advanced discovery, you can just start using it, instead of copying your service URLs to each other's environment variables. Kind of cool, isn't it?

Production ready application

The really challenging part of jumping into a new technology is knowing what you need to be production ready. In the following sections, we will check what you should consider setting up in your app.

Zero downtime deployment and failover

Kubernetes can update your application in a way that always keeps some pods running, rolling out your changes in smaller steps instead of stopping and starting all of them at the same time.

This is not just helpful for zero-downtime deploys; it also avoids killing your whole application when you misconfigure something: the mistake stops escalating to all of the running pods once Kubernetes detects that your new pods are unhealthy.

Kubernetes supports several strategies to deploy your applications. You can check them in the Deployment strategy documentation.

Graceful stop

This is not strictly related to Kubernetes, but it's impossible to have a good application lifecycle without starting and stopping your process properly.

Start server

const server = MyServer()
Promise.all([
  db1.connect(),
  db2.connect()
])
  .then(() => server.listen(3000))

Graceful server stop

process.on('SIGTERM', () => {
  server.close()
    .then(() => Promise.all([
      db1.disconnect(),
      db2.disconnect()
    ]))
    .then(() => process.exit(0))
    .catch(() => process.exit(-1))
})

Liveness probe (health check)

In Kubernetes, you should define a health check (liveness probe) for your application. With it, Kubernetes can detect when your application needs to be restarted.

Web server health check

You have multiple options to check your application's health, but I think the easiest one is to create a GET /healthz endpoint and check your application's logic / DB connections there. It's important to mention that every application is different; only you can know what checks are necessary to make sure it's working.

app.get('/healthz', function (req, res, next) {
  // check my health
  // -> return next(new Error('DB is unreachable'))
  res.sendStatus(200)
})

The matching liveness probe in the deployment config:

livenessProbe:
    httpGet:
      # Path to probe; should be cheap, but representative of typical behavior
      path: /healthz
      port: 3000
    initialDelaySeconds: 30
    timeoutSeconds: 1

Worker health check

For our workers, we also set up a very small HTTP server with the same /healthz endpoint, checking different, worker-specific criteria with the same liveness probe. We do this to have company-wide consistent health check endpoints - a minimal sketch follows below.
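
A minimal sketch of such a tiny health server, using only the core http module (the actual checks depend on your worker):

// worker health check endpoint
const http = require('http')

http.createServer((req, res) => {
  if (req.url === '/healthz') {
    // check queue connection, time since the last processed job, etc.
    return res.end('OK')
  }
  res.statusCode = 404
  res.end()
}).listen(3000)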

Readiness probe

The readiness probe is similar to the liveness probe (health check), but it only makes sense for web servers. It tells the Kubernetes service (~load balancer) that traffic can be routed to the specific pod.

It is essential for avoiding service disruption during deploys and other issues.

readinessProbe:  
    httpGet:
      # You can reuse /healthz here, or define a different endpoint
      path: /healthz
      port: 3000
    initialDelaySeconds: 30
    timeoutSeconds: 1

Logging

For logging, you can choose from different approaches, like adding sidecar containers to your application that collect your logs and send them to a custom logging solution, or you can go with the built-in Google Cloud one. We selected the built-in one.

To be able to parse the built-in log levels (severity) on Google Cloud, you need to log in a specific format. You can achieve this easily with the winston-gke module.

// setup logger
const logger = require('winston')
const winstonGke = require('winston-gke')
logger.remove(logger.transports.Console)
winstonGke(logger, config.logger.level)

// usage
logger.info('I\'m a potato', { foo: 'bar' })
logger.warning('So warning')
logger.error('Such error')
logger.debug('My debug log')

If you log in this format, Kubernetes automatically merges your log messages with the container, deployment, etc. meta information, and Google Cloud shows them in the right format.

Your application's first log message has to be in the right format; otherwise it won't be parsed correctly.

To achieve this, we run npm start in silent mode, npm start -s, in our Dockerfile: CMD ["npm", "start", "-s"]

Monitoring

We monitor our applications with Trace, which is optimized from scratch to monitor and visualize microservice architectures. The service map view of Trace helped us a lot during the migration to understand which application communicates with which one, and what the database and external dependencies are.

(Figure: services in our infrastructure)

Since Trace is environment independent, we didn't have to change anything in our codebase, and we could use it to validate the migration and our expectations about the positive performance changes.

(Figure: stable and fast response times after the migration)

Example

Check out our example repository, which puts all of this together, for Node.js with Kubernetes and CircleCI:
https://github.com/RisingStack/kubernetes-nodejs-example

Tooling

Continuous deployment with CI

It's possible to update your Kubernetes deployment with a JSON patch, or by updating only the image tag. After you have a working kubectl on your CI machine, you only need to run this command:

$ kubectl --namespace=staging set image deployment/foo foo=foo:GIT_SHORT_SHA

Debugging

In Kubernetes, it's possible to run a shell inside any container; it's this easy:

$ kubectl get pod

NAME           READY     STATUS    RESTARTS   AGE  
foo-37kj5   1/1       Running   0          2d

$ kubectl exec foo-37kj5 -i -t -- sh
# whoami       
root  

Another useful thing is to check the pod events with:

$ kubectl describe pod foo-37kj5

You can also get the log messages of any pod with:

$ kubectl logs foo-37kj5

Code piping

At our PaaS provider, we liked code piping between the staging and the production infrastructure. In Kubernetes, we missed this, so we built our own solution.

It's a simple npm library that reads the current image tag from staging and sets it on the production deployment config.

Because the Docker container is the very same, only the environment variables change.
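
Under the hood, this boils down to something like the following kubectl sketch - an approximation of what the library does, assuming the container is named foo:

$ TAG=$(kubectl --namespace=staging get deployment foo \
    -o jsonpath='{.spec.template.spec.containers[0].image}')
$ kubectl --namespace=production set image deployment/foo foo=$TAG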

SSL termination (https)

Kubernetes services are not exposed over https by default, but you can easily change this. To do so, read up on how to expose your applications with TLS in Kubernetes.

Conclusion

To summarize our experience with Kubernetes: we are very satisfied with it.

"Kubernetes helped us to improve our response times and failover + reduced our costs" via @RisingStack #kubernetes

Click To Tweet

We improved our application's response times in our microservice architecture, and managed to raise security to the next level with a private network (VPC) between the apps.

We also reduced our costs and improved failover with the built-in rolling update strategy and the liveness and readiness probes.

If you are at a point where you need to think about your infrastructure's future, you should definitely take Kubernetes into consideration!

If you have questions about migrating to Kubernetes from a PaaS, feel free to post them in the comment section.