Node.js Async Best Practices & Avoiding Callback Hell - Node.js at Scale

In this post, we cover what tools and techniques you have at your disposal when handling Node.js asynchronous operations: async.js, promises, generators and async functions.

After reading this article, you’ll know how to avoid the despised callback hell!


Node.js at Scale is a collection of articles focusing on the needs of companies with bigger Node.js installations and advanced Node developers.


Asynchronous programming in Node.js

Previously, we gathered a strong understanding of asynchronous programming in JavaScript and learned how the Node.js event loop works.

If you did not read these articles, I highly recommend them as introductions!

The Problem with Node.js Async

Node.js itself is single-threaded, but some tasks can run in parallel thanks to its asynchronous nature.

But what does running in parallel mean in practice?

Since we program a single-threaded VM, it is essential that we do not block execution by waiting for I/O, but handle such operations concurrently with the help of Node.js's event-driven APIs.
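To make the difference tangible, here is a minimal sketch using the built-in fs module (the file path is just an example): the synchronous call blocks the whole process until the disk answers, while the asynchronous variant registers a callback and lets the event loop keep working in the meantime.

const fs = require('fs')

// Blocking: nothing else can run while the file is being read
const blockingRead = fs.readFileSync('./package.json', 'utf8')
console.log(blockingRead.length)

// Non-blocking: the callback runs once the I/O has finished,
// so the event loop is free to handle other work in the meantime
fs.readFile('./package.json', 'utf8', (err, contents) => {
  if (err) return console.error(err)
  console.log(contents.length)
})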

Let’s take a look at some fundamental patterns, and learn how we can write resource efficient, non-blocking code, with the built-in solutions of Node.js and some third-party libraries.

The Classical Approach - Callbacks

Let's take a look at these simple async operations. They do nothing special, just fire a timer and call a function once the timer finishes.

function fastFunction (done) {  
  setTimeout(function () {
    done()
  }, 100)
}

function slowFunction (done) {  
  setTimeout(function () {
    done()
  }, 300)
}

Seems easy, right?

These higher-order functions can be executed sequentially or in parallel with the basic "pattern" of nesting callbacks - but using this method can lead to untameable callback hell.

function runSequentially (callback) {  
  fastFunction((err, data) => {
    if (err) return callback(err)
    console.log(data)   // the result of fastFunction

    slowFunction((err, data) => {
      if (err) return callback(err)
      console.log(data) // the result of slowFunction

      // here you can continue running more tasks
    })
  })
}
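The example above runs the two functions one after the other. For the parallel case, a rough sketch (reusing the fastFunction and slowFunction defined above) only needs a counter that fires the final callback once every task has reported back:

function runInParallel (callback) {
  let remaining = 2
  let finished = false

  function onDone (err) {
    if (finished) return
    if (err) {
      finished = true
      return callback(err)
    }
    remaining = remaining - 1
    if (remaining === 0) {
      finished = true
      callback()
    }
  }

  // both timers start immediately, so this takes ~300ms instead of ~400ms
  fastFunction(onDone)
  slowFunction(onDone)
}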


Avoiding Callback Hell with Control Flow Managers

To become an efficient Node.js developer, you have to avoid the constantly growing indentation level, produce clean and readable code and be able to handle complex flows.

Let me show you some of the libraries we can use to organize our code in a nice and maintainable way!




#1: Meet the Async Module

Async is a utility module which provides straight-forward, powerful functions for working with asynchronous JavaScript.

Async implements common patterns for asynchronous flow control, respecting the error-first callback convention.

Let's see how our previous example looks using async!

async.waterfall([fastFunction, slowFunction], () => {  
  console.log('done')
})

What kind of witchcraft just happened?

Actually, there is no magic to reveal. You can easily implement your own async job-runner that runs the tasks and waits for each of them to finish.

Let's take a look at what async does under the hood!

// taken from https://github.com/caolan/async/blob/master/lib/waterfall.js
function(tasks, callback) {  
    callback = once(callback || noop);
    if (!isArray(tasks)) return callback(new Error('First argument to waterfall must be an array of functions'));
    if (!tasks.length) return callback();
    var taskIndex = 0;

    function nextTask(args) {
        if (taskIndex === tasks.length) {
            return callback.apply(null, [null].concat(args));
        }

        var taskCallback = onlyOnce(rest(function(err, args) {
            if (err) {
                return callback.apply(null, [err].concat(args));
            }
            nextTask(args);
        }));

        args.push(taskCallback);

        var task = tasks[taskIndex++];
        task.apply(null, args);
    }

    nextTask([]);
}

Essentially, a new callback is injected into the functions, and this is how async knows when a function is finished.
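To make that concrete, here is a stripped-down sketch of such a runner - an illustration only, not the real async implementation; unlike the real waterfall, it does not pass intermediate results along:

function miniWaterfall (tasks, done) {
  function next (index, err) {
    // stop on the first error, or when every task has finished
    if (err || index === tasks.length) return done(err)
    // inject our own callback into the current task
    tasks[index]((taskErr) => next(index + 1, taskErr))
  }
  next(0)
}

miniWaterfall([fastFunction, slowFunction], () => {
  console.log('done')
})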

#2: Using co - generator based flow-control for Node.js

In case you don't want to stick to the error-first callback protocol, co can be a good choice for you.

co is a generator based control flow tool for Node.js and the browser, using promises, letting you write non-blocking code in a nice-ish way.

co is a powerful alternative which takes advantage of generator functions tied with promises without the overhead of implementing custom iterators.

const fastPromise = new Promise((resolve, reject) => {  
  fastFunction(resolve)
})

const slowPromise = new Promise((resolve, reject) => {  
  slowFunction(resolve)
})

co(function * () {  
  yield fastPromise
  yield slowPromise
}).then(() => {
  console.log('done')
})

For now, I suggest going with co, since the much-awaited Node.js async/await functionality is only available in the nightly, unstable v7.x builds. But if you are already using Promises, switching from co to async functions will be easy.

This syntactic sugar on top of Promises and Generators will eliminate the problem of callbacks and even help you to build nice flow control structures. Almost like writing synchronous code, right?

Stable Node.js branches will receive this update in the near future, so you will be able to remove co and just do the same.

Flow Control in Practice

Now that we have learned several tools and tricks for handling async operations, it is time to practice with fundamental control flows to make our code more efficient and clean.

Let’s take an example and write a route handler for our web app, where the request can be resolved after 3 steps: validateParams, dbQuery and serviceCall.

If you'd like to write them without any helper, you'd most probably end up with something like this. Not so nice, right?

// validateParams, dbQuery, serviceCall are higher-order functions
// DONT
function handler (done) {  
  validateParams((err) => {
    if (err) return done(err)
    dbQuery((err, dbResults) => {
      if (err) return done(err)
      serviceCall((err, serviceResults) => {
        done(err, { dbResults, serviceResults })
      })
    })
  })
}

Instead of the callback-hell, we can use the async library to refactor our code, as we have already learned:

// validateParams, dbQuery, serviceCall are higher-order functions
function handler (done) {  
  async.waterfall([validateParams, dbQuery, serviceCall], done)
}

Let's take it a step further! Rewrite it to use Promises:

// validateParams, dbQuery, serviceCall are thunks
function handler () {  
  return validateParams()
    .then(dbQuery)
    .then(serviceCall)
    .then((result) => {
      console.log(result)
      return result
    })
}

Also, you can use co powered generators with Promises:

// validateParams, dbQuery, serviceCall are thunks
const handler = co.wrap(function * () {  
  yield validateParams()
  const dbResults = yield dbQuery()
  const serviceResults = yield serviceCall()
  return { dbResults, serviceResults }
})

It feels like "synchronous" code, but it is still doing the async jobs one after the other.

Let's see how this snippet would look with async / await.

// validateParams, dbQuery, serviceCall are thunks
async function handler () {  
  await validateParams()
  const dbResults = await dbQuery()
  const serviceResults = await serviceCall()
  return { dbResults, serviceResults }
}
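One thing to keep in mind: the handlers above run the three steps strictly one after the other. If dbQuery and serviceCall happen to be independent of each other - an assumption, since the original example treats them as sequential - the two calls can be started concurrently with Promise.all:

// validateParams, dbQuery, serviceCall are thunks
async function handler () {
  await validateParams()
  // start both operations at once and wait for both of them to finish
  const [dbResults, serviceResults] = await Promise.all([
    dbQuery(),
    serviceCall()
  ])
  return { dbResults, serviceResults }
}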

Takeaway rules for Node.js & Async

Fortunately, Node.js eliminates the complexities of writing thread-safe code. You just have to stick to these rules to keep things smooth:

  • As a rule of thumb, prefer async over sync API, because using a non-blocking approach gives superior performance over the synchronous scenario.

  • Always use the best fitting flow control, or a mix of them, in order to reduce the time spent waiting for I/O to complete.

You can find all of the code from this article in this repository.

If you have any questions or suggestions for the article, please let me know in the comments!

Advanced Node.js Project Structure Tutorial - Node.js at Scale

Project structuring is an important topic because the way you bootstrap your application can determine the whole development experience throughout the life of the project.

In this Node.js project structure tutorial I’ll answer some of the most common questions we receive at RisingStack about structuring advanced Node applications, and help you with structuring a complex project.

These are the goals that we are aiming for:

  • Writing an application that is easy to scale and maintain.
  • The config is well separated from the business logic.
  • Our application can consist of multiple process types.

Node.js at Scale is a collection of articles focusing on the needs of companies with bigger Node.js installations and advanced Node developers.


The Node.js Project Structure

Our example application listens to tweets on Twitter and tracks certain keywords. In case of a keyword match, the tweet will be sent to a RabbitMQ queue, which will be processed and saved to Redis. We will also have a REST API exposing the tweets we have saved.

You can take a look at the code on GitHub. The file structure for this project looks like the following:

.
|-- config
|   |-- components
|   |   |-- common.js
|   |   |-- logger.js
|   |   |-- rabbitmq.js
|   |   |-- redis.js
|   |   |-- server.js
|   |   `-- twitter.js
|   |-- index.js
|   |-- social-preprocessor-worker.js
|   |-- twitter-stream-worker.js
|   `-- web.js
|-- models
|   |-- redis
|   |   |-- index.js
|   |   `-- redis.js
|   |-- tortoise
|   |   |-- index.js
|   |   `-- tortoise.js
|   `-- twitter
|       |-- index.js
|       `-- twitter.js
|-- scripts
|-- test
|   `-- setup.js
|-- web
|   |-- middleware
|   |   |-- index.js
|   |   `-- parseQuery.js
|   |-- router
|   |   |-- api
|   |   |   |-- tweets
|   |   |   |   |-- get.js
|   |   |   |   |-- get.spec.js
|   |   |   |   `-- index.js
|   |   |   `-- index.js
|   |   `-- index.js
|   |-- index.js
|   `-- server.js
|-- worker
|   |-- social-preprocessor
|   |   |-- index.js
|   |   `-- worker.js
|   `-- twitter-stream
|       |-- index.js
|       `-- worker.js
|-- index.js
`-- package.json

In this example we have 3 processes:

  • twitter-stream-worker: The process is listening on Twitter for keywords and sends the tweets to a RabbitMQ queue.
  • social-preprocessor-worker: The process is listening on the RabbitMQ queue and saves the tweets to Redis and removes old ones.
  • web: The process is serving a REST API with a single endpoint: GET /api/v1/tweets?limit&offset.

We will get to what differentiates a web and a worker process, but let's start with the config.

How to handle different environments and configurations?

Load your deployment specific configurations from environment variables and never add them to the codebase as constants. These are the configurations that can vary between deployments and runtime environments, like CI, staging or production. Basically, you can have the same code running everywhere.

A good test for whether the config is correctly separated from the application internals is that the codebase could be made public at any moment. This means that you can be protected from accidentally leaking secrets or compromising credentials on version control.


The environment variables can be accessed via the process.env object. Keep in mind that all the values have a type of String, so you might need to use type conversions.

// config/config.js
'use strict'

// required environment variables
const REQUIRED_ENV_VARS = [
  'NODE_ENV',
  'PORT'
]

REQUIRED_ENV_VARS.forEach((name) => {
  if (!process.env[name]) {
    throw new Error(`Environment variable ${name} is missing`)
  }
})

const config = {
  env: process.env.NODE_ENV,
  logger: {
    level: process.env.LOG_LEVEL || 'info',
    enabled: process.env.LOGGER_ENABLED
      ? process.env.LOGGER_ENABLED.toLowerCase() === 'true'
      : false
  },
  server: {
    port: Number(process.env.PORT)
  }
  // ...
}

module.exports = config  





Config validation

Validating environment variables is also a quite useful technique. It can help you catch configuration errors on startup, before your application does anything else. You can read more about the benefits of early configuration error detection in this blog post by Adrian Colyer.

This is how our improved config file looks with schema validation using the joi validator:

// config/config.js
'use strict'

const joi = require('joi')

const envVarsSchema = joi.object({  
  NODE_ENV: joi.string()
    .allow(['development', 'production', 'test', 'provision'])
    .required(),
  PORT: joi.number()
    .required(),
  LOGGER_LEVEL: joi.string()
    .allow(['error', 'warn', 'info', 'verbose', 'debug', 'silly'])
    .default('info'),
  LOGGER_ENABLED: joi.boolean()
    .truthy('TRUE')
    .truthy('true')
    .falsy('FALSE')
    .falsy('false')
    .default(true)
}).unknown()
  .required()

const { error, value: envVars } = joi.validate(process.env, envVarsSchema)  
if (error) {  
  throw new Error(`Config validation error: ${error.message}`)
}

const config = {  
  env: envVars.NODE_ENV,
  isTest: envVars.NODE_ENV === 'test',
  isDevelopment: envVars.NODE_ENV === 'development',
  logger: {
    level: envVars.LOGGER_LEVEL,
    enabled: envVars.LOGGER_ENABLED
  },
  server: {
    port: envVars.PORT
  }
  // ...
}

module.exports = config  


Config splitting

Splitting the configuration by components can be a good solution to forego a single, growing config file.

// config/components/logger.js
'use strict'

const joi = require('joi')

const envVarsSchema = joi.object({  
  LOGGER_LEVEL: joi.string()
    .allow(['error', 'warn', 'info', 'verbose', 'debug', 'silly'])
    .default('info'),
  LOGGER_ENABLED: joi.boolean()
    .truthy('TRUE')
    .truthy('true')
    .falsy('FALSE')
    .falsy('false')
    .default(true)
}).unknown()
  .required()

const { error, value: envVars } = joi.validate(process.env, envVarsSchema)  
if (error) {  
  throw new Error(`Config validation error: ${error.message}`)
}

const config = {  
  logger: {
    level: envVars.LOGGER_LEVEL,
    enabled: envVars.LOGGER_ENABLED
  }
}

module.exports = config  

Then in the config.js file we only need to combine the components.

// config/config.js
'use strict'

const common = require('./components/common')  
const logger = require('./components/logger')  
const redis = require('./components/redis')  
const server = require('./components/server')

module.exports = Object.assign({}, common, logger, redis, server)  

You should never group your config together into "environment" specific files, like config/production.js for production. It doesn't scale well as your app expands into more deployments over time.


How to organize a multi-process application?

The process is the main building block of a modern application. An app can have multiple stateless processes, just like in our example. HTTP requests can be handled by a web process and long-running or scheduled background tasks by a worker. They are stateless, because any data that needs to be persisted is stored in a stateful database. For this reason, adding more concurrent processes is very simple. These processes can be independently scaled based on the load or other metrics.

In the previous section, we saw how to break down the config into components. This comes in very handy when you have different process types. Each type can have its own config, only requiring the components it needs, without expecting unused environment variables.

In the config/index.js file:

// config/index.js
'use strict'

const processType = process.env.PROCESS_TYPE

let config  
try {  
  config = require(`./${processType}`)
} catch (ex) {
  if (ex.code === 'MODULE_NOT_FOUND') {
    throw new Error(`No config for process type: ${processType}`)
  }

  throw ex
}

module.exports = config  

In the root index.js file we start the process selected with the PROCESS_TYPE environment variable:

// index.js
'use strict'

const processType = process.env.PROCESS_TYPE

if (processType === 'web') {  
  require('./web')
} else if (processType === 'twitter-stream-worker') {
  require('./worker/twitter-stream')
} else if (processType === 'social-preprocessor-worker') {
  require('./worker/social-preprocessor')
} else {
  throw new Error(`${processType} is an unsupported process type. Use one of: 'web', 'twitter-stream-worker', 'social-preprocessor-worker'!`)
}

The nice thing about this is that we still have one application, but we have managed to split it into multiple, independent processes. Each of them can be started and scaled individually, without influencing the other parts. You can achieve this without sacrificing your DRY codebase, because parts of the code, like the models, can be shared between the different processes.

How to organize your test files?

Place your test files next to the tested modules using some kind of naming convention, like <module_name>.spec.js and <module_name>.e2e.spec.js. Your tests should live together with the tested modules, keeping them in sync. It would be really hard to find and maintain the tests and the corresponding functionality when the test files are completely separated from the business logic.


A separated /test folder can hold all the additional test setup and utilities not used by the application itself.

Where to put your build and script files?

We tend to create a /scripts folder where we put our bash and node scripts for database synchronization, front-end builds and so on. This folder separates them from your application code and prevents you from putting too many script files into the root directory. List them in your npm scripts for easier usage.

Conclusion

I hope you enjoyed this article on project structuring. I highly recommend checking out our previous article on the subject, where we laid out the 5 fundamentals of Node.js project structuring.

If you have any questions, please let me know in the comments. In the next chapter of the Node.js at Scale series, we’re going to dive deep into JavaScript clean coding. See you next week!


Node.js Garbage Collection Explained - Node.js at Scale

In this article, you are going to learn how Node.js garbage collection works, what happens in the background when you write code and how memory is freed up for you.

Ancient garbage collector in action

With Node.js at Scale we are creating a collection of articles focusing on the needs of companies with bigger Node.js installations, and developers who already learned the basics of Node.



Memory Management in Node.js Applications

Every application needs memory to work properly. Memory management provides ways to dynamically allocate memory chunks for programs when they request it, and free them when they are no longer needed - so that they can be reused.

Application-level memory management can be manual or automatic. The automatic memory management usually involves a garbage collector.

The following code snippet shows how memory can be allocated in C, using manual memory management:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {

   char name[20];
   char *description;

   strcpy(name, "RisingStack");

   // memory allocation
   description = malloc( 40 * sizeof(char) );  /* room for the 31-character string plus the terminating NUL */

   if( description == NULL ) {
      fprintf(stderr, "Error - unable to allocate required memory\n");
      return 1;
   } else {
      strcpy( description, "Trace by RisingStack is an APM.");
   }

   printf("Company name = %s\n", name );
   printf("Description: %s\n", description );

   // release memory
   free(description);
}

In manual memory management, it is the responsibility of the developer to free up the unused memory portions. Managing your memory this way can introduce several major bugs to your applications:

  • Memory leaks when the used memory space is never freed up.
  • Wild/dangling pointers appear when an object is deleted, but the pointer is reused. Serious security issues can be introduced when other data structures are overwritten or sensitive information is read.

Luckily for you, Node.js comes with a garbage collector, and you don't need to manually manage memory allocation.

The Concept of the Garbage Collector

Garbage collection is a way of managing application memory automatically. The job of the garbage collector (GC) is to reclaim memory occupied by unused objects (garbage). It was first used in LISP in 1959, invented by John McCarthy.

The way the GC knows that objects are no longer in use is that no other object holds references to them.

"A garbage collector was first used in LISP in 1959, invented by John McCarthy." via @RisingStack

Click To Tweet

Memory before the garbage collection

The following diagram shows how the memory can look if you have objects with references to each other, along with some objects that are not referenced by anything. The latter are the objects that can be collected by a garbage collector run.

Memory state before Node.js garbage collection

Memory after the garbage collection

Once the garbage collector is run, the objects that are unreachable get deleted, and the memory space is freed up.

Memory state after Node.js garbage collection

The Advantages of Using a Garbage Collector

  • it prevents wild/dangling pointers bugs,
  • it won't try to free up space that was already freed up,
  • it will protect you from some types of memory leaks.

Of course, using a garbage collector doesn't solve all of your problems, and it’s not a silver bullet for memory management. Let's take a look at things that you should keep in mind!

"Using a garbage collector doesn't solve all of your memory management problems with #nodejs!" via @RisingStack

Click To Tweet


Things to Keep in Mind When Using a Garbage Collector

  • performance impact - in order to decide what can be freed up, the GC consumes computing power
  • unpredictable stalls - modern GC implementations try to avoid "stop-the-world" collections

Node.js Garbage Collection & Memory Management in Practice

The easiest way of learning is by doing - so I am going to show you what happens in the memory with different code snippets.

The Stack

The stack contains local variables and pointers to objects on the heap or pointers defining the control flow of the application.

In the following example, both a and b will be placed on the stack.

function add (a, b) {  
  return a + b
}

add(4, 5)  





The Heap

The heap is dedicated to storing reference-type values, like strings or objects.

The Car object created in the following snippet is placed on the heap.

function Car (opts) {  
  this.name = opts.name
}

const LightningMcQueen = new Car({name: 'Lightning McQueen'})  

After this, the memory would look something like this:

Node.js Garbage Collection First Step - Object Placed in the Memory Heap

Let's add more cars, and see how our memory would look!

function Car (opts) {  
  this.name = opts.name
}

const LightningMcQueen = new Car({name: 'Lightning McQueen'})  
const SallyCarrera = new Car({name: 'Sally Carrera'})  
const Mater = new Car({name: 'Mater'})  

Node.js Garbage Collection Second Step - More elements added to the heap

If the GC ran now, nothing could be freed up, as the root has a reference to every object.

Let's make it a little bit more interesting, and add some parts to our cars!

function Engine (power) {  
  this.power = power
}

function Car (opts) {  
  this.name = opts.name
  this.engine = new Engine(opts.power)
}

let LightningMcQueen = new Car({name: 'Lightning McQueen', power: 900})  
let SallyCarrera = new Car({name: 'Sally Carrera', power: 500})  
let Mater = new Car({name: 'Mater', power: 100})  

Node.js Garbage Collection - Assigning values to the objects in the heap

What would happen, if we no longer use Mater, but redefine it and assign some other value, like Mater = undefined?

Node.js Garbage Collection - Redefining values

As a result, the original Mater object cannot be reached from the root object, so on the next garbage collector run it will be freed up:

Node.js Garbage Collection - Freeing up the unreachable object

Now that we understand the basics of the garbage collector's expected behaviour, let's take a look at how it is implemented in V8!

Garbage Collection Methods

In one of our previous articles we dealt with how the Node.js garbage collection methods work, so I strongly recommend reading that article.

Here are the most important things you’ll learn there:

New Space and Old Space

The heap has two main segments, the New Space and the Old Space. The New Space is where new allocations happen; it is fast to collect garbage here, and it has a size of ~1-8 MB. Objects living in the New Space are called the Young Generation.

The Old Space is where the objects that survived collection in the New Space are promoted - they are called the Old Generation. Allocation in the Old Space is fast, however collection is expensive, so it is performed infrequently.

Young Generation

Usually, ~20% of the Young Generation survives into the Old Generation. Collection in the Old Space will only commence once it is close to being exhausted. To do so, the V8 engine uses two different collection algorithms.

Scavenge and Mark-Sweep collection

Scavenge collection is fast and runs on the Young Generation, however the slower Mark-Sweep collection runs on the Old Generation.
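If you'd like to see how much heap your own process uses, a quick sketch with the built-in process.memoryUsage() call (it reports sizes in bytes) looks like this:

const { heapUsed, heapTotal } = process.memoryUsage()

console.log(`heap used: ${(heapUsed / 1024 / 1024).toFixed(1)} MB`)
console.log(`heap total: ${(heapTotal / 1024 / 1024).toFixed(1)} MB`)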

A Real-Life Example - The Meteor Case-Study

In 2013, the creators of Meteor announced their findings about a memory leak they ran into. The problematic code snippet was the following:

var theThing = null  
var replaceThing = function () {  
  var originalThing = theThing
  var unused = function () {
    if (originalThing)
      console.log("hi")
  }
  theThing = {
    longStr: new Array(1000000).join('*'),
    someMethod: function () {
      console.log(someMessage)
    }
  };
};
setInterval(replaceThing, 1000)  

Well, the typical way that closures are implemented is that every function object has a link to a dictionary-style object representing its lexical scope. If both functions defined inside replaceThing actually used originalThing, it would be important that they both get the same object, even if originalThing gets assigned to over and over, so both functions share the same lexical environment. Now, Chrome's V8 JavaScript engine is apparently smart enough to keep variables out of the lexical environment if they aren't used by any closures - from the Meteor blog.
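The fix described in the Meteor post boils down to breaking that shared reference by hand once it is no longer needed - roughly something like the sketch below (adapted from the snippet above, with the undefined someMessage replaced by a string so it runs):

var theThing = null
var replaceThing = function () {
  var originalThing = theThing
  var unused = function () {
    if (originalThing)
      console.log("hi")
  }
  theThing = {
    longStr: new Array(1000000).join('*'),
    someMethod: function () {
      console.log('someMessage')
    }
  }
  // break the chain: the shared lexical environment no longer keeps
  // the previous theThing (and its huge longStr) alive
  originalThing = null
}
setInterval(replaceThing, 1000)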



Next up

In the next chapter of the Node.js at Scale tutorial series we will take a deep dive into writing native Node.js modules.

In the meantime, let us know in the comments sections if you have any questions!


Understanding the Node.js Event Loop - Node.js at Scale

This article helps you to understand how the Node.js event loop works, and how you can leverage it to build fast applications. We’ll also discuss the most common problems you might encounter, and the solutions for them.

With Node.js at Scale we are creating a collection of articles focusing on the needs of companies with bigger Node.js installations, and developers who already learned the basics of Node.



The problem

Most of the backends behind websites don't need to do complicated computations. Our programs spend most of their time waiting for the disk to read & write, or waiting for the wire to transmit our message and send back the answer.

IO operations can be orders of magnitude slower than data processing. Take this for example: SSDs can have a read speed of 200-730 MB/s - at least a high-end one can. Reading just one kilobyte of data would take about 1.4 microseconds, but during this time a CPU clocked at 2GHz could have performed around 2,800 instruction-processing cycles.

For network communications it can be even worse, just try and ping google.com

$ ping google.com
64 bytes from 172.217.16.174: icmp_seq=0 ttl=52 time=33.017 ms  
64 bytes from 172.217.16.174: icmp_seq=1 ttl=52 time=83.376 ms  
64 bytes from 172.217.16.174: icmp_seq=2 ttl=52 time=26.552 ms  
64 bytes from 172.217.16.174: icmp_seq=3 ttl=52 time=40.153 ms  
64 bytes from 172.217.16.174: icmp_seq=4 ttl=52 time=37.291 ms  
64 bytes from 172.217.16.174: icmp_seq=5 ttl=52 time=58.692 ms  
64 bytes from 172.217.16.174: icmp_seq=6 ttl=52 time=45.245 ms  
64 bytes from 172.217.16.174: icmp_seq=7 ttl=52 time=27.846 ms  

The average latency is about 44 milliseconds. Just while waiting for a packet to make a round-trip on the wire, the previously mentioned processor can perform 88 million cycles.

The solution

Most operating systems provide some kind of asynchronous IO interface, which allows you to start processing data that does not require the result of the communication, while the communication is still going on.

This can be achieved in several ways. Nowadays it is mostly done by leveraging the possibilities of multithreading, at the cost of extra software complexity. For example, reading a file in Java or Python is a blocking operation. Your program cannot do anything else while it is waiting for the network / disk communication to finish. All you can do - at least in Java - is to fire up a different thread and have it notify your main thread when the operation has finished.

It is tedious, complicated, but gets the job done. But what about Node? Well, we are surely facing some problems as Node.js - or more like V8 - is single-threaded. Our code can only run in one thread.

EDIT: This is not entirely true. Both Java and Python have async interfaces, but using them is definitely more difficult than in Node.js. Thanks to Shahar and Dirk Harrington for pointing this out.

You might have heard that in a browser, setting setTimeout(someFunction, 0) can sometimes fix things magically. But why does setting a timeout to 0, deferring execution by 0 milliseconds fix anything? Isn’t it the same as simply calling someFunction immediately? Not really.

First of all, let's take a look at the call stack, or simply, “stack”. I am going to make things simple, as we only need to understand the very basics of the call stack. In case you are familiar with how it works, feel free to jump to the next section.

Stack

Whenever you call a function, its return address, parameters and local variables will be pushed to the stack. If you call another function from the currently running function, its contents will be pushed on top in the same manner as the previous one - with its return address.

For the sake of simplicity I will say that 'a function is pushed' to the top of the stack from now on, even though it is not exactly correct.

Let's take a look!

function main () {
  const hypotenuse = getLengthOfHypotenuse(3, 4)
  console.log(hypotenuse)
}

function getLengthOfHypotenuse (a, b) {
  const squareA = square(a)
  const squareB = square(b)
  const sumOfSquares = squareA + squareB
  return Math.sqrt(sumOfSquares)
}

function square (number) {
  return number * number
}

main()

main is called first:

The main function

then main calls getLengthOfHypotenuse with 3 and 4 as arguments

The getLengthOfHypotenuse function

afterwards square is called with the value of a

The square(a) function

when square returns, it is popped from the stack, and its return value is assigned to squareA. squareA is added to the stack frame of getLengthOfHypotenuse

Variable a

same goes for the next call to square

The square(b) function

Variable b

in the next line the expression squareA + squareB is evaluated

sumOfSquares

then Math.sqrt is called with sumOfSquares

Math.sqrt

now all that is left for getLengthOfHypotenuse is to return the final value of its calculation

The return function

the returned value gets assigned to hypotenuse in main

hypotenuse

the value of hypotenuse is logged to console

The console log

finally, main returns without any value, gets popped from the stack leaving it empty

Finally

SIDE NOTE: You saw that local variables are popped from the stack when the function's execution finishes. That only happens when you work with simple values such as numbers, strings and booleans. The values of objects, arrays and the like are stored in the heap, and your variable is merely a pointer to them. If you pass this variable on, you will only pass the said pointer, making these values mutable from different stack frames. When the function is popped from the stack, only the pointer to the object gets popped, leaving the actual value in the heap. The garbage collector is the guy who takes care of freeing up space once the objects have outlived their usefulness.
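A tiny sketch of that difference (the names are made up for illustration): the number is copied into the callee's stack frame, while the object is shared through a pointer to the heap.

function bumpValues (num, obj) {
  num = num + 1              // changes only the copy in this stack frame
  obj.count = obj.count + 1  // mutates the shared object on the heap
}

let counter = 0
const box = { count: 0 }

bumpValues(counter, box)

console.log(counter)    // 0 - the caller's number is untouched
console.log(box.count)  // 1 - the heap object was modified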

Enter Node.js Event Loop

The Node.js Event Loop - cat version

No, not this loop. :)

So what happens when we call something like setTimeout, http.get, process.nextTick, or fs.readFile? None of these can be found in V8's code; they are provided by the Chrome WebApi in the browser and by the C++ APIs in the case of Node.js. To understand this, we will have to understand the order of execution a little bit better.

Let's take a look at a more common Node.js application - a server listening on localhost:3000/. Upon getting a request, the server will call wttr.in/<city> to get the weather, print a couple of messages to the console, and forward the response to the caller after receiving it.

'use strict'  
const express = require('express')  
const superagent = require('superagent')  
const app = express()

app.get('/', sendWeatherOfRandomCity)

function sendWeatherOfRandomCity (request, response) {  
  getWeatherOfRandomCity(request, response)
  sayHi()
}

const CITIES = [  
  'london',
  'newyork',
  'paris',
  'budapest',
  'warsaw',
  'rome',
  'madrid',
  'moscow',
  'beijing',
  'capetown',
]

function getWeatherOfRandomCity (request, response) {  
  const city = CITIES[Math.floor(Math.random() * CITIES.length)]
  superagent.get(`wttr.in/${city}`)
    .end((err, res) => {
      if (err) {
        console.log('O snap')
        return response.status(500).send('There was an error getting the weather, try looking out the window')
      }
      const responseText = res.text
      response.send(responseText)
      console.log('Got the weather')
    })

  console.log('Fetching the weather, please be patient')
}

function sayHi () {  
  console.log('Hi')
}

app.listen(3000)  

What will be printed out aside from getting the weather when a request is sent to localhost:3000?

If you have some experience with Node, you shouldn't be surprised that even though console.log('Fetching the weather, please be patient') is called after console.log('Got the weather') in the code, the former will print first resulting in:

Fetching the weather, please be patient  
Hi  
Got the weather  

What happened? Even though V8 is single-threaded, the underlying C++ API of Node isn't. It means that whenever we call something that is a non-blocking operation, Node will call some code that will run concurrently with our JavaScript code under the hood. Once this background thread receives the value it awaits, or throws an error, the provided callback will be called with the necessary parameters.

SIDE NOTE: The ‘some code’ we mentioned is actually part of libuv. libuv is the open source library that handles the thread-pool, does the signaling and all the other magic that is needed to make asynchronous tasks work. It was originally developed for Node.js, but a lot of other projects use it by now.





To peek under the hood, we need to introduce two new concepts: the event loop and the task queue.

Task queue

JavaScript is a single-threaded, event-driven language. This means that we can attach listeners to events, and when said event fires, the listener executes the callback we provided.

Whenever you call setTimeout, http.get or fs.readFile, Node.js sends these operations to a different thread allowing V8 to keep executing our code. Node also calls the callback when the counter has run down or the IO / http operation has finished.

These callbacks can enqueue other tasks and those functions can enqueue others and so on. This way you can read a file while processing a request in your server, and then make an http call based on the read contents without blocking other requests from being handled.

"#nodejs sends IO operations to different threads so #v8 can keep executing our code" via @RisingStack #javascript

Click To Tweet

However, we only have one main thread and one call-stack, so in case there is another request being served when the said file is read, its callback will need to wait for the stack to become empty. The limbo where callbacks are waiting for their turn to be executed is called the task queue (or event queue, or message queue). Callbacks are being called in an infinite loop whenever the main thread has finished its previous task, hence the name 'event loop'.

In our previous example it would look something like this:

  1. express registers a handler for the 'request' event that will be called when request arrives to '/'
  2. skips the functions and starts listening on port 3000
  3. the stack is empty, waiting for 'request' event to fire
  4. upon incoming request, the long awaited event fires, express calls the provided handler sendWeatherOfRandomCity
  5. sendWeatherOfRandomCity is pushed to the stack
  6. getWeatherOfRandomCity is called and pushed to the stack
  7. Math.floor and Math.random are called, pushed to the stack and popped; a random city from CITIES is assigned to city
  8. superagent.get is called with 'wttr.in/${city}', the handler is set for the end event.
  9. the http request to http://wttr.in/${city} is sent to a background thread, and the execution continues
  10. 'Fetching the weather, please be patient' is logged to the console, getWeatherOfRandomCity returns
  11. sayHi is called, 'Hi' is printed to the console
  12. sendWeatherOfRandomCity returns, gets popped from the stack leaving it empty
  13. waiting for http://wttr.in/${city} to send its response
  14. once the response has arrived, the end event is fired.
  15. the anonymous handler we passed to .end() is called, gets pushed to the stack with all variables in its closure, meaning it can see and modify the values of express, superagent, app, CITIES, request, response, city and all the functions we have defined
  16. response.send() gets called with either a 200 or a 500 status code, but again the writing is handed over to a background thread, so the response stream does not block our execution; the anonymous handler is popped from the stack.

So now we can understand why the previously mentioned setTimeout hack works. Even though we set the counter to zero, it defers the execution until the current stack and the task queue is empty, allowing the browser to redraw the UI, or Node to serve other requests.
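A minimal illustration of that deferring behaviour:

console.log('first')

setTimeout(() => {
  // runs only after the current stack has unwound,
  // even though the delay is 0 milliseconds
  console.log('third')
}, 0)

console.log('second')

// prints: first, second, third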

Microtasks and Macrotasks

If this wasn't enough, we actually have more than one task queue. One for microtasks and another for macrotasks.

examples of microtasks:

  • process.nextTick
  • promises
  • Object.observe

examples of macrotasks:

  • setTimeout
  • setInterval
  • setImmediate
  • I/O

Let's take a look at the following code:

console.log('script start')

const interval = setInterval(() => {  
  console.log('setInterval')
}, 0)

setTimeout(() => {  
  console.log('setTimeout 1')
  Promise.resolve().then(() => {
    console.log('promise 3')
  }).then(() => {
    console.log('promise 4')
  }).then(() => {
    setTimeout(() => {
      console.log('setTimeout 2')
      Promise.resolve().then(() => {
        console.log('promise 5')
      }).then(() => {
        console.log('promise 6')
      }).then(() => {
        clearInterval(interval)
      })
    }, 0)
  })
}, 0)

Promise.resolve().then(() => {  
  console.log('promise 1')
}).then(() => {
  console.log('promise 2')
})

this will log to the console:

script start
promise 1
promise 2
setInterval
setTimeout 1
promise 3
promise 4
setInterval
setTimeout 2
setInterval
promise 5
promise 6

According to the WHATWG specification, exactly one (macro)task should get processed from the macrotask queue in one cycle of the event loop. After said macrotask has finished, all of the available microtasks will be processed within the same cycle. While these microtasks are being processed, they can queue more microtasks, which will all be run one by one, until the microtask queue is exhausted.

This diagram tries to make the picture a bit clearer:

The Node.js Event Loop

In our case:

Cycle 1:

  1. `setInterval` is scheduled as task
  2. `setTimeout 1` is scheduled as task
  3. in `Promise.resolve 1` both `then`s are scheduled as microtasks
  4. the stack is empty, microtasks are run

Task queue: setInterval, setTimeout 1

Cycle 2:

  1. the microtask queue is empty, `setInterval`'s handler can be run, another `setInterval` is scheduled as a task, right behind `setTimeout 1`

Task queue: setTimeout 1, setInterval

Cycle 3:

  1. the microtask queue is empty, `setTimeout 1`'s handler can be run, `promise 3` and `promise 4` are scheduled as microtasks,
  2. handlers of `promise 3` and `promise 4` are run, `setTimeout 2` is scheduled as a task

Task queue: setInterval, setTimeout 2

Cycle 4:

  1. the microtask queue is empty, `setInterval`'s handler can be run, another `setInterval` is scheduled as a task, right behind `setTimeout 2`

Task queue: setTimeout 2, setInterval

  1. `setTimeout 2`'s handler run, `promise 5` and `promise 6` are scheduled as microtasks

Now the handlers of promise 5 and promise 6 should run, clearing our interval, but for some strange reason setInterval is run again. However, if you run this code in Chrome, you will get the expected behavior.

We can fix this in Node too with process.nextTick and some mind-boggling callback hell.

console.log('script start')

const interval = setInterval(() => {  
  console.log('setInterval')
}, 0)

setTimeout(() => {  
  console.log('setTimeout 1')
  process.nextTick(() => {
    console.log('nextTick 3')
    process.nextTick(() => {
      console.log('nextTick 4')
      setTimeout(() => {
        console.log('setTimeout 2')
        process.nextTick(() => {
          console.log('nextTick 5')
          process.nextTick(() => {
            console.log('nextTick 6')
            clearInterval(interval)
          })
        })
      }, 0)
    })
  })
}, 0)

process.nextTick(() => {  
  console.log('nextTick 1')
  process.nextTick(() => {
    console.log('nextTick 2')
  })
})

This is the exact same logic as our beloved promises use, only a little bit more hideous. At least it gets the job done the way we expected.


Tame the async beast!

As we saw, we need to manage and pay attention to both task queues, and to the event loop, when we write an app in Node.js, if we wish to leverage all its power and keep our long-running tasks from blocking the main thread.

The event loop might be a slippery concept to grasp at first, but once you get the hang of it, you won't be able to imagine that there is life without it. The continuation passing style that can lead to a callback hell might look ugly, but we have Promises, and soon we will have async-await in our hands... and while we are (a)waiting, you can simulate async-await using co and/or koa.

One last parting advice:

Knowing how Node.js and V8 handle long-running executions, you can start using this knowledge for your own good. You might have heard before that you should send your long-running loops to the task queue. You can do it by hand or make use of async.js.

Happy coding!

If you have any questions or thoughts, share them in the comments, I'll be there! The next part of the Node.js at Scale series discusses Garbage Collection in Node.js - I recommend checking it out!

How the module system, CommonJS & require works - Node.js at Scale

In the third chapter of Node.js at Scale you are about to learn how the Node.js module system & CommonJS work, and what require does under the hood.

With Node.js at Scale we are creating a collection of articles focusing on the needs of companies with bigger Node.js installations, and developers who already learned the basics of Node.



CommonJS to the rescue

The JavaScript language didn't have a native way of organizing code before the ES2015 standard. Node.js filled this gap with the CommonJS module format. In this article we will learn about how the Node.js module system works, how you can organize your modules, and what the new ES standard means for the future of Node.js.

"#JavaScript didn't have a mature module system before #nodejs. That gap was filled with #commonjs" via @RisingStack

Click To Tweet

What is the module system?

Modules are the fundamental building blocks of the code structure. The module system allows you to organize your code, hide information and only expose the public interface of a component using module.exports. Every time you use the require call, you are loading another module.

The simplest example can be the following using CommonJS:

// add.js
function add (a, b) {  
  return a + b
}

module.exports = add  

To use the add module we have just created, we have to require it.

// index.js
const add = require('./add')

console.log(add(4, 5))  
//9

Under the hood, add.js is wrapped by Node.js this way:

(function (exports, require, module, __filename, __dirname) {
  function add (a, b) {
    return a + b
  }

  module.exports = add
})

This is why you can access the global-like variables like require and module. It also ensures that your variables are scoped to your module rather than the global object.

"Modules are the fundamental building blocks of the code structure." via @RisingStack #nodejs

Click To Tweet

How does require work?

The module loading mechanism in Node.js is caching the modules on the first require call. It means that every time you use require('awesome-module') you will get the same instance of awesome-module, which ensures that the modules are singleton-like and have the same state across your application.
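A small sketch of what this means in practice - counter.js is a hypothetical module, not part of any example app. Both require calls below return the exact same cached object, so the state is shared:

// counter.js - a hypothetical module holding some state
let count = 0

module.exports = {
  increment: () => ++count,
  current: () => count
}

// elsewhere in the application
const counterA = require('./counter')
const counterB = require('./counter')

counterA.increment()
console.log(counterB.current()) // 1 - both requires returned the same instance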

You can load native modules and path references from your file system or installed modules. If the identifier passed to the require function is not a native module or a file reference (beginning with /, ../, ./ or similar), then Node.js will look for installed modules. It will walk your file system looking for the referenced module in the node_modules folder. It starts from the parent directory of your current module and then moves to the parent directory until it finds the right module or until the root of the file system is reached.





Require under the hood - module.js

The module dealing with module loading in the Node core is called module.js, and can be found in lib/module.js in the Node.js repository.

The most important functions to check here are the _load and _compile functions.

Module._load

This function checks whether the module is in the cache already - if so, it returns the exports object.

If the module is native, it calls the NativeModule.require() with the filename and returns the result.

Otherwise, it creates a new module for the file and saves it to the cache. Then it loads the file contents before returning its exports object.

Module._compile

The compile function runs the file contents in the correct scope or sandbox, as well as exposes helper variables like require, module or exports to the file.

How Require Works - from James M. Snell

How to organize the code?

In our applications, we need to find the right balance of cohesion and coupling when creating modules. The desirable scenario is to achieve high cohesion and loose coupling of the modules.

A module must be focused only on a single part of the functionality to have high cohesion. Loose coupling means that the modules should not have a global or shared state. They should only communicate by passing parameters, and they are easily replaceable without touching your broader codebase.

"The desirable scenario is to achieve high cohesion and loose coupling of the modules." via @RisingStack #nodejs

Click To Tweet

We usually export named functions or constants in the following way:

'use strict'

const CONNECTION_LIMIT = 0

function connect () { /* ... */ }

module.exports = {  
  CONNECTION_LIMIT,
  connect
}

What’s in your node_modules?

The node_modules folder is the place where Node.js looks for modules. npm v2 and npm v3 install your dependencies differently. You can find out what version of npm you are using by executing:

npm --version  

npm v2

npm 2 installs all dependencies in a nested way, where your primary package dependencies are in their node_modules folder.

npm v3

npm3 attempts to flatten these secondary dependencies and install them in the root node_modules folder. This means that you can’t tell by looking at your node_modules which packages are your explicit or implicit dependencies. It is also possible that the installation order changes your folder structure because npm 3 is non-deterministic in this manner.


You can make sure that your node_modules directory is always the same by installing packages only from a package.json. In this case, it installs your dependencies in alphabetical order, which also means that you will get the same folder tree. This is important because the modules are cached using their path as the lookup key. Each package can have its own child node_modules folder, which might result in multiple instances of the same package and of the same module.

How to handle your modules?

There are two main ways for wiring modules. One of them is using hard coded dependencies, explicitly loading one module into another using a require call. The other method is to use a dependency injection pattern, where we pass the components as a parameter or we have a global container (known as IoC, or Inversion of Control container), which centralizes the management of the modules.

We can allow Node.js to manage the modules life cycle by using hard coded module loading. It organizes your packages in an intuitive way, which makes understanding and debugging easy.

Dependency Injection is rarely used in a Node.js environment, although it is a useful concept. The DI pattern can result in an improved decoupling of the modules. Instead of explicitly defining dependencies for a module, they are received from the outside. Therefore they can be easily replaced with modules having the same interfaces.

Let’s see an example for DI modules using the factory pattern:

class Car {  
  constructor (options) {
    this.engine = options.engine
  }

  start () {
    this.engine.start()
  }
}

function create (options) {  
  return new Car(options)
}

module.exports = create  
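One possible way to use this factory - the './car' path and the fake engine are only illustrative - is to wire in the real engine in production code, while a test injects a stub without touching the Car class at all:

const createCar = require('./car') // the factory module above

// production: pass in a real engine implementation
const car = createCar({
  engine: { start: () => console.log('vroom') }
})
car.start() // vroom

// test: inject a stub engine and assert that it was started
const fakeEngine = { start: () => { fakeEngine.started = true } }
const testCar = createCar({ engine: fakeEngine })
testCar.start()
console.log(fakeEngine.started) // true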

The ES2015 module system

As we saw above, the CommonJS module system uses a runtime evaluation of the modules, wrapping them into a function before execution. The ES2015 modules don't need to be wrapped, since the import/export bindings are created before evaluating the module. This incompatibility is the reason why, at the time of writing, no JavaScript runtime supports ES modules natively. There was a lot of discussion about the topic and a proposal is in DRAFT state, so hopefully we will have support for it in future Node versions.
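For comparison, this is roughly what the earlier add module would look like with ES2015 module syntax (the .mjs file names are only illustrative - as noted above, Node.js did not support this natively at the time of writing):

// add.mjs
export default function add (a, b) {
  return a + b
}

// index.mjs
import add from './add.mjs'

console.log(add(4, 5)) // 9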

To read an in-depth explanation of the biggest differences between CommonJS and the ESM, read the following article by James M Snell.


Next up

I hope this article contained valuable information about the module system and how require works. If you have any questions or insights on the topic, please share them in the comments. In the next chapter of the Node.js at Scale series, we are going to take a deep dive and learn about the event loop.


npm Publishing Tutorial - Node.js at Scale

With Node.js at Scale we are creating a collection of articles focusing on the needs of companies with bigger Node.js installations, and developers who already learned the basics of Node.

In this second chapter of Node.js at Scale you are going to learn how to expand the npm registry with your own modules. This tutorial is also going to explain how versioning works.



npm Module Publishing

When writing Node.js apps, there are so many things on npm that can help us be more productive. We don't have to deal with low-level things like padding a string from the left, because there are already existing modules that are (eventually) available on the npm registry.

Where do these modules come from?

The modules are stored in a huge registry which is powered by a CouchDB instance.

The official public npm registry is at https://registry.npmjs.org/. It is powered by a CouchDB database, which has a public mirror at https://skimdb.npmjs.com/registry. The code for the couchapp is available at https://github.com/npm/npm-registry-couchapp.

How do modules make it to the registry?

People like you write them for themselves or for their co-workers and they share the code with their fellow JavaScript developers.

When should I consider publishing?

  • If you want to share code between projects,
  • if you think that others might run into the very same problem and you'd like to help them,
  • if you have a bit (or even more) code that you think you can make use of later.

Creating a module

First let's create a module: npm init -y should take care of it, as you've learned in the previous post.

{
  "name": "npm-publishing",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "repository": {
    "type": "git",
    "url": "git+https://github.com/author/modulename"
  },
  "bugs": {
    "url": "https://github.com/caolan/async/issues"
  },
  "license": "ISC"
}

Let's break this down really quick. These fields in your package.json are mandatory when you're building a module for others to use.

First, you should give your module a distinct name because it has to be unique in the npm registry. Make sure it does not collide with any trademarks out there! main describes which file will be returned when your users do a require('modulename'). You can leave it as default or set it to any file in your project, but make sure you actually point it to a valid filename.

keywords should also be included because npm is going to index your package based on those fields and people will be able to find your module if they search those keywords in npm's search, or in any third party npm search site.

author, well obviously that's going to be you, but if anyone helps you develop your project, be so kind as to include them too! :) Also, it is very important to include where people can contact you if they'd like to.

In the repository field, you can see where the code is hosted, and the bugs section tells you where you can file bugs if you find one in the package. To quickly jump to the bug report site you can use npm bugs modulename.

#1 Licensing

A solid license and widespread license adoption help Node adoption by large companies. Code is a valuable resource, and sharing it has its own costs.

Licensing is really hard, but this site can help you pick a license that fits your needs.

Generally when people publish modules to npm they use the MIT license.

The MIT License is a permissive free software license originating at the Massachusetts Institute of Technology (MIT). As a permissive license, it puts only very limited restrictions on reuse and therefore has excellent license compatibility.

#2 Semantic Versioning

Versioning is so important that it deserves its own section.

Most of the modules in the npm registry follow the specification called semantic versioning (SemVer). Semantic versioning describes the version of a piece of software as three numbers separated by dots, and defines how this version number has to change when changes are made to the software itself.

Given a version number MAJOR.MINOR.PATCH, increment the:

  • MAJOR version when you make incompatible API changes,
  • MINOR version when you add functionality in a backwards-compatible manner, and
  • PATCH version when you make backwards-compatible bug fixes.

Additional labels for the pre-release and the build metadata are available as extensions to the MAJOR.MINOR.PATCH format.
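
To make these rules concrete, here is a small sketch using the semver module from npm (install it with npm install semver); the version numbers are only illustrative:

const semver = require('semver')

semver.inc('1.2.3', 'patch')  // '1.2.4' - backwards-compatible bug fix
semver.inc('1.2.3', 'minor')  // '1.3.0' - new, backwards-compatible functionality
semver.inc('1.2.3', 'major')  // '2.0.0' - incompatible API change

semver.satisfies('1.4.7', '^1.2.0')  // true  - same major version
semver.satisfies('2.0.0', '^1.2.0')  // false - a major bump is not accepted by ^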

These numbers are for machines, not for humans! Don't assume that people will be discouraged from using your library if you bump the major version often.

"If you break your API, think about your users and BUMP THAT MAJOR!" via @RisingStack #nodejs #semver


You have to start versioning at 1.0.0!

Most people think that changes made while the software is still in a "beta" phase don't need to respect semantic versioning. They are wrong! It is really important to communicate breaking changes to your users, even in the beta phase. Always think about the users who want to experiment with your project.
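
In practice you don't even have to edit the version field by hand: the built-in npm version command bumps package.json for you, and inside a git repository it also creates a commit and a tag. A quick sketch (the version numbers are just illustrative):

$ npm version patch   # 1.0.0 -> 1.0.1
$ npm version minor   # 1.0.1 -> 1.1.0
$ npm version major   # 1.1.0 -> 2.0.0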





#3 Documentation

Having proper documentation is imperative if you'd like to share your code with others. Putting a README.md file in your project's root folder is usually enough; if you publish the package to the registry, npm will render it on the package's page on npmjs.com. It's all done automatically, and it helps other people when they try to use your code.

Before publishing, make sure you have all documentation in place and up to date.

#4 Keeping secret files out of your package

A specific file called .npmignore keeps your secret or private files from being published. Use that to your advantage: add the files you don't want to upload to .npmignore.

If you use .gitignore npm will use that too by default. Like git, npm looks for .npmignore and .gitignore files in all subdirectories of your package, not only in the root directory.
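
As an illustrative sketch, an .npmignore file could look like this - the entries are only examples, adjust them to your own project:

# .npmignore
.env
*.pem
coverage/
test/
docs/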

#5 Encouraging contributions

When you open up your code to the public, you should consider adding some guidelines on how to contribute. Make sure contributors know how to help you deal with software bugs and how to add new features to your module.

There are a few conventions for this, but in general you should consider using GitHub's issue and pull request templates.

npm publish

Now you understand everything that's necessary to publish your first module. To do so, type npm publish, and the npm CLI will upload the code to the registry.
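
A minimal publishing session could look like the sketch below - you have to be logged in to the registry first, and yourpackagename is just a placeholder:

$ npm adduser                        # log in (or create an account) on the registry
$ npm publish                        # upload the package described by package.json
$ npm view yourpackagename version   # check what the registry now reports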

Congratulations, your module is now public on the npm registry! Visit
www.npmjs.com/package/yourpackagename for the public URL.

If you published something public to npm, it's going to stay there forever. There is little you can do to make it non-discoverable. Once it hits the public registry, every other replica that's connected to it will copy all the data. Be careful when publishing.

I published something that I didn't mean to.

We're human. We make mistakes, but what can be done now? Since the left-pad incident, npm has changed its unpublish policy. If no package on the registry depends on yours, you're free to unpublish it - but remember that all the replicas copy all the data, so someone somewhere will always be able to get it. If the package contained any secrets, make sure you change them afterwards, and remember to add the affected files to .npmignore for the next publish.
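
If you do have to act, the commands look roughly like this - yourpackagename and the version numbers are placeholders, and unpublishing is subject to the registry's policy:

$ npm unpublish yourpackagename@1.0.1
$ npm deprecate yourpackagename@"<1.0.2" "contains leaked credentials, please upgrade"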

"If you accidentally published secrets to #npm change & add them to the .npmignore file!" via @RisingStack #nodejs


Private Scoped Packages

If you don't want to (or are not allowed to) publish code to a public registry for corporate reasons, npm allows organizations to open an organization account, so they can push packages to the registry without making them public. This way you can share private code between you and your co-workers.

Further reading on how to set it up: https://docs.npmjs.com/misc/scope
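
A rough sketch of how scoped packages work (@myorg and modulename are hypothetical names):

$ npm init --scope=myorg            # the package name becomes "@myorg/modulename"
$ npm publish --access restricted   # keep the package private to the organization

# co-workers install it like any other package:
$ npm install @myorg/modulename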

npm enterprise

If you'd like to further tighten your security by running a registry by yourself, you can do that pretty easily. npm has an on-premise version that can be run behind corporate firewalls. Read more about setting up npm enterprise.


Build something!

Now that you know all these things, go and build something. If you’re up for a little bragging, make sure you tweet us (@risingstack) the name of the package this tutorial helped you to build! If you have any questions, you’ll find me in the comments.

Happy publishing!

In the next part of the Node.js at Scale series, you're going to learn about the Node.js module system and require.


npm Best Practices - Node.js at Scale


Node Hero was a Node.js tutorial series focused on teaching the most essential Node.js best practices, so one can start developing applications using it.

With our new series, called Node.js at Scale, we are creating a collection of articles focusing on the needs of companies with bigger Node.js installations, and developers who already learned the basics of Node.

In the first chapter of Node.js at Scale you are going to learn the best practices on using npm as well as tips and tricks that can save you a lot of time on a daily basis.

Upcoming chapters for the Node.js at Scale series:


npm Best Practices

npm install is the most common way of using the npm cli - but it has a lot more to offer! In this chapter of Node.js at Scale you will learn how npm can help you during the full lifecycle of your application - from starting a new project through development and deployment.

#0 Know your npm

Before diving into the topics, let's look at a few commands that tell you which version of npm you are running and what commands are available.

npm versions

To get the version of the npm cli you are actively using, you can do the following:

$ npm --version
2.13.2  

npm can return a lot more than just its own version - it can return the version of the current package, the Node.js version you are using and OpenSSL or V8 versions:

$ npm version
{ bleak: '1.0.4',
  npm: '2.15.0',
  ares: '1.10.1-DEV',
  http_parser: '2.5.2',
  icu: '56.1',
  modules: '46',
  node: '4.4.2',
  openssl: '1.0.2g',
  uv: '1.8.0',
  v8: '4.5.103.35',
  zlib: '1.2.8' }

npm help

Like most CLI toolkits, npm has great built-in help functionality as well. Descriptions and synopses are always available - these are essentially man pages.

$ npm help test
NAME  
       npm-test - Test a package

SYNOPSIS  
           npm test [-- <args>]

           aliases: t, tst

DESCRIPTION  
       This runs a package's "test" script, if one was provided.

       To run tests as a condition of installation, set the npat config to true.

"9 npm best practices - a must-read collection for #nodejs developers" via @RisingStack


#1 Start new projects with npm init

When starting a new project, npm init can help you a lot by interactively creating a package.json file. It will prompt you with questions about the project's name, description and so on. However, there is a quicker solution!

$ npm init --yes

If you use npm init --yes, it won't prompt for anything - it will just create a package.json with your defaults. To set these defaults, you can use the following commands:

npm config set init.author.name YOUR_NAME  
npm config set init.author.email YOUR_EMAIL  
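
Other init.* defaults can be set the same way - for example the license and the initial version (init.license and init.version are standard npm config keys):

npm config set init.license MIT  
npm config set init.version 0.1.0  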





#2 Finding npm packages

Finding the right packages can be quite challenging - there are hundreds of thousands of modules you can choose from. We know this from experience, and developers participating in our latest Node.js survey also told us that selecting the right npm package is frustrating. Let's try to pick a module that helps us send HTTP requests!

One website that makes the task a lot easier is npms.io. It shows metrics like quality, popularity and maintenance. These are calculated based on whether a module has outdated dependencies, whether it has linters configured, whether it is covered with tests, and when the most recent commit was made.

finding npm packages

#3 Investigate npm packages

Once we've picked our module (the request module in our example), we should take a look at the documentation and check the open issues to get a better picture of what we are about to require into our application. Don't forget: the more npm packages you use, the higher the risk of including a vulnerable or malicious one. If you'd like to read more on npm-related security risks, read our related guide.

If you'd like to open the homepage of the module from the cli you can do:

$ npm home request

To check open issues or the publicly available roadmap (if there’s any), you can try this:

$ npm bugs request

Alternatively, if you'd just like to check a module's git repository, type this:

$ npm repo request

#4 Saving dependencies

Once you've found the package you want to include in your project, you have to install and save it. The most common way of doing that is by using npm install request.

If you'd like to take that one step forward and automatically add it to your package.json file, you can do:

$ npm install request --save

npm saves your dependencies with the ^ prefix by default. This means that during the next npm install, the latest version without a major version bump will be installed. To change this behaviour, you can run:

$ npm config set save-prefix='~'

In case you'd like to save the exact version, you can try:

$ npm config set save-exact true
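
To sum up what each setting puts into your package.json - the version number below is only illustrative:

"request": "^2.69.0"   // default: any 2.x.x version that is at least 2.69.0
"request": "~2.69.0"   // save-prefix='~': patch releases only, 2.69.x
"request": "2.69.0"    // save-exact: exactly this version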

#5 Lock down dependencies

Even if you save modules with exact version numbers, as shown in the previous section, you should be aware that most npm module authors don't. That's totally fine - they do it to get patches and new features automatically.

The situation can easily become problematic for production deployments: it's possible to have different versions locally than in production if, in the meantime, someone released a new version. The problem arises when this new version has a bug that affects your production system.

To solve this issue, you may want to use npm shrinkwrap. It will generate an npm-shrinkwrap.json that contains not just the exact versions of the modules installed on your machine, but also the versions of their dependencies, and so on. Once you have this file in place, npm install will use it to reproduce the same dependency tree.
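
A sketch of the workflow - the excerpt is a trimmed, illustrative shape of the generated file, not its full content:

$ npm shrinkwrap

# npm-shrinkwrap.json (trimmed)
{
  "name": "my-app",
  "version": "1.0.0",
  "dependencies": {
    "request": {
      "version": "2.69.0",
      "dependencies": { ... }
    }
  }
}

Commit this file to version control so that every environment installs the same dependency tree.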

#6 Check for outdated dependencies

To check for outdated dependencies, npm comes with a built-in tool: the npm outdated command. You have to run it in the directory of the project you'd like to check.

$ npm outdated
Package                 Current  Wanted  Latest  Location
conventional-changelog    0.5.3   0.5.3   1.1.0  @risingstack/docker-node
eslint-config-standard    4.4.0   4.4.0   6.0.1  @risingstack/docker-node
eslint-plugin-standard    1.3.1   1.3.1   2.0.0  @risingstack/docker-node
rimraf                    2.5.1   2.5.1   2.5.4  @risingstack/docker-node

Once you maintain several projects, keeping all the dependencies up to date in each of them can become an overwhelming task. To automate it, you can use Greenkeeper, which automatically sends pull requests to your repositories whenever a dependency is updated.

#7 No devDependencies in production

Development dependencies are called development dependencies for a reason - you don't have to install them in production. This makes your deployment artifacts smaller and more secure, as you will have fewer modules in production that can have security problems.

To install production dependencies only, run this:

$ npm install --production

Alternatively, you can set the NODE_ENV environment variable to production:

$ NODE_ENV=production npm install

"Don't install development dependencies in production" via @RisingStack #nodejs


#8 Secure your projects and tokens

If you use npm with a logged-in user, your npm token will be placed in the .npmrc file. Since a lot of developers store their dotfiles on GitHub, these tokens sometimes get published by accident. Currently, there are thousands of results when searching for .npmrc files on GitHub, and a huge percentage of them contain tokens. If you have dotfiles in your repositories, double-check that your credentials are not pushed!

Another source of possible security issues is the set of files that get published to npm by accident. By default, npm respects the .gitignore file, and files matching its rules won't be published. However, if you add an .npmignore file, it overrides .gitignore entirely - the two are not merged.

#9 Developing packages

When developing packages locally, you usually want to try them out with one of your projects before publishing them to npm. This is where npm link comes to the rescue.

What npm link does is create a symlink in the global folder that points to the package from which npm link was executed.

You can then run npm link package-name from another location to create a symbolic link from the globally installed package-name into the node_modules directory of the current folder.

"Use npm link to test packages locally" via @RisingStack #nodejs


Let's see it in action!

# create a symlink to the global folder
/projects/request $ npm link

# link request to the current node_modules
/projects/my-server $ npm link request

# after running this project, the require('request') 
# will include the module from projects/request


Next up on Node.js at Scale: SemVer and Module Publishing

The next article in the Node.js at Scale series will be a SemVer deep dive, covering how to publish Node.js modules.

Let me know if you have any questions in the comments!