
Mastering Async Await in Node.js

In this article, you will learn how you can simplify your callback- or Promise-based Node.js application with async functions (async/await).

Asynchronous language constructs have been around in other languages for a while, like async/await in C#, coroutines in Kotlin and goroutines in Go. With the release of Node.js 8, the long-awaited async functions have landed in Node.js as well.

By the end of this tutorial, you should be able to answer the following question too:

What are async functions in Node?

Async function declarations define AsyncFunction objects. These are similar to generators in the sense that their execution can be paused. The main difference is that calling an async function always returns a Promise, while a generator's next() returns { value: any, done: Boolean } objects. In fact, they are so similar that you could get similar functionality using the co package.

In an async function, you can await any Promise and catch its rejection reason with a regular try/catch block.
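A minimal sketch of both points, using a hypothetical fetchUser helper that resolves for one id and rejects for any other: calling an async function always hands back a Promise, and a rejection of an awaited Promise surfaces as an exception at the await, where try/catch can handle it.

```javascript
// Hypothetical helper: resolves for id 1, rejects for anything else
function fetchUser (id) {
  return id === 1
    ? Promise.resolve({ id, name: 'Ada' })
    : Promise.reject(new Error(`user ${id} not found`))
}

// Returns a plain string, but callers always receive a Promise
async function greet (id) {
  try {
    const user = await fetchUser(id) // execution pauses until the Promise settles
    return `Hello ${user.name}`
  } catch (err) {
    // a rejected Promise is thrown at the await and lands here
    return `Failed: ${err.message}`
  }
}

const pending = greet(1)
console.log(pending instanceof Promise) // true
pending.then(console.log) // Hello Ada
greet(2).then(console.log) // Failed: user 2 not found
```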

So if you had some logic implemented with promises:

function handler (req, res) {
  return request('https://user-handler-service')
    .catch((err) => {
      logger.error('Http error', err)
      // mark the error so the later .catch handlers don't log it again
      err.logged = true
      throw err
    })
    .then((response) => Mongo.findOne({ user: response.body.user }))
    .catch((err) => {
      !err.logged && logger.error('Mongo error', err)
      err.logged = true
      throw err
    })
    .then((document) => executeLogic(req, res, document))
    .catch((err) => {
      !err.logged && console.error(err)
      res.status(500).send()
    })
}

You can make it look like synchronous code using async/await:

async function handler (req, res) {  
  let response
  try {
    response = await request('https://user-handler-service')  
  } catch (err) {
    logger.error('Http error', err)
    return res.status(500).send()
  }

  let document
  try {
    document = await Mongo.findOne({ user: response.body.user })
  } catch (err) {
    logger.error('Mongo error', err)
    return res.status(500).send()
  }

  executeLogic(req, res, document)
}

In older versions of V8, unhandled promise rejections were silently dropped. Now you at least get a warning from Node, so you don’t necessarily need to bother with creating a listener. However, it is recommended to crash your app in this case, because when you don’t handle an error, your app is in an unknown state:

process.on('unhandledRejection', (err) => {  
  console.error(err)
  process.exit(1)
})

Patterns with async functions

There are quite a few use cases where the ability to handle asynchronous operations as if they were synchronous comes in very handy, because solving them with Promises or callbacks requires complex patterns or external libraries.

These are cases when you need to loop through asynchronously obtained data or use if-else conditionals.

Retry with exponential backoff

Implementing retry logic was pretty clumsy with Promises:

function requestWithRetry (url, retryCount) {
  if (retryCount) {
    return new Promise((resolve, reject) => {
      const timeout = Math.pow(2, retryCount)
      console.log('Waiting', timeout, 'ms')

      setTimeout(() => {
        _requestWithRetry(url, retryCount)
          .then(resolve)
          .catch(reject)
      }, timeout)
    })
  } else {
    return _requestWithRetry(url, 0)
  }
}

function _requestWithRetry (url, retryCount) {
  return request(url)
    .catch((err) => {
      // only retry on server-side (5xx) errors
      if (err.statusCode && err.statusCode >= 500) {
        console.log('Retrying', err.message, retryCount)
        return requestWithRetry(url, retryCount + 1)
      }
      throw err
    })
}

requestWithRetry('http://localhost:3000')  
  .then((res) => {
    console.log(res)
  })
  .catch(err => {
    console.error(err)
  })

It gave me a headache just to look at it. We can rewrite it with async/await and make it a lot simpler.

function wait (timeout) {
  return new Promise((resolve) => {
    setTimeout(() => {
      resolve()
    }, timeout)
  })
}

async function requestWithRetry (url) {
  const MAX_RETRIES = 10
  for (let i = 0; i <= MAX_RETRIES; i++) {
    try {
      // `return await` (not a bare `return`) so a rejection is caught below
      return await request(url)
    } catch (err) {
      // out of attempts: rethrow instead of silently resolving to undefined
      if (i === MAX_RETRIES) throw err
      const timeout = Math.pow(2, i)
      console.log('Waiting', timeout, 'ms')
      await wait(timeout)
      console.log('Retrying', err.message, i)
    }
  }
}
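To try the retry loop without a real server, here is a sketch with a hypothetical makeFlakyRequest stub that fails a given number of times with a 503-style error before succeeding. It is a variant of the loop above that takes the request function as a parameter (so it can be swapped for the stub) and rethrows the last error once it runs out of attempts; the delays are the same 2^i milliseconds.

```javascript
function wait (timeout) {
  return new Promise((resolve) => setTimeout(resolve, timeout))
}

// Hypothetical stand-in for `request`: fails `failures` times, then succeeds
function makeFlakyRequest (failures) {
  let calls = 0
  const flakyRequest = async (url) => {
    calls++
    if (calls <= failures) {
      const err = new Error('Service Unavailable')
      err.statusCode = 503
      throw err
    }
    return `response from ${url}`
  }
  flakyRequest.calls = () => calls
  return flakyRequest
}

async function requestWithRetry (request, url, maxRetries = 10) {
  for (let i = 0; i <= maxRetries; i++) {
    try {
      return await request(url)
    } catch (err) {
      if (i === maxRetries) throw err // out of attempts: surface the last error
      await wait(Math.pow(2, i)) // 1, 2, 4, 8, ... ms
    }
  }
}

const flaky = makeFlakyRequest(2)
requestWithRetry(flaky, 'http://localhost:3000')
  .then((res) => console.log(res, 'after', flaky.calls(), 'calls'))
// response from http://localhost:3000 after 3 calls
```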

A lot more pleasing to the eye, isn't it?

Intermediate values

Not as hideous as the previous example, but if you have a case where three asynchronous functions depend on each other in the following way, then you have to choose from several ugly solutions.

functionA returns a Promise, then functionB needs that value and functionC needs the resolved value of both functionA's and functionB's Promise.

Solution 1: The .then Christmas tree
function executeAsyncTask () {  
  return functionA()
    .then((valueA) => {
      return functionB(valueA)
        .then((valueB) => {          
          return functionC(valueA, valueB)
        })
    })
}

With this solution, we get valueA from the surrounding closure of the inner .then and valueB as the value the previous Promise resolves to. We cannot flatten out the Christmas tree, as we would lose the closure and valueA would be unavailable for functionC.



Solution 2: Moving to a higher scope
function executeAsyncTask () {  
  let valueA
  return functionA()
    .then((v) => {
      valueA = v
      return functionB(valueA)
    })
    .then((valueB) => {
      return functionC(valueA, valueB)
    })
}

In the Christmas tree, we used a higher scope to make valueA available as well. This case works similarly, but now we created the variable valueA outside the scope of the .then-s, so we can assign the value of the first resolved Promise to it.

This one definitely works, flattens the .then chain and is semantically correct. However, it also opens up ways for new bugs in case the variable name valueA is used elsewhere in the function. We also need to use two names — valueA and v — for the same value.

Solution 3: The unnecessary array
function executeAsyncTask () {  
  return functionA()
    .then(valueA => {
      return Promise.all([valueA, functionB(valueA)])
    })
    .then(([valueA, valueB]) => {
      return functionC(valueA, valueB)
    })
}

The only reason valueA is passed on in an array together with the Promise returned by functionB is to be able to flatten the tree. The two values might be of completely different types, so there is a high probability that they do not belong in an array together at all.

Solution 4: Write a helper function
// Calls each function in turn, passing it the original args plus every value resolved so far
const converge = (...fns) => (...args) => {
  const [head, ...tail] = fns
  if (tail.length) {
    return head(...args)
      .then((value) => converge(...tail)(...args.concat([value])))
  } else {
    return head(...args)
  }
}

functionA(2)
  .then((valueA) => converge(functionB, functionC)(valueA))

You can, of course, write a helper function to hide away the context juggling, but it is quite difficult to read, and may not be straightforward to understand for those who are not well versed in functional magic.

By using async/await our problems are magically gone:
async function executeAsyncTask () {  
  const valueA = await functionA()
  const valueB = await functionB(valueA)
  return functionC(valueA, valueB)
}

Multiple parallel requests with async/await

This is similar to the previous one. In case you want to execute several asynchronous tasks at once and then use their values at different places, you can do it easily with async/await:

async function executeParallelAsyncTasks () {  
  const [ valueA, valueB, valueC ] = await Promise.all([ functionA(), functionB(), functionC() ])
  doSomethingWith(valueA)
  doSomethingElseWith(valueB)
  doAnotherThingWith(valueC)
}

Without async/await, as we've seen in the previous example, we would either need to move these values into a higher scope or create a non-semantic array to pass these values on.
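A small sketch of why the Promise.all version runs the tasks concurrently: all three functions are called, and therefore started, before the first await, whereas awaiting them one by one starts each task only after the previous one has finished. The task helper is hypothetical; it records when it starts and ends and resolves after a given delay.

```javascript
// Hypothetical async task: logs its start and end into `log`, resolves after `ms`
function task (log, name, ms) {
  log.push(`start ${name}`)
  return new Promise((resolve) => setTimeout(() => {
    log.push(`end ${name}`)
    resolve(name)
  }, ms))
}

async function sequential (log) {
  const a = await task(log, 'A', 30)
  const b = await task(log, 'B', 20)
  const c = await task(log, 'C', 10)
  return [a, b, c]
}

async function parallel (log) {
  // all three tasks start before we wait for any of them
  const [a, b, c] = await Promise.all([task(log, 'A', 30), task(log, 'B', 20), task(log, 'C', 10)])
  return [a, b, c]
}

async function demo () {
  const seqLog = []
  await sequential(seqLog)
  console.log(seqLog) // [ 'start A', 'end A', 'start B', 'end B', 'start C', 'end C' ]
  const parLog = []
  await parallel(parLog)
  console.log(parLog) // [ 'start A', 'start B', 'start C', 'end C', 'end B', 'end A' ]
}

demo()
```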

Array iteration methods

You can use map, filter and reduce with async functions, although they behave pretty unintuitively. Try guessing what the following scripts will print to the console:

  1. map
function asyncThing (value) {  
  return new Promise((resolve, reject) => {
    setTimeout(() => resolve(value), 100)
  })
}

async function main () {  
  return [1,2,3,4].map(async (value) => {
    const v = await asyncThing(value)
    return v * 2
  })
}

main()  
  .then(v => console.log(v))
  .catch(err => console.error(err))

  2. filter
function asyncThing (value) {  
  return new Promise((resolve, reject) => {
    setTimeout(() => resolve(value), 100)
  })
}

async function main () {  
  return [1,2,3,4].filter(async (value) => {
    const v = await asyncThing(value)
    return v % 2 === 0
  })
}

main()  
  .then(v => console.log(v))
  .catch(err => console.error(err))

  3. reduce
function asyncThing (value) {  
  return new Promise((resolve, reject) => {
    setTimeout(() => resolve(value), 100)
  })
}

async function main () {  
  return [1,2,3,4].reduce(async (acc, value) => {
    return await acc + await asyncThing(value)
  }, Promise.resolve(0))
}

main()  
  .then(v => console.log(v))
  .catch(err => console.error(err))

Solutions:

  1. [ Promise { <pending> }, Promise { <pending> }, Promise { <pending> }, Promise { <pending> } ]
  2. [ 1, 2, 3, 4 ]
  3. 10

If you log the values returned by the iteratee inside map, you will see the ones we expect: 2, 4, 6 and 8. The only problem is that the async function wraps each of them in a Promise.

So if you want to get your values, you'll need to unwrap them by passing the returned array to Promise.all:

main()  
  .then(v => Promise.all(v))
  .then(v => console.log(v))
  .catch(err => console.error(err))
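A common way to package this up, sketched below, is to return Promise.all from inside the async function itself, so callers receive the plain values instead of an array of Promises. asyncThing is repeated from the examples above so the snippet runs on its own.

```javascript
function asyncThing (value) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(value), 100)
  })
}

async function main () {
  // map returns an array of Promises; Promise.all turns it into a Promise of an array
  return Promise.all([1, 2, 3, 4].map(async (value) => {
    const v = await asyncThing(value)
    return v * 2
  }))
}

main()
  .then(v => console.log(v)) // [ 2, 4, 6, 8 ]
  .catch(err => console.error(err))
```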

Originally, you would first wait for all your promises to resolve and then map over the values:

function main () {  
  return Promise.all([1,2,3,4].map((value) => asyncThing(value)))
}

main()  
  .then(values => values.map((value) => value * 2))
  .then(v => console.log(v))
  .catch(err => console.error(err))

This seems a bit simpler, doesn't it?

The async/await version can still be useful if you have some long-running synchronous logic in your iteratee and another long-running async task.

This way, you can start calculating as soon as you have the first value: you don't have to wait for all the Promises to be resolved to run your computations. Even though the results will still be wrapped in Promises, those are resolved a lot faster than if you did it the sequential way.

What about filter? Something is clearly wrong...

Well, you guessed it: even though the returned values are [ false, true, false, true ], they will be wrapped in promises, which are truthy, so you'll get back all the values from the original array. Unfortunately, all you can do to fix this is to resolve all the values and then filter them.
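A minimal sketch of that approach: asyncFilter is a hypothetical helper (not part of any library) that runs the async predicate for every item, resolves all the results with Promise.all, and only then filters the original array by index.

```javascript
function asyncThing (value) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(value), 100)
  })
}

// Run the async predicate for every item in parallel, then filter synchronously
async function asyncFilter (array, predicate) {
  const results = await Promise.all(array.map(predicate))
  return array.filter((_, index) => results[index])
}

asyncFilter([1, 2, 3, 4], async (value) => {
  const v = await asyncThing(value)
  return v % 2 === 0
})
  .then(v => console.log(v)) // [ 2, 4 ]
  .catch(err => console.error(err))
```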

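Reducing is pretty straightforward. Bear in mind, though, that the accumulator returned by each step will be wrapped in a Promise and has to be awaited; wrapping the initial value in Promise.resolve keeps every step uniform (await also accepts plain values, so this is a convention rather than a strict requirement).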

All in all, async/await does not mix too well with these array methods, as it is pretty clearly intended for imperative code styles.
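For comparison, here is a sketch of the reduce example rewritten in that imperative style: a plain for...of loop that awaits each value in turn. Like the async reduce above, it processes the items one after another.

```javascript
function asyncThing (value) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(value), 100)
  })
}

async function sum (values) {
  let total = 0
  for (const value of values) {
    // each iteration waits for the previous one, just like the async reduce
    total += await asyncThing(value)
  }
  return total
}

sum([1, 2, 3, 4])
  .then(v => console.log(v)) // 10
  .catch(err => console.error(err))
```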

To make your .then chains more "pure" looking, you can use Ramda's pipeP and composeP functions (newer Ramda versions deprecate them in favor of pipeWith and composeWith).



Rewriting callback-based Node.js applications

Async functions return a Promise by default and can await any Promise, so you can rewrite any callback-based function to return a Promise and then await its resolution. In Node.js 8 and later, you can use the util.promisify function to turn callback-based functions into Promise-returning ones.
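A minimal sketch with util.promisify, which wraps a function that follows the Node.js error-first callback convention, callback(err, result), so that it returns a Promise instead. addLater is a hypothetical callback-style function; core APIs such as fs.readFile can be wrapped the same way.

```javascript
const util = require('util')

// Hypothetical callback-style API: calls back with (err, result) on a later tick
function addLater (a, b, callback) {
  setImmediate(() => {
    if (typeof a !== 'number' || typeof b !== 'number') {
      return callback(new TypeError('numbers expected'))
    }
    callback(null, a + b)
  })
}

const addLaterAsync = util.promisify(addLater)

async function main () {
  const sum = await addLaterAsync(2, 3) // resolves with the callback's result
  console.log(sum) // 5
  try {
    await addLaterAsync(2, 'x')
  } catch (err) {
    // the error passed to the callback became a rejection
    console.log(err.message) // numbers expected
  }
}

main()
```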

Rewriting Promise-based applications

Simple .then chains can be upgraded in a pretty straightforward way, so you can move to using async/await right away.

function asyncTask () {  
  return functionA()
    .then((valueA) => functionB(valueA))
    .then((valueB) => functionC(valueB))
    .then((valueC) => functionD(valueC))
    .catch((err) => logger.error(err))
}

will turn into

async function asyncTask () {  
  try {
    const valueA = await functionA()
    const valueB = await functionB(valueA)
    const valueC = await functionC(valueB)
    return await functionD(valueC)
  } catch (err) {
    logger.error(err)
  }
}

Rewriting Node.js apps with async/await

  • If you liked the good old concepts of if-else conditionals and for/while loops,
  • if you believe that a try-catch block is the way errors are meant to be handled,

you will have a great time rewriting your services using async/await.

As we have seen, it can make several patterns a lot easier to code and read, so in several cases it is definitely more suitable than Promise.then() chains. However, if you are caught up in the functional programming craze of the past years, you might want to pass on this language feature.

So what do you guys think? Is async/await the next best thing since the invention of sliced bread, or is it just as controversial as the addition of class was in ES2015?

Are you already using async/await in production, or do you plan on never touching it? Let's discuss it in the comments below.

Building a Node.js App with TypeScript Tutorial

This tutorial teaches how you can build, structure, test and debug a Node.js application written in TypeScript. To do so, we use an example project which you can access anytime later.


Managing large-scale JavaScript projects can be challenging, as you need to guarantee that the pieces fit together. You can use unit tests, static types (which JavaScript does not have), or the two in combination to solve this issue.

This is where TypeScript comes into the picture. TypeScript is a typed superset of JavaScript that compiles to plain JavaScript.

In this article you will learn:

  • what TypeScript is,
  • what the benefits of using TypeScript are,
  • how you can set up a project to start developing using it:
    • how to add linters,
    • how to write tests,
    • how to debug applications written in TypeScript

This article won't go into the details of using the TypeScript language itself; it focuses on how you can build Node.js applications using it. If you are looking for an in-depth TypeScript tutorial, I recommend checking out the TypeScript Gitbook.

The benefits of using TypeScript

As we already discussed, TypeScript is a superset of JavaScript. It gives you the following benefits:

  • optional static typing, with emphasis on optional (it makes porting JavaScript applications to TypeScript easy),
  • as a developer, you can start using ECMAScript features that are not supported by the current V8 engine by using build targets,
  • use of interfaces,
  • great tooling with instruments like IntelliSense.

Getting started with TypeScript & Node

TypeScript is a static type checker for JavaScript. This means that it will check for issues in your codebase using the information available on different types. Example: a String will have a toLowerCase() method, but not a parseInt() method. Of course, the type system of TypeScript can be extended with your own type definitions.

As TypeScript is a superset of JavaScript, you can start using it by literally just renaming your .js files to .ts, so you can introduce TypeScript gradually to your teams.

Note: TypeScript won't do anything at runtime; it works only at compile time. You will run plain JavaScript files.


To get started with TypeScript, grab it from npm:

$ npm install -g typescript

Let's write our first TypeScript file! It will simply greet the person it gets as a parameter:

// greeter.ts
function greeter(person: string) {  
  return `Hello ${person}!`
}

const userName = 'Node Hero'

console.log(greeter(userName))

One thing you may already notice is the string type annotation, which tells the TypeScript compiler that the greeter function expects a string as its parameter.

Let's try to compile it!

tsc greeter.ts  

First, let's take a look at the compiled output! As you can see, there was no major change, only the type annotations were removed:

function greeter(person) {  
    return "Hello " + person + "!";
}
var userName = 'Node Hero';  
console.log(greeter(userName));  

What would happen if you changed userName to a number? As you might guess, you will get a compilation error:

greeter.ts(10,21): error TS2345: Argument of type '3' is not assignable to parameter of type 'string'.  

Tutorial: Building a Node.js app with TypeScript

1. Set up your development environment

To build applications using TypeScript, make sure you have Node.js installed on your system. This article will use Node.js 8.

We recommend installing Node.js using nvm, the Node.js version manager. With this utility application, you can have multiple Node.js versions installed on your system, and switching between them is only a command away.

# install nvm
curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.33.2/install.sh | bash

# install node 8
nvm install 8

# to make node 8 the default
nvm alias default 8  

Once you have Node.js 8 installed, you should create a directory where your project will live. After that, create your package.json file using:

npm init  

2. Create the project structure

When using TypeScript, it is recommended to put all your files under an src folder.

At the end of this tutorial, we will end up with the following project structure:

[Image: Node.js TypeScript Tutorial - Example Application Project Structure]

Let's start by adding the App.ts file - this will be the file where your web server logic will be implemented, using express.

In this file, we are creating a class called App, which will encapsulate our web server. It has a private method called mountRoutes, which mounts the routes served by the server. The express instance is reachable through the public express property.

import * as express from 'express'

class App {  
  public express

  constructor () {
    this.express = express()
    this.mountRoutes()
  }

  private mountRoutes (): void {
    const router = express.Router()
    router.get('/', (req, res) => {
      res.json({
        message: 'Hello World!'
      })
    })
    this.express.use('/', router)
  }
}

export default new App().express  

We are also creating an index.ts file, so the web server can be fired up:

import app from './App'

const port = process.env.PORT || 3000

app.listen(port, (err) => {  
  if (err) {
    return console.log(err)
  }

  return console.log(`server is listening on ${port}`)
})

With this - at least in theory - we have a functioning server. To actually make it work, we have to compile our TypeScript code to JavaScript.

For more information on how to structure your project, read our Node.js project structuring article.

3. Configuring TypeScript

You can pass options to the TypeScript compiler either by using the CLI or by using a special file called tsconfig.json. As we would like to use the same settings for different tasks, we will go with the tsconfig.json file.

By using this configuration file, we are telling TypeScript things like the build target (can be ES5, ES6, and ES7 at the time of this writing), what module system to expect, where to put the built JavaScript files, or whether it should create source-maps as well.

{
  "compilerOptions": {
    "target": "es6",
    "module": "commonjs",
    "outDir": "dist",
    "sourceMap": true
  },
  "files": [
    "./node_modules/@types/mocha/index.d.ts",
    "./node_modules/@types/node/index.d.ts"
  ],
  "include": [
    "src/**/*.ts"
  ],
  "exclude": [
    "node_modules"
  ]
}

Once you have added this TypeScript configuration file, you can build your application using the tsc command.

If you do not want to install TypeScript globally, just add it as a dependency of your project, and create an npm script for it: "tsc": "tsc".

This works because npm adds the ./node_modules/.bin folder to the PATH when running scripts, so the locally installed binary is found. You can then run it with npm run tsc, and pass options to tsc using this syntax: npm run tsc -- --all (this will list all the available options for TypeScript).



4. Add ESLint

As with most projects, you want to have linters to check for style issues in your code. TypeScript is no exception.

To use ESLint with TypeScript, you have to add an extra package, a parser, so ESLint can understand TypeScript as well: typescript-eslint-parser (since superseded by @typescript-eslint/parser). Once you have installed it, you have to set it as the parser for ESLint:

# .eslintrc.yaml
---
  extends: airbnb-base
  env:
    node: true
    mocha: true
    es6: true
  parser: typescript-eslint-parser
  parserOptions:
    sourceType: module
    ecmaFeatures: 
      modules: true

Once you run eslint src --ext ts, you will get the same errors and warnings for your TypeScript files that you are used to:

[Image: Node.js TypeScript Tutorial - Console Errors]

5. Testing your application

Testing your TypeScript-based applications is essentially the same as testing any other Node.js application.

The only gotcha is that you have to compile your application before running the tests on it. This is straightforward: simply run tsc && mocha dist/**/*.spec.js.

For more on testing, check out our Node.js testing tutorial.

6. Build a Docker image

Once you have your application ready, most probably you want to deploy it as a Docker image. The only extra steps you need to take are:

  • build the application (compile from TypeScript to JavaScript),
  • start the Node.js application from the built source.
FROM risingstack/alpine:3.4-v6.9.4-4.2.0

ENV PORT 3001

EXPOSE 3001

COPY package.json package.json  
RUN npm install

COPY . .  
RUN npm run build

CMD ["node", "dist/"]  

7. Debug using source-maps

As we enabled generating source-maps, we can use them to find bugs in our application. To start looking for issues, start your Node.js process the following way:

node --inspect dist/  

This will output something like the following:

To start debugging, open the following URL in Chrome:  
    chrome-devtools:…84980/inspector.html?experiments=true&v8only=true&ws=127.0.0.1:9229/23cd0c34-3281-49d9-81c8-8bc3e0bc353a
server is listening on 3000  

To actually start the debugging process, open up your Google Chrome browser and browse to chrome://inspect. A remote target should already be there; just click inspect. This will bring up the Chrome DevTools.

Here, you will instantly see the original source, and you can start setting breakpoints and watchers in the TypeScript source code.

[Image: Node.js TypeScript Tutorial - Debugging in Chrome]

The source-map support only works with Node.js 8 and higher.

The Complete Node.js TypeScript Tutorial

You can find the complete Node.js TypeScript starter application on GitHub.

Let us know in the issues, or here in the comments, what you would change!

Node.js + MySQL Example: Handling 100's of GigaBytes of Data

Through this Node.js & MySQL example project, we will take a look at how you can efficiently handle billions of rows that take up hundreds of gigabytes of storage space.

My secondary goal with this article is to help you decide if Node.js + MySQL is a good fit for your needs, and to provide help with implementing such a solution.

The actual code we will use throughout this blog post can be found on GitHub.

Why Node.js and MySQL?

We use MySQL to store the distributed tracing data of the users of our Node.js Monitoring & Debugging Tool called Trace.

We chose MySQL because, at the time of the decision, Postgres was not really good at updating rows, while for us, updating immutable data would have been unreasonably complex.

Unfortunately, many NoSQL solutions are not ACID compliant, which makes them difficult to use when data consistency is extremely important.

However, with good indexing and proper planning, MySQL can be just as suitable for the task as the above-mentioned NoSQL alternatives.

MySQL has several storage engines. InnoDB is the default one, and it comes with the most features. However, take into account that InnoDB tables are effectively immutable: many ALTER TABLE statements copy all the data into a new table. This makes matters worse when the need arises to migrate an already existing database.

If you have nominal values, each having a lot of associated data (e.g. each of your users has millions of products and you have tons of users), it is probably easiest to create tables for each of them and give them names like <user_id>_<entity_name>. This way you can reduce the size of individual tables significantly.

Also, getting rid of a user's data in case of an account removal becomes an O(1) operation. This is very important, because if you need to remove a large number of rows from big tables, MySQL may decide to use the wrong index or not to use indexes at all.

It does not help either that you cannot use index hints for DELETEs. You might need to ALTER your table to remove your data, but that would mean copying each row to a new table.

Creating tables for each user clearly adds complexity, but it may be a big win when it comes to removing users or similar entities with huge amounts of associated data.

However, before going for dynamically created tables, you should try deleting rows in chunks, as it may help as well and adds less complexity. Of course, if you have data coming in faster than you can delete it, you might get stuck with the aforementioned solution.
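A minimal sketch of chunked deletion, assuming a hypothetical runQuery(sql, params) function that executes a statement and resolves with the number of affected rows (you would need to adapt it to however your MySQL client reports that count). It relies on MySQL's single-table DELETE ... LIMIT syntax and stops once a chunk deletes fewer rows than the limit.

```javascript
// Delete rows older than `cutoff` in chunks of `chunkSize` until none are left.
// `runQuery` is a hypothetical function: (sql, params) => Promise<affectedRows>.
async function deleteInChunks (runQuery, tableName, cutoff, chunkSize = 1000) {
  let total = 0
  while (true) {
    const affected = await runQuery(
      `DELETE FROM \`${tableName}\` WHERE created_at < ? LIMIT ?`,
      [cutoff, chunkSize]
    )
    total += affected
    // a short chunk means there is nothing left to delete
    if (affected < chunkSize) return total
  }
}

// Usage with an in-memory fake standing in for MySQL: 2500 expired rows
let remaining = 2500
const fakeRunQuery = async (sql, [cutoff, limit]) => {
  const deleted = Math.min(remaining, limit)
  remaining -= deleted
  return deleted
}

deleteInChunks(fakeRunQuery, 'tbl', '2017-05-10', 1000)
  .then((total) => console.log('deleted', total, 'rows')) // deleted 2500 rows
```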

But what if your tables are still huge after partitioning them by users, you need to delete outdated rows as well, and data is still coming in faster than you can remove it? In this case, you should try MySQL's built-in table partitioning. It comes in handy when you need to cut your tables by values that are defined on an ordinal or continuous scale, such as a creation timestamp.

Table partitioning with MySQL

With MySQL, a partitioned table works as if it were multiple tables, but you can use the same interface you are used to, while no additional logic is needed on the application's side. This also means you can drop partitions as if you were dropping tables.

The documentation is good, but pretty verbose as well (after all this is not a simple topic), so let's take a quick look at how you should create a partitioned table.

The way we handled our partitions was taken from Rick James's post on the topic. He also gives quite a bit of insight into how you should plan your tables.

CREATE TABLE IF NOT EXISTS tbl (
  id INTEGER NOT NULL AUTO_INCREMENT,
  data VARCHAR(255) NOT NULL,
  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (TO_DAYS(created_at)) (
  PARTITION start        VALUES LESS THAN (0),
  PARTITION from20170514 VALUES LESS THAN (TO_DAYS('2017-05-15')),
  PARTITION from20170515 VALUES LESS THAN (TO_DAYS('2017-05-16')),
  PARTITION from20170516 VALUES LESS THAN (TO_DAYS('2017-05-17')),
  PARTITION future       VALUES LESS THAN MAXVALUE
);

There is nothing unusual until the PARTITION BY RANGE part.

In MySQL, you can partition by RANGE, LIST, COLUMNS, HASH and KEY; you can read about them in the documentation. Notice that the partitioning key must be part of the primary key and of every unique index.

The ones starting with from<date> should be self-explanatory. Each partition holds values for which the created_at column is less than the date of the next day. This also means that from20170514 holds all data that are older than 2017-05-15, so this is the partition that we will drop when we perform the cleanup.

The future and start partitions need some explanation: future holds the values for the days we have not yet defined. So if we cannot run repartitioning in time, all data that arrives on 2017-05-17 or later will end up there, making sure we don't lose any of it. start serves as a safety net as well. We expect all rows to have a DATETIME created_at value; however, we need to be prepared for possible errors. If for some reason a row ends up with NULL there, it will land in the start partition, serving as a sign that we have some debugging to do.
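To make these routing rules concrete, here is a small simulation (plain JavaScript, not MySQL code) of how a RANGE-partitioned table picks a partition: the first partition whose LESS THAN bound is greater than the value wins, MAXVALUE catches everything else, and NULL is treated as smaller than any value, so it lands in the first partition. As a simplification, dates are compared as ISO strings instead of TO_DAYS values, and the start bound is only a stand-in.

```javascript
// Partitions in definition order; `lessThan: null` stands for MAXVALUE.
// Simplification: ISO date strings compare correctly as plain strings.
const partitions = [
  { name: 'start', lessThan: '0000-01-01' }, // stand-in for TO_DAYS(...) < 0
  { name: 'from20170514', lessThan: '2017-05-15' },
  { name: 'from20170515', lessThan: '2017-05-16' },
  { name: 'from20170516', lessThan: '2017-05-17' },
  { name: 'future', lessThan: null }
]

function pickPartition (createdAt) {
  // RANGE partitioning treats NULL as smaller than any value
  if (createdAt === null) return partitions[0].name
  const match = partitions.find((p) => p.lessThan === null || createdAt < p.lessThan)
  return match.name
}

console.log(pickPartition('2017-05-14 08:00:00')) // from20170514
console.log(pickPartition('2017-05-16 23:59:59')) // from20170516
console.log(pickPartition('2017-05-20 12:00:00')) // future
console.log(pickPartition(null)) // start
```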

When you use partitioning, MySQL will keep that data on separate parts of the disk as if they were separate tables and organizes your data automatically based on the partitioning key.

There are some restrictions to be taken into account though:

  • Query cache is not supported.
  • Foreign keys are not supported for partitioned InnoDB tables.
  • Partitioned tables do not support FULLTEXT indexes or searches.

There are a lot more, but these are the ones that we felt the most constraining after adopting partitioned tables at RisingStack.

If you want to create a new partition, you need to reorganize an existing one and split it to fit your needs:

ALTER TABLE tbl
    REORGANIZE PARTITION future INTO (
        PARTITION from20170517 VALUES LESS THAN (TO_DAYS('2017-05-18')),
        PARTITION from20170518 VALUES LESS THAN (TO_DAYS('2017-05-19')),
        PARTITION future VALUES LESS THAN MAXVALUE
);

Dropping partitions takes an ALTER TABLE statement, yet it runs as if you dropped a table:

ALTER TABLE tbl  
    DROP PARTITION from20170517, from20170518;

As you can see, you have to include the actual names and descriptions of the partitions in the statements. They cannot be dynamically generated by MySQL, so you have to handle it in the application logic. That's what we'll cover next.
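Since the names and bounds have to come from the application, building the statements is plain string work. A minimal sketch with hypothetical helper names, reorganizeFutureStatement and dropPartitionsStatement, that produce the two statements shown above. Interpolating values into SQL like this is only acceptable because the names and dates are generated by the application itself, never taken from user input.

```javascript
// Build "ALTER TABLE ... REORGANIZE PARTITION future INTO (...)" from a list of
// { name, lessThan } objects, where lessThan is a 'YYYY-MM-DD' date string.
// Values are interpolated directly: only use trusted, application-generated input.
function reorganizeFutureStatement (table, newPartitions) {
  const defs = newPartitions.map(({ name, lessThan }) =>
    `    PARTITION ${name} VALUES LESS THAN (TO_DAYS('${lessThan}')),`)
  return [
    `ALTER TABLE ${table}`,
    '  REORGANIZE PARTITION future INTO (',
    ...defs,
    '    PARTITION future VALUES LESS THAN MAXVALUE',
    ');'
  ].join('\n')
}

// Build "ALTER TABLE ... DROP PARTITION a, b;" from partition names
function dropPartitionsStatement (table, names) {
  return `ALTER TABLE ${table} DROP PARTITION ${names.join(', ')};`
}

console.log(reorganizeFutureStatement('tbl', [
  { name: 'from20170517', lessThan: '2017-05-18' },
  { name: 'from20170518', lessThan: '2017-05-19' }
]))
console.log(dropPartitionsStatement('tbl', ['from20170517', 'from20170518']))
// ALTER TABLE tbl DROP PARTITION from20170517, from20170518;
```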

Table partitioning example with Node.js & MySQL

Let's see the actual solution. For the examples here, we will use knex, which is a query builder for JavaScript. If you are familiar with SQL, you shouldn't have any problem understanding the code.

First, let's create the table:

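const dedent = require('dedent')
const _ = require('lodash')
const moment = require('moment')

// assumed to be defined elsewhere: `knex` (a configured knex instance),
// `tableName` (the name of the partitioned table) and the `Table` object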

const MAX_DATA_RETENTION = 7  
const PARTITION_NAME_DATE_FORMAT = 'YYYYMMDD'

Table.create = function () {  
  return knex.raw(dedent`
    CREATE TABLE IF NOT EXISTS \`${tableName}\` (
      \`id\` INTEGER NOT NULL AUTO_INCREMENT,
      \`data\` VARCHAR(255) NOT NULL,
      \`created_at\` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
      PRIMARY KEY (\`id\`, \`created_at\`)
    )
    PARTITION BY RANGE ( TO_DAYS(\`created_at\`)) (
      PARTITION \`start\` VALUES LESS THAN (0),
      ${Table.getPartitionStrings()}
      PARTITION \`future\` VALUES LESS THAN MAXVALUE
    );
  `)
}

Table.getPartitionStrings = function () {  
  const days = _.range(MAX_DATA_RETENTION - 2, -2, -1)
  const partitions = days.map((day) => {
    const tomorrow = moment().subtract(day, 'day').format('YYYY-MM-DD')
    const today = moment().subtract(day + 1, 'day').format(PARTITION_NAME_DATE_FORMAT)
    return `PARTITION \`from${today}\` VALUES LESS THAN (TO_DAYS('${tomorrow}')),`
  })
  return partitions.join('\n')
}

It is practically the same statement we saw earlier, but we have to create the names and descriptions of partitions dynamically. That's why we created the getPartitionStrings method.

The first line is:

const days = _.range(MAX_DATA_RETENTION - 2, -2, -1)  

With MAX_DATA_RETENTION - 2 = 5, _.range creates a sequence from 5 down to -2 (last value exclusive): [ 5, 4, 3, 2, 1, 0, -1 ]. We then subtract these values from the current time to create the name of each partition (today) and its upper limit (tomorrow). The order is vital, as MySQL throws an error if the partition boundaries do not strictly increase in the statement.
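To check the date arithmetic without lodash or moment, here is a dependency-free sketch of the same idea. The partitionStrings function is hypothetical; it takes the current date as a parameter (an assumption made so the output is deterministic) and works in UTC, whereas the moment version above uses the local time zone.

```javascript
// Format a Date as YYYY-MM-DD (UTC)
const isoDay = (date) => date.toISOString().slice(0, 10)

// Same logic as getPartitionStrings: for day = retention - 2 down to -1, the partition
// named after (now - day - 1) holds rows created before (now - day).
function partitionStrings (now, retention) {
  const lines = []
  for (let day = retention - 2; day > -2; day--) {
    const upper = new Date(now.getTime() - day * 86400000)
    const start = new Date(now.getTime() - (day + 1) * 86400000)
    const name = `from${isoDay(start).replace(/-/g, '')}`
    lines.push(`PARTITION \`${name}\` VALUES LESS THAN (TO_DAYS('${isoDay(upper)}')),`)
  }
  return lines.join('\n')
}

console.log(partitionStrings(new Date('2017-05-16T12:00:00Z'), 7))
```

With a retention of 7 days and 2017-05-16 as the current date, this prints seven lines in increasing order, from `from20170510` (bounded by 2017-05-11) to `from20170516` (bounded by 2017-05-17).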

Large Scale Data Removal Example with MySQL and Node.js

Now let's take a step by step look at data removal. You can see the whole code here.

The first method, removeExpired, gets the list of current partitions and then passes it on to repartition.

const _ = require('lodash')

Table.removeExpired = function (dataRetention) {
  return Table.getPartitions()
    .then((currentPartitions) => Table.repartition(dataRetention, currentPartitions))
}

Table.getPartitions = function () {
  return knex('information_schema.partitions')
    .select(knex.raw('partition_name as name'), knex.raw('partition_description as description')) // description holds the day of partition in mysql days
    .where('table_schema', dbName)
    .andWhere('table_name', tableName) // only the partitions of this table
    .whereNotIn('partition_name', [ 'start', 'future' ])
    .then((partitions) => partitions.map((partition) => ({
      name: partition.name,
      description: partition.description === 'MAXVALUE' ? 'MAXVALUE' : parseInt(partition.description, 10)
    })))
}

Table.repartition = function (dataRetention, currentPartitions) {
  const partitionsThatShouldExist = Table.getPartitionsThatShouldExist(dataRetention, currentPartitions)

  const partitionsToBeCreated = _.differenceWith(partitionsThatShouldExist, currentPartitions, (a, b) => a.description === b.description)
  const partitionsToBeDropped = _.differenceWith(currentPartitions, partitionsThatShouldExist, (a, b) => a.description === b.description)

  // both ALTER TABLE statements are sent in one query, which the mysql driver
  // only accepts when the connection is configured with multipleStatements: true
  const statement = dedent`
    ${Table.reorganizeFuturePartition(partitionsToBeCreated)}
    ${Table.dropOldPartitions(partitionsToBeDropped)}`

  // most hourly runs have nothing to do: don't send an empty query to MySQL
  if (!statement.trim()) return Promise.resolve()

  return knex.raw(statement)
}

First, we select all currently existing partitions from the information_schema.partitions table that is maintained by MySQL.

Then we compute all the partitions that should exist for the table. If A is the set of partitions that currently exist and B is the set of partitions that should exist, then (with \ denoting set difference)

partitionsToBeCreated = B \ A

partitionsToBeDropped = A \ B.
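The two set differences can be illustrated with a small sketch that compares partition descriptors by their description, the same comparator the _.differenceWith calls use. The helper differenceByDescription and the numbers below are illustrative only, not real TO_DAYS values.

```javascript
// Sketch: set difference of partition descriptors compared by `description`,
// equivalent to _.differenceWith(from, subtract, (a, b) => a.description === b.description).
// differenceByDescription is a hypothetical helper.
function differenceByDescription (from, subtract) {
  const descriptions = new Set(subtract.map((partition) => partition.description))
  return from.filter((partition) => !descriptions.has(partition.description))
}

// Illustrative descriptors only
const existing = [ // A
  { name: 'from1', description: 101 },
  { name: 'from2', description: 102 },
  { name: 'from3', description: 103 }
]
const shouldExist = [ // B
  { name: 'from2', description: 102 },
  { name: 'from3', description: 103 },
  { name: 'from4', description: 104 }
]

const toBeCreated = differenceByDescription(shouldExist, existing) // B \ A
const toBeDropped = differenceByDescription(existing, shouldExist) // A \ B
console.log(toBeCreated.map((partition) => partition.name)) // [ 'from4' ]
console.log(toBeDropped.map((partition) => partition.name)) // [ 'from1' ]
```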

getPartitionsThatShouldExist creates set B.

Table.getPartitionsThatShouldExist = function (dataRetention, currentPartitions) {  
  const days = _.range(dataRetention - 2, -2, -1)
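  const oldestPartition = Math.min(...currentPartitions.map((partition) => partition.description))
  return days.map((day) => {
    const tomorrow = moment().subtract(day, 'day')
    const today = moment().subtract(day + 1, 'day')
    // descriptions are exclusive upper bounds, so compare upper bound with upper bound;
    // checking `today` here would also skip the oldest existing partition,
    // which would then be dropped on the next run
    if (Table.getMysqlDay(tomorrow) < oldestPartition) {
      return null
    }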

    return {
      name: `from${today.format(PARTITION_NAME_DATE_FORMAT)}`,
      description: Table.getMysqlDay(tomorrow)
    }
  }).filter((partition) => !!partition)
}

Table.getMysqlDay = function (momentDate) {  
  return momentDate.diff(moment([ 0, 0, 1 ]), 'days') // mysql dates are counted since 0 Jan 1 00:00:00
}

The creation of partition objects is quite similar to the creation of the CREATE TABLE ... PARTITION BY RANGE statement. It is also vital to check whether the partition we are about to create is older than the current oldest partition, because the dataRetention may change over time.

Take this scenario for example:

Imagine that your users start out with 7 days of data retention, but have an option to upgrade it to 10 days. In the beginning the user has partitions that cover days in the following order: [ start, -7, -6, -5, -4, -3, -2, -1, future ]. After a month or so, a user decides to upgrade. The missing partitions are in this case: [ -10, -9, -8, 0 ].

Without that check, the cleanup script would try to reorganize the future partition into the missing partitions, appending them after the current ones.

Creating partitions for days older than -7 does not make sense in the first place, because that data would have been thrown away by now anyway. It would also lead to a partition list like [ start, -7, -6, -5, -4, -3, -2, -1, -10, -9, -8, 0, future ], which is not monotonically increasing, so MySQL would throw an error and the cleanup would fail.
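The guard can be sketched with plain integer day numbers instead of moment objects. In this hypothetical helper each partition is identified by its exclusive upper bound (its description), and candidates are compared upper bound to upper bound, so the oldest existing partition itself is kept while anything older is skipped. The day numbers are made up for the example.

```javascript
// Sketch: which partitions should exist, using plain day numbers (e.g. TO_DAYS values).
// partitionsThatShouldExist is a hypothetical helper, not the article's implementation.
function partitionsThatShouldExist (todayNumber, dataRetention, currentUpperBounds) {
  const oldest = Math.min(...currentUpperBounds)
  const result = []
  // same days as _.range(dataRetention - 2, -2, -1)
  for (let day = dataRetention - 2; day > -2; day--) {
    const upperBound = todayNumber - day // "tomorrow" in the article's code
    // never recreate partitions older than the oldest one that still exists,
    // so the partition list stays strictly increasing
    if (upperBound < oldest) continue
    result.push(upperBound)
  }
  return result
}

// 7 days of retention created on day 1000: upper bounds 995..1001
const current = [ 995, 996, 997, 998, 999, 1000, 1001 ]

// Upgrading to 10 days on the same day adds nothing older than 995
console.log(partitionsThatShouldExist(1000, 10, current).join(', '))
// 995, 996, 997, 998, 999, 1000, 1001

// The next day with 7 days of retention: 1002 is created, 995 is dropped
console.log(partitionsThatShouldExist(1001, 7, current).join(', '))
// 996, 997, 998, 999, 1000, 1001, 1002
```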

MySQL's TO_DAYS(date) function calculates the number of days passed since January 1 of year 0, so we replicate this in JavaScript:
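For comparison, the same day count can be computed without moment by plain Date arithmetic in UTC, which sidesteps local time zone and daylight saving effects. This is a sketch with a hypothetical toDays helper; like MySQL's own TO_DAYS(), it is only meant for modern (Gregorian) dates.

```javascript
// Sketch: MySQL's TO_DAYS() for a calendar date using UTC Date arithmetic.
// toDays is a hypothetical helper, intended for modern dates only.
const MS_PER_DAY = 24 * 60 * 60 * 1000

// Date.UTC() maps years 0-99 to 1900-1999, so year 0 is set explicitly
const YEAR_ZERO = new Date(0)
YEAR_ZERO.setUTCFullYear(0, 0, 1)

function toDays (year, month, day) {
  const date = Date.UTC(year, month - 1, day) // month is 1-based here
  return Math.round((date - YEAR_ZERO.getTime()) / MS_PER_DAY)
}

console.log(toDays(2007, 10, 7)) // 733321, the same as SELECT TO_DAYS('2007-10-07')
```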

Table.getMysqlDay = function (momentDate) {  
  return momentDate.diff(moment([ 0, 0, 1 ]), 'days')
}

Now that we know which partitions have to be dropped and which have to be created, let's first create the new partition for the new day.

Table.reorganizeFuturePartition = function (partitionsToBeCreated) {  
  if (!partitionsToBeCreated.length) return '' // there should be only one every day, and it is run hourly, so ideally 23 times a day it should be a noop
  const partitionsString = partitionsToBeCreated.map((partitionDescriptor) => {
    return `PARTITION \`${partitionDescriptor.name}\` VALUES LESS THAN (${partitionDescriptor.description}),`
  }).join('\n')

  return dedent`
    ALTER TABLE \`${tableName}\`
      REORGANIZE PARTITION future INTO (
        ${partitionsString}
        PARTITION \`future\` VALUES LESS THAN MAXVALUE
      );`
}

We simply prepare a statement for the new partition(s) to be created.

We run this script hourly just to make sure nothing goes astray and we are able to perform the cleanup properly at least once a day.

So the first thing to check is whether there is a partition to be created at all. This should only happen on the first run of each day; the other 23 runs should be no-ops.

We also have to drop the outdated partitions.

Table.dropOldPartitions = function (partitionsToBeDropped) {  
  if (!partitionsToBeDropped.length) return ''
  let statement = `ALTER TABLE \`${tableName}\`\nDROP PARTITION\n`
  statement += partitionsToBeDropped.map((partition) => {
    return partition.name
  }).join(',\n')
  return statement + ';'
}

This method creates the same ALTER TABLE ... DROP PARTITION statement we saw earlier.

And finally, everything is ready for the reorganization.

  const statement = dedent
    `${Table.reorganizeFuturePartition(partitionsToBeCreated)}
    ${Table.dropOldPartitions(partitionsToBeDropped)}`

  return knex.raw(statement)
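To see what the combined statement looks like, here is a sketch that rewrites the two builders as pure functions taking the table name as a parameter. The function names and the `events` table are hypothetical; the descriptions are the TO_DAYS values of the partitions' upper bounds. When both lists are empty, the combined statement is an empty string, so there is nothing worth sending to MySQL.

```javascript
// Sketch: the statement builders as pure functions, plus a combiner.
// buildReorganize, buildDrop, buildRepartitionStatement and `events` are hypothetical.
function buildReorganize (tableName, partitionsToBeCreated) {
  if (!partitionsToBeCreated.length) return ''
  const partitions = partitionsToBeCreated
    .map((partition) => `PARTITION \`${partition.name}\` VALUES LESS THAN (${partition.description}),`)
    .join('\n')
  return [
    `ALTER TABLE \`${tableName}\``,
    'REORGANIZE PARTITION future INTO (',
    partitions,
    'PARTITION `future` VALUES LESS THAN MAXVALUE',
    ');'
  ].join('\n')
}

function buildDrop (tableName, partitionsToBeDropped) {
  if (!partitionsToBeDropped.length) return ''
  const names = partitionsToBeDropped.map((partition) => partition.name).join(',\n')
  return `ALTER TABLE \`${tableName}\`\nDROP PARTITION\n${names};`
}

function buildRepartitionStatement (tableName, toBeCreated, toBeDropped) {
  // keep only the non-empty parts, so a no-op run yields an empty string
  return [ buildReorganize(tableName, toBeCreated), buildDrop(tableName, toBeDropped) ]
    .filter(Boolean)
    .join('\n')
}

// A daily run: one new partition (upper bound TO_DAYS('2017-06-12')) and one expired one
console.log(buildRepartitionStatement('events',
  [ { name: 'from20170611', description: 736857 } ],
  [ { name: 'from20170604', description: 736850 } ]))

// One of the 23 no-op runs: nothing to send to MySQL
console.log(buildRepartitionStatement('events', [], []) === '') // true
```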

Wrapping it up

As you can see, contrary to popular belief, ACID-compliant DBMS solutions such as MySQL can be used to handle large amounts of data, so you don't necessarily need to give up the features of transactional databases.

However, table partitioning comes with quite a few restrictions, which means you cannot use all the features InnoDB provides for keeping your data consistent. You might also have to handle in application logic what would otherwise be available, such as foreign key constraints or full-text search.

I hope this post helps you decide whether MySQL is a good fit for your needs and helps you implement your solution. Until next time: Happy engineering!

If you have any Node + MySQL questions, let me know in the comments below!

Mastering the Node.js CLI & Command Line Options

Node.js comes with a lot of CLI options to expose built-in debugging and to modify how V8, the JavaScript engine, works.

In this post, we have collected the most important CLI commands to help you become more productive.

Accessing Node.js CLI Options

To get a full list of all available Node.js CLI options in your current distribution of Node.js, you can access the manual page from the terminal using:

$ man node

You can also print the built-in help with:

$ node --help

Usage: node [options] [ -e script | script.js ] [arguments]  
       node debug script.js [arguments] 

Options:  
  -v, --version         print Node.js version
  -e, --eval script     evaluate script
  -p, --print           evaluate script and print result
  -c, --check           syntax check script without executing
...

As the first line of the usage section shows, options have to come before the script you want to run.

Take the following index.js file:

console.log(new Buffer(100))  

To take advantage of the --zero-fill-buffers option, you have to run your application using:

$ node --zero-fill-buffers index.js

This way, the buffer is filled with zeros instead of whatever data happened to be in that memory:

<Buffer 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... >  

CLI Options

Now that we have seen how to pass CLI options to Node.js, let's see what other options are available!

--version or -v

Using node --version, or node -v for short, you can print the version of Node.js you are using.

$ node -v
v6.10.0  

--eval or -e

Using the --eval option, you can run JavaScript code right from your terminal. The modules that are predefined in the REPL, like the http or the fs module, can also be used without requiring them.

$ node -e 'console.log(3 + 2)'
5  

--print or -p

The --print option works the same way as --eval, but it also prints the result of the expression. To achieve the same output as the previous example, we can simply leave out the console.log:

$ node -p '3 + 2'
5  
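Both options can also be exercised from a Node.js script, for example in tooling or tests. This sketch uses child_process with process.execPath, so the same node binary runs the child; the runNode helper is hypothetical.

```javascript
// Sketch: running --eval and --print in a child process.
// runNode is a hypothetical helper; process.execPath is the current node binary.
const { execFileSync } = require('child_process')

function runNode (args) {
  return execFileSync(process.execPath, args, { encoding: 'utf8' })
}

console.log(runNode([ '-e', 'console.log(3 + 2)' ]).trim()) // 5 (printed by console.log in the child)
console.log(runNode([ '-p', '3 + 2' ]).trim()) // 5 (printed by --print)

// Core modules such as os are available without require() in --eval/--print
console.log(runNode([ '-p', 'typeof os.platform' ]).trim()) // function
```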

--check or -c

Available since v4.2.0

The --check option instructs Node.js to check the syntax of the provided file, without actually executing it.

Take the following example again:

console.log(new Buffer(100)  

As you can see, a closing ) is missing. Once you run this file using node index.js, it will produce the following output:

/Users/gergelyke/Development/risingstack/mastering-nodejs-cli/index.js:1
(function (exports, require, module, __filename, __dirname) { console.log(new Buffer(100)
                                                                                        ^
SyntaxError: missing ) after argument list  
    at Object.exports.runInThisContext (vm.js:76:16)
    at Module._compile (module.js:542:28)
    at Object.Module._extensions..js (module.js:579:10)
    at Module.load (module.js:487:32)
    at tryModuleLoad (module.js:446:12)
    at Function.Module._load (module.js:438:3)
    at Module.runMain (module.js:604:10)
    at run (bootstrap_node.js:394:7)

Using the --check option, you can detect the same issue without executing the script by running node --check index.js. The output is similar, except that the stack trace only contains Node.js bootstrap frames, as your script never runs:

/Users/gergelyke/Development/risingstack/mastering-nodejs-cli/index.js:1
(function (exports, require, module, __filename, __dirname) { console.log(new Buffer(100)
                                                                                        ^
SyntaxError: missing ) after argument list  
    at startup (bootstrap_node.js:144:11)
    at bootstrap_node.js:509:3

The --check option can come in handy when you want to see if your script is syntactically correct without executing it.
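As a sketch of how this could be automated, the hypothetical checkSyntax helper below writes source code to a temporary file and runs node --check on it. A zero exit code means the syntax is valid; note that code which would throw at runtime still passes, because nothing is executed.

```javascript
// Sketch: syntax-checking source code with --check without running it.
// checkSyntax is a hypothetical helper that uses a temporary index.js file.
const { spawnSync } = require('child_process')
const fs = require('fs')
const os = require('os')
const path = require('path')

function checkSyntax (source) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'node-check-'))
  const file = path.join(dir, 'index.js')
  fs.writeFileSync(file, source)
  const result = spawnSync(process.execPath, [ '--check', file ], { encoding: 'utf8' })
  fs.unlinkSync(file)
  fs.rmdirSync(dir)
  // exit code 0 means the syntax is valid; syntax errors are reported on stderr
  return { ok: result.status === 0, stdout: result.stdout, stderr: result.stderr }
}

console.log(checkSyntax('console.log(new Buffer(100)').ok) // false
// Valid syntax passes even though this code would throw if it were executed
console.log(checkSyntax("throw new Error('never runs')").ok) // true
```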


--inspect[=host:port]

Available since v6.3.0

Using node --inspect activates the inspector on the provided host and port. If they are not provided, the default is 127.0.0.1:9229. The debugging tools attached to Node.js instances communicate over a TCP port using the Chrome Debugging Protocol.

--inspect-brk[=host:port]

Available since v7.6.0

The --inspect-brk has the same functionality as the --inspect option, however it pauses the execution at the first line of the user script.

$ node --inspect-brk index.js 
Debugger listening on port 9229.  
Warning: This is an experimental feature and could change at any time.  
To start debugging, open the following URL in Chrome:  
    chrome-devtools://devtools/bundled/inspector.html?experiments=true&v8only=true&ws=127.0.0.1:9229/86dd44ef-c865-479e-be4d-806d622a4813

Once you have run this command, just copy and paste the URL you got to start debugging your Node.js process.

--zero-fill-buffers

Available since v6.0.0

Node.js can be started using the --zero-fill-buffers command line option to force all newly allocated Buffer instances to be automatically zero-filled upon creation. The reason to do so is that newly allocated Buffer instances can contain sensitive data.

Use this option only when you need to guarantee that newly created Buffer instances cannot contain sensitive data, as it has a significant impact on performance.

Also note that some Buffer constructors were deprecated in v6.0.0:

  • new Buffer(array)
  • new Buffer(arrayBuffer[, byteOffset [, length]])
  • new Buffer(buffer)
  • new Buffer(size)
  • new Buffer(string[, encoding])

Instead, you should use Buffer.alloc(size[, fill[, encoding]]), Buffer.from(array), Buffer.from(buffer), Buffer.from(arrayBuffer[, byteOffset[, length]]) and Buffer.from(string[, encoding]).
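A short sketch of the replacement factory methods follows. It also uses Buffer.allocUnsafe(), which is not listed above: it skips the zero-fill for speed (unless --zero-fill-buffers is set), so its contents must be overwritten before they are exposed.

```javascript
// Sketch: the non-deprecated Buffer factory methods that replace new Buffer(...)
const zeroed = Buffer.alloc(8) // always zero-filled
const filled = Buffer.alloc(4, 'ab') // repeats the fill pattern: 61 62 61 62
const fromString = Buffer.from('hello', 'utf8')
const fromArray = Buffer.from([ 1, 2, 3 ])

console.log(zeroed) // <Buffer 00 00 00 00 00 00 00 00>
console.log(filled.toString('hex')) // 61626162
console.log(fromString.toString('hex')) // 68656c6c6f
console.log(fromArray) // <Buffer 01 02 03>

// allocUnsafe() may contain old memory, so overwrite it before use
const unsafe = Buffer.allocUnsafe(8).fill(0)
console.log(unsafe.equals(zeroed)) // true
```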

You can read more on the security implications of the Buffer module on the Snyk blog.

--prof-process

With the --prof-process option, Node.js processes the log generated by the V8 profiler into a human-readable report.

To use it, first run your application with the --prof flag:

node --prof index.js  

Once it has run, a new log file with the isolate- prefix is placed in your working directory.

Then, you have to run the Node.js process with the --prof-process option:

node --prof-process isolate-0x102001600-v8.log > output.txt  

This file will contain metrics from the V8 profiler, like how much time was spent in the C++ layer, or in the JavaScript part, and which function calls took how much time. Something like this:

[C++]:
   ticks  total  nonlib   name
     16   18.4%   18.4%  node::ContextifyScript::New(v8::FunctionCallbackInfo<v8::Value> const&)
      4    4.6%    4.6%  ___mkdir_extended
      2    2.3%    2.3%  void v8::internal::String::WriteToFlat<unsigned short>(v8::internal::String*, unsigned short*, int, int)
      2    2.3%    2.3%  void v8::internal::ScavengingVisitor<(v8::internal::MarksHandling)1, (v8::internal::LoggingAndProfiling)0>::ObjectEvacuationStrategy<(v8::internal::ScavengingVisitor<(v8::internal::MarksHandling)1, (v8::internal::LoggingAndProfiling)0>::ObjectContents)1>::VisitSpecialized<24>(v8::internal::Map*, v8::internal::HeapObject**, v8::internal::HeapObject*)

[Summary]:
   ticks  total  nonlib   name
      1    1.1%    1.1%  JavaScript
     70   80.5%   80.5%  C++
      5    5.7%    5.7%  GC
      0    0.0%          Shared libraries
     16   18.4%          Unaccounted

To get a full list of Node.js CLI options, check out the official documentation here.


V8 Options

You can print all the available V8 options using the --v8-options command line option.

Currently V8 exposes more than 100 command line options; here we just picked a few to showcase some of the functionality they provide. Some of these options can drastically change how V8 behaves, so use them with caution!

--harmony

With the harmony flag, you can enable all completed harmony features.

--max_old_space_size

With this option, you can set the maximum size of the old space on the heap (in megabytes), which directly affects how much memory your process can allocate.

This setting can come in handy when you run in low-memory environments.
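The effect of the flag can be observed through v8.getHeapStatistics() in child processes started with different limits. In this sketch, heapLimitWith is a hypothetical helper; heap_size_limit also covers other heap spaces, so it does not equal the megabyte value exactly, but it grows with it.

```javascript
// Sketch: observing --max-old-space-size (in megabytes) via v8.getHeapStatistics().
// heapLimitWith is a hypothetical helper.
const { execFileSync } = require('child_process')

function heapLimitWith (megabytes) {
  const output = execFileSync(process.execPath, [
    `--max-old-space-size=${megabytes}`,
    '-p', "require('v8').getHeapStatistics().heap_size_limit"
  ], { encoding: 'utf8' })
  return Number(output.trim())
}

const small = heapLimitWith(64)
const large = heapLimitWith(512)
// heap_size_limit is reported in bytes and includes more than the old space
console.log(small < large) // true
```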

--optimize_for_size

With this option, you can instruct V8 to optimize the memory space for size - even if the application gets slower.

Just as the previous option, it can be useful in low memory environments.


Environment Variables

NODE_DEBUG=module[,…]

Setting this environment variable enables the core modules to print debug information. You can run the previous example like this to get debug information on the module core component (instead of module, you can go for http, fs, etc...):

$ NODE_DEBUG=module node index.js

The output will be something like this:

MODULE 7595: looking for "/Users/gergelyke/Development/risingstack/mastering-nodejs-cli/index.js" in ["/Users/gergelyke/.node_modules","/Users/gergelyke/.node_libraries","/Users/gergelyke/.nvm/versions/node/v6.10.0/lib/node"]  
MODULE 7595: load "/Users/gergelyke/Development/risingstack/mastering-nodejs-cli/index.js" for module "."  
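The same variable also switches on util.debuglog() sections in your own code. In this sketch, 'myapp' is a made-up section name and stderrWith is a hypothetical helper that runs a one-line script in a child process with and without NODE_DEBUG.

```javascript
// Sketch: NODE_DEBUG enabling a util.debuglog() section in application code.
// 'myapp' and stderrWith are hypothetical names for this illustration.
const { spawnSync } = require('child_process')

const script = "require('util').debuglog('myapp')('cache miss for %s', 'user:42')"

function stderrWith (extraEnv) {
  const env = Object.assign({}, process.env, extraEnv)
  if (!extraEnv.NODE_DEBUG) delete env.NODE_DEBUG
  return spawnSync(process.execPath, [ '-e', script ], { env, encoding: 'utf8' }).stderr
}

// debuglog writes to stderr, prefixed with the section name and the process id
console.log(stderrWith({ NODE_DEBUG: 'myapp' })) // MYAPP <pid>: cache miss for user:42
console.log(stderrWith({}).includes('MYAPP')) // false: the section is not enabled
```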

NODE_PATH=path

Using this setting, you can add extra paths for the Node.js process to search modules in.
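A minimal sketch: create a module in a temporary directory, then require it by bare name from a child process whose NODE_PATH points there. The directory and the greeting module are made up for the example; multiple directories can be listed, separated by path.delimiter.

```javascript
// Sketch: NODE_PATH adding an extra directory to the CommonJS module search path.
// The temporary directory and the `greeting` module are hypothetical.
const { execFileSync } = require('child_process')
const fs = require('fs')
const os = require('os')
const path = require('path')

const libDir = fs.mkdtempSync(path.join(os.tmpdir(), 'node-path-'))
const modulePath = path.join(libDir, 'greeting.js')
fs.writeFileSync(modulePath, "module.exports = 'hello from NODE_PATH'")

const output = execFileSync(process.execPath, [ '-p', "require('greeting')" ], {
  env: Object.assign({}, process.env, { NODE_PATH: libDir }),
  encoding: 'utf8'
})
console.log(output.trim()) // hello from NODE_PATH

fs.unlinkSync(modulePath)
fs.rmdirSync(libDir)
```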

OPENSSL_CONF=file

Using this environment variable, you can load an OpenSSL configuration file on startup.

For a full list of supported environment variables, check out the official Node.js docs.

Let's Contribute to the CLI related Node Core Issues!

As you can see, the CLI is a really useful tool which becomes better with each Node version!

If you'd like to contribute to its advancement, you can help by checking out the currently open issues at https://github.com/nodejs/node/labels/cli !

Digital Transformation with the Node.js Stack

In this article, we explore the 9 main areas of digital transformation and show the benefits of implementing Node.js. At the end, we’ll lay out a Digital Transformation Roadmap to help you get started with this process.

Note that implementing Node.js is not the goal of a digital transformation project; it is just a great tool that opens up possibilities that any organization can take advantage of.


Digital transformation is achieved by using modern technology to radically improve the performance of business processes, applications, or even whole enterprises.

One of the available technologies that enable companies to go through a major performance shift is Node.js and its ecosystem. It is a tool that grants improvement opportunities that organizations should take advantage of:

  • Increased developer productivity,
  • DevOps or NoOps practices,
  • and shipping software to production quickly using the proxy approach,

just to mention a few.

The 9 Areas of Digital Transformation

Digital Transformation projects can improve a company in nine main areas. The following elements were identified in an MIT study on digital transformation, for which researchers interviewed 157 executives from 50 companies (typically with $1 billion or more in annual sales).

#1. Understanding your Customers Better

Companies are heavily investing in systems to understand specific market segments and geographies better. They have to figure out what leads to customer happiness and customer dissatisfaction.

Many enterprises are building analytics capabilities to better understand their customers. Information derived this way can be used for data-driven decisions.

#2. Achieving Top-Line Growth

Digital transformation can also be used to enhance in-person sales conversations. Instead of paper-based presentations or slides, salespeople can use great looking, interactive presentations, like tablet-based presentations.

Understanding customers better helps enterprises to transform and improve the sales experience with more personalized sales and customer service.

#3. Building Better Customer Touch Points

Customer service can be improved tremendously with new digital services, for example by introducing new communication channels. Instead of going to a local branch of a business, customers can talk to support through Twitter or Facebook.

Self-service digital tools can be developed that both save time for the customer and save money for the company.

#4. Process Digitization

With automation, companies can focus their employees on more strategic tasks, innovation or creativity rather than repetitive efforts.

#5. Worker Enablement

Virtualization of individual work (separating the work process from the location of work) has become an enabler for knowledge sharing. Information and expertise are accessible in real time for frontline employees.

#6. Data-Driven Performance Management

With the proper analytical capabilities, decisions can be made on real data and not on assumptions.

Digital transformation is changing the way strategic decisions are made. With new tools, strategic planning sessions can include more stakeholders, not just a small group.

#7. Digitally Extended Businesses

Many companies extend their physical offerings with digital ones. Examples include:

  • news outlets augmenting their print offering with digital content,
  • FMCG companies extending to e-commerce.

#8. New Digital Businesses

Companies not only extend their current offerings through digital transformation, but also come up with new digital products that complement the traditional ones. Examples include connected devices, like GPS trackers that can now report activity to the cloud and provide value to customers through recommendations.

#9. Digital Globalization

Global shared services, like shared finance or HR enable organizations to build truly global operations.


The Digital Transformation Benefits of Implementing Node.js

Organizations are looking for the most effective way of digital transformation - among these companies, Node.js is becoming the de facto technology for building out digital capabilities.

Node.js is a JavaScript runtime built on Chrome's V8 JavaScript engine. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient.

In other words: Node.js lets you write servers in JavaScript with incredible performance.

Increased Developer Productivity with Same-Language Stacks

When PayPal started using Node.js, they reported a 2x increase in productivity compared to their previous Java stack. How is that even possible?

npm, the Node.js package manager, hosts an incredible number of modules that can be used instantly. This saves a lot of effort for the development team.

Secondly, as Node.js applications are written using JavaScript, front-end developers can also easily understand what's going on and make necessary changes.

This saves you valuable time again as developers will use the same language on the entire stack.

100% Business Availability, even with Extreme Load

Each year, around 1.5 billion dollars are spent online in the US on Black Friday alone.

It is crucial that your site can keep up with the traffic. This is why Walmart, one of the biggest retailers, uses Node.js to serve 500 million pageviews on Black Friday without a hitch.

Fast Apps = Satisfied Customers

As your velocity increases because of the productivity gains, you can ship features/products sooner. Products that run faster result in better user experience.

A Kissmetrics study showed that 40% of people abandon a website that takes more than 3 seconds to load, and 47% of consumers expect a web page to load in 2 seconds or less.

To read more on the benefits of using Node.js, you can download our Node.js is Enterprise Ready ebook.


Your Digital Transformation Roadmap with Node.js

As with most new technologies introduced to a company, it’s worth taking baby-steps first with Node.js as well. As a short framework for introducing Node.js, we recommend the following steps:

  • building a Node.js core team,
  • picking a small part of the application to be rewritten/extended using Node.js,
  • extending the scope of the project to the whole organization.

Step 1 - Building your Node.js Core Team

The core Node.js team will consist of people with JavaScript experience on both the backend and the frontend. It’s not crucial that the backend engineers have any Node.js experience; the important aspect is the vision they bring to the team.

Introducing Node.js is not just about JavaScript - it has to include members of the operations team as well, joining the core team.

The introduction of Node.js to an organization does not stop at mastering Node.js; it also means adopting modern DevOps or NoOps practices, including but not limited to continuous integration and delivery.

Step 2 - Embracing The Proxy Approach

To incrementally replace old systems or to extend their functionality easily, your team can use the proxy approach.

For the features or functions you want to replace, create a small and simple Node.js application and proxy some of your load to the newly built Node.js application. This proxy does not necessarily have to be written in Node.js. With this approach, you can easily benefit from modularized, service-oriented architecture.

Another way to use proxies is to write them in Node.js and have them talk to the legacy systems. This way, you have the option to optimize the data being sent. PayPal was one of the first adopters of Node.js at scale, and they started with this proxy approach as well.

The biggest advantages of these solutions are that you can put Node.js into production in a short amount of time, measure your results, and learn from them.

Step 3 - Measure Node.js, Be Data-Driven

For the successful introduction of Node.js during a digital transformation project, it is crucial to set up a series of benchmarks to compare the results between the legacy system and the new Node.js applications. These data points can be response times, throughput or memory and CPU usage.

Orchestrating The Node.js Stack

As mentioned previously, introducing Node.js does not stop at mastering Node.js itself; introducing continuous integration and delivery is crucial as well.

Also, from an operations point of view, it is important to add containers to ship applications with confidence.

For orchestration, to operate the containers containing the Node.js applications we encourage companies to adopt Kubernetes, an open-source system for automating deployment, scaling, and management of containerized applications.

RisingStack and Digital Transformation with Node.js

RisingStack enables amazing companies to succeed with Node.js and related technologies to stay ahead of the competition. We have provided professional Node.js development and consulting services since the early days of Node.js, and help companies like Lufthansa or Cisco thrive with this technology.

10 Best Practices for Writing Node.js REST APIs

In this article we cover best practices for writing Node.js REST APIs, including topics like naming your routes, authentication, black-box testing & using proper cache headers for these resources.

One of the most popular use cases for Node.js is writing RESTful APIs. Still, while helping our customers find issues in their applications with Trace, our Node.js monitoring tool, we constantly see that developers have a lot of problems with REST APIs.

I hope these best practices we use at RisingStack can help:

#1 - Use HTTP Methods & API Routes

Imagine that you are building a Node.js RESTful API for creating, updating, retrieving or deleting users. For these operations, HTTP already has the adequate toolset: POST, PUT, GET, PATCH and DELETE.

As a best practice, your API routes should always use nouns as resource identifiers. Speaking of the user's resources, the routing can look like this:

  • POST /user or PUT /user/:id to create a new user,
  • GET /user to retrieve a list of users,
  • GET /user/:id to retrieve a user,
  • PATCH /user/:id to modify an existing user record,
  • DELETE /user/:id to remove a user.
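The method-plus-noun scheme above can be sketched without any framework as a small route table. The handler names are hypothetical; the sketch also shows how an unknown path maps to 404, while a known path with an unsupported method maps to 405 together with the list of allowed methods (the value of the Allow header).

```javascript
// Sketch: resolving method + noun-based path for the user resource without a framework.
// The route table and handler names (createUser, getUser, ...) are hypothetical.
const routes = [
  { method: 'POST', pattern: /^\/user$/, handler: 'createUser' },
  { method: 'GET', pattern: /^\/user$/, handler: 'listUsers' },
  { method: 'GET', pattern: /^\/user\/([^/]+)$/, handler: 'getUser' },
  { method: 'PUT', pattern: /^\/user\/([^/]+)$/, handler: 'putUser' },
  { method: 'PATCH', pattern: /^\/user\/([^/]+)$/, handler: 'updateUser' },
  { method: 'DELETE', pattern: /^\/user\/([^/]+)$/, handler: 'deleteUser' }
]

function matchRoute (method, path) {
  const candidates = routes.filter((route) => route.pattern.test(path))
  // no route knows this path: the resource does not exist
  if (!candidates.length) return { status: 404 }
  const route = candidates.find((candidate) => candidate.method === method)
  // the path exists but not with this method: 405 plus the allowed methods
  if (!route) return { status: 405, allow: candidates.map((candidate) => candidate.method).join(', ') }
  return { handler: route.handler, id: path.match(route.pattern)[1] }
}

console.log(matchRoute('GET', '/user/42')) // { handler: 'getUser', id: '42' }
console.log(matchRoute('DELETE', '/user')) // { status: 405, allow: 'POST, GET' }
```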

#2 - Use HTTP Status Codes Correctly

If something goes wrong while serving a request, you must set the correct status code for that in the response:

  • 2xx, if everything was okay,
  • 3xx, if the resource was moved,
  • 4xx, if the request cannot be fulfilled because of a client error (like requesting a resource that does not exist),
  • 5xx, if something went wrong on the API side (like an exception happened).

If you are using Express, setting the status code is as easy as res.status(500).send({error: 'Internal server error happened'}). Similarly with Restify: res.status(201).

For a full list, check the list of HTTP status codes.

#3 - Use HTTP headers to Send Metadata

To attach metadata about the payload you are about to send, use HTTP headers. Such headers can carry information on:

  • pagination,
  • rate limiting,
  • or authentication.

A list of standardized HTTP headers can be found here.

If you need to set any custom metadata in your headers, it used to be a best practice to prefix them with X. For example, if you were using CSRF tokens, it was a common (but non-standard) way to name them X-Csrf-Token. However, RFC 6648 deprecated this convention. New APIs should make their best effort not to use header names that can conflict with other applications. For example, OpenStack prefixes its headers with OpenStack:

OpenStack-Identity-Account-ID  
OpenStack-Networking-Host-Name  
OpenStack-Object-Storage-Policy  

Note that the HTTP standard does not define any size limit on the headers; however, Node.js (as of writing this article) imposes an 80KB size limit on the headers object for practical reasons.

" Don't allow the total size of the HTTP headers (including the status line) to exceed HTTP_MAX_HEADER_SIZE. This check is here to protect embedders against denial-of-service attacks where the attacker feeds us a never-ending header that the embedder keeps buffering."

From the Node.js HTTP parser

#4 - Pick the right framework for your Node.js REST API

It is important to pick the framework that suits your use-case the most.

Express, Koa or Hapi

Express, Koa and Hapi can be used to create browser applications, and as such, they support templating and rendering - just to name a few features. If your application needs to provide the user-facing side as well, it makes sense to go for them.

Restify

On the other hand, Restify focuses on helping you build REST services. It exists to let you build "strict" API services that are maintainable and observable. Restify also comes with automatic DTrace support for all your handlers.

Restify is used in production in major applications like npm or Netflix.

#5 - Black-Box Test your Node.js REST APIs

One of the best ways to test your REST APIs is to treat them as black boxes.

Black-box testing is a method of testing where the functionality of an application is examined without the knowledge of its internal structures or workings. So none of the dependencies are mocked or stubbed, but the system is tested as a whole.

One of the modules that can help you with black-box testing Node.js REST APIs is supertest.

A simple test case which checks if a user is returned using the test runner mocha can be implemented like this:

const request = require('supertest')
const app = require('../app') // assumed path to the application under test

describe('GET /user/:id', function () {
  it('returns a user', function () {
    // newer mocha versions accept returned promises, so no done callback is needed
    return request(app)
      .get('/user/1')
      .set('Accept', 'application/json')
      .expect(200, {
        id: '1',
        name: 'John Math'
      })
  })
})

You may ask: how does the data get populated into the database that serves the REST API?

In general, it is a good approach to write your tests so that they make as few assumptions about the state of the system as possible. Still, in some scenarios you need to know the exact state of the system, so you can make assertions and achieve higher test coverage.

So based on your needs, you can populate the database with test data in one of the following ways:

  • run your black-box test scenarios on a known subset of production data,
  • populate the database with crafted data before the test cases are run.

Of course, black-box testing does not mean that you don't have to do unit testing; you still have to write unit tests for your APIs.


#6 - Do JWT-Based, Stateless Authentication

As your REST APIs must be stateless, so must your authentication layer. For this, JWT (JSON Web Token) is ideal.

JWT consists of three parts:

  • Header, containing the type of the token and the hashing algorithm
  • Payload, containing the claims
  • Signature (JWT does not encrypt the payload, just signs it!)

Adding JWT-based authentication to your application is very straightforward:

const koa = require('koa')  
const jwt = require('koa-jwt')

const app = koa()

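app.use(jwt({
  secret: 'very-secret' // in a real application, load the secret from configuration
}))

// Protected middleware (Koa 1.x generator style)
app.use(function *(){
  // content of the token will be available on this.state.user
  this.body = {
    secret: '42'
  }
})

app.listen(3000)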

After that, the API endpoints are protected with JWT. To access the protected endpoints, you have to provide the token in the Authorization header field.

curl --header "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiYWRtaW4iOnRydWV9.TJVA95OrM7E2cBab30RMHrHDcEfxjoYZgeFONFh7HgQ" my-website.com  

You may notice that the JWT module does not depend on any database layer. This is because every JWT can be verified on its own, and a token can also carry its own expiration time.
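The expiration is carried by the registered exp claim (and optionally nbf, "not before"), both expressed in seconds since the Unix epoch. A minimal sketch of that check, assuming the signature has already been verified; isTokenTimeValid is a hypothetical name:

```javascript
// Check the time-based registered claims of an already verified JWT payload.
// exp ("expiration time") and nbf ("not before") are NumericDate values,
// i.e. seconds (not milliseconds) since the Unix epoch.
function isTokenTimeValid (payload, nowMs = Date.now()) {
  const now = Math.floor(nowMs / 1000)
  if (typeof payload.exp === 'number' && now >= payload.exp) return false // expired
  if (typeof payload.nbf === 'number' && now < payload.nbf) return false // not valid yet
  return true
}

// a token that expires one hour after it was issued
const issuedAt = Math.floor(Date.now() / 1000)
const payload = { sub: '42', iat: issuedAt, exp: issuedAt + 60 * 60 }
console.log(isTokenTimeValid(payload)) // true
console.log(isTokenTimeValid(payload, (issuedAt + 2 * 60 * 60) * 1000)) // false
```

Since the check needs nothing but the clock, no session store is involved; the flip side is that a token cannot easily be revoked before it expires.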

Also, you always have to make sure that all your API endpoints are only accessible through a secure connection using HTTPS.

In a previous article, we explained web authentication methods in detail - I recommend checking it out!

#7 - Use Conditional Requests

Conditional requests are HTTP requests which are executed differently depending on specific HTTP headers. You can think of these headers as preconditions: if they are met, the requests will be executed in a different way.

These headers try to check whether a version of a resource stored on the server matches a given version of the same resource. For this reason, their values can be:

  • the timestamp of the last modification,
  • or an entity tag, which differs for each version.

These headers are:

  • Last-Modified (to indicate when the resource was last modified),
  • ETag (to indicate the entity tag),
  • If-Modified-Since (used with the Last-Modified header),
  • If-None-Match (used with the ETag header).

Let's take a look at an example!

The client below did not have any previous version of the doc resource, so neither the If-Modified-Since nor the If-None-Match header was set when the request was sent. The server then responds with the ETag and Last-Modified headers properly set.

Node.js RESTful API with conditional request, without previous versions

From the MDN Conditional request documentation

Once the client has a version of the resource, it can set the If-Modified-Since and If-None-Match headers when it requests the same resource again. If the resource has not changed, the server simply responds with the 304 - Not Modified status and does not send the resource again.

Node.js RESTful API with conditional request, with previous versions

From the MDN Conditional request documentation
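The server side of this exchange can be sketched as a small function that decides between 200 and 304. This is an illustration, assuming an ETag computed as a SHA-1 hash of the body and lower-cased request header names as Node's http module provides them; conditionalGet is a hypothetical helper, and HTTP frameworks often provide this freshness logic already.

```javascript
const crypto = require('crypto')

// Decide between 200 OK and 304 Not Modified for a GET request.
// reqHeaders: lower-cased request headers, body: the current representation,
// lastModified: a Date.
function conditionalGet (reqHeaders, body, lastModified) {
  // a strong ETag derived from the content (any stable version id would do)
  const etag = '"' + crypto.createHash('sha1').update(body).digest('hex') + '"'
  const headers = { ETag: etag, 'Last-Modified': lastModified.toUTCString() }

  const ifNoneMatch = reqHeaders['if-none-match']
  const ifModifiedSince = reqHeaders['if-modified-since']
  let notModified = false

  if (ifNoneMatch !== undefined) {
    // If-None-Match wins over If-Modified-Since; it may list several tags or be "*"
    const tags = ifNoneMatch.split(',').map((tag) => tag.trim().replace(/^W\//, ''))
    notModified = tags.includes('*') || tags.includes(etag)
  } else if (ifModifiedSince !== undefined) {
    const since = Date.parse(ifModifiedSince)
    // HTTP dates have one-second resolution, so compare whole seconds
    const modified = Math.floor(lastModified.getTime() / 1000) * 1000
    notModified = !Number.isNaN(since) && modified <= since
  }

  return notModified
    ? { status: 304, headers, body: '' }
    : { status: 200, headers, body }
}

const lastModified = new Date(Date.UTC(2017, 0, 1, 12, 0, 0))
const first = conditionalGet({}, 'doc v1', lastModified)
console.log(first.status) // 200
const again = conditionalGet({ 'if-none-match': first.headers.ETag }, 'doc v1', lastModified)
console.log(again.status) // 304
```

The 304 response still carries the validators, so the client can keep using them on its next request.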

#8 - Embrace Rate Limiting

Rate limiting is used to control how many requests a given consumer can send to the API.

To tell your API users how many requests they have left, set the following headers:

  • X-Rate-Limit-Limit, the number of requests allowed in a given time interval,
  • X-Rate-Limit-Remaining, the number of requests remaining in the same interval,
  • X-Rate-Limit-Reset, the time when the rate limit will be reset.

Most HTTP frameworks support it out of the box (or with plugins). For example, if you are using Koa, there is the koa-ratelimit package.

Note that the time window can vary between API providers: for example, GitHub uses an hour, while Twitter uses 15 minutes.
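How a limiter arrives at these header values can be sketched with a simple fixed-window counter kept in memory. This is an illustration only: createRateLimiter is a hypothetical name, the reset value is assumed to be a Unix timestamp in seconds, and a deployment with several Node.js processes would keep the counters in a shared store instead.

```javascript
// Fixed-window rate limiter kept in memory (per process).
// now() is injectable so the behaviour can be tested with a fake clock.
function createRateLimiter ({ limit, windowMs, now = Date.now }) {
  const windows = new Map() // consumer id -> { count, resetAt }

  return function check (consumerId) {
    const t = now()
    let w = windows.get(consumerId)
    if (w === undefined || t >= w.resetAt) {
      // first request of this consumer, or the previous window is over
      w = { count: 0, resetAt: t + windowMs }
      windows.set(consumerId, w)
    }
    const allowed = w.count < limit
    if (allowed) w.count++
    return {
      allowed, // respond with 429 Too Many Requests when false
      headers: {
        'X-Rate-Limit-Limit': String(limit),
        'X-Rate-Limit-Remaining': String(limit - w.count),
        // assumed format: Unix time in seconds when the window resets
        'X-Rate-Limit-Reset': String(Math.ceil(w.resetAt / 1000))
      }
    }
  }
}

// e.g. 2 requests per consumer per 15 minutes
const check = createRateLimiter({ limit: 2, windowMs: 15 * 60 * 1000 })
console.log(check('api-key-1').allowed) // true
console.log(check('api-key-1').allowed) // true
console.log(check('api-key-1').allowed) // false
```

A fixed window is the simplest scheme; it allows short bursts at window boundaries, which sliding-window or token-bucket limiters smooth out.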

#9 - Create a Proper API Documentation

You write APIs so that others can use them and benefit from them. Providing API documentation for your Node.js REST APIs is crucial.

The following open-source projects can help you with creating documentation for your APIs:

Alternatively, if you want to use a hosted product, you can go for Apiary.

#10 - Don't Miss The Future of APIs

In the past years, two major query languages for APIs arose - namely GraphQL from Facebook and Falcor from Netflix. But why do we even need them?

Imagine the following RESTful resource request:

/org/1/space/2/docs/1/collaborators?include=email&page=1&limit=10

Requests like this can get out of hand quite easily, especially as you'd like to get the same response format for all your models all the time. This is where GraphQL and Falcor can help.

About GraphQL

GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data. GraphQL provides a complete and understandable description of the data in your API, gives clients the power to ask for exactly what they need and nothing more, makes it easier to evolve APIs over time, and enables powerful developer tools.

About Falcor

Falcor is the innovative data platform that powers the Netflix UIs. Falcor allows you to model all your backend data as a single Virtual JSON object on your Node server. On the client, you work with your remote JSON object using familiar JavaScript operations like get, set, and call. If you know your data, you know your API.

Amazing REST APIs for Inspiration

If you are about to start developing a Node.js REST API or creating a new version of an older one, we have collected four real-life examples that are worth checking out:

I hope that you now have a better understanding of how APIs should be written using Node.js. Please let me know in the comments if you think anything is missing!

Concurrency and Parallelism: Understanding I/O

With this article, we are launching a series of posts targeting developers who want to learn or refresh their knowledge about writing concurrent applications in general. The series will focus on well-known and widely adopted concurrency patterns in different programming languages, platforms, and runtimes.

In the first episode of this series, we’ll start from the ground up: Operating systems handle our applications' I/O, so it’s essential to understand the principles.


Concurrent code has a reputation for being notoriously easy to get wrong. One of the world's most infamous software disasters was caused by a race condition: a programmer error in the Therac-25 radiation therapy device resulted in the death of four people.

Data races are not the only problem, though: inefficient locking, starvation, and a myriad of other problems arise. I remember from university that even the seemingly trivial, innocent-looking task of writing a thread-safe singleton proved to be quite challenging because of these nuances.

No wonder that over the past decades many concurrency-related patterns emerged to abstract away the complexity and reduce the possibility of errors. Some have arisen as a straightforward consequence of the properties of an application area, like event loops and dispatchers in window managers, GUI toolkits, and browsers, while others succeeded in creating more general approaches applicable to a broad range of use cases, like Erlang's actor system.

My experience is that after a brief learning period, most developers can write highly concurrent, good quality code in Node.js that is also free from race conditions. Although nothing stops us from creating data races, they happen far less frequently than in programming languages or platforms that expose threads, locks and shared memory as their main concurrency abstraction. I think this is mainly due to the more functional style of creating a data flow (e.g. with promises) instead of imperatively synchronizing concurrent computations (e.g. with locks).

However, to reason about the "whats and whys," it is best to start from the ground up, which I think is the OS level. It's the OS that does the hard work of scheduling our applications and interleaving them with I/O, so it is essential that we understand the principles. Then we will discuss concurrency primitives and patterns, and finally arrive at frameworks.

Let the journey begin!

Intro to Concurrency and Parallelism

Before diving into the OS level details, let's take a second to clarify what exactly concurrency is.

What's the difference between concurrency and parallelism?

Concurrency is a much broader, more general problem than parallelism. If you have tasks with inputs and outputs, and you want to schedule them so that they produce correct results, you are solving a concurrency problem.

Take a look at this diagram:

Concurrency & Parallelism: Diagram of Tasks with Dependencies

It shows a data flow with input and output dependencies. Here, tasks 2, 3 and 4 can run concurrently after 1. There is no specific order between them, so we have multiple alternatives for running them sequentially. Here are two of them:

Concurrent data flow path: 12345

Concurrent data flow path: 13245
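The possible sequential orders can also be enumerated mechanically: they are exactly the topological orderings of the dependency graph. A small sketch, assuming our reading of the diagram: task 1 comes first, and task 5 waits for all of tasks 2, 3 and 4.

```javascript
// The dependency graph read from the diagram: each task lists what it waits for.
// (Assumption: task 5 needs all of 2, 3 and 4.)
const deps = { 1: [], 2: [1], 3: [1], 4: [1], 5: [2, 3, 4] }

// Enumerate every sequential order that respects the dependencies,
// i.e. all topological orderings of the graph, by backtracking.
function allOrders (deps) {
  const tasks = Object.keys(deps).map(Number)
  const done = new Set()
  const order = []
  const results = []

  function extend () {
    if (order.length === tasks.length) {
      results.push(order.join(''))
      return
    }
    for (const task of tasks) {
      // a task may run next once everything it depends on has run
      if (!done.has(task) && deps[task].every((dep) => done.has(dep))) {
        done.add(task)
        order.push(task)
        extend()
        order.pop()
        done.delete(task)
      }
    }
  }

  extend()
  return results
}

const orders = allOrders(deps)
console.log(orders.length) // 6
console.log(orders.includes('12345'), orders.includes('13245')) // true true
```

The six results are all 3! arrangements of tasks 2, 3 and 4 between 1 and 5; a scheduler is free to pick any of them.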

Alternatively, these tasks can run in parallel, e.g. on another processor core, another processor, or an entirely separate computer.

In these diagrams, "thread" means a computation carried out on a dedicated processor core, not an OS thread, as OS threads are not necessarily parallel. How else could you run a multithreaded web server with dedicated threads for hundreds of connections?

Parallelized Task Diagram

It's not rocket science, but what I wanted to show on these diagrams is that running concurrent tasks in parallel can reduce the overall computation time. The results will remain correct as long as the partial order shown on the data flow graph above is respected. However, if we only have one thread, the different orders are evidently equivalent, at least regarding the overall time.
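The effect on the overall time can be worked out for the same graph. In the sketch below the task durations are made-up numbers: on one thread the durations simply add up, whatever the order, while with enough cores the total is the length of the longest (critical) path through the graph.

```javascript
// Same dependency graph as in the diagram; durations are hypothetical time units.
const tasks = {
  1: { duration: 1, deps: [] },
  2: { duration: 2, deps: [1] },
  3: { duration: 3, deps: [1] },
  4: { duration: 1, deps: [1] },
  5: { duration: 1, deps: [2, 3, 4] }
}

// One thread: every task runs after another, so the durations add up.
function sequentialTime (tasks) {
  return Object.values(tasks).reduce((sum, t) => sum + t.duration, 0)
}

// Enough cores: a task starts as soon as all of its dependencies finish,
// so the total is the finish time of the longest (critical) path.
function parallelTime (tasks) {
  const finish = {}
  function finishTime (id) {
    if (finish[id] === undefined) {
      const start = Math.max(0, ...tasks[id].deps.map(finishTime))
      finish[id] = start + tasks[id].duration
    }
    return finish[id]
  }
  return Math.max(...Object.keys(tasks).map(finishTime))
}

console.log(sequentialTime(tasks)) // 8
console.log(parallelTime(tasks)) // 5 (path 1 -> 3 -> 5)
```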

If we only have one processor, why do we even bother with writing concurrent applications? The processing time will not get shorter, and we add the overhead of scheduling. As a matter of fact, any modern operating system will also slice up the concurrent tasks and interleave them, so each of the slices will run for a short time.

There are various reasons for this.

  • We humans like to interact with the computer in real time, e.g. as I type this text, I want to see it appear on the screen immediately, while listening to my favorite tracklist and getting notifications about my incoming emails. Just imagine not being able to drag a window while a movie keeps playing in it.

  • Not all operations are carried out on the computer's CPU. If you want to write to an HDD, for example, a lot of time is spent seeking to the position, writing the sectors, etc., and the time in between can be spent doing something else. The same applies to virtually every I/O, even computations carried out on the GPU.

These require the operating system kernel to run tasks in an interleaved manner, referred to as time-sharing. This is a very important property of modern operating systems. Let's see the basics of it.

Processes and threads

A process - quite unsurprisingly - is a running instance of a computer program. It is what you see in your operating system's task manager or in top.

A process consists of allocated memory which holds the program code, its data, a heap for dynamic memory allocations, and a lot more. However, it is not the unit of multitasking in desktop operating systems.

A thread is the default unit (the task) of CPU usage. Code executed in a single thread is what we usually refer to as sequential or synchronous execution.

Threads are supported by nearly all operating systems (hence the multithreaded qualifier) and can be created with system calls. They have their own call stacks, virtual CPU and (often) local storage but share the application's heap, data, codebase and resources (such as file handles) with the other threads in the same process.

They also serve as the unit of scheduling in the kernel. For this reason, we call them kernel threads, clarifying that they are native to the operating system and scheduled by the kernel, which distinguishes them from user-space threads, also called green threads, which are scheduled by some user space scheduler such as a library or VM.

Kernel Processes and Threads

Most desktop and server operating system kernels use preemptive schedulers, as do the Linux, macOS and Windows kernels. We can assume that threads are preemptively scheduled, distinguishing them from their non-preemptive (cooperative) counterparts, called fibers. This preemptive scheduling is the reason that a hanging process doesn't stall the whole computer.

The hanging process's time slices are interleaved with those of other processes and the OS's own code, so the system as a whole remains responsive.

“Preemption is the act of temporarily interrupting a task being carried out by a computer system, without requiring its cooperation, and with the intention of resuming the task at a later time.” - Wikipedia

Context switching (switching between threads) is done at frequent intervals by the kernel, creating the illusion that our programs are running in parallel, whereas in reality, they are running concurrently but sequentially in short slices. Multi-core processors arrived pretty late to commodity hardware: funnily enough, Intel's first dual-core processor was released in 2005, while multitasking OSes had already been in wide use for at least 20 years.

CPU vs. I/O

Programs usually don't consist only of numeric, arithmetic and logic computations; in fact, a lot of the time they merely write something to the file system, make network requests or access peripherals such as the console or an external device.

While the first kind of workload is CPU intensive, the latter spends the majority of its time performing I/O.

CPU bound:

  • scientific computation
  • (in-memory) data analysis
  • simulations

I/O bound:

  • reading from / writing to disk
  • accessing camera, microphone, other devices
  • reading from / writing to network sockets
  • reading from stdin

Doing I/O is a kernel space operation, initiated with a system call, so it results in a privilege context switch.

When an I/O operation is requested with a blocking system call, we are talking about blocking I/O.

This can hurt concurrency under certain threading implementations, namely those that use many-to-one mapping. This means that all threads in a process share a common kernel thread, which implies that every thread is blocked when one does blocking I/O (because of the above-mentioned switch to kernel mode).

No wonder that modern OSes don't do this. Instead, they use one-to-one mapping, i.e. map a kernel thread to each user-space thread, allowing another thread to run when one makes a blocking system call, which means that they are unaffected by the above adverse effect.
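Node.js exposes both styles for file I/O. In the sketch below, fs.readFileSync blocks the JavaScript thread until the data is available, while fs.readFile returns immediately and calls back later. Under the hood libuv performs file reads with blocking calls on its own thread pool, which is the "let another thread block" idea described above, hidden behind a callback. The temporary file name is arbitrary.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

const file = path.join(os.tmpdir(), `io-demo-${process.pid}.txt`)
fs.writeFileSync(file, 'hello')
const events = []

// Blocking: readFileSync returns only once the data has been read,
// so nothing else can run on the JavaScript thread in the meantime.
events.push('sync read: ' + fs.readFileSync(file, 'utf8'))

// Non-blocking for the caller: readFile returns immediately, libuv performs
// the read (on its thread pool) and the callback runs on a later loop iteration.
fs.readFile(file, 'utf8', (err, contents) => {
  if (err) throw err
  events.push('async read: ' + contents)
  fs.unlinkSync(file)
  console.log(events)
})

events.push('readFile() has returned')
```

The logged array shows that the line after fs.readFile runs before its callback.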

I/O flavors: Blocking vs. non-blocking, sync vs. async

Doing I/O usually consists of two distinct steps:

  • checking the device:

    • blocking: waiting for the device to be ready, or
    • non-blocking: e.g. polling periodically until ready, then
  • transmitting:

    • synchronous: executing the operation (e.g. read or write) initiated by the program, or
    • asynchronous: executing the operation in response to an event from the kernel (asynchronous / event driven)

You can mix the two steps in any combination. Rather than delving into technical details I don't possess, let me just draw an analogy.

Recently I moved to a new flat, so that's where the analogy comes from. Imagine that you have to pack your things and transfer them to your new apartment. This is how it is done with different types of I/O:


Illustration for synchronous blocking I/O

Synchronous, blocking I/O: Start moving right away, possibly getting blocked by traffic on the road. For multiple trips, you have to repeat the first two steps.


Illustration for synchronous non-blocking I/O

Synchronous, non-blocking I/O: Periodically check the road for traffic, and only move stuff when it is clear. Between the checks you can do anything else you want, rather than wasting your time on the road being blocked by others. For multiple trips, you have to repeat the first three steps.


Illustration for asynchronous non-blocking I/O

Asynchronous, non-blocking I/O: Hire a moving company. They will ask you periodically if there's anything left to move, then you give them some of your belongings. Between their interruptions, you can do whatever you want. Finally, they notify you when they are done.


Which model suits you the best depends on your application, the complexity you dare to tackle, your OS's support, etc.

Synchronous, blocking I/O has wide support with long established POSIX interfaces and is the most widely understood and easy to use. Its drawback is that you have to rely on thread-based concurrency, which is sometimes undesirable:

  • every thread allocated uses up resources
  • more and more context switching will happen between them
  • the OS has a maximum number of threads.

That's why many modern web servers shifted to the async, non-blocking model and advocate using a single-threaded event loop for the network interface to maximize throughput. Because the underlying OS APIs are currently platform-specific and quite challenging to use, there are a couple of libraries providing an abstraction layer over them. You can find a list of them at the end of the article.

If you want to know more about the details of different I/O models, read this detailed article about boosting performance using asynchronous IO!

Busy-waiting, polling and the event loop

Busy-waiting is the act of repeatedly checking a resource, such as an I/O device, for availability in a tight loop. The absence of the tight loop is what distinguishes polling from busy-waiting.

It's better shown than said:

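#include <errno.h>    /* EBUSY */
#include <pthread.h>  /* pthread_mutex_trylock, pthread_mutex_unlock */
#include <unistd.h>   /* sleep */

// tight-loop example
while (pthread_mutex_trylock(&my_mutex) == EBUSY) { }
// trylock succeeded: this thread now holds the mutex
do_stuff();
pthread_mutex_unlock(&my_mutex);

// polling example
while (pthread_mutex_trylock(&my_mutex) == EBUSY) {
  sleep(POLL_INTERVAL);  // give the CPU to other threads between attempts
}
// trylock succeeded: this thread now holds the mutex
do_stuff();
pthread_mutex_unlock(&my_mutex);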

The difference between the two snippets is apparent. The sleep function puts the current thread of execution to sleep, yielding control to the kernel to schedule something else to run.

It is also clear that both of them turn a non-blocking call into blocking code, because control won't pass the loop until the mutex becomes free. This means that do_stuff is blocked until then.

Let's say we have more of these mutexes or any arbitrary I/O device that can be polled. We can invert control-flow by assigning handlers to be called when the resource is ready. If we periodically check the resources in the loop and execute the associated handlers on completion, we created what is called an event loop.

#include <stddef.h>
#include <time.h>    /* clock_gettime */
#include <unistd.h>  /* usleep */

/* pending_event_t, completed_event_t, poll_events() and handle_event()
   are hypothetical, application-defined types and helpers */
pending_event_t *pendings;
completed_event_t *completeds;
struct timespec start, end;
size_t completed_ev_size = 0, pending_ev_size, i;
long loop_quantum_us;  /* desired duration of one loop iteration */
long elapsed_us, wait_us;

// do while we have pending events that are not yet completed
while (pending_ev_size) {
  clock_gettime(CLOCK_MONOTONIC, &start);
  // move the pending events that have completed into completeds
  poll_events(pendings, &pending_ev_size, completeds, &completed_ev_size);
  // handle completed events, the handlers might add more pending events
  for (i = 0; i < completed_ev_size; ++i) {
    handle_event(&completeds[i], pendings, &pending_ev_size);
  }
  completed_ev_size = 0;
  // sleep for the rest of the quantum to avoid busy waiting
  clock_gettime(CLOCK_MONOTONIC, &end);
  elapsed_us = (end.tv_sec - start.tv_sec) * 1000000L
             + (end.tv_nsec - start.tv_nsec) / 1000;
  wait_us = loop_quantum_us - elapsed_us;
  if (wait_us > 0) {
    usleep(wait_us);  /* usleep takes microseconds */
  }
}
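The same loop can be sketched in JavaScript, which makes the control inversion easier to follow. This is a toy simulation, not how Node.js or libuv work internally: readiness is faked with a tick counter, and runEventLoop, isReady and handler are hypothetical names.

```javascript
// A toy, single-threaded event loop. Each pending event has a readiness check
// and a handler; readiness is simulated with a tick counter here, where a real
// loop would ask the OS (e.g. via epoll or kqueue) which operations completed.
function runEventLoop (initialEvents) {
  let pending = initialEvents.slice()
  const handled = []
  let tick = 0

  // handlers receive this function to register follow-up events
  const schedule = (event) => { pending.push(event) }

  // do while we have pending events that are not yet completed
  while (pending.length > 0) {
    // poll: split the pending events into completed and still pending ones
    const completed = []
    const stillPending = []
    for (const ev of pending) {
      if (ev.isReady(tick)) completed.push(ev)
      else stillPending.push(ev)
    }
    pending = stillPending
    // handle completed events; the handlers might add more pending events
    for (const ev of completed) {
      handled.push(`tick ${tick}: ${ev.name}`)
      ev.handler(schedule, tick)
    }
    tick++ // stands in for sleeping until the next iteration
  }
  return handled
}

// "read A" completes at tick 2 and then schedules "write A" for the next tick
const readyFrom = (t) => (now) => now >= t
const log = runEventLoop([
  {
    name: 'read A',
    isReady: readyFrom(2),
    handler: (schedule, now) => schedule({ name: 'write A', isReady: readyFrom(now + 1), handler: () => {} })
  },
  { name: 'read B', isReady: readyFrom(0), handler: () => {} }
])
console.log(log) // [ 'tick 0: read B', 'tick 2: read A', 'tick 3: write A' ]
```

Note that "read B" is handled first even though it was registered last: the order of handling follows readiness, not registration.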

This kind of control inversion takes some time to get used to. Different frameworks expose various levels of abstraction over it. Some only provide an API for polling events, while others use a more opinionated mechanism like an event loop or a state machine.

TCP server example

The following example will illustrate the differences between working with synchronous, blocking and asynchronous, non-blocking network I/O. It is a dead-simple TCP echo server. After the client connects, every line is echoed back to the socket until the client writes "bye".

Single threaded

The first version uses the standard POSIX procedures of sys/socket.h. The server is single-threaded: it waits until a client connects:

/*  Wait for a connection, then accept() it  */
if ((conn_s = accept(list_s, NULL, NULL)) < 0) { /* exit w err */ }  

Then it reads each line from the socket and echoes it back, until the client closes the connection or sends the word "bye" on a line:

bye = 0;

// read from socket and echo back until client says 'bye'
while (!bye) {  
    read_line_from_socket(conn_s, buffer, MAX_LINE - 1);
    if (!strncmp(buffer, "bye\n", MAX_LINE - 1)) bye = 1;
    write_line_to_socket(conn_s, buffer, strlen(buffer));
}

if (close(conn_s) < 0) { /* exit w err */ }  

animation showing the single threaded server

As you can see in the animation, this server is not concurrent at all. It can handle only one client at a time: if another client connects, it has to wait until the preceding one closes the connection.

Multi-threaded

Introducing concurrency without replacing the synchronous blocking networking API calls is done with threads. This is shown in the second version. The only difference between this and the single-threaded version is that here we create a thread for each of the connections.

A real-life server would use thread pools of course.

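/*  Wait for a connection, then accept() it  */
if ((conn_s = accept(list_s, NULL, NULL)) < 0) { /* exit w err */ }
/*  hand a heap copy of the descriptor to the new thread, so the next
    accept() cannot overwrite it; handle_socket should free() it  */
args = malloc(sizeof(int));
memcpy(args, &conn_s, sizeof(int));
pthread_create(&thrd, NULL, &handle_socket, args);
/*  the thread is never joined, so let the system reclaim it on exit  */
pthread_detach(thrd);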

animation showing the multi threaded server

This finally enables us to serve multiple clients at the same time. Hurray!

Single threaded, concurrent

Another way to create a concurrent server is to use libuv. It exposes asynchronous non-blocking I/O calls and an event loop. Although by using it, our code will be coupled to this library, I still find it better than using obscure, platform-dependent APIs. The implementation is still quite complex.

Once we have initialized our TCP server, we register a listener, handle_socket, for incoming connections.

uv_listen((uv_stream_t*) &tcp, SOMAXCONN, handle_socket);  

In that handler, we can accept the socket and register a reader for incoming chunks.

uv_accept(server, (uv_stream_t*) client);  
uv_read_start((uv_stream_t*) client, handle_alloc, handle_read);  

Whenever a chunk is ready and there is data, we register a write handler handle_write that echoes the data back to the socket.

uv_write(write_req, client, &write_ctx->buf, 1, handle_write);  

Otherwise, if the client said bye or we reached EOF, we close the connection. As you can see, programming this way is very tedious and error-prone (I definitely introduced some bugs myself, although I copied a large portion of the code). Data created in one function often has to be available somewhere in its continuation (a handler created in the function, but called asynchronously later), which requires manual memory management. I advise you against using libuv directly, unless you are well acquainted with C programming.

animation showing the single threaded uv-server
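For comparison, Node.js's net module sits on top of libuv, so the same single-threaded, non-blocking echo server can be written in JavaScript without any manual memory management. A minimal sketch: port 0 asks the OS for any free port, and the buffering of partial lines is our own addition.

```javascript
const net = require('net')

// Line-based echo server: each line is written back until the client sends "bye".
// No threads or locks in our code: libuv's event loop calls the handlers.
const server = net.createServer((socket) => {
  let buffered = ''
  let done = false
  socket.setEncoding('utf8')

  socket.on('data', (chunk) => {
    buffered += chunk
    let newline
    // a chunk may contain several lines, or just part of one
    while (!done && (newline = buffered.indexOf('\n')) !== -1) {
      const line = buffered.slice(0, newline + 1)
      buffered = buffered.slice(newline + 1)
      socket.write(line)
      if (line.trim() === 'bye') {
        done = true
        socket.end() // flush the pending writes, then close the connection
      }
    }
  })
  socket.on('error', (err) => console.error('socket error:', err.message))
})

server.listen(0, () => {
  console.log(`echo server listening on port ${server.address().port}`)
})
```

The per-connection state (buffered, done) lives in the closure, which plays the role of the manually managed context structs in the C version.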

Next episode: Concurrency patterns, futures, promises and so on

We've seen how to achieve concurrency in the lowest levels of programming. Take your time to play with the examples. Also, feel free to check out this list I prepared for you:

  • Boost.Asio
    • C++
    • network and low-level I/O.
    • Boost Software License
  • Seastar
    • C++
    • network and filesystem I/O, multi-core support, fibers. Used by the ScyllaDB project.
    • APL 2.0
  • libuv
    • C
    • network and filesystem I/O, threading and synchronization primitives. Used by Node.js.
    • MIT
  • Netty
    • Java
    • network I/O
    • APL 2.0
  • mio
    • Rust
    • network I/O. It is used by the high-level tokio and rotor networking libraries.
    • MIT
  • Twisted
    • Python
    • network I/O
    • MIT

In the next chapter, we continue with some good ol' concurrency patterns and new ones as well. We will see how to use futures and promises for threads and continuations and will also talk about the reactor and proactor design patterns.

If you have any comments or questions about this topic, please let me know in the comment section below.

Yarn vs npm - The State of Node.js Package Managers

With the v7.4 release, npm 4 became the bundled, default package manager for Node.js. In the meantime, Facebook released their own package manager solution, called Yarn.

Let's take a look at the state of Node.js package managers, what they can do for you, and when you should pick which one!

Yarn - the new kid on the block

Fast, reliable and secure dependency management - this is the promise of Yarn, the new dependency manager created by the engineers of Facebook.

But can Yarn live up to the expectations?

Yarn - the node.js package manager

Installing Yarn

There are several ways of installing Yarn. If you have npm installed, you can just install Yarn with npm:

npm install yarn --global  

However, the Yarn team recommends installing it via your native OS package manager. If you are on a Mac, that will probably be Homebrew:

brew update  
brew install yarn  

Yarn Under the Hood

Yarn has a lot of performance and security improvements under the hood. Let's see what these are!

Offline cache

When you install a package using Yarn (using yarn add packagename), it places the package in a cache on your disk. During the next install, this cached package will be used instead of sending an HTTP request to get the tarball from the registry.

Your cached module will be put into ~/.yarn-cache, prefixed with the registry name and postfixed with the module's version.

This means that if you install the 4.4.5 version of express with Yarn, it will be put into ~/.yarn-cache/npm-express-4.4.5.

Deterministic Installs

Yarn uses lockfiles (yarn.lock) and a deterministic install algorithm. We can say goodbye to the "but it works on my machine" bugs.

The lockfile looks something like this:

Yarn lockfile

It contains the exact version numbers of all your dependencies - just like with an npm shrinkwrap file.

License checks

Yarn comes with a handy license checker, which becomes really powerful when you have to check the licenses of all the modules you depend on.

yarn licenses

Potential issues/questions

Yarn is still in its early days, so it’s no surprise that there are some questions arising when you start using it.

What’s going on with the default registry?

By default, the Yarn CLI uses a different registry instead of the original one: https://registry.yarnpkg.com. So far there is no explanation of why it does not use the same registry.

Does Facebook have plans to make incompatible API changes and split the community?

Contributing back to npm?

One of the most logical questions that can come up when talking about Yarn is: Why don’t you talk with the CLI team at npm, and work together?

If the problem is speed, I am sure all npm users would like to get those improvements as well.

When it comes to deterministic installs, instead of coming up with a new lockfile, npm-shrinkwrap.json should have been fixed.

Why the strange versioning?

In the world of Node.js and npm, versions start with 1.0.0.

At the time of writing this article, Yarn is at 0.18.1.

Is something missing to make Yarn stable? Does Yarn simply not follow semver?

npm 4

npm is the default package manager we all know, and npm 4 has been bundled with each Node.js release since v7.4.

Updating npm

To start using npm version 4, you just have to update your current CLI version:

npm install npm -g  

At the time of writing this article, this command will install npm version 4.1.1, which was released on 12/11/2016. Let's see what changed in this version!

Changes since version 3

  • npm search is now reimplemented to stream results, and sorting is no longer supported,
  • npm scripts no longer prepend the path of the node executable used to run npm before running scripts,
  • prepublish has been deprecated - you should use prepare from now on,
  • npm outdated returns 1 if it finds outdated packages,
  • partial shrinkwraps are no longer supported - the npm-shrinkwrap.json is considered a complete manifest,
  • Node.js 0.10 and 0.12 are no longer supported,
  • npm doctor, which diagnoses the user's environment and recommends solutions if there are potential problems related to npm.

As you can see, the team at npm was quite busy as well - both npm and Yarn made great progress in the past months.

Conclusion

It is great to see a new, open-source npm client - no doubt, a lot of effort went into making Yarn great!

Hopefully, we will see the improvements of Yarn incorporated into npm as well, so that users of both tools benefit from each other's improvements.

Yarn vs. npm - Which one to pick?

If you are working on proprietary software, it does not really matter which one you use. With npm, you can use npm-shrinkwrap.json, while with Yarn you can use yarn.lock.

The team at Yarn published a great article on why lockfiles should be committed all the time, I recommend checking it out: https://yarnpkg.com/blog/2016/11/24/lockfiles-for-all


Node.js Interview Questions and Answers (2017 Edition)

Two years ago we published our first article on common Node.js Interview Questions and Answers. Since then a lot of things improved in the JavaScript and Node.js ecosystem, so it was time to update it.

Important Disclaimers

It is never a good practice to judge someone just by questions like these, but these can give you an overview of the person's experience in Node.js.

But obviously, these questions do not give you the big picture of someone's mindset and thinking.

I think that a real-life problem can show a lot more of a candidate's knowledge - so we encourage you to do pair programming with the developers you are going to hire.

Finally and most importantly: we are all humans, so make your hiring process as welcoming as possible. These questions are not meant to be used as "Questions & Answers" but just to drive the conversation.

Node.js Interview Questions for 2017

  • What is an error-first callback?
  • How can you avoid callback hells?
  • What are Promises?
  • What tools can be used to assure consistent style? Why is it important?
  • When should you use npm and when Yarn?
  • What's a stub? Name a use case!
  • What's a test pyramid? Give an example!
  • What's your favorite HTTP framework and why?
  • When are background/worker processes useful? How can you handle worker tasks?
  • How can you secure your HTTP cookies against XSS attacks?
  • How can you make sure your dependencies are safe?

The Answers

What is an error-first callback?

Error-first callbacks are used to pass both errors and data. The error is passed as the first argument, and it has to be checked to see if something went wrong. Additional arguments are used to pass data.

const fs = require('fs')
const filePath = './config.json' // hypothetical path

fs.readFile(filePath, function (err, data) {
  if (err) {
    // handle the error; the return is important here,
    // so the rest of the callback does not run
    return console.log(err)
  }
  // use the data object
  console.log(data)
})

How can you avoid callback hells?

There are lots of ways to solve the issue of callback hells:
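One common approach, sketched below, is to wrap error-first functions with util.promisify (available since Node.js 8) and flatten the nesting with async/await. getUser and getPosts are hypothetical stand-ins for real I/O.

```javascript
const util = require('util')

// Hypothetical error-first APIs standing in for database or HTTP calls
function getUser (id, cb) {
  setImmediate(() => {
    if (id <= 0) return cb(new Error('invalid id'))
    cb(null, { id, name: 'user' + id })
  })
}

function getPosts (user, cb) {
  setImmediate(() => cb(null, ['first post by ' + user.name]))
}

// Callback style: every step nests one level deeper and repeats the error check
function loadNested (id, cb) {
  getUser(id, (err, user) => {
    if (err) return cb(err)
    getPosts(user, (err, posts) => {
      if (err) return cb(err)
      cb(null, { user, posts })
    })
  })
}

// The same flow with promises and async/await: flat, with a single error path
const getUserAsync = util.promisify(getUser)
const getPostsAsync = util.promisify(getPosts)

async function loadFlat (id) {
  const user = await getUserAsync(id)
  const posts = await getPostsAsync(user)
  return { user, posts }
}

loadFlat(1)
  .then((result) => console.log(result.posts)) // [ 'first post by user1' ]
  .catch((err) => console.error(err))
```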

What are Promises?

Promises are a concurrency primitive, first described in the 80s. Now they are part of most modern programming languages to make your life easier. Promises can help you better handle async operations.

An example can be the following snippet, which after 100ms prints out the result string to the standard output. Also, note the catch, which can be used for error handling. Promises are chainable.

new Promise((resolve, reject) => {  
  setTimeout(() => {
    resolve('result')
  }, 100)
})
  .then(console.log)
  .catch(console.error)
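Chaining is where promises pay off: every then() returns a new promise, and an error thrown anywhere in the chain skips ahead to the nearest catch(). A short sketch; delay is a hypothetical helper:

```javascript
// resolves with `value` after `ms` milliseconds
const delay = (ms, value) => new Promise((resolve) => setTimeout(resolve, ms, value))

const chain = delay(100, 2)
  .then((n) => n * 10) // a returned plain value is passed on to the next then()
  .then((n) => delay(50, n + 1)) // a returned promise is waited for first
  .then((n) => {
    if (n > 20) throw new Error(`too big: ${n}`)
    return n
  })
  .then((n) => console.log('never reached', n)) // skipped, because the previous step threw
  .catch((err) => {
    console.error(err.message) // prints "too big: 21"
    return err.message
  })
```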

What tools can be used to assure consistent style? Why is it important?

When working in a team, consistent style is important, so team members can modify more projects easily, without having to get used to a new style each time.

Also, it can help eliminate programming issues using static analysis.

Tools that can help:

If you’d like to be even more confident, I suggest learning and embracing the JavaScript Clean Coding principles as well!

What's a stub? Name a use case!

Stubs are functions/programs that simulate the behaviors of components/modules. Stubs provide canned answers to function calls made during test cases.

An example can be writing a file, without actually doing so.

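const fs = require('fs')
const sinon = require('sinon')
const { expect } = require('chai') // `to.be.called` assumes the sinon-chai plugin is registered

// replace fs.writeFile with a fake that reports success without touching the disk
const writeFileStub = sinon.stub(fs, 'writeFile').callsFake(function (path, data, cb) {
  return cb(null)
})

fs.writeFile('/tmp/example.txt', 'data', function () {}) // stands in for the code under test
expect(writeFileStub).to.be.called
writeFileStub.restore()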
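
To see what a stubbing library does under the hood, here is a minimal hand-rolled sketch without Sinon. It swaps fs.writeFile for a fake that records its calls and immediately reports success, and hands back a restore function. The helper name stubWriteFile is hypothetical; Sinon itself tracks much more (arguments per call, call counts, return values).

```javascript
const fs = require('fs')

// Hypothetical helper: replaces fs.writeFile with a fake that never touches the disk
function stubWriteFile () {
  const original = fs.writeFile
  const calls = []

  fs.writeFile = function fakeWriteFile (path, data, options, cb) {
    // fs.writeFile accepts an optional options argument before the callback
    const callback = typeof options === 'function' ? options : cb
    calls.push({ path, data }) // remember how we were called
    process.nextTick(callback, null) // canned answer: "the write succeeded"
  }

  return {
    calls,
    restore () {
      fs.writeFile = original // put the real implementation back
    }
  }
}
```

In a test, you would call stubWriteFile() before running the code under test, check stub.calls, and call stub.restore() afterwards so other tests see the real fs.writeFile.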

What's a test pyramid? Give an example!

A test pyramid describes the ratio of how many unit tests, integration tests and end-to-end tests you should write.

An example for an HTTP API may look like this:

  • lots of low-level unit tests for models (dependencies are stubbed),
  • fewer integration tests, where you check how your models interact with each other (dependencies are not stubbed),
  • even fewer end-to-end tests, where you call your actual endpoints (dependencies are not stubbed).

What's your favorite HTTP framework and why?

There is no right answer for this. The goal here is to understand how deeply one knows the framework she/he uses. Explain the pros and cons of picking that framework.

When are background/worker processes useful? How can you handle worker tasks?

Worker processes are extremely useful if you'd like to do data processing in the background, like sending out emails or processing images.

You can hand these tasks to workers through a message queue. There are lots of options for this, like RabbitMQ or Kafka.

How can you secure your HTTP cookies against XSS attacks?

XSS occurs when the attacker injects executable JavaScript code into the HTML response.

To mitigate these attacks, you have to set flags on the set-cookie HTTP header:

  • HttpOnly - this attribute is used to help prevent attacks such as cross-site scripting since it does not allow the cookie to be accessed via JavaScript.
  • secure - this attribute tells the browser to only send the cookie if the request is being sent over HTTPS.

So it would look something like this: Set-Cookie: sid=<cookie-value>; HttpOnly; Secure. If you are using Express with express-session (or cookie-session), HttpOnly is set by default; Secure is not, so you have to enable it yourself.
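
As a framework-free illustration, here is a minimal sketch using Node's built-in http module that issues a session cookie with both flags set. The cookie name sid, the Path=/ attribute and the random session id are assumptions for the example; a real app would also store the session id server-side.

```javascript
const http = require('http')
const crypto = require('crypto')

// Builds a Set-Cookie value that page JavaScript cannot read (HttpOnly)
// and that the browser only sends back over HTTPS (Secure)
function buildSessionCookie (name, value) {
  return `${name}=${encodeURIComponent(value)}; Path=/; HttpOnly; Secure`
}

function handler (req, res) {
  // hypothetical session id: 16 random bytes as 32 hex characters
  const sessionId = crypto.randomBytes(16).toString('hex')
  res.setHeader('Set-Cookie', buildSessionCookie('sid', sessionId))
  res.end('ok')
}

const server = http.createServer(handler)
// call server.listen(3000) to start it
```

Because of the Secure flag, the browser will not send this cookie back over plain HTTP, so the app should sit behind HTTPS.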

How can you make sure your dependencies are safe?

When writing Node.js applications, you can easily end up with hundreds or even thousands of dependencies.
For example, if you depend on Express, you depend on 27 other modules directly, and on their dependencies as well, so manually checking all of them is not an option!

The only realistic option is to automate updating and security-auditing your dependencies. For that, there are free and paid options:

Node.js Interview Puzzles

The following part of the article is useful if you’d like to prepare for an interview that involves puzzles, or tricky questions.

What's wrong with this code snippet?

new Promise((resolve, reject) => {  
  throw new Error('error')
}).then(console.log)

The Solution

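There is no catch after the then, so the rejection is never handled. Older Node.js versions either swallowed such an error silently or only printed an UnhandledPromiseRejectionWarning, with nothing in your code reacting to it; since Node.js 15, an unhandled rejection crashes the process by default.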

To fix it, you can do the following:

new Promise((resolve, reject) => {  
  throw new Error('error')
}).then(console.log).catch(console.error)

If you have to debug a huge codebase and you don't know which Promise can potentially hide an issue, you can listen to the unhandledRejection event on process. The handler below prints every unhandled Promise rejection.

process.on('unhandledRejection', (err) => {  
  console.log(err)
})

What's wrong with the following code snippet?

function checkApiKey (apiKeyFromDb, apiKeyReceived) {  
  if (apiKeyFromDb === apiKeyReceived) {
    return true
  }
  return false
}

The Solution

When you compare security credentials, it is crucial that you don't leak any information, so you have to make sure that you compare them in constant time. Otherwise, your application will be vulnerable to timing attacks.

But why does it work like that?

V8, the JavaScript engine used by Node.js, optimizes the code you run for performance. It compares strings character by character, and as soon as it finds a mismatch, it stops. So the more leading characters of the key an attacker gets right, the longer the comparison takes, and by measuring that difference they can guess the key one character at a time.

To solve this issue, you can use the npm module called cryptiles.

function checkApiKey (apiKeyFromDb, apiKeyReceived) {  
  return cryptiles.fixedTimeComparison(apiKeyFromDb, apiKeyReceived)
}
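
If you'd rather not add a dependency for this, Node.js ships crypto.timingSafeEqual (since v6.6.0); note that this swaps in a different technique from the cryptiles call above. It throws when the two buffers differ in length, so this sketch first hashes both keys with SHA-256, which always yields 32-byte digests and avoids leaking the key length through an early return. The sha256 helper is our own.

```javascript
const crypto = require('crypto')

// SHA-256 digests are always 32 bytes, which timingSafeEqual requires,
// and comparing digests does not reveal how long the original keys are
function sha256 (value) {
  return crypto.createHash('sha256').update(String(value)).digest()
}

function checkApiKey (apiKeyFromDb, apiKeyReceived) {
  // takes the same time no matter where the first differing byte is
  return crypto.timingSafeEqual(sha256(apiKeyFromDb), sha256(apiKeyReceived))
}
```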

What's the output of the following code snippet?

Promise.resolve(1)  
  .then((x) => x + 1)
  .then((x) => { throw new Error('My Error') })
  .catch(() => 1)
  .then((x) => x + 1)
  .then((x) => console.log(x))
  .catch(console.error)

The Answer

The short answer is 2 - however with this question I'd recommend asking the candidates to explain what will happen line-by-line to understand how they think. It should be something like this:

  1. A new Promise is created that resolves to 1.
  2. The resolved value is incremented by 1 (so it is 2 now) and returned immediately.
  3. The resolved value is discarded, and an error is thrown.
  4. The error is caught and discarded, and a new value (1) is returned.
  5. Execution does not stop at the catch: since the exception has been handled, the chain continues with 1, and a new, incremented value (2) is returned.
  6. The value is printed to the standard output.
  7. This line won't run, as no exception reaches it.

A day may work better than questions

Spending at least half a day with your possible next hire is worth more than a thousand of these questions.

Once you do that, you will better understand if the candidate is a good cultural fit for the company and has the right skill set for the job.

Did we miss anything? Let us know!

What was the craziest interview question you had to answer? What's your favorite question / puzzle to ask? Let us know in the comments! :)


Node.js Best Practices - How to Become a Better Developer in 2017

A year ago we wrote a post on How to Become a Better Node.js Developer in 2016 which was a huge success - so we thought now it is time to revisit the topics and prepare for 2017!

In this article, we will go through the most important Node.js best practices for 2017, topics that you should care about and educate yourself in. Let’s start!

Node.js Best Practices for 2017

Use ES2015

Last year we advised you to use ES2015 - however, a lot has changed since.

Back then, Node.js v4 was the LTS version, and it had support for 57% of the ES2015 functionality. A year passed and ES2015 support grew to 99% with Node v6.

If you are on the latest Node.js LTS version, you don't need Babel anymore to use the whole ES2015 feature set. That said, on the client side you’ll probably still need it!

For more information on which Node.js version supports which ES2015 features, I'd recommend checking out node.green.

Use Promises

Promises are a concurrency primitive, first described in the 80s. Now they are part of most modern programming languages to make your life easier.

Imagine the following example code that reads a file, parses it, and prints the name of the package. Using callbacks, it would look something like this:

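const fs = require('fs')

fs.readFile('./package.json', 'utf-8', function (err, data) {
  if (err) {
    return console.log(err)
  }

  let pkg
  try {
    // keep the parsed object: data itself is still just a string
    pkg = JSON.parse(data)
  } catch (ex) {
    return console.log(ex)
  }
  console.log(pkg.name)
})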

Wouldn't it be nice to rewrite the snippet into something more readable? Promises help you with that:

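const Bluebird = require('bluebird')
const fs = Bluebird.promisifyAll(require('fs')) // adds fs.readFileAsync and friends

fs.readFileAsync('./package.json', 'utf-8')
  .then(JSON.parse)
  .then((data) => {
    console.log(data.name)
  })
  .catch((e) => {
    console.error('error reading/parsing file', e)
  })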

Of course, the fs API does not have a readFileAsync method that returns a Promise. To make it work, you have to wrap fs, for example with Bluebird's promisifyAll, which adds a Promise-returning ...Async variant of every callback-based function.
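
On newer Node.js versions, you no longer need a third-party wrapper: util.promisify arrived in Node.js 8, and the Promise-based fs.promises API in Node.js 10. Here is a minimal sketch of the same snippet with fs.promises and async/await; the function name readPackageName is ours, for illustration.

```javascript
const fs = require('fs')

// Reads a package.json file, parses it and resolves with its "name" field
async function readPackageName (filePath) {
  const data = await fs.promises.readFile(filePath, 'utf-8')
  return JSON.parse(data).name // a parse error rejects the returned Promise
}

readPackageName('./package.json')
  .then((name) => console.log(name))
  .catch((e) => console.error('error reading/parsing file', e))
```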

Use the JavaScript Standard Style

When it comes to code style, it is crucial to have a company-wide standard, so when you have to change projects, you can be productive from day zero, without having to worry about breaking the build because of different presets.

At RisingStack we have incorporated the JavaScript Standard Style in all of our projects.

With Standard, there are no decisions to make, and no .eslintrc, .jshintrc, or .jscsrc files to manage. It just works. You can find the Standard rules here.



Use Docker - Containers are Production Ready in 2017!

You can think of Docker images as deployment artifacts - Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, system libraries – anything you can install on a server.

But why should you start using Docker?

  • it enables you to run your applications in isolation,
  • as a consequence, it makes your deployments more secure,
  • Docker images are lightweight,
  • they enable immutable deployments,
  • and with them, you can mirror production environments locally.

To get started with Docker, head over to the official getting started tutorial. Also, for orchestration we recommend checking out our Kubernetes best practices article.

Monitor your Applications

If something breaks in your Node.js application, you should be the first one to know about it, not your customers.

One of the newer open-source solutions that can help you achieve this is Prometheus, a systems monitoring and alerting toolkit originally built at SoundCloud. The only downside of Prometheus is that you have to set it up and host it yourself.

If you are looking for an out-of-the-box solution with support, Trace by RisingStack is a great solution developed by us.

Trace will help you with

  • alerting,
  • memory and CPU profiling in production systems,
  • distributed tracing and error searching,
  • performance monitoring,
  • and keeping your npm packages secure!

Node.js Best Practices for 2017 - Use Trace and Profiling

Use Messaging for Background Processes

If you are using HTTP for sending messages, then whenever the receiving party is down, all your messages are lost. However, if you pick a persistent transport layer, like a message queue to send messages, you won't have this problem.

If the receiving service is down, the messages will be kept, and can be processed later. If the service is not down, but there is an issue, processing can be retried, so no data gets lost.

An example: you'd like to send out thousands of emails. In this case, you would just have to put some basic information, like the target email address and the first name, into the queue, and a background worker could easily put together each email's content and send it out.

What's really great about this approach is that you can scale it whenever you want, and no traffic will be lost. If you see that there are millions of emails to be sent out, you can add extra workers, and they can consume the very same queue.
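
To make the producer/worker split concrete, here is an in-memory sketch. It is only a stand-in: EmailQueue and the retry limit are hypothetical, and unlike RabbitMQ or another real broker, nothing here survives a process restart. It shows the two properties described above: the producer only enqueues minimal data, and a failed job is put back to be retried instead of being lost.

```javascript
// Hypothetical in-memory queue; a real broker would persist these messages
class EmailQueue {
  constructor () {
    this.jobs = []
  }

  // Producer side: enqueue only the basic data (address, first name, ...)
  publish (job) {
    this.jobs.push({ ...job, attempts: 0 })
  }

  // Worker side: take one job at a time; on failure, put the job back at the
  // end of the queue until it has been tried maxAttempts times.
  // Resolves with the jobs that failed for good.
  async work (handler, maxAttempts = 3) {
    const failed = []
    while (this.jobs.length > 0) {
      const job = this.jobs.shift()
      try {
        await handler(job) // e.g. build the email body and send it
      } catch (err) {
        job.attempts += 1
        if (job.attempts < maxAttempts) this.jobs.push(job)
        else failed.push(job)
      }
    }
    return failed
  }
}
```

Usage would look like queue.publish({ to: 'ada@example.com', firstName: 'Ada' }) on the producer side and queue.work(sendEmail) in the worker, where sendEmail is your own async function.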

You have lots of options for messaging queues:

Use the Latest LTS Node.js version

To get the best of both worlds (stability and new features), we recommend using the latest LTS (long-term support) version of Node.js. As of writing this article, it is version 6.9.2.

To easily switch between Node.js versions, you can use nvm. Once you have installed it, switching to LTS takes only two commands:

nvm install 6.9.2  
nvm use 6.9.2  


Use Semantic Versioning

We conducted a Node.js Developer Survey a few months ago, which allowed us to get some insights on how people use semantic versioning.

Unfortunately, we found out that only 71% of our respondents use semantic versioning when publishing/consuming modules. This number should be higher in our opinion - everyone should use it! Why? Because updating packages without semver can easily break Node.js apps.

Node.js Best Practices for 2017 - Semantic versioning survey results

Versioning your application / modules is critical - your consumers must know if a new version of a module is published and what needs to be done on their side to get the new version.

This is where semantic versioning comes into the picture. Given a version number MAJOR.MINOR.PATCH, increment the:

  • MAJOR version when you make incompatible API changes,
  • MINOR version when you add functionality (without breaking the API), and
  • PATCH version when you make backwards-compatible bug fixes.
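
The three rules above fit in a few lines. Here is a minimal sketch, assuming plain MAJOR.MINOR.PATCH strings without pre-release or build tags; bumpVersion is a hypothetical helper, not the API of npm's semver package. As the SemVer specification requires, the lower-order numbers reset to 0 when a higher one is incremented.

```javascript
// Hypothetical helper: bumps a plain "MAJOR.MINOR.PATCH" version string
function bumpVersion (version, releaseType) {
  const [major, minor, patch] = version.split('.').map(Number)
  switch (releaseType) {
    case 'major': return `${major + 1}.0.0` // incompatible API change
    case 'minor': return `${major}.${minor + 1}.0` // new, backwards-compatible functionality
    case 'patch': return `${major}.${minor}.${patch + 1}` // backwards-compatible bug fix
    default: throw new Error(`Unknown release type: ${releaseType}`)
  }
}

console.log(bumpVersion('1.4.2', 'minor')) // prints 1.5.0
```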

npm also uses SemVer when installing your dependencies, so when you publish modules, always make sure to respect it. Otherwise, you can break other people's applications!
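
For example, when package.json lists a dependency as ^1.4.2 (npm's default prefix when saving), npm may install any later version with the same MAJOR number, but never 2.0.0. The sketch below approximates that caret rule for plain versions; for 0.x versions the caret is stricter, as the comments note. satisfiesCaret is hypothetical, and the real semver package handles many more cases (pre-releases, other range operators).

```javascript
// Hypothetical approximation of npm's caret range (^X.Y.Z) for plain versions
function parseVersion (version) {
  return version.split('.').map(Number)
}

function compareVersions (a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i]
  }
  return 0
}

function satisfiesCaret (version, range) {
  const v = parseVersion(version)
  const base = parseVersion(range.replace(/^\^/, ''))
  if (compareVersions(v, base) < 0) return false // must be at least the base version
  // The left-most non-zero number of the base must stay the same:
  // ^1.4.2 allows <2.0.0, ^0.4.2 allows <0.5.0, ^0.0.3 allows only 0.0.3
  if (base[0] > 0) return v[0] === base[0]
  if (base[1] > 0) return v[0] === 0 && v[1] === base[1]
  return v[0] === 0 && v[1] === 0 && v[2] === base[2]
}
```

This is why breaking changes must bump MAJOR: a consumer on ^1.4.2 will silently receive any 1.x release on the next install.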

Secure Your Applications

Securing your users' and customers' data should be one of your top priorities in 2017. In 2016 alone, hundreds of millions of user accounts were compromised as a result of weak security.

To get started with Node.js Security, read our Node.js Security Checklist, which covers topics like:

  • Security HTTP Headers,
  • Brute Force Protection,
  • Session Management,
  • Insecure Dependencies,
  • or Data Validation.
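
As a taste of the first item, here is a minimal sketch that sets a few widely used security headers with the built-in http module. The selection and values are assumptions for illustration, not the checklist's exact recommendations; in Express, the helmet middleware sets these and more.

```javascript
const http = require('http')

// Adds a few common security headers to a response
function setSecurityHeaders (res) {
  res.setHeader('X-Content-Type-Options', 'nosniff') // don't MIME-sniff responses
  res.setHeader('X-Frame-Options', 'DENY') // forbid framing the page (clickjacking)
  // ask browsers to use only HTTPS for this host for a year (ignored over plain HTTP)
  res.setHeader('Strict-Transport-Security', 'max-age=31536000; includeSubDomains')
}

const server = http.createServer((req, res) => {
  setSecurityHeaders(res)
  res.end('ok')
})
// call server.listen(3000) to start it
```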

After you’ve embraced the basics, check out my Node Interactive talk on Surviving Web Security with Node.js!

Learn Serverless

Serverless started with the introduction of AWS Lambda. Since then, it has been growing fast, with a blooming open-source community.

In the next years, serverless will become a major factor for building new applications. If you'd like to stay on the edge, you should start learning it today.

One of the most popular solutions is the Serverless Framework, which helps you deploy AWS Lambda functions.

Attend and Speak at Conferences and Meetups

Attending conferences and meetups is a great way to learn about new trends, use cases, and best practices. It is also a great forum to meet new people.

To take it one step further, I'd like to encourage you to speak at one of these events as well!

As public speaking is tough, and “imagine everyone's naked” is the worst advice, I'd recommend checking out speaking.io for tips on public speaking!

Become a better Node.js developer in 2017

As 2017 will be the year of Node.js, we’d like to help you get the most out of it!

We just launched a new study program called "Owning Node.js" which helps you to become confident in:

  • Async Programming with Node.js
  • Creating servers with Express
  • Using Databases with Node
  • Project Structuring and building scalable apps

If you have any questions about the article, find me in the comments section!

The 10 Most Important Node.js Articles of 2016

2016 was an exciting year for Node.js developers. I mean - just take a look at this picture:

Every Industry has adopted Node.js

Looking back through the 6-year-long history of Node.js, we can tell that our favorite platform has finally matured enough to be used by the greatest enterprises all around the world, in basically every industry.

Another piece of great news is that Node.js is the biggest open source platform ever, with 15 million+ downloads/month and more than a billion package downloads/week. Contributions have grown as well: more than 1,100 developers have helped build Node.js into the platform it is now.

To summarize this year, we collected the 10 most important articles we recommend reading. These cover the biggest scandals, events, and improvements surrounding Node.js in 2016.

Let's get started!

#1: How one developer broke Node, Babel and thousands of projects in 11 lines of JavaScript

Programmers were shocked looking at broken builds and failed installations after Azer Koçulu unpublished more than 250 of his modules from NPM in March 2016 - breaking thousands of modules, including Node and Babel.

Koçulu deleted his code because one of his modules was called Kik - same as the instant messaging app - so the lawyers of Kik claimed brand infringement, and then NPM took the module away from him.

"This situation made me realize that NPM is someone’s private land where corporate is more powerful than the people, and I do open source because Power To The People." - Azer Koçulu

One of Azer's modules was left-pad, which padded out the left-hand side of strings with zeroes or spaces. Unfortunately, thousands of modules depended on it.
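
The whole module boiled down to a few lines of code. Here is a rough re-creation of what it did (not Koçulu's original source); since ES2017, String.prototype.padStart does the same job natively.

```javascript
// A rough re-creation of left-pad's behavior (not the original source)
function leftPad (str, len, ch = ' ') {
  str = String(str)
  ch = String(ch)
  if (ch.length === 0) return str // nothing to pad with
  // prepend the pad character until the string is long enough
  while (str.length < len) {
    str = ch + str
  }
  return str
}

console.log(leftPad(5, 3, 0)) // prints 005
console.log('5'.padStart(3, '0')) // prints 005, using the built-in method
```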

You can read the rest of this story in The Register's great article, with updates on the outcome of this event.

#2: Facebook partners with Google to launch a new JavaScript package manager

In October 2016, Facebook & Google launched Yarn, a new package manager for JavaScript.

The reason? There were a couple of fundamental problems with npm in Facebook’s workflow.

  • At Facebook’s scale npm didn’t quite work well.
  • npm slowed down the company’s continuous integration workflow.
  • Checking all of the modules into a repository was also inefficient.
  • npm is, by design, nondeterministic — yet Facebook’s engineers needed a consistent and reliable system for their DevOps workflow.

Instead of hacking around npm’s limitations, Facebook wrote Yarn from scratch:

  • Yarn does a better job at caching files locally.
  • Yarn is also able to parallelize some of its operations, which speeds up the install process for new modules.
  • Yarn uses lockfiles and a deterministic install algorithm to create consistent file structures across machines.
  • For security reasons, Yarn does not allow developers who write packages to execute other code that’s needed as part of the install process.

Yarn, which promises a major performance boost even to developers who don’t work at Facebook’s scale, still uses the npm registry and is essentially a drop-in replacement for the npm client.

You can read the full article with the details on TechCrunch.

#3: Debugging Node.js with Chrome DevTools

New support for Node.js debuggability landed in Node.js master in May.

To use the new debugging tool, you have to

  • nvm install node
  • Run Node with the inspect flag: node --inspect index.js
  • Open the provided URL you got, starting with “chrome-devtools://..”

Read the great tutorial from Paul Irish to get all the features and details right!

#4: How I built an app with 500,000 users in 5 days on a $100 server

Jonathan Zarra, the creator of GoChat for Pokémon GO, reached 1 million users in 5 days. Zarra had a hard time paying for the servers (around $4,000/month) needed to host 1M active users.

He never expected to get this many users. He built the app as an MVP, and he built it to fail.

Zarra was already talking to VCs to grow and monetize his app. Since he built the app as an MVP, he thought he could care about scalability later.

He was wrong.

Because of its poor design, GoChat was unable to scale to this many users, and it went down. Many users were lost, and a lot of money was spent.

500,000 users in 5 days on $100/month server

Erik Duindam, the CTO of Unboxd, has spent his career designing and building web platforms for hundreds of millions of active users.

Frustrated by the poor design and sad fate of Zarra's GoChat, Erik decided to build his own solution, GoSnaps: The Instagram/Snapchat for Pokémon GO.

Erik was able to build a scalable MVP with Node.js in 24 hours, which could easily handle 500k unique users.

The whole setup ran on one medium Google Cloud server of $100/month, plus (cheap) Google Cloud Storage for the storage of images - and it was still able to perform exceptionally well.

GoSnap - The Node.js MVP that can Scale

How did he do it? Well, you can read the full story for the technical details:


#5: Getting Started with Node.js - The Node Hero Tutorial Series

The aim of the Node Hero tutorial series is to help novice developers to get started with Node.js and deliver software products with it!

Node Hero - Getting started with Node.js

You can find the full table of contents below:

  1. Getting started with Node.js
  2. Using NPM
  3. Understanding async programming
  4. Your first Node.js HTTP server
  5. Node.js database tutorial
  6. Node.js request module tutorial
  7. Node.js project structure tutorial
  8. Node.js authentication using Passport.js
  9. Node.js unit testing tutorial
  10. Debugging Node.js applications
  11. Node.js Security Tutorial
  12. Deploying Node.js application to a PaaS
  13. Monitoring Node.js Applications

#6: Using RabbitMQ & AMQP for Distributed Work Queues in Node.js

This tutorial helps you to use RabbitMQ to coordinate work between work producers and work consumers.

Unlike Redis, RabbitMQ's sole purpose is to provide a reliable and scalable messaging solution with many features that are not present or hard to implement in Redis.

RabbitMQ is a server that runs locally, or on some node in the network. The clients can be work producers, work consumers or both, and they talk to the server using a protocol named Advanced Message Queuing Protocol (AMQP).

You can read the full tutorial here.

#7: Node.js, TC-39, and Modules

James M Snell, IBM Technical Lead for Node.js, attended his first TC-39 meeting in late September.

The reason?

One of the newer JavaScript language features defined by TC-39 — namely, Modules — has been causing the Node.js core team a bit of trouble.

James and Bradley Farias (@bradleymeck) have been trying to figure out how to best implement support for ECMAScript Modules (ESM) in Node.js without causing more trouble and confusion than it would be worth.

ECMAScript modules vs. CommonJS

Because of the complexity of the issues involved, sitting down face to face with the members of TC-39 was deemed to be the most productive path forward.

The full article discusses what they found and understood from this conversation.

#8: The Node.js Developer Survey & its Results

We at Trace by RisingStack conducted a survey in the summer of 2016 to find out how developers use Node.js.

The results show that MongoDB, RabbitMQ, AWS, Jenkins, Docker and Amazon Container Services are the go-to choices for developing, containerizing and shipping Node.js applications.

The results also reveal Node developers' major pain point: debugging.

Node.js Survey - How do you identify issues in your app? Using logs.

You can read the full article with the Node.js survey results and graphs here.

#9: The Node.js Foundation Pledges to Manage Node Security Issues with New Collaborative Effort

The Node.js Foundation announced at Node.js Interactive North America that it will oversee the Node.js Security Project, which was founded by Adam Baldwin and previously managed by ^Lift.

As part of the Node.js Foundation, the Node.js Security Project will provide a unified process for discovering and disclosing security vulnerabilities found in the Node.js module ecosystem. Governance for the project will come from a working group within the foundation.

The Node.js Foundation will take over the following responsibilities from ^Lift:

  • Maintaining an entry point for ecosystem vulnerability disclosure;
  • Maintaining a private communication channel for vulnerabilities to be vetted;
  • Vetting participants in the private security disclosure group;
  • Facilitating ongoing research and testing of security data;
  • Owning and publishing the base dataset of disclosures, and
  • Defining a standard for the data, which tool vendors can build on top of, and to which security vendors can add data and value as well.

You can read the full article discussing every detail on The New Stack.

#10: The Node.js Maturity Checklist

The Node.js Maturity Checklist gives you a starting point to understand how well Node.js is adopted in your company.

The checklist follows your adoption through establishing company culture, teaching your employees, setting up your infrastructure, writing code and running the application.

You can find the full Node.js Maturity Checklist here.


Node.js Tutorial Videos: Debugging, Async, Memory Leaks, CPU Profiling

At RisingStack, we're continuously working on delivering Node.js tutorials to help developers overcome their biggest obstacles, and become even better, week-by-week.

In our recent Node.js survey we've been told that Debugging, understanding/using Async programming, handling callbacks and memory leaks are amongst the greatest pain-points one would face on her/his journey to become a Node Hero.

This is why we came up with the idea of a new video tutorial series called Owning Node.js.

In this three-part video series, we're going through all of these topics in a detailed way - by showing and explaining the actual coding process to you.

All of the videos are captioned, so you'll have no problem with understanding what's going on by enabling the subtitles!

So, let's start Owning Node.js together!


Node.js Debugging Made Easy

In this very first video, I'm going to show you how to use the debug module, the built-in debugger, and Chrome DevTools to find and fix issues easily!


Node.js Async Programming Done Right

In the second Node.js tutorial video, I'm going to show you how you can handle asynchronous operations easily, and how you can use them to build performant applications in Node.js!

So, we are going to take a look at error handling with asynchronous operations, and learn how you can use the async module to handle multiple callbacks at the same time.


CPU and Memory Profiling with Node.js

In the 3rd Node.js tutorial of the series, I teach you how to create CPU profiles and memory heap dumps, and how to analyze them in the Chrome DevTools profiler. You'll learn how to detect memory leaks and bottlenecks easily.


More Node.js tutorials: Announcing the Node Hero Program

I hope these videos made things clearer! If you'd like to keep getting better, I've got good news for you!

We're launching the NODE HERO program as of today, which contains further webinars and screencasts, live-coding sessions and access to our Node.js Debugging and Monitoring solution called Trace.

I highly recommend checking it out if you'd like to become an even better Node.js developer! See you there!


Graceful shutdown with Node.js and Kubernetes

This article helps you understand what graceful shutdown is, what its main benefits are, and how you can set up the graceful shutdown of a Kubernetes application. We’ll discuss how you can validate and benchmark this process, and which common mistakes you should avoid.

Graceful shutdown

We can speak about a graceful shutdown of our application when all of the resources it used, and all of the traffic and/or data processing it handled, are closed and released properly.

It means that no database connection remains open and no ongoing request fails because we stop our application.

Possible scenarios for a graceful web server shutdown:

  1. App gets notification to stop (receives SIGTERM)
  2. App lets the load balancer know that it's not ready for newer requests
  3. App serves all the ongoing requests
  4. App releases all of the resources correctly: DB, queue, etc.
  5. App exits with "success" status code (process.exit())

This article focuses on shutting down web servers properly, but you should also apply these techniques to your worker processes: on SIGTERM, it's highly recommended to stop consuming queues and finish the current task/job.

Why is it important?

If we don't stop our application correctly, we are wasting resources like DB connections and we may also break ongoing requests. An HTTP request doesn't recover automatically - if we fail to serve it, then we simply missed it.

Graceful start

We should only start our application when all of the dependencies and database connections are ready to handle our traffic.

Possible scenarios for a graceful web server start:

  1. App starts (npm start)
  2. App opens DB connections
  3. App listens on port
  4. App tells the load balancer that it’s ready for requests
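
The four start-up steps above can be sketched like this. connectToDatabase is a hypothetical placeholder (it just waits 100ms here), and the /health route with the state.isReady flag is our assumption about how the app tells the load balancer it is ready, for example through a Kubernetes readinessProbe.

```javascript
const http = require('http')

const state = { isReady: false }

// Hypothetical stand-in for opening DB connections; replace with your real clients
function connectToDatabase () {
  return new Promise((resolve) => setTimeout(resolve, 100))
}

function handler (req, res) {
  if (req.url === '/health') {
    // report 500 until startup is complete, so no traffic is routed here yet
    res.writeHead(state.isReady ? 200 : 500)
    return res.end(state.isReady ? 'ok' : 'not ok')
  }
  res.end('hello')
}

async function start (port) {
  await connectToDatabase() // step 2: open DB connections first
  const server = http.createServer(handler)
  await new Promise((resolve) => server.listen(port, resolve)) // step 3: listen on port
  state.isReady = true // step 4: health check now tells the load balancer we're ready
  return server
}
```

Calling start(3000) from your entry point (step 1, npm start) runs the sequence; requests to /health fail until every dependency is connected.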

Graceful shutdown in a Node.js application

First of all, you need to listen for the SIGTERM signal and catch it:

process.on('SIGTERM', function onSigterm () {  
  console.info('Got SIGTERM. Graceful shutdown start', new Date().toISOString())
  // start graceful shutdown here
  shutdown()
})

After that, you can close your server, then close your resources and exit the process:

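function shutdown () {
  // stop accepting new connections; the callback runs once existing ones have ended
  server.close(function onServerClosed (err) {
    if (err) {
      console.error(err)
      return process.exit(1)
    }

    // closeMyResources is a placeholder for closing your DB connections, queues, etc.
    closeMyResources(function onResourcesClosed (err) {
      if (err) {
        console.error(err)
        return process.exit(1)
      }
      process.exit(0)
    })
  })
}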

Sounds easy right? Maybe a little bit too easy.

What about the load balancer? How will it know that your app is not ready to receive further requests anymore? What about keep-alive connections? Will they keep the server open for a longer time? What if the orchestrator kills my app with SIGKILL in the meantime?

Graceful shutdown with Kubernetes

If you’d like to learn a little bit about Kubernetes, you can read our Moving a Node.js app from PaaS to Kubernetes Tutorial. For now, let's just focus on the shutdown.

Kubernetes comes with a resource called Service. Its job is to route traffic to your pods (~instances of your app). Kubernetes also comes with a resource called Deployment that describes how your applications should behave during exit, scale and deploy - and you can also define a health check here. We will combine these resources for the perfect graceful shutdown and handover during new deploys at high traffic.

We would like to see throughput charts like the one below, with consistent rpm and no deployment side effects at all:

Graceful shutdown example: Throughput time in Trace by RisingStack Throughput metrics shown in Trace - no change at deploy

Ok, let's see how to solve this challenge.

Setting up graceful shutdown

In Kubernetes, for a proper graceful shutdown we need to add a readinessProbe to our application’s Deployment yaml and let the Service’s load balancer know during the shutdown that we will not serve more requests so it should stop sending them. We can close the server, tear down the DB connections and exit only after that.

How does it work?

Kubernetes graceful shutdown flowchart

  1. The pod receives a SIGTERM signal because Kubernetes wants to stop it (because of a deploy, scale, etc.)
  2. App (pod) starts to return 500 for GET /health to let the readinessProbe (Service) know that it's not ready to receive more requests.
  3. The Kubernetes readinessProbe checks GET /health and, after (failureThreshold * periodSeconds), it stops routing traffic to the app (because it continuously returns 500)
  4. App waits (failureThreshold * periodSeconds) before it starts to shut down, to make sure that the Service has been notified via the failing readinessProbe
  5. App starts graceful shutdown
  6. App first closes the server, while its DB connections are still alive
  7. App closes the database connections after the server is closed
  8. App exits the process
  9. Kubernetes force kills the application with SIGKILL after 30s if it's still running (in an optimal case, this doesn't happen)

In our case, the Kubernetes livenessProbe won't kill the app before the graceful shutdown happens, because it needs to wait (failureThreshold * periodSeconds) to do so. This means that the livenessProbe threshold should be larger than the readinessProbe threshold. This way, the graceful stop happens around 4s after SIGTERM, while the force kill would only happen 30s after SIGTERM.

How to achieve it?

For this, we need to do two things. First, we need to let the readinessProbe know after SIGTERM that we are not ready anymore:

'use strict'

const db = require('./db')  
const promiseTimeout = require('./promiseTimeout')  
const state = { isShutdown: false }  
const TIMEOUT_IN_MILLIS = 900

process.on('SIGTERM', function onSigterm () {  
  state.isShutdown = true
})

function get (req, res) {  
  // SIGTERM already happened
  // app is not ready to serve more requests
  if (state.isShutdown) {
    res.writeHead(500)
    return res.end('not ok')
  }

  // something cheap but tests the required resources
  // timeout because we would like to log before livenessProbe KILLS the process
  promiseTimeout(db.ping(), TIMEOUT_IN_MILLIS)
    .then(() => {
      // success health
      res.writeHead(200)
      return res.end('ok')
    })
    .catch(() => {
      // broken health
      res.writeHead(500)
      return res.end('not ok')
    })
}

module.exports = {  
  get: get
}
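The health check relies on a promiseTimeout helper that is required from ./promiseTimeout but not shown. The following is a hypothetical implementation (the example repository's version may differ): it settles like the given promise, but rejects if that promise doesn't settle within the timeout, so a hanging db.ping() turns into a 500 instead of a stuck probe request.

```javascript
'use strict'

// Hypothetical promiseTimeout(promise, timeoutMs): settles like `promise`,
// but rejects with a timeout error if `promise` takes longer than timeoutMs.
function promiseTimeout (promise, timeoutMs) {
  let timer
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`Timed out after ${timeoutMs}ms`))
    }, timeoutMs)
  })

  // Whichever settles first wins; always clear the timer afterwards
  return Promise.race([promise, timeout]).then(
    (value) => { clearTimeout(timer); return value },
    (err) => { clearTimeout(timer); throw err }
  )
}

module.exports = promiseTimeout
```

A TIMEOUT_IN_MILLIS of 900 presumably keeps the check under the probe's timeoutSeconds, which defaults to 1 second in Kubernetes, so the app can still respond (and log) before the probe gives up.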

The second thing is to delay the teardown process. As a sane default, you can use the time needed for two failed readinessProbe checks: failureThreshold: 2 * periodSeconds: 2 = 4s.

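// readinessProbe failureThreshold (2) * periodSeconds (2), in milliseconds
const READINESS_PROBE_DELAY = 2 * 2 * 1000

process.on('SIGTERM', function onSigterm () {
  console.info('Got SIGTERM. Graceful shutdown start', new Date().toISOString())

  // Wait a little bit to give the Kubernetes readinessProbe enough time to fail
  // (we are not ready to serve more traffic).
  // Don't worry, the livenessProbe won't kill the app until its own
  // failureThreshold (3) is reached, which takes about 30s.
  // gracefulStop (defined in the full example) closes the server, then the DB
  setTimeout(gracefulStop, READINESS_PROBE_DELAY)
})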
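The SIGTERM handler only schedules the stop function; its body lives in the full example repository. A minimal sketch of steps 6-8 of the shutdown flow might look like this, assuming a hypothetical db module with a close() method that returns a Promise. The exit function is injectable so the sketch can be exercised without killing the process, and the log messages mirror the application log output shown later.

```javascript
'use strict'

const http = require('http')

// Hypothetical stand-in for the app's DB module (assumption: it has close())
const db = {
  connected: true,
  close () {
    this.connected = false
    return Promise.resolve()
  }
}

const server = http.createServer((req, res) => res.end('ok'))

// Close the HTTP server first (in-flight requests can still use the DB),
// then the DB connections, then exit the process.
function gracefulStop (exit = process.exit) {
  console.info('Server is shutting down...', new Date().toISOString())
  return new Promise((resolve, reject) => {
    // Stops accepting new connections; calls back once existing ones ended
    server.close((err) => (err ? reject(err) : resolve()))
  })
    .then(() => db.close())
    .then(() => {
      console.info('Successful graceful shutdown', new Date().toISOString())
      exit(0)
    })
    .catch((err) => {
      console.error('Error happened during graceful shutdown', err)
      exit(1)
    })
}
```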

You can find the full example here:
https://github.com/RisingStack/kubernetes-graceful-shutdown-example

How to validate it?

Let's test our graceful shutdown by sending high traffic to our pods and releasing a new version in the meantime (recreating all of the pods).

Test case

$ ab -n 100000 -c 20 http://localhost:myport/

In addition, change an environment variable in the Deployment to re-create all pods while ab is running.

AB output

Document Path:          /  
Document Length:        3 bytes

Concurrency Level:      20  
Time taken for tests:   172.476 seconds  
Complete requests:      100000  
Failed requests:        0  
Total transferred:      7800000 bytes  
HTML transferred:       300000 bytes  
Requests per second:    579.79 [#/sec] (mean)  
Time per request:       34.495 [ms] (mean)  
Time per request:       1.725 [ms] (mean, across all concurrent requests)  
Transfer rate:          44.16 [Kbytes/sec] received  

Application log output

Got SIGTERM. Graceful shutdown start 2016-10-16T18:54:59.208Z  
Request after sigterm: / 2016-10-16T18:54:59.217Z  
Request after sigterm: / 2016-10-16T18:54:59.261Z  
...
Request after sigterm: / 2016-10-16T18:55:00.064Z  
Request after sigterm: /health?type=readiness 2016-10-16T18:55:00.820Z  
HEALTH: NOT OK  
Request after sigterm: /health?type=readiness 2016-10-16T18:55:02.784Z  
HEALTH: NOT OK  
Request after sigterm: /health?type=liveness 2016-10-16T18:55:04.781Z  
HEALTH: NOT OK  
Request after sigterm: /health?type=readiness 2016-10-16T18:55:04.800Z  
HEALTH: NOT OK  
Server is shutting down... 2016-10-16T18:55:05.210Z  
Successful graceful shutdown 2016-10-16T18:55:05.212Z  

Benchmark result

Success!

Zero failed requests: you can see in the app log that the Service stopped sending traffic to the pod before we disconnected from the DB and killed the app.

Common gotchas

The following mistakes can still prevent your app from doing a proper graceful shutdown:

Keep-alive connections

Kubernetes doesn't hand over keep-alive connections properly. :/

This means that requests from agents using a keep-alive header will still be routed to the pod.

It tricked me at first when I benchmarked with autocannon or Google Chrome (they use keep-alive connections).

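Keep-alive connections prevent closing your server in time: server.close(cb) stops accepting new connections, but it only calls back once the existing connections have ended. To force the exit of the process, you can use the server-destroy module, which destroys all open connections, including idle keep-alive ones. Keep in mind that destroying a connection also aborts any request still in flight on it, so only do this once the ongoing requests have had time to finish. Alternatively, you can add timeout logic to your server.close(cb).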
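Here is a minimal sketch of the timeout approach. It tracks open sockets itself, similar in spirit to server-destroy but not its actual code; closeWithTimeout and the sockets set are illustrative names.

```javascript
'use strict'

const http = require('http')

const server = http.createServer((req, res) => res.end('ok'))

// Track open sockets (including idle keep-alive ones) so they can be destroyed
const sockets = new Set()
server.on('connection', (socket) => {
  sockets.add(socket)
  socket.on('close', () => sockets.delete(socket))
})

// Stop accepting new connections; if lingering connections keep the server
// open for longer than timeoutMs, destroy them so the close can complete.
// Caution: destroying a socket also aborts any request still in flight on it.
function closeWithTimeout (httpServer, openSockets, timeoutMs) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      for (const socket of openSockets) socket.destroy()
    }, timeoutMs)

    httpServer.close((err) => {
      clearTimeout(timer)
      if (err) reject(err)
      else resolve()
    })
  })
}
```

In the shutdown flow, you would call closeWithTimeout(server, sockets, 5000) in place of a bare server.close(cb) before closing the DB; the 5000 ms value is an arbitrary example. Newer Node.js versions (18.2+) also provide server.closeIdleConnections() and server.closeAllConnections() for the same purpose.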

Docker signaling

It’s quite possible that your application doesn't receive the signals correctly in a dockerized application.

For example, in our Alpine image, CMD ["node", "src"] works, but CMD ["npm", "start"] doesn't: npm simply doesn't pass the SIGTERM on to the node process. The issue is probably related to this PR: https://github.com/npm/npm/pull/10868

Another option is dumb-init, which you can use to fix broken Docker signaling.
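To check whether SIGTERM actually reaches your Node.js process inside a container, a tiny script like the following can help (a sketch; a file name such as signal-check.js is hypothetical). Run it as the container's command, then stop the container with docker stop, which sends SIGTERM first: if the "Got SIGTERM" line never shows up in the container logs, the signal is lost on the way. Note that a process running as PID 1 does not get the kernel's default signal handling, which is one reason an explicit handler (or an init such as dumb-init) matters.

```javascript
'use strict'

// Keep the event loop busy so the process waits for a signal
const keepAlive = setInterval(() => {}, 1000)

let receivedSignal = null

// Node.js passes the signal name (e.g. 'SIGTERM') to the listener
function onSignal (signal) {
  receivedSignal = signal
  console.info(`Got ${signal}`, new Date().toISOString())
  // Stop keeping the process alive so it can exit on its own
  clearInterval(keepAlive)
}

process.on('SIGTERM', onSignal)
process.on('SIGINT', onSignal)

console.info('PID', process.pid, 'is waiting for SIGTERM')
```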

Takeaway

Always make sure that your application stops correctly: it releases all of its resources and helps hand over the traffic to the new version of your app.

Check out our example repository with Node.js and Kubernetes:
https://github.com/RisingStack/kubernetes-graceful-shutdown-example

If you have any questions or thoughts about this topic, find me in the comment section below!


Experimenting With async/await in Node.js 7 Nightly

A couple of months ago async/await landed in V8, the JavaScript engine. In the meantime, V8 was updated multiple times in Node.js, and the latest nightly build finally added the V8 version that supports the async/await functionality to Node.js.

Disclaimer: the async/await functionality is only available in the nightly, unstable version of Node.js. Do not use it in production for now.

What's async/await?

First, let's see how you do async operations with Promises. This little example shows how to fetch data using the Fetch API and Promises.

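// fetch is built into Node.js 18+; on older versions you need
// a package such as node-fetch to provide it
function getTrace () {
  return fetch('https://trace.risingstack.com', {
    method: 'get'
  })
}

getTrace()
  .then((response) => console.log(response.status))
  .catch((err) => console.error(err))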

With async/await, you can await Promises. This pauses the execution of the async function in a non-blocking way: it waits for the Promise to settle and returns its resolved value. If the Promise is rejected instead, the rejection reason is thrown, so it can be caught with a try/catch block.
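A small, offline example of the behavior described above: failLater is a made-up helper that rejects after a delay, and the rejection surfaces as an exception at the await, where try/catch handles it.

```javascript
'use strict'

// Illustrative helper: a Promise that rejects after `ms` milliseconds
function failLater (ms) {
  return new Promise((resolve, reject) => {
    setTimeout(() => reject(new Error('boom')), ms)
  })
}

async function main () {
  try {
    await failLater(10)
    return 'unreachable'
  } catch (err) {
    // The rejection reason is thrown at the `await` and lands here
    return `caught: ${err.message}`
  }
}

main().then((result) => console.log(result)) // prints "caught: boom"
```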

The previous example rewritten with async/await would look something like this:

async function getTrace () {  
  let pageContent
  try {
    pageContent = await fetch('https://trace.risingstack.com', {
      method: 'get'
    })
  } catch (ex) {
    console.error(ex)
  }

  return pageContent
}

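getTrace()
  // pageContent is undefined if the request failed (the error was logged above)
  .then((pageContent) => console.log(pageContent && pageContent.status))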

For more information on async/await, I recommend reading the following resources:

Using async/await without transpilers

Installing Node 7

To get started, you have to get the latest build of Node.js first. To do so, head over to the Nightly builds and grab the latest v7 build.

Once you have downloaded it, unpack it, and you are ready to use it!


If you are using nvm, you can try to install it this way:

NVM_NODEJS_ORG_MIRROR=https://nodejs.org/download/nightly  
nvm install 7  
nvm use 7  

Running files with async/await

Let's create a simple JavaScript file that delays execution using setTimeout, wrapped in a Promise so that it can be awaited.

// app.js
// Returns a Promise that resolves after `delay` milliseconds
const timeout = function (delay) {
  return new Promise((resolve) => {
    setTimeout(resolve, delay)
  })
}

async function timer () {
  console.log('timer started')
  // Pauses timer() until the Promise resolves, without blocking the event loop
  await timeout(100)
  console.log('timer finished')
}

timer()

Once you have this file, you could try running it with:

node app.js  

However, it won't work. The async/await support is still behind a flag. To run it, you have to use:

node --harmony-async-await app.js  
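To see that await suspends only the async function and not the whole process, here is a small variation of app.js (a sketch; the events array is just for illustration). With the Node.js 7 nightly it needs the same --harmony-async-await flag; from Node.js 7.6 on, async functions are enabled by default.

```javascript
'use strict'

const timeout = (delay) => new Promise((resolve) => setTimeout(resolve, delay))

const events = []

async function timer () {
  events.push('timer started')
  await timeout(100) // timer() is suspended here, the process is not
  events.push('timer finished')
}

// Start the async function without awaiting it
const done = timer()
// This runs while timer() is still waiting at its `await`
events.push('main code keeps running')

// Order: timer started, main code keeps running, timer finished
done.then(() => console.log(events))
```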

Building a web server with async/await

As of Koa v2, Koa supports async functions as middleware. Previously, this was only possible with transpilers, but that is no longer the case!

You can simply pass an async function as a Koa middleware:

// app.js
const Koa = require('koa')  
const app = new Koa()

app.use(async (ctx, next) => {  
  const start = new Date()
  await next()
  const ms = new Date() - start
  console.log(`${ctx.method} ${ctx.url} - ${ms}ms`)
})

app.use(ctx => {  
  ctx.body = 'Hello Koa'
})

app.listen(3000)  

Once you have a working server written using Koa, you can simply start it with:

node --harmony-async-await app.js  
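To see what await next() does in the logger middleware above, here is a simplified, framework-free sketch of Koa-style middleware composition. It is not Koa's actual code (Koa uses the koa-compose package, which also guards against calling next() more than once); compose and the ctx.log array are illustrative names.

```javascript
'use strict'

// Simplified Koa-style composition: each middleware receives ctx and a
// next() function that runs the rest of the chain and returns a Promise.
function compose (middleware) {
  return function run (ctx) {
    function dispatch (i) {
      const fn = middleware[i]
      // Past the last middleware: nothing left to run
      if (!fn) return Promise.resolve()
      try {
        // next() for middleware i simply dispatches middleware i + 1
        return Promise.resolve(fn(ctx, () => dispatch(i + 1)))
      } catch (err) {
        // Turn synchronous throws into rejections, like async functions do
        return Promise.reject(err)
      }
    }
    return dispatch(0)
  }
}

const handle = compose([
  async (ctx, next) => {
    ctx.log.push('logger: before')
    await next() // runs the downstream middleware first
    ctx.log.push('logger: after')
  },
  (ctx) => {
    ctx.body = 'Hello Koa'
    ctx.log.push('handler')
  }
])

const ctx = { log: [] }
handle(ctx).then(() => console.log(ctx.body, ctx.log))
```

The "after" part of the logger only runs once every downstream middleware has finished, which is why the Koa example can measure the full request time around await next().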

When to start using it?

Node.js 8, the next stable version, which ships the V8 version that enables async/await, is scheduled to be released in April 2017. Until then, you can experiment with it using the unstable Node.js 7 branch.

Node.js Examples - What Companies Use Node for in 2016

We were amazed to see how much everyone appreciated our previous article, which summarized how enterprises use Node.js, so we decided to do a follow-up on the subject and write more about well-known companies building software products with Node.

This article on Node.js examples shows how Groupon, Lowe’s Home Improvement and Skycatch have successfully deployed their enterprise applications with Node.js. The source of these case studies is the Node Foundation's Enterprise Conversations series. If you're interested in why we joined the Foundation and what its goals are, head over here.

Groupon rebuilt its entire web layer with Node.js

The first participant in the Node Foundation's Enterprise Conversations series is Adam Geitgey, who has been the Director of Software Engineering for five years at Groupon, one of the largest e-commerce companies.

When he arrived at the company, it was mainly a Ruby on Rails shop, and everything was running as a huge monolithic application. That worked well for a long time, but eventually it became too hard to maintain, and they seemed to outgrow it.

Besides that, Groupon made a number of acquisitions in recent years, so in addition to their Ruby on Rails stack, they ended up with a new Java stack in Europe and a PHP stack in South America.

Groupon felt the need for replacing their current technology stack, so they started to look for a more suitable software platform around 3-4 years ago.

The reasons for choosing Node

Groupon decided to adopt Node.js for the following reasons:

  • JavaScript is close to being a universal language, so it requires less effort to learn and work with, and it makes communication between developers easier.
  • The scaling of Node.js applications worked well in tests. Node not only allowed them to unify their development language, but also gave them performance improvements in some cases.
  • Node developers can reuse previously written code, which can save a lot of effort from time to time.
  • Node.js was the most uniform platform at Groupon. Even though they used Java for a lot of backend services, the frameworks and the ways Java was used varied widely. Node gave them a way to move a large chunk of their software onto one platform in one fell swoop.

As a result of the decision, the Groupon engineering team rebuilt their entire web layer with Node.js. During the rebuilding process, Adam’s task was to manage the team which developed the platform and the framework which was used by other product teams to build and ship Node apps in production.

The team also released several open-source libraries that they built along the way:

  • gofer, an API client library they used to talk to backend services.
  • node-cached, a caching library for Node.js.

Today Groupon is using Node on multiple platforms:

  • Around 300-400 backend services are running with Node.js, mixed with Java and Ruby.
  • They use Node as an API integration layer.
  • They use it for all of their client apps, including their website.

Currently, Groupon has 70 Node.js apps in production, which are used in 30 countries. Overall, Groupon uses Node.js heavily in the front-end, and here and there for several backend purposes.

The future of Node at Groupon

Regarding the future, they are fully committed to investing in Node for the web platform. All of their production services are on Node 4 right now, but they are already excited about Node 6 and are waiting for the LTS version to come out.

In the past, because Groupon was on Ruby, they used CoffeeScript a lot, and now they have a great chance to finally migrate away from CoffeeScript and standardize on plain JavaScript.

Another big project Groupon is working on is moving from a model where developers maintain their own servers to a model where the company provides them with clusters of servers to run their apps on, more like a Heroku model.

Node.js: the glue of Skycatch

Andre Deutmeyer is the next participant in the Node Foundation's Enterprise Conversations series. His role is to lead the web infrastructure and development team at Skycatch.

Skycatch is a data company helping to capture, manage and analyze commercial drone data. Skycatch sees construction or mining sites as a database that needs to be queried. Existing approaches, like writing raw SQL queries, are hard and time-consuming, while Skycatch’s solution makes it easy to extract actionable data from the sites.

Skycatch has small cross-functional teams with 20 developers, and as I already mentioned, Andre’s role is to lead the web, infrastructure and development team. He is involved in architecting and scaling out data processing, while his goal is to deliver the data that you send them reliably and quickly.

What helps them with that? Of course, Node.js, but where do they use it?

“We are using node everywhere you can think of - Node is our glue.”

They use it on their drones, and across their management and iOS apps. Almost their entire backend is running on Node. For all of their data processing, they have a lot of microservices that are constantly communicating with each other and Node is what keeps that going smoothly.

What are the benefits of using Node.js at Skycatch?

Node has a great impact on the development at Skycatch, as Andre says:

“You can’t really put a price on the ability to move fluently from the front-end development into a service architecture style and scaling things is easy because there is no hurdle moving between frontend and backend. It scales much more easily than if we had chosen a different language to run on the servers.”

They have a lot of people who work on the web, API, and data processing sides. Thus, developers can figure out during projects which part of the stack they prefer working on, and, again, there isn't a huge mental hurdle to move from one to the other, because the programming language is not a problem.

The future of Node.js at Skycatch

Recently they've been looking at AWS Lambda, now that it supports Node 4. Since then, they've been in a big hurry to rewrite a lot of their smaller services to make use of the AWS Lambda infrastructure. They are a small team, so they want to focus on the product, not on having to scale the infrastructure, and AWS Lambda is perfect for that.

Lowe’s Home Improvement thinks differently thanks to Node.js

The latest participant in the Node Foundation's Enterprise Conversations series was Rick Adam. He is the manager of the IT application portfolio of digital interfaces at Lowe’s Home Improvement.

His role at Lowe’s is the management of the applications and teams that drive the presentation tier of Lowe’s digital properties. Rick manages a team of 25 developers, including the software architecture team.

Lowe’s history and how they arrived at Node.js

Coming out of the recession era of 2007-2008, the company started to see that the home market continued to grow and that they needed to drive further investments into the digital space.

As new consumer technologies began to come out for smartphones and tablets, the company began to look at Lowe’s Digital not only as a valuable sales channel for the company, but also as a true sales driver.

They began to build the engineering team, which consisted of about 2-3 web developers at that time.

Killing the Monolith

They started to look for a new technology because their application was a big monolithic app, and it was a daunting process to release and introduce any change regardless of how small that might have been.

Since Lowe’s is in the retail business, their number one priority is to drive customers through a journey and to enable them to finish the checkout process. However, in those days, minor things like a text change on the product list page required the full application to be updated and the monolithic app to be packaged and deployed again, which crippled their ability to move fast.

Finally, the risk and the quality assurance behind doing that became so daunting that their business and IT people weren't comfortable keeping up with the pace that the business required.

Although they had looked at off-the-shelf software solutions or larger applications to drive their digital property, traditionally it had not been part of their process to even search for open-source technologies. However, they began to reconsider their application portfolio and to figure out how to introduce more open-source software.

Lowe’s digital team was on the frontline, trying to drive their technology forward. They were in the middle of a major re-architecture and redesign project for www.lowes.com and their mobile site, with the goal of bringing a new experience to the table.

During that project, they started to take a look at what's the right technology stack that their business and brand needs, which led them to start using Node.js about two years ago.

How Lowe’s profits from adopting Node.js

When they looked at Node.js, it made sense as they had a great team of web developers who were already well skilled in JavaScript. Hence they didn't have to go and find talent or a new skill set.

“We had a great team here, and the application made sense just from how it plays into our target status quo”!

Node is a perfect technology for their web tier for brokering API requests. Also, Lowe’s has seen a lot of growth both from the company itself and from the technology that they are introducing.

“It's been exciting to see the growth and the maturity of our development of acumen and where we are going to take the brand.”

One of the aspects that they liked about Node was the asynchronous model, providing the ability to call multiple services at once. When they all finish, they can then render the result with their microservices model.

“It delivers a one-page experience that calls five different little services and not have to do the traditional waterfall approach.”

Node has been doing great regarding performance, especially at scale. The applications use fewer resources in Node.js than they would traditionally use in Java to render a page. The reason is that small, fragmented applications that each serve one page do that page better than a monolithic app.

What has also been ideal for them is reusing their front-end developers' skills to work with JavaScript on the backend. That is especially useful because traditionally their teams were segregated: backend developers worked in Java, while frontend developers worked on all the CSS, JavaScript, and HTML.

By going with Node.js, the engineering team was able to take full ownership of the entire UI stack, from the backend through the view layer to the actual front-end. They were able to reuse their people who are well-versed in JavaScript and HTML and bring them over to Node.

Now they can put new features together quickly and even build prototypes for research and user testing, then take an idea to production and release it without putting the other parts of their application stack at risk. Rick even says:

“Node.js really opened some eyes to the potential here to think differently than we've ever been able to in the past six years.”

Node.js Examples: The Conclusion

As it has been pointed out, companies can benefit a lot from the adoption of Node.js both on the developer and the application level. The latter is especially considerable when it comes to performance and scalability.

If you’d like to learn more, I suggest checking out our Node Hero tutorial series and starting to deliver software products using Node!