JavaScript Clean Coding Best Practices - Node.js at Scale

Writing clean code is what you must know and do in order to call yourself a professional developer. There is no reasonable excuse for doing anything less than your best.

In this blog post, we will cover general clean coding principles for naming and using variables & functions, as well as some JavaScript specific clean coding best practices.

“Even bad code can function. But if the code isn’t clean, it can bring a development organization to its knees.” — Robert C. Martin (Uncle Bob)


Node.js at Scale is a collection of articles focusing on the needs of companies with bigger Node.js installations and advanced Node developers. Chapters:


First of all, what does clean coding mean?

Clean coding means that in the first place you write code for your later self and for your co-workers and not for the machine.

Your code must be easily understandable for humans.

"Write code for your later self and for your co-workers in the first place - not for the machine." via @RisingStack

Click To Tweet

You know you are working on clean code when each routine you read turns out to be pretty much what you expected.

JavaScript Clean Coding: the only valid measurement of code quality is WTFs/minute

JavaScript Clean Coding Best Practices

Now that we know what every developer should aim for, let’s go through the best practices!

How should I name my variables?

Use intention-revealing names, and don't worry about having long variable names instead of saving a few keystrokes.

If you follow this practice, your names become searchable, which helps a lot when you do refactors or you are just looking for something.

// DON'T
let d  
let elapsed  
const ages = arr.map((i) => i.age)

// DO
let daysSinceModification  
const agesOfUsers = users.map((user) => user.age)  

Also, make meaningful distinctions and don't add extra, unnecessary nouns to variable names, like their type (Hungarian notation).

// DON'T
let nameString  
let theUsers

// DO
let name  
let users  

Make your variable names easy to pronounce, because they take less effort for the human mind to process.

When you are doing code reviews with your fellow developers, these names are easier to reference.

// DON'T
let fName, lName  
let cntr

let full = false  
if (cart.size > 100) {  
  full = true
}

// DO
let firstName, lastName  
let counter

const MAX_CART_SIZE = 100  
// ...
const isFull = cart.size > MAX_CART_SIZE  

In short, don't cause extra mental mapping with your names.

How should I write my functions?

Your functions should do one thing only on one level of abstraction.

Functions should do one thing. They should do it well. They should do it only. — Robert C. Martin (Uncle Bob)

// DON'T
function getUserRouteHandler (req, res) {  
  const { userId } = req.params
  // inline SQL query
  knex('user')
    .where({ id: userId })
    .first()
    .then((user) => res.json(user))
}

// DO
// User model (eg. models/user.js)
const tableName = 'user'  
const User = {  
  getOne (userId) {
    return knex(tableName)
      .where({ id: userId })
      .first()
  }
}

// route handler (eg. server/routes/user/get.js)
function getUserRouteHandler (req, res) {  
  const { userId } = req.params
  User.getOne(userId)
    .then((user) => res.json(user))
}

After you have written your functions properly, you can test how well you did with CPU profiling, which helps you find bottlenecks.

Use long, descriptive names

A function name should be a verb or a verb phrase, and it needs to communicate its intent, as well as the order and intent of the arguments.

A long descriptive name is way better than a short, enigmatic name or a long descriptive comment.

// DON'T
/**
 * Invite a new user with its email address
 * @param {String} user email address
 */
function inv (user) { /* implementation */ }

// DO
function inviteUser (emailAddress) { /* implementation */ }  


Avoid long argument lists

Use a single object parameter and destructuring assignment instead. It also makes handling optional parameters much easier.

// DON'T
function getRegisteredUsers (fields, include, fromDate, toDate) { /* implementation */ }  
getRegisteredUsers(['firstName', 'lastName', 'email'], ['invitedUsers'], '2016-09-26', '2016-12-13')

// DO
function getRegisteredUsers ({ fields, include, fromDate, toDate }) { /* implementation */ }  
getRegisteredUsers({  
  fields: ['firstName', 'lastName', 'email'],
  include: ['invitedUsers'],
  fromDate: '2016-09-26',
  toDate: '2016-12-13'
})





Reduce side effects

Use pure functions without side effects, whenever you can. They are really easy to use and test.

// DON'T
function addItemToCart (cart, item, quantity = 1) {  
  const alreadyInCart = cart.get(item.id) || 0
  cart.set(item.id, alreadyInCart + quantity)
  return cart
}

// DO
// not modifying the original cart
function addItemToCart (cart, item, quantity = 1) {  
  const cartCopy = new Map(cart)
  const alreadyInCart = cartCopy.get(item.id) || 0
  cartCopy.set(item.id, alreadyInCart + quantity)
  return cartCopy
}

// or by inverting the method location
// you can expect that the original object will be mutated
// addItemToCart(cart, item, quantity) -> cart.addItem(item, quantity)
const cart = new Map()  
Object.assign(cart, {  
  addItem (item, quantity = 1) {
    const alreadyInCart = this.get(item.id) || 0
    this.set(item.id, alreadyInCart + quantity)
    return this
  }
})


Organize your functions in a file according to the stepdown rule

Higher-level functions should be on top and lower-level ones below, which makes the source code natural to read.

// DON'T
// "I need the full name for something..."
function getFullName (user) {  
  return `${user.firstName} ${user.lastName}`
}

function renderEmailTemplate (user) {  
  // "oh, here"
  const fullName = getFullName(user)
  return `Dear ${fullName}, ...`
}

// DO
function renderEmailTemplate (user) {  
  // "I need the full name of the user"
  const fullName = getFullName(user)
  return `Dear ${fullName}, ...`
}

// "I use this for the email template rendering"
function getFullName (user) {  
  return `${user.firstName} ${user.lastName}`
}


Query or modification

Functions should either do something (modify) or answer something (query), but not both.
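For example, a query/command split could look like this (the voucher object and its fields are made up purely for illustration):

// DON'T
// answers a question and modifies state at the same time
function isVoucherValid (voucher) {
  voucher.lastCheckedAt = Date.now()
  return voucher.expiresAt > Date.now()
}

// DO
// query: answers a question, no side effects
function isVoucherValid (voucher) {
  return voucher.expiresAt > Date.now()
}

// modification: changes state, answers nothing
function markVoucherChecked (voucher) {
  voucher.lastCheckedAt = Date.now()
}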


Everyone likes to write JavaScript differently, what to do?

As JavaScript is dynamic and loosely typed, it is especially prone to programmer errors.

Use project- or company-wide linter rules and formatting style.

The stricter the rules, the less effort will go into pointing out bad formatting in code reviews. It should cover things like consistent naming, indentation size, whitespace placement and even semicolons.

"The stricter the linter rules, the less effort needed to point out bad formatting in code reviews." by @RisingStack

Click To Tweet

The standard JS style is quite nice to start with, but in my opinion, it isn't strict enough. I agree with most of the rules in the Airbnb style.

How to write nice async code?

Use Promises whenever you can.

Promises have been natively available since Node 4. Instead of writing nested callbacks, you can have chainable Promise calls.

// AVOID
asyncFunc1((err, result1) => {  
  asyncFunc2(result1, (err, result2) => {
    asyncFunc3(result2, (err, result3) => {
      console.log(result3)
    })
  })
})

// PREFER
asyncFuncPromise1()  
  .then(asyncFuncPromise2)
  .then(asyncFuncPromise3)
  .then((result) => console.log(result))
  .catch((err) => console.error(err))

Most of the libraries out there have both callback and promise interfaces - prefer the latter. You can even convert callback APIs to promise-based ones by wrapping them with packages like es6-promisify.

// AVOID
const fs = require('fs')

function readJSON (filePath, callback) {  
  fs.readFile(filePath, (err, data) => {
    if (err) {
      return callback(err)
    }

    try {
      callback(null, JSON.parse(data))
    } catch (ex) {
      callback(ex)
    }
  })
}

readJSON('./package.json', (err, pkg) => { console.log(err, pkg) })

// PREFER
const fs = require('fs')  
const promisify = require('es6-promisify')

const readFile = promisify(fs.readFile)  
function readJSON (filePath) {  
  return readFile(filePath)
    .then((data) => JSON.parse(data))
}

readJSON('./package.json')  
  .then((pkg) => console.log(pkg))
  .catch((err) => console.error(err))

The next step would be to use async/await (≥ Node 7) or generators with co (≥ Node 4) to achieve synchronous-looking control flow for your asynchronous code.

const request = require('request-promise-native')

function getExtractFromWikipedia (title) {  
  return request({
    uri: 'https://en.wikipedia.org/w/api.php',
    qs: {
      titles: title,
      action: 'query',
      format: 'json',
      prop: 'extracts',
      exintro: true,
      explaintext: true
    },
    method: 'GET',
    json: true
  })
    .then((body) => Object.keys(body.query.pages).map((key) => body.query.pages[key].extract))
    .then((extracts) => extracts[0])
    .catch((err) => {
      console.error('getExtractFromWikipedia() error:', err)
      throw err
    })
} 

// PREFER
async function getExtractFromWikipedia (title) {  
  let body
  try {
    body = await request({ /* same parameters as above */ })
  } catch (err) {
    console.error('getExtractFromWikipedia() error:', err)
    throw err
  }

  const extracts = Object.keys(body.query.pages).map((key) => body.query.pages[key].extract)
  return extracts[0]
}

// or
const co = require('co')

const getExtractFromWikipedia = co.wrap(function * (title) {  
  let body
  try {
    body = yield request({ /* same parameters as above */ })
  } catch (err) {
    console.error('getExtractFromWikipedia() error:', err)
    throw err
  }

  const extracts = Object.keys(body.query.pages).map((key) => body.query.pages[key].extract)
  return extracts[0]
})

getExtractFromWikipedia('Robert Cecil Martin')  
  .then((robert) => console.log(robert))


How should I write performant code?

In the first place, you should write clean code, then use profiling to find performance bottlenecks.

Never try to write performant, clever code first; instead, optimize the code when you need to, and rely on real-world impact instead of micro-benchmarks.

"Write clean code first and optimize it when you need to. Refer to true impact instead of micro-benchmarks!"

Click To Tweet

Still, there are some straightforward wins, like eagerly initializing what you can (e.g. joi schemas used in route handlers, which would be recreated on every request and add serious overhead) and using asynchronous instead of blocking code, as in the sketch below.
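As a rough sketch of the eager initialization idea (assuming an express app and a recent version of the joi library - older joi releases use Joi.validate(data, schema) instead - with made-up route paths):

const express = require('express')
const Joi = require('joi')
const app = express()

// DON'T: the schema is rebuilt on every single request
app.post('/signup', express.json(), (req, res) => {
  const schema = Joi.object({ email: Joi.string().email().required() })
  const { error } = schema.validate(req.body)
  res.sendStatus(error ? 400 : 200)
})

// DO: build the schema once at startup and reuse it in every request
const signupSchema = Joi.object({ email: Joi.string().email().required() })

app.post('/v2/signup', express.json(), (req, res) => {
  const { error } = signupSchema.validate(req.body)
  res.sendStatus(error ? 400 : 200)
})

app.listen(3000)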

Next up in Node.js at Scale

In the next episode of this series, we’ll discuss advanced Node.js async best practices and avoiding the callback hell!

If you have any questions regarding clean coding, don’t hesitate and let me know in the comments!


Writing a JavaScript Framework - Data Binding with ES6 Proxies

This is the fifth chapter of the Writing a JavaScript framework series. In this chapter, I am going to explain how to create a simple, yet powerful data binding library with the new ES6 Proxies.

The series is about an open-source client-side framework, called NX. During the series, I explain the main difficulties I had to overcome while writing the framework. If you are interested in NX please visit the home page.

The series includes the following chapters:

  1. Project structuring
  2. Execution timing
  3. Sandboxed code evaluation
  4. Data binding introduction
  5. Data Binding with ES6 Proxies (current chapter)
  6. Custom elements
  7. Client side routing

Prerequisites

ES6 made JavaScript a lot more elegant, but the bulk of new features are just syntactic sugar. Proxies are one of the few non-polyfillable additions. If you are not familiar with them, please take a quick look at the MDN Proxy docs before going on.

"#ES6 made #JavaScript a lot more elegant. Proxies are one of the few non polyfillable additions." via @RisingStack

Click To Tweet

Having a basic knowledge of the ES6 Reflection API and Set, Map and WeakMap objects will also be helpful.

The nx-observe library

nx-observe is a data binding solution in under 140 lines of code. It exposes the observable(obj) and observe(fn) functions, which are used to create observable objects and observer functions. An observer function automatically executes when an observable property used by it changes. The example below demonstrates this.

// this is an observable object
const person = observable({name: 'John', age: 20})

function print () {  
  console.log(`${person.name}, ${person.age}`)
}

// this creates an observer function
// outputs 'John, 20' to the console
observe(print)

// outputs 'Dave, 20' to the console
setTimeout(() => person.name = 'Dave', 100)

// outputs 'Dave, 22' to the console
setTimeout(() => person.age = 22, 200)  

The print function passed to observe() reruns every time person.name or person.age changes. print is called an observer function.

If you are interested in a few more examples, please check the GitHub readme or the NX home page for a more lifelike scenario.

Implementing a simple observable

In this section, I am going to explain what happens under the hood of nx-observe. First, I will show you how changes to an observable's properties are detected and paired with observers. Then I will explain a way to run the observer functions triggered by these changes.

Registering changes

Changes are registered by wrapping observable objects into ES6 Proxies. These proxies seamlessly intercept get and set operations with the help of the Reflection API.

The variables currentObserver and queueObserver() are used in the code below, but will only be explained in the next section. For now, it is enough to know that currentObserver always points to the currently executing observer function, and queueObserver() is a function that queues an observer to be executed soon.

/* maps observable properties to a Set of
observer functions, which use the property */  
const observers = new WeakMap()

/* points to the currently running 
observer function, can be undefined */  
let currentObserver

/* transforms an object into an observable 
by wrapping it into a proxy, it also adds a blank  
Map for property-observer pairs to be saved later */  
function observable (obj) {  
  observers.set(obj, new Map())
  return new Proxy(obj, {get, set})
}

/* this trap intercepts get operations,
it does nothing if no observer is executing  
at the moment */  
function get (target, key, receiver) {  
  const result = Reflect.get(target, key, receiver)
  if (currentObserver) {
    registerObserver(target, key, currentObserver)
  }
  return result
}

/* if an observer function is running currently,
this function pairs the observer function  
with the currently fetched observable property  
and saves them into the observers Map */  
function registerObserver (target, key, observer) {  
  let observersForKey = observers.get(target).get(key)
  if (!observersForKey) {
    observersForKey = new Set()
    observers.get(target).set(key, observersForKey)
  }
  observersForKey.add(observer)
}

/* this trap intercepts set operations,
it queues every observer associated with the  
currently set property to be executed later */  
function set (target, key, value, receiver) {  
  const observersForKey = observers.get(target).get(key)
  if (observersForKey) {
    observersForKey.forEach(queueObserver)
  }
  return Reflect.set(target, key, value, receiver)
}

The get trap does nothing if currentObserver is not set. Otherwise, it pairs the fetched observable property and the currently running observer and saves them into the observers WeakMap. Observers are saved into a Set per observable property. This ensures that there are no duplicates.

The set trap retrieves all the observers paired with the modified observable property and queues them for later execution.

You can find a figure and a step-by-step description explaining the nx-observe example code below.

JavaScript data binding with es6 proxy - observable code sample

  1. The person observable object is created.
  2. currentObserver is set to print.
  3. print starts executing.
  4. person.name is retrieved inside print.
  5. The proxy get trap on person is invoked.
  6. The observer Set belonging to the (person, name) pair is retrieved by observers.get(person).get('name').
  7. currentObserver (print) is added to the observer Set.
  8. Step 4-7 are executed again with person.age.
  9. ${person.name}, ${person.age} is printed to the console.
  10. print finishes executing.
  11. currentObserver is set to undefined.
  12. Some other code starts running.
  13. person.age is set to a new value (22).
  14. The proxy set trap on person is invoked.
  15. The observer Set belonging to the (person, age) pair is retrieved by observers.get(person).get('age').
  16. Observers in the observer Set (including print) are queued for execution.
  17. print executes again.

Running the observers

Queued observers run asynchronously in one batch, which results in superior performance. During registration, the observers are synchronously added to the queuedObservers Set. A Set cannot contain duplicates, so enqueuing the same observer multiple times won't result in multiple executions. If the Set was empty before, a new task is scheduled to iterate and execute all the queued observers after some time.

/* contains the triggered observer functions,
which should run soon */  
const queuedObservers = new Set()

/* points to the currently running observer,
it can be undefined */  
let currentObserver

/* the exposed observe function */
function observe (fn) {  
  queueObserver(fn)
}

/* adds the observer to the queue and 
ensures that the queue will be executed soon */  
function queueObserver (observer) {  
  if (queuedObservers.size === 0) {
    Promise.resolve().then(runObservers)
  }
  queuedObservers.add(observer)
}

/* runs the queued observers,
currentObserver is set to undefined in the end */  
function runObservers () {  
  try {
    queuedObservers.forEach(runObserver)
  } finally {
    currentObserver = undefined
    queuedObservers.clear()
  }
}

/* sets the global currentObserver to observer, 
then executes it */  
function runObserver (observer) {  
  currentObserver = observer
  observer()
}

The code above ensures that whenever an observer is executing, the global currentObserver variable points to it. Setting currentObserver 'switches' the get traps on, to listen and pair currentObserver with all the observable properties it uses while executing.

Building a dynamic observable tree

So far our model works nicely with single level data structures but requires us to wrap every new object-valued property in an observable by hand. For example, the code below would not work as expected.

const person = observable({data: {name: 'John'}})

function print () {  
  console.log(person.data.name)
}

// outputs 'John' to the console
observe(print)

// does nothing
setTimeout(() => person.data.name = 'Dave', 100)  

In order to make this code work, we would have to replace observable({data: {name: 'John'}}) with observable({data: observable({name: 'John'})}). Fortunately we can eliminate this inconvenience by modifying the get trap a little bit.

function get (target, key, receiver) {  
  const result = Reflect.get(target, key, receiver)
  if (currentObserver) {
    registerObserver(target, key, currentObserver)
    if (typeof result === 'object' && result !== null) {
      const observableResult = observable(result)
      Reflect.set(target, key, observableResult, receiver)
      return observableResult
    }
  }
  return result
}

The get trap above wraps the returned value into an observable proxy before returning it - in case it is an object. This is perfect from a performance point of view too, since observables are only created when they are really needed by an observer.

Comparison with an ES5 technique

A very similar data binding technique can be implemented with ES5 property accessors (getter/setter) instead of ES6 Proxies. Many popular libraries use this technique, for example MobX and Vue. Using proxies over accessors has two main advantages and a major disadvantage.

Expando properties

Expando properties are dynamically added properties in JavaScript. The ES5 technique does not support expando properties since accessors have to be predefined per property to be able to intercept operations. This is a technical reason why central stores with a predefined set of keys are trending nowadays.

On the other hand, the Proxy technique does support expando properties, since proxies are defined per object and they intercept operations for every property of the object.

A typical example where expando properties are crucial is working with arrays. JavaScript arrays are pretty much useless without the ability to add or remove items from them. ES5 data binding techniques usually hack around this problem by providing custom or overwritten Array methods, roughly as sketched below.
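A rough sketch of that ES5-era workaround could look like this (illustrative only, not how any particular library implements it):

// shadow the mutating array methods with versions that notify a callback,
// since no predefined accessor can fire for newly added indexes
function observableArray (array, onChange) {
  const mutators = ['push', 'pop', 'shift', 'unshift', 'splice']
  mutators.forEach((method) => {
    const original = Array.prototype[method]
    array[method] = function (...args) {
      const result = original.apply(this, args)
      onChange(method, args) // report the change manually
      return result
    }
  })
  return array
}

const list = observableArray([], (method) => console.log(`array ${method} happened`))
list.push(1) // logs 'array push happened'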

Getters and setters

Libraries using the ES5 method provide 'computed' bound properties through some special syntax. These properties have their native equivalents, namely getters and setters. However, the ES5 method uses getters/setters internally to set up the data binding logic, so it cannot work with user-defined property accessors.

Proxies intercept every kind of property access and mutation, including getters and setters, so this does not pose a problem for the ES6 method.
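For illustration, using the observable() and observe() functions from the example above (the person object and its properties are made up), a native getter works out of the box, because the get trap also intercepts the property reads made inside it:

const person = observable({
  firstName: 'John',
  lastName: 'Doe',
  // a plain ES5 getter: the accessor technique can't wrap this again,
  // but the Proxy still intercepts this.firstName and this.lastName below
  get fullName () {
    return `${this.firstName} ${this.lastName}`
  }
})

// outputs 'John Doe' to the console
observe(() => console.log(person.fullName))

// outputs 'Jane Doe' to the console
setTimeout(() => person.firstName = 'Jane', 100)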

The disadvantage

The big disadvantage of using Proxies is browser support. They are only supported in the most recent browsers, and the best parts of the Proxy API are non-polyfillable.

A few notes

The data binding method introduced here is a working one, but I made some simplifications to make it digestible. You can find a few notes below about the topics I left out because of this simplification.

Cleaning up

Memory leaks are nasty. The code introduced here avoids them in a sense, as it uses a WeakMap to save the observers. This means that the observers associated with an observable are garbage collected together with the observable.

However, a possible use case could be a central, durable store with a frequently shifting DOM around it. In this case, DOM nodes should release all of their registered observers before they are garbage collected. This functionality is left out of the example - one possible approach is sketched below - but you can check how the unobserve() function is implemented in the nx-observe code.
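One possible approach - a sketch only, not nx-observe's actual implementation - is to keep a reverse mapping from each observer to the Sets it was registered in, so that unobserve() can remove it everywhere at once:

/* maps observer functions to the Sets of the
observers WeakMap that they were added to */
const registrations = new WeakMap()

// an extended version of registerObserver() from above
function registerObserver (target, key, observer) {
  let observersForKey = observers.get(target).get(key)
  if (!observersForKey) {
    observersForKey = new Set()
    observers.get(target).set(key, observersForKey)
  }
  observersForKey.add(observer)

  // remember where the observer was registered
  let sets = registrations.get(observer)
  if (!sets) {
    sets = new Set()
    registrations.set(observer, sets)
  }
  sets.add(observersForKey)
}

function unobserve (observer) {
  const sets = registrations.get(observer)
  if (sets) {
    sets.forEach((set) => set.delete(observer))
    registrations.delete(observer)
  }
  // also make sure it doesn't run if it was already queued
  queuedObservers.delete(observer)
}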

Double wrapping with Proxies

Proxies are transparent, meaning there is no native way of determining if something is a Proxy or a plain object. Moreover, they can be nested infinitely, so without the necessary precautions, we might end up wrapping an observable again and again.

There are many clever ways to make a Proxy distinguishable from normal objects, but I left it out of the example. One way, sketched below, would be to add every created Proxy to a WeakSet named proxies and check for inclusion later. If you are interested in how nx-observe implements the isObservable() method, please check the code.
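A minimal sketch of that idea (nx-observe's real code may differ):

/* every proxy created by observable() is remembered,
so it is never wrapped a second time */
const proxies = new WeakSet()

function observable (obj) {
  // already an observable proxy: return it as is
  if (proxies.has(obj)) {
    return obj
  }
  observers.set(obj, new Map())
  const proxy = new Proxy(obj, {get, set})
  proxies.add(proxy)
  return proxy
}

function isObservable (obj) {
  return proxies.has(obj)
}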

Inheritance

nx-observe also works with prototypal inheritance. The example below demonstrates exactly what this means.

const parent = observable({greeting: 'Hello'})  
const child = observable({subject: 'World!'})  
Object.setPrototypeOf(child, parent)

function print () {  
  console.log(`${child.greeting} ${child.subject}`)
}

// outputs 'Hello World!' to the console
observe(print)

// outputs 'Hello There!' to the console
setTimeout(() => child.subject = 'There!')

// outputs 'Hey There!' to the console
setTimeout(() => parent.greeting = 'Hey', 100)

// outputs 'Look There!' to the console
setTimeout(() => child.greeting = 'Look', 200)  

The get operation is invoked for every member of the prototype chain until the property is found, so the observers are registered everywhere they could be needed.

There are some edge cases caused by the little-known fact that set operations also walk the prototype chain (quite sneakily), but these won't be covered here.

Internal properties

Proxies also intercept 'internal property access'. Your code probably uses many internal properties that you usually don't even think about; the well-known Symbols are examples of keys for such properties. Properties like these are usually correctly intercepted by Proxies, but there are a few buggy cases. A common precaution is sketched below.
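As an illustrative precaution - not part of the original example - the get trap could simply skip observer registration and wrapping for symbol keys, which are usually internal:

function get (target, key, receiver) {
  const result = Reflect.get(target, key, receiver)
  // well-known Symbols like Symbol.iterator or Symbol.toPrimitive
  // are internal machinery, don't observe or wrap them
  if (typeof key === 'symbol') {
    return result
  }
  if (currentObserver) {
    registerObserver(target, key, currentObserver)
    if (typeof result === 'object' && result !== null) {
      const observableResult = observable(result)
      Reflect.set(target, key, observableResult, receiver)
      return observableResult
    }
  }
  return result
}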

Asynchronous nature

The observers could be run synchronously when the set operation is intercepted. This would provide several advantages like less complexity, predictable timing and nicer stack traces, but it would also cause a big mess for certain scenarios.

Imagine pushing 1000 items to an observable array in a single loop. The array length would change 1000 times and the observers associated with it would also execute 1000 times in quick succession. This means running the exact same set of functions 1000 times, which is rarely a useful thing.

Another problematic scenario would be two-way observations. The below code would start an infinite cycle if observers ran synchronously.

const observable1 = observable({prop: 'value1'})  
const observable2 = observable({prop: 'value2'})

observe(() => observable1.prop = observable2.prop)  
observe(() => observable2.prop = observable1.prop)  

For these reasons nx-observe queues observers without duplicates and executes them in one batch as a microtask to avoid FOUC. If you are unfamiliar with the concept of a microtask, please check my previous article about timing in the browser.

Data binding with ES6 Proxies - the Conclusion

If you are interested in the NX framework, please visit the home page. Adventurous readers can find the NX source code in this Github repository and the nx-observe source code in this Github repository.

I hope you found this a good read, see you next time when I’ll discuss custom HTML Elements!

If you have any thoughts on the topic, please share them in the comments.


Understanding the Node.js Event Loop - Node.js at Scale

This article helps you to understand how the Node.js event loop works, and how you can leverage it to build fast applications. We’ll also discuss the most common problems you might encounter, and the solutions for them.

With Node.js at Scale we are creating a collection of articles focusing on the needs of companies with bigger Node.js installations, and developers who already learned the basics of Node.

Upcoming chapters for the Node.js at Scale series:


The problem

Most of the backends behind websites don’t need to do complicated computations. Our programs spend most of their time waiting for the disk to read and write, or waiting for the wire to transmit our message and send back the answer.

IO operations can be orders of magnitude slower than data processing. Take this for example: a high-end SSD can have a read speed of 200-730 MB/s. Reading just one kilobyte of data would take 1.4 microseconds, but during this time a CPU clocked at 2 GHz could have performed 28,000 instruction-processing cycles.

For network communication it can be even worse; just try and ping google.com:

$ ping google.com
64 bytes from 172.217.16.174: icmp_seq=0 ttl=52 time=33.017 ms  
64 bytes from 172.217.16.174: icmp_seq=1 ttl=52 time=83.376 ms  
64 bytes from 172.217.16.174: icmp_seq=2 ttl=52 time=26.552 ms  
64 bytes from 172.217.16.174: icmp_seq=3 ttl=52 time=40.153 ms  
64 bytes from 172.217.16.174: icmp_seq=4 ttl=52 time=37.291 ms  
64 bytes from 172.217.16.174: icmp_seq=5 ttl=52 time=58.692 ms  
64 bytes from 172.217.16.174: icmp_seq=6 ttl=52 time=45.245 ms  
64 bytes from 172.217.16.174: icmp_seq=7 ttl=52 time=27.846 ms  

The average latency is about 44 milliseconds. Just while waiting for a packet to make a round-trip on the wire, the previously mentioned processor can perform 88 million cycles.

The solution

Most operating systems provide some kind of asynchronous IO interface, which allows you to start processing data that does not require the result of the communication, while the communication is still going on.

This can be achieved in several ways. Nowadays it is mostly done by leveraging the possibilities of multithreading at the cost of extra software complexity. For example, reading a file in Java or Python is a blocking operation. Your program cannot do anything else while it is waiting for the network / disk communication to finish. All you can do - at least in Java - is to fire up a different thread, then notify your main thread when the operation has finished.

It is tedious, complicated, but gets the job done. But what about Node? Well, we are surely facing some problems as Node.js - or more like V8 - is single-threaded. Our code can only run in one thread.

EDIT: This is not entirely true. Both Java and Python have async interfaces, but using them is definitely more difficult than in Node.js. Thanks to Shahar and Dirk Harrington for pointing this out.

You might have heard that in a browser, setting setTimeout(someFunction, 0) can sometimes fix things magically. But why does setting a timeout to 0, deferring execution by 0 milliseconds fix anything? Isn’t it the same as simply calling someFunction immediately? Not really.

First of all, let's take a look at the call stack, or simply, the “stack”. I am going to make things simple, as we only need to understand the very basics of the call stack. In case you are familiar with how it works, feel free to jump to the next section.

Stack

Whenever you call a function, its return address, parameters, and local variables will be pushed to the stack. If you call another function from the currently running function, its contents will be pushed on top in the same manner as the previous one - with its return address.

For the sake of simplicity I will say that 'a function is pushed' to the top of the stack from now on, even though it is not exactly correct.

Let's take a look!

function main () {
  const hypotenuse = getLengthOfHypotenuse(3, 4)
  console.log(hypotenuse)
}

function getLengthOfHypotenuse (a, b) {
  const squareA = square(a)
  const squareB = square(b)
  const sumOfSquares = squareA + squareB
  return Math.sqrt(sumOfSquares)
}

function square (number) {
  return number * number
}

main()

main is called first:

The main function

then main calls getLengthOfHypotenuse with 3 and 4 as arguments

The getLengthOfHypotenuse function

afterwards square is called with the value of a

The square(a) function

when square returns, it is popped from the stack, and its return value is assigned to squareA. squareA is added to the stack frame of getLengthOfHypotenuse

Variable a

same goes for the next call to square

The square(b) function

Variable b

in the next line the expression squareA + squareB is evaluated

sumOfSquares

then Math.sqrt is called with sumOfSquares

Math.sqrt

now all that is left for getLengthOfHypotenuse is to return the final value of its calculation

The return function

the returned value gets assigned to hypotenuse in main

hypotenuse

the value of hypotenuse is logged to console

The console log

finally, main returns without any value and gets popped from the stack, leaving it empty

Finally

SIDE NOTE: You saw that local variables are popped from the stack when the function's execution finishes. That happens only when you work with simple values such as numbers, strings and booleans. Values of objects, arrays and such are stored in the heap, and your variable is merely a pointer to them. If you pass this variable on, you will only pass the said pointer, making these values mutable in different stack frames. When the function is popped from the stack, only the pointer to the object gets popped, leaving the actual value in the heap. The garbage collector is the guy who takes care of freeing up space once the objects have outlived their usefulness.

Enter Node.js Event Loop

The Node.js Event Loop - cat version

No, not this loop. :)

So what happens when we call something like setTimeout, http.get, process.nextTick, or fs.readFile? None of these things can be found in V8's code, but they are available in the Chrome WebAPI and the C++ API in the case of Node.js. To understand this, we will have to understand the order of execution a little bit better.

Let's take a look at a more common Node.js application - a server listening on localhost:3000/. Upon getting a request, the server will call wttr.in/<city> to get the weather, print a couple of messages to the console, and forward the response to the caller after receiving it.

'use strict'  
const express = require('express')  
const superagent = require('superagent')  
const app = express()

app.get('/', sendWeatherOfRandomCity)

function sendWeatherOfRandomCity (request, response) {  
  getWeatherOfRandomCity(request, response)
  sayHi()
}

const CITIES = [  
  'london',
  'newyork',
  'paris',
  'budapest',
  'warsaw',
  'rome',
  'madrid',
  'moscow',
  'beijing',
  'capetown',
]

function getWeatherOfRandomCity (request, response) {  
  const city = CITIES[Math.floor(Math.random() * CITIES.length)]
  superagent.get(`wttr.in/${city}`)
    .end((err, res) => {
      if (err) {
        console.log('O snap')
        return response.status(500).send('There was an error getting the weather, try looking out the window')
      }
      const responseText = res.text
      response.send(responseText)
      console.log('Got the weather')
    })

  console.log('Fetching the weather, please be patient')
}

function sayHi () {  
  console.log('Hi')
}

app.listen(3000)  

What will be printed out aside from getting the weather when a request is sent to localhost:3000?

If you have some experience with Node, you shouldn't be surprised that even though console.log('Fetching the weather, please be patient') is called after console.log('Got the weather') in the code, the former will print first resulting in:

Fetching the weather, please be patient  
Hi  
Got the weather  

What happened? Even though V8 is single-threaded, the underlying C++ API of Node isn't. It means that whenever we call something that is a non-blocking operation, Node will call some code that runs concurrently with our JavaScript code under the hood. Once this hidden thread receives the value it is waiting for, or throws an error, the provided callback will be called with the necessary parameters.

SIDE NOTE: The ‘some code’ we mentioned is actually part of libuv. libuv is the open source library that handles the thread pool, does the signaling and all the other magic that is needed to make the asynchronous tasks work. It was originally developed for Node.js, but a lot of other projects use it by now.





To peek under the hood, we need to introduce two new concepts: the event loop and the task queue.

Task queue

JavaScript is a single-threaded, event-driven language. This means that we can attach listeners to events, and when said event fires, the listener executes the callback we provided.

Whenever you call setTimeout, http.get or fs.readFile, Node.js sends these operations to a different thread allowing V8 to keep executing our code. Node also calls the callback when the counter has run down or the IO / http operation has finished.

These callbacks can enqueue other tasks and those functions can enqueue others and so on. This way you can read a file while processing a request in your server, and then make an http call based on the read contents without blocking other requests from being handled.

"#nodejs sends IO operations to different threads so #v8 can keep executing our code" via @RisingStack #javascript

Click To Tweet

However, we only have one main thread and one call-stack, so in case there is another request being served when the said file is read, its callback will need to wait for the stack to become empty. The limbo where callbacks are waiting for their turn to be executed is called the task queue (or event queue, or message queue). Callbacks are being called in an infinite loop whenever the main thread has finished its previous task, hence the name 'event loop'.

In our previous example it would look something like this:

  1. express registers a handler for the 'request' event that will be called when request arrives to '/'
  2. skips the functions and starts listening on port 3000
  3. the stack is empty, waiting for 'request' event to fire
  4. upon incoming request, the long awaited event fires, express calls the provided handler sendWeatherOfRandomCity
  5. sendWeatherOfRandomCity is pushed to the stack
  6. getWeatherOfRandomCity is called and pushed to the stack
  7. Math.floor and Math.random are called, pushed to the stack and popped; a random city from CITIES is assigned to city
  8. superagent.get is called with 'wttr.in/${city}', the handler is set for the end event.
  9. the http request to http://wttr.in/${city} is sent to a background thread, and the execution continues
  10. 'Fetching the weather, please be patient' is logged to the console, getWeatherOfRandomCity returns
  11. sayHi is called, 'Hi' is printed to the console
  12. sendWeatherOfRandomCity returns, gets popped from the stack leaving it empty
  13. waiting for http://wttr.in/${city} to send its response
  14. once the response has arrived, the end event is fired.
  15. the anonymous handler we passed to .end() is called, gets pushed to the stack with all variables in its closure, meaning it can see and modify the values of express, superagent, app, CITIES, request, response, city and all the functions we have defined
  16. response.send() gets called with either a 200 or a 500 status code, but again it is sent to a background thread, so the response stream does not block our execution; the anonymous handler is popped from the stack

So now we can understand why the previously mentioned setTimeout hack works. Even though we set the timer to zero, it defers the execution until the current stack and the task queue are empty, allowing the browser to redraw the UI, or Node to serve other requests. A minimal illustration follows below.
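A minimal illustration of that deferral:

console.log('first')

// deferred until the stack is empty, even with a 0 ms timeout
setTimeout(() => console.log('third'), 0)

console.log('second')

// output:
// first
// second
// third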

Microtasks and Macrotasks

If this wasn't enough, we actually have more than one task queue: one for microtasks and another for macrotasks.

examples of microtasks:

  • process.nextTick
  • promises
  • Object.observe

examples of macrotasks:

  • setTimeout
  • setInterval
  • setImmediate
  • I/O

Let's take a look at the following code:

console.log('script start')

const interval = setInterval(() => {  
  console.log('setInterval')
}, 0)

setTimeout(() => {  
  console.log('setTimeout 1')
  Promise.resolve().then(() => {
    console.log('promise 3')
  }).then(() => {
    console.log('promise 4')
  }).then(() => {
    setTimeout(() => {
      console.log('setTimeout 2')
      Promise.resolve().then(() => {
        console.log('promise 5')
      }).then(() => {
        console.log('promise 6')
      }).then(() => {
        clearInterval(interval)
      })
    }, 0)
  })
}, 0)

Promise.resolve().then(() => {  
  console.log('promise 1')
}).then(() => {
  console.log('promise 2')
})

this will log to the console:

script start  
promise 1  
promise 2  
setInterval  
setTimeout 1  
promise 3  
promise 4  
setInterval  
setTimeout 2  
setInterval  
promise 5  
promise 6  

According to the WHATWG specification, exactly one (macro)task should get processed from the macrotask queue in one cycle of the event loop. After said macrotask has finished, all of the available microtasks will be processed within the same cycle. While these microtasks are being processed, they can queue more microtasks, which will all be run one by one, until the microtask queue is exhausted.

This diagram tries to make the picture a bit clearer:

The Node.js Event Loop

In our case:

Cycle 1:

  1. `setInterval` is scheduled as task
  2. `setTimeout 1` is scheduled as task
  3. in `Promise.resolve 1` both `then`s are scheduled as microtasks
  4. the stack is empty, microtasks are run

Task queue: setInterval, setTimeout 1

Cycle 2:

  1. the microtask queue is empty, `setInterval`'s handler can be run; another `setInterval` is scheduled as a task, right behind `setTimeout 1`

Task queue: setTimeout 1, setInterval

Cycle 3:

  1. the microtask queue is empty, `setTimeout 1`'s handler can be run; `promise 3` and `promise 4` are scheduled as microtasks
  2. the handlers of `promise 3` and `promise 4` are run, and `setTimeout 2` is scheduled as a task

Task queue: setInterval, setTimeout 2

Cycle 4:

  1. the microtask queue is empty, `setInterval`'s handler can be run; another `setInterval` is scheduled as a task, right behind `setTimeout 2`

Task queue: setTimeout 2, setInterval

  1. `setTimeout 2`'s handler is run; `promise 5` and `promise 6` are scheduled as microtasks

Now handlers of promise 5 and promise 6 should be run clearing our interval, but for some strange reason setInterval is run again. However, if you run this code in Chrome, you will get the expected behavior.

We can fix this in Node too with process.nextTick and some mind-boggling callback hell.

console.log('script start')

const interval = setInterval(() => {  
  console.log('setInterval')
}, 0)

setTimeout(() => {  
  console.log('setTimeout 1')
  process.nextTick(() => {
    console.log('nextTick 3')
    process.nextTick(() => {
      console.log('nextTick 4')
      setTimeout(() => {
        console.log('setTimeout 2')
        process.nextTick(() => {
          console.log('nextTick 5')
          process.nextTick(() => {
            console.log('nextTick 6')
            clearInterval(interval)
          })
        })
      }, 0)
    })
  })
})

process.nextTick(() => {  
  console.log('nextTick 1')
  process.nextTick(() => {
    console.log('nextTick 2')
  })
})

This is the exact same logic as our beloved promises use, only a little bit more hideous. At least it gets the job done the way we expected.


Tame the async beast!

As we saw, we need to manage and pay attention to both task queues, and to the event loop when we write an app in Node.js - in case we wish to leverage all its power, and if we want to keep our long running tasks from blocking the main thread.

The event loop might be a slippery concept to grasp at first, but once you get the hang of it, you won't be able to imagine that there is life without it. The continuation passing style that can lead to a callback hell might look ugly, but we have Promises, and soon we will have async-await in our hands... and while we are (a)waiting, you can simulate async-await using co and/or koa.

One last parting advice:

Knowing how Node.js and V8 handle long running executions, you can start using this knowledge for your own good. You might have heard before that you should send your long running loops to the task queue. You can do it by hand, roughly as sketched below, or make use of async.js.
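A hand-rolled sketch of the idea could look like this (the function and batch size are made up for illustration):

// process a large array in batches, yielding back to the event loop
// between batches so other callbacks get a chance to run
function processLargeArray (items, handleItem, done) {
  const BATCH_SIZE = 1000
  let index = 0

  function processBatch () {
    const end = Math.min(index + BATCH_SIZE, items.length)
    for (; index < end; index++) {
      handleItem(items[index])
    }
    if (index < items.length) {
      setImmediate(processBatch) // schedule the next batch as a new task
    } else {
      done()
    }
  }

  processBatch()
}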

Happy coding!

If you have any questions or thoughts, share them in the comments, I’ll be there! The next part of the Node.js at Scale series discusses Garbage Collection in Node.js - I recommend checking it out!

Writing a JavaScript Framework - Introduction to Data Binding, beyond Dirty Checking

This is the fourth chapter of the Writing a JavaScript framework series. In this chapter, I am going to explain the dirty checking and the accessor data binding techniques and point out their strengths and weaknesses.

The series is about an open-source client-side framework, called NX. During the series, I explain the main difficulties I had to overcome while writing the framework. If you are interested in NX please visit the home page.

The series includes the following chapters:

  1. Project structuring
  2. Execution timing
  3. Sandboxed code evaluation
  4. Data binding introduction (current chapter)
  5. Data Binding with ES6 Proxies
  6. Custom elements
  7. Client side routing

An introduction to data binding

Data binding is a general technique that binds data sources from the provider and consumer together and synchronizes them.

This is a general definition, which outlines the common building blocks of data binding techniques.

  • A syntax to define the provider and the consumer.
  • A syntax to define which changes should trigger synchronization.
  • A way to listen to these changes on the provider.
  • A synchronizing function that runs when these changes happen. I will call this function the handler() from now on.

The above steps are implemented in different ways by the different data binding techniques. The upcoming sections will be about two such techniques, namely dirty checking and the accessor method. Both have their strengths and weaknesses, which I will briefly discuss after introducing them.

Dirty checking

Dirty checking is probably the most well-known data binding method. It is simple in concept, and it doesn't require complex language features, which makes it a nice candidate for legacy usage.

The syntax

Defining the provider and the consumer doesn't require any special syntax, just plain Javascript objects.

const provider = {  
  message: 'Hello World'
}
const consumer = document.createElement('p')  

Synchronization is usually triggered by property mutations on the provider. Properties, which should be observed for changes must be explicitly mapped with their handler().

observe(provider, 'message', message => {  
  consumer.innerHTML = message
})

The observe() function simply saves the (provider, property) -> handler mapping for later use.

function observe (provider, prop, handler) {  
  provider._handlers[prop] = handler
}

With this, we have a syntax for defining the provider and the consumer and a way to register handler() functions for property changes. The public API of our library is ready, now comes the internal implementation.

Listening on changes

Dirty checking is called dirty for a reason. It runs periodical checks instead of listening on property changes directly. Let's call this check a digest cycle from now on. A digest cycle iterates through every (provider, property) -> handler entry added by observe() and checks if the property value changed since the last iteration. If it did change, it runs the handler() function. A simple implementation would look like below.

function digest () {  
  providers.forEach(digestProvider)
}

function digestProvider (provider) {  
  for (let prop in provider._handlers) {
    if (provider._prevValues[prop] !== provider[prop]) {
      provider._prevValues[prop] = provider[prop]
      // run the handler registered for this property with the new value
      provider._handlers[prop](provider[prop])
    }
  }
}

The digest() function needs to be run from time to time to ensure a synchronized state, for example as sketched below.
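For example, a naive way to do that is a fixed interval (this assumes the providers array and the _handlers / _prevValues bookkeeping used above are set up when observe() is first called on a provider):

// run the digest cycle periodically; real frameworks usually trigger it
// after events, HTTP responses, timers and similar entry points instead
setInterval(digest, 100)

// any plain property mutation is then picked up by the next digest cycle
provider.message = 'Hello again'
// ...within ~100 ms the registered handler updates the consumer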

The accessor technique

The accessor technique is the now trending one. It is a bit less widely supported as it requires the ES5 getter/setter functionality, but it makes up for this in elegance.

The syntax

Defining the provider requires special syntax. The plain provider object has to be passed to the observable() function, which transforms it into an observable object.

const provider = observable({  
  greeting: 'Hello',
  subject: 'World'
})
const consumer = document.createElement('p')  

This small inconvenience is more than compensated by the simple handler() mapping syntax. With dirty checking, we would have to define every observed property explicitly like below.

observe(provider, 'greeting', greeting => {  
  consumer.innerHTML = greeting + ' ' + provider.subject
})

observe(provider, 'subject', subject => {  
  consumer.innerHTML = provider.greeting + ' ' + subject
})

This is verbose and clumsy. The accessor technique can automatically detect the used provider properties inside the handler() function, which allows us to simplify the above code.

observe(() => {  
  consumer.innerHTML = provider.greeting + ' ' + provider.subject
})

The implementation of observe() is different from the dirty checking one. It just executes the passed handler() function and flags it as the currently active one while it is running.

let activeHandler

function observe(handler) {  
  activeHandler = handler
  handler()
  activeHandler = undefined
}

Note that we exploit the single-threaded nature of JavaScript here by using the single activeHandler variable to keep track of the currently running handler() function.

Listening on changes

This is where the 'accessor technique' name comes from. The provider is augmented with getters/setters, which do the heavy lifting in the background. The idea is to intercept the get/set operations of the provider properties in the following way.

  • get: If there is an activeHandler running, save the (provider, property) -> activeHandler mapping for later use.
  • set: Run all handler() functions, which are mapped with the (provider, property) pair.

The accessor data binding technique.

The following code demonstrates a simple implementation of this for a single provider property.

function observableProp (provider, prop) {  
  let value = provider[prop]
  Object.defineProperty(provider, prop, {
    get () {
      if (activeHandler) {
        provider._handlers[prop] = activeHandler
      }
      return value
    },
    set (newValue) {
      value = newValue
      const handler = provider._handlers[prop]
      if (handler) {
        activeHandler = handler
        handler()
        activeHandler = undefined
      }
    }
  })
}

The observable() function mentioned in the previous section walks the provider properties recursively and converts all of them into observables with the above observableProp() function.

function observable (provider) {  
  for (let prop in provider) {
    observableProp(provider, prop)
    if (typeof provider[prop] === 'object') {
      observable(provider[prop])
    }
  }
  return provider
}

This is a very simple implementation, but it is enough for a comparison between the two techniques.

Comparison of the techniques

In this section, I will briefly outline the strengths and weaknesses of dirty checking and the accessor technique.

Syntax

Dirty checking requires no syntax to define the provider and consumer, but mapping the (provider, property) pair with the handler() is clumsy and not flexible.

The accessor technique requires the provider to be wrapped by the observable() function, but the automatic handler() mapping makes up for this. For large projects with data binding, it is a must have feature.

Performance

Dirty checking is notorious for its bad performance. It has to check every (provider, property) -> handler entry possibly multiple times during every digest cycle. Moreover, it has to grind even when the app is idle, since it can't know when the property changes happen.

The accessor method is faster, but performance could be unnecessarily degraded in the case of big observable objects. Replacing every property of the provider with accessors is usually overkill. A solution would be to build the getter/setter tree dynamically when needed, instead of doing it ahead of time in one batch. Alternatively, a simpler solution is wrapping the unneeded properties with a noObserve() function that tells observable() to leave that part untouched. This sadly introduces some extra syntax.

Flexibility

Dirty checking naturally works with both expando (dynamically added) and accessor properties.

The accessor technique has a weak spot here. Expando properties are not supported, because they are left out of the initial getter/setter tree. This causes issues with arrays for example, but it can be fixed by manually running observableProp() after adding a new property. Getter/setter properties are not supported either, since accessors can't be wrapped by accessors again. A common workaround for this is using a computed() function instead of a getter. This introduces even more custom syntax.

Timing alternatives

Dirty checking doesn't give us much freedom here since we have no way of knowing when the actual property changes happen. The handler() functions can only be executed asynchronously, by running the digest() cycle from time to time.

Getters/setters added by the accessor technique are triggered synchronously, so we have a freedom of choice. We may decide to run the handler() right away, or save it in a batch that is executed asynchronously later. The first approach gives us the advantage of predictability, while the latter allows for performance enhancements by removing duplicates.

About the next article

In the next article, I will introduce the nx-observe data binding library and explain how to replace ES5 getters/setters by ES6 Proxies to eliminate most of the accessor technique's weaknesses.

Conclusion

If you are interested in the NX framework, please visit the home page. Adventurous readers can find the NX source code in this Github repository.

I hope you found this a good read, see you next time when I’ll discuss data binding with ES6 Proxies!

If you have any thoughts on the topic, please share them in the comments.


JavaScript Garbage Collection Improvements - Orinoco

This article summarizes the three main effects of the new V8 upgrade (5.0) on JavaScript garbage collection.

In the latest releases of Node.js, the V8 JavaScript engine was upgraded to version 5.0. Among new ES2015 features, it includes three major improvements for the garbage collector. These changes lay the groundwork for a new garbage collector in V8, code-named Orinoco.

V8 implements a generational garbage collector - meaning it has a memory segment called new space for the young generation, and has an old space for the old generation. New objects are allocated in the new space, and if they survive two garbage collections in the new space, they are moved to the old space.

#1: Parallelised JavaScript Garbage Collection

The problem with the two spaces is that moving objects between them is expensive: the objects need to be copied over, and the pointers must be updated. Moving out from the young generation is called evacuation, while adding it to the old generation is called compaction.

Since there are no dependencies between young generation evacuation and old generation compaction, the garbage collector can perform these in parallel, resulting in a reduction of compaction time of 75% from ~7ms to under 2ms on average.

Parallelised garbage collection in Node.js

"The Parallelised Garbage Collector can reduce compaction time by 75%" via @RisingStack #javascript #nodejs

Click To Tweet

#2: Track Pointers Improvement

When an object is moved on the heap, the garbage collector has to update all the pointers to it - but first, it has to find all the pointers to the old location. For this, V8 uses a data structure called a remembered set to keep track of interesting pointers in the heap. A pointer is classified as interesting if the next garbage collector run may move it:

  • moving objects from the young generation to the old generation,
  • pointers to objects in fragmented pages, as objects may be moved to other pages during compaction

In older versions of V8, remembered sets were implemented using store buffers, which contain the addresses of all incoming pointers. The problem is that this may result in duplicated entries, because a store buffer may end up including a pointer multiple times, and two store buffers may hold the same pointer. This would make parallelization of the pointer update really complex.





Instead of dealing with the extra complexity, Orinoco removes it by reorganizing the remembered set. Instead of the previous approach, now each page stores the offsets of the interesting pointers originating from the given page. With this technique the parallel updates of the pointers become feasible.

It has a huge performance impact - it can reduce the maximum pause time of compacting garbage collection by 40%.

#3: Black Allocation

The Black Allocation introduced by Orinoco is involved in the marking phase of the garbage collector. With this, once objects are allocated in the old generation, they are marked black instantly, meaning that they are "live" objects. The rationale is that objects allocated in the old space are likely to live longer, so they should survive the next garbage collection. Objects colored black are not visited by the garbage collector - they are put on black pages of the old space.

It speeds up garbage collection thanks to the faster marking progress, as well as less garbage collection work in general.


For more information on V8 updates, follow the V8 project's blog.

Let me know if you have any questions or additional thoughts in the comments!


Writing a JavaScript framework - Sandboxed code evaluation

This is the third chapter of the Writing a JavaScript framework series. In this chapter, I am going to explain the different ways of evaluating code in the browser and the issues they cause. I will also introduce a method, which relies on some new or lesser known JavaScript features.

The series is about an open-source client-side framework, called NX. During the series, I explain the main difficulties I had to overcome while writing the framework. If you are interested in NX please visit the home page.

The series includes the following chapters:

  1. Project structuring
  2. Execution timing
  3. Sandboxed code evaluation (current chapter)
  4. Data binding introduction
  5. Data Binding with ES6 Proxies
  6. Custom elements
  7. Client side routing

The evil eval

The eval() function evaluates JavaScript code represented as a string.

A common solution for code evaluation is the eval() function. Code evaluated by eval() has access to closures and the global scope, which leads to a security issue called code injection and makes eval() one of the most notorious features of JavaScript.

Despite being frowned upon, eval() is very useful in some situations. Most modern front-end frameworks require its functionality but don't dare to use it because of the issue mentioned above. As a result, many alternative solutions emerged for evaluating strings in a sandbox instead of the global scope. The sandbox prevents the code from accessing secure data. Usually it is a simple JavaScript object, which replaces the global object for the evaluated code.
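
A minimal illustration of the problem - nothing framework specific, just plain eval() run as a non-strict browser script (the names are only for illustration):

const secret = 'API_KEY_12345' // a value the evaluated code should never see

function run (userInput) {
  return eval(userInput)
}

console.log(run('2 + 2'))                // 4 - the intended use
console.log(run('secret'))               // 'API_KEY_12345' - the surrounding scope is exposed
console.log(run('window.location.href')) // the global scope is exposed as well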

The common way

The most common eval() alternative is complete re-implementation - a two-step process, which consists of parsing and interpreting the passed string. First the parser creates an abstract syntax tree, then the interpreter walks the tree and interprets it as code inside a sandbox.

This is a widely used solution, but it is arguably too heavy for such a simple thing. Rewriting everything from scratch instead of patching eval() introduces a lot of bug opportunities and it requires frequent modifications to follow the latest language updates as well.

An alternative way

NX tries to avoid re-implementing native code. Evaluation is handled by a tiny library that uses some new or lesser known JavaScript features.

This section will progressively introduce these features and use them to explain the nx-compile code evaluation library. The library exposes a function called compileCode(), which works as shown below.

const code = compileCode('return num1 + num2')

// this logs 17 to the console
console.log(code({num1: 10, num2: 7}))

const globalNum = 12  
const otherCode = compileCode('return globalNum')

// global scope access is prevented
// this logs undefined to the console
console.log(otherCode({num1: 2, num2: 3}))  

By the end of this article, we will implement the compileCode() function in less than 20 lines.

new Function()

The Function constructor creates a new Function object. In JavaScript, every function is actually a Function object.

The Function constructor is an alternative to eval(). new Function(...args, 'funcBody') evaluates the passed 'funcBody' string as code and returns a new function that executes that code. It differs from eval() in two major ways.

  • It evaluates the passed code just once. Calling the returned function will run the code without re-evaluating it.
  • It doesn't have access to local closure variables, however, it can still access the global scope.

function compileCode (src) {
  return new Function(src)
}

new Function() is a better alternative to eval() for our use case. It has superior performance and security, but global scope access still has to be prevented to make it viable.
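
Both differences are easy to verify in a browser console (the names below are only for illustration):

window.globalValue = 'reachable'
const localValue = 'hidden'

const fn = new Function('return typeof localValue + " / " + typeof globalValue')
console.log(fn()) // "undefined / string" - no closure access, but globals are still visible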

The 'with' keyword

The with statement extends the scope chain for a statement.

with is a lesser known keyword in JavaScript. It allows a semi-sandboxed execution. The code inside a with block tries to retrieve variables from the passed sandbox object first, but if it doesn't find a variable there, it looks for it in the closure and global scope. Closure scope access is prevented by new Function(), so we only have to worry about the global scope.

function compileCode (src) {  
  src = 'with (sandbox) {' + src + '}'
  return new Function('sandbox', src)
}

with uses the in operator internally. For every variable access inside the block, it evaluates the variable in sandbox condition. If the condition is truthy, it retrieves the variable from the sandbox. Otherwise, it looks for the variable in the global scope. By fooling with into always evaluating the variable in sandbox condition as truthy, we can prevent it from accessing the global scope.
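
A quick check of this intermediate version shows both the progress and the remaining problem (globalName is just an illustrative global set for the demo):

window.globalName = 'leaked'

const fn = compileCode('return globalName')
console.log(fn({}))                          // 'leaked' - the global scope is still reachable
console.log(fn({ globalName: 'sandboxed' })) // 'sandboxed' - the sandbox wins when it has the property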

Sandboxed code evaluation: Simple 'with' statement

ES6 proxies

The Proxy object is used to define custom behavior for fundamental operations like property lookup or assignment.

An ES6 Proxy wraps an object and defines trap functions, which may intercept fundamental operations on that object. Trap functions are invoked when an operation occurs. By wrapping the sandbox object in a Proxy and defining a has trap, we can overwrite the default behavior of the in operator.

function compileCode (src) {  
  src = 'with (sandbox) {' + src + '}'
  const code = new Function('sandbox', src)

  return function (sandbox) {
    const sandboxProxy = new Proxy(sandbox, {has})
    return code(sandboxProxy)
  }
}

// this trap intercepts 'in' operations on sandboxProxy
function has (target, key) {  
  return true
}

The above code fools the with block: variable in sandbox always evaluates to true, because the has trap always returns true. The code inside the with block will therefore never try to access the global object.

Sandboxed code evaluation: 'with' statement and proxies

Symbol.unscopables

A symbol is a unique and immutable data type and may be used as an identifier for object properties.

Symbol.unscopables is a well-known symbol. A well-known symbol is a built-in JavaScript Symbol, which represents internal language behavior. Well-known symbols can be used to add or overwrite iteration or primitive conversion behavior for example.

The Symbol.unscopables well-known symbol is used to specify an object value of whose own and inherited property names are excluded from the 'with' environment bindings.

Symbol.unscopables defines the unscopable properties of an object. Unscopable properties are never retrieved from the sandbox object in with statements, instead they are retrieved straight from the closure or global scope. Symbol.unscopables is a very rarely used feature. You can read about the reason it was introduced on this page.
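
This is exactly where the security issue for our sandbox comes from. A small demo of the behavior (run as a non-strict script, since with is forbidden in strict mode):

const sandbox = { values: [1, 2, 3] }
sandbox[Symbol.unscopables] = { values: true }

with (sandbox) {
  // 'values' is declared unscopable, so the lookup skips the sandbox
  // and falls through to the closure or global scope instead
  console.log(typeof values) // most likely "undefined" instead of "object"
}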

Sandboxed code evaluation: 'with' statement and proxies. A security issue.

We can fix the above issue by defining a get trap on the sandbox Proxy, which intercepts Symbol.unscopables retrieval and always returns undefined. This will fool the with block into thinking that our sandbox object has no unscopable properties.

function compileCode (src) {  
  src = 'with (sandbox) {' + src + '}'
  const code = new Function('sandbox', src)

  return function (sandbox) {
    const sandboxProxy = new Proxy(sandbox, {has, get})
    return code(sandboxProxy)
  }
}

function has (target, key) {  
  return true
}

function get (target, key) {  
  if (key === Symbol.unscopables) return undefined
  return target[key]
}

Sandboxed code evaluation: 'with' statement and proxies. Has and get traps.

WeakMaps for caching

The code is now secure, but its performance can still be improved, since it creates a new Proxy on every invocation of the returned function. This can be prevented by caching: reusing the same Proxy for every function call that receives the same sandbox object.

A proxy belongs to a sandbox object, so we could simply add the proxy to the sandbox object as a property. However, this would expose our implementation details to the public, and it wouldn't work in case of an immutable sandbox object frozen with Object.freeze(). Using a WeakMap is a better alternative in this case.

The WeakMap object is a collection of key/value pairs in which the keys are weakly referenced. The keys must be objects, and the values can be arbitrary values.

A WeakMap can be used to attach data to an object without directly extending it with properties. We can use WeakMaps to indirectly add the cached Proxies to the sandbox objects.

const sandboxProxies = new WeakMap()

function compileCode (src) {  
  src = 'with (sandbox) {' + src + '}'
  const code = new Function('sandbox', src)

  return function (sandbox) {
    if (!sandboxProxies.has(sandbox)) {
      const sandboxProxy = new Proxy(sandbox, {has, get})
      sandboxProxies.set(sandbox, sandboxProxy)
    }
    return code(sandboxProxies.get(sandbox))
  }
}

function has (target, key) {  
  return true
}

function get (target, key) {  
  if (key === Symbol.unscopables) return undefined
  return target[key]
}

This way only one Proxy will be created per sandbox object.

Final notes

The above compileCode() example is a working sandboxed code evaluator in just 19 lines of code. If you would like to see the full source code of the nx-compile library, you can find it in this Github repository.

Apart from explaining code evaluation, the goal of this chapter was to show how new ES6 features can be used to alter the existing ones, instead of re-inventing them. I tried to demonstrate the full power of Proxies and Symbols through the examples.

Conclusion

If you are interested in the NX framework, please visit the home page. Adventurous readers can find the NX source code in this Github repository.

I hope you found this a good read, see you next time when I’ll discuss data binding!

If you have any thoughts on the topic, please share them in the comments.


Writing a JavaScript framework - Execution timing, beyond setTimeout

This is the second chapter of the Writing a JavaScript framework series. In this chapter, I am going to explain the different ways of executing asynchronous code in the browser. You will read about the event loop and the differences between timing techniques, like setTimeout and Promises.

The series is about an open-source client-side framework, called NX. During the series, I explain the main difficulties I had to overcome while writing the framework. If you are interested in NX please visit the home page.

The series includes the following chapters:

  1. Project structuring
  2. Execution timing (current chapter)
  3. Sandboxed code evaluation
  4. Data binding introduction
  5. Data Binding with ES6 Proxies
  6. Custom elements
  7. Client side routing

Async code execution

Most of you are probably familiar with Promise, process.nextTick(), setTimeout() and maybe requestAnimationFrame() as ways of executing asynchronous code. They all use the event loop internally, but they behave quite differently regarding precise timing.

In this chapter, I will explain the differences, then show you how to implement a timing system that a modern framework, like NX requires. Instead of reinventing the wheel we will use the native event loop to achieve our goals.

The event loop

The event loop is not even mentioned in the ES6 spec. JavaScript only has jobs and job queues on its own. A more complex event loop is specified separately by NodeJS and the HTML5 spec. Since this series is about the front-end I will explain the latter one here.

The event loop is called a loop for a reason. It is infinitely looping and looking for new tasks to execute. A single iteration of this loop is called a tick. The code executed during a tick is called a task.

while (eventLoop.waitForTask()) {  
  eventLoop.processNextTask()
}

Tasks are synchronous pieces of code that may schedule other tasks in the loop. An easy programmatic way to schedule a new task is setTimeout(taskFn). However, tasks may come from several other sources like user events, networking or DOM manipulation.

Execution timing: Event loop with tasks

Task queues

To complicate things a bit, the event loop can have multiple task queues. The only two restrictions are that events from the same task source must belong to the same queue and tasks must be processed in insertion order in every queue. Apart from these, the user agent is free to do as it wills. For example, it may decide which task queue to process next.

while (eventLoop.waitForTask()) {  
  const taskQueue = eventLoop.selectTaskQueue()
  if (taskQueue.hasNextTask()) {
    taskQueue.processNextTask()
  }
}

With this model, we lose precise control over timing. The browser may decide to totally empty several other queues before it gets to our task scheduled with setTimeout().

Execution timing: Event loop with task queues

The microtask queue

Fortunately, the event loop also has a single queue called the microtask queue. The microtask queue is completely emptied in every tick, after the current task has finished executing.

while (eventLoop.waitForTask()) {  
  const taskQueue = eventLoop.selectTaskQueue()
  if (taskQueue.hasNextTask()) {
    taskQueue.processNextTask()
  }

  const microtaskQueue = eventLoop.microTaskQueue
  while (microtaskQueue.hasNextMicrotask()) {
    microtaskQueue.processNextMicrotask()
  }
}

The easiest way to schedule a microtask is Promise.resolve().then(microtaskFn). Microtasks are processed in insertion order, and since there is only one microtask queue, the user agent can't mess with us this time.

Moreover, microtasks can schedule new microtasks that will be inserted in the same queue and processed in the same tick.
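
A tiny experiment makes the difference visible (run it in a browser console):

console.log('task start')

setTimeout(() => console.log('setTimeout - a new task, later tick'), 0)
Promise.resolve().then(() => console.log('promise - a microtask, same tick'))

console.log('task end')

// output order: task start, task end, promise - a microtask..., setTimeout - a new task...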

Execution timing: Event loop with microtask queue

Rendering

The last thing missing is the rendering schedule. Unlike event handling or parsing, rendering is not done by separate background tasks. It is an algorithm that may run at the end of every loop tick.

The user agent has a lot of freedom again: It may render after every task, but it may decide to let hundreds of tasks execute without rendering.

Fortunately, there is requestAnimationFrame(), which executes the passed function right before the next render. Our final event loop model looks like this.

while (eventLoop.waitForTask()) {  
  const taskQueue = eventLoop.selectTaskQueue()
  if (taskQueue.hasNextTask()) {
    taskQueue.processNextTask()
  }

  const microtaskQueue = eventLoop.microTaskQueue
  while (microtaskQueue.hasNextMicrotask()) {
    microtaskQueue.processNextMicrotask()
  }

  if (shouldRender()) {
    applyScrollResizeAndCSS()
    runAnimationFrames()
    render()
  }
}

Execution timing: Event loop with rendering

Now let’s use all this knowledge to build a timing system!

Using the event loop

Like most modern frameworks, NX deals with DOM manipulation and data binding in the background. It batches operations and executes them asynchronously for better performance. To time these things right, it relies on Promises, MutationObservers and requestAnimationFrame().

The desired timing is this:

  1. Code from the developer
  2. Data binding and DOM manipulation reactions by NX
  3. Hooks defined by the developer
  4. Rendering by the user agent

#Step 1

NX registers object mutations with ES6 Proxies and DOM mutations with a MutationObserver synchronously (more about these in the next chapters). It delays the reactions as microtasks until step 2 for optimized performance. This delay is done by Promise.resolve().then(reaction) for object mutations, and handled automatically by the MutationObserver as it uses microtasks internally.
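
A minimal sketch of this batching idea - not the actual NX source, just the pattern it relies on:

const reactions = new Set()

function queueReaction (reaction) {
  if (reactions.size === 0) {
    // flush as a microtask, right after the developer's synchronous code finishes
    Promise.resolve().then(flushReactions)
  }
  reactions.add(reaction)
}

function flushReactions () {
  reactions.forEach((reaction) => reaction())
  reactions.clear()
}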

#Step 2

The code (task) from the developer finished running. The microtask reactions registered by NX start executing. Since they are microtasks they run in order. Note that we are still in the same loop tick.

#Step 3

NX runs the hooks passed by the developer using requestAnimationFrame(hook). This may happen in a later loop tick. The important thing is that the hooks run before the next render and after all data, DOM and CSS changes are processed.

#Step 4

The browser renders the next view. This may also happen in a later loop tick, but it never happens before the previous steps in a tick.

Things to keep in mind

We just implemented a simple but effective timing system on top of the native event loop. It works well in theory, but timing is a delicate thing, and slight mistakes can cause some very strange bugs.

In a complex system, it is important to set up some rules about the timing and keep to them later. For NX I have the following rules.

  1. Never use setTimeout(fn, 0) for internal operations
  2. Register microtasks with the same method
  3. Reserve microtasks for internal operations only
  4. Do not pollute the developer hook execution time window with anything else

#Rule 1 and 2

Reactions on data and DOM manipulation should execute in the order the manipulations happened. It is okay to delay them as long as their execution order is not mixed up. Mixing execution order makes things unpredictable and difficult to reason about.

setTimeout(fn, 0) is totally unpredictable. Registering microtasks with different methods also leads to mixed up execution order. For example, microtask2 would incorrectly execute before microtask1 in the example below.

Promise.resolve().then().then(microtask1)  
Promise.resolve().then(microtask2)  

Execution timing: Microtask registration method

#Rule 3 and 4

Separating the time window of the developer code execution and the internal operations is important. Mixing these two would cause seemingly unpredictable behavior, and it would eventually force developers to learn about the inner workings of the framework. I think many front-end developers have had experiences like this already.

Conclusion

If you are interested in the NX framework, please visit the home page. Adventurous readers can find the NX source code in this Github repository.

I hope you found this a good read, see you next time when I’ll discuss sandboxed code evaluation!

If you have any thoughts on the topic, please share them in the comments.


Writing a JavaScript Framework - Project Structuring

In the last couple of months Bertalan Miklos, JavaScript engineer at RisingStack wrote a next generation client-side framework, called NX. In the Writing a JavaScript Framework series, Bertalan shares what he learned during the process:

In this chapter, I am going to explain how NX is structured, and how I solved its use case specific difficulties regarding extendibility, dependency injection and private variables.

The series includes the following chapters.

  1. Project structuring (current chapter)
  2. Execution timing
  3. Sandboxed code evaluation
  4. Data binding introduction
  5. Data Binding with ES6 Proxies
  6. Custom elements
  7. Client side routing

Project Structuring

There is no structure that fits all projects, although there are some general guidelines. Those who are interested can check out our Node.js project structure tutorial from the Node Hero series.

An overview of the NX JavaScript Framework

NX aims to be an open-source community driven project, which is easy to extend and scales well.

  • It has all the features expected from a modern client-side framework.
  • It has no external dependencies, other than polyfills.
  • It consists of around 3000 lines altogether.
  • No module is longer than 300 lines.
  • No feature module has more than 3 dependencies.

Its final dependency graph looks like this:

JavaScript Framework in 2016: The NX project structure

This structure provides a solution for some typical framework related difficulties.

  • Extendibility
  • Dependency injection
  • Private variables

Achieving Extendibility

Easy extendibility is a must for community driven projects. To achieve it, the project should have a small core and a predefined dependency handling system. The former ensures that it is understandable, while the latter ensures that it will stay that way.

In this section, I focus on having a small core.

The main feature expected from modern frameworks is the ability to create custom components and use them in the DOM. NX has the single component function as its core, and that does exactly this. It allows the user to configure and register a new component type.

component(config)  
  .register('comp-name')

The registered comp-name is a blank component type which can be instantiated inside the DOM as expected.

<comp-name></comp-name>  

The next step is to ensure that the components can be extended with new features. To keep both simplicity and extendibility, these new features should not pollute the core. This is where dependency injection comes in handy.

Dependency Injection (DI) with Middlewares

If you are unfamiliar with dependency injection, I suggest you read our article on the topic: Dependency Injection in Node.js.

Dependency injection is a design pattern in which one or more dependencies (or services) are injected, or passed by reference, into a dependent object.

DI removes hard-coded dependencies but introduces a new problem. The user has to know how to configure and inject all the dependencies. Most client-side frameworks have DI containers doing this instead of the user.

A Dependency Injection Container is an object that knows how to instantiate and configure objects.

Another approach is the middleware DI pattern, which is widely used on the server side (Express, Koa). The trick here is that all injectable dependencies (middlewares) have the same interface and can be injected the same way. In this case, no DI container is needed.

I went with this solution to keep things simple. If you have ever used Express, the code below will look very familiar.

component()  
  .use(paint) // inject paint middleware
  .use(resize) // inject resize middleware
  .register('comp-name')

function paint (elem, state, next) {  
  // elem is the component instance, set it up or extend it here
  elem.style.color = 'red'
  // then call next to run the next middleware (resize)
  next()
}

function resize (elem, state, next) {  
  elem.style.width = '100px'
  next()
}
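
Under the hood, running such a chain can be surprisingly small. A possible sketch of a middleware runner - not the actual NX implementation, just the pattern:

function runMiddlewares (elem, state, middlewares) {
  function next (index) {
    if (index < middlewares.length) {
      // call the current middleware and let it decide when to continue
      middlewares[index](elem, state, () => next(index + 1))
    }
  }
  next(0)
}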

Middlewares execute when a new component instance is attached to the DOM and typically extend the component instance with new features. Extending the same object by different libraries leads to name collisions. Exposing private variables deepens this problem and may cause accidental usage by others.

Having a small public API and hiding the rest is a good practice to avoid these.

Handling privacy

Privacy is handled by function scope in JavaScript. When cross-scope private variables are required, people tend to prefix them with _ to signal their private nature and expose them publicly. This prevents accidental usage but doesn't avoid name collisions. A better alternative is the ES6 Symbol primitive.

A symbol is a unique and immutable data type, that may be used as an identifier for object properties.

The below code demonstrates a symbol in action.

const color = Symbol()

// a middleware
function colorize (elem, state, next) {  
  elem[color] = 'red'
  next()
}

Now 'red' is only reachable by owning a reference to the color symbol (and the element). The privacy of 'red' can be controlled by exposing the color symbol to different extents. With a reasonable number of private variables, having a central symbol storage is an elegant solution.

// symbols module
exports.private = {  
  color: Symbol('color from colorize')
}
exports.public = {}  

And an index.js like below.

// main module
const symbols = require('./symbols')  
exports.symbols = symbols.public  

The storage is accessible inside the project for all modules, but the private part is not exposed to the outside. The public part can be used to expose low-level features to external developers. This prevents accidental usage since the developer has to explicitly require the needed symbol to use it. Moreover, symbol references can not collide like string names, so name collision is impossible.

The points below summarize the pattern for different scenarios.

1. Public variables

Use them normally.

function (elem, state, next) {  
  elem.publicText = 'Hello World!'
  next()
}

2. Private variables

Cross-scope variables that are private to the project should have a symbol key added to the private symbol registry.

// symbols module
exports.private = {  
  text: Symbol('private text')
}
exports.public = {}  

And required from it when needed somewhere.

const private = require('symbols').private

function (elem, state, next) {  
  elem[private.text] = 'Hello World!'
  next()
}

3. Semi-private variables

Variables of the low level API should have a symbol key added to the public symbol registry.

// symbols module
exports.private = {  
  text: Symbol('private text')
}
exports.public = {  
  text: Symbol('exposed text')
}

And required from it when needed somewhere.

const exposed = require('symbols').public

function (elem, state, next) {  
  elem[exposed.text] = 'Hello World!'
  next()
}

Conclusion

If you are interested in the NX framework, please visit the home page. Adventurous readers can find the NX source code in this Github repository.

I hope you found this a good read, see you next time when I’ll discuss execution timing!

If you have any thoughts on the topic, share them in the comments.

On Third-Party JavaScript - In Production Case-Study

Third-party JavaScript is a pattern of JavaScript programming that enables the creation of highly distributable web applications. Unlike regular web applications, which are accessed at a single web address, these applications can be arbitrarily loaded on any web page using simple JavaScript includes. — Ben Vinegar, Anton Kovalyov (Third-party Javascript)

Google Analytics, Mixpanel, Disqus - just to name a few products that heavily rely on third-party JavaScript development.

This is the second part of the "On Third-Party JavaScript" series; the first one dealt with the very principles you have to understand when dealing with third-party JavaScript.

In this post we are going to take a look at how companies out there solve the challenges of third-party JavaScript development.

Disclaimer: we do not work for any of these companies (as of the date of publishing this post), so the findings here are based purely on reverse engineering.

The Mixpanel Way

Mixpanel is a company that provides an analytics platform that can be integrated with your web applications - so they heavily depend on third-party JavaScript.

Integrating with Mixpanel

To integrate with Mixpanel the very first step you have to do is include the following snippet in your HTML's head section:

After this you can use the full power of the library - but what happens behind the curtains? This tiny include snippet only does the following:

  • when inserted into the page, it automatically starts downloading the full Mixpanel library,
  • calling mixpanel.init puts placeholder methods on the global mixpanel object - these queue any commands until the full library is loaded,
  • when the full library is downloaded, it overwrites the placeholder methods on the global mixpanel object and processes any commands waiting in the queue (the pattern is sketched below)
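
The exact Mixpanel snippet is not reproduced here, but the underlying stub-and-queue pattern looks roughly like this (the method names and the CDN URL are only placeholders):

(function (window, document) {
  const stub = { _queue: [] }
  const methods = ['init', 'track', 'identify'] // placeholder method names

  // each placeholder only records the call for later
  methods.forEach((name) => {
    stub[name] = (...args) => stub._queue.push([name, ...args])
  })
  window.mixpanel = stub

  // start downloading the full library asynchronously
  const script = document.createElement('script')
  script.async = true
  script.src = 'https://cdn.example.com/mixpanel.min.js' // placeholder URL
  document.head.appendChild(script)

  // once loaded, the real library replaces the stubs and replays stub._queue
})(window, document)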

Size, cache policy

The full mixpanel library is only 16.2K - so far so good. Check.

Regarding the distribution: Mixpanel uses Akamai as their CDN provider. Check.

Mixpanel sets the Cache-Control: public, max-age=3600 header on their library. This tells the browser to keep the current version for 3600 seconds (1 hour) and then download it again. This, combined with the ~16 KB size, can have a huge impact on both their Akamai bills and their users' bandwidth usage (no, not in a good way).

Things to improve at MixPanel

  • To improve the cache policy and enable infinite caching of the main library, Mixpanel should split the main file into two separate files, as outlined in the previous article's Distributing section.

The Disqus Way

Disqus is a great tool for connecting the audience of a website or blog and starting a discussion on each piece of content. Unlike Mixpanel, Disqus has to take care of a user interface as well.

Integration

Integrating Disqus is pretty straightforward:

Let's take a closer look!

The very first thing you will notice is the configuration variable - it is used to initialize Disqus for your site. What comes after that is very similar to Mixpanel - the loading of the main application's JavaScript file. Also, pay attention to the <noscript> tag - it shows a warning to users without JavaScript support.

The mechanisms behind Disqus

The embed.js file is what the small include snippet loads - it is relatively small and has a very short cache lifetime: 300 seconds. This file contains some basic configuration, like the version of Disqus you are using. This information is then used to fetch further resources the application needs. This setup works really well, as only embed.js needs a short cache period - the other resources can be cached for a really long time.

For configuration management they are using a config.js file which is cached for a very short amount of time to enable rapid changes in the configuration. This file contains settings on how Disqus will appear - but not just that: it contains feature flips, and service discovery information as well.

Disqus not only loads static resources with embed.js, but modules as well, starting with lounge.load.js. For this, they are using AMD, which stands for Asynchronous Module Definition. The library that helps them is RequireJS. These modules are referenced with a unique version/commit tag, so they can be cached for eternity - if they want to roll out a new version, only some configuration has to be updated, which results in a new URI for the resource.

CSS and image files are handled in the very same way: they are versioned, and cached for a month. To send images to the client, Disqus uses sprites to minimize the number of requests the client has to make.

A summarized overview of what is happening:

Disqus architecture

One more thing about the implementation of Disqus - bug tracking! They use Sentry to collect and report JavaScript errors - you should give Sentry a try as well.

Things to improve

The Disqus team did an incredible job developing their product, hats off! The only thing I could come up with is that their include snippet leaks the disqus_shortname variable into the global scope. A possible way to solve this:
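
The original snippet is not preserved in this post, but wrapping the include in an IIFE so the shortname never reaches the global scope would be one option (the shortname below is hypothetical):

(function () {
  var disqusShortname = 'example' // hypothetical shortname, kept local to this function
  var dsq = document.createElement('script')
  dsq.type = 'text/javascript'
  dsq.async = true
  dsq.src = '//' + disqusShortname + '.disqus.com/embed.js'
  var target = document.head || document.body
  target.appendChild(dsq)
})()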

Conclusion

Building third-party JavaScript is a hard task - but there are companies out there who made a great job solving this. When in doubt, you can always try to examine how they do things, so you will have options to choose from.

UPDATE 1: Disqus uses their own "smart" file versioner, which can be found here: https://github.com/disqus/grunt-smartrev. Thanks!

Need help in developing your application?

RisingStack provides JavaScript development and consulting services - ping us if you need a helping hand!

On Third-Party Javascript - The Principles

Third-party JavaScript is a pattern of JavaScript programming that enables the creation of highly distributable web applications. Unlike regular web applications, which are accessed at a single web address, these applications can be arbitrarily loaded on any web page using simple JavaScript includes. — Ben Vinegar, Anton Kovalyov (Third-party Javascript)

Google Analytics, Mixpanel, Disqus - just to name a few products that heavily rely on third-party JavaScript development. In this post we are going to take a look at the principles of third-party JavaScript development - in Part II we will take a look at how other companies do it in detail.

Principles of developing third-party JavaScript

Before going into the details on how the big players out there do this, let's take a look at the key points that you should pay attention to.

Injecting third-party JavaScript

Traditionally, JavaScript resources can be inserted into a webpage with the following snippet:

We have to do something very similar when integrating into different web applications. For this you can provide the following snippet to your clients:
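
The original snippet is not preserved in this post, but based on the description below it is roughly this kind of loader (the URL is only a placeholder):

(function () {
  var script = document.createElement('script')
  script.src = 'https://cdn.example.com/widget.js' // placeholder URL of your library
  script.async = true
  document.body.appendChild(script)
})()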

What just happened here? Firstly, we created a new script HTML element, then started to decorate it with attributes. This snippet should be placed at the end of the body tag.

The most important thing to notice here is the async attribute. Imagine the following scenario: your service gets a huge amount of traffic, and it becomes slow. If the loading of your script does not happen asynchronously, you can block the entire webpage. By setting its value to true we make sure that we will not block the loading of any other resources on the page.

But what should this file contain? Your entire application, or something different? The distribution part will try to answer this question.

The sacred global scope

When writing third-party JavaScript you do not know where your library will be used. It will be injected into the unknown, and that unknown sometimes will be Mordor itself, with other third-party libs already there.

Be a good guy, do not pollute the global scope even more.

Dependencies

As we have already discussed, your script will be injected into the unknown. This means, that it is very likely that libraries like jQuery, Backbone or Lodash/Underscore will be present in the page.

Be careful! You should never depend on these things; the developers of that site will not reach out to you and ask if you are still using them. Even worse, they can use different versions of those libraries. So once again: never ever use them.

But what should you do instead? You have to bundle all your dependencies into your JavaScript file(s). Make sure that these do not interfere with the original ones (a.k.a. noConflict). To solve this problem, Browserify/Webpack can be a good choice - they can help isolate your dependencies from the dependencies of the original site with scoping.

Also, lots of front-end libraries can be found on NPM and used with Browserify/Webpack. (E.g. you can use jQuery this way without putting it into the global scope, or even worse, overwriting the one used by the site you are injected into.)

Communication with a server

When developing third-party JavaScript, communication with the back end servers is not trivial.

XMLHttpRequest cannot load http://example.org/apple. Origin https://example.com is not allowed by Access-Control-Allow-Origin.  

Have you ever encountered this error message? It happened because the remote server refused to serve our request.

Enable CORS (Cross-Origin Resource Sharing)

The easiest way to do this is to set the following header in the server's response:

Access-Control-Allow-Origin: *  

Sure, you may want to limit who can reach your services - you can add domains instead of the asterisk.
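
For example, in an Express application (assuming Express is what runs on your server) this can be a one-line middleware:

const express = require('express')
const app = express()

app.use((req, res, next) => {
  // allow any origin here, or check req.headers.origin against a whitelist instead
  res.set('Access-Control-Allow-Origin', '*')
  next()
})

app.get('/apple', (req, res) => res.json({ ok: true }))
app.listen(3000)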

The only thing you have to keep in mind when using CORS is legacy support (if you have to deal with that). Internet Explorer browsers (8 and 9) do not have full CORS support:

  • only POST and GET
  • no custom HTTP headers
  • content-type must be text/plain

To support these browsers, you have to implement HTTP method overriding on both the client and the server. How does that work? It extracts the intended HTTP method from the method querystring/parameter, then handles the actual request as if it were a DELETE, PUT, etc.
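
A rough sketch of the server side as an Express-style middleware (the _method query parameter is just a common convention, not a fixed name):

function methodOverride (req, res, next) {
  if (req.query._method) {
    // remember the original verb and handle the request as the intended one
    req.originalMethod = req.method
    req.method = req.query._method.toUpperCase()
  }
  next()
}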

Luckily, for the common frameworks like Express and Koa you can find solutions on NPM (for Express, for Koa).

Identifying users

Users can be identified using cookies. They can be used in third-party JavaScript development as well, but we have to introduce two new definitions.

First-party cookie

First-party cookies are the "traditional" cookies. They are called first-party cookies because these cookies are placed on the same domain where the JavaScript code runs. Your partners will also see these cookies in their traffic.

Third-party cookie

Third-party cookies are called third-party, because they are placed on a different domain. Imagine the following scenario: your script is injected into examplestore.com. You may want to track your users using your own domain, whatanicewidget.com. In that case you will put a cookie on whatanicewidget.com.

What are the benefits of using a third-party cookie? You can recognise users from niceexamplestore.com and whatastooore.com, not just from examplestore.com, because when making requests to your domain you will have the very same cookie.

When implementing an identification mechanism for your application, do not forget that third-party cookies are not supported everywhere. For this reason, you have to implement a first-party cookie fallback.

LocalStorage

This is the trickiest one. You can use localStorage (if available in the browser) to identify users. But be aware: the same-origin policy applies to localStorage as well, so visiting the same site using HTTP and HTTPS will result in different localStorage contents.

So how does localStorage help? In short: you can use window.postMessage to send messages between windows. What you have to do is include an external webpage in your site using an iframe (served over HTTPS), then communicate with it - that window holds the localStorage, which stays the same no matter which site the user visits from. A sample implementation can be found here: https://github.com/zendesk/cross-storage.
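
A stripped-down sketch of the idea - the real thing (for example the cross-storage library linked above) adds origin checks, request ids and error handling; the URLs and keys below are placeholders:

// hub.html, served from your own domain and loaded in a hidden iframe
window.addEventListener('message', (event) => {
  // always verify event.origin against a whitelist in production
  const request = JSON.parse(event.data)
  const value = localStorage.getItem(request.key)
  event.source.postMessage(JSON.stringify({ key: request.key, value }), event.origin)
})

// host page: embed the iframe and ask it for a stored value
const frame = document.createElement('iframe')
frame.src = 'https://widget.example.com/hub.html' // placeholder URL on your own domain
frame.style.display = 'none'
document.body.appendChild(frame)

window.addEventListener('message', (event) => {
  console.log('shared value:', JSON.parse(event.data).value)
})

frame.onload = () => {
  frame.contentWindow.postMessage(JSON.stringify({ key: 'userId' }), 'https://widget.example.com')
}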

Distributing

When serving third-party JavaScript applications, the size of the application and its cache policy are crucial, as both affect not just the time your users have to wait to see the application, but also your monthly bills. CDNs charge based on traffic (in GBs, TBs) and the number of requests.

Hopefully this will not strike you as a surprise: always uglify/minify your JavaScript/CSS resources.

What about caching? If you set the max-age to a big number, then pushing out new versions may take a lot of time to propagate to all the clients. If you set it to a small value, then the clients will download it frequently. We can do better!

Let's split up your application into two separate JavaScript files! For the sake of simplicity, call them loader.js and application.js.

The loader will be a really small file, basically what we created before, with a small exception: we include a revision number when loading the application.js file.

So in this case your users have to add the loader.js file to their site, which will then load application.js, containing all the application logic. But why do this? For the loader file we can set a small cache time, like an hour - it does not matter if it gets downloaded a lot, because it will not be bigger than 1 KB. For the application itself we can set the cache time to eternity - it will be downloaded only once.
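
A hedged sketch of what such a loader.js could contain - the revision string is a placeholder that your build step would bake in:

(function () {
  var REVISION = '1a2b3c' // placeholder, replaced at build/deploy time
  var script = document.createElement('script')
  script.async = true
  script.src = 'https://cdn.example.com/application-' + REVISION + '.js'
  document.body.appendChild(script)
})()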

Splitting third-party JavaScript applications

Recommended reading

Take a closer look at how the big players out there do third-party JavaScript development, examining cache policies, dependencies, security, communication with the server and more.