Péter Márton

CTO at RisingStack, microservices and brewing beer with Node.js

Designing a Microservices Architecture for Failure

A microservices architecture makes it possible to isolate failures through well-defined service boundaries. But like in every distributed system, there is a higher chance of network, hardware, or application-level issues. As a consequence of service dependencies, any component can be temporarily unavailable to its consumers. To minimize the impact of partial outages, we need to build fault-tolerant services that can gracefully respond to certain types of outages.

This article introduces the most common techniques and architecture patterns to build and operate a highly available microservices system based on RisingStack’s Node.js Consulting & Development experience.

If you are not familiar with the patterns in this article, it doesn’t necessarily mean that you are doing something wrong. Building a reliable system always comes with an extra cost.

The Risk of the Microservices Architecture

The microservices architecture moves application logic to services and uses a network layer to communicate between them. Communicating over a network instead of using in-memory calls brings extra latency and complexity to the system, which requires cooperation between multiple physical and logical components. The increased complexity of the distributed system leads to a higher chance of network failures.

One of the biggest advantages of a microservices architecture over a monolithic one is that teams can independently design, develop and deploy their services. They have full ownership over their service's lifecycle. It also means that teams have no control over their service dependencies, as those are more likely managed by a different team. With a microservices architecture, we need to keep in mind that provider services can be temporarily unavailable due to broken releases, configurations, and other changes, as they are controlled by someone else and components move independently from each other.

Graceful Service Degradation

One of the best advantages of a microservices architecture is that you can isolate failures and achieve graceful service degradation as components fail separately. For example, during an outage, customers in a photo sharing application may not be able to upload a new picture, but they can still browse, edit and share their existing photos.

Microservices fail separately (in theory)

In most cases, it's hard to implement this kind of graceful service degradation, as applications in a distributed system depend on each other, and you need to apply several failover mechanisms (some of them will be covered later in this article) to prepare for temporary glitches and outages.

Services depend on each other and fail together without failover logic.

Change management

Google’s site reliability team has found that roughly 70% of outages are caused by changes in a live system. When you change something in your service - deploy a new version of your code or change some configuration - there is always a chance of failure or the introduction of a new bug.

In a microservices architecture, services depend on each other. This is why you should minimize failures and limit their negative effect. To deal with issues from changes, you can implement change management strategies and automatic rollouts.

For example, when you deploy new code, or you change some configuration, you should apply these changes to a subset of your instances gradually, monitor them and even automatically revert the deployment if you see that it has a negative effect on your key metrics.

Change Management - Rolling Deployment

Another solution could be to run two production environments. You always deploy to only one of them, and you only point your load balancer to the new one after you have verified that the new version works as expected. This is called blue-green, or red-black deployment.

Reverting code is not a bad thing. You shouldn’t leave broken code in production and then think about what went wrong. Always revert your changes when it’s necessary. The sooner the better.


Health-check and Load Balancing

Instances continuously start, restart and stop because of failures, deployments or autoscaling. This makes them temporarily or permanently unavailable. To avoid issues, your load balancer should skip unhealthy instances from the routing, as they cannot serve your customers' or sub-systems' needs.

Application instance health can be determined via external observation. You can do it by repeatedly calling a GET /health endpoint, or via self-reporting. Modern service discovery solutions continuously collect health information from instances and configure the load balancer to route traffic only to healthy components.
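
A minimal health endpoint in Express might look like the following sketch. The checkDbConnection helper is a hypothetical dependency check, not part of any library:

const express = require('express')
const app = express()

// Hypothetical check of a critical dependency (DB ping, cache ping, etc.)
async function checkDbConnection () {
  return true
}

app.get('/health', async (req, res) => {
  try {
    const dbIsHealthy = await checkDbConnection()
    if (!dbIsHealthy) {
      return res.status(503).json({ status: 'unhealthy' })
    }
    res.json({ status: 'healthy' })
  } catch (err) {
    res.status(503).json({ status: 'unhealthy' })
  }
})

app.listen(3000)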

Self-healing

Self-healing can help to recover an application. We can talk about self-healing when an application can do the necessary steps to recover from a broken state. In most cases, it is implemented by an external system that watches the instances' health and restarts them when they are in a broken state for a longer period. Self-healing can be very useful in most cases; however, in certain situations it can cause trouble by continuously restarting the application. This might happen when your application cannot report a positive health status because it is overloaded or its database connection times out.

Implementing an advanced self-healing solution which is prepared for a delicate situation - like a lost database connection - can be tricky. In this case, you need to add extra logic to your application to handle edge cases and let the external system know that the instance does not need to be restarted immediately.

Failover Caching

Services usually fail because of network issues and changes in our system. However, most of these outages are temporary thanks to self-healing and advanced load balancing, so we should find a solution that keeps our service working during these glitches. This is where failover caching can help by providing the necessary data to our application.

Failover caches usually use two different expiration dates: a shorter one that tells how long you can use the cache in a normal situation, and a longer one that says how long you can use the cached data during failure.

Failover Caching

It’s important to mention that you should only use failover caching when serving outdated data is better than serving nothing.

To set up a cache and failover cache, you can use standard response headers in HTTP.

For example, with the max-age directive of the Cache-Control header you can specify the maximum amount of time a resource will be considered fresh. With the stale-if-error directive, you can determine how long the resource should be served from a cache in the case of a failure.
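
For example, a sketch of setting these directives on a response in Express (the route and values are illustrative):

const photos = [] // illustrative in-memory data

// Cache normally for 10 minutes, but allow a CDN or cache
// to keep serving the response for a day if the origin fails
app.get('/photos', (req, res) => {
  res.set('Cache-Control', 'max-age=600, stale-if-error=86400')
  res.json(photos)
})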

Modern CDNs and load balancers provide various caching and failover behaviors, but you can also create a shared library for your company that contains standard reliability solutions.

Retry Logic

There are certain situations when we cannot cache our data, or we want to make changes to it, but our operations eventually fail. In these cases, we can retry our action, as we can expect that the resource will recover after some time or our load balancer will send our request to a healthy instance.

You should be careful with adding retry logic to your applications and clients, as a large number of retries can make things even worse or even prevent the application from recovering.

In a distributed system, a retry can trigger multiple other requests or retries and start a cascading effect. To minimize the impact of retries, you should limit their number and use an exponential backoff algorithm to continually increase the delay between retries until you reach the maximum limit.

As a retry is initiated by the client (browser, other microservices, etc.) and the client doesn't know whether the operation failed before or after handling the request, you should prepare your application to handle idempotency. For example, when you retry a purchase operation, you shouldn't double charge the customer. Using a unique idempotency key for each of your transactions can help to handle retries.
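
As an illustration, here is a minimal retry helper with exponential backoff; the purchase function and the Idempotency-Key header name are assumptions for the example:

const crypto = require('crypto')

// Retry an async operation, doubling the delay after each failed attempt
async function retryWithBackoff (operation, maxRetries = 5, baseDelayMs = 100) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation()
    } catch (err) {
      if (attempt === maxRetries) throw err
      const delayMs = baseDelayMs * Math.pow(2, attempt) // 100, 200, 400, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs))
    }
  }
}

// The same idempotency key is sent on every attempt,
// so the provider can deduplicate the purchase
const idempotencyKey = crypto.randomBytes(16).toString('hex')
retryWithBackoff(() => purchase({ headers: { 'Idempotency-Key': idempotencyKey } }))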

Rate Limiters and Load Shedders

Rate limiting is the technique of defining how many requests can be received or processed by a particular customer or application during a timeframe. With rate limiting, for example, you can filter out customers and microservices that are responsible for traffic peaks, or you can ensure that your application doesn’t get overloaded before autoscaling comes to the rescue.

You can also hold back lower-priority traffic to give enough resources to critical transactions.

A rate limiter can hold back traffic peaks

A different type of rate limiter is called the concurrent request limiter. It can be useful when you have expensive endpoints that shouldn’t be called more than a specified number of times, while you still want to serve traffic.
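
A concurrent request limiter can be sketched as a small Express middleware; the limit of 10 and the endpoint path are arbitrary examples:

// Reject requests with 503 while too many are already in flight
function concurrentLimiter (limit) {
  let inFlight = 0
  return (req, res, next) => {
    if (inFlight >= limit) {
      return res.status(503).send('Too many requests in progress')
    }
    inFlight++
    let released = false
    const release = () => {
      if (!released) {
        released = true
        inFlight--
      }
    }
    res.on('finish', release) // response sent
    res.on('close', release)  // connection aborted
    next()
  }
}

app.use('/expensive-endpoint', concurrentLimiter(10))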

A fleet usage load shedder can ensure that there are always enough resources available to serve critical transactions. It keeps some resources for high-priority requests and doesn’t allow low-priority transactions to use all of them. A load shedder makes its decisions based on the whole state of the system, rather than on a single user’s request bucket size. Load shedders help your system to recover, since they keep the core functionalities working while you have an ongoing incident.

To read more about rate limiters and load shedders, I recommend checking out Stripe’s article.

Fail Fast and Independently

In a microservices architecture, we want to prepare our services to fail fast and separately. To isolate issues at the service level, we can use the bulkhead pattern. You can read more about bulkheads later in this blog post.

We also want our components to fail fast, as we don't want to wait for broken instances until they time out. Nothing is more disappointing than a hanging request and an unresponsive UI. It's not just wasting resources but also ruining the user experience. Our services call each other in a chain, so we should pay extra attention to preventing hanging operations before these delays add up.

The first idea that comes to mind is applying fine-grained timeouts to each service call. The problem with this approach is that you cannot really know what a good timeout value is, as there are certain situations when network glitches and other issues happen that only affect one or two operations. In this case, you probably don’t want to reject those requests if only a few of them time out.

We can say that achieving the fail fast paradigm in microservices by using timeouts is an anti-pattern and you should avoid it. Instead of timeouts, you can apply the circuit-breaker pattern that depends on the success / fail statistics of operations.


Bulkheads

Bulkheads are used in shipbuilding to partition a ship into sections, so that sections can be sealed off if there is a hull breach.

The concept of bulkheads can be applied in software development to segregate resources.

By applying the bulkheads pattern, we can protect limited resources from being exhausted. For example, we can use two connection pools instead of a shared one if we have two kinds of operations that communicate with the same database instance and a limited number of connections. As a result of this client - resource separation, the operation that times out or overuses the pool won't bring all of the other operations down.
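
For example, a sketch with the pg package, assuming a single PostgreSQL instance that serves both user-facing queries and reporting queries:

const { Pool } = require('pg')

// Separate pools act as bulkheads: slow reporting queries cannot
// exhaust the connections needed by user-facing requests
const userFacingPool = new Pool({ connectionString: process.env.DB_URI, max: 15 })
const reportingPool = new Pool({ connectionString: process.env.DB_URI, max: 5 })

function getUser (id) {
  return userFacingPool.query('SELECT * FROM users WHERE id = $1', [id])
}

function getMonthlyReport () {
  return reportingPool.query('SELECT count(*) FROM events') // potentially slow
}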

One of the main reasons why the Titanic sank was that its bulkheads had a design failure: the water could pour over the top of the bulkheads via the deck above and flood the entire hull.

Bulkheads in the Titanic (they didn't work)

Circuit Breakers

To limit the duration of operations, we can use timeouts. Timeouts can prevent hanging operations and keep the system responsive. However, using static, fine-tuned timeouts in microservices communication is an anti-pattern, as we’re in a highly dynamic environment where it's almost impossible to come up with the right timing limitations that work well in every case.

Instead of using small, transaction-specific static timeouts, we can use circuit breakers to deal with errors. Circuit breakers are named after the real-world electrical component because their behavior is similar. You can protect resources and help them to recover with circuit breakers. They can be very useful in a distributed system where a repetitive failure can lead to a snowball effect and bring the whole system down.

A circuit breaker opens when a particular type of error occurs multiple times in a short period. An open circuit breaker prevents further requests from being made - like the real one prevents electrons from flowing. Circuit breakers usually close after a certain amount of time, giving enough space for underlying services to recover.

Keep in mind that not all errors should trip a circuit breaker. For example, you probably want to skip client-side issues like requests with 4xx response codes, but include 5xx server-side failures. Some circuit breakers can have a half-open state as well. In this state, the service sends the first request to check system availability, while letting the other requests fail. If this first request succeeds, it restores the circuit breaker to a closed state and lets the traffic flow. Otherwise, it keeps it open.

Circuit Breaker
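
A minimal sketch of the closed / open / half-open logic could look like this; the thresholds are illustrative, and production-grade circuit breaker libraries handle many more edge cases:

class CircuitBreaker {
  constructor (action, { failureThreshold = 5, resetTimeoutMs = 10000 } = {}) {
    this.action = action
    this.failureThreshold = failureThreshold
    this.resetTimeoutMs = resetTimeoutMs
    this.failures = 0
    this.state = 'CLOSED'
    this.openedAt = 0
  }

  async call (...args) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('Circuit is open')
      }
      this.state = 'HALF_OPEN' // let a trial request through
    }
    try {
      const result = await this.action(...args)
      this.failures = 0
      this.state = 'CLOSED'
      return result
    } catch (err) {
      this.failures++
      if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
        this.state = 'OPEN'
        this.openedAt = Date.now()
      }
      throw err
    }
  }
}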

Testing for Failures

You should continually test your system against common issues to make sure that your services can survive various failures. You should test for failures frequently to keep your team prepared for incidents.

For testing, you can use an external service that identifies groups of instances and randomly terminates one of the instances in that group. With this, you can prepare for a single instance failure, but you can even shut down entire regions to simulate a cloud provider outage.

One of the most popular testing solutions is the Chaos Monkey resiliency tool by Netflix.

Outro

Implementing and running a reliable service is not easy. It takes a lot of effort from your side and also costs your company money.

Reliability has many levels and aspects, so it is important to find the best solution for your team. You should make reliability a factor in your business decision processes and allocate enough budget and time for it.

Key Takeaways

  • Dynamic environments and distributed systems - like microservices - lead to a higher chance of failures.
  • Services should fail separately and achieve graceful degradation to improve user experience.
  • 70% of outages are caused by changes; reverting code is not a bad thing.
  • Fail fast and independently. Teams have no control over their service dependencies.
  • Architectural patterns and techniques like caching, bulkheads, circuit breakers and rate-limiters help to build reliable microservices.

To learn more about running a reliable service, check out our free Node.js Monitoring, Alerting & Reliability 101 e-book. In case you need help with implementing a microservices system, reach out to us at @RisingStack on Twitter, or enroll in our upcoming Building Microservices with Node.js training.

Building an API Gateway using Node.js

Services in a microservices architecture share some common requirements regarding authentication and transportation when they need to be accessible by external clients. API Gateways provide a shared layer to handle differences between service protocols and fulfill the requirements of specific clients like desktop browsers, mobile devices, and legacy systems.

Microservices and consumers

Microservices is a service-oriented architecture where teams can design, develop and ship their applications independently. It allows technology diversity on various levels of the system, so teams can benefit from using the best language, database, protocol, and transportation layer for the given technical challenge. For example, one team can use JSON over HTTP REST while another team can use gRPC over HTTP/2 or a messaging broker like RabbitMQ.

Using different data serialization formats and protocols can be powerful in certain situations, but clients that want to consume our product may have different requirements. The problem can also occur in systems with a homogeneous technology stack, as consumers can range from desktop browsers through mobile devices and gaming consoles to legacy systems. One client may expect XML format while another one wants JSON. In many cases, you need to support both.

Another challenge that you can face when clients want to consume your microservices comes from generic shared logic like authentication, as you don't want to re-implement the same thing in all of your services.

To summarize: we don't want to implement our internal services in a way that supports multiple clients, nor re-implement the same logic all over. This is where the API Gateway comes into the picture and provides a shared layer to handle differences between service protocols and fulfill the requirements of specific clients.

What is an API Gateway?

An API Gateway is a type of service in a microservices architecture which provides a shared layer and API for clients to communicate with internal services. The API Gateway can route requests, transform protocols, aggregate data and implement shared logic like authentication and rate limiting.

You can think of the API Gateway as the entry point to our microservices world.
Our system can have one or multiple API Gateways, depending on the clients' requirements. For example, we can have a separate gateway for desktop browsers, mobile applications and public API(s) as well.

API Gateway as an entry point to microservices


Node.js API Gateway for frontend teams

As the API Gateway provides functionality for client applications like browsers, it can be implemented and managed by the team responsible for the frontend application.

It also means that the language the API Gateway is implemented in should be chosen by the team responsible for the particular client. As JavaScript is the primary language for developing applications for the browser, Node.js can be an excellent choice to implement an API Gateway, even if your microservices architecture is developed in a different language.

Netflix successfully uses Node.js API Gateways with their Java backend to support a broad range of clients - to learn more about their approach read The "Paved Road" PaaS for Microservices at Netflix article.

Netflix's approach to handling different clients (source)

API Gateway functionalities

We discussed earlier that you can put generic shared logic into your API Gateway; this section introduces the most common gateway responsibilities.

Routing and versioning

We defined the API Gateway as the entry point to your microservices. In your gateway service, you can route requests from a client to specific services. You can even handle versioning during routing, or change the backend interface while the publicly exposed interface remains the same. You can also define new endpoints in your API gateway that cooperate with multiple services.

API Gateway as microservices entry point
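
For example, a sketch of version-aware routing with the express-http-proxy package (the service hostnames are assumptions):

const express = require('express')
const httpProxy = require('express-http-proxy')
const app = express()

// The publicly exposed interface stays stable while backends can change
app.use('/api/v1/users', httpProxy('https://user-service-legacy'))
app.use('/api/v2/users', httpProxy('https://user-service'))

app.listen(3000)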

Evolutionary design

The API Gateway approach can also help you to break down your monolith application. In most cases, rewriting your system from scratch as microservices is not a good idea, and also not possible, as we need to ship features for the business during the transition.

In this case, we can put a proxy or an API Gateway in front of our monolith application, implement new functionalities as microservices, and route new endpoints to the new services, while serving old endpoints via the monolith. Later, we can also break down the monolith by moving existing functionality into new services.

With evolutionary design, we can have a smooth transition from monolith architecture to microservices.

Evolutionary design with API Gateway

Authentication

Most microservices infrastructures need to handle authentication. Putting shared logic like authentication into the API Gateway can help you to keep your services small and domain focused.

In a microservices architecture, you can keep your services protected in a DMZ (demilitarized zone) via network configurations and expose them to clients via the API Gateway. This gateway can also handle more than one authentication method. For example, you can support both cookie and token based authentication.

API Gateway with Authentication

Data aggregation

In a microservices architecture, it can happen that the client needs data at a different aggregation level, like denormalized data entities that live in various microservices. In this case, we can use our API Gateway to resolve these dependencies and collect data from multiple services.

In the following image you can see how the API Gateway merges and returns user and credit information as one piece of data to the client. Note that these are owned and managed by different microservices.

API Gateway - Data aggregation
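
A sketch of such an aggregating endpoint, assuming hypothetical user-service and credit-service endpoints:

const express = require('express')
const request = require('request-promise-native')
const app = express()

// Merge data owned by two different microservices into one response
app.get('/users/:userId/profile', async (req, res) => {
  const [user, credit] = await Promise.all([
    request({ uri: `https://user-service/users/${req.params.userId}`, json: true }),
    request({ uri: `https://credit-service/credits/${req.params.userId}`, json: true })
  ])
  res.json(Object.assign({}, user, { credit }))
})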

Serialization format transformation

It can happen that we need to support clients with different data serialization format requirements.
Imagine a situation where our microservices use JSON, but one of our customers can only consume XML APIs. In this case, we can put the JSON to XML conversion into the API Gateway instead of implementing it in all of the microservices.

API Gateway - Data serialization format transformation
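
A sketch of this conversion in the gateway, using the xml2js package (the internal user-service endpoint is an assumption):

const express = require('express')
const request = require('request-promise-native')
const { Builder } = require('xml2js')
const app = express()

const xmlBuilder = new Builder({ rootName: 'user' })

// Internal services speak JSON; this endpoint serves XML-only clients
app.get('/xml/users/:userId', async (req, res) => {
  const uri = `https://user-service/users/${req.params.userId}`
  const user = await request({ uri, json: true })
  res.type('application/xml').send(xmlBuilder.buildObject(user))
})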

Protocol transformation

A microservices architecture allows polyglot protocol transportation to gain the benefits of different technologies. However, most clients support only one protocol. In this case, we need to transform service protocols for the clients.

An API Gateway can also handle protocol transformation between client and microservices.
In the next image, you can see how the client expects all of the communication through HTTP REST, while our internal microservices use gRPC and GraphQL.

API Gateway - Protocol transformation

Rate-limiting and caching

In the previous examples, you could see that we can put generic shared logic like authentication into the API Gateway. Other than authentication, you can also implement rate-limiting, caching and various reliability features in your API Gateway.

Overambitious API gateways

While implementing your API Gateway, you should avoid putting non-generic logic - like domain-specific data transformation - into your gateway.

Services should always have full ownership over their data domain. Building an overambitious API Gateway takes control away from service teams, which goes against the philosophy of microservices.

This is why you should be careful with data aggregation in your API Gateway - it can be powerful, but can also lead to domain-specific data transformation or rule processing logic that you should avoid.

Always define clear responsibilities for your API Gateway and only include generic shared logic in it.


Node.js API Gateways

If you only want to do simple things in your API Gateway, like routing requests to specific services, you can use a reverse proxy like nginx. But at some point, you may need to implement logic that's not supported in general proxies. In this case, you can implement your own API Gateway in Node.js.

In Node.js, you can use the http-proxy package to simply proxy requests to a particular service, or you can use the more feature-rich express-gateway to create API gateways.

In our first API Gateway example, we authenticate the request before we proxy it to the user service.

const express = require('express')
const httpProxy = require('express-http-proxy')

const app = express()

const userServiceProxy = httpProxy('https://user-service')

// Authentication middleware runs before the proxy
app.use((req, res, next) => {
  // TODO: my authentication logic
  next()
})

// Proxy the request to the user service
app.get('/users/:userId', (req, res, next) => {
  userServiceProxy(req, res, next)
})

app.listen(3000)

Another approach is to make a new request in your API Gateway and return the response to the client:

const express = require('express')
const request = require('request-promise-native')

const app = express()

// Resolve: GET /users/me
app.get('/users/me', async (req, res) => {
  const userId = req.session.userId
  const uri = `https://user-service/users/${userId}`
  // json: true parses the response body, so we can re-send it as JSON
  const user = await request({ uri, json: true })
  res.json(user)
})

app.listen(3000)

Node.js API Gateways Summarized

An API Gateway provides a shared layer to serve client requirements with a microservices architecture. It helps to keep your services small and domain focused. You can put different generic logic into your API Gateway, but you should avoid overambitious API Gateways, as these take control away from service teams.

Microservices Distributed Tracing with Node.js and OpenTracing

Microservices is a powerful architecture pattern with many advantages, but it also brings new challenges regarding debugging - as it’s a distributed architecture that moves the complexity to the network.

Distributed tracing (and OpenTracing) provides a solution by giving enough visibility and information about cross-process communication.

This article explains the basics of distributed tracing as well as shows an open-source solution to debug Node.js based microservices applications.

Microservices debugging

Microservices is a powerful architecture pattern which helps your company to move fast and ship features frequently: it maximizes the impact of autonomous teams by allowing them to design, build and deploy their services independently, as they have full ownership over the lifecycle of their applications.

However, we shouldn't forget that a microservices architecture produces a distributed system which moves the complexity to the network layer.

Developers who have experience building and operating microservices know that debugging and observing a distributed system is challenging, as the communication between components doesn't happen with in-memory function calls. It also means that we don't have stack traces anymore.

This is when distributed tracing comes to the rescue and provides visibility for microservices.

Distributed Tracing

Traditional monitoring tools such as metrics and logging solutions still have their place, but they often fail to provide visibility across services. This is where distributed tracing thrives.

Distributed tracing provides enough visibility to debug microservices architectures by propagating transactions across distributed services and gathering information from cross-process communication.

The idea of distributed tracing is not new: Google has been successfully using it internally to understand system behavior and reason about performance issues for more than a decade. Google also published a whitepaper about their internal solution, called Dapper, in 2010.

Distributed tracing gives visibility into microservices communication

Distributed Tracing Concepts

The Google Dapper whitepaper introduces the two basic elements of distributed tracing: Span and Trace.

Span

A Span represents a logical unit of work in the system that has an operation name, start time and duration. Spans may be nested and ordered to model causal relationships. An RPC call like an HTTP request or a database query is an example of a span, but you can also represent internal operations with spans.

Spans are controlled by events in a system. They can be started, finished and extended with operational data that makes debugging easier.

For example, when we make an HTTP call to another service, we want to start a span, and finish it when the response is received, while decorating it with the status code and other metadata.

Trace

A Trace is represented by one or more spans. It's an execution path through the system. You can think of it as a DAG (Directed Acyclic Graph) of spans.

Trace: graph of spans on a timeline, source: Jaeger

Context propagation

To be able to connect spans and define relationships between them, we need to share some tracing context, both within and between processes. For example, we need to define the parent-child relation between spans.

Cross-process communication can happen via different channels and protocols like HTTP requests, RPC frameworks, messaging workers or something else. To share the tracing context, we can use meta headers. For example, in an HTTP request, we can use request headers like X-Trace or Trace-Parent-ID.

To manage a span lifecycle and handle the context propagation we need to instrument our code. In our next section, we will discuss instrumentation.

Instrumentation

In the Tracing Concepts section, we discussed that we need to instrument our code to start and finish spans, to decorate them with metadata and to connect them between different processes.

This kind of instrumentation needs some time and produces extra code, as we need to touch every part of our application to propagate the tracing context both within and between processes.

We can write this kind of instrumentation on our own, or we can use an out of the box solution like Trace, our Node.js Monitoring & Debugging Platform.

If you decide that you want to do the instrumentation on your own, you should always be very careful while doing so. Instrumentation can introduce bugs and cause performance issues in your application or it can simply make your code very hard to read.

OpenTracing

Okay, in case you decided that you want to do the instrumentation on your own, wouldn't it be great if you could do it in a vendor-neutral way?

I mean, who wants to spend weeks or months instrumenting their code if they have to repeat this process when they want to try out a different distributed tracing solution?

Nobody, right?!

This is exactly the challenge that OpenTracing addresses by providing a standard, vendor-neutral interface for instrumentation.

The spread of the OpenTracing standard also means that maintainers of open-source libraries and service providers can ship their solutions with built-in, vendor-neutral instrumentation for distributed tracing.

How cool would it be if the request and express npm packages came with built-in OpenTracing instrumentation?

Today we are not there yet. We need to instrument our own code as well as the libraries that we use in our application.

OpenTracing Example

Let's see the following simple code snippet that makes a request to a remote site:

const request = require('request')

// Request options
const uri = 'https://risingstack.com'  
const method = 'GET'  
const headers = {}

request({ uri, method, headers }, (err, res) => {  
  if (err) {
    return
  }
})

Now let's see the very same code snippet when it's instrumented with OpenTracing:

const request = require('request')  
const { Tags, FORMAT_HTTP_HEADERS } = require('opentracing')  
const tracer = require('./my-tracer') // jaeger etc.

// Request options
const uri = 'https://risingstack.com'  
const method = 'GET'  
const headers = {}

// Start a span
const span = tracer.startSpan('http_request')  
span.setTag(Tags.HTTP_URL, uri)  
span.setTag(Tags.HTTP_METHOD, method)

// Send span context via request headers (parent id etc.)
tracer.inject(span, FORMAT_HTTP_HEADERS, headers)

request({ uri, method, headers }, (err, res) => {  
  // Error handling
  if (err) {
    span.setTag(Tags.ERROR, true)
    span.setTag(Tags.HTTP_STATUS_CODE, err.statusCode)
    span.log({
      event: 'error',
      message: err.message,
      err
    })
    span.finish()
    return
  }

  // Finish span
  span.setTag(Tags.HTTP_STATUS_CODE, res.statusCode)
  span.finish()
})

I think it's easy to say that the instrumented code is much more complicated and requires more effort from our side.

Cross-process propagation in Node.js

Earlier in this article, we discussed that distributed tracing requires cross-process Context Propagation to share information between processes and connect spans.

This kind of coordination between different parts of the application needs a standard solution, like a specific request header that each application must send and understand.

OpenTracing has an elegant solution to give enough freedom to the tracer provider to define these headers, while it gives a well-defined instrumentation interface for setting and reading them.

Let's see a Node.js example on how you can share context in an HTTP request:

// Client side of HTTP request
const span = tracer.startSpan('http_request')
const headers = {}

tracer.inject(span, FORMAT_HTTP_HEADERS, headers)  
request({ uri, method, headers }, (err, res) => { ... })  

This is how you can read the context and define the relation between spans on the server side of the very same request:

// Server side of HTTP request
app.use((req, res) => {  
  const parentSpanContext = tracer.extract(FORMAT_HTTP_HEADERS, req.headers)
  const span = tracer.startSpan('http_server', {
    childOf: parentSpanContext
  })
})

You can see that the extract(..) and inject(..) interfaces provide a vendor neutral instrumentation interface to share context between processes.

The previous code snippet will add different request headers for different tracing vendors. For example, with the Jaeger vendor (see later), it will add the uber-trace-id header to your HTTP request.

Sampling

Distributed tracing has other challenges besides instrumentation. For example, in most cases, we cannot collect tracing information from all of our communication, as it would be too much data to report, store and process. In this case, we need to sample our traces and spans to keep the data small but representative.

In our sampling algorithm, we can weigh our traces based on different aspects like priority, error type or occurrence.

In Trace, our Node.js Monitoring & Debugging tool, we collect and group traces by similarity. This doesn't just make them easy to overview; you can also see each error's occurrence count and make decisions based on that.

Traces by similarity and occurrence

Open-source Tracers

We call the application that collects, stores, processes and visualizes distributed tracing data a Tracer. The most popular open-source tracers today are Zipkin and Jaeger:

  • Zipkin's design is based on the Google Dapper paper and was open-sourced by Twitter in 2012.
  • Jaeger is a new distributed solution built around OpenTracing and released in April 2017.

In the next section, we will dig deeper into Jaeger, as it is OpenTracing compatible.

Jaeger

Jaeger is an OpenTracing-compatible tracer that was built and open-sourced by Uber in 2017. You can read more about the history and evolution of tracing at Uber in their article.

Jaeger's backend is implemented in Go and uses Cassandra as data storage, while the UI is built with React.

The agent and collector can also accept Zipkin spans, transforming them to Jaeger's data model before storage.

Architecture of Jaeger

You can try out Jaeger with Docker, using the pre-built image that contains all of the necessary components:

docker run -d -p5775:5775/udp -p6831:6831/udp -p6832:6832/udp -p5778:5778 -p16686:16686 -p14268:14268 jaegertracing/all-in-one:latest  

Jaeger's UI gives us insight into trace durations and provides a search interface, as well as a timeline visualization platform to inspect traces.

List of traces on the Jaeger UI

Jaeger and Node.js

Jaeger's npm package is called jaeger-client. It provides an OpenTracing interface with a built-in agent, so you can instrument your code as we did above in the OpenTracing section.
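
For example, a tracer instance could be initialized like this; the service name and the sampler and reporter settings are illustrative:

const { initTracer } = require('jaeger-client')

const tracer = initTracer({
  serviceName: 'my-service',
  sampler: { type: 'const', param: 1 }, // sample every trace
  reporter: { logSpans: true }
})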

You might ask: Is there a way I can skip instrumentation?

The answer is yes! :)

RisingStack is pleased to announce the @risingstack/jaeger-node npm package that provides automatic instrumentation for Node.js core modules, the most popular database drivers (MongoDB, PostgreSQL, Redis, etc.) and web frameworks like express.

Automatic instrumentation for Node.js and npm libraries with Jaeger

The jaeger-node library is built around the Node.js feature called async_hooks, which makes it possible to track asynchronous operations inside the application efficiently and accurately.

However, while async_hooks may be the future of debugging and monitoring Node.js asynchronous resources, it is still an experimental feature.

Which means: Please do not use in production yet.


Conclusion

There are new standards and tools like OpenTracing and Jaeger that can bring us the future of tracing, but we need to work together with open source maintainers to make it widely adopted.

Node.js Performance Monitoring with Prometheus

This article helps you to understand what to monitor if you have a Node.js application in production, and how to use Prometheus - an open-source solution which provides powerful data compression and fast querying for time series data - for Node.js monitoring.

What is Node.js Monitoring?

The term "service monitoring" means the tasks of collecting, processing, aggregating, and displaying real-time quantitative data about a system.

Monitoring gives us the ability to observe our system's state and address issues before they impact our business. Monitoring can also help to optimize our users' experience.

To analyze the data, first, you need to extract metrics from your system - like the Memory usage of a particular application instance. We call this extraction instrumentation.

We use the term white box monitoring when metrics are provided by the running system itself. This is the kind of Node.js monitoring we'll be diving into.

The four signals to know

Every service is different, and you can monitor many aspects of them. Metrics can range from low-level resources like Memory usage to high-level business metrics like the number of signups.

We recommend watching these signals for all of your services:

  • Error Rate: Because errors are user facing and immediately affect your customers.
  • Response time: Because the latency directly affects your customers and business.
  • Throughput: Traffic helps you to understand the context of increased error rates and latency.
  • Saturation: It tells how "full" your service is. If the CPU usage is 90%, can your system handle more traffic?

Instrumentation

You can instrument your system manually, but most of the paid monitoring solutions provide out-of-the-box instrumentation.

In many cases, instrumentation means adding extra logic and code pieces that come with a performance overhead.

With Node.js monitoring and instrumentation, you should aim to achieve low overhead, but it doesn’t necessarily mean that a bigger performance impact is not justifiable for better system visibility.

The risk of instrumenting your code

Instrumentation can be very specific and usually needs expertise and more development time. Also, bad instrumentation can introduce bugs into your system or generate an unreasonable performance overhead.

Instrumenting your code can also produce a lot of extra lines and bloat your application's codebase.

Picking your Node.js Monitoring Tool

When your team picks a monitoring tool you should consider the following aspects:

  • Expertise: Do you have the expertise? Building a monitoring tool and writing a high-quality instrumentation and extracting the right metrics is not easy. You need to know what you are doing.

  • Build or buy: Building a proper monitoring solution needs a lot of expertise, time and money, while obtaining an existing solution can be easier and cheaper.

  • SaaS or on-premises: Do you want to host your monitoring solution? Can you use a SaaS solution? What's your data compliance and protection policy? Using a SaaS solution can be a good pick, for example, when you want to focus on your product instead of tooling. Both open-source and commercial solutions are usually available as hosted or on-premises setups.

  • Licensing: Do you want to ship your monitoring toolset with your product? Can you use a commercial solution? You should always check licensing.

  • Integrations: Does it support my external dependencies like databases, orchestration systems and npm libraries?

  • Instrumentation: Does it provide automatic instrumentation? Do I need to instrument my code manually? How much time would it take to do it on my own?

  • Microservices: Do you build a monolith or a distributed system? Microservices need specific tools and a specific philosophy to debug and monitor them effectively. Do you need distributed tracing or security checks?

Based on our experience, in most cases an out-of-the-box SaaS or on-premises monitoring solution like Trace gives the right amount of visibility and tooling to monitor and debug your Node.js applications.

But what can you do when you cannot choose a commercial solution for some reason, and you want to build your own monitoring suite?

This is the case when Prometheus comes into the picture!

Node Monitoring with Prometheus

Prometheus is an open-source solution for Node.js monitoring and alerting. It provides powerful data compression and fast querying for time series data.

A time series is a stream of immutable timestamped values that belong to the same metric and the same labels. Labels make the metrics multi-dimensional.

You can read more about how Prometheus optimizes its storage engine in the Writing a Time Series Database from Scratch article.

Fun fact: Prometheus was initially built at SoundCloud; in 2016, it joined the Cloud Native Computing Foundation as the second hosted project after Kubernetes.

Data collection and metrics types

Prometheus uses the HTTP pull model, which means that every application needs to expose a GET /metrics endpoint that can be periodically fetched by the Prometheus instance.

Prometheus has four metrics types:

  • Counter: cumulative metric that represents a single numerical value that only ever goes up
  • Gauge: represents a single numerical value that can arbitrarily go up and down
  • Histogram: samples observations and counts them in configurable buckets
  • Summary: similar to a histogram; it samples observations and calculates configurable quantiles over a sliding time window

In the following snippet, you can see an example response from the /metrics endpoint. It contains both the gauge (nodejs_heap_space_size_total_bytes) and histogram (http_request_duration_ms_bucket) types of metrics:

# HELP nodejs_heap_space_size_total_bytes Process heap space size total from node.js in bytes.
# TYPE nodejs_heap_space_size_total_bytes gauge
nodejs_heap_space_size_total_bytes{space="new"} 1048576 1497945862862  
nodejs_heap_space_size_total_bytes{space="old"} 9818112 1497945862862  
nodejs_heap_space_size_total_bytes{space="code"} 3784704 1497945862862  
nodejs_heap_space_size_total_bytes{space="map"} 1069056 1497945862862  
nodejs_heap_space_size_total_bytes{space="large_object"} 0 1497945862862

# HELP http_request_duration_ms Duration of HTTP requests in ms
# TYPE http_request_duration_ms histogram
http_request_duration_ms_bucket{le="10",code="200",route="/",method="GET"} 58  
http_request_duration_ms_bucket{le="100",code="200",route="/",method="GET"} 1476  
http_request_duration_ms_bucket{le="250",code="200",route="/",method="GET"} 3001  
http_request_duration_ms_bucket{le="500",code="200",route="/",method="GET"} 3001  
http_request_duration_ms_bucket{le="+Inf",code="200",route="/",method="GET"} 3001  

Prometheus offers an alternative, called the Pushgateway, to monitor components that cannot be scraped because they live behind a firewall or are short-lived jobs.

Before a job gets terminated, it can push metrics to this gateway, and Prometheus can scrape the metrics from this gateway later on.
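
With prom-client, for example, a short-lived job could push its metrics like this (the gateway address and job name are assumptions):

const Prometheus = require('prom-client')

const gateway = new Prometheus.Pushgateway('http://my-pushgateway:9091')

// Push all collected metrics under a job name before the process exits
gateway.pushAdd({ jobName: 'my-batch-job' }, (err) => {
  if (err) console.error('Could not push metrics', err)
})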

To set up Prometheus to periodically collect metrics from your application check out the following example configuration.
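
A minimal scrape configuration could look like the following sketch (the job name, target address and interval are assumptions):

# prometheus.yml
scrape_configs:
  - job_name: 'my-nodejs-app'
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets: ['my-nodejs-app:3000']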

Monitoring a Node.js application

When we want to monitor our Node.js application with Prometheus, we need to solve the following challenges:

  • Instrumentation: Safely instrumenting our code with minimal performance overhead
  • Metrics exposition: Exposing our metrics for Prometheus with an HTTP endpoint
  • Hosting Prometheus: Having a well configured Prometheus running
  • Extracting value: Writing queries that are statistically correct
  • Visualizing: Building dashboards and visualizing our queries
  • Alerting: Setting up efficient alerts
  • Paging: Getting notified about alerts by applying escalation policies for paging

Node.js Metrics Exporter

To collect metrics from our Node.js application and expose it to Prometheus we can use the prom-client npm library.

In the following example, we create a histogram type of metric to collect our APIs' response time per route. Take a look at the pre-defined bucket sizes and our route label:

// Init
const Prometheus = require('prom-client')  
const httpRequestDurationMicroseconds = new Prometheus.Histogram({  
  name: 'http_request_duration_ms',
  help: 'Duration of HTTP requests in ms',
  labelNames: ['route'],
  // buckets for response time from 0.1ms to 500ms
  buckets: [0.10, 5, 15, 50, 100, 200, 300, 400, 500]
})

We need to collect the response time after each request and report it with the route label.

// After each response
httpRequestDurationMicroseconds  
  .labels(req.route.path)
  .observe(responseTimeInMs)
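
One way to wire this into Express is a middleware that timestamps the request and observes the duration when the response finishes - a sketch, assuming the httpRequestDurationMicroseconds histogram from above:

// Measure and record response time for every matched route
app.use((req, res, next) => {
  const start = Date.now()
  res.on('finish', () => {
    if (req.route) {
      httpRequestDurationMicroseconds
        .labels(req.route.path)
        .observe(Date.now() - start)
    }
  })
  next()
})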

We can register a GET /metrics endpoint to expose our metrics in the right format for Prometheus.

// Metrics endpoint
app.get('/metrics', (req, res) => {  
  res.set('Content-Type', Prometheus.register.contentType)
  res.end(Prometheus.register.metrics())
})

Queries

After we collected our metrics, we want to extract some value from them to visualize.

Prometheus provides a functional expression language that lets the user select and aggregate time series data in real time.

The Prometheus dashboard has a built-in query and visualization tool:

Prometheus dashboard

Let's see some example queries for response time and memory usage.

Query: 95th Response Time

We can determine the 95th percentile of our response time from our histogram metrics. With the 95th percentile response time, we can filter out peaks, and it usually gives a better understanding of the average user experience.

histogram_quantile(0.95, sum(rate(http_request_duration_ms_bucket[1m])) by (le, service, route, method))  

Query: Average Response Time

As the histogram type in Prometheus also collects the count and sum values for the observed metrics, we can divide them to get the average response time for our application.

avg(rate(http_request_duration_ms_sum[1m]) / rate(http_request_duration_ms_count[1m])) by (service, route, method, code)  

For more advanced queries like Error rate and Apdex score check out our Prometheus with Node.js example repository.

Alerting

Prometheus comes with a built-in alerting feature where you can use your queries to define your expectations; however, Prometheus alerting doesn't come with a notification system. To set one up, you need to use the Alertmanager or another external process.

Let's see an example of how you can set up an alert for your application's median response time. In this case, we want to fire an alert when the median response time goes above 100ms.

# APIHighMedianResponseTime
ALERT APIHighMedianResponseTime  
  IF histogram_quantile(0.5, sum(rate(http_request_duration_ms_bucket[1m])) by (le, service, route, method)) > 100
  FOR 60s
  ANNOTATIONS {
    summary = "High median response time on {{ $labels.service }} and {{ $labels.method }} {{ $labels.route }}",
    description = "{{ $labels.service }}, {{ $labels.method }} {{ $labels.route }} has a median response time above 100ms (current value: {{ $value }}ms)",
  }

Prometheus active alert in pending state

Kubernetes integration

Prometheus offers a built-in Kubernetes integration. It's capable of discovering Kubernetes resources like Nodes, Services, and Pods while scraping metrics from them.

It's an extremely powerful feature in a containerized system, where instances are born and die all the time. With a use case like this, HTTP endpoint based scraping would be hard to achieve through manual configuration.

You can also provision Prometheus easily with Kubernetes and Helm. It only needs a couple of steps. First of all, we need a running Kubernetes cluster!

As Azure Container Service provides a hosted Kubernetes, I can provision one quickly:

# Provision a new Kubernetes cluster
az acs create -n myClusterName -d myDNSPrefix -g myResourceGroup --generate-ssh-keys --orchestrator-type kubernetes

# Configure kubectl with the new cluster
az acs kubernetes get-credentials --resource-group=myResourceGroup --name=myClusterName  

After a couple of minutes when our Kubernetes cluster is ready, we can initialize Helm and install Prometheus:

helm init  
helm install stable/prometheus  

For more information on provisioning Prometheus with Kubernetes check out the Prometheus Helm chart.

Grafana

As you can see, the built-in visualization method of Prometheus is great for inspecting our queries' output, but it's not configurable enough to use for dashboards.

As Prometheus has an API to run queries and get data, you can use many external solutions to build dashboards. One of my favorites is Grafana.

Grafana is an open-source, pluggable visualization platform. It can process metrics from many types of systems, and it has built-in Prometheus data source support.

In Grafana, you can import an existing dashboard or build your own.

Dashboard with Grafana

Conclusion

Prometheus is a powerful open-source tool to monitor your application, but as you can see, it doesn't work out of the box.

With Prometheus, you need expertise to instrument your application, observe your data, then query and visualize your metrics.

In case you're looking for a simple but powerful out of the box tool to debug and monitor your Node.js application, check out our solution called Trace.


You can find our example repository below, which can help you with more in-depth advice in case you'll choose this way of monitoring your Node.js application.

Example repository: RisingStack/example-prometheus-nodejs

Packing a Kubernetes Microservices App with Helm

In this blog post, I'll show how we packed our Kubernetes microservices app with Helm and made them easy to reproduce in various environments.

Shipping microservices as a single piece of software

At RisingStack we use Kubernetes with tens of microservices to provide our Node.js monitoring solution for our SaaS customers.

During the last couple of months, many enterprises with strict data compliance requirements asked us to make our product available as an on-premises solution. So we had to find a solution that makes it easy for them to install Trace as a single piece of software and hides the complexity of our infrastructure.

It's challenging because Trace contains many small applications, databases, and settings. We wanted to find a solution that's not just easy to ship but is also highly configurable.

As Kubernetes is configuration-based, we started to look for templating solutions, which brought up new challenges. That's how we found Helm, which provides a powerful templating and package management solution for Kubernetes.

Thanks to this process, Trace is now available as an on-premises Node.js monitoring solution, and you can have the same experience in your own cloud as our SaaS customers.


Kubernetes resource definitions

One of the best features of Kubernetes is its configuration-based nature, which allows you to create or modify your resources. You can easily set up and manage your components, from running containers to load balancers, through YAML or JSON files.

Kubernetes makes it super easy to reproduce the same thing, but it can be challenging to modify and manage different Docker image tags, secrets and resource limits per different environments.

Take a look at the following YAML snippet that creates three running replicas from the metrics-processor container with the same DB_URI environment variable:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: metrics-processor
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: metrics-processor
    spec:
      containers:
      - name: metrics-processor
        image: myco/metrics-processor:1.7.9
        env:
        - name: DB_URI
          value: postgres://my-uri

What would happen if we want to ship a different version from our application that connects to a separate database? How about introducing some templating?

For your production application, you would probably use the Kubernetes Secret resource, which expects Base64-encoded strings and makes it even more challenging to configure them dynamically.

Kubernetes Templating challenges

I think we all feel that we need to introduce some kind of templating solution here, but why can it be challenging?

First of all, in Kubernetes, some resources depend on each other. For example, a Deployment can use various secrets, or we may want to run some migration jobs before we kick off our applications. This means that we need a solution which is capable of managing these dependency graphs and can run our templates in the correct order.

Another great challenge comes with managing our configurations and different versions of our templates and variables to update our resources. We really want to avoid the situation where we need to re-create everything just to update our Docker image tag.

This is where Helm comes to rescue the day.

Templating with Helm

Helm is a tool for managing Kubernetes charts. Charts are packages of pre-configured Kubernetes resources.

Helm is an open source project maintained by the Kubernetes organization. It makes it easy to pack, ship and update Kubernetes resources together as a single package.

One of the best parts of Helm is that it comes with an open-source repository maintained by the community, where you can find hundreds of different pre-packed solutions, from databases like MongoDB and Redis, to applications like WordPress and OpenVPN.

With Helm, you can install complex solutions like a Jenkins master-slave architecture in minutes.

helm install --name my-jenkins stable/jenkins  

Helm doesn't just provision your Kubernetes resources in the correct order. It also comes with lifecycle hooks, advanced templating and the concept of sub-charts. For the complete list, I recommend checking out their documentation.
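
For example, the Docker image tag from the earlier Deployment snippet can become a chart value - a minimal sketch of a chart's values.yaml and a templates/deployment.yaml fragment:

# values.yaml
image:
  repository: myco/metrics-processor
  tag: 1.7.9

# templates/deployment.yaml (fragment)
      containers:
      - name: metrics-processor
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"

With this in place, a release can be updated with helm upgrade my-release ./my-chart --set image.tag=1.8.0 without touching any other resource.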

How does Helm work?

Helm works in a client-server architecture, where the Tiller server is an in-cluster server that interacts with the Helm client and interfaces with the Kubernetes API server. It is responsible for combining charts and installing the Kubernetes resources asked for by the client.

The Helm client is a command-line client for end users; it is responsible for communicating with the Tiller server.

Helm example

In this example, I'll show how you can install Jenkins with a master-slave setup on Kubernetes with Azure Container Service in minutes.

First of all, we need a running Kubernetes cluster. Luckily, Azure's Container Service provides a hosted Kubernetes, so I can provision one quickly:

# Provision a new Kubernetes cluster
az acs create -n myClusterName -d myDNSPrefix -g myResourceGroup --generate-ssh-keys --orchestrator-type kubernetes

# Configure kubectl with the new cluster
az acs kubernetes get-credentials --resource-group=myResourceGroup --name=myClusterName  

If you don't have kubectl, run: az acs kubernetes install-cli

After a couple of minutes, when our Kubernetes cluster is ready, we can initialize the Helm Tiller:

helm init  

The helm init command installs Helm Tiller into the current Kubernetes cluster.

On OSX you can install Helm with brew: brew install kubernetes-helm; for other platforms, check out their installation docs.

After Helm is ready to accept charts, I can install Jenkins from the official Helm repository:

helm install --name my-ci --set Master.ServiceType=NodePort,Persistence.Enabled=false stable/jenkins  

For the sake of simplicity and security, I disabled persistent volume and service exposing in this example.

That's it! To visit our freshly installed Jenkins, follow the instructions in the Helm install output or use the kubectl port-forward <pod-name> 8080 terminal command.

Jenkins

In a really short amount of time, we just provisioned a Jenkins master into our cluster, which also runs its slaves in Kubernetes. It is also able to manage our other Kubernetes resources so we can immediately start to build CI pipelines.

Trace as a Helm chart

With Helm, we were able to turn our applications, configurations, autoscaling settings and load balancers into a Helm chart that contains smaller sub-charts, and ship it as a single chart. It makes it possible to easily reproduce our whole infrastructure in a couple of minutes.

We're not just using this to ship the on-premises version of Trace, but we can also easily run multiple test environments or even move/copy our entire SaaS infrastructure between multiple cloud providers. We only need a running Kubernetes cluster.

To make it easy to turn our existing Kubernetes resources into a Helm chart, we created an npm library called anchor. Anchor automatically extracts configurations from resources and saves them as values and templates in a reproducible Helm chart.

Keeping Helm charts in sync

To keep our charts in sync with our infrastructure, we changed our release process to update our Helm repository and modify the chart's Docker image tag. For this, we created a small service that uses the GitHub API; it is triggered by our CI.

Kubernetes & Helm

The popularity of Kubernetes is increasing rapidly, while hosted cluster solutions are becoming available from cloud providers like Azure. With Helm, you can ship and install complex microservices applications or databases into your Kubernetes cluster.

It was never easier to try out new technologies and ship awesome features.

If you have any questions about Kubernetes, Helm, or the whole process, please let me know in the comments section!

CQRS Explained

What is CQRS?

CQRS is an architectural pattern, where the acronym stands for Command Query Responsibility Segregation. We can talk about CQRS when the data read operations are separated from the data write operations, and they happen on a different interface.

In most CQRS systems, read and write operations use different data models, sometimes even different data stores. This kind of segregation makes it easier to scale read and write operations and to control security - but it adds extra complexity to your system.


The level of segregation can vary in CQRS systems:

  • a single data store with separate models for reading and updating data
  • separate data stores with separate models for reading and updating data

In the simplest data store separation, we can use read-only replicas to achieve segregation.

Why and when to use CQRS?

In a typical data management system, all CRUD (Create Read Update Delete) operations are executed on the same interface of the entities in a single data storage - like creating, updating, querying and deleting table rows in an SQL database via the same model.

CQRS really shines compared to the traditional approach (using a single model) when you build complex data models to validate and fulfil your business logic when data manipulation happens. Read operations can be very different from, or much simpler than, write operations - for example, accessing only a subset of your data.
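
As a minimal sketch of what this interface-level segregation can look like in a Node.js service - the db modules and the validateUser helper below are hypothetical, only for illustration:

const writeDb = require('./db/primary')      // hypothetical primary connection
const readDb = require('./db/read-replica')  // hypothetical read-only replica

// command side: validates, applies the business rules, then persists
async function createUser (command) {
  validateUser(command) // hypothetical validation holding the business logic
  await writeDb.collection('users').insertOne({ name: command.name })
}

// query side: no business logic, just shapes the data for the client
function getUserNames () {
  return readDb.collection('users')
    .find({}, { projection: { name: 1 } })
    .toArray()
}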

Real world example

In our Node.js Monitoring Tool, we use CQRS to segregate saving and representing the data. For example, when you see a distributed tracing visualization on our UI, the data behind it arrived in smaller chunks from our customers' application agents to our public collector API.

In the collector API, we only do a thin validation and send the data to a messaging queue for processing. On the other end of the queue, workers are consuming messages and resolving all the necessary dependencies via other services. These workers are also saving the transformed data to the database.

If any issue happens, we send the message back to our messaging queue with exponential backoff and a maximum retry limit. Compared to this complex data writing flow, on the representation side of the flow we only query a read-replica database and visualize the result to our customers.
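
A simplified sketch of such a worker - the queue client, its methods and processMessage are hypothetical, only to illustrate the retry flow:

const queue = require('./queue') // hypothetical message queue client
const MAX_RETRIES = 5

queue.consume('collected-metrics', (message) =>
  processMessage(message.payload) // hypothetical: resolve dependencies, transform, save to DB
    .catch(() => {
      if (message.retryCount >= MAX_RETRIES) {
        return queue.sendToDeadLetter(message) // give up after the max limit
      }
      // exponential backoff: 1s, 2s, 4s, 8s...
      return queue.requeue(message, { delay: Math.pow(2, message.retryCount) * 1000 })
    })
)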

Microservices with CQRS (Trace by RisingStack's data processing with CQRS)

CQRS and Event Sourcing

I've seen many times that people confuse these two concepts. Both of them are heavily used in event driven infrastructures, like event driven microservices, but they mean very different things.

To read more about Event Sourcing with Examples, check out our previous Node.js at Scale article.


Reporting database - Denormalizer

In some event driven systems, CQRS is implemented in a way that the system contains one or multiple Reporting databases.

A Reporting database is an entirely different read-only storage that models and persists the data in the best format for representing it. It's okay to store it in a denormalized format to optimize it for the client needs. In some cases, the reporting database contains only derived data, even from multiple data sources.

In a microservices architecture, we call a service the Denormalizer if it listens for some events and maintains its Reporting Database based on them. The client reads the Denormalizer service's Reporting Database.

An example: the user profile service emits a user.edit event with the { id: 1, name: 'John Doe', state: 'churn' } payload; the Denormalizer service listens to it but only stores { name: 'John Doe' } in its Reporting Database, because the client is not interested in the internal churn state of the user.
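
A sketch of such a Denormalizer service - the event bus and reporting DB clients here are hypothetical:

const eventBus = require('./eventBus')       // hypothetical event bus client
const reportingDb = require('./reportingDb') // hypothetical reporting DB client

eventBus.on('user.edit', (payload) => {
  // persist only the fields the client needs, in a read-optimized format
  reportingDb.collection('users').updateOne(
    { _id: payload.id },
    { $set: { name: payload.name } }, // internal fields like `state` are dropped
    { upsert: true }
  )
})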

It can be hard to keep a Reporting Database in sync. Usually, we can only aim for eventual consistency.

A CQRS Node.js Example Repo

For our CQRS with Denormalizer Node.js example visit our cqrs-example GitHub repository.

CQRS Example

Outro

CQRS is a powerful architectural pattern to segregate read and write operations and their interfaces, but it also adds extra complexity to your system. In most of the cases, you shouldn't use CQRS for the whole system, only for specific parts where the complexity and scalability make it necessary.

To read more about CQRS and Reporting Databases, I recommend checking out these resources:

In the next chapter of the Node.js at Scale series we'll discuss Node.js Testing and Getting TDD Right. Read on! :)

I’m happy to answer your CQRS related questions in the comments section!

Event Sourcing with Examples in Node.js

Event Sourcing is a powerful architectural pattern to handle complex application states that may need to be rebuilt, re-played, audited or debugged.

From this article you can learn what Event Sourcing is and when you should use it. We’ll also take a look at some Event Sourcing examples with code snippets.

Node.js at Scale is a collection of articles focusing on the needs of companies with bigger Node.js installations and advanced Node developers.

Event Sourcing

Event Sourcing is a software architecture pattern which makes it possible to reconstruct past states (the latest state as well). It's achieved by storing every state change as a sequence of events.

The State of your application is like a user's account balance or subscription at a particular time. This current state may only exist in memory.

Good examples of Event Sourcing are version control systems, which store the current state as diffs. The current state is your latest source code, and the events are your commits.

Why is Event Sourcing useful?

In our hypothetical example, you are working on an online money transfer site, where every customer has an account balance. Imagine that you just started working on a beautiful Monday morning when it suddenly turns out that you made a mistake and used a wrong currency exchange rate for the whole past week. In this case, every account which sent or received money in the last seven days is in a corrupt state.

With event sourcing, there’s no need to panic!

If your site uses event sourcing, you can revert the account balances to their previous uncorrupted state, fix the exchange rate and replay all the events until now. That's it, your job and reputation are saved!




Other use-cases

You can use events to audit or debug state changes in your system. They can also be useful for handling SaaS subscriptions. In a usual subscription based system, your users can buy a plan, upgrade it, downgrade it, pro-rate a current price, cancel a plan, apply a coupon, and so on... A good event log can be very useful to figure out what happened.

So with event sourcing you can:

  • Rebuild states completely
  • Replay states from a specific time
  • Reconstruct the state of a specific moment for a temporary query

What is an Event?

An Event is something that happened in the past. An Event is not a snapshot of a state at a specific time; it's the action itself with all the information that's necessary to replay it.

An event should be a simple object which describes an action that occurred. Events should be immutable and stored in an append-only way. Their immutable, append-only nature also makes them suitable for use as audit logs.

This is what makes it possible to undo and redo events, or even replay them from a specific timestamp.
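
For example, a transfer event could look like the following illustrative object - frozen here to emphasize its immutability:

// the action itself, with all the information needed to replay it
const transferEvent = Object.freeze({
  type: 'transfer',
  fromId: 'account1',
  toId: 'account2',
  amount: 50,
  time: 2
})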

Be careful with External Systems!

Like any software pattern, Event Sourcing can be challenging at some points as well.

The external systems that your application communicates with are usually not prepared for event sourcing, so you should be careful when you replay your events. I’m sure that you don’t wish to charge your customers twice or send all welcome emails again.

To solve this challenge, you should handle replays in your communication layers!
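
As a sketch of such a guard - the isReplay flag, the mailer module and updateUserProjection are hypothetical; the point is that side-effects are skipped while replaying:

const mailer = require('./mailer') // hypothetical email sender

function onUserRegistered (event, { isReplay = false } = {}) {
  updateUserProjection(event) // hypothetical; rebuilding state is always safe to repeat

  if (!isReplay) {
    // side-effects only run for "live" events, never during a replay
    mailer.sendWelcomeEmail(event.email)
  }
}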

Command Sourcing

Command Sourcing is a different approach from Event Sourcing - make sure you don’t mix ‘em up by accident!

Event Sourcing:

  • Persist only changes in state
  • Replay can be side-effect free

Command Sourcing:

  • Persist Commands
  • Replay may trigger side-effects

Example for Event Sourcing

In this simple example, we will apply Event Sourcing for our accounts:

// current account states (how it looks in our DB now)
const accounts = {  
  account1: { balance: 100 },
  account2: { balance: 50 }
}
// past events (should be persisted somewhere, for example in a DB)
const events = [  
  { type: 'open', id: 'account1', balance: 150, time: 0 },
  { type: 'open', id: 'account2', balance: 0, time: 1 },
  { type: 'transfer', fromId: 'account1', toId: 'account2', amount: 50, time: 2 }
]

Let's rebuild the latest state from scratch, using our event log:

// complete rebuild
const accounts = events.reduce((accounts, event) => {
  if (event.type === 'open') {
    accounts[event.id] = { balance: event.balance }
  } else if (event.type === 'transfer') {
    accounts[event.fromId].balance -= event.amount
    accounts[event.toId].balance += event.amount
  }
  return accounts
}, {})

Undo the latest event:

// undo last event: remove it from the log and apply its inverse to the current state
const previousAccounts = events.splice(-1).reduce((accounts, event) => {
  if (event.type === 'open') {
    delete accounts[event.id]
  } else if (event.type === 'transfer') {
    accounts[event.fromId].balance += event.amount
    accounts[event.toId].balance -= event.amount
  }
  return accounts
}, accounts)

Query accounts state at a specific time:

// query specific time
function getAccountsAtTime (time) {
  return events.reduce((accounts, event) => {
    // skip events that happened after the queried time
    if (event.time > time) {
      return accounts
    }

    if (event.type === 'open') {
      accounts[event.id] = { balance: event.balance }
    } else if (event.type === 'transfer') {
      accounts[event.fromId].balance -= event.amount
      accounts[event.toId].balance += event.amount
    }
    return accounts
  }, {})
}

const accounts = getAccountsAtTime(1)  


Learning more

For more detailed examples, you can check out our Event Sourcing Example repository.

For a more general and deeper understanding of Event Sourcing, I recommend reading these articles:

In the next part of the Node.js at Scale series, we’ll learn about Command Query Responsibility Segregation. Make sure you check back in a week!

If you have any questions on this topic, please let me know in the comments section below!

Graceful shutdown with Node.js and Kubernetes

This article helps you understand what graceful shutdown is, what its main benefits are, and how you can set up the graceful shutdown of a Kubernetes application. We’ll discuss how you can validate and benchmark this process, and what the most common mistakes are that you should avoid.

Graceful shutdown

We can speak about the graceful shutdown of our application when all of the resources it used and all of the traffic and/or data processing it handled are closed and released properly.

It means that no database connection remains open and no ongoing request fails because we stopped our application.

"Graceful shutdown: when all of the resources & data processing are closed and released properly." via @RisingStack

Click To Tweet

Possible scenarios for a graceful web server shutdown:

  1. App gets notification to stop (receives SIGTERM)
  2. App lets the load balancer know that it’s not ready for newer requests
  3. App serves all the ongoing requests
  4. App releases all of the resources correctly: DB, queue, etc.
  5. App exits with "success" status code (process.exit())

This article goes deep into shutting down web servers properly, but you should also apply these techniques to your worker processes: it’s highly recommended to stop consuming queues on SIGTERM and to finish the current task/job, as the sketch below shows.
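
A sketch for such a worker - the queue client and its stopConsuming/currentTask members are hypothetical:

const queue = require('./queue') // hypothetical queue client

process.on('SIGTERM', () => {
  queue.stopConsuming()  // don't pick up new jobs
  queue.currentTask      // a promise for the job in progress: let it finish
    .then(() => queue.disconnect())
    .then(() => process.exit(0))
    .catch(() => process.exit(1))
})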

Why is it important?

If we don't stop our application correctly, we are wasting resources like DB connections and we may also break ongoing requests. An HTTP request doesn't recover automatically - if we fail to serve it, then we simply missed it.

"If we don't stop our app correctly, we're wasting resources & may also break ongoing requests." via @RisingStack

Click To Tweet

Graceful start

We should only start our application when all of the dependencies and database connections are ready to handle our traffic.

Possible scenarios for a graceful web server start:

  1. App starts (npm start)
  2. App opens DB connections
  3. App listens on port
  4. App tells the load balancer that it’s ready for requests

Graceful shutdown in a Node.js application

First of all, you need to listen for the SIGTERM signal and catch it:

process.on('SIGTERM', function onSigterm () {  
  console.info('Got SIGTERM. Graceful shutdown start', new Date().toISOString())
  // start graceful shutdown here
  shutdown()
})

After that, you can close your server, then close your resources and exit the process:

function shutdown() {  
  server.close(function onServerClosed (err) {
    if (err) {
      console.error(err)
      process.exit(1)
    }

    closeMyResources(function onResourcesClosed (err) {
      // error handling
      process.exit()
    })
  })
}

Sounds easy right? Maybe a little bit too easy.

What about the load balancer? How will it know that your app is not ready to receive further requests? What about keep-alive connections? Will they keep the server open for a longer time? What if my app gets a SIGKILL in the meantime?

Graceful shutdown with Kubernetes

If you’d like to learn a little bit about Kubernetes, you can read our Moving a Node.js app from PaaS to Kubernetes Tutorial. For now, let's just focus on the shutdown.

Kubernetes comes with a resource called Service. Its job is to route traffic to your pods (~instances of your app). Kubernetes also comes with a thing called Deployment that describes how your applications should behave during exit, scale and deploy - and you can also define a health check here. We will combine these resources for the perfect graceful shutdown and handover during new deploys at high traffic.

We would like to see throughput charts like below with consistent rpm and no deployment side effects at all:

Throughput metrics shown in Trace - no change at deploy

Ok, let's see how to solve this challenge.

Setting up graceful shutdown

In Kubernetes, for a proper graceful shutdown we need to add a readinessProbe to our application’s Deployment yaml and let the Service’s load balancer know during the shutdown that we will not serve more requests so it should stop sending them. We can close the server, tear down the DB connections and exit only after that.

How does it work?

Kubernetes graceful shutdown flowchart

  1. pod receives SIGTERM signal because Kubernetes wants to stop it - because of deploy, scale, etc.
  2. App (pod) starts to return 500 for GET /health to let readinessProbe (Service) know that it's not ready to receive more requests.
  3. Kubernetes readinessProbe checks GET /health, and after (failureThreshold * periodSeconds) it stops redirecting traffic to the app (because it continuously returns 500)
  4. App waits (failureThreshold * periodSeconds) before it starts to shut down - to make sure that the Service is getting notified via the readinessProbe failure
  5. App starts graceful shutdown
  6. App first closes server with live working DB connections
  7. App closes databases after the server is closed
  8. App exits process
  9. Kubernetes force kills the application after 30s (SIGKILL) if it's still running (in an optimal case it doesn't happen)

In our case, the Kubernetes livenessProbe won't kill the app before the graceful shutdown happens, because it needs to wait (failureThreshold * periodSeconds) to do it. This means that the livenessProbe threshold should be larger than the readinessProbe threshold: this way the graceful stop happens around 4 seconds after SIGTERM, while the force kill would happen 30 seconds after it.
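
In the Deployment config, the probe timing described above could look like this - the numbers are illustrative:

readinessProbe:
  httpGet:
    path: /health
    port: 3000
  periodSeconds: 2
  failureThreshold: 2   # traffic stops after ~4s of failing checks
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  periodSeconds: 10
  failureThreshold: 3   # the force restart comes much later than the graceful stop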


How to achieve it?

For this, we need to do two things. First, we need to let the readinessProbe know after SIGTERM that we are not ready anymore:

'use strict'

const db = require('./db')  
const promiseTimeout = require('./promiseTimeout')  
const state = { isShutdown: false }  
const TIMEOUT_IN_MILLIS = 900

process.on('SIGTERM', function onSigterm () {  
  state.isShutdown = true
})

function get (req, res) {  
  // SIGTERM already happened
  // app is not ready to serve more requests
  if (state.isShutdown) {
    res.writeHead(500)
    return res.end('not ok')
  }

  // something cheap but tests the required resources
  // timeout because we would like to log before livenessProbe KILLS the process
  promiseTimeout(db.ping(), TIMEOUT_IN_MILLIS)
    .then(() => {
      // success health
      res.writeHead(200)
      return res.end('ok')
    })
    .catch(() => {
      // broken health
      res.writeHead(500)
      return res.end('not ok')
    })
}

module.exports = {  
  get: get
}

The second thing is that we have to delay the teardown process - as a sane default, you can use the time needed for two failed readinessProbe checks: failureThreshold (2) * periodSeconds (2) = 4s

const READINESS_PROBE_DELAY = 2 * 2 * 1000 // failureThreshold * periodSeconds

process.on('SIGTERM', function onSigterm () {
  console.info('Got SIGTERM. Graceful shutdown start', new Date().toISOString())

  // Wait a little bit to give enough time for the Kubernetes readiness probe to fail
  // (we are not ready to serve more traffic)
  // Don't worry, the livenessProbe won't kill it until (failureThreshold: 3) => 30s
  setTimeout(gracefulStop, READINESS_PROBE_DELAY)
})

You can find the full example here:
https://github.com/RisingStack/kubernetes-graceful-shutdown-example

How to validate it?

Let's test our graceful shutdown by sending high traffic to our pods and releasing a new version in the meantime (recreating all of the pods).

Test case

$ ab -n 100000 -c 20 http://localhost:myport

Other than this, you need to change an environment variable in the Deployment to re-create all pods during the ab benchmarking.

AB output

Document Path:          /  
Document Length:        3 bytes

Concurrency Level:      20  
Time taken for tests:   172.476 seconds  
Complete requests:      100000  
Failed requests:        0  
Total transferred:      7800000 bytes  
HTML transferred:       300000 bytes  
Requests per second:    579.79 [#/sec] (mean)  
Time per request:       34.495 [ms] (mean)  
Time per request:       1.725 [ms] (mean, across all concurrent requests)  
Transfer rate:          44.16 [Kbytes/sec] received  

Application log output

Got SIGTERM. Graceful shutdown start 2016-10-16T18:54:59.208Z  
Request after sigterm: / 2016-10-16T18:54:59.217Z  
Request after sigterm: / 2016-10-16T18:54:59.261Z  
...
Request after sigterm: / 2016-10-16T18:55:00.064Z  
Request after sigterm: /health?type=readiness 2016-10-16T18:55:00.820Z  
HEALTH: NOT OK  
Request after sigterm: /health?type=readiness 2016-10-16T18:55:02.784Z  
HEALTH: NOT OK  
Request after sigterm: /health?type=liveness 2016-10-16T18:55:04.781Z  
HEALTH: NOT OK  
Request after sigterm: /health?type=readiness 2016-10-16T18:55:04.800Z  
HEALTH: NOT OK  
Server is shutting down... 2016-10-16T18:55:05.210Z  
Successful graceful shutdown 2016-10-16T18:55:05.212Z  

Benchmark result

Success!

Zero failed requests: you can see in the app log that the Service stopped sending traffic to the pod before we disconnected from the DB and killed the app.

Common gotchas

The following mistakes can still prevent your app from doing a proper graceful shutdown:

Keep-alive connections

Kubernetes doesn't hand over keep-alive connections properly. :/

This means that requests from agents with a keep-alive header will still be routed to the pod.

It tricked me at first when I benchmarked with autocannon or Google Chrome (they use keep-alive connections).

Keep-alive connections prevent your server from closing in time. To force the exit of a process, you can use the server-destroy module. Once it has run, you can be sure that all the ongoing requests are served. Alternatively, you can add timeout logic to your server.close(cb).
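
A sketch of how server-destroy can be wired in - it patches the server with a destroy method that tracks and closes the open sockets (handler is just a placeholder here):

const http = require('http')
const enableDestroy = require('server-destroy')

const server = http.createServer(handler) // handler is a placeholder
server.listen(3000)
enableDestroy(server) // adds server.destroy(cb)

process.on('SIGTERM', () => {
  // closes the server and destroys all open (keep-alive) connections
  server.destroy(() => process.exit(0))
})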

Docker signaling

It’s quite possible that your application doesn't receive the signals correctly in a dockerized environment.

For example, in our Alpine image CMD ["node", "src"] works, but CMD ["npm", "start"] doesn't: npm simply doesn't pass the SIGTERM to the node process. The issue is probably related to this PR: https://github.com/npm/npm/pull/10868

An alternative is to use dumb-init for fixing broken Docker signaling.
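
In a Dockerfile it could be wired in like this - a sketch where the base image and paths are illustrative, and dumb-init is assumed to be available in the Alpine package repository:

FROM node:alpine
RUN apk add --no-cache dumb-init
WORKDIR /app
COPY . .
# dumb-init runs as PID 1 and forwards signals (like SIGTERM) to node
ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "src"]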

Takeaway

Always be sure that your application stops correctly: it releases all of the resources and helps hand over the traffic to the new version of your app.

Check out our example repository with Node.js and Kubernetes:
https://github.com/RisingStack/kubernetes-graceful-shutdown-example

"An app stops correctly if it releases all resources & hands over the traffic to your new app." via @RisingStack

Click To Tweet

If you have any questions or thoughts about this topic, find me in the comment section below!


Moving a Node.js app from PaaS to Kubernetes Tutorial

From this Kubernetes tutorial, you can learn how to move a Node.js app from a PaaS provider while achieving lower response times, improving security and reducing costs.


Before we jump into the story of why and how we migrated our services to Kubernetes, it's important to mention that there is nothing wrong with using a PaaS. PaaS is perfect to start building a new product, and it can also turn out to be a good solution as an application advances - it always depends on your requirements and resources.

PaaS

Trace by RisingStack, our Node.js monitoring solution, was running on one of the biggest PaaS providers for more than half a year. We chose a PaaS over other solutions because we wanted to focus more on the product instead of the infrastructure.

Our requirements were simple; we wanted to have:

  • fast deploys,
  • simple scaling,
  • zero-downtime deployments,
  • rollback capabilities,
  • environment variable management,
  • various Node.js versions,
  • and "zero" DevOps.

What we didn't want to have, but got as a side effect of using PaaS:

  • big network latencies between services,
  • lack of VPC,
  • response time peaks because of the multitenancy,
  • larger bills (pay for every single process, no matter how small it is: clock, internal API, etc.).

Trace is developed as a group of microservices - you can imagine how quickly the network latency and the billing started to hurt us.

Kubernetes tutorial

From our PaaS experience, we knew that we were looking for a solution that requires very little DevOps effort but provides a similar flow for our developers. We didn't want to lose any of the advantages I mentioned above - however, we wanted to fix the outstanding issues.

We were looking for an infrastructure that is more configuration-based, and anyone from the team can modify it.

Kubernetes, with its configuration-focused, container-based and microservices-friendly nature, convinced us.

"Kubernetes convinced us with its configuration-focused, microservices friendly nature" via @RisingStack #kubernetes

Click To Tweet

Let me show you what I mean by these "buzzwords" through the upcoming sections.


What is Kubernetes?

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications - kubernetes.io

I don't want to give a very deep intro about the Kubernetes elements here, but you need to know the basic ones for the upcoming parts of this post.

My definitions won't be 100% correct, but you can think of it as a PaaS to Kubernetes dictionary:

  • pod: your running containerized application together with its environment variables, disk, etc.; pods are born and die quickly, for example at deploys,

    • in PaaS: ~the currently running app
  • deployment: the configuration of your application that describes what state you need (CPU, memory, env. vars, Docker image version, disks, number of running instances, deploy strategy, etc.):

    • in PaaS: ~the app settings
  • secret: lets you separate your credentials from environment variables,

    • in PaaS: doesn't exist; think of it as a shared, separated store for secret environment variables, like DB credentials
  • service: exposes your running pods by label(s) to other apps or to the outside world on the desired IP and port

    • in PaaS: a built-in, non-configurable load balancer

How to set up a running Kubernetes cluster?

You have several options here. The easiest one is to create a Container Engine cluster in Google Cloud, which is hosted Kubernetes. It's also well integrated with other Google Cloud components, like load balancers and disks.

You should also know that Kubernetes can run anywhere: AWS, DigitalOcean, Azure, etc. For more information, check out the CoreOS Kubernetes tools.

Running the application

First, we have to prepare our application to work well with Kubernetes in a Docker environment.

If you are looking for a tutorial on how to start an app from scratch with Kubernetes check out their 101 tutorial.

Node.js app in Docker container

Kubernetes is Docker-based, so first we need to containerize our application. If you are not sure how to do that, check out our previous post: Dockerizing Your Node.js Application

If you are a private NPM user, you will also find this one helpful: Using the Private NPM Registry from Docker

"Procfile" in Kubernetes

We create one Docker image for every application (Git repository). If the repository contains multiple processes, like a server, a worker and a clock, we choose between them with an environment variable, as sketched below. You may find this strange, but we don't want to build and push multiple Docker images from the very same source code; it would slow down our CI.
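
A minimal sketch of such an entry point - the PROCESS_TYPE variable name and the module paths are illustrative:

// index.js - selects the process to run based on an environment variable
const processType = process.env.PROCESS_TYPE

if (processType === 'server') {
  require('./server')
} else if (processType === 'worker') {
  require('./worker')
} else if (processType === 'clock') {
  require('./clock')
} else {
  throw new Error(`${processType} is an unsupported process type`)
}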

Environments, rollback, and service-discovery

Staging, production

During our PaaS period, we named our services like trace-foo and trace-foo-staging; the only difference between the staging and production applications was the name prefix and the different environment variables. In Kubernetes it's possible to define namespaces. Each namespace is totally independent from the others and doesn't share any resources like secrets, configs, etc.

$ kubectl create namespace production
$ kubectl create namespace staging

Application versions

In a containerized infrastructure, each application version should be a different container image with a tag. We use the Git short hash as a Docker image tag.

foo:b37d759  
foo:f53a7cb  

To deploy a new version of your application, you only need to change the image tag in your application's deployment config, Kubernetes will do the rest.

Kubernetes Tutorial: The Deploy Flow (Deploy flow)

Any change in your deployment file is versioned, and you can roll back to any of them anytime.

$ kubectl rollout history deployment/foo
deployments "foo":  
REVISION    CHANGE-CAUSE  
1           kubectl set image deployment/foo foo=foo:b37d759  
2           kubectl set image deployment/foo foo=foo:f53a7cb  

During our deploy process, we only replace Docker images, which is quite fast - it only requires a couple of seconds.

Service discovery

Kubernetes has a built-in simple service discovery solution: The created services expose their hostname and port as an environment variable for each pod.

const fooServiceUrl = `http://${process.env.FOO_SERVICE_HOST}:${process.env.FOO_SERVICE_PORT}`  

If you don't need advanced discovery, you can just start using it, instead of copying your service URLs to each other's environment variables. Kind of cool, isn't it?

Production ready application

The really challenging part of jumping into a new technology is knowing what you need to be production ready. In the following sections, we will check what you should consider setting up in your app.

Zero downtime deployment and failover

Kubernetes can update your application in a way that it always keeps some pods running and deploys your changes in smaller steps - instead of stopping and starting all of them at the same time.

This is not only helpful for zero downtime deploys; it also avoids killing your whole application when you misconfigure something. After Kubernetes detects that your new pods are unhealthy, your mistake stops escalating to all of the running pods.

Kubernetes supports several strategies to deploy your applications. You can check them in the Deployment strategy documentation.

Graceful stop

It’s not mainly related to Kubernetes, but it’s impossible to have a good application lifecycle without starting and stopping your process in a proper way.

Start server

const server = MyServer()
Promise.all([
  db1.connect(),
  db2.connect()
])
  .then(() => server.listen(3000))

Graceful server stop

process.on('SIGTERM', () => {
  server.close() // assuming MyServer exposes a promisified close
    .then(() => Promise.all([
      db1.disconnect(),
      db2.disconnect()
    ]))
    .then(() => process.exit(0))
    .catch((err) => process.exit(-1))
})

Liveness probe (health check)

In Kubernetes, you should define a health check (liveness probe) for your application. With this, Kubernetes will be able to detect when your application needs to be restarted.

Web server health check

You have multiple options to check your application's health, but I think the easiest one is to create a GET /healthz endpoint and check your application's logic / DB connections there. It’s important to mention that every application is different; only you can know what checks are necessary to make sure it’s working.

app.get('/healthz', function (req, res, next) {
  // check my health
  // -> return next(new Error('DB is unreachable'))
  res.sendStatus(200)
})

And the corresponding liveness probe in the Deployment config:

livenessProbe:
    httpGet:
      # Path to probe; should be cheap, but representative of typical behavior
      path: /healthz
      port: 3000
    initialDelaySeconds: 30
    timeoutSeconds: 1

Worker health check

For our workers we also set up a very small HTTP server with the same /healthz endpoint which checks different criteria with the same liveness probe. We do it to have company-wide consistent health check endpoints.

Readiness probe

The readiness probe is similar to the liveness probe (health check), but it makes sense only for web servers. It tells the Kubernetes service (~load balancer) that the traffic can be redirected to the specific pod.

It is essential to avoid any service disruption during deploys and other issues.

readinessProbe:  
    httpGet:
      # You can use the /healthz or something else
      path: /healthz
      port: 3000
    initialDelaySeconds: 30
    timeoutSeconds: 1

Logging

For logging, you can choose from different approaches, like adding side containers to your application which collect your logs and send them to custom logging solutions, or you can go with the built-in Google Cloud one. We selected the built-in one.

To be able to parse the built-in log levels (severity) on Google Cloud, you need to log in a specific format. You can achieve this easily with the winston-gke module.

// setup logger
const logger = require('winston')
const winstonGke = require('winston-gke')
logger.remove(logger.transports.Console)
winstonGke(logger, config.logger.level)

// usage
logger.info('I\'m a potato', { foo: 'bar' })
logger.warning('So warning')
logger.error('Such error')
logger.debug('My debug log')

If you log in this specific format, Kubernetes will automatically merge your log messages with the container, deployment, etc. meta information, and Google Cloud will show them in the right format.

Your application's first log message has to be in the right format, otherwise Google Cloud won’t start to parse it correctly.

To achieve this, we made our npm start silent (npm start -s) in the Dockerfile: CMD ["npm", "start", "-s"]

Monitoring

We check our applications with Trace, which is optimized from scratch to monitor and visualize microservice architectures. The service map view of Trace helped us a lot during the migration to understand which application communicates with which one, and what the database and external dependencies are.

Kubernetes Tutorial: Services in our infrastructure (Services in our infrastructure)

Since Trace is environment independent, we didn't have to change anything in our codebase, and we could use it to validate the migration and our expectations about the positive performance changes.

Kubernetes tutorial : Response times after the migration (Stable and fast response times)

Example

Check out our all together example repository for Node.js with Kubernetes and CircleCI:
https://github.com/RisingStack/kubernetes-nodejs-example

Tooling

Continuous deployment with CI

It's possible to update your Kubernetes deployment with a JSON patch, or to update only the image tag. After you have a working kubectl on your CI machine, you only need to run this command:

$ kubectl --namespace=staging set image deployment/foo foo=foo:GIT_SHORT_SHA

Debugging

In Kubernetes, it's possible to run a shell inside any running container. It's this easy:

$ kubectl get pod

NAME           READY     STATUS    RESTARTS   AGE  
foo-37kj5   1/1       Running   0          2d

$ kubectl exec foo-37kj5 -i -t -- sh
# whoami       
root  

Another useful thing is to check the pod events with:

$ kubectl describe pod foo-37kj5

You can also get the logs of any pod with:

$ kubectl logs foo-37kj5

Code piping

At our PaaS provider, we liked code piping between staging and production infrastructure. In Kubernetes we missed this, so we built our own solution.

It's a simple npm library which reads the current image tag from staging and sets it on the production deployment config.

Because the Docker container is the very same, only the environment variable changes.

SSL termination (https)

Kubernetes services are not exposed as https by default, but you can easily change this. To do so, read how to expose your applications with TLS in Kubernetes.

Conclusion

To summarize our experience with Kubernetes: we are very satisfied with it.

"Kubernetes helped us to improve our response times and failover + reduced our costs" via @RisingStack #kubernetes

Click To Tweet

We improved our applications' response times in our microservices architecture. We managed to raise security to the next level with the private network (VPC) between apps.

Also, we reduced our costs and improved the failover with the built-in rolling update strategy and the liveness and readiness probes.

If you are at a point where you need to think about your infrastructure's future, you should definitely take Kubernetes into consideration!

If you have questions about migrating to Kubernetes from a PaaS, feel free to post them in the comment section.

React.js Best Practices for 2016

2015 was the year of React with tons of new releases and developer conferences dedicated to the topic all over the world. For a detailed list of the most important milestones of last year, check out our React in 2015 wrap up.

The most interesting question for 2016: How should we write an application and what are the recommended libraries?

As a developer who has worked with React.js for a long time, I have my own answers and best practices, but it's possible that you won’t agree with me on everything. I’m interested in your ideas and opinions: please leave a comment so we can discuss them.

React.js logo - Best Practices for 2016

If you are just getting started with React.js, check out our React.js tutorial, or the React howto by Pete Hunt.

Dealing with data

Handling data in a React.js application is super easy, but challenging at the same time.
This is because you can pass properties to a React component in many ways to build a rendering tree from them; however, it's not always obvious how you should update your view.

2015 started with the releases of different Flux libraries and continued with more functional and reactive solutions.


Let's see where we are now:

Flux

According to our experience, Flux is often overused (meaning that people use it even when they don't need it).

Flux provides a clean way to store and update your application's state and trigger rendering when it's needed.

Flux can be useful for the app's global states, like managing the logged in user, the state of a router or the active account, but it can quickly turn into pain if you start to manage your temporary or local data with it.

We don’t recommend using Flux for managing route-related data like /items/:itemId. Instead, just fetch it and store it in your component's state. In this case, it will be destroyed when your component goes away.

If you need more info about Flux, The Evolution of Flux Frameworks is a great read.

Use redux

Redux is a predictable state container for JavaScript apps.

If you think you need Flux or a similar solution you should check out redux and Dan Abramov's Getting started with redux course to quickly boost your development skills.

Redux evolves the ideas of Flux but avoids its complexity by taking cues from Elm.

Keep your state flat

APIs often return nested resources. It can be hard to deal with them in a Flux or Redux-based architecture. We recommend flattening them with a library like normalizr and keeping your state as flat as possible.

Hint for pros:

import _ from 'lodash'
import { normalize, arrayOf } from 'normalizr'

const data = normalize(response, arrayOf(schema.user))

state = _.merge(state, data.entities)

(we use isomorphic-fetch to communicate with our APIs)

Use immutable states

Shared mutable state is the root of all evil - Pete Hunt, React.js Conf 2015

Immutable logo for React.js Best Practices 2016

An immutable object is an object whose state cannot be modified after it is created.

Immutable objects can save us all a headache and improve the rendering performance with their reference-level equality checks. Like in the shouldComponentUpdate:

shouldComponentUpdate(nextProps) {
  // instead of an object deep comparison
  return this.props.immutableFoo !== nextProps.immutableFoo
}

How to achieve immutability in JavaScript?
The hard way is to be careful and write code like the example below, which you should always check in your unit tests with deep-freeze-node (freeze before the mutation and verify the result after it).

return {  
  ...state,
  foo
}

return arr1.concat(arr2)  

Believe me, these were the obvious examples.

The less complicated, but also less natural, way is to use Immutable.js.

import { fromJS } from 'immutable'

const state = fromJS({ bar: 'biz' })
const newState = state.set('bar', 'baz')

Immutable.js is fast, and the idea behind it is beautiful. I recommend watching the Immutable Data and React video by Lee Byron even if you don't want to use it. It will give you deep insight into how it works.

Observables and reactive solutions

If you don't like Flux/Redux or just want to be more reactive, don't be disappointed! There are other solutions to deal with your data. Here is a short list of the libraries you are probably looking for:

  • cycle.js ("A functional and reactive JavaScript framework for cleaner code")
  • rx-flux ("The Flux architecture with RxJS")
  • redux-rx ("RxJS utilities for Redux.")
  • mobservable ("Observable data. Reactive functions. Simple code.")

Routing

Almost every client side application has some routing. If you are using React.js in a browser, you will reach the point when you should pick a library.

Our chosen one is the react-router by the excellent rackt community. Rackt always ships quality resources for React.js lovers.

To integrate react-router check out their documentation, but what's more important here: if you use Flux/Redux we recommend to keep your router's state in sync with your store/global state.

Synchronized router states will help you to control router behaviors by Flux/Redux actions and read router states and parameters in your components.

Redux users can simply do it with the redux-simple-router library.

Code splitting, lazy loading

Only a few webpack users know that it's possible to split your application’s code and separate the bundler's output into multiple JavaScript chunks:

require.ensure([], () => {  
  const Profile = require('./Profile.js')
  this.setState({
    currentComponent: Profile
  })
})

It can be extremely useful in large applications because the user's browser doesn't have to download rarely used code, like the profile page, after every deploy.

Having more chunks will cause more HTTP requests - but that’s not a problem with HTTP/2's multiplexing.

Combining with chunk hashing you can also optimize your cache hit ratio after code changes.

The next version of react-router will help a lot in code splitting.

For the future of react-router check out this blog post by Ryan Florence: Welcome to Future of Web Application Delivery.

Components

A lot of people are complaining about JSX. First of all, you should know that it’s optional in React.

At the end of the day, it will be compiled to JavaScript with Babel. You can write JavaScript instead of JSX, but it feels more natural to use JSX while you are working with HTML.
Especially because even less technical people could still understand and modify the required parts.

JSX is a JavaScript syntax extension that looks similar to XML. You can use a simple JSX syntactic transform with React. - JSX in depth

If you want to read more about JSX check out the JSX Looks Like An Abomination - But it’s Good for You article.

Use Classes

React works well with ES2015 classes.

class HelloMessage extends React.Component {  
  render() {
    return <div>Hello {this.props.name}</div>
  }
}

We prefer higher order components over mixins, so for us leaving createClass was more a syntactical question than a technical one. We believe there is nothing wrong with using createClass over React.Component and vice-versa.

PropType

If you still don't check your properties, you should start 2016 by fixing this. It can save you hours, believe me.

import { PropTypes } from 'react'
import ImmutablePropTypes from 'react-immutable-proptypes'

MyComponent.propTypes = {
  isLoading: PropTypes.bool.isRequired,
  items: ImmutablePropTypes.listOf(
    ImmutablePropTypes.contains({
      name: PropTypes.string.isRequired
    })
  ).isRequired
}

Yes, it's possible to validate Immutable.js properties as well with react-immutable-proptypes.

Higher order components

Now that mixins are dead and not supported in ES6 Class components, we should look for a different approach.

What is a higher order component?

PassData({ foo: 'bar' })(MyComponent)  

Basically, you compose a new component from your original one and extend its behaviour. You can use it in various situations, like authentication: requireAuth({ role: 'admin' })(MyComponent) (check for a user in the higher order component and redirect if the user is not logged in), or connecting your component to the Flux/Redux store.
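
A minimal sketch of what such a higher order component could look like - passData and the prop names are illustrative:

function passData (data) {
  return function wrap (WrappedComponent) {
    return class extends React.Component {
      render () {
        // render the original component with its own props plus the extra data
        return <WrappedComponent {...this.props} {...data} />
      }
    }
  }
}

const MyComponentWithFoo = passData({ foo: 'bar' })(MyComponent)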

At RisingStack, we also like to separate data fetching and controller-like logic to higher order components and keep our views as simple as possible.

Testing

Testing with good test coverage must be an important part of your development cycle. Luckily, the React.js community came up with excellent libraries to help us achieve this.

Component testing

One of our favorite libraries for component testing is enzyme by AirBnb. With its shallow rendering feature you can test the logic and rendering output of your components, which is pretty amazing. It still cannot replace your selenium tests, but you can step up to a new level of frontend testing with it.

it('simulates click events', () => {  
  const onButtonClick = sinon.spy()
  const wrapper = shallow(
    <Foo onButtonClick={onButtonClick} />
  )
  wrapper.find('button').simulate('click')
  expect(onButtonClick.calledOnce).to.be.true
})

Looks neat, doesn't it?

Do you use chai as an assertion library? You will like chai-enzyme!

Redux testing

Testing a reducer should be easy: it responds to the incoming actions and turns the previous state into a new one:

it('should set token', () => {  
  const nextState = reducer(undefined, {
    type: USER_SET_TOKEN,
    token: 'my-token'
  })

  // immutable.js state output
  expect(nextState.toJS()).to.be.eql({
    token: 'my-token'
  })
})

Testing actions is simple until you start to use async ones. For testing async redux actions, we recommend checking out redux-mock-store; it can help a lot.

it('should dispatch action', (done) => {  
  const getState = {}
  const action = { type: 'ADD_TODO' }
  const expectedActions = [action]

  const store = mockStore(getState, expectedActions, done)
  store.dispatch(action)
})

For deeper redux testing visit the official documentation.

Use npm

Although React.js works well without code bundling, we recommend using webpack or Browserify to have the power of npm. npm is full of quality React.js packages, and it can help you manage your dependencies in a nice way.

(Please don’t forget to reuse your own components, it’s an excellent way to optimize your code.)

Bundle size

This question is not React-related but because most people bundle their React application I think it’s important to mention it here.

While you are bundling your source code, always be aware of your bundle’s file size. To keep it at the minimum you should consider how you require/import your dependencies.

Check the following code snippet - the two different ways can make a huge difference in the output:

import { concat, sortBy, map, sample } from 'lodash'

// vs.
import concat from 'lodash/concat';  
import sortBy from 'lodash/sortBy';  
import map from 'lodash/map';  
import sample from 'lodash/sample';  

Check out the Reduce Your bundle.js File Size By Doing This One Thing for more details.

We also like to split our code into at least vendors.js and app.js, because vendor code updates less frequently than our code base.
By hashing the output file names (chunk hash in webpack) and caching them for the long term, we can dramatically reduce the size of the code that returning visitors need to download. Combining this with lazy loading, you can imagine how optimal it can be.

If you are new to Webpack, check out this excellent React webpack cookbook.

Component-level hot reload

If you ever wrote a single page application with livereload, you probably know how annoying it is when you are working on something stateful and the whole page reloads when you hit save in your editor. You have to click through the application again, and you will go crazy repeating this a lot.

With React, it's possible to reload a component while keeping its states - boom, no more pain!

To setup hot reload check out the react-transform-boilerplate.

Use ES2015

I mentioned that we use JSX in our React.js components, which we transpile with Babel.js.

Babel logo in React.js Best Practices 2016

Babel can do much more: it also makes it possible to write ES6/ES2015 code for browsers today. At RisingStack, we use ES2015 features on both the server and the client side, which are available in the latest LTS Node.js version.

Linters

Maybe you already use a style guide for your JavaScript code, but did you know that there are style guides for React as well? We highly recommend picking one and starting to follow it.

At RisingStack, we also run our linters on the CI system and on git push as well. Check out pre-push or pre-commit.

We use JavaScript Standard Style for JavaScript with eslint-plugin-react to lint our React.js code.

(That's right, we do not use semicolons anymore.)

GraphQL and Relay

GraphQL and Relay are relatively new technologies. At RisingStack, we don’t use them in production for now; we're just keeping our eyes open.

We wrote a library called graffiti, which is a MongoDB ORM for Relay that makes it possible to create a GraphQL server from your existing mongoose models.
If you would like to learn these new technologies, we recommend checking it out and playing with it.

Takeaway from these React.js Best Practices

Some of the highlighted techniques and libraries are not React.js related at all - always keep your eyes open and check what others in the community are doing. The React community was inspired a lot by the Elm architecture in 2015.

If you know about other essential React.js tools that people should use in 2016, let us know in the comments!

React in 2015 - Retrospection

In details

2015 was the year of React

React had an amazing year with tons of new releases and developer conferences, thanks to the contribution of the open-source community and enterprise adopters. As a result, React is used by companies like Facebook, Yahoo, Imgur, Mozilla, Airbnb, Netflix, Sberbank and many more. For a more detailed list, check out this collection: Sites Using React

If you are not familiar with React, check out our in-depth tutorial series: The React.js Way

2015 January - 2015 December: a month-by-month timeline of React releases and milestones.

Read more about React

If you can't get enough of React head over to our related articles page.

Do you miss anything from the timeline? Let us know in the comments.

GraphQL Overview - Getting Started with GraphQL and Node.js

We've just released Graffiti: it transforms your existing models into a GraphQL schema. Here is how.

ReactEurope happened last week in the beautiful city of Paris. As expected, Facebook released its long-awaited implementation of the GraphQL draft.

What is GraphQL?

GraphQL is a query language created by Facebook in 2012 which provides a common interface between the client and the server for data fetching and manipulation.

The client asks for various data from the GraphQL server via queries. The response format is described in the query and defined by the client instead of the server: these are called client‐specified queries.
The structure of the data is not hardcoded as in traditional REST APIs - this makes retrieving data from the server more efficient for the client.

For example, the client can ask for linked resources without defining new API endpoints. With the following GraphQL query, we can ask for specific user fields and the linked friends resource as well.

{
  user(id: 1) {
    name
    age
    friends {
      name
    }
  }
}

In a resource based REST API it would look something like:

GET /users/1 and GET /users/1/friends  


or

GET /users/1?include=friends.name  

GraphQL overview

It's important to mention that GraphQL is not language specific, it's just a specification between the client and the server. Any client should be able to communicate with any server if they speak the common language: GraphQL.

Key concepts of the GraphQL query language are:

  • Hierarchical
  • Product‐centric
  • Strong‐typing
  • Client‐specified queries
  • Introspective

I would like to highlight strong-typing here, which means that GraphQL introduces an application-level type system. It's a contract between the client and the server, which means that your server may use different internal types in the background. The only thing that matters here is that the GraphQL server must be able to receive GraphQL queries, decide whether they are syntactically correct, and provide the described data for them.

For more details on the concept of GraphQL check out the GraphQL specification.

Where is it useful?

GraphQL helps where your client needs a flexible response format to avoid extra queries and/or massive data transformation with the overhead of keeping them in sync.

Using a GraphQL server makes it very easy for a client side developer to change the response format without any change on the backend.

With GraphQL, you can describe the required data in a more natural way. It can speed up development because, in application structures like top-down rendering in React, the required data is more similar to your component structure.

Check out our previous query and how similar it is to the following component structure:

<App>  
  <User>
    <Friend/>
    <Friend/>
  </User>
</App>  

Differences with REST

REST APIs are resource based. Basically, what you do is address your resources like GET /users/1/friends, which is a unique path for them. It tells you very well that you are looking for the friends of the user with id=1.

The advantages of REST APIs are that they are cacheable, and their behaviour is obvious.

The disadvantage is that it's hard to specify and implement advanced requests with includes, excludes and especially with linked resources. I think you have already seen requests like:
GET /users/1/friends/1/dogs/1?include=user.name,dog.age

This is exactly the problem that GraphQL wants to solve. If you have user and dog types and their relations are defined, you can write any kind of query to get your data.

You will have the following queries out of the box:

  • get name of the user with id=1
{
 user(id: 1) {
   name
 }
}
  • get names for friends of the user with id=1
{
 user(id: 1) {
   friends {
     name
   }
 }
}
  • get age and friends of the user with id=1
{
 user(id: 1) {
   age
   friends {
     name
   }
 }
}
  • get names of the dogs of the friends of the user with id=1 :)
{
 user(id: 1) {
   friends {
     dogs {
       name
     }
   }
 }
}

Simple, right? Implement once, re-use it as much as possible.

GraphQL queries

You can do two types of queries with GraphQL:

  • when you fetch (get) data from your server and the
  • when you manipulate (create, update, delete) your data

GraphQL queries look like JSON objects without the values:

// a json object
{
  "user": "name"
}
// a graphql query
{
  user {
    name
  }
}

I already showed some queries for getting data from the GraphQL server, but what else can we do?

We can write named queries:

query findUser {
  findUser(id: 1) {
    name
  }
}

You can also pass parameters to your query:

query findUser($userId: String!) {  
  findUser(id: $userId) {
    name
  }
}

By combining these building blocks with the static typing, we can write powerful client-specified queries. So far so good, but how can we modify our data? Let's see mutations in the next chapter.

GraphQL mutations

With a GraphQL mutation you can manipulate your data:

mutation updateUser($userId: String!, $name: String!) {
  updateUser(id: $userId, name: $name) {
    name
  }
}

With this, you can manipulate your data and retrieve the response in the required format at the same time - pretty powerful, isn't it?

The recommendation here is to name your mutations meaningfully to avoid future inconsistencies. I recommend using names like createUser, updateUser or removeUser.

GraphQL through HTTP

You can send GraphQL queries through HTTP:

  • GET for querying
  • POST for mutation
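
As a minimal sketch - assuming an express-graphql-style endpoint at /graphql that accepts the query as a URL parameter, and a browser with the fetch API - querying over HTTP could look like this:

var query = '{ user(id: 1) { name } }';

// GET for querying: the query travels URL-encoded in the query string
fetch('/graphql?query=' + encodeURIComponent(query))
  .then(function (response) { return response.json(); })
  .then(function (result) { console.log(result.data); });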

Caching GraphQL requests

Caching can work the same way with GET queries as it would with a classic HTTP API. The only exception is a very complex query - in that case you may want to send it as a POST and use caching on a database/intermediary level.

Other Transport layers

HTTP is just one option - GraphQL is transport independent, so you can use it with websockets or even mqtt.

GraphQL example with Node.js server

The Facebook engineering team open-sourced a GraphQL reference implementation in JavaScript. I recommend checking their implementation to have a better picture about the possibilities of GraphQL.

They started with the JavaScript implementation and also published an npm library to make GraphQL generally available. We can start playing with it and build a simple GraphQL Node.js server with MongoDB. Are you in? ;)

In the GraphQL JS library, every field in the schema can have a resolve function:

// a field of the root query type; User is assumed to be a Mongoose model
user: {
  type: userType,
  args: {
    id: {
      name: 'id',
      type: new GraphQLNonNull(GraphQLString)
    }
  },
  resolve: (root, {id}) => {
    return User.findById(id);
  }
}

The only thing we have to do here is to provide the data in the specific resolve functions. GraphQL JS calls these functions in parallel.

We can generate a projection for our MongoDB query in the following way:

// builds a MongoDB projection like { name: 1, age: 1 }
// from the fields requested in the query
function getProjection (fieldASTs) {
  return fieldASTs.selectionSet.selections.reduce((projections, selection) => {
    projections[selection.name.value] = 1;

    return projections;
  }, {});
}

and use it like:

resolve: (root, {id}, source, fieldASTs) => {  
  var projections = getProjection(fieldASTs);
  return User.findById(id, projections);
}

This helps minimize the amount of data fetched from our database.

Check out the Node.js implementation with MongoDB for more details: https://github.com/RisingStack/graphql-server

Take a look at Graffiti: it transforms your existing models into a GraphQL schema.

The React.js Way: Flux Architecture with Immutable.js

This article is the second part of the "The React.js Way" blog series. If you are not familiar with the basics, I strongly recommend you to read the first article: The React.js Way: Getting Started Tutorial.

In the previous article, we discussed the concept of the virtual DOM and how to think in the component way. Now it's time to combine them into an application and figure out how these components should communicate with each other.

Components as functions

The really cool thing about a single component is that you can think about it like a function in JavaScript. When you call a function with parameters, it returns a value. Something similar happens with a React.js component: you pass properties, and it returns the rendered DOM. If you pass different data, you will get different responses. This makes components extremely reusable and handy to combine into an application. This idea comes from functional programming, which is not in the scope of this article. If you are interested, I highly recommend reading Mikael Brevik's Functional UI and Components as Higher Order Functions blog post for a deeper understanding of the topic.
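
To make the analogy concrete, here is a rough sketch (the Greeting component and the names are made up for illustration):

// a plain function: input in, output out
function greeting(name) {
  return 'Hello ' + name;
}
greeting('John'); // 'Hello John'

// a React component: properties in, rendered DOM out
class Greeting extends React.Component {
  render() {
    return <div>Hello {this.props.name}</div>;
  }
}
React.render(<Greeting name="John" />, document.getElementById('app'));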

Top-down rendering

Ok it's cool, we can combine our components easily to form an app, but it doesn't make any sense without data. We discussed last time that with React.js your app's structure is a hierarchy that has a root node where you can pass the data as a parameter, and see how your app responds to it through the components. You pass the data at the top, and it goes down from component to component: this is called top-down rendering.

React.js component hierarchy

It's great that we pass the data at the top, and it goes down via the components' properties, but how can we notify a component at a higher level in the hierarchy if something should change? For example, when the user presses a button?
We need something that stores the actual state of our application, something that we can notify if the state should change. The new state should be passed to the root node, and the top-down rendering should kick in again to generate (re-render) the new output (DOM) of our application. This is where Flux comes into the picture.

Flux architecture

You may have already heard about Flux architecture and the concept of it.
I’m not going to give a very detailed overview about Flux in this article; I've already done it earlier in the Flux inspired libraries with React post.

Application architecture for building user interfaces - Facebook flux

A quick reminder: Flux is a unidirectional data flow concept where you have a Store which contains the actual state of your application as pure data. It can emit events when it's changed and let your application's components know what should be re-rendered. It also has a Dispatcher, which is a centralized hub that creates a bridge between your app and the Store. It has actions that you can call from your app, and it emits events for the Store. The Store is subscribed to those events and changes its internal state when necessary. Easy, right? ;)
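
As a rough sketch of these roles - using Facebook's flux npm package for the Dispatcher and Node's EventEmitter for the change events; the store shape and action type are made up for illustration:

var EventEmitter = require('events').EventEmitter;
var Dispatcher = require('flux').Dispatcher;

var dispatcher = new Dispatcher();

// the Store: holds the state as pure data and emits change events
var todoStore = Object.assign({}, EventEmitter.prototype, {
  todos: [],
  getTodos: function () { return this.todos; }
});

// the Store is subscribed to actions through the Dispatcher
dispatcher.register(function (action) {
  if (action.type === 'TODO_ADD') {
    todoStore.todos.push(action.todo);
    todoStore.emit('change'); // lets the components know to re-render
  }
});

// an action creator the app can call
function addTodo(todo) {
  dispatcher.dispatch({ type: 'TODO_ADD', todo: todo });
}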

Flux architecture

PureRenderMixin

Where are we with our current application? We have a data store that contains the actual state. We can communicate with this store and pass data to our app, which responds to the incoming state with the rendered DOM. It's really cool, but sounds like lots of rendering (it is). Remember the component hierarchy and top-down rendering - everything responds to the new data.

I mentioned earlier that the virtual DOM optimizes the DOM manipulations nicely, but it doesn't mean that we shouldn't help it and minimize its workload. For this, we have to tell the component whether it should be re-rendered for the incoming properties or not, based on the new and the current properties. In the React.js lifecycle you can do this with shouldComponentUpdate.

React.js luckily has a mixin called PureRenderMixin which compares the new incoming properties with the previous ones and stops rendering when they are the same. It uses the shouldComponentUpdate method internally.
That's nice, but PureRenderMixin can't compare objects properly. It checks reference equality (===), which will be false for different objects with the same data:

boolean shouldComponentUpdate(object nextProps, object nextState)

If shouldComponentUpdate returns false, then render() will be skipped until the next state change. (In addition, componentWillUpdate and componentDidUpdate will not be called.)

var a = { foo: 'bar' };  
var b = { foo: 'bar' };

a === b; // false  

The problem here is that the components will be re-rendered for the same data if we pass it as a new object (because of the different object reference). But it's also not going to fly if we mutate the original object, because then the reference stays the same:

var a = { foo: 'bar' };  
var b = a;  
b.foo = 'baz';  
a === b; // true  

Sure, it wouldn't be hard to write a mixin that does deep object comparison instead of reference checking, but React.js calls shouldComponentUpdate frequently, and deep checking is expensive: you should avoid it.

I recommend checking out the Advanced Performance with React.js article by Facebook.

Immutability

The problem starts escalating quickly if our application state is a single, big, nested object, like our Flux store.
We would like to keep the object reference the same when it doesn't change and have a new object when it does. This is exactly what Immutable.js does.

Immutable data cannot be changed once created, leading to much simpler application development, no defensive copying, and enabling advanced memoization and change detection techniques with simple logic.

Check the following code snippet:

var stateV1 = Immutable.fromJS({  
  users: [
    { name: 'Foo' },
    { name: 'Bar' }
  ]
});

var stateV2 = stateV1.updateIn(['users', 1], function () {  
  return Immutable.fromJS({
    name: 'Barbar'
  });
});

stateV1 === stateV2; // false  
stateV1.getIn(['users', 0]) === stateV2.getIn(['users', 0]); // true  
stateV1.getIn(['users', 1]) === stateV2.getIn(['users', 1]); // false  

As you can see, we can use === to compare our objects by reference, which means that we have a super fast way to compare objects, and it's compatible with React's PureRenderMixin. Accordingly, we should write our entire application with Immutable.js: our Flux Store should be an immutable object, and we should pass immutable data as properties to our application.
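
For example, a component fed with immutable data can use PureRenderMixin safely; a sketch, assuming this.props.users is an Immutable List of Maps (like the users list from the snippet above):

var React = require('react/addons');

var UserList = React.createClass({
  // the shallow (===) check is now both cheap and correct, because
  // unchanged immutable data keeps its object reference
  mixins: [React.addons.PureRenderMixin],
  render: function () {
    return <ul>
      {this.props.users.map(function (user, index) {
        return <li key={index}>{user.get('name')}</li>;
      }).toArray()}
    </ul>;
  }
});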

Now let's go back to the previous code snippet for a second and imagine that our application component hierarchy looks like this:

User state

You can see that only the red ones will be re-rendered after the change of the state because the others have the same reference as before. It means the root component and one of the users will be re-rendered.

With immutability, we optimized the rendering path and supercharged our app. Combined with the virtual DOM, this makes the "React.js way" a blazing fast application architecture.

Learn more about how persistent immutable data structures work and watch the Immutable Data and React talk from the React.js Conf 2015.

Check out the example repository with ES6, the Flux architecture, and immutable.js:
https://github.com/RisingStack/react-way-immutable-flux

The React.js Way: Getting Started Tutorial

Update: the second part is out! Learn more about the React.js way in the second part of the series: Flux Architecture with Immutable.js.

Now that the popularity of React.js is growing blazing fast and lots of interesting stuff is coming, my friends and colleagues started asking me more about how they can start with React and how they should think in the React way.

React.js Tutorial Google Trends (Google search trends for React in programming category, Initial public release: v0.3.0, May 29, 2013)

However, React is not a framework; there are concepts, libraries and principles that turn it into a fast, compact and beautiful way to program your app on both the client and the server side.

In this two-part React.js tutorial blog series, I am going to explain these concepts and give recommendations on what to use and how. We will cover ideas and technologies like:

  • ES6 React
  • virtual DOM
  • Component-driven development
  • Immutability
  • Top-down rendering
  • Rendering path and optimization
  • Common tools/libs for bundling, ES6, request making, debugging, routing, etc.
  • Isomorphic React

And yes, we will write code. I would like to make it as practical as possible.
All the snippets and post-related code are available in the RisingStack GitHub repository.

This article is the first of the two. Let's jump in!

Repository:
https://github.com/risingstack/react-way-getting-started

1. Getting Started with the React.js Tutorial

If you are already familiar with React and you understand the basics, like the concept of the virtual DOM and thinking in components, then this React.js tutorial is probably not for you. We will discuss intermediate topics in the upcoming parts of this series. It will be fun; I recommend you check back later.

Is React a framework?

In a nutshell: no, it's not.
Then what the hell is it, and why is everybody so keen on using it?

React is the "View" in the application, a fast one. It also provides different ways to organize your templates and gets you to think in components. In a React application, you should break down your site, page or feature into smaller pieces of components. It means that your site will be built from the combination of different components. These components are also built on top of other components, and so on. When a problem becomes challenging, you can break it down into smaller ones and solve it there. You can also reuse components somewhere else later. Think of them like Lego bricks. We will discuss component-driven development more deeply later in this article.

React also has this virtual DOM thing, which makes the rendering super fast but still keeps it easily understandable and controllable at the same time. You can combine this with the idea of components and have the power of top-down rendering. We will cover this topic in the second article.

Ok I admit, I still didn't answer the question. We have components and fast rendering - but why is it a game changer? Because React is mainly a concept first and a library only second.
There are already several libraries following these ideas - doing it faster or slower - but slightly differently. Like every programming concept, React has its own solutions, tools and libraries turning it into an ecosystem. In this ecosystem, you have to pick your own tools and build your own ~framework. I know it sounds scary, but believe me, you already know most of these tools; we will just connect them to each other, and later you will be very surprised how easy it is. For example, for dependencies we won't use any magic, just Node's require and npm. For the pub-sub, we will use Node's EventEmitter, and so on.

(Facebook announced Relay their framework for React at the React.js Conf in January 2015. But it's not available yet. The date of the first public release is unknown.)

Are you excited already? Let's dig in!

The Virtual DOM concept in a nutshell

To track down model changes and apply them on the DOM (alias rendering) we have to be aware of two important things:

  1. when data has changed,
  2. which DOM element(s) should be updated.

For the change detection (1), React uses an observer model instead of dirty checking (continuously checking the model for changes). That's why it doesn't have to calculate what has changed - it knows immediately. This reduces the calculations and makes the app smoother. But the really cool idea here is how it manages the DOM manipulations:

For the DOM changing challenge (2), React builds a tree representation of the DOM in memory and calculates which DOM elements should change. DOM manipulation is heavy, and we would like to keep it to a minimum. Luckily, React tries to keep as many DOM elements untouched as possible. Because the necessary DOM manipulations can be calculated faster on the object representation, the cost of the actual DOM changes is reduced nicely.

Since React's diffing algorithm uses the tree representation of the DOM and re-calculates whole subtrees when their parent is modified (marked as dirty), you should be aware of your model changes, because the whole subtree will be re-rendered then.
Don't be sad; later we will optimize this behavior together. (Spoiler: with shouldComponentUpdate() and ImmutableJS.)

React.js Tutorial React re-render (source: React’s diffing algorithm - Christopher Chedeau)

How to render on the server too?

Given that this kind of DOM representation uses a fake, in-memory DOM, it's possible to render the HTML output on the server side as well (without JSDom, PhantomJS etc.). React is also smart enough to recognize that the markup is already there (from the server) and will only add the event handlers on the client side.

Interesting: React's rendered HTML markup contains data-reactid attributes, which help React track DOM nodes.
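
As a rough sketch of how this can look on the server (assuming an Express-style request handler; the App component is made up for illustration):

var React = require('react/addons');
var App = require('./app'); // our root component

function handler(req, res) {
  // renders the markup, data-reactid attributes included
  var html = React.renderToString(React.createElement(App, { name: 'John' }));

  res.send(
    '<!DOCTYPE html><html><body>' +
    '<div id="app">' + html + '</div>' +
    // the client-side React.render picks up this markup
    '<script src="bundle.js"></script>' +
    '</body></html>'
  );
}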

Component-driven development

It was one of the most difficult parts for me to pick up when I was learning React.
With component-driven development, you won't see the whole site in one template.
In the beginning, you will probably think that it sucks. But I'm pretty sure that later you will recognize the power of thinking in smaller pieces, each with less responsibility. It makes things easier to understand, to maintain and to cover with tests.

How should I imagine it?

Check out the picture below. This is a possible component breakdown of a feature/site. Each of the bordered areas with different colors represents a single type of component. According to this, you have the following component hierarchy:

  • FilterableProductTable
    • SearchBar
    • ProductTable
      • ProductCategoryRow
      • ProductRow

What should a component contain?

First of all, it's wise to follow the single responsibility principle: ideally, design your components to be responsible for only one thing. When you start to feel that your component is doing too much, you should consider breaking it down into smaller ones.

Since we are talking about component hierarchy, your components will use other components as well. But let's see the code of a simple component in ES5:

var HelloComponent = React.createClass({  
    render: function() {
        return <div>Hello {this.props.name}</div>;
    }
});

But from now on, we will use ES6. ;)
Let’s check out the same component in ES6:

class HelloComponent extends React.Component {  
  render() {
    return <div>Hello {this.props.name}</div>;
  }
}

JS, JSX

As you can see, our component is a mix of JS and HTML code. Wait, what? HTML in my JavaScript? Yes, you probably think it's strange, but the idea here is to have everything in one place. Remember, single responsibility. It makes a component extremely flexible and reusable.

In React, it's possible to write your component in pure JS like:

  render () {
    return React.createElement("div", null, "Hello ",
        this.props.name);
  }

But I think it's not very comfortable to write your HTML this way. Luckily, we can write it in JSX syntax (a JavaScript extension) which lets us write HTML inline:

  render () {
    return <div>Hello {this.props.name}</div>;
  }

What is JSX?
JSX is an XML-like syntax extension to ECMAScript. JSX and HTML syntax are similar, but they differ at some points. For example, the HTML class attribute is called className in JSX. For more differences and deeper knowledge, check out Facebook's HTML Tags vs. React Components guide.

Because JSX is not supported in browsers by default (maybe someday), we have to compile it to JS. I'll write about how to use JSX in the Setup section later. (By the way, Babel can also transpile JSX to JS.)

Useful links about JSX:
- JSX in depth
- Online JSX compiler
- Babel: How to use the react transformer.

What else can we add?

Each component can have an internal state, some logic, event handlers (for example: button clicks, form input changes), and it can also have inline style. Basically, everything that is needed for proper displaying.

You can see {this.props.name} in the code snippet above. It means we can pass properties to our components when we are building our component hierarchy, like: <MyComponent name="John Doe" />
It makes the component reusable and makes it possible to pass our application state from the root component down to the child components through the whole application - always just the necessary part of the data.

Check this simple React app snippet below:

class UserName extends React.Component {  
  render() {
    return <div>name: {this.props.name}</div>;
  }
}

class User extends React.Component {  
  render() {
    return <div>
        <h1>City: {this.props.user.city}</h1>
        <UserName name={this.props.user.name} />
      </div>;
  }
}

var user = { name: 'John', city: 'San Francisco' };
// mountNode is the DOM element to render into, for example:
var mountNode = document.getElementById('app');
React.render(<User user={user} />, mountNode);

Useful links for building components:
- Thinking in React

React loves ES6

ES6 is here and there is no better place for trying it out than your new shiny React project.

React wasn't born with ES6 syntax; the support arrived this January, with version v0.13.0.

However, explaining ES6 deeply is not in the scope of this article; we will use some features from it, like classes, arrow functions, consts and modules. For example, we will inherit our components from the React.Component class.

Given that ES6 is only partially supported by browsers, we will write our code in ES6 but transpile it to ES5, making it work with every modern browser even without ES6 support.
To achieve this, we will use the Babel transpiler. It has a nice, compact intro about the supported ES6 features; I recommend checking it out: Learn ES6

Useful links about ES6
- Babel: Learn ES6
- React ES6 announcement

Bundling with Webpack and Babel

I mentioned earlier that we will involve tools you are already familiar with and build our application from the combination of those. The first tool that might be well known is Node.js's module system and its package manager, npm. We will write our code in the "node style" and require everything we need. React is available as a single npm package.
This way our component will look like this:

// would be in ES5: var React = require('react/addons');
import React from 'react/addons';

class MyComponent extends React.Component { ... }

// would be in ES5: module.exports = MyComponent;
export default MyComponent;  

We are going to use other npm packages as well.
Most npm packages make sense on the client side as well; for example, we will use debug for debugging and superagent for composing requests.

Now we have a dependency system (by Node, or more accurately ES6 modules) and we have a solution for almost everything via npm. What's next? We should pick our favorite libraries for our problems and bundle them up for the client as a single codebase. To achieve this, we need a solution for making them run in the browser.

This is the point where we should pick a bundler. Two of the most popular solutions today are Browserify and Webpack. We are going to use Webpack, because my experience is that Webpack is preferred by the React community. However, I'm pretty sure that you can do the same with Browserify as well.

How does it work?

Webpack bundles our code and the required packages into the output file(s) for the browser. Since we are using JSX and ES6, which we would like to transpile to ES5, we have to place the JSX and ES6-to-ES5 transpiler into this flow as well. Actually, Babel can do both for us. Let's just use that!

We can do that easily because Webpack is configuration-oriented.

What do we need for this? First, we need to install the necessary modules (start with npm init if you don't have a package.json file yet).

Run the following commands in your terminal (Node.js or io.js and npm are necessary for this step):

npm install --save-dev webpack  
npm install --save-dev babel  
npm install --save-dev babel-loader  

After that, we create the webpack.config.js file for Webpack (it's ES5, because we don't have the ES6 transpiler in the Webpack configuration file itself):

var path = require('path');

module.exports = {  
  entry: path.resolve(__dirname, '../src/client/scripts/client.js'),
  output: {
    path: path.resolve(__dirname, '../dist'),
    filename: 'bundle.js'
  },

  module: {
    loaders: [
      {
        test: /src\/.+\.js$/,
        exclude: /node_modules/,
        loader: 'babel'
      }
    ]
  }
};

If we did it right, our application starts at ./src/client/scripts/client.js and goes to ./dist/bundle.js when you run the webpack command.

After that, you can just include the bundle.js script in your index.html, and it should work:
<script src="bundle.js"></script>

(Hint: you can serve your site with node-static; install the module with npm install -g node-static and start it with static . to serve your folder's content at 127.0.0.1:8080.)
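
You may also want to wrap the build step in an npm script for convenience (the script name here is an arbitrary choice):

{
  "scripts": {
    "build": "webpack"
  }
}

After adding this to the package.json, npm run build produces ./dist/bundle.js.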

Project setup

Now we have installed and configured Webpack and Babel properly.
As in every project, we need a project structure.

Folder structure

I prefer to follow the project structure below:

config/  
    app.js
    webpack.js (js config over json -> flexible)
src/  
  app/ (the React app: runs on server and client too)
    components/
      __tests__ (Jest test folder)
      AppRoot.jsx
      Cart.jsx
      Item.jsx
    index.js (just to export app)
    app.js
  client/  (only browser: attach app to DOM)
    styles/
    scripts/
      client.js
    index.html
  server/
    index.js
    server.js
.gitignore
.jshintrc
package.json  
README.md  

The idea behind this structure is to separate the React app from the client and server code, since our React app can run on both the client and the server side (an isomorphic app - we will dive deep into this in an upcoming blog post).

How to test my React app

When we are moving to a new technology, one of the most important questions should be testability. Without a good test coverage, you are playing with fire.

Ok, but which testing framework to use?
My experience is that testing a front-end solution always works best with a test framework made by the same creators. Accordingly, I started to test my React apps with Jest. Jest is a test framework by Facebook and has many great features that I won't cover in this article.

I think it's more important to talk about the way of testing a React app. Luckily, the single responsibility principle forces our components to do only one thing, so we should test only that thing. Pass the properties to our component, trigger the possible events and check the rendered output. Sounds easy, because it is.
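
A sketch of such a test for the HelloComponent above, using the React TestUtils addon (you may need to add react to Jest's unmockedModulePathPatterns so it isn't auto-mocked):

// __tests__/HelloComponent-test.js
jest.dontMock('../HelloComponent');

describe('HelloComponent', function () {
  it('renders the passed name', function () {
    var React = require('react/addons');
    var TestUtils = React.addons.TestUtils;
    var HelloComponent = require('../HelloComponent');

    // pass the properties and check the rendered output
    var rendered = TestUtils.renderIntoDocument(
      <HelloComponent name="John" />
    );
    var div = TestUtils.findRenderedDOMComponentWithTag(rendered, 'div');

    expect(div.getDOMNode().textContent).toEqual('Hello John');
  });
});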

For more practical example, I recommend checking out the Jest React.js tutorial.

Test JSX and ES6 files

To test our ES6 syntax and JSX files, we should transform them for Jest. Jest has a config variable where you can define a preprocessor (scriptPreprocessor) for that.
First we should create the preprocessor, and then pass its path to Jest. You can find a working example of a Babel Jest preprocessor in our repository.
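
A minimal sketch of what such a preprocessor can look like, assuming the 2015-era babel-core API (see the repository for the working version):

// preprocessor.js
var babel = require('babel-core');

module.exports = {
  process: function (src, filename) {
    // transpile only our own sources, leave node_modules untouched
    if (filename.indexOf('node_modules') === -1) {
      return babel.transform(src, { filename: filename }).code;
    }
    return src;
  }
};

Then point Jest to it from the package.json: "scriptPreprocessor": "<rootDir>/preprocessor.js".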

Jest also has an example for React ES6 testing.

(The Jest config goes into the package.json.)

Takeaway

In this article, we examined together why React is fast and scalable, and how different its approach is. We went through how React handles rendering, what component-driven development is, and how you should set up and organize your project. These are the very basics.

In the upcoming "The React way" articles we are going to dig deeper.

I still believe that the best way to learn a new programming approach is to start developing and writing code.
That's why I would like to ask you to write something awesome and also spend some time checking out the official React website, especially the guides section. It's an excellent resource; the Facebook developers and the React community did an awesome job with it.

Next up

If you liked this article, subscribe to our newsletter for more. The remaining parts of the "The React way" post series are coming soon. We will cover topics like:

  • immutability
  • top-down rendering
  • Flux
  • isomorphic way (common app on client and server)

Feel free to check out the repository:
https://github.com/RisingStack/react-way-getting-started


Flux inspired libraries with React

There are lots of Flux or Flux-inspired libraries out there: they try to solve different kinds of problems, but which one should you use? This article tries to give an overview of the different approaches.

What is Flux? (the original)

An application architecture for React utilizing a unidirectional data flow. - flux

Ok, but why?

Flux tries to avoid the complex cross-dependencies between your modules (like in MVC) and realize a simple, one-way data flow. This helps you write scalable applications and avoid side effects in your application.

Read more about this and about the key properties of Flux architecture at Fluxxor's documentation.

Original flux

Facebook's original Flux has four main components:
a singleton Dispatcher, Stores, Actions and Views (or controller-views)

Dispatcher

From the Flux overview:

The dispatcher is the central hub that manages all data flow in a Flux application.

In details:

It is essentially a registry of callbacks into the stores.
Each store registers itself and provides a callback. When the dispatcher responds to an action, all stores in the application are sent the data payload provided by the action via the callbacks in the registry.

Actions

Actions can have a type and a payload. They can be triggered by the Views or by the Server (an external source). Actions trigger Store updates.

Facts about Actions:

  • Actions should be descriptive:

    The action (and the component generating the action) doesn't know how to perform the update, but describes what it wants to happen. - Semantic Actions

  • But they shouldn't perform another Action: No Cascading Actions

  • About Action dispatches

    Action dispatches and their handlers inside the stores are synchronous. All asynchronous operations should trigger an action dispatch that tells the system about the result of the operation - Enforced Synchrony

Later you will see that Actions can be implemented and used in different ways.
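
For example, dispatching a descriptive action through the original Flux Dispatcher could look like this (the action type and payload are made up for illustration):

var Dispatcher = require('flux').Dispatcher;
var dispatcher = new Dispatcher();

// descriptive: it states what happened, not how the stores should update
dispatcher.dispatch({
  type: 'TODO_CREATE',
  payload: { text: 'Write a blog post' }
});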

Stores

Stores contain the application state and logic.

Every Store receives every action from the Dispatcher, but a single store handles only the specified events. For example, the User store handles only user-specific actions like createUser and ignores the other actions.

After the Store has handled the Action and updated itself, it broadcasts a change event. This event will be received by the Views.

A Store shouldn't be updated externally; the update of the Store should be triggered internally as a response to an Action: Inversion of Control.

Views

Views are subscribed to one or multiple Stores and handle the store change events.
When a store change event is received, the View gets the data from the Store via the Store's getter functions. Then the View renders with the new data.

Steps:
1. Store change event received
2. Get data from the Store via getters
3. Render view
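
A controller-view following these steps could be sketched like this (todoStore is assumed to be an EventEmitter-based store with a getTodos getter, like the one above):

var TodoApp = React.createClass({
  getInitialState: function () {
    return { todos: todoStore.getTodos() };
  },
  componentDidMount: function () {
    todoStore.on('change', this._onChange);
  },
  componentWillUnmount: function () {
    todoStore.removeListener('change', this._onChange);
  },
  _onChange: function () {
    // 1. change event received -> 2. get data via getters -> 3. render
    this.setState({ todos: todoStore.getTodos() });
  },
  render: function () {
    return <ul>
      {this.state.todos.map(function (todo, index) {
        return <li key={index}>{todo.text}</li>;
      })}
    </ul>;
  }
});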

FB Flux

You can find several Flux implementations on GitHub.

Beyond Flux

Lots of people think that Flux could be more reactive, and I can only agree with them.
Flux is a unidirectional data flow, which is very similar to an event stream.

Now let's see some other ways to have something Flux-like but being functional reactive at the same time.

Reflux

Reflux has refactored Flux to be a bit more dynamic and be more Functional Reactive Programming (FRP) friendly - refluxjs

Reflux is a more reactive Flux implementation by @spoike, who found the original one confusing and broken at some points: Deconstructing ReactJS's Flux

The biggest difference between Flux and Reflux is that there is no centralized dispatcher.

Actions are functions which can pass a payload when called. Actions are listenable, and Stores can subscribe to them. In Reflux, every action acts as a dispatcher.
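
A minimal Reflux sketch (the action and store here are made up for illustration):

var Reflux = require('reflux');

// actions are listenable functions - no central dispatcher
var TodoActions = Reflux.createActions(['addTodo']);

var todoStore = Reflux.createStore({
  init: function () {
    this.todos = [];
    // the store subscribes to the action
    this.listenTo(TodoActions.addTodo, this.onAddTodo);
  },
  onAddTodo: function (text) {
    this.todos.push({ text: text });
    this.trigger(this.todos); // notifies the subscribed components
  }
});

// calling the action passes the payload to the store
TodoActions.addTodo('Learn Reflux');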

Reflux

Reflux provides mixins for React so you can easily listen to store changes.
It supports both async and sync actions. It's also easy to handle async errors with Reflux.

In Reflux, stores can listen to other stores serially or in parallel, which sounds useful, but it increases the cross-dependencies between your stores. I'm afraid you can easily find yourself in the middle of a circular dependency.

A problem arises if we create circular dependencies. If Store A waits for Store B, and B waits for A, then we'll have a very bad situation on our hands. - flux

Update

There is a circular dependency check for some cases in reflux implemented and is usually not an issue as long as you think of data flows with actions as initiators of data flows and stores as transformations. - @spoike

rx-flux

The Flux architecture allows you to think your application as an unidirectional flow of data, this module aims to facilitate the use of RxJS Observable as basis for defining the relations between the different entities composing your application. - rx-flux

rx-flux is a newcomer and uses RxJS, the reactive extension, to implement a unidirectional data flow.

rx-flux is more similar to Reflux than to the original Flux (from the readme):

  • A store is an RxJS Observable that holds a value
  • An action is a function and an RxJS Observable
  • A store subscribes to an action and updates its value accordingly.
  • There is no central dispatcher.

When the Stores and Actions are RxJS Observables, you can use the power of Rx to handle your application's business logic in a Functional Reactive way, which can be extremely useful in asynchronous situations.

If you don't like Rx, there are similar projects with Bacon.js like fluxstream or react-bacon-flux-poc.

If you like the concept of FRP, I recommend reading @niklasvh's article about how he combined Immutable.js and Bacon.js to make a more reactive Flux implementation: Flux inspired reactive data flow using React and Bacon.js
niklasvh's example code for lazy people: flux-todomvc

Omniscient

Omniscient is a really different approach compared to Flux. It uses the power of Facebook's Immutable.js to speed up the rendering: it renders only when the data has really changed. This kind of optimized calling of the (React) render function can help us build performant web applications.

Rendering is already optimized in React with the concept of the virtual DOM, but checking the DOM diffs is also computation-heavy. With Omniscient you can reduce the React calls and avoid the diff calculations.

What does this mean in practice?
Imagine the following scenario: the user's name is changed - what happens in Flux, and what happens in Omniscient?
In Flux, every user-related view component will be re-rendered, because they are subscribed to the user Store, which broadcasts a change event.
In Omniscient, only the components that use the user's name cursor will be re-rendered.
Of course it's possible to divide Flux into multiple Stores, but in most cases it doesn't make any sense to store the name in a different store.

Omniscient is for React, but it's actually just a helper for React; the real power comes from Immstruct, which can be used without Omniscient, with other libraries like virtual-dom.

It may not be trivial at first what Omniscient does. I think this todo example can help the most.

You can find a more complex demo here: demo-reactions

It would be interesting to hear what companies are using Omniscient in production.
If you do so, I would love to hear from you!

Further reading

The State of Flux
Flux inspired reactive data flow using React and Bacon.js
Deconstructing ReactJS's Flux
React + RxJS + Angular 2.0's di.js TodoMVC Example by @joelhooks