Announcing Free Node.js Monitoring & Debugging with Trace

Today, we’re excited to announce that Trace, our Node.js monitoring & debugging tool, is now free for open-source projects.

What is Trace?

We launched Trace a year ago to help developers looking for a Node.js-specific APM that is easy to use and helps with the most difficult aspects of building Node projects, like...

  • finding memory leaks in a production environment
  • profiling CPU usage to find bottlenecks
  • tracing distributed call-chains
  • avoiding security leaks & bad npm packages

... and so on.

Node.js Monitoring with Trace by RisingStack - Performance Metrics chart

Why are we giving it away for free?

We use a ton of open-source technology every day, and we are also the maintainers of some.

We know from experience that developing an open-source project is hard work, which requires a lot of knowledge and persistence.

Trace will save a lot of time for those who use Node for their open-source projects.

How to get started with Trace?

  1. Visit trace.risingstack.com and sign up - it's free.
  2. Connect your app with Trace.
  3. Head over to this form and tell us a little bit about your project.

Done. Your open-source project will be monitored for free as a result.

If you need help with Node.js Monitoring & Debugging..

Just drop us a tweet at @RisingStack if you have any additional questions about the tool or the process.

If you'd like to read a little bit more about the topic, I recommend reading our previous article, The Definitive Guide for Monitoring Node.js Applications.

One more thing

Along with making Trace available for open-source projects, we're announcing our new line of business at RisingStack:

Commercial Node.js support, aimed at enterprises with Node.js applications running in a production environment.

RisingStack now helps to bootstrap and operate Node.js apps - no matter what stage of their life cycle they are in.


Disclaimer: We retain the exclusive right to accept or deny your application to use Trace by RisingStack for free.

Node Hero - Monitoring Node.js Applications

This article is the 13th part of the tutorial series called Node Hero - in these chapters, you can learn how to get started with Node.js and deliver software products using it.

In the last article of the series, I’m going to show you how to do Node.js monitoring and how to find advanced issues in production environments.

The Importance of Node.js Monitoring

Getting insights into production systems is critical when you are building Node.js applications! You have an obligation to constantly detect bottlenecks and figure out what slows your product down.

An even greater issue is handling and preempting downtime. You must be notified as soon as it happens, preferably before your customers start to complain. Based on these needs, proper monitoring should give you at least the following features and insights into your application's behavior:

  • Profiling on a code level: You have to understand how much time it takes to run each function in a production environment, not just locally.

  • Monitoring network connections: If you are building a microservices architecture, you have to monitor network connections and reduce latency in the communication between your services.

  • Performance dashboard: Knowing and constantly seeing the most important performance metrics of your application is essential to have a fast, stable production system.

  • Real-time alerting: For obvious reasons, if anything goes down, you need to get notified immediately. This means that you need tools that can integrate with Pagerduty or Opsgenie - so your DevOps team won’t miss anything important.

Server Monitoring versus Application Monitoring

Developers often confuse monitoring servers with monitoring the applications themselves. Since we tend to rely heavily on virtualization, these concepts should be treated separately: a single server can host dozens of applications.

Let’s go through the major differences!

Server Monitoring

Server monitoring is responsible for the host machine. It should be able to help you answer the following questions:

  • Does my server have enough disk space?
  • Does it have enough CPU time?
  • Does my server have enough memory?
  • Can it reach the network?

For server monitoring, you can use tools like Zabbix.

Application Monitoring

Application monitoring, on the other hand, is responsible for the health of a given application instance. It should let you know the answers to the following questions:

  • Can an instance reach the database?
  • How many requests does it handle?
  • What are the response times for the individual instances?
  • Can my application serve requests? Is it up?
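These questions map naturally onto a health check that an application instance runs against its own dependencies. Below is a minimal TypeScript sketch. It assumes each dependency (database, cache and so on) can be probed with an async function that throws on failure. The `HealthCheck` shape and the `runHealthChecks` name are hypothetical and are not part of Trace.

```typescript
// Hypothetical shape: each check probes one dependency and throws on failure.
interface HealthCheck {
  name: string;
  run: () => Promise<void>;
}

interface CheckResult {
  name: string;
  ok: boolean;
  error?: string;
}

// Runs all checks in parallel. The instance counts as healthy only if every
// dependency (for example the database) could be reached.
async function runHealthChecks(
  checks: HealthCheck[],
): Promise<{ healthy: boolean; results: CheckResult[] }> {
  const results = await Promise.all(
    checks.map(async (check): Promise<CheckResult> => {
      try {
        await check.run();
        return { name: check.name, ok: true };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        return { name: check.name, ok: false, error: message };
      }
    }),
  );
  return { healthy: results.every((r) => r.ok), results };
}
```

An HTTP endpoint built on this could answer 200 when `healthy` is true and 503 otherwise. That also gives external uptime checks a simple answer to "Is it up?".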

For application monitoring, I recommend using our tool called Trace. What else? :)

We developed it to be an easy-to-use and efficient tool that you can use to monitor and debug applications from the moment you start building them up to the point when you have a huge production app with hundreds of services.

How to Use Trace for Node.js Monitoring

To get started with Trace, head over to https://trace.risingstack.com and create your free account!

Once you have registered, follow these steps to add Trace to your Node.js applications. It only takes a minute:

Start Node.js monitoring with these steps

Easy, right? If everything went well, you should see that the service you connected has just started sending data to Trace:

Reporting service in Trace for Node.js Monitoring

#1: Measure your performance

As the first step of monitoring your Node.js application, I recommend heading over to the metrics page and checking out the performance of your services.

Basic Node.js performance metrics

  • You can use the response time panel to check median and 95th percentile response times. It helps you figure out when and why your application slows down and how that affects your users.
  • The throughput graph shows requests per minute (rpm) for each status code category (200-299, 300-399, 400-499 and 500 and above). This way you can easily separate healthy and problematic HTTP requests within your application.
  • The memory usage graph shows how much memory your process uses. It’s quite useful for recognizing memory leaks and preempting crashes.
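To make the response time and throughput panels more concrete, here is a rough TypeScript sketch of how these numbers can be derived from raw request samples. It makes three assumptions: the samples cover a single one-minute window, so the category counts read as requests per minute; percentiles use the nearest-rank method; and codes of 500 and above count as server errors. Trace's own aggregation may differ.

```typescript
type StatusCategory = "2xx" | "3xx" | "4xx" | "5xx" | "other";

interface RequestSample {
  status: number;
  durationMs: number;
}

// Nearest-rank percentile: the smallest sample with at least p% of all
// samples at or below it. Expects samples sorted in ascending order.
function percentile(sortedMs: number[], p: number): number {
  if (sortedMs.length === 0) throw new Error("no samples");
  const rank = Math.ceil((p * sortedMs.length) / 100);
  return sortedMs[Math.max(0, rank - 1)];
}

// Buckets a status code into the categories used by the throughput graph.
function categorize(status: number): StatusCategory {
  if (status >= 500) return "5xx";
  if (status >= 400) return "4xx";
  if (status >= 300) return "3xx";
  if (status >= 200) return "2xx";
  return "other";
}

// Summarizes one window of requests. For a one-minute window, the
// throughput counts are requests per minute.
function summarize(requests: RequestSample[]) {
  const sorted = requests.map((r) => r.durationMs).sort((a, b) => a - b);
  const throughput: Record<StatusCategory, number> = {
    "2xx": 0, "3xx": 0, "4xx": 0, "5xx": 0, other: 0,
  };
  for (const r of requests) throughput[categorize(r.status)]++;
  return { median: percentile(sorted, 50), p95: percentile(sorted, 95), throughput };
}
```

With an even number of samples, the nearest-rank median is the lower of the two middle values rather than their average. For monitoring dashboards this difference rarely matters.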

Advanced Node.js Monitoring Metrics

If you’d like to see Node.js-specific metrics, check out the garbage collection and event loop graphs. These can help you hunt down memory leaks. For details, read our metrics documentation.

#2: Set up alerts

As I mentioned earlier, you need a proper alerting system in action for your production application.

Go to the alerting page of Trace and click Create a new alert.

  • The most important thing to do here is to set up downtime and memory alerts. Trace can notify you via email, Slack, Pagerduty or Opsgenie, and you can use webhooks as well.

  • I recommend setting up the alert we call Error rate by status code to know about HTTP requests with 4XX or 5XX status codes. These are errors you should definitely care about.

  • It can also be useful to create an alert for Response time - and get notified when your app starts to slow down.
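An error-rate alert comes down to comparing the share of failed requests in a time window against a threshold. Here is a minimal sketch. It assumes the status codes from the last window are available as an array. The 5% default threshold and the `Alert` shape are illustrative and are not Trace's actual configuration.

```typescript
interface Alert {
  name: string;
  value: number;
  threshold: number;
}

// Share of requests in the window that ended with a 4XX or 5XX status code.
function errorRate(statusCodes: number[]): number {
  if (statusCodes.length === 0) return 0;
  const errors = statusCodes.filter((code) => code >= 400).length;
  return errors / statusCodes.length;
}

// Returns an alert when the error rate exceeds the threshold, otherwise null.
function checkErrorRate(statusCodes: number[], threshold = 0.05): Alert | null {
  const value = errorRate(statusCodes);
  return value > threshold ? { name: "Error rate by status code", value, threshold } : null;
}
```

A response time alert follows the same pattern. The difference is that it compares, say, the 95th percentile of the window rather than an error share.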

#3: Investigate memory heapdumps

Go to the Profiler page and request a new memory heapdump, wait 5 minutes, then request another. Download both and open them in the Profiles panel of Chrome DevTools. Select the second (most recent) one and click Comparison.

chrome heap snapshot for finding a node.js memory leak

With this view, you can easily find memory leaks in your application. I described this process in detail in a previous article: Hunting a Ghost - Finding a Memory Leak in Node.js
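Conceptually, the Comparison view diffs the two snapshots and groups the result by constructor. Any type whose object count keeps growing between snapshots becomes a leak suspect. The sketch below models only that idea, roughly the "# Delta" column. It assumes you already have per-constructor object counts for each snapshot and does not parse the real .heapsnapshot format. The `HttpRequestContext` constructor in the example is made up.

```typescript
type Counts = Record<string, number>;

interface Growth {
  constructorName: string;
  delta: number;
}

// Difference in live object count per constructor between two snapshots.
// Keeps only constructors that grew, with the largest growth first.
function compareSnapshots(before: Counts, after: Counts): Growth[] {
  const names = Object.keys({ ...before, ...after });
  return names
    .map((constructorName) => ({
      constructorName,
      delta: (after[constructorName] ?? 0) - (before[constructorName] ?? 0),
    }))
    .filter((row) => row.delta > 0)
    .sort((a, b) => b.delta - a.delta);
}

// Example: a hypothetical per-request object that is never released.
const growth = compareSnapshots(
  { Array: 1200, Object: 5000, HttpRequestContext: 10 },
  { Array: 1180, Object: 5100, HttpRequestContext: 4010 },
);
console.log(growth);
```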

#4: CPU profiling

Profiling on the code level is essential to understand how much time your functions take to run in the actual production environment. Luckily, Trace has this area covered too.

All you have to do is head over to the CPU Profiles tab on the Profiling page. Here you can request and download a profile, which you can also load into Chrome DevTools.

CPU profiling in Trace

Once you have loaded it, you'll see a 10-second timeframe of your application, with all of your functions, their run times and their URLs.
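V8's CPU profiler is a sampling profiler. It periodically records the current call stack. A function's self time is then roughly the number of samples in which it was on top of the stack, multiplied by the sampling interval. The TypeScript sketch below shows that aggregation in simplified form. Each stack is a plain array of function names, outermost first, rather than the real .cpuprofile structure. The function names in the example are made up.

```typescript
interface ProfileSummary {
  selfMs: Record<string, number>;
  totalMs: Record<string, number>;
}

// Each sample is a call stack (outermost frame first) captured every intervalMs.
function aggregateSamples(samples: string[][], intervalMs: number): ProfileSummary {
  const selfMs: Record<string, number> = {};
  const totalMs: Record<string, number> = {};
  for (const stack of samples) {
    if (stack.length === 0) continue;
    // Self time: the function that was actually running when the sample was taken.
    const leaf = stack[stack.length - 1];
    selfMs[leaf] = (selfMs[leaf] ?? 0) + intervalMs;
    // Total time: every function on the stack, counted once per sample so
    // that recursion does not inflate the number.
    const seen: Record<string, boolean> = {};
    for (const fn of stack) {
      if (seen[fn]) continue;
      seen[fn] = true;
      totalMs[fn] = (totalMs[fn] ?? 0) + intervalMs;
    }
  }
  return { selfMs, totalMs };
}
```

A function with a high self time is doing the expensive work itself. A function with a high total time but little self time is mostly waiting on what it calls.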

With this data, you'll be able to figure out what slows down your application and deal with it!

The End

Update: as a sequel to Node Hero, we have started a new series called Node.js at Scale. Check it out if you are interested in more in-depth articles!

This is it.

During the 13 episodes of the Node Hero series, you learned the basics of building great applications with Node.js.

I hope you enjoyed it and improved a lot! Please share this series with your friends if you think they need it as well - and show them Trace too. It’s a great tool for Node.js development!

If you have any questions regarding Node.js monitoring, let me know in the comments section!


Introducing Distributed Tracing for Microservices Monitoring

At RisingStack, as an enterprise Node.js development and consulting company and as passionate advocates of microservices, we have been working tirelessly over the past two years to build durable and efficient microservices architectures for our clients.

During this period, we had to face the cold fact that there aren’t proper tools to support microservices architectures and the developers working with them. Monitoring, debugging and maintaining distributed systems is still extremely challenging.

We want to change this because doing microservices shouldn’t be so hard.

I am proud to announce that Trace, our microservices monitoring tool, has entered open beta and is now available for free for Node.js services.

Trace provides:

  • A Distributed Trace view for all of your transactions with error details
  • Service Map to see the communication between your microservices
  • Metrics on CPU, memory, RPM, response time, event loop and garbage collection
  • Alerting with Slack, Pagerduty, and Webhook integration

Trace makes application-level transparency available on a large microservices system with very low overhead. It also helps you localize production issues faster, so you can debug and monitor applications with ease.

You can use Trace in any IaaS or PaaS environment, including Amazon AWS, Heroku or DigitalOcean. Our solution currently supports Node.js only, but it will be available for other languages later as well. The open beta program lasts until 1 July.

Read on for details on the individual features and on how Trace works.

Distributed Tracing

The most important feature of Trace is the transaction view. By using this tool, you can visualize every transaction going through your infrastructure on a timeline - in a very detailed way.

Distributed Tracing View Trace by Risingstack

By attaching a correlation ID to certain requests, Trace groups the services taking part in a transaction and visualizes the exact data flow on a simple tree graph. Thanks to this, you can see the distributed call stacks and the dependencies between your microservices, and see where a request spends the most time.
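The grouping step can be pictured with a small data model. Every service reports a span that carries the shared correlation ID, its own span ID and the ID of the span that called it. The TypeScript sketch below rebuilds the call tree from such spans. The field names are assumptions made for illustration and do not reflect Trace's actual wire format.

```typescript
// Hypothetical span record reported by each service for its part of a request.
interface Span {
  correlationId: string;
  spanId: string;
  parentId: string | null; // null for the span that started the transaction
  service: string;
  durationMs: number;
  error?: string;
}

interface TraceNode extends Span {
  children: TraceNode[];
}

// Collects the spans of one transaction and links each to its caller.
function buildTraceTree(spans: Span[], correlationId: string): TraceNode | null {
  const nodes: Record<string, TraceNode> = {};
  for (const s of spans) {
    if (s.correlationId === correlationId) nodes[s.spanId] = { ...s, children: [] };
  }
  let root: TraceNode | null = null;
  for (const id of Object.keys(nodes)) {
    const node = nodes[id];
    if (node.parentId === null) root = node;
    else nodes[node.parentId]?.children.push(node); // skip spans whose parent was not reported
  }
  return root;
}
```

Summing `durationMs` along each path of the tree shows where a request spends most of its time.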

This approach also lets you localize ongoing issues and shows them on the graph. Trace provides detailed feedback on what caused an error in a transaction and gives you enough data to start debugging your system instantly.

Distributed Tracing with Detailed Error Message

When a service causes an error in a distributed system, usually all of the services taking part in that transaction will throw an error, and it is hard to figure out which one really caused the trouble in the first place. From now on, you won’t need to dig through log files to find the answer.

With Trace, you can instantly see what path a certain request took, which services were involved, and what caused the error in your system.
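A simple heuristic can pick the likely culprit out of a trace tree. It treats an errored span whose children all succeeded as the origin of the failure, and assumes the errored spans above it only propagated the error. Here is a sketch under that assumption. The `TraceNode` shape and the service names are hypothetical, and this is not necessarily how Trace decides.

```typescript
interface TraceNode {
  service: string;
  error?: string;
  children: TraceNode[];
}

// Returns the errored spans that have no errored descendants. Their callers
// are assumed to have failed only because of them.
function findRootCauses(node: TraceNode): TraceNode[] {
  const fromChildren = node.children.reduce<TraceNode[]>(
    (acc, child) => acc.concat(findRootCauses(child)),
    [],
  );
  if (fromChildren.length > 0) return fromChildren;
  return node.error ? [node] : [];
}
```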

The technology Trace uses is primarily based on Google’s Dapper whitepaper. Read the whole study to get the exact details.

Microservices Topology

Trace automatically generates a dynamic service map based on how your services communicate with each other or with databases and external APIs. In this view, we provide feedback on infrastructure health as well, so you will get informed when something begins to slow down or when a service starts to handle an increased amount of requests.

Distributed Tracing with Service Topology Map

The service topology view also allows you to immediately get a sense of how many requests your microservices handle in a given period and how long their response times are.
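A service map like this can be derived from the same spans used for tracing. Every span whose parent belongs to a different service represents one call along an edge of the map. The sketch below counts calls and averages response times per edge. The span fields are assumed for illustration.

```typescript
// Hypothetical span fields; only the parent link, service and timing are needed.
interface Span {
  spanId: string;
  parentId: string | null;
  service: string;
  durationMs: number;
}

interface Edge {
  from: string;
  to: string;
  requests: number;
  avgMs: number;
}

// Turns caller/callee span pairs into service-to-service edges.
function buildServiceMap(spans: Span[]): Edge[] {
  const byId: Record<string, Span> = {};
  for (const s of spans) byId[s.spanId] = s;
  const acc: Record<string, { from: string; to: string; requests: number; totalMs: number }> = {};
  for (const s of spans) {
    const parent = s.parentId === null ? undefined : byId[s.parentId];
    // Roots and calls within the same service do not create an edge.
    if (!parent || parent.service === s.service) continue;
    const key = `${parent.service}->${s.service}`;
    const edge = acc[key] ?? (acc[key] = { from: parent.service, to: s.service, requests: 0, totalMs: 0 });
    edge.requests += 1;
    edge.totalMs += s.durationMs;
  }
  return Object.keys(acc).map((key) => {
    const e = acc[key];
    return { from: e.from, to: e.to, requests: e.requests, avgMs: e.totalMs / e.requests };
  });
}
```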

With this information, you can see what your application looks like and understand the behavior of your microservices architecture.

Metrics and Alerting

Trace provides critical metrics data for each of your monitored services. Besides basics like CPU usage, memory usage, throughput and response time, our tool also reports event loop and garbage collection metrics to make microservices development and operations easier.

Distributed Tracing with Metrics and Alerting

You can create alerts and get notified when a metric passes warning or error thresholds so you can act immediately. Trace will alert you via Slack, Pagerduty, Email or Webhook.
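Warning and error thresholds come down to classifying each metric reading into a level. One common approach is to notify only when the level changes, so that a metric staying above a threshold does not re-alert on every reading. Here is a minimal sketch of that approach. It assumes higher values are worse, as with response time or memory usage. The threshold values in the test are made up, and Trace's own notification rules may differ.

```typescript
type Level = "ok" | "warning" | "error";

interface Thresholds {
  warning: number;
  error: number;
}

// Classifies one reading; assumes higher values are worse.
function evaluate(value: number, t: Thresholds): Level {
  if (value >= t.error) return "error";
  if (value >= t.warning) return "warning";
  return "ok";
}

// Returns the readings at which the level changed, i.e. the moments
// worth sending a notification for (including recovery back to "ok").
function transitions(values: number[], t: Thresholds): { index: number; level: Level }[] {
  const out: { index: number; level: Level }[] = [];
  let prev: Level = "ok";
  values.forEach((value, index) => {
    const level = evaluate(value, t);
    if (level !== prev) out.push({ index, level });
    prev = level;
  });
  return out;
}
```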

Give Microservices Monitoring a Try

Adding Trace to your services takes just a couple of lines of code, and it can be installed and used in under two minutes.

We are curious about your feedback on Trace and on the concept of distributed transaction tracking, so don’t hesitate to share your opinion in the comments section.

Monitoring Microservices Architectures: Enterprise Best Practices

By reading the following article, you can get insight into how lead engineers at IBM, Financial Times and Netflix think about the pain points of application monitoring, and what their best practices are for maintaining and developing microservices. I’d also like to introduce a solution we developed at RisingStack, which aims to tackle the most important issues with monitoring microservices architectures.


Tearing down a monolithic application into a microservices architecture brings tremendous benefits to engineering teams and organizations. New features can be added without rewriting other services. Smaller codebases make development easier and faster, and the parts of an application can be scaled separately.

Unfortunately, migrating to a microservices architecture has its challenges as well since it requires complex distributed systems, where it can be difficult to understand the communication and request flow between the services. Also, monitoring gets increasingly frustrating thanks to a myriad of services generating a flood of unreliable alerts and un-actionable metrics.

Visibility is crucial for IBM when monitoring microservices architectures

Jason McGee, Vice President and Chief Technical Officer of Cloud Foundation Services at IBM, gave us a look at the microservice-related problems enterprises often face in his highly recommended DockerCon interview with The New Stack.

For a number of years - according to Jason - developer teams were struggling to deal with the increasing speed and delivery pressures they had to fulfill, but with the arrival of microservices, things have changed.

Migrating from the Monolith to a Microservices Architecture

In a microservices architecture, a complex problem can be broken up into units that are truly independent, so the parts can continue to work separately. The services are decoupled, so people can operate in small groups with less coordination and therefore they can respond more quickly and go faster.

“It’s interesting that a lot of people talk about microservices as a technology when in reality I think it’s more about people, and how people are working together.”

The important thing about microservices for Jason is that anyone can give 5 or 10 people responsibility for a function, and they can manage that function throughout its lifecycle and update it whenever they need to - without having to coordinate with the rest of the world.

“But in technology, everything has a tradeoff, a downside. If you look at microservices at an organization level, the negative trade-off is the great increase in the complexity of operations. You end up with a much more complex operating environment.”

Right now, a lot of activity in the microservices space is about what kind of tools and management systems teams have to put around their services to make microservices architectures practical, said Jason. Teams with microservices have to understand how they want to factor their applications, what approaches they want to take for wiring everything together, and how they can gain visibility into their services.

The first fundamental problem developers have to solve is how the services are going to find each other. After that, they have to manage complexity by instituting some standardized approach for service discovery. The second biggest problem is about monitoring and bringing visibility to services. Developers have to understand what’s going on, by getting visibility into what is happening in their cloud-based network of services.
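To illustrate the first problem, service discovery, here is a deliberately small in-memory registry in TypeScript. Instances announce themselves with heartbeats, and lookups return only instances whose registration has not expired. This is a sketch of the idea only. The class, its method names and the TTL-based expiry are hypothetical choices, not the API of any particular discovery tool.

```typescript
interface Instance {
  address: string;
  expiresAt: number;
}

class ServiceRegistry {
  private readonly ttlMs: number;
  private readonly services: Record<string, Record<string, Instance>> = {};

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Registers a new instance or refreshes an existing one. The current time
  // is passed in explicitly to keep the sketch deterministic.
  heartbeat(service: string, instanceId: string, address: string, now: number): void {
    const instances = this.services[service] ?? (this.services[service] = {});
    instances[instanceId] = { address, expiresAt: now + this.ttlMs };
  }

  // Returns the addresses of instances whose last heartbeat is still valid.
  lookup(service: string, now: number): string[] {
    const instances = this.services[service] ?? {};
    return Object.keys(instances)
      .map((id) => instances[id])
      .filter((instance) => instance.expiresAt > now)
      .map((instance) => instance.address);
  }
}
```

Expiring silent instances matters here because instances come and go constantly. A crashed instance should drop out of lookups without anyone having to deregister it.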

Put simply: an app can have hundreds of services behind the scenes, and if it doesn’t work, someone has to figure out what’s going on. When developers only see miles of logs, they are going to have a hard time tracing a problem back to its cause. That’s why people working with microservices need excellent tools that provide actionable output.

“There is no way a human can map how everyone is talking to everyone, so you need new tools to give you the visibility that you need. That’s a new problem that has to be solved for microservices to become an option.”


At RisingStack, as an enterprise Node.js development and consulting company, we have experienced the same problems with microservices from the moment we started working with them.

Our frustration at not having proper tools to solve these issues led us to develop our own solution called Trace, a microservice monitoring tool with distributed transaction tracking, error detection, and process monitoring. Our tool is currently in open beta, so it can be used for free.

If you’d like to take a look, we’d appreciate your feedback on our Node.js monitoring platform.


Financial Times eases the pain of monitoring microservices architectures with the right tools and smart alerts

Sarah Wells, Principal Engineer of Financial Times told the story of what it’s like to move from monitoring a monolithic application to monitoring a microservice architecture in her Codemotion presentation named Alert overload: How to adopt a microservices architecture.

About two years ago, the Financial Times started working on a new project to build a new content platform (Fast FT) with a microservices architecture and APIs. The project team also started doing DevOps at the same time, because they were building a lot of new services and couldn’t take the time to hand them over to a separate operations team. According to Sarah, supporting their own services meant that all the pain the operations team used to feel from shoddy monitoring and alerting was suddenly transferred to them.

“Microservices make it worse! Microservices are an efficient device for transforming business problems into distributed transaction problems.”

It’s also important to note that there are a lot of things to like about microservices, as Sarah mentioned:

“I am very happy that I can reason about what I’m trying to do because I can make changes live to a very small piece of my system and roll back really easily whenever I want to. I can change the architecture and I can get rid of the old stuff much more easily than I could when I was building a monolith.”

Let’s look at the biggest challenge the DevOps team at the Financial Times faced with a microservices architecture. According to Sarah, monitoring suddenly became much harder because they had far more systems than before. The app they built consisted of 45 microservices. They had 3 environments (integration, test, production) and 2 VMs for each of those services. They ran 20 different checks per service (for things like CPU load, disk status, functional tests, etc.), at least every 5 minutes. They ended up with 1,500,000 checks a day, which meant that they got alerts for unlikely and transient things all the time.
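The arithmetic behind that figure is shown below. It assumes 2 VMs per service in each environment and one full round of checks every 5 minutes.

```latex
\begin{aligned}
45 \text{ services} \times 3 \text{ environments} \times 2 \text{ VMs} &= 270 \text{ VMs} \\
270 \text{ VMs} \times 20 \text{ checks} &= 5\,400 \text{ checks per round} \\
\frac{24 \times 60 \text{ min}}{5 \text{ min}} &= 288 \text{ rounds per day} \\
5\,400 \times 288 &= 1\,555\,200 \approx 1.5 \text{ million checks per day}
\end{aligned}
```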

“When you build a microservices architecture and something fails, you’re going to get an alert from a service that’s using it. But if you’re not clever about how you do alerts, you’re also going to get alerts from every other service that uses it, and then you get a cascade of alerts.”
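One way to avoid that cascade is to suppress alerts from services whose own dependencies are already failing, so that only the likely origin pages someone. Here is a minimal sketch. It assumes you know each service's direct dependencies. The service names are invented for the example.

```typescript
// Keeps only failing services none of whose direct dependencies are failing;
// the others are assumed to be failing as a consequence.
function rootAlerts(failing: string[], dependsOn: Record<string, string[]>): string[] {
  const isFailing: Record<string, boolean> = {};
  for (const service of failing) isFailing[service] = true;
  return failing.filter(
    (service) => !(dependsOn[service] ?? []).some((dep) => isFailing[dep]),
  );
}
```

With `search` depending on `content-api` and `content-api` depending on `db`, a database outage produces a single alert for `db` instead of three.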

When a new developer joined Sarah’s team, he couldn’t believe the number of emails they got from different monitoring services, so he started to count them. The result was over 19,000 system monitoring alerts in 50 days, 380 a day on average. Functional monitoring was also an issue, since the team wanted to know when their response time was getting slow or when they logged or returned an error to anyone. Needless to say, they were swamped by alerts: 12,745 response time or error alerts in 50 days, 255 a day on average.

Monitoring a Microservices Architecture can cause trouble with Alerting

Sarah and the team finally developed three core principles for making this almost unbearable situation better.

1. Think about monitoring from the start.

The Financial Times team created far too many alerts without thinking about why they were doing it. As it turned out, it was the business functionality they really cared about, not the individual microservices - so that’s what their alerting should have focused on. At the end of the day, they only wanted an alert when they needed to take action; otherwise, it was just noise. They made sure that their alerts were actually good, so that anyone reading them could work out what they meant and what needed to be done.

According to Sarah’s experience, a good alert uses clear language, is not a false alarm, and contains a link to more explanatory information. They also developed a smart solution: they tied all of their microservices together by passing transaction IDs around as request headers, so the team could instantly tell which event in the system caused an error, and could even search for it. The team also established health checks for every RESTful application, since they wanted to know early about problems that could affect their customers.
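Passing a transaction ID along can be sketched in a few lines. Reuse the ID from the incoming request if there is one, otherwise generate one. Then attach it to every outgoing request and prefix log lines with it so they can be searched. The header name `x-request-id` and the ID format below are assumptions, since the article doesn't say which header FT used.

```typescript
type HeaderMap = Record<string, string | undefined>;

// Assumed header name; any name works as long as every service uses the same one.
const TRANSACTION_HEADER = "x-request-id";
let sequence = 0;

// Reuses the caller's transaction ID, or starts a new transaction.
function getTransactionId(incoming: HeaderMap): string {
  const existing = incoming[TRANSACTION_HEADER];
  if (existing) return existing;
  sequence += 1;
  return `tid-${Date.now().toString(36)}-${sequence}`;
}

// Copies the transaction ID onto the headers of a downstream request.
function withTransactionId(id: string, headers: HeaderMap = {}): HeaderMap {
  return { ...headers, [TRANSACTION_HEADER]: id };
}

// Prefixes log lines so all entries of one transaction can be searched together.
function logLine(id: string, message: string): string {
  return `transaction_id=${id} ${message}`;
}
```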

2. Use the right tools for the job.

Since the platform Sarah’s team had been working on was an internal PaaS, they figured out that they needed some tooling to get the job done. They used different solutions for service monitoring, log aggregation, graphing and real-time error analysis, and also built some custom in-house tools for themselves. You can check out the individual tools in Sarah’s presentation from slide 51.

The main takeaway from their example was that they needed tools that could show if something happened 10 minutes ago but disappeared soon after - while everyone was in a meeting. They figured out the proper communication channel for alerting: it was not email, but Slack! The team also established a clever system of Slack reactions to tag solved and work-in-progress issues.

3. Cultivate your alerts.

As soon as you stop paying attention to alerts, things will go wrong. When Sarah’s team gets an alert, they review it and act on it immediately. If the alert isn’t good, they either get rid of it or make it better. If it isn’t helpful, they make sure that it won’t get sent again. It’s also important to make sure that alerts haven’t stopped working. To check this, the FT team often breaks things deliberately (they actually have a chaos monkey), just to make sure that alerts do fire.

How did the team benefit from these actions? They were able to turn off all emails from system monitoring and carry on with their work while still monitoring their systems. Sarah ended her presentation with a strong recommendation for using microservices, distilling her advice into a brief form:

“I build microservices because they are good, and I really like working with them. If you do that, you have to appreciate that you need to work at supporting them. Think about monitoring from the start, make sure you have the right tools and continue to work on your alerts as you go.”

Death Star diagrams make no sense with Microservices Architectures

Adrian Cockcroft had the privilege of gaining a tremendous amount of microservices-related experience by working as Chief Architect at Netflix for 7 years - a company that relies heavily on a microservices architecture to provide an excellent user experience.

According to Adrian, teams working with microservices have to deal with three major problems right now.

“When you have microservices, you end up with a high rate of change. You do a code push and floods of new microservices appear. It’s possible to launch thousands of them in a short time, which will certainly break any monitoring solution.”

The second problem is that everything is ephemeral: Short lifetimes make it hard to aggregate historical views of services, and hand tweaked monitoring tools take too much work to keep running.

“Microservices have increasingly complex calling patterns. These patterns are hard to figure out with 800 microservices calling each other all the time. The visualization of these flows gets overwhelming, and it’s hard to render so many nodes.”

These microservice diagrams may look complicated, but looking inside a monolith would be even more confusing because it’s tangled together in ways you can’t even see. The system gets tangled together, like a big mass of spaghetti - said Adrian.

A Microservices Architecture often looks like Death Star diagrams

Furthermore, managing scale is a grave challenge in the industry right now, because a single company can have tens of thousands of instances across five continents, and that makes things complicated. Tooling is crucial in this area. Netflix built its own in-house monitoring tool. Twitter made its own tool too, called Zipkin (an open source Java monitoring tool based on Google’s Dapper technology). The problem with these tools is that when teams look at the systems they have successfully mapped out, they often end up with so-called Death Star diagrams.

“Currently, there are a bunch of tools trying to do monitoring in a small way - they can show the request flow across a few services. The problem is, that they can only visualize your own bounded context - who are your clients, who are your dependencies. That works pretty well, but once you’re getting into what’s the big picture with everything, the result will be too difficult to comprehend.”

For Adrian, it was a great frustration at Netflix that every monitoring tool they tried exploded on impact. Another problem is that using, or even testing monitoring tools at scale gets expensive very quickly. Adrian illustrated his claim with a frightening example: The single biggest budget component for Amazon is the monitoring system: it takes up 20% of the costs.

“Pretty much all of the tools you can buy now understand datacenters with a hundred nodes, that’s easy. Some of them can understand cloud. Some of them can get to a few thousand nodes. There’s a few alpha and beta monitoring solutions that claim they can get to the ten thousands. With APM’s you want to understand containers, because your containers might be coming and going in seconds - so event-driven monitoring is a big challenge for these systems.”

According to Adrian, there is still hope since the tools that are currently being built will get to the point where the large scale companies can use them as commercial products.


If you have additional thoughts on the topic, feel free to share them in the comments section.