
On Third-Party JavaScript - In Production Case-Study

Third-party JavaScript is a pattern of JavaScript programming that enables the creation of highly distributable web applications. Unlike regular web applications, which are accessed at a single web address, these applications can be arbitrarily loaded on any web page using simple JavaScript includes. — Ben Vinegar, Anton Kovalyov (Third-party Javascript)

Google Analytics, Mixpanel, Disqus - just to name a few products that heavily rely on third-party JavaScript development.

This is the second part of the "On Third-Party JavaScript" series. The first one dealt with the principles you have to understand when dealing with third-party JavaScript.

In this post we are going to take a look at how companies out there solve the challenges of third-party JavaScript development.

Disclaimer: we do not work for any of these companies (as of the date of publishing this post), so the findings here are based purely on reverse engineering.

The Mixpanel Way

Mixpanel is a company that provides an analytics platform that can be integrated with your web applications - so they heavily depend on third-party JavaScript.

Integrating with Mixpanel

To integrate with Mixpanel the very first step you have to do is include the following snippet in your HTML's head section:

After this you can use the full power of the library - but what happens behind the curtains? This tiny include snippet only does the following:

  • when inserted into the page it automatically starts downloading the full mixpanel library
  • mixpanel.init puts placeholder methods on the global mixpanel object - these store any commands until the full library is loaded
  • when the full library is downloaded then it overwrites the methods on the global mixpanel object and processes any commands that may be in the queue
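
The stub-and-queue behaviour described in the list above can be sketched roughly as follows. This is a simplified illustration and not Mixpanel's actual snippet: the names createStub, activate, track and identify are made up for the example.

```javascript
// Simplified sketch of a command-queueing stub. All names are hypothetical;
// this is not Mixpanel's real include snippet.
function createStub(methodNames) {
  var stub = { _queue: [] };
  methodNames.forEach(function (name) {
    // Each placeholder only records the call so it can be replayed later.
    stub[name] = function () {
      stub._queue.push({ name: name, args: Array.prototype.slice.call(arguments) });
    };
  });
  return stub;
}

// Called by the full library once it has finished downloading.
function activate(stub, implementation) {
  var pending = stub._queue;
  stub._queue = [];
  // Overwrite the placeholders with the real methods...
  Object.keys(implementation).forEach(function (name) {
    stub[name] = implementation[name];
  });
  // ...then replay everything that was queued, in the original order.
  pending.forEach(function (call) {
    stub[call.name].apply(stub, call.args);
  });
}
```

The host page can call the placeholder methods right away; nothing is lost if the full library arrives a few seconds later.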

Size, cache policy

The full mixpanel library is only 16.2K - so far so good. Check.

Regarding the distribution: Mixpanel uses Akamai as their CDN provider. Check.

Mixpanel sets the Cache-Control: public, max-age=3600 header on their library. This tells the browser to keep the current version for 3600 seconds (1 hour), then download it again. Combined with the 16K size, this can have a huge impact on both their Akamai bills and the users' bandwidth usage (no, not in a good way).

Things to improve at Mixpanel

  • To improve the cache policy and enable infinite caching of the main library Mixpanel should split the main file into two separate files, like outlined in the previous article's Distributing section.

The Disqus Way

Disqus is a great tool for connecting the audience of a website or blog and starting a discussion on each piece of content. Unlike Mixpanel, Disqus has to take care of a user interface as well.

Integration

Integrating Disqus is pretty straightforward:

Let's take a closer look!

The very first thing you will notice is the configuration variable - it is used to initialize Disqus to work with your site. What comes after that is very similar to Mixpanel - the loading of the main application's JavaScript file. Also, pay attention to the <noscript> tag - if a user does not have JavaScript support, it will show them a warning.

The mechanisms behind Disqus

The embed.js file is what the small include snippet loads - it is relatively small, and has a very short cache lifetime set: 300 seconds. This file contains some basic configuration, like the version of Disqus you are using. This information is then used to fetch more resources the application needs. This setup works really well, as only embed.js needs a short cache period; other resources can be cached for a really long period of time.

For configuration management they are using a config.js file which is cached for a very short amount of time to enable rapid changes in the configuration. This file contains settings on how Disqus will appear - but not just that: it contains feature flips, and service discovery information as well.

Disqus not only loads static resources with embed.js, but modules as well, starting with lounge.load.js. For this, they are using AMD, which stands for Asynchronous Module Definition. The library that helps them is RequireJS. These modules are referenced with a unique version/commit tag, so they can be cached for eternity - if they want to roll out a new version, only some configuration has to be updated, which will mean a new URI for the resource.

CSS and image files are handled in the very same way: they are versioned, and cached for a month. To send images to the client Disqus uses sprites to minimize the number of requests the client has to make.

A summarized overview of what is happening:

Disqus architecture

One more thing about the implementation of Disqus - bug tracking! They use Sentry to collect and report JavaScript errors - you should give Sentry a try as well.

Things to improve

The Disqus team did an incredible job developing their product, hats off! The only thing I could come up with is that the include snippet leaks the disqus_shortname variable into the global scope. A possible way to solve this:
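
A rough sketch of one option: keep the configuration inside a function scope and hand it to the loader through the script element itself. This assumes the loaded script would read a data-shortname attribute, which is an invented convention for this example and not something Disqus is known to support; the URL pattern is a placeholder too.

```javascript
// Hypothetical sketch: the shortname never becomes a global variable.
// `data-shortname` and the example.com URL are assumptions for illustration.
function embedWidget(doc, shortname) {
  var script = doc.createElement('script');
  script.async = true;
  script.src = 'https://' + shortname + '.example.com/embed.js';
  // The loaded script could read this back from its own <script> element.
  script.setAttribute('data-shortname', shortname);
  (doc.head || doc.body).appendChild(script);
  return script;
}

// In a page, wrapping the call in an IIFE keeps every variable local:
// (function () { embedWidget(document, 'my-site'); })();
```

The document object is passed in as a parameter only so that the function can be exercised outside a browser.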

Conclusion

Building third-party JavaScript is a hard task - but there are companies out there who did a great job solving this. When in doubt, you can always try to examine how they do things, so you will have options to choose from.

UPDATE 1: Disqus uses their own "smart" file versioner, which can be found here: https://github.com/disqus/grunt-smartrev. Thanks!


On Third-Party Javascript - The Principles

Third-party JavaScript is a pattern of JavaScript programming that enables the creation of highly distributable web applications. Unlike regular web applications, which are accessed at a single web address, these applications can be arbitrarily loaded on any web page using simple JavaScript includes. — Ben Vinegar, Anton Kovalyov (Third-party Javascript)

Google Analytics, Mixpanel, Disqus - just to name a few products that heavily rely on third-party JavaScript development. In this post we are going to take a look at the principles of third-party JavaScript development - in Part II we will take a look at how other companies do it in detail.

Principles of developing third-party JavaScript

Before going into the details on how the big players out there do this, let's take a look at the key points that you should pay attention to.

Injecting third-party JavaScript

Traditionally, JavaScript resources can be inserted into a webpage with the following snippet:

We have to do something very similar when integrating into different web applications. For this you can provide the following snippet to your clients:

What just happened here? Firstly, we created a new script HTML element, then started to decorate it with attributes. This snippet should be placed at the end of the body tag.

The most important thing to notice here is the async attribute. Imagine the following scenario: your service gets a huge amount of traffic, and it becomes slow. If the loading of your script does not happen asynchronously, you can block the entire webpage. By setting its value to true we make sure that we will not block the loading of any other resources on the page.
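
The injection described above can be sketched roughly like this. The URL is a placeholder, and the choice to insert before the first existing script tag is one common convention, not the only option.

```javascript
// Minimal sketch of an asynchronous include. `doc` is passed in so the
// function can also be exercised outside a browser.
function injectScript(doc, src) {
  var script = doc.createElement('script');
  script.type = 'text/javascript';
  script.async = true; // do not block the rest of the page while loading
  script.src = src;
  // Insert before the first script tag on the page. At least one always
  // exists when this code itself runs from a script tag.
  var first = doc.getElementsByTagName('script')[0];
  first.parentNode.insertBefore(script, first);
  return script;
}

// In the browser:
// injectScript(document, 'https://cdn.example.com/loader.js');
```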

But what should this file contain? Your entire application, or something different? The distribution part will try to answer this question.

The sacred global scope

When writing third-party JavaScript you do not know where your library will be used. It will be injected into the unknown, and that unknown sometimes will be Mordor itself, with other third-party libs already there.

Be a good guy, do not pollute the global scope even more.
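
One common way to do that is to keep everything inside a function scope and expose a single namespaced object. A minimal sketch, where MyWidget is a hypothetical name:

```javascript
// Sketch: private state stays inside the function; only one property is
// added to the global object, and it can be handed back on request.
function installWidget(root) {
  var previous = root.MyWidget; // remember whatever was there before
  var counter = 0;              // private, never visible to the host page

  var api = {
    track: function () {
      counter += 1;
      return counter;
    },
    // Give the global name back to its previous owner, jQuery-style.
    noConflict: function () {
      root.MyWidget = previous;
      return api;
    }
  };

  root.MyWidget = api;
  return api;
}

// In the browser: (function () { installWidget(window); })();
```

If the host page already used the same global name, it can call noConflict() and keep working with the returned reference under a name of its choosing.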

Dependencies

As we have already discussed, your script will be injected into the unknown. This means that it is very likely that libraries like jQuery, Backbone or Lodash/Underscore will be present in the page.

Be careful! You should never depend on these things: the developers of that site will not reach out to you and ask if you are still using that thing. Even worse, they can use different versions of those libraries. So once again: never ever rely on the copies loaded by the host page.

But what should you do instead? You have to bundle all your dependencies into your JavaScript file(s). Make sure that these do not interfere with the original ones (a.k.a. noConflict). To solve this problem Browserify/Webpack can be a good choice - they can help isolate your dependencies from the dependencies of the original site with scoping.

Also, lots of front end libraries can be found on NPM and used with Browserify/Webpack (e.g. you can use jQuery this way without putting it into the global scope, or even worse, overwriting the one used by the site you are injected into).

Communication with a server

When developing third-party JavaScript, communication with the back end servers is not trivial.

XMLHttpRequest cannot load http://example.org/apple. Origin https://example.com is not allowed by Access-Control-Allow-Origin.  

Have you ever encountered this error message? It happens because the browser blocks cross-origin responses unless the remote server explicitly allows the requesting origin.

Enable CORS (Cross-Origin Resource Sharing)

The easiest way to do this is to set the following header in the response of the server:

Access-Control-Allow-Origin: *  

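Sure, you may want to limit who can reach your services - you can send a specific origin instead of the asterisk. The header accepts only a single origin, so to allow several sites the server has to check the request's Origin header and echo back the matching one.

The only thing you have to keep in mind when using CORS is the legacy support (if you have to deal with that). Internet Explorer 8 and 9 do not have full CORS support: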
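
Restricting access to a set of known sites could look roughly like this on the server. The function name and the allow-list are assumptions for illustration; the resulting object would be applied to the response with whatever HTTP framework you use.

```javascript
// Sketch: build CORS response headers from an allow-list. The header takes a
// single origin (or *), so the matching request origin is echoed back.
function corsHeaders(requestOrigin, allowedOrigins) {
  if (allowedOrigins.indexOf(requestOrigin) === -1) {
    return {}; // no CORS headers: the browser will block the response
  }
  return {
    'Access-Control-Allow-Origin': requestOrigin,
    // Tell caches that the response differs per requesting origin.
    'Vary': 'Origin'
  };
}

// Example with Node's http module (not executed here):
// var headers = corsHeaders(req.headers.origin, ['https://examplestore.com']);
// Object.keys(headers).forEach(function (k) { res.setHeader(k, headers[k]); });
```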

  • only POST and GET
  • no custom HTTP headers
  • content-type must be text/plain

To support these browsers you have to implement HTTP Method Overriding on both the client and the server. How does that work? The server extracts the intended HTTP method from the method querystring/parameter, then handles the actual request as if it were a DELETE, PUT, etc.
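
The server side of method overriding can be sketched as a small function. The _method parameter name is an assumption here (it is a common convention); use whatever your client actually sends.

```javascript
// Sketch of HTTP method overriding on the server.
function resolveMethod(actualMethod, requestUrl) {
  var allowed = ['GET', 'POST', 'PUT', 'PATCH', 'DELETE'];
  // A base is needed because request URLs on the server are usually relative.
  var params = new URL(requestUrl, 'http://localhost').searchParams;
  var override = (params.get('_method') || '').toUpperCase();
  // Only POST requests may be upgraded, so a plain GET link can never
  // trigger a DELETE.
  if (actualMethod === 'POST' && allowed.indexOf(override) !== -1) {
    return override;
  }
  return actualMethod;
}
```

The client then sends every PUT or DELETE as a POST with the intended method in the query string, and the router works with the value returned by resolveMethod.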

Luckily, for the common frameworks like Express and Koa you can find solutions on NPM (for Express, for Koa).

Identifying users

Users can be identified using cookies. They can be used in third-party JavaScript development as well, but we have to introduce two new definitions.

First-party cookie

First-party cookies are the "traditional" cookies. They are called first-party cookies because these cookies are placed on the same domain where the JavaScript code runs. Your partners will also see these cookies in their traffic.

Third-party cookie

Third-party cookies are called third-party, because they are placed on a different domain. Imagine the following scenario: your script is injected into examplestore.com. You may want to track your users using your own domain, whatanicewidget.com. In that case you will put a cookie on whatanicewidget.com.

What are the benefits of using a third-party cookie? You can recognise users from niceexamplestore.com and whatastooore.com, not just from examplestore.com, because when making requests to your domain you will have the very same cookie.

When implementing an identifying mechanism for your application, do not forget that third-party cookies are not supported everywhere. For this reason you have to implement a fallback to the first-party cookie version.
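
A first-party fallback could be sketched as follows. The cookie name wnw_id, the one-year lifetime and the way the third-party ID reaches the page are all assumptions made for this example.

```javascript
// Read one cookie from a document.cookie-style string ("a=1; b=2").
function readCookie(cookieString, name) {
  var parts = cookieString ? cookieString.split('; ') : [];
  for (var i = 0; i < parts.length; i++) {
    var eq = parts[i].indexOf('=');
    if (parts[i].slice(0, eq) === name) {
      return decodeURIComponent(parts[i].slice(eq + 1));
    }
  }
  return null;
}

// `thirdPartyId` is whatever the widget's own domain reported (null when the
// browser blocked the third-party cookie). `jar` is `document` in a browser.
function resolveVisitorId(thirdPartyId, jar, generateId) {
  if (thirdPartyId) return thirdPartyId;
  var existing = readCookie(jar.cookie, 'wnw_id');
  if (existing) return existing;
  // Nothing found: create a first-party cookie on the host page's domain.
  var fresh = generateId();
  jar.cookie = 'wnw_id=' + encodeURIComponent(fresh) + '; path=/; max-age=31536000';
  return fresh;
}
```

With the fallback the visitor is only recognised on the one site where the cookie was set, which is exactly the limitation of first-party cookies described above.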

LocalStorage

This is the trickiest one. You can use localStorage (if available in the browser) to identify users. But be aware: the same-origin policy applies to localStorage as well, so visiting the same site using HTTP and HTTPS will result in different localStorage contents.

So how does localStorage help you? In short: you can use window.postMessage to send messages between windows. So what you have to do is include an external web page in your site using an iframe (served over HTTPS), then communicate with it - that window holds the localStorage, which will be the same no matter which site the user visits from. A sample implementation can be found here: https://github.com/zendesk/cross-storage.
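
To make the idea more concrete, here is a minimal sketch of the code that could run inside the iframe. The message format ({ id, action, key, value }) and the function name are invented for this example and are not the cross-storage API.

```javascript
// Sketch of the "hub" side: runs inside the iframe served from your own
// HTTPS origin and answers storage requests from known host pages.
function handleStorageMessage(event, storage, allowedOrigins) {
  // Never answer pages you do not know.
  if (allowedOrigins.indexOf(event.origin) === -1) return null;

  var request;
  try {
    request = JSON.parse(event.data);
  } catch (err) {
    return null; // not one of our messages
  }

  var result = null;
  if (request.action === 'set') {
    storage.setItem(request.key, request.value);
  } else if (request.action === 'get') {
    result = storage.getItem(request.key);
  }
  // The id lets the caller match the reply to its request.
  return JSON.stringify({ id: request.id, result: result });
}

// Wiring inside the iframe (browser only):
// window.addEventListener('message', function (event) {
//   var reply = handleStorageMessage(event, window.localStorage, ['https://examplestore.com']);
//   if (reply) event.source.postMessage(reply, event.origin);
// });
```

Checking event.origin matters: without it, any page could embed the iframe and read or overwrite the stored identifiers.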

Distributing

When serving third-party JavaScript applications, their size and cache policy are crucial, as both affect not just the time your users have to wait to see the application, but also your monthly bills. CDNs charge based on traffic (in GBs, TBs) and the number of requests.

Hopefully this will not strike you as a surprise: always uglify/minify your JavaScript/CSS resources.

But what about caching? If you set the max-age to a big number, then new versions may take a long time to propagate to all the clients. If you set it to a small value, then the clients will frequently download it. We can do better!

Let's split up your application into two separate JavaScript files! For the sake of simplicity call them loader.js and application.js.

The loader will be a really small file, basically what we created before, with a small exception: we include a revision number when loading the application.js file.

So in this case your users have to load the loader.js file into their site, which will then load application.js, containing all the application logic. But why do this? For the loader file we can set a short cache time, like an hour - it does not matter if it is downloaded a lot, because it will not be bigger than 1KB. For the application itself we can set the cache time to eternity; each revision will be downloaded only once.
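
The two halves of this setup can be sketched as two small functions: one that the loader would use to build the revisioned URL, and one that the server or CDN configuration would mirror when choosing cache headers. The host name, the path layout and the one-year max-age are assumptions for illustration.

```javascript
// Used by loader.js: a new revision produces a new URL, so copies of older
// revisions that are still cached are simply never requested again.
function applicationUrl(revision) {
  return 'https://cdn.example.com/' + encodeURIComponent(revision) + '/application.js';
}

// Mirrors the cache policy described above.
function cacheControlFor(path) {
  if (/\/loader\.js$/.test(path)) {
    return 'public, max-age=3600'; // tiny file, refreshed hourly
  }
  // Revisioned files never change, so they can be cached "forever".
  return 'public, max-age=31536000, immutable';
}
```

Rolling out a new version then means publishing application.js under a new revision and updating the revision inside loader.js; clients pick it up within the loader's cache lifetime.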

Splitting third-party JavaScript applications

Recommended reading

Take a closer look at how the big players out there do third-party JavaScript development, examining cache policies, dependencies, security, communication with the server and more.