Building a microservices architecture in an enterprise environment has tremendous benefits:
- Microservices do not require teams to rewrite the whole application if they want to add new features.
- Smaller codebases make maintenance easier and faster. This saves a lot of development effort and time, and therefore increases overall productivity.
- The parts of an application can be scaled separately and are easier to deploy.
In this article, you will gain valuable insights into the best practices, benefits, and pain points of using microservices, based on the experiences of highly innovative enterprises like Walmart, Spotify, and Amazon.
Walmart Successfully Revitalized its Failing Architecture with Microservices
What can an enterprise do when its aging architecture finally begins to negatively affect business?
This is the multi-million-dollar question that the IT department of Walmart Canada had to address after its site failed to serve users on Black Friday two years in a row, according to Kevin Webber, who helped re-architect the retail giant's online business.
“It couldn’t handle 6 million pageviews per minute and made it impossible to keep any kind of positive user experience anymore.”
Before embracing microservices, Walmart had an architecture for the internet of 2005, designed around desktops, laptops and monoliths. The company decided to replatform its old legacy system in 2012 since it was unable to scale for 6 million pageviews per minute and was down for most of the day during peak events. They wanted to prepare for the world of 2020, with 4 billion people connected, 25+ million apps available, and 5,200 GB of data for each person on Earth.
Walmart replatformed to a microservices architecture with the intention of achieving close to 100% availability with reasonable costs.
“It’s important to have a system elastic enough to scale out to handle peak without negatively impacting experience.”
Migrating to microservices caused a significant business uplift for the company:
- conversions were up by 20% literally overnight
- mobile orders were up by 98% instantly
- no downtime on Black Friday or Boxing Day (the Black Friday of Canada)
- zero downtime since the replatforming
The operational savings were significant as well since the company moved off of its expensive hardware onto commodity hardware (cheap virtual x86 servers). They saved 40% of the computing power and experienced 20-50% cost savings overall.
“Building microservice architectures are really the key to staying in front of the demands of the market. It’s not just a sort of replatforming for the sake of technology. It’s about the overall market in general, about what users expect and what business expects to stay competitive.”
Spotify Builds Flawless User Experience with Microservices
Kevin Goldsmith, VP of Engineering at Spotify, knows from experience that an enterprise which intends to move fast and stay innovative in a highly competitive market requires an architecture that can scale.
Spotify serves 75 million active users per month, with an average session length of 23 minutes, while running incredibly complex business rules behind the scenes. They also have to watch out for their competitors, Apple and Google.
“If you’re worried about scaling to hundreds of millions of users, you build your system in a way that you scale components independently.”
Spotify is built on a microservice architecture with autonomous full-stack teams in charge in order to avoid synchronization hell within the organization.
“The problem is, if you want to build a new feature in this kind of (monolithic) world, then the client team have to ask the core team: please get us an API and let us do this. The core team asks the server team: please implement this on the server side so we can do whatever we need to do. And after that, the server team has to ask the infrastructure team for a new database. It is a lot of asking.”
Spotify has 90 teams, 600 developers, and 5 development offices on 2 continents building the same product, so they needed to minimize these dependencies as much as possible.
That’s why they build microservices with full-stack teams, each consisting of back-end developers, front-end developers, testers, a UI designer, and a product owner. These teams are autonomous, and each team’s mission does not overlap with the missions of other teams.
“Developers deploy their services themselves and they are responsible for their own operations too. It’s great when teams have operational responsibility. If they write crummy code, and they are the ones who have to wake up every night to deal with incidents, the code will be fixed very soon.”
Spotify’s microservices are built in very loosely coupled architectures. There aren’t any strict dependencies between individual components.
Kevin mentioned the main challenges of working with microservices:
- They are difficult to monitor since thousands of instances are running at the same time.
- Microservices are prone to increased latency: instead of calling a single process, Spotify calls a lot of services, and these services call other services too, so latency grows with each of these calls.
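To make the latency point concrete, here is a minimal sketch in Python. The service names and per-call latencies are hypothetical, not figures from Spotify. It only illustrates that calls made one after another add up, while independent calls issued in parallel cost roughly as much as the slowest one.

```python
# Hypothetical per-call latencies in milliseconds (illustrative, not measured).
CALL_LATENCY_MS = {"gateway": 5, "playlist": 20, "metadata": 15, "artwork": 30}


def sequential_latency(calls):
    """Each call waits for the previous one, so the latencies add up."""
    return sum(CALL_LATENCY_MS[name] for name in calls)


def parallel_latency(calls):
    """Independent calls issued concurrently finish when the slowest one does."""
    return max(CALL_LATENCY_MS[name] for name in calls)


chain = ["gateway", "playlist", "metadata", "artwork"]

# Every service calls the next one in turn.
print(sequential_latency(chain))  # 70

# The gateway fans out to the three downstream services at the same time.
print(CALL_LATENCY_MS["gateway"] + parallel_latency(chain[1:]))  # 35
```

Under these assumed numbers, the chained version takes twice as long as the fan-out version, which is why the depth of a call chain matters more than the total number of services.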
However, building a microservice architecture has its clear benefits for enterprises according to him:
- It’s easy to scale based on real-world bottlenecks: you can identify the bottlenecks in your services and replicate or fix them there without massive rewrites.
- It’s way easier to test: the test surface is smaller, and services don’t do as much as big monolithic applications, so developers can test them locally without having to deploy them to a test environment.
- It’s easier to deploy: applications are smaller, so they deploy really fast.
- Easier monitoring (in some sense): services are doing less so it’s easier to monitor each of these instances.
- Services can be versioned independently: there’s no need to add support for multiple versions in the same instances, so they don’t end up adding multiple versions to the same binary.
- Microservices are less susceptible to large failures: big services fail big, small services fail small.
Building a microservices architecture allows Spotify to have a large number of services down at the same time without the users even noticing it. They’ve built their system on the assumption that services can fail at any time, so each individual service does little enough that its failure can’t ruin the experience of using Spotify.
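One common way to get this behavior is to wrap calls to non-critical services in a fallback. The sketch below is an assumption about how such a pattern might look, not Spotify's actual code; the service functions and the `render_home` page are hypothetical.

```python
def call_with_fallback(service, fallback):
    """Return the service's result, or the fallback value if the call fails."""
    try:
        return service()
    except Exception:
        # A failing non-critical service degrades the page instead of breaking it.
        return fallback


def playback_queue():
    # Hypothetical healthy service.
    return ["track-1", "track-2"]


def recommendations():
    # Hypothetical service that is currently down.
    raise ConnectionError("recommendations service unavailable")


def render_home():
    """Assemble the page from whatever services are currently reachable."""
    return {
        "queue": call_with_fallback(playback_queue, []),
        "recommendations": call_with_fallback(recommendations, []),
    }


page = render_home()
print(page)
```

Here the recommendations section simply comes back empty while the playback queue still loads, so the failure stays small.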
Kevin Goldsmith ended his speech with a big shoutout to those who are hesitant to embrace microservices in an enterprise environment:
“We’ve been doing microservices at Spotify for years. We do it on a pretty large scale. We do it with thousands and thousands of running instances. We have been incredibly happy with it because we have scaled stuff up. We can rewrite our services at will - which we do, rather than continue to refactor them or to add more and more technical debt over time. We just rewrite them when we get to a scaling inflection point. We do this kind of stuff all the time because it’s really easy with this kind of architecture, and it’s working incredibly well for us. So if you are trying to convince somebody at your company, point to Spotify, point to Netflix, point to other companies and say: This is really working for them, they’re super happy with it.”
Amazon Embraced the DevOps Philosophy with Microservices and Two-Pizza Teams
Rob Brigham, senior AWS product manager, shared the story of how Amazon embraced the DevOps philosophy while migrating to a microservice infrastructure.
He began his speech with a little retrospection: in 2001, the Amazon.com retail website was a large architectural monolith. It was architected in multiple tiers, and those tiers had many components in them, but they were coupled together very tightly, and behaved like one big monolith.
“A lot of startups and enterprise projects start out this way. They take a monolith first approach, because it’s very quick, but over time, as that project matures and has more developers on it, as it grows and the codebase gets more large, and the architecture gets more complex, that monolith is going to add overhead to your process, and the software development lifecycle is going to slow down.”
How did this affect Amazon? They had a large number of developers working on one big monolithic website, and even though each one of these developers only worked on a very small piece of that application, they still needed to deal with the overhead of coordinating their changes with everyone else who was also working on the same project.
When they were adding a new feature or making a bugfix, they needed to make sure that the change would not break something else in that project. If they wanted to update a shared library to take advantage of a new feature, they needed to convince everyone else on that project to upgrade to the new shared library at the same time. If they wanted to make a quick fix and push it out to their customers quickly, they couldn’t just do it on their own schedule; they had to coordinate it with all the other developers who were making changes at the same time.
“This led to the existence of something like a merge Friday or a merge week - where all the developers took their changes, merged them together into one version, resolved all the conflicts, and finally created a master version that was ready to move out into production.”
Even when they had that large new version, it still added a lot of overhead to the delivery pipeline. The whole new codebase needed to be rebuilt, all of the test cases needed to be rerun, and after that they had to take the whole application and deploy it to the full production fleet.
Fun fact: In the early 2000s, Amazon even had an engineering group whose sole job was to take these new versions of the application and manually push them across Amazon's production environment.
It was frustrating for the software engineers, and most importantly, it was slowing down the software development lifecycle, the ability to innovate, so they made architectural and organizational changes - big ones.
These big changes began on an architectural level: Amazon went through its monolithic application and teased it apart into a Service Oriented Architecture.
“We went through the code and pulled out functional units that served a single purpose and wrapped those with a web service interface. We then established a rule, that from now on, they can only talk to each other through their web service APIs.”
This enabled Amazon to create a highly decoupled architecture, where these services could iterate independently from each other without any coordination between those services as long as they adhered to that standard web service interface.
“Back then it didn’t have a name, but now we call it a microservice architecture.”
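The rule that services may only talk to each other through their published interfaces can be sketched as follows. This is an illustrative assumption rather than Amazon's implementation: `InventoryAPI`, `InventoryService`, and `OrderService` are hypothetical names, and an in-process method call stands in for what would be a web service call over the network.

```python
from typing import Protocol


class InventoryAPI(Protocol):
    """Hypothetical public interface: the only thing other services depend on."""

    def reserve(self, sku: str, quantity: int) -> bool: ...


class InventoryService:
    """Owns its data; callers never read or write the stock table directly."""

    def __init__(self) -> None:
        self._stock = {"book-123": 3}  # private state behind the interface

    def reserve(self, sku: str, quantity: int) -> bool:
        available = self._stock.get(sku, 0)
        if quantity > available:
            return False
        self._stock[sku] = available - quantity
        return True


class OrderService:
    """Depends on the InventoryAPI contract, not on a concrete implementation."""

    def __init__(self, inventory: InventoryAPI) -> None:
        self._inventory = inventory

    def place_order(self, sku: str, quantity: int) -> str:
        reserved = self._inventory.reserve(sku, quantity)
        return "confirmed" if reserved else "rejected"


orders = OrderService(InventoryService())
print(orders.place_order("book-123", 2))  # confirmed
print(orders.place_order("book-123", 2))  # rejected
```

Because `OrderService` only knows the `reserve` contract, the inventory team could change its storage or rewrite the service entirely without coordinating with the order team, as long as the interface stays the same.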
Amazon also implemented changes in how their organization operated. They broke down their one, central, hierarchical product development team into small, “two-pizza teams”.
“We originally wanted teams so small that we could feed them with just two pizzas. In reality, it’s 6-8 developers per team right now.”
Each of these teams was given full ownership of one or a few microservices. And at Amazon, full ownership means everything: the teams talk to their customers (internal or external), define their own feature roadmap, design and implement their features, and then test, deploy, and operate them.
If anything goes wrong anywhere in that full lifecycle, these two-pizza teams are the ones accountable for fixing it. If they choose to skimp on their testing and are unknowingly releasing bad changes into production, the same engineers have to wake up and fix the service in the middle of the night.
This organizational restructuring properly aligned incentives, so engineering teams are now fully motivated to make sure the entire end-to-end lifecycle operates efficiently.
“We didn’t have this term back then, but now we call it a DevOps organization. We took the responsibilities of development, test, and operations, and merged those all into a single engineering team.”
After all these changes were made, Amazon dramatically improved its front-end development lifecycle. Now the product teams can quickly make decisions and crank out new features for their microservices. The company makes 50 million deployments a year, thanks to the microservice architecture and their continuous delivery processes.
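To put the 50 million figure in perspective, the back-of-the-envelope calculation below converts it into daily and per-second rates. It assumes, purely for illustration, that deployments are spread evenly across a 365-day year.

```python
DEPLOYMENTS_PER_YEAR = 50_000_000  # figure quoted in the article
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, assuming a 365-day year

# Assumes an even spread over the year, which real deployment traffic is not.
per_day = DEPLOYMENTS_PER_YEAR / 365
per_second = DEPLOYMENTS_PER_YEAR / SECONDS_PER_YEAR

print(round(per_day))        # 136986
print(round(per_second, 2))  # 1.59
```

That is roughly one and a half deployments every second, a pace that is only plausible when small teams can deploy independently through an automated pipeline.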
“How can others do this? There is not one right answer for every company. A company needs to look at cultural changes, organizational changes, and process changes. Also, there is one common building block that every DevOps transformation needs: That is to have an efficient and reliable continuous delivery pipeline.”
Every technology has a downside. If we consider microservices at the organizational level, the negative trade-off is clearly the increase in operational complexity. No human can map how all of the services talk to each other, so companies need tools that provide visibility into their microservice infrastructure.
At RisingStack, our enterprise microservice development and consulting experience inspired us to create a monitoring tool called Trace, which allows engineers to successfully tackle the most common challenges during the full lifecycle of microservices: transaction tracking, anomaly detection, service topology and performance monitoring.
Do you have additional insights on the topic? Share them in the comments.