In the following blogpost, we will walk you through how we chased down a DNS resolution issue for one of our clients. Even though the problem at hand was quite specific, you might find the steps we took during the investigation useful, and the tools we used might prove helpful if you ever face something similar. We will also discuss how the Domain Name System works, so buckle up!
The Symptoms
Our client hosts their static assets and webshop at a hosting provider, which also provides a WordPress-like CMS and nameservers. The client wanted to reduce costs, so they asked us to move their static assets to AWS S3 and set up CloudFront to cache them. The client set up a CNAME record pointing to the CloudFront distribution, so it was available at assets.clients-domain.com as well as distribution-id.cloudfront.net.
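In zone-file terms, the record looked roughly like this (names are anonymized; the 300-second TTL matches what we saw in later dig output, but is otherwise an assumption):

```
assets.clients-domain.com.    300    IN    CNAME    distribution-id.cloudfront.net.
```

Note that a CNAME simply aliases one name to another; the actual A records for the distribution are served by CloudFront's own nameservers.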
Setting up CloudFront was pretty straightforward. However, we were quite surprised when we got a ticket claiming that images were not visible when the site was visited from a mobile browser. Another subcontractor, who handles the development of the static pages, reported that the assets were accessible from certain locations but unavailable from others.
Forming the DNS resolution error hypothesis, or why ‘curl’ is better than browsers
The first method we tried to reproduce the error was accessing the resource from the browser at https://assets.clients-domain.com/img/test-image.png. It was pretty difficult, as browsers had no problem loading the assets in our office, so we then used a VPN to test them from other locations.
The results were inconsistent: with the browser cache disabled, the images we tested loaded from one location without issues, while from others they failed at first with 502 Bad Gateway. Then, at some point, it started working and we weren't able to break it again no matter how hard we tried. Then we tried using curl. 50% of the time it worked, but the other 50% it reported:
$ curl https://assets.clients-domain.com/img/test-image.png --output test.png
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (6) Could not resolve host: assets.clients-domain.com
Once we saw Could not resolve host: assets.clients-domain.com, it was clear that we were facing a DNS issue. Or at least it was an educated guess worth verifying.
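By the way, instead of eyeballing the error message, you can classify such failures by curl's exit code: 6 means the host could not be resolved. Here is a minimal sketch, with fetch as a hypothetical stub standing in for the real curl invocation:

```shell
# fetch stands in for:
#   curl -sS https://assets.clients-domain.com/img/test-image.png --output test.png
fetch() { return 6; }   # pretend the lookup failed, like half of our runs did

rc=0
fetch || rc=$?          # capture the exit code without aborting the script
case "$rc" in
  0) kind="ok" ;;
  6) kind="dns-failure" ;;   # curl exit code 6: could not resolve host
  *) kind="other-error" ;;
esac
echo "lookup result: $kind"
```

In a retry loop, this lets you count DNS failures separately from HTTP-level errors such as the 502s we saw in the browser.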
‘dig’-ing deep for verification
To verify our hypothesis, we tried to reach CloudFront directly. It worked fine, so we knew we were on the right track.
First, we thought there might be a problem with the way we had set up the CNAME record in CloudFront, so we started digging. We opened two terminal panels next to each other and ran watch curl https://assets.clients-domain.com/img/test-image.png --output test.png and watch dig assets.clients-domain.com.
dig reported the following when curl failed to reach the server:
$ watch dig assets.clients-domain.com
; <<>> DiG 9.13.5 <<>> assets.clients-domain.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 24152
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;assets.clients-domain.com. IN A
;; AUTHORITY SECTION:
clients-domain.com. 300 IN SOA virga.hosting-nameserver.com. root.virga.hosting-nameserver.com. 2018091202 10800 3600 604800 86400
;; Query time: 183 msec
;; SERVER: 213.46.246.53#53(213.46.246.53)
;; WHEN: Fri Feb 01 17:18:12 CET 2019
;; MSG SIZE rcvd: 106
When we got a proper answer section, curl managed to download the asset.
$ watch dig assets.clients-domain.com
; <<>> DiG 9.13.5 <<>> assets.clients-domain.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 51530
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;assets.clients-domain.com. IN A
;; ANSWER SECTION:
assets.clients-domain.com. 297 IN CNAME distribution-id.cloudfront.net.
distribution-id.cloudfront.net. 57 IN A 13.32.22.20
distribution-id.cloudfront.net. 57 IN A 13.32.22.108
distribution-id.cloudfront.net. 57 IN A 13.32.22.112
distribution-id.cloudfront.net. 57 IN A 13.32.22.152
;; Query time: 22 msec
;; SERVER: 213.46.246.53#53(213.46.246.53)
;; WHEN: Fri Feb 01 17:17:51 CET 2019
;; MSG SIZE rcvd: 156
Now we started to suspect that the problem was not on our side after all. However, let's dissect the output of dig first.
Dig is a DNS lookup utility that you can use to gain information on how a domain name is mapped to an IP address. You can pass it several options, such as +cmd, which prints dig's version and the command you entered. To omit it, you can use dig +nocmd assets.clients-domain.com.
There are several other options, such as +short, which gives you a terser, parseable output, or +trace, which traces the nameservers used during the domain name resolution. After the issued command, you can also see the ->>HEADER<<- printed. We either got NXDOMAIN, stating that the domain we were looking for is non-existent, or NOERROR, when we got back the IP address for the query.
The QUESTION SECTION reminds us of the domain and subdomain we were looking for, and of the fact that we were looking for an A record, thus essentially for an IP address.
When the DNS resolution fails, we are only given an AUTHORITY SECTION, which tells us that dig was able to find the domain authority (SOA) but could not find anything pointing to the subdomain. However, when dig is able to resolve the subdomain, it tells us that it found a CNAME record on the authoritative nameserver pointing to CloudFront, and provides us with the IP addresses of CloudFront's servers, as you can see below.
;; ANSWER SECTION:
assets.clients-domain.com. 297 IN CNAME distribution-id.cloudfront.net.
distribution-id.cloudfront.net. 57 IN A 13.32.22.20
distribution-id.cloudfront.net. 57 IN A 13.32.22.108
distribution-id.cloudfront.net. 57 IN A 13.32.22.112
distribution-id.cloudfront.net. 57 IN A 13.32.22.152
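When watching many of these runs side by side, it helps to pull just the status field out of dig's header. A small sketch; here it is fed a captured header line, but in practice you would pipe live dig output into the same sed:

```shell
# One header line captured from a failing run.
header=';; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 24152'

# Extract the word after "status:" -- NOERROR, NXDOMAIN, SERVFAIL, etc.
status=$(printf '%s\n' "$header" | sed -n 's/.*status: \([A-Z]*\).*/\1/p')
echo "$status"
```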
Now that we understand the output of dig, let’s continue with the investigation.
‘+trace’-ing nameservers
We wanted to see where the domain name resolution got stuck when we encountered failures, so we ran dig +trace assets.clients-domain.com. Again, we had two different kinds of output. One where the resolution failed:
$ dig +trace assets.clients-domain.com
; <<>> DiG 9.13.5 <<>> +trace assets.clients-domain.com
;; global options: +cmd
. 84782 IN NS h.root-servers.net.
. 84782 IN NS a.root-servers.net.
. 84782 IN NS e.root-servers.net.
. 84782 IN NS l.root-servers.net.
. 84782 IN NS f.root-servers.net.
. 84782 IN NS c.root-servers.net.
. 84782 IN NS g.root-servers.net.
. 84782 IN NS b.root-servers.net.
. 84782 IN NS k.root-servers.net.
. 84782 IN NS j.root-servers.net.
. 84782 IN NS d.root-servers.net.
. 84782 IN NS m.root-servers.net.
. 84782 IN NS i.root-servers.net.
. 84782 IN RRSIG NS 8 0 518400 20190214050000 20190201040000 16749 . K8k6KqovcMQnSsYoh+9rLiBK2423be5fvZb06NdcRz1tGqsigMQEZg2k IzOv9iPmqqcS0eB5mVxdm1NXcoRYuGQcSwTA9yBWcgs1AZxiEMOIJLNT JTxyiClPo2KKFe32pfJN1ljzZhSP26KI+/htBbRsX0qZARs80cfXOo5v ZUMO875h4ldHI9+UbR9PpkFtfmSHINkiatMQRczFScV0e0Zelqwd9QUq mBzU3vnhw+gqtxyTowkQ4hGictt/KVdJDPqkrV0BFqWocmaoryORGDnv 2IRLyV1uNB0jJrXnzwP492L4d1OhSRslGNfJPotsfNY7cMb2Z6xfO9RL Hvylgg==
;; Received 540 bytes from 213.46.246.53#53(213.46.246.53) in 14 ms
com. 172800 IN NS f.gtld-servers.net.
com. 172800 IN NS b.gtld-servers.net.
com. 172800 IN NS h.gtld-servers.net.
com. 172800 IN NS g.gtld-servers.net.
com. 172800 IN NS k.gtld-servers.net.
com. 172800 IN NS c.gtld-servers.net.
com. 172800 IN NS a.gtld-servers.net.
com. 172800 IN NS m.gtld-servers.net.
com. 172800 IN NS j.gtld-servers.net.
com. 172800 IN NS e.gtld-servers.net.
com. 172800 IN NS d.gtld-servers.net.
com. 172800 IN NS l.gtld-servers.net.
com. 172800 IN NS i.gtld-servers.net.
com. 86400 IN DS 30909 8 2 E2D3C916F6DEEAC73294E8268FB5885044A833FC5459588F4A9184CF C41A5766
com. 86400 IN RRSIG DS 8 1 86400 20190214050000 20190201040000 16749 . OTMp18F8zF6IFwUeH4PC4owXLtMIxejaBs+r2PkgIxM5dDtGIj+JXF6R kXmxsQi7FlMhQq/OxU7B3HksQ8CCXVb4rYEo+6vz8ugElRkGKBZf0tkd 4C/JjleSX5kAHdgYnK5m/0bWq4wxMw+R0sSdsbnVmc+Jzv/S3T+Aj4la 0heACCqQYY+2rrGBJ7pxTWjR2JvK8p8NgVvx6k3fZlG0p5QbnajnGMMY vyB/GtYv3uvLnS4JLQvUMU7meIq6hm+wqpI1kp676ypu+KvoqQVFaO/E u4Rbv7ie5CsQOT4H/7jc8pw6IQqqD3FjdFX2yoW4u9pSwy8LrDgYHix7 AielIA==
;; Received 1208 bytes from 192.112.36.4#53(g.root-servers.net) in 55 ms
clients-domain.com. 172800 IN NS ns2.hosting-nameserver.com.
clients-domain.com. 172800 IN NS ns1.hosting-nameserver.com.
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN NSEC3 1 1 0 - CK0Q1GIN43N1ARRC9OSM6QPQR81H5M9A NS SOA RRSIG DNSKEY NSEC3PARAM
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN RRSIG NSEC3 8 2 86400 20190208054243 20190201043243 16883 com. LzTH+svBHgjuQk3FHAvNO72auLxibi3E6gzqFOkT7BwVqEYSLLd6b2fn r+y8A5fYEsR7VDRS6F+fODsOXlvfAR/Dr4oww/8FYMlAG7eYr5g4Bcsv TTnAqdXhPDoJgfZ6+BTodLgbY6tYZWNNnV2wS/iv0xfZ3BpAVnXEqgmD GrE=
FJ6U8VTBKL1C7L340CMMNNFNL3DP254G.com. 86400 IN NSEC3 1 1 0 - FJ70PP8IH3U3OPHIR4INIE39O39HKIVK NS DS RRSIG
FJ6U8VTBKL1C7L340CMMNNFNL3DP254G.com. 86400 IN RRSIG NSEC3 8 2 86400 20190207061242 20190131050242 16883 com. P5v6fKCuxOuzfmR2IXXZgns/m+NkvDJ2Ph4Az/Rbs+VkOV8jTHlPr/FZ k7EvoW06jHUbDLqa0UdY92IFcK/Z0kEO3t76mcQtd/0WXvVQkBHCyb0Q UfaxxPe00oeEh8Ic/6u5Zz/Co0i7rYXoVKQIprTqngs+x3g5luUogp/Y iLE=
;; Received 612 bytes from 192.48.79.30#53(j.gtld-servers.net) in 278 ms
clients-domain.com. 300 IN SOA virga.hosting-nameserver.com. root.virga.hosting-nameserver.com. 2018091202 10800 3600 604800 86400
;; Received 106 bytes from 50.56.75.143#53(ns2.hosting-nameserver.com) in 217 ms
And another when the domain name got resolved properly:
$ dig +trace assets.clients-domain.com
; <<>> DiG 9.13.5 <<>> +trace assets.clients-domain.com
;; global options: +cmd
. 79456 IN NS e.root-servers.net.
. 79456 IN NS b.root-servers.net.
. 79456 IN NS d.root-servers.net.
. 79456 IN NS j.root-servers.net.
. 79456 IN NS m.root-servers.net.
. 79456 IN NS i.root-servers.net.
. 79456 IN NS l.root-servers.net.
. 79456 IN NS g.root-servers.net.
. 79456 IN NS c.root-servers.net.
. 79456 IN NS k.root-servers.net.
. 79456 IN NS f.root-servers.net.
. 79456 IN NS a.root-servers.net.
. 79456 IN NS h.root-servers.net.
. 79456 IN RRSIG NS 8 0 518400 20190214050000 20190201040000 16749 . K8k6KqovcMQnSsYoh+9rLiBK2423be5fvZb06NdcRz1tGqsigMQEZg2k IzOv9iPmqqcS0eB5mVxdm1NXcoRYuGQcSwTA9yBWcgs1AZxiEMOIJLNT JTxyiClPo2KKFe32pfJN1ljzZhSP26KI+/htBbRsX0qZARs80cfXOo5v ZUMO875h4ldHI9+UbR9PpkFtfmSHINkiatMQRczFScV0e0Zelqwd9QUq mBzU3vnhw+gqtxyTowkQ4hGictt/KVdJDPqkrV0BFqWocmaoryORGDnv 2IRLyV1uNB0jJrXnzwP492L4d1OhSRslGNfJPotsfNY7cMb2Z6xfO9RL Hvylgg==
;; Received 540 bytes from 213.46.246.53#53(213.46.246.53) in 18 ms
com. 172800 IN NS k.gtld-servers.net.
com. 172800 IN NS h.gtld-servers.net.
com. 172800 IN NS g.gtld-servers.net.
com. 172800 IN NS a.gtld-servers.net.
com. 172800 IN NS d.gtld-servers.net.
com. 172800 IN NS m.gtld-servers.net.
com. 172800 IN NS i.gtld-servers.net.
com. 172800 IN NS l.gtld-servers.net.
com. 172800 IN NS e.gtld-servers.net.
com. 172800 IN NS j.gtld-servers.net.
com. 172800 IN NS f.gtld-servers.net.
com. 172800 IN NS c.gtld-servers.net.
com. 172800 IN NS b.gtld-servers.net.
com. 86400 IN DS 30909 8 2 E2D3C916F6DEEAC73294E8268FB5885044A833FC5459588F4A9184CF C41A5766
com. 86400 IN RRSIG DS 8 1 86400 20190214050000 20190201040000 16749 . OTMp18F8zF6IFwUeH4PC4owXLtMIxejaBs+r2PkgIxM5dDtGIj+JXF6R kXmxsQi7FlMhQq/OxU7B3HksQ8CCXVb4rYEo+6vz8ugElRkGKBZf0tkd 4C/JjleSX5kAHdgYnK5m/0bWq4wxMw+R0sSdsbnVmc+Jzv/S3T+Aj4la 0heACCqQYY+2rrGBJ7pxTWjR2JvK8p8NgVvx6k3fZlG0p5QbnajnGMMY vyB/GtYv3uvLnS4JLQvUMU7meIq6hm+wqpI1kp676ypu+KvoqQVFaO/E u4Rbv7ie5CsQOT4H/7jc8pw6IQqqD3FjdFX2yoW4u9pSwy8LrDgYHix7 AielIA==
;; Received 1208 bytes from 199.9.14.201#53(b.root-servers.net) in 188 ms
clients-domain.com. 172800 IN NS ns2.hosting-nameserver.com.
clients-domain.com. 172800 IN NS ns1.hosting-nameserver.com.
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN NSEC3 1 1 0 - CK0Q1GIN43N1ARRC9OSM6QPQR81H5M9A NS SOA RRSIG DNSKEY NSEC3PARAM
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN RRSIG NSEC3 8 2 86400 20190208054243 20190201043243 16883 com. LzTH+svBHgjuQk3FHAvNO72auLxibi3E6gzqFOkT7BwVqEYSLLd6b2fn r+y8A5fYEsR7VDRS6F+fODsOXlvfAR/Dr4oww/8FYMlAG7eYr5g4Bcsv TTnAqdXhPDoJgfZ6+BTodLgbY6tYZWNNnV2wS/iv0xfZ3BpAVnXEqgmD GrE=
FJ6U8VTBKL1C7L340CMMNNFNL3DP254G.com. 86400 IN NSEC3 1 1 0 - FJ70PP8IH3U3OPHIR4INIE39O39HKIVK NS DS RRSIG
FJ6U8VTBKL1C7L340CMMNNFNL3DP254G.com. 86400 IN RRSIG NSEC3 8 2 86400 20190207061242 20190131050242 16883 com. P5v6fKCuxOuzfmR2IXXZgns/m+NkvDJ2Ph4Az/Rbs+VkOV8jTHlPr/FZ k7EvoW06jHUbDLqa0UdY92IFcK/Z0kEO3t76mcQtd/0WXvVQkBHCyb0Q UfaxxPe00oeEh8Ic/6u5Zz/Co0i7rYXoVKQIprTqngs+x3g5luUogp/Y iLE=
;; Received 612 bytes from 192.12.94.30#53(e.gtld-servers.net) in 29 ms
assets.clients-domain.com. 300 IN CNAME distribution-id.cloudfront.net.
;; Received 92 bytes from 162.242.147.111#53(ns1.hosting-nameserver.com) in 268 ms
The majority of both outputs is the same. To understand them, let's first take a look at how the DNS system works.
Domain Name System (DNS) in short
There are two types of nameservers: non-authoritative (resolver or caching) servers and authoritative servers.
First, clients send a request to the default nameserver, which is either provided by your ISP or whatever you set up in your router's settings. If you have ever fiddled with it, you probably changed it to 8.8.8.8 or 8.8.4.4, which are Google's resolver cache servers. These return the cached address for the requested domain if it's still valid, or refer the request up the chain. This chain is what we traced with dig.
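The caching part of this chain can be sketched as a toy shell function: serve the remembered answer while its TTL is still valid, otherwise ask upstream again. Both the upstream function and its answer are made up for illustration:

```shell
TTL=300            # how long we may serve a remembered answer, in seconds
cached=""          # the remembered answer
cached_at=0        # when we fetched it (epoch seconds)

upstream() { echo "13.32.22.20"; }   # stand-in for the next server up the chain

resolve() {
  now=$(date +%s)
  if [ -n "$cached" ] && [ $(( now - cached_at )) -lt "$TTL" ]; then
    echo "$cached (cached)"
  else
    cached=$(upstream)
    cached_at=$now
    echo "$cached (fresh)"
  fi
}

resolve   # first lookup is forwarded upstream
resolve   # second one is served from the cache
```

This is also why a flaky record can look healthy for a while: once a resolver caches a good answer, it keeps serving it until the TTL expires.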
First, we are directed to the root servers. They are at the top of the DNS hierarchy and refer requests to the appropriate TLD servers. In our case, we have a .com domain, so we are directed to the nameservers that manage the .com TLD. Each nameserver either has an authoritative answer to the query or has an NS record referring to another nameserver down the chain. You can check the authoritative nameservers of any TLD, domain name or even subdomain by using dig -t ns.
$ dig -t ns com
; <<>> DiG 9.13.5 <<>> -t ns com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9967
;; flags: qr rd ra; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;com. IN NS
;; ANSWER SECTION:
com. 86400 IN NS d.gtld-servers.net.
com. 86400 IN NS g.gtld-servers.net.
com. 86400 IN NS l.gtld-servers.net.
com. 86400 IN NS m.gtld-servers.net.
com. 86400 IN NS f.gtld-servers.net.
com. 86400 IN NS k.gtld-servers.net.
com. 86400 IN NS b.gtld-servers.net.
com. 86400 IN NS h.gtld-servers.net.
com. 86400 IN NS a.gtld-servers.net.
com. 86400 IN NS i.gtld-servers.net.
com. 86400 IN NS e.gtld-servers.net.
com. 86400 IN NS j.gtld-servers.net.
com. 86400 IN NS c.gtld-servers.net.
;; Query time: 36 msec
;; SERVER: 213.46.246.53#53(213.46.246.53)
;; WHEN: Mon Feb 04 14:10:51 CET 2019
;; MSG SIZE rcvd: 256
As you can see, these are the same as the ones we got in the second section with +trace. These nameservers contain NS records pointing to the appropriate domains, be it google.com or our clients-domain.com in question.
In both cases, we are referred to
clients-domain.com. 172800 IN NS ns2.hosting-nameserver.com.
clients-domain.com. 172800 IN NS ns1.hosting-nameserver.com.
These are the authoritative nameservers of the client’s domain. During our investigation, we either got stuck here, or we were further referred to CloudFront.
The fact that we got two different nameservers was not surprising, as there are usually at least two of them for high availability and load balancing. But now we started to understand where the problem was coming from. You see, browsers have their own DNS cache to make requests to frequently used domains faster, but curl does not, and having one would, of course, defeat the purpose of dig. Thus we guessed that browsers cached the resolved address for the requested domain, and that's why we got reliable responses after the first time it worked, but a 50-50 error rate from the terminal.
So we thought that maybe the CNAME record was only present on one of the authoritative nameservers of the client's hosting service provider. To test that, we used nslookup, specifying the nameserver we wanted to use.
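That hypothesis boils down to a toy model: each uncached query lands on one of the two authoritative nameservers, so if only one of them has the record, roughly half of the fresh lookups fail. The two functions below are made-up stand-ins for the real servers:

```shell
ask_ns1() { echo "CNAME distribution-id.cloudfront.net."; }   # has the record
ask_ns2() { echo "NXDOMAIN" >&2; return 1; }                  # missing the record

fail=0
total=10
i=1
while [ "$i" -le "$total" ]; do
  # Alternate between the two servers, as a crude model of resolver choice.
  if [ $(( i % 2 )) -eq 0 ]; then
    ask_ns1 >/dev/null 2>&1 || fail=$(( fail + 1 ))
  else
    ask_ns2 >/dev/null 2>&1 || fail=$(( fail + 1 ))
  fi
  i=$(( i + 1 ))
done
echo "$fail of $total simulated lookups failed"
```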
Coup de grâce with ‘nslookup’
With nslookup you can query specific nameservers. To figure out what was happening, we resolved the domain name against each of the hosting service's nameservers, one by one.
$ nslookup assets.clients-domain.com ns1.hosting-nameserver.com
Server: ns1.hosting-nameserver.com
Address: 162.242.147.111#53
assets.clients-domain.com canonical name = distribution-id.cloudfront.net.
** server can't find distribution-id.cloudfront.net: REFUSED
Specifying ns1.hosting-nameserver.com, we got back the CNAME (canonical name) record pointing to CloudFront. Of course, it refused to resolve distribution-id.cloudfront.net, as it is not an authoritative nameserver for CloudFront, but at least we saw that this nameserver has the proper record.
$ nslookup assets.clients-domain.com ns2.hosting-nameserver.com
Server: ns2.hosting-nameserver.com
Address: 50.56.75.143#53
** server can't find assets.clients-domain.com: NXDOMAIN
Then, when we queried ns2.hosting-nameserver.com, we got NXDOMAIN, just as when the domain name resolution broke with dig, or when we weren't able to load the images from the browsers. Finally, these results were consistent across different locations, no matter how many times we ran nslookup.
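To make this last check repeatable, the per-nameserver queries can be wrapped in a small loop. ns_answer below is a stub reproducing the answers we got; against the live servers you would call nslookup with the nameserver as its second argument instead:

```shell
# Stub reproducing what each authoritative nameserver answered for
# assets.clients-domain.com; swap in a real nslookup/dig call as needed.
ns_answer() {
  case "$1" in
    ns1.hosting-nameserver.com) echo "CNAME distribution-id.cloudfront.net." ;;
    ns2.hosting-nameserver.com) echo "NXDOMAIN" ;;
  esac
}

for ns in ns1.hosting-nameserver.com ns2.hosting-nameserver.com; do
  echo "$ns -> $(ns_answer "$ns")"
done
```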
Thus, we were able to conclude that the problem stemmed from the fact that one of the nameservers was missing the CNAME record pointing to CloudFront. We let the client know so they could sort it out with their hosting provider, and a day later the issue was resolved.
Conclusion
In this blogpost, we described how we saw symptoms of assets loading inconsistently from AWS S3 with CloudFront, and how we used curl to get a simpler answer than the one provided by browsers. We went on to investigate the issue with dig +trace, only to find out that the DNS resolution got stuck at the domain's authoritative nameservers. Then we finally put an end to the investigation by using nslookup to query the two authoritative nameservers one by one.