In the following blogpost, we will walk you through how we chased down a DNS resolution issue for one of our clients. Even though the problem at hand was very specific, you might find the steps we took during the investigation useful. Also, the tools we used might also prove to be helpful in case you'd face something similar in the future. We will also discuss how the Domain Name System (works), so buckle up!


The Symptoms

Our client hosts their static assets and webshop at a hosting provider. They also provide a Wordpress like CMS and nameservers as well. The client wanted to reduce costs, so they asked us to move their static assets to AWS S3 and setup CloudFront to cache them. The client set up a CNAME record pointing to the CloudFront, thus it was available at assets.clients-domain.com as well as distribution-id.cloudfront.net

Setting up CloudFront was pretty straightforward. However, we were pretty surprised when we got a ticket claiming that images are not visible when the site is visited from a mobile browser. Another subcontractor, who handles the development of the static pages reported that they can only access them from certain locations, but from others, they are not available.


Forming the DNS resolution error hypothesis, or why 'curl' is better than browsers

The first method we tried to reproduce the error was accessing the resource from the browser at https://assets.clients-domain.com/img/test-image.png. It was pretty difficult as browsers had no problem loading the assets in our office. Then we used a VPN to test them from other locations.

The results were inconsistent: with disabling the browser cache the images we tested were loaded from one location without issues, from others it failed at first with 502 Bad Gateway. Then, at some point, it started working and we weren't able to break it again no matter how hard we tried. Then we tried using curl. 50% of the time it worked, but the other 50% it reported:

$ curl https://assets.clients-domain.com/img/test-image.png --output test.png
% Total	% Received % Xferd  Average Speed   Time	Time 	Time  Current
                             	Dload  Upload   Total   Spent	Left  Speed
  0 	0	0 	0	0 	0  	0  	0 --:--:-- --:--:-- --:--:-- 	0curl: (6) Could not resolve host: assets.clients-domain.com

Once we saw Could not resolve host: assets.clients-domain.com it was clear that we are facing a DNS issue. Or at least it was an educated guess worth verifying.


'dig'-ing deep for verification

To verify our hypothesis, we tried to reach CloudFront straight. It worked fine, so we knew we were on the right track..

First, we thought there might be a problem with the way we set up the CNAME record in CloudFront, so we started digging. We opened two panels in our terminals next to each other and ran watch curl https://assets.clients-domain.com/img/test-image.png --output test.png and watch dig assets.clients-domain.com.

dig reported the following when curl failed to reach the server:

$ watch dig assets.clients-domain.com
; <<>> DiG 9.13.5 <<>> assets.clients-domain.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 24152
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;assets.clients-ip.com.   	 IN    A

;; AUTHORITY SECTION:
clients-ip.com.   	 300    IN    SOA    virga.hosting-service.com. root.virga.hosting-service.com. 2018091202 10800 3600 604800 86400

;; Query time: 183 msec
;; SERVER: 213.46.246.53#53(213.46.246.53)
;; WHEN: Fri Feb 01 17:18:12 CET 2019
;; MSG SIZE  rcvd: 106

When we got a proper answer section, curl managed to download the asset.

$ watch dig assets.clients-domain.com

; <<>> DiG 9.13.5 <<>> assets.clients-domain.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 51530
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;assets.clients-domain.com.   	 IN    A

;; ANSWER SECTION:
assets.clients-domain.com.    297    IN    CNAME    distribution-id.cloudfront.net.
distribution-id.cloudfront.net. 57 IN    A    13.32.22.20
distribution-id.cloudfront.net. 57 IN    A    13.32.22.108
distribution-id.cloudfront.net. 57 IN    A    13.32.22.112
distribution-id.cloudfront.net. 57 IN    A    13.32.22.152

;; Query time: 22 msec
;; SERVER: 213.46.246.53#53(213.46.246.53)
;; WHEN: Fri Feb 01 17:17:51 CET 2019
;; MSG SIZE  rcvd: 156

Now we started to suspect the problem is not on our side after all. However, let's dissect the output of dig first.

Dig is a DNS lookup utility program that you can use to gain information on how a domain name is mapped to an IP address. You can pass it several options, such as +cmd which prints dig's version and the command you entered to the terminal. To omit it, you can use dig +nocmd assets.clients-domain.com.

domain-name-system-diagram

There are several other options such as +short which will give you a terser, parseable output, or +trace which will trace the nameservers that were used for the domain name resolution. After the issued command you can also see the ->>HEADER<<- printed. We either got NXDOMAIN stating that the domain we are looking for is non-existent, or NOERROR, when we get back the IP address for the query.

The QUESTION SECTION reminds us of the domain and subdomains we were looking for and of the fact that we were looking for an A record, thus essentially for an IP address.

When the DNS resolution fails, we are only given an AUTHORITY SECTION which tells us that dig was able to find the domain authority (SOA), but was not able to find anything that points to the subdomain.

However, when dig is able to resolve the subdomain, it tells us that it found a CNAME record on the authoritative nameserver pointing to CloudFront, and provides us with the IP addresses of CloudFront’s nameservers, as you can see highlighted below.

;; ANSWER SECTION:
assets.clients-domain.com.    297    IN    CNAME    distribution-id.cloudfront.net.
distribution-id.cloudfront.net. 57 IN    A    13.32.22.20
distribution-id.cloudfront.net. 57 IN    A    13.32.22.108
distribution-id.cloudfront.net. 57 IN    A    13.32.22.112
distribution-id.cloudfront.net. 57 IN    A    13.32.22.152

Now that we understand the output of dig, let's continue with the investigation.


'+trace'-ing nameservers

We wanted to see where the domain name resolution got stuck when we encountered failures, so we ran dig +trace assets.clients-domain.com. Again, we had two different kinds of output. One where the resolution failed:

$ dig +trace assets.clients-domain.com

; <<>> DiG 9.13.5 <<>> +trace assets.clients-domain.com
;; global options: +cmd
.   		 84782    IN    NS    h.root-servers.net.
.   		 84782    IN    NS    a.root-servers.net.
.   		 84782    IN    NS    e.root-servers.net.
.   		 84782    IN    NS    l.root-servers.net.
.   		 84782    IN    NS    f.root-servers.net.
.   		 84782    IN    NS    c.root-servers.net.
.   		 84782    IN    NS    g.root-servers.net.
.   		 84782    IN    NS    b.root-servers.net.
.   		 84782    IN    NS    k.root-servers.net.
.   		 84782    IN    NS    j.root-servers.net.
.   		 84782    IN    NS    d.root-servers.net.
.   		 84782    IN    NS    m.root-servers.net.
.   		 84782    IN    NS    i.root-servers.net.
.   		 84782    IN    RRSIG    NS 8 0 518400 20190214050000 20190201040000 16749 . K8k6KqovcMQnSsYoh+9rLiBK2423be5fvZb06NdcRz1tGqsigMQEZg2k IzOv9iPmqqcS0eB5mVxdm1NXcoRYuGQcSwTA9yBWcgs1AZxiEMOIJLNT JTxyiClPo2KKFe32pfJN1ljzZhSP26KI+/htBbRsX0qZARs80cfXOo5v ZUMO875h4ldHI9+UbR9PpkFtfmSHINkiatMQRczFScV0e0Zelqwd9QUq mBzU3vnhw+gqtxyTowkQ4hGictt/KVdJDPqkrV0BFqWocmaoryORGDnv 2IRLyV1uNB0jJrXnzwP492L4d1OhSRslGNfJPotsfNY7cMb2Z6xfO9RL Hvylgg==
;; Received 540 bytes from 213.46.246.53#53(213.46.246.53) in 14 ms

com.   		 172800    IN    NS    f.gtld-servers.net.
com.   		 172800    IN    NS    b.gtld-servers.net.
com.   		 172800    IN    NS    h.gtld-servers.net.
com.   		 172800    IN    NS    g.gtld-servers.net.
com.   		 172800    IN    NS    k.gtld-servers.net.
com.   		 172800    IN    NS    c.gtld-servers.net.
com.   		 172800    IN    NS    a.gtld-servers.net.
com.   		 172800    IN    NS    m.gtld-servers.net.
com.   		 172800    IN    NS    j.gtld-servers.net.
com.   		 172800    IN    NS    e.gtld-servers.net.
com.   		 172800    IN    NS    d.gtld-servers.net.
com.   		 172800    IN    NS    l.gtld-servers.net.
com.   		 172800    IN    NS    i.gtld-servers.net.
com.   		 86400    IN    DS    30909 8 2 E2D3C916F6DEEAC73294E8268FB5885044A833FC5459588F4A9184CF C41A5766
com.   		 86400    IN    RRSIG    DS 8 1 86400 20190214050000 20190201040000 16749 . OTMp18F8zF6IFwUeH4PC4owXLtMIxejaBs+r2PkgIxM5dDtGIj+JXF6R kXmxsQi7FlMhQq/OxU7B3HksQ8CCXVb4rYEo+6vz8ugElRkGKBZf0tkd 4C/JjleSX5kAHdgYnK5m/0bWq4wxMw+R0sSdsbnVmc+Jzv/S3T+Aj4la 0heACCqQYY+2rrGBJ7pxTWjR2JvK8p8NgVvx6k3fZlG0p5QbnajnGMMY vyB/GtYv3uvLnS4JLQvUMU7meIq6hm+wqpI1kp676ypu+KvoqQVFaO/E u4Rbv7ie5CsQOT4H/7jc8pw6IQqqD3FjdFX2yoW4u9pSwy8LrDgYHix7 AielIA==
;; Received 1208 bytes from 192.112.36.4#53(g.root-servers.net) in 55 ms

clients-domain.com.   	 172800    IN    NS    ns2.hosting-nameserver.com.
clients-domain.com.   	 172800    IN    NS    ns1.hosting-nameserver.com.
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN NSEC3 1 1 0 - CK0Q1GIN43N1ARRC9OSM6QPQR81H5M9A NS SOA RRSIG DNSKEY NSEC3PARAM
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN RRSIG NSEC3 8 2 86400 20190208054243 20190201043243 16883 com. LzTH+svBHgjuQk3FHAvNO72auLxibi3E6gzqFOkT7BwVqEYSLLd6b2fn r+y8A5fYEsR7VDRS6F+fODsOXlvfAR/Dr4oww/8FYMlAG7eYr5g4Bcsv TTnAqdXhPDoJgfZ6+BTodLgbY6tYZWNNnV2wS/iv0xfZ3BpAVnXEqgmD GrE=
FJ6U8VTBKL1C7L340CMMNNFNL3DP254G.com. 86400 IN NSEC3 1 1 0 - FJ70PP8IH3U3OPHIR4INIE39O39HKIVK NS DS RRSIG
FJ6U8VTBKL1C7L340CMMNNFNL3DP254G.com. 86400 IN RRSIG NSEC3 8 2 86400 20190207061242 20190131050242 16883 com. P5v6fKCuxOuzfmR2IXXZgns/m+NkvDJ2Ph4Az/Rbs+VkOV8jTHlPr/FZ k7EvoW06jHUbDLqa0UdY92IFcK/Z0kEO3t76mcQtd/0WXvVQkBHCyb0Q UfaxxPe00oeEh8Ic/6u5Zz/Co0i7rYXoVKQIprTqngs+x3g5luUogp/Y iLE=
;; Received 612 bytes from 192.48.79.30#53(j.gtld-servers.net) in 278 ms

clients-domain.com.   	 300    IN    SOA    virga.hosting-nameserver.com. root.virga.anticsdms.com. 2018091202 10800 3600 604800 86400
;; Received 106 bytes from 50.56.75.143#53(ns2.hosting-nameserver.com) in 217 ms

And another when the domain name got resolved properly:

⇣96% ➜ dig +trace assets.clients-domain.com

; <<>> DiG 9.13.5 <<>> +trace assets.clients-domain.com
;; global options: +cmd
.   		 79456    IN    NS    e.root-servers.net.
.   		 79456    IN    NS    b.root-servers.net.
.   		 79456    IN    NS    d.root-servers.net.
.   		 79456    IN    NS    j.root-servers.net.
.   		 79456    IN    NS    m.root-servers.net.
.   		 79456    IN    NS    i.root-servers.net.
.   		 79456    IN    NS    l.root-servers.net.
.   		 79456    IN    NS    g.root-servers.net.
.   		 79456    IN    NS    c.root-servers.net.
.   		 79456    IN    NS    k.root-servers.net.
.   		 79456    IN    NS    f.root-servers.net.
.   		 79456    IN    NS    a.root-servers.net.
.   		 79456    IN    NS    h.root-servers.net.
.   		 79456    IN    RRSIG    NS 8 0 518400 20190214050000 20190201040000 16749 . K8k6KqovcMQnSsYoh+9rLiBK2423be5fvZb06NdcRz1tGqsigMQEZg2k IzOv9iPmqqcS0eB5mVxdm1NXcoRYuGQcSwTA9yBWcgs1AZxiEMOIJLNT JTxyiClPo2KKFe32pfJN1ljzZhSP26KI+/htBbRsX0qZARs80cfXOo5v ZUMO875h4ldHI9+UbR9PpkFtfmSHINkiatMQRczFScV0e0Zelqwd9QUq mBzU3vnhw+gqtxyTowkQ4hGictt/KVdJDPqkrV0BFqWocmaoryORGDnv 2IRLyV1uNB0jJrXnzwP492L4d1OhSRslGNfJPotsfNY7cMb2Z6xfO9RL Hvylgg==
;; Received 540 bytes from 213.46.246.53#53(213.46.246.53) in 18 ms

com.   		 172800    IN    NS    k.gtld-servers.net.
com.   		 172800    IN    NS    h.gtld-servers.net.
com.   		 172800    IN    NS    g.gtld-servers.net.
com.   		 172800    IN    NS    a.gtld-servers.net.
com.   		 172800    IN    NS    d.gtld-servers.net.
com.   		 172800    IN    NS    m.gtld-servers.net.
com.   		 172800    IN    NS    i.gtld-servers.net.
com.   		 172800    IN    NS    l.gtld-servers.net.
com.   		 172800    IN    NS    e.gtld-servers.net.
com.   		 172800    IN    NS    j.gtld-servers.net.
com.   		 172800    IN    NS    f.gtld-servers.net.
com.   		 172800    IN    NS    c.gtld-servers.net.
com.   		 172800    IN    NS    b.gtld-servers.net.
com.   		 86400    IN    DS    30909 8 2 E2D3C916F6DEEAC73294E8268FB5885044A833FC5459588F4A9184CF C41A5766
com.   		 86400    IN    RRSIG    DS 8 1 86400 20190214050000 20190201040000 16749 . OTMp18F8zF6IFwUeH4PC4owXLtMIxejaBs+r2PkgIxM5dDtGIj+JXF6R kXmxsQi7FlMhQq/OxU7B3HksQ8CCXVb4rYEo+6vz8ugElRkGKBZf0tkd 4C/JjleSX5kAHdgYnK5m/0bWq4wxMw+R0sSdsbnVmc+Jzv/S3T+Aj4la 0heACCqQYY+2rrGBJ7pxTWjR2JvK8p8NgVvx6k3fZlG0p5QbnajnGMMY vyB/GtYv3uvLnS4JLQvUMU7meIq6hm+wqpI1kp676ypu+KvoqQVFaO/E u4Rbv7ie5CsQOT4H/7jc8pw6IQqqD3FjdFX2yoW4u9pSwy8LrDgYHix7 AielIA==
;; Received 1208 bytes from 199.9.14.201#53(b.root-servers.net) in 188 ms

clients-domain.com.   	 172800    IN    NS    ns2.hosting-nameserver.com.
clients-domain.com.   	 172800    IN    NS    ns1.hosting-nameserver.com.
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN NSEC3 1 1 0 - CK0Q1GIN43N1ARRC9OSM6QPQR81H5M9A NS SOA RRSIG DNSKEY NSEC3PARAM
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN RRSIG NSEC3 8 2 86400 20190208054243 20190201043243 16883 com. LzTH+svBHgjuQk3FHAvNO72auLxibi3E6gzqFOkT7BwVqEYSLLd6b2fn r+y8A5fYEsR7VDRS6F+fODsOXlvfAR/Dr4oww/8FYMlAG7eYr5g4Bcsv TTnAqdXhPDoJgfZ6+BTodLgbY6tYZWNNnV2wS/iv0xfZ3BpAVnXEqgmD GrE=
FJ6U8VTBKL1C7L340CMMNNFNL3DP254G.com. 86400 IN NSEC3 1 1 0 - FJ70PP8IH3U3OPHIR4INIE39O39HKIVK NS DS RRSIG
FJ6U8VTBKL1C7L340CMMNNFNL3DP254G.com. 86400 IN RRSIG NSEC3 8 2 86400 20190207061242 20190131050242 16883 com. P5v6fKCuxOuzfmR2IXXZgns/m+NkvDJ2Ph4Az/Rbs+VkOV8jTHlPr/FZ k7EvoW06jHUbDLqa0UdY92IFcK/Z0kEO3t76mcQtd/0WXvVQkBHCyb0Q UfaxxPe00oeEh8Ic/6u5Zz/Co0i7rYXoVKQIprTqngs+x3g5luUogp/Y iLE=
;; Received 612 bytes from 192.12.94.30#53(e.gtld-servers.net) in 29 ms

assets.clients-domain.com.    300    IN    CNAME    distribution-id.cloudfront.net.
;; Received 92 bytes from 162.242.147.111#53(ns1.hosting-nameserver.com) in 268 ms

The majority of both answers are the same. To understand them first let's take a look at how the DNS system works.


Domain Name System (DNS) in short

There are two types of nameservers: non-authoritative or resolver cache servers and authoritative servers.

First, clients will send a request to the default nameserver, which is either provided by your ISP, or anything you set up in your router's settings. If you ever fiddled with it, you probably changed it to 8.8.8.8 or 8.8.4.4 which are Google's resolver cache servers. These will return the cached address for the requested domain if it's still valid or refer the request up the chain. This chain is what we traced with dig.

First, we are directed to the root servers. They are at the top of the DNS hierarchy and are the ones that will refer requests to the appropriate TLD servers. In our case, we have a .com domain, so we are redirected to the nameservers that manage .com TLDs. It could have an authoritative answer to the query or will have an NS record, referring to another nameserver down the chain. You can check the authoritative nameservers of any TLD, domain name or even subdomain by using dig -t ns.

$ dig -t ns com

; <<>> DiG 9.13.5 <<>> -t ns com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9967
;; flags: qr rd ra; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;com.   			 IN    NS

;; ANSWER SECTION:
com.   		 86400    IN    NS    d.gtld-servers.net.
com.   		 86400    IN    NS    g.gtld-servers.net.
com.   		 86400    IN    NS    l.gtld-servers.net.
com.   		 86400    IN    NS    m.gtld-servers.net.
com.   		 86400    IN    NS    f.gtld-servers.net.
com.   		 86400    IN    NS    k.gtld-servers.net.
com.   		 86400    IN    NS    b.gtld-servers.net.
com.   		 86400    IN    NS    h.gtld-servers.net.
com.   		 86400    IN    NS    a.gtld-servers.net.
com.   		 86400    IN    NS    i.gtld-servers.net.
com.   		 86400    IN    NS    e.gtld-servers.net.
com.   		 86400    IN    NS    j.gtld-servers.net.
com.   		 86400    IN    NS    c.gtld-servers.net.

;; Query time: 36 msec
;; SERVER: 213.46.246.53#53(213.46.246.53)
;; WHEN: Mon Feb 04 14:10:51 CET 2019
;; MSG SIZE  rcvd: 256

As you can see, these are the same as the ones we got in the second section with +trace. These namservers contain NS records pointing to the appropriate domains may it be google.com or our clients-domain.com in question.

In both cases, we are referred to

 clients-domain.com.   172800  IN  NS  ns2.hosting-nameserver.com.
 clients-domain.com.   172800  IN  NS  ns1.hosting-nameserver.com.

These are the authoritative nameservers of the client's domain. During our investigation, we either got stuck here, or we were further referred to CloudFront.

The fact that we got two different nameservers was not surprising as usually there are at least two of them for high-availability and load balancing. But now we started to understand where the problem is coming from. You see, browser have their own DNS cache, to make requests to frequently used domains faster, but curl does not, and having one would, of course, ruin the purpose in case of dig. Thus we guessed that browsers cached the server for the requested domain and that's why we got reliable responses after the first time it worked, but we got 50-50 error rate with from the terminal.

So we thought that maybe the CNAME record is only present on one of the authoritative nameservers of the client's hosting service provider. To test that we used nslookup specifying the nameserver we wanted to use.


Coup de grace with 'nslookup'

With nslookup you can query nameservers. To figure out what's happening we specified the domain name to be resolved with the nameservers of the hosting service, one by one.

$ nslookup assets.clients-domain.com ns1.hosting-nameserver.com
Server:   	 ns1.hosting-nameserver.com
Address:    162.242.147.111#53

assets.clients-domain.com    canonical name = distribution-id.cloudfront.net.
** server can't find distribution-id.cloudfront.net: REFUSED

Specifying ns1.hosting-nameserver.com we got back the CNAME (canonical name) record pointing to CloudFront. Of course, it refused the resolution of distribution-id.cluodfornt.net as it is not an authoritative nameserver of CloudFront, but at least we saw that this nameserver has the proper record.

$ nslookup assets.clients-domain.com ns2.hosting-nameserver.com
Server:   	 ns2.hosting-nameserver.com
Address:    50.56.75.143#53

** server can't find assets.clients-domain.com: NXDOMAIN

Then when we queried ns2.hosting-nameserver.com we got NXDOMAIN just as when the domain name resolution broke with dig or when we weren't able to load the images from the browsers. Finally, these results were consistent across different locations no matter how many times we ran nslookup.

Thus, we were able to conclude that the problem stemmed from the fact that one of the nameservers was missing the CNAME record pointing to CloudFront. We let the client know to handle this with their hosting service, and a day later the issue got resolved.


Conclusion

In this blogpost we described how we saw symptoms of assets loading inconsistently from AWS S3 with CloudFront, how we used curl to get a more simple answer than what was provided by the browsers. We went on to investigate the issue with dig +trace only to find out that the DNS resolution was stuck at the authoritative nameserver of the domain. Then we finally put an end to the investigation by using nslookup querying the two authoritative nameservers one by one.