Location and the Internet – a technical perspective (part 1)

The French national data protection regulator (the CNIL) has rejected1 Google's informal appeal against its earlier notice requiring Google to apply "delisting" on all of its search engine domain names. The CNIL rejected the argument that its notice should only apply to European domain names such as google.fr, stating:2

Geographical extensions are only paths giving access to the processing operation

This article tests this statement.3

Structure of this article

This article is set out in two parts:

  • this part covers routing and IP addresses – i.e. how the paths are calculated to allow information to traverse the Internet, and how to map a geographical location onto that path; and

  • the next part will cover domain names and the DNS system – i.e. how our computers expose this complex routing system to us using friendly 'domain names', and what – if any – relationship these domain names have to geographic location.4

Part 1:

The Internet 101

Conceptually, the Internet is a 'network of networks', each a collection of linked communications devices (routers, switches, servers, computers etc.) under the control of a single legal entity / person.

Quite obviously, to be part of the Internet, these networks have to interconnect. This takes place using a number of common protocols, most of which run atop the TCP/IP protocol.5 TCP/IP is the bedrock of the Internet; it defines the common language by which these devices talk to each other, and by doing so, it and its related protocols6 look after the correct transfer of network traffic around the Internet.

Sitting at my desk, if I request a website for (e.g.) a hotel in Barcelona, I am asking for a path (more commonly known as a route) from my home network, via my ISP, via the wholesale networks it connects to, via the ISP of the hotel's website operator, to that website's server.

The actual route is determined by technical criteria, but these implicitly take into account the commercial relationships between the relevant networks.7 The route is calculated in real time,8 according to lookup tables that internet routers (the devices that sit between networks and connect them) hold in their memory.9
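To make the idea of a lookup table concrete, here is a minimal sketch in Python of how a router might pick the next network for a packet. It is a simplification under stated assumptions: the table contents and the next-hop labels are hypothetical, the address blocks are taken from ranges reserved for documentation, and real routers use far more specialised data structures. The rule it illustrates, that the most specific matching entry wins, is commonly called longest-prefix matching.

```python
import ipaddress

# Hypothetical routing table: destination block -> label for the next network.
# All blocks are documentation-only ranges, not real allocations.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "transit-provider",  # default route
    ipaddress.ip_network("198.51.100.0/24"): "peer-at-exchange",
    ipaddress.ip_network("198.51.100.128/25"): "customer-link",
}


def next_hop(destination: str) -> str:
    """Return the next hop for the most specific block containing destination."""
    address = ipaddress.ip_address(destination)
    # Every table entry whose block contains the address is a candidate.
    matches = [network for network in ROUTES if address in network]
    # The longest prefix (the smallest block) is the most specific match.
    best = max(matches, key=lambda network: network.prefixlen)
    return ROUTES[best]


print(next_hop("198.51.100.200"))  # customer-link
print(next_hop("198.51.100.10"))   # peer-at-exchange
print(next_hop("203.0.113.5"))     # transit-provider
```

Each router along a path repeats a lookup of this kind, which is why a route is the product of many independent decisions rather than a single plan.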

Here's an example of such a path, traced using the 'traceroute' software.10

Tracing route to www.iberostar.com [] over a maximum of 30 hops:

1    11 ms    <1 ms    <1 ms  
2    <1 ms    <1 ms    <1 ms  
3     7 ms     6 ms     6 ms  firewall.myisp.com []  
4     1 ms     1 ms     1 ms  router-lhr-1.myisp.com []  
5     1 ms     1 ms     1 ms  mpr1.lhr8.uk.ge-7-1-4.above.net []  
6     1 ms     1 ms     1 ms  ae8.mpr2.lhr2.uk.zip.zayo.com []  
7     1 ms     1 ms     1 ms  ae5.mpr1.lhr15.uk.zip.zayo.com []  
8     1 ms     1 ms     1 ms  zayo-ntt.mpr1.lhr15.uk.zip.zayo.com []  
9     1 ms     1 ms     1 ms  ae-4.r23.londen03.uk.bb.gin.ntt.net []  
10    15 ms    14 ms     8 ms  ae-3.r22.amstnl02.nl.bb.gin.ntt.net []  
11    49 ms    49 ms    48 ms  ae-2.r03.amstnl02.nl.bb.gin.ntt.net []  
12    42 ms    42 ms    42 ms  ae-2.r01.barcsp01.es.bb.gin.ntt.net []  
13    35 ms    35 ms    35 ms  xe-0-0-0-4.r01.barcsp01.es.ce.gin.ntt.net []  
14    44 ms    44 ms    44 ms  po-41-uplink-n5596-02.bcn.es.nexica.net []  
15    45 ms    45 ms    43 ms  nex-host108.eidata.net []  

Location of IP addresses

The key data points here are the IP addresses – the groups of four numbers separated by periods which act as device identifiers in a TCP/IP network. Each IP address belongs to a device through which the relevant packets pass on their travels to – in this case – www.iberostar.com.

The long text is a fully qualified domain name given to each IP address by the administrator of the network to which that IP address belongs. It is used for identification and administration purposes; by convention it often contains the domain name of the ISP as well as the geographical location of the relevant device.

In the above example it is easy enough to see a route that goes from (what we can assume to be) a data centre near London Heathrow (lhr),11 through Amsterdam and ultimately to Barcelona.
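The reading of the hostnames above can be sketched in a few lines of Python. This is only an illustration: the hint table is hypothetical, built from the fragments visible in this one traceroute, and naming schemes differ from operator to operator, so the result is a guess and nothing more.

```python
import re
from typing import Optional

# Hypothetical hint table: fragments seen in the hostnames above, mapped to
# the places this article reads them as.
LOCATION_HINTS = {
    "lhr": "London (Heathrow)",
    "londen": "London",
    "amstnl": "Amsterdam",
    "barcsp": "Barcelona",
    "bcn": "Barcelona",
}


def guess_location(hostname: str) -> Optional[str]:
    """Return the first location hint found in a hostname, or None."""
    # Split on dots and hyphens so 'router-lhr-1' yields the token 'lhr'.
    tokens = re.split(r"[^a-z0-9]+", hostname.lower())
    for token in tokens:
        for hint, place in LOCATION_HINTS.items():
            if token.startswith(hint):
                return place
    return None


for name in (
    "mpr1.lhr8.uk.ge-7-1-4.above.net",
    "ae-3.r22.amstnl02.nl.bb.gin.ntt.net",
    "po-41-uplink-n5596-02.bcn.es.nexica.net",
    "nex-host108.eidata.net",
):
    print(name, "->", guess_location(name))
```

The last hostname in the loop produces no guess at all, which is the situation discussed next.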

The very last entry (known as a "hop") doesn't explicitly state 'bcn' for Barcelona. Nevertheless it is reasonable to guess that the entry is a webserver in Barcelona:

  1. It has the same IP address as the website www.iberostar.com, so it is the server that runs that website; and

  2. It has materially the same ping time – the number in milliseconds that shows the time taken for data to reach that IP address from my computer – as the previous hop, number 14, whose fully qualified domain name identifies itself as in 'bcn' – Barcelona.12

Take a look at the following traceroute (shortened, as hops 1 through 9 are similar to the above example):

Tracing route to www.example.chrisjames.uk [] over a maximum of 30 hops  
9     2 ms     2 ms     2 ms  ae-2.r02.londen01.uk.bb.gin.ntt.net []  
10     2 ms     2 ms     2 ms  hds.r02.londen01.uk.bb.gin.ntt.net []  
12     2 ms     2 ms     2 ms  webserver.themoon01.chrisjames.uk []  

It is improbable, of course, that I am running http://www.example.chrisjames.uk/ on a webserver on the Moon, or that – with current communications technology – the Moon is less than a millisecond away from London.13 This illustrates that the names attached to each hop are chosen freely by network administrators and are not a reliable indication of location.
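The arithmetic behind that scepticism is simple enough to sketch in Python. The figures are assumptions for illustration: the speed of light in a vacuum (signals in fibre are slower still) and an approximate average Earth to Moon distance.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # in a vacuum; fibre is slower
EARTH_MOON_DISTANCE_KM = 384_400       # approximate average distance


def max_distance_km(round_trip_ms: float) -> float:
    """Upper bound on how far away a device can be, given its ping time."""
    one_way_seconds = (round_trip_ms / 1000) / 2
    return SPEED_OF_LIGHT_KM_PER_S * one_way_seconds


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on the ping time to a device distance_km away."""
    return 2 * distance_km / SPEED_OF_LIGHT_KM_PER_S * 1000


print(round(max_distance_km(2)))                         # 300
print(round(min_round_trip_ms(EARTH_MOON_DISTANCE_KM)))  # 2564
```

On these assumptions, a device answering in 2 ms can be no more than about 300 km away, while a round trip to the Moon could not take less than roughly two and a half seconds.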

RIPE for a WHOIS query

There's another source of location information for IP addresses: the Regional Internet Registry ("RIR") databases. RIRs are the regional organisations set up to allocate IP addresses. There are five of them:

  • African Network Information Center (AFRINIC) for Africa;
  • American Registry for Internet Numbers (ARIN) for the United States, Canada, several parts of the Caribbean region, and Antarctica;
  • Asia-Pacific Network Information Centre (APNIC) for Asia, Australia, New Zealand, and neighbouring countries;
  • Latin America and Caribbean Network Information Centre (LACNIC) for Latin America and parts of the Caribbean region; and
  • Réseaux IP Européens Network Coordination Centre (RIPE NCC) for Europe, Russia, the Middle East, and Central Asia.14

I suspect that the IP address of the web server in the traceroute above is in Europe, but I can query the "whois" service made available by RIPE NCC to check.15 Here are the results:

Abuse contact info: [email protected]

inetnum: -  
netname:         DIGITALOCEAN-LON-1  
descr:           DigitalOcean London  
country:         GB  
admin-c:         BU332-RIPE  
tech-c:          BU332-RIPE  
status:          ASSIGNED PA  
mnt-by:          digitalocean  
mnt-lower:       digitalocean  
mnt-routes:      digitalocean  
changed:         [email protected] 20140407  
created:         2014-04-07T06:16:03Z  
last-modified:   2014-04-07T06:16:03Z  
source:          RIPE  

It looks like my suspicions are right! The results suggest that the IP block is used for a network in London.
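For readers who want to see what such a query involves, here is a minimal Python sketch. It assumes the RIPE NCC whois server at whois.ripe.net on the standard whois port 43. The function names are my own, and the address in the commented-out call is a documentation-only placeholder rather than the address queried above. The parser is run against an excerpt of the reply shown above, so the example works without network access.

```python
import socket
from typing import Dict, List


def whois_query(query: str, server: str = "whois.ripe.net", port: int = 43) -> str:
    """Send one query over the whois protocol and return the raw text reply."""
    with socket.create_connection((server, port), timeout=10) as connection:
        # A whois request is a single line of text ended by CR LF.
        connection.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = connection.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text: str) -> Dict[str, List[str]]:
    """Collect 'key: value' lines into a dictionary of lists."""
    record: Dict[str, List[str]] = {}
    for line in text.splitlines():
        # Skip blank lines, comment lines and anything without a key.
        if not line.strip() or line.startswith("%") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        record.setdefault(key.strip(), []).append(value.strip())
    return record


# reply = whois_query("192.0.2.1")  # placeholder address; needs network access

SAMPLE = """\
netname:         DIGITALOCEAN-LON-1
descr:           DigitalOcean London
country:         GB
created:         2014-04-07T06:16:03Z
"""

record = parse_whois(SAMPLE)
print(record["netname"], record["country"])  # ['DIGITALOCEAN-LON-1'] ['GB']
```

Values are kept as lists because a whois record can repeat an attribute.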

Ownership of IP Addresses

The second thing to notice is that it contains the owner's 'abuse contact'.16 This is the email address the 'owner' of the IP address publishes for spam reports and terms-of-service violations.17

But who do we mean by 'owner' of the IP address? IP addresses belong, as such, to the RIRs, but are allocated to networks that apply for them. Network operators must become members of the RIR and must pay fees in respect of their IP addresses. Network operators can keep their IP addresses for as long as they comply with the RIR's policies.18

RIRs allocate IP addresses in blocks (groups of IP addresses). Wholesale and retail ISPs will apply for one or more blocks,19 and upon allocation, can then use the IP addresses and – if necessary – subdivide them amongst their clients.
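Subdividing a block is mechanical, as the following Python sketch suggests. Everything in it is illustrative: the block is a documentation-only range standing in for a real allocation, and the client names are hypothetical.

```python
import ipaddress

# A documentation-only block standing in for one allocated to an ISP.
isp_block = ipaddress.ip_network("203.0.113.0/24")

# Split the block into four equal sub-blocks, one per hypothetical client.
clients = ["client-a", "client-b", "client-c", "client-d"]
assignments = dict(zip(clients, isp_block.subnets(new_prefix=26)))

for client, block in assignments.items():
    print(client, block, block.num_addresses)


def holder_of(address: str) -> str:
    """Return the client whose sub-block contains the address."""
    ip = ipaddress.ip_address(address)
    for client, block in assignments.items():
        if ip in block:
            return client
    return "unassigned"


print(holder_of("203.0.113.70"))  # client-b
```

The block as a whole remains registered to the ISP, which is why a record for a single address often names the ISP rather than the end user.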

Non-ISPs can also apply for blocks if they want to; this allows them to buy their internet capacity on the wholesale market. This can be good for the bottom line, but its main benefit is that of redundancy and business continuity: such organisations can buy multiple routes to their blocks by buying interconnection with multiple different wholesale ISPs. This means that the internet will 'route around' any downtime suffered by an individual ISP.20

In this case, the block to which the IP address in question belongs is identified as belonging to DigitalOcean (a well-known cloud computing IaaS provider).

Why is this relevant? Again, the information exposed in the WHOIS record is provided by the owner of that block of IP addresses, and is only as reliable as the information they provide.

In this case it is clear that DigitalOcean has been a good Internet citizen; we had a hunch from the traceroute that the IP address was in London, and this accords with its entry in the RIR's records.

In other cases, with other networks, it is not uncommon to see a mismatch between the described location of an IP address and what we know or suspect its actual location to be.21 IP addresses of this kind22 are almost all allocated; there is a healthy secondary market in blocks, and sometimes the RIR entries are not updated in a timely manner, or not kept accurate at all.

The upshot of this: it is reasonably easy to deduce roughly where an IP address (or at least the device to which it is allocated) is located geographically, but it is very difficult to do so definitively in every case.

Final notes

It has become important to know the location of an IP address for more than just data protection reasons. Websites may want to know the location of an IP address of a visitor, for example, to display the content in the appropriate language or (for e-commerce sites) use the appropriate currency.

There is therefore a commercial need to automate the above 'lookups' and provide this information via an API.23 That need is well met on the Internet. Using publicly available data from the RIRs, some providers even go so far as to give the approximate latitude and longitude of the IP address in question, where available.24
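A lookup of this kind can be sketched in Python as a search over a sorted table of address ranges. This is a minimal sketch and not a description of any particular provider's service: the rows, the country codes attached to them, the language and currency settings and the fallback are all hypothetical, and the ranges are documentation-only addresses.

```python
import bisect
import ipaddress

# Hypothetical rows: (first address, last address, country code).
ROWS = [
    ("192.0.2.0", "192.0.2.255", "GB"),
    ("198.51.100.0", "198.51.100.255", "FR"),
    ("203.0.113.0", "203.0.113.255", "ES"),
]

# Hypothetical presentation settings per country: (language, currency).
LOCALE = {"GB": ("en", "GBP"), "FR": ("fr", "EUR"), "ES": ("es", "EUR")}

# Convert addresses to integers and sort, so ranges can be searched by value.
RANGES = sorted(
    (int(ipaddress.ip_address(first)), int(ipaddress.ip_address(last)), country)
    for first, last, country in ROWS
)
STARTS = [first for first, _, _ in RANGES]


def country_for(address: str):
    """Return the country code for an address, or None if it is not listed."""
    value = int(ipaddress.ip_address(address))
    # Find the last range that starts at or before the address.
    index = bisect.bisect_right(STARTS, value) - 1
    if index >= 0 and value <= RANGES[index][1]:
        return RANGES[index][2]
    return None


def locale_for(address: str, default=("en", "USD")):
    """Pick language and currency for a visitor, with a hypothetical fallback."""
    return LOCALE.get(country_for(address), default)


print(country_for("198.51.100.7"))  # FR
print(locale_for("203.0.113.9"))    # ('es', 'EUR')
print(locale_for("10.0.0.1"))       # ('en', 'USD')
```

The answer can only be as good as the table behind it, which brings back the reliability problems described above.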

As I explore in Part 2, a single website located in one country can be reached by a number of domain names and will often tailor its content both to the domain name used and to the IP address of the visitor.

  1. http://www.cnil.fr/english/news-and-events/news/article/right-to-delisting-google-informal-appeal-rejected/

  2. For more detail, I recommend Laurence Eastham's Editor's Blog piece at http://www.scl.org/site.aspx?i=bp44076

  3. I.e. the way in which data is routed over the internet.

  4. I ask you to give me this over-abstraction for free.

  5. TCP/IP is actually a suite of protocols which run over the 'Internet protocol' network layer. This uses packet switching, where data are allocated into packets or 'datagrams', see https://tools.ietf.org/html/rfc1594

  6. See further, the 'Internet protocol suite' box out at https://en.wikipedia.org/wiki/Internet_protocol_suite

  7. The routes are determined by which other networks the network has a connection to. These connections are typically classified as either peering, where networks choose to exchange data with each other on a paid-for or free basis – bilaterally or via an 'internet exchange' – or transit, where networks buy connections on a wholesale basis. Typically, larger networks, such as large ISPs, content networks such as the BBC, and wholesalers, can peer. Smaller networks need to buy transit from the wholesalers.

  8. These routes can differ for each packet that is transmitted; the packets are numbered and reassembled into their correct order at the destination, regardless of what route they took over the Internet.

  9. At its most simple, each network only needs to know a route to the next network in the chain; a protocol called BGP – Border Gateway Protocol – runs on routers that connect major networks, to help them work out, between them, the best 'next network' to pass the data to, in order to achieve an efficient route.

  10. You can run a traceroute yourself, see https://support.cloudflare.com/hc/en-us/articles/200169336-How-do-I-run-a-traceroute-

  11. People who work in the internet industry – especially network engineers – are often able to deduce the actual physical location of the hardware, based on this information, on public and private information about which companies operate out of which data centres, and on the naming conventions. For example, 'sov' in the context of a London data centre refers to a specific data centre operated by Telecity Group – Sovereign House – one of the interconnection points ("points of presence" or "PoPs") of the London Internet Exchange ("LINX"). The LINX website has a list of members showing at which PoPs they peer: https://www.linx.net/pubtools/member-techlist.html

  12. With current technology, there are theoretical minimum latencies depending on the type of material used. Even without processing overhead, for example, minimum latency in fibre optics is dictated by the speed of light. See http://www.lightwaveonline.com/articles/print/volume-29/issue-6/feature/network-latency-how-low-can-you-go.html

  13. That would be taking 'Moonshots' too literally. https://www.google.co.uk/about/careers/lifeatgoogle/thinking-big-larry-page.html. Internet traffic does sometimes go via space, of course, if it is sent via satellite; however, the latency (ping time) is significant. Try a traceroute to the website of the Ascension Islands' telco Sure Telecom (tracert www.sure.co.ac) and you'll see latency of roughly 500ms or more as your data is beamed via satellite to the island of Saint Helena.

  14. See https://en.wikipedia.org/wiki/Regional_Internet_registry, which includes links to each RIR.

  15. This is the same information protocol that lawyers who conduct due diligence on domain names might use. 'Whois' is often misconceived as a website; it is not. Any website advertising itself as 'whois' is simply a gateway from the web to the whois protocol.

  16. It also includes the owner's RIPE tag, "digitalocean", which is another way to determine the registrant of the blocks. Further, networks are allocated into Autonomous Systems ("AS") and each network operator will have one or more AS numbers, all information available from the RIRs' WHOIS databases.

  17. This can be useful for IT lawyers as the network engineers who man these email addresses are typically quite responsive to cybersecurity or IP infringement reports.

  18. See https://www.ripe.net/manage-ips-and-asns/resource-management/faq/faq-ipv4-address-space

  19. Grouped together into their "Autonomous System" – as noted above.

  20. "Multi-homing" refers to the practice of buying more than one interconnection with a transit provider. Most ISPs will typically carry two or three transit providers, and also peer either bilaterally or on Internet exchanges such as LINX. The downside of this practice is that it is technically and administratively more complex and therefore expensive. Almost all credible ISPs are multihomed but many companies – both end users and for example cloud providers – prefer to be single-homed to an ISP that they trust to manage their connectivity reliably.

  21. By deduction and/or industry knowledge, as described above.

  22. Allocated according to version 4 of the Internet protocol, aka IPv4.

  23. https://en.wikipedia.org/wiki/Application_programming_interface

  24. See http://stackoverflow.com/questions/2663371/longitude-and-latitude-value-from-ip-address