How Does the Internet Actually Work ?

Introduction

We all interact with the ‘net’ daily in one way or another, whether it’s via our smartphone, on a laptop or desktop PC, or just at 'arm's length' when using a service that involves internet data transmission of some kind.

Many of our more recently acquired household devices now interface with the internet in some way (e.g. doorbells, home heating systems, etc.).

We are now almost totally dependent on the internet's continued function to fulfil our increasingly complex everyday lives - our society just wouldn't work properly without it nowadays. One of the downsides to all this is that it makes us highly vulnerable to cyber-attacks, or indeed any natural event which interferes with the net's infrastructure - for anyone who would like to find out more about this aspect of our vulnerability as a society, check out my recent blog on modern warfare.

How many of us really understand how the net works ? And do we need to ?

It is, of course, as with many modern inventions, perfectly possible to use the internet without having a clue about its internal workings. And I suspect the vast majority of people do just that. Over the last 30 years or so the increasing complexity of life in general and our technology in particular has turned us into ‘passive’ consumers with little knowledge of how to repair or even maintain items we rely on every day.

Our cars are a good example of this trend. In the ‘good old days’ (i.e. pre 1990s!) most car owners knew the basics of how their cars worked, and could usually carry out essential servicing, and could get them going again using simple tools and procedures when minor problems cropped up. Today’s microprocessor-controlled vehicles are an order of magnitude more complex and essentially ‘not user serviceable’, often requiring specialist instrumentation just to diagnose what’s wrong with them, let alone actually ‘fix’ them. The plethora of different designs and combinations of power source developed as we attempt to wean ourselves off fossil fuels has merely added to the complexity.

So it has been with personal computers and the internet. Those of us of a certain age who experienced the advent of affordable personal computers first hand in the 1980s and '90s usually learned to code as part of the process. This was simply because we had to in order to get the rather primitive 'beasts' that they then were to actually do anything useful. Today’s smartphones and PCs, with their continuously-evolving operating systems, software, firmware and hardware are a mystery to most, and few of us have the audacity to mess with them…or indeed have the incentive (or permissions!) to do so. (As an aside, governments have in recent years become quite worried about the shortage of programmers, although I suspect many of them have more urgent concerns to deal with just now.)

The internet presents a similar problem – and disincentives.

But for all its complexity, it is worth understanding the basic principles of how it operates, precisely because it is such a ubiquitous element in our daily lives. Although we can’t do much to influence its infrastructure or its inner workings ourselves, with a little knowledge of how it all functions, we can end up feeling a bit less helpless when things go wrong, as they often do...and perhaps even be able to do something about it.

To illustrate some of the basics, I’ve attempted to describe what actually happens when we ask our internet browser to carry out the apparently simple task of accessing and displaying the homepage of a website. Along the way I hope to demystify some of the abbreviations regularly bandied about by those 'in the know’.

This isn’t by any means intended as a definitive ‘manual’ on using the internet, but hopefully it should be quite revealing and informative, and may even inspire the interested reader to delve more deeply into its mysteries themselves…

What Happens When we 'Call Up' a Website ?

Firstly, what components are involved in the process of viewing and interacting with a ‘common or garden’ website ?

· A computer, smartphone or tablet with an installed internet browser software package e.g. Chrome, Firefox

· Your computer’s operating system (OS; e.g. Windows, macOS, Linux, ChromeOS) or Smartphone/Tablet OS (usually Android or iOS)

· An Internet Service Provider (ISP), providing an active functioning link between your router and the internet. Nowadays, this is generally what's known as 'broadband', and this is delivered to your home either via optical fibre or copper phone lines, or a combination of the two. (In the case of phones and tablets it can also be provided wirelessly via the mobile phone network)

· The server where the website is hosted, and any services running on that server, and any others required to complete the process (see below for details).

What are the steps involved for the browser ?

1. Look up the location of the server hosting the website

2. Make a connection to the server

3. Send a request to get the specific page

4. Deal with the response from the server by translating the instructions the website's HTML code sends back to your browser into a page display you can understand.

To understand how these 4 steps work in more detail, we need first to look at the relationship between websites, servers, and so-called Internet Protocol (IP) addresses.

Websites are collections of electronic 'files' containing digital information written in one or more web languages, e.g. HyperText Markup Language (HTML) for content, Cascading Style Sheets (CSS) for appearance, and JavaScript for behaviour, that tell your browser how to display the website itself, and any associated images and links to other websites. They can also provide instructions for various types of calculation and processing if the website does anything other than just display pages.

To enable internet access by anyone from anywhere in the world, these files need to be stored on an external computer which is connected to the Internet, usually referred to as a ‘server’. There are many of these servers, all connected to one another via the internet, and they are distributed across the globe in virtually every territory on earth. This allows redundancy i.e. individual servers can be taken off line without the whole system collapsing, which is of course crucial for the system to be workable. It also gets round the problem of 'territorial withdrawal', which could be an issue if servers were bunched together in a small number of territories, whose authorities suddenly withdrew their service (particularly relevant in these times of geopolitical upheaval !).

1. Looking up the server hosting your website

When you supply your browser with the Uniform Resource Locator (URL) of the website you want to visit (e.g. https://vivweb01.blogspot.com for this blog), your browser first has to work out which server on the Internet is actually hosting the site. It does this by looking up what’s called the ‘domain’ name, i.e. ‘vivweb01.blogspot.com’ in this case, to find its IP address. A key point to note here is that every device connected to the Internet, whether it be a server, a cell phone or a smart refrigerator, is reached via an Internet Protocol (IP) address. In its most familiar form (known as IPv4) this address contains four numbered parts, each between 0 and 255: e.g. 142.250.187.193. (A newer and much longer format, IPv6, is gradually being introduced because the supply of four-part addresses has run out.) (Hint: you can find the IP address of any website by entering the ‘ping’ command followed by the domain name at the Windows Command Prompt, or in a Terminal window on Mac or Linux; try it and see.) IP addresses can be static, but are more commonly dynamic (i.e. assigned to different devices at different times); a public address refers to only one connection at any given time, although the devices in your home usually share the single public address assigned to your router.
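For readers who would like to try the lookup from a program rather than with ‘ping’, here is a minimal Python sketch using only the standard library. The function names describe_ipv4 and lookup_ipv4 are my own illustrative choices, not part of any standard tool, and the address you get back for a given domain will vary with time and location.

```python
import ipaddress
import socket


def describe_ipv4(address: str) -> dict:
    """Break a dotted IPv4 address into its four numbered parts."""
    ip = ipaddress.IPv4Address(address)  # raises ValueError if the text is malformed
    return {
        "parts": [int(part) for part in address.split(".")],  # four numbers, each 0-255
        "as_integer": int(ip),        # the same address written as one 32-bit number
        "is_private": ip.is_private,  # True for home-network ranges such as 192.168.x.x
    }


def lookup_ipv4(domain: str) -> str:
    """Ask the operating system's resolver (and hence DNS) for an IPv4 address.

    Needs a working internet connection when called.
    """
    return socket.gethostbyname(domain)
```

With a working connection, describe_ipv4(lookup_ipv4("vivweb01.blogspot.com")) should show the four parts of whichever address DNS hands back at that moment.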

The Domain Name System (DNS), which does the translation, is like the ‘Contacts’ app on a smartphone: it lets you use a memorable name in place of a number. DNS enables our browsers to find the appropriate server anywhere on the Internet. The lookup process is the same whether you type the complete URL into your browser's address bar or just click a hyperlink containing a valid URL somewhere on the page being displayed.

The inner workings of the DNS are complex, but they must be fast to enable acceptably rapid access for the user – I won’t attempt to elaborate on them here. One thing worth knowing, though, is that DNS data is ‘cached’ (i.e. stored temporarily) at several layers, both on your own device and at various places across the Internet, to help speed up the lookup process.

The results of earlier lookups are held in several places: your browser's own cache, the operating system cache, a local network cache at your router, and a DNS server cache on your ISP’s servers. When you enter a URL, these caches are checked in turn to see whether the address is already known from a recent visit. If the IP address can't be found in any of those cache layers, the DNS server at your ISP carries out what’s called a ‘recursive DNS lookup’. This queries a series of DNS servers around the Internet, each of which points it on to another server that knows more, until the DNS record is located. If this doesn't yield results, DNS will return an error. Despite the apparent complexity, caching previous lookups in this way cuts out unnecessary searches and saves time by making the look-up process more efficient.
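The idea of checking a series of caches before resorting to a full lookup can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not how any particular browser or ISP implements it: the class DnsCache, the function resolve and the five-minute time-to-live are all hypothetical.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class DnsCache:
    """One cache layer: remembers answers until their time-to-live (TTL) runs out."""

    def __init__(self, name: str, ttl_seconds: float = 300.0) -> None:
        self.name = name
        self.ttl_seconds = ttl_seconds
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, domain: str) -> Optional[str]:
        entry = self._entries.get(domain)
        if entry is None:
            return None
        address, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._entries[domain]  # stale entry: forget it
            return None
        return address

    def put(self, domain: str, address: str) -> None:
        self._entries[domain] = (address, time.monotonic() + self.ttl_seconds)


def resolve(domain: str, layers: List[DnsCache],
            full_lookup: Callable[[str], str]) -> Tuple[str, str]:
    """Check each cache layer in turn; fall back to a full lookup if all miss."""
    missed = []
    for layer in layers:
        address = layer.get(domain)
        if address is not None:
            source = layer.name
            break
        missed.append(layer)
    else:
        # No layer knew the answer, so do the slow lookup.
        address = full_lookup(domain)
        source = "full lookup"
    for layer in missed:
        layer.put(domain, address)  # remember the answer for next time
    return address, source
```

The first request for a domain falls through every layer to the slow lookup; a repeat request is answered by the first layer, which is where the time saving comes from.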

What about the other elements of a URL ? The first part, ‘https’ (the part before the ‘://’), is known as the ‘scheme’. HTTPS stands for Hypertext Transfer Protocol Secure, and tells the browser to make a secure connection to the server using Transport Layer Security (TLS). TLS is an encryption protocol that allows secure communications over the Internet. It ensures that the data exchanged between your browser and the server, like passwords or credit card info, is all encrypted, so that anyone intercepting it in transit without the encryption key can’t read it. The older, unencrypted ‘http://’ scheme is still in use but is being phased out due to security issues. (Security 'tightening' of this sort is often the reason why an older OS or software version suddenly loses access to a particular website - be prepared to update if you encounter this problem.)
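To see the elements of a URL pulled apart, here is a short Python sketch using the standard library's urlsplit function. The helper name url_parts and the dictionary keys are my own, chosen for illustration; the default port numbers (443 for https, 80 for http) are the conventional ones.

```python
from urllib.parse import urlsplit


def url_parts(url: str) -> dict:
    """Split a URL into the pieces a browser needs before it can connect."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,                # 'https' or 'http'
        "domain": parts.hostname,              # the name that DNS will look up
        "port": parts.port or (443 if parts.scheme == "https" else 80),
        "path": parts.path or "/",             # which page on the site; '/' is the homepage
        "encrypted": parts.scheme == "https",  # True when TLS will be used
    }
```

For this blog's address, url_parts("https://vivweb01.blogspot.com") gives the scheme 'https', the domain 'vivweb01.blogspot.com' and the path '/'.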

Once the browser successfully locates a valid DNS record with the website’s IP address, it can then go looking for the server on the Internet where the website's files are actually stored, and then establish a connection.

How does the browser do this ?

2. Making a connection to the server

Using the public internet routing infrastructure, data 'packets' from a browser request get routed first through your router to your ISP. These then pass through one or more internet exchanges, where they cross between ISPs or networks, and are steered by the Internet Protocol (IP) towards the server with the IP address they are addressed to.

Once the packets can reach the correct server, the browser establishes a connection with it using what’s known as the Transmission Control Protocol (TCP). If HTTPS is being used, a TLS ‘handshake’ then takes place to secure the communication. This allows the browser to check, via the server's digital certificate, that the server is bona fide, and lets the two agree on the keys used to encrypt the information exchanged.
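The two stages, a TCP connection followed by a TLS handshake, can be sketched with Python's standard socket and ssl modules. This is an illustration under my own assumptions rather than what a browser literally does; the function names make_tls_context and open_secure_connection are hypothetical.

```python
import socket
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Settings the client uses to check that the server is genuine.

    Python's defaults require a valid certificate that matches the host name.
    """
    return ssl.create_default_context()


def open_secure_connection(host: str, port: int = 443, timeout: float = 10.0) -> ssl.SSLSocket:
    """Open a TCP connection, then perform the TLS handshake on top of it.

    Needs a working internet connection when called.
    """
    tcp_socket = socket.create_connection((host, port), timeout=timeout)
    try:
        # wrap_socket carries out the handshake and verifies the certificate
        return make_tls_context().wrap_socket(tcp_socket, server_hostname=host)
    except Exception:
        tcp_socket.close()
        raise
```

With a working connection, something like `with open_secure_connection("vivweb01.blogspot.com") as connection: print(connection.version())` should report the TLS version agreed during the handshake.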

Now that the browser has a secure connection to the server, it follows the rules of communication laid down by HTTP. It starts with the browser sending an HTTP request to the server to request the contents of the page. The HTTP request contains a request line, headers and, for some types of request, a body. The request line contains information that the server can use to determine what the client (in this case, your browser) wants it to do.

3. Sending a request to the Server

The content of the request line will depend on whether you just want to open the website or are looking for something specific within it. It tells the server what your browser wants to be supplied with, and is made up of:

· 1) a request method, which is one of GET, POST, PUT, PATCH, DELETE, or a handful of other HTTP 'verbs'

· 2) the path, pointing to the requested resources

· 3) the HTTP version to communicate with
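The three parts of the request line listed above can be seen in a hand-built request. The following Python sketch assembles the text of a minimal HTTP/1.1 ‘GET’ request; the function name build_get_request and the particular headers chosen are my own illustrative assumptions, and a real browser sends many more headers than this.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Assemble the raw text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, HTTP version
        f"Host: {host}",         # header: which website on this server we want
        "Accept: text/html",     # header: the kind of content we would like back
        "Connection: close",     # header: close the connection after the reply
        "",                      # a blank line marks the end of the headers
        "",                      # a GET request carries no body
    ]
    return "\r\n".join(lines).encode("ascii")
```

Printing build_get_request("vivweb01.blogspot.com").decode() shows the request as plain, readable text, which is all HTTP really is underneath.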

In our example case, a simple ‘GET’ request will be sent. Once the server has received the request from the client (i.e. your browser in this case), it looks at the info supplied in the request line, headers and body, and decides how to process it. It then fetches the content at whatever location has been specified in the path, constructs the response and sends it back to the client. The response contains the following:

· 1) a status line, advising the client of their request’s status

· 2) response headers, telling the browser how to handle the response

· 3) the requested resource available at the specified path, either content like HTML, CSS, JavaScript, or image files, or associated data.
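The three parts of a response can be separated with a few lines of Python. This is a simplified sketch: the function name parse_response and the sample reply SAMPLE_RESPONSE are made up for illustration, and real responses can be compressed or sent in chunks, which this does not handle.

```python
def parse_response(raw: bytes) -> dict:
    """Split a raw HTTP response into status line, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # a blank line separates head from body
    lines = head.decode("iso-8859-1").split("\r\n")
    status_parts = lines[0].split(" ", 2)       # e.g. 'HTTP/1.1', '200', 'OK'
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "version": status_parts[0],
        "status": int(status_parts[1]),  # 200 means success, 404 means not found
        "reason": status_parts[2] if len(status_parts) > 2 else "",
        "headers": headers,
        "body": body,
    }


# A made-up reply of the kind a server might send for a tiny page.
SAMPLE_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Content-Length: 28\r\n"
    b"\r\n"
    b"<html><body>Hi</body></html>"
)
```

Here parse_response(SAMPLE_RESPONSE)["status"] is 200, and the "content-type" header tells the browser that the body is HTML.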

Now that we’ve explored how the server generates the response to send back to the browser, let’s take a look at how the browser handles all this.

4. Dealing with the Server’s Response

Once the browser has received the response, it inspects the response headers for information on how to 'render' the resource. The 'Content-Type' header tells the browser, in this case, that the server sent an HTML resource in the response body. The browser has a built-in HTML interpreter, which first parses (i.e. interprets) and then ‘renders’ (i.e. displays) the HTML content, making any additional requests necessary to get JavaScript, CSS, images, and data that may be included in the website’s content. You can see an example screen dump of the HTML code below. (Hint: you can view the HTML source code for any website yourself by right clicking the website’s page display and selecting the ‘View Page Source’ option, or its equivalent in your browser.)
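As a separate illustration of the parsing step, the Python sketch below reads a small piece of HTML and lists the extra files a browser would need to request, along with any links to other pages. It uses the standard library's HTMLParser; the class name ResourceFinder and the sample page are my own inventions, and a real browser's parser does a great deal more.

```python
from html.parser import HTMLParser


class ResourceFinder(HTMLParser):
    """Collect the extra files a page needs, and the links it offers."""

    def __init__(self) -> None:
        super().__init__()
        self.resources = []  # files the browser must fetch to display the page
        self.links = []      # hyperlinks the user could click

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script") and attributes.get("src"):
            self.resources.append(attributes["src"])
        elif tag == "link" and attributes.get("href"):
            self.resources.append(attributes["href"])  # typically a CSS file
        elif tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])


def find_resources(html_text: str):
    """Return (resources, links) found in a piece of HTML."""
    finder = ResourceFinder()
    finder.feed(html_text)
    return finder.resources, finder.links


# A made-up page used purely for demonstration.
SAMPLE_PAGE = """<html><head>
<link rel="stylesheet" href="style.css">
<script src="app.js"></script>
</head><body>
<img src="cat.png" alt="A cat">
<a href="https://example.com/about">About</a>
</body></html>"""
```

Each entry in the resources list corresponds to one of the 'additional requests' the browser makes, following the same four steps all over again.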

The end product of the rendering process is a website page display the user can see and understand (in this case the Home page of my Blogspot blog) with any links to other pages within the blog, or to other websites visible, usually as buttons or text highlighted blue.

They can then select a link to one of the articles and browse the content as required.

(For anyone wanting to take a closer look at this revealing and 'off the wall' look at our 'furry friends', here is a link to the relevant blog.)

The Concept of Data Packets

Let's now look at how data is actually moved about. The Internet is essentially a ‘network of networks’. It works by using a technique called packet-switching, and by relying on standardized networking protocols that all computers can interpret.

At risk of making things seem even more complex, it’s appropriate here to emphasise that when you access a website via the steps we’ve described, the data you've requested isn’t all transferred at once, but is split into ‘packets’, each of which contains a small part of the data actually requested. Each packet contains both data and information about that data, and is processed separately. The information about the packet is known as the "header", and appears at the front of the packet so that the routers along the way, and the machine receiving it, know what to do with it. When the server sends back a response, its packets likewise carry sequence information which enables the data to be reassembled correctly on your device. Packets can arrive out of order or go missing altogether, so the receiving end checks them, asks for any missing ones to be sent again, and puts them back in the right order before passing the data to your browser. This contributes to the slight delay you often see when first requesting a site.

There is a good reason for this complication, which might at first sight seem unnecessary.

If, for example, we asked to view a website containing several hundred pages, and the whole lot were transferred at once, this would slow down access for all other current users, who would then have to wait for resources to become available to process their own requests.

By splitting the data into much smaller packets, processing them separately and then re-uniting them at the receiving end, the ‘waiting time’ for any one user can be minimised by sharing the available capacity more efficiently. This does, however, require the data packets to be reunited in the right order before the original browser request can be completed, and the internet protocols described above have been devised to ensure this happens correctly and without corruption.
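Splitting and reassembly can be mimicked in a few lines of Python. This is a toy model under my own assumptions: the function names and the packet layout (a dictionary with 'seq', 'total' and 'payload') are invented for illustration. Real TCP numbers the bytes rather than the packets, and asks for missing data to be re-sent rather than simply reporting an error.

```python
import random


def split_into_packets(data: bytes, size: int) -> list:
    """Cut data into numbered packets of at most `size` bytes each."""
    total = (len(data) + size - 1) // size  # round up to a whole number of packets
    return [
        {"seq": i, "total": total, "payload": data[i * size:(i + 1) * size]}
        for i in range(total)
    ]


def reassemble(packets: list) -> bytes:
    """Put packets back in order, refusing if any are missing."""
    if not packets:
        raise ValueError("no packets received")
    total = packets[0]["total"]
    by_seq = {packet["seq"]: packet["payload"] for packet in packets}
    missing = [i for i in range(total) if i not in by_seq]
    if missing:
        raise ValueError(f"missing packets: {missing}")
    return b"".join(by_seq[i] for i in range(total))


def demo(message: bytes = b"How does the Internet actually work?") -> bytes:
    """Split a message, scramble the arrival order, then rebuild it."""
    packets = split_into_packets(message, size=8)
    random.shuffle(packets)  # simulate packets arriving out of order
    return reassemble(packets)
```

However the packets are shuffled, demo() returns the original message, because the sequence numbers in the 'header' say where each piece belongs.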

Figure 1 (at the end of this article) provides a flow chart of the operations described above.

Final Words

This brief outline of the processes that take place ‘under the bonnet’ of the internet when we make a simple request to view a website’s content is by no means definitive. It is merely designed to give the interested user a glimpse of the internet’s complexity.

Given the huge volume of requests it is bombarded with daily, and now the additional demands of AI, it is truly remarkable that the net is still able to deliver access to many websites in just a fraction of a second. Just as well, given the reliance we all put on it working flawlessly…and in ‘real time’.

Although as users, there is no inherent need for us to understand any of its inner workings, a basic knowledge can be helpful in certain circumstances.

A particularly useful example of how this helps is in dealing with the increasing burden of internet scams we’re all subjected to nowadays. These are now appearing at an alarming rate, often as promotional emails offering links to ‘once in a lifetime bargains’. A knowledge of URL structure (and a suitably suspicious nature !) allows the canny user to recognise suspect mails by checking the sender's address on the message itself, and the URLs of any links provided, and usually saves them from getting caught out.

If you are in doubt about where an unexpected mail has come from, don’t open it fully, and DON’T click any of the links – doing so will, at best, take you to a fake website in an attempt to extract bank details from you. At worst, it may introduce a ransomware virus which then proceeds to encrypt all your data.

The secret is to look carefully at the originating address of the mail and at the URLs of any links within it. Some scam mails contain spelling errors and abnormal fonts and are easy to spot, but others look surprisingly plausible, with logos often appropriated from bona-fide sites.
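The kind of check involved can be expressed as a short Python sketch. It tests whether the host name in a link really belongs to a domain you trust, rather than merely containing its name somewhere. The function name link_belongs_to and the domain ‘mybank.example’ are hypothetical, and this simple test is an aid to suspicion, not a guarantee of safety.

```python
from urllib.parse import urlsplit


def link_belongs_to(url: str, trusted_domain: str) -> bool:
    """True only if the link's host is the trusted domain or a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    trusted = trusted_domain.lower()
    # The trusted name must come at the END of the host name, not just appear in it.
    return host == trusted or host.endswith("." + trusted)
```

So link_belongs_to("https://www.mybank.example/login", "mybank.example") is True, whereas a link to "https://mybank.example.account-check.example/login" fails the test: the part that counts is the end of the host name, immediately before the first single slash.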

If you receive a suspicious mail, please also forward it to report@phishing.gov.uk so they can add it to their increasingly vast database of scam mails, and take appropriate action. Make sure you delete the mail itself and the ‘Sent’ copy you've forwarded. You’ll receive an acknowledgment from phishing.gov with details of the mail you reported, which you can safely keep as a record if you wish, and you will have helped combat this continuing scourge.

Understanding something of the internet’s procedures and architecture can also help with diagnosing problems – if you can rule out your own browser, router and internet connection as the source of a problem early on, it will save much soul-searching and unnecessary ‘remedial’ work, during which you could damage your system inadvertently.
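As a rough illustration of this kind of step-by-step elimination, the Python sketch below tries the DNS lookup first and the connection second, and reports which stage failed. The function name diagnose and the wording of its messages are my own, and the conclusions it suggests are only hints: a failed connection could equally be caused by a firewall, for example.

```python
import socket


def diagnose(host: str, port: int = 443,
             resolve=socket.gethostbyname,
             connect=socket.create_connection) -> str:
    """Try the lookup, then the connection, and say which stage failed.

    The resolve and connect arguments exist so the checks can be swapped
    out for testing; by default the real network functions are used.
    """
    try:
        address = resolve(host)
    except OSError:
        return "DNS lookup failed: check your own connection, router or DNS settings"
    try:
        connection = connect((address, port), timeout=5)
    except OSError:
        return f"Found {address} but could not connect: the site itself may be unavailable"
    connection.close()
    return f"Reached {host} at {address}: the problem probably lies elsewhere"
```

If a lookup fails for a well-known site as well as for the one you wanted, the fault is more likely to be at your end than theirs.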

I hope this brief exposé of the internet’s inner workings has been useful and informative. Comments on content and suggested revisions welcome, as always…

First published 30.1.25

Revised 21.4.25
