Our Daily Lives are Inextricably Linked to the Internet – But How Does it Actually Work?

 

Introduction

We all interact with the ‘net’ daily in one way or another, whether it’s via our smartphone, on a laptop or desktop PC, or at arm’s length when using a service that involves internet data transmission of some kind. We are now arguably totally dependent on the internet’s continued function to enable our increasingly complex everyday lives – our society just wouldn’t work properly without it. This makes us vulnerable to cyber-attacks, or to natural events which interfere with its infrastructure, as discussed in a recent blog on modern warfare.

How many of us really understand how it all works, though? And do we really need to?

It is, of course, as with many modern inventions, perfectly possible to use the internet without having a clue about its internal workings. I suspect the vast majority of people do. Over the last 30 years or so the increasing complexity of life in general and our technology in particular has turned us into ‘passive’ consumers with little knowledge of how to repair or even service items we rely on every day.

Our cars are a good example of this trend. In the ‘old days’ (i.e. pre-1990s!) most car owners knew the basics of how their cars worked, and could usually carry out essential servicing and get them going again using simple tools and procedures when minor problems cropped up. Today’s microprocessor-controlled vehicles are an order of magnitude more complex and essentially ‘not user serviceable’, often requiring specialist instrumentation just to diagnose what’s wrong with them, never mind actually ‘fix’ them.

So it has been with personal computers and the internet. Those of us of a certain age who experienced the advent of affordable personal computers first hand in the 1980s and ’90s usually learned to code as part of the process, simply because we had to in order to get the rather primitive ‘beasts’ that they were to actually do anything useful. Today’s smartphones and PCs, with their continuously-evolving operating systems, software, firmware and hardware, are a mystery to most, and few of us have the audacity to mess with them…or the incentive (or indeed permissions!) to do so. (As an aside, governments have in recent years become quite worried about the shortage of programmers, although I suspect they have more urgent concerns to deal with now.)

The internet presents a similar problem – but for all its complexity, it is worth understanding the basic principles of how it operates, precisely because it is such a ubiquitous element in our daily lives. Although we can’t do much to influence its infrastructure or its inner workings, with a little knowledge of how it all functions, we can end up feeling a little less helpless when things go wrong, as they often do...and perhaps even be able do something about it.

To illustrate some of the basics, I’ve attempted to describe what actually happens when we ask our internet browser to carry out the apparently simple task of accessing and displaying the homepage of a website. Along the way I hope to demystify some of the abbreviations regularly bandied about by ‘those in the know’. This isn’t by any means intended as a definitive ‘manual’ on using the internet, but hopefully it should be quite revealing and informative, and may even inspire the reader to delve more deeply…

Firstly, what components are involved in the process of viewing and interacting with a ‘common or garden’ website?

·         A computer, smartphone or tablet with an installed internet browser software package, e.g. Chrome or Firefox

·         Your computer’s operating system (OS, e.g. Windows, macOS, Linux) or smartphone/tablet OS (usually Android or iOS)

·         An Internet Service Provider (ISP), with a functioning link between your router and the internet. Nowadays, this is generally broadband, delivered either via optical fibre or copper phone lines.

·         The server where the website is hosted, and any services running on that server, and any others required to complete the process (see below for details).

What are the steps involved for the browser?

1)      Look up the location of the server hosting the website

2)      Make a connection to the server

3)      Send a request to get the specific page

4)      Deal with the response from the server by translating the instructions the website's HTML code sends back into a page display you can understand.

To understand how these 4 steps work in more detail, we need first to look at the relationship between websites, servers, and so-called Internet Protocol (IP) addresses.

Websites are collections of files containing digital information written in one or other web language or format, e.g. HyperText Markup Language (HTML), Cascading Style Sheets (CSS), JavaScript, etc., that tell your browser how to display the website itself, along with any associated images and links to other websites.

To enable internet access by anyone from anywhere in the world, these files need to be stored on an external computer which is connected to the Internet, usually referred to as a ‘server’. There are many of these servers, all connected to one another via the internet, and they are distributed across the globe in virtually every territory on earth. This provides redundancy, i.e. individual servers can be taken offline without the whole system collapsing, which is of course crucial for the system to be workable. It also gets round the problem of ‘territorial withdrawal’, which could be an issue if servers were bunched together in a small number of territories whose authorities suddenly withdrew their service.

1.     Looking up the server hosting your website

When you supply your browser with the Uniform Resource Locator (URL) of the website you want to visit (e.g. https://vivweb01.blogspot.com), your browser first has to work out which server on the Internet is hosting the site. It does this by looking up what’s called the ‘Domain’ name, i.e. ‘vivweb01.blogspot.com’ in this case, to find its IP address. A key point to note here is that every device on the Internet – whether it be a server, cell phone or smart refrigerator – has a unique address called an Internet Protocol (IP) address. An IPv4 address contains four numbered parts, e.g. 142.250.187.193 (the newer, longer IPv6 addresses work on the same principle). (You can find the IP address of any website by entering the ‘ping’ command followed by the domain name at the command prompt or terminal – try it and see.) IP addresses can be static or, more commonly, dynamic (i.e. assigned to different devices at different times, but always unique to one device at a given time).
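For the curious, you can ask your operating system’s resolver to do exactly this lookup from a few lines of Python – a minimal sketch (the blog’s hostname is just an illustration, and the address returned will vary over time and by location):

```python
import socket

def resolve(hostname: str) -> str:
    """Ask the system's DNS resolver for an IPv4 address."""
    return socket.gethostbyname(hostname)

# e.g. resolve("vivweb01.blogspot.com")  ->  something like "142.250.187.193"
print(resolve("localhost"))  # the loopback address: 127.0.0.1
```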

The Domain Name System (DNS) that does the translation is like the ‘Contacts’ app on a smartphone: it maps memorable names to numeric addresses. DNS enables our browsers to find the appropriate server anywhere on the Internet. The lookup process is the same whether you physically enter the complete URL in your browser’s address bar or just click a hyperlink somewhere on the page containing a valid URL.

The inner workings of the DNS are complex, and have to be fast to enable acceptably rapid access for the user – I won’t attempt to elaborate on them here. One thing worth knowing, though, is that DNS data is ‘cached’ (i.e. stored temporarily) at several layers between your browser and the wider Internet.

DNS data is stored in your browser’s own cache, the operating system’s cache, a local network cache at your router, and a DNS server cache on your ISP’s servers. When you enter a new URL, the browser will first check these caches to see whether you have visited that site before. If it can’t find the IP address in any of those cache layers, it hands over to the DNS server at your ISP, which then performs what’s called a ‘recursive DNS lookup’. This queries multiple DNS servers around the Internet, which in turn ask more DNS servers to search for the DNS record until it is located. If this doesn’t yield results, DNS will return an error. Caching previous search data in this way cuts out unnecessary searches, saves time and makes the process more efficient.
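The caching idea itself is simple enough to sketch in a few lines of Python. This toy version (the names, the stand-in resolver and its address are purely illustrative) checks a local cache before falling back to a slower lookup, just as each layer above does:

```python
cache = {}  # our toy 'cache layer': domain name -> IP address

def cached_lookup(name, resolver):
    """Return (ip, was_cached): consult the cache before doing a full lookup."""
    if name in cache:
        return cache[name], True   # cache hit: no slow lookup needed
    ip = resolver(name)            # cache miss: do the slow lookup...
    cache[name] = ip               # ...and remember the answer for next time
    return ip, False

# A stand-in for a real recursive DNS lookup:
fake_dns = {"example.com": "93.184.216.34"}.get

print(cached_lookup("example.com", fake_dns))  # ('93.184.216.34', False)
print(cached_lookup("example.com", fake_dns))  # ('93.184.216.34', True)
```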

What about the other elements of a URL? The first part, ‘https://’, is known as the ‘scheme’. HTTPS stands for Hypertext Transfer Protocol Secure, and tells the browser to make a secure connection to the server using Transport Layer Security (TLS). TLS is an encryption protocol that allows secure communications over the Internet. It ensures that the data exchanged between your browser and the server, like passwords or credit card info, is encrypted, and therefore can’t be read in intelligible form by anyone intercepting it in transit without the key. The older plain ‘http://’ scheme is still in use but is being phased out due to security issues.
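You can see the elements of a URL separated out using Python’s standard library (the URL below, apart from the blog’s real domain, is a made-up example):

```python
from urllib.parse import urlparse

# A hypothetical URL, broken into its component parts:
url = urlparse("https://vivweb01.blogspot.com/2025/01/example-post.html")

print(url.scheme)    # https
print(url.hostname)  # vivweb01.blogspot.com
print(url.path)      # /2025/01/example-post.html
```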

Once the browser successfully locates a valid DNS record with the website’s IP address, it can go looking for the server on the Internet where the website’s files are actually stored, and then establish a connection.

How does the browser do this?

2.     Making a connection to the server

Using the public internet routing infrastructure, data ‘packets’ from a browser request are routed first through your router to your ISP, and then through internet exchanges as they cross between ISPs and networks. Routing between networks is handled by the Internet Protocol (IP) itself, while the Transmission Control Protocol (TCP) runs on top of it to manage a reliable connection to the server at the IP address found earlier.

Once the browser finds the correct server on the Internet, it establishes a TCP connection with the server. If HTTPS is being used, a TLS ‘handshake’ takes place to secure the communication. This ensures that browser and server can verify one another as bona fide and that the information exchange is secure and encrypted.
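These two steps – a TCP connection followed by a TLS handshake – can be sketched in Python using the standard socket and ssl modules. This is an illustration of the principle, not production code, and the example hostname in the comment is just that:

```python
import socket
import ssl

def open_secure_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    """Open a TCP connection to host, then perform the TLS handshake over it."""
    context = ssl.create_default_context()  # verifies the server's certificate
    raw_socket = socket.create_connection((host, port), timeout=5)  # TCP handshake
    return context.wrap_socket(raw_socket, server_hostname=host)    # TLS handshake

# e.g. tls_sock = open_secure_connection("example.com")
```

Note that the default SSL context refuses to talk to a server whose certificate doesn’t check out – this is the ‘bona fide’ verification described above.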

Now that the browser has a secure connection to the server, it follows the rules of communication for the HTTP protocol. It starts with the browser sending an HTTP request to the server to request the contents of the page. The HTTP request contains a request line, headers and a body. The request line contains information that the server can use to determine what the client (in this case, your browser) wants to do.

3.     Sending a request to the Server

The content of the request line will depend on whether you just want to open the website or are looking for something specific contained within it. It gives the server specific instructions as to what your browser wants to be supplied with:

1)      a request method, which is one of GET, POST, PUT, PATCH, DELETE, or a handful of other HTTP verbs

2)      the path, pointing to the requested resources

3)      the HTTP version to communicate with

In our example case, a simple ‘GET’ request will be sent. Once the server has received the request from the client (i.e. your browser in this case), it looks at the info supplied in the request line, headers and body, and decides how to respond. It then fetches the content at whatever location has been specified in the path, constructs the response and sends it back to the client. The response contains the following:

1)      a status line, advising the client of their request’s status

2)      response headers, telling the browser how to handle the response

3)      the requested resource available at the specified path, either content like HTML, CSS, JavaScript, or image files, or data
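Put together, both request and response are just structured text. The short Python sketch below builds a minimal GET request and picks apart a canned example response – the request and response shown are illustrative, not captured from a real server:

```python
# A minimal HTTP GET request: request line, headers, then a blank line (no body).
request = (
    "GET / HTTP/1.1\r\n"
    "Host: vivweb01.blogspot.com\r\n"
    "Connection: close\r\n"
    "\r\n"
)

# An illustrative response: status line, headers, blank line, then the body.
response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body>Hello!</body></html>"
)

# Split the response the way a browser does: head (status line + headers) / body.
head, body = response.split("\r\n\r\n", 1)
status_line, *header_lines = head.split("\r\n")
headers = dict(line.split(": ", 1) for line in header_lines)

print(status_line)              # HTTP/1.1 200 OK
print(headers["Content-Type"])  # text/html
print(body)                     # <html><body>Hello!</body></html>
```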

Now that we’ve explored how the server generates the response to send back to the browser, let’s take a look at how the browser handles this.

4.     Dealing with the Server’s Response

Once the browser has received the response, it inspects the response headers for information on how to render the resource. The ‘Content-Type’ header tells the browser that the server sent an HTML resource in the response body. The browser has a built-in HTML interpreter: it first parses (i.e. interprets) and then ‘renders’ (i.e. displays) the HTML content, making any additional requests necessary to get the JavaScript, CSS, images and data that may be included in the website’s content. You can see an example screen dump of the HTML code below. (You can view the HTML source code for any website yourself by right-clicking the website’s page display and selecting the ‘View Page Source’ option, or similar.)
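Parsing is exactly what Python’s built-in html.parser module does in miniature. The sketch below pulls the link targets out of a made-up fragment of HTML, much as a browser does when it discovers further resources and pages to fetch:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, as a browser does while parsing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

# A made-up fragment of HTML for illustration:
page = '<html><body><a href="/post-1">Post 1</a> <a href="/post-2">Post 2</a></body></html>'

collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['/post-1', '/post-2']
```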

 


The end product of the rendering process is a website page display the user can see and understand (in this case the Home page of my Blogspot blog) with any links to other pages within the blog, or to other websites.

They can then select a link to one of the articles and browse the content as required.



(For anyone wanting to take a closer look at this revealing and 'off the wall' look at our 'furry friends', here is a link to the relevant blog.)

The Concept of Data Packets

The Internet is essentially a ‘network of networks’. It works by using a technique called packet-switching, and by relying on standardized networking protocols that all computers can interpret. At risk of making things seem even more complex, it’s appropriate here to emphasise that when you access a website via the steps we’ve described, the data you've requested isn’t all transferred at once, but is split into ‘packets’, each of which contains a small part of the data actually requested. Each packet contains both data and information about that data, and is processed separately. The information about the packet's contents is known as the "header," and appears at the front of the packet so that the receiving server knows what to do with it. When it sends back a response, it also sends information which enables the data to be reassembled correctly by the browser that originally requested it. Your browser won't attempt to render the website page until it has received all the data packets it was expecting and verified that they're in the right order.

There is a good reason for this complication, which might at first sight seem unnecessary. 

If, for example, we asked to view a website containing several hundred pages, and the whole lot were transferred at once, this would slow down access for all other users, who would have to wait for resources to become available to process their own requests.

By splitting the data into much smaller packets, processing them separately and then reuniting them at the receiving end, the ‘waiting time’ for any one user can be minimised by sharing processing resources more efficiently. This does, however, require the correct data packets to be reunited in the right order before the original browser request can be completed, and the internet protocols described above have been devised to ensure this happens correctly without corruption.
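The split-and-reassemble idea can be illustrated in a few lines of Python. Each toy ‘packet’ below carries a sequence number as its ‘header’ so the receiver can put the pieces back in order, even if they arrive shuffled (real packets carry much richer headers, of course, and this sketch is purely illustrative):

```python
import random

def packetize(data: bytes, size: int = 4):
    """Split data into (sequence_number, chunk) pairs - our toy packets."""
    return [(i, data[i : i + size]) for i in range(0, len(data), size)]

def reassemble(packets):
    """Sort packets by sequence number and stitch the chunks back together."""
    return b"".join(chunk for _, chunk in sorted(packets))

message = b"Hello, internet!"
packets = packetize(message)
random.shuffle(packets)      # packets may arrive in any order...
print(reassemble(packets))   # ...but reassembly restores them: b'Hello, internet!'
```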

Figure 1 (at the end of this article) provides a flow chart of the operations described above.

Final Words

This brief outline of the processes that take place ‘under the bonnet’ of the internet when we make a simple request to view a website’s content is by no means definitive. It is merely designed to give the interested user a glimpse of the internet’s complexity. 

Given the huge volume of requests it is bombarded with daily, and now the additional demands of AI, it is truly remarkable that the net is still able to deliver access to many websites in a fraction of a second. Just as well, given the reliance we all put on it working flawlessly…and in ‘real time’.

Although as users, there is no inherent need for us to understand any of its inner workings, a basic knowledge can be helpful in certain circumstances.

A particularly useful example is in dealing with the increasing burden of internet scams we’re all subjected to. These are now appearing at an alarming rate, often as promotional emails offering links to ‘once in a lifetime bargains’. A knowledge of URL structure (and a suitably suspicious nature!) allows the canny user to recognise suspect mails by checking the URL attached to the message itself, and that of any links provided, and usually stops them getting caught out.
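As a simple illustration of that check, a few lines of Python can test whether a link’s hostname really belongs to the domain it claims to be from (the bank and scam domains here are made up for the example):

```python
from urllib.parse import urlparse

def hostname_matches(url: str, trusted_domain: str) -> bool:
    """True if the URL's hostname is the trusted domain or a subdomain of it."""
    host = urlparse(url).hostname or ""
    return host == trusted_domain or host.endswith("." + trusted_domain)

# A link that genuinely belongs to the (made-up) trusted domain:
print(hostname_matches("https://mail.example-bank.com/login",
                       "example-bank.com"))  # True

# A classic scam trick: the trusted name buried inside someone else's domain:
print(hostname_matches("https://example-bank.com.evil.net/login",
                       "example-bank.com"))  # False
```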

If you are in doubt about where an unexpected mail has come from, don’t open it fully, and DON’T click any of the links – doing so will at best take you to a fake website in an attempt to extract bank details from you, and at worst may introduce a ransomware virus which then proceeds to encrypt all your data. 

Look carefully at the originating address of the mail and at the URLs of any links within it. Some scam mails contain spelling errors and abnormal fonts and are easy to spot, but others look surprisingly plausible, with logos appropriated from bona fide sites.

If you receive a suspicious mail, please also forward it to report@phishing.gov.uk so it can be added to their increasingly vast database of scam mails and acted on. Make sure you delete the mail itself and the ‘Sent’ copy. You’ll receive an acknowledgment with details of the mail you reported, and will have helped combat this recent scourge.

Understanding something of the internet’s procedures and architecture can also help with diagnosing problems – if you can rule out your own browser, router and internet connection as the source of a problem early on, it will save much soul-searching and unnecessary ‘remedial’ work, during which you could inadvertently damage your system.

I hope this brief exposé of the internet’s inner workings has been useful and informative. Comments on content and suggested revisions are welcome, as always…

First published 30.1.25




