Our Daily Lives are Inextricably Linked to the Internet – But How Does it Actually Work?
Introduction
We all interact with the ‘net’ daily in one way or another, whether it’s via our smartphone, on a laptop or desktop PC, or just at arm’s length when using a service that involves internet data transmission of some kind. We are now arguably totally dependent on the internet's continued function to enable our increasingly complex everyday lives - our society just wouldn't work properly without it. This makes us vulnerable to cyber-attacks, or natural events which interfere with its infrastructure, as discussed in a recent blog on modern warfare.
How many of us really understand how it all works, though? And do we really need to?
It is, of course, as with many modern inventions,
perfectly possible to use the internet without having a clue about its internal
workings. I suspect the vast majority of people do. Over the last 30 years or so the increasing complexity of life in
general and our technology in particular has turned us into ‘passive’ consumers with little knowledge of how to repair or even service items we rely on every day.
Our cars are a good example of this trend. In the ‘old days’ (i.e. pre-1990s!) most car owners
knew the basics of how their cars worked, and could usually carry out essential servicing and get them going again using simple tools and procedures when minor
problems cropped up. Today’s microprocessor-controlled vehicles are an order of magnitude more complex
and essentially ‘not user serviceable’, often requiring specialist instrumentation
just to diagnose what’s wrong with them,
never mind actually ‘fix’ them.
So it has been with personal computers and the internet.
Those of us of a certain age who experienced the advent of affordable personal computers first hand in
the 1980s and '90s usually learned to code as part of the process. This was simply because we had to in order to get the rather primitive 'beasts' that they were to
actually do anything useful. Today’s smartphones and PCs, with their continuously-evolving
operating systems, software, firmware and hardware, are a mystery to most, and few
of us have the audacity to mess with them…or the incentive (or indeed permissions!) to do so. (As an aside, governments have in recent years become quite worried about the shortage of programmers, although I suspect they have more urgent concerns to deal with now.)
The internet presents a similar problem – but for
all its complexity, it is worth understanding the basic principles of how it
operates, precisely because it is such a ubiquitous element in our daily lives.
Although we can’t do much to influence its infrastructure or its inner workings,
with a little knowledge of how it all functions, we can end up feeling a little
less helpless when things go wrong, as they often do...and perhaps even be able to do
something about it.
To illustrate some of the basics, I’ve attempted to
describe what actually happens when we ask our internet browser to carry out
the apparently simple task of accessing and displaying the homepage of a
website. Along the way I hope to demystify some of the abbreviations regularly
bandied about by ‘those in the know’. This isn’t by any means intended as a
definitive ‘manual’ on using the internet, but hopefully it should be quite
revealing and informative, and may even inspire the reader to delve more deeply…
Firstly, what components are involved in the process of viewing and interacting with a ‘common or garden’ website?
· A computer, smartphone or tablet with an installed internet browser software package, e.g. Chrome, Firefox
· Your computer’s operating system (OS; e.g. Windows, macOS, Linux) or smartphone/tablet OS (usually Android or iOS)
· An Internet Service Provider (ISP), with a functioning link between your router and the internet. Nowadays, this is generally broadband, delivered either via optical fibre or copper phone lines.
· The server where the website is hosted, any services running on that server, and any others required to complete the process (see below for details).
What are the steps involved for the browser?
1) Look up the location of the server hosting the website
2) Make a connection to the server
3) Send a request to get the specific page
4) Deal with the response from the server by translating the HTML code it sends back into a page display you can understand.
To understand how these 4 steps work in more detail,
we need first to look at the relationship between websites, servers, and so-called Internet Protocol (IP) addresses.
Websites are collections of files written in web languages such as HyperText Markup Language (HTML), Cascading Style Sheets (CSS) and JavaScript, which tell your browser how to display the website itself, along with any associated images and links to other websites.
To enable internet access by anyone from anywhere in the
world, these files need to be stored on an external computer which is connected to the
Internet, usually referred to as a ‘server’. There are many of these servers, all
connected to one another via the internet, and they are distributed across the
globe in virtually every territory on earth. This provides redundancy, i.e. individual servers can be taken offline without the whole system collapsing, which is of course crucial for the system to be workable. It also gets round the problem of 'territorial withdrawal', which could be an issue if servers were bunched together in a small number of territories whose authorities suddenly withdrew their service.
1. Looking up the server hosting your website
When you supply your browser with the Uniform Resource Locator (URL) of the website you want to visit (e.g. https://vivweb01.blogspot.com), your browser first has to work out which server on the Internet is hosting the site. It does this by looking up what’s called the ‘Domain’ name, i.e. ‘vivweb01.blogspot.com’ in this case, to find its IP address. A key point to note here is that every device communicating on the Internet (whether it be a server, a cell phone or a smart refrigerator) is identified by an Internet Protocol (IP) address. In its most familiar form (IPv4), this address contains four numbered parts, each between 0 and 255, e.g. 142.250.187.193. (You can find the IP address of any website by entering the ‘ping’ command followed by the domain name at the Windows Command Prompt – try it and see.) IP addresses can be static or, more commonly, dynamic (i.e. assigned to different devices at different times, but always unique to one device at a given time).
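To make the four-part structure concrete, here is a minimal Python sketch using the standard `ipaddress` module. It takes the example address quoted above and shows that each part is a number from 0 to 255, and that the four parts together are simply a readable way of writing one 32-bit number. The variable names are illustrative only.

```python
import ipaddress

# The example address from the text (an IPv4 address).
addr = ipaddress.ip_address("142.250.187.193")

# Split the dotted form into its four numbered parts.
parts = [int(p) for p in str(addr).split(".")]

# Each part fits in one byte (0-255), so the whole address is 4 bytes = 32 bits.
all_in_range = all(0 <= p <= 255 for p in parts)

# The same address expressed as a single integer.
as_number = int(addr)

print(parts)         # [142, 250, 187, 193]
print(addr.version)  # 4
print(all_in_range)  # True
```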
The Domain Name System (DNS), which does the translation, is like the ‘Contacts’ app on a smartphone. DNS enables our browsers to find the appropriate server anywhere on the Internet. The lookup process is the same whether you type the complete URL into your browser’s address bar or just click a hyperlink containing a valid URL somewhere on a page.
The inner workings of the DNS are complex and have to be fast to enable acceptably rapid access for the user – I won’t attempt to elaborate on them here. One thing worth knowing, though, is that DNS data is ‘cached’ (i.e. stored temporarily) at several layers between your browser and the wider Internet.
The results of previous lookups may be held in the browser’s own cache, the operating system cache, a local network cache at your router, and a DNS server cache on your ISP’s servers. When you enter a new URL, these caches are checked in turn to see whether the site’s address is already known. If the IP address can't be found in any of those cache layers, the request is handed over to the DNS server at your ISP, which then does what’s called a ‘recursive DNS lookup’. This queries multiple DNS servers around the Internet, which in turn point to further DNS servers, until the DNS record is located. If this doesn't yield a result, DNS will return an error. Caching previous search data in this way cuts out unnecessary searches, saves time and makes the process more efficient.
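The layered caching described above can be sketched as a toy model in Python. This is only an illustration under stated assumptions: the `lookup` function, the layer names and the `fake_dns` table are all invented for the example, the address is a documentation-only one, and real caches also expire their entries after a set time, which is left out here.

```python
from typing import Callable, Dict, List, Optional, Tuple

Cache = Dict[str, str]


def lookup(
    domain: str,
    caches: List[Tuple[str, Cache]],
    resolve: Callable[[str], Optional[str]],
) -> Tuple[str, str]:
    """Return (ip_address, where_it_was_found) for a domain name."""
    # Check each cache layer in order, nearest first.
    for layer_name, cache in caches:
        if domain in cache:
            return cache[domain], layer_name

    # Nothing cached: fall back to the (simulated) recursive lookup.
    ip = resolve(domain)
    if ip is None:
        raise LookupError(f"no DNS record found for {domain}")

    # Remember the answer in every layer for next time.
    for _, cache in caches:
        cache[domain] = ip
    return ip, "recursive lookup"


# Hypothetical cache layers, all empty to begin with.
layers: List[Tuple[str, Cache]] = [
    ("browser cache", {}),
    ("operating system cache", {}),
    ("router cache", {}),
    ("ISP cache", {}),
]

# Stand-in for the wider DNS: one made-up record.
fake_dns = {"example.com": "192.0.2.10"}

first = lookup("example.com", layers, fake_dns.get)
second = lookup("example.com", layers, fake_dns.get)
print(first)   # ('192.0.2.10', 'recursive lookup')
print(second)  # ('192.0.2.10', 'browser cache')
```

The second request never reaches the simulated DNS at all, which is the time saving that caching is there to provide.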
What about the other elements of a URL? The first part, ‘https://’, is known as the ‘scheme’. HTTPS stands for Hypertext Transfer Protocol Secure, and tells the browser to make a secure connection to the server using Transport Layer Security (TLS). TLS is an encryption protocol that allows secure communications over the Internet. It ensures that the data exchanged between your browser and the server, like passwords or credit card info, is encrypted, so that anyone intercepting it in transit without the key can’t read it. The older, unencrypted ‘http://’ scheme is still used but is now being phased out due to security issues.
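For readers who like to experiment, Python's standard `urllib.parse` module can pull a URL apart into the pieces described above. This is a small sketch using the example URL from this article; the variable names are my own, and the port numbers shown are the usual defaults for each scheme.

```python
from urllib.parse import urlsplit

url = urlsplit("https://vivweb01.blogspot.com")

print(url.scheme)    # https
print(url.hostname)  # vivweb01.blogspot.com
print(repr(url.path))  # '' (empty: no specific page was asked for)

# A browser uses the scheme to decide whether to secure the connection.
uses_tls = url.scheme == "https"

# If the URL names no port, fall back to the usual default:
# 443 for https, 80 for plain http.
port = url.port or (443 if uses_tls else 80)
print(uses_tls, port)  # True 443
```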
Once the browser successfully locates a valid DNS
record with the website’s IP address, it can then go looking for the server on
the Internet where the website's files are actually stored, and then establish a connection.
How does the browser do this?
2. Making a connection to the server
Using the public internet routing infrastructure, data 'packets' from a browser request get routed first through your router to your ISP. They then pass through one or more internet exchanges, where traffic crosses between ISPs and networks, and are forwarded using the Internet Protocol (IP) until they reach the server with the IP address being sought.
Once the browser's packets reach the correct server, the browser establishes a connection with it using the Transmission Control Protocol (TCP), which manages reliable delivery of the data. If HTTPS is being used, a TLS ‘handshake’ then takes place to secure the communication. During the handshake the browser checks the server's certificate to confirm that the server is bona fide, and the two agree the keys used to encrypt the information they exchange.
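The two stages, a TCP connection followed by a TLS handshake, can be seen in a few lines of Python using the standard `socket` and `ssl` modules. Treat this as a rough sketch rather than a picture of what a browser does internally: the function name `tls_version_for` and its defaults are assumptions made for the example.

```python
import socket
import ssl
from typing import Optional

# Default settings: verify the server's certificate and check that it
# matches the host name we asked for.
context = ssl.create_default_context()


def tls_version_for(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Open a TCP connection, perform the TLS handshake, return the TLS version."""
    # Stage 1: plain TCP connection to the server.
    with socket.create_connection((host, port), timeout=timeout) as tcp_sock:
        # Stage 2: TLS handshake on top of the TCP connection.
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            return tls_sock.version()
```

Calling `tls_version_for("vivweb01.blogspot.com")` needs a live internet connection, and the version string it returns depends on what the server supports. If the certificate check fails, the handshake raises an error instead of connecting.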
Now that the browser has a secure connection to the
server, it follows the rules of communication for HTTP. It starts
with the browser sending an HTTP request to the server to request the contents
of the page. The HTTP request contains a request line, headers and, for some types of request, a body. The
request line contains information that the server can use to determine what the
client (in this case, your browser) wants to do.
3. Sending a request to the Server
The content of the request line will depend on whether you just want to open the website or are looking for something specific contained within it. It gives the server instructions as to what your browser wants to be supplied with, and consists of:
· 1) a request method, which is one of GET, POST, PUT, PATCH, DELETE, or a handful of other HTTP verbs
· 2) the path, pointing to the requested resources
· 3) the HTTP version to communicate with
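As an illustration, the three parts of the request line listed above can be assembled and taken apart again in Python. The helper `build_request` is a name invented for this sketch, and the two headers it adds are just a plausible minimum; a real browser sends many more.

```python
def build_request(method: str, path: str, host: str, version: str = "HTTP/1.1") -> str:
    """Assemble the text of a simple HTTP request with no body."""
    request_line = f"{method} {path} {version}"
    headers = [f"Host: {host}", "Connection: close"]
    # Lines end with CRLF; a blank line marks the end of the headers.
    return "\r\n".join([request_line, *headers]) + "\r\n\r\n"


request = build_request("GET", "/", "vivweb01.blogspot.com")

# Pull the three parts of the request line back out again.
method, path, version = request.split("\r\n")[0].split(" ")
print(method, path, version)  # GET / HTTP/1.1
```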
In our example case, a simple ‘GET’ request will be sent. Once the server has received the request from the client (i.e. your browser in this case), it looks at the info supplied in the request line, headers and body, and decides how to process it. It then fetches the content at whatever location has been specified in the path, constructs the response and sends it back to the client. The response contains the following:
· 1) a status line, advising the client of their request’s status
· 2) response headers, telling the browser how to handle the response
· 3) the requested resource available at the specified path, either content like HTML, CSS, JavaScript, or image files, or data
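To show how those three parts sit together, here is a short Python sketch that splits a response into its status line, headers and body. The response itself is made up and written out by hand for the example; it is not what any particular server actually returns.

```python
# A made-up response, written out by hand for illustration.
raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Content-Length: 37\r\n"
    b"\r\n"
    b"<html><body><h1>Hi</h1></body></html>"
)

# The blank line separates the status line and headers from the body.
head, _, body = raw_response.partition(b"\r\n\r\n")
lines = head.decode("ascii").split("\r\n")

# 1) Status line: HTTP version, numeric status code, short reason.
version, code, reason = lines[0].split(" ", 2)

# 2) Response headers as name/value pairs.
headers = dict(line.split(": ", 1) for line in lines[1:])

print(int(code), reason)        # 200 OK
print(headers["Content-Type"])  # text/html; charset=UTF-8
print(body.decode("utf-8"))     # 3) the requested resource itself
```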
Now that we’ve explored how the server generates the
response to send back to the browser, let’s take a look at how the browser
handles this.
4. Dealing with the Server’s Response
Once the browser has received the response, it
inspects the response headers for information on how to render the resource.
The 'Content-Type' header tells the browser that the server sent an HTML resource in the response body. The browser has a built-in HTML interpreter and first parses (i.e. interprets) and then ‘renders’ (i.e. displays) the HTML content, making any additional requests necessary to get JavaScript, CSS, images, and data that may be included in the website’s content. You can see an example screen dump of the HTML code below. (You can view the HTML source code for any website yourself by right clicking the website’s page display and selecting the ‘View Page Source’ option, or its equivalent in your browser.)
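Separately from the screen dump, the 'parsing' step can be tried out with Python's standard `html.parser` module. The sketch below reads a tiny made-up page and lists the extra files (style sheet, script and image) that a browser would need to request in order to display it. The class name `ResourceFinder` and the page content are invented for the example, and a real browser's parser does a great deal more than this.

```python
from html.parser import HTMLParser


class ResourceFinder(HTMLParser):
    """Collect the addresses of extra files a page refers to."""

    # Tag name -> attribute that holds the file's address.
    WANTED = {"script": "src", "img": "src", "link": "href"}

    def __init__(self) -> None:
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attr_name = self.WANTED.get(tag)
        if attr_name is None:
            return
        for name, value in attrs:
            if name == attr_name and value:
                self.resources.append((tag, value))


# A made-up page, standing in for the response body.
page = """
<html>
  <head>
    <link rel="stylesheet" href="style.css">
    <script src="app.js"></script>
  </head>
  <body><h1>Hello</h1><img src="cat.png" alt="A cat"></body>
</html>
"""

finder = ResourceFinder()
finder.feed(page)
print(finder.resources)
# [('link', 'style.css'), ('script', 'app.js'), ('img', 'cat.png')]
```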
The end product of the rendering process is a
website page display the user can see and understand (in this case the Home
page of my Blogspot blog) with any links to other pages within the blog, or to
other websites.
They can then select a link to one of the articles
and browse the content as required.
(For anyone wanting to take a closer look at this revealing and 'off the wall' look at our 'furry friends', here is a link to the relevant blog.)
The Concept of Data Packets
The Internet is essentially a ‘network of networks’.
It works by using a technique called packet-switching, and by relying on
standardized networking protocols that all computers can interpret. At risk of
making things seem even more complex, it’s appropriate here to emphasise that
when you access a website via the steps we’ve described, the data you've requested isn’t all
transferred at once, but is split into ‘packets’, each of which contains a
small part of the data actually requested. Each packet contains both data and
information about that data, and is processed separately. The information about
the packet's contents is known as the "header," and appears at the
front of the packet so that the machines handling it know what to do with it. When the server sends back a response, it also sends information which enables the data to be reassembled correctly at the receiving end. The networking software on your device puts the packets back in the right order, and asks for any that have gone missing to be sent again, before the data is passed to your browser for rendering.
There is a good reason for this complication, which might at first sight seem unnecessary.
If, for example, we asked to view a website containing several hundred pages, and the whole lot were transferred at once, this would slow down access for all other users, who would have to wait for resource to become available to process their own requests.
By splitting the data into much smaller packets, processing them separately and then reuniting them at the receiving end, the ‘waiting time’ for any one user can be minimised by sharing processing resource more efficiently. This does, however, require the data packets to be reunited in the right order before the original browser request can be completed, and the internet protocols described above have been devised to ensure this happens correctly without corruption.
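The split-and-reassemble idea can be shown with a toy model in Python. This is a simplification under stated assumptions: the function names and the dictionary 'header' fields (`seq`, `total`) are invented for the sketch, whereas real packet headers also carry addresses, checksums and sequence numbers that count bytes rather than whole packets.

```python
import random


def split_into_packets(data: bytes, size: int):
    """Cut data into numbered packets: each has a small header plus a payload."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [
        {"seq": n, "total": len(chunks), "payload": chunk}
        for n, chunk in enumerate(chunks)
    ]


def reassemble(packets) -> bytes:
    """Put packets back in order and check that none are missing."""
    ordered = sorted(packets, key=lambda p: p["seq"])
    expected = ordered[0]["total"] if ordered else 0
    if [p["seq"] for p in ordered] != list(range(expected)):
        raise ValueError("packets missing or duplicated")
    return b"".join(p["payload"] for p in ordered)


message = b"<html><body>Hello from the server</body></html>"
packets = split_into_packets(message, size=8)

# Simulate packets arriving out of order after taking different routes.
random.Random(1).shuffle(packets)

rebuilt = reassemble(packets)
print(rebuilt == message)  # True
```

If a packet is left out, `reassemble` raises an error rather than returning incomplete data, which mirrors the point that missing pieces have to be sent again before the request can be completed.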
Figure 1 (at the end of this article) provides a flow chart of the operations
described above.
Final Words
This brief outline of the processes that take place ‘under the bonnet’ of the internet when we make a simple request to view a website’s content is by no means definitive. It is merely designed to give the interested user a glimpse of the internet’s complexity.
Given the huge volume of requests it is
bombarded with daily, and now the additional demands of AI, it is truly remarkable that the net is still able to deliver access to many websites in a fraction of a second.
Just as well, given the reliance we all put on it working flawlessly…and in ‘real
time’.
Although as users, there is no inherent need for us
to understand any of its inner workings, a basic knowledge can be helpful in
certain circumstances.
A particularly useful example is in dealing with the
increasing burden of internet scams we’re all subjected to. These are now appearing
at an alarming rate, often as promotional emails offering links to ‘once in a
lifetime bargains’. A knowledge of URL structure (and a suitably suspicious nature!) allows the canny user to recognise suspect mails by checking the address attached to the message itself, and the URLs of any links provided, and usually saves them from getting caught out.
If you are in doubt about where an unexpected mail has come from, don’t open it fully, and DON’T click any of the links – doing so will at best take you to a fake website in an attempt to extract bank details from you, and at worst may introduce a ransomware virus which then proceeds to encrypt all your data.
Look carefully at the originating address of the mail and at the URLs
of any links within it. Some scam mails contain spelling errors and abnormal fonts and are easy to spot, but others look surprisingly plausible, with logos appropriated from bona fide sites.
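The kind of check involved can be sketched in Python. The part of a link that matters is the host name that comes straight after the scheme, and a common trick is to put a familiar name at the front of a quite different host. The function `belongs_to` and all the links below are made up for illustration, and a check like this is only a rough aid, not a guarantee that a link is safe.

```python
from urllib.parse import urlsplit


def belongs_to(url: str, trusted_domain: str) -> bool:
    """True if the link's host is the trusted domain or a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    trusted = trusted_domain.lower()
    return host == trusted or host.endswith("." + trusted)


# Made-up links of the kind that might appear in an email.
genuine = belongs_to("https://www.example.com/offers", "example.com")
disguised = belongs_to("https://example.com.bargains.example.net/login", "example.com")
lookalike = belongs_to("https://examp1e.com/login", "example.com")

print(genuine)    # True
print(disguised)  # False: the real host ends in example.net
print(lookalike)  # False: the digit 1 replaces the letter l
```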
If you receive a suspicious mail, please also forward it
to report@phishing.gov.uk so they can add it to their increasingly vast
database of scam mails, and take appropriate action. Make sure you delete the
mail itself and the ‘Sent’ copy. You’ll receive an acknowledgment from phishing.gov
with details of the mail you reported, and will have helped combat this recent
scourge.
Understanding something of the internet’s procedures
and architecture can also help with diagnosing problems – if you can rule out
your own browser, router and internet connection as the source of a problem
early on, it will save much soul-searching and unnecessary ‘remedial’ work, during which you could damage your system inadvertently.
I hope this brief exposé of the internet’s inner workings has been useful and informative. Comments on content and suggested revisions welcome, as always…
First published 30.1.25