A web server is where a site's resources are stored. Each time someone accesses the site, their browser fetches that data from the server.
More technically, when a browser fetches a web page, it talks to the server using the Hypertext Transfer Protocol, commonly known as HTTP. The browser makes an HTTP request, and the web server replies with an HTTP response, which consists of a status code and the requested content.
In the figure below, you see a request being made to webquality.org. The verb GET tells the server to locate /speed.html. Because several versions of HTTP are in use, the request also states which version of the protocol it is using (for example, HTTP/1.1; note that HTTPS is HTTP over an encrypted connection, not a version). Finally, the Host header specifies which host the resource belongs to.
After making the request, you receive a response code of 200 OK, which assures you that the resource you’ve requested exists, along with a response containing the contents of /speed.html. The content of /speed.html is then downloaded and interpreted by the web browser.
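The exchange described above can be sketched in a few lines of Python. This is only an illustration, not a real client: the helper names `build_get_request` and `parse_status_line` are hypothetical, the version HTTP/1.1 is an assumption (the figure is not reproduced here), and `sample_response` is a made-up response used for demonstration.

```python
def build_get_request(host, path, version="HTTP/1.1"):
    """Return the text of a minimal HTTP GET request (version is assumed)."""
    lines = [
        f"GET {path} {version}",  # verb, resource, protocol version
        f"Host: {host}",          # which host the resource belongs to
        "",                       # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(response):
    """Split the first line of an HTTP response into version, code, reason."""
    status_line = response.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


request = build_get_request("webquality.org", "/speed.html")

# Hypothetical response text, standing in for what a server might send back.
sample_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html>...</html>"
)
status = parse_status_line(sample_response)
```

Here `request` holds the request line and Host header, and `status` holds the parsed status line of the sample response.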

That’s how a request is made to a server. As you can see, if speed.html references many other resources (not just a single HTML file), the number of requests grows as well.
All of these steps incur what is called latency: the time spent waiting for a request to reach the web server, the time the web server takes to collect and send its response, and the time the web browser takes to download the response.
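To make the point about extra requests concrete, here is a rough sketch that counts the sub-resources a page references. The HTML in `page` and the file names in it are invented for illustration, and the `ResourceCounter` class is a hypothetical helper that only looks at scripts, images, and stylesheets; real pages can trigger requests in other ways too (fonts, CSS imports, and so on).

```python
from html.parser import HTMLParser


class ResourceCounter(HTMLParser):
    """Collect the URLs of scripts, images, and stylesheets in a page."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "img") and attrs.get("src"):
            self.resources.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.resources.append(attrs["href"])


# Made-up page content standing in for speed.html.
page = """<html><head>
<link rel="stylesheet" href="/main.css">
<script src="/app.js"></script>
</head><body><img src="/logo.png"><img src="/hero.jpg"></body></html>"""

counter = ResourceCounter()
counter.feed(page)

# One request for the page itself, plus one per referenced resource.
total_requests = 1 + len(counter.resources)
```

Each of those requests pays its own share of latency, which is why pages with many resources tend to feel slower.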
How do we go about reducing the server response time?
There are a few ways to optimize server response time, but first let’s understand what we are optimizing. Since the server responds with the requested resources whenever the browser asks for them, two things matter:
- The resources it asked for
- The distance from the user to the server
So our goal is to minimize the size of the resources and the distance between the server and the user.
Minimizing resources
There are a few ways we can do that. One is a technique called compression, and one of the most popular compression methods is Gzip. You can enable it on your server to compress content before it is transmitted over the internet. Gzip can reduce the size of text files such as JavaScript, CSS, and HTML by up to 90%. The smaller a resource is, the less time a user’s browser needs to download it.
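As a quick way to see the effect, the sketch below compresses a block of HTML with Python's standard `gzip` module. The sample markup is invented and deliberately repetitive, so it compresses better than a typical real page would; treat the number it prints as an illustration, not a benchmark.

```python
import gzip

# Invented, highly repetitive markup used only to demonstrate compression.
html = ('<div class="item"><p>Hello, world!</p></div>\n' * 500).encode("utf-8")

compressed = gzip.compress(html)        # what the server would send
restored = gzip.decompress(compressed)  # what the browser would recover

saving = 1 - len(compressed) / len(html)
print(f"{len(html)} bytes -> {len(compressed)} bytes ({saving:.0%} smaller)")
```

In practice you would normally switch compression on in the web server's configuration rather than compress files by hand like this.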
Another technique is caching. Caching is probably the single biggest speed boost available when optimizing your server. Our WQA (Web Quality Assurance) team can almost always reduce load time by more than 30% simply by enabling caching on the server side.
With caching, the server does not have to spend time fetching files from the disk, executing the application code, fetching database values and assembling the result into an HTML page each time a user refreshes a page. The server can just take a processed result, and send it to the visitor.
Going back to our earlier example, suppose a user visits speed.html, which contains many scripts, images, CSS files, and other resources. Without caching, the server fetches all of that information from disk again on every request before it can serve it to the user.
With caching, we only need to serve the page that has already been composed, which dramatically reduces the number of disk and database lookups the server makes.
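The idea can be sketched as a small in-memory page cache. Everything here is hypothetical: `PageCache`, `render_page`, and the 60-second lifetime are illustrative choices, and a production setup would more likely rely on the caching features of the web server, framework, or a dedicated cache.

```python
import time


class PageCache:
    """Keep rendered pages in memory so repeat visits skip the rendering work."""

    def __init__(self, render, ttl_seconds=60.0):
        self._render = render    # function that builds the page from scratch
        self._ttl = ttl_seconds  # how long a cached page stays valid
        self._store = {}         # path -> (expires_at, html)

    def get(self, path):
        now = time.monotonic()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: reuse the composed page
        html = self._render(path)  # cache miss: do the expensive work once
        self._store[path] = (now + self._ttl, html)
        return html


stats = {"renders": 0}


def render_page(path):
    """Stand-in for reading files, running code, and querying a database."""
    stats["renders"] += 1
    return f"<html><body>Rendered {path}</body></html>"


cache = PageCache(render_page)
first = cache.get("/speed.html")   # rendered and stored
second = cache.get("/speed.html")  # served straight from the cache
```

After both calls, `render_page` has run only once even though the page was served twice.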
Minimizing the distance from the user to the server
Since the browser requests resources from the server each time a user accesses the site, it makes sense to put the server close to where most of your users are. Don’t operate your business in Japan while hosting your website on a server located in the US. The site may load blazingly fast for anyone near the server, but it definitely will not for your customers in Japan.
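A back-of-the-envelope calculation shows why distance matters. Light travels through optical fiber at roughly 200,000 km per second, so distance alone sets a lower bound on every round trip. The two distances below are illustrative assumptions, not measurements, and real round trips are slower because of routing, queuing, and server processing.

```python
FIBER_SPEED_KM_PER_S = 200_000  # approximate speed of light in optical fiber


def min_round_trip_ms(distance_km):
    """Lower bound on round-trip time, in milliseconds, from distance alone."""
    return 2 * distance_km * 1000 / FIBER_SPEED_KM_PER_S


# Assumed distances, chosen only for illustration.
nearby_ms = min_round_trip_ms(50)      # server in the same region as the user
overseas_ms = min_round_trip_ms(9000)  # server on the other side of an ocean

print(f"nearby: {nearby_ms} ms, overseas: {overseas_ms} ms per round trip")
```

Because a page load usually involves many round trips, that per-trip difference adds up quickly.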