Today’s Hacker’s Guide to Instantly Loading Everything
-Intro (Today we are shipping thousands of bytes down the wire to build better and better front ends for our software and apps. To make our frontends more interactive and responsive, we add new features to them every day. This also makes our apps heavier, and their loading time increases. This is where the capability of the frontend developer comes into question: they need to work out how to make their feature-rich frontends load faster.)
Facts for Today’s World of JS:
It takes about 16 seconds for the average web page, whether built with a JavaScript framework or as a simple static page, to become interactive on a real mobile device over 3G.
It takes about 19 seconds for the same page to be fully loaded.
250-400KB of JavaScript is what people usually send down the wire.
How does it work (First request, fetch resources, parse, compile and render)?
So let’s take a look at how a browser gets anything from the network onto the screen. It seems simpler than it actually is. When you access a website, the browser sends a request over the network, which responds with HTML. The browser parses that HTML and discovers further requests for CSS, JavaScript, images, and anything else referenced from it. It then parses the CSS, JavaScript and other resources that come back, compiles the JavaScript, and renders pixels to the screen. The truth is, it isn’t quite that simple. We usually develop on desktop machines, which are high end and relatively fast at this work, whereas mobile phones are quite different. On a desktop it takes about ~200ms to parse the code, whereas on a mobile phone it can take four to five times longer. So how do we optimize this time so that the code loads better and the user gets interactivity faster?
The answer to that question is to test your web pages on real mobile phones and networks instead of simulations on a desktop machine. A lot of people use CPU throttling or emulation to test their pages, which is a good step, but we can do better. The main reason is that every mobile phone has different characteristics, a different CPU and so on, which means devices cannot be generalized to fit the same expected time to interactivity. One good tool to use here is webpagetest.org/easy. It offers a lot of real mobile phone profiles you can use to test your page and get accurate performance results.
Nowadays, time to interactivity is a common measure for judging the performance of a web page or application. The intuition behind it is that a page should take no more than about 5 seconds to load something useful, or to give the user something useful to interact with.
Load Only What You Need:
The key to making web pages more performant is to load only what you need; that is, prioritize the code and assets on your page in order of necessity so that within the first 5 seconds you can provide useful interactivity to the user. For instance, if you are shipping JavaScript, CSS, images and so on, make sure you first load what matters to the user’s initial experience, and load lower-priority pieces later during idle time: comment threads, share options, etc. This builds up a better user experience by displaying the necessary information first.
One way to achieve this is code splitting. Code splitting is easy to set up with webpack, Splittable or other bundlers such as Browserify. The basic idea behind code splitting can be explained with a simple analogy: instead of giving the user the whole pizza in one go, you give them one slice at a time, maximizing their experience of what you are actually delivering. Another technique is tree shaking, supported by bundlers such as Rollup, which lets you get rid of unused exports so they can be excluded from the initial load.
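To illustrate the idea behind code splitting, here is a minimal sketch of lazy loading with a dynamic import(). In a webpack or Splittable build, each import() call becomes a split point and the imported module is emitted as its own chunk; below, Node’s built-in 'node:os' module stands in for a hypothetical heavy feature module (say, a comment widget) so the sketch is runnable anywhere.

```javascript
// Sketch: defer loading a module until it is first needed. In a bundler,
// a dynamic import() like this becomes a split point, emitted as a
// separate chunk and fetched only when this code path actually runs.
async function loadOnDemand() {
  // The module is evaluated on first import, not at startup.
  const os = await import('node:os');
  return os.platform();
}

loadOnDemand().then((p) => console.log('loaded on demand on', p));
```

The same shape applies in the browser: wire the import() to a user action (a click on “show comments”, a route change) so startup pays nothing for the feature.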
Another thing we don’t talk about often is the baseline we start from when shipping heavyweight code; it doesn’t set us up for success. The reason is that many of the frameworks we use today were built with desktops in mind. If you have 5 seconds to load a page and your framework takes 4 of them, that’s not a good distribution of time. There are lightweight options worth exploring for mobile, like Preact, which have relatively low parse times. Chrome DevTools also has a Code Coverage feature that helps you determine which code is actually executed at load time and which parts sit idle; using it, you can choose to load only the code the page really needs. Another way to ship code only to the browsers that need it, while staying cross-browser, is transpiling with Babel’s babel-preset-env. Lodash is another example: if you use Lodash, babel-plugin-lodash lets you load only the modules you require instead of the whole package.
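For example, a minimal .babelrc sketch using babel-preset-env; the browser targets below are an assumption for illustration, and the preset only applies the transforms those targets actually need:

```json
{
  "presets": [
    ["env", {
      "targets": {
        "browsers": ["last 2 Chrome versions", "last 2 Firefox versions"]
      }
    }]
  ]
}
```

Widening or narrowing the targets list directly changes how much transpilation output (and therefore bytes) you ship.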
Chrome now also supports ES Modules, which means less transpilation and more opportunities for interesting loading experiences across the board.
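One way to take advantage of this is to serve untranspiled ES modules to browsers that support them and fall back elsewhere; a markup sketch with hypothetical file names:

```html
<!-- Module-supporting browsers load the untranspiled code... -->
<script type="module" src="app.mjs"></script>
<!-- ...while older browsers ignore type="module" and run the transpiled
     bundle instead (module-supporting browsers skip nomodule scripts). -->
<script nomodule src="app.legacy.js"></script>
```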
Order Loading Thoughtfully:
You know best what needs to be shipped to the user on a priority basis and what doesn’t. DevTools also offers network request blocking, which lets you block URL requests, usually to third-party URLs, that slow down page load. It can be done by right-clicking the respective request in the Network panel and choosing to block the request URL.
Cache Aggressively:
Cache as much as you can, locally and granularly. At Inbox, the team measured a 10% decrease in time to interactivity after adopting static resource caching.
So what we are going to do now is hack Chrome to make JSCONF.EU load more interactively. Dive into the resource fetcher source file (C++, in Chromium). The code there tells Chrome which resources to prioritize, by returning values like ResourceLoadPriorityVeryHigh. There is a table describing how Chrome really prioritizes: fonts, for example, are loaded at high priority, whereas CSS with mismatched media queries is given the lowest priority.
To see this in practice, open the Network panel and look at the Priority column: it lists the priority assigned to every resource that is loaded. Let’s try setting all resources to Very High priority and see how that works out.
What happens is that the meaningfully interactive code gets loaded along with everything else, so the useful panes end up delayed. With differentiated priorities, by contrast, every couple of seconds another interactive module becomes available to the user. Setting priorities according to how code blocks and resources are actually used on the page is a better way to increase its performance.
The browser’s preload scanner contributes to page performance as well. Browsers like Chrome have a document parser, and as it goes through the tokenization phase (all the tokens that make up your HTML page), it scans them one by one. If it runs into a blocking resource, such as a script, it stops in its tracks and does not proceed any further. This is where the preload scanner comes in: it can look ahead of blocking resources and start fetching any other resources that will be needed. When the preload scanner was introduced in Chrome, it showed roughly a 20% performance improvement.
Another issue that arises is discovery. The browser doesn’t know in what order to load resources, or even which ones matter most for your page’s performance; you know that better. So you can use <link rel="preload"> to tell the browser which scripts, stylesheets and other resources to fetch first, based on your page’s structure. This shifts resource discovery up to parse time, working its way toward a better-performing page. A webpack plugin, preload-webpack-plugin, is available that lets you generate preloads for asynchronous chunks as well as stand-alone ones.
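A minimal markup sketch of preload (file names are hypothetical):

```html
<!-- Tell the browser early that these resources will be needed; the `as`
     attribute lets it assign the right priority and request headers. -->
<link rel="preload" href="/static/app.js" as="script">
<link rel="preload" href="/static/fonts/brand.woff2" as="font" type="font/woff2" crossorigin>
```

Note that preloaded fonts need the crossorigin attribute even for same-origin requests, or the browser fetches them twice.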
Shop, an application built by the Polymer team, used these loading techniques to see how smooth and modern the experience of using a web application could be made. It turned out they were able to do so through granular loading, providing a buttery smooth browsing experience. How did they do that?
They used a pattern called PRPL, which stands for Push, Render, Pre-cache and Lazy-load. PRPL lets the browser send users what’s important to them as early as possible.
The idea is to push only the essential code first, then render the initial route, then pre-cache the remaining routes so that if the user navigates back or onward they load from the locally stored cache, and finally lazy-load everything else on demand.
By applying preload, we shift step-by-step loading toward parallel loading and execution, decreasing time to interactivity. The remaining problem with this structure is that it still costs two round trips: one to fetch the HTML, and a second to fetch the resources it references. That can be fixed with HTTP/2 Server Push. With Push, we can maintain a manifest of the files critical to the user journey: instead of waiting for the HTML to be parsed before the browser requests them, the server sends those crucial files along with the response and the browser starts receiving them immediately. Technically, we are filling the server’s otherwise idle time efficiently, which is rarely the case today.
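Many HTTP/2 servers and CDNs initiate a push when the response carries a preload Link header; a sketch of such a response header (the paths are hypothetical, and whether it triggers a push depends on the server):

```
Link: </static/app.js>; rel=preload; as=script, </static/app.css>; rel=preload; as=style
```

If the server does not push, the same header still acts as an ordinary preload hint once the browser sees it, so it degrades gracefully.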
By applying HTTP/2 Server Push, we were able to save thousands of milliseconds overall, cutting load time and getting interactive with users sooner. But Server Push can also cause problems and is by no means a perfect answer. It is not cache aware, meaning it doesn’t know which files are already present in the local cache. It can force-push files even though the browser already has them locally, so it is not always the ideal choice.
Push Vs. Preload
Push cuts out a full round trip (RTT), whereas preload moves resource download closer to the initial request. Push is not cache aware and has no prioritization, so it works best when paired with a service worker or Cache Digests. Preload, on the other hand, can handle cross-origin requests, caching and cookies, fires load/error events, and supports content negotiation.
So how do we address the issue of force-pushing information that is already in the local cache? We can use a service worker. Instead of going back to the network every time to fetch resources, it tries to serve them from what is already stored in the cache, which avoids the need for Cache Digests. In the case of Shop, applying PRPL with these strategies let it boot up in only a couple of hundred milliseconds.
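A minimal sketch of a cache-first service worker (the cache name and asset list are hypothetical, and this is browser-only code, so treat it as an illustration rather than a drop-in implementation):

```javascript
// sw.js — serve from the local cache first, fall back to the network.
const CACHE = 'shell-v1'; // hypothetical cache name

self.addEventListener('install', (event) => {
  // Pre-cache the application shell at install time.
  event.waitUntil(
    caches.open(CACHE).then((cache) => cache.addAll(['/', '/static/app.js']))
  );
});

self.addEventListener('fetch', (event) => {
  // Cache-first: only hit the network when the cache misses.
  event.respondWith(
    caches.match(event.request).then((hit) => hit || fetch(event.request))
  );
});
```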
I will share my experience of working with Twitter on Twitter Lite. Their existing mobile web app was pretty slow to load, which didn’t encourage users to actually engage with it, whereas Twitter Lite is much smoother and faster in terms of interactivity. Twitter Lite gets interactive in less than 5 seconds on a real mobile phone over 3G, which is a really good number to begin with. To do the same with your own pages and applications, you will have to work on cutting down your code, using code splitting and granular caching.
The previous Twitter web app took almost 30 seconds to become interactive, before users could start tapping around the interface. The team analyzed their code and made use of patterns like PRPL. They started with dns-prefetch, which declaratively tells the browser which servers to start warming DNS connections to. Next they used preload to fetch their scripts early. This is very easy to set up: for a static website it takes about 10 minutes, and for a full-stack app about an hour, and we are still working on quantifying the perceivable impact it has on a site. For Twitter, it led to an overall 36% improvement in their time to interactivity.
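A markup sketch of the DNS warm-up step (the hostname is illustrative, not Twitter’s actual CDN):

```html
<!-- Resolve the CDN hostname before any resource from it is requested;
     preconnect additionally opens the TCP/TLS connection. -->
<link rel="dns-prefetch" href="//cdn.example.com">
<link rel="preconnect" href="https://cdn.example.com" crossorigin>
```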
The next step was working on rendering, to get pixels on screen much faster. Twitter is a media-rich application, so it was fairly obvious that image and media loading was what slowed things down. To improve that, they used requestIdleCallback() to defer image loading in JavaScript until idle time, which led to a four-times improvement in image rendering.
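A sketch of the deferral pattern (the data-src convention and the fallback are assumptions for illustration, not Twitter’s actual code); the setTimeout fallback also makes the sketch runnable outside a browser:

```javascript
// Run low-priority work when the main thread is idle; fall back to a
// macrotask where requestIdleCallback is unavailable.
const whenIdle =
  typeof requestIdleCallback === 'function'
    ? requestIdleCallback
    : (cb) => setTimeout(cb, 0);

// Defer swapping in an image's real URL until idle time, keeping the
// critical rendering path free of image work.
function deferImageLoad(img) {
  whenIdle(() => {
    img.src = img.dataset.src; // hypothetical data-src lazy-load convention
  });
}
```

The important property is that the image work is scheduled around, not in front of, the rendering that makes the page interactive.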
They also noticed that images were not served at the right dimensions and were sub-optimally encoded, which led to slow decoding in Chrome. They optimized this so that a single image’s time dropped from about ~300ms to the largest image loading in less than 20ms. This ensured images didn’t become a bottleneck for interactivity.
One more improvement they made was introducing a data saver mode that loads images only when the user taps on them, giving about 70% savings in data consumption and catering to those on limited data plans. Twitter also made use of precaching: they cached assets such as their emoji so you could reply to comments and posts, and over time ramped that up to include application shell caching. These tweaks took the load time from over 6 seconds down to a staggering 1.49s. Twitter has shipped around 20 versions of their runtime. It turned out that on the first load of Twitter Lite there is no service worker yet; the second load was 47% faster, and the third load 65% faster.
Now to lazy loading. Their time to interactivity was still slow and needed work: they had relatively large chunks of JavaScript that were slow to load on mobile. You might think an extra 100KB or so is fine, but it is still extra code the browser has to download, parse and compile to boot the app up. Before code splitting, these bundles took a bit over 5s to get ready. Twitter applied vendor splitting and ended up creating over 40 different asynchronous chunks, loaded granularly as you navigate from one view to another. The impact was that the large bundle took 3s to fully process after code splitting. Overall they improved performance and time to interactivity, making the app smoother and more engaging.
One of their key learnings was to use bundle analyzers, like webpack-bundle-analyzer, to find the low-hanging fruit in their bundles. These tools help you understand what you are actually sending down the wire. Using a bundle analyzer, they were able to see which dependencies had a direct impact on their actual bundle shape.
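A webpack config fragment sketch wiring up webpack-bundle-analyzer (assumes the package is installed; the rest of the config is elided):

```javascript
// webpack.config.js (fragment) — emits an interactive treemap showing
// what is actually inside each emitted bundle.
const { BundleAnalyzerPlugin } = require('webpack-bundle-analyzer');

module.exports = {
  // ...existing entry/output/loader config...
  plugins: [new BundleAnalyzerPlugin()],
};
```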
Performance is a continuous game of measuring for areas to improve. If you are looking to work on your page’s performance profile, check out Lighthouse, a project we are working on: an auditing tool not only for performance metrics but also for progressive web app features and general web platform best practices. Also try Calibre, which helps you analyze anything from bundle size to performance metrics and see what impact your different deployments had. And check out the webpage test integration on GitHub: every time you submit a pull request, it runs and posts a small performance strip alongside your PR, which helps you figure out the impact of your code on user experience.