How the Kiwix PWA allows users to store Gigabytes of data from the Internet for offline use

Geoffrey Kantaris
Stéphane Coillet-Matillon

People gathering around a laptop standing on a simple table with a plastic chair on the left. The background looks like a school in a developing country.

This case study explores how Kiwix, a non-profit organization, uses Progressive Web App technology and the File System Access API to enable users to download and store large Internet archives for offline use. Learn about the technical implementation of the code dealing with the Origin Private File System (OPFS), a new browser feature that enhances file management in the Kiwix PWA and provides improved access to archives without permission prompts. The article discusses the remaining challenges and highlights potential future developments in this new file system.

More than 30 years after the birth of the web, a third of the world's population is still waiting for reliable access to the Internet, according to the International Telecommunication Union. Is this where the story ends? Of course not. Folks at Kiwix, a Switzerland-based non-profit, have developed an ecosystem of open source apps and content that aims to make knowledge available to people with limited or no Internet access. Their idea is that if you can't easily access the Internet, then someone can download key resources for you, where and when connectivity is available, and store them locally for later offline use. Many vital sites, for example Wikipedia, Project Gutenberg, Stack Exchange, or even TED talks, can now be converted to highly compressed archives, called ZIM files, and read on the fly by the Kiwix browser.

ZIM archives use highly efficient Zstandard (ZSTD) compression (older versions used XZ), mostly for storing HTML, JavaScript, and CSS, while images are usually converted to compressed WebP format. Each ZIM also includes a URL and a title index. Compression is key here, as the entirety of Wikipedia in English (6.4 million articles, plus images) is compressed to 97 GB after conversion to the ZIM format, which sounds like a lot until you realize that the sum of all human knowledge can now fit on a mid-range Android phone. Many smaller resources are also offered, including themed versions of Wikipedia, such as maths, medicine, and so on.

Kiwix offers a range of native apps targeting desktop (Windows/Linux/macOS) as well as mobile (iOS/Android) usage. This case study, however, will focus on the Progressive Web App (PWA) that aims to be a universal and simple solution for any device that has a modern browser.

We will look at the challenges posed in developing a universal Web app that needs to provide fast access to large content archives fully offline, and some modern JavaScript APIs, particularly the File System Access API and the Origin Private File System, that provide innovative and exciting solutions to those challenges.

A Web app for offline use?

Kiwix users are an eclectic bunch with many different needs, and Kiwix has little or no control over the devices and operating systems on which they will be accessing their content. Some of these devices may be slow or outdated, especially in low-income areas of the world. While Kiwix tries to cover as many use cases as possible, the organization also realized that it could reach even more users by using the most universal piece of software on any device: the web browser. So, inspired by Atwood's Law, which states that "any application that can be written in JavaScript will eventually be written in JavaScript", some Kiwix devs, around 10 years ago, set about porting the Kiwix software from C++ to JavaScript.

The first version of this port, called Kiwix HTML5, was for the now-defunct Firefox OS and for browser extensions. At its core was (and is) a C++ decompression engine (XZ and ZSTD) compiled with the Emscripten compiler, first to asm.js, a highly optimizable subset of JavaScript, and later to Wasm (WebAssembly). The port was later renamed Kiwix JS, and its browser extensions are still actively developed.

Kiwix JS Offline Browser

Enter the Progressive Web App (PWA). Realizing the potential of this technology, the Kiwix developers built a dedicated PWA version of Kiwix JS, and set about adding OS integrations that would allow the app to offer native-like capabilities, particularly in the areas of offline usage, installation, file handling and file system access.

Offline-first PWAs are extremely lightweight, and so are perfect for contexts where there is intermittent or expensive mobile Internet. The technology behind this is the Service Worker API and the related Cache API, used by all apps based on Kiwix JS. These APIs allow the apps to act as a server, intercepting Fetch Requests from the main document or article being viewed, and redirecting them to the (JS) backend to extract and construct a Response from the ZIM archive.
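To make the interception concrete, here is a minimal sketch of a Service Worker fetch handler; it is not Kiwix's actual code. It assumes a hypothetical URL convention in which archive content is requested as /zim/<path>, and it uses an in-memory Map (demoArchive) as a stand-in for the real ZIM backend. The names isArchiveRequest() and responseFromArchive() are illustrative.

```javascript
// Hypothetical stand-in for the ZIM backend: maps archive paths to entries.
// In Kiwix JS this role is played by the JS archive reader and Wasm decompressor.
const demoArchive = new Map([
  ['A/Main_Page', { mimeType: 'text/html', content: '<h1>Main page</h1>' }],
]);

// Assumed URL convention: archive content is requested as /zim/<path>.
const ARCHIVE_PREFIX = '/zim/';

/** Returns true if the request should be answered from the archive. */
function isArchiveRequest(request) {
  return new URL(request.url).pathname.startsWith(ARCHIVE_PREFIX);
}

/**
 * Resolves the requested path with `lookup` (an async function returning an
 * entry or undefined) and wraps the result in a Response.
 */
async function responseFromArchive(request, lookup) {
  const pathname = new URL(request.url).pathname;
  const path = decodeURIComponent(pathname.slice(ARCHIVE_PREFIX.length));
  const entry = await lookup(path);
  if (!entry) {
    return new Response('Not found in archive: ' + path, { status: 404 });
  }
  return new Response(entry.content, {
    headers: { 'Content-Type': entry.mimeType },
  });
}

// Register the handler only when this script runs inside a Service Worker.
if (
  typeof ServiceWorkerGlobalScope !== 'undefined' &&
  self instanceof ServiceWorkerGlobalScope
) {
  self.addEventListener('fetch', function (event) {
    // respondWith() must be called synchronously, so decide from the URL alone;
    // requests we don't claim fall through to the browser's normal handling.
    if (isArchiveRequest(event.request)) {
      event.respondWith(
        responseFromArchive(event.request, async function (path) {
          return demoArchive.get(path);
        }),
      );
    }
  });
}
```

The decision to claim a request is made from the URL alone because respondWith() has to be called while the fetch event is being dispatched; the archive lookup itself can then take as long as it needs.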

Storage, storage everywhere

Given the large size of ZIM archives, storing and accessing them, especially on mobile devices, is probably the biggest headache for Kiwix developers. Many Kiwix end users download content in-app, when an Internet connection is available, for later offline use. Other users download on a PC using a torrent and then transfer the files to a mobile or tablet device, and some exchange content on USB sticks or portable hard drives in areas with patchy or expensive mobile Internet. All these ways of accessing content from arbitrary user-accessible locations need to be supported by Kiwix JS and Kiwix PWA.

What initially made it possible for Kiwix JS to read enormous archives of hundreds of GB (one of our ZIM archives is 166 GB!), even on low-memory devices, is the File API. This API is supported in every browser, even very old ones, so it acts as a universal fallback when newer APIs are not supported. It's as easy as defining an input element in HTML, in Kiwix's case:

<input
  type="file"
  accept="application/octet-stream,.zim,.zimaa,.zimab,.zimac, ..."
  value="Select folder with ZIM files"
  id="archiveFilesLegacy"
  multiple
/>

Once files are selected, the input element holds the corresponding File objects, which are essentially metadata referencing the underlying data in storage. Technically, Kiwix's object-oriented backend, written in pure client-side JavaScript, reads small slices of the large archive as needed. If those slices need to be decompressed, the backend passes them to the Wasm decompressor, getting further slices if requested, until a full blob is decompressed (usually an article or an asset). This means that the large archive never has to be read entirely into memory.
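Slice-based reading like this can be sketched with Blob.slice() and arrayBuffer(), both part of the File API. The helpers readSlice() and looksLikeZim() below are hypothetical names, not Kiwix functions: the second one reads only the first four bytes of a picked file and compares them with the magic number 72173914, which the openZIM format stores as a little-endian 32-bit integer at the start of every ZIM header.

```javascript
/**
 * Reads `length` bytes starting at `offset` from a File or Blob.
 * slice() only creates a view; arrayBuffer() then reads just that range,
 * so the rest of a multi-gigabyte archive is never loaded into memory.
 */
async function readSlice(blob, offset, length) {
  const buffer = await blob.slice(offset, offset + length).arrayBuffer();
  return new Uint8Array(buffer);
}

// The ZIM header starts with this little-endian uint32 magic number.
const ZIM_MAGIC_NUMBER = 72173914;

/** Checks the first four bytes of a picked file for the ZIM magic number. */
async function looksLikeZim(file) {
  const header = await readSlice(file, 0, 4);
  if (header.length < 4) return false;
  const view = new DataView(header.buffer, header.byteOffset, header.byteLength);
  return view.getUint32(0, true) === ZIM_MAGIC_NUMBER;
}
```

For instance, looksLikeZim(archiveFilesLegacy.files[0]) could be used as a quick sanity check after picking. With a split archive (.zimaa, .zimab, ...), only the first part begins with the header, so that is the part to test.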

Universal as it is, the File API has a drawback which made Kiwix JS apps appear clunky and old-fashioned compared to native apps: it requires the user to pick archives using a file picker, or drag-and-drop a file into the app, every time the app is launched, because with this API, there is no way to persist access permissions from one session to the next.

To mitigate this poor UX, like many developers, the Kiwix JS devs initially went down the Electron route. ElectronJS is an amazing framework which provides powerful features, including full access to the file system using Node APIs. However, it has some well known drawbacks:

  • It only runs on desktop operating systems.
  • It's big and heavy (70 MB–100 MB).

The size of the Electron apps, due to the fact that a complete copy of Chromium is included with every app, compares very unfavorably to a mere 5.1 MB for the minified and bundled PWA!

So, was there a way Kiwix could improve the situation for users of the PWA?

File System Access API to the rescue

Around 2019, Kiwix became aware of an emerging API, then called the Native File System API, that was undergoing an origin trial in Chrome 78. It promised the ability to get a file handle for a file or a folder and store it in an IndexedDB database. Crucially, this handle persists between app sessions, so the user isn't forced to pick the file or folder again when re-launching the app (though they do have to answer a quick permission prompt). By the time it reached production, it had been renamed the File System Access API, and its core parts had been standardized by the WHATWG as the File System API (FSA).

So, how does the File System Access part of the API work? A few important points to note:

  • It's an asynchronous API (except for specialized functions in Web Workers).
  • The file or directory pickers have to be launched programmatically by capturing a user gesture (click or tap on a UI element).
  • For the user to give permission again to access a previously picked file (in a new session), a user gesture is also needed—in fact the browser will refuse to show the permission prompt if not initiated by a user gesture.
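The relaunch flow described in these points might look like the following sketch, which follows the pattern shown in Chrome's File System Access documentation: queryPermission() checks the current state without prompting, and requestPermission() is only reached from inside a click handler. These two methods are, at the time of writing, implemented only in Chromium-based browsers. wireResumeButton(), restoredHandle and openArchive are hypothetical placeholders for app code.

```javascript
/**
 * Checks (and if necessary requests) read access to a handle restored from
 * IndexedDB. queryPermission() never prompts; requestPermission() shows a
 * prompt and is only allowed while the page has a recent user gesture.
 */
async function ensureReadPermission(handle) {
  const options = { mode: 'read' };
  if ((await handle.queryPermission(options)) === 'granted') return true;
  return (await handle.requestPermission(options)) === 'granted';
}

// Hypothetical wiring: a "Resume" button that re-opens the last archive.
// restoredHandle and openArchive are placeholders for app-specific code.
function wireResumeButton(button, restoredHandle, openArchive) {
  button.addEventListener('click', async function () {
    // Running inside the click handler supplies the required user gesture.
    if (await ensureReadPermission(restoredHandle)) {
      openArchive(await restoredHandle.getFile());
    } else {
      console.warn('User declined access to the previously picked file');
    }
  });
}
```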

The code is relatively straightforward, apart from having to use the clunky IndexedDB API to store file and directory handles. The good news is that there are a couple of libraries that do a lot of the heavy lifting for you, like browser-fs-access. Over at Kiwix JS, we decided to work directly with the APIs, which are very well documented.

Opening file and directory pickers

Opening a file picker looks something like this (here using Promises, but if you prefer async/await sugar, see the Chrome for Developers tutorial):

return window
  .showOpenFilePicker({ multiple: false })
  .then(function (fileHandles) {
    return processFileHandle(fileHandles[0]);
  })
  .catch(function (err) {
    // This is normal if app is launching
    console.warn(
      'User cancelled, or cannot access fs without user gesture',
      err,
    );
  });

Note that for the sake of simplicity, this code only processes the first picked file (and forbids picking more than one). In case you want to allow picking multiple files with { multiple: true }, you simply wrap all the Promises that process each handle in a Promise.all().then(...) statement, for example:

let promisesForFiles = fileHandles.map(function (fileHandle) {
  return processFileHandle(fileHandle);
});
return Promise.all(promisesForFiles)
  .then(function (arrayOfFiles) {
    // Do something with the files array
    console.log(arrayOfFiles);
  })
  .catch(function (err) {
    // Handle any errors that occurred during processing
    console.error('Error processing file handles!', err);
  });

However, picking multiple files is arguably better done by asking the user to pick the directory containing those files rather than the individual files in it, especially since Kiwix users tend to organize all their ZIM files in the same directory. The code for launching the directory picker is almost the same as above, except that you use window.showDirectoryPicker().then(function (dirHandle) { … });.
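For comparison with the Promise-based snippets above, here is a hedged async/await sketch of launching the directory picker. pickArchiveDirectory() and its status values are hypothetical; the { mode: 'read' } option and the AbortError raised when the user dismisses the picker come from the File System Access API. The win parameter exists only so the function can be exercised with a stub.

```javascript
/**
 * Opens the directory picker and reports what happened. Must be called from a
 * click or tap handler. `win` defaults to the global object and only exists
 * so the function can be exercised with a stub.
 */
async function pickArchiveDirectory(win = globalThis) {
  if (typeof win.showDirectoryPicker !== 'function') {
    // E.g. Firefox: fall back to the legacy <input type="file"> picker instead.
    return { status: 'unsupported' };
  }
  try {
    const handle = await win.showDirectoryPicker({ mode: 'read' });
    return { status: 'picked', handle: handle };
  } catch (err) {
    // AbortError means the user dismissed the picker; anything else
    // (such as a SecurityError when there was no user gesture) is re-thrown.
    if (err.name === 'AbortError') return { status: 'cancelled' };
    throw err;
  }
}
```

A caller would typically run it inside a button's click listener and pass result.handle on to the directory-processing code when result.status is 'picked'.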

Processing the file or directory handle

Once you have the handle, you need to process it, so the function processFileHandle could look like this:

function processFileHandle(fileHandle) {
  // Serialize fileHandle to indexedDB
  serializeFSHandletoIdxDB('pickedFSHandle', fileHandle, function (val) {
    console.debug('IndexedDB responded with ' + val);
  });
  return fileHandle.getFile().then(function (file) {
    // Do something with the file
    return file;
  });
}

Note that you have to provide the function that stores the file handle yourself: there are no convenience methods for this unless you use an abstraction library. Kiwix's implementation of this can be seen in the file cache.js, but it could be simplified considerably if it is only used to store and retrieve a file or folder handle.
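Since there is no built-in helper, a minimal pair of IndexedDB functions along these lines is enough to persist a single handle. This is a sketch, not Kiwix's cache.js: the database name, store name and the saveHandle()/loadHandle() helpers are hypothetical. It relies on FileSystemHandle objects being structured-cloneable, which is what allows IndexedDB to store them.

```javascript
// Hypothetical database and object store names.
const HANDLE_DB_NAME = 'fs-handles';
const HANDLE_STORE = 'handles';

/** Opens (creating on first use) a tiny key/value database. */
function openHandleDB(idb = globalThis.indexedDB) {
  return new Promise(function (resolve, reject) {
    const request = idb.open(HANDLE_DB_NAME, 1);
    request.onupgradeneeded = function () {
      request.result.createObjectStore(HANDLE_STORE);
    };
    request.onsuccess = function () {
      resolve(request.result);
    };
    request.onerror = function () {
      reject(request.error);
    };
  });
}

/** Stores a FileSystemHandle under `key`; handles are structured-cloneable. */
async function saveHandle(key, handle, idb) {
  const db = await openHandleDB(idb);
  return new Promise(function (resolve, reject) {
    const tx = db.transaction(HANDLE_STORE, 'readwrite');
    tx.objectStore(HANDLE_STORE).put(handle, key);
    tx.oncomplete = function () {
      db.close();
      resolve();
    };
    tx.onerror = tx.onabort = function () {
      db.close();
      reject(tx.error);
    };
  });
}

/** Resolves to the handle stored under `key`, or undefined if there is none. */
async function loadHandle(key, idb) {
  const db = await openHandleDB(idb);
  return new Promise(function (resolve, reject) {
    const request = db
      .transaction(HANDLE_STORE, 'readonly')
      .objectStore(HANDLE_STORE)
      .get(key);
    request.onsuccess = function () {
      db.close();
      resolve(request.result);
    };
    request.onerror = function () {
      db.close();
      reject(request.error);
    };
  });
}
```

With helpers like these, processFileHandle() could call saveHandle('pickedFSHandle', fileHandle) in place of serializeFSHandletoIdxDB(), and on relaunch loadHandle('pickedFSHandle') would return the handle, which still needs a permission check before use.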

Processing directories is a bit more complicated as you have to iterate through the entries in the picked directory with async entries.next() to find the files or file types that you want. There are various ways of doing that, but this is the code used in Kiwix PWA, in outline:

let iterableEntryList = dirHandle.entries();
return iterateAsyncDirEntries(iterableEntryList, []).then(function (entryList) {
  // Do something with the entry list
  return entryList;
});

/**
 * Iterates FileSystemDirectoryHandle iterator and adds entries to an array
 * @param {Iterator} entries An asynchronous iterator of entries
 * @param {Array} archives An array to which to add the entries (may be empty)
 * @return {Promise<Array>} A Promise for an array of entries in the directory
 */
function iterateAsyncDirEntries(entries, archives) {
  return entries
    .next()
    .then(function (result) {
      if (!result.done) {
        let entry = result.value[1];
        // Filter for the files you want
        if (/\.zim(\w\w)?$/i.test(entry.name)) {
          archives.push(entry);
        }
        return iterateAsyncDirEntries(entries, archives);
      } else {
        // We've processed all the entries
        if (!archives.length) {
          console.warn('No archives found in the picked directory!');
        }
        return archives;
      }
    })
    .catch(function (err) {
      console.error('There was an error processing the directory!', err);
    });
}

Note that for each entry in the entryList, you will later need to get the file with entry.getFile().then(function (file) { … }) when you need to use it, or the equivalent using const file = await entry.getFile() in an async function.
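In browsers that expose FileSystemDirectoryHandle as an async iterable, the same filtering can be written with for await...of, which avoids the manual recursion. This sketch (listZimEntries() is a hypothetical name) also checks handle.kind, so that a sub-directory whose name happens to end in .zim is skipped, which the regular expression alone would not catch.

```javascript
/**
 * Collects ZIM files (including split parts such as .zimaa) from a directory
 * handle. Works with handles from window.showDirectoryPicker() or from
 * navigator.storage.getDirectory().
 */
async function listZimEntries(dirHandle) {
  const archives = [];
  // entries() yields [name, handle] pairs.
  for await (const [name, handle] of dirHandle.entries()) {
    // Skip sub-directories; only keep files whose names match the ZIM pattern.
    if (handle.kind === 'file' && /\.zim(\w\w)?$/i.test(name)) {
      archives.push(handle);
    }
  }
  return archives;
}
```

As with the Promise version, you would still call getFile() on each returned handle when you actually need to read it.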

Can we go further?

The requirement for the user to grant permission initiated with a user gesture on subsequent launches of the app adds a small amount of friction to file and folder (re-)opening, but it's still much more fluid than being forced to re-pick a file. Chromium developers are currently finalizing code that would allow for persistent permissions for installed PWAs. This is something that a lot of PWA developers have been calling for, and is keenly anticipated.

But what if we don't have to wait?! Kiwix devs recently found that it's possible to eliminate all permission prompts right now by using a shiny new feature of the File System API that is supported by both Chromium and Firefox browsers (and partially supported by Safari, which is still missing FileSystemWritableFileStream). This new feature is the Origin Private File System.

Going fully native: the Origin Private File System

The Origin Private File System (OPFS) is still an experimental feature in the Kiwix PWA, but the team is really excited to encourage users to try it out because it largely bridges the gap between native apps and Web apps. Here are the key benefits:

  • Archives in the OPFS can be accessed with no permission prompts, even on launch. Users can resume reading an article, and browsing an archive, from where they left off in a previous session, with absolutely no friction.
  • It provides highly optimized access to the files stored in it: on Android, we see file access that is between five and ten times faster.

Standard file access in Android using the File API is painfully slow, especially (as is often the case for Kiwix users) if large archives are stored on a microSD card rather than in the device storage. That all changes with this new API. While most users won't be able to store a 97 GB file in the OPFS (which consumes device storage, not microSD card storage), it's perfect for storing small to medium-sized archives. You want the most complete medical encyclopedia from WikiProject Medicine? No problem, at 1.7 GB it easily fits in the OPFS! (Tip: look for othermdwiki_en_all_maxi in the in-app library.)

How the OPFS works

The OPFS is a file system provided by the browser, separate for each origin, that can be thought of as similar to app-scoped storage on Android. Files can be imported into the OPFS from the user-visible file system, or they can be downloaded directly into it (the API also allows creating files in the OPFS). Once in the OPFS, they are isolated from the rest of the device. On desktop Chromium-based browsers, it's also possible to export files back from the OPFS to the user-visible file system.

To use the OPFS, the first step is to request access to it, using navigator.storage.getDirectory() (again, if you'd rather see code using await, read The Origin Private File System):

return navigator.storage
  .getDirectory()
  .then(function (handle) {
    return processDirHandle(handle);
  })
  .catch(function (err) {
    console.warn('Unable to get the OPFS directory entry', err);
  });

The handle you get from this is the very same type of FileSystemDirectoryHandle that you get from window.showDirectoryPicker() mentioned above, which means you can re-use the code that handles it (and mercifully there's no need to store this handle in IndexedDB: just get it when you need it). Assuming you already have some files in the OPFS and want to use them, you could do something like this, using the function iterateAsyncDirEntries() shown previously:

return navigator.storage.getDirectory().then(function (dirHandle) {
  let entries = dirHandle.entries();
  return iterateAsyncDirEntries(entries, [])
    .then(function (archiveList) {
      return archiveList;
    })
    .catch(function (err) {
      console.error('Unable to iterate OPFS entries', err);
    });
});

Don't forget you still need to use getFile() on any entry you want to work with from the archiveList array.

Importing files into the OPFS

So, how do you get files into the OPFS in the first place? Not so fast! First, you need to estimate the amount of storage you have to work with, and make sure that users don't try to put a 97 GB file in if it's not going to fit.

Getting the estimated quota is easy: navigator.storage.estimate().then(function (estimate) { … });. Slightly harder is working out how to display this to the user. In the Kiwix app, we opted for a little in-app panel visible right next to the checkbox which lets users try out the OPFS:

Panel showing the used storage in percent and the remaining available storage in Gigabytes.

The panel is populated by using estimate.quota and estimate.usage, for example:

let OPFSQuota; // Global variable, so we don't have to keep checking it
return navigator.storage.estimate().then(function (estimate) {
  const percent = ((estimate.usage / estimate.quota) * 100).toFixed(2);
  OPFSQuota = estimate.quota - estimate.usage;
  document.getElementById('OPFSQuota').innerHTML =
    '<b>OPFS storage quota:</b><br />Used:&nbsp;<b>' +
    percent +
    '%</b>; ' +
    'Remaining:&nbsp;<b>' +
    (OPFSQuota / 1024 / 1024 / 1024).toFixed(2) +
    '&nbsp;GB</b>';
});
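The arithmetic in that snippet can be separated from the DOM update so that it is easier to check. In this hypothetical describeQuota() helper, both estimate.usage and estimate.quota are in bytes, and the remaining space is divided by 1024 * 1024 * 1024, as in the code above. For example, with 1 GiB used out of a 4 GiB quota, it reports 25.00% used and 3.00 GB remaining.

```javascript
// Same divisor as in the snippet above: 1024 * 1024 * 1024 bytes per GB.
const BYTES_PER_GB = 1024 * 1024 * 1024;

/**
 * Converts a StorageEstimate ({ usage, quota }, both in bytes) into the
 * values shown in the quota panel, without touching the DOM.
 */
function describeQuota(estimate) {
  const remainingBytes = estimate.quota - estimate.usage;
  // Guard against a zero quota to avoid displaying "NaN%".
  const percentUsed = estimate.quota
    ? (estimate.usage / estimate.quota) * 100
    : 0;
  return {
    remainingBytes: remainingBytes,
    percentUsed: percentUsed.toFixed(2),
    remainingGB: (remainingBytes / BYTES_PER_GB).toFixed(2),
  };
}
```

Inside the estimate() callback you could then set OPFSQuota = describeQuota(estimate).remainingBytes and build the panel text from percentUsed and remainingGB.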

As the screenshot above shows, there's also a button that lets users add files to the OPFS from the user-visible file system. The good news here is that you can simply use the File API to get the File object (or objects) to be imported. In fact, it's important not to use window.showOpenFilePicker(), because this method is not supported by Firefox, whereas the OPFS most definitely is.

The visible Add file(s) button you see in the screenshot above isn't a legacy file picker, but when it is clicked or tapped, it calls click() on a hidden legacy picker (an <input type="file" multiple … /> element). The app then captures the change event of the hidden file input, checks the size of the files, and rejects them if they are too big for the quota. If all is well, it asks the user whether they want to add them:

archiveFilesLegacy.addEventListener('change', function (event) {
  const filesArray = Array.from(event.target.files);
  // Abort if user didn't select any files
  if (filesArray.length === 0) return;
  // Calculate the size of the picked files
  let filesSize = 0;
  filesArray.forEach(function (file) {
    filesSize += file.size;
  });
  // Check the size of the files does not exceed the quota
  if (filesSize > OPFSQuota) {
    // Oh no, files are too big! Tell user...
    console.log('Files would exceed the OPFS quota!');
  } else {
    // Ask user if they're sure... if user said yes...
    return importOPFSEntries(filesArray)
      .then(function () {
        // Tell user we successfully imported the archives
      })
      .catch(function (err) {
        // Tell user there was an error (error catching is important!)
      });
  }
});
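The hidden-input pattern could be wired up roughly like this. The function name and the onFiles callback are illustrative rather than Kiwix's code. The one hard requirement is that click() on the hidden input is called from inside the visible button's click handler, because browsers only open file pickers in response to a user gesture. Resetting the input's value after copying its files is a common trick so that picking the same file twice still fires a change event.

```javascript
/**
 * Lets a visible button open a hidden <input type="file" multiple> and passes
 * the picked files (as an array) to onFiles. Names are illustrative only.
 */
function wireAddFilesButton(button, hiddenInput, onFiles) {
  button.addEventListener('click', function () {
    // Must happen inside the click handler: pickers need a user gesture.
    hiddenInput.click();
  });
  hiddenInput.addEventListener('change', function (event) {
    // Copy the FileList before resetting the input, which clears it.
    const files = Array.from(event.target.files || []);
    // Reset so that picking the same file again still fires a change event.
    event.target.value = '';
    if (files.length) onFiles(files);
  });
}
```

Usage might be wireAddFilesButton(addButton, archiveFilesLegacy, checkAndImport), where addButton is a hypothetical reference to the visible button and checkAndImport would perform the quota check and import shown in the change handler above.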

Dialog asking the user if they want to add a list of .zim files to the origin private file system.

Because on some operating systems, like Android, importing archives is not the speediest operation, Kiwix also shows a banner and a small spinner while the archives are being imported. The team didn't work out how to add a progress indicator for this operation: if you work it out, answers on a postcard please!

So, how did Kiwix implement the importOPFSEntries() function? It uses the fileHandle.createWritable() method, which effectively allows each file to be streamed into the OPFS; all the hard work is handled by the browser. (Kiwix is using Promises here for reasons to do with our legacy codebase, but it has to be said that in this case await produces simpler syntax and avoids the pyramid of doom effect.)

function importOPFSEntries(files) {
  // Get a handle on the OPFS directory
  return navigator.storage
    .getDirectory()
    .then(function (dir) {
      // Collect the promises for each file that we want to write
      let promises = files.map(function (file) {
        // Create the file (or open it if it already exists) in the OPFS
        return dir
          .getFileHandle(file.name, { create: true })
          .then(function (fileHandle) {
            // Get a writable stream for the file
            return fileHandle.createWritable();
          })
          .then(function (writer) {
            // Show a banner / spinner, then write the file and close the writer
            return writer.write(file).then(function () {
              return writer.close();
            });
          });
      });
      // Resolves when all the files have been written, rejects if any fails
      return Promise.all(promises);
    })
    .catch(function (err) {
      console.error('Unable to import the files into the OPFS!', err);
      // Re-throw so that the caller's .catch() can tell the user
      throw err;
    });
}
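As noted above, await makes this much easier to follow. Here is a sketch of the same import written with async/await; importOPFSEntriesAsync() and its getRoot parameter are hypothetical (getRoot defaults to the OPFS root and is there only to allow testing with a stub). One deliberate difference from the Promise version is that files are written one after another rather than all at once, on the assumption that parallel multi-gigabyte writes are unlikely to help on slow storage. A failed write aborts its writable stream before the error is re-thrown to the caller.

```javascript
/**
 * async/await version of importOPFSEntries(). Writes the files one at a time
 * and rejects on the first failure so the caller's catch handler can tell the
 * user. getRoot defaults to the OPFS root; pass a stub to test it.
 */
async function importOPFSEntriesAsync(
  files,
  getRoot = () => navigator.storage.getDirectory(),
) {
  const dir = await getRoot();
  for (const file of files) {
    // Create the file (or open it if it already exists) in the target directory
    const fileHandle = await dir.getFileHandle(file.name, { create: true });
    const writable = await fileHandle.createWritable();
    try {
      await writable.write(file);
      await writable.close();
    } catch (err) {
      // Abort so the pending writes are discarded, then report the failure.
      // The file entry created above is left in place (empty if it was new).
      await writable.abort().catch(function () {});
      throw err;
    }
  }
}
```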

Downloading a file stream directly into the OPFS

A variation on this is the ability to stream a file from the Internet directly into the OPFS, or into any directory for which you have a directory handle (that is, directories picked with window.showDirectoryPicker()). It uses the same principles as the code above, but constructs a Response consisting of a ReadableStream and a controller that enqueues the bytes read from the remote file. The resulting Response.body is then piped to the new file's writer inside the OPFS.

In this case, Kiwix is able to count the bytes passing through the ReadableStream, and so can provide a progress indicator to the user and warn them not to quit the app during the download. The code is a bit too convoluted to show here, but as ours is a FOSS app, you can look at the source if you're interested in doing something similar. This is what the Kiwix UI looks like (the progress values shown below differ because the app updates the banner only when the percentage changes, but updates the Download progress panel more frequently):

Kiwix user interface with a bar at the bottom warning the user not to quit the app and showing the download progress of the .zim archive.

Because downloading can be quite a long operation, Kiwix allows users to use the app freely during the operation, but ensures the banner is always displayed, so that users are reminded not to close the app until the download operation is complete.
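Kiwix's own download code is in its repository; what follows is only a minimal sketch of the same idea using a different technique. Instead of building a Response from a ReadableStream and a controller, it pipes the fetch() response body through a byte-counting TransformStream straight into the writable stream returned by createWritable(). createProgressCounter(), streamResponseToWritable(), downloadIntoOPFS() and the onProgress callback are hypothetical names, and the total size is taken from the Content-Length header, which servers may omit (the sketch then reports a total of 0).

```javascript
/**
 * A TransformStream that passes bytes through unchanged and reports progress.
 * onProgress(receivedBytes, totalBytes) is a hypothetical app callback.
 */
function createProgressCounter(totalBytes, onProgress) {
  let received = 0;
  return new TransformStream({
    transform(chunk, controller) {
      received += chunk.byteLength;
      onProgress(received, totalBytes);
      controller.enqueue(chunk);
    },
  });
}

/**
 * Streams a fetch() Response into any WritableStream, such as the one returned
 * by fileHandle.createWritable(). pipeTo() closes the destination when the
 * body ends, and aborts it if the download fails part-way.
 */
async function streamResponseToWritable(response, writable, onProgress) {
  if (!response.ok) {
    await writable.abort();
    throw new Error('Download failed with HTTP status ' + response.status);
  }
  // Content-Length may be missing; 0 means "unknown total".
  const total = Number(response.headers.get('Content-Length')) || 0;
  await response.body
    .pipeThrough(createProgressCounter(total, onProgress))
    .pipeTo(writable);
}

/** Hypothetical glue: download `url` straight into a new file in the OPFS. */
async function downloadIntoOPFS(url, fileName, onProgress) {
  const response = await fetch(url);
  const dir = await navigator.storage.getDirectory();
  const fileHandle = await dir.getFileHandle(fileName, { create: true });
  const writable = await fileHandle.createWritable();
  await streamResponseToWritable(response, writable, onProgress);
}
```

To reproduce the banner behavior described above, the onProgress callback could compute Math.floor((received / total) * 100) and only touch the banner when that value changes, while updating a more detailed panel on every call.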

Implementing a mini file manager in-app

At this point, the Kiwix PWA devs realized that it's not enough to be able to add files to the OPFS. The app also needed to give users a way to delete files they no longer need from this storage area and, ideally, to export any files locked in the OPFS back to the user-visible file system. Effectively, it became necessary to implement a mini file management system inside the app.

A quick shout-out here to the fabulous OPFS Explorer extension for Chrome (it also works in Edge). It adds a tab in developer tools that lets you see exactly what is in the OPFS, and also delete rogue or failed files. It was invaluable for checking whether code was working, monitoring the behavior of downloads, and generally cleaning up our development experiments.

File export depends on the ability to get a file handle on a picked file or directory into which Kiwix is going to save the exported file, so it only works in contexts where the app can use the window.showSaveFilePicker() method. If Kiwix files were smaller than several GB, we would be able to construct a blob in memory, give it a URL, and then download it to the user-visible file system. Unfortunately, that's not possible with such large archives. Where it is supported, exporting is fairly straightforward: it is virtually the same, in reverse, as saving a file into the OPFS (get a handle on the file to be saved, ask the user to pick a location to save it to with window.showSaveFilePicker(), then use createWritable() on the saveHandle). You can see the code in the repo.
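Where window.showSaveFilePicker() is available, the export steps might be sketched like this. exportFromOPFS() and its injectable getRoot and pickSaveHandle parameters are hypothetical; suggestedName is a standard option of showSaveFilePicker(). Streaming with file.stream().pipeTo() keeps memory use flat, which is the whole point for multi-gigabyte archives, and pipeTo() closes the destination when the copy finishes.

```javascript
/**
 * Copies a file out of the OPFS to a location the user picks. Must be called
 * from a user gesture. getRoot and pickSaveHandle are injectable for testing.
 */
async function exportFromOPFS(
  fileName,
  getRoot = () => navigator.storage.getDirectory(),
  pickSaveHandle = (name) => window.showSaveFilePicker({ suggestedName: name }),
) {
  const dir = await getRoot();
  const file = await (await dir.getFileHandle(fileName)).getFile();
  // Ask the user where to save the copy, then open a writable stream there.
  const saveHandle = await pickSaveHandle(fileName);
  const writable = await saveHandle.createWritable();
  // Stream the data so a multi-GB archive is never held in memory;
  // pipeTo() closes the writable when done, or aborts it on error.
  await file.stream().pipeTo(writable);
}
```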

File deletion is supported in all browsers that implement the OPFS, and can be achieved with a simple dirHandle.removeEntry('filename'). In Kiwix's case, we preferred to iterate the OPFS entries as we did above, so that we could check that the selected file exists first and ask for confirmation, but that may not be necessary for everyone. Again, you can examine our code if you're interested.
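A deletion routine with an existence check and confirmation could look like the following sketch. It differs from Kiwix's approach: rather than iterating the OPFS entries, it probes for the file with getFileHandle() without { create: true }, which rejects with a NotFoundError if the file does not exist. deleteFromOPFS() and confirmFn (a stand-in for the app's own confirmation dialog) are hypothetical.

```javascript
/**
 * Deletes fileName from the OPFS after checking that it exists and asking
 * the user. confirmFn stands in for the app's own dialog and should resolve
 * to true or false. Resolves to true if the file was deleted.
 */
async function deleteFromOPFS(
  fileName,
  confirmFn,
  getRoot = () => navigator.storage.getDirectory(),
) {
  const dir = await getRoot();
  try {
    // Without { create: true }, this rejects with NotFoundError if absent.
    await dir.getFileHandle(fileName);
  } catch (err) {
    if (err.name === 'NotFoundError') return false;
    throw err;
  }
  if (!(await confirmFn('Delete ' + fileName + ' from the OPFS?'))) return false;
  await dir.removeEntry(fileName);
  return true;
}
```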

The team decided not to clutter the Kiwix UI with buttons offering these options, and instead placed small icons directly under the archive list. Tapping one of these icons changes the color of the archive list, as a visual clue to the user about what they are going to do. The user then clicks or taps one of the archives, and the corresponding operation (export or delete) is carried out (after confirmation).

Dialog asking the user if they want to delete a .zim file.

Finally, here is a screencast demo of all the file management features discussed above—adding a file to the OPFS, directly downloading a file into it, deleting a file, and exporting to the user-visible file system.

A developer's work is never done

The OPFS is a great innovation for developers of PWAs, providing really powerful file management features that go a long way towards closing the gap between native apps and Web apps. But developers are a miserable bunch—they're never quite satisfied! The OPFS is nearly perfect, but not quite… It's great that the main features work in both Chromium and Firefox browsers, and that they are implemented on Android as well as desktop. We hope the full feature set will also be implemented in Safari and iOS soon. The following issues remain:

  • Firefox currently puts a cap of 10 GB on the OPFS quota, no matter how much underlying disk space there is. While this might be ample for most PWA authors, for Kiwix it's quite restrictive. Fortunately, Chromium browsers are much more generous.
  • It's not currently possible to export large files from the OPFS to the user-visible file system on mobile browsers, or in desktop Firefox, because window.showSaveFilePicker() is not implemented. In these browsers, large files are effectively trapped in the OPFS. This goes against the Kiwix ethos of open access to content, and the ability to share archives between users, especially in areas of intermittent or expensive Internet connectivity.
  • Users have no way to control which storage the OPFS virtual file system consumes. This is particularly problematic on mobile devices, where users may have large amounts of space on a microSD card, but very little on the device storage.

But all in all, these are minor niggles in what is otherwise a huge step forward for file access in PWAs. The Kiwix PWA team are very grateful to the Chromium developers and advocates who first proposed and designed the File System Access API, and for the hard work of achieving consensus amongst the browser vendors on the importance of the Origin Private File System. For Kiwix JS PWA, it has solved a great many of the UX issues that have hobbled the app in the past, and helps us in our quest to enhance the accessibility of Kiwix content for everyone. Please give the Kiwix PWA a spin and tell the developers what you think!

For some great resources on PWA capabilities, take a look at these sites: