This case study explores how Kiwix, a non-profit organization, uses Progressive Web App technology and the File System Access API to enable users to download and store large Internet archives for offline use. Learn about the technical implementation of the code dealing with the Origin Private File System (OPFS), a new browser feature within the Kiwix PWA that enhances file management, providing improved access to archives without permission prompts. The article discusses challenges and highlights potential future developments in this new file system.
About Kiwix
More than 30 years after the birth of the web, a third of the world's population is still waiting for reliable access to the Internet according to the International Telecommunication Union. Is this where the story ends? Of course not. Folks at Kiwix, a Switzerland-based non-profit, have developed an ecosystem of open source apps and content that aims to make knowledge available to people with limited or no Internet access. Their idea is that if you can't easily access the Internet, then someone can download key resources for you, where and when connectivity is available, and store them locally for later offline use. Many vital sites, for example Wikipedia, Project Gutenberg, Stack Exchange, or even TED talks, can now be converted to highly-compressed archives, called ZIM files, and read on the fly by the Kiwix browser.
ZIM archives use highly efficient Zstandard (ZSTD) compression (older versions used XZ), mostly for storing HTML, JavaScript, and CSS, while images are usually converted to compressed WebP format. Each ZIM also includes a URL and a title index. Compression is key here, as the entirety of Wikipedia in English (6.4 million articles, plus images) is compressed to 97 GB after conversion to the ZIM format, which sounds like a lot until you realize that the sum of all human knowledge can now fit on a mid-range Android phone. Many smaller resources are also offered, including themed versions of Wikipedia, such as maths, medicine, and so on.
Kiwix offers a range of native apps targeting desktop (Windows/Linux/macOS) as well as mobile (iOS/Android) usage. This case study, however, will focus on the Progressive Web App (PWA) that aims to be a universal and simple solution for any device that has a modern browser.
We will look at the challenges posed in developing a universal Web app that needs to provide fast access to large content archives fully offline, and some modern JavaScript APIs, particularly the File System Access API and the Origin Private File System, that provide innovative and exciting solutions to those challenges.
A Web app for offline use?
Kiwix users are an eclectic bunch with many different needs, and Kiwix has little or no control over the devices and operating systems on which they will be accessing their content. Some of these devices may be slow or outdated, especially in low-income areas of the world. While Kiwix tries to cover as many use cases as possible, the organization also realized that it could reach even more users by using the most universal piece of software on any device: the web browser. So, inspired by Atwood's Law, which states that "any application that can be written in JavaScript, will eventually be written in JavaScript", some Kiwix devs, around 10 years ago, set about porting the Kiwix software from C++ to JavaScript.
The first version of this port, called Kiwix HTML5, was for the now defunct Firefox OS and for browser extensions. At its core was (and is) a C++ decompression engine (XZ and ZSTD) compiled to the intermediate JavaScript language of ASM.js, and later Wasm, or WebAssembly, using the Emscripten compiler. Later renamed Kiwix JS, the browser extensions are still actively developed.
Enter the Progressive Web App (PWA). Realizing the potential of this technology, the Kiwix developers built a dedicated PWA version of Kiwix JS, and set about adding OS integrations that would allow the app to offer native-like capabilities, particularly in the areas of offline usage, installation, file handling and file system access.
Offline-first PWAs are extremely lightweight, and so are perfect for contexts where there is intermittent or expensive mobile Internet. The technology behind this is the Service Worker API and the related Cache API, used by all apps based on Kiwix JS. These APIs allow the apps to act as a server, intercepting Fetch Requests from the main document or article being viewed, and redirecting them to the (JS) backend to extract and construct a Response from the ZIM archive.
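The interception step described above can be sketched roughly as follows. This is only an illustration, not Kiwix's actual code: readEntry is a hypothetical backend function that takes a URL path and resolves to an object with data and mimeType properties (or null when the archive has no such entry).

```javascript
// Build a Response for a request out of content extracted from an archive.
// `readEntry` is a hypothetical lookup supplied by the app's backend: it
// resolves to { data: Uint8Array, mimeType: string }, or null if not found.
function respondFromArchive(request, readEntry) {
  var path = new URL(request.url).pathname;
  return readEntry(path).then(function (entry) {
    if (!entry) {
      return new Response('Not found in archive', { status: 404 });
    }
    return new Response(entry.data, {
      status: 200,
      headers: { 'Content-Type': entry.mimeType },
    });
  });
}

// In a Service Worker script, this could be wired up like so:
// self.addEventListener('fetch', function (event) {
//   event.respondWith(respondFromArchive(event.request, readEntry));
// });
```

Keeping the Response construction in a plain function, separate from the fetch listener, makes it possible to exercise the logic outside a Service Worker.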
Storage, storage everywhere
Given the large size of ZIM archives, storing and accessing them, especially on mobile devices, is probably the biggest headache for Kiwix developers. Many Kiwix end users download content in-app, when Internet is available, for later offline use. Other users download on a PC using a torrent, and then transfer to a mobile or tablet device, and some exchange content on USB sticks or portable hard drives in areas with patchy or expensive mobile Internet. All these ways of accessing content from arbitrary user-accessible locations need to be supported by Kiwix JS and Kiwix PWA.
What initially made it possible for Kiwix JS to read enormous archives of hundreds of GB (one of our ZIM archives is 166 GB!), even on low-memory devices, is the File API. This API is universally supported in any browser, even very old browsers, and so it acts as a universal fallback for when newer APIs are not supported. It's as easy as defining an input element in HTML, in Kiwix's case:
<input
  type="file"
  accept="application/octet-stream,.zim,.zimaa,.zimab,.zimac, ..."
  value="Select folder with ZIM files"
  id="archiveFilesLegacy"
  multiple
/>
Once selected, the input element holds the File objects which are essentially metadata referencing the underlying data in storage. Technically, Kiwix's object-oriented backend, written in pure client-side JavaScript, reads small slices of the large archive as needed. If those slices need to be decompressed, the backend passes them to the Wasm decompressor, getting further slices if requested, until a full blob is decompressed (usually an article or an asset). This means that the large archive never has to be read entirely into memory.
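As a minimal sketch of the slicing idea (not Kiwix's backend code), the helper below reads just one byte range from a File. The name readSlice is made up for illustration, and the sketch assumes a browser that supports Blob.arrayBuffer(); very old browsers would need a FileReader instead.

```javascript
// Read only the bytes [start, end) of a (possibly huge) File or Blob.
// slice() just creates a lightweight reference to the range; the data is
// only read from storage when arrayBuffer() is called.
function readSlice(file, start, end) {
  return file
    .slice(start, end)
    .arrayBuffer()
    .then(function (buffer) {
      return new Uint8Array(buffer);
    });
}

// Hypothetical usage with the input element shown above:
// var file = document.getElementById('archiveFilesLegacy').files[0];
// readSlice(file, 0, 80).then(function (bytes) {
//   console.log('Read ' + bytes.length + ' bytes of ' + file.size);
// });
```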
Universal as it is, the File API has a drawback which made Kiwix JS apps appear clunky and old-fashioned compared to native apps: it requires the user to pick archives using a file picker, or drag-and-drop a file into the app, every time the app is launched, because with this API, there is no way to persist access permissions from one session to the next.
To mitigate this poor UX, like many developers, the Kiwix JS devs initially went down the Electron route. ElectronJS is an amazing framework which provides powerful features, including full access to the file system using Node APIs. However, it has some well known drawbacks:
- It only runs on desktop operating systems.
- It's big and heavy (70MB–100MB).
The size of the Electron apps, due to the fact that a complete copy of Chromium is included with every app, compares very unfavorably to a mere 5.1 MB for the minimized and bundled PWA!
So, was there a way Kiwix could improve the situation for users of the PWA?
File System Access API to the rescue
Around 2019, Kiwix became aware of an emergent API that was undergoing an origin trial in Chrome 78, then called the Native File System API. It promised the ability to get a file handle for a file or a folder and store it in an IndexedDB database. Crucially, this handle persists between app sessions, so the user isn't forced to pick the file or folder again when re-launching the app (though they do have to answer a quick permission prompt). By the time it reached production, it had been renamed as the File System Access API, and the core parts standardized by the WHATWG as the File System API (FSA).
So, how does the File System Access part of the API work? A few important points to note:
- It's an asynchronous API (except for specialized functions in Web Workers).
- The file or directory pickers have to be launched programmatically by capturing a user gesture (click or tap on a UI element).
- For the user to give permission again to access a previously picked file (in a new session), a user gesture is also needed—in fact the browser will refuse to show the permission prompt if not initiated by a user gesture.
The code is relatively straightforward, apart from having to use the clunky IndexedDB API to store file and directory handles. The good news is that there are a couple of libraries that do a lot of the heavy lifting for you, like browser-fs-access. Over at Kiwix JS, we decided to work directly with the APIs, which are very well documented.
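To make the permission point concrete, here is a rough sketch of how a previously stored handle might be re-validated. The function name verifyReadPermission is an assumption for illustration; queryPermission() and requestPermission() are the handle methods available in Chromium-based browsers, and the call would need to run inside a click or tap handler in case a prompt is required.

```javascript
// Resolves to true if the app may read from the given file or directory
// handle. If permission is not already granted, the browser is asked to
// prompt the user, which only works when triggered by a user gesture.
function verifyReadPermission(handle) {
  var options = { mode: 'read' };
  return handle.queryPermission(options).then(function (state) {
    if (state === 'granted') {
      return true;
    }
    return handle.requestPermission(options).then(function (newState) {
      return newState === 'granted';
    });
  });
}

// Hypothetical usage (openButton and storedHandle are placeholders):
// openButton.addEventListener('click', function () {
//   verifyReadPermission(storedHandle).then(function (allowed) {
//     if (allowed) { /* re-open the archive */ }
//   });
// });
```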
Opening file and directory pickers
Opening a file picker looks something like this (here using Promises, but if you prefer async/await sugar, see the Chrome for Developers tutorial):
return window
  .showOpenFilePicker({ multiple: false })
  .then(function (fileHandles) {
    return processFileHandle(fileHandles[0]);
  })
  .catch(function (err) {
    // This is normal if app is launching
    console.warn(
      'User cancelled, or cannot access fs without user gesture',
      err,
    );
  });
Note that for the sake of simplicity, this code only processes the first picked file (and forbids picking more than one). In case you want to allow picking multiple files with { multiple: true }, you simply wrap all the Promises that process each handle in a Promise.all().then(...) statement, for example:
let promisesForFiles = fileHandles.map(function (fileHandle) {
  return processFileHandle(fileHandle);
});
return Promise.all(promisesForFiles)
  .then(function (arrayOfFiles) {
    // Do something with the files array
    console.log(arrayOfFiles);
  })
  .catch(function (err) {
    // Handle any errors that occurred during processing
    console.error('Error processing file handles!', err);
  });
However, picking multiple files is arguably better done by asking the user to pick the directory containing those files rather than the individual files in it, especially since Kiwix users tend to organize all their ZIM files in the same directory. The code for launching the directory picker is almost the same as above, except that you use window.showDirectoryPicker().then(function (dirHandle) { … });.
Processing the file or directory handle
Once you have the handle, you need to process it, so the function processFileHandle could look like this:
function processFileHandle(fileHandle) {
  // Serialize fileHandle to indexedDB
  serializeFSHandletoIdxDB('pickedFSHandle', fileHandle, function (val) {
    console.debug('IndexedDB responded with ' + val);
  });
  return fileHandle.getFile().then(function (file) {
    // Do something with the file
    return file;
  });
}
Note that you have to provide the function that stores the file handle yourself: there are no convenience methods for this, unless you use an abstraction library. Kiwix's implementation of this can be seen in the file cache.js, but it could be simplified considerably if it is only used to store and retrieve a file or folder handle.
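For illustration, a stripped-down store-and-retrieve helper might look something like the sketch below. It is not the code from cache.js: the database name, store name, and the function names storeHandle and loadHandle are all invented for this example. It relies on the fact that file and directory handles can be stored directly as IndexedDB values.

```javascript
var DB_NAME = 'fsHandles'; // hypothetical database name
var STORE_NAME = 'handles'; // hypothetical object store name

// Open (and if necessary create) the database holding the handles
function openHandleDB() {
  return new Promise(function (resolve, reject) {
    var request = indexedDB.open(DB_NAME, 1);
    request.onupgradeneeded = function () {
      request.result.createObjectStore(STORE_NAME);
    };
    request.onsuccess = function () { resolve(request.result); };
    request.onerror = function () { reject(request.error); };
  });
}

// Store a file or directory handle under the given key
function storeHandle(key, handle) {
  return openHandleDB().then(function (db) {
    return new Promise(function (resolve, reject) {
      var tx = db.transaction(STORE_NAME, 'readwrite');
      tx.objectStore(STORE_NAME).put(handle, key);
      tx.oncomplete = function () { db.close(); resolve(); };
      tx.onerror = function () { db.close(); reject(tx.error); };
    });
  });
}

// Retrieve a stored handle (resolves to undefined if nothing was stored)
function loadHandle(key) {
  return openHandleDB().then(function (db) {
    return new Promise(function (resolve, reject) {
      var store = db.transaction(STORE_NAME, 'readonly').objectStore(STORE_NAME);
      var request = store.get(key);
      request.onsuccess = function () { db.close(); resolve(request.result); };
      request.onerror = function () { db.close(); reject(request.error); };
    });
  });
}
```

With helpers like these, the callback-style call in processFileHandle above could become storeHandle('pickedFSHandle', fileHandle), and a later session could call loadHandle('pickedFSHandle') before asking the user for permission again.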
Processing directories is a bit more complicated as you have to iterate through the entries in the picked directory with async entries.next() to find the files or file types that you want. There are various ways of doing that, but this is the code used in Kiwix PWA, in outline:
let iterableEntryList = dirHandle.entries();
return iterateAsyncDirEntries(iterableEntryList, []).then(function (entryList) {
  // Do something with the entry list
  return entryList;
});

/**
 * Iterates FileSystemDirectoryHandle iterator and adds entries to an array
 * @param {AsyncIterator} entries An asynchronous iterator of entries
 * @param {Array} archives An array to which to add the entries (may be empty)
 * @return {Promise<Array>} A Promise for an array of entries in the directory
 */
function iterateAsyncDirEntries(entries, archives) {
  return entries
    .next()
    .then(function (result) {
      if (!result.done) {
        // Each value is a [name, handle] pair: we want the handle
        let entry = result.value[1];
        // Filter for the files you want
        if (/\.zim(\w\w)?$/i.test(entry.name)) {
          archives.push(entry);
        }
        // Recurse to process the next entry
        return iterateAsyncDirEntries(entries, archives);
      } else {
        // We've processed all the entries
        if (!archives.length) {
          console.warn('No archives found in the picked directory!');
        }
        return archives;
      }
    })
    .catch(function (err) {
      console.error('There was an error processing the directory!', err);
    });
}
Note that for each entry in the entryList, you will later need to get the file with entry.getFile().then(function (file) { … }) when you need to use it, or the equivalent using const file = await entry.getFile() in an async function.
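For codebases that can use async/await, the same directory walk can be written more compactly with a for await loop. This is an alternative sketch rather than the Kiwix implementation: the function name listArchives is made up, and the check on entry.kind (to skip subdirectories) is an addition that the Promise-based code does not make.

```javascript
// Collect the handles of all ZIM files directly inside a directory.
// dirHandle.values() is an async iterator over the child handles.
async function listArchives(dirHandle) {
  const archives = [];
  for await (const entry of dirHandle.values()) {
    // Keep only files whose names end in .zim, .zimaa, .zimab, and so on
    if (entry.kind === 'file' && /\.zim(\w\w)?$/i.test(entry.name)) {
      archives.push(entry);
    }
  }
  return archives;
}

// Hypothetical usage, inside a click handler:
// const dirHandle = await window.showDirectoryPicker();
// const archives = await listArchives(dirHandle);
// const file = await archives[0].getFile();
```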
Can we go further?
The requirement for the user to grant permission initiated with a user gesture on subsequent launches of the app adds a small amount of friction to file and folder (re-)opening, but it's still much more fluid than being forced to re-pick a file. Chromium developers are currently finalizing code that would allow for persistent permissions for installed PWAs. This is something that a lot of PWA developers have been calling for, and is keenly anticipated.
But what if we don't have to wait?! Kiwix devs recently found that it's possible to eliminate all permission prompts right now, by using a shiny new feature of the File System API that is supported by both Chromium and Firefox browsers (and partially supported by Safari, which is still missing FileSystemWritableFileStream). This new feature is the Origin Private File System.
Going fully native: the Origin Private File System
The Origin Private File System (OPFS) is still an experimental feature in the Kiwix PWA, but the team is really excited to encourage users to try it out because it largely bridges the gap between native apps and Web apps. Here are the key benefits:
- Archives in the OPFS can be accessed with no permission prompts, even on launch. Users can resume reading an article, and browsing an archive, from where they left off in a previous session, with absolutely no friction.
- It provides highly optimized access to files stored in it: on Android we see file access that is between five and ten times faster.
Standard file access in Android using the File API is painfully slow, especially (as is often the case for Kiwix users) if large archives are stored on a microSD card rather than in the device storage. That all changes with this new API. While most users won't be able to store a 97 GB file in the OPFS (which consumes device storage, not microSD card storage), it's perfect for storing small to medium-sized archives. You want the most complete medical encyclopedia from WikiProject Medicine? No problem, at 1.7 GB it easily fits in the OPFS! (Tip: look for other → mdwiki_en_all_maxi in the in-app library.)
How the OPFS works
The OPFS is a file system provided by the browser, separate for each origin, that can be thought of as similar to app-scoped storage on Android. Files can be imported into the OPFS from the user-visible file system, or they can be downloaded directly into it (the API also allows creating files in the OPFS). Once in the OPFS, they are isolated from the rest of the device. On desktop Chromium-based browsers, it's also possible to export files back from the OPFS to the user-visible file system.
To use the OPFS, the first step is to request access to it, using navigator.storage.getDirectory() (again, if you'd rather see code using await, read The Origin Private File System):
return navigator.storage
  .getDirectory()
  .then(function (handle) {
    return processDirHandle(handle);
  })
  .catch(function (err) {
    console.warn('Unable to get the OPFS directory entry', err);
  });
The handle you get from this is the very same type of FileSystemDirectoryHandle you get from window.showDirectoryPicker() mentioned above, which means you can re-use the code that handles that (and mercifully there's no need to store this in IndexedDB: just get it when you need it). Let's assume you already have some files in the OPFS and you want to use them. Then, using the function iterateAsyncDirEntries() shown previously, you could do something like:
return navigator.storage.getDirectory().then(function (dirHandle) {
  let entries = dirHandle.entries();
  return iterateAsyncDirEntries(entries, [])
    .then(function (archiveList) {
      return archiveList;
    })
    .catch(function (err) {
      console.error('Unable to iterate OPFS entries', err);
    });
});
Don't forget you still need to use getFile() on any entry you want to work with from the archiveList array.
Importing files into the OPFS
So, how do you get files into the OPFS in the first place? Not so fast! First, you need to estimate the amount of storage you have to work with, and make sure that users don't try to put a 97 GB file in if it's not going to fit.
Getting the estimated quota is easy with navigator.storage.estimate().then(function (estimate) { … }). Slightly harder is working out how to display this to the user. In the Kiwix app, we opted for a little in-app panel visible right next to the checkbox which lets users try out the OPFS:
The panel is populated by using estimate.quota and estimate.usage, for example:
let OPFSQuota; // Global variable, so we don't have to keep checking it
return navigator.storage.estimate().then(function (estimate) {
  const percent = ((estimate.usage / estimate.quota) * 100).toFixed(2);
  OPFSQuota = estimate.quota - estimate.usage;
  document.getElementById('OPFSQuota').innerHTML =
    '<b>OPFS storage quota:</b><br />Used: <b>' +
    percent +
    '%</b>; ' +
    'Remaining: <b>' +
    (OPFSQuota / 1024 / 1024 / 1024).toFixed(2) +
    ' GB</b>';
});
As you can see, there's also a button that lets users add files to the OPFS from the user-visible file system. The good news here is that you can simply use the File API to get the needed File object (or objects) that are going to be imported. In fact, it's important not to use window.showOpenFilePicker() because this method is not supported by Firefox, whereas the OPFS most definitely is.
The visible Add file(s) button you see in the screenshot above isn't a legacy file picker, but it does click() a hidden legacy picker (<input type="file" multiple … /> element) when it is clicked or tapped. The app then just captures the change event of the hidden file input, checks the size of the files, and rejects them if they are too big for the quota. If all is well, it asks the user whether they want to add them:
archiveFilesLegacy.addEventListener('change', function (event) {
  // The picked File objects are available on the input element
  const filesArray = Array.from(event.target.files);
  // Abort if user didn't select any files
  if (filesArray.length === 0) return;
  // Calculate the size of the picked files
  let filesSize = 0;
  filesArray.forEach(function (file) {
    filesSize += file.size;
  });
  // Check the size of the files does not exceed the quota
  if (filesSize > OPFSQuota) {
    // Oh no, files are too big! Tell user...
    console.log('Files would exceed the OPFS quota!');
  } else {
    // Ask user if they're sure... if user said yes...
    return importOPFSEntries(filesArray)
      .then(function () {
        // Tell user we successfully imported the archives
      })
      .catch(function (err) {
        // Tell user there was an error (error catching is important!)
      });
  }
});
Because on some operating systems, like Android, importing archives is not the speediest operation, Kiwix also shows a banner and a small spinner while the archives are being imported. The team didn't work out how to add a progress indicator for this operation: if you work it out, answers on a postcard please!
So, how did Kiwix implement the importOPFSEntries() function? This involves using the fileHandle.createWritable() method, which effectively allows each file to be streamed into the OPFS. All the hard work is handled by the browser. (Kiwix is using Promises here for reasons to do with our legacy codebase, but it has to be said that in this case await produces a simpler syntax, and avoids the pyramid of doom effect.)
function importOPFSEntries(files) {
  // Get a handle on the OPFS directory
  return navigator.storage
    .getDirectory()
    .then(function (dir) {
      // Collect the promises for each file that we want to write
      let promises = files.map(function (file) {
        // Create the file and get a writeable handle on it
        return dir
          .getFileHandle(file.name, { create: true })
          .then(function (fileHandle) {
            // Get a writer for the file
            return fileHandle.createWritable().then(function (writer) {
              // Show a banner / spinner, then write the file
              return writer
                .write(file)
                .then(function () {
                  // Finished with this writer
                  return writer.close();
                })
                .catch(function (err) {
                  console.error('There was an error writing to the OPFS!', err);
                });
            });
          })
          .catch(function (err) {
            console.error('Unable to get file handle from OPFS!', err);
          });
      });
      // Return a promise that resolves when all the files have been written
      return Promise.all(promises);
    })
    .catch(function (err) {
      console.error('Unable to get a handle on the OPFS directory!', err);
    });
}
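To illustrate the point about await giving simpler syntax, here is a rough async/await sketch of the same import. It is not a drop-in replacement for the Kiwix function: the name importFilesInto is invented, the directory handle is passed in as a parameter, the files are written one after another rather than in parallel, and errors are re-thrown to the caller instead of only being logged.

```javascript
// Copy each File into the given directory (for example, the OPFS root).
// Resolves to the list of file names that were written.
async function importFilesInto(dirHandle, files) {
  const imported = [];
  for (const file of files) {
    // Create the target file and get a writable stream for it
    const fileHandle = await dirHandle.getFileHandle(file.name, { create: true });
    const writer = await fileHandle.createWritable();
    try {
      await writer.write(file);
      await writer.close();
    } catch (err) {
      // Discard the partial write, then let the caller report the error
      await writer.abort().catch(function () {});
      throw err;
    }
    imported.push(file.name);
  }
  return imported;
}

// Hypothetical usage:
// const opfsRoot = await navigator.storage.getDirectory();
// await importFilesInto(opfsRoot, filesArray);
```

Because failures propagate, a caller's catch() handler actually runs when a write fails, at the cost of stopping at the first file that cannot be written.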
Downloading a file stream directly into the OPFS
A variation on this is the ability to stream a file from the Internet directly into the OPFS, or into any directory for which you have a directory handle (that is, directories picked with window.showDirectoryPicker()). It uses the same principles as the code above, but constructs a Response consisting of a ReadableStream and a controller that enqueues the bytes read from the remote file. The resulting Response.body is then piped to the new file's writer inside the OPFS.
In this case, Kiwix is able to count the bytes passing through the ReadableStream, and so provide a progress indicator to the user, and also warn them not to quit the app during the download. The code is a bit too convoluted to show here, but as our app is a FOSS app, you can look at the source if you're interested in doing something similar. This is what the Kiwix UI looks like (the different progress values shown below are because it only updates the banner when the percentage changes, but updates the Download progress panel more regularly):
Because downloading can be quite a long operation, Kiwix allows users to use the app freely during the operation, but ensures the banner is always displayed, so that users are reminded not to close the app until the download operation is complete.
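For readers who want a starting point, the byte-counting idea can be sketched in a simplified form. Note that this is not the Kiwix code: instead of hand-building a ReadableStream with a controller, it swaps in a TransformStream that counts bytes as they pass through, and the names pipeWithProgress and downloadInto are invented for the example. It relies on the file's writable stream being a standard WritableStream, so it can be the target of pipeTo().

```javascript
// Pipe a ReadableStream into a WritableStream, reporting the running
// total of bytes transferred via the onProgress callback.
function pipeWithProgress(readable, writable, onProgress) {
  let bytesSoFar = 0;
  const counter = new TransformStream({
    transform(chunk, controller) {
      bytesSoFar += chunk.byteLength;
      onProgress(bytesSoFar);
      controller.enqueue(chunk); // Pass the chunk through unchanged
    },
  });
  // pipeTo() closes the destination when the source is exhausted
  return readable.pipeThrough(counter).pipeTo(writable);
}

// Download a URL straight into a directory (for example, the OPFS root)
async function downloadInto(dirHandle, url, fileName, onProgress) {
  const response = await fetch(url);
  if (!response.ok || !response.body) {
    throw new Error('Download failed with status ' + response.status);
  }
  // Content-Length may be missing, in which case total is reported as 0
  const total = Number(response.headers.get('Content-Length')) || 0;
  const fileHandle = await dirHandle.getFileHandle(fileName, { create: true });
  const writable = await fileHandle.createWritable();
  await pipeWithProgress(response.body, writable, function (bytes) {
    onProgress(bytes, total);
  });
}
```

A caller could use the onProgress callback to update a banner only when the rounded percentage changes, similar to the behavior described above.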
Implementing a mini file manager in-app
At this point, the Kiwix PWA devs realized that it's not enough to be able to add files to the OPFS. The app also needed to give the users a way to delete files they no longer need from this storage area, and ideally, also, to export any files locked in the OPFS back to the user-visible file system. Effectively, it became necessary to implement a mini file management system inside the app.
A quick shout out here to the fabulous OPFS Explorer extension for Chrome (it also works in Edge). It adds a tab in developer tools that lets you see exactly what is in the OPFS, and also delete rogue or failed files. It was invaluable for checking if code was working, monitoring the behavior of downloads, and generally cleaning up our development experiments.
File export depends on the ability to get a file handle on a picked file or directory into which Kiwix is going to save the exported file, so this only works in contexts where it can use the window.showSaveFilePicker() method. If Kiwix files were smaller than several GB, we would be able to construct a blob in memory, give it a URL, and then download it to the user-visible file system. Unfortunately, that's not possible with such large archives. If supported, exporting is fairly straightforward: virtually the same, in reverse, as saving a file into the OPFS (get a handle on the file to be saved, ask the user to pick a location to save it to with window.showSaveFilePicker(), then use createWritable() on the saveHandle). You can see the code in the repo.
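In outline, the export steps might look like the sketch below. The function names canExport and exportEntry are assumptions for illustration and this is not the code from the repo; the save handle is passed in so that the picker can be launched separately from a user gesture.

```javascript
// Feature-detect the save picker (missing on mobile and in desktop Firefox)
function canExport() {
  return typeof window !== 'undefined' && 'showSaveFilePicker' in window;
}

// Copy the file behind sourceHandle (e.g. an OPFS entry) to the location
// represented by saveHandle. Resolves to the number of bytes exported.
async function exportEntry(sourceHandle, saveHandle) {
  const file = await sourceHandle.getFile();
  const writer = await saveHandle.createWritable();
  // The browser streams the File's contents to the destination
  await writer.write(file);
  await writer.close();
  return file.size;
}

// Hypothetical usage, inside a click handler ('archive.zim' is a placeholder):
// if (canExport()) {
//   const opfsRoot = await navigator.storage.getDirectory();
//   const sourceHandle = await opfsRoot.getFileHandle('archive.zim');
//   const saveHandle = await window.showSaveFilePicker({
//     suggestedName: sourceHandle.name,
//   });
//   await exportEntry(sourceHandle, saveHandle);
// }
```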
File deletion is supported by all browsers, and can be achieved with a simple dirHandle.removeEntry('filename'). In Kiwix's case, we preferred to iterate the OPFS entries as we did above, so that we could check that the selected file exists first and ask for confirmation, but that may not be necessary for everyone. Again, you can examine our code if you're interested.
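A simpler variation on the check-then-delete flow is sketched below. Unlike the Kiwix code, it does not iterate the directory: it asks for the file handle directly and treats a NotFoundError as "file does not exist". The function name deleteIfExists and the confirmDelete callback (which would show whatever confirmation UI the app uses) are hypothetical.

```javascript
// Delete a file from a directory (e.g. the OPFS root) after confirmation.
// Resolves to true if the file was deleted, false otherwise.
async function deleteIfExists(dirHandle, fileName, confirmDelete) {
  try {
    // Without { create: true }, this rejects if the file does not exist
    await dirHandle.getFileHandle(fileName);
  } catch (err) {
    if (err.name === 'NotFoundError') return false;
    throw err;
  }
  // confirmDelete may return a boolean or a Promise for one
  const confirmed = await confirmDelete(fileName);
  if (!confirmed) return false;
  await dirHandle.removeEntry(fileName);
  return true;
}

// Hypothetical usage:
// const opfsRoot = await navigator.storage.getDirectory();
// await deleteIfExists(opfsRoot, 'archive.zim', function (name) {
//   return window.confirm('Delete ' + name + '?');
// });
```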
It was decided not to clutter the Kiwix UI with buttons offering these options, and instead place small icons directly under the archive list. Tapping on one of these icons will change the color of the archive list, as a visual clue to the user about what they are going to do. The user then clicks or taps on one of the archives, and the corresponding operation (export or delete) is carried out (after confirmation).
Finally, here is a screencast demo of all the file management features discussed above—adding a file to the OPFS, directly downloading a file into it, deleting a file, and exporting to the user-visible file system.
A developer's work is never done
The OPFS is a great innovation for developers of PWAs, providing really powerful file management features that go a long way towards closing the gap between native apps and Web apps. But developers are a miserable bunch—they're never quite satisfied! The OPFS is nearly perfect, but not quite… It's great that the main features work in both Chromium and Firefox browsers, and that they are implemented on Android as well as desktop. We hope the full feature set will also be implemented in Safari and iOS soon. The following issues remain:
- Firefox currently puts a cap of 10 GB on the OPFS quota, no matter how much underlying disk space there is. While for most PWA authors this might be ample, for Kiwix, that's quite restrictive. Fortunately, Chromium browsers are much more generous.
- It's not currently possible to export large files from the OPFS to the user-visible file system on mobile browsers, or desktop Firefox, because window.showSaveFilePicker() is not implemented. In these browsers, large files are effectively trapped in the OPFS. This goes against the Kiwix ethos of open access to content, and the ability to share archives between users especially in areas of intermittent or expensive Internet connectivity.
- There is no user ability to control which storage the OPFS virtual file system will consume. This is particularly problematic on mobile devices, where users may have large amounts of space on a microSD card, but a very small amount on the device storage.
But all in all, these are minor niggles in what is otherwise a huge step forward for file access in PWAs. The Kiwix PWA team are very grateful to the Chromium developers and advocates who first proposed and designed the File System Access API, and for the hard work of achieving consensus amongst the browser vendors on the importance of the Origin Private File System. For Kiwix JS PWA, it has solved a great many of the UX issues that have hobbled the app in the past, and helps us in our quest to enhance the accessibility of Kiwix content for everyone. Please give the Kiwix PWA a spin and tell the developers what you think!
For some great resources on PWA capabilities, take a look at these sites:
- Project Fugu API showcase: a collection of Web apps showcasing capabilities that close the gap between native apps and PWAs.
- What PWA can do today: a showcase of what's possible with PWAs today.