
Computing SHA-256 in the browser without freezing the tab

How we hash multi-gigabyte RAW files in the browser during upload — streaming through Web Crypto, with no main-thread stalls — and why the obvious approach doesn't work.

Patrick Meehan
Founder, Total Vault · 5 min read

This post is one I've been wanting to write since we shipped the upload pipeline. It's about a small, specific engineering problem with one obvious solution that doesn't work and one slightly less obvious solution that does. The problem: how do you compute the SHA-256 hash of a multi-gigabyte file in the browser, while the user is also uploading that file, without freezing the tab?

If you've never thought about this, you wouldn't expect it to be hard. The Web Crypto API has crypto.subtle.digest. You feed it bytes. You get a hash. It's three lines of code. It also doesn't work, and the way it doesn't work is instructive.

The obvious thing first

The naive version is:

const buffer = await file.arrayBuffer();
const hash = await crypto.subtle.digest("SHA-256", buffer);

This works for a 10 KB JPEG. For a 47 MB RAW it works but allocates a 47 MB ArrayBuffer in memory. For a 3 GB ProRes file from a videographer, it crashes the tab. file.arrayBuffer() loads the entire file into memory at once, the browser tries to allocate three gigabytes of contiguous heap, and the tab dies. Even on a machine with 32 GB of RAM. The browser's per-tab memory ceiling is the constraint, not the physical hardware.

So that doesn't work for the use case.
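For completeness, here is a minimal sketch of how the one-shot approach could still be used safely for small files. The size cutoff and the helper names (ONE_SHOT_LIMIT_BYTES, toHex, hashSmallBlob) are assumptions for illustration and not part of our pipeline. Note that crypto.subtle.digest returns an ArrayBuffer, so the result needs converting to hex.

```typescript
// Hypothetical cutoff for illustration; choose one that fits your memory budget.
const ONE_SHOT_LIMIT_BYTES = 64 * 1024 * 1024;

// Convert raw digest bytes to a lowercase hex string.
function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// One-shot SHA-256 for blobs small enough to hold in memory all at once.
async function hashSmallBlob(blob: Blob): Promise<string> {
  if (blob.size > ONE_SHOT_LIMIT_BYTES) {
    throw new RangeError("Blob too large for one-shot hashing; stream it instead");
  }
  const buffer = await blob.arrayBuffer();
  const digest = await crypto.subtle.digest("SHA-256", buffer);
  return toHex(new Uint8Array(digest));
}
```

Anything above the cutoff is rejected instead of being loaded, which is the case the rest of this post deals with.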

What you actually need

The pattern you want is streaming — reading the file in small chunks, feeding each chunk into a running digest, and never holding more than a chunk's worth of bytes in memory at once.

Web Crypto's crypto.subtle.digest is one-shot. You can't call it incrementally. There's a proposal for a streaming digest API, but it isn't shipped. The current options are:

  1. Use a JS implementation of SHA-256 that supports incremental updates. We use @noble/hashes. It's pure JS, audited, no dependencies, and supports update() / digest() semantics natively.
  2. Use a WebCrypto-backed worker that reads chunks from a ReadableStream and accumulates them server-side. We don't do this — it pushes the work off the browser, which defeats the point of computing the hash before upload so we can verify the upload was intact.

@noble/hashes is the right choice for an upload-time hash. Throughput is around 200-400 MB/sec on a modern laptop (roughly 1.6-3.2 Gbit/s), which is faster than most upload links can carry data, so for typical files the hash isn't the bottleneck.

The shape of the code is:

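import { sha256 } from "@noble/hashes/sha256";
import { bytesToHex } from "@noble/hashes/utils";

// Roughly one frame of hashing between yields.
const FRAME_BUDGET_MS = 16;

async function hashFile(file: File): Promise<string> {
  const hasher = sha256.create();
  const reader = file.stream().getReader();
  let lastYield = performance.now();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    hasher.update(value);
    // Yield once the frame budget is spent, not after every chunk.
    if (performance.now() - lastYield >= FRAME_BUDGET_MS) {
      await yieldToMain();
      lastYield = performance.now();
    }
  }
  return bytesToHex(hasher.digest());
}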

Two things make this work. The first is file.stream(), a ReadableStream that yields Uint8Array chunks, typically around 64 KB each (the exact size is up to the browser). The browser handles the file I/O; we never see the whole file. The second is await yieldToMain(), and that's the part worth talking about.
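If you need explicit control over chunk size, an alternative to file.stream() is to slice the file yourself. The sketch below is that alternative, not what our pipeline does; the name readInSlices and the 4 MiB default are assumptions for illustration.

```typescript
// Read a Blob in fixed-size slices so only one slice is held in memory at a time.
async function* readInSlices(
  blob: Blob,
  chunkSize = 4 * 1024 * 1024, // assumed default of 4 MiB
): AsyncGenerator<Uint8Array> {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    // slice() does not copy data; the bytes are only read by arrayBuffer().
    const slice = blob.slice(offset, offset + chunkSize);
    yield new Uint8Array(await slice.arrayBuffer());
  }
}
```

Each yielded chunk can be passed to hasher.update() exactly as in the streaming loop above.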

The freeze problem

Even with streaming, if you sit in a tight loop calling hasher.update(chunk) thousands of times, the browser's main thread is busy the entire time. The progress bar doesn't update. Clicks don't register. Scroll stutters. The tab looks frozen even though it's making progress.

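The reason is that although reader.read() is awaited, an await on an already-settled promise resumes in a microtask, and microtasks run before the browser gets back to rendering or input handling. If the data is already buffered (which it usually is, since the browser reads ahead), the loop never leaves the current task, and the whole hash effectively runs as one uninterrupted block on the main thread.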

The fix is yieldToMain — a one-line helper that returns to the event loop briefly so other tasks (rendering, input, network) can run:

function yieldToMain(): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

There's a more sophisticated version using scheduler.yield() (a newer browser API), but setTimeout(_, 0) works everywhere. We yield every 16 ms, batching chunk updates between yields: roughly one frame of hashing, then a brief yield so the browser can render and handle input, repeated. Batching matters because browsers clamp nested setTimeout calls to a minimum of about 4 ms, so yielding after every chunk would throttle the hash badly. The UI stays responsive. The hash completes in roughly the same wall-clock time as the synchronous version. Throughput basically doesn't change because the actual SHA computation isn't the bottleneck; disk I/O is.
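A minimal sketch of the two ideas in this section: preferring scheduler.yield() where the browser provides it, and spending a time budget on work before yielding. The names yieldPreferringScheduler and runInTimeSlices are hypothetical, and the 16 ms default simply mirrors the figure above.

```typescript
type SchedulerLike = { yield?: () => Promise<void> };

// Use scheduler.yield() when available; otherwise fall back to setTimeout(_, 0).
function yieldPreferringScheduler(): Promise<void> {
  const scheduler = (globalThis as unknown as { scheduler?: SchedulerLike }).scheduler;
  if (scheduler && typeof scheduler.yield === "function") {
    return scheduler.yield();
  }
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Run `work` on each item, yielding to the event loop whenever the
// time budget for the current slice is used up. Returns the yield count.
async function runInTimeSlices<T>(
  items: Iterable<T>,
  work: (item: T) => void,
  budgetMs = 16,
): Promise<number> {
  let yields = 0;
  let sliceStart = performance.now();
  for (const item of items) {
    work(item);
    if (performance.now() - sliceStart >= budgetMs) {
      await yieldPreferringScheduler();
      yields++;
      sliceStart = performance.now();
    }
  }
  return yields;
}
```

In the hashing loop, the work function would be the hasher.update(chunk) call. How often it yields depends on how fast the machine gets through chunks, not on the chunk count.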

Doing it during the upload

The other thing worth mentioning is that we hash concurrently with the upload, not before it. The same ReadableStream is teed — one branch goes to S3 multipart, the other to the hasher. Both run on the same chunks. The upload doesn't wait for the hash to finish. The hash doesn't wait for the upload. They progress together, and the upload is considered complete only when both have finished and the server-side re-hash matches.
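The tee can be sketched roughly as follows. This shows the general technique and is not our production code: IncrementalHasher is an assumed interface that an object with update() and digest() methods would satisfy, and uploadChunk is a hypothetical stand-in for whatever sends bytes to storage.

```typescript
// Assumed minimal shape of an incremental hasher.
interface IncrementalHasher {
  update(chunk: Uint8Array): unknown;
  digest(): Uint8Array;
}

// Read a stream to the end, handing each chunk to a callback.
async function drain(
  stream: ReadableStream<Uint8Array>,
  onChunk: (chunk: Uint8Array) => void | Promise<void>,
): Promise<void> {
  const reader = stream.getReader();
  while (true) {
    const { done, value } = await reader.read();
    if (done) return;
    await onChunk(value);
  }
}

// Split one source stream so the upload and the hash consume the same chunks.
async function hashWhileUploading(
  source: ReadableStream<Uint8Array>,
  hasher: IncrementalHasher,
  uploadChunk: (chunk: Uint8Array) => Promise<void>, // hypothetical uploader
): Promise<Uint8Array> {
  const [toUpload, toHash] = source.tee();
  // Both branches progress together; neither waits for the other to finish.
  await Promise.all([
    drain(toUpload, uploadChunk),
    drain(toHash, (chunk) => {
      hasher.update(chunk);
    }),
  ]);
  return hasher.digest();
}
```

One caveat with tee(): chunks that the slower branch has not yet read are buffered in memory, so if one consumer falls far behind the other, memory use grows with the gap.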

This is the subtle reason the hash matters. Without it, the upload could complete with one byte corrupted in transit and you'd never know. With it, that file is rejected at the moment of upload completion and you get a re-upload prompt instead of a corrupt file silently sitting in your archive for ten years.

Where we cheat

We don't hash RAW files larger than 500 MB in real time during upload. We stream them straight to storage and hash them server-side after the upload completes. The reason is that for very large files (medium-format RAWs, ProRes video) the hash takes long enough that it can't keep pace with multipart upload throughput on a fast connection. The user-visible behavior is the same (they see "uploading" then "verifying" then "verified"), but the verifying step is a server-side hash against the just-uploaded object rather than a comparison against a browser-computed hash.
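The size cutoff amounts to a single branch. A minimal sketch, where the names are hypothetical and the interpretation of "500 MB" as decimal megabytes is an assumption:

```typescript
type VerificationPlan = "browser-hash-then-server-compare" | "server-hash-only";

// Assumption: "500 MB" means 500 * 10^6 bytes.
const BROWSER_HASH_LIMIT_BYTES = 500 * 1000 * 1000;

// Decide where the verifying hash is computed, based on file size alone.
function planVerification(sizeBytes: number): VerificationPlan {
  return sizeBytes > BROWSER_HASH_LIMIT_BYTES
    ? "server-hash-only"
    : "browser-hash-then-server-compare";
}
```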

This is a small compromise. It means we trust the network for very large files instead of catching corruption at the upload boundary. The server-side re-hash still catches storage-side corruption. The probability of a bit flip occurring in transit, surviving TCP's own checksums, and not being caught later is vanishingly small. We'll take that trade for the upload throughput.

The point I keep coming back to

There are a lot of small decisions like this in a real product. Each one is an engineering call: where to compute the hash, how to feed it, when to yield, when to compromise. None of them are visible to the customer — they see "verified" with a green check and the file in their archive. But the small decisions are the difference between an archive that's probably intact and an archive that's provably intact. We've spent more time on the small decisions than I want to admit.

If you want the technical version: read lib/hash.ts in our source tree, or trust the green check. Both work.
