SQLite in production for as long as you can get away with it

SQLite is not a toy database you graduate away from on principle. For a lot of B2B apps, internal tools, and early-stage products, it is a better default than standing up Postgres on day one. The catch is that SQLite has a real model: one database file, one writer at a time, and every process that touches it must agree on how the file is stored and opened.

Treat it like a very fast, SQL-speaking file format attached to your app server, not like a tiny rented Postgres.

Where this actually works

The SQLite authors spell out the split in plain language. SQLite shines when the database lives on the same machine as the code that runs queries, when writes are bursty but not a stampede of parallel writers, and when you are okay serializing writes through a single lock.

That covers a lot of web products in the wild: one or two app instances behind a load balancer, mostly reads, short writes, nobody trying to fan out five writers against one row.

The same page is blunt about what goes wrong: opening one .db file from many machines over a network filesystem, or needing many concurrent writers that cannot queue politely, points you at a client/server database instead. WAL mode in particular assumes everyone shares memory on one host; the WAL documentation calls out that it does not work over a network filesystem because the WAL index lives in shared memory.

So the first production decision is boring and non-negotiable: local disk, one host’s filesystem, sane locking. NFS or “we mounted the volume from somewhere else and hope POSIX locks hold” is how people collect corruption stories.

Baseline pragmas (small list, high leverage)

You do not need a twenty-line startup script. You need a small baseline you set on every connection (or once at process start, depending on your driver), then tune for evidence.

WAL

PRAGMA journal_mode = WAL;

Write-ahead logging lets readers and writers overlap the way most apps expect: readers keep using the main database file while commits append to the WAL. Without WAL you are more likely to block reads during writes.

Foreign keys on

PRAGMA foreign_keys = ON;

Foreign keys are off by default for backwards compatibility. If your schema has REFERENCES, you want enforcement on every connection; one forgotten worker is a silent data integrity hole.

Busy timeout

PRAGMA busy_timeout = 5000;

The default busy timeout is zero: another connection holds the lock, you get SQLITE_BUSY immediately. Under load, busy_timeout turns many hard failures into short waits. Litestream recommends five seconds because checkpointing takes brief write locks; the same helps any sidecar or admin CLI that touches the file.

`synchronous` in WAL

PRAGMA synchronous = NORMAL;

The WAL docs spell out the tradeoff: with synchronous = FULL, writers sync the WAL on every commit; with NORMAL, commits are cheaper and checkpointing carries more of the sync burden, at the cost of durability after a hard power loss. On typical VMs, NORMAL plus tested backups is the practical default. If you need “last commit always survives the plug yank,” tighten this and measure.

Cache and mmap

cache_size and mmap_size are real (pragma reference), but the right values depend on working set and RAM. Start with defaults, watch latency and I/O, then widen the page cache or mmap cap only when profiling says so.

Habits that matter more than extra pragmas

Short transactions. SQLite allows only one writer at a time. A long transaction is a mutex held in public. Batch work, avoid holding locks while calling outbound HTTP, and treat “transaction open while we think” as a bug.

Long reads inflate the WAL. Checkpointing moves WAL pages back into the main file; a long-lived read can stall checkpoint progress and degrade read performance as the WAL grows. If you run analytics against the same file as OLTP, you will hurt read latency yourself.

One process model that accepts serialization. Fifteen worker processes each opening fifty connections does not change the writer count: it changes how often you fight yourself for locks. Prefer a pool size that matches how your stack actually uses SQLite.

Backups you have restored. Copying app.db while the app runs without using the online backup API or an equivalent snapshot story can give you a torn file. VACUUM INTO is another supported path for a consistent copy (noted alongside backup in the docs). Whatever you pick, prove you can restore to a fresh VM monthly.

Delete the whole family when you wipe a database. If you remove the main file but leave -wal or -shm behind, SQLite can try to replay old WAL state against a new empty file. Litestream’s deletion checklist applies to anyone rotating files: remove db, WAL, shm, and any replication metadata if you use it.

Continuous backups to S3-shaped object storage

File copy plus cron is fine until it is not: you forget the job, the disk dies between runs, or you need a point-in-time story that does not depend on human discipline.

Litestream tails the WAL and replicates changes to object storage. It requires WAL. It also means asynchronous replication: by default replicas lag roughly a second, so a catastrophic power loss on the host can lose the last second of committed writes that had not shipped yet. For clean shutdowns it tries to flush. Treat that window like you treat async replicas elsewhere: know the number, decide if it fits the product.

Cloudflare R2 exposes an S3-compatible API, so the same tooling pattern as “bucket + access keys + endpoint” applies. When using S3 APIs against R2, the region is auto (and tools often accept us-east-1 as an alias). Litestream documents R2 as a first-class target, including endpoint shape and version-specific tuning for provider quirks.

If you run very high sustained small write rates, Litestream warns that SQLite’s automatic checkpointing can interleave badly with replication; their mitigation is disabling auto-checkpoint (PRAGMA wal_autocheckpoint = 0;) and letting their tooling drive checkpoints. That is not your first week of prod; it is something you reach for when metrics say you need it.

When to stop scaling SQLite

SQLite stops being a good fit when the topology stops matching the engine:

Many app servers all writing the same database file over a network mount.
Workloads that need many writers at the same instant without queuing, or long write transactions that turn into permanent lock contention.
Hard requirements for database-level HA (automatic failover of the storage engine itself) where a single file on a single VM is unacceptable.

The signal is usually operational: rising SQLITE_BUSY, p99 write latency chasing lock wait time, checkpoint stalls, or repeated “we added another instance and it got worse.” At that point a hosted Postgres or MySQL is not a moral upgrade; it is a different concurrency and storage model.

Until then, SQLite plus local disk, a tight pragma baseline, disciplined transactions, and replication to something like R2 is often simpler, cheaper, and easier to reason about than the managed database you were told you needed before you had users.