Published Feb 4, 2026
A Daily Hacker News Digest That Actually Runs
Hacker News moves fast. Keeping up with the firehose is a full-time job.
We wanted a daily summary of the best stories and discussions, delivered automatically. This sounds like a simple cron job, but “simple” jobs are often the most fragile. APIs fail, scrapers get blocked, and tasks run twice.
To build a digest that works every day, we designed for failure.
The Transaction First
Naive cron jobs fetch data and then save it. If the process crashes, you lose your work. If it restarts, you might send duplicate emails.
We invert this. We treat the daily digest as a database transaction first.
- Commit: We create a database row for “Digest 2026-02-04”.
- Execute: We schedule background workers to fill in that row.
This makes the process idempotent. If the job runs twice, a unique constraint on the digest date prevents a second row. If the workers fail, we know exactly which digest is incomplete.
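The claim-the-row-first pattern can be sketched with SQLite's upsert syntax. The schema and function names here are illustrative assumptions, not the production setup:

```python
import sqlite3

# In-memory database for illustration; the real schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE digests (
           digest_date TEXT PRIMARY KEY,   -- one row per day, enforced by the DB
           status      TEXT NOT NULL DEFAULT 'pending'
       )"""
)

def claim_digest(conn, digest_date):
    """Commit the digest row first; return True only if this run created it."""
    cur = conn.execute(
        "INSERT INTO digests (digest_date) VALUES (?) "
        "ON CONFLICT (digest_date) DO NOTHING",
        (digest_date,),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 means another run already claimed this day

# The first run claims the row; a duplicate run becomes a no-op.
print(claim_digest(conn, "2026-02-04"))  # True
print(claim_digest(conn, "2026-02-04"))  # False
```

Only the run that wins the insert goes on to schedule workers, so a retry or an accidental double-trigger never sends a second email.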
A Layered Content Pipeline
Fetching web content is messy. We built a pipeline that degrades gracefully when layers fail.
Layer 1: The Source (Algolia)
We query the Algolia API for top stories. We filter out “flame bait”—stories with high points but few comments. We want discussion, not just outrage.
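The flame-bait filter amounts to comparing the comment count against the score. The `points` and `num_comments` fields match the Algolia HN API; the thresholds below are illustrative assumptions, not the production values:

```python
def keep_story(story, min_points=100, min_comment_ratio=0.5):
    """Keep stories with real discussion; drop high-score, low-comment bait.
    Threshold values are assumptions for illustration."""
    points = story.get("points") or 0
    comments = story.get("num_comments") or 0
    if points < min_points:
        return False
    # Require a healthy comments-to-points ratio: discussion, not just upvotes.
    return comments >= points * min_comment_ratio

stories = [
    {"title": "Show HN: Tiny DB", "points": 250, "num_comments": 180},
    {"title": "Hot take",         "points": 400, "num_comments": 30},
]
print([s["title"] for s in stories if keep_story(s)])  # ['Show HN: Tiny DB']
```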
Layer 2: The Content (Jina & MarkItDown)
We use Jina Reader to convert articles to clean markdown. If Jina is blocked, we fall back to raw HTML parsing with MarkItDown. If that fails, we use the title and first sentence. The digest always ships, even if some details are missing.
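The fallback chain can be written as a loop over extraction layers. `jina_fetch` and `markitdown_fetch` are hypothetical wrappers standing in for Jina Reader and MarkItDown calls, shown only to illustrate the degradation order:

```python
def jina_fetch(url):
    # Placeholder for a Jina Reader request (assumption); here it is blocked.
    raise RuntimeError("blocked")

def markitdown_fetch(url):
    # Placeholder for raw HTML parsing via MarkItDown (assumption).
    raise RuntimeError("fetch failed")

def fetch_content(url, title, first_sentence):
    """Try each extraction layer in order; the digest ships regardless."""
    for layer in (jina_fetch, markitdown_fetch):
        try:
            text = layer(url)
            if text and text.strip():
                return text
        except Exception:
            continue  # blocked, timed out, or unparseable: fall through
    # Last resort: title plus first sentence, so the entry is never empty.
    return f"{title}\n\n{first_sentence}"

print(fetch_content("https://example.com", "Tiny DB", "A database in 500 lines."))
```

Because every layer's failure is caught, a blocked scraper degrades the entry instead of killing the whole run.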
Layer 3: The Summary (LLM)
We use OpenRouter to summarize the article and the top comments. The comments often contain more insight than the link itself, so we prioritize them.
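Prioritizing the comments is mostly a matter of prompt assembly: the discussion goes first and the article is truncated. The prompt wording, comment limit, and truncation length below are assumptions for illustration:

```python
def build_summary_prompt(article_md, comments, max_comments=10):
    """Assemble the summarization prompt, discussion first.
    Wording and limits are illustrative assumptions."""
    top = "\n\n".join(f"- {c}" for c in comments[:max_comments])
    return (
        "Summarize this Hacker News story for a daily digest.\n"
        "Weight the discussion at least as heavily as the article.\n\n"
        f"## Top comments\n{top}\n\n"
        f"## Article\n{article_md[:4000]}\n"  # rough cap; real limit unknown
    )

prompt = build_summary_prompt(
    "Full markdown of the article...",
    ["The benchmark is misleading.", "Author here, happy to answer questions."],
)
print(prompt)
```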
Safety in Presentation
Markdown is great for storage but risky for display. We store the summary as markdown in the database.
The API sends this markdown to the frontend. The frontend parses and sanitizes it into HTML in the browser. This ensures that even if an LLM hallucinates a malicious script tag, it never executes.
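On the server side, the rule is simply to return the stored markdown as data and never pre-rendered HTML. A minimal sketch of that API shape (handler and field names are assumptions):

```python
import json

def digest_api_response(digest_row):
    """Serve the stored markdown untouched, as a JSON string field.
    The browser parses and sanitizes it before it ever touches the DOM."""
    return json.dumps({
        "date": digest_row["digest_date"],
        "summary_markdown": digest_row["summary"],  # raw markdown, not HTML
    })

row = {
    "digest_date": "2026-02-04",
    "summary": "Top story: **Tiny DB** <script>alert(1)</script>",
}
print(digest_api_response(row))
```

Even if the stored text contains a script tag, it travels as inert JSON string data; only the client-side sanitizer decides what becomes live HTML.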
Summary
Reliability comes from handling failure, not preventing it.
By using transactional locks and a fallback-heavy pipeline, we built a system that runs quietly in the background. APIs can break, and sites can go down, but the digest still arrives.