<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
<title>Alcazar · Technical Blog</title>
<link>https://blog.alcazarsec.com/tech/posts</link>
<description>Technical notes, architecture writeups, and release stories.</description>
<item>
    <title>GPT Image 2 vs Nano Banana 2: what to use now</title>
    <link>https://blog.alcazarsec.com/tech/posts/gpt-image-2-vs-nano-banana-2</link>
    <pubDate>Wed, 22 Apr 2026 00:00:00 GMT</pubDate>
    <description>&lt;p&gt;If you want one answer, use &lt;a href=&quot;https://platform.openai.com/docs/guides/image-generation&quot; rel=&quot;nofollow&quot;&gt;GPT Image 2&lt;/a&gt;. It is now the best image tool for most people.&lt;/p&gt; &lt;p&gt;That does not make &lt;a href=&quot;https://blog.google/innovation-and-ai/technology/ai/nano-banana-2&quot; rel=&quot;nofollow&quot;&gt;Nano Banana 2&lt;/a&gt; a bad choice. It is probably the best fast image model on the market, and if you already live inside Gemini, Search, or Google Ads, it may fit your workflow better. &lt;a href=&quot;https://updates.midjourney.com/v7-is-now-the-default-model/&quot; rel=&quot;nofollow&quot;&gt;Midjourney V7&lt;/a&gt; still has the strongest artistic taste. &lt;a href=&quot;https://bfl.ai/blog/flux-2&quot; rel=&quot;nofollow&quot;&gt;FLUX.2&lt;/a&gt; still matters if you want open or self-hosted workflows. &lt;a href=&quot;https://about.ideogram.ai/3.0&quot; rel=&quot;nofollow&quot;&gt;Ideogram 3.0&lt;/a&gt; and &lt;a href=&quot;https://www.recraft.ai/blog/introducing-recraft-v4-design-taste-meets-image-generation&quot; rel=&quot;nofollow&quot;&gt;Recraft V4&lt;/a&gt; are better picks for some design jobs. 
&lt;a href=&quot;https://blog.adobe.com/en/publish/2025/04/24/adobe-firefly-next-evolution-creative-ai-is-here&quot; rel=&quot;nofollow&quot;&gt;Adobe Firefly Image Model 4&lt;/a&gt; is still the safe choice for larger brands that care a lot about licensing and provenance.&lt;/p&gt; &lt;p&gt;But if you are a normal user who wants one tool that listens well, edits well, handles text inside images, and is easy to access, the default has changed.&lt;/p&gt; &lt;h2&gt;The short buying guide&lt;/h2&gt; &lt;ul&gt;&lt;li&gt;Use &lt;strong&gt;GPT Image 2&lt;/strong&gt; if you want the best general-purpose image tool.&lt;/li&gt; &lt;li&gt;Use &lt;strong&gt;Nano Banana 2&lt;/strong&gt; if speed and Google integration matter more than absolute control.&lt;/li&gt; &lt;li&gt;Use &lt;strong&gt;Midjourney&lt;/strong&gt; if you care most about aesthetic taste.&lt;/li&gt; &lt;li&gt;Use &lt;strong&gt;FLUX.2&lt;/strong&gt; if you want open weights, self-hosting, or deeper infrastructure control.&lt;/li&gt; &lt;li&gt;Use &lt;strong&gt;Ideogram 3.0&lt;/strong&gt; or &lt;strong&gt;Recraft V4&lt;/strong&gt; if the job is posters, logos, layouts, or vector-style design.&lt;/li&gt; &lt;li&gt;Use &lt;strong&gt;Firefly&lt;/strong&gt; if your company is unusually conservative about commercial safety.&lt;/li&gt;&lt;/ul&gt; &lt;h2&gt;Why GPT Image 2 is the new default&lt;/h2&gt; &lt;p&gt;The biggest shift in image models over the last year is that the best systems are no longer just “good at making pretty pictures.” They are becoming practical tools.&lt;/p&gt; &lt;p&gt;OpenAI’s &lt;a href=&quot;https://help.openai.com/en/articles/6825453-chatgpt-release-notes&quot; rel=&quot;nofollow&quot;&gt;ChatGPT release notes&lt;/a&gt; say &lt;strong&gt;ChatGPT Images 2.0&lt;/strong&gt; is available on all ChatGPT plans and can also use &lt;strong&gt;images with thinking&lt;/strong&gt; on paid plans. 
OpenAI’s image docs describe &lt;code&gt;gpt-image-2&lt;/code&gt; as its latest image model, with built-in editing, high-fidelity handling of input images, and broad resolution support. That combination matters more than people think. The winning model in 2026 is the one that most often gives you the image you meant to ask for.&lt;/p&gt; &lt;p&gt;On the &lt;a href=&quot;https://lmarena.ai/leaderboard/text-to-image&quot; rel=&quot;nofollow&quot;&gt;Arena text-to-image leaderboard&lt;/a&gt;, GPT Image 2 jumped to first place on April 19 with a preliminary score of &lt;strong&gt;1512&lt;/strong&gt;, well ahead of Nano Banana 2 at &lt;strong&gt;1270&lt;/strong&gt;. That is only one benchmark, and leaderboards are never the whole story, but it matches the practical case for the model. OpenAI has been unusually strong at prompt adherence, text rendering, and editing workflows for a while now. GPT Image 2 looks like the point where those strengths became hard to ignore.&lt;/p&gt; &lt;p&gt;If you make social graphics, mockups, diagrams, blog art, ads, UI concepts, or images with real text in them, GPT Image 2 is the most sensible first tool to reach for.&lt;/p&gt; &lt;h2&gt;Where Nano Banana 2 still wins&lt;/h2&gt; &lt;p&gt;Nano Banana 2 is not losing because it is weak. It is losing because the market got brutal.&lt;/p&gt; &lt;p&gt;Google’s own launch post says Nano Banana 2 combines the quality and world knowledge of Nano Banana Pro with &lt;strong&gt;Flash speed&lt;/strong&gt;. Its Gemini API docs describe it even more plainly: it is the &lt;strong&gt;high-efficiency counterpart&lt;/strong&gt; to Nano Banana Pro, built for &lt;strong&gt;speed and high-volume developer use cases&lt;/strong&gt;.&lt;/p&gt; &lt;p&gt;Nano Banana 2 is fast, good, and deeply wired into Google’s ecosystem. Google says it is rolling out across the Gemini app, Search, Lens, Flow, Ads, AI Studio, and the Gemini API. 
If you want quick iterations, grounded image generation that can use Google search, and a model that already sits inside products you use every day, Nano Banana 2 is still one of the best tools on the market.&lt;/p&gt; &lt;p&gt;It is also a good reminder that “second best” can be a misleading label. For many teams, the fastest very-good model is more useful than the slowest best model.&lt;/p&gt; &lt;p&gt;Still, Google’s own product lineup tells you something important. When Google describes its image stack, it keeps &lt;strong&gt;Nano Banana Pro&lt;/strong&gt; around for “high-fidelity tasks requiring maximum factual accuracy,” while Nano Banana 2 is framed as the fast, efficient default. That is a great reason to use Nano Banana 2. It is also a reason not to confuse it with Google’s absolute best image model.&lt;/p&gt; &lt;h2&gt;API pricing, in plain English&lt;/h2&gt; &lt;p&gt;Pricing via API is not as simple as “which model is cheaper?”&lt;/p&gt; &lt;p&gt;For &lt;strong&gt;GPT Image 2&lt;/strong&gt;, OpenAI uses token-based pricing. In the official docs, the rough examples are about &lt;strong&gt;$0.006&lt;/strong&gt; for a &lt;code&gt;1024x1024&lt;/code&gt; image at low quality, &lt;strong&gt;$0.053&lt;/strong&gt; at medium, and &lt;strong&gt;$0.211&lt;/strong&gt; at high. At &lt;code&gt;1024x1536&lt;/code&gt; or &lt;code&gt;1536x1024&lt;/code&gt;, the examples are about &lt;strong&gt;$0.005&lt;/strong&gt;, &lt;strong&gt;$0.041&lt;/strong&gt;, and &lt;strong&gt;$0.165&lt;/strong&gt;. That is surprisingly cheap at the low end and less cheap once you push quality up.&lt;/p&gt; &lt;p&gt;For &lt;strong&gt;Nano Banana 2&lt;/strong&gt;, Google’s pricing is easier to budget. 
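&lt;p&gt;Both rate cards reduce to simple per-image arithmetic. Here is a minimal budgeting sketch in Python, hard-coding the approximate per-image figures quoted in this post (the price dictionaries and the 50% batch discount are illustrative assumptions taken from this writeup, not values fetched from either provider’s pricing API):&lt;/p&gt;

```python
# Approximate per-image prices quoted in this post (illustrative assumptions,
# not fetched from any official pricing API).
GPT_IMAGE_2_1024 = {"low": 0.006, "medium": 0.053, "high": 0.211}
NANO_BANANA_2 = {"1K": 0.067, "2K": 0.101, "4K": 0.151}
BATCH_DISCOUNT = 0.5  # Google's Batch pricing roughly halves the per-image rate

def monthly_cost(price_per_image, images_per_day, days=30):
    """Naive monthly spend estimate for a steady generation volume."""
    return round(price_per_image * images_per_day * days, 2)

# Example: 200 images per day, medium-quality GPT Image 2 at 1024x1024
# versus 1K Nano Banana 2 run through Batch.
gpt = monthly_cost(GPT_IMAGE_2_1024["medium"], 200)            # 318.0
nb2 = monthly_cost(NANO_BANANA_2["1K"] * BATCH_DISCOUNT, 200)  # 201.0
```

Swap in your own daily volume and quality tiers before deciding; the low end favors GPT Image 2, while high-quality or edit-heavy workloads can flip the comparison.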
The official rate card translates to about &lt;strong&gt;$0.045&lt;/strong&gt; for &lt;code&gt;0.5K&lt;/code&gt;, &lt;strong&gt;$0.067&lt;/strong&gt; for &lt;code&gt;1K&lt;/code&gt;, &lt;strong&gt;$0.101&lt;/strong&gt; for &lt;code&gt;2K&lt;/code&gt;, and &lt;strong&gt;$0.151&lt;/strong&gt; for &lt;code&gt;4K&lt;/code&gt;. Batch pricing roughly cuts that in half.&lt;/p&gt; &lt;p&gt;So the practical answer is:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;strong&gt;GPT Image 2&lt;/strong&gt; is often cheaper at low quality and still competitive in the middle.&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Nano Banana 2&lt;/strong&gt; is easier to forecast at scale, especially if you use Batch.&lt;/li&gt; &lt;li&gt;&lt;strong&gt;GPT Image 2&lt;/strong&gt; can get expensive faster in edit-heavy workflows, because reference images are processed at high fidelity.&lt;/li&gt;&lt;/ul&gt; &lt;h2&gt;The state of image models now&lt;/h2&gt; &lt;p&gt;The market is easier to understand if you stop looking for one universal winner.&lt;/p&gt; &lt;p&gt;There are now a few clear camps.&lt;/p&gt; &lt;h3&gt;1. General-purpose multimodal image models&lt;/h3&gt; &lt;p&gt;This is where &lt;strong&gt;GPT Image 2&lt;/strong&gt; and &lt;strong&gt;Nano Banana 2&lt;/strong&gt; live.&lt;/p&gt; &lt;p&gt;These are the models most people will actually use. They live inside large assistants, can handle editing as well as generation, and are trying to be useful for normal work, not just art prompts. This category is becoming the center of gravity.&lt;/p&gt; &lt;p&gt;Right now, OpenAI looks ahead on quality and control. Google looks strongest on speed, distribution, and ecosystem reach.&lt;/p&gt; &lt;h3&gt;2. Taste-first creative tools&lt;/h3&gt; &lt;p&gt;Midjourney is still the best example.&lt;/p&gt; &lt;p&gt;Its official V7 update says the model has better prompt understanding, stronger coherence, improved personalization, and a Draft Mode that is &lt;strong&gt;10x&lt;/strong&gt; faster than normal generation. 
That sounds like a pure quality upgrade, but the real Midjourney advantage is still taste. Midjourney often picks a more pleasing composition or a more interesting mood than the literal prompt alone would suggest.&lt;/p&gt; &lt;p&gt;That makes it valuable. It also makes it a weaker default for people who want strict control. Midjourney is a creative partner. GPT Image 2 is closer to a careful assistant.&lt;/p&gt; &lt;h3&gt;3. Open and self-hosted models&lt;/h3&gt; &lt;p&gt;This is where FLUX.2 matters.&lt;/p&gt; &lt;p&gt;Black Forest Labs built FLUX.2 around real workflows: multi-reference support, editing, high resolution output, and open-weight options you can run yourself. If you are a developer, want to keep costs under your own control, or need infrastructure freedom, FLUX is still the most important non-closed family in the market. For a normal reader, that probably does not matter. For builders, it matters a lot.&lt;/p&gt; &lt;h3&gt;4. Design-specific tools&lt;/h3&gt; &lt;p&gt;This is where many “best model” debates quietly break down.&lt;/p&gt; &lt;p&gt;If your job is not “make an image,” but “make a poster,” “make a logo,” “make an ad layout,” or “make an editable vector,” then general image leaders are not always the right answer.&lt;/p&gt; &lt;p&gt;Ideogram 3.0 is still unusually strong at typography and layout. Recraft V4 is even more specialized. Recraft says V4 was trained around design taste and production-ready outputs, and its vector models generate editable SVG files directly from a prompt. If your output needs to move into a real design workflow, Recraft can beat more famous models simply by being built for the job.&lt;/p&gt; &lt;h3&gt;5. Enterprise-safe creative stacks&lt;/h3&gt; &lt;p&gt;Adobe Firefly is here for a reason.&lt;/p&gt; &lt;p&gt;Adobe keeps pushing Firefly as the commercially safe option, with tight links to the rest of Creative Cloud. That usually does not make it the most exciting model on the internet. 
It does make it easier to justify inside companies that care about licensing, provenance, and asset handoff.&lt;/p&gt; &lt;h2&gt;So what is the best image tool for most people?&lt;/h2&gt; &lt;p&gt;&lt;strong&gt;GPT Image 2.&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;That is the answer I would give a friend who does not want a spreadsheet. Use ChatGPT if you want the best odds of getting a usable result on the first or second try. Use Nano Banana 2 if you want faster iteration and already spend your day inside Google’s world. Use Midjourney if you care more about taste than obedience. Use FLUX if you want control over the stack. Use Ideogram or Recraft if your real problem is design, not image generation in the abstract.&lt;/p&gt; &lt;p&gt;Image generation is no longer one market. It has split into everyday assistant models, taste-first creative tools, open infrastructure tools, and design-specific tools. That is a healthy sign. The category is maturing.&lt;/p&gt; &lt;p&gt;But if you force me to pick one recommendation for most readers in April 2026, I would not hedge.&lt;/p&gt; &lt;p&gt;Use GPT Image 2 first.&lt;/p&gt;</description>
</item>
<item>
    <title>When will quantum computing break cryptography?</title>
    <link>https://blog.alcazarsec.com/tech/posts/when-will-quantum-computing-break-cryptography</link>
    <pubDate>Tue, 21 Apr 2026 00:00:00 GMT</pubDate>
    <description>&lt;p&gt;If you mean &lt;code&gt;RSA&lt;/code&gt;, &lt;code&gt;Diffie-Hellman&lt;/code&gt;, and &lt;code&gt;elliptic-curve cryptography&lt;/code&gt;, the honest short answer is: &lt;strong&gt;probably not in &lt;code&gt;2026&lt;/code&gt;, probably not all at once, but very plausibly in the &lt;code&gt;2030s&lt;/code&gt;.&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;That is the part most people care about because those systems protect TLS handshakes, VPNs, SSH, certificates, passkeys, and blockchain signatures.&lt;/p&gt; &lt;p&gt;The other important part is this: &lt;strong&gt;you do not need to wait for a quantum computer to exist before the problem becomes real&lt;/strong&gt;. If an attacker can copy encrypted traffic today and store it, they may be able to decrypt it later once a cryptographically relevant quantum computer exists. &lt;a href=&quot;https://www.cisa.gov/resources-tools/resources/quantum-readiness-migration-post-quantum-cryptography&quot; rel=&quot;nofollow&quot;&gt;CISA, NSA, and NIST explicitly warn about this “harvest now, decrypt later” model&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;So the practical answer is simple:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;if your data only needs to stay secret briefly, the risk is later&lt;/li&gt; &lt;li&gt;if it needs to stay secret for &lt;code&gt;5&lt;/code&gt; to &lt;code&gt;15&lt;/code&gt; years, the risk is already here&lt;/li&gt; &lt;li&gt;if you still depend heavily on &lt;code&gt;RSA&lt;/code&gt; or &lt;code&gt;ECC&lt;/code&gt;, your migration clock has started&lt;/li&gt;&lt;/ul&gt; &lt;h2&gt;TL;DR&lt;/h2&gt; &lt;ul&gt;&lt;li&gt;&lt;code&gt;Shor&#39;s algorithm&lt;/code&gt; breaks the public-key systems the internet relies on most: &lt;code&gt;RSA&lt;/code&gt;, &lt;code&gt;DH&lt;/code&gt;, &lt;code&gt;ECDH&lt;/code&gt;, &lt;code&gt;ECDSA&lt;/code&gt;, and similar schemes.&lt;/li&gt; &lt;li&gt;&lt;code&gt;AES&lt;/code&gt; and hash functions are in better shape. 
&lt;code&gt;Grover&#39;s algorithm&lt;/code&gt; weakens them in theory, but it does not create the same practical emergency, and it does &lt;strong&gt;not&lt;/strong&gt; mean everyone now needs bigger symmetric keys.&lt;/li&gt; &lt;li&gt;The scary part is not that a giant quantum computer exists today. It is that the resource estimates for breaking &lt;code&gt;RSA-2048&lt;/code&gt; and &lt;code&gt;ECC-256&lt;/code&gt; have been falling fast.&lt;/li&gt; &lt;li&gt;In &lt;code&gt;2019&lt;/code&gt;, a widely cited estimate put &lt;code&gt;RSA-2048&lt;/code&gt; at about &lt;code&gt;20 million&lt;/code&gt; noisy physical qubits. In &lt;code&gt;2025&lt;/code&gt;, &lt;a href=&quot;https://arxiv.org/abs/2505.15917&quot; rel=&quot;nofollow&quot;&gt;Craig Gidney&lt;/a&gt; cut that to &lt;strong&gt;less than &lt;code&gt;1 million&lt;/code&gt; noisy qubits&lt;/strong&gt; under similar surface-code assumptions.&lt;/li&gt; &lt;li&gt;In &lt;code&gt;2026&lt;/code&gt;, &lt;a href=&quot;https://quantumai.google/static/site-assets/downloads/cryptocurrency-whitepaper.pdf&quot; rel=&quot;nofollow&quot;&gt;Google researchers&lt;/a&gt; estimated that breaking &lt;code&gt;secp256k1&lt;/code&gt;-style &lt;code&gt;ECC-256&lt;/code&gt; could take &lt;strong&gt;about &lt;code&gt;1200&lt;/code&gt; to &lt;code&gt;1450&lt;/code&gt; logical qubits&lt;/strong&gt; and &lt;strong&gt;fewer than &lt;code&gt;500,000&lt;/code&gt; physical qubits&lt;/strong&gt; on a fast superconducting architecture.&lt;/li&gt; &lt;li&gt;&lt;code&gt;NIST&lt;/code&gt; already finalized the first core post-quantum standards in &lt;code&gt;2024&lt;/code&gt;: &lt;a href=&quot;https://csrc.nist.gov/pubs/fips/203/final&quot; rel=&quot;nofollow&quot;&gt;ML-KEM (&lt;code&gt;FIPS 203&lt;/code&gt;)&lt;/a&gt;, &lt;a href=&quot;https://csrc.nist.gov/pubs/fips/204/final&quot; rel=&quot;nofollow&quot;&gt;ML-DSA (&lt;code&gt;FIPS 204&lt;/code&gt;)&lt;/a&gt;, and &lt;a href=&quot;https://csrc.nist.gov/pubs/fips/205/final&quot; 
rel=&quot;nofollow&quot;&gt;SLH-DSA (&lt;code&gt;FIPS 205&lt;/code&gt;)&lt;/a&gt;.&lt;/li&gt; &lt;li&gt;&lt;a href=&quot;https://csrc.nist.gov/pubs/ir/8547/ipd&quot; rel=&quot;nofollow&quot;&gt;NIST’s draft transition plan&lt;/a&gt; says common quantum-vulnerable public-key schemes should be &lt;strong&gt;deprecated after &lt;code&gt;2030&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;disallowed after &lt;code&gt;2035&lt;/code&gt;&lt;/strong&gt;.&lt;/li&gt; &lt;li&gt;The best plain-English answer is: &lt;strong&gt;treat the &lt;code&gt;2030s&lt;/code&gt; as the danger window for public-key cryptography, and plan as if long-lived secrets are already exposed.&lt;/strong&gt;&lt;/li&gt;&lt;/ul&gt; &lt;h2&gt;Which crypto breaks first&lt;/h2&gt; &lt;p&gt;Quantum computing does not threaten every cryptosystem equally.&lt;/p&gt; &lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Crypto&lt;/th&gt;&lt;th&gt;What quantum does&lt;/th&gt;&lt;th&gt;Practical result&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;code&gt;RSA&lt;/code&gt;, &lt;code&gt;DH&lt;/code&gt;, &lt;code&gt;ECDH&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;Shor&#39;s algorithm&lt;/code&gt; solves the math directly&lt;/td&gt;&lt;td&gt;Broken once a large enough fault-tolerant machine exists&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;code&gt;ECDSA&lt;/code&gt;, &lt;code&gt;EdDSA&lt;/code&gt;, &lt;code&gt;ECC&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;Shor&#39;s algorithm&lt;/code&gt; solves discrete logs too&lt;/td&gt;&lt;td&gt;Also broken, and often with fewer resources than RSA&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;code&gt;AES-128&lt;/code&gt;&lt;/td&gt;&lt;td&gt;&lt;code&gt;Grover&#39;s algorithm&lt;/code&gt; gives limited practical help&lt;/td&gt;&lt;td&gt;Still broadly considered safe&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;code&gt;AES-256&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Grover gives even more margin&lt;/td&gt;&lt;td&gt;Also safe, but not required as a PQ migration 
step&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;code&gt;SHA-256&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Grover helps with preimage search in theory&lt;/td&gt;&lt;td&gt;Still broadly considered safe&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; &lt;p&gt;This distinction matters because a lot of headlines say “quantum breaks encryption” as if every lock on the internet fails on the same day. The main break is in &lt;strong&gt;public-key cryptography&lt;/strong&gt;: the part used to agree on keys, prove identity, sign software, and authenticate transactions. Once that layer falls, a lot of higher-level systems fall with it.&lt;/p&gt; &lt;p&gt;Symmetric crypto is a different story. Quantum attacks help, but they do not give the same kind of clean knockout. That is why agencies like the &lt;a href=&quot;https://media.defense.gov/2022/Sep/07/2003071836/-1/-1/0/CSI_CNSA_2.0_FAQ_.PDF&quot; rel=&quot;nofollow&quot;&gt;NSA’s CNSA 2.0 guidance&lt;/a&gt; still keep strong symmetric primitives while replacing &lt;code&gt;RSA&lt;/code&gt; and &lt;code&gt;ECC&lt;/code&gt;.&lt;/p&gt; &lt;h2&gt;Why the timeline feels closer now&lt;/h2&gt; &lt;p&gt;For years, the easy answer was “don’t worry, we would need millions of qubits.” That was never the whole story, and it has aged badly. What changed is not just hardware. 
&lt;strong&gt;The attack cost estimates got better much faster.&lt;/strong&gt; Researchers found better arithmetic, better circuit layouts, and better ways to pay the error-correction overhead.&lt;/p&gt; &lt;p&gt;A short version of the trend:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;In &lt;a href=&quot;https://quantum-journal.org/papers/q-2021-04-15-433/&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;2019&lt;/code&gt;&lt;/a&gt;, Gidney and Ekerå estimated &lt;code&gt;RSA-2048&lt;/code&gt; could be factored in about &lt;code&gt;8&lt;/code&gt; hours with roughly &lt;code&gt;20 million&lt;/code&gt; noisy physical qubits.&lt;/li&gt; &lt;li&gt;In &lt;a href=&quot;https://arxiv.org/abs/2505.15917&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;2025&lt;/code&gt;&lt;/a&gt;, Gidney updated that estimate to &lt;strong&gt;less than a week with less than &lt;code&gt;1 million&lt;/code&gt; noisy physical qubits&lt;/strong&gt; under similar assumptions.&lt;/li&gt; &lt;li&gt;In &lt;a href=&quot;https://quantumai.google/static/site-assets/downloads/cryptocurrency-whitepaper.pdf&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;2026&lt;/code&gt;&lt;/a&gt;, Google researchers estimated &lt;code&gt;ECC-256&lt;/code&gt; over &lt;code&gt;secp256k1&lt;/code&gt; at &lt;strong&gt;&lt;code&gt;1200&lt;/code&gt; to &lt;code&gt;1450&lt;/code&gt; logical qubits&lt;/strong&gt;, &lt;strong&gt;under &lt;code&gt;500,000&lt;/code&gt; physical qubits&lt;/strong&gt;, and &lt;strong&gt;roughly &lt;code&gt;9&lt;/code&gt; to &lt;code&gt;12&lt;/code&gt; minutes&lt;/strong&gt; on a fast-clock superconducting machine.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Those are not minor edits. They are order-of-magnitude moves.&lt;/p&gt; &lt;p&gt;They also reinforce an awkward point: &lt;strong&gt;&lt;code&gt;ECC&lt;/code&gt; may fall before &lt;code&gt;RSA&lt;/code&gt; in many real systems.&lt;/strong&gt; The reason is simple. 
&lt;code&gt;Shor&lt;/code&gt; cares a lot about key size, and elliptic-curve systems use much smaller keys than &lt;code&gt;RSA&lt;/code&gt; for the same classical security.&lt;/p&gt; &lt;p&gt;That means modern systems that moved from &lt;code&gt;RSA&lt;/code&gt; to &lt;code&gt;ECDH&lt;/code&gt; or &lt;code&gt;ECDSA&lt;/code&gt; for efficiency did the right thing for the classical internet, but may have moved into an easier quantum target.&lt;/p&gt; &lt;h2&gt;Why this does not mean “the internet breaks next year”&lt;/h2&gt; &lt;p&gt;Because these papers are not attacks you can run on today’s hardware.&lt;/p&gt; &lt;p&gt;They assume a machine with:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;many error-corrected logical qubits&lt;/li&gt; &lt;li&gt;low enough physical error rates&lt;/li&gt; &lt;li&gt;stable control systems&lt;/li&gt; &lt;li&gt;enough scale to run large fault-tolerant circuits end to end&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Current systems are making real progress in quantum error correction, which is the hard part. For example, &lt;a href=&quot;https://arxiv.org/abs/2602.22211&quot; rel=&quot;nofollow&quot;&gt;Quantinuum reported&lt;/a&gt; experiments with &lt;code&gt;48&lt;/code&gt; to &lt;code&gt;94&lt;/code&gt; logical qubits in &lt;code&gt;2026&lt;/code&gt;. 
That is a real milestone, but it is still far from the thousands of logical qubits, deep circuits, and industrial-scale reliability needed for practical cryptanalysis.&lt;/p&gt; &lt;p&gt;So there are two facts to hold at once:&lt;/p&gt; &lt;ol&gt;&lt;li&gt;A quantum computer that can break mainstream cryptography does &lt;strong&gt;not&lt;/strong&gt; exist today.&lt;/li&gt; &lt;li&gt;The distance to one looks shorter than it did a few years ago.&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;That is why the serious debate is no longer “if,” but “how wide the remaining window is.”&lt;/p&gt; &lt;h2&gt;The most honest timeline&lt;/h2&gt; &lt;p&gt;No one can give you a real date, and anybody who says “&lt;code&gt;2033&lt;/code&gt;” or “&lt;code&gt;2041&lt;/code&gt;” with confidence is pretending. But we can still say useful things.&lt;/p&gt; &lt;p&gt;&lt;a href=&quot;https://pages.nist.gov/nccoe-migration-post-quantum-cryptography/FAQ/&quot; rel=&quot;nofollow&quot;&gt;NIST’s migration FAQ&lt;/a&gt; says estimates for a cryptographically relevant quantum computer range from &lt;strong&gt;around &lt;code&gt;2030&lt;/code&gt;&lt;/strong&gt; on the aggressive end, to &lt;strong&gt;15 to 20 years&lt;/strong&gt;, to &lt;strong&gt;30+ years&lt;/strong&gt; on the slow end.&lt;/p&gt; &lt;p&gt;The &lt;a href=&quot;https://globalriskinstitute.org/publication/2024-quantum-threat-timeline-report/&quot; rel=&quot;nofollow&quot;&gt;Global Risk Institute’s &lt;code&gt;2024&lt;/code&gt; expert survey&lt;/a&gt; found a &lt;strong&gt;significant chance within &lt;code&gt;10&lt;/code&gt; years&lt;/strong&gt; and a majority view that it becomes likely within &lt;code&gt;15&lt;/code&gt; years.&lt;/p&gt; &lt;p&gt;Germany’s &lt;a href=&quot;https://www.bsi.bund.de/EN/Themen/Unternehmen-und-Organisationen/Informationen-und-Empfehlungen/Quantentechnologien-und-Post-Quanten-Kryptografie/Entwicklungsstand-Quantencomputer/entwicklungsstand-quantencomputer.html&quot; rel=&quot;nofollow&quot;&gt;BSI&lt;/a&gt;, 
taking a conservative mainstream view, says a cryptographically relevant quantum computer is likely within &lt;code&gt;15&lt;/code&gt; years and reasonably expected by about &lt;code&gt;2040&lt;/code&gt;, with faster progress possible if newer &lt;code&gt;qLDPC&lt;/code&gt; error-correction methods work well.&lt;/p&gt; &lt;p&gt;I also think &lt;a href=&quot;https://words.filippo.io/crqc-timeline/&quot; rel=&quot;nofollow&quot;&gt;Filippo Valsorda’s timeline piece&lt;/a&gt; adds a useful way to think about the uncertainty. He puts it like this:&lt;/p&gt; &lt;blockquote&gt;&lt;p&gt;The bet is not ‘are you 100% sure a CRQC will exist in 2030?’, the bet is ‘are you 100% sure a CRQC will NOT exist in 2030?’&lt;/p&gt;&lt;/blockquote&gt; &lt;p&gt;That is a better framing than arguing over one magic year. Security teams do not need certainty. They need to decide whether the downside of being late is acceptable.&lt;/p&gt; &lt;p&gt;Put that together and the practical forecast looks like this:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;strong&gt;Before &lt;code&gt;2030&lt;/code&gt;:&lt;/strong&gt; possible but still an aggressive case&lt;/li&gt; &lt;li&gt;&lt;strong&gt;&lt;code&gt;2030&lt;/code&gt; to &lt;code&gt;2035&lt;/code&gt;:&lt;/strong&gt; a serious planning window, not a sci-fi one&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Late &lt;code&gt;2030s&lt;/code&gt;:&lt;/strong&gt; plausible even under more conservative assumptions&lt;/li&gt; &lt;li&gt;&lt;strong&gt;After &lt;code&gt;2040&lt;/code&gt;:&lt;/strong&gt; still possible, but not a safe excuse to delay&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;That is why &lt;a href=&quot;https://csrc.nist.gov/pubs/ir/8547/ipd&quot; rel=&quot;nofollow&quot;&gt;NIST IR 8547&lt;/a&gt; proposes deprecating common quantum-vulnerable public-key schemes after &lt;code&gt;2030&lt;/code&gt; and disallowing them after &lt;code&gt;2035&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;Those dates are not proof that &lt;code&gt;Q-Day&lt;/code&gt; is &lt;code&gt;2035&lt;/code&gt;. 
They are what a responsible migration schedule looks like when the risk window is uncertain and the replacement takes years.&lt;/p&gt; &lt;h2&gt;Symmetric crypto is not the urgent problem&lt;/h2&gt; &lt;p&gt;The common shortcut is: Grover gives a square-root speedup, so &lt;code&gt;AES-128&lt;/code&gt; becomes “really” &lt;code&gt;64&lt;/code&gt; bits, therefore everybody should move to &lt;code&gt;AES-256&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;That is too simple. Grover attacks do not parallelize the way normal brute force does, which makes the practical cost much worse than the slogan suggests.&lt;/p&gt; &lt;p&gt;&lt;a href=&quot;https://words.filippo.io/128-bits/&quot; rel=&quot;nofollow&quot;&gt;Filippo Valsorda’s note on 128-bit keys&lt;/a&gt; says it plainly:&lt;/p&gt; &lt;blockquote&gt;&lt;p&gt;“AES-128 is safe against quantum computers. SHA-256 is safe against quantum computers. No symmetric key sizes have to change as part of the post-quantum transition.”&lt;/p&gt;&lt;/blockquote&gt; &lt;p&gt;That matches &lt;a href=&quot;https://csrc.nist.gov/projects/post-quantum-cryptography/faqs&quot; rel=&quot;nofollow&quot;&gt;NIST’s PQC FAQ&lt;/a&gt;, which says it is quite likely Grover will provide little or no practical advantage against &lt;code&gt;AES&lt;/code&gt;, and that current applications can continue using &lt;code&gt;AES-128&lt;/code&gt;, &lt;code&gt;AES-192&lt;/code&gt;, or &lt;code&gt;AES-256&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Do not let arguments about symmetric key sizes slow down the migration away from &lt;code&gt;RSA&lt;/code&gt;, &lt;code&gt;ECDH&lt;/code&gt;, and &lt;code&gt;ECDSA&lt;/code&gt;.&lt;/strong&gt; The urgent work is on the public-key side.&lt;/p&gt; &lt;h2&gt;What “break” really means in practice&lt;/h2&gt; &lt;p&gt;For most companies, there are three separate questions hiding inside the headline.&lt;/p&gt; &lt;h3&gt;1. 
When can attackers decrypt traffic they recorded earlier?&lt;/h3&gt; &lt;p&gt;This is the earliest real risk for sensitive long-lived data. If your TLS sessions today use quantum-vulnerable key exchange, an attacker who records the traffic now may be able to read it later. That matters for diplomatic traffic, health records, corporate secrets, long-lived credentials, and anything with a long confidentiality shelf life. That is why &lt;strong&gt;key exchange&lt;/strong&gt; is the first migration priority.&lt;/p&gt; &lt;h3&gt;2. When can attackers forge signatures?&lt;/h3&gt; &lt;p&gt;This comes later, but it is worse in some ways.&lt;/p&gt; &lt;p&gt;Once a machine can break &lt;code&gt;ECDSA&lt;/code&gt;, &lt;code&gt;RSA&lt;/code&gt;, or similar signature schemes quickly enough, it can forge software updates, fake identities, impersonate certificate holders, or steal from systems that expose public keys on-chain or in protocols.&lt;/p&gt; &lt;p&gt;That is a direct integrity problem, not just a privacy problem.&lt;/p&gt; &lt;h3&gt;3. When does the average website need to care?&lt;/h3&gt; &lt;p&gt;Honestly, not every site needs to panic today.&lt;/p&gt; &lt;p&gt;If you run a small site with short-lived sessions and no high-value archived data, quantum risk is not your top fire.&lt;/p&gt; &lt;p&gt;But if you operate infrastructure, VPNs, SSH fleets, PKI, code signing, hardware lifecycles, regulated data, or long-lived secrets, you should already be planning.&lt;/p&gt; &lt;h2&gt;What to do now&lt;/h2&gt; &lt;p&gt;The hard part is not choosing a year. 
The hard part is reducing dependence on &lt;code&gt;RSA&lt;/code&gt; and &lt;code&gt;ECC&lt;/code&gt; before the date matters.&lt;/p&gt; &lt;p&gt;A sane short checklist:&lt;/p&gt; &lt;ol&gt;&lt;li&gt;Inventory where you use public-key cryptography: &lt;code&gt;TLS&lt;/code&gt;, &lt;code&gt;VPN&lt;/code&gt;, &lt;code&gt;SSH&lt;/code&gt;, certificates, code signing, hardware roots of trust, document signing, blockchain keys.&lt;/li&gt; &lt;li&gt;Separate &lt;strong&gt;key exchange&lt;/strong&gt; from &lt;strong&gt;signatures&lt;/strong&gt; in your plan. They migrate on different timelines.&lt;/li&gt; &lt;li&gt;Move long-lived confidentiality use cases to the front of the queue.&lt;/li&gt; &lt;li&gt;Prefer &lt;code&gt;AES-256&lt;/code&gt; and strong hashes for symmetric components.&lt;/li&gt; &lt;li&gt;Start testing the NIST standards: &lt;a href=&quot;https://csrc.nist.gov/pubs/fips/203/final&quot; rel=&quot;nofollow&quot;&gt;ML-KEM&lt;/a&gt;, &lt;a href=&quot;https://csrc.nist.gov/pubs/fips/204/final&quot; rel=&quot;nofollow&quot;&gt;ML-DSA&lt;/a&gt;, and &lt;a href=&quot;https://csrc.nist.gov/pubs/fips/205/final&quot; rel=&quot;nofollow&quot;&gt;SLH-DSA&lt;/a&gt;.&lt;/li&gt; &lt;li&gt;Expect hybrid deployments first. In many systems, the first practical step is hybrid &lt;code&gt;TLS&lt;/code&gt; key exchange, not a full overnight switch of every certificate and signature stack.&lt;/li&gt; &lt;li&gt;Build for crypto agility. 
The teams that suffer most will be the ones that hard-coded one algorithm family into everything.&lt;/li&gt;&lt;/ol&gt; &lt;h2&gt;My bottom line&lt;/h2&gt; &lt;p&gt;If you want the shortest honest answer to the title question, here it is:&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Quantum computing probably breaks the big public-key cryptosystems in the &lt;code&gt;2030s&lt;/code&gt;, with enough uncertainty that anyone protecting long-lived secrets should act now, not when the first machine shows up.&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;&lt;code&gt;RSA&lt;/code&gt; and &lt;code&gt;ECC&lt;/code&gt; are living on borrowed time. &lt;code&gt;AES-256&lt;/code&gt; is not the main problem. The migration has already started. The only thing still uncertain is whether the industry finishes in an orderly way or gets forced into it late.&lt;/p&gt;</description>
</item>
<item>
    <title>How to learn programming in 2026</title>
    <link>https://blog.alcazarsec.com/tech/posts/how-to-learn-programming-2026</link>
    <pubDate>Thu, 16 Apr 2026 00:00:00 GMT</pubDate>
    <description>&lt;!--[--&gt;&lt;p&gt;If you want to learn programming in &lt;code&gt;2026&lt;/code&gt;, do not start by asking which degree to get.&lt;/p&gt; &lt;p&gt;Start by asking which small problem you can solve this month.&lt;/p&gt; &lt;p&gt;That is the big change. The old default path was: study for years, then maybe build. The new default path is: pick a real problem, use AI tools to get unstuck, ship something small, and learn the theory that the work forces you to learn.&lt;/p&gt; &lt;p&gt;For most people, I would not recommend a computer science degree as the default route into programming anymore.&lt;/p&gt; &lt;p&gt;Not because fundamentals stopped mattering. They did not. The &lt;a href=&quot;https://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;U.S. Bureau of Labor Statistics&lt;/code&gt;&lt;/a&gt; still projects strong demand for software developers, with &lt;code&gt;15%&lt;/code&gt; growth from &lt;code&gt;2024&lt;/code&gt; to &lt;code&gt;2034&lt;/code&gt; and about &lt;code&gt;129,200&lt;/code&gt; openings per year across software developers, QA analysts, and testers. But the way you become useful changed. A lot of the beginner pain that used to justify years of slow, formal ramp-up is now reduced by tools that can explain code, generate scaffolding, debug errors, and help you ship faster.&lt;/p&gt; &lt;p&gt;At the same time, the credential story is getting weaker, even if it has not disappeared. A &lt;code&gt;2024&lt;/code&gt; report from the &lt;a href=&quot;https://www.hbs.edu/managing-the-future-of-work/Documents/research/Skills-Based%20Hiring.pdf&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;Burning Glass Institute and Harvard Business School&lt;/code&gt;&lt;/a&gt; found that the annual number of roles dropping degree requirements increased almost fourfold from &lt;code&gt;2014&lt;/code&gt; to &lt;code&gt;2023&lt;/code&gt;. 
Hiring has not fully caught up, which is exactly why beginners need proof of work instead of empty optimism. The market is in between worlds. Degrees matter less than they used to, but the people who win without them still need to show that they can actually build.&lt;/p&gt; &lt;p&gt;So here is the path I would give a beginner today.&lt;/p&gt; &lt;h2&gt;The starter setup&lt;/h2&gt; &lt;p&gt;If I were starting from zero and wanted the fastest route to useful work, I would use this:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;code&gt;JavaScript&lt;/code&gt; first, then &lt;code&gt;TypeScript&lt;/code&gt;&lt;/li&gt; &lt;li&gt;&lt;code&gt;Replit&lt;/code&gt; if I wanted zero setup and the fastest first win&lt;/li&gt; &lt;li&gt;an AI editor like &lt;code&gt;Cursor&lt;/code&gt;, &lt;code&gt;Windsurf&lt;/code&gt;, or &lt;code&gt;VS Code&lt;/code&gt; with &lt;a href=&quot;https://docs.github.com/en/copilot/get-started/features&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;GitHub Copilot&lt;/code&gt;&lt;/a&gt; once I was ready to work locally&lt;/li&gt; &lt;li&gt;&lt;a href=&quot;https://docs.anthropic.com/en/docs/claude-code/overview&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;Claude Code&lt;/code&gt;&lt;/a&gt;, &lt;code&gt;ChatGPT&lt;/code&gt;, or &lt;code&gt;Claude&lt;/code&gt; in the browser as a tutor, debugger, and reviewer&lt;/li&gt; &lt;li&gt;&lt;code&gt;GitHub&lt;/code&gt; from day one&lt;/li&gt; &lt;li&gt;simple deployment, either &lt;a href=&quot;https://docs.replit.com/getting-started/quickstarts/ask-ai&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;Replit&lt;/code&gt; publishing&lt;/a&gt; or a static/frontend host like &lt;code&gt;Vercel&lt;/code&gt;&lt;/li&gt; &lt;li&gt;&lt;code&gt;Postgres&lt;/code&gt; or &lt;code&gt;Supabase&lt;/code&gt; later, not on day one&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;That stack is not sacred. 
The point is to keep the number of moving parts small.&lt;/p&gt; &lt;p&gt;I would choose &lt;code&gt;JavaScript&lt;/code&gt; first because it gives beginners the shortest path from idea to visible result. You can make buttons, forms, calculators, dashboards, landing pages, and small web apps without switching languages. Later, the same family of tools can cover frontend, backend, APIs, and light automation.&lt;/p&gt; &lt;p&gt;If you want the easiest possible start, &lt;a href=&quot;https://docs.replit.com/core-concepts/agent/&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;Replit Agent&lt;/code&gt;&lt;/a&gt; is hard to ignore. Their docs are very direct about the value proposition: describe what you want in plain language, let the agent build it, preview it, then publish it. That matters for beginners because local setup is a real tax. A lot of people do not quit programming because functions are too hard. They quit because the first weekend disappears into Node version problems, terminals, PATH issues, and random config.&lt;/p&gt; &lt;p&gt;Once you are serious enough to work locally, use one AI-native editor and stick with it for a few months. &lt;a href=&quot;https://docs.github.com/en/copilot/get-started/features&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;GitHub Copilot&lt;/code&gt;&lt;/a&gt; now covers inline suggestions, multi-file edits, and agent mode. &lt;a href=&quot;https://docs.anthropic.com/en/docs/claude-code/overview&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;Claude Code&lt;/code&gt;&lt;/a&gt; can read your codebase, edit files, run commands, and work across multiple files and tools. Pick one main editing setup and one main explainer model. 
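&lt;/p&gt; &lt;p&gt;Going back to the “shortest path from idea to visible result” point: a first project really can be a single function. This is only a sketch, and the function name and the &lt;code&gt;25%&lt;/code&gt; rush surcharge are invented for the example.&lt;/p&gt;

```javascript
// A first "visible result" project: a freelance rate calculator.
// A pure function is easy to test and easy to reason about.
function quote(hours, rate, rushJob) {
  const base = hours * rate;
  // Rush jobs get a 25% surcharge (an arbitrary example rule).
  return rushJob ? base * 1.25 : base;
}

// In a browser page, one line wires this to a button:
// document.querySelector('#go').onclick = () => alert(quote(10, 50, true));

console.log(quote(10, 50, false)); // 500
console.log(quote(10, 50, true));  // 625
```

&lt;p&gt;Wire that to a button and a text input on a page and you have a working tool on day one; everything else in the stack can wait.&lt;/p&gt; &lt;p&gt;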
Do not spend your first three months comparing nine assistants like a sports bettor.&lt;/p&gt; &lt;h2&gt;What to learn first&lt;/h2&gt; &lt;p&gt;The beginner mistake is trying to “cover the field” before building anything.&lt;/p&gt; &lt;p&gt;Do this instead:&lt;/p&gt; &lt;ol&gt;&lt;li&gt;Learn enough &lt;code&gt;HTML&lt;/code&gt;, &lt;code&gt;CSS&lt;/code&gt;, and &lt;code&gt;JavaScript&lt;/code&gt; to make a small page interactive.&lt;/li&gt; &lt;li&gt;Learn &lt;code&gt;Git&lt;/code&gt; and &lt;code&gt;GitHub&lt;/code&gt; early.&lt;/li&gt; &lt;li&gt;Learn how to read error messages.&lt;/li&gt; &lt;li&gt;Learn how to deploy.&lt;/li&gt; &lt;li&gt;Learn backend and databases only when your project needs them.&lt;/li&gt; &lt;li&gt;Learn deeper CS topics as support, not as your main loop.&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;That order is not academically pure. It is productive.&lt;/p&gt; &lt;p&gt;If you want structure, use courses as scaffolding around projects:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;a href=&quot;https://www.theodinproject.com/home&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;The Odin Project&lt;/code&gt;&lt;/a&gt; is still one of the best practical web paths because it is built around curated resources and real projects, not endless passive watching.&lt;/li&gt; &lt;li&gt;Its &lt;a href=&quot;https://www.theodinproject.com/courses/web-development-101&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;Foundations&lt;/code&gt;&lt;/a&gt; course is a good example of the right shape: HTML, CSS, Git, JavaScript basics, DOM work, and small projects.&lt;/li&gt; &lt;li&gt;&lt;a href=&quot;https://cs50.harvard.edu/x/2026/&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;CS50x&lt;/code&gt;&lt;/a&gt; is excellent when you want stronger problem-solving habits. 
Harvard describes it as an introduction to computer science for students with or without prior experience, with an emphasis on correctness, design, and style.&lt;/li&gt; &lt;li&gt;&lt;a href=&quot;https://github.com/freeCodeCamp/freeCodeCamp&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;freeCodeCamp&lt;/code&gt;&lt;/a&gt; is still useful for drills, repetition, and free structure.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;My recommendation is simple: use project-first resources like Odin as the spine, and use &lt;code&gt;CS50x&lt;/code&gt; as your theory gym when you want to get sharper.&lt;/p&gt; &lt;h2&gt;The new skill is not typing faster&lt;/h2&gt; &lt;p&gt;The highest-leverage beginner skill in &lt;code&gt;2026&lt;/code&gt; is not memorizing syntax. It is steering the tools well.&lt;/p&gt; &lt;p&gt;Can you describe a problem clearly? Can you break it into steps? Can you tell when the AI is wrong? Can you test the result? Can you ask a better follow-up question instead of pasting the same broken prompt again?&lt;/p&gt; &lt;p&gt;AI does not remove the need for skill. It changes the shape of the skill. A beginner who can guide the tools well, read the output critically, and keep shipping will outlearn the person who is still trying to become a human autocomplete engine.&lt;/p&gt; &lt;h2&gt;How to use AI without fooling yourself&lt;/h2&gt; &lt;p&gt;This is where a lot of beginners go wrong.&lt;/p&gt; &lt;p&gt;Do not use AI as a magic code vending machine. 
Use it like this:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;ask it to explain the code it wrote in plain English&lt;/li&gt; &lt;li&gt;ask it what assumptions it made&lt;/li&gt; &lt;li&gt;ask it for the smallest next step, not the entire app&lt;/li&gt; &lt;li&gt;ask it to write tests or give you manual test cases&lt;/li&gt; &lt;li&gt;paste the exact error message and ask what it means&lt;/li&gt; &lt;li&gt;ask it for two approaches and the tradeoffs&lt;/li&gt; &lt;li&gt;ask it to review your code after you change it yourself&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;A good prompt for a beginner looks more like:&lt;/p&gt; &lt;p&gt;&lt;code&gt;I&#39;m new to JavaScript. I want to build a simple invoice tracker for one freelancer. Give me the smallest working version first. Explain each file. Do not add auth, payments, or a database unless I ask for them.&lt;/code&gt;&lt;/p&gt; &lt;p&gt;That gets you much better results than:&lt;/p&gt; &lt;p&gt;&lt;code&gt;Build me a SaaS.&lt;/code&gt;&lt;/p&gt; &lt;p&gt;The other hard rule is this: never accept code you cannot roughly explain.&lt;/p&gt; &lt;p&gt;You do not need to understand every character. 
But you should understand what the file is for, what data is flowing through it, and how you would debug it if it broke.&lt;/p&gt; &lt;h2&gt;Do not confuse school with progress&lt;/h2&gt; &lt;p&gt;This is the part that annoys people, but I think it is true.&lt;/p&gt; &lt;p&gt;For most aspiring programmers who want to become productive and make money, a computer science degree is now the slow path, not the default path.&lt;/p&gt; &lt;p&gt;There are exceptions:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;you want to do ML research&lt;/li&gt; &lt;li&gt;you want to work on compilers, operating systems, or deeply technical infrastructure&lt;/li&gt; &lt;li&gt;you need a formal credential for immigration, recruiting filters, or personal reasons&lt;/li&gt; &lt;li&gt;you genuinely want the college experience and can afford the time and cost&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;But if your actual goal is “I want to build useful software, get good fast, and maybe freelance, join a startup, or build a small product,” then spending four years outside the market is often a bad trade.&lt;/p&gt; &lt;p&gt;A better plan is:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;start building now&lt;/li&gt; &lt;li&gt;use AI to compress the boring parts&lt;/li&gt; &lt;li&gt;fill in your weak spots as they appear&lt;/li&gt; &lt;li&gt;get paid for increasingly valuable problems&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;You can still learn algorithms, networking, and databases. You should. 
Just do not wait for official permission to become useful.&lt;/p&gt; &lt;h2&gt;What to build if you want to make money&lt;/h2&gt; &lt;p&gt;The fastest money in programming is usually not in clever projects.&lt;/p&gt; &lt;p&gt;It is in boring projects that solve a clear problem.&lt;/p&gt; &lt;p&gt;Good beginner targets:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;a simple website for a local business&lt;/li&gt; &lt;li&gt;a lead capture page tied to email&lt;/li&gt; &lt;li&gt;a small internal dashboard&lt;/li&gt; &lt;li&gt;a form that turns messy input into a clean PDF or CSV&lt;/li&gt; &lt;li&gt;an automation that saves someone an hour a week&lt;/li&gt; &lt;li&gt;a niche directory or calculator site that can rank in search&lt;/li&gt; &lt;li&gt;a tiny CRUD app for one real workflow&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Bad beginner targets:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;the next social network&lt;/li&gt; &lt;li&gt;an AI wrapper with no distribution&lt;/li&gt; &lt;li&gt;a giant multi-tenant SaaS on week one&lt;/li&gt; &lt;li&gt;a startup idea so broad that you cannot finish the first screen&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Money shows up faster when you solve expensive annoyances.&lt;/p&gt; &lt;p&gt;A restaurant owner does not care whether you understand red-black trees. They care whether online bookings stop disappearing. A small agency does not care whether you aced discrete math. 
They care whether their reporting workflow still burns six hours every Friday.&lt;/p&gt; &lt;h2&gt;A realistic 90-day path&lt;/h2&gt; &lt;p&gt;If I had to compress this into one simple plan, it would look like this.&lt;/p&gt; &lt;h3&gt;Month 1&lt;/h3&gt; &lt;p&gt;Learn basic &lt;code&gt;HTML&lt;/code&gt;, &lt;code&gt;CSS&lt;/code&gt;, and &lt;code&gt;JavaScript&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;Build:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;a calculator&lt;/li&gt; &lt;li&gt;a to-do app&lt;/li&gt; &lt;li&gt;a small business landing page&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Use AI constantly, but make it explain everything.&lt;/p&gt; &lt;h3&gt;Month 2&lt;/h3&gt; &lt;p&gt;Pick one project that resembles real work.&lt;/p&gt; &lt;p&gt;Examples:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;invoice tracker&lt;/li&gt; &lt;li&gt;appointment booking site&lt;/li&gt; &lt;li&gt;basic CRM for one person&lt;/li&gt; &lt;li&gt;content planner&lt;/li&gt; &lt;li&gt;stock or price tracker&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Deploy it. Put it on &lt;code&gt;GitHub&lt;/code&gt;. Ask three real people to use it. Fix what breaks.&lt;/p&gt; &lt;h3&gt;Month 3&lt;/h3&gt; &lt;p&gt;Add one backend and one database.&lt;/p&gt; &lt;p&gt;Learn:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;simple APIs&lt;/li&gt; &lt;li&gt;CRUD&lt;/li&gt; &lt;li&gt;auth only if the project really needs it&lt;/li&gt; &lt;li&gt;basic SQL&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Then build one project that could plausibly make money or help someone make money.&lt;/p&gt; &lt;p&gt;By the end of those &lt;code&gt;90&lt;/code&gt; days, you will still have gaps. That is normal. 
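&lt;/p&gt; &lt;p&gt;The Month 3 CRUD step is smaller than it sounds. Here is a minimal sketch for the invoice-tracker idea, with an in-memory &lt;code&gt;Map&lt;/code&gt; standing in for the database; every name here is made up for the example.&lt;/p&gt;

```javascript
// Minimal CRUD for an invoice tracker, no framework.
// A Map stands in for the database table you would add later.
const invoices = new Map();
let nextId = 1;

function createInvoice(client, amount) {
  const id = nextId++;
  invoices.set(id, { id, client, amount, paid: false });
  return id;
}

function readInvoice(id) {
  return invoices.get(id);
}

function updateInvoice(id, changes) {
  const current = invoices.get(id);
  if (!current) return undefined;
  const updated = { ...current, ...changes };
  invoices.set(id, updated);
  return updated;
}

function deleteInvoice(id) {
  return invoices.delete(id);
}

// Once you move to Postgres, each function becomes roughly one statement:
// INSERT INTO ... / SELECT ... / UPDATE ... / DELETE FROM ...

const id = createInvoice('Acme Co', 1200);
updateInvoice(id, { paid: true });
console.log(readInvoice(id)); // { id: 1, client: 'Acme Co', amount: 1200, paid: true }
```

&lt;p&gt;Swapping the &lt;code&gt;Map&lt;/code&gt; for &lt;code&gt;Postgres&lt;/code&gt; later changes the storage lines, not the shape of the app, which is why CRUD is worth learning as a pattern rather than as a framework feature.&lt;/p&gt; &lt;p&gt;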
You will also know more than the person who spent the same three months shopping for the perfect curriculum.&lt;/p&gt; &lt;h2&gt;The shape of a modern beginner&lt;/h2&gt; &lt;p&gt;The beginner who wins in &lt;code&gt;2026&lt;/code&gt; is not the one who refuses AI to prove purity.&lt;/p&gt; &lt;p&gt;It is also not the one who lets AI do everything and learns nothing.&lt;/p&gt; &lt;p&gt;It is the one in the middle:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;practical enough to use every good tool&lt;/li&gt; &lt;li&gt;skeptical enough to check the output&lt;/li&gt; &lt;li&gt;focused enough to finish small projects&lt;/li&gt; &lt;li&gt;humble enough to keep learning fundamentals&lt;/li&gt; &lt;li&gt;market-aware enough to solve problems people will pay for&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Learn enough to build. Build enough to learn more. Use the tools. Keep the loop tight. Get paid for useful work as early as you can.&lt;/p&gt; &lt;p&gt;For most beginners now, that is a better bet than waiting years to feel officially ready.&lt;/p&gt;&lt;!--]--&gt;</description>
</item>
<item>
    <title>Claude Mythos is the first model Anthropic didn&#39;t really release</title>
    <link>https://blog.alcazarsec.com/tech/posts/claude-mythos-the-first-model-anthropic-didnt-really-release</link>
    <pubDate>Thu, 09 Apr 2026 00:00:00 GMT</pubDate>
    <description>&lt;!--[--&gt;&lt;p&gt;Claude Mythos matters for one reason above all: &lt;code&gt;Anthropic&lt;/code&gt; seems to think it trained a model good enough at finding and exploiting bugs that the normal release playbook no longer made sense.&lt;/p&gt; &lt;p&gt;That is more interesting than another few points on &lt;code&gt;GPQA&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;Instead of putting Mythos into broad general availability, Anthropic put it behind &lt;a href=&quot;https://www.anthropic.com/glasswing&quot; rel=&quot;nofollow&quot;&gt;Project Glasswing&lt;/a&gt;, limited access to a small set of defenders and infrastructure companies, published a long cybersecurity writeup on its &lt;a href=&quot;https://red.anthropic.com/2026/mythos-preview/&quot; rel=&quot;nofollow&quot;&gt;Frontier Red Team blog&lt;/a&gt;, and paired it with a public &lt;a href=&quot;https://www.anthropic.com/claude-mythos-preview-risk-report&quot; rel=&quot;nofollow&quot;&gt;alignment risk report&lt;/a&gt;. That does not prove every claim is right. It does tell us Anthropic thinks Mythos is different in a way that matters operationally, not just in marketing. 
The story became public in stages; &lt;a href=&quot;https://fortune.com/2026/03/26/anthropic-says-testing-mythos-powerful-new-ai-model-after-data-leak-reveals-its-existence-step-change-in-capabilities&quot; rel=&quot;nofollow&quot;&gt;Fortune reported&lt;/a&gt; a leak and Anthropic’s confirmation before the full April launch, and &lt;a href=&quot;https://www.securityweek.com/anthropic-unveils-claude-mythos-a-cybersecurity-breakthrough-that-could-also-supercharge-attacks/&quot; rel=&quot;nofollow&quot;&gt;SecurityWeek summarized&lt;/a&gt; the dual-use framing others picked up.&lt;/p&gt; &lt;p&gt;My read is simple:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;code&gt;Claude Mythos&lt;/code&gt; looks like a real step up over &lt;code&gt;Claude Opus 4.6&lt;/code&gt;&lt;/li&gt; &lt;li&gt;the biggest jump appears in agentic coding and cybersecurity, not in casual chat&lt;/li&gt; &lt;li&gt;on AGI timelines, it looks more like acceleration inside the current curve than a clean break from it&lt;/li&gt; &lt;li&gt;the most important missing number is still the one many people care about most: we do not have a public &lt;a href=&quot;https://www.metr.org/time-horizons/&quot; rel=&quot;nofollow&quot;&gt;METR&lt;/a&gt; time-horizon score for Mythos&lt;/li&gt;&lt;/ul&gt; &lt;h2&gt;What Anthropic has actually said&lt;/h2&gt; &lt;p&gt;The basic facts are now public.&lt;/p&gt; &lt;p&gt;Anthropic says &lt;code&gt;Claude Mythos Preview&lt;/code&gt; is a general-purpose frontier model, stronger than its prior public models, but unusually strong at cybersecurity tasks. It says the model found thousands of high-severity vulnerabilities, including bugs in every major operating system and every major browser, and that it can autonomously turn some of those bugs into working exploits. Anthropic also says it does &lt;strong&gt;not&lt;/strong&gt; plan to make Mythos Preview generally available for now. 
Instead, it is giving access to a narrow group of defenders through Project Glasswing, including organizations like &lt;code&gt;AWS&lt;/code&gt;, &lt;code&gt;Google&lt;/code&gt;, &lt;code&gt;Microsoft&lt;/code&gt;, &lt;code&gt;Cisco&lt;/code&gt;, &lt;code&gt;CrowdStrike&lt;/code&gt;, &lt;code&gt;Palo Alto Networks&lt;/code&gt;, &lt;code&gt;JPMorganChase&lt;/code&gt;, and the &lt;code&gt;Linux Foundation&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;That is already unusual.&lt;/p&gt; &lt;p&gt;Labs often say a new model is powerful. They do not often say, in effect, “we are going to keep this one mostly off the public shelf because the cyber capabilities look too spicy.”&lt;/p&gt; &lt;p&gt;Anthropic’s own risk report also says Mythos is the most capable model it has trained, that it is used internally for coding and agentic work, and that overall alignment risk remains “very low, but higher than for previous models.” In other words, Anthropic is not claiming Mythos is rogue. It is claiming the capability jump is large enough that the old comfort level no longer applies automatically.&lt;/p&gt; &lt;p&gt;The company has been telegraphing a cyber inflection for a while. In &lt;a href=&quot;https://www.anthropic.com/research/building-ai-cyber-defenders&quot; rel=&quot;nofollow&quot;&gt;building AI for cyber defenders&lt;/a&gt; it argued models had crossed from theory to practice on security tasks and that defenders needed to adopt AI or cede advantage. 
Glasswing is the operational version of that argument.&lt;/p&gt; &lt;h2&gt;The benchmark picture&lt;/h2&gt; &lt;p&gt;On public numbers, Mythos looks strongest in the places that matter for real software work.&lt;/p&gt; &lt;p&gt;Compared with &lt;code&gt;Opus 4.6&lt;/code&gt;, Anthropic reports (figures from the &lt;a href=&quot;https://www.anthropic.com/glasswing&quot; rel=&quot;nofollow&quot;&gt;Glasswing announcement&lt;/a&gt;):&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;a href=&quot;https://www.swebench.com/verified.html&quot; rel=&quot;nofollow&quot;&gt;SWE-bench Verified&lt;/a&gt;: &lt;code&gt;93.9%&lt;/code&gt; vs &lt;code&gt;80.8%&lt;/code&gt; (the benchmark is a human-filtered subset of real GitHub issues; see the &lt;a href=&quot;https://openreview.net/forum?id=VTF8yNQM66&quot; rel=&quot;nofollow&quot;&gt;SWE-bench paper&lt;/a&gt; and &lt;a href=&quot;https://openai.com/index/introducing-swe-bench-verified&quot; rel=&quot;nofollow&quot;&gt;OpenAI’s Verified writeup&lt;/a&gt;)&lt;/li&gt; &lt;li&gt;&lt;code&gt;SWE-bench Pro&lt;/code&gt;: &lt;code&gt;77.8%&lt;/code&gt; vs &lt;code&gt;53.4%&lt;/code&gt;&lt;/li&gt; &lt;li&gt;&lt;a href=&quot;https://www.tbench.ai/benchmarks/terminal-bench-2&quot; rel=&quot;nofollow&quot;&gt;Terminal-Bench 2.0&lt;/a&gt;: &lt;code&gt;82.0%&lt;/code&gt; vs &lt;code&gt;65.4%&lt;/code&gt; (&lt;a href=&quot;https://arxiv.org/abs/2601.11868&quot; rel=&quot;nofollow&quot;&gt;paper&lt;/a&gt;)&lt;/li&gt; &lt;li&gt;&lt;code&gt;SWE-bench Multilingual&lt;/code&gt;: &lt;code&gt;87.3%&lt;/code&gt; vs &lt;code&gt;77.8%&lt;/code&gt;&lt;/li&gt; &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2311.12022&quot; rel=&quot;nofollow&quot;&gt;GPQA Diamond&lt;/a&gt;: &lt;code&gt;94.6%&lt;/code&gt; vs &lt;code&gt;91.3%&lt;/code&gt;&lt;/li&gt; &lt;li&gt;&lt;a href=&quot;https://lastexam.ai/&quot; rel=&quot;nofollow&quot;&gt;Humanity’s Last Exam&lt;/a&gt;: &lt;code&gt;56.8%&lt;/code&gt; vs &lt;code&gt;40.0%&lt;/code&gt; without tools, 
&lt;code&gt;64.7%&lt;/code&gt; vs &lt;code&gt;53.1%&lt;/code&gt; with tools (&lt;a href=&quot;https://arxiv.org/abs/2501.14249&quot; rel=&quot;nofollow&quot;&gt;paper&lt;/a&gt;)&lt;/li&gt; &lt;li&gt;&lt;a href=&quot;https://openai.com/index/browsecomp&quot; rel=&quot;nofollow&quot;&gt;BrowseComp&lt;/a&gt;: &lt;code&gt;86.9%&lt;/code&gt; vs &lt;code&gt;83.7%&lt;/code&gt; (&lt;a href=&quot;https://arxiv.org/abs/2504.12516&quot; rel=&quot;nofollow&quot;&gt;paper&lt;/a&gt;)&lt;/li&gt; &lt;li&gt;&lt;a href=&quot;https://os-world.github.io/&quot; rel=&quot;nofollow&quot;&gt;OSWorld-Verified&lt;/a&gt;: &lt;code&gt;79.6%&lt;/code&gt; vs &lt;code&gt;72.7%&lt;/code&gt; (&lt;a href=&quot;https://openreview.net/forum?id=tN61DTr4Ed&quot; rel=&quot;nofollow&quot;&gt;benchmark paper&lt;/a&gt;, &lt;a href=&quot;https://xlang.ai/blog/osworld-verified&quot; rel=&quot;nofollow&quot;&gt;verified refresh&lt;/a&gt;)&lt;/li&gt; &lt;li&gt;&lt;a href=&quot;https://www.cybergym.io/&quot; rel=&quot;nofollow&quot;&gt;CyberGym&lt;/a&gt;: &lt;code&gt;83.1%&lt;/code&gt; vs &lt;code&gt;66.6%&lt;/code&gt; (&lt;a href=&quot;https://openreview.net/forum?id=2YvbLQEdYt&quot; rel=&quot;nofollow&quot;&gt;paper&lt;/a&gt;)&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Some of those gains are moderate. Some are not.&lt;/p&gt; &lt;p&gt;The biggest story here is not &lt;code&gt;GPQA&lt;/code&gt;. Going from &lt;code&gt;91.3%&lt;/code&gt; to &lt;code&gt;94.6%&lt;/code&gt; on a hard science benchmark is real, but it is not the part that should reset your mental model.&lt;/p&gt; &lt;p&gt;The bigger story is the cluster:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;big jump on harder software engineering benchmarks&lt;/li&gt; &lt;li&gt;big jump on terminal work&lt;/li&gt; &lt;li&gt;big jump on cybersecurity reproduction&lt;/li&gt; &lt;li&gt;decent jump on computer use&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;That cluster fits the release decision. Anthropic is not behaving like it trained a nicer chatbot. 
It is behaving like it trained a stronger software operator.&lt;/p&gt; &lt;p&gt;There are two caveats worth keeping in your head the whole time:&lt;/p&gt; &lt;p&gt;First, these are mostly vendor-reported numbers.&lt;/p&gt; &lt;p&gt;Second, Anthropic itself notes possible memorization concerns on Humanity’s Last Exam, and says it ran memorization screens on the &lt;code&gt;SWE-bench&lt;/code&gt; family. So these results are informative, but they are not sacred tablets.&lt;/p&gt; &lt;h2&gt;The cyber claims are the real headline&lt;/h2&gt; &lt;p&gt;The most striking claims are not the benchmark tables. They are in Anthropic’s &lt;a href=&quot;https://red.anthropic.com/2026/mythos-preview/&quot; rel=&quot;nofollow&quot;&gt;cyber writeup&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;Anthropic says Mythos can identify and exploit zero-day vulnerabilities across major operating systems and browsers. More concretely, it gives several examples:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;a &lt;code&gt;27&lt;/code&gt;-year-old &lt;code&gt;OpenBSD&lt;/code&gt; bug (Anthropic links a &lt;a href=&quot;https://ftp.openbsd.org/pub/OpenBSD/patches/7.8/common/025_sack.patch.sig&quot; rel=&quot;nofollow&quot;&gt;now-patched advisory path&lt;/a&gt;)&lt;/li&gt; &lt;li&gt;a &lt;code&gt;16&lt;/code&gt;-year-old &lt;code&gt;FFmpeg&lt;/code&gt; vulnerability (fixes landed in &lt;a href=&quot;https://git.ffmpeg.org/gitweb/ffmpeg.git/shortlog/n8.1&quot; rel=&quot;nofollow&quot;&gt;FFmpeg 8.1&lt;/a&gt; among others)&lt;/li&gt; &lt;li&gt;a &lt;code&gt;17&lt;/code&gt;-year-old &lt;code&gt;FreeBSD&lt;/code&gt; remote code execution bug triaged as &lt;a href=&quot;https://nvd.nist.gov/vuln/detail/CVE-2026-4747&quot; rel=&quot;nofollow&quot;&gt;CVE-2026-4747&lt;/a&gt;&lt;/li&gt; &lt;li&gt;multiple exploit chains in the &lt;code&gt;Linux&lt;/code&gt; kernel (one commit Anthropic cites is &lt;a href=&quot;https://github.com/torvalds/linux/commit/e2f78c7ec1655fedd945366151ba54fcb9580508&quot; 
rel=&quot;nofollow&quot;&gt;e2f78c7ec165&lt;/a&gt;)&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;The numbers in the writeup are even more eye-catching than the examples.&lt;/p&gt; &lt;p&gt;Anthropic says that when it re-ran a &lt;code&gt;Firefox 147&lt;/code&gt; exploit-development benchmark, &lt;code&gt;Opus 4.6&lt;/code&gt; produced working exploits only &lt;code&gt;2&lt;/code&gt; times out of several hundred tries, while Mythos produced working exploits &lt;code&gt;181&lt;/code&gt; times and got register control &lt;code&gt;29&lt;/code&gt; more times.&lt;/p&gt; &lt;p&gt;It also says that on roughly &lt;code&gt;7000&lt;/code&gt; entry points from the &lt;code&gt;OSS-Fuzz&lt;/code&gt; corpus, earlier models mostly topped out at low-severity crashes, while Mythos got &lt;code&gt;595&lt;/code&gt; crashes at tiers 1 and 2, a handful at tiers 3 and 4, and &lt;code&gt;10&lt;/code&gt; full control-flow hijacks at tier 5.&lt;/p&gt; &lt;p&gt;If those numbers are broadly right, then the right frame is not “AI is helping with security now.” The right frame is “the cost of searching for serious bugs may be dropping fast enough to change who finds them first.”&lt;/p&gt; &lt;p&gt;That is the scary part and the exciting part at the same time.&lt;/p&gt; &lt;p&gt;Anthropic’s argument is that defenders should still win in the long run, because the same models that find bugs can also patch them, review code, and harden systems at scale. I think that is probably right over a long enough time horizon. But the transition could still be ugly. If model capability improves faster than patching pipelines, testing infrastructure, and boring institutional response, attackers get a window. 
Anthropic follows &lt;a href=&quot;https://www.anthropic.com/coordinated-vulnerability-disclosure&quot; rel=&quot;nofollow&quot;&gt;coordinated vulnerability disclosure&lt;/a&gt; for what it finds; the question is whether the rest of the ecosystem can absorb the pace.&lt;/p&gt; &lt;p&gt;That seems to be exactly what Anthropic is worried about.&lt;/p&gt; &lt;h2&gt;So where does Mythos fit on AGI timelines?&lt;/h2&gt; &lt;p&gt;This is where people start overclaiming.&lt;/p&gt; &lt;p&gt;&lt;code&gt;Dario Amodei&lt;/code&gt; has been publicly arguing that we are near “the end of the exponential” and that something like a “country of geniuses in a data center” could show up on a &lt;code&gt;1-3&lt;/code&gt; year horizon, with very high confidence on a &lt;code&gt;10&lt;/code&gt;-year view. You can hear it in the &lt;a href=&quot;https://www.dwarkesh.com/p/dario-amodei-2&quot; rel=&quot;nofollow&quot;&gt;Dwarkesh Patel interview&lt;/a&gt; and the &lt;a href=&quot;https://www.youtube.com/watch?v=n1E9IZfvGMA&quot; rel=&quot;nofollow&quot;&gt;YouTube recording&lt;/a&gt;. If you read those comments in &lt;code&gt;2024&lt;/code&gt;, they sounded to many people like frontier-lab chest beating. In &lt;code&gt;2026&lt;/code&gt;, Mythos makes them sound more grounded, though still not settled.&lt;/p&gt; &lt;p&gt;The strongest evidence for near-term AGI has never been “the model got a little better at trivia.” It is the idea that models keep getting better at economically real, verifiable, tool-using work. Software is the cleanest example because code lives in text, terminals, test suites, logs, diffs, and reproducible environments. 
It is exactly the kind of world language models can inhabit.&lt;/p&gt; &lt;p&gt;On that front, Mythos is a bullish datapoint.&lt;/p&gt; &lt;p&gt;But it is not a complete one.&lt;/p&gt; &lt;p&gt;The cleanest outside metric we have for long-horizon autonomy is still &lt;a href=&quot;https://www.metr.org/time-horizons/&quot; rel=&quot;nofollow&quot;&gt;METR’s task-completion time horizon&lt;/a&gt;. They explain the methodology in &lt;a href=&quot;https://www.metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/&quot; rel=&quot;nofollow&quot;&gt;Measuring AI ability to complete long tasks&lt;/a&gt; and the underlying &lt;a href=&quot;https://arxiv.org/abs/2503.14499&quot; rel=&quot;nofollow&quot;&gt;paper&lt;/a&gt;. &lt;code&gt;METR&lt;/code&gt; currently has public numbers for &lt;code&gt;Opus 4.6&lt;/code&gt;, which lands around &lt;code&gt;14.5 hours&lt;/code&gt; at the &lt;code&gt;50%&lt;/code&gt; success level on its updated benchmark. That is already a big deal. It is well above where frontier models were not long ago.&lt;/p&gt; &lt;p&gt;The problem is that we do &lt;strong&gt;not&lt;/strong&gt; have a public &lt;code&gt;METR&lt;/code&gt; score for Mythos.&lt;/p&gt; &lt;p&gt;So if someone tells you Mythos proves week-long autonomous software work is here, they are getting ahead of the evidence. The honest answer is narrower:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;Mythos strongly suggests the current curve is still moving&lt;/li&gt; &lt;li&gt;Mythos strengthens the case that software and cyber will keep going first&lt;/li&gt; &lt;li&gt;Mythos does not yet close the loop on broad autonomous work the way a fresh METR score would&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;For a different slice of “autonomy,” &lt;a href=&quot;https://www.metr.org/blog/2024-11-22-evaluating-r-d-capabilities-of-llms/&quot; rel=&quot;nofollow&quot;&gt;RE-Bench&lt;/a&gt; looks at ML research engineering tasks; it is a reminder that “long horizon” means different things in different domains. 
&lt;a href=&quot;https://metr.org/blog/2026-03-12-sabotage-risk-report-opus-4-6-review/&quot; rel=&quot;nofollow&quot;&gt;METR’s review&lt;/a&gt; of Anthropic’s Opus 4.6 sabotage risk report is also worth reading if you care how third parties stress-test lab risk narratives.&lt;/p&gt; &lt;h2&gt;Are we accelerating, or just staying on pace?&lt;/h2&gt; &lt;p&gt;My answer is: both, depending on the slice.&lt;/p&gt; &lt;p&gt;On broad reasoning, Mythos looks like strong continued progress. &lt;code&gt;GPQA&lt;/code&gt;, &lt;code&gt;HLE&lt;/code&gt;, &lt;code&gt;BrowseComp&lt;/code&gt;, and &lt;code&gt;OSWorld&lt;/code&gt; all improve, but they do not scream “new paradigm.”&lt;/p&gt; &lt;p&gt;On agentic coding and cyber, Mythos looks more like acceleration.&lt;/p&gt; &lt;p&gt;That distinction matters. A lot of the public AGI argument gets confused because people average together very different capability domains. If you average everything into one vibe, you miss what is actually changing first.&lt;/p&gt; &lt;p&gt;The sharpest thing to say is this:&lt;/p&gt; &lt;p&gt;Mythos looks ahead of the previous pace where the real world is easiest to score and easiest to monetize: coding, terminals, exploit generation, bug finding, and computer-use loops.&lt;/p&gt; &lt;p&gt;That does not mean the whole AGI debate is over. It does mean the “maybe models are kind of plateauing” story is getting harder to tell with a straight face, at least for software-adjacent work.&lt;/p&gt; &lt;p&gt;Still, I would not call Mythos a clean discontinuity yet. 
It looks more like a steepening continuation of the same story we have been watching since agentic coding started to feel real.&lt;/p&gt; &lt;p&gt;Anthropic’s &lt;a href=&quot;https://www.anthropic.com/research/economic-index-march-2026-report&quot; rel=&quot;nofollow&quot;&gt;Economic Index&lt;/a&gt; shows coding and API automation still dominating real usage; the &lt;a href=&quot;https://www.anthropic.com/research/anthropic-economic-index-january-2026-report&quot; rel=&quot;nofollow&quot;&gt;January primitives report&lt;/a&gt; ties task success to estimated human time in ways that rhyme with METR’s story. None of that is Mythos-specific, but it is context for why software-shaped benchmarks keep biting first.&lt;/p&gt; &lt;h2&gt;A few things you might care about&lt;/h2&gt; &lt;p&gt;The first is benchmark saturation.&lt;/p&gt; &lt;p&gt;Anthropic says Mythos now saturates enough older cyber evaluations that the company has shifted its testing toward real-world zero-day discovery. &lt;a href=&quot;https://cybench.github.io/&quot; rel=&quot;nofollow&quot;&gt;Cybench&lt;/a&gt; (&lt;a href=&quot;https://openreview.net/forum?id=tc90LV0yRL&quot; rel=&quot;nofollow&quot;&gt;paper&lt;/a&gt;) is the kind of CTF-style suite labs have used to track this; when models stop separating on familiar suites, the action moves to messier targets. That is a bigger deal than another leaderboard shuffle.&lt;/p&gt; &lt;p&gt;The second is friction-based defense.&lt;/p&gt; &lt;p&gt;Anthropic explicitly argues that some defenses work mainly by making exploitation tedious, not impossible. That matters because tedious work is exactly what you can throw cheap, parallel model runs at. 
If a mitigation’s main virtue is “this would take a human exploit developer a long weekend,” that mitigation may age badly.&lt;/p&gt; &lt;p&gt;The third is memory-safe triumphalism.&lt;/p&gt; &lt;p&gt;Mythos reportedly found bugs not only in classic &lt;code&gt;C&lt;/code&gt; and &lt;code&gt;C++&lt;/code&gt; targets but also in “memory-safe” systems where unsafe boundaries still existed. Anthropic points to a &lt;a href=&quot;https://github.com/randombit/botan/security/advisories/GHSA-v782-6fq4-q827&quot; rel=&quot;nofollow&quot;&gt;critical Botan TLS issue&lt;/a&gt; as one cryptography-library example. That is not an argument against memory-safe languages. It is an argument against thinking language choice makes security easy. The interesting security question is moving from “does this code use Rust?” to “where are the unsafe seams, protocol assumptions, and logic gaps?”&lt;/p&gt; &lt;p&gt;The fourth is the bottleneck shift.&lt;/p&gt; &lt;p&gt;If models get much better at bug discovery than organizations get at patch validation, rollout, and incident response, the limiting factor stops being finding problems. It becomes operations. Software teams may soon drown less in unknown bugs than in known bugs they cannot fix fast enough.&lt;/p&gt; &lt;h2&gt;My take&lt;/h2&gt; &lt;p&gt;Claude Mythos does not prove &lt;code&gt;AGI&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;It does not prove &lt;code&gt;ASI&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;It does not even prove Anthropic’s strongest cyber claims in the sense a fully independent public evaluation would.&lt;/p&gt; &lt;p&gt;But it does look like the first recent model where the vendor’s own behavior changed in a way that is more informative than the benchmark gains. 
Anthropic is telling us, through its release choices, that it thinks the cyber jump is real enough to warrant a different deployment regime.&lt;/p&gt; &lt;p&gt;That is the part I would take seriously.&lt;/p&gt; &lt;p&gt;If you only look at Mythos as “Opus plus some benchmark uplift,” you miss the main story. The main story is that software security may be entering the phase where scaled model search beats a lot of human intuition, human patience, and older defensive habits.&lt;/p&gt; &lt;p&gt;And if that is true, then Mythos is less a product launch than a warning shot.&lt;/p&gt;&lt;!--]--&gt;</description>
</item>
<item>
    <title>How time-lock puzzles work</title>
    <link>https://blog.alcazarsec.com/tech/posts/time-lock-puzzles-need-hidden-order-groups</link>
    <pubDate>Mon, 30 Mar 2026 00:00:00 GMT</pubDate>
    <description>&lt;!--[--&gt;&lt;p&gt;The core idea behind a time-lock puzzle is simple: hide a secret behind a computation that has to be done in order, one step after another.&lt;/p&gt; &lt;p&gt;That sounds easy until you ask the real question: what kind of computation stays slow even if the attacker has a lot of hardware?&lt;/p&gt; &lt;p&gt;The best practical answer so far is to use a mathematical setting where the obvious shortcut is hidden. The classic version uses RSA-style repeated squaring. The main alternative uses class groups.&lt;/p&gt; &lt;p&gt;The short version is that time-lock puzzles are about &lt;strong&gt;forced waiting by computation&lt;/strong&gt;. You publish something now, but nobody should be able to open it until they have done a long chain of sequential work.&lt;/p&gt; &lt;h2&gt;Explain it like I’m five&lt;/h2&gt; &lt;p&gt;Imagine I lock a note in a box.&lt;/p&gt; &lt;p&gt;But this is a strange box.&lt;/p&gt; &lt;p&gt;It does not open with a key. It opens only after you turn a crank &lt;code&gt;10 million&lt;/code&gt; times.&lt;/p&gt; &lt;p&gt;The catch is that the box only counts turns if you do them one after another. You cannot hire &lt;code&gt;1,000&lt;/code&gt; people to each do &lt;code&gt;10,000&lt;/code&gt; turns at the same time. The box ignores that trick.&lt;/p&gt; &lt;p&gt;A time-lock puzzle is the math version of that box.&lt;/p&gt; &lt;p&gt;You take a secret, lock it behind a long chain of steps, and publish the locked version. 
Anyone can open it eventually, but only after doing the steps in order.&lt;/p&gt; &lt;p&gt;That is the whole idea.&lt;/p&gt; &lt;p&gt;The rest of the article is about the hard part: how to build a mathematical box that really behaves like that.&lt;/p&gt; &lt;h2&gt;What a time-lock puzzle is trying to do&lt;/h2&gt; &lt;p&gt;Suppose Alice wants to publish an encrypted message now that should only become readable after roughly one week of computation.&lt;/p&gt; &lt;p&gt;She does &lt;strong&gt;not&lt;/strong&gt; want to trust a server to release a key next week.&lt;/p&gt; &lt;p&gt;She wants the delay to come from math itself.&lt;/p&gt; &lt;p&gt;So the puzzle needs two properties:&lt;/p&gt; &lt;ol&gt;&lt;li&gt;It should be fast to create.&lt;/li&gt; &lt;li&gt;It should be slow to solve, even for someone with parallel hardware.&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;That second point is the hard part. A lot of cryptographic work is expensive, but still parallelizable. If I can split the work across 1,000 machines, it is not a good time-lock puzzle.&lt;/p&gt; &lt;p&gt;The ideal puzzle forces a long chain of steps where step &lt;code&gt;i+1&lt;/code&gt; depends on the output of step &lt;code&gt;i&lt;/code&gt;. No shortcuts. No broad parallel speedup.&lt;/p&gt; &lt;p&gt;That is why time-lock puzzles are closely related to later ideas like &lt;a href=&quot;https://eprint.iacr.org/2018/601&quot; rel=&quot;nofollow&quot;&gt;verifiable delay functions&lt;/a&gt;, or &lt;code&gt;VDFs&lt;/code&gt;. A VDF is basically a delay function with a cheap proof that the output is correct. Time-lock puzzles came first. 
VDFs cleaned up the idea and made it easier to use in protocols.&lt;/p&gt; &lt;h2&gt;What people use them for&lt;/h2&gt; &lt;p&gt;The original 1996 paper already gave some practical examples:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;sealed-bid auctions, where bids should stay secret until bidding closes&lt;/li&gt; &lt;li&gt;voting and contract-signing style protocols, where early disclosure would be unfair&lt;/li&gt; &lt;li&gt;timed release of private material, like a diary, archive, or message meant for the future&lt;/li&gt; &lt;li&gt;delayed access in escrow-like systems&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Modern papers add a few more:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;timed commitments&lt;/li&gt; &lt;li&gt;public randomness systems&lt;/li&gt; &lt;li&gt;blockchain protocols that need a source of delay nobody can cheaply skip&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Some of these are more common in research papers than in mainstream products. That is worth saying plainly. Time-lock puzzles are influential, but they are still more common as a cryptographic building block than as a feature ordinary users interact with every day.&lt;/p&gt; &lt;h2&gt;The classic RSA construction&lt;/h2&gt; &lt;p&gt;The original practical construction comes from Rivest, Shamir, and Wagner’s 1996 paper on &lt;a href=&quot;https://dspace.mit.edu/handle/1721.1/149822&quot; rel=&quot;nofollow&quot;&gt;time-lock puzzles and timed-release crypto&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;The basic trick is elegant.&lt;/p&gt; &lt;p&gt;Start with an RSA modulus &lt;code&gt;N = p * q&lt;/code&gt;. The puzzle creator knows &lt;code&gt;p&lt;/code&gt; and &lt;code&gt;q&lt;/code&gt;, so they know &lt;code&gt;phi(N)&lt;/code&gt;. Everyone else only sees &lt;code&gt;N&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;Now pick some base &lt;code&gt;a&lt;/code&gt;, and define the hard part of the puzzle as:&lt;/p&gt; &lt;p&gt;&lt;code&gt;a^(2^T) mod N&lt;/code&gt;&lt;/p&gt; &lt;p&gt;Here &lt;code&gt;T&lt;/code&gt; is the delay parameter. 
If &lt;code&gt;T&lt;/code&gt; is huge, the obvious way to compute this is to square &lt;code&gt;a&lt;/code&gt; mod &lt;code&gt;N&lt;/code&gt;, then square the result, then square again, for &lt;code&gt;T&lt;/code&gt; rounds.&lt;/p&gt; &lt;p&gt;That is the sequential chain:&lt;/p&gt; &lt;p&gt;&lt;code&gt;a -&gt; a^2 -&gt; a^4 -&gt; a^8 -&gt; ... -&gt; a^(2^T) mod N&lt;/code&gt;&lt;/p&gt; &lt;p&gt;The puzzle setter can cheat, in the good sense, because they know &lt;code&gt;phi(N)&lt;/code&gt;. They can reduce the exponent modulo &lt;code&gt;phi(N)&lt;/code&gt; and compute the final answer quickly. A solver who does &lt;strong&gt;not&lt;/strong&gt; know the factorization of &lt;code&gt;N&lt;/code&gt; is believed to be stuck doing the long chain of squarings.&lt;/p&gt; &lt;p&gt;So the sender does this:&lt;/p&gt; &lt;ol&gt;&lt;li&gt;Generate a symmetric key &lt;code&gt;k&lt;/code&gt;.&lt;/li&gt; &lt;li&gt;Encrypt the message with &lt;code&gt;k&lt;/code&gt;.&lt;/li&gt; &lt;li&gt;Hide &lt;code&gt;k&lt;/code&gt; using the value &lt;code&gt;a^(2^T) mod N&lt;/code&gt;.&lt;/li&gt; &lt;li&gt;Publish the puzzle.&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;Anybody can eventually recover &lt;code&gt;k&lt;/code&gt;, but only after doing about &lt;code&gt;T&lt;/code&gt; sequential squarings.&lt;/p&gt; &lt;p&gt;This is the original time-lock puzzle in one sentence: &lt;strong&gt;hide the key behind repeated squaring in an RSA group, where the setter knows a shortcut and the solver is not supposed to have one.&lt;/strong&gt;&lt;/p&gt; &lt;h2&gt;Why this works at all&lt;/h2&gt; &lt;p&gt;RSA is usually introduced as a public-key system. 
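&lt;/p&gt; &lt;p&gt;Before the why, it helps to see the whole construction end to end. The sketch below uses deliberately tiny primes and a toy XOR mask in place of real key wrapping, purely for illustration; real puzzles use RSA-sized moduli and proper symmetric encryption.&lt;/p&gt;

```python
# Toy Rivest-Shamir-Wagner puzzle. The primes are absurdly small and
# the XOR mask is a stand-in for real key wrapping: illustration only.
p, q = 10_007, 10_009            # setter's secret primes
N = p * q                        # public modulus
phi = (p - 1) * (q - 1)          # group order, known only to the setter
T = 100_000                      # delay parameter: squarings required
a = 2                            # public base, coprime to N

# Setter's shortcut: reduce the exponent 2^T modulo phi(N) first,
# so the answer costs only two fast modular exponentiations.
answer_fast = pow(a, pow(2, T, phi), N)

key = 123_456_789                # symmetric key the message is under
key_blob = key ^ answer_fast     # published along with (N, a, T)

# Solver's only known path: T squarings, each needing the previous one.
x = a
for _ in range(T):
    x = (x * x) % N

assert x == answer_fast          # same value, vastly different cost
print("recovered key:", key_blob ^ x)
```

&lt;p&gt;The assert is the whole story: the setter reaches the answer with two cheap exponentiations, the solver only by grinding through the full chain.&lt;/p&gt; &lt;p&gt;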
Time-lock puzzles use a different feature.&lt;/p&gt; &lt;p&gt;They use the fact that the multiplicative group modulo &lt;code&gt;N&lt;/code&gt; has &lt;strong&gt;hidden order&lt;/strong&gt; if you do not know the factorization of &lt;code&gt;N&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;That hidden order is the key idea.&lt;/p&gt; &lt;p&gt;If the solver knew the group order, they could often reduce huge exponents modulo that order and skip the long sequence. The delay would collapse. The reason the RSA puzzle has a chance is that &lt;code&gt;phi(N)&lt;/code&gt; is hidden.&lt;/p&gt; &lt;p&gt;This “hidden-order group” view is more useful than thinking “time-lock puzzles use RSA.” RSA is just one concrete way to get a hidden-order group.&lt;/p&gt; &lt;p&gt;Once you see that, a lot of the literature makes more sense:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;RSA groups are the classic hidden-order choice.&lt;/li&gt; &lt;li&gt;Class groups are the main non-RSA hidden-order choice.&lt;/li&gt; &lt;li&gt;Known-order groups are usually bad news for generic delay constructions.&lt;/li&gt;&lt;/ul&gt; &lt;h2&gt;The main alternative: class groups&lt;/h2&gt; &lt;p&gt;If hidden order is what you want, RSA is not the only option.&lt;/p&gt; &lt;p&gt;The main alternative is the class group of an imaginary quadratic field. That phrase sounds much scarier than the high-level idea.&lt;/p&gt; &lt;p&gt;You can think of class groups here as another algebraic setting where:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;the group order is hidden&lt;/li&gt; &lt;li&gt;repeated operations can still be computed&lt;/li&gt; &lt;li&gt;there is no need to generate an RSA modulus with secret factors&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;That last point is the big selling point.&lt;/p&gt; &lt;p&gt;RSA-based delay systems usually need a trusted setup story. Someone has to generate &lt;code&gt;N = p * q&lt;/code&gt;, and if they keep &lt;code&gt;p&lt;/code&gt; and &lt;code&gt;q&lt;/code&gt;, they keep the shortcut forever. 
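&lt;/p&gt; &lt;p&gt;The earlier bullet about known-order groups is worth seeing concretely: the moment the group order is public, the setter’s private shortcut belongs to everyone, and the delay evaporates. A toy check, assuming nothing beyond Fermat’s little theorem:&lt;/p&gt;

```python
# In the units mod a prime p, the group order p - 1 is public, so
# ANYONE can shortcut a^(2^T) mod p instead of doing T squarings.
p = 2_147_483_647                # Mersenne prime; order p - 1 is known
a = 3
T = 1_000                        # pretend-delay: T sequential squarings

# Slow path: the sequential chain the puzzle is supposed to force.
slow = a
for _ in range(T):
    slow = (slow * slow) % p

# Shortcut: reduce the exponent modulo the known group order.
fast = pow(a, pow(2, T, p - 1), p)

assert slow == fast              # no hidden order, no enforced delay
```

&lt;p&gt;The RSA puzzle survives only because &lt;code&gt;phi(N)&lt;/code&gt; plays the role of &lt;code&gt;p - 1&lt;/code&gt; here and stays hidden.&lt;/p&gt; &lt;p&gt;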
Class groups avoid that particular trust problem. That is a big reason they became attractive in the VDF literature, and surveys like &lt;a href=&quot;https://ir.cwi.nl/pub/31559&quot; rel=&quot;nofollow&quot;&gt;this one from CWI&lt;/a&gt; treat them as one of the main practical hidden-order candidates.&lt;/p&gt; &lt;p&gt;The tradeoff is that class-group arithmetic is less familiar and usually less mature in engineering terms than plain modular arithmetic modulo an RSA integer.&lt;/p&gt; &lt;p&gt;So the practical picture today is roughly:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;RSA groups are simple and well understood, but setup is touchy.&lt;/li&gt; &lt;li&gt;Class groups avoid trusted RSA setup, but they are more specialized.&lt;/li&gt;&lt;/ul&gt; &lt;h2&gt;A quick note on elliptic curves&lt;/h2&gt; &lt;p&gt;Since this is a common question: standard elliptic-curve groups are usually not where classic time-lock puzzles come from.&lt;/p&gt; &lt;p&gt;The short reason is that their group order is known, which tends to destroy the kind of “no shortcut” behavior these puzzles need. There is theory behind that, including Rotem, Segev, and Shahaf’s &lt;a href=&quot;https://eprint.iacr.org/2020/225&quot; rel=&quot;nofollow&quot;&gt;2020 result&lt;/a&gt; on hidden-order groups.&lt;/p&gt; &lt;p&gt;There are still curve-related delay constructions in the literature, especially around &lt;strong&gt;isogenies&lt;/strong&gt; and &lt;strong&gt;hyperelliptic curves&lt;/strong&gt;. For example, there is a line of work on VDFs and delay encryption where the slow step is walking through a long chain of isogenies rather than doing repeated squaring in a hidden-order group, including &lt;a href=&quot;https://cybersecurity.springeropen.com/articles/10.1186/s42400-023-00189-2&quot; rel=&quot;nofollow&quot;&gt;this hyperelliptic-curve paper&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;But that is a different world from ordinary ECC. 
It is better to think of those as alternative delay constructions, not as a simple drop-in replacement for the original RSA-style puzzle.&lt;/p&gt; &lt;h2&gt;Other directions in the literature&lt;/h2&gt; &lt;p&gt;Once the original RSA puzzle existed, researchers pushed in a few directions.&lt;/p&gt; &lt;p&gt;One line asked whether time-lock puzzles can be built from broader assumptions, such as randomized encodings or random oracles. Some of that work is mostly theoretical. It helps explain what is possible in principle, but it has not displaced hidden-order groups in practice.&lt;/p&gt; &lt;p&gt;Another line focused on &lt;strong&gt;security models&lt;/strong&gt;. Katz, Loss, and Xu’s &lt;a href=&quot;https://eprint.iacr.org/2020/730&quot; rel=&quot;nofollow&quot;&gt;2020 paper&lt;/a&gt; studies the security of time-lock puzzles and timed commitments more carefully, including the classic sequential-squaring assumption.&lt;/p&gt; &lt;p&gt;Another line focused on &lt;strong&gt;verifiability&lt;/strong&gt;. A plain time-lock puzzle tells you “wait and solve.” A VDF adds an efficient way to verify that the solver’s output is right. 
That is why VDFs became popular in decentralized systems, randomness beacons, and blockchain protocol design.&lt;/p&gt; &lt;p&gt;And very recent work keeps adding features, like &lt;a href=&quot;https://eprint.iacr.org/2025/225&quot; rel=&quot;nofollow&quot;&gt;verifiable time-lock puzzles&lt;/a&gt;, where the solver can know in advance that the thing they will eventually reveal is actually useful.&lt;/p&gt; &lt;p&gt;The basic architecture, though, has not changed that much:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;get a sequential process&lt;/li&gt; &lt;li&gt;avoid shortcut structure&lt;/li&gt; &lt;li&gt;preferably make verification cheap&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Hidden-order groups still sit near the center of that story.&lt;/p&gt; &lt;h2&gt;The annoying part: “time” is only approximate&lt;/h2&gt; &lt;p&gt;Time-lock puzzles do not measure wall-clock time directly. They measure how long a particular sequential computation takes on available hardware.&lt;/p&gt; &lt;p&gt;That means the “unlocks in 7 days” story is always approximate.&lt;/p&gt; &lt;p&gt;A faster machine will solve earlier. A slower machine will solve later. Hardware progress changes the estimate. Specialized implementations can change it too.&lt;/p&gt; &lt;p&gt;This is one reason trusted-release systems still exist. If you need a message to open at exactly 09:00 UTC next Tuesday, pure time-lock puzzles are awkward. If you want “this should take a lot of sequential work and no one should be able to shortcut it cheaply,” then they are much more natural.&lt;/p&gt; &lt;h2&gt;My take&lt;/h2&gt; &lt;p&gt;The cleanest way to understand time-lock puzzles is this:&lt;/p&gt; &lt;p&gt;They are a search for &lt;strong&gt;sequentiality without trust&lt;/strong&gt;.&lt;/p&gt; &lt;p&gt;The original RSA puzzle works because hidden order blocks the obvious exponent shortcut. Class groups matter because they give you hidden order without RSA setup. 
That is the conceptual center of the field.&lt;/p&gt; &lt;p&gt;For classic time-lock puzzles, the real dividing line is not old crypto versus new crypto. It is &lt;strong&gt;hidden order versus known order&lt;/strong&gt;.&lt;/p&gt;&lt;!--]--&gt;</description>
</item>
<item>
    <title>Hermes vs OpenClaw: what should you actually start with?</title>
    <link>https://blog.alcazarsec.com/tech/posts/hermes-vs-openclaw</link>
    <pubDate>Mon, 23 Mar 2026 00:00:00 GMT</pubDate>
    <description>&lt;!--[--&gt;&lt;p&gt;If you want the short answer first, here it is:&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Most people should start with &lt;code&gt;OpenClaw&lt;/code&gt;. Pick &lt;code&gt;Hermes&lt;/code&gt; instead if you care more about learning over time, stronger safety rails, and running the agent in a cleaner sandboxed environment.&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;That is the honest answer after looking at the official &lt;a href=&quot;https://docs.openclaw.ai/&quot; rel=&quot;nofollow&quot;&gt;OpenClaw docs&lt;/a&gt;, the official &lt;a href=&quot;https://hermes-agent.nousresearch.com/docs&quot; rel=&quot;nofollow&quot;&gt;Hermes docs&lt;/a&gt;, the public discussion, and the benchmark material.&lt;/p&gt; &lt;p&gt;&lt;code&gt;OpenClaw&lt;/code&gt; is easier to recommend as the default because it has a much larger community, much more visible real-world usage, and a more mature “message your agent from anywhere” product shape. &lt;code&gt;Hermes&lt;/code&gt; is the more interesting design if you want an agent that builds memory and reusable skills as it works, and if you want a security model that feels more deliberate out of the box.&lt;/p&gt; &lt;p&gt;The mistake is thinking one of them simply crushes the other.&lt;/p&gt; &lt;p&gt;They are close enough that your starting point should depend less on hype and more on the kind of agent you want to live with.&lt;/p&gt; &lt;h2&gt;What they are, in plain English&lt;/h2&gt; &lt;p&gt;&lt;code&gt;OpenClaw&lt;/code&gt; is a self-hosted AI assistant gateway. You run it on your own machine or server, connect it to chat apps like WhatsApp, Telegram, Discord, Slack, or iMessage, and it becomes an always-available assistant you can message like a person. Its docs frame it as your own personal AI assistant, and that is the right mental model.&lt;/p&gt; &lt;p&gt;&lt;code&gt;Hermes Agent&lt;/code&gt;, from Nous Research, is also a self-hosted AI agent you can talk to through chat and other interfaces. 
The big difference is what it emphasizes. Hermes is built around a learning loop. It tries to remember useful things, turn repeated work into reusable skills, and get better over time instead of just executing the next prompt well.&lt;/p&gt; &lt;p&gt;So the simplest framing is this:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;code&gt;OpenClaw&lt;/code&gt; is more gateway-first and product-first.&lt;/li&gt; &lt;li&gt;&lt;code&gt;Hermes&lt;/code&gt; is more agent-first and learning-first.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;That difference shows up everywhere else.&lt;/p&gt; &lt;h2&gt;What OpenClaw is better for&lt;/h2&gt; &lt;p&gt;&lt;code&gt;OpenClaw&lt;/code&gt; is the better starting point if you want an assistant that feels real quickly.&lt;/p&gt; &lt;p&gt;Its strengths are pretty straightforward:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;huge community and momentum&lt;/li&gt; &lt;li&gt;lots of examples, tutorials, and public discussion&lt;/li&gt; &lt;li&gt;strong support for messaging-first workflows&lt;/li&gt; &lt;li&gt;a simple mental model: one self-hosted assistant you can reach from your normal chat apps&lt;/li&gt; &lt;li&gt;a public benchmark, &lt;a href=&quot;https://openclaw.report/ecosystem/pinchbench-openclaw-benchmark&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;PinchBench&lt;/code&gt;&lt;/a&gt;, that tests models inside an OpenClaw-style runtime instead of only in abstract evals&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;That last point matters more than it sounds. Most AI benchmarks measure the model in isolation. &lt;code&gt;PinchBench&lt;/code&gt; measures how models behave while doing actual OpenClaw tasks such as file work, data tasks, web research, memory use, and tool calls. That does not prove OpenClaw is “better” than Hermes, but it does mean OpenClaw has a more visible performance culture around real usage.&lt;/p&gt; &lt;p&gt;There is also a social proof effect here. Public conversations around OpenClaw are no longer just demos. 
People are describing real team workflows with it in shared chats, daily digests, standups, monitoring, and lightweight operations work on &lt;a href=&quot;https://news.ycombinator.com/item?id=47148380&quot; rel=&quot;nofollow&quot;&gt;Hacker News&lt;/a&gt;. That is usually a sign a tool has crossed from novelty into habit.&lt;/p&gt; &lt;p&gt;If your goal is:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;“I want a personal or team AI assistant I can message from anywhere”&lt;/li&gt; &lt;li&gt;“I want the bigger ecosystem”&lt;/li&gt; &lt;li&gt;“I want the most obvious place to start this weekend”&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;then &lt;code&gt;OpenClaw&lt;/code&gt; is the better fit.&lt;/p&gt; &lt;h2&gt;Where OpenClaw is weaker&lt;/h2&gt; &lt;p&gt;The biggest issue is security posture.&lt;/p&gt; &lt;p&gt;OpenClaw’s own &lt;a href=&quot;https://docs.openclaw.ai/security&quot; rel=&quot;nofollow&quot;&gt;security docs&lt;/a&gt; are clear that it is designed around a personal-assistant trust model, not a hostile multi-tenant environment. In plain English, it assumes the people and systems around one gateway mostly trust each other. That is reasonable for a personal assistant. 
It is a much shakier fit if you want strong separation between unrelated users, risky credentials, or loosely controlled team access.&lt;/p&gt; &lt;p&gt;There is also a pattern in public discussion that is worth taking seriously: people love what OpenClaw makes possible, but some are tired of rough edges, bugs, and the sheer amount of authority these agents can get if you are not careful.&lt;/p&gt; &lt;p&gt;So the cons are not subtle:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;its power can outrun its safety if you expose it carelessly&lt;/li&gt; &lt;li&gt;it is easier to treat like a shared bot than it really should be&lt;/li&gt; &lt;li&gt;some users still see it as rough around the edges&lt;/li&gt; &lt;li&gt;the benchmark story is stronger than the safety story&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;If you are the kind of user who always asks “what is the blast radius if this goes wrong?”, OpenClaw should make you pause before it makes you excited.&lt;/p&gt; &lt;h2&gt;What Hermes is better for&lt;/h2&gt; &lt;p&gt;&lt;code&gt;Hermes&lt;/code&gt; is better when you want the agent itself to improve, not just answer.&lt;/p&gt; &lt;p&gt;According to the official docs, Hermes has a built-in learning loop, persistent memory across sessions, autonomous skill creation, skill self-improvement, and a deeper user model. That is the core bet. 
Instead of giving you one smart assistant that stays roughly the same, Hermes tries to become more useful the longer it runs.&lt;/p&gt; &lt;p&gt;That can be a big deal if your use case looks like this:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;repeated research tasks&lt;/li&gt; &lt;li&gt;recurring ops work&lt;/li&gt; &lt;li&gt;long-running personal workflows&lt;/li&gt; &lt;li&gt;workflows where remembering your preferences actually matters&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Hermes also looks better on security design.&lt;/p&gt; &lt;p&gt;Its &lt;a href=&quot;https://hermes-agent.nousresearch.com/docs/user-guide/security&quot; rel=&quot;nofollow&quot;&gt;security docs&lt;/a&gt; describe a defense-in-depth model with several layers: user authorization, approval for dangerous commands, container isolation, MCP credential filtering, and context-file scanning for prompt injection. It also supports more clearly isolated execution backends such as Docker, Modal, Daytona, Singularity, SSH, and local execution.&lt;/p&gt; &lt;p&gt;That does not make Hermes magically safe. No agent with tools is magically safe. But the design is more obviously trying to narrow the damage when things go wrong.&lt;/p&gt; &lt;p&gt;If your goal is:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;“I want the agent to build up memory and reusable skills”&lt;/li&gt; &lt;li&gt;“I care a lot about sandboxing and operational control”&lt;/li&gt; &lt;li&gt;“I am comfortable with a smaller ecosystem in exchange for a stronger architecture”&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;then &lt;code&gt;Hermes&lt;/code&gt; becomes very compelling.&lt;/p&gt; &lt;h2&gt;Where Hermes is weaker&lt;/h2&gt; &lt;p&gt;The clearest drawback is ecosystem maturity.&lt;/p&gt; &lt;p&gt;Hermes looks promising, but it is still much smaller in public adoption than OpenClaw. The GitHub gap alone is large. OpenClaw is sitting above &lt;code&gt;330k&lt;/code&gt; stars. Hermes is a little above &lt;code&gt;10k&lt;/code&gt;. 
Star counts are not truth, but they do tell you where the docs, examples, integrations, community habits, and troubleshooting gravity are likely to be.&lt;/p&gt; &lt;p&gt;The second drawback is that Hermes is easier to admire than to verify.&lt;/p&gt; &lt;p&gt;Its learning loop is a strong idea. Its environment and benchmark tooling are serious, as shown in the &lt;a href=&quot;https://hermes-agent.nousresearch.com/docs/developer-guide/environments&quot; rel=&quot;nofollow&quot;&gt;benchmark framework docs&lt;/a&gt;. But there is much less public evidence showing a clear, widely accepted leaderboard or a large body of third-party production stories proving that Hermes consistently beats the alternatives in day-to-day use.&lt;/p&gt; &lt;p&gt;That means the Hermes case is partly conceptual right now. You are buying into the design, not just the size of the installed base.&lt;/p&gt; &lt;p&gt;The tradeoff is simple:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;code&gt;Hermes&lt;/code&gt; looks more thoughtful&lt;/li&gt; &lt;li&gt;&lt;code&gt;OpenClaw&lt;/code&gt; looks more proven&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;For many buyers, that single contrast is the whole decision.&lt;/p&gt; &lt;h2&gt;What people seem to be recommending&lt;/h2&gt; &lt;p&gt;The public recommendation pattern is surprisingly consistent.&lt;/p&gt; &lt;p&gt;People reach for &lt;code&gt;OpenClaw&lt;/code&gt; when they want an assistant that lives in chat, works across channels, and feels immediately usable. The most enthusiastic OpenClaw users are not talking about abstract agent theory. They are talking about having an AI teammate in a group chat, letting it run daily standups, summarize work, watch competitors, or help with internal tasks.&lt;/p&gt; &lt;p&gt;People who lean toward &lt;code&gt;Hermes&lt;/code&gt; tend to want something a bit more disciplined. 
The appeal is less “look what I can wire into WhatsApp tonight” and more “I want an agent that remembers, improves, and runs inside a setup I can reason about.” Even the small &lt;a href=&quot;https://news.ycombinator.com/item?id=47264225&quot; rel=&quot;nofollow&quot;&gt;Hacker News thread on Hermes&lt;/a&gt; leans in that direction.&lt;/p&gt; &lt;p&gt;That is a useful clue.&lt;/p&gt; &lt;p&gt;&lt;code&gt;OpenClaw&lt;/code&gt; is what people recommend when they want momentum and usability.&lt;/p&gt; &lt;p&gt;&lt;code&gt;Hermes&lt;/code&gt; is what people recommend when they are a little more skeptical, a little more security-conscious, or a little more interested in the long-term shape of the system.&lt;/p&gt; &lt;h2&gt;What the benchmarks are actually saying&lt;/h2&gt; &lt;p&gt;This is where a lot of comparisons get muddy.&lt;/p&gt; &lt;p&gt;There is no clean, universal public benchmark that says “&lt;code&gt;Hermes&lt;/code&gt; scored 82 and &lt;code&gt;OpenClaw&lt;/code&gt; scored 78, case closed.”&lt;/p&gt; &lt;p&gt;That is not how this market works yet.&lt;/p&gt; &lt;p&gt;Here is the honest version:&lt;/p&gt; &lt;p&gt;&lt;code&gt;OpenClaw&lt;/code&gt; has the more visible product-native benchmark story today because of &lt;code&gt;PinchBench&lt;/code&gt;. That benchmark tests models inside real OpenClaw-style tasks, and the current leaderboard shows strong frontier models such as &lt;code&gt;NVIDIA Nemotron-3-Super-120B&lt;/code&gt;, &lt;code&gt;Claude Opus 4.6&lt;/code&gt;, and &lt;code&gt;GPT-5.4&lt;/code&gt; performing well in that environment.&lt;/p&gt; &lt;p&gt;&lt;code&gt;Hermes&lt;/code&gt; has something different. Its docs show a serious benchmark and training framework with support for evaluations like &lt;code&gt;TerminalBench2&lt;/code&gt;, &lt;code&gt;TBLite&lt;/code&gt;, and &lt;code&gt;YC-Bench&lt;/code&gt;. That tells you Hermes is built by people who care about agent evaluation. 
But it is more an evaluation harness than a single public scoreboard that settles the product comparison for you.&lt;/p&gt; &lt;p&gt;So the benchmark takeaway is:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;code&gt;OpenClaw&lt;/code&gt; has better public evidence for “which model works well inside this runtime?”&lt;/li&gt; &lt;li&gt;&lt;code&gt;Hermes&lt;/code&gt; has better public evidence for “this project takes agent training and evaluation seriously”&lt;/li&gt; &lt;li&gt;neither benchmark story cleanly proves one product is the universal winner&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;That may sound unsatisfying, but it is actually useful. It keeps you from asking a benchmark to answer a product question it cannot really answer.&lt;/p&gt; &lt;h2&gt;So what should you start with?&lt;/h2&gt; &lt;p&gt;If you want one recommendation, here it is:&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Start with &lt;code&gt;OpenClaw&lt;/code&gt; unless you already know you want Hermes’ learning loop and security posture.&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;That is the best default for most technical users.&lt;/p&gt; &lt;p&gt;Because the first thing most people need is not the most elegant agent architecture. They need an agent they can get running, understand, and use. &lt;code&gt;OpenClaw&lt;/code&gt; wins there. It has more momentum, more examples, more discussion, and a clearer path from installation to “this is actually useful.”&lt;/p&gt; &lt;p&gt;But the exception matters.&lt;/p&gt; &lt;p&gt;If you are the sort of person who immediately worries about command approval, container boundaries, prompt injection through context files, or whether the system gets better after a month of real work, then skip the default and start with &lt;code&gt;Hermes&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;That is not a niche concern. 
It is a real one.&lt;/p&gt; &lt;h2&gt;The simplest decision rule&lt;/h2&gt; &lt;p&gt;Use this and move on:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;Start with &lt;code&gt;OpenClaw&lt;/code&gt; if you want the fastest path to a useful self-hosted assistant in chat apps.&lt;/li&gt; &lt;li&gt;Start with &lt;code&gt;Hermes&lt;/code&gt; if you want a more opinionated, learning-oriented, safety-conscious agent stack.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;If you still cannot decide, that usually means you should start with &lt;code&gt;OpenClaw&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;It is the lower-friction bet.&lt;/p&gt; &lt;p&gt;And if you outgrow it, you will know exactly why you are moving.&lt;/p&gt;&lt;!--]--&gt;</description>
</item>
<item>
    <title>What is the best agentic AI today?</title>
    <link>https://blog.alcazarsec.com/tech/posts/openclaw-vs-nemoclaw</link>
    <pubDate>Fri, 20 Mar 2026 00:00:00 GMT</pubDate>
    <description>&lt;!--[--&gt;&lt;p&gt;Agentic AI is getting crowded fast.&lt;/p&gt; &lt;p&gt;One week everyone is talking about &lt;code&gt;OpenClaw&lt;/code&gt;. The next week it is &lt;code&gt;NVIDIA NemoClaw&lt;/code&gt;. Then someone insists the real answer is &lt;code&gt;LangGraph&lt;/code&gt;, or &lt;code&gt;OpenHands&lt;/code&gt;, or just “pick the best model and build the rest yourself.”&lt;/p&gt; &lt;p&gt;The useful answer is simpler than that noise makes it sound:&lt;/p&gt; &lt;p&gt;&lt;strong&gt;There are real leaders right now. You just have to be clear about which layer you mean.&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;If you want a self-hosted personal assistant you can text from your phone, &lt;code&gt;OpenClaw&lt;/code&gt; is the strongest product in that category today. If you want the same basic idea with tighter security controls, &lt;code&gt;NVIDIA NemoClaw&lt;/code&gt; is the most interesting next step, but it is still early and NVIDIA says not to use it in production yet. If you are building a production system, the best default is still a strong model inside a more controlled orchestration layer such as &lt;code&gt;LangGraph&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;That is not a cop-out. 
It is the market taking shape.&lt;/p&gt; &lt;h2&gt;The first thing to understand&lt;/h2&gt; &lt;p&gt;When people talk about the “best agent,” they often mix together three different layers:&lt;/p&gt; &lt;ol&gt;&lt;li&gt;The product or runtime you install and use, such as &lt;a href=&quot;https://docs.openclaw.ai/index&quot; rel=&quot;nofollow&quot;&gt;OpenClaw&lt;/a&gt;, &lt;a href=&quot;https://docs.nvidia.com/nemoclaw/latest/&quot; rel=&quot;nofollow&quot;&gt;NVIDIA NemoClaw&lt;/a&gt;, &lt;a href=&quot;https://github.com/OpenHands/OpenHands&quot; rel=&quot;nofollow&quot;&gt;OpenHands&lt;/a&gt;, or &lt;a href=&quot;https://opencode.ai/&quot; rel=&quot;nofollow&quot;&gt;OpenCode&lt;/a&gt;.&lt;/li&gt; &lt;li&gt;The orchestration framework used to build agent systems, such as &lt;a href=&quot;https://langchain-ai.github.io/langgraph/&quot; rel=&quot;nofollow&quot;&gt;LangGraph&lt;/a&gt;, &lt;a href=&quot;https://microsoft.github.io/autogen/stable/&quot; rel=&quot;nofollow&quot;&gt;AutoGen&lt;/a&gt;, or &lt;a href=&quot;https://github.com/crewAIInc/crewAI&quot; rel=&quot;nofollow&quot;&gt;CrewAI&lt;/a&gt;.&lt;/li&gt; &lt;li&gt;The model doing the reasoning, such as &lt;code&gt;GPT-5.4&lt;/code&gt;, &lt;code&gt;Gemini 3.1 Pro&lt;/code&gt;, or &lt;code&gt;Claude 4.6&lt;/code&gt;.&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;Those are not the same thing.&lt;/p&gt; &lt;p&gt;It is also worth separating two NVIDIA names that people keep blending together. &lt;code&gt;NemoClaw&lt;/code&gt; is NVIDIA’s sandboxed agent stack. &lt;code&gt;Nemotron&lt;/code&gt; is NVIDIA’s model family. One is the runtime. The other is the brain.&lt;/p&gt; &lt;p&gt;&lt;code&gt;OpenClaw&lt;/code&gt; is not trying to solve the exact same problem as &lt;code&gt;LangGraph&lt;/code&gt;. &lt;code&gt;GPT-5.4&lt;/code&gt; is not an agent product by itself. 
And a lot of bad comparisons come from throwing all three layers into one bucket.&lt;/p&gt; &lt;p&gt;Once you separate them, the space becomes much easier to read.&lt;/p&gt; &lt;h2&gt;The short answer&lt;/h2&gt; &lt;p&gt;If you want the practical version, here it is:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;code&gt;OpenClaw&lt;/code&gt; is the best self-hosted personal agent platform right now.&lt;/li&gt; &lt;li&gt;&lt;code&gt;Pi&lt;/code&gt;, the compact agent core inside &lt;code&gt;OpenClaw&lt;/code&gt;, is a big reason it works as well as it does.&lt;/li&gt; &lt;li&gt;&lt;code&gt;NVIDIA NemoClaw&lt;/code&gt; is the most interesting security-focused evolution of that idea, but it is still alpha software.&lt;/li&gt; &lt;li&gt;&lt;code&gt;LangGraph&lt;/code&gt; is the best default framework for teams building serious production agent systems.&lt;/li&gt; &lt;li&gt;&lt;code&gt;GPT-5.4&lt;/code&gt; currently looks like the strongest all-around model for agent-style tasks on several current benchmarks.&lt;/li&gt; &lt;li&gt;&lt;code&gt;Gemini 3.1 Pro&lt;/code&gt; is extremely close and often looks especially strong on coding-heavy tasks.&lt;/li&gt; &lt;li&gt;&lt;code&gt;NVIDIA Nemotron-3-Super-120B&lt;/code&gt; deserves to be in the top-tier model conversation because it is currently leading OpenClaw’s own real-world benchmark.&lt;/li&gt; &lt;li&gt;The strongest systems on some agent benchmarks are already hybrid systems that route across multiple models rather than relying on one model for everything.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;That is the state of play.&lt;/p&gt; &lt;h2&gt;Why OpenClaw matters&lt;/h2&gt; &lt;p&gt;&lt;code&gt;OpenClaw&lt;/code&gt; matters because it made the personal-agent pitch feel real.&lt;/p&gt; &lt;p&gt;Its core pitch is simple: connect AI agents to messaging apps like WhatsApp, Telegram, Discord, and iMessage so you can talk to your agent the same way you talk to a person. 
The official &lt;a href=&quot;https://docs.openclaw.ai/index&quot; rel=&quot;nofollow&quot;&gt;OpenClaw docs&lt;/a&gt; describe it as a self-hosted gateway for always-available assistants across chat apps, mobile nodes, and a web control UI.&lt;/p&gt; &lt;p&gt;It also helps that &lt;code&gt;OpenClaw&lt;/code&gt; is not built around a giant mystery box. Its agent core is &lt;code&gt;Pi&lt;/code&gt;, a deliberately compact runtime documented in the &lt;a href=&quot;https://docs.openclaw.ai/pi&quot; rel=&quot;nofollow&quot;&gt;Pi integration docs&lt;/a&gt;. You can feel that design choice. A lot of agent products get worse as they pile on tools, prompts, and layers of abstraction. &lt;code&gt;Pi&lt;/code&gt; goes the other way: keep the core small, make tool use explicit, and let the model do the work. That is a big reason OpenClaw feels more usable than a lot of “look what my agent did” demos.&lt;/p&gt; &lt;p&gt;That product shape is a big part of why it exploded. Based on the current GitHub page and docs footprint, it has already reached roughly &lt;code&gt;328k&lt;/code&gt; stars and &lt;code&gt;63k+&lt;/code&gt; forks. For a category this young, that is a huge signal.&lt;/p&gt; &lt;p&gt;The appeal is obvious:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;it is self-hosted&lt;/li&gt; &lt;li&gt;it works through familiar chat apps&lt;/li&gt; &lt;li&gt;it feels like a personal assistant, not just another chatbot tab&lt;/li&gt; &lt;li&gt;it supports tools, sessions, routing, and multi-agent setups&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;For a lot of people, &lt;code&gt;OpenClaw&lt;/code&gt; is the first agent system that feels like software they might actually leave running instead of a demo they try once.&lt;/p&gt; &lt;h2&gt;Why people are still nervous about OpenClaw&lt;/h2&gt; &lt;p&gt;The excitement is real. 
So are the concerns.&lt;/p&gt; &lt;p&gt;OpenClaw’s own &lt;a href=&quot;https://github.com/openclaw/openclaw/security&quot; rel=&quot;nofollow&quot;&gt;security overview&lt;/a&gt; is unusually direct about its trust model. It says OpenClaw is built around a &lt;code&gt;one-user trusted-operator&lt;/code&gt; model, not a shared multi-tenant boundary. It also notes that host-side execution is the default unless sandboxing is enabled.&lt;/p&gt; &lt;p&gt;This is not a footnote. It means OpenClaw makes the most sense when one trusted person is running it for themselves, or when a team has very clear trust boundaries. It is a much worse fit for “let’s expose this casually and let lots of unrelated people use it.”&lt;/p&gt; &lt;p&gt;That matches what people are saying in public discussions. Recent &lt;a href=&quot;https://news.ycombinator.com/item?id=47217812&quot; rel=&quot;nofollow&quot;&gt;Hacker News discussion&lt;/a&gt; around OpenClaw’s growth shows a mix of admiration and skepticism. Some people see it as the first real app layer for personal AI. Others keep asking whether many of these workflows could still be handled with scripts, automations, or simpler tools.&lt;/p&gt; &lt;p&gt;There is also a serious conversation around exposed instances, risky skills, and over-broad permissions. That is not anti-innovation. It is what happens when a product becomes powerful enough to matter.&lt;/p&gt; &lt;h2&gt;What NVIDIA NemoClaw is trying to fix&lt;/h2&gt; &lt;p&gt;If &lt;code&gt;OpenClaw&lt;/code&gt; made self-hosted personal agents feel exciting, &lt;code&gt;NVIDIA NemoClaw&lt;/code&gt; is trying to make them feel less reckless.&lt;/p&gt; &lt;p&gt;NVIDIA announced NemoClaw in March 2026 as an open-source stack that runs OpenClaw inside &lt;code&gt;OpenShell&lt;/code&gt;, NVIDIA’s runtime for policy-based isolation and controlled inference routing. 
The official &lt;a href=&quot;https://nvidianews.nvidia.com/news/nvidia-announces-nemoclaw?nvid=nv-int-cwmfg-992729&quot; rel=&quot;nofollow&quot;&gt;NVIDIA announcement&lt;/a&gt; and &lt;a href=&quot;https://docs.nvidia.com/nemoclaw/latest/&quot; rel=&quot;nofollow&quot;&gt;developer docs&lt;/a&gt; make the idea clear: keep the OpenClaw experience, but add tighter guardrails around what the agent can touch.&lt;/p&gt; &lt;p&gt;The most important technical detail is in NemoClaw’s &lt;a href=&quot;https://docs.nvidia.com/nemoclaw/latest/reference/network-policies.html&quot; rel=&quot;nofollow&quot;&gt;network policy docs&lt;/a&gt;. The sandbox is &lt;code&gt;strict-by-default&lt;/code&gt;. If the agent tries to reach an endpoint that is not explicitly allowed, the request gets blocked and the operator has to approve it.&lt;/p&gt; &lt;p&gt;That is a real improvement over the more permissive self-hosted setups people have been using.&lt;/p&gt; &lt;p&gt;NemoClaw also narrows filesystem access and makes the operator approval flow much more visible. In plain English, it is trying to put a real safety layer between “the agent wants to do something” and “the host lets it happen.”&lt;/p&gt; &lt;p&gt;Still, the project is early. NVIDIA’s own docs label it &lt;code&gt;alpha software&lt;/code&gt; and say not to use it in production yet.&lt;/p&gt; &lt;p&gt;People gloss over that point all the time in AI infrastructure. NemoClaw is promising, but it is still firmly in watch-this-space territory.&lt;/p&gt; &lt;h2&gt;So is it OpenClaw or NemoClaw?&lt;/h2&gt; &lt;p&gt;Right now, the practical answer is:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;pick &lt;code&gt;OpenClaw&lt;/code&gt; if you want the best self-hosted personal assistant experience today&lt;/li&gt; &lt;li&gt;watch &lt;code&gt;NemoClaw&lt;/code&gt; if you care deeply about security posture and want to see where the category is going next&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;&lt;code&gt;OpenClaw&lt;/code&gt; is more usable today. 
&lt;code&gt;NemoClaw&lt;/code&gt; has the better long-term shape if NVIDIA executes.&lt;/p&gt; &lt;p&gt;There is a deeper point here. Sandboxing helps, but it does not solve the hardest problem in agentic AI. If you hand an agent access to your email, browser, GitHub, files, calendar, and internal systems, the hard question is no longer just “is it sandboxed?” It is “what authority did I just hand over?”&lt;/p&gt; &lt;p&gt;That is why the best discussions around agents are no longer about clever demos. They are about permissions, scope, trust boundaries, and failure modes.&lt;/p&gt; &lt;h2&gt;The real “something else” answer&lt;/h2&gt; &lt;p&gt;For many technical teams, the best agentic solution right now is not &lt;code&gt;OpenClaw&lt;/code&gt; or &lt;code&gt;NemoClaw&lt;/code&gt; at all.&lt;/p&gt; &lt;p&gt;It is a good model inside a well-controlled orchestration layer.&lt;/p&gt; &lt;p&gt;That is where tools like &lt;code&gt;LangGraph&lt;/code&gt;, &lt;code&gt;AutoGen&lt;/code&gt;, and &lt;code&gt;CrewAI&lt;/code&gt; come in.&lt;/p&gt; &lt;p&gt;&lt;a href=&quot;https://langchain-ai.github.io/langgraph/&quot; rel=&quot;nofollow&quot;&gt;LangGraph&lt;/a&gt; stands out because it is explicit about the boring but crucial parts of agent systems: state, long-running execution, human review, memory, and observability. 
Those are the things that start to matter once an agent has to do real work over and over instead of just looking impressive in a short video.&lt;/p&gt; &lt;p&gt;&lt;a href=&quot;https://microsoft.github.io/autogen/stable/&quot; rel=&quot;nofollow&quot;&gt;AutoGen&lt;/a&gt; is still strong for conversational and event-driven multi-agent applications, especially for teams that want a programmable framework with a lot of flexibility.&lt;/p&gt; &lt;p&gt;&lt;a href=&quot;https://github.com/crewAIInc/crewAI&quot; rel=&quot;nofollow&quot;&gt;CrewAI&lt;/a&gt; is still appealing when people want role-based multi-agent collaboration or a more approachable way to prototype “a crew” of specialized agents.&lt;/p&gt; &lt;p&gt;Then there are tools like &lt;a href=&quot;https://github.com/OpenHands/OpenHands&quot; rel=&quot;nofollow&quot;&gt;OpenHands&lt;/a&gt; and &lt;a href=&quot;https://opencode.ai/&quot; rel=&quot;nofollow&quot;&gt;OpenCode&lt;/a&gt;, which are more focused on coding workflows. If your real use case is software development rather than general personal assistance, those may matter more than either OpenClaw or NemoClaw.&lt;/p&gt; &lt;p&gt;If you are building an agent product that has to run reliably, stay debuggable, and survive contact with real users, &lt;code&gt;LangGraph&lt;/code&gt; is the best default choice today.&lt;/p&gt; &lt;h2&gt;What the benchmarks say about the best model&lt;/h2&gt; &lt;p&gt;The model question is separate from the platform question, and current benchmarks make that obvious.&lt;/p&gt; &lt;p&gt;According to the current &lt;a href=&quot;https://artificialanalysis.ai/evaluations/artificial-analysis-intelligence-index&quot; rel=&quot;nofollow&quot;&gt;Artificial Analysis Intelligence Index&lt;/a&gt;, &lt;code&gt;Gemini 3.1 Pro Preview&lt;/code&gt; and &lt;code&gt;GPT-5.4 (xhigh)&lt;/code&gt; are tied at the top with a score of &lt;code&gt;57&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;On &lt;a 
href=&quot;https://artificialanalysis.ai/evaluations/terminalbench-hard&quot; rel=&quot;nofollow&quot;&gt;Terminal-Bench Hard&lt;/a&gt;, which is one of the more useful public benchmarks for agent-like terminal behavior, &lt;code&gt;GPT-5.4 (xhigh)&lt;/code&gt; leads with &lt;code&gt;57.6%&lt;/code&gt;, followed by &lt;code&gt;Gemini 3.1 Pro Preview&lt;/code&gt; at &lt;code&gt;53.8%&lt;/code&gt;, then &lt;code&gt;GPT-5.3 Codex (xhigh)&lt;/code&gt; at &lt;code&gt;53.0%&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;On &lt;a href=&quot;https://artificialanalysis.ai/evaluations/gdpval-aa&quot; rel=&quot;nofollow&quot;&gt;GDPval-AA&lt;/a&gt;, which evaluates real-world work tasks with shell and browser access, &lt;code&gt;GPT-5.4 (xhigh)&lt;/code&gt; leads with an ELO of &lt;code&gt;1667&lt;/code&gt;, followed by &lt;code&gt;Claude Sonnet 4.6&lt;/code&gt; at &lt;code&gt;1633&lt;/code&gt;, then &lt;code&gt;Claude Opus 4.6&lt;/code&gt; at &lt;code&gt;1606&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;On &lt;a href=&quot;https://artificialanalysis.ai/evaluations/livecodebench&quot; rel=&quot;nofollow&quot;&gt;LiveCodeBench&lt;/a&gt;, &lt;code&gt;Gemini 3 Pro Preview&lt;/code&gt; leads with &lt;code&gt;91.7%&lt;/code&gt;, followed by &lt;code&gt;Gemini 3 Flash Preview (Reasoning)&lt;/code&gt; at &lt;code&gt;90.8%&lt;/code&gt;, then &lt;code&gt;DeepSeek V3.2 Speciale&lt;/code&gt; at &lt;code&gt;89.6%&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;If you care specifically about how models behave inside &lt;code&gt;OpenClaw&lt;/code&gt;, a different benchmark matters even more. 
On &lt;a href=&quot;https://openclaw.report/ecosystem/pinchbench-openclaw-benchmark&quot; rel=&quot;nofollow&quot;&gt;PinchBench&lt;/a&gt;, which evaluates real OpenClaw tasks with tools and grading logic exposed, &lt;code&gt;NVIDIA Nemotron-3-Super-120B&lt;/code&gt; currently leads the average leaderboard at &lt;code&gt;84.7%&lt;/code&gt;, ahead of &lt;code&gt;Claude Opus 4.6&lt;/code&gt; at &lt;code&gt;80.8%&lt;/code&gt;, &lt;code&gt;GPT-5.4&lt;/code&gt; at &lt;code&gt;80.5%&lt;/code&gt;, and &lt;code&gt;Kimi K2.5&lt;/code&gt; at &lt;code&gt;80.1%&lt;/code&gt;.&lt;/p&gt; &lt;p&gt;The practical read is pretty clear:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;code&gt;GPT-5.4&lt;/code&gt; currently looks like the strongest all-around agent brain&lt;/li&gt; &lt;li&gt;&lt;code&gt;Gemini 3.1 Pro&lt;/code&gt; is right there with it and often looks especially strong for coding&lt;/li&gt; &lt;li&gt;&lt;code&gt;Claude 4.6&lt;/code&gt; remains one of the safest strong choices for careful tool use and developer work&lt;/li&gt; &lt;li&gt;&lt;code&gt;NVIDIA Nemotron&lt;/code&gt; is no longer a side note. If your agent actually lives inside an OpenClaw-style runtime, it looks like one of the strongest options on the board&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;If you want one model and do not want to overthink it, &lt;code&gt;GPT-5.4&lt;/code&gt; is still the safest default. If you care most about OpenClaw-specific benchmarking, &lt;code&gt;Nemotron-3-Super-120B&lt;/code&gt; deserves serious attention.&lt;/p&gt; &lt;h2&gt;The strongest agents are already hybrid&lt;/h2&gt; &lt;p&gt;One benchmark result matters more than many readers realize.&lt;/p&gt; &lt;p&gt;On the &lt;a href=&quot;https://gaia-benchmark-leaderboard.hf.space/&quot; rel=&quot;nofollow&quot;&gt;GAIA leaderboard&lt;/a&gt;, some of the top-performing systems are not simple one-model agents. 
The top score shown right now is &lt;code&gt;92.36%&lt;/code&gt;, achieved by systems like &lt;code&gt;Manus_v0.0.112221&lt;/code&gt; and &lt;code&gt;OPS-Agentic-Search&lt;/code&gt;, and both use a mix of frontier models rather than a single model for every step.&lt;/p&gt; &lt;p&gt;It hints at where the state of the art is going.&lt;/p&gt; &lt;p&gt;The best agent may not be one brilliant model with tools. It may be a system that knows when to route coding work, search work, planning work, or summarization work to different models.&lt;/p&gt; &lt;p&gt;That is harder to build. It is also where the serious end of the market is headed.&lt;/p&gt; &lt;h2&gt;What it costs to run OpenClaw for a month&lt;/h2&gt; &lt;p&gt;There are two costs here: the machine that keeps the agent online, and the model tokens the agent burns through.&lt;/p&gt; &lt;p&gt;For hosting, the runtime itself is cheap:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;an existing &lt;code&gt;Mac mini&lt;/code&gt; usually costs only about &lt;code&gt;$3&lt;/code&gt; to &lt;code&gt;$10&lt;/code&gt; a month in electricity&lt;/li&gt; &lt;li&gt;if you want to count hardware too, a &lt;code&gt;Mac mini&lt;/code&gt; amortized over two years is more like &lt;code&gt;$30&lt;/code&gt; to &lt;code&gt;$50&lt;/code&gt; a month all-in&lt;/li&gt; &lt;li&gt;a small &lt;code&gt;Hetzner VPS&lt;/code&gt; is usually about &lt;code&gt;$5&lt;/code&gt; to &lt;code&gt;$20&lt;/code&gt; a month&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;The model bill is what actually moves.&lt;/p&gt; &lt;p&gt;For a single-user agent that chats with you throughout the day, does a bit of web work, and sends one daily digest, a reasonable monthly assumption is roughly &lt;code&gt;3M&lt;/code&gt; input tokens and &lt;code&gt;600k&lt;/code&gt; output tokens for light use, or &lt;code&gt;15M&lt;/code&gt; input and &lt;code&gt;3M&lt;/code&gt; output for moderate use.&lt;/p&gt; &lt;p&gt;Using current public pricing, that comes out to roughly this:&lt;/p&gt; 
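&lt;p&gt;The arithmetic behind these estimates is worth making explicit: monthly cost equals input tokens times the input price per million tokens, plus output tokens times the output price per million. A minimal Python sketch, using placeholder prices rather than any vendor&#39;s actual quote:&lt;/p&gt;

```python
# Hedged sketch: estimate a monthly model bill from token volume.
# The per-million prices below are illustrative placeholders, not real quotes.

def monthly_cost(input_tokens_m, output_tokens_m, in_price_per_m, out_price_per_m):
    """Dollars per month: token volume (in millions) times price per million."""
    return input_tokens_m * in_price_per_m + output_tokens_m * out_price_per_m

# "Light use" assumption from above: about 3M input and 600k output tokens.
light = monthly_cost(3.0, 0.6, in_price_per_m=1.25, out_price_per_m=10.0)

# "Moderate use" assumption: about 15M input and 3M output tokens.
moderate = monthly_cost(15.0, 3.0, in_price_per_m=1.25, out_price_per_m=10.0)

print(f"light: ${light:.2f}/mo, moderate: ${moderate:.2f}/mo")
```

&lt;p&gt;Plug in each vendor&#39;s published per-million prices and the per-model estimates fall out directly:&lt;/p&gt;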
&lt;ul&gt;&lt;li&gt;&lt;code&gt;MiniMax 2.5&lt;/code&gt;: about &lt;code&gt;$2&lt;/code&gt; light or &lt;code&gt;$8&lt;/code&gt; moderate&lt;/li&gt; &lt;li&gt;&lt;code&gt;Kimi K2.5&lt;/code&gt;: about &lt;code&gt;$4&lt;/code&gt; light or &lt;code&gt;$18&lt;/code&gt; moderate&lt;/li&gt; &lt;li&gt;&lt;code&gt;GLM-5&lt;/code&gt;: about &lt;code&gt;$5&lt;/code&gt; light or &lt;code&gt;$25&lt;/code&gt; moderate&lt;/li&gt; &lt;li&gt;&lt;code&gt;Gemini 3.1 Pro&lt;/code&gt;: about &lt;code&gt;$13&lt;/code&gt; light or &lt;code&gt;$66&lt;/code&gt; moderate&lt;/li&gt; &lt;li&gt;&lt;code&gt;GPT-5.4&lt;/code&gt;: about &lt;code&gt;$17&lt;/code&gt; light or &lt;code&gt;$83&lt;/code&gt; moderate&lt;/li&gt; &lt;li&gt;&lt;code&gt;Claude Sonnet 4.6&lt;/code&gt;: about &lt;code&gt;$18&lt;/code&gt; light or &lt;code&gt;$90&lt;/code&gt; moderate&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;So the budgeting rule is pretty simple:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;budget setup: &lt;code&gt;MiniMax 2.5&lt;/code&gt;, &lt;code&gt;Kimi K2.5&lt;/code&gt;, or &lt;code&gt;GLM-5&lt;/code&gt; on a small VPS, usually lands around &lt;code&gt;$10&lt;/code&gt; to &lt;code&gt;$40&lt;/code&gt; a month all-in&lt;/li&gt; &lt;li&gt;quality-first setup: &lt;code&gt;GPT-5.4&lt;/code&gt;, &lt;code&gt;Gemini 3.1 Pro&lt;/code&gt;, or &lt;code&gt;Claude 4.6&lt;/code&gt;, usually lands around &lt;code&gt;$25&lt;/code&gt; to &lt;code&gt;$110&lt;/code&gt; a month all-in&lt;/li&gt; &lt;li&gt;if you already own the &lt;code&gt;Mac mini&lt;/code&gt;, the hosting cost is almost irrelevant and the model choice dominates the bill&lt;/li&gt;&lt;/ul&gt; &lt;h2&gt;Mac mini or Hetzner VPS?&lt;/h2&gt; &lt;p&gt;This is not the main story, but it matters if you actually want to run one of these systems.&lt;/p&gt; &lt;p&gt;A &lt;code&gt;Mac mini&lt;/code&gt; is a strong choice if you want:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;local privacy&lt;/li&gt; &lt;li&gt;low power use&lt;/li&gt; &lt;li&gt;easy access to your own apps and files&lt;/li&gt; 
&lt;li&gt;a machine that feels personal rather than server-like&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;A &lt;code&gt;Hetzner VPS&lt;/code&gt; makes more sense if you want:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;24/7 uptime&lt;/li&gt; &lt;li&gt;remote availability from anywhere&lt;/li&gt; &lt;li&gt;a clean host dedicated to the agent&lt;/li&gt; &lt;li&gt;easier separation from your personal daily machine&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;The simple rule:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;use a &lt;code&gt;Mac mini&lt;/code&gt; if the agent is mainly for you&lt;/li&gt; &lt;li&gt;use a &lt;code&gt;VPS&lt;/code&gt; if the gateway needs to stay online all the time and you are comfortable administering it&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;OpenClaw’s own trust model pushes in the same direction. One trusted user per machine or VPS is a much cleaner setup than trying to share a powerful agent runtime loosely across mixed-trust users.&lt;/p&gt; &lt;h2&gt;A simple Hacker News digest workflow&lt;/h2&gt; &lt;p&gt;One nice thing about modern agent platforms is that useful workflows do not have to be complicated.&lt;/p&gt; &lt;p&gt;For example, &lt;a href=&quot;https://hn.alcazarsec.com/daily&quot; rel=&quot;nofollow&quot;&gt;Top HN Daily Digest&lt;/a&gt; is exactly the kind of source an always-on personal agent should handle for you.&lt;/p&gt; &lt;p&gt;The right way to handle this is to give the agent a standing instruction like this:&lt;/p&gt; &lt;blockquote&gt;&lt;p&gt;Every day at 8:00 AM local time, fetch &lt;code&gt;https://hn.alcazarsec.com/daily&lt;/code&gt; and send me a short briefing. Focus on stories relevant to my work in AI, developer tools, startups, and security. Include the 5 most relevant stories, one sentence on why each matters, one repeated theme across the day, and one link I should definitely read in full. Ignore off-topic stories unless they are unusually important. 
If the page is unavailable, retry in 30 minutes.&lt;/p&gt;&lt;/blockquote&gt; &lt;p&gt;That is much closer to the real value of an agent. You set the rule once, and it keeps doing the job.&lt;/p&gt; &lt;h2&gt;This is not just a US trend&lt;/h2&gt; &lt;p&gt;One thing casual readers miss is how global this shift already is.&lt;/p&gt; &lt;p&gt;In China, several of the strongest public benchmark submissions on GAIA come from teams connected to companies like Alibaba Cloud, JD Enterprise Intelligence, Lenovo, and CMCC. That points to a serious push at the enterprise and systems level, not just consumer curiosity.&lt;/p&gt; &lt;p&gt;In Japan, the conversation already looks implementation-focused. A reported summary of &lt;a href=&quot;https://www.iza.ne.jp/pressrelease/prtimes/6C6FILX3YBJUHKEU3Z5AEQOK2Q/&quot; rel=&quot;nofollow&quot;&gt;AI Agent Day 2026&lt;/a&gt; says the event drew &lt;code&gt;3,710&lt;/code&gt; registrations, with &lt;code&gt;38%&lt;/code&gt; of attendees coming from large enterprises and &lt;code&gt;36%&lt;/code&gt; being decision-makers. The same report says the biggest themes were business efficiency and security or privacy.&lt;/p&gt; &lt;p&gt;In Europe, the frame is different again. The &lt;a href=&quot;https://commission.europa.eu/news/ai-act-enters-force-2024-08-01_en&quot; rel=&quot;nofollow&quot;&gt;European Commission’s AI Act overview&lt;/a&gt; is not a direct agent benchmark, but the broader regulatory context is pushing teams to think more seriously about governance, risk, and documentation when they deploy AI systems. Self-hosting is attractive there, but it does not remove compliance obligations by itself.&lt;/p&gt; &lt;p&gt;In the US, the debate is louder and more polarized. 
You can see that in &lt;a href=&quot;https://news.ycombinator.com/item?id=47175503&quot; rel=&quot;nofollow&quot;&gt;Hacker News discussions about OpenClaw&lt;/a&gt;, where some people see agentic software as the start of a new application layer, while others see a lot of hype wrapped around workflows that could still be handled with scripts or ordinary automation.&lt;/p&gt; &lt;p&gt;That tension is real. It is also healthy.&lt;/p&gt; &lt;h2&gt;The honest conclusion&lt;/h2&gt; &lt;p&gt;If the question is “what is the best agentic AI solution right now?”, here is the cleanest answer:&lt;/p&gt; &lt;p&gt;&lt;strong&gt;For personal self-hosted use, &lt;code&gt;OpenClaw&lt;/code&gt; is the leader. For a more security-focused self-hosted direction, watch &lt;code&gt;NemoClaw&lt;/code&gt;. For production systems, use a strong model inside a controlled orchestration stack and default to &lt;code&gt;LangGraph&lt;/code&gt; unless you have a reason not to. For models, start with &lt;code&gt;GPT-5.4&lt;/code&gt;, keep &lt;code&gt;Gemini 3.1 Pro&lt;/code&gt; close behind it, and take &lt;code&gt;Nemotron&lt;/code&gt; seriously for OpenClaw-native workloads.&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;That is where the evidence points today.&lt;/p&gt; &lt;p&gt;The category is being shaped by four things at once:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;better models&lt;/li&gt; &lt;li&gt;better tools and runtimes&lt;/li&gt; &lt;li&gt;better security discipline&lt;/li&gt; &lt;li&gt;better understanding of where agents are actually useful&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Benchmark scores will not decide this by themselves.&lt;/p&gt; &lt;p&gt;The winners will be the systems that combine strong models, useful interfaces, safe defaults, and enough product discipline to earn trust in the real world.&lt;/p&gt;&lt;!--]--&gt;</description>
</item>
<item>
    <title>Wide logging: Stripe&#39;s canonical log line pattern</title>
    <link>https://blog.alcazarsec.com/tech/posts/wide-logging</link>
    <pubDate>Mon, 16 Mar 2026 00:00:00 GMT</pubDate>
    <description>&lt;!--[--&gt;&lt;p&gt;Most logging is too narrow.&lt;/p&gt; &lt;p&gt;One line has the route. Another has the user. Another has the timeout. Another has the feature flag. Another has the deploy SHA.&lt;/p&gt; &lt;p&gt;Then an incident happens and you end up doing joins by hand.&lt;/p&gt; &lt;p&gt;Stripe’s answer is &lt;a href=&quot;https://stripe.com/blog/canonical-log-lines&quot; rel=&quot;nofollow&quot;&gt;canonical log lines&lt;/a&gt;. The modern name is usually &lt;strong&gt;wide events&lt;/strong&gt;. The pattern is simple: emit one structured record per unit of work with all the important fields already attached.&lt;/p&gt; &lt;p&gt;For a web service, that usually means one log event at the end of every request.&lt;/p&gt; &lt;h2&gt;The Pattern&lt;/h2&gt; &lt;p&gt;A canonical log line is the summary row for a request.&lt;/p&gt; &lt;p&gt;It should include the fields you always wish you had in one place:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;route&lt;/li&gt; &lt;li&gt;method&lt;/li&gt; &lt;li&gt;status&lt;/li&gt; &lt;li&gt;duration&lt;/li&gt; &lt;li&gt;user or account ID&lt;/li&gt; &lt;li&gt;request ID and trace ID&lt;/li&gt; &lt;li&gt;build or deploy ID&lt;/li&gt; &lt;li&gt;feature flags&lt;/li&gt; &lt;li&gt;downstream timings&lt;/li&gt; &lt;li&gt;error code&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;In raw form it might look like this:&lt;/p&gt; &lt;pre class=&quot;language-text&quot;&gt;&lt;!----&gt;&lt;code class=&quot;language-text&quot;&gt;ts=2026-03-16T12:03:41Z service=api env=prod route=/v1/charges method=POST status=500 request_id=req_123
account_id=acct_456 build_id=9f2c1d7 feature_flag.payments_v2=true duration_ms=843 db_ms=792
db_queries=18 cache_hit=false error_slug=charge_db_timeout&lt;/code&gt;&lt;!----&gt;&lt;/pre&gt; &lt;p&gt;This is useful because the log line is already pre-joined. You are not reconstructing a request from fragments. You are querying complete rows.&lt;/p&gt; &lt;p&gt;That sounds minor, but it changes what production debugging feels like.&lt;/p&gt; &lt;h2&gt;Why It Works&lt;/h2&gt; &lt;p&gt;Stripe did two important things.&lt;/p&gt; &lt;p&gt;First, they treated the canonical line as critical infrastructure. It is emitted after the request finishes, and their implementation is hardened so the line still appears when exceptions happen.&lt;/p&gt; &lt;p&gt;Second, they did not stop at debugging. Stripe pushed these records into warehousing systems and used them for longer-term analysis and product surfaces like the Developer Dashboard.&lt;/p&gt; &lt;p&gt;That is the part many teams miss.&lt;/p&gt; &lt;p&gt;A canonical log line is more than a nicer log. It is a request-shaped data model.&lt;/p&gt; &lt;p&gt;If the schema is stable, the same event can support:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;incident response&lt;/li&gt; &lt;li&gt;release analysis&lt;/li&gt; &lt;li&gt;customer support investigations&lt;/li&gt; &lt;li&gt;product analytics&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Amazon describes a similar idea in the Builders Library: emit one structured request log entry per unit of work, then derive metrics later. Log first. Aggregate later.&lt;/p&gt; &lt;h2&gt;What To Log&lt;/h2&gt; &lt;p&gt;Most teams stop too early.&lt;/p&gt; &lt;p&gt;They log route, status, and latency. 
That is enough for a dashboard, but not enough for diagnosis.&lt;/p&gt; &lt;p&gt;The highest-value fields tend to be:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;strong&gt;Route template:&lt;/strong&gt; &lt;code&gt;/teams/{team_id}/members/{user_id}&lt;/code&gt; is better than raw paths with IDs embedded in them.&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Identity:&lt;/strong&gt; &lt;code&gt;user_id&lt;/code&gt;, &lt;code&gt;account_id&lt;/code&gt;, API key ID, auth method.&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Release metadata:&lt;/strong&gt; Git SHA, build ID, deploy ring, region.&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Execution cost:&lt;/strong&gt; duration, DB time, query count, cache hit or miss, retry count.&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Decision inputs:&lt;/strong&gt; feature flags, experiment variant, plan tier, client version.&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Outcome:&lt;/strong&gt; status code, throttled yes or no, fallback path used, error slug.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Two fields are especially underrated.&lt;/p&gt; &lt;p&gt;The first is &lt;code&gt;build_id&lt;/code&gt;. Metrics tell you that latency went up. &lt;code&gt;build_id&lt;/code&gt; tells you which deploy owns the regression.&lt;/p&gt; &lt;p&gt;The second is an &lt;code&gt;error_slug&lt;/code&gt;. Not just an exception class. A stable identifier for the exact failure site or failure reason.&lt;/p&gt; &lt;p&gt;That is the difference between “timeouts increased” and “the timeout came from the new write path behind &lt;code&gt;feature_flag.double_write&lt;/code&gt;.”&lt;/p&gt; &lt;h2&gt;The Real Benefit&lt;/h2&gt; &lt;p&gt;The real power of wide logging is not observability. 
It is correlation.&lt;/p&gt; &lt;p&gt;Once every request carries business context and execution context in the same row, you can ask much better questions:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;Did the new build hurt only enterprise accounts?&lt;/li&gt; &lt;li&gt;Did the regression appear only on iOS &lt;code&gt;7.4.1&lt;/code&gt;?&lt;/li&gt; &lt;li&gt;Did variant &lt;code&gt;B&lt;/code&gt; increase errors only in &lt;code&gt;eu-west-1&lt;/code&gt;?&lt;/li&gt; &lt;li&gt;Did the slow requests all miss cache and hit the same downstream service?&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Metrics are bad at this because they throw context away early.&lt;/p&gt; &lt;p&gt;Traditional logs are bad at this because the context is scattered.&lt;/p&gt; &lt;p&gt;Canonical log lines keep the context intact long enough to query it.&lt;/p&gt; &lt;p&gt;That is why the pattern keeps coming back under different names.&lt;/p&gt; &lt;h2&gt;High Cardinality&lt;/h2&gt; &lt;p&gt;This is where people get nervous.&lt;/p&gt; &lt;p&gt;&lt;code&gt;user_id&lt;/code&gt;, &lt;code&gt;request_id&lt;/code&gt;, &lt;code&gt;build_id&lt;/code&gt;, and feature flags are high-cardinality fields. In many systems that is a warning sign.&lt;/p&gt; &lt;p&gt;The important distinction is where the cardinality lives.&lt;/p&gt; &lt;p&gt;High-cardinality values are often fine inside a wide event. They become expensive when you force them into the wrong indexing model.&lt;/p&gt; &lt;p&gt;That is why this pattern works best with systems designed for filtering and grouping over many dimensions. Stripe used Splunk and Redshift. Modern teams might use ClickHouse-backed tools, Honeycomb, BigQuery, or their own warehouse.&lt;/p&gt; &lt;p&gt;The storage choice is less important than the query shape. 
You want to slice rich rows, not pre-aggregate away the useful parts.&lt;/p&gt; &lt;h2&gt;Common Mistakes&lt;/h2&gt; &lt;h3&gt;Only logging the happy path&lt;/h3&gt; &lt;p&gt;The canonical event should be emitted in &lt;code&gt;finally&lt;/code&gt;, &lt;code&gt;ensure&lt;/code&gt;, or equivalent teardown logic. If it disappears on exceptions, it fails when you need it most.&lt;/p&gt; &lt;h3&gt;Logging raw paths instead of route templates&lt;/h3&gt; &lt;p&gt;&lt;code&gt;/users/123/orders/456&lt;/code&gt; is terrible for grouping. &lt;code&gt;/users/{user_id}/orders/{order_id}&lt;/code&gt; is what you want.&lt;/p&gt; &lt;h3&gt;Logging exception classes but not error reasons&lt;/h3&gt; &lt;p&gt;&lt;code&gt;TimeoutError&lt;/code&gt; is often too broad. An error slug gives you a stable grouping key tied to a real code path.&lt;/p&gt; &lt;h3&gt;Dumping raw input into the event&lt;/h3&gt; &lt;p&gt;Amazon recommends sanitizing and truncating request details before logging. That is important here. A rich event becomes dangerous fast if you start packing it with tokens, secrets, or arbitrary payloads.&lt;/p&gt; &lt;h3&gt;Letting the schema drift&lt;/h3&gt; &lt;p&gt;Field names become muscle memory. If one service logs &lt;code&gt;user_id&lt;/code&gt;, another logs &lt;code&gt;uid&lt;/code&gt;, and a third logs &lt;code&gt;account_user&lt;/code&gt;, cross-service queries get messy fast.&lt;/p&gt; &lt;h2&gt;Implementation&lt;/h2&gt; &lt;p&gt;The usual implementation is middleware.&lt;/p&gt; &lt;p&gt;Create a request-scoped object at the start of the request. Let middleware and business logic add fields as work happens. Emit one structured line at the end.&lt;/p&gt; &lt;p&gt;If you use OpenTelemetry, the root span can play this role. 
If not, JSON or logfmt is fine.&lt;/p&gt; &lt;p&gt;A good starting schema is:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;code&gt;service.name&lt;/code&gt;&lt;/li&gt; &lt;li&gt;&lt;code&gt;env&lt;/code&gt;&lt;/li&gt; &lt;li&gt;&lt;code&gt;request_id&lt;/code&gt;&lt;/li&gt; &lt;li&gt;&lt;code&gt;trace_id&lt;/code&gt;&lt;/li&gt; &lt;li&gt;&lt;code&gt;route&lt;/code&gt;&lt;/li&gt; &lt;li&gt;&lt;code&gt;method&lt;/code&gt;&lt;/li&gt; &lt;li&gt;&lt;code&gt;status&lt;/code&gt;&lt;/li&gt; &lt;li&gt;&lt;code&gt;duration_ms&lt;/code&gt;&lt;/li&gt; &lt;li&gt;&lt;code&gt;user_id&lt;/code&gt; or &lt;code&gt;account_id&lt;/code&gt;&lt;/li&gt; &lt;li&gt;&lt;code&gt;build_id&lt;/code&gt;&lt;/li&gt; &lt;li&gt;&lt;code&gt;error_slug&lt;/code&gt;&lt;/li&gt; &lt;li&gt;&lt;code&gt;sample_rate&lt;/code&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Then add fields whenever a real production question is hard to answer.&lt;/p&gt; &lt;h2&gt;Summary&lt;/h2&gt; &lt;p&gt;Canonical logging is a simple idea that pays for itself quickly.&lt;/p&gt; &lt;p&gt;Emit one rich, trustworthy event per request. Make it stable. Make it complete. Make sure it still appears on failures.&lt;/p&gt; &lt;p&gt;Once you do that, logs stop being breadcrumbs and start being records.&lt;/p&gt;</description>
</item>
<item>
    <title>Whisper is no longer the whole answer: how to transcribe audio and video in 2026</title>
    <link>https://blog.alcazarsec.com/tech/posts/best-way-to-transcribe-audio-video</link>
    <pubDate>Sun, 15 Mar 2026 00:00:00 GMT</pubDate>
    <description>&lt;p&gt;If you asked this a year or two ago, the answer was simple: use Whisper.&lt;/p&gt; &lt;p&gt;That was the right default for a long time. Whisper was open, accurate, multilingual, and strong enough to reshape the category.&lt;/p&gt; &lt;p&gt;It is no longer the whole answer.&lt;/p&gt; &lt;p&gt;In 2026, the best way to transcribe audio or video depends on what “best” means to you:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;raw accuracy on messy long-form audio&lt;/li&gt; &lt;li&gt;a local, private setup&lt;/li&gt; &lt;li&gt;real-time latency for voice agents&lt;/li&gt; &lt;li&gt;jargon, diarization, subtitles, or compliance&lt;/li&gt; &lt;li&gt;price at scale&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Speech-to-text is a tradeoff market again. No single model wins every category.&lt;/p&gt; &lt;h2&gt;The Best Default Strategy&lt;/h2&gt; &lt;p&gt;Here is the shortest honest answer:&lt;/p&gt; &lt;p&gt;Start with a local Whisper-family setup if privacy or cost is the priority. On Apple Silicon, also test Parakeet. Use a premium API when transcript quality is critical.&lt;/p&gt; &lt;ul&gt;&lt;li&gt;For local, offline, or self-hosted work, use &lt;code&gt;faster-whisper&lt;/code&gt; or &lt;code&gt;whisper.cpp&lt;/code&gt;.&lt;/li&gt; &lt;li&gt;If you are on a Mac, test &lt;a href=&quot;https://github.com/senstella/parakeet-mlx&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;parakeet-mlx&lt;/code&gt;&lt;/a&gt;. 
It makes NVIDIA’s Parakeet models practical on Apple Silicon, and the &lt;code&gt;uvx parakeet-mlx&lt;/code&gt; workflow is refreshingly simple.&lt;/li&gt; &lt;li&gt;For production-grade batch transcription where quality is the priority, look hard at &lt;a href=&quot;https://elevenlabs.io/blog/introducing-scribe-v2&quot; rel=&quot;nofollow&quot;&gt;ElevenLabs Scribe v2&lt;/a&gt;.&lt;/li&gt; &lt;li&gt;For real-time products, voice agents, and live captioning, look at &lt;a href=&quot;https://deepgram.com/learn/introducing-nova-3-speech-to-text-api&quot; rel=&quot;nofollow&quot;&gt;Deepgram Nova-3&lt;/a&gt;, &lt;a href=&quot;https://www.assemblyai.com/docs/getting-started/universal-3-pro&quot; rel=&quot;nofollow&quot;&gt;AssemblyAI Universal-3 Pro&lt;/a&gt;, or OpenAI’s newer &lt;a href=&quot;https://openai.com/index/introducing-our-next-generation-audio-models&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;gpt-4o-transcribe&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;If that sounds less satisfying than “just use X,” that is because the category matured.&lt;/p&gt; &lt;h2&gt;Which Benchmarks Matter Now&lt;/h2&gt; &lt;p&gt;Vendor benchmark charts are useful, but they are also self-serving.&lt;/p&gt; &lt;p&gt;The public benchmark I would watch first is &lt;a href=&quot;https://artificialanalysis.ai/articles/aa-wer-v2&quot; rel=&quot;nofollow&quot;&gt;Artificial Analysis’ AA-WER v2.0&lt;/a&gt;. It is not perfect, but it is one of the better attempts to compare modern transcription quality across different conditions.&lt;/p&gt; &lt;p&gt;Its dataset mix is why it is worth watching:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;strong&gt;AA-AgentTalk&lt;/strong&gt; for speech aimed at voice agents&lt;/li&gt; &lt;li&gt;&lt;strong&gt;VoxPopuli-Cleaned-AA&lt;/strong&gt; for parliamentary speech&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Earnings22-Cleaned-AA&lt;/strong&gt; for long, technical earnings calls&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;That last set is especially useful. 
A model can look great on clean speech and still struggle with bad mics, accents, names, finance jargon, cross-talk, or long recordings.&lt;/p&gt; &lt;p&gt;On the current AA-WER v2.0 leaderboard, the top models are:&lt;/p&gt; &lt;ol&gt;&lt;li&gt;&lt;strong&gt;ElevenLabs Scribe v2&lt;/strong&gt; at &lt;strong&gt;2.3%&lt;/strong&gt;&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Google Gemini 3 Pro&lt;/strong&gt; at &lt;strong&gt;2.9%&lt;/strong&gt;&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Mistral Voxtral Small&lt;/strong&gt; at &lt;strong&gt;3.0%&lt;/strong&gt;&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Google Gemini 2.5 Pro&lt;/strong&gt; at &lt;strong&gt;3.1%&lt;/strong&gt;&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Google Gemini 3 Flash&lt;/strong&gt; at &lt;strong&gt;3.1%&lt;/strong&gt;&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;AssemblyAI Universal-3 Pro is also very strong at &lt;strong&gt;3.3%&lt;/strong&gt;. OpenAI’s &lt;code&gt;gpt-4o-transcribe&lt;/code&gt; shows up at &lt;strong&gt;4.1%&lt;/strong&gt; on that benchmark, and Whisper-derived options are still competitive, but no longer obviously leading.&lt;/p&gt; &lt;p&gt;Two caveats are important.&lt;/p&gt; &lt;p&gt;First, a benchmark is only as good as its audio mix. If you transcribe podcasts, hearings, sales calls, lectures, or YouTube voiceovers, your failure modes will differ from a call-center agent.&lt;/p&gt; &lt;p&gt;Second, open models count for more than leaderboard placement suggests. Whisper still stands out because of the whole package: open weights, mature tooling, local deployment, and an ecosystem that has been heavily optimized.&lt;/p&gt; &lt;p&gt;That is why &lt;a href=&quot;https://mlcommons.org/2025/09/whisper-inferencev5-1/&quot; rel=&quot;nofollow&quot;&gt;MLPerf chose Whisper Large v3&lt;/a&gt; as its ASR benchmark. 
Not because Whisper is best at everything in 2026, but because it is still the shared baseline.&lt;/p&gt; &lt;h2&gt;FLEURS Still Counts&lt;/h2&gt; &lt;p&gt;&lt;a href=&quot;https://openai.com/index/introducing-our-next-generation-audio-models&quot; rel=&quot;nofollow&quot;&gt;FLEURS&lt;/a&gt; is also worth watching. It is a multilingual benchmark, and it is useful because many models that look strong in English get worse once you add other languages, code-switching, or regional accents.&lt;/p&gt; &lt;p&gt;OpenAI says its newer audio models beat older Whisper variants on FLEURS and handle accents, noise, and variable speaking speed better. That is plausible, and it matches the broader direction of the market.&lt;/p&gt; &lt;p&gt;But “better than Whisper” is not the same as “best for you.” If you need local transcription on your own machine, &lt;code&gt;gpt-4o-transcribe&lt;/code&gt; does not replace the reason most people adopted Whisper.&lt;/p&gt; &lt;h2&gt;What Practitioners Are Actually Saying&lt;/h2&gt; &lt;p&gt;Benchmarks tell you what models do in a lab. Practitioner discussions tell you what breaks in real use.&lt;/p&gt; &lt;p&gt;The pattern is consistent. In a recent &lt;a href=&quot;https://news.ycombinator.com/item?id=45843249&quot; rel=&quot;nofollow&quot;&gt;Hacker News thread about transcription tools&lt;/a&gt;, people complained that raw Whisper can feel slow. The usual fix was not “stop using Whisper.” It was “use &lt;code&gt;faster-whisper&lt;/code&gt;, &lt;code&gt;whisper.cpp&lt;/code&gt;, or a better local wrapper.”&lt;/p&gt; &lt;p&gt;That matches real-world use. Many complaints about Whisper are really complaints about the default implementation, not the model family.&lt;/p&gt; &lt;p&gt;The same thread surfaced another pattern: more people now add a cleanup pass after transcription. They transcribe first, then use a cheaper LLM to fix punctuation, domain terms, or code-related oddities. That is a practical modern workflow. 
The transcription model gets the words down. A text model makes the result readable.&lt;/p&gt; &lt;p&gt;The open-source crowd is also testing alternatives. In another &lt;a href=&quot;https://news.ycombinator.com/item?id=44380330&quot; rel=&quot;nofollow&quot;&gt;Hacker News discussion&lt;/a&gt;, some people said NVIDIA Parakeet beat Whisper for English and speed, while others said Whisper still handled accented meeting audio better. Both can be true. ASR quality depends heavily on the audio you actually have.&lt;/p&gt; &lt;p&gt;Parakeet is especially interesting on Macs because of &lt;a href=&quot;https://github.com/senstella/parakeet-mlx&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;parakeet-mlx&lt;/code&gt;&lt;/a&gt;, an Apple Silicon implementation built on MLX. It supports CLI and Python usage, chunking, timestamps, and streaming transcription with a simple entry point. Good local tools win because people actually use them.&lt;/p&gt; &lt;p&gt;This is probably the single most important point in the whole article:&lt;/p&gt; &lt;p&gt;&lt;strong&gt;The best transcription model is the one that fails least often on your audio, not the one with the prettiest average score.&lt;/strong&gt;&lt;/p&gt; &lt;h2&gt;So What Should You Actually Use?&lt;/h2&gt; &lt;p&gt;Here is the short version I would give a technical friend.&lt;/p&gt; &lt;h3&gt;1. 
If you want privacy, local processing, or no vendor lock-in&lt;/h3&gt; &lt;p&gt;Use &lt;code&gt;faster-whisper&lt;/code&gt;, &lt;code&gt;whisper.cpp&lt;/code&gt;, or on Apple Silicon, &lt;a href=&quot;https://github.com/senstella/parakeet-mlx&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;parakeet-mlx&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;This is still the safest default for:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;journalists&lt;/li&gt; &lt;li&gt;researchers&lt;/li&gt; &lt;li&gt;lawyers&lt;/li&gt; &lt;li&gt;internal team recordings&lt;/li&gt; &lt;li&gt;people transcribing sensitive interviews&lt;/li&gt; &lt;li&gt;developers who want a self-hosted pipeline&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;It is also the best answer if you want predictable cost. You pay for compute, not per minute.&lt;/p&gt; &lt;p&gt;If you are on macOS and want something polished instead of building your own workflow, MacWhisper remains a common recommendation in technical circles.&lt;/p&gt; &lt;p&gt;If you want a local Mac setup that feels modern rather than DIY, add these to the shortlist:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;a href=&quot;https://github.com/senstella/parakeet-mlx&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;parakeet-mlx&lt;/code&gt;&lt;/a&gt; if you are happy in the terminal and want &lt;code&gt;uvx parakeet-mlx&lt;/code&gt; simplicity.&lt;/li&gt; &lt;li&gt;&lt;a href=&quot;https://github.com/kitlangton/Hex&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;Hex&lt;/code&gt;&lt;/a&gt; if you want a practical press-to-talk desktop app. It uses Parakeet TDT v3 by default and is a good example of what “daily-driver local transcription” looks like on Apple Silicon.&lt;/li&gt;&lt;/ul&gt; &lt;h3&gt;2. 
If you care most about batch accuracy on real media&lt;/h3&gt; &lt;p&gt;Use &lt;strong&gt;ElevenLabs Scribe v2&lt;/strong&gt;.&lt;/p&gt; &lt;p&gt;Right now it has the best public accuracy case, and it is clearly built for long-form recordings, subtitles, captioning, diarization, timestamps, and multilingual media pipelines. If I were transcribing podcasts, interviews, webinars, lectures, or a large video archive, this would be one of the first tools I tested.&lt;/p&gt; &lt;h3&gt;3. If you are building a live product&lt;/h3&gt; &lt;p&gt;Use &lt;strong&gt;Deepgram Nova-3&lt;/strong&gt; or &lt;strong&gt;AssemblyAI Universal-3 Pro&lt;/strong&gt;.&lt;/p&gt; &lt;p&gt;Here, latency, streaming behavior, diarization, redaction, and structured output count as much as raw WER.&lt;/p&gt; &lt;p&gt;Deepgram has a solid reputation in real-time speech products, and its whole story is built around speed plus production readiness. AssemblyAI is especially attractive when you want transcription plus extras like diarization, entity extraction, PII handling, and downstream speech intelligence.&lt;/p&gt; &lt;h3&gt;4. If you want open weights, but something newer than Whisper&lt;/h3&gt; &lt;p&gt;Test &lt;strong&gt;Mistral Voxtral Small&lt;/strong&gt;.&lt;/p&gt; &lt;p&gt;It is the open-weight model I would watch most closely right now. On the current Artificial Analysis leaderboard, it is the highest-ranked open-weight option. I would still call it less battle-tested than Whisper, but Whisper finally has a serious open competitor.&lt;/p&gt; &lt;p&gt;If your world is mostly Mac laptops and command-line tooling, Parakeet belongs near this section too. The family has posted strong public numbers and is building a reputation for speed and English performance. The catch is that it is not the same all-purpose multilingual default that made Whisper famous.&lt;/p&gt; &lt;h3&gt;5. 
If you already live in the OpenAI ecosystem&lt;/h3&gt; &lt;p&gt;Test &lt;strong&gt;&lt;code&gt;gpt-4o-transcribe&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt; &lt;p&gt;I would not pick it because it is the runaway benchmark leader. I would pick it if I were already building around OpenAI, wanted a simpler vendor stack, and cared about noisy audio, accents, and solid out-of-the-box quality without running my own local infrastructure.&lt;/p&gt; &lt;h2&gt;The Real Shift&lt;/h2&gt; &lt;p&gt;Whisper did not get bad. It just stopped being the automatic answer to every transcription problem.&lt;/p&gt; &lt;p&gt;Today, the best setup is usually a two-level strategy:&lt;/p&gt; &lt;ol&gt;&lt;li&gt;Use a strong local or low-cost model for drafts, privacy, and bulk throughput.&lt;/li&gt; &lt;li&gt;Route important or revenue-critical audio through a premium API tuned for your use case.&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;That is what mature infrastructure looks like. You do not need one sacred model. You need a sensible policy.&lt;/p&gt; &lt;p&gt;For most technical users, my shortlist would be:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;strong&gt;Local and private:&lt;/strong&gt; &lt;code&gt;faster-whisper&lt;/code&gt;&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Best Mac terminal setup:&lt;/strong&gt; &lt;a href=&quot;https://github.com/senstella/parakeet-mlx&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;parakeet-mlx&lt;/code&gt;&lt;/a&gt;&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Best Mac desktop app:&lt;/strong&gt; &lt;a href=&quot;https://github.com/kitlangton/Hex&quot; rel=&quot;nofollow&quot;&gt;&lt;code&gt;Hex&lt;/code&gt;&lt;/a&gt;&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Best batch media transcription:&lt;/strong&gt; &lt;strong&gt;ElevenLabs Scribe v2&lt;/strong&gt;&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Best real-time product bet:&lt;/strong&gt; &lt;strong&gt;Deepgram Nova-3&lt;/strong&gt;&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Best feature-rich API:&lt;/strong&gt; &lt;strong&gt;AssemblyAI Universal-3 
Pro&lt;/strong&gt;&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Most interesting open-weight challenger:&lt;/strong&gt; &lt;strong&gt;Mistral Voxtral Small&lt;/strong&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;If you only remember one thing, remember this:&lt;/p&gt; &lt;p&gt;Whisper is still the baseline. It just is not the whole market anymore.&lt;/p&gt;</description>
</item>
<item>
    <title>The best time to post on Hacker News</title>
    <link>https://blog.alcazarsec.com/tech/posts/best-time-to-post-on-hacker-news</link>
    <pubDate>Sat, 14 Mar 2026 00:00:00 GMT</pubDate>
    <description>&lt;p&gt;People ask this question because Hacker News can feel mysterious.&lt;/p&gt; &lt;p&gt;One post gets 600 points. Another disappears in 20 minutes. The natural reaction is to look for the perfect hour, the perfect day, or the perfect headline formula.&lt;/p&gt; &lt;p&gt;There is some truth to that. Timing helps. But not in the clean, deterministic way people want.&lt;/p&gt; &lt;p&gt;The honest answer is:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;good posts can die&lt;/li&gt; &lt;li&gt;average posts can catch a wave&lt;/li&gt; &lt;li&gt;timing helps at the margin&lt;/li&gt; &lt;li&gt;the rules and etiquette count for more than most growth advice admits&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;If you want the short version, here it is.&lt;/p&gt; &lt;h2&gt;The Short Answer&lt;/h2&gt; &lt;p&gt;For a normal technical post, I would start with &lt;strong&gt;Tuesday through Thursday, roughly 14:00 to 17:00 UTC&lt;/strong&gt;.&lt;/p&gt; &lt;p&gt;That is &lt;strong&gt;7am to 10am Pacific&lt;/strong&gt; and &lt;strong&gt;10am to 1pm Eastern&lt;/strong&gt;. 
It lines up with when the US technical audience is awake and active, and with much of the practical launch advice repeated in decent HN guides.&lt;/p&gt; &lt;p&gt;But that is not the whole story.&lt;/p&gt; &lt;p&gt;If your goal is &lt;strong&gt;maximum raw audience&lt;/strong&gt;, posting during busy US daytime hours is a reasonable default.&lt;/p&gt; &lt;p&gt;If your goal is &lt;strong&gt;lower competition&lt;/strong&gt;, some analyses argue for &lt;strong&gt;Sunday night Pacific&lt;/strong&gt;, especially around &lt;strong&gt;midnight to 1am Pacific&lt;/strong&gt;, because fewer people submit then even though enough readers are still around to give a strong post an early push.&lt;/p&gt; &lt;p&gt;Those are two different goals:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;more readers online&lt;/li&gt; &lt;li&gt;fewer competing submissions&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;People often treat them as the same thing. They are not.&lt;/p&gt; &lt;h2&gt;What The Data Actually Says&lt;/h2&gt; &lt;p&gt;The official Hacker News FAQ says ranking is not just votes plus time. It also includes flags, anti-abuse software, overheating penalties, account or site weighting, and moderator action. In other words, even if you knew the best time, you still would not have a guaranteed recipe for the front page. See the &lt;a href=&quot;https://news.ycombinator.com/newsfaq.html&quot; rel=&quot;nofollow&quot;&gt;HN FAQ&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;That is why timing advice is all over the place.&lt;/p&gt; &lt;p&gt;Older large-scale analysis from Max Woolf found that overall activity peaks around &lt;strong&gt;12pm Eastern / 9am Pacific&lt;/strong&gt;, but also concluded that submission time alone did not strongly determine which posts went viral. 
See &lt;a href=&quot;https://minimaxir.com/2014/02/hacking-hacker-news/&quot; rel=&quot;nofollow&quot;&gt;A Statistical Analysis of All Hacker News Submissions&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;Recent practical launch guides mostly converge on &lt;strong&gt;Tuesday to Thursday morning Pacific&lt;/strong&gt;. See &lt;a href=&quot;https://smollaunch.com/guides/hacker-news-launch-guide&quot; rel=&quot;nofollow&quot;&gt;Smol Launch’s Hacker News launch guide&lt;/a&gt; and &lt;a href=&quot;https://calmops.com/indie-hackers/hacker-news-launch-500-upvotes/&quot; rel=&quot;nofollow&quot;&gt;Calmops’ guide&lt;/a&gt;. These are useful, but they are also launch guides written by people who want to help you launch things, so read them as practical heuristics, not revealed truth.&lt;/p&gt; &lt;p&gt;Then you get a more contrarian result. A June 2025 analysis of &lt;strong&gt;23,000 posts&lt;/strong&gt; argued that the best odds came from &lt;strong&gt;Sunday, midnight to 1am Pacific&lt;/strong&gt;, because competition was lower while engagement stayed decent. See &lt;a href=&quot;https://news.ycombinator.com/item?id=44569046&quot; rel=&quot;nofollow&quot;&gt;When to Post on HN: Analyzed 23k posts (June 2025)&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;That sounds contradictory until you separate &lt;strong&gt;best chance per post&lt;/strong&gt; from &lt;strong&gt;biggest total audience&lt;/strong&gt;.&lt;/p&gt; &lt;p&gt;Those are different questions, and different articles answer different ones.&lt;/p&gt; &lt;h2&gt;Our Take&lt;/h2&gt; &lt;p&gt;If you want one simple recommendation, use this:&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Post during US morning for broad technical posts. 
Try lower-competition windows for niche posts.&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;Concretely:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;Start with &lt;strong&gt;Tue-Thu, 14:00-17:00 UTC&lt;/strong&gt; for technical essays, product writeups, and serious link posts.&lt;/li&gt; &lt;li&gt;Test &lt;strong&gt;Sunday night Pacific&lt;/strong&gt; if your post is niche, weird, or likely to benefit from less crowding.&lt;/li&gt; &lt;li&gt;Treat &lt;strong&gt;weekends&lt;/strong&gt; as viable for side projects, longer reads, and more curiosity-driven posts.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;That recommendation is also broadly consistent with our own &lt;a href=&quot;https://hn.alcazarsec.com/stats?days=365&quot; rel=&quot;nofollow&quot;&gt;Top HN stats page&lt;/a&gt;, which is useful because it shows when top stories were actually created over the last 365 days.&lt;/p&gt; &lt;p&gt;One important caveat: our page is a &lt;strong&gt;top-stories heatmap&lt;/strong&gt;, not a magical conversion model for all submissions. It tells you when high-scoring stories tend to appear, not your personal odds of success. Over the last year, it points strongly at &lt;strong&gt;weekday US morning to early afternoon&lt;/strong&gt;, especially around &lt;strong&gt;14:00-17:00 UTC&lt;/strong&gt;. 
That is a useful factual anchor, even if it is not the whole answer.&lt;/p&gt; &lt;h2&gt;Use The Right Submission Type&lt;/h2&gt; &lt;p&gt;Before worrying about time, make sure you are using the right kind of post.&lt;/p&gt; &lt;h3&gt;Regular link post&lt;/h3&gt; &lt;p&gt;Use a normal submission when you are sharing:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;a blog post&lt;/li&gt; &lt;li&gt;a research writeup&lt;/li&gt; &lt;li&gt;a news article&lt;/li&gt; &lt;li&gt;a technical tutorial&lt;/li&gt;&lt;/ul&gt; &lt;h3&gt;Show HN&lt;/h3&gt; &lt;p&gt;Use &lt;strong&gt;Show HN&lt;/strong&gt; when you made something and people can actually try it.&lt;/p&gt; &lt;p&gt;The official &lt;a href=&quot;https://news.ycombinator.com/showhn.html&quot; rel=&quot;nofollow&quot;&gt;Show HN guidelines&lt;/a&gt; are clear about this:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;it should be something you worked on personally&lt;/li&gt; &lt;li&gt;users should be able to try it&lt;/li&gt; &lt;li&gt;blog posts, newsletters, sign-up pages, and landing pages do not count&lt;/li&gt; &lt;li&gt;you should be around to discuss it&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;This is where a lot of self-promotion goes wrong. 
People post a glossy homepage with no product behind it, call it Show HN, and then wonder why the thread goes badly.&lt;/p&gt; &lt;p&gt;HN is not confused about the difference between “I built this” and “I would like traffic to this.”&lt;/p&gt; &lt;h3&gt;Ask HN&lt;/h3&gt; &lt;p&gt;Use &lt;strong&gt;Ask HN&lt;/strong&gt; when you genuinely want answers, recommendations, feedback, or discussion.&lt;/p&gt; &lt;p&gt;Do not use Ask HN as a disguised launch.&lt;/p&gt; &lt;h2&gt;The Rules That Count&lt;/h2&gt; &lt;p&gt;If you only read one primary source, read the official &lt;a href=&quot;https://news.ycombinator.com/newsguidelines.html&quot; rel=&quot;nofollow&quot;&gt;Hacker News Guidelines&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;The main rules in practice are:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;submit the original source when possible&lt;/li&gt; &lt;li&gt;do not hype the title with ALL CAPS, exclamation points, or marketing language&lt;/li&gt; &lt;li&gt;do not change the title unless the original is misleading or linkbait&lt;/li&gt; &lt;li&gt;do not use HN primarily for promotion&lt;/li&gt; &lt;li&gt;do not solicit upvotes, comments, or submissions&lt;/li&gt; &lt;li&gt;if it is a video or PDF, label it &lt;code&gt;[video]&lt;/code&gt; or &lt;code&gt;[pdf]&lt;/code&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;That last one is more important than people think. A lot of bad HN advice starts from “how do I make my post stand out?” The official answer is almost the opposite: &lt;strong&gt;do not decorate it&lt;/strong&gt;.&lt;/p&gt; &lt;p&gt;HN generally rewards clear, plain, descriptive framing.&lt;/p&gt; &lt;h2&gt;Etiquette Counts More Than Timing&lt;/h2&gt; &lt;p&gt;This is the part many guides underplay.&lt;/p&gt; &lt;p&gt;The social rules on Hacker News are not cosmetic. 
They shape whether people give you the benefit of the doubt.&lt;/p&gt; &lt;p&gt;The official guidelines ask people to:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;be kind&lt;/li&gt; &lt;li&gt;be curious&lt;/li&gt; &lt;li&gt;avoid snark&lt;/li&gt; &lt;li&gt;reply to the strongest version of what someone said&lt;/li&gt; &lt;li&gt;avoid shallow dismissals&lt;/li&gt; &lt;li&gt;avoid political or ideological flamewars&lt;/li&gt; &lt;li&gt;assume good faith&lt;/li&gt; &lt;li&gt;not post AI-generated or AI-edited comments&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;That last one is now explicit in the official guidelines: &lt;a href=&quot;https://news.ycombinator.com/newsguidelines.html#generated&quot; rel=&quot;nofollow&quot;&gt;HN is for conversation between humans&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;If you post your own work, the etiquette is simple:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;be present&lt;/li&gt; &lt;li&gt;answer real questions&lt;/li&gt; &lt;li&gt;admit limitations&lt;/li&gt; &lt;li&gt;do not act like you are above criticism&lt;/li&gt; &lt;li&gt;do not vanish after dropping the link&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;One small HN thread about how to submit your product put it well: expect unrequested feedback, and read the thread from time to time so people can see the author is around. See &lt;a href=&quot;https://news.ycombinator.com/item?id=44346801&quot; rel=&quot;nofollow&quot;&gt;How to Submit&lt;/a&gt;.&lt;/p&gt; &lt;h2&gt;What Usually Does Well&lt;/h2&gt; &lt;p&gt;One of the more interesting writeups on this is &lt;a href=&quot;https://randomshit.dev/posts/what-gets-to-the-front-page-of-hacker-news&quot; rel=&quot;nofollow&quot;&gt;What gets to the front page of Hacker News?&lt;/a&gt;. 
It is imperfect and the HN comments correctly point out some methodology issues, but the broad conclusion feels directionally right:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;personal technical writing does well&lt;/li&gt; &lt;li&gt;useful tutorials do well&lt;/li&gt; &lt;li&gt;thoughtful engineering blog posts do well&lt;/li&gt; &lt;li&gt;naked corporate promotion usually does not&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;That matches the official rules and it matches lived experience on the site.&lt;/p&gt; &lt;p&gt;If your post teaches something, explains a tradeoff clearly, shares a real build story, or gives technical readers something concrete to react to, it has a chance.&lt;/p&gt; &lt;p&gt;If it reads like demand generation content, it probably does not.&lt;/p&gt; &lt;h2&gt;Practical Advice Before You Hit Submit&lt;/h2&gt; &lt;p&gt;Here is the checklist I would actually use.&lt;/p&gt; &lt;h3&gt;Before posting&lt;/h3&gt; &lt;ul&gt;&lt;li&gt;make sure the page loads fast&lt;/li&gt; &lt;li&gt;put the real subject in the title&lt;/li&gt; &lt;li&gt;remove hype, slogans, and vague claims&lt;/li&gt; &lt;li&gt;make sure the article itself is worth reading without a sales agenda&lt;/li&gt; &lt;li&gt;decide whether this should be a link post, Show HN, or Ask HN&lt;/li&gt;&lt;/ul&gt; &lt;h3&gt;Right after posting&lt;/h3&gt; &lt;ul&gt;&lt;li&gt;be around for the first hour&lt;/li&gt; &lt;li&gt;answer early questions clearly&lt;/li&gt; &lt;li&gt;fix obvious bugs or broken links fast&lt;/li&gt; &lt;li&gt;do not argue with every critic&lt;/li&gt; &lt;li&gt;do not ask friends to upvote&lt;/li&gt;&lt;/ul&gt; &lt;h3&gt;If it dies&lt;/h3&gt; &lt;ul&gt;&lt;li&gt;do not panic&lt;/li&gt; &lt;li&gt;do not immediately repost&lt;/li&gt; &lt;li&gt;do not conclude the article was bad&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Sometimes the window is short. Sometimes the timing was wrong. Sometimes the topic just did not match the audience that day. 
Sometimes HN is random.&lt;/p&gt; &lt;p&gt;The FAQ explicitly says ranking has other hidden factors. That is a polite way of saying you should stay humble about any theory of the system.&lt;/p&gt; &lt;h2&gt;The Real Answer&lt;/h2&gt; &lt;p&gt;The best time to post on Hacker News is the time when all of these are true:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;the relevant readers are awake&lt;/li&gt; &lt;li&gt;you can stick around to answer comments&lt;/li&gt; &lt;li&gt;the post is on-topic for HN&lt;/li&gt; &lt;li&gt;the title is plain and accurate&lt;/li&gt; &lt;li&gt;the thing you are sharing is genuinely interesting&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;If you force me to give a schedule, I would say:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;default window: &lt;strong&gt;Tue-Thu, 14:00-17:00 UTC&lt;/strong&gt;&lt;/li&gt; &lt;li&gt;experimental low-competition window: &lt;strong&gt;Sunday night Pacific&lt;/strong&gt;&lt;/li&gt; &lt;li&gt;best format for your own real product: &lt;strong&gt;Show HN, if people can actually try it&lt;/strong&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;But the deeper advice is simpler:&lt;/p&gt; &lt;p&gt;Write something smart. Say plainly what it is. Follow the rules. Show up like a human.&lt;/p&gt; &lt;p&gt;That still beats timing tricks.&lt;/p&gt;&lt;!--]--&gt;</description>
</item>
<item>
    <title>Portable Secret is now open source</title>
    <link>https://blog.alcazarsec.com/tech/posts/portable-secret-is-now-opensource</link>
    <pubDate>Thu, 26 Feb 2026 00:00:00 GMT</pubDate>
    <description>&lt;!--[--&gt;&lt;p&gt;Sharing passwords or sensitive files with people outside your team is painful.&lt;/p&gt; &lt;p&gt;Secure portals require accounts. Zip files confuse people. Chat apps retain history you don’t want kept.&lt;/p&gt; &lt;p&gt;We built Portable Secret to solve this. It generates a single HTML file containing your encrypted data and the code to decrypt it. You send the file, share the password via another channel, and the recipient opens it in their browser. No accounts, no servers, no dependencies.&lt;/p&gt; &lt;p&gt;Today, we are releasing the project as &lt;a href=&quot;https://github.com/alcazarsec/portable-secret&quot; rel=&quot;nofollow&quot;&gt;open source on GitHub&lt;/a&gt;. You can audit the code, fork it, or host it yourself.&lt;/p&gt; &lt;h2&gt;The Tool is Now Portable Too&lt;/h2&gt; &lt;p&gt;The generated secret files have always been self-contained. But until now, the tool to &lt;em&gt;create&lt;/em&gt; them was a standard web application hosted on a server.&lt;/p&gt; &lt;p&gt;For the open source release, we changed how we build the application. We switched the SvelteKit adapter to “inline” mode.&lt;/p&gt; &lt;p&gt;This means the &lt;strong&gt;creator tool itself&lt;/strong&gt; compiles into a single HTML file. You can go to the &lt;a href=&quot;https://alcazarsec.github.io/portable-secret/&quot; rel=&quot;nofollow&quot;&gt;live tool&lt;/a&gt;, save the page as HTML, and keep it on a USB drive. You can generate secrets completely offline, on an air-gapped machine, without ever hitting our servers.&lt;/p&gt; &lt;h2&gt;How It Works&lt;/h2&gt; &lt;p&gt;The security model is simple and browser-native.&lt;/p&gt; &lt;ol&gt;&lt;li&gt;&lt;strong&gt;Local Encryption:&lt;/strong&gt; When you add files or text, the browser encrypts them using AES-256-GCM.&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Key Derivation:&lt;/strong&gt; We use Argon2id (or PBKDF2 as a fallback) to derive the encryption key from your password. 
This makes brute-force attacks significantly harder.&lt;/li&gt; &lt;li&gt;&lt;strong&gt;The Payload:&lt;/strong&gt; The encrypted data is bundled with a lightweight decryption template into a single &lt;code&gt;.ps.html&lt;/code&gt; file.&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;When the recipient opens the file, the browser derives the key again and decrypts the content in memory. The unencrypted data never touches a disk or a network.&lt;/p&gt; &lt;h2&gt;Why Open Source?&lt;/h2&gt; &lt;p&gt;Trust is the most important feature of a security tool. You shouldn’t have to take our word that we aren’t uploading your secrets.&lt;/p&gt; &lt;p&gt;By opening the code, we allow anyone to verify that:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;No network requests are made during encryption or decryption.&lt;/li&gt; &lt;li&gt;The cryptography implementation follows standards.&lt;/li&gt; &lt;li&gt;Your data stays on your device.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Check out the &lt;a href=&quot;https://github.com/alcazarsec/portable-secret&quot; rel=&quot;nofollow&quot;&gt;repository&lt;/a&gt; or try the &lt;a href=&quot;https://alcazarsec.github.io/portable-secret/&quot; rel=&quot;nofollow&quot;&gt;live tool&lt;/a&gt;. It is a simple solution for a complex problem.&lt;/p&gt;&lt;!--]--&gt;</description>
</item>
<item>
    <title>Portable Secret: encrypted files in one HTML</title>
    <link>https://blog.alcazarsec.com/tech/posts/portable-secret</link>
    <pubDate>Mon, 23 Feb 2026 00:00:00 GMT</pubDate>
    <description>&lt;!--[--&gt;&lt;p&gt;Sharing secrets with people outside your organization is surprisingly hard.&lt;/p&gt; &lt;p&gt;Password-protected zip files confuse non-technical users. Secure portals require accounts and maintenance. Messaging apps often violate compliance rules.&lt;/p&gt; &lt;p&gt;We wanted a solution that was boringly simple: one file, no accounts, no server dependency.&lt;/p&gt; &lt;p&gt;Portable Secret is a self-contained HTML file. It holds both the encrypted data and the code to decrypt it.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Update (2026-02-26):&lt;/strong&gt; Portable Secret is now open source, and the creator tool is portable too. Read the announcement: &lt;a href=&quot;portable-secret-is-now-opensource&quot;&gt;Portable Secret: Now Open Source and Fully Local&lt;/a&gt;.&lt;/p&gt; &lt;h2&gt;The Constraint&lt;/h2&gt; &lt;p&gt;We set one hard rule: &lt;strong&gt;The recipient must be able to decrypt the file with only a browser and a password.&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;This drove every design decision. The key derivation must happen in the browser. The UI must work on slow devices. The format must be forward-compatible.&lt;/p&gt; &lt;h2&gt;Inside the File&lt;/h2&gt; &lt;p&gt;The generated HTML contains two HTML comments: a metadata block and a base64-encoded payload.&lt;/p&gt; &lt;p&gt;When you open the file, the embedded script:&lt;/p&gt; &lt;ol&gt;&lt;li&gt;Reads the metadata.&lt;/li&gt; &lt;li&gt;Asks for a password.&lt;/li&gt; &lt;li&gt;Derives the encryption key.&lt;/li&gt; &lt;li&gt;Decrypts the payload using AES-256-GCM.&lt;/li&gt; &lt;li&gt;Renders the files for download.&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;No network calls occur. You can disconnect your internet and it still works.&lt;/p&gt; &lt;h2&gt;Cryptography Choices&lt;/h2&gt; &lt;p&gt;We support two key derivation functions (KDFs):&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;strong&gt;Argon2id:&lt;/strong&gt; Hard against GPU attacks. 
Preferred for modern devices.&lt;/li&gt; &lt;li&gt;&lt;strong&gt;PBKDF2:&lt;/strong&gt; The compatibility fallback.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;We default to Argon2id but fall back to PBKDF2 if the device or browser is limited. This balances security with usability.&lt;/p&gt; &lt;h2&gt;The UX of Encryption&lt;/h2&gt; &lt;p&gt;Browser cryptography can be slow. If key derivation takes 10 seconds, the page looks frozen.&lt;/p&gt; &lt;p&gt;We solve this with calibration. Before we start, we run a quick test to estimate the device’s speed. We use this to show a realistic progress bar and a time estimate. Users will wait 30 seconds if they know it is working; they will close the tab if it looks dead.&lt;/p&gt; &lt;h2&gt;Responsive Performance&lt;/h2&gt; &lt;p&gt;Argon2id is heavy. Running it on the main thread freezes the UI.&lt;/p&gt; &lt;p&gt;We offload the derivation to a Web Worker whenever possible. This keeps the interface responsive while the heavy lifting happens in the background.&lt;/p&gt; &lt;h2&gt;The Build System&lt;/h2&gt; &lt;p&gt;We split the architecture.&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;strong&gt;Create:&lt;/strong&gt; A full Svelte application builds the file.&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Open:&lt;/strong&gt; A lightweight, framework-free template handles decryption.&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;This keeps the resulting file small and robust. The recipient doesn’t need to download a framework to read a text message.&lt;/p&gt; &lt;h2&gt;Summary&lt;/h2&gt; &lt;p&gt;Portable Secret is a tactical tool. It doesn’t replace a secure collaboration platform, but it solves the immediate problem of sending a file securely to someone who doesn’t have an account.&lt;/p&gt; &lt;p&gt;It turns a security headache into a simple file transfer.&lt;/p&gt;&lt;!--]--&gt;</description>
</item>
<item>
    <title>Building a Hacker News digest around discussion, not headlines</title>
    <link>https://blog.alcazarsec.com/tech/posts/hackernews-digest</link>
    <pubDate>Wed, 04 Feb 2026 00:00:00 GMT</pubDate>
    <description>&lt;!--[--&gt;&lt;p&gt;&lt;a href=&quot;https://hn.alcazarsec.com/daily&quot; rel=&quot;nofollow&quot;&gt;Top HN&lt;/a&gt; is our daily Hacker News digest. It pulls the top stories for the day, writes a short summary of the link, then summarizes the comment thread with citations back to the original comments.&lt;/p&gt; &lt;p&gt;We built it for a very Hacker News problem. The title gets you curious, but the real value is often buried in a 200-comment thread, a correction from someone who built the thing in 2009, or a reply that is smarter than the article itself.&lt;/p&gt; &lt;p&gt;We did not want a list of links with generic AI blurbs attached. We wanted a short daily brief that captured what people were actually arguing about.&lt;/p&gt; &lt;h2&gt;Pick Stories People Argued About&lt;/h2&gt; &lt;p&gt;We fetch stories from the Algolia Hacker News API in a strict UTC day window. Then we sort by points and trim the list.&lt;/p&gt; &lt;p&gt;That still is not enough. Hacker News regularly has stories that get a wave of upvotes but very little real discussion. Those are bad digest material.&lt;/p&gt; &lt;p&gt;We filter for threads that look alive. If a story has very high points but too few comments, we drop it. The goal is not to mirror the front page. The goal is to find stories that produced useful conversation.&lt;/p&gt; &lt;h2&gt;Summarize The Link And The Thread&lt;/h2&gt; &lt;p&gt;For each selected story, we do two separate fetches.&lt;/p&gt; &lt;p&gt;First, we fetch the Hacker News item page and pull out the top comments. We rank comments by replies, skip flagged comments, and keep the ones most likely to contain the real debate.&lt;/p&gt; &lt;p&gt;Second, we fetch the story itself. Jina Reader is the happy path. If that fails, we fall back to parsing the raw HTML. 
If both fail, we still keep going with a thinner summary.&lt;/p&gt; &lt;p&gt;Then we generate two pieces of text:&lt;/p&gt; &lt;ol&gt;&lt;li&gt;A one-sentence summary of the article.&lt;/li&gt; &lt;li&gt;A short summary of the comment thread with citations like &lt;code&gt;[1]&lt;/code&gt; and &lt;code&gt;[2]&lt;/code&gt;.&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;Those citations are not cosmetic. We turn them into links to the actual Hacker News comments, so readers can jump straight to the source of a claim or anecdote.&lt;/p&gt; &lt;p&gt;That makes the digest feel less like AI commentary and more like a guided map of the discussion.&lt;/p&gt; &lt;h2&gt;Store Markdown Once, Render It Everywhere&lt;/h2&gt; &lt;p&gt;The backend stores each story digest as markdown, with escaped text and safe links baked in.&lt;/p&gt; &lt;p&gt;That gives us one canonical format for the web page and the RSS feed. The frontend turns the markdown into HTML for the daily page. The RSS endpoint renders the same content and escapes it for XML.&lt;/p&gt; &lt;p&gt;We are not maintaining separate presentation pipelines for every surface. We store one durable artifact and render it where needed.&lt;/p&gt; &lt;h2&gt;Summary&lt;/h2&gt; &lt;p&gt;The interesting part of Hacker News is rarely the headline alone. It is the mix of the article, the rebuttals, the war stories, and the comment that explains the whole thing better than the link did.&lt;/p&gt; &lt;p&gt;Our digest is built around that idea.&lt;/p&gt; &lt;p&gt;Pick stories with real discussion. Summarize comments with citations. Reuse the same artifact everywhere.&lt;/p&gt; &lt;p&gt;If you want to see the result, it is live at &lt;a href=&quot;https://hn.alcazarsec.com/daily&quot; rel=&quot;nofollow&quot;&gt;HN daily digest&lt;/a&gt;.&lt;/p&gt;&lt;!--]--&gt;</description>
</item>
<item>
    <title>Realtime updates without WebSockets</title>
    <link>https://blog.alcazarsec.com/tech/posts/fast-realtime-updates</link>
    <pubDate>Sun, 25 Jan 2026 00:00:00 GMT</pubDate>
    <description>&lt;!--[--&gt;&lt;p&gt;Realtime features often tempt teams into complex architecture. You start by wanting a status indicator to update automatically. You end up maintaining a fleet of WebSocket servers and a Redis PubSub cluster.&lt;/p&gt; &lt;p&gt;We wanted snappy updates without the operational headache. We chose HTTP long polling combined with snapshot semantics.&lt;/p&gt; &lt;h2&gt;The Problem With WebSockets&lt;/h2&gt; &lt;p&gt;WebSockets introduce state. You have to manage persistent connections, handle backpressure, and debug a separate transport stack. Load balancers become more difficult to configure. Autoscaling gets tricky.&lt;/p&gt; &lt;p&gt;For most applications, this is overkill.&lt;/p&gt; &lt;h2&gt;Long Polling Done Right&lt;/h2&gt; &lt;p&gt;Long polling differs from standard polling. The client requests an update, and the server holds the request open until data changes or a timeout occurs.&lt;/p&gt; &lt;p&gt;This approach feels instant because the server responds immediately when an event happens. It works over standard HTTP, requires no special infrastructure, and is naturally stateless.&lt;/p&gt; &lt;h2&gt;Snapshots, Not Streams&lt;/h2&gt; &lt;p&gt;We do not stream individual events. When a change occurs, the backend returns a minimal snapshot: “The Approval object with ID 123 changed.”&lt;/p&gt; &lt;p&gt;This payload tells the frontend what is stale. The frontend then decides how to handle it—usually by re-fetching the relevant data.&lt;/p&gt; &lt;p&gt;This simplifies the logic. You do not need to replay a sequence of events to rebuild state. You just mark the cache as dirty.&lt;/p&gt; &lt;h2&gt;Precision Cursors&lt;/h2&gt; &lt;p&gt;To track state, we use high-precision timestamps. Every response includes the timestamp of the latest event. The client sends this timestamp back in the next request.&lt;/p&gt; &lt;p&gt;This ensures we never miss an update. 
If the client disconnects, it reconnects with its last known cursor, and the server sends everything that happened in the meantime.&lt;/p&gt; &lt;h2&gt;Invalidation as a Primitive&lt;/h2&gt; &lt;p&gt;The frontend does not apply patches. It invalidates queries.&lt;/p&gt; &lt;p&gt;If an “Account” event arrives, the frontend invalidates all account-related queries. If a “Comment” event arrives, it invalidates the comment list.&lt;/p&gt; &lt;p&gt;This keeps the UI consistent. We add a new model, define its invalidation rules, and the realtime system just works.&lt;/p&gt; &lt;h2&gt;Handling Edge Cases&lt;/h2&gt; &lt;p&gt;Realtime systems are full of sharp edges. We handle them explicitly:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;strong&gt;Timeouts:&lt;/strong&gt; If the request times out (usually after 30 seconds), the client immediately sends a new one.&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Disconnects:&lt;/strong&gt; If the server shuts down, it closes the connection cleanly. The client reconnects.&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Empty State:&lt;/strong&gt; The first request returns a cursor so the polling loop can start.&lt;/li&gt;&lt;/ul&gt; &lt;h2&gt;Summary&lt;/h2&gt; &lt;p&gt;You can build excellent realtime experiences without WebSockets.&lt;/p&gt; &lt;p&gt;Long polling with precise cursors and cache invalidation provides 90% of the benefit with 10% of the complexity. The UI feels alive, and the operations team sleeps at night.&lt;/p&gt;&lt;!--]--&gt;</description>
</item>
<item>
    <title>Designing better database IDs with UUIDv7 and typed public IDs</title>
    <link>https://blog.alcazarsec.com/tech/posts/better-database-ids</link>
    <pubDate>Wed, 14 Jan 2026 00:00:00 GMT</pubDate>
    <description>&lt;!--[--&gt;&lt;p&gt;Engineers fight about primary keys constantly. Auto-incrementing integers leak business metrics. UUIDv4 fragments database indexes. Snowflake IDs require dedicated infrastructure.&lt;/p&gt; &lt;p&gt;We found a middle ground. We use a split-ID strategy: UUIDv7 for the database, and a prefixed, checksummed, Base32-encoded UUIDv4 for the API.&lt;/p&gt; &lt;p&gt;Our public IDs look like this: &lt;code&gt;u_14wrt1xnprkj3nj7wtndz74037rkw&lt;/code&gt;&lt;/p&gt; &lt;h2&gt;The Database Layer (UUIDv7)&lt;/h2&gt; &lt;p&gt;Random UUIDs (v4) are terrible primary keys for PostgreSQL. They scatter inserts across the index, which forces the database to load random disk pages for every write. This kills throughput.&lt;/p&gt; &lt;p&gt;UUIDv7 solves this. It embeds a timestamp in the first 48 bits, so new IDs are appended to the end of the index. You get the performance of integers with the global uniqueness of UUIDs.&lt;/p&gt; &lt;p&gt;We use UUIDv7 for all primary and foreign keys. It is a free performance optimization.&lt;/p&gt; &lt;h2&gt;The Public Layer (Stripe-style Prefixes)&lt;/h2&gt; &lt;p&gt;We do not expose UUIDv7 to users because it leaks creation time. Instead, we generate a separate random UUIDv4 for public use and format it for humans.&lt;/p&gt; &lt;p&gt;We use prefixes like &lt;code&gt;u_&lt;/code&gt; for users or &lt;code&gt;p_&lt;/code&gt; for projects. 
This idea comes from Stripe.&lt;/p&gt; &lt;p&gt;Prefixes solve three problems:&lt;/p&gt; &lt;ol&gt;&lt;li&gt;&lt;strong&gt;Readability:&lt;/strong&gt; You know what the ID represents immediately.&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Debugging:&lt;/strong&gt; You cannot accidentally pass a User ID to a function expecting a Project ID.&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Polymorphism:&lt;/strong&gt; A single API endpoint can handle different object types by checking the prefix.&lt;/li&gt;&lt;/ol&gt; &lt;h2&gt;The Formatting (Crockford Base32)&lt;/h2&gt; &lt;p&gt;Hexadecimal UUIDs are hard to read. Base64 contains special characters that break URLs. We chose Crockford’s Base32.&lt;/p&gt; &lt;p&gt;This encoding is designed for humans:&lt;/p&gt; &lt;ul&gt;&lt;li&gt;&lt;strong&gt;No ambiguity:&lt;/strong&gt; It excludes I, L, O, and U. You never have to guess if a character is a one or an &lt;code&gt;l&lt;/code&gt;.&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Case insensitive:&lt;/strong&gt; &lt;code&gt;u_14WRT...&lt;/code&gt; works the same as &lt;code&gt;u_14wrt...&lt;/code&gt;.&lt;/li&gt; &lt;li&gt;&lt;strong&gt;Selection friendly:&lt;/strong&gt; Double-clicking the ID selects the whole string, unlike standard UUIDs which break at hyphens.&lt;/li&gt;&lt;/ul&gt; &lt;h2&gt;The Checksum&lt;/h2&gt; &lt;p&gt;We append a 3-character checksum to the end of the ID. This adds a small amount of length but catches 99.99% of typos.&lt;/p&gt; &lt;p&gt;If a user swaps a character, our API rejects the request before it even hits the database. This prevents valid-but-wrong lookups and wasted database cycles.&lt;/p&gt; &lt;h2&gt;Implementation&lt;/h2&gt; &lt;p&gt;Our table structure is simple:&lt;/p&gt; &lt;pre class=&quot;language-sql&quot;&gt;&lt;!----&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;TABLE&lt;/span&gt; users &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;token comment&quot;&gt;-- Internal PK: UUIDv7. Fast indexing.&lt;/span&gt;
    id uuid &lt;span class=&quot;token keyword&quot;&gt;PRIMARY&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;KEY&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;DEFAULT&lt;/span&gt; uuid_generate_v7&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;

    &lt;span class=&quot;token comment&quot;&gt;-- Public ID: UUIDv4. Random and opaque.&lt;/span&gt;
    uuid uuid &lt;span class=&quot;token keyword&quot;&gt;UNIQUE&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;NULL&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;DEFAULT&lt;/span&gt; gen_random_uuid&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;

    email &lt;span class=&quot;token keyword&quot;&gt;varchar&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;255&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;token boolean&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;
    created_at timestamptz &lt;span class=&quot;token keyword&quot;&gt;DEFAULT&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;now&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;!----&gt;&lt;/pre&gt; &lt;p&gt;When the API receives an ID, we strip the prefix, verify the checksum, decode the payload, and query the &lt;code&gt;uuid&lt;/code&gt; column. When we return data, we encode the &lt;code&gt;uuid&lt;/code&gt; back into our custom format.&lt;/p&gt; &lt;h2&gt;Summary&lt;/h2&gt; &lt;p&gt;IDs are part of the user interface.&lt;/p&gt; &lt;p&gt;By separating the physical ID (database performance) from the logical ID (developer experience), we get fast queries and usable handles. It takes a little more setup than a standard serial ID, but it pays off in every support ticket and log entry.&lt;/p&gt;&lt;!--]--&gt;</description>
</item>
</channel>
</rss>