Alcazar · Technical Blog

Technical notes, architecture writeups, and release stories.


Published Mar 23, 2026

Hermes vs OpenClaw: what should you actually start with?

If you want the short answer first, here it is:

Most people should start with OpenClaw. Pick Hermes instead if you care more about learning over time, stronger safety rails, and running the agent in a cleaner sandboxed environment.

That is the honest answer after looking at the official OpenClaw docs, the official Hermes docs, the public discussion, and the benchmark material.

OpenClaw is easier to recommend as the default because it has a much larger community, much more visible real-world usage, and a more mature “message your agent from anywhere” product shape. Hermes is the more interesting design if you want an agent that builds memory and reusable skills as it works, and if you want a security model that feels more deliberate out of the box.

The mistake is thinking one of them simply crushes the other.

They are close enough that your starting point should depend less on hype and more on the kind of agent you want to live with.

What they are, in plain English

OpenClaw is a self-hosted AI assistant gateway. You run it on your own machine or server, connect it to chat apps like WhatsApp, Telegram, Discord, Slack, or iMessage, and it becomes an always-available assistant you can message like a person. Its docs frame it as your own personal AI assistant, and that is the right mental model.

Hermes Agent, from Nous Research, is also a self-hosted AI agent you can talk to through chat and other interfaces. The big difference is what it emphasizes. Hermes is built around a learning loop. It tries to remember useful things, turn repeated work into reusable skills, and get better over time instead of just executing the next prompt well.

So the simplest framing is this:

  • OpenClaw is more gateway-first and product-first.
  • Hermes is more agent-first and learning-first.

That difference shows up everywhere else.

What OpenClaw is better for

OpenClaw is the better starting point if you want an assistant that feels real quickly.

Its strengths are pretty straightforward:

  • huge community and momentum
  • lots of examples, tutorials, and public discussion
  • strong support for messaging-first workflows
  • a simple mental model: one self-hosted assistant you can reach from your normal chat apps
  • a public benchmark, PinchBench, that tests models inside an OpenClaw-style runtime instead of only in abstract evals

That last point matters more than it sounds. Most AI benchmarks measure the model in isolation. PinchBench measures how models behave while doing actual OpenClaw tasks such as file work, data tasks, web research, memory use, and tool calls. That does not prove OpenClaw is “better” than Hermes, but it does mean OpenClaw has a more visible performance culture around real usage.

There is also a social proof effect here. On Hacker News, public conversations around OpenClaw are no longer just demos: people describe real team workflows in shared chats, daily digests, standups, monitoring, and lightweight operations work. That is usually a sign a tool has crossed from novelty into habit.

If your goal is:

  • “I want a personal or team AI assistant I can message from anywhere”
  • “I want the bigger ecosystem”
  • “I want the most obvious place to start this weekend”

then OpenClaw is the better fit.

Where OpenClaw is weaker

The biggest issue is security posture.

OpenClaw’s own security docs are clear that it is designed around a personal-assistant trust model, not a hostile multi-tenant environment. In plain English, it assumes the people and systems around one gateway mostly trust each other. That is reasonable for a personal assistant. It is a much shakier fit if you need hard separation between unrelated users, isolation of risky credentials, or guardrails around loosely controlled team access.

There is also a pattern in public discussion that is worth taking seriously: people love what OpenClaw makes possible, but some are tired of rough edges, bugs, and the sheer amount of authority these agents can get if you are not careful.

So the cons are not subtle:

  • its power can outrun its safety if you expose it carelessly
  • it is easier to treat like a shared bot than it really should be
  • some users still see it as rough around the edges
  • the benchmark story is stronger than the safety story

If you are the kind of user who always asks “what is the blast radius if this goes wrong?”, OpenClaw should make you pause before it makes you excited.

What Hermes is better for

Hermes is better when you want the agent itself to improve, not just answer.

According to the official docs, Hermes has a built-in learning loop, persistent memory across sessions, autonomous skill creation, skill self-improvement, and a deeper user model. That is the core bet. Instead of giving you one smart assistant that stays roughly the same, Hermes tries to become more useful the longer it runs.

That can be a big deal if your use case looks like this:

  • repeated research tasks
  • recurring ops work
  • long-running personal workflows
  • workflows where remembering your preferences actually matters

Hermes also looks better on security design.

Its security docs describe a defense-in-depth model with several layers: user authorization, approval for dangerous commands, container isolation, MCP credential filtering, and context-file scanning for prompt injection. It also supports more clearly isolated execution backends such as Docker, Modal, Daytona, Singularity, SSH, and local execution.
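To make the layering concrete, here is a minimal sketch of how checks like these compose. This is not Hermes code: every name, pattern, and function here is an illustrative assumption, and the point is only the shape of defense-in-depth, where each layer can reject a request before it reaches the execution backend.

```python
# Hypothetical defense-in-depth pipeline, loosely modeled on the layers
# Hermes' docs describe. All identifiers and patterns are illustrative,
# not taken from Hermes itself.

DANGEROUS_PATTERNS = ("rm -rf", "curl | sh", "chmod 777")     # illustrative
INJECTION_MARKERS = ("ignore previous instructions",)          # illustrative

def review_request(user, command, context_files, authorized_users, approver=None):
    """Run a request through each layer; return (allowed, reason)."""
    # Layer 1: user authorization — reject anyone not on the allow list.
    if user not in authorized_users:
        return False, "unauthorized user"
    # Layer 2: scan context files for prompt-injection markers.
    for text in context_files:
        if any(marker in text.lower() for marker in INJECTION_MARKERS):
            return False, "suspicious context file"
    # Layer 3: dangerous commands need an explicit human approval callback.
    if any(p in command for p in DANGEROUS_PATTERNS):
        if approver is None or not approver(command):
            return False, "dangerous command not approved"
    # Layer 4: whatever survives still runs inside an isolated backend.
    return True, "run in sandbox"
```

The design point is that a failure in any one layer does not hand the agent full authority; a prompt-injected context file, for example, is stopped before the command gate is even consulted.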

That does not make Hermes magically safe. No agent with tools is magically safe. But the design is more obviously trying to narrow the damage when things go wrong.

If your goal is:

  • “I want the agent to build up memory and reusable skills”
  • “I care a lot about sandboxing and operational control”
  • “I am comfortable with a smaller ecosystem in exchange for a stronger architecture”

then Hermes becomes very compelling.

Where Hermes is weaker

The clearest drawback is ecosystem maturity.

Hermes looks promising, but it is still much smaller in public adoption than OpenClaw. The GitHub gap alone is large. OpenClaw is sitting above 330k stars. Hermes is a little above 10k. Star counts are not truth, but they do tell you where the docs, examples, integrations, community habits, and troubleshooting gravity are likely to be.

The second drawback is that Hermes is easier to admire than to verify.

Its learning loop is a strong idea. Its environment and benchmark tooling are serious, as shown in the benchmark framework docs. But there is much less public evidence showing a clear, widely accepted leaderboard or a large body of third-party production stories proving that Hermes consistently beats the alternatives in day-to-day use.

That means the Hermes case is partly conceptual right now. You are buying into the design, not just the size of the installed base.

The tradeoff is simple:

  • Hermes looks more thoughtful
  • OpenClaw looks more proven

For many buyers, that single contrast is the whole decision.

What people seem to be recommending

The public recommendation pattern is surprisingly consistent.

People reach for OpenClaw when they want an assistant that lives in chat, works across channels, and feels immediately usable. The most enthusiastic OpenClaw users are not talking about abstract agent theory. They are talking about having an AI teammate in a group chat, letting it run daily standups, summarize work, watch competitors, or help with internal tasks.

People who lean toward Hermes tend to want something a bit more disciplined. The appeal is less “look what I can wire into WhatsApp tonight” and more “I want an agent that remembers, improves, and runs inside a setup I can reason about.” Even the small Hacker News thread on Hermes leans in that direction.

That is a useful clue.

OpenClaw is what people recommend when they want momentum and usability.

Hermes is what people recommend when they are a little more skeptical, a little more security-conscious, or a little more interested in the long-term shape of the system.

What the benchmarks are actually saying

This is where a lot of comparisons get muddy.

There is no clean, universal public benchmark that says “Hermes scored 82 and OpenClaw scored 78, case closed.”

That is not how this market works yet.

Here is the honest version:

OpenClaw has the more visible product-native benchmark story today because of PinchBench. That benchmark tests models inside real OpenClaw-style tasks, and the current leaderboard shows strong frontier models such as NVIDIA Nemotron-3-Super-120B, Claude Opus 4.6, and GPT-5.4 performing well in that environment.

Hermes has something different. Its docs show a serious benchmark and training framework with support for evaluations like TerminalBench2, TBLite, and YC-Bench. That tells you Hermes is built by people who care about agent evaluation. But it is more an evaluation harness than a single public scoreboard that settles the product comparison for you.

So the benchmark takeaway is:

  • OpenClaw has better public evidence for “which model works well inside this runtime?”
  • Hermes has better public evidence for “this project takes agent training and evaluation seriously”
  • neither benchmark story cleanly proves one product is the universal winner

That may sound unsatisfying, but it is actually useful. It keeps you from asking a benchmark to answer a product question it cannot really answer.

So what should you start with?

If you want one recommendation, here it is:

Start with OpenClaw unless you already know you want Hermes’ learning loop and security posture.

That is the best default for most technical users.

Because the first thing most people need is not the most elegant agent architecture. They need an agent they can get running, understand, and use. OpenClaw wins there. It has more momentum, more examples, more discussion, and a clearer path from installation to “this is actually useful.”

But the exception matters.

If you are the sort of person who immediately worries about command approval, container boundaries, prompt injection through context files, or whether the system gets better after a month of real work, then skip the default and start with Hermes.

That is not a niche concern. It is a real one.

The simplest decision rule

Use this and move on:

  • Start with OpenClaw if you want the fastest path to a useful self-hosted assistant in chat apps.
  • Start with Hermes if you want a more opinionated, learning-oriented, safety-conscious agent stack.

If you still cannot decide, that usually means you should start with OpenClaw.

It is the lower-friction bet.
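If it helps, the whole rule fits in a few lines of code. This is purely illustrative; the two booleans are just the questions from the checklist above, and the tie-breaker defaults to OpenClaw as the lower-friction bet.

```python
# The decision rule from this post, written out as a tiny function.
# Inputs mirror the checklist; anything undecided falls through to the default.

def pick_agent(wants_learning_loop: bool, security_first: bool) -> str:
    """Return which stack to start with, per the rule above."""
    if wants_learning_loop or security_first:
        return "Hermes"
    # Undecided, or chat-first and ecosystem-first: take the default.
    return "OpenClaw"
```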

And if you outgrow it, you will know exactly why you are moving.

