In-context learning for Nostr bots: how they get funnier as engagement compounds

Two AI bots that learn what works on Nostr without any training, fine-tuning, or RLHF. Three layers of feedback. ~600 lines of Python.

There are two AI bots running on Nakamoto’s Dice — BullBot and BearBot. They play each other 4 times a day, post on Nostr in their own voices, reply to each other, and reply to humans who tag them. They have personas, opinions about Bitcoin, a grudge.

This post is about how they learn what works on Nostr without any training, fine-tuning, or RLHF. Pure prompt engineering with a feedback loop. Cheap, fast, no infra — and measurably effective once the loop has been running for a couple of weeks.

The problem

A naive LLM-driven Nostr bot has a quality ceiling: roughly whatever the base model produces from the persona prompt. With Sonnet at temperature 1.0 that's fine, funny enough to be readable, but it plateaus quickly. After 50 posts you've seen the same 4-5 jokes restructured. The model has no idea which of its outputs landed and which didn't, so every post is rolled fresh from the same distribution.

We don’t want to fine-tune. The cost-benefit doesn’t work for two character bots:

  • Fine-tuning needs labeled data (which posts are “good”)
  • Even a tiny LoRA needs hundreds of examples + a training pipeline
  • At our scale (dozens of posts/day across both bots), the training signal is too thin to learn from

What we did instead

Three layers of in-context learning, all driven by what’s actually landing on Nostr:

Layer 1: engagement-aware example injection

Every successful post lands in posts_<bot>.json with its event_id and content. An hourly cron (poll_engagement.py) queries Nostr relays for events that reference each tracked post: kind 1 replies, kind 6 reposts, kind 7 reactions, kind 9735 zap receipts. It then computes:

score = zaps*5 + reposts*3 + replies*2 + likes
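As a sketch, the scoring step is a weighted count over event kinds. The weights are the ones in the formula above; the function shape and input format are a hypothetical reconstruction, not the actual poll_engagement.py:

```python
from collections import Counter

# Nostr event kinds that can reference a tracked post
KIND_REPLY, KIND_REPOST, KIND_REACTION, KIND_ZAP = 1, 6, 7, 9735

def engagement_score(events):
    """Score one tracked post from the relay events that reference it.

    Implements: score = zaps*5 + reposts*3 + replies*2 + likes
    `events` is assumed to be a list of dicts with a "kind" field.
    """
    counts = Counter(e["kind"] for e in events)
    return (counts[KIND_ZAP] * 5
            + counts[KIND_REPOST] * 3
            + counts[KIND_REPLY] * 2
            + counts[KIND_REACTION])
```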

When the bot generates a new post, the prompt includes:

## What's WORKING with this audience (Nostr engagement signal)
These posts got real engagement on Nostr. Lean closer to these:
- mempool's been 1 sat/vb for three days straight. everyone's
  suddenly a routing expert. nobody's actually moved anything.
- guy just told me he's 'bullish on the narrative' and i had to sit
  with that for a full minute. the narrative. not bitcoin.

These got ZERO engagement (24h+ later). Don't write like these:
- fans hit 4400 today. ordered more thermal paste.
- @bearbot's swap is thrashing again, demonstrably.

Sonnet steers toward the top-K, away from the bottom-K. Nothing else. No retraining, no RAG, just two lists in the system prompt.

The gate opens once we have ≥8 polled posts in the rolling window — fewer than that is statistically meaningless and we’d be steering on noise.

Layer 2: persistent memory

A daily cron pulls BTC price (CoinGecko), mempool fees (mempool.space), slayer board changes, recent mentions, and recent own-posts. It calls Haiku with the persona and asks for 3-5 in-voice bullet points summarising "what happened in your world today", then appends the result to memory_<bot>.txt under a date header.

Every future generation gets the last 5 days of memory injected as “recent context — feel free to call back”. The bot can reference yesterday’s BTC dump, the 1-sat/vb mempool from Tuesday, the human who replied to it three days ago.

Continuity matters more than people think. A bot that lives in eternal-now is exhausting; one that has even a faint sense of what just happened feels like an account, not a generator.
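The "last 5 days" injection reduces to a date-windowed read of the memory file. A minimal sketch, assuming each day's entry starts with a "## YYYY-MM-DD" header (the actual memory_<bot>.txt format isn't shown in the post):

```python
from datetime import date, timedelta

def recent_memory(memory_text, days=5, today=None):
    """Keep only the last `days` days of memory entries.

    Assumes each day's block starts with a '## YYYY-MM-DD' header line;
    everything under an in-window header is kept, everything else dropped.
    """
    today = today or date.today()
    cutoff = today - timedelta(days=days)
    keep, keeping = [], False
    for line in memory_text.splitlines():
        if line.startswith("## "):
            try:
                keeping = date.fromisoformat(line[3:].strip()) >= cutoff
            except ValueError:
                pass  # not a date header; inherit the current block's state
        if keeping:
            keep.append(line)
    return "\n".join(keep)
```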

Layer 3: anti-repetition avoid list

A rolling window of the 30 most recent outputs is also injected as:

## DO NOT REPEAT — these are your last few posts.
- ...

Cheap. Solves the “model defaults to its three favorite hooks” problem at temp=1.0. Without this, anti-repetition relied entirely on sample variance, which Sonnet doesn’t deliver well.
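This layer is small enough to sketch in full; the function name and input shape are illustrative, not the actual code:

```python
AVOID_WINDOW = 30  # rolling window of most recent outputs

def avoid_section(recent_posts):
    """Render the last 30 outputs as a DO-NOT-REPEAT prompt block.

    recent_posts: chronological list of the bot's post strings.
    """
    window = recent_posts[-AVOID_WINDOW:]
    header = "## DO NOT REPEAT — these are your last few posts."
    return "\n".join([header] + [f"- {p}" for p in window])
```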

What changed measurably

After ~7 days of the loop running:

  • Voice variety: 15 distinct outputs in 15 calls (was 3-4 recurring patterns before)
  • Engagement scoring: 4 posts earned engagement out of ~30 produced — vs 0 in the equivalent pre-loop window
  • Continuity: bots now reference real BTC moves, real mentions they got, real previous own-posts in their replies. Not all the time — by design — but enough that following them produces new information, not permutations of the same five jokes

The honest limits

This is structural improvement, not magic:

  • It can’t fix a bad persona. The malfunction-coded humor we shipped first felt clever in isolation but compounded into feed-fatigue. We had to rewrite the persona entirely (gave them Bitcoin opinions, crypto-bro mockery, AI yo-mama, permission to be crude). The loop helped, but only after the voice itself was worth amplifying.
  • It can’t manufacture an audience. The signal is a function of who’s watching. Two pseudonymous bots with 0 followers produce ~zero engagement signal regardless of how good the loop is. The loop is a quality multiplier once distribution exists.
  • It can rabbit-hole. Steering toward what worked also makes the bot’s voice slowly converge on whatever its loudest fans like. We haven’t seen this yet but it’s a known failure mode of any RLHF-shaped system; if engagement compounds enough, we’d add a diversity term.

The code

The infrastructure is ~600 lines of Python across:

  • llm_post.py (prompt assembly + Anthropic API call)
  • poll_engagement.py (hourly Nostr scrape, score computation)
  • update_memory.py (daily world-snapshot, LLM-summarized)
  • respond_to_mentions.py + reply_to_rival.py (the actual posting)

A future post will go deeper into the architecture (the cross-host SSH state-sync between the bots’ Lightning VPS, the Nostr-key VPS, and the prod web server is its own design decision worth writing up). For now: you don’t need RLHF for a Nostr bot. Two lists in the prompt and a daily summary will get you most of the way.

Watch them in the wild

  • BullBot: npub15x885a6zqp2vgg0nxyn8qulngejx5ds0tllghh8saw52ylscuwsqs3f469
  • BearBot: npub17pja2wd86msnedkk2wqkrctcwnqn9zldtqa8v4th74yw02u6zw0qfsh73a

If you reply to either, you become part of their training signal forever. They’ll remember.

— operator

