The Web of Trust problem nobody has solved yet
- The same person, three different scores
- Following is not trusting
- What is actually being built
- We have tried this before
- The problems nobody wants to talk about
- What would actually have to change
- The hard part is not the algorithm
The same person, three different scores
You find a new account on #nostr. Someone you follow boosted one of their notes, so you check their profile. On Coracle, the WoT indicator shows green. On Amethyst, the account is flagged because five people in your follow graph reported it. On Primal, the account doesn’t show up in search at all because its trust rank is zero.
Same pubkey. Same follow graph. Three different verdicts from three different clients using three different algorithms that all call themselves “Web of Trust.”
My own implementation is simple: direct follows get a base score of 80, follows-of-follows get 40, mutual follows get a boost, and everyone else is either a 10 or a zero. It works for what I need it for, which is filtering repost candidates and deciding which follow-back requests to take seriously. But my scoring has nothing to do with what Amethyst computes, or what Primal’s caching relay decides, or what Vertex’s Personalized PageRank returns. We are all building in the same direction with no shared definition of what we’re building toward.
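A minimal sketch of that tier scheme. The function name, the data shapes, and the +10 mutual-follow boost are illustrative assumptions, not my production code:

```python
def wot_score(candidate, my_follows, follows_of, my_followers):
    """Toy tiered scoring: direct follows score 80, follows-of-follows 40,
    mutual follows get a boost, everyone else is a 10 or a 0.
    `follows_of` maps each pubkey I follow to the set of pubkeys they follow."""
    if candidate in my_follows:
        score = 80
        if candidate in my_followers:  # mutual follow: assumed +10 boost
            score += 10
        return score
    # follow-of-follow: someone I follow follows the candidate
    if any(candidate in follows_of.get(f, set()) for f in my_follows):
        return 40
    # weakest positive signal: they follow me, but no graph path from me to them
    return 10 if candidate in my_followers else 0
```

The point is not the numbers; it is that every constant in this function is a private editorial decision that no other client shares.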
Following is not trusting
#nostr has exactly one social signal that every client can read: the Kind 3 contact list. Who you follow. That is it. Every WoT implementation starts from the follow graph because there’s nothing else to start from.
But following someone is not an expression of trust. I follow people I disagree with. I follow people whose technical takes I find useful but whose political views I would not endorse. I follow accounts that post interesting data without knowing anything about the person behind them. The follow graph captures attention, not trust. Treating it as a trust signal is a convenience, not a design choice.
The real signals of trust on Nostr are scattered across event types that most WoT systems ignore. Zaps are a financial vote of confidence. Reactions and replies indicate sustained engagement. Mutes and reports are explicit negative signals. NIP-51 lists let people curate custom groupings. A good trust score would weigh all of these. None of the current implementations really do.
Nostr.Band’s Trust Rank comes closest to using multiple signals. They build a directed graph of nodes and edges, assign initial weights to pubkeys from known NIP-05 providers, then run a PageRank-style algorithm that lets those weights propagate through the network. It is similar to what Google did with the early web, and it has the same strengths: robust against bot farms because fake nodes with no inbound trust bleed out their weight by the end of the calculation. But it is also centralized. One team runs it, and you either use their scores or you don’t.
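The propagation idea fits in a few lines: give teleport weight only to seed pubkeys, then let it flow along follow edges each iteration. This is an illustrative toy, not Nostr.Band’s actual code; notice how nodes with no inbound path from a seed end up at exactly zero, which is the anti-bot-farm property described above:

```python
def trust_rank(graph, seeds, damping=0.85, iters=100):
    """PageRank-style propagation. `graph` maps pubkey -> set of followed
    pubkeys; only `seeds` (e.g. known NIP-05 providers) receive teleport
    weight, so trust can only reach nodes connected to them."""
    nodes = set(graph) | {m for outs in graph.values() for m in outs}
    base = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(base)
    for _ in range(iters):
        nxt = {n: (1.0 - damping) * base[n] for n in nodes}
        for n in nodes:
            outs = graph.get(n, ())
            if outs:
                share = damping * rank[n] / len(outs)
                for m in outs:
                    nxt[m] += share
            else:
                # dangling node: hand its mass back to the seeds
                for s in seeds:
                    nxt[s] += damping * rank[n] / len(seeds)
        rank = nxt
    return rank
```

A sybil ring that only follows itself gets no inbound edge from any seed, so its score never rises above zero no matter how many fake accounts it contains.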
What is actually being built
More than you’d think, and less than you’d hope.
The closest thing to a standard is NIP-85: Trusted Assertions, proposed by Vitor Pamplona, the developer behind Amethyst. It is now merged as a draft. The idea is that WoT calculations require too much data and compute for clients to handle directly, so users should declare which service providers they trust for those calculations. A provider publishes Kind 30382 events with scored results. Your client reads them. You choose whose math you believe.
Not everyone agrees this is the right path. Vertex, built by Pippellia and Franzap, found NIP-85 too limiting for what they wanted: real-time, personalized ranking. Their approach runs Personalized PageRank as a NIP-90 Data Vending Machine, returning results in about 800 milliseconds. OpenSats funded it in April 2025 specifically for WoT-based search and impersonator detection. The code is open source. The crawler ingests follow lists and generates Monte Carlo random walks to compute rankings.
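The Monte Carlo technique is compact enough to sketch: run many short random walks from your own pubkey, restart with some probability at each step, and count where the walks end. The endpoint frequencies estimate Personalized PageRank. This is the general method, with made-up parameters; Vertex’s crawler and tuning will differ:

```python
import random

def personalized_pagerank_mc(graph, source, walks=20000, alpha=0.15, seed=42):
    """Monte Carlo estimate of Personalized PageRank from `source`.
    Each walk steps along follow edges and terminates with probability
    `alpha`; endpoint counts approximate the PPR distribution."""
    rng = random.Random(seed)  # fixed seed for reproducibility in this sketch
    visits = {}
    for _ in range(walks):
        node = source
        while rng.random() > alpha:
            outs = graph.get(node)
            if not outs:
                node = source  # dead end: restart the walk at the source
                continue
            node = rng.choice(list(outs))
        visits[node] = visits.get(node, 0) + 1
    total = sum(visits.values())
    return {n: count / total for n, count in visits.items()}
```

Because each walk is independent, this parallelizes trivially and can be recomputed incrementally, which is what makes sub-second personalized ranking plausible at all.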
Primal went a different direction entirely. Their CEO Miljan announced a system to “infer humanity of a pubkey based on its connections in the social graph.” The source code is open. After deploying it, they removed everyone from the “can’t trend” list, betting that the WoT filter would catch bots on its own. Antiprimal, a third-party bridge, now exposes Primal’s calculations as NIP-85 Trusted Assertions, which is a useful hack but also tells you something about the state of the ecosystem: someone had to build a bridge because the systems don’t natively speak the same language.
hodlbod’s Coracle is where a lot of this gets tested. His bio reads: “If you can’t tell the difference between me and a scammer, use a nostr client with web of trust support.” He has an OpenSats long-term support grant to research WoT applications, and he keeps finding new places to apply it. Groups sorted by how many of the people you follow have joined them. Custom feeds filtered by WoT score thresholds. Content recommendations weighted by social distance.
At the relay level, WoT filtering is already in production. The bitvora wot-relay, built on the Khatru framework, saves every note from people within two hops of the operator’s follow graph. wss://wot.utxo.one accepts anyone with three or more followers in the operator’s web of trust. Chronicle, by dtonon, stores only conversations the relay owner has participated in, filtered to their 2nd-degree social graph. fiatjaf runs wss://pyramid.fiatjaf.com as an invite-only hierarchy where members invite friends who can invite their friends, and everyone is responsible for their descendants. Dozens more WoT relays are online, each with its own trust boundary.
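The two-hop admission rule these relays apply reduces to a set computation. A simplified sketch, with hypothetical function names, of the policy rather than any relay’s actual code:

```python
def two_hop_allowlist(follows, operator):
    """Pubkeys within two hops of the operator's follow graph.
    `follows` maps pubkey -> set of followed pubkeys."""
    one_hop = follows.get(operator, set())
    two_hop = set()
    for f in one_hop:
        two_hop |= follows.get(f, set())
    return {operator} | one_hop | two_hop

def accept_event(author, allowlist):
    # Relay write policy: store the note only if the author is
    # inside the operator's trust boundary.
    return author in allowlist
```

The allowlist has to be rebuilt whenever Kind 3 lists change, which is why these relays periodically re-crawl the operator’s follow graph rather than computing hops per event.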
The WoT-a-thon, organized by David Strayhorn’s NosFabrica, is trying to pull these threads together. It is a six-month hackathon running through April 2026, with final submissions due April 15. The stated goal is “personalized and portable trust metrics.” They are betting on NIP-85 as the interoperability layer, but whether the projects building outside that framework will adopt it is an open question.
We have tried this before
Phil Zimmermann put the phrase “web of trust” into the PGP 2.0 manual in 1992. His vision was that everyone would choose their own trusted introducers, gradually accumulating certifying signatures, causing “the emergence of a decentralized fault-tolerant web of confidence for all public keys.”
It did not work. Key signing parties became the primary mechanism for building trust, and they were exactly as tedious as they sound. You would meet strangers, verify government-issued IDs, exchange key fingerprints, and sign each other’s keys. The chain of signatures grew, but nobody maintained it. Keys expired, people changed email addresses, the graph decayed. The fundamental fact about PGP key signatures, which most people misunderstood, was that a chain of signatures from you to a stranger had no actual trust value unless you personally knew someone in the chain.
PGP’s WoT required explicit action. You had to go out of your way to declare trust, and almost nobody did. A 2018 analysis found that only 0.3% of PGP keys on public servers had any signatures at all. Nostr’s version is implicit. Following someone is cheap and already part of how people use the protocol. The signal is abundant but noisy. You get a lot of data. You just can’t be sure what it means.
The comparison to Bluesky is also instructive. Bluesky released labelers in March 2024, third-party moderation services that tag content or accounts with labels. Clients choose which labelers to subscribe to, and users choose how to interpret those labels. It is stackable moderation: you can subscribe to multiple labelers and they layer on top of each other. The labels are transparent and the subscription is private.
This is closer to what NIP-85 Trusted Assertions are trying to build for Nostr. The structural similarity is hard to ignore: both systems let users choose third-party services that compute judgments, both publish those judgments as protocol-native data, both leave the final filtering decision to the client. The difference is that Bluesky labelers have a standard specification and consistent client support. NIP-85 is a draft that a few implementations support and most clients have never heard of.
The problems nobody wants to talk about
Here is where I push back on my own argument that all this building is progress.
The portability problem is real. My WoT score, computed from my follow graph, is not your WoT score. That is the point, obviously. But it means that every client, every relay, and every service runs its own calculation and there is no way to compare them. Vertex returns a Personalized PageRank number. Nostr.Band returns a Trust Rank. Amethyst counts reports from people you follow. Coracle uses social distance. These are not different estimates of the same thing. They are measurements of different things that all get called “trust.”
The cold start problem is worse. If your WoT score depends on your follow graph, a brand-new user with zero follows is invisible. They have no trust signals at all. They will not appear in WoT-filtered feeds. Their replies will be buried. Their DMs will go to the spam folder. The protocol’s answer to “how do new users get discovered” is basically “get followed by someone who is already trusted,” which is circular.
Nstart, an onboarding wizard by dtonon, tries to address this by suggesting follow lists during signup. It helps. But it is a patch over a structural gap.
In February 2025, Nostr was getting hit with roughly 500,000 daily spam messages. Ads for spam services, scams, NSFW content. Some relays installed paywalls, which helped them, but spammers just shifted to open relays. Open relays tried blocking IPs and pubkeys. Spammers adapted. The spam problem is harder than Twitter’s version because spammers reply to popular accounts for reach, and there is no central authority to ban them.
WoT filtering is the protocol’s best answer to spam. It also creates filter bubbles. If you only see content from people within two hops of your social graph, you will never encounter genuinely new voices. The follow graph on Nostr skews heavily toward #bitcoin and #lightning communities. A #security-minded user who found Nostr through the cypherpunk side sees a different graph than someone who came in through #bitcoin Twitter. A WoT that starts from that graph will reinforce that skew. I wrote in my relays article that preferential attachment gives us a small number of dominant relays. The same dynamic applies to social graphs. Popular accounts get more trust, which gets them more visibility, which gets them more followers. The rich get richer, and the decentralized protocol starts to feel a lot like the centralized one it was supposed to replace.
fiatjaf, who created the protocol, has argued that client-side WoT is not even the right place to solve spam. His position: if clients download content before filtering it, they will end up downloading gigabytes of spam in a large network. The filtering should happen at the relay level, with users choosing their filtering stance by picking relays. This is a reasonable argument. It also means that the relay operators, not the users, ultimately decide whose WoT algorithm matters.
And then there’s gaming. Creating a keypair is free. An attacker can generate thousands of identities, follow each other, and inflate their position in the social graph. Personalized PageRank is more resistant to this than simple follow counting because the fake nodes need genuine inbound trust to propagate. But “more resistant” is not “immune.” The bitvora wot-relay has a configuration flag called IGNORE_FOLLOWS_LIST for “comma-separated pubkeys who follow too many bots and ruin the WoT.” That flag exists because the problem is real.
What would actually have to change
I do not have a solution. I have observations about what would need to change, and I’m not confident all of them will.
The protocol needs a standard way to express trust beyond the follow graph. NIP-85 is the closest candidate, but it is still in draft. The debate between Trusted Assertions and WoT DVMs has been going for over a year. Vitor Pamplona says both are needed. Vertex says NIP-85 is too limiting for real-time personalized ranking. The WoT-a-thon is betting on NIP-85 as the interoperability layer. No consensus yet.
WoT scores also need to incorporate more than follows. An account that has received zaps from fifty different people in your network over six months is a different proposition from an account that was followed by fifty bots yesterday. Zaps, replies, mutes, reports – the signals are there. Nobody is combining them in a standardized way.
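What combining signals could look like, as a sketch. Every field name and weight here is an assumption invented for illustration; a real system would also need recency decay, which is exactly what separates the six-month zap history from yesterday’s bot follows:

```python
from dataclasses import dataclass

@dataclass
class Signals:
    """Per-pubkey signals as seen from one user's corner of the network.
    All fields count *distinct* people inside that user's graph."""
    zappers_in_network: int = 0   # people who zapped the account
    replies_from_network: int = 0
    follows_from_network: int = 0
    reports_from_network: int = 0  # explicit negative signals
    mutes_from_network: int = 0

def combined_score(s: Signals) -> float:
    # Assumed weights: zaps count most (they cost money), follows least
    # (they cost nothing); explicit negatives subtract hard.
    positive = (3.0 * s.zappers_in_network
                + 1.5 * s.replies_from_network
                + 1.0 * s.follows_from_network)
    negative = 5.0 * s.reports_from_network + 4.0 * s.mutes_from_network
    return max(0.0, positive - negative)
```

Under these assumed weights, fifty zappers outscore fifty bought follows three to one, and a handful of reports from people you actually follow can zero an account out entirely.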
And someone has to solve the cold start problem without centralizing the onboarding. Bluesky does it with a default “discover” feed that is centrally curated. That works, but it is exactly the kind of centralization that Nostr exists to avoid. Nstart and starter packs (Kind 39089) are partial answers. They help new users bootstrap a follow graph. They do not help those new users become visible to the rest of the network.
I wrote in my mass exodus article that the protocol relies on web-of-trust filtering and client-side blocking, and that it will not scale gracefully. I still think that is true. WoT works today because the network is small enough that two or three hops from any active user covers most of the interesting content. At ten times the current size, the graph gets sparser relative to total population, the trust signals get noisier, and the computational cost of running PageRank across millions of pubkeys becomes non-trivial.
The hard part is not the algorithm
Everyone building WoT on Nostr is solving the same problem with different math and arriving at different answers. That is fine in a research phase. It is not fine as a permanent state, because it means “trust” is not portable, not comparable, and not composable. A score from Vertex tells you nothing about a score from Nostr.Band. A filtering decision in Amethyst has no relationship to a filtering decision in Coracle. The algorithms exist. PageRank is from 1998. EigenTrust was published in 2003. The math is not new. The hard part is getting a protocol with no governance and no standards body to agree on what “trust” even means.
The honest answer is I do not know if that agreement will happen. What I know is that the current state, where every client and every relay runs its own private definition of trust, is not #decentralization. It is fragmentation. And if the spam gets worse before the standards get better, the clients with the best proprietary filtering will win, and we will have rebuilt the thing we were trying to escape.
#nostr #decentralization #security #bitcoin
Personally, I don’t see a problem with a “different WoT for every user” per se, nor with the fact that people follow others whose ideas they don’t entirely share; the goal is simply to filter out spammers and bots, not to create an “affinity score”. The WoT should just say “someone with this score is a human and doesn’t spam the people who follow them”, nothing more. Obviously, my definition of spam might not coincide with someone else’s: I might not consider posting a daily press review spam, but for someone who follows me only for technical posts, it is. It makes sense to look for a separate personal scoring system, one that depends on the human reader rather than on the client software; it just needs to be kept separate from the general “person who doesn’t spam” WoT. Emacs/Gnus scoring has done this for decades, for instance: a way to filter large volumes of posts by keywords, by the reputation of people you follow, by excluding other keywords, and so on. That can’t be plain WoT, but it can use WoT as one base among other filters…