Karim Jedda

Engineering management after the cost of code collapsed

Karim Jedda — Mon, 20 Jul 2026 09:52:44 GMT

I have been a director of engineering for a bit over three years now, and I still hear and read what I call the "old rules" repeated over and over: a director should not spend time coding, good work takes time, protect the team from the business, get consensus before you commit, etc.

For a while I thought the people repeating these lines were behind. Then we introduced LLMs in my org and the cost of producing code dropped, and I started checking each rule against the assumption underneath it. The surprising realization was that about half of the old rules were resting on assumptions that broke and the other half were resting on assumptions that did not, and a few of those matter more now than they did before.

What follows is a cleaned-up version of notes I accumulated over the past year. Gemini 4 helped with the editing.

What we actually know

The cost of producing plausible code has collapsed and it is not going back. Almost every claim beyond that is either unproven or wrong.

That AI tooling has made engineering orgs dramatically faster: unproven.
That code review, documentation, and onboarding are obsolete: wrong.
That you can run the same roadmap with half the people: a bet, not a fact.

If you rebuild your management practices on the narrow claim, you will be right. If you rebuild them on the broad claims you are gambling with other people's careers and calling it a conclusion.

Focus on auditing assumptions

Every management practice rests on a certain assumption. Velocity tracking rests on output being a usable proxy for effort. Six month onboarding rests on syntax being slow to learn. Consensus driven architecture rests on change being expensive. Headcount planning rests on output scaling with people, and so on.

The question for each practice is not how old it is but rather what the practice actually rests on.

If a practice rests on the cost of writing code, put it under review because that cost moved. If it rests on how humans coordinate, build trust, allocate attention, or verify correctness, then nothing about it changed, no matter how dated the ritual feels.

This sounds obvious but I think many are sorting by feel: whatever seems modern stays, whatever seems old goes. That produces teams that abandoned useful friction and kept useless process, because the age of a practice and the validity of a practice are unrelated variables.

The evidence is smaller than the noise

Be suspicious of anyone, including your own team, who reports large speedups and only that. The gains I'm familiar with show up clearly in greenfield work, boilerplate, and unfamiliar territory. They fade or invert in deep work on systems the engineer already understands. I'm very much looking forward to data and studies done after Q4 2025, when a new breed of models launched that completely eclipsed the capabilities of the ones older reports and research were based on.

However, the gap between felt speed and measured speed is itself a management problem. If your engineers feel faster and ship the same amount with more defects, you will staff wrong, plan wrong, and set expectations with the business that you cannot meet. The first job in an AI-adopting org is instrumentation honest enough to tell you whether you have accelerated at all.

Weak proxies for a cheap thing

Velocity, pull request counts, and tickets closed were always imperfect. They survived because the thing they approximated, i.e. the effort of writing code, was genuinely scarce, so the noise stayed within tolerable bounds.

Now the proxied thing is cheap. That leaves the metrics not merely imperfect but actively misleading, because the cheapest way to raise them is to generate volume, and volume is the one thing your organization no longer lacks.

AI-specific metrics solve the wrong problem. I think acceptance rates and prompt counts are the same mistake in a new form. The durable move is older and harder: measure outcomes for the business and the health of the system, and treat code volume as a cost to be justified rather than output to be praised. Good engineers said this before LLMs. It was true then. It is enforceable now in a way it was not, because nobody can argue that writing more code was the hard part.

"Right" still takes time (for now)

The rule that good work takes time splits cleanly in two.

Plumbing time collapsed. Scaffolding a service, generating tests, translating between frameworks, writing the first draft of a migration: all of this is fast now, and any timeline built on those costs deserves compression.

Correctness time splits in two

AI systems now check and correct code faster than any human reviewer, and pretending otherwise costs credibility. On one hand, mechanical verification is collapsing. This covers anything where correct can be expressed as a machine-checkable artifact: types, tests, contracts, lint rules, invariants, canary metrics. Agents run the test loop, read the failure, fix the diff, and run it again at a speed no reviewer matches. If your correctness lives in this layer, your checking time is genuinely falling, and it will keep falling.

But notice what makes this layer fast. It is fast because someone already wrote down what correct means, in a form a machine can evaluate. The specification did the work and the checker reads it.
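To make "wrote down what correct means" concrete, here is a minimal sketch of a business rule expressed as a check a machine can run. The refund rule, the `Refund` type, and the function names are hypothetical illustrations, not taken from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Refund:
    charge_cents: int  # amount originally charged (hypothetical field)
    refund_cents: int  # amount being refunded (hypothetical field)


def violates_policy(refund: Refund) -> bool:
    """The written-down rule: a refund is positive and never exceeds the charge."""
    return not (0 < refund.refund_cents <= refund.charge_cents)


def check_all(refunds):
    """Return every refund that breaks the rule; an empty list means the check passes."""
    return [r for r in refunds if violates_policy(r)]


generated = [Refund(1000, 400), Refund(1000, 1500)]
failures = check_all(generated)
```

Once the rule exists in this form, an agent can run it in a loop against generated code. Whether "never exceeds the charge" is the policy the business actually needs is still a human question.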

Semantic verification is a different matter. Does the code implement the policy the business actually needs? Does this trade-off match your regulatory exposure? Here, correct lives in human heads and institutional history. AI checking AI has a structural problem: the checker shares training data, biases, and blind spots with the generator. Both layers fail in the same places for the same reasons. Self-review catches the typo. It does not catch the shared misunderstanding.

I believe three consequences follow, and they are the core of this post:

Unit cost falls, total workload rises. Cheap checking invites more generation, and more generation demands more checking. The verification workload grows with volume even as each individual check gets cheaper. Net calendar time is ambiguous, and the incident profile shifts: fewer dumb errors, more systemic ones, because high-volume plausible output now passes high-volume plausible review.
The boundary is a strategic variable. How much of your correctness is machine-checkable is not fixed. It is a function of your specifications, contracts, and invariants. Teams with strong specs get the full benefit of cheap checking. Teams with weak specs get generated code reviewed by the same machine that generated it. Investing in machine-checkable correctness is now among the highest-leverage infrastructure work an org can fund, because it sets how much of this wave you can actually use.
The slowest part of verification was never the checking but the accountability. Someone signs and someone absorbs the consequence of being wrong: the incident review, the regulator, the customer. Sign-off time does not compress, because it is not information processing. It is risk acceptance, and legal and trust systems assign that to people.

So verification still sets throughput. What moved is the location of the constraint: from checking speed to specification quality, and to the willingness of a specific human to own the result.

The junior pipeline is an unsolved problem

Nobody knows how to train engineers for this environment.

The judgment you want in a senior engineer was historically built by doing the work that AI now absorbs: fixing small bugs, writing boilerplate, getting stuck and then unstuck. These were not just tasks but the practice that produced judgment. If the machine takes the practice, the pipeline that produces seniors breaks, and it breaks on a delay, so you will not notice for three to five years.

There are plausible responses. Structured review of generated code, deliberate unassisted exercises, rotations through testing and verification work, earlier exposure to real systems under close senior oversight. I am running versions of some of these. I cannot tell you they work, because the outcome variable is the quality of a senior engineer half a decade from now.

What I can tell you is that anyone who claims to have solved this, whether vendor, essayist, or conference speaker, is selling something. Treat the pipeline as an open problem you personally own. It has the longest delay between action and evidence, and the fewest chances to correct course.

My take on the old rules

"A director should not code." The director who dabbles, reviews pull requests to feel useful, and becomes a bottleneck is real. So is the director whose mental model of the work is five years stale, who cannot tell the difference between a team that is genuinely faster and a team that is generating confident, wrong output at volume (slop cannons). The resolution is calibration. You do not need to ship. You need enough direct contact with the tools and the output that you cannot be fooled in either direction, by the hype or by the dismissal.

"Shield the team from the business." The assumption underneath this one is that attention is finite and context switching is expensive. That assumption is intact. What changed is the cost of starving the team of context. Engineers prompting AI tools without business context just produce fluent, plausible, wrong work, at scale. The revision is not to flood everyone with everything but to stop filtering by default and start selecting deliberately: which context, to whom, at what level of detail.

"We need consensus before we commit." Consensus was always about commitment and coordination, and that need survives cheap pivots. What cheap pivots change is which decisions need consensus at all. Reversible decisions, or two-way doors, should be made by the smallest group possible, quickly, because a wrong reversible call is now cheap to undo. Irreversible decisions still deserve the slow process. The actual skill is classification, and most organizations misclassify constantly, treating reversible technical choices as permanent and permanent organizational choices as casual.

"We need more headcount." The unit economics of output changed, so every request deserves a harder question than it got three years ago: what part of this work is judgment, and what part is production we keep hiring humans to do? But do not overcorrect. Adding people to a late project still makes it later. Coordination cost, onboarding drag, and communication overhead did not change with the price of syntax. Scrutinize headcount because output per person moved, not because people stopped being the expensive part.

Why the tropes persist

Three years in, here is what I believe about persistence. Practices survive for reasons, and the reasons are always mixed: some obsolete, some still valid, some political. When you hear an old rule repeated, you are usually hearing a person defend the valid part with the wrong argument or defend the obsolete part with the argument that used to work.

Treating persistence as stupidity is the failure mode of every management fashion, and the AI fashion is not exempt. The leaders who will look worst in five years are not the cautious ones but the ones who replaced thinking with a new set of slogans, even accurate sounding slogans, and ran their orgs on claims that were still unmeasured.

The job is sorting

The temptation, when a technology this large arrives, is to pick a posture: burn the old playbook or defend it. Both postures are laziness. The playbook was never a single object. It was a hundred pages, some about the cost of typing and some about the nature of people, bound together so tightly that we forgot they were separable.

Do not let anyone, including a confident essay, including this one, convince you those were ever the same page.

Will the machines do the sorting

I was told recently that AIs will soon do the management too, including the sorting. I thought about it over the weekend and I believe it is partly right.

Management has an information-routing function: aggregating status, tracking progress, translating updates into dashboards, forecasting schedules, collating performance data. A large share of what a management layer does daily is moving information between formats and people. LLMs are excellent at this, and the value of this is going to zero. If your management layer earns its keep by summarizing Jira, then yeah, that's over.

Then there is the judgment function: hiring, firing, promotion, deciding which rule applies in this situation with these people, owning a bad call. This has the same structure as verification. Someone checks the output against reality, and someone absorbs the consequence.

We're back to the generation argument mentioned above, but applied to management work: management output becomes abundant, therefore cheap. But the thesis of everything above is that when generation is abundant, verification is the constraint. Machine-generated management still needs a human to verify it against the org's actual behavior and to sign the result. So this is perhaps not the end of the manager but rather the manager's demotion to editor and owner. It is the exact same shift the individual contributor is going through.

There is one more problem with delegating the sorting. Sorting requires a model of how your org actually behaves: trust relationships, hallway knowledge, the consequences of past decisions. Almost none of this is written down. The written record of an org is a small, polished fraction of the org, and a model sorts on the record. Worse, what the model encodes from its training is the industry's average judgment. An LLM will sort your playbook roughly the way the median org would. If your strategy is to be the median org, that is fine. Differentiation lives exactly in the decisions you refuse to delegate.

Here is my honest concession: the sorting as an intellectual exercise is automatable. I used AI to pressure-test parts of this essay, and its first-pass audit was decent. What is not automatable is the political and moral work.

Where this goes: spans of control widen, layers compress, the information-routing tier of management is genuinely at risk, and titles will persist longer than functions. The management work that survives is the work that cannot be written down, cannot be averaged, and cannot be signed by anyone else.

What survives

The residual of engineering work is specification and ownership. The residual of management work is judgment and ownership. Same shape, two levels.

At every level of the org, the work that survives is the work someone has to sign.

Let's push the extrapolation to its limit to really drive it home. Imagine an org where agents write the code, run the checks, route the status, schedule the work, and draft the plans. Humans set direction, define what correct means, and sign. Everything between the signature and the shipped result is machinery. Now run the timeline backwards: imagine this org came first, and someone proposed the one you actually run. We will hire hundreds of expensive people to produce text by hand, at typing speed. We will arrange them in layers, where the job of each layer is to summarize the layer below for the layer above. We will synchronize their calendars in rooms so they can tell each other what already happened. We will measure their worth by how much they emit. Nobody would fund this proposal. It is slow, it loses information at every handoff, and it spends the scarcest resource in the building on work the machinery already does.

The org you run was never designed; it accumulated. Every role, ritual, and layer exists because something used to be expensive: typing, routing, checking, remembering. The prices moved. The org chart did not. In the agentic limit, the chart stops recording who produces and starts recording who signs. Headcount stops measuring capacity and starts measuring how much accountability you can afford. The orgs that get there will look small, quiet, and mostly empty: a short list of names attached to a long list of decisions, and nothing else left to manage.

Products for Humans

Karim Jedda — Thu, 16 Jul 2026 10:10:39 GMT

This talk was given at the Web3 Summit 2026 in Berlin on 19 June 2026.

Watch it

If you were there in the room, you know I was real and that I gave that talk. If you're reading this, you have no such guarantee and that's more or less the whole point of the talk. So I did the obvious thing: this whole thing carries a signed receipt (down below). If you ever see a video of me saying something and there's no receipt attached, treat that as a reason to be suspicious, not as proof of anything. More on why that distinction matters later.

Also, yes, the demo glitched at the start. It always does. It came back.

Slides

The signed receipt

While I was talking, my speech was transcribed in real time using Whisper, and block hashes from the network were embedded into that transcript as it went. At the end I signed the whole thing with my Polkadot Mobile key. The receipt below tells you three things: this talk happened that morning, these were the exact words that were said, and one verified unique human, me, under the username spacecat, gave it. You can find the JSON here: https://blog.jedda.eu/bafybeiaagh44lqgz64g5ccnde454yxeqgrspl32klxafdfrj3jz55ag35i/artifact.json

Signed JSON receipt of the "Products for Humans" talk
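As a rough illustration of the sealing idea (not the actual proof-of-talk code, which lives in the repository linked below), this sketch folds transcript segments and block hashes into one running SHA-256 digest. The entry layout, the name `seal_transcript`, and the sample hashes are hypothetical. The real artifact format and the signing step with the Polkadot Mobile key are not reproduced here.

```python
import hashlib


def seal_transcript(entries):
    """Fold (block_hash_hex, text) pairs into a single running digest.

    Each step hashes the previous digest together with the block hash and the
    transcribed text, so changing any earlier entry changes the final value.
    Block hashes are assumed to be fixed-length hex strings.
    """
    digest = b"\x00" * 32  # arbitrary starting value for the chain (assumption)
    for block_hash_hex, text in entries:
        h = hashlib.sha256()
        h.update(digest)
        h.update(bytes.fromhex(block_hash_hex))
        h.update(text.encode("utf-8"))
        digest = h.digest()
    return digest.hex()


# Made-up sample data: placeholder block hashes and transcript lines.
entries = [
    ("aa" * 32, "Good morning everyone."),
    ("bb" * 32, "The internet is losing its humans."),
]
seal = seal_transcript(entries)
```

In a scheme like this, the final digest is what would be signed: the block hashes tie the words to a point in time, and the signature ties them to one key.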

During the talk, the audience wasn't shown the raw JSON. Instead, they could use their Polkadot Mobile app (released on an ephemeral Summit network) to scan the QR code generated to "seal" the talk, display the proof, and verify it independently.

Scanning the QR code opens the proofoftalk.dot SPA, which verifies the CID bundle representing the sealed and signed talk content.

Result of scanning the QR code using the Polkadot Mobile app. Notice the small transcription error by Whisper 😸

Note: Scanning the QR code will work once the Polkadot Mobile app is live and available to everyone.

If you'd like to understand how the code was set up, you can inspect it and try it out here: https://code.jedda.eu/proof-of-talk/doc/tip/README.md

The recap of the talk

I'll keep this to the shape of the talk. Three parts: how the internet is losing its humans, the thing we built at Parity to do something about it, and what I'm hoping you'll go build with it.

The internet is losing its humans

I spent a big part of my career on big data, building machines that collect data, turn it into datasets, turn those into segments about people, and feed all of that into recommendation algorithms. I was good at it. But recommendations were never the end of the line. All that data now trains language models, voice models, image models. ChatGPT, Midjourney and the like. These are genuine miracles, and they write and speak like you and me because they were trained on data made by you and me.

So the machine that collects human data and the machine that imitates humans are, essentially, the same machine. Data comes in, models go out. And that machine now generates content faster than we can verify it. We're at a tipping point where the main inhabitant of the web stops being a human and starts being a bot, not the dumb spam bots of ten years ago, but agents that hold real conversations, wear faces of people who may or may not exist, and pass every test we've built to catch them.

Which leaves us with one question that I think we'll be asking more and more: can you trust what you're seeing on screen? Is there a human behind these words?

The centralized platforms won't save us here. Not because they're evil but because of what they are. I think of a platform as a database tuned in real time by algorithms to drive engagement. To that database, a bot and a human look identical, and it'll happily show you either one if it keeps you scrolling. And you can't ask a company whose stock price rides on active-user counts to be your honest counter of humans.

Every trust signal we built for this is breaking. Check marks, follower counts, star ratings, and so on: you can buy all of it for a few dollars in a few minutes. My colleague Ian told me about a restaurant in the UK that got very popular and turned out not to exist at all: someone invented it, bought the ratings and the followers, and the whole thing was an experiment to show how fragile our signals are.

To be clear, the problem is not that a machine wrote something. I use AI every day and full disclosure, I used it to prepare this talk, I use it to code. The problem is when a handful of actors can make it look like there are many, and we read "many people agree" as something we can trust. There's the finance worker in Hong Kong who got asked by his CFO to wire 25 million. Protocol said he needed visual confirmation, so they got on a video call, everyone confirmed, and he sent it. Every single person on that call was a deepfake.

Here's the hopeful part, though: we've beaten a version of this before. Email was a miracle too, and then spam took over something like 90% of the traffic. Filtering the content of messages wasn't enough. What turned the tide was origin auditing, meaning that we stopped only asking "what does this message say" and started asking "who sent this, and can they cheaply send a flood of them", plus reputation on top. That made email usable again. Spam never vanished. It just stopped being free.

That's the lesson I keep coming back to: you can't make truth unfakeable. That will always be possible. What you can do is make distributing fakes expensive again.

What we built

Email is one system, though. What about something that works across all of them? At Parity we built the tech for cryptographic proof of personhood. We call it Humanity. It's a way for a screen to ask "is there exactly one unique human here?" and a way for another human to check that independently, without trusting a third party.

I think of it as authentication infrastructure, i.e. a way to authenticate someone without identifying them. The analogy is Apple Pay: when you tap, the merchant knows the payment went through, but not your name, not your bank account, nothing. Same idea here. Tap, and the machine knows you're one authenticated human. Not who you are. One human, not identified. That difference is the entire game.

The system is flexible. It supports multiple decentralized identity modules, DIMs, for different use cases. At the Web3 Summit, we played DIM 2: a little social game where other people verify whether you're a real human, which produces a mathematical proof of individuality tied to a key.

Now, the record of which key belongs to a unique human has to live somewhere. A central database of verified humans is far too dangerous for a single company to own. This is where decentralization earns its place: not as a synonym for freedom, but as a strict security requirement. Humanity only means something if no central authority can quietly print more humans. So it has to be built on open, neutral protocols that anyone can plug into, like HTTP. We built it on Polkadot, where no single entity can shut it off, raise the gates, or change the rules on its own.

One aside, even though I promised this wasn't a UX talk: technology that works can still be shipped the wrong way. We used to onboard people with a seed phrase: a string of words holding the secret to your key. Powerful, and completely wrong for a normal person. Store it safely, never lose it, and if you do, it's all over. The new app hides that. Good tech only matters if it's delivered in a way people can actually use correctly.

What I want you to build

Which brings me to the sentence everyone's been saying for a decade: people don't use technology, they use products. I said it too, and I believe it. But I think two words are missing. It should be products for humans.

I mean it in two ways. Products for humans to use: no seed phrases, no needless complexity. And products for humans to be human, where a person can prove they're a person and still stay private. Most of the industry has the first part and is getting the second part badly wrong: asking people to send videos of themselves to centralized servers to register for things. The companies that get both right, technology wrapped well and built for humans, will win the next twenty years.

Web2 is becoming the bot web. It's carved into silos each platform controls, with no shared authentication layer to bind them. We can make Web3 the human web instead: a layer that proves the internet is still made of humans, and that it was made for them.

The cryptography works. You can use it today. The privacy proofs exist. What doesn't exist yet is the products. So that's the ask: build them for humans, build them for outcomes that matter, and build them on ground no one owns... because that's what keeps the whole thing from being captured by a political power later. The stake isn't a market. It's whether that opening question even has an answer in the internet our kids inherit.

FAQ

I've started adding one of these to my talks. The press-release bit up top is the version I'd want on the record. The FAQ underneath is the harder half... the questions I'd ask if I were watching the video trying to poke holes. I'd rather answer them here than pretend they don't exist.

I scan the QR code but I see nothing

Scanning the QR code should work once the Polkadot Mobile app is live, as it leverages it. Meanwhile, I've saved both the signed receipt https://blog.jedda.eu/bafybeiaagh44lqgz64g5ccnde454yxeqgrspl32klxafdfrj3jz55ag35i/artifact.json as well as the code https://code.jedda.eu/proof-of-talk/doc/tip/README.md which should let you independently verify that the signature is correct.

Does the signed receipt prove that what I said is true?

No, and it's not trying to. A receipt proves attribution: one accountable, verified human stood behind these exact words at this time. That's a different thing from truth. A real person can sign a real lie, and the receipt won't stop them. What it removes is the cheap version of the attack: the flood of fabricated voices that only looks like agreement. It doesn't hand you truth. It hands you the thing truth has to stand on: attribution you can't fake for free.

You said "no receipt means it's AI." But almost everything real is unsigned too, doesn't that break the claim?

Fair hit, and I want to be precise about it, because on stage I said it as a punchline. Absence of a receipt is not a law of physics that says "this is fake". Most genuine content in the world will never be signed, and that's fine. The claim is narrower. For the things that actually matter (a talk, an official statement, a video attributed to a named person), signing can become the default, and once it's the default, its absence is a reason to lower your confidence, not to conclude forgery. A missing receipt should move you toward "I can't confirm this" and not "this is fake". I'd rather state it that way than oversell it.

Cybergov: What I learned running three AI agents as Blockchain governance delegates

Karim Jedda — Sat, 17 Jan 2026 13:44:06 GMT

Quick primer

You don't need to be a blockchain enthusiast to follow this post. Here's the minimum context:

The setup: Polkadot is a blockchain network with a communal treasury currently worth a few million dollars. Anyone can submit a proposal requesting funds ("give us $50k to build a developer tool"), and token holders vote on whether to approve it. It's like a decentralized grants program with no central committee.

The problem: There are a lot of proposals. Reviewing them all is a full-time job. Most token holders don't have time, so they delegate their voting power to "delegates", people (or, in this case, AI systems) who vote on their behalf.

Subsquare (https://polkadot.subsquare.io/): Think of it as the governance dashboard. It's where proposals are listed, discussed, and voted on. When my AI agents voted, their reasoning appeared as comments on Subsquare for everyone to see.

Treasury governance is a perfect test-bed for AI decision-making because:

Real money is at stake (not a toy problem)
Proposals are adversarial (people try to game the system)
Everything is public and recorded permanently
The community expects accountability

If you've ever thought "we should use AI to help make complex decisions, but how do we make sure it's not a black box?" that's exactly what this experiment tried to answer.

Now, back to the story.

The delegation has now ended and it is time to reflect on the Cybergov V0 experiment. If you're interested in reading about how this started, check out these links:

Initial idea: https://forum.polkadot.network/t/decentralized-voices-cohort-5-light-track-karim-cybergov/14254
Technical breakdown (how it works): https://forum.polkadot.network/t/cybergov-v0-automating-trust-verifiable-llm-governance-on-polkadot/14796

For three weeks in September 2025, three AI agents named Balthazar, Melchior, and Caspar voted on blockchain treasury proposals on my behalf. They analyzed 19 proposals on Polkadot, 3 on Kusama, and a handful on the Paseo test network. Here's what happened and what it might mean for anyone building AI systems that need to be trusted.

Why this matters

Before diving in: you don't need to care about Polkadot to find this interesting. The core problem is universal: how do you build AI systems that make consequential decisions while remaining auditable, transparent, and resistant to manipulation?

Blockchain governance was my test case because:

Decisions involve real money (treasury funds)
There's an adversarial environment (people will try to game the system)
Everything happens on-chain, creating a permanent record
The community expects transparency from delegates

But the lessons apply anywhere you're deploying AI for high-stakes decisions: content moderation, loan approvals, hiring recommendations, medical triage. The question is always the same: can we trust this thing, and can we verify that trust?

The experiment

Cybergov V0 was an experiment to see if LLMs could provide transparent, reproducible governance decisions. Instead of me personally reviewing dozens of treasury proposals, I built a system where three AI agents would independently analyze each proposal and collectively decide how to vote.

The names come from the MAGI supercomputers in Neon Genesis Evangelion: three systems that must reach consensus to make critical decisions. Each MAGI had a distinct personality based on aspects of their creator. I did the same thing.

The Three Agents

Balthazar (GPT-5) was the strategist. His job was evaluating whether proposals strengthened Polkadot's competitive position against other blockchains. Does this create sustainable advantage, or just temporary hype?
Melchior (Gemini 2.5 Pro) focused on ecosystem growth and ROI. His core question: does this activity actually translate into measurable value, or are we just subsidizing user acquisition that evaporates when the money runs out?
Caspar (Claude Sonnet 4) was the risk analyst, treating every treasury allocation as an investment rather than a grant. He flagged moral hazard, questioned multi-year commitments, and demanded accountability mechanisms.

Each agent received the exact same proposal text but evaluated it through their distinct lens.

The numbers

Over three weeks, the system voted on 19 treasury proposals:

Decision	Count
Abstain	10 (53%)
Aye	8 (42%)
Nay	1 (5%)

Only 4 votes were unanimous (21%). The rest involved disagreement between agents.

Visuals for proposal 1750

Distinct personalities

The per-agent breakdown reveals how the personas actually influenced behavior:

Agent	Aye	Nay	Abstain	Personality
Melchior	15	2	2	Growth-focused, most bullish
Balthazar	9	2	8	Strategic, middle ground
Caspar	3	6	10	Risk-focused, most conservative

Melchior wanted to fund almost everything. Caspar wanted to fund almost nothing. Balthazar was the swing vote. This directly reflected how I'd written their system prompts.

One interesting case: Proposal 1703 had Balthazar voting Nay while Caspar and Melchior voted Aye. The strategist saw competitive risk; the risk analyst (surprisingly) saw an acceptable investment. The growth analyst saw opportunity. Final result: Aye. The system worked as designed: genuine disagreement led to a deliberated outcome. After a lengthy discussion with people reaching out and commenting, the truth table for the MAGIs was updated to reflect the following:

LLM Agent 1	LLM Agent 2	LLM Agent 3	Vote Outcome
AYE	AYE	AYE	AYE
AYE	AYE	ABSTAIN	AYE
NAY	NAY	NAY	NAY
NAY	NAY	ABSTAIN	NAY
AYE	AYE	NAY	ABSTAIN
AYE	NAY	ABSTAIN	ABSTAIN
AYE	NAY	NAY	ABSTAIN
AYE	ABSTAIN	ABSTAIN	ABSTAIN
NAY	ABSTAIN	ABSTAIN	ABSTAIN
ABSTAIN	ABSTAIN	ABSTAIN	ABSTAIN

This means that retroactively, the decision should have been ABSTAIN.

The voting logic

The truth table was then deliberately made more conservative:

Unanimous agreement → Cast that vote
Two agree, one abstains → Cast the majority vote
Any genuine disagreement → Abstain

If Balthazar saw strategic value but Caspar flagged unacceptable risk, the system abstained. The philosophy was: when in doubt, don't spend someone else's money.
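The three rules above fit in a few lines. This is a minimal sketch of the updated truth table, not the code CyberGov actually ran; the function name `aggregate` and the string constants are assumptions made for illustration.

```python
from collections import Counter

AYE, NAY, ABSTAIN = "AYE", "NAY", "ABSTAIN"

def aggregate(votes):
    """Combine three agent votes using the conservative truth table."""
    if len(votes) != 3 or any(v not in (AYE, NAY, ABSTAIN) for v in votes):
        raise ValueError("expected exactly three votes of AYE, NAY or ABSTAIN")
    counts = Counter(votes)
    for side in (AYE, NAY):
        # Unanimous, or two agreeing while the third abstains.
        if counts[side] == 3 or (counts[side] == 2 and counts[ABSTAIN] == 1):
            return side
    # Any AYE/NAY split, or two or more abstentions, resolves to ABSTAIN.
    return ABSTAIN

# Proposal 1703: Balthazar NAY, Melchior AYE, Caspar AYE.
print(aggregate([NAY, AYE, AYE]))  # ABSTAIN
```

Run against the Proposal 1703 votes, it returns ABSTAIN, which matches the retroactive outcome described above.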

The 53% abstention rate wasn't a bug but the system being appropriately uncertain. Most proposals had something to like and something to worry about.

What worked

Radical Transparency

Every vote came with a manifest file containing SHA256 hashes of all inputs and outputs (example), links to the GitHub Actions run where inference happened, and the exact proposal text the agents saw. The hash was submitted on-chain alongside each vote.

Anyone could:

Download the manifest
See exactly what text the agents received
Verify the hash matched what was recorded on-chain
Re-run the pipeline to check reproducibility
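The verification steps above can be sketched roughly as follows. The manifest layout, the file names, and the proposal text are all made up for illustration; the real CyberGov manifest format may differ, and the "on-chain" digest here is just a local variable standing in for the recorded hash.

```python
import hashlib
import json

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_manifest(inputs: dict, outputs: dict) -> dict:
    """Record a SHA256 digest for every input and output artifact."""
    return {
        "inputs": {name: sha256_hex(blob) for name, blob in inputs.items()},
        "outputs": {name: sha256_hex(blob) for name, blob in outputs.items()},
    }

def manifest_digest(manifest: dict) -> str:
    """Hash a canonical JSON encoding so the digest is reproducible."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return sha256_hex(canonical.encode("utf-8"))

def verify(manifest: dict, recorded_digest: str, inputs: dict, outputs: dict) -> bool:
    """Check the manifest against the recorded digest and the raw artifacts."""
    if manifest_digest(manifest) != recorded_digest:
        return False
    return manifest == build_manifest(inputs, outputs)

# Hypothetical artifacts for one vote.
inputs = {"proposal.txt": b"Fund project X for 10,000 DOT"}
outputs = {"balthazar.json": b'{"vote": "AYE"}'}
manifest = build_manifest(inputs, outputs)
digest = manifest_digest(manifest)  # stands in for the hash recorded on-chain

print(verify(manifest, digest, inputs, outputs))    # True
tampered = {"proposal.txt": b"Fund project X for 99,000 DOT"}
print(verify(manifest, digest, tampered, outputs))  # False
```

Serializing with sorted keys and fixed separators matters here: without a canonical encoding, two honest verifiers could produce different digests for the same manifest.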

This is table stakes for trustworthy AI. If you can't show your work, you shouldn't expect trust.

Consistent analysis

The agents never had a bad day. They evaluated proposal #1757 with the same rigor as proposal #1701. They caught prompt injection attempts (during testing). They flagged missing budget breakdowns and vague milestones consistently.

Each agent produced structured output:

Neutral critical analysis with scores (Feasibility/10, Value-for-Money/10, Risk/10)
Key factors considered
Decision trace showing reasoning
Safety flags for detected issues
Persona-filtered rationale

Testnets are your friend

Before touching real governance, I ran the system on Paseo (Polkadot's testnet). Subsquare (the governance interface) worked identically on testnet, so I could see exactly how comments would render, test the proxy account setup, and verify the whole pipeline without risking actual treasury funds.

This sounds obvious, but many AI deployments skip this step. If your system can fail safely in a sandbox first, use the sandbox.

What didn't work

GitHub Actions logs expire

Here's something I didn't anticipate: GitHub Actions logs have a retention limit. After 90 days, those "verifiable execution logs" I proudly linked to? Gone.

For a governance system where accountability might matter years later, this is a real problem. The on-chain hash remains, and the manifest files in S3 persist, but the process evidence disappears. A future version needs to archive execution logs to permanent storage like the one being built for Polkadot right now.

Lesson: Audit trails need to outlive your cloud provider's default retention policies.

Context limitations

The agents had no memory of past proposals or community relationships. They couldn't know that this was someone's third failed delivery, or that this team had consistently exceeded expectations before.

Proposal 1745 got an Abstain partly because the agents couldn't see the proposer's track record. A human delegate would have known the context.

Lesson: We need an embeddings database or a historical archive of all proposals' contents.

External link rot

Proposals often linked to external documents like Notion pages, Google Drive, etc. The system deliberately excluded these (too much complexity, URL rot risk), which meant agents sometimes missed crucial details.

This is a fundamental tension: you want agents to evaluate complete information, but you also want deterministic inputs. I chose determinism over completeness. Not sure that was right.

Lesson: a Web3 governance system needs, at its core, a way to provide decentralized proposal content submission (important read)

No deliberation

When agents disagreed, the system just... abstained. There was no "Balthazar, explain to Caspar why this risk is acceptable." No synthesis of perspectives. Just voting logic.

Real consensus involves argument, persuasion, and updating beliefs. CyberGov V0 had none of that.

Lessons learned

DSPy changed how I think about LLM applications

I used DSPy for LLM orchestration, and it was a revelation. Instead of prompt engineering through trial and error, I defined:

A signature (what inputs and outputs I expected)
A few training examples
A compilation step that optimized prompts automatically

The framework handled few-shot learning, structured outputs, and cross-model compatibility. When I switched from one model to another, the same pipeline worked.

Lesson: Stop hand-crafting prompts. Use a framework that treats prompt optimization as a learnable problem.

Transparency is achievable

The "verifiable process > opaque conviction" principle worked. Blockchain + deterministic settings + content-addressed storage = auditable AI decisions.

Verification logic

But it required:

Running inference on public infrastructure (GitHub Actions)
Storing all artifacts with content hashes
Submitting attestations on-chain
Building a whole transparency layer into the output

Most teams won't do this. They should.

Multi-Agent consensus is fragile

Three models disagreeing didn't necessarily mean "this is a hard decision". More often it meant the proposal was written ambiguously, or one model had fixated on an irrelevant detail.

Having multiple agents creates useful tension, but you need better mechanisms for resolving that tension than "just abstain."

Personas help, but they're not wisdom

Giving agents distinct evaluation criteria (strategic vs. growth vs. risk) created useful diversity. The Caspar/Melchior dynamic (one conservative, one aggressive) was genuinely valuable.

But they were still pattern matchers. When Caspar flagged "moral hazard," he was matching that concept to proposal text, not reasoning from first principles about incentive structures.

The abstention default was correct

In governance, the cost of a wrong YES (wasted treasury funds) exceeds the cost of a wrong ABSTAIN (missed opportunity). The conservative bias felt right.

53% abstention might look like the system was useless. I'd argue it was appropriately humble. Here's its policy compared to other delegates:

What a V1 could look like

CyberGov was a V0 proof of concept. Here's what a production version might include:

Historical context via RAG

Build a vector database of past proposals, their outcomes, and proposer track records. Before evaluating a new proposal, retrieve relevant context: "This team delivered Project X on time and under budget" or "This proposer's last three proposals failed to deliver milestones."

I have a POC of this using ChromaDB but it's not ready for prime-time.

Agent deliberation protocols

Instead of independent voting, have agents respond to each other:

Each agent gives initial assessment
Agents see each other's concerns
Round two: agents can update their position or rebut
Final vote

This mimics how actual committees work. The computational cost is a tiny bit higher, but the decisions would be richer. IMHO this primitive is also needed beyond Web3 AI governance.
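A rough sketch of what such a protocol could look like, with stub functions standing in for LLM calls. Everything here is hypothetical: the `deliberate` function, the agent signature, and the two stub agents exist only to show the round structure.

```python
from typing import Callable, Dict, Tuple

Assessment = Tuple[str, str]  # (vote, rationale)
# An agent sees the proposal plus its peers' previous-round assessments.
Agent = Callable[[str, Dict[str, Assessment]], Assessment]

def deliberate(agents: Dict[str, Agent], proposal: str, rounds: int = 2) -> Dict[str, Assessment]:
    """Run several rounds; from round two on, agents see what their peers said."""
    positions: Dict[str, Assessment] = {}
    for _ in range(rounds):
        previous = dict(positions)  # every agent reacts to the same snapshot
        positions = {
            name: agent(proposal, {n: a for n, a in previous.items() if n != name})
            for name, agent in agents.items()
        }
    return positions

# Stub agents; real ones would call a model here.
def growth_agent(proposal, peers):
    return ("AYE", "milestones are listed in section 3")

def risk_agent(proposal, peers):
    if any("milestones" in rationale for _, rationale in peers.values()):
        return ("AYE", "a peer pointed to the milestones I had missed")
    return ("NAY", "no accountability mechanism found")

agents = {"growth": growth_agent, "risk": risk_agent}
print(deliberate(agents, "proposal text", rounds=1)["risk"][0])  # NAY
print(deliberate(agents, "proposal text", rounds=2)["risk"][0])  # AYE
```

The final positions would then go through whatever voting logic the system uses; the sketch only covers the exchange of views.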

Dynamic re-evaluation

Proposals evolve during voting periods based on community feedback. A V1 could monitor for significant edits and trigger re-analysis, possibly changing its vote if new information addresses previous concerns.

Permanent audit logs

Store execution logs, full API responses, and all intermediate artifacts to permanent decentralized storage. Governance decisions might be contested years later, so the evidence needs to persist and outlive us.

Human-in-the-Loop mode

The best near-term use case might not be autonomous voting, but AI-assisted analysis that a human delegate reviews. The structured output (scores, factors, traces) would be genuinely useful as "first-pass triage" before a human makes the final call.

Closing thoughts

The MAGI system in Evangelion was ultimately fallible because it could be hacked, manipulated, or simply wrong about what humanity needed. CyberGov V0 was too. But at least you could see exactly how it was wrong.

That's the real contribution here: not that AI can govern well, but that AI governance can be transparent. The ability to audit AI decisions matters more than whether any particular decision was optimal.

CyberGov V0 abstained 53% of the time. It voted against only one proposal. It agreed with itself only 21% of the time. By most metrics, it was not conclusive.

But every decision was:

Publicly reasoned
Cryptographically attested
Independently verifiable

That's more than most human delegates offer.

Cybergov V0 compared to other delegates

Links:

From Bias to Bots

Karim Jedda — Mon, 24 Nov 2025 09:40:00 GMT

Architectural determinism: how web2 primitives encode centralization

Karim Jedda — Wed, 05 Nov 2025 20:38:00 GMT

Note: This is a long-form essay synthesizing my research and observations on web architecture and decentralization. I'm writing this primarily to organize my own thinking, but I'm sharing it publicly because I believe these ideas matter. I'm not claiming novel invention, as most of these primitives have existed for years already. Also, I'm certainly not claiming to have all the answers. If you spot errors or faulty reasoning, or have alternative perspectives, I genuinely want to hear them. This is exploratory work and not gospel; the conclusion contains a recap of the claims of this essay. Consider it an invitation to discussion.

Disclaimers: 80% my thoughts & ideas, 20% formatting & structure using gemini-2.5-pro. Title image by nano-banana.

I believe that the architecture of the modern web not only makes meaningful decentralization difficult but structurally precludes it. As a consequence, seemingly "decentralized" systems built on top of modern web architecture inevitably collapse toward centralized intermediaries. I'm proposing an alternative set of primitives, or ways to think about the problem, that, when properly composed, could make decentralization the path of least resistance rather than maximal pain.

Centralization gradient

Systems built on the modern web stack exhibit a consistent pattern: regardless of initial design intent, they converge toward centralized architectures over time. This almost feels like an architectural inevitability.

Every major protocol built on web foundations has followed this trajectory:

Email was designed federated but now it's a Gmail/Outlook duopoly (>80% of users)
Web hosting was designed distributed but today it's an AWS/GCP/Azure oligopoly (>60% of infrastructure)
DNS was designed distributed but Cloudflare resolves >20% of web traffic
etc...

I believe that these outcomes are not coincidental but that the architecture of these systems contains structural properties that make centralization energetically favorable.

Architectural centralization

A system is architecturally centralized if:

Functioning requires third-party infrastructure that could observe, modify, or deny service
Optimal performance or reliability creates pressure toward shared infrastructure
Correctness relies on trusting entities outside the end-to-end path
P2P (peer to peer) operation requires significantly more complexity than client-server

A system may be logically decentralized (example: federation) while remaining architecturally centralized if these properties hold.

Current modern web primitives

Location based addressing (DNS/IP)

The primary way of accessing the web today is by using what's called location based addressing. You click a link and you get a result: resources/things are identified by network location (https://domain.com/resource)

The request flow goes something like this (very roughly):

In this flow, there are 4+ mandatory intermediaries (DNS, CA, ISP, origin server) and more than 3 trust assumptions along the way (DNS hierarchy, CA system, server identity).

The consequences of this primitive:

You cannot verify the content: the same URL can return different content. So no caching guarantees, no integrity verification, no offline operation. Additional layers can be used for mitigation (SRI, signatures, etc) but these are optional and rarely used in practice
The server sending the response needs to be always on: if the origin is offline, the resource is unavailable. This means 24/7 infrastructure costs, DDoS vulnerability, and geographic latency, which are usually mitigated with a CDN intermediary (read: more centralization)
The performance is degraded by physical distance to the origin (geographic coupling). Users farther away have a worse experience, so distributing the infrastructure becomes a way to make the experience better (thus, economies of scale favor large providers)
Domain owners can change content, revoke access, track usage (censorship vector, surveillance vector)

Taken together, these consequences imply that in location addressed systems, optimal resource availability requires resource replication across multiple locations. Replicating infrastructure exhibits economies of scale. Therefore, providers with larger infrastructure networks have structural cost advantages, creating convergent pressure toward consolidation.

Stateless Request/Response (HTTP)

With HTTP, each request is independent: there is no session state in the protocol.

A HTTP exchange goes something like this (way simplified):

Request: Method + Headers + Body
Response: Status + Headers + Body

(so no persistent connection, no identity, no state)

The consequences of this primitive:

Since applications need state and HTTP provides none, implementations usually leverage cookies, tokens, or server-side sessions, and each application ends up implementing state management differently. This is a direct centralization vector because the session state is stored server-side, and thus the user is bound to a specific server pool.
There is no identity layer in the protocol, so implementations leverage simple credentials, OAuth flows, or session tokens, which again, every application implements however it wants. The usual mantra is "don't roll your own auth", so what devs do is externalize it to identity providers. Another centralization vector.
All requests are observable by network path, observers including ISPs, DNS resolvers, potential MITM... Of course a mitigation here is TLS encryption but as we'll see later, current ways of doing it aren't any better and introduce trust assumptions too.

So all that combined, it's safe to say that in stateless protocols, applications requiring state must implement state management at the application layer. Application-layer state management cannot provide end-to-end guarantees without trusted intermediaries. Therefore, stateless protocols force trust delegation to intermediaries.

In fact, the state must be stored somewhere, the options being: (1) client-side (tamperable), (2) server-side (requires trusting the server), (3) distributed consensus (requires a coordination protocol). Options (2) and (3) both require trusting entities beyond the endpoints.

Interpreted execution (JavaScript)

Executable code is delivered as plain text and interpreted by browsers at runtime.

How JavaScript ends in your browser:

The consequences of this primitive are:

Code execution is non-deterministic and hinges on browser version, JIT optimization, API availability, extensions etc
Users cannot verify what code will execute because the source can be obfuscated, minified, or even completely different per request (note: even with SRI, you can't verify execution, only source)
The browser vendor (a third party) controls the runtime, so they can deprecate APIs, add tracking, or modify behavior (Manifest V2 is an example), and there is absolutely no recourse
Tabs share browser context, bringing with it cross-site tracking, fingerprinting etc...

Put together, in interpreted execution models, code verification is impossible without trusting the interpreter. The interpreter is provided by a third party (browser vendor). Therefore, code execution requires trusting a third party.

Certificate authority trust model (TLS/PKI)

This is the big one: trust is delegated to certificate authorities.

How it works:

The consequences of this primitive are:

All CAs must be trusted equally, meaning any CA can issue a cert for any domain. Certificate transparency might mitigate this but doesn't solve it once and for all. There is a certain trust transitivity involved.
CAs are subject to national jurisdiction, so governments can compel a CA to issue malicious certs. We've seen this happen.
Certs rotate, so there is no identity persistence and it isn't possible to verify "same entity as last time". The result of this is usually that the identity is conflated with the domain name (controlled by the registrar) and not the cryptographic key ensuring the green lock in the browser
Let's face it, ~3 CAs (or 2, if you think about it) handle the majority of the web. A single CA compromise affects a massive swath of it.

Again put together, it's easy to say that in delegated trust models with transitive trust, security is bounded by the weakest link in the trust chain. As the number of trusted entities increases, the probability of voluntary or involuntary compromise approaches 1. For more on CAs please read this excellent (although tongue in cheek) article: https://michael.orlitzky.com/articles/lets_not_encrypt.xhtml

Why centralization is inevitable

The architectural primitives described above both enable centralization and make it structurally inevitable through four reinforcing mechanisms:

The intermediary trap

The web's architecture creates a recursive pattern where intermediaries are first convenient and then necessary. It starts with the primitives themselves: location addressing requires always-on infrastructure, stateless protocols demand session management, and performance requirements push toward geographic distribution.

Users naturally gravitate toward intermediaries offering the best performance and reliability. But here's where the trap springs: as usage concentrates on particular intermediaries, economies of scale kick in. Higher utilization means better cache hit rates for DNS resolvers, more efficient PoP (point of presence) utilization for CDNs, and improved spam filtering for email providers. These efficiency gains create a competitive moat that new entrants simply cannot breach. New entrants cannot match the unit economics of incumbents without first achieving comparable scale, which they cannot do without matching the performance that economies of scale provide. The market consolidates not because of anti-competitive behavior but because the architecture makes any other outcome economically irrational.

Consider DNS resolvers: Google's 8.8.8.8 and Cloudflare's 1.1.1.1 collectively handle over 30% of global DNS traffic last I checked. They didn't capture this market through superior innovation alone though: recursive resolution fundamentally requires always-on infrastructure, and performance improves dramatically with scale as cache hit rates increase. Each query these services answer makes them marginally faster for the next query. A new entrant starting with zero queries has zero cached answers and must perform full recursive lookups for everything. They're structurally disadvantaged from day one. And here's the kicker: these entities see every domain every user visits, creating an observation point that enables both surveillance and control.

The same pattern plays out with CDNs. Geographic distribution is expensive: operating points of presence globally requires massive capital expenditure that only makes sense at scale. Fixed costs get amortized across more traffic, creating sublinear cost scaling. Cloudflare, AWS CloudFront, and Fastly don't just handle the majority of web traffic; they can also modify, block, or surveil any content passing through their networks. Users don't choose this arrangement because they want surveillance but because the alternative is unacceptably slow.

Email completed this same journey decades ago. SMTP's requirement for always-on servers with static IPs created a baseline infrastructure cost that favored consolidation, but spam filtering accelerated it dramatically. Effective spam detection improves with data scale because the more email you process, the better your models become. As with the CDNs, Gmail and Outlook don't just host email; they can read all of it, and they implement opaque rules for inbound and outbound mail that smaller providers must accommodate or face deliverability issues. Independent email servers are now so likely to be classified as spam that running your own email infrastructure is effectively infeasible for most use cases.

Performance gradient

Location addressed systems impose a hard physical constraint: latency is proportional to distance, bounded by the speed of light. This is just physics. Optimal performance therefore requires geographic distribution of infrastructure which is capital intensive and exhibits strong economies of scale.

Consider what actually happens during a web request. You pay for DNS lookup (20-120ms), TCP handshake (one round-trip time), TLS handshake (one or two RTTs depending on version), the HTTP request itself (another RTT), server processing time, and finally response transfer. For a user in New York accessing a server in Singapore, you're looking at about 15,000 km of distance. The physical lower bound for a round trip at light speed in fiber is roughly 150ms, but actual RTTs hit 200-250ms due to routing overhead. A typical page load requiring six round trips burns 1200-1500ms on network latency alone, before a single byte of content is processed.
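The arithmetic behind those numbers is easy to reproduce. This is a back-of-the-envelope sketch; the constant of 200 km per millisecond is my assumption for the speed of light in fiber (roughly two thirds of its speed in vacuum), and the function names are illustrative.

```python
FIBER_KM_PER_MS = 200.0  # assumed: light in fiber covers about 200 km per millisecond

def min_rtt_ms(distance_km: float) -> float:
    """Physical lower bound for one round trip over fiber."""
    return 2 * distance_km / FIBER_KM_PER_MS

def network_latency_ms(rtt_ms: float, round_trips: int) -> float:
    """Time spent purely waiting on the network for a given number of round trips."""
    return rtt_ms * round_trips

print(min_rtt_ms(15_000))          # 150.0 (New York to Singapore, lower bound)
print(network_latency_ms(200, 6))  # 1200
print(network_latency_ms(250, 6))  # 1500
```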

The standard mitigation is deploying a CDN with global PoPs, but this costs at minimum $200/month for serious global coverage. Free CDN tiers exist, but they're either severely limited or subsidized by other revenue (usually surveillance-based advertising).

The economic implication is that only applications with direct revenue or subsidy can achieve acceptable global performance. Free or low revenue applications deliver inferior experiences, pushing developers toward ad-supported models that require user tracking which, of course, benefits from centralized data collection. The architecture makes surveillance capitalism the economically rational choice for anyone who wants their application to be fast.

Trust externalization spiral

When protocols lack essential primitives, applications must implement them at higher layers. This seems reasonable until you realize that application-layer implementation complexity creates strong economic pressure toward outsourcing, which concentrates users on a few providers, which grants those providers both data access and leverage, which creates lock-in. The spiral is self-reinforcing.

Authentication demonstrates this perfectly. HTTP has no identity primitive, so sites initially used username and password. Password reuse created security vulnerabilities, so password managers emerged. But password managers don't solve phishing, so two-factor authentication became necessary. 2FA improved security but added friction, so "Sign in with Google/Facebook/GitHub" emerged as the path of least resistance. Today, over 50% of sites support OAuth, and over 30% of users rely on federated identity. What seems like user convenience is actually structural: implementing robust authentication is sufficiently complex that outsourcing it is the rational choice for most developers.

But look at what happens during an OAuth flow. The user clicks "Sign in with Google" and gets redirected to Google. Google learns which site the user is visiting and when, and can correlate this with the user's identity. Google returns a token to the site asserting the user's identity, and the site trusts this assertion. Google can now see every login to every site using its OAuth, and can, if it chooses, impersonate any user. The site gains convenience and Google gains comprehensive surveillance of user behavior across the web.

None of this would be necessary if identity were a protocol primitive. With public key identity, the user signs a challenge, the site verifies the signature, and no third party is involved. But because HTTP lacks this primitive, we've built an entire industry around outsourced identity that extracts rent and surveillance rights in exchange for solving a problem that shouldn't exist.

I wrote about this in an earlier article, if you'd like to read a bit more:

We accidentally built the wrong internet

We built the internet on email & passwords, coupled with an analog payment system based on typing 16-digit numbers into forms. If someone pitched this today, we’d laugh them out of the room. The answer might be in places most of our brightest minds aren’t willing to look at - yet.

Karim Jedda

Coordination tax

Perhaps the cruelest irony of web architecture is that building peer-to-peer systems on top of it is harder than building client-server systems, despite P2P being conceptually simpler (direct connection between peers). The architecture itself has been optimized for client-server patterns so thoroughly that P2P operation pays a massive coordination tax.

WebRTC demonstrates this tax explicitly. To establish a browser-based P2P connection, you need a STUN server (third party) to discover your public IP, a TURN server (third party) to relay traffic if NAT prevents direct connection, and a signaling server (third party) to exchange connection metadata. You then perform ICE negotiation, trying multiple connection paths to find one that works. Success rate for direct P2P is only about 70% due to NAT configurations; the remaining 30% requires a TURN relay, which introduces the very intermediary you were trying to avoid. Oh, and this process adds 200-500ms of latency just for negotiation.

Think about that: connecting two browsers directly is harder and slower than connecting both browsers to a server, despite the direct path being shorter and requiring less infrastructure. This isn't WebRTC's fault, of course, but rather the inevitable consequence of building P2P on an architecture optimized for client-server. NATs, firewalls, and ISP routing policies all assume client-server topology. P2P is swimming upstream while the architecture actively pushes water against it.

The fundamental problem is one of design philosophy:

A web browser's job: to be a secure, sandboxed client. Its primary function is to fetch and display untrusted content from a server while protecting the user's machine from that content. It is designed for a client-server world. It requests, it does not serve.
A P2P peer's job: to be a sovereign, first-class citizen of the network. It must be able to listen for incoming connections, store significant amounts of state, route data for others, and run persistently. It must be both a client and a server.

These two philosophies are in direct opposition. Trying to force a peer's job into a browser's sandbox leads to a series of crippling compromises.

The result is that developers rationally choose client-server not because it's technically superior but because the web makes it the path of least resistance. This is architectural determinism: the primitives shape the possibility space so thoroughly that alternatives become economically irrational even when technically feasible.

Taken together, the above explains, at least to me, why so many "decentralized" web3 projects end up with centralized components: they're fighting the architecture itself.

Alternative primitives

If centralization is structurally embedded in web primitives, the question becomes: what primitives would make decentralization structurally natural? I'm not claiming these are novel as many have existed for years or decades, but their composition creates a fundamentally different architectural possibility space.

Content addressing

Content addressing means resources are identified by cryptographic hash of their content rather than network location. The address format is simple: hash://algorithm:digest (e.g., ipfs://bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi). Verification is mathematical: hash(received_content) == address. Source is irrelevant as you can fetch from any peer because the hash guarantees content integrity.

This primitive has profound cryptographic properties. A hash match implies a content match (assuming a collision-resistant hash function), which means content is verifiable without trusting the source. Identical content has an identical hash, which enables storage and bandwidth savings that scale with content reuse (deduplication). Content can be fetched from untrusted sources safely, eliminating single points of failure and enabling resilience to server outages and censorship. Performance improves automatically because you can fetch from the geographically nearest peer rather than a specific origin.
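Here is a minimal sketch of that verification step, using SHA-256 and the generic hash://algorithm:digest format from above. Peers are modeled as plain dictionaries, which is an assumption for illustration; real systems like IPFS use their own address encoding and network protocols.

```python
import hashlib

def address_of(content: bytes) -> str:
    """Derive the address from the content itself."""
    return "hash://sha256:" + hashlib.sha256(content).hexdigest()

def fetch_verified(address: str, peers) -> bytes:
    """Try each untrusted peer until one returns bytes matching the address."""
    for peer in peers:
        content = peer.get(address)
        if content is not None and address_of(content) == address:
            return content
    raise LookupError("no peer returned content matching " + address)

original = b"hello, content-addressed world"
addr = address_of(original)

# One peer lies, one is honest. The order does not matter for correctness.
malicious_peer = {addr: b"tampered bytes"}
honest_peer = {addr: original}

print(fetch_verified(addr, [malicious_peer, honest_peer]) == original)  # True
```

The malicious peer can waste your bandwidth, but it cannot make you accept the wrong bytes.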

The tradeoff is that no entity controls the hash space, so you cannot revoke access to content once published. This is a feature for censorship resistance but a bug for content moderation. Versioning requires higher-level abstractions like name services or blockchain registries.

Compared to location based addressing:

Property	Location (URL)	Content (Hash)
Verification	Trust source	Mathematical proof
Availability	Source online	Any peer has copy
Performance	Distance to source	Distance to nearest peer
Caching	Time-limited	Permanent
Updates	Transparent	Explicit version change
Censorship resistance	Low	High

In content-addressed systems, resource availability scales with demand. Under P2P architecture, high-demand resources get replicated more, improving both availability and performance. Popular content is cached by more peers, and availability approaches 1 as the number of peers increases: availability = 1 - ∏(1 - p_i) where p_i is the probability peer i is online.
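To get a feel for how quickly the formula converges, here is a small sketch. It assumes peers go offline independently of each other, which the formula itself requires, and the 30% uptime figure is made up for illustration.

```python
import math

def availability(online_probs):
    """1 - prod(1 - p_i): the chance that at least one peer holding the content is online."""
    return 1 - math.prod(1 - p for p in online_probs)

# Hypothetical peers that are each online only 30% of the time.
for n in (1, 3, 10, 20):
    print(n, round(availability([0.3] * n), 4))
```

Even with individually unreliable peers, a few dozen copies push availability very close to 1. Correlated failures (same ISP, same region, same power grid) would make the real number lower.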

Cryptographic identity

Entities are identified by public keys. The identity is the public key itself, for example an Ed25519 key (32 bytes). Authentication is accomplished by signing a challenge with the private key. Verification is simply checking that the signature matches the public key.

The consequences of this architectural choice are interesting. "Registering" becomes a non-concept as there's no dependency on identity providers. Identity generation is unobservable and costs nothing since you just generate a keypair. The public key is the identity, and a signature proves knowledge of the corresponding private key. No CA needed, no third party involved. This works over any channel: signing HTTP requests, data structures, transactions, it's all the same and isn't tied to a specific protocol. Assertions about other keys enable complex authorization without central authority through delegation chains.
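The challenge and response flow can be sketched with nothing but a hash function. Ed25519 is not in the Python standard library, so this example swaps in a Lamport one-time signature instead: it is a different scheme, each key pair may only sign a single message, and the keys are far larger than 32 bytes. It is here only to show the shape of the flow, with all function names being illustrative.

```python
import hashlib
import secrets

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _bits(message: bytes):
    """Yield the 256 bits of SHA-256(message), most significant bit first."""
    for byte in _h(message):
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1

def generate_keypair():
    """Private key: 256 pairs of random secrets. Public key: their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public

def sign(private, message: bytes):
    """Reveal one secret per message bit. A key must never sign twice."""
    return [pair[bit] for pair, bit in zip(private, _bits(message))]

def verify(public, message: bytes, signature) -> bool:
    if len(signature) != 256:
        return False
    return all(_h(sig) == pair[bit]
               for sig, pair, bit in zip(signature, public, _bits(message)))

# User side: generating an identity needs no registration and no third party.
private_key, public_key = generate_keypair()

# Site side: issue a fresh random challenge.
challenge = secrets.token_bytes(32)

# User signs, site verifies against the public key it already knows.
signature = sign(private_key, challenge)
print(verify(public_key, challenge, signature))                 # True
print(verify(public_key, b"a different challenge", signature))  # False
```

No secret ever leaves the user's device except the one-time reveal inside the signature, and the site needs nothing beyond the public key to check it.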

Compared to traditional authentication:

Property	Username/Password	OAuth	Public Key
Registration	Required	Required	None
Trust dependency	Service provider	Identity provider	None
Portability	Per-service	Per-provider	Universal
Phishing vulnerability	High	Medium	None (no secrets sent)
Account recovery	Service-specific	Provider-specific	Backup key only

In public key identity systems, authentication requires no third party and is unforgeable (assuming cryptographic hardness). This is more secure, but also differently secure: the attack surface moves to local device security, away from the security of identity providers and all network paths to them.

Deterministic execution

WebAssembly (WASM) provides a binary instruction format with specified semantics. It's a stack-based virtual machine with guaranteed memory safety, control-flow integrity, and determinism. The same input produces the same output, apart from a few narrow cases the spec leaves open such as NaN bit patterns and threads, which means you can verify behavior without trust as the execution environment cannot subvert the computation.

WASM enforces the principle of least privilege architecturally through capability-based security. It's a compile target for many languages (C, C++, Rust, Go, and more), so you're not locked into JavaScript (to me, not having to deal with that is a huge benefit). The same binary runs identically in any browser and on any server, embedded device, or desktop.

Write once, run anywhere, but for real this time.

Compared to JavaScript:

Property	JavaScript	WASM
Determinism	No (JIT variations)	Yes (spec-defined)
Verification	Source only (if SRI used)	Binary hash
Performance	JIT-dependent	Near-native
Sandboxing	Same-origin policy	Capability-based
Language support	JavaScript only	Any language
Portability	Browser-dependent	Spec-guaranteed

In deterministic execution environments, code behavior is verifiable by executing once and comparing the output hash. Verification complexity is O(1) in the number of verifiers. A content addressed WASM blob is guaranteed to be the thing you want to run.
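A minimal sketch of why a content-addressed blob can be fetched from anyone: the client recomputes the hash and rejects anything that does not match. The address here is a plain hex SHA-256, which is an assumption for brevity (real systems such as IPFS wrap the hash in a richer identifier), and `fetch_verified` and the two peers are hypothetical.

```python
import hashlib

def content_id(blob: bytes) -> str:
    """Address of a blob: here simply the hex SHA-256 of its bytes."""
    return hashlib.sha256(blob).hexdigest()

def fetch_verified(expected_id: str, fetch_from_peer) -> bytes:
    """Accept bytes from an untrusted peer only if they hash to the requested address."""
    blob = fetch_from_peer(expected_id)
    if content_id(blob) != expected_id:
        raise ValueError("peer returned bytes that do not match the requested hash")
    return blob

# Stand-in for a WASM module: just the 8-byte header (magic "\0asm" plus version 1).
module = b"\x00asm\x01\x00\x00\x00"
address = content_id(module)

def honest_peer(_requested_id):
    return module

def tampering_peer(_requested_id):
    return module + b"\x00"   # one extra byte is enough to change the hash

print(fetch_verified(address, honest_peer) == module)
try:
    fetch_verified(address, tampering_peer)
except ValueError as error:
    print("rejected:", error)
```

The same comparison works for outputs: if execution is deterministic, two parties can hash the result of running the module and compare digests instead of trusting each other.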

Local-first data (CRDTs)

Conflict-free Replicated Data Types are data structures with commutative merge operations. Types include counters, registers, sets, sequences, and more. Operations happen locally with eventual synchronization, and concurrent updates are mathematically guaranteed to converge even without coordination.

Mutations can happen offline with no server dependency and syncing can occur later, asynchronously. Merges are automatic, and conflict resolution is built into the data structure. Any peer can sync with any peer without a central server.
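The simplest CRDT, a grow-only counter, shows what "conflict resolution is built into the data structure" means. This is a minimal sketch with illustrative names; production libraries add deletion, compaction, and richer types.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged with an element-wise max."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount: int = 1) -> None:
        # Each replica only ever writes to its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # max() is commutative, associative, and idempotent, so merge order is irrelevant.
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)

    @property
    def value(self) -> int:
        return sum(self.counts.values())

phone, laptop = GCounter("phone"), GCounter("laptop")
phone.increment()
phone.increment()      # two offline edits on the phone
laptop.increment()     # one offline edit on the laptop

# Sync later, in either direction, with no server in between.
phone.merge(laptop)
laptop.merge(phone)
print(phone.value, laptop.value)  # 3 3
```

Merging twice, or in the opposite order, gives the same result, which is the property that lets any peer sync with any other peer.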

Compared to client-server data:

Property	Client-Server	CRDT
Offline writes	Impossible	Supported
Conflict resolution	Server-side logic	Mathematical merge
Sync topology	Star (all through server)	Arbitrary graph
Latency	RTT to server	Local (instant)
Dependencies	Server must be online	None

In CRDT based systems, data availability is independent of server availability. Write availability approaches 100% as local storage reliability approaches 100%. You're no longer dependent on network connectivity for your application to function.

Light clients (verified computation)

Light clients enable resource-constrained devices to verify blockchain or distributed system state without downloading and processing the entire history. Instead of trusting a full node's responses, light clients verify cryptographic proofs that the response is correct. Here's roughly how it works:

Full nodes maintain complete state and history
Light clients download only block headers (small, fixed size)
Merkle proofs allow verification that specific data is included in a block
Header chain verification proves consensus without replaying all transactions

Compared to traditional clients:

Property	Thick Client (Full Node)	Thin Client (RPC)	Light Client
Storage required	Full history (~TB)	None	Headers only (~MB)
Trust requirement	None	Trust RPC provider	None (verify proofs)
Query latency	Local (instant)	Network RTT	Network RTT + proof verification
Censorship resistance	High	Low (provider can lie)	High (can detect lies)
Resource cost	High	Low	Medium

Light clients represent a crucial middle ground: they achieve the trust-minimization of full nodes with resource requirements closer to thin clients. For mobile devices and browsers, running a full node is impractical, but trusting an RPC provider reintroduces exactly the centralization we're trying to avoid.

In verified-computation models, security is decoupled from resource availability. A device with 1GB storage can have the same security guarantees as a data center with 100TB, because verification complexity is logarithmic in state size (via Merkle trees) while full validation is linear.
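The logarithmic claim comes from Merkle proofs, which can be sketched in a few lines. This is a simplified tree (SHA-256, last node duplicated on odd levels, no domain separation between leaves and inner nodes), not the format of any particular chain, and the function names are illustrative.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return all tree levels, from the leaf hashes up to the single root."""
    level = [H(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node on odd levels
            level = level + [level[-1]]
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def make_proof(levels, index):
    """Collect the sibling hash at each level (the full node's job)."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))   # (hash, sibling_is_left)
        index //= 2
    return proof

def verify_proof(leaf, proof, root):
    """Recompute the root from one leaf and its proof (the light client's job)."""
    node = H(leaf)
    for sibling, sibling_is_left in proof:
        node = H(sibling + node) if sibling_is_left else H(node + sibling)
    return node == root

transactions = [f"tx-{i}".encode() for i in range(8)]
levels = build_levels(transactions)
root = levels[-1][0]          # the only value the light client needs from the header
proof = make_proof(levels, 5)

print(len(proof))                                # 3 hashes for 8 leaves
print(verify_proof(b"tx-5", proof, root))        # True
print(verify_proof(b"tx-forged", proof, root))   # False
```

Doubling the number of leaves adds one hash to the proof, so the light client's work grows with the depth of the tree, not with the amount of data in it.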

Composing primitives: Why this matters

Centralization pressure doesn't emerge from individual primitives but from how they compose. The same functionality implemented with different primitive combinations exhibits fundamentally different architectural properties.

Primitive composition determines architectural possibility and product destiny

Consider a standard web stack. You compose location addressing (DNS/IP), stateless protocol (HTTP), delegated trust (TLS/CA), and interpreted execution (JavaScript). Location addressing requires always-on servers. Stateless protocol requires session management, which means server state. Delegated trust requires trusting CAs. Interpreted execution requires trusting the browser vendor. The composition creates a system that requires trusting multiple third parties for basic functionality. There's no way around this given that it is baked into the primitive choices.

Now consider an alternative stack: content addressing (hash), cryptographic identity (pubkey), deterministic execution (WASM), and local-first data (CRDT). Content addressing allows fetching from anyone. Cryptographic identity allows authentication without third parties. Deterministic execution allows verification without trust. Local-first data allows operation without servers. The composition creates a system that functions without third parties. Not "can theoretically function" but "naturally functions". The absence of intermediaries is the default, not an exceptional mode.

Primitive choice determines the architectural possibility space. You cannot build a truly decentralized system on web primitives no matter how clever your application layer is. The primitives themselves encode centralization as a structural requirement.

Network effects run in opposite directions

In the web stack, network effects drive toward centralization. More users create more load, which requires bigger infrastructure, which has higher fixed costs, which creates economies of scale, which creates competitive advantage, which drives consolidation. It's a reinforcing cycle that naturally ends in oligopoly.

In a P2P stack with these primitives, network effects drive toward decentralization. More users mean more peers, which means better availability and better performance (shorter distance to nearest peer), which creates better user experience, which attracts more users. The reinforcing cycle makes the network more resilient and performant as it grows.

In content-addressed P2P systems, network performance improves monotonically with network size. This is the opposite of traditional systems where scale creates management burden. Scale creates resilience.

For instance, with 100 peers, average availability might be 98% but with 10,000 peers, it approaches 99.99%.

Economic structure: Zero marginal cost changes everything

The cost structures of these two stacks are radically different, and cost structure determines market structure.

Web stack costs:

Fixed costs include server infrastructure ($50-$10,000/month scaling with users), CDN ($20-$1,000/month scaling with bandwidth), DNS ($10-$100/month), TLS certificates ($0-$100/month), and monitoring/logging ($20-$500/month). Variable costs include bandwidth ($0.05-$0.15/GB), compute ($0.01-$0.10/hour), and storage ($0.02-$0.10/GB/month). Total: $100-$10,000+/month depending on scale.

For a web application, Cost(n users) = Fixed + (Variable × n). Scaling is linear to superlinear. You pay more as you grow, and someone has to pay those bills.

P2P stack costs:

Fixed costs are essentially just initial publication to a DHT ($0) or optionally an IPFS pinning service for reliability ($5-$50/month). Variable costs for bandwidth, compute, and storage are all $0 given they're provided by peers, run on user devices, and users store their own data. Total: $0-$50/month regardless of scale.

For a P2P application, Cost(n users) = Fixed. Scaling is constant. Your 10-millionth user costs the same as your first user: nothing.
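The two cost functions can be written down directly. The dollar figures below are placeholders picked from inside the ranges quoted above (the per-user variable cost in particular is an assumption), so read the shape of the curves and not the numbers.

```python
def web_cost(users, fixed=100.0, variable_per_user=0.05):
    """Cost(n) = Fixed + Variable x n, in dollars per month (placeholder values)."""
    return fixed + variable_per_user * users

def p2p_cost(users, fixed=5.0):
    """Cost(n) = Fixed: the user count does not appear in the formula."""
    return fixed

for n in (1_000, 100_000, 10_000_000):
    print(f"{n:>10} users  web ${web_cost(n):>10,.0f}  p2p ${p2p_cost(n):,.0f}")
```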

The breakeven point is trivial: at any n > 0, P2P is cheaper. But it's more than a cheaper bill: the entire cost structure changes.

Consider value capture in a traditional SaaS application. User pays $100/month. AWS captures $30 for hosting. Cloudflare captures $10 for CDN. Auth0 captures $5 for authentication. Datadog captures $5 for monitoring. The developer receives $50. Intermediaries capture 50% of revenue, not through rent-seeking but through providing genuinely necessary infrastructure in a location-addressed, stateless, delegated-trust architecture.

In a P2P application, the user pays $100/month and the developer receives $100. Intermediaries capture 0% because there are no intermediaries, the architecture doesn't require them. Developers have 2x revenue at the same user price, or can charge 50% less for the same revenue. It's a different economic category.

One note: this applies to local-first P2P applications, not on-chain applications, since it's common and easy to incorrectly conflate "Web3" with "blockchain".

Market structure implications

Web architecture creates natural monopolies through a well understood mechanism. Infrastructure has high fixed costs. Marginal cost decreases with scale. The largest provider has the lowest unit cost. Smaller providers cannot compete on price. The market consolidates to oligopoly. Result: AWS, GCP, and Azure control over 67% of the cloud market. The market is working exactly as the architecture dictates.

P2P architecture eliminates economies of scale. There are no infrastructure fixed costs given users provide and ARE the infrastructure. Marginal cost is zero regardless of scale. Being large confers no cost advantage. Competition happens on quality and features, not on price. The market can remain diverse because there's no structural pressure toward consolidation.

In zero marginal-cost systems, market structure tends toward perfect competition rather than natural monopoly. This is why BitTorrent, after 20+ years, still has dozens of client implementations rather than one dominant provider. The architecture doesn't reward consolidation.

Security: attack surface is determined by trust dependencies

A web application's attack surface includes DNS poisoning (compromise resolver), TLS MITM (compromise any CA), server compromise (SQL injection, RCE, etc.), CDN compromise (modify cached content), browser compromise (XSS, CSRF, etc.), session hijacking (steal cookies/tokens), and supply chain attacks (compromised npm packages). You're trusting 5+ parties, and the attack surface is large and distributed.

A P2P application's attack surface includes private key theft (local device compromise), implementation bugs (in WASM runtime or application code), and social engineering (user signs malicious transaction). You're trusting 0 external parties, only cryptography and your local device. The attack surface is local only.

Quantitatively, if each attack vector has independent probability p of success, web application compromise probability is approximately 1 - (1-p)^7 ≈ 7p for small p, while P2P application compromise is approximately 1 - (1-p)^3 ≈ 3p. The web is roughly 2.3x more vulnerable, and that's a conservative estimate that assumes all attack vectors have equal probability.
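The approximation is easy to check numerically. The value p = 0.01 is an arbitrary illustration, and the independence and equal-probability assumptions are the same ones made in the text.

```python
def compromise_probability(p, vectors):
    """Chance that at least one of `vectors` independent attacks succeeds."""
    return 1 - (1 - p) ** vectors

p = 0.01  # hypothetical per-vector success probability
web = compromise_probability(p, 7)
p2p = compromise_probability(p, 3)

print(f"web {web:.4f}  p2p {p2p:.4f}  ratio {web / p2p:.2f}")
```

The exact ratio sits slightly below 7/3 because the linear approximation overstates the larger term a little more, but for small p the difference is negligible.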

Trust minimization

A system is trust-minimized if security relies only on cryptographic hardness assumptions, local device security (private key secrecy), and code correctness (verifiable via audit or formal methods).

The web stack requires trusting that the DNS hierarchy operates correctly, CAs issue certificates only to legitimate owners, browser vendors don't inject malicious code, servers provide correct content, and CDNs don't modify content in transit. These are institutional trust assumptions about organizations subject to compromise, coercion, error and of course, fiduciary duty.

The P2P stack requires trusting that hash functions are collision-resistant (cryptographic assumption), signature schemes are unforgeable (cryptographic assumption), WASM runtimes implement the specification correctly, and local devices protect private keys (user responsibility). These are fundamentally different kinds of trust.

Systems requiring trust in n independent parties have security bounded by the minimum security of those parties. If any party is compromised, the system is compromised. Trust-minimized systems have security bounded only by cryptographic assumptions and local device security. The attack surface is orders of magnitude smaller.

Bandwidth efficiency scales logarithmically

In location-addressed systems, serving a popular resource to n users costs the origin server n × size bandwidth. CDN caching reduces origin load but doesn't eliminate bandwidth costs: the origin still serves k × n × size, where k is the cache miss rate, and the CDN serves the rest.

In content-addressed P2P systems, the initial seed costs 1 × size. But peer sharing means each completed download immediately contributes upload capacity. Growth is geometric: 1, 2, 4, 8... parallel streams. In BitTorrent with 1 seed and 63 peers, distribution to all peers takes approximately 6 rounds (log₂(64)). In this idealized doubling model the seed uploads only about 6 × size (one copy per round), where an origin server would upload 63 × size. Total network bandwidth is the same 63 × size either way, because every peer still has to receive one copy; the difference is that peers carry most of the upload.

The critical difference: distribution time is O(log n) versus O(n). With 1,000 peers, client-server takes 1,000× longer than 1 peer; P2P takes ~10× longer. With 1,000,000 peers, the ratios are 1,000,000× versus ~20×. BitTorrent has empirically distributed exabytes of data with zero central infrastructure cost.
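A toy round-based model reproduces these ratios. It assumes every node that holds the file uploads exactly one full copy per round and that all links are equally fast, which real swarms only approximate.

```python
def p2p_rounds(peers):
    """Rounds until every peer has the file when each holder uploads one copy per round."""
    holders, rounds = 1, 0            # round 0: only the seed has the file
    while holders < peers + 1:        # peers + 1 counts the seed itself
        holders *= 2                  # every holder serves one new peer
        rounds += 1
    return rounds

def client_server_rounds(peers):
    """A single server uploading one copy per round needs one round per peer."""
    return peers

for peers in (63, 1_000, 1_000_000):
    print(peers, client_server_rounds(peers), p2p_rounds(peers))
```

The seed is a holder in every round, so it uploads one copy per round: 6 copies for 63 peers, while the other 57 copies come from peers.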

Note: these numbers assume sufficient peer adoption, and that assumption is doing a lot of heavy lifting. Cold-start performance for unpopular content is of course a real challenge.

Practical implications

Development complexity: different, not harder

Web application requirements:

Developer knowledge: HTML, CSS, JavaScript, React/Vue/Angular (frontend); Node.js/Python/Go, SQL, REST API design (backend); AWS/GCP, Docker, Kubernetes (infrastructure); monitoring, logging, scaling, security (operations).

Required services: compute (EC2/Cloud Run), database (RDS/Cloud SQL), CDN (CloudFront/Cloudflare), authentication (Auth0/Firebase), monitoring (Datadog/New Relic).

Deployment complexity is high: CI/CD pipelines, staging environments, production deploys, rollback procedures. Operational burden is continuous: 24/7 monitoring, security updates, scaling adjustments, and I'm sure I'm missing a few.

P2P application requirements:

Developer knowledge: JavaScript/Wasm (frontend), CRDTs and local-first architecture (data), IPFS/libp2p basics (distribution).

Required services: optionally IPFS pinning ($5/month).

Deployment complexity is low: publish a hash, done. Operational burden is minimal given no servers to maintain and no scaling to manage.

The tradeoff is real though: P2P requires learning new paradigms (CRDTs aren't intuitive initially), but it eliminates operational complexity entirely. You're trading conceptual complexity for operational complexity. For many developers, this is a favorable trade as operational complexity never ends but conceptual complexity can be learned once.

User experience: faster and offline-capable

Web application first load:

DNS lookup (~30ms) + TLS handshake (~100ms) + HTML download (~50ms) + parse HTML (~20ms) + download CSS (~100ms) + download JS bundles (~500ms) + parse/execute JS (~200ms) + API calls (~200ms) = ~1200ms to interactive.

Subsequent loads with cached resources: ~300ms to interactive.

Offline: broken (unless PWA is implemented, which adds complexity).

P2P application first load:

DHT lookup (~125ms) + download WASM (~100ms) + instantiate WASM (~20ms) = ~245ms to interactive.

Subsequent loads from cache: ~20ms (instant).

Offline: works by default if data is cached locally via CRDT.

User-perceived difference: P2P loads ~5× faster and works offline without additional implementation effort. This is the inherent consequence of local-first data and content addressing.
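For anyone who wants to plug in their own measurements, the budget above is just a sum of sequential steps. The figures are the rough estimates quoted in this section, and the step names are labels I made up for the sketch.

```python
web_first_load_ms = {
    "dns_lookup": 30, "tls_handshake": 100, "html_download": 50, "parse_html": 20,
    "download_css": 100, "download_js": 500, "execute_js": 200, "api_calls": 200,
}
p2p_first_load_ms = {"dht_lookup": 125, "download_wasm": 100, "instantiate_wasm": 20}

web_total = sum(web_first_load_ms.values())
p2p_total = sum(p2p_first_load_ms.values())

print(web_total, p2p_total, round(web_total / p2p_total, 1))  # 1200 245 4.9
```

Treating the steps as strictly sequential is a simplification, since browsers overlap some downloads, so the ratio is an order-of-magnitude guide.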

Censorship resistance: built-in, not bolted-on

Web architecture has multiple censorship points: DNS blocking (national firewalls), IP blocking (ISP level), TLS interception (government MITM), server seizure (legal action), CDN compliance (ToS violations), and payment blocking (Visa/Mastercard). Effectiveness is high because there are multiple chokepoints. Workarounds like VPNs and Tor add complexity and can themselves be blocked.

P2P architecture has two potential censorship points: content hash blocking (requires deep packet inspection, expensive and incomplete) and P2P protocol blocking (also requires DPI, also expensive). Effectiveness is low because content is replicated across peers with no central server to seize. Workarounds aren't needed.

BitTorrent has proven this empirically! Despite 20+ years of legal pressure and attempts at blocking, it remains functional and widely used. The architecture itself is resistant to censorship in ways that application-layer solutions on web architecture can never achieve.

Conclusions

Core claims

Current web architecture structurally precludes decentralization. Intermediaries are not incidental but architecturally necessary for functionality. Location addressing requires always-on servers, stateless protocols require session management, delegated trust requires trusting CAs, and interpreted execution requires trusting browser vendors.
Cost structures create convergent pressure toward consolidation. Infrastructure with high fixed costs and decreasing marginal costs inevitably consolidates. P2P cost structures eliminate this pressure by making marginal cost zero regardless of scale.
The web requires trusting multiple third parties. Alternative primitives reduce trust to cryptography and local devices. In delegated-trust systems, security is bounded by the weakest link. In trust-minimized systems, security is bounded by cryptographic hardness assumptions. The attack surface is orders of magnitude smaller.
P2P can match or exceed web performance for popular content. Content addressing enables fetching from the nearest peer rather than specific origins. Bandwidth efficiency scales logarithmically versus linearly.
Reducing trusted parties reduces attack surface. Each trusted party is a potential compromise point. Systems with fewer dependencies are more secure by construction, not by implementation quality.
In systems where optimal performance requires infrastructure with economies of scale, market structure naturally converges toward oligopoly. This is just the market responding rationally to architectural incentives. AWS, GCP, and Azure dominance is a structural outcome and not an accident.
In content-addressed P2P systems, resource availability and performance improve monotonically with network size. More users means more resilience, not more burden. Network effects run toward decentralization, not consolidation.

What this means

The modern web's centralization is not just a bug that can be fixed with better protocols on top of HTTP but a feature of the primitive composition. Federation, blockchain layers, and decentralized identifiers all face the same structural headwinds because they're built on primitives that encode centralization as a requirement.

True decentralization requires different primitives: content addressing instead of location addressing, cryptographic identity instead of delegated trust, light clients & deterministic execution instead of interpreted code, and local-first data instead of server-side state. These primitives compose into systems where decentralization is the path of least resistance, not maximum pain.

The economic implications are that zero-marginal-cost systems don't exhibit economies of scale, which eliminates the structural pressure toward consolidation. Markets tend toward perfect competition rather than natural monopoly. Developers capture 2× more value because intermediaries aren't architecturally necessary.

The timeline is uncertain tbh, but the direction is clear. I think technical barriers are falling: WASM support is widespread, IPFS has millions of users, and CRDTs are proven in production (Figma, Linear, ...). Economic incentives favor P2P for developers (lower costs) and users (better privacy, offline functionality). Social barriers remain, but every architectural transition faces adoption challenges.

The web won the last 30 years by making client-server the path of least resistance. P2P can win the next 30 by making decentralization the path of least resistance.

So perhaps the only way to build the real Web3 is to let Web2 go.

Open questions

These are genuinely open problems that need solutions:

How to incentivize peers to provide storage/bandwidth without a payment layer? Altruism and reciprocity work at small scale (BitTorrent), but do they work for the entire web? Filecoin and Storj attempt to solve this with crypto payments, but that introduces complexity and volatility.
How to prevent DHT pollution without centralized moderation? Anyone can publish anything to IPFS. How do you prevent spam, illegal content, or resource exhaustion attacks without a central authority deciding what's allowed?
How to handle updates in content-addressed systems? IPNS and DNSLink provide mutable pointers to immutable content, but they reintroduce some centralization (DNS) or complexity (DHT-based naming). What's the right tradeoff?
How do users discover applications in hash-addressed space? DNS provides human-readable names. Content addressing provides verification but not discovery. We need decentralized naming that's both human-friendly and secure (Namecoin, ENS, Handshake are attempts, each with tradeoffs).
How do P2P systems interact with legal frameworks designed for client-server? Copyright law assumes identifiable servers. Data protection regulations (GDPR) assume controllable databases. How do you comply with "right to be forgotten" when content is replicated across thousands of peers?

If you want to work on solving these problems and shape the future of the human-centric web, as opposed to the corporate-web, join us: https://www.parity.io/careers

Polkadot Builder Party: Introduction to Polkadot Products

Karim Jedda — Fri, 10 Oct 2025 16:25:00 GMT

Cybergov makes history with Polkadot’s first AI on-chain vote

Karim Jedda — Fri, 19 Sep 2025 16:21:00 GMT

We accidentally built the wrong internet

Karim Jedda — Mon, 18 Aug 2025 08:35:17 GMT

Imagine a simple tool for the internet, built from scratch for today's world. One app. One thing you own. It proves who you are and lets you pay for things all in one place. Something only you control. No middlemen. No passwords. No credit cards.

When a website wants to know it's really you, you don't type anything. You just tap "Yes", like unlocking your phone with your face. That tap is a silent, secure confirmation that you're you, but no one learns anything about you, and nothing gets stored or stolen.

When you want to buy something? Same thing. One tap. It's like handing over cash, but digital, no forms, no card numbers to fill out, no companies keeping your payment info forever.

One tool. One tap. Works everywhere. Secure by design. Built for people, not corporations.

Now, let's engage in a thought experiment. Imagine this elegant system doesn't exist and we're gathered in a conference room in the late 1990s, designing the identity layer for the web. A rogue engineer stands up and says: "I've got a better idea."

"Okay, instead of giving users a tool they control, we'll make them rely on a centralized third-party account: an email address, hosted by some company somewhere. Those companies could read your messages, shut you out anytime, and track where you go online. That email will be your username for everything because it's easy to remember and people get to choose how their email looks."

"For proving who you are, we'll use passwords, just some words or symbols people have to remember. But people suck at remembering dozens of strong ones, so they'll reuse the same password everywhere. To fix that, we'll need another app called a password manager, just to survive our own bad idea."

"Is it more secure?" someone asks.

"No. It's worse. Every website will now store millions of passwords or hashes of these in giant databases. Hackers will constantly break in and steal them. These leaks will happen weekly. We'll just accept it as normal and have websites to look up if you were hacked lol."

"Okay... Is it more private?"

"Absolutely not. It's a privacy catastrophe. Every login, every action, is logged in a bunch of disconnected data silos, all tied back to a single corporate owned identifier. The user leaves a trail of digital exhaust everywhere they go: an exhaust that is immensely profitable to collect and sell. Your whole digital life gets chopped up and sold. Your identity becomes a product."

"Surely it's simpler to build and use?"

"Not a chance. To patch the glaring security holes, we'll have to bolt on more components. We'll need a two-factor authentication system that sends codes to a completely separate device. We'll need annoying CAPTCHAs to prove users aren't robots. We'll need convoluted 'forgot password' email loops, which themselves become a prime vector for account takeovers. It's a Rube Goldberg machine of trust, with dozens of failure points: the email provider, the service's database, the password manager, the 2FA app, the SMS gateway…"

He pauses. "And here's the worst part: after all that hassle just to log in… you still can't pay for anything!"

The room is silent.

"That whole system & all that complexity does nothing for buying stuff. To pay, you'll still need to find a physical card, type in 16 numbers, an expiry date, a security code, all into a web form. But don't worry we'll ask everyone to use HTTPS so they feel secure. That info gets then passed through banks, payment processors, networks each taking a cut, each adding risk. And if one of them messes up? Your card gets cloned. Your money's at risk."

"So, what is the upside to this… system?" someone finally asks.

"Well", the engineer says, "it lets people sign up for free stuff they'll never use again without needing money upfront."

That's it. That's the grand benefit. For this single, narrow edge case, we created an insecure, privacy-invasive, and breathtakingly complex architecture that divorced identity from commerce and burdened the entire digital world with friction and risk.

Here's the truth though: the gravity of convenience is the most powerful, irrational force in the world. A better system doesn't win by being better, it wins by being lighter because people will trade a pound of their sovereignty for an ounce of convenience, every single time.

A better system doesn't win by being smarter. It wins by being simpler first. Then better. Then both.

So why are we still stuck with this mess?

The honest answer is that the user experience of the alternative has, until now, been a steep cliff rather than a gentle on-ramp. The ethos of "be your own bank" came with the terrifying corollary of "be your own high-stakes security expert". Managing seed phrases, understanding gas fees, and navigating the unforgiving finality of transactions created a barrier to entry that was simply too high for the mainstream user. The comforting safety net of a "forgot password" link remained preferable to the catastrophic potential of a lost hardware wallet or a misplaced 12-word phrase.

That's terrifying. No wonder most people chose the flawed but familiar: a password, a reset link, a bank that might help if things go wrong.

But that's changing.

The new tools aren't asking you to become a tech expert. They're building the power of ownership into things that feel normal like unlocking your phone with your fingerprint, or approving a payment with a tap. Features like social recovery (letting trusted friends help you regain access), smart wallets (that work like apps, not crypto dashboards), and passkeys (using your phone or face instead of passwords) are making secure, self-owned identity actually easy.

The goal is to make the right thing the easy thing, not make everyone a crypto expert.

In any rational design meeting, that engineer would've been laughed out of the room.

Yet here we are, living in the world he described.

The good news? We don't have to stay here.

Of course, many in the mainstream tech world still see this kind of technology as tainted by its association with hype, scams, or volatile cryptocurrencies. They dismiss it all as "blockchain stuff", a knee-jerk reaction that throws the baby out with the bathwater. But the core idea is about ownership, privacy, and finally giving users a secure digital "self" that works as seamlessly as the rest of the web.

The internet was built without a native way to prove who you are and move value securely. But now, for the first time, that missing piece (a cryptographic signer, for example Polkadot Vault) finally starts working like the rest of the web: simple, fast, and yours.

Web3 without browser extensions: Polkadot shows how

Karim Jedda — Fri, 02 May 2025 15:43:07 GMT

If you're like me, navigating the Web3 ecosystem often means installing a browser extension or two. These wallet extensions are the common gateways, acting as unified interfaces for managing keys and signing transactions needed to interact with dApps. They seem convenient, right?

But peek behind the curtain, especially if you've tried developing one, and you hit a wall of constraints. Getting an extension to work reliably across browsers is a nightmare. Chrome enforces ManifestV3, Firefox has its quirks, Brave demanded a $3M Polkadot treasury grant for integration... the list goes on. But most importantly, forcing users onto specific browsers just to access core functionality flies in the face of the open, accessible ethos Web3 supposedly champions. It's a fragmented mess, and frankly, it's getting worse.

We're building the digital equivalent of "accepted here" stickers in front of stores

Beyond the development headaches, let's talk trust and security. Browser extensions are essentially "rented" space within your browser: they can be disabled or even automatically removed. They can be compromised in numerous ways. Worse, they often require sweeping permissions, potentially accessing content on any website you visit, not just the dApp you're using. (And if you implicitly trust those permission toggles saying "don't grant access to every website"... well, I might have news for you.)

It’s time to stop pretending this model is truly secure or sustainable.

So, what if you could interact with any dApp, securely sign transactions or messages, from any browser, on any device, without installing a single browser extension? What if that familiar extension icon wasn't necessary at all?

Extensions are easy, but trust is expensive

A wallet's primary job isn't checking balances, that's what block explorers and indexers excel at. Its core, non-negotiable function is signing. Don't take it from me:

On why Polkadot.js Signer is the best

This principle is central to the solution: You bring your own signer. The workflow hinges on an application called Polkadot Vault (formerly Parity Signer). It's a mobile app with some neat properties:

It's completely QR-based.
No "blind signing": you see precisely what you're signing on a separate, dedicated device before you approve it.
You don’t need a browser plugin. Just a device with a camera & a screen.
It's "air gapped", meaning that your private keys remain isolated on the Vault device, never directly exposed to the internet or your browser. It requires you to use it with USB debugging turned off and without an internet connection.

Yes, you still install something, but crucially, it's not a piece of code living inside your browser's potentially compromised environment, constantly parsing web pages and exposing sensitive operations to the DOM. It's also usually installed on a secondary phone or mobile device, but it can work on your main device too.

Polkadot Vault leverages Polkadot's unique on-chain metadata capabilities for rich transaction decoding, you can read a bit more about it here. And looking ahead, I can envision Polkadot Vault potentially even being delivered as a Progressive Web App (PWA).

For what follows I'll assume you have a working Polkadot Vault app installed on your device.

The missing link: Polkadot Vault JS Lib

Until recently, the ability to interact with Polkadot Vault via QR codes was somewhat buried within the Polkadot JS extension or lacked clear documentation for direct use.

Recognizing this, and with collaboration from the Nova and Parity teams, the core QR communication logic has been extracted into a straightforward, single-file JavaScript library.

The key here is absolute and complete simplicity. No NPM, no React, no TypeScript dependencies required. Just include the JS file in your HTML header, and you unlock the ability to interact with Polkadot Vault from any web application.

This library enables any web application to communicate directly with the Polkadot Vault mobile app using QR codes.

The process is elegantly simple and secure:

Connect: To connect your account, you scan the QR representing the public key of your account.
Request: Once connected, the dApp uses the library to generate a QR code representing the unsigned transaction or message you want to sign (stake, balance transfer, etc.).
Scan & Sign: You scan this QR code using your Polkadot Vault app.
- The Vault app displays the details for you to verify on your secure device.
- You approve, or not, the signature directly within the Vault app.
Respond: The Vault app generates a new QR code containing the cryptographic signature.
Verify: The web app uses the library and the device's camera to scan the signature QR code and broadcast it to the network.
- The web app verifies the signature and proceeds.
Done!
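To make the data flow in these steps concrete, here is a minimal Python sketch. It is not the library's API and not the cryptography Polkadot Vault uses: the class and function names are hypothetical, the "QR codes" are plain strings, and the signature is a Lamport one-time signature built on SHA-256, chosen only because it fits in a few lines of standard library code. The point it illustrates is that only the public key, the payload, and the signature ever cross the boundary.

```python
import hashlib
import json
import secrets
from typing import List, Optional


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(payload: bytes) -> List[int]:
    """The 256 bits of SHA-256(payload), most significant bit first."""
    digest = _h(payload)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


class ToyVault:
    """Hypothetical stand-in for the offline signer; the secret never leaves it."""

    def __init__(self) -> None:
        # Lamport one-time key: one pair of random secrets per digest bit.
        self._secret = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]

    def account_qr(self) -> str:
        """Connect: expose only the public key (hashes of the secrets)."""
        return json.dumps([[_h(a).hex(), _h(b).hex()] for a, b in self._secret])

    def scan_and_sign(self, request_qr: str, approve: bool) -> Optional[str]:
        """Scan & Sign, then Respond: show the payload, sign only if approved."""
        payload = bytes.fromhex(request_qr)
        print("Vault displays:", payload.decode())
        if not approve:
            return None
        revealed = [self._secret[i][bit] for i, bit in enumerate(_bits(payload))]
        return json.dumps([s.hex() for s in revealed])


def request_qr(payload: bytes) -> str:
    """Request: the web app encodes the unsigned payload for display."""
    return payload.hex()


def verify(account_qr: str, payload: bytes, signature_qr: str) -> bool:
    """Verify: the web app checks the signature against the public key."""
    public = json.loads(account_qr)
    revealed = json.loads(signature_qr)
    return all(
        _h(bytes.fromhex(revealed[i])).hex() == public[i][bit]
        for i, bit in enumerate(_bits(payload))
    )


vault = ToyVault()
account = vault.account_qr()
payload = b"transfer 1 DOT to Alice"
signature = vault.scan_and_sign(request_qr(payload), approve=True)
print("valid:", verify(account, payload, signature))
```

A Lamport key must only ever sign one message, which is one more reason to treat this as an illustration of the flow rather than something to deploy.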

This sounds like a lot, but here's a video showing how fast it works:

Critically, your private keys never leave the secure, air-gapped environment of the Polkadot Vault app. No copying and pasting sensitive data, no passwords needed to unlock an extension mid-session, just scanning QR codes and verifying information on a trusted screen.

If a video is not enough, here it is embedded in this blog post. The library is extremely simple to use and doesn't require a PhD in TypeScript. Try it out for yourself: the demo is interactive and runs fully client side on this page (try it from another phone too if you want):

[Interactive Polkadot Vault demo: connection status, payload to sign, signature result]

Browsers as dumb terminals, not key keepers

The idea is not about using less software. It’s about using the right kind of software, one that doesn’t live inside your browser’s uncontrolled execution environment, at the whims of a business model that must convert your data into money for somebody else.

With the Polkadot Vault approach, you don't grant a website permissions; you subject its requests to scrutiny. You scan what it presents, verify it on your secure device, and only then sign if you approve.

I believe that this QR-code-based approach fundamentally changes the game:

No browser extensions are required. None. Zero. Zip. Use Chrome, Firefox, Safari, Brave, Edge, or your favorite mobile browser. As long as it can access the camera and run modern JavaScript, you're good to go.
The dApp can control the UX end to end: want to display the video feed? Want to make a sound when something is scanned? Up to you as a builder!
Use any JavaScript or Python or Rust library that you want, no copy pasting!
By keeping keys air-gapped on a dedicated signing device (your phone with the Vault app), the attack surface is significantly reduced compared to browser extensions that have access to your web activity.
The scan-sign-scan flow is intuitive and familiar, removing the need for users to manage complex extension permissions or worry about compatibility.

This approach shares similarities with protocols like WalletConnect (now Reown) but operates without the need for persistent pub/sub connections or internet connection for the signing process itself.

Air-gapped signing flows are the SSH of Web3: clean, inspectable, portable, and fundamentally more secure. Signing becomes declarative, not implicit. This method even paves the way for truly password-less authentication systems built on cryptographic signatures.

I'm genuinely excited by this and I can't stop thinking about cool applications. I recognize that for this to succeed, perhaps a different packaging is required... Regardless:

The Future is Open

This library and the underlying Polkadot Vault methodology demonstrate a clear path towards a more accessible, secure, and truly web-native Web3 experience. By leveraging open standards like QR codes and the secure environments of dedicated signing devices (like our phones), we can break free from the fragile dependencies and security risks of browser extensions.

We can build simpler, more robust dApps that work for everyone, everywhere.

The future of Web3 interaction doesn't need to live inside your browser tabs.

Symmetry in Chaos: my first generative NFT collection on Polkadot

Karim Jedda — Sat, 29 Mar 2025 17:51:24 GMT

Ever since I started working in blockchain three years ago, I’ve been captivated by NFTs: Non-Fungible Tokens. At their core, NFTs are digital certificates of ownership and authenticity, irrevocably tied to a unique asset. In simple terms: unlike classical proofs of ownership (for example, a piece of paper or a contract), an NFT is a verifiable, digital, public record, so anyone can independently check who its current owner is. This technology opens doors to quite a few cool things, but it is most commonly used for new forms of digital art, collectibles, and creative experimentation.

Now, there is a lot of stigma and controversy around NFTs in common tech circles, and this article isn't about debating any of that. It is rather about how generative algorithms can turn code into art and how I created an NFT collection inspired by ideas from Paul Bourke's blog post & the book "Symmetry in Chaos, A Search for Pattern in Mathematics, Art, and Nature" by Michael Field and Martin Golubitsky.

Inspiration for this project & the function used for the attractors, credit: https://paulbourke.net/fractals/symmetryinchaos/

My latest project, "attractors", is an NFT collection deployed on the Polkadot blockchain through the Koda.art platform, where each NFT’s design hinges on mathematical chaos & randomness. An NFT collection is essentially a set of NFTs of the same type, capped at a maximum number it can hold. In this case, after 1024 people collect their NFT, no one else can get a new one. 131 people created their NFT.

Here’s how it works: To get an NFT, people are essentially "rolling a digital dice" that assigns a specific cryptographic hash to their account. This hash then acts as a seed for an algorithm that generates an "attractor": a visual pattern derived from a simple mathematical formula. To try it out, visit the mint page and click "Preview variations" (you can't mint anymore though 🤌). An interesting insight is that there is no way to guess the "hash" that generated an attractor given only the image. 😸

Note: This is quite different than usual NFTs which create an image based off of permutations from pre-defined building blocks.

Collection page on the Koda website: https://koda.art/ahp/drops/attractors

This algorithm either converges into a stable, beautiful symmetrical design or collapses into disordered noise. Most hashes produce unstable results, but a rare few unlock beautiful, harmonious patterns, and neither I nor the people "minting" the NFT have any prior control over the generation process.

Some attractors from the collection - only 2 ended up stable

This mirrors the natural tension between chaos and order. Just as strange attractors in mathematics balance unpredictability with hidden structure, the collection becomes a search for those fleeting moments where randomness crystallizes into beauty. Each mint (instance of an NFT) is a gamble, a snapshot of algorithmic serendipity, and a testament to how generative systems can transform cold code into something alive.

Note: Ironically, while working on this project I watched the series "3 Body Problem", which explores a similar problem (stability vs chaos).

If you want to try to create your own generative NFT, I highly recommend the Koda platform to get started, it was relatively straightforward and the team is super helpful. Let me know if you create cool things, I'll be happy to mint.

What I remember from this experiment is that when we submitted the project to the platform, we weren't sure that ANY beautiful attractors would be created, so we made it free. To my surprise, a lot of attractors turned out quite beautiful, which was quite the insight. Here's a selection of my favorite attractors:

Pretty attractors

There definitely is beauty in chaos, you just have to find it, and mint it.

Free software should be simple

Karim Jedda — Fri, 28 Mar 2025 19:37:22 GMT

Imagine two piles of code, both free to use & modify, that roughly do the same thing. How do you pick between the two?

My personal rule is to always use the one that's simpler to run or get started with, even if it has fewer features than the other option. When it comes to software, particularly "free" software, I believe that simplicity has an economic value that is important to understand, quantify, and celebrate.

I'd go so far as to argue that free software's utility is inversely proportional to its operational complexity. In fact, there is free software and there is free software: when software requires excessive infrastructure, specialized tooling, custom API keys or opaque dependencies (10 Docker containers for a UI), it ceases to be "free" in a practical sense. The hidden cost of using the software renders it inaccessible or a burden, de facto undermining its purpose and hindering its chances of being improved and maintained, unless heavily subsidized.

Nerfed machinery, wrapped in the rhetoric of openness

The classical solution proposed, the cure to the symptom, is to have someone else manage the infrastructure for you: this effectively makes the "free but complicated" software a trojan horse for ecosystem lock-in, in other words: complicated free software is free... to rent.

If you're steadfast on wanting to own it, this "bloated" free software will raise the price of adoption and introduce costs that outweigh the software's nominal price of 0€: a lot more engineering effort, maintenance, work required to operate the thing. In short: more time spent on what the software needs to operate instead of time spent on getting what you want done. This simplicity debt is the hidden cost of future labor required to sustain needlessly complex "free" tools.

Complexity externalizes risk, simplicity distributes it.

"Industrial open-source software" often serves as marketing loss leaders for cloud vendors. Kubernetes, while "free", entrenches reliance on AWS/GCP/Azure and knowledge of complicated tools more often than not applied to the wrong problems. This creates a paradox: nominally libre tools become gatekept by infrastructure & knowledge costs.

Industrial free software definitely has its place, don't get me wrong, and is still leagues ahead of proprietary software. I'm a pragmatist and admit that not all problems can be solved with simple software, but the survival of non-simple software often depends on corporate welfare: it is a precarious model. Example: When XCorp abandons an open project, it dies (AngularJS, OpenOffice, Atom, etc). When a lone developer abandons ripgrep, someone else forks it in an afternoon.

For the developer ecosystem to thrive and for our tools to improve, we must incentivize tools that are economically free, not just technically free.

If it's not simple, it isn't really free software

On the other hand, simple free software has a few inherent properties that are quite desirable for things that should just work: sustainability, resilience to institutional decay and broad accessibility. The more likely someone is to run a simple piece of software, the more likely someone is to care about it. Being simple also gives it a better chance of being maintained and of continuing to evolve: simple software has faster iteration cycles. To me, "simple free software" does not mean just access to code, but first and foremost access to agency.

Richard Stallman’s distinction between "free software" (ethical imperative) and "open source" (practical benefits) applies here. Bloat is the "open source-ification" of freedom: a veneer of liberty masking operational bondage. Debates over what’s "too complex" and what is simple are endless, but practical testing cuts through the noise: A mail server requiring 6 docker containers? Pass. A link portfolio manager that just has 1 binary? Looks good to me. A static website requiring me to install the right Node version, install new packages and run a server: no thank you. You get the idea.

A personal representation of things, yours will certainly differ.

The same applies for software and systems I deploy and run. Complexity is a filter.

Of course there is a lot of nuance in the debate, and although I try to apply this point to any software I run and operate, I do not pretend to have the answer to how to build any simple version of software that I might use (case in point: how to make a simple OpenOffice?). For the software I use, I always favor things that I can tinker with in a simple way, or a way that I can remember how it works when I need it again 6 months from now.

Simplicity is a public good

All this to say that my general recommendation is to consider simplicity an economic necessity in the software you pick, especially given how easy it is to build software today. Nobody needs a CRUD app that requires downloading half the internet to run. Nobody needs a server when an HTML file will do.

In order of importance

A program that cannot be installed, understood, or modified by a single person is a house with invisible locks on every door. You may own the blueprint, but you’ll never live inside. This is not an abstract critique, it’s the daily reality for users trapped in ecosystems of orchestrated complexity.

The bottom line: Free software must prioritize agency over features, resilience over scale, and sovereignty over convenience.

The way forward is simple. Literally :)

Experimenting in Web3: Creating a Culture of Experimentation on Polkadot

Karim Jedda — Wed, 19 Feb 2025 16:16:00 GMT

The case for offline mobile signers for crypto transactions

Karim Jedda — Sun, 12 Jan 2025 15:56:46 GMT

Navigating the crypto ecosystem is quite a ride and is riddled with numerous challenges and roadblocks, starting with how to acquire cryptocurrency and spanning to how to use it securely without losing it.

There is currently a plethora of blockchains, and with each one comes a different standard of dealing with the challenges above. Naturally we end up in a situation where there are multiple wallets, with different interfaces and different ways to integrate them.

Source: https://xkcd.com/927/

Despite this fragmentation, wallets end up being the means by which most people interact with blockchains, more specifically, the blockchains that the specific wallet they chose supports. To support more blockchains, you'll have to install multiple wallets, increasing the amount of things you need to be mindful of to avoid problems, thus increasing friction.

The problems with current wallets

Wallets come in numerous shapes and forms but usually either a mobile app or a browser extension (or both), each with multiple tradeoffs:

Extensions bring with them massive security risks, both for your crypto keys and for your regular browsing history (extensions have read access to everything you browse)
Mobile apps come with the problem that they usually don't work on mobile websites, which completely beats their purpose. (unless there is some trick in place, like running a browser within the app, or routing communication between the app and the mobile window through a backend, etc).

Wallets are what I like to call "asset first". They will show you the current balance of your aggregated assets across your different crypto accounts. Their focus is usually on simulating a "you are your own bank" feeling. This makes it difficult to create one wallet to rule them all because different blockchains, protocols, and dApps (decentralized applications) have varying standards, transaction types, and custom behaviors. Wallets often need to integrate deeply with blockchain-specific quirks, making them siloed and specialized for certain ecosystems. As a result, cross-chain compatibility becomes a complex problem that forces users to juggle multiple wallets or interfaces, which dilutes the seamless "bank-like" experience they aim to provide.

Additionally, the wallet approach, and how the wallet is implemented, also gives the specific blockchain more leverage in capturing a user base that ends up locked into that specific crypto ecosystem: if the wallet doesn't implement a way to execute a specific action like bridging assets to a different chain, most users just won't do it, and a competing wallet is very unlikely to capture the initial user base, thanks to laziness.

To be completely fair though, wallets end up like this because the whole point of having a wallet is to simplify some things, at the price of other things: it's all about the user experience.

This article is about how I prefer offline mobile signers for crypto transactions instead of wallets.

Mobile signers as a better paradigm

The differences between a wallet and what I call a "mobile signer" in this post are the following:

A mobile signer approach is "action first": it is built around making deliberate and secure interactions, vs the "asset first" approach of wallets, which is more focused on storing and sending value. This difference deeply influences how they function and what they prioritize.
Mobile signers are just tools to manage cryptographic keys and provide secure ways to use them for authorization and authentication. They do not depend on external APIs or services to exist or to function: they could even work completely offline.
Mobile signers are clear-signing first. Clear signing means that the mobile signer tells you exactly what you are about to do before you use a key to sign a transaction.

A mobile signer is simply a tool to interact securely with blockchains, better aligned with the practical needs of users. Wallets don’t actually store coins or assets, they are abstractions built around cryptographic keys. However, their "asset-first" design focus creates a fundamentally different user experience compared to mobile signers, whose goal is to act as a clear and secure gateway for blockchain interactions.

In a nutshell, wallets frame everything around "managing value", whereas mobile signers re-imagine the experience to prioritize action, clarity and security.

Security without compromising UX

Naturally, the first objection that can be raised is that a mobile signer is only for experts and not for the majority of people. But I believe that a mobile signer is only as good as it is simple to use securely, meaning that using a good mobile signer makes doing the wrong thing very difficult.

Features of a signer that I'd love to use:

Mobile-only (no Ledger or hardware device)
Simple login capabilities for dApps
Companion dApp for viewing balances and portfolio (keeping heavy logic outside the signer)
QR code-based transaction flow:
- Scan transaction QR from dApp
- See clear action description on mobile
- Sign and return via QR code
Optional NFC support for devices that prefer it

The closest implementation I know of right now is the Polkadot Vault, which nails a lot of these concepts. But there's more to consider here.

The benefits of mobile signers

The beauty of mobile signers is that they're fundamentally simpler than traditional wallets. They don't try to be your crypto bank, they just handle the critical security bits really well. This separation means you can have rock solid security (your keys never touch the internet) while still getting a great user experience through companion apps or websites that handle all the fancy UI stuff.

Think about it: when you're signing a transaction, do you really need to see your entire portfolio? Or do you just need to clearly understand what you're about to do? Mobile signers cut through the noise and show you exactly what matters: "You're about to send 0.1 ETH to this address" or "You're approving this contract to spend your tokens". No gibberish, no confusion.

This approach also solves the multi-chain mess we're in. Since a mobile signer focuses on cryptographic operations rather than chain-specific features, one tool can work across any blockchain. No more juggling five different wallet apps.

But there are some hurdles to overcome. We need better standards for how dApps communicate transaction intents to signers. The QR code flow works, but it could be smoother. And we need more dApps to support this approach because right now it's still easier to just tell users "install MetaMask" than to implement a proper signing flow. Polkadot is a crypto ecosystem where I believe this to be completely solved: you get clear signing technically for free thanks to the metadata.

Looking ahead, I see mobile signers becoming the default way people interact with blockchains. Not because they're more secure (though they are), but because they're actually simpler. They do one thing - signing - and they do it well. Everything else can live in whatever interface makes the most sense for that specific use case.

For this to happen, we need:

Better standards for representing transaction intents
More dApps supporting QR-based signing flows
Improved backup and recovery solutions that don't compromise security

The tools are already here, but we just need to start building with this paradigm in mind. If you're a developer, consider how you might support mobile signing in your dApp. If you're a user, try out tools like Polkadot Vault and push for better signing support in the apps you use. The sooner we move away from the "wallet as everything" model, the better off we'll be.

And I might just build this thing myself, although I'm happy to support any initiative that aims to do so.

LLMs in the middle: Content aware client-side filtering

Karim Jedda — Sat, 09 Dec 2023 18:21:59 GMT

Lately I've been playing with large language models like many other developers out there. I became so interested in the technology that I even built myself a rig with a relatively strong GPU that I have sitting on my desk, serving multiple text, image & audio models.

While most of the focus of current developments goes in the direction of "generative" AI, meaning AI used to generate stuff, it can similarly be used to classify content under specific conditions.

One of the most interesting applications I've found of this, which prompted me to really commit to this "hobby", is to use language models to filter out content for me while I browse the web.

Browsing the web today

Most of the interactions on the web today are controlled and brokered by the entities hosting the content that others produce: YouTube, Twitter/X, TikTok etc. In the majority of cases, you get to see a piece of content either because an algorithm recommended it to you or if you searched for it explicitly.

These entities being for-profit companies, it isn't a surprise that these algorithms are designed for purposes that largely benefit the companies, amongst which keeping you engaged on the platform as much as possible might be very high up on the stack, the goal being to create an environment for effective advertising.

Right now the web looks like this:

You mainly get to see content that's aligned with the company's objectives first, not yours

Some platforms allow you to filter out some things that you might not want to see, essentially by blocking or muting users whose content you're not interested in, or by creating word filters that hide things if a specific word is present in them. There is no unified way of doing this across platforms and, most importantly, you mostly cannot block ads.

To block ads, you will need to use third-party tools like ad-blockers or pay premium accounts here and there. These ad-blockers are deterministic and depend on the page structure, meaning, they block specific scripts & pieces of content depending on where they are on the website.

This is the best you can do:

Block block block.

However, providing platforms the data about whom and what you block, as well as showing that you're blocking ads, is another data point that is collected on their side and can be made use of and acted on. As a data nerd I can't help but see data everywhere, and these data points are very valuable.

So the best deal we're getting is this:

This setup is required and different for every website/service

Using local large language models you can flip all of this on its head:

Content aware browser based filtering using local LLMs, battle of the bots

Sifting through the darkness - LLM style

For the South Park fans reading this, you might remember that one episode where Butters is tasked with editing Cartman's social media report: reading all the social media comments that Cartman is getting on his socials, and showing him only the positive ones. The episode is quite interesting and I won't spoil it; for anyone who wants to watch it, it's Season 19, Episode 5.

I created a similar bare bones, LLM based version of the service that Butters was providing not for my meager social media presence, but more for the content that I was recommended in my different feeds on multiple social media websites.

It works by combining rule-based filters (user a, user b, etc.) and textual filters: passing the content of the "feed item" into an LLM, asking the LLM to classify it, and then applying a rule-based filter on the LLM's response. This thing is deployed as a browser extension that I've developed myself and that interfaces with my LLM server (the machine I built myself).

Concretely, this is how it works:

Input: A list of websites and identifiers of single "feed items"
Input: A local list of user filters (list of users & reasons for block, etc)
For every feed item on each website in the active window
- Apply the rule filters
- Apply the LLM based filter (send content to LLM for classification + parse response)
- If we have a match, display a black box over the content

This is how it looks, roughly:

Twitter/X on the right, same filters apply on other platforms too

In its current bare bones version, it's working relatively okay. It can be massively improved by adding contextual info like:

how many times you saw this post,
how many times this specific topic was presented to you
etc

The cool thing with this is, that now YOU are the one collecting all this fancy-schmancy data that you'll be able to put to good use.

Now, this is a browser extension, and it can be detected. But this type of thing can be built without much overhead as an OS-level application, which prevents client-side scripts from detecting any DOM changes, at which point it's game over.

Leveraging the obsession around "AI Alignment"

One of the hot topics in AI is called "AI Alignment", from my naïve understanding it's an attempt at making sure that an AI behaves and doesn't do bad things: like you'd train a dog not to bite (silly comparison, I know).

Incidentally, aligned models make for the best filters for this kind of content-aware blocker, since they already have some bias towards not complying and categorizing something as "bad" right out of the box.

Where it might get tricky is when the AI's alignment and your personal alignment (beliefs?) are not aligned, in which case you run the risk of false positives, meaning not seeing pieces of content that would actually be interesting to you.

In this case, you might need to "fine tune" the model to your own beliefs and have it adapt to what you like and what you don't like.

Now, all of this is a small experiment I've put in place to keep my sanity and not leave my "brain space" at the whims of algorithms geared to get me and keep me engaged (addicted?) to this or that platform.

It's funny to see how people are afraid of AI taking over the world when much simpler algorithms already did, to some extent.

The future

As the models get smaller and more efficient, especially with the latest release of very performant 3B models from Stability AI, these sorts of things will be absolutely trivial to implement and I expect them to be everywhere, even on mobile phones (which I haven't figured out how to do yet). I would much rather delegate the decision of what I should or shouldn't see to an algorithm I personally control than to an obscure algorithm tuned to monopolize and bank on my attention.

On a broader philosophical level, I like to think like this: Your brain is who you are. If you don't consciously shield it from what gets in it, you just might be the one being aligned, with ads, shitty content or worse.

Why many smaller bets might beat a single big one

Karim Jedda — Sun, 10 Sep 2023 18:56:18 GMT

I've been toying with the idea of creating my own business for a long while now. I have a lot of ideas and even more things that interest me that could be turned into businesses. What I lack, however, is time and perhaps focus to follow up on each of these ideas. As soon as I write down the code to solve the problem, I reflect on what it would mean to take this kernel of an idea and turn it into something that is profitable or sustainable, after which I usually abandon the project because the uncertainty is too high.

With many of these ideas implemented, code ready and all, it's indeed tough to select which project to allocate more time to, and work slowly towards profitability. This becomes especially relevant when you understand that many projects, no matter how good they are, are destined to fail for a flurry of reasons.

It's estimated that around 80% to 90% of startups fail for various reasons. This means, one needs to prioritize time and money allocation properly, regardless of how cool the idea might be.

The natural question then a solo "founder" might face is: Should I work on one big project or on many smaller projects to have chances at monetary success?

If we take the example of "Levels.io", he started his ventures by deciding just to churn out ideas, and launched a "12 startups in 12 months" challenge, if I remember correctly, instead of launching 1 startup in 12 months, so the idea might not be that far-fetched.

In this blog post, I'll try to explore potential answers to this question using statistics, powered by a tool I recently discovered called Squiggle.

The article is extremely simplified. The point is to showcase some of the technical capabilities of Squiggle as well as integrate it into a blog post.

You can of course adapt Squiggle to your use case and situation. I have a much more detailed model that I use for financial planning over 12 months, 24 months, etc., but that one is private. I also recommend you reach out and join the Squiggle Discord (at the bottom of the page). The community is extremely helpful and helped me with embedding the charts in this blog.

Please take the following with a grain of salt.

The odds of success for a single big project

First, let's consider the success rate of a single big project. According to this website, it stands between 10% and 30%. Let's take that at face value and not dig any deeper.

To model this in Squiggle, we'll use the following:

// Assuming the big project has a success chance ranging from 0.1 to 0.3
bigProjectSuccess = 0.1 to 0.3

Squiggle provides visualization capabilities, embedded in this page, to display the distribution, which in this case is a log-normal distribution (takes me back to my Math years):

In simple terms, this means that on average, the success rate for a big project is around 18%, with a standard deviation of 6%, which is honestly very optimistic, but let's roll with it for now.
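The 18% and 6% figures can be re-derived by hand. The sketch below is in Python rather than Squiggle and assumes that `0.1 to 0.3` means a log-normal distribution whose 5th and 95th percentiles are 0.1 and 0.3, which is my understanding of Squiggle's `to` operator. The function names are mine, not part of Squiggle.

```python
import math

# Standard normal quantile at 0.95, used to map a 5%-95% interval to sigma.
Z_95 = 1.6448536269514722


def lognormal_from_interval(low, high):
    """Return (mu, sigma) of a log-normal whose 5th/95th percentiles are low/high."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z_95)
    return mu, sigma


def lognormal_mean_std(mu, sigma):
    """Closed-form mean and standard deviation of a log-normal distribution."""
    mean = math.exp(mu + sigma ** 2 / 2)
    std = mean * math.sqrt(math.exp(sigma ** 2) - 1)
    return mean, std


mu, sigma = lognormal_from_interval(0.1, 0.3)
mean, std = lognormal_mean_std(mu, sigma)
print(f"mean={mean:.3f}, std={std:.3f}")
```

Under that assumption the mean comes out at roughly 0.18 and the standard deviation at roughly 0.06, in line with what the Squiggle chart reports.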

Combined odds of multiple smaller projects

Let's now do the same for multiple smaller projects. Smaller projects require less effort and might have lower chances of success. We'll go with these parameters:

// Assuming three smaller projects with different chances of success
smallProject1Success = 0.05 to 0.1 
smallProject2Success = 0.05 to 0.3 
smallProject3Success = 0.05 to 0.2

The numbers above give us the individual chances of success of the three smaller projects. Note that these chances are lower than for the big project, and that we assume the projects can be done "at the same time".

To calculate the chance that at least one of those projects succeeds, we first compute the chance that all of them fail (treating the projects as independent of each other) and then take the complement:

// Calculating the combined failure rate of all smaller projects
combinedFailureRate = (1 - smallProject1Success) * (1 - smallProject2Success) * (1 - smallProject3Success)

// At least one of the small projects is successful
smallProjectsSuccess = 1 - combinedFailureRate

The above is valid Squiggle code that we can write as is, and Squiggle will provide all the intermediate results (not shown here) as well as the target plot of the probabilities for smallProjectsSuccess:

Looking at the graph, you can see that the combined chance of at least one of the smaller projects succeeding comes to a whopping 29%, with a standard deviation of 8%. This is noticeably higher than the 18% for a single big project.
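The combined figure can be cross-checked outside Squiggle with a small Monte Carlo simulation. This is a minimal Python sketch, not what Squiggle does internally. It assumes each `low to high` range is a log-normal with 5th and 95th percentiles at `low` and `high`, treats the three projects as independent, and clips sampled probabilities at 1. The helper names are hypothetical.

```python
import math
import random
import statistics

# Standard normal quantile at 0.95, used to map a 5%-95% interval to sigma.
Z_95 = 1.6448536269514722


def sample_interval(rng, low, high):
    """Draw one success probability from a log-normal with 5%/95% at low/high."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z_95)
    # A log-normal has a thin tail above 1, so clip to keep a valid probability.
    return min(rng.lognormvariate(mu, sigma), 1.0)


def simulate_at_least_one(intervals, n_samples=100_000, seed=0):
    """Sample the chance that at least one independent project succeeds."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        all_fail = 1.0
        for low, high in intervals:
            all_fail *= 1.0 - sample_interval(rng, low, high)
        results.append(1.0 - all_fail)
    return results


small_projects = [(0.05, 0.1), (0.05, 0.3), (0.05, 0.2)]
samples = simulate_at_least_one(small_projects)
print(f"mean={statistics.fmean(samples):.3f}, std={statistics.pstdev(samples):.3f}")
```

With these assumptions the simulated mean lands close to 0.29 and the standard deviation close to 0.08. Changing `small_projects` is enough to try other ranges or a different number of bets.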

While this is extremely simplified and you shouldn't jump to conclusions for a flurry of reasons (compounding, feedback loops, etc.), it might justify the first intuition of not putting "all the eggs in one basket", so to speak, and proceeding with smaller bets. This article explains it pretty well too.

It's of course also worth mentioning that smaller projects might have smaller returns than bigger projects. This can be balanced by the fact that a successful small project might provide a confidence boost that a bigger, longer project might not yield right away (or before money runs out). On the other hand, fragmented attention might be a negative, which would then speak in favor of one big project. It's all about assumptions and your personal situation.

You can try the above on your own, with your own assumptions, on the Squiggle playground. An interesting idea to explore would be to factor in founder experience (as a multiplicative factor), the time available for working on the project, and so on. Squiggle enables you to make such smaller assumptions and multiply probabilities together quickly.

This approach is the reverse of taking data and trying to compute the functions that fit it best (classical ML): it starts from small, rough assumptions and builds up knowledge (with confidence intervals). Check out more models here.
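As an illustration of the multiplicative-factor idea, here is a minimal Python sketch. It is not Squiggle code, and the experience range of 1.0 to 1.5 is an invented placeholder rather than a measured value. It reuses the assumption that a `low to high` range is a log-normal with 5th and 95th percentiles at `low` and `high`.

```python
import math
import random
import statistics

# Standard normal quantile at 0.95, used to map a 5%-95% interval to sigma.
Z_95 = 1.6448536269514722


def sample_interval(rng, low, high):
    """Draw one value from a log-normal with 5%/95% percentiles at low/high."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z_95)
    return rng.lognormvariate(mu, sigma)


def adjusted_success_samples(base, factor, n_samples=100_000, seed=0):
    """Multiply a base success rate by an uncertain factor, clipped to [0, 1]."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        p = sample_interval(rng, *base) * sample_interval(rng, *factor)
        samples.append(min(p, 1.0))
    return samples


# Hypothetical inputs: the big-project range from the post and an assumed
# experience multiplier between 1.0 and 1.5.
samples = adjusted_success_samples(base=(0.1, 0.3), factor=(1.0, 1.5))
print(f"adjusted mean={statistics.fmean(samples):.3f}")
```

The clipping matters once the factors get large, since a product of optimistic multipliers can otherwise push a "probability" above 1.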

I just love it.

Conclusion

While this was a very simplified example, aimed at showcasing the power and flexibility of Squiggle, the model does seem to suggest that diversifying efforts across multiple smaller projects can noticeably increase the chances that at least one of them succeeds.

It sure might be very tempting to work on a single, game-changing project, but the numbers advocate for a more diversified approach, especially for first-time founders or founders in countries with little access to venture capital to pad any runway. When outcomes are uncertain, and sometimes even random, quantity might just trump quality.

As with all ventures, it's essential to balance passion with pragmatism and let the numbers and the data guide the way.