Our insider-detector lied to us about its top wallet. Here's the autopsy.
An investigation we almost published would have called a wallet a "crypto insider with 89% win rate and p = 0.00000000000018." The raw trades told a different story — 166,746 of them. This is how we caught it.
By CrowdIntel
Last week our scorer flagged what looked like the cleanest insider story we'd seen. A single wallet. 90 resolved bets. An 89% win rate. A p-value of 1.767475e-13 — by any normal definition of "could this be luck," it could not be luck. The investigation row was already written. The blog post was about to follow. We wrote the headline first:
"He placed 90 bets. He won 89% of them. He walked away with $2.27 million."
Then we tried to verify it. And the wallet stopped looking like a 90-bet insider.
What the investigation said
Investigation #147 in our database described the wallet 0x0b9cae...87e44 — public Polymarket handle geniusMC — like this, in our own words:
Single wallet with 89% win rate across 90 resolved bets. 4 trades in crypto markets. Total volume: $4.5M. Profit: $349K. Funded by 0x9642…d079. Classified as whale. Probability of this performance by chance: 0.0000%.
Three things to flag before we keep going:
- Category: "crypto." Remember that.
- Profit: $349K. Also remember that: the headline we were about to write said $2.27M.
- 90 resolved bets. Out of "226 total bets," per our wallet_stats table.
Each of those three numbers is wrong, in a different way, and the way they're wrong tells you something about how production data pipelines drift.
The first crack: two PnL columns, one wallet, six-times disagreement
We store wallet profit in two columns: total_pnl (on-chain derived from trade fills resolved against market_outcomes) and polymarket_pnl (mirrored from Polymarket's own profit endpoint). Per our internal source-of-truth rule, polymarket_pnl wins when both exist.
For geniusMC:
| Column | Value |
|---|---|
| total_pnl (on-chain) | $349,035 |
| polymarket_pnl (Polymarket API) | $2,265,410 |
| total_volume (on-chain) | $4,517,510 |
| polymarket_volume (Polymarket API) | $97,904,700 |
The two PnL figures disagree by 6.5×. The two volume figures disagree by 22×. Both PnL columns sit next to their own volume column from the same source, so the gap can't be a column-naming mistake on our end.
The thing that broke the tie was a sanity check on polymarket_volume. The investigation's stored top_markets lists the wallet's six biggest markets by volume. The single largest is the Queens-vs-Purdue college basketball game — total volume $394,219. The next is the Florida Panthers Stanley Cup market at $309,373. We summed all six: about $1.16M.
The wallet has 226 markets in wallet_stats. If the six biggest sum to $1.16M and the long tail is small bets, nothing about that picture reaches $97.9M of volume. To get there, all 226 markets would have to average about $433K each, which is more than the single largest market ($394K). The polymarket_volume column is somewhere between corrupt and measuring the wrong unit (shares, not dollars; lifetime across linked accounts; some scaling bug upstream). We don't yet know which.
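The arithmetic behind that sanity check is short enough to write down. This is a minimal sketch using only the figures quoted in this post; the variable names are ours for illustration and are not the pipeline's actual code.

```python
# Figures copied from the tables in this post; names are illustrative.
total_pnl = 349_035              # on-chain
polymarket_pnl = 2_265_410       # Polymarket API
total_volume = 4_517_510         # on-chain
polymarket_volume = 97_904_700   # Polymarket API
markets_in_snapshot = 226

# Top six markets by volume (the last four are rounded to the nearest $1K in the post).
top_six = [394_219, 309_373, 213_000, 95_000, 76_000, 75_000]

pnl_ratio = polymarket_pnl / total_pnl
volume_ratio = polymarket_volume / total_volume

# Average per-market volume the snapshot's 226 markets would need to reach $97.9M.
required_avg = polymarket_volume / markets_in_snapshot

print(f"PnL disagreement:     {pnl_ratio:.1f}x")
print(f"Volume disagreement:  {volume_ratio:.1f}x")
print(f"Top six markets sum:  ${sum(top_six):,}")
print(f"Needed per market:    ${required_avg:,.0f}")
print(f"Exceeds the largest market: {required_avg > max(top_six)}")
```

If the required average is larger than the biggest market on record, the reported total cannot be a dollar volume over those markets, whatever else it is measuring.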
Once polymarket_volume is unreliable, the sibling column polymarket_pnl — drawn from the same source — has to be downgraded too. The $2.27M headline died right there.
The second crack: 90 bets, or 166,746?
A 6× PnL disagreement should already kill a post. The more interesting failure mode came when we asked a question we should have asked first: how many trades has 0x0b9cae…87e44 actually placed?
Our wallet_stats table said 226. Our raw trades table — the on-chain truth, one row per fill — said:
| Metric | wallet_stats says | trades table says |
|---|---|---|
| Total bets / trades | 226 | 166,746 |
| Distinct markets | 226 | 1,101 |
| Bet value | $4,517,510 | $40,302,200 |
| Largest single bet | (not stored) | $2,295,960 |
| Active from / to | (not stored) | Feb 5 2025 → Apr 25 2026 |
wallet_stats isn't a separate source of truth; it's a periodically rebuilt aggregate over trades. For this wallet, the snapshot is stale by a factor of ~740× on trade count and ~9× on bet value. The investigation's "90 resolved bets" is computed against the stale 226-trade snapshot, so the entire statistical framing (90 bets, 88.9%, p = 1.8e-13) applies to a wallet that no longer exists. The wallet in the raw trades table has bet across 1,101 distinct markets, nearly five times the 226 the snapshot knows about.
A p-value computed over an unrepresentative sliver of a wallet's actual trading is not a p-value in the sense readers will interpret it. It's a statement about ninety bets we happened to have resolved metadata for, in a universe of 166,746 trades, and there's no reason to believe those ninety are independent of the rest.
The third crack: the wallet isn't even in the right category
Our investigation labeled the wallet crypto. The investigation's own stored top_markets are:
| Rank | Market | Volume | Wallet's pick |
|---|---|---|---|
| 1 | Queens (NC) vs Purdue Boilermakers (CBB) | $394K | Purdue |
| 2 | Will the Florida Panthers win the 2026 NHL Stanley Cup? | $309K | No |
| 3 | Everton vs Manchester United: O/U 3.5 | $213K | Under |
| 4 | Will Arsenal win the 2025–26 EPL? | $95K | Yes |
| 5 | Will Swansea win on 2026-02-24? | $76K | No |
| 6 | NBA: Heat -8.5 vs Pacers | $75K | Pacers |
College basketball. Hockey. English Premier League. NBA. Six markets, zero of them crypto.
We broke down the wallet's full categorized trade volume:
| Category | Trades | Bet value |
|---|---|---|
| (uncategorized) | 165,059 | $36.29M |
| sports | 1,677 | $3.86M |
| crypto | 4 | $80,982 |
| other | 3 | $77,448 |
| Other | 1 | $404 |
| Sports | 2 | $0 |
Of the trades that do carry a category label, 99.5% are sports. The "crypto" label on investigation #147 was assigned based on four trades — 0.24% of the categorized footprint. The remaining 165,059 trades carry no category at all because our enrichment cursor hasn't caught them yet, but the categorized sample is unambiguous.
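The shares quoted above can be reproduced from the table. The sketch below is illustrative and makes one assumption the post does not confirm: that label variants differing only in case ("sports" and "Sports", "other" and "Other") belong to the same category and should be merged before computing shares.

```python
from collections import Counter

# (label, trade count) pairs copied from the table; uncategorized rows are left out.
labelled_trades = [
    ("sports", 1_677),
    ("crypto", 4),
    ("other", 3),
    ("Other", 1),
    ("Sports", 2),
]


def category_shares(rows):
    """Merge labels case-insensitively (an assumption) and return each share of trades."""
    counts = Counter()
    for label, n in rows:
        counts[label.strip().lower()] += n
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


shares = category_shares(labelled_trades)
for label, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{label:<8}{share:.2%}")
```

Normalizing labels first matters for a check like this: without it, the same category is split across several buckets and the majority looks smaller than it is.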
This wallet isn't a crypto insider. It's the busiest sports operation in our dataset.
Where the false story actually came from
We can reconstruct it. Sometime in early 2025, wallet_stats was rebuilt and captured geniusMC at ~226 trades. At that point in time, four of those trades happened to be crypto markets (out of 226 they're 1.8% — close enough to a "crypto-eligible" wallet to win the category coin flip). Of the ninety that had resolved, eighty had won. Eighty out of ninety against a no-skill null is genuinely astronomical — p = 1.77e-13 is correct math.
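For readers who want to see what a "no-skill null" calculation looks like, here is a minimal sketch of a one-sided exact binomial test. The post does not say which baseline win probability the scorer uses or whether its test is one-sided, so `p0 = 0.5` and the one-sided tail are both assumptions here, and the result need not match the scorer's 1.77e-13. The calculation also assumes the bets are independent, which is exactly what a stale, partial sample cannot guarantee.

```python
from math import comb


def binomial_tail(wins: int, bets: int, p0: float) -> float:
    """P(X >= wins) for X ~ Binomial(bets, p0): a one-sided exact test."""
    return sum(
        comb(bets, k) * p0**k * (1 - p0) ** (bets - k)
        for k in range(wins, bets + 1)
    )


# 80 wins out of 90 resolved bets, against an assumed fair-coin baseline.
p_fair_coin = binomial_tail(80, 90, 0.5)
print(f"p under p0=0.5: {p_fair_coin:.3e}")

# The same record against a more generous baseline gives a larger p-value,
# which is why the choice of p0 has to be stated alongside the number.
print(f"p under p0=0.6: {binomial_tail(80, 90, 0.6):.3e}")
```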
Then geniusMC kept trading. And kept trading. And kept trading. For fourteen months. It opened positions across 1,101 markets, almost all of them sports, totaling $40M in bet value. The wallet_stats row was never refreshed against the new reality. Investigation #147 was generated on top of the stale row and inherited every distortion: the wrong category, the wrong sample size, the wrong PnL.
The original 88.9% number wasn't even computed wrong. It was computed correctly, over a sample that has since been swamped by a year of trading we never re-counted.
What we changed
Three things, before anything ships:
- Two-source PnL gate. No investigation publishes if total_pnl and polymarket_pnl disagree by more than 1.5×, or if either is missing while the wallet has more than 50 resolved bets. The 6× geniusMC gap would have been the first thing flagged.
- Stale-snapshot guard. Any candidate where the raw trades count is more than 3× the wallet_stats count gets rejected from the publish queue and requeued for a stat rebuild. The 740× drift would have failed instantly.
- Category re-derivation at publish time. Categories are recomputed from the most recent 200 categorized trades, not from the snapshot's stored category field. A wallet whose categorized trades are 99% sports doesn't get to publish as "crypto" no matter what the snapshot says.
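The three gates above can be sketched as plain functions. This is an illustration of the rules as stated, not our production code: the `Candidate` type, the function names, the handling of zero or opposite-sign PnL, and the case-insensitive label merge are all assumptions, and the `recent_categories` list in the example is made up to resemble a mostly-sports wallet.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    total_pnl: Optional[float]        # on-chain PnL, None if missing
    polymarket_pnl: Optional[float]   # Polymarket-reported PnL, None if missing
    resolved_bets: int
    trades_count: int                 # rows in the raw trades table
    snapshot_count: int               # trade count stored in wallet_stats
    recent_categories: List[str]      # labels of categorized trades, newest first


def pnl_gate_ok(c: Candidate, max_ratio: float = 1.5, min_resolved: int = 50) -> bool:
    """Two-source PnL gate: both sources present and within max_ratio of each other."""
    if c.total_pnl is None or c.polymarket_pnl is None:
        # A missing source only blocks wallets with more than min_resolved resolved bets.
        return c.resolved_bets <= min_resolved
    if c.total_pnl * c.polymarket_pnl <= 0:
        # Assumption: opposite signs, or exactly one zero, count as disagreement.
        return c.total_pnl == c.polymarket_pnl
    small, large = sorted((abs(c.total_pnl), abs(c.polymarket_pnl)))
    return large / small <= max_ratio


def snapshot_ok(c: Candidate, max_drift: float = 3.0) -> bool:
    """Stale-snapshot guard: raw trades may be at most max_drift times the snapshot count."""
    return c.trades_count <= max_drift * c.snapshot_count


def derive_category(labels: List[str], window: int = 200) -> Optional[str]:
    """Most common label among the most recent `window` categorized trades."""
    counts = Counter(label.strip().lower() for label in labels[:window])
    return counts.most_common(1)[0][0] if counts else None


def publish_blockers(c: Candidate) -> List[str]:
    """Return the names of the gates this candidate fails (empty list means publishable)."""
    blockers = []
    if not pnl_gate_ok(c):
        blockers.append("pnl_disagreement")
    if not snapshot_ok(c):
        blockers.append("stale_snapshot")
    return blockers


# geniusMC with the figures from this post; recent_categories is illustrative only.
genius = Candidate(349_035, 2_265_410, 90, 166_746, 226, ["sports"] * 199 + ["crypto"])
print(publish_blockers(genius))
print(derive_category(genius.recent_categories))
```

Returning a list of failed gates rather than a single boolean keeps the reason for a rejection attached to the candidate, which makes it easier to requeue a wallet for the right kind of rebuild.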
The geniusMC investigation row has been marked do_not_publish with a note pointing back to this post.
What this post does not prove
It doesn't prove geniusMC isn't skilled. A wallet that opens 166,746 trades across 1,101 markets without going bankrupt is doing something interesting. It might be a sports modelling shop running through Polymarket. It might be a market-making operation. It might be one person with a Bloomberg terminal and no sleep schedule. CrowdIntel doesn't know yet, because we don't have a clean enough recent sample to compute a fresh win rate against. We're rebuilding the stats. When the new numbers settle, we'll come back and tell you what geniusMC actually is.
What this post proves is narrower: that our scoring pipeline can confidently produce a publishable-looking insider story out of stale data, and that catching it took a manual cross-check no part of the automated pipeline performed.
That's the failure we're publishing. The fix is in the queue.