How CrowdIntel detects Polymarket insiders
A narrative tour of the detection stack — what we look at, what triggers an investigation, and why specific parameters stay private. Companion to the methodology page.
By CrowdIntel
Every trade on Polymarket leaves a signature on the Polygon blockchain. Every wallet has a history. Every funding transfer is permanent. CrowdIntel reads those signatures at scale and publishes investigations when the statistics clear a high bar.
This is the narrative version. For the formal overview see /methodology. For the product terms see /glossary.
What the problem looks like
Imagine you're trying to answer one question: "Is there insider activity on this Polymarket market?"
The first thing you'd want is: who is betting on this market? Polymarket shows you holder sizes — useful, but not enough. You don't know who's behind any wallet. A wallet with $500K on YES could be a staffer with inside information, a sharp trader who read the right Telegram channel, or someone who's going to lose $500K.
What you actually want is: the track record of every wallet on this market, the funding trail that reveals which wallets are the same operator, and a statistical filter that separates "lucky" from "skilled" from "informed." None of that exists natively in Polymarket. CrowdIntel builds it.
The four signal families
Every Polymarket trade gets scored against four broad signal families. Specific signals and thresholds are proprietary — publishing them would teach adversaries exactly how to stay under our radar. The categories are:
1. Trade shape
What does the trade itself look like? Size relative to the market's baseline, timing relative to news and other trades, odds at the moment of entry. A big bet placed just before a market-moving announcement, at a long-odds price, is a different event than the same bet placed after the news, at near-certain odds.
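As a rough illustration of what "trade shape" could mean in code, the sketch below turns one trade into three raw features: size relative to the market's baseline, entry price, and timing relative to a known announcement. The `Trade` fields, the function names, and the choice of the median as the baseline are assumptions made for this example. They are not CrowdIntel's actual signals, which stay private.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Trade:
    size_usd: float
    entry_price: float          # price paid for the outcome, between 0 and 1
    seconds_before_news: float  # positive if placed before the announcement


def shape_features(trade, baseline_sizes):
    """Describe a trade relative to its market (hypothetical feature set)."""
    return {
        # Size as a multiple of the market's median trade size.
        "size_ratio": trade.size_usd / median(baseline_sizes),
        # Low entry prices correspond to long odds.
        "entry_price": trade.entry_price,
        "placed_before_news": trade.seconds_before_news > 0,
    }


baseline = [100.0, 200.0, 300.0]
early_longshot = Trade(size_usd=2000.0, entry_price=0.12, seconds_before_news=600.0)
late_favorite = Trade(size_usd=2000.0, entry_price=0.97, seconds_before_news=-600.0)

print(shape_features(early_longshot, baseline))
print(shape_features(late_favorite, baseline))
```

Both trades are the same size, but the features differ on price and timing, which is the distinction the paragraph above draws.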
2. Wallet history
What has this wallet done before? Lifetime win rate (both raw and shrinkage-adjusted), realized PnL, concentration across markets, and category-specific edges. A wallet with 2 wins in 2 bets tells us almost nothing. A wallet with 200 wins in 300 bets, positive PnL, and an 85% win rate in the current market's category tells us a lot.
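A minimal sketch of how a wallet's history might be summarized from its resolved bets. The `ResolvedBet` record and the `wallet_summary` helper are hypothetical names for this example, and the output covers only the raw statistics named above, not the shrinkage adjustment or any scoring.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ResolvedBet:
    category: str
    won: bool
    pnl_usd: float


def wallet_summary(bets):
    """Raw history statistics for one wallet (illustrative only)."""
    n = len(bets)
    if n == 0:
        return None  # no resolved bets, nothing to summarize
    wins = sum(b.won for b in bets)
    by_category = Counter(b.category for b in bets)
    top_category, top_count = by_category.most_common(1)[0]
    return {
        "bets": n,
        "raw_win_rate": wins / n,
        "realized_pnl": sum(b.pnl_usd for b in bets),
        "top_category": top_category,
        # Share of bets in the wallet's most-traded category (concentration).
        "top_category_share": top_count / n,
    }


history = [
    ResolvedBet("politics", True, 50.0),
    ResolvedBet("politics", True, 30.0),
    ResolvedBet("sports", False, -20.0),
    ResolvedBet("politics", False, -10.0),
]
print(wallet_summary(history))
```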
3. Network topology
Who is this wallet connected to on chain? Shared funding sources, overlapping counterparties, correlated trade timing. One operator running ten wallets leaves a signature in the funding graph; one wallet that happens to look like another leaves a coincidence. Statistical tests separate them.
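One simple way to find candidate clusters in a funding graph is to take connected components over funder-to-wallet transfers, sketched here with a union-find structure. This is an illustration under assumptions: the `(funder, wallet)` edge format and the function name are invented for the example, and CrowdIntel's real linking logic is not public. Note that a shared funder alone (for instance an exchange hot wallet) would link unrelated wallets, which is why grouping like this can only nominate candidates for the statistical tests mentioned above.

```python
from collections import defaultdict


def cluster_by_funding(funding_edges):
    """Group wallets connected through funding transfers.

    funding_edges: iterable of (funder_address, wallet_address) pairs.
    Returns a sorted list of clusters, each a sorted list of wallets.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    wallets = set()
    for funder, wallet in funding_edges:
        wallets.add(wallet)
        root_f, root_w = find(funder), find(wallet)
        if root_f != root_w:
            parent[root_w] = root_f  # merge the two components

    groups = defaultdict(set)
    for w in wallets:
        groups[find(w)].add(w)
    # Keep only groups with more than one wallet.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


edges = [("F1", "A"), ("F1", "B"), ("F2", "C")]
print(cluster_by_funding(edges))  # [['A', 'B']]
```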
4. Cluster reputation
Has this wallet — or the cluster it belongs to — been part of a past investigation? Reputations are earned through track record, not claimed through branding. A wallet that's been in three prior investigations with p < 0.01 carries different weight than a fresh wallet with a small sample.
What triggers an investigation
A flagged trade is a starting point. A published investigation — where we name a wallet or cluster — is a higher bar. Every investigation on the site has cleared all of:
- Sample size that rules out luck at standard significance levels
- Win rate above the category's base rate by a margin large enough to reject the no-skill null hypothesis
- Positive realized PnL (win rate inflated by heavy-favorite betting doesn't count)
- Statistical p-value under an internal threshold
- For cluster cases: on-chain coordination evidence — shared funding, timing correlation that isn't coincidence, or both
The specific numbers are tuned against observed behaviour and revised without notice. What matters for a reader is that every investigation you see exists because the combined evidence passed the bar, not because any single metric did.
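To make the "all of" logic concrete, here is a hedged sketch of a combined gate. It uses an exact one-sided binomial test against the category base rate as the no-skill null, which is one standard choice and not necessarily the test CrowdIntel runs. The function names are hypothetical, and `min_bets` and `max_p` are placeholders supplied by the caller; the real thresholds are private.

```python
from math import comb


def binomial_p_value(wins, bets, base_rate):
    """One-sided P(X >= wins) for X ~ Binomial(bets, base_rate)."""
    return sum(
        comb(bets, k) * base_rate**k * (1 - base_rate) ** (bets - k)
        for k in range(wins, bets + 1)
    )


def clears_bar(wins, bets, base_rate, realized_pnl, *, min_bets, max_p,
               coordination_evidence=None):
    """True only if every criterion passes (placeholder thresholds)."""
    if bets == 0 or bets < min_bets:
        return False
    checks = [
        wins / bets > base_rate,                            # excess win rate
        realized_pnl > 0,                                   # positive realized PnL
        binomial_p_value(wins, bets, base_rate) <= max_p,   # significance
    ]
    # Cluster cases additionally require on-chain coordination evidence.
    if coordination_evidence is not None:
        checks.append(bool(coordination_evidence))
    return all(checks)


# Placeholder thresholds, chosen only for the example.
print(clears_bar(200, 300, 0.5, 1000.0, min_bets=50, max_p=0.01))  # True
print(clears_bar(2, 2, 0.5, 10.0, min_bets=50, max_p=0.01))        # False
```

The second call fails on sample size alone: 2 wins in 2 bets at a 50% base rate has a p-value of 0.25, so it would fail the significance check as well.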
Why we don't publish the parameters
Two reasons:
Adversarial robustness
If we publish "investigations open at ≥ 70% win rate (WR) and p ≤ 0.1," the smart operator aims for 68% WR and p = 0.12. That's not what detection should incentivize. By keeping parameters private and revising them, we force adversaries to assume we see everything they do and reason from first principles.
Evolving stack
The detection model changes. Adding a new signal, re-weighting an old one, or raising a threshold are routine. Published numbers would be stale the moment they're out.
What we do publish
Every investigation page shows:
- The wallets involved (with full dossiers)
- The funder address and the funding trail
- A chronological timeline of the cluster's trades
- The top markets they've been active on
- The aggregate win rate, PnL, and p-value
- When the case was opened, and when it was last updated
If a case is retired (the cluster fell below the bar with new resolved bets, or data was corrected), the URL remains with a note explaining what changed. Investigations are living documents.
The shrinkage estimator in one paragraph
Raw win rate is noisy for small samples. A wallet with 2 wins in 2 bets has a raw win rate of 100% — obviously meaningless. CrowdIntel's shrinkage estimator computes a Bayesian posterior on every wallet's win rate, starting from a prior at the category's base rate and updating with the wallet's actual results. The posterior converges to the raw rate as sample size grows, and stays near the prior when sample size is small. Leaderboards rank by the shrunk value. Both the raw and shrunk numbers are visible on every wallet page so readers can see the adjustment for themselves.
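A common way to get this behaviour is a Beta-Binomial model, sketched below: the prior is a Beta distribution centred on the category base rate, and the shrunk value is the posterior mean. Whether CrowdIntel uses exactly this form is not stated, and `prior_strength` (the number of pseudo-bets the prior is worth) is an assumed parameter for the example.

```python
def shrunk_win_rate(wins, bets, base_rate, prior_strength=20.0):
    """Posterior mean win rate under a Beta prior centred on base_rate.

    prior_strength is a hypothetical tuning parameter: the prior counts
    as that many pseudo-bets won at the category base rate.
    """
    alpha = base_rate * prior_strength + wins
    beta = (1 - base_rate) * prior_strength + (bets - wins)
    return alpha / (alpha + beta)


for wins, bets in [(2, 2), (200, 300)]:
    raw = wins / bets
    shrunk = shrunk_win_rate(wins, bets, base_rate=0.5)
    print(f"{wins}/{bets}: raw={raw:.3f} shrunk={shrunk:.3f}")
```

With a 50% base rate and the assumed prior strength of 20, the 2-for-2 wallet shrinks from 100% to about 54.5%, while the 200-for-300 wallet moves only from 66.7% to about 65.6%.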
Data pipeline, start to finish
- Polygon blockchain. Source of truth. Every trade, transfer, and resolution is read directly from on-chain state.
- Subsquid indexer. A custom subgraph ingests events from every Polymarket contract with short cursor lag.
- Enrichment worker. Each new trade is scored, linked to its funding trail, and assigned to any clusters.
- Outcome tracker. Resolved markets update wallet statistics — win rate, PnL, category edges, the shrinkage estimator.
- Investigation engine. Periodic evaluation opens new investigations when thresholds clear and re-scores existing ones as new bets resolve.
All of this runs continuously. A trade made at 14:02:10 on Polygon is scored, linked, and available on CrowdIntel by 14:02:40 at the latest.
What we can't see
Transparency on the limits:
- Sophisticated laundering. Multi-hop USDC routing, exchange-hopping, and fresh CEX withdrawals per wallet can partially defeat 1-hop funding-graph analysis.
- Off-chain coordination. Chat-room tipping, dark pools, private signal groups — invisible unless members share funding.
- Identity. CrowdIntel proves statistical anomalies. It doesn't prove who operates a wallet.
- Early-stage markets. Brand-new thin markets produce unstable scores regardless of method.
None of these make the product useless — most coordinated insider activity on Polymarket is not running exchange-hop laundering. It's operators who didn't expect anyone would be watching.
What to read next
- /methodology — the public overview, formal language
- /investigations — published cases
- The anatomy of a funding cluster — deep-dive on the coordination signal
- How to find insiders on Polymarket — the research workflow