Claude score on Humanity’s Last Exam by June 30?

Tech Claude One Off Open Ends Jun 30, 2026, 00:00 UTC Source: Polymarket
45%+
25%
$0.25
50%+
20.5%
$0.205
55%+
7.9%
$0.079
Volume$366.93K Liquidity$11.22K Open Interest$51.77K Last updated6 mins ago

Odds, liquidity, volume, and open interest are sourced from Polymarket and last synced at Jun 16, 2026 1:12 pm.

Probability history

Market details

Resolution criteria
This market will resolve to "Yes" if the Humanity’s Last Exam leaderboard lists any Anthropic Claude model with a score of at least the specified score by June 30, 2026, 11:59 PM ET. Otherwise, this market will resolve to "No".
Platform
Category
Tech Claude
Close date
June 30, 2026, 12:00 AM UTC
Settlement source
Scale
Market rules summary
Multi-timeframe Polymarket event. Each listed timeframe is represented by its Yes price on the underlying binary market. View full rules
CryptoSlate Market Analysis

Claude HLE Market Prices Progress While Drawing a Hard Line at 55%

The laddered odds suggest confidence in another Claude gain before the June deadline, paired with skepticism that a public Scale leaderboard entry clears the highest bar. The real question is how much the market is leaning on Anthropic’s release cadence versus the benchmark’s resolution mechanics.

Claude’s HLE market is pricing a narrow path: Anthropic has enough time to place a stronger model on Scale’s Humanity’s Last Exam leaderboard by June 30, yet the odds compress once the target moves from a competitive score into a clear benchmark break. That shape matters because the contract settles on a public listing, making product timing and evaluation exposure as important as model capability.

The ladder prices a climb to 50%, then a harder jump

The three outcomes form a visible curve: 45%+ at 23%, 50%+ at 20.5%, and 55%+ at 8.4%. As an inference from those prices, the market treats 45% and 50% as linked states of the same upgrade cycle, while 55% sits in a different regime. The narrow gap between the first two thresholds suggests that if Anthropic produces a Claude model capable of moving meaningfully on HLE, the market sees the extra five points as a smaller obstacle than the next five.

That matters because HLE-style benchmarks can produce step changes in market perception. A score just above 45% would validate the release-timing thesis but leave the top rung unresolved. A score near 50% would pressure both lower outcomes together. A visible result in the low 50s could reshape expectations for 55% even before it crosses, because the remaining distance would shrink to a few leaderboard points.

OutcomeYes priceInferred burden
45%+23%A meaningful Claude gain reaches the public leaderboard
50%+20.5%The same gain extends into a stronger benchmark band
55%+8.4%A larger leap arrives with clean public verification

The deadline turns Anthropic’s release cadence into a settlement constraint

The closing window is doing real pricing work. The rules require any Anthropic Claude model to appear on Scale’s leaderboard with the specified score by June 30, 2026, 11:59 PM ET. That means the market is implicitly evaluating at least three linked events: a model improvement exists, the model is evaluated for HLE, and the score becomes visible on the settlement source before the deadline.

The late-June timing also reduces the value of broad claims about future AI progress. General expectations for Claude can support the lower rungs, but the contract pays attention to a dated, source-specific artifact. A powerful internal model, a delayed public launch, or a score published after the cutoff would fail to satisfy the market rules. That is why the market can give meaningful probability to 45% while still drawing a steep line at 55%: the hardest outcome needs both a larger gain and a cleaner path to public verification.

Scale’s leaderboard is the evidence layer that matters most

The settlement source concentrates power in a single external record. Polymarket’s resolution criteria refer to the Scale Humanity’s Last Exam leaderboard, and each threshold resolves through the listed score. This creates a blind spot for narrative-driven repricing: impressive demos, model-card claims, or third-party commentary have weaker settlement value unless they point toward a Scale-listed Claude score.

That rule also explains why the market can price the lower thresholds meaningfully without treating them as simple bets on Anthropic’s overall research trajectory. The required artifact is specific: an Anthropic Claude model, a Scale HLE score, and a number at or above the relevant threshold. Every extra dependency reduces the market’s willingness to carry the same confidence from 45% through 55%.

Catalysts need to change the leaderboard path

Specific repricing catalysts are easy to define because the rules are narrow. A hypothetical new Claude entry on Scale with a score above 45%, 50%, or 55% would directly settle or nearly settle the debate for the relevant rung. A hypothetical entry close to a threshold would matter because the market would have to decide whether another update can arrive before the clock runs out.

Other catalysts would work through inference. A hypothetical Anthropic release announcement that names HLE or comparable exam-style evaluations could shift expectations if it increases confidence that a Scale score is coming. A hypothetical Scale leaderboard update policy, format change, or delay would also matter because the contract is tied to what the leaderboard lists; broad consensus about Claude’s ability has weaker settlement force.

Market structure can amplify those updates. The market has recorded $366.71K of volume and $51.77K of open interest, showing sustained engagement, while posted liquidity is $11.64K. That combination can leave prices sensitive to fresh objective evidence, especially if a leaderboard change affects multiple thresholds at once.

The main counter-signal is benchmark-specific friction

The strongest counterargument to the climb-through-50 story is that model progress may arrive in forms outside HLE’s scoring frame or through channels outside the contract. A Claude release could improve user-facing performance, coding, tool use, or safety behavior without producing the required Scale leaderboard score. Since the market resolves on an entry in one leaderboard, broad perceptions of AI progress can diverge from the contract outcome.

This failure mode helps explain the collapse from 50%+ to 55%+. The higher target requires a larger capability move, a public evaluation, and clean timing. It also leaves less room for measurement variance or partial gains. If future evidence shows Claude improving on adjacent benchmarks without a Scale HLE listing, the lower thresholds could lose narrative support because the settlement risk would become more visible.

The market’s central tension is publication versus capability. The prices imply belief that Anthropic can generate an HLE-relevant improvement before the deadline, yet the steep top rung says the public leaderboard path is the choke point. Any update that collapses those two issues into one visible Scale score would be the event most likely to force the cleanest repricing.

Sources