Claude 3.5 sets new AI benchmarks, beating GPT-4o in coding and reasoning

Claude 3.5 Sonnet excels in solving 64% of coding problems, outperforming Claude 3 Opus in agentic coding evaluations.

Liam 'Akiba' Wright

Jun. 20, 2024 at 6:45 pm UTC

2 min read

Updated: Jun. 20, 2024 at 4:50 pm UTC

Claude 3.5 sets new AI benchmarks, beating GPT-4o in coding and reasoning

Cover art/illustration via CryptoSlate. Image includes combined content which may include AI-generated content.

Anthropic has launched Claude 3.5 Sonnet, the latest addition to its AI model lineup, claiming it surpasses previous models and competitors like OpenAI's GPT-4 Omni. Available for free on Claude.ai and the Claude iOS app, the model is also accessible via the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI. Claude 3.5 Sonnet is priced at $3 per million input tokens and $15 per million output tokens, with a 200,000-token context window.

Claude 3.5 Sonnet benchmarks (Anthropic)

Claude 3.5 Sonnet sets new benchmarks in graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), and coding proficiency (HumanEval). It demonstrates significant improvements in understanding nuance, humor, and complex instructions and excels at generating high-quality content with a natural tone. The model operates at twice the speed of Claude 3 Opus, making it suitable for complex tasks like context-sensitive customer support and multi-step workflows.

“In an internal agentic coding evaluation, Claude 3.5 Sonnet solved 64% of problems, outperforming Claude 3 Opus, which solved 38%.”

The model can independently write, edit, and execute code, making it effective for updating legacy applications and migrating codebases. It also excels in visual reasoning tasks, such as interpreting charts and graphs, and can accurately transcribe text from imperfect images, benefiting sectors like retail, logistics, and financial services.

Anthropic has also introduced Artifacts, a new feature on Claude.ai that allows users to generate and edit content like code snippets, text documents, or website designs in real time. This feature marks Claude's evolution from a conversational AI to a collaborative work environment, with plans to support team collaboration and centralized knowledge management in the future.

Anthropic emphasizes its commitment to safety and privacy, stating that Claude 3.5 Sonnet has undergone rigorous testing to reduce misuse. The model has been evaluated by external experts, including the UK's Artificial Intelligence Safety Institute (UK AISI), and has integrated feedback from child safety experts to update its classifiers and fine-tune its models. Anthropic assures that it does not train its generative models on user-submitted data without explicit permission.

Looking ahead, Anthropic plans to release Claude 3.5 Haiku and Claude 3.5 Opus later this year, along with new features like Memory, which will enable Claude to remember user preferences and interaction history.

Mentioned in this article

Anthropic

OpenAI

Posted in

Author View profile

Liam 'Akiba' Wright

Editor-in-Chief CryptoSlate

Also known as "Akiba," Liam Wright is a reporter, podcast producer, and Editor-in-Chief at CryptoSlate. He believes that decentralized technology has the potential to make widespread positive change.

Editor View profile

News Desk

Editor CryptoSlate

CryptoSlate is a comprehensive and contextualized source for crypto news, insights, and data. Focusing on Bitcoin, macro, DeFi and AI.

Disclaimer

Our writers' opinions are solely their own and do not reflect the opinion of CryptoSlate. None of the information you read on CryptoSlate should be taken as investment advice, nor does CryptoSlate endorse any project that may be mentioned or linked to in this article. Buying and trading cryptocurrencies should be considered a high-risk activity. Please do your own due diligence before taking any action related to content within this article. Finally, CryptoSlate takes no responsibility should you lose money trading cryptocurrencies. For more information, see our company disclaimers.

Latest News

Asset Coverage

Topic Coverage

Market Structure

Top Links

Applications

Crypto Sectors

Ecosystems

Top Links

Review Categories

Company Verticals

People Categories

Product Categories

Company

Help & Legal

Follow

Claude 3.5 sets new AI benchmarks, beating GPT-4o in coding and reasoning

Daily signals, zero noise.

Related coverage

The crypto winners from AI are not AI coins as agents start spending autonomously

The AI reset is now underway as layoffs accelerate and one group is hit hardest

Can crypto protect us against the growing web of economic AI agents?

AI is hiring more senior developers while quietly erasing the jobs that create them

One of the biggest US Bitcoin miners eyes sale of its entire 53,000 BTC stash

XRP Ledger nearly shipped a feature that could drain accounts without owners signing

T-REX Network and Zama Launch Institutional-Grade Confidentiality Infrastructure for RWA Tokenization

BYDFi Expands European Reach with Next Block Expo 2026 Sponsorship in Warsaw

BNB Chain Kicks Off University Dev Roadshow at NYU Today

RIV Coin Launches on Solana to Bridge Institutional Capital with DeFi Infrastructure

Playnance Unveils the First Democratic Social Gaming Protocol, Surpassing 1M GCOIN Holders

$METAWIN Presale Raises $350,000 in Hours

Playnance G Coin shifts from breakout launch to utility test

The Market Maker’s Exchange Checklist (Liquidity, Latency, and Risk Controls)