Google released an updated version of Gemini 3 Deep Think on February 12, 2026, posting record scores across multiple evaluation benchmarks: 84.6% on ARC-AGI-2, 48.4% on Humanity's Last Exam, and a 3455 Elo rating on Codeforces competitive programming, a level only 7 humans currently exceed. The specialized reasoning mode targets researchers, scientists, and engineers tackling complex problems.
Gemini 3 Deep Think Sets New Standards on Abstract Reasoning Tests
The system scored 84.6% on ARC-AGI-2, a benchmark designed to test abstract reasoning and general intelligence, an unprecedented result on that evaluation. It also reached 48.4% on Humanity's Last Exam without tools, a benchmark built to probe capabilities beyond current AI systems, and 81.8% on MMMU-Pro for multimodal understanding. Beyond these benchmarks, the system delivered gold-medal-standard performance on Physics and Chemistry Olympiad problems and set records on research and engineering tasks.
System Ranks Among Top Competitive Programmers Worldwide
Gemini 3 Deep Think achieved a 3455 Elo rating on Codeforces, placing it among the world's elite competitive programmers, with only 7 humans currently ranked above it. Matching and exceeding human expert performance in a domain this demanding marks a significant milestone for AI systems. The result also indicates that test-time compute scaling continues to hold: allocating more computation at inference time keeps producing better answers.
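For context on what a 3455 rating means, Codeforces uses an Elo-style system. The sketch below uses the standard Elo expected-score formula with the conventional 400-point scale; the opponent rating is illustrative, not from the article, and Codeforces' exact update rules differ in detail:

```python
# Standard Elo expected score (illustrative; Codeforces' actual
# rating formulas are a variant of this scheme).
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score for a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

# A 3455-rated entrant vs. a hypothetical 3055-rated grandmaster:
# a 400-point gap implies an expected score of roughly 0.91.
print(round(expected_score(3455, 3055), 3))  # → 0.909
```

In other words, at a 400-point rating gap the higher-rated side is expected to take about 91% of the points, which is why so few humans remain above the system.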
New Sketch-to-3D Capability Added for Engineering Applications
The updated system adds Sketch-to-3D functionality: it takes a hand-drawn sketch, analyzes the shape and its details, and automatically generates ready-to-use files for 3D printing. The feature complements improved performance on physical systems and engineering problems, along with advanced reasoning for complex data analysis. The release landed shortly after OpenAI's GPT-5.3-Codex and Anthropic's Claude Opus 4.6, and multiple observers noted that the Deep Think update surpassed both competing systems.
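Google has not published how the Sketch-to-3D pipeline works internally, but "ready-to-use files for 3D printing" typically means a standard mesh format such as STL. As a rough, self-contained illustration of that final step (the hard part, inferring a 3D shape from a sketch, is replaced here by a simple polygon extrusion; all names are hypothetical), a minimal ASCII STL writer:

```python
# Hypothetical sketch: extrude a 2D polygon outline into a solid and
# serialize it as ASCII STL, a standard 3D-printing mesh format.
# Facet normals are written as 0 0 0; slicers typically recompute them.

def _facet(a, b, c):
    lines = ["  facet normal 0 0 0", "    outer loop"]
    for x, y, z in (a, b, c):
        lines.append(f"      vertex {x:.6f} {y:.6f} {z:.6f}")
    lines += ["    endloop", "  endfacet"]
    return "\n".join(lines)

def extrude_to_stl(outline, height, name="sketch"):
    """outline: (x, y) vertices of a convex polygon in CCW order."""
    bottom = [(x, y, 0.0) for x, y in outline]
    top = [(x, y, height) for x, y in outline]
    tris = []
    # Fan-triangulate the bottom and top caps (convexity assumed).
    for i in range(1, len(outline) - 1):
        tris.append((bottom[0], bottom[i + 1], bottom[i]))  # faces down
        tris.append((top[0], top[i], top[i + 1]))           # faces up
    # Two triangles per side wall.
    n = len(outline)
    for i in range(n):
        j = (i + 1) % n
        tris.append((bottom[i], bottom[j], top[j]))
        tris.append((bottom[i], top[j], top[i]))
    body = "\n".join(_facet(*t) for t in tris)
    return f"solid {name}\n{body}\nendsolid {name}\n"

# Example: a 20 mm square extruded to 5 mm, ready to save as .stl.
stl = extrude_to_stl([(0, 0), (20, 0), (20, 20), (0, 20)], height=5.0)
print(stl.count("facet normal"))  # 12 triangles for a box
```

The real pipeline presumably produces far richer geometry from the sketch analysis; this only shows what a "ready-to-use" print file amounts to at the format level.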
Key Takeaways
- Gemini 3 Deep Think achieved 84.6% on ARC-AGI-2 and 48.4% on Humanity's Last Exam without tools
- The system scored a 3455 Elo rating on Codeforces with only 7 humans ranked above it
- Performance includes 81.8% on MMMU-Pro and gold-medal standard on Physics and Chemistry Olympiads
- New Sketch-to-3D capability automatically generates 3D printing files from hand-drawn sketches
- Test-time compute scaling continues to improve results with additional computational resources