Google surprised developers today by adding a Gemini 3.0 preview model to AI Studio without any prior announcement. The model appeared quietly in the interface, and early testing places it above ChatGPT-class models on several key reasoning, coding, and multimodal benchmarks. Although Google has not issued a formal release statement, the early numbers suggest that Gemini 3.0 is not a minor revision of Gemini 2.5 but a genuine step into frontier-model territory.
A Shift Toward a New Architecture
According to Google’s blog, Gemini 3 is built on the company’s evolving “System 3” architecture, which focuses heavily on long-context reasoning, improved planning, more reliable tool use, and stronger multimodal processing. The model is designed to operate across text, images, audio, and video in a more coordinated way. These upgrades reflect Google’s goal of making its models more stable and more capable during complex, multi-step tasks.
Developers testing the preview in AI Studio report faster responses, more accurate instruction following, and smoother handling of long outputs. These early impressions align with the direction outlined in the blog and suggest meaningful architectural improvements beneath the surface.
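For developers who want to try it themselves, the preview can be exercised through the google-genai Python SDK like any other Gemini model. The snippet below is a minimal sketch: the model identifier gemini-3-pro-preview is an assumption, since Google has not yet published documentation, and the exact string shown in AI Studio may differ.

```python
from google import genai

# The client reads GOOGLE_API_KEY from the environment by default.
client = genai.Client()

# "gemini-3-pro-preview" is an assumed identifier; check AI Studio's
# model picker for the exact preview name before running this.
response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="Summarize the trade-offs between long context and retrieval.",
)
print(response.text)
```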
Benchmarks Show Clear Performance Gains
A benchmark table circulating online compares Gemini 3 Pro with Gemini 2.5 Pro, Claude Sonnet 4.5, and GPT-5.1. The results indicate broad gains across reasoning, math, coding, and multimodal tasks.
On reasoning benchmarks, Gemini 3 Pro posts the highest score on Humanity's Last Exam, a test of advanced academic reasoning. It also performs strongly on ARC-AGI-2, one of the most difficult reasoning benchmarks available, where its score improves more than six-fold over the previous Gemini generation. These gains suggest a model that handles complexity and abstraction more effectively than before.
Scientific and mathematical performance is notably stronger as well. Gemini 3 Pro delivers high accuracy on the GPQA Diamond science benchmark. On AIME 2025, a competition-math benchmark, it scores well both with and without access to code execution, which indicates improved stability and stronger numerical reasoning.
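The with/without distinction matters because code execution is an optional tool rather than default behavior: the model writes and runs Python to check its arithmetic instead of reasoning purely in text. A hedged sketch of enabling it through the google-genai SDK follows; as above, the model identifier is an assumption.

```python
from google import genai
from google.genai import types

client = genai.Client()

# With the code-execution tool enabled, the model can run Python
# to verify intermediate calculations before answering.
response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed preview identifier
    contents="What is the sum of the first 60 prime numbers?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution())],
    ),
)
print(response.text)
```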
Multimodal understanding shows similarly strong progress. The model performs well on MMMU-Pro, a benchmark that mixes text with diagrams, charts, and other visual material. It also achieves the highest published score on Video-MMMU, which tests understanding of video clips and dynamic scenes. These results point to a system that can interpret more complicated media with greater clarity.
Coding and software reasoning have also seen measurable improvement. Gemini 3 Pro scores high on LiveCodeBench Pro, which ranks models by competitive-programming ability. It performs well on τ²-bench, a test of tool use and step-by-step execution, and records a strong result on SWE-Bench Verified, a benchmark built around fixing real bugs in open-source repositories. Together, these results point to better reliability during multi-step engineering tasks.
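Tool use in this sense means function calling: the model decides when to invoke a developer-supplied function and folds the result into its answer. The sketch below shows the pattern using the google-genai SDK's automatic function calling; get_build_status is a hypothetical function invented for illustration, and the model identifier remains an assumption.

```python
from google import genai
from google.genai import types

def get_build_status(branch: str) -> dict:
    """Hypothetical CI lookup, stubbed for illustration."""
    return {"branch": branch, "status": "passing"}

client = genai.Client()

# Passing a plain Python callable enables automatic function calling:
# the SDK sends the function's schema, runs it when the model requests
# a call, and returns the model's final text answer.
response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed preview identifier
    contents="Is CI currently green on the main branch?",
    config=types.GenerateContentConfig(tools=[get_build_status]),
)
print(response.text)
```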

A More Capable and Stable Model
Taken together, the results show that the Gemini 3.0 preview is more capable across nearly every category of practical use. The model is better at sustaining coherent reasoning, handling long or technical instructions, interpreting images and video, and writing or debugging software. It appears more grounded, more predictable, and more stable than previous versions.
Users have also noted improvements in factual consistency and reduced hallucination rates. The model maintains better continuity over long sessions and performs more reliably when asked to chain several reasoning steps together. These observations match Google’s stated goals for the new architecture.
A New Release Strategy From Google
The way this preview appeared is almost as notable as the model itself. Google’s choice to release Gemini 3.0 Preview directly inside AI Studio, without a coordinated announcement, suggests a shift toward a faster, more iterative development strategy. Rather than unveiling fully polished models at once, Google is making major upgrades available earlier and allowing developers to test them in real workloads while the company continues refining the system.
This approach mirrors broader changes in the AI industry, where competitive pressure has accelerated release cycles and encouraged companies to ship models in stages rather than through single, high-profile product events.
What to Expect as the Rollout Continues
Google has not yet published documentation, pricing details, or a timeline for Vertex AI support. The blog notes that the Gemini 3 family will include both lighter Flash variants for efficiency-focused tasks and more capable Pro versions for research, engineering, and high-complexity reasoning. As the preview matures, Google is expected to expand access, publish model cards, and integrate the system into Workspace and other product lines.
Given the speed at which this preview appeared, a broader announcement is likely to follow soon.
Bottom Line
The Gemini 3.0 preview offers the strongest indication so far that Google is regaining momentum in the model-quality race. The early benchmark data and the improvements described in the blog suggest meaningful advances in reasoning, mathematics, multimodal understanding, and coding. If these gains hold through the full release, Gemini 3 Pro will stand as a serious competitor to top models from OpenAI and Anthropic.
For developers already testing it in AI Studio, the early impression is clear: this is a significantly more capable model, and one that represents a real step forward for Google's AI platform.