Google Launches Gemini 3.1 Flash-Lite: 2.5x Faster at Rock-Bottom Prices

Google Releases Gemini 3.1 Flash-Lite With 45% Speed Boost

Google launched Gemini 3.1 Flash-Lite on March 3, 2026, in preview via the Gemini API in Google AI Studio and Vertex AI for enterprise customers. The model delivers 363 tokens per second compared to Gemini 2.5 Flash's 249 tokens per second, representing a 45% improvement in output speed and 2.5x overall performance gain.

Aggressive Pricing Targets High-Volume Workloads

Gemini 3.1 Flash-Lite dramatically undercuts existing pricing with input tokens at $0.25 per million and output tokens at $1.50 per million. According to Artificial Analysis benchmarks, the model achieves an impressive Elo score of 1432 on the Arena.ai Leaderboard, outperforming other models in similar tiers including GPT-5 mini and Claude 4.5 Haiku. Google designed the model specifically for high-volume workloads including content moderation, translation, and e-commerce processing where speed and per-token cost drive economics.

Configurable Thinking Levels Give Developers Granular Control

Developers can configure thinking levels directly in both AI Studio and Vertex AI, giving them precise control over how much reasoning compute the model applies to each task. This flexibility allows teams to optimize the balance between speed, cost, and reasoning depth based on specific use case requirements.

Key specifications include:

363 tokens per second output speed (45% faster than Gemini 2.5 Flash)
2.5x overall performance improvement
$0.25 per million input tokens
$1.50 per million output tokens
Configurable reasoning levels for compute optimization

Launch Intensifies AI Pricing War

The March 3, 2026 launch date matches OpenAI's GPT-5.3 Instant release, highlighting the competitive intensity in the AI model market. Gemini 3.1 Flash-Lite directly competes with OpenAI's GPT-5 nano and GPT-5-mini, as well as Claude 4.5 Haiku in the efficiency tier. For complete pricing details, see the Gemini Developer API pricing page.

Early developer testing confirms the substantial speed differences, with the model described as delivering "insane" performance improvements for the price point. The combination of rock-bottom pricing, improved speed, and competitive intelligence makes Gemini 3.1 Flash-Lite a significant entry in Google's model lineup.

Key Takeaways

Google launched Gemini 3.1 Flash-Lite on March 3, 2026, delivering 363 tokens per second output speed
The model is 2.5x faster overall than Gemini 2.5 Flash while costing significantly less
Pricing is set at $0.25 per million input tokens and $1.50 per million output tokens
Developers can configure thinking levels to optimize the balance between speed, cost, and reasoning depth
The launch coincided with OpenAI's GPT-5.3 Instant, demonstrating intense competition in the efficiency model tier

Google Releases Gemini 3.1 Flash-Lite With 45% Speed Boost

Aggressive Pricing Targets High-Volume Workloads

Configurable Thinking Levels Give Developers Granular Control

Key specifications include:

363 tokens per second output speed (45% faster than Gemini 2.5 Flash)

2.5x overall performance improvement

$0.25 per million input tokens

$1.50 per million output tokens

Configurable reasoning levels for compute optimization

Launch Intensifies AI Pricing War

Key Takeaways

Google launched Gemini 3.1 Flash-Lite on March 3, 2026, delivering 363 tokens per second output speed

The model is 2.5x faster overall than Gemini 2.5 Flash while costing significantly less

Pricing is set at $0.25 per million input tokens and $1.50 per million output tokens

Developers can configure thinking levels to optimize the balance between speed, cost, and reasoning depth

The launch coincided with OpenAI's GPT-5.3 Instant, demonstrating intense competition in the efficiency model tier