Google’s new Gemini Pro model has record benchmark scores—again
Gemini 3.1 Pro promises a Google LLM capable of handling more complex forms of work. (Image: BoliviaInteligente / Unsplash. Source: TechCrunch)
Introduction A stark new reality check is emerging from the world of artificial intelligence. While headlines tout AI’s potential to revolutionize knowledge work, a rigorous new benchmark reveals a significant chasm between promise and performance. When tested on authentic tasks from high-stakes fields like law, finance, and consulting, most leading AI models stumbled, raising urgent…
Introduction A new, rigorous benchmark is challenging the breathless promises of an AI-powered professional revolution. By testing leading models on authentic tasks from consulting, banking, and law, researchers have uncovered a sobering reality: most AI agents currently stumble when asked to perform complex, multi-step white-collar work. The findings suggest a significant chasm remains between impressive…