CRYSTAL Benchmark Reveals Systematic Reasoning Failures in 20 Leading Multimodal Models

Monday, March 16, 2026