AI Finance: 2026 Benchmarks of AI Models for Finance
Live ranking of the top AI models on four finance benchmarks published by vals.ai: Finance Agent (agentic reasoning), CorpFin (corporate finance), MortgageTax (real-estate taxation), and TaxEval (personal taxation). Data is refreshed automatically every 24 hours.
CorpFin v2
Measures performance on corporate finance questions (M&A, valuation, corporate tax).
| # | Model | Provider | Accuracy |
|---|---|---|---|
| 1 | GPT 5.5 | OpenAI | 68.42% |
| 2 | Kimi K2.5 (Thinking) | Moonshot AI | 68.26% |
| 3 | Qwen 3 Max | Alibaba | 68.03% |
| 4 | Claude Opus 4.6 (Thinking) | Anthropic | 67.02% |
| 5 | Grok 4 Fast Reasoning | xAI | 66.90% |
| 6 | Kimi K2.6 (Thinking) | Moonshot AI | 66.74% |
| 7 | Gemini 3 Flash Preview | Google | 66.43% |
| 8 | Claude Opus 4.7 | Anthropic | 66.08% |
| 9 | Grok 4.0709 | xAI | 66.05% |
| 10 | Grok 4.1 Fast Reasoning | xAI | 65.97% |
| 11 | GPT 5.2 | OpenAI | 65.89% |
| 12 | Claude Sonnet 4.6 | Anthropic | 65.31% |
| 13 | Qwen 3.5 Plus (Thinking) | Alibaba | 65.31% |
| 14 | GPT 5.4 | OpenAI | 65.27% |
| 15 | Muse Spark | Meta | 65.11% |
| 16 | Claude Opus 4.5 (Thinking) | Anthropic | 65.07% |
| 17 | Gemini 3.1 Pro Preview | Google | 64.49% |
| 18 | GLM 5.1 (Thinking) | Zhipu AI | 64.45% |
| 19 | GPT 5.1 | OpenAI | 63.83% |
| 20 | Grok 4.20.0309 Reasoning | xAI | 63.68% |
Updated April 26, 2026
Finance Agent v1.1
Evaluates a model's ability to reason through financial agent tasks (portfolio analysis, research, synthesis).
| # | Model | Provider | Accuracy |
|---|---|---|---|
| 1 | Qwen 3.6 Max Preview | Alibaba | 86.67% |
| 2 | Claude Opus 4.7 | Anthropic | 64.37% |
| 3 | Claude Sonnet 4.6 | Anthropic | 63.33% |
| 4 | Muse Spark | Meta | 60.60% |
| 5 | DeepSeek V4 Pro | DeepSeek | 60.39% |
| 6 | Claude Opus 4.6 (Thinking) | Anthropic | 60.05% |
| 7 | GPT 5.5 | OpenAI | 59.96% |
| 8 | Gemini 3.1 Pro Preview | Google | 59.72% |
| 9 | Claude Opus 4.5 (Thinking) | Anthropic | 58.81% |
| 10 | GPT 5.2 | OpenAI | 58.54% |
| 11 | GLM 5.1 (Thinking) | Zhipu AI | 57.66% |
| 12 | GPT 5.4 | OpenAI | 57.15% |
| 13 | Kimi K2.6 (Thinking) | Moonshot AI | 57.06% |
| 14 | GPT 5.1 | OpenAI | 55.31% |
| 15 | Gemini 3 Pro Preview | Google | 55.15% |
| 16 | Qwen 3.6 Plus | Alibaba | 54.63% |
| 17 | Claude Sonnet 4.5 (Thinking) | Anthropic | 54.50% |
| 18 | Qwen 3.5 Plus (Thinking) | Alibaba | 54.48% |
| 19 | Grok 4.0709 | xAI | 53.51% |
| 20 | GPT 5.4 Mini | OpenAI | 53.41% |
Updated April 26, 2026
MortgageTax
Tests real-estate taxation questions (interest deductibility, allowances, capital gains).
| # | Model | Provider | Accuracy |
|---|---|---|---|
| 1 | Claude Opus 4.7 | Anthropic | 70.27% |
| 2 | Gemini 3.1 Pro Preview | Google | 69.40% |
| 3 | Gemini 3 Pro Preview | Google | 69.08% |
| 4 | Gemini 2.5 Pro | Google | 68.92% |
| 5 | GPT 5.5 | OpenAI | 68.76% |
| 6 | Gemini 3 Flash Preview | Google | 68.72% |
| 7 | Claude 3.7 Sonnet | Anthropic | 68.68% |
| 8 | Claude Opus 4.5 | Anthropic | 68.68% |
| 9 | Claude Opus 4.6 (Thinking) | Anthropic | 68.52% |
| 10 | GPT 5.4 | OpenAI | 68.32% |
| 11 | Gemini 3.1 Flash Lite Preview | Google | 68.04% |
| 12 | Qwen 3.6 Plus | Alibaba | 67.97% |
| 13 | Claude Sonnet 4.6 | Anthropic | 67.73% |
| 14 | Claude Opus 4.5 (Thinking) | Anthropic | 67.69% |
| 15 | Qwen 3.5 Flash | Alibaba | 67.37% |
| 16 | Gemini 2.5 Pro Exp 03.25 | Google | 67.17% |
| 17 | GPT 5.2 | OpenAI | 67.13% |
| 18 | GPT 5 Mini | OpenAI | 66.89% |
| 19 | Claude 3.7 Sonnet (Thinking) | Anthropic | 66.85% |
| 20 | Kimi K2.5 (Thinking) | Moonshot AI | 66.53% |
Updated April 26, 2026
TaxEval v2
Evaluates personal taxation questions (income tax, allowances, tax breaks).
| # | Model | Provider | Accuracy |
|---|---|---|---|
| 1 | Muse Spark | Meta | 77.68% |
| 2 | Claude Sonnet 4.6 | Anthropic | 77.11% |
| 3 | Claude Opus 4.6 (Thinking) | Anthropic | 75.96% |
| 4 | Grok 3 | xAI | 75.88% |
| 5 | GPT 5.2 | OpenAI | 75.76% |
| 6 | Grok 4 Fast Reasoning | xAI | 75.70% |
| 7 | Claude Opus 4.7 | Anthropic | 75.27% |
| 8 | GPT 5 Mini | OpenAI | 75.22% |
| 9 | GPT 4.1 | OpenAI | 75.06% |
| 10 | GPT 5.5 | OpenAI | 74.98% |
| 11 | GPT 5.1 | OpenAI | 74.86% |
| 12 | Claude Opus 4.5 (Thinking) | Anthropic | 74.86% |
| 13 | o4 Mini | OpenAI | 74.78% |
| 14 | Qwen 3.6 Plus | Alibaba | 74.73% |
| 15 | Kimi K2.6 (Thinking) | Moonshot AI | 74.65% |
| 16 | o3 | OpenAI | 74.57% |
| 17 | GPT 4o | OpenAI | 74.53% |
| 18 | Claude Opus 4.5 | Anthropic | 74.33% |
| 19 | o1 | OpenAI | 74.29% |
| 20 | Kimi K2.5 (Thinking) | Moonshot AI | 74.20% |
Updated April 26, 2026
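To compare one model across the four leaderboards, the table rows can be parsed and averaged. The snippet below is a minimal sketch, not part of vals.ai's tooling: the helper names `parse_rows` and `mean_accuracy` are hypothetical, and the `TABLES` excerpt contains only a few rows copied from the tables on this page. Because each table lists only its top 20, a model missing from a table is skipped rather than counted as zero, and an unweighted mean across different benchmarks is only a rough summary.

```python
from statistics import mean


def parse_rows(table: str) -> dict[str, float]:
    """Map model name -> accuracy (in percent) for one Markdown leaderboard table."""
    scores: dict[str, float] = {}
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Data rows have four cells and start with a rank number;
        # this skips the header row and the |---| separator row.
        if len(cells) != 4 or not cells[0].isdigit():
            continue
        _rank, model, _provider, accuracy = cells
        # Skip rows whose last cell is not a percentage (malformed rows).
        if not accuracy.endswith("%"):
            continue
        scores[model] = float(accuracy.rstrip("%"))
    return scores


def mean_accuracy(tables: dict[str, str], model: str) -> float:
    """Unweighted mean accuracy over the tables in which the model appears."""
    parsed = [parse_rows(t) for t in tables.values()]
    return mean(p[model] for p in parsed if model in p)


# Excerpt only: two rows per benchmark, copied from the tables on this page.
TABLES = {
    "CorpFin v2": """
| # | Model | Provider | Accuracy |
|---|---|---|---|
| 1 | GPT 5.5 | OpenAI | 68.42% |
| 8 | Claude Opus 4.7 | Anthropic | 66.08% |
""",
    "Finance Agent v1.1": """
| # | Model | Provider | Accuracy |
|---|---|---|---|
| 2 | Claude Opus 4.7 | Anthropic | 64.37% |
| 7 | GPT 5.5 | OpenAI | 59.96% |
""",
    "MortgageTax": """
| # | Model | Provider | Accuracy |
|---|---|---|---|
| 1 | Claude Opus 4.7 | Anthropic | 70.27% |
| 5 | GPT 5.5 | OpenAI | 68.76% |
""",
    "TaxEval v2": """
| # | Model | Provider | Accuracy |
|---|---|---|---|
| 7 | Claude Opus 4.7 | Anthropic | 75.27% |
| 10 | GPT 5.5 | OpenAI | 74.98% |
""",
}

if __name__ == "__main__":
    for name in ("Claude Opus 4.7", "GPT 5.5"):
        print(f"{name}: {mean_accuracy(TABLES, name):.2f}%")
```

The averages ignore differences in benchmark difficulty and size, so they are best read alongside the per-benchmark ranks rather than as a single score.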
Source: vals.ai. To go further, read our article "Alpha Arena: 6 LLMs Traded on Real Crypto".