How hard is each dilemma?
[Figure: botch rate (0% to 100%) plotted against scenario rank (1–100), with the axis annotated "easier" and "harder".]
Botching is when the model fails to commit to a canonical response cluster (hedges, refuses, or wanders off-task).
Baseline botch rate per model group
[Bar chart; axis from 0% to 20%.]
Grok 4 Family: 6.5%
GPT 5.5 Family: 7.0%
GPT 5 Family: 7.0%
Gemini 3.0 Family: 8.0%
Claude 4.0 Family: 9.0%
Claude 4.5 Family: 15.7%
Gemini 2.5 Family: 16.0%
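As a rough illustration of how a botch rate like the ones above could be tallied, the sketch below assumes each response has already been labeled with a response cluster, and that a botched response is recorded as `None`. The record layout, the scenario and model names, and the function names are all hypothetical; the source does not describe its pipeline.

```python
from collections import defaultdict

# Hypothetical records: (scenario_id, model, cluster).
# cluster is None when the response was botched, i.e. it did not
# commit to any canonical response cluster.
records = [
    ("s1", "model_a", "A"),
    ("s1", "model_b", None),
    ("s2", "model_a", "B"),
    ("s2", "model_b", "B"),
    ("s3", "model_a", None),
    ("s3", "model_b", None),
]

def botch_rate_by_scenario(records):
    """Return {scenario_id: fraction of responses that were botched}."""
    totals = defaultdict(int)
    botched = defaultdict(int)
    for scenario, _model, cluster in records:
        totals[scenario] += 1
        if cluster is None:
            botched[scenario] += 1
    return {s: botched[s] / totals[s] for s in totals}

def rank_scenarios(rates):
    """Order scenarios from lowest to highest botch rate (ties by id)."""
    return sorted(rates, key=lambda s: (rates[s], s))

rates = botch_rate_by_scenario(records)
ranking = rank_scenarios(rates)
```

Grouping by model family instead of by scenario would follow the same pattern, with the family name as the dictionary key.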
Frequency at which the Model Obeys the Claude Constitution
[Bar chart; axis from 0% to 60%.]
Claude 4.5 Family: 52.0%
GPT 5.5 Family: 44.1%
GPT 5 Family: 41.5%
Claude 4.0 Family: 39.6%
Gemini 3.0 Family: 37.5%
Gemini 2.5 Family: 37.3%
Grok 4 Family: 33.7%

Share of reasoning that invokes a moral frame
[Figure: share from 0% to 100% for each model group (Claude 4.5 Family, Gemini 3.0 Family, Grok 4 Family, Gemini 2.5 Family, Claude 4.0 Family, GPT 5 Family, GPT 5.5 Family), shown under three conditions: D-primed, Baseline, and C-primed.]
How often do two models pick the same response cluster?
[Heatmap; color scale from 40% to 80%. Each cell is the number of scenarios, out of 100, on which the two models picked the same response cluster. The matrix is symmetric; column numbers refer to the numbered rows.]

                      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 1 Opus 4.7          -- 70 69 72 68 64 56 57 55 54 56 48 54 45 55 55 60 53 50 52 46 56 45 50
 2 Sonnet 4.6        70 -- 71 74 67 68 56 57 55 53 54 54 59 47 59 58 59 55 48 54 50 48 44 51
 3 Opus 4.6          69 71 -- 74 63 68 57 57 59 56 59 50 57 42 57 60 57 56 52 59 54 56 44 52
 4 Opus 4.5          72 74 74 -- 67 68 59 56 58 55 60 57 58 49 57 55 61 55 49 55 50 53 47 52
 5 Haiku 4.5         68 67 63 67 -- 75 67 65 64 60 69 59 63 54 71 65 65 67 56 58 56 66 55 61
 6 Sonnet 4.5        64 68 68 68 75 -- 73 68 73 64 64 65 60 58 69 69 62 67 65 66 60 65 58 66
 7 Opus 4.1          56 56 57 59 67 73 -- 84 75 72 71 67 67 61 75 72 62 69 70 66 62 71 62 76
 8 Opus 4            57 57 57 56 65 68 84 -- 76 76 76 69 78 60 79 76 65 71 68 69 65 76 70 79
 9 Sonnet 4          55 55 59 58 64 73 75 76 -- 71 65 62 66 63 77 65 60 66 65 62 61 66 63 71
10 Gemini 3.1 Lite   54 53 56 55 60 64 72 76 71 -- 72 73 71 61 70 64 65 70 64 64 60 71 63 71
11 Gemini 3.1 Pro    56 54 59 60 69 64 71 76 65 72 -- 63 82 52 71 72 79 78 65 62 63 73 65 70
12 Gemini 3 Flash    48 54 50 57 59 65 67 69 62 73 63 -- 64 60 64 62 59 67 59 64 54 69 62 64
13 Gemini 3 Pro      54 59 57 58 63 60 67 78 66 71 82 64 -- 53 67 67 73 73 62 66 60 69 61 68
14 Gemini 2.5 Lite   45 47 42 49 54 58 61 60 63 61 52 60 53 -- 66 52 51 55 60 58 56 57 54 55
15 Gemini 2.5 Flash  55 59 57 57 71 69 75 79 77 70 71 64 67 66 -- 69 67 71 70 63 66 66 69 78
16 Gemini 2.5 Pro    55 58 60 55 65 69 72 76 65 64 72 62 67 52 69 -- 61 66 64 60 55 67 59 69
17 GPT 5.5           60 59 57 61 65 62 62 65 60 65 79 59 73 51 67 61 -- 77 64 64 66 68 60 67
18 GPT 5.4           53 55 56 55 67 67 69 71 66 70 78 67 73 55 71 66 77 -- 68 63 62 73 63 68
19 GPT 5.3           50 48 52 49 56 65 70 68 65 64 65 59 62 60 70 64 64 68 -- 67 69 69 65 76
20 GPT 5.2           52 54 59 55 58 66 66 69 62 64 62 64 66 58 63 60 64 63 67 -- 75 72 61 67
21 GPT 5.1           46 50 54 50 56 60 62 65 61 60 63 54 60 56 66 55 66 62 69 75 -- 67 63 67
22 GPT 5             56 48 56 53 66 65 71 76 66 71 73 69 69 57 66 67 68 73 69 72 67 -- 62 72
23 Grok 4.2          45 44 44 47 55 58 62 70 63 63 65 62 61 54 69 59 60 63 65 61 63 62 -- 80
24 Grok 4.1          50 51 52 52 61 66 76 79 71 71 70 64 68 55 78 69 67 68 76 67 67 72 80 --
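For readers who want to reproduce this kind of pairwise agreement count on their own data, here is a minimal sketch. It assumes each model's answers are stored as a mapping from scenario id to cluster label, with `None` for a botched response, and it assumes that two botched responses do not count as agreement. Both the data layout and that tie-breaking choice are assumptions, not something the source specifies; the model names `m1` to `m3` are placeholders.

```python
from itertools import combinations

def agreement(labels_a, labels_b):
    """Count scenarios where both models committed to the same cluster.

    Assumption: a botched response (None) never counts as agreement,
    even if both models botched the same scenario.
    """
    shared = labels_a.keys() & labels_b.keys()
    return sum(
        1
        for s in shared
        if labels_a[s] is not None and labels_a[s] == labels_b[s]
    )

def agreement_matrix(labels):
    """Return {(model_a, model_b): count} for every unordered model pair."""
    return {
        (a, b): agreement(labels[a], labels[b])
        for a, b in combinations(sorted(labels), 2)
    }

# Hypothetical cluster labels for three models on three scenarios.
labels = {
    "m1": {"s1": "A", "s2": "B", "s3": None},
    "m2": {"s1": "A", "s2": "C", "s3": None},
    "m3": {"s1": "A", "s2": "B", "s3": "A"},
}
pairs = agreement_matrix(labels)
```

Because agreement is symmetric, only one entry per unordered pair is stored; a full square matrix like the one above can be filled in by mirroring each value.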