
Philosophy Bench Live Dashboard

Philosophy Bench is still under active development. The results below are the latest available. The code for this benchmark is on GitHub.

Overall

How hard is each dilemma?

[Figure: botch rate (0% to 100%) by scenario rank (1 to 100), with harder scenarios on the left and easier scenarios on the right.]

Botching is when the model fails to commit to a canonical response cluster (hedges, refuses, or wanders off-task).
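The definition above can be turned into a rate with a few lines of code. This is a minimal sketch, not the benchmark's own implementation: it assumes each response has already been labeled with a cluster name, and that `None` marks a botched response (a hedge, refusal, or off-task answer). The function name and the label values are hypothetical.

```python
from typing import Optional, Sequence


def botch_rate(clusters: Sequence[Optional[str]]) -> float:
    """Fraction of responses that did not commit to a canonical cluster.

    Assumption: None stands for a botched response; any string is a
    canonical response cluster.
    """
    if not clusters:
        raise ValueError("need at least one response")
    botched = sum(1 for c in clusters if c is None)
    return botched / len(clusters)


# Hypothetical labels for five responses to one scenario.
labels = ["D", None, "C", "D", None]
print(f"{botch_rate(labels):.0%}")  # 40%
```

Averaging this value per scenario would give a difficulty curve, and averaging it per model group would give a baseline rate, assuming the dashboard aggregates in that way.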

Baseline botch rate per model group

Grok 4 Family: 6.5%
GPT 5.5 Family: 7.0%
GPT 5 Family: 7.0%
Gemini 3.0 Family: 8.0%
Claude 4.0 Family: 9.0%
Claude 4.5 Family: 15.7%
Gemini 2.5 Family: 16.0%

Frequency at which the Model Obeys the Claude Constitution

Claude 4.5 Family: 52.0%
GPT 5.5 Family: 44.1%
GPT 5 Family: 41.5%
Claude 4.0 Family: 39.6%
Gemini 3.0 Family: 37.5%
Gemini 2.5 Family: 37.3%
Grok 4 Family: 33.7%

Three models were asked to grade which response was the most aligned with the Claude Constitution. Each percentage is the overlap between a model family's responses and the constitutionally implied best response.
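As an illustration of how such an overlap could be computed, here is a minimal sketch. The page does not say how the three graders' judgments are combined, so the strict-majority rule below is an assumption, as are the function names, the scenario IDs, and the cluster labels.

```python
from collections import Counter
from typing import Dict, List, Optional


def majority_label(votes: List[str]) -> Optional[str]:
    """Label picked by a strict majority of graders, or None if there is none."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def constitution_overlap(
    model_choice: Dict[str, str],
    grader_votes: Dict[str, List[str]],
) -> float:
    """Share of scored scenarios where the model's cluster matches the graders' pick.

    Assumption: scenarios without a grader majority are skipped.
    """
    scored = 0
    matches = 0
    for scenario, votes in grader_votes.items():
        best = majority_label(votes)
        if best is None:
            continue
        scored += 1
        if model_choice.get(scenario) == best:
            matches += 1
    return matches / scored if scored else 0.0


# Hypothetical data: three graders, three scenarios.
votes = {"s1": ["A", "A", "B"], "s2": ["B", "B", "B"], "s3": ["A", "B", "C"]}
choices = {"s1": "A", "s2": "C", "s3": "A"}
print(constitution_overlap(choices, votes))  # 0.5
```

In this toy data, s3 has no majority and is dropped, so the model matches one of the two scored scenarios.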

Share of reasoning that invokes a moral frame

[Figure: share of reasoning that invokes a moral frame (0% to 100%) per model family, under D-primed, baseline, and C-primed conditions. Families shown: Claude 4.5, Gemini 3.0, Grok 4, Gemini 2.5, Claude 4.0, GPT 5, GPT 5.5.]

How often do two models pick the same response cluster?

Number of scenarios (out of 100) on which each pair of models picks the same response cluster. The matrix is symmetric, so only the lower triangle is shown; column numbers refer to the row numbers. The heatmap color scale runs from 40% to 80%.

 #  Model              1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 1  Opus 4.7
 2  Sonnet 4.6        70
 3  Opus 4.6          69 71
 4  Opus 4.5          72 74 74
 5  Haiku 4.5         68 67 63 67
 6  Sonnet 4.5        64 68 68 68 75
 7  Opus 4.1          56 56 57 59 67 73
 8  Opus 4            57 57 57 56 65 68 84
 9  Sonnet 4          55 55 59 58 64 73 75 76
10  Gemini 3.1 Lite   54 53 56 55 60 64 72 76 71
11  Gemini 3.1 Pro    56 54 59 60 69 64 71 76 65 72
12  Gemini 3 Flash    48 54 50 57 59 65 67 69 62 73 63
13  Gemini 3 Pro      54 59 57 58 63 60 67 78 66 71 82 64
14  Gemini 2.5 Lite   45 47 42 49 54 58 61 60 63 61 52 60 53
15  Gemini 2.5 Flash  55 59 57 57 71 69 75 79 77 70 71 64 67 66
16  Gemini 2.5 Pro    55 58 60 55 65 69 72 76 65 64 72 62 67 52 69
17  GPT 5.5           60 59 57 61 65 62 62 65 60 65 79 59 73 51 67 61
18  GPT 5.4           53 55 56 55 67 67 69 71 66 70 78 67 73 55 71 66 77
19  GPT 5.3           50 48 52 49 56 65 70 68 65 64 65 59 62 60 70 64 64 68
20  GPT 5.2           52 54 59 55 58 66 66 69 62 64 62 64 66 58 63 60 64 63 67
21  GPT 5.1           46 50 54 50 56 60 62 65 61 60 63 54 60 56 66 55 66 62 69 75
22  GPT 5             56 48 56 53 66 65 71 76 66 71 73 69 69 57 66 67 68 73 69 72 67
23  Grok 4.2          45 44 44 47 55 58 62 70 63 63 65 62 61 54 69 59 60 63 65 61 63 62
24  Grok 4.1          50 51 52 52 61 66 76 79 71 71 70 64 68 55 78 69 67 68 76 67 67 72 80
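Pairwise agreement counts of this kind can be computed directly from per-scenario cluster picks. The sketch below is illustrative only: the function name and data layout are hypothetical, and it assumes that a botched response (`None`) never counts as agreement, which the page does not confirm.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def pairwise_agreement(
    picks: Dict[str, List[Optional[str]]],
) -> Dict[Tuple[str, str], int]:
    """Count, for every model pair, the scenarios where both chose the same cluster.

    picks maps a model name to its cluster per scenario, in a shared
    scenario order. Assumption: None (a botch) never matches anything.
    """
    agreement = {}
    for a, b in combinations(picks, 2):
        agreement[(a, b)] = sum(
            1
            for x, y in zip(picks[a], picks[b])
            if x is not None and x == y
        )
    return agreement


# Hypothetical picks for three models on four scenarios.
picks = {
    "model_a": ["D", "C", "D", None],
    "model_b": ["D", "D", "D", None],
    "model_c": ["C", "C", "D", "D"],
}
for pair, count in pairwise_agreement(picks).items():
    print(pair, count)
# ('model_a', 'model_b') 2
# ('model_a', 'model_c') 2
# ('model_b', 'model_c') 1
```

With 100 scenarios per model, each count reads directly as a percentage.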

Deontological / Consequentialist Lean

Decision lean by model family under priming

[Figure: decision lean on a scale from -1 (more deontological) to +1 (more consequentialist) under D-primed, baseline, and C-primed conditions. Families shown: Claude 4.5, Claude 4.0, Gemini 3.0, Gemini 2.5, GPT 5.5, GPT 5, Grok 4.]

Anthropic models under priming

[Figure: decision lean on a scale from -1 (more deontological) to +1 (more consequentialist) under D-primed, baseline, and C-primed conditions. Models shown: Opus 4.7, Sonnet 4.6, Opus 4.6, Opus 4.5, Haiku 4.5, Sonnet 4.5, Opus 4.1, Opus 4, Sonnet 4.]

Gemini models under priming

[Figure: decision lean on a scale from -1 (more deontological) to +1 (more consequentialist) under D-primed, baseline, and C-primed conditions. Models shown: Gemini 3.1 Lite, Gemini 3.1 Pro, Gemini 3 Flash, Gemini 3 Pro, Gemini 2.5 Lite, Gemini 2.5 Flash, Gemini 2.5 Pro.]

OpenAI models under priming

[Figure: decision lean on a scale from -1 (more deontological) to +1 (more consequentialist) under D-primed, baseline, and C-primed conditions. Models shown: GPT 5.5, GPT 5.4, GPT 5.3, GPT 5.2, GPT 5.1, GPT 5.]

xAI models under priming

[Figure: decision lean on a scale from -1 (more deontological) to +1 (more consequentialist) under D-primed, baseline, and C-primed conditions. Models shown: Grok 4.2, Grok 4.1.]

User Compliance

Do models shift their ethics based on user requests?

How much more likely is a model to select an ethical framework when the user is advocating for it than when the user is advocating against it?

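[Chart: shift toward the framework the user asked for. The axis runs from 100 (user asked for D) through 0 to 100 (user asked for C).]

Opus 4.7: D+14, C+11
GPT 5.4: D+68, C+38
Gemini 3.1 Pro: D+59, C+43
Grok 4.2: D+59, C+41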
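One way to read these shifts is as a difference between two selection rates, expressed in percentage points. The sketch below assumes that reading; the trial format, the function names, and the data are hypothetical and not taken from the benchmark code.

```python
from typing import List, Optional, Tuple

# Each trial: (framework the user asked for, framework the model selected).
# Assumption: None marks a response that committed to neither framework.
Trial = Tuple[str, Optional[str]]


def selection_rate(trials: List[Trial], request: str, framework: str) -> float:
    """P(model selects `framework` | user asked for `request`)."""
    chosen = [pick for asked, pick in trials if asked == request]
    if not chosen:
        raise ValueError(f"no trials with request {request!r}")
    return sum(1 for pick in chosen if pick == framework) / len(chosen)


def compliance_shift(trials: List[Trial], framework: str) -> float:
    """Percentage-point gain in selecting `framework` when the user argues for it
    rather than for the opposing framework."""
    other = "C" if framework == "D" else "D"
    p_for = selection_rate(trials, framework, framework)
    p_against = selection_rate(trials, other, framework)
    return (p_for - p_against) * 100


# Hypothetical trials: four D-style requests, four C-style requests.
trials = [
    ("D", "D"), ("D", "D"), ("D", "C"), ("D", "D"),
    ("C", "D"), ("C", "C"), ("C", "C"), ("C", None),
]
print(compliance_shift(trials, "D"))  # 50.0
print(compliance_shift(trials, "C"))  # 25.0
```

A model that ignored the user's framing entirely would score close to 0 on both shifts under this definition.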

User compliance during ethical dilemmas

[Figure: user compliance rate (0% to 100%) per model family, with one bar for D-style requests (user favors principle over outcome) and one for C-style requests (user favors outcome over principle). Families shown: Claude 4.5, Claude 4.0, Gemini 3.0, Gemini 2.5, GPT 5.5, GPT 5, Grok 4.]

Does the model do what the user asks?

Two bars per model, in chart order:

Opus 4.7: 27%, 59%
GPT 5.4: 49%, 81%
Gemini 3.1 Pro: 51%, 84%
Grok 4.2: 49%, 78%

Legend: D-style request (user favors principle over outcome); C-style request (user favors outcome over principle).

Probability Deontological Option Selected

Two bars per model, in chart order:

Opus 4.7: 46%, 59%
GPT 5.4: 14%, 81%
Gemini 3.1 Pro: 24%, 84%
Grok 4.2: 19%, 78%

Legend: D-style request (user favors principle over outcome); C-style request (user favors outcome over principle).