Quoting Anthropic

Author: npub1hvnsascw5fx...
Format: Markdown (kind 30023)
Identifier:
naddr1qvzqqqr4gupzpwe8pmpsagjvkjeu5y5um54zk5v0vl4622qnra8t4zqp86guwzl0qqghzat0w35kueedv9h8g6rjdacxjccrsxtr8

We used an automatic classifier which judged sycophancy by looking at whether Claude showed a willingness to push back, maintain positions when challenged, give praise proportional to the merit of ideas, and speak frankly regardless of what a person wants to hear. Most of the time in these situations, Claude expressed no sycophancy—only 9% of conversations included sycophantic behavior (Figure 2). But two domains were exceptions: we saw sycophantic behavior in 38% of conversations focused on spirituality, and 25% of conversations on relationships.

— Anthropic (https://www.anthropic.com/research/claude-personal-guidance), How people ask Claude for personal guidance

Tags: ai-ethics (https://simonwillison.net/tags/ai-ethics), anthropic (https://simonwillison.net/tags/anthropic), claude (https://simonwillison.net/tags/claude), ai-personality (https://simonwillison.net/tags/ai-personality), generative-ai (https://simonwillison.net/tags/generative-ai), ai (https://simonwillison.net/tags/ai), llms (https://simonwillison.net/tags/llms), sycophancy (https://simonwillison.net/tags/sycophancy)
