We used an automatic classifier which judged sycophancy by looking at whether Claude showed a willingness to push back, maintain positions when challenged, give praise proportional to the merit of ideas, and speak frankly regardless of what a person wants to hear. Most of the time in these situations, Claude expressed no sycophancy—only 9% of conversations included sycophantic behavior (Figure 2). But two domains were exceptions: we saw sycophantic behavior in 38% of conversations focused on spirituality, and 25% of conversations on relationships.
— Anthropic (https://www.anthropic.com/research/claude-personal-guidance), How people ask Claude for personal guidance
Tags: ai-ethics (https://simonwillison.net/tags/ai-ethics), anthropic (https://simonwillison.net/tags/anthropic), claude (https://simonwillison.net/tags/claude), ai-personality (https://simonwillison.net/tags/ai-personality), generative-ai (https://simonwillison.net/tags/generative-ai), ai (https://simonwillison.net/tags/ai), llms (https://simonwillison.net/tags/llms), sycophancy (https://simonwillison.net/tags/sycophancy)