Large language model (LLM) agents have shown remarkable progress in social deduction games (SDGs). However, existing approaches primarily focus on information processing and strategy selection, overlooking the significance of persuasive communication in influencing other players' beliefs and responses. In SDGs, success depends not only on making correct deductions but also on convincing others to respond in alignment with one's intent.
To address this limitation, we formalize turn-based dialogue in SDGs as a Stackelberg competition, where the current player acts as the leader who strategically influences the follower's response. Building on this theoretical foundation, we propose a reinforcement learning framework that trains agents to optimize utterances for persuasive impact. Through comprehensive experiments across four diverse social deduction benchmarks, we demonstrate that our agents significantly outperform baselines.
Existing methods primarily focus on processing environmental information (such as identifying other players' roles) and selecting strategies based on this information.
In contrast, our method measures the next player's response distribution and optimizes each utterance for its persuasive impact on that player's response.
We model the interaction between consecutive players as a two-player Stackelberg competition, where the current player acts as the leader, and the next player acts as the follower. If the leader sufficiently understands how the follower will respond, they can maximize their utility by selecting actions that optimize their expected outcomes.
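Under this model, the leader-follower interaction can be written as a small bilevel problem; the notation below is our own sketch, not taken from the paper:

```latex
% Leader chooses utterance a; the follower replies with b drawn from
% its response distribution \pi_F(\cdot \mid a).
a^{\star} = \arg\max_{a \in \mathcal{A}} \;
            \mathbb{E}_{b \sim \pi_F(\cdot \mid a)}\!\left[ u_L(a, b) \right]
```

Here $u_L$ is the leader's utility; because the follower's reply is stochastic, the leader maximizes expected utility over the response distribution rather than a single best response.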
Stackelberg optimization process. First, the leader identifies its strategic intent by analyzing the current situation. Then, the leader measures the follower's response distribution to different leader actions. Finally, the leader optimizes its strategy to maximize its utility given the follower's response distribution.
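The three steps above can be sketched in code. This is an illustrative toy, not the paper's implementation: `follower_response_probs`, `leader_utility`, and the candidate utterances are all hypothetical stand-ins.

```python
# Toy sketch of the Stackelberg loop: enumerate candidate utterances,
# score each by expected leader utility under the follower's response
# distribution, and pick the argmax.

def best_utterance(candidates, follower_response_probs, leader_utility):
    """Pick the candidate utterance with the highest expected leader utility.

    candidates: list of utterance strings the leader is considering.
    follower_response_probs(a): dict mapping follower responses -> probability.
    leader_utility(a, b): scalar payoff for the leader when the follower says b.
    """
    def expected_utility(a):
        probs = follower_response_probs(a)
        return sum(p * leader_utility(a, b) for b, p in probs.items())

    return max(candidates, key=expected_utility)

# Hypothetical example: accusing is better if it makes the follower agree.
probs = {
    "I accuse Bob": {"agree": 0.7, "push back": 0.3},
    "Stay quiet":   {"agree": 0.2, "push back": 0.8},
}
utility = {"agree": 1.0, "push back": -1.0}
choice = best_utterance(
    list(probs),
    lambda a: probs[a],
    lambda a, b: utility[b],
)
print(choice)  # "I accuse Bob" (expected utility 0.4 vs. -0.6)
```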
The training framework of our agent. Dark blue arrows indicate the inference pipeline, while light blue arrows represent additional processes during training. The backend LLM identifies desired and undesired target responses, then generates a base utterance. The Refiner enhances it for maximum persuasive impact. The Measurer computes rewards by measuring how different refined utterances affect the probabilities of generating desired and undesired responses.
The leader identifies its persuasive intent: a set of desired responses that would be advantageous if spoken by the follower, and a set of undesired responses that would be disadvantageous.
The Measurer evaluates how different utterances shift the probability distribution of the follower's responses toward desired outcomes, computing a reward signal for training.
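One natural shape for such a reward is the follower's probability mass on desired responses minus the mass on undesired ones. The sketch below assumes a hypothetical `response_logprob` helper and may differ from the paper's exact formula:

```python
import math

def persuasion_reward(utterance, desired, undesired, response_logprob):
    """Reward = follower probability mass on desired responses
    minus mass on undesired responses, after hearing `utterance`.

    response_logprob(utterance, response): log-probability the follower
    model assigns to `response` given `utterance` (hypothetical helper).
    """
    p_desired = sum(math.exp(response_logprob(utterance, r)) for r in desired)
    p_undesired = sum(math.exp(response_logprob(utterance, r)) for r in undesired)
    return p_desired - p_undesired

# Toy follower model for illustration only.
def toy_logprob(utt, resp):
    table = {("I accuse Bob", "agree"): math.log(0.6),
             ("I accuse Bob", "defend Bob"): math.log(0.1)}
    return table[(utt, resp)]

reward = persuasion_reward("I accuse Bob", ["agree"], ["defend Bob"], toy_logprob)
```

A higher reward means the utterance shifted more of the follower's response distribution toward the leader's desired outcomes.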
Using GRPO, the Refiner is trained to maximize persuasive impact by computing relative advantages within training batches, without requiring an explicit critic model.
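The critic-free part of GRPO comes from normalizing each sample's reward against its own sampling group. A minimal sketch of that advantage computation (details are our assumption, not the paper's exact recipe):

```python
import statistics

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages: A_i = (r_i - mean(group)) / (std(group) + eps).

    group_rewards: rewards for a batch of refined utterances sampled from the
    same prompt; normalizing within the group removes the need for a critic.
    """
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]

advantages = grpo_advantages([1.0, 0.0, -1.0])
```

Utterances with above-average persuasive reward get positive advantages and are reinforced; below-average ones are suppressed.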
- Classic social deduction with Werewolves, Seer, and Guardian. 7 players engage in night actions and day discussions.
- Team-based deduction with hidden roles including Merlin, Assassin, and Servants; features quest missions and assassination. 5 players.
- One Night Ultimate Werewolf: a fast-paced single-night variant with role-swapping mechanics. 5 players.
- Open-ended social simulation with diverse interpersonal scenarios requiring negotiation and cooperation. 2 players.

@inproceedings{
zheng2026thestackelbergspeaker,
title={The Stackelberg Speaker: Optimizing Persuasive Communication in Social Deduction Games},
author={Zhang Zheng and Deheng Ye and Peilin Zhao and Hao Wang},
booktitle={The 64th Annual Meeting of the Association for Computational Linguistics},
year={2026},
url={https://openreview.net/forum?id=mqzGJF0nc3}
}