Decision-ToM Utility Fitting

This page expands Section 4.2 in the paper: a low-dimensional utility model for predicting accept/reject decisions from ToM-valued gain/loss terms, and what the fitted coefficients reveal about strategic preference structure.

Paper: Section 4.2 (Figure 6 and Table 1, logistic utility model)

Utility Construction from ToM Reports

For one incoming proposal, the decision maker would receive the item multiset I_recv and give up the item multiset I_give. Item utilities are read from the decision maker's own reported ToM dictionaries.

\[ G_k=\sum_{i\in I_{\mathrm{recv}}} v_i^{(k)},\qquad L_k=\sum_{i\in I_{\mathrm{give}}} v_i^{(k)} \]

\[ U^{(0)}=G_0-\rho L_0 \]

\[ U^{(0,1)}=U^{(0)}+w_1\!\left(G_1-\gamma L_1\right) \]

\[ U^{(0,1,2)}=U^{(0,1)}+w_2\!\left(G_2-\kappa L_2\right) \]

Here G_k is the perceived gain and L_k the perceived loss under ToM order k; rho, gamma, and kappa weight the loss terms, while w1 and w2 weight the higher-order contributions. Higher-order utilities are built progressively, adding first-order and then second-order ToM terms to the zero-order baseline.
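As a concrete illustration, the utility construction above can be sketched in a few lines. The ToM dictionaries, item names, and parameter values below are invented for the example and are not taken from the paper.

```python
def order_terms(recv, give, tom_values):
    """Perceived gain/loss for one ToM order: sum item values over each multiset."""
    gain = sum(tom_values[item] for item in recv)
    loss = sum(tom_values[item] for item in give)
    return gain, loss

def utility(recv, give, tom, rho, gamma=0.0, kappa=0.0, w1=0.0, w2=0.0, order=2):
    """U^(0), U^(0,1), or U^(0,1,2), depending on `order`."""
    g0, l0 = order_terms(recv, give, tom[0])
    u = g0 - rho * l0                      # U^(0) = G0 - rho*L0
    if order >= 1:
        g1, l1 = order_terms(recv, give, tom[1])
        u += w1 * (g1 - gamma * l1)        # + w1*(G1 - gamma*L1)
    if order >= 2:
        g2, l2 = order_terms(recv, give, tom[2])
        u += w2 * (g2 - kappa * l2)        # + w2*(G2 - kappa*L2)
    return u

# Toy ToM reports: order k maps item -> value under that perspective.
tom = {
    0: {"book": 5, "hat": 2},   # self valuation
    1: {"book": 1, "hat": 4},   # estimated opponent valuation
    2: {"book": 4, "hat": 2},   # estimated opponent's estimate of self
}
u = utility(recv=["book"], give=["hat", "hat"], tom=tom,
            rho=1.2, gamma=0.8, kappa=0.5, w1=0.5, w2=0.25, order=2)
```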

Important: this is a behavioral approximation model. It is used to explain observed decisions, not to claim an internal mechanistic decomposition of the LLM.

Decision Classifier

Accept/reject is modeled with logistic regression on standardized utility:

\[ P(y=1\mid U)=\sigma\!\left(\beta\,\widetilde{U}+b\right) \]

  • Optimizer: LBFGS.
  • Utility parameters (rho, gamma, kappa, w1, w2): selected by grid search.
  • Validation: episode-wise 3-fold cross-validation.
  • Reported metric: Macro-F1 under zero/first/second-order settings.
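The fitting loop described by the bullets above can be sketched as follows, assuming a scikit-learn setup. The data, episode grouping, and parameter grid are synthetic placeholders; the actual pipeline may differ in its grid and preprocessing details.

```python
# Sketch: grid-search utility parameters, score each candidate with
# episode-wise 3-fold CV of an LBFGS logistic regression on standardized U.
import numpy as np
from itertools import product
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n = 120
episode_ids = np.repeat(np.arange(12), 10)   # 12 episodes x 10 decisions (toy)
G = rng.uniform(0, 10, size=(n, 3))          # columns: G0, G1, G2 (toy data)
L = rng.uniform(0, 10, size=(n, 3))          # columns: L0, L1, L2 (toy data)

def utility(params):
    rho, gamma, kappa, w1, w2 = params
    return (G[:, 0] - rho * L[:, 0]
            + w1 * (G[:, 1] - gamma * L[:, 1])
            + w2 * (G[:, 2] - kappa * L[:, 2]))

# Synthetic accept/reject labels generated from one "true" parameter set.
y = (utility((1.0, 0.5, 0.5, 0.3, 0.1)) > 0).astype(int)

grid = product([0.5, 1.0, 1.5],   # rho
               [0.0, 0.5, 1.0],   # gamma
               [0.0, 0.5, 1.0],   # kappa
               [0.0, 0.3, 0.6],   # w1
               [0.0, 0.1, 0.3])   # w2

best_f1, best_params = -1.0, None
cv = GroupKFold(n_splits=3)       # episode-wise 3-fold CV
for params in grid:
    U = utility(params).reshape(-1, 1)
    scores = []
    for tr, te in cv.split(U, y, groups=episode_ids):
        scaler = StandardScaler().fit(U[tr])        # standardize utility
        clf = LogisticRegression(solver="lbfgs")    # P(y=1|U) = sigma(beta*U~ + b)
        clf.fit(scaler.transform(U[tr]), y[tr])
        pred = clf.predict(scaler.transform(U[te]))
        scores.append(f1_score(y[te], pred, average="macro"))
    mean_f1 = float(np.mean(scores))
    if mean_f1 > best_f1:
        best_f1, best_params = mean_f1, params
```

Grouping folds by episode (rather than splitting decisions at random) prevents leakage between decisions from the same negotiation.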

Predictability and Coefficient Structure

[Figure 6: normalized logistic coefficients for the zero-, first-, and second-order settings]
Figure 6. Row-L1-normalized signed coefficients over c_G0, c_L0, c_G1, c_L1, c_G2, c_L2 across ToM-order settings.
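The row-L1 normalization used for Figure 6 can be reproduced in a few lines; the coefficient matrix below is invented for illustration (rows are ToM-order settings, columns are c_G0, c_L0, c_G1, c_L1, c_G2, c_L2).

```python
import numpy as np

coefs = np.array([
    [1.8, -0.9, 0.0,  0.0, 0.0,  0.0],   # zero-order setting (toy values)
    [1.2, -0.7, 0.4, -0.3, 0.0,  0.0],   # first-order setting (toy values)
    [0.9, -0.6, 0.3, -0.4, 0.5, -0.3],   # second-order setting (toy values)
])

# Divide each row by its L1 norm so magnitudes are comparable across
# settings; signs (direction of each effect) are preserved.
normalized = coefs / np.abs(coefs).sum(axis=1, keepdims=True)
```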

Table 1: Macro-F1 (mean ± std)

| Model | Zero | First | Second |
|---|---|---|---|
| GPT-4o | 0.77 ± 0.28 | 0.94 ± 0.09 | 1.00 ± 0.00 |
| o3 | 0.80 ± 0.19 | 1.00 ± 0.00 | 1.00 ± 0.00 |
| o4-mini | 0.74 ± 0.18 | 0.80 ± 0.09 | 0.83 ± 0.04 |
| GPT-5 (medium) | 0.74 ± 0.24 | 0.92 ± 0.06 | 0.85 ± 0.11 |

Every model-setting combination reaches a Macro-F1 of at least 0.74, indicating that a compact ToM-derived utility signal suffices to predict much of the accept/reject behavior.

Feature Semantics (Table 2 in Paper)

| Feature | Meaning (from decision maker reports) |
|---|---|
| G0 | Value of items the decision maker would receive (self valuation). |
| L0 | Value of items the decision maker would give (self valuation). |
| G1 | Value of receive-items under the decision maker's estimate of the opponent's valuation. |
| L1 | Value of give-items under the decision maker's estimate of the opponent's valuation. |
| G2 | Value of receive-items under the decision maker's estimate of the opponent's estimate of self. |
| L2 | Value of give-items under the decision maker's estimate of the opponent's estimate of self. |
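A hypothetical helper that maps one proposal, plus the decision maker's reported ToM dictionaries for orders 0-2, onto the six features above (the item names and values are invented for the example):

```python
def tom_features(recv, give, tom):
    """Return [G0, L0, G1, L1, G2, L2] for one proposal."""
    feats = []
    for k in (0, 1, 2):                              # one ToM order at a time
        feats.append(sum(tom[k][i] for i in recv))   # G_k: perceived gain
        feats.append(sum(tom[k][i] for i in give))   # L_k: perceived loss
    return feats

tom = {0: {"book": 5, "hat": 2},    # self valuation
       1: {"book": 1, "hat": 4},    # estimated opponent valuation
       2: {"book": 4, "hat": 2}}    # estimated opponent's estimate of self
features = tom_features(["book"], ["hat"], tom)
```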

What Coefficients Say About Strategy

Zero-Order Setting

Decisions are largely self-gain driven (positive coefficient on G0). More capable models show stronger loss aversion (a more negative coefficient on L0).

First-Order Setting

Models diverge in handling opponent-valued terms. GPT-4o/o4-mini and o3/GPT-5 show qualitatively different first-order weighting patterns.

Second-Order Setting

Across models, G2 tends to be rewarded and L2 penalized, suggesting sensitivity to how trades appear in the opponent's inferred view.

Strategic Layer Hypothesis

For o4-mini and GPT-5, second-order prompting increases caution about handing over items the opponent values highly (the coefficient on L1 becomes more negative).

Bottom-Line Conclusion

The fitted model indicates that accept/reject behavior is not random or purely lexical. It is strongly explainable by structured gain/loss features derived from ToM reports, and higher-order terms can reshape preference geometry in a model-specific way.