Decision-ToM Utility Fitting

This page expands Section 4.2 in the paper: a low-dimensional utility model for predicting accept/reject decisions from ToM-valued gain/loss terms, and what the fitted coefficients reveal about strategic preference structure.

Paper: Section 4.2 (Figure 6 and Table 1, logistic utility model)

Utility Construction from ToM Reports

For one incoming proposal, the decision maker would receive the item multiset I_recv and give up the item multiset I_give. Item utilities are read from the decision maker's own reported ToM dictionaries.

\[ G_k=\sum_{i\in I_{\mathrm{recv}}} v_i^{(k)},\qquad L_k=\sum_{i\in I_{\mathrm{give}}} v_i^{(k)} \]

\[ U^{(0)}=G_0-\rho L_0 \]

\[ U^{(0,1)}=U^{(0)}+w_1\!\left(G_1-\gamma L_1\right) \]

\[ U^{(0,1,2)}=U^{(0,1)}+w_2\!\left(G_2-\kappa L_2\right) \]

Here G_k is the perceived gain and L_k the perceived loss under ToM order k; rho, gamma, and kappa weight the loss terms, while w1 and w2 weight the higher-order contributions. Higher-order utilities are built progressively, adding first-order and then second-order ToM terms to the zero-order baseline.
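As a concrete illustration, the utility construction above can be sketched in a few lines. The ToM dictionaries, item names, and parameter values below are invented for the example and are not taken from the paper.

```python
def order_terms(recv, give, tom_values):
    """Perceived gain/loss for one ToM order: sum item values over each multiset."""
    gain = sum(tom_values[item] for item in recv)
    loss = sum(tom_values[item] for item in give)
    return gain, loss

def utility(recv, give, tom, rho, gamma=0.0, kappa=0.0, w1=0.0, w2=0.0, order=2):
    """U^(0), U^(0,1), or U^(0,1,2), depending on `order`."""
    g0, l0 = order_terms(recv, give, tom[0])
    u = g0 - rho * l0                      # U^(0) = G0 - rho*L0
    if order >= 1:
        g1, l1 = order_terms(recv, give, tom[1])
        u += w1 * (g1 - gamma * l1)        # + w1*(G1 - gamma*L1)
    if order >= 2:
        g2, l2 = order_terms(recv, give, tom[2])
        u += w2 * (g2 - kappa * l2)        # + w2*(G2 - kappa*L2)
    return u

# Toy ToM reports: order k maps item -> value under that perspective.
tom = {
    0: {"book": 5, "hat": 2},   # self valuation
    1: {"book": 1, "hat": 4},   # estimated opponent valuation
    2: {"book": 4, "hat": 2},   # estimated opponent's estimate of self
}
u = utility(recv=["book"], give=["hat", "hat"], tom=tom,
            rho=1.2, gamma=0.8, kappa=0.5, w1=0.5, w2=0.25, order=2)
```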

Important: this is a behavioral approximation model. It is used to explain observed decisions, not to claim an internal mechanistic decomposition of the LLM.

Decision Classifier

Accept/reject is modeled with logistic regression on standardized utility:

\[ P(y=1\mid U)=\sigma\!\left(\beta\,\widetilde{U}+b\right) \]

  • Optimizer: LBFGS.
  • Utility parameters (rho, gamma, kappa, w1, w2): selected by grid search.
  • Validation: episode-wise 3-fold cross-validation.
  • Reported metric: Macro-F1 under zero/first/second-order settings.
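The fitting loop described by the bullets above can be sketched as follows, assuming a scikit-learn setup. The data, episode grouping, and parameter grid are synthetic placeholders; the actual pipeline may differ in its grid and preprocessing details.

```python
# Sketch: grid-search utility parameters, score each candidate with
# episode-wise 3-fold CV of an LBFGS logistic regression on standardized U.
import numpy as np
from itertools import product
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n = 120
episode_ids = np.repeat(np.arange(12), 10)   # 12 episodes x 10 decisions (toy)
G = rng.uniform(0, 10, size=(n, 3))          # columns: G0, G1, G2 (toy data)
L = rng.uniform(0, 10, size=(n, 3))          # columns: L0, L1, L2 (toy data)

def utility(params):
    rho, gamma, kappa, w1, w2 = params
    return (G[:, 0] - rho * L[:, 0]
            + w1 * (G[:, 1] - gamma * L[:, 1])
            + w2 * (G[:, 2] - kappa * L[:, 2]))

# Synthetic accept/reject labels generated from one "true" parameter set.
y = (utility((1.0, 0.5, 0.5, 0.3, 0.1)) > 0).astype(int)

grid = product([0.5, 1.0, 1.5],   # rho
               [0.0, 0.5, 1.0],   # gamma
               [0.0, 0.5, 1.0],   # kappa
               [0.0, 0.3, 0.6],   # w1
               [0.0, 0.1, 0.3])   # w2

best_f1, best_params = -1.0, None
cv = GroupKFold(n_splits=3)       # episode-wise 3-fold CV
for params in grid:
    U = utility(params).reshape(-1, 1)
    scores = []
    for tr, te in cv.split(U, y, groups=episode_ids):
        scaler = StandardScaler().fit(U[tr])        # standardize utility
        clf = LogisticRegression(solver="lbfgs")    # P(y=1|U) = sigma(beta*U~ + b)
        clf.fit(scaler.transform(U[tr]), y[tr])
        pred = clf.predict(scaler.transform(U[te]))
        scores.append(f1_score(y[te], pred, average="macro"))
    mean_f1 = float(np.mean(scores))
    if mean_f1 > best_f1:
        best_f1, best_params = mean_f1, params
```

Grouping folds by episode (rather than splitting decisions at random) prevents leakage between decisions from the same negotiation.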

Predictability and Coefficient Structure

[Figure 6: normalized logistic coefficients for the zero-, first-, and second-order settings]
Figure 6. Row-L1-normalized signed coefficients over c_G0, c_L0, c_G1, c_L1, c_G2, c_L2 across ToM-order settings.
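The row-L1 normalization used for Figure 6 can be reproduced in a few lines; the coefficient matrix below is invented for illustration (rows are ToM-order settings, columns are c_G0, c_L0, c_G1, c_L1, c_G2, c_L2).

```python
import numpy as np

coefs = np.array([
    [1.8, -0.9, 0.0,  0.0, 0.0,  0.0],   # zero-order setting (toy values)
    [1.2, -0.7, 0.4, -0.3, 0.0,  0.0],   # first-order setting (toy values)
    [0.9, -0.6, 0.3, -0.4, 0.5, -0.3],   # second-order setting (toy values)
])

# Divide each row by its L1 norm so magnitudes are comparable across
# settings; signs (direction of each effect) are preserved.
normalized = coefs / np.abs(coefs).sum(axis=1, keepdims=True)
```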

Table 1: Macro-F1 (mean ± std)

| Model | Zero | First | Second |
|---|---|---|---|
| GPT-4o | 0.77 ± 0.28 | 0.94 ± 0.09 | 1.00 ± 0.00 |
| o3 | 0.80 ± 0.19 | 1.00 ± 0.00 | 1.00 ± 0.00 |
| o4-mini | 0.74 ± 0.18 | 0.80 ± 0.09 | 0.83 ± 0.04 |
| GPT-5 (medium) | 0.74 ± 0.24 | 0.92 ± 0.06 | 0.85 ± 0.11 |

Every model-setting combination reaches a Macro-F1 of at least 0.74, indicating that a compact ToM-derived utility signal suffices to predict much of the accept/reject behavior.

Feature Semantics (Table 2 in Paper)

| Feature | Meaning (from decision maker reports) |
|---|---|
| G0 | Value of items the decision maker would receive (self valuation). |
| L0 | Value of items the decision maker would give (self valuation). |
| G1 | Value of receive-items under the decision maker's estimate of the opponent's valuation. |
| L1 | Value of give-items under the decision maker's estimate of the opponent's valuation. |
| G2 | Value of receive-items under the decision maker's estimate of the opponent's estimate of self. |
| L2 | Value of give-items under the decision maker's estimate of the opponent's estimate of self. |
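A hypothetical helper that maps one proposal, plus the decision maker's reported ToM dictionaries for orders 0-2, onto the six features above (the item names and values are invented for the example):

```python
def tom_features(recv, give, tom):
    """Return [G0, L0, G1, L1, G2, L2] for one proposal."""
    feats = []
    for k in (0, 1, 2):                              # one ToM order at a time
        feats.append(sum(tom[k][i] for i in recv))   # G_k: perceived gain
        feats.append(sum(tom[k][i] for i in give))   # L_k: perceived loss
    return feats

tom = {0: {"book": 5, "hat": 2},    # self valuation
       1: {"book": 1, "hat": 4},    # estimated opponent valuation
       2: {"book": 4, "hat": 2}}    # estimated opponent's estimate of self
features = tom_features(["book"], ["hat"], tom)
```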

What Coefficients Say About Strategy

Zero-Order Setting

Decisions are largely self-gain driven (positive coefficient on G0). More capable models show stronger loss aversion (a more negative coefficient on L0).

First-Order Setting

Models diverge in handling opponent-valued terms. GPT-4o/o4-mini and o3/GPT-5 show qualitatively different first-order weighting patterns.

Second-Order Setting

Across models, G2 tends to be rewarded and L2 penalized, suggesting sensitivity to how trades appear in the opponent's inferred view.

Strategic Layer Hypothesis

For o4-mini and GPT-5, second-order prompting increases caution about handing over items the opponent values highly (the coefficient on L1 becomes more negative).

Bottom-Line Conclusion

The fitted model indicates that accept/reject behavior is not random or purely lexical. It is strongly explainable by structured gain/loss features derived from ToM reports, and higher-order terms can reshape preference geometry in a model-specific way.