This paper pays tribute to Professor Giovanni Andrea Cornia's lifelong contributions to the measurement of global inequality. We review twelve world and regional databases of the Gini coefficient, illustrate their coverage, overlap, and data gaps, and analyse the major sources of discrepancy among published Ginis. Merging all databases into a unified collection of over 122,000 observations spanning 222 countries from 1867 to 2024, we document how differences in welfare metrics, reference units, sub-metric definitions, post-survey adjustments, and survey design produce Gini estimates that diverge considerably -- sometimes by as much as 50 percentage points -- for the same country and year. We quantify pairwise cross-database discordance, document the income-consumption Gini gap by region and income group, and discuss the contributions of welfare metric and equivalence scale choices to cross-database dispersion. We extend the analysis with a dedicated discussion of comparability across time and across measurement dimensions, showing how multiple layers of methodological choice interact to make any single Gini figure a product of a complex chain of decisions that are rarely fully disclosed. Our analysis confirms that the choice of welfare metric remains the single most important source of cross-country non-comparability, while sub-metric definitions and equivalence scales introduce further systematic differences that are routinely overlooked in comparative work.
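As a concrete reference point for the welfare-metric discussion above, here is a minimal sketch of the standard discrete Gini formula, with a purely illustrative income-versus-consumption comparison on simulated data. The lognormal incomes and the power-function "consumption" smoothing are assumptions made for illustration only, not any database's methodology:

```python
import numpy as np

def gini(x: np.ndarray) -> float:
    """Gini coefficient of a 1-D array of non-negative welfare values.

    Standard discrete formula: G = 2 * sum_i i*x_(i) / (n * sum x) - (n+1)/n,
    where x_(i) are the values sorted in ascending order (1-indexed).
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * x) / (n * x.sum()) - (n + 1.0) / n

# Illustration of how the welfare-metric choice alone moves the Gini:
rng = np.random.default_rng(0)
income = rng.lognormal(mean=0.0, sigma=0.8, size=10_000)
consumption = income ** 0.7          # hypothetical consumption smoothing
print(f"income Gini:      {gini(income):.3f}")
print(f"consumption Gini: {gini(consumption):.3f}")  # lower: consumption is smoother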
Historical episodes such as the World War I "live-and-let-live" system and the Christmas Truce of 1914 demonstrate that opposing military units can establish spontaneous, local cooperation even in extreme conflict environments. Such cooperative behavior is typically fragile and temporary, while large-scale wars persist. We develop a hierarchical decision problem in which local units adopt contingent strategies that depend on interactions, accumulated payoffs, and signals from a central command. The command authority can impose enforcement that penalizes non-aggression to prolong hostilities. Our model features a continuous space of parametric strategies and formalizes replicator dynamics over the population. We analytically characterize the conditions under which local cooperation emerges as a stable evolutionary equilibrium and identify critical thresholds of central enforcement that destroy cooperative equilibria. We show that stable peace requires either alignment of command incentives with frontline welfare, external constraints on enforcement, or diminishing political returns to conflict. The framework provides a micro-founded explanation for the persistence of war despite locally beneficial cooperation.
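For reference, the replicator dynamics invoked in this abstract take the following standard form (written here for finitely many strategies; the paper works over a continuous space of parametric strategies). A population share $x_i$ of strategy $i$ grows exactly when its fitness exceeds the population average:

\[
\dot{x}_i \;=\; x_i\bigl(f_i(x) - \bar{f}(x)\bigr), \qquad \bar{f}(x) = \sum_j x_j f_j(x).
\]

Central enforcement as described above can be read as a penalty subtracted from the fitness $f_i$ of non-aggressive strategies, which is one way such enforcement can push cooperative equilibria below the survival threshold.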
The persistence of poverty is not well explained by who is poor. We argue the relevant object of measurement is trappedness--expected escape time from deprivation--which varies systematically across institutional environments and is invisible to standard poverty indices. Using Markov chains estimated on twenty years of longitudinal data from 27 European countries, we show that countries with identical deprivation rates differ in escape times by up to fourfold. These differences are not explained by household characteristics alone: exogenous shocks reshape welfare landscapes differently across countries, with divergence tracking welfare regime architecture rather than household composition. The mechanism is behavioural: health constrains a household's capacity to convert income gains into durable welfare improvement. Income transfers without health improvement fail to reduce poverty-return risk; combined interventions are super-additive across 28 countries, and the gap widens with transfer size. These findings dissolve the long-running poverty trap debate--studies that rejected traps measured the wrong dimension; studies that found them captured one projection of a multidimensional dynamic process. Trappedness is continuous, multidimensional, and institutionally shaped.
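A minimal sketch of the escape-time calculation behind "trappedness", assuming a discrete-state Markov chain: with $Q$ the block of transition probabilities within the deprived set, expected escape times solve $t = 1 + Qt$. The two toy transition matrices below are hypothetical, chosen so that both chains have the same long-run deprivation rate while escape times differ fourfold, echoing the abstract's finding:

```python
import numpy as np

def expected_escape_time(P: np.ndarray, deprived: np.ndarray) -> np.ndarray:
    """Expected number of periods to first leave the deprived set.

    P        : (k, k) row-stochastic transition matrix over welfare states.
    deprived : boolean mask marking the deprived states.
    Solves t = 1 + Q t, i.e. t = (I - Q)^{-1} 1, with Q the within-deprivation block.
    """
    Q = P[np.ix_(deprived, deprived)]
    n = Q.shape[0]
    return np.linalg.solve(np.eye(n) - Q, np.ones(n))

# Two hypothetical countries: identical stationary deprivation rates (50%),
# very different persistence. State 0 = deprived, state 1 = not deprived.
P_sticky = np.array([[0.9, 0.1],
                     [0.1, 0.9]])
P_fluid  = np.array([[0.6, 0.4],
                     [0.4, 0.6]])
mask = np.array([True, False])
print(expected_escape_time(P_sticky, mask))  # [10. ] -> long trappedness
print(expected_escape_time(P_fluid,  mask))  # [2.5 ] -> short trappedness
```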
Clustered sampling is prevalent in empirical regression discontinuity (RD) designs, but it has not received much attention in the theoretical literature. In this paper, we introduce a general model-based framework for such settings and derive high-level conditions under which the standard local linear RD estimator is asymptotically normal. We verify that our high-level assumptions hold across a wide range of empirical designs, including settings with growing cluster sizes. We further show that clustered standard errors as currently used in practice can be either inconsistent or overly conservative in finite samples. To address these issues, we propose a novel nearest-neighbor-type variance estimator and illustrate its properties in a diverse set of empirical applications.
A German ministry recently proposed a limit of at most one price increase per day for petrol stations. At what time should the price reset be allowed in order to lower price levels the most throughout the day? To answer this question, I infer the share of price-sensitive consumers for every hour of the day from German petrol station price data, based on a simple spatial-competition model. I focus on weekdays, which are the relevant target because commuter demand is less flexible than weekend demand. Hourly petrol station prices peak at 07:00 and bottom out at 19:00. Given the inferred composition of price sensitivity throughout the day and hourly passenger-car traffic frequencies as a proxy for quantity, I evaluate every possible reset hour of the new policy. The lowest traffic-weighted average price is achieved by an 11:00 reset. With this reset hour, the resulting equilibrium price throughout the day is constant. This would lead to lower prices in the morning but higher prices in the evening, harming price-sensitive consumers but benefiting morning commuters and firms.
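A schematic of the evaluation step described above, assuming a caller-supplied `price_path(h)` that returns the model-implied hourly equilibrium prices under reset hour `h`. In the paper this comes from the calibrated spatial-competition model; the toy price path and toy traffic curve below are placeholders for illustration only:

```python
import numpy as np

HOURS = np.arange(24)

def traffic_weighted_avg(prices, traffic):
    """Average daily price, weighted by hourly traffic volumes."""
    prices, traffic = np.asarray(prices, float), np.asarray(traffic, float)
    return float(np.sum(prices * traffic) / traffic.sum())

def best_reset_hour(price_path, traffic):
    """Evaluate every candidate reset hour; return the minimizer and scores.
    `price_path(h)` must return the 24-vector of model-implied hourly prices
    under reset hour h."""
    scores = [traffic_weighted_avg(price_path(h), traffic) for h in HOURS]
    return int(np.argmin(scores)), scores

# Purely illustrative stand-ins: prices decay toward a floor after the reset,
# and traffic has a morning-rush bump.
toy_path = lambda h: 1.80 - 0.10 * np.clip((HOURS - h) % 24, 0, 8) / 8
toy_traffic = 1.0 + np.exp(-0.5 * ((HOURS - 8) / 2.0) ** 2)
h_star, _ = best_reset_hour(toy_path, toy_traffic)
print(f"toy-optimal reset hour: {h_star:02d}:00")
```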
Online social platforms increasingly rely on crowd-sourced systems to label misleading content at scale, but these systems must both aggregate users' evaluations and decide whose evaluations to trust. To address the latter, many platforms audit users by rewarding agreement with the final aggregate outcome, a design we term consensus-based auditing. We analyze the consequences of this design in X's Community Notes, which in September 2022 adopted consensus-based auditing that ties users' eligibility for participation to agreement with the eventual platform outcome. We find evidence of strategic conformity: minority contributors' evaluations drift toward the majority and their participation share falls on controversial topics, where independent signals matter most. We formalize this mechanism in a behavioral model in which contributors trade off private beliefs against anticipated penalties for disagreement. Motivated by these findings, we propose a two-stage auditing and aggregation algorithm that weights contributors by the stability of their past residuals rather than by agreement with the majority. The method first accounts for differences across content and contributors, and then measures how predictable each contributor's evaluations are relative to the latent-factor model. Contributors whose evaluations are consistently informative receive greater influence in aggregation, even when they disagree with the prevailing consensus. In the Community Notes data, this approach improves out-of-sample predictive performance while avoiding penalization of disagreement.
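A minimal sketch of the two-stage idea, under simplifying assumptions: a crude truncated-SVD factor fit on the zero-filled rating matrix stands in for the latent-factor stage, and inverse residual variance serves as the stability weight. The function names and the rank-1 default are illustrative, not the authors' implementation:

```python
import numpy as np

def stability_weights(R: np.ndarray, rank: int = 1, eps: float = 1e-6) -> np.ndarray:
    """Two-stage sketch: (1) fit a low-rank latent-factor model to the
    contributor-by-note rating matrix R (NaN = no rating); (2) weight each
    contributor by the inverse variance of her residuals, i.e. by how
    predictable her ratings are under the factor model, not by agreement
    with the majority."""
    mask = ~np.isnan(R)
    # Stage 1: crude factor fit via truncated SVD on the zero-filled matrix.
    U, s, Vt = np.linalg.svd(np.where(mask, R, 0.0), full_matrices=False)
    R_hat = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]
    # Stage 2: residual variance per contributor over the ratings she made.
    resid = np.where(mask, R - R_hat, np.nan)
    var = np.nan_to_num(np.nanvar(resid, axis=1), nan=np.inf)  # no ratings -> weight 0
    w = 1.0 / (var + eps)
    return w / w.sum()

def weighted_aggregate(R: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Per-note aggregate score using stability weights instead of consensus."""
    mask = ~np.isnan(R)
    num = np.nansum(np.where(mask, R, 0.0) * w[:, None], axis=0)
    den = (mask * w[:, None]).sum(axis=0)
    return np.where(den > 0, num / den, np.nan)
```

A contributor who consistently disagrees with the majority but whose ratings are well explained by the factors keeps her influence, which is the design's point.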
We present a new solution concept, the Stackelberg evolutionarily stable strategy (SESS). We study the Stackelberg evolutionary game setting in which there is a single leading player and a symmetric population of followers. The leader selects an optimal mixed strategy, anticipating that the follower population plays an evolutionarily stable strategy (ESS) in the induced subgame and may satisfy additional ecological conditions. We consider both leader-optimal and follower-optimal selection among ESSs, which arise as special cases of our framework. Prior approaches to Stackelberg evolutionary games either define the follower response via evolutionary dynamics or assume rational best-response behavior, without explicitly enforcing stability against invasion by mutations. We present algorithms for computing SESS in discrete and continuous games, and validate the latter empirically. Our model applies naturally to biological settings; for example, in cancer treatment the leader represents the physician and the followers correspond to competing cancer cell phenotypes.
This paper provides a behavioral analysis of the post-pandemic transformation of work, using a dataset of approximately 41 billion mobile geolocation records from 73.5 million individuals in the five largest U.S. metropolitan areas from the pre- to post-pandemic periods. By tracking movements between corporate headquarters, residences, and other points of interest, we document a structural shift in work patterns. Office-based workdays declined from 42% in 2019 to 20.7% in 2022, before settling at 29.1% in 2023, a new equilibrium significantly below pre-pandemic levels. A "midweek mountain" of peak office attendance on Tuesdays through Thursdays emerged as a robust new post-pandemic phenomenon. The nature of remote work has also changed: both during and after the pandemic, employees working from home allocated significantly more time to non-work locations like parks and malls during the workday. These findings indicate that the pandemic catalyzed a lasting transformation not just in work arrangements but also in the integration of personal and professional life, with implications for corporate policy, urban economics, and the future of work.
AI agents are increasingly deployed in interactive economic environments characterized by repeated AI-AI interactions. Despite AI agents' advanced capabilities, empirical studies reveal that such interactions often fail to stably induce a strategic equilibrium, such as a Nash equilibrium. Post-training methods have been proposed to induce a strategic equilibrium; however, it remains impractical to uniformly apply an alignment method across diverse, independently developed AI models in strategic settings. In this paper, we provide theoretical and empirical evidence that off-the-shelf reasoning AI agents can achieve Nash-like play zero-shot, without explicit post-training. Specifically, we prove that "reasonably reasoning" agents, i.e., agents capable of forming beliefs about others' strategies from previous observations and of learning to best respond to these beliefs, eventually behave along almost every realized play path in a way that is weakly close to a Nash equilibrium of the continuation game. In addition, we relax the common-knowledge payoff assumption by allowing stage payoffs to be unknown and by having each agent observe only its own privately realized stochastic payoffs, and we show that we can still achieve the same on-path Nash convergence guarantee. We then empirically validate the proposed theories by simulating five game scenarios, ranging from a repeated prisoner's dilemma game to stylized repeated marketing promotion games. Our findings suggest that AI agents naturally exhibit such reasoning patterns and therefore attain stable equilibrium behaviors intrinsically, obviating the need for universal alignment procedures in many real-world strategic interactions.
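As one classical instance of the belief-forming, best-responding behavior this theory covers, here is a fictitious-play sketch for a two-player repeated game. This is a textbook algorithm used for orientation, not the paper's agents or its proof construction:

```python
import numpy as np

def fictitious_play(payoff_A, payoff_B, T=2000):
    """Each agent keeps empirical counts of the opponent's past actions
    (its 'belief') and best responds to that belief at every round.

    payoff_A, payoff_B : (nA, nB) stage payoffs; A picks rows, B picks columns.
    Returns the empirical action frequencies of both players after T rounds.
    """
    nA, nB = payoff_A.shape
    counts_B = np.ones(nB)  # A's belief about B (uniform pseudo-counts)
    counts_A = np.ones(nA)  # B's belief about A
    for _ in range(T):
        a = int(np.argmax(payoff_A @ (counts_B / counts_B.sum())))
        b = int(np.argmax((counts_A / counts_A.sum()) @ payoff_B))
        counts_A[a] += 1
        counts_B[b] += 1
    return counts_A / counts_A.sum(), counts_B / counts_B.sum()

# Matching-pennies check: empirical frequencies approach the mixed Nash (0.5, 0.5).
A = np.array([[1., -1.], [-1., 1.]])
freq_A, freq_B = fictitious_play(A, -A)
print(freq_A, freq_B)
```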
To estimate the causal effect of an intervention, researchers need to identify a control group that represents what might have happened to the treatment group in the absence of that intervention. This is challenging without a randomized experiment and further complicated when few units (possibly only one) are treated. Nevertheless, when data are available on units over time, synthetic control (SC) methods provide an opportunity to construct a valid comparison by differentially weighting control units that did not receive the treatment so that their resulting pre-treatment trajectory is similar to that of the treated unit. The hope is that this weighted "pseudo-counterfactual" can serve as a valid counterfactual in the post-treatment time period. Since its origin twenty years ago, SC has been used over 5,000 times in the literature (Web of Science, December 2025), leading to a proliferation of descriptions of the method and guidance on proper usage that is not always accurate and does not always align with what the original developers appear to have intended. As such, a number of accepted pieces of wisdom have arisen: (1) SC is robust to various implementations; (2) covariates are unnecessary; and (3) pre-treatment prediction error should guide model selection. We describe each in detail and conduct simulations that suggest, both for standard and alternative implementations of SC, that these purported truths are not supported by empirical evidence and thus actually represent misconceptions about best practice. Instead of relying on these misconceptions, we offer practical advice for more cautious implementation and interpretation of results.
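For readers new to SC, a minimal sketch of the weight-fitting step in its outcome-only form, i.e., simplex-constrained least squares on pre-treatment outcomes. As the abstract stresses, this is only one of several non-equivalent implementations, not "the" synthetic control method:

```python
import numpy as np
from scipy.optimize import minimize

def sc_weights(Y0: np.ndarray, y1: np.ndarray) -> np.ndarray:
    """Synthetic control weights via simplex-constrained least squares.

    Y0 : (T0, J) pre-treatment outcomes of the J control units.
    y1 : (T0,)   pre-treatment outcomes of the treated unit.
    Returns w >= 0 with sum(w) = 1 minimizing ||y1 - Y0 w||^2.
    """
    J = Y0.shape[1]
    res = minimize(
        lambda w: np.sum((y1 - Y0 @ w) ** 2),
        x0=np.full(J, 1.0 / J),
        bounds=[(0.0, 1.0)] * J,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        method="SLSQP",
    )
    return res.x

# Usage sketch: the period-t effect estimate is y1_post[t] - Y0_post[t] @ w.
```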
In order to identify expertise, forecasters should not be tested by their calibration score, which can always be made arbitrarily small, but rather by their Brier score. The Brier score is the sum of the calibration score and the refinement score; the latter measures how good the sorting into bins with the same forecast is, and thus attests to "expertise." This raises the question of whether one can gain calibration without losing expertise, which we refer to as "calibeating." We provide an easy way to calibeat any forecast, by a deterministic online procedure. We moreover show that calibeating can be achieved by a stochastic procedure that is itself calibrated, and then extend the results to simultaneously calibeating multiple procedures, and to deterministic procedures that are continuously calibrated.
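A minimal sketch of the decomposition referenced above for binary outcomes, binning by distinct forecast values: the Brier score splits exactly into a calibration term and a refinement term. The example data are made up for illustration:

```python
import numpy as np

def brier_decomposition(f: np.ndarray, y: np.ndarray):
    """Brier = calibration + refinement, binary outcomes y in {0,1},
    forecasts f, bins formed by distinct forecast values.

    calibration = sum_k (n_k/N) * (f_k - ybar_k)^2    -- distance of each
                  bin's forecast from its empirical frequency
    refinement  = sum_k (n_k/N) * ybar_k*(1 - ybar_k) -- how well the bins
                  sort outcomes; 0 for an 'expert' whose bins are pure
    """
    f, y = np.asarray(f, float), np.asarray(y, float)
    N = f.size
    cal = ref = 0.0
    for fk in np.unique(f):
        idx = f == fk
        nk, ybar = idx.sum(), y[idx].mean()
        cal += (nk / N) * (fk - ybar) ** 2
        ref += (nk / N) * ybar * (1.0 - ybar)
    brier = np.mean((f - y) ** 2)
    return brier, cal, ref  # brier == cal + ref up to floating point

f = np.array([0.8, 0.8, 0.8, 0.8, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3])
y = np.array([1,   1,   1,   0,   0,   0,   1,   0,   0,   0  ])
print(brier_decomposition(f, y))  # (0.17, 0.0117, 0.1583)
```

The identity holds because, within a bin, the mean squared error of a constant forecast equals its squared bias plus the outcome variance; "calibeating" targets the first term while trying not to inflate the second.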
The Arellano-Bond estimator is a fundamental method for dynamic panel data models, widely used in practice. However, it can be severely biased when the time series dimension of the data, $T$, is long. The source of the bias is the large degree of overidentification. We propose a simple two-step approach to deal with this problem. The first step applies LASSO to the cross-section data at each time period to select the most informative moment conditions, exploiting the approximately sparse structure of these conditions. The second step applies a linear instrumental variable estimator using the instruments constructed from the moment conditions selected in the first step. Using asymptotic sequences where the two dimensions of the panel grow with the sample size, we show that the new estimator is consistent and asymptotically normal under much weaker conditions on $T$ than the Arellano-Bond estimator. Our theory covers models with high-dimensional covariates, including multiple lags of the dependent variable and strictly exogenous covariates, which are becoming common in modern applications. We illustrate our approach by applying it to weekly county-level panel data from the United States to study the short- and long-term effects of opening K-12 schools and other mitigation policies on the spread of COVID-19.
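A heavily simplified schematic of the two steps for a single first-differenced equation; the exact construction of the Arellano-Bond moment conditions and the pooling across periods follow the paper, not this sketch:

```python
import numpy as np
from sklearn.linear_model import LassoCV

def select_instruments(endog_t: np.ndarray, Z_t: np.ndarray) -> np.ndarray:
    """Step 1, run per time period on the cross-section: LASSO of the
    endogenous regressor (e.g., the lagged dependent variable in first
    differences) on that period's full set of lagged-level instruments;
    keep the columns with nonzero coefficients."""
    lasso = LassoCV(cv=5).fit(Z_t, endog_t)
    return np.flatnonzero(lasso.coef_)

def tsls(y: np.ndarray, X: np.ndarray, Z: np.ndarray) -> np.ndarray:
    """Step 2: linear IV (2SLS) using only the selected instruments."""
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)  # projection onto instrument space
    Xh = Pz @ X
    return np.linalg.solve(Xh.T @ X, Xh.T @ y)

# Usage sketch: beta = tsls(dy_t, dX_t, Z_t[:, select_instruments(dX_t[:, 0], Z_t)])
```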
Recent literature proposes combining short-term experimental and long-term observational data to provide alternatives to conventional observational studies for the identification of long-term average treatment effects (LTEs). This paper re-examines the identification problem and uncovers that assumptions restricting temporal link functions -- relationships between short-term and mean long-term potential outcomes -- are central in this context. The experimental data serve to amplify the identifying power of such assumptions; absent them, the combined data are no more informative than the observational data alone. Plausible inference thus hinges on justifiable restrictions in this class. Motivated by this, I introduce two treatment response assumptions that may be defensible based on economic theory or intuition. To utilize them and facilitate future developments, I develop a novel unifying identification framework that computationally produces sharp bounds on the LTE for a general class of temporal link function restrictions and accommodates imperfect experimental compliance -- thereby also extending existing approaches. I illustrate the method by estimating the long-term effects of Head Start participation. The findings indicate that the effects on educational attainment, employment, and criminal involvement are lasting but smaller in magnitude than those established by sibling comparisons.
Measures of inflation uncertainty and directional risk derived from higher moments of forecast distributions are contaminated by the first moment, but in distinct ways. Using individual density forecasts from the ECB Survey of Professional Forecasters, this paper shows that 42% of the variation in raw forecast variance reflects the distance of expected inflation from target, a mechanical level effect, while raw asymmetry is too noisy to identify directional risk without reference to the central forecast. We propose two complementary corrections. Normalized Uncertainty (NU) purges dispersion of its predictable component linked to the policy anchor, recovering genuine belief imprecision. Asymmetry Coherence (AC) extracts directional risk only when asymmetry aligns with the central forecast, formalizing the balance of risks. These corrections alter inference. In a replication of Barro (1995), the volatility effect on growth disappears once level contamination is removed, while the inflation-level coefficient regains significance. In a VAR, the sign of the policy response reverses: raw asymmetry suggests easing, whereas coherent upside risk predicts tightening. In the credit channel, higher uncertainty slows and weakens pass-through from policy easing to loan pricing, especially at longer maturities. A division of roles emerges: NU governs transmission, AC informs policy response. Higher moments are informative only when measurement separates macroeconomic signals from first-moment contamination.
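A minimal sketch of the NU construction as described above, with two loudly flagged assumptions: a quadratic-in-distance purge specification and a 2% target. The paper's exact functional form may differ:

```python
import numpy as np

def normalized_uncertainty(variance, mean_forecast, target=2.0):
    """Purge raw forecast variance of its predictable level component:
    regress variance on the distance of expected inflation from the target
    (here: intercept, |d|, d^2 -- an illustrative specification) and keep
    the residual as 'genuine' belief imprecision."""
    v = np.asarray(variance, float)
    d = np.abs(np.asarray(mean_forecast, float) - target)
    X = np.column_stack([np.ones_like(d), d, d ** 2])
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ beta  # residual dispersion, purged of the level effect
```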
We use valid inequalities (cuts) of the binary integer program for winner determination in a combinatorial auction (CA) as "artificial items" that can be interpreted intuitively and priced to generate Artificial Walrasian Equilibria. We thus provide a method for converting a CA problem that admits only non-anonymous, nonlinear bundle prices into one that admits anonymous linear prices over the augmented item space, forestalling ex-post bidder complaints about opaque and strongly discriminatory pricing. To this end, we introduce a refinement of the Walrasian equilibrium which we call a "price-match equilibrium" (PME) in which all prices are justified by providing an iso-revenue reallocation for the hypothetical removal of any single bidder. We prove the existence of PME for any CA and characterize their economic properties and computation. We implement minimally artificial PME rules and compare them with other prominent CA payment rules in the literature.
This paper measures the effects of temperature and precipitation shocks on Mexican inflation using a regional panel. To measure the long-term inflationary effects of climate shocks, we estimate a panel autoregressive distributed lag (panel ARDL) model of the quarterly variation of the price index on the population-weighted temperature and precipitation deviations from their historical norms, computed using a 30-year moving average. In addition, we measure the short-term effects of climate shocks by estimating impulse response functions using panel local projections. The results indicate that, in the short term, the climate variables have no statistically significant effect on Mexican inflation. In the long term, however, only the precipitation norms have a statistically significant effect; the temperature norms have none. Higher-than-normal precipitation has a positive and statistically significant effect on Mexican inflation for all items.
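The norm-and-deviation construction described above, sketched in pandas under the assumption of a quarterly series (120-quarter rolling mean as the 30-year norm; whether the current quarter is excluded from its own norm is a detail the abstract does not pin down):

```python
import pandas as pd

def climate_deviation(series: pd.Series, window_years: int = 30) -> pd.Series:
    """Deviation of a quarterly climate series from its historical norm,
    the norm being a 30-year (120-quarter) moving average."""
    q = window_years * 4
    norm = series.rolling(window=q, min_periods=q).mean()
    return series - norm
```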
This paper develops limit theorems for random variables with network dependence, without requiring the individuals in the network to be located in a Euclidean or metric space. This distinguishes our approach from most existing limit theorems in network statistics and econometrics, which are based on weak dependence concepts such as strong mixing, near-epoch dependence, or $\psi$-dependence. All these weak dependence concepts presuppose an underlying metric. By relaxing the assumption of an underlying metric space, our theorems can be applied to a broader range of network data, including financial and social networks. To derive the limit theorems, we generalize the concept of functional dependence (also known as physical dependence) from time series to random variables with network dependence. Using this framework, we establish several inequalities, a law of large numbers, and central limit theorems. Furthermore, we demonstrate the verifiability of our high-level conditions by deriving primitive sufficient conditions for spatial autoregressive models, which are widely used in network data analysis.
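For orientation, the classical time-series version of functional (physical) dependence that the paper generalizes is Wu's (2005) measure: for $X_t = g(\varepsilon_t, \varepsilon_{t-1}, \dots)$ with i.i.d. innovations,

\[
\delta_{t,p} \;=\; \bigl(\mathbb{E}\,\lvert X_t - X_t^{*}\rvert^{p}\bigr)^{1/p},
\]

where $X_t^{*}$ is obtained by replacing $\varepsilon_0$ with an independent copy $\varepsilon_0^{*}$. Roughly, the paper's extension adapts this coupling idea to network-indexed variables, dispensing with the metric structure that the temporal index implicitly provides.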
An agent chooses an action based on her private information and a recommendation from an informed but potentially misaligned adviser. With a known probability, the adviser truthfully reports his signal; with the remaining probability, he can send any message. We characterize optimal robust decision rules that maximize the agent's worst-case expected payoff. Every optimal rule is equivalent to a trust-region policy in belief space: the adviser's reported beliefs are taken at face value if they fall within the trust region but are otherwise clipped to the trust region's boundary. We derive alignment thresholds above which advice is strictly valuable and fully characterize the solution in both binary-state and binary-action environments.
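A minimal sketch of the clipping rule in the binary-state case, where the reported belief is a scalar probability. The interval endpoints and the cutoff policy below are illustrative; in the paper the trust region is derived from the alignment probability and payoffs:

```python
import numpy as np

def trust_region_action(report: float, lo: float, hi: float, policy):
    """Reports inside the trust region [lo, hi] are taken at face value;
    reports outside are clipped to the nearest boundary. The agent then
    plays her `policy` at the resulting belief."""
    belief = float(np.clip(report, lo, hi))
    return policy(belief)

# Usage: binary action, act iff the (clipped) belief exceeds a payoff cutoff.
act = lambda b, c=0.6: int(b >= c)
print(trust_region_action(0.95, lo=0.2, hi=0.8, policy=act))  # clipped to 0.8 -> 1
print(trust_region_action(0.10, lo=0.2, hi=0.8, policy=act))  # clipped to 0.2 -> 0
```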
We study the $n$-dimensional contest between two asymmetric players with different marginal effort costs, with each dimension (i.e., battle) modeled as a Tullock contest. We allow general identity-independent and budget-balanced prize allocation rules in which each player's prize increases weakly in the number of their victories, e.g., a majority rule. When the discriminatory power of the Tullock winner-selection mechanism is no greater than $2/(n+1)$, a unique equilibrium arises where each player exerts deterministic and identical effort across all dimensions. This condition applies uniformly to all eligible prize allocation rules and all levels of players' asymmetry, and it is tight. Under this condition, we derive the effort-maximizing prize allocation rule: the entire prize is awarded to the player who wins more battles than his opponent by a pre-specified margin, and the prize is split equally if neither player does. When players are symmetric, the majority rule is optimal.
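For concreteness, with efforts $x_i, x_j$ in a given battle and discriminatory power $r$, the Tullock contest success function used in each dimension is

\[
p_i(x_i, x_j) \;=\; \frac{x_i^{\,r}}{x_i^{\,r} + x_j^{\,r}},
\]

with $p_i = 1/2$ at zero efforts under the usual tie convention; the abstract's uniqueness condition is then $r \le 2/(n+1)$.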
We study stochastic choice when behavior is generated by switching among a small library of transparent deterministic decision procedures. The object of interest is procedural heterogeneity: how the relative importance of these procedures varies across decision environments. We model this heterogeneity with a Random Rule Model (RRM), in which menu-level choice probabilities arise from environment-dependent weights on named rules. We show that identification has a two-step structure. At a fixed feature value, variation in decisive-side patterns across menus identifies the vector of relative rule weights up to scale; across sufficiently rich feature values, these recovered weights identify the parameters of an affine gate. Applied to a large dataset of binary lottery choices, the estimated procedure weights are concentrated on a small subset of interpretable rules and shift systematically with menu characteristics such as tradeoff complexity and dispersion asymmetry. Out-of-sample prediction and cross-dataset portability provide supporting evidence that the recovered procedural representation is empirically disciplined.
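A schematic of the RRM choice probability, assuming an affine gate whose scores are nonnegative on the support of the features and normalizing them to the simplex. This is one plausible normalization for illustration; the paper's identification argument pins the weights down only up to scale at a fixed feature value:

```python
import numpy as np

def rrm_choice_prob(menu, rules, theta, z):
    """Menu-level choice probabilities as an environment-weighted mixture of
    deterministic rules.

    menu  : sequence of alternatives
    rules : list of callables, rules[k](menu) -> index of the chosen alternative
    theta : (K, 1 + len(z)) affine-gate parameters (illustrative; assumed to
            yield nonnegative scores on the support of the features z)
    """
    scores = theta @ np.concatenate(([1.0], np.asarray(z, float)))  # affine gate
    w = np.clip(scores, 0.0, None)
    assert w.sum() > 0, "gate scores must not all vanish"
    w = w / w.sum()                       # environment-dependent rule weights
    p = np.zeros(len(menu))
    for k, rule in enumerate(rules):
        p[rule(menu)] += w[k]             # each rule votes with its weight
    return p
```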
AI coding agents make empirical specification search fast and cheap, but they also widen hidden researcher degrees of freedom. Building on an open-source agent-loop architecture, this paper adapts that framework to an empirical economics workflow and adds a post-search holdout evaluation. In a forecast-combination illustration, multiple independent agent runs outperform standard benchmarks in the original rolling evaluation, but not all continue to do so on a post-search holdout. Logged search and holdout evaluation together make adaptive specification search more transparent and help distinguish robust improvements from sample-specific discoveries.
Forward regression is a classical and effective tool for variable screening in ultra-high dimensional linear models, but its standard projection-based implementation can be computationally costly and numerically unstable when predictors are strongly collinear. Motivated by this limitation, we propose an orthogonalized forward regression procedure, implemented recursively through Gram-Schmidt updates, that ranks predictors according to their unique contributions after removing the effects of variables already selected. This approach preserves the interpretability of forward regression while substantially reducing the cost of repeated projections. We further develop a path-based model size selection rule using statistics computed directly from the forward sequence, thereby avoiding cross-validation and extensive tuning. The resulting method is particularly well suited to settings in which the number of predictors far exceeds the sample size and strong collinearity renders conventional forward fitting ineffective. Theoretically, we derive the optimal convergence rate for the proposed Gram-Schmidt forward regression, thereby extending existing results for projection-based forward regression, and further show that it enjoys the sure screening property and variable selection consistency under suitable conditions. Simulation studies and empirical examples demonstrate that it provides a favorable balance among computational efficiency, numerical stability, screening accuracy, and predictive performance, especially in highly correlated ultra-high dimensional settings.
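A minimal sketch of the recursive Gram-Schmidt selection loop (essentially orthogonal matching pursuit with renormalization). The stopping rule here is a fixed step count, whereas the paper's path-based model-size rule uses statistics from the forward sequence:

```python
import numpy as np

def gs_forward_regression(X, y, max_steps):
    """Gram-Schmidt forward regression sketch: at each step, pick the
    (orthogonalized, renormalized) predictor most correlated with the current
    residual, then sweep its direction out of the remaining predictors with a
    single rank-one update instead of re-projecting from scratch.
    Assumes nonzero columns and max_steps <= min(n, p)."""
    n, p = X.shape
    Xw = X / np.linalg.norm(X, axis=0)     # working copy, unit columns
    r = y.astype(float).copy()             # current residual
    active, inactive = [], list(range(p))
    for _ in range(max_steps):
        scores = np.abs(Xw[:, inactive].T @ r)
        j = inactive[int(np.argmax(scores))]
        active.append(j)
        inactive.remove(j)
        q = Xw[:, j] / np.linalg.norm(Xw[:, j])
        r -= (q @ r) * q                   # update residual
        # Recursive Gram-Schmidt step: remove the new direction from the rest.
        Xw[:, inactive] -= np.outer(q, q @ Xw[:, inactive])
        norms = np.linalg.norm(Xw[:, inactive], axis=0)
        Xw[:, inactive] /= np.where(norms > 1e-12, norms, 1.0)
    return active
```

Columns that become (numerically) collinear with the selected set shrink to near zero after the sweep and are never picked, which is the stability gain over naive forward fitting.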
In strategic environments with private information, evaluating a change in policy requires predicting how the equilibrium responds -- but when actions reshape opponents' signals, each agent's optimal response depends on an infinite hierarchy of beliefs about beliefs that has resisted exact analysis for four decades. We provide the first exact equilibrium characterization of finite-player continuous-time LQG games with endogenous signals. Conditioning on primitive Brownian shocks rather than the physical state -- a dynamic analogue of Harsanyi's common-prior construction -- collapses the belief hierarchy onto deterministic two-time kernels, reducing Nash equilibrium to a deterministic fixed point with no truncation and no large-population limit. The characterization yields an explicit information wedge that prices the marginal value of shifting opponents' posteriors. The wedge vanishes precisely when signals are exogenous to controls, formally delineating the boundary where strategic belief manipulation matters, and provides a closed-form mapping from information primitives to equilibrium outcomes.