My research explores how institutions shape economic and financial outcomes, and how advances in machine learning can be used to study complex social and economic systems. More broadly, I am interested in methodological questions concerning scientific discovery, prediction, and the generation of knowledge.
Publications
Does Securities Regulation Matter? Mandatory Disclosure, Excess Stock Volatility and the U.S. 1934 Securities Exchange Act
Journal of Law and Economics 2026
We examine whether the US Securities Exchange Act of 1934 significantly stabilized the market by introducing mandatory disclosure of information. We argue that mandatory information disclosure can curb stock manipulation by enhancing transparency, thereby reducing excess stock volatility. After a comprehensive assessment of the voluntary disclosure practices of companies listed on the New York Stock Exchange before 1934, we find that those with poor disclosure practices experienced a significantly greater reduction in volatility after the implementation of the act compared with those with good disclosure practices. Further analysis reveals that the liquidity of these companies with poor disclosure practices also improved significantly more than that of companies with better disclosure, and the improvement in liquidity was linked to the decrease in their volatility. Given that one key purpose of the act’s legislators was to reduce excess market volatility, our findings provide empirical support for considering this legislative aim successful.
Can regulation make markets fundamentally more stable, or does it merely impose additional compliance costs? Our research tends to support the former view.
Is Machine Learning a Necessity? A Regression-based Approach for Stock Return Prediction
Journal of Empirical Finance 2025
We propose a simple, linear-regression-based method for predicting the time series of stock returns. The method achieves out-of-sample performance comparable to that of machine learning methods at negligible computational cost. Its key component integrates a straightforward cross-market factor screening into the iterated combination method proposed by Lin et al. (2018). Our empirical results on the U.S. stock market show that the method outperforms many state-of-the-art machine learning methods in certain periods. The method also delivers greater utility gains and investment profits in most periods after accounting for transaction costs.
I share some thoughts on the predictability of the time series of stock returns in the paper's Introduction.
Complete Subset Averaging Methods in Corporate Bond Return Prediction
Finance Research Letters 2023
We investigate the performance of two complete subset averaging methods, complete subset linear averaging (CSLA) and complete subset quantile averaging (CSQA), on the problem of corporate bond return prediction. We find that the two methods are overwhelmingly better than univariate linear regression and simple forecast combination. Moreover, CSQA is better than CSLA in most cases. For practical implementation, we also discuss the selection of the hyperparameter k when applying these complete subset averaging methods.
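As a rough illustration of the linear variant only, the sketch below averages OLS forecasts over every subset of k predictors. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`solve_linear_system`, `ols_fit`, `complete_subset_forecast`) are hypothetical, the data layout (row t of `x_rows` holds predictors known at t, `y[t]` is the return realized afterwards) is assumed, and the quantile variant is not covered.

```python
from itertools import combinations


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_fit(x_rows, y):
    """OLS with an intercept via the normal equations; returns [intercept, slopes...]."""
    design = [[1.0] + list(row) for row in x_rows]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def complete_subset_forecast(x_rows, y, x_new, k):
    """Equal-weight average of the OLS forecasts from every k-predictor subset.

    Assumes no subset is perfectly collinear (otherwise the solver divides by zero).
    """
    n_predictors = len(x_new)
    forecasts = []
    for subset in combinations(range(n_predictors), k):
        sub_rows = [[row[j] for j in subset] for row in x_rows]
        beta = ols_fit(sub_rows, y)
        forecast = beta[0] + sum(b * x_new[j] for b, j in zip(beta[1:], subset))
        forecasts.append(forecast)
    return sum(forecasts) / len(forecasts)
```

With k = 1 this reduces to an equal-weight combination of univariate regression forecasts, and with k equal to the number of predictors it collapses to the single full-model OLS forecast, which is one way to see why the choice of k matters.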
Stock Return Prediction: Stacking a Variety of Models
Journal of Empirical Finance 2022
We employ an ensemble learning approach, “stacking”, to refine and combine a variety of linear and nonlinear individual stock return prediction models. In an application of forecasting U.S. market excess return, stacking with a simple structure can outperform the traditional historical mean benchmark, Mallows model averaging, simple combination forecast, complete subset regression, combination elastic net forecast, and several other models in terms of both in- and out-of-sample performance measures on a consistent basis. More importantly, we find that the out-of-sample gains of stacking are especially evident during extreme downside market movements. Overall, stacking can generate substantive improvements in market excess return predictability.
I was expecting an even better outcome. It may still be improved, though.
The Impact of COVID-19 Pandemic on the Volatility Connectedness Network of Global Stock Market
Pacific-Basin Finance Journal 2022
This paper investigates how the COVID-19 pandemic affects the connectedness network of stock market volatility in 19 economies around the world. Our method builds on the Diebold-Yilmaz volatility network model to construct the volatility spillover index, and uses lag sparse group LASSO to accommodate the high-dimensional system. We find that the outbreak of the COVID-19 pandemic strengthens the overall volatility connectedness, and the global connectedness level remains high throughout 2020. In particular, connections across different continents have become stronger during this period. However, China is shown to be disconnected from the global volatility connectedness network until late November 2020. We find evidence that China is not the main source of volatility spillover during the COVID-19 pandemic.
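For readers unfamiliar with the spillover index, the sketch below shows how total and directional connectedness are typically computed in the Diebold-Yilmaz framework once a forecast error variance decomposition matrix is available. It is an illustrative sketch only: the function names are hypothetical, the matrix is taken as given, and the lag sparse group LASSO estimation step used in the paper is not reproduced.

```python
def connectedness_table(fevd):
    """Row-normalize a forecast error variance decomposition matrix.

    fevd[i][j] is the share of market i's forecast error variance attributed
    to shocks in market j. Under a generalized decomposition the rows need
    not sum to one, hence the normalization.
    """
    return [[v / sum(row) for v in row] for row in fevd]


def total_connectedness(fevd):
    """Total spillover index in percent: off-diagonal shares averaged over markets."""
    table = connectedness_table(fevd)
    n = len(table)
    off_diag = sum(table[i][j] for i in range(n) for j in range(n) if i != j)
    return 100.0 * off_diag / n


def directional_connectedness(fevd):
    """Return ('from others', 'to others') spillovers per market, in percent."""
    table = connectedness_table(fevd)
    n = len(table)
    from_others = [100.0 * sum(table[i][j] for j in range(n) if j != i) for i in range(n)]
    to_others = [100.0 * sum(table[i][j] for i in range(n) if i != j) for j in range(n)]
    return from_others, to_others
```

A market whose "to others" value is small relative to its "from others" value is a net receiver of volatility, which is the kind of evidence behind the statement that a market is not a main source of spillover.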
Working papers
On “Over-Modeling” in Economic Research: Behavioral Postulates, Institutional Frameworks, and Empirical Calibration
Albert Bo Zhao
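In recent years, methodological questions in economic research have attracted renewed attention. Professor Lu Ming characterizes one tendency in economic research as “over-modeling”: models, identification, and formal tools are meant to serve the explanation of real-world problems, yet in some studies they have themselves become the main criteria for judging a paper’s quality and academic contribution (Lu Ming, 2026). Taking this discussion as a starting point, this paper examines how economic research can judge the explanatory boundaries of models, identification, and formal tools. Modeling is a necessary way for economics to understand reality; “over” refers to a model going beyond the explanatory scope it can answer for in a specific problem. This paper proposes that, to avoid such excess, economic research usually needs to connect three elements: behavioral postulates about how agents form goals, beliefs, and modes of choice; an institutional framework that defines their choice sets and payoff consequences; and empirical calibration of the magnitude of mechanisms and the boundaries of their applicability. The paper first explains this three-part framework and shows how an imbalance among the three constitutes “over-modeling”. It then reviews methodological debates in the history of economic thought, including those between Ricardo and Schumpeter, Menger and the Historical School, and Mises and Friedman, and points out that the problem is not new to Chinese economics or to contemporary economics but is a basic methodological question that economics has long faced. Next, the paper discusses the controversies triggered when modern causal identification enters historical research, to illustrate that empirical tools applied to established propositions must confront questions of mechanism explanation, institutional context, and incremental knowledge. Finally, using auction theory and option pricing as examples, the paper shows that model complexity in itself does not necessarily lead to excess; what matters is whether a model has a clear behavioral logic, an institutional framework, and a mechanism for ongoing calibration. The paper concludes that correcting “over-modeling” requires restoring the connection among behavioral postulates, institutional frameworks, and empirical calibration.
A full account of this question requires more space. I will do my best to lay out my own (possibly mistaken) ideas clearly, step by step.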
Predicting the Predictable: Decomposing and Forecasting Stock Returns in a Data-rich Environment
Resubmitted to Management Science
We propose a new method for forecasting stock market returns in a data-rich environment: the factor-augmented sum-of-the-parts (FA-SOP) approach. Rather than predicting returns directly, FA-SOP decomposes them into three components (dividend-price ratio, earnings growth, and price-earnings ratio growth, denoted gm) and models each separately. We emphasize that gm is a more promising target for forecasting due to its stronger connection with macroeconomic conditions and greater variability over time. FA-SOP forecasts gm using latent macroeconomic factors extracted from high-dimensional data, capturing the underlying state of the economy while avoiding overfitting. Applied to S&P 500 returns from 1960 to 2022, FA-SOP outperforms predictive regressions, factor-augmented regressions, and traditional decomposition approaches, yielding robust out-of-sample gains in both statistical and economic terms. Simulations based on a present-value model further show that FA-SOP’s advantage stems from its ability to track the true data-generating process more closely. Our results highlight the value of decomposing returns and focusing on components that are more predictably linked to economic fundamentals.
Rather than treating return predictability as an all-or-nothing property, we argue that stock returns consist of both more predictable and less predictable components. Disentangling these components and analyzing them separately provides a clearer understanding of where and how predictability arises. For components with inherently low signal-to-noise ratios, a passive approach—such as using the historical mean—is sufficient. But for components with clearer economic meaning or stronger signals, more active and structured modeling could uncover substantial predictive value. This targeted approach allows us to focus modeling effort where it matters most, and avoid overfitting where little can be gained.
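To make the three-component decomposition concrete, the sketch below splits each log return into price-earnings ratio growth (gm), earnings growth (ge), and a dividend-price component (dp), using the standard accounting identity behind sum-of-the-parts approaches. It is a sketch under assumptions: the function name, the input layout, and the exact definition of dp as log(1 + D/P) are illustrative choices and may differ from the definitions used in the paper. The factor-augmented forecasting step is not shown.

```python
import math


def decompose_log_returns(prices, earnings, dividends):
    """Split each log return into gm, ge, and dp components.

    Assumed layout: prices[t] and earnings[t] are end-of-period price and
    earnings per share; dividends[t] is the dividend paid during period t.
    """
    parts = []
    for t in range(1, len(prices)):
        multiple_prev = prices[t - 1] / earnings[t - 1]
        multiple_now = prices[t] / earnings[t]
        gm = math.log(multiple_now / multiple_prev)    # price-earnings ratio growth
        ge = math.log(earnings[t] / earnings[t - 1])   # earnings growth
        dp = math.log(1.0 + dividends[t] / prices[t])  # dividend-price component
        # Total log return including dividends, computed directly for comparison.
        total = math.log((prices[t] + dividends[t]) / prices[t - 1])
        parts.append({"gm": gm, "ge": ge, "dp": dp, "log_return": total})
    return parts
```

Because (P_t + D_t) / P_{t-1} factors exactly into the growth of the price-earnings multiple, the growth of earnings, and one plus the dividend-price ratio, the three components sum to the log return in every period, so each can be forecast separately and recombined.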
Combination Forecast of Corporate Bond Return: An Ensemble Learning Approach
Revise & Resubmit at Journal of Banking and Finance
This study employs an ensemble machine learning method, known as “Stacking,” to forecast corporate bond returns. We find that the Stacking method introduces new features into combination forecasts, increasing the predictive model’s power and generating higher statistical and economic gains across bond ratings and maturities. Moreover, the method is efficient for tackling high dimensionality and achieves the best result when using predictors from corporate bond, Treasury, and stock markets jointly. While the overall performance of different Stacking models is satisfactory, simpler Stacking models appear to outperform others and generate optimal forecasts.
Combination is still what we need in time-series prediction of returns.
Direction is More Important than Speed: A Comparison of Discrete and Continuous Modeling of Stock Excess Returns
We contrast continuous magnitude estimation with discrete directional classification in equity premium prediction. Through a symmetric, single-pass out-of-sample evaluation of traditional econometric and machine learning algorithms, we document a distinct divergence in predictive performance. While continuous models struggle with low signal-to-noise ratios and generally fail to outperform the historical mean, discrete classifiers consistently reveal statistically significant predictability and generate substantial economic utility. We demonstrate that this divergence stems from the inherent sensitivity of continuous models to magnitude noise. Under continuous estimation, algorithms face a structural dilemma: they either overreact to unforecastable extreme shocks or resort to excessive shrinkage, both of which severely limit their market-timing ability during downturns. Conversely, discarding magnitude estimation acts as a form of structural regularization. This transformation frees nonlinear algorithms from extrapolating non-stationary macroeconomic trends, allowing them to utilize persistent high-frequency risk signals to execute timely market exits. Ultimately, our findings suggest that the fundamental choice of the predictive paradigm exerts a first-order impact that outweighs specific algorithmic sophistication.
Predicting the value of future stock returns has long been the orthodox practice in asset pricing. The elegant theory of factor models carries both explanatory and predictive implications for statistical exercises and has inspired a voluminous empirical literature. However, a layperson’s first question, which is far from naive, is often: “Do you think the market will go up (bullish) or down (bearish) in the near future?” In this paper, we systematically compare these two predictive paradigms.
Revisiting Incentive Issues in China’s Central-Local Top-Down Hierarchy
Warnings hanging over my head:
The cost of computing has dropped exponentially, but the cost of thinking is what it always was. That is why we see so many articles with so many regressions and so little thought.
– Zvi Griliches
If you torture the data long enough, it will confess.
– Ronald Coase