Topics that always interest me: political (institutional) economics; methodological issues (science of science); machine learning and deep learning; asset pricing.
Publications
-
Does Securities Regulation Matter? Mandatory Disclosure, Excess Stock Volatility and the U.S. 1934 Securities Exchange Act
forthcoming, Journal of Law and Economics
We examine whether the U.S. Securities Exchange Act of 1934 significantly stabilized the market by introducing mandatory disclosure of information. We argue that mandatory information disclosure can curb stock manipulation by enhancing transparency, thereby reducing excess stock volatility. After a comprehensive assessment of the voluntary disclosure practices of NYSE-listed companies before 1934, we group the companies and find that those with poor disclosure practices experienced a significantly greater reduction in volatility after the implementation of the 1934 Act compared to those with good disclosure practices. Further analysis reveals that the liquidity of these poorly disclosing companies also improved significantly more than that of the better disclosing companies, and the improvement in liquidity was linked to the decrease in their volatility. Given that one of the key intentions of the legislators was to reduce excess market volatility through the Act, our findings provide empirical support for this legislative intent.
A simple but important question that is nonetheless hard to settle.
-
Is Machine Learning a Necessity? A Regression-based Approach for Stock Return Prediction
Journal of Empirical Finance 2025
We propose a simple, linear-regression-based method for predicting the time series of stock returns. The method achieves out-of-sample performance comparable to machine learning methods at negligible computational cost. The key component of the method is to integrate a straightforward cross-market factor screening into the iterated combination method proposed by Lin et al. (2018). Our empirical results on the U.S. stock market show that the method outperforms many state-of-the-art machine learning methods in certain periods. The method also exhibits greater utility gain and investment profits in most periods after considering transaction costs.
I have some thoughts on the predictability of the time series of stock returns. They are shared in the Introduction.
-
Complete Subset Averaging Methods in Corporate Bond Return Prediction
Finance Research Letters 2023
We investigate the performances of two methods of complete subset averaging—complete subset linear averaging (CSLA) and complete subset quantile averaging (CSQA)—on the problem of corporate bond return prediction. We find that the two methods are overwhelmingly better than univariate linear regression and simple forecast combination. Meanwhile, CSQA is better than CSLA in most cases. For practical implementation, we also provide discussions on the selection of the hyperparameter k when applying these complete subset averaging methods.
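To make the idea concrete, here is a minimal sketch of the linear variant: fit an OLS model on every subset of k predictors and average the resulting forecasts. It is an illustration on made-up data, not the paper's implementation; the helper name `csla_forecast` and the toy numbers are assumptions, and the quantile variant (CSQA) is not shown.

```python
from itertools import combinations

import numpy as np


def csla_forecast(X, y, x_new, k):
    """Average OLS forecasts over all predictor subsets of size k.

    X: (T, K) lagged predictors, y: (T,) returns, x_new: (K,) latest predictors.
    Hypothetical helper, for illustration only.
    """
    T, K = X.shape
    forecasts = []
    for subset in combinations(range(K), k):
        cols = list(subset)
        # Regress y on an intercept plus the k chosen predictors.
        Z = np.column_stack([np.ones(T), X[:, cols]])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        forecasts.append(beta[0] + x_new[cols] @ beta[1:])
    # Equal-weighted average across the C(K, k) subset models.
    return float(np.mean(forecasts))


# Toy data (made up): 3 predictors, 60 observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = 0.5 * X[:, 0] + rng.normal(scale=0.1, size=60)
x_new = np.array([1.0, 0.0, 0.0])

f_k1 = csla_forecast(X, y, x_new, k=1)  # average of 3 univariate models
f_k3 = csla_forecast(X, y, x_new, k=3)  # single model using all predictors
```

With k = 1 the method reduces to a simple combination of univariate forecasts, and with k equal to the number of predictors it reduces to one regression on everything, which is why the choice of k matters in practice.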
-
Stock Return Prediction: Stacking a Variety of Models
Journal of Empirical Finance 2022
We employ an ensemble learning approach, “stacking”, to refine and combine a variety of linear and nonlinear individual stock return prediction models. In an application of forecasting U.S. market excess return, stacking with a simple structure can outperform the traditional historical mean benchmark, Mallows model averaging, simple combination forecast, complete subset regression, combination elastic net forecast, and several other models in terms of both in- and out-of-sample performance measures on a consistent basis. More importantly, we find that the out-of-sample gains of stacking are especially evident during extreme downside market movements. Overall, stacking can generate substantive improvements in market excess return predictability.
I was expecting an even better outcome. It may still be improved, though.
-
The Impact of COVID-19 Pandemic on the Volatility Connectedness Network of Global Stock Market
Pacific-Basin Finance Journal 2022
This paper investigates how the COVID-19 pandemic affects the connectedness network of stock market volatility in 19 economies around the world. Our method builds on the Diebold-Yilmaz volatility network model to construct the volatility spillover index, and uses lag sparse group LASSO to accommodate the high-dimensional system. We find that the outbreak of the COVID-19 pandemic strengthens the overall volatility connectedness, and the global connectedness level remains high throughout 2020. In particular, connections across different continents have become stronger during this period. However, China is shown to be disconnected from the global volatility connectedness network until late November 2020. We find evidence that China is not the main source of volatility spillover during the COVID-19 pandemic.
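For readers unfamiliar with the spillover index, the sketch below shows how a total connectedness measure is typically computed from a forecast-error variance decomposition matrix in the Diebold-Yilmaz framework. The 3x3 matrix is made up and the function name is hypothetical; estimating the decomposition itself, including the LASSO step, is not shown.

```python
def total_connectedness(theta):
    """Total connectedness (in percent) from a variance decomposition matrix.

    theta[i][j] is the share of market i's forecast-error variance
    attributed to shocks from market j. Each row is normalized to sum to 1.
    """
    n = len(theta)
    off_diag = 0.0
    for i, row in enumerate(theta):
        row_sum = sum(row)
        for j, value in enumerate(row):
            if i != j:
                # Cross-market share: variance of i explained by shocks to j.
                off_diag += value / row_sum
    return 100.0 * off_diag / n


# Made-up decomposition for three markets.
theta = [
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
]
index = total_connectedness(theta)
```

A higher value means a larger share of each market's volatility forecast error comes from other markets rather than from its own shocks.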
Working papers
-
Empirical Asset Pricing via Deep Sequential Learning: An Exploration
Albert Bo Zhao and Peiyi Zhao
Since Gu, Kelly, and Xiu (2020), empirical asset pricing has increasingly adopted machine learning models. While neural networks have demonstrated superior performance across various studies, the literature remains largely focused on feedforward architectures, which are limited in modeling temporal dependencies. This paper explores the applicability of deep sequential models—specifically RNNs, LSTMs, and GRUs—for excess return prediction in the U.S. stock market from 1990 to 2021. We compare these models to standard neural networks under different time windows, factor horizons, and sampling strategies. Out-of-sample evaluation and alpha tests relative to the Fama-French five-factor model reveal that sequential models better capture nonlinear patterns and uncover pricing anomalies not explained by traditional factors. However, we also find a disconnect between predictive accuracy and portfolio performance, highlighting the role of model complexity, noise, and transaction costs. Our results suggest that dynamic, nonlinear augmentations to factor models can enhance return forecasting under appropriate conditions.
-
Predicting the Predictable: Decomposing and Forecasting Stock Returns in a Data-rich Environment
We propose a new method for forecasting stock market returns in a data-rich environment: the factor-augmented sum-of-the-parts (FA-SOP) approach. Rather than predicting returns directly, FA-SOP decomposes them into three components, namely the dividend-price ratio, earnings growth, and price-earnings ratio growth (gm), and models each separately. We emphasize that gm is a more promising target for forecasting due to its stronger connection with macroeconomic conditions and greater variability over time. FA-SOP forecasts gm using latent macroeconomic factors extracted from high-dimensional data, capturing the underlying state of the economy while avoiding overfitting. Applied to S&P 500 returns from 1960 to 2022, FA-SOP outperforms predictive regressions, factor-augmented regressions, and traditional decomposition approaches, yielding robust out-of-sample gains in both statistical and economic terms. Simulations based on a present-value model further show that FA-SOP’s advantage stems from its ability to track the true data-generating process more closely. Our results highlight the value of decomposing returns and focusing on components that are more predictably linked to economic fundamentals.
Rather than treating return predictability as an all-or-nothing property, we argue that stock returns consist of both more predictable and less predictable components. Disentangling these components and analyzing them separately provides a clearer understanding of where and how predictability arises. For components with inherently low signal-to-noise ratios, a passive approach—such as using the historical mean—is sufficient. But for components with clearer economic meaning or stronger signals, more active and structured modeling could uncover substantial predictive value. This targeted approach allows us to focus modeling effort where it matters most, and avoid overfitting where little can be gained.
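The three-way split rests on an accounting identity, which the sketch below checks numerically. It assumes the standard sum-of-the-parts form, in which the log return equals growth in the price-earnings multiple (gm) plus earnings growth plus a dividend-price term. The prices, earnings, and dividend are made-up numbers and the function name is hypothetical; only the decomposition is shown, not the FA-SOP forecasting step.

```python
import math


def sop_components(p0, p1, e0, e1, d1):
    """Split a one-period log return into gm, ge, and dp."""
    gm = math.log((p1 / e1) / (p0 / e0))  # growth in the price-earnings multiple
    ge = math.log(e1 / e0)                # earnings growth
    dp = math.log(1.0 + d1 / p1)          # dividend-price component
    return gm, ge, dp


# Made-up inputs: prices, earnings per share, and the dividend paid in period 1.
p0, p1, e0, e1, d1 = 100.0, 108.0, 5.0, 5.2, 2.0
gm, ge, dp = sop_components(p0, p1, e0, e1, d1)

# The three parts add up to the total log return, dividends included.
log_return = math.log((p1 + d1) / p0)
```

Because the identity holds exactly, any forecasting gain has to come from how well each part is predicted, which is the motivation for concentrating modeling effort on gm.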
-
Direction is more important than speed: A comparison of direction and value prediction of stock returns
A major research topic in asset pricing is predicting the value of stock excess returns. We examine a seemingly simpler and yet less explored problem: predicting the direction. Theoretically, mechanisms such as the Campbell-Shiller identity and volatility clustering can support direction predictability. Using various established predictors from the value prediction literature, we compare linear, regularized linear, machine learning, and combination models across both tasks. When shifting from value to direction prediction, models achieve higher accuracy and yield greater economic gains, mainly because of their stronger ability to predict market downturns. Consistent with the value prediction literature, machine learning and combination methods generally outperform simpler models in direction prediction as well. While most models perform better when incorporating the full set of predictors, direction prediction with a limited set of predictors can still rival value prediction using a comprehensive set of predictors. Moreover, blending value and direction strategies outperforms value strategies but does not surpass direction-only results. We also find that the returns of direction strategies can explain the returns of value strategies, but not vice versa.
It seems that predicting the value of future stock returns has been the orthodox practice in asset pricing since at least Sharpe (1964) and Ross (1976). The beautiful theory of factor models provides both explanatory and predictive implications for statistical exercises, inspiring a voluminous body of empirical literature. However, a layperson’s first question, often encountered and by no means naive, is: “Do you think the market will go up (bullish) or down (bearish) in the near future?” In this paper we examine this question through the lens of empirical asset pricing and machine learning.
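To illustrate how the two tasks can diverge, the sketch below scores the same forecasts on a value metric and a direction metric. The out-of-sample R² against a benchmark and the sign hit rate are common choices that I assume here; they are not necessarily the measures used in the paper, and all numbers are made up.

```python
def oos_r2(actual, forecast, benchmark):
    """Out-of-sample R^2: 1 minus the ratio of model to benchmark squared errors."""
    sse_model = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    sse_bench = sum((a - b) ** 2 for a, b in zip(actual, benchmark))
    return 1.0 - sse_model / sse_bench


def hit_rate(actual, forecast):
    """Share of periods in which the forecast has the same sign as the outcome."""
    hits = sum((a > 0) == (f > 0) for a, f in zip(actual, forecast))
    return hits / len(actual)


# Made-up excess returns and forecasts.
actual = [0.02, -0.03, 0.01, -0.01]
forecast = [0.10, -0.10, 0.10, -0.10]  # right sign, badly wrong magnitude
benchmark = [0.0, 0.0, 0.0, 0.0]       # stand-in for a historical-mean forecast

r2 = oos_r2(actual, forecast, benchmark)  # negative: worse than the benchmark
acc = hit_rate(actual, forecast)          # every sign is called correctly
```

A forecast can therefore look poor as a value prediction and still be useful for a strategy that only needs the sign.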
-
Corporate Bond Return Prediction: An Ensemble Learning Approach
Using an ensemble machine learning method “Stacking” to forecast returns, we find that corporate bond returns can be predicted by a comprehensive set of predictors from corporate bond, Treasury, and stock markets. By introducing new features into combination forecasts, the Stacking method increases the power of the predictive model and generates higher statistical and economic gains across bond ratings and maturities. The method is efficient for tackling high dimensionality and achieves the best result when using all predictors jointly. While the overall performance of different Stacking models is satisfactory, simpler Stacking models appear to outperform others and generate optimal forecasts.
Combination is still what we need in time-series prediction of returns.
-
Revisiting Incentive Issues in China’s Central-Local Top-Down Hierarchy
We develop a general principal-agent model with multiple agents and multiple tasks to ... (Rest omitted 😶🌫️)
This model may be regarded as a general analytical framework for examining institutional structures.
Warnings hanging over my head:
The cost of computing has dropped exponentially, but the cost of thinking is what it always was. That is why we see so many articles with so many regressions and so little thought.
– Zvi Griliches
If you torture the data long enough, it will confess.
– Ronald Coase