- At least 316 "factors" have been published in top finance journals
- Most don't replicate when tested with proper statistical standards
- The traditional significance threshold (t > 2.0) is far too low given the number of factors tested
- Harvey et al. propose a higher bar: t > 3.0 for newly discovered factors
- Our six factors all exceed this higher threshold; they're among the most robust in the literature
# The Paper at a Glance
Title: ... and the cross-section of expected returns
Authors: Campbell R. Harvey, Yan Liu, and Heqing Zhu
Published: Review of Financial Studies, 2016
DOI: 10.1093/rfs/hhv059
The title is intentionally incomplete. The "..." represents the hundreds of variables that researchers have claimed predict stock returns. The paper's message: most of them don't actually work.
# The Factor Zoo Problem
## 316+ Published Factors
By 2016, researchers had published at least 316 different variables that allegedly predict stock returns. These include:
- Accounting ratios (P/E, P/B, ROE, etc.)
- Technical indicators (momentum, reversals, volatility)
- Sentiment measures (consumer confidence, put/call ratios)
- Macro variables (GDP growth, interest rates)
- Esoteric signals (sunspot activity, football scores, weather)
With so many factors tested, some will appear to "work" purely by chance.
## The Multiple Testing Problem
Imagine testing 100 random variables against stock returns. Even if none of them actually predict returns, you'd expect about 5 to appear "statistically significant" at the traditional 5% level (t > 2.0).
Now imagine hundreds of researchers, each testing dozens of variables, over decades of publication. The number of false positives becomes enormous.
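The arithmetic behind this can be checked with a quick simulation (a minimal sketch using only Python's standard library; the factor count, sample size, and seed are illustrative choices, not from the paper):

```python
import random
import statistics

random.seed(0)
N_TESTS = 100  # 100 unrelated candidate "factors"
N_OBS = 120    # 120 monthly return-spread observations each
T_CRIT = 2.0   # the traditional two-sided ~5% threshold

false_positives = 0
for _ in range(N_TESTS):
    # a pure-noise factor: its true mean return spread is exactly zero
    sample = [random.gauss(0, 1) for _ in range(N_OBS)]
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / N_OBS ** 0.5
    t_stat = mean / std_err
    if abs(t_stat) > T_CRIT:
        false_positives += 1

# on average roughly 5 of the 100 noise factors clear the bar
print(f"{false_positives} of {N_TESTS} pure-noise factors look 'significant'")
```

Every "significant" factor here is a false positive by construction, which is why the count of clearances, not any single t-statistic, is the relevant quantity once many tests are run.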
# The Solution: Raise the Bar
## Harvey et al.'s Adjustment
The traditional significance threshold of t > 2.0 assumes you're testing one hypothesis. But with 316+ factors tested, the threshold must be much higher:
| Number of Factors Tested | Required t-statistic |
|---|---|
| 1 | 2.0 (traditional) |
| 10 | 2.6 |
| 100 | 3.0 |
| 316 | 3.4 |
For a newly proposed factor to be credible, Harvey et al. argue it needs a t-statistic above 3.0—and ideally above 3.4.
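One way to see where such cutoffs come from is a Bonferroni correction, the simplest of the multiple-testing adjustments Harvey et al. consider (the paper also uses Holm and Benjamini-Hochberg-Yekutieli procedures, which are less conservative, so its preferred cutoffs sit below the Bonferroni ones computed here). A minimal sketch using the normal approximation to the t-distribution:

```python
from statistics import NormalDist

def bonferroni_t_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Two-sided significance cutoff after a Bonferroni correction.

    Splits the overall 5% error budget across n_tests tests; with the
    large samples typical in asset pricing, the t-distribution is close
    to normal, so the normal quantile is used as an approximation.
    """
    return NormalDist().inv_cdf(1 - alpha / (2 * n_tests))

# cutoffs rise with the number of tests (1 test -> ~1.96, 316 -> ~3.8)
for m in (1, 10, 100, 316):
    print(m, round(bonferroni_t_threshold(m), 2))
```

Bonferroni treats every test as independent and controls the chance of even one false positive, which is why its cutoffs are the most demanding; the gap between these values and the table above reflects the paper's use of less conservative adjustments.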
## How Our Factors Measure Up
| Factor | Key Paper | t-statistic | Passes Harvey et al.? |
|---|---|---|---|
| Profitability | Ball et al. (2016) | 4.0+ | ✅ Yes |
| Momentum | Jegadeesh & Titman (1993) | 4.5+ | ✅ Yes |
| Value | Fama & French (1992) | 3.5+ | ✅ Yes |
| Low Volatility | Ang et al. (2006) | 4.0+ | ✅ Yes |
| Investment | Fama & French (2015) | 3.2+ | ✅ Yes |
| Short Interest | Rapach et al. (2016) | 3.5+ | ✅ Yes |
All six of our factors exceed the higher threshold. They're not data-mined flukes—they're among the most robust findings in finance.
# The Replication Landscape
## Hou, Xue & Zhang (2020): Testing 452 Anomalies
In a separate large-scale replication exercise, Hou, Xue, and Zhang re-tested 452 published anomalies under a single consistent methodology. Their findings were sobering:
| Category | Anomalies Tested | Replicated |
|---|---|---|
| Momentum | 57 | ~40% |
| Value | 68 | ~35% |
| Profitability | 79 | ~55% |
| Investment | 38 | ~50% |
| Trading Frictions | 107 | ~20% |
| Intangibles | 103 | ~15% |
| Total | 452 | ~30% |
Only about 30% of published anomalies survive replication. The rest were likely false positives from data mining, cherry-picked time periods, or methodological errors.
## What Survives?
The factors that consistently replicate share common characteristics:
1. Strong economic rationale: there's a reason they should work
2. Large sample sizes: tested across decades, not just one period
3. Global evidence: they work in multiple countries
4. Post-publication persistence: they still work after being published
5. High t-statistics: well above 3.0
# Common Data Mining Pitfalls
## 1. Publication Bias
Journals prefer to publish positive results. Studies finding "factor X doesn't work" rarely get published. This creates a systematically biased literature.
## 2. Researcher Degrees of Freedom
Researchers can "choose" their results through:
- Sample period selection (start/end dates)
- Variable definitions (which profitability measure?)
- Control variables (what to include in regressions)
- Outlier handling (which observations to exclude)
## 3. In-Sample vs. Out-of-Sample
A factor might look great in the sample used to discover it (1963-1990) but fail in new data (1991-present). True factors work out-of-sample.
# What This Means for Investors
## Be Skeptical
When someone claims to have found a "new factor" or "secret signal," ask:
1. What's the t-statistic?
2. Has it been tested out-of-sample?
3. Does it work in other countries?
4. Is there an economic explanation?
5. Has it survived post-publication?
## Stick with the Proven Factors
The safest approach is to focus on factors that have survived decades of scrutiny: profitability, momentum, value, low volatility, investment, and short interest.
# How This Applies to Our Rankings
The replication crisis is precisely why we use only six factors in our ranking model. We deliberately chose factors with:
- t-statistics above 3.0 (the Harvey et al. threshold)
- Out-of-sample evidence across decades
- Global replication across 20+ countries
- Clear economic rationale
- Post-publication persistence
In a world of 316+ claimed factors, discipline matters. More factors don't mean better results—they mean more noise and more false positives.
See how our replication-robust factors rank stocks →
# Academic Sources
Harvey, C. R., Liu, Y., & Zhu, H. (2016). "... and the cross-section of expected returns." Review of Financial Studies, 29(1), 5-68.
Hou, K., Xue, C., & Zhang, L. (2020). "Replicating anomalies." Review of Financial Studies, 33(5), 2019-2133.
Last updated: February 1, 2026