- At least 316 "factors" have been published in top finance journals
- Most don't replicate when tested with proper statistical standards
- The traditional significance threshold (t > 2.0) is far too low given the number of factors tested
- Harvey et al. propose a higher bar: t > 3.0 for newly discovered factors
- Our six factors all exceed this higher threshold; they're among the most robust in the literature
# The Paper at a Glance
Title: ... and the cross-section of expected returns
Authors: Campbell R. Harvey, Yan Liu, and Heqing Zhu
Published: Review of Financial Studies, 2016
DOI: 10.1093/rfs/hhv059
The title is intentionally incomplete. The "..." represents the hundreds of variables that researchers have claimed predict stock returns. The paper's message: most of them don't actually work.
# The Factor Zoo Problem
## 316+ Published Factors
By 2016, researchers had published at least 316 different variables that allegedly predict stock returns. These include:
- Accounting ratios (P/E, P/B, ROE, etc.)
- Technical indicators (momentum, reversals, volatility)
- Sentiment measures (consumer confidence, put/call ratios)
- Macro variables (GDP growth, interest rates)
- Esoteric signals (sunspot activity, football scores, weather)
With so many factors tested, some will appear to "work" purely by chance.
## The Multiple Testing Problem
Imagine testing 100 random variables against stock returns. Even if none of them actually predict returns, you'd expect about 5 to appear "statistically significant" at the traditional 5% level (t > 2.0).
Now imagine hundreds of researchers, each testing dozens of variables, over decades of publication. The number of false positives becomes enormous.
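The arithmetic behind this can be checked with a quick simulation (a minimal sketch using only Python's standard library; the factor count, sample size, and seed are illustrative choices, not from the paper):

```python
import random
import statistics

random.seed(0)
N_TESTS = 100  # 100 unrelated candidate "factors"
N_OBS = 120    # 120 monthly return-spread observations each
T_CRIT = 2.0   # the traditional two-sided ~5% threshold

false_positives = 0
for _ in range(N_TESTS):
    # a pure-noise factor: its true mean return spread is exactly zero
    sample = [random.gauss(0, 1) for _ in range(N_OBS)]
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / N_OBS ** 0.5
    t_stat = mean / std_err
    if abs(t_stat) > T_CRIT:
        false_positives += 1

# on average roughly 5 of the 100 noise factors clear the bar
print(f"{false_positives} of {N_TESTS} pure-noise factors look 'significant'")
```

Every "significant" factor here is a false positive by construction, which is why the count of clearances, not any single t-statistic, is the relevant quantity once many tests are run.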
# The Solution: Raise the Bar
## Harvey et al.'s Adjustment
The traditional significance threshold of t > 2.0 assumes you're testing one hypothesis. But with 316+ factors tested, the threshold must be much higher:
| Number of Factors Tested | Required t-statistic |
|---|---|
| 1 | 2.0 (traditional) |
| 10 | 2.6 |
| 100 | 3.0 |
| 316 | 3.4 |
For a newly proposed factor to be credible, Harvey et al. argue it needs a t-statistic above 3.0—and ideally above 3.4.
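One way to see where such cutoffs come from is a Bonferroni correction, the simplest of the multiple-testing adjustments Harvey et al. consider (the paper also uses Holm and Benjamini-Hochberg-Yekutieli procedures, which are less conservative, so its preferred cutoffs sit below the Bonferroni ones computed here). A minimal sketch using the normal approximation to the t-distribution:

```python
from statistics import NormalDist

def bonferroni_t_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Two-sided significance cutoff after a Bonferroni correction.

    Splits the overall 5% error budget across n_tests tests; with the
    large samples typical in asset pricing, the t-distribution is close
    to normal, so the normal quantile is used as an approximation.
    """
    return NormalDist().inv_cdf(1 - alpha / (2 * n_tests))

# cutoffs rise with the number of tests (1 test -> ~1.96, 316 -> ~3.8)
for m in (1, 10, 100, 316):
    print(m, round(bonferroni_t_threshold(m), 2))
```

Bonferroni treats every test as independent and controls the chance of even one false positive, which is why its cutoffs are the most demanding; the gap between these values and the table above reflects the paper's use of less conservative adjustments.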
## How Our Factors Measure Up
| Factor | Key Paper | t-statistic | Passes Harvey et al.? |
|---|---|---|---|
| Profitability | Ball et al. (2016) | 4.0+ | ✅ Yes |
| Momentum | Jegadeesh & Titman (1993) | 4.5+ | ✅ Yes |
| Value | Fama & French (1992) | 3.5+ | ✅ Yes |
| Low Volatility | Ang et al. (2006) | 4.0+ | ✅ Yes |
| Investment | Fama & French (2015) | 3.2+ | ✅ Yes |
| Short Interest | Rapach et al. (2016) | 3.5+ | ✅ Yes |
All six of our factors exceed the higher threshold. They're not data-mined flukes—they're among the most robust findings in finance.
# The Replication Landscape
## Hou, Xue & Zhang (2020): Testing 452 Anomalies
In a separate large-scale replication exercise, Hou, Xue, and Zhang re-tested 452 published anomalies under a single consistent methodology. Their findings were sobering:
| Category | Anomalies Tested | Replicated |
|---|---|---|
| Momentum | 57 | ~40% |
| Value | 68 | ~35% |
| Profitability | 79 | ~55% |
| Investment | 38 | ~50% |
| Trading Frictions | 107 | ~20% |
| Intangibles | 103 | ~15% |
| Total | 452 | ~30% |
Only about 30% of published anomalies survive replication. The rest were likely false positives from data mining, cherry-picked time periods, or methodological errors.
## What Survives?
The factors that consistently replicate share common characteristics:
1. Strong economic rationale: there's a reason they should work
2. Large sample sizes: tested across decades, not just one period
3. Global evidence: they work in multiple countries
4. Post-publication persistence: they still work after being published
5. High t-statistics: well above 3.0
# Common Data Mining Pitfalls
## 1. Publication Bias
Journals prefer to publish positive results. Studies finding "factor X doesn't work" rarely get published. This creates a systematically biased literature.
## 2. Researcher Degrees of Freedom
Researchers can "choose" their results through:
- Sample period selection (start/end dates)
- Variable definitions (which profitability measure?)
- Control variables (what to include in regressions)
- Outlier handling (which observations to exclude)
## 3. In-Sample vs. Out-of-Sample
A factor might look great in the sample used to discover it (1963-1990) but fail in new data (1991-present). True factors work out-of-sample.
# What This Means for Investors
## Be Skeptical
When someone claims to have found a "new factor" or "secret signal," ask:
1. What's the t-statistic?
2. Has it been tested out-of-sample?
3. Does it work in other countries?
4. Is there an economic explanation?
5. Has it survived post-publication?
## Stick with the Proven Factors
The safest approach is to focus on factors that have survived decades of scrutiny: profitability, momentum, value, low volatility, investment, and short interest.
# How This Applies to Our Rankings
The replication crisis is precisely why we use only six factors in our ranking model. We deliberately chose factors with:
- t-statistics above 3.0 (the Harvey et al. threshold)
- Out-of-sample evidence across decades
- Global replication across 20+ countries
- Clear economic rationale
- Post-publication persistence
In a world of 316+ claimed factors, discipline matters. More factors don't mean better results—they mean more noise and more false positives.
See how our replication-robust factors rank stocks →
# Academic Sources
Harvey, C. R., Liu, Y., & Zhu, H. (2016). "... and the cross-section of expected returns." Review of Financial Studies, 29(1), 5-68.
Hou, K., Xue, C., & Zhang, L. (2020). "Replicating anomalies." Review of Financial Studies, 33(5), 2019-2133.
Last updated: February 1, 2026