Is science a strong-link problem and peer review counterproductive? A pledge for community-driven evaluation of research quality and impact

In a thought-provoking blog post, Adam Mastroianni (Columbia Business School) recently stated: “There are two kinds of problems in the world: strong-link problems and weak-link problems.” For weak-link problems, “the overall quality depends on how good the worst stuff is”. (1) To fix them, we need to eliminate the weakest links or make them stronger. That’s why we have strict quality standards for food. Nobody wants to die because they picked the wrong tuna sandwich off the shelf!

Science, on the other hand, is a strong-link problem, Mastroianni argues: “In the long run, the best stuff is all that matters. The bad stuff doesn’t matter at all.” (1) Peer review, however, is a massive weak-link intervention, according to Mastroianni. We all spend an incredible amount of collective time and effort trying to prevent “bad research” from being published. But does pre-publication peer review really do a good job of eliminating or improving the worst? Mastroianni argues that gatekeeping might even be counterproductive to solving strong-link problems. It could “eliminate the best research, too.” (1)

One has to admit: the path of history is paved with dead scientific theories – some buried with considerable delay next to their most influential proponents – as well as some marvelous ideas, far ahead of their time, that were repeatedly rejected for funding or publication.

Building on Mastroianni’s contribution, Maxwell Tabarrok pointed out the non-linear relationship between “input quality” and “final impact”. (2) Since the top few percent of scientific projects (by “quality”) account for the majority of scientific progress, “filtering out the bottom half of the quality distribution is less important for final impact.” His conclusion: a “slightly higher average impact might fail to make up for the losses in total output compared to no peer review.” (2) In short: costly peer review that tries to improve the low- and medium-quality stuff by preventing the bad stuff from being published is unlikely to be productive. If we want to improve scientific progress, we should focus our efforts on increasing the production of “high-quality stuff” with “large final impact”.
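To see the logic in numbers, here is a minimal simulation sketch of my own (not taken from either blog post). It assumes, purely for illustration, that the “final impact” of research projects follows a heavy-tailed lognormal distribution and that quality correlates perfectly with impact – the most generous case for gatekeeping:

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumption (mine, not the cited authors'): "final impact" per project follows
# a heavy-tailed lognormal distribution, so a small fraction of projects
# dominates the total.
impact = rng.lognormal(mean=0.0, sigma=2.0, size=100_000)
total = impact.sum()
impact_sorted = np.sort(impact)

# Most generous case for gatekeeping: quality correlates perfectly with impact,
# and peer review removes exactly the bottom half of the quality distribution.
top_1_percent_share = impact_sorted[-1_000:].sum() / total
bottom_half_share = impact_sorted[: len(impact) // 2].sum() / total

print(f"Share of total impact from the top 1% of projects: {top_1_percent_share:.1%}")
print(f"Share of total impact lost by filtering out the bottom 50%: {bottom_half_share:.1%}")
```

In this toy model, the top 1% of projects account for a disproportionately large share of the total impact, while discarding the entire bottom half removes only a few percent of it – which is exactly why a costly filter aimed at the low end of the distribution is expected to add so little.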

Mastroianni’s list of imperatives for strong-link problems (1):

  1. Increase variance (creates outliers in both directions)
  2. Don’t gatekeep (might accidentally delete the best)
  3. Ignore the worst (doesn’t matter at all, not worth the effort)
  4. Improve the best (all that matters in the long run)
  5. Accept risk (downsides don’t matter; in contrast to weak-link problems like food safety)

Unfortunately, both authors present a rather simplistic view of scientific progress, probably with the rare groundbreaking discoveries in physics in mind, and seem to underestimate the crucial role of reproducibility in the empirical sciences.

»Giant replication projects […] also only make sense for weak-link problems. There’s no point in picking some studies that are convenient to replicate, doing ‘em over, and reporting “only 36% of them replicate!” In a strong-link situation, most studies don’t matter.« (1)

We have a devastating reproducibility crisis in molecular medicine and translational research. This crisis is largely due to three factors: non-transparent reporting, lack of adherence to good scientific practice, and the “publish or perish” paradigm. (3–5) Indeed, one might conclude that huge replication projects are an avoidable waste of time and money (6) – the unwanted result of an inefficient quality control system in the life sciences, a temporary corrective measure that shouldn’t be necessary in the first place.

Of course, one should be cautious whenever everything is supposed to fit into one of two categories. Fun fact: the weak-link vs. strong-link distinction comes from the book “The Numbers Game: Why Everything You Know About Football Is Wrong” by Chris Anderson and David Sally. (7) Statistically, basketball is more like a strong-link game: the team with the best player usually wins. Low-scoring sports like soccer are weak-link games: the team without the worst player usually wins. Ironically, Mastroianni claims that “science” is a strong-link problem. That’s a bit like saying “sports” is a strong-link problem. The solutions for math or physics may be very different from those for biology. Promoting scientific progress might even be a middle-link problem for some disciplines. In my humble opinion, it’s definitely worthwhile – even essential – to improve the overall quality of biomedical research. Don’t get me wrong: pre-publication peer review isn’t supposed to do all the hard work.

Want to ignore bad quality? You’ll have to identify it first!

Lack of reproducibility is not a problem we can ignore or solve by ignoring the “bad studies”. The reason is simple: if we want to ignore them, we have to identify them first. The same is true for the “best stuff” we want to selectively reward and promote.

It’s hard, if not impossible, to judge the true impact of any single publication – even in retrospect. What are the chances of correctly predicting the actual long-term impact of a single paper on scientific progress at the time of publication? Close to zero. From a diagnostic point of view: Whenever we can’t measure something directly, we need a feasible and easily quantifiable surrogate marker that correlates with the desired outcome. We’ve chosen the “cumulative number of citations” to quantify the “impact” of a scientific publication.

A journal whose mission is to publish “groundbreaking research” needs quality standards and relies on expert judgment to predict the number of citations an article is going to accumulate over time. According to Tabarrok, journals with pre-publication peer review filter by quality, simply because “observing quality ex ante is much easier than predicting impact ex post”. (2) This is only half the story. Non-predatory journals filter on both “quality” (judged as well as possible) and “final impact” (predicted as well as possible).

It seems that the term “quality” is used in two different ways: for Mastroianni, quality is directly linked to impact – “good quality” is actually defined by “large impact”. The conclusion: promoting high-quality research promotes scientific progress. Tabarrok, on the other hand, rightly points out that impact must be predicted, while quality (i.e. quality of experimental design, data collection, interpretation, and reporting) can be assessed upon disclosure. (1, 2)

Funnily enough, while actual data quality is a prerequisite for scientific progress – fabricated or over-interpreted data are of no use to science – a well-written manuscript that convincingly presents statistically significant results can be published in a highly prestigious journal and accumulate citations, as long as its flaws are not exposed.

There are numerous recommendations and reporting guidelines for preclinical in vivo and in vitro research, like ARRIVE, PREPARE, and, most recently, RIVER. (8–10) The pharmaceutical industry has already lost billions of dollars betting on the wrong drug candidates, which delays the development of effective treatment options. More subtle and creeping is the damage from non-transparent reporting of basic research data that has been over-interpreted – in order to meet certain journal submission requirements – and does not meet basic standards of Good Scientific Practice (GSP).

We aim […] to help ensure that researchers, reviewers, and journal editors are better equipped to improve the rigour and transparency of the scientific process and thus reproducibility.

from “The ARRIVE guidelines 2.0” (9)

We can’t wait a lifetime to measure the actual impact on scientific progress

We have to decide here and now whom to support and whom to provide with scarce resources such as personnel, equipment, and money. Currently, we mainly rely on a surrogate marker of a surrogate marker to make these decisions: we measure “journal impact factors” (the average number of citations per article recently published in a journal) because they are easy to measure, and use them to predict the impact a person will have on scientific progress. Let that sink in… (11) Other, author-level metrics like the h-index are still strictly citation-based.
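For readers who have never looked behind these numbers, here is a minimal sketch (with made-up data) of the two standard formulas: the classic two-year journal impact factor and the author-level h-index. The function names and example values are mine, purely illustrative; the point is that both metrics count citations and nothing else:

```python
def journal_impact_factor(citations_this_year: dict[str, int],
                          items_prev_two_years: list[str]) -> float:
    """Two-year impact factor: citations received this year to items published
    in the journal during the previous two years, divided by the number of
    those citable items."""
    cited = sum(citations_this_year.get(item, 0) for item in items_prev_two_years)
    return cited / len(items_prev_two_years)


def h_index(citation_counts: list[int]) -> int:
    """Largest h such that the author has h papers with at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


# Made-up numbers, for illustration only:
print(journal_impact_factor({"paper A": 10, "paper B": 2},
                            ["paper A", "paper B", "paper C", "paper D"]))  # 3.0
print(h_index([25, 8, 5, 3, 3, 1]))  # 3
```

Neither formula knows anything about experimental design, reproducibility, or honesty of reporting – which is precisely the problem when such numbers decide careers.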

It should be clear by now how difficult it is to define, judge, and measure “quality” and “impact,” and that terms like “peer review” and “gatekeeping” are not synonymous. Our decisions depend on how we define “high-quality research” and “final impact”. More importantly, how do we intend to identify the things we want to selectively improve or ignore, and how do we identify and appropriately reward talented people who are able to produce the “right stuff” on a regular basis?

Scientists who disclose their discovery to the scientific community must be able to claim credit for it. Actually, any valuable contribution to scientific progress should be rewarded. Even “boring” replication work, reporting “negative” results, and validating and improving the reporting of research data produced by others.

Before the Internet, disclosure and initial validation were necessarily combined in a printed journal publication. This is certainly gatekeeping – especially when the same actors who funded and produced the results are expected to pay for access. Pre-publication peer review limits variance and may delay the disclosure of interdisciplinary or non-paradigmatic research that ultimately proves valuable and impactful. (12)

The physicist Paul H. Ginsparg can proudly claim priority for the following discovery: disclosure and validation can be easily separated once documents can be stored and accessed electronically. (13) The invention of pre-print archives like arXiv.org was the end of scientific publishing as we know it. Without proper gatekeeping, crazy ideas and partial solutions could spread uncontrollably among scientists, causing chaos and mass confusion. We clearly reached doomsday as soon as biology and medicine adopted this crazy idea. (12) Seriously though, even for health-related research, we’ve already accepted the risks that come with leaving out the gatekeeping part. (14) It certainly increased variance – mainly in one direction of the quality distribution, some might argue.

The remaining challenges: “Ignoring the Worst” and “Improving the Best”

A response from Calvin McCarter nicely illustrates how the perception of “peer review” as “gatekeeping” is slowly beginning to change:

Peer review is a service to paper-writers. It is especially a service to paper-writers lacking prestigious affiliations and notable previous work — authors whose papers would go unread if they just posted on arXiv. So in this sense, peer review is a path to entry, not a barrier to entry.

Calvin McCarter (15)

From this point of view, peer review is a service to both authors and readers, providing recognition and reward for a valuable contribution, as well as guidance in the paper jungle.

What do we all normally do when looking for “quality stuff” on the Internet, where good is rare and bad is abundant? We usually ignore those with bad reviews and those with no reviews at all. A handful of anonymous reviews that collectively praise the product? Makes me suspicious, at least.

I am the one who decides whether the product is of value to me: Do I really need it? Will it help me achieve my intended goal? I certainly ignore reviews from influencers who try to convince me that my life is worthless without this very product. Instead, I rely on reviews that address technical quality (Is it functional and durable?) and whether the product lives up to the manufacturer’s claims (Does it deliver what it promises?). It’s up to the companies to use these reviews to improve their products. We’re all fine as long as we don’t buy their bad stuff. In a time when quality is going down the drain, I personally refrain from using prestigious brands as a marker for quality. Garbage is produced by prestigious and non-prestigious manufacturers alike. This is true for cell phones, shoes, and publications.

What about the toxic tuna sandwich? How dangerous can a publication be? It depends on our habits and expectations. Today, literally anyone can tweet their “research” or start an “academic journal” and “publish” it. The smart ones invite their sisters and brothers in spirit to “peer review” their results. The rich and lazy simply pay for the production of fake manuscripts with fake results, published in fake journals with fake peer review. We all know the absurd consequences of the current “publish or perish” paradigm. With artificial intelligence, the paper mills will only run faster in the near future.

A paper is “good quality” if it tells the truth in a comprehensible way. Period.

Some manuscripts report exciting, surprising, and potentially groundbreaking results. Whether they are “good quality” mainly depends on the quality of the experimental design, statistical analysis, data interpretation and transparent disclosure – we can only judge what’s reported. (8–10)

Observing quality is much easier than predicting impact? Actually, it’s much easier for a specialist in the field to judge the excitement and surprise factor, and even to predict the breakthrough potential. It takes much, much more time and effort to properly assess a paper’s actual quality. To be fair, some publishers like Cell Press have made marvelous contributions to improve “structured, transparent, accessible reporting” (STAR Methods). However, “Community Review” here still means “our editors will pick the most fitting journal for your manuscript to strategically improve our own journal portfolio” rather than “the manuscript is reviewed by the scientific community”.

The major challenges: We need to create the right incentives, metrics, and tools for the scientific community to actually review quality and impact after disclosure – in a way that is as efficient, accurate, and forgery-proof as possible. What if 99% goes unread? That might be just the right – and long overdue – incentive for the scientific community to produce less, but of higher quality.

How do we know if a paper is telling the truth upon disclosure? We simply don’t. Even after thorough peer review, we can only judge the validity of the experimental design, the statistical analysis, and the logical conclusions drawn from the data presented. We can check for plagiarism and may be able to tell if the numbers and figures have been manipulated. We certainly can’t detect highly sophisticated fraud. Just as with doping, counterfeiting, or tax evasion – we only catch the lousy cheaters.

You can cheat to get a paper. You can cheat to get a degree. You can cheat to get a grant. You can’t cheat to cure a disease. Biology doesn’t care.

Matthew Schrag (Vanderbilt University) (16)

“In the long run, the best stuff is all that matters,” says Mastroianni. (1) I couldn’t agree more. “The bad stuff doesn’t matter at all.” (1) Well, the scientific community and the pharmaceutical industry are still choking on a spoiled amyloid-beta sandwich served at a five-star restaurant nearly two decades ago. (17–20) And even those few breakthrough discoveries that magically propel an entire field forward stand on the shoulders of giants, as Newton once put it. What if the giants turn out to be wimps masquerading as beefcakes?

We tend to focus on obvious misconduct such as plagiarism, fabrication, and falsification published in highly respected journals. We either point them out with glee or dismiss them as rare exceptions that prove the rule. As long as we create the wrong incentives (publish or perish!) and overemphasize brands, newsworthiness, and citation counts (reach double digits!), we will perpetuate a system that produces more tainted output than we can choke down.

The current journal-based publishing system in a nutshell:

We’ve created a shelf full of boxes, sorted from high to low prestige. Now everyone has to put as many documents as possible into these boxes, which are protected by a few hand-picked experts who decide whether the documents are good enough for the box. The strategies are remarkably different: Some can rely on their education, skills, and instincts, always following the right path and hiring the best and most honest people, while slowly but surely filling the upper boxes. Others rely on connections, trickery, and deceit to reach the top shelf. Still others try to smuggle as many documents as possible into the lower, less guarded boxes. Not to mention the desperate fake box strategy. Box owners regularly check their archives and meticulously count how many times papers from their own boxes have been mentioned in papers from other boxes. Or better yet, in papers that were previously thrown into their own boxes. The reason? The higher the number, the higher the box will be placed next year! It’s easy to anticipate the strategies of box owners to maintain or improve the position of their boxes, or to fill them with as many papers as possible, depending on their business model. Documents and their producers are judged by the box they end up in, and readers judge the quality of each document by the average quality of the box contents.

Unfortunately, the essential task of quality assurance for the entire system rests on very few shoulders, since the quality and significance of empirical research results are traditionally assessed at the time of disclosure (pre-publication peer review) and quantified by a journal- and citation-based metric (impact factor). (21)

A significant number of top researchers are in open rebellion against the very system that has allowed them to succeed. (22–24) Why not more? Perhaps a form of survivorship bias. Others are concerned about the direction we are heading (25):

Today I wouldn’t get an academic job. It’s as simple as that. I don’t think I would be regarded as productive enough.

Peter Higgs (25)

Let’s interpret this blog post as a pledge for “community-driven evaluation of research quality and impact” – as an addendum to my first Editorial “Rethinking Scientific Publishing” for ScienceOpen. (26)

Instead of handing out “I-successfully-published-a-paper” award badges, we could at least try to judge, measure, and reward each valuable contribution to scientific progress actually made. We certainly can’t just ignore the bad stuff, wherever its producers managed to get it published. But we might be able to stop the very force that drives its production: the desperation to publish as many papers as possible in the most prestigious journals.


Acknowledgment

The author would like to thank Christoph Emmerich (PAASP GmbH) for his valuable comments.

REFERENCES

1.    Mastroianni A. Science is a strong-link problem: OR: How to eat fewer asparagus beetles. Experimental History Blog (2023 Apr 11). https://www.experimental-history.com/p/science-is-a-strong-link-problem

2.    Tabarrok A. Strong and Weak Link Problems and the Value of Peer Review. Marginal Revolution Blog (2023 Apr 15). https://marginalrevolution.com/marginalrevolution/2023/04/strong-and-weak-link-problems-and-the-value-of-peer-review.html

3.    Reality check on reproducibility. Nature (2016) 533:437. doi:10.1038/533437a

4.    Ioannidis JP. Why most published research findings are false. PLoS Med (2005) 2:e124. doi:10.1371/journal.pmed.0020124

5.    Begley CG, Ioannidis JP. Reproducibility in science: improving the standard for basic and preclinical research. Circ Res (2015) 116:116–26. doi:10.1161/CIRCRESAHA.114.303819

6.    Errington TM, Iorns E, Gunn W, Tan FE, Lomax J, Nosek BA. An open investigation of the reproducibility of cancer biology research. Elife (2014) 3. doi:10.7554/eLife.04333

7.    Anderson C, Sally D. The numbers game: Why everything you know about football is wrong. London: Penguin Books (2014).

8.    The RIVER working group. Reporting In Vitro Experiments Responsibly – the RIVER Recommendations (2023). https://osf.io/preprints/metaarxiv/x6aut

9.    Percie du Sert N, Hurst V, Ahluwalia A, Alam S, Avey MT, Baker M, et al. The ARRIVE guidelines 2.0: Updated guidelines for reporting animal research. PLoS Biol (2020) 18:e3000410. doi:10.1371/journal.pbio.3000410

10.  Smith AJ, Clutton RE, Lilley E, Hansen KE, Brattelid T. PREPARE: guidelines for planning animal research and testing. Lab Anim (2018) 52:135–41. doi:10.1177/0023677217724823

11.  Thelwall M, Kousha K, Makita M, Abdoli M, Stuart E, Wilson P, et al. In which fields do higher impact journals publish higher quality articles? Scientometrics (2023) 128:3915–33. doi:10.1007/s11192-023-04735-0

12.  Vale RD, Hyman AA. Priority of discovery in the life sciences. Elife (2016) 5. doi:10.7554/eLife.16931

13.  Prof. Paul Ginsparg. Department of Physics, Cornell University, NY, United States of America. https://physics.cornell.edu/paul-ginsparg

14.  Kaiser J. BioRxiv at 1 year: A promising start. Science (2014 Nov 11). doi:10.1126/article.53269

15.  McCarter C. Peer review worsens precision but improves recall (2022 Dec 29) https://calvinmccarter.writeas.com/peer-review-worsens-precision-but-improves-recall.

16.  Piller C. Blots on a field? Potential fabrication in research images threatens key theory of Alzheimer’s disease. Science (2022 Jul 21).

17.  Kim CK, Lee YR, Ong L, Gold M, Kalali A, Sarkar J. Alzheimer’s Disease: Key Insights from Two Decades of Clinical Trial Failures. J Alzheimers Dis (2022) 87:83–100. doi:10.3233/JAD-215699

18.  Thorp HH. Rethinking the retraction process. Science (2022) 377:793. doi:10.1126/science.ade3742

19.  Plascencia-Villa G, Perry G. Lessons from antiamyloid-β immunotherapies in Alzheimer’s disease. Handb Clin Neurol (2023) 193:267–92. doi:10.1016/B978-0-323-85555-6.00019-9

20.  Piller C. Blots on a field? Potential fabrication in research images threatens key theory of Alzheimer’s disease. Science (2022) 377:358–63. doi:10.1126/science.add9993

21.  Alwine JC, Enquist LW, Dermody TS, Goodrum F. What Is the Price of Science? mBio (2021) 12. doi:10.1128/mbio.00117-21

22.  Callaway E. Beat it, impact factor! Publishing elite turns against controversial metric. Nature (2016) 535:210–1. doi:10.1038/nature.2016.20224

23.  Sample I. Nobel winner declares boycott of top science journals. The Guardian (2013 Dec 09). https://www.theguardian.com/science/2013/dec/09/nobel-winner-boycott-science-journals

24.  Gowers T. The Cost of Knowledge. http://thecostofknowledge.com

25.  Aitkenhead D. Peter Higgs: I wouldn’t be productive enough for today’s academic system. The Guardian (2013 Dec 06). https://www.theguardian.com/science/2013/dec/06/peter-higgs-boson-academic-system

26.  Alers S. Rethinking Scientific Publishing. ScienceOpen Research (2015). doi:10.14293/S2199-1006.1.sor-uncat.e073g

Image: Weak link by Yuri Samoilov, Flickr, CC BY 2.0 (https://flic.kr/p/2j4cRrd)
