NIH Grant Scores Are Poor Predictors Of Scientific Impact
The most important federal funding mechanism for biomedical research in the United States is the R01 grant proposal submitted to the National Institutes of Health (NIH). Most scientists submitting R01 proposals request around $250,000 per year for 5 years. This may sound like a lot of money, but these requested funds have to pay for the salaries of the research staff, including the salary of the principal investigator. The money that is left over once the salaries are subtracted has to cover the costs of new scientific equipment, maintenance contracts for existing equipment, monthly expenses for research reagents such as chemicals, cell lines, cell culture media and molecular biology assay kits, housing animals, user fees for research core facilities: basically, a very long list of expenditures. Universities that submit the grant proposals to the NIH add on their own “indirect costs” to pay for general expenses such as maintaining the building and providing general administrative support, but the researchers and their laboratories rarely receive any of these “indirect costs”.
Worse, investigators who receive a notification that their R01 proposals have been awarded often find out that the NIH has reduced the requested money, either by cutting the annual budget or by shortening the funding period from 5 years to 4 years. They then have to decide how their laboratory can survive with the reduced funding, how to ensure that nobody loses their job, and how the research can be conducted under these financial constraints without compromising its scientific rigor. These scientists are the lucky ones, because the vast majority of R01 proposals do not get funded. The lack of R01 funding in recent years has forced many scientists to shut down their research laboratories.
When an R01 proposal is submitted to the NIH, it is assigned to one of its institutes such as the NHLBI (National Heart Lung and Blood Institute) or the NCI (National Cancer Institute) depending on the main research focus. Each institute of the NIH is allotted a certain budget for funding extramural applicants, so the institute assignment plays an important role in determining whether or not there is money available to fund the proposal. In addition to the institute assignment, each proposal is also assigned to a panel of expert peer reviewers, a so-called “study section”. The study section members are active scientists who review the grant proposals and rank them by assigning a score to each grant. The grant proposals describe experiments that the respective applicants plan to conduct during the next five years. The study section members try to identify grant proposals that describe research which will have the highest impact on the field. They also have to consider whether the proposed work is based on solid preliminary data, whether it will yield meaningful results even if the scientific hypotheses of the applicants turn out to be wrong, and whether the applicants have the necessary expertise and resources to conduct the work.
Identifying the grants that fall in the lower half of the rank list is not too difficult, because study section members can easily spot the grants which present a disorganized plan and rationale for the proposed experiments. But it becomes very challenging to discriminate between grants in the top half. Some study section members may think that a grant is outstanding (e.g. belongs in the top 10th percentile) whereas others may think that it is just good (e.g. belongs in the top 33rd percentile). After the study section members review each other’s critiques of the discussed grant, they usually come to a consensus, but everyone is aware of the difficulties of making such assessments. The very nature of research is the unpredictability of its path. It is impossible to make an objective assessment of the impact of a proposed five-year scientific project because a lot can happen during those five years. For example, nowadays one comes across many grant applications that propose to use the CRISPR genome editing tool to genetically modify cells. This technique has only become broadly available during the last 1-2 years and is quite exciting but we also do not know much about potential pitfalls of the approach. Some study section members are bound to be impressed by applicants who want to use this cutting-edge genome editing technique and rank their proposal highly, whereas other study section members may find this approach too premature. Small differences in the subjective assessments of the potential impact between study section members can result in a grant proposal receiving a 10th percentile score versus a 19th percentile score.
Ten or fifteen years ago, this difference in the percentile score would not have been too tragic because the NIH was funding more than 30% of the submitted research grant applications, but now the success rate has dropped to 17%! Therefore, the subjective assessment of whether a grant deserves a 10th percentile versus a 19th percentile research impact score can determine whether or not the grant will be funded. This determination in turn will have a major impact on the personal lives and careers of the graduate students, postdoctoral fellows, research assistants and principal investigators who may depend on the funding of the submitted grant in order to keep their jobs and their laboratory running. It would be reassuring to know that the score assigned to a grant application is at least a good prognostic indicator of how much of a scientific impact the proposed research will have. It never feels good to deny research funding to a laboratory, but we also have a duty to fund the best research. If there were indeed a clear association between grant score and future impact, one could at least take solace in the fact that grant applications which received poor scores would not really have resulted in meaningful research.
A recent paper published in Circulation Research, a major cardiovascular research journal, challenges the assumption that the scores a grant application receives can reliably predict the future impact of the research. In the study “Percentile Ranking and Citation Impact of a Large Cohort of NHLBI-Funded Cardiovascular R01 Grants” by Danthi and colleagues, researchers at the National Heart Lung and Blood Institute (NHLBI) reviewed the percentile ranking scores of 1,492 R01 grant applications assigned to the NHLBI as well as the productivity of the funded grants. They assessed grants funded from 2001 to 2008 and the scientific publications ensuing as a result of the funding. Their basic finding is that there is no obvious correlation between the percentile score and the scientific impact, as assessed by the number of publications as well as the number of citations each publication received. The funded R01 grant applications were divided into three categories: Category 1 = <10.0% (i.e. the cream of the crop); Category 2 = 10.0 – 19.9% (i.e. pretty darn good) and Category 3 = 20.0 – 41.8% (good but not stellar). The median number of publications was 8.0 for Category 1, 8.0 for Category 2 and 8.5 for Category 3. This means that even though Category 3 grants were deemed to be of significantly worse quality or impact than Category 1 applications, they resulted in just as many scientific publications. But what about the quality of the publications? Did the poorly scored Category 3 grant applications fund research that was of little impact? No, the scientific impact as assessed by citations of the published papers was the same no matter how the grant applications had been ranked. In fact, the poorly scored grants (Category 3 grants) received less funding but still produced the same number of publications and citations of the published research as their highly scored Category 1 counterparts.
There are a few important limitations to this study. The scientific impact was measured as the number of publications and the number of citations, which are notoriously poor measures of impact. For example, a controversial paper may be refuted but if it is frequently cited in the context of the refutation, it would be considered “high impact”. Another limitation was the assessment of shared funding. In each category, the median number of grants acknowledged in a paper was 2.5. Because a single paper often involves the collaboration of multiple scientists, collaborative papers routinely acknowledge all the research funding which contributed to the publication. In order to correct for this, the study adjusted the counts for publications and citations by dividing by the number of acknowledged grants. For example, if a paper acknowledged three grants and garnered 30 citations, each grant would be credited with only a third of a publication (0.3333…) and with 10 citations. This is a rather crude method because it does not take into account that some papers are primarily funded by one grant and other grants may have just provided minor support. It is also not clear from the methodology how the study accounted for funding from other government agencies (such as other NIH institutes or funding from the Department of Veterans Affairs). However, it is noteworthy that when they analyzed the papers that were funded by only one grant, they still found no difference in the productivity of the three categories of percentile scores. The current study only focused on NHLBI grants (cardiovascular, lung and blood research) so it is not clear whether these findings can be generalized to all NIH grants. A fascinating question that was also not addressed by the study is why the Category 3 grants received the lower score. Did the study section reviewers feel that the applicants were proposing research that was too high-risk? Were the grant applicants unable to formulate their ideas in a cogent fashion? Performing such analyses would require reviewing the study sections’ summary statements for each grant, but this cumbersome analysis would be helpful in understanding how we can reform the grant review process.
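To make the shared-funding adjustment described above concrete, here is a minimal Python sketch of dividing each paper's credit equally among the grants it acknowledges. The `Paper` class, the `fractional_credit` function and the grant identifiers are hypothetical names chosen for illustration; this is not the code used by Danthi and colleagues, only one plausible reading of the method as summarized here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Paper:
    """A published paper (hypothetical record layout for this sketch)."""
    citations: int
    grants: List[str]  # identifiers of all grants acknowledged in the paper


def fractional_credit(papers: List[Paper]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Split each paper's publication and citation counts equally
    among the grants it acknowledges.

    Returns two dicts keyed by grant id: adjusted publication counts
    and adjusted citation counts.
    """
    publications: Dict[str, float] = defaultdict(float)
    citations: Dict[str, float] = defaultdict(float)
    for paper in papers:
        n = len(paper.grants)
        if n == 0:
            continue  # a paper acknowledging no grant credits nobody
        for grant in paper.grants:
            publications[grant] += 1 / n
            citations[grant] += paper.citations / n
    return dict(publications), dict(citations)


# The example from the text: one paper, three grants, 30 citations.
pubs, cites = fractional_credit([Paper(citations=30, grants=["A", "B", "C"])])
print(pubs["A"], cites["A"])
```

In this sketch every acknowledged grant receives the same share, which is exactly the crudeness noted above: a grant that provided minor support is credited as much as the grant that paid for most of the work.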
The results of this study are sobering because they remind us of how bad we are at predicting the future impact of research when we review grant applications. The other important take-home message is that we are currently losing out on quite a bit of important research because the NIH does not receive adequate funding. Back in the years 2001-2008, it was still possible to receive grant funding for grants in Category 3 (percentile ranking 20.0 – 41.8%). However, the NIH budget has remained more or less flat or even suffered major cuts (for example during the sequester) despite the fact that the cost of biomedical research continues to rise and many more investigators are now submitting grant applications to sustain their research laboratories. In the current funding environment, the majority of the Category 3 grants would not be funded despite the fact that they were just as productive as Category 1 grants. If the current low level of NIH funding is maintained, many laboratories will not receive the critical funding they need to conduct cutting-edge biomedical research, some of which could have far greater impact than the research conducted by researchers receiving high scores.
Going forward, we need to devise new ways of assessing the quality of research grants to identify the most meritorious grant applications, but we also need to recognize that the NIH is in dire need of a major increase in its annual budget.
Narasimhan Danthi, Colin O Wu, Peibei Shi, & Michael S Lauer (2014). Percentile Ranking and Citation Impact of a Large Cohort of NHLBI-Funded Cardiovascular R01 Grants. Circulation Research.