NIH Grant Scores Are Poor Predictors Of Scientific Impact

10 January 2014 by Jalees Rehman, posted in funding, grants, nih, peer review, research

The most important federal funding mechanism for biomedical research in the United States is the R01 grant proposal submitted to the National Institutes of Health (NIH). Most scientists submitting R01 proposals request around $250,000 per year for 5 years. This may sound like a lot of money, but these requested funds have to pay for the salaries of the research staff, including the salary of the principal investigator. The money that is left over once the salaries are subtracted has to cover the costs of new scientific equipment, maintenance contracts for existing equipment, monthly expenses for research reagents such as chemicals, cell lines, cell culture media and molecular biology assay kits, housing animals, user fees for research core facilities... basically a very long list of expenditures. Universities that submit the grant proposals to the NIH add on their own “indirect costs” to pay for general expenses such as maintaining the building and providing general administrative support, but the researchers and their laboratories rarely receive any of these “indirect costs”.

Instead, the investigators who receive a notification that their R01 proposals have been awarded often find out that the NIH has reduced the requested money by either cutting the annual budget or by shortening the funding period from 5 years to 4 years. They then have to decide how to ensure that their laboratory will survive with the reduced funding, that nobody loses their job and that the research can be conducted under these financial constraints without compromising its scientific rigor. These scientists are the lucky ones, because the vast majority of the R01 proposals do not get funded. And the lack of R01 funding in recent years has forced many scientists to shut down their research laboratories.


When an R01 proposal is submitted to the NIH, it is assigned to one of its institutes such as the NHLBI (National Heart Lung and Blood Institute) or the NCI (National Cancer Institute) depending on the main research focus. Each institute of the NIH is allotted a certain budget for funding extramural applicants, so the institute assignment plays an important role in determining whether or not there is money available to fund the proposal. In addition to the institute assignment, each proposal is also assigned to a panel of expert peer reviewers, a so-called “study section”. The study section members are active scientists who review the grant proposals and rank them by assigning scores to each grant. The grant proposals describe experiments that the respective applicants plan to conduct during the next five years. The study section members try to identify grant proposals that describe research which will have the highest impact on the field. They also have to take into account whether the proposed work is based on solid preliminary data, whether it will yield meaningful results even if the scientific hypotheses of the applicants turn out to be wrong and whether the applicants have the necessary expertise and resources to conduct the work.

 

Identifying the grants that fall in the lower half of the rank list is not too difficult, because study section members can easily spot the grants which present a disorganized plan and rationale for the proposed experiments. But it becomes very challenging to discriminate between grants in the top half. Some study section members may think that a grant is outstanding (e.g. belongs in the top 10th percentile) whereas others may think that it is just good (e.g. belongs in the top 33rd percentile). After the study section members review each other’s critiques of the discussed grant, they usually come to a consensus, but everyone is aware of the difficulties of making such assessments. The very nature of research is the unpredictability of its path. It is impossible to make an objective assessment of the impact of a proposed five-year scientific project because a lot can happen during those five years. For example, nowadays one comes across many grant applications that propose to use the CRISPR genome editing tool to genetically modify cells. This technique has only become broadly available during the last 1-2 years and is quite exciting but we also do not know much about potential pitfalls of the approach. Some study section members are bound to be impressed by applicants who want to use this cutting-edge genome editing technique and rank their proposal highly, whereas other study section members may find this approach too premature. Small differences in the subjective assessments of the potential impact between study section members can result in a grant proposal receiving a 10th percentile score versus a 19th percentile score.

 

Ten or fifteen years ago, this difference in the percentile score would not have been too tragic because the NIH was funding more than 30% of the submitted research grant applications, but now the success rate has dropped down to 17%! Therefore, the subjective assessment of whether a grant deserves a 10th percentile versus a 19th percentile research impact score can determine whether or not the grant will be funded. This determination in turn will have a major impact on the personal lives and careers of the graduate students, postdoctoral fellows, research assistants and principal investigators who may depend on the funding of the submitted grant in order to keep their jobs and their laboratory running. It would be reassuring to know that the score assigned to a grant application is at least a good prognostic indicator of how much of a scientific impact the proposed research will have. It never feels good to deny research funding to a laboratory, but we also have a duty to fund the best research. If there were indeed a clear association between grant score and future impact, one could at least take solace in the fact that grant applications which received poor scores would not really have resulted in meaningful research.

 

A recent paper published in Circulation Research, a major cardiovascular research journal, challenges the assumption that the scores a grant application receives can reliably predict the future impact of the research. In the study “Percentile Ranking and Citation Impact of a Large Cohort of NHLBI-Funded Cardiovascular R01 Grants” by Danthi and colleagues, researchers at the National Heart Lung and Blood Institute (NHLBI) reviewed the percentile ranking scores of 1,492 R01 grant applications assigned to the NHLBI as well as the productivity of the funded grants. They assessed grants funded 2001-2008 and the scientific publications ensuing as a result of the funding. Their basic finding is that there is no obvious correlation between the percentile score and the scientific impact, as assessed by the number of publications as well as the number of citations each publication received. The funded R01 grant applications were divided into three categories: Category 1 = <10.0% (i.e. the cream of the crop); Category 2 = 10.0 – 19.9% (i.e. pretty darn good) and Category 3 = 20.0 – 41.8% (good but not stellar). The median number of publications was 8.0 for Category 1, 8.0 for Category 2 and 8.5 for Category 3. This means that even though Category 3 grants were deemed to be of significantly worse quality or impact than Category 1 applications, they resulted in just as many scientific publications. But what about the quality of the publications? Did the poorly scored Category 3 grant applications fund research that was of little impact? No, the scientific impact as assessed by citations of the published papers was the same no matter how the grant applications had been ranked. In fact, the poorly scored grants (Category 3 grants) received less funding but still produced the same number of publications and citations of the published research as their highly scored Category 1 counterparts.

 

There are a few important limitations to this study. The scientific impact was measured as number of publications and number of citations, which are notoriously poor measures of impact. For example, a controversial paper may be refuted but if it is frequently cited in the context of the refutation, it would be considered “high impact”. Another limitation was the assessment of shared funding. In each category, the median number of grants acknowledged in a paper was 2.5. Because a single paper often involves the collaboration of multiple scientists, the collaborative papers routinely acknowledge all the research funding which contributed to the publication. In order to correct for this, the study adjusted the counts for publications and citations by dividing by the number of acknowledged grants. For example, if a paper acknowledged three grants and garnered 30 citations, each grant would be credited with only a third of a publication (0.3333…) and with 10 citations. This is a rather crude method because it does not take into account that some papers are primarily funded by one grant and other grants may have just provided minor support. It is also not clear from the methodology how the study accounted for funding from other government agencies (such as other NIH institutes or funding from the Department of Veterans Affairs). However, it is noteworthy that when they analyzed the papers that were only funded by one grant, they still found no difference in the productivity of the three categories of percentile scores. The current study only focused on NHLBI grants (cardiovascular, lung and blood research) so it is not clear whether these findings can be generalized to all NIH grants. A fascinating question that was also not addressed by the study is why the Category 3 grants received the lower score. Did the study section reviewers feel that the applicants were proposing research that was too high-risk? Were the grant applicants unable to formulate their ideas in a cogent fashion? Performing such analyses would require reviewing the study sections’ summary statements for each grant but this cumbersome analysis would be helpful in understanding how we can reform the grant review process.
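
To make the shared-funding adjustment concrete, here is a minimal Python sketch of equal fractional crediting. It is an illustration under stated assumptions, not the code used by Danthi and colleagues: the function name `fractional_credit`, the grant labels and the second paper are hypothetical. Only the first paper (three acknowledged grants, 30 citations) mirrors the example in the text.

```python
from collections import defaultdict


def fractional_credit(papers):
    """Split each paper's publication and citation counts equally
    among all grants it acknowledges (hypothetical helper)."""
    pubs = defaultdict(float)
    cites = defaultdict(float)
    for paper in papers:
        grants = paper["grants"]
        n = len(grants)
        for grant in grants:
            pubs[grant] += 1 / n                    # a 1/n share of the publication
            cites[grant] += paper["citations"] / n  # a 1/n share of the citations
    return dict(pubs), dict(cites)


# Hypothetical input: grant labels and the second paper are made up.
papers = [
    {"grants": ["grant_A", "grant_B", "grant_C"], "citations": 30},
    {"grants": ["grant_A"], "citations": 12},
]

pubs, cites = fractional_credit(papers)
for grant in sorted(pubs):
    print(grant, round(pubs[grant], 4), round(cites[grant], 1))
```

In this sketch, grant_B is credited with a third of a publication and 10 citations from the shared paper, exactly as in the example above. The equal split is also where the crudeness comes from: a grant that paid for most of the work receives the same share as one that provided minor support.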

 

The results of this study are sobering because they remind us of how bad we are at predicting the future impact of research when we review grant applications. The other important take-home message is that we are currently losing out on quite a bit of important research because the NIH does not receive adequate funding. Back in the years 2001-2008, it was still possible to receive grant funding for grants in Category 3 (percentile ranking 20.0 – 41.8%). However, the NIH budget has remained more or less flat or even suffered major cuts (for example during the sequester) despite the fact that the cost of biomedical research continues to rise and many more investigators are now submitting grant applications to sustain their research laboratories. In the current funding environment, the majority of the Category 3 grants would not be funded despite the fact that they were just as productive as Category 1 grants. If the current low level of NIH funding is maintained, many laboratories will not receive the critical funding they need to conduct cutting-edge biomedical research, some of which could have far greater impact than the research conducted by researchers receiving high scores.

 

Going forward, we need to devise new ways of assessing the quality of research grants to identify the most meritorious grant applications, but we also need to recognize that the NIH is in dire need of a major increase in its annual budget.

Narasimhan Danthi, Colin O Wu, Peibei Shi, & Michael S Lauer (2014). Percentile Ranking and Citation Impact of a Large Cohort of NHLBI-Funded Cardiovascular R01 Grants. Circulation Research.


16 Responses to “NIH Grant Scores Are Poor Predictors Of Scientific Impact”

  1. Jim Woodgett

    An interesting study that highlights what we all know: that research is unpredictable. Many great ideas don't work out and as funding becomes tighter, we become more conservative and reduce risk. The study also points out that we are just as lousy at measuring impact, since the available metrics each have significant caveats (which likely greatly exceed the margins of study section scoring). Yet, we and the funders (namely the public) need to demonstrate that decisions as to what is funded are made rationally. This is no longer the case and this realization is coming back to bite us in our justifications to funders for more funding. This was not such a problem when funding lines were around 25-30%. That allowed for sufficient wobble and incorporated the inherent uncertainty and errors in evaluating science. But as the cut-off drops, the decisions become influenced by biases and feed-back effects. The simple fact is that most vigorous research does not have predictable outcomes and that applying inappropriate precision to something that is, by definition, uncertain is a fool's game.

    So what to do? We've passed the point of futility in terms of the reliability of current processes which simply are not competent for stratification below 20% success. Adding more money to the system would help but is politically unlikely and would only delay the inevitable if other aspects of the research enterprise remained the same. So we either improve efficiencies of the current spend and/or we reduce demand by limiting applications in some way (there are many). Efficiencies might derive from far greater transparency in what resources an applicant reports, reduced costs of wasted review, changes in calculation of full costs (direct/indirect), etc. Since this is not a problem limited to any one country or agency, I'd think there could be a better response than continuing reductions in funding lines....

    • Jalees Rehman

      Dear Jim,

      Thank you for your comment. I agree with you that we need a more efficient system, especially when research funding is so tight. But I also do not think that we should discount the possibility of increasing government funding for research. My hope is that as scientists become more involved in outreach, they will be able to convince the public how important it is for any society to sustain and grow an innovative research program. Writing letters to Congress or participating in advisory panels is not enough. If voters demand increased funding for research, the political leaders will eventually have to agree.

      Jalees

  2. Jalees Rehman

    Drug Monkey (Twitter: @drugmonkeyblog and Blog: DrugMonkey) pointed out on Twitter that Jeremy Berg (the former director of the NIGMS) reached similar conclusions when analyzing the impact of grant applications funded by the NIGMS. This is a good point, which is why I am providing a link to Dr. Berg's analysis.

    I think that one difference between the NIGMS analysis and the recent NHLBI study is that the latter tracked impact by looking at the citations each published paper received, whereas the NIGMS analysis assessed impact by measuring the median impact factor of the journals in which the funded studies were published.

    Neither the median impact factor nor the citation count of each individual article is a perfect marker of "scientific impact", but citation scores for individual articles are probably a bit more accurate than just measuring the median impact factor of the journals they were published in. Either way, both studies highlight that our current peer review system for grant applications is deeply flawed.

    • Jalees Rehman

      Thank you very much for these two links! Your analysis is fascinating, and I am glad that I have now read the extensive citation analysis you performed. The difference between new grants and competitive renewals is also interesting. The NHLBI study focused on new R01 grants and could therefore not uncover the relationship between number of publications and percentile scores of competitive renewals that you found.

  3. Margalit

    As a member of a study section, I agree with the report - it is not possible to objectively distinguish the top 10% from the top 20% - I try not to be conservative and to reward creativity and novelty, but there are always risks. There are now NIH grants specifically focusing on high-risk original research (EUREKA and others) - I think percentiling young people and risky research separately is a way to guarantee that not only the same old stuff gets funded!

    The one point you mention I disagree with is whether more funding to NIH will help. When I started in the early 90s, we had a similar situation - my first grant had a 14%ile and didn't get funded. Then the NIH budget was doubled, and only during that time was it possible to get 25%ile grants funded. But this led to a large increase in research space acquisitions and a building boom by major universities, so now the same $$$ are available for many more researchers. In addition, when I started, I had to bring in 18% of my salary, now it's 50%, and for many it's 95%. Universities have pushed the responsibility to NIH to pay faculty salary, despite tenure. In addition, there are now many more >65 yo R01 scientists, so getting rid of mandatory retirement has caused another problem. So, if the NIH budget gets increased, the funding rate won't change - it will again get into this spiral. What NIH needs to do is disallow too many grants by a single investigator (e.g. forcing a minimal 25% effort per R01 grant), and to disallow more than 50% salary on grants. Otherwise, increasing NIH spending will just result in another building and hiring boom.

    • Jalees Rehman

      Thank you for this comment. You are correct, just pouring more money into this system will not improve our ability to identify and fund the most innovative and creative projects. Restrictions on the maximal funding a single PI can receive and a special emphasis on funding cutting-edge, high-risk projects are also needed. If universities contributed more to the salaries of the faculty, there would be less pressure for every investigator to submit so many proposals and more intellectual effort could be spent on doing science rather than writing grants.

      But I do think that there is also a need for some increase in NIH funding. As you pointed out, the doubling of the NIH budget back in the 1990s created a research boom, and this research boom led to the training of so many highly qualified scientists who now need resources and decent career opportunities to practice the kind of science they were trained for.

      How about coupling some increase in NIH budget with a (partial) reform of the grant system, such as restricting the total amount of grants a single investigator can hold, mandating that universities contribute more to salaries, placing more emphasis on out-of-the-box thinking when grants are reviewed, etc?

  4. Val

    I agree with the suggestions here, and will add that we could fund more projects by cutting indirect costs, as well.

    I decided to look at how much is given out in indirect costs. I looked at the sample funded applications on NIAID's web pages (http://www.niaid.nih.gov/researchfunding/grant/pages/appsamples.aspx). Using information in the applications and information on NIH's Project Reporter site, I discovered that an amount equal to more than half of the direct costs was added on to the funded grants as indirect costs.

    The NIAID page has 9 funded applications asking for $6.38 million in direct costs. The total indirects were $3.32 million (based on either the rate published in the application or calculated from the Project Reporter site). The lowest indirect rate I found was 46%, and the highest was 94.5%. These numbers seem high.

    If indirect costs had only been 20%, there would have been $2 million left over to fund more research. Assuming a 20% indirect rate, this money could have funded one more R01, one more R21, and still have money left over. Out of just those nine grants alone! It makes me wonder about how the payline would shift if indirect costs were cut.

    It isn't clear to me why a university needs more than half the direct costs of a grant to provide administrative support for it. In fact, when I searched for the question "How do universities use indirect costs?", I found all sorts of uses, not all of which were even indirectly related to the project. These included student services (admissions and student health services) and general university administration (as distinct from sponsored projects administration and departmental administration). Even ignoring these uses, the rates seem excessive to me.

  5. Jeremy Berg

    Indirect cost rates are negotiated between universities and the federal government according to OMB Circular A-21 (http://www.whitehouse.gov/omb/circulars_a021_2004). The rules are quite clearly defined and the negotiations occur every four years and involve large amounts of data about space, etc.

    Across all institutions, indirect cost rates average around 50%, that is, for every $1 of direct costs, the institution receives an additional $0.50, but rates can go as high as 90% or so. The administrative portion of indirect cost rates has been capped at 16% for more than a decade. The major reason for the difference is facilities costs, particularly debt service on funds borrowed to build research buildings. The intent of this policy was to aid research institutions in updating facilities. However, this policy has motivated institutions to build more space with the "business plan" that they will fill the building with researchers who will bring in sufficient grants that the indirect costs will cover the costs for the building.

    The only thing that individual funding agencies can do is pay grants as total costs (as NSF does) so that the investigator and the institution have to divide up the award (up to the maximum allowed indirect cost rate). NIH occasionally caps grants according to total costs so that applications from institutions with high indirect cost rates get lower direct costs.

    My sense is that the percentage of the total NIH budget going to indirect costs over the past decade has been essentially constant, but I requested more detailed data and, remarkably, they do not appear to be readily available.

    • Jalees Rehman

      Thank you for your comments on indirect costs and pointing out the cap on the administrative portion of the indirect costs.

      One concern that I have is the fact that indirect costs are simply applied as a standard percentage to all grants without taking into account the project specific needs. It makes sense that universities need to service the debt for building new facilities, but should every researcher on campus with an NIH grant have to contribute to this debt from money assigned to their specific project? I think that at most institutions, universities charge the same rate of indirect costs to every grant, independent of whether the investigator's lab is in a brand-new building or an old building.

      If these indirect costs are provided by the NIH to help the universities facilitate the research of a given project, shouldn't the indirect costs be tailored according to the project? When it comes to direct costs, investigators are not supposed to use the awarded funds to pay for research projects of investigators working on a completely different area of research. Why should indirect costs be used for costs that are unrelated to the proposed research? It would make sense for the universities to provide a breakdown for the indirect costs, i.e. Investigator A will need X square feet of lab space, Y amount for admin support, Z funds for utilities, etc. and this is why we are adding on this amount of indirect costs to her R01 grant.

  6. Jeremy Berg

    The averaging of costs with the associated reduction in administrative burden is the guiding principle of the indirect cost system. The indirect cost rate is established based on an average of the costs over the entire institution. The actual cost of doing research in a new building with associated debt service is certainly higher than the cost of doing research in a fully paid for older building, but the rate reflects the weighted average of all types of space. The rate also assumes a particular level of research activity. The idea is to avoid having to quantify how much and what space and other indirect research expenses are associated with each project (which would represent a considerable administrative burden for investigators and administrators) but have the total costs come out right in the end. You are correct that the indirect cost rate is generally the same for all research space although many institutions have "on-campus" and "off-campus" rates. Further, many have pointed out that the indirect costs for the second or third grant for a lab should be lower since the space, lights, etc. are already paid for. Again, however, this is taken care of in the averaging process. Once the rate is established, the institution (and its faculty) takes the risk that the amount of research support obtained will meet or beat the level assumed during the negotiation. If the support is too low (not enough grants) then the institution will not have the assumed indirect costs covered. If the faculty of the institution are unusually successful (over the projected level) then the institution can recover more than the expected amount.

    I want to be clear. I am not defending the indirect cost system as flawless. I think there are perverse incentives that should be examined. But I did want to point out that it was a rationally designed process that was intended to balance the needs and goals of the government, the institutions, and investigators.

  7. Val

    I see your point, especially as it relates to historical reasons for high indirect rates.

    At the same time, it may be time to re-examine current practices. Our universities seem to have been embroiled in a hiring and construction arms race since the big funding increases in the 1990s. More buildings! More researchers! Maybe it's time to slow down. Maybe it's time to ask if all this quantity is giving us the boost in output that we thought it would.

    I think it isn't. When people have to spend more time seeking funding, they spend less time thinking about new ideas (much less trying them out). And with so many applications to consider, risk aversion becomes a natural outcome of the review process --- not because the reviewers set out that way, but because everyone has to like an application a lot in order for it to get a fundable score. Not everyone will like a given quirky, risky idea.

    When we (rightfully) bemoan lack of funding opportunities and the risk-averse review environment, we need to consider these ideas. We have a very serious problem right now, and I suspect that addressing it in a meaningful way will require significant changes in the system's fundamental structure.

    Ultimately, IMO, the goal has to be to get back to funding roughly a third of applications. At that point, the PIs will spend more time doing research and some of the quirky stuff will get funded --- meaning we won't have to design special programs like EUREKA.
