A Four Part Series on Open Notebook Science (Part 4)

In the previous article, I debated with Dr. Jean-Claude Bradley about different approaches to open notebook science. One approach is an open notebook model that relies upon voluntary participation. However, this model has been shown to have limited success in terms of the number of participants who agree to volunteer. In this final article of my four-part series on open notebook science, I would like to issue a call to action and encourage readers to participate by signing a petition for scientists to open their lab notebooks to the world. Specifically, I am seeking the creation of a national digital repository that would reside at the United States Patent and Trademark Office (USPTO). Under this petition, voluntary contribution would be acceptable, but the USPTO would need to spend money to create and maintain such a database, including hardware and software acquisition or cloud storage as well as employee labor to manage it. If participation were low, the cost of implementing such a program would be unwarranted. Given these facts, a mandatory model should be explored, at least when it comes to notebooks for research that results in patents.

In the course of my research, I consulted with some top legal firms specializing in U.S. patent law. I spent several years looking into a variety of possible models and arrived at one that I believe might work and could fund itself. It is to propose a legislative mandate, to accompany the new open data laws, requiring the legal deposit with the national patent office of digitized or “born digital” electronic laboratory notebooks generated as part of government-funded research projects that led to a patent application.

Science and Religion: Where the Future and Past Meet

From ancient times, "treasures" and "records" were "safely stored and hidden from the strangers' gaze" in sacred temples and churches; in modern times, the "architecture of any archives buildings [and museums] deliberately embodies imagery of temples and shrines" where a feeling of entering a precious "sanctum" is impressed on visitors. Many are convinced that the treasures contained within are so valuable that this justifies their "monopoliz[ation]," limiting access to individuals who are initiated into specific practices and who possess the foundational understanding needed for their care and correct interpretation (Archives Power: Memory, Accountability, and Social Justice by Randall C. Jimerson, pp. 3-4). Over time, these caretakers have been designated "high priests," "archivists," and "curators." The gradual opening to the public of the materials housed in archives and museums allowed greater social justice through access to the world's most valuable treasures and the knowledge they imparted. In today's modern world, where faith is increasingly placed, not on religious doctrine, but on innovations in science and technology, the open science movement hopes to shed light upon the inner sanctum of the scientist's laboratory and to create digital fortresses for the preservation of its contents. These fortresses of science cyberinfrastructure, like libraries, museums, and archives that opened to the public, must also enable safe and reliable public access to materials. The open science movement has gained such traction that, just this week, in an unprecedented move, publishers will be giving UK public libraries free access to subscription-based academic STEM research articles under a new two-year trial initiative. In addition to published materials, preserving and archiving original materials and data generated in the process of scientific and medical research, especially in the life sciences but applicable to all scientific disciplines, is more important now than ever. The journal Nature alone published more than 19 pieces on the "Challenges in Irreproducible Research" between 2012 and 2014.

Part of the problem that the ancients foresaw was that knowledge in the wrong hands can call fundamental beliefs and premises into question, which could result in grave financial, political, and social turmoil. So too, when a new paradigm or regime replaced an old one, archives, books, and statues that could not be integrated into it were defaced or destroyed. In the history of science, the concept of "paradigm shift" was popularized by Thomas S. Kuhn in his book The Structure of Scientific Revolutions, which turned the field of the history and philosophy of science on its head in the 1960s. The underlying premise is that anomalies and errors found in data may change our understanding of the world and improve the scientific process. What might be called the reproducibility crisis is forcing some serious re-evaluations within the scientific community that may lead to such a paradigm shift in the sciences. This may take the form of a new branch of the sciences, the "computational approach," as I intimated in earlier articles on open data, "What is E-science?" and "Open Data Tools," where "I showed how E-science and ‘big data’ fit into the philosophy of science though a paradigm shift as a trilogy of approaches: deductive, empirical, and computational, which was pointed out, provides a logical extenuation of Robert Boyle's tradition of scientific inquiry involving 'skepticism, transparency, and reproducibility for independent verification' to the computational age."

Donald Berry stresses the importance of writing things down to remember when it comes to enabling reproducibility. Scientists traditionally have used laboratory notebooks to record all of their thoughts about a particular experiment that they might otherwise forget. (Screenshot credit: President’s Council of Advisors on Science and Technology (PCAST)).

One of the controversial aspects of archiving is maintaining archives of grassroots activism, which espouse a particular political or social viewpoint, for example, human rights, social justice (civil rights, anti-discrimination), or labor movements. Documenting and preserving these materials may be viewed by some as a form of political activism wherein archives form "a framework of shared cultural understanding." There are special member-only discussion lists (such as the Society of American Archivists' "SAA Archivists & Archives of Color Roundtable Discussion List," "SAA Human Rights Archives Roundtable Discussion List," and "SAA Labor Archives Roundtable Steering Committee List") as well as an open Yahoo! Groups email discussion list ("Progressive Archivists") devoted explicitly to the role of "activist archivists" and others interested in "social responsibility in the context of the archival profession." Other organizations, like the United Nations Educational, Scientific and Cultural Organization (UNESCO) and the International Council on Archives (ICA), are also actively engaged in these issues. The act of preserving knowledge, in all its forms, is an effort to promote the historical significance of the events surrounding a major change. Joan M. Schwartz and Terry Cook point out in "Archives, Records, and Power: The Making of Modern Memory" that:

archives--as institutions--wield power over the administrative, legal and fiscal accountability of governments, corporations, and individuals, and engage in powerful public policy debates around the right to know, freedom of information, protection of privacy, copyright and intellectual property, and protocols for electronic commerce. Archives--as records--wield power over the shape and direction of historical scholarship, collective memory, and national identity, over how we know ourselves as individuals, groups, and societies....[the] underlying nature, theoretical assumptions, practical applications, historical evolution, and consequences for users... [demonstrate the] "power of the archive."

Simply put, what is preserved versus what is lost to history constitutes both "memory" and "identity."


Video: “Reproducibility of Scientific Results, A recorded presentation,” from the 4th EQUATOR Annual Lecture on "Reporting and Reproducible Research: Salvaging the Self-correction Principle of Science." According to John Ioannidis, we are losing the equivalent of a Library of Alexandria worth of materials every day. Ioannidis correctly argues that published articles are "advertisements," not the "scientific record." An article is just "a small piece that advertises the scientific record," and that record constitutes the entirety of the scientific research pipeline. Many improper findings, he says, fall into three categories: "publication bias, selective reporting bias, and fabrication bias." Ioannidis also points out the economic impacts of irreproducibility for one stakeholder group I mentioned earlier in this series, industry and corporate stockholders. Simply put, irreproducibility increases risk for this group. That gap can be filled by tools that technology creates and science policy facilitates, which improve reproducibility and thereby incentivize and speed innovation.
(Video credit: John Ioannidis, the C. F. Rehnborg Professor in Disease Prevention in the School of Medicine and Professor of Health Research and Policy and, by Courtesy, of Statistics, Stanford School of Medicine, Stanford University).


The underlying issues of irreproducibility, as discussed in Nature and described by its Editor-in-Chief, and as put forth in a recent article by the National Institutes of Health (NIH). (Screenshot credit: President’s Council of Advisors on Science and Technology (PCAST)).

In a January 24, 2014 article published in the journal Nature, "Policy: NIH plans to enhance reproducibility," the National Institutes of Health (NIH)--the same organization that led the way in federally-funded open access research mentioned earlier in this series--announced that "A growing chorus of concern, from scientists and laypeople, contends that the complex system for ensuring the reproducibility of biomedical research is failing and is in need of restructuring. As leaders of the US National Institutes of Health (NIH), we share this concern... [because] the recent evidence showing the irreproducibility of significant numbers of biomedical-research publications demands immediate and substantive action." The article suggests that scientific misconduct and fraud constitute only a very small fraction of irreproducibility cases and that most failures of reproducibility are due to poor training and failure to report necessary details of the experiment's design such as "blinding, randomization, replication, sample-size calculation and the effect of sex differences." Efforts to ameliorate a "credibility crisis" or "crisis of faith" across a variety of scientific disciplines need to be undertaken swiftly and effectively to assure funding agencies, companies, and the public that the entire research pipeline will produce credible and safe scientific and health findings. Failures, like "Climate Gate," create molehills upon which mountains are built by anti-scientific and ultra-religious factions, disrupting the political and social policies that govern progress. The NIH article does point out that there are significant steps taken before a product comes to market, but these steps unnecessarily endanger the welfare of test animals and subjects of clinical studies if there are fundamental and correctable errors in the underlying bench work. Ensuring reproducibility provides public confidence in the scientific process leading from the bench to the bedside.


Video: “Reproducible Research: Concepts and Ideas.”
(Video credit: Roger Peng, Associate Professor Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health).


In this case, the revolutionary grass-roots driven change (see earlier petition on the "We the People" website asking for openness in publishing) is one of greater openness, transparency, and accountability in the scientific and medical publishing industry by exposing a greater amount of data about the scientific process and re-invigorating the paramount principle of modern science, that of the reproducibility of research.

In part one of this series I stated that reproducibility, not peer review, is the gold standard in science. Just to clarify, as Marcia McNutt, Editor-in-Chief of the journal Science, pointed out at a January 31st PCAST meeting devoted to the topic, there is "a spectrum of reproducibility." First there is repeatability, the minimum standard, which entails accessing data; then there is replication, the "gold standard," which entails starting completely from scratch and redoing an experiment. Eric Lander followed suit, arguing that developing a taxonomy of problems in reproducibility is important, for example, distinguishing between "irreproducibility" and "non-generalizability." McNutt also pointed out how journal articles often fail to capture the "tacit knowledge" of scientists whose physical expertise in performing their job resembles that of a master artisan. Because it is not captured, it cannot be scrutinized. The lab experimenter in this regard is like the astronomical observer, where both have biases and tacit knowledge regarding experimental protocols that can reinforce preconceived notions.

This concept is known to historians of astronomy as the personal equation. The personal equation was notably discussed in relation to chronographs as early as 1900 by Walter E. Maunder in The Royal Observatory Greenwich: A Glance at its History and Work (London: The Religious Tract Society, p. 177) and then in great depth by Simon Schaffer, “Astronomers Mark Time: Discipline and the Personal Equation,” Science in Context, 1988, 2:115-145, on pp. 122, 124, 127, 129-138. In short, the modern "personal equation" is the expression of tacit knowledge that emanates from persistent, sometimes deliberate and sometimes unintentional, habits of individuals who are visualizing, interpreting, and documenting observations. McNutt pointed out that when an outside group visits a lab, an obvious "AHA" moment occurs relating to lab procedures. But could not this independent observation of lab protocols also occur by reading lab notebooks? The times are changing, Lander noted. Senior scientists "grew up in a very different world where the data is in their lab notebook," not in big data sets. They are unprepared for problems that arise when software generates different results on different platform architectures, although reproducibility may one day be achieved through a "persistent piece of software or environment," such as computer emulation. Ultimately, though, as Glenn Begley pointed out, what matters is whether study results are "robust" enough for "discovering drugs to help people" and whether they avoid the "scientific exaggeration" that makes claims about "curing" a disease simply to obtain publication.

Marcia McNutt, Editor-in-Chief of Science, discusses the variability of lab protocols when it comes to tacit knowledge. (Screenshot credit: President’s Council of Advisors on Science and Technology (PCAST)).

This change, to demand more from publicized research claims, is occurring now due in part to the deluge of data and the computational capabilities of software and hardware advances in modern computer technology, as well as science policies for greater openness and ease of access that are rapidly, almost virally, spreading around the globe. This data can be used for reproducibility of research. By making the products and processes of science publicly available, science becomes more participatory and collaborative, and the rise of crowdsourced “citizen science” projects is just one example of how this is occurring. Another rising trend is post-publication review using social media. Citizens and students are interested in learning more about STEM fields: how they work, and what learning opportunities exist from freely available educational resources and tools. Education and training are key to data-driven science reform. By opening lab notebooks, researchers increase transparency about what they did and did not do. Notebooks can be used to evaluate study design, to see omitted results, and to avoid publication of only a statistically significant result when the same study produced many other bad results. Notebooks can also be studied and reviewed to evaluate protocols and teach best practices for applying the scientific method--ones that avoid selection bias and improper data analysis where statistics are inappropriately applied.

Everyone seems to have a stake in ensuring that this training, and the science publications that promise benefits to society, are error-free, can be validated, and are reproducible. In short, computational reproducibility guarantees a model of "best practices" in an era of "Big Data" by offering up information for greater scrutiny and skepticism by peers and an informed public.

In the aforementioned Nature article, the NIH stated that reproducibility is a "community responsibility...clearly, reproducibility is not a problem that the NIH can tackle alone. Consequently, we are reaching out broadly to the research community, scientific publishers, universities, industry, professional organizations, patient-advocacy groups and other stakeholders to take the steps necessary to reset the self-corrective process of scientific inquiry." It is important to remember that even with the 2013 OSTP open data mandates (based on the NIH model), data missing from the processed data sets submitted to data repositories may still lead to misinterpretations and faulty conclusions. Lab notebooks could correct these, especially if the notebooks contain calculations, notes on errors, and other contextual information, in cases where raw data is not available, was part of a longitudinal study that is not repeatable, or is so large as to be beyond the computational resources (i.e., supercomputers and petabytes of data) of those attempting to verify results. If taxpayers funded research leading to a patent, then all the records, including lab notebooks, should be made public.

Glenn Begley stresses the relationship between reproducibility, patents, and speed to innovation. (Screenshot credit: President’s Council of Advisors on Science and Technology (PCAST)).

In part one (January 1, 2014) and part two (January 10, 2014) of this series I mentioned a variety of stakeholder groups that included industry and corporate investors like shareholders and venture capitalists. Glenn Begley, the former head of the oncology group at Amgen (he was the individual at this company who did the study I cited in part one of this series), was called to talk about his findings before the President's Council of Advisors on Science and Technology (PCAST) in a session called "Improving Scientific Reproducibility in an Age of International Competition and Big Data" held on January 31, 2014. In his statement, Begley explicitly linked improved reproducibility to improved innovation. In particular, he argued that using patents that were verified as reproducible would not only save research institutions money, but would speed venture capital investment through the reduction of risk. He said:

Institutions have to my mind been tardy in beginning to address this problem. I think that there is a significant advantage to them were they to do so. They could save money first by the patents that they file, many of which will not stand the test of time. That immediately would save them money. In addition, if an institution could put a stamp of approval on a particular patent, then I know that the venture capitalists and those that would be willing to take that forward in terms of additional discovery and turning it into a drug would be much more confident that they had been independently replicated. I immediately see value in terms of the institutions addressing this.

Philip Campbell stresses to PCAST the importance of access to data AND lab notebooks. (Screenshot credit: President’s Council of Advisors on Science and Technology (PCAST)).

A Call to Action:
The USPTO Has a Constitutional Responsibility to Ensure Long-term Data Preservation Related to U.S. Innovation

For this final article in the series on Open Notebook Science, I took the notion of the "activist archivist" to The White House by creating a "We the People" petition. On January 20th, I also submitted a correspondence to the scientific journal Nature; it was accepted for publication on January 23rd under the title "Open up access to lab notebooks," and I have been told it is "currently scheduled for 13 February issue," pending space limitations. The US, the UK, and the EU are all formulating open access and open data public policies, hoping to choose the best combination of solutions, engaging in a trial by fire of sorts, and refining resolutions as problems inevitably creep in. As policies emerge, the gaps in science preservation become more obvious, such as the lack of openness of laboratory notebooks.

Petition on Laboratory Notebooks. (Screenshot credit: The White House).

The petition was created on January 17, 2014, and the text, which was limited to 800 characters including spaces, reads:

WE PETITION THE OBAMA ADMINISTRATION TO: Mandate Open Access to Digital Copies of Lab Notebooks Created Through Publicly Funded Research Leading to a US Patent. Access to notebooks improves the processes of patenting, inventing & preserving U.S. scientific & medical history. In 2013, OSTP mandated open access for federally funded research articles & data, but excluded notebooks. This petition requests expansion of the mandate. When federally funded research results in a (provisional) patent application, a digital copy of searchable, full-text notebooks should be required. Why? Without notebooks, recent studies were unable to reproduce journal findings, resulting in serious economic & health implications for products & processes. Notebooks are evidence in patent litigation, so funding USPTO storage prevents fraud. After the life of the patent, notebooks should become public domain with an exclusion allowing transfer of classified materials to NARA.

(The length limitations forced me to omit the fact that materials facing export limitations would be treated differently).

John Holdren, Director of the White House OSTP and Co-Chair of PCAST, speaking at the latest PCAST meeting on "scientific reproducibility in an age of international competition and big data." (Screenshot credit: President’s Council of Advisors on Science and Technology (PCAST)).

Because the petition was so limited in length, I would like to explain my thoughts in more detail.

Background: Recent studies have called into question the ability to reproduce scientific findings published in scientific and medical journals. The published articles provide a summary of the objectives, the methodology (including the protocols followed), and a brief description of the findings. However, they usually do not provide enough material to reproduce those results. Recently, the United States Office of Science and Technology Policy (OSTP) mandated open access to federally funded research articles and their associated data. Given the limitations of "cleaned data sets," the lack of metadata about these data sets, and the general explosion of data occurring in an age of "big data," providing snippets of underlying data is not enough. There will be greater and greater scrutiny, and more problems will arise, as this data becomes publicly available and is re-used. There need to be additional qualitative descriptions to accompany the quantitative data, explaining how the data was generated with enough clarity to actually reproduce the science behind the claims. One way to do this is to provide access to laboratory notebooks. These scientific and medical claims are relevant to Americans because they affect the health and well-being of the public, as well as the economic prosperity of the country as corporations and small businesses invest their money to produce and sell products based on these findings. (To read more about the background of this debate, please see part one of this series).

Scope: This petition pertains only to individuals and organizations that receive government funds for their research and then use that research to file for a patent or provisional patent. The petition would *NOT* include the following: privately funded research, research conducted outside the United States, and research that did not result in filing for a US patent or provisional patent. Nor would it prohibit individuals from submitting their data and notebooks elsewhere in addition to the United States Patent and Trademark Office (USPTO).
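
For readers who think in code, the scope described above can be sketched as a simple eligibility check. This is only an illustration of the scope as stated here, not language from the petition or from any USPTO system. The names `Filing` and `notebook_deposit_required` are hypothetical, and cases the petition does not address (such as projects with mixed public and private funding) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filing:
    """Hypothetical record describing a research project and its patent filing."""
    federally_funded: bool   # the research received US government funds
    conducted_in_us: bool    # the research was conducted in the United States
    filed_us_patent: bool    # a US patent or provisional patent was filed


def notebook_deposit_required(filing: Filing) -> bool:
    """Return True only when every condition in the Scope paragraph holds.

    Privately funded research, research conducted outside the United States,
    and research that never led to a US filing all fall outside the mandate.
    """
    return (
        filing.federally_funded
        and filing.conducted_in_us
        and filing.filed_us_patent
    )
```

Under this sketch, a federally funded US project that files a provisional patent would be covered, while the same project with private funding would not. Nothing in the check prevents a researcher from depositing notebooks elsewhere as well.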

Aims: This petition requests a mandate that, during the process of filing for a provisional and/or actual patent, filers also submit a digital copy of their associated laboratory notebooks to the USPTO as part of the filing process. It is understood that an additional small fee may be required to help pay for the costs associated with the processing as well as the long-term digital storage and preservation of these documents. It is understood that the USPTO would also be required to create a database for such an endeavor, similar to its existing patent database, which would allow access to and search of (and across) the collection of laboratory notebooks. At the discretion of the USPTO, the digital copies of the notebooks that are submitted may be redacted by the filing entity to exclude any material contained therein not pertinent to the patent and the reproducibility of the science behind it. The long-term preservation, accessibility, and searchability of such notebooks, which have historically been used in patent litigation in the United States, is an important step toward improving the patent process and the legal process of patent litigation, improving the reliability and validity of scientific endeavor and inventorship through reproducibility of research findings, and preserving the scientific and medical history of the United States for the long term.

1) It is requested that all individuals who relied upon federal funding for scientific and medical research and who subsequently file for a patent or provisional patent in the US be mandated by the OSTP to submit their associated laboratory notebooks to the USPTO along with their filing.
2) It is requested that the OSTP include in this mandate the right of individuals who file a petition relating to a patent to be given access to the laboratory notebooks on file with the USPTO related to that patent.
3) It is requested that the OSTP mandate that laboratory notebooks (as full-text, open access digital documents) become part of the public domain upon the conclusion of the life of the patent.
4) It is requested that the OSTP include in the mandate that patent filers who wish to make their notebooks open access to the public prior to the conclusion of the life of the patent be allowed to do so at the USPTO and/or in any repository of their choice.
5) It is requested that any laboratory notebooks that contain information that is sensitive or classified for national security reasons be omitted from the portion of the mandate which requires public access at the end of the life of the patent. It is therefore requested that the OSTP mandate that "national security" be noted when filing such laboratory notebooks with the USPTO and, after review and agreement by an internal review board, that very limited access be given to these documents and that they be released under FOIA (with redactions as needed) just as similarly classified materials are handled by other agencies, such as the National Archives. If necessary, it is requested that such laboratory notebooks be transferred to the National Archives should the added expense be too much for the USPTO to bear. (For more information, see: Malakoff, D. (4 October 2013). Hey, You've Got to Hide Your Work Away. Science 342(6154), 70-71. doi:10.1126/science.342.6154.70).

Why Should a Lab Notebooks Database be Maintained by the USPTO?

The USPTO makes sense as an open notebook leader due to its historical ties to the usefulness of laboratory notebooks, the continued legal value of laboratory notebooks, and the fee-generating structure (patent applications) already in place to assist with the financial costs of establishing such a repository. Indeed, many of the infrastructural requirements, such as institutional linking of patents with their associated laboratory notebooks as well as existing outsourcing of large-scale data management for similar documents (a national patent database), are already in place.

The USPTO maintains this permanent, interdisciplinary historical record of all US patent applications in order to fulfill objectives outlined in the United States Constitution. “The United States Patent and Trademark Office (USPTO) is the federal agency for granting U.S. patents and registering trademarks.” In particular, the USPTO has a Constitutional mandate to preserve the records of U.S. innovation. Specifically, in maintaining the patent database (and I argue a lab notebooks database), the USPTO acknowledges it is fulfilling the legislative mandate of Article I, Section 8, Clause 8, of the U.S. Constitution ("To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries") such that:

Under this system of protection, American industry has flourished. New products have been invented, new uses for old ones discovered, and employment opportunities created for millions of Americans. The strength and vitality of the U.S. economy depends directly on effective mechanisms that protect new ideas and investments in innovation and creativity. The continued demand for patents and trademarks underscores the ingenuity of American inventors and entrepreneurs. The USPTO is at the cutting edge of the nation's technological progress and achievement.

The USPTO is also unique because it operates solely on fees collected from its users, and not on taxpayer dollars. Given the recent budgetary downsizing in the federal government and the sequestration of funds we saw in 2013, a program of preservation operating outside this sphere of fiscal uncertainty would indeed be a blessing. Patenting through a firm usually costs a minimum of $20,000, averages about $20,000-$40,000, and can sometimes exceed $100,000 if litigation is needed for a patent defense. The USPTO administrative fees of a few hundred dollars are a minuscule portion of the patenting process costs. Of course, in the case of biotech or “big pharma” company patents, for example, millions of dollars are often the expected return. Therefore, a similar fee of a few hundred dollars for the deposit of laboratory notebooks would not constitute a financial barrier to patent filers but would have tremendous value to a number of stakeholders. Specifically, the initial outlay of funds might be freed from the Fee Reserve Fund, and the USPTO could then recoup that outlay through an additional fee for notebook submission. The America Invents Act (AIA) "established the Patent and Trademark Fee Reserve Fund where collections in excess of approved spending levels would be deposited for further use" (p. 25). Therefore, not only is a lab notebooks repository a financially viable proposition for the USPTO, but it is also one that meets its Constitutionally-driven purpose as well as its stated strategic goals. In its strategic plan, the USPTO says it is "engaging stakeholders in validating our quality [patent] findings" (p. 16). The plan states: "The USPTO will continue its work to establish a sustainable funding model that provides us with a reliable and sustainable source of funding. Our operating structure is like a business in that it receives requests for services--applications for patents and trademark registrations--and charges fees projected to cover the cost of performing the services we provide" (p. 25). Specifically, the USPTO claims on its website that “As a fully user-fee funded agency, the USPTO’s requirements are addressed at no cost to the taxpayer.”

In short, I am petitioning for a national electronic historical database managed by the patent office that would accompany the existing database where records of patent applications are presently stored. (An email to the USPTO in mid-September requesting an evaluation of this proposal was rejected by Paul Fucito, and as of this printing no response had been received from him to a follow-up email.) In this way, the agents filing for patents would absorb the expenses associated with preserving these materials. Regulations in some industries, like the pharmaceutical industry, already mandate that notebooks be maintained for the entire time a product is being developed, which can be 15 years or more. The life of a patent is usually 17-20 years. In some cases, the public release of notebooks could be embargoed. Redactions, if needed, could also be done with the legal staff and lead scientist working together. Costs for the new database could be covered through deposit fees. In addition, to protect creators in the event that a previously undiscovered use arises out of a published notebook, legally deposited notebooks should carry some additional legal protections for their creators. In the new system, patent filings would need to include copies of any published works by the creator related to the patent under consideration. If the creators did publish, and their research had been funded under the new open data mandates, their underlying data sets would also have been published and, as data associated with that paper, would be available to the patent office.

Anticipating Some Objections: Completeness, Existing Databases, U.S.-centrism, Copyright Law, and Mandates

While it is true that patents comprise a small fraction of the uses of laboratory data, it is necessary to start somewhere, and given the current economic climate, a project based upon an economically self-sustaining model would be ideal for imitation by others. I believe the USPTO is the most logical choice to begin an open notebook repository. The long-term goal would be for one institution, the USPTO, to take the lead on open notebook mandates, just as the NIH was the first US institution to rigorously undertake open access publishing requirements. Other governmental agencies would follow as collaborative partners, and finally private organizations such as publishers would join in. Such a database might be expanded at a later time, through partner organizations, to include additional notebooks that might better reflect the broader need within the scientific community for reproducibility and replication of results. At present, I am unaware of any other organization (apart from the related NIH efforts described in the previously mentioned article) that is willing to take on the project of a central depository and incur the related expenses of establishing and maintaining a comprehensive database across all the sciences, especially when the depositors may have no profit motive that would enable fee collection to pay for the database.

Notebooks that do not result in patent applications might find homes in new databases, or conjoined “mega databases,” as well as databases established by journal publishers. As big data gains traction, more and more tools will need to be created for specific disciplines and even specific high priority research issues (climate, energy, public health, cancer, aging, and chronic diseases), either within a discipline or spread across disciplines and issues and organized around model organism research. This is exactly why specialized databases are created: to collocate materials, either by holding full copies of data or published works in one location or by creating descriptive surrogates (records with abstracts). Take, for example, notebooks related to model organisms and their related databases. Similar projects for data and literature curation already exist for E. coli (EcoCyc with GenProtEC), A. thaliana (PlantGDB with TAIR), Drosophila (BDGP and EDGP with FlyBase), C. elegans (WormBase with Worm Literature Index), etc. (A helpful list of additional databases can be found here). Centralization helps to eliminate redundancy in tool creation, allows standardization and interoperability, and makes resources easier for the end user to find. It can be hoped, and indeed it is beginning to happen (with Textpresso, for example), that these databases will be conjoined (or “pipelined”) so that fuller examinations and discoveries might emerge from patterns in what are now disjointed text and data. This can be accomplished through the integration of text mining and computational analysis, including data curation and data mining. Joining data together, as I mentioned in an earlier post, also allows for easier data mining and data analysis using machine learning.

While the petition I suggest is clearly U.S.-centric, it certainly would be relevant to scientists and policy makers in other countries. In my proposed approach, individuals employed throughout the world would submit their laboratory notebooks to the depositories of their own countries’ patent offices in the language(s) required by those respective entities.

In terms of copyright, laboratory notebooks do not fall under U. S. copyright law. Instead, they are considered "unpublished materials," like archives, and are not subject to copyright restrictions in the US unless explicitly published by the creator or his or her agency. My petition asks for unpublished materials from an individual or organization to be submitted, and this submission would include the right for the USPTO to publish those materials online at the end of the life of the patent. Arguably, at the point of publication, rights over those notebooks could theoretically belong to the USPTO as a publisher. As publisher, the USPTO would then establish the terms of use; in the petition, I am asking it to publish under a CC license, essentially offering the materials to the public domain. Most government publications (because they are produced with tax dollars in the US) are public domain materials. For example, almost all NASA photos are in the public domain and can be used freely.

Again, I would like to stress the importance of broad compliance by scientists. In my last article in this series I compared voluntary versus mandatory participatory models. A recent article on laboratory notebooks, "NC State professor uncovers problems in lab journal," provides a case example where "a $1 million grant" based on "a prominent Science journal article...was built on a false premise." In that case, a researcher amazingly refused for eight years to provide access to the materials and evidence needed to reproduce his work, including his lab notebooks, leading to a nasty legal battle, while several subsequent studies relied on those results. If the company he was working with had filed any patents, then under the system I suggest in the petition those notebooks would have been available for the NSF and any challengers to see. The aftermath of bad results has a snowball effect: research based on bad results only accumulates until science's "self-correcting" mechanism kicks in. The time, money, and years of scientists' careers that were wasted could have been saved through a mandated policy of open notebook science.


In today's era of computationally driven discovery, the preservation of laboratory notebook data, as a supplement to the methodology section of the scientific article that fills in gaps in data analyses and more fully explains research protocols, presents itself more than ever as a timely and important matter. As this data becomes more readily available, questions will arise surrounding intellectual property rights and remuneration (both academic and financial). By providing a simple and easy way to submit materials and then publishing them on the internet following the life of the patent, the office would serve both content producers, by making their information available online once its financial value has been extracted, and public readers, by making that information accessible. The USPTO would act as an intermediary between content producers and content consumers for the safekeeping of this material and its long-term preservation, i.e., performing the dual role of publisher and digital storage facility. These are not unfamiliar duties for the USPTO, which publishes and stores millions of patent applications and currently offers a full-text, open access, searchable database of patents.

To conclude, adding laboratory notebooks would be one additional measure to ensure reproducibility, an added assurance to industry and consumers that the presented research is not "a blunder" before they make huge investments in paying for rights to utilize that patent. A report published by the highly regarded business consulting firm McKinsey & Company, summarizing its projections of the financial value open data will have in stimulating the economy and affecting a variety of stakeholders, states, “Governments, companies, and individuals will need to understand how to take advantage of open data. All stakeholders—governments, non-profits, businesses, individuals (as consumers and citizens)—have roles to play in maximizing the benefits of open data. Deriving valuable insights from open data will require new rules and procedures and new attitudes as well as investments in technology and capabilities” (p. 12). It is important that laboratory notebooks be included when building the "scalable digital infrastructure" I mentioned in part three.

Take a moment to reflect on how this effort could help preserve the historical scientific and medical lab notebooks that have led to U.S. innovations and could improve the reproducibility of data-driven discoveries. Ask yourself, how could more freely available data help your personal initiatives to streamline costs, increase productivity, and benefit society? Take another moment to consider a future possibility where you or a family member are ill and are chosen to participate in a clinical trial where the treatment received, possibly based on irreproducible results, could mean the difference between life and death. Then, take time to sign the online petition:

Disclaimer: This article series should not be interpreted as legal advice or counsel, but does provide some available legal resources for the readers’ consultation and the author’s personal opinions of them.

Corrections of typographical errors were made at 3:12 PM ET, February 11, 2014.
