Conference Papers Online
Session 5: The Archival Control of Systems
CSIRO - Recordkeeping and Electronic Laboratory Notebooks
Joint Presentation by Philip Kent and Rodney Teakle
CSIRO IT Services
The paper explores the issues within the context of the Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia’s premier government funded science and technology body. Recent developments in electronic laboratory notebooks are reported and information management implications raised.
Part 1 – Rodney Teakle
A Brief Administrative History of CSIRO
The Commonwealth Scientific and Industrial Research Organisation (CSIRO) is a Commonwealth government organisation in Australia. CSIRO’s origins date from the early years of the First World War. In 1916 Prime Minister Hughes’s Government established the Advisory Council of Science and Industry as a means to create a ‘National Laboratory’ and so bring scientific research to a national standing. The Council’s earliest work was to collect information about the state of scientific research in Australia, undertake research, review existing research, and collect and disseminate scientific information.
The moves for a permanent body resulted in the passing of an Act in 1920 to establish the Institute of Science and Industry. The Institute’s work was hampered by a lack of funds and the absence of a clear mandate for its existence. The imperative for organising research in a country at war had gone and the benefits were not appreciated.
The early 1920s brought in a period of closer cooperation between countries of the British Empire. The 1923 Imperial Economic Conference called for greater scientific and technical cooperation. The Empire Marketing Board was established in 1926 and administered an Empire research fund of one million pounds. Prime Minister Bruce saw that it was time to foster and take advantage of the closer Imperial cooperation and funding and arranged for Sir Frank Heath of the British Department of Scientific and Industrial Research to report on reorganising the Institute. The result was new legislation and a successor agency, namely the Council for Scientific and Industrial Research, and in 1926 research began in earnest under David Rivett, the Chief Executive Officer. Initially, research was concerned with primary industries while from the late 1930s research expanded to include secondary industries. In 1949, the Council was reorganised as the Commonwealth Scientific and Industrial Research Organization (in December 1986 changed to Organisation).
CSIRO has about 7,000 staff located in about one hundred sites varying from very large sites with a multitude of laboratories to small research stations. The twenty-two divisions cover most research disciplines except for a few areas such as nuclear physics and medical research. A few of the divisions occupy only single sites; the others are spread over a number of sites. For example, the Division of Wildlife and Ecology has its main site in Canberra, but it has smaller laboratories in Perth, Alice Springs, Atherton and Darwin. On the other hand, the Division of Human Nutrition has only one site. While the divisions are mostly defined by scientific discipline, much of CSIRO’s research is conducted on a multi-disciplinary basis.
The record systems that have developed reflect the different divisions and the different needs for creating and maintaining records for administrative and research reasons. At one level, the great number of research projects that are undertaken has resulted in a multitude of discrete and unrelated record systems. At another level, there are several recordkeeping systems which cover all CSIRO sites and support specific administrative functions and high level project management.
The degree of formality in the creation and maintenance of records varies greatly. Some types of research records do not need elaborate systems for their maintenance. In particular, this is the case for those records of low research or administrative value maintained by groups or individuals. Other records are created and maintained in very formal and structured systems. In most sites there are correspondence files which have formal systems of control. In the past the control systems were manual but now most are controlled by electronic records management software. Increasingly, that software is being used to control not only the traditional formats such as the paper correspondence files but also the related electronic records such as email, spreadsheets and electronic documents.
Regarding research records, there are two main areas where they should be maintained in a formal manner. One is where records may be needed for legal reasons and the other is where records may be of long term research value.
A laboratory notebook recording experimental research is a good example of the need to maintain a highly structured record to meet potential legal needs. Undoubtedly, the informational content would be crucial in proving when an experiment was conducted, the results, who did the work and who witnessed it. However, the very structure of the laboratory notebook should have certain properties for it to validate and authenticate the information contained in it. Indelible ink should be used, not pencil; unused parts of the pages should be ruled off; errors should not be erased but crossed out; and each page should be numbered and, when completed, signed and dated. These are just some of the measures to take to ensure the integrity of the record.
Records of long term value tend to be observational records, otherwise known as time-series or time-dependent records, rather than experimental records. They must be properly created if they are to be understandable and usable to anyone in the future. Scientists creating these records need to estimate whether or not the records will have long term value. If the records are likely to have long term value, the scientists should incorporate from the very beginning of the research process the metadata that will be vital for others to identify, access and understand the records.
Research records in an electronic format require significant resources if they are to be maintained for any substantial period as non-current records. In CSIRO, the great number of electronic records created reflects the number of research projects which are undertaken. Many of the projects are short term, not ongoing. This contrasts with what I see as one of the assumptions underlying some of the theory and practice concerning the preservation of electronic data. Specifically, the assumption is that the systems tend to be large and ongoing. In a research organisation, the function of research is certainly ongoing but it is not one large and ongoing research project, rather it is a preponderance of small and finite projects. Consequently, the total effort to address the issue is greater than would be the case where the electronic information is maintained in relatively few systems.
Recordkeeping in CSIRO is decentralised and diffuse; therefore, the scientists and the support staff are important and integral to any records strategy. Ideally, the CSIRO records managers and archivists need to give most assistance and attention to the scientists and support staff at the start and finish of projects. This will give them the best chance of explaining good recordkeeping practices effectively and will place them in a position to provide assistance when the disposal of the records is considered. We have been saying ‘When a research project is started, that’s when you need to think about the level of recordkeeping that is needed. And when a project finishes, before everyone goes to other projects, work out what records need to be retained, what should be more clearly documented by way of contextual information, and undertake any basic preservation measures’.
The effort put into any aspect of creating and maintaining records must be commensurate with the value and importance of the records. The time and effort of staff should be spent where it counts. Those who provide advice on recordkeeping will lose support and credibility if they ask scientists to create records without taking into account the degree of importance of the records and the expected retention periods. Spending time on the structures and contextual information should be done only where it is clearly worthwhile doing so.
Part 2 – Philip Kent
Today I will present a case study of a recent development in CSIRO and then talk about electronic laboratory notebooks.
Last week I was in Hobart, where I was delighted to find out about a new initiative. Our Marine Laboratories cover the disciplines of oceanography and fisheries, which are very data-centred. They create considerable data that is used continuously. They create many systems, particularly related to the project-based nature of the work. Many systems can be quite small and reside on a PC. In other cases, the laboratory uses larger databases that have been developed over time. The laboratory may manage these databases centrally to facilitate re-use. A good example is a database or system emanating from a specific scientific project. CSIRO owns two large ocean-going vessels that conduct regular voyages, generally of two weeks’ duration. When the scientists return from those voyages they bring back a vast amount of electronic project data, which is migrated either to local systems within projects or into larger systems within the laboratory. They have embarked on this data management project because the information within those databases is absolutely vital to their research. They use and reuse data to support ongoing research.
The system that was launched last week is called MARLIN. Unfortunately, because of security, you can’t look at it. It’s on the web page for the Marine Laboratories, but you won’t be able to go beyond the first page without specific security access. MARLIN stands for the Marine Laboratories Information Network. They’ve adapted a web-based tool that was developed by ERIN, an environmental database from the relevant federal department in Canberra. ERIN gave our laboratory permission to use the software. Part of the deal is that the work that CSIRO does to enhance the tool is available to ERIN for long term development.
Basically it is a web-based system using forms to describe the metadata for those databases used within the laboratory. It has a very easy look and feel, with the usual pull-down menus for creating the metadata files. It is available for laboratory-wide access. There is the possibility of using security to limit access, but in general the nature of the work is not top secret. The approach taken in developing the metadata and building the information database is at the project level. This is the most logical place to create the metadata because the project staff are the people who know how they created the data and what type of system it is mounted on, and who are able to describe it in a meaningful way.
Consistent with my background in librarianship, I was interested in the use of controlled vocabulary in the development of the system. Many scientific disciplines have thesauri or taxonomies that are very useful to build into these types of systems. For example, they’ve used the Aquatic Sciences and Fisheries Abstracts thesaurus as a common vocabulary for entering the metadata about the projects and the systems. Similarly, for the work that’s done in the fisheries area, they point and click on a taxonomic list to identify the types of fish that might be involved in the project.
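To make the idea concrete, the following is a minimal Python sketch of a project-level metadata record whose subject terms are checked against a controlled vocabulary. It is not MARLIN’s actual design: the class name DatasetRecord, its fields and the three sample terms are all assumptions made for illustration, and a real system would load its terms from a published thesaurus rather than a hard-coded set.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-in for a controlled vocabulary; these sample terms
# are illustrative only and are not taken from any real thesaurus file.
SUBJECT_TERMS = {"oceanography", "fisheries", "water temperature"}


@dataclass
class DatasetRecord:
    """Project-level description of one dataset (all fields are assumed)."""
    title: str
    project: str
    custodian: str
    storage_system: str
    subjects: List[str] = field(default_factory=list)

    def add_subject(self, term: str) -> None:
        """Accept a subject term only if it is in the controlled vocabulary."""
        normalised = term.strip().lower()
        if normalised not in SUBJECT_TERMS:
            raise ValueError(f"'{term}' is not in the controlled vocabulary")
        if normalised not in self.subjects:  # avoid duplicate terms
            self.subjects.append(normalised)


record = DatasetRecord(
    title="Example voyage data",
    project="Example project",
    custodian="A. Scientist",
    storage_system="PC database",
)
record.add_subject("Fisheries")
```

Rejecting free-text terms at entry time is what a pull-down menu does in a form: everyone describing a dataset chooses from the same list, so later searches across projects find consistent terms.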
A really useful aspect about MARLIN is that there is the potential to link to other records held in the laboratory. For example, the project will have files in the records management system and it may be possible to build in links there. Another area that has been suggested for linkage is with the publications database. The end of the scientific chain is the publication of a scientific paper, so it would be useful to have some cross linkages there as well.
Marine science is a collaborative area of research and our laboratories work very closely with external bodies, both Australian and overseas. They exchange data with each other, and the benefit of this system is that they can link into other organisations. When building the metadata they can specify where the data is going to be exported to at other sites as well.
The MARLIN experience was exciting for me. The launch that I attended engaged 80 scientists who asked questions and exhibited considerable interest because this is their core business. I think that it provides a good model for work in other divisions of our organisation.
I want to talk a little bit about electronic laboratory notebooks as another area of electronic systems that are being used, or could be used, in the scientific area. As Rodney Teakle mentioned, the area of intellectual property is important and provides a background to the paper notebooks. We have the whole regime set up to record all the details of the time and date and who did the work as a safeguard against a court action at a later time.
An electronic laboratory notebook should mirror what is available in the paper format. In other words, security issues like signatures, time stamping and date stamping are all very important in the electronic arena as well as the paper arena.
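One way to picture how the paper safeguards might carry over is a chain of entries in which each entry records its page number, author, witness and time, plus a hash of the entry before it, so that any later alteration can be detected. The Python sketch below illustrates that idea only. The function names, the field names and the use of a SHA-256 hash chain are assumptions of this example; a hash chain is not a digital signature, the local clock is not a trusted time stamp, and nothing here is claimed to satisfy the legal requirements discussed in this paper.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(body: dict) -> str:
    """Hash an entry's fields in a stable (sorted-key) serialisation."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(notebook: list, author: str, witness: str, text: str) -> dict:
    """Add a numbered, dated entry that is chained to the previous one."""
    previous_hash = notebook[-1]["hash"] if notebook else ""
    body = {
        "page": len(notebook) + 1,  # sequential page numbering
        "author": author,           # who did the work
        "witness": witness,         # who countersigned (name only, assumed)
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "text": text,
        "previous_hash": previous_hash,
    }
    entry = dict(body, hash=_digest(body))
    notebook.append(entry)
    return entry


def verify(notebook: list) -> bool:
    """Return True only if no entry was altered, removed or reordered."""
    previous_hash = ""
    for number, entry in enumerate(notebook, start=1):
        body = {key: value for key, value in entry.items() if key != "hash"}
        if entry["page"] != number or body["previous_hash"] != previous_hash:
            return False
        if _digest(body) != entry["hash"]:
            return False
        previous_hash = entry["hash"]
    return True
```

In this sketch an erroneous entry is never edited in place; a correction would be appended as a new entry, which is the electronic counterpart of crossing out rather than erasing.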
A large number of our scientists are doing the majority of their work in electronic format and it seems almost ridiculous that we’re continuing to use paper notebooks in this current environment. Really, the paper notebooks are inadequate. We have examples in our laboratories where staff print out reams and reams of paper from systems and paste them into the paper notebooks. Counter signatures, dating and other routines are recommended by our panel of patent attorneys.
So, an electronic laboratory notebook that can do all that for us is a great dream, and that’s all it is at the moment. According to one of the experts in the field, from whom I received email yesterday, there is still no product in the market that meets both the basic scientific needs and the legal requirements to comply with patent legislation. There are a few products around, but none of them fully meets the evidential requirements.
As a result of this dilemma, a group called the Collaborative Electronic Laboratory Notebooks Consortium was formed in the United States in December 1995. It’s very much a user-driven group. It’s a consortium of stakeholders from a variety of companies. The consortium is led by a commercial company called TeamScience and the guru in this area is Rich Lysakowski, who has a strong background in both chemistry and chemical information. If you wish to join the consortium, a subscription is payable and there are various levels of membership. It is still very much at the research stage, but you can pay for a fairly low level of membership that entitles you to information about what’s happening and what’s coming. It is also possible to become a major member of the consortium, paying more money but receiving free software at the end of the process.
The major members of the consortium come from the chemical, pharmaceutical, and applied technology areas. Understandably, these are the areas where there are many implications for patents. Some of the members are wealthy companies that have a vested interest in this area. One of the important considerations is the team approach to science. The concept of a single notebook for a single scientist needs to be expanded to become a resource for the project team as a whole. The collaborative concept is particularly important in the electronic area; the electronics can assist the process.
The approach to the project is not to go out and start from scratch and build hardware and software solutions. Instead, they are working with existing hardware and software and working with the vendors to modify the products in a way that can meet the evidential requirements. Adobe is, I believe, a strong member of the consortium, and I know from past discussions with TeamScience that Lotus Notes is also being looked at. One of the reasons why they’re looking at these types of products is that these are already alive and well in organisations. They are hoping to build on existing platforms rather than to start totally from scratch. The first software component is due in early 1999, so early next year they are planning to have the first software rolling off the assembly line.
One of the approaches is to incorporate object-oriented technologies. This reminds me of Didier Devriese’s comments about the importance of building relationships between records. I think that this is really exciting. Through object-oriented technologies, where everything is stored as an object, valuable linkages can be established that are useful from a recordkeeping point of view. Didier Devriese’s example included a large number of laboratory notebooks and records in other types of formats but no interrelationship between the two. This project should resolve the linkages.
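As a rough illustration of what such linkages could look like, the Python sketch below treats each record as an object that holds references to the records related to it. The class name RecordObject, its fields and the sample identifiers are hypothetical; the consortium’s actual object model is not described in this paper.

```python
from dataclasses import dataclass, field
from typing import List


# eq=False keeps identity-based comparison, so checking membership in the
# "related" list never recurses through records that point at each other.
@dataclass(eq=False)
class RecordObject:
    identifier: str
    kind: str
    related: List["RecordObject"] = field(default_factory=list, repr=False)

    def link(self, other: "RecordObject") -> None:
        """Create a two-way link so either record leads to the other."""
        if other not in self.related:
            self.related.append(other)
        if self not in other.related:
            other.related.append(self)


notebook = RecordObject("NB-001", "laboratory notebook")
spreadsheet = RecordObject("XLS-042", "spreadsheet")
notebook.link(spreadsheet)
```

Because the link is stored on both objects, someone who starts from the spreadsheet can find the notebook it belongs to, and the reverse.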
The software will support multimedia, and this again goes back to some of the things Didier Devriese said yesterday about the need to be able to record three-dimensional issues. They’re using client server types of technologies because of the shared nature of the work. For example, it would be possible to link Excel spreadsheets, hard data from various systems, scientific systems and video clips. The issue of personal reminiscences raised yesterday is also relevant. It would be possible to incorporate audio clips of a scientist talking about his project or even describing things as they happen. The possibilities are endless.
The project involves a hardware/software solution that is legally acceptable. TeamScience are collaborating with the major intellectual property bodies, such as the patent offices in each of the major countries. I should highlight the interest and contribution of David Bearman. He has been involved with Rich Lysakowski in feeding ideas into the project.
In the email that I had from Rich Lysakowski yesterday, he said that they’re looking for people to work on the project. If there is anyone from ‘down under’ who is looking for a sabbatical on the East Coast of the US, let him know.
What is CSIRO going to do about electronic laboratory notebooks? Well, obviously we’ve had an interest in this area for some time, but because there are no deliverables to date, our path is uncertain. One of the things that I’ve been planning to do for some time is to bring Rich Lysakowski to Australia. He conducts one-day workshops that are comprehensive and challenging. I attended one two years ago in Pittsburgh. Rich Lysakowski is a very passionate and knowledgeable expert, perhaps the world’s best in this field.
So, watch for details of an Australian lecture tour around some of the major cities. It will be run on a cost recovery basis, involving external attendance like the audience at this conference.
CSIRO is unsure whether we’ll join the consortium. We want to see what results. Although a large organisation, CSIRO is not quite in the same league as Monsanto and some of the very large pharmaceutical companies. So we’re maintaining a watching brief at the moment and following a visit from Rich Lysakowski we’ll be in a better position to determine our needs.
I was talking yesterday with Anne Barrett and Didier Devriese. I gave a paper on this issue in Liege, Belgium two years ago where they were present. Unfortunately I am reporting today the same as I was two years ago: there is still not a suitable product in the marketplace. However, there is certainly some very positive work happening and there are very important needs of the science community that this product should be able to meet.
Published by: Australian Science and Technology Heritage Centre on ASAPWeb
Comments or questions to: ASAPWeb (email@example.com)
Prepared by: Helen Morgan
Graphics by Lisa Cianci
Date modified: 8 October 1999