Archival aspects of e-mail

Dr Terry Stokes

Office of the National Health and Medical Research Council

I would like to begin by confessing that I am not a professional archivist. I nearly was: I was offered a job in the Commonwealth Archives in 1977, after I finished my first degree, but the offer of a scholarship and a higher degree place in the Department of History and Philosophy of Science (HPS) at the University of Melbourne led to a different future.

That future re-intersected with professional archivists some years ago when, while working for the Australian Research Council, I encouraged Rod Home from HPS (who, as many of you will know, began the Australian Science Archives Project, ASAP) to apply for funding through what was then called the ARC Mechanism 'C' research infrastructure scheme. ASAP has gone on to become one of that scheme's success stories.

Anyway, what follows are the reflections of a user of archives, as a sometime historian of science; of a sometime anthropologist of science who studied the relationship between the practice of his tribe of scientists and the evidence it produced; and of someone who is now a 'bureausaur' charged with trying to make a paperless research grant application system work for the National Health and Medical Research Council (which it certainly doesn't at present).

Letters form a pretty important source for the history of science. A quick tour of the site found the following volumes currently in print:

  • Robert Oppenheimer: Letters and Recollections
  • Always, Rachel: The Letters of Rachel Carson and Dorothy Freeman, 1952-1964
  • The Early Letters and Classified Papers, 1660-1740 (Collections from the Royal Society)
  • Letters Addressed to H. R. H. the Grand Duke of Saxe Coburg and Gotha, on the Theory of Probabilities
  • The Letters and Papers of Jan Hendrik Oort: As Archived in the University Library, Leiden
  • Letters and Papers of Robert Boyle: From the Archives of the Royal Society
  • Partners in Science: Letters of James Watt and Joseph Black
  • Elie Cartan and Albert Einstein: Letters on Absolute Parallelism
  • Science and the Cure of Diseases: Letters to Members of Congress.

But letters were a more important source then than they are now, and for that we must blame the telephone. Alexander Graham Bell was no friend of the archivist. Before the telephone, snail mail ruled remote communication and collaboration, providing a rich resource for those who would study science and scientists: historians, methodologists and biographers could, and did, mine this lode to get behind the bland and often misleading public scientific documents.

With the popularity of the telephone, much of this was lost to a medium which, whatever its advantages, had, and has, the considerable disadvantage of not usually leaving an archival artefact for later study (there are exceptions, as Richard Nixon lived to regret).

Then came facsimile, which has the speed of the telephone without the problems imposed by time zones. The author sends at her or his convenience; the receiver reads and responds at his or hers. The fax also offers the opportunity for detail and precision, for sending data, for communicating drafts. And, for the archivist and posterity, there is a record of it all. An important collaborative paper published in the 1980s might well be studied with the help of a series of drafts, perhaps with handwritten emendations, faxed between the co-authors, enabling matters such as their respective contributions to be examined.

(Mind you, the early fax machines used photosensitive paper, which faded to nothing if left exposed to sunlight. They had to be photocopied and the copy kept if a permanent record was to be created. This was OK for clearly important faxes destined for filing; but many everyday communications disappeared in the light of day. Plain paper fax machines fixed that problem, however.)

Now e-mail combines the advantages of the fax with some of those of the telephone, as well as having some of its own. Time zones impede communication by fax and e-mail much less than by telephone. Even within Australia, during our patchily observed daylight saving period, a telephone call from a university in Sydney to one in Perth is unlikely to be answered until getting on for noon, EDST. Similarly, a call from Perth might well go unanswered in the Sydney laboratory any time after 2pm WST. The need for both to have lunch further restricts the convenient overlap between the two States.
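The time-zone arithmetic above can be made explicit. What follows is a minimal sketch. It assumes fixed summer offsets: Sydney on EDST (UTC+11) and Perth on WST (UTC+8), with no daylight saving in the west. It also assumes a hypothetical 9-to-5 working day and a noon-to-1pm lunch at both ends. The function names are invented for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed summer offsets: Sydney on daylight saving, Perth not.
EDST = timezone(timedelta(hours=11))
WST = timezone(timedelta(hours=8))

def local_interval(day, tz, start_hour, end_hour):
    """A (start, end) pair of timezone-aware datetimes on `day` in `tz`."""
    return (datetime(day.year, day.month, day.day, start_hour, tzinfo=tz),
            datetime(day.year, day.month, day.day, end_hour, tzinfo=tz))

def overlap(a, b):
    """Intersection of two (start, end) intervals, or None if they do not meet."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None

def hours(interval):
    """Length of an interval in hours (0.0 for None)."""
    return (interval[1] - interval[0]) / timedelta(hours=1) if interval else 0.0

day = datetime(1997, 1, 15)  # any summer weekday will do
shared = overlap(local_interval(day, EDST, 9, 17), local_interval(day, WST, 9, 17))

# Subtract each side's lunch hour where it falls inside the shared window.
# The two lunches do not overlap each other, so simple subtraction is safe here.
lunch_sydney = local_interval(day, EDST, 12, 13)
lunch_perth = local_interval(day, WST, 12, 13)
usable = hours(shared) - hours(overlap(shared, lunch_sydney)) - hours(overlap(shared, lunch_perth))

print(shared[0].astimezone(EDST).strftime("%H:%M EDST"), "to",
      shared[1].astimezone(WST).strftime("%H:%M WST"))
print(hours(shared), "shared hours,", usable, "after lunch")
```

Under these assumptions the window runs from noon EDST to 2pm WST. That gives five shared hours, or about three once both lunches are taken.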

Although a fax with a signature has the same legal authority as hard copy, faxes are frequently of relatively poor quality, and usually cannot be scanned into usable text. This means data and text must be re-keyed, or hard copy sent by snail mail. There is no such problem with e-mail. E-mail also provides some of the immediacy of a telephone call: exchanges can be, and often are, less formal than is the case with either snail mail or fax. The popularity of news groups shows this is true even across time zones.

I read somewhere of a study suggesting that there is a difference between men and women here. Women, it was said, are more likely to think of an e-mail as a formal communication like a business letter, while men think of e-mail as more informal, like a telephone call. Anecdotal support for this comes from a former colleague of mine at the University of Wollongong who regularly copied his highly uncomplimentary e-mail to the Vice-Chancellor to *@* (in effect, to everyone) – until, that is, the University’s solicitors drew his attention to the law of libel.

Communications media, of course, have relative costs as well as relative convenience. Notwithstanding the considerable beneficial effects of competition, STD and ISD telephone calls cost a lot. Sometimes this is not regarded as a major consideration – at least not until the bills come in and the now devolved departmental or laboratory budget starts a red shift. Faxes are cheaper than phone calls because they can compress more information into a given minute of connect time. When did you last hear someone read his or her paper to you down an STD telephone line? On the other hand, it won't have been long since you last heard them say, "I’ll just fax you the most recent draft."

Now the great virtue of e-mail is that it doesn’t appear to cost anything. Of course, that isn’t actually true. What is true, though, is that most individuals do not incur a personal cost, at least not for their business e-mail. When I worked for the Australian Research Council a few years back, I often represented it in discussions about AARNet, for which the Council partly paid. I guess the prevailing economic rationalism in Canberra must have seeped in, because I was always arguing that if AARNet charges reached the typical user (primarily an e-mail user), the exponential growth in usage might flatten right out.

But, really, even if you do have to pay for e-mail, it is pretty economical. Monash University, very much an enterprise interested in levelling the playing field, and for which I worked until recently, has introduced devolved e-mail cost recovery at the departmental level. Unless, however, you are involved in truly mass data transfer (computing data, for example), the costs are trivial. For the private e-mail user, too, the costs are modest. My Internet Service Provider, Telstra BigPond, charges me $20 a month for 7 hours of connect time, and $4 an hour after that. The hourly cost can be reduced a fair bit if you use more hours per month, and BigPond is by no means the cheapest ISP. And, of course, unless you happen to live in the country, 25c buys you an untimed connection.
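The tariff quoted above is simple enough to express as a function. This is a minimal sketch. It assumes the $20 fee covers the first 7 hours and each further hour costs $4. It also assumes part-hours are billed pro rata, which is an assumption and not a detail stated in the text. The name `monthly_cost` is hypothetical.

```python
def monthly_cost(hours_online, base_fee=20.0, included_hours=7.0, extra_rate=4.0):
    """Dollars per month: a flat fee covering `included_hours`,
    then `extra_rate` for each additional hour (billed pro rata here)."""
    extra_hours = max(0.0, hours_online - included_hours)
    return base_fee + extra_hours * extra_rate

for h in (5, 7, 10):
    total = monthly_cost(h)
    print(f"{h:>2} hours: ${total:.2f} (${total / h:.2f} per hour)")
```

Under this particular plan the effective hourly rate is lowest at exactly the seven included hours. Any savings for heavier users would presumably come from plans with more included hours.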

Letters get kept – of course that’s why they are an important archival source. In our working lives and, perhaps to a lesser extent, in our private lives we keep the letters we receive. As we learn about the world of work, we learn about the formal and informal rules which govern the keeping of business correspondence. In the public service, and in organizations like universities, there are lots of these. When I first started work at Monash, I was puzzled to find personal correspondence arriving on files as folios marked to my attention – the first example appeared to have gone to the Vice-Chancellor first! The reason was that, because I was Director, Research Services, all my snail mail was opened by Registry, which sought to identify the correct file or, failing that, establish a new one. All my snail mail, that is, except that marked ‘Personal’ or ‘Confidential’, so I had to start training my friends and colleagues to mark their letters that way. (On the other hand, all my official and private e-mail escaped the Registry entirely; but more of that later.)

So lots of letters get formally kept, on a file. But lots of others just get stuffed in boxes. Personally, I’m not very good at filing – I was an academic before I was a bureaucrat. When I open a letter, it generally gets dealt with there and then: ‘circular filed’ in the bin, or put on one of the many piles which litter my desk. Sometimes this is for later action (a bad idea: they get buried), sometimes just because I think it needs to be kept, say because it is part of a collaboration on a joint research project, or a proposal to tender jointly for a consultancy.

Eventually the clutter on my desk precludes its use for any purpose other than holding up the piles of paper. I am forced to go through it all and decide the fate of each piece of paper: the bin, a file, or a box for long-term keeping because I might need to refer to it later (e.g., if I am subjected to an audit by the Australian Tax Office). I think lots of us work in this way. Every time I move jobs (and, usually, cities) boxes of paper move from office to office. Because I have never spent more than four years in the same job or city since I left high school, occasionally I give these boxes a hard look, and ask whether I really need:

  • more than a decade of tax receipts
  • all my undergraduate essays, or
  • the photocopies of articles which, as an undergraduate, had particularly impressed me.
When I left the University of Wollongong for Canberra in 1990, all of the above went to the recycling bin. A less peripatetic person would probably just continue to accumulate.

Of course, only I will ever be interested in my accumulation, correspondence included. But these are common habits, common among scientists as well as bureausaurs. They provide much of the personal archival material with which the Australian Science Archives Project works.

One similarity between snail mail letters and faxes is that they both have a hard copy existence. As a result, they tend to be treated alike in terms of preservation. If anything, since faxes are generally urgent, which tends to flag them as important, such communications have a greater chance of escaping the circular file. Anyway, important faxes are often marked ‘hard copy follows’, giving them double the chance of being preserved (plenty of files end up with both on them).

E-mails, I think, have a much chancier prospect of survival. I noted above that they tend to evade the formal processes which institutions of various kinds have long established for dealing with correspondence. Yet the fact of the matter is that e-mail has, for many of us, come to assume a dominant role in our daily work. I certainly receive many more e-mails than snail mails a day; more e-mails, in fact, than telephone calls. For the first time, in my present position at the NHMRC, I find myself regularly receiving minutes (memos, they’d be called outside government bureaucracy) as attachments to e-mails only (i.e., no hard copy), and minutes are the stuff, the very essence, of bureaucratic communication. The scientific equivalent is draft papers and pre-prints, and I have little doubt that e-mail is becoming, and very probably has become, the principal means for communicating these.

What do we do with our e-mail when we get it? I suspect the great majority of us deal with it online; that is, we read it on screen and reply to it, if necessary, there and then. Just there lies the first risk: if you don’t check the Save Reply box, not even an electronic copy of your reply will be kept. The original message will have at least a somewhat longer life. It will be consigned to a Read folder automatically in many e-mail packages, or at least kept for a time with an indication that it has been read. What next?

Well, the short answer is, I think, that there are no established conventions. Some people print all their e-mail and then delete it. Some print only what needs to be filed. These people are likely to delete the electronic version, either then or later; but there will be no harm in that. It is the rest of us, who generally don’t print our e-mail, who create the problem. But why is that a problem? After all, there the messages still are, filed electronically somewhere by the e-mail Postmaster. Well, that’s just it. Your e-mail isn’t stored on your PC’s hard disk; it’s stored on a big server disk somewhere else that somebody else controls. And you will be allowed just so much space on the server for your e-mail, and then be told you have to delete some or you won’t get any new e-mail. If you get many attachments, this limit will be reached sooner rather than later.

The e-mail package we use in the Department of Health and Family Services automatically deletes mail more than 21 days old that has not been moved from the Outbox or Read folders to other, specially created folders.
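The rule just described can be stated precisely. The sketch below models it as a filter over messages. The `Message` record, the folder names as plain strings and the `to_be_purged` function are all hypothetical. This illustrates the rule as the text describes it; it is not the actual behaviour or interface of the Department's e-mail package.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Folders swept by the rule, and the age limit, as described in the text.
SWEPT_FOLDERS = {"Outbox", "Read"}
MAX_AGE = timedelta(days=21)

@dataclass
class Message:
    subject: str
    folder: str
    received: date

def to_be_purged(messages, today):
    """Messages the rule would delete: more than 21 days old and still
    sitting in a swept folder. Anything moved to another folder survives."""
    return [m for m in messages
            if m.folder in SWEPT_FOLDERS and today - m.received > MAX_AGE]

mailbox = [
    Message("Minute on grant round", "Read", date(1997, 3, 1)),
    Message("Draft 3 of joint paper", "Read", date(1997, 3, 20)),
    Message("Ethics approval", "Grant application", date(1997, 2, 1)),
]
for m in to_be_purged(mailbox, today=date(1997, 3, 31)):
    print("would delete:", m.subject)
```

The third message is the oldest, but it survives only because someone took the trouble to file it.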

Of course, this isn’t all that hard to deal with. Various strategies are possible:

  • print out and delete all old mail
  • delete ephemera, including attachments which have been downloaded and saved elsewhere, leaving only electronic copies of important e-mail
  • copy important e-mail to the ‘C’ drive on the PC, perhaps ‘zipped’ up to save space.
The trouble is that all of these approaches usually require one to open and read all that old e-mail, and the attachments, in order to assess their importance. It is much easier just to highlight all the old e-mail and delete it. Just this morning, I made a conscious effort to construct an electronic filing cabinet for the e-mail I had received or sent and needed to keep. Even though I've not yet been in this job a month, there were more than a hundred e-mails to file. It took more than half an hour.

Suppose we take a concrete, scientific example: a collaboration leading to a jointly authored scientific paper. These days, this might well have generated e-mail and attachments covering the following:

  • a research idea, evolving over time
  • a research proposal in the form of a grant application to an agency in various versions, together with human and/or animal ethics approval applications to local ethics committees
  • research data, with various proposed interpretations and implications for further work, leading to
  • draft sections of the paper from various of the collaborators covering the work they were responsible for, followed by
  • full draft manuscripts, exchanged between the co-authors, perhaps using a word processing program which shows changes between versions
  • an e-mail from the editor of a journal, together with anonymous referees' reports, inviting responses
  • e-mails among the co-authors developing such a response
  • the response and, let us hypothesize, an e-mail accepting the paper
  • e-mail exchanges among the collaborators discussing who should receive pre-prints (by e-mail, of course)
  • requests for preprints, and comments from those who have read them.
It is not, I suggest, at all far-fetched to suppose that none of this will have a hard copy existence, whereas within quite recent memory almost all of it would have had.

Suppose, now, that our authors decide that once the paper finally appears, in print or perhaps in an electronic journal (of which there are more and more), that is the time to delete all of the above. It could be as simple as highlighting the folder called ‘such-and-such-paper’ and selecting Delete in the File menu.

There are further twists. Even scrupulous electronic preservation of the e-documentation might not succeed. Notoriously, there are now few, or even no, computers that can read data and run programs scarcely more than thirty years old. One day soon COBOL will pose the same problem as Linear B. Punch cards already resemble cuneiform tablets.

The same problems confront e-mail communication, as the regular troubles we have reading each other's attachments should forewarn us. One day, soon I guess, no one will use UUencoding any more. After that has been true for a while it will become even harder than it is now, for some of us, to find a way to decode an archived attachment which had been UUencoded. Similarly, Macintosh attachments may become even more unreadable than they are now. And, Mac fans, don’t think it can’t happen. Suppose one day you stumbled on a wedding video dating from the first years of the popular use of VCRs. Do you know where to find a Betamax machine? Better, it might have been. Gone it also has.
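UUencoding turns binary data into lines of printable characters so that it can travel through mail systems built for text. The sketch below uses the `binascii` functions from Python's standard library to encode and decode the body of a uuencoded attachment. The helper names are hypothetical. A complete uuencoded file would also carry a `begin` header line and an `end` trailer; these are omitted here. Fittingly, Python's own higher-level `uu` module has already been removed from the standard library (in version 3.13), though these lower-level functions remain.

```python
import binascii

LINE_BYTES = 45  # binascii.b2a_uu encodes at most 45 bytes per call (one line)

def uuencode_body(data: bytes) -> bytes:
    """Encode `data` as uuencoded lines, without the begin/end wrapper."""
    return b"".join(binascii.b2a_uu(data[i:i + LINE_BYTES])
                    for i in range(0, len(data), LINE_BYTES))

def uudecode_body(encoded: bytes) -> bytes:
    """Decode lines produced by uuencode_body back to the original bytes."""
    return b"".join(binascii.a2b_uu(line) for line in encoded.splitlines() if line)

print(uuencode_body(b"Cat"))  # b'#0V%T\n'
attachment = "Draft 3 of the joint paper, with the referees' comments addressed.".encode()
print(uudecode_body(uuencode_body(attachment)) == attachment)  # True
```

Decoding is easy only while some tool still knows the scheme.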

