The story so far...
Before diving into the Fellowship - a little about how I ended up here.
The early months of the COVID-19 pandemic provided both the opportunity and impetus to explore options for hosting our digital assets online, reuniting collections and researchers in a virtual space. In the world of remote work and study, a digital repository was no longer a ‘nice-to-have’ but a ‘need-to-have’. However, in a model of bad timing, our repository had been decommissioned shortly before the pandemic began, and the tender process for a new one had ground to a halt in the disruption that followed. We needed a back-up plan.
Many of our collections had been photographed over the years, and so even though I was working at a makeshift desk in my bedroom, I had remote access to the images via OneDrive. So far so good. The metadata was another matter. Many of our images had been supplied for digitisation projects led by partner institutions, such as the National Library of Wales. Filenames and metadata had been supplied to project partners in accordance with their specifications, rather than internal protocols. These requirements tended to change on a project-by-project basis, leaving us with inconsistent, incomplete, and non-interoperable metadata. Even if I could find a place to host the images, there would be a huge amount of work required behind the scenes to reconcile the metadata. I consoled myself that, in the short term, access was still available via those partner projects - or was it?!
In the first few weeks of lockdown, my colleagues and I scrambled to develop a resource guide for digital primary sources, to help our final-year humanities students who had been abruptly cut off from access to archives just as their dissertation deadlines were looming. While compiling our guide, I discovered that one major collaborative digital project we had submitted content to was no longer live or accessible. It was time to audit all our digital content and metadata, and search for free, temporary accommodation - fast.
The Internet Archive came to the rescue. Perhaps better known for archiving websites, the Internet Archive can also host copies of books and archives, provided that an institution is willing to license its content for public access. If your files are named systematically and incrementally, they can be uploaded with some basic metadata, and the Internet Archive will do the rest, for free.
Free, yes, but not necessarily fast. All those image files needed renaming, for one thing. And via the web interface, it was only possible to upload one digital object at a time. I had hundreds… A little googling revealed some documentation about batch ingesting using Python via the command line. This could be the answer, if only I could penetrate the developer-speak in which the guidance was written. The only thing I knew about Python was that it could be used to make bulk changes to files. Could it also help me systematically change all those filenames? It was time to get under the hood.
I was defeated at the first hurdle. Batch uploading to the Internet Archive required a ‘Unix-like environment’. Whatever this was, Windows didn’t have it, and I was reluctant to attempt to create a Linux partition on the hard drive of my elderly laptop, brought out of semi-retirement for what I thought would be a brief period of home-working. I succumbed to a long-held temptation, and acquired a Raspberry Pi. These tiny, budget computers are able to run anything that requires Linux, and fast.
To cut a long story short, I indulged in a few weeks of testing and trialling my new toy, amid MUCH googling, until I had written simple Python scripts to batch change all of our image filenames, and connected our image folders with our Internet Archive account via the command line. Next came an epic spreadsheet of metadata, and finally, hours of overnight uploads, as the Internet Archive ingested all our collections. I’ve documented the nuts and bolts of these processes elsewhere, and in the interim a similar resource has been developed by SUCHO (Saving Ukrainian Cultural Heritage Online), which is leveraging the Internet Archive to swiftly rescue and secure archives threatened by war in Ukraine.
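A batch rename of the kind described above can be sketched in a few lines of Python. This is an illustration only, not the actual script: the function names, the `prefix_0001.jpg` naming pattern, and the idea of one folder per digital object are all assumptions made for the example. It works in two steps, first planning the new names and then applying them, so that a dry run can be checked before anything on disk changes.

```python
from pathlib import Path


def plan_renames(folder, prefix, start=1, width=4, pattern="*.jpg"):
    """Pair each matching file with a systematic, incremental new name.

    Files are taken in alphabetical order and numbered from `start`,
    zero-padded to `width` digits, e.g. prefix_0001.jpg, prefix_0002.jpg.
    (Hypothetical naming scheme, chosen for illustration.)
    """
    files = sorted(Path(folder).glob(pattern))
    return [
        (src, src.with_name(f"{prefix}_{i:0{width}d}{src.suffix.lower()}"))
        for i, src in enumerate(files, start)
    ]


def apply_renames(plan, dry_run=True):
    """Print each planned rename; only touch the disk when dry_run is False."""
    for src, dst in plan:
        if dst != src and dst.exists():
            # Refuse to overwrite an existing file.
            raise FileExistsError(f"{dst} already exists")
        print(f"{src.name} -> {dst.name}")
        if not dry_run:
            src.rename(dst)
```

Calling `apply_renames(plan_renames("scans/manuscript_1", "manuscript_1"))` (a made-up folder and prefix) would only list the proposed changes; passing `dry_run=False` carries them out.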
Following this work, I gave presentations at conferences about my experiences of using the Internet Archive to do digital ‘on a shoestring’. I realised the level of interest from other institutions, and began to wish I had more time to dedicate to researching free tools, and sharing them with others.
When the call came around for the RLUK-AHRC Professional Practice Fellowships, my application wrote itself. This was the opportunity I’d been looking for to build on the skills and knowledge I’d been able to acquire during the pandemic’s disruption to business-as-usual. For all the anxiety and uncertainty of this time, it provided a unique chance, as individuals and as organisations, to review and reassess our direction and priorities.
Putting our collections on the Internet Archive achieved far more than just making them accessible remotely. It showed our stakeholders what we could offer, and helped to build a case for internal investment. Cardiff University subsequently acquired a subscription to Alma Digital, an Ex Libris product, and I was able to lead its implementation, configuration, and deployment. Alma Digital is IIIF-enabled - more on this in a future post - and we are only just beginning to scratch the surface of its potential. Among other outputs, the Fellowship will provide the opportunity to review researchers’ requirements and aspirations for existing and future digital archives, and test the capability of our new digital repository to support these needs.