Metadata: the key to unlocking access

Zoe Bartliff

June 16, 2022

As digital collections and research materials become increasingly prevalent, voluminous, and heterogeneous, the task of facilitating access to this material grows correspondingly complex. Effective use of metadata has proven key to unlocking avenues of access within cultural heritage collections of various types, increasing the discoverability and usability of research data (Clarke et al., 2019 [1]; Orgel et al., 2015 [2]; Burke & Zavalina, 2020 [3]). Considerations of access are multi-faceted and, with reference to cultural heritage collections, incorporate a range of interlinked elements. Of particular interest to my research are the practical concerns of facilitating access, the conceptual challenges of facilitating user engagement with content and, most recently, the legal and ethical ramifications of access. Each of these branches of investigation, in relation to both archival collections and the study of medieval texts, has centred on the use of metadata and, as such, has encountered corresponding benefits and challenges. The variety and potential wider applicability of these investigations are the focus of this post.

To turn first to medieval manuscripts, my work in this area explores methods for approaching and moving beyond a long-standing roadblock in the study of the medieval Welsh law texts, The Laws of Hywel Dda (Cyfraith Hywel). As a rich source of legal, historical and cultural information, this body of manuscripts is invaluable for understanding the evolving situation in medieval Wales. Scholars have long expressed the desire to explore the distinctions and similarities between the manuscripts. However, the complexity of the tradition, which spans approximately 80 manuscripts dating across 700 years, in combination with the non-standardised language typical of medieval manuscripts, has long frustrated attempts by researchers to gain an overarching perspective on this material. The text is composed of many sections, each focused on a particular aspect of Welsh law (cf. the Cyfraith Hywel website [4]). In each version of the text, these sections are adapted, rearranged, added or deleted to suit the needs of the author, just as the vocabulary, style and modes of expression shift naturally over time and in response to the author’s environment. To counter this issue, my work explored enhancing practical access to the textual data through manual XML encoding in line with the TEI metadata standards.

Starting from the textual data presented on the Welsh Prose [5] website, each word was supplemented with semantic and grammatical metadata, and structural information was added to mark the different sections and subsections of the text. It was therefore possible to support an unprecedented level of access to the collection, without encountering the issues raised by, for example, non-standardised language or spelling errors. By enhancing the ability to search, filter and compare aspects of the manuscripts, this practical focus on access facilitated the more conceptual concerns of access, supporting the application of statistical language processing methods to understand the scope of different sections, the relationships that exist between them and the placement and usage of keywords. In combination, these techniques supported the empirical investigation of long-held theories relating to these texts, sometimes supporting established thought and at other points challenging it. For instance, it was possible to support the validity of the long-held practice of grouping the manuscripts based on their contents. In contrast, perceptions about the extent to which these groupings possess internal cohesion are challenged when the text is explored for statistical similarity. The supplemented metadata made it possible to develop and explore a systematic and holistic overview of an otherwise opaque collection of texts (cf. Bartliff 2021a [6], Bartliff 2021b [7]). In such cases, the high-level metadata associated with manuscripts through, for instance, Dublin Core is complemented by more detailed, research-focused enhancements to support discoverability and comprehension of the textual data.
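To give a sense of how word-level metadata sidesteps spelling variation, the following is a minimal sketch rather than the encoding actually used in the project. It assumes each word is wrapped in a TEI `<w>` element carrying `lemma` and `pos` attributes and that sections are marked as `<div type="section">`; the sample tokens, section names and the `find_by_lemma` helper are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; the prefix "tei" is only a local label for the queries below.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample: invented tokens and section names,
# not taken from the encoded manuscripts.
SAMPLE = """
<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <div type="section" n="court">
      <p>
        <w lemma="brenin" pos="noun">brenhin</w>
        <w lemma="llys" pos="noun">llys</w>
      </p>
    </div>
    <div type="section" n="land">
      <p>
        <w lemma="brenin" pos="noun">urenhin</w>
        <w lemma="tir" pos="noun">tir</w>
      </p>
    </div>
  </body>
</text>
"""


def find_by_lemma(xml_text, lemma):
    """Return (section, surface form) pairs for every word tagged with `lemma`."""
    root = ET.fromstring(xml_text)
    hits = []
    for div in root.iterfind(".//tei:div[@type='section']", TEI_NS):
        for word in div.iterfind(".//tei:w", TEI_NS):
            # Match on the normalised lemma, not on the spelling in the manuscript.
            if word.get("lemma") == lemma:
                hits.append((div.get("n"), word.text))
    return hits


print(find_by_lemma(SAMPLE, "brenin"))
# [('court', 'brenhin'), ('land', 'urenhin')]
```

Because the query matches on the lemma, both spellings are returned together with the section they occur in, which is the kind of search, filter and compare operation that plain-text matching on non-standardised spelling cannot support.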

Another branch of my research involves the investigation of several legacy hard drives that form the digital branch of a filmmaker’s hybrid archive. My research in this area was intended to supplement the drive-level metadata practices employed by the archive and so facilitate investigation of the contents. As the first point of contact with the drives, the research used digital forensics to extract metadata that had been automatically generated by the computer as records of drive activity within the collection. This metadata supplied the information needed for practical access, allowing for the identification of key software, for example. When investigated with a variety of data exploration techniques, the extracted metadata also facilitated comprehension of the collection as a whole, opening up the possibility of efficiently surveying the drive contents through visualisations that could be used by people with a range of skill levels and needs. We created, for instance, a variety of visualisations relating to the identification of the filmmaker’s creative process (Bartliff et al., 2020). Overviews of the disk content were aimed at supporting archivists in the appraisal process and users in the discovery of content (Bartliff, Kim & Baxter 2020). More recently the focus has shifted towards a forthcoming paper (the groundwork of which is available at Bartliff, Kim & Hopfgartner 2022) which examines how the data and metadata specific to email collections might be employed to create privacy-aware visualisations that facilitate degrees of access to sensitive material.
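As a rough illustration of how automatically generated metadata can be turned into an overview of drive contents, the sketch below aggregates file records by extension and by year of last modification. It is not the forensic tooling used in the research: the record layout, the sample paths and the `summarise` function are all assumptions made for this example, and the counts it produces are the sort of input a chart of file types or activity over time could be drawn from.

```python
from collections import Counter
from datetime import datetime, timezone
from pathlib import PurePosixPath

# Hypothetical records: (path, size in bytes, last-modified Unix timestamp).
# In practice these would come from a forensic metadata export.
RECORDS = [
    ("projects/film_a/cut1.mov", 1_200_000, 1113177600),  # April 2005
    ("projects/film_a/notes.doc", 24_000, 1113177600),    # April 2005
    ("projects/film_b/cut2.mov", 3_400_000, 1144713600),  # April 2006
    ("mail/inbox.mbx", 560_000, 1144800000),              # April 2006
]


def summarise(records):
    """Count files per extension and per year of last modification."""
    by_extension = Counter()
    by_year = Counter()
    for path, _size, mtime in records:
        # Lower-case the extension so ".MOV" and ".mov" are counted together.
        extension = PurePosixPath(path).suffix.lower() or "(none)"
        by_extension[extension] += 1
        year = datetime.fromtimestamp(mtime, tz=timezone.utc).year
        by_year[year] += 1
    return by_extension, by_year


by_extension, by_year = summarise(RECORDS)
print(by_extension.most_common())
print(sorted(by_year.items()))
```

File extensions are only a crude proxy for the software in use; a fuller treatment would rely on format identification rather than file names.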

Both the rich variety of automatically generated metadata, such as that found within hard drives or other computer systems, and the ability to supply generous, flexible metadata supplements in the form of XML encoding provide an impressive array of opportunities for exploring, enhancing, and accessing research data. These go far beyond the traditional library or archive-level approaches to metadata. Uncovering how these metadata might be combined to enhance the research potential of collections is a rich and interdisciplinary area of research.

About the author

Zoe Bartliff is a researcher based in the Information Studies department in Glasgow. Her research focuses on computational digital humanities methods and how these can transform access and engagement for users. She has contributed to a wide range of teaching activities in the fields of digital humanities and archival studies, and to research in these areas, for instance through her work on the AHRC-funded “The legacies of Stephen Dwoskin Project” and the AHRC & IRC-funded “IIIF 4 research” project.

  1. https://doi.org/10.1108/JD-01-2019-0003 

  2. https://doi.org/10.1007/s00799-015-0138-2 

  3. https://doi.org/10.1002/pra2.429 

  4. http://www.cyfraith-hywel.cymru.ac.uk/en/llawysgrifau-disgrifiadau.php 

  5. http://www.rhyddiaithganoloesol.caerdydd.ac.uk/en/ 

  6. https://theses.gla.ac.uk/82440/ 

  7. https://aevum.space/darkarchives/20/21/unedition/Cyfraith_Hywel