For my UT Austin capstone project, I worked with the Dolph Briscoe Center for American History to preserve 113 floppy disks (3.5" and 5.25"), dating from 1985 to 2003, in the Walter Cronkite Papers. Existing workflows covered how to image the disks, generate checksums and minimal metadata for each disk, and upload the resulting files into the UT DSpace digital repository for preservation and access. While this process provided access to the disk images, researchers would still need to examine the contents of each disk to identify relevant files, many of which were in obsolete file formats that current applications could not open.
To address these issues, I created a toolkit of Bash and Perl scripts to generate more detailed metadata and to provide direct access to the file contents. The Bash scripts use command-line utilities to extract metadata embedded in the files, such as file name, date created, date last accessed, size, file extension, and file format, as well as the directory structure of each disk. The Perl scripts consolidate this metadata by series, producing tab-delimited files that can be opened as spreadsheets for browsing, searching, sorting, or filtering across multiple metadata elements. The Bash scripts also convert each file to an access copy: a text file for files on the 5.25" floppies, many of which were created in long-obsolete applications such as DisplayWriter, or a PDF when the original formatting could be preserved. These access copies were added to the digital repository so that researchers can immediately read or search the full-text contents, greatly enhancing access to the born-digital portion of the collection.
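The metadata steps described above can be sketched roughly as follows. This is an illustration in Python, not the Bash and Perl toolkit itself. The function names (`collect_metadata`, `write_series_tsv`), the column names, and the assumption that each disk's files have already been extracted into their own directory are all hypothetical. The sketch records only modified and accessed times, because a portable creation date is not available from a plain `stat` call, and it does not attempt file format identification.

```python
import csv
import os
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical column layout for the consolidated, tab-delimited output.
FIELDS = ["disk", "path", "name", "extension", "size_bytes", "modified", "accessed"]


def _iso(timestamp):
    """Format a POSIX timestamp as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def collect_metadata(disk_dir):
    """Walk one extracted disk directory and return one metadata row per file."""
    disk_dir = Path(disk_dir)
    rows = []
    for root, dirs, files in os.walk(disk_dir):
        dirs.sort()  # visit subdirectories in a stable order
        for name in sorted(files):
            file_path = Path(root) / name
            info = file_path.stat()
            rows.append({
                "disk": disk_dir.name,
                # The relative path preserves the disk's directory structure.
                "path": file_path.relative_to(disk_dir).as_posix(),
                "name": name,
                "extension": file_path.suffix.lower().lstrip("."),
                "size_bytes": info.st_size,
                "modified": _iso(info.st_mtime),
                "accessed": _iso(info.st_atime),
            })
    return rows


def write_series_tsv(disk_dirs, out_path):
    """Consolidate rows from several disks into one tab-delimited file.

    Returns the number of file rows written (excluding the header).
    """
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t")
        writer.writeheader()
        for disk_dir in disk_dirs:
            for row in collect_metadata(disk_dir):
                writer.writerow(row)
                count += 1
    return count
```

Calling `write_series_tsv(["disk001", "disk002"], "series.tsv")` on two hypothetical disk directories would produce a single file that opens in a spreadsheet application, with one row per file and a header row for sorting and filtering.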