2018 Conference and Developers Day

August 20–23, 2018

Clarion Congress Hotel, Prague, Czech Republic

Search or browse the IGeLU 2018 programme

Feeding PDF files to Rosetta: favourite food?

Monday, August 20, 2018 at 3:15 PM–4:15 PM CEST
Leo
Conference or Developers Day

Conference

Abstract

Do you have PDF files in your collections? Do you think they solve all your archival problems? Think again. PDF files are mighty difficult and can be broken in myriad ways. Nevertheless, PDF is the go-to format for text-based content in digital preservation. A digital archive therefore has to be able to detect file errors and repair them to ensure long-term availability. The ZBW library has been using Rosetta since 2010, and 10% of our ingested files are PDF files. We are thus currently dealing with 134,000 PDF files in our archive, in 16 different flavours (PDF 1.1–1.7, PDF/A, PDF/X, etc.). Rosetta checks file validity using the validation tool JHOVE, and JHOVE reports 22% of our PDF files as invalid. We urgently need post-processing for these files. We want to share our lessons learned and show what works and what does not (yet) work in Rosetta, both when ingesting PDF files and when migrating them within Rosetta’s preservation planning module.

Main Topic

Rosetta

Presenters

Yvonne Tunnat, ZBW
Presenter's job title

Preservation Manager

Co-Presenters

Moderator: Dave Allen, State Library of Queensland, Australia
Co-presenter's job title

Lead, Enterprise Architecture
