CMU Archives Repository Roadmap v.1

										2017-08-09

Main sub-projects:

1. Metadata clean-up

 -pull metadata out of ArchivalWare.  Fix, normalize, augment & convert.

2. Scanning Workflow

 -automate OCR & ingest

3. Archive Repository Software

 -build repository, workflows, UI & preservation

Can be done in parallel:

Metadata Clean-up  -------------------------------- (ingest into repository)
Scanning Workflow  ---------------- (OCR) (ingest into repository)
Archive Repository -------------------------------------------------------------…------->

Tasks in the sub-projects:

Metadata Clean-up

-dump from ArchivalWare

  -check for completeness
  -(restore data lost on ArchivalWare import?)
-evaluate
-plan
  -[vocabularies, taxonomies, schemas to use for each type of collection]
  -[vocabularies, taxonomies, schemas to use in common - DublinCore/MODS +?]
  -how do we get from metadata we have to metadata we want?
-automate conversions we can
  -OpenRefine / conciliator + CMU modifications?
-automate verification/analysis (e.g. show all values of each field & # occurrences)
-global find-replaces
-convert to standard md models (DC/MODS)
-(eventually map to linked data?)
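
The verification/analysis item (show all values of each field and the number of occurrences) could be sketched roughly as below. This is only an illustration: it assumes the ArchivalWare dump has already been loaded as a list of dictionaries, one per record, and the field names and sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def field_value_counts(records):
    """Return {field: Counter({value: occurrences})} for a list of dict records."""
    counts = defaultdict(Counter)
    for record in records:
        for field, value in record.items():
            # Only surrounding whitespace is trimmed; empty values are
            # counted under "" so that gaps in the metadata stay visible.
            counts[field][(value or "").strip()] += 1
    return dict(counts)


# Hypothetical sample rows standing in for an ArchivalWare export.
sample = [
    {"creator": "Smith, J.", "type": "photo"},
    {"creator": "Smith, J. ", "type": "Photo"},
    {"creator": "", "type": "photo"},
]

report = field_value_counts(sample)
for field, counter in report.items():
    # most_common() lists values from most to least frequent.
    print(field, counter.most_common())
```

A listing like this makes near-duplicates (e.g. "photo" vs. "Photo") easy to spot before deciding on global find-replaces.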

Scanning Workflow

-bagit
  -(bagit with collection affiliation? use scanned path?)
-Abbyy OCR automation
  -command line version
  -config file version
  -server (pull tiffs from repository, push OCR/text in)
-PDF web optimization(?)
-conversion scripts to do:
  -derivative generation (e.g. text, thumbnails, JP2, epub etc.)
  -metadata lookup (use id to get metadata)
  -FITS metadata generation (file info tool set technical md)
  -ingest into repositories (CMU’s ArchiveRepository, figshare, box, etc.)
  -backup (rsync)
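
As a rough illustration of the bagit step, the sketch below computes the lines of a BagIt-style SHA-256 payload manifest (checksum followed by the path under data/) for a directory of scans. It is not a full BagIt implementation and is not meant to replace an existing bagging tool; the function names and the directory layout are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large TIFFs are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_lines(payload_dir):
    """Return one 'checksum  data/relative/path' line per file, sorted by path."""
    payload_dir = Path(payload_dir)
    lines = []
    for path in sorted(p for p in payload_dir.rglob("*") if p.is_file()):
        # Paths are written relative to the payload directory, POSIX style.
        relative = path.relative_to(payload_dir).as_posix()
        lines.append(f"{sha256_of(path)}  data/{relative}")
    return lines
```

The returned lines could be written to a manifest-sha256.txt file and re-checked after an rsync backup to confirm that nothing changed in transit.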

Archive Repository Software

-evaluate current state (hydra/CLAW)
-choose platform (probably CLAW. both supposedly interoperable - PCDM)
-meet w/ Pitt about their CLAW experience [8/21/2017, ongoing]
-discuss w/ peer institutions (Penn St, U Maryland, U Oregon)
-discuss w/ CLAW team
-CLAW tech calls
-map each type of our collections to existing/needed templates (photos/books/newspapers/finding aids/etc.)
-put a collection which is supported into system for evaluation
-determine needed pieces which are not being worked on (based on our use cases)
-build missing pieces (easier said than done)
-user interface design
-admin user interface design
-preservation layer (determine what this means & provide it)
-APIs (OAI-PMH (for Primo etc.), LOCKSS (for MetaArchive? etc.), SPARQL)
-staging system w/ administrative functions
-production system
-usability studies
-accessibility testing
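
For the OAI-PMH item, a harvester such as Primo would request records with a URL along the lines of the sketch below. The verb and the metadataPrefix, set and from arguments come from the OAI-PMH protocol; the base URL, the set name and the function name are hypothetical placeholders, since no endpoint exists yet.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (no network access is performed)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec    # restrict the harvest to one collection
    if from_date:
        params["from"] = from_date  # only records changed on or after this date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and set name, for illustration only.
url = list_records_url(
    "https://repository.example.edu/oai",
    set_spec="photos",
    from_date="2017-08-09",
)
print(url)
```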

References

Islandora CLAW https://islandora.ca/CLAW
Intro to Islandora CLAW https://islandora-claw.github.io/CLAW/user-documentation/intro-to-claw/
Islandora CLAW MVP https://islandora-claw.github.io/CLAW/mvp/mvp_doc/

LoC Standards (EAD, MODS, METS, MARC, MARCXML, VRA, etc.) http://www.loc.gov/standards/
VIAF: The Virtual International Authority File https://viaf.org/
W3C Linked Data https://www.w3.org/standards/semanticweb/data