CMU Archives Repository Roadmap v.1

2017-08-09


Main sub-projects:

1. Metadata Clean-up
   -pull metadata out of ArchivalWare; fix, normalize, augment & convert it

2. Scanning Workflow
   -automate OCR & ingest

3. Archive Repository Software
   -build the repository, workflows, UI & preservation layer

The three sub-projects can be done in parallel:

Metadata Cleanup--------------------------------\
                                                 \ (ingest into repository)
Scanning Workflow---\-------------\
                     \(OCR)        \(ingest into repository)
Archive Repository-------------------------------------------------------------…------->

Tasks in the sub-projects:


Metadata Clean-up

-dump metadata from ArchivalWare
-check for completeness
-(restore data lost on ArchivalWare import?)
-evaluate the existing metadata
-plan:
   [vocabularies, taxonomies, schemas to use for each type of collection]
   [vocabularies, taxonomies, schemas to use in common: Dublin Core/MODS + ?]
   how do we get from the metadata we have to the metadata we want?
-automate the conversions we can (OpenRefine / conciliator + CMU modifications?)
-automate verification/analysis (e.g., show all values of each field & the number of occurrences of each)
-global find-and-replace
-convert to standard metadata models (DC/MODS)
-(eventually map to linked data?)

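The verification/analysis task above can be sketched as a short script. This is a minimal sketch, assuming the ArchivalWare dump has been exported to CSV with one column per field and `;` separating repeated values within a cell; the export format, column names, file name, and separator are all assumptions, not ArchivalWare's documented output. It counts every value of every field, reporting empty cells as `<empty>` so the completeness check can use the same report.

```python
import csv
import io
from collections import Counter, defaultdict

# Assumption: repeatable fields (e.g. several subjects) share one cell, split on ";".
MULTI_VALUE_SEP = ";"


def field_value_counts(rows, sep=MULTI_VALUE_SEP):
    """Return {field: Counter(value -> occurrences)} across all records.

    Values are whitespace-trimmed; empty cells are counted as "<empty>"
    so gaps in completeness show up in the report.
    """
    counts = defaultdict(Counter)
    for row in rows:
        for field, cell in row.items():
            if field is None:  # csv.DictReader puts surplus columns under None
                continue
            values = [v.strip() for v in (cell or "").split(sep)]
            counts[field].update([v for v in values if v] or ["<empty>"])
    return counts


def print_report(counts):
    """Print each field, then its values from most to least common."""
    for field in sorted(counts):
        print(f"== {field}: {len(counts[field])} distinct values")
        for value, n in counts[field].most_common():
            print(f"{n:6d}  {value}")


if __name__ == "__main__":
    # Inline stand-in for a real export, e.g. a hypothetical "archivalware_dump.csv".
    sample = io.StringIO(
        "id,creator,subject\n"
        "1,Carnegie Mellon University,Campus; Students\n"
        "2,Carnegie Mellon Univ.,Campus\n"
        "3,,Students\n"
    )
    print_report(field_value_counts(csv.DictReader(sample)))
```

On the inline sample, the `creator` report lists "Carnegie Mellon University" and "Carnegie Mellon Univ." as separate values; near-duplicates like these are the candidates for the global find-and-replace step.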

Scanning Workflow

-BagIt
   (bag with collection affiliation? use the scanned path?)
-ABBYY OCR automation
   -command line version
   -config file version
   -server (pull TIFFs from repository, push OCR/text in)
-PDF web optimization (?)
-conversion scripts to do:
   -derivative generation (e.g., text, thumbnails, JP2, EPUB, etc.)
   -metadata lookup (use ID to get metadata)
   -FITS metadata generation (File Information Tool Set technical metadata)
   -ingest into repositories (CMU’s ArchiveRepository, figshare, Box, etc.)
   -backup (rsync)

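The BagIt task can be sketched with the Python standard library alone. This follows the basic bag layout in the BagIt specification (RFC 8493): payload files under `data/`, a `bagit.txt` declaration, and a `manifest-sha256.txt` with one checksum line per payload file. The helper name `make_minimal_bag` and the bag-info values are hypothetical; in practice the Library of Congress `bagit-python` package already implements bag creation and validation.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large TIFFs are never fully loaded into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def make_minimal_bag(bag_dir, info=None):
    """Hypothetical helper: turn bag_dir into a BagIt bag in place.

    Moves the existing contents into data/, then writes bagit.txt,
    manifest-sha256.txt and, if info is given, bag-info.txt.
    Assumes bag_dir does not already contain an entry named "data".
    """
    bag = Path(bag_dir)
    payload = bag / "data"
    entries = list(bag.iterdir())  # snapshot before creating data/
    payload.mkdir()
    for entry in entries:
        shutil.move(str(entry), str(payload / entry.name))

    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )

    lines = []
    for path in sorted(p for p in payload.rglob("*") if p.is_file()):
        # Manifest paths are relative to the bag root and use forward slashes.
        lines.append(f"{sha256_of(path)}  {path.relative_to(bag).as_posix()}\n")
    (bag / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")

    if info:
        (bag / "bag-info.txt").write_text(
            "".join(f"{key}: {value}\n" for key, value in info.items()), encoding="utf-8"
        )
    return bag


if __name__ == "__main__":
    scan_dir = Path(tempfile.mkdtemp())
    (scan_dir / "page001.tif").write_bytes(b"placeholder, not a real TIFF")
    # Hypothetical values; Bag-Group-Identifier could carry collection affiliation.
    make_minimal_bag(scan_dir, {"Source-Organization": "CMU Archives",
                                "Bag-Group-Identifier": "example-collection"})
    print((scan_dir / "manifest-sha256.txt").read_text(encoding="utf-8"))
```

Recording collection affiliation in `bag-info.txt` (for instance via the reserved `Bag-Group-Identifier` element) is one possible answer to the open bagging question, not a decision recorded in this roadmap.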
Archive Repository Software

-evaluate the current state of Hydra and Islandora CLAW
-choose a platform (probably CLAW; both are reportedly interoperable via PCDM, the Portland Common Data Model)
   -meet w/ Pitt about their CLAW experience [8/21/2017, ongoing]
   -discuss w/ peer institutions (Penn State, U Maryland, U Oregon)
   -discuss w/ the CLAW team
   -CLAW tech calls
-map each type of collection we hold (photos/books/newspapers/finding aids/etc.) to existing or needed templates
-load a collection of an already-supported type into the system for evaluation
-determine which needed pieces (based on our use cases) are not being worked on
-build the missing pieces (easier said than done)
-user interface design
-admin user interface design
-preservation layer (determine what this means & provide it)
-APIs: OAI-PMH (for Primo, etc.), LOCKSS (for MetaArchive?, etc.), SPARQL
-staging system w/ administrative functions
-production system
-usability studies
-accessibility testing

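For the OAI-PMH API, the following sketch shows what an OAI-PMH harvester (e.g., one feeding Primo) would request and read, using only the standard library: a `ListRecords` request for `oai_dc` records, plus extraction of identifiers, titles, and the `resumptionToken` used for paging. The verb, argument names, and namespaces come from the OAI-PMH 2.0 specification; the base URL `repo.example.edu` and the sample record are hypothetical, and network fetching is left out.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and its required oai_dc format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL; resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response.

    The token is None when the list is complete (element absent or empty).
    """
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        header = rec.find("oai:header", NS)
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "deleted": header.get("status") == "deleted",
            "titles": [t.text for t in rec.iterfind("oai:metadata/oai_dc:dc/dc:title", NS)],
        })
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return records, token or None


# Hypothetical response from a hypothetical endpoint, trimmed to the parts parsed above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2017-08-09T00:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">https://repo.example.edu/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repo.example.edu:item-1</identifier>
        <datestamp>2017-08-09</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Campus photograph</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://repo.example.edu/oai"))
    records, token = parse_list_records(SAMPLE)
    print(records, token)
    print(list_records_url("https://repo.example.edu/oai", token))
```

A full harvest would fetch each URL (e.g., with `urllib.request.urlopen`) and keep requesting with the returned token until it comes back as `None`.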
The CMU Archives Repository Workflow:

References

Islandora CLAW
https://islandora.ca/CLAW

Intro to Islandora CLAW
https://islandora-claw.github.io/CLAW/user-documentation/intro-to-claw/

Islandora CLAW MVP
https://islandora-claw.github.io/CLAW/mvp/mvp_doc/

LoC Standards (EAD, MODS, METS, MARC, MARCXML, VRA, etc.)
http://www.loc.gov/standards/

VIAF: The Virtual International Authority File
https://viaf.org/

W3C Linked Data
https://www.w3.org/standards/semanticweb/data