Chris Kellen · ecc84f79

Showing with 94 additions and 0 deletions

home.md home.md +94 -0

--- a/home.md
+++ b/home.md
+CMU Archives Repository Roadmap v.1
+2017-08-09
+Main sub-projects:
+1. Metadata Clean-up
+     -pull metadata out of ArchivalWare; fix, normalize, augment & convert it.
+2. Scanning Workflow
+     -automate OCR & ingest
+3. Archive Repository Software
+     -build the repository, workflows, UI & preservation layer
+
+Can be done in parallel:
+  Metadata Clean-up ----------------------------------------\
+                                                             \ (ingest into repository)
+                                                              \
+  Scanning Workflow ---\--------\                              \
+                        \ (OCR)  \ (ingest into repository)     \
+  Archive Repository ------------------------------------------------------…------->
+
+
+Tasks in the sub-projects:
+
+Metadata Clean-up
+-dump metadata from ArchivalWare
+-check the dump for completeness
+-(restore data lost on ArchivalWare import?)
+-evaluate
+-plan: [vocabularies, taxonomies, schemas to use for each type of collection]
+       [vocabularies, taxonomies, schemas to use in common: Dublin Core/MODS +?]
+       how do we get from the metadata we have to the metadata we want?
+-automate the conversions we can (OpenRefine / conciliator + CMU modifications?)
+-automate verification/analysis (e.g. show all values of each field & # of occurrences)
+-global find-replaces
+-convert to standard metadata models (DC/MODS)
+-(eventually map to linked data?)
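Below is a minimal sketch of the verification/analysis and global find-replace steps above. It uses only the Python standard library as a stand-in for OpenRefine. It assumes the ArchivalWare dump can be exported as CSV with one column per field. The `;` separator for multi-valued cells and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical separator for multi-valued cells in the exported dump.
MULTI_VALUE_SEP = ";"


def split_values(cell):
    """Split a (possibly multi-valued) cell into trimmed, non-empty values."""
    return [v.strip() for v in (cell or "").split(MULTI_VALUE_SEP) if v.strip()]


def profile_fields(rows):
    """Return {field: Counter(value -> # of occurrences)} across all rows."""
    counts = defaultdict(Counter)
    for row in rows:
        for field, cell in row.items():
            counts[field].update(split_values(cell))
    return counts


def global_replace(rows, field, replacements):
    """Replace exact values (old -> new) in one field of every row, in place."""
    for row in rows:
        values = split_values(row.get(field))
        row[field] = f"{MULTI_VALUE_SEP} ".join(replacements.get(v, v) for v in values)
    return rows


# Made-up sample standing in for an ArchivalWare CSV export.
sample = """id,creator,subject
1,"Doe, Jane",Campus buildings; Pittsburgh
2,"Doe, J.",Pittsburgh
3,"Doe, Jane",Commencement
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Show every value of every field with its number of occurrences.
for field, counter in profile_fields(rows).items():
    print(field)
    for value, n in counter.most_common():
        print(f"  {n:>3}  {value}")

# After reviewing the profile, normalize a variant name form.
global_replace(rows, "creator", {"Doe, J.": "Doe, Jane"})
profile = profile_fields(rows)
```

The replacements only match exact, whole values. This keeps a global find-replace from accidentally rewriting substrings inside unrelated values.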
+
+Scanning Workflow
+-BagIt
+(-BagIt with collection affiliation? -use scanned path?)
+-ABBYY OCR automation
+   -command-line version
+   -config-file version
+   -server (pull TIFFs from repository, push OCR/text in)
+-PDF web optimization(?)
+-conversion scripts for:
+    -derivative generation (e.g. text, thumbnails, JP2, EPUB etc.)
+    -metadata lookup (use ID to get metadata)
+    -FITS metadata generation (File Information Tool Set technical metadata)
+-ingest into repositories (CMU’s Archive Repository, figshare, Box, etc.)
+-backup (rsync)
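Below is a minimal sketch of the BagIt step. It assumes scanned TIFFs are bagged one folder per item, and it hand-writes the required `bagit.txt` and a `manifest-sha256.txt` following the layout in RFC 8493. A production workflow would more likely use the Library of Congress's bagit-python package. Recording collection affiliation as a `Collection` label in `bag-info.txt` is a hypothetical answer to the open question above, not an established convention. `verify_bag` could also be rerun after an rsync backup to confirm the copy is intact.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large TIFFs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_minimal_bag(bag_dir, payload_files, bag_info=None):
    """Copy payload files into bag_dir/data and write BagIt tag files."""
    bag = Path(bag_dir)
    data = bag / "data"
    data.mkdir(parents=True, exist_ok=True)
    for src in payload_files:
        shutil.copy2(src, data / Path(src).name)

    # Required bag declaration (RFC 8493).
    (bag / "bagit.txt").write_bytes(
        b"BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n")

    # Payload manifest: "<checksum>  <path relative to the bag root>" per file.
    manifest = "".join(
        f"{sha256_of(f)}  {f.relative_to(bag).as_posix()}\n"
        for f in sorted(data.rglob("*")) if f.is_file())
    (bag / "manifest-sha256.txt").write_bytes(manifest.encode("utf-8"))

    # Optional "Label: value" metadata; the labels passed in are hypothetical.
    if bag_info:
        info = "".join(f"{k}: {v}\n" for k, v in bag_info.items())
        (bag / "bag-info.txt").write_bytes(info.encode("utf-8"))
    return bag


def verify_bag(bag_dir):
    """True if every payload file is listed and every checksum still matches."""
    bag = Path(bag_dir)
    listed = {}
    for line in (bag / "manifest-sha256.txt").read_text(encoding="utf-8").splitlines():
        checksum, rel = line.split(maxsplit=1)
        listed[rel] = checksum
    on_disk = {f.relative_to(bag).as_posix()
               for f in (bag / "data").rglob("*") if f.is_file()}
    if set(listed) != on_disk:
        return False  # a payload file is missing or unlisted
    return all(sha256_of(bag / rel) == checksum for rel, checksum in listed.items())


# Usage with a throwaway file standing in for a scanned TIFF.
with tempfile.TemporaryDirectory() as tmp:
    scan = Path(tmp, "page001.tif")
    scan.write_bytes(b"placeholder, not a real TIFF")
    bag = make_minimal_bag(Path(tmp, "bag"), [scan],
                           {"Collection": "hypothetical-collection-id"})
    print((bag / "manifest-sha256.txt").read_text(encoding="utf-8"))
    print(verify_bag(bag))  # True
```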
+
+Archive Repository Software
+-evaluate current state (Hydra/CLAW)
+-choose platform (probably CLAW; both are supposedly interoperable via PCDM)
+-meet w/ Pitt about their CLAW experience [8/21/2017, ongoing]
+-discuss w/ peer institutions (Penn State, U Maryland, U Oregon)
+-discuss w/ CLAW team
+-attend CLAW tech calls
+-map each type of our collections to existing/needed templates
+       (photos/books/newspapers/finding aids/etc.)
+-put a collection that is already supported into the system for evaluation
+-determine needed pieces that are not being worked on
+        (based on our use cases)
+-build missing pieces (easier said than done)
+-user interface design
+-admin user interface design
+-preservation layer (determine what this means & provide it)
+-APIs (OAI-PMH (for Primo etc.), LOCKSS (for MetaArchive? etc.), SPARQL)
+-staging system w/ administrative functions
+-production system
+-usability studies
+-accessibility testing
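OAI-PMH requires every repository to offer unqualified Dublin Core (metadataPrefix `oai_dc`). The DC conversion from the metadata clean-up can therefore feed the OAI-PMH API directly. Below is a minimal sketch that serializes one cleaned record as an `oai_dc:dc` element. It assumes the record is held as a dict keyed by DC element name, which is a hypothetical internal format.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Use the conventional prefixes when serializing instead of ns0/ns1.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("xsi", XSI_NS)

# The 15 elements of the Dublin Core Metadata Element Set (DCMES 1.1).
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_oai_dc(record):
    """Serialize {dc_element: str or list of str} as an oai_dc:dc XML string.

    Keys that are not simple DC elements (e.g. MODS-only fields) are rejected
    rather than silently dropped.
    """
    unknown = sorted(set(record) - set(DC_ELEMENTS))
    if unknown:
        raise ValueError(f"not simple Dublin Core elements: {unknown}")

    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    root.set(f"{{{XSI_NS}}}schemaLocation",
             f"{OAI_DC_NS} http://www.openarchives.org/OAI/2.0/oai_dc.xsd")
    for name in DC_ELEMENTS:  # emit elements in DCMES order
        values = record.get(name, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical cleaned record.
example = {
    "title": "Example photograph",
    "creator": ["Doe, Jane"],
    "date": "1920",
    "identifier": "hypothetical-id-0001",
}
print(to_oai_dc(example))
```

Richer MODS records would still need their own metadataPrefix alongside `oai_dc`. Rejecting unknown keys makes any lossy DC crosswalk an explicit decision rather than a silent one.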
+
+
+
+[Figure: The CMU Archives Repository Workflow]
+References
+
+Islandora CLAW
+  https://islandora.ca/CLAW
+Intro to Islandora CLAW
+  https://islandora-claw.github.io/CLAW/user-documentation/intro-to-claw/
+Islandora CLAW MVP
+  https://islandora-claw.github.io/CLAW/mvp/mvp_doc/
+
+LoC Standards (EAD, MODS, METS, MARC, MARCXML, VRA, etc.)
+  http://www.loc.gov/standards/
+VIAF: The Virtual International Authority File
+  https://viaf.org/
+W3C Linked Data
+  https://www.w3.org/standards/semanticweb/data
+
+