Idaho Harvester | Special Collections & Archives, University of Idaho Library

Metadata Cleanup

Sometimes we work with metadata that is especially messy! Below are some dates from the spreadsheet that generated our HistPhoto database.

There were originally 4,346 unique values in the date column, but many of them meant the same thing written in different ways (e.g., “1934?” vs. “circa 1934” vs. “possibly 1934”, or “July 4, 1947” vs. “7/4/1947” vs. “07-04-1947”). By standardizing how these dates are written, we reduced the column to 3,763 unique values without removing any data! (Shout-out to our best friend, OpenRefine.)
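In OpenRefine this kind of cleanup is done with facets, clustering, and GREL transformations; the post does not say which expressions were used. To illustrate the same idea outside OpenRefine, here is a minimal Python sketch. The target forms ("circa YYYY" for uncertain years, ISO 8601 "YYYY-MM-DD" for full dates), the list of uncertainty words, and the month-first reading of numeric dates are all assumptions for illustration, not the HistPhoto conventions.

```python
import re
from datetime import datetime

# Assumed uncertainty markers: a prefix word ("circa", "ca.", "c.", "possibly",
# "probably") or a trailing "?". A bare year like "1934" stays certain.
UNCERTAIN_YEAR = re.compile(
    r"^(?:(?P<prefix>circa|ca\.|c\.|possibly|probably)\s*)?(?P<year>\d{4})(?P<q>\?)?$",
    re.IGNORECASE,
)

# Assumed full-date spellings; numeric forms are read month-first (US style).
FULL_DATE_FORMATS = ("%B %d, %Y", "%m/%d/%Y", "%m-%d-%Y")


def normalize_date(raw: str) -> str:
    """Rewrite one date string into a single standard spelling (hypothetical rules)."""
    value = " ".join(raw.split())  # trim and collapse stray whitespace

    match = UNCERTAIN_YEAR.match(value)
    if match:
        year = match.group("year")
        uncertain = match.group("prefix") or match.group("q")
        return f"circa {year}" if uncertain else year

    for fmt in FULL_DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue  # try the next spelling

    # Anything unrecognized passes through unchanged, so no data is dropped.
    return value


if __name__ == "__main__":
    raw_dates = ["1934?", "circa 1934", "possibly 1934",
                 "July 4, 1947", "7/4/1947", "07-04-1947", "1950"]
    before = len(set(raw_dates))
    after = len({normalize_date(d) for d in raw_dates})
    print(before, "->", after)  # 7 -> 3
```

Because unmatched values are returned as-is, the unique count can only shrink by merging spellings of the same date, which is the effect described above.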

[Screenshots of HistPhoto metadata in OpenRefine]

Sources

OpenRefine is free and easy to use! Interested in learning more? One of our librarians teaches a workshop on OpenRefine!
