Idaho Harvester home, Baby Joe Vandal logo
Special Collections and Archives, University of Idaho Library home

Metadata Cleanup

Sometimes we work with metadata that is especially messy! Below are some dates from a spreadsheet that generated our HistPhoto database.

There were originally 4346 unique values in the date column, but many of these values meant the same thing, just written in different ways (e.g. “1934?” vs. “circa 1934” vs. “possibly 1934”, or “July 4, 1947” vs. “7/4/1947” vs. “07-04-1947”). By standardizing the way the metadata is written, we were able to reduce it to only 3763 unique values without removing any data! (Shout-out to our best friend, OpenRefine.)
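The cleanup described here was done in OpenRefine, but the idea can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the function name `normalize_date`, the target forms ("circa YYYY" for uncertain years, ISO `YYYY-MM-DD` for full dates), and the short list of accepted input formats. It is not the actual set of rules applied to HistPhoto.

```python
import re
from datetime import datetime

# Hypothetical rule: an optional "circa"/"possibly" prefix or a trailing "?"
# marks a year as uncertain.
YEAR_ONLY = re.compile(r"^(circa|possibly)?\s*(\d{4})\s*(\?)?$", re.IGNORECASE)

# Hypothetical list of full-date spellings to recognize.
# Note: %B (month name) depends on the locale; English names are assumed.
FULL_DATE_FORMATS = ("%B %d, %Y", "%m/%d/%Y", "%m-%d-%Y")


def normalize_date(value: str) -> str:
    """Rewrite a date string into one standard spelling, if recognized."""
    text = value.strip()

    match = YEAR_ONLY.match(text)
    if match:
        prefix, year, question_mark = match.groups()
        # Any marker of uncertainty collapses to the single form "circa YYYY".
        return f"circa {year}" if (prefix or question_mark) else year

    for fmt in FULL_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue

    # Unrecognized values are returned unchanged so that no data is removed.
    return text


messy = ["1934?", "circa 1934", "possibly 1934",
         "July 4, 1947", "7/4/1947", "07-04-1947"]
cleaned = {normalize_date(v) for v in messy}
print(len(set(messy)), "unique values before,", len(cleaned), "after")
```

The six spellings above collapse to two values. Anything the sketch does not recognize passes through untouched, which mirrors the goal of reducing unique values without removing data.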

HistPhoto metadata in OpenRefine [1]
HistPhoto metadata in OpenRefine [2]
HistPhoto metadata in OpenRefine [3]

Sources

OpenRefine is free and easy to use! Interested in learning more? One of our librarians has a workshop on OpenRefine!
