Sometimes we work with metadata that is especially messy! Below are some dates from a spreadsheet that generated our HistPhoto database.
There were originally 4346 unique values in the date column, but many of these values meant the same thing, just written in different ways (e.g. “1934?” vs. “circa 1934” vs. “possibly 1934”, or “July 4, 1947” vs. “7/4/1947” vs. “07-04-1947”). By standardizing the way the metadata is written, we were able to clean it up to only 3763 unique values, without removing any data! (Shout-out to our best friend, OpenRefine.)
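Curious what that kind of standardizing looks like under the hood? Here is a minimal Python sketch of the idea. To be clear, this is not how we cleaned HistPhoto (that was done in OpenRefine). The function name `normalize_date` and the target conventions ("circa YYYY" for uncertain years, YYYY-MM-DD for full dates) are assumptions made up for this illustration.

```python
import re
from datetime import datetime

# Assumed conventions for this sketch only (not the actual HistPhoto rules):
#   uncertain years -> "circa YYYY"
#   full dates      -> ISO style "YYYY-MM-DD"
YEAR_PATTERN = re.compile(
    r"^(circa|ca\.?|c\.|possibly)?\s*(\d{4})\s*(\?)?$", re.IGNORECASE
)

# Full-date spellings this sketch knows about; month/day order is assumed
# to be U.S. style (month first).
DATE_FORMATS = ("%B %d, %Y", "%m/%d/%Y", "%m-%d-%Y")


def normalize_date(raw):
    """Return one standard spelling for a date string, or the input unchanged."""
    value = raw.strip()

    match = YEAR_PATTERN.match(value)
    if match:
        qualifier, year, question_mark = match.groups()
        # Any marker of uncertainty collapses to a single "circa" form.
        return f"circa {year}" if (qualifier or question_mark) else year

    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue

    # Unrecognized values are left as-is so no data is lost.
    return value


messy = ["1934?", "circa 1934", "possibly 1934",
         "July 4, 1947", "7/4/1947", "07-04-1947"]
cleaned = {normalize_date(value) for value in messy}
print(f"{len(set(messy))} unique values before, {len(cleaned)} after")
```

Anything the sketch does not recognize passes through untouched, which mirrors the "without removing any data" goal: the number of unique spellings shrinks, but every row keeps its date.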
Sources
OpenRefine is free and easy to use! Interested in learning more? One of our librarians has a workshop on OpenRefine!