As we create content and digitize images, oral histories, transcripts, etc., we need to adopt well-defined file naming conventions so that these electronic files can be identified quickly and unambiguously. To that end, here is a proposed standard:
Rules:
- Only use lower-case - since Linux file systems are case-sensitive, names that differ only in case would be treated as different files. Sticking to lower-case avoids that problem.
- No spaces - spaces in file names have to be encoded in URLs (a space becomes %20) and quoted in scripts, so it's best not to use them in our file names. Periods, underscores, and hyphens are all good substitutes, but I propose we use a hyphen as our standard replacement for a space, since it can still be seen when underlined, as is the case with links. Since we've already adopted periods in the accession number, it makes sense to stick with that, but only for the accession number part of the name.
- No punctuation - filenames shouldn't have punctuation other than a period or a hyphen. Characters like the following will cause problems with scripts and the operating system and must not be used: # / \ , $ % ^ & * ( ) @ ! ~ { } [ ] ` >.
- Start with accession number - since this is our basic identifier it makes sense to start with it.
- General to specific - Refinements that further specify the file should come at the end of the filename, so that related objects still share the same starting accession number. With the most specific information at the end, related files sort together and can easily be searched, listed, and matched with regular expressions.
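The rules above could be checked automatically. Here is a minimal sketch in Python. It assumes an accession number is always three groups of digits separated by periods (as in 2007.500.144), which is inferred from the examples in this proposal rather than stated as a rule. The name `is_valid_name` is only illustrative.

```python
import re

# Assumption: an accession number is three dot-separated groups of digits,
# e.g. 2007.500.144. Adjust this if our accession numbers vary in shape.
ACCESSION = r"\d+\.\d+\.\d+"

# Accession number first, then zero or more hyphen-separated parts made of
# lower-case letters and digits, then a lower-case extension.
VALID_NAME = re.compile(rf"{ACCESSION}(?:-[a-z0-9]+)*\.[a-z0-9]+")


def is_valid_name(filename: str) -> bool:
    """Return True if the file name follows the rules listed above."""
    return VALID_NAME.fullmatch(filename) is not None


if __name__ == "__main__":
    for name in ["2007.500.144-eartha.mp3", "2007.500.144 Eartha.mp3"]:
        print(name, is_valid_name(name))
```

Because the pattern only allows lower-case letters, digits, periods, and hyphens, it rejects upper-case letters, spaces, and the problem punctuation in one pass.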
With these rules in mind, here are a few 'valid' file names:
The following names all share the same base name and differ only in the file type (the extension), which is fine:
2007.500.144-eartha.mp3
2007.500.144-eartha.wav
2007.500.144-eartha.odt
Notice that the name 'eartha' is there to help us see what it is. Without that, we'd have to remember what the number refers to, which would be difficult when you're looking at 1000 files.
Here are three more names which have refinements at the end to help resolve different versions:
2007.500.144-eartha-a.wav
2007.500.144-eartha-b.wav
2007.500.144-eartha-c.wav
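Because every related file starts with the same accession number, pulling them out of a long listing takes only a few lines. The sketch below is one way to do it in Python. The function name `related_files` and the sample list are made up for illustration.

```python
import re


def related_files(filenames, accession):
    """Return, in sorted order, the names that start with the accession number."""
    # re.escape keeps the periods in the accession number literal. The
    # lookahead requires a hyphen or period next, so 2007.500.144 does not
    # also pick up files belonging to 2007.500.1445.
    pattern = re.compile(re.escape(accession) + r"(?=[-.])")
    return sorted(name for name in filenames if pattern.match(name))


if __name__ == "__main__":
    listing = [
        "2007.500.144-eartha-b.wav",
        "2007.500.1445-other.wav",
        "2007.500.144-eartha.mp3",
        "2007.500.144-eartha-a.wav",
        "2008.12.3-letter.jpg",
    ]
    for name in related_files(listing, "2007.500.144"):
        print(name)
```

An ordinary alphabetical sort already groups these files together, which is the point of putting the accession number first.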
The refinement specifier should contain information that helps identify the nature of the differences. For example, the following might be used to distinguish between two different image resolutions:
2007.500.144-eartha-1600x1200.jpg
2007.500.144-eartha-640x480.jpg
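A script can split a name like these back into its parts. The following Python sketch rests on two assumptions not fixed by this proposal: the descriptive label is a single hyphen-free word right after the accession number, and everything between the label and the extension is the refinement. The names `ParsedName`, `parse_name`, and `resolution` are hypothetical.

```python
import re
from typing import NamedTuple, Optional, Tuple


class ParsedName(NamedTuple):
    accession: str
    label: str
    refinement: Optional[str]
    extension: str


# Assumptions: three-group numeric accession number, one hyphen-free label,
# and an optional refinement made of hyphen-separated parts.
NAME = re.compile(
    r"(?P<accession>\d+\.\d+\.\d+)"
    r"-(?P<label>[a-z0-9]+)"
    r"(?:-(?P<refinement>[a-z0-9]+(?:-[a-z0-9]+)*))?"
    r"\.(?P<extension>[a-z0-9]+)"
)


def parse_name(filename: str) -> Optional[ParsedName]:
    """Split a file name into its parts, or return None if it does not fit."""
    match = NAME.fullmatch(filename)
    if match is None:
        return None
    return ParsedName(**match.groupdict())


def resolution(parsed: ParsedName) -> Optional[Tuple[int, int]]:
    """Return (width, height) when the refinement looks like 1600x1200."""
    if parsed.refinement is None:
        return None
    match = re.fullmatch(r"(\d+)x(\d+)", parsed.refinement)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    parsed = parse_name("2007.500.144-eartha-1600x1200.jpg")
    print(parsed, resolution(parsed))
```

If we later allow hyphens inside the label, the split between label and refinement becomes ambiguous, so that is worth settling when we define the refinement conventions.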
We will develop standard conventions for these kinds of differences as we evolve this naming spec.