Let's Create a Data Format
I have a problem. I want to be able to transfer self-contained binary data with metadata through a variety of protocols with no knowledge of the binary data's format or the protocol being used for transfer.
Or in other words, I want to be able to send files anywhere without losing the filename.
That's a bit simpler than my actual goal, but I think this is a problem every software developer has considered at some point. We've all asked the question, "Why isn't the filename attached to the file?" or slightly more advanced, "Why isn't the file format attached to the file?"
The answer isn't all that complicated.
- Practically every file transfer protocol can pass the filename along with the file
- File extensions are Good Enough for identifying the file format
- We have good tools for guessing the format if the filename is missing
- As human beings we can use context to guess the format and "fix" the extension
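The "good tools" for guessing a format, such as the Unix `file` command, mostly work by looking for well-known "magic numbers" in the first few bytes of the data. Here is a minimal Python sketch of that idea. The `guess_format` function and its small signature table are hypothetical illustrations that cover only a handful of common formats, not a real library.

```python
from typing import Optional

# Hypothetical signature table: leading bytes -> MIME type.
# Real tools such as `file` (libmagic) use a far larger database.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
]


def guess_format(data: bytes) -> Optional[str]:
    """Return a MIME type guessed from the leading bytes, or None if unknown."""
    for signature, mime_type in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return mime_type
    return None


print(guess_format(b"%PDF-1.7\n"))  # application/pdf
print(guess_format(b"hello"))       # None
```

Note the limits of this approach: a `.docx`, a `.jar`, and an `.epub` are all ZIP containers, so the ZIP signature alone can't tell them apart. Guessing works most of the time, which is exactly what makes it only Good Enough.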
But I'm going to declare that Good Enough isn't good enough. The filename may be the most useful metadata a file has, but it's not the only useful metadata. Keeping it also depends on the transfer protocol to preserve it. What if I don't want to rely on a specific protocol?
And so, knowing full well that this is likely to go nowhere and that solutions to this problem almost certainly already exist, I'm going to set out to create a new data format that encapsulates data and metadata into a single file.
The hard problem
The first thing to do is give my new format a name. After some deliberation, I'm going to settle on Self-described Binary Document, or SDBD for short. It contains arbitrary binary data. It's a document and not a file because it could live anywhere. And the whole purpose is to let the document describe its own contents. Now that I've tackled one of the hard problems, the rest should be easy.
The next step is to talk about how we talk about the format. How is it structured and what concepts do we use to build it?