The Heart of the Code

09 January 2024
SDBD

To complete the proof of concept we need to implement SDBD.ICodec. This is what we're proving after all. The most complicated part will be the HPACK encoding. I'd rather not implement that myself, not for something basic. Fortunately there is a NuGet package that should do the trick. It's called simply hpack.

At the risk of pulling a "rest of the owl", I'm going to jump straight to the final code. I think a short walkthrough is enough to show how the code works without going through the coding process step by step.

Encoder

public byte[] Encode(Document document) {
  // content-length is the one required header; derive it from the payload.
  var dataLength = document.Data.Length;
  var contentLength = new Dictionary<string, string> () {
    { "content-length", dataLength.ToString() }
  };
  var headers = document.Metadata.Union(contentLength);

  var packedHeaders = packHeaders(headers);
  // Throws OverflowException if the packed headers exceed 65,535 bytes.
  var headerLength = Convert.ToUInt16(packedHeaders.Length);

  using var output = new MemoryStream();

  output.WriteByte(0x01);                            // version
  output.Write(BitConverter.GetBytes(headerLength)); // header length
  output.Write(packedHeaders);                       // HPACK-encoded headers
  output.Write(document.Data);                       // payload

  return output.ToArray();
}

private byte[] packHeaders(IEnumerable<KeyValuePair<string, string>> headers) {
  // A table size of 0 disables the dynamic table, which we don't need anyway.
  var encoder = new hpack.Encoder(0);
  using var output = new MemoryStream();
  using var writer = new BinaryWriter(output);

  foreach(var (name, value) in headers) {
    encoder.EncodeHeader(writer, name, value);
  }

  return output.ToArray();
}

The first thing we do is add our one required header to the metadata: content-length. We then pack the headers. The hpack encoder is pretty easy to use. Passing 0 into the constructor should disable HPACK's dynamic table, a feature built around state shared over a connection, which doesn't make sense in the context of SDBD.

That gives us all the bits we need to write out a document. Version number: 0x01. Header length as an unsigned 16-bit integer. The packed headers themselves. And finally the data.
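One detail worth spelling out is byte order. BitConverter.GetBytes uses the byte order of the machine it runs on, which is little-endian on most hardware today. Here is a minimal sketch of the same layout with the byte order pinned explicitly. It assumes SDBD wants little-endian, which matches the sample output but is my assumption rather than something stated above. FrameSketch and BuildFrame are hypothetical names, and the headers are taken as already packed.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

public static class FrameSketch {
  // Hypothetical helper, not part of the SDBD source: assembles one frame
  // from headers that have already been HPACK-encoded.
  public static byte[] BuildFrame(byte[] packedHeaders, byte[] data) {
    if (packedHeaders.Length > ushort.MaxValue) {
      throw new ArgumentException("Packed headers exceed 65535 bytes.", nameof(packedHeaders));
    }

    // Assumption: the length field is little-endian on every platform.
    Span<byte> lengthBytes = stackalloc byte[2];
    BinaryPrimitives.WriteUInt16LittleEndian(lengthBytes, (ushort)packedHeaders.Length);

    using var output = new MemoryStream();
    output.WriteByte(0x01);       // version
    output.Write(lengthBytes);    // header length, 2 bytes
    output.Write(packedHeaders);  // HPACK-encoded headers
    output.Write(data);           // payload
    return output.ToArray();
  }
}
```

On a little-endian machine this should produce the same bytes as the Encode method above. The difference would only show up on big-endian hardware.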

If I encode a file named test.txt with the content This is a test. This is only a test., this is the output:

0000	01 17 00 00 89 21 ea 49  6a 4a d5 0e 92 ff 86 49   .....!.IjJ.....I
0010	50 95 d3 e5 3f 0f 0d 02  33 36 54 68 69 73 20 69   P...?...36This i
0020	73 20 61 20 74 65 73 74  2e 20 54 68 69 73 20 69   s a test. This i
0030	73 20 6f 6e 6c 79 20 61  20 74 65 73 74 2e         s only a test.

Looks promising. The first byte is the version, and the next two (17 00) decode to 23 when read as a little-endian integer. The next 23 bytes look like the headers to me, and we can even see the content length 36 in the text. Finally there are the 36 bytes of data.
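To check that reading by machine rather than by eye, here is a small sketch that slices the 62 bytes from the dump into their parts. DumpCheck and Split are hypothetical names. It assumes the length field is little-endian and leaves the HPACK block undecoded.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

public static class DumpCheck {
  // The 62 bytes from the hex dump above, one source line per dump row.
  public static readonly byte[] Sample = Convert.FromHexString(
    "011700008921ea496a4ad50e92ff8649" +
    "5095d3e53f0f0d023336546869732069" +
    "73206120746573742e20546869732069" +
    "73206f6e6c79206120746573742e");

  // Splits a frame into version, header length, raw header bytes, and the
  // payload decoded as ASCII text. No HPACK decoding happens here.
  public static (byte Version, ushort HeaderLength, byte[] Headers, string Text) Split(byte[] frame) {
    byte version = frame[0];
    ushort headerLength = BinaryPrimitives.ReadUInt16LittleEndian(frame.AsSpan(1, 2));
    byte[] headers = frame.AsSpan(3, headerLength).ToArray();
    int dataStart = 3 + headerLength;
    string text = Encoding.ASCII.GetString(frame, dataStart, frame.Length - dataStart);
    return (version, headerLength, headers, text);
  }
}
```

Calling DumpCheck.Split(DumpCheck.Sample) gives version 1, a header length of 23, and the text This is a test. This is only a test., which is 36 characters long.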

Decoder

public Document Decode(byte[] data) {
  using var input = new MemoryStream(data);

  var version = input.ReadByte();

  return version switch {
    0x01 => DecodeV1(input),
    _ => throw new Exception("Unsupported version")
  };
}

private Document DecodeV1(Stream stream) {
  var headerLengthBytes = new byte[2];
  stream.ReadExactly(headerLengthBytes);
  var headerLength = BitConverter.ToUInt16(headerLengthBytes);

  var headerBytes = new byte[headerLength];
  stream.ReadExactly(headerBytes);
  var headers = unpackHeaders(headerBytes);
   
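  // content-length is framing information, so pull it out of the metadata.
  if (!headers.Remove("content-length", out var contentLengthString)) {
    throw new Exception("Missing content-length header");
  }
  var contentLength = int.Parse(contentLengthString);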

  var data = new byte[contentLength];
  stream.ReadExactly(data);

  return new Document(headers, data);
}

To decode we first read the version byte. If it's 0x01, the only version implemented so far, we jump into the real implementation. Anything else throws an exception.

We read the next two bytes to get the header length. We read that number of bytes to get the encoded headers. We decode the headers, extract content-length, and read that number of bytes as the data. Then we pack it up in our Document data structure and send it back.
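The decoder trusts content-length completely, so a stricter reader might validate the value before allocating a buffer for it. The following is a sketch of one way to do that, not the code from the repository. BodyReaderSketch and ReadBody are hypothetical names, and it assumes a seekable stream such as MemoryStream so the number of remaining bytes is known up front.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class BodyReaderSketch {
  // Hypothetical helper, not part of the SDBD source. Assumes a seekable
  // stream (such as MemoryStream) so the remaining byte count is known.
  public static byte[] ReadBody(Stream stream, Dictionary<string, string> headers) {
    if (!headers.Remove("content-length", out var raw)) {
      throw new InvalidDataException("Missing content-length header.");
    }

    // NumberStyles.None accepts digits only: no sign, whitespace, or separators.
    if (!int.TryParse(raw, NumberStyles.None, CultureInfo.InvariantCulture, out var length)) {
      throw new InvalidDataException("content-length is not a valid non-negative integer.");
    }

    // Compare the claimed length with what is actually left before allocating.
    long remaining = stream.Length - stream.Position;
    if (length > remaining) {
      throw new InvalidDataException("content-length exceeds the remaining input.");
    }

    var data = new byte[length];
    stream.ReadExactly(data);
    return data;
  }
}
```

The point of the design is that the size of the allocation is bounded by the input that actually exists, not by a number the input claims.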

I left out the implementation of unpackHeaders here because it's mildly confusing if you're not familiar with old .NET patterns. You can find it with the complete source for this implementation on GitHub.

And now...

It's time for a break. I have successfully created a brand new data format. Or at least stitched one together like Frankenstein's monster. I have a working implementation that can encode and decode the format, albeit with only one header implemented.

I'm already building a list of improvements. I'm not planning on tackling them for a bit. First I want to hear other people's feedback. Since this is the internet, somebody will no doubt tell me that something like this already exists. If that's true, great! I'll be sure to link it here for anybody interested.

Either way I certainly haven't wasted my time, and I hope you don't feel like you've wasted yours. I think the process of developing the SDBD format and a proof of concept was a valuable learning experience on its own. I want to hear feedback, so feel free to log issues on GitHub or find me on the Fediverse.

If you like SDBD and want to use it, go for it! I think the format itself is sound. The demo implementation has at least one security issue, so use it with caution.
