almost json deserializer

I wanted to solve the following problem:

  • process 20 million relatively small JSON files in a projection
  • each JSON file is valid and contains up to 100 properties
  • there are about 30 different JSON formats
  • there are about 50 different projections
  • each projection uses only a few properties of each JSON format
  • each projection resides in a separate git repository

With the following constraints:

  • the processing should take about 10 minutes
  • the number of JSON formats will increase
  • the number of projections will increase
  • each projection should report the properties it uses

And the following preferences:

  • I don’t want to share or maintain JSON formats as C# code
  • I want to use the projection code to report used properties
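
One way the projection code itself could report used properties is to hand it a dynamic object that remembers every member it is asked for. The sketch below is only an illustration under assumptions: TrackingObject, Projections and Describe are hypothetical names, the input is a flat dictionary rather than parsed JSON, and nested objects would need the same wrapping.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

// Hypothetical wrapper: exposes a dictionary as a dynamic object and
// records the name of every property the projection reads.
public sealed class TrackingObject : DynamicObject
{
    private readonly IDictionary<string, object> values;
    private readonly ISet<string> used;

    public TrackingObject(IDictionary<string, object> values, ISet<string> used)
    {
        this.values = values;
        this.used = used;
    }

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        // Remember the access, then look the value up.
        // Returning false makes the runtime binder throw for unknown names.
        used.Add(binder.Name);
        return values.TryGetValue(binder.Name, out result);
    }
}

public static class Projections
{
    // Hypothetical projection: it reads only two of the available properties.
    public static string Describe(dynamic item)
    {
        return item.name + ":" + item.size;
    }
}
```

After running Describe over a sample object, the set passed to TrackingObject holds exactly the names the projection touched, so no separate description of the JSON format has to be maintained.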

I wrote a prototype and found out that:

  • the bottleneck is JSON deserialization into dynamics
  • the Newtonsoft.Json deserializer is very slow
  • Jil is faster but still slow

I wrote my own deserializer which:

  • deserializes into dynamics
  • parses only valid JSON
  • handles JSON documents of at most 64 kB
  • assumes the JSON is not indented
  • requires each deserialized object to be consumed before the next one is deserialized

I compared the results, and my implementation is as fast as Jil or NetJSON deserializing into static types. Sometimes it is even faster.

Interested? Check it out: https://github.com/amacal/jynd

endianness

Recently I was working on the torrent encryption protocol, which uses the Diffie-Hellman key exchange. I used the System.Numerics assembly built into .NET, which offers the BigInteger structure. Even a ModPow method was included. “Great, there is even a ToByteArray method”, I thought. Then I spent two days debugging because I hadn’t checked the byte order returned by this method: it is little-endian. Why does Microsoft always implement things this way? As a developer I would expect the BigInteger structure to have the following signature:

public struct BigInteger : // some interfaces
{
   public BigInteger(byte[] value);
   public BigInteger(byte[] value, ByteOrder endianness);

   public byte[] ToByteArray();
   public byte[] ToByteArray(ByteOrder endianness);

   // other members
}
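
Until such overloads are available, the conversion can be done by hand. The following is a minimal sketch, assuming the protocol exchanges unsigned big-endian values of a fixed length; the BigEndian class and its method names are hypothetical helpers, not part of System.Numerics.

```csharp
using System;
using System.Numerics;

public static class BigEndian
{
    // Interprets big-endian bytes as an unsigned integer.
    public static BigInteger ToBigInteger(byte[] bigEndian)
    {
        // BigInteger(byte[]) expects little-endian two's complement, so the
        // bytes are reversed. The extra last slot stays zero, which keeps a
        // set high bit from being read as a sign bit.
        byte[] littleEndian = new byte[bigEndian.Length + 1];
        for (int i = 0; i < bigEndian.Length; i++)
            littleEndian[i] = bigEndian[bigEndian.Length - 1 - i];

        return new BigInteger(littleEndian);
    }

    // Writes a non-negative value as big-endian bytes, left-padded with zeros to 'length'.
    public static byte[] ToBytes(BigInteger value, int length)
    {
        if (value.Sign < 0)
            throw new ArgumentOutOfRangeException(nameof(value));

        byte[] littleEndian = value.ToByteArray();
        int count = littleEndian.Length;

        // ToByteArray may append a zero byte to keep the value positive; skip it.
        if (count > 1 && littleEndian[count - 1] == 0)
            count--;

        if (count > length)
            throw new ArgumentException("Value does not fit into the requested length.");

        byte[] result = new byte[length];
        for (int i = 0; i < count; i++)
            result[length - 1 - i] = littleEndian[i];

        return result;
    }
}
```

For example, BigEndian.ToBigInteger(new byte[] { 0x01, 0x00 }) yields 256, whereas new BigInteger(new byte[] { 0x01, 0x00 }) yields 1. Newer runtimes (.NET Core 2.1 and later) also offer a ToByteArray overload with isUnsigned and isBigEndian parameters, which makes such a helper unnecessary there.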

anonymous types and dynamics

Anonymous types are compiled as internal. What is the impact of that? You can still inspect them using reflection, but you cannot access their properties from other assemblies using the dynamic keyword. Doing so throws a RuntimeBinderException.
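
The reflection route can be sketched as follows. AnonymousReader is a hypothetical helper; it relies only on the fact that the properties of an anonymous type are public even though the type itself is internal. The cross-assembly failure of dynamic cannot be shown in a single file, so the snippet covers only the part that works.

```csharp
using System;
using System.Reflection;

public static class AnonymousReader
{
    // Reads a public property by name via reflection. Unlike dynamic binding,
    // this does not depend on the declaring type being accessible to the caller.
    public static object Read(object instance, string name)
    {
        PropertyInfo property = instance.GetType().GetProperty(name);

        if (property == null)
            throw new MissingMemberException(instance.GetType().Name, name);

        return property.GetValue(instance, null);
    }
}
```

Given var item = new { Name = "jynd" } created in another assembly, AnonymousReader.Read(item, "Name") returns the value, whereas ((dynamic)item).Name fails there at runtime.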