almost json deserializer

I wanted to solve the following problem:

  • process 20 million relatively small JSON files in a projection
  • each JSON file is valid and contains up to 100 properties
  • there are about 30 different JSON formats
  • there are about 50 different projections
  • each projection uses only a few properties of each JSON format
  • each projection resides in a separate git repository

With the following constraints:

  • the processing should take about 10 minutes
  • the number of JSON formats will increase
  • the number of projections will increase
  • each projection should report the properties it uses

And the following preferences:

  • I don’t want to share or maintain the JSON formats as C# code
  • I want the projection code itself to report the properties it uses
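One way to let the projection code itself report which properties it reads is to hand it a wrapper that records every property lookup. The following is a minimal Python sketch, assuming the JSON document has already been parsed into a dict; TrackingRecord and project_order are hypothetical names, not part of jynd. In C#, a comparable hook would be overriding DynamicObject.TryGetMember.

```python
import json


class TrackingRecord:
    """Wraps a parsed JSON object and records every property path that is read."""

    def __init__(self, data, used, prefix=""):
        self._data = data      # the parsed JSON object (a dict)
        self._used = used      # shared set that collects property paths
        self._prefix = prefix  # dotted path of this object inside the document

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for JSON properties.
        if name.startswith("_"):
            raise AttributeError(name)
        path = self._prefix + name
        self._used.add(path)
        try:
            value = self._data[name]
        except KeyError:
            raise AttributeError(path) from None
        if isinstance(value, dict):
            # Nested objects are wrapped too, so their reads are recorded as well.
            return TrackingRecord(value, self._used, path + ".")
        return value


def project_order(order):
    # Hypothetical projection: it touches only two of the document's properties.
    return (order.customer.id, order.total)


used = set()
doc = json.loads('{"id":7,"total":12.5,"customer":{"id":3,"name":"Ann"},"notes":""}')
result = project_order(TrackingRecord(doc, used))
print(result)        # (3, 12.5)
print(sorted(used))  # ['customer', 'customer.id', 'total']
```

Running every projection over one sample document per format would then yield the property list for each projection without maintaining the formats as separate code.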

I wrote a prototype and found that:

  • the bottleneck is JSON deserialization into dynamic objects
  • the Newtonsoft.Json deserializer is very slow
  • Jil is faster, but still slow

I wrote my own deserializer, which:

  • deserializes into dynamic objects
  • parses only valid JSON
  • supports documents of at most 64 kB
  • accepts only JSON without indentation
  • requires each deserialized object to be consumed before the next one is deserialized
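These restrictions trade generality for speed. A minimal Python sketch of the same idea, assuming compact (whitespace-free), already-valid input: a single 64 kB buffer is allocated once and reused, values are located by skipping bytes without validation and decoded only on access, and a generation counter makes objects from a previous document fail loudly. This is not jynd's C# implementation; CompactReader, LazyObject and the helper functions are hypothetical names. Arrays are simply decoded eagerly with json.loads.

```python
import json

MAX_SIZE = 64 * 1024  # assumed limit, mirroring the 64 kB constraint


def _skip_string(buf, i):
    # buf[i] is an opening quote; return the index just past the closing quote.
    i += 1
    while True:
        c = buf[i]
        if c == 0x5C:    # backslash: skip the escaped character
            i += 2
        elif c == 0x22:  # closing quote
            return i + 1
        else:
            i += 1


def _skip_value(buf, i):
    # Return the index just past the value at i, trusting the input to be valid.
    c = buf[i]
    if c == 0x22:
        return _skip_string(buf, i)
    if c in (0x7B, 0x5B):  # '{' or '[': track nesting depth
        depth = 0
        while True:
            c = buf[i]
            if c == 0x22:
                i = _skip_string(buf, i)
                continue
            if c in (0x7B, 0x5B):
                depth += 1
            elif c in (0x7D, 0x5D):
                depth -= 1
                if depth == 0:
                    return i + 1
            i += 1
    # number, true, false or null: runs until ',', '}' or ']'
    while buf[i] not in (0x2C, 0x7D, 0x5D):
        i += 1
    return i


class CompactReader:
    def __init__(self):
        self.buffer = bytearray(MAX_SIZE)  # allocated once, reused for every document
        self.generation = 0

    def load(self, data):
        if len(data) > MAX_SIZE:
            raise ValueError("document larger than 64 kB")
        self.buffer[:len(data)] = data  # same-length copy, no reallocation
        self.generation += 1            # invalidates objects from earlier loads
        return LazyObject(self, 0)

    def _index_object(self, i):
        # buf[i] is '{'; map each property name to the (start, end) span of its value.
        buf, spans = self.buffer, {}
        i += 1
        if buf[i] == 0x7D:
            return spans
        while True:
            key_end = _skip_string(buf, i)
            key = json.loads(buf[i:key_end])
            start = key_end + 1             # skip ':'
            end = _skip_value(buf, start)
            spans[key] = (start, end)
            if buf[end] == 0x7D:            # '}' closes the object
                return spans
            i = end + 1                     # skip ','


class LazyObject:
    def __init__(self, reader, start):
        self._reader = reader
        self._generation = reader.generation
        self._spans = reader._index_object(start)

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        if self._generation != self._reader.generation:
            raise RuntimeError("object was not consumed before the next load()")
        try:
            start, end = self._spans[name]
        except KeyError:
            raise AttributeError(name) from None
        buf = self._reader.buffer
        if buf[start] == 0x7B:  # nested object: stay lazy
            return LazyObject(self._reader, start)
        return json.loads(buf[start:end])


reader = CompactReader()
order = reader.load(b'{"id":7,"customer":{"id":3,"tags":["a","b"]},"total":12.5}')
customer_id, total = order.customer.id, order.total
print(customer_id, total)  # 3 12.5

reader.load(b'{"id":8}')   # reuses the buffer, so `order` is now stale
stale_error = None
try:
    order.id
except RuntimeError as err:
    stale_error = err
```

Skipping instead of validating is only safe because the input is trusted; on malformed or indented JSON this sketch may return wrong values or raise an exception.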

I compared the results: my implementation is as fast as Jil or NetJSON deserializing into static types, and sometimes even faster.

Interested? Check it out: https://github.com/amacal/jynd