I wanted to solve the following problem:
- process 20 million relatively small JSON files in a projection
- each JSON file is valid and contains up to 100 properties
- there are about 30 different JSON formats
- there are about 50 different projections
- each projection uses only a few properties of each JSON format (see the sketch after this list)
- each projection resides in a separate git repository
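To make the setup concrete, here is a minimal sketch of such a projection; the format and the property names (`createdAt`, `total`) are invented for illustration:

```csharp
// A hypothetical projection over one of the ~30 formats. A real document
// has up to 100 properties, of which only these two are used here.
public static class RevenuePerDayProjection
{
    public static (string Day, double Total) Project(dynamic document)
    {
        // take the date part of an ISO-8601 timestamp, e.g. "2018-05-01"
        string day = ((string)document.createdAt).Substring(0, 10);
        double total = (double)document.total;
        return (day, total);
    }
}
```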
With the following constraints:
- the processing should take about 10 minutes
- the number of JSON formats will increase
- the number of projections will increase
- each projection should report which properties it uses
And the following preferences:
- I don’t want to share or maintain the JSON formats as C# code
- I want to use the projection code itself to report the used properties (see the sketch after this list)
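One way to get that report out of the projection code is to run it against a wrapper that records every member access. A minimal sketch of the idea (an illustration, not necessarily how jynd does it):

```csharp
using System.Collections.Generic;
using System.Dynamic;

// Records every property name a projection reads from the document.
public sealed class TrackingDocument : DynamicObject
{
    private readonly IDictionary<string, object> inner;

    public TrackingDocument(IDictionary<string, object> inner) => this.inner = inner;

    public HashSet<string> UsedProperties { get; } = new HashSet<string>();

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        // remember the name before delegating to the underlying document
        UsedProperties.Add(binder.Name);
        return inner.TryGetValue(binder.Name, out result);
    }
}
```

Running `RevenuePerDayProjection.Project` from the earlier sketch once against a `TrackingDocument` would then report `createdAt` and `total`.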
I wrote a prototype and found out that:
- the bottleneck is JSON deserialization into dynamic objects
- the Newtonsoft.Json deserializer is very slow
- Jil is faster, but still too slow (both calls are shown below)
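For reference, these are the two dynamic deserialization calls the prototype spent its time in (the sample JSON is invented):

```csharp
using Jil;
using Newtonsoft.Json;

string json = "{\"id\":1,\"total\":12.34}";

// Newtonsoft.Json builds a full JObject tree behind the dynamic view
dynamic viaNewtonsoft = JsonConvert.DeserializeObject<dynamic>(json);

// Jil has a dedicated dynamic deserializer; faster, but still the bottleneck
dynamic viaJil = JSON.DeserializeDynamic(json);

double total = (double)viaJil.total;
```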
I wrote my own deserializer, which:
- deserializes into dynamic objects
- accepts only valid JSON
- limits the maximum JSON size to 64 kB
- does not support indented JSON
- requires each deserialized object to be consumed before deserializing the next one (see the usage sketch below)
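A usage sketch under those constraints; note that the `JyndConvert.Deserialize` entry point is my assumption for illustration, so check the repository for the actual API:

```csharp
// JyndConvert.Deserialize is an assumed name, not taken from the repository.
string json = "{\"id\":1,\"total\":12.34}";      // valid, unindented, well under 64 kB

dynamic first = JyndConvert.Deserialize(json);
double total = (double)first.total;              // consume it now...

dynamic second = JyndConvert.Deserialize(json);  // ...before deserializing the next one
```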
I compared the results, and my implementation is as fast as Jil or NetJSON deserializing into static types. Sometimes it is even faster.
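If you want to reproduce such a comparison, here is a sketch of a BenchmarkDotNet harness; `SampleDocument` is an invented type, and the jynd benchmark is omitted because its entry point is documented in the repository:

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// an invented payload type for the static deserializers
public class SampleDocument
{
    public long id { get; set; }
    public double total { get; set; }
}

public class DeserializerBenchmark
{
    private const string Json = "{\"id\":1,\"total\":12.34}";

    [Benchmark(Baseline = true)]
    public SampleDocument JilStatic() => Jil.JSON.Deserialize<SampleDocument>(Json);

    [Benchmark]
    public SampleDocument NetJsonStatic() => NetJSON.NetJSON.Deserialize<SampleDocument>(Json);

    [Benchmark]
    public dynamic JilDynamic() => Jil.JSON.DeserializeDynamic(Json);

    // a jynd benchmark would go here, using the entry point from the repository
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<DeserializerBenchmark>();
}
```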
Interested? Check it out: https://github.com/amacal/jynd