A couple of days ago I saw a page talking about how parsing JSON is a Minefield. I have room for a new spare-time project. So I decided to write a C11 JSON parser that:
- Passes all the “yes” tests and rejects all the “no” ones.
- Lets you choose if you want to enable an “implementation defined” test.
- Accepts UTF-8, UTF-16, or UTF-32, including both their big-endian and little-endian variants.
- Never crashes.
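To make the encoding requirement concrete, here is a minimal sketch of how the input encoding could be guessed from the first bytes. It checks for a byte order mark first, then falls back to the zero-byte pattern heuristic described in RFC 4627, section 3, which works because the first character of a JSON text is always ASCII. The names `j128_encoding` and `j128_detect_encoding` are hypothetical and are not taken from the repository.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: nothing here is part of the published json-128 API. */
typedef enum {
    J128_UTF8,
    J128_UTF16_BE,
    J128_UTF16_LE,
    J128_UTF32_BE,
    J128_UTF32_LE
} j128_encoding;

/* Guess the encoding of a JSON text from its first bytes. */
j128_encoding j128_detect_encoding(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);

    /* Byte order marks. The UTF-32 checks come first because the
       UTF-32LE mark (FF FE 00 00) starts with the UTF-16LE mark (FF FE). */
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 && buf[2] == 0xFE && buf[3] == 0xFF) {
        return J128_UTF32_BE;
    }
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE && buf[2] == 0x00 && buf[3] == 0x00) {
        return J128_UTF32_LE;
    }
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        return J128_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        return J128_UTF16_BE;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        return J128_UTF16_LE;
    }

    /* No mark: look at where the zero bytes fall around the first character. */
    if (len >= 4) {
        if (buf[0] == 0 && buf[1] == 0 && buf[2] == 0 && buf[3] != 0) {
            return J128_UTF32_BE; /* 00 00 00 xx */
        }
        if (buf[0] != 0 && buf[1] == 0 && buf[2] == 0 && buf[3] == 0) {
            return J128_UTF32_LE; /* xx 00 00 00 */
        }
    }
    if (len >= 2) {
        if (buf[0] == 0 && buf[1] != 0) {
            return J128_UTF16_BE; /* 00 xx */
        }
        if (buf[0] != 0 && buf[1] == 0) {
            return J128_UTF16_LE; /* xx 00 */
        }
    }

    /* Default, including empty input. */
    return J128_UTF8;
}
```

This is only a guess at the encoding: the bytes still have to be validated while decoding, since a heuristic like this cannot reject malformed input on its own.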
Probably 128 hours will not be enough, but I will try. So I will call it json-128. I will publish the state of the local repository, no matter what, as well as a summary of what I did here on my blog.
First hour: create a working environment
Let’s start the clock. First thing, I will create a brand new repository: GitHub, MIT license, the default C .gitignore file, and no template is fine. https://github.com/zaerl/json-128.
Clone the repository and add some information to the README file.
I decide to have a base structure that simply tests a j128_version function. Let’s use CMake for this and Attractor for testing. I will not support Windows. There’s not enough time. I will use some code from my Mojibake library.
I reconstructed the base structure. It took me ~20 minutes, but I have a running test.
build/tests/json-128-test
Test: version
Tests valid/run: 3/3
Execution time: 0.0001 seconds
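For illustration, here is a sketch of what such a version function and its checks might look like. The version string "0.1.0", the return type, and the three checks are assumptions. The real test runs through Attractor, whose API is not shown here, so plain C stands in for it.

```c
#include <assert.h>
#include <string.h>

/* Assumed value: the real version string is not stated in this post. */
#define J128_VERSION "0.1.0"

/* Hypothetical signature for the function under test. */
const char *j128_version(void) {
    return J128_VERSION;
}

/* Runs three illustrative checks and returns how many passed. */
int j128_test_version(void) {
    const char *version = j128_version();
    int passed = 0;

    /* A NULL version string would be a programming error, not a test failure. */
    assert(version != NULL);

    passed += strlen(version) > 0;                 /* not empty */
    passed += strcmp(version, J128_VERSION) == 0;  /* matches the macro */
    passed += strchr(version, '.') != NULL;        /* looks like x.y.z */

    return passed;
}
```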
The structure:
- build/ – (generated by CMake)
- src/ – main source
- src/CMakeLists.txt
- tests/ – main tests
- tests/attractor – Attractor folder
- tests/CMakeLists.txt
- CMakeLists.txt
- Makefile
I have 15 minutes left. Let’s move on to planning what I will do next.
What to use to parse the JSON
I need to parse JSON, so I need to follow its business-card-sized grammar.

I want something lightweight. I don’t want to use Bison, YACC and co. So I decided on the Lemon parser. It has many pros:
- It’s the parser used in the SQLite project.
- It’s lightweight.
- They parse JSON(5) in SQLite, maybe I can use some code.
What to do with huge structures
It’s ok to have a maximum depth allowed in your nested structures. But I decided not to have one. My parser will stream the values instead of trying to hold a one-million-element array in memory and return it. Example:
{
  "test-array": [
    "Hello",
    "world!"
  ]
}
Parsing this JSON will call a callback function four times with these values:
- Start object
- Start array called “test-array”
- String child “Hello”
- String child “world!”
No need to specify when something ends. Or at least this is my idea now.
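The four calls above could be modelled roughly like this. It is only a sketch of the idea: every name in it (`j128_event`, `j128_callback`, `j128_emit_example`, and so on) is hypothetical, and the emitter replays the events by hand instead of parsing anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* All names below are hypothetical; json-128 may end up with a different API. */
typedef enum {
    J128_EVENT_OBJECT_START,
    J128_EVENT_ARRAY_START,
    J128_EVENT_STRING
} j128_event_type;

typedef struct {
    j128_event_type type;
    const char *key;   /* member name when the value belongs to an object, else NULL */
    const char *value; /* string payload for J128_EVENT_STRING, else NULL */
} j128_event;

typedef void (*j128_callback)(const j128_event *event, void *user_data);

/* Stand-in for the real parser: replays the four events of the example
   document by hand and returns how many events were delivered. */
size_t j128_emit_example(j128_callback callback, void *user_data) {
    static const j128_event events[] = {
        { J128_EVENT_OBJECT_START, NULL, NULL },
        { J128_EVENT_ARRAY_START, "test-array", NULL },
        { J128_EVENT_STRING, NULL, "Hello" },
        { J128_EVENT_STRING, NULL, "world!" }
    };
    const size_t count = sizeof(events) / sizeof(events[0]);

    assert(callback != NULL);

    for (size_t i = 0; i < count; ++i) {
        callback(&events[i], user_data);
    }

    return count;
}

/* Example consumer: counts events without storing any of them. */
typedef struct {
    size_t total;
    size_t strings;
} j128_counter;

void j128_count_event(const j128_event *event, void *user_data) {
    j128_counter *counter = user_data;

    counter->total++;

    if (event->type == J128_EVENT_STRING) {
        counter->strings++;
        printf("string: %s\n", event->value);
    }
}
```

A caller would pass `j128_count_event` together with a zeroed `j128_counter`. The consumer keeps two counters no matter how many elements the array has, which is the point of streaming instead of building the whole structure in memory.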
The first hour is finished!
You can see the first version of the repository here.
Until next time!