JSON parser in 64h: 04/64, the Lemon parser

Let’s take a break from the Unicode world and talk about JSON. There are multiple parsers in the wild, all with different approaches. I will use the Lemon Parser Generator rather than one of the more famous ones.

Lemon is an LALR(1) parser generator for C. It does the same job as “bison” and “yacc”.

In yacc and bison, the parser calls the tokenizer. In Lemon, the tokenizer calls the parser.
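To make the inversion of control concrete, here is a minimal sketch of the "tokenizer calls the parser" shape. It is not Lemon-generated code: toy_parser, toy_parse and toy_tokenize are hypothetical stand-ins, and the token codes are made up for the example. The only Lemon convention borrowed here is that token 0 marks the end of the input.

```c
#include <stddef.h>

/* Hypothetical token codes; Lemon would generate its own #defines. */
enum { TOY_END = 0, TOY_LBRACE = 1, TOY_RBRACE = 2 };

/* Stand-in for a generated push parser: it only tracks brace depth. */
typedef struct {
    int depth;
    int error;
    int finished;
} toy_parser;

/* Plays the role of the parser entry point: one token is pushed per call. */
void toy_parse(toy_parser *p, int token)
{
    switch (token) {
    case TOY_LBRACE:
        p->depth++;
        break;
    case TOY_RBRACE:
        if (--p->depth < 0) p->error = 1;
        break;
    case TOY_END:
        if (p->depth != 0) p->error = 1;
        p->finished = 1;
        break;
    default:
        p->error = 1;
        break;
    }
}

/* The tokenizer owns the loop and drives the parser. */
int toy_tokenize(const char *input, toy_parser *p)
{
    for (size_t i = 0; input[i] != '\0'; i++) {
        if (input[i] == '{') toy_parse(p, TOY_LBRACE);
        else if (input[i] == '}') toy_parse(p, TOY_RBRACE);
        /* every other byte is ignored in this toy */
    }
    toy_parse(p, TOY_END); /* token 0 signals the end of the input */
    return p->error ? -1 : 0;
}
```

With yacc or bison the loop would live inside the parser, which would ask the tokenizer for the next token. Here the parser keeps its state in a struct between calls, so the tokenizer can push tokens whenever it has one.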

Lemon does need a tokenizer to feed it tokens. I could use re2c or another similar tool for this, but I will start easy and write a very basic tokenizer myself.

These whitespace code points can be skipped:

  1. 0x0020: // SPACE
  2. 0x0009: // CHARACTER TABULATION
  3. 0x000A: // LINE FEED (LF)
  4. 0x000D: // CARRIAGE RETURN (CR)
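The list above maps directly onto a small predicate. This is only a sketch: the name is_skippable_whitespace and the use of uint32_t for a code point are assumptions, not names taken from the project.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true for the four code points the tokenizer may skip. */
bool is_skippable_whitespace(uint32_t codepoint)
{
    switch (codepoint) {
    case 0x0020: /* SPACE */
    case 0x0009: /* CHARACTER TABULATION */
    case 0x000A: /* LINE FEED (LF) */
    case 0x000D: /* CARRIAGE RETURN (CR) */
        return true;
    default:
        return false;
    }
}
```

Other space-like code points, such as 0x000B or 0x00A0, are deliberately not in the list, so the predicate rejects them.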

String literals

“Parsing JSON is a Minefield” quotes ECMA-262, section 7.8.4 (String Literals):

All characters may appear literally in a string literal except for the closing quote character, backslash, carriage return, line separator, paragraph separator, and line feed

The latest specification says this instead:

All code points may appear literally in a string literal except for the closing quote code points, U+005C (REVERSE SOLIDUS), U+000D (CARRIAGE RETURN), and U+000A (LINE FEED)

Does this mean that 0x2028 LINE SEPARATOR and 0x2029 PARAGRAPH SEPARATOR are now accepted?
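Read side by side, the two quoted rules differ only in the line separator (0x2028) and the paragraph separator (0x2029). The sketch below encodes each sentence as a predicate. The function names are hypothetical, and the code reflects only the two quotes above, not a complete set of rules for JSON strings.

```c
#include <stdbool.h>
#include <stdint.h>

/* Older wording: the closing quote, backslash, CR, LF, line separator
   and paragraph separator may not appear literally. */
bool literal_allowed_old(uint32_t cp, uint32_t quote)
{
    return cp != quote && cp != 0x005C && cp != 0x000D && cp != 0x000A
        && cp != 0x2028 && cp != 0x2029;
}

/* Newer wording: only the closing quote, U+005C, U+000D and U+000A
   are excluded. */
bool literal_allowed_new(uint32_t cp, uint32_t quote)
{
    return cp != quote && cp != 0x005C && cp != 0x000D && cp != 0x000A;
}
```

Under this reading, 0x000A LINE FEED is rejected by both predicates, while 0x2028 and 0x2029 pass only the newer one.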

A basic tokenizer

I started with a very basic tokenizer that skips 0x0020, 0x0009, 0x000A and 0x000D, and accepts {, }, [, ], : and ,.

The callback now accepts the token as well:

typedef void (*j128_tokenizer_callback)(size_t index, size_t string_index, j128_codepoint codepoint, j128_token token);

I created an enum for the token values. I know that Lemon creates a list of #defines for the various tokens, but the enum is needed for the tests.
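Here is a rough sketch of how the enum, the callback and the skipping logic could fit together. Only the j128_tokenizer_callback signature comes from the text above. The enum members, the uint32_t definition of j128_codepoint, the sketch_ functions and the ASCII-only loop are all assumptions made for the example, and the real code may differ. In particular, the sketch passes the byte offset for both index and string_index.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t j128_codepoint; /* assumption: a code point is 32 bits wide */

/* Hypothetical token values; the real enum may use other names. */
typedef enum {
    J128_TOKEN_UNKNOWN = 0,
    J128_TOKEN_WHITESPACE,
    J128_TOKEN_LEFT_BRACE,    /* { */
    J128_TOKEN_RIGHT_BRACE,   /* } */
    J128_TOKEN_LEFT_BRACKET,  /* [ */
    J128_TOKEN_RIGHT_BRACKET, /* ] */
    J128_TOKEN_COLON,         /* : */
    J128_TOKEN_COMMA          /* , */
} j128_token;

typedef void (*j128_tokenizer_callback)(size_t index, size_t string_index,
                                        j128_codepoint codepoint, j128_token token);

/* Map a single code point to a token value. */
j128_token sketch_classify(j128_codepoint cp)
{
    switch (cp) {
    case 0x0020: case 0x0009: case 0x000A: case 0x000D:
        return J128_TOKEN_WHITESPACE;
    case '{': return J128_TOKEN_LEFT_BRACE;
    case '}': return J128_TOKEN_RIGHT_BRACE;
    case '[': return J128_TOKEN_LEFT_BRACKET;
    case ']': return J128_TOKEN_RIGHT_BRACKET;
    case ':': return J128_TOKEN_COLON;
    case ',': return J128_TOKEN_COMMA;
    default:  return J128_TOKEN_UNKNOWN;
    }
}

/* ASCII-only driver: skips whitespace and reports everything else. */
void sketch_tokenize(const char *input, j128_tokenizer_callback callback)
{
    for (size_t i = 0; input[i] != '\0'; i++) {
        j128_codepoint cp = (unsigned char)input[i];
        j128_token token = sketch_classify(cp);
        if (token == J128_TOKEN_WHITESPACE) continue;
        callback(i, i, cp, token); /* both indexes are the byte offset here */
    }
}

/* Example callback for tests: it only counts the reported tokens. */
size_t sketch_token_count = 0;

void sketch_count_callback(size_t index, size_t string_index,
                           j128_codepoint codepoint, j128_token token)
{
    (void)index; (void)string_index; (void)codepoint; (void)token;
    sketch_token_count++;
}
```

Keeping an enum next to the generated #defines lets a test compare the reported token against a typed value without pulling in the generated header.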

Add Lemon

The lemon.c and lempar.c files can be downloaded from the SQLite site, without installing anything. Adding a CMake rule is very simple:

set(LEMON_SOURCES
    lemon.c
)

add_executable(lemon ${LEMON_SOURCES})
target_compile_options(lemon PRIVATE -Wno-strict-prototypes)

I needed to disable one warning on top of the main options -Wall -Wextra -pedantic -Wno-unused-parameter: lemon.c has some functions that are declared with an empty parameter list, without void.
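For context, this is the kind of declaration that -Wstrict-prototypes complains about. The two functions are made up for the illustration and do not come from lemon.c.

```c
/* Old-style declaration: before C23, empty parentheses mean
   "unspecified parameters", which is what -Wstrict-prototypes flags. */
int old_style_answer();

/* Prototype: (void) states explicitly that no arguments are accepted. */
int strict_answer(void);

int old_style_answer()
{
    return 42;
}

int strict_answer(void)
{
    return 42;
}
```

Both functions behave the same when called without arguments. The difference is that the compiler can only check the arguments of a call against the second form.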

The fourth hour is finished!

Time flies. You can see the third version of the repository here.

Until next time!

@online{zaerl2025-json-parser-in-64h-04-64-the-lemon-parser,
  author = {Francesco Bigiarini},
  title = {JSON parser in 64h: 04/64, the Lemon parser},
  date = {2025-03-05},
  url = {https://zaerl.com/2025/03/05/json-parser-in-64h-04-64-the-lemon-parser/},
  urldate = {2025-03-05}
}