GON Parsers

GON is a plain-text file format, similar to JSON, that can be used to store structured data in fields, objects, and arrays. Compared to JSON, the syntax is extremely minimal and, in my opinion, much easier to read and edit. The original C++ implementation was created by Tyler Glaiel, and can be accessed here.


An Introduction to the Format

Every field in a gon file consists of a name and value pair (unless the field is in an array, in which case it has no name). Tokens (names and values) are separated by whitespace, unless enclosed in quotation marks, in which case the content between the quotation marks is treated as a single token. There are three types of gon field: simple value fields, objects, and arrays.

Below is a brief example of what the format looks like:

sample_object {
    file_name test.gon
    number    35.35
    string    "this is a string"
    array     [ 1 [ 2.1 2.2 ] 3 ]
    nested_object {
        number 53.53
        string "this is a string"
        array  [ 1 2 " three " 4 five 6 7 8 ]
    }
}

In this example, we have a root object called "sample_object" which contains several fields, including an array and another nested object. Objects and arrays can be nested as deeply as you wish without issue. Because all of the data in a gon file is plain text, there is no inherent issue with mixing data/fields/objects of different types. You can also place comments in a file with '#'. Everything after this token will be ignored until the end of the line.
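
To make the tokenization rules above concrete, here is a minimal Python sketch of a tokenizer. It is only an illustration, not the code of any of the implementations discussed on this page. The function name tokenize is made up for the example, and treating the brackets { } [ ] as tokens on their own even without surrounding whitespace is an assumption. The sketch also does not record whether a token was quoted, which a real parser would need in order to tell a quoted "{" from a structural one.

```python
def tokenize(text):
    """Split GON text into tokens: whitespace separates tokens, quotation
    marks group text into one token, and '#' starts a comment that runs to
    the end of the line. (Hypothetical helper, not from any real parser.)"""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c.isspace():
            i += 1
        elif c == '#':
            # Skip the comment up to (not including) the newline.
            while i < n and text[i] != '\n':
                i += 1
        elif c == '"':
            # Quoted token: keep everything up to the closing quote verbatim.
            end = text.find('"', i + 1)
            if end == -1:
                raise ValueError("unterminated quoted string")
            tokens.append(text[i + 1:end])
            i = end + 1
        elif c in '{}[]':
            # Assumption: brackets are self-delimiting single-character tokens.
            tokens.append(c)
            i += 1
        else:
            # Bare token: runs until whitespace or a delimiter character.
            start = i
            while i < n and not text[i].isspace() and text[i] not in '{}[]"#':
                i += 1
            tokens.append(text[start:i])
    return tokens


if __name__ == "__main__":
    print(tokenize('array [ 1 " three " five ] # a comment\nfile_name test.gon'))
```

Note that the quoted token keeps its inner spaces, while the comment disappears entirely from the token stream.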

I find that the extremely minimal syntax of the format makes it ideal for situations requiring manual data entry. I currently use the format for entering variable values in my upcoming game's level editor in lieu of a GUI, since it is extremely easy to edit the text and reload the file whenever I need to update the data. I also use the format for all of my configuration files, as it is extremely easy for users to edit and requires very little explanation.


Links to Implementations

At this point I have written five different implementations across four different languages, and have augmented the format with some additional features. The original implementation is linked above, and the three implementations below are the ones I've written that are currently public.


Evolution of the Parser

My first implementation was in C#, and it was more or less a straightforward port of the original C++ implementation. This was done solely for the purpose of not having to use cross-language bindings in my TEiN Randomizer, which was written in C#. The original C++ implementation is already pretty object-oriented in its design, which is what made a direct port to C# straightforward.

Later, I wrote an entirely new implementation in C and, taking some inspiration from RapidXml's in-situ parsing, saw a major performance increase. Compared to the original C++ implementation, my C implementation is 150x faster. According to my benchmarks, it is also marginally faster than RapidXml for a structurally identical input, but of course, as XML is inherently a more verbose format, that can probably be chalked up to the shorter input stream. One thing I wanted to try with this C implementation was to somehow embed type information into the file format, so that the parser could extract data directly into internal data structures without the need to generate an intermediate representation (e.g. a DOM). However, at the time I was unsure how to do this without either using reflection or making the syntax much more verbose. So, this implementation does make use of an intermediate DOM-like representation, but it is a very compact flat array and is still quite fast.
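
To give a rough idea of what a flat-array DOM can look like, here is a small Python sketch. The actual layout used by the C implementation is not described on this page, so everything here (the Node fields, the parse_flat name, the "size" subtree count) is an assumption made for illustration. The input is expected to be already split into tokens; plain str.split() is used as a stand-in tokenizer, so quoted strings and comments are not handled.

```python
from dataclasses import dataclass


@dataclass
class Node:
    kind: str      # "field", "object", or "array"
    name: str      # empty for array elements and for the implicit root
    value: str     # raw token text for fields, empty for containers
    parent: int    # index of the parent node, -1 for the root
    size: int = 0  # number of nodes in this node's subtree, excluding itself


def parse_flat(tokens):
    """Build one contiguous list of nodes in document order.
    Hypothetical layout: a container's descendants occupy the `size`
    slots that immediately follow it, so no per-node child lists exist."""
    nodes = [Node("object", "", "", -1)]  # implicit root object
    stack = [0]                           # indices of currently open containers
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if tok in ("}", "]"):
            if len(stack) == 1:
                raise ValueError("unexpected closing bracket")
            closed = stack.pop()
            nodes[closed].size = len(nodes) - closed - 1
            continue
        parent = stack[-1]
        name = ""
        if nodes[parent].kind == "object":
            # Inside an object, every entry is a name followed by a value.
            if i >= len(tokens):
                raise ValueError(f"field {tok!r} has no value")
            name, tok = tok, tokens[i]
            i += 1
        if tok in ("{", "["):
            kind = "object" if tok == "{" else "array"
            nodes.append(Node(kind, name, "", parent))
            stack.append(len(nodes) - 1)
        else:
            nodes.append(Node("field", name, tok, parent))
    if len(stack) != 1:
        raise ValueError("unclosed object or array")
    nodes[0].size = len(nodes) - 1
    return nodes


if __name__ == "__main__":
    for index, node in enumerate(parse_flat("pos { x 1 y 2 } tags [ a b ]".split())):
        print(index, node)
```

Because a whole subtree sits in one contiguous run, skipping over an object or array is a single index addition, which is part of why this kind of layout tends to be cache-friendly.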

My next implementation was written in Jai, and this was a major improvement. Thanks to Jai's runtime type information, it was quite trivial to implement a SAX-style parser which would store data directly into strongly-typed structures, without constructing any intermediate representation of the file. The association of a particular field in a GON file to a piece of internal data is handled through "data bindings", which the user can very simply define in the parser context. For structs, the parser will automatically create 'indirect' bindings to member fields recursively.
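
The idea of binding fields directly to typed data can be sketched in Python, with dataclasses standing in for Jai's runtime type information. This is not the API of the Jai parser. The Vec2 and Player types and the bind_object and skip_block functions are all invented for the example, the input is pre-split with str.split(), and arrays are simply skipped.

```python
from dataclasses import dataclass, field, fields, is_dataclass


@dataclass
class Vec2:
    x: float = 0.0
    y: float = 0.0


@dataclass
class Player:
    name: str = ""
    health: int = 0
    position: Vec2 = field(default_factory=Vec2)


def skip_block(tokens, i):
    """Skip the body of an unbound object or array; return the index just
    past its closing bracket."""
    depth = 1
    while i < len(tokens) and depth > 0:
        if tokens[i] in ("{", "["):
            depth += 1
        elif tokens[i] in ("}", "]"):
            depth -= 1
        i += 1
    return i


def bind_object(tokens, i, target):
    """Read name/value pairs starting at tokens[i] and store them straight
    into `target`, with no intermediate tree. Returns the index just past
    the closing '}' (or past the end of input at the top level)."""
    bindings = {f.name: f for f in fields(target)}  # reflection step
    while i < len(tokens) and tokens[i] != "}":
        name, value = tokens[i], tokens[i + 1]
        i += 2
        f = bindings.get(name)
        if value == "{" and f is not None and is_dataclass(getattr(target, name)):
            # 'Indirect' binding: recurse into the nested struct's members.
            i = bind_object(tokens, i, getattr(target, name))
        elif value in ("{", "["):
            i = skip_block(tokens, i)  # nothing bound here, ignore it
        elif f is not None and f.type in (int, float, str):
            setattr(target, name, f.type(value))  # convert via declared type
    return i + 1


if __name__ == "__main__":
    player = Player()
    source = "name Ada health 90 position { x 1.5 y -2 } extra { a { b 1 } }"
    bind_object(source.split(), 0, player)
    print(player)
```

Fields that have no matching member (such as "extra" in the usage example) are skipped without ever being materialized, which is the main attraction of this style of parsing.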

This parser was relatively flexible and extensible through the use of callback procedures, but because it is relatively stateless, the SAX-style parsing paradigm ultimately has some limitations. The biggest limitation is the linearity of the parser. Unless we know ahead of time that we will need to hold on to some GON object (and handle that in a callback), we cannot later refer back to the object or set a data binding on that object. For many use cases, this limitation is no issue. But if you want a file to define references between the data it contains, you now need to do some very messy workarounds.

So for my next implementation, I restructured the parser completely to use a DOM. (For all my attempts to avoid such an intermediate representation, I have to admit that it does afford some additional niceties.) I also added some additional syntax that allows the user to reference other fields in a file by pointer, index, or value. The current implementation uses an iterative approach to resolve all references between nodes, and will return an error if there are any cyclic references. (This new DOM-based version of the parser has also been ported to Odin.)
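
As an illustration of iterative reference resolution with cycle detection, here is a minimal Python sketch. It is not the algorithm from the Jai or Odin parsers, whose details and reference syntax are not given here. It assumes a flat mapping of field names to values, models a reference with a made-up Ref class, and only covers references by value.

```python
class Ref:
    """Hypothetical placeholder for a value that refers to another field."""

    def __init__(self, target):
        self.target = target


def resolve_references(fields):
    """Make repeated passes, replacing each reference whose target already
    holds a concrete value. If a full pass resolves nothing, the remaining
    references can only depend on each other, so a cycle is reported."""
    resolved = dict(fields)
    pending = [name for name, value in resolved.items() if isinstance(value, Ref)]
    while pending:
        still_pending = []
        for name in pending:
            target = resolved[name].target
            if target not in resolved:
                raise ValueError(f"{name!r} references unknown field {target!r}")
            if isinstance(resolved[target], Ref):
                still_pending.append(name)         # target not ready yet
            else:
                resolved[name] = resolved[target]  # copy the concrete value
        if len(still_pending) == len(pending):
            raise ValueError(f"cyclic reference among: {sorted(still_pending)}")
        pending = still_pending
    return resolved


if __name__ == "__main__":
    print(resolve_references({"c": Ref("b"), "b": Ref("a"), "a": "1"}))
    try:
        resolve_references({"x": Ref("y"), "y": Ref("x")})
    except ValueError as error:
        print("error:", error)
```

Each pass resolves at least one reference or fails, so the loop always terminates, and chains of references are handled regardless of the order in which the fields appear.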


Future Plans

I plan to clean up and rewrite the SAX-based version in both Odin and Jai, and host those on a separate repository. The tokenizer and error reporting were not very good and there are certainly some other aspects which could be refactored.

I would like to extend the capabilities of the new DOM-based parser with some basic expression evaluation, so one can do things like defining values based on mathematical formulas or computing one field's value based on another field's value. I plan to add this capability once my Lead Sheets module is more complete.

If I ever get back around to improving the C implementation, I think it would be fun to try implementing some of the reflection-based automatic parsing of the Jai and Odin versions in C. This could probably be done in a relatively simple manner using something like Dyncall's dcAggr definitions. And of course I would want to design the parsing primitives to be simple enough that users could make it work with their own reflection systems.