Description

There is a difficult-to-resolve notation ambiguity in a JSON ↔ Erlang term parser. Supporting the principle of least surprise turns out to be fantastically difficult.

Skillset

  1. Erlang
  2. JavaScript (or other ECMA-derived languages like ActionScript)
  3. C/C++ helps

Task

Convert JSON to/from Erlang notation. For example, JSON's 3.1415 should be represented in Erlang as 3.1415. More complex conversions turn out to be surprisingly unclear.
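
Numbers are the easy case the task starts from. The following is a minimal sketch that assumes Erlang terms are modelled as ordinary Python values. The function names are hypothetical and do not come from any Erlang or JSON library. It shows that a JSON number can pass straight through:

```python
import json


def json_number_to_erlang(text):
    """Parse a JSON number; in this toy model the Erlang term is the same Python number."""
    value = json.loads(text)
    # bool is a subclass of int in Python, so true/false must be ruled out explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"not a JSON number: {text!r}")
    return value


def erlang_number_to_json(term):
    """Render an Erlang integer or float (modelled as a Python number) as JSON text."""
    if isinstance(term, bool) or not isinstance(term, (int, float)):
        raise ValueError(f"not an Erlang number: {term!r}")
    return json.dumps(term)


print(erlang_number_to_json(json_number_to_erlang("3.1415")))  # 3.1415
```

Anything that is not a number is rejected on purpose. The problems listed next are what make the rest hard.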

Problems

Jesus Christ. I mean, seriously, Jesus Christ. Where do I even start?

  1. Container Syntaxes
    1. Erlang offers two fundamental container syntaxes: tuples {} and lists []. These are a fixed-size sequence container and a singly linked list, respectively.
    2. JS/ECMA offer two fundamental container syntaxes: objects {} and arrays []. These are a sparse string key/variant value hash and a dense sequence, respectively.
    3. Erlang {} is equivalent to JSON [] (both are dense sequence containers).
    4. JSON does not have Erlang's [] (list), and Erlang does not have JSON's {} (k/v map).
  2. Strings
    1. Just as C strings are arrays, Erlang strings are singly linked lists (of integer character codes); strings are not distinct from containers.
    2. ECMA has a fundamental string type which is distinct from its containers.
    3. JSON requires Unicode support. Erlang has no native Unicode string type.
  3. JSON has no sensible representation of atoms.
    1. Even hacks like using the atom as an object key with a null value ({"youratomhere": null}) have limitations, such as the lack of ordering and the inability to repeat a key. {"atom":"youratomhere"} is close, if you're willing to give up {"atom": ...} as an ordinary standalone object.
  4. JSON has no sensible representation of binaries or bitstrings.
    1. Attempting to support binaries just leads to ambiguity cycles with either lists or strings, depending on the JSON type the binary is implemented as.
    2. Disambiguation wrapper types at the JSON side get ugly fast.
  5. Erlang Refs, funs, ports and pids are right out without type wrappers.
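
The container, string, and atom problems above can be reproduced with a toy model. The following is a sketch under stated assumptions, not an Erlang library. Erlang tuples are modelled as Python tuples, lists as Python lists, and atoms as a hypothetical `Atom` class. `naive_to_json` is an illustrative converter that uses no type tags.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical stand-in for an Erlang atom such as `ok`."""
    name: str


def erlang_string(s):
    """The Erlang literal "s" is nothing more than a list of code points."""
    return [ord(c) for c in s]


def naive_to_json(term):
    """Untagged Erlang -> JSON; each branch throws away a distinction."""
    if isinstance(term, Atom):
        return term.name                               # atom -> JSON string
    if isinstance(term, (tuple, list)):
        return [naive_to_json(t) for t in term]        # tuple and list -> JSON array
    return term                                        # numbers pass through


print(erlang_string("abc"))                             # [97, 98, 99]
print(naive_to_json((1, 2)) == naive_to_json([1, 2]))   # True: tuple vs list is lost
print(json.dumps(naive_to_json([Atom("abc"), erlang_string("abc")])))
# ["abc", [97, 98, 99]]: the atom became a string, the string became numbers
```

Each return path in `naive_to_json` discards something. Once tuples and lists share JSON's array, and atoms share JSON's string, a reverse converter has nothing left to decide with.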

Difficult conversions

  1. JSON's {"a":"b", "c":"d"} has no exact representation in Erlang. The canonical analogue is a list of tagged 2-tuples (a proplist), [ {a,b}, {c,d} ].
  2. Many round-trip conversions are ambiguous. Consider the JSON values [97,98,99] and "abc", which are meaningfully distinct, whereas their Erlang analogues, [97,98,99] and "abc", are one and the same term.
    1. When converting Erlang's [97,98,99] to JSON, should [97,98,99] or "abc" come out?
    2. What should Erlang receive from { "abc" : [97,98,99] }? What should JavaScript get when the Erlang version is sent back?
  3. How do you represent JSON objects in Erlang?
    1. If you choose to convert JSON objects to Erlang tuple lists, the round-trip problem makes JSON lists of single-term objects ambiguous.
    2. You could make tuples of tuples, if you're willing to accept ordering behavior differences, exclusion behavior differences, etc.
      1. Tuples of tuples would not be round-trip ambiguous, because JSON objects of objects cannot be keyless, nor can their keys be valueless.
      2. Tuples of tuples are rare in idiomatic Erlang, whereas the list of tuples is conceptually equivalent to JSON's object. Is the ambiguity resolvable without tagging?
  4. Tagging isn't viable
    1. One could parse the JSON {"a":"b"} into { json_object, [{a,b}] }. However, this means the Erlang → JSON direction either can't handle general Erlang data unless it is marked up first (undesirable), or retains some classes of cyclic ambiguity (unacceptable).
    2. Allowing the user to specify which conversions will be used just lets users dig themselves into a hole without understanding the ambiguity, and does not solve the problem for users who need both strings and lists.
    3. Providing both tagging and non-tagging converters ameliorates the problem only by letting users choose between painful and unsafe; this does not seem a desirable solution.
  5. Erlang records are ambiguous in a round trip with JSON arrays that contain the representation of the record's label as a value, because a record is just a tuple whose first element is the record name atom.
  6. The least damaging conversion is from JSON arrays [] to Erlang tuples {} and back, but the symbol switch looks like a bug and breaks the principle of least surprise hard.
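
The string/array collision above can be reproduced with an untagged converter pair. The following sketch is hypothetical. It models Erlang terms as Python values (lists, tuples, and an assumed `Atom` class), maps JSON arrays to Erlang lists and JSON objects to proplists. On the way back, it guesses that any non-empty list of printable ASCII codes is a string. That guess is one common heuristic, not a recommendation.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical stand-in for an Erlang atom."""
    name: str


def json_to_erlang(v):
    """Untagged JSON -> Erlang: strings and arrays -> lists, objects -> proplists."""
    if isinstance(v, str):
        return [ord(c) for c in v]                     # "abc" is [97,98,99] in Erlang
    if isinstance(v, list):
        return [json_to_erlang(x) for x in v]
    if isinstance(v, dict):
        return [(Atom(k), json_to_erlang(x)) for k, x in v.items()]
    return v                                           # numbers; true/false/null left as-is here


def is_proplist(t):
    return bool(t) and all(
        isinstance(p, tuple) and len(p) == 2 and isinstance(p[0], Atom) for p in t)


def looks_like_string(t):
    # Heuristic guess: a non-empty list of printable ASCII codes is "probably" text.
    return bool(t) and all(type(c) is int and 32 <= c < 127 for c in t)


def erlang_to_json(t):
    if isinstance(t, Atom):
        return t.name
    if isinstance(t, list):
        if is_proplist(t):
            return {p[0].name: erlang_to_json(p[1]) for p in t}
        if looks_like_string(t):
            return "".join(map(chr, t))
        return [erlang_to_json(x) for x in t]
    if isinstance(t, tuple):
        return [erlang_to_json(x) for x in t]
    return t


def round_trip(text):
    term = json_to_erlang(json.loads(text))
    return json.dumps(erlang_to_json(term), separators=(",", ":"))


print(round_trip('"abc"'))               # "abc"
print(round_trip('[97,98,99]'))          # "abc"  (the array came back as a string)
print(round_trip('{"abc":[97,98,99]}'))  # {"abc":"abc"}
print(round_trip('{}'))                  # []     (empty object == empty list)
```

Dropping the printable-list heuristic does not help. It just moves the failure to `"abc"`, which would then come back as `[97,98,99]`.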

The challenge

Find a round-trip conversion that will:

  1. Take JSON { "abc":[1,2,3] }, convert it to Erlang, and convert it back to JSON correctly
  2. Take Erlang [ {a,b}, {"c","d"} ], convert it to JSON, and convert it back to Erlang correctly
  3. Not require configuration to achieve both conversions
  4. Not require a "safe" API to convert unambiguously
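
The first two requirements can be checked mechanically; the last two are properties of the API and are not tested here. The harness below is a hypothetical sketch. Erlang terms are again modelled as Python values, with an assumed `Atom` class and `estr` helper. It plugs in the arrays-to-tuples mapping that the list above calls least damaging. In this model, that mapping passes the first case and fails the second. The Erlang list [ {a,b}, {"c","d"} ] has no JSON form of its own, so it goes out as an array and comes back as a tuple, with its atoms turned into strings.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical stand-in for an Erlang atom."""
    name: str


def estr(s):
    """The Erlang string literal "s" is the list of its code points."""
    return [ord(c) for c in s]


def to_erlang(v):
    """Candidate JSON -> Erlang: arrays -> tuples, objects -> proplists."""
    if isinstance(v, str):
        return estr(v)
    if isinstance(v, list):
        return tuple(to_erlang(x) for x in v)
    if isinstance(v, dict):
        return [(Atom(k), to_erlang(x)) for k, x in v.items()]
    return v


def to_json(t):
    """Candidate Erlang -> JSON, the inverse of to_erlang where one exists."""
    if isinstance(t, Atom):
        return t.name
    if isinstance(t, tuple):
        return [to_json(x) for x in t]
    if isinstance(t, list):
        if t and all(isinstance(p, tuple) and len(p) == 2 and isinstance(p[0], Atom) for p in t):
            return {p[0].name: to_json(p[1]) for p in t}
        if all(type(c) is int for c in t):
            return "".join(map(chr, t))
        # A general Erlang list has no JSON form; fall back to an array.
        return [to_json(x) for x in t]
    return t


def challenge(to_erlang, to_json):
    """Return (case1_ok, case2_ok) for the first two challenge requirements."""
    src_json = '{ "abc":[1,2,3] }'
    back_json = json.dumps(to_json(to_erlang(json.loads(src_json))))
    case1 = json.loads(back_json) == json.loads(src_json)

    src_erl = [(Atom("a"), Atom("b")), (estr("c"), estr("d"))]  # [ {a,b}, {"c","d"} ]
    back_erl = to_erlang(json.loads(json.dumps(to_json(src_erl))))
    case2 = back_erl == src_erl
    return case1, case2


print(challenge(to_erlang, to_json))  # (True, False)
```

Any candidate converter pair with the same two-function shape can be passed to `challenge` in place of this one.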