20080408

Serializing universal types as S-expressions using Sexplib

The standard ML way to "unify" multiple types is to use sum types a.k.a. discriminated unions, e.g. type foo = Int of int | String of string | Pair of foo * foo.

Universal types are types without type variables that allow you to store and retrieve values of any arbitrary type. They can be handy in some situations. For instance, you might decorate an abstract syntax tree with cells of a universal type and insert various data inferred at different phases of your compiler without having to polymorphise your AST types.

Sweek blogged about this recently here and the MLTon wiki has nice entries about property lists and universal types.

Sexplib on the other hand is a very nice piece of Markus Mottl engineering (again at Jane St) and is an absolute must for developing in Ocaml. The Sexplib camlp4 syntax extension automatically defines converters for any type you define, and handles polymorphism, modules and functors perfectly well.

Obviously, Sexplib can't serialize functions. (You can have a look at AliceML if you absolutely want that; function serialization requires some kind of byte code).

And that's where universal types and Sexplib don't mix well. Why? Because, at least for implementation techniques of universal types that I know, universal types work by cheating on the type system using closures and side-effects. Greetings to the relaxed value restriction!

Basically, to store a value of type bar, the computer creates a closure of fixed type unit -> unit that stores within its environment the value of type bar; when invoked, the closure spits it out at a pre-agreed location.

For every type, the closure will have the same closed type unit -> unit and can be stored anywhere you please. To retrieve the value, you invoke the closure, and it writes its content somewhere known. This will usually be a foo option ref. If multi-threading is used, you must protect that reference, for instance with a mutex.

However, functions are generally not serializable. Marshall does an ugly thing by spitting out a PC, the environment and the MD5 of your code, which works until you recompile your program. Not particularly persistent. As for Sexplib, as it works at the preprocessor level, it is not even possible.

Fortunately you can always define your own foo_of_sexp and sexp_of_foo. With some trickery, I managed to write a Sexp-aware Proplist.

Interface:

(* Proplist *)

type property_id = string
type 'a propname
type proplist

val propname : ('a -> Sexplib.Sexp.t) -> (Sexplib.Sexp.t -> 'a) -> property_id -> 'a propname
val proplist : unit -> proplist
val set : proplist -> 'a propname -> 'a -> unit
val get : proplist -> 'a propname -> 'a
val get' : proplist -> 'a propname -> 'a option
val sexp_of_proplist : proplist -> Sexplib.Sexp.t
val proplist_of_sexp : Sexplib.Sexp.t -> proplist


Implementation:

(* Proplist *)

TYPE_CONV_PATH "Util.Proplist"

open Sexplib
open Sexp

type property_id = string

module I = struct type t = property_id let compare (x : property_id) y = compare x y end
module SS = Set.Make(I)
module SM = Map.Make(I)

type 'a propname =
{
id : property_id;
to_sexp : 'a -> Sexp.t;
mutable value : 'a option
}

type property =
{
extract : unit -> unit;
sexpify : unit -> Sexp.t
}

type proplist =
{
mutable stuff : property SM.t
}

let ids = ref SM.empty

let proplist () = { stuff = SM.empty }

let sexp_of_property p = p.sexpify ()

let sexp_of_proplist p =
List(
SM.fold
(fun k p result ->
List[Atom k; sexp_of_property p] :: result
)
p.stuff
[]
)

let set p name x =
let f () = name.value <- Some x in let g () = name.to_sexp x in let r = { extract = f; sexpify = g } in p.stuff <- SM.add name.id r p.stuff let propname to_sexp from_sexp id = if SM.mem id !ids then failwith "Property already defined" else begin let name = { id = id; value = None; to_sexp = to_sexp; } in let f pl x = set pl name (from_sexp x) in ids := SM.add id f !ids; name end let get' p name = let r = SM.find name.id p.stuff in r.extract (); let x = name.value in name.value <- None; x let get p name = match get' p name with | Some x -> x
| None -> raise Not_found

let proplist_of_sexp x =
let pl = proplist () in
begin
match x with
| List l ->
List.iter
(function
| List[Atom k; p] ->
let f = SM.find k !ids in
f pl p
| x -> raise (Conv.Of_sexp_error("Pair expected", x)))
l
| _ -> raise (Conv.Of_sexp_error("List expected", x))
end;
pl


Note that if you are using a property list where property names are integers generated from a global counter, there is no guarantee that property names will always get the same ID. So, when constructing property names, you'll have to give, well, a unique property name. And strings are more convenient than integers, here. The unicity of such strings cannot be checked at runtime but that's about the best you can do. Also, when constructing a property name, you have to give the foo_of_sexp and sexp_of_foo converters. Retrieving them at unsexprification time is a little bit tricky since the return type must be hidden - these will go into a private global table indexed by property names.