Functions that write to and read from a database are usually straghtforward—and probably a bit dry—to implement, and we are happy enough if we can leave this implementation to an automatic tool. In this post I present an organisation of this problem in data structures and algorithms that allow clear and simple implementations for these read-write-functions, should we write these functions manually or prepare a tool to generate them automatically.

These ideas are implemented in Lemonade Sqlite, an OCaml library. The project is still very young and your feedback ancd suggestions are very welcome.

Data Structures + Algorithms = Programs

Almost every programmer has already seen the equation popularised by Niklas Wirth, but is has an aspect whose subtlety that is less well-known: the frontier between data structures and algorithms is very fluid, when it comes to decide how should the different aspects of our program be represented. As an example, when we process the rows returned by a database query, we can iterate over them using a while-loop as in

open Sqlite3
let process_rows stmt =
  while step stmt = Rc.ROW do
    let data = row_data stmt in
    for i = 0 to Array.length data - 1 do
      Printf.printf "%s\n%!"
        (Data.to_string (data.(i)))

We can also select another structure for our code, using a stream to carry the rows to the processing routine:

open Sqlite3
let process_rows stmt =
  let retrieve_rows _ =
    match step stmt with
    | Rc.ROW -> Some(row_data stmt)
    | Rc.DONE -> None
    | _ -> raise Stream.Failure
  let print_row row =
    for i = 0 to Array.length row - 1 do
      Printf.printf "%s\n%!"
        (Data.to_string (row.(i)))
  Stream.iter print_row (Stream.from retrieve_rows)

In the second approach we recognised the rows retrieveal and the treatment of the rows as two orthogonal activities, that we combine through a higher-level function. This is a little but nice improvement because:

  • Orthognality favours reusability and also increase combinatorically the number of programs that can easily be expressed. There is nice little essay by Markus Schnalke discussing this as being an essential aspect of the so called Unix philosophy.

  • Orthogonality eases testing and debugging and thus improves reliability.

  • If we were to write generators for database read-write functions, the second organisation reduces the surface of the generated functions by factorising out important subroutines. This makes the generation easier, as it is noticeably harder to debug generated code than straight code, because of the extra level of indirection. (We need to understand bugs in the generated code as bugs in the generator.)

Monadic streams

Instead of using the streams from the standard library, I opted for monadic streams as implemented in Lemonade. The essential difference between these two flavours of streams is that reading from the stream is a monadic operation, which makes it a neat way to organise error handling.

Lemonade Sqlite outline

The Lemonade Sqlite library builds a monad `̀‘a t for database operations, which is essentially a success monad, the corresponding monadic streams ‘a S.t and defines the following important functions:

val query : statement -> handle -> row S.t
(** [query statement] is the stream of rows returned by the
    [statement]. *)

val exec : statement -> handle -> unit t
(** [exec commmands] is a monad applying the statements provided by
    [commands] and ignoring produced rows if any. *)

val insert : statement S.t -> handle -> unit t
(** [insert commmands] is a monad applying the statements provided by
    [commands] and ignoring produced rows if any. *)

Writing to the library is performed by turning a stream of arbitrary data into a stream of statements using bindings:

val bindings: (string * ('a -> data)) list -> 'a S.t -> binding S.t
(** [bindings spec data] turn a binding specification and data stream into
    a binding stream. The specification is made of pairs [(name, get)] which
    identify which variable to bind and which data to bind it to. *)

val bindings_apply : ?rowid:int64 ref -> binding S.t -> statement -> statement S.t
(** [bindings_apply bindings stmt] create a statement stream by
    repetitively using the bindings provided by [bindings] on [stmt].

    If the argument [rowid] is given, the statements generated by the
    binding arrange so that the [rowid] is updated with the
    [last_insert_rowid] value after execution of the statement. *)

The resulting stream of statements can then be processed using the insert function above.