pat227/ocaml-db-model


The purpose of this project is to provide a convenient way to
"model" an existing database. This is especially useful if a database
has a large number of tables and fields. This project seeks to provide a
fast way to generate modules and correct types that match the tables and fields,
thus sparing us from having to manually create each one and assign correct types.
Libraries exist in other languages to accomplish similar goals, such as
jOOQ for Java and ODB for C++. Some call this endeavor "object mapping", and
ODB for C++ even performs "migration", by which hand-written code is used to create
tables and fields in a database, which is not supported here (yet?). This library
seeks only to create valid OCaml modules to describe existing tables, not the
other way around.

OCaml >= 4.08 is required as of November 2020. To date, no testing or use under
later versions of OCaml has been attempted or reported.

By way of example:
Let us presume we have a table Foo with the following fields and types:
 a integer (signed 32 bit)
 b bigint unsigned (unsigned 64-bit)
 c varchar(of any length)
 d timestamp
 e date
 f nullable int
This project would output a module described in foo.ml and foo.mli like so (most of
the data types are from Jane Street Core, some are customized):
module Foo = struct
  type t = {
    a : Int32.t;
    b : UInt64_extended.t;
    c : String.t;
    d : Time.t;
    e : Date.t;
    f : Int32.t option
  } [@@deriving fields,show,sexp,eq,ord,yojson]
  ...
end
module Foo : sig
  type t = {
    a : Int32.t;
    b : UInt64_extended.t;
    c : String.t;
    d : Time.t;
    e : Date.t;
    f : Int32.t option
  } [@@deriving fields,show,sexp,eq,ord,yojson]
  ...
end

Command line arguments also control which modules (all, some, or none) have type t
defined inside a sub-module T, which is then pulled in by way of an "include T" statement
in the ml file and passed through Core.Comparable.Make(T).
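
As a rough illustration of the sub-module T pattern, the sketch below is hand-written and is not actual generator output. It assumes a two-field table and uses the standard library's Set.Make as a stand-in for Core.Comparable.Make, so that it compiles without Core installed. The field names and the compare function are hypothetical.

```ocaml
(* Hand-written sketch of the "sub-module T" layout. The real output uses
   Core types and Core.Comparable.Make; Stdlib's Set.Make stands in here. *)
module Foo = struct
  module T = struct
    type t = { a : int32; c : string }

    (* Total order over the record: compare a first, then c. *)
    let compare x y =
      match Int32.compare x.a y.a with
      | 0 -> String.compare x.c y.c
      | n -> n
  end

  include T

  (* Stand-in for what Core.Comparable.Make (T) would provide. *)
  module Set = Set.Make (T)
end

(* Duplicate rows collapse once they are placed in a set. *)
let rows =
  Foo.Set.of_list
    [ { Foo.a = 2l; c = "two" };
      { Foo.a = 1l; c = "one" };
      { Foo.a = 2l; c = "two" } ]

let () = Printf.printf "%d distinct rows\n" (Foo.Set.cardinal rows)
```

Keeping type t and compare inside T means the same module can be handed to a functor and then included, so Foo.t and Foo.Set.elt are the same type.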

Boilerplate functions are also provided to run queries and return Sets of type t, wrapped
in Core.Result.t, as well as functions for creating some of the SQL needed to insert or update
records in a table.

The modules created by this project as output should be correct and ready to serve
as inputs to the OCaml compiler. If used with many tables or tables with many
fields, this project should save a lot of time. At present
almost all the types are taken from Jane Street's Core libraries and this project
supports MySQL databases. Support for Postgres is found in another project entitled
ocaml-pgsql-model. Each module also includes some support functions for running queries
and creating instances of type t from the results of those queries against the same tables.

This project does not use the conversion functions provided by ocaml-mysql, but
instead does its own string parsing. Unlike ocaml-mysql, this project also rejects
combinations of MySQL data types and OCaml data types where clipping
or overflow might be possible. Library support for signed and unsigned 8, 16, 32,
and 64 bit data types is almost total. The only MySQL numeric data type this
project rejects so far is the 24-bit data type. Varchar
clipping is unavoidable given a sufficiently large input string, unless we perhaps
come up with our own length-aware data type as a drop-in replacement. (You may in
general wish to set your MySQL schema to strict mode to reject all out-of-range
values on insert instead of clipping values to the max or min permissible
endpoint value of the column's data type.) While on the subject of clipping and
data types, perhaps in future we can also create overflow-aware data types for
additional safety when inter-operating with a db.
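
To make the "reject rather than clip" idea concrete, here is a minimal sketch of range-checked string parsing. It is not this project's parser: the function names are hypothetical, it uses plain int rather than the uint package, and it assumes a 64-bit OCaml with bits no larger than 32.

```ocaml
(* True only for non-empty strings made of ASCII digits. This guards against
   OCaml literal syntax such as "0x10" or "1_000", which int_of_string_opt
   would otherwise accept. *)
let is_decimal (s : string) : bool =
  s <> ""
  && (let ok = ref true in
      String.iter (fun ch -> if ch < '0' || ch > '9' then ok := false) s;
      !ok)

(* Parse the text of an unsigned column of the given width, returning an
   error instead of silently clipping. Assumes 64-bit OCaml and bits <= 32. *)
let parse_unsigned ~bits (s : string) : (int, string) result =
  let max_value = (1 lsl bits) - 1 in
  if not (is_decimal s) then
    Error (Printf.sprintf "not an unsigned decimal: %S" s)
  else
    match int_of_string_opt s with
    | None -> Error (Printf.sprintf "too large to parse: %S" s)
    | Some v when v > max_value ->
      Error (Printf.sprintf "%d exceeds the unsigned %d-bit range" v bits)
    | Some v -> Ok v

let () =
  match parse_unsigned ~bits:8 "256" with
  | Ok v -> Printf.printf "parsed %d\n" v
  | Error msg -> Printf.printf "rejected: %s\n" msg
```

Returning a result forces the caller to decide what to do with an out-of-range value, which is the opposite of a silent clip to the column's maximum.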

At present this project will output modules with some ppx extensions by default:
fields, show, sexp, eq, ord, and yojson.

USAGE:
Users can either invoke this project manually at the command line or incorporate its invocation into their build system, such as Makefiles, dune, etc. This project must first be installed using opam.

USAGE WITH DUNE:
The overall goal is to create a rule and an alias that invoke this project as a command line dependency, after it has been installed using opam.
Within your source code directory, create a dune file in which you use the "(include_subdirs unqualified)" directive.
Use an alias to create the modules based on tables before compile-time.
Include the following statement, or something similar, to direct dune to execute an alias:
(preprocessor_deps (alias tables/tables_alias) (source_tree src/lib/tables))

Within your source code subdirectory, create another subdirectory named "tables" and within that create another dune file in which to define the alias that creates the modules like so:
(rule
 (alias tables_alias)
 (targets table_name_one.ml table_name_one.mli table_name_two.ml table_name_two.mli table_name_three.ml table_name_three.mli tables.ml)
 (deps (:gen /home/<userhome>/.opam/<version>/bin/ocaml_mysql_model))
 (action (run %{gen} -host <ip_of_db> -user <username> -password <password> -db <dbname> -table-list table_name_one,table_name_two,table_name_three -comparable-tables table_name_one,table_name_two,table_name_three -fields2ignore ts -ppx-decorators fields,eq,make,ord,sexp,show,yojson -destination <path_to_project>/_build/default/src/lib/tables/))
 (mode fallback)
)

Create a second identical rule and alias within the same dune file that outputs to your "tables" subdirectory within your source code tree, such as <path_to_project>/src/lib/tables. Presently it appears dune copies source code over to the _build tree for compilation from the source code tree, and while our rule generates files within the source code tree, apparently that doesn't happen in time before dune copies source code from the source tree to the build tree. Use of dune locks doesn't seem to help. If anyone knows how to do this with just one rule instead of two with dune, please let me know.

----TODO----Command line options------
 -> Provide a version without Core?
 -> Support POSTGRES
 -> Plain modules output - define modules each with a name matching that of
 a table in the schema and with a type t that is a record (a product type).
 Each component of each type t record bears the name of a field in the table
 for which the module is named. The type of each component matches that of
 its corresponding field as closely as possible. Nullable fields must be
 optional. An mli is also created. No functions are defined. Types that are
 not supported are prohibited, such as 24-bit integer types for which
 we have no matching type in OCaml. Some unsigned types are supported. See below
 for full details on which types are supported. 

 -> with ppx decoration - modules are written with ppx extensions. Supported
 ppx extensions presently include yojson, sexp, ord, eq, show, and fields.

 -> with query functions - modules are written with (at least) the fields
 ppx extension and a function of type:
 conn:Mysql.dbd -> query:string -> (t list, string) Core.Std.Result.t
 This function is named get_from_db and requires that any field not of type
 string be parsed in an appropriate manner, such as for integers of several
 supported types, dates, time-stamps, etc. Errors must be caught. No other
 types are supported--yet--such as a field whose type is defined as another
 module.
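
The parsing step described for get_from_db can be sketched as follows. This is an assumption-laden illustration, not the generated code: it models a result row as a string option array (None standing for SQL NULL), uses Stdlib's result instead of Core's, and the record, the helper names required and nullable, and the column order are all hypothetical. No database connection is involved.

```ocaml
(* Hypothetical record for a table with columns a INT, c VARCHAR, f INT NULL. *)
type t = { a : int32; c : string; f : int32 option }

let ( >>= ) = Result.bind

(* A required column: NULL or unparsable text is an error. *)
let required name parse = function
  | None -> Error (name ^ ": unexpected NULL")
  | Some s ->
    (match parse s with
     | Some v -> Ok v
     | None -> Error (Printf.sprintf "%s: cannot parse %S" name s))

(* A nullable column: NULL becomes None, unparsable text is still an error. *)
let nullable name parse = function
  | None -> Ok None
  | Some s ->
    (match parse s with
     | Some v -> Ok (Some v)
     | None -> Error (Printf.sprintf "%s: cannot parse %S" name s))

(* Convert one row into a record, reporting the first failing column. *)
let of_row (row : string option array) : (t, string) result =
  if Array.length row <> 3 then Error "expected exactly 3 columns"
  else
    required "a" Int32.of_string_opt row.(0) >>= fun a ->
    required "c" Option.some row.(1) >>= fun c ->
    nullable "f" Int32.of_string_opt row.(2) >>= fun f ->
    Ok { a; c; f }

(* Convert all rows, stopping at the first error and preserving row order. *)
let of_rows (rows : string option array list) : (t list, string) result =
  List.fold_left
    (fun acc row ->
       acc >>= fun parsed ->
       of_row row >>= fun r -> Ok (r :: parsed))
    (Ok []) rows
  |> Result.map List.rev
```

A single malformed field turns the whole query result into an Error, which matches the requirement above that errors be caught rather than producing partially parsed records.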

-> include serial fields - if some tables contain a Serial data type field
   we can choose to include it. By default we exclude it. If included, these
   are always optional. MySQL permits null as a value on insert for fields of
   type Serial and by default will use an auto-incremented next sequence value.
   Instances of any type that includes a field that is of type Serial in the
   db can enjoy a None value of this field if created at run-time and later
   inserted into the db. Note that if the Serial field is NOT declared with
   NOT NULL, then inserts that include a null will actually insert a null,
   not an auto-incremented next in sequence value; so don't do that unless
   you really mean it.
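
The effect of an optional Serial field on insert can be sketched like this. The table name foo, the record, and the function insert_sql are hypothetical, and int64 stands in for the unsigned 64-bit type the project actually maps Serial to. Only numeric fields are used so that no string escaping is needed.

```ocaml
(* Hypothetical record for a table foo with columns id SERIAL and qty INT.
   int64 stands in for the unsigned 64-bit type used for Serial fields. *)
type t = { id : int64 option; qty : int32 }

(* Render the Serial field: None becomes SQL NULL, which lets MySQL assign
   the next auto-increment value when the column is declared NOT NULL. *)
let sql_of_serial : int64 option -> string = function
  | None -> "NULL"
  | Some i -> Int64.to_string i

(* Build an INSERT statement for one record. *)
let insert_sql (r : t) : string =
  Printf.sprintf "INSERT INTO foo (id, qty) VALUES (%s, %ld)"
    (sql_of_serial r.id) r.qty

let () = print_endline (insert_sql { id = None; qty = 3l })
```

A record created at run-time carries id = None, and the same type can hold Some id once the row has been read back from the db.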


-----------------Mapping of data types is described below: --------------------

MYSQL:

We use the uint package in opam that provides OCaml with unsigned integer types, specifically
8, 16, 32, and 64 bit unsigned integers. We also extend these to play nicely with ppx
when desired. Since MySQL provides some integer types that don't map perfectly to any
type in OCaml, such as the 24 bit MEDIUMINT, it is wisest to never use such types in
a schema.
--------------------------------------------------------------------------------
SERIAL is alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE ...  ->
  uint64   (Ocaml-Mysql parses to and from signed Int64?)


  TINY_INT (8 bit values) -> uint8 when unsigned, else unsupported
| TINY_INT when field name has prefix is_ -> bool (can choose prefix on which to filter)

  SMALLINT (16bit values) -> uint16 when unsigned, else unsupported

  MEDIUMINT (24 bit values) -> UNSUPPORTED

  INT (32 bit values) -> 
| INTEGER (32 bit values) -> Core.Std.Int32 when SIGNED, else uint32 when unsigned

| BIGINT (64 bit values) -> Core.Std.Int64 when SIGNED, else uint64; NOTE that Mysql
                           only permits values of 63 bits at most--SO USE OCAML's 63 bit integer type?????
                           Even UNSIGNED the limit is 63 bits; Mysql converts values to
                           DOUBLE for arithmetic above 63 bits. Exact values can be stored in a BigInt column
                           by using a string; otherwise Mysql does a conversion step using Double that can
                           create rounding errors. A string-to-number conversion involves no such intermediate
                           step using the Double type.
   DECIMAL
   DEC    -> FOR NOW USE STRING type until we can get zarith or bignum (DONE) support in the ppx_xml_conv or csvfields 
   FIXED
   FLOAT
 | DOUBLE -> float

   DECIMAL UNSIGNED  -> FOR NOW USE STRING type until we can get zarith or bignum support in the ppx_xml_conv or csvfields 
   FLOAT UNSIGNED
 | DOUBLE UNSIGNED -> UNSUPPORTED    NOTE: Unsigned floating point types in Mysql do not alter the upper range of these types.


 | DATE -> Core.Std.Date
 | DATETIME
 | TIMESTAMP -> Core.Std.Time

TIME -> Core.Std.Time.Span??? UNSUPPORTED
YEAR -> UNSUPPORTED

  BINARY
  VARBINARY
  VARCHAR (variable length 0 - 65,535 strings) -> string (the only thing that could
    go wrong is that the length of your string exceeds the field width)
 | BLOB -> string

   CHAR[M] where 0<=M<=255 -> unsupported
 | TINYBLOB
 | TINYTEXT
 | MEDIUM BLOB -> unsupported

ENUM -> unsupported? We can get the permitted values and create a type that can be
  serialized to appropriate values? But it is bad practice to use enums with
  mysql, so I have heard anyway.

So far that's 12 OCaml data types from >12 mysql data types, and several more
mysql data types unsupported in the name of type safety.
If we create a type that lists all the combinations ... we get 12 * 2 (if field
is nullable) * 2 (default values) = 48 possible combinations...too many to
comfortably type. Just use a struct.
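
The "just use a struct" idea might look something like the sketch below: one record per column instead of a 48-case variant. This is only an illustration of the mapping listed above, not the project's internal representation. The record, the function names, the lower-case MySQL type spellings, and the emitted type strings (for example "Uint8.t") are all assumptions.

```ocaml
(* One record per column; nullability and defaults are plain boolean flags
   rather than separate constructors for every combination. *)
type field = {
  name : string;
  ocaml_type : string; (* text of the type to emit; hypothetical spellings *)
  nullable : bool;
  has_default : bool;
}

(* Map a MySQL type name to an OCaml type, following the table above.
   None means the combination is unsupported and should be rejected. *)
let ocaml_type_of ~mysql_type ~unsigned : string option =
  match String.lowercase_ascii mysql_type, unsigned with
  | "tinyint", true -> Some "Uint8.t"
  | "smallint", true -> Some "Uint16.t"
  | ("int" | "integer"), false -> Some "Int32.t"
  | ("int" | "integer"), true -> Some "Uint32.t"
  | "bigint", false -> Some "Int64.t"
  | "bigint", true -> Some "Uint64.t"
  | ("float" | "double"), false -> Some "float"
  | ("decimal" | "dec"), _ -> Some "string"
  | "date", _ -> Some "Date.t"
  | ("datetime" | "timestamp"), _ -> Some "Time.t"
  | ("varchar" | "binary" | "varbinary" | "blob"), _ -> Some "string"
  | _ -> None (* mediumint, year, time, enum, unsigned floats, ... *)

(* Render one line of a record definition, adding "option" when nullable. *)
let render_field (f : field) : string =
  Printf.sprintf "%s : %s%s;" f.name f.ocaml_type
    (if f.nullable then " option" else "")

let () =
  match ocaml_type_of ~mysql_type:"INT" ~unsigned:false with
  | None -> print_endline "unsupported column type"
  | Some ty ->
    print_endline
      (render_field { name = "f"; ocaml_type = ty; nullable = true; has_default = false })
```

Because unsupported combinations map to None, a generator built this way can refuse a table outright instead of emitting a type that might overflow.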

This is the map ocaml-mysql uses to map mysql types to ocaml-mysql types:
static value
type2dbty (int type)
{
  static struct {int mysql; value caml;} map[] = {
    {FIELD_TYPE_DECIMAL     , Val_long(DECIMAL_TY)},
    {FIELD_TYPE_TINY        , Val_long(INT_TY)},
    {FIELD_TYPE_SHORT       , Val_long(INT_TY)},
    {FIELD_TYPE_LONG        , Val_long(INT_TY)},
    {FIELD_TYPE_FLOAT       , Val_long(FLOAT_TY)},
    {FIELD_TYPE_DOUBLE      , Val_long(FLOAT_TY)},
    {FIELD_TYPE_NULL        , Val_long(STRING_TY)},
    {FIELD_TYPE_TIMESTAMP   , Val_long(TIMESTAMP_TY)},
    {FIELD_TYPE_LONGLONG    , Val_long(INT64_TY)},
    {FIELD_TYPE_INT24       , Val_long(INT_TY)},
    {FIELD_TYPE_DATE        , Val_long(DATE_TY)},
    {FIELD_TYPE_TIME        , Val_long(TIME_TY)},
    {FIELD_TYPE_DATETIME    , Val_long(DATETIME_TY)},
    {FIELD_TYPE_YEAR        , Val_long(YEAR_TY)},
    {FIELD_TYPE_NEWDATE     , Val_long(UNKNOWN_TY)},
    {FIELD_TYPE_ENUM        , Val_long(ENUM_TY)},
    {FIELD_TYPE_SET         , Val_long(SET_TY)},
    {FIELD_TYPE_TINY_BLOB   , Val_long(BLOB_TY)},
    {FIELD_TYPE_MEDIUM_BLOB , Val_long(BLOB_TY)},
    {FIELD_TYPE_LONG_BLOB   , Val_long(BLOB_TY)},
    {FIELD_TYPE_BLOB        , Val_long(BLOB_TY)},
    {FIELD_TYPE_VAR_STRING  , Val_long(STRING_TY)},
    {FIELD_TYPE_STRING      , Val_long(STRING_TY)},
    {-1 /*default*/         , Val_long(UNKNOWN_TY)}
  };
  int i;

  /* in principle using bsearch() would be better -- but how can
   * we know that the left side of the map is properly sorted? 
   */

  for (i=0; map[i].mysql != -1 && map[i].mysql != type; i++)
    /* empty */ ;

  return map[i].caml;
}
----ocaml_mysql does some conversions between mysql types and ocaml_mysql types-----
let int2ml   str        = int_of_string str
let decimal2ml   str    = str
let int322ml str        = Int32.of_string str
let int642ml str        = Int64.of_string str
let nativeint2ml str    = Nativeint.of_string str
let float2ml str        = float_of_string str
let str2ml   str        = str
let enum2ml  str        = str
let blob2ml  str        = str
...and a few more for date, time, datetime
----
(* ml2xxx encodes OCaml values into strings that match the MysQL syntax of 
   the corresponding type *)

let ml2str str  = "'" ^ escape str ^ "'"
let ml2rstr conn str = "'" ^ real_escape conn str ^ "'"
let ml2blob     = ml2str
let ml2rblob    = ml2rstr
let ml2int x    = string_of_int x
let ml2decimal x    = x
let ml322int x  = Int32.to_string x
let ml642int x  = Int64.to_string x
let mlnative2int x = Nativeint.to_string x
let ml2float x  = string_of_float x
let ml2enum x   = escape x
let ml2renum x  = real_escape x
...and a few more for date, time, datetime
--------------------------------------------

------FROM mysql_com.h-------
#define CLIENT_MULTI_QUERIES    CLIENT_MULTI_STATEMENTS    
#define FIELD_TYPE_DECIMAL     MYSQL_TYPE_DECIMAL
#define FIELD_TYPE_NEWDECIMAL  MYSQL_TYPE_NEWDECIMAL
#define FIELD_TYPE_TINY        MYSQL_TYPE_TINY
#define FIELD_TYPE_SHORT       MYSQL_TYPE_SHORT
#define FIELD_TYPE_LONG        MYSQL_TYPE_LONG
#define FIELD_TYPE_FLOAT       MYSQL_TYPE_FLOAT
#define FIELD_TYPE_DOUBLE      MYSQL_TYPE_DOUBLE
#define FIELD_TYPE_NULL        MYSQL_TYPE_NULL
#define FIELD_TYPE_TIMESTAMP   MYSQL_TYPE_TIMESTAMP
#define FIELD_TYPE_LONGLONG    MYSQL_TYPE_LONGLONG
#define FIELD_TYPE_INT24       MYSQL_TYPE_INT24
#define FIELD_TYPE_DATE        MYSQL_TYPE_DATE
#define FIELD_TYPE_TIME        MYSQL_TYPE_TIME
#define FIELD_TYPE_DATETIME    MYSQL_TYPE_DATETIME
#define FIELD_TYPE_YEAR        MYSQL_TYPE_YEAR
#define FIELD_TYPE_NEWDATE     MYSQL_TYPE_NEWDATE
#define FIELD_TYPE_ENUM        MYSQL_TYPE_ENUM
#define FIELD_TYPE_SET         MYSQL_TYPE_SET
#define FIELD_TYPE_TINY_BLOB   MYSQL_TYPE_TINY_BLOB
#define FIELD_TYPE_MEDIUM_BLOB MYSQL_TYPE_MEDIUM_BLOB
#define FIELD_TYPE_LONG_BLOB   MYSQL_TYPE_LONG_BLOB
#define FIELD_TYPE_BLOB        MYSQL_TYPE_BLOB
#define FIELD_TYPE_VAR_STRING  MYSQL_TYPE_VAR_STRING
#define FIELD_TYPE_STRING      MYSQL_TYPE_STRING
#define FIELD_TYPE_CHAR        MYSQL_TYPE_TINY
#define FIELD_TYPE_INTERVAL    MYSQL_TYPE_ENUM
#define FIELD_TYPE_GEOMETRY    MYSQL_TYPE_GEOMETRY
#define FIELD_TYPE_BIT         MYSQL_TYPE_BIT