XALT Database Design

XALT provides a MySQL database schema and tools to load the database from the *.json files that the XALT library (libxalt_init.so) and the program xalt_run_submision generate.

The basic flow is the following:

  1. libxalt_init.so and xalt_run_submission generate *.json files for tracking what happens for each tracked execution.
  2. The ld wrapper that XALT provides generates a *.json file for information about the linking of the executable.
  3. These *.json files generated are transported by either the “file” method or “syslog” method.
  4. When using the file transmission style, the program xalt_file_to_db.py can be used to ingest the *.json files into the database.
  5. When using the syslog transmission style, the program xalt_syslog_to_db.py can be used to ingest the *.json records into the database.

Syslog Transmission style

When XALT uses the syslog method to transmit data to the database, it has to be careful to not overflow the syslog records, so XALT breaks up the json data string into blocks.

The each syslog record is looks like this:

<date> XALT_LOGGING_<cluster> V:3 kind:<type> nb:<nb> syshost:<cluster> key:<key> idx:<idx> value:<value>

Where:

<date>:     is the typical syslog timestamp
<cluster>:  is your cluster name like stampede2 or frontera
<kind>:     is the kind of record.  This can be three types: link, run or pkg
<nb>:       is the number of blocks.
<key>:      is made up of two parts.  The first is related to the
            kind and the uuid for the link, pkg or run.  The run has
            two types either run_strt or run_fini to indicate the
            start or end record.
<idx>:      the index into the number of blocks.  It runs from 0 to nb-1.
<value>:    The idx's part of the syslog record.

The file py_src/xalt_syslog_to_db.in.py file contains the logic to combine the split apart values that make up the *.json record.

If a site is interested in processing the *.json record they may be better off using the file method and write the data to a shared global location.

XALT MySQL database

The XALT database has several tables. Since there are several many-to-many relationships, we have designed the database with “join” tables that reduce storing redundant information. We have four core tables: XALT_LINK, XALT_RUN, XALT_OBJECT and XALT_ENV_NAME. When a linker builds a program or shared library, a single entry is added to the XALT_LINK table. This is true every time there is a link even if user is building the same program over and over. This table contains information about the program such as the builder, the modules that where in the environment and what program was used to link it. Typically, there will be multiple libraries that constitute the built program either static or shared libraries. Each of these is stored in XALT_OBJECT. There is only one entry in XALT_OBJECT for each object path that has the same sha1sum. The join table JOIN_LINK_OBJECT connects the XALT_LINK entry to the object files that belong to the program.

Each time a program is run and it is tracked by XALT, we create a single entry in the XALT_RUN table. We take the shared libraries that belong to that executable and place them into the XALT_OBJECT table and connect to the XALT_RUN entry via the JOIN_RUN_OBJECT join table. We also record a filtered set of the environment variables in the XALT_ENV_NAME table and use the JOIN_RUN_ENV table to connect the values of the variable to the name and the program execution.

It is fair to ask why we store the shared libraries twice, once at link time and another at run time. The reason is we can track changes in versions of shared libraries automatically used by a program over time. When a program is built, it links with the libraries that exist at the current time. In the future, there may be a newer shared library that is used instead. We wish to track that. Also if an executable is not built under XALT, we will still capture the shared libraries.

Sites will notice since XALT is tracking both scalar and MPI executables that there can be more than one execution associated with single job number. This is quite common.