Adding the pg_catalog.pg_am system table to CrateDB

Wednesday, December 9, 2020

In this article I’ll show you how I added the pg_catalog.pg_am system table to CrateDB.

pg_am ¶

In PostgreSQL the pg_catalog.pg_am table exposes the available index access methods within PostgreSQL. The table contains one row per index access method supported by the system.

The motivation to add the table to CrateDB is for compatibility with various PostgreSQL tools. CrateDB speaks the PostgreSQL wire protocol. This makes it possible to use PostgreSQL clients with CrateDB. At least in theory. In practice it is not always sufficient to only support the wire protocol, it is also necessary to emulate a wide range of tables within the pg_catalog schema for clients to be happy.

CrateDB internally works differently than PostgreSQL and there is no clear one-to-one mapping of PostgreSQL index access methods to CrateDB index access methods. So the initial implementation of pg_catalog.pg_am will return an empty result set.

Defining the table meta data ¶

In CrateDB the meta data of a table is held in memory in a structure called TableInfo. This TableInfo provides properties which describe the schema of the table. Most notable are a property called ident :: RelationName, which identifies the table, and columns :: Collection<Reference> which describes the columns in the table.

There are already several system tables implemented in the pg_catalog schema. All of the meta data definitions reside in the io.crate.metadata.pgcatalog package. To add the pg_am table, we add a new class called PgAmTable within this package.

In the class we add a constant to declare its identifier:

public static final RelationName IDENT = new RelationName(PgCatalogSchemaInfo.NAME, "pg_am");

RelationName is a structure that consists of the table schema and the table name. Here the name of the schema is pg_catalog, and the table name itself is pg_am.

Next we need to provide a TableInfo instance. TableInfo is an interface, but we need a concrete instance. For system tables there is the SystemTable class. This class implements the TableInfo instance and it contains a convenient Builder which makes it easy to create a SystemTable instance.

To create a Builder instance we can use the builder factory method. It takes a RelationName instance as argument:

public static SystemTable<Void> create() {
  return SystemTable.<Void>builder(IDENT)
      .build()
}

This is the minimal definition of a system table - without any columns.

You see here that the SystemTable takes a generic, which I set to Void. Usually system tables are supposed to contain records. Data that the users can query. The generic type is the type of a single record that the table is exposing. This can be any plain old Java class.

Given that the pg_am table is going to be empty I’m using Void.

The pg_am table in PostgreSQL 13 contains four columns:

oid of type oid - the row identifier
amname of type name - the name of the access method
amhandler of type regproc - the oid of the handler function. It references the pg_proc table.
amtype of type char. Identifies if the access method is for tables (t) or indices (i).

The SystemTable.builder provides a .add(name, type, getProperty) method which can be used to add the columns. The arguments in detail:

name is the name of the column as String
type is the data type of the column, which must be an instance of DataType
getProperty is a function that takes a single record of the table and is supposed to return the value for the given column for the record. The type of the argument is the type of the generic of the SystemTable. In the case of pg_am it is Void. The type of the return value must match the type of the DataType.

After adding the first column it looks like this:

public static SystemTable<Void> create() {
  return SystemTable.<Void>builder(IDENT)
      .add("oid", DataTypes.INTEGER, ignored -> null)
      .build()
}

CrateDB doesn’t have a OID type yet, which is why I used INTEGER here which has the same streaming implementation in the wire protocol, so it is compatible.

After adding the other remaining columns it looks like this:

public static SystemTable<Void> create() {
    return SystemTable.<Void>builder(IDENT)
        .add("oid", DataTypes.INTEGER, ignored -> null)
        .add("amname", DataTypes.STRING, ignored -> null)
        .add("amhandler", DataTypes.REGPROC, ignored -> null)
        .add("amtype", DataTypes.BYTE, ignored -> null)
        .build();
}

Each getProperty function ignores the argument and returns null as value. We can do that because we know there aren’t going to be any rows anyways.

Registering the table ¶

Next we need to register the table within the schema so that CrateDB is aware of its existence. There is one SchemaInfo instance per system schema. SchemaInfo is an interface that allows the system to retrieve TableInfo instances.

For pg_catalog the implementation is called PgCatalogSchemaInfo. There we find a tableInfoMap definition which lists all existing tables within the pg_catalog. There we have to add a new entry for the pg_am table:

Map.entry(PgAmTable.IDENT.name(), PgAmTable.create())

With that we wired up the meta data definition, but we also need to tell CrateDB how it receives the data for the table. For the pg_catalog tables, this happens within the PgCatalogTableDefinitions class. Within this class we find a map from RelationName to StaticTableDefinition. This StaticTableDefinition defines how to retrieve the records of a table and how to evaluate its columns.

We add a new entry to this map:

tableDefinitions.put(PgAmTable.IDENT, new StaticTableDefinition<>(
    () -> completedFuture(emptyList()),
    PgAmTable.create().expressions(),
    false
));

The StaticTableDefinition constructor has several overloads, I took the one that best fits the case. The first argument is a supplier that is supposed to generate the Iterable<T> which encapsulates the data of the table. T here refers to the generic we used in the SystemTable definition, which we set to Void. It is a supplier so that the computation can be triggered once a user queries the table and it returns a future to enable implementations to do network IO without blocking a thread. For the pg_am table it can return an empty list because there won’t be any records.

The second argument is a map from ColumnIdent to RowCollectExpressionFactory. ColumnIdent is used to identify columns. It can describe both, top level columns like name, and nested columns of objects, like address['zip']. The RowCollectExpressionFactory is used by the expression interpreter which is responsible for evaluating any kind of expressions in CrateDB. Our SystemTable implementation that we already created provides a expressions() property that returns the required map based on the column definition we created earlier.

The last argument tells CrateDB that the data retrieval for the table does not involve any IO.

With this registration in place, CrateDB knows about the schema of the table and it knows how to generate the data for it.

This is enough for users to query the table.

Adapting tests ¶

Now that we’ve added a new table several tests will be failing. These tests query the amount of tables or columns within the CrateDB cluster or list all the available tables and now that there is a new table and several new columns, the expected output is no longer valid.

Adding documentation ¶

The last remaining step is to add some documentation for the new table. The documentation of CrateDB is rewritten in restructured text and is built using Sphinx.

For most system tables there is a longer description of the tables contents and columns, but for pg_catalog tables we’re a bit lazy and only list them within the docs/interfaces/postgres.rst file and refer to the PostgreSQL documentation.

Wrap up ¶

All that’s left is to open up the pull-request and get it merged.