Adding the pg_catalog.pg_am system table to CrateDB
In this article I’ll show you how I added the pg_catalog.pg_am
system table to CrateDB.
pg_am ¶
In PostgreSQL the pg_catalog.pg_am
table exposes the available index access methods within PostgreSQL. The table contains one row per index access method supported by the system.
The motivation to add the table to CrateDB is for compatibility with various PostgreSQL tools. CrateDB speaks the PostgreSQL wire protocol. This makes it possible to use PostgreSQL clients with CrateDB. At least in theory. In practice it is not always sufficient to only support the wire protocol, it is also necessary to emulate a wide range of tables within the pg_catalog
schema for clients to be happy.
CrateDB internally works differently than PostgreSQL and there is no clear one-to-one mapping of PostgreSQL index access methods to CrateDB index access methods. So the initial implementation of pg_catalog.pg_am
will return an empty result set.
Defining the table meta data ¶
In CrateDB the meta data of a table is held in memory in a structure called TableInfo
. This TableInfo
provides properties which describe the schema of the table. Most notable are a property called ident :: RelationName
, which identifies the table, and columns :: Collection<Reference>
which describes the columns in the table.
There are already several system tables implemented in the pg_catalog
schema. All of the meta data definitions reside in the io.crate.metadata.pgcatalog
package. To add the pg_am
table, we add a new class called PgAmTable
within this package.
In the class we add a constant to declare its identifier:
public static final RelationName IDENT = new RelationName(PgCatalogSchemaInfo.NAME, "pg_am");
RelationName
is a structure that consists of the table schema and the table name. Here the name of the schema is pg_catalog
, and the table name itself is pg_am
.
Next we need to provide a TableInfo
instance. TableInfo
is an interface, but we need a concrete instance. For system tables there is the SystemTable
class. This class implements the TableInfo
instance and it contains a convenient Builder
which makes it easy to create a SystemTable
instance.
To create a Builder
instance we can use the builder
factory method. It takes a RelationName
instance as argument:
public static SystemTable<Void> create() {
return SystemTable.<Void>builder(IDENT)
.build()
}
This is the minimal definition of a system table - without any columns.
You see here that the SystemTable
takes a generic, which I set to Void
.
Usually system tables are supposed to contain records. Data that the users can query. The generic type is the type of a single record that the table is exposing. This can be any plain old Java class.
Given that the pg_am
table is going to be empty I’m using Void
.
The pg_am table in PostgreSQL 13 contains four columns:
oid
of typeoid
- the row identifieramname
of typename
- the name of the access methodamhandler
of typeregproc
- the oid of the handler function. It references thepg_proc
table.amtype
of typechar
. Identifies if the access method is for tables (t
) or indices (i
).
The SystemTable.builder
provides a .add(name, type, getProperty)
method which can be used to add the columns. The arguments in detail:
name
is the name of the column asString
type
is the data type of the column, which must be an instance ofDataType
getProperty
is a function that takes a single record of the table and is supposed to return the value for the given column for the record. The type of the argument is the type of the generic of theSystemTable
. In the case ofpg_am
it isVoid
. The type of the return value must match the type of theDataType
.
After adding the first column it looks like this:
public static SystemTable<Void> create() {
return SystemTable.<Void>builder(IDENT)
.add("oid", DataTypes.INTEGER, ignored -> null)
.build()
}
CrateDB doesn’t have a OID
type yet, which is why I used INTEGER
here which has the same streaming implementation in the wire protocol, so it is compatible.
After adding the other remaining columns it looks like this:
public static SystemTable<Void> create() {
return SystemTable.<Void>builder(IDENT)
.add("oid", DataTypes.INTEGER, ignored -> null)
.add("amname", DataTypes.STRING, ignored -> null)
.add("amhandler", DataTypes.REGPROC, ignored -> null)
.add("amtype", DataTypes.BYTE, ignored -> null)
.build();
}
Each getProperty
function ignores the argument and returns null
as value. We can do that because we know there aren’t going to be any rows anyways.
Registering the table ¶
Next we need to register the table within the schema so that CrateDB is aware of its existence. There is one SchemaInfo
instance per system schema. SchemaInfo
is an interface that allows the system to retrieve TableInfo
instances.
For pg_catalog
the implementation is called PgCatalogSchemaInfo
. There we find a tableInfoMap
definition which lists all existing tables within the pg_catalog
. There we have to add a new entry for the pg_am
table:
Map.entry(PgAmTable.IDENT.name(), PgAmTable.create())
With that we wired up the meta data definition, but we also need to tell CrateDB how it receives the data for the table. For the pg_catalog
tables, this happens within the PgCatalogTableDefinitions
class. Within this class we find a map from RelationName
to StaticTableDefinition
. This StaticTableDefinition
defines how to retrieve the records of a table and how to evaluate its columns.
We add a new entry to this map:
tableDefinitions.put(PgAmTable.IDENT, new StaticTableDefinition<>(
() -> completedFuture(emptyList()),
PgAmTable.create().expressions(),
false
));
The StaticTableDefinition
constructor has several overloads, I took the one that best fits the case. The first argument is a supplier that is supposed to generate the Iterable<T>
which encapsulates the data of the table. T
here refers to the generic we used in the SystemTable
definition, which we set to Void
. It is a supplier so that the computation can be triggered once a user queries the table and it returns a future to enable implementations to do network IO without blocking a thread. For the pg_am
table it can return an empty list because there won’t be any records.
The second argument is a map from ColumnIdent
to RowCollectExpressionFactory
. ColumnIdent
is used to identify columns. It can describe both, top level columns like name
, and nested columns of objects, like address['zip']
. The RowCollectExpressionFactory
is used by the expression interpreter which is responsible for evaluating any kind of expressions in CrateDB. Our SystemTable
implementation that we already created provides a expressions()
property that returns the required map based on the column definition we created earlier.
The last argument tells CrateDB that the data retrieval for the table does not involve any IO.
With this registration in place, CrateDB knows about the schema of the table and it knows how to generate the data for it.
This is enough for users to query the table.
Adapting tests ¶
Now that we’ve added a new table several tests will be failing. These tests query the amount of tables or columns within the CrateDB cluster or list all the available tables and now that there is a new table and several new columns, the expected output is no longer valid.
Adding documentation ¶
The last remaining step is to add some documentation for the new table. The documentation of CrateDB is rewritten in restructured text and is built using Sphinx.
For most system tables there is a longer description of the tables contents and columns, but for pg_catalog
tables we’re a bit lazy and only list them within the docs/interfaces/postgres.rst
file and refer to the PostgreSQL documentation.
Wrap up ¶
All that’s left is to open up the pull-request and get it merged.