Adding the pg_catalog.pg_am system table to CrateDB
Table of content
- pg_am
- Defining the table meta data
- Registering the table
- Adapting tests
- Adding documentation
- Wrap up
In this article I’ll show you how I added the
pg_catalog.pg_am system table to CrateDB.
pg_am ¶
In PostgreSQL the pg_catalog.pg_am table exposes the
available index access methods within PostgreSQL. The table contains one
row per index access method supported by the system.
The motivation to add the table to CrateDB is for compatibility with
various PostgreSQL tools. CrateDB speaks the PostgreSQL wire protocol.
This makes it possible to use PostgreSQL clients with CrateDB. At least
in theory. In practice it is not always sufficient to only
support the wire protocol, it is also necessary to emulate a wide range
of tables within the pg_catalog schema for clients to be
happy.
CrateDB internally works differently than PostgreSQL and there is no
clear one-to-one mapping of PostgreSQL index access methods to CrateDB
index access methods. So the initial implementation of
pg_catalog.pg_am will return an empty result set.
Defining the table meta data ¶
In CrateDB the meta data of a table is held in memory in a structure
called TableInfo. This TableInfo provides
properties which describe the schema of the table. Most notable are a
property called ident :: RelationName, which identifies the
table, and columns :: Collection<Reference> which
describes the columns in the table.
There are already several system tables implemented in the
pg_catalog schema. All of the meta data definitions reside
in the io.crate.metadata.pgcatalog package. To add the
pg_am table, we add a new class called
PgAmTable within this package.
In the class we add a constant to declare its identifier:
public static final RelationName IDENT = new RelationName(PgCatalogSchemaInfo.NAME, "pg_am");RelationName is a structure that consists of the table
schema and the table name. Here the name of the schema is
pg_catalog, and the table name itself is
pg_am.
Next we need to provide a TableInfo instance.
TableInfo is an interface, but we need a concrete instance.
For system tables there is the SystemTable class. This
class implements the TableInfo instance and it contains a
convenient Builder which makes it easy to create a
SystemTable instance.
To create a Builder instance we can use the
builder factory method. It takes a
RelationName instance as argument:
public static SystemTable<Void> create() {
return SystemTable.<Void>builder(IDENT)
.build()
}This is the minimal definition of a system table - without any columns.
You see here that the SystemTable takes a generic, which
I set to Void. Usually system tables are supposed to
contain records. Data that the users can query. The generic type is the
type of a single record that the table is exposing. This can be any
plain old Java class.
Given that the pg_am table is going to be empty I’m
using Void.
The pg_am table in PostgreSQL 13 contains four columns:
oidof typeoid- the row identifieramnameof typename- the name of the access methodamhandlerof typeregproc- the oid of the handler function. It references thepg_proctable.amtypeof typechar. Identifies if the access method is for tables (t) or indices (i).
The SystemTable.builder provides a
.add(name, type, getProperty) method which can be used to
add the columns. The arguments in detail:
nameis the name of the column asStringtypeis the data type of the column, which must be an instance ofDataTypegetPropertyis a function that takes a single record of the table and is supposed to return the value for the given column for the record. The type of the argument is the type of the generic of theSystemTable. In the case ofpg_amit isVoid. The type of the return value must match the type of theDataType.
After adding the first column it looks like this:
public static SystemTable<Void> create() {
return SystemTable.<Void>builder(IDENT)
.add("oid", DataTypes.INTEGER, ignored -> null)
.build()
}CrateDB doesn’t have a OID type yet, which is why I used
INTEGER here which has the same streaming implementation in
the wire protocol, so it is compatible.
After adding the other remaining columns it looks like this:
public static SystemTable<Void> create() {
return SystemTable.<Void>builder(IDENT)
.add("oid", DataTypes.INTEGER, ignored -> null)
.add("amname", DataTypes.STRING, ignored -> null)
.add("amhandler", DataTypes.REGPROC, ignored -> null)
.add("amtype", DataTypes.BYTE, ignored -> null)
.build();
}Each getProperty function ignores the argument and
returns null as value. We can do that because we know there
aren’t going to be any rows anyways.
Registering the table ¶
Next we need to register the table within the schema so that CrateDB
is aware of its existence. There is one SchemaInfo instance
per system schema. SchemaInfo is an interface that allows
the system to retrieve TableInfo instances.
For pg_catalog the implementation is called
PgCatalogSchemaInfo. There we find a
tableInfoMap definition which lists all existing tables
within the pg_catalog. There we have to add a new entry for
the pg_am table:
Map.entry(PgAmTable.IDENT.name(), PgAmTable.create()) With that we wired up the meta data definition, but we also need to
tell CrateDB how it receives the data for the table. For the
pg_catalog tables, this happens within the
PgCatalogTableDefinitions class. Within this class we find
a map from RelationName to
StaticTableDefinition. This
StaticTableDefinition defines how to retrieve the records
of a table and how to evaluate its columns.
We add a new entry to this map:
tableDefinitions.put(PgAmTable.IDENT, new StaticTableDefinition<>(
() -> completedFuture(emptyList()),
PgAmTable.create().expressions(),
false
));The StaticTableDefinition constructor has several
overloads, I took the one that best fits the case. The first argument is
a supplier that is supposed to generate the
Iterable<T> which encapsulates the data of the table.
T here refers to the generic we used in the
SystemTable definition, which we set to Void.
It is a supplier so that the computation can be triggered once a user
queries the table and it returns a future to enable implementations to
do network IO without blocking a thread. For the pg_am
table it can return an empty list because there won’t be any
records.
The second argument is a map from ColumnIdent to
RowCollectExpressionFactory. ColumnIdent is
used to identify columns. It can describe both, top level columns like
name, and nested columns of objects, like
address['zip']. The
RowCollectExpressionFactory is used by the expression
interpreter which is responsible for evaluating any kind of expressions
in CrateDB. Our SystemTable implementation that we already
created provides a expressions() property that returns the
required map based on the column definition we created earlier.
The last argument tells CrateDB that the data retrieval for the table does not involve any IO.
With this registration in place, CrateDB knows about the schema of the table and it knows how to generate the data for it.
This is enough for users to query the table.
Adapting tests ¶
Now that we’ve added a new table several tests will be failing. These tests query the amount of tables or columns within the CrateDB cluster or list all the available tables and now that there is a new table and several new columns, the expected output is no longer valid.
Adding documentation ¶
The last remaining step is to add some documentation for the new table. The documentation of CrateDB is rewritten in restructured text and is built using Sphinx.
For most system tables there is a longer description of the tables
contents and columns, but for pg_catalog tables we’re a bit
lazy and only list them within the
docs/interfaces/postgres.rst file and refer to the
PostgreSQL documentation.
Wrap up ¶
All that’s left is to open up the pull-request and get it merged.