Adding the pg_catalog.pg_am system table to CrateDB
Table of content
- pg_am
- Defining the table meta data
- Registering the table
- Adapting tests
- Adding documentation
- Wrap up
In this article I’ll show you how I added the
pg_catalog.pg_am
system table to CrateDB.
pg_am ¶
In PostgreSQL the pg_catalog.pg_am
table exposes the
available index access methods within PostgreSQL. The table contains one
row per index access method supported by the system.
The motivation to add the table to CrateDB is for compatibility with
various PostgreSQL tools. CrateDB speaks the PostgreSQL wire protocol.
This makes it possible to use PostgreSQL clients with CrateDB. At least
in theory. In practice it is not always sufficient to only
support the wire protocol, it is also necessary to emulate a wide range
of tables within the pg_catalog
schema for clients to be
happy.
CrateDB internally works differently than PostgreSQL and there is no
clear one-to-one mapping of PostgreSQL index access methods to CrateDB
index access methods. So the initial implementation of
pg_catalog.pg_am
will return an empty result set.
Defining the table meta data ¶
In CrateDB the meta data of a table is held in memory in a structure
called TableInfo
. This TableInfo
provides
properties which describe the schema of the table. Most notable are a
property called ident :: RelationName
, which identifies the
table, and columns :: Collection<Reference>
which
describes the columns in the table.
There are already several system tables implemented in the
pg_catalog
schema. All of the meta data definitions reside
in the io.crate.metadata.pgcatalog
package. To add the
pg_am
table, we add a new class called
PgAmTable
within this package.
In the class we add a constant to declare its identifier:
public static final RelationName IDENT = new RelationName(PgCatalogSchemaInfo.NAME, "pg_am");
RelationName
is a structure that consists of the table
schema and the table name. Here the name of the schema is
pg_catalog
, and the table name itself is
pg_am
.
Next we need to provide a TableInfo
instance.
TableInfo
is an interface, but we need a concrete instance.
For system tables there is the SystemTable
class. This
class implements the TableInfo
instance and it contains a
convenient Builder
which makes it easy to create a
SystemTable
instance.
To create a Builder
instance we can use the
builder
factory method. It takes a
RelationName
instance as argument:
public static SystemTable<Void> create() {
return SystemTable.<Void>builder(IDENT)
.build()
}
This is the minimal definition of a system table - without any columns.
You see here that the SystemTable
takes a generic, which
I set to Void
. Usually system tables are supposed to
contain records. Data that the users can query. The generic type is the
type of a single record that the table is exposing. This can be any
plain old Java class.
Given that the pg_am
table is going to be empty I’m
using Void
.
The pg_am table in PostgreSQL 13 contains four columns:
oid
of typeoid
- the row identifieramname
of typename
- the name of the access methodamhandler
of typeregproc
- the oid of the handler function. It references thepg_proc
table.amtype
of typechar
. Identifies if the access method is for tables (t
) or indices (i
).
The SystemTable.builder
provides a
.add(name, type, getProperty)
method which can be used to
add the columns. The arguments in detail:
name
is the name of the column asString
type
is the data type of the column, which must be an instance ofDataType
getProperty
is a function that takes a single record of the table and is supposed to return the value for the given column for the record. The type of the argument is the type of the generic of theSystemTable
. In the case ofpg_am
it isVoid
. The type of the return value must match the type of theDataType
.
After adding the first column it looks like this:
public static SystemTable<Void> create() {
return SystemTable.<Void>builder(IDENT)
.add("oid", DataTypes.INTEGER, ignored -> null)
.build()
}
CrateDB doesn’t have a OID
type yet, which is why I used
INTEGER
here which has the same streaming implementation in
the wire protocol, so it is compatible.
After adding the other remaining columns it looks like this:
public static SystemTable<Void> create() {
return SystemTable.<Void>builder(IDENT)
.add("oid", DataTypes.INTEGER, ignored -> null)
.add("amname", DataTypes.STRING, ignored -> null)
.add("amhandler", DataTypes.REGPROC, ignored -> null)
.add("amtype", DataTypes.BYTE, ignored -> null)
.build();
}
Each getProperty
function ignores the argument and
returns null
as value. We can do that because we know there
aren’t going to be any rows anyways.
Registering the table ¶
Next we need to register the table within the schema so that CrateDB
is aware of its existence. There is one SchemaInfo
instance
per system schema. SchemaInfo
is an interface that allows
the system to retrieve TableInfo
instances.
For pg_catalog
the implementation is called
PgCatalogSchemaInfo
. There we find a
tableInfoMap
definition which lists all existing tables
within the pg_catalog
. There we have to add a new entry for
the pg_am
table:
Map.entry(PgAmTable.IDENT.name(), PgAmTable.create())
With that we wired up the meta data definition, but we also need to
tell CrateDB how it receives the data for the table. For the
pg_catalog
tables, this happens within the
PgCatalogTableDefinitions
class. Within this class we find
a map from RelationName
to
StaticTableDefinition
. This
StaticTableDefinition
defines how to retrieve the records
of a table and how to evaluate its columns.
We add a new entry to this map:
.put(PgAmTable.IDENT, new StaticTableDefinition<>(
tableDefinitions() -> completedFuture(emptyList()),
.create().expressions(),
PgAmTablefalse
));
The StaticTableDefinition
constructor has several
overloads, I took the one that best fits the case. The first argument is
a supplier that is supposed to generate the
Iterable<T>
which encapsulates the data of the table.
T
here refers to the generic we used in the
SystemTable
definition, which we set to Void
.
It is a supplier so that the computation can be triggered once a user
queries the table and it returns a future to enable implementations to
do network IO without blocking a thread. For the pg_am
table it can return an empty list because there won’t be any
records.
The second argument is a map from ColumnIdent
to
RowCollectExpressionFactory
. ColumnIdent
is
used to identify columns. It can describe both, top level columns like
name
, and nested columns of objects, like
address['zip']
. The
RowCollectExpressionFactory
is used by the expression
interpreter which is responsible for evaluating any kind of expressions
in CrateDB. Our SystemTable
implementation that we already
created provides a expressions()
property that returns the
required map based on the column definition we created earlier.
The last argument tells CrateDB that the data retrieval for the table does not involve any IO.
With this registration in place, CrateDB knows about the schema of the table and it knows how to generate the data for it.
This is enough for users to query the table.
Adapting tests ¶
Now that we’ve added a new table several tests will be failing. These tests query the amount of tables or columns within the CrateDB cluster or list all the available tables and now that there is a new table and several new columns, the expected output is no longer valid.
Adding documentation ¶
The last remaining step is to add some documentation for the new table. The documentation of CrateDB is rewritten in restructured text and is built using Sphinx.
For most system tables there is a longer description of the tables
contents and columns, but for pg_catalog
tables we’re a bit
lazy and only list them within the
docs/interfaces/postgres.rst
file and refer to the
PostgreSQL documentation.
Wrap up ¶
All that’s left is to open up the pull-request and get it merged.