Using tshark to monitor PostgreSQL traffic

  Saturday, September 24, 2022

As part of my work on CrateDB I occasionally have to debug its PostgreSQL wire protocol implementation. One tool that has been incredibly helpful for that is tshark, which is part of Wireshark.

What is tshark

tshark is a network protocol analyzer. It lets you listen to network traffic or read packets from a previously saved capture file.

One nice thing about it is that it comes with support for binary protocols like the PostgreSQL wire protocol. That allows you to view decoded PostgreSQL messages.

Running PostgreSQL

Setup PostgreSQL locally by following its installation instructions or by following the instructions from your distribution.

PostgreSQL can communicate with clients either via Unix socket or via TCP. To be able to observe the traffic with tshark you need to connect via TCP. First confirm that PostgreSQL is listening on a network interface by viewing its startup log messages. I use journalctl -u postgresql -e to view the logs. You should see something like this:

listening on IPv6 address "::1", port 5432
listening on IPv4 address "127.0.0.1", port 5432
listening on Unix socket "/run/postgresql/.s.PGSQL.5432"

The IPV4 or IPv6 entries is what we are after. 127.0.0.1 is the loopback device. If you see a different IP address and are on Linux, you can run ip addr and look for the IP to find the corresponding device.

With that information, we can move on to using tshark.

Using tshark

Install tshark using your favorite package manager. In Archlinux it is part of the wireshark-cli package: pacman -S wireshark-cli.

Invoke tshark like this:

tshark -i lo \
  -f 'tcp port 5432' \
  -d tcp.port==5432,pgsql \
  -T fields -e pgsql.length -e pgsql.type -e pgsql.query
  • With -i you can specify the interface to listen on. lo is the loopback device.
  • -f is used to define a capture filter. Packets that don’t match this filter are discarded.
  • -d is for the decoding feature. -d tcp.port==5432,pgsql reads as “Decode any TCP traffic on port 5432 as pgsql”. You can run tshark -d . to get a list of available layers. To get a list of available protocols it can decode, you’d use something like this: -d tcp.port==5432,..
  • -T sets the output format for the decoded package data. In this case we use fields, but you could also use -T json to get JSON output.
  • -e <field> adds a field to the list of fields to display. You can repeat this option multiple times to select all fields you want to output.

All of this information and lots more is available in its man page.

To learn what fields are available you can refer to the Display Filter Reference. The one for pgsql are listed here.

You should see some output like “Capturing on ‘Loopback: lo’”. Leave this command running.

Connecting with a client

The only step missing is to generate some traffic. Connect with psql but make sure to use TCP: psql -h localhost. This should generate some output in the terminal where tshark is running. You should see something like this:

8	Startup message	
8,26,25,23,38,23,25,27,20,25,24,32,35,27,12,5	Authentication request,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Backend key data,Ready for query
37	Simple query	set client_encoding to 'unicode'
8,5	Command completion,Ready for query	

The exact output will depend on your -T and -e settings and your PostgreSQL version.

That’s it. If you’re interesting in learning more about what this output means, head over to the Protocol Documentation.