Using tshark to monitor PostgreSQL traffic
As part of my work on CrateDB I occasionally have to debug its PostgreSQL wire protocol implementation. One tool that has been incredibly helpful for that is tshark, which is part of Wireshark.
What is tshark ¶
tshark
is a network protocol analyzer. It lets you listen to network traffic
or read packets from a previously saved capture file.
One nice thing about it is that it comes with support for binary protocols like the PostgreSQL wire protocol. That allows you to view decoded PostgreSQL messages.
Running PostgreSQL ¶
Setup PostgreSQL locally by following its installation instructions or by following the instructions from your distribution.
PostgreSQL can communicate with clients either via Unix socket or via TCP. To be able to observe the traffic with tshark you need to connect via TCP.
First confirm that PostgreSQL is listening on a network interface by viewing its startup log messages. I use journalctl -u postgresql -e
to view the logs. You should see something like this:
listening on IPv6 address "::1", port 5432
listening on IPv4 address "127.0.0.1", port 5432
listening on Unix socket "/run/postgresql/.s.PGSQL.5432"
The IPV4 or IPv6 entries is what we are after. 127.0.0.1
is the loopback device. If you see a different IP address and are on Linux, you can run ip addr
and look for the IP to find the corresponding device.
With that information, we can move on to using tshark
.
Using tshark ¶
Install tshark
using your favorite package manager. In Archlinux it is part of the wireshark-cli
package: pacman -S wireshark-cli
.
Invoke tshark
like this:
tshark -i lo \
-f 'tcp port 5432' \
-d tcp.port==5432,pgsql \
-T fields -e pgsql.length -e pgsql.type -e pgsql.query
- With
-i
you can specify the interface to listen on.lo
is the loopback device. -f
is used to define a capture filter. Packets that don’t match this filter are discarded.-d
is for the decoding feature.-d tcp.port==5432,pgsql
reads as “Decode any TCP traffic on port 5432 aspgsql
”. You can runtshark -d .
to get a list of available layers. To get a list of available protocols it can decode, you’d use something like this:-d tcp.port==5432,.
.-T
sets the output format for the decoded package data. In this case we usefields
, but you could also use-T json
to get JSON output.-e <field>
adds a field to the list of fields to display. You can repeat this option multiple times to select all fields you want to output.
All of this information and lots more is available in its man page.
To learn what fields are available you can refer to the Display Filter Reference. The one for pgsql
are listed here.
You should see some output like “Capturing on ‘Loopback: lo’”. Leave this command running.
Connecting with a client ¶
The only step missing is to generate some traffic. Connect with psql
but make sure to use TCP: psql -h localhost
.
This should generate some output in the terminal where tshark
is running. You should see something like this:
8 Startup message
8,26,25,23,38,23,25,27,20,25,24,32,35,27,12,5 Authentication request,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Backend key data,Ready for query
37 Simple query set client_encoding to 'unicode'
8,5 Command completion,Ready for query
The exact output will depend on your -T
and -e
settings and your PostgreSQL version.
That’s it. If you’re interesting in learning more about what this output means, head over to the Protocol Documentation.