Using tshark to monitor PostgreSQL traffic
Table of content
As part of my work on CrateDB I occasionally have to debug its PostgreSQL wire protocol implementation. One tool that has been incredibly helpful for that is tshark, which is part of Wireshark.
What is tshark ¶
tshark
is a network protocol analyzer. It lets you
listen to network traffic or read packets from a previously saved
capture file.
One nice thing about it is that it comes with support for binary protocols like the PostgreSQL wire protocol. That allows you to view decoded PostgreSQL messages.
Running PostgreSQL ¶
Setup PostgreSQL locally by following its installation instructions or by following the instructions from your distribution.
PostgreSQL can communicate with clients either via Unix socket or via
TCP. To be able to observe the traffic with tshark you need to connect
via TCP. First confirm that PostgreSQL is listening on a network
interface by viewing its startup log messages. I use
journalctl -u postgresql -e
to view the logs. You should
see something like this:
listening on IPv6 address "::1", port 5432
listening on IPv4 address "127.0.0.1", port 5432
listening on Unix socket "/run/postgresql/.s.PGSQL.5432"
The IPV4 or IPv6 entries is what we are after. 127.0.0.1
is the loopback device. If you see a different IP address and are on
Linux, you can run ip addr
and look for the IP to find the
corresponding device.
With that information, we can move on to using
tshark
.
Using tshark ¶
Install tshark
using your favorite package manager. In
Archlinux it is part of the wireshark-cli
package:
pacman -S wireshark-cli
.
Invoke tshark
like this:
tshark -i lo \
-f 'tcp port 5432' \
-d tcp.port==5432,pgsql \
-T fields -e pgsql.length -e pgsql.type -e pgsql.query
- With
-i
you can specify the interface to listen on.lo
is the loopback device. -f
is used to define a capture filter. Packets that don’t match this filter are discarded.-d
is for the decoding feature.-d tcp.port==5432,pgsql
reads as “Decode any TCP traffic on port 5432 aspgsql
”. You can runtshark -d .
to get a list of available layers. To get a list of available protocols it can decode, you’d use something like this:-d tcp.port==5432,.
.-T
sets the output format for the decoded package data. In this case we usefields
, but you could also use-T json
to get JSON output.-e <field>
adds a field to the list of fields to display. You can repeat this option multiple times to select all fields you want to output.
All of this information and lots more is available in its man page.
To learn what fields are available you can refer to the Display Filter
Reference. The one for pgsql
are listed here.
You should see some output like “Capturing on ‘Loopback: lo’”. Leave this command running.
Connecting with a client ¶
The only step missing is to generate some traffic. Connect with
psql
but make sure to use TCP:
psql -h localhost
. This should generate some output in the
terminal where tshark
is running. You should see something
like this:
8 Startup message
8,26,25,23,38,23,25,27,20,25,24,32,35,27,12,5 Authentication request,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Parameter status,Backend key data,Ready for query
37 Simple query set client_encoding to 'unicode'
8,5 Command completion,Ready for query
The exact output will depend on your -T
and
-e
settings and your PostgreSQL version.
That’s it. If you’re interesting in learning more about what this output means, head over to the Protocol Documentation.