A tree sitting in your editor
This is a short introduction to the tree-sitter integration in Neovim based on a use-case I had: finding content in a TOML file close to the cursor position and then launching an application using this information.
The use-case
I wanted to be able to trigger an application from within Neovim and pass along information based on file contents close to the cursor’s position.
I wanted to do this from within TOML files. An example looks like this:
[setup]
statement_files = ["sql/uservisits.sql"]

[[queries]]
name = "global avg"
statement = '''select avg("adRevenue") from uservisits'''
iterations = 500

[[queries]]
name = "global max-long"
statement = "select max(duration) from uservisits"
iterations = 500

[teardown]
statements = ["drop table if exists uservisits"]
If I hit a key combination, it should detect whether the cursor is within a [setup], [teardown] or [[queries]] block. If it’s within [[queries]], it should extract the name value from that block and launch an application with the value as an argument.
Most TOML parser libraries don’t preserve whitespace or position information; instead they provide the result as a data dictionary. Without that information it would be difficult to figure out how the cursor position relates to the data. Therefore a regular parser library is out of the question. This is where tree-sitter comes in.
What’s tree-sitter?
From the Tree-sitter website:
"Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited."
The tree-sitter runtime library is written in C. That made it possible for Neovim to embed it and provide an API for users and plugin authors to retrieve and query the syntax tree of a document.
To use tree-sitter you need to install language-specific parsers. You can learn more about that in nvim-treesitter.
A syntax tree
Parsing the above TOML example yields a syntax tree like this:
table [0, 0] - [3, 0]
  bare_key [0, 1] - [0, 6]
  pair [1, 0] - [1, 40]
    bare_key [1, 0] - [1, 15]
    array [1, 18] - [1, 40]
      string [1, 19] - [1, 39]
table_array_element [3, 0] - [8, 0]
  bare_key [3, 2] - [3, 9]
  pair [4, 0] - [4, 19]
    bare_key [4, 0] - [4, 4]
    string [4, 7] - [4, 19]
  pair [5, 0] - [5, 57]
    bare_key [5, 0] - [5, 9]
    string [5, 12] - [5, 57]
  pair [6, 0] - [6, 16]
    bare_key [6, 0] - [6, 10]
    integer [6, 13] - [6, 16]
table_array_element [8, 0] - [13, 0]
  bare_key [8, 2] - [8, 9]
  pair [9, 0] - [9, 24]
    bare_key [9, 0] - [9, 4]
    string [9, 7] - [9, 24]
  pair [10, 0] - [10, 50]
    bare_key [10, 0] - [10, 9]
    string [10, 12] - [10, 50]
  pair [11, 0] - [11, 16]
    bare_key [11, 0] - [11, 10]
    integer [11, 13] - [11, 16]
table [13, 0] - [15, 0]
  bare_key [13, 1] - [13, 9]
  pair [14, 0] - [14, 48]
    bare_key [14, 0] - [14, 10]
    array [14, 13] - [14, 48]
      string [14, 14] - [14, 47]
At first glance this may look like gibberish. What you see here are the names of the syntax nodes and their positions within the document. In the square brackets you see [start_row, start_col] and [end_row, end_col], both counted from 0. The exact syntax nodes always depend on the concrete parser implementation and the language grammar. The tree for a Python program will look different.
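To make the position ranges more concrete, here is a small sketch in plain Lua that checks whether a position falls inside a node’s range. The function name contains is made up for this illustration and is not part of the Neovim API; it relies on tree-sitter’s end position pointing just past the last character of a node.

```lua
-- Returns true if the 0-based position (row, col) lies within the range
-- [start_row, start_col] - [end_row, end_col]; the end is treated as exclusive.
local function contains(range, row, col)
  local start_row, start_col, end_row, end_col = range[1], range[2], range[3], range[4]
  local after_start = row > start_row or (row == start_row and col >= start_col)
  local before_end = row < end_row or (row == end_row and col < end_col)
  return after_start and before_end
end

-- Ranges taken from the tree above.
local first_query = { 3, 0, 8, 0 } -- first table_array_element
local setup_key = { 0, 1, 0, 6 }   -- bare_key "setup"

print(contains(first_query, 4, 7)) --> true  (inside the first [[queries]] block)
print(contains(setup_key, 0, 6))   --> false (column 6 is just past "setup")
```

Neovim can do this lookup for you, so you rarely need such a check yourself, but it shows how the numbers in the tree relate to a cursor position.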
In this example the top-level nodes are:
- table for [setup] and [teardown]
- table_array_element for [[queries]]
To make it easier to learn how text relates to syntax nodes you can use the playground plugin for Neovim. It helps you inspect a syntax tree by highlighting the related text as you navigate through the tree. (Neovim 0.9 adds an :InspectTree command that can do roughly the same.)
Now, how can you use this syntax tree to find the needed information?
Neovim tree-sitter API
Usually the best way to learn about APIs in Neovim is to use the built-in help system. For tree-sitter that would be :help lua-treesitter. I’ll go over the main components now, so you don’t need to consult the help page immediately, but if you intend to play around with tree-sitter yourself, make sure to read it eventually.
First we need a parser. We can get that by using the get_parser function within the vim.treesitter module:
local parser = vim.treesitter.get_parser()
The parser will be bound to the current buffer/document and we can immediately parse the document using the parse method:
local trees = parser:parse()
The method returns a list of trees (in Lua: a table), not a single tree. This is because in languages like Markdown you can have nested languages, and you’d get a tree for each.
My TOML files don’t contain nested languages, so I ignore that, use the first tree, and retrieve its root node:
local root = trees[1]:root()
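The three steps above can be wrapped in a small helper. This is a minimal sketch: the name get_root is hypothetical, and it assumes the code runs inside Neovim with a parser installed for the buffer’s language.

```lua
-- Hypothetical helper: returns the root node of the first syntax tree of a buffer.
-- Assumes it runs inside Neovim with a parser installed for the buffer's language.
local function get_root(bufnr)
  -- Passing a buffer number binds the parser to that buffer (0 means the current one).
  local parser = vim.treesitter.get_parser(bufnr)
  -- parse() returns a table of trees; we only look at the first one.
  local trees = parser:parse()
  return trees[1]:root()
end
```

Calling get_root(0) inside a TOML buffer should give the node that the following steps start from.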
Neovim provides a neat function that lets us retrieve the smallest node spanning a given range. We can use this function to get the node containing the cursor. But first we need the cursor location. We can use the nvim_win_get_cursor function for that. It takes the window ID as its argument and accepts 0 for the current window.
local lnum, col = unpack(vim.api.nvim_win_get_cursor(0))
It returns a tuple of (row, col) with rows starting at 1 and columns at 0. tree-sitter uses 0-based indexes for both, so we need to subtract one from the row:
lnum = lnum - 1
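The index conversion is easy to get wrong, so it can be worth isolating. The following is a sketch in plain Lua; the helper name to_ts_position is an assumption made for this example and not part of any API.

```lua
-- Lua 5.1/LuaJIT (used by Neovim) has a global unpack; newer Lua has table.unpack.
local unpack = table.unpack or unpack

-- Hypothetical helper: converts a cursor tuple with 1-based row and 0-based
-- column, as returned by nvim_win_get_cursor, into the 0-based row and column
-- that tree-sitter expects.
local function to_ts_position(cursor)
  local lnum, col = unpack(cursor)
  return lnum - 1, col
end

-- A cursor on the first line at byte column 5 is reported as {1, 5}.
local row, col = to_ts_position({ 1, 5 })
print(row, col) -- prints 0 and 5
```

Inside Neovim you would call it as to_ts_position(vim.api.nvim_win_get_cursor(0)).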
To retrieve the node containing the cursor we use the descendant_for_range method. It takes a start row, start column, end row and end column as parameters. Given that the cursor is a single point, not a range, we use lnum and col for both start and end:
local cursor_node = root:descendant_for_range(lnum, col, lnum, col)
With this node we can traverse upward until we find the table node, then back down using :child(index) or :iter_children().
You can get the type of a node using :type() and retrieve the content of a node with vim.treesitter.query.get_node_text(node, bufnr). (Since Neovim 0.9 the function is available as vim.treesitter.get_node_text and the older name is deprecated.) A complete example:
local bufnr = vim.api.nvim_get_current_buf()
-- Start at the cursor node itself: on a blank line it may already be the table.
local node = cursor_node
while node ~= nil do
  -- child(1) skips the leading "[" token of the table header.
  if node:type() == "table" and node:child_count() > 1 then
    local child = node:child(1)
    if child:type() == "bare_key" then
      local name = vim.treesitter.query.get_node_text(child, bufnr)
      if name == "setup" or name == "teardown" then
        print('Cursor was within a setup or teardown block')
        return
      end
    end
  end
  node = node:parent()
end
This traverses upward to find the table, then back down to the bare_key to get the value of the TOML table node name. In the example file that would be either [setup] or [teardown].
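The same manual approach could be extended to the [[queries]] blocks by walking down with :iter_children(). The following is only a sketch: the helper name find_name_value is hypothetical, it assumes the node types shown in the syntax tree above, and it assumes it runs inside Neovim.

```lua
-- Hypothetical helper: given a table_array_element node, returns the text of the
-- string value of its `name` pair, or nil if there is none.
local function find_name_value(element, bufnr)
  -- get_node_text moved out of the query module in Neovim 0.9; support both.
  local get_node_text = vim.treesitter.get_node_text or vim.treesitter.query.get_node_text
  for pair in element:iter_children() do
    if pair:type() == "pair" then
      local key, value
      for child in pair:iter_children() do
        if child:type() == "bare_key" then
          key = child
        elseif child:type() == "string" then
          value = child
        end
      end
      if key and value and get_node_text(key, bufnr) == "name" then
        return get_node_text(value, bufnr)
      end
    end
  end
  return nil
end
```

The returned text is the raw node content, so for the first block it would still include the quotes: "global avg".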
tree-sitter queries
An alternative approach to manually traversing the syntax tree is to use a lisp-like query language. Tree-sitter uses S-expressions to query for nodes within a syntax tree.
From Pattern matching with queries:
"A query consists of one or more patterns, where each pattern is an S-expression that matches a certain set of nodes in a syntax tree. The expression to match a given node consists of a pair of parentheses containing two things: the node’s type, and optionally, a series of other S-expressions that match the node’s children."
This query language may take some getting used to, but the Playground can help again. It highlights the parts of the document that the query matches. Read its documentation for more information. (Neovim 0.10 provides an :EditQuery command with similar capabilities.)
I won’t repeat the documentation referenced above and will instead show you what I ended up with:
((table_array_element
  (bare_key) @element_name
  (#eq? @element_name "queries")
  (pair
    (bare_key) @property
    (string) @value
    (#eq? @property "name")
  )
 )
)
To translate this into English: Find all table_array_element nodes where the bare_key value matches queries and where there is a property = value pair where the property value equals name.
Using tree-sitter queries
The vim.treesitter.query module contains a parse_query function (renamed to vim.treesitter.query.parse in Neovim 0.9, where parse_query is deprecated). It requires the parser (language) name and a query string and returns a query object:
local query = vim.treesitter.query.parse_query(vim.bo.filetype, [[
  ((table_array_element
    (bare_key) @element_name
    (#eq? @element_name "queries")
    (pair
      (bare_key) @property
      (string) @value
      (#eq? @property "name")
    )
   )
  )
]])
Using this query object we can iterate over any matching captures using the iter_captures method. The captures are the @<name> parts of the query. The iter_captures method has four parameters: a syntax node as starting point, the buffer number, the row at which to start the search, and the end row.
We use the root node as starting point because the query should cover the full document. We use 0 as starting row to start from the top of the document and the cursor position (lnum) as end row. This ensures nodes below the cursor are excluded.
Now to get the [[queries]] block closest to the cursor we can loop through all matches and keep a reference to the last one:
local bufnr = vim.api.nvim_get_current_buf()
local last = nil
for id, node in query:iter_captures(root, bufnr, 0, lnum) do
  local capture = query.captures[id]
  if capture == "value" then
    last = node
  end
end
iter_captures yields each captured node together with the index of its capture. The query above has a few captures (@element_name, @property, and @value), but we only need @value.
If there was a match, get the contents of the node and run the application:
if last then
  local name = vim.treesitter.query.get_node_text(last, bufnr)
  local cmd = {
    'cr8',
    'run-spec',
    vim.api.nvim_buf_get_name(bufnr),
    'localhost:4200',
    '--action', 'queries',
    -- The node text includes the surrounding quotes; strip them.
    '--re-name', string.sub(name, 2, #name - 1)
  }
  -- Not included: These spawn the cmd in a terminal:
  close_term()
  launch_term(cmd)
end
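The string.sub call above removes one quote character on each side, which fits the name values in the example file. TOML also allows triple-quoted strings, as the first statement value shows. If you need to handle those too, a helper along the following lines could be used. This is a sketch in plain Lua; the name strip_quotes is made up for this example, and escape sequences inside the string are left untouched.

```lua
-- Hypothetical helper: removes the surrounding quotes from the text of a TOML
-- string node. Handles """...""", '''...''', "..." and '...'.
local function strip_quotes(text)
  -- Try the triple-quoted forms first so they are not mistaken for single quotes.
  for _, quote in ipairs({ '"""', "'''", '"', "'" }) do
    local n = #quote
    if #text >= 2 * n and text:sub(1, n) == quote and text:sub(-n) == quote then
      return text:sub(n + 1, -n - 1)
    end
  end
  -- No surrounding quotes found: return the text unchanged.
  return text
end

print(strip_quotes('"global avg"'))
--> global avg
print(strip_quotes("'''select avg(\"adRevenue\") from uservisits'''"))
--> select avg("adRevenue") from uservisits
```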
Wrap up
That’s it.
I hope this gave you some inspiration and ideas how you could use tree-sitter to improve your own editing tasks or workflows.