A tree sitting in your editor

Friday, September 2, 2022 » Neovim

Table of contents

The use-case
What’s tree-sitter?
A syntax tree
Neovim tree-sitter API
tree-sitter queries
Using tree-sitter-queries
Wrap up

This is a short introduction to the tree-sitter integration in Neovim, based on a use-case I had: finding content in a TOML file close to the cursor position and then launching an application with that information.

The use-case

I had a use-case where I wanted to trigger an application from within Neovim and pass along information based on the file contents close to the cursor’s position.

I wanted to do this from within TOML files. An example looks like this:

[setup]
statement_files = ["sql/uservisits.sql"]

[[queries]]
name = "global avg"
statement = '''select avg("adRevenue") from uservisits'''
iterations = 500

[[queries]]
name = "global max-long"
statement = "select max(duration) from uservisits"
iterations = 500

[teardown]
statements = ["drop table if exists uservisits"]

If I hit a key combination, it should detect whether the cursor is within a [setup], [teardown] or [[queries]] block. If it’s within [[queries]], it should extract the name value of that block and launch an application with the value as an argument.

Most TOML parser libraries don’t preserve whitespace or position information; they provide the result as a data dictionary instead. Without that information it would be difficult to figure out how the cursor position relates to the data.

Therefore a regular parser library is out of the question. This is where tree-sitter comes in.

What’s tree-sitter?

From the Tree-sitter website:

Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited

The tree-sitter runtime library is written in C. That made it possible for Neovim to embed it and provide an API for users and plugin authors to retrieve and query the syntax tree of a document.

To use tree-sitter you need to install language-specific parsers. You can learn more about that in nvim-treesitter.

A syntax tree

Parsing the above TOML example yields a syntax tree like this:

table [0, 0] - [3, 0]
  bare_key [0, 1] - [0, 6]
  pair [1, 0] - [1, 40]
    bare_key [1, 0] - [1, 15]
    array [1, 18] - [1, 40]
      string [1, 19] - [1, 39]
table_array_element [3, 0] - [8, 0]
  bare_key [3, 2] - [3, 9]
  pair [4, 0] - [4, 19]
    bare_key [4, 0] - [4, 4]
    string [4, 7] - [4, 19]
  pair [5, 0] - [5, 57]
    bare_key [5, 0] - [5, 9]
    string [5, 12] - [5, 57]
  pair [6, 0] - [6, 16]
    bare_key [6, 0] - [6, 10]
    integer [6, 13] - [6, 16]
table_array_element [8, 0] - [13, 0]
  bare_key [8, 2] - [8, 9]
  pair [9, 0] - [9, 24]
    bare_key [9, 0] - [9, 4]
    string [9, 7] - [9, 24]
  pair [10, 0] - [10, 50]
    bare_key [10, 0] - [10, 9]
    string [10, 12] - [10, 50]
  pair [11, 0] - [11, 16]
    bare_key [11, 0] - [11, 10]
    integer [11, 13] - [11, 16]
table [13, 0] - [15, 0]
  bare_key [13, 1] - [13, 9]
  pair [14, 0] - [14, 48]
    bare_key [14, 0] - [14, 10]
    array [14, 13] - [14, 48]
      string [14, 14] - [14, 47]

At first glance this may look like gibberish. What you see here are the names of the syntax nodes and their positions within the document. The square brackets contain [start_row, start_col] and [end_row, end_col], both 0-based. The exact syntax nodes depend on the parser implementation and the language grammar; the tree for a Python program, for example, looks different.
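Note that the end positions are exclusive: the bare_key for setup spans [0, 1] - [0, 6] and covers the five characters in columns 1 to 5. A minimal sketch of that rule in plain Lua, using a four-element table as a stand-in for a node’s range (the contains helper and this range layout are illustrative assumptions, not part of Neovim’s API):

```lua
-- Illustrative helper (not a Neovim API): is the 0-based point (row, col)
-- inside range = { start_row, start_col, end_row, end_col }?
-- The end position is exclusive, as in the tree above.
local function contains(range, row, col)
  local srow, scol, erow, ecol = range[1], range[2], range[3], range[4]
  if row < srow or (row == srow and col < scol) then
    return false -- before the start
  end
  if row > erow or (row == erow and col >= ecol) then
    return false -- at or after the exclusive end
  end
  return true
end

-- Ranges copied from the tree above
local setup_key = { 0, 1, 0, 6 }   -- bare_key "setup"
local first_query = { 3, 0, 8, 0 } -- first table_array_element

print(contains(setup_key, 0, 5))   -- true: the "p" of "setup"
print(contains(setup_key, 0, 6))   -- false: the closing "]"
print(contains(first_query, 8, 0)) -- false: start of the second [[queries]]
```

The exclusive end is why adjacent nodes such as the two table_array_element blocks can share a boundary ([3, 0] - [8, 0] and [8, 0] - [13, 0]) without overlapping.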

In this example the top-level nodes are:

table for [setup] and [teardown]
table_array_element for [[queries]]

To make it easier to learn how text relates to syntax nodes you can use the playground plugin for Neovim. It helps with inspecting a syntax tree by highlighting the related text as you navigate through the tree.

(Neovim 0.9 adds an :InspectTree command that does roughly the same.)

Now, how can you use this syntax tree to find the needed information?

Neovim tree-sitter API

Usually the best way to learn about APIs in Neovim is to use the built-in help system. For tree-sitter that would be :help lua-treesitter. I’ll go over the main components now, so you don’t need to consult the help page immediately, but if you intend to play around with tree-sitter yourself, make sure to read it eventually.

First we need a parser. We can get that by using the get_parser function within the vim.treesitter module:

local parser = vim.treesitter.get_parser()

The parser will be bound to the current buffer/document and we can immediately parse the document using the parse method:

local trees = parser:parse()

The method returns a list (in Lua: a table) of trees, not a single tree. This is because languages like Markdown can contain nested languages, and you’d get a tree for each.

My TOML files don’t contain nested languages, so I ignore that, use the first tree and retrieve its root node:

local root = trees[1]:root()

Neovim provides a neat function that lets us retrieve the smallest node spanning a given range. We can use it to get the node containing the cursor, but first we need the cursor location. The nvim_win_get_cursor function provides it. It takes a window handle as its argument; passing 0 selects the current window.

local lnum, col = unpack(vim.api.nvim_win_get_cursor(0))

It returns a (row, col) pair as a Lua table, with rows starting at 1 and columns at 0. tree-sitter uses 0-based indexes for both, so we need to subtract one from the row:

lnum = lnum - 1
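Because this off-by-one shift is easy to get wrong, it can help to keep it in one place. A small sketch with two illustrative helpers (the names are made up for this example, they are not Neovim APIs) that convert between the cursor position and a 0-based tree-sitter point:

```lua
-- Illustrative helpers (not Neovim APIs).
-- nvim_win_get_cursor returns { row, col }: row is 1-based, col is 0-based.
-- tree-sitter points are 0-based in both dimensions.
local function cursor_to_point(cursor)
  return cursor[1] - 1, cursor[2] -- only the row needs shifting
end

local function point_to_cursor(row, col)
  return { row + 1, col }
end

-- A cursor at the start of `name = "global avg"` (line 5 of the file)
local row, col = cursor_to_point({ 5, 0 })
print(row, col) -- prints 4 and 0: the pair starting at [4, 0] in the tree
```

Inside Neovim this would be called as cursor_to_point(vim.api.nvim_win_get_cursor(0)); point_to_cursor produces the (1,0)-indexed table that nvim_win_set_cursor expects when jumping to a node.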

To retrieve the node containing the cursor we use the descendant_for_range method. It takes a start row, start column, end row and end column as parameters. Since the cursor is a single point rather than a range, we use lnum and col for both start and end:

local cursor_node = root:descendant_for_range(lnum, col, lnum, col)
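To build some intuition for what a point lookup like this returns, here is a toy model in plain Lua: nodes are tables with a type, an end-exclusive range and children, and the search keeps descending into the child that contains the point until no child does. This is an illustration only, not Neovim’s or tree-sitter’s implementation, and it leaves out anonymous nodes such as "=".

```lua
-- Toy node: { type = ..., range = { srow, scol, erow, ecol }, children = {...} }
-- Ranges are 0-based and end-exclusive.
local function contains(range, row, col)
  local after_start = row > range[1] or (row == range[1] and col >= range[2])
  local before_end = row < range[3] or (row == range[3] and col < range[4])
  return after_start and before_end
end

-- Return the smallest node whose range contains (row, col), or nil.
local function smallest_containing(node, row, col)
  if not contains(node.range, row, col) then
    return nil
  end
  for _, child in ipairs(node.children or {}) do
    local found = smallest_containing(child, row, col)
    if found then
      return found
    end
  end
  return node
end

-- The first [[queries]] block from the tree above, limited to its first pair
local block = {
  type = "table_array_element", range = { 3, 0, 8, 0 },
  children = {
    { type = "bare_key", range = { 3, 2, 3, 9 } },
    { type = "pair", range = { 4, 0, 4, 19 }, children = {
      { type = "bare_key", range = { 4, 0, 4, 4 } },
      { type = "string", range = { 4, 7, 4, 19 } },
    } },
  },
}

print(smallest_containing(block, 4, 10).type) -- string
print(smallest_containing(block, 4, 4).type)  -- pair (whitespace after the key)
print(smallest_containing(block, 7, 0).type)  -- table_array_element (empty line)
```

The last lookup shows why it pays to check the returned node itself and not only its parents: on an empty line the smallest containing node is already the block.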

With this node we can traverse upward until we find the table node, then back down using :child(index) or :iter_children().

You can get the type of a node using :type() and retrieve the content of a node with vim.treesitter.query.get_node_text(node, bufnr) (since Neovim 0.9 this function is available as vim.treesitter.get_node_text). A complete example:

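local bufnr = vim.api.nvim_get_current_buf()
-- Start at the cursor node itself: on an empty line inside a table the
-- smallest node containing the cursor can already be the table
local node = cursor_node
while node ~= nil do
  if node:type() == "table" and node:child_count() > 1 then
    -- child(0) is the anonymous "[" token, child(1) holds the table name
    local child = node:child(1)
    if child:type() == "bare_key" then
      local name = vim.treesitter.query.get_node_text(child, bufnr)
      if name == "setup" or name == "teardown" then
        print('Cursor was within a setup or teardown block')
        return
      end
    end
  end
  node = node:parent()
end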

This walks upward to the enclosing table node, then back down to its bare_key child to read the name of the TOML table. In the example file that is either setup or teardown.
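The upward walk can be factored into a small reusable helper. The sketch below only assumes that a node has a :parent() method returning nil at the root, as tree-sitter nodes in Neovim do; find_ancestor and the mock nodes used to exercise it outside Neovim are hypothetical.

```lua
-- Hypothetical helper: walk from `node` (inclusive) towards the root and
-- return the first node for which predicate(node) is true, or nil.
local function find_ancestor(node, predicate)
  while node ~= nil do
    if predicate(node) then
      return node
    end
    node = node:parent()
  end
  return nil
end

-- Minimal stand-ins for syntax nodes so the helper runs outside Neovim.
local function mock(node_type, parent)
  return {
    type = function() return node_type end,
    parent = function() return parent end,
  }
end

local document = mock("document", nil)
local tbl = mock("table", document)
local pair = mock("pair", tbl)
local key = mock("bare_key", pair)

local found = find_ancestor(key, function(n) return n:type() == "table" end)
print(found == tbl) -- true
```

In Neovim the call would be find_ancestor(cursor_node, function(n) return n:type() == "table" end). Passing a predicate keeps the tree walking separate from the TOML-specific checks on the table name.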

tree-sitter queries

An alternative to traversing the syntax tree manually is tree-sitter’s Lisp-like query language, which uses S-expressions to match nodes within a syntax tree.

From the tree-sitter documentation on Pattern matching with queries:

A query consists of one or more patterns, where each pattern is an S-expression that matches a certain set of nodes in a syntax tree. The expression to match a given node consists of a pair of parentheses containing two things: the node’s type, and optionally, a series of other S-expressions that match the node’s children

This query language may take some getting used to, but the Playground can help again: it highlights the parts of the document that the query matches. Read its documentation for more information. (Neovim 0.10 provides an :EditQuery command with similar capabilities.)

Instead of repeating the documentation referenced above, here is what I ended up with:

((table_array_element
  (bare_key) @element_name
  (#eq? @element_name "queries")
  (pair
    (bare_key) @property
    (string) @value
    (#eq? @property "name")
  )
 )
)

In plain English: find all table_array_element nodes whose bare_key is queries and that contain a pair whose key (captured as @property) equals name. The string value of that pair is captured as @value.

Using tree-sitter-queries

The vim.treesitter.query module contains a parse_query function that takes the language name and a query string and returns a query object (since Neovim 0.9 this function is available as vim.treesitter.query.parse):

local query = vim.treesitter.query.parse_query(vim.bo.filetype, [[
  ((table_array_element
    (bare_key) @element_name
    (#eq? @element_name "queries")
    (pair
      (bare_key) @property
      (string) @value
      (#eq? @property "name")
    )
   )
  )
]])

Using this query object we can iterate over the matching captures with the iter_captures method. The captures are the @<name> parts of the query. iter_captures takes four parameters: a syntax node as starting point, the buffer number, the row at which to start the search and the row at which to stop it.

We use the root node as starting point because we want to search the full document. Row 0 starts the search at the top of the document, and using the cursor row (lnum) as end row excludes nodes below the cursor.

Now to get the [[queries]] block closest to the cursor we can loop through all matches and keep a reference to the last one:

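local bufnr = vim.api.nvim_get_current_buf()
local last = nil
-- Search from the top of the document (row 0) up to the cursor row
for id, node in query:iter_captures(root, bufnr, 0, lnum) do
  -- query.captures maps a capture id to its name, e.g. "value"
  local capture = query.captures[id]
  if capture == "value" then
    -- Later matches are closer to the cursor, so keep overwriting
    last = node
  end
end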

iter_captures yields each captured node together with the id of its capture. The query above has three captures (@element_name, @property and @value), but we only need @value.
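The loop relies on the captures arriving in document order, so the last @value seen is the closest one above the cursor. The same selection can be sketched in plain Lua by filtering on the row instead of passing an end row to iter_captures. Each capture here is a hypothetical record with a name, a 0-based row and the node text; in Neovim these would come from query.captures[id], node:range() and get_node_text.

```lua
-- Hypothetical capture records: { name = ..., row = ..., text = ... }.
-- Assumes they are in document order, like the captures from iter_captures.
local function last_value_before(captures, cursor_row)
  local last = nil
  for _, cap in ipairs(captures) do
    if cap.row > cursor_row then
      break -- everything from here on is below the cursor
    end
    if cap.name == "value" then
      last = cap
    end
  end
  return last
end

-- Captures for the two [[queries]] blocks in the example file
local captures = {
  { name = "element_name", row = 3, text = "queries" },
  { name = "property", row = 4, text = "name" },
  { name = "value", row = 4, text = '"global avg"' },
  { name = "element_name", row = 8, text = "queries" },
  { name = "property", row = 9, text = "name" },
  { name = "value", row = 9, text = '"global max-long"' },
}

print(last_value_before(captures, 6).text)  -- "global avg"
print(last_value_before(captures, 10).text) -- "global max-long"
```

A cursor above the first [[queries]] block (for example in [setup]) yields nil, which corresponds to the "no match" case handled below.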

If there was a match, get the contents of the node and run the application:

if last then
  local name = vim.treesitter.query.get_node_text(last, bufnr)
  local cmd = {
    'cr8',
    'run-spec',
    vim.api.nvim_buf_get_name(bufnr),
    'localhost:4200',
    '--action', 'queries',
    -- Strip the surrounding quotes from the TOML string value
    '--re-name', string.sub(name, 2, #name - 1)
  }

  -- Not included: These spawn the cmd in a terminal:
  close_term()
  launch_term(cmd)
end
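string.sub(name, 2, #name - 1) strips exactly one character from each end, which works for a basic string like "global avg". TOML also allows literal strings ('...') and multi-line strings ("""...""" and '''...''', as used for statement in the example). A hypothetical unquote helper that strips whichever of these delimiters is present; it does not process escape sequences or the newline TOML allows right after an opening triple quote:

```lua
-- Hypothetical helper: remove TOML string delimiters.
-- Escape sequences and a leading newline in multi-line strings are NOT handled.
local function unquote(s)
  -- Check the three-character delimiters first so '''x''' is not
  -- mistaken for a single-quoted string.
  for _, delim in ipairs({ '"""', "'''", '"', "'" }) do
    local n = #delim
    if #s >= 2 * n and s:sub(1, n) == delim and s:sub(-n) == delim then
      return s:sub(n + 1, -n - 1)
    end
  end
  return s -- not a quoted string: return it unchanged
end

print(unquote('"global max-long"')) -- global max-long
print(unquote([['''select avg("adRevenue") from uservisits''']])) -- select avg("adRevenue") from uservisits
```

With this helper the command argument would become '--re-name', unquote(name).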

Wrap up

That’s it.

I hope this gave you some inspiration and ideas for how you could use tree-sitter to improve your own editing tasks or workflows.