
Commit 397b649

yanns1, wafa-harir and Yann SALMON authored
Rewrite of the bindings' generator using Tree-sitter (#275)
* entries for python's venv, tree-sitter-c repo and .python-version from pyenv
* export list of project dependencies
* add tree sitter, build c parser and preprocess header files
* utilities for pretty printing a TS node
* better preprocessing: keep comments, remove linemarkers, expand macros from libvlc.h
* fix: missing return
* only preprocess vlc.h to make sure includes and macro expansion are done properly
* treesitter-based parse_enums almost done
* extract the logic for cleaning doxygen comment block in a separate method
* little refactor of parse_version: move the identification of libvlc_version.h in __init__
* parse_struct
* parse_version using Tree-sitter
* fix doc cleaning done in one branch but not the other
* ignore anonymous enums
* don't crash on empty docs, return as is
* comparing outputs obtained with Tree-sitter against those obtained with regexes
* fix wrong file_ in Enums produced
* parse_structs with queries
* update parse_structs_with_ts
* compare output of structs
* merge with parse_structs_with_ts and almost complete implementation of parse_funcs_with_ts
* fix escape sequence warning
* better dumps
* ignore unique param being void
* fix query for actually capturing all function declarations (forgot pointer_declarator nodes!)
* include type qualifiers and pointers in return type
* replacement for Par.parse_param using Tree-sitter, for parsing function parameters and struct fields
* make the assumption about libvlc naming for enums
* use pathlib instead of os.path in Parser
* implementation of parse_callbacks_with_ts
* replace if-raise by assert to make assumptions about TS nodes more concise
* use parse_param_with_ts in parse_structs_with_ts
* move preprocessing out of Parser and change Parser's __init__ slightly to make it more flexible (useful for testing for example)
* tests for parse_enums_with_ts (and define __repr__ and __eq__ for Enum and Val along the way)
* fix incomplete duplication of parse_param_with_ts logic in parse_funcs_with_ts (didn't catch const on pointers)
* tests for parse_funcs_with_ts (and define __repr__ and __eq__ to Func and Par along the way)
* fix: use _INDENT_ instead of tabs
* remove const and spaces from Par types, like parse_param does (fix the tests accordingly)
* remove 'struct' from Par types, as done in parse_param
* tests for parse_structs_with_ts (and define __repr__ and __eq__ for Struct along the way)
* tests for parse_callbacks and fix incomplete duplication of parse_param_with_ts
* tests parse_version_with_ts
* git ignore .venv as well
* add setuptools
* remove code related to parsing with regular expressions, and ignore function pointers and unions within structs for now
* remove extraneous printing
* use assertCountEqual instead of assertListEqual, because order doesn't matter in this case
* feat: parse nested structs/unions; moved parse_param into Parser, created a Union class, tweaked dump methods to allow multiple levels of indentation
* tests for parsing nested structs/unions and split tests for bindings vs. generator (reflected in the Makefile as well)
* formatting using ruff
* recipe to lint using ruff
* fix some lint errors
* feat: parse function pointers as params or fields; factored out type parsing into parse_type, stopped accepting top-level funcs and callbacks in parse_param because made it too difficult to handle function pointers elsewhere
* fix parse_funcs and parse_callbacks to handle function pointers as params and not match function pointers as fields
* test for function pointer as struct field
* test for function pointer as parameter of callback
* no need to name it as private, stay consistent
* add option to Parser controlling whether it should parse nested structs/unions and function pointers as params/fields; it is opt-out, and if opted out, we get the same capabilities/output that the previous regex-based parser
* factor the finding of an associated Doxygen comment into a separate method
* update parse_param docstring in light of recent changes
* tests for function pointers as param of regular functions
* refactor: move clean_doxygen_comment_block out of Parser because it doesn't really belong there
* tests for clean_doxygen_comment_block
* refactor: format Parser's test input files
* fix: preprocess even if vlc.preprocessed exists, because the output unexpectedly doesn't change when testing on different header files (otherwise need to remember to delete vlc.preprocessed!)
* feat: update preprocessing command to handle cases where vlc headers are included as system headers (like in version 4.0.0)
* fix: ignore enums w/o body instead of crashing (needed for libvlc v4.0.0 because there are typedefed enums w/o body)
* refactor: make attributes constants
* feat: handle deprecated enum values
* install sphinx
* tests from enums with deprecated values
* feat: add docs for enum values
* tests for enums with documented values
* slightly more permissive linter rules + fix two linting errors
* add pre-commit hook running ruff
* sphinx autodoc
* add ruff github action
* add workflow for tests
* install ruff in venv
* function doxygenToSphinx and change epydoc
* fix: workaround to ignore file's doc block on top of enum, struct or else
* feat: format output of PythonGenerator using ruff
* sort parsed items as it proves useful for debugging (makes generated bindings easier to diff)
* parser can't be None now
* sort imports as part of the linting
* feat: run ruff --fix as well when formatting PythonGenerator's output
* run ruff check --fix as part of the format recipe
* feat: a good start for generating nested structs/unions
* fix: pre-commit config was not doing what was expected, it checked any Python file about to be committed
* fix: generate wrapper classes before structs as they can be needed as field type
* fix: can't output structs sorted by name because a struct's class need to be defined before it is used; sorting breaks the (right) order of libvlc's source
* use _Cstruct instead of ctypes.Structure
* fix: base class should be Union, not Structure
* field type being a wrapper class doesn't work because wrapper classes are not ctypes; didn't found a solution not conflicting with the creation process (__new__) of wrapper classes
* increment generator's major version
* feat: generate binding for function pointer as struct/union field
* test bindings for function pointers of DialogCbs
* no need for hardcoded Event and EventUnion anymore
* update the method doxygen2sphinx
* change the doxygen format @ to sphinx :
* fix: generate nested structs/unions within their parent class so that nested structs/unions having the same name don't shadow each other
* fix syntax warnings
* make Tree-sitter C grammar a submodule
* (dev) script to setup the project after cloning
* use dev_setup.py in tests workflow, and test bindings as well
* latest generated dev/vlc.py
* fix: activating venv in subprocess is useless! can call venv's python directly
* fix: fix dev_setup on windows
* fix: fix tests workflow now that dev_setup has changed
* can remove one item from the todolist!
* add ourselves as contributors
* remove pylint, pyflakes, pychecker related stuff; use ruff now
* drop python2 support for the generator
* put _Enum in header.py instead of generating it each time
* fix: ignore Doxygen blocks with a file tag
* better way to output docs: no more N/A, better indentation
* format templates
* strip whitespaces utility function, with unit tests
* better sphinx doc in templates
* don't need the toc
* unit doxygen2sphinx and epydocs into docs_in_sphinx_format, and better sphinx doc generation (cross references, params in italic, note and warning blocks, etc.)
* rename option genums
* rename generate_ctypes
* insert generated functions in the header so that header.py becomes the only place where we decide the order of generation
* docstrings in sphinx format for generator, plus making some names smaller...
* fix: was running insert_code two times, so writing build_date two times when format=True
* add wheel for package builds
* (almost) empty the blacklist and report deprecated functions as well
* snake to camel case utility
* feat: generate decorators for function pointers as parameters (only case is libvlc_set_exit_handler)
* reintroducing EventUnion for backward compatibility
* test execution of dialog cbs (at least the error one)
* test event callbacks bindings
* test exit handler
* convert Doxygen lists to Sphinx lists, and use spaces instead of tabs for generated docs
* generate docs for dev
* latest generated dev/vlc.py
* update generator's README given recent changes
* update 'How to contribute' because no _blacklist anymore
* better error messages
* don't make a venv when one already exists
* quotes in f-string doesn't work in all Python versions

---------

Co-authored-by: wafa harir <[email protected]>
Co-authored-by: Yann SALMON <[email protected]>
1 parent 03cbfde commit 397b649

35 files changed: +16955 -9139 lines changed

.github/workflows/ruff.yml

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+name: Ruff
+
+on: [push, pull_request]
+
+jobs:
+  linting:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: chartboost/ruff-action@v1
+        with:
+          args: "check --config ruff.toml"
+          src: "./generator/generate.py ./tests ./dev_setup.py"
+
+  formatting:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: chartboost/ruff-action@v1
+        with:
+          args: "format --check --config ruff.toml"
+          src: "./generator/generate.py ./tests ./dev_setup.py ./generator/templates"

.github/workflows/tests.yml

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+name: Tests
+
+on: [push, pull_request]
+
+jobs:
+  tests:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - run: |
+          sudo apt-get install -y vlc pulseaudio libvlc-dev
+          pulseaudio --start
+          python3 dev_setup.py
+          . .venv/bin/activate
+          make test_generator
+          make test_bindings

.gitignore

Lines changed: 13 additions & 0 deletions
@@ -38,3 +38,16 @@ nosetests.xml
 .mr.developer.cfg
 .project
 .pydevproject
+
+# Virtual environment
+.venv
+venv
+
+# pyenv version
+.python-version
+
+# Preprocessed libvlc header files
+*.preprocessed*
+
+# Documentation
+docs/_build/

.gitmodules

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+[submodule "vendor/tree-sitter-c"]
+	path = vendor/tree-sitter-c
+	url = https://github.com/tree-sitter/tree-sitter-c
+	ignore = dirty

.pre-commit-config.yaml

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+# See https://pre-commit.com for more information
+# See https://pre-commit.com/hooks.html for more hooks
+repos:
+  - repo: local
+    hooks:
+      # Abort the commit if there is one or more linting errors
+      - id: lint
+        name: Check no linting errors
+        entry: ruff check --config ruff.toml
+        language: python
+        files: |
+          (?x)^(
+              generator/generate\.py|
+              tests/.*\.py|
+              dev_setup\.py
+          )$
+        minimum_pre_commit_version: "2.9.2"
+
+      # Abort the commit if code wasn't formatted
+      - id: format
+        name: Check code is formatted
+        entry: ruff format --config ruff.toml --check
+        language: python
+        files: |
+          (?x)^(
+              generator/generate\.py|
+              tests/.*\.py|
+              dev_setup\.py|
+              generator/templates/.*\.py
+          )$
+        minimum_pre_commit_version: "2.9.2"
+
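
The `files` entries in the two hooks are verbose-mode (`(?x)`) regular expressions. Here is a minimal sketch of how such a pattern filters paths, assuming (as the pre-commit documentation describes) that each staged path, relative to the repository root, is matched with Python's `re.search`. The helper name `is_checked` and the sample paths are hypothetical and only serve the illustration.

```python
import re

# Pattern copied from the `lint` hook. With (?x), whitespace and newlines
# inside the pattern are ignored, so each alternative can sit on its own line.
LINT_FILES = r"""(?x)^(
    generator/generate\.py|
    tests/.*\.py|
    dev_setup\.py
)$"""


def is_checked(path: str, pattern: str = LINT_FILES) -> bool:
    """Return True if `path` would be handed to the hook (hypothetical helper)."""
    return re.search(pattern, path) is not None


if __name__ == "__main__":
    # Sample paths are made up for the demonstration.
    for candidate in (
        "generator/generate.py",
        "tests/test_generator.py",
        "generator/templates/header.py",
        "examples/play.py",
    ):
        print(candidate, is_checked(candidate))
```

Anchoring with `^(` ... `)$` is what keeps the hook from running on every Python file in a commit: a path only matches when it is exactly one of the listed alternatives.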

AUTHORS

Lines changed: 3 additions & 0 deletions
@@ -20,3 +20,6 @@ Jonas Haag <[email protected]>
 Patrick Fay <[email protected]>
 
 Kim Wiktorén <[email protected]>
+Wafa Harir <[email protected]>
+Yann Salmon <[email protected]>
+Marwa Tabib <[email protected]>

Makefile

Lines changed: 23 additions & 10 deletions
@@ -29,6 +29,8 @@ ifeq ($(TARGETS),)
 TARGETS=missing
 endif
 
+.PHONY: missing dev installed dist deb doc test_bindings2 test_bindings test_generator test2 test tests sdist publish format rcheck check clean
+
 all: $(TARGETS)
 
 missing:
@@ -59,25 +61,36 @@ $(VERSIONED_NAME): generator/generate.py generator/templates/header.py generator
 doc: $(VERSIONED_NAME)
 	-pydoctor --project-name=python-vlc --project-url=https://github.com/oaubert/python-vlc/ --make-html --verbose --html-output=doc $<
 
-test2: $(MODULE_NAME)
-	PYTHONPATH=$(VERSIONED_PATH):$(PROJECT_ROOT) python tests/test.py
-	PYTHONPATH=$(DEV_PATH):$(PROJECT_ROOT) python tests/test.py
+test_bindings2: $(MODULE_NAME)
+	PYTHONPATH=$(VERSIONED_PATH):$(PROJECT_ROOT) python tests/test_bindings.py
+	PYTHONPATH=$(DEV_PATH):$(PROJECT_ROOT) python tests/test_bindings.py
+
+test_bindings: $(MODULE_NAME)
+	PYTHONPATH=$(VERSIONED_PATH):$(PROJECT_ROOT) python3 tests/test_bindings.py
+	PYTHONPATH=$(DEV_PATH):$(PROJECT_ROOT) python3 tests/test_bindings.py
+
+test_generator: $(MODULE_NAME)
+	PYTHONPATH=$(VERSIONED_PATH):$(PROJECT_ROOT) python3 tests/test_generator.py
+	PYTHONPATH=$(DEV_PATH):$(PROJECT_ROOT) python3 tests/test_generator.py
+
+test2: test_bindings2
 
-test: $(MODULE_NAME)
-	PYTHONPATH=$(VERSIONED_PATH):${PROJECT_ROOT} python3 tests/test.py
-	PYTHONPATH=$(DEV_PATH):$(PROJECT_ROOT) python3 tests/test.py
+test: test_bindings test_generator
+
+tests: test test2
 
 sdist: $(VERSIONED_NAME)
 	cd $(VERSIONED_PATH); python3 setup.py bdist_wheel sdist
 
 publish: $(VERSIONED_NAME)
 	cd $(VERSIONED_PATH); python3 setup.py bdist_wheel sdist && twine upload dist/*
 
-tests: test test2
+format:
+	ruff format ./generator/generate.py ./tests dev_setup.py ./generator/templates/
+	ruff check --fix --fix-only --exit-zero ./generator/generate.py ./tests dev_setup.py ./generator/templates
 
-check: $(MODULE_NAME)
-	-pyflakes $<
-	-pylint $<
+check:
+	ruff check ./generator/generate.py ./tests dev_setup.py
 
 clean:
 	-$(RM) -r $(DEV_PATH)

README.md

Lines changed: 82 additions & 32 deletions
@@ -1,26 +1,24 @@
-Python ctypes-based bindings for libvlc
-=======================================
+# Python ctypes-based bindings for libvlc
+
+![](https://img.shields.io/github/actions/workflow/status/yanns1/python-vlc/tests.yml?event=push&label=tests)
 
 This file documents the bindings generator, not the bindings
 themselves. For the bindings documentation, see the README.module
 file.
 
-
 The bindings generator generates ctypes-bindings from the include
 files defining the public API. The same generated module should be
-compatible with various versions of libvlc 2.* and 3.*. However, there
+compatible with various versions of libvlc 2.\* and 3.\*. However, there
 may be incompatible changes between major versions. Versioned bindings
 for 2.2 and 3.0 are provided in the repository.
 
-License
--------
+## License
 
 The module generator is licensed under the GNU General Public License
-version 2 or later. The generated module is licensed, like libvlc,
+version 2 or later. The generated module is licensed, like libvlc,
 under the GNU Lesser General Public License 2.1 or later.
 
-Building from source
---------------------
+## Development
 
 You can get the latest version of the code generator from
 <https://github.com/oaubert/python-vlc/> or
@@ -31,31 +29,58 @@ vlc/bindings/python, so that it finds the development include files,
 or to find the installed include files in /usr/include (on Debian,
 install libvlc-dev).
 
+Once you have cloned the project, you can run
+
+```
+python3 dev_setup.py
+```
+
+from the root.
+This script will install everything that is needed (submodules,
+virtual environment, packages, etc.) for you to generate the bindings.
+Then, activate the virtual environment:
+
+- On Linux with Bash:
+  ```
+  . .venv/bin/activate
+  ```
+- On Windows with Powershell:
+  ```
+  .\.venv\Scripts\Activate.ps1
+  ```
+
+See https://docs.python.org/3/library/venv.html#how-venvs-work for other os-shell combinations.
+
 To generate the vlc.py module and its documentation, for both the
-development version and the installed VLC version, use
+development version and the installed VLC version, use `make`.
 
-    make
+For running tests, use `make test`.
+Note that you need vlc installed because some tests require the
+libvlc's dynamic library to be present on the system.
 
 If you want to generate the bindings from an installed version of the
 VLC includes (which are expected to be in /usr/include/vlc), use the
-'installed' target:
+'installed' target: `make installed`.
 
-    make installed
+See more recipes in the Makefile.
 
-To install it for development purposes (add a symlink to your Python
+To install python-vlc for development purposes (add a symlink to your Python
 library) simply do
 
-    python setup.py develop
+```
+python setup.py develop
+```
 
 preferably inside a virtualenv. You can uninstall it later with
 
-    python setup.py develop --uninstall
+```
+python setup.py develop --uninstall
+```
 
 Documentation building needs epydoc. An online build is available at
 <http://olivieraubert.net/vlc/python-ctypes/>
 
-Packaging
----------
+## Packaging
 
 The generated module version number is built from the VLC version
 number and the generator version number:
@@ -67,24 +92,49 @@ so that it shared it major.minor with the corresponding VLC.
 To generate the reference PyPI module (including setup.py, examples
 and metadata files), use
 
-    make dist
+```
+make dist
+```
+
+## Architecture
+
+First of all, the bindings generator is in generator/generate.py.
+
+It really is the conjunction of two things:
+
+1. A **parser** of C header files (those of libvlc): that is the class `Parser`.
+1. A **generator** of Python bindings: that is the class `PythonGenerator`.
+
+`Parser` parses libvlc's headers and produce a kind of AST where nodes are
+instances of either `Struct`, `Union`, `Func`, `Par`, `Enum` or `Val`.
+The information kept is what is necessary for `PythonGenerator` to then produce
+the bindings.
+
+Until version 2 of the bindings generator, parsing was regex-based.
+It worked pretty well thanks to the consistent coding style of libvlc.
+However, it remained rather fragile.
+
+Since version 2, parsing is done using [Tree-sitter](https://tree-sitter.github.io/tree-sitter/).
+More specifically, we use the [C Tree-sitter grammar](https://github.com/tree-sitter/tree-sitter-c)
+and [Tree-sitter's Python bindings](https://github.com/tree-sitter/py-tree-sitter).
+It offers a more complete and robust parsing of C code.
+The job of `Parser` is thus to transform the AST[^1] produced by Tree-sitter into an "AST"
+understandable by the generator.
+
+## LibVLC Discord
 
-LibVLC Discord
------------------
 [![Join the chat at https://discord.gg/3h3K3JF](https://img.shields.io/discord/716939396464508958?label=discord)](https://discord.gg/3h3K3JF)
 
 python-vlc is part of the LibVLC Discord Community server. Feel free to come say hi!
 
-How to contribute
------------------
+## How to contribute
+
+Contributions such as:
+
+- reporting and fixing bugs,
+- contributing unit tests
+- contributing examples
 
-There are short-terms contributions (reporting and fixing bugs,
-contributing unit tests, contributing examples). A number of libvlc
-functions are currently blacklisted (search for `_blacklist` in the
-generator code), mostly because of their signature complexity. They
-would benefit some work.
+are welcomed!
 
-Longer terms goals include the rewriting of the generator to use a
-proper parser for the C-syntax (for the moment, the parser relies on
-regexp-based expression, which works thanks to the coding style
-applied in the code, but remains very fragile).
+[^1]: To be exact, it produces a CST: Concrete Syntax Tree.
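
The intermediate "AST" described in the new Architecture section can be pictured with a minimal sketch. Only the class names `Enum` and `Val` come from the README. The field names (`name`, `value`, `docs`, `vals`), the `to_python` helper and the sample values are assumptions made for illustration, not the attributes or output of generator/generate.py. Dataclasses are used here simply because they supply `__repr__` and `__eq__`, the two methods the commit message says were added to make parser output comparable in tests.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Val:
    """One enumerator of a C enum (hypothetical fields)."""

    name: str
    value: str
    docs: str = ""


@dataclass
class Enum:
    """A C enum as a parser might hand it to a generator (hypothetical fields)."""

    name: str
    vals: List[Val] = field(default_factory=list)
    docs: str = ""


def to_python(enum: Enum) -> str:
    """Render a toy class body from the node; not the real generator output."""
    lines = [f"class {enum.name}:"]
    lines += [f"    {v.name} = {v.value}" for v in enum.vals]
    return "\n".join(lines)


# Sample data, made up for the demonstration.
state = Enum("State", [Val("Playing", "3"), Val("Paused", "4")])
print(state == Enum("State", [Val("Playing", "3"), Val("Paused", "4")]))
print(to_python(state))
```

Because equality is structural, a test can build the expected nodes by hand and compare them with what the parser returns, without caring about object identity.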

TODO

Lines changed: 0 additions & 4 deletions
@@ -1,6 +1,2 @@
 * Add more test coverage
 * Provide more examples
-* Rewrite parsing code using a proper C parser, such as
-  https://github.com/eliben/pycparser
-  or
-  https://github.com/albertz/PyCParser
dev_setup.py

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+#!/usr/bin/env python3
+
+"""Script to get going after having cloned the repository."""
+
+import os
+import sys
+from pathlib import Path
+from subprocess import PIPE, STDOUT, CalledProcessError, run
+
+PROJECT_ROOT = Path(__file__).parent
+cwd = Path.cwd()
+assert (
+    cwd.resolve() == PROJECT_ROOT.resolve()
+), f"You should run that script from {PROJECT_ROOT}, but current working directory is {cwd}."
+
+# See https://stackoverflow.com/questions/1854/how-to-identify-which-os-python-is-running-on
+on_windows = os.name == "nt"
+
+
+def run_cmd(mess, cmd):
+    print(f"{mess}...", end=" ", flush=True)
+    try:
+        _proc = run(cmd, stdout=PIPE, stderr=STDOUT, check=True)
+    except CalledProcessError as e:
+        print()
+        cmd = " ".join(e.cmd)
+        print(f"Oops! Command '{cmd}' failed.")
+        print(f"Got return code {e.returncode}.")
+        print("Here is the command output:")
+        print(e.output.decode(), end="", flush=True)
+        sys.exit(e.returncode)
+    print("Done.", flush=True)
+
+
+python = "python3"
+venv_bin = ".venv/Scripts" if on_windows else ".venv/bin"
+venv_python = f"{venv_bin}/python3"
+pre_commit = f"{venv_bin}/pre-commit"
+
+# Clone Tree-sitter grammar which is a Git submodule of the project
+# See https://git-scm.com/book/en/v2/Git-Tools-Submodules
+run_cmd(
+    "Clone vendored C Tree-sitter grammar",
+    ["git", "submodule", "update", "--init", "--recursive"],
+)
+
+# Create a virtual environment if it doesn't exist
+if not (PROJECT_ROOT / ".venv").is_dir():
+    run_cmd("Create a virtual environment in .venv", [python, "-m", "venv", ".venv"])
+
+# Upgrade venv's pip
+run_cmd("Upgrade pip", [venv_python, "-m", "pip", "install", "--upgrade", "pip"])
+
+# Install dev dependencies
+run_cmd(
+    "Install dependencies",
+    [venv_python, "-m", "pip", "install", "-r", "requirements.txt"],
+)
+
+# Install pre-commit hooks
+run_cmd("Install pre-commit hooks", [pre_commit, "install"])
+
+print("Setup successfull!")
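
`run_cmd` relies on two `subprocess.run` behaviours: `check=True` raises `CalledProcessError` on a non-zero exit status, and `stderr=STDOUT` merges both streams into `e.output`. The following is a minimal sketch of that failure path, not part of dev_setup.py. The function name `capture_failure` is hypothetical, and the child process is simply the current interpreter printing two messages and exiting with a made-up status of 3.

```python
import sys
from subprocess import PIPE, STDOUT, CalledProcessError, run


def capture_failure(cmd):
    """Run `cmd`; return (returncode, merged output) instead of exiting."""
    try:
        proc = run(cmd, stdout=PIPE, stderr=STDOUT, check=True)
    except CalledProcessError as e:
        # e.output holds stdout and stderr merged into one bytes object.
        return e.returncode, e.output.decode()
    return proc.returncode, proc.stdout.decode()


# A deliberately failing child process (illustrative only).
failing = [
    sys.executable,
    "-c",
    "import sys; print('to stdout'); print('to stderr', file=sys.stderr); sys.exit(3)",
]
code, output = capture_failure(failing)
print(code)
print(output, end="")
```

Returning the status instead of calling `sys.exit` makes the behaviour easy to inspect. The script itself exits immediately with the child's return code, so a failed step stops the setup and the calling shell or CI job sees the failure.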
