feat: add dapper-python package (#40)
Add a Python package that implements various helper functions for users who want
to use the DAPper datasets from Python, primarily file name normalization. Later it
may include functions for locating the latest version(s) of installed DAPper
datasets, and standard queries for the datasets.

Resolves #39
nightlark committed Jan 10, 2025
1 parent 4cb9e30 commit 2240d27
Showing 7 changed files with 457 additions and 0 deletions.
20 changes: 20 additions & 0 deletions .github/workflows/ci.yml
@@ -48,3 +48,23 @@ jobs:
      - uses: actions/checkout@v4
      - name: Check formatting
        run: cargo fmt -- --check

  python-tests:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e .[test]
        working-directory: python

      - name: Run pytest
        run: python -m pytest
        working-directory: python
94 changes: 94 additions & 0 deletions .github/workflows/python-publish.yml
@@ -0,0 +1,94 @@
name: Publish Python Package

on:
  workflow_dispatch:
    inputs:
      publish_target:
        description: 'Publish target (testpypi, pypi, dry-run)'
        required: true
        default: 'dry-run'
        type: choice
        options:
          - dry-run
          - testpypi
          - pypi
  push:
    branches:
      - main
    paths:
      - 'python/**'
  pull_request:
    branches:
      - main
    paths:
      - 'python/**'

jobs:
  build-wheel:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install build twine auditwheel
        working-directory: python

      - name: Build package
        run: python -m build
        working-directory: python

      - name: Check distribution
        run: twine check dist/*
        working-directory: python

      - name: Upload Python package dist artifacts
        uses: actions/upload-artifact@v4
        with:
          name: python-package-dist
          path: python/dist

  pypi-publish:
    name: Upload release to PyPI
    runs-on: ubuntu-latest
    needs: build-wheel
    if: github.event.inputs.publish_target == 'pypi'
    environment:
      name: pypi
      url: https://pypi.org/p/dapper-python
    permissions:
      id-token: write # IMPORTANT: this permission is mandatory for trusted publishing
    steps:
      - name: Download Python package dist artifacts
        uses: actions/download-artifact@v4
        with:
          name: python-package-dist
          path: dist
      - name: Publish package distributions to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1

  testpypi-publish:
    name: Upload release to TestPyPI
    runs-on: ubuntu-latest
    needs: build-wheel
    if: github.event.inputs.publish_target == 'testpypi'
    environment:
      name: pypi
      url: https://test.pypi.org/p/dapper-python
    permissions:
      id-token: write # IMPORTANT: this permission is mandatory for trusted publishing
    steps:
      - name: Download Python package dist artifacts
        uses: actions/download-artifact@v4
        with:
          name: python-package-dist
          path: dist
      - name: Publish package distributions to TestPyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          # Without repository-url the action uploads to the real PyPI index
          repository-url: https://test.pypi.org/legacy/
44 changes: 44 additions & 0 deletions python/README.md
@@ -0,0 +1,44 @@
# dapper-python

dapper-python is a Python package for working with DAPper datasets. It provides helper functions for normalizing shared library file names, mirroring the Rust implementation in the DAPper project, along with other helpers that make the DAPper datasets easier to access from Python.

## Installation

You can install the `dapper-python` package from PyPI using pip:

```bash
pip install dapper-python
```

## Usage

Here is an example of how to use the `dapper-python` package:

```python
from dapper_python.normalize import normalize_file_name

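# normalize_file_name returns a NormalizedFileName for shared libraries
# and the original string for any other file name
file_name = "libexample-1.2.3.so.1.2"
normalized_name = normalize_file_name(file_name)
print(normalized_name.name, normalized_name.version, normalized_name.soabi)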
```
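
Tracing `normalize.py` by hand, the file name in the example above should split into the name `libexample.so`, the version `1.2.3`, and the SOABI `1.2`. The stand-alone sketch below reproduces just those two splitting steps so it can run without the package installed. The helper names `split_soabi` and `split_version` are illustrative assumptions, not part of the `dapper-python` API; they are meant to mirror `extract_soabi_version` and `extract_version_suffix`.

```python
import re
from typing import Optional, Tuple


def split_soabi(soname: str) -> Tuple[str, str]:
    """Split 'libfoo.so.1.2' into ('libfoo.so', '1.2'). Illustrative stand-in."""
    pos = soname.find(".so.")
    if pos == -1:
        return soname, ""
    # Keep everything up to and including ".so"; the rest is the SOABI version
    return soname[: pos + 3], soname[pos + 4:]


def split_version(soname: str) -> Tuple[str, Optional[str]]:
    """Split 'libfoo-1.2.3.so' into ('libfoo.so', '1.2.3'). Illustrative stand-in."""
    match = re.search(r"-(\d+(\.\d+)+)\.so", soname)
    if match is None:
        return soname, None
    # Drop the trailing "-<version>.so" and re-attach the ".so" extension
    return soname.rsplit("-", 1)[0] + ".so", match.group(1)


base, soabi = split_soabi("libexample-1.2.3.so.1.2")
name, version = split_version(base)
print(name, version, soabi)  # libexample.so 1.2.3 1.2
```

Because `normalize_file_name` returns the original string for file names that are not shared libraries, callers may want an `isinstance(result, str)` check before reading `.name`, `.version`, or `.soabi`.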

## Tests

The `dapper-python` package includes tests to help ensure the normalization function matches the Rust implementation.

You can run the tests using the following command:

```bash
python -m pytest
```
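
As a rough idea of what such a test can look like, here is a hypothetical, self-contained check of the rule that decides whether a file name is treated as a shared library. `looks_like_shared_library` is an illustrative stand-in that restates the condition used in `normalize_file_name`; it is not a function the package exports, and the cases are not taken from the package's test suite.

```python
# Suffixes that disqualify a name containing ".so." (restated from normalize.py)
EXCLUDED_SUFFIXES = (".gz", ".patch", ".diff", ".hmac", ".qm")


def looks_like_shared_library(name: str) -> bool:
    """Illustrative stand-in for the check at the top of normalize_file_name."""
    if name.endswith(".so"):
        return True
    return ".so." in name and not name.endswith(EXCLUDED_SUFFIXES)


def test_shared_library_detection():
    assert looks_like_shared_library("libfoo.so")
    assert looks_like_shared_library("libfoo.so.1.2")
    # A compressed copy contains ".so." but ends in an excluded suffix
    assert not looks_like_shared_library("libfoo.so.1.gz")
    assert not looks_like_shared_library("README.md")
```

Saved in a `test_*.py` file, pytest would collect `test_shared_library_detection` automatically. A real test would import from `dapper_python.normalize` instead of redefining the rule.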

## License

DAPper is released under the MIT license. See the [LICENSE](../LICENSE)
and [NOTICE](../NOTICE) files for details. All new contributions must be made
under this license.

SPDX-License-Identifier: MIT

LLNL-CODE-871441
1 change: 1 addition & 0 deletions python/dapper_python/__init__.py
@@ -0,0 +1 @@
# This file makes the folder a package
141 changes: 141 additions & 0 deletions python/dapper_python/normalize.py
@@ -0,0 +1,141 @@
import re
from typing import Optional, Tuple, Union

class NormalizedFileName:
    """
    Represents a normalized file name with optional version and SOABI information.
    Attributes:
        name (str): The normalized file name.
        version (Optional[str]): The version number, if available.
        soabi (Optional[str]): The SOABI version, if available.
        normalized (bool): Indicates if the file name was normalized.
    """
    def __init__(self, name: str, version: Optional[str] = None, soabi: Optional[str] = None, normalized: bool = False):
        self.name = name
        self.version = version
        self.soabi = soabi
        self.normalized = normalized

def normalize_file_name(name: str) -> Union[NormalizedFileName, str]:
    """
    Normalize a shared library file name.
    Args:
        name (str): The file name to normalize.
    Returns:
        Union[NormalizedFileName, str]: A NormalizedFileName object if the file name is a shared library,
        otherwise the original file name.
    """
    if name.endswith(".so") or (".so." in name and not any(name.endswith(suffix) for suffix in [".gz", ".patch", ".diff", ".hmac", ".qm"])):
        return normalize_soname(name)
    return name

def normalize_soname(soname: str) -> NormalizedFileName:
    """
    Normalize a shared object file name.
    Args:
        soname (str): The shared object file name to normalize.
    Returns:
        NormalizedFileName: A NormalizedFileName object with the normalized name, version, and SOABI information.
    """
    soname, soabi = extract_soabi_version(soname)
    soabi_version = soabi if soabi else None

    if ".cpython-" in soname:
        pos = soname.find(".cpython-")
        return NormalizedFileName(normalize_cpython(soname, pos), soabi=soabi_version, normalized=True)
    elif ".pypy" in soname:
        pos = soname.find(".pypy")
        return NormalizedFileName(normalize_pypy(soname, pos), soabi=soabi_version, normalized=True)
    elif soname.startswith("libHS"):
        normalized_name, version, normalized = normalize_haskell(soname)
        return NormalizedFileName(normalized_name, version, soabi_version, normalized)
    else:
        normalized_name, version = extract_version_suffix(soname)
        if version:
            return NormalizedFileName(normalized_name, version, soabi_version, True)
        return NormalizedFileName(soname, soabi=soabi_version, normalized=False)

def extract_soabi_version(soname: str) -> Tuple[str, str]:
    """
    Extract the SOABI version from a shared object file name.
    Args:
        soname (str): The shared object file name.
    Returns:
        Tuple[str, str]: A tuple containing the base file name and the SOABI version.
    """
    if ".so." in soname:
        pos = soname.find(".so.")
        return soname[:pos + 3], soname[pos + 4:]
    return soname, ""

def extract_version_suffix(soname: str) -> Tuple[str, Optional[str]]:
    """
    Extract the version number from a shared object file name.
    Args:
        soname (str): The shared object file name.
    Returns:
        Tuple[str, Optional[str]]: A tuple containing the base file name and the version number, if available.
    """
    version_pattern = re.compile(r"-(\d+(\.\d+)+)\.so")
    match = version_pattern.search(soname)
    if match:
        version = match.group(1)
        base_soname = soname.rsplit('-', 1)[0]
        return f"{base_soname}.so", version
    return soname, None

def normalize_cpython(soname: str, pos: int) -> str:
    """
    Normalize a CPython shared object file name.
    Args:
        soname (str): The shared object file name.
        pos (int): The position of the CPython tag in the file name.
    Returns:
        str: The normalized file name.
    """
    return f"{soname[:pos]}.cpython.so"

def normalize_pypy(soname: str, pos: int) -> str:
    """
    Normalize a PyPy shared object file name.
    Args:
        soname (str): The shared object file name.
        pos (int): The position of the PyPy tag in the file name.
    Returns:
        str: The normalized file name.
    """
    return f"{soname[:pos]}.pypy.so"

def normalize_haskell(soname: str) -> Tuple[str, Optional[str], bool]:
    """
    Normalize a Haskell shared object file name.
    Args:
        soname (str): The shared object file name.
    Returns:
        Tuple[str, Optional[str], bool]: A tuple containing the normalized file name, version number, and a boolean
        indicating if the file name was normalized.
    """
    if "-ghc" in soname:
        pos = soname.rfind("-ghc")
        name = soname[:pos]
        # Strip a trailing API hash (20 to 22 alphanumeric characters), if present
        api_hash = name.rsplit('-', 1)[-1]
        if len(api_hash) in [20, 21, 22] and api_hash.isalnum():
            name = name[:-(len(api_hash) + 1)]
        name, version = name.rsplit('-', 1)
        return f"{name}.so", version, True
    return soname, None, False
58 changes: 58 additions & 0 deletions python/pyproject.toml
@@ -0,0 +1,58 @@
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "dapper-python"
version = "0.0.0.dev0"
description = "A Python package for interacting with DAPper datasets"
authors = [
    { name = "Ryan Mast", email = "[email protected]" }
]
license = { text = "MIT License" }
readme = "README.md"
requires-python = ">=3.6"
classifiers = [
    "Programming Language :: Python :: 3",
    "License :: OSI Approved :: MIT License",
    "Operating System :: OS Independent",
    "Intended Audience :: Developers",
    "Intended Audience :: Science/Research",
    "Topic :: Database",
    "Topic :: Security",
    "Topic :: Utilities",
]

[project.urls]
Homepage = "https://github.com/LLNL/dapper"
Discussions = "https://github.com/LLNL/dapper/discussions"
"Issue Tracker" = "https://github.com/LLNL/dapper/issues"
"Source Code" = "https://github.com/LLNL/dapper"

[project.optional-dependencies]
test = ["pytest"]
dev = ["build", "pre-commit"]

[dependency-groups]
test = ["pytest"]
dev = ["build", "pre-commit"]

[tool.setuptools.packages.find]
include = ["dapper_python", "dapper_python.*"]

[project.entry-points."surfactant"]

[tool.pytest.ini_options]
addopts = ["--import-mode=importlib"]
pythonpath = "."

[tool.ruff]
line-length = 100
indent-width = 4

[tool.ruff.lint]
# ruff defaults: E4, E7, E9, F
select = ["E", "F", "B", "I"]
ignore = ["E501", "F841"]
# don't fix flake8-bugbear (`B`) violations
unfixable = ["B"]