CLI for single table synthesizer (#86)
- Introduce `Data Exporter` for exporting sampled data to data sources
- CLI updates for synthesizer
Wh1isper committed Dec 23, 2023
1 parent bedd69c commit 4ea93b0
Showing 36 changed files with 1,018 additions and 88 deletions.
1 change: 1 addition & 0 deletions .github/workflows/extension.yml
@@ -27,6 +27,7 @@ jobs:
python -m pip install -e .[test]
- name: Install all packages in example/extension
run: |
python -m pip install -e example/extension/dummyexporter[test]
python -m pip install -e example/extension/dummymetadatainspector[test]
python -m pip install -e example/extension/dummycache[test]
python -m pip install -e example/extension/dummydataconnector[test]
9 changes: 9 additions & 0 deletions docs/source/api_reference/data_exporters/base.rst
@@ -0,0 +1,9 @@
Base Class for DataExporter
===========================

.. autoclass:: sdgx.data_exporters.base.DataExporter
   :members:
   :undoc-members:
   :inherited-members:
   :show-inheritance:
   :private-members:
10 changes: 10 additions & 0 deletions docs/source/api_reference/data_exporters/csv_exporter.rst
@@ -0,0 +1,10 @@
CsvExporter
=====================================


.. autoclass:: sdgx.data_exporters.csv_exporter.CsvExporter
   :members:
   :undoc-members:
   :inherited-members:
   :show-inheritance:
   :private-members:
11 changes: 11 additions & 0 deletions docs/source/api_reference/data_exporters/extension.rst
@@ -0,0 +1,11 @@
.. _api_reference/data-exporters-extension:

Extension hookspec
============================

.. automodule:: sdgx.data_exporters.extension
   :members:
   :undoc-members:
   :inherited-members:
   :show-inheritance:
   :private-members:
24 changes: 24 additions & 0 deletions docs/source/api_reference/data_exporters/index.rst
@@ -0,0 +1,24 @@
Data Exporter
========================================================

.. toctree::
   :maxdepth: 1

   Base Class for DataExporter <base>

Built-in DataExporter
-----------------------------

.. toctree::
   :maxdepth: 2

   CsvExporter <csv_exporter>

Custom DataExporter Relevant
-----------------------------

.. toctree::
   :maxdepth: 2

   Extension hookspec <extension>
   DataExporterManager <manager>
9 changes: 9 additions & 0 deletions docs/source/api_reference/data_exporters/manager.rst
@@ -0,0 +1,9 @@
DataExporterManager
=================================

.. autoclass:: sdgx.data_exporters.manager.DataExporterManager
   :members:
   :undoc-members:
   :inherited-members:
   :show-inheritance:
   :private-members:
1 change: 1 addition & 0 deletions docs/source/api_reference/index.rst
@@ -14,6 +14,7 @@ API Reference
Data Processor <data_processors/index>
Models <models/index>
Metadata and Inspectors <data_models/index>
Data Exporter <data_exporters/index>
Manager <manager>
Exceptions <exceptions>
Utils <utils>
20 changes: 15 additions & 5 deletions docs/source/developer_guides/extension/index.rst
@@ -23,8 +23,18 @@ View latest extension example on `GitHub <https://github.com/hitsz-ids/synthetic
Plugin-supported modules
------------------------

- :ref:`Cacher for DataLoader <api_reference/cachers-extension>`
- :ref:`Data Connector <api_reference/data-connectors-extension>`
- :ref:`Data Processor <api_reference/data-processors-extension>`
- :ref:`Inspector for Metadata <api_reference/data-models-inspectors-extension>`
- :ref:`Model <api_reference/models-extension>`
- :ref:`API Reference for extended Data Connector <api_reference/data-connectors-extension>`:
  :ref:`Data Connector <Data Connector>` connects to data sources.
- :ref:`API Reference for extended Cacher for DataLoader <api_reference/cachers-extension>`:
  :ref:`Cacher <Cacher>` improves performance, reduces network overhead,
  and supports large datasets.
- :ref:`API Reference for extended Data Processor <api_reference/data-processors-extension>`:
  :ref:`Data Processor <Data Processor>` pre-processes and post-processes data.
  It is useful for implementing business logic.
- :ref:`API Reference for extended Inspector for Metadata <api_reference/data-models-inspectors-extension>`:
  :ref:`Inspector <Inspector>` extracts metadata such as patterns and types from raw data.
- :ref:`API Reference for extended Model <api_reference/models-extension>`:
  :ref:`Model <SynthesizerModel>` is fitted on processed data and used to generate synthetic data.
- :ref:`API Reference for extended Data Exporter <api_reference/data-exporters-extension>`:
  :ref:`Data Exporter <Data Exporter>` exports data to a destination.
  Use it from the CLI or as a library to save your processed or synthetic data.
22 changes: 22 additions & 0 deletions docs/source/user_guides/cli.rst
@@ -1,2 +1,24 @@
Command Line Interface
==================================================

The Command Line Interface (CLI) simplifies the use of SDG and lets other programs invoke SDG more conveniently.

There are two main commands in the CLI:

- ``fit``: Fit, fine-tune, or retrain the model, then save the final model to a specified path.
- ``sample``: Load an existing model and sample synthetic data.

Because SDG supports a plug-in system, users can list all available components of a given type with the ``list-{component}`` commands.

.. Note::

   If you want to use SDG as a library, please refer to :ref:`Use Synthetic Data Generator as a library <Use Synthetic Data Generator as a library>`.

   If you want to extend SDG with your own components, please refer to :ref:`Developer guides for extension <Extented Synthetic Data Generator>`.

CLI for synthetic single-table data
--------------------------------------------------

.. click:: sdgx.cli.main:cli
   :prog: sdgx
   :nested: full
1 change: 1 addition & 0 deletions example/extension/dummyexporter/dummyexporter/__init__.py
@@ -0,0 +1 @@
__version__ = "0.1.0"
15 changes: 15 additions & 0 deletions example/extension/dummyexporter/dummyexporter/dummyexporter.py
@@ -0,0 +1,15 @@
from __future__ import annotations

from sdgx.data_exporters.base import DataExporter


class MyOwnExporter(DataExporter):
    ...


from sdgx.data_exporters.extension import hookimpl


@hookimpl
def register(manager):
    manager.register("MyOwnExporter", MyOwnExporter)
27 changes: 27 additions & 0 deletions example/extension/dummyexporter/pyproject.toml
@@ -0,0 +1,27 @@
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "dummyexporter"
dependencies = ["sdgx"]
dynamic = ["version"]
requires-python = ">=3.8"
classifiers = [
"Programming Language :: Python :: 3",
'Programming Language :: Python :: 3.8',
'Programming Language :: Python :: 3.9',
'Programming Language :: Python :: 3.10',
'Programming Language :: Python :: 3.11',
]
[project.optional-dependencies]
test = ["pytest"]

[tool.check-manifest]
ignore = [".*"]

[tool.hatch.version]
path = "dummyexporter/__init__.py"

[project.entry-points."sdgx.data_exporter"]
dummyexporter = "dummyexporter.dummyexporter"
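
The ``[project.entry-points."sdgx.data_exporter"]`` table above advertises the plugin module to any tool that looks up that entry-point group. How sdgx itself loads the group is not shown in this commit excerpt, so the following is only a minimal sketch, using the standard library, of how installed entry points in that group could be listed. The helper name ``find_entry_points`` is hypothetical.

```python
from __future__ import annotations

import sys
from importlib import metadata


def find_entry_points(group: str) -> list:
    """Return the entry points installed under ``group`` (possibly empty).

    Hypothetical helper for illustration; not part of sdgx.
    """
    if sys.version_info >= (3, 10):
        # Python 3.10+ supports selecting a group directly.
        return list(metadata.entry_points(group=group))
    # Python 3.8/3.9: entry_points() returns a mapping of group -> entries.
    return list(metadata.entry_points().get(group, []))


for ep in find_entry_points("sdgx.data_exporter"):
    # ep.name is the key from pyproject.toml; ep.value is the module path.
    print(ep.name, "->", ep.value)
```

If the example package is installed (for instance with ``python -m pip install -e example/extension/dummyexporter[test]``), the loop should list its ``dummyexporter`` entry; otherwise it prints nothing.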
16 changes: 16 additions & 0 deletions example/extension/dummyexporter/tests/test_registed_exporter.py
@@ -0,0 +1,16 @@
import pytest

from sdgx.data_exporters.manager import DataExporterManager


@pytest.fixture
def manager():
    yield DataExporterManager()


def test_registed_exporter(manager: DataExporterManager):
    assert manager._normalize_name("MyOwnExporter") in manager.registed_exporters


if __name__ == "__main__":
    pytest.main(["-vv", "-s", __file__])
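
To make clearer what the test above checks, here is a self-contained sketch of the register-then-look-up pattern. ``ToyExporter`` and ``ToyExporterManager`` are hypothetical stand-ins, not the sdgx classes, and the lowercase normalization is an assumption made only for illustration; the real ``DataExporterManager`` may normalize names differently.

```python
from __future__ import annotations


class ToyExporter:
    """Hypothetical stand-in for sdgx.data_exporters.base.DataExporter."""


class MyOwnExporter(ToyExporter):
    """Mirrors the exporter class defined by the example plugin."""


class ToyExporterManager:
    """Simplified stand-in for DataExporterManager (not the sdgx implementation)."""

    def __init__(self) -> None:
        self._registry: dict[str, type] = {}

    @staticmethod
    def _normalize_name(name: str) -> str:
        # Assumption: names are matched case-insensitively.
        return name.strip().lower()

    def register(self, name: str, cls: type) -> None:
        # Store the class under its normalized name.
        self._registry[self._normalize_name(name)] = cls

    @property
    def registed_exporters(self) -> dict[str, type]:
        # Return a copy so callers cannot mutate the registry directly.
        return dict(self._registry)


def register(manager: ToyExporterManager) -> None:
    # Same shape as the plugin's @hookimpl-decorated register function.
    manager.register("MyOwnExporter", MyOwnExporter)


manager = ToyExporterManager()
register(manager)
print(manager._normalize_name("MyOwnExporter") in manager.registed_exporters)
```

The test in the example package performs the same membership check, but against the real manager after the plugin has been discovered through its entry point.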