Adds a dedicated Python API #27
Conversation
Is this PR based off the unmerged DTL PR? I'm asking because it is difficult to see which changes are relevant to the purpose of this PR.
I'm really excited to try this out! I have mostly understanding questions at this point, and also want to ask about next steps / the stage of the PR.
pydyad/pydyad/bindings.py (Outdated)
dyad_core_libpath = None
self.cons_path = None
self.prod_path = None
if DYAD_LIB_DIR is None:
Apologies in advance if I'm just asking bad questions - this is the first time I'm looking here (and I'm excited to try this out). Is there any reason the user is required to export LD_LIBRARY_PATH? Can we not do the same as the linker and look in the ld.so config or default library paths? I read up on this sometime last year and was using this class to derive paths: https://github.com/vsoch/elfcall/blob/dc7383ecd6386cff9927bbf4b3b65335a45f97f4/elfcall/main/ld.py#L37
TBH, this was just an arbitrary decision so that I'd have an initial way to locate libdyad_core.so. I can definitely replace this with a linker-style search.
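For what it's worth, Python's standard library can already approximate a linker-style lookup. Here is a minimal sketch, assuming only ctypes.util.find_library plus an LD_LIBRARY_PATH fallback (the helper name is illustrative, not part of pydyad):

```python
import os
from ctypes.util import find_library
from pathlib import Path

def find_dyad_core():
    # find_library consults roughly the same locations as the dynamic
    # linker (ld.so cache and default library paths) on most platforms.
    name = find_library("dyad_core")
    if name is not None:
        return name  # a name suitable for ctypes.CDLL
    # Fall back to scanning LD_LIBRARY_PATH entries directly.
    for d in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        candidate = Path(d) / "libdyad_core.so"
        if d and candidate.is_file():
            return str(candidate)
    raise FileNotFoundError("could not locate libdyad_core.so")
```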
Awesome! Feel free to use the class in elfcall - it's published on PyPI.
pydyad/pydyad/context.py (Outdated)
if not isinstance(args[0], Path):
    fname = Path(args[0])
fname = fname.expanduser().resolve()
if kwargs["mode"] in ("r", "rb", "rt"):
I'm new to Dyad, could you walk me through what is happening here? I'm trying to follow the test case where you submit a job with flux, and you create a consumer and a producer (different jobs, but pointed at the same path?). So are they just writing / reading data from the same paths (assuming a shared filesystem), or is there a way to use some kind of different protocol? And is flux always required? And finally, how does the io.open object relate to local_dyad_io?
Thanks for answering my questions! I hope I can help when I better understand the high level stuff.
Thanks for asking them and thanks for the comments!
So, in general, the control- and data-flow works like this:

Producer Side:
1. Producer opens a file in write-only mode (we currently don't support append or R/W, although we do have plans for the latter in Support ownership transfer for RW modes #29)
2. Producer closes that file
3. DYAD checks if the file is within its "producer-managed path" (set by the user via the DYAD_PRODUCER_PATH environment variable)
   a. If the file is not in the producer-managed path, nothing extra happens (besides the file close)
4. DYAD registers information about the file in the Flux KVS (currently, it's just the file path relative to the producer-managed path as key and the Flux rank as value)

Consumer Side:
1. Consumer opens a file in read-only mode
2. DYAD checks if the file is within its "consumer-managed path" (set by the user via the DYAD_CONSUMER_PATH environment variable)
   a. If the file is not in the consumer-managed path, nothing extra happens (besides the file open)
3. DYAD looks for an entry in the Flux KVS corresponding to the file. If there is no entry available, DYAD will block until an entry is added. This ensures the consumer is synchronized on data production (can't run a piece of code if its input isn't available, after all)
   a. If the KVS entry says that the consumer and producer are on the same node (as indicated by the Flux rank), the rest of the DYAD-specific steps are skipped
4. DYAD creates an RPC to invoke dyad.fetch on the Flux broker running on the producer's node
5. The dyad.fetch callback reads the file (currently all at once, but we intend to change that in the future) and transmits it back to the consumer using DYAD's Data Transport Layer, or DTL (added as part of the soon-to-be-merged PR Adds a Data Transport Layer to DYAD to support different ways of transferring data #24). By default, the DTL will use UCX for the actual communication
6. DYAD writes the file into the consumer's storage
7. The actual "open" is called
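To make that flow concrete, here is a minimal usage sketch, assuming the pydyad import path from this PR and managed paths of /tmp/prod and /tmp/cons (all values illustrative):

```python
from pydyad import dyad_open  # import path assumed from this PR's layout

# Producer job: assumes DYAD_PRODUCER_PATH=/tmp/prod.
payload = bytes(range(256))
with dyad_open("/tmp/prod/step0.dat", mode="wb") as f:
    f.write(payload)
# On close, DYAD registers the file (relative path -> Flux rank) in the KVS.

# Consumer job: assumes DYAD_CONSUMER_PATH=/tmp/cons.
with dyad_open("/tmp/cons/step0.dat", mode="rb") as f:
    data = f.read()
# The open blocks until the producer's KVS entry appears; DYAD then fetches
# the file via the dyad.fetch RPC and the DTL before the real open runs.
```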
Because of this, DYAD can work with either local or shared storage. The behavior above is what happens when using local storage. When using shared storage, we run all the steps on the producer side, but only steps 1-3 and 7 on the consumer side.
Currently, the user has to tell us when they are using shared storage via the DYAD_SHARED_STORAGE environment variable. However, my project this summer has been to automate that decision making using an abstraction of the storage hierarchy that I'm calling the "storage graph". The initial prototype implementation of this is almost done. I'm just running into some symbol export issues that refuse to be fixed by libtool's -export-symbols flag.
Oh this is so cool! So the actual file open and path are mostly just for checking permissions or ownership of the path, and the real content is in flux kvs.
I know you have a little more work to do, but I’ll start a testing setup of this branch in the flux operator soon to at least try to reproduce the test. Also, your description above was hugely helpful - if you don’t have it in a README or docs somewhere I highly recommend adding it.
And it probably follows in the future that something else other than flux kvs could be used for that layer, so we could potentially run this without flux as a dependency? I’m hugely interested in this for Kubernetes!
Most of this functionality is implemented in DYAD's core library (i.e., src/core in the source code and libdyad_core.so in an install). However, anything tied to the APIs is implemented as part of those APIs.
This is why, in dyad_open, we have the two conditionals wrapping the consume and produce calls. The conditional around consume (starting on line 29) is essentially performing steps 1 and 2 for the consumer side. Similarly, the conditional around produce (starting on line 38) is performing step 3 and the mode check of step 1 for the producer side.
Overall, what dyad_open is doing is this:
1. Either create or use an instance of our Python FFI class (i.e., the Dyad class from bindings.py). The decision is based on the values of dyad_ctx and register_dyad_ctx
2. Convert the filename (provided as the first positional argument to the function) to a pathlib.Path for convenience
3. Check if we should run DYAD's consumer-side code, and, if we should, invoke the corresponding call from libdyad_core.so through the FFI
4. Perform the actual file I/O. We use io.open to call the actual Python open function, and then we use the try-finally block to allow dyad_open to be used as a context manager (i.e., to allow it to be used in with statements)
5. Check if we should run DYAD's producer-side code, and, if we should, invoke the corresponding call from libdyad_core.so through the FFI
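Put together, a rough structural sketch of dyad_open (not the actual implementation; the signature, defaults, and mode handling are assumptions based on the description above):

```python
import io
from contextlib import contextmanager
from pathlib import Path

from pydyad.bindings import Dyad  # per step 1; exact import path assumed

@contextmanager
def dyad_open(*args, dyad_ctx=None, register_dyad_ctx=False, **kwargs):
    # Step 1: create or reuse the FFI object (a module-global in reality).
    ctx = dyad_ctx if dyad_ctx is not None else Dyad()
    # Step 2: normalize the filename to a pathlib.Path.
    fname = Path(args[0]).expanduser().resolve()
    mode = kwargs.get("mode", "r")
    # Step 3: consumer-side DYAD call (may block and fetch the file).
    if mode in ("r", "rb", "rt"):
        ctx.consume(str(fname))
    # Step 4: the real I/O; try-finally makes this usable in `with` blocks.
    f = io.open(fname, mode=mode)
    try:
        yield f
    finally:
        f.close()
        # Step 5: producer-side DYAD call after the file is closed.
        if mode in ("w", "wb", "wt"):
            ctx.produce(str(fname))
```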
Now that I'm out of info dump mode, let me actually answer the specific questions you asked 😅
Q: "So are they just writing / reading data from the same paths (assuming a shared filesystem) or is there a way to use some kind of different protocol?"
A: That depends on the values provided to the environment variables DYAD_PRODUCER_PATH, DYAD_CONSUMER_PATH, and DYAD_SHARED_STORAGE. You can think of the producer/consumer paths as the directories DYAD tracks to detect production/consumption events. What actual storage resources they point to is completely up to the user. So, they can point to shared storage (e.g., Lustre) or local storage (e.g., tmpfs, node-local SSD, or even El Cap Rabbit XFS). Additionally, we currently require the user to tell us if they are pointing to local or shared storage by setting DYAD_SHARED_STORAGE (if set, storage is shared; otherwise, storage is local). This will be automated in the near future, though.
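For illustration, a job might be configured like this (path values are made up; set these before DYAD initializes):

```python
import os

# The managed paths can point at shared storage (e.g., Lustre) or
# node-local storage (e.g., tmpfs or an SSD mount); values illustrative.
os.environ["DYAD_PRODUCER_PATH"] = "/p/lustre/mydir/dyad/prod"
os.environ["DYAD_CONSUMER_PATH"] = "/p/lustre/mydir/dyad/cons"
# DYAD_SHARED_STORAGE is presence-checked: set it (to anything) for
# shared storage; leave it unset for node-local storage.
os.environ["DYAD_SHARED_STORAGE"] = "1"
```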
Q: "And is flux always required?"
A: Yes. Right now, DYAD uses Flux's KVS and RPC services to perform (1) information sharing and (2) control messaging for data transfer respectively. There has been some talk about looking for other options, particularly for the KVS service. But, we have no concrete plans to replace either the KVS or RPC at this time, and there are no plans to move away from Flux entirely.
Q: "And finally, how does the io.open object to relate to local_dyad_io?"
A: They aren't really related in any way. Essentially, local_dyad_io
is used to perform the DYAD operations, and io.open
is used to perform the "real" file I/O operations. The key detail is that dyad_open
as a whole is essentially the glue that ensures that DYAD operations are correctly called (using local_dyad_io
) at the correct points in the file I/O process.
Also, thanks to writing all this out, I realized that the path checks on lines 30-31 and 39-40 are redundant because they are already performed in libdyad_core.so. I'll change that.
Good catch! I’m always happy to serve as the rubber duck! 🦆 😆
Regarding the dependency on Flux for KVS and RPC, options being considered include FoundationDB and gRPC.
(Force-pushed from ef60cd8 to acbd0f4)
Is this still a work-in-progress PR?
All that's left is testing to make sure it works correctly. Haven't gotten to that yet.
(Force-pushed from 983303e to 3fd1e60)
@JaeseungYeom @hariharan-devarajan @vsoch now that the DTL PR is merged, I've fixed up this PR. It's now fully ready for review.
tests/pydyad_spsc/run.sh (Outdated)
# FLUX: --output=pydyad_spsc_test.out
# FLUX: --error=pydyad_spsc_test.err

DYAD_INSTALL_LIBDIR="/g/g90/lumsden1/ws/insitu_benchmark/dyad/_test_install/lib"
This should be an environment variable. Then, check if the directory and a representative file exist under it.
I've fixed this.
tests/pydyad_spsc/run.sh (Outdated)
DYAD_INSTALL_LIBDIR="/g/g90/lumsden1/ws/insitu_benchmark/dyad/_test_install/lib"
KVS_NAMESPACE="pydyad_test"
CONS_MANAGED_PATH="/l/ssd/lumsden1/pydyad_test"
Same with these variables - no hardcoded paths specific to your account.
At the least, you can rely on "${USER}" instead of your id. However, that still does not work if there is no /l/ssd available.
These paths were hardcoded because this was mainly my script for testing on Corona. I've changed it to be environment variable-based.
EOM

flux kvs namespace create $KVS_NAMESPACE
flux exec -r all flux module load $DYAD_INSTALL_LIBDIR/dyad.so \
This is misleading, although I know it works and why you do it this way.
There are multiple options we can look into, and I mentioned this to Hari as well.
Option 1: Allocate two nodes and explicitly launch modules for the producer and consumer on exclusive ranks with corresponding managed paths. You may run commands via flux exec -r. However, the tricky part is that you need to run one of the producer and consumer in the background if you were to run them concurrently.
Option 2: Create a script for the producer and another one for the consumer. Inside each script, check whether the dyad module exists and launch one if not, with the corresponding managed path.
In general, we may assume the same path for consumer and producer. This is the typical use case we envision. However, it does not work when testing on a single node. That is why you are using the producer's managed path.
Does this PR require rebasing?
Edit: NVM. I see it has been rebased.
Given the urgency, I will go over the PR quickly and merge unless there is anything critical.
However, just at a glance, I see a couple of things to improve on in the future.
- The check on the data file should be stronger. Currently, the only check is whether the size of each file is the same. The check should look into the content of each file, which should be based on some level of randomization. Consider creating the integer sequence from a random initial value drawn from a set of seeds agreed upon between the producer and the consumer. I am not suggesting generating every number randomly, but only the first value in each file. Each data file should be generated using a different seed for its first value. Then, have the consumer check whether the content matches the expectation, or at least whether the hash is the same (see the sketch after this list).
- The CI tests are borrowed from the tutorial, and this interface has a separate directory under test. This needs to be better organized. Perhaps create a directory for C and C++ under test as well.
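A sketch of the seeded check suggested in the first point (helper names are hypothetical; only each file's first value is random, the rest of the sequence is deterministic):

```python
import hashlib
import random

def make_payload(seed: int, n_ints: int = 1024) -> bytes:
    # Only the first value is random (drawn from the agreed seed); the
    # remaining integers follow deterministically from it.
    first = random.Random(seed).randint(0, 2**31 - 1)
    seq = ((first + i) % 2**31 for i in range(n_ints))
    return b"".join(i.to_bytes(4, "little") for i in seq)

def producer_write(path: str, seed: int) -> None:
    with open(path, "wb") as f:
        f.write(make_payload(seed))

def consumer_check(path: str, seed: int) -> bool:
    # Comparing hashes is enough; the consumer regenerates the expected
    # bytes from the same seed rather than shipping a reference copy.
    expected = hashlib.sha256(make_payload(seed)).digest()
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).digest() == expected
```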
Given the urgency, I will just merge this as is. However, I left some comments for further improvements.
This PR adds a dedicated Python API to DYAD. This API will not be built/installed by Autotools. Instead, it will use setuptools/pip for its build/install process. This will allow us to host the Python API on PyPI if desired.
In terms of the interface, there are only 3 things a user may have to interact with:
- the dyad_open function
- the dyad_open_local function
- the Dyad class

Most users will only use dyad_open. This function is a drop-in replacement for the built-in open function. Internally, this function is a "context manager" (see this page for more info). It creates a global Dyad object (if not created previously) and uses the Dyad object and the built-in open function to carry out data production/consumption. When using dyad_open, DYAD will always be configured with environment variables.
The dyad_open_local function is provided to allow users to programmatically configure DYAD. When using dyad_open_local, users will first manually create a Dyad object and initialize it with the init or init_env methods. Then, users will invoke dyad_open_local. This function takes the same arguments as the built-in open plus an additional positional argument called dyad_ctx. This argument is the last required positional (i.e., it comes after all positionals from the built-in open function), and it accepts a pre-initialized Dyad object.
This PR is currently work-in-progress. The API is mostly complete, but packaging (i.e., setuptools stuff) and basic testing are still required.