-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added ray version of the html2parquet transform #666
Conversation
Signed-off-by: Sungeun An <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
Signed-off-by: Maroun Touma <[email protected]>
Great. Thank you very much @sungeunan-ibm for finishing this. It looks like the conflict in yml is the result of 1) The move of this transform from universal to the language folder and 2) The fact that a new hap transform was added yesterday. But, the main reviewer for the correct implementation of ray is @daw3rd. |
Thank you @shahrokhDaijavad! @daw3rd Could you advise on how I should resolve these conflicts? Should I simply regenerate the file by running the following commands?
|
@@ -1,7 +1,11 @@ | |||
# | |||
# DO NOT EDIT THIS FILE: it is generated from test-transform.template, Edit there and run make to change these files | |||
# | |||
<<<<<<<< HEAD:.github/workflows/test-language-html2parquet.yml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you check this in? I thought git would not allow it. Anyway. I recommend the following.
- merge the latest dev branch into your branch
- run
make
in this directory and you should get a proper version of this file.
@@ -1,7 +1,11 @@ | |||
# | |||
# DO NOT EDIT THIS FILE: it is generated from test-transform.template, Edit there and run make to change these files | |||
# | |||
<<<<<<<< HEAD:.github/workflows/test-language-html2parquet.yml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
|
||
minio-start: .minio-start | ||
|
||
kind-load-image:: .transforms.kind-load-image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the following rules to match those in other transforms from the latest dev branch. For example,
.PHONY: workflow-venv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Makefile you are referring to is located in the ray folder, while the example provided is directly under the ededup folder. I checked the Makefile for the ededup transform within the ray folder, and it matches the same structure.
|
||
The transform will produce the following statsd metrics: | ||
|
||
| metric name | Description | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are not part of the output parquet file. I think you need to tell the reader where these can be viewed. Also, please link from here to the python readme for annotations and configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm not entirely sure where these metrics can be viewed. In the Ray implementation, I borrowed the pdf2parquet code, but I’m not very familiar with Prometheus metrics. These metrics are generated from the Html2ParquetRayTransform
class, and they can be viewed through logging. After some investigation, it appears that you can access these metrics in the Prometheus dashboard under the relevant namespace for your application. However, my experience with this is limited. I would greatly appreciate your assistance in adding more detailed descriptions!
Signed-off-by: Sungeun An <[email protected]>
Signed-off-by: Sungeun An <[email protected]>
Signed-off-by: Sungeun An <[email protected]>
Signed-off-by: Sungeun An <[email protected]>
package name for ray cannot be the same as package name for python
Setting up the vent for ray testing, is different than python testing
here we need to test ray modules (not python)
Need to publish ray image (not python)
Need to build ray image (not python)
Need to test ray image (not python)
Not sure if we are using this but it should still be ray sample as the default goes to python sample
This won't work here
@@ -102,6 +102,7 @@ jobs: | |||
- name: Test KFP libs (shared and v2) and run a workflow | |||
timeout-minutes: 120 | |||
run: | | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this file being changed by this PR?
@@ -53,7 +53,7 @@ docker-save-image:: | |||
@# Help: Recursively make $@ in all subdirs | |||
$(MAKE) RULE=$@ .recurse | |||
|
|||
.PHONY: workflow-venv | |||
.PHONY: workflow-vent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no kfp support yet. Why is this in here ?
Signed-off-by: David Wood <[email protected]>
Need a base image to build the docker image for ray
We will update the transforms/language/html2parquet/ray/README.md in second PR.
Why are these changes needed?
Added ray version of the html2parquet transform
Related issue number (if any).
Issue #635