ServiceX and Open Data DID finder #6

Closed
zonca opened this issue Sep 21, 2021 · 21 comments

@zonca
Member

zonca commented Sep 21, 2021

Related to #2

In the process, send feedback, take notes, and write a blog post if it gets interesting.

@zonca zonca self-assigned this Sep 21, 2021
@zonca
Member Author

zonca commented Sep 22, 2021

Started deploying ServiceX on Jetstream, as suggested by @BenGalewsky; I'm providing feedback on the docs in ssl-hep/ServiceX#349

@zonca
Member Author

zonca commented Sep 23, 2021

Configuration of the ServiceX PONDD instance at U Chicago:

https://github.com/pondd-project/flux-cd/blob/main/servicex/pondd-values.yaml

@zonca
Member Author

zonca commented Sep 23, 2021

Following the deployment instructions isn't useful for me, since I have a completely different target, so I started customizing the U of Chicago configuration instead: pondd-project/pondd-jetstream#1

@zonca
Member Author

zonca commented Sep 23, 2021

@BenGalewsky

  • Does CERNOpenData need to be deployed separately from ServiceX? Do you have an example YAML for that as well?
  • Is objectStore a service provided by ServiceX, or an external service that ServiceX needs in order to run?

@gordonwatts

  • Does CERNOpenData need to be deployed separately from ServiceX? Do you have an example YAML for that as well?

The CERNOpenData DID finder is deployed as part of the default ServiceX chart; see the relevant section of values.yaml. If you are using a recent version of the chart as a reference, I would expect it to just work.

  • Is objectStore a service provided by ServiceX, or an external service that ServiceX needs in order to run?

This probably needs @BenGalewsky, but by default it is provided by minio. At the bottom of the default values.yaml are some minio configuration values. They are referenced in at least one other place in the file, though I'm not 100% sure how the two bits interact. I know someone else was trying to substitute another object store, but I don't know the status. We have done our best to use the standard S3 API, so anything that speaks it should be able to interact.

@BenGalewsky
Contributor

BenGalewsky commented Sep 29, 2021

Is objectStore a service provided by ServiceX, or an external service that ServiceX needs in order to run?

Currently the only way to use the objectStore is to set objectStore.enabled to true. This deploys Minio as a sub chart and hooks it up to the application. We plan eventually to allow admins to bring their own object store and skip the Minio deployment.
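As a hedged illustration of what "set objectStore.enabled to true" means in values terms, the sketch below builds the nested override from a dotted key, mirroring how helm --set objectStore.enabled=true addresses chart values. The helper name set_dotted is hypothetical; only the objectStore.enabled key comes from this thread. It relies on JSON being valid YAML, so the printed output could be saved and passed to helm with -f.

```python
import json


def set_dotted(values: dict, dotted_key: str, value) -> dict:
    """Set a nested key such as 'objectStore.enabled' in a values dict.

    Hypothetical helper: it mimics how `helm --set a.b=c` addresses nested values.
    """
    node = values
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        # Create intermediate mappings on demand, keeping any existing ones.
        node = node.setdefault(key, {})
    node[leaf] = value
    return values


# Enabling objectStore deploys Minio as a subchart (per the comment above).
overrides = set_dotted({}, "objectStore.enabled", True)

# JSON is valid YAML, so this text can be used as a Helm values file.
print(json.dumps(overrides, indent=2))
```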

@zonca
Member Author

zonca commented Sep 30, 2021

OK, the ServiceX pod is running.

Logs show:

INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> b389abb05262, V1.0-RC.1
INFO  [alembic.runtime.migration] Running upgrade b389abb05262 -> 99e97a63d1bd, V1.0-RC.2
INFO  [alembic.runtime.migration] Running upgrade 99e97a63d1bd -> dd1f9a8a2aee, V1.0-RC.3
INFO  [alembic.runtime.migration] Running upgrade dd1f9a8a2aee -> a6cbb6201d3d, v1.0-rc4-a1
INFO  [alembic.runtime.migration] Running upgrade a6cbb6201d3d -> 04b9fb8ffee1, v1.0-rc4-a2
INFO  [alembic.runtime.migration] Running upgrade 04b9fb8ffee1 -> a33a96f0f035, rc4a2
[2021-09-30 20:35:39 +0000] [1] [INFO] Starting gunicorn 20.1.0
[2021-09-30 20:35:39 +0000] [1] [INFO] Listening at: http://0.0.0.0:5000 (1)
[2021-09-30 20:35:39 +0000] [1] [INFO] Using worker: sync
[2021-09-30 20:35:39 +0000] [14] [INFO] Booting worker with pid: 14
[2021-09-30 20:35:39 +0000] [15] [INFO] Booting worker with pid: 15
[2021-09-30 20:35:39 +0000] [16] [INFO] Booting worker with pid: 16
[2021-09-30 20:35:39 +0000] [18] [INFO] Booting worker with pid: 18
[2021-09-30 20:35:40 +0000] [21] [INFO] Booting worker with pid: 21

However, should it expose an HTTPS interface when I connect through a browser (https://pondd-servicex.zonca.dev/), like minio does?

The browser shows a privacy error, but the certificate looks fine: https://gist.github.com/18bd49f8e6d5367802d9fff6c0fdef9c

@zonca
Member Author

zonca commented Sep 30, 2021

One suspicious thing I noticed is that Helm prints:

Congratulations! You deployed an ingress for this service. You can access the
REST service at http://servicex.pondd-servicex.zonca.dev

There is an extra servicex. prepended to the domain.

@gordonwatts

This might be something for @BenGalewsky to sort out when he is back. I'm not sure how this part works!

@BenGalewsky
Contributor

there is an extra servicex. prepended to the domain.

The first servicex is the name of the Helm deployment. This lets you deploy multiple instances to the same namespace behind the same Ingress controller.

Have you read through our documentation on TLS options for the Helm chart?
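Going by the Helm output above and this explanation, the ingress hostname appears to be composed as the release name, a dot, and the configured host. A minimal sketch of that rule, inferred from this thread rather than taken from the chart's templates; the function name ingress_hostname is hypothetical:

```python
def ingress_hostname(release_name: str, ingress_host: str) -> str:
    """Hostname the ServiceX ingress appears to use (inferred from this thread):
    the Helm release name is prepended to the configured host."""
    return f"{release_name}.{ingress_host}"


# Release "servicex" with host "pondd-servicex.zonca.dev":
print(ingress_hostname("servicex", "pondd-servicex.zonca.dev"))
# Putting the prefix into the host as well doubles it:
print(ingress_hostname("servicex", "servicex.pondd-servicex.zonca.dev"))
# Release "pondd-servicex" with host "zonca.dev":
print(ingress_hostname("pondd-servicex", "zonca.dev"))
```

Under this rule, the host value should not include the release-name prefix, since the chart adds it.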

@zonca
Member Author

zonca commented Oct 11, 2021

Thanks @BenGalewsky, but the docs don't specify how to configure the host.

I'm using cert-manager (which works fine for minio); the docs say:

app:
  ingress:
    tls:
      enabled: true
      clusterIssuer: letsencrypt-prod

But how do I configure the host: with or without the servicex. prefix?

@zonca
Member Author

zonca commented Oct 11, 2021

I tried both. I think the right one is pondd-servicex.zonca.dev, which gives this ingress:

servicex-servicex   <none>   servicex.pondd-servicex.zonca.dev   10.0.0.7   80, 443   3m28s

With the other setting I get servicex.servicex.pondd-servicex.zonca.dev.

A certificate is then issued:

spec:
  dnsNames:
  - servicex.pondd-servicex.zonca.dev
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: letsencrypt-prod
  secretName: servicex-app-tls

The ingress has:

spec:
  rules:
  - host: servicex.pondd-servicex.zonca.dev
    http:
      paths:
      - backend:
          serviceName: default-http-backend
          servicePort: 80
        path: /servicex/internal
        pathType: ImplementationSpecific
      - backend:
          serviceName: servicex-servicex-app
          servicePort: 8000
        path: /
        pathType: ImplementationSpecific
  tls:
  - hosts:
    - servicex.pondd-servicex.zonca.dev
    secretName: servicex-app-tls

However, the browser connection still fails with a privacy error; it seems to be serving a default certificate.
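To reason about a privacy error like this, it helps to check whether the name in the browser's address bar is covered by the dnsNames of the certificate actually served. Below is a simplified sketch of that check: exact, case-insensitive comparison plus a wildcard that stands in for exactly one left-most label, in the spirit of RFC 6125 but not a full implementation. The function name hostname_matches is hypothetical.

```python
def hostname_matches(hostname: str, dns_names: list[str]) -> bool:
    """Return True if hostname is covered by one of the certificate's dnsNames.

    Simplified: exact match, or a '*' label that replaces exactly one
    left-most label of the hostname.
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    for name in dns_names:
        pattern = name.lower().rstrip(".").split(".")
        if len(pattern) != len(host_labels):
            continue  # a wildcard never spans more than one label
        if pattern == host_labels:
            return True
        if pattern[0] == "*" and pattern[1:] == host_labels[1:]:
            return True
    return False


issued = ["servicex.pondd-servicex.zonca.dev"]  # dnsNames of the Certificate above
print(hostname_matches("servicex.pondd-servicex.zonca.dev", issued))  # ingress host
print(hostname_matches("pondd-servicex.zonca.dev", issued))           # host without prefix
```

The ingress host is covered by the issued certificate, so if the browser still reports a mismatch for that exact host, the server is most likely presenting some other certificate. Note also that the name without the servicex. prefix is not covered.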

@zonca
Member Author

zonca commented Oct 11, 2021

@BenGalewsky, but is ServiceX listening on 8000?

@zonca
Member Author

zonca commented Oct 11, 2021

@BenGalewsky OK, I got it working; possibly an issue with fourth-level domains?
If I specify just zonca.dev as the host and name the Helm deployment pondd-servicex, it works fine.

@zonca
Member Author

zonca commented Oct 11, 2021

@gordonwatts, it seems the deployment is working, so now I would like to test a simple retrieval from CERN Open Data.

I was trying https://github.com/ssl-hep/ServiceX_DID_Finder_CERNOpenData/blob/develop/samples/simple_plot.ipynb; or do you have a better example of how to use the DID finder?

I get:

>>> sx_dataset = ServiceXDataset("cernopendata://3827", backend_type='dev_uproot')
got an unexpected keyword argument 'backend_type'

So maybe the notebook is outdated?

Also, I should point this at the ServiceX REST API endpoint, right?

@BenGalewsky
Contributor

Until @gordonwatts clears this up: I just tried a simple POST request with Postman:

POST to https://pondd-servicex.zonca.dev/servicex/transformation

{
  "did": "cernopendata://3827",
  "selection": "(Select (call EventDataset) (lambda (list e) (call (attr e 'jet_pt'))))",
  "result-destination": "object-store",
  "result-format": "root-file",
  "chunk-size": 7000,
  "workers": 1
}

It works and I get a request ID returned!
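The same request can be scripted without Postman. A minimal sketch using only the standard library, assuming the endpoint accepts a JSON body with a Content-Type: application/json header, needs no authentication (as in the Postman test above), and replies with JSON containing the request ID. The function names are hypothetical, and the URL is the deployment from this thread, which may no longer exist.

```python
import json
import urllib.request

SERVICEX_URL = "https://pondd-servicex.zonca.dev/servicex/transformation"


def build_transform_request(did: str, selection: str) -> dict:
    """Request body with the same fields as the Postman test above."""
    return {
        "did": did,
        "selection": selection,
        "result-destination": "object-store",
        "result-format": "root-file",
        "chunk-size": 7000,
        "workers": 1,
    }


def submit(url: str, body: dict) -> dict:
    """POST the body as JSON and return the decoded JSON response (assumed format)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


body = build_transform_request(
    "cernopendata://3827",
    "(Select (call EventDataset) (lambda (list e) (call (attr e 'jet_pt'))))",
)
# Needs network access to a running ServiceX instance:
# print(submit(SERVICEX_URL, body))
```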

@zonca
Member Author

zonca commented Oct 20, 2021

@gordonwatts can you please take a look at #6 (comment)?

@gordonwatts

Yeah, sorry! I need to update that notebook. The parameter is now called backend_name; for the complete docs on it, see the source code.

@zonca
Member Author

zonca commented Nov 4, 2021

@gordonwatts I had already tried just changing the argument name, but it didn't work. My guess is that some other update is needed. I get:

ServiceXException: (ServiceXException(...), 'Unable to find name/type dev_uproot in api_endpoints in servicex.yaml configuration file. Saw only names (default) and types (xaod)')

@gordonwatts

gordonwatts commented Nov 8, 2021

@zonca, that second error is a different one: it just says that the name you are using doesn't refer to an endpoint your local machine knows about.

In short: go to the service's home page, log in, download the customized servicex.yaml file, and place it in your home directory. Inside it you'll see a name entry; you can use that in your code here. You can also combine multiple servicex.yaml files.
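The error message above suggests that the value passed as backend_name is looked up among the api_endpoints entries of servicex.yaml, by name or by type. The sketch below is a hypothetical re-creation of that lookup for illustration only, not the ServiceX client's code: the name-before-type order, the function name find_endpoint, and the sample entry (shaped after the names and types the error reports) are all assumptions.

```python
def find_endpoint(api_endpoints: list[dict], backend: str) -> dict:
    """Pick an api_endpoints entry whose name, or failing that type, equals backend.

    Hypothetical: mirrors the behavior described by the ServiceXException above.
    """
    for entry in api_endpoints:
        if entry.get("name") == backend:
            return entry
    for entry in api_endpoints:
        if entry.get("type") == backend:
            return entry
    names = ", ".join(str(e.get("name")) for e in api_endpoints)
    types = ", ".join(str(e.get("type")) for e in api_endpoints)
    raise LookupError(
        f"Unable to find name/type {backend} in api_endpoints. "
        f"Saw only names ({names}) and types ({types})"
    )


# Assumed local config: a single endpoint named "default" of type "xaod".
endpoints = [{"name": "default", "type": "xaod"}]
print(find_endpoint(endpoints, "default"))
try:
    find_endpoint(endpoints, "dev_uproot")
except LookupError as err:
    print(err)
```

In this model, the fix is the one described above: download the servicex.yaml for the deployment and pass one of its name entries as backend_name.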

@zonca
Member Author

zonca commented Nov 8, 2021

Thanks @gordonwatts, never mind.
It would have been nice to run a quick test, but it isn't really necessary and seems extremely complicated, so let's just skip it. No need to spend more of your time on an optional test.

I consider that my deployment is working.

@BenGalewsky tested it in: #6 (comment)

I see it came through in the ServiceX dashboard:

[Screenshot: the request in the ServiceX dashboard]

@zonca zonca closed this as completed Nov 8, 2021