
Cannot get it to run hello world #145

Open
masiulaniec opened this issue Dec 12, 2022 · 2 comments

@masiulaniec

masiulaniec commented Dec 12, 2022

I am trying to evaluate reflow but getting stopped in my tracks at Quick Start. I am simply trying to run the hello world example:

```
$ cat hello.rf
val Main = exec(image := "ubuntu") (out file) {"
        echo hello world >>{{out}}
"}
$
```

I initially assumed that local mode (Docker) would be the quickest. So I ran:

```
$ reflow run -local hello.rf
2022/12/11 15:36:37 localcluster Init requires taskdb.TaskDB: unspecified
$
```

This smells like an internal error (dependency injection failure). It is surprising that TaskDB is a hard dependency when in -local mode. It contradicts the official documentation, which states that TaskDB is a soft dependency even for cluster mode.

Having given up on Docker, I fell back on the official EC2 quickstart from the README. The setup-ec2 / setup-s3-repository / setup-dynamodb-assoc trio worked fine. Unfortunately, reflow run failed in a surprising way:

```
$ reflow run hello.rf
reflow: reflow runtime: ===== started =====
reflow: reflow version: 1.27.0 (go1.18.4)
reflow: run ID: 44898ba0
reflow: evaluating program /Users/me/reflow/hello.rf
        (no params)
        (no arguments)
reflow: Trace: none (since nopTracer is in use)
reflow: evaluating with configuration: scheduler *sched.Scheduler snapshotter blob.Mux repository *blobrepo.Repository,url=s3://masiunet-reflow-test/ assoc *dydbassoc.Assoc,TableName=masiunet-reflow-test flags nocache,norecomputeempty,topdown flowconfig hashv2 cachelookuptimeout 20m0s imagemap map[ubuntu:index.docker.io/library/ubuntu@sha256:965fbcae990b0467ed5657caceaec165018ef44a4d2d46c7cdea80a9dff0d1ea] dotwriter(*os.File)
reflow: (flow 3dca1cc0): reviseResources {mem:500.0MiB cpu:1 disk:0B}: resources {mem:500.0MiB cpu:1 disk:0B} are way higher than max {mem:0B cpu:128 disk:250.0GiB intel_avx:128 intel_avx2:128 intel_avx512:128 intel_turbo:128}
reflow:  ->  hello.Main   3dca1cc0 exec   exec ..aec165018ef44a4d2d46c7cdea80a9dff0d1ea echo hello world >>{{out}}
reflow: hello.Main 3dca1cc0 /Users/me/reflow/hello.rf:1:16:
        resources: {mem:500.0MiB cpu:1 disk:0B}
        sha256:143d42326a7796eab8314a0030604c95e7afad1587ce681492f911b501b54db9
        sha256:b5cf39692f785fbbbc9ac03dbc00c2bde0ff2076d0373724293f810b2f1276b3
        sha256:3dca1cc06adb7b4a76dbc5a526c60ebed36ad8793b5a13cc6449c4c7ff329c8e
        index.docker.io/library/ubuntu@sha256:965fbcae990b0467ed5657caceaec165018ef44a4d2d46c7cdea80a9dff0d1ea
        command:
            echo hello world >>{{out}}
        where:
reflow:  <-  hello.Main   3dca1cc0 err    exec 0s ?
        error resources exhausted: requested resources {mem:500.0MiB cpu:1 disk:0B} not satisfiable even by largest available instance type x2iedn.32xlarge with resources {mem:0B cpu:128 disk:250.0GiB intel_avx:128 intel_avx2:128 intel_avx512:128 intel_turbo:128}
        /Users/me/reflow/hello.rf:1:16
        index.docker.io/library/ubuntu@sha256:965fbcae990b0467ed5657caceaec165018ef44a4d2d46c7cdea80a9dff0d1ea
        command:
            echo hello world >>{{out}}
        where:
        profile:
            cpu mean=0.0 max=0.0 (N=0, duration=0s)
            mem mean=0B max=0B (N=0, duration=0s)
            disk mean=0B max=0B (N=0, duration=0s)
            tmp mean=0B max=0B (N=0, duration=0s)
reflow: total n=1 time=0s
        ident      n   ncache runtime(m) cpu mem(GiB) disk(GiB) tmp(GiB) requested
        hello.Main 1   0

reflow: marking run done after nonrecoverable error resources exhausted: requested resources {mem:500.0MiB cpu:1 disk:0B} not satisfiable even by largest available instance type x2iedn.32xlarge with resources {mem:0B cpu:128 disk:250.0GiB intel_avx:128 intel_avx2:128 intel_avx512:128 intel_turbo:128}
reflow: resources exhausted: requested resources {mem:500.0MiB cpu:1 disk:0B} not satisfiable even by largest available instance type x2iedn.32xlarge with resources {mem:0B cpu:128 disk:250.0GiB intel_avx:128 intel_avx2:128 intel_avx512:128 intel_turbo:128}
$
```

The advertised mem:0B looks suspicious but I have not looked deeper than that.

I tried a few older release builds but they all fail with the same error. If I go back far enough, I get a different error:

```
$ ~/Downloads/reflow1.13.0.darwin.amd64 run hello.rf
infra.Init: provider ec2cluster for type *ec2cluster.Cluster: missing AMI parameter
$
```

I was going to attempt some code fixups but here I encountered yet more trouble: the standard go install workflow does not work:

```
$ ~/sdk/go1.19.3/bin/go install github.com/grailbio/reflow/cmd/reflow@latest
go: downloading github.com/grailbio/reflow v0.0.0-20221206232358-04b01f719b84
go: finding module for package github.com/grailbio/base/s3util
go: finding module for package github.com/grailbio/base/cloud/spotadvisor
go: finding module for package github.com/grailbio/base/cloud/spotfeed
go/pkg/mod/github.com/grailbio/reflow@v0.0.0-20221206232358-04b01f719b84/ec2cluster/ec2cluster.go:33:2: module github.com/grailbio/base@latest found (v0.0.10), but does not contain package github.com/grailbio/base/cloud/spotadvisor
go/pkg/mod/github.com/grailbio/reflow@v0.0.0-20221206232358-04b01f719b84/tool/cost.go:15:2: module github.com/grailbio/base@latest found (v0.0.10), but does not contain package github.com/grailbio/base/cloud/spotfeed
go/pkg/mod/github.com/grailbio/reflow@v0.0.0-20221206232358-04b01f719b84/blob/s3blob/s3blob.go:27:2: module github.com/grailbio/base@latest found (v0.0.10), but does not contain package github.com/grailbio/base/s3util
$
```

My guess is that go.mod is not being kept in sync with the internal Bazel repo...

I eventually managed to get it to build after a series of guesses around package upgrades and some local patching, but by that point I had lost any confidence that my local sandbox bore any resemblance to what upstream uses. Belatedly, I realized I could perhaps have extracted an up-to-date go.mod from the buildinfo metadata embedded in the released binaries, but I had run out of the time I had set aside for this experiment.

Overall, a surprisingly poor experience for a project in its 1.x life phase. It's a shame because the technology seems interesting.

@swami-m
Contributor

swami-m commented May 8, 2023

@masiulaniec Not sure if you are still looking at using reflow, but for the original taskdb problem, I think the following solution might work:

```
> reflow config -marshal > /tmp/reflow_config
> vim /tmp/reflow_config # and add the following line

taskdb: noptaskdb

> reflow run -config /tmp/reflow_config -local hello.rf
```

Perhaps @fialhopm might be able to confirm.
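The manual vim step above can also be scripted. A minimal Go sketch, assuming the marshaled config is plain block-style YAML in which `taskdb` is a top-level key (as "add the following line" implies); `setTopLevel` is a hypothetical helper doing line-based text editing, not YAML parsing. Note that the next comment reports this change alone did not get local mode working.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// setTopLevel returns cfg with the top-level key `key` set to value. Any
// existing entry for key, together with the indented or "- " lines nested
// under it, is removed, and the new entry is appended at the end.
// Assumes a simple block-style layout such as `reflow config -marshal` prints.
func setTopLevel(cfg, key, value string) string {
	var out []string
	skipping := false
	for _, line := range strings.Split(strings.TrimRight(cfg, "\n"), "\n") {
		topLevel := line != "" && line[0] != ' ' && line[0] != '\t' && !strings.HasPrefix(line, "-")
		if topLevel {
			// Start skipping at the old entry; stop at the next top-level key.
			skipping = strings.HasPrefix(line, key+":")
		}
		if !skipping {
			out = append(out, line)
		}
	}
	out = append(out, key+": "+value)
	return strings.Join(out, "\n") + "\n"
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: go run . /tmp/reflow_config")
		os.Exit(2)
	}
	path := os.Args[1]
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	updated := setTopLevel(string(data), "taskdb", "noptaskdb")
	if err := os.WriteFile(path, []byte(updated), 0o644); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```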

@fialhopm
Contributor

Apologies for the very late response.

Unfortunately, specifying noptaskdb appears not to be sufficient to get hello.rf to work in local mode.

If you're using the us-east-1 region, then the following should solve the resources exhausted error:

```
> reflow config -marshal > /tmp/reflow_config
> vim /tmp/reflow_config # and remove the following instance types

  - c6a.32xlarge
  - c6a.48xlarge
  - c6id.32xlarge
  - g5.48xlarge
  - i4i.32xlarge
  - m6a.32xlarge
  - m6a.48xlarge
  - m6id.32xlarge
  - r6a.32xlarge
  - r6a.48xlarge
  - r6i.32xlarge
  - r6id.32xlarge
  - trn1.32xlarge
  - x2idn.32xlarge
  - x2iedn.32xlarge

> reflow -config /tmp/reflow_config run -local hello.rf
```

This will not work for other regions.
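Because the instance types in the list above form a fixed set of names, removing them can be scripted too. A minimal Go sketch, assuming each instance type appears in the marshaled config as a `- name` list item, as in the comment; `dropListItems` and `usEast1Drop` are hypothetical names. It removes a matching item under any key, not only under the instance-type list, and, as the comment says, the list applies only to us-east-1.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// usEast1Drop holds the instance types the comment above says to remove
// for us-east-1.
var usEast1Drop = []string{
	"c6a.32xlarge", "c6a.48xlarge", "c6id.32xlarge", "g5.48xlarge",
	"i4i.32xlarge", "m6a.32xlarge", "m6a.48xlarge", "m6id.32xlarge",
	"r6a.32xlarge", "r6a.48xlarge", "r6i.32xlarge", "r6id.32xlarge",
	"trn1.32xlarge", "x2idn.32xlarge", "x2iedn.32xlarge",
}

// dropListItems removes block-sequence lines ("<indent>- value") whose value
// is in drop; every other line, including a trailing newline, is kept.
func dropListItems(cfg string, drop []string) string {
	skip := make(map[string]bool, len(drop))
	for _, d := range drop {
		skip[d] = true
	}
	var out []string
	for _, line := range strings.Split(cfg, "\n") {
		item := strings.TrimSpace(line)
		if strings.HasPrefix(item, "- ") && skip[strings.TrimSpace(item[2:])] {
			continue
		}
		out = append(out, line)
	}
	return strings.Join(out, "\n")
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: go run . /tmp/reflow_config")
		os.Exit(2)
	}
	path := os.Args[1]
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	updated := dropListItems(string(data), usEast1Drop)
	if err := os.WriteFile(path, []byte(updated), 0o644); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```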

We'll include fixes for both issues in the next release, which will hopefully go out within the next month.
