tips for running on google colab #587

Merged
merged 1 commit into from
Sep 18, 2024

Conversation

Contributor

@sujee sujee commented Sep 13, 2024

Why are these changes needed?

To enable running DPK applications on Google Colab.

Related issue number (if any).

#582

@sujee
Contributor Author

sujee commented Sep 16, 2024

We need to decide where to link to this document from.

Member

@daw3rd daw3rd left a comment

Where should this file be linked from?
Also fdedupe to fedeup, for consistency

@sujee
Contributor Author

sujee commented Sep 16, 2024

Also fdedupe to fedeup, for consistency

You mean this? These are program args for the fuzzy transform.

    # infrastructure
    "fdedup_bucket_cpu": 0.3,
    "fdedup_doc_cpu": 0.3,
    "fdedup_mhash_cpu": 0.3,
    "fdedup_num_doc_actors": 1,
    "fdedup_num_bucket_actors": 1,
    "fdedup_num_minhash_actors": 1,
    "fdedup_num_preprocessors": 1,

@sujee
Contributor Author

sujee commented Sep 16, 2024

Where should this file be linked from?

Maybe we can create a Tips and Troubleshooting section in the main README.

I could use some input on this: @Bytes-Explorer @shahrokhDaijavad

@shahrokhDaijavad
Member

@sujee Other tips and troubleshooting notes are scattered across various doc files in the repo (e.g., the mac.md and memory.md files in the same place you put your google-colab.md file). If we create such a section in the README, it would be better to consolidate all of them into one md file with a separate section for each tip. I think we should do this soon, but for now we should just add a link from a section of the README that I submitted a PR for earlier this morning (PR #593, to be reviewed by Hima). In the new README, I have put your Google Colab example before the local environment setup, and we can add a sentence there saying something like: "Though you won't need them for this simple example, here are some tips for running on Google Colab", and add the link.

@sujee
Contributor Author

sujee commented Sep 16, 2024

@shahrokhDaijavad I like this idea 👍

@Bytes-Explorer
Collaborator

Bytes-Explorer commented Sep 17, 2024 via email

@daw3rd
Member

daw3rd commented Sep 18, 2024

Also fdedupe to fedeup, for consistency

You mean this? These are program args for the fuzzy transform.

    # infrastructure
    "fdedup_bucket_cpu": 0.3,
    "fdedup_doc_cpu": 0.3,
    "fdedup_mhash_cpu": 0.3,
    "fdedup_num_doc_actors": 1,
    "fdedup_num_bucket_actors": 1,
    "fdedup_num_minhash_actors": 1,
    "fdedup_num_preprocessors": 1,

No, I saw doc references to fededupe, which is something new. I think you mean fdedup. Let's not introduce new terms.

Member

@shahrokhDaijavad shahrokhDaijavad left a comment

LGTM

@daw3rd daw3rd merged commit 93b12f0 into IBM:dev Sep 18, 2024
21 checks passed
4 participants