Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrap FileNotFound exceptions in the finetuning dataloader and convert_text_to_mds #1246

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

angel-ruiz7
Copy link
Contributor

@angel-ruiz7 angel-ruiz7 commented May 31, 2024

Creates two exceptions:

  • DatasetMissingFileError --> Tells the user that a dataset file could not be found during a failure in the finetuning dataloader where split / path were not defined properly
  • DatasetInvalidFolderError --> Tells the user that an input_folder is invalid and while trying to list objects in convert_text_to_mds.py

mapi exceptions implemented in https://github.com/databricks-mosaic/mcloud/pull/4088

@angel-ruiz7 angel-ruiz7 requested review from jjanezhang and removed request for jjanezhang May 31, 2024 16:55
@angel-ruiz7 angel-ruiz7 marked this pull request as draft May 31, 2024 17:00
@@ -683,6 +686,8 @@ def build_collate_fn(
'timeout': 0,
})

mosaicml_logger = maybe_create_mosaicml_logger()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk if this is necessary, given that this exception was already logged somewhere. is this caught and logged somewhere else (similar to convert_text_to_mds)?

@jjanezhang

@angel-ruiz7 angel-ruiz7 marked this pull request as ready for review May 31, 2024 18:39
@irenedea
Copy link
Contributor

irenedea commented Jun 3, 2024

@jjanezhang @angel-ruiz7 This is awesome! I was actually just seeing a lot of FileNotFound errors last week. Are there any plans to handle FileNotFound errors that are surfaced when downloading checkpoints? I saw some errors last week due to a bad custom weights path.

(To be clear, this is out of scope for this PR, just wanted to get your thoughts 😄)

@irenedea irenedea requested a review from a team June 3, 2024 19:46
Copy link
Contributor

@snarayan21 snarayan21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some feedback, nothing crazy

@@ -354,3 +356,26 @@ class RunTimeoutError(InternalError):
def __init__(self, timeout: int) -> None:
message = f'Run timed out after {timeout} seconds.'
super().__init__(message, timeout=timeout)


class DatasetMissingFileError(UserError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this error supposed to cover both cases where a shard file is missing AND the index.json metadata file is missing? From the PR description, it sounds like this error is only for a missing/badly specified index.json file. So maybe it's more appropriate to call this DatasetMissingIndexError instead

) -> None:
message = "Could not find the file '{file_name}' with any of the supported extensions: "
message += ', '.join(supported_extensions) + '.'
message += ' Please check your train / eval data and try again.'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
message += ' Please check your train / eval data and try again.'
message += ' Please check your train / eval dataset configuration and try again.'

Not sure how this will be surfaced or if users will have access to this info, but this is more specific at least

def __init__(
self,
file_name: str,
supported_extensions: Any, # Run into type error when using List[str]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the type error that you get? We do use List[str] in other exceptions, see here

def __init__(self, keys: List[str]) -> None:
for example. Would be great if we can use List[str] here.

@irenedea
Copy link
Contributor

@angel-ruiz7 Are you still working on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants