-
Notifications
You must be signed in to change notification settings - Fork 526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrap FileNotFound
exceptions in the finetuning dataloader and convert_text_to_mds
#1246
base: main
Are you sure you want to change the base?
Conversation
@@ -683,6 +686,8 @@ def build_collate_fn( | |||
'timeout': 0, | |||
}) | |||
|
|||
mosaicml_logger = maybe_create_mosaicml_logger() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idk if this is necessary, given that this exception was already logged somewhere. is this caught and logged somewhere else (similar to convert_text_to_mds
)?
@jjanezhang @angel-ruiz7 This is awesome! I was actually just seeing a lot of FileNotFound errors last week. Are there any plans to handle FileNotFound errors that are surfaced when downloading checkpoints? I saw some errors last week due to a bad custom weights path. (To be clear, this is out of scope for this PR, just wanted to get your thoughts 😄) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some feedback, nothing crazy
@@ -354,3 +356,26 @@ class RunTimeoutError(InternalError): | |||
def __init__(self, timeout: int) -> None: | |||
message = f'Run timed out after {timeout} seconds.' | |||
super().__init__(message, timeout=timeout) | |||
|
|||
|
|||
class DatasetMissingFileError(UserError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this error supposed to cover both cases where a shard file is missing AND the index.json metadata file is missing? From the PR description, it sounds like this error is only for a missing/badly specified index.json file. So maybe it's more appropriate to call this DatasetMissingIndexError
instead
) -> None: | ||
message = "Could not find the file '{file_name}' with any of the supported extensions: " | ||
message += ', '.join(supported_extensions) + '.' | ||
message += ' Please check your train / eval data and try again.' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
message += ' Please check your train / eval data and try again.' | |
message += ' Please check your train / eval dataset configuration and try again.' |
Not sure how this will be surfaced or if users will have access to this info, but this is more specific at least
def __init__( | ||
self, | ||
file_name: str, | ||
supported_extensions: Any, # Run into type error when using List[str] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the type error that you get? We do use List[str]
in other exceptions, see here
llm-foundry/llmfoundry/utils/exceptions.py
Line 227 in 9fbfccc
def __init__(self, keys: List[str]) -> None: |
List[str]
here.
@angel-ruiz7 Are you still working on this? |
Creates two exceptions:
DatasetMissingFileError
--> Tells the user that a dataset file could not be found during a failure in the finetuning dataloader wheresplit
/path
were not defined properlyDatasetInvalidFolderError
--> Tells the user that aninput_folder
is invalid and while trying to list objects inconvert_text_to_mds.py
mapi exceptions implemented in https://github.com/databricks-mosaic/mcloud/pull/4088