Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

False positive invalid condition ID. #477

Open
FFroehlich opened this issue Oct 20, 2020 · 6 comments
Open

False positive invalid condition ID. #477

FFroehlich opened this issue Oct 20, 2020 · 6 comments
Assignees

Comments

@FFroehlich
Copy link
Collaborator

FFroehlich commented Oct 20, 2020

Which problem would you like to address? Please describe.
Linting complains about invalid condition IDs. I believe the error is due to the fact that the condition id is numeric only. Ideally the error would be a bit more informative what the actual issue is with the condition ID.

Describe the solution you would like
Change the spec to be more accurate about identifier restrictions or change linting to be more permissive about identifiers.

Describe alternatives you have considered
Don't use numbers in identifiers.

Additional context
Can be checked via check_condition_df(pd.DataFrame({petab.CONDITION_ID: ['0']}).set_index(petab.CONDITION_ID), None).

Not sure how much sense it makes to print line numbers when operating with a DataFrame.

Typehints in check_condition_df are inconsistent with the function signature, i.e., sbml_model is not and optional argument.

@dweindl
Copy link
Member

dweindl commented Oct 21, 2020

Linting complains about invalid condition IDs. I believe the error is due to the fact that the condition id is numeric only.

Correct.

Change the spec to be more accurate about identifier restrictions or change linting to be more permissive about identifiers.

https://petab.readthedocs.io/en/latest/documentation_data_format.html#overview says:

All identifiers must consist only of upper and lower case letters, digits and underscores, and must not start with a digit.

But it seems to be overlooked to easily in the preamble. So maybe better to include that sentence with for every ID column.

Not sure how much sense it makes to print line numbers when operating with a DataFrame.

Dataframe row index preferred?
Not really sure which one is more helpful...

Typehints in check_condition_df are inconsistent with the function signature, i.e., sbml_model is not and optional argument.

👍

@FFroehlich
Copy link
Collaborator Author

Not sure how much sense it makes to print line numbers when operating with a DataFrame.

Dataframe row index preferred?
Not really sure which one is more helpful...

I think printing the regex it needs to match would be more helpful than any line numbers. The row index would be equal to the offending ID here so that doesn't really add anything new. I don't think any further info is necessary and line numbers are misleading for non-file inputs.

@dilpath
Copy link
Member

dilpath commented Oct 21, 2020

Line numbers were added in #467 after a couple of people had issues with whitespace-only lines being added to the end of their PEtab TSV files (possibly after exporting from Excel.. not sure), which then caused errors (e.g. #466).

If line numbers are removed, then a check for whitespace-only lines in TSV files/rows in dataframes might be useful, as the ID in the error message may appear blank.

I guess the line numbers aren't always correct for files either e.g. when using multiple parameter subset files that are combined into a single dataframe.

@FFroehlich
Copy link
Collaborator Author

Line numbers were added in #467 after a couple of people had issues with whitespace-only lines being added to the end of their PEtab TSV files (possibly after exporting from Excel.. not sure), which then caused errors (e.g. #466).

If line numbers are removed, then a check for whitespace-only lines in TSV files/rows in dataframes might be useful, as the ID in the error message may appear blank.

I guess the line numbers aren't always correct for files either e.g. when using multiple parameter subset files that are combined into a single dataframe.

I think when importing from files, it makes sense to print line-numbers, when importing from DataFrames, not so much. Should be possible to parse the input type somewhere.

@dweindl
Copy link
Member

dweindl commented Nov 9, 2020

I guess the line numbers aren't always correct for files either e.g. when using multiple parameter subset files that are combined into a single dataframe.

Good point.

So how shall we handle that? Drop line numbers? Just print the offending row? It's not too helpful for whitespace-only rows, but they should be relatively easy to spot anyways.

dweindl added a commit that referenced this issue Nov 9, 2020
@dilpath
Copy link
Member

dilpath commented Nov 10, 2020

So how shall we handle that? Drop line numbers? Just print the offending row? It's not too helpful for whitespace-only rows, but they should be relatively easy to spot anyways.

Yes, but if pd.isna() then add a hint in the error message to suggest that the user checks for whitespace-only lines, instead of printing the row, should be sufficient.

@dilpath dilpath self-assigned this Nov 10, 2020
@dweindl dweindl mentioned this issue Nov 20, 2020
dweindl added a commit that referenced this issue Nov 20, 2020
Release 0.1.12

* Documentation update:
  * Added SBML2Julia to list of tools supporting PEtab
  * Extended PEtab introduction
  * Tutorial for creating PEtab files
* Minor fix: Default argument for optional 'model' parameter in 
  `petab.lint.check_condition_df`` (#477)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants