
Add PEtab 2.0 draft #554

Open: wants to merge 15 commits into base: main

dweindl
Member

@dweindl dweindl commented Mar 10, 2023

Adds the PEtab v2 specification draft to the main documentation.

👀 https://petab--554.org.readthedocs.build/en/554/v2/documentation_data_format.html

@dweindl
Member Author

dweindl commented Jun 25, 2024

  • Add a section summarizing the changes from v1 to v2

@dweindl
Member Author

dweindl commented Jun 26, 2024

For now, the changes to doc/documentation_data_format.rst and the schema are good to see the diff, but

  • we will want to have both the PEtab v1 and the v2 specs available, so this needs to be re-organized

@dweindl
Member Author

dweindl commented Jul 3, 2024

> For now, the changes to doc/documentation_data_format.rst and the schema are good to see the diff, but
>
> * [ ] we will want to have both the PEtab v1 and the v2 specs available, so this needs to be re-organized

Unless there are any objections, I will separate the v1 and v2 specs after merging #538 and then merge this PR to main. The main RTD page will then get a separate v2-draft section. This will also make v2 development more visible.

FFroehlich and others added 12 commits July 3, 2024 18:51
* extract all changes from previous

* fixup

* allow hyphens in extension names

* fixup hyphens

* only require one toolbox that implements extension

* specify how to work with multiple PEtab problems

* specify we do not require a quorum number of votes

* allow test cases to be provided by the extension library

* Apply suggestions from code review

Co-authored-by: Daniel Weindl <[email protected]>

PEtab extensions were introduced in #537. We should be able to distinguish between required extensions, i.e., those that modify the parameter estimation problem as such, and optional extensions, i.e., those that only add optional information (e.g., annotations, info for visualization, ...). If a tool does not know about a certain optional extension, the extension can safely be ignored during import; if it does not know about a required extension, the import should fail.

This PR adds a `required` attribute to extensions in the yaml file to indicate whether they are required for the mathematical interpretation of the PEtab problem.

Resolves #544
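
The import behaviour described above can be sketched in a few lines. This is only an illustration, not the API of any PEtab library: the layout of the `extensions` mapping (extension name to a dict with a `required` flag), the function name `check_extensions`, the extension names, and the choice to treat a missing flag as "required" are all assumptions made for this example.

```python
def check_extensions(extensions, supported):
    """Return the names of unsupported extensions that can be ignored.

    `extensions` maps extension name -> dict with a boolean `required` key
    (hypothetical layout). `supported` is the set of extension names the
    importing tool implements. Raises ValueError for an unsupported
    required extension.
    """
    ignored = []
    for name, spec in extensions.items():
        if name in supported:
            continue
        # Assumption: a missing `required` flag is treated conservatively.
        if spec.get("required", True):
            raise ValueError(f"Unsupported required PEtab extension: {name}")
        ignored.append(name)
    return ignored


# Hypothetical extension entries, as they might look after loading the YAML file.
extensions = {
    "some_visualization_ext": {"version": "0.1", "required": False},
    "some_model_changing_ext": {"version": "1.0", "required": True},
}
```

A tool that implements only `some_model_changing_ext` would import this problem and skip the visualization extension; a tool that implements neither would stop at the required one.
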
Previously, the math expression syntax wasn't specified. This was very problematic, because different libraries and programming languages have different names for the same functions, and more importantly, differ in operator precedence.
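
To make the precedence problem concrete, here is how one language (Python, chosen only as an example; the expressions are not taken from the PEtab specification) reads two short formulas that many mathematical tools would interpret differently:

```python
# In Python, `**` binds tighter than unary minus, so this is -(2 ** 2) = -4.
neg_square = -2 ** 2

# In Python, `^` is bitwise XOR rather than exponentiation, so this is 1, not 8.
xor_result = 2 ^ 3

# Explicit parentheses remove the ambiguity: this is 4.
explicit = (-2) ** 2

print(neg_square, xor_result, explicit)
```

A format that leaves the syntax unspecified cannot say which reading of such an expression is intended.
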


Co-authored-by: Dilan Pathirana <[email protected]>
Co-authored-by: dilpath <[email protected]>
…562)

Following up on #543 and the discussion during the last PEtab editor meeting:
There was general agreement to allow observable IDs in the `noiseFormula` column of the observables table.

Closes #543.
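
As a rough sketch of what this permits, the snippet below builds a tiny observables table in which one `noiseFormula` refers to its own observable, and lists which observable IDs each noise formula uses. The table contents, the helper name `observables_in_noise_formulas`, and the simple regex-based symbol extraction are assumptions for illustration; a real tool would parse the formulas with a proper math parser.

```python
import csv
import io
import re

# Hypothetical observables table (tab-separated).
OBSERVABLES_TSV = (
    "observableId\tobservableFormula\tnoiseFormula\n"
    "obs_a\tscaling_a * A\t0.1 * obs_a\n"  # noise proportional to the observable
    "obs_b\tB\tsigma_b\n"                  # noise given by a parameter
)


def observables_in_noise_formulas(tsv_text):
    """Map each observable ID to the observable IDs used in its noiseFormula."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    observable_ids = {row["observableId"] for row in rows}
    result = {}
    for row in rows:
        # Naive symbol extraction: identifiers made of letters, digits, underscores.
        symbols = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", row["noiseFormula"]))
        result[row["observableId"]] = sorted(symbols & observable_ids)
    return result
```
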
# Motivation

There are a number of formats for specifying models in systems biology, each with their specific strengths and weaknesses. PEtab version 1.0.0 only allows Systems Biology Markup Language (SBML) models. While SBML is supported by a large number of tools, there are good reasons to use other formats. For example, rule-based model formats (e.g., BioNetGenLanguage) permit more abstract and compact specification of models based on rules, which are generalisations of reactions. Therefore, and based on user request (#436), we propose to lift PEtab’s restriction to SBML models and allow arbitrary model formats.

# Proposed changes

* Changes to the PEtab YAML file:
  * Change `sbml_files` to `models`
  * `models` entries will be model IDs (following the existing conventions for PEtab IDs) mapping to:
    * `location`: path / URL to the model
    * `language`: model format
      Initial set of model format identifiers (to be extended as needed):
      * SBML: `sbml`
      * CellML: `cellml`
      * BNGL: `bngl`
      * PySB: `pysb`
  * An additional entry for mapping tables (see below) is added

  Example:

  **Before:**
  ```yaml
  format_version: 1
  parameter_file: parameters.tsv
  problems:
  - condition_files:
    - conditions.tsv
    measurement_files:
    - measurements.tsv
    observable_files:
    - observables.tsv
    sbml_files:
    - model1.xml
  ```

  **After:**
  ```yaml
  format_version: 2.0.0
  parameter_file: parameters.tsv
  problems:
  - condition_files:
    - conditions.tsv
    measurement_files:
    - measurements.tsv
    observable_files:
    - observables.tsv
    mapping_file: mappings.tsv # optional 
    models:
      id_for_model1:
        location: model1.xml
        language: sbml
  ```



* Changes to the format of existing tables/files:
  * Condition/Observable/Parameter Table
    All symbols that previously referenced the ID of SBML entities, such as parameter IDs or compartment IDs, now refer to (globally unique) named entities in the model, such as parameters, observables, or expressions. For example, condition table columns may correspond to parameters, states, or species of the referenced model.
    For species, assignments in the condition table set the initial value at the beginning of the simulation for that condition, potentially replacing the initialization from preequilibration. For all other entities, values are statically replaced at all time points. For entities that assign values to other entities, such as SBML AssignmentRules, the value of the target of that rule is statically replaced at all time points.
* Additional files
  * Mapping Table: 
    Mapping PEtab entity IDs to entity IDs in the model. This optional file may be used to reference model entities in PEtab files where the ID in the model would not be a valid identifier in PEtab (e.g., due to containing blanks, dots, or other special characters).
    The TSV file has two mandatory columns: `petabEntityId` and `modelEntityId`. Additional columns are allowed. Each `modelEntityId` must be a unique identifier in the model. The mapping table must not map a `modelEntityId` to a `petabEntityId` that is also defined in any other part of the PEtab problem. A `modelEntityId` may not refer to a `petabEntityId`, including those defined in the mapping table. `petabEntityId`s defined in the mapping table may be referenced in condition, measurement, parameter and observable tables, but cannot be referenced in the model itself.
    For example, in SBML, local parameters may be referenced as `$reactionId.$localParameterId`, which are not valid PEtab IDs as they contain a `.` character. Similarly, this table may be used to reference specific species in a BNGL model, whose names may contain many unsupported characters such as `,`, `(` or `.`. However, please note that IDs must exactly match the species names in the BNGL-generated network file; no pattern matching will be performed.
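
Two of the mapping-table rules above can be checked mechanically, as in the following sketch. It is not part of any PEtab library: the function name `check_mapping_table`, the example rows, and the ID pattern (letters, digits and underscores, not starting with a digit, which is my reading of the existing PEtab ID conventions) are assumptions. Checks that need the model itself, such as whether each `modelEntityId` exists in the model, are left out.

```python
import csv
import io
import re

# Assumed PEtab ID convention: letters, digits, underscores; no leading digit.
PETAB_ID = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

# Hypothetical mapping table (tab-separated).
MAPPING_TSV = (
    "petabEntityId\tmodelEntityId\n"
    "r1_k\treaction1.k\n"
    "bad.id\tA(b,c)\n"
    "obs_a\tsome.entity\n"
)


def check_mapping_table(tsv_text, ids_defined_elsewhere):
    """Return a list of problems found in a mapping table (empty if none)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = {"petabEntityId", "modelEntityId"} - set(reader.fieldnames or [])
    if missing:
        return [f"missing mandatory column(s): {sorted(missing)}"]
    problems = []
    for row in reader:
        petab_id = row["petabEntityId"]
        if not PETAB_ID.match(petab_id):
            problems.append(f"{petab_id!r} is not a valid PEtab ID")
        if petab_id in ids_defined_elsewhere:
            problems.append(f"{petab_id!r} is already defined elsewhere in the problem")
    return problems
```

Here `ids_defined_elsewhere` stands for the IDs collected from the other PEtab tables, for example observable and parameter IDs.
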

# Implications

* Tools need to check the model format and provide an informative message if the given format cannot be handled
* Validators will skip model-dependent validation when encountering unknown model types; ideally, a plugin mechanism would provide validation for them
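
The first implication could look roughly like the following in an importing tool. The sketch assumes the problem configuration has already been loaded into a plain dict shaped like the "After" YAML example; the function name `check_model_languages`, the second model entry, and the set of supported languages are hypothetical.

```python
# Hypothetical configuration, mirroring the proposed YAML layout.
CONFIG = {
    "format_version": "2.0.0",
    "parameter_file": "parameters.tsv",
    "problems": [
        {
            "models": {
                "id_for_model1": {"location": "model1.xml", "language": "sbml"},
                "id_for_model2": {"location": "model2.bngl", "language": "bngl"},
            },
        }
    ],
}


def check_model_languages(config, supported):
    """Return one informative message per model whose language is unsupported."""
    messages = []
    for problem in config.get("problems", []):
        for model_id, model in problem.get("models", {}).items():
            language = model.get("language")
            if language not in supported:
                messages.append(
                    f"Model '{model_id}' ({model.get('location')}) uses language "
                    f"'{language}', which this tool cannot handle. "
                    f"Supported: {', '.join(sorted(supported))}."
                )
    return messages


for message in check_model_languages(CONFIG, supported={"sbml"}):
    print(message)
```

Returning messages instead of raising on the first problem lets a tool report every unsupported model at once; whether to abort afterwards is left to the caller.
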

--- 

Co-authored by @FFroehlich @fbergmann. Also thanks to everybody participating in these discussions during the last COMBINE meeting.

---------



Co-authored-by: FFroehlich <[email protected]>
Co-authored-by: Dilan Pathirana <[email protected]>
Co-authored-by: Frank T. Bergmann <[email protected]>
@dweindl dweindl changed the title PEtab 2.0 draft Add PEtab 2.0 draft Jul 3, 2024
@dweindl dweindl marked this pull request as ready for review July 3, 2024 17:02
@dweindl dweindl requested a review from a team as a code owner July 3, 2024 17:02
@dweindl dweindl requested a review from dilpath July 8, 2024 08:33
Member

@dilpath dilpath left a comment

Looks good 🚀

I did not look much at parts of doc/v2/documentation_data_format.rst that were probably copied from v1.

Comment on lines +779 to +783
Additional columns, such as ``Color``, etc. may be specified. Extensions
that define operations on multiple PEtab problems need to employ a single
PEtab YAML file as entrypoint to the analysis. This PEtab file may leave all
fields specifying files empty and reference the other PEtab problems in the
extension specific fields.
Member

Is this useful? PEtab Select currently doesn't do this, and I don't see any benefits from doing this. It would just add an additional file to PEtab Select that duplicates information in the Model Space table.

Member

Or by "operations on multiple PEtab problems", do you mean rather, estimating parameters across multiple PEtab problems? Is there a use case? If not, it could be removed from the spec...

Member Author

It's rather about the petab-select case. But I don't think it makes sense to consider petab-select an extension in the sense used here. It just builds on top of PEtab problems, but doesn't really change the interpretation of any specific PEtab problem. I think it could be removed, but maybe that's a separate discussion.
I think the main point here was that for a PEtab problem, we always want to have a yaml file that lists the extensions.

Member

I agree re: putting extensions that modify a PEtab problem, into the PEtab YAML. Since I don't yet see a use case for "Extensions that define operations on multiple PEtab problems", I don't understand/can't review this part of the spec.


3 participants