MIIC v2.0.1: Preparation of CRAN update submission, group all breaking changes. (#133)

* Opposite edges allowed in true edges

* parseResults optimization for large number of variables

* Remove debug traces

* Opposite true edges not allowed with specific warning

* Static summary structure, change column types, reorder columns

* Rename summary columns with underscores

* Summary and documentation update

* Split proba into p_y2x, p_x2y

* Standardization of exported function names

* Remove uppercase in miic summary

* Remove uppercase in miic orientation probas

* Remove uppercase computeThreePointInfo return value

* Rename ori abbreviations to ort

* Rename all.edges.xx and orientations.prob data frames

* Turn X, Y, Z function parameters into lowercase

* Harmonization of miic object + abbreviated as mo

* Update version to 2.0.1

* Fixes for R checks

* URL check

* NEWS update for CRAN submission

* Spell check

* Check document tags

* Fix documentation for R checks

* Set sign as true NA when 'NA'

* Fix about total run time, forced in secs

* OD review: replace mo, tmo by miic_obj, tmiic_obj

* NEWS review following comment on pull request

* Rename MDL as BIC

* HI review (without description)

* Harmonize is_continuous as parameter

* Rename movavg -> mov_avg

* Shortened ref in text, URL and title added in ref section

* README: S. Affeldt, point to PDF + add supp

* MIIC description review

* CRAN check

* News review

* Add link to News.md in DESCRIPTION
franck-simon authored Sep 13, 2024
1 parent b150287 commit 96c685f
Showing 47 changed files with 2,321 additions and 1,959 deletions.
48 changes: 30 additions & 18 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: miic
Title: Learning Causal or Non-Causal Graphical Models Using Information Theory
Version: 2.0.0
Version: 2.0.1
Authors@R:
c(person(given = "Franck",
family = "Simon",
@@ -44,23 +44,35 @@ Authors@R:
family = "Isambert",
role = "aut",
email = "[email protected]"))
Description: We report an information-theoretic method which learns a large
class of causal or non-causal graphical models from purely observational
data, while including the effects of unobserved latent variables, commonly
found in many datasets. Starting from a complete graph, the method
iteratively removes dispensable edges, by uncovering significant information
contributions from indirect paths, and assesses edge-specific confidences
from randomization of available data. The remaining edges are then oriented
based on the signature of causality in observational data. This approach can
be applied on a wide range of datasets and provide new biological insights
on regulatory networks from single cell expression data, genomic alterations
during tumor development and co-evolving residues in protein structures.
Since the version 2.0, MIIC can in addition process stationary time series
to unveil temporal causal graphs.
Description: MIIC (Multivariate Information-based Inductive Causation) is a
causal discovery method, based on information theory principles, which
learns a large class of causal or non-causal graphical models from purely
observational data, while including the effects of unobserved latent
variables. Starting from a complete graph, the method iteratively removes
dispensable edges, by uncovering significant information contributions from
indirect paths, and assesses edge-specific confidences from randomization
of available data. The remaining edges are then oriented based on the
signature of causality in observational data. The recent more interpretable
MIIC extension (iMIIC) further distinguishes genuine causes from putative
and latent causal effects, while scaling to very large datasets (hundreds
of thousands of samples). Since version 2.0, MIIC also includes a
temporal mode (tMIIC) to learn temporal causal graphs from stationary time
series data. MIIC has been applied to a wide range of biological and
biomedical data, such as single cell gene expression data, genomic
alterations in tumors, live-cell time-lapse imaging data (CausalXtract),
as well as medical records of patients. MIIC brings unique insights based
on causal interpretation and could be used in a broad range of other data
science domains (technology, climatology, economy, ...).
For more information, you can refer to:
Simon et al. eLife, reviewed preprint <doi:10.1101/2024.02.06.579177>,
Cabeli et al. PLoS Comp. Bio. 2020 <doi:10.1371/journal.pcbi.1007866>,
Verny et al. PLoS Comp. Bio. 2017 <doi:10.1371/journal.pcbi.1005662>.
Simon et al., eLife 2024, <doi:10.1101/2024.02.06.579177>,
Ribeiro-Dantas et al., iScience 2024, <doi:10.1016/j.isci.2024.109736>,
Cabeli et al., NeurIPS 2021, <https://why21.causalai.net/papers/WHY21_24.pdf>,
Cabeli et al., Comput. Biol. 2020, <doi:10.1371/journal.pcbi.1007866>,
Li et al., NeurIPS 2019, <https://papers.nips.cc/paper/9573-constraint-based-causal-structure-learning-with-consistent-separating-sets>,
Verny et al., PLoS Comput. Biol. 2017, <doi:10.1371/journal.pcbi.1005662>,
Affeldt et al., UAI 2015, <https://auai.org/uai2015/proceedings/papers/293.pdf>.
Changes since the previous CRAN release (1.5.3) are listed at
<https://github.com/miicTeam/miic_R_package/blob/master/NEWS.md>.
License: GPL (>= 2)
URL: https://github.com/miicTeam/miic_R_package
BugReports: https://github.com/miicTeam/miic_R_package/issues
@@ -79,4 +91,4 @@ LinkingTo:
SystemRequirements: C++14
LazyData: true
Encoding: UTF-8
RoxygenNote: 7.3.1
RoxygenNote: 7.3.2
7 changes: 3 additions & 4 deletions NAMESPACE
@@ -7,11 +7,10 @@ export(computeThreePointInfo)
export(discretizeMDL)
export(discretizeMutual)
export(estimateTemporalDynamic)
export(export)
export(miic)
export(miic.export)
export(miic.write.network.cytoscape)
export(miic.write.style.cytoscape)
export(tmiic.export)
export(writeCytoscapeNetwork)
export(writeCytoscapeStyle)
import(Rcpp)
importFrom(stats,density)
importFrom(stats,sd)
154 changes: 106 additions & 48 deletions NEWS.md
@@ -1,114 +1,172 @@
# Development version

# v2.0.0
# v2.0.1

## Features

- tMIIC version for temporal causal discovery on stationary time series:
new mode of MIIC to reconstruct networks from temporal stationary datasets.
[Simon et al., eLife, reviewed preprint]
(https://www.biorxiv.org/content/10.1101/2024.02.06.579177v1.abstract)
* Release to CRAN.

## Fixes and improvements

* Faster post-processing in R for datasets with large number of variables.

## Breaking changes

Consolidating long-pending breaking changes:

* Harmonization of exported function names using `camel case`.

* Harmonization of parameters and return values using `snake case`.

* Harmonization of abbreviations.

All the documentation has been updated accordingly. If you encounter any issue
upgrading to this version, please consult the help of the relevant function
for more information about its interface.

For the core `miic()` function, the main breaking changes in the interface
(when upgrading from the 1.5.3 release on CRAN) are:

in the parameters:

* `cplx`: renaming of the complexity term `"mdl"` &rarr; `"bic"`

* `ori_proba_ratio` &rarr; `ort_proba_ratio`

in the miic object returned:

* `all.edges.summary` &rarr; `summary`
  * `Nxy_ai` &rarr; `n_xy_ai`
  * `log_confidence` &rarr; `info_shifted`
  * `infOrt` &rarr; `ort_inferred`
  * `trueOrt` &rarr; `ort_ground_truth`
  * `isOrtOk` &rarr; `is_inference_correct`
  * `isCausal` &rarr; `is_causal`
  * `proba` &rarr; `p_y2x`, `p_x2y`
  * `consensus` &rarr; `ort_consensus`

* `orientations.prob` &rarr; `triples`
  * `NI3` &rarr; `ni3`
  * `Error` &rarr; `conflict`

Also compared to 1.5.3, another important change in the behavior of `miic()`
is that, by default, `miic()` no longer propagates orientations
and allows the discovery of latent variables during the orientation step.
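
For code written against 1.5.3, the renames listed above could be applied mechanically. The following is a minimal sketch in base R, not part of the package: `rename_columns()` and `translate_args()` are hypothetical helpers, the toy data frame only stands in for an old-style edge summary, and the split of names between the edge summary and the triples table is assumed from the grouping of the list. `proba` is deliberately left out, since it is split into two columns rather than renamed.

```r
# Old (1.5.3) -> new (2.0.1) column names, copied from the list above.
# `proba` is omitted: it becomes `p_y2x` and `p_x2y`, not a 1:1 rename.
summary_renames <- c(
  Nxy_ai         = "n_xy_ai",
  log_confidence = "info_shifted",
  infOrt         = "ort_inferred",
  trueOrt        = "ort_ground_truth",
  isOrtOk        = "is_inference_correct",
  isCausal       = "is_causal",
  consensus      = "ort_consensus"
)
triples_renames <- c(NI3 = "ni3", Error = "conflict")

# Hypothetical helper: rename the columns of `df` found in `renames`,
# leaving every other column untouched.
rename_columns <- function(df, renames) {
  hits <- names(df) %in% names(renames)
  names(df)[hits] <- unname(renames[names(df)[hits]])
  df
}

# Hypothetical helper: translate a named list of old miic() arguments.
translate_args <- function(args) {
  if (identical(args[["cplx"]], "mdl")) {
    args[["cplx"]] <- "bic"
  }
  if (!is.null(args[["ori_proba_ratio"]])) {
    args[["ort_proba_ratio"]] <- args[["ori_proba_ratio"]]
    args[["ori_proba_ratio"]] <- NULL
  }
  args
}

# Toy stand-ins for old-style results and arguments.
old_summary <- data.frame(x = "A", y = "B", infOrt = 2L, isCausal = NA,
                          stringsAsFactors = FALSE)
old_triples <- data.frame(NI3 = -0.5, Error = 0L)
new_summary <- rename_columns(old_summary, summary_renames)
new_triples <- rename_columns(old_triples, triples_renames)
new_args    <- translate_args(list(cplx = "mdl", ori_proba_ratio = 0.5))
```

The helpers only touch names that appear in the mapping, so they can be run safely on objects that already use the new names.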

## Known issues

- A (very) large number of contributors can lead to a memory fault.
Initial fix has been reverted due to side effects.
* Conditioning on a (very) large number of contributors can lead to a memory
fault.

# v2.0.0

## Features

* tMIIC version for temporal causal discovery on stationary time series:
new mode of `miic()` to reconstruct networks from temporal stationary
datasets ([Simon et al., eLife 2024](https://www.biorxiv.org/content/10.1101/2024.02.06.579177v1.abstract)).
The temporal mode of `miic()` is not activated by default and can be enabled by
setting the newly added parameter `mode` to `"TS"` (Temporal Stationary).
The temporal mode can be tuned through a set of new parameters:
`max_nodes`, `n_layers`, `delta_t`, `mov_avg` and `keep_max_data`.

# v1.8.1

## Fixes and improvements

- The discretization of continuous variables has been modified when dealing
* The discretization of continuous variables has been improved when dealing
with variables having a large number of identical values.

- Fix for memory overflow on shared memory space.
* Fix for memory overflow on shared memory space.

# v1.8.0

## Features

- Addition of the 'is consequence' prior knowledge. Consequence variables are
excluded from the possible contributors, edges between consequences are
ignored and edges between a non consequence and a consequence are pre-oriented
toward the consequence.
* Addition of the 'is consequence' prior knowledge. Consequence variables are
excluded from the possible contributors, edges between consequences are
ignored and edges between a non consequence and a consequence are pre-oriented
toward the consequence.
Information about consequence variables can be provided to `miic()`
in the `state_order`, by supplying an `is_consequence` column.

# v1.7.0

## Features

- iMIIC version introducing genuine vs putative causes, contextual variables
and multiple enhancements to deal with very large datasets.
[Ribeiro-Dantas et al., iScience 2024]
(https://arxiv.org/abs/2303.06423)
* iMIIC version introducing contextual variables, genuine vs putative causes
and multiple enhancements to deal with very large datasets ([Ribeiro-Dantas et al., iScience 2024](https://doi.org/10.1016/j.isci.2024.109736)).
Information on contextual variables can be provided to `miic()`
in the `state_order`, by supplying an `is_contextual` column, and
genuine vs putative causes can be tuned by the newly added parameter
`ort_consensus_ratio`.

# v1.6.0

## Features

- Enhancement of orientations using mutual information supremum principle for
finite datasets.
[Cabeli et al., Why21 at NeurIPS 2021]
(http://kinefold.curie.fr/isambertlab/PAPERS/cabeli_Why21-NeurIPS2021.pdf)
* Enhancement of orientations using mutual information supremum principle for
finite datasets ([Cabeli et al., Why21 at NeurIPS 2021](http://kinefold.curie.fr/isambertlab/PAPERS/cabeli_Why21-NeurIPS2021.pdf)).
The use of enhanced orientations is controlled by the newly added parameter
`negative_info` of `miic()` and is activated by default.

- By default, MIIC does not propagate orientations anymore
* By default, `miic()` does not propagate orientations anymore
and allows latent variables discovery during orientation step.

# v1.5.3

## Features

- Release to CRAN
* Release to CRAN

# v1.5.2

## Fixes and improvements
- Further refactoring of the C++ code for the computation of information.
* Further refactoring of the C++ code for the computation of information.

- Fix minor bugs in the continuous computation.
* Fix minor bugs in the continuous computation.

- Fix incompatibility with older versions of GCC (std::align).
* Fix incompatibility with older versions of GCC (std::align).

# v1.5.1

## Fixes and improvements
- Fix various bugs in the computation of information in the presence of NA
* Fix various bugs in the computation of information in the presence of NA
values in the dataset.

- An overhaul of the C++ code base, better memory management, computation time
* An overhaul of the C++ code base, better memory management, computation time
and code readability.

# v1.5.0

## Features
- Add a column `consensus` to the reconstructed graph's edges summary associated
* Add a column `consensus` to the reconstructed graph's edges summary associated
with the option `consistent`, and a new parameter `consensus_threshold`
accordingly.

- Add a parameter `ori_proba_ratio` to have more control on the orientation of
* Add a parameter `ori_proba_ratio` to have more control on the orientation of
edges.

## Fixes and improvements
- Faster post processing in R.
* Faster post processing in R.

- Rework plot functionality.
* Rework plot functionality.

- Fix a bug in the orientation part about the log score.
* Fix a bug in the orientation part about the log score.

- Refactor of the C++ code base (orientation).
* Refactor of the C++ code base (orientation).

# v1.4.2

## Fixes and improvements
- Various fixes of memory leaks and ambiguous function calls (at least for all
* Various fixes of memory leaks and ambiguous function calls (at least for all
that appear in CRAN check).

- Refactor of the C++ code base (confidence cut).
* Refactor of the C++ code base (confidence cut).

## Known issues
- Error when running the cosmicCancer example on CRAN's Solaris system.
* Error when running the cosmicCancer example on CRAN's Solaris system.

## Miscellaneous
- Move from BitBucket to GitHub, the repo is now public.
* Move from BitBucket to GitHub, the repo is now public.

# v1.4.1

@@ -118,30 +176,30 @@ CRAN).
# v1.4.0

## Incompatible changes
- Standardize the API naming convention: `snake_case` for parameters and
* Standardize the API naming convention: `snake_case` for parameters and
`camelCase` for functions. This should have led to a major version increment to
v2.0.0 given the previous version on CRAN is v1.0.3. But v1.0.3 and earlier
versions were not properly maintained and versioned under a version control
system (so we actually forgot to take them into consideration when releasing
this version).

## Features
- The method now works with continuous variables (solely or mixed with discrete
* The method now works with continuous variables (solely or mixed with discrete
variables), thanks to the discretization method as described in
[Cabeli et al., PLoS Comp. Bio. 2020](https://doi.org/10.1371/journal.pcbi.1007866).
[Cabeli et al., PLoS Comput. Biol. 2020](https://doi.org/10.1371/journal.pcbi.1007866).

- Add an option `consistent` to improve the reconstructed graph's
* Add an option `consistent` to improve the reconstructed graph's
interpretability based on schemes as described in
[Li et al., NeurIPS 2019](https://papers.nips.cc/paper/9573-constraint-based-causal-structure-learning-with-consistent-separating-sets).

## Fixes and improvements
- Various fixes of memory leaks and typos.
* Various fixes of memory leaks and typos.

- Major refactoring of the old C++ code base (still WIP) to improve readability
* Major refactoring of the old C++ code base (still WIP) to improve readability
and flexibility, and to enforce proper coding style and documentation.

- Enforce proper coding style for the R code base.
* Enforce proper coding style for the R code base.

## Known issues
- Still have some memory leaks and CRAN check errors and notes on certain
* Still have some memory leaks and CRAN check errors and notes on certain
platforms.