MIIC v2.0.1: Preparation of CRAN update submission, group all breaking changes. (#133)

* Opposite edges allowed in true edges
* parseResults optimization for large number of variables
* Remove debug traces
* Opposite true edges not allowed with specific warning
* Static summary structure, change column types, reorder columns
* Rename summary columns with underscores
* Summary and documentation update
* Split proba into p_y2x, p_x2y
* Standardization of exported function names
* Remove uppercase in miic summary
* Remove uppercase in miic orientation probas
* Remove uppercase computeThreePointInfo return value
* Rename ori abbreviations into ort
* Rename all.edges.xx and orientations.prob data frames
* Turn X, Y, Z function parameters into lowercase
* Harmonization of miic object + abbreviated as mo
* Update version to 2.0.1
* Fixes for R checks
* URL check
* NEWS update for CRAN submission
* Spell check
* Check document tags
* Fix documentation for R checks
* Set sign as true NA when 'NA'
* Fix about total run time, forced in secs
* OD review: replace mo, tmo by miic_obj, tmiic_obj
* NEWS review following comment on pull request
* Rename MDL as BIC
* HI review (without description)
* Harmonize is_continuous as parameter
* Rename movavg -> mov_avg
* Shortened ref in text, URL and title added in ref section
* README: S. Affeldt, point to PDF + add supp
* MIIC description review
* CRAN check
* News review
* Add link to News.md in DESCRIPTION
miicTeam · Sep 13, 2024 · 96c685f
1 parent b150287

Showing 47 changed files with 2,321 additions and 1,959 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: miic
 Title: Learning Causal or Non-Causal Graphical Models Using Information Theory
-Version: 2.0.0
+Version: 2.0.1
 Authors@R:
     c(person(given = "Franck",
              family = "Simon",
@@ -44,23 +44,35 @@ Authors@R:
              family = "Isambert",
              role = "aut",
              email = "[email protected]"))
-Description: We report an information-theoretic method which learns a large
-    class of causal or non-causal graphical models from purely observational
-    data, while including the effects of unobserved latent variables, commonly
-    found in many datasets. Starting from a complete graph, the method
-    iteratively removes dispensable edges, by uncovering significant information
-    contributions from indirect paths, and assesses edge-specific confidences
-    from randomization of available data. The remaining edges are then oriented
-    based on the signature of causality in observational data. This approach can
-    be applied on a wide range of datasets and provide new biological insights
-    on regulatory networks from single cell expression data, genomic alterations
-    during tumor development and co-evolving residues in protein structures.
-    Since the version 2.0, MIIC can in addition process stationary time series 
-    to unveil temporal causal graphs.
+Description: MIIC (Multivariate Information-based Inductive Causation) is a 
+    causal discovery method, based on information theory principles, which 
+    learns a large class of causal or non-causal graphical models from purely
+    observational data, while including the effects of unobserved latent 
+    variables. Starting from a complete graph, the method iteratively removes 
+    dispensable edges, by uncovering significant information contributions from 
+    indirect paths, and assesses edge-specific confidences from randomization 
+    of available data. The remaining edges are then oriented based on the 
+    signature of causality in observational data. The recent, more
+    interpretable MIIC extension (iMIIC) further distinguishes genuine causes
+    from putative and latent causal effects, while scaling to very large
+    datasets (hundreds of thousands of samples). Since version 2.0, MIIC also
+    includes a temporal mode (tMIIC) to learn temporal causal graphs from
+    stationary time series data. MIIC has been applied to a wide range of
+    biological and biomedical data, such as single cell gene expression data,
+    genomic alterations in tumors, live-cell time-lapse imaging data
+    (CausalXtract), as well as medical records of patients. MIIC brings unique
+    insights based on causal interpretation and could be used in a broad range
+    of other data science domains (technology, climatology, economics, ...).
     For more information, you can refer to:
-    Simon et al. eLife, reviewed preprint <doi:10.1101/2024.02.06.579177>, 
-    Cabeli et al. PLoS Comp. Bio. 2020 <doi:10.1371/journal.pcbi.1007866>,
-    Verny et al. PLoS Comp. Bio. 2017 <doi:10.1371/journal.pcbi.1005662>.
+    Simon et al., eLife 2024, <doi:10.1101/2024.02.06.579177>,
+    Ribeiro-Dantas et al., iScience 2024, <doi:10.1016/j.isci.2024.109736>,
+    Cabeli et al., NeurIPS 2021, <https://why21.causalai.net/papers/WHY21_24.pdf>,
+    Cabeli et al., PLoS Comput. Biol. 2020, <doi:10.1371/journal.pcbi.1007866>,
+    Li et al., NeurIPS 2019, <https://papers.nips.cc/paper/9573-constraint-based-causal-structure-learning-with-consistent-separating-sets>,
+    Verny et al., PLoS Comput. Biol. 2017, <doi:10.1371/journal.pcbi.1005662>,
+    Affeldt et al., UAI 2015, <https://auai.org/uai2015/proceedings/papers/293.pdf>.
+    Changes since the previous 1.5.3 release on CRAN are listed at
+    <https://github.com/miicTeam/miic_R_package/blob/master/NEWS.md>.
 License: GPL (>= 2)
 URL: https://github.com/miicTeam/miic_R_package
 BugReports: https://github.com/miicTeam/miic_R_package/issues
@@ -79,4 +91,4 @@ LinkingTo:
 SystemRequirements: C++14
 LazyData: true
 Encoding: UTF-8
-RoxygenNote: 7.3.1
+RoxygenNote: 7.3.2
diff --git a/NAMESPACE b/NAMESPACE
@@ -7,11 +7,10 @@ export(computeThreePointInfo)
 export(discretizeMDL)
 export(discretizeMutual)
 export(estimateTemporalDynamic)
+export(export)
 export(miic)
-export(miic.export)
-export(miic.write.network.cytoscape)
-export(miic.write.style.cytoscape)
-export(tmiic.export)
+export(writeCytoscapeNetwork)
+export(writeCytoscapeStyle)
 import(Rcpp)
 importFrom(stats,density)
 importFrom(stats,sd)

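For users migrating downstream scripts from the 1.5.3 release, the renames in the NAMESPACE diff above and in the v2.0.1 NEWS.md entry can be captured as lookup tables. The Python sketch below is hypothetical and not part of miic: the mapping tables are copied from this commit, while the helper names `rename_columns` and `rename_r_source`, and the idea of post-processing exported tables or R script text, are assumptions. It only covers 1:1 renames; the `proba` split into `p_y2x` and `p_x2y`, the `cplx` value change from `"mdl"` to `"bic"`, and any renamed function arguments still need to be checked by hand against the help pages.

```python
import re

# Hypothetical migration helper (not part of miic). The mappings below are
# copied from the v2.0.1 NAMESPACE diff and NEWS.md entry.

# Legacy R names (exported functions, returned list elements, one parameter)
# and their v2.0.1 replacements.
R_NAME_RENAMES = {
    "miic.write.network.cytoscape": "writeCytoscapeNetwork",
    "miic.write.style.cytoscape": "writeCytoscapeStyle",
    "tmiic.export": "export",
    "miic.export": "export",
    "all.edges.summary": "summary",
    "orientations.prob": "triples",
    "ori_proba_ratio": "ort_proba_ratio",
}

# Edge summary columns (`all.edges.summary` became `summary`).
# `proba` is deliberately absent: it was split into `p_y2x` and `p_x2y`.
SUMMARY_COLUMN_RENAMES = {
    "Nxy_ai": "n_xy_ai",
    "log_confidence": "info_shifted",
    "infOrt": "ort_inferred",
    "trueOrt": "ort_ground_truth",
    "isOrtOk": "is_inference_correct",
    "isCausal": "is_causal",
    "consensus": "ort_consensus",
}

# Triples table columns (`orientations.prob` became `triples`).
TRIPLES_COLUMN_RENAMES = {"NI3": "ni3", "Error": "conflict"}


def rename_columns(header, mapping):
    """Return a copy of a table header with legacy column names replaced."""
    return [mapping.get(name, name) for name in header]


def rename_r_source(r_source):
    """Replace legacy identifiers in R script text.

    The lookarounds require that a name is not preceded or followed by a word
    character or a dot, so `miic.export` never matches inside `tmiic.export`.
    """
    for old, new in R_NAME_RENAMES.items():
        pattern = r"(?<![\w.])" + re.escape(old) + r"(?![\w.])"
        r_source = re.sub(pattern, new, r_source)
    return r_source
```

The dot is treated as part of an identifier because R allows it in names, which is why a plain substring replacement would not be safe here.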
diff --git a/NEWS.md b/NEWS.md
@@ -1,114 +1,172 @@
-# Development version
-
-# v2.0.0
+# v2.0.1
 
 ## Features
 
-- tMIIC version for temporal causal discovery on stationary time series: 
-  new mode of MIIC to reconstruct networks from temporal stationary datasets.
-  [Simon et al., eLife, reviewed preprint]
-  (https://www.biorxiv.org/content/10.1101/2024.02.06.579177v1.abstract)
+* Release to CRAN.
+
+## Fixes and improvements
+
+* Faster post-processing in R for datasets with a large number of variables.
+
+## Breaking changes
+
+Consolidating long-pending breaking changes:
+
+* Harmonization of exported function names using `camelCase`.
+
+* Harmonization of parameters and return values using `snake_case`.
+
+* Harmonization of abbreviations.
+
+All the documentation has been updated accordingly. If you encounter any issue
+when upgrading to this version, please consult the help page of the relevant
+function for more information about its interface.
+
+For the core `miic()` function, the main breaking changes in the interface
+(when upgrading from the 1.5.3 release on CRAN) are:
+
+In the parameters:
+
+* `cplx`: renaming of the complexity term `"mdl"` &rarr; `"bic"`
+
+* `ori_proba_ratio` &rarr; `ort_proba_ratio`  
+
+In the returned miic object:
+
+* `all.edges.summary` &rarr; `summary`
+  * `Nxy_ai` &rarr; `n_xy_ai`
+  * `log_confidence` &rarr; `info_shifted`
+  * `infOrt` &rarr; `ort_inferred`
+  * `trueOrt` &rarr; `ort_ground_truth`
+  * `isOrtOk` &rarr; `is_inference_correct`
+  * `isCausal` &rarr; `is_causal`
+  * `proba` &rarr; `p_y2x`, `p_x2y`
+  * `consensus` &rarr; `ort_consensus`
+
+* `orientations.prob` &rarr; `triples`
+  * `NI3` &rarr; `ni3`
+  * `Error` &rarr; `conflict`
+
+Another important change in the behavior of `miic()` since 1.5.3 is that,
+by default, it no longer propagates orientations and allows latent variable
+discovery during the orientation step.
 
 ## Known issues
 
-- A (very) large number of contributors can lead to a memory fault.
-  Initial fix has been reverted due to side effects.
+* Conditioning on a (very) large number of contributors can lead to a memory 
+  fault.
 
+# v2.0.0
+
+## Features
+
+* tMIIC version for temporal causal discovery on stationary time series:
+  new mode of `miic()` to reconstruct networks from temporal stationary 
+  datasets ([Simon et al., eLife 2024](https://www.biorxiv.org/content/10.1101/2024.02.06.579177v1.abstract)).  
+  The temporal mode of `miic()` is not activated by default and can be enabled by
+  setting the newly added parameter `mode` to `"TS"` (Temporal Stationary).
+  A tuning of the temporal mode is possible through a set of new parameters:
+  `max_nodes`, `n_layers`, `delta_t`, `mov_avg` and `keep_max_data`.
+
 # v1.8.1
 
 ## Fixes and improvements
 
-- The discretization of continuous variables has been modified when dealing 
+* The discretization of continuous variables has been improved when dealing 
   with variables having a large number of identical values.
 
-- Fix for memory overflow on shared memory space.
+* Fix for memory overflow on shared memory space.
 
 # v1.8.0
 
 ## Features
 
-- Addition of the 'is consequence' prior knowledge. Consequence variables are 
-  excluded from the possible contributors, edges between consequences are 
-  ignored and edges between a non consequence and a consequence are pre-oriented 
-  toward the consequence.
+* Addition of the 'is consequence' prior knowledge. Consequence variables are
+  excluded from the possible contributors, edges between consequences are
+  ignored and edges between a non-consequence and a consequence are pre-oriented
+  toward the consequence.  
+  Information about consequence variables can be provided to `miic()`
+  in the `state_order`, by supplying an `is_consequence` column.
 
 # v1.7.0
 
 ## Features
 
-- iMIIC version introducing genuine vs putative causes, contextual variables
-  and multiple enhancements to deal with very large datasets.
-  [Ribeiro-Dantas et al., iScience 2024]
-  (https://arxiv.org/abs/2303.06423)
+* iMIIC version introducing contextual variables, genuine vs putative causes
+  and multiple enhancements to deal with very large datasets ([Ribeiro-Dantas et al., iScience 2024](https://doi.org/10.1016/j.isci.2024.109736)).  
+  Information on contextual variables can be provided to `miic()`
+  in the `state_order`, by supplying an `is_contextual` column, and
+  genuine vs putative causes can be tuned by the newly added parameter
+  `ort_consensus_ratio`.
 
 # v1.6.0
 
 ## Features
 
-- Enhancement of orientations using mutual information supremum principle for 
-  finite datasets.
-  [Cabeli et al., Why21 at NeurIPS 2021]
-  (http://kinefold.curie.fr/isambertlab/PAPERS/cabeli_Why21-NeurIPS2021.pdf)
+* Enhancement of orientations using mutual information supremum principle for 
+  finite datasets ([Cabeli et al., Why21 at NeurIPS 2021](http://kinefold.curie.fr/isambertlab/PAPERS/cabeli_Why21-NeurIPS2021.pdf)).  
+  The use of enhanced orientations is controlled by the newly added parameter
+  `negative_info` of `miic()` and is activated by default.
 
-- By default, MIIC does not propagate orientations anymore
+* By default, `miic()` does not propagate orientations anymore
   and allows latent variables discovery during orientation step. 
 
 # v1.5.3
 
 ## Features
 
-- Release to CRAN
+* Release to CRAN
 
 # v1.5.2
 
 ## Fixes and improvements
-- Further refactoring of the C++ code for the computation of information.
+* Further refactoring of the C++ code for the computation of information.
 
-- Fix minor bugs in the continuous computation.
+* Fix minor bugs in the continuous computation.
 
-- Fix incompatibility with older versions of GCC (std::align).
+* Fix incompatibility with older versions of GCC (std::align).
 
 # v1.5.1
 
 ## Fixes and improvements
-- Fix various bugs in the computation of information in the presence of NA
+* Fix various bugs in the computation of information in the presence of NA
   values in the dataset.
 
-- An overhaul of the C++ code base, better memory management, computation time
+* An overhaul of the C++ code base, better memory management, computation time
   and code readability.
 
 # v1.5.0
 
 ## Features
-- Add a column `consensus` to the reconstructed graph's edges summary associated
+* Add a column `consensus` to the reconstructed graph's edges summary associated
   with the option `consistent`, and a new parameter `consensus_threshold`
   accordingly.
 
-- Add a parameter `ori_proba_ratio` to have more control on the orientation of
+* Add a parameter `ori_proba_ratio` to have more control on the orientation of
   edges.
 
 ## Fixes and improvements
-- Faster post processing in R.
+* Faster post-processing in R.
 
-- Rework plot functionality.
+* Rework plot functionality.
 
-- Fix a bug in the orientation part about the log score.
+* Fix a bug in the orientation part about the log score.
 
-- Refactor of the C++ code base (orientation).
+* Refactor of the C++ code base (orientation).
 
 # v1.4.2
 
 ## Fixes and improvements
-- Various fixes of memory leaks and ambiguous function calls (at least for all
+* Various fixes of memory leaks and ambiguous function calls (at least for all
   that appear in CRAN check).
 
-- Refactor of the C++ code base (confidence cut).
+* Refactor of the C++ code base (confidence cut).
 
 ## Known issues
-- Error when running the cosmicCancer example on CRAN's Solaris system.
+* Error when running the cosmicCancer example on CRAN's Solaris system.
 
 ## Miscellaneous
-- Move from BitBucket to GitHub, the repo is now public.
+* Move from Bitbucket to GitHub; the repo is now public.
 
 # v1.4.1
 
@@ -118,30 +176,30 @@ CRAN).
 # v1.4.0
 
 ## Incompatible changes
-- Standardize the API naming convention: `snake_case` for parameters and
+* Standardize the API naming convention: `snake_case` for parameters and
   `camelCase` for functions. This should have led to a major version increment to
   v2.0.0 given the previous version on CRAN is v1.0.3. But v1.0.3 and earlier
   versions were not properly maintained and versioned under a version control
   system (so we actually forgot to take them into consideration when releasing
   this version).
 
 ## Features
-- The method now works with continuous variables (solely or mixed with discrete
+* The method now works with continuous variables (solely or mixed with discrete
   variables), thanks to the discretization method as described in
-  [Cabeli et al., PLoS Comp. Bio. 2020](https://doi.org/10.1371/journal.pcbi.1007866).
+  [Cabeli et al., PLoS Comput. Biol. 2020](https://doi.org/10.1371/journal.pcbi.1007866).
 
-- Add an option `consistent` to improve the reconstructed graph's
+* Add an option `consistent` to improve the reconstructed graph's
   interpretability based on schemes as described in
   [Li et al., NeurIPS 2019](https://papers.nips.cc/paper/9573-constraint-based-causal-structure-learning-with-consistent-separating-sets).
 
 ## Fixes and improvements
-- Various fixes of memory leaks and typos.
+* Various fixes of memory leaks and typos.
 
-- Major refactoring of the old C++ code base (still WIP) to improve readability
+* Major refactoring of the old C++ code base (still WIP) to improve readability
   and flexibility, and to enforce proper coding style and documentation.
 
-- Enforce proper coding style for the R code base.
+* Enforce proper coding style for the R code base.
 
 ## Known issues
-- Still have some memory leaks and CRAN check errors and notes on certain
+* Still have some memory leaks and CRAN check errors and notes on certain
   platforms.
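The v1.8.0 entry above states how the 'is consequence' prior reshapes the starting complete graph. As an illustration only, the Python sketch below applies that rule, as worded in NEWS.md, to a list of variables. It is not miic's C++ implementation; the function name `initial_skeleton` and its return layout (unoriented pairs, pre-oriented tail/head pairs, candidate contributors) are assumptions.

```python
from itertools import combinations


def initial_skeleton(variables, consequences):
    """Apply the 'is consequence' prior, as worded in NEWS.md, to a complete graph.

    Illustrative sketch only (hypothetical function, not miic's implementation).
    Returns (undirected, oriented, contributors):
      undirected   -- set of frozenset pairs left unoriented
      oriented     -- set of (tail, head) pairs pre-oriented toward a consequence
      contributors -- variables still allowed to act as contributors
    """
    consequence_set = set(consequences)
    undirected, oriented = set(), set()
    for a, b in combinations(variables, 2):
        a_is_csq = a in consequence_set
        b_is_csq = b in consequence_set
        if a_is_csq and b_is_csq:
            continue  # edges between two consequences are ignored
        if a_is_csq:
            oriented.add((b, a))  # non-consequence b -> consequence a
        elif b_is_csq:
            oriented.add((a, b))  # non-consequence a -> consequence b
        else:
            undirected.add(frozenset((a, b)))
    # Consequence variables are excluded from the possible contributors.
    contributors = [v for v in variables if v not in consequence_set]
    return undirected, oriented, contributors
```

With variables `x1`, `x2`, `c1`, `c2` and consequences `c1`, `c2`, the sketch keeps `x1`/`x2` as the only unoriented pair, orients the four mixed pairs toward `c1` or `c2`, drops the `c1`/`c2` pair, and returns `["x1", "x2"]` as contributors.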