Current blog post for the R Consortium #5

njtierney opened this issue Nov 7, 2018 · 0 comments

An update on the project, "A unified platform for missing values methods and workflows"

Nicholas Tierney
2018-11-07

The team was initially composed of Julie Josse and Nicholas Tierney. Julie subsequently recruited Nathalie Vialaneix, who has brought great strength to the team, having recently written an article summarising the many R packages for missing data.

The team met virtually many times throughout 2018 to discuss the plan for the year, and decided upon a name for the group, "R-Miss-Tastic". The name was chosen because it evokes "Little Miss Fantastic" and "Armistice", as well as the "R" language and the word "miss". We see the name as symbolising that we are working with R, that we want to do a fantastic job with missing data, and that we want to bring peace (an armistice) to the often frustrating task of working with missing data. We created a GitHub organisation (https://github.com/R-miss-tastic) to house our ideas as repositories in an open and public way. We were also very fortunate to be able to meet in person at the useR! 2018 conference, held in Brisbane, Australia.

Our project has four parts:

  1. Curating available R packages for missing data in a CRAN Task View
  2. Curating articles and related work on missing data by theme
  3. Collating tutorials and workflows
  4. Future extensions and beyond

The majority of our work has focussed on this first part:

Curating available R packages for missing data in a CRAN Task View

Before this project there was no task view specific to missing data, only sections within the Social Sciences (https://cran.r-project.org/web/views/SocialSciences.html) and Multivariate Statistics (https://cran.r-project.org/web/views/Multivariate.html) task views. We curated a list of packages based on an initial search of all packages containing key words. The packages were then summarised and collated into a CRAN Task View (CTV). The CTV is now complete and has been published online: https://cran.r-project.org/web/views/MissingData.html. The development version of the CTV is available at this GitHub repository: https://github.com/R-miss-tastic/missing-data-taskview. The process for generating the CTV can be seen here: R-miss-tastic/missing-data-taskview#5.
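As a rough illustration of what a key word search over package metadata could look like, here is a minimal sketch in base R. The package names, titles, and key words below are made up for the example; the post does not state the actual search terms or tooling, so this should not be read as the procedure the team used.

```r
# Toy package metadata. The names and titles are hypothetical,
# standing in for real CRAN metadata (for example, the table
# returned by tools::CRAN_package_db(), which needs network access).
pkgs <- data.frame(
  Package = c("imputeA", "plotB", "missC", "parseD"),
  Title = c(
    "Multiple Imputation by Chained Models",
    "Layered Graphics for Data Frames",
    "Summaries and Visualisations for Missing Data",
    "Fast Parsing of Delimited Files"
  ),
  stringsAsFactors = FALSE
)

# Assumed key words; the original search terms are not given in the post.
keywords <- c("missing", "imput")

# Case-insensitive match of any key word against the package title.
pattern <- paste(keywords, collapse = "|")
hits <- pkgs[grepl(pattern, pkgs$Title, ignore.case = TRUE), ]

# Candidate packages to review by hand before adding them to a task view.
candidates <- hits$Package
print(candidates)
```

A search like this only produces candidates. Each hit still needs to be read and summarised by a person, which is the curation step described above.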

The missing data CTV has been well received by the community. It was retweeted and favourited many times on Twitter (https://twitter.com/AchimZeileis/status/1054655578700742657), and RStudio's R Views blog published a separate post praising it: https://rviews.rstudio.com/2018/10/26/cran-s-new-missing-values-task-view/. We have since been contacted many times by package authors enquiring about how they can add their own package to the CRAN Task View. We know that those who work with and develop tools and analyses for missing data are spread far around the world, and we are happy to say that being part of a group associated with the R Consortium provides a strong focal point for members to rally behind.

The second and third parts of our work are to curate articles and related work on missing data by theme, and collate tutorials and workflows on missing data.

We have established a website to house these articles, tutorials, and workflows, https://rmisstastic.netlify.com/, which can also be found at this repository. To help populate the website and curate the articles, Julie Josse has hired a talented student, Imke Mayer. Imke is currently organising meetings with key figures in the missing values community and collating these articles. To ensure robustness, we will also put out a call for authors to submit their articles or other work, and for reviewers to review their placement on the website.

We have not yet reached the final part of our proposal, future extensions and beyond. However, we believe that by providing a platform and community to discuss missing data in R, along with software, approaches, and workflows, we are providing a base from which we can grow. We hope that the website and the R-miss-tastic community could eventually tackle more ambitious ideas, such as collecting datasets to benchmark imputation methods, which to our knowledge is not currently being done elsewhere. With a community involved, we can have useful discussions on benchmarks and approaches to multiple imputation, and even organise challenges to find the best imputation methods, perhaps in a similar fashion to the M4 forecasting competition (https://robjhyndman.com/hyndsight/m4comp/).
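To make the benchmarking idea concrete, here is a minimal sketch of one possible setup: start from complete data, remove values at random, impute them, and score each method against the known truth. Everything here is an assumption made for illustration, including the simulated data, the 20% missingness rate, the two deliberately simple single-imputation methods (not multiple imputation), and the RMSE score. A real benchmark would need real datasets, realistic missingness mechanisms, and far better methods.

```r
set.seed(1)

# Simulated complete data standing in for a benchmark dataset.
truth <- rnorm(200, mean = 10, sd = 2)

# Remove 20% of the values completely at random.
miss_idx <- sample(seq_along(truth), size = 40)
observed <- truth
observed[miss_idx] <- NA

# Two simple single-imputation methods to compare.
methods <- list(
  mean = function(x) {
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    x
  },
  median = function(x) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
    x
  }
)

# Root mean squared error on the values that were removed.
rmse <- function(imputed) {
  sqrt(mean((imputed[miss_idx] - truth[miss_idx])^2))
}

# One score per method; lower is better under this metric.
scores <- vapply(methods, function(f) rmse(f(observed)), numeric(1))
print(scores)
```

The useful property of this design is that the scoring step is independent of the imputation method, so new methods can be added to the list without changing anything else.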

Other exciting work we would like to discuss with the community includes implementing special types of missing value beyond NA, such as Stata's special missing values, and altering messages in base R to encourage approaches other than deleting missing observations by default, or at least to indicate the risks.
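The default deletion behaviour mentioned above can be seen in a small example. With R's default `na.action` option (`"na.omit"`), `lm()` drops incomplete rows when fitting without printing a message or warning. The data below are made up for illustration.

```r
# Made-up data with one missing value in each column.
dat <- data.frame(
  x = c(1, 2, NA, 4, 5, 6),
  y = c(2.1, 3.9, 6.2, NA, 9.8, 12.1)
)

# Assumes the default option na.action = "na.omit":
# rows 3 and 4 are dropped during fitting, with no message.
fit <- lm(y ~ x, data = dat)

n_total <- nrow(dat)  # rows supplied
n_used <- nobs(fit)   # rows actually used in the fit

cat("Rows supplied:", n_total, "\n")
cat("Rows used:", n_used, "\n")
```

Comparing `nrow()` with `nobs()` is one way for a user to check how many observations were discarded, which is the kind of information a more informative message could surface directly.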

On a personal note, it has been an absolute pleasure to work with Julie, Nathalie, and Imke, and I am looking forward to the future of R-Miss-Tastic!
