coding resources for molecular biologists, covering R, Markdown & GitHub

ec363/coding_resources

Coding Resources for Molecular Biologists

This is a list of resources that I, as a wet lab molecular biologist working in synthetic biology, have found helpful for learning to code. The vast majority of them are by no means exclusive to molecular/cellular biologists, but my background will probably shape my view of what is important and useful. The list is centred on the R language, but it also covers other important tools such as GitHub for version control and Markdown for writing formatted text in a coding-friendly, plain-text format. This document will be updated as and when appropriate.


R

The two most widely used languages in my field are R and Python. In an ideal world one would probably learn both; however, people often have preferences depending on their immediate needs or the language they are most comfortable with. For the basics of learning to code and data visualisation, I don't think it matters which you use. It is often smartest to learn the language most used by those around you and/or, if your use case is somewhat specialised, the language that is being most actively developed for that application.

You will often hear that R is a statistical language, developed for statisticians and data scientists. This is true, but it is not the whole story: R has been under active development for biological data analysis for about 20 years. R was conceived in 1992, reached version 1.0 in 2000, and the Bioconductor project began in 2001. Bioconductor hosts bioinformatics-specific packages, chiefly for the analysis of high-throughput sequencing data, but also for other kinds of data, such as flow cytometry data. It has an estimated 1000 active developers and 300,000 active users (source: Credibly Curious). So R is well placed to be the language of choice for these specialised bioinformatics use cases. But it can just as justifiably be used to analyse smaller, less specialised data sets (indeed I use it for that all the time), particularly since its data visualisation capabilities are incredibly powerful (in other words, it's easy to use R to make really stunning figures). Beyond this, the R community, developers and users alike, is strongly committed to open source and welcoming to newcomers, and there are now countless projects aimed at making the language easier to use and useful for an ever-broader range of tasks.

In short, I'd recommend it.


Other programming languages

Curated lists of courses (online and in person) can be found here:


Markdown

Markdown is essentially a way of writing formatted documents in plain text: like HTML, but easier. You've probably come across very similar 'lightweight' formatting before, for example in messaging apps that make text bold when it is written between asterisks (*) or italic when it is wrapped in underscores (_). Markdown was designed to be unobtrusive (readable as plain text) and easy to convert into other file types. It is more flexible than Word and simpler to learn than HTML or LaTeX. It is useful for annotating code, as code can be embedded within Markdown documents (e.g. R Markdown, or .Rmd, files). It has been adopted by many platforms, including GitHub (note that GitHub renders .md files but not .Rmd files). This file is written in Markdown. When you open a folder ('repository'/'directory') on GitHub that contains a file called README.md, GitHub converts that file to HTML (renders it) and displays it. To see this file in plain text, as it was written, click on 'README.md' on the GitHub page, then 'Raw'.
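
To make the syntax concrete, here is a small sketch that writes a Markdown file from the shell and prints it back as plain text. The file name and contents are hypothetical, chosen only for illustration. GitHub's rendered view would show a heading, bold and italic words and a bullet list. 'Raw' would show exactly the text below.

```shell
#!/usr/bin/env bash
# Sketch: write a small Markdown file (hypothetical name and content) and print its raw text.
set -euo pipefail

# Work in a throwaway folder so nothing else is touched
cd "$(mktemp -d)"

# The quoted 'EOF' stops the shell from interpreting anything inside the text
cat > notes.md <<'EOF'
# Plate reader results

Growth was **faster** at 37 degrees than at _30 degrees_.

- Strain A: induced
- Strain B: uninduced
EOF

# This plain text is what 'Raw' shows; GitHub would render it as formatted HTML
cat notes.md
```

In this example, `#` starts a heading, `**...**` marks bold text, `_..._` marks italic text and `-` starts a bullet point.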


GitHub

GitHub is useful for many things: as a backup system that makes it easy to see what changed in each version ('version control'), for sharing code, and for contributing to group projects. The latter two are the most talked-about uses of GitHub, but you needn't publish code or collaborate for it to be useful; I find versioning its most useful aspect. Git is the version control software that tracks changes to files within a folder ('repository'), and GitHub is the most popular website for hosting Git repositories and syncing those changes.

Git's terminology and commands feel very strange at first and take time to get used to. To start, I'd recommend working through the Git commands on the command line, as in Blischak's paper. For most day-to-day work, however, I use GitHub Desktop, a graphical interface that is much nicer and more intuitive than the command line.
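
Below is a minimal sketch of the basic command-line versioning loop: create a repository, record a version, edit and record again, then look back at the history. It assumes Git is installed. The folder name, file contents, identity and commit messages are all hypothetical. It only records versions locally, so connecting the repository to GitHub and pushing to it is not shown.

```shell
#!/usr/bin/env bash
# Sketch of local Git versioning; assumes git is installed. All names are hypothetical.
set -euo pipefail

# Create a throwaway project folder and turn it into a Git repository
repo="$(mktemp -d)/my_analysis"
mkdir -p "$repo"
cd "$repo"
git init --quiet

# Identity used only inside this example repository (hypothetical values)
git config user.name "Example User"
git config user.email "example@example.com"

# Version 1: a short Markdown README
printf '# My analysis\n\nPlots of **flow cytometry** data.\n' > README.md
git add README.md
git commit --quiet -m "Add README"

# Version 2: edit the file, then stage and commit the change
printf '\nData cleaned with _R_.\n' >> README.md
git add README.md
git commit --quiet -m "Describe data cleaning"

# List the saved versions, newest first
git log --oneline

# Show exactly which lines changed between the two versions
git diff HEAD~1 HEAD -- README.md
```

The `git add` then `git commit` pair is the core habit. GitHub Desktop performs the same steps through buttons instead of typed commands.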


Troubleshooting

There is a sort of truism in coding that the answer to any question is out there on Google. This is largely true, though phrasing the question the right way and finding an answer written in an accessible way can both be a challenge. Still, a mixture of asking people, Googling and reading reference texts is usually the way to get to the answer.


Organisations focused on education & community-building in coding/data science

A number of interesting organisations provide free and informal guidance on coding and related topics. Their forums are also great places to ask technical questions and meet like-minded people.

  • R-Ladies - an international community of women who use R
  • R4DS - a community of those learning/using R for Data Science
  • Researc/Hers Code - a London-based group working to promote women who use coding in their research
  • #tidytuesday - an initiative to get R users to practise their data wrangling and visualisation skills by making visualisations of a new sample dataset each week and sharing their code/plots on Twitter and GitHub

Articles and Resources on Related Topics

How to learn to code

How to manage computational biology projects

Data visualisation
