A meta-repository of all the pieces needed to get EnsemblFS up and running:
- FuseSharp, used to build FUSE file systems in C#
- Ensembl.NET, a C# library layer on Ensembl databases (like the `Bio::Ensembl` Perl library)
- EnsemblFS, the actual file system code
EnsemblFS is a virtual file system that presents an Ensembl Human Genome database as a set of files for easier exploration.
For a quick intro to the Ensembl database structure and how EnsemblFS works with it, see HOWITWORKS.
THIS IS VERY EXPERIMENTAL CODE. Please don't use this for any production use. See Notes, below, for a list of issues.
NOTE: EnsemblFS only runs on macOS at this time. If there is enough interest, I will port this to Linux and/or Windows. File an issue here to request it.
- EnsemblFS has only been tested on macOS Catalina (10.15) but should work on Leopard (10.5) and later.
- Requires .NET Core 3.0. Install the SDK from here.
- Requires Xcode Command Line Tools. Install with `xcode-select --install`.
- Requires OSXFUSE. Install with `brew cask install osxfuse`.
- Requires glib. Install with `brew install glib`.
- (optional) If you want to run EnsemblFS against a local database, you'll need to download and install the Ensembl data according to these instructions.
- (optional) Download and install Visual Studio Code. The three projects that make up EnsemblFS were all developed in VS Code and include helpful configurations that make running and debugging much simpler.
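The prerequisites above can be checked before building. The following is a minimal sketch, not part of the repository: `check` is a hypothetical helper that runs a command and reports whether it succeeded. It assumes Homebrew is how glib was installed, and it does not check for OSXFUSE or for specific version numbers.

```shell
#!/bin/sh
# Report which prerequisites appear to be present.
# Sketch only: a passing check means the command ran successfully,
# not that the installed version is the one EnsemblFS needs.

check() {
    # $1 = label to print; remaining arguments = command to run
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "ok: $label"
    else
        echo "missing: $label"
    fi
}

check ".NET Core SDK" dotnet --version
check "Xcode Command Line Tools" xcode-select -p
check "glib (via Homebrew)" brew list glib
check "git" git --version
check "Visual Studio Code (optional)" code --version
```

Any line reported as `missing` points back to the matching install step in the list above.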
To build and run from the command line:
- Clone this repository locally: `git clone https://github.com/stephen-riley/ensembl-fs-quickstart.git`
- Change directory into the quickstart repo: `cd ensembl-fs-quickstart`
- Initialize and download the submodules: `git submodule init && git submodule update`
- Build the FuseSharp adapter library: `FuseSharp/src/Adaptor/buildadaptor`
- Set configuration: `cp ensembl-net/etc/ensembl.conf ~/.ensembl.conf`
- Change directory to the main project: `cd ensembl-fs/EnsemblFS`
- Run EnsemblFS, mounting it in `/tmp`: `dotnet run -- /tmp/ensemblfs`
To build and run from Visual Studio Code:
- Clone this repository locally: `git clone https://github.com/stephen-riley/ensembl-fs-quickstart.git`
- Change directory into the quickstart repo: `cd ensembl-fs-quickstart`
- Initialize and download the submodules: `git submodule init && git submodule update`
- Build the FuseSharp adapter library: `FuseSharp/src/Adaptor/buildadaptor`
- Set configuration: `cp ensembl-net/etc/ensembl.conf ~/.ensembl.conf`
- Fire up Visual Studio Code: `code .`
- From the Debug menu, select Start Debugging to run EnsemblFS mounted at `/tmp/ensemblfs`.
Currently EnsemblFS supports viewing chromosome data for the selected species. To see a list of supported species, run `ls` on the EnsemblFS mount point.
The overall directory structure is as follows:
    ensemblfs/
      species/
        proteins/      # not yet implemented
        features/      # not yet implemented
        chromosomes/
          1/
            REF
          2/
          :
The `REF` files contain the actual base pair data (A, T, C, G, and N). See the Ensembl site for more information.
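Because a `REF` file is exposed as an ordinary file, standard command-line tools can read it. The sketch below counts how often each base letter occurs. `count_bases` is a hypothetical helper, and it assumes the file is plain text made up of the letters A, C, G, T and N. The path in the usage comment is also an assumption: `<species>` stands for whichever directory name `ls` shows on the mount point.

```shell
#!/bin/sh
# Count occurrences of each base letter in a REF file.
# Assumes plain-text content; bytes other than the letter being counted
# (including newlines) are ignored.

count_bases() {
    # $1 = path to a REF file
    for base in A C G T N; do
        # tr -cd deletes every byte except $base; wc -c counts what is left
        n=$(tr -cd "$base" < "$1" | wc -c | tr -d ' ')
        echo "$base $n"
    done
}

# Example call (hypothetical path):
# count_bases "/tmp/ensemblfs/<species>/chromosomes/1/REF"
```

This reads the file once per letter, which keeps the sketch short but is slow on a full chromosome; a single-pass tool would be a better fit for regular use.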
The file at `ensembl-net/etc/ensembl.conf` contains the information necessary to connect to the desired Ensembl database. This file must be copied to `~/.ensembl.conf` (or `/etc/ensembl.conf` for a global configuration). It defaults to connecting to one of Ensembl's US-based public databases, which contains almost 500 different versions of different species.
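To confirm that a configuration file is in one of the two locations named above, a quick check like the following may help. This is a sketch: `config_status` is a hypothetical helper, and it only reports whether each file exists. It makes no claim about which file wins when both are present, since that is not documented here.

```shell
#!/bin/sh
# Report whether an EnsemblFS configuration file exists in either
# of the two documented locations.

config_status() {
    # $1 = home directory to inspect (defaults to $HOME)
    home_dir=${1:-$HOME}
    for f in "$home_dir/.ensembl.conf" /etc/ensembl.conf; do
        if [ -f "$f" ]; then
            echo "found: $f"
        else
            echo "absent: $f"
        fi
    done
}

config_status
```

If both lines say `absent`, the `cp ensembl-net/etc/ensembl.conf ~/.ensembl.conf` step has probably not been run yet.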
See HOWITWORKS.md in the EnsemblFS project.
- The code is built for Ensembl database version 99. I've not tested it against version 98 schemas, so YMMV.
- Only chromosome base pair data is supported right now.
- See this issue for some notes on your Terminal configuration. (tl;dr: Set your Terminal Scrollback setting to "Limit number of rows to: 10,000" under Profiles->Window).
- If you see `mount_osxfuse: mount point /private/tmp/ensemblfs is itself on a OSXFUSE volume`, that means you need to unmount the virtual file system. Just run the `umount` command, specifying the mount point; e.g. `umount /private/tmp/ensemblfs`.
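The unmount step from the last note can be wrapped so that it only runs when something is actually mounted. This is a sketch rather than part of the project: `unmount_if_mounted` is a hypothetical helper, and it assumes the output of `mount` lists each entry in the form `<device> on <mount point> ...`.

```shell
#!/bin/sh
# Unmount a stale EnsemblFS mount, but only if the mount table lists it.

unmount_if_mounted() {
    # $1 = mount point exactly as the mount table shows it
    mount_point=$1
    # -F matches the path literally instead of as a regular expression
    if mount | grep -qF " on $mount_point "; then
        umount "$mount_point"
    else
        echo "not mounted: $mount_point"
    fi
}

# Use the /private/tmp form of the path, as it appears in the error message.
unmount_if_mounted /private/tmp/ensemblfs
```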
I've been meaning to try FUSE for a while, and recently a friend got hired at a genetics lab software company, which rekindled my interest in molecular biology and the Human Genome Project. This seemed like a fun way to combine these things.