EnsemblFS Quickstart repository

A meta-repository of all the pieces needed to get EnsemblFS up and running:

FuseSharp, used to build FUSE file systems in C#
Ensembl.NET, a C# library layer over Ensembl databases (like the Bio::EnsEMBL Perl library)
EnsemblFS, the actual file system code

EnsemblFS is a virtual file system that presents an Ensembl Human Genome database as a set of files for easier exploration.

For a quick intro to the Ensembl database structure and how EnsemblFS works with it, see HOWITWORKS.

CAVEAT EMPTOR

THIS IS VERY EXPERIMENTAL CODE. Please don't use it in production. See Notes, below, for a list of issues.

Prerequisites

NOTE: EnsemblFS only runs on macOS at this time. If there is enough interest, I will port this to Linux and/or Windows. File an issue here to request it.

EnsemblFS has only been tested on macOS Catalina (10.15) but should work on Leopard (10.5) and later.
Requires .NET Core 3.0. Install the SDK from here.
Requires Xcode Command Line Tools. Install with xcode-select --install.
Requires OSXFUSE. Install with brew cask install osxfuse.
Requires glib. Install with brew install glib.
(optional) If you want to run EnsemblFS against a local database, you'll need to download and install the Ensembl Data according to these instructions.
(optional) Download and install Visual Studio Code. The three projects that make up EnsemblFS were all developed in VS Code and include helpful configurations that make running and debugging much simpler.
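As a quick sanity check before building, a small script can report which command-line tools from the list above are missing. This is only a sketch: the function name missing_tools is made up for illustration, git is included because the build steps use it (it is not in the prerequisites list itself), and the check does not verify versions or detect libraries such as glib and OSXFUSE.

```shell
#!/bin/sh
# Print the name of every tool in the argument list that is not on PATH.
# This only checks that an executable exists; it does not check versions.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# git is needed for cloning; dotnet, brew and xcode-select come from the
# prerequisites list. glib and OSXFUSE are libraries and are not checked here.
missing=$(missing_tools git dotnet brew xcode-select)
if [ -n "$missing" ]; then
    echo "Missing prerequisites:"
    echo "$missing"
else
    echo "All command-line prerequisites found."
fi
```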

Building and running (command line)

Clone this repository locally: git clone https://github.com/stephen-riley/ensembl-fs-quickstart.git
Change directory into the quickstart repo: cd ensembl-fs-quickstart
Initialize and download the submodules: git submodule init && git submodule update
Build the FuseSharp adapter library: FuseSharp/src/Adaptor/buildadaptor
Set configuration: cp ensembl-net/etc/ensembl.conf ~/.ensembl.conf
Change directory to the main project: cd ensembl-fs/EnsemblFS
Run EnsemblFS, mounting it at /tmp/ensemblfs: dotnet run -- /tmp/ensemblfs
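The steps above can also be collected into one script. The sketch below is an illustration, not part of the repository: the run and build_and_run helpers and the DRY_RUN and MOUNT_POINT variables are hypothetical names, and the mkdir -p step assumes the mount point directory has to exist before mounting, which the steps above do not state. DRY_RUN defaults to 1, so the script only prints the commands unless you run it with DRY_RUN=0.

```shell
#!/bin/sh
# Sketch of the command-line steps as one script.
# DRY_RUN defaults to 1 (print only); run with DRY_RUN=0 to execute for real.

MOUNT_POINT="${MOUNT_POINT:-/tmp/ensemblfs}"

# Either print a command (dry run) or execute it in the current shell.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Each step runs only if the previous one succeeded.
build_and_run() {
    run git clone https://github.com/stephen-riley/ensembl-fs-quickstart.git &&
    run cd ensembl-fs-quickstart &&
    run git submodule init &&
    run git submodule update &&
    run FuseSharp/src/Adaptor/buildadaptor &&
    run cp ensembl-net/etc/ensembl.conf "$HOME/.ensembl.conf" &&
    run cd ensembl-fs/EnsemblFS &&
    # Assumption: the mount point directory must exist before mounting.
    run mkdir -p "$MOUNT_POINT" &&
    run dotnet run -- "$MOUNT_POINT"
}

build_and_run
```

Note that the cp step overwrites any existing ~/.ensembl.conf, so back up a customized configuration first.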

Building and running (Visual Studio Code)

Clone this repository locally: git clone https://github.com/stephen-riley/ensembl-fs-quickstart.git
Change directory into the quickstart repo: cd ensembl-fs-quickstart
Initialize and download the submodules: git submodule init && git submodule update
Build the FuseSharp adapter library: FuseSharp/src/Adaptor/buildadaptor
Set configuration: cp ensembl-net/etc/ensembl.conf ~/.ensembl.conf
Fire up Visual Studio Code: code .
From the Debug menu, select Start Debugging to run EnsemblFS mounted at /tmp/ensemblfs

Using EnsemblFS

Currently EnsemblFS supports viewing chromosome data for the selected species. To see a list of supported species, run ls on the EnsemblFS mount point (for example, ls /tmp/ensemblfs).

The overall directory structure is as follows:

ensemblfs/
    species/
        proteins/       # not yet implemented
        features/       # not yet implemented
        chromosomes/
            1/
                REF
            2/
            :

The REF files contain the actual base pair data (A, T, C, G, and N). See the Ensembl site for more information.
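Because a REF file is ordinary text as far as the shell is concerned, standard tools work on it. The sketch below counts how often each base letter occurs. It assumes the file holds only the uppercase letters listed above plus optional line breaks. The function name count_bases is made up for illustration.

```shell
#!/bin/sh
# Count occurrences of each base letter in a REF file.
# Assumption: bases are stored as the uppercase letters A, C, G, T and N.
count_bases() {
    for base in A C G T N; do
        # Keep only this letter, then count the bytes that remain.
        # The arithmetic expansion strips the padding some wc versions add.
        n=$(($(tr -cd "$base" < "$1" | wc -c)))
        printf '%s=%d\n' "$base" "$n"
    done
}

# When a readable file is given as the first argument, print its counts.
if [ -r "${1:-}" ]; then
    count_bases "$1"
fi
```

For example, count_bases /tmp/ensemblfs/homo_sapiens/chromosomes/1/REF would summarize chromosome 1, where homo_sapiens is a hypothetical species directory name; use whatever ls shows on your mount point. Each letter takes one pass over the file, so large chromosomes take a while.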

Configuration

The file at ensembl-net/etc/ensembl.conf contains the information necessary to connect to the desired Ensembl database. This file must be copied to ~/.ensembl.conf (or /etc/ensembl.conf for a global configuration). By default it connects to one of Ensembl's US-based public database servers, which hosts almost 500 databases covering different species and versions.
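To check which configuration file is in place, a lookup along these lines can help. It is a sketch under one assumption that this README does not confirm: that the per-user file takes precedence over the global one. The function name find_ensembl_conf and its two optional arguments are hypothetical.

```shell
#!/bin/sh
# Print the path of the first configuration file found.
# Assumption: ~/.ensembl.conf is preferred over /etc/ensembl.conf.
# $1 optionally overrides the home directory and $2 the global path.
find_ensembl_conf() {
    home_dir="${1:-$HOME}"
    global_conf="${2:-/etc/ensembl.conf}"
    if [ -f "$home_dir/.ensembl.conf" ]; then
        printf '%s\n' "$home_dir/.ensembl.conf"
    elif [ -f "$global_conf" ]; then
        printf '%s\n' "$global_conf"
    else
        return 1
    fi
}

if conf=$(find_ensembl_conf); then
    echo "Using configuration: $conf"
else
    echo "No configuration found; copy ensembl-net/etc/ensembl.conf to ~/.ensembl.conf"
fi
```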

How it works

See HOWITWORKS.md in the EnsemblFS project.

Notes

The code is built for Ensembl database version 99. I've not tested it against version 98 schemas, so YMMV.
Only chromosome base pair data is supported right now.
See this issue for some notes on your Terminal configuration. (tl;dr: Set your Terminal Scrollback setting to "Limit number of rows to: 10,000" under Profiles->Window).
If you see mount_osxfuse: mount point /private/tmp/ensemblfs is itself on a OSXFUSE volume, the virtual file system from a previous run is still mounted and needs to be unmounted. Run the umount command, specifying the mount point; e.g., umount /private/tmp/ensemblfs.
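The unmount step can be guarded so that it runs only when the file system is still mounted. This is a minimal sketch; the helper names is_mounted and unmount_if_mounted are made up for illustration, and it relies on mount printing each entry as "... on <mount point> ...".

```shell
#!/bin/sh
# Succeed if the given path appears as a mount point in the output of mount.
is_mounted() {
    # -F treats the path as a fixed string rather than a pattern.
    mount | grep -qF " on $1 "
}

# Unmount the path only if it is currently mounted.
unmount_if_mounted() {
    if is_mounted "$1"; then
        umount "$1"
    else
        echo "$1 is not mounted"
    fi
}
```

On macOS, /tmp is a symlink to /private/tmp, so pass the path as it appears in the error message: unmount_if_mounted /private/tmp/ensemblfs.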

Why this exists

I've been meaning to try FUSE for a while, and recently a friend got hired at a genetics lab software company, which rekindled my interest in molecular biology and the Human Genome Project. This seemed like a fun way to combine these things.
