vmm

This project contains code to create an adapted classifier for classifying borehole descriptions into textures, admixtures, and colors. It leverages GEOBERTje, a language model trained on geological borehole descriptions.

The code consists of:

A preprocessing module (data/dataloader.py): This module loads and preprocesses the lithology data.
A dataloader module (data/dataloader.py): This module creates torch dataloaders for the lithology data, split into train, validation, and test sets. It also tokenizes the text with the GEOBERTje tokenizer and computes the token embeddings in advance, which speeds up training.
A model module (models/independent.py): This module contains the model architecture, which consists of a two-layer MLP classifier.
A config module (configs/): This module contains the configuration files for the data and model.
A main module (main.py): This module contains the main code to load the data, train the model, and evaluate the model.
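
As an illustration of the train/validation/test split mentioned for the dataloader module, here is a minimal sketch in plain Python. It is not the code in data/dataloader.py: the function name `split_indices`, the fractions, and the seed are assumptions chosen for the example, and the project itself may split the data differently.

```python
import random


def split_indices(n_samples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle sample indices and partition them into train/val/test lists.

    Hypothetical helper: the fractions and seed are illustrative defaults,
    not values taken from the project's configuration.
    """
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("val_fraction and test_fraction must be >= 0 and sum to < 1")

    indices = list(range(n_samples))
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(indices)

    n_test = int(n_samples * test_fraction)
    n_val = int(n_samples * val_fraction)

    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test
```

The returned index lists can then be used to select rows from the preprocessed data before building one dataloader per split. Fixing the seed keeps the three sets stable between runs, so evaluation results stay comparable.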

Setting Up Project

This project uses Hatch for environment management and dependency installation. To get started, install Hatch:

pip install hatch

Then, create the environment:

hatch env create

Activate the environment:

hatch shell

For more details, refer to the Hatch documentation.

To use the environment from a notebook, activate your Hatch shell and start a notebook server from within it. You can then connect to the server through your browser or your favorite editor.

Running the Code

Configuration

Ensure you have the configuration files data_config.yaml and model_config.yaml in the vmm/configs directory. These files should contain the necessary configuration for the data and model.
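
A quick way to confirm that both files are in place before starting a run is sketched below. The helper `missing_configs` is hypothetical and not part of the project; it only checks that the two file names given above exist in a directory you pass in.

```python
from pathlib import Path

# File names taken from this README.
REQUIRED_CONFIGS = ("data_config.yaml", "model_config.yaml")


def missing_configs(config_dir):
    """Return the required config file names that are absent from config_dir."""
    config_dir = Path(config_dir)
    return [name for name in REQUIRED_CONFIGS if not (config_dir / name).is_file()]
```

For example, `missing_configs("vmm/configs")` returns an empty list when both files are present, and otherwise the names of the files that still need to be created.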

Run the code

Loading and preprocessing the lithology data, as well as training and evaluating the model, can be done with:

python vmm/main.py # run from vmm/src

If desired, the necessary modules can also be loaded into a notebook to run interactively.

Logging

The project uses Python's built-in logging module to log messages. Logs will be displayed in the terminal when you run the code.
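
If you load the modules into a notebook or your own script, you may need to configure logging yourself to see these messages. The following is a minimal sketch using only the standard library. The logger name "vmm", the message format, and the function `configure_logging` are assumptions for illustration; the project may set up logging differently in main.py.

```python
import logging


def configure_logging(level=logging.INFO):
    """Send log records to the terminal and return a logger named "vmm".

    The logger name and format string are illustrative assumptions.
    """
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace handlers set up earlier, e.g. by a notebook (Python 3.8+)
    )
    return logging.getLogger("vmm")


logger = configure_logging()
logger.info("Logging is configured")
```

Passing `logging.DEBUG` as the level shows more detailed messages, provided the modules emit them at that level.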

License

vmm is distributed under the terms of the MIT license.
