Update scripts #25

Merged: 9 commits, Jul 3, 2024
29 changes: 15 additions & 14 deletions README.md
@@ -10,10 +10,10 @@
## Table of Contents

- [Introduction](#introduction)
- [Installation](#installation)
- [Datasets](#datasets)
- [Models](#models)
- [Tasks](#tasks)
- [Installation](#installation)
- [Developing](#🧑🏿‍💻-developing)

## Introduction
@@ -22,6 +22,16 @@ AtomGen provides a robust framework for handling atomistic graph datasets focusi

It streamlines the process of aggregation, standardization, and utilization of datasets from diverse sources, enabling large-scale pre-training and generative modeling on atomistic graphs.


## Installation

The package can be installed using poetry:

```bash
python3 -m poetry install
source $(poetry env info --path)/bin/activate
```
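
After installing, a quick import check can confirm the environment is wired up. This is a minimal sketch under the assumption that the package is importable as `atomgen` inside the activated Poetry environment:

```python
# Minimal post-install sanity check (assumes the activated Poetry environment
# and that the package is importable as `atomgen`).
import atomgen

print("AtomGen imported from:", atomgen.__file__)
```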

## Datasets

AtomGen facilitates the aggregation and standardization of datasets, including but not limited to:
@@ -36,9 +46,9 @@ Currently, AtomGen has pre-processed datasets for the S2EF pre-training task for

AtomGen supports a variety of models for training on atomistic graph datasets, including:

- SchNet
- TokenGT
- Uni-Mol+ (Modified)
- **[SchNet](https://arxiv.org/abs/1706.08566)**: A continuous-filter convolutional neural network for modeling quantum interactions.
- **[TokenGT](https://github.com/jw9730/tokengt)**: Tokenized graph transformer that treats all nodes and edges as independent tokens.
- **AtomFormer**: Custom architecture that leverages Gaussian pairwise positional embeddings and self-attention to model atomistic graphs (see the sketch below).
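
For intuition, here is a self-contained sketch of the kind of Gaussian pairwise distance expansion the AtomFormer bullet refers to. The function, its `k` and `max_dist` parameters, and the choice of basis widths are illustrative assumptions, not code from this repository (though `k` plausibly corresponds to the `k` field in the model configs):

```python
import torch

def gaussian_pairwise_features(coords: torch.Tensor, k: int = 128,
                               max_dist: float = 12.0) -> torch.Tensor:
    """Expand pairwise atomic distances onto k Gaussian basis functions.

    coords: (num_atoms, 3) Cartesian coordinates.
    Returns a (num_atoms, num_atoms, k) tensor of distance features that a
    transformer can fold into its attention bias / positional embeddings.
    """
    dist = torch.cdist(coords, coords)          # (N, N) pairwise distances
    centers = torch.linspace(0.0, max_dist, k)  # k evenly spaced Gaussian centers
    width = max_dist / k                        # shared Gaussian width
    return torch.exp(-((dist.unsqueeze(-1) - centers) ** 2) / (2 * width ** 2))

# Example: featurize a random 4-atom structure.
feats = gaussian_pairwise_features(torch.randn(4, 3))
print(feats.shape)  # torch.Size([4, 4, 128])
```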

## Tasks

@@ -50,16 +60,7 @@ Experimentation with pre-training tasks is facilitated through AtomGen, includin

- **Coordinate Denoising**: Denoising atom coordinates.

These tasks are all facilitated through the DataCollatorForAtomModeling class and can be used simultaneously or individually.

## Installation

The package can be installed using poetry:

```bash
python3 -m poetry install
source $(poetry env info --path)/bin/activate
```
These tasks are all facilitated through the `DataCollatorForAtomModeling` class and can be used simultaneously or individually.
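
The collator's actual signature is not shown in this diff, so rather than guess at it, here is a self-contained sketch of what a coordinate-denoising collation step typically does; it illustrates the task, not AtomGen's implementation:

```python
import torch

def collate_with_coordinate_denoising(batch, noise_std: float = 0.1):
    """Toy collation for the coordinate-denoising task: perturb the input
    coordinates and keep the clean ones as the regression target.
    Illustration only; not the DataCollatorForAtomModeling API."""
    coords = torch.stack([sample["coords"] for sample in batch])  # (B, N, 3)
    noise = torch.randn_like(coords) * noise_std
    return {
        "input_coords": coords + noise,  # what the model sees
        "target_coords": coords,         # what it should recover
    }

# Tiny usage example with two fake 3-atom structures.
batch = [{"coords": torch.randn(3, 3)} for _ in range(2)]
out = collate_with_coordinate_denoising(batch)
print(out["input_coords"].shape, out["target_coords"].shape)
```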


## 🧑🏿‍💻 Developing

@@ -5,8 +5,6 @@
"depth": 12,
"mlp_ratio": 4,
"k": 128,
"op_hidden_dim": 16,
"tr_hidden_dim": 16,
"dropout": 0.0,
"mask_token_id": 0,
"pad_token_id": 119,

@@ -1,9 +1,11 @@
{
"vocab_size": 122,
"vocab_size": 123,
"dim": 1024,
"num_heads": 16,
"num_heads": 32,
"depth": 24,
"mlp_ratio": 4,
"k": 128,
"dropout": 0.0,
"mask_token_id": 0,
"pad_token_id": 119,
"bos_token_id": 120,

@@ -1,9 +1,9 @@
{
"vocab_size": 123,
"dim": 768,
"num_heads": 32,
"depth": 12,
"mlp_ratio": 1,
"dim": 128,
"num_heads": 4,
"depth": 2,
"mlp_ratio": 4,
"k": 128,
"op_hidden_dim": 16,
"tr_hidden_dim": 16,
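
Because `AtomformerConfig` (added below) subclasses `PretrainedConfig`, JSON files like the ones above can be loaded directly through the `transformers` config machinery. The filename below is a placeholder, since the file names for these hunks are not visible in this view:

```python
# Load one of the JSON configs via the PretrainedConfig helpers.
# The path is a placeholder; substitute the real file under atomgen/models/configs/.
from atomgen.models.configuration_atomformer import AtomformerConfig

config = AtomformerConfig.from_json_file("atomgen/models/configs/atomformer-base.json")
print(config.vocab_size, config.dim, config.num_heads, config.depth)
```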
12 changes: 0 additions & 12 deletions atomgen/models/configs/transformer-base.json

This file was deleted.

15 changes: 0 additions & 15 deletions atomgen/models/configs/transformer-mini.json

This file was deleted.

12 changes: 0 additions & 12 deletions atomgen/models/configs/transformer-small.json

This file was deleted.

12 changes: 0 additions & 12 deletions atomgen/models/configs/transformer-tiny.json

This file was deleted.

16 changes: 0 additions & 16 deletions atomgen/models/configs/unimolplus-mini.json

This file was deleted.

46 changes: 46 additions & 0 deletions atomgen/models/configuration_atomformer.py
@@ -0,0 +1,46 @@
"""Configuration class for Atomformer."""

from typing import Any

from transformers.configuration_utils import PretrainedConfig


class AtomformerConfig(PretrainedConfig): # type: ignore
r"""
Configuration of a :class:`~transform:class:`~transformers.AtomformerModel`.

It is used to instantiate an Atomformer model according to the specified arguments.
"""

model_type = "atomformer"

def __init__(
self,
vocab_size: int = 123,
dim: int = 768,
num_heads: int = 32,
depth: int = 12,
mlp_ratio: int = 1,
k: int = 128,
dropout: float = 0.0,
mask_token_id: int = 0,
pad_token_id: int = 119,
bos_token_id: int = 120,
eos_token_id: int = 121,
cls_token_id: int = 122,
**kwargs: Any,
) -> None:
super().__init__(**kwargs)
self.vocab_size = vocab_size
self.dim = dim
self.num_heads = num_heads
self.depth = depth
self.mlp_ratio = mlp_ratio
self.k = k

self.dropout = dropout
self.mask_token_id = mask_token_id
self.pad_token_id = pad_token_id
self.bos_token_id = bos_token_id
self.eos_token_id = eos_token_id
self.cls_token_id = cls_token_id
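
For reference, a minimal usage sketch of this config class (assuming `atomgen` and `transformers` are installed): build a small configuration mirroring the values in the JSON hunks above, then round-trip it through the standard `PretrainedConfig` save/load helpers.

```python
from atomgen.models.configuration_atomformer import AtomformerConfig

# Mirrors the small config above (dim=128, num_heads=4, depth=2, mlp_ratio=4).
tiny = AtomformerConfig(dim=128, num_heads=4, depth=2, mlp_ratio=4)
print(tiny.model_type, tiny.vocab_size)  # "atomformer", 123 by default

tiny.save_pretrained("atomformer-tiny")                  # writes config.json
reloaded = AtomformerConfig.from_pretrained("atomformer-tiny")
assert reloaded.dim == 128 and reloaded.depth == 2
```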