Releases: macrocosm-os/finetuning
Release 2.4.0
This release incorporates a third evaluation task (Fineweb) into the current competition, starting at block 4,250,808. From that block, the dataset weighting will be 90% MMLU, 5% Word Sorting, and 5% Fineweb.
Subnet
- Added new Fineweb evaluation task.
- This evaluation scores models by their average cross entropy loss computed on samples from Fineweb (see the sketch after this list).
- It is the same evaluation used in subnet 9; including it helps ensure the finetuned models do not lose too much of their original capability.
- Includes a check that models are generating reasonable output, i.e. that they are not too repetitive within or across responses.
- Improved definition of the competition schedule to include eval tasks.
- This makes it easier to add new evaluations to competitions at specific weights, and easier for miners to view them.
- See `COMPETITION_SCHEDULE_BY_BLOCK` in `constants/__init__.py` to view for yourself.
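As referenced above, here is a minimal sketch of what an average cross-entropy evaluation looks like, assuming a transformers causal LM and tokenizer (names are illustrative; the repo's actual implementation handles batching, truncation, and the repetition check):

```python
import torch

def average_cross_entropy(model, tokenizer, samples, device="cuda"):
    """Score a model by its mean cross-entropy loss over text samples.

    A minimal sketch of a Fineweb-style evaluation; lower is better.
    """
    model.eval()
    losses = []
    with torch.no_grad():
        for text in samples:
            inputs = tokenizer(text, return_tensors="pt", truncation=True).to(device)
            # Passing labels equal to input_ids makes the model return the
            # language-modeling loss (mean token-level cross entropy).
            outputs = model(**inputs, labels=inputs["input_ids"])
            losses.append(outputs.loss.item())
    return sum(losses) / len(losses)
```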
Validators
- Improved the logic around strategy selection for sharing files across subprocess boundaries. This will help avoid overflowing /dev/shm.
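As a rough illustration of this kind of strategy selection (a hypothetical sketch; the threshold and fallback path are assumptions, not the repo's actual logic):

```python
import shutil

def pick_share_dir(payload_bytes: int, headroom: float = 0.5) -> str:
    """Place files shared with subprocesses in /dev/shm only when there is
    comfortable headroom, falling back to disk to avoid overflowing it."""
    free = shutil.disk_usage("/dev/shm").free
    return "/dev/shm" if payload_bytes < free * headroom else "/tmp"
```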
Miners
- The new dataset loader for the Fineweb task can be found at https://github.com/macrocosm-os/finetuning/blob/main/finetune/datasets/hugging_face/hugging_face_loader.py.
- As mentioned, this will be incorporated into the existing competition starting at block 4,250,808, so please take this into account for your training.
- Note that the loader supports general Hugging Face datasets. Constants are currently included for Falcon and Fineweb, though the current competition only uses Fineweb data (see the sketch below).
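For local experimentation, Fineweb-style data can also be streamed directly with the datasets library (a sketch; the dataset id and field name are assumptions, and the subnet's loader linked above is the authoritative path):

```python
from itertools import islice
from datasets import load_dataset

# Stream rows rather than downloading the full dataset up front.
# "HuggingFaceFW/fineweb" and the "text" field are assumptions here;
# use the subnet's hugging_face_loader.py for the exact behavior.
dataset = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)
samples = [row["text"] for row in islice(dataset, 100)]
```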
Validators should update as soon as they can. Note that due to requirement version updates you will need to rerun `python -m pip install -e .`
Release 2.3.0
This release addresses the current wandb sampling issue from SN1 and adds functionality to improve v-trust.
V-trust improvements:
- We've improved the PromptingDatasetLoader to fetch samples more reliably and consistently. Validators will now fetch 700 samples instead of 400.
- Validators now align to "sync blocks" to use the same set of eval samples, as well as pace how frequently evaluations are performed. This should improve v-trust across the board, particularly in situations where the top model changes.
- Miner weights now use a fully winner-takes-all scheme, where exactly 1 model will receive weight. Previously a 2nd model could receive a small amount of weight (due to soft-maxing of weights) if enough models were evaluated in a batch (see the sketch after this list).
- Added better retry behavior for `set_weights`.
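A minimal sketch of the sync-block alignment and winner-takes-all weighting described above (the interval and names are assumptions; the real values live in the validator code):

```python
SYNC_INTERVAL = 100  # assumed cadence; the real value is defined by the subnet

def latest_sync_block(current_block: int) -> int:
    """Round down to the most recent sync block so validators evaluating
    in the same window use the same set of eval samples."""
    return current_block - (current_block % SYNC_INTERVAL)

def winner_takes_all(scores: dict[int, float]) -> dict[int, float]:
    """Give exactly one model all of the weight; runners-up get zero."""
    winner = max(scores, key=scores.get)
    return {uid: 1.0 if uid == winner else 0.0 for uid in scores}
```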
Release 2.2.1
This is a minor release to address the ongoing issue with SN1's wandb integration. If there are no samples to use for synthetic MMLU, that evaluation task will be skipped and all weight will be given to the remaining evaluation tasks (currently just word sorting).
Other fixes
Fixes a model lookup issue that can occur if a hotkey is reregistered.
Release 2.2.0
- Adds a new Word Sorting eval task at block 4,139,465. Initially it is worth 2.5% of a miner's score.
- Fixed grace period check during model cleanup to respect the most recent file instead of the oldest file in a folder.
- Validators now use a seed generated from the hash of a recent block for the dataset loaders. This will improve vTrust as validators will use the same seed if they evaluate within the same ~30 minute window.
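Conceptually, the block-hash seeding works like this sketch (the function is hypothetical; the actual derivation in the validator may differ):

```python
import hashlib

def seed_from_block_hash(block_hash: str) -> int:
    """Derive a deterministic dataset-loader seed from a recent block hash,
    so validators evaluating in the same window draw the same samples."""
    digest = hashlib.sha256(block_hash.encode()).digest()
    return int.from_bytes(digest[:8], "big")
```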
Release 2.1.2
VTrust improvements and code cleanups.
- Use a deterministic generation configuration for model evaluation. This ensures that validators evaluating the same models over the same samples will get the same results (see the sketch after this list).
- Increase the number of samples from 300 to 400.
- Clean up code relating to the now-deprecated competition. The provided miner is now a shell that needs to be filled in based on your training strategy.
- Fixed a few example notebooks to work against the refactored codebase.
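For reference, a deterministic generation configuration with transformers might look like the following (a sketch; the exact parameters used by validators are assumptions here):

```python
from transformers import GenerationConfig

# Greedy decoding removes sampling randomness, so identical models
# evaluated over identical samples produce identical outputs.
deterministic_config = GenerationConfig(
    do_sample=False,    # greedy decoding, no sampling
    num_beams=1,
    max_new_tokens=20,  # assumed cap; sized for short eval answers
)
```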
Release 2.1.1
Hotfix release to address the wandb issue that causes the main thread to hang indefinitely.
There are currently 7 running runs in the SN1 prompting wandb project. One of those runs (hhodrv2s) is poisoned, and all attempts to perform a history scan on it result in a 502 from wandb. Worse, the wandb client will retry infinitely by default.
This change addresses the issue in 2 ways:
- We reimplement the wandb history client so we can add a sane number of retries (3). We combine this with a reduction in collected samples to 300, making it more likely we'll fulfill the 300 samples from 6 runs should one be poisoned in the future.
- We also use a sampled history scan, which additionally filters (server-side) the steps returned to only those containing the requested keys; the returned steps also contain only the requested metrics. As a result, loading 300 samples now takes a few seconds rather than the ~1-2 minutes it took before (see the sketch below).
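A sketch of the combined approach using the wandb public API (the retry wrapper and parameters are illustrative, not the exact client code):

```python
import wandb

def fetch_sampled_history(run_path: str, keys: list[str], max_retries: int = 3):
    """Fetch a sampled history with a bounded number of retries, so a
    single poisoned run cannot hang the validator indefinitely."""
    run = wandb.Api(timeout=60).run(run_path)
    for attempt in range(max_retries):
        try:
            # Sampled history filters server-side to steps containing the
            # requested keys and returns only the requested metrics.
            return run.history(samples=300, keys=keys, pandas=False)
        except Exception:
            if attempt == max_retries - 1:
                raise
```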
Release 2.1.0
Sunsets the SN9_MODEL competition.
Release 2.0.1
B7_MULTI_CHOICE competition fixes and improvements:
- Fixes a bug in how the B7_MULTI_CHOICE competition computes weights.
- Adjust epsilon for B7_MULTI_CHOICE to start at 0.05 and decay to 0.01 over 5 days (see the sketch after this list).
- Double the number of samples used for B7_MULTI_CHOICE evaluations.
Doubles the number of models kept between evaluations for all competitions from 2 to 4.
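For illustration, an epsilon decay of this shape can be written as follows (a sketch; the 36,000-block span assumes ~12-second blocks, i.e. roughly 5 days):

```python
def epsilon_at_block(blocks_elapsed: int,
                     start_eps: float = 0.05,
                     end_eps: float = 0.01,
                     decay_blocks: int = 36_000) -> float:
    """Linearly decay epsilon from 0.05 to 0.01 over ~5 days of blocks.

    36,000 blocks is an assumption based on ~12s block times.
    """
    progress = min(1.0, max(0.0, blocks_elapsed / decay_blocks))
    return start_eps + (end_eps - start_eps) * progress
```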
Release 2.0.0
Adds a new B7_MULTI_CHOICE competition.
- Weights are annealed slowly from a 100/0 split to a 50/50 split at a rate of 5% per 3,600 blocks (see the sketch after this list).
- Multiple choice question data is loaded from the PromptingSubnetLoader, which pulls from SN1 wandb logs.
- Evaluation asks the models the multiple choice questions and takes the first possible answer returned.
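As a concrete illustration of the annealing schedule (a sketch; the arithmetic follows from the stated rate: ten 5% steps of 3,600 blocks each, settling at 50/50 after 36,000 blocks):

```python
def competition_split(blocks_since_start: int) -> tuple[float, float]:
    """Anneal from a 100/0 split toward 50/50 at 5% per 3,600 blocks."""
    steps = blocks_since_start // 3600
    new_share = min(0.50, 0.05 * steps)
    return 1.0 - new_share, new_share

# After 36,000 blocks (ten 3,600-block steps) the split reaches 50/50.
assert competition_split(36_000) == (0.5, 0.5)
```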
Across all competitions we only keep the top 2 models from eval to eval to speed up the evaluation loop.
Across all competitions we will start using an epsilon with a linear decay instead of a fixed epsilon value.
Note: This includes a bump to the validator state version and therefore local state will automatically be wiped on update.
Note: This includes some requirement bumps, so validators will also need to run `python -m pip install -e .` to update.
Release 1.0.3
This release brings us up to bittensor 3.9.4, improves logging, and further prepares us for multiple competitions.
Subnet Improvements
- Documentation improvements.
- Refactor onto our taoverse library for shared development with subnet 9.
Validator Improvements
- Improvements to logging (especially with regard to miner Hugging Face information).
- Keep models according to competition-specific weights.