This repository serves as a place for administrative matters as well as for generic todos and issues. If you’re new to BigCode, we recommend reading the sections below to see how you can contribute to the project.
📚Dataset: We’ve collected 130M+ GitHub repositories and ran a license detector on them. We will soon release a large dataset of files from repositories with permissive licenses. Beyond GitHub, we would also like to add more datasets and create a Code Dataset Catalogue.
Open tickets:
- Suggest datasets for the Code Dataset Catalogue
- Redact Personally Identifiable Information (PII) from code datasets (see the sketch after this list)
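As a rough illustration of what PII redaction can look like, here is a minimal sketch that masks e-mail addresses and IPv4 addresses with placeholder tokens. The regexes and placeholder names are hypothetical and far simpler than a production pipeline would need; the actual BigCode redaction approach is what the ticket above is about.

```python
import re

# Hypothetical, illustrative patterns -- not the BigCode PII pipeline.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def redact_pii(source: str) -> str:
    """Replace e-mail addresses and IPv4 addresses with placeholder tokens."""
    source = EMAIL_RE.sub("<EMAIL>", source)
    source = IPV4_RE.sub("<IP_ADDRESS>", source)
    return source

print(redact_pii("# contact: jane.doe@example.com, host 192.168.0.1"))
# -> "# contact: <EMAIL>, host <IP_ADDRESS>"
```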
We encourage you to join #wg-dataset if you are interested in discussions about data governance (e.g. the ethical and legal concerns around the training data, or OpenRAIL licenses for code applications).
🕵🏻‍♀️Evaluation: We’ve started working on an evaluation harness to easily evaluate code generation models on a wide range of tasks (a sketch of the standard pass@k metric follows the tickets below).
Open tickets:
- Suggest tasks for the Evaluation Harness
- Add selected tasks to the Evaluation Harness
- Design prompts for few-shot evaluation tasks
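For context, most code-generation benchmarks report pass@k. Below is a minimal sketch of the standard unbiased estimator from the Codex paper (Chen et al., 2021): given n generated samples per problem, of which c pass the unit tests, it estimates the probability that at least one of k randomly drawn samples passes. This is illustrative and not necessarily the exact code used in the harness.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: 1 - C(n - c, k) / C(n, k), computed stably as a product."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 200 samples per problem, 37 of them pass the tests, report pass@10.
print(pass_at_k(n=200, c=37, k=10))
```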
Please join #wg-evaluation for all discussions on the evaluation of code LLMs.
💪Training: We’ve been training smaller models (350M-1B parameters) on the ServiceNow cluster through a fork of Megatron-LM.
- We’ve ported ALiBi to support longer sequences at inference time (see the sketch after this list)
- We’ve implemented multi-query attention to speed up incremental decoding
- The goal is to scale to a ~15B parameter model. We will, however, first run several ablation studies on a smaller scale. We will soon release our experiment plan and ask for your feedback!
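To make the ALiBi item above concrete, here is a minimal sketch of how the linear attention biases can be built (not the Megatron-LM fork’s implementation): each head gets a slope from a geometric sequence, and the bias grows linearly with the query–key distance, which is what lets the model extrapolate to longer sequences at inference time. The slope formula is exact for power-of-two head counts.

```python
import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    # Geometric sequence 2^(-8/n), 2^(-16/n), ... from the ALiBi paper
    # (exact for power-of-two head counts).
    start = 2.0 ** (-8.0 / num_heads)
    return torch.tensor([start ** (i + 1) for i in range(num_heads)])

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Bias of shape (num_heads, seq_len, seq_len) added to attention scores
    # before the softmax; the upper triangle is removed by the causal mask.
    pos = torch.arange(seq_len)
    distance = pos[None, :] - pos[:, None]  # key index minus query index
    return alibi_slopes(num_heads)[:, None, None] * distance[None, :, :]

print(alibi_bias(num_heads=8, seq_len=4)[0])  # bias for the first head
```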
We encourage you to get in touch with us in #wg-training if you have experience with large-scale transformer training in a multi-node setup.
🏎 Inference: We’ve implemented multi-query attention in Transformers and Megatron-LM. While others have reported up to a 10x decoding speed-up over a multi-head attention baseline, we’ve so far only seen modest improvements of ~25% (a short sketch of the mechanism follows at the end of this section).
Please go to the #wg-inference channel for technical discussions on improving the inference speed of code LLMs. You can find a summary of all open tickets here.
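To illustrate why multi-query attention helps incremental decoding, here is a minimal sketch (hypothetical shapes, not the Transformers or Megatron-LM implementation): all query heads share a single key/value head, so the KV cache that must be read at every decoding step is num_heads times smaller than in multi-head attention.

```python
import torch
import torch.nn.functional as F

def multi_query_attention(q, k, v):
    # q: (batch, num_heads, q_len, head_dim); k, v: (batch, 1, kv_len, head_dim).
    # The single key/value head is broadcast across all query heads.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # (batch, num_heads, q_len, kv_len)
    return F.softmax(scores, dim=-1) @ v                   # (batch, num_heads, q_len, head_dim)

batch, num_heads, head_dim, kv_len = 1, 16, 64, 128
q = torch.randn(batch, num_heads, 1, head_dim)  # one new token during incremental decoding
k = torch.randn(batch, 1, kv_len, head_dim)     # cached keys: 1 head instead of 16
v = torch.randn(batch, 1, kv_len, head_dim)     # cached values: 1 head instead of 16
print(multi_query_attention(q, k, v).shape)     # torch.Size([1, 16, 1, 64])
```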
- The Megatron-LM fork contains the code used for training the BigCode models.
- The BigCode analysis repository is a place for all kinds of analysis reports.
- The BigCode evaluation harness is being developed to evaluate language models for code on several benchmarks.
- The BigCode website repository contains the source of the website.