51 changes: 51 additions & 0 deletions examples/aiXcoder-7B/full.yaml
@@ -0,0 +1,51 @@
### data
train_dataset_type: erniekit
eval_dataset_type: erniekit
train_dataset_path: /workspace/pretrainning/data/pt/train_sft.jsonl
train_dataset_prob: "1.0"
eval_dataset_path: /workspace/pretrainning/data/pt/eval_sft.jsonl
eval_dataset_prob: "1.0"
max_seq_len: 8192
num_samples_each_epoch: 6000000
packing: false
mix_strategy: concat

### model
model_name_or_path: /workspace/aiXcoder-7B
attn_impl: flashmask

### finetuning
# base
stage: SFT
fine_tuning: full
seed: 23
do_train: true
do_eval: true
per_device_eval_batch_size: 1
per_device_train_batch_size: 1
num_train_epochs: 1
max_steps: -1
eval_steps: 100
evaluation_strategy: steps
save_steps: 100
save_total_limit: 1
save_strategy: steps
logging_steps: 1
gradient_accumulation_steps: 4
logging_dir: /workspace/pretrainning/vdl_log
output_dir: /workspace/pretrainning/checkpoints/aixcoder-7b-base-pd-converted_sft_ckpts
disable_tqdm: true
eval_accumulation_steps: 16

# train
warmup_steps: 20
learning_rate: 1.0e-5

# performance
tensor_parallel_degree: 1
pipeline_parallel_degree: 1
sharding: stage2
recompute: true
bf16: true
fp16_opt_level: O2
unified_checkpoint: true
54 changes: 54 additions & 0 deletions examples/aiXcoder-7B/full_tp_pp.yaml
@@ -0,0 +1,54 @@
### data
train_dataset_type: erniekit
eval_dataset_type: erniekit
train_dataset_path: /workspace/pretrainning/data/pt/train_sft.jsonl
train_dataset_prob: "1.0"
eval_dataset_path: /workspace/pretrainning/data/pt/eval_sft.jsonl
eval_dataset_prob: "1.0"
max_seq_len: 1024
num_samples_each_epoch: 100
packing: true
mix_strategy: concat

### model
model_name_or_path: /workspace/aixcoder-7b-base-pd-converted
convert_from_hf: false
save_to_hf: false
attn_impl: flashmask

### finetuning
# base
stage: SFT
fine_tuning: full
seed: 23
do_train: true
do_eval: true
per_device_eval_batch_size: 1
per_device_train_batch_size: 1
num_train_epochs: 1
max_steps: -1
eval_steps: 100
evaluation_strategy: steps
save_steps: 100
save_total_limit: 1
save_strategy: steps
logging_steps: 1
gradient_accumulation_steps: 4
logging_dir: /workspace/pretrainning/vdl_log
output_dir: /workspace/pretrainning/checkpoints/aixcoder-7b-base-pd-converted_sft_ckpts_parallel
disable_tqdm: true
eval_accumulation_steps: 16

# train
warmup_steps: 20
learning_rate: 1.0e-5

# performance
tensor_parallel_degree: 8
pipeline_parallel_degree: 1
sequence_parallel: true
sharding: stage1
recompute: true
bf16: true
fp16_opt_level: O2
unified_checkpoint: true
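The two configs above differ mainly in how they split the devices, which changes the effective batch size even though `per_device_train_batch_size` and `gradient_accumulation_steps` are identical. The sketch below works through that arithmetic under two assumptions that are not stated in the diff: a single node with 8 devices (`WORLD_SIZE` is hypothetical), and the common convention that the ranks left over after tensor and pipeline parallelism act as data-parallel (or sharding) replicas. The trainer's actual accounting may differ, so treat the numbers as an illustration rather than a description of what the training run does.

```python
# Hypothetical device count; the diff does not say how many devices are used.
WORLD_SIZE = 8


def data_parallel_degree(world_size: int, tp: int, pp: int) -> int:
    """Ranks left for data parallelism/sharding once TP and PP groups are formed."""
    model_parallel = tp * pp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp={model_parallel}"
        )
    return world_size // model_parallel


def global_batch_size(per_device: int, grad_accum: int, dp: int) -> int:
    """Samples consumed per optimizer step across all data-parallel replicas."""
    return per_device * grad_accum * dp


# Values copied from the two YAML files in this diff.
configs = {
    "full.yaml": {
        "tensor_parallel_degree": 1,
        "pipeline_parallel_degree": 1,
        "per_device_train_batch_size": 1,
        "gradient_accumulation_steps": 4,
    },
    "full_tp_pp.yaml": {
        "tensor_parallel_degree": 8,
        "pipeline_parallel_degree": 1,
        "per_device_train_batch_size": 1,
        "gradient_accumulation_steps": 4,
    },
}

summary = {}
for name, cfg in configs.items():
    dp = data_parallel_degree(
        WORLD_SIZE,
        cfg["tensor_parallel_degree"],
        cfg["pipeline_parallel_degree"],
    )
    gbs = global_batch_size(
        cfg["per_device_train_batch_size"],
        cfg["gradient_accumulation_steps"],
        dp,
    )
    summary[name] = (dp, gbs)
    print(f"{name}: data-parallel degree={dp}, global batch size={gbs}")
```

Under these assumptions, `full_tp_pp.yaml` spends all 8 devices on tensor parallelism and leaves a single replica, so its per-step batch is much smaller than that of `full.yaml`. If the two configs are meant to be comparable, `gradient_accumulation_steps` or the learning rate may need adjusting.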
140 changes: 140 additions & 0 deletions paddleformers/transformers/aixcoder/LICENSE
Review comment (Collaborator): The LICENSE file can be deleted.

@@ -0,0 +1,140 @@
AIXCODER-7B MODEL LICENSE AGREEMENT

aiXcoder-7B Version Release Date: 2024

"Agreement" means the terms and conditions for use, reproduction, distribution and
modification of the aiXcoder Materials set forth herein.

"Documentation" means the specifications, manuals and documentation
accompanying aiXcoder-7B distributed by aiXcoder at
https://huggingface.co/aiXcoder/aixcoder-7b-base.

"Licensee" or "you" means you, or your employer or any other person or entity (if
you are entering into this Agreement on such person or entity's behalf), of the age
required under applicable laws, rules or regulations to provide legal consent and that
has legal authority to bind your employer or such other person or entity if you are
entering in this Agreement on their behalf.

"aiXcoder-7B" means the foundational large language models and software and
algorithms, including machine-learning model code, trained model weights,
inference-enabling code, training-enabling code, fine-tuning enabling code and other
elements of the foregoing distributed by aiXcoder at
https://huggingface.co/aiXcoder/aixcoder-7b-base.

"aiXcoder Materials" means, collectively, aiXcoder's proprietary aiXcoder-7B and
Documentation (and any portion thereof) made available under this Agreement.

"aiXcoder" or "we" means aiXcoder and its affiliates.

By using or distributing any portion or element of the aiXcoder Materials,
you agree to be bound by this Agreement.

1. License Rights and Redistribution.

a. Grant of Rights for Academic Research Use. You are granted a non-exclusive,
worldwide, non-transferable and royalty-free limited license under aiXcoder's
intellectual property or other rights owned by aiXcoder embodied in the aiXcoder
Materials to use, reproduce, distribute, copy, create derivative works of, and make
modifications to the aiXcoder Materials solely for academic research purposes.

b. Commercial Use. For commercial use of the aiXcoder Materials, you must
apply for a commercial license by sending an email to [email protected].
Commercial use without explicit written permission from aiXcoder is prohibited.

c. Redistribution and Use.

i. If you distribute or make the aiXcoder Materials, or any derivative works
thereof, available to a third party, you shall provide a copy of this Agreement to such
third party.

ii. You must retain in all copies of the aiXcoder Materials that you
distribute the following attribution notice within a "Notice" text file distributed as a
part of such copies: "aiXcoder-7B is licensed under the aiXcoder Model License,
Copyright (c) aiXcoder. All Rights Reserved."

iii. Your use of the aiXcoder Materials must comply with applicable laws
and regulations (including trade compliance laws and regulations).

iv. You will not use the aiXcoder Materials or any output or results of the
aiXcoder Materials to improve any other large language model (excluding aiXcoder-7B
or derivative works thereof) without explicit permission.

2. Restrictions.

You will not, and will not permit, assist or cause any third party to:

a. use, modify, copy, reproduce, create derivative works of, or distribute the
aiXcoder Materials (or any derivative works thereof, works incorporating the aiXcoder
Materials, or any data produced by the Software), in whole or in part, for (i) any
commercial or production purposes without proper license, (ii) military purposes or in
the service of nuclear technology, (iii) purposes of surveillance, including any research
or development relating to surveillance, (iv) biometric processing without proper consent,
(v) in any manner that infringes, misappropriates, or otherwise violates any third-party
rights, or (vi) in any manner that violates any applicable law;

b. alter or remove copyright and other proprietary notices which appear on or in
the aiXcoder Materials;

c. utilize any equipment, device, software, or other means to circumvent or remove
any security or protection used by aiXcoder in connection with the Software, or to
circumvent or remove any usage restrictions, or to enable functionality disabled by
aiXcoder.

3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE
AIXCODER MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE
PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY
WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR
FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE
FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING
THE AIXCODER MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR
USE OF THE AIXCODER MATERIALS AND ANY OUTPUT AND RESULTS.

4. Limitation of Liability. IN NO EVENT WILL AIXCODER OR ITS AFFILIATES BE
LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT,
NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS
AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL,
CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN
IF AIXCODER OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF
ANY OF THE FOREGOING.

5. Intellectual Property.

a. No trademark licenses are granted under this Agreement, and in
connection with the aiXcoder Materials, neither aiXcoder nor Licensee may use any name
or mark owned by or associated with the other or any of its affiliates, except as
required for reasonable and customary use in describing and redistributing the
aiXcoder Materials.

b. Subject to aiXcoder's ownership of aiXcoder Materials and derivatives made by or
for aiXcoder, with respect to any derivative works and modifications of the aiXcoder
Materials that are made by you, as between you and aiXcoder, you are and will be the
owner of such derivative works and modifications.

c. If you institute litigation or other proceedings against aiXcoder or any entity
(including a cross-claim or counterclaim in a lawsuit) alleging that the aiXcoder
Materials or aiXcoder-7B outputs or results, or any portion of any of the foregoing,
constitutes infringement of intellectual property or other rights owned or licensable
by you, then any licenses granted to you under this Agreement shall terminate as of
the date such litigation or claim is filed or instituted. You will indemnify and hold
harmless aiXcoder from and against any claim by any third party arising out of or related
to your use or distribution of the aiXcoder Materials.

6. Term and Termination. The term of this Agreement will commence upon your
acceptance of this Agreement or access to the aiXcoder Materials and will continue in
full force and effect until terminated in accordance with the terms and conditions
herein. aiXcoder may terminate this Agreement if you are in breach of any term or
condition of this Agreement. Upon termination of this Agreement, you shall delete
and cease use of the aiXcoder Materials. Sections 3, 4, 5 and 7 shall survive the
termination of this Agreement.

7. Governing Law and Jurisdiction. This Agreement will be governed and
construed under the laws of the People's Republic of China without regard to choice of
law principles. The courts of China shall have jurisdiction of any dispute arising out of
this Agreement.

8. Contact Information. For commercial licensing inquiries or any questions regarding
this Agreement, please contact: [email protected]

9. Acknowledgments. We would like to thank all contributors to the open-source
projects and datasets that made this work possible.