Commit 3772e47

Fixing typos in core documentation (#61)
1 parent 0776768 commit 3772e47

4 files changed: +4 −4 lines changed

tinker_cookbook/recipes/chat_sl/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -37,5 +37,5 @@ Performance can be further improved by training longer with a higher `lora_rank`
 
 The base classes in [tinker_cookbook/supervised/data.py](../../supervised/data.py) support loading new data in the following way:
 - `SupervisedDatasetFromHFDataset` loads dataset on huggingface hub with a postprocessing function
-- `StreamingSupervisedDatasetFromHFDataset` works simiarly, but supports streaming
+- `StreamingSupervisedDatasetFromHFDataset` works similarly, but supports streaming
 - `FromConversationFileBuilder` supports data loading from a JSONL file
```

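To make the JSONL path concrete, here is a minimal sketch of loading one conversation per line; the `messages` key and the role/content message layout are assumptions about the file format, not the schema documented for `FromConversationFileBuilder`.

```python
import json

# Hypothetical sketch: the exact JSONL schema that FromConversationFileBuilder
# expects is an assumption here. We assume one conversation per line, stored
# under a "messages" key as a list of {"role": ..., "content": ...} dicts.
def load_conversations(path: str) -> list[list[dict]]:
    conversations = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            conversations.append(record["messages"])
    return conversations
```
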
tinker_cookbook/recipes/distillation/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -8,7 +8,7 @@ Specifically, we provide the scripts needed to reproduce our experiments from th
 
 ## Distillation for reasoning
 
-Our results can be reproducing by running:
+Our results can be reproduced by running:
 1. Supervised finetuning on [OpenThoughts3](https://huggingface.co/datasets/open-thoughts/OpenThoughts3-1.2M)
 2. On-policy distillation on [DeepMath](https://huggingface.co/datasets/zwhe99/DeepMath-103K)
```

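As a rough illustration of step 2, here is a minimal sketch of the per-token reverse-KL objective that on-policy distillation typically minimizes on student-sampled tokens; that the recipe uses exactly this loss is an assumption, and `reverse_kl` is a hypothetical helper, not part of tinker_cookbook.

```python
import math

# Minimal sketch of a per-token on-policy distillation objective: reverse KL
# between the student's and teacher's next-token distributions, evaluated on
# tokens the student itself sampled. That the recipe's loss matches this
# exactly is an assumption based on the general technique.
def reverse_kl(student_logprobs: list[float], teacher_logprobs: list[float]) -> float:
    """KL(student || teacher) at one position, given full-vocab log-probs."""
    return sum(
        math.exp(s) * (s - t)
        for s, t in zip(student_logprobs, teacher_logprobs)
    )
```
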
tinker_cookbook/recipes/math_rl/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,4 +1,4 @@
-# Using Reinforcement Learning to Solve Math Prolems
+# Using Reinforcement Learning to Solve Math Problems
 
 Math problems have been the most active testbed for RL with LLMs. This recipe collects environments and grading functions that allows you to test on several popular math datasets.
```

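For intuition, a grading function of the kind this recipe collects might look like the following sketch; `grade_answer` and the `\boxed{}` answer convention are illustrative assumptions, not the recipe's actual grader.

```python
import re

# Hypothetical grader: exact string match on the last \boxed{...} answer in
# the completion. Real graders are presumably more careful (e.g. about
# mathematically equivalent forms); this only illustrates the interface of
# mapping (completion, reference) to a scalar reward.
def grade_answer(completion: str, reference: str) -> float:
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    if not matches:
        return 0.0  # no final answer to grade
    return 1.0 if matches[-1].strip() == reference.strip() else 0.0
```
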
tinker_cookbook/recipes/preference/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 # Learning from Preferences
 
-Many applications involve learnin from preferences beyond scalar rewards. We provide a few examples here:
+Many applications involve learning from preferences beyond scalar rewards. We provide a few examples here:
 
 1. [Shorter](./shorter/): we introduce the `PairwisePreferenceRLDatasetBuilder` abstraction and walk through a simple example that trains a model to generate shorter responses.
 2. [RLHF](./rlhf/): we walk through the standard RLHF pipeline from [1, 2]. This pipeline involves three stages: supervised fine-tuning, reward model learning, and reinforcement learning.
```

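As a pointer to what the reward-model stage optimizes, here is a sketch of the standard Bradley-Terry pairwise loss; `pairwise_preference_loss` is a hypothetical name, and this is the textbook objective rather than code taken from the recipe.

```python
import math

# Sketch of the Bradley-Terry loss used to fit a reward model to pairwise
# preferences (the second stage of the RLHF pipeline above). This is the
# textbook objective, not code from the recipe.
def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """-log sigmoid(r_chosen - r_rejected): small when chosen wins by a margin."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) = log(1 + exp(-m)), written via log1p
    return math.log1p(math.exp(-margin))
```
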