Programming-from-A-to-Z/Data-for-Fine-Tuning

Fine-Tuning Dataset Formatter

This repository contains a script that converts a CSV file into a JSON Lines (JSONL) dataset for fine-tuning language models. The dataset is formatted to be compatible with OpenAI's fine-tuning specification; the same conversational structure is widely used and can be adapted to other platforms, such as AutoTrain on Hugging Face.

Example CSV

hot take
Hotdogs are not sandwiches they are tacos.
Tuesday is worse than Monday.
Butterflies are gross, centipedes with wings.

Output Format

Each line of the output file (output.jsonl) is a JSON object representing one conversation: a system message, a user prompt, and an assistant response, stored in that order under the messages key.
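
The mapping from one CSV row to one output line can be sketched roughly as follows. This is a minimal illustration and not necessarily how the repository's script is written: the function names `row_to_record` and `csv_to_jsonl` are hypothetical, the two prompts are copied from the example record in this README, and the column name `hot take` is taken from the example CSV.

```python
import csv
import json

# Assumption: fixed prompts, copied from the example record in this README.
SYSTEM_PROMPT = "You are a hot take generator."
USER_PROMPT = "Give me a hot take."


def row_to_record(hot_take):
    """Wrap one hot take in the three-message conversation structure."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": USER_PROMPT},
            {"role": "assistant", "content": hot_take.strip()},
        ]
    }


def csv_to_jsonl(csv_path, jsonl_path, column="hot take"):
    """Write one JSON object per non-empty CSV row and return the row count."""
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            text = (row.get(column) or "").strip()
            if not text:
                continue  # skip rows with no hot take
            # json.dumps emits a single line, so each record stays on its own line
            dst.write(json.dumps(row_to_record(text), ensure_ascii=False) + "\n")
            count += 1
    return count
```

Calling `csv_to_jsonl("input.csv", "output.jsonl")` would write one conversation per row (`input.csv` is a placeholder file name). Note that a hot take containing a comma needs to be wrapped in double quotes in the CSV, otherwise the reader splits it into two fields and only the first part is kept.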

Example JSONL

{"messages": [{"role": "system", "content": "You are a hot take generator."}, {"role": "user", "content": "Give me a hot take."}, {"role": "assistant", "content": "Pineapple belongs on every pizza, and it's the superior topping."}]}
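
Before uploading a generated file, it can help to confirm that every line has the shape shown above. The following is a small sketch of such a check; `validate_line` and `validate_file` are hypothetical helper names, and the check is deliberately stricter than most platforms require, since it accepts only the exact system, user, assistant sequence used in this README.

```python
import json

# Assumption: every record has exactly these three roles, in this order.
EXPECTED_ROLES = ["system", "user", "assistant"]


def validate_line(line):
    """Return None if the line is a well-formed record, else an error string."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"
    if not isinstance(record, dict):
        return "record is not a JSON object"
    messages = record.get("messages")
    if not isinstance(messages, list):
        return "missing 'messages' list"
    if not all(isinstance(m, dict) for m in messages):
        return "every message must be a JSON object"
    roles = [m.get("role") for m in messages]
    if roles != EXPECTED_ROLES:
        return f"unexpected roles: {roles}"
    for m in messages:
        content = m.get("content")
        if not isinstance(content, str) or not content.strip():
            return f"empty content for role {m['role']!r}"
    return None


def validate_file(path):
    """Return (line_number, error) pairs; an empty list means every line passed."""
    errors = []
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # blank lines are ignored rather than reported
            error = validate_line(line)
            if error is not None:
                errors.append((number, error))
    return errors
```

Running `validate_file("output.jsonl")` returns an empty list when all records match the expected structure.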
