This repository contains a script that converts a CSV file into a JSON Lines (JSONL) dataset for fine-tuning language models. The dataset is formatted to be compatible with OpenAI's fine-tuning specification, but the same structure could be applied to other platforms such as AutoTrain on Hugging Face.
hot take
Hotdogs are not sandwiches they are tacos.
Tuesday is worse than Monday.
Butterflies are gross, centipedes with wings.
The output JSONL file (output.jsonl) is formatted with the typical structure of a conversation. Each line in the output file is a JSON object that represents a conversation sequence involving a system message, a user prompt, and an assistant response.
{"messages": [{"role": "system", "content": "You are a hot take generator."}, {"role": "user", "content": "Give me a hot take."}, {"role": "assistant", "content": "Pineapple belongs on every pizza, and it's the superior topping."}]}
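The conversion described above can be sketched roughly as follows. This is a minimal illustration, not necessarily how the repository's script is written: the function names (`build_record`, `read_takes`, `convert`) and the input file name `input.csv` are hypothetical, and the sketch assumes a single-column CSV with a header row, like the sample shown earlier. The system and user prompts are copied from the example output line.

```python
import csv
import json

# Prompts copied from the example output line above.
SYSTEM_PROMPT = "You are a hot take generator."
USER_PROMPT = "Give me a hot take."


def build_record(take):
    """Wrap one hot take in the system/user/assistant message structure."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": USER_PROMPT},
            {"role": "assistant", "content": take},
        ]
    }


def read_takes(lines):
    """Yield one take per CSV row, skipping the header and blank rows."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row ("hot take" in the sample)
    for row in reader:
        # Assumption: the file has a single column. Rejoining the fields
        # keeps a take intact when it contains an unquoted comma.
        take = ",".join(row).strip()
        if take:
            yield take


def convert(csv_path="input.csv", jsonl_path="output.jsonl"):
    """Write one JSON object per line, one line per CSV row."""
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for take in read_takes(src):
            dst.write(json.dumps(build_record(take)) + "\n")
```

Calling `convert()` with the path to your own CSV would then produce a file with one conversation per line. If the CSV has more than one column, the rejoining step in `read_takes` would need to be replaced with a lookup of the relevant column.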