Load_dataset text "sample_by=paragraph" not working, only returns 1 item #7152

Luca-Jank · 2024-09-17T12:54:37Z


Hello!
I have a text file on Windows. I already ran a script to insert a "\n" at certain points in the file.
It is now in the format:
"
AParagraphHere
Line1OfParagraph
Line2OfParagraph
...

DifferentParagraph now
Line2...
"
When I call load_dataset("text", data_files={"train": ["my_file.txt"]}, sample_by="paragraph"), only one item is returned: the text of the entire file, with no separation into paragraphs:

DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 1
    })
})

When I leave out "sample_by='paragraph'", it returns one item for every line in the file.

How can I make it return the actual paragraphs?
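
A possible workaround, sketched under assumptions: the cause is not confirmed in this thread, but if the blank lines between paragraphs are not an exact "\n\n" (for example because of Windows "\r\n" line endings or stray spaces on the "empty" line), the file can be split by hand before building the dataset. The helper name split_paragraphs is hypothetical and only uses the Python standard library; it is not part of the datasets API.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Split text into paragraphs separated by one or more blank lines.

    Hypothetical helper: tolerates Windows line endings and blank
    lines that contain only spaces or tabs.
    """
    # Normalize Windows (\r\n) and bare \r line endings to \n.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # A separator is a newline followed by one or more whitespace-only lines.
    chunks = re.split(r"\n(?:[ \t]*\n)+", normalized)
    # Drop empty chunks and trim surrounding whitespace.
    return [chunk.strip() for chunk in chunks if chunk.strip()]


# Sample mirroring the file layout from the post, with CRLF endings
# and a blank line that contains a stray space (both are assumptions).
sample = (
    "AParagraphHere\r\nLine1OfParagraph\r\nLine2OfParagraph\r\n"
    " \r\n"
    "DifferentParagraph now\r\nLine2...\r\n"
)
paragraphs = split_paragraphs(sample)
```

If this yields the expected paragraphs for the real file, the list could then be wrapped with Dataset.from_dict({"text": paragraphs}) from the datasets library instead of relying on sample_by="paragraph".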
