Commit ae92263

docs(text-splitters): troubleshooting when chunk_overlap seems not to apply
1 parent 28728dc commit ae92263

1 file changed: +11 -0 lines changed

libs/text-splitters/README.md

Lines changed: 11 additions & 0 deletions
@@ -21,6 +21,17 @@ LangChain Text Splitters contains utilities for splitting into chunks a wide var
For full documentation, see the [API reference](https://reference.langchain.com/python/langchain).

## 🛠️ Troubleshooting: `chunk_overlap` seems not to apply
- After header-based splitting (e.g., with `MarkdownHeaderTextSplitter`), pass the resulting documents to the second-stage splitter's **`split_documents(docs)`** (not `split_text`) so that overlap is applied **within each section** and per-section metadata (headers) is preserved on the chunks.
- Overlap appears only when a **single input section** exceeds `chunk_size` and is split into multiple chunks.
- Overlap **does not cross** section/document boundaries (e.g., from a `# H1` section into a `## H2` section).
- If the header becomes a tiny first chunk, there's nothing meaningful to overlap. Consider `strip_headers=True` in `MarkdownHeaderTextSplitter`, or reduce the number of `separators` so the section forms a longer segment.
- If your text lacks newlines/spaces, keep a fallback `""` in `separators` so the splitter can still split and apply overlap.
> Looking for examples and API details? See the [Text Splitters how-to](https://python.langchain.com/docs/how_to/#text-splitters) and the [API reference](https://python.langchain.com/api_reference/text_splitters/index.html).
## 📕 Releases & Versioning

See our [Releases](https://docs.langchain.com/oss/python/release-policy) and [Versioning](https://docs.langchain.com/oss/python/versioning) policies.
