Can't pass transformer pipeline to RollingWindowSplitter function #30

TatvaJoshi · 2024-12-31T21:14:46Z

from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForFeatureExtraction
from semantic_router.splitters import RollingWindowSplitter

model_path = "/apps/data/gte-base/onnx"
model = ORTModelForFeatureExtraction.from_pretrained(model_path, file_name="model.onnx")
tokenizer = AutoTokenizer.from_pretrained(model_path)

embedder = pipeline("feature-extraction", model=model, tokenizer=tokenizer)

splitter = RollingWindowSplitter(
    encoder=embedder,
    dynamic_threshold=True,
    min_split_tokens=75,
    max_split_tokens=365,
    window_size=3,  # Adjust based on your requirements
    plot_splits=False,  # Set to True to visualize chunking
    enable_statistics=False,  # Set to True to print chunking stats
)

text = "Your long text document goes here."

chunks = splitter(text)

for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}: {chunk}")

This returns the following error:

Traceback (most recent call last):
  File "/apps/data/test.py", line 14, in <module>
    splitter = RollingWindowSplitter(
  File "/apps/data/.venv/lib64/python3.9/site-packages/semantic_router/splitters/rolling_window.py", line 60, in __init__
    super().__init__(name=name, encoder=encoder)
  File "/apps/data/.venv/lib64/python3.9/site-packages/pydantic/v1/main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for RollingWindowSplitter
encoder
  value is not a valid dict (type=type_error.dict)
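
A possible reading of the error: pydantic v1 reports `value is not a valid dict` when a field declared as a model type receives some other kind of object, so `encoder` presumably expects one of semantic_router's own encoder objects rather than a raw `pipeline`. Any wrapper would also have to turn the pipeline's per-token output into one vector per document. The sketch below shows only that pooling step, assuming the pipeline returns a nested list shaped `[1][n_tokens][hidden_dim]` for a single string and that mean pooling is acceptable. `mean_pool`, `embed_docs`, and `fake_pipeline` are hypothetical names for illustration, not part of semantic_router or transformers.

```python
from typing import Callable, List


def mean_pool(token_vectors: List[List[float]]) -> List[float]:
    """Average per-token vectors into one fixed-size document vector."""
    n_tokens = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(tok[i] for tok in token_vectors) / n_tokens for i in range(dim)]


def embed_docs(pipe: Callable[[str], list], docs: List[str]) -> List[List[float]]:
    """Call a feature-extraction style callable per document and pool the result.

    Assumption: pipe(text) returns a nested list shaped [1][n_tokens][hidden_dim].
    """
    return [mean_pool(pipe(doc)[0]) for doc in docs]


def fake_pipeline(text: str) -> list:
    """Hypothetical stand-in for the real pipeline so the sketch runs without a model."""
    return [[[float(len(word)), 1.0] for word in text.split()]]


vectors = embed_docs(fake_pipeline, ["ab cdef", "xyz"])
print(vectors)
```

With the real `embedder` in place of `fake_pipeline`, `embed_docs` would yield one list of floats per input string, which is the shape an encoder wrapper would typically need to return.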
