
Fix dependency parsing ValueError on variable-length CoNLL-U output #5

Merged

wannaphong merged 6 commits into main from copilot/fix-valueerror-dependency-parsing
Feb 5, 2026


Copilot AI (Contributor) commented Feb 5, 2026

PyThaiNLP's dependency_parsing() returns more than 10 fields per token for certain inputs, causing "ValueError: too many values to unpack (expected 10)" when processing Thai text such as 'ฉันชอบสิ่งมีชีวิตเล็กๆที่เรียกว่าโลมา โลมาเก่งที่สุด!' ("I like small creatures called dolphins. Dolphins are the best!").

Changes

Modified spacy_pythainlp/core.py:

  • Unpack only the first 10 fields via the slice i[:10] instead of unpacking the whole tuple, so variable-length tuples are handled
  • Add validation that raises a clear error message when fewer than 10 fields (the CoNLL-U standard) are returned
# Before
idx, word, _, postag, _, _, head, dep, _, space = i

# After  
if len(i) < 10:
    raise ValueError(f"Expected at least 10 fields in dependency parsing output, got {len(i)}")
idx, word, _, postag, _, _, head, dep, _, space = i[:10]

This handles PyThaiNLP returning 11+ fields (additional metadata beyond standard CoNLL-U) while maintaining backward compatibility with exactly 10 fields.
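The behaviour of the old and new unpacking can be checked in isolation. The sketch below is a minimal, hypothetical reproduction that uses plain Python lists in place of real PyThaiNLP output; the row contents and the unpack_row helper name are illustrative assumptions, not the code in core.py.

```python
def unpack_row(row):
    """Return (idx, word, postag, head, dep, space) from one parsed token row.

    Hypothetical helper mirroring the patched logic: require at least the
    10 CoNLL-U columns, then unpack only the first 10 and ignore the rest.
    """
    if len(row) < 10:
        raise ValueError(
            f"Expected at least 10 fields in dependency parsing output, got {len(row)}"
        )
    idx, word, _, postag, _, _, head, dep, _, space = row[:10]
    return idx, word, postag, head, dep, space


# Invented rows: 10 standard columns, and the same row with one extra field.
standard = ["1", "ฉัน", "_", "PRON", "_", "_", "2", "nsubj", "_", "_"]
with_extra = standard + ["extra"]

# The old direct unpacking fails on the 11-field row.
try:
    idx, word, _, postag, _, _, head, dep, _, space = with_extra
except ValueError as exc:
    print(f"old unpacking fails: {exc}")

# The sliced version yields the same values for both rows.
print(unpack_row(standard) == unpack_row(with_extra))  # True
```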

Added Test Suite (tests/test_dependency_parsing.py):

  • 9 comprehensive test cases covering the dependency parsing fix
  • Tests for 10-, 11-, and 12-field CoNLL-U formats
  • Validation tests for insufficient fields (< 10)
  • POS tagging and dependency relation verification
  • Basic functionality tests for imports, pipeline addition, and tokenization

Added GitHub Actions Workflow (.github/workflows/test.yml):

  • Automated CI testing on pushes and pull requests
  • Tests against Python versions 3.8, 3.9, 3.10, 3.11, 3.12
  • Runs pytest with coverage reporting
  • Uploads coverage reports to Codecov

Testing

  • ✅ All 9 tests passing
  • ✅ 44% code coverage of spacy_pythainlp/core.py
  • ✅ Tests use mocking to simulate various CoNLL-U output formats
  • ✅ Automated testing will run on all future PRs
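A mocked test of this kind can be written with unittest.mock so that no real parser model is loaded. The sketch below is an illustration under stated assumptions, not the contents of tests/test_dependency_parsing.py: backend is a hypothetical stand-in object for the PyThaiNLP module, extract_tags is a hypothetical helper mirroring the patched loop, and the rows are invented.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for the module providing dependency_parsing();
# real tests would patch the name imported by spacy_pythainlp.core instead.
backend = SimpleNamespace(dependency_parsing=lambda text, **kwargs: [])


def extract_tags(text):
    """Collect words, POS tags, and dependency labels like the patched loop."""
    words, pos, deps = [], [], []
    for row in backend.dependency_parsing(text, tag="list"):
        if len(row) < 10:
            raise ValueError(
                f"Expected at least 10 fields in dependency parsing output, got {len(row)}"
            )
        idx, word, _, postag, _, _, head, dep, _, space = row[:10]
        words.append(word)
        pos.append(postag)
        deps.append(dep)
    return words, pos, deps


def make_rows(extra_fields):
    """Build two invented CoNLL-U rows, each padded with trailing extra values."""
    base = [
        ["1", "ฉัน", "_", "PRON", "_", "_", "2", "nsubj", "_", "_"],
        ["2", "ชอบ", "_", "VERB", "_", "_", "0", "root", "_", "_"],
    ]
    return [row + ["extra"] * extra_fields for row in base]


def check_field_count(extra_fields):
    """Patch the parser to return rows with 10 + extra_fields columns and verify."""
    rows = make_rows(extra_fields)
    with mock.patch.object(backend, "dependency_parsing", return_value=rows) as fake:
        words, pos, deps = extract_tags("ฉันชอบ")
    fake.assert_called_once_with("ฉันชอบ", tag="list")
    assert words == ["ฉัน", "ชอบ"]
    assert pos == ["PRON", "VERB"]
    assert deps == ["nsubj", "root"]


for extra in (0, 1, 2):  # 10-, 11-, and 12-field rows
    check_field_count(extra)
print("all field counts handled")
```

Patching the lookup target rather than calling a real engine keeps each test fast and lets it simulate output shapes that are hard to trigger with real text.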
Original prompt

This section details the original issue you should resolve

<issue_title>ValueError: Too many values to unpack in dependency parsing using spacy_pythainlp</issue_title>
<issue_description>Issue Description:

Code:

import spacy
import spacy_pythainlp.core

nlp = spacy.blank('th')
nlp.add_pipe(
    "pythainlp",
    config={
        "dependency_parsing": True,
        "dependency_parsing_engine": "esupar",
    }
)

text_to_make_error = 'ฉันชอบสิ่งมีชีวิตเล็กๆที่เรียกว่าโลมา โลมาเก่งที่สุด!'
doc = nlp(text_to_make_error)

Error Message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[31], line 14
      5 nlp.add_pipe(
      6    "pythainlp",
      7    config={
   (...)
     10    }
     11 )
     13 text_to_make_error = 'ฉันชอบสิ่งมีชีวิตเล็กๆที่เรียกว่าโลมา โลมาเก่งที่สุด!'
---> 14 doc = nlp(text_to_make_error)

File ~/.pyenv/versions/aspect-thai/lib/python3.11/site-packages/spacy/language.py:1031, in Language.__call__(self, text, disable, component_cfg)
   1029     raise ValueError(Errors.E109.format(name=name)) from e
   1030 except Exception as e:
-> 1031     error_handler(name, proc, [doc], e)
   1032 if not isinstance(doc, Doc):
   1033     raise ValueError(Errors.E005.format(name=name, returned_type=type(doc)))

File ~/.pyenv/versions/aspect-thai/lib/python3.11/site-packages/spacy/util.py:1670, in raise_error(proc_name, proc, docs, e)
   1669 def raise_error(proc_name, proc, docs, e):
-> 1670     raise e

File ~/.pyenv/versions/aspect-thai/lib/python3.11/site-packages/spacy/language.py:1026, in Language.__call__(self, text, disable, component_cfg)
   1024     error_handler = proc.get_error_handler()
   1025 try:
-> 1026     doc = proc(doc, **component_cfg.get(name, {}))  # type: ignore[call-arg]
   1027 except KeyError as e:
   1028     # This typically happens if a component is not initialized
   1029     raise ValueError(Errors.E109.format(name=name)) from e

File ~/.pyenv/versions/aspect-thai/lib/python3.11/site-packages/spacy_pythainlp/core.py:87, in PyThaiNLP.__call__(self, doc)
     85 def __call__(self, doc:Doc):
     86     if self.dependency_parsing:
---> 87         doc = self._dep(doc)
     88         self.on_tokenize = False
     89         self.on_sent = False

File ~/.pyenv/versions/aspect-thai/lib/python3.11/site-packages/spacy_pythainlp/core.py:167, in PyThaiNLP._dep(self, doc)
    165 _dep_temp = dependency_parsing(text, model=self.dependency_parsing_model, engine=self.dependency_parsing_engine, tag="list")
    166 for i in _dep_temp:
--> 167     idx, word, _, postag, _, _, head, dep, _, space =  i
    168     words.append(word)
    169     pos.append(postag)

ValueError: too many values to unpack (expected 10)

Description:
I am encountering a ValueError when using the spacy_pythainlp library for dependency parsing of Thai text. The code above performs dependency parsing on a Thai sentence, but it raises the error shown.

Possible Solution:
The error seems to be related to unpacking values. To address this issue, you can modify the line in your code that unpacks values from the variable i. The error message suggests that there are more values to unpack than expected.

Change this line:

idx, word, _, postag, _, _, head, dep, _, space =  i

to:

idx, word, *other, postag, _, _, head, dep, _, space = i

This change collects the excess elements into the other variable so that unpacking no longer raises an error. But this is just a quick fix; I'm wondering why this text returns more than 10 elements.

Additional Information:

  • I am using the spacy library with the spacy_pythainlp extension for Thai language processing.
  • The error occurs when the code attempts to process the Thai text provided.
  • The issue appears to be related to unpacking values during dependency parsing.</issue_description>
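The reporter's quick fix relies on Python's extended iterable unpacking (PEP 3132), where the starred target is written *other rather than other*. Where the star goes matters: names after a starred target are filled from the end of the sequence. The sketch below assumes, as the PR description suggests, that the extra values come after the 10 standard CoNLL-U columns; the rows are invented and the three helper names are hypothetical. It compares a mid-position star, a trailing star, and the [:10] slice the PR adopts.

```python
# Invented rows: the 10 CoNLL-U columns, and the same columns plus one
# trailing extra value (ASSUMED position; the real source of extras is unknown).
ten = ["1", "ฉัน", "_", "PRON", "_", "_", "2", "nsubj", "_", "_"]
eleven = ten + ["extra"]


def mid_star(row):
    # Reporter's placement (syntax fixed): the seven names after *other
    # are filled from the END of the row, so they shift on trailing extras.
    idx, word, *other, postag, _, _, head, dep, _, space = row
    return postag, dep


def trailing_star(row):
    # Star at the end: the first 10 names always match the CoNLL-U columns.
    idx, word, _, postag, _, _, head, dep, _, space, *rest = row
    return postag, dep


def sliced(row):
    # Approach merged in this PR: keep only the first 10 fields.
    idx, word, _, postag, _, _, head, dep, _, space = row[:10]
    return postag, dep


for name, fn in [("mid_star", mid_star), ("trailing_star", trailing_star), ("sliced", sliced)]:
    print(name, fn(ten), fn(eleven))
# mid_star ('PRON', 'nsubj') ('_', '_')
# trailing_star ('PRON', 'nsubj') ('PRON', 'nsubj')
# sliced ('PRON', 'nsubj') ('PRON', 'nsubj')
```

Under that assumption, a trailing star and the slice are equivalent; the slice was paired with an explicit length check so that short rows fail with a descriptive message.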

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 2 commits February 5, 2026 08:22
…d tuples

Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
wannaphong marked this pull request as ready for review February 5, 2026 08:25
wannaphong (Member) commented

@copilot Add testset and github action for testing

Copilot AI changed the title from "[WIP] Fix ValueError in dependency parsing with spacy_pythainlp" to "Fix dependency parsing ValueError on variable-length CoNLL-U output" Feb 5, 2026
Copilot AI requested a review from wannaphong February 5, 2026 08:26
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>
Copilot AI (Contributor, Author) commented Feb 5, 2026

@copilot Add testset and github action for testing

Added a comprehensive test suite and a GitHub Actions workflow in commit f947d9c.

Test Suite (tests/test_dependency_parsing.py):

  • 9 test cases covering the dependency parsing fix
  • Tests for 10-, 11-, and 12-field CoNLL-U formats
  • Validation tests for insufficient fields
  • POS tagging and dependency relation verification
  • Basic functionality tests

GitHub Actions (.github/workflows/test.yml):

  • Runs on push/PR to main branches
  • Tests Python 3.8-3.12
  • Includes pytest with coverage reporting
  • All 9 tests passing with 44% coverage of core.py

wannaphong merged commit 501c8fb into main Feb 5, 2026
7 checks passed

Development

Successfully merging this pull request may close these issues.

ValueError: Too many values to unpack in dependency parsing using spacy_pythainlp

2 participants