en_core_web_md and en_core_web_lg make Python crash silently on Windows #13708

Open
felixlennart opened this issue Dec 5, 2024 · 1 comment


felixlennart commented Dec 5, 2024

When I try to use the models en_core_web_md or en_core_web_lg, Python crashes silently.
en_core_web_sm, however, runs without any problems.
No Python exception is raised (I also tried running it under a debugger).
The crash does produce an entry in the Windows Event Log:

Faulting application name: python3.10.exe, version: 3.10.11150.1013, timestamp: 0x642cc42e
Faulting module name: cy.cp310-win_amd64.pyd, version: 0.0.0.0, timestamp: 0x66e36da3
Exception code: 0xc0000005
Fault offset: 0x000000000008e9ea
Faulting process ID: 0x5ed4
Faulting application start time: 0x01db4497b25c334c
Faulting application path: C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\python3.10.exe
Faulting module path: c:\Users\Felix\Documents\Entwicklung\usecase-scoring\clean_env\lib\site-packages\blis\cy.cp310-win_amd64.pyd
Report ID: 8dbf93b3-cb53-46c1-ba02-e51170db537a
Full package name: PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0
Application ID relative to faulting package: Python

So the crash seems to originate in the Windows build of blis, one of spaCy's dependencies. I have already tried different blis versions without success.
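Since the process dies without a Python exception, one way to narrow this down is to run the failing call in a child interpreter with the standard-library `faulthandler` enabled, which dumps the Python stack to stderr when native code faults. The following is only a sketch: the helper names `run_with_faulthandler` and `describe_exit` and the test string `'Create report'` are made up for illustration and are not part of spaCy or blis.

```python
import subprocess
import sys

# NTSTATUS value from the event log above (STATUS_ACCESS_VIOLATION).
ACCESS_VIOLATION = 0xC0000005


def run_with_faulthandler(code: str, timeout: float = 120.0) -> tuple[int, str]:
    """Run `code` in a fresh interpreter with faulthandler enabled.

    Returns (exit_code, stderr). If the child faults in native code,
    faulthandler writes the Python stack to stderr before the process dies.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Normalise to an unsigned 32-bit value so NTSTATUS codes compare cleanly.
    return proc.returncode & 0xFFFFFFFF, proc.stderr


def describe_exit(exit_code: int) -> str:
    """Turn an exit code into a short human-readable description."""
    if exit_code == 0:
        return "exited normally"
    if exit_code == ACCESS_VIOLATION:
        return "access violation (0xC0000005) in native code"
    return f"exited with code {exit_code:#x}"


if __name__ == "__main__":
    # Hypothetical one-liner; swap in the model and input that crash for you.
    snippet = (
        "import spacy; "
        "nlp = spacy.load('en_core_web_md'); "
        "nlp('Create report')"
    )
    exit_code, stderr = run_with_faulthandler(snippet)
    print(describe_exit(exit_code))
    print(stderr)
```

Running the crash in a child process keeps the parent alive, so the exit code and the stack dump can be pasted into the issue alongside the event log entry.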

How to reproduce the behaviour

I'm just running a basic check of whether a string contains both a noun and a verb. No error is reported:

    import re

    def _internalValidation(self, s: str) -> bool:
        # Put a space between camel-case word combinations
        s = re.sub(r'(?<=[a-z])(?=[A-Z])', ' ', s)

        # Run the string through the loaded spaCy pipeline (self.nlp)
        doc = self.nlp(s)

        # Collect the tokens of the processed document
        words = list(doc)

        # Invalid if there are fewer than 2 words
        if len(words) < 2:
            return False

        # Check whether there is a noun and a verb
        hasNoun = any(x.pos_ == 'NOUN' for x in words)
        hasVerb = any(x.pos_ == 'VERB' for x in words)

        return hasNoun and hasVerb
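The method above depends on a surrounding class that is not shown, so a standalone script may be easier for others to run. The version below is a minimal sketch of the same check: the function name `has_noun_and_verb`, the sample input `"createReport"`, and the command-line handling are assumptions added for illustration, and the model name is taken from the first argument so that the same script can be tried with en_core_web_sm, en_core_web_md, and en_core_web_lg.

```python
import re
import sys
from typing import Callable, Iterable


def has_noun_and_verb(nlp: Callable[[str], Iterable], s: str) -> bool:
    """Standalone version of the check; `nlp` is any spaCy-like pipeline."""
    # Put a space between camel-case word combinations.
    s = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", s)

    tokens = list(nlp(s))

    # Invalid if there are fewer than 2 tokens.
    if len(tokens) < 2:
        return False

    has_noun = any(t.pos_ == "NOUN" for t in tokens)
    has_verb = any(t.pos_ == "VERB" for t in tokens)
    return has_noun and has_verb


def main(model_name: str) -> None:
    # Imported here so the helper above stays usable without spaCy installed.
    import spacy

    nlp = spacy.load(model_name)
    # "createReport" is a made-up sample input, not taken from the report.
    print(model_name, has_noun_and_verb(nlp, "createReport"))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Saved under a name of your choice (for example `repro.py`), it would be invoked once per model, such as `python repro.py en_core_web_md`.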

Your Environment

Info about spaCy

  • spaCy version: 3.8.2
  • Platform: Windows-10-10.0.19045-SP0
  • Python version: 3.12.7
  • Pipelines: en_core_web_sm (3.8.0), en_core_web_sm (3.8.0), en_core_web_lg (3.8.0)
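The event log above names python3.10.exe and a cp310 build of the blis module, while this section lists Python 3.12.7, so it may be worth confirming which interpreter and which package versions are actually active in the crashing environment. The snippet below is a small sketch using only the standard library; the function name `environment_report` and the choice of packages (spaCy plus thinc, blis, and numpy, which spaCy pulls in) are assumptions made for illustration.

```python
import sys
from importlib import metadata


def environment_report(packages=("spacy", "thinc", "blis", "numpy")) -> dict:
    """Collect the active interpreter and installed versions of `packages`."""
    report = {
        "executable": sys.executable,
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it with the same interpreter that crashes (for instance from inside the activated virtual environment) shows whether the blis version being tested is the one that is really loaded.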

System specs:

  • AMD Ryzen 3700X
  • 32 GB RAM
  • NVIDIA GeForce RTX 3070

honnibal (Member) commented

Working on this, thanks.
