
Implementation in a loop clogs up memory #6

Open
molokanov50 opened this issue Feb 9, 2023 · 1 comment

Comments


molokanov50 commented Feb 9, 2023

I need to determine the grammatical case of terms in the texts of a large dataset. I found that memory usage grows by roughly 0.3–0.7 MB on virtually every call of
`forms = predictor.predict(terms)`.
Consider a simple example:

```python
import re  # used to split the morphological tag into its features

def findCase(termNumber, text):  # find the case of the term with the given index in the text
    terms = text.split()
    forms = predictor.predict(terms)
    myTag = forms[termNumber].tag  # e.g. 'Case=Nom|Gender=Masc|Number=Sing'
    parts = re.split(r'\|', myTag)
    for part in parts:
        subparts = re.split('=', part)
        if len(subparts) < 2:
            continue
        if subparts[0] == 'Case':
            return subparts[1].upper()
    return 'UNDEF'
```
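For reference, the tag-parsing step can be exercised on its own, without loading the model. The sketch below isolates that logic; the sample tag strings are hypothetical values in the `Key=Value|Key=Value` format, not actual model output:

```python
# Standalone sketch of the tag-parsing logic above (no predictor needed).
def case_from_tag(tag):
    # Tags look like 'Case=Nom|Gender=Masc|Number=Sing' (example, not real output).
    for part in tag.split('|'):
        subparts = part.split('=')
        if len(subparts) < 2:
            continue
        if subparts[0] == 'Case':
            return subparts[1].upper()
    return 'UNDEF'

print(case_from_tag('Case=Nom|Gender=Masc|Number=Sing'))  # NOM
print(case_from_tag('POS=VERB'))                          # UNDEF
```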

Then, given a collection of texts, I can run:

```python
myDict = {}
for i in range(len(texts)):
    case = findCase(0, texts[i])
    myDict[i] = case
```

I have 12,500 texts with an average length of about 700 characters each. Running my whole dataset required an extra 1.5 GB of memory due to the `predictor.predict(terms)` calls.
It seems my local variable `forms` remains in memory after the function returns — or is your RNNMorphPredictor model perhaps accumulating state across calls in this scenario? How can I free this memory?
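A minimal mitigation sketch, assuming the leak is unreachable Python objects accumulating between calls: process the texts in chunks and trigger garbage collection between chunks. `process_texts`, `chunk_size=500`, and the stand-in case function are my own assumptions for illustration, not part of the library:

```python
import gc

def process_texts(texts, find_case, chunk_size=500):
    """Process texts in chunks, forcing garbage collection between chunks.

    find_case is the findCase function above, passed in so this sketch is
    self-contained; chunk_size=500 is an arbitrary assumption.
    """
    results = {}
    for start in range(0, len(texts), chunk_size):
        for i, text in enumerate(texts[start:start + chunk_size], start):
            results[i] = find_case(0, text)
        gc.collect()  # ask Python to reclaim unreachable objects now
        # If the predictor is Keras-backed, clearing the backend session here
        # could also free graph memory, but it would invalidate the loaded
        # model, which would then need re-creating (untested assumption).
    return results

# Usage with a stand-in case function (hypothetical, for illustration):
cases = process_texts(['a b', 'c d'], lambda n, t: 'NOM')
print(cases)  # {0: 'NOM', 1: 'NOM'}
```

If `gc.collect()` does not help, the growth is likely inside the underlying framework rather than in Python-level objects.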

molokanov50 (Author) commented:

Update: the length of each individual text makes no obvious difference. I reduced the input texts to 10 tokens (roughly 80 characters) each, and memory usage is the same: 1.5 GB per 12,500 texts. This makes my question even more pressing.
