
Implementation in a loop clogs up memory #6

Open
molokanov50 opened this issue Feb 9, 2023 · 1 comment

Comments


molokanov50 commented Feb 9, 2023

I need to determine the grammatical case of terms in the texts of a large dataset. I found that memory usage grows by roughly 0.3–0.7 MB on virtually every call of
`forms = predictor.predict(terms)`.
Consider a simple example:

```python
import re  # used to split the morphological tag into its features

def findCase(termNumber, text):  # find the case of the term with the given index in the text
    terms = text.split()
    forms = predictor.predict(terms)
    myTag = forms[termNumber].tag  # e.g. 'Case=Nom|Gender=Masc|Number=Sing'
    parts = re.split(r'\|', myTag)
    for part in parts:
        subparts = re.split('=', part)
        if len(subparts) < 2:
            continue
        if subparts[0] == 'Case':
            return subparts[1].upper()
    return 'UNDEF'
```
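For reference, the tag-parsing step can be exercised on its own, without loading the model. The sketch below isolates that logic; the sample tag strings are hypothetical values in the `Key=Value|Key=Value` format, not actual model output:

```python
# Standalone sketch of the tag-parsing logic above (no predictor needed).
def case_from_tag(tag):
    # Tags look like 'Case=Nom|Gender=Masc|Number=Sing' (example, not real output).
    for part in tag.split('|'):
        subparts = part.split('=')
        if len(subparts) < 2:
            continue
        if subparts[0] == 'Case':
            return subparts[1].upper()
    return 'UNDEF'

print(case_from_tag('Case=Nom|Gender=Masc|Number=Sing'))  # NOM
print(case_from_tag('POS=VERB'))                          # UNDEF
```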

Then, given a collection of texts, I can run:

```python
myDict = {}
for i in range(len(texts)):
    case = findCase(0, texts[i])
    myDict[i] = case
```

I have 12,500 texts with an average length of about 700 characters each. Running my whole dataset required an extra 1.5 GB of memory due to the `predictor.predict(terms)` calls.
It seems my local variable `forms` remains in memory after the function returns — or is your RNNMorphPredictor model perhaps accumulating state across calls in this scenario? How can I free this memory?
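A minimal mitigation sketch, assuming the leak is unreachable Python objects accumulating between calls: process the texts in chunks and trigger garbage collection between chunks. `process_texts`, `chunk_size=500`, and the stand-in case function are my own assumptions for illustration, not part of the library:

```python
import gc

def process_texts(texts, find_case, chunk_size=500):
    """Process texts in chunks, forcing garbage collection between chunks.

    find_case is the findCase function above, passed in so this sketch is
    self-contained; chunk_size=500 is an arbitrary assumption.
    """
    results = {}
    for start in range(0, len(texts), chunk_size):
        for i, text in enumerate(texts[start:start + chunk_size], start):
            results[i] = find_case(0, text)
        gc.collect()  # ask Python to reclaim unreachable objects now
        # If the predictor is Keras-backed, clearing the backend session here
        # could also free graph memory, but it would invalidate the loaded
        # model, which would then need re-creating (untested assumption).
    return results

# Usage with a stand-in case function (hypothetical, for illustration):
cases = process_texts(['a b', 'c d'], lambda n, t: 'NOM')
print(cases)  # {0: 'NOM', 1: 'NOM'}
```

If `gc.collect()` does not help, the growth is likely inside the underlying framework rather than in Python-level objects.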

molokanov50 (Author) commented:

Update: the length of each individual text makes no obvious difference. I reduced the input texts to 10 tokens (roughly 80 characters) each, and memory usage is the same: 1.5 GB per 12,500 texts. This makes my question even more pressing.
