@ma-ilsi ma-ilsi commented Nov 9, 2024

Certain characters can end up identified as single words in parsed Arabic corpus data, such as "ـ" (U+0640, ARABIC TATWEEL). If such a "word" is passed to the SinaTools utilities, it may go through arStrip and then the remove_punctuation function. Because the argument passed to text at that point is '', remove_punctuation raises UnboundLocalError:

UnboundLocalError: cannot access local variable 'output_string' where it is not associated with a value

A simple fix is to move the initialization of output_string to just above the try block. This seems sensible, since a fail-silently approach works well for this function: the returned value is left as a blank string that was only processed by arStrip.
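To illustrate the pattern, here is a minimal sketch. It is not the SinaTools source: both functions, the `ValueError` on empty input, and the use of `string.punctuation` are hypothetical stand-ins chosen only to reproduce the same failure mode, where a variable assigned only inside a `try` block is returned after the block fails silently.

```python
import string


def remove_punctuation_buggy(text):
    # Hypothetical stand-in, NOT the real SinaTools implementation.
    try:
        if not text:
            # Assumption: something in the try block fails on '' before
            # output_string is ever assigned.
            raise ValueError("empty input")
        output_string = "".join(ch for ch in text if ch not in string.punctuation)
    except ValueError:
        pass  # fail silently
    # output_string was never bound if the try block raised early.
    return output_string


def remove_punctuation_fixed(text):
    # The fix: bind output_string before the try block so that a silent
    # failure still returns a value (here, the input unchanged).
    output_string = text
    try:
        if not text:
            raise ValueError("empty input")
        output_string = "".join(ch for ch in text if ch not in string.punctuation)
    except ValueError:
        pass  # fail silently
    return output_string
```

With this sketch, `remove_punctuation_buggy("")` raises `UnboundLocalError`, while `remove_punctuation_fixed("")` returns the empty string it was given.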

The change prevents `UnboundLocalError` from being raised when certain strings found in Arabic corpora, such as "ـ" (U+0640), are passed to the chained function calls: remove_punctuation(arStrip()).
@ma-ilsi ma-ilsi changed the title Variable output_String initialized earlier Variable output_string initialized earlier Nov 9, 2024