Text highlighting using char_span #13598
Unanswered
DWHowes
asked this question in
Help: Coding & Implementations
Replies: 1 comment
I've figured out the problem. When the text files were originally saved, they were encoded as UTF-8-BOM. Changing the encoding to UTF-8 fixed the problem I describe above.
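If re-saving the files is not practical, the BOM can also be dropped at read time. The following is a minimal sketch, assuming the files are read with Python's standard library; the helper name read_text_without_bom and the sample file are made up for illustration and are not from the original post.

```python
import codecs
import tempfile
from pathlib import Path


def read_text_without_bom(path):
    """Read a text file as UTF-8, dropping a leading BOM if one is present."""
    # "utf-8-sig" decodes both plain UTF-8 and UTF-8-BOM files; it only
    # strips the BOM when the file actually starts with one.
    return Path(path).read_text(encoding="utf-8-sig")


# Demonstration with a temporary file saved as UTF-8-BOM (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes(codecs.BOM_UTF8 + "An outline".encode("utf-8"))

    # Plain "utf-8" keeps the BOM as an invisible first character (U+FEFF).
    with_bom = sample.read_text(encoding="utf-8")
    # "utf-8-sig" removes it, so character offsets start at the real text.
    without_bom = read_text_without_bom(sample)
```

With plain "utf-8" the decoded string is one character longer than the visible text, which would be consistent with an offset that appears only on the first line of each file.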
I am loading one or more text files (as specified by flist) into a list (data_list). Each entry in the list is a dictionary composed of a spaCy Doc object and a currently empty Label list.
File Access
The list data_list is initialized as an empty list in the init method of the class.
In my app, I can click through each entry in the list, displaying the text of the entry in my edit window (a QTextEdit widget).
In the displayed text, I wish to partially highlight a span of tokens and have the highlighted text snap to the beginning of the first partially selected token and the end of the last one. This is the code I'm using.
This code is called from my event filter, when I catch the left mouse button release.
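As a rough illustration of the snapping behaviour described above (not the code from the original post), the sketch below expands a character selection to the boundaries of every token it touches. It is plain Python so that it runs without spaCy: the whitespace tokenizer, the sample sentence, and the function name snap_to_tokens are all assumptions made for the example. With a spaCy Doc, the (start, end) pairs would come from each token's idx and length, and Doc.char_span with alignment_mode="expand" should give the same kind of result directly.

```python
def snap_to_tokens(token_spans, sel_start, sel_end):
    """Expand a character selection to cover every token it touches.

    token_spans: sorted list of (start_char, end_char) pairs, end exclusive.
    Returns (start_char, end_char), or None if no token is touched.
    """
    touched = [
        (start, end)
        for start, end in token_spans
        if start < sel_end and end > sel_start  # token overlaps the selection
    ]
    if not touched:
        return None
    # Snap to the start of the first touched token and the end of the last.
    return touched[0][0], touched[-1][1]


# Hypothetical sample sentence, not taken from the attached file.
text = "Human societies coevolve with their environments."

# Hypothetical whitespace tokenizer standing in for spaCy's tokens.
token_spans = []
pos = 0
for word in text.split():
    start = text.index(word, pos)
    token_spans.append((start, start + len(word)))
    pos = start + len(word)

# The selection text[7:20] is "ocieties coev", which cuts two tokens in half.
snapped = snap_to_tokens(token_spans, 7, 20)
snapped_text = text[snapped[0]:snapped[1]]
```

The selection offsets and the token offsets must refer to the same string. If the widget's text and the Doc's text differ by even one leading character, every snapped span on that stretch of text will be shifted.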
The highlighting is managed correctly on every line of the file except the first one. For the first line, there is an offset in the start_char returned from char_span. This behaviour manifests for the first line of each file that has been merged into data_list.
For example, using the following text (first line from the file I've attached).
I've eliminated a file read error, as I've read in the file two different ways (using readline and a csv reader) with the same results. If there is a non-displaying character starting the text file, I haven't been able to find it (I've attached one of the text files).
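One way to look for a non-displaying leading character is to inspect the raw bytes and the escaped form of the first decoded character, rather than the rendered text. This is a minimal sketch using only the standard library; the byte string below is a made-up stand-in for the start of a file, and starts_with_utf8_bom is a hypothetical helper name.

```python
import codecs


def starts_with_utf8_bom(raw: bytes) -> bool:
    """Return True if the raw file bytes begin with the UTF-8 BOM (EF BB BF)."""
    return raw.startswith(codecs.BOM_UTF8)


# Hypothetical file content: a UTF-8 BOM followed by ordinary text.
# For a real file, read it in binary mode, e.g. open(path, "rb").read().
raw = codecs.BOM_UTF8 + b"An outline of human socioenvironmental coevolution"

has_bom = starts_with_utf8_bom(raw)

# Decoding as plain UTF-8 keeps the BOM as the character U+FEFF.
decoded = raw.decode("utf-8")

# ascii() shows invisible characters as escape sequences.
first_char_escaped = ascii(decoded[0])

# The visible text starts one character later than it appears to.
visible_start = decoded.index("An")
```

Reading the same file with readline or a csv reader gives identical results because both decode the bytes the same way; the extra character only shows up when the string is escaped or the bytes are examined directly.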
If I'm not handling the edge case correctly, I'm not sure what I'm doing wrong. Any help or ideas will be greatly appreciated.
an_outline_of_human_socioenvironmental_coevolution.txt