
Commit 53b7d68

fix refs not being extracted (#94)
1 parent 4cd493d commit 53b7d68

4 files changed

Lines changed: 1028 additions & 2366 deletions


grobid_client/format/TEI2LossyJSON.py

Lines changed: 1 addition & 1 deletion
@@ -877,7 +877,7 @@ def traverse_and_collect(node, current_pos=0):
 # The reference text was also cleaned, so we need to find it in the final cleaned text
 # We can search around the original position to find the correct occurrence
 search_start = max(0, ref['offset_start'] - 10)  # Look a bit before the original position
-search_end = min(len(final_text), ref['offset_start'] + 10)  # Look a bit after
+search_end = min(len(final_text), ref['offset_end'] + 10)  # Look a bit after
 search_area = final_text[search_start:search_end]

 # Find the reference in the search area
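
The effect of the one-line change can be illustrated with a small standalone sketch. This is not the code from TEI2LossyJSON.py: the helper `find_in_window`, its `margin` and `use_end` parameters, the sample sentence, and the use of `str.find` for the lookup are all assumptions made for illustration. It shows that when the window's upper bound is derived from `offset_start`, any reference longer than the 10-character margin is cut off and cannot be matched, whereas bounding it by `offset_end` keeps the whole reference inside the search area.

```python
def find_in_window(final_text, ref_text, offset_start, offset_end,
                   margin=10, use_end=True):
    """Hypothetical helper: look for ref_text near its original position.

    use_end=False mimics the old upper bound (offset_start + margin);
    use_end=True mimics the new one (offset_end + margin).
    """
    search_start = max(0, offset_start - margin)
    upper = offset_end if use_end else offset_start
    search_end = min(len(final_text), upper + margin)
    search_area = final_text[search_start:search_end]

    # Assumption: the real code may locate the reference differently.
    pos = search_area.find(ref_text)
    return None if pos == -1 else search_start + pos


# Made-up sample text with a 23-character reference marker.
final_text = "As shown in earlier work (Smith and Jones, 2020), results vary."
ref_text = "(Smith and Jones, 2020)"
start = final_text.index(ref_text)
end = start + len(ref_text)

old_result = find_in_window(final_text, ref_text, start, end, use_end=False)
new_result = find_in_window(final_text, ref_text, start, end, use_end=True)

print(old_result)  # None: the window ends 10 characters into the reference
print(new_result)  # position of the reference in final_text
```

With the old bound the window spans only 20 characters, so the 23-character reference can never fit inside it, which is consistent with the commit title's description of refs not being extracted.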
