🚧 Colour based tagging & non-ocr page_to_text #111
Open
seanmcguire12 wants to merge 48 commits into main from APE-76
…r to tagging. Added functionality for combining the OCR annotations of the original/raw page and the tagged page.
…to be compatible with supertype ITarsier.
…ase tagging from page_to_image() to page_to_text(), added method combine_annotations() to decouple the sorting logic from page_to_text().
✨ Added functionality for taking screenshot of original/raw page prior to tagging. Added functionality for combining the OCR annotations of the original/raw page and the tagged page.
Updated APE-76 with double tagging fix.
h4q2uwr0z0sVFM0q5AV7n
|
1JWoJWs3uZMt8Wa5ql6pr
|
aLnmhAeCwsHCd3dM53rwG |
…n so that it doesn't return elements with bounding boxes that occupy no space (i.e., elements that do not actually appear on the page).
…formatting issues. Fixed issue where elements with tagged children were getting rendered twice & creating collisions.
…the next screenshot.
- Added getNextColors function to generate a diverse list of RGB colors based on the number of elements.
- Added colorDistance function to calculate the Euclidean distance between two colors.
- Added assignColors function to assign colors to elements, ensuring maximum color distance.
- Improvements to colourBasedTagify:
  - Tag and collect elements with bounding boxes > 0 and within the viewport.
  - Apply colors to element borders & set opacity to 1
  - Handle special cases for links
  - Set visibility of non-tagged child elements to hidden.
- Added function for disabling/enabling transitions and animation to be used before taking screenshots
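For readers unfamiliar with the idea, the colour helpers named in this commit can be sketched roughly as follows. This is a minimal Python illustration, not the code in `tarsier/tag_utils.ts`: the snake_case names, the grid-based candidate set, and the greedy farthest-point selection are all assumptions made for the example.

```python
import math
from itertools import product


def color_distance(a, b):
    """Euclidean distance between two RGB colours given as (r, g, b) tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def get_next_colors(n):
    """Pick n well-separated RGB colours (assumed greedy farthest-point strategy)."""
    if n <= 0:
        return []
    # Smallest grid resolution whose RGB cube holds at least n candidates.
    steps = 2
    while steps ** 3 < n:
        steps += 1
    levels = [round(i * 255 / (steps - 1)) for i in range(steps)]
    candidates = list(product(levels, repeat=3))
    chosen = [candidates[0]]  # arbitrary starting colour (black)
    while len(chosen) < n:
        # Take the candidate whose nearest already-chosen colour is farthest away.
        best = max(
            candidates,
            key=lambda c: min(color_distance(c, p) for p in chosen),
        )
        chosen.append(best)
    return chosen


def assign_colors(element_ids):
    """Map each (hypothetical) element id to its own colour."""
    return dict(zip(element_ids, get_next_colors(len(element_ids))))
```

Keeping the minimum pairwise distance large is what should make a tagged element's border colour easier to tell apart from its neighbours when the screenshot is scanned afterwards.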
- Added boundingBoxX and boundingBoxY attributes in ColouredElem to track element positions
- Updated `page_to_text` to handle recoloring of undetected elements:
  - Added functions to disable and enable transitions/animations during screenshots.
  - Added recoloring logic to improve element detection.
  - Ensured missing elements are recolored and re-checked for visibility.
- Added new _check_colours_ method to compare colors within a threshold to be used on the first pass.
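The threshold comparison mentioned in the last bullet might look something like the sketch below. It is an illustration only: `colours_match`, `find_missing`, the default threshold of 20, and the dictionary layout are hypothetical and are not taken from `tarsier/core.py`.

```python
import math


def colours_match(expected, observed, threshold=20.0):
    """True if two RGB colours lie within `threshold` of each other (Euclidean)."""
    return math.dist(expected, observed) <= threshold


def find_missing(expected_by_id, observed_colours, threshold=20.0):
    """Return ids whose assigned colour matches no colour seen in the screenshot."""
    return [
        elem_id
        for elem_id, colour in expected_by_id.items()
        if not any(colours_match(colour, seen, threshold) for seen in observed_colours)
    ]
```

Under these assumptions, the ids returned by `find_missing` would be the elements to recolour and re-check on a later pass. A tolerance is useful because anti-aliasing and image compression can shift the rendered colour slightly away from the assigned one.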
# Conflicts: # poetry.lock # tarsier/core.py # tarsier/tag_utils.ts
# Conflicts: # tarsier-snapshots/snapshots/05W3ZEmj8pbuYSHArYUkz/ocr.txt # tarsier-snapshots/snapshots/07wOwFaw3aGekjCBpZkg0/ocr.txt # tarsier-snapshots/snapshots/0fdyKSMbc3kVUgL9RGiEk/ocr.txt # tarsier-snapshots/snapshots/0orFfEesEVpe1BN7B114a/ocr.txt # tarsier-snapshots/snapshots/11u8vZX9JHQsOrXVSWfJd/ocr.txt # tarsier-snapshots/snapshots/1JWoJWs3uZMt8Wa5ql6pr/ocr.txt # tarsier-snapshots/snapshots/1N0FTiHE53vO1j0nHDNG5/ocr.txt # tarsier-snapshots/snapshots/1qkOHewUy0Kqq9RVVSOoQ/ocr.txt # tarsier-snapshots/snapshots/1z50b4syZzf7J1kQt2k7W/ocr.txt # tarsier-snapshots/snapshots/24SLE3KnDhtOYYgIM4ote/ocr.txt # tarsier-snapshots/snapshots/2ErcEyBkupKnoHkQAhJCk/ocr.txt # tarsier-snapshots/snapshots/2HauD8zfTdDq75G7WRJzB/ocr.txt # tarsier-snapshots/snapshots/2tmwuvYVJ9KqVgHIOgctI/ocr.txt # tarsier-snapshots/snapshots/3MAIydQKH2qHnl1cuLmNc/ocr.txt # tarsier-snapshots/snapshots/3vDNIHFXtjcnarvJQWdd7/ocr.txt # tarsier-snapshots/snapshots/47D2wwbE0WZOV6obQbYA7/ocr.txt # tarsier-snapshots/snapshots/484mHWaGAH0l8tgW95Hvv/ocr.txt # tarsier-snapshots/snapshots/4Hmgj9cuidpeiVpWdXVBf/ocr.txt # tarsier-snapshots/snapshots/4Je6qSd4YFoyLxVZLQRb7/ocr.txt # tarsier-snapshots/snapshots/4KGjHFZbEpB345rOxuIzv/ocr.txt # tarsier-snapshots/snapshots/Ey2q7uEroarG84e6YZnym/ocr.txt # tarsier-snapshots/snapshots/F5AaImEw3SHGkneXd36eH/ocr.txt # tarsier-snapshots/snapshots/FIECXMTasC96yBFr7BcN1/ocr.txt # tarsier-snapshots/snapshots/FSfE85pVbn96ntVl1qEGp/ocr.txt # tarsier-snapshots/snapshots/FeRDLVQyg3Y1l62axB6az/ocr.txt # tarsier-snapshots/snapshots/FmpHbDna6mnBNe0hLyTYZ/ocr.txt # tarsier-snapshots/snapshots/Fw6hoBmn7nm2KAy4YDzv9/ocr.txt # tarsier-snapshots/snapshots/G9Xy74ZxrdukPaChWTWAo/ocr.txt # tarsier-snapshots/snapshots/GAeRa1QK7BcoGKelpEOA9/ocr.txt # tarsier-snapshots/snapshots/GNekmizdgssA6t94zWOId/ocr.txt # tarsier-snapshots/snapshots/GQfYTjppPhTgYtsuFUbXF/ocr.txt # tarsier-snapshots/snapshots/GcW0Q862yCbKr28CQTg2c/ocr.txt # tarsier-snapshots/snapshots/GuFznteaPUy3yrETrOh4Y/ocr.txt # 
tarsier-snapshots/snapshots/HCrPjvyx0XaLvNHxVBPZt/ocr.txt # tarsier-snapshots/snapshots/HixdQTqLbSa6zIaKmxxE1/ocr.txt # tarsier-snapshots/snapshots/HleEA9DcP1jBVN5cBEmFT/ocr.txt # tarsier-snapshots/snapshots/Hramb0PgtU7wHEj0D5OKj/ocr.txt # tarsier-snapshots/snapshots/I8Bj6okah8nrfPEinahWH/ocr.txt # tarsier-snapshots/snapshots/IUnyfHVheJUrv8frQYQib/ocr.txt # tarsier-snapshots/snapshots/JAiFFb1qWlEVk48Ny32ND/ocr.txt # tarsier-snapshots/snapshots/JNOSAEEZO4j2unWHPFBdO/ocr.txt # tarsier-snapshots/snapshots/JaVENaBu8Iu7yYoUNrORW/ocr.txt # tarsier-snapshots/snapshots/JwUW9qdzk0NgtnK2Y2BSS/ocr.txt # tarsier-snapshots/snapshots/Jxv57Kbqw1AP4qv1zvlqg/ocr.txt # tarsier-snapshots/snapshots/K88O7OW0FJoCfdVUD4xXH/ocr.txt # tarsier-snapshots/snapshots/KGEFdtgwltNKXKHGOkkaF/ocr.txt # tarsier-snapshots/snapshots/KNyomEvINtDSbA7cKRr1F/ocr.txt # tarsier-snapshots/snapshots/KTcoPSidqLGESp29nQtLv/ocr.txt # tarsier-snapshots/snapshots/KuDD2GuMDlbuKO4ozdbDA/ocr.txt # tarsier-snapshots/snapshots/KypCMQmDQ2XZ2GIbMKacI/ocr.txt # tarsier-snapshots/snapshots/L3uXGoAVL6YpRHGBCnlB8/ocr.txt # tarsier-snapshots/snapshots/L6BOPpJEJhhN5JHfWj4g1/ocr.txt # tarsier-snapshots/snapshots/LN4K9AZwaPC50Z4e513su/ocr.txt # tarsier-snapshots/snapshots/LNMVWWtQRcjkj54ONLebI/ocr.txt # tarsier-snapshots/snapshots/LOnORRBp7zDQifntNAcFO/ocr.txt # tarsier-snapshots/snapshots/LuM2bHYg5mnBvjhttTDlh/ocr.txt # tarsier-snapshots/snapshots/Ly1DY8GL7cV5mWnxr1DH5/ocr.txt # tarsier-snapshots/snapshots/MIZDQx8G6Gn562lO5hFQb/ocr.txt # tarsier-snapshots/snapshots/MP4p6ibb3PLD3i8AmBrZ3/ocr.txt # tarsier-snapshots/snapshots/MQOPSYI3SU7EEQRbHUMHr/ocr.txt # tarsier-snapshots/snapshots/MQrMR8W7oJtlUc056qZ6L/ocr.txt # tarsier-snapshots/snapshots/MRD347sMiS2vlw091LAqK/ocr.txt # tarsier-snapshots/snapshots/NHWkSmdwXKQb9oe9vVGZf/ocr.txt # tarsier-snapshots/snapshots/NJouZuI4JTRsMz3KYK1cV/ocr.txt # tarsier-snapshots/snapshots/NLtUSUexaGqmRUBomWj9R/ocr.txt # tarsier-snapshots/snapshots/NSVMR9p35Pku7LUyPCMHY/ocr.txt # 
tarsier-snapshots/snapshots/NUkrUYwOJuYfv5SC3GHTE/ocr.txt # tarsier-snapshots/snapshots/NV6JL1wEHaTPuK65dKt6t/ocr.txt # tarsier-snapshots/snapshots/NZoqFzLNm1OJsS96Pyxbi/ocr.txt # tarsier-snapshots/snapshots/O3kSfBi6P0CQBJTCmjV7B/ocr.txt # tarsier-snapshots/snapshots/O3t7Of3CTP2WUj71YddFO/ocr.txt # tarsier-snapshots/snapshots/OWLWiq0ePIJmx5VmtquOD/ocr.txt # tarsier-snapshots/snapshots/Ofe0weKbJ9yl5vEwkalCS/ocr.txt # tarsier-snapshots/snapshots/OlYrsJi04Czdu7Uvl1mIF/ocr.txt # tarsier-snapshots/snapshots/OmJeRJARVmguS9uMWU1Xb/ocr.txt # tarsier-snapshots/snapshots/P7dY0WRzR4PCfWZNuSeBf/ocr.txt # tarsier-snapshots/snapshots/PiQlpch5uQzWNXiEEvjX3/ocr.txt # tarsier-snapshots/snapshots/PthtpZsDczvCOCFYIogKI/ocr.txt # tarsier-snapshots/snapshots/PzN7n57ArAxzcJHzx63NY/ocr.txt # tarsier-snapshots/snapshots/QIOkg628A7yzKluLVB8of/ocr.txt # tarsier-snapshots/snapshots/QJ1O4XyX7e3CpAPQ3Bonw/ocr.txt # tarsier-snapshots/snapshots/QOZFTfvesXGZxgsmHqnrL/ocr.txt # tarsier-snapshots/snapshots/QWwSzGV7QMgJprOY5cOpP/ocr.txt # tarsier-snapshots/snapshots/Ql2B37FdNugeJ09WjopGa/ocr.txt # tarsier-snapshots/snapshots/QlAWMyjvSxPHh4E5Fkjfs/ocr.txt # tarsier-snapshots/snapshots/QuUpyX6Z5U2HUUQWJV3S4/ocr.txt # tarsier-snapshots/snapshots/QwiRD9fjb4YuRaY3Ypz3f/ocr.txt # tarsier-snapshots/snapshots/QxSSau0T34NCk6O1bq4Cd/ocr.txt # tarsier-snapshots/snapshots/R99SMT2jvCjJRqRGra2g6/ocr.txt # tarsier-snapshots/snapshots/RIqXLn8bSaFN0AG4DdoHO/ocr.txt # tarsier-snapshots/snapshots/RVotqLcMUyKXULUTqYCvm/ocr.txt # tarsier-snapshots/snapshots/RpjyEXqtmEQDFWgojBJMU/ocr.txt # tarsier-snapshots/snapshots/S8AKJlRl5F8Vci1UiLU1a/ocr.txt # tarsier-snapshots/snapshots/SEyENcYHqerkt0nmJZjl7/ocr.txt # tarsier-snapshots/snapshots/STPTr6OhlruneOtA24xi9/ocr.txt # tarsier-snapshots/snapshots/SjzTipa4JUYx4Ocn5VkCV/ocr.txt # tarsier-snapshots/snapshots/SlMfqkoK2KeAp31dHr88F/ocr.txt # tarsier-snapshots/snapshots/Sqb7SeHvAcouDW5rFl9yu/ocr.txt # tarsier-snapshots/snapshots/Std6TTbgilRTiLDGJOezx/ocr.txt # 
tarsier-snapshots/snapshots/T1pTeE6hYcFsaZ84no4GM/ocr.txt # tarsier-snapshots/snapshots/TG8dn0Xi3SJC0VHjWRH1P/ocr.txt # tarsier-snapshots/snapshots/TKUFwwdmB0ioMyUXvozpu/ocr.txt # tarsier-snapshots/snapshots/TLxVvFZ6MRB0nbSBWl8ym/ocr.txt # tarsier-snapshots/snapshots/TQyvtLuRcbSStSHq1seCq/ocr.txt # tarsier-snapshots/snapshots/U5wOXA13nV6xyogmib6uL/ocr.txt # tarsier-snapshots/snapshots/UEQ5bJeIeTst0YVL8ga9Z/ocr.txt # tarsier-snapshots/snapshots/UPCNbyQNGulQpM6v6sxUo/ocr.txt # tarsier-snapshots/snapshots/UjsF3B4ihFcZjXEcZCnm1/ocr.txt # tarsier-snapshots/snapshots/VPIrl5m9IfNLKS03UyzNH/ocr.txt # tarsier-snapshots/snapshots/Vba6zNQZmxgxA8byjpmaA/ocr.txt # tarsier-snapshots/snapshots/Vo8MreF9aVq5bE45XqaMz/ocr.txt # tarsier-snapshots/snapshots/VogIUZw1FJlCEiBzTUwYR/ocr.txt # tarsier-snapshots/snapshots/VqSaCh7ffPXKh1IymN8Oo/ocr.txt # tarsier-snapshots/snapshots/W8QTUDItaXJSOaBOZGAE8/ocr.txt # tarsier-snapshots/snapshots/WDGGGgqdb1RGaoGlseBJk/ocr.txt # tarsier-snapshots/snapshots/WEVQJfQEWky3KR7Hc2kuK/ocr.txt # tarsier-snapshots/snapshots/WyQg7esKNNds3EYMZCx2J/ocr.txt # tarsier-snapshots/snapshots/XSzc3ewTsGRYwwdHvb6LK/ocr.txt # tarsier-snapshots/snapshots/Xixe0WiedsLB1KFcKpv2r/ocr.txt # tarsier-snapshots/snapshots/Xnuxii49OIfjWntcihbjX/ocr.txt # tarsier-snapshots/snapshots/XsNkGYeq1DTAnyKuuvHPZ/ocr.txt # tarsier-snapshots/snapshots/Xu7Q49cgzMsp4cgMR0qqS/ocr.txt # tarsier-snapshots/snapshots/XxXTjDH2qRuu4n5BSLM5d/ocr.txt # tarsier-snapshots/snapshots/Yb4ug21SFYfiN4ENjJCcz/ocr.txt # tarsier-snapshots/snapshots/YuBInhOP8OdQAfy4Htvre/ocr.txt # tarsier-snapshots/snapshots/ZW0ihimOJEReeseRBrI5i/ocr.txt # tarsier-snapshots/snapshots/ZYBqV9WrmYmyFExthpKLD/ocr.txt # tarsier-snapshots/snapshots/a0pJxHhxIHFKcoFjkORnG/ocr.txt # tarsier-snapshots/snapshots/aLnmhAeCwsHCd3dM53rwG/ocr.txt # tarsier-snapshots/snapshots/aQZGYIDkaa6JY6aXv6wXQ/ocr.txt # tarsier-snapshots/snapshots/aa3t8r3kAlp9FYx2uSOFz/ocr.txt # tarsier-snapshots/snapshots/abgIXICPIttq3MhkmSVdV/ocr.txt # 
tarsier-snapshots/snapshots/ahEBAfuWtiZ8HM77W2d2D/ocr.txt # tarsier-snapshots/snapshots/aivDVkwH92hQdu5cDr4nv/ocr.txt # tarsier-snapshots/snapshots/apscD5vWHBV1dvAX6K7Vt/ocr.txt # tarsier-snapshots/snapshots/awL4PUmAj9TIIqR6L95fq/ocr.txt # tarsier-snapshots/snapshots/bOVNaNsrc6UrCdlhHLxGy/ocr.txt # tarsier-snapshots/snapshots/bOlARasPXtWAjEDfxtk2L/ocr.txt # tarsier-snapshots/snapshots/bZPREHVg723XRC2I6z9MQ/ocr.txt # tarsier-snapshots/snapshots/bwwko5J7aFk5K8qz61jBI/ocr.txt # tarsier-snapshots/snapshots/c3s1dYwKWMEJHKGyP3qnr/ocr.txt # tarsier-snapshots/snapshots/cAeniCN923UcmnXuOOIBJ/ocr.txt # tarsier-snapshots/snapshots/cFcnDQSGQgDeyHBnZtrU8/ocr.txt # tarsier-snapshots/snapshots/cMPCNSczVAPhdXJxBIBEd/ocr.txt # tarsier-snapshots/snapshots/cdFPVICHIa5evhnj1OiMx/ocr.txt # tarsier-snapshots/snapshots/cohMcyz81B0NHA04Qeik2/ocr.txt # tarsier-snapshots/snapshots/ct6PuXzujbOlM9zaARUpa/ocr.txt # tarsier-snapshots/snapshots/cv3sq0A9o3VHmD1UvEWse/ocr.txt # tarsier-snapshots/snapshots/e7iDpCvvfiq3oU1UAvxTC/ocr.txt # tarsier-snapshots/snapshots/eE46U0AMRoczeDL2eOcgf/ocr.txt # tarsier-snapshots/snapshots/eKKvQ3OZG6H0jjTIRINPs/ocr.txt # tarsier-snapshots/snapshots/eSG6HgfI2R9JpZQRozsSV/ocr.txt # tarsier-snapshots/snapshots/ecqQm32DLMtTUWt2AQxhm/ocr.txt # tarsier-snapshots/snapshots/f41Dz5iiwe5QjVbXqWpJJ/ocr.txt # tarsier-snapshots/snapshots/fJPQwUD42zT2WKhdBJLnN/ocr.txt # tarsier-snapshots/snapshots/fJWonTvHgvl7Ex9DdB1Px/ocr.txt # tarsier-snapshots/snapshots/gHXZyrqL7qpmKMFYM6oGE/ocr.txt # tarsier-snapshots/snapshots/gKfAQGripVAFa87dehr5m/ocr.txt # tarsier-snapshots/snapshots/gd2iNA5INcT66penKY175/ocr.txt # tarsier-snapshots/snapshots/gdtUqXUos3CdM6zVlMbbC/ocr.txt # tarsier-snapshots/snapshots/gg5AAaFekWGXPdKtYBoer/ocr.txt # tarsier-snapshots/snapshots/ggdDF9CwmrmiBHsQvZcDk/ocr.txt # tarsier-snapshots/snapshots/h4q2uwr0z0sVFM0q5AV7n/ocr.txt # tarsier-snapshots/snapshots/ijJbuKPqEOkA4OK0BzLPk/ocr.txt # tarsier-snapshots/snapshots/jCYLQBT1114BBW83zKQdt/ocr.txt # 
tarsier-snapshots/snapshots/jH56yUizuVbTYWAIwSJkM/ocr.txt # tarsier-snapshots/snapshots/k1I07SwT7Clry1xxPODfa/ocr.txt # tarsier-snapshots/snapshots/kZVEvHT3kuBfZtNUY8rC2/ocr.txt # tarsier-snapshots/snapshots/kbd8qO9tx1Efbf08MqZWQ/ocr.txt # tarsier-snapshots/snapshots/ke6newcCWvPhsxeZ5TCZ4/ocr.txt # tarsier-snapshots/snapshots/kfueRbnkKCdJwC0BRiggp/ocr.txt # tarsier-snapshots/snapshots/kvcH8Q2BG1SPgWSAN3f2h/ocr.txt # tarsier-snapshots/snapshots/kx3CBXYC9YUyRIFIMYTcD/ocr.txt # tarsier-snapshots/snapshots/l3mMTs6gZa1GvpGjknIFT/ocr.txt # tarsier-snapshots/snapshots/l8QvEOlveFkWUVYu1HNgD/ocr.txt # tarsier-snapshots/snapshots/lBTRjkiZqEdNvCSjTmoWG/ocr.txt # tarsier-snapshots/snapshots/lHjLewJTfQKFSAmGE5Wr1/ocr.txt # tarsier-snapshots/snapshots/lSwsaU5jAVRddpYTCsWEd/ocr.txt # tarsier-snapshots/snapshots/n1VHZA0AkvnKB3Qy2hqvB/ocr.txt # tarsier-snapshots/snapshots/n1zh09obI7c51LUTBNNBE/ocr.txt # tarsier-snapshots/snapshots/n28tTMFEZfIyMXsCxO6Ra/ocr.txt # tarsier-snapshots/snapshots/n7LTn5tVJ2B3IvDopFTFO/ocr.txt # tarsier-snapshots/snapshots/nAXVoJDSuul938vtPvfFB/ocr.txt # tarsier-snapshots/snapshots/nXWHr3UoycfzFqubWTUpn/ocr.txt # tarsier-snapshots/snapshots/njhgFq4h4BcMTdaRxtElY/ocr.txt # tarsier-snapshots/snapshots/nxkcxrThdmaRX01YRXtho/ocr.txt # tarsier-snapshots/snapshots/o28cv918RSdVcg2P55tGq/ocr.txt # tarsier-snapshots/snapshots/oBJMkbpRqNM02wNlOTP3N/ocr.txt # tarsier-snapshots/snapshots/oEAjw9fv6UXmS63CIzZlU/ocr.txt # tarsier-snapshots/snapshots/oaDAf9SeUsVwpDeKajNrs/ocr.txt # tarsier-snapshots/snapshots/ogRf0dLwJKiDJUQnzz4pn/ocr.txt # tarsier-snapshots/snapshots/pAObMNn95uFVSll7pCXpg/ocr.txt # tarsier-snapshots/snapshots/pNsTF6muOdSesbhNTFI9g/ocr.txt # tarsier-snapshots/snapshots/pXL6ojrOhW79o92e8IXw0/ocr.txt # tarsier-snapshots/snapshots/pk7eEZ2sweN4YzzFVK217/ocr.txt # tarsier-snapshots/snapshots/prf1dSczRpaoWLrEMseB1/ocr.txt # tarsier-snapshots/snapshots/q3jMY8P01UJCw3ggDs1OJ/ocr.txt # tarsier-snapshots/snapshots/q72iVxzE9cGatHU1cLKJX/ocr.txt # 
tarsier-snapshots/snapshots/qgEjcl77WINh8ltNc9NoC/ocr.txt # tarsier-snapshots/snapshots/qrWALKWSykHxTLuVy0Rl7/ocr.txt # tarsier-snapshots/snapshots/qtRibcsG6iq09TyGQoYhv/ocr.txt # tarsier-snapshots/snapshots/qyZjOcbaiHuVq4FpOB26b/ocr.txt # tarsier-snapshots/snapshots/rFp4CQs5ZxAebcIM0d62U/ocr.txt # tarsier-snapshots/snapshots/rGFdlkuftF7L1VlFL7LbS/ocr.txt # tarsier-snapshots/snapshots/rKCkTGVbx4Mpi0BAnKCRd/ocr.txt # tarsier-snapshots/snapshots/rZQpVHDs30D7WbTFIiXCr/ocr.txt # tarsier-snapshots/snapshots/ranUaEMdxbjMltYPt2AX7/ocr.txt # tarsier-snapshots/snapshots/rgCTp6HulNEsEqEupEUZN/ocr.txt # tarsier-snapshots/snapshots/rmMxc6dEoyE1WpLLWqTHV/ocr.txt # tarsier-snapshots/snapshots/t8biLN0RgFBPYO2hv2JYJ/ocr.txt # tarsier-snapshots/snapshots/tIowzAEvZcWH9ukP4Aofa/ocr.txt # tarsier-snapshots/snapshots/tV4VsHCiYAA3o6oKYyXVk/ocr.txt # tarsier-snapshots/snapshots/tVBOUnrTSDIHQbsMw2WgS/ocr.txt # tarsier-snapshots/snapshots/tbRxihP0jtq5O12zVhvEF/ocr.txt # tarsier-snapshots/snapshots/token_statistics.txt # tarsier-snapshots/snapshots/u2IEvb9Ke4lKLaD4LtJYE/ocr.txt # tarsier-snapshots/snapshots/u3fjwZRjKUEcvr8kkmy5v/ocr.txt # tarsier-snapshots/snapshots/u7I1P6OC5xX8f3u8Fwjvf/ocr.txt # tarsier-snapshots/snapshots/uOmbtFqUSqItS8CKmyi51/ocr.txt # tarsier-snapshots/snapshots/uPrnCohCwLCrVvwN8eXWZ/ocr.txt # tarsier-snapshots/snapshots/uibGV6FB4gcYvY93AIWJe/ocr.txt # tarsier-snapshots/snapshots/v7hgryy94evdLb0aHzDtY/ocr.txt # tarsier-snapshots/snapshots/vELUj6wGf96coJAqt0x5D/ocr.txt # tarsier-snapshots/snapshots/vVJc0PFcYOzKHHL1v1hev/ocr.txt # tarsier-snapshots/snapshots/vgTQTZN0Efl4vXQ0I9Iy8/ocr.txt # tarsier-snapshots/snapshots/wUHnayH90bjRjjjdCT0r2/ocr.txt # tarsier-snapshots/snapshots/wXhQ0YobLZ4z1BAZesBUF/ocr.txt # tarsier-snapshots/snapshots/wjmMahVNX7T1jH9GmVW9r/ocr.txt # tarsier-snapshots/snapshots/wqGtmRYz4PWe4LCxAW4UI/ocr.txt # tarsier-snapshots/snapshots/x9tCDlr2WOazDKVrF3njD/ocr.txt # tarsier-snapshots/snapshots/xCHAOXtOYz47HfNY9LeZq/ocr.txt # 
tarsier-snapshots/snapshots/xZCsA0eNaR7OMmhcBlsOv/ocr.txt # tarsier-snapshots/snapshots/xgnNjPdOMUY0LZ1GJdEsE/ocr.txt # tarsier-snapshots/snapshots/xh7zxFmYI3du3PWBnEjQ4/ocr.txt # tarsier-snapshots/snapshots/xkEtVvkl3HDnC827Flk3g/ocr.txt # tarsier-snapshots/snapshots/xkINPY1INO91Jv5ZokNGu/ocr.txt # tarsier-snapshots/snapshots/yXLMF4nocYqJnql2dt71R/ocr.txt # tarsier-snapshots/snapshots/yoqTH08pW464eBIPYwd5r/ocr.txt # tarsier-snapshots/snapshots/yzwuXotaBr52CyG4mUDhy/ocr.txt # tarsier-snapshots/snapshots/zKVOGYYHXR3uskE0WcG1A/ocr.txt # tarsier-snapshots/snapshots/zPfbTSTbZ3sOGYDiqwyj0/ocr.txt # tarsier-snapshots/snapshots/zRdqy27hn5RdNqJqnjzaA/ocr.txt # tarsier-snapshots/tarsier_snapshots/snapshots.py
pyproject.toml
Outdated
@@ -1,6 +1,6 @@
[tool.poetry]
name = "tarsier"
-version = "0.6.3"
+version = "0.6.39"
why 9?
Comment on lines +9 to +12
cd ./tarsier-snapshots || exit 1
poetry install
poetry run bananalyze --download
not really relevant for setting up tarsier. Would delete
# Conflicts: # poetry.lock # tarsier-snapshots/tarsier_snapshots/snapshots.py
…eturn element bounding box
…not just unique ones
…eturn value to match that of original page_to_text
# Conflicts: # poetry.lock # tarsier/core.py # tarsier/tag_utils.ts
…, colour images, capture image alt text
WIP: implemented colour based tagging & page_to_text_new, which doesn't use OCR.