"Short" form of token shape #6561
kinghuang started this conversation in New Features & Project Ideas
Replies: 1 comment
I see your point, but isn't this easily achievable with some of the code you've already written, accessing the results through a custom token attribute?
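A minimal sketch of what this reply suggests, assuming spaCy's custom extension mechanism (`Token.set_extension` with a `getter`). The names `short_shape` and `register_short_shape` are hypothetical and not part of spaCy. The spaCy import is kept inside the registration function so the string helper can be tried on its own.

```python
import re

# Matches any character followed by one or more repeats of itself.
_RUN = re.compile(r"(.)\1+")


def short_shape(shape: str) -> str:
    """Collapse every run of a repeated character in a shape string to one character."""
    return _RUN.sub(r"\1", shape)


def register_short_shape() -> None:
    """Expose the helper as token._.short_shape (hypothetical name; needs spaCy installed)."""
    from spacy.tokens import Token

    if not Token.has_extension("short_shape"):
        # The getter post-processes the built-in shape_ string on access.
        Token.set_extension(
            "short_shape", getter=lambda token: short_shape(token.shape_)
        )
```

After calling `register_short_shape()`, each token's collapsed shape should be readable as `token._.short_shape`, so scanning backwards from `Product` for tokens with the same value becomes a plain string comparison.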
The Token shape attribute shows the orthographic features of the token's string. Sequences of the same characters are truncated after length 4.
Sometimes I am more interested in something that generalizes the shape of the text even further. For example, say I have a text containing `My Great Product`.

Given the token for `Product`, I want to scan previous tokens that share the same basic shape. The current shapes for `My Great Product` are `Xx Xxxxx Xxxxx`. I can truncate the shapes further by collapsing each run of a repeated shape character down to a single character. This truncates the shapes of those tokens to `Xx Xx Xx`, making it possible to correlate the shapes of the three words.

I would be interested in a Token attribute that acts like the current `shape`, but truncates sequences of the same character after length 1 instead of length 4. Something like a `short_shape`/`short_shape_` attribute.

| `shape_` | `short_shape_` |
| --- | --- |
| `Xxxxx` | `Xx` |
| `Xx` | `Xx` |
| `d,ddd.dd` | `d,d.d` |
| `dddd.dd` | `d.d` |
| `d.dddd` | `d.d` |
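To make the difference between the two run limits concrete, here is a rough pure-Python sketch of the shape rule as described in this post, with the run limit as a parameter. It is an approximation written for illustration, not spaCy's own implementation; the function name `shape_with_limit` and the sample inputs are assumptions.

```python
def shape_with_limit(text: str, max_run: int = 4) -> str:
    """Map letters to X/x and digits to d, keeping at most max_run of a repeated shape character."""
    out = []
    last = ""
    run = 0
    for ch in text:
        if ch.isalpha():
            shape_char = "X" if ch.isupper() else "x"
        elif ch.isdigit():
            shape_char = "d"
        else:
            # Punctuation and other characters are kept as they are.
            shape_char = ch
        # Count how long the current run of identical shape characters is.
        run = run + 1 if shape_char == last else 1
        last = shape_char
        if run <= max_run:
            out.append(shape_char)
    return "".join(out)


# Hypothetical inputs: limit 4 mimics the current shape, limit 1 the proposed short shape.
for sample in ["My", "Great", "Product", "1,234.56"]:
    print(sample, shape_with_limit(sample), shape_with_limit(sample, max_run=1))
```

With a limit of 1 every run collapses to one character, which is why `My`, `Great` and `Product` all end up with the same value.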