This is the Ruby implementation of the twitter-text parsing library. The library has methods to parse Tweets, calculate their length and validity, and extract @mentions, #hashtags, URLs, and more.
Installation uses bundler.
% gem install bundler
% bundle install
To run the Conformance test suite from the command line via rake:
% rake test:conformance:run
You can also run the rspec tests in the spec directory:
% rspec spec
twitter-text 2.0 introduces configuration files that define how Tweets are parsed for length. This allows for backwards compatibility and flexibility going forward. Old-style traditional 140-character parsing is defined by the v1.json configuration file, whereas v2.json is updated for "weighted" Tweets where ranges of Unicode code points can have independent weights aside from the default weight. The sum of all code points, each weighted appropriately, should not exceed the max weighted length.
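To make the weighting concrete, here is a minimal sketch of a weighted-length calculation. It is not the library's implementation: the `SKETCH_CONFIG` hash and the `weight_for` and `weighted_length` helpers are hypothetical names, the configuration values are assumptions that only loosely resemble v2.json, and URL handling, emoji handling, and normalization are left out.

```ruby
# Hypothetical configuration for illustration only. The values are assumed
# and loosely modeled on v2.json; consult the real file for actual ranges.
SKETCH_CONFIG = {
  max_weighted_length: 280,
  scale: 100,
  default_weight: 200,
  ranges: [
    { start: 0, end: 4351, weight: 100 }
  ]
}.freeze

# Look up the weight for one code point, falling back to the default weight.
def weight_for(code_point, config)
  range = config[:ranges].find { |r| code_point.between?(r[:start], r[:end]) }
  range ? range[:weight] : config[:default_weight]
end

# Sum the weight of every code point, then divide by the scale.
def weighted_length(text, config = SKETCH_CONFIG)
  scaled = text.each_codepoint.sum { |cp| weight_for(cp, config) }
  scaled / config[:scale]
end

puts weighted_length("hello")   # 5 code points at weight 100 => 5
puts weighted_length("日本語")   # 3 code points at weight 200 => 6
```

Under these assumed values, a Tweet made only of code points outside the configured range would reach the maximum after 140 characters, while one made only of code points inside the range could use all 280.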
Some old methods from twitter-text 1.0 have been marked deprecated, such as the tweet_length() method. The new API is based on the following method, parse_tweet():
def parse_tweet(text, options = {}) { ... }
This method takes a string as input and returns a results object that contains information about the string. The Twitter::TwitterText::Validation::ParseResults object includes:
- :weighted_length: the overall length of the Tweet with code points weighted per the ranges defined in the configuration file.
- :permillage: indicates the proportion (per thousand) of the weighted length in comparison to the max weighted length. A value > 1000 indicates input text that is longer than the allowable maximum.
- :valid: indicates if input text length corresponds to a valid result.
- :display_range_start, :display_range_end: An array of two Unicode code point indices identifying the inclusive start and exclusive end of the displayable content of the Tweet. For more information, see the description of display_text_range in Tweet updates.
- :valid_range_start, :valid_range_end: An array of two Unicode code point indices identifying the inclusive start and exclusive end of the valid content of the Tweet. For more information on the extended Tweet payload, see Tweet updates.
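The relationship between the length-related fields can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: `SketchResult` and `summarize` are hypothetical names, the maximum of 280 is assumed rather than read from a configuration file, treating empty text as invalid is an assumption, and the real parser additionally checks for invalid characters and computes the display and valid ranges.

```ruby
# Hypothetical stand-in for the real ParseResults object (illustration only).
SketchResult = Struct.new(:weighted_length, :permillage, :valid)

# Derive permillage and validity from a weighted length.
# Assumes a maximum weighted length of 280; the real value comes from the
# configuration file. Empty text is treated as invalid (an assumption).
def summarize(weighted_length, max_weighted_length = 280)
  permillage = weighted_length * 1000 / max_weighted_length
  valid = weighted_length.positive? && weighted_length <= max_weighted_length
  SketchResult.new(weighted_length, permillage, valid)
end

short = summarize(140)
puts short.permillage  # 500
puts short.valid       # true

long = summarize(300)
puts long.permillage   # 1071
puts long.valid        # false
```

A permillage above 1000 and a false `valid` flag go together here, which matches the description of the fields above.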
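class MyClass
  include Twitter::TwitterText::Extractor

  # The included extractor methods are instance methods, so call them
  # from inside an instance method.
  def mentioned_usernames
    # Returns ["twitter", "jack"]
    extract_mentioned_screen_names("Mentioning @twitter and @jack")
  end
end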
class MyClass
  include Twitter::TwitterText::Extractor

  def reply_username
    extract_reply_screen_name("@twitter are you hiring?") do |username|
      # username = "twitter"
    end
  end
end
class MyClass
  include Twitter::TwitterText::Autolink

  def linked_html
    auto_link("link @user, please #request")
  end
end
module ApplicationHelper
  include Twitter::TwitterText::Autolink
end
<%= auto_link("link @user, please #request") %>
Username extraction and linking matches all valid Twitter usernames but does not verify that the username is a valid Twitter account.
List names are auto-linked and extracted when they are written in @user/list-name format.
Hashtags are auto-linked and extracted. A hashtag can contain most letters or numbers but cannot be solely numbers and cannot contain punctuation.
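The hashtag rule can be approximated with a single pattern. The sketch below is a deliberately simplified illustration: `SKETCH_HASHTAG` and `plausible_hashtag?` are hypothetical names, and the library's real regular expressions handle many more cases (combining marks, special characters, and surrounding-text boundaries).

```ruby
# Simplified, hypothetical pattern (not the library's regular expression).
# The lookahead requires at least one letter, which rules out digit-only
# tags; the character class excludes punctuation other than underscore.
SKETCH_HASHTAG = /\A#(?=[\p{L}\p{Nd}_]*\p{L})[\p{L}\p{Nd}_]+\z/

# Check whether a single token looks like a hashtag under the sketch rule.
def plausible_hashtag?(token)
  SKETCH_HASHTAG.match?(token)
end

puts plausible_hashtag?("#ruby")     # true
puts plausible_hashtag?("#ruby2")    # true
puts plausible_hashtag?("#123")      # false: digits only
puts plausible_hashtag?("#no-dash")  # false: contains punctuation
```

Because the pattern uses Unicode property classes, tags written in non-Latin scripts pass the same check.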
Asian languages like Chinese, Japanese or Korean may not use a delimiter such as a space to separate normal text from URLs, making it difficult to identify where the URL ends and the text starts.
For this reason, twitter-text currently does not support extracting or auto-linking of URLs immediately followed by non-Latin characters.
Example: "http://twitter.com/は素晴らしい". The normal text is "は素晴らしい" and is not part of the URL even though it isn't space separated.
Special care has been taken to be sure that auto-linking and extraction work in Tweets of all languages. This means that languages without spaces between words should work equally well.
Hit highlighting provides emphasis around the "hits" returned from the Search API. It is built to work against text that has already been auto-linked.
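The core idea of wrapping hits in emphasis tags can be sketched as follows. This is a plain-text illustration only: `sketch_highlight` is a hypothetical helper, it assumes non-overlapping hits given as [start, end) character index pairs, and unlike the library it does not skip over HTML tags in text that has already been auto-linked.

```ruby
# Wrap each [start, end) character range in <em> tags.
# Hypothetical helper for plain text with non-overlapping hits.
def sketch_highlight(text, hits)
  result = text.dup
  # Work from the last hit backwards so earlier indices stay valid
  # while tags are being inserted.
  hits.sort.reverse_each do |start_index, end_index|
    result.insert(end_index, "</em>")
    result.insert(start_index, "<em>")
  end
  result
end

puts sketch_highlight("find the hit here", [[9, 12]])
# find the <em>hit</em> here
```

Inserting from the end of the string toward the start avoids having to adjust the remaining indices after each insertion.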
Have a bug? Please create an issue here on GitHub!
https://github.com/twitter/twitter-text/issues
- David LaMacchia (https://github.com/dlamacchia)
- Yoshimasa Niwa (https://github.com/niw)
- Sudheer Guntupalli (https://github.com/sudhee)
- Kaushik Lakshmikanth (https://github.com/kaushlakers)
- Jose Antonio Marquez Russo (https://github.com/joseeight)
- Lee Adams (https://github.com/leeaustinadams)
- Matt Sanford (http://github.com/mzsanford)
- Raffi Krikorian (http://github.com/r)
- Ben Cherry (http://github.com/bcherry)
- Patrick Ewing (http://github.com/hoverbird)
- Jeff Smick (http://github.com/sprsquish)
- Kenneth Kufluk (https://github.com/kennethkufluk)
- Keita Fujii (https://github.com/keitaf)
- Jean-Philippe Bougie (http://github.com/jpbougie)
- Erik Michaels-Ober (https://github.com/sferik)
Copyright 2012-2020 Twitter, Inc and other contributors
Licensed under the Apache License, Version 2.0