@echeran echeran commented Oct 8, 2024

To "modernize" the BreakIterator API, this PR introduces a new wrapper built around a Segmenter interface, giving it a more convenient, modern API design.

A few of the goals that motivate the new Segmenter API:

  • Use features introduced in Java 8, notably the Stream API, which supports a functional programming style
  • Create instances that are immutable (reduces complexity born of statefulness; allows user code to be more referentially transparent)
  • Create a wrapper class around the iteration. This decouples iterating over a source string from constructing the BreakIterator, so that iteration over one string is isolated from iteration over any other string
  • Use interfaces to properly decouple and abstract. Higher-level APIs built on interfaces allow user-created implementations to participate in them.
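
To make these goals concrete, here is a minimal sketch of the general shape such a wrapper could take. It is not the API added by this PR: it uses the JDK's `java.text.BreakIterator` rather than ICU4J's so that it runs without extra dependencies, and every name in it (`SegmenterSketch`, `Segmenter`, `Segments`, `WordSegmenter`, `subSequences`) is a hypothetical stand-in chosen for illustration. For brevity it collects the segments into a list before streaming them, so it is not lazy.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical names throughout; an illustration only, not the ICU4J API from this PR.
final class SegmenterSketch {

    /** Factory interface: holds configuration only, never a source string. */
    interface Segmenter {
        Segments segment(CharSequence source);
    }

    /** View over the segments of exactly one source string. */
    interface Segments {
        Stream<String> subSequences();
    }

    /** Word segmenter backed by the JDK's java.text.BreakIterator (assumption: not ICU's). */
    static final class WordSegmenter implements Segmenter {
        // Never iterated directly; only cloned, so no per-string state lives here.
        private final BreakIterator prototype;

        WordSegmenter(Locale locale) {
            this.prototype = BreakIterator.getWordInstance(locale);
        }

        @Override
        public Segments segment(CharSequence source) {
            final String text = source.toString();
            // Each subSequences() call iterates over a private clone bound to this one string.
            return () -> {
                BreakIterator it = (BreakIterator) prototype.clone();
                it.setText(text);
                List<String> pieces = new ArrayList<>();
                int start = it.first();
                for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
                    pieces.add(text.substring(start, end));
                }
                return pieces.stream();
            };
        }
    }

    public static void main(String[] args) {
        Segmenter words = new WordSegmenter(Locale.ENGLISH);
        List<String> pieces = words.segment("Hello world")
                .subSequences()
                .collect(Collectors.toList());
        System.out.println(pieces);
    }
}
```

In this sketch the caller never touches the BreakIterator: the segmenter is built once, and each `segment(...)` call yields an independent view of one string, which is the decoupling the goals above describe.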

More details in the design doc.

This PR will focus on the ICU4J side of the work.

Checklist

  • Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22789
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

macchiati previously approved these changes Jan 9, 2025
@echeran echeran changed the title from "ICU-22789 Add Segmenter API to conveniently wrap BreakIterator" to "ICU-22789 Add Segmenter API to conveniently wrap BreakIterator in ICU4J" Apr 11, 2025
@echeran echeran marked this pull request as ready for review April 11, 2025 22:55
richgillam previously approved these changes Apr 17, 2025

@richgillam richgillam left a comment

Looks great! One quibble.

@richgillam

Actually, one other observation. As things stand, Segmenter and its subclasses don't do much: they just create Segments objects, which wrap BreakIterator objects that do all the work. The API is a lot cleaner and clearer, but the implementation isn't. I assume the plan is to move, at some point in the future, to an implementation where the Segmenter actually owns the state and category tables and the Segments object just handles iteration over a particular string? (I'm not saying you need to do this now; just checking that that's in the plan.)

@markusicu markusicu left a comment

Sorry for the late review!
So far I have mostly checked the API here against the proposal, and found that the proposal omitted one method; see the design doc.

@echeran echeran left a comment

Addressed new feedback from @markusicu. PTAL.

markusicu previously approved these changes Jun 25, 2025

@markusicu markusicu left a comment

lean and mean :-)

@echeran echeran force-pushed the breakiter-api-modern branch from b51b630 to 21e00d9 June 25, 2025 19:35
@jira-pull-request-webhook

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@echeran echeran merged commit 1f33101 into unicode-org:main Jun 25, 2025
14 checks passed