Previous | Next | Trail Map | Writing Global Programs | Collation and Text Boundaries


The Text Boundary Demo

The BreakIterator class (in the API reference documentation) was added in JDK 1.1. It provides methods for finding word, sentence, and other boundaries in text in a locale-sensitive way. Here's another demo program provided by Taligent that shows off many of BreakIterator's capabilities. Run the demo program and try some of the suggestions below.

[Figure: picture of the Text Boundary Demo applet]

Text boundary analysis enables programs to provide more intelligent text selection and line-wrapping, based on the natural language of the text. All of the boundary detection functions demonstrated here use table-driven state machines for speed.

This demo applet lets you cycle through consecutive occurrences of the types of text elements listed below. Use the right or left arrow key to highlight the next or previous occurrence. 

Sentences  Lines  Words  Characters 
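The arrow-key behavior described above can be approximated with BreakIterator's following and preceding methods. This is a minimal sketch, not the applet's actual source: the class BoundaryNavigation and its two helper methods are hypothetical names, and the choice of Locale.US is an assumption made only for illustration.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper; not taken from the demo applet's source.
final class BoundaryNavigation {

    // Returns the first boundary after 'offset', or the end of the text.
    static int nextBoundary(BreakIterator iterator, String text, int offset) {
        int clamped = Math.max(0, offset);
        if (clamped >= text.length()) {
            return text.length();
        }
        iterator.setText(text);
        int boundary = iterator.following(clamped);
        return boundary == BreakIterator.DONE ? text.length() : boundary;
    }

    // Returns the last boundary before 'offset', or the start of the text.
    static int previousBoundary(BreakIterator iterator, String text, int offset) {
        int clamped = Math.min(offset, text.length());
        if (clamped <= 0) {
            return 0;
        }
        iterator.setText(text);
        int boundary = iterator.preceding(clamped);
        return boundary == BreakIterator.DONE ? 0 : boundary;
    }

    public static void main(String[] args) {
        String text = "Hello, world!";
        BreakIterator words = BreakIterator.getWordInstance(Locale.US);
        // Simulate pressing the right arrow key until the end of the text.
        int start = 0;
        while (start < text.length()) {
            int end = nextBoundary(words, text, start);
            System.out.println("[" + text.substring(start, end) + "]");
            start = end;
        }
    }
}
```

Passing a different iterator (sentence, line, or character) to the same helpers changes the kind of element that is highlighted, which is roughly what the applet's Boundaries menu does.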


Sentences

Many text editing applications allow the user to select sentences with a key combination or a triple-click. The sentence selection mechanism demonstrated here is sensitive to the context of periods and other punctuation.

To See This...

Sentence selection with correct interpretation of periods within numbers and abbreviations, and trailing punctuation marks such as quotation marks and parentheses.

Do This...

  1. Click at the beginning of the text
  2. Use the left or right arrow key to select sentences
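
In code, sentence selection comes down to walking the boundaries reported by a sentence BreakIterator. The following is a sketch under stated assumptions: the class SentenceBoundaries and its sentences method are hypothetical names, the sample text is invented, and the code uses generics, so it needs a much newer Java release than JDK 1.1.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; not taken from the demo applet's source.
final class SentenceBoundaries {

    // Splits 'text' into the sentences found for the given locale.
    // Each sentence keeps the whitespace that follows it.
    static List<String> sentences(String text, Locale locale) {
        BreakIterator iterator = BreakIterator.getSentenceInstance(locale);
        iterator.setText(text);
        List<String> result = new ArrayList<>();
        int start = iterator.first();
        for (int end = iterator.next();
             end != BreakIterator.DONE;
             start = end, end = iterator.next()) {
            result.add(text.substring(start, end));
        }
        return result;
    }

    public static void main(String[] args) {
        // The period inside "3.50" should not be treated as a sentence end.
        String text = "It costs 3.50 dollars. Really? (Yes.) Then I will buy it.";
        for (String sentence : sentences(text, Locale.US)) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```

How abbreviations are treated depends on the rules built into the Java runtime in use, so it is worth trying your own text rather than relying on any particular result.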


Lines

Line boundary analysis determines where a text string can be broken when line-wrapping. 

To See This...

Identification of places where lines could be broken if you needed to reflow the text. The mechanism correctly handles punctuation and hyphenated words.

Do This...

  1. Select Line Break from the Boundaries menu
  2. Click in the text
  3. Use the left or right arrow key to highlight line breakpoints

Though not demonstrated here, the boundary mechanism also handles line breaks for non-Roman scripts.
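
One way to use line boundaries is a simple greedy word wrap that breaks only where the line BreakIterator allows it. This is an illustrative sketch, not the applet's code: the class LineWrapper and its wrap method are hypothetical names, and the width is measured in char units including trailing spaces, which is a simplification a real layout engine would not make.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; not taken from the demo applet's source.
final class LineWrapper {

    // Greedily wraps 'text' so that lines stay within 'width' chars where
    // possible, breaking only at positions the line iterator reports.
    static List<String> wrap(String text, int width, Locale locale) {
        BreakIterator iterator = BreakIterator.getLineInstance(locale);
        iterator.setText(text);
        List<String> lines = new ArrayList<>();
        int lineStart = 0;   // where the current output line begins
        int lastBreak = 0;   // most recent legal break position seen
        iterator.first();
        for (int end = iterator.next();
             end != BreakIterator.DONE;
             end = iterator.next()) {
            // If the next segment would overflow, close the line at the
            // previous break (unless the line would be empty).
            if (end - lineStart > width && lastBreak > lineStart) {
                lines.add(text.substring(lineStart, lastBreak));
                lineStart = lastBreak;
            }
            lastBreak = end;
        }
        if (lineStart < text.length()) {
            lines.add(text.substring(lineStart));
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "Line boundary analysis finds well-placed break points.";
        for (String line : wrap(text, 20, Locale.US)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

A segment longer than the requested width is left on a line of its own rather than split, because splitting it would mean breaking at a position the iterator did not offer.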


Words

Word boundary analysis is used by search and replace functions, as well as within text editing applications that allow the user to select words with a double-click.

To See This...

Word selection with correct interpretation of punctuation marks within and following words. Characters that are not part of a word, such as symbols or punctuation marks, have word-breaks on both sides.

Do This...

  1. Select Word from the Boundaries menu
  2. Click in the text
  3. Use the left or right arrow key to highlight words
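
Because punctuation and spaces have word-breaks on both sides, a word BreakIterator reports them as separate segments. A program that wants only the words has to filter those segments out. The sketch below does so by keeping segments that start with a letter or digit; the class WordBoundaries, the words method, and that filtering rule are assumptions made for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; not taken from the demo applet's source.
final class WordBoundaries {

    // Returns the segments between word boundaries that start with a
    // letter or digit, skipping punctuation and whitespace segments.
    static List<String> words(String text, Locale locale) {
        BreakIterator iterator = BreakIterator.getWordInstance(locale);
        iterator.setText(text);
        List<String> result = new ArrayList<>();
        int start = iterator.first();
        for (int end = iterator.next();
             end != BreakIterator.DONE;
             start = end, end = iterator.next()) {
            if (Character.isLetterOrDigit(text.codePointAt(start))) {
                result.add(text.substring(start, end));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Search, then replace: it can't be simpler!";
        for (String word : words(text, Locale.US)) {
            System.out.println("[" + word + "]");
        }
    }
}
```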


Characters

Character boundary analysis allows users to interact with characters as they expect to: for example, when moving the cursor through a text string.

To See This...

Correct navigation of the cursor through character strings, regardless of how the character is stored. For example, an accented character might be stored as a base character and a diacritical mark. What users consider to be a character can differ between languages.

Do This...

  1. Select Char from the Boundaries menu
  2. Click in the text
  3. Use the left or right arrow key to highlight characters

Though not demonstrated here, the boundary detection mechanism can handle Hangul and Indic syllables as well as characters followed by non-spacing accents and diacritics. 
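
The difference between stored characters and what the user sees as characters can be shown by counting both. In this sketch the string holds an "e" followed by a combining acute accent (U+0301) and then an "a". The class CharacterBoundaries and the method countUserCharacters are hypothetical names chosen for the example.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper; not taken from the demo applet's source.
final class CharacterBoundaries {

    // Counts the characters as a user would perceive them, treating a
    // base character plus its combining marks as one unit.
    static int countUserCharacters(String text, Locale locale) {
        BreakIterator iterator = BreakIterator.getCharacterInstance(locale);
        iterator.setText(text);
        int count = 0;
        iterator.first();
        while (iterator.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'e' + COMBINING ACUTE ACCENT, then 'a': three stored chars.
        String text = "e\u0301a";
        System.out.println("Stored chars:    " + text.length());
        System.out.println("User characters: "
                + countUserCharacters(text, Locale.US));
    }
}
```

A cursor that moves by the iterator's boundaries steps over the accented letter in one keystroke, whereas one that moves by stored chars would stop between the base letter and its accent.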

You can also type other text into the applet window to see selection of different boundaries. Try it out! 



The source.


This page incorporates material or code copyrighted by Taligent, Inc. For more information on international resources, see their International Fact Sheet.

