Struct unicode_segmentation::GraphemeCursor

pub struct GraphemeCursor { /* private fields */ }

Cursor-based segmenter for grapheme clusters.

This allows working with ropes and other data structures where the string is not contiguous or not fully known at initialization time.
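
As a rough illustration of the non-contiguous case, the sketch below models a rope as a plain slice of string chunks and looks up the chunk that contains a given byte offset. The `chunk_at` helper and the slice-of-chunks layout are assumptions made for this example; they are not part of unicode_segmentation. The returned pair has the same shape as the `chunk` and `chunk_start` arguments taken by the cursor methods.

```rust
/// Hypothetical stand-in for a rope: the text is a list of chunks that are
/// not contiguous in memory. Returns the chunk containing byte `offset`
/// together with that chunk's starting byte offset in the whole text,
/// or `None` if `offset` is at or past the end of the text.
fn chunk_at<'a>(chunks: &[&'a str], offset: usize) -> Option<(&'a str, usize)> {
    let mut chunk_start = 0;
    for &chunk in chunks {
        let chunk_end = chunk_start + chunk.len();
        // Empty chunks never satisfy this test, so they are skipped.
        if offset < chunk_end {
            return Some((chunk, chunk_start));
        }
        chunk_start = chunk_end;
    }
    None
}
```

A real rope would normally find the chunk through its tree structure rather than a linear scan; the linear scan only keeps the sketch short.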

Implementations

impl GraphemeCursor

pub fn new(offset: usize, len: usize, is_extended: bool) -> GraphemeCursor

Create a new cursor. The length of the string and the initial offset are given at creation time, but the contents of the string are not. The is_extended parameter controls whether extended grapheme clusters (true) or legacy grapheme clusters (false) are selected.

The offset parameter must be on a codepoint boundary.

use unicode_segmentation::GraphemeCursor;

let s = "हिन्दी";
let mut legacy = GraphemeCursor::new(0, s.len(), false);
assert_eq!(legacy.next_boundary(s, 0), Ok(Some("ह".len())));
let mut extended = GraphemeCursor::new(0, s.len(), true);
assert_eq!(extended.next_boundary(s, 0), Ok(Some("हि".len())));
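
The codepoint-boundary requirement on offset can be checked with the standard library before a cursor is built. The sketch below rounds an arbitrary byte offset down to the nearest codepoint boundary; `snap_to_codepoint_boundary` is a hypothetical helper written for this example and is not provided by unicode_segmentation.

```rust
/// Hypothetical helper: rounds `offset` down to the nearest codepoint
/// boundary of `s`, using only the standard library.
fn snap_to_codepoint_boundary(s: &str, offset: usize) -> usize {
    // Clamp first: `s.len()` is always a valid boundary.
    let mut i = offset.min(s.len());
    // `is_char_boundary(0)` is always true, so this loop terminates.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}
```

The result is a byte offset that is safe to pass as the offset argument of new() or set_cursor().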

pub fn set_cursor(&mut self, offset: usize)

Set the cursor to a new location in the same string.

use unicode_segmentation::GraphemeCursor;

let s = "abcd";
let mut cursor = GraphemeCursor::new(0, s.len(), false);
assert_eq!(cursor.cur_cursor(), 0);
cursor.set_cursor(2);
assert_eq!(cursor.cur_cursor(), 2);

pub fn cur_cursor(&self) -> usize

The current offset of the cursor. Equal to the last value provided to new() or set_cursor(), or returned from next_boundary() or prev_boundary().

use unicode_segmentation::GraphemeCursor;

// Two flags (🇷🇸🇮🇴); each flag is two Regional Indicator Symbol (RIS) codepoints, and each RIS is 4 bytes.
let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(4, flags.len(), false);
assert_eq!(cursor.cur_cursor(), 4);
assert_eq!(cursor.next_boundary(flags, 0), Ok(Some(8)));
assert_eq!(cursor.cur_cursor(), 8);
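
The offsets in this example are byte offsets into the UTF-8 string, not codepoint counts. The following sketch, which uses only the standard library, lists the byte offset at which each codepoint starts; `codepoint_offsets` is a name chosen for this illustration.

```rust
/// Byte offsets at which each codepoint of `s` starts, followed by the
/// end of the string. These are the only offsets a cursor may be placed at.
fn codepoint_offsets(s: &str) -> Vec<usize> {
    let mut offsets: Vec<usize> = s.char_indices().map(|(i, _)| i).collect();
    offsets.push(s.len());
    offsets
}
```

For the two-flag string above, this yields offsets 0, 4, 8, 12 and 16, which is why the cursor moves from 4 to 8 when it reaches the boundary between the two flags.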

pub fn provide_context(&mut self, chunk: &str, chunk_start: usize)

Provide additional pre-context when it is needed to decide a boundary. The end of the chunk must coincide with the value given in the GraphemeIncomplete::PreContext request.

use unicode_segmentation::{GraphemeCursor, GraphemeIncomplete};

let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(8, flags.len(), false);
// Not enough pre-context to decide if there's a boundary between the two flags.
assert_eq!(cursor.is_boundary(&flags[8..], 8), Err(GraphemeIncomplete::PreContext(8)));
// Provide one more Regional Indicator Symbol of pre-context
cursor.provide_context(&flags[4..8], 4);
// Still not enough context to decide.
assert_eq!(cursor.is_boundary(&flags[8..], 8), Err(GraphemeIncomplete::PreContext(4)));
// Provide additional requested context.
cursor.provide_context(&flags[0..4], 0);
// That's enough to decide (it always is when context goes to the start of the string)
assert_eq!(cursor.is_boundary(&flags[8..], 8), Ok(true));
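
When the text lives in separate chunks, answering a PreContext(n) request means finding a chunk that ends exactly at byte n. The sketch below does this for a plain slice of chunks; `chunk_ending_at` and the slice-of-chunks layout are assumptions for illustration only, not part of the crate.

```rust
/// Hypothetical helper: finds the non-empty chunk whose end is exactly at
/// byte `end`, returning it with its starting byte offset in the whole text.
/// Returns `None` if `end` is 0, falls inside a chunk, or is past the text.
fn chunk_ending_at<'a>(chunks: &[&'a str], end: usize) -> Option<(&'a str, usize)> {
    let mut chunk_start = 0;
    for &chunk in chunks {
        let chunk_end = chunk_start + chunk.len();
        if chunk_end == end && !chunk.is_empty() {
            return Some((chunk, chunk_start));
        }
        if chunk_end > end {
            // `end` is strictly inside this chunk (or is 0): no chunk ends there.
            return None;
        }
        chunk_start = chunk_end;
    }
    None
}
```

The returned pair can be passed straight to provide_context. If the requested offset falls inside a chunk, the caller can instead slice that chunk so that the slice ends at the requested offset.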

pub fn is_boundary( &mut self, chunk: &str, chunk_start: usize ) -> Result<bool, GraphemeIncomplete>

Determine whether the current cursor location is a grapheme cluster boundary. Only a part of the string need be supplied. If chunk_start is nonzero or the length of chunk is not equal to len on creation, then this method may return GraphemeIncomplete::PreContext. The caller should then call provide_context with the requested chunk, then retry calling this method.

For partial chunks, if the cursor is not at the beginning or end of the string, the chunk should contain at least the codepoint following the cursor. If the string is nonempty, the chunk must be nonempty.

All calls should have consistent chunk contents (i.e., if a chunk provides content for a given slice, all further chunks covering that slice must have the same content for it).

use unicode_segmentation::GraphemeCursor;

let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(8, flags.len(), false);
assert_eq!(cursor.is_boundary(flags, 0), Ok(true));
cursor.set_cursor(12);
assert_eq!(cursor.is_boundary(flags, 0), Ok(false));
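
The consistency requirement on chunk contents can be turned into a debugging check on the caller's side. The sketch below compares two chunks on the byte range they both cover; `chunks_agree` is a hypothetical helper for this example, and the crate itself is not stated to perform such a check.

```rust
/// Hypothetical debugging helper: returns true if two chunks, given with
/// their starting byte offsets, have identical bytes wherever they overlap.
/// Chunks that do not overlap trivially agree.
fn chunks_agree(a: &str, a_start: usize, b: &str, b_start: usize) -> bool {
    // Overlapping byte range, in whole-text coordinates.
    let lo = a_start.max(b_start);
    let hi = (a_start + a.len()).min(b_start + b.len());
    if lo >= hi {
        return true;
    }
    // Compare raw bytes so the check cannot panic on a non-boundary slice.
    a.as_bytes()[lo - a_start..hi - a_start] == b.as_bytes()[lo - b_start..hi - b_start]
}
```

A caller might wrap this in debug_assert! whenever it hands the cursor a chunk that overlaps one supplied earlier.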

pub fn next_boundary( &mut self, chunk: &str, chunk_start: usize ) -> Result<Option<usize>, GraphemeIncomplete>

Find the next boundary after the current cursor position. Only a part of the string need be supplied. If the chunk is incomplete, then this method might return GraphemeIncomplete::PreContext or GraphemeIncomplete::NextChunk. In the former case, the caller should call provide_context with the requested chunk, then retry. In the latter case, the caller should provide the chunk following the one given, then retry.

See is_boundary for expectations on the provided chunk.

use unicode_segmentation::GraphemeCursor;

let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(4, flags.len(), false);
assert_eq!(cursor.next_boundary(flags, 0), Ok(Some(8)));
assert_eq!(cursor.next_boundary(flags, 0), Ok(Some(16)));
assert_eq!(cursor.next_boundary(flags, 0), Ok(None));

And an example that uses partial strings:

use unicode_segmentation::{GraphemeCursor, GraphemeIncomplete};

let s = "abcd";
let mut cursor = GraphemeCursor::new(0, s.len(), false);
assert_eq!(cursor.next_boundary(&s[..2], 0), Ok(Some(1)));
assert_eq!(cursor.next_boundary(&s[..2], 0), Err(GraphemeIncomplete::NextChunk));
assert_eq!(cursor.next_boundary(&s[2..4], 2), Ok(Some(2)));
assert_eq!(cursor.next_boundary(&s[2..4], 2), Ok(Some(3)));
assert_eq!(cursor.next_boundary(&s[2..4], 2), Ok(Some(4)));
assert_eq!(cursor.next_boundary(&s[2..4], 2), Ok(None));

pub fn prev_boundary( &mut self, chunk: &str, chunk_start: usize ) -> Result<Option<usize>, GraphemeIncomplete>

Find the previous boundary before the current cursor position. Only a part of the string need be supplied. If the chunk is incomplete, then this method might return GraphemeIncomplete::PreContext or GraphemeIncomplete::PrevChunk. In the former case, the caller should call provide_context with the requested chunk, then retry. In the latter case, the caller should provide the chunk preceding the one given, then retry.

See is_boundary for expectations on the provided chunk.

use unicode_segmentation::GraphemeCursor;

let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(12, flags.len(), false);
assert_eq!(cursor.prev_boundary(flags, 0), Ok(Some(8)));
assert_eq!(cursor.prev_boundary(flags, 0), Ok(Some(0)));
assert_eq!(cursor.prev_boundary(flags, 0), Ok(None));

And an example that uses partial strings (note the exact return is not guaranteed, and may be PrevChunk or PreContext arbitrarily):

use unicode_segmentation::{GraphemeCursor, GraphemeIncomplete};

let s = "abcd";
let mut cursor = GraphemeCursor::new(4, s.len(), false);
assert_eq!(cursor.prev_boundary(&s[2..4], 2), Ok(Some(3)));
assert_eq!(cursor.prev_boundary(&s[2..4], 2), Err(GraphemeIncomplete::PrevChunk));
assert_eq!(cursor.prev_boundary(&s[0..2], 0), Ok(Some(2)));
assert_eq!(cursor.prev_boundary(&s[0..2], 0), Ok(Some(1)));
assert_eq!(cursor.prev_boundary(&s[0..2], 0), Ok(Some(0)));
assert_eq!(cursor.prev_boundary(&s[0..2], 0), Ok(None));

Trait Implementations

impl Clone for GraphemeCursor

fn clone(&self) -> GraphemeCursor

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for GraphemeCursor

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.