Struct junction_api::Regex

pub struct Regex(/* private fields */);

A regular expression.

Regex has the same syntax and semantics as Rust’s regex crate.

Methods from Deref<Target = Regex>§

pub fn is_match(&self, haystack: &str) -> bool

Returns true if and only if there is a match for the regex anywhere in the haystack given.

It is recommended to use this method if all you need to do is test whether a match exists, since the underlying matching engine may be able to do less work.

§Example

Test if some haystack contains at least one word with exactly 13 Unicode word characters:

use regex::Regex;

let re = Regex::new(r"\b\w{13}\b").unwrap();
let hay = "I categorically deny having triskaidekaphobia.";
assert!(re.is_match(hay));

pub fn find<'h>(&self, haystack: &'h str) -> Option<Match<'h>>

This routine searches for the first match of this regex in the haystack given, and if found, returns a [Match]. The Match provides access to both the byte offsets of the match and the actual substring that matched.

Note that this should only be used if you want to find the entire match. If instead you just want to test the existence of a match, it’s potentially faster to use Regex::is_match(hay) instead of Regex::find(hay).is_some().
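The offsets a match reports are byte offsets, not character counts. The sketch below illustrates that distinction with the standard library only: it searches for a literal substring rather than a regex, and byte_range_of is a hypothetical helper, not part of this API.

```rust
use std::ops::Range;

/// Hypothetical helper: locate a literal `needle` and report its byte range,
/// analogous to the range a regex match would report.
fn byte_range_of(haystack: &str, needle: &str) -> Option<Range<usize>> {
    // `str::find` returns a byte offset, not a character index.
    let start = haystack.find(needle)?;
    Some(start..start + needle.len())
}
```

For example, in "☃ deny" the snowman occupies three bytes, so "deny" starts at byte 4 even though it is the third character.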

§Example

Find the first word with exactly 13 Unicode word characters:

use regex::Regex;

let re = Regex::new(r"\b\w{13}\b").unwrap();
let hay = "I categorically deny having triskaidekaphobia.";
let mat = re.find(hay).unwrap();
assert_eq!(2..15, mat.range());
assert_eq!("categorically", mat.as_str());

pub fn find_iter<'r, 'h>(&'r self, haystack: &'h str) -> Matches<'r, 'h>

Returns an iterator that yields successive non-overlapping matches in the given haystack. The iterator yields values of type [Match].

§Time complexity

Note that since find_iter runs potentially many searches on the haystack and since each search has worst case O(m * n) time complexity, the overall worst case time complexity for iteration is O(m * n^2).

§Example

Find every word with exactly 13 Unicode word characters:

use regex::Regex;

let re = Regex::new(r"\b\w{13}\b").unwrap();
let hay = "Retroactively relinquishing remunerations is reprehensible.";
let matches: Vec<_> = re.find_iter(hay).map(|m| m.as_str()).collect();
assert_eq!(matches, vec![
    "Retroactively",
    "relinquishing",
    "remunerations",
    "reprehensible",
]);

pub fn captures<'h>(&self, haystack: &'h str) -> Option<Captures<'h>>

This routine searches for the first match of this regex in the haystack given, and if found, returns not only the overall match but also the matches of each capture group in the regex. If no match is found, then None is returned.

Capture group 0 always corresponds to an implicit unnamed group that includes the entire match. If a match is found, this group is always present. Subsequent groups may be named and are numbered, starting at 1, by the order in which the opening parenthesis appears in the pattern. For example, in the pattern (?<a>.(?<b>.))(?<c>.), a, b and c correspond to capture group indices 1, 2 and 3, respectively.
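To make the numbering rule concrete, here is a rough std-only sketch that assigns group indices by scanning for opening parentheses. group_names is a hypothetical helper and a deliberate simplification: it skips escapes and simple bracketed classes, but it is not the crate's parser and ignores corner cases such as nested classes or verbose-mode comments.

```rust
/// Hypothetical scanner: returns one entry per capture group, in index order.
/// `None` marks an unnamed group; index 0 is the implicit whole-match group.
fn group_names(pattern: &str) -> Vec<Option<String>> {
    let chars: Vec<char> = pattern.chars().collect();
    let mut names = vec![None]; // group 0 is always present and unnamed
    let mut in_class = false;
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            '\\' => i += 1, // skip the escaped character
            '[' if !in_class => in_class = true,
            ']' if in_class => in_class = false,
            '(' if !in_class => {
                if chars.get(i + 1) != Some(&'?') {
                    names.push(None); // plain capturing group
                } else {
                    // `(?<name>` or `(?P<name>` opens a named group; any
                    // other `(?` form (such as `(?:`) does not capture.
                    let mut j = i + 2;
                    if chars.get(j) == Some(&'P') {
                        j += 1;
                    }
                    if chars.get(j) == Some(&'<') {
                        let name: String = chars[j + 1..]
                            .iter()
                            .take_while(|&&c| c != '>')
                            .collect();
                        names.push(Some(name));
                    }
                }
            }
            _ => {}
        }
        i += 1;
    }
    names
}
```

Applied to (?<a>.(?<b>.))(?<c>.), the sketch yields the implicit group followed by a, b and c, in that order.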

You should only use captures if you need access to the capture group matches. Otherwise, [Regex::find] is generally faster for discovering just the overall match.

§Example

Say you have some haystack with movie names and their release years, like “‘Citizen Kane’ (1941)”. It’d be nice if we could search for substrings looking like that, while also extracting the movie name and its release year separately. The example below shows how to do that.

use regex::Regex;

let re = Regex::new(r"'([^']+)'\s+\((\d{4})\)").unwrap();
let hay = "Not my favorite movie: 'Citizen Kane' (1941).";
let caps = re.captures(hay).unwrap();
assert_eq!(caps.get(0).unwrap().as_str(), "'Citizen Kane' (1941)");
assert_eq!(caps.get(1).unwrap().as_str(), "Citizen Kane");
assert_eq!(caps.get(2).unwrap().as_str(), "1941");
// You can also access the groups by index using the Index notation.
// Note that this will panic on an invalid index. In this case, these
// accesses are always correct because the overall regex will only
// match when these capture groups match.
assert_eq!(&caps[0], "'Citizen Kane' (1941)");
assert_eq!(&caps[1], "Citizen Kane");
assert_eq!(&caps[2], "1941");

Note that the full match is at capture group 0. Each subsequent capture group is indexed by the order of its opening (.

We can make this example a bit clearer by using named capture groups:

use regex::Regex;

let re = Regex::new(r"'(?<title>[^']+)'\s+\((?<year>\d{4})\)").unwrap();
let hay = "Not my favorite movie: 'Citizen Kane' (1941).";
let caps = re.captures(hay).unwrap();
assert_eq!(caps.get(0).unwrap().as_str(), "'Citizen Kane' (1941)");
assert_eq!(caps.name("title").unwrap().as_str(), "Citizen Kane");
assert_eq!(caps.name("year").unwrap().as_str(), "1941");
// You can also access the groups by name using the Index notation.
// Note that this will panic on an invalid group name. In this case,
// these accesses are always correct because the overall regex will
// only match when these capture groups match.
assert_eq!(&caps[0], "'Citizen Kane' (1941)");
assert_eq!(&caps["title"], "Citizen Kane");
assert_eq!(&caps["year"], "1941");

Here we name the capture groups, which we can access with the name method or the Index notation with a &str. Note that the named capture groups are still accessible with get or the Index notation with a usize.

The 0th capture group is always unnamed, so it must always be accessed with get(0) or [0].

Finally, one other way to get the matched substrings is with the [Captures::extract] API:

use regex::Regex;

let re = Regex::new(r"'([^']+)'\s+\((\d{4})\)").unwrap();
let hay = "Not my favorite movie: 'Citizen Kane' (1941).";
let (full, [title, year]) = re.captures(hay).unwrap().extract();
assert_eq!(full, "'Citizen Kane' (1941)");
assert_eq!(title, "Citizen Kane");
assert_eq!(year, "1941");

pub fn captures_iter<'r, 'h>( &'r self, haystack: &'h str, ) -> CaptureMatches<'r, 'h>

Returns an iterator that yields successive non-overlapping matches in the given haystack. The iterator yields values of type [Captures].

This is the same as [Regex::find_iter], but instead of only providing access to the overall match, each value yielded includes access to the matches of all capture groups in the regex. Reporting this extra match data is potentially costly, so callers should only use captures_iter over find_iter when they actually need access to the capture group matches.

§Time complexity

Note that since captures_iter runs potentially many searches on the haystack and since each search has worst case O(m * n) time complexity, the overall worst case time complexity for iteration is O(m * n^2).

§Example

We can use this to find all movie titles and their release years in some haystack, where the movie is formatted like “‘Title’ (xxxx)”:

use regex::Regex;

let re = Regex::new(r"'([^']+)'\s+\(([0-9]{4})\)").unwrap();
let hay = "'Citizen Kane' (1941), 'The Wizard of Oz' (1939), 'M' (1931).";
let mut movies = vec![];
for (_, [title, year]) in re.captures_iter(hay).map(|c| c.extract()) {
    movies.push((title, year.parse::<i64>().unwrap()));
}
assert_eq!(movies, vec![
    ("Citizen Kane", 1941),
    ("The Wizard of Oz", 1939),
    ("M", 1931),
]);

Or with named groups:

use regex::Regex;

let re = Regex::new(r"'(?<title>[^']+)'\s+\((?<year>[0-9]{4})\)").unwrap();
let hay = "'Citizen Kane' (1941), 'The Wizard of Oz' (1939), 'M' (1931).";
let mut it = re.captures_iter(hay);

let caps = it.next().unwrap();
assert_eq!(&caps["title"], "Citizen Kane");
assert_eq!(&caps["year"], "1941");

let caps = it.next().unwrap();
assert_eq!(&caps["title"], "The Wizard of Oz");
assert_eq!(&caps["year"], "1939");

let caps = it.next().unwrap();
assert_eq!(&caps["title"], "M");
assert_eq!(&caps["year"], "1931");

pub fn split<'r, 'h>(&'r self, haystack: &'h str) -> Split<'r, 'h>

Returns an iterator of substrings of the haystack given, delimited by a match of the regex. Namely, each element of the iterator corresponds to a part of the haystack that isn’t matched by the regular expression.
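The relationship between delimiter matches and the yielded substrings can be sketched without the regex engine. The helper below is hypothetical and assumes the caller already has the byte ranges of each delimiter match, sorted and non-overlapping; it only shows how the pieces between those matches are cut out.

```rust
/// Hypothetical helper: given the ascending, non-overlapping byte ranges
/// `(start, end)` of every delimiter match, return the pieces between them.
fn split_by_matches<'h>(haystack: &'h str, matches: &[(usize, usize)]) -> Vec<&'h str> {
    let mut pieces = Vec::with_capacity(matches.len() + 1);
    let mut last_end = 0;
    for &(start, end) in matches {
        // Everything since the previous delimiter, possibly empty.
        pieces.push(&haystack[last_end..start]);
        last_end = end;
    }
    // The tail after the final delimiter is always yielded, even if empty.
    pieces.push(&haystack[last_end..]);
    pieces
}
```

This is why adjacent delimiters, or delimiters at either end of the haystack, produce empty substrings: the slice between two consecutive match ranges has length zero.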

§Time complexity

Since iterating over all matches requires running potentially many searches on the haystack, and since each search has worst case O(m * n) time complexity, the overall worst case time complexity for this routine is O(m * n^2).

§Example

To split a string delimited by arbitrary amounts of spaces or tabs:

use regex::Regex;

let re = Regex::new(r"[ \t]+").unwrap();
let hay = "a b \t  c\td    e";
let fields: Vec<&str> = re.split(hay).collect();
assert_eq!(fields, vec!["a", "b", "c", "d", "e"]);

§Example: more cases

Basic usage:

use regex::Regex;

let re = Regex::new(r" ").unwrap();
let hay = "Mary had a little lamb";
let got: Vec<&str> = re.split(hay).collect();
assert_eq!(got, vec!["Mary", "had", "a", "little", "lamb"]);

let re = Regex::new(r"X").unwrap();
let hay = "";
let got: Vec<&str> = re.split(hay).collect();
assert_eq!(got, vec![""]);

let re = Regex::new(r"X").unwrap();
let hay = "lionXXtigerXleopard";
let got: Vec<&str> = re.split(hay).collect();
assert_eq!(got, vec!["lion", "", "tiger", "leopard"]);

let re = Regex::new(r"::").unwrap();
let hay = "lion::tiger::leopard";
let got: Vec<&str> = re.split(hay).collect();
assert_eq!(got, vec!["lion", "tiger", "leopard"]);

If a haystack contains multiple contiguous matches, you will end up with empty spans yielded by the iterator:

use regex::Regex;

let re = Regex::new(r"X").unwrap();
let hay = "XXXXaXXbXc";
let got: Vec<&str> = re.split(hay).collect();
assert_eq!(got, vec!["", "", "", "", "a", "", "b", "c"]);

let re = Regex::new(r"/").unwrap();
let hay = "(///)";
let got: Vec<&str> = re.split(hay).collect();
assert_eq!(got, vec!["(", "", "", ")"]);

Separators at the start or end of a haystack are neighbored by empty substrings:

use regex::Regex;

let re = Regex::new(r"0").unwrap();
let hay = "010";
let got: Vec<&str> = re.split(hay).collect();
assert_eq!(got, vec!["", "1", ""]);

When the empty string is used as a regex, it splits at every valid UTF-8 boundary by default (which includes the beginning and end of the haystack):

use regex::Regex;

let re = Regex::new(r"").unwrap();
let hay = "rust";
let got: Vec<&str> = re.split(hay).collect();
assert_eq!(got, vec!["", "r", "u", "s", "t", ""]);

// Splitting by an empty string is UTF-8 aware by default!
let re = Regex::new(r"").unwrap();
let hay = "☃";
let got: Vec<&str> = re.split(hay).collect();
assert_eq!(got, vec!["", "☃", ""]);
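The positions at which an empty pattern can match in this mode are the character boundaries of the haystack. As a std-only illustration (char_boundaries is a hypothetical helper, not part of this API), those positions can be listed like this:

```rust
/// Hypothetical helper: every byte offset that lies on a UTF-8 character
/// boundary, including the start and the end of the haystack.
fn char_boundaries(haystack: &str) -> Vec<usize> {
    haystack
        .char_indices()
        .map(|(offset, _)| offset)
        .chain(std::iter::once(haystack.len()))
        .collect()
}
```

For "☃", which is three bytes long, the only boundaries are 0 and 3, which is why the snowman is never split in the middle.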

Contiguous separators (which commonly show up with whitespace) can lead to possibly surprising behavior. For example, this code is correct:

use regex::Regex;

let re = Regex::new(r" ").unwrap();
let hay = "    a  b c";
let got: Vec<&str> = re.split(hay).collect();
assert_eq!(got, vec!["", "", "", "", "a", "", "b", "c"]);

It does not give you ["a", "b", "c"]. For that behavior, you’d want to match contiguous space characters:

use regex::Regex;

let re = Regex::new(r" +").unwrap();
let hay = "    a  b c";
let got: Vec<&str> = re.split(hay).collect();
// N.B. This does still include a leading empty span because ' +'
// matches at the beginning of the haystack.
assert_eq!(got, vec!["", "a", "b", "c"]);

pub fn splitn<'r, 'h>( &'r self, haystack: &'h str, limit: usize, ) -> SplitN<'r, 'h>

Returns an iterator of at most limit substrings of the haystack given, delimited by a match of the regex. (A limit of 0 will return no substrings.) Namely, each element of the iterator corresponds to a part of the haystack that isn’t matched by the regular expression. The remainder of the haystack that is not split will be the last element in the iterator.
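The effect of limit can be sketched in plain Rust. The helper below is hypothetical and assumes the delimiter match ranges are already known (sorted, non-overlapping byte ranges); it is an illustration of the semantics described above, not the crate's implementation.

```rust
/// Hypothetical helper: like a split over known delimiter ranges, but
/// yielding at most `limit` pieces. The unsplit remainder is the last piece.
fn splitn_by_matches<'h>(
    haystack: &'h str,
    matches: &[(usize, usize)],
    limit: usize,
) -> Vec<&'h str> {
    let mut pieces = Vec::new();
    if limit == 0 {
        return pieces; // a limit of 0 yields no substrings at all
    }
    let mut last_end = 0;
    // Only the first `limit - 1` delimiters are consumed.
    for &(start, end) in matches.iter().take(limit - 1) {
        pieces.push(&haystack[last_end..start]);
        last_end = end;
    }
    // Whatever is left, including any later delimiters, is the final piece.
    pieces.push(&haystack[last_end..]);
    pieces
}
```

With a limit of 1 no delimiter is consumed, so the whole haystack comes back as a single piece.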

§Time complexity

Since iterating over all matches requires running potentially many searches on the haystack, and since each search has worst case O(m * n) time complexity, the overall worst case time complexity for this routine is O(m * n^2).

Note, however, that the limit parameter places an upper bound on the worst case time here.

§Example

Get the first two words in some haystack:

use regex::Regex;

let re = Regex::new(r"\W+").unwrap();
let hay = "Hey! How are you?";
let fields: Vec<&str> = re.splitn(hay, 3).collect();
assert_eq!(fields, vec!["Hey", "How", "are you?"]);

§Examples: more cases

use regex::Regex;

let re = Regex::new(r" ").unwrap();
let hay = "Mary had a little lamb";
let got: Vec<&str> = re.splitn(hay, 3).collect();
assert_eq!(got, vec!["Mary", "had", "a little lamb"]);

let re = Regex::new(r"X").unwrap();
let hay = "";
let got: Vec<&str> = re.splitn(hay, 3).collect();
assert_eq!(got, vec![""]);

let re = Regex::new(r"X").unwrap();
let hay = "lionXXtigerXleopard";
let got: Vec<&str> = re.splitn(hay, 3).collect();
assert_eq!(got, vec!["lion", "", "tigerXleopard"]);

let re = Regex::new(r"::").unwrap();
let hay = "lion::tiger::leopard";
let got: Vec<&str> = re.splitn(hay, 2).collect();
assert_eq!(got, vec!["lion", "tiger::leopard"]);

let re = Regex::new(r"X").unwrap();
let hay = "abcXdef";
let got: Vec<&str> = re.splitn(hay, 1).collect();
assert_eq!(got, vec!["abcXdef"]);

let re = Regex::new(r"X").unwrap();
let hay = "abcdef";
let got: Vec<&str> = re.splitn(hay, 2).collect();
assert_eq!(got, vec!["abcdef"]);

let re = Regex::new(r"X").unwrap();
let hay = "abcXdef";
let got: Vec<&str> = re.splitn(hay, 0).collect();
assert!(got.is_empty());

pub fn replace<'h, R>(&self, haystack: &'h str, rep: R) -> Cow<'h, str>
where R: Replacer,

Replaces the leftmost-first match in the given haystack with the replacement provided. The replacement can be a regular string (where $N and $name are expanded to match capture groups) or a function that takes a [Captures] and returns the replaced string.

If no match is found, then the haystack is returned unchanged. In that case, this implementation will likely return a Cow::Borrowed value such that no allocation is performed.

When a Cow::Borrowed is returned, the value returned is guaranteed to be equivalent to the haystack given.
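The borrowed-versus-owned behavior can be illustrated with a small std-only sketch. replace_first is a hypothetical helper that works on a literal needle rather than a regex; it only shows the general pattern of returning the input untouched when nothing matched.

```rust
use std::borrow::Cow;

/// Hypothetical helper: replace the first occurrence of a literal `needle`.
fn replace_first<'h>(haystack: &'h str, needle: &str, rep: &str) -> Cow<'h, str> {
    match haystack.find(needle) {
        // No match: hand back the input itself, with no allocation.
        None => Cow::Borrowed(haystack),
        Some(start) => {
            let end = start + needle.len();
            let mut out = String::with_capacity(haystack.len() - needle.len() + rep.len());
            out.push_str(&haystack[..start]);
            out.push_str(rep);
            out.push_str(&haystack[end..]);
            Cow::Owned(out)
        }
    }
}
```

Callers that only read the result can treat both variants as a &str; calling into_owned() on the result produces a String when ownership is needed.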

§Replacement string syntax

All instances of $ref in the replacement string are replaced with the substring corresponding to the capture group identified by ref.

ref may be an integer corresponding to the index of the capture group (counted by order of opening parenthesis where 0 is the entire match) or it can be a name (consisting of letters, digits or underscores) corresponding to a named capture group.

If ref isn’t a valid capture group (whether the name doesn’t exist or isn’t a valid index), then it is replaced with the empty string.

The longest possible name is used. For example, $1a looks up the capture group named 1a and not the capture group at index 1. To exert more precise control over the name, use braces, e.g., ${1}a.

To write a literal $ use $$.
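The rules above can be sketched as a small std-only expander. Everything here is an illustrative assumption: expand and lookup are hypothetical helpers, group text is passed in as a slice (by index) and a map (by name), and a $ that does not start a valid reference is kept literally, a case the rules above do not spell out.

```rust
use std::collections::HashMap;

/// Resolve a reference: integers are group indices, anything else is a name.
/// Unknown references resolve to the empty string.
fn lookup<'a>(r: &str, by_index: &[&'a str], by_name: &HashMap<&str, &'a str>) -> &'a str {
    match r.parse::<usize>() {
        Ok(i) => by_index.get(i).copied().unwrap_or(""),
        Err(_) => by_name.get(r).copied().unwrap_or(""),
    }
}

/// Hypothetical expander for `$ref`, `${ref}` and `$$` in a template.
fn expand(template: &str, by_index: &[&str], by_name: &HashMap<&str, &str>) -> String {
    let is_name_char = |c: char| c.is_ascii_alphanumeric() || c == '_';
    let mut out = String::new();
    let mut rest = template;
    while let Some(pos) = rest.find('$') {
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        if let Some(tail) = after.strip_prefix('$') {
            out.push('$'); // `$$` is a literal dollar sign
            rest = tail;
        } else if let Some((name, tail)) =
            after.strip_prefix('{').and_then(|s| s.split_once('}'))
        {
            out.push_str(lookup(name, by_index, by_name)); // braces delimit the name
            rest = tail;
        } else {
            // Bare `$ref`: take the longest run of letters, digits, underscores.
            let len = after.find(|c: char| !is_name_char(c)).unwrap_or(after.len());
            if len == 0 {
                out.push('$'); // assumption: not a reference, keep the `$`
            } else {
                out.push_str(lookup(&after[..len], by_index, by_name));
            }
            rest = &after[len..];
        }
    }
    out.push_str(rest);
    out
}
```

Note how the longest-name rule falls out of the scan: in $first_$second the reference is first_, which does not exist and therefore expands to nothing.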

§Example

Note that this function is polymorphic with respect to the replacement. In typical usage, this can just be a normal string:

use regex::Regex;

let re = Regex::new(r"[^01]+").unwrap();
assert_eq!(re.replace("1078910", ""), "1010");

But anything satisfying the [Replacer] trait will work. For example, a closure of type |&Captures| -> String provides direct access to the captures corresponding to a match. This allows one to access capturing group matches easily:

use regex::{Captures, Regex};

let re = Regex::new(r"([^,\s]+),\s+(\S+)").unwrap();
let result = re.replace("Springsteen, Bruce", |caps: &Captures| {
    format!("{} {}", &caps[2], &caps[1])
});
assert_eq!(result, "Bruce Springsteen");

But this is a bit cumbersome to use all the time. Instead, a simple syntax is supported (as described above) that expands $name into the corresponding capture group. Here’s the last example, but using this expansion technique with named capture groups:

use regex::Regex;

let re = Regex::new(r"(?<last>[^,\s]+),\s+(?<first>\S+)").unwrap();
let result = re.replace("Springsteen, Bruce", "$first $last");
assert_eq!(result, "Bruce Springsteen");

Note that using $2 instead of $first or $1 instead of $last would produce the same result. To write a literal $ use $$.

Sometimes the replacement string requires use of curly braces to delineate a capture group replacement when it is adjacent to some other literal text. For example, if we wanted to join two words together with an underscore:

use regex::Regex;

let re = Regex::new(r"(?<first>\w+)\s+(?<second>\w+)").unwrap();
let result = re.replace("deep fried", "${first}_$second");
assert_eq!(result, "deep_fried");

Without the curly braces, the capture group name first_ would be used, and since it doesn’t exist, it would be replaced with the empty string.

Finally, sometimes you just want to replace a literal string with no regard for capturing group expansion. This can be done by wrapping a string with [NoExpand]:

use regex::{NoExpand, Regex};

let re = Regex::new(r"(?<last>[^,\s]+),\s+(\S+)").unwrap();
let result = re.replace("Springsteen, Bruce", NoExpand("$2 $last"));
assert_eq!(result, "$2 $last");

Using NoExpand may also be faster, since the replacement string won’t need to be parsed for the $ syntax.

pub fn replace_all<'h, R>(&self, haystack: &'h str, rep: R) -> Cow<'h, str>
where R: Replacer,

Replaces all non-overlapping matches in the haystack with the replacement provided. This is the same as calling replacen with limit set to 0.

If no match is found, then the haystack is returned unchanged. In that case, this implementation will likely return a Cow::Borrowed value such that no allocation is performed.

When a Cow::Borrowed is returned, the value returned is guaranteed to be equivalent to the haystack given.

The documentation for [Regex::replace] goes into more detail about what kinds of replacement strings are supported.

§Time complexity

Since iterating over all matches requires running potentially many searches on the haystack, and since each search has worst case O(m * n) time complexity, the overall worst case time complexity for this routine is O(m * n^2).

§Fallibility

If you need to write a replacement routine where any individual replacement might “fail,” doing so with this API isn’t really feasible because there’s no way to stop the search process if a replacement fails. Instead, if you need this functionality, you should consider implementing your own replacement routine:

use regex::{Captures, Regex};

fn replace_all<E>(
    re: &Regex,
    haystack: &str,
    replacement: impl Fn(&Captures) -> Result<String, E>,
) -> Result<String, E> {
    let mut new = String::with_capacity(haystack.len());
    let mut last_match = 0;
    for caps in re.captures_iter(haystack) {
        let m = caps.get(0).unwrap();
        new.push_str(&haystack[last_match..m.start()]);
        new.push_str(&replacement(&caps)?);
        last_match = m.end();
    }
    new.push_str(&haystack[last_match..]);
    Ok(new)
}

// Let's replace each word with the number of bytes in that word.
// But if we see a word that is "too long," we'll give up.
let re = Regex::new(r"\w+").unwrap();
let replacement = |caps: &Captures| -> Result<String, &'static str> {
    if caps[0].len() >= 5 {
        return Err("word too long");
    }
    Ok(caps[0].len().to_string())
};
assert_eq!(
    Ok("2 3 3 3?".to_string()),
    replace_all(&re, "hi how are you?", &replacement),
);
assert!(replace_all(&re, "hi there", &replacement).is_err());

§Example

This example shows how to flip the order of fields delimited by whitespace (excluding line terminators), and how to normalize the whitespace that delimits the fields:

use regex::Regex;

let re = Regex::new(r"(?m)^(\S+)[\s--\r\n]+(\S+)$").unwrap();
let hay = "
Greetings  1973
Wild\t1973
BornToRun\t\t\t\t1975
Darkness                    1978
TheRiver 1980
";
let new = re.replace_all(hay, "$2 $1");
assert_eq!(new, "
1973 Greetings
1973 Wild
1975 BornToRun
1978 Darkness
1980 TheRiver
");

pub fn replacen<'h, R>( &self, haystack: &'h str, limit: usize, rep: R, ) -> Cow<'h, str>
where R: Replacer,

Replaces at most limit non-overlapping matches in the haystack with the replacement provided. If limit is 0, then all non-overlapping matches are replaced. That is, Regex::replace_all(hay, rep) is equivalent to Regex::replacen(hay, 0, rep).
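The meaning of a zero limit differs from the standard library's str::replacen, where a count of 0 replaces nothing. The std-only sketch below mimics the convention described above for a literal needle; replacen_literal is a hypothetical helper used only to contrast the two conventions.

```rust
/// Hypothetical literal-needle analogue of the limit convention above:
/// `limit == 0` means "replace every match".
fn replacen_literal(haystack: &str, needle: &str, limit: usize, rep: &str) -> String {
    if limit == 0 {
        haystack.replace(needle, rep) // no limit: replace all occurrences
    } else {
        haystack.replacen(needle, rep, limit) // std: replace the first `limit`
    }
}
```

Code ported from str::replacen should therefore double-check any call site that can pass 0 as the limit.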

If no match is found, then the haystack is returned unchanged. In that case, this implementation will likely return a Cow::Borrowed value such that no allocation is performed.

When a Cow::Borrowed is returned, the value returned is guaranteed to be equivalent to the haystack given.

The documentation for [Regex::replace] goes into more detail about what kinds of replacement strings are supported.

§Time complexity

Since iterating over all matches requires running potentially many searches on the haystack, and since each search has worst case O(m * n) time complexity, the overall worst case time complexity for this routine is O(m * n^2).

Note, however, that the limit parameter places an upper bound on the worst case time here.

§Fallibility

See the corresponding section in the docs for [Regex::replace_all] for tips on how to deal with a replacement routine that can fail.

§Example

This example shows how to flip the order of fields delimited by whitespace (excluding line terminators), and how to normalize the whitespace that delimits the fields. Here, this is done only for the first two matches.

use regex::Regex;

let re = Regex::new(r"(?m)^(\S+)[\s--\r\n]+(\S+)$").unwrap();
let hay = "
Greetings  1973
Wild\t1973
BornToRun\t\t\t\t1975
Darkness                    1978
TheRiver 1980
";
let new = re.replacen(hay, 2, "$2 $1");
assert_eq!(new, "
1973 Greetings
1973 Wild
BornToRun\t\t\t\t1975
Darkness                    1978
TheRiver 1980
");

pub fn shortest_match(&self, haystack: &str) -> Option<usize>

Returns the end byte offset of the first match in the haystack given.

This method may have the same performance characteristics as is_match. Behaviorally, it doesn’t just report whether a match occurs, but also the end offset for a match. In particular, the offset returned may be shorter than the proper end of the leftmost-first match that you would find via [Regex::find].

Note that it is not guaranteed that this routine finds the shortest or “earliest” possible match. Instead, the main idea of this API is that it returns the offset at the point at which the internal regex engine has determined that a match has occurred. This may vary depending on which internal regex engine is used, and thus, the offset itself may change based on internal heuristics.

§Example

Typically, a+ would match the entire first sequence of a in some haystack, but shortest_match may give up as soon as it sees the first a.

use regex::Regex;

let re = Regex::new(r"a+").unwrap();
let offset = re.shortest_match("aaaaa").unwrap();
assert_eq!(offset, 1);

pub fn shortest_match_at(&self, haystack: &str, start: usize) -> Option<usize>

Returns the same as [Regex::shortest_match], but starts the search at the given offset.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.

If a match is found, the offset returned is relative to the beginning of the haystack, not the beginning of the search.
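The difference between searching a slice and searching from an offset can be sketched without the regex engine. The helpers below are hypothetical and make simplifying assumptions: the pattern is a fixed, non-empty ASCII word wrapped in word boundaries, and word characters are approximated as ASCII letters, digits and underscore.

```rust
/// ASCII-only approximation of a word character (an assumption made for
/// this sketch; real word boundaries are Unicode-aware by default).
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Hypothetical analogue of an "at" search for a whole word: the scan begins
/// at `start`, but boundary checks may look at bytes before `start`.
/// Assumes `word` is non-empty ASCII. Panics if `start > haystack.len()`.
fn find_word_at(haystack: &str, word: &str, start: usize) -> Option<(usize, usize)> {
    let bytes = haystack.as_bytes();
    let mut from = start;
    while let Some(pos) = haystack[from..].find(word) {
        let (s, e) = (from + pos, from + pos + word.len());
        let left_ok = s == 0 || !is_word_byte(bytes[s - 1]);
        let right_ok = e == bytes.len() || !is_word_byte(bytes[e]);
        if left_ok && right_ok {
            return Some((s, e)); // offsets are relative to the full haystack
        }
        from = s + 1; // keep scanning after this rejected candidate
    }
    None
}
```

Slicing the haystack first discards the bytes before the slice, so a boundary check at the start of the slice trivially succeeds; passing the full haystack plus a start offset keeps that context available.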

§Panics

This panics when start >= haystack.len() + 1.

§Example

This example shows the significance of start by demonstrating how it can be used to permit look-around assertions in a regex to take the surrounding context into account.

use regex::Regex;

let re = Regex::new(r"\bchew\b").unwrap();
let hay = "eschew";
// We get a match here, but it's probably not intended.
assert_eq!(re.shortest_match(&hay[2..]), Some(4));
// No match because the \b assertions take the context into account.
assert_eq!(re.shortest_match_at(hay, 2), None);

pub fn is_match_at(&self, haystack: &str, start: usize) -> bool

Returns the same as [Regex::is_match], but starts the search at the given offset.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.

§Panics

This panics when start >= haystack.len() + 1.

§Example

This example shows the significance of start by demonstrating how it can be used to permit look-around assertions in a regex to take the surrounding context into account.

use regex::Regex;

let re = Regex::new(r"\bchew\b").unwrap();
let hay = "eschew";
// We get a match here, but it's probably not intended.
assert!(re.is_match(&hay[2..]));
// No match because the \b assertions take the context into account.
assert!(!re.is_match_at(hay, 2));

pub fn find_at<'h>(&self, haystack: &'h str, start: usize) -> Option<Match<'h>>

Returns the same as [Regex::find], but starts the search at the given offset.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.

§Panics

This panics when start >= haystack.len() + 1.

§Example

This example shows the significance of start by demonstrating how it can be used to permit look-around assertions in a regex to take the surrounding context into account.

use regex::Regex;

let re = Regex::new(r"\bchew\b").unwrap();
let hay = "eschew";
// We get a match here, but it's probably not intended.
assert_eq!(re.find(&hay[2..]).map(|m| m.range()), Some(0..4));
// No match because the \b assertions take the context into account.
assert_eq!(re.find_at(hay, 2), None);

pub fn captures_at<'h>( &self, haystack: &'h str, start: usize, ) -> Option<Captures<'h>>

Returns the same as [Regex::captures], but starts the search at the given offset.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.

§Panics

This panics when start >= haystack.len() + 1.

§Example

This example shows the significance of start by demonstrating how it can be used to permit look-around assertions in a regex to take the surrounding context into account.

use regex::Regex;

let re = Regex::new(r"\bchew\b").unwrap();
let hay = "eschew";
// We get a match here, but it's probably not intended.
assert_eq!(&re.captures(&hay[2..]).unwrap()[0], "chew");
// No match because the \b assertions take the context into account.
assert!(re.captures_at(hay, 2).is_none());

pub fn captures_read<'h>( &self, locs: &mut CaptureLocations, haystack: &'h str, ) -> Option<Match<'h>>

This is like [Regex::captures], but writes the byte offsets of each capture group match into the locations given.

A [CaptureLocations] stores the same byte offsets as a [Captures], but does not store a reference to the haystack. This makes its API a bit lower level and less convenient. But in exchange, callers may allocate their own CaptureLocations and reuse it for multiple searches. This may be helpful if allocating a Captures shows up in a profile as too costly.
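The trade-off described here, offsets only and a caller-owned buffer that is reused, can be illustrated with a std-only sketch. Locations and read_key_value are hypothetical stand-ins, and the matcher is hand-written for the single shape ^([a-z]+)=(\S*)$ rather than a general regex.

```rust
/// Hypothetical offsets-only buffer: one optional `(start, end)` per group.
type Locations = Vec<Option<(usize, usize)>>;

/// Hand-rolled matcher for the shape `^([a-z]+)=(\S*)$`. On success, group
/// offsets are written into the caller's buffer, reusing its allocation.
fn read_key_value(locs: &mut Locations, line: &str) -> bool {
    locs.clear(); // drops old offsets but keeps the allocated capacity
    let eq = match line.find('=') {
        Some(eq) => eq,
        None => return false,
    };
    let (key, value) = (&line[..eq], &line[eq + 1..]);
    let key_ok = !key.is_empty() && key.bytes().all(|b| b.is_ascii_lowercase());
    let value_ok = !value.chars().any(char::is_whitespace);
    if !(key_ok && value_ok) {
        return false;
    }
    // Index 0 is the overall match, followed by the two groups.
    locs.extend([Some((0, line.len())), Some((0, eq)), Some((eq + 1, line.len()))]);
    true
}
```

Because the buffer holds no reference to the haystack, the same value can be passed to many searches over different inputs; the caller slices the haystack with the stored offsets when the text itself is needed.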

To create a CaptureLocations value, use the [Regex::capture_locations] method.

This also returns the overall match if one was found. When a match is found, its offsets are also always stored in locs at index 0.

§Panics

This routine may panic if the given CaptureLocations was not created by this regex.

§Example

use regex::Regex;

let re = Regex::new(r"^([a-z]+)=(\S*)$").unwrap();
let mut locs = re.capture_locations();
assert!(re.captures_read(&mut locs, "id=foo123").is_some());
assert_eq!(Some((0, 9)), locs.get(0));
assert_eq!(Some((0, 2)), locs.get(1));
assert_eq!(Some((3, 9)), locs.get(2));

pub fn captures_read_at<'h>( &self, locs: &mut CaptureLocations, haystack: &'h str, start: usize, ) -> Option<Match<'h>>

Returns the same as [Regex::captures_read], but starts the search at the given offset.

The significance of the starting point is that it takes the surrounding context into consideration. For example, the \A anchor can only match when start == 0.

§Panics

This panics when start >= haystack.len() + 1.

This routine may also panic if the given CaptureLocations was not created by this regex.

§Example

This example shows the significance of start by demonstrating how it can be used to permit look-around assertions in a regex to take the surrounding context into account.

use regex::Regex;

let re = Regex::new(r"\bchew\b").unwrap();
let hay = "eschew";
let mut locs = re.capture_locations();
// We get a match here, but it's probably not intended.
assert!(re.captures_read(&mut locs, &hay[2..]).is_some());
// No match because the \b assertions take the context into account.
assert!(re.captures_read_at(&mut locs, hay, 2).is_none());

pub fn as_str(&self) -> &str

Returns the original string of this regex.

§Example

use regex::Regex;

let re = Regex::new(r"foo\w+bar").unwrap();
assert_eq!(re.as_str(), r"foo\w+bar");

pub fn capture_names(&self) -> CaptureNames<'_>

Returns an iterator over the capture names in this regex.

The iterator returned yields elements of type Option<&str>. That is, the iterator yields values for all capture groups, even ones that are unnamed. The order of the groups corresponds to the order of the group’s corresponding opening parenthesis.

The first element of the iterator always yields the group corresponding to the overall match, and this group is always unnamed. Therefore, the iterator always yields at least one group.

§Example

This shows basic usage with a mix of named and unnamed capture groups:

use regex::Regex;

let re = Regex::new(r"(?<a>.(?<b>.))(.)(?:.)(?<c>.)").unwrap();
let mut names = re.capture_names();
assert_eq!(names.next(), Some(None));
assert_eq!(names.next(), Some(Some("a")));
assert_eq!(names.next(), Some(Some("b")));
assert_eq!(names.next(), Some(None));
// the '(?:.)' group is non-capturing and so doesn't appear here!
assert_eq!(names.next(), Some(Some("c")));
assert_eq!(names.next(), None);

The iterator always yields at least one element, even for regexes with no capture groups and even for regexes that can never match:

use regex::Regex;

let re = Regex::new(r"").unwrap();
let mut names = re.capture_names();
assert_eq!(names.next(), Some(None));
assert_eq!(names.next(), None);

let re = Regex::new(r"[a&&b]").unwrap();
let mut names = re.capture_names();
assert_eq!(names.next(), Some(None));
assert_eq!(names.next(), None);

pub fn captures_len(&self) -> usize

Returns the number of capture groups in this regex.

This includes all named and unnamed groups, including the implicit unnamed group that is always present and corresponds to the entire match.

Since the implicit unnamed group is always included in this length, the length returned is guaranteed to be greater than zero.

§Example

use regex::Regex;

let re = Regex::new(r"foo").unwrap();
assert_eq!(1, re.captures_len());

let re = Regex::new(r"(foo)").unwrap();
assert_eq!(2, re.captures_len());

let re = Regex::new(r"(?<a>.(?<b>.))(.)(?:.)(?<c>.)").unwrap();
assert_eq!(5, re.captures_len());

let re = Regex::new(r"[a&&b]").unwrap();
assert_eq!(1, re.captures_len());

pub fn static_captures_len(&self) -> Option<usize>

Returns the total number of capturing groups that appear in every possible match.

If the number of capture groups can vary depending on the match, then this returns None. That is, a value is only returned when the number of matching groups is invariant or “static.”

Note that like [Regex::captures_len], this does include the implicit capturing group corresponding to the entire match. Therefore, when a non-None value is returned, it is guaranteed to be at least 1. Stated differently, a return value of Some(0) is impossible.

§Example

This shows a few cases where a static number of capture groups is available and a few cases where it is not.

use regex::Regex;

let len = |pattern| {
    Regex::new(pattern).map(|re| re.static_captures_len())
};

assert_eq!(Some(1), len("a").unwrap());
assert_eq!(Some(2), len("(a)").unwrap());
assert_eq!(Some(2), len("(a)|(b)").unwrap());
assert_eq!(Some(3), len("(a)(b)|(c)(d)").unwrap());
assert_eq!(None, len("(a)|b").unwrap());
assert_eq!(None, len("a|(b)").unwrap());
assert_eq!(None, len("(b)*").unwrap());
assert_eq!(Some(2), len("(b)+").unwrap());

pub fn capture_locations(&self) -> CaptureLocations

Returns a freshly allocated set of capture locations that can be reused in multiple calls to [Regex::captures_read] or [Regex::captures_read_at].

The returned locations can be used for any subsequent search for this particular regex. There is no guarantee that it is correct to use for other regexes, even if they have the same number of capture groups.

§Example
use regex::Regex;

let re = Regex::new(r"(.)(.)(\w+)").unwrap();
let mut locs = re.capture_locations();
assert!(re.captures_read(&mut locs, "Padron").is_some());
assert_eq!(locs.get(0), Some((0, 6)));
assert_eq!(locs.get(1), Some((0, 1)));
assert_eq!(locs.get(2), Some((1, 2)));
assert_eq!(locs.get(3), Some((2, 6)));

Trait Implementations§

impl AsRef<Regex> for Regex

fn as_ref(&self) -> &Regex

Converts this type into a shared reference of the (usually inferred) input type.

impl Clone for Regex

fn clone(&self) -> Regex

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Regex

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

Formats the value using the given formatter.

impl Deref for Regex

type Target = Regex

The resulting type after dereferencing.

fn deref(&self) -> &Self::Target

Dereferences the value.
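
The `Deref` implementation is what makes the methods listed under "Methods from Deref<Target = Regex>" callable on this type. The private fields are not shown on this page, so the following is only a sketch of the general newtype-plus-`Deref` mechanism: `Engine`, `Wrapper`, and `pattern_len` are hypothetical names, and the toy `is_match` does substring search rather than regex matching.

```rust
use std::ops::Deref;

/// Stand-in for the wrapped type (hypothetical).
struct Engine {
    pattern: String,
}

impl Engine {
    fn as_str(&self) -> &str {
        &self.pattern
    }

    /// Toy "match": plain substring containment, not regex matching.
    fn is_match(&self, haystack: &str) -> bool {
        haystack.contains(self.pattern.as_str())
    }
}

/// Newtype wrapper (hypothetical) that exposes `Engine` through `Deref`.
struct Wrapper(Engine);

impl Deref for Wrapper {
    type Target = Engine;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

/// A function that only knows about the inner type.
fn pattern_len(engine: &Engine) -> usize {
    engine.as_str().len()
}

fn main() {
    let w = Wrapper(Engine { pattern: String::from("foo") });

    // Method calls auto-dereference, so `Engine` methods work on `Wrapper`.
    assert!(w.is_match("a foo b"));
    assert!(!w.is_match("bar"));

    // Deref coercion turns `&Wrapper` into `&Engine` at the call site.
    assert_eq!(pattern_len(&w), 3);
}
```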

impl<'de> Deserialize<'de> for Regex

fn deserialize<D>(deserializer: D) -> Result<Regex, D::Error>
where D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl FromStr for Regex

type Err = String

The associated error which can be returned from parsing.

fn from_str(s: &str) -> Result<Self, Self::Err>

Parses a string s to return a value of this type.
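
Because `Err = String`, callers of `str::parse` receive a plain error message instead of a structured error type. The sketch below shows that calling pattern on a stand-in type: `Pattern` is a hypothetical newtype, and its parenthesis-balancing check is an assumption used in place of real regex compilation.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for a type whose `FromStr::Err` is `String`.
#[derive(Debug, PartialEq)]
struct Pattern(String);

impl FromStr for Pattern {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Toy validation only: reject unbalanced parentheses.
        let mut depth = 0i32;
        for c in s.chars() {
            match c {
                '(' => depth += 1,
                ')' => depth -= 1,
                _ => {}
            }
            if depth < 0 {
                return Err(format!("unopened group in {:?}", s));
            }
        }
        if depth != 0 {
            return Err(format!("unclosed group in {:?}", s));
        }
        Ok(Pattern(s.to_string()))
    }
}

fn main() {
    // `str::parse` dispatches to `FromStr::from_str`.
    let ok: Result<Pattern, String> = "(foo)".parse();
    assert_eq!(ok, Ok(Pattern(String::from("(foo)"))));

    // On failure the error is an ordinary `String`.
    let err = "(foo".parse::<Pattern>().unwrap_err();
    assert!(err.contains("unclosed"));
}
```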

impl PartialEq for Regex

fn eq(&self, other: &Self) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Serialize for Regex

fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where S: Serializer,

Serialize this value into the given Serde serializer.

Auto Trait Implementations§

impl Freeze for Regex

impl RefUnwindSafe for Regex

impl Send for Regex

impl Sync for Regex

impl Unpin for Regex

impl UnwindSafe for Regex

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dst: *mut T)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dst.

impl<T> DynClone for T
where T: Clone,

fn __clone_box(&self, _: Private) -> *mut ()

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> FromRef<T> for T
where T: Clone,

fn from_ref(input: &T) -> T

Converts to this type from a reference to the input type.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided [Span], returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.
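
As an illustration of this blanket implementation, the sketch below defines only `From` and gets `Into` for free. `PatternText` and `text_len` are hypothetical names that are not part of this crate.

```rust
/// Hypothetical newtype used only for this illustration.
#[derive(Debug, PartialEq)]
struct PatternText(String);

impl From<&str> for PatternText {
    fn from(s: &str) -> Self {
        PatternText(s.to_string())
    }
}

/// Accepts anything that converts into `PatternText`.
fn text_len(p: impl Into<PatternText>) -> usize {
    let text: PatternText = p.into();
    text.0.len()
}

fn main() {
    // `Into<PatternText> for &str` comes from the blanket impl.
    let a: PatternText = "foo".into();
    assert_eq!(a, PatternText::from("foo"));
    assert_eq!(text_len("(foo)"), 5);

    // `From<T> for T` makes a value convertible into its own type as well.
    assert_eq!(text_len(PatternText::from("ab")), 2);
}
```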

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> IntoRequest<T> for T

fn into_request(self) -> Request<T>

Wrap the input message T in a tonic::Request

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
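
With this blanket implementation, any type that has an infallible `From`/`Into` conversion also gets a `TryFrom` whose error type is `Infallible`, so the `Err` case can never occur. A minimal sketch, where `Meters` and `to_meters` are hypothetical names chosen for the illustration:

```rust
use std::convert::{Infallible, TryFrom};

/// Hypothetical newtype with an infallible conversion from `u32`.
#[derive(Debug, PartialEq)]
struct Meters(u32);

impl From<u32> for Meters {
    fn from(v: u32) -> Self {
        Meters(v)
    }
}

/// `TryFrom<u32> for Meters` is supplied by the blanket impl.
fn to_meters(v: u32) -> Result<Meters, Infallible> {
    Meters::try_from(v)
}

fn main() {
    assert_eq!(to_meters(7), Ok(Meters(7)));

    // `Infallible` has no values, so the error arm is statically unreachable.
    let Meters(value) = match to_meters(42) {
        Ok(m) => m,
        Err(never) => match never {},
    };
    assert_eq!(value, 42);
}
```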

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a [WithDispatch] wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a [WithDispatch] wrapper.

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,