Init.Data.String.Basic

List.asString definition

(s : List Char) : String

Full source

/--
Creates a string that contains the characters in a list, in order.

Examples:
 * `['L', '∃', '∀', 'N'].asString = "L∃∀N"`
 * `[].asString = ""`
 * `['a', 'a', 'a'].asString = "aaa"`
-/
def List.asString (s : List Char) : String :=
  ⟨s⟩

String from list of characters

Informal description

Given a list of Unicode characters `s`, the function returns a string formed by concatenating these characters in order. For example, the list `['L', '∃', '∀', 'N']` produces the string `"L∃∀N"`, an empty list produces the empty string `""`, and the list `['a', 'a', 'a']` produces the string `"aaa"`.

String.instOfNatPos instance

: OfNat String.Pos (nat_lit 0)

Full source

instance : OfNat String.Pos (nat_lit 0) where
  ofNat := {}

Zero as a Byte Position in UTF-8 Strings

Informal description

The byte position `0` in a UTF-8 encoded string can be interpreted as an element of type `String.Pos` via the `OfNat` typeclass.

String.instLT instance

: LT String

Full source

instance : LT String :=
  ⟨fun s₁ s₂ => s₁.data < s₂.data⟩

Lexicographic Order on Strings

Informal description

The type of strings is equipped with a canonical less-than relation, where for any two strings $s_1$ and $s_2$, $s_1 < s_2$ holds if and only if $s_1$ is lexicographically less than $s_2$ when compared character by character using the Unicode code point order.

String.decidableLT instance

(s₁ s₂ : @& String) : Decidable (s₁ < s₂)

Full source

@[extern "lean_string_dec_lt"]
instance decidableLT (s₁ s₂ : @& String) : Decidable (s₁ < s₂) :=
  List.decidableLT s₁.data s₂.data

Decidability of Lexicographic Order on Strings

Informal description

For any two strings $s_1$ and $s_2$, the lexicographic order relation $s_1 < s_2$ is decidable. Here, $s_1 < s_2$ holds if and only if $s_1$ is lexicographically less than $s_2$ when compared character by character using the Unicode code point order.

String.decLt abbrev

Full source

@[deprecated decidableLT (since := "2024-12-13")] abbrev decLt := @decidableLT

Decidability of Lexicographic Order on Strings

Informal description

The abbreviation `String.decLt` provides a decision procedure for the lexicographic order relation on strings. For any two strings $s_1$ and $s_2$, it constructs a proof that $s_1 < s_2$ is decidable, meaning it can determine whether $s_1$ is lexicographically less than $s_2$ based on their Unicode code point order.

String.le definition

(a b : String) : Prop

Full source

/--
Non-strict inequality on strings, typically used via the `≤` operator.

`a ≤ b` is defined to mean `¬ b < a`.
-/
@[reducible] protected def le (a b : String) : Prop := ¬ b < a

Non-strict string inequality

Informal description

The non-strict inequality relation on strings, where for any two strings $ a $ and $ b $, $ a \leq b $ is defined to mean $ \neg (b < a) $. This is typically used via the $ \leq $ operator.

String.instLE instance

: LE String

Full source

instance : LE String :=
  ⟨String.le⟩

The Canonical Order on Strings

Informal description

The type of strings is equipped with a canonical "less than or equal to" relation, where for any two strings $a$ and $b$, $a \leq b$ is defined as $\neg (b < a)$ (i.e., $b$ is not lexicographically less than $a$).

String.decLE instance

(s₁ s₂ : String) : Decidable (s₁ ≤ s₂)

Full source

instance decLE (s₁ s₂ : String) : Decidable (s₁ ≤ s₂) :=
  inferInstanceAs (Decidable (Not _))

Decidability of Lexicographic Order on Strings

Informal description

For any two strings $s_1$ and $s_2$, the non-strict lexicographic order relation $s_1 \leq s_2$ is decidable. Here, $s_1 \leq s_2$ holds if and only if $s_1$ is lexicographically less than or equal to $s_2$ when compared character by character using the Unicode code point order.

String.length definition

: (@& String) → Nat

Full source

/--
Returns the length of a string in Unicode code points.

Examples:
* `"".length = 0`
* `"abc".length = 3`
* `"L∃∀N".length = 4`
-/
@[extern "lean_string_length"]
def length : (@& String) → Nat
  | ⟨s⟩ => s.length

Length of a string in Unicode code points

Informal description

The function maps a string $s$ to its length in Unicode code points, denoted by $\text{length}(s)$. For example: - $\text{length}(\texttt{""}) = 0$ - $\text{length}(\texttt{"abc"}) = 3$ - $\text{length}(\texttt{"L∃∀N"}) = 4$

String.push definition

: String → Char → String

Full source

/--
Adds a character to the end of a string.

The internal implementation uses dynamic arrays and will perform destructive updates
if the string is not shared.

Examples:
* `"abc".push 'd' = "abcd"`
* `"".push 'a' = "a"`
-/
@[extern "lean_string_push"]
def push : String → Char → String
  | ⟨s⟩, c => ⟨s ++ [c]⟩

String append character

Informal description

The function appends a character $c$ to the end of a string $s$, resulting in a new string. Internally, this operation may perform destructive updates if the original string is not shared. Examples: - $\texttt{"abc".push 'd'} = \texttt{"abcd"}$ - $\texttt{"".push 'a'} = \texttt{"a"}$

String.append definition

: String → (@& String) → String

Full source

/--
Appends two strings. Usually accessed via the `++` operator.

The internal implementation will perform destructive updates if the string is not shared.

Examples:
 * `"abc".append "def" = "abcdef"`
 * `"abc" ++ "def" = "abcdef"`
 * `"" ++ "" = ""`
-/
@[extern "lean_string_append"]
def append : String → (@& String) → String
  | ⟨a⟩, ⟨b⟩ => ⟨a ++ b⟩

String concatenation

Informal description

The function concatenates two strings $s_1$ and $s_2$ to produce a new string consisting of the characters of $s_1$ followed by the characters of $s_2$. This operation is typically accessed via the `++` operator. The implementation may perform destructive updates if the input strings are not shared. Examples: - $\texttt{"abc".append "def"} = \texttt{"abcdef"}$ - $\texttt{"abc" ++ "def"} = \texttt{"abcdef"}$ - $\texttt{"" ++ ""} = \texttt{""}$

String.toList definition

(s : String) : List Char

Full source

/--
Converts a string to a list of characters.

Even though the logical model of strings is as a structure that wraps a list of characters, this
operation takes time and space linear in the length of the string. At runtime, strings are
represented as dynamic arrays of bytes.

Examples:
 * `"abc".toList = ['a', 'b', 'c']`
 * `"".toList = []`
 * `"\n".toList = ['\n']`
-/
def toList (s : String) : List Char :=
  s.data

String to list of characters conversion

Informal description

The function converts a string $s$ to a list of its constituent Unicode characters. Although strings are logically modeled as lists of characters, this operation requires time and space proportional to the length of the string due to their runtime representation as dynamic arrays of bytes. Examples: - $\texttt{"abc".toList} = [\texttt{'a'}, \texttt{'b'}, \texttt{'c'}]$ - $\texttt{"".toList} = []$ - $\texttt{"\textbackslash n".toList} = [\texttt{'\textbackslash n'}]$

String.Pos.isValid definition

(s : @& String) (p : @& Pos) : Bool

Full source

/--
Returns `true` if `p` is a valid UTF-8 position in the string `s`.

This means that `p ≤ s.endPos` and `p` lies on a UTF-8 character boundary. At runtime, this
operation takes constant time.

Examples:
 * `String.Pos.isValid "abc" ⟨0⟩ = true`
 * `String.Pos.isValid "abc" ⟨1⟩ = true`
 * `String.Pos.isValid "abc" ⟨3⟩ = true`
 * `String.Pos.isValid "abc" ⟨4⟩ = false`
 * `String.Pos.isValid "𝒫(A)" ⟨0⟩ = true`
 * `String.Pos.isValid "𝒫(A)" ⟨1⟩ = false`
 * `String.Pos.isValid "𝒫(A)" ⟨2⟩ = false`
 * `String.Pos.isValid "𝒫(A)" ⟨3⟩ = false`
 * `String.Pos.isValid "𝒫(A)" ⟨4⟩ = true`
-/
@[extern "lean_string_is_valid_pos"]
def Pos.isValid (s : @&String) (p : @& Pos) : Bool :=
  go s.data 0
where
  go : List Char → Pos → Bool
  | [],    i => i = p
  | c::cs, i => if i = p then true else go cs (i + c)

UTF-8 position validity check

Informal description

Given a string $s$ and a byte position $p$ in $s$, the function returns `true` if $p$ is a valid UTF-8 position in $s$, meaning that $p$ lies on a UTF-8 character boundary and $p \leq \text{endPos}(s)$. This operation runs in constant time at runtime. Examples: - $\text{isValid}("abc", 0) = \text{true}$ - $\text{isValid}("abc", 1) = \text{true}$ - $\text{isValid}("abc", 3) = \text{true}$ - $\text{isValid}("abc", 4) = \text{false}$ - $\text{isValid}("𝒫(A)", 0) = \text{true}$ - $\text{isValid}("𝒫(A)", 1) = \text{false}$ - $\text{isValid}("𝒫(A)", 2) = \text{false}$ - $\text{isValid}("𝒫(A)", 3) = \text{false}$ - $\text{isValid}("𝒫(A)", 4) = \text{true}$

String.utf8GetAux definition

: List Char → Pos → Pos → Char

Full source

def utf8GetAux : List Char → Pos → Pos → Char
  | [],    _, _ => default
  | c::cs, i, p => if i = p then c else utf8GetAux cs (i + c) p

UTF-8 character retrieval auxiliary function

Informal description

The auxiliary function `String.utf8GetAux` takes a list of Unicode characters `cs`, a current byte position `i`, and a target byte position `p`. It returns the character at position `p` in the UTF-8 encoded string represented by `cs`. If the current position `i` matches the target position `p`, it returns the head character `c` of the list. Otherwise, it recursively processes the tail of the list, incrementing the current position by the size of the character `c` in UTF-8 encoding. If the list is empty or the position is invalid, it returns a default character.

String.get definition

(s : @& String) (p : @& Pos) : Char

Full source

/--
Returns the character at position `p` of a string. If `p` is not a valid position, returns the
fallback value `(default : Char)`, which is `'A'`, but does not panic.

This function is overridden with an efficient implementation in runtime code. See
`String.utf8GetAux` for the reference implementation.

Examples:
* `"abc".get ⟨1⟩ = 'b'`
* `"abc".get ⟨3⟩ = (default : Char)` because byte `3` is at the end of the string.
* `"L∃∀N".get ⟨2⟩ = (default : Char)` because byte `2` is in the middle of `'∃'`.
-/
@[extern "lean_string_utf8_get"]
def get (s : @& String) (p : @& Pos) : Char :=
  match s with
  | ⟨s⟩ => utf8GetAux s 0 p

Character retrieval by byte position in a UTF-8 string

Informal description

Given a string `s` and a byte position `p` in UTF-8 encoding, the function returns the Unicode character at position `p` in `s`. If `p` is not a valid position (i.e., it is out of bounds or not on a character boundary), the function returns the default character `'A'` without causing an error.

String.utf8GetAux? definition

: List Char → Pos → Pos → Option Char

Full source

def utf8GetAux? : List Char → Pos → Pos → Option Char
  | [],    _, _ => none
  | c::cs, i, p => if i = p then c else utf8GetAux? cs (i + c) p

UTF-8 character lookup by byte position (auxiliary function)

Informal description

The auxiliary function takes a list of Unicode characters `cs`, a current byte position `i`, and a target byte position `p`. It returns the character at position `p` in the UTF-8 encoded string represented by `cs` if such a character exists, otherwise it returns `none`. The function checks if the current position `i` matches the target position `p`; if so, it returns the corresponding character, otherwise it recursively processes the next character in the list, updating the current position by the size of the current character.

String.get? definition

: (@& String) → (@& Pos) → Option Char

Full source

/--
Returns the character at position `p` of a string. If `p` is not a valid position, returns `none`.

This function is overridden with an efficient implementation in runtime code. See
`String.utf8GetAux?` for the reference implementation.

Examples:
* `"abc".get? ⟨1⟩ = some 'b'`
* `"abc".get? ⟨3⟩ = none`
* `"L∃∀N".get? ⟨1⟩ = some '∃'`
* `"L∃∀N".get? ⟨2⟩ = none`
-/
@[extern "lean_string_utf8_get_opt"]
def get? : (@& String) → (@& Pos) → Option Char
  | ⟨s⟩, p => utf8GetAux? s 0 p

Character lookup by byte position in a UTF-8 string

Informal description

Given a string $s$ and a byte position $p$ in $s$, this function returns the Unicode character at position $p$ if $p$ is a valid position within $s$, otherwise it returns `none`. The position $p$ must lie on a UTF-8 character boundary to be considered valid.

String.get! definition

(s : @& String) (p : @& Pos) : Char

Full source

/--
Returns the character at position `p` of a string. Panics if `p` is not a valid position.

See `String.get?` for a safer alternative.

This function is overridden with an efficient implementation in runtime code. See
`String.utf8GetAux` for the reference implementation.

Examples
* `"abc".get! ⟨1⟩ = 'b'`
-/
@[extern "lean_string_utf8_get_bang"]
def get! (s : @& String) (p : @& Pos) : Char :=
  match s with
  | ⟨s⟩ => utf8GetAux s 0 p

Unsafe character retrieval by byte position in a UTF-8 string

Informal description

Given a string $s$ and a byte position $p$ in $s$, the function returns the Unicode character at position $p$. The function will panic if $p$ is not a valid position in $s$.

String.utf8SetAux definition

(c' : Char) : List Char → Pos → Pos → List Char

Full source

def utf8SetAux (c' : Char) : List Char → Pos → Pos → List Char
  | [],    _, _ => []
  | c::cs, i, p =>
    if i = p then (c'::cs) else c::(utf8SetAux c' cs (i + c) p)

UTF-8 string character replacement (auxiliary function)

Informal description

The auxiliary function `String.utf8SetAux` takes a Unicode character `c'`, a list of characters `cs`, a current byte position `i`, and a target byte position `p`. It traverses the list `cs` and replaces the character at position `p` with `c'`, while preserving the rest of the list. The function recursively processes the list, updating the current position by the size of each character encountered until it reaches the target position `p`.

String.set definition

: String → (@& Pos) → Char → String

Full source

/--
Replaces the character at a specified position in a string with a new character. If the position is
invalid, the string is returned unchanged.

If both the replacement character and the replaced character are 7-bit ASCII characters and the
string is not shared, then it is updated in-place and not copied.

Examples:
* `"abc".set ⟨1⟩ 'B' = "aBc"`
* `"abc".set ⟨3⟩ 'D' = "abc"`
* `"L∃∀N".set ⟨4⟩ 'X' = "L∃XN"`
* `"L∃∀N".set ⟨2⟩ 'X' = "L∃∀N"` because `'∃'` is a multi-byte character, so the byte index `2` is an
  invalid position.
-/
@[extern "lean_string_utf8_set"]
def set : String → (@& Pos) → Char → String
  | ⟨s⟩, i, c => ⟨utf8SetAux c s 0 i⟩

Character replacement in a UTF-8 string

Informal description

The function replaces the character at a specified byte position `i` in a UTF-8 encoded string `s` with a new character `c`. If the position `i` is invalid (i.e., it is out of bounds or not on a character boundary), the string is returned unchanged. The operation is performed in-place when possible (for 7-bit ASCII characters and unshared strings).

String.modify definition

(s : String) (i : Pos) (f : Char → Char) : String

Full source

/--
Replaces the character at position `p` in the string `s` with the result of applying `f` to that
character. If `p` is an invalid position, the string is returned unchanged.

If both the replacement character and the replaced character are 7-bit ASCII characters and the
string is not shared, then it is updated in-place and not copied.

Examples:
* `abc.modify ⟨1⟩ Char.toUpper = "aBc"`
* `abc.modify ⟨3⟩ Char.toUpper = "abc"`
-/
def modify (s : String) (i : Pos) (f : Char → Char) : String :=
  s.set i <| f <| s.get i

Character modification in a string

Informal description

Given a string $s$, a byte position $i$ in $s$, and a function $f$ on characters, the function replaces the character at position $i$ in $s$ with $f$ applied to that character. If $i$ is not a valid position in $s$, the string remains unchanged. The operation is performed in-place when possible (for 7-bit ASCII characters and unshared strings).

String.next definition

(s : @& String) (p : @& Pos) : Pos

Full source

/--
Returns the next position in a string after position `p`. The result is unspecified if `p` is not a
valid position or if `p = s.endPos`.

A run-time bounds check is performed to determine whether `p` is at the end of the string. If a
bounds check has already been performed, use `String.next'` to avoid a repeated check.

Some examples where the result is unspecified:
* `"abc".next ⟨3⟩`, since `3 = "abc".endPos`
* `"L∃∀N".next ⟨2⟩`, since `2` points into the middle of a multi-byte UTF-8 character

Examples:
* `"abc".get ("abc".next 0) = 'b'`
* `"L∃∀N".get (0 |> "L∃∀N".next |> "L∃∀N".next) = '∀'`
-/
@[extern "lean_string_utf8_next"]
def next (s : @& String) (p : @& Pos) : Pos :=
  let c := get s p
  p + c

Next UTF-8 character boundary position in a string

Informal description

Given a UTF-8 encoded string `s` and a valid byte position `p` (not at the end of the string), the function returns the next valid byte position after `p` that lies on a UTF-8 character boundary. The result is computed by adding the byte length of the character at position `p` to `p`. The behavior is unspecified if `p` is invalid or points to the end of the string. Examples: - For the string `"abc"`, `next` applied to position `0` returns position `1` (the start of `'b'`). - For the string `"L∃∀N"`, applying `next` twice from position `0` returns position `4` (the start of `'∀'`).

String.utf8PrevAux definition

: List Char → Pos → Pos → Pos

Full source

def utf8PrevAux : List Char → Pos → Pos → Pos
  | [],    _, _ => 0
  | c::cs, i, p =>
    let i' := i + c
    if i' = p then i else utf8PrevAux cs i' p

UTF-8 previous character boundary auxiliary function

Informal description

The auxiliary function `String.utf8PrevAux` takes a list of Unicode characters `cs`, a current byte position `i`, and a target byte position `p`. It recursively scans the list to find the largest byte position `i'` such that `i' ≤ p` and `i'` corresponds to the start of a character in the UTF-8 encoding. If no such position is found, it returns `0`. More precisely: - For an empty list, it returns `0`. - For a non-empty list `c::cs`, it computes the next byte position `i' = i + c` (where `c` is the size of the current character in bytes). If `i'` matches the target position `p`, it returns `i`; otherwise, it continues the search recursively with the remaining characters `cs`, the new position `i'`, and the same target `p`.

String.prev definition

: (@& String) → (@& Pos) → Pos

Full source

/--
Returns the position in a string before a specified position, `p`. If `p = ⟨0⟩`, returns `0`. If `p`
is not a valid position, the result is unspecified.

For example, `"L∃∀N".prev ⟨3⟩` is unspecified, since byte 3 occurs in the middle of the multi-byte
character `'∃'`.

Examples:
* `"abc".get ("abc".endPos |> "abc".prev) = 'c'`
* `"L∃∀N".get ("L∃∀N".endPos |> "L∃∀N".prev |> "L∃∀N".prev |> "L∃∀N".prev) = '∃'`
-/
@[extern "lean_string_utf8_prev"]
def prev : (@& String) → (@& Pos) → Pos
  | ⟨s⟩, p => if p = 0 then 0 else utf8PrevAux s 0 p

Previous UTF-8 character boundary in a string

Informal description

Given a string $s$ and a byte position $p$ in $s$, the function returns the previous valid UTF-8 character boundary position before $p$. If $p$ is at the start of the string (position $0$), it returns $0$. The behavior is unspecified if $p$ is not a valid position (e.g., in the middle of a multi-byte character). Examples: - For the string `"abc"`, `prev` applied to the end position returns the position of `'c'`. - For the string `"L∃∀N"`, applying `prev` three times from the end position returns the position of `'∃'`.

String.front definition

(s : String) : Char

Full source

/--
Returns the first character in `s`. If `s = ""`, returns `(default : Char)`.

Examples:
* `"abc".front = 'a'`
* `"".front = (default : Char)`
-/
@[inline] def front (s : String) : Char :=
  get s 0

First character of a string

Informal description

The function returns the first character of the string `s`. If `s` is the empty string, it returns the default character (typically `'\0'`). Examples: - `"abc".front = 'a'` - `"".front = (default : Char)`

String.back definition

(s : String) : Char

Full source

/--
Returns the last character in `s`. If `s = ""`, returns `(default : Char)`.

Examples:
* `"abc".back = 'c'`
* `"".back = (default : Char)`
-/
@[inline] def back (s : String) : Char :=
  get s (prev s s.endPos)

Last character of a string

Informal description

Given a string $s$, the function returns the last Unicode character in $s$. If $s$ is the empty string, it returns the default character (typically `'\0'`). Examples: - For `s = \texttt{"abc"}`, the result is `'c'`. - For `s = \texttt{""}`, the result is the default character.

String.atEnd definition

: (@& String) → (@& Pos) → Bool

Full source

/--
Returns `true` if a specified byte position is greater than or equal to the position which points to
the end of a string. Otherwise, returns `false`.

Examples:
* `(0 |> "abc".next |> "abc".next |> "abc".atEnd) = false`
* `(0 |> "abc".next |> "abc".next |> "abc".next |> "abc".next |> "abc".atEnd) = true`
* `(0 |> "L∃∀N".next |> "L∃∀N".next |> "L∃∀N".next |> "L∃∀N".atEnd) = false`
* `(0 |> "L∃∀N".next |> "L∃∀N".next |> "L∃∀N".next |> "L∃∀N".next |> "L∃∀N".atEnd) = true`
* `"abc".atEnd ⟨4⟩ = true`
* `"L∃∀N".atEnd ⟨7⟩ = false`
* `"L∃∀N".atEnd ⟨8⟩ = true`
-/
@[extern "lean_string_utf8_at_end"]
def atEnd : (@& String) → (@& Pos) → Bool
  | s, p => p.byteIdx ≥ utf8ByteSize s

Check if byte position is at end of string

Informal description

Given a string $s$ and a byte position $p$ in $s$, the function returns `true` if $p$ is greater than or equal to the byte position pointing to the end of $s$, and `false` otherwise. This is determined by comparing the byte index of $p$ with the UTF-8 byte size of $s$.

String.get' definition

(s : @& String) (p : @& Pos) (h : ¬s.atEnd p) : Char

Full source

/--
Returns the character at position `p` of a string. Returns `(default : Char)`, which is `'A'`, if
`p` is not a valid position.

Requires evidence, `h`, that `p` is within bounds instead of performing a run-time bounds check as
in `String.get`.

A typical pattern combines `get'` with a dependent `if`-expression to avoid the overhead of an
additional bounds check. For example:
```
def getInBounds? (s : String) (p : String.Pos) : Option Char :=
  if h : s.atEnd p then none else some (s.get' p h)
```
Even with evidence of `¬ s.atEnd p`, `p` may be invalid if a byte index points into the middle of a
multi-byte UTF-8 character. For example, `"L∃∀N".get' ⟨2⟩ (by decide) = (default : Char)`.

Examples:
* `"abc".get' 0 (by decide) = 'a'`
* `let lean := "L∃∀N"; lean.get' (0 |> lean.next |> lean.next) (by decide) = '∀'`
-/
@[extern "lean_string_utf8_get_fast"]
def get' (s : @& String) (p : @& Pos) (h : ¬ s.atEnd p) : Char :=
  match s with
  | ⟨s⟩ => utf8GetAux s 0 p

Character retrieval at a valid byte position in a string

Informal description

Given a string $s$ and a byte position $p$ in $s$, along with evidence $h$ that $p$ is not at the end of $s$, the function returns the Unicode character at position $p$ in $s$. If $p$ is not a valid position (e.g., it points into the middle of a multi-byte UTF-8 character), the function returns the default character `'A'`.

String.next' definition

(s : @& String) (p : @& Pos) (h : ¬s.atEnd p) : Pos

Full source

/--
Returns the next position in a string after position `p`. The result is unspecified if `p` is not a
valid position.

Requires evidence, `h`, that `p` is within bounds. No run-time bounds check is performed, as in
`String.next`.

A typical pattern combines `String.next'` with a dependent `if`-expression to avoid the overhead of
an additional bounds check. For example:
```
def next? (s: String) (p : String.Pos) : Option Char :=
  if h : s.atEnd p then none else s.get (s.next' p h)
```

Example:
* `let abc := "abc"; abc.get (abc.next' 0 (by decide)) = 'b'`
-/
@[extern "lean_string_utf8_next_fast"]
def next' (s : @& String) (p : @& Pos) (h : ¬ s.atEnd p) : Pos :=
  let c := get s p
  p + c

Next valid byte position in a string

Informal description

Given a string $s$ and a valid byte position $p$ in $s$ (with proof $h$ that $p$ is not at the end of $s$), the function returns the next valid byte position after $p$ in $s$. The result is computed by adding the UTF-8 size of the character at position $p$ to $p$.

Char.utf8Size_pos theorem

(c : Char) : 0 < c.utf8Size

Full source

theorem _root_.Char.utf8Size_pos (c : Char) : 0 < c.utf8Size := by
  repeat first | apply iteInduction (motive := (0 < ·)) <;> intros | decide

Positivity of UTF-8 Encoding Size for Unicode Characters

Informal description

For any Unicode character $c$, the number of bytes required to encode $c$ in UTF-8 is strictly positive, i.e., $\text{utf8Size}(c) > 0$.

Char.utf8Size_le_four theorem

(c : Char) : c.utf8Size ≤ 4

Full source

theorem _root_.Char.utf8Size_le_four (c : Char) : c.utf8Size ≤ 4 := by
  repeat first | apply iteInduction (motive := (· ≤ 4)) <;> intros | decide

UTF-8 Encoding Size Bound: $\text{utf8Size}(c) \leq 4$

Informal description

For any Unicode character $c$, the number of bytes required to encode $c$ in UTF-8 is at most 4. In other words, $\text{utf8Size}(c) \leq 4$.

String.one_le_csize abbrev

Full source

@[deprecated Char.utf8Size_pos (since := "2026-06-04")] abbrev one_le_csize := Char.utf8Size_pos

Lower bound on UTF-8 encoding size: $1 \leq \text{utf8Size}(c)$

Informal description

For any Unicode character $c$, the UTF-8 encoding size satisfies $1 \leq \text{utf8Size}(c)$.

String.pos_lt_eq theorem

(p₁ p₂ : Pos) : (p₁ < p₂) = (p₁.1 < p₂.1)

Full source

@[simp] theorem pos_lt_eq (p₁ p₂ : Pos) : (p₁ < p₂) = (p₁.1 < p₂.1) := rfl

Equivalence of String Position Order and Byte Index Order

Informal description

For any two byte positions $p_1$ and $p_2$ in a UTF-8 encoded string, the strict order relation $p_1 < p_2$ holds if and only if the underlying byte indices satisfy $p_1.1 < p_2.1$.

String.pos_add_char theorem

(p : Pos) (c : Char) : (p + c).byteIdx = p.byteIdx + c.utf8Size

Full source

@[simp] theorem pos_add_char (p : Pos) (c : Char) : (p + c).byteIdx = p.byteIdx + c.utf8Size := rfl

UTF-8 Byte Index Addition Formula: $(p + c).\text{byteIdx} = p.\text{byteIdx} + c.\text{utf8Size}$

Informal description

For any byte position $p$ in a UTF-8 encoded string and any Unicode character $c$, the byte index of the position $p + c$ is equal to the sum of the byte index of $p$ and the UTF-8 encoding size of $c$, i.e., $(p + c).\text{byteIdx} = p.\text{byteIdx} + c.\text{utf8Size}$.

String.Pos.ne_zero_of_lt theorem

: {a b : Pos} → a < b → b ≠ 0

Full source

protected theorem Pos.ne_zero_of_lt : {a b : Pos} → a < b → b ≠ 0
  | _, _, hlt, rfl => Nat.not_lt_zero _ hlt

Nonzero Byte Position from Strict Order

Informal description

For any two byte positions $a$ and $b$ in a UTF-8 encoded string, if $a < b$, then $b$ is not the zero position.

String.lt_next theorem

(s : String) (i : Pos) : i.1 < (s.next i).1

Full source

theorem lt_next (s : String) (i : Pos) : i.1 < (s.next i).1 :=
  Nat.add_lt_add_left (Char.utf8Size_pos _) _

Byte Index Increases at Next Character Boundary in UTF-8 Strings

Informal description

For any string $s$ and any byte position $i$ in $s$, the byte index of $i$ is strictly less than the byte index of the next character boundary `s.next i$.

String.utf8PrevAux_lt_of_pos theorem

: ∀ (cs : List Char) (i p : Pos), p ≠ 0 → (utf8PrevAux cs i p).1 < p.1

Full source

theorem utf8PrevAux_lt_of_pos : ∀ (cs : List Char) (i p : Pos), p ≠ 0 →
    (utf8PrevAux cs i p).1 < p.1
  | [], _, _, h =>
    Nat.lt_of_le_of_lt (Nat.zero_le _)
      (Nat.zero_lt_of_ne_zero (mt (congrArg Pos.mk) h))
  | c::cs, i, p, h => by
    simp [utf8PrevAux]
    apply iteInduction (motive := (Pos.byteIdx · < _)) <;> intro h'
    next => exact h' ▸ Nat.add_lt_add_left (Char.utf8Size_pos _) _
    next => exact utf8PrevAux_lt_of_pos _ _ _ h

UTF-8 Previous Character Boundary is Strictly Less Than Nonzero Target Position

Informal description

For any list of Unicode characters `cs`, any byte positions `i` and `p` in a UTF-8 encoded string, if `p` is not the zero position, then the byte position returned by `utf8PrevAux cs i p` is strictly less than `p`.

String.prev_lt_of_pos theorem

(s : String) (i : Pos) (h : i ≠ 0) : (s.prev i).1 < i.1

Full source

theorem prev_lt_of_pos (s : String) (i : Pos) (h : i ≠ 0) : (s.prev i).1 < i.1 := by
  simp [prev, h]
  exact utf8PrevAux_lt_of_pos _ _ _ h

Previous Character Boundary is Strictly Less Than Nonzero Position in UTF-8 Strings

Informal description

For any string $s$ and any nonzero byte position $i$ in $s$, the byte position of the previous UTF-8 character boundary `s.prev i` is strictly less than $i$.

String.posOfAux definition

(s : String) (c : Char) (stopPos : Pos) (pos : Pos) : Pos

Full source

def posOfAux (s : String) (c : Char) (stopPos : Pos) (pos : Pos) : Pos :=
  if h : pos < stopPos then
    if s.get pos == c then pos
    else
      have := Nat.sub_lt_sub_left h (lt_next s pos)
      posOfAux s c stopPos (s.next pos)
  else pos
termination_by stopPos.1 - pos.1

Auxiliary function for finding character position in a string

Informal description

Given a string `s`, a character `c`, a stopping position `stopPos`, and a starting position `pos`, the function searches for the first occurrence of `c` in `s` starting from `pos` and moving forward (via `s.next`), returning the position where `c` is found. The search stops when either `c` is found or `pos` reaches/exceeds `stopPos`, in which case it returns the current `pos`. More precisely: - If `pos < stopPos` and `s.get pos == c`, return `pos` - If `pos < stopPos` but `s.get pos ≠ c`, recursively search from `s.next pos` - Otherwise (`pos ≥ stopPos`), return `pos` The termination is guaranteed by the decreasing difference between `stopPos` and `pos` in each recursive call.

String.posOf definition

(s : String) (c : Char) : Pos

Full source

/--
Returns the position of the first occurrence of a character, `c`, in a string `s`. If `s` does not
contain `c`, returns `s.endPos`.

Examples:
* `"abcba".posOf 'a' = ⟨0⟩`
* `"abcba".posOf 'z' = ⟨5⟩`
* `"L∃∀N".posOf '∀' = ⟨4⟩`
-/
@[inline] def posOf (s : String) (c : Char) : Pos :=
  posOfAux s c s.endPos 0

First occurrence position of a character in a string

Informal description

Given a string $s$ and a character $c$, the function returns the byte position of the first occurrence of $c$ in $s$. If $c$ does not appear in $s$, it returns the end position of $s$ (i.e., the byte position immediately after the last character of $s$). Examples: - For $s = \texttt{"abcba"}$ and $c = \texttt{'a'}$, the result is position $0$. - For $s = \texttt{"abcba"}$ and $c = \texttt{'z'}$, the result is the end position of $s$ (position $5$). - For $s = \texttt{"L∃∀N"}$ and $c = \texttt{'∀'}$, the result is position $4$.

String.revPosOfAux definition

(s : String) (c : Char) (pos : Pos) : Option Pos

Full source

def revPosOfAux (s : String) (c : Char) (pos : Pos) : Option Pos :=
  if h : pos = 0 then none
  else
    have := prev_lt_of_pos s pos h
    let pos := s.prev pos
    if s.get pos == c then some pos
    else revPosOfAux s c pos
termination_by pos.1

Backward search for character in string from position

Informal description

Given a string $s$, a character $c$, and a byte position `pos` in $s$, the function searches backward from `pos` to the start of the string for the first occurrence of $c$. If found, it returns the position of $c$ as `some pos`; otherwise, it returns `none`. The search proceeds by moving to the previous valid UTF-8 character boundary at each step until the start of the string is reached.

String.revPosOf definition

(s : String) (c : Char) : Option Pos

Full source

/--
Returns the position of the last occurrence of a character, `c`, in a string `s`. If `s` does not
contain `c`, returns `none`.

Examples:
* `"abcabc".refPosOf 'a' = some ⟨3⟩`
* `"abcabc".revPosOf 'z' = none`
* `"L∃∀N".revPosOf '∀' = some ⟨4⟩`
-/
@[inline] def revPosOf (s : String) (c : Char) : Option Pos :=
  revPosOfAux s c s.endPos

Last occurrence position of a character in a string

Informal description

Given a string $s$ and a character $c$, the function returns the byte position of the last occurrence of $c$ in $s$ as an optional value. If $c$ does not appear in $s$, it returns `none`. The search is performed from the end of the string towards the beginning.

String.findAux definition

(s : String) (p : Char → Bool) (stopPos : Pos) (pos : Pos) : Pos

Full source

def findAux (s : String) (p : Char → Bool) (stopPos : Pos) (pos : Pos) : Pos :=
  if h : pos < stopPos then
    if p (s.get pos) then pos
    else
      have := Nat.sub_lt_sub_left h (lt_next s pos)
      findAux s p stopPos (s.next pos)
  else pos
termination_by stopPos.1 - pos.1

Auxiliary string character search function

Informal description

The auxiliary function `findAux` searches for the first character in the string `s` starting from position `pos` up to (but not including) `stopPos` that satisfies the predicate `p`. If such a character is found, its position is returned; otherwise, the function returns `stopPos`. The search proceeds by checking each character in sequence, moving to the next character position using `s.next pos` if the current character does not satisfy `p`.

String.find definition

(s : String) (p : Char → Bool) : Pos

Full source

/--
Finds the position of the first character in a string for which the Boolean predicate `p` returns
`true`. If there is no such character in the string, then the end position of the string is
returned.

Examples:
 * `"coffee tea water".find (·.isWhitespace) = ⟨6⟩`
 * `"tea".find (· == 'X') = ⟨3⟩`
 * `"".find (· == 'X') = ⟨0⟩`
-/
@[inline] def find (s : String) (p : Char → Bool) : Pos :=
  findAux s p s.endPos 0

First occurrence of a character satisfying a predicate in a string

Informal description

Given a string $s$ and a predicate $p$ on characters, the function returns the byte position of the first character in $s$ that satisfies $p$. If no such character exists, it returns the end position of $s$. Examples: - For the string `"coffee tea water"` and predicate `(·.isWhitespace)`, the function returns position 6 (the first whitespace character). - For the string `"tea"` and predicate `(· == 'X')`, the function returns position 3 (the end of the string). - For the empty string `""` and any predicate, the function returns position 0.

String.revFindAux definition

(s : String) (p : Char → Bool) (pos : Pos) : Option Pos

Full source

def revFindAux (s : String) (p : Char → Bool) (pos : Pos) : Option Pos :=
  if h : pos = 0 then none
  else
    have := prev_lt_of_pos s pos h
    let pos := s.prev pos
    if p (s.get pos) then some pos
    else revFindAux s p pos
termination_by pos.1

Backward character search in a string from a given position

Informal description

The auxiliary function `String.revFindAux` searches backward in the string `s` from the byte position `pos` for the first character satisfying the predicate `p`. If `pos` is `0`, it returns `none`. Otherwise, it moves to the previous character position and checks if the character at that position satisfies `p`. If it does, it returns that position; otherwise, it continues searching backward. The search terminates when the beginning of the string is reached. More formally, given a string `s`, a predicate `p` on characters, and a starting byte position `pos`, the function returns: - `none` if `pos = 0`, - `some pos'` where `pos'` is the first position before or at `pos` where `p (s.get pos')` holds, if such a position exists, - otherwise, it continues searching backward from `s.prev pos`. The termination is ensured by the decreasing byte position index.

String.revFind definition

(s : String) (p : Char → Bool) : Option Pos

Full source

/--
Finds the position of the last character in a string for which the Boolean predicate `p` returns
`true`. If there is no such character in the string, then `none` is returned.

Examples:
 * `"coffee tea water".revFind (·.isWhitespace) = some ⟨10⟩`
 * `"tea".revFind (· == 'X') = none`
 * `"".revFind (· == 'X') = none`
-/
@[inline] def revFind (s : String) (p : Char → Bool) : Option Pos :=
  revFindAux s p s.endPos

Last occurrence of a character satisfying a predicate in a string

Informal description

Given a string $s$ and a predicate $p$ on characters, the function returns the byte position of the last character in $s$ that satisfies $p$, if such a character exists. Otherwise, it returns `none`. **Examples:** - For $s = \texttt{"coffee tea water"}$ and $p$ testing for whitespace, the result is $\texttt{some } 10$ (position of the last space). - For $s = \texttt{"tea"}$ and $p$ testing for 'X', the result is $\texttt{none}$. - For $s = \texttt{""}$ and any $p$, the result is $\texttt{none}$.

String.Pos.min abbrev

(p₁ p₂ : Pos) : Pos

Full source

/--
Returns either `p₁` or `p₂`, whichever has the least byte index.
-/
abbrev Pos.min (p₁ p₂ : Pos) : Pos :=
  { byteIdx := p₁.byteIdx.min p₂.byteIdx }

Minimum Byte Position in UTF-8 String

Informal description

Given two byte positions $p₁$ and $p₂$ in a UTF-8 string, the function returns the position with the smaller byte index, i.e., $\min(p₁, p₂)$.

String.firstDiffPos definition

(a b : String) : Pos

Full source

/--
Returns the first position where the two strings differ.

If one string is a prefix of the other, then the returned position is the end position of the
shorter string. If the strings are identical, then their end position is returned.

Examples:
* `"tea".firstDiffPos "ten" = ⟨2⟩`
* `"tea".firstDiffPos "tea" = ⟨3⟩`
* `"tea".firstDiffPos "teas" = ⟨3⟩`
* `"teas".firstDiffPos "tea" = ⟨3⟩`
-/
def firstDiffPos (a b : String) : Pos :=
  let stopPos := a.endPos.min b.endPos
  let rec loop (i : Pos) : Pos :=
    if h : i < stopPos then
      if a.get i != b.get i then i
      else
        have := Nat.sub_lt_sub_left h (lt_next a i)
        loop (a.next i)
    else i
    termination_by stopPos.1 - i.1
  loop 0

First differing position in UTF-8 strings

Informal description

Given two strings $a$ and $b$, the function returns the first byte position where the two strings differ in their UTF-8 encoding. If one string is a prefix of the other, it returns the end position of the shorter string. If the strings are identical, it returns their common end position. **Examples:** - For $a = \texttt{"tea"}$ and $b = \texttt{"ten"}$, the result is position $2$. - For $a = \texttt{"tea"}$ and $b = \texttt{"tea"}$, the result is position $3$. - For $a = \texttt{"tea"}$ and $b = \texttt{"teas"}$, the result is position $3$. - For $a = \texttt{"teas"}$ and $b = \texttt{"tea"}$, the result is position $3$.

String.extract definition

: (@& String) → (@& Pos) → (@& Pos) → String

Full source

/--
Creates a new string that consists of the region of the input string delimited by the two positions.

The result is `""` if the start position is greater than or equal to the end position or if the
start position is at the end of the string. If either position is invalid (that is, if either points
at the middle of a multi-byte UTF-8 character) then the result is unspecified.

Examples:
* `"red green blue".extract ⟨0⟩ ⟨3⟩ = "red"`
* `"red green blue".extract ⟨3⟩ ⟨0⟩ = ""`
* `"red green blue".extract ⟨0⟩ ⟨100⟩ = "red green blue"`
* `"red green blue".extract ⟨4⟩ ⟨100⟩ = "green blue"`
* `"L∃∀N".extract ⟨2⟩ ⟨100⟩ = "green blue"`
-/
@[extern "lean_string_utf8_extract"]
def extract : (@& String) → (@& Pos) → (@& Pos) → String
  | ⟨s⟩, b, e => if b.byteIdx ≥ e.byteIdx then "" else ⟨go₁ s 0 b e⟩
where
  go₁ : List Char → Pos → Pos → Pos → List Char
    | [],        _, _, _ => []
    | s@(c::cs), i, b, e => if i = b then go₂ s i e else go₁ cs (i + c) b e

  go₂ : List Char → Pos → Pos → List Char
    | [],    _, _ => []
    | c::cs, i, e => if i = e then [] else c :: go₂ cs (i + c) e

Substring extraction from UTF-8 string

Informal description

Given a string $s$ and two byte positions $b$ and $e$ within $s$, the function returns the substring of $s$ starting at position $b$ (inclusive) and ending at position $e$ (exclusive). If $b \geq e$ or if $b$ is at the end of the string, the result is the empty string. The behavior is unspecified if either position points to the middle of a multi-byte UTF-8 character. **Examples:** - For $s = \texttt{"red green blue"}$, $\texttt{extract}\ s\ 0\ 3 = \texttt{"red"}$ - For $s = \texttt{"red green blue"}$, $\texttt{extract}\ s\ 3\ 0 = \texttt{""}$ - For $s = \texttt{"red green blue"}$, $\texttt{extract}\ s\ 0\ 100 = \texttt{"red green blue"}$ - For $s = \texttt{"red green blue"}$, $\texttt{extract}\ s\ 4\ 100 = \texttt{"green blue"}$

String.splitAux definition

(s : String) (p : Char → Bool) (b : Pos) (i : Pos) (r : List String) : List String

Full source

@[specialize] def splitAux (s : String) (p : Char → Bool) (b : Pos) (i : Pos) (r : List String) : List String :=
  if h : s.atEnd i then
    let r := (s.extract b i)::r
    r.reverse
  else
    have := Nat.sub_lt_sub_left (Nat.gt_of_not_le (mt decide_eq_true h)) (lt_next s _)
    if p (s.get i) then
      let i' := s.next i
      splitAux s p i' i' (s.extract b i :: r)
    else
      splitAux s p b (s.next i) r
termination_by s.endPos.1 - i.1

String splitting auxiliary function

Informal description

The auxiliary function `splitAux` splits a string `s` into a list of substrings based on a predicate `p` that checks each character. The function takes: - `s`: the input string to be split - `p`: a predicate function on characters that determines split points - `b`: the starting position in `s` for the current substring - `i`: the current position in `s` being examined - `r`: an accumulator list of substrings already split (in reverse order) The function processes the string character by character. When a character satisfies `p`, it splits the string at that position, adds the substring from `b` to `i` to the accumulator, and continues processing from the next position. When the end of the string is reached, it returns the accumulated substrings in correct order.

String.split definition

(s : String) (p : Char → Bool) : List String

Full source

/--
Splits a string at each character for which `p` returns `true`.

The characters that satisfy `p` are not included in any of the resulting strings. If multiple
characters in a row satisfy `p`, then the resulting list will contain empty strings.

Examples:
* `"coffee tea water".split (·.isWhitespace) = ["coffee", "tea", "water"]`
* `"coffee  tea  water".split (·.isWhitespace) = ["coffee", "", "tea", "", "water"]`
* `"fun x =>\n  x + 1\n".split (· == '\n') = ["fun x =>", "  x + 1", ""]`
-/
@[specialize] def split (s : String) (p : Char → Bool) : List String :=
  splitAux s p 0 0 []

String splitting by predicate

Informal description

The function splits a string `s` into a list of substrings by separating `s` at each character that satisfies the predicate `p`. The characters that satisfy `p` are not included in the resulting substrings. If multiple consecutive characters satisfy `p`, the resulting list will contain empty strings between them. **Examples:** - Splitting `"coffee tea water"` on whitespace gives `["coffee", "tea", "water"]` - Splitting `"coffee tea water"` on whitespace gives `["coffee", "", "tea", "", "water"]` - Splitting `"fun x =>\n x + 1\n"` on newlines gives `["fun x =>", " x + 1", ""]`

String.splitOnAux definition

(s sep : String) (b : Pos) (i : Pos) (j : Pos) (r : List String) : List String

Full source

/--
Auxiliary for `splitOn`. Preconditions:
* `sep` is not empty
* `b <= i` are indexes into `s`
* `j` is an index into `sep`, and not at the end

It represents the state where we have currently parsed some split parts into `r` (in reverse order),
`b` is the beginning of the string / the end of the previous match of `sep`, and the first `j` bytes
of `sep` match the bytes `i-j .. i` of `s`.
-/
def splitOnAux (s sep : String) (b : Pos) (i : Pos) (j : Pos) (r : List String) : List String :=
  if s.atEnd i then
    let r := (s.extract b i)::r
    r.reverse
  else
    if s.get i == sep.get j then
      let i := s.next i
      let j := sep.next j
      if sep.atEnd j then
        splitOnAux s sep i i 0 (s.extract b (i - j)::r)
      else
        splitOnAux s sep b i j r
    else
      splitOnAux s sep b (s.next (i - j)) 0 r
termination_by (s.endPos.1 - (i - j).1, sep.endPos.1 - j.1)
decreasing_by
  all_goals simp_wf
  focus
    rename_i h _ _
    left; exact Nat.sub_lt_sub_left
      (Nat.lt_of_le_of_lt (Nat.sub_le ..) (Nat.gt_of_not_le (mt decide_eq_true h)))
      (Nat.lt_of_le_of_lt (Nat.sub_le ..) (lt_next s _))
  focus
    rename_i i₀ j₀ _ eq h'
    rw [show (s.next i₀ - sep.next j₀).1 = (i₀ - j₀).1 by
      show (_ + Char.utf8Size _) - (_ + Char.utf8Size _) = _
      rw [(beq_iff_eq ..).1 eq, Nat.add_sub_add_right]; rfl]
    right; exact Nat.sub_lt_sub_left
      (Nat.lt_of_le_of_lt (Nat.le_add_right ..) (Nat.gt_of_not_le (mt decide_eq_true h')))
      (lt_next sep _)
  focus
    rename_i h _
    left; exact Nat.sub_lt_sub_left
      (Nat.lt_of_le_of_lt (Nat.sub_le ..) (Nat.gt_of_not_le (mt decide_eq_true h)))
      (lt_next s _)

Auxiliary string splitting function with separator

Informal description

The auxiliary function for `splitOn` that splits a string `s` into parts separated by the substring `sep`. It maintains the state where: - `sep` is non-empty - `b` and `i` are positions in `s` with `b ≤ i` - `j` is a position in `sep` not at the end - `r` accumulates the split parts in reverse order The function checks if the current position `i` in `s` matches the position `j` in `sep`. If a full match is found, it adds the substring from `b` to `i-j` to `r` and continues splitting. Otherwise, it continues searching for matches.

String.instInhabited instance

: Inhabited String

Full source

instance : Inhabited String := ⟨""⟩

Strings are Inhabited

Informal description

The type of strings is inhabited, with a default string value.

String.instAppend instance

: Append String

Full source

instance : Append String := ⟨String.append⟩

String Concatenation Operation

Informal description

The type of strings is equipped with a canonical append operation `++` that concatenates two strings.

String.pushn definition

(s : String) (c : Char) (n : Nat) : String

Full source

/--
Adds multiple repetitions of a character to the end of a string.

Returns `s`, with `n` repetitions of `c` at the end. Internally, the implementation repeatedly calls
`String.push`, so the string is modified in-place if there is a unique reference to it.

Examples:
 * `"indeed".pushn '!' 2 = "indeed!!"`
 * `"indeed".pushn '!' 0 = "indeed"`
 * `"".pushn ' ' 4 = "    "`
-/
@[inline] def pushn (s : String) (c : Char) (n : Nat) : String :=
  n.repeat (fun s => s.push c) s

String append repeated character

Informal description

The function appends a character $c$ to the end of a string $s$ exactly $n$ times, returning the resulting string. If $n = 0$, the original string $s$ is returned unchanged. Examples: - $\texttt{"indeed".pushn '!' 2} = \texttt{"indeed!!"}$ - $\texttt{"indeed".pushn '!' 0} = \texttt{"indeed"}$ - $\texttt{"".pushn ' ' 4} = \texttt{" "}$

String.isEmpty definition

(s : String) : Bool

Full source

/--
Checks whether a string is empty.

Empty strings are equal to `""` and have length and end position `0`.

Examples:
 * `"".isEmpty = true`
 * `"empty".isEmpty = false`
 * `" ".isEmpty = false`
-/
@[inline] def isEmpty (s : String) : Bool :=
  s.endPos == 0

String emptiness check

Informal description

The function checks whether a string $s$ is empty by comparing its end position to $0$. A string is empty if and only if its end position is $0$, which means it has no characters. Examples: - $\text{isEmpty ""} = \text{true}$ - $\text{isEmpty "empty"} = \text{false}$ - $\text{isEmpty " "} = \text{false}$

String.join definition

(l : List String) : String

Full source

/--
Appends all the strings in a list of strings, in order.

Use `String.intercalate` to place a separator string between the strings in a list.

Examples:
 * `String.join ["gr", "ee", "n"] = "green"`
 * `String.join ["b", "", "l", "", "ue"] = "red"`
 * `String.join [] = ""`
-/
@[inline] def join (l : List String) : String :=
  l.foldl (fun r s => r ++ s) ""

String concatenation from a list

Informal description

The function `String.join` takes a list of strings and concatenates them in order, returning the resulting string. For example: - $\text{join } [\text{"gr"}, \text{"ee"}, \text{"n"}] = \text{"green"}$ - $\text{join } [\text{"b"}, \text{""}, \text{"l"}, \text{""}, \text{"ue"}] = \text{"blue"}$ - $\text{join } [] = \text{""}$

String.singleton definition

(c : Char) : String

Full source

/--
Returns a new string that contains only the character `c`.

Because strings are encoded in UTF-8, the resulting string may take multiple bytes.

Examples:
 * `String.singleton 'L' = "L"`
 * `String.singleton ' ' = " "`
 * `String.singleton '"' = "\""`
 * `String.singleton '𝒫' = "𝒫"`
-/
@[inline] def singleton (c : Char) : String :=
  "".push c

String containing a single character

Informal description

The function creates a new string consisting of the single Unicode character $c$. Due to UTF-8 encoding, the resulting string may occupy multiple bytes. Examples: - $\texttt{String.singleton 'L'} = \texttt{"L"}$ - $\texttt{String.singleton ' '} = \texttt{" "}$ - $\texttt{String.singleton '"'} = \texttt{"\""}$ - $\texttt{String.singleton '𝒫'} = \texttt{"𝒫"}$

String.intercalate definition

(s : String) : List String → String

Full source

/--
Appends the strings in a list of strings, placing the separator `s` between each pair.

Examples:
 * `", ".intercalate ["red", "green", "blue"] = "red, green, blue"`
 * `" and ".intercalate ["tea", "coffee"] = "tea and coffee"`
 * `" | ".intercalate ["M", "", "N"] = "M |  | N"`
-/
def intercalate (s : String) : List String → String
  | []      => ""
  | a :: as => go a s as
where go (acc : String) (s : String) : List String → String
  | a :: as => go (acc ++ s ++ a) s as
  | []      => acc

String concatenation with separator

Informal description

The function takes a separator string $s$ and a list of strings, then concatenates the strings in the list with $s$ inserted between each adjacent pair. Examples: - $\text{intercalate } \text{", " } [\text{"red"}, \text{"green"}, \text{"blue"}] = \text{"red, green, blue"}$ - $\text{intercalate } \text{" and " } [\text{"tea"}, \text{"coffee"}] = \text{"tea and coffee"}$ - $\text{intercalate } \text{" | " } [\text{"M"}, \text{""}, \text{"N"}] = \text{"M | | N"}$

String.Iterator structure

Full source

/--
An iterator over the characters (Unicode code points) in a `String`. Typically created by
`String.iter`.

String iterators pair a string with a valid byte index. This allows efficient character-by-character
processing of strings while avoiding the need to manually ensure that byte indices are used with the
correct strings.

An iterator is *valid* if the position `i` is *valid* for the string `s`, meaning `0 ≤ i ≤ s.endPos`
and `i` lies on a UTF8 byte boundary. If `i = s.endPos`, the iterator is at the end of the string.

Most operations on iterators return unspecified values if the iterator is not valid. The functions
in the `String.Iterator` API rule out the creation of invalid iterators, with two exceptions:
- `Iterator.next iter` is invalid if `iter` is already at the end of the string (`iter.atEnd` is
  `true`), and
- `Iterator.forward iter n`/`Iterator.nextn iter n` is invalid if `n` is strictly greater than the
  number of remaining characters.
-/
structure Iterator where
  /-- The string being iterated over. -/
  s : String
  /-- The current UTF-8 byte position in the string `s`.

  This position is not guaranteed to be valid for the string. If the position is not valid, then the
  current character is `(default : Char)`, similar to `String.get` on an invalid position.
  -/
  i : Pos
  deriving DecidableEq, Inhabited

String Iterator

Informal description

A structure representing an iterator over the Unicode code points (characters) in a string. An iterator consists of a string and a valid byte index position within that string. The iterator is valid when the position is between 0 and the string's end position (inclusive) and lies on a UTF-8 byte boundary. When the position equals the string's end position, the iterator is considered at the end of the string. Most operations on invalid iterators return unspecified values, though the API is designed to prevent creating invalid iterators except in two cases: when advancing past the end of the string, or when moving forward more characters than remain in the string.

String.instDecidableEqIterator instance

: DecidableEq✝ (@String.Iterator✝)

Full source

DecidableEq, Inhabited

Decidable Equality for String Iterators

Informal description

For any two string iterators, the equality between them is decidable.

String.instInhabitedIterator instance

: Inhabited✝ (@String.Iterator)

Full source

Inhabited

String Iterators are Inhabited

Informal description

The type of string iterators is inhabited, with a default iterator value.

String.mkIterator definition

(s : String) : Iterator

Full source

/-- Creates an iterator at the beginning of the string. -/
@[inline] def mkIterator (s : String) : Iterator :=
  ⟨s, 0⟩

String iterator constructor

Informal description

The function creates a new iterator for the string `s` positioned at the beginning (byte position 0).

String.iter abbrev

Full source

@[inherit_doc mkIterator]
abbrev iter := mkIterator

String Iterator Constructor

Informal description

The function `String.iter` creates an iterator for a given string `s`, positioned at the beginning of the string (byte position 0). This iterator can be used to traverse the Unicode code points (characters) of the string sequentially.

String.instSizeOfIterator instance

: SizeOf String.Iterator

Full source

/--
The size of a string iterator is the number of bytes remaining.

Recursive functions that iterate towards the end of a string will typically decrease this measure.
-/
instance : SizeOf String.Iterator where
  sizeOf i := i.1.utf8ByteSize - i.2.byteIdx

Size of String Iterator as Remaining Bytes

Informal description

For any string iterator, its size is defined as the number of remaining bytes in the string from the iterator's current position to the end of the string. This measure is used to ensure termination in recursive functions that iterate towards the end of a string.

String.Iterator.sizeOf_eq theorem

(i : String.Iterator) : sizeOf i = i.1.utf8ByteSize - i.2.byteIdx

Full source

theorem Iterator.sizeOf_eq (i : String.Iterator) : sizeOf i = i.1.utf8ByteSize - i.2.byteIdx :=
  rfl

Size of String Iterator Equals Remaining Bytes

Informal description

For any string iterator $i$, the size of $i$ is equal to the difference between the UTF-8 byte size of the underlying string and the byte index of the iterator's current position, i.e., $\text{sizeOf}(i) = \text{utf8ByteSize}(i.1) - i.2.\text{byteIdx}$.

String.Iterator.toString definition

Full source

@[inline, inherit_doc Iterator.s]
def toString := Iterator.s

String from iterator

Informal description

The function `String.Iterator.toString` takes a string iterator and returns the underlying string that the iterator is traversing.

String.Iterator.remainingBytes definition

: Iterator → Nat

Full source

/--
The number of UTF-8 bytes remaining in the iterator.
-/
@[inline] def remainingBytes : Iterator → Nat
  | ⟨s, i⟩ => s.endPos.byteIdx - i.byteIdx

Remaining UTF-8 bytes in string iterator

Informal description

For a given string iterator `i` consisting of a string `s` and a byte position `i.byteIdx`, the function returns the number of UTF-8 bytes remaining in the string from the current position to the end of the string, computed as `s.endPos.byteIdx - i.byteIdx`.

String.Iterator.pos definition

Full source

@[inline, inherit_doc Iterator.i]
def pos := Iterator.i

Byte position of string iterator

Informal description

The function returns the current byte position of a string iterator within its underlying UTF-8 encoded string. The position is valid when it lies on a character boundary and is between 0 and the string's end position (inclusive).

String.Iterator.curr definition

: Iterator → Char

Full source

/--
Gets the character at the iterator's current position.

A run-time bounds check is performed. Use `String.Iterator.curr'` to avoid redundant bounds checks.

If the position is invalid, returns `(default : Char)`.
-/
@[inline] def curr : Iterator → Char
  | ⟨s, i⟩ => get s i

Current character of string iterator

Informal description

The function returns the Unicode character at the current position of a string iterator. If the iterator's position is invalid (out of bounds or not on a character boundary), it returns the default character `'A'`.

String.Iterator.next definition

: Iterator → Iterator

Full source

/--
Moves the iterator's position forward by one character, unconditionally.

It is only valid to call this function if the iterator is not at the end of the string (i.e.
if `Iterator.atEnd` is `false`); otherwise, the resulting iterator will be invalid.
-/
@[inline] def next : Iterator → Iterator
  | ⟨s, i⟩ => ⟨s, s.next i⟩

Advance string iterator to next character

Informal description

Given a string iterator `⟨s, i⟩`, where `s` is a string and `i` is a valid byte position in `s` (not at the end of the string), this function returns a new iterator with the position advanced to the next UTF-8 character boundary in `s`. The new position is computed by adding the byte length of the character at position `i` to `i`. **Note:** This function should only be called when the iterator is not at the end of the string (i.e., when `Iterator.atEnd` is `false`). Calling it on an iterator at the end of the string results in an invalid iterator.

String.Iterator.prev definition

: Iterator → Iterator

Full source

/--
Moves the iterator's position backward by one character, unconditionally.

The position is not changed if the iterator is at the beginning of the string.
-/
@[inline] def prev : Iterator → Iterator
  | ⟨s, i⟩ => ⟨s, s.prev i⟩

Move string iterator backward by one character

Informal description

Given a string iterator, this function moves the iterator's position backward by one Unicode character (code point). If the iterator is already at the beginning of the string (position 0), the position remains unchanged.

String.Iterator.atEnd definition

: Iterator → Bool

Full source

/--
Checks whether the iterator is past its string's last character.
-/
@[inline] def atEnd : Iterator → Bool
  | ⟨s, i⟩ => i.byteIdx ≥ s.endPos.byteIdx

String iterator at end check

Informal description

Given a string iterator `⟨s, i⟩`, where `s` is a string and `i` is a byte position in `s`, the function returns `true` if the byte index `i` is greater than or equal to the byte index of the end position of `s`, and `false` otherwise. This indicates whether the iterator has reached or passed the end of the string.

String.Iterator.hasNext definition

: Iterator → Bool

Full source

/--
Checks whether the iterator is at or before the string's last character.
-/
@[inline] def hasNext : Iterator → Bool
  | ⟨s, i⟩ => i.byteIdx < s.endPos.byteIdx

Check if iterator has next character

Informal description

Given a string iterator `it`, the function returns `true` if the iterator's current byte index is strictly less than the byte index of the string's end position (i.e., the iterator is not at or past the last character of the string), and `false` otherwise.

String.Iterator.hasPrev definition

: Iterator → Bool

Full source

/--
Checks whether the iterator is after the beginning of the string.
-/
@[inline] def hasPrev : Iterator → Bool
  | ⟨_, i⟩ => i.byteIdx > 0

Check if iterator is after string beginning

Informal description

Given a string iterator, this function returns `true` if the iterator's current position is after the beginning of the string (i.e., its byte index is greater than 0), and `false` otherwise.

String.Iterator.curr' definition

(it : Iterator) (h : it.hasNext) : Char

Full source

/--
Gets the character at the iterator's current position.

The proof of `it.hasNext` ensures that there is, in fact, a character at the current position. This
function is faster that `String.Iterator.curr` due to avoiding a run-time bounds check.
-/
@[inline] def curr' (it : Iterator) (h : it.hasNext) : Char :=
  match it with
  | ⟨s, i⟩ => get' s i (by simpa only [hasNext, endPos, decide_eq_true_eq, String.atEnd, ge_iff_le, Nat.not_le] using h)

Current character at valid iterator position

Informal description

Given a string iterator `it` and a proof `h` that the iterator has a next character (i.e., it is not at the end of the string), the function returns the Unicode character at the iterator's current position.

String.Iterator.next' definition

(it : Iterator) (h : it.hasNext) : Iterator

Full source

/--
Moves the iterator's position forward by one character, unconditionally.

The proof of `it.hasNext` ensures that there is, in fact, a position that's one character forwards.
This function is faster that `String.Iterator.next` due to avoiding a run-time bounds check.
-/
@[inline] def next' (it : Iterator) (h : it.hasNext) : Iterator :=
  match it with
  | ⟨s, i⟩ => ⟨s, s.next' i (by simpa only [hasNext, endPos, decide_eq_true_eq, String.atEnd, ge_iff_le, Nat.not_le] using h)⟩

Unconditional advance of string iterator by one character

Informal description

Given a string iterator `it` with proof `h` that it has a next character, the function returns a new iterator advanced by one character position. The advancement is performed unconditionally, without runtime bounds checking, relying on the proof `h` to ensure the operation is valid.

String.Iterator.setCurr definition

: Iterator → Char → Iterator

Full source

/--
Replaces the current character in the string.

Does nothing if the iterator is at the end of the string. If both the replacement character and the
replaced character are 7-bit ASCII characters and the string is not shared, then it is updated
in-place and not copied.
-/
@[inline] def setCurr : Iterator → Char → Iterator
  | ⟨s, i⟩, c => ⟨s.set i c, i⟩

Replace current character in string iterator

Informal description

Given a string iterator and a character, this function returns a new iterator where the current character (at the iterator's position) is replaced by the given character. If the iterator is at the end of the string, the function does nothing. The operation is performed in-place when possible (for 7-bit ASCII characters and unshared strings).

String.Iterator.toEnd definition

: Iterator → Iterator

Full source

/--
Moves the iterator's position to the end of the string, just past the last character.
-/
@[inline] def toEnd : Iterator → Iterator
  | ⟨s, _⟩ => ⟨s, s.endPos⟩

Move iterator to end of string

Informal description

Given a string iterator, this function returns a new iterator positioned at the end of the string, just after the last character.

String.Iterator.extract definition

: Iterator → Iterator → String

Full source

/--
Extracts the substring between the positions of two iterators. The first iterator's position is the
start of the substring, and the second iterator's position is the end.

Returns the empty string if the iterators are for different strings, or if the position of the first
iterator is past the position of the second iterator.
-/
@[inline] def extract : Iterator → Iterator → String
  | ⟨s₁, b⟩, ⟨s₂, e⟩ =>
    if s₁ ≠ s₂ || b > e then ""
    else s₁.extract b e

Substring extraction between iterators

Informal description

Given two iterators `i₁` and `i₂` over the same string, the function returns the substring from the position of `i₁` (inclusive) to the position of `i₂` (exclusive). If the iterators are from different strings or if the position of `i₁` is after the position of `i₂`, the function returns the empty string.

String.Iterator.forward definition

: Iterator → Nat → Iterator

Full source

/--
Moves the iterator's position forward by the specified number of characters.

The resulting iterator is only valid if the number of characters to skip is less than or equal
to the number of characters left in the iterator.
-/
def forward : Iterator → Nat → Iterator
  | it, 0   => it
  | it, n+1 => forward it.next n

Advance string iterator by `n` characters

Informal description

Given a string iterator `it` and a natural number `n`, the function `forward` returns a new iterator obtained by advancing `it` by `n` characters. The function is defined recursively: advancing by 0 characters returns the original iterator, while advancing by `n+1` characters is equivalent to advancing the iterator to the next character and then advancing the resulting iterator by `n` characters. **Note:** The resulting iterator is only valid if `n` is less than or equal to the number of remaining characters in the iterator.

String.Iterator.remainingToString definition

: Iterator → String

Full source

/--
The remaining characters in an iterator, as a string.
-/
@[inline] def remainingToString : Iterator → String
  | ⟨s, i⟩ => s.extract i s.endPos

Remaining substring from iterator

Informal description

Given a string iterator $\langle s, i \rangle$, where $s$ is a string and $i$ is a byte position within $s$, the function returns the substring of $s$ starting from position $i$ (inclusive) to the end of the string (exclusive). - If $i$ is at or beyond the end of the string, the result is the empty string. - The behavior is unspecified if $i$ points to the middle of a multi-byte UTF-8 character. **Examples:** - For an iterator $\langle \texttt{"hello"}, 2 \rangle$, the result is $\texttt{"llo"}$. - For an iterator $\langle \texttt{"hello"}, 5 \rangle$, the result is $\texttt{""}$.

String.Iterator.nextn definition

: Iterator → Nat → Iterator

Full source

@[inherit_doc forward]
def nextn : Iterator → Nat → Iterator
  | it, 0   => it
  | it, i+1 => nextn it.next i

Advance string iterator by $n$ characters

Informal description

Given a string iterator and a natural number $n$, this function advances the iterator forward by $n$ Unicode characters (code points). The function is defined recursively: advancing by 0 characters leaves the iterator unchanged, while advancing by $n+1$ characters is equivalent to advancing by 1 character followed by advancing by $n$ characters.

String.Iterator.prevn definition

: Iterator → Nat → Iterator

Full source

/--
Moves the iterator's position back by the specified number of characters, stopping at the beginning
of the string.
-/
def prevn : Iterator → Nat → Iterator
  | it, 0   => it
  | it, i+1 => prevn it.prev i

Move string iterator backward by $n$ characters

Informal description

Given a string iterator and a natural number $n$, this function moves the iterator's position backward by $n$ Unicode characters (code points). If moving backward by $n$ characters would go past the beginning of the string, the iterator stops at the beginning (position 0). The function is defined recursively: moving back 0 characters leaves the iterator unchanged, while moving back $n+1$ characters is equivalent to moving back 1 character followed by moving back $n$ characters.

String.offsetOfPosAux definition

(s : String) (pos : Pos) (i : Pos) (offset : Nat) : Nat

Full source

def offsetOfPosAux (s : String) (pos : Pos) (i : Pos) (offset : Nat) : Nat :=
  if i >= pos then offset
  else if h : s.atEnd i then
    offset
  else
    have := Nat.sub_lt_sub_left (Nat.gt_of_not_le (mt decide_eq_true h)) (lt_next s _)
    offsetOfPosAux s pos (s.next i) (offset+1)
termination_by s.endPos.1 - i.1

UTF-8 character offset between positions (auxiliary function)

Informal description

The auxiliary function `offsetOfPosAux` computes the number of UTF-8 characters (code points) between positions `i` and `pos` in string `s`, starting from an initial offset `offset`. If `i` is already at or beyond `pos`, it returns the current offset. Otherwise, it recursively processes the next valid UTF-8 character position until reaching `pos` or the end of the string.

String.offsetOfPos definition

(s : String) (pos : Pos) : Nat

Full source

/--
Returns the character index that corresponds to the provided position (i.e. UTF-8 byte index) in a
string.

If the position is at the end of the string, then the string's length in characters is returned. If
the position is invalid due to pointing at the middle of a UTF-8 byte sequence, then the character
index of the next character after the position is returned.

Examples:
* `"L∃∀N".offsetOfPos ⟨0⟩ = 0`
* `"L∃∀N".offsetOfPos ⟨1⟩ = 1`
* `"L∃∀N".offsetOfPos ⟨2⟩ = 2`
* `"L∃∀N".offsetOfPos ⟨4⟩ = 2`
* `"L∃∀N".offsetOfPos ⟨5⟩ = 3`
* `"L∃∀N".offsetOfPos ⟨50⟩ = 4`
-/
@[inline] def offsetOfPos (s : String) (pos : Pos) : Nat :=
  offsetOfPosAux s pos 0 0

Character index from byte position in UTF-8 string

Informal description

Given a UTF-8 encoded string $s$ and a byte position $pos$ in $s$, the function returns the corresponding character index (UTF-8 code point position) in $s$. If $pos$ is at the end of the string, it returns the total number of characters in $s$. If $pos$ is invalid (i.e., in the middle of a UTF-8 byte sequence), it returns the character index of the next valid character after $pos$.

String.foldlAux definition

{α : Type u} (f : α → Char → α) (s : String) (stopPos : Pos) (i : Pos) (a : α) : α

Full source

@[specialize] def foldlAux {α : Type u} (f : α → Char → α) (s : String) (stopPos : Pos) (i : Pos) (a : α) : α :=
  if h : i < stopPos then
    have := Nat.sub_lt_sub_left h (lt_next s i)
    foldlAux f s stopPos (s.next i) (f a (s.get i))
  else a
termination_by stopPos.1 - i.1

Left-fold auxiliary function for strings

Informal description

The auxiliary function `foldlAux` performs a left-fold operation on a string `s` from position `i` up to (but not including) `stopPos`, applying function `f : α → Char → α` to accumulate a result of type `α`. The initial accumulator value is `a`. The function processes each character in the string sequentially, updating the accumulator at each step until it reaches `stopPos` or the end of the string, at which point it returns the final accumulated value.

String.foldl definition

{α : Type u} (f : α → Char → α) (init : α) (s : String) : α

Full source

/--
Folds a function over a string from the left, accumulating a value starting with `init`. The
accumulated value is combined with each character in order, using `f`.

Examples:
 * `"coffee tea water".foldl (fun n c => if c.isWhitespace then n + 1 else n) 0 = 2`
 * `"coffee tea and water".foldl (fun n c => if c.isWhitespace then n + 1 else n) 0 = 3`
 * `"coffee tea water".foldl (·.push ·) "" = "coffee tea water"`
-/
@[inline] def foldl {α : Type u} (f : α → Char → α) (init : α) (s : String) : α :=
  foldlAux f s s.endPos 0 init

Left-fold over a string

Informal description

The function `foldl` performs a left-fold operation on a string `s`, applying a function `f : α → Char → α` to accumulate a result of type `α` starting from an initial value `init`. It processes each character in the string sequentially from left to right, updating the accumulator at each step until the end of the string is reached, at which point it returns the final accumulated value. Examples: - `"coffee tea water".foldl (fun n c => if c.isWhitespace then n + 1 else n) 0 = 2` - `"coffee tea and water".foldl (fun n c => if c.isWhitespace then n + 1 else n) 0 = 3` - `"coffee tea water".foldl (·.push ·) "" = "coffee tea water"`

String.foldrAux definition

{α : Type u} (f : Char → α → α) (a : α) (s : String) (i begPos : Pos) : α

Full source

@[specialize] def foldrAux {α : Type u} (f : Char → α → α) (a : α) (s : String) (i begPos : Pos) : α :=
  if h : begPos < i then
    have := String.prev_lt_of_pos s i <| mt (congrArg String.Pos.byteIdx) <|
      Ne.symm <| Nat.ne_of_lt <| Nat.lt_of_le_of_lt (Nat.zero_le _) h
    let i := s.prev i
    let a := f (s.get i) a
    foldrAux f a s i begPos
  else a
termination_by i.1

Right-fold auxiliary function for strings

Informal description

The auxiliary function for right-fold operation on a string. Given a function `f : Char → α → α`, an accumulator `a : α`, a string `s`, and two positions `i` and `begPos` in the string, it recursively applies `f` to each character in the string from right to left (starting from position `i` and moving towards `begPos`), updating the accumulator at each step. The recursion terminates when the current position `i` is no longer greater than `begPos`, at which point the accumulator is returned.

String.foldr definition

{α : Type u} (f : Char → α → α) (init : α) (s : String) : α

Full source

/--
Folds a function over a string from the right, accumulating a value starting with `init`. The
accumulated value is combined with each character in reverse order, using `f`.

Examples:
 * `"coffee tea water".foldr (fun c n => if c.isWhitespace then n + 1 else n) 0 = 2`
 * `"coffee tea and water".foldr (fun c n => if c.isWhitespace then n + 1 else n) 0 = 3`
 * `"coffee tea water".foldr (fun c s => c.push s) "" = "retaw dna aet eeffoc"`
-/
@[inline] def foldr {α : Type u} (f : Char → α → α) (init : α) (s : String) : α :=
  foldrAux f init s s.endPos 0

Right-fold over a string

Informal description

Given a function $ f : \text{Char} \to \alpha \to \alpha $, an initial value $ \text{init} : \alpha $, and a string $ s $, the function `String.foldr` applies $ f $ to each character of $ s $ from right to left, starting with `init` and accumulating the result. For example: - `"coffee tea water".foldr (fun c n => if c.isWhitespace then n + 1 else n) 0` counts the number of whitespace characters in the string, yielding $ 2 $. - `"coffee tea and water".foldr (fun c n => if c.isWhitespace then n + 1 else n) 0` yields $ 3 $. - `"coffee tea water".foldr (fun c s => c.push s) ""` reverses the string, yielding `"retaw aet eeffoc"`.

String.anyAux definition

(s : String) (stopPos : Pos) (p : Char → Bool) (i : Pos) : Bool

Full source

@[specialize] def anyAux (s : String) (stopPos : Pos) (p : Char → Bool) (i : Pos) : Bool :=
  if h : i < stopPos then
    if p (s.get i) then true
    else
      have := Nat.sub_lt_sub_left h (lt_next s i)
      anyAux s stopPos p (s.next i)
  else false
termination_by stopPos.1 - i.1

Auxiliary function for checking if any character in a substring satisfies a predicate

Informal description

The auxiliary function `String.anyAux` checks if any character in the substring of `s` from position `i` to `stopPos` satisfies the predicate `p`. It recursively processes the string from position `i` towards `stopPos`, returning `true` as soon as it finds a character satisfying `p`, and `false` if no such character is found before reaching `stopPos`. The termination is ensured by the decreasing difference between the byte positions `stopPos` and `i`.

String.any definition

(s : String) (p : Char → Bool) : Bool

Full source

/--
Checks whether there is a character in a string for which the Boolean predicate `p` returns `true`.

Short-circuits at the first character for which `p` returns `true`.

Examples:
 * `"brown".any (·.isLetter) = true`
 * `"brown".any (·.isWhitespace) = false`
 * `"brown and orange".any (·.isLetter) = true`
 * `"".any (fun _ => false) = false`
-/
@[inline] def any (s : String) (p : Char → Bool) : Bool :=
  anyAux s s.endPos p 0

Existence of a character satisfying a predicate in a string

Informal description

Given a string $s$ and a predicate $p$ on characters, the function returns `true` if there exists at least one character in $s$ for which $p$ returns `true`, and `false` otherwise. The function short-circuits and returns `true` as soon as it finds such a character. Examples: - `"brown".any (·.isLetter) = true` - `"brown".any (·.isWhitespace) = false` - `"brown and orange".any (·.isLetter) = true` - `"".any (fun _ => false) = false`

String.all definition

(s : String) (p : Char → Bool) : Bool

Full source

/--
Checks whether the Boolean predicate `p` returns `true` for every character in a string.

Short-circuits at the first character for which `p` returns `false`.

Examples:
 * `"brown".all (·.isLetter) = true`
 * `"brown and orange".all (·.isLetter) = false`
 * `"".all (fun _ => false) = true`
-/
@[inline] def all (s : String) (p : Char → Bool) : Bool :=
  !s.any (fun c => !p c)

Universal quantification over characters in a string

Informal description

Given a string $s$ and a predicate $p$ on characters, the function returns `true` if every character in $s$ satisfies $p$, and `false` otherwise. The function short-circuits and returns `false` as soon as it finds a character that does not satisfy $p$. Examples: - `"brown".all (·.isLetter) = true` - `"brown and orange".all (·.isLetter) = false` - `"".all (fun _ => false) = true`

String.contains definition

(s : String) (c : Char) : Bool

Full source

/--
Checks whether a string contains the specified character.

Examples:
* `"green".contains 'e' = true`
* `"green".contains 'x' = false`
* `"".contains 'x' = false`
-/
@[inline] def contains (s : String) (c : Char) : Bool :=
s.any (fun a => a == c)

String contains character

Informal description

The function checks whether a string $s$ contains a given Unicode character $c$, returning `true` if $c$ appears in $s$ and `false` otherwise.

String.utf8SetAux_of_gt theorem

(c' : Char) : ∀ (cs : List Char) {i p : Pos}, i > p → utf8SetAux c' cs i p = cs

Full source

theorem utf8SetAux_of_gt (c' : Char) : ∀ (cs : List Char) {i p : Pos}, i > p → utf8SetAux c' cs i p = cs
  | [],    _, _, _ => rfl
  | c::cs, i, p, h => by
    rw [utf8SetAux, if_neg (mt (congrArg (·.1)) (Ne.symm <| Nat.ne_of_lt h)), utf8SetAux_of_gt c' cs]
    exact Nat.lt_of_lt_of_le h (Nat.le_add_right ..)

UTF-8 String Auxiliary Replacement Preserves List When Position Exceeds Target

Informal description

For any Unicode character `c'`, list of characters `cs`, and byte positions `i` and `p` in a UTF-8 string, if `i > p`, then the auxiliary function `utf8SetAux` leaves the list unchanged, i.e., `utf8SetAux c' cs i p = cs`.

String.set_next_add theorem

 (s : String) (i : Pos) (c : Char) (b₁ b₂) (h : (s.next i).1 + b₁ = s.endPos.1 + b₂) :
  ((s.set i c).next i).1 + b₁ = (s.set i c).endPos.1 + b₂

Full source

theorem set_next_add (s : String) (i : Pos) (c : Char) (b₁ b₂)
    (h : (s.next i).1 + b₁ = s.endPos.1 + b₂) :
    ((s.set i c).next i).1 + b₁ = (s.set i c).endPos.1 + b₂ := by
  simp [next, get, set, endPos, utf8ByteSize] at h ⊢
  rw [Nat.add_comm i.1, Nat.add_assoc] at h ⊢
  let rec foo : ∀ cs a b₁ b₂,
    (utf8GetAux cs a i).utf8Size + b₁ = utf8ByteSize.go cs + b₂ →
    (utf8GetAux (utf8SetAux c cs a i) a i).utf8Size + b₁ = utf8ByteSize.go (utf8SetAux c cs a i) + b₂
  | [], _, _, _, h => h
  | c'::cs, a, b₁, b₂, h => by
    unfold utf8SetAux
    apply iteInduction (motive := fun p => (utf8GetAux p a i).utf8Size + b₁ = utf8ByteSize.go p + b₂) <;>
      intro h' <;> simp [utf8GetAux, h', utf8ByteSize.go] at h ⊢
    next =>
      rw [Nat.add_assoc, Nat.add_left_comm] at h ⊢; rw [Nat.add_left_cancel h]
    next =>
      rw [Nat.add_assoc] at h ⊢
      refine foo cs (a + c') b₁ (c'.utf8Size + b₂) h
  exact foo s.1 0 _ _ h

Preservation of Position Equation under Character Replacement in UTF-8 Strings

Informal description

Let $s$ be a UTF-8 encoded string, $i$ a valid byte position in $s$, and $c$ a Unicode character. For any natural numbers $b_1$ and $b_2$ such that $\text{next}(s, i).1 + b_1 = \text{endPos}(s).1 + b_2$, where $\text{next}(s, i)$ is the next character boundary after $i$ and $\text{endPos}(s)$ is the end position of $s$, it holds that after replacing the character at position $i$ with $c$, the equality $\text{next}(\text{set}(s, i, c), i).1 + b_1 = \text{endPos}(\text{set}(s, i, c)).1 + b_2$ remains valid.

String.mapAux_lemma theorem

(s : String) (i : Pos) (c : Char) (h : ¬s.atEnd i) : (s.set i c).endPos.1 - ((s.set i c).next i).1 < s.endPos.1 - i.1

Full source

theorem mapAux_lemma (s : String) (i : Pos) (c : Char) (h : ¬s.atEnd i) :
    (s.set i c).endPos.1 - ((s.set i c).next i).1 < s.endPos.1 - i.1 :=
  suffices (s.set i c).endPos.1 - ((s.set i c).next i).1 = s.endPos.1 - (s.next i).1 by
    rw [this]
    apply Nat.sub_lt_sub_left (Nat.gt_of_not_le (mt decide_eq_true h)) (lt_next ..)
  Nat.sub.elim (motive := (_ = ·)) _ _
    (fun _ k e =>
      have := set_next_add _ _ _ k 0 e.symm
      Nat.sub_eq_of_eq_add <| this.symm.trans <| Nat.add_comm ..)
    (fun h => by
      have ⟨k, e⟩ := Nat.le.dest h
      rw [Nat.succ_add] at e
      have : ((s.set i c).next i).1 = _ := set_next_add _ _ c 0 k.succ e.symm
      exact Nat.sub_eq_zero_of_le (this ▸ Nat.le_add_right ..))

Decreasing Remaining Length Lemma for UTF-8 String Modification

Informal description

For a UTF-8 encoded string $s$, a byte position $i$ in $s$ that is not at the end of the string, and a Unicode character $c$, the difference between the end position of the modified string $s.\text{set}(i, c)$ and the next character boundary after $i$ in the modified string is strictly less than the difference between the end position of the original string $s$ and the position $i$. In other words, after replacing the character at position $i$ with $c$, the remaining length (in bytes) from the next character boundary to the end of the string decreases compared to the original remaining length from $i$ to the end.

String.mapAux definition

(f : Char → Char) (i : Pos) (s : String) : String

Full source

@[specialize] def mapAux (f : Char → Char) (i : Pos) (s : String) : String :=
  if h : s.atEnd i then s
  else
    let c := f (s.get i)
    have := mapAux_lemma s i c h
    let s := s.set i c
    mapAux f (s.next i) s
termination_by s.endPos.1 - i.1

Auxiliary string mapping function

Informal description

The auxiliary function `String.mapAux` applies a function `f : Char → Char` to each character in a string `s` starting from a given byte position `i`. If the position `i` is at the end of the string, the function returns the string unchanged. Otherwise, it applies `f` to the character at position `i`, updates the string with the result, and recursively processes the next character position. The termination of this function is ensured by the decreasing difference between the end position of the string and the current position `i`.

String.map definition

(f : Char → Char) (s : String) : String

Full source

/--
Applies the function `f` to every character in a string, returning a string that contains the
resulting characters.

Examples:
 * `"abc123".map Char.toUpper = "ABC123"`
 * `"".map Char.toUpper = ""`
-/
@[inline] def map (f : Char → Char) (s : String) : String :=
  mapAux f 0 s

String character mapping function

Informal description

The function `String.map` applies a function $f : \text{Char} \to \text{Char}$ to each character in a string $s$, returning a new string where each character is replaced by the result of applying $f$ to it. The function processes the string from the beginning (byte position 0) to the end.

String.isNat definition

(s : String) : Bool

Full source

/--
Checks whether the string can be interpreted as the decimal representation of a natural number.

A string can be interpreted as a decimal natural number if it is not empty and all the characters in
it are digits.

Use `String.toNat?` or `String.toNat!` to convert such a string to a natural number.

Examples:
 * `"".isNat = false`
 * `"0".isNat = true`
 * `"5".isNat = true`
 * `"05".isNat = true`
 * `"587".isNat = true`
 * `"-587".isNat = false`
 * `" 5".isNat = false`
 * `"2+3".isNat = false`
 * `"0xff".isNat = false`
-/
@[inline] def isNat (s : String) : Bool :=
  !s.isEmpty!s.isEmpty && s.all (·.isDigit)

Decimal natural number string check

Informal description

A string $s$ is considered a valid decimal representation of a natural number if it is non-empty and every character in $s$ is an ASCII digit (i.e., between '0' and '9' inclusive).

String.toNat? definition

(s : String) : Option Nat

Full source

/--
Interprets a string as the decimal representation of a natural number, returning it. Returns `none`
if the string does not contain a decimal natural number.

A string can be interpreted as a decimal natural number if it is not empty and all the characters in
it are digits.

Use `String.isNat` to check whether `String.toNat?` would return `some`. `String.toNat!` is an
alternative that panics instead of returning `none` when the string is not a natural number.

Examples:
 * `"".toNat? = none`
 * `"0".toNat? = some 0`
 * `"5".toNat? = some 5`
 * `"587".toNat? = some 587`
 * `"-587".toNat? = none`
 * `" 5".toNat? = none`
 * `"2+3".toNat? = none`
 * `"0xff".toNat? = none`
-/
def toNat? (s : String) : Option Nat :=
  if s.isNat then
    some <| s.foldl (fun n c => n*10 + (c.toNat - '0'.toNat)) 0
  else
    none

String to natural number conversion (decimal)

Informal description

The function interprets a string $ s $ as the decimal representation of a natural number, returning it as `some n` where $ n $ is the corresponding natural number. If the string is empty or contains any non-digit characters, it returns `none`. More precisely, the function checks if $ s $ is non-empty and consists only of ASCII digits (0-9). If so, it computes the natural number by iteratively processing each character $ c $ in $ s $, updating the accumulator $ n $ as $ n \times 10 + (\text{code point of } c - \text{code point of '0'}) $. Otherwise, it returns `none`.

String.substrEq definition

(s1 : String) (pos1 : String.Pos) (s2 : String) (pos2 : String.Pos) (sz : Nat) : Bool

Full source

/--
Checks whether substrings of two strings are equal. Substrings are indicated by their starting
positions and a size in _UTF-8 bytes_. Returns `false` if the indicated substring does not exist in
either string.
-/
def substrEq (s1 : String) (pos1 : String.Pos) (s2 : String) (pos2 : String.Pos) (sz : Nat) : Bool :=
  pos1.byteIdx + sz ≤ s1.endPos.byteIdx && pos2.byteIdx + sz ≤ s2.endPos.byteIdxpos1.byteIdx + sz ≤ s1.endPos.byteIdx && pos2.byteIdx + sz ≤ s2.endPos.byteIdx && loop pos1 pos2 { byteIdx := pos1.byteIdx + sz }
where
  loop (off1 off2 stop1 : Pos) :=
    if _h : off1.byteIdx < stop1.byteIdx then
      let c₁ := s1.get off1
      let c₂ := s2.get off2
      c₁ == c₂ && loop (off1 + c₁) (off2 + c₂) stop1
    else true
  termination_by stop1.1 - off1.1
  decreasing_by
    have := Nat.sub_lt_sub_left _h (Nat.add_lt_add_left c₁.utf8Size_pos off1.1)
    decreasing_tactic

Substring equality check for UTF-8 strings

Informal description

The function checks whether substrings of two strings $s_1$ and $s_2$ are equal. The substrings are specified by their starting byte positions $pos_1$ and $pos_2$ (in UTF-8 encoding) and a common size $sz$ (in bytes). The function returns `false` if either substring does not exist (i.e., if the specified substring would extend beyond the end of either string). Otherwise, it compares the substrings character by character and returns `true` if they are identical.

String.isPrefixOf definition

(p : String) (s : String) : Bool

Full source

/--
Checks whether the first string (`p`) is a prefix of the second (`s`).

`String.startsWith` is a version that takes the potential prefix after the string.

Examples:
 * `"red".isPrefixOf "red green blue" = true`
 * `"green".isPrefixOf "red green blue" = false`
 * `"".isPrefixOf "red green blue" = true`
-/
def isPrefixOf (p : String) (s : String) : Bool :=
  substrEq p 0 s 0 p.endPos.byteIdx

String prefix check

Informal description

The function checks whether the string $p$ is a prefix of the string $s$, i.e., whether $s$ starts with $p$. This is done by comparing the substring of $s$ starting at position $0$ with length equal to the byte length of $p$ to the string $p$ itself. The empty string is considered a prefix of any string.

String.replace definition

(s pattern replacement : String) : String

Full source

/--
In the string `s`, replaces all occurrences of `pattern` with `replacement`.

Examples:
* `"red green blue".replace "e" "" = "rd grn blu"`
* `"red green blue".replace "ee" "E" = "red grEn blue"`
* `"red green blue".replace "e" "E" = "rEd grEEn bluE"`
-/
def replace (s pattern replacement : String) : String :=
  if h : pattern.endPos.1 = 0 then s
  else
    have hPatt := Nat.zero_lt_of_ne_zero h
    let rec loop (acc : String) (accStop pos : String.Pos) :=
      if h : pos.byteIdx + pattern.endPos.byteIdx > s.endPos.byteIdx then
        acc ++ s.extract accStop s.endPos
      else
        have := Nat.lt_of_lt_of_le (Nat.add_lt_add_left hPatt _) (Nat.ge_of_not_lt h)
        if s.substrEq pos pattern 0 pattern.endPos.byteIdx then
          have := Nat.sub_lt_sub_left this (Nat.add_lt_add_left hPatt _)
          loop (acc ++ s.extract accStop pos ++ replacement) (pos + pattern) (pos + pattern)
        else
          have := Nat.sub_lt_sub_left this (lt_next s pos)
          loop acc accStop (s.next pos)
      termination_by s.endPos.1 - pos.1
    loop "" 0 0

String replacement function

Informal description

Given a string $s$, a pattern string `pattern`, and a replacement string `replacement`, the function replaces all occurrences of `pattern` in $s$ with `replacement`. If `pattern` is empty, the function returns $s$ unchanged. Examples: - `"red green blue".replace "e" "" = "rd grn blu"` - `"red green blue".replace "ee" "E" = "red grEn blue"` - `"red green blue".replace "e" "E" = "rEd grEEn bluE"`

String.findLineStart definition

(s : String) (pos : String.Pos) : String.Pos

Full source

/--
Returns the position of the beginning of the line that contains the position `pos`.

Lines are ended by `'\n'`, and the returned position is either `0 : String.Pos` or immediately after
a `'\n'` character.
-/
def findLineStart (s : String) (pos : String.Pos) : String.Pos :=
  match s.revFindAux (· = '\n') pos with
  | none => 0
  | somesome n => ⟨n.byteIdx + 1⟩

Start position of line containing a given position

Informal description

Given a string `s` and a byte position `pos` within `s`, the function returns the byte position of the beginning of the line containing `pos`. A line is defined as a sequence of characters ending with a newline character `'\n'`. The returned position is either `0` (indicating the start of the string) or the position immediately after a newline character. More precisely: - If there is no newline character before `pos`, the function returns `0`. - Otherwise, it returns the position right after the last newline character before `pos`.

Substring.isEmpty definition

(ss : Substring) : Bool

Full source

/--
Checks whether a substring is empty.

A substring is empty if its start and end positions are the same.
-/
@[inline] def isEmpty (ss : Substring) : Bool :=
  ss.bsize == 0

Substring emptiness check

Informal description

A substring is empty if its byte size is zero, which occurs when its start and end positions are equal.

Substring.toString definition

: Substring → String

Full source

/--
Copies the region of the underlying string pointed to by a substring into a fresh string.
-/
@[inline] def toString : Substring → String
  | ⟨s, b, e⟩ => s.extract b e

Substring to string conversion

Informal description

Given a substring consisting of a string $s$ and two byte positions $b$ and $e$ within $s$, this function returns a new string containing the characters of $s$ from position $b$ (inclusive) to position $e$ (exclusive). If $b \geq e$, the result is the empty string.

Substring.toIterator definition

: Substring → String.Iterator

Full source

/--
Returns an iterator into the underlying string, at the substring's starting position. The ending
position is discarded, so the iterator alone cannot be used to determine whether its current
position is within the original substring.
-/
@[inline] def toIterator : Substring → String.Iterator
  | ⟨s, b, _⟩ => ⟨s, b⟩

Substring to iterator conversion

Informal description

Given a substring, this function returns an iterator positioned at the start of the substring within the underlying string. The iterator does not retain information about the substring's end position, so it cannot independently determine when it has moved beyond the original substring's bounds.

Substring.get definition

: Substring → String.Pos → Char

Full source

/--
Returns the character at the given position in the substring.

The position is relative to the substring, rather than the underlying string, and no bounds checking
is performed with respect to the substring's end position. If the relative position is not a valid
position in the underlying string, the fallback value `(default : Char)`, which is `'A'`, is
returned.  Does not panic.
-/
@[inline] def get : Substring → String.Pos → Char
  | ⟨s, b, _⟩, p => s.get (b+p)

Character retrieval in substring by relative byte position

Informal description

Given a substring `s` with start byte position `b` in the underlying string, and a relative byte position `p` within the substring, the function returns the Unicode character at position `b + p` in the underlying string. If `b + p` is not a valid position (either out of bounds or not on a character boundary), it returns the default character `'A'` without causing an error.

Substring.next definition

: Substring → String.Pos → String.Pos

Full source

/--
Returns the next position in a substring after the given position. If the position is at the end of
the substring, it is returned unmodified.

Both the input position and the returned position are interpreted relative to the substring's start
position, not the underlying string.
-/
@[inline] def next : Substring → String.Pos → String.Pos
  | ⟨s, b, e⟩, p =>
    let absP := b+p
    if absP = e then p else { byteIdx := (s.next absP).byteIdx - b.byteIdx }

Next position in a substring

Informal description

Given a substring `s` with start byte position `b` and end byte position `e` in the underlying string, and a relative byte position `p` within the substring, the function returns the next valid byte position after `p` in the substring. If `p` is already at the end of the substring (i.e., `b + p = e`), it returns `p` unchanged. The returned position is relative to the substring's start position `b`. More precisely: 1. Compute the absolute position `absP = b + p` in the underlying string. 2. If `absP = e`, return `p` (unchanged). 3. Otherwise, compute the next valid position after `absP` in the underlying string using `s.next absP`, then subtract `b` to make it relative to the substring's start position.

Substring.lt_next theorem

(s : Substring) (i : String.Pos) (h : i.1 < s.bsize) : i.1 < (s.next i).1

Full source

theorem lt_next (s : Substring) (i : String.Pos) (h : i.1 < s.bsize) :
    i.1 < (s.next i).1 := by
  simp [next]; rw [if_neg ?a]
  case a =>
    refine mt (congrArg String.Pos.byteIdx) (Nat.ne_of_lt ?_)
    exact (Nat.add_comm .. ▸ Nat.add_lt_of_lt_sub h :)
  apply Nat.lt_sub_of_add_lt
  rw [Nat.add_comm]; apply String.lt_next

Byte Index Increases at Next Character Boundary in Substrings

Informal description

For any substring $s$ and any byte position $i$ within $s$ such that the byte index of $i$ is less than the byte size of $s$, the byte index of $i$ is strictly less than the byte index of the next character boundary `s.next i`.

Substring.prev definition

: Substring → String.Pos → String.Pos

Full source

/--
Returns the previous position in a substring, just prior to the given position. If the position is
at the beginning of the substring, it is returned unmodified.

Both the input position and the returned position are interpreted relative to the substring's start
position, not the underlying string.
-/
@[inline] def prev : Substring → String.Pos → String.Pos
  | ⟨s, b, _⟩, p =>
    let absP := b+p
    if absP = b then p else { byteIdx := (s.prev absP).byteIdx - b.byteIdx }

Previous position in a substring

Informal description

Given a substring $s$ and a byte position $p$ within $s$, the function returns the previous valid byte position just before $p$ in $s$. If $p$ is already at the start of the substring, it returns $p$ unchanged. Both the input and output positions are interpreted relative to the substring's start position.

Substring.nextn definition

: Substring → Nat → String.Pos → String.Pos

Full source

/--
Returns the position that's the specified number of characters forward from the given position in a
substring. If the end position of the substring is reached, it is returned.

Both the input position and the returned position are interpreted relative to the substring's start
position, not the underlying string.
-/
def nextn : Substring → Nat → String.Pos → String.Pos
  | _,  0,   p => p
  | ss, i+1, p => ss.nextn i (ss.next p)

$n$-th next position in a substring

Informal description

Given a substring `s`, a natural number `n`, and a byte position `p` within `s`, the function returns the position that is `n` characters forward from `p` in `s`. If the end of the substring is reached before counting `n` characters, the end position is returned. Both the input position `p` and the returned position are interpreted relative to the substring's start position. More precisely: 1. If `n = 0`, return `p` (unchanged). 2. Otherwise, recursively compute the next position after `p` in `s` using `s.next p`, then apply `nextn` with `n-1` to this new position.

Substring.prevn definition

: Substring → Nat → String.Pos → String.Pos

Full source

/--
Returns the position that's the specified number of characters prior to the given position in a
substring. If the start position of the substring is reached, it is returned.

Both the input position and the returned position are interpreted relative to the substring's start
position, not the underlying string.
-/
def prevn : Substring → Nat → String.Pos → String.Pos
  | _,  0,   p => p
  | ss, i+1, p => ss.prevn i (ss.prev p)

$n$-th previous position in a substring

Informal description

Given a substring $s$, a natural number $n$, and a byte position $p$ within $s$, the function returns the position that is $n$ characters before $p$ in $s$. If the start of the substring is reached before counting $n$ characters, the start position is returned. Both the input position $p$ and the returned position are interpreted relative to the substring's start position.

Substring.front definition

(s : Substring) : Char

Full source

/--
Returns the first character in the substring.

If the substring is empty, but the substring's start position is a valid position in the underlying
string, then the character at the start position is returned. If the substring's start position is
not a valid position in the string, the fallback value `(default : Char)`, which is `'A'`, is
returned.  Does not panic.
-/
@[inline] def front (s : Substring) : Char :=
  s.get 0

First character of a substring

Informal description

The function returns the first character of a substring. If the substring is empty but its start position is valid within the underlying string, it returns the character at that position. If the start position is invalid, it returns the default character `'A'` without causing an error.

Substring.posOf definition

(s : Substring) (c : Char) : String.Pos

Full source

/--
Returns the substring-relative position of the first occurrence of `c` in `s`, or `s.bsize` if `c`
doesn't occur.
-/
@[inline] def posOf (s : Substring) (c : Char) : String.Pos :=
  match s with
  | ⟨s, b, e⟩ => { byteIdx := (String.posOfAux s c e b).byteIdx - b.byteIdx }

Position of first character occurrence in a substring

Informal description

Given a substring `s` and a character `c`, the function returns the position of the first occurrence of `c` in `s`, relative to the start of the substring. If `c` does not occur in `s`, it returns the length of `s` (in bytes). More precisely, for a substring `⟨s, b, e⟩` (where `s` is the underlying string, `b` is the start position, and `e` is the end position), the function computes the first occurrence of `c` in `s` between positions `b` and `e` using `String.posOfAux`, then returns the difference between this position and `b` (measured in bytes).

Substring.drop definition

: Substring → Nat → Substring

Full source

/--
Removes the specified number of characters (Unicode code points) from the beginning of a substring
by advancing its start position.

If the substring's end position is reached, the start position is not advanced past it.
-/
@[inline] def drop : Substring → Nat → Substring
  | ss@⟨s, b, e⟩, n => ⟨s, b + ss.nextn n 0, e⟩

Substring with first $n$ characters removed

Informal description

Given a substring $ss$ and a natural number $n$, the function returns a new substring obtained by removing the first $n$ characters (Unicode code points) from $ss$. The start position of the new substring is advanced by $n$ characters, but it will not be moved past the end position of the original substring. More formally, if $ss$ is represented as $\langle s, b, e \rangle$ where $s$ is the underlying string, $b$ is the start position, and $e$ is the end position, then the new substring is $\langle s, b + \text{nextn}(ss, n, 0), e \rangle$, where $\text{nextn}(ss, n, p)$ returns the position $n$ characters forward from $p$ in $ss$.

Substring.dropRight definition

: Substring → Nat → Substring

Full source

/--
Removes the specified number of characters (Unicode code points) from the end of a substring
by moving its end position towards its start position.

If the substring's start position is reached, the end position is not retracted past it.
-/
@[inline] def dropRight : Substring → Nat → Substring
  | ss@⟨s, b, _⟩, n => ⟨s, b, b + ss.prevn n ⟨ss.bsize⟩⟩

Substring with last $n$ characters removed

Informal description

Given a substring $ss$ and a natural number $n$, the function returns a new substring obtained by removing the last $n$ characters (Unicode code points) from $ss$. The end position of the new substring is adjusted by moving it left by $n$ characters, but it will not be moved past the start position of the original substring. More formally, if $ss$ is represented as $\langle s, b, e \rangle$ where $s$ is the underlying string, $b$ is the start position, and $e$ is the end position, then the new substring is $\langle s, b, b + \text{prevn}(ss, n, \text{bsize}(ss)) \rangle$, where $\text{prevn}(ss, n, p)$ returns the position $n$ characters before $p$ in $ss$.

Substring.take definition

: Substring → Nat → Substring

Full source

/--
Retains only the specified number of characters (Unicode code points) at the beginning of a
substring, by moving its end position towards its start position.

If the substring's start position is reached, the end position is not retracted past it.
-/
@[inline] def take : Substring → Nat → Substring
  | ss@⟨s, b, _⟩, n => ⟨s, b, b + ss.nextn n 0⟩

First `n` characters of a substring

Informal description

Given a substring `ss` of a string `s` with start position `b` and a natural number `n`, the function returns a new substring consisting of the first `n` characters (Unicode code points) of `ss`. The end position of the new substring is adjusted to `b + nextn(n, 0)`, where `nextn(n, 0)` computes the position `n` characters forward from the start of `ss`. If `n` exceeds the length of `ss`, the entire substring is returned. More precisely: 1. The new substring shares the same underlying string `s` and start position `b` as the original substring. 2. The end position is computed by moving forward `n` characters from the start position, using the `nextn` function. 3. If the substring's length is less than `n`, the end position will not be adjusted beyond the original substring's end position.

Substring.takeRight definition

: Substring → Nat → Substring

Full source

/--
Retains only the specified number of characters (Unicode code points) at the end of a substring, by
moving its start position towards its end position.

If the substring's end position is reached, the start position is not advanced past it.
-/
@[inline] def takeRight : Substring → Nat → Substring
  | ss@⟨s, b, e⟩, n => ⟨s, b + ss.prevn n ⟨ss.bsize⟩, e⟩

Last `n` characters of a substring

Informal description

Given a substring `ss` of a string `s` with start position `b` and end position `e`, and a natural number `n`, the function returns a new substring of `s` consisting of the last `n` characters of `ss`. The start position of the new substring is adjusted to `b + prevn(n, bsize(ss))`, where `prevn` moves backward `n` characters from the end of `ss`, and `bsize(ss)` is the byte size of `ss`. The end position remains `e`. If `n` exceeds the length of `ss`, the entire substring is returned.

Substring.atEnd definition

: Substring → String.Pos → Bool

Full source

/--
Checks whether a position in a substring is precisely equal to its ending position.

The position is understood relative to the substring's starting position, rather than the underlying
string's starting position.
-/
@[inline] def atEnd : Substring → String.Pos → Bool
  | ⟨_, b, e⟩, p => b + p == e

Check if position is at end of substring

Informal description

Given a substring `s` with start position `b` and end position `e`, and a position `p` relative to `b`, the function returns `true` if `b + p = e` (i.e., if `p` is exactly at the end of the substring).

Substring.extract definition

: Substring → String.Pos → String.Pos → Substring

Full source

/--
Returns the region of the substring delimited by the provided start and stop positions, as a
substring. The positions are interpreted with respect to the substring's start position, rather than
the underlying string.

If the resulting substring is empty, then the resulting substring is a substring of the empty string
`""`. Otherwise, the underlying string is that of the input substring with the beginning and end
positions adjusted.
-/
@[inline] def extract : Substring → String.Pos → String.Pos → Substring
  | ⟨s, b, e⟩, b', e' => if b' ≥ e' then ⟨"", 0, 0⟩ else ⟨s, e.min (b+b'), e.min (b+e')⟩

Extract substring within specified relative positions

Informal description

Given a substring with underlying string $s$, start position $b$, and end position $e$, and two positions $b'$ and $e'$ relative to $b$, the function returns a new substring of $s$ starting at $\min(e, b + b')$ and ending at $\min(e, b + e')$. If $b' \geq e'$, the result is the empty substring `""` with start and end positions both set to 0.

Substring.foldl definition

{α : Type u} (f : α → Char → α) (init : α) (s : Substring) : α

Full source

/--
Folds a function over a substring from the left, accumulating a value starting with `init`. The
accumulated value is combined with each character in order, using `f`.
-/
@[inline] def foldl {α : Type u} (f : α → Char → α) (init : α) (s : Substring) : α :=
  match s with
  | ⟨s, b, e⟩ => String.foldlAux f s e b init

Left fold over a substring

Informal description

Given a function $ f : \alpha \to \text{Char} \to \alpha $, an initial value $ \text{init} : \alpha $, and a substring $ s $, the function `Substring.foldl` applies $ f $ to each character in $ s $ from left to right, starting with $ \text{init} $ as the initial accumulator value. The result is the final accumulator value after processing all characters in the substring.

Substring.foldr definition

{α : Type u} (f : Char → α → α) (init : α) (s : Substring) : α

Full source

/--
Folds a function over a substring from the right, accumulating a value starting with `init`. The
accumulated value is combined with each character in reverse order, using `f`.
-/
@[inline] def foldr {α : Type u} (f : Char → α → α) (init : α) (s : Substring) : α :=
  match s with
  | ⟨s, b, e⟩ => String.foldrAux f init s e b

Right fold over a substring

Informal description

Given a function $ f : \text{Char} \to \alpha \to \alpha $, an initial value $ \text{init} : \alpha $, and a substring $ s $, the function `Substring.foldr` applies $ f $ to each character in $ s $ from right to left, starting with $ \text{init} $ as the initial accumulator value. The result is the final accumulator value after processing all characters in the substring.

Substring.any definition

(s : Substring) (p : Char → Bool) : Bool

Full source

/--
Checks whether the Boolean predicate `p` returns `true` for any character in a substring.

Short-circuits at the first character for which `p` returns `true`.
-/
@[inline] def any (s : Substring) (p : Char → Bool) : Bool :=
  match s with
  | ⟨s, b, e⟩ => String.anyAux s e p b

Existence of a satisfying character in a substring

Informal description

Given a substring `s` and a predicate `p` on characters, the function checks if there exists any character in `s` for which `p` returns `true`. The evaluation short-circuits and returns `true` as soon as such a character is found.

Substring.all definition

(s : Substring) (p : Char → Bool) : Bool

Full source

/--
Checks whether the Boolean predicate `p` returns `true` for every character in a substring.

Short-circuits at the first character for which `p` returns `false`.
-/
@[inline] def all (s : Substring) (p : Char → Bool) : Bool :=
  !s.any (fun c => !p c)

Universal quantification over characters in a substring

Informal description

Given a substring `s` and a predicate `p` on characters, the function checks if `p` returns `true` for every character in `s`. The evaluation short-circuits and returns `false` as soon as a character for which `p` returns `false` is found.

Substring.contains definition

(s : Substring) (c : Char) : Bool

Full source

/--
Checks whether a substring contains the specified character.
-/
@[inline] def contains (s : Substring) (c : Char) : Bool :=
  s.any (fun a => a == c)

Character containment in a substring

Informal description

Given a substring `s` and a character `c`, the function checks whether `c` appears in `s` by testing if any character in `s` is equal to `c`.

Substring.takeWhileAux definition

(s : String) (stopPos : String.Pos) (p : Char → Bool) (i : String.Pos) : String.Pos

Full source

@[specialize] def takeWhileAux (s : String) (stopPos : String.Pos) (p : Char → Bool) (i : String.Pos) : String.Pos :=
  if h : i < stopPos then
    if p (s.get i) then
      have := Nat.sub_lt_sub_left h (String.lt_next s i)
      takeWhileAux s stopPos p (s.next i)
    else i
  else i
termination_by stopPos.1 - i.1

Auxiliary function for taking characters while a predicate holds

Informal description

The auxiliary function `takeWhileAux` takes a string `s`, a stop position `stopPos`, a predicate `p` on characters, and a current position `i`. It returns the position `j` such that all characters from `i` to `j` (exclusive) satisfy `p`, and either `j` is the first position where `p` fails or `j` is `stopPos` if all characters up to `stopPos` satisfy `p`. The function proceeds by checking if the current position `i` is before `stopPos` and if the character at `i` satisfies `p`. If both conditions hold, it recursively processes the next character position; otherwise, it returns `i`.

Substring.takeWhile definition

: Substring → (Char → Bool) → Substring

Full source

/--
Retains only the longest prefix of a substring in which a Boolean predicate returns `true` for all
characters by moving the substring's end position towards its start position.
-/
@[inline] def takeWhile : Substring → (Char → Bool) → Substring
  | ⟨s, b, e⟩, p =>
    let e := takeWhileAux s e p b;
    ⟨s, b, e⟩

Longest prefix satisfying a predicate

Informal description

Given a substring and a predicate on characters, the function returns the longest prefix of the substring where all characters satisfy the predicate. The resulting substring starts at the same position as the input and ends at the first position where the predicate fails or at the original end position if all characters satisfy the predicate.

Substring.dropWhile definition

: Substring → (Char → Bool) → Substring

Full source

/--
Removes the longest prefix of a substring in which a Boolean predicate returns `true` for all
characters by moving the substring's start position. The start position is moved to the position of
the first character for which the predicate returns `false`, or to the substring's end position if
the predicate always returns `true`.
-/
@[inline] def dropWhile : Substring → (Char → Bool) → Substring
  | ⟨s, b, e⟩, p =>
    let b := takeWhileAux s e p b;
    ⟨s, b, e⟩

Longest prefix removal while predicate holds

Informal description

Given a substring and a predicate on characters, the function `dropWhile` returns a new substring obtained by removing the longest prefix of the original substring where all characters satisfy the predicate. The start position of the new substring is moved to the first position where the predicate fails, or to the end position if the predicate holds for all characters.

Substring.takeRightWhileAux definition

(s : String) (begPos : String.Pos) (p : Char → Bool) (i : String.Pos) : String.Pos

Full source

@[specialize] def takeRightWhileAux (s : String) (begPos : String.Pos) (p : Char → Bool) (i : String.Pos) : String.Pos :=
  if h : begPos < i then
    have := String.prev_lt_of_pos s i <| mt (congrArg String.Pos.byteIdx) <|
      Ne.symm <| Nat.ne_of_lt <| Nat.lt_of_le_of_lt (Nat.zero_le _) h
    let i' := s.prev i
    let c  := s.get i'
    if !p c then i
    else takeRightWhileAux s begPos p i'
  else i
termination_by i.1

Auxiliary function for taking characters from the right while a predicate holds

Informal description

Given a string `s`, a starting position `begPos`, a predicate `p` on characters, and a current position `i`, the function `takeRightWhileAux` returns the rightmost position `j` such that all characters between `begPos` and `j` (inclusive) satisfy `p`. If `i` is not greater than `begPos`, it returns `i`. The function works by recursively moving left from `i` until it finds a character that does not satisfy `p` or reaches `begPos`.

Substring.takeRightWhile definition

: Substring → (Char → Bool) → Substring

Full source

/--
Retains only the longest suffix of a substring in which a Boolean predicate returns `true` for all
characters by moving the substring's start position towards its end position.
-/
@[inline] def takeRightWhile : Substring → (Char → Bool) → Substring
  | ⟨s, b, e⟩, p =>
    let b := takeRightWhileAux s b p e
    ⟨s, b, e⟩

Longest suffix satisfying a predicate

Informal description

Given a substring and a predicate on characters, the function returns the longest suffix of the substring where all characters satisfy the predicate. This is achieved by adjusting the start position of the substring towards its end position while the predicate holds for all characters in the suffix.

Substring.dropRightWhile definition

: Substring → (Char → Bool) → Substring

Full source

/--
Removes the longest suffix of a substring in which a Boolean predicate returns `true` for all
characters by moving the substring's end position. The end position is moved just after the position
of the last character for which the predicate returns `false`, or to the substring's start position
if the predicate always returns `true`.
-/
@[inline] def dropRightWhile : Substring → (Char → Bool) → Substring
  | ⟨s, b, e⟩, p =>
    let e := takeRightWhileAux s b p e
    ⟨s, b, e⟩

Drop suffix while predicate holds

Informal description

Given a substring and a predicate on characters, the function `dropRightWhile` modifies the substring by moving its end position to exclude the longest suffix where all characters satisfy the predicate. The new end position is placed just after the last character that fails the predicate, or at the start position if all characters satisfy it.

Substring.trimLeft definition

(s : Substring) : Substring

Full source

/--
Removes leading whitespace from a substring by moving its start position to the first non-whitespace
character, or to its end position if there is no non-whitespace character.

“Whitespace” is defined as characters for which `Char.isWhitespace` returns `true`.
-/
@[inline] def trimLeft (s : Substring) : Substring :=
  s.dropWhile Char.isWhitespace

Trim leading whitespace from substring

Informal description

Given a substring `s`, the function removes leading whitespace characters by moving the start position of the substring to the first non-whitespace character, or to the end position if all characters are whitespace. Here, "whitespace" refers to characters for which `Char.isWhitespace` returns `true` (space, tab, carriage return, or newline).

Substring.trimRight definition

(s : Substring) : Substring

Full source

/--
Removes trailing whitespace from a substring by moving its end position to the last non-whitespace
character, or to its start position if there is no non-whitespace character.

“Whitespace” is defined as characters for which `Char.isWhitespace` returns `true`.
-/
@[inline] def trimRight (s : Substring) : Substring :=
  s.dropRightWhile Char.isWhitespace

Trim trailing whitespace from substring

Informal description

The function removes trailing whitespace characters from a substring by adjusting its end position to the last non-whitespace character, or to the start position if the substring consists entirely of whitespace. Here, "whitespace" refers to characters for which `Char.isWhitespace` returns `true` (space, tab, carriage return, or newline).

Substring.trim definition

: Substring → Substring

Full source

/--
Removes leading and trailing whitespace from a substring by first moving its start position to the
first non-whitespace character, and then moving its end position to the last non-whitespace
character.

If the substring consists only of whitespace, then the resulting substring's start position is moved
to its end position.

“Whitespace” is defined as characters for which `Char.isWhitespace` returns `true`.

Examples:
 * `" red green blue ".toSubstring.trim.toString = "red green blue"`
 * `" red green blue ".toSubstring.trim.startPos = ⟨1⟩`
 * `" red green blue ".toSubstring.trim.stopPos = ⟨15⟩`
 * `"     ".toSubstring.trim.startPos = ⟨5⟩`
-/
@[inline] def trim : Substring → Substring
  | ⟨s, b, e⟩ =>
    let b := takeWhileAux s e Char.isWhitespace b
    let e := takeRightWhileAux s b Char.isWhitespace e
    ⟨s, b, e⟩

Trim whitespace from substring

Informal description

Given a substring, the function `trim` removes leading and trailing whitespace characters by adjusting the start and end positions of the substring. Specifically, it moves the start position to the first non-whitespace character and the end position to the last non-whitespace character. If the substring consists entirely of whitespace, the start position is set to the end position. Whitespace is defined as any character for which `Char.isWhitespace` returns `true`, including space, tab, carriage return, and newline.

Substring.isNat definition

(s : Substring) : Bool

Full source

/--
Checks whether the substring can be interpreted as the decimal representation of a natural number.

A substring can be interpreted as a decimal natural number if it is not empty and all the characters
in it are digits.

Use `Substring.toNat?` to convert such a substring to a natural number.
-/
@[inline] def isNat (s : Substring) : Bool :=
  s.all fun c => c.isDigit

Check if substring represents a decimal natural number

Informal description

Given a substring `s`, the function returns `true` if `s` is non-empty and every character in `s` is an ASCII digit (i.e., between '0' and '9' inclusive), indicating that `s` can be interpreted as a decimal representation of a natural number.

Substring.toNat? definition

(s : Substring) : Option Nat

Full source

/--
Checks whether the substring can be interpreted as the decimal representation of a natural number,
returning the number if it can.

A substring can be interpreted as a decimal natural number if it is not empty and all the characters
in it are digits.

Use `Substring.isNat` to check whether the substring is such a substring.
-/
def toNat? (s : Substring) : Option Nat :=
  if s.isNat then
    some <| s.foldl (fun n c => n*10 + (c.toNat - '0'.toNat)) 0
  else
    none

Decimal natural number interpretation of a substring

Informal description

Given a substring `s`, the function returns `some n` where `n` is the natural number represented by the decimal digits in `s` if `s` is non-empty and consists entirely of ASCII digits (i.e., characters between '0' and '9'). Otherwise, it returns `none`. More precisely, if `s.isNat` is true, the function computes the natural number by interpreting each character in `s` as a digit and accumulating the result from left to right using the formula: \[ n = \sum_{i=0}^{k-1} d_i \cdot 10^{k-1-i} \] where $d_i$ is the numeric value of the $i$-th character in `s` (i.e., $d_i = \text{ord}(c_i) - \text{ord}('0')$), and $k$ is the length of `s`.

Substring.beq definition

(ss1 ss2 : Substring) : Bool

Full source

/--
Checks whether two substrings represent equal strings. Usually accessed via the `==` operator.

Two substrings do not need to have the same underlying string or the same start and end positions;
instead, they are equal if they contain the same sequence of characters.
-/
def beq (ss1 ss2 : Substring) : Bool :=
  ss1.bsize == ss2.bsize && ss1.str.substrEq ss1.startPos ss2.str ss2.startPos ss1.bsize

Substring equality check

Informal description

Two substrings $ss_1$ and $ss_2$ are considered equal if they have the same byte size and their character sequences match exactly. This equality check is performed by comparing the underlying string segments byte by byte, starting from their respective positions.

Substring.hasBeq instance

: BEq Substring

Full source

instance hasBeq : BEq Substring := ⟨beq⟩

Boolean Equality for Substrings

Informal description

The type `Substring` is equipped with a boolean-valued equality relation `==` that checks if two substrings have the same byte size and matching character sequences.

Substring.sameAs definition

(ss1 ss2 : Substring) : Bool

Full source

/--
Checks whether two substrings have the same position and content.

The two substrings do not need to have the same underlying string for this check to succeed.
-/
def sameAs (ss1 ss2 : Substring) : Bool :=
  ss1.startPos == ss2.startPos && ss1 == ss2

Substring equality with position check

Informal description

Given two substrings `ss₁` and `ss₂`, the function returns `true` if they have the same starting position and identical content, and `false` otherwise. The underlying strings of the substrings do not need to be the same for this check to succeed.

Substring.commonPrefix definition

(s t : Substring) : Substring

Full source

/--
Returns the longest common prefix of two substrings.

The returned substring uses the same underlying string as `s`.
-/
def commonPrefix (s t : Substring) : Substring :=
  { s with stopPos := loop s.startPos t.startPos }
where
  /-- Returns the ending position of the common prefix, working up from `spos, tpos`. -/
  loop spos tpos :=
    if h : spos < s.stopPos ∧ tpos < t.stopPos then
      if s.str.get spos == t.str.get tpos then
        have := Nat.sub_lt_sub_left h.1 (s.str.lt_next spos)
        loop (s.str.next spos) (t.str.next tpos)
      else
        spos
    else
      spos
  termination_by s.stopPos.byteIdx - spos.byteIdx

Longest common prefix of two substrings

Informal description

Given two substrings $s$ and $t$, the function returns the longest common prefix between them. The returned substring shares the same underlying string as $s$. The function works by iteratively comparing characters from the start of both substrings until a mismatch is found or the end of either substring is reached.

Substring.commonSuffix definition

(s t : Substring) : Substring

Full source

/--
Returns the longest common suffix of two substrings.

The returned substring uses the same underlying string as `s`.
-/
def commonSuffix (s t : Substring) : Substring :=
  { s with startPos := loop s.stopPos t.stopPos }
where
  /-- Returns the starting position of the common prefix, working down from `spos, tpos`. -/
  loop spos tpos :=
    if h : s.startPos < spos ∧ t.startPos < tpos then
      let spos' := s.str.prev spos
      let tpos' := t.str.prev tpos
      if s.str.get spos' == t.str.get tpos' then
        have : spos' < spos := s.str.prev_lt_of_pos spos (String.Pos.ne_zero_of_lt h.1)
        loop spos' tpos'
      else
        spos
    else
      spos
  termination_by spos.byteIdx

Longest common suffix of two substrings

Informal description

Given two substrings `s` and `t`, the function returns the longest common suffix between them. The returned substring shares the same underlying string as `s`. The function works by iteratively comparing characters from the end of both substrings towards the beginning until a mismatch is found or the start of either substring is reached.

Substring.dropPrefix? definition

(s : Substring) (pre : Substring) : Option Substring

Full source

/--
If `pre` is a prefix of `s`, returns the remainder. Returns `none` otherwise.

The substring `pre` is a prefix of `s` if there exists a `t : Substring` such that
`s.toString = pre.toString ++ t.toString`. If so, the result is the substring of `s` without the
prefix.
-/
def dropPrefix? (s : Substring) (pre : Substring) : Option Substring :=
  let t := s.commonPrefix pre
  if t.bsize = pre.bsize then
    some { s with startPos := t.stopPos }
  else
    none

Substring remainder after prefix removal

Informal description

Given a substring `s` and a prefix substring `pre`, this function returns the remaining part of `s` after removing `pre` if `pre` is indeed a prefix of `s`. If `pre` is not a prefix, it returns `none`. More formally, let `t` be the longest common prefix between `s` and `pre`. If the byte size of `t` equals the byte size of `pre`, then `pre` is a prefix of `s`, and the function returns the substring of `s` starting from the end of `t`. Otherwise, it returns `none`.

Substring.dropSuffix? definition

(s : Substring) (suff : Substring) : Option Substring

Full source

/--
If `suff` is a suffix of `s`, returns the remainder. Returns `none` otherwise.

The substring `suff` is a suffix of `s` if there exists a `t : Substring` such that
`s.toString = t.toString ++ suff.toString`. If so, the result the substring of `s` without the
suffix.
-/
def dropSuffix? (s : Substring) (suff : Substring) : Option Substring :=
  let t := s.commonSuffix suff
  if t.bsize = suff.bsize then
    some { s with stopPos := t.startPos }
  else
    none

Substring suffix removal

Informal description

Given a substring `s` and a suffix substring `suff`, this function returns `some t` where `t` is the substring of `s` without the suffix `suff` if `suff` is indeed a suffix of `s`. Otherwise, it returns `none`. More precisely, the function checks whether `suff` is a suffix of `s` by comparing their longest common suffix. If the byte size of this common suffix matches the byte size of `suff`, then `suff` is a suffix of `s`, and the function returns the substring of `s` up to the start of the common suffix. Otherwise, it returns `none`.

String.drop definition

(s : String) (n : Nat) : String

Full source

/--
Removes the specified number of characters (Unicode code points) from the start of the string.

If `n` is greater than `s.length`, returns `""`.

Examples:
 * `"red green blue".drop 4 = "green blue"`
 * `"red green blue".drop 10 = "blue"`
 * `"red green blue".drop 50 = ""`
-/
@[inline] def drop (s : String) (n : Nat) : String :=
  (s.toSubstring.drop n).toString

String with first $n$ characters removed

Informal description

Given a string $s$ and a natural number $n$, the function returns a new string obtained by removing the first $n$ characters (Unicode code points) from $s$. If $n$ is greater than the length of $s$, the result is the empty string.

String.dropRight definition

(s : String) (n : Nat) : String

Full source

/--
Removes the specified number of characters (Unicode code points) from the end of the string.

If `n` is greater than `s.length`, returns `""`.

Examples:
 * `"red green blue".dropRight 5 = "red green"`
 * `"red green blue".dropRight 11 = "red"`
 * `"red green blue".dropRight 50 = ""`
-/
@[inline] def dropRight (s : String) (n : Nat) : String :=
  (s.toSubstring.dropRight n).toString

String with last $n$ characters removed

Informal description

Given a string $s$ and a natural number $n$, the function returns a new string obtained by removing the last $n$ characters (Unicode code points) from $s$. If $n$ is greater than the length of $s$, the result is the empty string.

String.take definition

(s : String) (n : Nat) : String

Full source

/--
Creates a new string that contains the first `n` characters (Unicode code points) of `s`.

If `n` is greater than `s.length`, returns `s`.

Examples:
* `"red green blue".take 3 = "red"`
* `"red green blue".take 1 = "r"`
* `"red green blue".take 0 = ""`
* `"red green blue".take 100 = "red green blue"`
-/
@[inline] def take (s : String) (n : Nat) : String :=
  (s.toSubstring.take n).toString

First $n$ characters of a string

Informal description

Given a string $s$ and a natural number $n$, the function returns a new string consisting of the first $n$ characters (Unicode code points) of $s$. If $n$ is greater than the length of $s$, the entire string $s$ is returned.

String.takeRight definition

(s : String) (n : Nat) : String

Full source

/--
Creates a new string that contains the last `n` characters (Unicode code points) of `s`.

If `n` is greater than `s.length`, returns `s`.

Examples:
* `"red green blue".takeRight 4 = "blue"`
* `"red green blue".takeRight 1 = "e"`
* `"red green blue".takeRight 0 = ""`
* `"red green blue".takeRight 100 = "red green blue"`
-/
@[inline] def takeRight (s : String) (n : Nat) : String :=
  (s.toSubstring.takeRight n).toString

Last `n` characters of a string

Informal description

Given a string $s$ and a natural number $n$, the function returns a new string consisting of the last $n$ characters of $s$. If $n$ is greater than the length of $s$, the entire string $s$ is returned.

String.takeWhile definition

(s : String) (p : Char → Bool) : String

Full source

/--
Creates a new string that contains the longest prefix of `s` in which `p` returns `true` for all
characters.

Examples:
* `"red green blue".takeWhile (·.isLetter) = "red"`
* `"red green blue".takeWhile (· == 'r') = "r"`
* `"red green blue".takeWhile (· != 'n') = "red gree"`
* `"red green blue".takeWhile (fun _ => true) = "red green blue"`
-/
@[inline] def takeWhile (s : String) (p : Char → Bool) : String :=
  (s.toSubstring.takeWhile p).toString

Longest prefix of a string satisfying a predicate

Informal description

Given a string $s$ and a predicate $p$ on characters, the function returns the longest prefix of $s$ consisting of characters that all satisfy $p$. Examples: - For $s = \texttt{"red green blue"}$ and $p$ checking if a character is a letter, the result is $\texttt{"red"}$. - For $s = \texttt{"red green blue"}$ and $p$ checking if a character equals $\texttt{'r'}$, the result is $\texttt{"r"}$. - For $s = \texttt{"red green blue"}$ and $p$ checking if a character is not $\texttt{'n'}$, the result is $\texttt{"red gree"}$. - For $s = \texttt{"red green blue"}$ and $p$ always returning true, the result is the original string $\texttt{"red green blue"}$.

String.dropWhile definition

(s : String) (p : Char → Bool) : String

Full source

/--
Creates a new string by removing the longest prefix from `s` in which `p` returns `true` for all
characters.

Examples:
* `"red green blue".dropWhile (·.isLetter) = " green blue"`
* `"red green blue".dropWhile (· == 'r') = "ed green blue"`
* `"red green blue".dropWhile (· != 'n') = "n blue"`
* `"red green blue".dropWhile (fun _ => true) = ""`
-/
@[inline] def dropWhile (s : String) (p : Char → Bool) : String :=
  (s.toSubstring.dropWhile p).toString

Drop longest prefix satisfying predicate

Informal description

Given a string $s$ and a predicate $p$ on characters, the function `dropWhile` returns a new string obtained by removing the longest prefix of $s$ where all characters satisfy $p$. Examples: - For $s = \texttt{"red green blue"}$ and $p$ checking if a character is a letter, the result is $\texttt{" green blue"}$. - For $s = \texttt{"red green blue"}$ and $p$ checking if a character equals $\texttt{'r'}$, the result is $\texttt{"ed green blue"}$. - For $s = \texttt{"red green blue"}$ and $p$ checking if a character is not $\texttt{'n'}$, the result is $\texttt{"n blue"}$. - For $s = \texttt{"red green blue"}$ and $p$ always returning true, the result is the empty string $\texttt{""}$.

String.takeRightWhile definition

(s : String) (p : Char → Bool) : String

Full source

/--
Creates a new string that contains the longest suffix of `s` in which `p` returns `true` for all
characters.

Examples:
* `"red green blue".takeRightWhile (·.isLetter) = "blue"`
* `"red green blue".takeRightWhile (· == 'e') = "e"`
* `"red green blue".takeRightWhile (· != 'n') = " blue"`
* `"red green blue".takeRightWhile (fun _ => true) = "red green blue"`
-/
@[inline] def takeRightWhile (s : String) (p : Char → Bool) : String :=
  (s.toSubstring.takeRightWhile p).toString

Longest suffix satisfying a predicate

Informal description

Given a string $s$ and a predicate $p$ on characters, the function returns the longest suffix of $s$ where every character satisfies $p$.

String.dropRightWhile definition

(s : String) (p : Char → Bool) : String

Full source

/--
Creates a new string by removing the longest suffix from `s` in which `p` returns `true` for all
characters.

Examples:
* `"red green blue".dropRightWhile (·.isLetter) = "red green "`
* `"red green blue".dropRightWhile (· == 'e') = "red green blu"`
* `"red green blue".dropRightWhile (· != 'n') = "red green"`
* `"red green blue".dropRightWhile (fun _ => true) = ""`
-/
@[inline] def dropRightWhile (s : String) (p : Char → Bool) : String :=
  (s.toSubstring.dropRightWhile p).toString

Drop suffix while predicate holds

Informal description

Given a string $s$ and a predicate $p$ on characters, the function `dropRightWhile` returns a new string obtained by removing the longest suffix of $s$ where all characters satisfy $p$. Examples: * `"red green blue".dropRightWhile (·.isLetter) = "red green "` * `"red green blue".dropRightWhile (· == 'e') = "red green blu"` * `"red green blue".dropRightWhile (· != 'n') = "red green"` * `"red green blue".dropRightWhile (fun _ => true) = ""`

String.startsWith definition

(s pre : String) : Bool

Full source

/--
Checks whether the first string (`s`) begins with the second (`pre`).

`String.isPrefix` is a version that takes the potential prefix before the string.

Examples:
 * `"red green blue".startsWith "red" = true`
 * `"red green blue".startsWith "green" = false`
 * `"red green blue".startsWith "" = true`
 * `"red".startsWith "red" = true`
-/
@[inline] def startsWith (s pre : String) : Bool :=
  s.toSubstring.take pre.length == pre.toSubstring

String prefix check

Informal description

The function checks whether the string $s$ begins with the string $\text{pre}$. It returns $\text{true}$ if the first $\text{length}(\text{pre})$ characters of $s$ match $\text{pre}$ exactly, and $\text{false}$ otherwise. The empty string $\text{pre}$ always matches. Examples: - $\text{startsWith}(\texttt{"red green blue"}, \texttt{"red"}) = \text{true}$ - $\text{startsWith}(\texttt{"red green blue"}, \texttt{"green"}) = \text{false}$ - $\text{startsWith}(\texttt{"red green blue"}, \texttt{""}) = \text{true}$ - $\text{startsWith}(\texttt{"red"}, \texttt{"red"}) = \text{true}$

String.endsWith definition

(s post : String) : Bool

Full source

/--
Checks whether the first string (`s`) ends with the second (`post`).

Examples:
 * `"red green blue".endsWith "blue" = true`
 * `"red green blue".endsWith "green" = false`
 * `"red green blue".endsWith "" = true`
 * `"red".endsWith "red" = true`
-/
@[inline] def endsWith (s post : String) : Bool :=
  s.toSubstring.takeRight post.length == post.toSubstring

String suffix check

Informal description

The function checks whether the string $s$ ends with the string $\text{post}$. It returns $\text{true}$ if the last $\text{length}(\text{post})$ characters of $s$ match $\text{post}$ exactly, and $\text{false}$ otherwise. The empty string $\text{post}$ always matches. Examples: - $\text{endsWith}(\texttt{"red green blue"}, \texttt{"blue"}) = \text{true}$ - $\text{endsWith}(\texttt{"red green blue"}, \texttt{"green"}) = \text{false}$ - $\text{endsWith}(\texttt{"red green blue"}, \texttt{""}) = \text{true}$ - $\text{endsWith}(\texttt{"red"}, \texttt{"red"}) = \text{true}$

String.trimRight definition

(s : String) : String

Full source

/--
Removes trailing whitespace from a string.

“Whitespace” is defined as characters for which `Char.isWhitespace` returns `true`.

Examples:
* `"abc".trimRight = "abc"`
* `"   abc".trimRight = "   abc"`
* `"abc \t  ".trimRight = "abc"`
* `"  abc   ".trimRight = "  abc"`
* `"abc\ndef\n".trimRight = "abc\ndef"`
-/
@[inline] def trimRight (s : String) : String :=
  s.toSubstring.trimRight.toString

String right trim

Informal description

The function removes trailing whitespace characters from a string $s$, where whitespace is defined as any character for which `Char.isWhitespace` returns `true` (including space, tab, carriage return, and newline). The function operates by converting the string to a substring, trimming the right side, and converting back to a string.

String.trimLeft definition

(s : String) : String

Full source

/--
Removes leading whitespace from a string.

“Whitespace” is defined as characters for which `Char.isWhitespace` returns `true`.

Examples:
* `"abc".trimLeft = "abc"`
* `"   abc".trimLeft = "   abc"`
* `"abc \t  ".trimLeft = "abc \t  "`
* `"  abc   ".trimLeft = "abc   "`
* `"abc\ndef\n".trimLeft = "abc\ndef\n"`
-/
@[inline] def trimLeft (s : String) : String :=
  s.toSubstring.trimLeft.toString

String left trim function

Informal description

The function removes all leading whitespace characters from a string $s$, where whitespace is defined as any character for which `Char.isWhitespace` returns `true$ (including space, tab, carriage return, and newline). The function operates by converting the string to a substring, trimming the left side, and converting back to a string. Examples: * $\text{trimLeft}(\texttt{"abc"}) = \texttt{"abc"}$ * $\text{trimLeft}(\texttt{" abc"}) = \texttt{"abc"}$ * $\text{trimLeft}(\texttt{"abc \t "}) = \texttt{"abc \t "}$ * $\text{trimLeft}(\texttt{" abc "}) = \texttt{"abc "}$ * $\text{trimLeft}(\texttt{"abc\ndef\n"}) = \texttt{"abc\ndef\n"}$

String.trim definition

(s : String) : String

Full source

/--
Removes leading and trailing whitespace from a string.

“Whitespace” is defined as characters for which `Char.isWhitespace` returns `true`.

Examples:
* `"abc".trim = "abc"`
* `"   abc".trim = "abc"`
* `"abc \t  ".trim = "abc"`
* `"  abc   ".trim = "abc"`
* `"abc\ndef\n".trim = "abc\ndef"`
-/
@[inline] def trim (s : String) : String :=
  s.toSubstring.trim.toString

String trim function

Informal description

The function takes a string $s$ and returns a new string with all leading and trailing whitespace characters removed. Whitespace is defined as any character for which `Char.isWhitespace` returns `true`, including space, tab, carriage return, and newline.

String.nextWhile definition

(s : String) (p : Char → Bool) (i : String.Pos) : String.Pos

Full source

/--
Repeatedly increments a position in a string, as if by `String.next`, while the predicate `p`
returns `true` for the character at the position. Stops incrementing at the end of the string or
when `p` returns `false` for the current character.

Examples:
* `let s := "   a  "; s.get (s.nextWhile Char.isWhitespace 0) = 'a'`
* `let s := "a  "; s.get (s.nextWhile Char.isWhitespace 0) = 'a'`
* `let s := "ba  "; s.get (s.nextWhile Char.isWhitespace 0) = 'b'`
-/
@[inline] def nextWhile (s : String) (p : Char → Bool) (i : String.Pos) : String.Pos :=
  Substring.takeWhileAux s s.endPos p i

Advance position while predicate holds

Informal description

Given a string $s$, a predicate $p$ on characters, and a starting byte position $i$, the function `nextWhile` returns the first position $j$ such that either: 1. $j$ is the end of the string, or 2. the character at position $j$ does not satisfy $p$, and all characters from $i$ to $j$ (exclusive) satisfy $p$.

String.nextUntil definition

(s : String) (p : Char → Bool) (i : String.Pos) : String.Pos

Full source

/--
Repeatedly increments a position in a string, as if by `String.next`, while the predicate `p`
returns `false` for the character at the position. Stops incrementing at the end of the string or
when `p` returns `true` for the current character.

Examples:
* `let s := "   a  "; s.get (s.nextUntil Char.isWhitespace 0) = ' '`
* `let s := "   a  "; s.get (s.nextUntil Char.isLetter 0) = 'a'`
* `let s := "a  "; s.get (s.nextUntil Char.isWhitespace 0) = ' '`
-/
@[inline] def nextUntil (s : String) (p : Char → Bool) (i : String.Pos) : String.Pos :=
  nextWhile s (fun c => !p c) i

Advance position until predicate holds

Informal description

Given a string $s$, a predicate $p$ on characters, and a starting byte position $i$, the function `nextUntil` returns the first position $j$ such that either: 1. $j$ is the end of the string, or 2. the character at position $j$ satisfies $p$, and all characters from $i$ to $j$ (exclusive) do not satisfy $p$. **Examples:** * For $s = \text{" a "}$ and $p = \text{Char.isWhitespace}$, starting at position $0$, the function returns the position of the first non-whitespace character (here, the position of 'a'). * For $s = \text{" a "}$ and $p = \text{Char.isLetter}$, starting at position $0$, the function returns the position of the first letter (here, the position of 'a'). * For $s = \text{"a "}$ and $p = \text{Char.isWhitespace}$, starting at position $0$, the function returns the position of the first whitespace character (here, the position of ' ').

String.toUpper definition

(s : String) : String

Full source

/--
Replaces each character in `s` with the result of applying `Char.toUpper` to it.

`Char.toUpper` has no effect on characters outside of the range `'a'`–`'z'`.

Examples:
* `"orange".toUpper = "ORANGE"`
* `"abc123".toUpper = "ABC123"`
-/
@[inline] def toUpper (s : String) : String :=
  s.map Char.toUpper

Uppercase conversion of a string

Informal description

The function converts each character in the string $s$ to its uppercase version using `Char.toUpper`. Characters outside the range 'a'–'z' remain unchanged. For example, applying this function to the string "orange" yields "ORANGE", and applying it to "abc123" yields "ABC123".

String.toLower definition

(s : String) : String

Full source

/--
Replaces each character in `s` with the result of applying `Char.toLower` to it.

`Char.toLower` has no effect on characters outside of the range `'A'`–`'Z'`.

Examples:
* `"ORANGE".toLower = "orange"`
* `"Orange".toLower = "orange"`
* `"ABc123".toLower = "abc123"`
-/
@[inline] def toLower (s : String) : String :=
  s.map Char.toLower

String lowercase conversion

Informal description

The function converts each uppercase ASCII letter in the string $s$ to its corresponding lowercase letter, leaving all other characters unchanged. Specifically, for each character $c$ in $s$, it applies `Char.toLower` to $c$. **Examples:** * $\text{toLower}\ \text{"ORANGE"} = \text{"orange"}$ * $\text{toLower}\ \text{"Orange"} = \text{"orange"}$ * $\text{toLower}\ \text{"ABc123"} = \text{"abc123"}$

String.capitalize definition

(s : String)

Full source

/--
Replaces the first character in `s` with the result of applying `Char.toUpper` to it. Returns the
empty string if the string is empty.

`Char.toUpper` has no effect on characters outside of the range `'a'`–`'z'`.

Examples:
* `"orange".capitalize = "Orange"`
* `"ORANGE".capitalize = "ORANGE"`
* `"".capitalize = ""`
-/
@[inline] def capitalize (s : String) :=
  s.set 0 <| s.get 0 |>.toUpper

String capitalization

Informal description

The function `capitalize` takes a string `s` and returns a new string where the first character is converted to uppercase using `Char.toUpper`, while leaving all other characters unchanged. If the input string is empty, the result is also an empty string. **Examples:** * $\text{capitalize}\ \text{"orange"} = \text{"Orange"}$ * $\text{capitalize}\ \text{"ORANGE"} = \text{"ORANGE"}$ * $\text{capitalize}\ \text{""} = \text{""}$

String.decapitalize definition

(s : String)

Full source

/--
Replaces the first character in `s` with the result of applying `Char.toLower` to it. Returns the
empty string if the string is empty.

`Char.toLower` has no effect on characters outside of the range `'A'`–`'Z'`.

Examples:
* `"Orange".decapitalize = "orange"`
* `"ORANGE".decapitalize = "oRANGE"`
* `"".decapitalize = ""`
-/
@[inline] def decapitalize (s : String) :=
  s.set 0 <| s.get 0 |>.toLower

String decapitalization

Informal description

The function replaces the first character of the string `s` with its lowercase version (using `Char.toLower`). If the string is empty, it returns the empty string. The operation has no effect on characters outside the ASCII uppercase range ('A'–'Z'). **Examples:** * $\text{decapitalize}\ \text{"Orange"} = \text{"orange"}$ * $\text{decapitalize}\ \text{"ORANGE"} = \text{"oRANGE"}$ * $\text{decapitalize}\ \text{""} = \text{""}$

String.dropPrefix? definition

(s : String) (pre : String) : Option Substring

Full source

/--
If `pre` is a prefix of `s`, returns the remainder. Returns `none` otherwise.

The string `pre` is a prefix of `s` if there exists a `t : String` such that `s = pre ++ t`. If so,
the result is `some t`.

Use `String.stripPrefix` to return the string unchanged when `pre` is not a prefix.

Examples:
 * `"red green blue".dropPrefix? "red " = some "green blue"`
 * `"red green blue".dropPrefix? "reed " = none`
 * `"red green blue".dropPrefix? "" = some "red green blue"`
-/
def dropPrefix? (s : String) (pre : String) : Option Substring :=
  s.toSubstring.dropPrefix? pre.toSubstring

String remainder after prefix removal

Informal description

Given a string $s$ and a prefix string $\text{pre}$, this function returns the remaining part of $s$ after removing $\text{pre}$ if $\text{pre}$ is indeed a prefix of $s$ (i.e., there exists a string $t$ such that $s = \text{pre} ++ t$). If $\text{pre}$ is not a prefix of $s$, the function returns `none`. **Examples:** * $\text{dropPrefix?}\ \text{"red green blue"}\ \text{"red "} = \text{some "green blue"}$ * $\text{dropPrefix?}\ \text{"red green blue"}\ \text{"reed "} = \text{none}$ * $\text{dropPrefix?}\ \text{"red green blue"}\ \text{""} = \text{some "red green blue"}$

String.dropSuffix? definition

(s : String) (suff : String) : Option Substring

Full source

/--
If `suff` is a suffix of `s`, returns the remainder. Returns `none` otherwise.

The string `suff` is a suffix of `s` if there exists a `t : String` such that `s = t ++ suff`. If so,
the result is `some t`.

Use `String.stripSuffix` to return the string unchanged when `suff` is not a suffix.

Examples:
 * `"red green blue".dropSuffix? " blue" = some "red green"`
 * `"red green blue".dropSuffix? " blu " = none`
 * `"red green blue".dropSuffix? "" = some "red green blue"`
-/
def dropSuffix? (s : String) (suff : String) : Option Substring :=
  s.toSubstring.dropSuffix? suff.toSubstring

String suffix removal

Informal description

Given a string $s$ and a suffix string $\text{suff}$, this function returns $\text{some}\ t$ where $t$ is the substring of $s$ without the suffix $\text{suff}$ if $\text{suff}$ is indeed a suffix of $s$. Otherwise, it returns $\text{none}$. More precisely, the function checks whether $\text{suff}$ is a suffix of $s$ by comparing their longest common suffix. If the byte size of this common suffix matches the byte size of $\text{suff}$, then $\text{suff}$ is a suffix of $s$, and the function returns the substring of $s$ up to the start of the common suffix. Otherwise, it returns $\text{none}$.

String.stripPrefix definition

(s : String) (pre : String) : String

Full source

/--
If `pre` is a prefix of `s`, returns the remainder. Returns `s` unmodified otherwise.

The string `pre` is a prefix of `s` if there exists a `t : String` such that `s = pre ++ t`. If so,
the result is `t`. Otherwise, it is `s`.

Use `String.dropPrefix?` to return `none` when `pre` is not a prefix.

Examples:
 * `"red green blue".stripPrefix "red " = "green blue"`
 * `"red green blue".stripPrefix "reed " = "red green blue"`
 * `"red green blue".stripPrefix "" = "red green blue"`
-/
def stripPrefix (s : String) (pre : String) : String :=
  s.dropPrefix? pre |>.map Substring.toString |>.getD s

String remainder after prefix removal (or original string)

Informal description

Given a string $s$ and a prefix string $\text{pre}$, this function returns the remainder of $s$ after removing $\text{pre}$ if $\text{pre}$ is a prefix of $s$ (i.e., there exists a string $t$ such that $s = \text{pre} ++ t$). Otherwise, it returns $s$ unchanged. **Examples:** * $\text{stripPrefix}\ \text{"red green blue"}\ \text{"red "} = \text{"green blue"}$ * $\text{stripPrefix}\ \text{"red green blue"}\ \text{"reed "} = \text{"red green blue"}$ * $\text{stripPrefix}\ \text{"red green blue"}\ \text{""} = \text{"red green blue"}$

String.stripSuffix definition

(s : String) (suff : String) : String

Full source

/--
If `suff` is a suffix of `s`, returns the remainder. Returns `s` unmodified otherwise.

The string `suff` is a suffix of `s` if there exists a `t : String` such that `s = t ++ suff`. If so,
the result is `t`. Otherwise, it is `s`.

Use `String.dropSuffix?` to return `none` when `suff` is not a suffix.

Examples:
 * `"red green blue".stripSuffix " blue" = "red green"`
 * `"red green blue".stripSuffix " blu " = "red green blue"`
 * `"red green blue".stripSuffix "" = "red green blue"`
-/
def stripSuffix (s : String) (suff : String) : String :=
  s.dropSuffix? suff |>.map Substring.toString |>.getD s

String suffix stripping

Informal description

Given a string $s$ and a suffix string $\text{suff}$, this function returns the string obtained by removing $\text{suff}$ from the end of $s$ if $\text{suff}$ is indeed a suffix of $s$. Otherwise, it returns $s$ unchanged. More precisely, if there exists a string $t$ such that $s = t + \text{suff}$, then the function returns $t$. If no such $t$ exists, it returns $s$.

Char.toString definition

(c : Char) : String

Full source

/--
Constructs a singleton string that contains only the provided character.

Examples:
 * `'L'.toString = "L"`
 * `'"'.toString = "\""`
-/
@[inline] protected def toString (c : Char) : String :=
  String.singleton c

String representation of a character

Informal description

The function maps a Unicode character $c$ to a string consisting solely of that character. The resulting string has length 1. Examples: - $\texttt{Char.toString 'L'} = \texttt{"L"}$ - $\texttt{Char.toString '"'} = \texttt{"\""}$

Char.length_toString theorem

(c : Char) : c.toString.length = 1

Full source

@[simp] theorem length_toString (c : Char) : c.toString.length = 1 := rfl

String Length of Character Representation is One

Informal description

For any Unicode character $c$, the length of the string representation of $c$ is equal to 1, i.e., $\text{length}(c.\text{toString}) = 1$.

String.ext theorem

{s₁ s₂ : String} (h : s₁.data = s₂.data) : s₁ = s₂

Full source

theorem ext {s₁ s₂ : String} (h : s₁.data = s₂.data) : s₁ = s₂ :=
  show ⟨s₁.data⟩ = (⟨s₂.data⟩ : String) from h ▸ rfl

String Equality via Character List Equality

Informal description

For any two strings $s_1$ and $s_2$, if their underlying character lists are equal (i.e., $s_1.\text{data} = s_2.\text{data}$), then $s_1 = s_2$.

String.ext_iff theorem

{s₁ s₂ : String} : s₁ = s₂ ↔ s₁.data = s₂.data

Full source

theorem ext_iff {s₁ s₂ : String} : s₁ = s₂ ↔ s₁.data = s₂.data := ⟨fun h => h ▸ rfl, ext⟩

String Equality via Character List Equality (iff version)

Informal description

For any two strings $s_1$ and $s_2$, they are equal if and only if their underlying lists of characters are equal, i.e., $s_1 = s_2 \leftrightarrow s_1.\text{data} = s_2.\text{data}$.

String.default_eq theorem

: default = ""

Full source

@[simp] theorem default_eq : default = "" := rfl

Default String is Empty

Informal description

The default string value is equal to the empty string, i.e., $\text{default} = ""$.

String.length_mk theorem

(s : List Char) : (String.mk s).length = s.length

Full source

@[simp] theorem length_mk (s : List Char) : (String.mk s).length = s.length := rfl

String Construction Preserves Length: $\text{length}(\text{mk}(s)) = \text{length}(s)$

Informal description

For any list of characters $s$, the length of the string constructed from $s$ is equal to the length of $s$, i.e., $\text{length}(\text{String.mk}(s)) = \text{length}(s)$.

String.length_empty theorem

: "".length = 0

Full source

@[simp] theorem length_empty : "".length = 0 := rfl

Empty String Length: $\text{length}("") = 0$

Informal description

The length of the empty string is zero, i.e., $\text{length}("") = 0$.

String.length_singleton theorem

(c : Char) : (String.singleton c).length = 1

Full source

@[simp] theorem length_singleton (c : Char) : (String.singleton c).length = 1 := rfl

Length of Singleton String is One

Informal description

For any Unicode character $c$, the length of the string consisting solely of $c$ is $1$, i.e., $\text{length}(\texttt{String.singleton } c) = 1$.

String.length_push theorem

(c : Char) : (String.push s c).length = s.length + 1

Full source

@[simp] theorem length_push (c : Char) : (String.push s c).length = s.length + 1 := by
  rw [push, length_mk, List.length_append, List.length_singleton, Nat.succ.injEq]
  rfl

Length of String After Push: $\text{length}(s \cdot c) = \text{length}(s) + 1$

Informal description

For any string $s$ and character $c$, the length of the string obtained by appending $c$ to $s$ is equal to the length of $s$ plus one, i.e., $\text{length}(\text{push}(s, c)) = \text{length}(s) + 1$.

String.length_pushn theorem

(c : Char) (n : Nat) : (pushn s c n).length = s.length + n

Full source

@[simp] theorem length_pushn (c : Char) (n : Nat) : (pushn s c n).length = s.length + n := by
  unfold pushn; induction n <;> simp [Nat.repeat, Nat.add_assoc, *]

Length of String After Repeated Push: $\text{length}(s \cdot c^n) = \text{length}(s) + n$

Informal description

For any string $s$, character $c$, and natural number $n$, the length of the string obtained by appending $n$ copies of $c$ to $s$ is equal to the length of $s$ plus $n$, i.e., $\text{length}(\text{pushn}(s, c, n)) = \text{length}(s) + n$.

String.length_append theorem

(s t : String) : (s ++ t).length = s.length + t.length

Full source

@[simp] theorem length_append (s t : String) : (s ++ t).length = s.length + t.length := by
  simp only [length, append, List.length_append]

Length of Concatenated Strings: $\text{length}(s \mathbin{+\kern-1.5ex+} t) = \text{length}(s) + \text{length}(t)$

Informal description

For any two strings $s$ and $t$, the length of their concatenation $s \mathbin{+\kern-1.5ex+} t$ is equal to the sum of their individual lengths, i.e., $$\text{length}(s \mathbin{+\kern-1.5ex+} t) = \text{length}(s) + \text{length}(t).$$

String.data_push theorem

(s : String) (c : Char) : (s.push c).data = s.data ++ [c]

Full source

@[simp] theorem data_push (s : String) (c : Char) : (s.push c).data = s.data ++ [c] := rfl

String Push Character Data Equals Append

Informal description

For any string $s$ and character $c$, the underlying list of characters of the string obtained by appending $c$ to $s$ is equal to the concatenation of the original list of characters of $s$ with the singleton list containing $c$. In other words, $(s\text{.push } c)\text{.data} = s\text{.data} \mathbin{+\kern-1.5ex+} [c]$.

String.data_append theorem

(s t : String) : (s ++ t).data = s.data ++ t.data

Full source

@[simp] theorem data_append (s t : String) : (s ++ t).data = s.data ++ t.data := rfl

String Concatenation Preserves Underlying Character Lists

Informal description

For any strings $s$ and $t$, the underlying list of characters of the concatenated string $s \mathbin{+\kern-1.5ex+} t$ is equal to the concatenation of the underlying lists of characters of $s$ and $t$. In other words, $(s \mathbin{+\kern-1.5ex+} t)\text{.data} = s\text{.data} \mathbin{+\kern-1.5ex+} t\text{.data}$.

String.lt_iff theorem

{s t : String} : s < t ↔ s.data < t.data

Full source

theorem lt_iff {s t : String} : s < t ↔ s.data < t.data := .rfl

Lexicographic Order on Strings via Character Lists

Informal description

For any two strings $s$ and $t$, the lexicographic order $s < t$ holds if and only if the underlying list of characters of $s$ is lexicographically less than the underlying list of characters of $t$.

String.Pos.byteIdx_zero theorem

: (0 : Pos).byteIdx = 0

Full source

@[simp] theorem byteIdx_zero : (0 : Pos).byteIdx = 0 := rfl

Byte Index of Zero String Position is Zero

Informal description

The byte index of the zero position in a UTF-8 string is equal to 0, i.e., $\text{byteIdx}(0) = 0$.

String.Pos.byteIdx_mk theorem

(n : Nat) : byteIdx ⟨n⟩ = n

Full source

theorem byteIdx_mk (n : Nat) : byteIdx ⟨n⟩ = n := rfl

Byte Index of Constructed String Position Equals Input Natural Number

Informal description

For any natural number $n$, the byte index of the string position constructed from $n$ is equal to $n$ itself, i.e., $\text{byteIdx}(\langle n \rangle) = n$.

String.Pos.mk_zero theorem

: ⟨0⟩ = (0 : Pos)

Full source

@[simp] theorem mk_zero : ⟨0⟩ = (0 : Pos) := rfl

Zero Byte Position Construction Identity

Informal description

The constructed byte position from the natural number $0$ is equal to the zero byte position in a UTF-8 encoded string, i.e., $\langle 0 \rangle = 0$.

String.Pos.mk_byteIdx theorem

(p : Pos) : ⟨p.byteIdx⟩ = p

Full source

@[simp] theorem mk_byteIdx (p : Pos) : ⟨p.byteIdx⟩ = p := rfl

Byte Position Construction from Index is Identity

Informal description

For any byte position $p$ in a UTF-8 encoded string, the constructed position from its byte index is equal to $p$ itself, i.e., $\langle p.\text{byteIdx} \rangle = p$.

String.Pos.ext theorem

{i₁ i₂ : Pos} (h : i₁.byteIdx = i₂.byteIdx) : i₁ = i₂

Full source

theorem ext {i₁ i₂ : Pos} (h : i₁.byteIdx = i₂.byteIdx) : i₁ = i₂ :=
  show ⟨i₁.byteIdx⟩ = (⟨i₂.byteIdx⟩ : Pos) from h ▸ rfl

Equality of String Positions via Byte Indices

Informal description

For any two byte positions $i_1$ and $i_2$ in a UTF-8 encoded string, if their underlying byte indices are equal ($i_1.\text{byteIdx} = i_2.\text{byteIdx}$), then the positions themselves are equal ($i_1 = i_2$).

String.Pos.ext_iff theorem

{i₁ i₂ : Pos} : i₁ = i₂ ↔ i₁.byteIdx = i₂.byteIdx

Full source

theorem ext_iff {i₁ i₂ : Pos} : i₁ = i₂ ↔ i₁.byteIdx = i₂.byteIdx := ⟨fun h => h ▸ rfl, ext⟩

Equivalence of String Position Equality and Byte Index Equality

Informal description

For any two byte positions $i_1$ and $i_2$ in a UTF-8 encoded string, $i_1 = i_2$ if and only if their underlying byte indices are equal, i.e., $i_1.\text{byteIdx} = i_2.\text{byteIdx}$.

String.Pos.add_byteIdx theorem

(p₁ p₂ : Pos) : (p₁ + p₂).byteIdx = p₁.byteIdx + p₂.byteIdx

Full source

@[simp] theorem add_byteIdx (p₁ p₂ : Pos) : (p₁ + p₂).byteIdx = p₁.byteIdx + p₂.byteIdx := rfl

Additivity of Byte Indices in UTF-8 String Position Addition

Informal description

For any two byte positions $p_1$ and $p_2$ in a UTF-8 encoded string, the byte index of their sum $p_1 + p_2$ is equal to the sum of their individual byte indices, i.e., $(p_1 + p_2).\text{byteIdx} = p_1.\text{byteIdx} + p_2.\text{byteIdx}$.

String.Pos.add_eq theorem

(p₁ p₂ : Pos) : p₁ + p₂ = ⟨p₁.byteIdx + p₂.byteIdx⟩

Full source

theorem add_eq (p₁ p₂ : Pos) : p₁ + p₂ = ⟨p₁.byteIdx + p₂.byteIdx⟩ := rfl

Sum of UTF-8 String Positions Equals Sum of Byte Indices

Informal description

For any two byte positions $p_1$ and $p_2$ in a UTF-8 encoded string, the sum $p_1 + p_2$ is equal to the byte position constructed from the sum of their underlying byte indices, i.e., $p_1 + p_2 = \langle p_1.\text{byteIdx} + p_2.\text{byteIdx} \rangle$.

String.Pos.sub_byteIdx theorem

(p₁ p₂ : Pos) : (p₁ - p₂).byteIdx = p₁.byteIdx - p₂.byteIdx

Full source

@[simp] theorem sub_byteIdx (p₁ p₂ : Pos) : (p₁ - p₂).byteIdx = p₁.byteIdx - p₂.byteIdx := rfl

Byte Index Subtraction for UTF-8 String Positions

Informal description

For any two byte positions $p_1$ and $p_2$ in a UTF-8 encoded string, the byte index of their difference $p_1 - p_2$ equals the difference of their byte indices, i.e., $(p_1 - p_2).\text{byteIdx} = p_1.\text{byteIdx} - p_2.\text{byteIdx}$.

String.Pos.sub_eq theorem

(p₁ p₂ : Pos) : p₁ - p₂ = ⟨p₁.byteIdx - p₂.byteIdx⟩

Full source

theorem sub_eq (p₁ p₂ : Pos) : p₁ - p₂ = ⟨p₁.byteIdx - p₂.byteIdx⟩ := rfl

Subtraction of UTF-8 String Positions Equals Difference of Byte Indices

Informal description

For any two byte positions $p_1$ and $p_2$ in a UTF-8 encoded string, the subtraction $p_1 - p_2$ is equal to the byte position constructed from the difference of their underlying byte indices, i.e., $p_1 - p_2 = \langle p_1.\text{byteIdx} - p_2.\text{byteIdx} \rangle$.

String.Pos.addChar_byteIdx theorem

(p : Pos) (c : Char) : (p + c).byteIdx = p.byteIdx + c.utf8Size

Full source

@[simp] theorem addChar_byteIdx (p : Pos) (c : Char) : (p + c).byteIdx = p.byteIdx + c.utf8Size := rfl

Byte Index Addition Formula for UTF-8 String Positions

Informal description

For any byte position $p$ in a UTF-8 encoded string and any Unicode character $c$, the byte index of the position $p + c$ is equal to the sum of the byte index of $p$ and the UTF-8 encoding size of $c$, i.e., $(p + c).\text{byteIdx} = p.\text{byteIdx} + c.\text{utf8Size}$.

String.Pos.addChar_eq theorem

(p : Pos) (c : Char) : p + c = ⟨p.byteIdx + c.utf8Size⟩

Full source

theorem addChar_eq (p : Pos) (c : Char) : p + c = ⟨p.byteIdx + c.utf8Size⟩ := rfl

UTF-8 Position Addition Formula: $p + c = \langle p.\text{byteIdx} + c.\text{utf8Size} \rangle$

Informal description

For any byte position $p$ in a UTF-8 encoded string and any Unicode character $c$, the position obtained by adding $c$ to $p$ is equal to the byte position constructed from the sum of $p$'s byte index and the UTF-8 encoding size of $c$, i.e., $p + c = \langle p.\text{byteIdx} + c.\text{utf8Size} \rangle$.

String.Pos.zero_addChar_byteIdx theorem

(c : Char) : ((0 : Pos) + c).byteIdx = c.utf8Size

Full source

theorem zero_addChar_byteIdx (c : Char) : ((0 : Pos) + c).byteIdx = c.utf8Size := by
  simp only [addChar_byteIdx, byteIdx_zero, Nat.zero_add]

Byte Index of Zero Position Plus Character Equals Character's UTF-8 Size

Informal description

For any Unicode character $c$, the byte index of the position obtained by adding $c$ to the zero position in a UTF-8 string equals the UTF-8 encoding size of $c$, i.e., $(0 + c).\text{byteIdx} = c.\text{utf8Size}$.

String.Pos.zero_addChar_eq theorem

(c : Char) : (0 : Pos) + c = ⟨c.utf8Size⟩

Full source

theorem zero_addChar_eq (c : Char) : (0 : Pos) + c = ⟨c.utf8Size⟩ := by rw [← zero_addChar_byteIdx]

UTF-8 Position Addition from Zero: $0 + c = \langle c.\text{utf8Size} \rangle$

Informal description

For any Unicode character $c$, the byte position obtained by adding $c$ to the zero position in a UTF-8 string is equal to the byte position constructed from the UTF-8 encoding size of $c$, i.e., $0 + c = \langle c.\text{utf8Size} \rangle$.

String.Pos.addChar_right_comm theorem

(p : Pos) (c₁ c₂ : Char) : p + c₁ + c₂ = p + c₂ + c₁

Full source

theorem addChar_right_comm (p : Pos) (c₁ c₂ : Char) : p + c₁ + c₂ = p + c₂ + c₁ := by
  apply ext
  repeat rw [pos_add_char]
  apply Nat.add_right_comm

Right Commutativity of Character Addition in UTF-8 String Positions: $(p + c_1) + c_2 = (p + c_2) + c_1$

Informal description

For any byte position $p$ in a UTF-8 encoded string and any Unicode characters $c_1$ and $c_2$, the following equality holds: $$(p + c_1) + c_2 = (p + c_2) + c_1$$

String.Pos.ne_of_lt theorem

{i₁ i₂ : Pos} (h : i₁ < i₂) : i₁ ≠ i₂

Full source

theorem ne_of_lt {i₁ i₂ : Pos} (h : i₁ < i₂) : i₁ ≠ i₂ := mt ext_iff.1 (Nat.ne_of_lt h)

Strict Order Implies Inequality for String Positions

Informal description

For any two byte positions $i_1$ and $i_2$ in a UTF-8 encoded string, if $i_1 < i_2$, then $i_1 \neq i_2$.

String.Pos.ne_of_gt theorem

{i₁ i₂ : Pos} (h : i₁ < i₂) : i₂ ≠ i₁

Full source

theorem ne_of_gt {i₁ i₂ : Pos} (h : i₁ < i₂) : i₂ ≠ i₁ := (ne_of_lt h).symm

Inequality from Strict Order for String Positions (Reverse Version)

Informal description

For any two byte positions $i_1$ and $i_2$ in a UTF-8 encoded string, if $i_1 < i_2$, then $i_2 \neq i_1$.

String.Pos.byteIdx_addString theorem

(p : Pos) (s : String) : (p + s).byteIdx = p.byteIdx + s.utf8ByteSize

Full source

@[simp] theorem byteIdx_addString (p : Pos) (s : String) :
    (p + s).byteIdx = p.byteIdx + s.utf8ByteSize := rfl

Byte Index Addition Property for String Positions: $(p + s).\text{byteIdx} = p.\text{byteIdx} + s.\text{utf8ByteSize}$

Informal description

For any byte position $p$ in a UTF-8 encoded string and any string $s$, the byte index of the position resulting from adding $s$ to $p$ is equal to the sum of the byte index of $p$ and the UTF-8 byte size of $s$, i.e., $(p + s).\text{byteIdx} = p.\text{byteIdx} + s.\text{utf8ByteSize}$.

String.Pos.addString_byteIdx abbrev

Full source

@[deprecated byteIdx_addString (since := "2025-03-18")]
abbrev addString_byteIdx := @byteIdx_addString

Byte Index Addition Property for String Positions: $(p + s).\text{byteIdx} = p.\text{byteIdx} + s.\text{utf8ByteSize}$

Informal description

For any byte position $p$ in a UTF-8 encoded string and any string $s$, the byte index of the position resulting from adding $s$ to $p$ is equal to the sum of the byte index of $p$ and the UTF-8 byte size of $s$, i.e., $(p + s).\text{byteIdx} = p.\text{byteIdx} + s.\text{utf8ByteSize}$.

String.Pos.addString_eq theorem

(p : Pos) (s : String) : p + s = ⟨p.byteIdx + s.utf8ByteSize⟩

Full source

theorem addString_eq (p : Pos) (s : String) : p + s = ⟨p.byteIdx + s.utf8ByteSize⟩ := rfl

UTF-8 String Position Addition Formula

Informal description

For any byte position $p$ in a UTF-8 encoded string and any string $s$, the result of adding $s$ to $p$ is a new byte position whose byte index equals the sum of $p$'s byte index and the UTF-8 byte size of $s$. In other words, $p + s = \langle p.\text{byteIdx} + s.\text{utf8ByteSize} \rangle$.

String.Pos.byteIdx_zero_addString theorem

(s : String) : ((0 : Pos) + s).byteIdx = s.utf8ByteSize

Full source

theorem byteIdx_zero_addString (s : String) : ((0 : Pos) + s).byteIdx = s.utf8ByteSize := by
  simp only [byteIdx_addString, byteIdx_zero, Nat.zero_add]

Byte Index of Zero Position Plus String Equals String's UTF-8 Byte Size

Informal description

For any string $s$, the byte index of the position obtained by adding $s$ to the zero position (0 : Pos) equals the UTF-8 byte size of $s$, i.e., $(0 + s).\text{byteIdx} = s.\text{utf8ByteSize}$.

String.Pos.zero_addString_byteIdx abbrev

Full source

@[deprecated byteIdx_zero_addString (since := "2025-03-18")]
abbrev zero_addString_byteIdx := @byteIdx_zero_addString

Byte Index of Zero Position Plus String Equals String's UTF-8 Byte Size

Informal description

For any string $s$, the byte index of the position obtained by adding $s$ to the zero position $(0 : \text{Pos})$ equals the UTF-8 byte size of $s$, i.e., $(0 + s).\text{byteIdx} = s.\text{utf8ByteSize}$.

String.Pos.zero_addString_eq theorem

(s : String) : (0 : Pos) + s = ⟨s.utf8ByteSize⟩

Full source

theorem zero_addString_eq (s : String) : (0 : Pos) + s = ⟨s.utf8ByteSize⟩ := by
  rw [← byteIdx_zero_addString]

Zero Position Plus String Equals Byte Size Position

Informal description

For any string $s$, the result of adding $s$ to the zero position $(0 : \text{Pos})$ is equal to the string position constructed from the UTF-8 byte size of $s$, i.e., $0 + s = \langle s.\text{utf8ByteSize} \rangle$.

String.Pos.le_iff theorem

{i₁ i₂ : Pos} : i₁ ≤ i₂ ↔ i₁.byteIdx ≤ i₂.byteIdx

Full source

theorem le_iff {i₁ i₂ : Pos} : i₁ ≤ i₂ ↔ i₁.byteIdx ≤ i₂.byteIdx := .rfl

String Position Ordering Correspondence: $i₁ \leq i₂ \leftrightarrow i₁.\text{byteIdx} \leq i₂.\text{byteIdx}$

Informal description

For any two string positions $i₁$ and $i₂$, the relation $i₁ \leq i₂$ holds if and only if their underlying byte indices satisfy $i₁.\text{byteIdx} \leq i₂.\text{byteIdx}$.

String.Pos.mk_le_mk theorem

{i₁ i₂ : Nat} : Pos.mk i₁ ≤ Pos.mk i₂ ↔ i₁ ≤ i₂

Full source

@[simp] theorem mk_le_mk {i₁ i₂ : Nat} : Pos.mkPos.mk i₁ ≤ Pos.mk i₂ ↔ i₁ ≤ i₂ := .rfl

String Position Ordering Correspondence: $\text{Pos.mk}(i₁) \leq \text{Pos.mk}(i₂) \leftrightarrow i₁ \leq i₂$

Informal description

For any natural numbers $i₁$ and $i₂$, the string position constructed from $i₁$ is less than or equal to the string position constructed from $i₂$ if and only if $i₁ \leq i₂$.

String.Pos.lt_iff theorem

{i₁ i₂ : Pos} : i₁ < i₂ ↔ i₁.byteIdx < i₂.byteIdx

Full source

theorem lt_iff {i₁ i₂ : Pos} : i₁ < i₂ ↔ i₁.byteIdx < i₂.byteIdx := .rfl

String Position Strict Ordering Correspondence: $i₁ < i₂ \leftrightarrow i₁.\text{byteIdx} < i₂.\text{byteIdx}$

Informal description

For any two string positions $i₁$ and $i₂$, the relation $i₁ < i₂$ holds if and only if their underlying byte indices satisfy $i₁.\text{byteIdx} < i₂.\text{byteIdx}$.

String.Pos.mk_lt_mk theorem

{i₁ i₂ : Nat} : Pos.mk i₁ < Pos.mk i₂ ↔ i₁ < i₂

Full source

@[simp] theorem mk_lt_mk {i₁ i₂ : Nat} : Pos.mkPos.mk i₁ < Pos.mk i₂ ↔ i₁ < i₂ := .rfl

String Position Strict Ordering Correspondence: $\text{Pos.mk}(i₁) < \text{Pos.mk}(i₂) \leftrightarrow i₁ < i₂$

Informal description

For any natural numbers $i₁$ and $i₂$, the string position constructed from $i₁$ is strictly less than the string position constructed from $i₂$ if and only if $i₁ < i₂$.

String.get!_eq_get theorem

(s : String) (p : Pos) : get! s p = get s p

Full source

@[simp] theorem get!_eq_get (s : String) (p : Pos) : get! s p = get s p := rfl

Equality of Safe and Unsafe Character Retrieval in Strings

Informal description

For any string $s$ and byte position $p$ in $s$, the unsafe character retrieval function `get!` returns the same character as the safe retrieval function `get`, i.e., $\text{get!}(s, p) = \text{get}(s, p)$.

String.lt_next' theorem

(s : String) (p : Pos) : p < next s p

Full source

theorem lt_next' (s : String) (p : Pos) : p < next s p := lt_next ..

Strict Monotonicity of Next Character Boundary in Strings

Informal description

For any string $s$ and any byte position $p$ in $s$, the position $p$ is strictly less than the next character boundary position $\text{next}(s, p)$.

String.prev_zero theorem

(s : String) : prev s 0 = 0

Full source

@[simp] theorem prev_zero (s : String) : prev s 0 = 0 := rfl

Initial Position Invariance under `prev` Operation

Informal description

For any string $s$, the function `prev` applied to the initial byte position $0$ returns $0$, i.e., $\text{prev}(s, 0) = 0$.

String.get'_eq theorem

(s : String) (p : Pos) (h) : get' s p h = get s p

Full source

@[simp] theorem get'_eq (s : String) (p : Pos) (h) : get' s p h = get s p := rfl

Equality of Character Retrieval Functions at Valid Positions

Informal description

For any string $s$ and byte position $p$ in $s$, if $h$ is a proof that $p$ is not at the end of $s$, then the character retrieved at position $p$ using `get'` is equal to the character retrieved using `get`, i.e., $\texttt{get'}(s, p, h) = \texttt{get}(s, p)$.

String.next'_eq theorem

(s : String) (p : Pos) (h) : next' s p h = next s p

Full source

@[simp] theorem next'_eq (s : String) (p : Pos) (h) : next' s p h = next s p := rfl

Equivalence of `next'` and `next` for valid string positions

Informal description

For any string $s$ and byte position $p$ in $s$ with proof $h$ that $p$ is not at the end of $s$, the function `next'` applied to $s$, $p$, and $h$ equals the function `next` applied to $s$ and $p$, i.e., $\texttt{next'}(s, p, h) = \texttt{next}(s, p)$.

String.singleton_eq theorem

(c : Char) : singleton c = ⟨[c]⟩

Full source

theorem singleton_eq (c : Char) : singleton c = ⟨[c]⟩ := rfl

Singleton String Construction from Character

Informal description

For any Unicode character $c$, the string consisting solely of $c$ is equal to the string constructed from the singleton list $[c]$, i.e., $\texttt{singleton}(c) = \langle [c] \rangle$.

String.data_singleton theorem

(c : Char) : (singleton c).data = [c]

Full source

@[simp] theorem data_singleton (c : Char) : (singleton c).data = [c] := rfl

Underlying List of a Single-Character String is Singleton List

Informal description

For any Unicode character $c$, the underlying list of characters in the string consisting solely of $c$ is the singleton list $[c]$. That is, $\texttt{singleton}(c).\texttt{data} = [c]$.

String.append_empty theorem

(s : String) : s ++ "" = s

Full source

@[simp] theorem append_empty (s : String) : s ++ "" = s := ext (List.append_nil _)

Right Identity Property of String Concatenation

Informal description

For any string $s$, concatenating $s$ with the empty string `""` yields $s$ itself, i.e., $s \mathbin{+\kern-0.5em+} \text{""} = s$.

String.empty_append theorem

(s : String) : "" ++ s = s

Full source

@[simp] theorem empty_append (s : String) : "" ++ s = s := rfl

Left Identity of String Concatenation: `"" ++ s = s`

Informal description

For any string $s$, the concatenation of the empty string `""` with $s$ equals $s$, i.e., `"" ++ s = s`.

String.append_assoc theorem

(s₁ s₂ s₃ : String) : (s₁ ++ s₂) ++ s₃ = s₁ ++ (s₂ ++ s₃)

Full source

theorem append_assoc (s₁ s₂ s₃ : String) : (s₁ ++ s₂) ++ s₃ = s₁ ++ (s₂ ++ s₃) :=
  ext (List.append_assoc ..)

Associativity of String Concatenation: $(s_1 \mathbin{+\kern-1.5ex+} s_2) \mathbin{+\kern-1.5ex+} s_3 = s_1 \mathbin{+\kern-1.5ex+} (s_2 \mathbin{+\kern-1.5ex+} s_3)$

Informal description

For any three strings $s_1$, $s_2$, and $s_3$, the concatenation operation satisfies the associativity property: $$(s_1 \mathbin{+\kern-1.5ex+} s_2) \mathbin{+\kern-1.5ex+} s_3 = s_1 \mathbin{+\kern-1.5ex+} (s_2 \mathbin{+\kern-1.5ex+} s_3)$$ where $\mathbin{+\kern-1.5ex+}$ denotes string concatenation.

Substring.prev_zero theorem

(s : Substring) : s.prev 0 = 0

Full source

@[simp] theorem prev_zero (s : Substring) : s.prev 0 = 0 := by simp [prev, Pos.add_eq, Pos.byteIdx_zero]

Previous Position at Zero: $\text{prev}(s, 0) = 0$

Informal description

For any substring $s$, the previous position function evaluated at the zero byte position returns zero, i.e., $\text{prev}(s, 0) = 0$.

Substring.prevn_zero theorem

(s : Substring) : ∀ n, s.prevn n 0 = 0

Full source

@[simp] theorem prevn_zero (s : Substring) : ∀ n, s.prevn n 0 = 0
  | 0 => rfl
  | n+1 => by simp [prevn, prevn_zero s n]

$n$-th Previous Position at Zero: $\text{prevn}(s, n, 0) = 0$

Informal description

For any substring $s$ and any natural number $n$, the $n$-th previous position function evaluated at the zero byte position returns zero, i.e., $\text{prevn}(s, n, 0) = 0$.