doc-next-gen

Init.Data.Char.Basic

Module docstring

{}

isValidChar definition
(n : UInt32) : Prop
Full source
/-- Determines if the given integer is a valid [Unicode scalar value](https://www.unicode.org/glossary/#unicode_scalar_value).

Note that values in `[0xd800, 0xdfff]` are reserved for [UTF-16 surrogate pairs](https://en.wikipedia.org/wiki/Universal_Character_Set_characters#Surrogates).
-/
@[inline, reducible] def isValidChar (n : UInt32) : Prop :=
  n < 0xd800 ∨ (0xdfff < n ∧ n < 0x110000)
Valid Unicode scalar value
Informal description
A 32-bit unsigned integer \( n \) is a valid Unicode scalar value if it satisfies either \( n < 0xd800 \) or \( 0xdfff < n < 0x110000 \). This excludes the surrogate pair range \([0xd800, 0xdfff]\) and ensures the value is within the valid Unicode range.
Char.lt definition
(a b : Char) : Prop
Full source
/--
One character is less than another if its code point is strictly less than the other's.
-/
protected def lt (a b : Char) : Prop := a.val < b.val
Character ordering by code point
Informal description
For two Unicode characters \( a \) and \( b \), the relation \( a < b \) holds if and only if the code point of \( a \) is strictly less than the code point of \( b \).
Char.le definition
(a b : Char) : Prop
Full source
/--
One character is less than or equal to another if its code point is less than or equal to the
other's.
-/
protected def le (a b : Char) : Prop := a.val ≤ b.val
Character ordering by code points
Informal description
For two Unicode characters $a$ and $b$, the relation $a \leq b$ holds if and only if the code point of $a$ is less than or equal to the code point of $b$.
Char.instLT instance
: LT Char
Full source
instance : LT Char := ⟨Char.lt⟩
The Less-Than Relation on Unicode Characters
Informal description
The type of Unicode characters is equipped with a canonical less-than relation, where for any two characters $a$ and $b$, $a < b$ holds if and only if the code point of $a$ is strictly less than the code point of $b$.
Char.instLE instance
: LE Char
Full source
instance : LE Char := ⟨Char.le⟩
Ordering on Unicode Characters by Code Points
Informal description
Unicode characters are equipped with a canonical ordering relation $\leq$ based on their code points.
Char.instDecidableLt instance
(a b : Char) : Decidable (a < b)
Full source
instance (a b : Char) :  Decidable (a < b) :=
  UInt32.decLt _ _
Decidability of Strict Order on Unicode Characters
Informal description
For any two Unicode characters $a$ and $b$, the strict order relation $a < b$ is decidable. Here, $a < b$ holds if and only if the code point of $a$ is strictly less than the code point of $b$.
Char.instDecidableLe instance
(a b : Char) : Decidable (a ≤ b)
Full source
instance (a b : Char) : Decidable (a ≤ b) :=
  UInt32.decLe _ _
Decidability of the Ordering on Unicode Characters
Informal description
For any two Unicode characters $a$ and $b$, the proposition $a \leq b$ (where $\leq$ is the canonical ordering based on code points) is decidable.
Char.isValidCharNat abbrev
(n : Nat) : Prop
Full source
/--
True for natural numbers that are valid [Unicode scalar
values](https://www.unicode.org/glossary/#unicode_scalar_value).
-/
abbrev isValidCharNat (n : Nat) : Prop :=
  n < 0xd800 ∨ (0xdfff < n ∧ n < 0x110000)
Natural Number as Valid Unicode Scalar Value
Informal description
A natural number $n$ is a valid Unicode scalar value if it satisfies the conditions specified by the Unicode standard for scalar values.
Char.isValidUInt32 theorem
(n : Nat) (h : isValidCharNat n) : n < UInt32.size
Full source
theorem isValidUInt32 (n : Nat) (h : isValidCharNat n) : n < UInt32.size := by
  match h with
  | Or.inl h        =>
    apply Nat.lt_trans h
    decide
  | Or.inr ⟨_,  h₂⟩ =>
    apply Nat.lt_trans h₂
    decide
Valid Unicode Scalar Values are Below $2^{32}$
Informal description
For any natural number $n$ that is a valid Unicode scalar value (i.e., satisfies `isValidCharNat n`), it holds that $n$ is strictly less than $2^{32}$ (the size of the `UInt32` type).
Char.isValidChar_of_isValidCharNat theorem
(n : Nat) (h : isValidCharNat n) : isValidChar (UInt32.ofNatLT n (isValidUInt32 n h))
Full source
theorem isValidChar_of_isValidCharNat (n : Nat) (h : isValidCharNat n) : isValidChar (UInt32.ofNatLT n (isValidUInt32 n h)) :=
  match h with
  | Or.inl h =>
    Or.inl (UInt32.ofNatLT_lt_of_lt _ (by decide) h)
  | Or.inr ⟨h₁, h₂⟩ =>
    Or.inr ⟨UInt32.lt_ofNatLT_of_lt _ (by decide) h₁, UInt32.ofNatLT_lt_of_lt _ (by decide) h₂⟩
Preservation of Unicode Validity from Natural Numbers to UInt32
Informal description
For any natural number $n$ that is a valid Unicode scalar value (i.e., satisfies `isValidCharNat n`), the unsigned 32-bit integer representation of $n$ (obtained via `UInt32.ofNatLT`) is also a valid Unicode scalar value.
Char.isValidChar_zero theorem
: isValidChar 0
Full source
theorem isValidChar_zero : isValidChar 0 :=
  Or.inl (by decide)
Validity of Zero as a Unicode Scalar Value
Informal description
The Unicode scalar value corresponding to the zero unsigned 32-bit integer (0) is valid, i.e., it satisfies the condition for being a valid Unicode character.
Char.toNat definition
(c : Char) : Nat
Full source
/--
The character's Unicode code point as a `Nat`.
-/
@[inline] def toNat (c : Char) : Nat :=
  c.val.toNat
Unicode code point of a character
Informal description
The function maps a Unicode character \( c \) to its Unicode code point as a natural number.
Char.toUInt8 definition
(c : Char) : UInt8
Full source
/--
Converts a character into a `UInt8` that contains its code point.

If the code point is larger than 255, it is truncated (reduced modulo 256).
-/
@[inline] def toUInt8 (c : Char) : UInt8 :=
  c.val.toUInt8
Character to unsigned 8-bit integer conversion
Informal description
The function maps a Unicode character \( c \) to an unsigned 8-bit integer representing its code point modulo 256 (truncated if the code point exceeds 255).
Char.ofUInt8 definition
(n : UInt8) : Char
Full source
/--
Converts an 8-bit unsigned integer into a character.

The integer's value is interpreted as a Unicode code point.
-/
def ofUInt8 (n : UInt8) : Char := ⟨n.toUInt32, .inl (Nat.lt_trans n.toBitVec.isLt (by decide))⟩
Character from unsigned 8-bit integer
Informal description
The function converts an unsigned 8-bit integer \( n \) into a Unicode character, interpreting the integer's value as a Unicode code point.
Char.instInhabited instance
: Inhabited Char
Full source
instance : Inhabited Char where
  default := 'A'
Inhabited Type of Unicode Characters
Informal description
The type of Unicode characters is inhabited, with a default character (typically the null character `'\0'`).
Char.isWhitespace definition
(c : Char) : Bool
Full source
/--
Returns `true` if the character is a space `(' ', U+0020)`, a tab `('\t', U+0009)`, a carriage
return `('\r', U+000D)`, or a newline `('\n', U+000A)`.
-/
@[inline] def isWhitespace (c : Char) : Bool :=
  c = ' ' || c = '\t'c = ' ' || c = '\t' || c = '\r'c = ' ' || c = '\t' || c = '\r' || c = '\n'
Check if character is whitespace
Informal description
The function returns `true` if the Unicode character `c` is a whitespace character, specifically the space character `' '` (U+0020), tab `'\t'` (U+0009), carriage return `'\r'` (U+000D), or newline `'\n'` (U+000A), and `false` otherwise.
Char.isUpper definition
(c : Char) : Bool
Full source
/--
Returns `true` if the character is a uppercase ASCII letter.

The uppercase ASCII letters are the following: `ABCDEFGHIJKLMNOPQRSTUVWXYZ`.
-/
@[inline] def isUpper (c : Char) : Bool :=
  c.val ≥ 65 && c.val ≤ 90
Check if character is uppercase ASCII letter
Informal description
The function returns `true` if the Unicode character `c` is an uppercase ASCII letter (i.e., its code point is between 65 and 90 inclusive, corresponding to `A` to `Z`), and `false` otherwise.
Char.isLower definition
(c : Char) : Bool
Full source
/--
Returns `true` if the character is a lowercase ASCII letter.

The lowercase ASCII letters are the following: `abcdefghijklmnopqrstuvwxyz`.
-/
@[inline] def isLower (c : Char) : Bool :=
  c.val ≥ 97 && c.val ≤ 122
Check if a character is a lowercase ASCII letter
Informal description
The function returns `true` if the Unicode character `c` is a lowercase ASCII letter, i.e., its code point is between 97 (`'a'`) and 122 (`'z'`) inclusive.
Char.isAlpha definition
(c : Char) : Bool
Full source
/--
Returns `true` if the character is an ASCII letter.

The ASCII letters are the following: `ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz`.
-/
@[inline] def isAlpha (c : Char) : Bool :=
  c.isUpper || c.isLower
Check if character is an ASCII letter
Informal description
The function returns `true` if the Unicode character `c` is an ASCII letter (either uppercase `A`-`Z` or lowercase `a`-`z`), and `false` otherwise.
Char.isDigit definition
(c : Char) : Bool
Full source
/--
Returns `true` if the character is an ASCII digit.

The ASCII digits are the following: `0123456789`.
-/
@[inline] def isDigit (c : Char) : Bool :=
  c.val ≥ 48 && c.val ≤ 57
Check if character is an ASCII digit
Informal description
The function returns `true` if the Unicode character `c` is an ASCII digit, i.e., its code point lies between 48 (for '0') and 57 (for '9') inclusive.
Char.isAlphanum definition
(c : Char) : Bool
Full source
/--
Returns `true` if the character is an ASCII letter or digit.

The ASCII letters are the following: `ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz`.
The ASCII digits are the following: `0123456789`.
-/
@[inline] def isAlphanum (c : Char) : Bool :=
  c.isAlpha || c.isDigit
Check if character is an ASCII letter or digit
Informal description
The function returns `true` if the Unicode character `c` is an ASCII letter (either uppercase `A`-`Z` or lowercase `a`-`z`) or an ASCII digit (`0`-`9`), and `false` otherwise.
Char.toLower definition
(c : Char) : Char
Full source
/--
Converts an uppercase ASCII letter to the corresponding lowercase letter. Letters outside the ASCII
alphabet are returned unchanged.

The uppercase ASCII letters are the following: `ABCDEFGHIJKLMNOPQRSTUVWXYZ`.
-/
def toLower (c : Char) : Char :=
  let n := toNat c;
  if n >= 65 ∧ n <= 90 then ofNat (n + 32) else c
Lowercase conversion of ASCII letters
Informal description
The function converts an uppercase ASCII letter to its corresponding lowercase letter. Specifically, if the Unicode code point of the character \( c \) is between 65 (for 'A') and 90 (for 'Z') inclusive, it returns the character with code point increased by 32 (to get the lowercase equivalent). Otherwise, it returns the character unchanged.
Char.toUpper definition
(c : Char) : Char
Full source
/--
Converts a lowercase ASCII letter to the corresponding uppercase letter. Letters outside the ASCII
alphabet are returned unchanged.

The lowercase ASCII letters are the following: `abcdefghijklmnopqrstuvwxyz`.
-/
def toUpper (c : Char) : Char :=
  let n := toNat c;
  if n >= 97 ∧ n <= 122 then ofNat (n - 32) else c
Uppercase conversion of ASCII letters
Informal description
The function converts a lowercase ASCII letter to its corresponding uppercase letter. Specifically, if the Unicode code point of the character \( c \) is between 97 ('a') and 122 ('z') inclusive, it returns the character with code point reduced by 32 (yielding 'A' to 'Z'); otherwise, it returns the character unchanged.