Init.Data.Char.Basic

isValidChar definition

(n : UInt32) : Prop

Full source

/-- Determines if the given integer is a valid [Unicode scalar value](https://www.unicode.org/glossary/#unicode_scalar_value).

Note that values in `[0xd800, 0xdfff]` are reserved for [UTF-16 surrogate pairs](https://en.wikipedia.org/wiki/Universal_Character_Set_characters#Surrogates).
-/
@[inline, reducible] def isValidChar (n : UInt32) : Prop :=
  n < 0xd800 ∨ (0xdfff < n ∧ n < 0x110000)

Valid Unicode scalar value

Informal description

A 32-bit unsigned integer $ n $ is a valid Unicode scalar value if it satisfies either $ n < 0xd800 $ or $ 0xdfff < n < 0x110000 $. This excludes the surrogate pair range $[0xd800, 0xdfff]$ and ensures the value is within the valid Unicode range.

Char.lt definition

(a b : Char) : Prop

Full source

/--
One character is less than another if its code point is strictly less than the other's.
-/
protected def lt (a b : Char) : Prop := a.val < b.val

Character ordering by code point

Informal description

For two Unicode characters $ a $ and $ b $, the relation $ a < b $ holds if and only if the code point of $ a $ is strictly less than the code point of $ b $.

Char.le definition

(a b : Char) : Prop

Full source

/--
One character is less than or equal to another if its code point is less than or equal to the
other's.
-/
protected def le (a b : Char) : Prop := a.val ≤ b.val

Character ordering by code points

Informal description

For two Unicode characters $a$ and $b$, the relation $a \leq b$ holds if and only if the code point of $a$ is less than or equal to the code point of $b$.

Char.instLT instance

: LT Char

Full source

instance : LT Char := ⟨Char.lt⟩

The Less-Than Relation on Unicode Characters

Informal description

The type of Unicode characters is equipped with a canonical less-than relation, where for any two characters $a$ and $b$, $a < b$ holds if and only if the code point of $a$ is strictly less than the code point of $b$.

Char.instLE instance

: LE Char

Full source

instance : LE Char := ⟨Char.le⟩

Ordering on Unicode Characters by Code Points

Informal description

Unicode characters are equipped with a canonical ordering relation $\leq$ based on their code points.

Char.instDecidableLt instance

(a b : Char) : Decidable (a < b)

Full source

instance (a b : Char) :  Decidable (a < b) :=
  UInt32.decLt _ _

Decidability of Strict Order on Unicode Characters

Informal description

For any two Unicode characters $a$ and $b$, the strict order relation $a < b$ is decidable. Here, $a < b$ holds if and only if the code point of $a$ is strictly less than the code point of $b$.

Char.instDecidableLe instance

(a b : Char) : Decidable (a ≤ b)

Full source

instance (a b : Char) : Decidable (a ≤ b) :=
  UInt32.decLe _ _

Decidability of the Ordering on Unicode Characters

Informal description

For any two Unicode characters $a$ and $b$, the proposition $a \leq b$ (where $\leq$ is the canonical ordering based on code points) is decidable.

Char.isValidCharNat abbrev

(n : Nat) : Prop

Full source

/--
True for natural numbers that are valid [Unicode scalar
values](https://www.unicode.org/glossary/#unicode_scalar_value).
-/
abbrev isValidCharNat (n : Nat) : Prop :=
  n < 0xd800 ∨ (0xdfff < n ∧ n < 0x110000)

Natural Number as Valid Unicode Scalar Value

Informal description

A natural number $n$ is a valid Unicode scalar value if it satisfies the conditions specified by the Unicode standard for scalar values.

Char.isValidUInt32 theorem

(n : Nat) (h : isValidCharNat n) : n < UInt32.size

Full source

theorem isValidUInt32 (n : Nat) (h : isValidCharNat n) : n < UInt32.size := by
  match h with
  | Or.inl h        =>
    apply Nat.lt_trans h
    decide
  | Or.inr ⟨_,  h₂⟩ =>
    apply Nat.lt_trans h₂
    decide

Valid Unicode Scalar Values are Below $2^{32}$

Informal description

For any natural number $n$ that is a valid Unicode scalar value (i.e., satisfies `isValidCharNat n`), it holds that $n$ is strictly less than $2^{32}$ (the size of the `UInt32` type).

Char.isValidChar_of_isValidCharNat theorem

(n : Nat) (h : isValidCharNat n) : isValidChar (UInt32.ofNatLT n (isValidUInt32 n h))

Full source

theorem isValidChar_of_isValidCharNat (n : Nat) (h : isValidCharNat n) : isValidChar (UInt32.ofNatLT n (isValidUInt32 n h)) :=
  match h with
  | Or.inl h =>
    Or.inl (UInt32.ofNatLT_lt_of_lt _ (by decide) h)
  | Or.inr ⟨h₁, h₂⟩ =>
    Or.inr ⟨UInt32.lt_ofNatLT_of_lt _ (by decide) h₁, UInt32.ofNatLT_lt_of_lt _ (by decide) h₂⟩

Preservation of Unicode Validity from Natural Numbers to UInt32

Informal description

For any natural number $n$ that is a valid Unicode scalar value (i.e., satisfies `isValidCharNat n`), the unsigned 32-bit integer representation of $n$ (obtained via `UInt32.ofNatLT`) is also a valid Unicode scalar value.

Char.isValidChar_zero theorem

: isValidChar 0

Full source

theorem isValidChar_zero : isValidChar 0 :=
  Or.inl (by decide)

Validity of Zero as a Unicode Scalar Value

Informal description

The Unicode scalar value corresponding to the zero unsigned 32-bit integer (0) is valid, i.e., it satisfies the condition for being a valid Unicode character.

Char.toNat definition

(c : Char) : Nat

Full source

/--
The character's Unicode code point as a `Nat`.
-/
@[inline] def toNat (c : Char) : Nat :=
  c.val.toNat

Unicode code point of a character

Informal description

The function maps a Unicode character $ c $ to its Unicode code point as a natural number.

Char.toUInt8 definition

(c : Char) : UInt8

Full source

/--
Converts a character into a `UInt8` that contains its code point.

If the code point is larger than 255, it is truncated (reduced modulo 256).
-/
@[inline] def toUInt8 (c : Char) : UInt8 :=
  c.val.toUInt8

Character to unsigned 8-bit integer conversion

Informal description

The function maps a Unicode character $ c $ to an unsigned 8-bit integer representing its code point modulo 256 (truncated if the code point exceeds 255).

Char.ofUInt8 definition

(n : UInt8) : Char

Full source

/--
Converts an 8-bit unsigned integer into a character.

The integer's value is interpreted as a Unicode code point.
-/
def ofUInt8 (n : UInt8) : Char := ⟨n.toUInt32, .inl (Nat.lt_trans n.toBitVec.isLt (by decide))⟩

Character from unsigned 8-bit integer

Informal description

The function converts an unsigned 8-bit integer $ n $ into a Unicode character, interpreting the integer's value as a Unicode code point.

Char.instInhabited instance

: Inhabited Char

Full source

instance : Inhabited Char where
  default := 'A'

Inhabited Type of Unicode Characters

Informal description

The type of Unicode characters is inhabited, with a default character (typically the null character `'\0'`).

Char.isWhitespace definition

(c : Char) : Bool

Full source

/--
Returns `true` if the character is a space `(' ', U+0020)`, a tab `('\t', U+0009)`, a carriage
return `('\r', U+000D)`, or a newline `('\n', U+000A)`.
-/
@[inline] def isWhitespace (c : Char) : Bool :=
  c = ' ' || c = '\t'c = ' ' || c = '\t' || c = '\r'c = ' ' || c = '\t' || c = '\r' || c = '\n'

Check if character is whitespace

Informal description

The function returns `true` if the Unicode character `c` is a whitespace character, specifically the space character `' '` (U+0020), tab `'\t'` (U+0009), carriage return `'\r'` (U+000D), or newline `'\n'` (U+000A), and `false` otherwise.

Char.isUpper definition

(c : Char) : Bool

Full source

/--
Returns `true` if the character is a uppercase ASCII letter.

The uppercase ASCII letters are the following: `ABCDEFGHIJKLMNOPQRSTUVWXYZ`.
-/
@[inline] def isUpper (c : Char) : Bool :=
  c.val ≥ 65 && c.val ≤ 90

Check if character is uppercase ASCII letter

Informal description

The function returns `true` if the Unicode character `c` is an uppercase ASCII letter (i.e., its code point is between 65 and 90 inclusive, corresponding to `A` to `Z`), and `false` otherwise.

Char.isLower definition

(c : Char) : Bool

Full source

/--
Returns `true` if the character is a lowercase ASCII letter.

The lowercase ASCII letters are the following: `abcdefghijklmnopqrstuvwxyz`.
-/
@[inline] def isLower (c : Char) : Bool :=
  c.val ≥ 97 && c.val ≤ 122

Check if a character is a lowercase ASCII letter

Informal description

The function returns `true` if the Unicode character `c` is a lowercase ASCII letter, i.e., its code point is between 97 (`'a'`) and 122 (`'z'`) inclusive.

Char.isAlpha definition

(c : Char) : Bool

Full source

/--
Returns `true` if the character is an ASCII letter.

The ASCII letters are the following: `ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz`.
-/
@[inline] def isAlpha (c : Char) : Bool :=
  c.isUpper || c.isLower

Check if character is an ASCII letter

Informal description

The function returns `true` if the Unicode character `c` is an ASCII letter (either uppercase `A`-`Z` or lowercase `a`-`z`), and `false` otherwise.

Char.isDigit definition

(c : Char) : Bool

Full source

/--
Returns `true` if the character is an ASCII digit.

The ASCII digits are the following: `0123456789`.
-/
@[inline] def isDigit (c : Char) : Bool :=
  c.val ≥ 48 && c.val ≤ 57

Check if character is an ASCII digit

Informal description

The function returns `true` if the Unicode character `c` is an ASCII digit, i.e., its code point lies between 48 (for '0') and 57 (for '9') inclusive.

Char.isAlphanum definition

(c : Char) : Bool

Full source

/--
Returns `true` if the character is an ASCII letter or digit.

The ASCII letters are the following: `ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz`.
The ASCII digits are the following: `0123456789`.
-/
@[inline] def isAlphanum (c : Char) : Bool :=
  c.isAlpha || c.isDigit

Check if character is an ASCII letter or digit

Informal description

The function returns `true` if the Unicode character `c` is an ASCII letter (either uppercase `A`-`Z` or lowercase `a`-`z`) or an ASCII digit (`0`-`9`), and `false` otherwise.

Char.toLower definition

(c : Char) : Char

Full source

/--
Converts an uppercase ASCII letter to the corresponding lowercase letter. Letters outside the ASCII
alphabet are returned unchanged.

The uppercase ASCII letters are the following: `ABCDEFGHIJKLMNOPQRSTUVWXYZ`.
-/
def toLower (c : Char) : Char :=
  let n := toNat c;
  if n >= 65 ∧ n <= 90 then ofNat (n + 32) else c

Lowercase conversion of ASCII letters

Informal description

The function converts an uppercase ASCII letter to its corresponding lowercase letter. Specifically, if the Unicode code point of the character $ c $ is between 65 (for 'A') and 90 (for 'Z') inclusive, it returns the character with code point increased by 32 (to get the lowercase equivalent). Otherwise, it returns the character unchanged.

Char.toUpper definition

(c : Char) : Char

Full source

/--
Converts a lowercase ASCII letter to the corresponding uppercase letter. Letters outside the ASCII
alphabet are returned unchanged.

The lowercase ASCII letters are the following: `abcdefghijklmnopqrstuvwxyz`.
-/
def toUpper (c : Char) : Char :=
  let n := toNat c;
  if n >= 97 ∧ n <= 122 then ofNat (n - 32) else c

Uppercase conversion of ASCII letters

Informal description

The function converts a lowercase ASCII letter to its corresponding uppercase letter. Specifically, if the Unicode code point of the character $ c $ is between 97 ('a') and 122 ('z') inclusive, it returns the character with code point reduced by 32 (yielding 'A' to 'Z'); otherwise, it returns the character unchanged.