Commit c50c25a

Add docs for state and codepoint
1 parent 5fa6d57 commit c50c25a

2 files changed: +21 -3 lines changed

core/src/Streamly/Internal/Unicode/Stream.hs

Lines changed: 18 additions & 0 deletions
@@ -190,8 +190,26 @@ encodeLatin1Lax = encodeLatin1
 -- UTF-8 decoding
 -------------------------------------------------------------------------------
 
+-- CodePoint represents a specific character in the Unicode standard.
+-- The code point is a numerical value assigned to each character,
+-- and UTF-8 encoding uses a variable number of bytes to represent
+-- different code points.
+--
+-- Calculate the code point value: Depending on the type of the leading byte,
+-- extract the significant bits from each byte of the sequence and combine them
+-- to form the complete code point value. The specific bit manipulations will
+-- differ based on the number of bytes used.
 -- Int helps in cheaper conversion from Int to Char
 type CodePoint = Int
+
+-- DecodeState refers to the number of bytes remaining to complete the current
+-- UTF-8 character decoding. For ASCII characters (code points 0 to 127),
+-- no decoding state is necessary because they are represented by a single byte.
+-- Therefore, the decoding state for ASCII characters can be considered as 0.
+-- For multi-byte characters, the decoding state indicates the number of bytes
+-- remaining to complete the character. It is usually initialized to a non-zero
+-- value corresponding to the number of bytes in the multi-byte character, e.g
+-- DecodeState will be 1 for 2-bytes char.
 type DecodeState = Word8
 
 -- We can divide the errors in three general categories:
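
The two comments added in this hunk describe a decoding procedure in words. The following is a minimal, standalone sketch of that description, not the library's implementation: it reads the state literally as "continuation bytes still expected" and accumulates the code point by shifting in six bits per continuation byte. The helpers `startSequence`, `continue`, and `decodeBytes` are hypothetical names introduced for illustration only; the library's actual decoder may represent its state differently, and this sketch does not reject overlong encodings or surrogate code points.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Word (Word8)

-- Same synonyms as in the diff above.
type CodePoint = Int
type DecodeState = Word8

-- Hypothetical helper: classify a leading byte. Returns the number of
-- continuation bytes still expected and the payload bits of the leading byte.
startSequence :: Word8 -> Maybe (DecodeState, CodePoint)
startSequence b
  | b < 0x80           = Just (0, fromIntegral b)            -- 0xxxxxxx (ASCII)
  | b .&. 0xE0 == 0xC0 = Just (1, fromIntegral (b .&. 0x1F)) -- 110xxxxx
  | b .&. 0xF0 == 0xE0 = Just (2, fromIntegral (b .&. 0x0F)) -- 1110xxxx
  | b .&. 0xF8 == 0xF0 = Just (3, fromIntegral (b .&. 0x07)) -- 11110xxx
  | otherwise          = Nothing -- continuation byte or invalid leading byte

-- Hypothetical helper: fold one continuation byte (10xxxxxx) into the
-- accumulated code point and decrement the remaining-byte count.
continue :: DecodeState -> CodePoint -> Word8 -> Maybe (DecodeState, CodePoint)
continue st cp b
  | b .&. 0xC0 == 0x80 =
      Just (st - 1, (cp `shiftL` 6) .|. fromIntegral (b .&. 0x3F))
  | otherwise = Nothing

-- Hypothetical driver: decode a whole byte list, threading the state through.
-- A state of 0 means "between characters"; non-zero means "mid-sequence".
decodeBytes :: [Word8] -> Either String [CodePoint]
decodeBytes = go 0 0
  where
    go :: DecodeState -> CodePoint -> [Word8] -> Either String [CodePoint]
    go 0 _ [] = Right []
    go _ _ [] = Left "truncated sequence"
    go 0 _ (b:bs) =
      case startSequence b of
        Nothing       -> Left "invalid leading byte"
        Just (0, cp)  -> (cp :) <$> go 0 0 bs   -- single-byte character
        Just (st, cp) -> go st cp bs            -- expect continuation bytes
    go st cp (b:bs) =
      case continue st cp b of
        Nothing         -> Left "invalid continuation byte"
        Just (0, cp')   -> (cp' :) <$> go 0 0 bs -- character complete
        Just (st', cp') -> go st' cp' bs
```

Under these assumptions, `decodeBytes [0xC3, 0xA9]` evaluates to `Right [233]` (U+00E9): the leading byte `0xC3` yields state 1 with payload bits `0x03`, matching the comment's "DecodeState will be 1 for 2-bytes char", and the continuation byte `0xA9` completes the code point. Keeping `CodePoint` as `Int` means the result can be turned into a `Char` with `toEnum` without an intermediate conversion.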

core/src/Streamly/Unicode/Stream.hs

Lines changed: 3 additions & 3 deletions
@@ -76,10 +76,10 @@
 module Streamly.Unicode.Stream
 (
 DecodeState
-, DecodeError(..)
 , CodePoint
-
--- * Construction (Decoding)
+, DecodeError(..)
+
+-- * Resumable UTF-8 decoding
 , decodeLatin1
 , decodeUtf8
 , decodeUtf8'
