2 files changed, +21 -3 lines changed
@@ -190,8 +190,26 @@ encodeLatin1Lax = encodeLatin1
 -- UTF-8 decoding
 -------------------------------------------------------------------------------

+-- CodePoint represents a specific character in the Unicode standard.
+-- A code point is the numerical value assigned to each character,
+-- and UTF-8 encoding uses a variable number of bytes to represent
+-- different code points.
+--
+-- Calculating the code point value: depending on the type of the leading byte,
+-- extract the significant bits from each byte of the sequence and combine them
+-- to form the complete code point value. The specific bit manipulations
+-- differ based on the number of bytes used.
 -- Int helps in cheaper conversion from Int to Char
 type CodePoint = Int
+
+-- DecodeState refers to the number of bytes remaining to complete the
+-- decoding of the current UTF-8 character. For ASCII characters (code points
+-- 0 to 127), no decoding state is necessary because each is represented by a
+-- single byte, so the decoding state for ASCII characters can be considered 0.
+-- For multi-byte characters, the decoding state indicates the number of bytes
+-- remaining to complete the character. It is usually initialized to a non-zero
+-- value equal to the number of bytes that follow the leading byte, e.g.
+-- DecodeState will be 1 for a 2-byte character.
 type DecodeState = Word8

 -- We can divide the errors in three general categories:
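
The bit manipulation and the "bytes remaining" counter described in the CodePoint and DecodeState comments above can be illustrated with a small standalone sketch. This is not the code from this commit: the names `Remaining`, `leading`, `continue`, and `decodeOne` are hypothetical, the state is modeled as a plain count of continuation bytes exactly as the comment describes it, and the library's own decoder may represent its state differently. The sketch also skips overlong-encoding and surrogate checks that a strict decoder would need.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical names for illustration only; not the library's definitions.
type CodePoint = Int
type Remaining = Int -- continuation bytes still expected

-- Classify a leading byte: the payload bits it carries and how many
-- continuation bytes must follow. Nothing if the byte cannot start a sequence.
leading :: Word8 -> Maybe (CodePoint, Remaining)
leading b
    | b < 0x80           = Just (w, 0)          -- 0xxxxxxx (ASCII)
    | b .&. 0xE0 == 0xC0 = Just (w .&. 0x1F, 1) -- 110xxxxx
    | b .&. 0xF0 == 0xE0 = Just (w .&. 0x0F, 2) -- 1110xxxx
    | b .&. 0xF8 == 0xF0 = Just (w .&. 0x07, 3) -- 11110xxx
    | otherwise          = Nothing
  where
    w = fromIntegral b

-- Fold one continuation byte (10xxxxxx) into the accumulated value:
-- shift left by 6 and add the low 6 bits.
continue :: CodePoint -> Word8 -> Maybe CodePoint
continue cp b
    | b .&. 0xC0 == 0x80 = Just ((cp `shiftL` 6) .|. (fromIntegral b .&. 0x3F))
    | otherwise          = Nothing

-- Decode exactly one character from the front of the input and return it
-- together with the unconsumed bytes.
decodeOne :: [Word8] -> Maybe (Char, [Word8])
decodeOne [] = Nothing
decodeOne (b : bs) = leading b >>= \(cp, n) -> go cp n bs
  where
    go cp 0 rest
        | cp <= 0x10FFFF = Just (chr cp, rest) -- chr fails above this bound
        | otherwise      = Nothing
    go cp n (x : rest) = continue cp x >>= \cp' -> go cp' (n - 1) rest
    go _ _ [] = Nothing -- input ended in the middle of a character

main :: IO ()
main = do
    print (decodeOne [0xC3, 0xA9])             -- two bytes encoding U+00E9
    print (decodeOne [0xE2, 0x82, 0xAC, 0x41]) -- U+20AC, with 0x41 left over
```

In this model the counter starts at 1 for a 2-byte character, matching the example in the DecodeState comment, and reaching 0 means a complete code point is available. A resumable decoder would hand the pair of counter and partial code point back to the caller when the input runs out instead of returning `Nothing`.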
 module Streamly.Unicode.Stream
     (
       DecodeState
-    , DecodeError (..)
     , CodePoint
-
-    -- * Construction (Decoding)
+    , DecodeError (..)
+
+    -- * Resumable UTF-8 decoding
     , decodeLatin1
     , decodeUtf8
     , decodeUtf8'