Fossil

Check-in [2e7a6cb0]

Overview
Comment: fixed a bad function name; had tested it external to fossil but didn't build/test before committing; mea culpa
Timelines: family | ancestors | descendants | both | invalid_utf8_table
SHA1: 2e7a6cb03d053aad15defd49c76eab522ee07e4d
User & Date: sdr 2016-06-11 00:11:11
Original Comment: committed a function with a bad name because I C&P and foolishly didn't build or test; glad it was on a branch
Context
2016-06-11 00:13  merged from trunk (check-in: 4f906e53, user: sdr, tags: invalid_utf8_table)
2016-06-11 00:11  fixed a bad function name; had tested it external to fossil but didn't build/test before committing; mea culpa (check-in: 2e7a6cb0, user: sdr, tags: invalid_utf8_table)
2016-06-10 20:45  performance optimizations (check-in: 635f3b03, user: sdr, tags: invalid_utf8_table)
Changes

Changes to src/lookslike.c.

Old version, lines 141-155:
** except for the "overlong form" of \u0000 (Modified UTF-8)
** which is not considered invalid here: Some languages like
** Java and Tcl use it. This function also considers valid
** the derivatives CESU-8 & WTF-8 (as described in the same
** wikipedia article referenced previously).
*/

int invalid_utf8_b(const Blob *pContent)
{
  /* definitions for various utf-8 sequence lengths */
  static unsigned char def_2a[] = { 2, 0xC0, 0xC0, 0x80, 0x80 };
  static unsigned char def_2b[] = { 2, 0xC2, 0xDF, 0x80, 0xBF };
  static unsigned char def_3a[] = { 3, 0xE0, 0xE0, 0xA0, 0xBF, 0x80, 0xBF };
  static unsigned char def_3b[] = { 3, 0xE1, 0xEF, 0x80, 0xBF, 0x80, 0xBF };
  static unsigned char def_4a[] = { 4, 0xF0, 0xF0, 0x90, 0xBF, 0x80, 0xBF, 0x80, 0xBF };

New version, lines 141-155 (line 148 changed):
** except for the "overlong form" of \u0000 (Modified UTF-8)
** which is not considered invalid here: Some languages like
** Java and Tcl use it. This function also considers valid
** the derivatives CESU-8 & WTF-8 (as described in the same
** wikipedia article referenced previously).
*/

int invalid_utf8(const Blob *pContent)
{
  /* definitions for various utf-8 sequence lengths */
  static unsigned char def_2a[] = { 2, 0xC0, 0xC0, 0x80, 0x80 };
  static unsigned char def_2b[] = { 2, 0xC2, 0xDF, 0x80, 0xBF };
  static unsigned char def_3a[] = { 3, 0xE0, 0xE0, 0xA0, 0xBF, 0x80, 0xBF };
  static unsigned char def_3b[] = { 3, 0xE1, 0xEF, 0x80, 0xBF, 0x80, 0xBF };
  static unsigned char def_4a[] = { 4, 0xF0, 0xF0, 0x90, 0xBF, 0x80, 0xBF, 0x80, 0xBF };
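
The diff shows only the table definitions, not the code that consumes them. Reading the tables, each one appears to hold a sequence length followed by one inclusive [low, high] range per byte; that reading is an assumption, since the rest of the function is not visible here. Under that assumption, a minimal sketch of how one candidate sequence could be checked against a table looks like the following. The helper name matches_def and its parameters are hypothetical and are not taken from src/lookslike.c.

```c
#include <stddef.h>

/* Tables copied from the diff. Assumed layout: def[0] is the sequence
** length, followed by one inclusive [low, high] pair per byte. */
static const unsigned char def_2a[] = { 2, 0xC0, 0xC0, 0x80, 0x80 };
static const unsigned char def_2b[] = { 2, 0xC2, 0xDF, 0x80, 0xBF };
static const unsigned char def_3a[] = { 3, 0xE0, 0xE0, 0xA0, 0xBF, 0x80, 0xBF };

/* Hypothetical helper (not from lookslike.c): return 1 if the first
** def[0] bytes of z all fall inside the ranges listed in def, and 0 if
** any byte is out of range or fewer than def[0] bytes are available. */
static int matches_def(const unsigned char *def, const unsigned char *z, size_t n){
  size_t len = def[0];
  size_t i;
  if( n<len ) return 0;              /* truncated sequence */
  for(i=0; i<len; i++){
    unsigned char lo = def[1+2*i];   /* lower bound for byte i */
    unsigned char hi = def[2+2*i];   /* upper bound for byte i */
    if( z[i]<lo || z[i]>hi ) return 0;
  }
  return 1;
}
```

Read this way, def_2a accepts exactly the two bytes 0xC0 0x80, the overlong form of \u0000 that the comment says is tolerated, while def_2b starts at 0xC2 and so rejects every other overlong two-byte sequence.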