Friday, 1 January 2016

What's new in Unicode 9.0 ?

Unicode version 9.0 is scheduled for release in June 2016. The final repertoire is now fixed, and 7,500 characters (including 72 emoji) will be added to Unicode 9.0. This will bring the total number of graphic and format characters in the Unicode Standard to 128,172 (in case you are concerned that Unicode is running out of space, that still leaves room for another 846,293 characters to be encoded). In summary, Unicode 9.0 will include 11 new blocks (named ranges of characters) and cover 6 new scripts (Osage, Newa, Bhaiksuki, Marchen, Tangut, and Adlam), making a total of 270 blocks and 135 scripts.
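For the curious, the 846,293 figure can be reproduced from the size of the code space. The sketch below is one way to do the arithmetic, not necessarily how it was originally calculated: it assumes that surrogates, private use code points, noncharacters and the 65 control characters are all unavailable for new characters, in addition to the 128,172 graphic and format characters already encoded.

```python
# Total code space: 17 planes of 65,536 code points each.
TOTAL_CODE_POINTS = 17 * 0x10000              # 1,114,112

SURROGATES = 0xDFFF - 0xD800 + 1              # 2,048 (D800..DFFF)
PRIVATE_USE = (
    (0xF8FF - 0xE000 + 1)                     # 6,400 in the BMP (E000..F8FF)
    + 2 * (0xFFFD - 0x0000 + 1)               # 65,534 in each of planes 15 and 16
)
NONCHARACTERS = 32 + 2 * 17                   # FDD0..FDEF plus xxFFFE/xxFFFF on all 17 planes
CONTROLS = 65                                 # C0 controls, DEL and C1 controls

def unassigned_after(graphic_and_format: int) -> int:
    """Code points still free after the given number of graphic and format characters."""
    unavailable = SURROGATES + PRIVATE_USE + NONCHARACTERS + CONTROLS
    return TOTAL_CODE_POINTS - unavailable - graphic_and_format

print(unassigned_after(128_172))  # Unicode 9.0 → 846293
```

Subtracting the 65 controls is what makes the result come out at exactly 846,293; leaving them out gives 65 more.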


Emoji

74 emoji characters have been accepted for encoding in Unicode 9.0. However, two of these characters have been de-emojified at the request of Apple: U+1F946 RIFLE (representing Shooting or Hunting) and U+1F93B MODERN PENTATHLON (which includes Pistol Shooting as one of its disciplines) will have no Unicode properties to suggest that they are emoji. The two characters will still be encoded in Unicode 9.0, but as plain symbols rather than as emoji characters, and it is unlikely that any major vendors will implement them as emoji.
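Whether software treats a character as an emoji depends on its properties in the Unicode emoji data, not on which block it sits in, which is how RIFLE and MODERN PENTATHLON can live among emoji without being emoji. A minimal sketch of such a property check, assuming the semicolon-separated "code point(s) ; property # comment" layout of the emoji data file (emoji-data.txt). The two sample lines are hypothetical, written for illustration so as to leave U+1F946 out, and are not copied from the real file; code_points_with is likewise a hypothetical helper.

```python
import re

# Hypothetical sample in the style of emoji-data.txt (NOT the real file contents).
SAMPLE = """\
1F940..1F945  ; Emoji  # illustrative line
1F947..1F94B  ; Emoji  # illustrative line
"""

# A code point or a first..last range, then ";", then a property name.
LINE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")

def code_points_with(prop: str, data: str) -> set:
    """Collect every code point listed with exactly the given property name."""
    result = set()
    for line in data.splitlines():
        line = line.split("#", 1)[0]          # drop trailing comments
        m = LINE.match(line)
        if m and m.group(3) == prop:
            first = int(m.group(1), 16)
            last = int(m.group(2), 16) if m.group(2) else first
            result.update(range(first, last + 1))
    return result

emoji = code_points_with("Emoji", SAMPLE)
print(0x1F94A in emoji, 0x1F946 in emoji)     # BOXING GLOVE, RIFLE → True False
```

Matching the property name exactly means a line tagged Emoji_Presentation would not be mistaken for one tagged Emoji.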


Provisional Code Point | Provisional Character Name | Notes | Source
U+1F57A | MAN DANCING | Encoded to match U+1F483 💃 DANCER (typically implemented as a female dancer) | L2/15-054
U+1F5A4 | BLACK HEART | unequivocally represented as black in all variants; encoded because there is a need for a black-coloured heart emoji, and U+2764 ❤ HEAVY BLACK HEART is typically implemented as a red heart | L2/15-054
U+1F6D1 | OCTAGONAL SIGN | stop sign | L2/15-054
U+1F6D2 | SHOPPING TROLLEY | shopping cart | L2/15-195
U+1F6F4 | SCOOTER | | L2/15-054
U+1F6F5 | MOTOR SCOOTER | | L2/15-054
U+1F6F6 | CANOE | | L2/15-196
U+1F919 | CALL ME HAND | | L2/15-054
U+1F91A | RAISED BACK OF HAND | | L2/15-054
U+1F91B | LEFT-FACING FIST | | L2/15-054
U+1F91C | RIGHT-FACING FIST | | L2/15-054
U+1F91D | HANDSHAKE | | L2/15-054
U+1F91E | HAND WITH INDEX AND MIDDLE FINGERS CROSSED | | L2/15-054
U+1F920 | FACE WITH COWBOY HAT | | L2/15-054
U+1F921 | CLOWN FACE | | L2/15-054
U+1F922 | NAUSEATED FACE | | L2/15-054
U+1F923 | ROLLING ON THE FLOOR LAUGHING | | L2/15-054
U+1F924 | DROOLING FACE | | L2/15-054
U+1F925 | LYING FACE | | L2/15-054
U+1F926 | FACE PALM | | L2/15-054
U+1F927 | SNEEZING FACE | gesundheit | L2/15-195
U+1F930 | PREGNANT WOMAN | | L2/15-054
U+1F933 | SELFIE | typically used with face or human figure | L2/15-054
U+1F934 | PRINCE | Encoded to match U+1F478 👸 PRINCESS | L2/15-054
U+1F935 | MAN IN TUXEDO | groom; encoded to match U+1F470 👰 BRIDE WITH VEIL | L2/15-054
U+1F936 | MOTHER CHRISTMAS | Mrs. Claus; encoded to match U+1F385 🎅 FATHER CHRISTMAS | L2/15-054
U+1F937 | SHRUG | | L2/15-054
U+1F938 | PERSON DOING CARTWHEEL | gymnastics | L2/15-196
U+1F939 | JUGGLING | | L2/15-195
U+1F93A | FENCER | fencing | L2/15-196
U+1F93B | MODERN PENTATHLON | NOT AN EMOJI (see above) | L2/15-196
U+1F93C | WRESTLERS | wrestling | L2/15-196
U+1F93D | WATER POLO | | L2/15-196
U+1F93E | HANDBALL | | L2/15-196
U+1F940 | WILTED FLOWER | wilted rose | L2/15-054
U+1F941 | DRUM WITH DRUMSTICKS | | L2/15-195
U+1F942 | CLINKING GLASSES | | L2/15-054
U+1F943 | TUMBLER GLASS | typically shown with iced drink; whisky | L2/15-195
U+1F944 | SPOON | | L2/15-195
U+1F945 | GOAL NET | | L2/15-196
U+1F946 | RIFLE | marksmanship, shooting (Olympic sport); hunting; NOT AN EMOJI (see above) | L2/15-196
U+1F947 | FIRST PLACE MEDAL | gold medal | L2/15-196
U+1F948 | SECOND PLACE MEDAL | silver medal | L2/15-196
U+1F949 | THIRD PLACE MEDAL | bronze medal | L2/15-196
U+1F94A | BOXING GLOVE | boxing | L2/15-196
U+1F94B | MARTIAL ARTS UNIFORM | judo and other martial arts | L2/15-196
U+1F950 | CROISSANT | | L2/15-054
U+1F951 | AVOCADO | | L2/15-054
U+1F952 | CUCUMBER | | L2/15-054
U+1F953 | BACON | | L2/15-054
U+1F954 | POTATO | | L2/15-054
U+1F955 | CARROT | | L2/15-054
U+1F956 | BAGUETTE BREAD | french bread | L2/15-195
U+1F957 | GREEN SALAD | | L2/15-195
U+1F958 | SHALLOW PAN OF FOOD | paella, casserole | L2/15-195
U+1F959 | STUFFED FLATBREAD | döner kebab, falafel, gyro, shawarma | L2/15-195
U+1F95A | EGG | | L2/15-267
U+1F95B | GLASS OF MILK | | L2/15-267
U+1F95C | PEANUTS | | L2/15-267
U+1F95D | KIWIFRUIT | | L2/15-267
U+1F95E | PANCAKES | | L2/15-267
U+1F985 | EAGLE | | L2/15-054
U+1F986 | DUCK | | L2/15-054
U+1F987 | BAT | | L2/15-054
U+1F988 | SHARK | | L2/15-054
U+1F989 | OWL | | L2/15-054
U+1F98A | FOX FACE | | L2/15-054
U+1F98B | BUTTERFLY | | L2/15-195
U+1F98C | DEER | | L2/15-195
U+1F98D | GORILLA | | L2/14-092, L2/15-195
U+1F98E | LIZARD | | L2/15-195
U+1F98F | RHINOCEROS | | L2/15-195
U+1F990 | SHRIMP | | L2/15-267
U+1F991 | SQUID | | L2/15-267

NB The above code points and character names are subject to change, and should not be relied on at this point in time.
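Since the names in the table are provisional, one way to see which names were finally adopted is to look the code points up in an installed copy of the Unicode Character Database. A sketch using Python's standard unicodedata module; it assumes a Python release whose bundled data is Unicode 9.0 or later (Python 3.6 onward), and with older data the characters are simply reported as unassigned.

```python
import unicodedata

# A few provisional names from the table above; final names may differ.
PROVISIONAL = {
    0x1F57A: "MAN DANCING",
    0x1F5A4: "BLACK HEART",
    0x1F6D2: "SHOPPING TROLLEY",
    0x1F946: "RIFLE",
}

print("Unicode data version:", unicodedata.unidata_version)
for cp, provisional in PROVISIONAL.items():
    # name() raises ValueError for unassigned code points unless a default is supplied
    final = unicodedata.name(chr(cp), "<not assigned in this database>")
    status = "unchanged" if final == provisional else "differs"
    print(f"U+{cp:04X} {provisional!r} -> {final!r} ({status})")
```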

Sources


These characters are currently under ISO ballot for inclusion in ISO/IEC 10646:2016 (5th ed.) (see WG2 N4705 pages 130, 131, 135, and 137–138). Most of the 8,514 characters in this document will feed into Unicode version 10.0 in June 2017, but because of the urgent need of netizens to be able to use new emoji at the earliest possible date, the Unicode Technical Committee (UTC) has a habit (policy?) of fast-tracking emoji characters into the Unicode Standard out of synchronization with the corresponding ISO standard (ISO/IEC 10646). On January 26 these 74 emoji characters were authorized for inclusion in the Unicode 9.0 beta, and unless any national bodies raise strong and compelling objections to any of them in the current CD ballot (which closes 29 February 2016), all 74 will be in Unicode 9.0. A final decision will be made when the UTC meets in early May 2016.

In the end, at the UTC meeting in May 2016, the UTC decided to accept only 72 of the 74 characters as emoji. At the request of Apple (in response to several well-publicized emoji gun incidents, and a campaign against adding more violent emoji to Unicode), U+1F946 RIFLE and U+1F93B MODERN PENTATHLON (which includes shooting as one of its disciplines) were de-emojified, and will be encoded in Unicode 9.0 as plain non-emoji symbols. Of course, people can still use U+1F946 🥆 RIFLE (or various combinations of the letters A-Z, and many other Unicode characters) to threaten other people in text messages, but the threats will not need to be taken seriously because the rifle character will not be displayed in colour (and it is quite likely that major vendors will not support this character at all in their fonts).


More Emoji to Look Forward to ...

Proposals by Jennifer 8. Lee and friends to encode emoji characters representing chopsticks, dumplings, fortune cookies, and Chinese takeout boxes were joyfully accepted by the shadowy Emoji subcommittee at the January UTC meeting, but they were submitted too late for inclusion in Unicode 9.0, so we can look forward to welcoming them into Unicode 10.0 in June 2017.


Alolita Sharma (@alolita) : #UTC146 Peter Edberg accepts #dumpling #chopsticks #fortunecookie #takeoutbox originals from emoji designer YiyingLu (25 January 2016)



It's Not All About Emoji !

Emoji make up 99% of the noise and hype surrounding Unicode 9.0, but they account for only 1% of the new characters.

7,227 of the 7,426 non-emoji characters to be added to Unicode 9.0 are included in ISO/IEC 10646:2014 (4th ed.) Amendment 2, and are highlighted in this document (along with one currency sign, nine CJK unified ideographs, 36 emoji characters, and 5 emoji modifier characters which were fast-tracked into Unicode 8.0). These characters have all been through at least two rounds of ISO technical ballots, and they are now stable (they cannot be moved, removed, or renamed). The remaining 199 characters are included in the Committee Draft for ISO/IEC 10646:2016 (5th ed.) (full draft is downloadable as N4446). This edition has not yet completed its two rounds of technical ballots by ISO national bodies, but the UTC has decided to fast-track the Adlam script, the Newa script, and Japanese TV symbols (in addition to the 74 emoji discussed above) into Unicode 9.0. It is not unusual for the UTC to fast-track urgently required characters (such as currency symbols and emoji) into a version of Unicode before they have completed their final technical ballot, but it is unprecedented to fast-track complete scripts, especially when the first technical ballot has not yet been completed.

Newa in particular has been a very difficult script to get encoded because of technical and political differences of opinion about what characters to include and the encoding model to use (see the long list of documents relating to Newa in the table below). As recently as the first ballot on the Committee Draft for ISO/IEC 10646 in August 2015, the UK national body expressed concerns over the encoding of murmured resonants as atomic characters (L2/15-262 p. 16), so the encoding of Newa cannot be considered uncontroversial. By fast-tracking Adlam and Newa into Unicode 9.0, the UTC has effectively stifled any ISO national body opposition to the Newa repertoire that the UTC has agreed upon. The CD ballot for ISO/IEC 10646 closes 29 February 2016, which theoretically allows the UTC time to tweak (or even withdraw) any of the fast-tracked characters in response to ballot comments by ISO national bodies, but any requests to change the character repertoire, character positions or character names for Newa or Adlam in the final ISO technical ballot (DIS ballot) later this year will have to be rejected, as the encoding of Newa and Adlam is already a fait accompli.

Fast-tracked characters from the ISO/IEC 10646 CD are marked ** in the tables below.


7,297 of the 7,500 new characters in Unicode 9.0 belong to six new scripts :


Inscription in the Marchen script on the library of the Yungdrung Bon Monastery in Dolanji (Himachal Pradesh)

Photograph © Chris Hatchell


Of the 7,500 characters added to Unicode 9.0 (including the 74 emoji), 7,357 characters are included in 11 new blocks, and 143 characters are added to existing blocks, as detailed in the two tables below. The code points and character names for all these characters are now fixed, and will not be changed. Draft official Unicode data files are available here, and I have made a plain text list of all the new characters to be added to Unicode 9.0 available here.
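As a cross-check, the per-block counts given in the two tables below add up to the totals quoted here. This is a simple tally with the counts copied from the tables; grouping the Tangut iteration mark (16FE0, in Ideographic Symbols and Punctuation) with the Tangut script is an assumption about how the 7,297 figure for the six new scripts was reached.

```python
# Characters added per block, copied from the two tables below.
NEW_BLOCKS = {
    "Cyrillic Extended-C": 9,
    "Osage": 72,
    "Newa": 92,
    "Mongolian Supplement": 13,
    "Bhaiksuki": 97,
    "Marchen": 68,
    "Ideographic Symbols and Punctuation": 1,
    "Tangut": 6125,
    "Tangut Components": 755,
    "Glagolitic Supplement": 38,
    "Adlam": 87,
}
EXISTING_BLOCKS = {
    "Arabic Extended-A": 23,
    "Kannada": 1,
    "Malayalam": 14,
    "Combining Diacritical Marks Supplement": 1,
    "Miscellaneous Technical": 4,
    "Supplemental Punctuation": 2,
    "Latin Extended-D": 1,
    "Saurashtra": 1,
    "Ancient Greek Numbers": 2,
    "Khojki": 1,
    "Enclosed Alphanumeric Supplement": 18,
    "Enclosed Ideographic Supplement": 1,
    "Miscellaneous Symbols and Pictographs": 2,
    "Transport and Map Symbols": 5,
    "Supplemental Symbols and Pictographs": 67,
}

# Blocks holding the six new scripts (assumption: the iteration mark counts as Tangut).
NEW_SCRIPT_BLOCKS = [
    "Osage", "Newa", "Bhaiksuki", "Marchen",
    "Ideographic Symbols and Punctuation", "Tangut", "Tangut Components", "Adlam",
]

new_total = sum(NEW_BLOCKS.values())
existing_total = sum(EXISTING_BLOCKS.values())
script_total = sum(NEW_BLOCKS[name] for name in NEW_SCRIPT_BLOCKS)

print(new_total, existing_total, new_total + existing_total, script_total)
# → 7357 143 7500 7297
```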


Characters Added to New Blocks
Block Name Range Characters / Source Documents
Cyrillic Extended-C 1C80..1C8F

9 letters used in early Church Slavonic (1C80..1C88).


Aleksandr Andreev, Yuri Shardt, and Nikita Simmons, "Proposal to Use Standardized Variation Sequences to Encode Church Slavonic Glyph Variants in Unicode" (2014-07-20) [L2/13-153]

Aleksandr Andreev, Yuri Shardt, and Nikita Simmons, "Proposal to Encode Additional Cyrillic Characters used in Early Church Slavonic Printed Books" (2014-08-20) [WG2 N4607 || L2/14-196]

Osage 104B0..104FF

72 letters for Osage: 36 uppercase letters (104B0..104D3) and 36 lowercase letters (104D8..104FB).


Michael Everson, Herman Mongrain Lookout, and Cameron Pratt, "Preliminary proposal to encode the Osage script in the UCS" (2014-02-20) [WG2 N4548 || L2/14-068]

Michael Everson, Herman Mongrain Lookout, and Cameron Pratt, "Proposal to encode Latin characters for Osage in the UCS" (2014-07-30) [WG2 N4587 || L2/14-175]

Michael Everson, Herman Mongrain Lookout, and Cameron Pratt, "Final proposal to encode the Osage script in the UCS" (2014-09-21) [WG2 N4619 || L2/14-214]
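Because the 36 Osage capitals (104B0..104D3) and 36 small letters (104D8..104FB) are encoded in parallel, case conversion reduces to a fixed offset of 0x28. The sketch below computes that offset and cross-checks it against Python's built-in str.lower(), assuming Unicode 9.0 or later data (Python 3.6+); osage_lower is a hypothetical helper, not a library function.

```python
# Osage capitals occupy 104B0..104D3; small letters occupy 104D8..104FB.
OSAGE_CAPITALS = range(0x104B0, 0x104D3 + 1)
CASE_OFFSET = 0x104D8 - 0x104B0  # distance from each capital to its small letter (0x28)

def osage_lower(cp: int) -> int:
    """Map an Osage capital to its small letter by the fixed offset."""
    if cp not in OSAGE_CAPITALS:
        raise ValueError(f"U+{cp:04X} is not an Osage capital")
    return cp + CASE_OFFSET

# Compare the offset rule with the case mapping in Python's Unicode database.
mismatches = [f"U+{cp:04X}" for cp in OSAGE_CAPITALS
              if chr(cp).lower() != chr(osage_lower(cp))]
print(len(OSAGE_CAPITALS), hex(CASE_OFFSET), mismatches)  # → 36 0x28 []
```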

Newa ** 11400..1147F

92 characters for Newa: 53 letters (11400..11434); 13 vowel signs (11435..11441); 7 other signs (11442..11448); an Om character (11449); a Siddhi character (1144A); 5 punctuation marks (1144B..1144F); 10 digits (11450..11459); a placeholder mark (1145B); and an insertion sign (1145D).


Anshuman Pandey, "Preliminary Proposal to Encode the Prachalit Nepal Script in ISO/IEC 10646" (2011-05-03) [WG2 N4038 || L2/11-152]

Anshuman Pandey, "Preliminary Proposal to Encode the Newar Script in ISO/IEC 10646" (2012-02-29) [WG2 N4184 || L2/12-003]

Dev Dass Manandhar, Samir Karmacharya and Bishnu Chitrakar, "Proposal for the Nepālalipi script in the UCS" (2012-02-05) [WG2 N4322 || L2/12-120]

Ken Whistler, "On the encoding of the “Nepaalalipi” / “Newar” script" (2012-05-11) [L2/12-200]

Dev Dass Manandhar, "Response to L2/12-200 “On the encoding of ‘Nepaalalipi’/‘Newar’ script”" (2012-07-21) [L2/12-244]

Dev Dass Manandhar, "Ancillary materials on “breathy consonants” in “Nepaalalipi”" (2012-07-21) [L2/12-245]

Iain Sinclair, "Letter in support of N4184 and encoding the Newar script in ISO/IEC 10646" (2012-10-22) [WG2 N4372 || L2/12-336]

Dev Dass Manandhar, Samir Karmacharya and Bishnu Chitrakar, "Proposal for the Nepaalalipi script in the UCS" (2012-10-29) [L2/12-349]

Pat Hall, "Proposal to Encode Nepal Himalayish Scripts in ISO/IEC 10646" (2012-10-08) [WG2 N4347 || L2/12-365]

Deborah Anderson, "Comparison between Newar and Nepaalalipi proposals (L2/12‐003 and L2/12‐349)" (2012-11-08) [L2/12-390]

Dev Dass Manandhar, Bishnu Chitrakar and Samir Karmacharya, "To Unicode Technical Committee (UTC)" (2013-01-28) [L2/13-029]

Dev Dass Manandhar, Samir Karmacharya and Bishnu Chitrakar, "Proposal to Encode Nepaalalipi Script in ISO/IEC 10646" (2014-04-10) [L2/14-086]

Deborah Anderson, "Comparison between Newar and Nepaalalipi proposals (L2/12‐003 and L2/14‐086)" (2014-09-23) [L2/14-220]

Deborah Anderson, "Recommendations to UTC from Script Meeting in Nepal" (2014-10-06) [L2/14-253]

Anshuman Pandey, "Response to the Recommendation for Nepalese Scripts in L2/14-253" (2014-10-21) [WG2 N4602 || L2/14-258]

Ken Whistler, "Rationale for Atomic Encoding of Murmured Resonants in Newa" (2014-10-27) [L2/14-281]

Anshuman Pandey, "Specimen Showing Representation of Murmured Consonants in the Newar Script" (2014-10-28) [WG2 N4552 || L2/14-290]

Ken Whistler, "Towards a Consensus Encoding of Newa" (2014-12-04) [WG2 N4660 || L2/14-285]
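The ten Newa digits (11450..11459) are ordinary decimal digits, so software that understands Unicode decimal digits can parse numbers written with them. A small sketch in Python, whose int() accepts any Unicode decimal digit; to_newa_digits is a hypothetical helper, and Unicode 9.0+ data (Python 3.6 or later) is assumed.

```python
import unicodedata

NEWA_DIGIT_ZERO = 0x11450  # NEWA DIGIT ZERO; the digits run 11450..11459

def to_newa_digits(n: int) -> str:
    """Write a non-negative integer with Newa digits (no signs or grouping)."""
    if n < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    return "".join(chr(NEWA_DIGIT_ZERO + int(d)) for d in str(n))

s = to_newa_digits(2016)
print(int(s))                                # int() parses any decimal digits → 2016
print([unicodedata.decimal(c) for c in s])   # → [2, 0, 1, 6]
```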

Mongolian Supplement 11660..1167F

13 head marks for Mongolian (11660..1166C).


Aaron Bell, Greg Eck, Andrew Glass, and Andrew West, "Encoding Mongolian head letters" (2014-01-17) [L2/14-030]

Aaron Bell, Greg Eck, Andrew Glass, and Andrew West, "Proposal to encode five Mongolian head marks" (2014-02-06) [WG2 N4542 || L2/14-067]

China, "Comments on N4542 Five Mongolian Head Marks" (2014-02-19) [WG2 N4547 || L2/14-081]

China, "A Letter to the Authors of N4542 (5 Birgas in Mongolian Block)" (2014-09-23) [WG2 N4632 || L2/14-240]

Bhaiksuki 11C00..11C6F

97 characters for Bhaiksuki: 46 letters (11C00..11C08, 11C0A..11C2E); 12 vowel signs (11C2F..11C36, 11C38..11C3B); 5 other signs (11C3C..11C40); 2 dandas (11C41..11C42); a word separator (11C43); 2 gap fillers (11C44..11C45); 10 decimal digits (11C50..11C59); 18 numbers (11C5A..11C6B); and a hundreds unit mark (11C6C).


Anshuman Pandey and Dragomir Dimitrov, "Proposal to Encode the Bhaiksuki Script in ISO/IEC 10646" (2013-07-22) [WG2 N4469 || L2/13-167]

Anshuman Pandey and Dragomir Dimitrov, "Revised Proposal to Encode the Bhaiksuki Script in ISO/IEC 10646" (2013-10-27) [WG2 N4489 || L2/13-194]

Anshuman Pandey and Dragomir Dimitrov, "Revised Proposal to Encode the Bhaiksuki Script in ISO/IEC 10646" (2014-01-27) [L2/14-036]

Anshuman Pandey and Dragomir Dimitrov, "Final Proposal to Encode the Bhaiksuki Script in ISO/IEC 10646" (2014-04-23) [WG2 N4573 || L2/14-091]
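The Bhaiksuki repertoire is split across several non-contiguous ranges (note the gaps at 11C09 and 11C37), so it is easy to miscount. A sketch that expands the ranges given in the description above, confirms the total of 97, and then checks the general categories in the local Unicode database; the range table is copied from the description, and Unicode 9.0+ data is assumed.

```python
import unicodedata

# Inclusive (first, last) ranges for each class, copied from the Bhaiksuki description.
BHAIKSUKI = {
    "letters": [(0x11C00, 0x11C08), (0x11C0A, 0x11C2E)],
    "vowel signs": [(0x11C2F, 0x11C36), (0x11C38, 0x11C3B)],
    "other signs": [(0x11C3C, 0x11C40)],
    "dandas": [(0x11C41, 0x11C42)],
    "word separator": [(0x11C43, 0x11C43)],
    "gap fillers": [(0x11C44, 0x11C45)],
    "decimal digits": [(0x11C50, 0x11C59)],
    "numbers": [(0x11C5A, 0x11C6B)],
    "hundreds unit mark": [(0x11C6C, 0x11C6C)],
}

def expand(ranges):
    """Yield every code point in a list of inclusive (first, last) ranges."""
    for first, last in ranges:
        yield from range(first, last + 1)

counts = {cls: len(list(expand(r))) for cls, r in BHAIKSUKI.items()}
total = sum(counts.values())
print(total)  # → 97

# Every listed code point should be assigned (category other than "Cn").
unassigned = [cp for r in BHAIKSUKI.values() for cp in expand(r)
              if unicodedata.category(chr(cp)) == "Cn"]
print(unassigned)  # → []

# Digits are decimal digits (Nd); the 18 numbers are "other numbers" (No).
print({unicodedata.category(chr(cp)) for cp in expand(BHAIKSUKI["decimal digits"])})
print({unicodedata.category(chr(cp)) for cp in expand(BHAIKSUKI["numbers"])})
```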

Marchen 11C70..11CBF

68 characters for Marchen: 2 marks (11C70..11C71); 30 letters (11C72..11C8F); 29 subjoined letters (11C92..11CA7, 11CA9..11CAF); 5 vowel signs (11CB0..11CB4); and 2 other signs (11CB5..11CB6).


Andrew West, "Proposal to encode the Marchen script in the SMP of the UCS" (2011-04-30) [WG2 N4032 || L2/11-140]

Andrew West, "Final proposal to encode the Marchen script in the SMP of the UCS" (2013-10-22) [WG2 N4491 || L2/13-197]

Ideographic Symbols and Punctuation 16FE0..16FFF

1 iteration mark for Tangut (16FE0).


See under Tangut.

Tangut 17000..187FF

6,125 Tangut ideographs (17000..187EC) [characters are named algorithmically based on their code point, as TANGUT IDEOGRAPH-hhhhh].


Richard Cook (UC Berkeley Script Encoding Initiative), "Proposal to encode Tangut characters in UCS Plane 1" (2007-05-09) [WG2 N3297 || L2/07-143] [Multi Column Chart : WG2 N3297A || L2/07-144] [Single Column Chart: WG2 N3297B || L2/07-145]

Richard Cook (UC Berkeley Script Encoding Initiative), "Tangut Proposal Code Chart Update" (2007-07-24) [L2/07-229]

Richard Cook (UC Berkeley Script Encoding Initiative), "Tangut Background" (2007-09-01) [WG2 N3307 || L2/07-289]

China, "Response to UC Berkeley’s proposals on Tangut" (2007-09-16) [WG2 N3338 || L2/07-301]

Richard Cook, "Expert feedback on Chinese NB input on WG2/N3297 Tangut Encoding Proposal" (2007-09-17) [WG2 N3343 || L2/07-302]

UK, "Comments on N3297: Proposal to encode Tangut characters in UCS Plane 1 and Charts" (2008-04-19) [WG2 N3448 || L2/08-175]

China and US, "Comments on N3297: Proposal to encode Tangut characters in UCS Plane 1 and charts" (2008-04-22) [WG2 N3467 || L2/08-187]

Richard Cook, [Five column chart] (2008-07-15 / 2010-04-16) [WG2 N3822 || L2/08-259]

Richard Cook, "Single-Column Tangut Code Chart (using Column G font)" (2008-09-03) [L2/08-336]

UK, "Review of Proposed Tangut Repertoire" (2008-09-01, rev. 2008-09-07) [WG2 N3496 || L2/08-337]

Michael Everson and Andrew West, "Expert Feedback on the proposed Tangut character set in PDAM 6.2" (2008-09-24) [WG2 N3498 || L2/08-341]

Richard Cook and Ken Lunde, "The UCS Tangut Repertory" (2008-10-10) [WG2 N3521 || L2/08-349]

China, "Response from Tangut scholars of China on the Tangut Unicode proposal" (2008-10-13) [WG2 N3539 || L2/08-376]

Erkki I. Kolehmainen, "Report from the Ad Hoc on Tangut" (2008-10-13) [WG2 N3541 || L2/08-377]

Michael Everson, Nathan Hill, Guillaume Jacques, Andrew West, Viacheslav Zaytsev, "Proposal for a revised Tangut character set for encoding in the SMP of the UCS" (2009-03-01) [WG2 N3577 || L2/09-095]

Michael Everson, Nathan Hill, Guillaume Jacques, Andrew West, Viacheslav Zaytsev, "Proposal for a revised Tangut character set for encoding in the SMP of the UCS" (2009-04-08) [WG2 N3577R || L2/09-115] [Appendix A: WG2 N3577R-A || L2/09-116] [Appendix B: WG2 N3577R-B || L2/09-117]

Deborah Anderson and Richard Cook, "Request for Tangut font and mappings from N3577 to Amendment 7 repertoire" (2009-03-04) [WG2 N3586]

Peter Constable, "Tangut Ad-Hoc Meeting Report" (2009-04-20) [WG2 N3629 || L2/09-169]

China, Ireland, UK, "Final proposal for encoding the Tangut script in the SMP of the UCS" (2010-04-05) [WG2 N3797 || L2/10-095] [Appendix A: WG2 N3797-A] [Appendix B: WG2 N3797-B]

Deborah Anderson and Richard Cook, "Comments on Tangut proposal N3797" (2010-04-16) [WG2 N3821 || L2/10-131]

Deborah Anderson, "Tangut Ad hoc report" (2010-04-21) [WG2 N3833 || L2/10-141]

UK, "Report on Tangut Encoding" (2011-05-22) [WG2 N4033 || L2/11-214] [Appendix A: WG2 N4033A || L2/11-214] [Appendix B: WG2 N4033B || L2/11-214]

Michael Everson and Andrew West, "Tangut chart to supplement N4033 'Report on Tangut Encoding'" (2011-05-26) [WG2 N4083 || L2/11-204]

Richard Cook and Deborah Anderson, Script Encoding Initiative, UC Berkeley, "Comments on Tangut report N4033" (2011-06-01) [WG2 N4094]

Andrew West, Viacheslav Zaytsev, Michael Everson, "Proposal to encode the Tangut script in the UCS" (2012-10-02) [WG2 N4325 || L2/12-313]

Michael Everson and Andrew West, "Code chart for Tangut ideographs and Tangut radicals" (2012-10-02) [WG2 N4327 || L2/12-315]

China, "Comments on N4325, 4326 and N4327 (Tangut)" (2012-10-20) [WG2 N4370]

China, "Explanation on the Re-facture of Tangut Fonts" (2013-06-10) [WG2 N4455]

Deborah Anderson, SEI, UC Berkeley, "Summary of Tangut meeting (Beijing, China)" (2013-12-10) [WG2 N4516 || L2/13-241]

Andrew West, Michael Everson, Han Xiaomang, Jia Changye, Jing Yongshi, Viacheslav Zaytsev, "Proposal to encode the Tangut script in the UCS" (2014-01-21) [WG2 N4522 || L2/14-023]

Andrew West, Michael Everson, Han Xiaomang, Jia Changye, Jing Yongshi, Viacheslav Zaytsev, "Code chart for the Tangut script" (2014-01-21) [WG2 N4525 || L2/14-021]

Andrew West, Viacheslav Zaytsev, Sun Bojun, Michael Everson, "Tangut glyph corrections" (2014-10-01) [WG2 N4588R2 || L2/14-209]

China, "Review of N4558R Tangut glyph corrections" (2014-09-29) [WG2 N4640]

Deborah Anderson, "Ad Hoc Reports for Tangut and Khitan Large Script" (2014-09-29) [WG2 N4642 || L2/14-246]

Andrew West, Viacheslav Zaytsev, Michael Everson, "Discussion of Tangut character L2008-4148" (2014-12-01) [WG2 N4650 || L2/14-301]

Andrew West, Michael Everson, Viacheslav Zaytsev, "Review of Tangut repertoire in DAM ballot" (2015-07-16) [WG2 N4667 || L2/15-175]

China, "Reply to WG2N4650 and WG2N4667 on Tangut" (2015-10-13) [WG2 N4684 || L2/15-279]
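As noted in the Tangut entry above, Tangut ideographs are not given individual names; each name is derived from its code point as TANGUT IDEOGRAPH-hhhhh. A sketch of that derivation, with the range limited to the Unicode 9.0 allocation of 17000..187EC given above; tangut_ideograph_name is a hypothetical helper.

```python
TANGUT_FIRST, TANGUT_LAST = 0x17000, 0x187EC  # Unicode 9.0 Tangut ideograph range

def tangut_ideograph_name(cp: int) -> str:
    """Derive the algorithmic character name for a Tangut ideograph code point."""
    if not TANGUT_FIRST <= cp <= TANGUT_LAST:
        raise ValueError(f"U+{cp:04X} is outside the Unicode 9.0 Tangut ideograph range")
    # Five uppercase hex digits, matching the hhhhh in the naming rule.
    return f"TANGUT IDEOGRAPH-{cp:05X}"

print(tangut_ideograph_name(0x17000))    # → TANGUT IDEOGRAPH-17000
print(TANGUT_LAST - TANGUT_FIRST + 1)    # → 6125
```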

Tangut Components 18800..18AFF

755 Tangut radicals and character components (18800..18AF2).


Michael Everson and Andrew West, "Proposal to encode Tangut Radicals and CJK Strokes in the UCS" (2008-09-01) [WG2 N3495 || L2/08-335]

Richard Cook and Deborah Anderson, "Comments on the Tangut radicals and strokes proposal (N3495 = L2/08‐335)" (2008-10-29) [L2/08-399]

Andrew West, Viacheslav Zaytsev, Michael Everson, "Proposal to encode Tangut radicals in the UCS" (2012-10-02) [WG2 N4326 || L2/12-314]

Michael Everson and Andrew West, "Code chart for Tangut ideographs and Tangut radicals" (2012-10-02) [WG2 N4327 || L2/12-315]

Andrew West, Viacheslav Zaytsev, Sun Bojun, Michael Everson, "Proposal to encode Tangut radicals in the UCS" (2014-09-30) [WG2 N4636 || L2/14-228]

Glagolitic Supplement 1E000..1E02F

38 combining letters for Glagolitic (1E000..1E006, 1E008..1E018, 1E01B..1E021, 1E023..1E024, 1E026..1E02A).


Aleksandr Andreev, Heinz Miklas, and Yuri Shardt, "Proposal to Encode Combining Glagolitic Letters in Unicode" (2014-08-20) [WG2 N4608 || L2/14-087]

Ralph Cleminson and David Birnbaum, "Expert Feedback on L2/14-087 Proposal to Encode Additional Glagolitic Characters" (2014-04-27) [WG2 N4608 || L2/14-103]

Ralph Cleminson and David Birnbaum, "Additional Expert Feedback on L2/14‐087 Proposal to Encode Additional Glagolitic Characters" (2014-07-21) [WG2 N4608 || L2/14-165]
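The 38 combining Glagolitic letters likewise occupy five ranges with gaps between them. A sketch that expands the ranges listed in the Glagolitic Supplement entry above and, assuming Unicode 9.0+ data, checks that every one is a non-spacing mark (General_Category Mn), as expected for combining letters.

```python
import unicodedata

# Inclusive ranges copied from the Glagolitic Supplement entry.
GLAGOLITIC_COMBINING = [
    (0x1E000, 0x1E006), (0x1E008, 0x1E018), (0x1E01B, 0x1E021),
    (0x1E023, 0x1E024), (0x1E026, 0x1E02A),
]

code_points = [cp for first, last in GLAGOLITIC_COMBINING
               for cp in range(first, last + 1)]
print(len(code_points))  # → 38

# Combining letters should all be non-spacing marks in the Unicode database.
print({unicodedata.category(chr(cp)) for cp in code_points})  # → {'Mn'}
```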

Adlam ** 1E900..1E95F

87 characters for Adlam: 34 uppercase letters (1E900..1E921); 34 lowercase letters (1E922..1E943); 7 marks (1E944..1E94A); 10 digits (1E950..1E959); and 2 punctuation marks (1E95E..1E95F).


Michael Everson, "Preliminary proposal for encoding the Adlam script in the SMP of the UCS" (2012-10-28) [WG2 N4488 || L2/13-191]

Michael Everson, "Proposal for encoding the Adlam script in the SMP of the UCS" (2014-09-23) [WG2 N4628 || L2/14-219]


Leaf from a Tangut Buddhist manuscript (Great Perfection of Wisdom Sutra)

Tang. 334/249


Characters Added to Existing Blocks
Block Name Range Characters / Source Documents
Arabic Extended-A 08A0..08FF

5 Arabic letters for Bravanese (08B6..08BA).


Hamid Banafunzi, Marghani Banafunzi, and Maxamed Nuur, "Proposal to encode five Arabic script characters for the Bravanese (Chimiini)" (2014-08-31) [L2/13-178]

Roozbeh Pournader, "Proposal to encode four Arabic characters for Bravanese" (2014-11-06) [WG2 N4498 || L2/13-223]

Roozbeh Pournader and Shervin Afshar, "Proposal to Encode Arabic Letter Teh with Small Teh Above for Bravanese" (2014-11-01) [L2/13-293]


3 Arabic letters for Warsh-based orthographies (08BB..08BD).


Lorna Evans (SIL International), "Supporting the Warsh orthography for Arabic script" (2014-04-29) [L2/14-104]

Lorna Evans (SIL International), "Proposal to encode Warsh‐based Arabic script characters" (2014-08-15) [WG2 N4597 || L2/14-211]


15 Quranic marks used in Pakistani printing (08D4..08E2).


Lateef Sagar Shaikh, "Proposal to encode Quranic marks used in Quran published in Pakistan" (2014-04-24) [L2/14-095]

Lateef Sagar Shaikh, "Proposal to encode Quranic Alternate Dammatan used in Quran published in Pakistan" (2014-04-25) [L2/14-096]

Roozbeh Pournader, "Proposal to encode fourteen Pakistani Quranic marks" (2014-07-27) [WG2 N4589 || L2/14-105]

Lateef Sagar Shaikh, "Proposal to encode Quranic mark Ar-Rub used in Quran published in Pakistan" (2014-08-11) [WG2 N4592 || L2/14-148]

Kannada 0C80..0CFF

1 spacing candrabindu sign (0C80).


Vinodh Rajan, "Proposal to encode Kannada Sign Spacing Candrabindu" (2014-07-18) [WG2 N4591 || L2/14-153]

Malayalam 0D00..0D7F

3 chillu letters (0D54..0D56).


Cibu Johny, "Proposal to encode MALAYALAM LETTER CHILLU LLL" (2013-05-15) [WG2 N4428 || L2/13-063]

Cibu Johny, "Proposal to encode MALAYALAM LETTER CHILLU M" (2014-01-08) [WG2 N4539 || L2/14-013]

Cibu Johny, "Proposal to encode MALAYALAM LETTER CHILLU Y" (2013-12-26) [WG2 N4539 || L2/14-017]


1 para sign (0D4F).


Cibu Johny, "Proposal to encode MALAYALAM SIGN PARA" (2014-01-16) [WG2 N4538 || L2/14-016]


10 characters for fractions (0D58..0D5E, 0D76..0D78).


Shriramana Sharma, "Proposal to encode Malayalam minor fractions" (2013-04-25) [WG2 N4429 || L2/13-051]

Combining Diacritical Marks Supplement ** 1DC0..1DFF

1 combining deletion mark for Newa (1DFB).


Ken Whistler, "Towards a Consensus Encoding of Newa" (2014-11-07) [WG2 N4660 || L2/14-285]

Miscellaneous Technical 2300..23FF

4 power button symbols (23FB..23FE).


Terence Eden, Joe Loughry, and Bruce Nordman, "Proposal to Include IEC Power Symbols" (2014-02-14) [WG2 N4567 || L2/14-009]

Michael Everson, "Towards a proposal to encode power symbols in the UCS" (2014-02-04) [WG2 N4535 || L2/14-059]

Supplemental Punctuation 2E00..2E7F

1 punctuation mark for Slavonic (2E43: DASH WITH LEFT UPTURN).


Aleksandr Andreev, Yuri Shardt, and Nikita Simmons, "Proposal to Encode a Slavonic Punctuation Mark in Unicode" (2014-02-04) [WG2 N4534 || L2/13-238]


1 suspension mark for Byzantine Greek (2E44: DOUBLE SUSPENSION MARK).


Dumbarton Oaks (Joel Kalvesmaki), "Proposal to encode GREEK BYZANTINE DOUBLE SUSPENSION MARK" (2014-07-18) [WG2 N4595 || L2/14-157]

Latin Extended-D A720..A7FF

1 letter for Unifon (A7AE: LATIN CAPITAL LETTER SMALL CAPITAL I).


Michael Everson, "Proposal to encode “Unifon” and other characters in the UCS" (2012-04-29) [WG2 N4262 || L2/12-138]

Michael Everson, "Revised proposal to encode Unifon characters in the UCS" (2014-02-24) [WG2 N4549 || L2/14-070]

Saurashtra A880..A8DF

1 candrabindu sign (A8C5).


Vinodh Rajan, "Proposal to encode Saurashtra Sign Candrabindu" (2014-08-07) [WG2 N4590 || L2/14-163]

Ancient Greek Numbers 10140..1018F

2 signs for ancient Greek (1018D..1018E).


Dumbarton Oaks (Joel Kalvesmaki), "Proposal to encode GREEK BYZANTINE INDICTION SIGN" (2014-07-18) [WG2 N4596 || L2/14-156]

Dumbarton Oaks (Joel Kalvesmaki), "Proposal to encode GREEK BYZANTINE NOMISMA SIGN" (2014-07-18) [WG2 N4594 || L2/14-158]

Khojki 11200..1124F

1 sukun sign for Arabic transliteration in the Khojki script (1123E).


Anshuman Pandey, "Proposal to Encode the Khojki Sign SUKUN in ISO/IEC 10646" (2014-05-05) [WG2 N4575 || L2/14-133]

Enclosed Alphanumeric Supplement ** 1F100..1F1FF

18 Japanese TV symbols required for ARIB STD-B62 (1F19B..1F1AC).


Japan National Body, "Proposal to include additional Japanese TV symbols to ISO/IEC 10646" (2015-07-23) [WG2 N4671 || L2/15-238]

Enclosed Ideographic Supplement ** 1F200..1F2FF

1 Japanese TV symbol required for ARIB STD-B62 (1F23B).


Japan National Body, "Proposal to include additional Japanese TV symbols to ISO/IEC 10646" (2015-07-23) [WG2 N4671 || L2/15-238]

Miscellaneous Symbols and Pictographs ** 1F300..1F5FF

2 emoji (see top of post):

1F57A : MAN DANCING

1F5A4 : BLACK HEART

Transport and Map Symbols ** 1F680..1F6FF

5 emoji (see top of post):

1F6D1 : OCTAGONAL SIGN

1F6D2 : SHOPPING TROLLEY

1F6F4 : SCOOTER

1F6F5 : MOTOR SCOOTER

1F6F6 : CANOE

Supplemental Symbols and Pictographs ** 1F900..1F9FF

67 emoji and emoticons (see top of post):

1F919 : CALL ME HAND

1F91A : RAISED BACK OF HAND

1F91B : LEFT-FACING FIST

1F91C : RIGHT-FACING FIST

1F91D : HANDSHAKE

1F91E : HAND WITH INDEX AND MIDDLE FINGERS CROSSED

1F920 : FACE WITH COWBOY HAT

1F921 : CLOWN FACE

1F922 : NAUSEATED FACE

1F923 : ROLLING ON THE FLOOR LAUGHING

1F924 : DROOLING FACE

1F925 : LYING FACE

1F926 : FACE PALM

1F927 : SNEEZING FACE

1F930 : PREGNANT WOMAN

1F933 : SELFIE

1F934 : PRINCE

1F935 : MAN IN TUXEDO

1F936 : MOTHER CHRISTMAS

1F937 : SHRUG

1F938 : PERSON DOING CARTWHEEL

1F939 : JUGGLING

1F93A : FENCER

1F93B : MODERN PENTATHLON

1F93C : WRESTLERS

1F93D : WATER POLO

1F93E : HANDBALL

1F940 : WILTED FLOWER

1F941 : DRUM WITH DRUMSTICKS

1F942 : CLINKING GLASSES

1F943 : TUMBLER GLASS

1F944 : SPOON

1F945 : GOAL NET

1F946 : RIFLE

1F947 : FIRST PLACE MEDAL

1F948 : SECOND PLACE MEDAL

1F949 : THIRD PLACE MEDAL

1F94A : BOXING GLOVE

1F94B : MARTIAL ARTS UNIFORM

1F950 : CROISSANT

1F951 : AVOCADO

1F952 : CUCUMBER

1F953 : BACON

1F954 : POTATO

1F955 : CARROT

1F956 : BAGUETTE BREAD

1F957 : GREEN SALAD

1F958 : SHALLOW PAN OF FOOD

1F959 : STUFFED FLATBREAD

1F95A : EGG

1F95B : GLASS OF MILK

1F95C : PEANUTS

1F95D : KIWIFRUIT

1F95E : PANCAKES

1F985 : EAGLE

1F986 : DUCK

1F987 : BAT

1F988 : SHARK

1F989 : OWL

1F98A : FOX FACE

1F98B : BUTTERFLY

1F98C : DEER

1F98D : GORILLA

1F98E : LIZARD

1F98F : RHINOCEROS

1F990 : SHRIMP

1F991 : SQUID


The author in front of a Tangut Buddhist inscription on the Cloud Platform at Juyong Pass

Photograph by Michael Everson (CC BY-SA 3.0)



Previous Posts on Unicode Versions



Last modified: 2016-05-24