  HTML Characters
 P van Diemen

Characters in HTML

The standard character set for HTML is ASCII, an old and very common standard (see ASCII table for more details).  It is a 7-bit code (0··127) but it includes so-called control codes which do not represent a graphical symbol (i.e. 'not printable').  This leaves 95 codes representing the digits (0··9), upper- and lowercase letters (A··Z, a··z) and some symbols like comma, point, parentheses, equal sign, etc. 


Other Characters

Obviously, the ASCII set is just basic;  maybe sufficient for English but lacking for other Western European languages, let alone Eastern European languages (e.g. Cyrillic) or Asiatic languages (e.g. Arabic, Indian, Chinese, …).
Apart from that, some ASCII symbols (e.g. '<') have special meanings in HTML itself (i.e. part of the HTML-syntax) and can't be used directly for normal text.  So we need a solution for that as well.

On nearly all computers, characters are stored in an 8-bit byte, and the ASCII-set uses only half of it and leaves another 128 codes (128··255) unused (you need to keep the ASCII set as HTML uses it).  So many systems use that additional code space and create their own character set extension to ASCII, providing for example â, é, ü and ñ.  In practice, accented letters can be used directly in HTML texts provided that the browser and computer are set appropriately.  For this to work:

  1. The web page must indicate what character set (extension) is used in that page.  This is done through a <meta charset=…> statement in the web page header.
  2. The browser and local computer must be capable of handling that charset:  it must use a font that has the appropriate symbols.

We will elaborate on these settings below (in Representation of Glyphs).  Well known extensions are the various (Windows) Code Pages (e.g. CP1252), and the various international standards (e.g. ISO-8859-1 = Latin1). 

Though the use of such character set extensions does solve the common problems for a particular language, it proves to be insufficient in general.  And the problem with the HTML syntax symbols hasn't been solved.
This leads to Unicode (a huge character set), and mechanisms to use it.


Unicode

Unicode, or Universal Character Set (UCS), is a standard for the definition of all kinds of characters (multi-lingual symbols, 'glyphs', or 'graphemes' as Unicode calls them) like Greek, Cyrillic and even Chinese characters and Egyptian hieroglyphics, i.e. other letters and symbols than available in ASCII. 
Unicode was originally designed as a 31-bit code, providing space for billions of characters (the current standard limits the range to x10.FF.FF, see UTF-8 below).  However, these exotic glyphs cannot be presented without extra effort.  There are 2 main ways to code Unicode characters:

  1. The NCR method (next section), or
  2. Use of a multi-byte code like UTF-8 (the most common encoding, but there are others).

Unicode/UTF8 is the standard for HTML now;  Unicode is equivalent to ISO 10646.  Commonly, both methods (UTF-8 and NCR) are used.  For the most popular Unicode characters, see Diacritical characters and further.


Numerical Character Reference, NCR

For any symbol (including the standard ASCII-code), one may use:

          &«value»;     Note the leading & and closing ;
   where «value» is either:
          #decimal_number,
          #xhexadecimal_number, or
          character_name.
This method is known as Numerical Character Reference (NCR), though the last variant is clearly not numerical.  Obviously, the variant with character_name is more suited for human use, but all methods are applicable.
The numerical value corresponds to the Unicode value for that character;  the character_name is the entity name that HTML defines for that character (names exist only for a limited set of characters).  And it works independent of any charset setting of the webpage!

The above coding method is in fact the only way to correctly denote some HTML reserved characters ('<', '>', and as & is used in this coding, '&' as well) in the ASCII-set as these characters are also used in the HTML-syntax, and the advised way to denote extensions to the ASCII set (e.g. Latin 1). 
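
As an illustration of the three NCR variants, here is a minimal Python sketch.  It assumes only the standard library module html;  the variable names are chosen for this example.  It shows that the decimal, hexadecimal and named form of á all decode to the same character, and that the reserved characters can be written as references.

```python
import html

# Three spellings of the same character: decimal, hexadecimal and named.
references = ["&#225;", "&#xE1;", "&aacute;"]
decoded = [html.unescape(ref) for ref in references]
print(decoded)

# The HTML reserved characters written as named references.
reserved = html.unescape("&lt; &gt; &amp;")
print(reserved)
```

A browser performs the same kind of substitution when it reads the page, which is why the result does not depend on the charset of the page.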


UTF-8

UTF-8 (UCS Transformation Format) provides the coding to extend the ASCII character set with all Unicode's characters:  it encodes Unicode graphemes in 1··6 bytes, while an ASCII character still takes a single byte (in fact, only ASCII takes a single byte – all other codes take at least 2 bytes).
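There are in fact 2 standards for UTF-8:  the older RFC2279 (described below), and the newer RFC3629 which limits the Unicode range a bit (code values up to 1,114,111 = x10.FF.FF) but uses the same encoding scheme, resulting in at most 4 UTF-8 code bytes.  Some applications –specifically in this context MySQL & JavaScript– handle only the first 65,536 code values (0··65,535) directly (2 Unicode bytes corresponding to at most 3 UTF-8 code bytes):  MySQL's original utf8 character set stores at most 3 bytes per character, and JavaScript strings consist of 16-bit units so that higher code values take 2 units.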
HTML5 assumes UTF-8 encoding.

Coding scheme:  Unicode bit pattern, from left (most significant) to right (least significant):
      _fff.ffee eeed.dddd cccc.cbbb baaa.aaaa

Unicode range                Byte 1     Byte 2     Byte 3     Byte 4     Byte 5     Byte 6     Description
x0 - x7F                     0aaa.aaaa                                                         ASCII (7-bit code)
x80 - x7.FF                  110b.bbba  10aa.aaaa                                              Latin diacritical characters (includes ISO 8859), Greek, Cyrillic, Hebrew, Arabic, …
x8.00 - xFF.FF               1110.cccc  10cb.bbba  10aa.aaaa                                   typically East Asian languages (Chinese, …)
x1.00.00 - x1F.FF.FF         1111.0ddd  10dd.cccc  10cb.bbba  10aa.aaaa
x20.00.00 - x3.FF.FF.FF      1111.10ee  10ee.eddd  10dd.cccc  10cb.bbba  10aa.aaaa
x4.00.00.00 - x7F.FF.FF.FF   1111.110f  10ff.ffee  10ee.eddd  10dd.cccc  10cb.bbba  10aa.aaaa
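
The coding scheme in the table can be sketched in a few lines of Python.  This is only an illustration of the 1··6 byte scheme (RFC2279 style), not a replacement for a real encoder;  the function name utf8_encode is made up for this example.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode value with the 1..6 byte scheme from the table."""
    if code_point < 0 or code_point > 0x7FFFFFFF:
        raise ValueError("code point outside the 31-bit range")
    if code_point < 0x80:
        return bytes([code_point])          # ASCII: a single byte 0aaa.aaaa
    # (number of bytes, first value that no longer fits, marker of byte 1)
    for n_bytes, limit, marker in (
        (2, 0x800, 0xC0),
        (3, 0x10000, 0xE0),
        (4, 0x200000, 0xF0),
        (5, 0x4000000, 0xF8),
        (6, 0x80000000, 0xFC),
    ):
        if code_point < limit:
            break
    tail = []
    for _ in range(n_bytes - 1):
        # Each continuation byte is 10xx.xxxx and carries the lowest 6 bits.
        tail.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    # The remaining high bits go into byte 1, behind its marker bits.
    return bytes([marker | code_point] + tail[::-1])


a_acute = utf8_encode(0xE1)       # á
euro = utf8_encode(0x20AC)        # Euro sign
print(a_acute.hex(), euro.hex())
```

For values up to x10.FF.FF the result should be identical to what Python's own "utf-8" codec produces;  the 5 and 6 byte forms exist only in the older RFC2279 scheme.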

Representation of Glyphs

Representation of non-ASCII characters in a browser on a computer involves multiple steps:  the browser must know how to handle the characters in HTML text, and the computer must know the graphical representation.  If the browser cannot represent the glyph, it will usually display a placeholder symbol instead (such as an empty box or a question mark).

  1. The HTML-page containing non-ASCII characters must indicate which character set it uses in the header section of the webpage through
    <META http-equiv="Content-Type" content="text/html; charset=utf-8"> for UTF-8 (HTML5), or …charset=ISO-8859-1 (for Latin1), or charset=ISO-8859-15 (Latin9 Western European), or whatever is appropriate in your case.  See //www.iana.org/assignments/character-sets/character-sets.xhtml for registered character sets.
    Notes:
    a. This META-statement reflects the non-ASCII codes (e.g. UTF-8 or Latin1) in the raw HTML-page as sent from the HTTP-server to your browser, not the glyphs generated by an HTML NCR-construct (&«value»;).  If you type an á in your HTML-text, you have non-ASCII codes in the (raw) HTML-page;  if you use the NCR &aacute; or &#225; or &#xE1; your page is all ASCII, and the browser makes it appear like an á independent of any charset setting.  A Content Management System may automatically convert all non-ASCII codes to their corresponding Numerical Character References for you (e.g. convert á to &#225;).
      If your text is nearly all in for example Arabic or Cyrillic, you will probably not convert all UTF-8 codes to NCR-codes, but set the charset [1] and browser [3] appropriately.  NCR is for the exceptional glyphs in your text.
    b. If you don't specify a <META … charset=...> statement, the browser has to assume something.  That assumption may depend on the DTD (<!DOCTYPE …>) and/or language settings and/or locales.  For HTML5 all browsers nowadays will assume charset=UTF-8, but for older (HTML4) web pages charset=ISO-8859-1 (or comparable) is more appropriate.  Also, if you enter values into and/or display values from a database, you may have to specify a character set corresponding to the charset of the database.
    c. The HTML <FORM>-element supports an accept-charset=… attribute which allows you to use a different character set for input-fields.
    d. If you are authoring HTML-pages, make sure that the character set specified in the <META …charset=…-statement corresponds to the character set your editor is using, or strictly use ASCII and the NCR-method.
    e. If the page is generated by PHP version 5.6 or later, your <META …-statement will be overruled by a charset="utf-8" in the http-header (not the HTML head section);  the browser may then report:  HTML1114: Codepage utf-8 from (HTTP header) overrides conflicting codepage iso-8859-1 from (META tag).
      So if you think you're using Latin1 because you have a <META …charset=ISO-8859-1"> in that page, you are mistaken!  And if you use such pages to enter/modify values in a MySQL-database set to Latin1 (which is commonly the default), you are entering UTF-codes in the database;  you get a mix of (old) Latin1 codes (which appear corrupted) and (new) UTF8 codes in your database which makes the conversion of the database to UTF a headache (but see [c] above).  Also copying text from a Unicode editor –such as MS Word– into a web input-screen may enter UTF8 codes.
      You can counteract the PHP 5.6 default behaviour by replacing the charset in the HTTP-header by issuing
      header( 'Content-Type: text/html; charset=ISO-8859-1' ); before any HTML is generated.
    f. If you changed to HTML5 (e.g. by using the simple DTD <!DOCTYPE HTML> or implicitly by not using a DTD at all), and did not specify <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> you have implicitly switched to charset=UTF-8!  See the previous note [e] for the consequences.
  2. The HTML-page must, when defining a font, select a font that is likely to support Unicode, or only use 'generic fonts' like 'serif' or 'sans-serif'.  This is implemented through a style(-sheet) definition like
    style="font-family: 'Arial Unicode MS', 'Lucida Sans Unicode', sans-serif" (as used in this page for the glyphs in the tables).
    If the author of the page doesn't do that, the browser uses a default font (see [3] below).  If the author does use style="font-family: …" but makes a poor choice for the font-family, a user can force the browser to use a specific font (supporting Unicode).  But that is cumbersome, and applicable for all webpages (see below).
  3. The browser must be set to represent non-ASCII codes when no character set is defined.
  4. The browser must select a default font (in case the page does not define a font) suitable for Unicode.
  5. Then of course, the computer must have the font to represent the particular glyph.  A potential problem is that the HTML/CSS may specify a font that is available on your computer, but without the Unicode extension.  Also, on old systems (before 2000) Unicode is probably not available.
    See HTML_Fonts for the representation of various HTML fonts on your computer.
    For fonts supporting Unicode glyphs, search the web for Code2000, or see Code2000 on Wikipedia.  See also "Unicode fonts" on Wikipedia.
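
What goes wrong when the declared charset and the actual codes in the page (or database) differ can be shown with a small Python sketch.  It is only an illustration of the mismatch described in the notes above;  the variable names are invented for this example.

```python
text = "á"

utf8_bytes = text.encode("utf-8")          # codes written by a UTF-8 editor or form
latin1_bytes = text.encode("iso-8859-1")   # code written by a Latin1 editor

# UTF-8 codes interpreted as Latin1: two wrong characters appear.
misread_as_latin1 = utf8_bytes.decode("iso-8859-1")

# Latin1 code interpreted as UTF-8: not a valid sequence, so a
# replacement character is substituted.
misread_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(utf8_bytes, latin1_bytes)
print(misread_as_latin1, misread_as_utf8)
```

The same á written as an NCR (&#225;) consists of ASCII codes only and is therefore not affected by either mismatch.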

ASCII

In the table below, you see the American Standard Code for Information Interchange (ASCII) characters.  In the borders you'll find the indication for the hexadecimal code values.

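     _0   _1   _2   _3   _4   _5   _6   _7     _8   _9   _A   _B   _C   _D   _E   _F
0_   Nul  Soh  Stx  Etx  Eot  Enq  Ack  Bel    Bs   Ht   Lf   Vt   Ff   Cr   So   Si    0_
1_   Dle  Dc1  Dc2  Dc3  Dc4  Nak  Syn  Etb    Can  Em   Sub  Esc  Fs   Gs   Rs   Us    1_
2_   Spc  !    "    #    $    %    &    '      (    )    *    +    ,    -    .    /     2_
3_   0    1    2    3    4    5    6    7      8    9    :    ;    <    =    >    ?     3_
4_   @    A    B    C    D    E    F    G      H    I    J    K    L    M    N    O     4_
5_   P    Q    R    S    T    U    V    W      X    Y    Z    [    \    ]    ^    _     5_
6_   `    a    b    c    d    e    f    g      h    i    j    k    l    m    n    o     6_
7_   p    q    r    s    t    u    v    w      x    y    z    {    |    }    ~    Del   7_
     _0   _1   _2   _3   _4   _5   _6   _7     _8   _9   _A   _B   _C   _D   _E   _F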
The first 2 rows contain 'control codes' (see below), the other codes represent graphical characters (except for the last one which is also a control code: Del).

ASCII Control Codes

The table below lists the ASCII control codes with the glyphs for these codes, the (decimal) Numerical Character Reference for the glyph, the ASCII control code values, and explanatory comments.  Note that the table is rotated compared to the table above (to allow additional information).

     glyph  NCR   ASCII  Description                  glyph  NCR   ASCII  Description
     0_                                               1_
_0   ␀      9216    0    Null                         ␐      9232   16    Data Link Escape            _0
_1   ␁      9217    1    Start of Header              ␑      9233   17    Device Control 1            _1
_2   ␂      9218    2    Start of Text                ␒      9234   18    Device Control 2            _2
_3   ␃      9219    3    End of Text                  ␓      9235   19    Device Control 3            _3
_4   ␄      9220    4    End of Transmission          ␔      9236   20    Device Control 4            _4
_5   ␅      9221    5    Enquiry                      ␕      9237   21    Negative Acknowledge        _5
_6   ␆      9222    6    Acknowledge                  ␖      9238   22    Synchronous Idle            _6
_7   ␇      9223    7    Bell                         ␗      9239   23    End of Transmission Block   _7
_8   ␈      9224    8    Backspace                    ␘      9240   24    Cancel                      _8
_9   ␉      9225    9    Horizontal Tabulation        ␙      9241   25    End of Medium               _9
_A   ␊      9226   10    Line Feed                    ␚      9242   26    Substitute                  _A
_B   ␋      9227   11    Vertical Tabulation          ␛      9243   27    Escape                      _B
_C   ␌      9228   12    Form Feed                    ␜      9244   28    File Separator              _C
_D   ␍      9229   13    Carriage Return              ␝      9245   29    Group Separator             _D
_E   ␎      9230   14    Shift Out                    ␞      9246   30    Record Separator            _E
_F   ␏      9231   15    Shift In                     ␟      9247   31    Unit Separator              _F
     0_                                               1_
20   ␠      9248   32    Space/blank                  ␡      9249  127    Delete                      7F
Control codes are used to control devices (e.g. Cr and Lf for a printer, teletype or text window), and for communication (e.g. Ack, Nak).  In the old days paper tape was used to record (program) text, most commonly with 8 tracks (7 for the ASCII code and 1 for parity).  When punching all holes for a code (i.e. Del), the character was skipped (effectively deleted).
HTML ignores all control codes except for Cr, Lf, Ff and Ht which are treated as a space (and multiple spaces are reduced to a single space).

ASCII Graphic Codes

ASCII graphics and their decimal code value (for ASCII identical to the Numerical Character Reference).  Note that here also the table is rotated.
     char NCR   char NCR   char NCR   char NCR   char NCR   char NCR
     2_         3_         4_         5_         6_         7_
_0   Spc  32    0    48    @    64    P    80    `    96    p    112   _0
_1   !    33    1    49    A    65    Q    81    a    97    q    113   _1
_2   "    34    2    50    B    66    R    82    b    98    r    114   _2
_3   #    35    3    51    C    67    S    83    c    99    s    115   _3
_4   $    36    4    52    D    68    T    84    d    100   t    116   _4
_5   %    37    5    53    E    69    U    85    e    101   u    117   _5
_6   &    38    6    54    F    70    V    86    f    102   v    118   _6
_7   '    39    7    55    G    71    W    87    g    103   w    119   _7
_8   (    40    8    56    H    72    X    88    h    104   x    120   _8
_9   )    41    9    57    I    73    Y    89    i    105   y    121   _9
_A   *    42    :    58    J    74    Z    90    j    106   z    122   _A
_B   +    43    ;    59    K    75    [    91    k    107   {    123   _B
_C   ,    44    <    60    L    76    \    92    l    108   |    124   _C
_D   -    45    =    61    M    77    ]    93    m    109   }    125   _D
_E   .    46    >    62    N    78    ^    94    n    110   ~    126   _E
_F   /    47    ?    63    O    79    _    95    o    111   Del  127   _F
     2_         3_         4_         5_         6_         7_
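
Because the NCR value equals the code value, a decimal NCR can be derived mechanically.  A minimal Python sketch (the helper name to_ncr is an assumption made for this example):

```python
def to_ncr(text: str) -> str:
    """Write every character of text as a decimal NCR."""
    return "".join(f"&#{ord(ch)};" for ch in text)


ncr_text = to_ncr("A<z")
print(ncr_text)
```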

HTML Reserved characters

In the table below, you'll find the HTML reserved characters, their decimal code values, their «character_name», and a related comment.

Char  NCR  name  Comment
"     34   quot  Double quote sign.  Used in HTML syntax (not likely a problem in HTML, but potentially in combination with applications like PHP & MySQL)
&     38   amp   Ampersand.  Used in HTML syntax (escape character, likely a problem)
'     39   apos  Apostrophe.  Used in HTML syntax (not likely a problem in HTML, but potentially in combination with applications like PHP & MySQL)
<     60   lt    Less Than sign.  Used in HTML syntax (likely a problem)
>     62   gt    Greater Than sign.  Used in HTML syntax (not likely a problem)

For some other ASCII characters there are named NCR-codes, but as their usefulness is doubtful at best, they are not provided here.
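
Replacing the reserved characters by hand is error-prone, so most languages offer a helper for it.  A minimal Python sketch using the standard library function html.escape (the sample text is invented for this example):

```python
import html

raw = '<a href="x">Tom & Jerry</a>'

# html.escape replaces &, < and >; by default it also replaces " and '.
escaped = html.escape(raw)
print(escaped)

# The apostrophe is written as a hexadecimal NCR rather than by name.
apostrophe = html.escape("'")
print(apostrophe)
```

Note that the ampersand has to be replaced first (as html.escape does), otherwise the & of the other references would be escaped again.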


Other characters

Characters other than ASCII are preferably indicated as Unicode NCRs or UTF-8.  However, to also provide some insight in other character sets, the Latin1/­CP1252 charset is discussed below first (there are also some commonalities with Unicode).

Latin1 / CP1252 Character Set

A very commonly used character set (in HTML4) was ISO-8859-1, also known as Latin1.  Latin1 is an extension to ASCII, so the code values 0··127 (x00··x7F) are already defined.  But Latin1 exploits the fact that most computers use 8-bit bytes, and defines the code values 128··255 (x80··xFF) as well.
In Latin1 the first 32 codes (x80··x9F) are control codes, but these codes were only rarely used.  Yet another character set is the (Windows) Code Page 1252 (CP1252);  it is a superset of Latin1 and also defines graphical symbols for most of the first 32 codes (the control codes in Latin1).  Many systems treat Latin1 and CP1252 as identical (e.g. browsers, MySQL).  If your web page uses charset=ISO-8859-1 and you have loaded CP1252 on your PC, you should be Ok.  Note however that the first 32 code values of CP1252 are not equal to Unicode values.

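     _0   _1   _2   _3   _4   _5   _6   _7     _8   _9   _A   _B   _C   _D   _E   _F
8_   €         ‚    ƒ    „    …    †    ‡      ˆ    ‰    Š    ‹    Œ         Ž          8_
9_        ‘    ’    “    ”    •    –    —      ˜    ™    š    ›    œ         ž    Ÿ     9_
A_   nbsp ¡    ¢    £    ¤    ¥    ¦    §      ¨    ©    ª    «    ¬    shy  ®    ¯     A_
B_   °    ±    ²    ³    ´    µ    ¶    ·      ¸    ¹    º    »    ¼    ½    ¾    ¿     B_
C_   À    Á    Â    Ã    Ä    Å    Æ    Ç      È    É    Ê    Ë    Ì    Í    Î    Ï     C_
D_   Ð    Ñ    Ò    Ó    Ô    Õ    Ö    ×      Ø    Ù    Ú    Û    Ü    Ý    Þ    ß     D_
E_   à    á    â    ã    ä    å    æ    ç      è    é    ê    ë    ì    í    î    ï     E_
F_   ð    ñ    ò    ó    ô    õ    ö    ÷      ø    ù    ú    û    ü    ý    þ    ÿ     F_
     _0   _1   _2   _3   _4   _5   _6   _7     _8   _9   _A   _B   _C   _D   _E   _F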

Latin9 Character Set

The ISO-8859-15 –or Latin9– character set is almost the same as ISO-8859-1/­Latin1;  the differences are listed below.  The most practical feature is that it contains the Euro-sign.

Charset   xA4  xA6  xA8  xB4  xB8  xBC  xBD  xBE
8859-1    ¤    ¦    ¨    ´    ¸    ¼    ½    ¾    Latin1
8859-15   €    Š    š    Ž    ž    Œ    œ    Ÿ    Latin9
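
The differences between Latin1, Latin9 and CP1252 can be checked with a few lines of Python.  This is a small sketch using Python's built-in codecs;  the function name decode_all is made up for this example.

```python
def decode_all(byte_value: int) -> dict:
    """Interpret one byte value according to three character sets."""
    raw = bytes([byte_value])
    return {
        "Latin1": raw.decode("iso-8859-1"),
        "Latin9": raw.decode("iso-8859-15"),
        "CP1252": raw.decode("cp1252"),
    }


currency = decode_all(0xA4)   # general currency sign in Latin1, Euro sign in Latin9
first = decode_all(0x80)      # control code in Latin1 and Latin9, Euro sign in CP1252
print(currency)
print({name: hex(ord(char)) for name, char in first.items()})
```

The second line of output also shows that the CP1252 code x80 is not equal to the Unicode value of the Euro sign (8364).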

The special characters below are Unicode characters not ordered according to their numeric value, but thematically grouped by their symbolic representation to ease look-up.  If you want to use a specific (non-Western) alphabet, use the appropriate Unicode code page (e.g. Cyrillic code page).

Diacritical characters

Latin 1 characters have NCR values greater than or equal to 160 (xA0) and less than or equal to 255 (xFF).
The table below also contains Latin Extended-A (x100-x17F) and Latin Extended-B (x180-x24F), and is sorted alphabetically.

A

À192Agrave Capital A grave accent à224agrave Small a grave accent
Á193Aacute Capital A acute accent á225aacute Small a acute accent
Â194Acirc Capital A circumflex accent â226acirc Small a circumflex accent
Ã195Atilde Capital A tilde ã227atilde Small a tilde
Ä196Auml Capital A dieresis or umlaut ä228auml Small a dieresis or umlaut
Å197Aring Capital A ring å229aring Small a ring
Ā256AmacrCapital A with macron accent ā257amacrSmall a with macron accent
Ă258AbreveCapital A with breve ă259abreveSmall a with breve
Ą260AogonCapital A with ogonek ą261aogonSmall a with ogonek
Ǎ461 ǎ462
Ǟ478 ǟ479
Ǡ480 ǡ481
Ǻ506 ǻ507
Ȁ512 ȁ513
Ȃ514 ȃ515
Ȧ550 ȧ551
Æ198AElig Capital AE diphthong (ligature) æ230aelig Small ae diphthong (ligature)
Ǣ482 ǣ483
Ǽ508 ǽ509

B

Ɓ385 ƀ384
Ƃ386 ƃ387
Ƅ388 ƅ389
␢9250 blank symbol

C

Ç199Ccedil Capital C cedilla ç231ccedil Small c cedilla
Ć262CacuteCapital C acute accent ć263cacuteSmall c acute accent
Ĉ264CcircCapital C circumflex ĉ265ccircSmall c circumflex
Ċ266CdotCapital C with dot ċ267cdotSmall c with dot
Č268CcaronCapital C with caron č269ccaronSmall c with caron
Ɔ390
Ƈ391 ƈ392

D

Ď270DcaronCapital D with caron ď271dcaronSmall d with caron
Đ272DstrokCapital D with stroke đ273dstrokSmall d with stroke
Ð208ETH Capital Eth, Icelandic ð240eth Small eth, Icelandic
Ɖ393
Ɗ394
Ƌ395 ƌ396
DZ497 Dz498
dz499
DŽ452 Dž453
dž454

E

È200Egrave Capital E grave accent è232egrave Small e grave accent
É201Eacute Capital E acute accent é233eacute Small e acute accent
Ê202Ecirc Capital E circumflex accent ê234ecirc Small e circumflex accent
Ë203Euml Capital E dieresis or umlaut ë235euml Small e dieresis or umlaut
Ē274EmacrCapital E with macron ē275emacrSmall e with macron
Ĕ276 ĕ277
Ė278EdotCapital E with dot ė279edotSmall e with dot
Ę280EogonCapital E with ogonek ę281eogonSmall e with ogonek
Ě282EcaronCapital E with caron ě283ecaronSmall e with caron
Ð208ETH Capital Eth, Icelandic ð240eth Small eth, Icelandic
Ǝ398 Ə399
Ɛ400
Ȅ516 ȅ517
Ȇ518 ȇ519
Ȩ552 ȩ553

F

Ƒ401 ƒ402fnof florin
ﬀ64256fflig ff-ligature ﬅ64261 long s t-ligature (resembles ft)
ﬁ64257filig fi-ligature ﬃ64259ffilig ffi-ligature
fj fjlig fj-ligature
ﬂ64258fllig fl-ligature ﬄ64260ffllig ffl-ligature

G

Ĝ284GcircCapital G circumflexĝ285gcircSmall g circumflex
Ğ286GbreveCapital G with breve ğ287gbreveSmall g breve
Ġ288GdotCapital G with dot ġ289gdotSmall g with dot
Ģ290GcedilCapital G with cedilla ģ291
Ɠ403 ʛ667
Ǥ484 ǥ485
Ǧ486 ǧ487
Ǵ500 ǵ501gacuteSmall g acute

H

Ĥ292HcircCapital H circumflex ĥ293hcircSmall h circumflex
Ħ294HstrokCapital H with stroke ħ295hstrokSmall h with stroke
Ȟ542 ȟ543

I

Ì204Igrave Capital I grave accent ì236igrave Small i grave accent
Í205Iacute Capital I acute accent í237iacute Small i acute accent
Î206Icirc Capital I circumflex accent î238icirc Small i circumflex accent
Ï207Iuml Capital I dieresis or umlaut ï239iuml Small i dieresis or umlaut
Ĩ296ItildeCapital I tilde ĩ297itildeSmall i tilde
Ī298ImacrCapital I macron ī299imacrSmall i macron
Ĭ300 ĭ301
Į302IogonCapital I ogonek į303iogonSmall i ogonek
İ304IdotCapital I dot ı305inodotSmall i no dot
ľ318 ŀ320
ł322 ſ383
Ǐ463 ǐ464
Ȉ520 ȉ521
Ȋ522 ȋ523
IJ306IJligCapital ligature IJ ij307ijligSmall ligature ij

J

Ĵ308JcircCapital J circumflex ĵ309jcircSmall j circumflex
ǰ496 ȷ567jmath

K

Ķ310KcedilCapital K cedilla ķ311kcedilSmall k cedilla
Ƙ408 ƙ409
ĸ312kgreenSmall kra (see also Greek Kappa)
Ǩ488 ǩ489

L

Ĺ313LacuteCapital L acute ĺ314lacuteSmall l acute
Ļ315LcedilCapital L cedilla ļ316lcedilSmall l cedilla
Ľ317LcaronCapital L caron ľ318lcaronSmall l caron
Ŀ319LmidotCapital L middot ŀ320lmidotSmall l middot
Ł321LstrokCapital L stroke ł322lstrokSmall l stroke
LJ455 Lj456
lj457

N

Ñ209Ntilde Capital N tilde ñ241ntilde Small n tilde
Ń323NacuteCapital N acute ń324nacuteSmall n acute
Ņ325NcedilCapital N cedilla ņ326ncedilSmall n cedilla
Ň327NcaronCapital N caron ň328ncaronSmall n caron
ʼn329naposSmall n apostrophe
Ŋ330ENGCapital ENG ŋ331engSmall eng
Ɲ413 ƞ414
Ǹ504 ǹ505
NJ458 Nj459
nj460

O

Ò210Ograve Capital O grave accent ò242ograve Small o grave accent
Ó211Oacute Capital O acute accent ó243oacute Small o acute accent
Ô212Ocirc Capital O circumflex accent ô244ocirc Small o circumflex accent
Õ213Otilde Capital O tilde õ245otilde Small o tilde
Ö214Ouml Capital O dieresis or umlaut ö246ouml Small o dieresis or umlaut
Ø216Oslash Capital O slash ø248oslash Small o slash
Ō332OmacrCapital O macron ō333omacrSmall o macron
Ŏ334 ŏ335
Ő336OdblacCapital O double accent ő337odblacSmall o double accent
Ơ416 ơ417
Ƣ418 ƣ419
Ǒ465 ǒ466
Ǫ490 ǫ491
Ǭ492 ǭ493
Ǿ510 ǿ511
Ȍ524 ȍ525
Ȏ526 ȏ527
Ȫ554 ȫ555
Ȭ556 ȭ557
Ȯ558 ȯ559
Ȱ560 ȱ561
Œ338OElig Latin capital ligature OE œ339oeligLatin small ligature oe

P

Ƥ420P ƥ421

R

Ŕ340RacuteCapital R acute ŕ341racuteSmall r acute
Ŗ342RcedilCapital R cedilla ŗ343rcedilSmall r cedilla
Ř344RcaronCapital R caron ř345rcaronSmall r caron
Ȑ528 ȑ529
Ȓ530 ȓ531
Ʀ422
ʀ640 ʁ641

S

Ś346SacuteCapital S acute ś347sacuteSmall s acute
Ŝ348ScircCapital S circumflex ŝ349scircSmall s circumflex
Ş350ScedilCapital S cedilla ş351scedilSmall s cedilla
Š352ScaronCapital S caron š353scaronSmall s caron
Ƨ423 ƨ424
Ș536 ș537
ß223szlig Small sharp s, German sz-ligature (see also Greek Beta)
ﬅ64261Long S T ligature ﬆ64262Small st ligature

T

Þ222THORN Capital THORN, Icelandic þ254thorn Small thorn, Icelandic
Ţ354TcedilCapital T cedilla ţ355tcedilSmall t cedilla
Ț538 ț539
Ť356TcaronCapital T caron ť357tcaronSmall t caron
Ŧ358TstrokCapital T stroke ŧ359tstrokSmall t stroke
Ʈ430 ƭ429
Ƭ428 ƫ427

U

Ù217Ugrave Capital U grave accent ù249ugrave Small u grave accent
Ú218Uacute Capital U acute accent ú250uacute Small u acute accent
Û219Ucirc Capital U circumflex accent û251ucirc Small u circumflex accent
Ü220Uuml Capital U dieresis or umlaut ü252uuml Small u dieresis or umlaut
Ũ360UtildeCapital U tilde ũ361utildeSmall u tilde
Ū362UmacrCapital U macron ū363umacrSmall u macron
Ŭ364UbreveCapital U breve ŭ365ubreveSmall u breve
Ů366UringCapital U ring ů367uringSmall u ring
Ű368UdblacCapital U double acute ű369udblacSmall u double acute
Ų370UogonCapital U ogonek ų371uogonSmall u ogonek
Ư431 ư432
Ǔ467 ǔ468
Ǖ469 ǖ470
Ǘ471 ǘ472
Ǚ473 ǚ474
Ǜ475 ǜ476
Ȕ532 ȕ533
Ȗ534 ȗ535
ᵫ7531ue-ligature

W

Ŵ372WcircCapital W circumflexŵ373wcircSmall w circumflex

Y

Ý221Yacute Capital Y acute accent ý253yacute Small y acute accent
Ÿ376YumlCapital Y diaeresis or umlaut ÿ255yumlSmall y diaeresis or umlaut
Ŷ374YcircCapital Y circumflex ŷ375ycircSmall y circumflex
Ȳ562 ȳ563
Ɣ404
ƴ436 Ƴ435
IJ306IJligCapital ligature IJ ij307ijligSmall ligature ij

Z

Ź377ZacuteCapital Z acute ź378zacuteSmall z acute
Ż379ZdotCapital Z dot ż380zdotSmall z dot
Ž381ZcaronCapital Z caron ž382zcaronSmall z caron
Ƶ437impedCapital Z stroke, Impedance ƶ438
Ȥ548 ȥ549

Greek symbols

These are Greek symbols, also used in mathematical notations.

Α913Alpha Capital alpha α945alpha Small alpha
Β914Beta Capital beta β946beta Small beta
Γ915Gamma Capital gamma γ947gamma Small gamma
Ϝ988Gammad Capital digamma ϝ989gammad Small digamma
Δ916Delta Capital delta δ948delta Small delta
Ε917Epsilon Capital epsilon ε949epsilon Small epsilon
ϵ1013epsi Straight epsilon ϶1014bepsiBack epsilon
Ζ918Zeta Capital zeta ζ950zeta Small zeta
Η919Eta Capital eta η951eta Small eta
Θ920Theta Capital theta θ952theta Small theta
ϑ977thetasym Small theta symbol
Ι921Iota Capital iota ι953iota Small iota
Κ922Kappa Capital kappa κ954kappa Small kappa
ϰ1008kappav Kappa
Λ923Lambda Capital lambda λ955lambda Small lambda
Μ924Mu Capital mu μ956mu Small mu
µ181micro Micro sign
Ν925Nu Capital nu ν957nu Small nu
Ξ926Xi Capital xi ξ958xi Small xi
Ο927Omicron Capital omicron ο959omicron Small omicron
Π928Pi Capital pi π960pi Small pi
ϖ982piv Pi symbol
Ρ929Rho Capital rho ρ961rho Small rho
ϱ1009rhov Small rho
Σ931Sigma Capital sigma ς962sigmaf Small final sigma
σ963sigma Small sigma
Τ932Tau Capital tau τ964tau Small tau
Υ933Upsilon Capital upsilon υ965upsilon, upsi Small upsilon
ϒ978upsih Upsilon with hook symbol
Φ934Phi Capital phi φ966phi Small phi
ϕ981straightphi Straight phi
Χ935Chi Capital chi χ967chi Small chi
Ψ936Psi Capital psi ψ968psi Small psi
Ω937Omega Capital omega ω969omega Small omega

Double-struck letters

𝔸120120AopfDouble-struck capital A 𝕒120146aopfDouble-struck small a
𝔹120121BopfDouble-struck capital B 𝕓120147bopfDouble-struck small b
ℂ8450CopfDouble-struck capital C 𝕔120148copfDouble-struck small c
𝔻120123DopfDouble-struck capital D 𝕕120149dopfDouble-struck small d
𝔼120124EopfDouble-struck capital E 𝕖120150eopfDouble-struck small e
𝔽120125FopfDouble-struck capital F 𝕗120151fopfDouble-struck small f
𝔾120126GopfDouble-struck capital G 𝕘120152gopfDouble-struck small g
ℍ8461HopfDouble-struck capital H 𝕙120153hopfDouble-struck small h
𝕀120128IopfDouble-struck capital I 𝕚120154iopfDouble-struck small i
𝕁120129JopfDouble-struck capital J 𝕛120155jopfDouble-struck small j
𝕂120130KopfDouble-struck capital K 𝕜120156kopfDouble-struck small k
𝕃120131LopfDouble-struck capital L 𝕝120157lopfDouble-struck small l
𝕄120132MopfDouble-struck capital M 𝕞120158mopfDouble-struck small m
ℕ8469NopfDouble-struck capital N 𝕟120159nopfDouble-struck small n
𝕆120134OopfDouble-struck capital O 𝕠120160oopfDouble-struck small o
ℙ8473PopfDouble-struck capital P 𝕡120161popfDouble-struck small p
ℚ8474QopfDouble-struck capital Q 𝕢120162qopfDouble-struck small q
ℝ8477RopfDouble-struck capital R 𝕣120163ropfDouble-struck small r
𝕊120138SopfDouble-struck capital S 𝕤120164sopfDouble-struck small s
𝕋120139TopfDouble-struck capital T 𝕥120165topfDouble-struck small t
𝕌120140UopfDouble-struck capital U 𝕦120166uopfDouble-struck small u
𝕍120141VopfDouble-struck capital V 𝕧120167vopfDouble-struck small v
𝕎120142WopfDouble-struck capital W 𝕨120168wopfDouble-struck small w
𝕏120143XopfDouble-struck capital X 𝕩120169xopfDouble-struck small x
𝕐120144YopfDouble-struck capital Y 𝕪120170yopfDouble-struck small y
ℤ8484ZopfDouble-struck capital Z 𝕫120171zopfDouble-struck small z
The range for the capitals is inconsistent (jumping code numbers).  These double-struck letters don't look good in bold.

There are also script letters in the range 119964-120015 (names Ascr-zscr, incomplete), and Fraktur ('gothic') letters in the range 120068-120119 (names Afr-zfr, also incomplete).
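
The jumping code numbers of the double-struck capitals can be inspected with Python's standard unicodedata module.  The sketch below is only an illustration;  it looks up the official Unicode names for a few code values from the table, and shows that the value one would expect for the capital C is not assigned.

```python
import unicodedata

# Code values from the table, plus 120122 where the capital C would be expected.
codes = (120120, 120121, 120122, 120123, 8450)
names = {code: unicodedata.name(chr(code), "(not assigned)") for code in codes}

for code, name in names.items():
    print(code, hex(code), name)
```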

Combining Diacritical Marks

A special category of symbols are the Combining Diacritical Marks:  these are not stand-alone characters but are always overstriking the previous character (i.e. overstrike characters).  For example a combining grave accent after the letter m will result in m̀ (m&#768;).
Multiple combining marks are allowed, e.g. ò̧̕ or 2̃ (but results may not be very nice).  There are no 'character names' (&«name»;) for these marks.  Application is also in scientific notations.
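
The overstriking behaviour can be tried out in Python.  The sketch below uses the standard unicodedata module and is meant as an illustration only:  a letter followed by a combining mark stays 2 code values, unless Unicode happens to define a single (precomposed) character for that combination.

```python
import unicodedata

combined = "m" + chr(768)        # m followed by the combining grave accent
print(combined, len(combined))   # displayed as one glyph, stored as 2 code values

# For a with grave accent a precomposed character exists (NCR 224).
a_grave = unicodedata.normalize("NFC", "a" + chr(768))
print(a_grave, len(a_grave), ord(a_grave))

# For m with grave accent there is none, so normalisation changes nothing.
m_grave = unicodedata.normalize("NFC", combined)
print(len(m_grave))
```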

In the table below all combining marks are used after a blank.

  ̀768Combining Grave accent   ́769Combining Acute accent
  ̂770Combining Circumflex accent  ̃771Combining Tilde
  ̄772Combining Macron  ̅773Combining Overline
  ̆774Combining Breve  ̇775Combining Dot above
  ̈776Combining Diaeresis  ̉777Combining Hook above
  ̊778Combining Ring above  ̋779Combining Double acute accent
  ̌780Combining Caron  ̍781Combining Vertical line above
  ̎782Combining Double vertical line above  ̏783Combining Double grave accent
  ̐784Combining Candrabindu  ̑785Combining Inverted breve, &DownBreve;
  ̒786Combining Turned comma above  ̓787Combining Comma above
  ̔788Combining Reversed comma above  ̕789Combining Comma above right
  ̖790Combining Grave accent below  ̗791Combining Acute accent below
  ̘792Combining Left tack below  ̙793Combining Right tack below
  ̚794Combining Left angle above  ̛795Combining Horn
  ̜796Combining Left half ring below  ̝797Combining Up tack below
  ̞798Combining Down tack below  ̟799Combining Plus sign below
  ̠800Combining Minus sign below  ̡801Combining Palatalized hook below
  ̢802Combining Retroflex hook below  ̣803Combining Dot below
  ̤804Combining Diaeresis below  ̥805Combining Ring below
  ̦806Combining Comma below  ̧807Combining Cedilla
  ̨808Combining Ogonek  ̩809Combining Vertical line below
  ̪810Combining Bridge below  ̫811Combining Inverted double arch below
  ̬812Combining Caron below  ̭813Combining Circumflex accent below
  ̮814Combining Breve below  ̯815Combining Inverted breve below
  ̰816Combining Tilde below  ̱817Combining Macron below
  ̲818&UnderBar; Combining Low line  ̳819Combining Double low line
  ̴820Combining Tilde overlay  ̵821Combining Short stroke overlay
  ̶822Combining Long stroke overlay  ̷823Combining Short solidus overlay
  ̸824Combining Long solidus overlay  ̹825Combining Right half ring below
  ̺826Combining Inverted bridge below  ̻827Combining Square below
  ̼828Combining Seagull below  ̽829Combining X above
  ̾830Combining Vertical tilde  ̿831Combining Double overline
  ̀832Combining Grave tone mark  ́833Combining Acute tone mark
  ͂834Combining Greek perispomeni  ̓835Combining Greek koronis
  ̈́836Combining Greek dialytika tonos  ͅ837Combining Greek ypogegrammeni
 ͆838Combining Bridge above  ͇839Combining Equal sign below
  ͈840Combining Double vertical line below  ͉841Combining Left angle below
  ͊842Combining Not tilde above  ͋843Combining Homothetic above
  ͌844Combining Almost equal to above  ͍845Combining Left right arrow below
  ͎846Combining Upwards arrow below  ͏847Combining Grapheme joiner
  ͐848Combining Right arrowhead above  ͑849Combining Left half ring above
  ͒850Combining Fermata  ͓851Combining X below
  ͔852Combining Left arrowhead below  ͕853Combining Right arrowhead below
  ͖854Combining Right arrowhead and up arrowhead below  ͗855Combining Right half ring above
  ͘856Combining Dot above right  ͙857Combining Asterisk below
  ͚858Combining Double ring below  ͛859Combining Zigzag above
  ͜860Combining Double breve below  ͝861Combining Double breve
  ͞862Combining Double macron  ͟863Combining Double macron below
  ͠864Combining Double tilde  ͡865Combining Double inverted breve
 ͢866Combining Double rightwards arrow below  ͣ867Combining Latin small letter A
  ͤ868Combining Latin small letter E  ͥ869Combining Latin small letter I
  ͦ870Combining Latin small letter O  ͧ871Combining Latin small letter U
  ͨ872Combining Latin small letter C  ͩ873Combining Latin small letter D
  ͪ874Combining Latin small letter H  ͫ875Combining Latin small letter M
  ͬ876Combining Latin small letter R  ͭ877Combining Latin small letter T
  ͮ878Combining Latin small letter V  ͯ879Combining Latin small letter X

Special Symbols

Typographics

9248 Space glyph 9250 blank symbol
| |32Normal blank/space (for comparison; spaces are between | )
| |160nbsp Non-Break Space -173shy Soft (=hidden) hyphen
| |8194ensp'n'-space 8211 ndash'n'-dash
| |8195emsp'm'-space 8212 mdash'm'-dash
| |8202hairspHair space 8209non-breaking hyphen
| |8201thinsp thin space 8254oline Overline = spacing overscore
| |8197emsp14 1/4m-space |‌|8204zwnjzero width non-joiner
| |8196emsp13 1/3m-space |‍|8205zwj zero width joiner
| |8199numsp figure space | |8200puncsppunctuation space
182para Paragraph sign §167sect Section sign
¡161iexcl Inverted exclamation mark ¿191iquest Inverted question mark
8226bullBullet = black small circle ·183middotMiddle dot
8230hellipHorizontal ellipsis = three dot leader 8229nldrTwo dot leader
8224dagger Dagger 8225DaggerDouble dagger
©169copy Copyright ®174reg Registered trademark
153, 8482tradeTrade mark sign 8471copysrSound recording copyright sign
8240permil Per mille sign 8470numeroNumero sign
¦166brvbar, brkbar Broken vertical bar 8214Verbarvertical (double) bar
8213horbarhorizontal bar 8259hybullHyphen bullet
8206lrm left-to-right mark 8207rlm right-to-left mark
ª170ordf Feminine ordinal º186ordm Masculine ordinal
9792female Female sign 9794male Male sign
9742phoneTelephone 9913Sextile
10003checkChecked 10007crossCrossed
9834sungMusical note 9839sharp
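
Each row above gives a decimal code and, where one exists, an entity name; both notations should yield the same character. A small sketch to confirm this for a handful of rows, assuming Python's standard `html` module as a stand-in for what a browser does (the list `rows` is made up for the illustration):

```python
import html

# Each pair: (entity name, decimal code) as listed in the table above.
rows = [("nbsp", 160), ("ensp", 8194), ("emsp", 8195),
        ("thinsp", 8201), ("ndash", 8211), ("mdash", 8212),
        ("hellip", 8230), ("bull", 8226)]

for name, code in rows:
    named = html.unescape(f"&{name};")      # e.g. &ndash;
    numeric = html.unescape(f"&#{code};")   # e.g. &#8211;
    # Both notations should decode to the same single character.
    print(name, code, named == numeric)
```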

Quotes

' 39apos Apostrophe, single quote (ASCII) "34quot Double quote (ASCII)
8242prime prime = minutes = feet 8243Prime Double prime = seconds = inches
8244tprime triple prime 8245backprimeBack prime
8216lsquo Left single quotation mark 8217rsquoRight single quotation mark
8218sbquo Single bottom 9 quotation mark 8222bdquoDouble bottom 9 quotation mark
8220ldquoLeft double quotation mark 8221rdquoRight double quotation mark
«171laquo Left double angle quote, guillemet-left (French left quote) »187raquo Right double angle quote, guillemet-right (French right quote)
8249lsaquo Single left-pointing angle quotation mark 8250rsaquoSingle right-pointing angle quotation mark
9001lang left-pointing angle bracket = bra 9002rang right-pointing angle bracket = ket

Currency signs

¤164currenGeneral currency sign ƒ131, 402fnofFlorin sign
8364euro Euro sign £163pound Pound sterling
¢162cent Cent sign ¥165yen Yen sign
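
A few rows on this page list a code below 160 (131 for the florin sign, 153 for the trade mark sign, 152 for the small tilde). Those are Windows code page (CP1252) positions, not Unicode code points; the real code points are 402, 8482 and 732. The sketch below, which assumes Python's `html` module follows the same remapping rule as HTML5 browsers, illustrates why both numbers happen to work:

```python
import html

# Legacy CP1252 position -> proper Unicode code point.
pairs = {131: 402, 152: 732, 153: 8482}

for legacy, proper in pairs.items():
    # What the single byte means when read as CP1252.
    from_cp1252 = bytes([legacy]).decode("cp1252")
    # html.unescape remaps numeric references in the range 128..159.
    from_ref = html.unescape(f"&#{legacy};")
    print(legacy, proper, from_cp1252 == chr(proper) == from_ref)
```

The higher number is the safer choice, since it does not rely on that remapping.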

Geometric forms

9632 Black square 9633 White square
9670 Black diamond 9671 White diamond
9679 Black circle/disc 9675 White circle/disc
9650 Black up-pointing triangle 9651 White up-pointing triangle
9652 Black small up-pointing triangle 9653 White small up-pointing triangle
9654 Black right-pointing triangle 9655 White right-pointing triangle
9658 Black right-pointing pointer 9657 White small right-pointing triangle
9660 Black down-pointing triangle 9661 White down-pointing triangle
9662 Black small down-pointing triangle 9663 White small down-pointing triangle
9668 Black left-pointing pointer 9669 White left-pointing pointer
9666 Black small left-pointing triangle 9667 White small left-pointing triangle
9733Star, starfBlack star 9734starWhite star
8902Small black star

Accents

¨168uml, die Umlaut, Dieresis ¯175macr, hibar Macron accent
´180acute Acute accent ¸184cedil Cedilla
ˆ710circ modifier letter Circumflex accent ˇ711caron, HacekCaron
˘728breve Breve accent ̑785DownBreveDown breve accent
˛731ogon Ogonek accent ˜152, 732tilde Small tilde
˝733dblacDouble acute ˚730ringRing
˙729dot Diacritical dot
8411TripleDotTriple dot 8412DotDotQuadruple dot

Arrows

8592larr Leftwards arrow 8656lArr Leftwards double arrow
8593uarr Upwards arrow 8657uArr Upwards double arrow
8594rarr Rightwards arrow 8658rArr, ImpliesRightwards double arrow
8595darr Downwards arrow 8659dArr Downwards double arrow
8596harr Left right arrow 8660hArr Left right double arrow
8597UpDownArrowUp-down arrow 8661UpdownarrowUp-down double arrow
8598nwarrowUp-left arrow 8662nwArrNorth-West double arrow
8599UpperRightArrowUp-right arrow 8663neArrNorth-East double arrow
8600LowerRightArrowDown-right arrow 8664seArrSouth-East double arrow
8601swarrDown-left arrow 8665swArrSouth-West double arrow
8624LshUp-left arrow 8625rshUp-right arrow
8626ldshDown-left arrow 8627rdshDown-right arrow
8630curvearrowleftCurved arrow left 8631curarrCurved arrow right
8634olarrRotate left 8635orarrRotate right
8629crarr Downwards arrow with corner leftwards = carriage return
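
Several arrows have more than one entity name (for example rArr and Implies). As a rough sketch, the names known for a given code can be looked up in the `html5` table that ships with Python's `html.entities` module; the helper `names_for` is a made-up name for this illustration.

```python
from html.entities import html5

def names_for(code):
    """Return all HTML5 entity names that decode to chr(code)."""
    target = chr(code)
    # Keys in html5 look like "rArr;"; a few legacy ones lack the ";".
    return sorted({k.rstrip(";") for k, v in html5.items() if v == target})

print(names_for(8658))  # names for the rightwards double arrow
print(names_for(8597))  # names for the up-down arrow
```

Entity names are case sensitive, so larr (8592) and lArr (8656) are different symbols.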

Superscripts & Fractions

°176deg Degree sign ¹185sup1 Note 1
²178sup2 Note 2, square ³179sup3 Note 3, cubic
½189frac12, half Fraction one-half
8531frac13Fraction one-third 8532frac23Fraction two-thirds
¼188frac14 Fraction one-fourth ¾190frac34 Fraction three-fourths
8533frac15Fraction one-fifth 8534frac25Fraction two-fifths
8535frac35Fraction three-fifths 8536frac45Fraction four-fifths
8537frac16Fraction one-sixth 8538frac56Fraction five-sixths
8539frac18Fraction one-eighth 8540frac38Fraction three-eighths
8541frac58Fraction five-eighths 8542frac78Fraction seven-eighths
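
The fraction characters carry their numeric value in the Unicode database, which gives a quick way to check that a code really is the fraction the table says. A minimal sketch, assuming Python's `unicodedata` and `fractions` modules:

```python
import unicodedata
from fractions import Fraction

for code in (189, 8531, 8532, 8539, 8542):
    ch = chr(code)
    # unicodedata.numeric() gives the value as a float; limit_denominator
    # recovers the small fraction it stands for.
    value = Fraction(unicodedata.numeric(ch)).limit_denominator(100)
    print(code, ch, value)
```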

Mathematical & technical symbols

This is only a selection;  there are many more.
×215times multiplication sign ÷247divide division sign
8722minus minus sign 8727lowast asterisk operator
±177plusmn Plus or minus 8723mp minus plus
8260frasl Fraction slash ¬172not Not sign
ȷ567jmath mathematical j (imaginary) Ƶ437impedimpedance
8476real Blackletter capital R = real part symbol 8465image Blackletter capital I = imaginary part
8472weierp Script capital P = power set = Weierstrass p 8501alefsym Alef symbol = first transfinite cardinal
8704forall for all 8707exist there exists
8712isin element of 8713notin not an element of
8715ni contains as member 8709empty empty set = null set = diameter
8745cap intersection = cap 8746cup union = cup
8743and logical and = wedge 8744or logical or = vee
8706part partial differential 8747int integral
8748Int double integral 8750ContourIntegralcontour integral
8711nabla nabla = backward difference 8901sdot dot operator
8719prod n-ary product = product sign 8721sum n-ary summation
8730radic, Sqrtsquare root = radical sign 8734infin infinity
8733prop proportional to 8869perp up tack = orthogonal to = perpendicular
8736ang angle 9674loz lozenge
8764sim tilde operator = varies with = similar to 8756there4 therefore
8766ac alternating current 8767acd sine wave
8739mid divides 8741shortparallelparallel
11005parslparallel
8773cong approximately equal to 8776asymp, approxalmost equal to = asymptotic to
8800ne not equal to 8801equiv, Congruent identical to
8804le less-than or equal to 8805ge greater-than or equal to
8834sub subset of 8835sup superset of
8836nsub not a subset of
8838sube subset of or equal to 8839supe superset of or equal to
8853oplus circled plus = direct sum 8855otimes circled times = vector product
8968lceil left ceiling = apl upstile 8969rceil right ceiling
8970lfloor left floor = apl downstile 8971rfloor right floor

Card faces

9824spades Black spade suit 9827clubs Black club suit = shamrock
9829hearts Black heart suit = valentine 9830diams Black diamond suit
 

Emoticons

9786Smiley
😁128513Grinning face with smiling eyes 😂128514Face with tears of joy
😃128515Smiling face with open mouth 😄128516Smiling face with open mouth and smiling eyes
😅128517Smiling face with open mouth and cold sweat 😆128518Smiling face with open mouth and tightly-closed eyes
😇128519Smiling face with halo 😈128520Smiling face with horns
😉128521Winking face 😊128522Smiling face with smiling eyes
😋128523Face savouring delicious food 😌128524Relieved face
😍128525Smiling face with heart-shaped eyes 😎128526Smiling face with sunglasses
😏128527Smirking face 😐128528Neutral face
😒128530Unamused face 😓128531Face with cold sweat
😔128532Pensive face 😖128534Confounded face
😘128536Face throwing a kiss 😚128538Kissing face with closed eyes
😜128540Face with stuck-out tongue and winking eye 😝128541Face with stuck-out tongue and tightly-closed eyes
😞128542Disappointed face 😠128544Angry face
😡128545Pouting face 😢128546Crying face
😣128547Persevering face 😤128548Face with look of triumph
😥128549Disappointed but relieved face 😨128552Fearful face
😩128553Weary face 😪128554Sleepy face
😫128555Tired face 😭128557Loudly crying face
😰128560Face with open mouth and cold sweat 😱128561Face screaming in fear
😲128562Astonished face 😳128563Flushed face
😵128565Dizzy face 😶128566Face without mouth
😷128567Face with medical mask   
😸128568Grinning cat face with smiling eyes 😹128569Cat face with tears of joy
😺128570Smiling cat face with open mouth 😻128571Smiling cat face with heart-shaped eyes
😼128572Cat face with wry smile 😽128573Kissing cat face with closed eyes
😾128574Pouting cat face 😿128575Crying cat face
🙀128576Weary cat face
🙅128581Face with no good gesture 🙆128582Face with ok gesture
🙇128583Person bowing deeply 🙈128584See-no-evil monkey
🙉128585Hear-no-evil monkey 🙊128586Speak-no-evil monkey
🙋128587Happy person raising one hand 🙌128588Person raising both hands in celebration
🙍128589Person frowning 🙎128590Person with pouting face
🙏128591Person with folded hands
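
The emoticon codes lie far above 65535, so they do not fit in a single 16-bit unit. The sketch below (Python again, with variable names chosen for the illustration) shows the decimal and hexadecimal reference for the first face in the table and how many storage units it takes:

```python
import html

code = 128513                      # grinning face with smiling eyes
face = html.unescape(f"&#{code};")

# The hexadecimal form of the same reference.
hex_ref = f"&#x{code:X};"
same = html.unescape(hex_ref) == face

# Above 65535 a character needs two 16-bit units in UTF-16
# and four bytes in UTF-8.
utf16_units = len(face.encode("utf-16-le")) // 2
utf8_bytes = len(face.encode("utf-8"))
print(hex_ref, same, utf16_units, utf8_bytes)
```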

Box Drawing Symbols

┌─┬┐
│ ││
├─┼┤
└─┴┘
9484 9472 9516 9488
9474      9474 9474
9500 9472 9532 9508
9492 9472 9524 9496
┏━┳┓
┃ ┃┃
┣━╋┫
┗━┻┛
9487 9473 9523 9491
9475 emsp 9475 9475
9507 9473 9547 9515
9495 9473 9531 9499
╔═╦╗
║ ║║
╠═╬╣
╚═╩╝
9556 9552 9574 9559
9553      9553 9553
9568 9552 9580 9571
9562 9552 9577 9565
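
The grids of numbers above map one-to-one onto the drawn boxes. As a sketch of how such a box could be assembled from its codes (the helpers `draw` and `as_html` are hypothetical names, and a normal space, code 32, is assumed for the empty cell):

```python
# Decimal codes for the light box, row by row, as in the table above.
light = [
    [9484, 9472, 9516, 9488],
    [9474,   32, 9474, 9474],
    [9500, 9472, 9532, 9508],
    [9492, 9472, 9524, 9496],
]

def draw(rows):
    """Turn rows of code points into printable lines."""
    return ["".join(chr(c) for c in row) for row in rows]

def as_html(rows):
    """The same rows written as numeric character references."""
    return ["".join(f"&#{c};" for c in row) for row in rows]

print("\n".join(draw(light)))
```

In a web page the lines only join up properly in a fixed-width font, for instance inside a pre element.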
 

=O=