  HTML Characters
 P van Diemen

Coding in HTML

The standard characters for HTML are those of the American Standard Code for Information Interchange (ASCII, a very old and very common standard, see below).  This is a 7-bit code (0··127) in which the first 32 codes (0··31) and code 127 (delete) are so-called control codes;  they include carriage return, line feed and horizontal tab and do not represent a symbol (i.e. they are 'not printable').  HTML treats carriage return, line feed and tab as white space (a run of white space is normally displayed as a single space);  the other control codes should not be used.  This leaves 95 printable codes:  the space, digits, upper- and lowercase letters, and symbols like comma, period, parenthesis, equal sign, etc.
Apart from typing ASCII characters directly (i.e. letters and digits), one may use a character reference of the form
          &'value';
   where 'value' (written without the quotes) is one of:
      #decimal_number,
      #xhexadecimal_number, or
      character_name.
The methods using numbers are also known as numeric character references (NCR).  The variant with character_name is better suited for human use, but all three methods should work.
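
As a small illustration of the three forms, the sketch below decodes the same character (é, code 233) written as a decimal reference, a hexadecimal reference and a named reference.  It uses `html.unescape` from Python's standard library;  the choice of é is just an example.

```python
import html

# The same character (é, code 233) written in the three reference forms.
references = ["&#233;", "&#xE9;", "&eacute;"]
decoded = [html.unescape(ref) for ref in references]

for ref, ch in zip(references, decoded):
    print(f"{ref:10} -> {ch} (code {ord(ch)})")
```

All three references should decode to the same single character.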

The above coding method is in fact the only way to correctly denote the HTML reserved characters ('&', '<', '>') of the ASCII set, as these characters are also used in the HTML syntax.  It is also the advised way to denote characters from extensions to the character set (e.g. Latin 1).  In practice, Latin 1 characters (like â, é, ü, ñ) can be used directly in HTML texts, provided that the browser and computer are set appropriately (see Representation of Glyphs).

Other Characters
On nearly all computers, characters are stored in 8-bit bytes.  The ASCII set needs only 7 bits, which leaves another 128 codes (128··255) for other characters.  That code space can be used in various ways;  in the past it was used for country-dependent (i.e. not multi-national) symbols, later for the single-byte ISO 8859 character sets (such as Latin 1), and nowadays mostly for the multi-byte sequences of Unicode encodings such as UTF-8.
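
To make this concrete, the sketch below reads one and the same byte (value 233) under two different 8-bit codes.  The two codes, ISO 8859-1 (Western European) and ISO 8859-7 (Greek), are picked only as examples;  the codec names are those of Python's standard library.

```python
raw = bytes([0xE9])  # a single byte with value 233

as_latin1 = raw.decode("iso8859_1")  # read as ISO 8859-1 (Western European)
as_greek = raw.decode("iso8859_7")   # read as ISO 8859-7 (Greek)

print(as_latin1, as_greek)
```

The byte is the same in both cases;  only the assumed code table differs, so the displayed character differs too.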

Notes:

Unicode

Unicode is a standard for the definition of all kinds of multi-lingual characters (symbols, also called 'glyphs' or 'graphemes') like Greek and even Chinese characters, i.e. letters and symbols other than those available in ASCII.  Each character is assigned a number (its 'code point'), and most code points need more than one byte to store.  However, due to compatibility requirements with various existing codes, there are multiple encodings, which makes it quite confusing.
The coding presented here is the most common one:  UTF-8.  Be aware, however, that there are other codings (e.g. UTF-16 and UTF-32).

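UTF-8 is an extension of the ASCII character set:  ASCII text is valid UTF-8 as it stands.  Unicode code points 128··255 denote the same characters as ISO 8859-1 (Latin 1, the common diacritical extensions with characters like â, é, ü, ñ), but UTF-8 stores each of them in two bytes, so a Latin 1 file is in general not valid UTF-8.  UTF-8 encodes each Unicode character in 1··4 bytes, such that the old ASCII code still applies in a single byte.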
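
The 1··4 byte lengths can be checked with a few sample characters.  The four samples below (A, é, €, and an emoji at code point 128512) are arbitrary choices, one for each length.

```python
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Number of bytes each sample occupies when encoded as UTF-8.
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in lengths.items():
    print(f"code {ord(ch):>6} -> {n} byte(s): {ch.encode('utf-8').hex(' ')}")
```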

Coding scheme:
Range                 Byte 1    Byte 2    Byte 3    Byte 4    Description
0 - 127               0xxxxxxx                                ASCII (7-bit code)
128 - 2,047           110yyyxx  10xxxxxx                      Latin diacritical characters, Greek, Cyrillic, Hebrew, Arabic, …;  codes 128 - 255 as in ISO 8859-1
2,048 - 65,535        1110yyyy  10yyyyxx  10xxxxxx            typically East Asian languages (Chinese, …)
65,536 - 1,114,111    11110zzz  10zzyyyy  10yyyyxx  10xxxxxx
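In the scheme above, the letter x represents the bits of the lowest-order byte of the Unicode code point (not of the encoding), the letter y the bits of the second byte, and z the remaining high-order bits.
Note that some byte values never occur in valid UTF-8 (like 192 and 193, which could only start an over-long encoding of an ASCII character).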
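
The coding scheme above can be sketched as a small function.  The name `utf8_encode` is made up for this illustration, and the rejection of the surrogate range 55,296··57,343 is an addition that the table does not mention;  in practice one would simply call `str.encode("utf-8")`, which the loop uses as a cross-check.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point following the bit layout of the scheme."""
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if code_point < 0x80:       # 0 - 127: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:      # 128 - 2,047: 110..... 10......
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:    # 2,048 - 65,535: 1110.... 10...... 10......
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 65,536 - 1,114,111: 11110... 10...... 10...... 10......
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# Cross-check against Python's built-in encoder for one sample per row.
for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    assert utf8_encode(cp) == chr(cp).encode("utf-8")
    print(f"{cp:>7} -> {utf8_encode(cp).hex(' ')}")
```

Every byte after the first starts with the bits 10, which is how a decoder can tell continuation bytes from the start of a new character.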

Representation of Glyphs

Representing non-ASCII characters in a browser on a computer involves multiple steps:  the browser must know how the HTML is encoded, and the computer must have a graphical representation (a font) for each character.
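
What goes wrong in the first step can be imitated in a few lines.  The sketch below takes text stored as UTF-8 and reads it back as if it were ISO 8859-1, which is roughly what a browser does when it assumes the wrong encoding;  the sample character é is an arbitrary choice.

```python
text = "\u00e9"                            # é
utf8_bytes = text.encode("utf-8")          # two bytes: c3 a9
misread = utf8_bytes.decode("iso8859_1")   # each byte shown as its own Latin 1 character

print(misread)                             # two characters instead of one
print(utf8_bytes.decode("utf-8"))          # decoded correctly
```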

ASCII characters

spc 32    0 48   @ 64   P 80   ` 96    p   112
!   33    1 49   A 65   Q 81   a 97    q   113
"   34    2 50   B 66   R 82   b 98    r   114
#   35    3 51   C 67   S 83   c 99    s   115
$   36    4 52   D 68   T 84   d 100   t   116
%   37    5 53   E 69   U 85   e 101   u   117
&   38    6 54   F 70   V 86   f 102   v   118
'   39    7 55   G 71   W 87   g 103   w   119
(   40    8 56   H 72   X 88   h 104   x   120
)   41    9 57   I 73   Y 89   i 105   y   121
*   42    : 58   J 74   Z 90   j 106   z   122
+   43    ; 59   K 75   [ 91   k 107   {   123
,   44    < 60   L 76   \ 92   l 108   |   124
-   45    = 61   M 77   ] 93   m 109   }   125
.   46    > 62   N 78   ^ 94   n 110   ~   126
/   47    ? 63   O 79   _ 95   o 111   del 127
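
A table like the one above can be regenerated from the code numbers alone.  This is only a sketch;  the labels spc and del stand in for the space and the delete control code, and the column layout is a free choice.

```python
# Printable ASCII: codes 32 (space) through 126 (~); 127 is the DEL control code.
printable = {code: chr(code) for code in range(32, 127)}

# Six columns of sixteen rows, codes running down each column.
for row in range(16):
    cells = []
    for col in range(6):
        code = 32 + col * 16 + row
        label = {32: "spc", 127: "del"}.get(code, chr(code))
        cells.append(f"{label:<3} {code:<3}")
    print("   ".join(cells))
```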

Reserved characters

Char  Code  Name  Description
"     34    quot  Double quote sign.  Used in HTML syntax (not likely a problem)
&     38    amp   Ampersand.  Used in HTML syntax (escape character, likely a problem)
<     60    lt    Less Than sign.  Used in HTML syntax (likely a problem)
>     62    gt    Greater Than sign.  Used in HTML syntax
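
When HTML is generated by a program, the reserved characters are usually replaced automatically.  A minimal sketch with `html.escape` from Python's standard library follows;  the sample string is made up.

```python
import html

raw = 'if a < b && b > c: print("ok")'
escaped = html.escape(raw)   # replaces &, <, > and, by default, quotes as well

print(escaped)
print(html.unescape(escaped) == raw)
```

Note that the ampersand must be replaced first, otherwise the ampersands of the other references would themselves be escaped;  `html.escape` takes care of that order.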
 

Other characters

The special characters below are not ordered according to their numeric value, but grouped by their symbolic representation to ease look-up.
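
Look-up in either direction can also be done programmatically.  The sketch below uses the tables `name2codepoint` and `codepoint2name` from Python's `html.entities` module, which cover the names defined in HTML 4;  characters without such a name are simply absent.

```python
from html.entities import codepoint2name, name2codepoint

print(name2codepoint["Agrave"])   # code number for the name Agrave
print(codepoint2name.get(192))    # name for code 192
print(codepoint2name.get(256))    # None: code 256 has no name in this table
```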

Diacritical characters

Latin 1 characters have values less than 256.  Characters outside Latin 1 have been ordered with their Latin 'look-alikes'.
À  192  Agrave  Capital A, grave accent    à  224  agrave  Small a, grave accent
Á  193  Aacute  Capital A, acute accent    á  225  aacute  Small a, acute accent
Â  194  Acirc  Capital A, circumflex accent    â  226  acirc  Small a, circumflex accent
Ã  195  Atilde  Capital A, tilde    ã  227  atilde  Small a, tilde
Ä  196  Auml  Capital A, dieresis or umlaut mark    ä  228  auml  Small a, dieresis or umlaut mark
Å  197  Aring  Capital A, ring    å  229  aring  Small a, ring
Æ  198  AElig  Capital AE diphthong (ligature)    æ  230  aelig  Small ae diphthong (ligature)
Ā  256    ā  257
Ă  258  Capital A with breve    ă  259  Small a with breve
Ą  260    ą  261
Ǎ  461    ǎ  462
Ǟ  478    ǟ  479
Ǡ  480    ǡ  481
Ǣ  482    ǣ  483
Ǻ  506    ǻ  507
Ǽ  508    ǽ  509
Ȁ  512    ȁ  513
Ȃ  514    ȃ  515
Ȧ  550    ȧ  551
Ɓ  385  B    ƀ  384
Ƃ  386    ƃ  387
Ƅ  388    ƅ  389
Ç  199  Ccedil  Capital C, cedilla    ç  231  ccedil  Small c, cedilla
Ć  262    ć  263
Ĉ  264    ĉ  265
Ċ  266    ċ  267
Č  268    č  269
Ɔ  390
Ƈ  391    ƈ  392
Ď  270  D    ď  271
Đ  272    đ  273
Ð  208  ETH  Capital Eth, Icelandic    ð  240  eth  Small eth, Icelandic
Ɖ  393
Ɗ  394
ƌ  396    Ƌ  395
Ǆ  452    ǅ  453
ǆ  454
Ǳ  497    ǲ  498
ǳ  499
È  200  Egrave  Capital E, grave accent    è  232  egrave  Small e, grave accent
É  201  Eacute  Capital E, acute accent    é  233  eacute  Small e, acute accent
Ê  202  Ecirc  Capital E, circumflex accent    ê  234  ecirc  Small e, circumflex accent
Ë  203  Euml  Capital E, dieresis or umlaut mark    ë  235  euml  Small e, dieresis or umlaut mark
Ē  274    ē  275
Ĕ  276    ĕ  277
Ė  278    ė  279
Ę  280    ę  281
Ě  282    ě  283
Ǝ  398    Ə  399
Ɛ  400
Ȅ  516    ȅ  517
Ȇ  518    ȇ  519
Ȩ  552    ȩ  553
Ƒ  401  F    ƒ  402  fnof  florin
Ĝ  284  G    ĝ  285
Ğ  286    ğ  287
Ġ  288    ġ  289
Ģ  290    ģ  291
Ɠ  403    ʛ  667
Ǥ  484    ǥ  485
Ǧ  486    ǧ  487
Ǵ  500    ǵ  501
Ĥ  292  H    ĥ  293
Ħ  294    ħ  295
Ȟ  542    ȟ  543
Ì  204  Igrave  Capital I, grave accent    ì  236  igrave  Small i, grave accent
Í  205  Iacute  Capital I, acute accent    í  237  iacute  Small i, acute accent
Î  206  Icirc  Capital I, circumflex accent    î  238  icirc  Small i, circumflex accent
Ï  207  Iuml  Capital I, dieresis or umlaut mark    ï  239  iuml  Small i, dieresis or umlaut mark
Ĩ  296    ĩ  297
Ī  298    ī  299
Ĭ  300    ĭ  301
Į  302    į  303
İ  304    ı  305
Ĳ  306    ĳ  307
ĺ  314    ļ  316
ľ  318    ŀ  320
ł  322    ſ  383
Ǐ  463    ǐ  464
Ȉ  520    ȉ  521
Ȋ  522    ȋ  523
Ĵ  308  J    ĵ  309
ǰ  496
Ķ  310  K    ķ  311
Ƙ  408    ƙ  409
ĸ  312  (see also Greek Kappa)
Ǩ  488    ǩ  489
Ĺ  313  L    Ļ  315
Ľ  317    Ŀ  319
Ł  321    ǉ  457
Ǉ  455    ǈ  456
Ñ  209  Ntilde  Capital N, tilde    ñ  241  ntilde  Small n, tilde
Ń  323    ń  324
Ņ  325    ņ  326
Ň  327    ň  328
ŉ  329
Ŋ  330    ŋ  331
Ɲ  413    ƞ  414
Ǹ  504    ǹ  505
Ǌ  458    ǋ  459
ǌ  460
Ò  210  Ograve  Capital O, grave accent    ò  242  ograve  Small o, grave accent
Ó  211  Oacute  Capital O, acute accent    ó  243  oacute  Small o, acute accent
Ô  212  Ocirc  Capital O, circumflex accent    ô  244  ocirc  Small o, circumflex accent
Õ  213  Otilde  Capital O, tilde    õ  245  otilde  Small o, tilde
Ö  214  Ouml  Capital O, dieresis or umlaut mark    ö  246  ouml  Small o, dieresis or umlaut mark
Ø  216  Oslash  Capital O, slash    ø  248  oslash  Small o, slash
Œ  338  OElig  Latin capital ligature OE    œ  339  oelig  Latin small ligature oe
Ō  332    ō  333
Ŏ  334    ŏ  335
Ő  336    ő  337
Ơ  416    ơ  417
Ƣ  418    ƣ  419
Ǒ  465    ǒ  466
Ǫ  490    ǫ  491
Ǭ  492    ǭ  493
Ǿ  510    ǿ  511
Ȍ  524    ȍ  525
Ȏ  526    ȏ  527
Ȫ  554    ȫ  555
Ȭ  556    ȭ  557
Ȯ  558    ȯ  559
Ȱ  560    ȱ  561
Ƥ  420  P    ƥ  421
Ŕ  340  R    ŕ  341
Ŗ  342    ŗ  343
Ř  344    ř  345
Ȑ  528    ȑ  529
Ȓ  530    ȓ  531
Ʀ  422
ʀ  640    ʁ  641
Š  352  Scaron  Latin capital letter S with caron    š  353  scaron  Latin small letter s with caron
Ś  346    ś  347
Ŝ  348    ŝ  349
Ş  350    ş  351
ß  223  szlig  Small sharp s, German sz-ligature (see also Greek Beta)
Ƨ  423    ƨ  424
Ș  536    ș  537
Þ  222  THORN  Capital THORN, Icelandic    þ  254  thorn  Small thorn, Icelandic
Ţ  354    ţ  355
Ť  356    ť  357
Ŧ  358    ŧ  359
Ț  538    ț  539
Ʈ  430    ƭ  429
Ƭ  428    ƫ  427
Ù  217  Ugrave  Capital U, grave accent    ù  249  ugrave  Small u, grave accent
Ú  218  Uacute  Capital U, acute accent    ú  250  uacute  Small u, acute accent
Û  219  Ucirc  Capital U, circumflex accent    û  251  ucirc  Small u, circumflex accent
Ü  220  Uuml  Capital U, dieresis or umlaut mark    ü  252  uuml  Small u, dieresis or umlaut mark
Ũ  360    ũ  361
Ū  362    ū  363
Ŭ  364    ŭ  365
Ů  366    ů  367
Ű  368    ű  369
Ų  370    ų  371
Ư  431    ư  432
Ǔ  467    ǔ  468
Ǖ  469    ǖ  470
Ǘ  471    ǘ  472
Ǚ  473    ǚ  474
Ǜ  475    ǜ  476
Ȕ  532    ȕ  533
Ȗ  534    ȗ  535
Ŵ  372  W    ŵ  373
Ý  221  Yacute  Capital Y, acute accent    ý  253  yacute  Small y, acute accent
Ÿ  376  Yuml  Latin capital letter Y with diaeresis    ÿ  255  yuml  Small y, dieresis or umlaut mark
Ŷ  374    ŷ  375
Ȳ  562    ȳ  563
Ɣ  404
ƴ  436    Ƴ  435
Ż  379  Z    ż  380
Ž  381    ž  382
Ź  377    ź  378
Ƶ  437    ƶ  438
Ȥ  548    ȥ  549

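Note:  Codes 128··159 are control codes in ISO 8859-1 and in Unicode.  They are nevertheless often used for printable characters (in particular on Windows systems, e.g. in code page 1252), which forms a representational problem on other systems like Unix or Mac.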
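
The difference shows up when one byte from this range is decoded both ways.  The sketch below uses byte value 128 and Python's codec names `cp1252` (Windows code page 1252) and `iso8859_1`;  the byte value is just an example.

```python
raw = bytes([0x80])                  # byte value 128

windows = raw.decode("cp1252")       # Windows code page 1252: a printable character
latin1 = raw.decode("iso8859_1")     # ISO 8859-1: a control code

print(repr(windows), ord(windows))
print(repr(latin1), ord(latin1))
```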

Greek symbols

These are Greek symbols, also used in mathematical notations.
Α  913  Alpha  Capital alpha    α  945  alpha  Small alpha
Β  914  Beta  Capital beta    β  946  beta  Small beta
Γ  915  Gamma  Capital gamma    γ  947  gamma  Small gamma
Δ  916  Delta  Capital delta    δ  948  delta  Small delta
Ε  917  Epsilon  Capital epsilon    ε  949  epsilon  Small epsilon
Ζ  918  Zeta  Capital zeta    ζ  950  zeta  Small zeta
Η  919  Eta  Capital eta    η  951  eta  Small eta
Θ  920  Theta  Capital theta    θ  952  theta  Small theta
ϑ  977  thetasym  Small theta symbol
Ι  921  Iota  Capital iota    ι  953  iota  Small iota
Κ  922  Kappa  Capital kappa    κ  954  kappa  Small kappa
Λ  923  Lambda  Capital lambda    λ  955  lambda  Small lambda
Μ  924  Mu  Capital mu    μ  956  mu  Small mu
µ  181  micro  Micro sign
Ν  925  Nu  Capital nu    ν  957  nu  Small nu
Ξ  926  Xi  Capital xi    ξ  958  xi  Small xi
Ο  927  Omicron  Capital omicron    ο  959  omicron  Small omicron
Π  928  Pi  Capital pi    π  960  pi  Small pi
ϖ  982  piv  Pi symbol
Ρ  929  Rho  Capital rho    ρ  961  rho  Small rho
Σ  931  Sigma  Capital sigma    ς  962  sigmaf  Small final sigma
σ  963  sigma  Small sigma
Τ  932  Tau  Capital tau    τ  964  tau  Small tau
Υ  933  Upsilon  Capital upsilon    υ  965  upsilon  Small upsilon
ϒ  978  upsih  Upsilon with hook symbol
Φ  934  Phi  Capital phi    φ  966  phi  Small phi
Χ  935  Chi  Capital chi    χ  967  chi  Small chi
Ψ  936  Psi  Capital psi    ψ  968  psi  Small psi
Ω  937  Omega  Capital omega    ω  969  omega  Small omega
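
The table lists both the micro sign (181) and the Greek small mu (956).  They look alike but are different characters, as the sketch below shows with Python's `unicodedata` module;  the NFKC normalisation step is an extra illustration that the table itself does not mention.

```python
import unicodedata

micro = chr(181)   # micro sign (Latin 1)
mu = chr(956)      # Greek small letter mu

print(micro == mu)                                      # different characters
print(unicodedata.name(micro), "/", unicodedata.name(mu))
print(unicodedata.normalize("NFKC", micro) == mu)       # compatibility form maps micro to mu
print(chr(913).lower() == chr(945))                     # capital alpha lowercases to small alpha
```

A text search for one of the two will not find the other unless the text is normalised first.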

Special Symbols

   160  nbsp  Non-Break Space    -  173  shy  Soft hyphen
¶  182  para  Paragraph sign    §  167  sect  Section sign
–  8211  ndash  'n'-dash    —  8212  mdash  'm'-dash
   8194  ensp  'n'-space       8195  emsp  'm'-space
   8201  thinsp  thin space    ‑  8209  non-breaking hyphen
   8204  zwnj  zero width non-joiner       8205  zwj  zero width joiner
   8206  lrm  left-to-right mark       8207  rlm  right-to-left mark
•  8226  bull  Bullet = black small circle    …  8230  hellip  Horizontal ellipsis = three dot leader
·  183  middot  Middle dot    ¤  164  curren  General currency sign
°  176  deg  Degree sign    £  163  pound  Pound sterling
¹  185  sup1  Note 1    ¢  162  cent  Cent sign
²  178  sup2  Note 2, square    ¥  165  yen  Yen sign
³  179  sup3  Note 3, cubic    €  8364  euro  Euro sign
‘  8216  lsquo  Left single quotation mark    ’  8217  rsquo  Right single quotation mark
‚  8218  sbquo  Single bottom 9 quotation mark    „  8222  bdquo  Double bottom 9 quotation mark
“  8220  ldquo  Left double quotation mark    ”  8221  rdquo  Right double quotation mark
†  8224  dagger  Dagger    ‡  8225  Dagger  Double dagger
¡  161  iexcl  Inverted exclamation mark    ¿  191  iquest  Inverted question mark
ª  170  ordf  Feminine ordinal    º  186  ordm  Masculine ordinal
‹  8249  lsaquo  Single left-pointing angle quotation mark    ›  8250  rsaquo  Single right-pointing angle quotation mark
«  171  laquo  Left double angle quote, guillemet-left    »  187  raquo  Right double angle quote, guillemet-right
ˆ  710  circ  modifier letter circumflex accent    ˜  732  tilde  Small tilde
´  180  acute  Acute accent    ¨  168  uml  Dieresis
¸  184  cedil  Cedilla    ‰  8240  permil  Per mille sign
½  189  frac12  Fraction one-half    ©  169  copy  Copyright
¼  188  frac14  Fraction one-fourth    ®  174  reg  Registered trademark
¾  190  frac34  Fraction three-fourths    ™  8482  trade  Trade mark sign
¦  166  brvbar  Broken vertical bar    ¯  175  macr  Macron accent
×  215  times  Multiply sign    ÷  247  divide  Division sign
±  177  plusmn  Plus or minus    ¬  172  not  Not sign
′  8242  prime  prime = minutes = feet    ″  8243  Prime  Double prime = seconds = inches
ℜ  8476  real  Blackletter capital R = real part symbol    ℑ  8465  image  Blackletter capital I = imaginary part
‾  8254  oline  Overline = spacing overscore    ⁄  8260  frasl  Fraction slash
℘  8472  weierp  Script capital P = power set = Weierstrass p    ℵ  8501  alefsym  Alef symbol = first transfinite cardinal
←  8592  larr  Leftwards arrow    ⇐  8656  lArr  Leftwards double arrow
↑  8593  uarr  Upwards arrow    ⇑  8657  uArr  Upwards double arrow
→  8594  rarr  Rightwards arrow    ⇒  8658  rArr  Rightwards double arrow
↓  8595  darr  Downwards arrow    ⇓  8659  dArr  Downwards double arrow
↔  8596  harr  Left right arrow    ⇔  8660  hArr  Left right double arrow
↵  8629  crarr  Downwards arrow with corner leftwards = carriage return
∀  8704  forall  for all    ∃  8707  exist  there exists
∈  8712  isin  element of    ∉  8713  notin  not an element of
∋  8715  ni  contains as member    ∅  8709  empty  empty set = null set = diameter
∩  8745  cap  intersection = cap    ∪  8746  cup  union = cup
∂  8706  part  partial differential    ∫  8747  int  integral
∇  8711  nabla  nabla = backward difference    ⋅  8901  sdot  dot operator
∏  8719  prod  n-ary product = product sign    ∑  8721  sum  n-ary summation
−  8722  minus  minus sign    ∗  8727  lowast  asterisk operator
∧  8743  and  logical and = wedge    ∨  8744  or  logical or = vee
√  8730  radic  square root = radical sign    ∞  8734  infin  infinity
∝  8733  prop  proportional to    ⊥  8869  perp  up tack = orthogonal to = perpendicular
∠  8736  ang  angle    ◊  9674  loz  lozenge
∼  8764  sim  tilde operator = varies with = similar to    ∴  8756  there4  therefore
≅  8773  cong  approximately equal to    ≈  8776  asymp  almost equal to = asymptotic to
≠  8800  ne  not equal to    ≡  8801  equiv  identical to
≤  8804  le  less-than or equal to    ≥  8805  ge  greater-than or equal to
⊂  8834  sub  subset of    ⊃  8835  sup  superset of
⊄  8836  nsub  not a subset of
⊆  8838  sube  subset of or equal to    ⊇  8839  supe  superset of or equal to
⊕  8853  oplus  circled plus = direct sum    ⊗  8855  otimes  circled times = vector product
⌈  8968  lceil  left ceiling = apl upstile    ⌉  8969  rceil  right ceiling
⌊  8970  lfloor  left floor = apl downstile    ⌋  8971  rfloor  right floor
〈  9001  lang  left-pointing angle bracket = bra    〉  9002  rang  right-pointing angle bracket = ket
♠  9824  spades  Black spade suit    ♣  9827  clubs  Black club suit = shamrock
♥  9829  hearts  Black heart suit = valentine    ♦  9830  diams  Black diamond suit
☺  9786  Smiley
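
Text containing such symbols can be turned into pure-ASCII HTML with decimal references in one step.  The sketch below relies on the `xmlcharrefreplace` error handler of Python's `str.encode`;  the sample text is made up, and the reserved characters (&, <, >) would still have to be escaped separately.

```python
import html

text = "Price: 5 \u20ac \u00b1 10 \u2030"   # euro sign, plus-minus sign, per mille sign

# Every non-ASCII character is replaced by its decimal character reference.
ascii_html = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(ascii_html)
print(html.unescape(ascii_html) == text)    # decoding restores the original text
```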

Box Drawing Symbols

┌─┬┐
│ ││
├─┼┤
└─┴┘
9484 9472 9516 9488
9474      9474 9474
9500 9472 9532 9508
9492 9472 9524 9496
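
The box above can be rebuilt from the listed code numbers.  In this sketch the gap in the second row is filled with an ordinary space (code 32), which is an assumption about the original layout.

```python
rows = [
    [9484, 9472, 9516, 9488],
    [9474, 32, 9474, 9474],    # 32 = space, assumed for the gap
    [9500, 9472, 9532, 9508],
    [9492, 9472, 9524, 9496],
]

# Convert every code number to its character and join each row into a line.
box = ["".join(chr(code) for code in row) for row in rows]
print("\n".join(box))
```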
 

=O=