character entities

by llizard (aka ejm)

recommended for encoding . standard characters . cautionary tale about extended characters

Because the symbols " & < > (double quotation mark, ampersand, less-than symbol and greater-than symbol) are integral to HTML code itself, you should replace every instance of " & < > that is used for anything other than the code itself. It is also useful to employ character entities to replace characters in e-mail addresses that appear on your webpages. This can help to prevent robots from mining e-mail addresses for spam purposes. In your HTML coding, replace the character with &#numberofcharacter; (ampersand, hashmark, number of the character, semicolon). For example, instead of &, you would type &#038; (or its named equivalent, &amp;).
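Here is a rough sketch of those replacements. The sentence is made up for illustration; the entities are the ones described above.

```html
<p>She said &quot;x &lt; y &amp; y &gt; z&quot;</p>
```

In a browser, this should display as: She said "x < y & y > z"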

edited August 2008

N.B. The folks at the W3C now recommend that if accented characters are to be used, the UTF-8 charset be chosen rather than iso-8859-1. But the characters " & < > (double quotation mark, ampersand, less-than symbol and greater-than symbol) should still be encoded, as well as any extended characters you may be using. Put the following coding in the head section, before </head>. (Keep the head section short so that the line stays near the top of the file: browsers only scan the first part of a page when looking for it.)

<meta http-equiv="content-type" content="text/html; charset=utf-8" />
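For illustration only, here is a bare-bones page skeleton showing where that line sits. The title and the paragraph text are placeholders, not part of the W3C recommendation.

```html
<html>
<head>
<title>my page</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
</head>
<body>
<p>With UTF-8, caf&#233; and café should both display the same way.</p>
</body>
</html>
```

This assumes the file itself is actually saved as UTF-8; if it is saved in some other encoding, the directly typed é may not display correctly, while the &#233; version should.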

It is probably still a good idea to encode the @ symbol, especially if the mailto: link is used. Please see the “recommended for encoding” section.
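A sketch of what that might look like, using a made-up address at example.com. The @ is written as &#064; both in the link and in the visible text. Browsers decode the entity, so the link should still work when clicked. How much this actually slows down address-harvesting robots is hard to say.

```html
<a href="mailto:someone&#064;example.com">someone&#064;example.com</a>
```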

(Read more on the W3C Internationalization page.)

Note that extended characters can be typed as-is on some blogging platforms. The blog software will translate the characters into character entities as required for viewing correctly on HTML pages, but leave them alone for RSS feeds. (Indeed, in moving these pages to wordpress.com, getting this page to display has been quite challenging. The software wants to automatically change the coding instructions into the actual characters….)

ASCII stands for “American Standard Code for Information Interchange”. The numbers of the characters can be found by referring to the character map on your computer. The characters numbered from #032 to #126 are common to all keyboard systems. (#032 is the ordinary space typed with the space bar; it is not the entity to use for a non-breaking space. Use &nbsp; for that.)

recommended for encoding

characters to be replaced with entities for display in HTML

use these character entities – strongly recommended

Character & Name     Character Entity
_______________________________________
&   ampersand            &amp;      
<   lessthan             &lt;       
>   greaterthan          &gt;       
"   quotationmark        &quot;     
@   at                   &#064;     
©   Copyright            &copy;     

Use &nbsp; to indicate 
a nonbreaking space in HTML.

encoding examples using some of the above characters
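A few made-up lines, as a sketch of how the entities in the table above might be used together (the names and wording are placeholders):

```html
<p>&copy;&nbsp;2008 Jane Doe</p>
<p>Tom &amp; Jerry said &quot;5 &gt; 3 and 3 &lt; 5&quot;</p>
<p>Write to jane&#064;example.com</p>
```

In a browser these should display as © 2008 Jane Doe (with the © and the year kept together on one line), then Tom & Jerry said "5 > 3 and 3 < 5", then Write to jane@example.com.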

extended characters – A Cautionary Tale

The characters numbered from #127 to #255 (and higher) are not common to all keyboard systems and can look quite different on different operating systems. Please bear that in mind when you use these characters.

Here is an example using the character #189. It will look different depending on whether you are viewing this page on a PC or a Mac or a….

According to the character map on my PC, #189 in the “Symbol” font should show a “vertical bar”.

Symbol font #189: ½

But when I look at it here on the webpage, even though this computer has the Symbol font installed, I see a “one half” symbol in Netscape 7, Firefox and Opera. The “vertical bar” only appears in IE6 and the ancient and little-used NS4. See “standard characters” for the character entity for | (vertical bar).

Here is the character #189 in a “sans-serif” font: ½

In this case, on my PC, as expected, I see a “one half” symbol, but depending on your OS, you might be seeing an asterisk, or the symbol for Pi, or ?, or ….

“wingdings” #189: ½
On my PC, I see an “analogue clock” symbol showing 07:00. But anyone who doesn’t have the wingdings font installed will see “one half” (or maybe an asterisk, or the symbol for Pi, or….)

Moral of the story: It is inadvisable to use specific fonts for symbols. If you really want the characters you use on your website to be viewed relatively globally, it’s a very good idea to follow the guidelines at www.w3.org. If you plan on using extended characters, you might want to use images rather than entities unless you know categorically that your viewers will be able to see them.
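To make the moral concrete, here is a hedged sketch. The first line depends on the viewer having a particular font. The next two ask for the character itself by number. The last follows the image suggestion (the file name one-half.gif is hypothetical).

```html
<!-- fragile: what appears depends on the fonts installed -->
<font face="Symbol">&#189;</font>

<!-- safer: ask for the character by number, with no special font -->
<p>one half: &#189;</p>
<p>vertical bar: &#124;</p>

<!-- image fallback; the alt text is shown if the image is not -->
<img src="one-half.gif" alt="1/2" />
```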

standard characters

These characters will look the same from computer to computer.
(Use a fixed-width font for ASCII-art and charts)

Courier, FixedSys, Monaco and the generic monospace are examples of non-proportional or fixed-width fonts and are used for drawing ASCII-art and making charts. It is ill-advised to use Times New Roman, Times, Symbol or any other proportional-width font for drawing ASCII-art or making charts.
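As a sketch of how that might look on a page, here is a small made-up piece of ASCII-art inside a pre element, with a fixed-width font requested through an inline style. Note that the < and > in the drawing are still written as entities.

```html
<pre style="font-family: Courier, Monaco, monospace;">
 /\_/\
( o.o )
 &gt; ^ &lt;
</pre>
```

The pre element keeps the spaces and line breaks exactly as typed, which is what lets the drawing line up.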

Ch  Entity    Ch  Entity    Ch  Entity    Ch  Entity    Ch  Entity
__________________________________________________________________
    &#032;    3   &#051;    F   &#070;    Y   &#089;    l   &#108;
!   &#033;    4   &#052;    G   &#071;    Z   &#090;    m   &#109;
"   &#034;    5   &#053;    H   &#072;    [   &#091;    n   &#110;
#   &#035;    6   &#054;    I   &#073;    \   &#092;    o   &#111;
$   &#036;    7   &#055;    J   &#074;    ]   &#093;    p   &#112;
%   &#037;    8   &#056;    K   &#075;    ^   &#094;    q   &#113;
&   &#038;    9   &#057;    L   &#076;    _   &#095;    r   &#114;
'   &#039;    :   &#058;    M   &#077;    `   &#096;    s   &#115;
(   &#040;    ;   &#059;    N   &#078;    a   &#097;    t   &#116;
)   &#041;    <   &#060;    O   &#079;    b   &#098;    u   &#117;
*   &#042;    =   &#061;    P   &#080;    c   &#099;    v   &#118;
+   &#043;    >   &#062;    Q   &#081;    d   &#100;    w   &#119;
,   &#044;    ?   &#063;    R   &#082;    e   &#101;    x   &#120;
-   &#045;    @   &#064;    S   &#083;    f   &#102;    y   &#121;
.   &#046;    A   &#065;    T   &#084;    g   &#103;    z   &#122;
/   &#047;    B   &#066;    U   &#085;    h   &#104;    {   &#123;
0   &#048;    C   &#067;    V   &#086;    i   &#105;    |   &#124;
1   &#049;    D   &#068;    W   &#087;    j   &#106;    }   &#125;
2   &#050;    E   &#069;    X   &#088;    k   &#107;    ~   &#126;

#032 is the space bar.

Some people prefer named entities rather than
numbered because they are easier to remember.

"  quotation mark      &#034;  or   &quot;   
&  ampersand           &#038;  or   &amp;    
'  single quote        &#039;  or  &apos;    
<  less than           &#060;  or   &lt;     
>  greater than        &#062;  or   &gt;     

N.B. &apos; is NOT fully supported (it is not part of HTML 4, and some older browsers display it literally).
Use &#039; instead.
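As a small sketch of why the apostrophe sometimes needs encoding: inside an attribute value wrapped in single quotes, a raw ' would end the value early. The link target and title here are made up.

```html
<a href='faq.html' title='the author&#039;s FAQ'>FAQ</a>
```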

 

Other character sets can be found at www.w3.org – Internationalization page.

© llizard (aka ejm) 1998, 2000, 2001, 2003, 2004, 2005, 2006, 2008, 2015

(Yes, the above © symbol is displayed on this page by using the entity &copy;)

