by llizard (aka ejm)
character entities for HTML
Because the symbols " & < > (double quotation mark, ampersand, less-than symbol and greater-than symbol) are integral to HTML code itself, you should replace every instance of " & < > that is used for anything other than the code itself. It is also useful to employ character entities to replace characters in e-mail addresses that are entered on your webpages. This helps greatly in preventing robots from mining e-mail addresses for spam purposes. In your HTML coding, replace the character with &#numberofcharacter; (ampersand, hash mark, number of the character, semicolon). Instead of &, you would type &#038;
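As a small illustration of that replacement, here is a sketch of one paragraph whose visible text contains all four symbols. The sentence is made up for this example; only the character numbers (#034, #038, #060 and #062) matter.

```html
<!-- The visible text reads: She said "fish & chips" and then typed 3 < 5 and 5 > 3. -->
<p>She said &#034;fish &#038; chips&#034; and then typed 3 &#060; 5 and 5 &#062; 3.</p>
```

The browser turns each entity back into its character for display, so the reader sees the symbols while the code itself contains none of them.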
edited August 2008
N.B. The folks at the W3C now recommend that, if accented characters are to be used, the UTF-8 charset be chosen rather than iso-8859-1. But the characters " & < > (double quotation mark, ampersand, less-than symbol and greater-than symbol) should still be encoded, as well as any extended characters you may be using. Put the following coding in the <head> section of your page:
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
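For context, here is a minimal sketch of a whole page with that declaration in place. The title and the paragraph are placeholders invented for this example. Putting the declaration ahead of the title is the usual convention rather than anything this page prescribes.

```html
<!DOCTYPE html>
<html>
<head>
<!-- Declare the charset early, before any text the browser has to decode. -->
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>Example page</title>
</head>
<body>
<!-- Extended characters are still written as entities:
     &#233; is e-acute, &#232; is e-grave, &#251; is u-circumflex. -->
<p>caf&#233; &#038; cr&#232;me br&#251;l&#233;e</p>
</body>
</html>
```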
It is probably still a good idea to encode the @ symbol, especially if the mailto: link is used. Please see the “recommended for encoding” section.
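A sketch of what an encoded mailto: link might look like. The address someone@example.com is a placeholder, and encoding the full stop as well as the @ is an extra step chosen for this example.

```html
<!-- &#064; is the @ symbol and &#046; is the full stop. The browser decodes both,
     so the link still works when it is clicked. -->
<a href="mailto:someone&#064;example&#046;com">someone&#064;example&#046;com</a>
```

A robot that scans the raw code for the @ symbol will not find one here, although a robot that decodes entities first would still see the address.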
(Read more on the W3C Internationalization page.)
Note that extended characters can be typed as-is on some blogging platforms. The blog software will translate the characters into character entities as required for viewing correctly on HTML pages, but leave them alone for RSS feeds. (Indeed, in moving these pages to wordpress.com, getting this page to display has been quite challenging. The software wants to automatically change the coding instructions into the actual characters….)
ASCII stands for “American Standard Code for Information Interchange”. The numbers of the characters can be found by referring to the character map on your computer. The characters numbered from #032 to #126 are common to all keyboard systems. (#032 is the spacebar and is not the recommended character entity to depict a non-breaking space.)
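To show the difference, here is a short sketch comparing #032 with #160, which is the character defined as the non-breaking space. The sentence is invented for this example.

```html
<!-- &#032; is an ordinary space: the browser may wrap the line between "10" and "km". -->
<p>The trail is 10&#032;km long.</p>

<!-- &#160; is the non-breaking space: "10" and "km" stay together on one line. -->
<p>The trail is 10&#160;km long.</p>
```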
recommended for encoding
extended characters – A Cautionary Tale
The characters numbered from #127 to #255 (and higher) are not common to all keyboard systems and can look quite different on different operating systems. Please bear that in mind when you use these characters.
Here is an example using the character #189. It will look different depending on whether you are viewing this page on a PC or a Mac or a….
After looking at the character map on my PC, it looks like #189 in “Symbol” font will show a “vertical bar”.
Symbol font #189: ½
But when I look at it here on the webpage, even though this computer has the symbol font installed, I see a “one half” symbol in Netscape7, Firefox and Opera. The “vertical bar” only appears in IE6 and the ancient and little-used NS4. See “standard characters” for the character entity for | (vertical bar)
Here is the character #189 in a “sans-serif” font: ½
In this case, on my PC, as expected, I see a “one half” symbol, but depending on your OS, you might be seeing an asterisk, or the symbol for Pi, or ?, or ….
“wingdings” #189: ½
On my PC, I see an “analogue clock” symbol showing 07:00. But anyone who doesn’t have the wingdings font installed will see “one half” (or maybe an asterisk, or the symbol for Pi, or….)
Moral of the story: It is inadvisable to use specific fonts for symbols. If you really want the characters you use on your website to be viewed relatively globally, it’s a very good idea to follow the guidelines at www.w3.org. If you plan on using extended characters, you might want to use images rather than entities unless you know categorically that your viewers will be able to see them.
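The moral can be sketched in code as follows. The font tag is only an assumption about how a font-specific symbol might have been coded, and the image file name clock-0700.gif is hypothetical.

```html
<!-- Depends on the reader's fonts: may show a vertical bar, a clock face or a one half. -->
<font face="Symbol">&#189;</font>

<!-- Does not depend on a symbol font: each character is coded by its own number. -->
<p>&#124; is the vertical bar and &#189; is one half.</p>

<!-- For a picture-like symbol, an image with alt text gives every reader something. -->
<img src="clock-0700.gif" alt="analogue clock showing 07:00" />
```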
Other character sets can be found at www.w3.org – Internationalization page.
© llizard (aka ejm) 1998, 2000, 2001, 2003, 2004, 2005, 2006, 2008, 2015
(Yes, the above © symbol is displayed on this page by using the entity &#169;)