Wednesday, April 6, 2011

Encoding and Decoding entities in HTML and XHTML documents


HTMLEntities is a simple library to facilitate encoding and decoding of named (&yacute; and so on) or numerical (&#123; or &#x12a;) entities in HTML and XHTML documents.
The current release is version 4.1.0.

Changes since version 4.0.0

  • Now works with Ruby 1.9.1 and JRuby 1.3.1.
  • Reverted lazy loading of entity mappings as this is not thread-safe.
  • Finally removed the deprecated String#encode_entities and String#decode_entities methods.
  • Added the :expanded charset, which covers about 1000 SGML entities.
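
The thread-safety item refers to a general trade-off: a table loaded lazily on first use (for example with @map ||= build_map) can be built more than once when several threads reach it at the same time, whereas a table built eagerly at load time and then frozen is safe to share. The plain-Ruby sketch below illustrates the eager pattern only; it is not HTMLEntities' own code, and ENTITY_MAP and lookup are hypothetical names.

```ruby
# Sketch of eager loading. ENTITY_MAP is a hypothetical stand-in for a
# real entity table (entity name => Unicode code point).
#
# The lazy alternative, @map ||= build_map, is a check-then-set: two
# threads can both see @map as nil and both start building it, and a
# loader that assigns the Hash before filling it can expose a half-built
# table to other threads.
ENTITY_MAP = {
  "lt"     => 0x3C,
  "gt"     => 0x3E,
  "amp"    => 0x26,
  "eacute" => 0xE9
}.freeze # built once, when the file is loaded; frozen so no thread can mutate it

def lookup(name)
  ENTITY_MAP[name] # read-only access to a frozen Hash needs no lock
end

# Several threads reading at once all see the same complete table.
threads = Array.new(8) { Thread.new { lookup("eacute") } }
results = threads.map(&:value)
puts results.uniq.inspect # prints [233]
```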

Usage

If you are running the examples below in irb, make sure that you are using a UTF-8 terminal and, on Ruby 1.8, have set $KCODE to 'u' to enable the display of UTF-8 characters.
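
On Ruby 1.9 and later each string carries its own encoding, so $KCODE is no longer used; a UTF-8 terminal still matters because characters such as é occupy more than one byte. This short check uses only core String methods:

```ruby
# encoding: utf-8
# On Ruby 1.9+ the string literal below is UTF-8 because of the magic
# comment above (and the default source encoding on Ruby 2.0+).
s = "élan"
puts s.encoding  # prints UTF-8
puts s.length    # prints 4 (characters)
puts s.bytesize  # prints 5 (bytes): "é" is encoded as two bytes in UTF-8
```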

Decoding

require 'htmlentities'
coder = HTMLEntities.new
string = "&eacute;lan"
coder.decode(string) # => "élan"
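
Decoding maps each entity back to a Unicode code point: a decimal entity such as &#123; gives the code point directly, a hexadecimal one such as &#x12a; gives it in base 16, and a named one such as &eacute; is looked up in a table. The plain-Ruby sketch below shows that idea; it is not HTMLEntities' own code, and mini_decode and its six-entry NAMED_CODEPOINTS table are hypothetical.

```ruby
# encoding: utf-8
# Hypothetical excerpt of a named-entity table (name => code point).
NAMED_CODEPOINTS = {
  "lt" => 0x3C, "gt" => 0x3E, "amp" => 0x26, "quot" => 0x22,
  "eacute" => 0xE9, "yacute" => 0xFD
}.freeze

# Matches &#DDD;, &#xHHH; (or &#XHHH;) and &name;
ENTITY_PATTERN = /&(?:#(\d+)|#[xX](\h+)|([A-Za-z][A-Za-z0-9]*));/

# Replace decimal, hexadecimal and known named entities with characters.
def mini_decode(str)
  str.gsub(ENTITY_PATTERN) do
    cp = if $1 then $1.to_i          # &#123;   -> 123
         elsif $2 then $2.to_i(16)   # &#x12a;  -> 298
         else NAMED_CODEPOINTS[$3]   # &eacute; -> 233, or nil if unknown
         end
    # Leave unknown names and numbers beyond U+10FFFF as they were.
    cp && cp <= 0x10FFFF ? [cp].pack("U") : $&
  end
end

puts mini_decode("&eacute;lan")        # prints élan
puts mini_decode("&#123; or &#x12a;")  # prints { or Ī
```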

Encoding

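This is slightly more complicated because of the options involved. The encode method takes a variable number of instruction arguments that tell it which kinds of entities to produce. With no instructions, it escapes only the basic markup characters, such as < and >.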
require 'htmlentities'
coder = HTMLEntities.new
string = "<élan>"
coder.encode(string)               # => "&lt;élan&gt;"
coder.encode(string, :named)       # => "&lt;&eacute;lan&gt;"
coder.encode(string, :decimal)     # => "&#60;&#233;lan&#62;"
coder.encode(string, :hexadecimal) # => "&#x3c;&#xe9;lan&#x3e;"
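
Each instruction is a choice of output form for the characters that get escaped. The plain-Ruby sketch below reproduces the four results expected for "<élan>"; it is a simplification, not HTMLEntities' implementation: mini_encode and its two tables are hypothetical, it accepts a single instruction rather than a variable list, its basic set covers only <, >, & and ", and its named table knows just two accented letters.

```ruby
# encoding: utf-8
# Hypothetical tables: characters this sketch always escapes, and a tiny
# named-entity table (character => entity name).
BASIC = { "<" => "lt", ">" => "gt", "&" => "amp", '"' => "quot" }.freeze
NAMED = BASIC.merge("é" => "eacute", "ý" => "yacute").freeze

# Matches markup characters and non-ASCII characters; :basic leaves the
# non-ASCII ones unchanged, the other instructions turn them into entities.
def mini_encode(str, instruction = :basic)
  str.gsub(/[<>&"]|[^\x00-\x7F]/) do |char|
    cp = char.ord
    case instruction
    when :basic       then BASIC[char] ? "&#{BASIC[char]};" : char
    when :named       then NAMED[char] ? "&#{NAMED[char]};" : "&##{cp};"
    when :decimal     then "&##{cp};"
    when :hexadecimal then "&#x#{cp.to_s(16)};"
    else raise ArgumentError, "unknown instruction: #{instruction.inspect}"
    end
  end
end

string = "<élan>"
puts mini_encode(string)               # prints &lt;élan&gt;
puts mini_encode(string, :named)       # prints &lt;&eacute;lan&gt;
puts mini_encode(string, :decimal)     # prints &#60;&#233;lan&#62;
puts mini_encode(string, :hexadecimal) # prints &#x3c;&#xe9;lan&#x3e;
```

Falling back to a decimal entity under :named for characters missing from the table keeps the output valid even when the table is incomplete.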

Documentation

For more details, you can browse the rdoc-generated documentation online.

Getting it

HTMLEntities is available as a tarball or gem on its RubyForge files page.
You should also be able to install it using RubyGems:
gem install htmlentities
The current source code can be found on GitHub as threedaymonk/htmlentities.
