| ERW: The Manual | ||
|---|---|---|
| Localisation | ||
Natively, ERW runs using ISO-8859-1. This is the standard character encoding for most PHP installations, and for several databases. However, ISO-8859-1 is a strongly limiting factor if you plan to manage data in languages that its code points do not cover.
ERW fully supports UTF-8. As usual when character encodings are involved, however, the setup is not entirely simple. The first step is to make PHP UTF-8 aware. This requires building PHP with the mbstring extension, which allows strings in several encodings to be handled transparently. Unfortunately, several distributions do not ship PHP with the mbstring extension enabled, so you may have to recompile PHP (although with a decent packaging system this should turn out to be rather easy). To check whether your PHP installation has mbstring, look at the output of php -m.
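The same check can also be made from inside PHP, which is useful when the web server runs a different PHP build than the command line. The following is a minimal sketch; the function name mbstring_status is purely illustrative and is not part of ERW.

```php
<?php
// Report whether the mbstring extension is available to this PHP build.
// (mbstring_status is a hypothetical helper, not an ERW function.)
function mbstring_status(): string
{
    if (!extension_loaded('mbstring')) {
        return 'mbstring is NOT loaded';
    }
    // Called without arguments, mb_internal_encoding() returns the
    // current internal encoding as a string.
    return 'mbstring is loaded; internal encoding: ' . mb_internal_encoding();
}

echo mbstring_status(), PHP_EOL;
```

Placing this in a small script served by the web server shows what the server-side PHP actually has loaded.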
Once the mbstring extension has been loaded, you
must enable UTF-8 support by setting the configuration variable $_ERW_utf8 and configure
PHP to use UTF-8 by modifying the following parameter in the PHP
configuration file:
mbstring.func_overload = 7
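The value 7 is a bit mask: according to the PHP manual, 1 overloads mail(), 2 overloads the str*() functions, and 4 overloads the ereg*() functions, so 7 enables all three. The sketch below decodes the value that is actually in effect. The helper name overloaded_groups is hypothetical. Note also that this setting was deprecated in PHP 7.2 and removed in PHP 8.0, where ini_get() reports it as unknown.

```php
<?php
// Decode an mbstring.func_overload value into the function groups it overloads.
// (overloaded_groups is a hypothetical helper, not an ERW function.)
function overloaded_groups(int $value): array
{
    $groups = [
        1 => 'mail()',
        2 => 'str*() functions',
        4 => 'ereg*() functions',
    ];
    $active = [];
    foreach ($groups as $bit => $label) {
        if ($value & $bit) {
            $active[] = $label;
        }
    }
    return $active;
}

// ini_get() returns false when the option does not exist in this build.
$raw = ini_get('mbstring.func_overload');
if ($raw === false) {
    echo 'mbstring.func_overload is not available in this PHP build', PHP_EOL;
} else {
    $active = overloaded_groups((int) $raw);
    echo 'Overloaded: ', $active ? implode(', ', $active) : 'nothing', PHP_EOL;
}
```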
ERW will generate a suitable HTML tag that informs the browser that its output is encoded as UTF-8. However, it is good practice to communicate this information before the page is actually output, using an HTTP header; this can be achieved by setting the following PHP configuration variable:
default_charset = "UTF-8"
default_charset can also be changed
locally, using an .htaccess file. Make sure you have the
required permissions (AllowOverride Options), then
add the line
php_value default_charset UTF-8
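To confirm which charset PHP will announce, the effective value of default_charset can be read at run time. The sketch below also shows a fallback that is not described in this manual: sending the Content-Type header explicitly from a script, which may help when neither the PHP configuration file nor .htaccess can be changed. The helper name content_type_header is hypothetical.

```php
<?php
// Build the HTTP header line announcing an HTML page in the given charset.
// (content_type_header is a hypothetical helper, not an ERW function.)
function content_type_header(string $charset): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Read the charset PHP is configured to announce; fall back to UTF-8
// if the setting is empty or unavailable.
$charset = ini_get('default_charset');
if ($charset === false || $charset === '') {
    $charset = 'UTF-8';
}

// Headers can only be sent before any page output has started.
if (!headers_sent()) {
    header(content_type_header($charset));
}
```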
The rationale behind UTF-8 support is that everything should be in UTF-8. Thus, with UTF-8 support activated, web pages are output in UTF-8, input from the browser is read in UTF-8, and data exchange with the database is performed using UTF-8 strings. This affects a number of settings, ranging from the DBMS internal encoding to the terminal encoding, all of which must be set up correctly.
Note that an 8-bit clean database will usually work flawlessly with UTF-8 support, as UTF-8 has several good properties (e.g., lexicographical byte-by-byte ordering coincides with lexicographical character-by-character ordering, that is, ordering by code point).
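The ordering property can be checked directly. The sketch below compares two UTF-8 strings byte by byte, then compares them again after conversion to UTF-32BE, where every character occupies four big-endian bytes, so a byte comparison is equivalent to a code point comparison. It assumes the mbstring extension is loaded and PHP 7 or later (for the \u{...} escapes); the helper name same_order is hypothetical.

```php
<?php
// Return true when byte-by-byte order of two UTF-8 strings agrees with
// their code point order. (same_order is a hypothetical helper.)
function same_order(string $a, string $b): bool
{
    // strcmp() is binary safe and compares raw bytes; "<=> 0" reduces
    // its result to -1, 0 or 1 regardless of PHP version.
    $byteOrder = strcmp($a, $b) <=> 0;

    // In UTF-32BE each code point is a fixed-width big-endian integer,
    // so comparing bytes there compares code points.
    $a32 = mb_convert_encoding($a, 'UTF-32BE', 'UTF-8');
    $b32 = mb_convert_encoding($b, 'UTF-32BE', 'UTF-8');
    $codePointOrder = strcmp($a32, $b32) <=> 0;

    return $byteOrder === $codePointOrder;
}

// U+00E9 (e with acute), U+20AC (euro sign), U+1D11E (musical G clef).
var_dump(same_order('zebra', "\u{E9}cole"));
var_dump(same_order("\u{E9}", "\u{20AC}"));
var_dump(same_order("\u{20AC}", "\u{1D11E}"));
```

This is why a database that merely stores and sorts bytes without interpreting them can still return UTF-8 strings in a sensible order.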
Note: By default, ERtool will generate definition files using ISO-8859-1. As long as all of your labels, enumerative types, etc. are pure US-ASCII, this is not a problem. If, however, you plan to use arbitrary UTF-8 characters in your labels, you should use the UTF-8 backend when producing definition files with ERtool.
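The reason US-ASCII labels are safe is that ASCII characters have the same byte representation in ISO-8859-1 and in UTF-8, whereas any other ISO-8859-1 character is a single byte above 0x7F that is not valid UTF-8 on its own. The sketch below illustrates this with plain mbstring calls. It says nothing about how ERtool itself writes its files, and the sample labels are made up.

```php
<?php
// A label made only of US-ASCII characters: identical bytes in both encodings.
$asciiLabel = 'cafe';

// The same word with an e-acute, encoded as ISO-8859-1 (single byte 0xE9).
$latin1Label = "caf\xE9";

// Pure ASCII is always valid UTF-8.
var_dump(mb_check_encoding($asciiLabel, 'UTF-8'));

// A lone 0xE9 byte is not a valid UTF-8 sequence.
var_dump(mb_check_encoding($latin1Label, 'UTF-8'));

// Converting explicitly yields the two-byte UTF-8 form 0xC3 0xA9.
$utf8Label = mb_convert_encoding($latin1Label, 'UTF-8', 'ISO-8859-1');
var_dump(bin2hex($utf8Label));
```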