Thursday, October 28, 2010

character encoding woes

Using CakePHP 1.3.5.

In a new app, I've just noticed an encoding problem. The description
of one of my Book records contains some accented characters (à, é)
that display as the dreaded black diamond with a question mark. I'm
seeing this in Firefox, Chrome, and Opera (I haven't bothered booting
the Windows box).

checklist ...

bootstrap.php:
Configure::write('App.encoding', 'UTF-8');

books.sql:
SET NAMES 'utf8'; at the beginning of the file.

$ file -bi books.sql
text/plain; charset=utf-8
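`file -bi` only guesses the charset heuristically. A strict decode is a quick cross-check: Python's UTF-8 codec rejects any malformed byte sequence anywhere in the file, much as jEdit refuses to open it. This is a minimal sketch. The script name `check_utf8.py` and the helper names are hypothetical.

```python
import sys
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")  # strict mode: raises on any malformed sequence
    except UnicodeDecodeError:
        return False
    return True


def first_bad_byte(data: bytes):
    """Return the offset of the first invalid UTF-8 byte, or None if the data is clean."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start  # index where the decoder gave up
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_utf8.py books.sql
    raw = Path(sys.argv[1]).read_bytes()
    offset = first_bad_byte(raw)
    if offset is None:
        print("valid UTF-8")
    else:
        print(f"invalid UTF-8 at byte {offset}: {raw[offset:offset + 4]!r}")
```

If this reports the dump as valid, the file itself is not the culprit.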

For the heck of it:
$ recode UTF-8 books.sql

MySQL in terminal:
mysql> SHOW FULL COLUMNS FROM books;
(description has utf8_unicode_ci)

mysql> SET NAMES 'utf8';
mysql> SELECT description from books WHERE id = 19;
(looks great)

html source:
<!DOCTYPE html>
<html dir="ltr" lang="en-CA">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta charset="utf-8">

request header:
Accept-Charset UTF-8,*

response header:
Vary Accept-Encoding

Browser reports page encoding "Unicode (UTF-8)".

When I manually switch the browser to ISO-8859-1, it looks ... correct.
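That ISO-8859-1 result is a strong clue. This minimal Python sketch shows the byte-level reason, using the two characters from the description. In ISO-8859-1 (Latin-1) each accented letter is a single byte. A lone byte such as 0xE9 is not a valid UTF-8 sequence, so a browser that believes the page is UTF-8 substitutes U+FFFD, the black diamond. The variable names are illustrative only.

```python
text = "à é"

utf8_bytes = text.encode("utf-8")      # b'\xc3\xa0 \xc3\xa9', two bytes per accented letter
latin1_bytes = text.encode("latin-1")  # b'\xe0 \xe9', one byte per accented letter

# Page declared as UTF-8 but the bytes are really Latin-1: each lone
# 0xE0/0xE9 is an incomplete UTF-8 sequence and becomes U+FFFD.
shown_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# After switching the browser to ISO-8859-1, the same bytes decode correctly.
shown_as_latin1 = latin1_bytes.decode("latin-1")

print(utf8_bytes)
print(latin1_bytes)
print(shown_as_utf8)
print(shown_as_latin1)
```

In other words, the bytes reaching the browser are most likely Latin-1, even though the file and the column are UTF-8.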

And I use jEdit, which will /refuse/ to open a file that contains
anything other than valid UTF-8; it demands that the file be re-opened
using some other character encoding. In any case, I copied the
offending text into Gedit and saved it as a temp file, making sure
the encoding was set to the default, UTF-8. I then opened that file in
jEdit, copied & pasted the text into books.sql, and re-imported it into
the DB. No joy.

Just to be sure the content wasn't the problem, I then went here:
http://www.fileformat.info/info/unicode/char/search.htm

... and copied & pasted the characters from there into my file, and
re-imported. (Yes, the site serves its pages as UTF-8. You'd be
surprised how many sites display tables of "UTF-8 characters" as
Latin-1, Mac-Roman, etc.) Anyway, no cigar.

I also copied everything in the HTML source (except the offending
text) into a new file at webroot/tmp.html. I then copied the offending
text from books.sql into its correct place in that file and loaded it
up. It looks great. Damn!

So the only possibility I can think of is that Cake is somehow screwing
things up, perhaps simply by requesting the data from MySQL with the
wrong character encoding (anyone know where I can check that?). I should
point out, though, that the layout uses $html->charset() for that
(correct) meta tag.
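This sketch models the "wrong char encoding" hypothesis. MySQL converts column values into the connection's result charset before sending them. A utf8_unicode_ci column can therefore still arrive as Latin-1 bytes if the application's connection never sent `SET NAMES 'utf8'`, which the terminal session above did.

The model is written in Python under stated assumptions:

* `server_send` and `browser_render` are hypothetical stand-ins, not a real driver API.
* latin1 as the fallback charset is an assumption; it was a common server default in MySQL 5.x.

In CakePHP 1.3 the place to check is, as far as I know, the `'encoding'` key of the connection array in `app/config/database.php` (e.g. `'encoding' => 'utf8'`). That key tells the MySQL driver to issue `SET NAMES` when it connects.

```python
# Hypothetical model of MySQL result-set conversion (not a real driver API).


def server_send(stored_text: str, connection_charset: str) -> bytes:
    """Bytes the server would send for a value, given the connection charset
    (set via SET NAMES, or the server default if none was issued).
    Unconvertible characters become '?', roughly as MySQL does."""
    return stored_text.encode(connection_charset, errors="replace")


def browser_render(payload: bytes, declared_charset: str = "utf-8") -> str:
    """Decode the payload the way a browser would with the declared page charset."""
    return payload.decode(declared_charset, errors="replace")


description = "café à emporter"

# Like the terminal session after SET NAMES 'utf8': bytes match the page charset.
ok = browser_render(server_send(description, "utf-8"))

# Assumed app connection that never sent SET NAMES and fell back to latin1.
broken = browser_render(server_send(description, "latin-1"))

print(ok)
print(broken)
```

If this is what's happening, it would also explain why the static webroot/tmp.html test looks fine: that page never touches the database.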

Anything I missed?

You received this message because you are subscribed to the Google Groups "CakePHP" group.