I have nailed down the issue: it is indeed a bug in the Zimbra backend.
Apparently there is some type of double encoding going on:
In GetBodyRecursive (zimbra.php) the function iconv( in, out, string) converts the string from "in" format to the "out" format.
Code:
$oldEncoding = $parameters["charset"];
$body .= iconv( $oldEncoding, "UTF-8//IGNORE//TRANSLIT", $message->body);
This looks alright; however, $parameters["charset"] is "ISO-8859-1" even though the contents are actually "UTF-8". The UTF-8 bytes are therefore treated as ISO-8859-1 and encoded a second time, which results in garbage characters in the email.
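To make the double encoding concrete, here is a minimal sketch (not taken from zimbra.php) that feeds a UTF-8 "ü" through iconv with the wrong source charset, the same way the backend does when the charset parameter says "ISO-8859-1". It assumes the iconv extension is available.

```php
<?php
// "ü" encoded as UTF-8 is the two bytes 0xC3 0xBC.
$utf8 = "\xC3\xBC";

// Wrong: treat the UTF-8 bytes as ISO-8859-1 and convert them again.
// Each byte becomes its own character, so "ü" turns into "Ã¼".
$garbled = iconv("ISO-8859-1", "UTF-8", $utf8);

echo bin2hex($utf8), "\n";    // c3bc
echo bin2hex($garbled), "\n"; // c383c2bc
```

The four bytes in the second line are exactly the "Ã¼" style garbage that shows up in the mail client.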
Here is a version of the function that detects the encoding from the message body itself and uses that as the source encoding, instead of the supplied charset parameter. This produces correct emails in all tested languages (German, French, etc.). If you're moving this back into the repo, please keep the attribution!
Code:
/** GetBodyRecursive
 * Get all parts in the message with specified type and concatenate them together, unless the
 * Content-Disposition is 'attachment', in which case the text is apparently an attachment
 *
 * 2011-11-14: encoding fixed, comments added and beautified by kongregate/dwc <dwckongregate_x_googlemail.com>
 */
function GetBodyRecursive($message, $subtype, &$body)
{
    if (!isset($message->ctype_primary))
    {
        return;
    }

    // if this object is a text part of the requested subtype, grab the message body
    if (strcasecmp($message->ctype_primary, "text") == 0 &&
        strcasecmp($message->ctype_secondary, $subtype) == 0 &&
        isset($message->body))
    {
        if (isset($message->ctype_parameters))
        {
            $parameters = $message->ctype_parameters;
            // guess the encoding from the body; mb_detect_encoding returns false if nothing matches
            $sourceEncoding = mb_detect_encoding($message->body, "auto");

            // convert only if an encoding was detected and it is not already UTF-8
            // FIXME: why do we need subtype plain?
            if ($sourceEncoding !== false && $sourceEncoding != "UTF-8" && $subtype == "plain")
            {
                $body .= iconv($sourceEncoding, "UTF-8//IGNORE//TRANSLIT", $message->body);
            }
            else
            {
                $body .= $message->body;
            }
        }
        else
        {
            $body .= $message->body;
        }
    }

    // if this is a multipart email with multiple parts, also grab their contents
    if (strcasecmp($message->ctype_primary, "multipart") == 0 &&
        isset($message->parts) &&
        is_array($message->parts))
    {
        // iterate through message parts
        foreach ($message->parts as $part)
        {
            // skip parts whose disposition is "attachment"
            if (!isset($part->disposition) || strcasecmp($part->disposition, "attachment") != 0)
            {
                // grab the contents!
                $this->GetBodyRecursive($part, $subtype, $body);
            }
        }
    }
} // end GetBodyRecursive
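Since mb_detect_encoding is only a heuristic, here is a rough alternative sketch, not what the function above does: check whether the body is already valid UTF-8 and only convert when it is not, falling back to the declared charset as the source. The helper name toUtf8 and its default charset are my own assumptions and not part of the backend; it needs the mbstring and iconv extensions.

```php
<?php
// Hypothetical helper, not part of the Zimbra backend.
function toUtf8($text, $declaredCharset = "ISO-8859-1")
{
    // Already valid UTF-8 (plain ASCII included): leave it untouched.
    if (mb_check_encoding($text, "UTF-8")) {
        return $text;
    }
    // Otherwise trust the declared charset and convert exactly once.
    $converted = @iconv($declaredCharset, "UTF-8//TRANSLIT", $text);
    // If the conversion fails (e.g. unknown charset), return the input unchanged.
    return ($converted !== false) ? $converted : $text;
}

echo bin2hex(toUtf8("\xC3\xBC")), "\n"; // c3bc (UTF-8 "ü", left unchanged)
echo bin2hex(toUtf8("\xFC")), "\n";     // c3bc (ISO-8859-1 "ü", converted)
```

The trade-off is that a non-UTF-8 body which happens to be valid UTF-8 byte-wise would be passed through unconverted, but for real text that is rare.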
Guess this will make some international users happy :)
Also, don't forget the Content-Type fix I posted earlier if you want to run a happy system.
Thanks again for your work on this backend. Cheers!