View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0004647 | ardour | bugs | public | 2012-01-21 21:42 | 2012-02-13 20:02 |
| Reporter | anrug | Assigned To | anrug | ||
| Priority | normal | Severity | minor | Reproducibility | always |
| Status | closed | Resolution | fixed | ||
| Target Version | 3.0-beta3 | ||||
| Summary | 0004647: TOC export does not properly encode CD text fields | ||||
| Description | Ardour (both 3.x and 2.x) writes CD text fields in UTF-8 (this may depend on my system's settings) when exporting cdrdao TOC files. However, cdrdao expects Latin-1 (ISO 8859-1), with double quotes escaped and all characters outside the printable ASCII range written in octal notation (e.g. "\123"). For Asian languages, cdrdao (and CD text) uses yet another notation; if someone is interested in getting that working with Ardour, I would need help from people familiar with Asian language encodings. Ardour should do the necessary encoding and escaping, and perhaps warn the user when a character they entered can't be encoded as CD text. | ||||
| Additional Information | Attached is a file containing a C++ code fragment that the cdrdao project uses when writing CD text fields to TOC files. The file also contains two functions and a minimal test program (sorry, only simple ANSI C) that I sketched out to get the strings into the proper representation. | ||||
| Tags | No tags attached. | ||||
|
2012-01-21 21:42
|
a3_toc_patch.c (3,104 bytes)
#include <stdio.h>
/**
* Code fragments for properly encoding UTF-8 strings in cdrdao TOCs
*
* Andreas Ruge, Jan 2012
*/
/* This is taken from cdrdao 1.2.3; it shows how text fields for TOCs
are escaped (cdrdao runs under the C locale):
out << " \"";
for (i = 0; i < dataLen_ - 1; i++) {
if (data_[i] == '"') {
out << "\\\"";
}
else if (isprint(data_[i])) {
out << data_[i];
}
else {
sprintf(buf, "\\%03o", (unsigned int)data_[i]);
out << buf;
}
}
out << "\"";
}
*/
/**
* Print a string in the format used for CD text strings in cdrdao TOC files
*
* This is:
* a) escape double quotes with a backslash
* b) print all characters from 0x20 - 0x7E (printable ascii)
* c) use octal three digit representation for all other values
* d) enclose the whole string with double quotes
*
* Andreas Ruge, 2012
*/
void toc_print_string(const char *s, FILE *fp)
{
fprintf(fp, " \"");
for ( ; *s != '\0'; s++)
{
if (*s == '"')
{
fprintf(fp, "\\\"");
}
else if (0x20 <= *s && *s <= 0x7E)
{
fprintf(fp, "%c", *s);
}
else
{
fprintf(fp, "\\%03o", (unsigned char)*s);
}
}
fprintf(fp, "\"");
}
/**
* Translate UTF-8 string to ISO 8859-1 (latin1)
*
* the dest buffer will never have to be larger than the src string
*
* Return
* 0 on success
* 1 when a unicode sequence was found which can't be represented in
* ISO 8859-1, or when the unicode sequence is invalid
*
* Andreas Ruge, 2012
*/
int utf8_to_latin1(unsigned char *dest, int dest_len, unsigned char *src)
{
int ret = 0;
/* stop one byte early so the terminating NUL always fits in dest */
while (*src && dest_len > 1)
{
if (!(*src & 0x80))
{
/* 7-bit (ASCII range) */
*dest++ = *src++;
dest_len--;
}
else if (((*src & 0xFC) == 0xC0) && ((*(src + 1) & 0xC0) == 0x80))
{
/* bit pattern 110000xx 10xxxxxx,
a two byte UTF-8 sequence with no more than 8 data bits used,
i.e. can be translated straight to ISO 8859-1 */
*dest = *src++ << 6;
*dest++ |= *src++ & 0x3F;
dest_len--;
}
else
{
/* byte is part of an UTF-8 sequence which can't be translated */
ret = 1;
src++;
}
}
*dest = '\0';
return ret;
}
/* Test function, to be used on a UTF-8 terminal */
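int main(int argc, char *argv[]) {
char buf[100];
char *p;
int ret;
if (argc < 2) {
fprintf(stderr, "usage: %s <utf-8 string>\n", argv[0]);
return 1;
}
ret = utf8_to_latin1((unsigned char *)buf, sizeof buf, (unsigned char *)argv[1]);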
if (ret == 0) {
printf("All characters converted to ISO 8859-1\n");
} else {
printf("Warning: some characters could not be converted to ISO 8859-1\n");
}
/*for (p = buf; *p; p++)
{
printf("%x ", (unsigned char)*p);
}
printf("\n");*/
toc_print_string(buf, stdout);
printf("\n");
return 0;
}
|
|
2012-01-22 20:40
|
a3_toc_patch2.c (3,336 bytes)
#include <stdio.h>
/**
* Code fragments for properly encoding UTF-8 strings in cdrdao TOCs
*
* Andreas Ruge, Jan 2012
*/
/* This is taken from cdrdao 1.2.3; it shows how text fields for TOCs
are escaped (cdrdao runs under the C locale):
out << " \"";
for (i = 0; i < dataLen_ - 1; i++) {
if (data_[i] == '"') {
out << "\\\"";
}
else if (isprint(data_[i])) {
out << data_[i];
}
else {
sprintf(buf, "\\%03o", (unsigned int)data_[i]);
out << buf;
}
}
out << "\"";
}
*/
/**
* Print a string in the format used for CD text strings in cdrdao TOC files
*
* This is:
* a) escape double quotes with a backslash
* b) print all characters from 0x20 - 0x7E (printable ascii), except the backslash
* c) use octal three digit representation for all other values
* d) enclose the whole string in double quotes
*
* Andreas Ruge, 2012
*/
void toc_print_string(const char *s, FILE *fp)
{
fprintf(fp, " \"");
for ( ; *s != '\0'; s++)
{
if (*s == '"')
{
fprintf(fp, "\\\"");
}
else if (0x20 <= *s && *s <= 0x7E && *s != '\\')
{
fprintf(fp, "%c", *s);
}
else
{
fprintf(fp, "\\%03o", (unsigned char)*s);
}
}
fprintf(fp, "\"");
}
/**
* Translate UTF-8 string to ISO 8859-1 (latin1)
*
* the dest buffer will never have to be larger than the src string
*
* Return
* 0 on success
* 1 when a unicode sequence was found which can't be represented in
* ISO 8859-1, or when the unicode string is invalid
*
* Andreas Ruge, 2012
*/
int utf8_to_latin1(unsigned char *dest, int dest_len, unsigned char *src)
{
int ret = 0;
/* stop one byte early so the terminating NUL always fits in dest */
while (*src && dest_len > 1)
{
if (!(*src & 0x80))
{
/* 7-bit => ASCII range */
*dest++ = *src++;
dest_len--;
}
else
{
/* 8-bit => UTF-8 multi-byte sequence */
if (((*src & 0xFC) == 0xC0) && ((*(src + 1) & 0xC0) == 0x80))
{
/* bit pattern 110000xx 10xxxxxx,
a two byte UTF-8 sequence with no more than 8 data bits used,
i.e. can be translated straight to ISO 8859-1 */
*dest = *src++ << 6;
*dest++ |= *src++ & 0x3F;
dest_len--;
}
else
{
/* part of UTF-8 multi-byte sequence which can't be
translated to ISO 8859-1 */
ret = 1;
src++;
}
}
}
*dest = '\0';
return ret;
}
/* Test function, to be used on a UTF-8 terminal. */
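int main(int argc, char *argv[]) {
char buf[100];
char *p;
int ret;
if (argc < 2) {
fprintf(stderr, "usage: %s <utf-8 string>\n", argv[0]);
return 1;
}
ret = utf8_to_latin1((unsigned char *)buf, sizeof buf, (unsigned char *)argv[1]);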
if (ret == 0) {
printf("All characters converted to ISO 8859-1\n");
} else {
printf("Warning: some characters could not be converted to ISO 8859-1\n");
}
/*for (p = buf; *p; p++)
{
printf("%x ", (unsigned char)*p);
}
printf("\n");*/
printf("Encoded for cdrdao TOC:");
toc_print_string(buf, stdout);
printf("\n");
return 0;
}
|
|
|
I've uploaded a modified version of my (C) code. The cdrdao parser does not require, and does not even allow, the backslash itself to be escaped. So when you enter a track title with a trailing backslash, the backslash is written to the TOC file literally, resulting in something like: TITLE "MyTitle\" which breaks the cdrdao TOC parser. Worse, if you enter a title like 'Title\123' you'll get 'TitleS' on your CD-R with no warning whatsoever. To work around this, Ardour should write a backslash in octal representation, as done in the new file 'a3_toc_patch2.c'. Weird stuff. ;) |
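
To illustrate the backslash workaround described in this note, here is a minimal sketch that escapes into a buffer instead of printing to a FILE. The name `toc_escape` and its signature are hypothetical and are not taken from Ardour or cdrdao; the rules follow `toc_print_string` in 'a3_toc_patch2.c', and the input is assumed to be Latin-1 already.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not Ardour or cdrdao code): escape a Latin-1 string
 * for a cdrdao TOC text field and write the quoted result into out.
 * Returns 0 on success, -1 if out is too small. */
int toc_escape(const char *latin1, char *out, size_t out_len)
{
    const unsigned char *s = (const unsigned char *)latin1;
    size_t used = 0;

    assert(out != NULL && out_len > 0);
    out[0] = '\0';
    if (out_len < 3) {
        return -1;              /* need room for two quotes and the NUL */
    }
    out[used++] = '"';

    for (; *s != '\0'; s++) {
        char piece[5];          /* longest piece is "\ooo" plus NUL */
        int n;

        if (*s == '"') {
            n = snprintf(piece, sizeof piece, "\\\"");
        } else if (*s >= 0x20 && *s <= 0x7E && *s != '\\') {
            n = snprintf(piece, sizeof piece, "%c", *s);
        } else {
            /* backslash and everything outside printable ASCII: octal */
            n = snprintf(piece, sizeof piece, "\\%03o", (unsigned int)*s);
        }
        /* always keep room for the closing quote and the NUL */
        if (used + (size_t)n + 2 > out_len) {
            out[0] = '\0';
            return -1;
        }
        memcpy(out + used, piece, (size_t)n);
        used += (size_t)n;
    }
    out[used++] = '"';
    out[used] = '\0';
    return 0;
}
```

With this sketch, the title `Title\123` is written as `"Title\134123"`: the backslash becomes the three-digit octal escape `\134`, so the digits that follow stay literal, assuming the cdrdao parser reads exactly three octal digits per escape.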
|
|
svn rev 11314 contains an implementation of this. Your code wasn't UTF-8 safe, but it did convey the general idea. I've tested it on a few test cases and it appears to work. Please let me know if you are aware of any issues with it. |
|
|
Hmm, what did you mean by "UTF-8 safe"? The escaping function is meant to be used with a Latin1-encoded (8-bit) string. I may be completely mistaken, but in your change in svn rev 11314 I can't see where we get a Latin1 string in the first place. |
|
|
i created a marker called "Something ß for "Ü" |
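
The conversion stage asked about above can be sketched as follows. This is not the code from svn rev 11314; `utf8_to_latin1_sketch` is a hypothetical name, and the sketch differs from the attached `utf8_to_latin1` in that it accepts only the lead bytes 0xC2 and 0xC3 (0xC0 and 0xC1 would be overlong encodings) and reserves room for the terminating NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert a UTF-8 string to ISO 8859-1.
 * Returns 0 if every character was converted, 1 if anything was dropped
 * (not representable, malformed, or the destination was too small). */
int utf8_to_latin1_sketch(const char *utf8, unsigned char *dest, size_t dest_len)
{
    const unsigned char *src = (const unsigned char *)utf8;
    size_t used = 0;
    int lossy = 0;

    assert(dest != NULL && dest_len > 0);

    while (*src != '\0') {
        unsigned char out;

        if (*src < 0x80) {
            /* 7-bit ASCII is identical in both encodings */
            out = *src++;
        } else if ((*src == 0xC2 || *src == 0xC3) && (src[1] & 0xC0) == 0x80) {
            /* two-byte sequence for U+0080..U+00FF: 110000xx 10xxxxxx */
            out = (unsigned char)(((src[0] & 0x03) << 6) | (src[1] & 0x3F));
            src += 2;
        } else {
            /* part of a sequence that Latin-1 cannot represent: skip it */
            lossy = 1;
            src++;
            continue;
        }
        if (used + 1 >= dest_len) {
            lossy = 1;          /* out of space, keep room for the NUL */
            break;
        }
        dest[used++] = out;
    }
    dest[used] = '\0';
    return lossy;
}
```

Given the marker name from this note, with ß and Ü as the UTF-8 byte pairs C3 9F and C3 9C, the sketch yields the single bytes DF and DC. An escaping step like `toc_print_string` in 'a3_toc_patch2.c' would then write those bytes as `\337` and `\334`.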
|
|
Has been fixed in a3 |
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2012-01-21 21:42 | anrug | New Issue | |
| 2012-01-21 21:42 | anrug | File Added: a3_toc_patch.c | |
| 2012-01-22 10:20 | cth103 | cost | => 0.00 |
| 2012-01-22 10:20 | cth103 | Target Version | => 3.0-beta3 |
| 2012-01-22 20:40 | anrug | File Added: a3_toc_patch2.c | |
| 2012-01-22 20:47 | anrug | Note Added: 0012613 | |
| 2012-01-23 17:24 | paul | Note Added: 0012617 | |
| 2012-01-23 17:24 | paul | Status | new => feedback |
| 2012-01-23 20:08 | anrug | Note Added: 0012628 | |
| 2012-01-23 20:13 | paul | Note Added: 0012629 | |
| 2012-02-13 20:02 | anrug | Note Added: 0012793 | |
| 2012-02-13 20:02 | anrug | Status | feedback => resolved |
| 2012-02-13 20:02 | anrug | Resolution | open => fixed |
| 2012-02-13 20:02 | anrug | Assigned To | => anrug |
| 2012-02-13 20:02 | anrug | Status | resolved => closed |