Discussion:
[Flac] metaflac --no-utf8-convert complains about UTF
Jan Stary
2014-12-05 19:16:47 UTC
This is 1.3.1 on OpenBSD/amd64.
The --no-utf8-convert option of metaflac(1) does not work for me:

$ metaflac --no-utf8-convert --set-tag="Artist=Žoužlíček" aladin.flac
aladin.flac: ERROR: tag value for 'Artist' is not valid UTF-8
(You probably can't see the Czech letters properly in my mail,
but that's beside the point.)

Indeed, it is not valid UTF8 (it's LATIN2), but isn't metaflac
supposed to just write it as specified, with this option?

Jan
Brian Willoughby
2014-12-06 05:50:47 UTC
Hello Jan,

I assume the problem is that metaflac has no way of knowing the encoding that was provided on the command line, since it could literally be anything. The --no-utf8-convert option means that metaflac does nothing to the letters as they pass through, and then the problem becomes that the next program to read the tags has to assume the character set without any information. If the program reading the tags gets the character set wrong, then you see garbage.

It's possible that the "local charset" or "locale" will be the same on the command line and in the application interpreting the characters, but that's not always true.

Or, to put it another way, isn't the assumption that all tags in a FLAC file are UTF-8? Thus, if you provide LATIN2 and don't allow metaflac to convert, then it's sure to be garbage.

By the way, I can see the Czech letters properly in your email, because it has a header saying Content-Type: text/plain; charset="iso-8859-2" and my Mac uses that information to decode the characters correctly. Not that I can pronounce Czech properly, but it sure looks like some of my favorite movie titles…

I'm just guessing here, but I assume that the best way to handle this would be to provide the characters to metaflac in UTF-8 and not use that option (because it ignores the charset). Then the applications reading out the tags will know that they're UTF-8.

Obviously, if anyone has better procedures for this, please explain. I don't actually know whether this option is supposed to work on input, output, or both.

Brian
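
To make the encoding point concrete, here is a minimal sketch (not from the thread) of why the LATIN2 bytes are rejected and what converting them first would look like. It assumes the string was typed in an ISO-8859-2 terminal; the helper name is_valid_utf8 is made up for illustration and is not part of metaflac or libFLAC.

```python
# The tag value from the thread, as a Python (Unicode) string.
value = "Žoužlíček"

# The same text as raw bytes in the two encodings under discussion.
latin2_bytes = value.encode("iso-8859-2")  # what a LATIN2 terminal would send
utf8_bytes = value.encode("utf-8")         # what a Vorbis comment is defined to hold


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(latin2_bytes.hex(" "))        # one byte per character
print(utf8_bytes.hex(" "))          # two bytes for each accented letter
print(is_valid_utf8(latin2_bytes))  # the LATIN2 bytes are not valid UTF-8
print(is_valid_utf8(utf8_bytes))

# Converting before tagging: decode from the local charset, re-encode as UTF-8.
converted = latin2_bytes.decode("iso-8859-2").encode("utf-8")
```

On the command line the same conversion would typically be done with iconv(1) before handing the value to metaflac, assuming iconv is available on the system.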
Jan Stary
2014-12-06 07:52:15 UTC
Post by Brian Willoughby
I assume the problem is that metaflac has no way of knowing the encoding that was provided on the command line, since it could literally be anything. The --no-utf8-convert option means that metaflac does nothing to the letters as they pass through,
That's what it is supposed to do, but apparently doesn't.
Post by Brian Willoughby
and then the problem becomes that the next program to read the tags
has to assume the character set without any information.
If the program reading the tags gets the character set wrong,
then you see garbage.
Never mind the next program; the problem now is that
metaflac does not honor the --no-utf8-convert option.
Post by Brian Willoughby
Or, to put it another way, isn't the assumption that all tags
in a FLAC file are UTF-8? Thus, if you provide LATIN2 and don't
allow metaflac to convert, then it's sure to be garbage.
No. It's sure to be exactly what the user provided,
in my case LATIN2, if metaflac honors the --no-utf8-convert option.
Post by Brian Willoughby
I don't actually know whether this option is supposed to work on input,
output, or both.
The manpage says:

--no-utf8-convert
Do not convert tags from UTF-8 to local charset, or vice versa.
This is useful for scripts, and setting tags in situations
where the locale is wrong.

"vice versa" tells me it's supposed to work for both input and output.
"setting tags" tells me it's definitely for input.
Jan Stary
2014-12-06 08:54:55 UTC
Post by Jan Stary
This is 1.3.1 on OpenBSD/amd64.
$ metaflac --no-utf8-convert --set-tag="Artist=Žoužlíček" aladin.flac
aladin.flac: ERROR: tag value for 'Artist' is not valid UTF-8
(You probably can't see the Czech letters properly in my mail,
but that's beside the point.)
Indeed, it is not valid UTF8 (it's LATIN2), but isn't metaflac
supposed to just write it as specified, with this option?
The problem seems to be in
src/metaflac/operations_shorthand_vorbiscomment.c
in the set_vc_field() function.

It does check whether utf conversion is required,

	/* move 'data' into 'converted', converting to UTF-8 if necessary */
	if(raw) {
		converted = data;
	}
	[...]
	}
but later it calls FLAC__format_vorbiscomment_entry_is_legal()
whether or not we are UTF converting; and this function, defined
in ./src/libFLAC/format.c, ultimately calls utf8len_(s) no matter what.
So my LATIN2 text fails to be legal, because it's not legal UTF
-- which, indeed, it isn't.

Jan
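
The control flow described above can be modelled in a few lines. This is only a sketch of the behaviour as reported in the thread, not a transcription of the FLAC source: set_tag_value, its parameters, and the default charset are all hypothetical names chosen for illustration.

```python
def set_tag_value(data: bytes, raw: bool, local_charset: str = "iso-8859-2") -> bytes:
    """Hypothetical model of the flow reported for set_vc_field().

    raw=True stands in for --no-utf8-convert.
    """
    # Step 1: with raw set, the bytes pass through untouched;
    # otherwise they are converted from the local charset to UTF-8.
    if raw:
        converted = data
    else:
        converted = data.decode(local_charset).encode("utf-8")

    # Step 2: the legality check runs regardless of `raw`,
    # so non-UTF-8 input is rejected even when conversion was skipped.
    try:
        converted.decode("utf-8")
    except UnicodeDecodeError:
        raise ValueError("tag value is not valid UTF-8")
    return converted


latin2 = "Žoužlíček".encode("iso-8859-2")

# Without the option: converted, then accepted.
print(set_tag_value(latin2, raw=False))

# With the option: passed through, then rejected by the unconditional check.
try:
    set_tag_value(latin2, raw=True)
except ValueError as err:
    print("ERROR:", err)
```

Under this model, raw input is only accepted when it already happens to be valid UTF-8, which matches the error message seen in the original report.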
Brian Willoughby
2014-12-06 09:28:55 UTC
Post by Jan Stary
Post by Jan Stary
This is 1.3.1 on OpenBSD/amd64.
$ metaflac --no-utf8-convert --set-tag="Artist=Žoužlíček" aladin.flac
aladin.flac: ERROR: tag value for 'Artist' is not valid UTF-8
(You probably can't see the Czech letters properly in my mail,
but that's beside the point.)
Indeed, it is not valid UTF8 (it's LATIN2), but isn't metaflac
supposed to just write it as specified, with this option?
The problem seems to be in
src/metaflac/operations_shorthand_vorbiscomment.c
in the set_vc_field() function.
It does check whether utf conversion is required,
/* move 'data' into 'converted', converting to UTF-8 if necessary */
if(raw) {
converted = data;
}
}
but later checks that FLAC__format_vorbiscomment_entry_is_legal()
whether or not we are utf converting; and this function, defined
in ./src/libFLAC/format.c, ultimately calls for utf8len_(s) no matter what.
So my LATIN2 text fails to be legal, because it's not legal UTF
-- which, indeed, it isn't.
Looks like you found the problem. One piece of code is doing the right thing, another piece of code is ignoring the option.

By the way, I've never used FLAC inside Ogg Vorbis. Instead, I use pure FLAC format files. Is there any difference between the way this option works on a straight FLAC file versus how it works on FLAC data in an Ogg Vorbis container?

Brian
Martin Leese
2014-12-06 19:33:35 UTC
Post by Brian Willoughby
Post by Jan Stary
Post by Jan Stary
This is 1.3.1 on OpenBSD/amd64.
...
Post by Brian Willoughby
Post by Jan Stary
The problem seems to be in
src/metaflac/operations_shorthand_vorbiscomment.c
in the set_vc_field() function.
By the way, I've never used FLAC inside Ogg Vorbis.
Instead, I use pure FLAC format files. Is there any
difference between the way this option works on a
straight FLAC file versus how it works on FLAC data
in an Ogg Vorbis container?
Some confusion seems to be creeping in, here.
Jan's problem, as I understand it, is with the
METADATA_BLOCK_VORBIS_COMMENT in
a Native FLAC file. (No OggFLAC, Ogg being
the container, and no Vorbis, a lossy codec.)

METADATA_BLOCK_VORBIS_COMMENT is
defined at:
https://xiph.org/flac/format.html#metadata_block_vorbis_comment

and VorbisComments at:
http://www.xiph.org/vorbis/doc/v-comment.html

Note that a VorbisComment is defined as
being UTF-8, although metaflac --no-utf8-convert
doesn't seem to be behaving as advertised.

Finally, Jan might have more luck taking his
problem with metaflac over to the flac-dev list.

Regards,
Martin
--
Martin J Leese
E-mail: martin.leese stanfordalumni.org
Web: http://members.tripod.com/martin_leese/
Martin Leese
2014-12-06 20:55:16 UTC
Martin Leese wrote:
...
Post by Martin Leese
Finally, Jan might have more luck taking his
problem with metaflac over to the flac-dev list.
Even better, he could submit a bug report at:
http://sourceforge.net/p/flac/bugs/

Regards,
Martin
Martin Leese
2014-12-07 23:00:33 UTC
Jan Stary wrote:
...
Post by Jan Stary
BTW, the other Xiph projects track their issues at https://trac.xiph.org/
- is it intentional that FLAC uses the sourceforge bug tracker?
Is there any relation between the two?
The only ticket I see at trac.xiph.org
specifically for FLAC is dated 2005. However,
the code maintainers all hang out on the
flac-dev list, so ask there.

I suggested SourceForge.net because that is
the "Bug Tracker" link on the Web page at:
https://xiph.org/flac/developers.html

Regards,
Martin
Jan Stary
2014-12-07 10:43:43 UTC
Post by Jan Stary
This is 1.3.1 on OpenBSD/amd64.
$ metaflac --no-utf8-convert --set-tag="Artist=Žoužlíček" aladin.flac
aladin.flac: ERROR: tag value for 'Artist' is not valid UTF-8
(You probably can't see the Czech letters properly in my mail,
but that's beside the point.)
Indeed, it is not valid UTF8 (it's LATIN2), but isn't metaflac
supposed to just write it as specified, with this option?
The problem seems to be in
src/metaflac/operations_shorthand_vorbiscomment.c
in the set_vc_field() function.
It does check whether utf conversion is required,
/* move 'data' into 'converted', converting to UTF-8 if necessary */
if(raw) {
converted = data;
}
}
but later checks that FLAC__format_vorbiscomment_entry_is_legal()
whether or not we are utf converting; and this function, defined
in ./src/libFLAC/format.c, ultimately calls for utf8len_(s) no matter what.
So my LATIN2 text fails to be legal, because it's not legal UTF
-- which, indeed, it isn't.
https://xiph.org/flac/format.html#metadata_block_vorbis_comment
http://www.xiph.org/vorbis/doc/v-comment.html
Note that a VorbisComment is defined as
being UTF-8, although metaflac --no-utf8-convert
doesn't seem to be behaving as advertised.
Reading the above links, the Vorbis Comment is defined to be UTF8.
What is the purpose of --no-utf8-convert in setting tags then?
To specifically ask for invalid files?

Maybe I am misunderstanding the meaning of --no-utf8-convert.
Perhaps the current behaviour is intended, and --no-utf8-convert
just means "don't bother converting, it is already UTF8".
Which my example isn't, and metaflac rightfully complains.

Can anybody please shed some light on this?
Post by Jan Stary
Finally, Jan might have more luck taking his
problem with metaflac over to the flac-dev list.
http://sourceforge.net/p/flac/bugs/
Yes, I will move this to flac-dev and file a proper bug report
once I am sure it is a bug, and it's the bug I think it is.

BTW, the other Xiph projects track their issues at https://trac.xiph.org/
- is it intentional that FLAC uses the sourceforge bug tracker?
Is there any relation between the two?

Jan
Jan Stary
2014-12-07 11:30:06 UTC
Not sure if this is related, but the UTF conversion from and to
my local charset does not work for me either (the --no-utf8-convert
option is not involved in this).

$ export LC_ALL=ISO8859-2
$ metaflac --remove-all-tags file.flac
$ metaflac --set-tag="TITLE=Žoužlička" file.flac
$ metaflac --list --block-number=2 file.flac
METADATA block #2
type: 4 (VORBIS_COMMENT)
is last: false
length: 59
vendor string: reference libFLAC 1.3.0 20130526
comments: 1
comment[0]: TITLE=#ou#li#ka
$ metaflac --export-tags-to=- file.flac
TITLE=#ou#li#ka

Here is how I understand this: metaflac understood the characters
in --set-tag="TITLE=Žoužlička", because it knows my local charset
is ISO8859-2; metaflac converted that string into UTF8 and stored it
in the Vorbis comment; that's what --list shows me.

But when I --export the tags, metaflac does _not_ convert
the UTF8 comment back to my ISO8859-2 charset.

Jan
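
For reference, the round trip Jan expects can be sketched as follows. This illustrates the expected conversions only and does not diagnose what metaflac actually did in the session above; the variable names are invented for the example, and the local charset is assumed to be ISO-8859-2 as in the export LC_ALL line.

```python
title = "Žoužlička"
local_charset = "iso-8859-2"  # assumed from LC_ALL=ISO8859-2 above

# Bytes as typed on a LATIN2 command line.
typed = title.encode(local_charset)

# Setting the tag: local charset -> UTF-8, which is what gets stored.
stored = typed.decode(local_charset).encode("utf-8")

# Exporting the tag: UTF-8 -> local charset, which should give back the input.
exported = stored.decode("utf-8").encode(local_charset)
print(exported == typed)

# If the export step skipped the conversion, a LATIN2 terminal would
# interpret the stored UTF-8 bytes one byte per character and show garbage.
garbled = stored.decode(local_charset)
print(len(title), len(garbled))
```

The second print shows the garbled form is longer than the original, because each accented letter occupies two bytes in UTF-8 and a single-byte charset renders each byte as its own character.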
Brian Willoughby
2014-12-07 18:29:06 UTC
Post by Jan Stary
Maybe I am misunderstanding the meaning of --no-utf8-convert.
Perhaps the current behaviour is intended, and --no-utf8-convert
just means "don't bother converting, it is already UTF8".
That's exactly what I assume it means. I think that's the only thing it could mean.

Brian Willoughby
Jan Stary
2014-12-07 19:09:20 UTC
Post by Brian Willoughby
Post by Jan Stary
Maybe I am misunderstanding the meaning of --no-utf8-convert.
Perhaps the current behaviour is intended, and --no-utf8-convert
just means "don't bother converting, it is already UTF8".
That's exactly what I assume it means.
I think that's the only thing it could mean.
"Do not convert tags from UTF-8 to local charset, or vice versa."

I think it is perfectly legitimate to understand this as
"save my comments exactly how I gave them".