[Flac] Checking file integrity
Anton Shepelev
2011-04-23 18:53:07 UTC
Hello all,

I have read that FLAC stores a hash sum inside its files
to check their integrity. But how can I check it without
having to extract the file? Is the hash sum calculated
on the extracted or the compressed data?

Thanks in advance,
Anton
J.B. Nicholson-Owens
2011-04-23 20:01:35 UTC
Post by Anton Shepelev
I have read that FLAC stores a hash sum inside its files
to check their integrity. But how can I check it without
having to extract the file? Is the hash sum calculated
on the extracted or the compressed data?
I'm not sure what you're asking for, but the hash is computed on the raw
sample data. Raw samples are compressed inside the FLAC file.

To extract the computed hash (MD5):

metaflac --show-md5sum audio.flac

To recompute the MD5 hash independently, decode the FLAC file, convert
the result to raw samples with a program like sox(1), and hash the raw
data:

flac --decode audio.flac    # creates audio.wav
sox audio.wav audio.raw     # creates audio.raw
md5sum audio.raw            # should match the hash above

For example:

$ metaflac --show-md5sum audio.flac
7853aca9317d3b348b3aad5219fa63c9
$ flac --decode --silent audio.flac
$ sox audio.wav audio.raw
$ md5sum audio.raw
7853aca9317d3b348b3aad5219fa63c9 audio.raw

I hope this helps.
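
The same comparison can also be done without intermediate files by decoding to standard output. What follows is only a minimal sketch and was not posted in this thread: the function name `flac_md5_matches` and the `FLAC_BIN` / `METAFLAC_BIN` variables are made up for illustration, and it assumes that raw signed little-endian output at the file's own bit depth has the same byte layout the encoder hashed.

```shell
#!/bin/sh
# Sketch: compare the MD5 stored in a FLAC file with the MD5 of its
# decoded samples, without writing audio.wav or audio.raw to disk.
# FLAC_BIN, METAFLAC_BIN and flac_md5_matches are illustrative names.

FLAC_BIN="${FLAC_BIN:-flac}"
METAFLAC_BIN="${METAFLAC_BIN:-metaflac}"

flac_md5_matches() {
    # Read the hash recorded in the file's metadata.
    stored=$("$METAFLAC_BIN" --show-md5sum "$1") || return 2
    # Decode to raw PCM on stdout (assumed layout: signed, little-endian)
    # and hash the stream; cut keeps only the hex digest.
    computed=$("$FLAC_BIN" --decode --stdout --silent --force-raw-format \
        --endian=little --sign=signed "$1" | md5sum | cut -d ' ' -f 1)
    [ "$stored" = "$computed" ]
}

# Example:
#   flac_md5_matches audio.flac && echo "MD5 matches"
```

A stored hash consisting only of zeros usually means the encoder did not record one, in which case this comparison says nothing about the file.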
J.B. Nicholson-Owens
2011-04-23 20:50:38 UTC
I should have mentioned before that apparently the MD5 hash is computed
on the uncompressed raw sample data and the easiest way to check the
FLAC file is with

flac --verify audio.flac

--verify won't extract the data to a file, but it does seem to run over
the entire file (as one would expect of such a function).
Anton Shepelev
2011-04-24 09:43:26 UTC
Post by J.B. Nicholson-Owens
I'm not sure what you're asking for
I have a huge archive of FLAC files and want to check
their integrity automatically, so that if a file is
reported as corrupted I can restore it from a mirror
backup.
Post by J.B. Nicholson-Owens
I should have mentioned before that apparently the
MD5 hash is computed on the uncompressed raw sample
data and the easiest way to check the FLAC file is
with
flac --verify audio.flac
This doesn't work, and the official FLAC documentation
says this option applies only to WAV files:

-V, --verify
    Verify the encoding process. With this
    option, flac will create a parallel
    decoder that decodes the output of the
    encoder and compares the result against
    the original. It will abort immediately
    with an error if a mismatch occurs. -V
    increases the total encoding time but is
    guaranteed to catch any unforeseen bug in
    the encoding process.

It seems to have nothing to do with the MD5 hash.

The procedure you described in your previous reply
is quite complicated for automatic checking. Why
does FLAC calculate the MD5 on the raw uncompressed
data? If it used the compressed data instead,
checking wouldn't require decompression and would
be quicker: just calculate the hash of the binary
file and compare it against the stored value...

Anton
scott brown
2011-04-24 12:38:42 UTC
flac -t audio.flac

-d, --decode
    Decode (flac encodes by default). flac will exit with an exit
    code of 1 (and print a message, even in silent mode) if there were
    any errors during decoding, including when the MD5 checksum does
    not match the decoded output. Otherwise the exit code will be 0.

-t, --test
    Test (same as -d except no decoded file is written). The exit
    codes are the same as in decode mode.
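
For a whole archive, the exit code described above is enough to drive a batch check. This is a minimal sketch under assumptions, not a tested tool: `check_flac_tree` and `FLAC_BIN` are illustrative names, and it relies on find's `! -exec ... \; -print` idiom to print only the files for which the test command failed.

```shell
#!/bin/sh
# Sketch: run "flac -t" on every .flac file under a directory and list
# the files that fail. FLAC_BIN and check_flac_tree are illustrative
# names, not part of flac itself.

FLAC_BIN="${FLAC_BIN:-flac}"

check_flac_tree() {
    # -t decodes without writing a file; -s suppresses progress output.
    # "! -exec ... \; -print" prints a path only when the command exits
    # with a non-zero status (decode error or MD5 mismatch).
    bad=$(find "$1" -type f -name '*.flac' \
        ! -exec "$FLAC_BIN" -t -s {} \; -print 2>/dev/null)
    if [ -z "$bad" ]; then
        return 0
    fi
    printf '%s\n' "$bad"
    return 1
}

# Example:
#   check_flac_tree /path/to/archive > corrupt-files.txt
```

The function returns 0 when every file passes and 1 otherwise, so its output can feed a script that restores the listed files from the mirror backup.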
Brian Willoughby
2011-04-24 13:40:10 UTC
Post by Anton Shepelev
The procedure you described in your previous reply
is quite complicated for automatic checking. Why
does FLAC calculate the MD5 on the raw uncompressed
data? If it used the compressed data instead,
checking wouldn't require decompression and would
be quicker: just calculate the hash of the binary
file and compare it against the stored value...
An MD5 on the compressed data would be different for every
compression level, and potentially for every revision of the libFLAC
coder. In this case, the quicker solution is not the best. There is
only one MD5 hash for the uncompressed data, and it proves that you
got back the original data without any loss, which is the whole point
of FLAC. In fact, the way it is implemented provides two features:
It ensures that your FLAC file has not been corrupted, and it also
ensures that you actually get back the original data without any loss.
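
To illustrate the point about compression levels: two encodings of the same audio would normally differ as files but carry the same stored MD5. A minimal sketch, assuming metaflac is installed; `same_audio` and `METAFLAC_BIN` are names invented for this example.

```shell
#!/bin/sh
# Sketch: decide whether two FLAC files hold the same audio by comparing
# the MD5 of the uncompressed samples stored in each file's metadata.
# METAFLAC_BIN and same_audio are illustrative names.

METAFLAC_BIN="${METAFLAC_BIN:-metaflac}"

same_audio() {
    a=$("$METAFLAC_BIN" --show-md5sum "$1") || return 2
    b=$("$METAFLAC_BIN" --show-md5sum "$2") || return 2
    [ "$a" = "$b" ]
}

# Example: encode one WAV file at the fastest and the highest level,
# then compare. The two .flac files are expected to differ byte for
# byte, yet same_audio should succeed.
#   flac -0 -o fast.flac audio.wav
#   flac -8 -o small.flac audio.wav
#   same_audio fast.flac small.flac && echo "same audio"
```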

p.s. scott has already provided the -t solution.

Brian Willoughby
Sound Consulting
Anton Shepelev
2011-04-24 19:36:12 UTC
I thank everybody for their replies.
Post by Brian Willoughby
An MD5 on the compressed data would be different
for every compression level, and potentially for
every revision of the libFLAC coder. In this
case, the quicker solution is not the best.
That's what I wanted to hear. To test a file's
integrity is one thing, and to test the correctness
of the compression by ensuring reversibility is
another.

Anton
