Tuesday, June 19, 2007

About meta data... again

As there seems to be a lot of confusion about where and when ComicRack stores its meta data, let's get it straight from the horse's mouth :)

ComicRack always stores data as XML snippets, either as part of the big library file or as single small files inside eComics (see below).

When adding or opening eComics and there is a secondary stream present for the file (only on NTFS), the XML data from this stream is loaded. If no stream is present and the eComic is an archive, the archive is opened and searched for a ComicInfo.xml.
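
The lookup order described above can be sketched roughly like this. This is only an illustration in Python, not ComicRack's actual code: the stream name "ComicRackInfo" and all function names are assumptions made up for the example, and only ZIP-based archives (CBZ) are handled.

```python
import zipfile
from typing import Optional


def read_stream_info(path: str, stream_name: str = "ComicRackInfo") -> Optional[bytes]:
    """Try to read a secondary NTFS stream; return None if it is not available.

    The stream name is a placeholder, not the name ComicRack really uses.
    """
    try:
        # On Windows/NTFS, "file:stream" addresses an alternate data stream.
        with open(f"{path}:{stream_name}", "rb") as f:
            return f.read()
    except OSError:
        # No such stream, or the file system has no streams at all.
        return None


def read_archive_info(path: str) -> Optional[bytes]:
    """Search a ZIP-based eComic (CBZ) for a ComicInfo.xml."""
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if name.rsplit("/", 1)[-1].lower() == "comicinfo.xml":
                return archive.read(name)
    return None


def load_info(path: str) -> Optional[bytes]:
    """Prefer the secondary stream, then fall back to the archive."""
    info = read_stream_info(path)
    if info is not None:
        return info
    return read_archive_info(path)
```

On a file system without secondary streams the first step simply fails and the archive is searched instead.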

If you change meta data, the data is stored to the Library and to the secondary NTFS stream (if the file is on NTFS and the write meta data to file option is turned on). The ComicInfo.xml file is NEVER changed. It always stays the same in this process.

ComicInfo.xml IS added to eComics when they are saved to an archive or file list format (Save as...) or converted (Convert to CBZ). Also, only images not tagged as deleted are written.

Now you may wonder why this is so complicated...

ComicRack supports various file formats (PDF, CBZ (ZIP), CBR (RAR)) and will add more in the future, like TAR.GZ. Let's take a closer look at these different file formats:
  • PDF: No archive at all, no place for ComicInfo.xml or any custom meta data.
  • CBR: The RAR format cannot be written (proprietary).
  • CBZ: Can be written, but changing the ComicInfo.xml would require unpacking the CBZ, changing the single file and packing up the whole thing again. This is very costly. That's the reason why ComicInfo.xml is not updated.
  • Future formats: If they support fast replacement of the data file without unpacking the whole thing, ComicRack will do it.
One implication of the whole process is that even if you only use ComicRack's file browser, you should still add all the files to the library, because the library is the fastest way to get the data.

If there's popular demand, I may add a function like "Commit meta data to eComic" for file formats which support it (CBZ).
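
To show why such a commit is costly for CBZ, here is a rough Python sketch of what it would involve. It is an illustration under assumptions, not ComicRack code: the function name is made up, and it only looks for a ComicInfo.xml at the top level of the archive. Every page has to be read from the old archive and written to a new one just to swap one small file.

```python
import os
import tempfile
import zipfile


def commit_info_to_cbz(path: str, xml: bytes) -> None:
    """Rewrite a CBZ so that it contains the given ComicInfo.xml."""
    folder = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix=".cbz", dir=folder)
    os.close(fd)
    try:
        with zipfile.ZipFile(path) as src, \
                zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_STORED) as dst:
            for item in src.infolist():
                if item.filename.lower() == "comicinfo.xml":
                    continue  # drop the old info file
                # Every page is read and written again: the costly part.
                # Pages are stored, not recompressed.
                dst.writestr(item.filename, src.read(item), zipfile.ZIP_STORED)
            dst.writestr("ComicInfo.xml", xml)
        # Swap the rewritten archive into place.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
```

The work grows with the size of all the pages, even though only a few hundred bytes of XML actually change.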

I hope everything is clear now.

This whole "use an archive format" idea for eComics may have been valid in the past, but it is very unsatisfactory for applications like ComicRack. If MP3 files had been done the same way, changing ID3 data would require the editor to expand the whole thing to a WAV file, change the data, and recompress it to MP3...

Using secondary streams on NTFS was the only plausible solution for storing the info in the eComic files as well, and not only in the Library.

So what would, IMHO, a GOOD comic book file format look like (just a quick idea)?

[signature]
[xml info offset]
[count of images]

this block repeats n times
[size of image stream]

[image stream]

[xml info]


This way you could do a very easy and fast update of the info without touching the (possibly huge) amount of image data. As pages are already compressed, compressing them again is unnecessary (always use store for zip files :)). But the problem is that this would be a proprietary format, and that's where the only strength of archive-based formats shines: you can open and extract the content with hundreds of free and open tools on every OS out there (that's also one of the reasons why PDF is not a good choice).

Of course, a tool to parse the above file format would be 50 lines of code...
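
For illustration, a reader and writer for the layout above might look roughly like this in Python. The details are all assumptions, since the idea above does not fix them: a 4-byte signature "ECMX", a 64-bit little-endian XML offset, a 32-bit image count and 64-bit image sizes.

```python
import struct
from typing import BinaryIO, List, Tuple

SIGNATURE = b"ECMX"              # hypothetical magic bytes
HEADER = struct.Struct("<4sQI")  # signature, xml info offset, count of images
SIZE = struct.Struct("<Q")       # size of one image stream


def write_comic(f: BinaryIO, images: List[bytes], xml: bytes) -> None:
    """Write the header, the size-prefixed image streams, then the XML info."""
    xml_offset = HEADER.size + sum(SIZE.size + len(i) for i in images)
    f.write(HEADER.pack(SIGNATURE, xml_offset, len(images)))
    for image in images:
        f.write(SIZE.pack(len(image)))
        f.write(image)
    f.write(xml)


def read_comic(f: BinaryIO) -> Tuple[List[bytes], bytes]:
    """Return all image streams and the XML info."""
    f.seek(0)
    signature, xml_offset, count = HEADER.unpack(f.read(HEADER.size))
    if signature != SIGNATURE:
        raise ValueError("not a comic file")
    images = []
    for _ in range(count):
        (size,) = SIZE.unpack(f.read(SIZE.size))
        images.append(f.read(size))
    f.seek(xml_offset)
    return images, f.read()


def update_info(f: BinaryIO, xml: bytes) -> None:
    """Replace the XML info without reading or moving any image data."""
    f.seek(0)
    _, xml_offset, _ = HEADER.unpack(f.read(HEADER.size))
    f.seek(xml_offset)
    f.truncate()  # cut off the old XML, which sits at the end of the file
    f.write(xml)
```

Because the XML sits at the end of the file, update_info only touches the header and the tail, no matter how large the pages are.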