News:

Dear forum visitors, if the support forum is not available, please try again a few minutes later. Thanks!

Main Menu
Support-Forum

[PATCH] Improvements for the MP3 ID3v2 parser

Started by ht, 25.04.2019 13:32:10

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

ht

Hi,

Please find attached a patch to improve the jDownloads MP3 ID3 tag parser.  The patch is against jDownloads 3.2.64.

As I've seen similar issues in the presentation of MP3 ID3 information as described in <http://www.jdownloads.com/index.php/support-forum.html?topic=11368.0>, I took a quick look at the ID3v2 specification <http://id3.org/id3v2.4.0-structure> and the jDownloads ID3 tag parser implementation to try to find a reason why the ID3 metadata sometimes displays with garbled (or extra) characters, or sometimes only partially, or even not at all.

In summary, I found the following things which the attached patch tries to address in the getID3v2Tags function in com_jdownloads.orig/site/helpers/jdownloadshelper.php:

       
  • The tag/frame size calculation did not always yield correct results.  According to the ID3v2 specification, each byte of the four-byte field reserved for tag/frame size data carries only 7 bits of information.  The attached patch rewrites the size calculation logic to address this issue.  (The patched code doesn't verify that each byte in a size field has its most significant bit cleared as mandated by the specification – tags and frames with invalid sizes should probably be skipped.  However, trusting such invalid size values to seek to another position in the input file could easily lead to just losing the tag/frame tracking position in the input file anyway, so the patch omits any validity checks for simplicity.)
  • According to the ID3v2 specification <http://id3.org/id3v2.4.0-frames>, the first byte of each text information frame specifies an encoding for the data that follows.  The attached patch adds support for the UTF-16 and UTF-8 encodings explicitly mentioned in the specification (relying on the iconv function to convert between different encodings), and makes the parser skip text information frames with an unknown encoding.  (The previous implementation used the ISO-8859-1 input encoding for all text data, which is probably why UTF-encoded non-ASCII data could appear garbled on output.)
  • To simplify the code, the attached patch also removes the $blnAllFrames parameter from this function – I couldn't find the function being called with a non-default value for this parameter from anywhere else in the code included in the distribution.  There's probably no universal encoding which could be assumed for the contents of non-text data frames, so it's unclear how the data should be handled in this case.
In addition to the patch, I also have a small question regarding the handling of the {name} placeholder value in the layout templates for displaying ID3 tag information.  Unlike some other placeholder values (such as {album} or {artist}), the {name} placeholder value appears to get replaced with the download file name, and not (which is what I originally assumed) with the song title from the input file.  Is this behavior intentional?  As a feature request, I'd welcome having a placeholder value to access the song title (the contents of the TIT2 text information frame) from the ID3 metadata since this frame may contain more complete information about the song than what's in the download file name.

Thank you for this amazing software!  I hope that the attached patch could be used as a basis to improve it further.

Regards,
Heikki Tauriainen


[gelöscht durch Administrator]
  •  

ht

Hi,
I'm so sorry, it appears that things are much more complicated than I thought while originally comparing the tag parser implementation against the ID3v2 specification (where I just picked the latest version that was available, ID3v2.4.0).  However, it appears that the ID3v2.4.0 tag format differs from the previous ID3v2.3.0 specification in some crucial details which makes the two versions incompatible: when comparing the specifications, it appears that

       
  • only the tag size is represented as a "synchsafe" integer (<http://id3.org/id3v2.4.0-structure>, Section 6.2) with 7 bits of information in each byte in both versions of the specification;
  • the size of each frame is represented as a synchsafe integer only in IDv2.4.0; in the previous version, each byte of the size field carries 8 bits of information; and
  • only the IDv2.4.0 tag format supports UTF-8 encoded strings – in IDv2.3.0 all UTF strings are "16-bit Unicode" (see <http://id3.org/d3v2.3.0>, Section 3.3).
Because of these differences, the previous patch won't work correctly with files using the older ID3v2.3 tag format.  The original tag parser implementation looks like it could've been based on this version of the tag format, based on the observation that it refers to 4-character text information frame identifiers which exist only in ID3v2.3 (the identifiers have only three characters in IDv2.2, and the TYER identifier no longer exists in ID3v2.4).

Please find attached a new version of the patch (against the jDownloads 3.2.64 distribution) which intentionally supports only ID3v2.3 and ID3v2.4 tags having no extended header (which should be skipped if present) and no tag (or frame, ID3v2.4) data which has been "unsynchronized" (the unsynchronization should be undone before processing the tag/frame content).  A complete parser implementation should of course support these features as well, therefore the patch ends up being not much more than just a proof of concept...

Regards,Heikki Tauriainen


[gelöscht durch Administrator]
  •  

Arno

Hi Heikki,
many thanks for your work. I know that the ID3v2 specifications are very complicated. So I had used at that time a open source script for this functionality as I have not the time to create an own. Yes it is not perfectly and error-prone. I had planned to remove this part in the coming new jD generation 3.9. But some users need it still. So I have added it again to be  compatible with the current versions.

Fact is, that I use this functionality not self and have also not the time to check your modifications really with the new 3.9 series. Maybe are you interested to check this self? This would be very useful. In this case I would you add to the testing team. Interested?  ;)
Best Regards / Gruß
Arno
Please make a Donation for jDownloads and/or write a review on the Joomla! Extensions directory!
  •  

ht

Hi Arno,

Thank you very much for your answer!

> Hi Heikki,
> many thanks for your work. I know that the ID3v2 specifications are very complicated. So I had used at that time a open source script for this functionality as I have not the time to create an own. Yes it is not perfectly and error-prone. I had planned to remove this part in the coming new jD generation 3.9. But some users need it still. So I have added it again to be  compatible with the current versions.


Thank you for this background information, I wasn't aware of the history of the tag parser implementation, nor that it was actually being considered for removal in the next major version of jDownloads.

> Fact is, that I use this functionality not self and have also not the time to check your modifications really with the new 3.9 series. Maybe are you interested to check this self? This would be very useful. In this case I would you add to the testing team. Interested?  ;)

I'm afraid I can't commit to any kind of long-term testing role at this moment (I must confess here that I have basically zero prior knowledge about jDownloads internals, and not even much experience about Joomla administration – I only have access to a live Joomla installation where I'm responsible for the maintenance of a simple media archive).

However, as the suggested changes concern only a single stand-alone function having no apparent dependencies from any other code in the codebase, I'd believe the patched code would work the same in a new version of a helper library as a replacement for the old ID3 parser code.  (In testing the patched version of the code, I simply ran the function directly against some test files.)  Given access to a development version of the codebase, I could of course check that the patch will apply cleanly against it and update the patch if necessary, however, testing the code as part of an actual installation of a development version of jDownloads is something I probably don't have the skills to directly help you with.

That said, while having jDownloads parse ID3 tags from MP3 files is convenient in that it removes the need to duplicate information which is already stored in the files into the files' descriptions for presentation purposes (in my case, I'm responsible for archiving self-made recordings of performances of a small music group for the group's internal use), I can see why removing special handling for MP3 files from future versions of jDownloads, or possibly using an external library for accessing metadata from media files (if there is one that could be easily used), could be a much better idea than applying this incomplete patch for long term development.  Even without this minor feature, there's always the option to add additional information about downloads manually to the download descriptions.  So please, don't change your mind about preserving the ID3 tag parser just because of this discussion if you had already decided otherwise for jDownloads 3.9 :)

Thanks again for taking the time to consider my question!

Regards,
Heikki
  •