1 (edited by theOtherCassius 2012-06-09 16:49:28)

Topic: Plugin to pull tag info from a webpage

There are a number of websites out there that could be used to infer info such as genre, artist, album, and either track number or track name (if one of the two is present).

For instance, take a look at this page:

http://archive.org/details/1-001

At some point a few years ago, I downloaded the VBR zip file, which came through without track numbers. I could easily write a script to parse archive.org pages and return a JSON object containing whatever info can be sniffed from the page source. Track numbers, for instance, could be inferred from the order in which track names are presented.
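
To make the track-number idea concrete, here is a rough sketch in Python. It assumes (hypothetically) that the page lists each track as a link to an .mp3 file with the track name as the link text. I have not checked this against the real archive.org markup, so the `TrackLinkParser` class, the `tracks_to_json` function and the JSON field names are all made up for illustration.

```python
import json
from html.parser import HTMLParser


class TrackLinkParser(HTMLParser):
    """Collect the text of every <a> whose href ends in .mp3 (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.track_names = []
        self._in_track_link = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") or ""
        if tag == "a" and href.lower().endswith(".mp3"):
            self._in_track_link = True
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that sits inside a track link.
        if self._in_track_link:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_track_link:
            name = "".join(self._buffer).strip()
            if name:
                self.track_names.append(name)
            self._in_track_link = False


def tracks_to_json(html):
    """Number the tracks by the order in which they appear on the page."""
    parser = TrackLinkParser()
    parser.feed(html)
    parser.close()
    tracks = [
        {"tracknumber": number, "title": name}
        for number, name in enumerate(parser.track_names, start=1)
    ]
    return json.dumps({"tracks": tracks})
```

Calling `tracks_to_json(page_source)` on a page laid out that way would give one entry per track, numbered from 1 in page order.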

My only problem is that I would be doing it in PHP (I know, I should have learned Perl). So I'm wondering whether any plugin authors would be willing to collaborate on this: I do the PHP side and you do the Perl side. I'm thinking it could be written as a general plugin that others could extend to other sites (such as last.fm) simply by writing a script in Perl, PHP, or any other scripting language that can be executed from Perl.

In pseudocode the plugin would look something like this

Create right-click option to get tags from URL
Capture URL from user
Parse URL for domain name (e.g. archive.org)
For each script type and while found flag is false
--check script directory for a parsing script for that domain (e.g. archivedotorg.php)
--if found
----execute script
----compare values from returned JSON object to existing tags and show the user a preview
----prompt user to apply changes or cancel
----set found flag to TRUE
if found flag is false
--inform user that no script was found for the domain of the provided URL

Each domain script would be expected to return a JSON object per specifications from the plugin author.
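
The lookup-and-compare part of the pseudocode above could be sketched in Python roughly as follows. Everything named here is an assumption for illustration: the `INTERPRETERS` table, the convention that a script takes the URL as its only argument and prints a JSON object on stdout, and the helper names. None of it is an existing plugin API.

```python
import json
import subprocess
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical mapping from script extension to the interpreter that runs it.
INTERPRETERS = {".php": "php", ".pl": "perl", ".py": "python3"}


def script_stem(url):
    """Turn 'http://archive.org/details/1-001' into 'archivedotorg'."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host.replace(".", "dot")


def find_script(url, script_dir):
    """Return the first matching parsing script, or None if there is none."""
    stem = script_stem(url)
    for extension in INTERPRETERS:
        candidate = Path(script_dir) / (stem + extension)
        if candidate.is_file():
            return candidate
    return None


def fetch_tags(url, script_path):
    """Run the script with the URL as its argument and decode its JSON output."""
    interpreter = INTERPRETERS[script_path.suffix]
    completed = subprocess.run(
        [interpreter, str(script_path), url],
        capture_output=True, text=True, check=True, timeout=60,
    )
    return json.loads(completed.stdout)


def preview_changes(existing, proposed):
    """Map each tag that would change to an (old value, new value) pair."""
    return {
        key: (existing.get(key), value)
        for key, value in proposed.items()
        if existing.get(key) != value
    }
```

If `find_script` returns None, the plugin would tell the user that no script exists for that domain. Otherwise it would call `fetch_tags`, show the result of `preview_changes`, and apply the changes only after the user confirms. The right-click menu and the prompt are left out because they depend on the host application.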

I realize that I'm new here and therefore not a known quantity, so I would be willing to write the PHP side first.

EDIT: I just want to point out that if this functionality were combined with the autoclustering that I have suggested/asked for here:

http://forums.musicbrainz.org/viewtopic.php?id=3632

The script to pull tags from archive.org could potentially run without prompting the user, since there is a good chance that the name of the directory the files live in can be used to find the associated page on archive.org (see an archive.org URL for what I mean: http://archive.org/details/corpid005).
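
As a rough illustration of that directory-name idea, here is a small Python sketch. It assumes the directory name is exactly the identifier at the end of the archive.org URL, which will not hold if the folder was renamed after unzipping. The function name `guess_details_url` is hypothetical.

```python
from pathlib import PurePath

# URL pattern taken from the example links in this post.
DETAILS_PREFIX = "http://archive.org/details/"


def guess_details_url(album_dir):
    """Guess the archive.org page from the name of the album directory."""
    identifier = PurePath(album_dir).name
    # An empty name (e.g. the filesystem root) gives us nothing to look up.
    return DETAILS_PREFIX + identifier if identifier else None
```

A plugin could try the guessed URL first and fall back to asking the user when the page does not exist or returns no usable tags.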

Re: Plugin to pull tag info from a webpage

Replying to my own post above. I now realize that some (all?) plug-ins are written in Python.

I've done a little bit of Python in the past, and I see a few examples scattered about that I think would help me take a run at this on my own. I see a few possible challenges in putting the finished product together, but hopefully I can tackle 90% of this and get a bit of help at the end.