Topic: 2 new plugins released - $p() and $eq2()
I've released two new plugins on the plugins page. Both add taggerscript functions.
The first is $eq2(). It works like $eq(), but accepts more than one test string: $eq2(string to test against, test string 1, test string 2, ... test string n). %variables% can be used in place of literal strings, and any number of test strings can be given.
The second is $p(). It works like a reverse \p{Foo} Unicode property regex test: give it a string, and it returns the Unicode property (i.e., the script) of the first character of that string. Usage: $p(any string or %variable%)
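To make the comparison concrete, here is a minimal Python sketch of the idea. It is not the plugin's source: the function name eq2, the "match any of the test strings" behaviour, and the "1" / empty-string return values are my assumptions about how such a function would typically behave.

```python
def eq2(target, *candidates):
    """Hypothetical stand-in for $eq2(): compare one string against many.

    Assumption: the result is truthy ("1") when `target` equals ANY of the
    candidates, and empty ("") otherwise.
    """
    return "1" if any(target == candidate for candidate in candidates) else ""


# One call replaces a chain of single comparisons.
result = eq2("flac", "mp3", "ogg", "flac")
```

With a single candidate, this reduces to an ordinary one-to-one equality test.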
Examples:
"Anything" -> latin, "Аракс" or "Эдуард" -> cyrillic, "मुकेश" -> devanagari, "つしまみれ" -> hiragana, etc.
$p() is a little slow for the first release loaded, because it has to initialize the various regexes it uses. Every release loaded afterwards reuses those regexes, so it is far faster than that first one.
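The compile-once, reuse-afterwards pattern might look roughly like the sketch below. The names and the three tiny character ranges are illustrative assumptions only (they are block ranges, not the full script tables the plugin would need), and the plugin may well cache its regexes differently.

```python
import re
from functools import lru_cache

# Illustrative subset only: approximate block ranges, not real script tables.
_RANGES = {
    "latin": r"[A-Za-z]",
    "cyrillic": r"[\u0400-\u04FF]",
    "hiragana": r"[\u3040-\u309F]",
}


@lru_cache(maxsize=None)
def compiled_patterns():
    """Compile every pattern on the first call; later calls reuse the cache."""
    return tuple((name, re.compile(src)) for name, src in _RANGES.items())


def first_char_script(text):
    """Return the first script whose pattern matches the start of `text`."""
    for name, pattern in compiled_patterns():
        if pattern.match(text):
            return name
    return "common or unknown"
```

Only the first call to compiled_patterns() pays the compilation cost; every later lookup gets the same tuple back.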
$p() also supports a second argument, "True" (the capitalization matters). By default, $p() only checks scripts officially within Unicode. If the second argument is set to True, it will also check user-defined code points for scripts not officially within Unicode. Currently, this only means that ConScript scripts are checked, but in future I'll also be adding new scripts, and characters being added to already official scripts, which are on the Unicode roadmap but not yet officially within Unicode itself. (If there are any others that'd be worth checking, let me know; Omniglot's scripts don't seem to be in sufficiently widespread use to be worth adding, but I'd be willing to add any of those, one-off, if there's a need for it.)
The only official "script" that $p() doesn't check for is the "common" script. It is pretty much all punctuation and control characters anyhow, and there are so many of them that checking for them really slows down the plugin. Instead, $p() returns "common or unknown" if no script is matched, rather than actually checking for common and distinguishing "common" from "unknown" (which would be the 'correct' implementation per Unicode's guidance).
$p() is especially useful if you're using a 'first character of artist name/artist name/...' type of naming string, but only want that to be done for characters of one or more specific scripts. For example, my music dir currently has:
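For anyone who wants to experiment with the fallback behaviour outside Picard, here is a rough Python sketch. It is an assumption-laden stand-in, not the plugin's method: instead of regexes it guesses the script from the first word of the character's Unicode name, which is only a heuristic and not the real Unicode Script property. The function name and the prefix table are made up for illustration.

```python
import unicodedata

# Heuristic mapping from the first word of a character name to a script label.
# Assumption: this small table is illustrative and far from complete.
_NAME_PREFIX_TO_SCRIPT = {
    "LATIN": "latin",
    "CYRILLIC": "cyrillic",
    "GREEK": "greek",
    "DEVANAGARI": "devanagari",
    "HIRAGANA": "hiragana",
    "KATAKANA": "katakana",
    "HANGUL": "hangul",
    "CJK": "han",
}


def script_of_first_char(text):
    """Guess the script of text[0]; anything unmatched is lumped together."""
    if not text:
        return "common or unknown"
    # Unnamed characters (e.g. control characters) yield the empty default.
    name = unicodedata.name(text[0], "")
    prefix = name.split(" ")[0]
    # No separate check for "common": unmatched simply falls through.
    return _NAME_PREFIX_TO_SCRIPT.get(prefix, "common or unknown")
```

Punctuation, digits and control characters all land in "common or unknown" here, which mirrors the shortcut described above rather than Unicode's stricter guidance.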
A/...
B/...
C/...
...
Z/...
[common or unknown]/... (most of this will eventually end up in [symbols & punctuation] after I add some missing punctuation marks to that separate test)
[cyrillic]/..
[devanagari]/..
[greek]/..
[han]/..
[hangul]/..
[hiragana]/..
[katakana]/..
[number]/.. (split out by a separate $in(0123456789,%string%) test after the $p() test)
[symbols & punctuation]/.. (split out by a separate $in('[*…\(\$+-:",%string%) test after the $p() test)
Much more readable and functional to me than having ザ/..., コ/..., 윤/..., etc.
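The folder logic behind that layout can be sketched in Python as follows. This is only an approximation of my naming string, not the string itself: the function name, the order of the tests, and the short symbol list are assumptions chosen for illustration, and the script label is passed in as if it had already come from $p().

```python
DIGITS = "0123456789"
# Hypothetical, shortened symbol list; the real test uses a longer set.
SYMBOLS = "!?*+-:"


def folder_for(name, script):
    """Pick a top-level folder from a name and its first character's script."""
    first = name[0]
    if script == "latin":
        return first.upper()             # A/, B/, ... Z/
    if first in DIGITS:
        return "[number]"                # extra test after the script test
    if first in SYMBOLS:
        return "[symbols & punctuation]"
    return "[" + script + "]"            # e.g. [cyrillic], [hiragana]
```

For example, folder_for("Аракс", "cyrillic") gives "[cyrillic]", while a name starting with a digit ends up under "[number]".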
The links are on http://wiki.musicbrainz.org/Picard_Plugins , or directly, $eq2() http://rapidshare.com/files/452223583/eq2.py.zip and $p() http://rapidshare.com/files/453111881/p.py.zip
Brian
PS: I'm also tweaking a $ne2() and $in2(), along the same lines of $eq2(). When they're tested and ready, they'll also be added to the plugins page.