1 (edited by r0k78 2012-07-30 09:39:09)

Topic: [PlugIn] Transliterate unicode characters and remove accents

I finally put together a plugin that does what I want.
I put it on pastebin (http://pastebin.com/FCpW1GSh) in case anyone is interested.

This plugin transliterates artist, albumartist, album and track names. That is, it converts non-ASCII characters (Cyrillic, Greek ...) to the closest match of one or more ASCII characters and removes accents and other diacritics, effectively creating an "English letters" version of the string. This is intended mainly for sorting and file naming.
The plugin will NOT overwrite any tag. It saves the transliterated strings to temporary tags. You can then use the tagger script to save any or all of those tags, or the file naming script if you just want to use them for file naming.
The temporary tags are:
%_albumartisttrans%
%_artisttrans%
%_albumtrans%
%_titletrans%
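
For readers who want to see the idea without opening the paste, here is a rough sketch of the "copy to temporary tags" step on a plain dict. It is not the plugin's actual code: `TAG_MAP`, `to_ascii` and `add_transliterated_tags` are names made up for illustration, and the placeholder `to_ascii` only strips accents (anything else non-ASCII is simply dropped). It also assumes that Picard stores hidden variables under keys starting with "~", which scripts then read as %_name%.

```python
import unicodedata


def to_ascii(text):
    """Placeholder transliteration: strip diacritics, drop any other non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Source tag -> temporary tag (assumed "~" prefix, read as %_..% in scripts).
TAG_MAP = {
    "albumartist": "~albumartisttrans",
    "artist": "~artisttrans",
    "album": "~albumtrans",
    "title": "~titletrans",
}


def add_transliterated_tags(metadata, transliterate=to_ascii):
    """Fill the temporary tags and leave the original tags untouched."""
    for source, target in TAG_MAP.items():
        if source in metadata:
            metadata[target] = transliterate(metadata[source])
    return metadata


tags = {"artist": "Mötley Crüe", "title": "Señorita"}
add_transliterated_tags(tags)
print(tags["~artisttrans"], "/", tags["~titletrans"])
```

In a real plugin the dict would be Picard's metadata object and `to_ascii` would be replaced by a proper transliteration function.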

---------------------------------------------- Below is the original post -------------------------------------

Hello. I usually convert all accented and special Unicode characters to basic, unaccented Latin characters (the "English" letters) in all my sorting tags (artistsort, albumsort, tracksort). I do this so that bands like "Аркона" are listed alphabetically alongside other bands (in this case under the letter A) rather than appearing in separate groups.
So far, I have done this semi-manually using foobar2000, but I'd like to automate the process using Picard.
I guess I could write plenty of $replace statements or fewer $rreplace ones, but maybe there is an even easier way of doing this using a Python "plugin" script. Unfortunately, I have never used Python. After a quick look at the Python docs, I could write some Python regexes, but that would be very similar to using tagger script.
Is there a better/easier way of doing this (such as a built-in Python string converter)?

Re: [PlugIn] Transliterate unicode characters and remove accents

Isn't there a setting in the options to use only ASCII? Or is that for file naming only?

Re: [PlugIn] Transliterate unicode characters and remove accents

There is an ASCII option for file naming and a "replace unicode punctuation" option for every tag, but nothing that can help convert Б to Be or Ä to A.

Re: [PlugIn] Transliterate unicode characters and remove accents

http://pastebin.com/h7XELV6p might help for removing accents. You would need to modify it so that it edits the variables you want it to edit and add any extra characters you want to replace.
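
For reference, a common way to strip accents in Python uses the standard `unicodedata` module: decompose each character, then drop the combining marks. This is a minimal sketch and not necessarily what the linked paste does; `remove_accents` is a name made up here.

```python
import unicodedata


def remove_accents(text):
    """Return text with combining marks (accents, umlauts, ...) removed."""
    # NFD splits "Ä" into "A" followed by a combining diaeresis.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into the usual precomposed form.
    return unicodedata.normalize("NFC", stripped)


print(remove_accents("Ä Björk café"))
```

Note that this leaves most Cyrillic and Greek letters alone, but letters that decompose into a base letter plus a mark (for example й and ё) lose the mark and become и and е.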

I don't know of a better way to convert Cyrillic to Latin though other than replacing each character one by one. If you want to do that anyway, the list in http://pastebin.com/xSc0H7Vy would probably be a good start to avoid having to create it from scratch.
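
The one-by-one replacement could look roughly like this with `str.translate`. The table below is deliberately partial and only illustrative (it is not the list from the paste, and the romanisation choices such as "kh" and "ts" are just one convention); `CYRILLIC_TO_LATIN`, `build_table` and `cyrillic_to_latin` are made-up names.

```python
# Partial lowercase table for illustration; extend it as needed.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "zh", "з": "z", "и": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t",
    "у": "u", "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh",
}


def build_table(mapping):
    """Build a str.translate table covering lower- and uppercase letters."""
    table = {}
    for src, dst in mapping.items():
        table[ord(src)] = dst
        table[ord(src.upper())] = dst.capitalize()
    return table


TABLE = build_table(CYRILLIC_TO_LATIN)


def cyrillic_to_latin(text):
    """Replace every mapped character; anything else passes through unchanged."""
    return text.translate(TABLE)


print(cyrillic_to_latin("Аркона"))
```

Generating the uppercase entries from the lowercase ones halves the amount of typing, and one character can map to several Latin letters (ж becomes "zh").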

5 (edited by r0k78 2012-07-29 08:11:33)

Re: [PlugIn] Transliterate unicode characters and remove accents

Thanks for these nikki :-)
The accent trick is really nice.

EDIT: Now that I know that what I'm trying to do is called "transliteration", I was able to find this: http://pypi.python.org/pypi/Unidecode
Looks useful.
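
Basic usage of Unidecode is a single call to `unidecode()`. A small sketch, assuming the package may or may not be installed: it uses Unidecode when available and otherwise falls back to plain accent stripping, which does not handle Cyrillic or Greek. The wrapper name `transliterate` is made up for this example.

```python
import unicodedata

try:
    from unidecode import unidecode  # third-party package "Unidecode"
except ImportError:
    unidecode = None


def transliterate(text):
    """ASCII-ish version of text: Unidecode if present, else accents only."""
    if unidecode is not None:
        return unidecode(text)
    # Fallback: only removes combining marks, other scripts stay as they are.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(transliterate("Motörhead"))
```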

Re: [PlugIn] Transliterate unicode characters and remove accents

Update: I wrote (assembled) a plugin to do this. The first post has been edited with a link.