Topic: Where is the schema for the data dump?

I spent several hours attempting to use the information here along with the tab delimited data dump:
http://musicbrainz.org/doc/MusicBrainz_Database/Schema

I manually typed out all those tables and fields and created the database in mysql along with the indices, but the files in mbdump.tar.bz2 don't match the documented schema information exactly.

1. While some of the schema ngs.png image seems correct, several extra fields exist or are referring to a different field name (i.e. wrong order) in some of the tables such as release, recording and others.  I wish the data dump contained the current schema information in SQL or just a list of field names/data types for each table as a separate file for reference.

2. Additionally, I see no way to associate the tracks with the album name.  The charts provided don't describe such a relationship, yet the musicbrainz web site does have this ability to navigate from artist to releases to viewing the track names and other details.

I'm attempting to download the musicbrainz server OVA image instead to see if that works better.

If there is a better place to discuss this, please let me know.

Re: Where is the schema for the data dump?

See http://git.musicbrainz.org/gitweb/?p=mu … Tables.sql for the SQL that creates the tables.

Tracks are connected to releases via the tracklist and medium tables (you can skip the tracklist table though by joining medium and track on medium.tracklist=track.tracklist)

Re: Where is the schema for the data dump?

skyflare wrote:

I manually typed out all those tables and fields and created the database in mysql along with the indices, but the files in mbdump.tar.bz2 don't match the documented schema information exactly.

1. While some of the schema ngs.png image seems correct, several extra fields exist or are referring to a different field name (i.e. wrong order) in some of the tables such as release, recording and others.  I wish the data dump contained the current schema information in SQL or just a list of field names/data types for each table as a separate file for reference.

No documentation of a schema is ever going to keep up with the reality 100% of the time - but in any case, manually creating tables indexes etc is always going to be a very bad idea with any database of even moderate complexity. There are scripts for doing this kind of thing. Besides, that schema diagram doesn't even display data types?

skyflare wrote:

I'm attempting to download the musicbrainz server OVA image instead to see if that works better.

The OVA is indeed probably the easiest way to get up and running - or you could take a look at https://github.com/lalinsky/mbslave which sets you up with a slave database, leaving out the server itself.

skyflare wrote:

2. Additionally, I see no way to associate the tracks with the album name.  The charts provided don't describe such a relationship, yet the musicbrainz web site does have this ability to navigate from artist to releases to viewing the track names and other details.

Sure the charts do; however you'd need to have at least basic familiarity with MB's data model and terminology to interpret.

A track belongs to a tracklist. Each tracklist is associated with one or more mediums. One or more mediums belongs to a release. A release has a name.

Re: Where is the schema for the data dump?

Thanks for the replies.  The OVA image did have all the missing fields plus I can log all the queries running in postgresql while viewing the pages.  This is excellent since it saves me having to understand the complexity of the musicbrainz server programming.   I may use mbslave later as you suggest when I have figured out the queries I need so that I can eliminate the virtualization for the production server.

I'm trying to build an application focused on a few genres of music with new search methods I'm creating.   I'm using mysql and railo (CFML), so that's why I'm rebuilding the database and I'll have to make a script that pulls the data I need out of postgresql periodically.  I want to eliminate many of the relationships for the sake of performance.   It seems like the current database is optimized for small size with maximum normalization.   Whereas, I want to do the opposite: denormalize the tables and store most common data in application memory and focus only on the most current releases for sale. 

Perhaps the documentation could note that the schema is not up to date in the charts and that you should refer to the source code / server image for the latest version.