FreeDB to MySQL conversion using Ruby

Not really related to the kind of information I use to post in this blog, but I thought somebody might be interested in a script that I quickly wrote to take the freedb database files and store them in MySQL (you could use another RDBMS provided a driver is available for ActiveRecord). It also shows how an easy it is to interact with a database through Ruby and ActiveRecord.

FreeDB is an open and free alternative to CDDB built using the “solidary community intelligence” I’ve been talking about in my lasts posts.I used it as first test to build a catalog of CDs, but the quality of the data is not very good if you want to build a relational, consistent view of it. It is based on the FreeDB database specification available here. You may also want to try MusicBrainz. Is is a an interesting alternative, although less well known. They started building a relational (and therefore more consistent) database of music metadata, which includes CD albums but also digital tracks.

Anyways, below is the code. There is a lot of room for improvement (it does not check the arguments for instance), but it is a good start point.


(Note: I hope it still works as I wrote it sometime ago and I might have done some modifications for testing … )


Usage: freedb2mysql.rb <freedb db directory>
freedb2mysql.rb

#!/usr/bin/env ruby
require ‘rubygems’
require_gem ‘activerecord’
#use UTF8
$KCODE = ‘u’
require ‘jcode’

#Define database connection (change host and database to your own)
ActiveRecord::Base.establish_connection(:adapter => ‘mysql’, :host => ”myhost”, :database => ‘mydatabase’)
ActiveRecord::Base.connection.execute ‘SET NAMES UTF8′

#Define freedb objects
class Artist < ActiveRecord::Base
has_and_belongs_to_many :albums

def to_s
self.name
end
end

class Album < ActiveRecord::Base
has_many :tracks
has_and_belongs_to_many :artists
end

class Track < ActiveRecord::Base
belongs_to :album

def to_s
“#{self.number}. #{self.title} – #{self.duration} s”
end
end

#Process files in folder
def read_folder(directory)

if File.directory?(directory)
Dir.chdir(directory) do
puts “chdir to #{directory}”
Dir.foreach(‘.’) do |file|
next if file.eql?(“.”) or file.eql?(“..”)
if File.directory?(file)
read_folder(file)
elsif File.file?(file) and File.readable?(file)
read_freedb_item(file)
end
end
end
else
#Maybe should throw exception
read_freedb_item(directory)
end
end

#Populates mysql database with data from freedb item
def read_freedb_item(file)

puts “Read file #{file}”

=begin
FREEDB FILE FORMAT
# xmcd
#
# Track frame offsets:
# 150
[ ... 21 frame offsets omitted ]
# 210627
#
# Disc length: 2952 seconds
#
# Revision: 1
# Submitted via: xmcd 2.0
#
DISCID=270b8617
DTITLE=Franske Stemninger / Con Spirito
DYEAR=1981
DGENRE=Classical
TTITLE0=Mille regretz de vous abandonner
[ ... 21 TTITLEN keywords omitted ]
TTITLE22=L’arche de no
EXTD=Copyright (c) 1981 MCA Records Inc.\nManufactured f
EXTD=or MCA Records Inc.
EXTT0=Des Prez\nYez
[ ... 21 EXTTN keywords omitted ]
EXTT22=Schmitt: A contre-voix \n(excerpt)
PLAYORDER=

=end

lines = IO.readlines(file)
if lines[0] !~ /^# xmcd/
puts “ERROR: Wrong header: #{lines[0]}”
return
end

start_offsets = (1..lines.length).find {|index| lines[index] =~ /^# Track frame offsets:/}
end_offsets = (1..lines.length).find {|index| lines[index] =~ /^# Disc length: (\d+)/}

if not (start_offsets and end_offsets)
puts “ERROR: No track offset section!”
return
end

offsets = []
lines.slice(start_offsets+1..end_offsets-1).each do |line|
if line =~ /^#\s+(\d+)\s+/
offsets << $1.to_i
end
end

total_duration = lines[end_offsets].scan(/^# Disc length: (\d+)/)
total_duration = total_duration[0][0].to_i

puts total_duration

#Read Key values into a hash
freedb_metadata = {}
lines.slice(end_offsets+1..lines.length).each do |line|
if line =~ /^([A-Z]+\d*)\s*=\s*(.*?)\s*\n/
if freedb_metadata[$1] #If the key is already defined -> concatenate values
freedb_metadata[$1].concat($2)
else
freedb_metadata[$1] = $2
end
end
end

#Check the mandatory attributes are set
if not freedb_metadata["DTITLE"]
puts “ERROR: Album does not have a title”
return
end

#Build Album and add tracks to it
album = Album.new
artists, album.title = freedb_metadata["DTITLE"].split(/\s+\/\s*|\s*\/\s+/)
if not album.title
album.title = artists
end
album.title.strip!

if album.title.length == 0
puts “ERROR: invalid title (probably wrong encoding)”
return
end

album.discid = freedb_metadata["DISCID"]
album.year = freedb_metadata["DYEAR"]
album.genre = freedb_metadata["DGENRE"]
album.extd = freedb_metadata["EXTD"]

(0..offsets.length-1).each do |index|
track = Track.new
track.title = freedb_metadata["TTITLE#{index}"]
track.title.strip! if track.title
if track.title == nil or track.title.length == 0
puts “ERROR: problem with track title (probably bad encoding)”
return
end
track.number = index + 1
track.extt = freedb_metadata["EXTT#{index}"]
if (index == offsets.length-1)
track.duration = total_duration – offsets[index]/75
else
track.duration = (offsets[index+1] – offsets[index]) / 75
end
album.tracks << track
end

puts album.tracks

puts “Artist: #{artists}, Title: #{album.title}”

#It looks like when several artists are related to a cd they are separated by ‘ – ‘
artists.split(/ – /).each do |artist_name|
name = artist_name.strip

if name.length == 0
puts “ERROR: problem with artist name (probably bad encoding)”
return
end

artist = Artist.find_by_name(artist_name)
if artist == nil
artist = Artist.new
artist.name = artist_name.strip
end
album.artists << artist
end

puts album.title

if not album.save
puts “ERROR saving album!”
return
end

puts “Album saved”

end

puts “Reading #{ARGV[0]}”
read_folder(ARGV[0])


mysql -h <host> <db_name> < CreateTables.sql
Note: be careful, this script deletes the tables if they exist before recreating them.
CreateTables.sql

DROP TABLE if exists artists;

CREATE TABLE artists (
id int not null auto_increment,
name varchar(100) not null,
primary key (id)
) character set utf8;

DROP TABLE if exists albums;

CREATE TABLE albums (
id int not null auto_increment,
discid char(8) not null,
title varchar(100) not null,
year smallint unsigned,
genre varchar(40),
extd varchar(255) default null,
asin varchar(20) default null,
primary key (id)
) character set utf8;

DROP TABLE if exists albums_artists;

CREATE TABLE albums_artists (
album_id int not null,
artist_id int not null,
constraint fk_aa_album foreign key (album_id) references albums(id),
constraint fk_aa_artists foreign key (artist_id) references artists(id),
primary key (album_id, artist_id)
) character set utf8;

DROP TABLE if exists tracks;

CREATE TABLE tracks (
id int not null auto_increment,
album_id int not null,
title varchar(100) not null,
number tinyint not null,
duration smallint unsigned,
extt varchar(255) default null,
constraint fk_tracks_album foreign key (album_id) references albums(id),
primary key (id)
) character set utf8;


Leave a Reply