How to process multiple zip-files with Ruby on Rails

Posted by Lars | Posted in Rails, Tutorials | Posted on 22-03-2009


Sometimes you have to integrate exported data from third-party desktop software. One way to do this is to pack all the data into a zip file and push it via FTP onto your server. This tutorial explains how to process such zip files with your Ruby on Rails™ application.

I will take an example from real estate management software. Packing the data and pushing it to online real estate listing services is a common approach for this kind of software.

The zip-file contains:

  • An XML file with property data
  • Some picture files

The zip files are transferred by the real estate management software into a specific folder. There can be multiple zip files, which have to be processed in the correct chronological order.

The tutorial will cover the following points:

  • Processing multiple zip-files
  • Parsing the included XML file to get the data
  • Processing binary data like pictures or documents
  • Error handling and cleaning up

Alright people, let’s start.

Processing multiple zip-files

All zip files will be uploaded into the following directory. You have to create this directory with read/write access for your Rails application. Furthermore, you have to configure an FTP user with write-only access (a “postbox” setup), so you can use the same FTP access for different clients.

RAILS_ROOT/external_uploads

The file

RAILS_ROOT/external_uploads/zipdata.zip

has the following content:

data.xml
picture1.jpg
picture2.jpg
picture3.jpg
picture4.jpg
picture5.jpg

The complete import runs in the model layer, so we need an Import class. Some Ruby libraries are required, too.

class Import < ActiveRecord::Base

  require 'fileutils'
  require 'zip/zip'
  require 'zip/zipfilesystem'
  require 'RMagick'
  require 'find'

  attr_accessor :current_source_file, :current_hash, :current_xml_file, :file_handler, :xml_document, :tmp_property, :expose, :tmp_attachment, :company, :current_is_topobjekt
  ....
end

Before we start we need some supporting class methods. This method returns a random string. We need it to name the temporary directory into which the zip file is extracted:

# returns a random string of len characters (a-z, A-Z, 0-9)
def self.make_random_string(len = 10)
  chars = ("a".."z").to_a + ("A".."Z").to_a + ("0".."9").to_a
  random_string = ""
  # rand(n) returns 0..n-1, so rand(chars.size) can reach every character
  len.times { random_string << chars[rand(chars.size)] }
  random_string
end
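On current Ruby versions the standard library can generate such a string directly. The following is only a minimal sketch, assuming Ruby 2.5 or newer; random_dir_name is a hypothetical helper name and not part of the Import class above.

```ruby
require 'securerandom'

# Hypothetical helper (not from the original Import class):
# returns a random string of a-z, A-Z and 0-9 with the given length.
def random_dir_name(len = 10)
  SecureRandom.alphanumeric(len)
end

puts random_dir_name
```

The result can be passed as :current_hash in the same way as the return value of make_random_string.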

The next method returns a list of all zip files in your directory, sorted by date. This is important because the zip files depend on each other.

# returns the zip files sorted from oldest to newest modification time
def self.sorted_filelist
  unsorted_files = Dir.glob(File.join(RAILS_ROOT, 'external_uploads', '*.zip'))
  unsorted_files.sort_by { |file| File.mtime(file) }
end
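To see the ordering in isolation, here is a small stand-alone sketch that runs without Rails. It assumes a throwaway directory created with Dir.mktmpdir; zip_files_oldest_first is a hypothetical name, and the files are empty placeholders whose timestamps are backdated by hand.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-alone variant of sorted_filelist:
# takes the directory as an argument instead of using RAILS_ROOT.
def zip_files_oldest_first(dir)
  Dir.glob(File.join(dir, '*.zip')).sort_by { |path| File.mtime(path) }
end

Dir.mktmpdir do |dir|
  now = Time.now
  # Create three empty files and backdate them so the order is unambiguous.
  { 'b.zip' => now - 60, 'a.zip' => now, 'c.zip' => now - 120 }.each do |name, mtime|
    path = File.join(dir, name)
    FileUtils.touch(path)
    File.utime(mtime, mtime, path)
  end

  # prints ["c.zip", "b.zip", "a.zip"]
  puts zip_files_oldest_first(dir).map { |path| File.basename(path) }.inspect
end
```

Note that the order follows the modification time on the server, not the file name.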

This code loops through our external_uploads directory and starts the process for each zip file:

def self.go
  self.sorted_filelist.each do |single_zip_file|
    current_import = Import.new(
      :current_source_file => single_zip_file,
      :current_hash => make_random_string)
    current_import.start
  end
end

The instance method start calls each sub-process once per zip file.

# processing the zipfile
def start
  make_tmp_path
  unzip
  open_xml_file
  parse_xml_file
  close_xml_file
  clean_up
end
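In start, clean_up is the last step, so an exception in an earlier step would skip it. The body of clean_up is not shown in this tutorial. One possible way to make sure the temporary directory is always removed is an ensure block. The following is only a sketch under assumptions: TmpWorkspace and its use method are hypothetical names, and the real import steps are replaced by a block that simply raises.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper, not the author's implementation.
class TmpWorkspace
  attr_reader :path

  def initialize(base_dir, name)
    @path = File.join(base_dir, 'tmp', name)
  end

  # Creates the directory, yields it to the block
  # and removes it afterwards, even if the block raises.
  def use
    FileUtils.mkdir_p(path)
    yield path
  ensure
    FileUtils.rm_rf(path)
  end
end

Dir.mktmpdir do |base|
  workspace = TmpWorkspace.new(base, 'example')
  begin
    workspace.use { |_dir| raise 'simulated unzip failure' }
  rescue RuntimeError => e
    puts "import failed: #{e.message}"  # prints "import failed: simulated unzip failure"
  end
  puts File.exist?(workspace.path)      # prints "false"
end
```

The exception still reaches the caller, so a failed zip file can be logged or skipped there.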

The method make_tmp_path creates a temporary directory. Each zip file must have its own temporary directory, and the random current_hash makes a name collision very unlikely.

# temporary directory below RAILS_ROOT/external_uploads/tmp
def tmp_path
  File.join(RAILS_ROOT, 'external_uploads', 'tmp', current_hash)
end

def make_tmp_path
  FileUtils.mkdir_p tmp_path
end

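The unzip method extracts the currently selected zip file into the temporary directory. If an error occurs, the rescue block removes the temporary directory again.

# builds the path of an extracted file inside the temporary directory
def add_path(filename)
  File.join(tmp_path, filename)
end

def unzip
  # the block form closes the archive when the block ends
  Zip::ZipFile.open(current_source_file) do |zip_file|
    zip_file.each do |single_file|
      # remember the XML file: escaped dot, anchored to the end of the name
      if single_file.name.downcase =~ /\.xml\z/
        self.current_xml_file = add_path(single_file.name)
      end
      single_file.extract(add_path(single_file.name))
    end
  end
rescue
  # remove_tmp_path is not shown in this tutorial
  remove_tmp_path
end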

I use a regular expression to find the included XML file. Its full path is stored in the current_xml_file attribute.
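The pattern deserves some care. An unescaped, unanchored pattern such as /.xml/ matches any character followed by "xml" anywhere in the name, so it would also accept files that are not XML files. The following sketch compares it with a check on the file extension; xml_entry? is a hypothetical helper name and the file names are made up for illustration.

```ruby
# Hypothetical helper: true if the entry name ends in ".xml" (case-insensitive).
def xml_entry?(name)
  File.extname(name).downcase == '.xml'
end

['data.xml', 'DATA.XML', 'myxml_notes.txt', 'data.xml.bak'].each do |name|
  loose = !(name.downcase =~ /.xml/).nil?
  puts "#{name}: loose=#{loose} strict=#{xml_entry?(name)}"
end
```

The loose pattern reports true for all four names, while the extension check accepts only the first two.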

Now we have the files

data.xml
picture1.jpg
picture2.jpg
picture3.jpg
picture4.jpg
picture5.jpg

unzipped into a directory such as

RAILS_ROOT/external_uploads/tmp/adFvdSDed/

The current_xml_file attribute contains

RAILS_ROOT/external_uploads/tmp/adFvdSDed/data.xml
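These paths are built by joining the temporary directory with the name stored inside the archive. Because the archives come from third parties, a name like ../../config/database.yml could point outside the temporary directory. The following is a minimal sketch of a guard, assuming Unix-style paths; safe_add_path is a hypothetical variant of add_path and the base directory is an example value.

```ruby
# Hypothetical variant of add_path that refuses names pointing outside base_dir.
def safe_add_path(base_dir, filename)
  base = File.expand_path(base_dir)
  # expand_path resolves ".." segments and absolute names
  target = File.expand_path(filename, base)
  unless target.start_with?(base + File::SEPARATOR)
    raise ArgumentError, "entry escapes target directory: #{filename}"
  end
  target
end

base = '/srv/app/external_uploads/tmp/adFvdSDed'
puts safe_add_path(base, 'data.xml')

begin
  safe_add_path(base, '../../config/database.yml')
rescue ArgumentError => e
  puts e.message
end
```

Whether the zip library already rejects such names depends on its version, so the check is meant as an extra safety net.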

In the next part of this tutorial we will process the XML file with REXML and do some image processing on the picture files with RMagick.

UPDATE: I canceled the second part of this tutorial because my latest tutorial will cover the XML parsing theme.
