How to process multiple zip-files with Ruby on Rails
Posted by Lars | Posted in Rails, Tutorials | Posted on 22-03-2009
Sometimes you have to integrate data exported from third-party desktop software. One way to do this is to pack all the data into a zip file and push it via FTP onto your server. This tutorial explains how to process zip files with your Ruby on Rails™ application.
I will take an example from real estate management software. Packing the data and pushing it to online real estate listing services is common for this kind of software.
The zip-file contains:
- An XML file with property data
- Some picture files
The zip files are transferred by the real estate management software into a specific folder. There can be multiple zip files, which have to be processed in chronological order.
The tutorial will cover the following points:
- Processing multiple zip-files
- Parsing the included XML file to get the data
- Processing binary data like pictures or documents
- Error handling and cleaning up
Alright people, let’s start.
Processing multiple zip-files
All zip files will be uploaded into the following directory. You have to create this directory with read/write access for your Rails application. Furthermore, you have to configure an FTP user with write-only access (a "postbox" setup), so you can use this FTP access for different clients.
```
RAILS_ROOT/external_uploads
```
The file

```
RAILS_ROOT/external_uploads/zipdata.zip
```

has the following content:

```
data.xml
picture1.jpg
picture2.jpg
picture3.jpg
picture4.jpg
picture5.jpg
```
The complete import is executed in the model layer, so we need an Import class. Some Ruby packages are required, too.
```ruby
class Import < ActiveRecord::Base
  require 'fileutils'
  require 'zip/zip'
  require 'zip/zipfilesystem'
  require 'RMagick'
  require 'find'

  attr_accessor :current_source_file, :current_hash, :current_xml_file,
                :file_handler, :xml_document, :tmp_property, :expose,
                :tmp_attachment, :company, :current_is_topobjekt
  ....
end
```
Before we start, we need some supporting class methods. The first one returns a random string, which we need to name the temporary directory the zip file is extracted into:
```ruby
# returns a random string
def self.make_random_string(len = 10)
  chars = ("a".."z").to_a + ("A".."Z").to_a + ("0".."9").to_a
  random_string = ""
  # rand(chars.size) covers every index, including the last one
  len.times { random_string << chars[rand(chars.size)] }
  random_string
end
```
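On Ruby 2.5 or newer (an assumption about your environment; the code in this tutorial predates it), the standard library can produce the same kind of string. A minimal sketch, where `random_dir_name` is a hypothetical helper name and not part of the Import class:

```ruby
require 'securerandom'

# Hypothetical helper: returns a random string of letters and digits.
# SecureRandom.alphanumeric is available from Ruby 2.5 on.
def random_dir_name(len = 10)
  SecureRandom.alphanumeric(len)
end

puts random_dir_name
```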
The next function returns a list of all zip files in your directory, sorted by date. This is important because the zip files depend on each other.
```ruby
# returns the sorted file list from oldest to latest timestamp
def self.sorted_filelist
  unsorted_files = Dir.glob(RAILS_ROOT + '/external_uploads/*.zip')
  unsorted_files.sort_by { |file| File.mtime(file) }
end
```
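To try the sorting outside of Rails, here is a small self-contained sketch. The helper name `zip_files_oldest_first` and the file names are made up for the demo; the timestamps are set by hand in a throwaway directory.

```ruby
require 'fileutils'
require 'tmpdir'

# Returns all .zip files in dir, oldest modification time first.
def zip_files_oldest_first(dir)
  Dir.glob(File.join(dir, '*.zip')).sort_by { |path| File.mtime(path) }
end

# Demo in a temporary directory with hand-set timestamps.
Dir.mktmpdir do |dir|
  now = Time.now
  { 'b.zip' => now - 60, 'a.zip' => now, 'c.zip' => now - 120 }.each do |name, mtime|
    path = File.join(dir, name)
    FileUtils.touch(path)
    File.utime(mtime, mtime, path) # set access and modification time
  end
  # prints ["c.zip", "b.zip", "a.zip"]
  puts zip_files_oldest_first(dir).map { |path| File.basename(path) }.inspect
end
```

Note that the order follows the modification time on the server, so it reflects when each upload was written, not when the export was created on the client.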
This code loops through our external_uploads directory and starts the process for each zip file:
```ruby
def self.go
  self.sorted_filelist.each do |single_zip_file|
    current_import = Import.new(
      :current_source_file => single_zip_file,
      :current_hash => make_random_string)
    current_import.start
  end
end
```
The instance method start calls each sub-process once per zip file.
```ruby
# processing the zipfile
def start
  make_tmp_path
  unzip
  open_xml_file
  parse_xml_file
  close_xml_file
  clean_up
end
```
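If one of these steps raises an error, clean_up is never reached and the temporary directory stays behind. One possible variation, not the code used in this tutorial, is to put clean_up into an ensure clause. The class below is a hypothetical stand-in with stubbed steps, so it runs without Rails:

```ruby
# Hypothetical stand-in for Import; each step only records that it ran.
class ImportSketch
  attr_reader :log

  def initialize(fail_on = nil)
    @fail_on = fail_on # name of the step that should fail, if any
    @log = []
  end

  def start
    %w[make_tmp_path unzip open_xml_file parse_xml_file close_xml_file].each do |step|
      run(step)
    end
  ensure
    run('clean_up') # runs whether or not a step above raised
  end

  private

  def run(step)
    raise "#{step} failed" if step == @fail_on
    @log << step
  end
end

import = ImportSketch.new('parse_xml_file')
begin
  import.start
rescue RuntimeError => e
  puts "import aborted: #{e.message}"
end
puts import.log.inspect
```

The error still reaches the caller, which can then decide whether to continue with the next zip file or stop the whole run.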
The function make_tmp_path creates a temporary directory. Each zip file must have its own temporary directory, and current_hash provides the directory name.
```ruby
def tmp_path
  # absolute path, so it does not depend on the current working directory
  File.join(RAILS_ROOT, 'external_uploads', 'tmp', current_hash)
end

def make_tmp_path
  FileUtils.mkdir_p tmp_path
end
```
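A random name makes a collision between two imports unlikely, but it does not rule one out, and mkdir_p does not complain if the directory already exists. A sketch of one way to be sure, using a hypothetical helper `make_unique_tmp_path` that takes the base directory as an argument: `Dir.mkdir` raises `Errno::EEXIST` when the name is taken, so the helper retries with a new name.

```ruby
require 'fileutils'
require 'securerandom'
require 'tmpdir'

# Hypothetical helper: creates a fresh subdirectory of base and returns its path.
def make_unique_tmp_path(base)
  FileUtils.mkdir_p(base)
  begin
    path = File.join(base, SecureRandom.hex(5))
    Dir.mkdir(path) # raises Errno::EEXIST if the name is already taken
    path
  rescue Errno::EEXIST
    retry # try again with a new random name
  end
end

# Demo in a throwaway directory.
Dir.mktmpdir do |root|
  first = make_unique_tmp_path(File.join(root, 'tmp'))
  second = make_unique_tmp_path(File.join(root, 'tmp'))
  puts File.directory?(first) && File.directory?(second) && first != second
end
```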
The unzip function extracts the currently selected zip file into the temporary directory. If anything goes wrong, the rescue block calls remove_tmp_path.
```ruby
def add_path(filename)
  File.join(tmp_path, filename)
end

def unzip
  # the block form closes the archive when the block ends
  Zip::ZipFile.open(current_source_file) do |zip_file|
    zip_file.each do |single_file|
      # remember the path of the XML file (name ends in .xml)
      if single_file.name.downcase =~ /\.xml\z/
        self.current_xml_file = add_path(single_file.name)
      end
      single_file.extract(add_path(single_file.name))
    end
  end
rescue
  # any error while extracting: remove the temporary directory
  remove_tmp_path
end
```
I use a regular expression to find the included XML file. Its name, including the path, is saved in the current_xml_file attribute.
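The extension test is easy to try in isolation. The sketch below uses a hypothetical helper `xml_entry?` built on File.extname instead of a regular expression, so a name that merely contains the letters "xml" is not picked up. The file names are made up for the demo.

```ruby
# Hypothetical helper: true if the entry name ends in .xml (any letter case).
def xml_entry?(name)
  File.extname(name).downcase == '.xml'
end

names = ['data.xml', 'DATA.XML', 'picture1.jpg', 'myxml_notes.txt']
names.each { |name| puts "#{name}: #{xml_entry?(name)}" }
```

Only the first two names count as XML files here. A loose pattern such as `/.xml/` would also match `myxml_notes.txt`, because the dot matches any character.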
Now we have the files

```
data.xml
picture1.jpg
picture2.jpg
picture3.jpg
picture4.jpg
picture5.jpg
```

unzipped into the directory

```
RAILS_ROOT/external_uploads/tmp/adFvdSDed/
```

The current_xml_file attribute contains

```
RAILS_ROOT/external_uploads/tmp/adFvdSDed/data.xml
```
In the next part of this tutorial we will process the XML file with REXML and do some image processing on the picture files with RMagick.
UPDATE: I canceled the second part of this tutorial because my latest tutorial will cover the topic of XML parsing.










