Wednesday, March 30, 2011

Scheduling tasks in Ruby / Rails

SSMTP Relay & Mail Delivery in Rails


SSMTP relay
Role separation and vertical scaling between the database and the app servers is almost taken for granted, but there is no reason that we can't do the same for mail delivery as well! Following up on some of the comments from my earlier post on sendmail, configuring and maintaining a secure, high-performance mail server is a non-trivial task. Hence, we should be able to do it once, and then tell Rails to use that server for all mail-related tasks. Thankfully, the process is easy.

Painless mail MTA: SSMTP

Assuming you already have Postfix/Exim/Sendmail configured on one of your machines (dedicated mail server), we're good to go. In fact, a great benefit of this approach is the fact that the SMTP server does not have to be local - it can be a service like GMail, or your own server located on a different network. To make things click, we'll install SSMTP on all of our app servers:
SSMTP - A secure, effective and simple way of getting mail off a system to your mail hub. It contains no suid-binaries or other dangerous things - no mail spool to poke around in, and no daemons running in the background. Mail is simply forwarded to the configured mailhost.
Use your favorite installer (apt, yum, etc.) to get ssmtp set up on your system. If you have sendmail installed, it will prompt you to remove it as part of the process - from now on, all applications relying on sendmail will interact with ssmtp.

Configuring GMail as external SMTP

Configuring ssmtp can't get any easier. Navigate to /etc/ssmtp and open up ssmtp.conf. Below is a fully featured, sample configuration file for relaying all of your mail to GMail servers:
# Config file for sSMTP sendmail
#
# The person who gets all mail for userids < 1000
# Make this empty to disable rewriting.
root=postmaster
 
# The place where the mail goes. The actual machine name is required no
# MX records are consulted. Commonly mailhosts are named mail.domain.com
 
# GMAIL configuration
mailhub=smtp.gmail.com:587
AuthUser=youremail@gmail.com
AuthPass=pass
UseSTARTTLS=YES
 
# The full hostname
hostname=machinehostname
 
# Are users allowed to set their own From: address?
# YES - Allow the user to specify their own From: address
# NO - Use the system generated From: address
FromLineOverride=YES
 
Next, open up your environment.rb, set ActionMailer to use sendmail, and we're all done:
ActionMailer::Base.delivery_method = :sendmail
 
That's it, from now on your Rails application will think it is using sendmail, not realizing that ssmtp is quietly doing all the work of interacting with Google's servers. In similar fashion, you can point your app servers to use a dedicated mail-server machine inside or outside of your network by changing your ssmtp.conf!
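To make the hand-off concrete, here is a minimal sketch of what a sendmail-style delivery boils down to: build a plain-text message and pipe it to the sendmail-compatible binary, which ssmtp now provides. This is not ActionMailer's actual code; the helper names (build_message, deliver_via_sendmail), the example addresses and the /usr/sbin/sendmail path are assumptions for illustration.

```ruby
# Build a minimal mail message: headers, a blank line, then the body.
def build_message(from, to, subject, body)
  headers = ["From: #{from}", "To: #{to}", "Subject: #{subject}"]
  headers.join("\n") + "\n\n" + body + "\n"
end

# Hand the message to a sendmail-compatible binary on stdin.
# -t reads the recipients from the message headers,
# -i stops a line containing a single dot from ending the input.
# The path is an assumption; adjust it for your system.
def deliver_via_sendmail(message, sendmail_path = "/usr/sbin/sendmail")
  IO.popen([sendmail_path, "-t", "-i"], "w") { |pipe| pipe.write(message) }
  $?.success?
end

message = build_message("app@example.com", "user@example.com",
                        "Welcome", "Thanks for signing up.")
puts message

# Uncomment on a machine where ssmtp is installed and configured:
# deliver_via_sendmail(message)
```

Because ssmtp takes over the sendmail binary, the same pipe works whether the mail ends up at GMail or at your own mail hub.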

Reconstructing Request URIs in Rails

Secure UTF-8 Input in Rails


Approximately 64.2 percent of online users do not speak English. Ok, so once we adjust these numbers to take into account second-language speakers, this number won't be as large - let's say cut it in half, and that's being generous! Then, we're still looking at 30 percent of the total online population who, in all likelihood, cannot use your website due to simple encoding problems. Surprising? Unacceptable! On a more positive note, if you're using Rails, then as of 1.2.2+ you are already serving UTF-8 content and have almost transparent unicode support. Having said that, almost transparent is both its biggest strength and its biggest weakness - it's great that it works, but it hides some of the complexity behind multi-byte operations which we need to be aware of, and it also introduces some new security holes. It's a leaky abstraction, and we need to address some of these leaks.

Making sense of Unicode/UTF-8 in Ruby

Unicode was developed to address the need for multilingual support of the modern world. Specifically, its aim was to enable processing of arbitrary languages mixed within each other - a rather ambitious goal once you realize the sheer number of possible characters (graphemes). Over the years, a number of new standards (UTF-7, UTF-8, CESU-8, UTF-16/UCS-2, etc.) have been developed to address this need, but UTF-8 emerged as the de facto standard. To fully appreciate some of the complexities of the task, and to better understand the leaks, I would strongly recommend that you take the time and read Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets.
The underlying problem with UTF-8 is its multi-byte encoding mechanism. Remember your one-line C/C++ character iterator loop? Kiss that goodbye; now we need additional logic. In Ruby-land, full Unicode support is still lacking (by default), but it is expected that the long-awaited Ruby 1.9/2.0 will become Unicode-safe. In the meantime, the Rails team decided to stop waiting for a miracle and introduced Multibyte support into its 1.2.2 release. There is some hand-twisting behind the scenes and I would recommend that you familiarize yourself with it: Multibyte for Rails.
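A small sketch of the byte/character mismatch, assuming Ruby 1.9 or later, where strings carry an encoding (on Ruby 1.8, String#length counts bytes instead):

```ruby
word = "h\u00E9llo"        # "héllo", written with an escape so the source stays ASCII

puts word.length           # => 5 characters
puts word.bytesize         # => 6 bytes: the accented letter takes two
p word.bytes.to_a[1, 2]    # => [195, 169], the two bytes that encode U+00E9
p word.unpack("U*")        # => [104, 233, 108, 108, 111], the code points
```

Any loop that walks the string one byte at a time would split the accented letter in half, which is exactly the leak to watch for.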

Validating UTF-8 Input

The first important distinction that you need to be aware of when switching to UTF-8 is that not every sequence of bytes is a valid UTF-8 string. This was never a problem before: in ISO-8859-1 everything was valid by default, but with UTF-8 we need to massage our input before we make this assumption. Culprits: old browsers, automated agents, malicious users.
To accomplish this, and to explore some of the reasoning behind the solution, check out Paul Battley's Fixing invalid UTF-8 in Ruby. He suggests the following filter:
require 'iconv'
 
ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
valid_string = ic.iconv(untrusted_string + ' ')[0..-2]
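Iconv was removed from the standard library in Ruby 2.0, so the filter above will not load there. As a rough equivalent, and assuming Ruby 2.1 or later, String#scrub can drop the invalid bytes. The helper name sanitize_utf8 and the sample input are made up for this sketch:

```ruby
# A sample input with one stray byte (0xFF can never appear in valid UTF-8).
untrusted_string = "caf\xC3\xA9 \xFF ok"

# Treat the bytes as UTF-8 and strip whatever does not decode.
def sanitize_utf8(input)
  str = input.dup.force_encoding("UTF-8")
  str.valid_encoding? ? str : str.scrub("")
end

valid_string = sanitize_utf8(untrusted_string)
puts valid_string.valid_encoding?   # => true
```

Passing a replacement character to scrub instead of an empty string keeps a visible marker where bytes were dropped, which can help when debugging bad clients.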
 

Filtering UTF-8 Input

Once the string is validated we have to make sure that its contents match our expectations. As previously mentioned, Unicode does not encode renderings (glyphs), but uses an abstract notion of a character (grapheme), which is in turn translated into a code point. Hence, given the enormous size of the input space, blindly accepting UTF-8 is usually a bad idea. Thankfully, Unicode has a notion of general categories which allows us to filter the input. To begin with, you will probably want to discard the entire Cx family (the "Other" categories, which include control characters), and then fine-tune other categories depending on your application:
Cc - Other, Control
Cf - Other, Format
Cs - Other, Surrogate
Co - Other, Private Use
Cn - Other, Not Assigned (no characters in the file have this property)
Once released, Ruby 1.9 will include the Oniguruma regular expression library, which is encoding-aware. However, we don't have to wait for Ruby 1.9 (frankly, we can't), and thankfully we can install it as a gem and get back to our problem:
require 'oniguruma'
 
# Find all Cx category graphemes
reg = Oniguruma::ORegexp.new("\\p{C}", {:encoding => Oniguruma::ENCODING_UTF8})
 
# Erase the Cx graphemes from our validated string
filtered_string = reg.gsub(valid_string, '')
 
It is also worth mentioning that Oniguruma has a number of other great features which are definitely worth exploring further.
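For comparison, here is a sketch of the same filter using the built-in regular expression engine, assuming Ruby 1.9 or later, where \p{C} is understood without the gem. The sample string is made up. Note that the C categories also cover newlines and tabs, so the second pattern shows one way to keep them:

```ruby
valid_string = "hello\u0000 wor\u200Bld\n"   # contains NUL (Cc), zero-width space (Cf), newline (Cc)

# Erase every character in the C ("Other") categories.
filtered_string = valid_string.gsub(/\p{C}/, "")
p filtered_string        # => "hello world"

# Same filter, but let newlines and tabs through.
kept_whitespace = valid_string.gsub(/(?![\n\t])\p{C}/, "")
p kept_whitespace        # => "hello world\n"
```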

Serving minty-fresh UTF-8

Once the data is validated and filtered, it has to be properly stored. MySQL usually defaults to latin1 encoding, and we have to explicitly tell it to use UTF-8:
# Method 1: in your mysql config (my.cnf)
default-character-set=utf8
 
# method 2: explicitly specify the character set
CREATE DATABASE foo CHARACTER SET utf8 COLLATE utf8_general_ci;
CREATE TABLE widgets (...) Type=MyISAM CHARACTER SET utf8;
 
# Finally, in your Rails DB config
production:
  user: username
  host: xxx.xxx.xxx.xxx
  encoding: utf8 # make rails UTF-8 aware
 
Once the data is stored, we also need to serve it with proper headers. Rails should automatically do this for us, but sometimes we might have to massage it by specifying our own custom headers:
@headers["Content-Type"] = "text/html; charset=utf-8" # general purpose
@headers["Content-Type"] = "text/xml; charset=utf-8" # xml content
 
Finally, unicode also has one gotcha when it comes to caching: make sure your front-end server (apache, lighttpd, etc.) is either configured to serve utf headers by default, or add the following line at the very top of your template (once the browser encounters it, it drops everything it has done and is forced to start over):
 

Additional resources

In conclusion, Unicode often appears to be a complicated beast, but once you figure out the basics, it won't seem all that bad. For further reading, I would also recommend perusing the following resources:

Client HTTP Caching in Rails

Google / Yahoo Sitemaps in Rails

Validating URL/URI in Ruby on Rails


Wouldn't it be nice if you could validate the format and the existence of a URI/URL provided by the users in your Rails application? Well, I thought so. So after a little research I put together a validates_uri_existence_of.rb which does exactly that. Take a look below:
require 'net/http'
 
# Original credits: http://blog.inquirylabs.com/2006/04/13/simple-uri-validation/
# HTTP Codes: http://www.ruby-doc.org/stdlib/libdoc/net/http/rdoc/classes/Net/HTTPResponse.html
 
class ActiveRecord::Base
  def self.validates_uri_existence_of(*attr_names)
    configuration = { :message => "is not valid or not responding", :on => :save, :with => nil }
    configuration.update(attr_names.pop) if attr_names.last.is_a?(Hash)

    raise(ArgumentError, "A regular expression must be supplied as the :with option of the configuration hash") unless configuration[:with].is_a?(Regexp)

    validates_each(attr_names, configuration) do |r, a, v|
      if v.to_s =~ configuration[:with] # check RegExp
        begin # check header response
          case Net::HTTP.get_response(URI.parse(v))
          when Net::HTTPSuccess then true
          else r.errors.add(a, configuration[:message]) and false
          end
        rescue # Recover on DNS failures..
          r.errors.add(a, configuration[:message]) and false
        end
      else
        r.errors.add(a, configuration[:message]) and false
      end
    end
  end
end
 
The code above is a barebones check which first validates the URL format against a provided regular expression (http://www.etc...) and, if it passes, requests the URL (note that Net::HTTP.get_response issues a GET rather than a HEAD request) and checks whether we get a success response (HTTPSuccess, i.e. a 2xx code such as 200) back. You could easily extend it to handle redirects, etc.
How do you use it in your application? Copy the code above into a validates_uri_existence_of.rb and drop it into the lib directory of your Rails application. Next, open up your environment.rb (under /config) and at the bottom of the file add:
require 'validates_uri_existence_of'
Almost there, now we can use our new custom validation in any model by calling:
validates_uri_existence_of :url, :with =>
/(^$)|(^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$)/ix
 
:url is the field you want to validate, and :with specifies the regular expression to check the URL against. The one in the example should work for almost all valid http / https links. I haven't found a problem with it yet. Theoretically, you could change the expression to validate only URLs within a specific domain or a set of domains.
So, now we can validate the format of the URL and check that the URL is alive and breathing (returns Code 200).
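As one possible extension, the sketch below swaps in a real HEAD request and follows a limited number of redirects. It is an illustration rather than part of the validator above: the helper names (parse_http_uri, uri_responding?), the five-second timeouts and the redirect limit of 3 are all assumptions.

```ruby
require 'net/http'
require 'uri'

# Return a parsed URI for http/https strings that have a host, otherwise nil.
def parse_http_uri(value)
  uri = URI.parse(value.to_s)
  return nil unless uri.is_a?(URI::HTTP) && uri.host && !uri.host.empty?
  uri
rescue URI::InvalidURIError
  nil
end

# Send a HEAD request and follow up to redirect_limit redirects.
# Any network failure (DNS, timeout, refused connection) counts as "not responding".
def uri_responding?(value, redirect_limit = 3)
  uri = parse_http_uri(value)
  return false if uri.nil? || redirect_limit < 0

  response = Net::HTTP.start(uri.host, uri.port,
                             :use_ssl => uri.scheme == 'https',
                             :open_timeout => 5, :read_timeout => 5) do |http|
    http.head(uri.request_uri)
  end

  case response
  when Net::HTTPSuccess then true
  when Net::HTTPRedirection
    location = response['location']
    location ? uri_responding?(URI.join(uri.to_s, location).to_s, redirect_limit - 1) : false
  else false
  end
rescue StandardError
  false
end

# Example call (needs network access):
# uri_responding?("http://www.igvita.com/")
```

Inside the validator, the case statement could then be replaced by a single call to uri_responding?(v). Keep in mind that some servers answer HEAD differently from GET, so a fallback to GET may still be needed.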

Counter for acts_as_taggable



http://www.igvita.com/2006/08/28/extending-acts_as_taggable/

Sunday, March 20, 2011

Ruby Video,Audio Processing


What

RVideo is a Ruby library that inspects and processes video and audio files by providing an interface to free Unix tools like ffmpeg.

Installing

Installation is a little involved. First, install the gem:
sudo gem install rvideo
Next, install ffmpeg and (possibly) other related libraries. This is documented elsewhere on the web, and can be a headache. If you are on a Mac, the Macports build is reasonably good (though not perfect). Install with:
sudo port install ffmpeg
Or, for a better build (recommended), add additional video- and audio-related libraries, like this:
sudo port install ffmpeg +lame +libogg +vorbis +faac +faad +xvid +x264 +a52
Most package management systems include a build of ffmpeg, but many include a poor build. So you may need to compile from scratch.
If you want to create Flash Video files, also install flvtool2:
sudo gem install flvtool2
Once ffmpeg and RVideo are installed, you’re set.

The basics

file = RVideo::Inspector.new(:file => "#{FILE_PATH}/filename.mp4")
file.video_codec # => mpeg4
file.audio_codec # => aac
file.resolution # => 320x240
command = "ffmpeg -i $input_file$ -vcodec xvid -s $resolution$ $output_file$"
options = {
  :input_file => "#{FILE_PATH}/filename.mp4",
  :output_file => "#{FILE_PATH}/processed_file.mp4",
  :resolution => "640x480"
}

transcoder = RVideo::Transcoder.new

transcoder.execute(command, options)

transcoder.processed.video_codec # => xvid

Demonstration of usage

To inspect a file, initialize an RVideo file inspector object. See the documentation for details.
A few examples:
file = RVideo::Inspector.new(:file => "#{APP_ROOT}/files/input.mp4")
file = RVideo::Inspector.new(:raw_response => @existing_response)
file = RVideo::Inspector.new(:file => "#{APP_ROOT}/files/input.mp4",
:ffmpeg_binary => "#{APP_ROOT}/bin/ffmpeg")
file.fps        # => "29.97" 
file.duration # => "00:05:23.4"
To transcode a video, initialize a Transcoder object.
transcoder = RVideo::Transcoder.new
Then pass a command and valid options to the execute method.
recipe = "ffmpeg -i $input_file$ -ar 22050 -ab 64 -f flv -r 29.97 -s" 
recipe += " $resolution$ -y $output_file$"
recipe += "\nflvtool2 -U $output_file$"
begin
  transcoder.execute(recipe, {:input_file => "/path/to/input.mp4",
    :output_file => "/path/to/output.flv", :resolution => "640x360"})
rescue TranscoderError => e
  puts "Unable to transcode file: #{e.class} - #{e.message}"
end
If the job succeeds, you can access the metadata of the input and output files with:
transcoder.original     # RVideo::Inspector object
transcoder.processed # RVideo::Inspector object
Even if the file is processed, it may still have problems. RVideo will populate an errors array if the duration of the processed video differs from the duration of the original video, or if the processed file is unreadable.
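To show how the $variable$ placeholders in a recipe relate to the options hash, here is a rough sketch of such a substitution. It is not RVideo's implementation; the fill_recipe helper is hypothetical and only illustrates the idea.

```ruby
# Replace every $name$ placeholder with the matching value from the options hash.
# Raises if the recipe mentions a placeholder that was not supplied.
def fill_recipe(recipe, options)
  recipe.gsub(/\$(\w+)\$/) do
    key = $1.to_sym
    raise ArgumentError, "missing value for placeholder #{key}" unless options.key?(key)
    options[key].to_s
  end
end

recipe = "ffmpeg -i $input_file$ -ar 22050 -f flv -s $resolution$ -y $output_file$"
options = {
  :input_file  => "/path/to/input.mp4",
  :output_file => "/path/to/output.flv",
  :resolution  => "640x360"
}

puts fill_recipe(recipe, options)
```

Failing loudly on a missing placeholder is a deliberate choice here: a half-filled ffmpeg command tends to produce confusing errors further down the line.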



Embedding SWF content with Ruby on Rails

Count internal and external links in a page


Today I read the Webmaster Guidelines. In the “Design and content guidelines” section, Google recommends keeping a reasonable number of links on the page:
Keep the links on a given page to a reasonable number.
Matt Cutts talks about 100 links/page.
Hmm, how many links do I have on my pages? I’ve made a simple Ruby script to count the links in a page (Nokogiri, Domainatrix and Open-URI made it trivial):
#!/usr/bin/env ruby
 
require "rubygems"
require "nokogiri"
require "open-uri"
require 'domainatrix'
 
url = ARGV.first
 
raise "*** Use: links_count <url>" unless url
 
domain = Domainatrix.parse(url)
 
int_links = 0
ext_links = 0
 
doc = Nokogiri::HTML(open(url).read)
doc.xpath("//a[@href]").each do |node|
  link = node.get_attribute('href')

  if link =~ %r{\Ahttp://}
    l = Domainatrix.parse(link)
    if l.public_suffix == domain.public_suffix and l.domain == domain.domain
      int_links += 1
    else
      ext_links += 1
      puts link
    end
  else
    int_links += 1
  end
end
 
puts ""
puts "*** internal links: #{int_links}"
puts "*** external links: #{ext_links}"
puts "*** total: #{int_links + ext_links}"
Now I can see how many internal/external links a page contains. Example for CNN.com:
[vitalie@silver ~]$ links_count.rb http://www.cnn.com
http://www.cnnmexico.com/
http://www.ireport.com/
http://www.time.com/time/world/article/0,8599,1997325,00.html
http://www.ireport.com/docs/DOC-459640?hpt=Mid
http://www.ireport.com/docs/DOC-459640?hpt=Mid
http://www.ireport.com/?hpt=Sbin
http://twitter.com/worldcupcnn
http://foursquare.com/cnn
http://movie-critics.ew.com/2010/06/16/psycho-turns-50-today/
http://www.cnngo.com/hong-kong/play/beyond-star-ferry-457619
http://www.cnngo.com/bangkok/play/spirited-competition-koh-samui-regatta-706618
http://www.ireport.com/?cnn=yes
http://www.turnerstoreonline.com/
http://www.cnntraveller.com
http://www.cnnchile.com
http://www.cnnmexico.com
http://cnn.joins.com/
http://www.cnn.co.jp/
http://www.cnnturk.com/
http://www.turner.com/
http://www.cnnmediainfo.com/
http://www.turner.com/careers/
 
*** internal links: 263
*** external links: 22
*** total: 285
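The script above treats every link that does not start with http:// as internal, so https:// links to other sites are counted as internal too. Below is a sketch of a stricter classification using only the standard library. It is not a drop-in replacement: the classify_link helper is hypothetical, and it compares full host names, so a subdomain such as edition.cnn.com counts as external here, whereas Domainatrix compares the registered domain.

```ruby
require 'uri'

# Classify an href relative to the page it was found on.
# Returns :internal, :external, or :other (mailto:, javascript:, unparsable links).
def classify_link(href, page_url)
  target = URI.join(page_url, href)
  return :other unless target.is_a?(URI::HTTP)   # URI::HTTPS is a subclass of URI::HTTP
  target.host == URI.parse(page_url).host ? :internal : :external
rescue URI::InvalidURIError
  :other
end

page = "http://www.cnn.com/"
p classify_link("/world", page)                   # => :internal
p classify_link("https://twitter.com/cnn", page)  # => :external
p classify_link("mailto:a@example.com", page)     # => :other
```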