Negative Look-behinds in Ruby 1.8 (and JavaScript)
09 Jun 2010
Regular expressions and Ruby
This should help anyone who has tried to use a negative look-behind assertion in Ruby 1.8, whose regular expression engine does not support look-behinds.
There are two possible solutions. Firstly, you can use the Ruby 1.9 regular expression engine (Oniguruma) and carry on using regular expressions in all their glory. However, as Oniguruma is a C extension to the Ruby programming language, I would err on the side of caution before attempting to install it.
There is another, slightly convoluted, yet reasonable enough solution that you could apply in both Ruby and JavaScript!
Take, for example, the following scenario: we’ve been slowly transitioning our content management system into a series of web services for external PHP Typekit websites.
Unfortunately, one thing we would have liked to include is a calendar just like the one at the bottom here, which used to contain a load of relative links matching:
/\/assets\/a\/[0-9a-f]{2}\/[0-9a-f]{2}/
That is not so good if you are trying to pull HTML directly from our main Rails app as a web service, because the links then need a fully qualified domain name. In most languages, you could find the offending relative links by adding a negative look-behind to that regex, something like:
/(?<!http\:\/\/images\.scholastic\.co\.uk)\/assets\/a\/[0-9a-f]{2}\/[0-9a-f]{2}/
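On Ruby 1.9 or later, where look-behind is available, a pattern like this can be passed straight to `String#scan`. A minimal sketch; the method name `unqualified_asset_paths` and the sample markup are made up for illustration and are not part of our app:

```ruby
# Returns every asset path in +html+ that is NOT preceded by the image host.
# Needs a regex engine with look-behind support (Ruby 1.9 or later).
def unqualified_asset_paths(html)
  html.scan(/(?<!http:\/\/images\.scholastic\.co\.uk)\/assets\/a\/[0-9a-f]{2}\/[0-9a-f]{2}/)
end

# Hypothetical sample content: one fully qualified link and one relative link.
sample = '<img src="http://images.scholastic.co.uk/assets/a/0f/3c/x.png">' \
         '<img src="/assets/a/aa/bb/y.png">'

puts unqualified_asset_paths(sample).inspect
# prints ["/assets/a/aa/bb"]
```

Only the relative link is reported, because the look-behind rejects the match that directly follows the host name.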
What we ended up doing was splitting our URL check into two: one check against the main regex, and a substring match afterwards:
def validate_asset_urls
  raise "Use negative lookbehind instead of substring match" if RUBY_VERSION > "1.9"
  main_regex = /\/assets\/a\/[0-9a-f]{2}\/[0-9a-f]{2}/
  domain_name = "http://images.scholastic.co.uk"
  self.content_str.scan(main_regex).inject(0) do |index, result|
    # create substring of the content_str which goes from the last point in
    # the inject to the end of the string
    substring = self.content_str[index..-1]
    # increment substring start index by the index of the first result match
    # in order to aid us in working out whether a FQDN is used
    index += substring.index(result)
    # the domain name must sit directly before the match; a match too close
    # to the start of the string cannot have it (and a negative range start
    # would wrap around to the end of the string)
    prefix_start = index - domain_name.size
    if prefix_start < 0 || self.content_str[prefix_start...index] != domain_name
      errors.add(:content_str, "Links to assets must be fully qualified: #{result}")
      break
    end
    # increment substring index by the size of the result if no error
    index += result.size
  end
end
So, the scan gives us all matches of the main pattern, and then a substring comparison, using a running pointer into the string, checks whether each match is valid. The substring comparison could be replaced with a second regex if the look-behind condition is more complex.
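The same idea can be pulled out of the validation into a standalone helper. This is only a sketch under a few assumptions: the name `matches_without_prefix` is hypothetical, the prefix may be given as either a String or a Regexp, and it uses `String#index` with an offset rather than `scan` plus `inject` to keep the running pointer:

```ruby
# Returns the matches of +pattern+ in +text+ that are NOT immediately
# preceded by +prefix+ (a String or a Regexp). No look-behind syntax is
# used, so this should also run on Ruby 1.8.
def matches_without_prefix(text, pattern, prefix)
  prefix = Regexp.escape(prefix) if prefix.is_a?(String)
  # Anchor the prefix to the end of whatever text comes before a match.
  prefix_at_end = /#{prefix}\z/
  found = []
  position = 0
  while (start = text.index(pattern, position))
    match = Regexp.last_match(0)
    # Keep the match only if the text before it does not end with the prefix.
    found << match unless text[0...start] =~ prefix_at_end
    # Resume after this match, always moving at least one character forward.
    position = start + [match.size, 1].max
  end
  found
end

asset_path = /\/assets\/a\/[0-9a-f]{2}\/[0-9a-f]{2}/
# Hypothetical sample content: one fully qualified link and one relative link.
content = 'http://images.scholastic.co.uk/assets/a/0f/3c and /assets/a/aa/bb'

puts matches_without_prefix(content, asset_path, "http://images.scholastic.co.uk").inspect
# prints ["/assets/a/aa/bb"]
```

Passing a Regexp such as `/https?:\/\/images\.scholastic\.co\.uk/` as the prefix covers the more complex case, including prefixes of varying length.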