Regular Expressions in Ruby
Ruby's Regexp class implements regular expressions (regex or regexp): textual patterns that are essential to string matching, extraction, and validation. Ruby’s regex syntax is very similar to that of Perl 5. Advanced usage draws on specialized features such as grouping, lookarounds, capture variables, modifiers, and a few string methods.
Regexp Definition and Initialization
A Regexp object can be created in several equivalent ways: using forward slashes (/pattern/), using %r{pattern}, or using factory methods:
| Method/Syntax | Description |
| /pattern/ | The most common literal form. |
| %r{pattern} | Useful when the pattern contains forward slashes, avoiding the need for escaping. |
| Regexp.new("pattern") | Programmatic creation. |
| Regexp.compile("pattern") | Synonym for Regexp.new. |
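As a minimal sketch of the four forms in the table, the snippet below builds the same pattern each way and checks it against one sample string. The pattern, the sample text, and the variable names are illustrative assumptions, and match? requires Ruby 2.4 or later.

```ruby
# Four ways to build a pattern that matches "digits/digits"
patterns = [
  /\d+\/\d+/,                 # literal form: the inner slash must be escaped
  %r{\d+/\d+},                # %r{} form: no escaping needed for the slash
  Regexp.new('\d+/\d+'),      # built from a (single-quoted) string
  Regexp.compile('\d+/\d+')   # synonym for Regexp.new
]

# Each pattern is tested against the same input
results = patterns.map { |re| re.match?("ratio 3/4") }
p results
```

Single quotes are used for the string arguments so that \d reaches the regex engine unchanged; in a double-quoted string the backslash would need to be doubled.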
Matching and Advanced Comparison
The =~ Operator and Match Data
- In Ruby, =~ is the fundamental pattern-matching operator. One operand must be a regular expression and the other a string.
- If a match is found, =~ returns the integer index where the first match starts. Because every non-nil integer, including 0, is “truthy” in Ruby, the result can be used directly in conditional statements.
- It returns nil if no match is found.
Following a successful match, a MatchData object with all of the match’s details is stored in the global variable $~.
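The behaviour described above can be sketched as follows. The sample strings and variable names are assumptions chosen for illustration.

```ruby
line = "say hello"

index = (line =~ /hel+o/)      # index where the match starts
matched = $~[0]                # read $~ straight away; the next match overwrites it

at_start = ("hello" =~ /hel+o/) # a match at index 0 is still truthy
missing = (line =~ /bye/)       # nil when nothing matches ($~ becomes nil too)

puts "found at #{index}: #{matched}"
puts "match at start counts as true" if at_start
puts "no match" unless missing
```

Because a failed match resets $~ to nil, any value needed from it should be copied into a local variable before another match is attempted.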
MatchData and Global Variables
All pattern match results are contained in the MatchData object. The most recent successful match sets a number of special global variables, which are thread-local and method-local. The English aliases are available after require 'English':
| Global Variable | English Alias | MatchData Equivalent | Description |
| $~ | $LAST_MATCH_INFO | Regexp.last_match | The MatchData object itself. |
| $& | $MATCH | $~[0] | The entire matched text. |
| $` | $PREMATCH | $~.pre_match | The string preceding the match. |
| $' | $POSTMATCH | $~.post_match | The string following the match. |
| $1…$9 | None | $~[1]…$~[9] | The content of the Nth grouped subexpression. |
| $+ | $LAST_PAREN_MATCH | $~[-1] | The string matched by the highest-numbered group. |
Code Example: Accessing Match Data (Global Variables)
text = "hello world"
text =~ /ell(\w)\s(\w+)/
puts $& # "ello world" (entire match)
puts $` # "h" (before match)
p $' # "" (after match; p makes the empty string visible)
puts $1 # "o" (first capture)
puts $2 # "world" (second capture)
Output
ello world
h
""
o
world
Case Equality (===)
For Regexp objects, the === operator tests whether a given string matches the pattern. This matters because case statements use === implicitly for their comparisons.
Code Example: Regexp in a case statement
string = "123"
case string
when /^[a-zA-Z]+$/
  "Letters"
when /^[0-9]+$/
  "Numbers" # => "Numbers"
else
  "Mixed"
end
case "Ruby is #1!"
when /\APython/
  puts "Boooo."
when /\ARuby/ # This branch is executed because /\ARuby/ === "Ruby is #1!"
  puts "You are right." # => "You are right."
end
Output
You are right.
match? (Ruby 2.4+)
Available in Ruby 2.4 and later, the Regexp#match? method returns a strict Boolean (true or false) and, importantly, does not set the global $~ variable or any of the related match variables.
Code Example: Using match?
puts /R.../.match?("Ruby") # => true
puts /R.../.match?("Ruby", 1) # => false (starts search at position 1)
Output
true
false
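The point about $~ can be checked with a short sketch like the one below, which compares match? with match. The strings and variable names are illustrative assumptions.

```ruby
"first" =~ /f\w+/
before = $~[0]                     # match data from the =~ call

found = /sec/.match?("second")     # Boolean result, $~ is left alone
after = $~[0]                      # still the data from the earlier match

/sec/.match("second")              # plain match does update $~
updated = $~[0]

p [before, found, after, updated]
```

This makes match? a reasonable default when only a yes/no answer is needed and earlier match data must be preserved.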
Advanced Syntax Features
Named Capture Groups
Ruby extends standard grouping syntax (...) with named groups using (?<name>...). This allows developers to extract captured text using the group’s name (as a string or symbol) instead of counting group indexes. Named capture groups are available as keys in the resulting MatchData object.
Code Example: Named Groups
name_reg = /h(i|ello), my name is (?<name>.*)/i
name_input = "Hi, my name is Zaphod Beeblebrox"
match_data = name_reg.match(name_input)
if match_data
  # When a pattern contains named groups, plain parentheses such as (i|ello)
  # do not capture, so group 1 is the named group.
  puts match_data[1] # => "Zaphod Beeblebrox"
  puts match_data[:name] # => "Zaphod Beeblebrox" (Access by symbol name)
  puts match_data["name"] # => "Zaphod Beeblebrox" (Access by string name)
end
Output
Zaphod Beeblebrox
Zaphod Beeblebrox
Zaphod Beeblebrox
The captured text is automatically assigned to local variables that match the group names in Ruby 1.9+ when a regex with named captures appears literally on the left side of =~.
Code Example: Automatic Local Variable Assignment (Ruby 1.9+)
if /(?<lang>\w+) (?<ver>\d+\.(\d+)+) (?<review>\w+)/ =~ "Ruby 1.9 rules!"
  puts lang # => "Ruby"
  puts ver # => "1.9"
  puts review # => "rules"
end
Output
Ruby
1.9
rules
Quantifiers and Greediness
Quantifiers (such as ? for zero or one, * for zero or more, + for one or more, and {n,m} for ranges) specify how many times the preceding element may repeat.
Quantifiers are greedy by default, matching as many characters as they can while still permitting the expression to match as a whole. Use a question mark (?) to make a quantifier lazy (non-greedy).
Code Example: Greediness vs. Laziness
text = "<first> <second>"
puts text[/<(.*)>/] # => "<first> <second>" (Greedy: matches up to the *last* >)
puts text[/<(.*?)>/] # => "<first>" (Lazy: matches up to the *first* >)
Output
<first> <second>
<first>
Anchors and Boundaries
Anchors define a match position without consuming any characters:
- ^ and $ match the beginning and end of a line, respectively.
- \A and \z match the absolute start and end of the string.
- \b matches a word boundary (the transition between a word character and a non-word character).
- \B matches a non-word boundary.
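The difference between line anchors and string anchors is easiest to see on multi-line input. The following is a small sketch using String#scan; the sample strings and variable names are assumptions for illustration.

```ruby
text = "first line\nsecond line"

line_starts  = text.scan(/^\w+/)    # ^ matches at the start of every line
string_start = text.scan(/\A\w+/)   # \A matches only at the start of the string
line_ends    = text.scan(/\w+$/)    # $ matches at the end of every line
string_end   = text.scan(/\w+\z/)   # \z matches only at the end of the string

words = "cat concat cats"
whole_word = words.scan(/\bcat\b/)  # "cat" as a complete word only
inside     = words.scan(/\Bcat/)    # "cat" that does not start a word (in "concat")

p line_starts, string_start, line_ends, string_end, whole_word, inside
```

For validating an entire input string, \A and \z are usually the safer choice, since ^ and $ are satisfied by any single line within the input.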
Lookahead and Lookbehind (Assertions)
With lookarounds, a pattern can only be matched if it comes before or after another pattern; the context pattern is not included in the final match ($&).
| Assertion | Syntax | Matches |
| Positive Lookahead | (?=re) | Matches only if the pattern re follows. |
| Negative Lookahead | (?!re) | Matches only if the pattern re does not follow. |
| Positive Lookbehind | (?<=re) | Matches only if the pattern re precedes (Ruby 1.9+). |
| Negative Lookbehind | (?<!re) | Matches only if the pattern re does not precede (Ruby 1.9+). |
Code Example: Lookahead Assertion
p "Ruby!".match(/Ruby(?=!)/) # => #<MatchData "Ruby">
p "Ruby?".match(/Ruby(?=!)/) # => nil
Output
#<MatchData "Ruby">
nil
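The other three assertions from the table can be sketched in the same way. The input strings and variable names below are illustrative assumptions.

```ruby
prices = "cost: $42, code: 17"

# Positive lookbehind: digits that directly follow a dollar sign
after_dollar = prices[/(?<=\$)\d+/]

# Negative lookbehind: a whole number that is not preceded by a dollar sign
# (\b stops the match from starting in the middle of "42")
not_price = prices[/(?<!\$)\b\d+/]

# Negative lookahead: "foo" only when it is not followed by "bar"
not_foobar = "foobar foobaz"[/foo(?!bar)\w+/]

p after_dollar, not_price, not_foobar
```

In each case the context ($ or bar) decides whether the match succeeds but is not part of the returned text.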
Search and Replace Operations
String#sub and String#gsub, together with their mutating versions (sub!, gsub!), are the core search-and-replace methods. sub/sub! replaces the first occurrence, while gsub/gsub! replaces all occurrences. All four accept a regular expression as the search pattern.
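A short sketch of the four methods side by side follows. The sample sentence and variable names are assumptions made for illustration.

```ruby
original = "one fish, two fish"

first_only = original.sub(/fish/, "cat")    # replaces the first occurrence only
all        = original.gsub(/fish/, "cat")   # replaces every occurrence
# original itself is unchanged by sub and gsub

copy = original.dup
returned  = copy.gsub!(/fish/, "cat")       # modifies copy in place
no_change = copy.sub!(/bird/, "cat")        # nil when nothing was replaced

p first_only, all, original, copy, no_change
```

Since the mutating versions return nil when no substitution happens, their return value should not be chained without a nil check.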
Replacement String Backreferences
The matched groups or portions of the original text may be referenced by backslash sequences in the replacement string.
- \0 or \& refers to the entire matched text ($&).
- \1, \2, etc., refer to the captured groups.
- \k<name> refers to a named captured group (Ruby 1.9+).
Code Example: Swapping using Backreferences
# Swap each pair of adjacent characters (gsub applies the swap to every pair)
puts "nercpyitno".gsub(/(.)(.)/, '\2\1')
# Output: encryption
# Case-insensitive replacement using \0 to preserve capitalization
text = "The ruby language"
puts text.gsub(/\bruby\b/i, '<b>\0</b>')
# Output: The <b>ruby</b> language
Output
encryption
The <b>ruby</b> language
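Named backreferences work the same way in the replacement string. The sketch below reorders a date with \k<name>; the date format and the variable names are assumptions for illustration.

```ruby
date = "2024-03-15"

# Reorder the parts of the date using the group names
reordered = date.sub(/(?<year>\d+)-(?<month>\d+)-(?<day>\d+)/,
                     '\k<day>/\k<month>/\k<year>')

# In a double-quoted replacement string the backslashes must be doubled
swapped = "ab".sub(/(a)(b)/, "\\2\\1")

p reordered, swapped
```

Single-quoted replacement strings are usually easier to read, because the backslash sequences can be written exactly as listed above.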
Dynamic Replacement using Code Blocks
A code block that dynamically computes the replacement string can be used by gsub and sub in place of a static one. The block receives as an argument the matched text ($&).
Code Example: Capitalizing Words
This example uses a block to capitalize the first letter of each word, which is what \b\w matches:
def mixed_case(name)
  # \b\w matches a word boundary followed by one word character (the first letter)
  name.gsub(/\b\w/) { |first| first.upcase }
end
puts mixed_case("fats waller")
# Output: Fats Waller
Output
Fats Waller
Regexp Modifiers (Flags)
Modifiers are single characters appended after the closing delimiter of a regex literal to control how the match is performed.
| Modifier | Constant | Description |
| i | Regexp::IGNORECASE | Makes the match case-insensitive. |
| m | Regexp::MULTILINE | Allows the dot (.) character to match newlines, treating the string as a single line. |
| x | Regexp::EXTENDED | Allows spaces and comments within the pattern for improved readability. |
| o | N/A | Performs string interpolation (#{}) only once, when the literal is first evaluated. |
| u, e, s, n | N/A | Define the encoding (UTF-8, EUC, SJIS, or ASCII/none). |
Code Example: Extended Syntax (x)
The x modifier makes complex regex easier to comprehend by allowing whitespace and comments inside the pattern.
extended = %r{
\s+ # Match one or more whitespace characters
a # Match the letter 'a'
\s+ # Match one or more whitespace characters
}xi # Extended and case-insensitive flags
p extended =~ "What was Alfred doing here?" # => nil (no standalone "a" between whitespace)
p extended =~ "My, that was a yummy mango." # => 12
Output
nil
12
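The i and m modifiers, and their constant equivalents for Regexp.new, can be sketched as follows. The sample text and variable names are illustrative assumptions, and match? requires Ruby 2.4 or later.

```ruby
text = "Ruby\nrocks"

case_insensitive = /ruby/i.match?(text)         # i ignores letter case
dot_default      = /Ruby.rocks/.match?(text)    # by default . does not match a newline
dot_multiline    = /Ruby.rocks/m.match?(text)   # with m, . also matches a newline

# The same flags passed as constants when building the pattern from a string
combined = Regexp.new("ruby.rocks", Regexp::IGNORECASE | Regexp::MULTILINE)
both = combined.match?(text)

p case_insensitive, dot_default, dot_multiline, both
```

Constants are combined with the bitwise OR operator, which mirrors writing several modifier letters after a literal.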
Utility Functions
Combining Regular Expressions
The Regexp.union factory method builds a single pattern that matches any of the strings or Regexp objects passed to it. Strings passed to Regexp.union are escaped automatically.
Code Example: Regexp.union
# Pattern matches any of the three symbols
Regexp.union("()", "[]", "{}") # => /\(\)|\[\]|\{\}/ (Automatic escaping)
# Aggregating multiple regex patterns for replacement
key_value_pairs = [[/[a-z]/i, '#'], [/#/, 'P']].freeze
# Matches either a letter or a pound sign
combined_regex = Regexp.union(*key_value_pairs.collect { |k, _v| k })
result = "Here is number #123".gsub(combined_regex) do |match|
  # Find the first pair whose regex matches the current character
  key_value_pairs.detect { |k, _v| k =~ match }[1]
end
puts result # => "#### ## ###### P123"
Output
#### ## ###### P123
Escaping
To ensure that text is matched literally, the Regexp.escape method (alias Regexp.quote) takes a string and escapes any characters that have special meaning in regular expressions.
Code Example: Escaping a String for Literal Use
suffix = Regexp.escape("()") # Treat parentheses literally
# suffix is now "\\(\\)"
r = Regexp.new("[a-z]+" + suffix) # /[a-z]+\(\)/
puts r # prints (?-mix:[a-z]+\(\)), the to_s form of /[a-z]+\(\)/
puts "test()".match?(r) # => true
puts "hello".match?(r) # => false
Output
(?-mix:[a-z]+\(\))
true
false
