Regular Expressions in Ruby: Pattern Matching & Validation

Regular Expressions in Ruby

Ruby’s Regexp class implements regular expressions (regex or regexp): textual patterns used for string matching, extraction, and validation. Ruby’s regex syntax is very similar to that of Perl 5. Advanced usage draws on grouping, lookarounds, capture variables, modifiers, and a handful of String methods.

Regexp Definition and Initialization

A Regexp object can be created in several equivalent ways: using forward slashes (/pattern/), using %r{pattern}, or using factory methods:

Method/Syntax	Description
`/pattern/`	The most common literal form.
`%r{pattern}`	Useful when the pattern contains forward slashes, avoiding the need for escaping.
`Regexp.new("pattern")`	Programmatic creation.
`Regexp.compile("pattern")`	Synonym for `Regexp.new`.
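
As a rough illustration of that equivalence, the sketch below builds one pattern four ways and checks that each form accepts the same input. The date_* variable names and the sample strings are invented for this example.

```ruby
# Four ways to build the same pattern (variable names are illustrative only).
date_literal = /\d{4}\/\d{2}/                # slashes must be escaped in /.../
date_percent = %r{\d{4}/\d{2}}               # no escaping needed inside %r{...}
date_new     = Regexp.new('\d{4}/\d{2}')     # single quotes keep the backslashes
date_compile = Regexp.compile('\d{4}/\d{2}') # same as Regexp.new

patterns   = [date_literal, date_percent, date_new, date_compile]
all_match  = patterns.all?  { |re| re.match?("2024/05") }
none_match = patterns.none? { |re| re.match?("2024-05") }

puts all_match   # true
puts none_match  # true
```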

Matching and Advanced Comparison

The `=~` Operator and Match Data

In Ruby, =~ is the fundamental pattern-matching operator. One operand must be a regular expression and the other a string.
If a match is found, =~ returns the integer index at which the first match starts. The result can be used directly in a conditional, because every integer, including 0, is “truthy” in Ruby.
It returns nil if no match is found.

Following a successful match, a MatchData object with all of the match’s details is stored in the global variable $~.
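
A minimal sketch of these return values, using made-up sample strings; the variable names are only for illustration.

```ruby
# =~ returns the index of the first match, or nil when nothing matches.
index = "hello world" =~ /o/
p index                       # 4

# Index 0 is truthy in Ruby, so a match at the very start still passes a test.
starts_with_h = ("hello" =~ /h/) ? true : false
p starts_with_h               # true

miss = "hello" =~ /z/
p miss                        # nil

# After a successful match, $~ holds the MatchData for it.
"hello world" =~ /w\w+/
matched_text  = $~[0]
matched_start = $~.begin(0)
p matched_text                # "world"
p matched_start               # 6
```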

MatchData and Global Variables

A MatchData object holds everything about a single match. The most recent match also sets several special global variables, which are in fact thread-local and method-local rather than truly global. The English aliases are available only after require 'English':

Global Variable	English Alias	MatchData Equivalent	Description
$~	`$LAST_MATCH_INFO`	`Regexp.last_match`	The `MatchData` object itself.
$&	`$MATCH`	`$~[0]`	The entire matched text.
$`	`$PREMATCH`	`$~.pre_match`	The string preceding the match.
$'	`$POSTMATCH`	`$~.post_match`	The string following the match.
$1…$9	None	`$~[1]`…`$~[9]`	The content of the Nth grouped subexpression.
$+	`$LAST_PAREN_MATCH`	`$~[-1]`	The string matched by the highest-numbered group.

Code Example: Accessing Match Data (Global Variables)

text = "hello world"

text =~ /ell(\w)\s(\w+)/
puts $&   # "ello world"   (entire match)
puts $`   # "h"            (before match)
puts $'   # ""             (after match)
puts $1   # "o"            (first capture)
puts $2   # "world"        (second capture)

Output

ello world
h

o
world
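
The same details can be read from the MatchData object itself, which avoids the punctuation globals. The following is a sketch along those lines, reusing the pattern above; the variable names m and alias_match are illustrative.

```ruby
require 'English'  # provides $MATCH, $PREMATCH, and the other aliases

m = /ell(\w)\s(\w+)/.match("hello world")
alias_match = $MATCH        # English alias for $&

p m[0]                      # "ello world"
p m.pre_match               # "h"
p m.post_match              # ""
p m.captures                # ["o", "world"]
p Regexp.last_match(2)      # "world"
p alias_match               # "ello world"
```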

Case Equality (`===`)

For Regexp objects, the === operator tests whether a given string matches the pattern. This matters because case statements use === implicitly for each when comparison.

Code Example: Regexp in a case statement

string = "123"
case string
when /^[a-zA-Z]+$/
  "Letters"
when /^[0-9]+$/
  "Numbers" # => "Numbers"
else
  "Mixed"
end

case "Ruby is #1!"
when /\APython/
  puts "Boooo."
when /\ARuby/ # This branch is executed because /\ARuby/ === "Ruby is #1!"
  puts "You are right." # => "You are right."
end

Output

You are right.
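
The operator can also be called directly, which may help show what case does behind the scenes. This is a small sketch with an invented digits pattern; Enumerable#grep is included because it relies on === as well.

```ruby
digits = /\A\d+\z/

string_hit  = digits === "123"   # true
string_miss = digits === "12a"   # false
non_string  = digits === 123     # false: an Integer is not a string

# grep selects the elements for which pattern === element is true.
numeric = ["7", "x", "42"].grep(digits)

p string_hit, string_miss, non_string
p numeric   # ["7", "42"]
```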

`match?` (Ruby 2.4+)

Available in Ruby 2.4 and later, the Regexp#match? method returns a strict Boolean (true or false) and, importantly, does not set the global $~ variable or any of the related variables.

Code Example: Using match?

puts /R.../.match?("Ruby")       # => true
puts /R.../.match?("Ruby", 1)    # => false (starts search at position 1)

Output

true
false
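
One way to observe that match? leaves the match variables alone is to compare $~ before and after the call, as in this sketch (sample strings and variable names are made up):

```ruby
"alpha" =~ /a/
before = $~[0]

matched = /b/.match?("beta")   # true, but $~ is left untouched
after_predicate = $~[0]

/b/.match("beta")              # Regexp#match, by contrast, updates $~
after_match = $~[0]

p matched          # true
p before           # "a"
p after_predicate  # "a"
p after_match      # "b"
```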

Advanced Syntax Features

Named Capture Groups

Ruby extends standard grouping syntax (...) with named groups using (?<name>...). This allows developers to extract captured text using the group’s name (as a string or symbol) instead of counting group indexes. Named capture groups are available as keys in the resulting MatchData object.

Code Example: Named Groups

name_reg = /h(i|ello), my name is (?<name>.*)/i
name_input = "Hi, my name is Zaphod Beeblebrox"

match_data = name_reg.match(name_input) 

if match_data
  # Unnamed groups such as (i|ello) are not captured once a pattern contains
  # named groups, so index 1 refers to the named group.
  puts match_data[1]        # => "Zaphod Beeblebrox"
  puts match_data[:name]    # => "Zaphod Beeblebrox" (Access by symbol name)
  puts match_data["name"]   # => "Zaphod Beeblebrox" (Access by string name)
end

Output

Zaphod Beeblebrox
Zaphod Beeblebrox
Zaphod Beeblebrox

In Ruby 1.9+, when a regex literal containing named captures appears on the left side of =~, the captured text is automatically assigned to local variables named after the groups.

Code Example: Automatic Local Variable Assignment (Ruby 1.9+)

if /(?<lang>\w+) (?<ver>\d+\.(\d+)+) (?<review>\w+)/ =~ "Ruby 1.9 rules!"
  puts lang    # => "Ruby" 
  puts ver     # => "1.9" 
  puts review  # => "rules" 
end

Output

Ruby
1.9
rules
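
The assignment depends on the literal being the left operand. The sketch below contrasts the two operand orders; the group names year, month, and tail are invented for the example.

```ruby
# Regex literal on the left: year and month become local variables.
if /(?<year>\d{4})-(?<month>\d{2})/ =~ "2024-05"
  puts year    # 2024
  puts month   # 05
end

# String on the left: the match succeeds, but no local variable is created.
position    = "2024-05" =~ /(?<tail>\d{2})\z/
tail_text   = $~[:tail]        # still reachable through the MatchData
tail_status = defined?(tail)   # nil, because no local named tail exists

p position     # 5
p tail_text    # "05"
p tail_status  # nil
```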

Quantifiers and Greediness

Quantifiers (such as ? for zero or one, * for zero or more, + for one or more, and {n,m} for a range) specify how many times the preceding element may repeat.

Quantifiers are greedy by default, matching as many characters as they can while still permitting the expression to match as a whole. Append a question mark (?) to a quantifier to make it lazy (non-greedy), so that it matches as few characters as possible.

Code Example: Greediness vs. Laziness

text = "<first> <second>"
puts text[/<(.*)>/]   # => "<first> <second>" (Greedy: matches up to the *last* >) 
puts text[/<(.*?)>/]  # => "<first>" (Lazy: matches up to the *first* >)

Output

<first> <second>
<first>
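
For the individual quantifiers, a short sketch with String#scan may make the differences easier to see. All sample strings and variable names here are illustrative.

```ruby
optional    = "color colour".scan(/colou?r/)       # ?     : zero or one "u"
zero_plus   = "a ab abb".scan(/ab*/)               # *     : zero or more "b"
one_plus    = "a ab abb".scan(/ab+/)               # +     : one or more "b"
ranged      = "1 12 123 1234".scan(/\b\d{2,3}\b/)  # {2,3} : two or three digits
lazy_ranged = "aaaa"[/a{2,3}?/]                    # lazy form takes the minimum

p optional     # ["color", "colour"]
p zero_plus    # ["a", "ab", "abb"]
p one_plus     # ["ab", "abb"]
p ranged       # ["12", "123"]
p lazy_ranged  # "aa"
```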

Anchors and Boundaries

Anchors match a position rather than characters; they consume no input:

^ and $ match the beginning and end of a line, respectively.
\A and \z match the absolute start and end of the string.
\b matches a word boundary (the transition between a word character and a non-word character).
\B matches a non-word boundary.
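
The difference between line anchors and string anchors matters most for validation, since ^ and $ can be satisfied by a single line of multi-line input. A sketch under made-up inputs:

```ruby
text = "first line\nsecond line"

line_starts  = text.scan(/^\w+/)    # ^ matches at the start of every line
string_start = text.scan(/\A\w+/)   # \A matches only at the very start
line_ends    = text.scan(/\w+$/)    # $ matches at the end of every line
string_end   = text.scan(/\w+\z/)   # \z matches only at the very end

whole_words = "cat concat cats".scan(/\bcat\b/)  # only the standalone word

# Validation: one numeric line is enough to satisfy ^...$, but not \A...\z.
input  = "123\nnot a number"
loose  = input.match?(/^\d+$/)
strict = input.match?(/\A\d+\z/)

p line_starts   # ["first", "second"]
p string_start  # ["first"]
p line_ends     # ["line", "line"]
p string_end    # ["line"]
p whole_words   # ["cat"]
p loose         # true
p strict        # false
```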

Lookahead and Lookbehind (Assertions)

Lookarounds let a pattern match only if it is followed or preceded by another pattern; the context pattern is not included in the final match ($&).

Assertion	Syntax	Matches
Positive Lookahead	`(?=re)`	Matches only if the pattern `re` follows.
Negative Lookahead	`(?!re)`	Matches only if the pattern `re` does not follow.
Positive Lookbehind	`(?<=re)`	Matches only if the pattern `re` precedes (Ruby 1.9+).
Negative Lookbehind	`(?<!re)`	Matches only if the pattern `re` does not precede (Ruby 1.9+).

Code Example: Lookahead Assertion

puts "Ruby!".match(/Ruby(?=!)/)  # => #<MatchData "Ruby">
puts "Ruby?".match(/Ruby(?=!)/)  # => nil

Output

Ruby
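
The remaining assertion types work the same way. Here is a brief sketch of lookbehind and negative lookahead on invented sample strings:

```ruby
amounts = "$42 and 17"

after_dollar = amounts.scan(/(?<=\$)\d+/)      # positive lookbehind: digits right after "$"
no_dollar    = amounts.scan(/(?<![\d$])\d+/)   # negative lookbehind: no "$" or digit before
no_digit     = "ruby1 ruby2 rubyx".scan(/ruby(?!\d)/)  # negative lookahead: no digit after

p after_dollar  # ["42"]
p no_dollar     # ["17"]
p no_digit      # ["ruby"]
```

In each case the surrounding context ("$", the space, the "x") is checked but does not appear in the returned text.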

Search and Replace Operations

String#sub and String#gsub, together with their mutating versions (sub! and gsub!), are the core search-and-replace methods. sub/sub! replaces the first occurrence, while gsub/gsub! replaces all occurrences. Each accepts a regular expression as the search pattern.

Replacement String Backreferences

Backslash sequences in the replacement string can refer to the matched text or to its captured groups.

\0 or \& refers to the entire matched text ($&).
\1, \2, etc., refer to the captured groups.
\k<name> refers to a named captured group (Ruby 1.9+).

Code Example: Swapping using Backreferences

# Swap each pair of adjacent characters (gsub applies the swap to every pair)
puts "nercpyitno".gsub(/(.)(.)/, '\2\1')  
# Output: encryption

# Case-insensitive replacement using \0 to preserve capitalization
text = "The ruby language"
puts text.gsub(/\bruby\b/i, '<b>\0</b>')  
# Output: The <b>ruby</b> language

Output

encryption
The <b>ruby</b> language
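
Named backreferences follow the same idea. The sketch below reorders a date with \k<name> and also shows that a double-quoted replacement needs its backslashes doubled; the sample strings and group names are invented.

```ruby
date = "2024-05-17"
reordered = date.sub(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/, '\k<d>/\k<m>/\k<y>')
p reordered   # "17/05/2024"

# In double quotes the backslash itself must be escaped.
flipped = "John Smith".sub(/(\w+) (\w+)/, "\\2, \\1")
p flipped     # "Smith, John"
```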

Dynamic Replacement using Code Blocks

gsub and sub can take a code block in place of a static replacement string; the block computes the replacement dynamically. The block receives the matched text ($&) as its argument.

Code Example: Capitalizing Words

In this example, a block capitalizes the first letter of each word, which is what \b\w matches:

def mixed_case(name)
  # \b\w matches a word boundary followed by one word character (the first letter)
  name.gsub(/\b\w/) { |first| first.upcase } 
end

puts mixed_case("fats waller")  
# Output: Fats Waller

Output

Fats Waller
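
A block can also compute values from the match, or read capture groups through $1 and $2, which are set before the block runs. A minimal sketch with made-up inputs:

```ruby
# The block's return value replaces each match.
doubled = "apple: 3, pear: 12".gsub(/\d+/) { |n| (n.to_i * 2).to_s }
p doubled   # "apple: 6, pear: 24"

# Capture groups are available inside the block as $1, $2, ...
swapped = "key=value".sub(/(\w+)=(\w+)/) { "#{$2}=#{$1}" }
p swapped   # "value=key"
```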

Regexp Modifiers (Flags)

Modifiers are single characters appended after the closing delimiter of a regex literal to control how the pattern matches.

Modifier	Constant	Description
i	`Regexp::IGNORECASE`	Makes the match case-insensitive.
m	`Regexp::MULTILINE`	Allows the dot (`.`) character to match newlines, treating the string as a single line.
x	`Regexp::EXTENDED`	Allows spaces and comments within the pattern for improved readability.
o	N/A	Performs string interpolation (`#{}`) only once, when the literal is first evaluated.
u, e, s, n	N/A	Define the encoding (UTF-8, EUC, SJIS, or ASCII/none).
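
To make the i and m rows concrete, here is a small sketch comparing patterns with and without each flag, plus the constant form passed to Regexp.new. The sample strings and variable names are illustrative.

```ruby
case_sensitive   = "RUBY".match?(/ruby/)    # false: case differs
case_insensitive = "RUBY".match?(/ruby/i)   # true with the i flag

dot_stops   = "a\nb".match?(/a.b/)          # false: . does not match a newline
dot_crosses = "a\nb".match?(/a.b/m)         # true with the m flag

# The constants combine with | when building a pattern programmatically.
combined = Regexp.new("a.b", Regexp::IGNORECASE | Regexp::MULTILINE)
both     = combined.match?("A\nB")

p case_sensitive, case_insensitive   # false, true
p dot_stops, dot_crosses             # false, true
p both                               # true
```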

Code Example: Extended Syntax (x)

The x modifier makes complex regex easier to comprehend by allowing whitespace and comments inside the pattern.

extended = %r{
  \s+        # Match one or more whitespace characters
  a          # Match the letter 'a'
  \s+        # Match one or more whitespace characters
}xi          # Extended and case-insensitive flags

p extended =~ "What was Alfred doing here?"  # => nil (no standalone "a")
p extended =~ "My, that was a yummy mango."  # => 12

Output

nil
12

Utility Functions

Combining Regular Expressions

The Regexp.union factory method builds a single pattern that matches any of its component strings or Regexp objects. Strings passed to Regexp.union are escaped automatically.

Code Example: Regexp.union

# Pattern matches any of the three symbols
Regexp.union("()", "[]", "{}") # => /\(\)|\[\]|\{\}/ (Automatic escaping)

# Aggregating multiple regex patterns for replacement
key_value_pairs = [[/[a-z]/i, '#'], [/#/, 'P']].freeze
# Matches either a letter or a pound sign
combined_regex = Regexp.union(*key_value_pairs.collect { |k,v| k })

result = "Here is number #123".gsub(combined_regex) do |match|
  # Find the first pair whose regex matches the current character
  key_value_pairs.detect { |k,v| k =~ match }[1]
end

puts result  # => "#### ## ###### P123"

Output

#### ## ###### P123

Escaping

The Regexp.escape method (also available as Regexp.quote) takes a string and escapes every character that has a special meaning in regular expressions, so that the text is matched literally.

Code Example: Escaping a String for Literal Use

suffix = Regexp.escape("()")  # Treat parentheses literally
# suffix is now "\\(\\)"
r = Regexp.new("[a-z]+" + suffix)  # /[a-z]+\(\)/

puts r       # => /[a-z]+\(\)/
puts "test()".match?(r)  # => true
puts "hello".match?(r)   # => false

Output

(?-mix:[a-z]+\(\))
true
false
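
Escaping matters most when the text comes from outside the program and is interpolated into a pattern. The sketch below uses a hypothetical helper, literal_count, to count literal occurrences of a string; the helper name and inputs are assumptions made for illustration.

```ruby
# Hypothetical helper: count literal occurrences of needle in text.
def literal_count(text, needle)
  text.scan(/#{Regexp.escape(needle)}/).length
end

escaped_hits = literal_count("1+1=2, 1+1+1=3", "1+1")

# Without escaping, + is a quantifier, so /1+1/ looks for "11", "111", ...
unescaped_hits = "1+1=2, 1+1+1=3".scan(/1+1/).length

p escaped_hits    # 2
p unescaped_hits  # 0
```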
