Regex Extractor
How it works
A regex extractor applies a regular expression to a block of text and returns all matches, optionally with capture groups extracted and formatted as a table or list. This is the browser-based equivalent of grep -oP (Perl-compatible regex, printing only the matched text), a sed 's/.../.../' substitution, or Python's re.findall().
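As a rough illustration of the "return all matches" behavior described above, here is a minimal Python sketch. The helper name `extract_all` and the sample text are made up for this example and are not part of the tool.

```python
import re

def extract_all(pattern: str, text: str) -> list[str]:
    """Return every non-overlapping whole match of pattern in text."""
    # finditer + group(0) always yields the full match, even when the
    # pattern contains capture groups (re.findall would return the groups).
    return [m.group(0) for m in re.finditer(pattern, text)]

sample = "order 66 shipped, order 1024 pending"
print(extract_all(r"\d+", sample))  # ['66', '1024']
```

Using `re.finditer` instead of `re.findall` also gives access to `m.start()` and `m.end()`, which is handy if the matches should be shown in a table with their positions.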
**Capture groups** Without capture groups, the entire match is returned. With groups, (\d+)/(\d+)/(\d+) applied to "date 03/15/2024" returns '03', '15', and '2024' for the three captured groups. Named groups: (?P<month>\d+)/(?P<day>\d+)/(?P<year>\d+) returns {'month': '03', 'day': '15', 'year': '2024'}. The (?P<name>...) form is Python syntax; JavaScript writes named groups as (?<name>...).
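The date example above can be reproduced with Python's `re` module. This is only a sketch of the idea; the variable names are illustrative.

```python
import re

text = "date 03/15/2024"

# Numbered groups: findall returns one tuple of groups per match.
numbered = re.findall(r"(\d+)/(\d+)/(\d+)", text)
print(numbered)  # [('03', '15', '2024')]

# Named groups: groupdict() maps each group name to its captured text.
named_pattern = re.compile(r"(?P<month>\d+)/(?P<day>\d+)/(?P<year>\d+)")
match = named_pattern.search(text)
named = match.groupdict() if match else {}
print(named)  # {'month': '03', 'day': '15', 'year': '2024'}
```

Note that `re.findall` returns a list with one tuple per match, so a text containing two dates would produce two tuples.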
**Common extraction patterns** Email addresses: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}. IPv4 addresses: (?:\d{1,3}\.){3}\d{1,3}. URLs: https?://[^\s<>"{}|\\^`\[\]]+. Hex colors: #[0-9A-Fa-f]{6}. Dollar amounts: \$\d{1,3}(?:,\d{3})*(?:\.\d{2})?. Log levels: (DEBUG|INFO|WARN(?:ING)?|ERROR|CRITICAL|FATAL).
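To show how several of these patterns behave on real input, the sketch below runs them against one made-up line of text. The `PATTERNS` dictionary, the `extract` helper, and the sample string are assumptions for illustration only.

```python
import re

# A few of the patterns listed above, keyed by an illustrative name.
PATTERNS = {
    "email": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
    # Shape check only: this also accepts out-of-range octets such as 999.
    "ipv4": r"(?:\d{1,3}\.){3}\d{1,3}",
    "hex_color": r"#[0-9A-Fa-f]{6}",
    "dollars": r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?",
    "log_level": r"(?:DEBUG|INFO|WARN(?:ING)?|ERROR|CRITICAL|FATAL)",
}

def extract(kind: str, text: str) -> list[str]:
    """Return all whole matches of the named pattern in text."""
    return [m.group(0) for m in re.finditer(PATTERNS[kind], text)]

sample = "ERROR 10.0.0.7 billed test@example.com $1,250.00 (theme #1A2B3C)"

for kind in PATTERNS:
    print(kind, extract(kind, sample))
```

The patterns match on shape, not validity, so results usually need a second check (for example, range-checking IPv4 octets) before being trusted.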
**Flags and multiline matching** g flag (global): return all matches, not just the first. i flag (case-insensitive): match uppercase and lowercase. m flag (multiline): ^ and $ match line boundaries, not just string boundaries. s flag (dotall): . matches newline characters. Flags can be combined: /pattern/gim.
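The flags above are written in JavaScript's /pattern/flags notation. For comparison, here is a hedged Python sketch of the same ideas: Python has no g flag (you choose between `re.search` and `re.findall` instead), and the other flags are passed as `re.IGNORECASE`, `re.MULTILINE`, and `re.DOTALL`. The sample text is invented for the example.

```python
import re

text = "Error: disk full\nerror: retry\nOK"

# "g" equivalent: search() stops at the first match, findall() returns all.
first = re.search(r"error", text, re.IGNORECASE).group(0)
all_hits = re.findall(r"error", text, re.IGNORECASE)

# "m": ^ anchors at the start of every line, not just the start of the string.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
string_start_only = re.findall(r"^\w+", text)

# "s": . also matches the newline between the first two lines.
with_dotall = re.findall(r"full.error", text, re.DOTALL)
without_dotall = re.findall(r"full.error", text)

# Flags combine with the | operator, like /pattern/gim in JavaScript.
combined = re.findall(r"^error", text, re.IGNORECASE | re.MULTILINE)

print(first, all_hits, line_starts, with_dotall, combined)
```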
Frequently Asked Questions
- Greedy quantifiers (*, +, ?) match as much as possible. Lazy (reluctant) quantifiers (*?, +?, ??) match as little as possible. Example: on the string '<a>text</a>', the greedy pattern <.*> matches the entire '<a>text</a>' (from the first < to the last >). The lazy pattern <.*?> first matches only '<a>' (it stops at the first >). When pulling tags out of HTML or XML, lazy quantifiers are almost always what you want, because they avoid matching across multiple tags. To make a quantifier lazy, add ? after it.
- Regex: [a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}. Paste the text and the regex into this tool with the global flag (g) to extract all matches. Limitations: this regex accepts some technically invalid emails and rejects some technically valid ones (quoted local parts, internationalized domain names). For production email validation, use a dedicated library or a validator that follows RFC 5321/5322. For extraction from text (not validation), this pattern catches the vast majority of real-world email addresses.
- Lookahead (?=...) matches a position where the pattern matches ahead, without consuming characters. Lookbehind (?<=...) matches a position where the pattern matches behind. Negative versions: (?!...) and (?<!...). Example: to extract prices preceded by '$', use (?<=\$)\d+\.\d{2}, which matches '19.99' in '$19.99' without including the '$' in the match. These are zero-width assertions: they check for a condition at a position but don't include that context in the match. Not all regex engines support lookbehind equally: JavaScript added it in ES2018, and Python's re module only accepts fixed-width lookbehind patterns.
- g (global): return all matches, not just the first. Without g, regex stops after the first match. i (case-insensitive): /hello/i matches 'Hello', 'HELLO', 'hello'. m (multiline): ^ matches start of each line (not just start of string); $ matches end of each line. s (dotall/single-line): . matches newline characters. u (unicode): enables full Unicode matching (required for \p{L} Unicode property escapes). y (sticky): match only at the current lastIndex position (used for stateful parsing). JavaScript supports all of these; combine as needed: /pattern/gim.
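The greedy/lazy and lookbehind answers above can be checked with a short Python sketch. The sample strings and variable names are invented for illustration; the patterns are the ones quoted in the answers, plus one negative lookbehind added as an assumption-labeled variation.

```python
import re

html = "<a>text</a>"

# Greedy: .* runs to the last '>' and backtracks only as far as needed.
greedy = re.findall(r"<.*>", html)

# Lazy: .*? stops at the first '>' it can, so each tag matches separately.
lazy = re.findall(r"<.*?>", html)

prices = "Sale: $19.99, was $24.50, save 5.49"

# Positive lookbehind: the amount must be preceded by '$',
# but the '$' itself is not part of the match.
after_dollar = re.findall(r"(?<=\$)\d+\.\d{2}", prices)

# Negative lookbehind (illustrative variation): amounts NOT preceded by '$'.
# \b keeps the match from starting in the middle of a number.
no_dollar = re.findall(r"(?<!\$)\b\d+\.\d{2}", prices)

print(greedy, lazy, after_dollar, no_dollar)
```

With all matches returned, the lazy pattern yields both '<a>' and '</a>'; the single '<a>' mentioned in the answer is the first of those matches.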