How it works
The Extract URLs tool parses a block of text and pulls out every URL it finds, outputting them as a clean, one-per-line list. It detects HTTP and HTTPS URLs, handles query strings and fragments, and optionally normalizes protocol-relative URLs.
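The core of the extraction can be sketched as one regular expression. This is a minimal Python sketch under stated assumptions, not the tool's actual pattern: the regex, the function name extract_urls, and the default_scheme parameter are all hypothetical. It matches http://, https://, and protocol-relative // URLs. Query strings, fragments, and ports are kept because the URL body runs until whitespace, an angle bracket, or a quote. It does not trim trailing punctuation.

```python
import re
from typing import Optional

# Assumed pattern: "http:" or "https:" followed by "//", or a bare "//" that is
# not preceded by a word character or colon (so "file://" is not matched).
# The URL body runs until whitespace, an angle bracket, or a quote.
URL_PATTERN = re.compile(r"(?:https?:|(?<![\w:]))//[^\s<>\"']+", re.IGNORECASE)


def extract_urls(text: str, default_scheme: Optional[str] = None) -> list[str]:
    """Return every URL in `text`, in order of appearance.

    If `default_scheme` is given (e.g. "https"), protocol-relative URLs such as
    //cdn.example.com/lib.js are normalized by prefixing that scheme.
    """
    urls = []
    for match in URL_PATTERN.finditer(text):
        url = match.group(0)
        if default_scheme and url.startswith("//"):
            url = f"{default_scheme}:{url}"
        urls.append(url)
    return urls
```

For example, extract_urls("see //cdn.example.com/lib.js", "https") returns ['https://cdn.example.com/lib.js'].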
URLs appear in many unstructured contexts: plain-text emails, log files, exported data, HTML source, markdown documents, and scraped web content. This tool saves the tedium of finding them manually or writing a custom regex.
How to use it: paste any text, HTML, or log content, and the tool lists every valid URL it finds. Options let you remove duplicate URLs, sort the list alphabetically, strip query parameters, extract only unique domains, or restrict results to a chosen protocol (http or https).
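The output options amount to a post-processing pass over the extracted list. The sketch below uses Python's urllib.parse to split URLs; the function apply_options and its keyword names are hypothetical stand-ins for the tool's toggles. Two behaviors are assumptions: stripping query parameters keeps any fragment, and protocol-relative URLs (which have no scheme) are dropped whenever a protocol filter is active.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit, urlunsplit


def apply_options(
    urls: Iterable[str],
    remove_duplicates: bool = False,
    sort_alphabetically: bool = False,
    strip_query: bool = False,
    unique_domains: bool = False,
    protocols: Optional[set[str]] = None,
) -> list[str]:
    """Hypothetical post-processing that mirrors the tool's output toggles."""
    result = []
    for url in urls:
        parts = urlsplit(url)
        # Protocol filter: "//host/..." has an empty scheme, so it is dropped
        # whenever a filter is set (an assumption about the tool's behavior).
        if protocols is not None and parts.scheme.lower() not in protocols:
            continue
        if unique_domains:
            # The .hostname attribute is lowercase; skip URLs without a host.
            if parts.hostname:
                result.append(parts.hostname)
            continue
        if strip_query:
            # Drop only the query string; the fragment is kept (an assumption).
            url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))
        result.append(url)
    if remove_duplicates or unique_domains:
        # dict.fromkeys removes duplicates while keeping first-seen order.
        result = list(dict.fromkeys(result))
    if sort_alphabetically:
        result.sort()
    return result
```

Deduplication runs before sorting, so with sorting off the list keeps the order in which URLs first appeared.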
Common use cases: extracting links from a plain-text email thread for a link audit, pulling all image URLs from HTML source code, finding all external links in a block of markdown, extracting API endpoint URLs from a log file, and building a URL list for a broken-link checker.
Technical note: the extractor uses a pattern that matches URLs starting with http://, https://, or //. It handles URLs wrapped in parentheses (as in markdown links) or angle brackets, as well as URLs at the end of a sentence that are followed by a period or comma. URLs containing Unicode characters (internationalized domain names) are also detected.
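The boundary handling in this note can be sketched as a two-step approach: match a generous candidate, then trim it. In this hypothetical Python sketch, the names find_urls and trim_url, the set of trailing characters, and the parenthesis-balancing rule are assumptions rather than the tool's documented behavior. Angle brackets end the match, sentence punctuation is stripped from the end, and a closing parenthesis is removed only when it has no matching opening parenthesis inside the URL, so Wikipedia-style paths such as /wiki/Foo_(bar) survive.

```python
import re

# Assumed candidate pattern: stops at whitespace, angle brackets, and quotes,
# so "<https://example.com>" yields the URL without the brackets.
CANDIDATE = re.compile(r"(?:https?:|(?<![\w:]))//[^\s<>\"']+", re.IGNORECASE)

# Assumed set of sentence punctuation to strip from the end of a candidate.
TRAILING = ".,;:!?"


def trim_url(candidate: str) -> str:
    """Strip trailing sentence punctuation and unbalanced closing parentheses."""
    while candidate:
        last = candidate[-1]
        if last in TRAILING:
            candidate = candidate[:-1]
        elif last == ")" and candidate.count(")") > candidate.count("("):
            # "[docs](https://example.com)" -> the markdown paren is dropped,
            # while ".../wiki/Foo_(bar)" keeps its balanced paren.
            candidate = candidate[:-1]
        else:
            break
    return candidate


def find_urls(text: str) -> list[str]:
    """Return trimmed URL candidates in order of appearance."""
    return [trim_url(m.group(0)) for m in CANDIDATE.finditer(text)]
```

For example, find_urls("Visit https://example.com.") returns ['https://example.com'].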
Frequently Asked Questions
- URLs beginning with http://, https://, and protocol-relative // are detected. The pattern handles query strings, fragments (#section), port numbers, and internationalized domain names (IDNs).
- The tool finds URLs both in plain text and in HTML attribute values such as href, src, action, and data-url, so you can paste raw HTML source and every URL is extracted.
- Enable 'Remove duplicates' to output each URL only once, or toggle 'Unique domains only' to extract just the domain names of all found URLs.
- The extractor strips trailing punctuation (periods, commas, closing parentheses) that commonly follows a URL in prose, so 'Visit https://example.com.' correctly yields 'https://example.com' without the period.
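Pulling URLs out of HTML attributes can be sketched with Python's standard html.parser module. This is an illustration, not the tool's implementation: the class and function names are hypothetical, only the four attribute names listed in the FAQ are checked, and relative values such as /local are skipped on the assumption that only absolute and protocol-relative URLs count. A side benefit of using a real parser is that entities like &amp; in attribute values are decoded.

```python
from html.parser import HTMLParser

# Attribute names taken from the FAQ; the real tool may check others too.
URL_ATTRIBUTES = {"href", "src", "action", "data-url"}


class AttributeURLCollector(HTMLParser):
    """Hypothetical collector for absolute and protocol-relative attribute URLs."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased and
        # values have character references such as &amp; already decoded.
        for name, value in attrs:
            if name in URL_ATTRIBUTES and value:
                value = value.strip()
                if value.lower().startswith(("http://", "https://", "//")):
                    self.urls.append(value)

    # Self-closing tags like <img ... /> go to handle_startendtag, whose default
    # implementation calls handle_starttag, so no extra override is needed.


def urls_from_html(html: str) -> list[str]:
    parser = AttributeURLCollector()
    parser.feed(html)
    parser.close()
    return parser.urls
```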