Why URL Extraction Is More Complex Than It Looks
A URL doesn't live in just one place in HTML. It can be in an href attribute, a src attribute, an action attribute, a CSS background-image url(), a data-src attribute for lazy-loaded images, a srcset attribute for responsive images, a poster attribute for video thumbnails, or in plain text content. A basic 'find all href values' approach misses the majority of URLs in a real HTML document.
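The multi-context scan described above can be sketched in a few lines. This is a simplified illustration, not the tool's actual implementation: it covers the plain URL-bearing attributes plus srcset (which needs its own pass because it holds comma-separated "url descriptor" pairs), and omits CSS url(), comments, and text content.

```python
import re

# Illustrative subset of URL-bearing attributes named in the text
URL_ATTRS = ("href", "src", "action", "data-src", "poster")

def find_attribute_urls(html: str) -> list[str]:
    """Collect URL values from common URL-bearing HTML attributes."""
    pattern = re.compile(
        r'\b(?:' + '|'.join(URL_ATTRS) + r')\s*=\s*["\']([^"\']+)["\']',
        re.IGNORECASE,
    )
    urls = [m.group(1) for m in pattern.finditer(html)]
    # srcset holds comma-separated "url width" pairs, so it needs its own pass
    for srcset in re.findall(r'srcset\s*=\s*["\']([^"\']+)["\']', html, re.IGNORECASE):
        urls.extend(part.strip().split()[0] for part in srcset.split(",") if part.strip())
    return urls
```

Running this on a fragment like `<a href="https://a.com/x"><img src="/i.png" srcset="/i-2x.png 2x, /i-3x.png 3x">` yields all four URLs, including both srcset entries, which a plain href scan would miss.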
And once you've found them all, a raw list of URLs without structure isn't particularly useful. You need to know: which are internal links and which are external? Which are images, scripts, stylesheets, documents, API endpoints? Which domains appear most frequently? Are there any insecure HTTP links that might cause mixed-content warnings? Is a particular third-party domain unexpectedly dominant in the link profile?
The URL Extractor handles all of this. It finds URLs across all attribute types and contexts, categorizes each one using extension and path-pattern matching, groups them by domain, flags HTTP links, and presents the full picture with charts and filterable tables. Whether you're doing an SEO audit, a security review, competitive research, or just need to pull all the image URLs from a page, the tool produces immediately actionable output rather than just a raw list.
Extract and categorize all URLs from any source
Paste HTML source, markdown, CSV, or any text. The extractor finds every URL, categorizes it (images, scripts, stylesheets, external links, API endpoints, social media, and more), shows the domain breakdown with charts, and exports in your preferred format.
Open URL Extractor

Key Use Cases and How to Handle Each
1. SEO link audit
Open your page in a browser, right-click → View Page Source, select all, copy, and paste. The extractor categorizes all links as internal or external, shows the full external domain breakdown, and lets you filter to external links only for export. Use this to understand your outbound link profile, find broken or unintended external links, and identify which third-party domains your page references. The domain breakdown chart immediately shows if any unexpected domain is over-represented.
2. Extracting all image URLs
Click the 'Images' category tile after extraction to filter the list to image URLs only. This gives you all img src values, CSS background-image URLs, and srcset entries from the page. Export as a JSON array for use in a download script, or as one-per-line for import to a media management tool. This is particularly useful for content migrations, image audits, or building a list of assets to pre-fetch.
3. Finding API endpoints in documentation
Paste API documentation HTML or markdown into the extractor. The API/Endpoints category captures URLs matching /api/, /v1/, /graphql, /rest/, and similar patterns. This is faster than manually scanning documentation for endpoint URLs and produces a clean list for import to Postman, Insomnia, or curl scripts.
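The path-pattern matching behind this category can be sketched as follows. The pattern list here is hypothetical, modeled only on the examples named above (/api/, /v1/, /graphql, /rest/); the tool's actual pattern table is not published.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern set based on the examples in the text
ENDPOINT_PATTERNS = re.compile(r'/(api|v\d+|graphql|rest)(/|$)', re.IGNORECASE)

def looks_like_endpoint(url: str) -> bool:
    """True when the URL path matches a common API path convention."""
    return bool(ENDPOINT_PATTERNS.search(urlparse(url).path))
```

Note the trailing `(/|$)` in the pattern: it requires the segment to end there, so a path like /restaurants does not falsely match the /rest/ rule.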
4. Checking third-party dependencies
The domain breakdown table shows every external domain referenced in your content, sorted by link count. This is a fast way to audit third-party dependencies: how many domains does your page contact? Are there unexpected tracking or advertising domains? The list gives you the complete picture in seconds rather than requiring manual inspection of individual tags.
5. Markdown link extraction
Paste any markdown document (a README, a documentation file, a blog post) and the extractor finds all links in both inline syntax [text](url) and plain URL format. This is useful for link-checking documentation, extracting reference links before content migration, or building a bibliography of external references from a long document.
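The two markdown forms mentioned above can be captured with two regular expressions: one for inline `[text](url)` syntax and one for bare http(s) URLs. This is a simplified sketch; it does not handle reference-style links or angle-bracketed autolinks.

```python
import re

INLINE = re.compile(r'\[[^\]]*\]\(([^)\s]+)')   # [text](url)
BARE = re.compile(r'https?://[^\s)\]>"\']+')     # plain URLs in text

def markdown_urls(text: str) -> list[str]:
    """Extract inline-link targets first, then any bare URLs not already seen."""
    urls = INLINE.findall(text)
    urls += [u for u in BARE.findall(text) if u not in urls]
    return urls
```

Excluding `)` from the bare-URL character class keeps the second pass from re-capturing inline-link targets with a trailing parenthesis attached.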
Frequently Asked Questions
Does the extractor work on full HTML pages?
Yes. It's specifically optimized for HTML. It finds URLs in all attribute contexts (href, src, action, data-src, poster, srcset, etc.), in CSS url() values embedded in style tags and attributes, in HTML comments, in script tag content where URLs appear as strings, and in visible text content. Paste the complete View Page Source output and it will find everything.
How accurate is the categorization?
Categorization uses file extension matching and path pattern matching. It's accurate for explicit extensions (.jpg, .js, .css, .pdf) and common patterns (/api/, /v1/, social media domains). It cannot determine content type from an HTTP response (since it's all client-side), so URLs with no extension or unconventional paths may be categorized as 'external' or 'internal' rather than their specific type. The filter lets you search by any substring for more precise targeting.
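Extension-based categorization with a generic fallback can be sketched like this. The category map below is illustrative, built only from the extensions the answer lists; the tool's real table is larger.

```python
from urllib.parse import urlparse

# Illustrative extension map; the tool's actual table covers more types
EXT_CATEGORIES = {
    ".jpg": "image", ".jpeg": "image", ".png": "image", ".gif": "image",
    ".js": "script", ".css": "stylesheet", ".pdf": "document",
}

def categorize(url: str) -> str:
    """Match on the path's extension; fall back to a generic bucket."""
    path = urlparse(url).path.lower()
    for ext, cat in EXT_CATEGORIES.items():
        if path.endswith(ext):
            return cat
    # no recognizable extension: a client-side tool cannot inspect the
    # response Content-Type, so this stays generic
    return "other"
```

Parsing the path first means query strings don't interfere: a URL like /app.js?v=2 still categorizes as a script.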
What's the difference between the JSON and CSV export formats?
JSON export gives you a plain array of URL strings, useful for JavaScript code, API calls, or tools that accept JSON input. CSV export gives you four columns: url, category, domain, and protocol (https/http/other), useful for spreadsheet analysis, filtering by category in Excel, or building a link inventory. Both include only the URLs currently visible in the filtered list.
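The four-column CSV layout described above maps directly onto a standard CSV writer. This sketch assumes the URLs have already been extracted and categorized (here passed in as url/category pairs); the domain and protocol columns are derived from the URL itself.

```python
import csv
import io
from urllib.parse import urlparse

def to_csv(rows):
    """rows: iterable of (url, category) pairs; emits the four described columns."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["url", "category", "domain", "protocol"])
    for url, category in rows:
        parsed = urlparse(url)
        # anything that isn't http/https collapses into "other"
        protocol = parsed.scheme if parsed.scheme in ("http", "https") else "other"
        writer.writerow([url, category, parsed.netloc, protocol])
    return buf.getvalue()
```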
How does the internal vs external detection work?
The extractor scans for the first absolute URL in your input and uses its domain as the 'base domain' for that extraction. Relative URLs (starting with / or ./) and URLs containing that domain are classified as internal. All other absolute URLs pointing to different domains are external. If your input contains only relative URLs with no absolute domain reference, all links will be classified as 'relative' rather than internal/external.
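The heuristic just described (first absolute URL supplies the base domain; relative URLs inherit it; everything else is external) can be sketched as follows. Function and variable names are illustrative, not the tool's own.

```python
import re
from urllib.parse import urlparse

def classify_links(source_text: str, urls: list[str]) -> dict[str, str]:
    """Label each URL internal/external/relative using the first absolute
    URL found in the source text as the base domain."""
    first = re.search(r'https?://[^\s"\'<>]+', source_text)
    base = urlparse(first.group()).netloc if first else None
    result = {}
    for u in urls:
        netloc = urlparse(u).netloc
        if not netloc:
            # relative URL: internal if we found a base domain, else 'relative'
            result[u] = "internal" if base else "relative"
        else:
            result[u] = "internal" if netloc == base else "external"
    return result
```

With a source containing `https://me.com/a`, the URL /about classifies as internal and https://x.com/b as external; with no absolute URL at all, every relative link falls into the 'relative' bucket, matching the behavior described above.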
Can I use this to check for broken links?
The extractor identifies and lists all URLs; you can then run a link checker (such as Dead Link Checker, Broken Link Checker, or the W3C Link Checker) against the exported list. The URL Extractor is optimized for extraction and categorization; active link validation (making HTTP requests to check if URLs return 200) requires a server-side component and is outside scope for a client-side tool.
Need to extract emails or phone numbers too?
The Email Extractor and Phone Number Extractor work the same way β paste any text and get a deduplicated, categorized, exportable list. All three tools work together for complete contact and link data extraction from any source.
Open Email Extractor