
Extract Every URL From Any Text or HTML Instantly

Extract, deduplicate, and analyze every URL from any text or HTML instantly.

What This Does

URLs are embedded everywhere — in HTML source code, markdown documents, CSV exports, log files, and copied web content. Finding all the links in a block of content manually is slow and error-prone, and tools that do it automatically usually give you raw, unprocessed output that still requires significant cleanup.

The URL Extractor goes substantially beyond simple link extraction. It finds every URL in your pasted text — in href, src, and action attributes, CSS url() values, plain-text URLs with or without http/https, and bare domain references — then deduplicates them, normalizes them, and categorizes each link by type: internal links, external links, images, scripts, stylesheets, documents, API endpoints, social media links, and more. The domain breakdown shows which sites are linked to most frequently. The protocol breakdown separates HTTP from HTTPS links so you can find insecure references. And the full-text filter lets you search and narrow the extracted list before exporting.

This is dramatically more useful than basic URL extractors, which give you a raw deduplicated list with no categorization, no domain analysis, and no insight into the structure of the link set. Whether you're auditing a website's external link profile, extracting API endpoints from documentation, building a sitemap, or doing competitive link analysis, the URL Extractor gives you a complete picture rather than just a list. All processing runs entirely in your browser — your content never leaves your device.
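As a rough illustration of the extraction and deduplication pass, here is a minimal Python sketch. It is not the tool's actual implementation (which runs client-side in the browser and covers more sources, such as data-src, markdown syntax, and bare domains); the regexes are simplified for clarity:

```python
import re

# Simplified patterns for three of the sources the tool scans:
# href/src/action attributes, CSS url() values, and plain-text URLs.
ATTR_RE = re.compile(r'(?:href|src|action)\s*=\s*["\']([^"\']+)["\']', re.I)
CSS_URL_RE = re.compile(r'url\(\s*["\']?([^"\')\s]+)["\']?\s*\)', re.I)
PLAIN_RE = re.compile(r'https?://[^\s"\'<>()]+')

def extract_urls(text: str) -> list[str]:
    found = ATTR_RE.findall(text) + CSS_URL_RE.findall(text) + PLAIN_RE.findall(text)
    seen, unique = set(), []
    for url in found:
        key = url.strip()
        if key and key not in seen:  # simple exact-match dedup for the sketch
            seen.add(key)
            unique.append(key)
    return unique

html = '<a href="https://example.com/page">x</a> <img src="/logo.png"> see http://example.com/old'
print(extract_urls(html))
```

Note that the plain-text pattern also re-matches URLs inside attribute values, which is why the deduplication step matters even within a single document.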

Assumptions
  • Internal vs external classification is based on the first absolute domain detected in the input
  • Deduplication is case-insensitive for protocol and domain, case-sensitive for path
  • Categorization uses extension and path-pattern matching, not HTTP content-type headers
  • All processing is client-side — no data is transmitted to any server
When Should You Use This?
  • Extracting all links from HTML source to audit a page's internal and external link structure
  • Finding all image URLs, script sources, or stylesheet hrefs in a web page's source code
  • Pulling all URLs from a markdown document, README, or text file for link checking
  • Extracting API endpoint URLs from documentation, Postman exports, or code files
  • Building a list of external links from a piece of content for competitive or SEO analysis
  • Deduplicating and cleaning a URL list from multiple sources before importing to a tool
Example Scenario

Marcus is doing an SEO audit for a client's website. He copies the full HTML source of the client's homepage (12,000 lines) and pastes it into the URL Extractor. Result in under a second: 847 total URL matches → 312 unique URLs after deduplication. Breakdown: 189 internal links (61%), 91 external links (29%), 24 images (8%), 8 scripts (2%). Top external domain: google.com (analytics + tag manager, 12 links). 3 HTTP (non-HTTPS) external links flagged. 14 links to social media platforms. He exports the external links as a CSV and the image URLs as a JSON array for further processing.

🔒

100% private. All processing runs in your browser — your HTML or text never leaves your device.

Paste HTML, markdown, CSV, or any text with links

URL Extractor β€” Extract, Categorize & Export Links

Paste any HTML, markdown, CSV, or text content and instantly extract every URL. The extractor finds links in href/src/action attributes, CSS url() values, plain-text URLs, and markdown link syntax — then deduplicates, categorizes each URL (images, scripts, stylesheets, documents, API endpoints, social media, mailto, internal, external), and provides a full domain breakdown. Export as plain text, comma-separated, JSON array, or CSV with metadata. All processing is client-side.

What makes this better than basic URL extractors?

  • Smart categorization — automatically identifies images, scripts, stylesheets, documents, API endpoints, social links, and more
  • Domain breakdown — bar chart + pie chart showing which sites are linked to most, with percentage breakdown
  • HTTP/HTTPS detection — flags insecure HTTP links that may cause mixed-content issues
  • Category filter — click any category to instantly filter the list to that type only
  • CSV export with metadata — export URL, category, domain, and protocol columns for further analysis
  • Inline open — hover any URL to open it in a new tab or copy it individually
  • 100% private — your HTML never leaves your browser
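The HTTP/HTTPS detection in the list above reduces to a simple prefix check over the extracted URLs. A minimal sketch (the function name is illustrative):

```python
def flag_insecure(urls: list[str]) -> list[str]:
    # Plain-HTTP links can trigger mixed-content warnings when the
    # embedding page itself is served over HTTPS.
    return [u for u in urls if u.lower().startswith("http://")]

print(flag_insecure(["http://cdn.example.com/lib.js", "https://example.com/app.js"]))
```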

FAQs

What sources does this extract from?

href, src, action, data-src, data-href, poster attributes; CSS url() values; plain http/https URLs; protocol-relative // URLs; markdown [text](url) syntax; mailto: links.

How does it detect internal vs external links?

The first absolute domain found in your input is treated as the base domain. Relative URLs and URLs matching that domain are 'internal'; all others are 'external'.
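A minimal Python sketch of this classification rule (the function name is illustrative, not the tool's API):

```python
import re
from urllib.parse import urlsplit

def classify_urls(urls: list[str], source_text: str) -> dict[str, str]:
    # The first absolute http(s) URL found in the input supplies the base domain,
    # mirroring the rule described above.
    m = re.search(r'https?://([^/\s"\'<>]+)', source_text)
    base = m.group(1).lower() if m else None
    labels = {}
    for url in urls:
        host = urlsplit(url).netloc.lower()
        # Relative URLs (no host) and URLs on the base domain count as internal.
        labels[url] = "internal" if not host or host == base else "external"
    return labels

page = '<a href="https://example.com/about">About</a> <a href="https://other.com/x">Other</a>'
print(classify_urls(["https://example.com/about", "/contact", "https://other.com/x"], page))
```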

What is the CSV export format?

Four columns: url, category, domain, protocol (https/http/other). Useful for importing to spreadsheets or further automated processing.
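A sketch of producing that four-column format with Python's csv module; the domain and protocol columns are derived per URL as described above:

```python
import csv
import io
from urllib.parse import urlsplit

def to_csv(rows: list[tuple[str, str]]) -> str:
    """rows: (url, category) pairs. Emits url, category, domain, protocol
    columns, with protocol collapsed to https/http/other."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["url", "category", "domain", "protocol"])
    for url, category in rows:
        parts = urlsplit(url)
        protocol = parts.scheme if parts.scheme in ("http", "https") else "other"
        writer.writerow([url, category, parts.netloc, protocol])
    return buf.getvalue()

print(to_csv([("https://example.com/a.png", "image")]))
```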

Is there a size limit?

No — everything runs entirely in your browser. Inputs up to several MB process in under a second on modern devices.
