There are various reasons you might need to find all of the URLs on a website, but your exact goal will determine what you’re searching for. For example, you may want to:
Identify every indexed URL to analyze issues like cannibalization or index bloat
Collect current and historic URLs Google has seen, especially for site migrations
Gather all 404 URLs to recover from post-migration errors
In each scenario, a single tool won’t give you everything you need. Unfortunately, Google Search Console isn’t exhaustive, and a “site:example.com” search is limited and difficult to extract data from.
In this post, I’ll walk you through some tools to build your URL list before deduplicating the data using a spreadsheet or Jupyter Notebook, depending on your site’s size.
Old sitemaps and crawl exports
If you’re looking for URLs that disappeared from the live site recently, there’s a chance someone on your team may have saved a sitemap file or a crawl export before the changes were made. If you haven’t already, check for these files; they can often provide what you need. But if you’re reading this, you probably didn’t get so lucky.
Archive.org
Archive.org is a valuable tool for SEO tasks, funded by donations. If you search for a domain and select the “URLs” option, you can access up to 10,000 listed URLs.
However, there are a few limitations:
URL limit: You can only retrieve up to 10,000 URLs, which is insufficient for larger sites.
Quality: Many URLs may be malformed or reference resource files (e.g., images or scripts).
No export option: There isn’t a built-in way to export the list.
To bypass the lack of an export button, use a browser scraping plugin like Dataminer.io. However, these limitations mean Archive.org may not provide a complete solution for larger sites. Also, Archive.org doesn’t indicate whether Google indexed a URL, but if Archive.org found it, there’s a good chance Google did, too.
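If you’d rather skip the scraping plugin, the Wayback Machine’s CDX API can return the same URL list programmatically. Here’s a minimal sketch: the endpoint and query parameters follow the documented CDX API, while the resource-file extension filter is an illustrative assumption you should tune for your site.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_query_url(domain, limit=10000):
    """Build a CDX API query that lists captured URLs for a domain,
    one original URL per line, deduplicated by URL key."""
    params = {
        "url": f"{domain}/*",   # match every path under the domain
        "output": "text",
        "fl": "original",       # return only the captured URL itself
        "collapse": "urlkey",   # one row per unique URL
        "limit": limit,
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"

def clean_urls(lines):
    """Drop blanks and obvious resource files (extension list is an
    assumption; extend it as needed), then dedupe and sort."""
    skip = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico")
    return sorted({u.strip() for u in lines
                   if u.strip() and not u.strip().lower().endswith(skip)})
```

You could then fetch the list with `urllib.request.urlopen(cdx_query_url("example.com"))` and pass the response lines through `clean_urls`.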
Moz Pro
While you might typically use a link index to find external sites linking to you, these tools also discover URLs on your own site in the process.
How to use it:
Export your inbound links in Moz Pro to get a quick and easy list of target URLs from your site. If you’re dealing with a large website, consider using the Moz API to export data beyond what’s manageable in Excel or Google Sheets.
It’s important to note that Moz Pro doesn’t confirm whether URLs are indexed or discovered by Google. However, since most sites apply the same robots.txt rules to Moz’s bots as they do to Google’s, this method generally works well as a proxy for Googlebot’s discoverability.
Google Search Console
Google Search Console offers several valuable sources for building your list of URLs.
Links reports:
Similar to Moz Pro, the Links section provides exportable lists of target URLs. Unfortunately, these exports are capped at 1,000 URLs each. You can apply filters for specific pages, but since filters don’t apply to the export, you might need to rely on browser scraping tools, which are limited to 500 filtered URLs at a time. Not ideal.
Performance → Search Results:
This export gives you a list of pages receiving search impressions. While the export is limited, you can use the Google Search Console API for larger datasets. There are also free Google Sheets plugins that simplify pulling more extensive data.
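When you do reach for the API, the Search Analytics query endpoint pages through results with a `startRow` offset. Here’s a sketch of generating the paginated request bodies; authentication and the actual `service.searchanalytics().query()` call are omitted, and the date range and row cap are placeholders.

```python
def page_query_bodies(start_date, end_date, page_size=25000, max_rows=100000):
    """Yield request bodies for the Search Console searchanalytics.query
    endpoint, advancing startRow until max_rows is reached.

    Dates are ISO strings (YYYY-MM-DD); page_size of 25,000 is the
    documented per-request maximum for rowLimit."""
    for start_row in range(0, max_rows, page_size):
        yield {
            "startDate": start_date,
            "endDate": end_date,
            "dimensions": ["page"],   # one row per URL
            "rowLimit": page_size,
            "startRow": start_row,
        }
```

In practice you would stop paging as soon as a response returns fewer rows than `rowLimit`.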
Indexing → Pages report:
This section provides exports filtered by issue type, though these are also limited in scope.
Google Analytics
The Engagement → Pages and Screens default report in GA4 is an excellent source for collecting URLs, with a generous limit of 100,000 URLs.
Even better, you can apply filters to create different URL lists, effectively surpassing the 100k limit. For example, if you want to export only blog URLs, follow these steps:
Step 1: Add a segment to the report
Step 2: Click “Create a new segment.”
Step 3: Define the segment with a narrower URL pattern, such as URLs containing /blog/
Note: URLs found in Google Analytics might not be discoverable by Googlebot or indexed by Google, but they still offer valuable insights.
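If clicking through segments gets tedious, the same filtered pull can be scripted against the GA4 Data API’s runReport method. The sketch below only builds the JSON request body (field names follow the Data API v1beta REST shape); the property ID, dates, and metric choice are placeholder assumptions, and authentication is omitted entirely.

```python
def run_report_body(start_date, end_date, path_contains, limit=100000):
    """Build a GA4 Data API runReport request body that lists pagePath
    values containing a substring (e.g. "/blog/")."""
    return {
        "dimensions": [{"name": "pagePath"}],
        "metrics": [{"name": "screenPageViews"}],  # any metric works; we only want the dimension values
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        "dimensionFilter": {
            "filter": {
                "fieldName": "pagePath",
                "stringFilter": {"matchType": "CONTAINS",
                                 "value": path_contains},
            }
        },
        "limit": limit,
    }
```

You would POST this body to `properties/{property_id}:runReport` with an authorized client.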
Server log files
Server or CDN log files are perhaps the ultimate tool at your disposal. These logs capture an exhaustive list of every URL path requested by users, Googlebot, or other bots during the recorded period.
Considerations:
Data size: Log files can be huge, so many sites only retain the last two months of data.
Complexity: Analyzing log files can be challenging, but various tools are available to simplify the process.
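To give a sense of what that analysis looks like, here is a minimal sketch for pulling unique requested paths out of access-log lines. It assumes the common Combined Log Format; real CDN logs often use their own layouts, so treat the regex as a starting point.

```python
import re

# Combined Log Format:
# IP - - [time] "METHOD /path HTTP/x.y" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)

def paths_from_log(lines, status=None):
    """Extract the set of requested URL paths from access-log lines,
    optionally keeping only one HTTP status (e.g. 404 for the
    post-migration recovery use case)."""
    paths = set()
    for line in lines:
        m = LOG_RE.search(line)
        if m and (status is None or m.group("status") == str(status)):
            paths.add(m.group("path"))
    return sorted(paths)
```

Filtering by status 404 here gives you exactly the recovery list described at the start of this post.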
Combine, and good luck
Once you’ve gathered URLs from all of these sources, it’s time to combine them. If your site is small enough, use Excel; for larger datasets, use tools like Google Sheets or a Jupyter Notebook. Ensure all URLs are consistently formatted, then deduplicate the list.
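In a Jupyter Notebook, that formatting-and-deduplication step might look like the sketch below. The normalization rules shown (lowercasing scheme and host, dropping fragments, trimming trailing slashes) are assumptions; adjust them to your site’s URL conventions before trusting the counts.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url, default_scheme="https"):
    """Normalize a full URL so near-duplicates collapse to one form:
    lowercase scheme/host, drop fragments, strip a trailing slash
    from non-root paths. Assumes inputs include a hostname."""
    parts = urlsplit(url.strip())
    scheme = (parts.scheme or default_scheme).lower()
    host = parts.netloc.lower()
    path = parts.path or "/"
    if path != "/" and path.endswith("/"):
        path = path.rstrip("/")
    return urlunsplit((scheme, host, path, parts.query, ""))

def dedupe(urls):
    """Combine URL lists from every source into one sorted, unique list."""
    return sorted({normalize(u) for u in urls if u.strip()})
```

Feed it the concatenation of all your exports and you get one clean master list.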
And voilà: you now have a comprehensive list of current, old, and archived URLs. Good luck!