Get URL Data

Using the correct query

There are two ways to retrieve raw data for URLs (and for links, sitemaps, and other datasources) from Deepcrawl:

  • The query described on this page retrieves selected metrics for URLs in a crawl. It can be filtered and sorted, but it requires you to paginate through URLs 100 at a time. That makes it well suited to sampling the available data, but poorly suited to retrieving all data for a crawl.
  • The Download Raw Data query downloads all data from a datasource in a single request; however, the results cannot be filtered or sorted. This is the most efficient way to access all data.
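The first approach means looping over pages of at most 100 URLs. A minimal Python sketch of such a loop follows. It assumes the URL connection follows the Relay cursor convention (a `pageInfo` object with `hasNextPage` and `endCursor`, and an `after` argument for the next request); those field names, and the `fetch_page` callable, are assumptions for illustration rather than confirmed parts of the Deepcrawl API.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Assumed page shape (Relay connection convention, not confirmed for this API):
#   {"nodes": [...], "pageInfo": {"hasNextPage": bool, "endCursor": str or None}}
Page = Dict[str, Any]


def iter_all_nodes(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Any]:
    """Yield every node of a cursor-paginated connection, one page at a time.

    `fetch_page` is a hypothetical, caller-supplied function that requests a
    single page (e.g. 100 URLs) starting after the given cursor; the first
    call receives None.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["nodes"]      # emit this page's URLs
        info = page["pageInfo"]
        if not info["hasNextPage"]:   # last page reached
            return
        cursor = info["endCursor"]    # resume after the last node seen
```

Keeping the HTTP call behind `fetch_page` lets the same loop work with any client library, and makes the paging logic easy to test with canned responses.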

Using the getCrawl query to access Crawl URL data

The sample query below returns five properties (fetchTime, pageTitle, responsive, url, wordCount) for each crawled URL, but hundreds are available. For the comprehensive list, inspect the CrawlUrl type.
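GraphQL queries such as the sample below are typically sent as an HTTP POST whose JSON body carries the query text and any variables. The Python sketch below builds such a request using only the standard library, without sending it. The endpoint is left as a parameter, and the `X-Auth-Token` header name is an assumption to check against the current API reference.

```python
import json
import urllib.request
from typing import Any, Dict


def build_graphql_request(
    endpoint: str, query: str, variables: Dict[str, Any], token: str
) -> urllib.request.Request:
    """Build (but do not send) a GraphQL POST request.

    The {"query": ..., "variables": ...} body is the common GraphQL-over-HTTP
    convention. The auth header name below is an assumption, not confirmed.
    """
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # assumed header name
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it; in a standard GraphQL response, the requested fields appear under the top-level `data` key.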

query {
getCrawl(id: 1612640) {
first: 1
filter: {
datasourceCode: { eq: "crawl_urls" }
reportTypeCode: { eq: "basic" }
reportTemplateCode: { eq: "all_pages" }
segmentId: { isNull: true }
}
orderBy: [{ field: reportTemplateCode, direction: ASC }]
) {
nodes {
crawlUrls(first: 3) {
nodes {
