Get URL Data

Using the correct query

There are two ways to retrieve raw URL data (or link, sitemap, and other datasource data) from Lumar:

  • The query described on this page retrieves defined metrics for URLs in the crawl. It can be filtered and sorted, but it requires you to paginate through URLs 100 at a time. This is ideal for getting a sample of the available data, but it is not well suited to getting all data for a crawl.
  • The Download Raw Data query lets you download all data from a datasource in a single request, but the result cannot be filtered or sorted. This is the most efficient way to access all data.

Using the getCrawl query to access Crawl URL data

The sample query below returns five properties (fetchTime, pageTitle, responsive, url, wordCount) for each crawled URL. Hundreds more are available; for the full list, inspect the CrawlUrl type.

query {
  getCrawl(id: 1612640) {
    reports(
      first: 1
      filter: {
        datasourceCode: { eq: "crawl_urls" }
        reportTypeCode: { eq: "basic" }
        reportTemplateCode: { eq: "all_pages" }
        segmentId: { isNull: true }
      }
      orderBy: [{ field: reportTemplateCode, direction: ASC }]
    ) {
      nodes {
        crawlUrls(first: 3) {
          nodes {
            fetchTime
            pageTitle
            responsive
            url
            wordCount
          }
          totalCount
        }
      }
    }
  }
}
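
Because this query returns at most 100 URLs per request, reading more than a sample means requesting one page after another. The Python sketch below shows one way such a loop might look. It is an illustration under stated assumptions, not Lumar's reference client: it assumes that crawlUrls accepts a Relay-style `after` cursor and exposes `pageInfo { hasNextPage endCursor }`, neither of which is shown on this page. The helpers `build_query` and `iter_crawl_urls`, and the `run_query` callable (which should send the query text to the API with your credentials and return the `data` part of the response), are hypothetical names introduced here.

```python
import json
from typing import Callable, Iterator, Optional


def build_query(crawl_id: int, page_size: int = 100,
                after: Optional[str] = None) -> str:
    """Return the getCrawl query text for one page of crawl URLs."""
    # ASSUMPTION: crawlUrls takes a Relay-style `after` cursor argument.
    after_arg = f", after: {json.dumps(after)}" if after else ""
    return f"""
query {{
  getCrawl(id: {crawl_id}) {{
    reports(
      first: 1
      filter: {{
        datasourceCode: {{ eq: "crawl_urls" }}
        reportTypeCode: {{ eq: "basic" }}
        reportTemplateCode: {{ eq: "all_pages" }}
        segmentId: {{ isNull: true }}
      }}
    ) {{
      nodes {{
        crawlUrls(first: {page_size}{after_arg}) {{
          nodes {{ url pageTitle wordCount }}
          pageInfo {{ hasNextPage endCursor }}
          totalCount
        }}
      }}
    }}
  }}
}}
"""


def iter_crawl_urls(run_query: Callable[[str], dict], crawl_id: int,
                    page_size: int = 100) -> Iterator[dict]:
    """Yield crawl URL nodes page by page until no further page is reported."""
    after = None
    while True:
        data = run_query(build_query(crawl_id, page_size, after))
        reports = data["getCrawl"]["reports"]["nodes"]
        if not reports:
            return  # no matching report, so nothing to page through
        connection = reports[0]["crawlUrls"]
        yield from connection["nodes"]
        # ASSUMPTION: pageInfo follows the Relay connection shape.
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            return
        after = page_info["endCursor"]
```

Passing the transport in as `run_query` keeps authentication and HTTP details out of the paging logic, so the loop can be exercised with a stub before it is pointed at the real API. For a complete export of a crawl, the Download Raw Data query remains the more efficient route.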