Overview

Complete overview of MarketCheck's automotive inventory data collection across US, Canada, and UK markets, including our crawling methodology, quality assurance processes, and data access options.

MarketCheck provides comprehensive automotive inventory data through systematic collection from dealer websites, auction sites, and private party sellers across multiple markets. This guide explains our data collection methodology, coverage, and quality assurance processes.

What We Collect

MarketCheck collects vehicle listings and inventory data from dealer websites, auction sites, and for-sale-by-owner (FSBO) platforms.

Geographies

Market            Sources
United States     Dealer websites, auction sites, private party sellers
Canada            Dealer websites, private party sellers
United Kingdom    Dealer websites, private party sellers

To ensure data quality and accuracy, MarketCheck collects from original sources only and does not crawl aggregator or classifieds websites.

Timeline

MarketCheck has been gathering car inventory data since 2015, establishing one of the most complete automotive datasets in the industry.

Platform Evolution

2015 ──── US Dealer Websites
  │
2017 ──── US Private Party Sellers
  │
2018 ──── Canadian Dealer Websites
  │
2018 ──── US Auction Sites
  │
2022 ──── UK Dealer Websites
  │
2023 ──── Canadian Private Party Sellers
  │
2024 ──── UK Private Party Sellers

Scale

Market            Websites Crawled Daily    Daily Listings Volume    Historical Dataset Size
United States     80,000+                   ~15 million              ~5 billion listings*
Canada            8,200+                    ~1 million               (combined with US)*
United Kingdom    10,000+                   ~600,000                 ~65 million

*US & Canada combined dataset (predominantly US)

Data Points: Each listing contains approximately 110 data points, ensuring detailed vehicle information across all markets.

How We Collect

MarketCheck uses Autobot, our proprietary crawling platform developed and refined over 10 years of continuous operation.

Discovery

MarketCheck indexes and classifies websites across the internet. Websites found to contain car inventory data are added to the crawling platform and crawled on a regular schedule.

Methodology

Autobot employs a systematic approach with 24/7 crawling operations monitored by a dedicated operations team for uptime.

Website Type                       Crawling Frequency
Dealer websites (all countries)    Daily
Auction and private party sites    Every 48 hours
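
The crawl frequencies above can be expressed as a simple "is this site due?" check. The following is a minimal sketch, not Autobot's actual scheduler: the site-type keys, the `CRAWL_INTERVALS` mapping, and the `is_due` function are hypothetical names, and "Daily" is assumed to mean a 24-hour interval.

```python
from datetime import datetime, timedelta

# Intervals taken from the frequency table; the keys are illustrative labels.
CRAWL_INTERVALS = {
    "dealer": timedelta(hours=24),         # "Daily" (assumed to mean every 24 hours)
    "auction": timedelta(hours=48),        # "Every 48 hours"
    "private_party": timedelta(hours=48),  # "Every 48 hours"
}


def is_due(site_type: str, last_crawled: datetime, now: datetime) -> bool:
    """Return True when the site's crawl interval has elapsed since the last crawl."""
    return now - last_crawled >= CRAWL_INTERVALS[site_type]


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 6, 0)
    now = datetime(2024, 1, 2, 6, 0)
    print(is_due("dealer", last, now))   # 24 hours elapsed
    print(is_due("auction", last, now))  # 48-hour interval not yet reached
```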

Crawling Focus: MarketCheck crawls only the inventory pages of each site and skips all other pages; it does not crawl the full website.

Process

Phase 1: Search Result Pages (SRPs)

  • Autobot starts from seed target pages (search result pages on websites)
  • Extract all available data from SRPs: VIN, price, mileage, year/make/model, headlines, images
  • Most importantly: gather Vehicle Detail Page (VDP) links
  • All SRPs are crawled daily
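
As an illustration of the VDP-link step in Phase 1, the sketch below collects detail-page links from a search result page. It uses Python's standard-library `HTMLParser` for brevity rather than the XPath rules the platform relies on, and the `vdp-link` CSS class, the sample markup, and the `dealer.example` URL are all made-up assumptions; real sites need per-site rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class VdpLinkCollector(HTMLParser):
    """Collects absolute URLs of anchors carrying a given CSS class (hypothetical rule)."""

    def __init__(self, base_url: str, link_class: str = "vdp-link"):
        super().__init__()
        self.base_url = base_url
        self.link_class = link_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        href = attributes.get("href")
        if href and self.link_class in classes:
            url = urljoin(self.base_url, href)  # resolve relative links against the SRP URL
            if url not in self.links:           # the same vehicle may be linked more than once
                self.links.append(url)


def collect_vdp_links(srp_html: str, base_url: str) -> list:
    collector = VdpLinkCollector(base_url)
    collector.feed(srp_html)
    return collector.links


# Made-up SRP fragment for demonstration only.
SAMPLE_SRP = """
<ul>
  <li><a class="vdp-link" href="/vehicle/101">2019 Honda Civic</a></li>
  <li><a class="vdp-link featured" href="/vehicle/102">2021 Ford F-150</a></li>
  <li><a href="/about-us">About us</a></li>
  <li><a class="vdp-link" href="/vehicle/101">2019 Honda Civic (photo)</a></li>
</ul>
"""

if __name__ == "__main__":
    for link in collect_vdp_links(SAMPLE_SRP, "https://dealer.example/inventory?page=1"):
        print(link)
```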

Phase 2: Vehicle Detail Pages (VDPs)

VDP crawling follows specific logic based on listing status:

Listing Status                          VDP Crawl Decision
New listing (first time seen)           VDP crawled same day
Existing listing (no changes in SRP)    VDP skipped, unless 14+ days since last VDP crawl
Existing listing (changes detected)     VDP crawled same day (price, mileage, or other attribute changes)
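
The decision table above maps naturally onto a small function. This is a minimal sketch under stated assumptions, not the platform's code: `ListingState`, `should_crawl_vdp`, and the idea of comparing whole SRP snapshots as dictionaries are hypothetical; only the three rules and the 14-day threshold come from the table.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

VDP_REFRESH_DAYS = 14  # unchanged listings are re-crawled after 14+ days


@dataclass
class ListingState:
    """What is already known about a listing (hypothetical structure)."""
    srp_snapshot: dict    # attributes last seen on the SRP (price, mileage, ...)
    last_vdp_crawl: date  # date the VDP was last fetched


def should_crawl_vdp(previous: Optional[ListingState], srp_data: dict, today: date) -> bool:
    # Rule 1: new listing (first time seen) -> crawl the VDP the same day.
    if previous is None:
        return True
    # Rule 3: any SRP attribute changed (price, mileage, ...) -> crawl the same day.
    if srp_data != previous.srp_snapshot:
        return True
    # Rule 2: nothing changed -> skip, unless the last VDP crawl is 14+ days old.
    return (today - previous.last_vdp_crawl).days >= VDP_REFRESH_DAYS


if __name__ == "__main__":
    known = ListingState({"price": 20000, "mileage": 41250}, date(2024, 1, 1))
    print(should_crawl_vdp(None, {"price": 20000}, date(2024, 1, 10)))
    print(should_crawl_vdp(known, {"price": 19500, "mileage": 41250}, date(2024, 1, 10)))
    print(should_crawl_vdp(known, {"price": 20000, "mileage": 41250}, date(2024, 1, 10)))
```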

Extraction

MarketCheck uses rules-based extraction, built on XPath expressions, regular expressions (regex), and JSON extraction, rather than automated natural-language extraction, to achieve the highest accuracy.
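
To make the three rule types concrete, here is a small sketch that applies an XPath lookup, a regex, and embedded-JSON parsing to one page. Everything about the page is invented for illustration (markup, class names, VIN, values), and the sample is deliberately well-formed so the standard library's `xml.etree.ElementTree` and its limited XPath subset can parse it; real dealer HTML would need a tolerant HTML parser and site-specific rules.

```python
import json
import re
import xml.etree.ElementTree as ET

# Made-up, well-formed sample page; real pages differ per site.
SAMPLE_PAGE = """<html><body>
<h1 class="title">2019 Honda Civic LX</h1>
<span class="price">$18,495</span>
<p class="specs">VIN: 2HGFC2F69KH500001 | 41,250 miles</p>
<script type="application/ld+json">{"@type": "Car", "color": "Blue"}</script>
</body></html>"""

VIN_RE = re.compile(r"\b[A-HJ-NPR-Z0-9]{17}\b")  # 17 characters, excluding I, O and Q
MILES_RE = re.compile(r"([\d,]+)\s*miles")


def extract_listing(page: str) -> dict:
    root = ET.fromstring(page)

    # XPath rules: locate elements by tag and attribute.
    title = root.findtext(".//h1[@class='title']")
    price_text = root.findtext(".//span[@class='price']") or ""
    specs = root.findtext(".//p[@class='specs']") or ""
    ld_json = root.findtext(".//script[@type='application/ld+json']")

    # Regex rules: pull structured values out of free text.
    vin_match = VIN_RE.search(specs)
    miles_match = MILES_RE.search(specs)
    price_digits = re.sub(r"[^\d]", "", price_text)

    # JSON rule: read attributes from embedded structured data.
    structured = json.loads(ld_json) if ld_json else {}

    return {
        "title": title,
        "price": int(price_digits) if price_digits else None,
        "vin": vin_match.group(0) if vin_match else None,
        "mileage": int(miles_match.group(1).replace(",", "")) if miles_match else None,
        "color": structured.get("color"),
    }


if __name__ == "__main__":
    print(extract_listing(SAMPLE_PAGE))
```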

Processing Pipeline:

  1. Combine SRP and VDP data into unified car listing
  2. Send to downstream applications for cleaning and enrichment
  3. Extensive parsing phase with data validation using external/internal references
  4. Add computed data points and enrich listings
  5. Persist final processed data to database
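
The first four pipeline steps can be sketched as a chain of small functions. This is only an illustration under assumptions: the function names, the rule that VDP values override SRP values, the two validation checks, and the `days_on_market` computed field are all hypothetical, and persistence (step 5) is left out.

```python
import re
from datetime import date

VIN_PATTERN = re.compile(r"^[A-HJ-NPR-Z0-9]{17}$")  # 17 characters, excluding I, O and Q


def combine(srp: dict, vdp: dict) -> dict:
    """Step 1: merge SRP and VDP fields; non-empty VDP values win (an assumption)."""
    merged = dict(srp)
    merged.update({key: value for key, value in vdp.items() if value is not None})
    return merged


def validate(listing: dict) -> list:
    """Step 3 (simplified): return the names of fields that fail basic checks."""
    problems = []
    if not VIN_PATTERN.match(listing.get("vin") or ""):
        problems.append("vin")
    price = listing.get("price")
    if not isinstance(price, int) or price <= 0:
        problems.append("price")
    return problems


def enrich(listing: dict, today: date) -> dict:
    """Step 4 (simplified): add one computed data point."""
    enriched = dict(listing)
    enriched["days_on_market"] = (today - listing["first_seen"]).days
    return enriched


def process(srp: dict, vdp: dict, today: date) -> dict:
    listing = combine(srp, vdp)
    problems = validate(listing)
    if problems:
        raise ValueError(f"invalid fields: {problems}")
    return enrich(listing, today)  # step 5 would persist this result


if __name__ == "__main__":
    srp = {"vin": "2HGFC2F69KH500001", "price": 18495, "first_seen": date(2024, 1, 1)}
    vdp = {"price": None, "color": "Blue"}
    print(process(srp, vdp, date(2024, 1, 11)))
```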

Quality Assurance

Website Coverage

MarketCheck employs continuous operational monitoring of its crawling platform to ensure complete inventory coverage. During both search result page crawls and vehicle detail page crawls, the system ensures access to all webpages through necessary means so that no inventory listings are lost.

Monitoring and Alerts:

When connectivity issues are detected, whether a website is completely or only partially inaccessible, alerts are raised immediately and sent to the operations team, which monitors crawls 24/7.

Issue Resolution:

  • Standard fixes: Within 24 hours
  • Complex issues: Up to 72 hours for difficult problems
  • Temporary outages: Continuous monitoring and probing until resolution

If a website is temporarily down or its status is uncertain, MarketCheck keeps monitoring and probing it until it is reachable again.

Data Quality

After crawling and extraction phases are completed, the parsing process ensures data quality and consistency. The operations team conducts multiple daily reviews of crawled pages and extracted data from previous windows, verifying that coverage and quality of critical data points remain consistent and meet standards.

Quality Assurance Process:

  • Multiple daily reviews of crawled pages and extracted data
  • Coverage consistency checks for critical data points
  • Quality threshold monitoring and validation
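
A coverage consistency check of the kind listed above could compare the fill rate of each critical data point against a recent baseline. The sketch below is an assumption-laden illustration: the choice of critical fields, the 5-percentage-point tolerance, and the function names are hypothetical and are not taken from MarketCheck's tooling.

```python
CRITICAL_FIELDS = ("vin", "price", "mileage")  # illustrative choice of critical data points


def fill_rates(listings: list, fields=CRITICAL_FIELDS) -> dict:
    """Share of listings in which each field is present and non-empty."""
    total = len(listings)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for item in listings if item.get(field) not in (None, "")) / total
        for field in fields
    }


def coverage_alerts(current: dict, baseline: dict, max_drop: float = 0.05) -> list:
    """Fields whose fill rate fell more than `max_drop` below the baseline."""
    return sorted(
        field for field, rate in current.items()
        if baseline.get(field, 0.0) - rate > max_drop
    )


if __name__ == "__main__":
    todays_listings = [
        {"vin": "2HGFC2F69KH500001", "price": 18495, "mileage": 41250},
        {"vin": "2HGFC2F69KH500002", "price": 21000, "mileage": None},
        {"vin": "2HGFC2F69KH500003", "price": None, "mileage": 15800},
        {"vin": "2HGFC2F69KH500004", "price": 9900, "mileage": ""},
    ]
    baseline = {"vin": 1.0, "price": 0.78, "mileage": 0.9}
    print(coverage_alerts(fill_rates(todays_listings), baseline))
```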

Response Timeline:

When quality issues are identified, alerts are raised and the operations team treats them as a priority, reviewing and resolving them within 24-48 hours.

This consistent operation has maintained high-accuracy automotive data collection for 10 years, continuously strengthening our extensive dataset.

How to Access

Access Method            Description
Daily Data Feed Dumps    Complete batch data delivery
API Access               Real-time programmatic data access

For detailed information about each access method, visit their respective documentation pages.


This data gathering operation represents 10 years of consistent, high-accuracy automotive data collection, providing customers with detailed vehicle inventory intelligence.