Overview

This document provides a comprehensive overview of MarketCheck’s Recreational Vehicle (RV) inventory data collection across U.S. markets. It outlines our web crawling methodology, quality assurance processes to ensure data accuracy and consistency, and the various options available for accessing the data, including APIs and bulk feeds.

What We Collect

MarketCheck collects comprehensive vehicle listings and inventory data from dealer websites. This includes key information such as make, model, year, price, mileage, VIN, vehicle features, and dealer contact details. We capture data for both new and used vehicles, ensuring broad and accurate coverage of the available market inventory.

Geographies

Market | Sources
United States | Dealer websites

Timeline

MarketCheck began collecting recreational vehicle (RV) inventory data in 2020. Since then, we’ve built one of the most comprehensive and continuously updated RV datasets in the industry.

Scale

Market | Websites Crawled Daily | Daily Listings Volume | Historical Dataset Size
United States | 1,700+ | ~250K | ~15 million listings*

Data Points: Each listing contains approximately 40 data points, providing detailed vehicle information across U.S. markets.

How We Collect

MarketCheck uses Autobot, our proprietary crawling platform developed and refined over 5 years of continuous operation.

Discovery

MarketCheck indexes and classifies websites across the internet. Websites found to contain vehicle inventory data are added to the crawling platform and crawled regularly.

Methodology

Autobot crawls systematically around the clock (24/7), and a dedicated operations team monitors its uptime.

Website Type | Crawling Frequency
Dealer websites | Daily

Crawling Focus: MarketCheck crawls only the inventory pages of each site, not the full website. All other pages are skipped.

Process

Phase 1: Search Result Pages (SRPs)

  • Autobot starts from seed target pages (search result pages on websites)
  • Extract all available data from SRPs: VIN, price, mileage, year/make/model, headlines, images
  • Most importantly: gather Vehicle Detail Page (VDP) links
  • All SRPs are crawled daily
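
As an illustration of the link-gathering step in Phase 1, the sketch below collects VDP links from one SRP using only the Python standard library. It is not Autobot's implementation: the `VdpLinkCollector` class, the `/inventory/` path marker, and the `dealer.example` URLs are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class VdpLinkCollector(HTMLParser):
    """Collects absolute VDP links found in one SRP's HTML."""

    def __init__(self, base_url, vdp_marker="/inventory/"):
        super().__init__()
        self.base_url = base_url
        # Hypothetical path fragment that identifies a VDP link on this site.
        self.vdp_marker = vdp_marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.vdp_marker in href:
            url = urljoin(self.base_url, href)  # resolve relative links
            if url not in self.links:  # drop duplicates, keep page order
                self.links.append(url)


def collect_vdp_links(srp_html, base_url):
    parser = VdpLinkCollector(base_url)
    parser.feed(srp_html)
    return parser.links


# Made-up SRP fragment: two listings, one non-inventory link, one duplicate.
sample_srp = """
<ul>
  <li><a href="/inventory/2021-airstream-bambi-123">2021 Airstream Bambi</a></li>
  <li><a href="/inventory/2019-winnebago-view-456">2019 Winnebago View</a></li>
  <li><a href="/about-us">About</a></li>
  <li><a href="/inventory/2021-airstream-bambi-123">Photo link</a></li>
</ul>
"""
links = collect_vdp_links(sample_srp, "https://dealer.example/rvs?page=1")
```

In practice the marker that identifies a VDP link would differ from site to site, so a real crawler would likely keep one such rule per website.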

Phase 2: Vehicle Detail Pages (VDPs)

VDP crawling follows specific logic based on listing status:

Listing Status | VDP Crawl Decision
New listing (first time seen) | VDP crawled same day
Existing listing (no changes in SRP) | VDP skipped, unless 14+ days since last VDP crawl
Existing listing (changes detected) | VDP crawled same day (price, mileage, or other attribute changes)
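
The decision rules above can be expressed as a small function. This is a minimal sketch rather than Autobot's code: the function name, its parameters, and the treatment of a listing with no recorded VDP crawl are assumptions.

```python
from datetime import date, timedelta

# "14+ days since last VDP crawl" from the rules above.
VDP_REFRESH_AFTER = timedelta(days=14)


def should_crawl_vdp(is_new, srp_changed, last_vdp_crawl, today):
    """Return True if the listing's VDP should be crawled today."""
    if is_new:
        return True  # first time seen: crawl the same day
    if srp_changed:
        return True  # price, mileage, or another attribute changed on the SRP
    if last_vdp_crawl is None:
        return True  # assumption: no recorded VDP crawl is treated as stale
    # Unchanged listing: refresh only once the last VDP crawl is 14+ days old.
    return today - last_vdp_crawl >= VDP_REFRESH_AFTER


today = date(2024, 6, 15)
unchanged_recent = should_crawl_vdp(False, False, date(2024, 6, 2), today)  # 13 days
unchanged_stale = should_crawl_vdp(False, False, date(2024, 6, 1), today)   # 14 days
```

Skipping unchanged listings keeps daily VDP traffic proportional to what actually changed, while the 14-day refresh bounds how stale a detail page can become.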

Extraction

MarketCheck uses rules-based extraction: XPath expressions, regular expressions (regex), and JSON extraction. We chose these rules over automated natural-language extraction to achieve the highest accuracy.
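
To show how the three rule types can work together, here is a hedged sketch that applies each one to a made-up, well-formed page fragment. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and requires well-formed markup; a production crawler would likely need a full XPath engine and a tolerant HTML parser. The class names, field names, and VIN in the sample are invented for illustration.

```python
import json
import re
import xml.etree.ElementTree as ET

# 17 characters, excluding I, O, and Q, as in standard VINs.
VIN_RE = re.compile(r"\b[A-HJ-NPR-Z0-9]{17}\b")
PRICE_RE = re.compile(r"\$\s*([\d,]+)")

# Made-up VDP fragment; the VIN is not a real vehicle.
sample_vdp = """
<html><body>
  <h1 class="title">2021 Airstream Bambi 16RB</h1>
  <span class="price">Sale price: $48,900</span>
  <p class="specs">VIN 1STJBYB19MJ550123, sleeps 4</p>
  <script type="application/ld+json">{"@type": "Vehicle", "mileageFromOdometer": {"value": 1200}}</script>
</body></html>
"""


def extract_listing(vdp_markup):
    root = ET.fromstring(vdp_markup)
    record = {}

    # Rule 1: XPath locates the node that holds each field.
    title = root.find(".//h1[@class='title']")
    if title is not None and title.text:
        record["heading"] = title.text.strip()

    # Rule 2: regex pulls the value out of the node's free text.
    price_node = root.find(".//span[@class='price']")
    if price_node is not None and price_node.text:
        match = PRICE_RE.search(price_node.text)
        if match:
            record["price"] = int(match.group(1).replace(",", ""))

    specs = root.find(".//p[@class='specs']")
    if specs is not None and specs.text:
        match = VIN_RE.search(specs.text)
        if match:
            record["vin"] = match.group(0)

    # Rule 3: JSON extraction reads structured data embedded in the page.
    script = root.find(".//script[@type='application/ld+json']")
    if script is not None and script.text:
        data = json.loads(script.text)
        miles = data.get("mileageFromOdometer", {}).get("value")
        if miles is not None:
            record["miles"] = miles

    return record


record = extract_listing(sample_vdp)
```

Each rule either produces a value or leaves the field out, so a layout change on a site shows up as a missing field rather than a wrong one.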

Processing Pipeline:

  1. Combine SRP and VDP data into unified vehicle listing
  2. Send to downstream applications for cleaning and enrichment
  3. Extensive parsing phase with data validation using external/internal references
  4. Persist final processed data to database
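
The pipeline steps can be sketched as follows. This is an illustration under stated assumptions, not the production pipeline: VDP values are assumed to take precedence over SRP values, the two validation rules are hypothetical, cleaning and enrichment are omitted, and a plain dictionary stands in for the database.

```python
def merge_listing(srp, vdp):
    """Step 1: combine SRP and VDP data into one listing.

    Assumption: a non-empty VDP value overrides the SRP value.
    """
    merged = dict(srp)
    merged.update({key: value for key, value in vdp.items() if value is not None})
    return merged


def validate(listing, max_year):
    """Step 3 (simplified): return a list of problems; empty means valid."""
    problems = []
    year = listing.get("year")
    if not isinstance(year, int) or not 1900 <= year <= max_year:
        problems.append("year out of range")
    price = listing.get("price")
    if price is not None and price <= 0:
        problems.append("non-positive price")
    return problems


def process(srp, vdp, store, max_year):
    """Merge, validate, and persist a listing keyed by VIN."""
    listing = merge_listing(srp, vdp)
    problems = validate(listing, max_year)
    if not problems:
        store[listing["vin"]] = listing  # Step 4: dict stands in for a database
    return listing, problems


store = {}
srp = {"vin": "1STJBYB19MJ550123", "year": 2021, "price": 49900, "miles": None}
vdp = {"price": 48900, "miles": 1200, "heading": None}
listing, problems = process(srp, vdp, store, max_year=2026)
```

Returning the problems instead of raising lets a caller route rejected listings to review without stopping the batch.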

Quality Assurance

Website Coverage

The MarketCheck crawling platform monitors access to all webpages during both search result page (SRP) crawls and vehicle detail page (VDP) crawls.

Issue Resolution:

  • Standard fixes: Within 24 hours
  • Complex issues: Up to 72 hours

Data Quality

After crawling and extraction phases are completed, the parsing process ensures data quality and consistency.

Response Timeline:

When quality issues are identified, alerts are raised. The operations team reviews them and prioritizes resolution within 24 to 48 hours.

How to Access

Access Method | Description
Daily Data Feed Dumps | Complete batch data delivery
API Access | Real-time programmatic data access

For detailed information about each access method, see the corresponding documentation page.


This data gathering operation represents 5 years of consistent, high-accuracy recreational vehicle data collection, providing customers with detailed vehicle inventory intelligence.