Key Concepts

Attribution logic for duplicate listings, sales inference, and inventory quality concepts for MarketCheck's car inventory data

Listing

A listing represents a vehicle record available for sale by a dealer on their website. Key characteristics include:

  • Unique Identification: Each listing has a unique identifier (id) composed of the vehicle's VIN and a UUID (Universally Unique Identifier), which remains constant throughout the listing's lifetime
  • Listing Evolution: A single vehicle on a dealer's website may generate multiple listings over time as the dealer updates information such as pricing, status, or specifications. For details on how listings change, see Listing Lifecycle
  • Multi-Dealer Scenarios: The same vehicle (identified by VIN) may appear as separate listings across different dealer websites when multiple dealers list the same vehicle. Our attribution system handles these scenarios as described in Attribution (Searchable Listings)

Attribution (Searchable Listings)

Vehicle attribution is the process of identifying which dealer actually has physical possession of a specific vehicle when the same vehicle (identified by VIN) appears across multiple dealer websites or domains. The system analyzes various data points to identify the "true owner" and marks only that dealer's listing as "searchable".

What is Attribution?

When a car with VIN "ABC123" appears on 5 different dealer websites, attribution determines which of those 5 dealers actually has the physical car on their lot. Only that dealer's listing gets marked as is_searchable=1, while others remain is_searchable=0.

Why Attribution is Required

For buyers & market clarity:

  • Helps users find the dealer who actually has the car available for purchase
  • Eliminates confusion about car location and availability
  • Ensures buyers contact the right dealer

For data analytics & downstream processing:

  • Simplifies analysis and computations by avoiding duplicate counting
  • Prevents the same vehicle from being aggregated multiple times in market statistics
  • Provides clean, deduplicated data for inventory analysis

For sales attribution & dealer performance:

  • Enables accurate tracking of which dealer sold a specific car
  • Allows proper measurement of dealer performance and sales metrics
  • Ensures fair credit is given to the dealer that actually sold the vehicle

For business intelligence:

  • Provides reliable data for market share analysis
  • Enables accurate pricing and inventory trend analysis
  • Supports competitive intelligence without data distortion

How Attribution Works

The attribution system uses an 8-level hierarchical process to determine which dealer website should mark a car as "searchable" (visible to buyers).

Data Collection

When the same car (VIN) appears on multiple websites, we extract vehicle details from each listing's Vehicle Detail Page (VDP).

For example, if a car appears on 8 different dealer websites, we collect information like location, seller name, and address from all 8 listings. Often, 4-5 of these listings will indicate the car is "available for transfer from" or "available for sale at" a specific location/dealer, while others may show different details.

Attribution Decision

The system uses this collected data from all listings to assess ownership. It then applies matching criteria across 8 levels to determine which dealer listing should be marked as searchable.

Below is a breakdown of each attribution level, along with the logic and fields involved:

| Level | Name | Matching Fields | Logic Summary |
|-------|------|-----------------|---------------|
| 1 | Single Listing | N/A | Only one listing exists |
| 2 | ZIP Code Matching | zip, car_zip | ZIP codes match |
| 3 | Seller Name & City Matching | seller_name, car_city, car_seller_name | Seller name and car city match |
| 4 | Seller Name Matching | seller_name, car_seller_name | Seller names match exactly |
| 5 | Address & Seller Matching | seller_address, car_address, unformatted_address | Address components match |
| 6 | Domain & Seller Correlation | domain, car_seller_name | Domain correlates with seller name |
| 7 | Brand Matching | make, car_seller_name | Dealer specializes in the car's brand |
| 8 | Latest Published Date Comparison | scraped_at_date | Earliest legitimate listing date |
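To make the hierarchy concrete, the following is a minimal sketch of how a level cascade of this kind could be evaluated. It is not MarketCheck's implementation: it covers only levels 1, 2, and 4, it assumes each listing is a dict carrying the field names from the table, and it assumes a level "matches" only when exactly one listing satisfies it. The function name attribute and the LEVELS list are hypothetical.

```python
from typing import Optional

# (level number, predicate) pairs, checked in order. Only levels 2 and 4 are
# sketched; the exact matching rules are assumptions based on the table.
LEVELS = [
    # Level 2: the dealer's ZIP equals the ZIP where the VDP says the car is.
    (2, lambda l: bool(l.get("zip")) and l.get("zip") == l.get("car_zip")),
    # Level 4: the dealer's name equals the seller name shown for the car.
    (4, lambda l: bool(l.get("seller_name"))
        and l.get("seller_name") == l.get("car_seller_name")),
]


def attribute(listings: list[dict]) -> Optional[tuple[dict, int]]:
    """Return (searchable listing, matched level), or None if undecided."""
    if not listings:
        return None
    # Level 1: only one listing exists for the VIN.
    if len(listings) == 1:
        return listings[0], 1
    for level, matches in LEVELS:
        candidates = [l for l in listings if matches(l)]
        # Assumption: a level decides only when it singles out one listing.
        if len(candidates) == 1:
            return candidates[0], level
    return None  # the remaining levels are not sketched here


listings = [
    {"domain": "dealer-a.example", "zip": "10001", "car_zip": "90210"},
    {"domain": "dealer-b.example", "zip": "90210", "car_zip": "90210"},
]
winner, level = attribute(listings)
print(winner["domain"], level)
```

In this sketch the second listing wins at level 2 because its dealer ZIP equals the car's ZIP, while the first dealer advertises a car located elsewhere.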

level_ss Indicator

The level_ss field indicates which attribution level successfully matched the car to a dealer, helping you understand the reliability and method of attribution. Lower level numbers indicate higher confidence matching.

Accessing Attribution Data

In data feeds:

  • Attribution information is available as the is_searchable column
  • is_searchable=true: Car attributed to this dealer (appears in search results)
  • is_searchable=false: Car not attributed to this dealer (hidden from search)
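As an illustration, a feed can be reduced to one row per attributed vehicle by filtering on is_searchable. The sketch below is an assumption-laden example, not a documented feed layout: the vin and dealer_id columns and the sample rows are hypothetical, and the filter accepts both true and 1 because both notations appear in this document.

```python
import csv
import io

# Hypothetical three-row feed; only the is_searchable column name comes
# from this document.
SAMPLE_FEED = """vin,dealer_id,is_searchable
VIN123,1,true
VIN123,2,false
VIN456,3,true
"""

TRUTHY = {"true", "1"}


def searchable_rows(feed_text: str) -> list[dict]:
    """Keep only the rows attributed to the listing dealer."""
    reader = csv.DictReader(io.StringIO(feed_text))
    return [
        row for row in reader
        if row["is_searchable"].strip().lower() in TRUTHY
    ]


rows = searchable_rows(SAMPLE_FEED)
print(len(rows), "searchable listings")
```

Counting the filtered rows rather than the raw rows avoids counting VIN123 twice in any downstream statistic.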

In API:

  • Attribution is turned on by default: the API returns only searchable listings
  • For each VIN, only one listing (the attributed dealer) appears in search results
  • This ensures clean, deduplicated results for most use cases

Getting all listings (deduplication override):

  • Use nodedup=true parameter to get all listings for a VIN regardless of attribution
  • Useful for specific use cases like:
    • Analyzing all vehicles listed by a particular dealer (regardless of actual ownership)
    • Market research requiring complete listing visibility
    • Data analysis needing full market coverage
  • See API documentation for more details.
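A request that overrides deduplication might be assembled as in the sketch below. Only the nodedup=true parameter comes from this document; the base URL, the api_key and vin parameter names, and the build_search_url helper are placeholders chosen for illustration, so check the API documentation for the real endpoint and parameters.

```python
from urllib.parse import urlencode

# Placeholder endpoint, not the real API URL.
BASE_URL = "https://api.example.com/search"


def build_search_url(vin: str, include_duplicates: bool = False,
                     api_key: str = "YOUR_API_KEY") -> str:
    """Build a search URL, optionally asking for non-attributed listings too."""
    params = {"api_key": api_key, "vin": vin}  # assumed parameter names
    if include_duplicates:
        # Documented override: return all listings regardless of attribution.
        params["nodedup"] = "true"
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("VIN123"))
print(build_search_url("VIN123", include_duplicates=True))
```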

Inferred Sales

Inferred Sales is MarketCheck's system for identifying when vehicles have likely been sold, even when dealers don't explicitly mark listings as "sold". It uses vehicle listing activity patterns to mark a listing as "sold" based on absence and timing.

This system complements attribution by tying the sale to the same dealer who was previously identified as having physical possession of the vehicle (i.e., the searchable listing).

Inferred Sale Rules

A vehicle (VIN) is considered sold when:

  1. The VIN no longer appears in the active dataset (from the latest daily crawl)
  2. The last known status_date of any listing for that VIN is older than a set threshold, currently 7 days
  3. That final listing was the searchable one, i.e., the dealer most likely to have sold it

Inference Logic

  1. VIN Evaluation
    • Check if a VIN is no longer present in current-day active inventory
  2. Listing Aggregation
    • Retrieve all past listings for that VIN across domains
  3. Identify latest status_date and compare gap
    • Determine the most recent status_date, calculate the number of days since that date, and, if the gap exceeds the 7‑day threshold, mark the VIN as sold
  4. Select the Attributed Dealer
    • Among all listings, the previously attributed (searchable) dealer is marked as the seller

Example

Suppose VIN VIN123 meets the following conditions:

  • The last recorded status_date is 2023-08-01
  • The current date (today_date) is 2023-10-16
  • The gap between these dates is 76 days, which is greater than the 7-day threshold, so the vehicle is considered sold
  • The last listing for this VIN had is_searchable = 1, meaning attribution applied and this dealer was identified as having physical possession
  • As a result, this listing is marked with is_sold_listing = 1
  • Any earlier listing for the same VIN with is_searchable = 0 (i.e., not attributed to physical possession) is not marked as sold

This process ensures that only the dealer who was most likely in possession of the vehicle at the time of sale is credited with the inferred sale.
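The rules and the example above can be sketched in a few lines of code. This is an illustrative reading of the rules, not the production logic: the infer_sold_listings function, the id field, and the earlier non-searchable listing dated 2023-07-01 are hypothetical, and the sketch assumes each listing is a dict with status_date and is_searchable.

```python
from datetime import date

SOLD_THRESHOLD_DAYS = 7  # current threshold stated in the rules


def infer_sold_listings(listings: list[dict], today: date,
                        vin_is_active: bool) -> list[dict]:
    """Return copies of the listings with is_sold_listing set to 0 or 1."""
    result = [dict(l, is_sold_listing=0) for l in listings]
    # Rule 1: the VIN must be absent from the current active inventory.
    if vin_is_active or not result:
        return result
    # Rule 2: the gap since the latest status_date must exceed the threshold.
    final = max(result, key=lambda l: l["status_date"])
    gap_days = (today - final["status_date"]).days
    # Rule 3: only a final listing that was searchable is credited.
    if gap_days > SOLD_THRESHOLD_DAYS and final["is_searchable"] == 1:
        final["is_sold_listing"] = 1
    return result


history = [
    {"id": "earlier", "status_date": date(2023, 7, 1), "is_searchable": 0},
    {"id": "last", "status_date": date(2023, 8, 1), "is_searchable": 1},
]
marked = infer_sold_listings(history, today=date(2023, 10, 16),
                             vin_is_active=False)
print([(l["id"], l["is_sold_listing"]) for l in marked])
```

With the dates from the example, the gap is 76 days, so only the final searchable listing is flagged and the earlier non-searchable one is left unmarked.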

Accessing Inferred Sales Data

In data feeds:

  • Inferred sales information is available as the is_sold_listing column only in historical data feeds
  • is_sold_listing=true: Vehicle inferred sold by the attributed dealer
  • is_sold_listing=false: Vehicle not inferred sold by this dealer or still active

In API:

  • Inferred sales information is available in Past Inventory Search APIs (aka Recent Inventory APIs)
  • By default, all expired listings (those that are no longer active) are returned, including listings that are marked as sold
  • Use sold=true parameter to filter for listings that are inferred to be sold
  • See API documentation for more details

Key Takeaways

  • Attribution ensures only the true dealer listing is searchable, improving data quality and user experience.
  • Inferred Sales provides a reliable way to track vehicle sales, even without explicit dealer input.
  • Both features are accessible via MarketCheck data feeds and APIs, supporting a wide range of business and analytics needs.

VIN vs. Non-VIN Listings

MarketCheck's inventory data primarily revolves around VIN listings, that is, listings that carry a VIN (Vehicle Identification Number), the unique identifier for each vehicle. However, there are also non-VIN listings, where the VIN is not available or not provided.

  • VIN listings make up the majority of our inventory data, as they allow for precise identification and tracking of vehicles.
  • Non-VIN listings make up a smaller percentage of the total inventory. These are lower-quality listings: without the unique VIN identifier, it is difficult to enrich and normalize the data.
  • Non-VIN listings are included in feeds and the API, mainly for completeness, but they are not as reliable for detailed analysis or attribution.
  • By default, these are not included in search results, but can be accessed using the include_non_vin_listings=true parameter in API calls.
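When working with feed rows directly, it can help to separate the two kinds of listing before running VIN-based analysis. The sketch below assumes each row is a dict with a vin key that is empty or missing for non-VIN listings; that field name and the split_by_vin helper are assumptions, not documented feed behavior.

```python
def split_by_vin(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Partition rows into (VIN listings, non-VIN listings)."""
    with_vin, without_vin = [], []
    for row in rows:
        vin = (row.get("vin") or "").strip()  # treat None and blanks alike
        (with_vin if vin else without_vin).append(row)
    return with_vin, without_vin


sample_rows = [
    {"id": "a", "vin": "VIN123"},
    {"id": "b", "vin": ""},
    {"id": "c"},
]
vin_rows, non_vin_rows = split_by_vin(sample_rows)
print(len(vin_rows), len(non_vin_rows))
```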