# Batch API

> Process thousands of records in a single request with MarketCheck's asynchronous Batch API.

The Batch API allows you to process large files of vehicle data asynchronously. Instead of making thousands of individual API calls, you submit one file, track progress via polling or webhooks, and download the results when processing completes.

> **IMPORTANT**: 
> The Batch API is an Enterprise package offering and is enabled on your account upon request. All batch requests require [authentication](/docs/get-started/api/authentication).



## When to Use Batch vs Single-Request API

| Scenario        | Recommended API |
|-----------------|-----------------|
| Decode a single VIN | Single-request NeoVIN Decode |
| Decode 1,000–3,000 VINs at once | Batch NeoVIN Decode |
| Rank a single vehicle | Single-request MarketMatch |
| Rank a large CSV of vehicles | Batch MarketMatch Rank |


Use the single-request API for real-time, low-latency lookups. Use the Batch API when you have a file of records to process and can wait for asynchronous results.



## Supported Batch Operations

| Operation     | Base Path     | Input Format  | Output Format |
|---------------|---------------|---------------|---------------|
| **NeoVIN Decode** | `/v2/batch/neovin/decode` | Plain CSV (`.csv`) | Gzip JSONL (`.jsonl.gz`) |
| **MarketMatch Rank** | `/v2/batch/marketmatch/rank` | Gzip CSV (`.csv.gz`) | Gzip CSV (`.csv.gz`) |


> **IMPORTANT**: 
> Input file formats differ between operations. NeoVIN Decode accepts **plain CSV** files and rejects gzip. MarketMatch Rank accepts **gzip-compressed CSV** files and rejects plain CSV.

- [Batch NeoVIN Decode documentation](/docs/api/cars/vehicle-specs/neovin#batch-neovin-decode)
- [Batch MarketMatch Rank documentation](/docs/api/cars/market-insights/marketmatch#batch-marketmatch-rank)



## How It Works

Every batch job follows the same workflow:

**1. Prepare your input file**
Create a CSV file with the required columns for your chosen operation. Compress it if the operation requires gzip.

**2. Submit the job**
Upload the file via a `POST` request to the operation's submit endpoint. You receive a `job_id` immediately.

**3. Track progress**
Poll the status endpoint using your `job_id`, or register a webhook URL at submission time to receive automatic notifications.

**4. Download results**
When the job status is `COMPLETED`, download the result file from the download endpoint.



## Job Statuses

Every batch job has one of three statuses:

| Status      | Terminal    | Description |
|-------------|-------------|-------------|
| `PROCESSING` | No | The job is being processed. Check `progress_percent` for progress (0–100). |
| `COMPLETED` | Yes | Processing finished. Results are available for download. |
| `FAILED` | Yes | Processing failed. Check `error_code` and `error_message` for details. |


Jobs always start in `PROCESSING` and transition to either `COMPLETED` or `FAILED`. There is no way to cancel a job or return it to `PROCESSING` from a terminal state.

```text
PROCESSING ───→ COMPLETED
     │
     └─────────→ FAILED
```



## Progress Tracking

The `progress_percent` field reports job progress as a percentage from 0 to 100:

| Range   | Meaning |
|---------|---------|
| 0–10 | Job submitted, preparing to process |
| 10–90 | Records are being processed |
| 90–99 | Processing complete, finalizing results |
| 100 | Complete — results ready for download |


> **NOTE**: 
> Do not use `progress_percent` to estimate remaining time. Progress may advance in bursts and does not correlate linearly with elapsed time. For failed jobs, progress freezes at the last value reached before the failure.

**Recommended polling interval:** 60 seconds. More frequent polling provides no benefit and is subject to rate limiting.



## Idempotency

To safely retry a job submission after a network timeout, include an `Idempotency-Key` header:

```http
Idempotency-Key: your-unique-key-123
```

If a job with the same idempotency key already exists for your account, the API returns the existing job instead of creating a duplicate. Idempotency keys:

- Are scoped to your account — different accounts can use the same key
- Are permanent — once used, a key always returns the same job
- Are payload-independent — resubmitting with a different file and the same key returns the original job
- Have a maximum length of 255 characters
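One convenient scheme (illustrative, not required by the API) is to derive the key from the operation name and the file contents, so an automatic retry of the same upload naturally reuses the same key:

```python
import hashlib

def idempotency_key(operation, file_bytes):
    """Deterministic key: same operation + same file => same key on retry.

    Any unique string up to 255 characters is valid; this hash-based
    scheme is just one option.
    """
    return f"{operation}-{hashlib.sha256(file_bytes).hexdigest()}"
```

Because keys are payload-independent on the server side, reusing a key with a changed file silently returns the original job; a content-derived key avoids that pitfall.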



## Webhooks

Webhooks allow you to receive automatic notifications when your batch jobs complete or fail, instead of polling the status endpoint. Provide a `webhook_url` when submitting a job to enable them.

**Requirements:**

- Must use HTTPS
- Must resolve to a publicly routable IP address
- Maximum length: 2,048 characters



### Webhook Events

| Event        | Trigger      | Content-Type |
|--------------|--------------|--------------|
| `job.completed` | Job finished processing successfully | `application/json` |
| `job.failed` | Job failed or timed out | `application/json` |




### Verifying Webhook Signatures

If you provided a `webhook_secret` at submission time, each webhook includes a signature header for verification:

```http
X-Webhook-Signature: t=1773585000,v1=a1b2c3d4e5f6...
X-Webhook-ID: 550e8400-e29b-41d4-a716-446655440000
```

**Verification steps:**

1. Extract `t` (timestamp) and `v1` (signature) from the `X-Webhook-Signature` header
2. Reject the webhook if `|current_time - t| > 300` seconds (5-minute replay window)
3. Compute `HMAC-SHA256(your_webhook_secret, "{t}.{raw_request_body}")`
4. Compare your computed signature with `v1` using a constant-time comparison

**Python example:**

```python
import hmac
import hashlib
import time

def verify_webhook(payload_body, signature_header, secret):
    """Raise ValueError if the signature is stale or invalid; return True otherwise."""
    parts = dict(p.split("=", 1) for p in signature_header.split(","))
    timestamp = parts["t"]
    expected_sig = parts["v1"]

    # Reject stale webhooks (> 5 minutes old)
    if abs(time.time() - int(timestamp)) > 300:
        raise ValueError("Webhook timestamp too old")

    # Sign "{t}.{raw_body}" with the shared secret
    computed = hmac.new(
        secret.encode(),
        f"{timestamp}.{payload_body}".encode(),
        hashlib.sha256
    ).hexdigest()

    # Constant-time comparison prevents timing attacks
    if not hmac.compare_digest(computed, expected_sig):
        raise ValueError("Invalid webhook signature")
    return True
```

The `X-Webhook-ID` header contains the `job_id` and can be used as a deduplication key to handle duplicate deliveries.
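Since deliveries can be duplicated, a receiver can key off `X-Webhook-ID` to process each notification exactly once. A minimal in-memory sketch (production code would use a durable store such as a database):

```python
seen_webhook_ids = set()

def handle_delivery(webhook_id, payload, process):
    """Invoke `process(payload)` only for the first delivery of each ID."""
    if webhook_id in seen_webhook_ids:
        return False                 # duplicate delivery, already handled
    seen_webhook_ids.add(webhook_id)
    process(payload)
    return True
```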

> **NOTE**: 
> In webhook payloads, `completed_at` is always present for both `job.completed` and `job.failed` events. In the status API response, `completed_at` appears only for `COMPLETED` jobs — it is omitted for `PROCESSING` and `FAILED` jobs.



### Webhook Retry Behavior

If your webhook endpoint is unavailable, the API retries delivery with exponential backoff:

| Attempt           | Approximate Delay |
|-------------------|-------------------|
| 1 | Immediate |
| 2 | ~30 seconds |
| 3 | ~1 minute |
| 4 | ~2 minutes |
| 5 | ~4 minutes |


After 5 failed attempts (~9 minutes total), the API marks the webhook as permanently failed and stops retrying.

| Your Response | API Behavior  |
|---------------|---------------|
| 2xx | Delivery successful — no retries |
| 429 | Rate limited — retries with `Retry-After` header honored |
| 4xx (except 429) | Permanent failure — no retries |
| 5xx | Temporary failure — retries with backoff |
| Connection error / timeout | Temporary failure — retries with backoff |


> **TIP**: 
> Return `200 OK` immediately after receiving the webhook, then process it asynchronously. If your processing takes too long, the request may time out and trigger unnecessary retries.
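One way to follow this tip is to enqueue the payload and acknowledge immediately, leaving the slow processing to a worker thread. A sketch using only the standard library (function names are illustrative):

```python
import queue
import threading

def start_worker(process):
    """Run `process(payload)` on a background thread, fed by a queue."""
    q = queue.Queue()

    def loop():
        while True:
            payload = q.get()
            if payload is None:      # sentinel: shut the worker down
                return
            process(payload)

    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return q, t

def receive_webhook(q, payload):
    """The webhook handler body: hand off the work, return 200 at once."""
    q.put(payload)
    return 200
```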



## Downloads

Results for completed jobs are available for download via the operation's download endpoint. Each download call streams the result file directly.

**Key constraints:**

- **3 downloads per job.** Each call to the download endpoint consumes one slot. Save the file on first download.
- **POST method.** The download endpoint uses `POST` (not `GET`) because each call has side effects (incrementing the download counter).



### File Integrity Verification

Every download response includes an `X-File-Checksum` header containing the SHA-256 checksum of the file. Verify integrity after downloading:

```shell
echo "EXPECTED_CHECKSUM  results.jsonl.gz" | sha256sum -c -
```

Where `EXPECTED_CHECKSUM` is the value from the `X-File-Checksum` response header.



## Error Format

All batch API errors follow the [RFC 7807 Problem Details](https://www.rfc-editor.org/rfc/rfc7807) standard:

```json
{
    "type": "about:blank",
    "title": "Bad Request",
    "status": 400,
    "detail": "Uploaded file must be a valid CSV (.csv)",
    "code": "invalid_file_format",
    "instance": "req-a1b2c3d4"
}
```

| Field       | Type        | Description |
|-------------|-------------|-------------|
| `type` | string | Always `"about:blank"` |
| `title` | string | HTTP status phrase |
| `status` | integer | HTTP status code |
| `detail` | string | Human-readable error description |
| `code` | string | Machine-readable error code for programmatic handling |
| `instance` | string | Request ID — include when contacting support |
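A client can wrap this format in a typed exception so handlers branch on `code` rather than parsing `detail` strings. The class and helper names here are illustrative; the field names come from the format above:

```python
import json

class BatchApiError(Exception):
    """Carries the machine-readable code and the request ID for support."""
    def __init__(self, problem):
        super().__init__(problem.get("detail") or problem.get("title"))
        self.status = problem.get("status")
        self.code = problem.get("code")
        self.instance = problem.get("instance")

def raise_for_problem(status_code, body):
    """Raise BatchApiError for 4xx/5xx responses; do nothing otherwise."""
    if status_code >= 400:
        raise BatchApiError(json.loads(body))
```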




## Concurrency

Each account is limited to **1 active job per operation** at a time. Submitting a second job while one is active returns `409 Conflict` with error code `active_job_exists`. Wait for your current job to complete or fail before submitting a new one.

You can have one NeoVIN Decode job and one MarketMatch Rank job running simultaneously, as the limit is per operation.



## Best Practices

**Retry strategy:**

- Use the `Idempotency-Key` header on all submission requests to safely retry after timeouts
- For 5xx errors and network timeouts, retry with exponential backoff: wait 1s, then 2s, then 4s, up to 30s between attempts
- Never retry 4xx errors — these indicate a problem with your request that must be fixed

**Efficient polling:**

- Poll the status endpoint every 60 seconds, not more frequently
- Use webhooks instead of polling when possible — they provide immediate notification with no wasted requests

**File handling:**

- Verify downloaded files using the `X-File-Checksum` header
- Save downloaded files immediately — you have only 3 download attempts per job

**Error handling:**

- Always check the `code` field in error responses for programmatic error handling
- Log the `instance` field from error responses — include this when contacting support
- Handle `409 active_job_exists` gracefully — this means your previous job is still running, not that something is broken
