Batch Processing Guide

Process large datasets efficiently with SavvyIQ's Entity Resolution API using asynchronous batch processing.

Overview

Each entity resolution request can take 2-5 minutes, and the streaming API caps concurrent connections at 10. The async API makes large files practical by:

  • Submitting requests without holding connections
  • Accepting up to 600 requests/minute (vs. 10 concurrent requests for streaming)
  • Collecting results when ready
  • Handling failures gracefully

API Comparison

Streaming API (Not Recommended for Batch)

  • Real-time responses (holds connection)
  • Simpler to implement
  • Limited to 10 concurrent requests
  • Resource intensive for large datasets

Async API (Recommended)

  • Submit requests immediately
  • 600 requests/minute throughput
  • Collect results when ready
  • Optimized for batch processing
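
To put the two limits side by side, here is a rough back-of-the-envelope estimate. It assumes every request takes the full 5 minutes, that async submissions are paced exactly at the rate limit, and that async requests are processed in parallel once submitted. The function names and defaults are illustrative only.

```python
import math

def async_minutes(n, rate_per_minute=600, request_minutes=5):
    # Submissions are paced by the rate limit; the last request submitted
    # may then need up to request_minutes to finish.
    return n / rate_per_minute + request_minutes

def streaming_minutes(n, concurrency=10, request_minutes=5):
    # Requests run in waves of `concurrency`; each wave holds its
    # connections open for up to request_minutes.
    return math.ceil(n / concurrency) * request_minutes

print(async_minutes(6000))      # 15.0
print(streaming_minutes(6000))  # 3000
```

Under these assumptions, a 6,000-row file takes roughly 15 minutes with the async API versus about 50 hours with streaming.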

Two-Phase Workflow

Phase 1: Submit

  1. Read entities from CSV file
  2. Submit each to /v2/entity-resolution/async
  3. Receive request_id for each submission
  4. Write request IDs to tracking file
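
The submit phase can be sketched as follows. The HTTP call is deliberately left to a caller-supplied `submit(name, location)` function, which is hypothetical and is expected to post one entity to /v2/entity-resolution/async and return its request_id. The tracking file layout is also an assumption.

```python
import csv

def submit_phase(input_csv, tracking_csv, submit):
    """Submit every row of input_csv and record the returned request IDs.

    `submit` is a caller-supplied (hypothetical) function that sends one
    entity to /v2/entity-resolution/async and returns its request_id.
    """
    count = 0
    with open(input_csv, newline="", encoding="utf-8") as src, \
         open(tracking_csv, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        # Assumed tracking layout: the input columns plus the request ID.
        writer.writerow(["company_name", "location", "request_id"])
        for row in csv.DictReader(src):
            request_id = submit(row["company_name"], row["location"])
            writer.writerow([row["company_name"], row["location"], request_id])
            count += 1
    return count
```

Keeping the input columns next to each request ID makes it possible to join results back to the original rows later.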

Phase 2: Collect

  1. Read request IDs from tracking file
  2. Check status via /v2/entity-resolution/status/{id}
  3. Collect completed results
  4. Export to final CSV with full entity data
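
A single collection pass might look like the sketch below. The `get_status(request_id)` function is a hypothetical wrapper around /v2/entity-resolution/status/{id}, and the `status` and `entity_id` keys in its return value are assumed field names, not confirmed response fields.

```python
import csv

def collect_phase(tracking_csv, output_csv, get_status):
    """Run one pass over the tracking file; return the IDs still pending."""
    pending = []
    with open(tracking_csv, newline="", encoding="utf-8") as src, \
         open(output_csv, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["request_id", "processing_status", "entity_id"])
        for row in csv.DictReader(src):
            result = get_status(row["request_id"])
            # Assumed response keys; a missing status is treated as PENDING.
            status = result.get("status", "PENDING")
            if status == "PENDING":
                pending.append(row["request_id"])
            writer.writerow([row["request_id"], status, result.get("entity_id", "")])
    return pending
```

If the returned list is non-empty, wait and run the pass again; the output file is rewritten on each pass.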

Quick Start Example
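
A minimal sketch of one submit-and-check round trip, using only the Python standard library. The two endpoint paths come from this guide. The host in BASE_URL, the Bearer authorization header, and the JSON field names in the request body are assumptions for illustration, so check them against the API reference before use.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host (assumption)

def build_submit_request(api_key, name, location):
    # Assumed body shape: {"name": ..., "location": ...}
    body = json.dumps({"name": name, "location": location}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/v2/entity-resolution/async",
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer " + api_key,  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )

def build_status_request(api_key, request_id):
    return urllib.request.Request(
        BASE_URL + "/v2/entity-resolution/status/" + request_id,
        method="GET",
        headers={"Authorization": "Bearer " + api_key},
    )

def send(request):
    # Performs the HTTP call and decodes the JSON response body.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)

# Usage (requires network access and a valid key):
# submitted = send(build_submit_request(key, "Apple Inc", "California US"))
# status = send(build_status_request(key, submitted["request_id"]))
```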

Production-Ready Implementation

For production use, you need robust error handling, rate limiting, and resumability. Here are complete implementations:

JavaScript Implementation

Python Implementation
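
The sketch below illustrates two of the production concerns named above, error handling and resumability. It is not the complete implementation: `submit(name, location)` is a hypothetical function that returns a request_id, the tracking file layout is an assumption, and the retry counts and delays are arbitrary.

```python
import csv
import os
import time

def with_retries(call, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call(); on any exception, retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)

def already_submitted(tracking_csv):
    """Return the (company_name, location) pairs already in the tracking file."""
    if not os.path.exists(tracking_csv):
        return set()
    with open(tracking_csv, newline="", encoding="utf-8") as f:
        return {(r["company_name"], r["location"]) for r in csv.DictReader(f)}

def resumable_submit(rows, tracking_csv, submit, sleep=time.sleep):
    """Submit only rows missing from the tracking file, appending as we go."""
    done = already_submitted(tracking_csv)
    needs_header = (not os.path.exists(tracking_csv)
                    or os.path.getsize(tracking_csv) == 0)
    with open(tracking_csv, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if needs_header:
            writer.writerow(["company_name", "location", "request_id"])
        for name, location in rows:
            if (name, location) in done:
                continue  # submitted in an earlier run
            request_id = with_retries(lambda: submit(name, location), sleep=sleep)
            writer.writerow([name, location, request_id])
            f.flush()  # keep the file current in case the run is interrupted
```

Because each request ID is flushed to disk as soon as it is received, rerunning the script after a crash skips the rows that were already submitted.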

Input File Format

Your input CSV should have these columns (column names are flexible):

company_name,location
Apple Inc,California US
Microsoft Corporation,Washington US
Google LLC,California US
Amazon.com Inc,Washington US

Supported column names:

  • Company: company_name, name
  • Location: location, address

Note: Including location data, particularly country, significantly improves accuracy. For US companies with state codes, add "US" to avoid ambiguity (e.g., "CA" could be California or Canada).
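
One way to accept either spelling of each column is sketched below. It assumes header names match exactly (case-sensitive) and that the location column may be absent; neither behavior is confirmed by this guide.

```python
import csv
import io

COMPANY_COLUMNS = ("company_name", "name")
LOCATION_COLUMNS = ("location", "address")

def read_entities(csv_file):
    """Return (name, location) pairs from a CSV using any supported headers."""
    reader = csv.DictReader(csv_file)
    fields = reader.fieldnames or []
    company_col = next((c for c in COMPANY_COLUMNS if c in fields), None)
    location_col = next((c for c in LOCATION_COLUMNS if c in fields), None)
    if company_col is None:
        raise ValueError("expected a company column named one of %s" % (COMPANY_COLUMNS,))
    entities = []
    for row in reader:
        name = (row[company_col] or "").strip()
        if not name:
            continue  # skip rows without a company name
        location = (row[location_col] or "").strip() if location_col else ""
        entities.append((name, location))
    return entities

sample = io.StringIO("name,address\nApple Inc,California US\nGoogle LLC,California US\n")
print(read_entities(sample))
# [('Apple Inc', 'California US'), ('Google LLC', 'California US')]
```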


Usage Instructions

1. Setup
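
As part of setup, keep your API key out of source code. The sketch below reads it from an environment variable; the variable name SAVVYIQ_API_KEY is a hypothetical choice, not something the API requires.

```python
import os

def load_api_key(env=os.environ):
    """Read the API key from the environment, failing early if it is missing."""
    # SAVVYIQ_API_KEY is a hypothetical variable name chosen for this sketch.
    key = env.get("SAVVYIQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set SAVVYIQ_API_KEY before running the batch scripts")
    return key
```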

2. Submit Batch Requests
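
If you wrap both phases in one script, a command-line layout along these lines keeps them separate. The program name, subcommand names, and the request_ids.csv default are assumptions for illustration; entity_results.csv is the output file described in this guide.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="batch_resolve")
    sub = parser.add_subparsers(dest="command", required=True)

    # Phase 1: read the input CSV and write request IDs to the tracking file.
    submit = sub.add_parser("submit", help="submit entities for resolution")
    submit.add_argument("input_csv")
    submit.add_argument("--tracking", default="request_ids.csv")

    # Phase 2: read the tracking file and write the final results CSV.
    collect = sub.add_parser("collect", help="collect completed results")
    collect.add_argument("--tracking", default="request_ids.csv")
    collect.add_argument("--output", default="entity_results.csv")
    return parser

args = build_parser().parse_args(["submit", "companies.csv"])
print(args.command, args.input_csv, args.tracking)
# submit companies.csv request_ids.csv
```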

3. Collect Results

Wait 5-10 minutes for processing, then collect results:
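
Rather than waiting a fixed time, you can poll until nothing is pending. This sketch assumes a hypothetical `get_status(request_id)` function that returns a dict with a `status` key holding COMPLETED, FAILED, or PENDING; the polling interval and the cap on polls are arbitrary.

```python
import time

def wait_for_completion(request_ids, get_status, poll_seconds=60,
                        max_polls=30, sleep=time.sleep):
    """Poll until every request is COMPLETED or FAILED, or max_polls is hit."""
    results = {}
    pending = list(request_ids)
    for attempt in range(max_polls):
        still_pending = []
        for request_id in pending:
            result = get_status(request_id)
            if result.get("status") in ("COMPLETED", "FAILED"):
                results[request_id] = result
            else:
                still_pending.append(request_id)
        pending = still_pending
        if not pending or attempt == max_polls - 1:
            break  # all done, or out of polls
        sleep(poll_seconds)
    return results, pending
```

The second return value lists any request IDs that were still pending when polling stopped, so they can be retried in a later run.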

Output Format

The final entity_results.csv contains comprehensive entity data.

Key columns:

  • entity_id: Use with /v1/entities/{entity_id} for full business intelligence
  • confidence: Match confidence (0-100)
  • resolution_status: matched, inconclusive, or no_match
  • processing_status: COMPLETED, FAILED, or PENDING
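
As an example of working with these columns, the sketch below keeps only confident matches. The threshold of 80 is an arbitrary choice, and the sample rows and entity IDs are made up for illustration.

```python
import csv
import io

def confident_matches(results_file, min_confidence=80):
    """Return entity IDs for completed, matched rows at or above the threshold."""
    keep = []
    for row in csv.DictReader(results_file):
        if row["processing_status"] != "COMPLETED":
            continue
        if row["resolution_status"] != "matched":
            continue
        # An empty confidence cell is treated as 0.
        if float(row["confidence"] or 0) >= min_confidence:
            keep.append(row["entity_id"])
    return keep

sample = io.StringIO(
    "entity_id,confidence,resolution_status,processing_status\n"
    "ent_1,97,matched,COMPLETED\n"
    "ent_2,55,inconclusive,COMPLETED\n"
    ",,,FAILED\n"
)
print(confident_matches(sample))  # ['ent_1']
```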

Best Practices

  • Respect the 600 requests/minute rate limit
  • Include locations for better matching
  • For very large input files, consider streaming rows from the CSV instead of loading the whole file into memory
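
A simple client-side pacer is one way to stay under the rate limit. The sketch below spaces calls evenly (one every 0.1 seconds at 600 per minute); it is an illustrative approach, not an official client, and the class name is hypothetical.

```python
import time

class RateLimiter:
    """Space calls evenly so at most `per_minute` are made each minute."""

    def __init__(self, per_minute=600, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the next slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Call `limiter.wait()` immediately before each submission; the first call returns at once and later calls sleep only as long as needed.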

Getting Help

  • Usage Limits: Monitor your usage at savvyiq.ai/usage
  • Support: Contact help@savvyiq.ai for technical assistance
  • Rate Limit Increases: Email us about enterprise plans for higher limits
