Batch Processing Guide

Process large datasets efficiently with SavvyIQ's Entity Resolution API using asynchronous batch processing.

Overview

Entity resolution requests can take 2-5 minutes each, and the streaming API caps the number of concurrent connections. The async API makes large files practical by:

Submitting requests without holding connections
Processing up to 600 requests/minute (versus 10 concurrent requests for streaming)
Collecting results when ready
Handling failures gracefully

API Comparison

Streaming API (Not Recommended for Batch)

Real-time responses (holds connection)
Simpler to implement
Limited to 10 concurrent requests
Resource intensive for large datasets

Async API (Recommended)

Submit requests immediately
600 requests/minute throughput
Collect results when ready
Optimized for batch processing
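
To make the comparison concrete, the short calculation below uses only the figures quoted in this guide (10 concurrent streaming requests, 2-5 minutes per request, 600 async submissions per minute). The dataset size of 10,000 entities is a hypothetical example, and the async figure covers submission time only; how long the service then takes to process the queue is not stated here.

```python
# Back-of-the-envelope throughput comparison using the figures in this guide.
STREAMING_CONCURRENCY = 10                 # max concurrent streaming requests
ASYNC_RATE_PER_MIN = 600                   # async submissions per minute
MIN_LATENCY_MIN, MAX_LATENCY_MIN = 2, 5    # minutes per resolution request


def streaming_minutes(n_entities: int, latency_min: float) -> float:
    """Wall-clock minutes if all streaming connections stay busy."""
    per_minute = STREAMING_CONCURRENCY / latency_min
    return n_entities / per_minute


def async_submit_minutes(n_entities: int) -> float:
    """Minutes needed just to submit n_entities at the async rate limit."""
    return n_entities / ASYNC_RATE_PER_MIN


n = 10_000  # hypothetical dataset size
print(f"streaming: {streaming_minutes(n, MIN_LATENCY_MIN) / 60:.1f}-"
      f"{streaming_minutes(n, MAX_LATENCY_MIN) / 60:.1f} hours")
print(f"async submit: {async_submit_minutes(n):.1f} minutes")
```

Under these assumptions, streaming needs roughly 33 to 83 hours of held connections, while async submission finishes in about 17 minutes.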

Two-Phase Workflow

Phase 1: Submit

Read entities from CSV file
Submit each to /v2/entity-resolution/async
Receive request_id for each submission
Write request IDs to tracking file
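
The submit phase can be sketched roughly as follows. This is a minimal illustration, not the guide's original implementation: the endpoint path and the request_id response field come from this guide, while the host name, the Bearer authorization header, the request payload field names, and the tracking file layout (request_id, company_name) are all assumptions made for the example.

```python
import csv
import json
import urllib.request
from typing import Callable, Iterable

BASE_URL = "https://api.example.invalid"     # ASSUMPTION: placeholder host
ASYNC_PATH = "/v2/entity-resolution/async"   # path taken from this guide


def post_json(url: str, payload: dict, api_key: str) -> dict:
    """POST a JSON body and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # ASSUMPTION: auth scheme
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def submit_batch(rows: Iterable[dict], tracking_path: str,
                 send: Callable[[dict], dict]) -> int:
    """Submit each row and append its request_id to the tracking file."""
    count = 0
    with open(tracking_path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        for row in rows:
            # ASSUMPTION: payload field names are illustrative only.
            reply = send({"name": row["company_name"],
                          "location": row.get("location", "")})
            writer.writerow([reply["request_id"], row["company_name"]])
            fh.flush()  # keep the tracking file current if the run is interrupted
            count += 1
    return count
```

In real use, send would wrap the HTTP call, for example `lambda p: post_json(BASE_URL + ASYNC_PATH, p, api_key)`. Passing it in as a parameter keeps the file handling testable without network access.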

Phase 2: Collect

Read request IDs from tracking file
Check status via /v2/entity-resolution/status/{id}
Collect completed results
Export to final CSV with full entity data
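
A possible shape for the collect phase is sketched below. The status path and the output column names are taken from this guide. The structure of the status response (a JSON object that carries processing_status and the entity fields directly), the polling interval, and the tracking file layout with the request ID in the first column are assumptions.

```python
import csv
import time
from typing import Callable, Tuple

STATUS_PATH = "/v2/entity-resolution/status/{id}"  # path taken from this guide
FIELDS = ["request_id", "processing_status", "entity_id",
          "confidence", "resolution_status"]


def status_url(base_url: str, request_id: str) -> str:
    """Build the status URL for one request."""
    return base_url + STATUS_PATH.format(id=request_id)


def collect_results(tracking_path: str, output_path: str,
                    fetch_status: Callable[[str], dict],
                    poll_seconds: float = 30.0, max_rounds: int = 20,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, int]:
    """Poll tracked requests until they leave PENDING, then export a CSV.

    Returns (finished, still_pending).
    """
    with open(tracking_path, newline="", encoding="utf-8") as fh:
        pending = [row[0] for row in csv.reader(fh) if row]
    done = {}
    for round_no in range(max_rounds):
        if round_no:
            sleep(poll_seconds)  # wait between polling rounds
        still_pending = []
        for rid in pending:
            reply = fetch_status(rid)
            # ASSUMPTION: the reply exposes a "processing_status" field.
            if reply.get("processing_status") == "PENDING":
                still_pending.append(rid)
            else:
                done[rid] = reply  # COMPLETED or FAILED
        pending = still_pending
        if not pending:
            break
    with open(output_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        for rid, reply in done.items():
            writer.writerow({**reply, "request_id": rid})
        for rid in pending:
            writer.writerow({"request_id": rid, "processing_status": "PENDING"})
    return len(done), len(pending)
```

Requests that are still pending after the last round are written with a PENDING status, so a later run can pick them up again.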

Quick Start Example

Production-Ready Implementation

For production use, you need robust error handling, rate limiting, and resumability. Here are complete implementations:

JavaScript Implementation

Python Implementation

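
The three production concerns named above (rate limiting, error handling, resumability) can each be illustrated with a small helper. The following is a hedged sketch rather than the original implementation: the 600 requests/minute figure comes from this guide, while the retry count, the backoff delays, and the tracking file layout (request_id, company_name per row) are assumptions.

```python
import csv
import time
from typing import Callable, List, Set


class RateLimiter:
    """Spaces calls at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute: int = 600,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.interval = 60.0 / per_minute
        self.clock, self.sleep = clock, sleep
        self.next_slot = clock()

    def wait(self) -> None:
        """Block until the next submission slot is free."""
        now = self.clock()
        if now < self.next_slot:
            self.sleep(self.next_slot - now)
        self.next_slot = max(now, self.next_slot) + self.interval


def with_retries(call: Callable[[], dict], attempts: int = 3,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Run call(); on failure retry with exponential backoff (1s, 2s, ...)."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


def load_submitted(tracking_path: str) -> Set[str]:
    """Company names already in the tracking file (empty if no file yet)."""
    try:
        with open(tracking_path, newline="", encoding="utf-8") as fh:
            # ASSUMPTION: each row is request_id,company_name
            return {row[1] for row in csv.reader(fh) if len(row) > 1}
    except FileNotFoundError:
        return set()


def pending_rows(rows: List[dict], tracking_path: str) -> List[dict]:
    """Rows that still need submitting, so an interrupted run can resume."""
    submitted = load_submitted(tracking_path)
    return [r for r in rows if r["company_name"] not in submitted]
```

A submit loop would call `limiter.wait()` before each request, wrap the request in `with_retries`, and iterate over `pending_rows(...)` instead of the full input so that restarting after a crash does not resubmit entities.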

Input File Format

Your input CSV should have these columns (column names are flexible):

company_name,location
Apple Inc,California US
Microsoft Corporation,Washington US
Google LLC,California US
Amazon.com Inc,Washington US

Supported column names:

Company: company_name, name
Location: location, address

Note: Including location data, particularly the country, significantly improves accuracy. When a US location is given as a state code, append "US" to avoid ambiguity (e.g., "CA" could mean California or Canada).
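
One way to accept the column names listed above is to map them onto a single internal shape when reading the file. The sketch below is illustrative: the accepted names come from this guide, while the case-insensitive header matching and the skipping of rows without a company name are assumptions about reasonable behaviour.

```python
import csv
import io
from typing import Iterable, List, Optional, Sequence, TextIO

COMPANY_COLUMNS = ("company_name", "name")   # names listed in this guide
LOCATION_COLUMNS = ("location", "address")


def pick_column(fieldnames: Iterable[str],
                candidates: Sequence[str]) -> Optional[str]:
    """Return the first header matching one of the candidate names."""
    lookup = {f.strip().lower(): f for f in fieldnames}
    for candidate in candidates:
        if candidate in lookup:
            return lookup[candidate]
    return None


def read_entities(fh: TextIO) -> List[dict]:
    """Read a CSV into dicts with 'company_name' and 'location' keys."""
    reader = csv.DictReader(fh)
    headers = reader.fieldnames or []
    company_col = pick_column(headers, COMPANY_COLUMNS)
    if company_col is None:
        raise ValueError("no company column found; expected one of "
                         + ", ".join(COMPANY_COLUMNS))
    location_col = pick_column(headers, LOCATION_COLUMNS)
    entities = []
    for row in reader:
        name = (row.get(company_col) or "").strip()
        if not name:
            continue  # ASSUMPTION: rows without a company name are skipped
        location = (row.get(location_col) or "").strip() if location_col else ""
        entities.append({"company_name": name, "location": location})
    return entities


sample = io.StringIO("name,address\nApple Inc,California US\n")
print(read_entities(sample))
```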

Usage Instructions

1. Setup

2. Submit Batch Requests

3. Collect Results

Wait 5-10 minutes for processing, then collect the results.

Output Format

The final entity_results.csv contains comprehensive entity data.

Key columns:

entity_id: Use with /v1/entities/{entity_id} for full business intelligence
confidence: Match confidence (0-100)
resolution_status: matched, inconclusive, or no_match
processing_status: COMPLETED, FAILED, or PENDING
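
As an illustration of how these columns might be used downstream, the sketch below splits the output into rows that can be accepted automatically and rows that need review. The column names and status values come from this guide. The confidence threshold of 80 is a hypothetical cut-off chosen for the example, not a recommended value.

```python
import csv
import io
from typing import List, TextIO, Tuple


def split_results(fh: TextIO,
                  min_confidence: float = 80.0) -> Tuple[List[dict], List[dict]]:
    """Split result rows into (accepted, needs_review)."""
    accepted, needs_review = [], []
    for row in csv.DictReader(fh):
        matched = (row.get("processing_status") == "COMPLETED"
                   and row.get("resolution_status") == "matched")
        try:
            confidence = float(row.get("confidence") or 0)
        except ValueError:
            confidence = 0.0  # treat unparseable confidence as zero
        if matched and confidence >= min_confidence:
            accepted.append(row)
        else:
            needs_review.append(row)
    return accepted, needs_review


sample = io.StringIO(
    "entity_id,confidence,resolution_status,processing_status\n"
    "e1,95,matched,COMPLETED\n"
    "e2,60,matched,COMPLETED\n"
)
accepted, needs_review = split_results(sample)
print(len(accepted), "accepted;", len(needs_review), "for review")
```

With a real file, the same function can be called as `split_results(open("entity_results.csv", newline="", encoding="utf-8"))`.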

Best Practices

Respect the 600 requests/minute rate limit
Include locations for better matching
Prefer the async API over streaming for very large datasets

Getting Help

Usage Limits: Monitor your usage at savvyiq.ai/usage
Support: Contact help@savvyiq.ai for technical assistance
Rate Limit Increases: Email us about enterprise plans for higher limits

What's Next?

For Additional Processing

Contact us about enterprise plans with higher rate limits
Use Domain Intelligence API for email and website based entity resolution

Understanding Your Data

Core Concepts - Deep dive into entity vs candidate results
API Reference - Complete endpoint documentation