Batch Processing Guide
Process large datasets efficiently with SavvyIQ's Entity Resolution API using asynchronous batch processing.
Overview
Entity resolution requests can take 2-5 minutes each and have concurrent connection limits. The async API enables processing of large files by:
- Submitting requests without holding connections
- Processing at 600 requests/minute (vs 10 concurrent for streaming)
- Collecting results when ready
- Handling failures gracefully
API Comparison
Streaming API (Not Recommended for Batch)
- Real-time responses (holds connection)
- Simpler to implement
- Limited to 10 concurrent requests
- Resource intensive for large datasets
Async API (Recommended)
- Submit requests immediately
- 600 requests/minute throughput
- Collect results when ready
- Optimized for batch processing
Two-Phase Workflow
Phase 1: Submit
- Read entities from CSV file
- Submit each to /v2/entity-resolution/async
- Receive request_id for each submission
- Write request IDs to tracking file
Phase 2: Collect
- Read request IDs from tracking file
- Check status via /v2/entity-resolution/status/{id}
- Collect completed results
- Export to final CSV with full entity data
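The two phases above can be sketched in Python as follows. This is a minimal sketch, not SavvyIQ's reference implementation: the endpoint paths and the request_id field come from this guide, while the payload keys (company_name, location), the "status" and "result" fields of the status response, the tracking file layout, and the post/get transport callables are assumptions made for illustration. The caller supplies post and get, so the sketch assumes no particular base URL, HTTP client, or authentication scheme.

```python
import csv
from typing import Callable, Iterable, List

SUBMIT_PATH = "/v2/entity-resolution/async"        # path taken from this guide
STATUS_PATH = "/v2/entity-resolution/status/{id}"  # path taken from this guide

# Hypothetical transport signatures: the caller performs the HTTP call
# and returns the decoded JSON body as a dict.
PostFn = Callable[[str, dict], dict]
GetFn = Callable[[str], dict]


def submit_phase(entities: Iterable[dict], post: PostFn, tracking_path: str) -> int:
    """Phase 1: submit each entity and write its request_id to the tracking file."""
    count = 0
    with open(tracking_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["company_name", "location", "request_id"])
        for entity in entities:
            response = post(SUBMIT_PATH, entity)
            writer.writerow([
                entity["company_name"],
                entity.get("location", ""),
                response["request_id"],
            ])
            count += 1
    return count


def collect_phase(tracking_path: str, get: GetFn) -> List[dict]:
    """Phase 2: look up every tracked request and return one record per request."""
    results = []
    with open(tracking_path, newline="") as f:
        for row in csv.DictReader(f):
            status = get(STATUS_PATH.format(id=row["request_id"]))
            results.append({
                **row,
                # "status" and "result" are assumed response keys.
                "processing_status": status.get("status", "PENDING"),
                "result": status.get("result"),
            })
    return results
```

Keeping the two phases in separate functions means the collect step can run later, or be rerun, using only the tracking file.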
Quick Start Example
Production-Ready Implementation
For production use, you need robust error handling, rate limiting, and resumability. Here are complete implementations:
JavaScript Implementation
Python Implementation
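As an illustration of the error handling and resumability requirements, here is a minimal Python sketch; it is not the original implementation from this page. It assumes a hypothetical submit callable that sends one entity and returns its request_id, a tracking file keyed by input row index, and a simple exponential backoff on any exception. A real client would likely retry only on specific errors.

```python
import csv
import os
import time
from typing import Callable, Iterable, Set


def load_done_rows(tracking_path: str) -> Set[int]:
    """Return the input row indexes already recorded in the tracking file."""
    if not os.path.exists(tracking_path):
        return set()
    with open(tracking_path, newline="") as f:
        return {int(row["row_index"]) for row in csv.DictReader(f)}


def submit_with_retry(submit: Callable, entity, max_attempts: int = 3,
                      base_delay: float = 1.0, sleep: Callable = time.sleep):
    """Call submit(entity), retrying with exponential backoff on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return submit(entity)
        except Exception:  # broad on purpose for the sketch
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


def resumable_submit(entities: Iterable, submit: Callable, tracking_path: str,
                     sleep: Callable = time.sleep) -> int:
    """Submit only rows not yet in the tracking file; return how many were sent."""
    done = load_done_rows(tracking_path)
    needs_header = (not os.path.exists(tracking_path)
                    or os.path.getsize(tracking_path) == 0)
    submitted = 0
    with open(tracking_path, "a", newline="") as f:
        writer = csv.writer(f)
        if needs_header:
            writer.writerow(["row_index", "request_id"])
        for index, entity in enumerate(entities):
            if index in done:
                continue  # already submitted in an earlier run
            request_id = submit_with_retry(submit, entity, sleep=sleep)
            writer.writerow([index, request_id])
            f.flush()  # persist progress so an interrupted run can resume
            submitted += 1
    return submitted
```

Because each request ID is flushed to disk as soon as it is received, rerunning the script after a crash skips rows that were already submitted.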
Input File Format
Your input CSV should have these columns (column names are flexible):
company_name,location
Apple Inc,California US
Microsoft Corporation,Washington US
Google LLC,California US
Amazon.com Inc,Washington US
Supported column names:
- Company: company_name, name
- Location: location, address
Note: Including location data, particularly country, significantly improves accuracy. For US companies with state codes, add "US" to avoid ambiguity (e.g., "CA" could be California or Canada).
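Reading an input file with either set of column names might look like the sketch below. The accepted column names come from this guide; the case-insensitive header matching, the skipping of rows with no company name, and the output keys (company_name, location) are assumptions made for illustration.

```python
import csv
import io
from typing import Dict, Iterator, Sequence, TextIO

COMPANY_COLUMNS = ("company_name", "name")
LOCATION_COLUMNS = ("location", "address")


def first_present(row: Dict[str, str], candidates: Sequence[str]) -> str:
    """Return the first non-empty value among the candidate columns."""
    for column in candidates:
        value = row.get(column) or ""
        if isinstance(value, str) and value.strip():
            return value.strip()
    return ""


def read_entities(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one {company_name, location} dict per usable CSV row."""
    for raw in csv.DictReader(handle):
        # Normalize header names so "Name" and "name" are treated alike.
        row = {(key or "").strip().lower(): value for key, value in raw.items()}
        company = first_present(row, COMPANY_COLUMNS)
        if not company:
            continue  # nothing to resolve without a company name
        yield {
            "company_name": company,
            "location": first_present(row, LOCATION_COLUMNS),
        }


sample = io.StringIO("company_name,location\nApple Inc,California US\n")
print(list(read_entities(sample)))
```

The function is a generator, so rows are read one at a time rather than loaded into memory all at once.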
Usage Instructions
1. Setup
2. Submit Batch Requests
3. Collect Results
Wait 5-10 minutes for processing, then collect results.
Output Format
The final entity_results.csv contains comprehensive entity data.
Key columns:
- entity_id: Use with /v1/entities/{entity_id} for full business intelligence
- confidence: Match confidence (0-100)
- resolution_status: matched, inconclusive, or no_match
- processing_status: COMPLETED, FAILED, or PENDING
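Once the results file exists, the key columns can be used to triage the output. The sketch below assumes the CSV header uses exactly the column names listed above and that failed or pending rows may have empty cells; the 80-point confidence threshold is an arbitrary value chosen for illustration, not a recommendation from SavvyIQ.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, TextIO, Tuple


def summarize_results(handle: TextIO,
                      min_confidence: float = 80.0) -> Tuple[Dict[str, int], List[str]]:
    """Count rows per processing_status and list confidently matched entity IDs."""
    status_counts: Counter = Counter()
    confident_ids: List[str] = []
    for row in csv.DictReader(handle):
        status_counts[row["processing_status"]] += 1
        if row["processing_status"] != "COMPLETED":
            continue  # FAILED and PENDING rows carry no usable match
        confidence = float(row["confidence"] or 0)
        if row["resolution_status"] == "matched" and confidence >= min_confidence:
            confident_ids.append(row["entity_id"])
    return dict(status_counts), confident_ids


sample = io.StringIO(
    "entity_id,confidence,resolution_status,processing_status\n"
    "e1,95,matched,COMPLETED\n"
    "e2,40,inconclusive,COMPLETED\n"
    ",,,FAILED\n"
)
counts, ids = summarize_results(sample)
print(counts, ids)
```

The returned IDs are the ones worth passing to /v1/entities/{entity_id}; rows counted as FAILED or PENDING are candidates for resubmission or a later collection pass.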
Best Practices
- Respect 600 requests/minute rate limit
- Include locations for better matching
- Consider streaming the input file for very large datasets (the Streaming API is not recommended for batch work)
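One way to stay within the 600 requests/minute limit is to space submissions evenly on the client side. The sketch below is an assumption about how that could be done, not a description of how the API enforces its limit: MinuteRateLimiter and estimated_submit_minutes are hypothetical names, and the clock and sleep parameters exist only so the pacing can be tested without real waiting.

```python
import time
from typing import Callable


class MinuteRateLimiter:
    """Keeps consecutive calls at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute: int = 600,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = float("-inf")  # the first call never waits

    def wait(self) -> None:
        """Block until the next request is allowed to start."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.clock()
        self.next_allowed = now + self.interval


def estimated_submit_minutes(record_count: int, per_minute: int = 600) -> float:
    """Lower bound on how long the submit phase takes at the given rate."""
    return record_count / per_minute


limiter = MinuteRateLimiter(per_minute=600)
limiter.wait()  # call once before each submission
print(estimated_submit_minutes(6000))
```

At 600 requests/minute, a 6,000-row file needs at least 10 minutes to submit, which is worth factoring in before starting the collect phase.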
Getting Help
- Usage Limits: Monitor your usage at savvyiq.ai/usage
- Support: Contact help@savvyiq.ai for technical assistance
- Rate Limit Increases: Email us about enterprise plans for higher limits
What's Next?
For Additional Processing
- Contact us about enterprise plans with higher rate limits
- Use Domain Intelligence API for email and website based entity resolution
Understanding Your Data
- Core Concepts - Deep dive into entity vs candidate results
- API Reference - Complete endpoint documentation