Documentation Index
Fetch the complete documentation index at: https://docs.svalync.com/llms.txt
Use this file to discover all available pages before exploring further.
Bulk Scrape
The Bulk Scrape node enables you to scrape multiple URLs concurrently while managing resources efficiently.
Overview
This node allows you to:
- Scrape multiple URLs
- Manage concurrency
- Handle rate limiting
- Process results in batches
- Export data in bulk
Configuration
| Parameter | Type | Description |
|---|
| URLs | Array[String] | List of URLs to scrape |
| Concurrency | Number | Maximum concurrent requests |
| Rate Limit | Object | Rate limiting settings |
| Selectors | Object | Data extraction selectors |
| Export | Object | Export configuration |
Example Usage
Basic Bulk Scrape
{
"urls": [
"https://example.com/page1",
"https://example.com/page2",
"https://example.com/page3"
],
"concurrency": 3,
"selectors": {
"title": "h1",
"content": ".main-content"
}
}
Advanced Configuration
{
"urls": ["https://example.com/page1", "https://example.com/page2"],
"concurrency": 2,
"rate_limit": {
"requests_per_second": 1,
"delay_between_requests": 1000
},
"selectors": {
"title": {
"selector": "h1",
"type": "text"
},
"images": {
"selector": "img",
"type": "attribute",
"attribute": "src"
}
},
"export": {
"format": "json",
"filename": "scrape_results.json"
}
}
Queue Management
{
"queue": {
"max_size": 1000,
"retry_failed": true,
"max_retries": 3,
"priority": {
"enabled": true,
"rules": [
{
"pattern": "*.important.com",
"priority": 1
}
]
}
}
}
Error Handling
Retry Configuration
{
"retry": {
"max_attempts": 3,
"codes": [429, 503],
"delay": {
"initial": 1000,
"multiplier": 2,
"max": 10000
}
}
}
JSON Export
{
"export": {
"format": "json",
"pretty": true,
"filename": "results.json"
}
}
CSV Export
{
"export": {
"format": "csv",
"headers": true,
"delimiter": ",",
"filename": "results.csv"
}
}
Progress Tracking
{
"progress": {
"enabled": true,
"update_interval": 1000,
"callbacks": {
"onProgress": "https://api.example.com/progress",
"onComplete": "https://api.example.com/complete"
}
}
}
Resource Management
{
"resources": {
"max_memory": "1GB",
"max_cpu": 0.8,
"cleanup_interval": 5000
}
}