Bulk Scrape

The Bulk Scrape node enables you to scrape multiple URLs concurrently while managing resources efficiently.

Overview

This node allows you to:

Scrape multiple URLs
Manage concurrency
Handle rate limiting
Process results in batches
Export data in bulk

Configuration

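Parameter	JSON key	Type	Description
URLs	urls	Array[String]	List of URLs to scrape
Concurrency	concurrency	Number	Maximum number of requests in flight at once
Rate Limit	rate_limit	Object	Rate limiting settings (for example requests_per_second, delay_between_requests)
Selectors	selectors	Object	Data extraction selectors, keyed by output field name
Export	export	Object	Export configuration (format, filename, and format-specific options)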

Example Usage

Basic Bulk Scrape

{
  "urls": [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
  ],
  "concurrency": 3,
  "selectors": {
    "title": "h1",
    "content": ".main-content"
  }
}
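
To make the effect of "concurrency" concrete, here is a minimal Python sketch of capping the number of requests in flight with a semaphore. It is not the node's implementation: the bulk_scrape function, the fake_fetch stand-in, and the shape of the returned records are all assumptions made for illustration, and no network I/O is performed.

```python
import asyncio

async def bulk_scrape(urls, concurrency, fetch):
    """Run fetch(url) for every URL with at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(url):
        async with semaphore:  # blocks while `concurrency` fetches are running
            return await fetch(url)

    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*(worker(url) for url in urls))

# Hypothetical stand-in for a real fetcher: records peak parallelism only.
in_flight = 0
peak = 0

async def fake_fetch(url):
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # simulate request latency
    in_flight -= 1
    return {"url": url, "title": "stub"}

config = {
    "urls": [
        "https://example.com/page1",
        "https://example.com/page2",
        "https://example.com/page3",
    ],
    "concurrency": 3,
}

results = asyncio.run(bulk_scrape(config["urls"], config["concurrency"], fake_fetch))
```

With three URLs and a concurrency of 3, every request may start immediately; lowering "concurrency" makes the remaining URLs wait until a slot frees up.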

Advanced Configuration

{
  "urls": ["https://example.com/page1", "https://example.com/page2"],
  "concurrency": 2,
  "rate_limit": {
    "requests_per_second": 1,
    "delay_between_requests": 1000
  },
  "selectors": {
    "title": {
      "selector": "h1",
      "type": "text"
    },
    "images": {
      "selector": "img",
      "type": "attribute",
      "attribute": "src"
    }
  },
  "export": {
    "format": "json",
    "filename": "scrape_results.json"
  }
}
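
The object form of a selector pairs a "selector" with a "type" of either "text" or "attribute". The sketch below shows one way such a pair could be interpreted, using only Python's standard html.parser. It is an illustration under assumptions: it handles bare tag-name selectors such as "h1" and "img" only (not full CSS selectors), and the TagExtractor class and extract function are hypothetical names, not part of the node.

```python
from html.parser import HTMLParser

class TagExtractor(HTMLParser):
    """Collects text or one attribute for every element with a given tag name."""

    def __init__(self, tag, kind, attribute=None):
        super().__init__()
        self.tag, self.kind, self.attribute = tag, kind, attribute
        self.values = []
        self._depth = 0    # greater than 0 while inside a matching element
        self._buffer = []  # text fragments of the current matching element

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        if self.kind == "attribute":
            value = dict(attrs).get(self.attribute)
            if value is not None:
                self.values.append(value)
        else:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.kind == "text" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.values.append("".join(self._buffer).strip())
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)

def extract(html, selectors):
    """Apply each selector spec to the page and return a dict of value lists."""
    out = {}
    for name, spec in selectors.items():
        parser = TagExtractor(spec["selector"], spec["type"], spec.get("attribute"))
        parser.feed(html)
        parser.close()
        out[name] = parser.values
    return out

page = (
    "<html><body><h1>Hello <em>world</em></h1>"
    '<img src="a.png"><img src="b.png" alt="x"></body></html>'
)
selectors = {
    "title": {"selector": "h1", "type": "text"},
    "images": {"selector": "img", "type": "attribute", "attribute": "src"},
}
extracted = extract(page, selectors)
```

Here "text" yields the element's text content including nested markup, while "attribute" yields the named attribute of each matching element.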

Queue Management

{
  "queue": {
    "max_size": 1000,
    "retry_failed": true,
    "max_retries": 3,
    "priority": {
      "enabled": true,
      "rules": [
        {
          "pattern": "*.important.com",
          "priority": 1
        }
      ]
    }
  }
}
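
The priority rules can be pictured as a priority queue keyed on the first matching pattern. The following Python sketch is only one plausible reading of the config: it assumes that patterns are shell-style globs matched against the URL's hostname, that a lower "priority" number is served first, and that unmatched URLs get a hypothetical default of 10. None of these points is stated above.

```python
import heapq
from fnmatch import fnmatch
from urllib.parse import urlsplit

DEFAULT_PRIORITY = 10  # assumption: priority used when no rule matches

def priority_for(url, rules):
    """Return the priority of the first rule whose pattern matches the hostname."""
    host = urlsplit(url).hostname or ""
    for rule in rules:
        if fnmatch(host, rule["pattern"]):
            return rule["priority"]
    return DEFAULT_PRIORITY

def build_queue(urls, queue_config):
    """Push URLs onto a heap ordered by (priority, insertion order)."""
    priority = queue_config["priority"]
    rules = priority["rules"] if priority["enabled"] else []
    heap = []
    for order, url in enumerate(urls):
        if len(heap) >= queue_config["max_size"]:
            raise OverflowError("queue is full")
        heapq.heappush(heap, (priority_for(url, rules), order, url))
    return heap

def drain(heap):
    """Yield URLs from the heap, highest priority (lowest number) first."""
    while heap:
        yield heapq.heappop(heap)[2]

queue_config = {
    "max_size": 1000,
    "retry_failed": True,
    "max_retries": 3,
    "priority": {
        "enabled": True,
        "rules": [{"pattern": "*.important.com", "priority": 1}],
    },
}
urls = [
    "https://example.com/a",
    "https://shop.important.com/b",
    "https://example.com/c",
]
ordered = list(drain(build_queue(urls, queue_config)))
```

Under glob matching, "*.important.com" matches subdomains such as shop.important.com but not the bare host important.com; whether the node behaves the same way is not confirmed here.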

Error Handling

Retry Configuration

{
  "retry": {
    "max_attempts": 3,
    "codes": [429, 503],
    "delay": {
      "initial": 1000,
      "multiplier": 2,
      "max": 10000
    }
  }
}
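
The "delay" block reads like exponential backoff: start at "initial", multiply by "multiplier" after each failure, and cap at "max". The Python sketch below works through that arithmetic. It assumes the values are milliseconds and that the delay before retry n is initial * multiplier^n; both are interpretations, and backoff_delays and should_retry are hypothetical helper names.

```python
def backoff_delays(delay, retries):
    """Delay before each retry: initial * multiplier**n, capped at max."""
    return [
        min(delay["initial"] * delay["multiplier"] ** n, delay["max"])
        for n in range(retries)
    ]

def should_retry(status, attempt, retry):
    """Retry only listed status codes, and only while attempts remain."""
    return status in retry["codes"] and attempt < retry["max_attempts"]

retry = {
    "max_attempts": 3,
    "codes": [429, 503],
    "delay": {"initial": 1000, "multiplier": 2, "max": 10000},
}

schedule = backoff_delays(retry["delay"], 5)
```

For the values above the schedule is 1000, 2000, 4000, 8000, 10000: the fifth delay would be 16000 but is capped by "max". With "max_attempts" set to 3, only the first two delays would ever be used.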

Export Formats

JSON Export

{
  "export": {
    "format": "json",
    "pretty": true,
    "filename": "results.json"
  }
}

CSV Export

{
  "export": {
    "format": "csv",
    "headers": true,
    "delimiter": ",",
    "filename": "results.csv"
  }
}
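
As an illustration of what the JSON and CSV options control, the Python sketch below renders a list of scraped records with the standard json and csv modules. The render_export function and the sample record are hypothetical, and the output is returned as a string rather than written to "filename" so the example has no side effects.

```python
import csv
import io
import json

def render_export(records, export):
    """Serialize records according to an export config; returns a string."""
    if export["format"] == "json":
        # "pretty" is assumed to toggle indentation
        return json.dumps(records, indent=2 if export.get("pretty") else None)
    if export["format"] == "csv":
        buffer = io.StringIO()
        fieldnames = list(records[0]) if records else []
        writer = csv.DictWriter(
            buffer,
            fieldnames=fieldnames,
            delimiter=export.get("delimiter", ","),
            lineterminator="\n",
        )
        if export.get("headers"):
            writer.writeheader()  # "headers" is assumed to toggle the header row
        writer.writerows(records)
        return buffer.getvalue()
    raise ValueError(f"unsupported export format: {export['format']!r}")

records = [{"title": "A", "content": "x, y"}]

json_text = render_export(
    records, {"format": "json", "pretty": True, "filename": "results.json"}
)
csv_text = render_export(
    records,
    {"format": "csv", "headers": True, "delimiter": ",", "filename": "results.csv"},
)
```

Note that a field containing the delimiter, such as "x, y", is quoted in the CSV output so that it still parses as a single column.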

Progress Tracking

{
  "progress": {
    "enabled": true,
    "update_interval": 1000,
    "callbacks": {
      "onProgress": "https://api.example.com/progress",
      "onComplete": "https://api.example.com/complete"
    }
  }
}
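
One way to read this config is that progress updates are sent to the "onProgress" URL no more often than every "update_interval" milliseconds, and a final notification goes to "onComplete". The Python sketch below models only that decision logic. The ProgressReporter class, the payload shape, and the injected send and clock callables are assumptions made for illustration; how the node actually delivers callbacks is not specified above.

```python
class ProgressReporter:
    """Decides when a progress or completion callback is due."""

    def __init__(self, progress, total, send, clock):
        self.progress = progress
        self.total = total
        self.send = send    # callable(url, payload); hypothetical transport
        self.clock = clock  # callable returning the current time in milliseconds
        self.last_sent = None

    def record(self, completed):
        if not self.progress["enabled"]:
            return
        payload = {"completed": completed, "total": self.total}
        callbacks = self.progress["callbacks"]
        if completed >= self.total:
            # Completion is always reported, regardless of the interval
            self.send(callbacks["onComplete"], payload)
            return
        now = self.clock()
        interval = self.progress["update_interval"]
        if self.last_sent is None or now - self.last_sent >= interval:
            self.send(callbacks["onProgress"], payload)
            self.last_sent = now

progress = {
    "enabled": True,
    "update_interval": 1000,
    "callbacks": {
        "onProgress": "https://api.example.com/progress",
        "onComplete": "https://api.example.com/complete",
    },
}

sent = []
ticks = iter([0, 400, 1200])  # fake clock readings in milliseconds
reporter = ProgressReporter(
    progress,
    total=4,
    send=lambda url, body: sent.append((url, body)),
    clock=lambda: next(ticks),
)
for done in (1, 2, 3, 4):
    reporter.record(done)
```

In this run the update at 400 ms is suppressed because less than one interval has passed since the previous update, so two progress callbacks and one completion callback are recorded.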

Resource Management

{
  "resources": {
    "max_memory": "1GB",
    "max_cpu": 0.8,
    "cleanup_interval": 5000
  }
}
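
The limits above mix a human-readable size ("1GB") with a fraction (0.8). The Python sketch below shows how such values might be normalized and checked. It assumes binary multiples (1GB = 1024^3 bytes) and that "max_cpu" is a fraction of available CPU; neither is stated above, and parse_size and over_budget are hypothetical helpers.

```python
import re

# Assumption: units are binary multiples
UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

def parse_size(text):
    """Convert a size string such as '1GB' or '512 MB' to a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B)\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(float(match.group(1)) * UNITS[match.group(2).upper()])

def over_budget(used_bytes, cpu_fraction, resources):
    """True when either the memory or the CPU limit is exceeded."""
    return (
        used_bytes > parse_size(resources["max_memory"])
        or cpu_fraction > resources["max_cpu"]
    )

resources = {"max_memory": "1GB", "max_cpu": 0.8, "cleanup_interval": 5000}
limit_bytes = parse_size(resources["max_memory"])
```

A scheduler could call a check like over_budget before starting another request and pause new work while it returns True.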
