Extract Cards

The Extract Cards node helps you extract repeated card-style elements from web pages.

Overview

This node enables you to:

  • Extract repeated elements
  • Parse card structures
  • Handle dynamic loading
  • Process grid layouts
  • Extract media content

Configuration

Parameter       Type     Description
URL             String   Target webpage URL
Card Selector   String   CSS selector for card elements
Fields          Object   Mapping of card fields to selectors
Pagination      Object   Pagination configuration

Example Usage

Basic Card Extraction

{
  "url": "https://example.com/products",
  "card_selector": ".product-card",
  "fields": {
    "title": ".card-title",
    "price": ".price",
    "image": "img.product-image",
    "description": ".description"
  }
}
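
To make the relationship between card_selector and fields concrete, the following Python sketch mimics it on a small in-memory page using only the standard library. It is an illustration, not the node's implementation: to_xpath and extract_cards are hypothetical helpers, only ".class" and "tag.class" selectors with exact class matches are handled, and returning each field's trimmed text (or None when nothing matches) is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page standing in for https://example.com/products.
PAGE = """
<div class="grid">
  <div class="product-card">
    <h2 class="card-title"> Widget </h2>
    <span class="price">$9.99</span>
  </div>
  <div class="product-card">
    <h2 class="card-title">Gadget</h2>
    <span class="price">$19.50</span>
  </div>
</div>
"""

def to_xpath(selector):
    """Translate '.class' or 'tag.class' into the XPath subset ElementTree supports."""
    tag, _, cls = selector.partition(".")
    tag = tag or "*"
    return f".//{tag}[@class='{cls}']" if cls else f".//{tag}"

def extract_cards(root, card_selector, fields):
    """Return one dict per card; field selectors are resolved relative to the card."""
    records = []
    for card in root.findall(to_xpath(card_selector)):
        record = {}
        for name, selector in fields.items():
            element = card.find(to_xpath(selector))
            # Assumption: a missing element yields None rather than raising.
            record[name] = None if element is None else (element.text or "").strip()
        records.append(record)
    return records

cards = extract_cards(
    ET.fromstring(PAGE),
    ".product-card",
    {"title": ".card-title", "price": ".price", "description": ".description"},
)
```

Each field selector is evaluated inside its own card, which is why the two titles do not mix. The sample cards have no .description element, so that field comes back as None in both records.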

Advanced Configuration

{
  "url": "https://example.com/products",
  "card_selector": ".product-card",
  "fields": {
    "title": {
      "selector": ".card-title",
      "attribute": "text",
      "transform": "trim"
    },
    "price": {
      "selector": ".price",
      "attribute": "text",
      "transform": "number"
    },
    "image": {
      "selector": "img.product-image",
      "attribute": "src"
    },
    "rating": {
      "selector": ".rating",
      "attribute": "data-rating"
    }
  },
  "pagination": {
    "enabled": true,
    "next_button": ".pagination .next",
    "max_pages": 5
  }
}
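
The pagination block suggests a loop that follows the next_button link until it disappears or max_pages is reached. The sketch below shows only that stopping rule, under stated assumptions: SITE is a hypothetical in-memory stand-in for the paginated site, collect_pages is an illustrative helper, and how the node itself clicks or follows the button is not described here.

```python
# Hypothetical site: each page lists its cards and the URL behind its "next" button.
SITE = {
    "/products?page=1": {"cards": ["A", "B"], "next": "/products?page=2"},
    "/products?page=2": {"cards": ["C"], "next": "/products?page=3"},
    "/products?page=3": {"cards": ["D"], "next": None},
}

def collect_pages(fetch_page, start_url, max_pages):
    """Follow "next" links, stopping at the last page or after max_pages pages."""
    cards, url, visited = [], start_url, 0
    while url is not None and visited < max_pages:
        page = fetch_page(url)
        cards.extend(page["cards"])
        url = page["next"]
        visited += 1
    return cards

all_cards = collect_pages(SITE.__getitem__, "/products?page=1", max_pages=5)
first_two_pages = collect_pages(SITE.__getitem__, "/products?page=1", max_pages=2)
```

With max_pages set to 5 the loop ends early because the third page has no next link; with max_pages set to 2 it stops before reaching that page.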

Field Types

Text Content

{
  "field": {
    "selector": ".text-content",
    "attribute": "text",
    "transform": ["trim", "lowercase"]
  }
}

Images

{
  "field": {
    "selector": "img",
    "attributes": ["src", "alt"],
    "download": true
  }
}

Links

{
  "field": {
    "selector": "a",
    "attributes": ["href", "title"],
    "follow": true
  }
}
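
The attributes lists above request several attributes from a single element. Values of src and href are often relative, so a consumer typically resolves them against the page URL before downloading an image or following a link. A minimal sketch with the standard library follows; read_attributes is a hypothetical helper, and whether the node resolves URLs itself is not stated in this document.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

PAGE_URL = "https://example.com/products"

def read_attributes(element, attributes, url_attributes=("src", "href")):
    """Collect the requested attributes, resolving URL-valued ones against PAGE_URL."""
    values = {}
    for name in attributes:
        value = element.get(name)  # None when the attribute is absent
        if value is not None and name in url_attributes:
            value = urljoin(PAGE_URL, value)
        values[name] = value
    return values

# Small sample elements matching the "img" and "a" selectors.
image = ET.fromstring('<img src="/img/widget.png" alt="Widget"/>')
link = ET.fromstring('<a href="widget?ref=grid">Details</a>')

image_field = read_attributes(image, ["src", "alt"])
link_field = read_attributes(link, ["href", "title"])
```

The sample link has no title attribute, so that key is kept with a None value instead of being dropped, which keeps every record the same shape.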

Data Processing

Transformations

{
  "transformations": {
    "price": ["remove_currency", "to_number"],
    "description": ["trim", "remove_html"]
  }
}
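
As a rough picture of what such a chain does, the sketch below applies the listed transformations to one record, left to right. The function bodies are assumptions made for illustration (for example, remove_currency treats "," as a thousands separator and "." as the decimal point); the node's actual behavior may differ.

```python
import re
from html import unescape

def remove_currency(value):
    """Keep only digits, '.', and '-', dropping symbols and thousands separators."""
    return re.sub(r"[^\d.\-]", "", value)

def remove_html(value):
    """Strip tags with a simple regex, then decode entities such as &amp;."""
    return unescape(re.sub(r"<[^>]+>", "", value))

# Assumed implementations for the transformation names used in the config above.
TRANSFORMS = {
    "trim": str.strip,
    "remove_currency": remove_currency,
    "to_number": float,
    "remove_html": remove_html,
}

def apply_transformations(record, transformations):
    """Apply each field's transforms in order; unlisted fields pass through."""
    result = dict(record)
    for field, names in transformations.items():
        if result.get(field) is None:
            continue  # nothing to transform for a missing field
        for name in names:
            result[field] = TRANSFORMS[name](result[field])
    return result

raw = {
    "title": "Widget",
    "price": "$1,299.00",
    "description": "  <p>Small &amp; sturdy</p> ",
}
clean = apply_transformations(
    raw,
    {"price": ["remove_currency", "to_number"], "description": ["trim", "remove_html"]},
)
```

Order matters within a chain: to_number would fail on "$1,299.00" if it ran before remove_currency.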

Validation Rules

{
  "validation": {
    "required": ["title", "price"],
    "types": {
      "price": "number",
      "rating": "float"
    }
  }
}
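
One way to read these rules is as a check run on every extracted record. The sketch below is a hedged illustration: validate is a hypothetical helper, and the mapping of "number" and "float" to Python types is an assumption, since the document does not define the type names.

```python
# Assumed mapping from the type names used above to Python types.
TYPE_CHECKS = {
    "number": (int, float),
    "float": float,
}

def validate(record, rules):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field in rules.get("required", []):
        if record.get(field) in (None, ""):
            problems.append(f"missing required field: {field}")
    for field, type_name in rules.get("types", {}).items():
        value = record.get(field)
        if value is None:
            continue  # absence is reported by the "required" check, if applicable
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, TYPE_CHECKS[type_name]):
            problems.append(f"{field} should be {type_name}, got {type(value).__name__}")
    return problems

RULES = {"required": ["title", "price"], "types": {"price": "number", "rating": "float"}}

ok = validate({"title": "Widget", "price": 9.99, "rating": 4.5}, RULES)
bad = validate({"title": "", "price": "9.99"}, RULES)
```

Returning every problem at once, instead of raising on the first, makes it easier to see whether a selector or a transformation is at fault.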

Error Handling

Common issues to plan for:

  • Missing elements
  • Invalid selectors
  • Dynamic content
  • Rate limiting
  • Network errors
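
Rate limiting and network errors from the list above are usually transient, and a common general remedy is to retry with exponential backoff. The sketch below illustrates that pattern only; with_retries and RateLimited are hypothetical names, and this document does not say whether the node retries on its own.

```python
import time

class RateLimited(Exception):
    """Hypothetical error raised when the server answers with HTTP 429."""

def with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on a transient error wait base_delay * 2**n seconds and retry."""
    for attempt in range(attempts):
        try:
            return fetch()
        except (RateLimited, ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)

# Simulated page fetch that fails twice before succeeding.
calls = {"count": 0}

def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("connection reset")
    return "<html>...</html>"

# Record the delays instead of actually sleeping, to keep the example fast.
delays = []
page = with_retries(flaky_fetch, sleep=delays.append)
```

Missing elements and invalid selectors are not transient, so retrying does not help there; those are better caught by checking each extracted record for empty fields.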