Stat Collection

qCrawl provides a comprehensive, thread-safe, and extensible statistics system via the StatsCollector class.
It lets you monitor and record metrics throughout a crawling session.

Key Features

  • Thread-Safe: Designed to work safely with synchronous counters in an async runtime.
  • Custom Metrics: Easily define and track custom statistics relevant to your crawl.
  • Built-in Metrics: The runtime emits common metrics (request/response counts, bytes, errors).
  • Exportable: Collected statistics can be retrieved programmatically for export or display.

Default Metrics

Metric key                            Description
spider_name                           Spider name
start_time                            Time when spider opened (ISO 8601 timestamp)
finish_time                           Time when spider closed (ISO 8601 timestamp)
finish_reason                         Reason the spider stopped (finished, error, etc.)
elapsed_time_seconds                  Total runtime in seconds
scheduler/request_scheduled_count     Total URLs added to the scheduler (deduplicated adds)
scheduler/dequeued                    Counter incremented when a request is dropped/removed
downloader/request_downloaded_count   Number of requests that reached the downloader (attempted fetch)
downloader/response_status_count      Total responses received
downloader/response_status_{CODE}     Responses grouped by HTTP status (e.g. downloader/response_status_200)
downloader/bytes_downloaded           Total bytes received
pipeline/item_scraped_count           Total items yielded to pipelines
engine/error_count                    Total exceptions/errors signalled as engine errors
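
These counters can be combined into simple derived figures. The sketch below assumes a Crawler instance named crawler, as in the "Accessing Stats" examples that follow; the success_rate and avg_response_kb names are purely illustrative.

downloaded = crawler.stats.get_value("downloader/request_downloaded_count", 0)
responses = crawler.stats.get_value("downloader/response_status_count", 0)
ok = crawler.stats.get_value("downloader/response_status_200", 0)
total_bytes = crawler.stats.get_value("downloader/bytes_downloaded", 0)

# Guard against division by zero on an empty crawl
success_rate = ok / responses if responses else 0.0
avg_response_kb = total_bytes / responses / 1024 if responses else 0.0

print(f"200 OK rate: {success_rate:.1%}, average response size: {avg_response_kb:.1f} KiB")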

Accessing Stats

During crawl

async with Crawler(spider, settings) as crawler:
    await crawler.crawl()

    # Get single value (inside context manager)
    downloaded = crawler.stats.get_value("downloader/request_downloaded_count", 0)

    # Get all stats snapshot
    all_stats = crawler.stats.get_stats()
    print(f"Downloaded {downloaded} pages")

After crawl

# Store stats before crawler closes
async with Crawler(spider, settings) as crawler:
    await crawler.crawl()
    stats = crawler.stats.get_stats()  # Get stats before exiting

# Use stats after crawler is closed
downloaded_count = stats.get('downloader/request_downloaded_count', 0)
print(f"Downloaded {downloaded_count} pages")

Adding Custom Metrics

Increment Counter

crawler.stats.inc_value("custom/my_metric", count=1)
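
As the parameter name suggests, count is the amount added per call, so a single call such as crawler.stats.inc_value("custom/bytes_seen", count=len(payload)) can add several units at once (the bytes_seen key and payload variable are illustrative).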

Set Value

crawler.stats.set_counter("custom/processed_items", 42)
crawler.stats.set_meta("custom/last_run", "2025-04-05")
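
As the examples suggest, set_counter stores a numeric value, while set_meta records arbitrary metadata such as strings or dates.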

The preferred way to add custom metrics is through signals:

async def on_response(sender, response, request=None, **kwargs):
    # Count responses whose URL looks like an API endpoint
    if "api" in getattr(response, "url", ""):
        sender.stats.inc_value("api_calls", count=1)

# Connect the handler to the crawler-bound dispatcher
crawler.signals.connect("response_received", on_response)
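
The same pattern scales to richer metrics. The sketch below is illustrative (the custom/responses_per_domain key prefix is an assumption, not a built-in name); it counts responses per host using the documented response_received signal and reads a custom counter back afterwards.

from urllib.parse import urlparse

async def count_per_domain(sender, response, request=None, **kwargs):
    # Bump a per-host counter based on the response URL
    host = urlparse(getattr(response, "url", "")).netloc or "unknown"
    sender.stats.inc_value(f"custom/responses_per_domain/{host}", count=1)

crawler.signals.connect("response_received", count_per_domain)

# After the crawl, custom counters sit alongside the built-in metrics
api_calls = crawler.stats.get_value("api_calls", 0)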