Stat Collection
qCrawl provides a comprehensive, thread-safe, and extensible statistics system via the StatsCollector class.
It allows monitoring and recording various metrics during crawling sessions.
Key Features
- Thread-Safe: Designed to work safely with synchronous counters in an async runtime (see the sketch after this list).
- Custom Metrics: Easily define and track custom statistics relevant to your crawl.
- Built-in Metrics: The runtime emits common metrics (request/response counts, bytes, errors).
- Exportable: Collected statistics can be retrieved programmatically for export or display.
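Because counter updates are plain synchronous calls, multiple coroutines on the same event loop can bump the same counter without extra coordination. A minimal sketch, assuming an open crawler as in the examples below; the custom/parsed_pages key and the parse_one/parse_many helpers are hypothetical names used only for illustration:

import asyncio

async def parse_one(crawler, url):
    # ... process the page at url ...
    # inc_value is a synchronous call, so concurrent tasks on the same
    # event loop cannot interleave inside a single increment.
    crawler.stats.inc_value("custom/parsed_pages", count=1)

async def parse_many(crawler, urls):
    # Run many parses concurrently; all of them update the same counter.
    await asyncio.gather(*(parse_one(crawler, url) for url in urls))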
Default Metrics
| Metric key | Description |
|---|---|
| spider_name | Spider name |
| start_time | Time when spider opened (ISO 8601 timestamp) |
| finish_time | Time when spider closed (ISO 8601 timestamp) |
| finish_reason | Reason the spider stopped (finished, error, etc.) |
| elapsed_time_seconds | Total runtime in seconds |
| scheduler/request_scheduled_count | Total URLs added to the scheduler (deduplicated adds) |
| scheduler/dequeued | Counter incremented when a request is dropped/removed |
| downloader/request_downloaded_count | Number of requests that reached the downloader (attempted fetch) |
| downloader/response_status_count | Total responses received |
| downloader/response_status_{CODE} | Responses grouped by HTTP status (e.g. downloader/response_status_200) |
| downloader/bytes_downloaded | Total bytes received |
| pipeline/item_scraped_count | Total items yielded to pipelines |
| engine/error_count | Total exceptions/errors signalled as engine errors |
Accessing Stats
During crawl
async with Crawler(spider, settings) as crawler:
    await crawler.crawl()

    # Get single value (inside context manager)
    downloaded = crawler.stats.get_value("downloader/request_downloaded_count", 0)

    # Get all stats snapshot
    all_stats = crawler.stats.get_stats()

    print(f"Downloaded {downloaded} pages")
After crawl
# Store stats before crawler closes
async with Crawler(spider, settings) as crawler:
    await crawler.crawl()
    stats = crawler.stats.get_stats()  # Get stats before exiting

# Use stats after crawler is closed
downloaded_count = stats.get('downloader/request_downloaded_count', 0)
print(f"Downloaded {downloaded_count} pages")
Adding Custom Metrics
Increment Counter
crawler.stats.inc_value("custom/my_metric", count=1)
Set Value
crawler.stats.set_counter("custom/processed_items", 42)
crawler.stats.set_meta("custom/last_run", "2025-04-05")
The preferred way to add custom metrics is via signals:
async def on_response(sender, response, request=None, **kwargs):
    if "api" in getattr(response, "url", ""):
        sender.stats.inc_value("api_calls", count=1)

# Connect the handler to the crawler-bound dispatcher
crawler.signals.connect("response_received", on_response)
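Once responses have been processed, the custom counter can be read back like any built-in metric (inside the context manager, using get_value as shown earlier):

api_calls = crawler.stats.get_value("api_calls", 0)
print(f"API responses seen: {api_calls}")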