Command-Line Interface
You can run spiders programmatically from your Python code or through the command-line interface (CLI) that qCrawl provides. This page covers CLI usage.
Configuration precedence
qCrawl applies settings in the following order, from lowest to highest priority:
flowchart LR
A(qCrawl defaults) --> B(TOML Config file) --> C(Environment variables) --> D(CLI) --> E(Programmatic overrides)
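The layering can be pictured as a series of dictionary merges. The following is a minimal sketch, assuming the arrows run from lowest to highest priority so that each later source overrides the earlier ones. The `resolve_settings` helper and all values are hypothetical and are not part of qCrawl's API.

```python
def resolve_settings(*layers):
    """Merge setting layers in order; later layers win over earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Illustrative values only; these are not qCrawl's real defaults.
defaults = {"concurrency": 4, "max_depth": 1}
config_file = {"concurrency": 8}   # e.g. loaded from a TOML file
environment = {}                   # nothing set via environment variables
cli = {"max_depth": 3}             # e.g. --setting max_depth=3
programmatic = {"concurrency": 2}  # overrides passed from Python code

settings = resolve_settings(defaults, config_file, environment, cli, programmatic)
print(settings)  # {'concurrency': 2, 'max_depth': 3}
```

Here `concurrency` ends up with the programmatic value because that layer is applied last, while `max_depth` keeps the CLI value because no later layer touches it.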
CLI usage
The CLI is intended to run spiders with minimal setup. The basic syntax is:
qcrawl <spider> [options]
<spider> is the spider class path in the format module:ClassName. It is the only required argument.
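To illustrate what a `module:ClassName` path refers to, here is a small sketch of how such a string could be resolved to a class with the standard library. The `load_object` helper is hypothetical and only mirrors the general idea; qCrawl's own loader may behave differently.

```python
import importlib


def load_object(path: str):
    """Resolve a 'module:ClassName' string to the named attribute."""
    module_name, sep, attr = path.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:ClassName', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# A standard-library class stands in for a spider class here.
cls = load_object("collections:OrderedDict")
print(cls.__name__)  # OrderedDict
```

The module part must be importable from the directory where the command runs (or be installed), just like any other Python import.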
Spider & Crawl Settings
| Option | Type | Default | Description |
|---|---|---|---|
| spider | str | n/a | Spider path: module:Class, module.Class, or module. |
| --setting, -s | key=value | [] | Per-spider settings using key=value pairs (repeatable). Values can be JSON arrays/objects when wrapped in [...]/{...}. |
Example
# --setting used multiple times
qcrawl mymodule:MySpider \
--setting concurrency=8 \
--setting concurrency_per_domain=2 \
--setting delay_per_domain=0.5 \
--setting max_depth=3
For the full list of settings, refer to the Settings documentation.
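The sketch below shows one plausible way that repeated key=value pairs could be turned into a settings dictionary, including values wrapped in [...] or {...}. The `parse_setting` helper, the treatment of scalar values as JSON literals, and the `example_list` key are assumptions made for illustration; they do not describe qCrawl's actual parser.

```python
import json


def parse_setting(pair: str):
    """Split 'key=value' and decode the value when it looks like JSON."""
    key, sep, raw = pair.partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {pair!r}")
    if raw.startswith(("[", "{")):
        # Arrays and objects must be valid JSON; malformed input raises here.
        return key, json.loads(raw)
    try:
        # Assumption: numbers, true/false and null are read as JSON scalars.
        return key, json.loads(raw)
    except json.JSONDecodeError:
        return key, raw  # anything else stays a plain string


args = [
    "concurrency=8",
    "delay_per_domain=0.5",
    'example_list=["a", "b"]',  # hypothetical key, shown for the JSON case
]
settings = dict(parse_setting(arg) for arg in args)
print(settings)
# {'concurrency': 8, 'delay_per_domain': 0.5, 'example_list': ['a', 'b']}
```

On a real command line, JSON values usually need shell quoting so that brackets, braces, and double quotes reach the program unchanged.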
Output & Export
| Option | Type | Default | Description |
|---|---|---|---|
| --export <path> | str | stdout | Export destination (local path or - / stdout for stdout). Defaults to stdout if not specified. |
| --export-format | ndjson, json, csv, xml | ndjson | Export format. |
| --export-mode | buffered, stream | buffered | Export mode for JSON/NDJSON (buffered writes all at once, stream writes item-by-item). |
| --export-buffer-size | int | 500 | Buffer size (only used when --export-format=json and --export-mode=buffered). |
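The difference between the two export modes can be sketched as follows: buffered output is assembled first and written once, while streamed output is written one item at a time. This is an illustration of the general idea only. `export_ndjson` and `CountingWriter` are hypothetical names, the buffer size option is left out, and qCrawl's exporter may be implemented differently.

```python
import io
import json


class CountingWriter(io.StringIO):
    """In-memory text stream that counts how often write() is called."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def write(self, text):
        self.calls += 1
        return super().write(text)


def export_ndjson(items, out, mode="buffered"):
    """Write items as NDJSON, either in one write or item by item."""
    lines = (json.dumps(item) + "\n" for item in items)
    if mode == "stream":
        for line in lines:
            out.write(line)        # one write per item
    else:
        out.write("".join(lines))  # a single write for everything


items = [{"url": "https://example.com/1"}, {"url": "https://example.com/2"}]
buffered, streamed = CountingWriter(), CountingWriter()
export_ndjson(items, buffered, mode="buffered")
export_ndjson(items, streamed, mode="stream")

print(buffered.calls, streamed.calls)              # 1 2
print(buffered.getvalue() == streamed.getvalue())  # True
```

Both modes produce the same bytes. Streaming makes items visible as soon as they are scraped, whereas buffering trades that for fewer writes.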
Configuration File
| Option | Type | Default | Description |
|---|---|---|---|
| --settings-file | str | None | Load spider settings from TOML (merged with --setting args). |
Logging & Debugging
| Option | Type | Default | Description |
|---|---|---|---|
| --log-level | str | INFO | Logging verbosity (choices: DEBUG, INFO, WARNING, ERROR, CRITICAL). |
| --log-file | str | None | Write logs to file. |
Help & Version
| Option | Type | Default | Description |
|---|---|---|---|
| --version | flag | n/a | Print qCrawl version and exit. |
| --help | flag | n/a | Show help grouped by sections and exit. |