Requests and Responses
qCrawl uses Request and Page (response) objects to handle HTTP interactions during the crawling process.
Request
Request is the canonical object for instructing the crawler what to fetch. It contains url, headers, priority,
immutable body (bytes), and a meta dict for crawler, spider, middleware, and metadata.
Serialization
Request.to_bytes()serializes to a compact binary format using Msgspec.Request.from_bytes()orRequest.from_dict()deserialize from bytes/dict; both methods validate types and raiseTypeErrorfor malformed input.
Creating and yielding requests
A Request with custom headers is yielded from a spider.
from qcrawl.core.request import Request
req = Request(
url="https://example.com/page",
headers={"X-Custom-Header": "value"},
priority=5,
meta={"depth": 2}
)
yield req # from spider's parse(), start_requests(), or middleware
Accessing response data
Inside Spider.parse(), you receive a Page object representing the HTTP response:
async def parse(self, response: Page):
url = response.url
status = response.status_code
content = response.content # bytes
headers = response.headers
request = response.request # the original Request object
# Process the response content...
Page (response)
Page is the response object produced by the Downloader and passed to Spider.parse().
It provides attributes (content, status_code, headers, url, request) and methods (text(), json()).
Creating Page objects (manually)
You typically do not create Page objects yourself; they are produced by the downloader. For unit tests or middleware
mocks you can construct one explicitly:
from qcrawl.core.response import Page
fake_response = Page(
url="https://example.com/fake",
status_code=200,
content=b"<html>...</html>",
headers={"Content-Type": "text/html"},
request=original_request # optional, link to the originating Request
)
Best practices
- Use
Request.copy()before mutatingmetaorheadersin middleware or spider code to avoid accidental shared-mutation bugs. - Use
metaonly for crawler-internal flags (depth tracking, retry counters, etc.). Never store scraped data here — useIteminstead.