Help

The Web Crawl Configuration screen controls how your website is crawled and indexed for use by the AI.

The screen is organised into four tabs. This section of the help documentation mirrors those tabs:

  • General — Enable/disable the crawler, politeness delay, user agent, and HTTP→HTTPS conversion
  • Crawl Scope — Seed URLs, maximum pages, include/exclude patterns, URL parameters, and robots.txt
  • Image Extraction — AI auto-detection, image URL patterns, and XPaths for page thumbnails
  • Field Mappings — How to extract and transform metadata (title, description, image, date, etc.)

After you save changes, the configuration is written to your ingest config. Depending on what you changed, you may be prompted to trigger an index or HTML-processing job so that updates take effect.
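
For orientation, here is a minimal sketch of the kind of settings the four tabs write into the ingest config when you save. This is an illustrative assumption only: the key names, structure, and values below are hypothetical and the actual ingest config format may differ.

    # Illustrative sketch only -- key names and layout are assumed, not the real format.
    web_crawl_config = {
        "general": {
            "enabled": True,                      # enable/disable the crawler
            "politeness_delay_seconds": 1.0,      # pause between requests
            "user_agent": "ExampleCrawler/1.0",
            "force_https": True,                  # convert HTTP links to HTTPS
        },
        "crawl_scope": {
            "seed_urls": ["https://www.example.com/"],
            "max_pages": 5000,
            "include_patterns": ["/docs/*"],
            "exclude_patterns": ["/search*"],
            "respect_robots_txt": True,
        },
        "image_extraction": {
            "ai_auto_detect": True,
            "image_url_patterns": ["*/assets/images/*"],
            "thumbnail_xpaths": ["//meta[@property='og:image']/@content"],
        },
        "field_mappings": {
            "title": "//title/text()",
            "description": "//meta[@name='description']/@content",
            "date": "//meta[@property='article:published_time']/@content",
        },
    }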
