Airgentic Help
This module explains how content gets into Airgentic and how the AI finds relevant information when answering questions. Read this to understand content indexing and search configuration, or to troubleshoot why certain answers aren't working.
Content flows through several stages before the AI can use it:
Source → Crawl/Upload → Processing → Indexing → Search → Answer
Understanding this pipeline helps you diagnose problems at the right stage.
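As a mental model, the stages above can be sketched in code. This is illustration only — the stage names come from this page, but the symptom descriptions and the helper function are hypothetical, not part of Airgentic:

```python
# The pipeline stages named above, paired with the typical symptom
# when that stage is the one failing (symptoms are illustrative).
PIPELINE = ["Source", "Crawl/Upload", "Processing", "Indexing", "Search", "Answer"]

SYMPTOMS = {
    "Source": "content on the website itself is wrong or missing",
    "Crawl/Upload": "the page never reaches Airgentic (scope or exclude patterns)",
    "Processing": "metadata such as title or date is extracted incorrectly",
    "Indexing": "content exists but is not searchable yet",
    "Search": "the right page exists but ranks below a wrong one",
    "Answer": "the AI retrieved the page but answered poorly",
}

def earliest_failing_stage(failing_stages):
    """Return the first failing stage in pipeline order.

    Diagnose upstream first: a fix at a later stage cannot help
    if an earlier stage is broken.
    """
    for stage in PIPELINE:
        if stage in failing_stages:
            return stage
    return None
```

For example, if both Indexing and Search look broken, fix Indexing first: `earliest_failing_stage({"Search", "Indexing"})` returns `"Indexing"`.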
The web crawler automatically fetches and processes pages from your website.
What you control:
- Seed URLs — Starting points for the crawl
- Crawl scope — Include/exclude patterns that determine which pages are crawled
- Politeness delay — How long the crawler waits between successive page requests
- Field mappings — How metadata is extracted from pages
When content updates:
Content changes on your website aren't automatically reflected. You must run a data sync to re-crawl and re-index.
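To see how seed URLs, scope, maximum pages, and the politeness delay interact, here is a minimal breadth-first crawl sketch. All parameter names and the `fetch` callback are assumptions for illustration — this is not Airgentic's crawler API:

```python
import time

def crawl(seed_urls, fetch, politeness_delay=1.0, max_pages=100,
          in_scope=lambda url: True):
    """Minimal breadth-first crawl sketch.

    `fetch(url)` must return (html, outlinks). `in_scope` stands in
    for include/exclude pattern matching.
    """
    queue = list(seed_urls)          # seed URLs are the starting points
    seen = set(seed_urls)
    fetched = []
    while queue and len(fetched) < max_pages:
        url = queue.pop(0)
        html, outlinks = fetch(url)
        fetched.append(url)
        for link in outlinks:
            if link not in seen and in_scope(link):
                seen.add(link)
                queue.append(link)
        time.sleep(politeness_delay)  # politeness delay between requests
    return fetched
```

Raising `politeness_delay` slows the crawl but reduces load on your web server; lowering `max_pages` caps the crawl on very large sites.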
Related: Crawler Settings, Data Sync
For content not available via web crawl, upload files directly.
Supported formats:
- PDF documents
- Microsoft Word (.doc, .docx)
- HTML files
Public vs secure:
- Public documents — Anyone can view citations
- Secure documents — Only authenticated users can view citations
When content updates:
After uploading, click Index Documents to make new content searchable.
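A pre-upload check against the supported formats listed above can be sketched as follows. The `.htm` extension is an assumption (the page says "HTML files" without listing extensions); everything else comes from the list:

```python
from pathlib import Path

# Formats this page lists as supported for direct upload.
# ".htm" is assumed as a variant of "HTML files".
SUPPORTED_EXTENSIONS = {".pdf", ".doc", ".docx", ".html", ".htm"}

def is_uploadable(filename: str) -> bool:
    """True if the file's extension is in the supported set."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

Remember that passing this check only means the file can be uploaded — it is not searchable until you click Index Documents.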
Related: Upload Documents
When a user asks a question, several factors determine whether the right content is found:
| Factor | Impact |
|---|---|
| Content coverage | If the information isn't indexed, it can't be found |
| Content quality | Well-structured pages with clear text rank better |
| Query matching | The AI's query must match how content is written |
| Category configuration | Boosts and mappings affect ranking |
| Synonym rules | Help match user terminology to content terminology |
Categories group indexed pages into named buckets (e.g., Products, Support, News).
AI Auto-Categorization: When enabled, Airgentic analyses pages and assigns categories automatically during indexing.
Manual Mappings: Human-defined rules that match pages based on URL patterns or metadata.
Category boosts: Adjust ranking for entire categories. A boost of +3 promotes pages in that category higher in results.
When to use categories:
- When you want to label result types in the widget
- When you want to boost certain content types
- When you want agents to search specific categories
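The effect of a category boost can be sketched as a re-ranking step. Additive scoring is an assumption for illustration — Airgentic's actual ranking formula is not documented here:

```python
def apply_category_boosts(results, boosts):
    """Re-rank search results by adding each result's category boost
    to its base relevance score (additive model assumed).

    `results` is a list of dicts with "category" and "score" keys;
    `boosts` maps category name -> boost value, e.g. {"Products": 3}.
    """
    return sorted(
        results,
        key=lambda r: r["score"] + boosts.get(r["category"], 0),
        reverse=True,
    )
```

With a +3 boost on Products, a Products page scoring 3 (3 + 3 = 6) would outrank a News page scoring 5 — which is why boosts should be small and deliberate.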
Synonym rules modify user queries before search:
| Rule type | Effect |
|---|---|
| Add | Search for both terms (X and Y) |
| Replace | Search for Y instead of X |
| Delete | Remove X from the query |
Examples:
- Add: "FAQ" → also search "frequently asked questions"
- Replace: "T&Cs" → "terms and conditions"
- Delete: "pdf" (removes noise word)
When to use synonyms:
- Users search for abbreviations not in your content
- Your content uses different terminology than your users do
- Common words are cluttering searches
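The three rule types can be sketched as a query-rewriting step. The `(rule_type, term, replacement)` tuple format is an illustrative data model, not Airgentic's stored rule format:

```python
import re

def apply_synonym_rules(query, rules):
    """Rewrite a raw query using Add / Replace / Delete rules.

    Each rule is (rule_type, term, replacement):
      add     -> if `term` appears, append `replacement` (search both)
      replace -> substitute `replacement` for `term`
      delete  -> remove `term` from the query
    """
    for rule_type, term, replacement in rules:
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        if rule_type == "add" and pattern.search(query):
            query = f"{query} {replacement}"
        elif rule_type == "replace":
            query = pattern.sub(replacement, query)
        elif rule_type == "delete":
            query = pattern.sub("", query)
    return " ".join(query.split())  # collapse leftover whitespace
```

For instance, with an Add rule on "FAQ" and a Delete rule on "pdf", the query "FAQ pdf" becomes "FAQ frequently asked questions".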
Agents can be restricted to search only specific URL prefixes:
https://www.example.com/products/
https://www.example.com/support/
Use this to ensure specialist agents only see relevant content.
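The restriction amounts to a prefix check on result URLs. A minimal sketch, using the example prefixes above (the function name and mechanics are illustrative):

```python
ALLOWED_PREFIXES = [
    "https://www.example.com/products/",
    "https://www.example.com/support/",
]

def in_agent_scope(url, prefixes=ALLOWED_PREFIXES):
    """True if an agent restricted to `prefixes` may see this URL."""
    return any(url.startswith(prefix) for prefix in prefixes)
```

A products agent configured this way would see `https://www.example.com/products/widget` but never a page under `/news/`.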
Related: Search Settings
Control which pages the crawler fetches:
| Setting | Purpose |
|---|---|
| Seed URLs | Starting points for the crawl |
| Maximum pages | Limit how many pages are crawled |
| Include patterns | URLs that should be crawled (regex) |
| Exclude patterns | URLs that should not be crawled (regex) |
| URL parameters | How to handle query parameters |
Common configurations:
- Crawl only specific sections of a large site
- Exclude login pages, print versions, or redundant content
- Handle pagination correctly
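Include and exclude patterns can be sketched as a scope check. The precedence shown here (exclude wins over include) is an assumption for illustration; verify the actual behaviour against your crawler settings:

```python
import re

def in_crawl_scope(url, include_patterns, exclude_patterns):
    """True if `url` matches at least one include pattern and no
    exclude pattern (exclude takes precedence — assumed)."""
    if any(re.search(p, url) for p in exclude_patterns):
        return False
    return any(re.search(p, url) for p in include_patterns)
```

For example, `include_patterns=[r"^https://www\.example\.com/docs/"]` with `exclude_patterns=[r"/print/"]` crawls the docs section while skipping print versions.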
Control how metadata is extracted from pages:
| Field | What it's used for |
|---|---|
| Title | Document title in search results and citations |
| Description | Summary text |
| Image | Thumbnail in search results |
| Date | Publication date for filtering or display |
| Custom fields | Product model, category, etc. for filtering |
Field mappings use XPath or selectors to locate content in the page HTML.
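To make the idea concrete, here is a stdlib-only sketch that pulls the Title and Description fields out of page HTML. Real field mappings use XPath or selectors as stated above; this hand-rolled parser is illustration only:

```python
from html.parser import HTMLParser

class FieldExtractor(HTMLParser):
    """Extract Title (from <title>) and Description (from
    <meta name="description">) — a toy stand-in for field mappings."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "description":
            self.fields["Description"] = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.fields["Title"] = data.strip()

def extract_fields(html):
    parser = FieldExtractor()
    parser.feed(html)
    return parser.fields
```

If a field mapping points at the wrong element, the symptom is exactly what you would see here with a missing tag: the field simply comes back empty in search results.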
Related: Crawler Settings Overview
Answers contain outdated information.
Check:
1. Is the content on your website correct and current?
2. When was the last data sync? (Check Data Sync status)
3. Is the page being crawled? (Check crawler logs)
Fix:
- Update the website content
- Run a data sync to re-index
Content isn't being found at all.
Check:
1. Is the page within the crawler's scope?
2. Is the page blocked by an exclude pattern, or not matched by any include pattern?
3. For uploaded documents, was Index Documents clicked?
Fix:
- Adjust crawler scope to include the content
- Upload the document if it's not web-accessible
- Run indexing after uploads
The AI is answering from the wrong page.
Check:
1. Use Admin Chat Trace Log to see which pages were retrieved
2. Check relevance scores — is the wrong page ranking higher?
3. Is there duplicate or conflicting content?
Fix:
- Add a curated answer to pin the correct source
- Adjust category boosts to favour the right content
- Clean up duplicate content on your website
User phrasing doesn't match how the content is written.
Check:
1. How does the user phrase the question vs how content is written?
2. Are there abbreviations or alternate terms users might use?
Fix:
- Add synonym rules
- Update content to use user-friendly terminology
- Add curated answers for specific phrasings
Back to: Optional Deep-Dives