Set up an AgentticAI knowledge base that actually answers correctly.
A grounded assistant is only as good as the sources behind it. This guide walks through the trade-offs between documents, websites, snippets, and Q&A pairs — and how to test before you publish.
Guide
The pattern that works: start with the top 20 questions, layer documents for depth, add Q&A pairs for answers that have to be exact, and test before turning the widget on.
How should I set up a knowledge base for my first AI assistant?
Start from the top 20 questions you already get, not from your full document library. Map each question to the source that should answer it, then load that source. The library can grow from there.
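One way to keep that mapping honest is to write it down as data before you load anything. The sketch below is only an illustration: the questions, the source labels, and the two helper functions are hypothetical placeholders, not part of any AgentticAI API.

```python
from collections import Counter

# Hypothetical test set: each top question is mapped to the one source
# that should answer it. None means no source has been identified yet.
QUESTION_SOURCES = {
    "What is your refund policy?": "qa:refund-policy",
    "What are your opening hours?": "snippet:hours",
    "How do I reset my password?": "doc:support-handbook.pdf",
    "How do I export my data?": "doc:support-handbook.pdf",
    "Do you offer on-site training?": None,
}


def sources_to_load(mapping):
    """Return each distinct source with the number of questions it answers."""
    return Counter(src for src in mapping.values() if src is not None)


def unmapped_questions(mapping):
    """Return questions that still have no source assigned."""
    return [q for q, src in mapping.items() if src is None]


if __name__ == "__main__":
    for source, count in sources_to_load(QUESTION_SOURCES).most_common():
        print(f"{source}: answers {count} question(s)")
    for question in unmapped_questions(QUESTION_SOURCES):
        print(f"NO SOURCE YET: {question}")
```

The distinct sources are your first load list. Any question without a source is a gap to close before you upload more material.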
What goes wrong without this approach
Uploading everything up front drowns retrieval
Indexing the entire document library on day one tends to make retrieval less precise, not more. Outdated, duplicate, and off-topic passages compete with the right one, so the assistant pulls less relevant context.
Prompt tweaks cannot fix bad sources
If the answer is wrong because the source is wrong, no amount of prompt engineering rescues it. Fix the source first.
Untested assistants embarrass you on day one
The most common production issue is shipping an assistant that was never asked the top 20 questions. Your first real visitors will ask them, and they will find the gaps for you.
What the solution includes
When to use documents
Long-form content where the answer needs context.
- Policy PDFs, product manuals, support handbooks.
- Pricing and packaging documents.
- Anything that has a definitive version someone already maintains.
When to use website sources
Content you already keep current on the public web.
- Help center articles.
- Documentation sites.
- Service or product pages.
Set a small crawl scope first, and recrawl on a schedule if your plan allows it.
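A small crawl scope usually comes down to one host plus a short allowlist of path prefixes. The sketch below shows that idea in plain Python so you can check a URL list before you submit it. The host, the prefixes, and the `in_crawl_scope` helper are assumptions made for illustration; they do not describe how the product's crawler is configured.

```python
from urllib.parse import urlparse

# Hypothetical scope: one host and the path prefixes that contain answers.
ALLOWED_HOST = "www.example.com"
ALLOWED_PREFIXES = ("/help/", "/docs/", "/services/")


def in_crawl_scope(url, host=ALLOWED_HOST, prefixes=ALLOWED_PREFIXES):
    """Return True if the URL is on the allowed host and under an allowed path."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return False
    if parts.hostname != host:
        return False
    # str.startswith accepts a tuple and matches any of its prefixes.
    return parts.path.startswith(prefixes)


if __name__ == "__main__":
    candidates = [
        "https://www.example.com/help/refunds",
        "https://www.example.com/blog/launch-party",
        "https://careers.example.com/help/benefits",
    ]
    for url in candidates:
        print("crawl" if in_crawl_scope(url) else "skip ", url)
```

Starting from an allowlist rather than a blocklist keeps the default answer "skip", which matches the advice to start small and widen later.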
When to use snippets
Short reusable facts you want the assistant to nail every time.
- Hours, locations, contact details.
- Disclaimers and policy statements.
- Standard responses to common edge cases.
When to use Q&A pairs
High-confidence answers where the wording matters.
- Refund policy phrased the way Legal approved.
- Pricing answers that have to match the website exactly.
- Escalation rules and handoff phrasing.
Setup order
The order that gets you live fastest with the fewest surprises.
1. Write the top 20 questions on a page. This is the test set.
2. Match each question to the source type that should answer it.
3. Upload documents and crawl websites for broad coverage.
4. Add Q&A pairs for answers that must be exact.
5. Run all 20 in the playground. Fix anything that is off.
6. Publish.
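The test pass itself happens in the playground. If you also want a repeatable record of it, the test set can be kept as a small script. This is a rough sketch under stated assumptions: `ask` is a hypothetical callable that returns the assistant's answer as text (this guide does not say whether programmatic access exists), and `fake_ask` is a stand-in with canned answers so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class QuestionCheck:
    question: str
    # Phrases the answer must contain, compared case-insensitively.
    must_include: list = field(default_factory=list)


def run_test_set(checks, ask):
    """Ask every question and return (question, missing_phrases) for each failure."""
    failures = []
    for check in checks:
        answer = ask(check.question).lower()
        missing = [p for p in check.must_include if p.lower() not in answer]
        if missing:
            failures.append((check.question, missing))
    return failures


def fake_ask(question):
    """Hypothetical stand-in for the assistant; swap in answers from your own setup."""
    canned = {
        "What is your refund policy?": "Refunds are available within 30 days of purchase.",
        "What are your opening hours?": "We are open on weekdays.",
    }
    return canned.get(question, "")


CHECKS = [
    QuestionCheck("What is your refund policy?", ["30 days"]),
    QuestionCheck("What are your opening hours?", ["9am", "5pm"]),
]

if __name__ == "__main__":
    for question, missing in run_test_set(CHECKS, fake_ask):
        print(f"FIX: {question} (missing: {', '.join(missing)})")
```

Phrase checks only catch missing facts. They do not replace reading the answers yourself, especially for anything Legal has approved word for word.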
Common mistakes
Patterns that cost time later.
Indexing the entire SharePoint folder on day one.
Crawling the marketing site without scope limits.
Skipping snippet-level facts and hoping documents will cover them.
Publishing without testing the top 20.
When to refresh
Sources go stale. Plan for it.
Pricing changes: update Q&A pairs the same day.
Policy or product changes: refresh the affected documents.
Website updates: schedule a recrawl.
Quarterly review of all sources and the top 20 test set.
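Most refreshes are triggered by an event such as a price change. A simple age check can act as a backstop for the quarterly review. In the sketch below, the source names, dates, and intervals are all illustrative assumptions: 90 days stands in for "quarterly", and the other values are placeholders to adjust.

```python
from datetime import date, timedelta

# Hypothetical review intervals per source type, in days.
# 90 days approximates the quarterly review; the others are illustrative.
MAX_AGE_DAYS = {"qa": 30, "snippet": 90, "document": 90, "website": 7}

# Hypothetical inventory: name -> (source type, date of last refresh).
SOURCES = {
    "pricing-qa": ("qa", date(2024, 1, 10)),
    "support-handbook": ("document", date(2024, 1, 5)),
    "help-center": ("website", date(2024, 2, 1)),
}


def stale_sources(sources, today):
    """Return names of sources whose last refresh is older than their interval."""
    stale = []
    for name, (kind, last_refreshed) in sources.items():
        if today - last_refreshed > timedelta(days=MAX_AGE_DAYS[kind]):
            stale.append(name)
    return stale


if __name__ == "__main__":
    for name in stale_sources(SOURCES, date.today()):
        print(f"Refresh due: {name}")
```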
How teams get to value
Build the top 20 question list before you upload anything.
Load the smallest set of sources that covers those 20.
Test in the playground until every question has a useful, grounded answer.
Publish, monitor, and add sources as new question patterns appear.
What you can measure
The top 20 questions get useful, cited answers before the public widget goes live.
Every assistant answer maps to an approved source you can point at.
A clear refresh cadence so the assistant does not drift over time.
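The second measure can be turned into a number: the share of test answers whose citations all point at approved sources. The sketch below assumes you have recorded, by hand or otherwise, which sources each answer cited. The result format and the source labels are hypothetical.

```python
# Hypothetical approved-source list and recorded test results.
# How citations are exported from the assistant is an assumption.
APPROVED = {"qa:refund-policy", "snippet:hours", "doc:support-handbook.pdf"}

RESULTS = [
    {"question": "What is your refund policy?", "cited": ["qa:refund-policy"]},
    {"question": "What are your opening hours?", "cited": ["snippet:hours"]},
    {"question": "How do I export my data?", "cited": ["web:/blog/old-post"]},
    {"question": "Do you offer on-site training?", "cited": []},
]


def grounded(result, approved=APPROVED):
    """Grounded means at least one citation, and every citation is approved."""
    cited = result["cited"]
    return bool(cited) and all(src in approved for src in cited)


def grounded_rate(results):
    """Fraction of answers that are grounded, from 0.0 to 1.0."""
    if not results:
        return 0.0
    return sum(grounded(r) for r in results) / len(results)


if __name__ == "__main__":
    print(f"Grounded answers: {grounded_rate(RESULTS):.0%}")
    for r in RESULTS:
        if not grounded(r):
            print(f"Check sources for: {r['question']}")
```

With the sample data, two of the four answers are grounded. The other two are the ones to fix before the widget goes live.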
Questions teams ask
How many documents should I start with?
Three to ten documents cover most assistants. Start with the ones that map directly to the top 20 questions, not the full library.
Should I crawl the whole company website?
No. Pick the specific paths that contain answers — help center, service pages, FAQ. Broad crawls reduce retrieval precision.
When are Q&A pairs better than documents?
When wording matters. Refund policy, pricing, and escalation language should be in Q&A pairs even if they are also in documents.
How often should I retest?
Every time a source changes, and at least quarterly otherwise.
Need a second pair of eyes on your knowledge base?
Bring your top 20 questions to a working session. We will walk through source choices, indexing, and the test pass with you.