An AI knowledge base can look useful after five minutes.
You upload documents, ask a few questions, and the assistant gives clear answers. That does not mean it is ready for a team.
Before people start using it for research, reporting, public-sector work, donor documents, internal policy, or project delivery, the knowledge base needs to be tested. It needs to answer from the right sources. It needs to show where answers came from. It needs to handle gaps, old documents, sensitive material, and unclear questions properly.
QA is the step between “the assistant works” and “the team can use this responsibly.”
Who this guide is for
This guide is for research, reporting, public-sector, donor-funded, policy, project, and internal knowledge teams preparing an AI knowledge base for team use.
What AI knowledge base QA means
An AI knowledge base is a controlled set of source material that an AI assistant can use when answering questions. That source material might include reports, policies, case studies, transcripts, manuals, submissions, meeting notes, evidence tables, spreadsheets, or internal project documents.
AI knowledge base QA is the process of checking whether that system is ready for real users.
It tests whether:
- the right sources are included
- the wrong sources are excluded
- documents are readable and complete
- answers are grounded in source material
- citations or source references are accurate
- sensitive information is handled properly
- users understand the tool’s limits
- failed answers are logged and fixed
- human review stays in the workflow
This is not only technical testing. It is also evidence QA, source control, user-readiness checking, and workflow design.
A knowledge base is not ready just because it can produce a clear answer. It is ready when the team has tested how that answer was produced, where it came from, and whether a person can check it.
Why QA matters before launch
An AI knowledge base can fail quietly.
It may sound confident while:
- using an outdated document
- missing the strongest source
- retrieving the wrong section
- blending several sources incorrectly
- citing a source that does not support the answer
- ignoring contradictions
- summarising sensitive material too freely
- treating draft material as final
- answering when it should say it does not know
This matters more in evidence-heavy work because the outputs may feed into reports, donor updates, policy notes, research synthesis, public-sector documents, internal decisions, or client-facing material.
A good answer is not only well-written. It must be checkable.
If the answer cannot be traced back to approved source material, it should not be treated as reliable evidence.
When a knowledge base needs formal QA
A small, low-risk knowledge base may only need a simple checklist. A higher-risk knowledge base needs stronger testing before launch.
Formal QA is worth doing when:
- the knowledge base will be used by several people
- the material is sensitive or confidential
- outputs will support reports, decisions, or client-facing work
- users may rely on citations, summaries, or extracted evidence
- there are many documents or versions
- documents include technical, legal, policy, research, or donor material
- different user roles need different access
- the system will support research, policy, donor reporting, or public-sector work
- the team will use it repeatedly, not just once
The question is not only “how many documents are in the knowledge base?” A small source set can still be high-risk if it includes confidential client records, children’s data, internal review notes, safeguarding material, or unpublished findings.
Start by confirming the approved use cases
The first QA question is not “does the AI work?”
It is: “what is this tool allowed to do?”
Approved use cases might include:
- finding relevant documents
- summarising approved source material
- comparing two reports
- extracting themes from a source set
- drafting internal notes
- preparing first-pass briefing points
- helping users locate quotes or evidence
- answering questions about project documents
- supporting report drafting from approved material
Unapproved use cases might include:
- making final recommendations
- deciding policy positions
- producing final donor report language without review
- answering from memory when no source exists
- using sensitive personal information without proper safeguards
- giving legal, medical, financial, safeguarding, or formal policy advice
- replacing subject-matter review
This step matters because QA needs a target. You cannot test whether a knowledge base is ready unless you know what “ready” means.
A knowledge base built for document retrieval needs different testing from one used to support report drafting. A tool used by an internal project team needs different controls from one used across a whole organisation.
Check the source material first
Before testing the answers, check the inputs.
A knowledge base cannot retrieve a missing source. It cannot reliably use a scanned PDF that has not been read properly. It cannot distinguish final documents from old drafts unless the source base makes that distinction clear.
Source material QA should ask:
| Check | Why it matters |
|---|---|
| Are the right documents included? | The AI cannot retrieve what is missing. |
| Are outdated files excluded or clearly marked? | Old versions can produce wrong answers. |
| Are duplicates controlled? | Duplicate documents can confuse retrieval and citations. |
| Are file names clear? | Users and reviewers need to recognise sources. |
| Are source IDs used? | Answers need to connect back to the source register. |
| Are PDFs readable? | Scanned or poorly extracted files may be missed. |
| Are tables and annexures usable? | Important evidence often sits outside body text. |
| Are translations checked? | Poor translations can distort meaning. |
| Are sensitive files excluded or restricted? | Not every document belongs in an AI tool. |
| Is there a source register? | The team needs a controlled inventory of the source base. |
This is where many AI knowledge base problems start.
Teams often upload a folder and treat the upload as the build. But a folder is not a source base. A controlled source base needs document names, source IDs, dates, versions, permissions, sensitivity flags, and clear inclusion rules.
For serious work, the source register should come before the AI layer.
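The register fields listed above can be held in a small structure and checked before any documents reach the AI layer. This is a minimal Python sketch, not a prescribed schema: the field names, the `SRC-` ID format, the status and sensitivity labels, and the three checks are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SourceRecord:
    source_id: str
    file_name: str
    version: str
    doc_date: date
    status: str        # assumed labels: "final" or "draft"
    sensitivity: str   # assumed labels: "public", "internal", "restricted"
    approved: bool


def register_problems(records):
    """Return human-readable problems found in a source register."""
    problems = []
    seen_ids = set()
    for r in records:
        if r.source_id in seen_ids:
            problems.append(f"{r.source_id}: duplicate source ID")
        seen_ids.add(r.source_id)
        if r.status == "draft":
            problems.append(f"{r.source_id}: draft included, exclude or mark clearly")
        if r.sensitivity == "restricted" and not r.approved:
            problems.append(f"{r.source_id}: restricted file without approval")
    return problems


# Hypothetical register entries, for illustration only.
records = [
    SourceRecord("SRC-001", "strategy_final.pdf", "1.0", date(2024, 3, 1), "final", "internal", True),
    SourceRecord("SRC-002", "strategy_draft.pdf", "0.4", date(2023, 11, 12), "draft", "internal", True),
    SourceRecord("SRC-003", "case_notes.pdf", "1.0", date(2024, 1, 20), "final", "restricted", False),
]

for problem in register_problems(records):
    print(problem)
```

The same checks work just as well as spreadsheet filters. The point is that the register is reviewed before the upload, not after.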
Test retrieval before testing writing
A polished AI answer can hide weak retrieval.
Before judging whether the answer sounds good, check whether the AI found the right material.
Retrieval tests should check:
- Does it find the right document?
- Does it find the right section?
- Does it miss important sources?
- Does it retrieve irrelevant documents?
- Does it handle different phrasing?
- Does it handle acronyms and project terms?
- Does it cope with long documents?
- Does it distinguish draft and final documents?
- Does it identify when the answer is not in the knowledge base?
A useful test is to ask the same question in three different ways.
For example:
- “What does the strategy say about district-level coordination?”
- “Find the section on coordination between districts.”
- “Where is district coordination discussed?”
If each version retrieves different or weaker sources, the knowledge base may not be ready. The problem may be document preparation, metadata, chunking, naming, acronyms, or the way the assistant has been instructed to search.
The answer is only as strong as the retrieval behind it.
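The three-phrasing test can be scripted once the team has some way to see which sources were retrieved for a question. The sketch below assumes a `retrieve` function that returns a list of source IDs. That function is hypothetical, and `fake_retrieve` is only a toy stand-in that matches the exact word "district", so the example can run without any real platform.

```python
import string


def retrieval_consistency(retrieve, phrasings, expected_source_id):
    """For each phrasing, record whether the expected source was retrieved."""
    return {q: expected_source_id in retrieve(q) for q in phrasings}


def fake_retrieve(question):
    # Toy stand-in for a real retriever: matches the exact word "district" only.
    cleaned = question.lower().replace("-", " ")
    words = cleaned.translate(str.maketrans("", "", string.punctuation)).split()
    return ["SRC-001"] if "district" in words else ["SRC-007"]


phrasings = [
    "What does the strategy say about district-level coordination?",
    "Find the section on coordination between districts.",
    "Where is district coordination discussed?",
]

results = retrieval_consistency(fake_retrieve, phrasings, "SRC-001")
for question, found in results.items():
    print("OK  " if found else "MISS", question)
```

Here the second phrasing misses because the toy retriever does not handle the plural "districts". A real system will fail for different reasons, but the test shape is the same: one expected source, several phrasings, and a record of which ones miss.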
Test answer quality
Once retrieval has been checked, test the answer itself.
Answer QA should ask:
- Is the answer accurate?
- Is it specific enough?
- Is it too broad?
- Does it overclaim?
- Does it mention the source?
- Does it distinguish fact, summary, interpretation, and recommendation?
- Does it explain uncertainty where needed?
- Does it avoid unsupported details?
- Does it follow the required format?
- Would a subject-matter reviewer accept it?
A simple scoring table can help.
| Field | Rating |
|---|---|
| Accuracy | Pass / needs review / fail |
| Source support | Strong / partial / weak / none |
| Completeness | Complete / partial / missing key points |
| Citation quality | Correct / incomplete / wrong |
| Risk level | Low / medium / high |
| Reviewer decision | Approve / revise / reject |
For evidence-heavy teams, “sounds right” is not enough. A useful answer must be accurate, grounded, reviewable, and clear about its limits.
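If reviews are recorded in a spreadsheet or script, it helps to reject ratings that fall outside the agreed scale, so results stay comparable across reviewers. A small sketch using the rating values from the table above follows; the field names and the lower-case spelling are assumptions.

```python
ALLOWED_RATINGS = {
    "accuracy": {"pass", "needs review", "fail"},
    "source_support": {"strong", "partial", "weak", "none"},
    "completeness": {"complete", "partial", "missing key points"},
    "citation_quality": {"correct", "incomplete", "wrong"},
    "risk_level": {"low", "medium", "high"},
    "reviewer_decision": {"approve", "revise", "reject"},
}


def invalid_fields(review):
    """Return the fields whose rating is missing or not on the agreed scale."""
    return [
        field
        for field, allowed in ALLOWED_RATINGS.items()
        if review.get(field) not in allowed
    ]


# Hypothetical review row with one off-scale rating.
review = {
    "accuracy": "pass",
    "source_support": "partial",
    "completeness": "complete",
    "citation_quality": "mostly right",
    "risk_level": "medium",
    "reviewer_decision": "revise",
}

print(invalid_fields(review))
```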
Check citation and source traceability
A knowledge base should make it possible to move from the answer to the cited source, from the cited source to the source section, and from the source section to the original document.
For evidence-heavy work, the AI should not only produce a useful answer. It should show enough source context for a human to check it.
Citation checks should ask:
- Does the cited source exist?
- Does the citation support the claim?
- Is the AI citing the right document version?
- Is the page, section, row, or excerpt clear enough?
- Does the answer rely on one source or several?
- Is the citation being used as decoration rather than support?
- Can the reviewer open the original source and confirm the answer?
This is one of the most important QA steps.
A citation is not automatically proof. Sometimes the cited document exists but does not support the sentence attached to it. Sometimes the answer blends several sources but cites only one. Sometimes the citation points to the right document but the wrong section.
If the answer cannot be checked against source material, it is not ready for serious use.
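Part of this check can be automated when the assistant returns a quoted excerpt with each citation. The sketch below assumes citations arrive as a source ID, a section label, and an excerpt, and that the team holds the source text by section. Both structures are hypothetical. It only confirms that the excerpt appears in the cited section. Whether the excerpt supports the claim is still a human judgement.

```python
def normalise(text):
    """Lower-case and collapse whitespace so minor formatting does not matter."""
    return " ".join(text.lower().split())


def check_citation(citation, sources):
    """Return a short status string for one citation."""
    document = sources.get(citation["source_id"])
    if document is None:
        return "missing source"
    section_text = document.get(citation["section"])
    if section_text is None:
        return "section not found"
    if normalise(citation["excerpt"]) in normalise(section_text):
        return "excerpt found"
    return "excerpt not found"


# Hypothetical source text and citations, for illustration only.
sources = {
    "SRC-001": {"4.2": "District committees meet quarterly to coordinate service delivery."}
}
citations = [
    {"source_id": "SRC-001", "section": "4.2", "excerpt": "district committees meet quarterly"},
    {"source_id": "SRC-001", "section": "4.2", "excerpt": "district committees meet monthly"},
    {"source_id": "SRC-099", "section": "1.1", "excerpt": "any text"},
]

for c in citations:
    print(c["source_id"], c["section"], check_citation(c, sources))
```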
Test “no answer available” behaviour
A useful AI knowledge base should not answer every question.
It should know when the source base does not support an answer.
Test questions should include:
- questions outside the source material
- questions about missing documents
- questions that require unsupported judgement
- questions that mix internal and external knowledge
- questions that ask for final recommendations without evidence
- questions that ask for confidential information
- questions that ask the AI to guess
Good behaviour looks like:
- “I cannot answer that from the available sources.”
- “The uploaded material does not provide enough evidence.”
- “This needs human review.”
- “The source set appears to be missing that document.”
- “The available sources only cover part of this question.”
Bad behaviour looks like:
- giving a confident answer anyway
- inventing a source
- blending unrelated sources
- making a recommendation without evidence
- treating general knowledge as project evidence
- failing to flag uncertainty
This is one of the fastest ways to test whether the knowledge base is safe enough for team use.
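A rough first pass at this test is to run the out-of-scope questions and flag any answer that does not contain a recognisable decline. The sketch below assumes an `ask` function that returns the assistant's answer as text. `fake_ask` is a stand-in with canned answers, and the marker phrases are drawn from the good-behaviour examples above. Phrase matching is crude, so treat flagged and unflagged answers alike as candidates for human review.

```python
REFUSAL_MARKERS = [
    "cannot answer that from the available sources",
    "does not provide enough evidence",
    "needs human review",
    "appears to be missing",
    "only cover part of",
]


def declines_to_answer(answer):
    """True if the answer contains one of the agreed decline phrases."""
    lowered = answer.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def fake_ask(question):
    # Canned answers standing in for a real assistant.
    canned = {
        "What is the 2031 budget?": "I cannot answer that from the available sources.",
        "Which option should the minister choose?": "The minister should choose option B.",
    }
    return canned[question]


out_of_scope_questions = [
    "What is the 2031 budget?",
    "Which option should the minister choose?",
]

failures = [q for q in out_of_scope_questions if not declines_to_answer(fake_ask(q))]
print(failures)
```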
Test sensitive data boundaries
Sensitive material may include:
- personal information
- children’s data
- health information
- HR records
- confidential client documents
- politically sensitive material
- unpublished research
- internal review comments
- safeguarding-related material
- donor or public-sector material with restricted access
QA should check:
- Is sensitive material included only when there is a clear reason?
- Are users allowed to access it?
- Should redacted versions be used instead?
- Is the AI allowed to summarise it?
- Can users ask for identifying details?
- Can users retrieve restricted files through indirect prompts?
- Does the user guide explain what should not be copied into the tool?
- Does the tool environment meet the client’s data rules?
This is not only a technology question. It is a workflow and governance question.
The safer pattern is to keep sensitive material out unless it is necessary, approved, access-controlled, and covered by clear user rules.
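That pattern can be written as an explicit inclusion rule, so the decision is recorded per document rather than made at upload time. A minimal sketch follows; the field names are assumptions, and the approval itself still has to come from a person.

```python
REQUIRED_FOR_SENSITIVE = ("necessary", "approved", "access_controlled", "user_rules_in_place")


def may_include(doc):
    """Non-sensitive documents pass; sensitive ones need every safeguard."""
    if not doc.get("sensitive", False):
        return True
    return all(doc.get(flag, False) for flag in REQUIRED_FOR_SENSITIVE)


# Hypothetical register rows.
docs = [
    {"source_id": "SRC-010", "sensitive": False},
    {"source_id": "SRC-011", "sensitive": True, "necessary": True, "approved": True,
     "access_controlled": True, "user_rules_in_place": True},
    {"source_id": "SRC-012", "sensitive": True, "necessary": True, "approved": False,
     "access_controlled": True, "user_rules_in_place": True},
]

excluded = [d["source_id"] for d in docs if not may_include(d)]
print(excluded)
```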
Build a test question bank
A test question bank is one of the most useful QA assets.
It gives the team a repeatable way to test the knowledge base before launch and after changes.
Include different test types.
| Test type | Purpose |
|---|---|
| Known-answer questions | Check whether the AI retrieves known facts from the source set. |
| Source-specific questions | Check whether it can use one named document. |
| Cross-document questions | Check whether it can compare sources. |
| Citation tests | Check whether source references support the answer. |
| “No answer” questions | Check whether it refuses to guess. |
| Sensitive-boundary questions | Check whether it avoids restricted material. |
| Ambiguous questions | Check whether it asks for clarification or qualifies the answer. |
| User-role questions | Check whether it responds appropriately for different users. |
| Report-drafting questions | Check whether draft text stays grounded in sources. |
| Stress tests | Check acronyms, synonyms, project terms, and unusual phrasing. |
There is no universal number of test questions that proves a knowledge base is ready. A practical scale is:
| Knowledge base size | Suggested test set |
|---|---|
| Small | 20 to 30 test questions |
| Medium | 40 to 75 test questions |
| Large or high-risk | 100+ test questions, grouped by use case and risk level |
The point is not just volume. The test set should cover normal use, edge cases, sensitive cases, missing-source cases, and likely user mistakes.
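Coverage is easy to check if each question in the bank is tagged with its test type. The sketch below lists the test types that have no questions yet. The type labels mirror the table above, and the tagging format is an assumption.

```python
from collections import Counter

TEST_TYPES = [
    "known-answer", "source-specific", "cross-document", "citation", "no-answer",
    "sensitive-boundary", "ambiguous", "user-role", "report-drafting", "stress",
]


def coverage_gaps(question_bank):
    """Return the test types that have no questions in the bank."""
    counts = Counter(q["type"] for q in question_bank)
    return [t for t in TEST_TYPES if counts[t] == 0]


# Hypothetical starter bank with only three types covered.
question_bank = [
    {"type": "known-answer", "question": "What year was the strategy adopted?"},
    {"type": "citation", "question": "Quote the finding on attendance."},
    {"type": "no-answer", "question": "What is the 2031 budget?"},
]

print(coverage_gaps(question_bank))
```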
Use an issue log during QA
A knowledge base should not be tested informally and then launched from memory.
Use an issue log with fields such as:
| Field | Purpose |
|---|---|
| Test question | The question asked. |
| Expected answer | What a good answer should include. |
| Actual answer | What the AI produced. |
| Sources retrieved | Which sources were used. |
| Issue type | Retrieval, citation, accuracy, sensitivity, format, instruction, missing source. |
| Severity | Low, medium, high, launch blocker. |
| Fix needed | Source change, prompt change, metadata change, user instruction, exclusion. |
| Owner | Person responsible. |
| Retest status | Not retested, passed, failed again. |
This log helps the team fix the right layer.
Not every problem is a prompt problem. Some failures come from missing documents. Some come from weak metadata. Some come from outdated source files. Some come from user instructions. Some come from the tool configuration.
The issue log prevents the team from guessing.
It also becomes a useful handover asset when the knowledge base is passed to a client, internal owner, or project team.
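Once the log is structured, the two most useful views are failures grouped by issue type and launch blockers that have not yet passed a retest. The sketch below uses a subset of the fields from the table above; the names and values are illustrative.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Issue:
    test_question: str
    issue_type: str      # e.g. "retrieval", "citation", "missing source"
    severity: str        # "low", "medium", "high", or "launch blocker"
    owner: str
    retest_status: str   # "not retested", "passed", or "failed again"


def issues_by_type(log):
    """Count issues per type so the team can see which layer fails most."""
    return Counter(issue.issue_type for issue in log)


def open_launch_blockers(log):
    """Launch blockers that have not yet passed a retest."""
    return [i for i in log if i.severity == "launch blocker" and i.retest_status != "passed"]


# Hypothetical log entries.
log = [
    Issue("Where is district coordination discussed?", "retrieval", "high", "Reviewer A", "not retested"),
    Issue("Which version of the strategy is final?", "missing source", "launch blocker", "Reviewer B", "failed again"),
    Issue("Quote the finding on attendance.", "citation", "launch blocker", "Reviewer A", "passed"),
    Issue("Summarise annexure C.", "retrieval", "medium", "Reviewer B", "not retested"),
]

print(issues_by_type(log).most_common())
print([i.test_question for i in open_launch_blockers(log)])
```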
Check user instructions before launch
A good AI knowledge base needs user rules.
The user guide should explain:
- what the tool is for
- what it is not for
- which sources are included
- known gaps
- example prompts
- how to ask source-grounded questions
- how to check citations
- when to escalate to a human reviewer
- what not to upload or paste
- how to report a bad answer
- who maintains the knowledge base
This matters because many AI errors come from unclear user behaviour, not only weak system setup.
Users need to know that the tool can support search, summary, comparison, drafting, and first-pass analysis. They also need to know that it does not replace evidence review, subject-matter judgement, or final sign-off.
A short user guide is better than a long policy nobody reads. The guide should show people how to ask better questions, how to inspect answers, and when not to rely on the tool.
Decide what “ready to launch” means
Do not launch just because the assistant feels useful.
Launch readiness should require:
- approved source set
- source register or document index
- test question bank completed
- major retrieval failures fixed
- citation behaviour checked
- sensitive data boundaries tested
- known limitations documented
- user instructions written
- issue log created
- maintenance owner assigned
- human review rules agreed
- retesting schedule defined
Simple launch statuses can help.
| Status | Meaning |
|---|---|
| Not ready | Major source, retrieval, sensitivity, or citation issues. |
| Limited pilot | Usable by a small group with close review. |
| Ready for controlled use | Usable for agreed tasks with human review rules. |
| Ready for wider rollout | Tested, documented, monitored, and maintained. |
A limited pilot is often the right next step. It lets a small group use the knowledge base while issues are still visible and manageable.
Wider rollout should come later, once the team has seen how real users ask questions and where the tool still fails.
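The readiness list above can double as a gate: record which items are complete and report what is still missing. A minimal sketch follows. How the missing items map onto the four statuses is left to the team, since that rule is a judgement the checklist does not define.

```python
LAUNCH_REQUIREMENTS = [
    "approved source set",
    "source register or document index",
    "test question bank completed",
    "major retrieval failures fixed",
    "citation behaviour checked",
    "sensitive data boundaries tested",
    "known limitations documented",
    "user instructions written",
    "issue log created",
    "maintenance owner assigned",
    "human review rules agreed",
    "retesting schedule defined",
]


def missing_requirements(completed):
    """Return the readiness items not yet marked complete, in checklist order."""
    return [item for item in LAUNCH_REQUIREMENTS if item not in completed]


# Hypothetical progress: everything done except two items.
completed = set(LAUNCH_REQUIREMENTS) - {"maintenance owner assigned", "retesting schedule defined"}

print(missing_requirements(completed))
```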
Maintain the knowledge base after launch
QA is not a once-off task.
Retest when:
- new documents are added
- old documents are removed
- source folders change
- instructions are updated
- user roles change
- the AI platform changes
- users report failed answers
- the knowledge base is used for a new type of output
Maintenance should include:
- source updates
- version control
- issue review
- failed answer checks
- prompt updates
- user feedback
- periodic retesting
- access review
Without maintenance, the knowledge base becomes less reliable over time. Old documents stay in circulation. New material is added without review. Users discover failures that nobody logs. The tool keeps working, but trust starts to weaken.
A good knowledge base needs an owner. Someone should be responsible for updating sources, reviewing issues, retesting after changes, and keeping user guidance current.
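Retesting is most useful when each run is compared with the last one, so the owner can see which questions used to pass and now fail. The sketch below assumes each run is stored as a mapping from question ID to "pass" or "fail". The IDs and the storage format are hypothetical.

```python
def regressions(previous_run, current_run):
    """Questions that passed before and do not pass now (including ones not rerun)."""
    return [
        question_id
        for question_id, result in previous_run.items()
        if result == "pass" and current_run.get(question_id) != "pass"
    ]


# Hypothetical results before and after new documents were added.
previous_run = {"Q1": "pass", "Q2": "pass", "Q3": "fail"}
current_run = {"Q1": "pass", "Q2": "fail", "Q3": "pass"}

print(regressions(previous_run, current_run))
```

A question missing from the current run is treated as a regression on purpose, so that dropped tests are noticed rather than silently skipped.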
Where AI can help with QA
AI can help with parts of the QA process, but it should not approve itself.
AI can help:
- draft test questions
- identify possible duplicate questions
- compare expected answers to actual answers
- summarise issue logs
- flag vague answers
- group failures by issue type
- prepare user guide drafts
- create prompt examples
AI should not:
- mark itself as accurate without human checking
- decide whether sensitive material is safe
- approve final evidence use
- replace subject-matter review
- decide launch readiness alone
The safest framing is simple: AI can support the QA process, but people must decide whether the knowledge base is accurate, safe, useful, and ready for team use.
Simple AI knowledge base QA checklist
Use this checklist before launch.
| Area | QA question |
|---|---|
| Purpose | Is the knowledge base’s job clearly defined? |
| Use cases | Are approved and unapproved uses documented? |
| Sources | Are approved documents included and weak sources excluded? |
| Metadata | Are source IDs, file names, dates, and versions clear? |
| Retrieval | Does it find the right documents and sections? |
| Answers | Are answers accurate, specific, and properly qualified? |
| Citations | Can claims be checked against the original source? |
| Gaps | Does the AI say when it cannot answer? |
| Sensitivity | Are restricted sources handled properly? |
| User rules | Do users know how to use and check the tool? |
| Issue log | Are failures recorded, fixed, and retested? |
| Maintenance | Is there an owner and retesting plan? |
| Human review | Are final outputs still checked by people? |
This checklist can be used as a simple launch review. For higher-risk work, turn it into a full QA tracker with owners, evidence, dates, and sign-off.
What software can help
The tool stack depends on the size and risk level of the knowledge base.
Most teams do not need an enterprise AI evaluation system to begin. They need a clear source register, a test question bank, an answer review sheet, and a launch-readiness checklist.
| Tool type | Examples | Best for |
|---|---|---|
| AI assistant platforms | ChatGPT custom GPTs, Claude Projects, Gemini, NotebookLM | Small to medium knowledge bases, source-based Q&A, summaries, and drafting support. |
| Microsoft environment | Microsoft Copilot, SharePoint, OneDrive, Teams | Organisations already working inside Microsoft 365. |
| Google environment | Gemini, NotebookLM, Google Drive, Google Docs, Google Sheets | Google Workspace teams with document-heavy workflows. |
| Source registers | Google Sheets, Excel, Airtable | Tracking source IDs, versions, permissions, review status, and QA notes. |
| RAG and technical tools | LangChain, LlamaIndex, vector databases, evaluation tools | Technical teams building custom retrieval systems. |
| QA trackers | Airtable, Notion, Coda, Google Sheets | Test questions, issue logs, answer reviews, and launch readiness. |
| Qualitative tools | NVivo, MAXQDA, ATLAS.ti, Dedoose | Research teams working with coded qualitative evidence before AI retrieval. |
A spreadsheet is often enough for the first QA layer. It can hold the source register, test questions, issue log, and launch checklist.
More advanced tools become useful when the knowledge base has many users, complex permissions, technical retrieval layers, or high-risk outputs.
The important point is not the software. It is the QA structure around the system.
Common mistakes to avoid
Most AI knowledge base problems are predictable.
Uploading a messy folder and calling it a knowledge base
A loose folder of documents is not a controlled source base. Sort, label, review, and approve the source material first.
Testing only easy demo questions
Demo questions usually show the tool at its best. QA should include difficult, ambiguous, sensitive, and out-of-scope questions.
Not checking citations
A citation that looks official may not support the claim. Open the source and check the passage.
Including old versions and final versions together
If draft and final versions sit side by side, the AI may use the wrong one.
Using sensitive documents without access rules
Sensitive material needs clear inclusion rules, access rules, and review steps.
Assuming a good summary means good retrieval
The answer may read well even if the system retrieved weak or incomplete sources.
Not testing “I do not know” behaviour
A knowledge base that answers everything confidently is not safe enough for serious work.
Not documenting known gaps
Users need to know what the knowledge base does not cover.
Giving the tool to users without a guide
Users need rules, examples, checking steps, and escalation paths.
Having no issue log
Without a log, failed answers become anecdotes. The team cannot see patterns or track fixes.
Having no maintenance owner
A knowledge base without an owner will drift.
Skipping retesting after new documents are added
Every meaningful source change can affect retrieval and answer quality.
Using AI outputs directly in reports without source checking
AI can support drafting. It should not replace evidence review.
Need help testing an AI knowledge base before launch?
An AI knowledge base is not ready because it can answer questions.
It is ready when the team has tested which sources it uses, how it retrieves material, whether its answers can be checked, where it fails, and how users should review outputs.
For research, reporting, policy, public-sector, donor-funded, or internal knowledge work, this matters. The knowledge base should help people find and use approved material faster. It should not hide weak source structure, unclear permissions, or unsupported answers.
If your team is preparing an AI knowledge base around reports, documents, evidence tables, project files, submissions, interviews, or internal knowledge, I can help structure the source base, build the QA test set, check retrieval behaviour, prepare user rules, and create a controlled handover process.