How to QA an AI Knowledge Base Before a Team Starts Using It

A practical QA checklist for testing an AI knowledge base before launch, covering source quality, retrieval, citations, sensitive data, user rules, and human r…


Romanos Boraine, independent consultant in structured systems, evidence, and reporting

19 min read

May 22, 2026

An AI knowledge base can look useful after five minutes.

You upload documents, ask a few questions, and the assistant gives clear answers. That does not mean it is ready for a team.

Before people start using it for research, reporting, public-sector work, donor documents, internal policy, or project delivery, the knowledge base needs to be tested. It needs to answer from the right sources. It needs to show where answers came from. It needs to handle gaps, old documents, sensitive material, and unclear questions properly.

QA is the step between “the assistant works” and “the team can use this responsibly.”

Who this guide is for

This guide is for research, reporting, public-sector, donor-funded, policy, project, and internal knowledge teams preparing an AI knowledge base for team use.

What AI knowledge base QA means

An AI knowledge base is a controlled set of source material that an AI assistant can use when answering questions. That source material might include reports, policies, case studies, transcripts, manuals, submissions, meeting notes, evidence tables, spreadsheets, or internal project documents.

AI knowledge base QA is the process of checking whether that system is ready for real users.

It tests whether:

the right sources are included
the wrong sources are excluded
documents are readable and complete
answers are grounded in source material
citations or source references are accurate
sensitive information is handled properly
users understand the tool’s limits
failed answers are logged and fixed
human review stays in the workflow

This is not only technical testing. It is also evidence QA, source control, user-readiness checking, and workflow design.

A knowledge base is not ready just because it can produce a clear answer. It is ready when the team has tested how that answer was produced, where it came from, and whether a person can check it.

Why QA matters before launch

An AI knowledge base can fail quietly.

It may sound confident while:

using an outdated document
missing the strongest source
retrieving the wrong section
blending several sources incorrectly
citing a source that does not support the answer
ignoring contradictions
summarising sensitive material too freely
treating draft material as final
answering when it should say it does not know

This matters more in evidence-heavy work because the outputs may feed into reports, donor updates, policy notes, research synthesis, public-sector documents, internal decisions, or client-facing material.

A good answer is not only well-written. It must be checkable.

If the answer cannot be traced back to approved source material, it should not be treated as reliable evidence.

When a knowledge base needs formal QA

A small, low-risk knowledge base may only need a simple checklist. A higher-risk knowledge base needs stronger testing before launch.

Formal QA is worth doing when:

the knowledge base will be used by several people
the material is sensitive or confidential
outputs will support reports, decisions, or client-facing work
users may rely on citations, summaries, or extracted evidence
there are many documents or versions
documents include technical, legal, policy, research, or donor material
different user roles need different access
the system will support research, policy, donor reporting, or public-sector work
the team will use it repeatedly, not just once

The question is not only “how many documents are in the knowledge base?” A small source set can still be high-risk if it includes confidential client records, children’s data, internal review notes, safeguarding material, or unpublished findings.

Start by confirming the approved use cases

The first QA question is not “does the AI work?”

It is: “what is this tool allowed to do?”

Approved use cases might include:

finding relevant documents
summarising approved source material
comparing two reports
extracting themes from a source set
drafting internal notes
preparing first-pass briefing points
helping users locate quotes or evidence
answering questions about project documents
supporting report drafting from approved material

Unapproved use cases might include:

making final recommendations
deciding policy positions
producing final donor report language without review
answering from memory when no source exists
using sensitive personal information without proper safeguards
giving legal, medical, financial, safeguarding, or formal policy advice
replacing subject-matter review

This step matters because QA needs a target. You cannot test whether a knowledge base is ready unless you know what “ready” means.

A knowledge base built for document retrieval needs different testing from one used to support report drafting. A tool used by an internal project team needs different controls from one used across a whole organisation.

Check the source material first

Before testing the answers, check the inputs.

A knowledge base cannot retrieve a missing source. It cannot reliably use a scanned PDF that has not been read properly. It cannot distinguish final documents from old drafts unless the source base makes that distinction clear.

Source material QA should ask:

Check	Why it matters
Are the right documents included?	The AI cannot retrieve what is missing.
Are outdated files excluded or clearly marked?	Old versions can produce wrong answers.
Are duplicates controlled?	Duplicate documents can confuse retrieval and citations.
Are file names clear?	Users and reviewers need to recognise sources.
Are source IDs used?	Answers need to connect back to the source register.
Are PDFs readable?	Scanned or poorly extracted files may be missed.
Are tables and annexures usable?	Important evidence often sits outside body text.
Are translations checked?	Poor translations can distort meaning.
Are sensitive files excluded or restricted?	Not every document belongs in an AI tool.
Is there a source register?	The team needs a controlled inventory of the source base.

This is where many AI knowledge base problems start.

Teams often upload a folder and treat the upload as the build. But a folder is not a source base. A controlled source base needs document names, source IDs, dates, versions, permissions, sensitivity flags, and clear inclusion rules.

For serious work, the source register should come before the AI layer.
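A source register can live in a spreadsheet, but a short script can check it before anything is loaded into the AI layer. The sketch below is a minimal illustration, assuming a register with the columns shown in the dataclass; the field names, status labels and "SRC-" ID scheme are hypothetical, not a standard. It flags duplicate IDs, byte-identical text, superseded versions that are still included, unapproved files, and restricted files that need an access check.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class SourceRecord:
    source_id: str      # e.g. "SRC-001"; the ID scheme is illustrative
    file_name: str
    version: str
    status: str         # assumed labels: "final", "draft" or "superseded"
    sensitivity: str    # assumed labels: "open" or "restricted"
    approved: bool      # has a named person approved inclusion?
    include: bool       # is the file loaded into the knowledge base?
    text: str = ""      # extracted text, used only to spot duplicate content


def check_register(records: List[SourceRecord]) -> List[str]:
    """Return plain-language problems found in a source register."""
    problems = []
    seen_ids = set()
    seen_text = {}  # SHA-256 of extracted text -> first source ID with that text
    for rec in records:
        if rec.source_id in seen_ids:
            problems.append(f"{rec.source_id}: duplicate source ID")
        seen_ids.add(rec.source_id)
        if not rec.include:
            continue  # excluded files cannot confuse retrieval
        if rec.text:
            digest = hashlib.sha256(rec.text.strip().encode("utf-8")).hexdigest()
            if digest in seen_text:
                problems.append(f"{rec.source_id}: same text as {seen_text[digest]}")
            else:
                seen_text[digest] = rec.source_id
        if rec.status == "superseded":
            problems.append(f"{rec.source_id}: superseded version still included")
        if not rec.approved:
            problems.append(f"{rec.source_id}: included without approval")
        if rec.sensitivity == "restricted":
            problems.append(f"{rec.source_id}: restricted file included; confirm access rules")
    return problems
```

Hashing only catches identical text. A lightly edited copy of a draft will pass this check, so version control still needs a person who knows the documents.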

Test retrieval before testing writing

A polished AI answer can hide weak retrieval.

Before judging whether the answer sounds good, check whether the AI found the right material.

Retrieval tests should check:

Does it find the right document?
Does it find the right section?
Does it miss important sources?
Does it retrieve irrelevant documents?
Does it handle different phrasing?
Does it handle acronyms and project terms?
Does it cope with long documents?
Does it distinguish draft and final documents?
Does it identify when the answer is not in the knowledge base?

A useful test is to ask the same question in three different ways.

For example:

“What does the strategy say about district-level coordination?”
“Find the section on coordination between districts.”
“Where is district coordination discussed?”

If each version retrieves different or weaker sources, the knowledge base may not be ready. The problem may be document preparation, metadata, chunking, naming, acronyms, or the way the assistant has been instructed to search.

The answer is only as strong as the retrieval behind it.
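The three-phrasing test is easy to automate once you can call the search step directly. In this sketch, `retrieve` is any function that takes a question and returns source IDs, best match first; `toy_retrieve` is a hypothetical word-overlap stand-in so the example runs on its own, not a description of how any particular platform searches. Swap in your own retrieval call where the platform exposes one.

```python
import re
from typing import Callable, Dict, List

# A retriever takes a question and returns source IDs, best match first.
Retriever = Callable[[str], List[str]]


def paraphrase_check(retrieve: Retriever, phrasings: List[str],
                     expected_id: str, k: int = 1) -> Dict[str, bool]:
    """For each phrasing, record whether the expected source is in the top k results."""
    return {q: expected_id in retrieve(q)[:k] for q in phrasings}


# Toy stand-in for a real search layer: ranks documents by shared words.
DOCS = {
    "SRC-001": "Section 4 sets out district coordination arrangements.",
    "SRC-002": "Annex B lists national budget lines for the programme.",
}


def words(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))


def toy_retrieve(query: str) -> List[str]:
    q = words(query)
    scores = {sid: len(q & words(text)) for sid, text in DOCS.items()}
    hits = [sid for sid, score in scores.items() if score > 0]
    return sorted(hits, key=lambda sid: scores[sid], reverse=True)


phrasings = [
    "What does the strategy say about district-level coordination?",
    "Find the section on coordination between districts.",
    "Where is district coordination discussed?",
    "Where is DCC covered?",  # an acronym the documents never spell out
]
print(paraphrase_check(toy_retrieve, phrasings, "SRC-001"))
```

With the toy retriever, the three plain-language phrasings find SRC-001 and the acronym query finds nothing, which is the kind of gap the acronym and project-term checks are meant to catch. Setting `k` to 1 makes the test strict; a looser `k` suits systems that pass several chunks to the model.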

Test answer quality

Once retrieval has been checked, test the answer itself.

Answer QA should ask:

Is the answer accurate?
Is it specific enough?
Is it too broad?
Does it overclaim?
Does it mention the source?
Does it distinguish fact, summary, interpretation, and recommendation?
Does it explain uncertainty where needed?
Does it avoid unsupported details?
Does it follow the required format?
Would a subject-matter reviewer accept it?

A simple scoring table can help.

Field	Rating
Accuracy	Pass / needs review / fail
Source support	Strong / partial / weak / none
Completeness	Complete / partial / missing key points
Citation quality	Correct / incomplete / wrong
Risk level	Low / medium / high
Reviewer decision	Approve / revise / reject

For evidence-heavy teams, “sounds right” is not enough. A useful answer must be accurate, grounded, reviewable, and clear about its limits.
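Recording ratings in fixed vocabularies makes answers comparable across reviewers and retests. The sketch below validates one row of the scoring table and suggests a decision. The mapping from ratings to approve, revise or reject is an illustrative policy, not a standard, and the reviewer's own decision should always override it.

```python
# Allowed values mirror the rating scale in the scoring table.
ALLOWED = {
    "accuracy": {"pass", "needs review", "fail"},
    "source_support": {"strong", "partial", "weak", "none"},
    "completeness": {"complete", "partial", "missing key points"},
    "citation_quality": {"correct", "incomplete", "wrong"},
    "risk_level": {"low", "medium", "high"},
}


def suggest_decision(scores: dict) -> str:
    """Suggest approve / revise / reject from one reviewer's ratings.

    The thresholds are an illustrative policy; the reviewer records the final decision.
    """
    for field, allowed in ALLOWED.items():
        if scores.get(field) not in allowed:
            raise ValueError(f"{field} must be one of {sorted(allowed)}")
    if (scores["accuracy"] == "fail"
            or scores["citation_quality"] == "wrong"
            or scores["source_support"] == "none"):
        return "reject"
    if (scores["accuracy"] == "needs review"
            or scores["source_support"] != "strong"
            or scores["completeness"] != "complete"
            or scores["citation_quality"] != "correct"
            or scores["risk_level"] == "high"):
        return "revise"
    return "approve"
```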

Check citation and source traceability

A knowledge base should make it possible to move from:

answer
cited source
source section
original document

For evidence-heavy work, the AI should not only produce a useful answer. It should show enough source context for a human to check it.

Citation checks should ask:

Does the cited source exist?
Does the citation support the claim?
Is the AI citing the right document version?
Is the page, section, row, or excerpt clear enough?
Does the answer rely on one source or several?
Is the citation being used as decoration rather than support?
Can the reviewer open the original source and confirm the answer?

This is one of the most important QA steps.

A citation is not automatically proof. Sometimes the cited document exists but does not support the sentence attached to it. Sometimes the answer blends several sources but cites only one. Sometimes the citation points to the right document but the wrong section.

If the answer cannot be checked against source material, it is not ready for serious use.
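Part of the citation check can be mechanical. Assuming your assistant, or an export of its answers, gives each citation as a source ID plus a quoted excerpt (not every platform does), the sketch below confirms that the source exists and that the excerpt really appears in it. Whether the passage supports the claim is still a human judgement; this only catches invented or misquoted excerpts.

```python
import re


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so PDF line breaks do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def check_citation(citation: dict, sources: dict) -> str:
    """citation: {"source_id": ..., "excerpt": ...}; sources: source ID -> full extracted text."""
    source_id = citation.get("source_id")
    if source_id not in sources:
        return "cited source does not exist"
    excerpt = normalise(citation.get("excerpt", ""))
    if not excerpt:
        return "no excerpt to check"
    if excerpt not in normalise(sources[source_id]):
        return "excerpt not found in cited source"
    return "excerpt found: reviewer to confirm it supports the claim"
```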

Test “no answer available” behaviour

A useful AI knowledge base should not answer every question.

It should know when the source base does not support an answer.

Test questions should include:

questions outside the source material
questions about missing documents
questions that require unsupported judgement
questions that mix internal and external knowledge
questions that ask for final recommendations without evidence
questions that ask for confidential information
questions that ask the AI to guess

Good behaviour looks like:

“I cannot answer that from the available sources.”
“The uploaded material does not provide enough evidence.”
“This needs human review.”
“The source set appears to be missing that document.”
“The available sources only cover part of this question.”

Bad behaviour looks like:

giving a confident answer anyway
inventing a source
blending unrelated sources
making a recommendation without evidence
treating general knowledge as project evidence
failing to flag uncertainty

This is one of the fastest ways to test whether the knowledge base is safe enough for team use.
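When the test bank is large, a first pass over the "no answer" questions can be scripted. This sketch assumes the assistant has been instructed to decline with wording close to the good-behaviour examples above; the marker list is illustrative and will miss refusals phrased differently, so treat the output as a list of answers to read first, not a verdict.

```python
# Phrases adapted from the good-behaviour examples; extend with your assistant's own wording.
DECLINE_MARKERS = (
    "cannot answer that from the available sources",
    "does not provide enough evidence",
    "needs human review",
    "appears to be missing",
    "only cover part of this question",
)


def declined(answer: str) -> bool:
    """True if the answer contains any of the expected decline phrases."""
    text = answer.lower()
    return any(marker in text for marker in DECLINE_MARKERS)


def answered_anyway(results):
    """results: (question, answer) pairs for questions the source set cannot support.

    Returns the questions where the assistant did not decline or qualify its answer.
    """
    return [question for question, answer in results if not declined(answer)]
```

A decline phrase buried inside an otherwise confident answer will still count as a pass, so read a sample of the passes as well.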

Test sensitive data boundaries

Sensitive material may include:

personal information
children’s data
health information
HR records
confidential client documents
politically sensitive material
unpublished research
internal review comments
safeguarding-related material
donor or public-sector material with restricted access

QA should check:

Is sensitive material included only when there is a clear reason?
Are users allowed to access it?
Should redacted versions be used instead?
Is the AI allowed to summarise it?
Can users ask for identifying details?
Can users retrieve restricted files through indirect prompts?
Does the user guide explain what should not be copied into the tool?
Does the tool environment meet the client’s data rules?

This is not only a technology question. It is a workflow and governance question.

The safer pattern is to keep sensitive material out unless it is necessary, approved, access-controlled, and covered by clear user rules.
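Sensitive-boundary tests produce answers that someone has to scan for identifying details, and a crude pattern check can speed that up. The sketch below assumes two illustrative patterns, email addresses and long digit strings that may be ID or phone numbers. It will miss names, numbers written with spaces, health details, and anything that is sensitive only in context, so it supports the review rather than replacing it.

```python
import re

# Illustrative patterns only; real identifiers vary by country and organisation.
PATTERNS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+"),
    "long digit string (ID or phone number?)": re.compile(r"\b\d{9,13}\b"),
}


def flag_identifiers(answer: str) -> list:
    """Return the names of identifier patterns that appear in an answer."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(answer))
```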

Build a test question bank

A test question bank is one of the most useful QA assets.

It gives the team a repeatable way to test the knowledge base before launch and after changes.

Include different test types.

Test type	Purpose
Known-answer questions	Check whether the AI retrieves known facts from the source set.
Source-specific questions	Check whether it can use one named document.
Cross-document questions	Check whether it can compare sources.
Citation tests	Check whether source references support the answer.
“No answer” questions	Check whether it refuses to guess.
Sensitive-boundary questions	Check whether it avoids restricted material.
Ambiguous questions	Check whether it asks for clarification or qualifies the answer.
User-role questions	Check whether it responds appropriately for different users.
Report-drafting questions	Check whether draft text stays grounded in sources.
Stress tests	Check acronyms, synonyms, project terms, and unusual phrasing.

There is no universal number of test questions that proves a knowledge base is ready. A practical scale is:

Knowledge base size	Suggested test set
Small	20 to 30 test questions
Medium	40 to 75 test questions
Large or high-risk	100+ test questions, grouped by use case and risk level

The point is not just volume. The test set should cover normal use, edge cases, sensitive cases, missing-source cases, and likely user mistakes.
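If the question bank is kept as a list of questions tagged by test type, coverage can be checked automatically. In this sketch the type labels mirror the test-type table above and the minimums are the lower ends of the suggested ranges; the labels and the `bank_gaps` helper are hypothetical.

```python
from collections import Counter

# Labels mirror the test-type table; the spellings are illustrative.
TEST_TYPES = {
    "known-answer", "source-specific", "cross-document", "citation", "no-answer",
    "sensitive-boundary", "ambiguous", "user-role", "report-drafting", "stress",
}
# Lower bounds of the suggested ranges; "large" also covers high-risk knowledge bases.
MINIMUM = {"small": 20, "medium": 40, "large": 100}


def bank_gaps(bank: list, size: str) -> dict:
    """bank: list of {"question": ..., "type": ...}. Report uncovered types and shortfall."""
    counts = Counter(item["type"] for item in bank)
    return {
        "missing_types": sorted(TEST_TYPES - set(counts)),
        "unknown_types": sorted(set(counts) - TEST_TYPES),
        "questions_short": max(0, MINIMUM[size] - len(bank)),
        "per_type": dict(counts),
    }
```

Counting by type makes it visible when a bank of the right size is still mostly easy known-answer questions.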

Use an issue log during QA

A knowledge base should not be tested informally and then launched from memory.

Use an issue log with fields such as:

Field	Purpose
Test question	The question asked.
Expected answer	What a good answer should include.
Actual answer	What the AI produced.
Sources retrieved	Which sources were used.
Issue type	Retrieval, citation, accuracy, sensitivity, format, instruction, missing source.
Severity	Low, medium, high, launch blocker.
Fix needed	Source change, prompt change, metadata change, user instruction, exclusion.
Owner	Person responsible.
Retest status	Not retested, passed, failed again.

This log helps the team fix the right layer.

Not every problem is a prompt problem. Some failures come from missing documents. Some come from weak metadata. Some come from outdated source files. Some come from user instructions. Some come from the tool configuration.

The issue log prevents the team from guessing.

It also becomes a useful handover asset when the knowledge base is passed to a client, internal owner, or project team.
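The issue log also works as structured data. Assuming one record per failed test with the fields from the table above (names adapted for code), the sketch below lists the high-severity issues that still block launch and counts open issues by type, which is a quick way to see whether the problems sit in sources, metadata, prompts, or instructions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Issue:
    question: str
    issue_type: str   # retrieval, citation, accuracy, sensitivity, format, instruction, missing source
    severity: str     # low, medium, high, launch blocker
    fix_needed: str   # source change, prompt change, metadata change, user instruction, exclusion
    owner: str
    retest_status: str = "not retested"   # not retested, passed, failed again


def open_blockers(log):
    """High-severity or launch-blocking issues that have not passed a retest."""
    return [i for i in log
            if i.severity in {"high", "launch blocker"} and i.retest_status != "passed"]


def open_by_type(log):
    """Count unresolved issues per type, to show which layer most needs fixing."""
    return Counter(i.issue_type for i in log if i.retest_status != "passed")
```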

Check user instructions before launch

A good AI knowledge base needs user rules.

The user guide should explain:

what the tool is for
what it is not for
which sources are included
known gaps
example prompts
how to ask source-grounded questions
how to check citations
when to escalate to a human reviewer
what not to upload or paste
how to report a bad answer
who maintains the knowledge base

This matters because many AI errors come from unclear user behaviour, not only weak system setup.

Users need to know that the tool can support search, summary, comparison, drafting, and first-pass analysis. They also need to know that it does not replace evidence review, subject-matter judgement, or final sign-off.

A short user guide is better than a long policy nobody reads. The guide should show people how to ask better questions, how to inspect answers, and when not to rely on the tool.

Decide what “ready to launch” means

Do not launch just because the assistant feels useful.

Launch readiness should require:

approved source set
source register or document index
test question bank completed
major retrieval failures fixed
citation behaviour checked
sensitive data boundaries tested
known limitations documented
user instructions written
issue log created
maintenance owner assigned
human review rules agreed
retesting schedule defined

Simple launch statuses can help.

Status	Meaning
Not ready	Major source, retrieval, sensitivity, or citation issues.
Limited pilot	Usable by a small group with close review.
Ready for controlled use	Usable for agreed tasks with human review rules.
Ready for wider rollout	Tested, documented, monitored, and maintained.

A limited pilot is often the right next step. It lets a small group use the knowledge base while issues are still visible and manageable.

Wider rollout should come later, once the team has seen how real users ask questions and where the tool still fails.
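The launch statuses can be tied to the readiness list so the decision is recorded against evidence. The rules below are an assumption for illustration: a few core items must be done before even a pilot, every item must be done before controlled use, and wider rollout also requires a completed pilot. Adjust them to the risk level agreed with the client or owner.

```python
READINESS_ITEMS = {
    "approved source set", "source register or document index",
    "test question bank completed", "major retrieval failures fixed",
    "citation behaviour checked", "sensitive data boundaries tested",
    "known limitations documented", "user instructions written",
    "issue log created", "maintenance owner assigned",
    "human review rules agreed", "retesting schedule defined",
}
# Illustrative choice of items without which even a pilot is unsafe.
CORE_ITEMS = {
    "approved source set", "major retrieval failures fixed",
    "citation behaviour checked", "sensitive data boundaries tested",
}


def launch_status(done: set, open_blockers: int, pilot_completed: bool) -> str:
    """Suggest a launch status; the people who sign off make the final call."""
    unknown = done - READINESS_ITEMS
    if unknown:
        raise ValueError(f"unrecognised items: {sorted(unknown)}")
    if open_blockers > 0 or not CORE_ITEMS <= done:
        return "Not ready"
    if not READINESS_ITEMS <= done:
        return "Limited pilot"
    if not pilot_completed:
        return "Ready for controlled use"
    return "Ready for wider rollout"
```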

Maintain the knowledge base after launch

QA is not a once-off task.

Retest when:

new documents are added
old documents are removed
source folders change
instructions are updated
user roles change
the AI platform changes
users report failed answers
the knowledge base is used for a new type of output
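Source changes are the easiest retest trigger to detect automatically. This sketch assumes the knowledge base is loaded from a local or synced folder; it hashes every file and compares two snapshots. Instruction, role, and platform changes will not show up here and still need to be logged by hand.

```python
import hashlib
from pathlib import Path


def manifest(folder: str) -> dict:
    """Map each file path (relative to folder) to a SHA-256 hash of its contents."""
    root = Path(folder)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def source_changes(old: dict, new: dict) -> dict:
    """Compare two manifests; any non-empty list is a reason to rerun the test bank."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

One workable pattern is to save each manifest as JSON alongside the retest results, so the next comparison runs against the last tested state rather than the last upload.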

Maintenance should include:

source updates
version control
issue review
failed answer checks
prompt updates
user feedback
periodic retesting
access review

Without maintenance, the knowledge base becomes less reliable over time. Old documents stay in circulation. New material is added without review. Users discover failures that nobody logs. The tool keeps working, but trust starts to weaken.

A good knowledge base needs an owner. Someone should be responsible for updating sources, reviewing issues, retesting after changes, and keeping user guidance current.

Where AI can help with QA

AI can help with parts of the QA process, but it should not approve itself.

AI can help:

draft test questions
identify possible duplicate questions
compare expected answers to actual answers
summarise issue logs
flag vague answers
group failures by issue type
prepare user guide drafts
create prompt examples

AI should not:

mark itself as accurate without human checking
decide whether sensitive material is safe
approve final evidence use
replace subject-matter review
decide launch readiness alone

The safest framing is simple: AI can support the QA process, but people must decide whether the knowledge base is accurate, safe, useful, and ready for team use.
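Some of these support tasks do not need a model at all. The sketch below uses Python's standard `difflib` to flag near-duplicate test questions and a literal substring check to list expected key points missing from an answer. The 0.85 threshold is an arbitrary starting point, and the key-point check will report a point as missing when the answer paraphrases it, so both outputs go to a person for review.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(questions, threshold=0.85):
    """Pairs of test questions whose text is at least `threshold` similar."""
    pairs = []
    for a, b in combinations(questions, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


def missing_key_points(expected_points, answer):
    """Expected key points whose exact wording does not appear in the answer."""
    text = answer.lower()
    return [point for point in expected_points if point.lower() not in text]
```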

Simple AI knowledge base QA checklist

Use this checklist before launch.

Area	QA question
Purpose	Is the knowledge base’s job clearly defined?
Use cases	Are approved and unapproved uses documented?
Sources	Are approved documents included and weak sources excluded?
Metadata	Are source IDs, file names, dates, and versions clear?
Retrieval	Does it find the right documents and sections?
Answers	Are answers accurate, specific, and properly qualified?
Citations	Can claims be checked against the original source?
Gaps	Does the AI say when it cannot answer?
Sensitivity	Are restricted sources handled properly?
User rules	Do users know how to use and check the tool?
Issue log	Are failures recorded, fixed, and retested?
Maintenance	Is there an owner and retesting plan?
Human review	Are final outputs still checked by people?

This checklist can be used as a simple launch review. For higher-risk work, turn it into a full QA tracker with owners, evidence, dates, and sign-off.
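A spreadsheet is the natural home for that tracker, but the structure is simple enough to sketch. Below, each checklist area gets an owner, evidence, a date, and a sign-off; the column names and the example row are hypothetical, and the CSV output can be opened in Excel or Google Sheets.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

AREAS = [
    "Purpose", "Use cases", "Sources", "Metadata", "Retrieval", "Answers",
    "Citations", "Gaps", "Sensitivity", "User rules", "Issue log",
    "Maintenance", "Human review",
]


@dataclass
class TrackerRow:
    area: str
    owner: str = ""
    evidence: str = ""        # link or note showing how the QA question was answered
    date_checked: str = ""    # ISO date, e.g. "2026-05-22"
    signed_off_by: str = ""


def outstanding(rows):
    """Areas still missing an owner, evidence or sign-off."""
    return [r.area for r in rows if not (r.owner and r.evidence and r.signed_off_by)]


def to_csv(rows) -> str:
    """Serialise the tracker as CSV text with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(TrackerRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


tracker = [TrackerRow(area) for area in AREAS]
tracker[0] = TrackerRow("Purpose", "Project lead", "Scope note v2", "2026-05-22", "Project lead")
```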

What software can help

The tool stack depends on the size and risk level of the knowledge base.

Most teams do not need an enterprise AI evaluation system to begin. They need a clear source register, a test question bank, an answer review sheet, and a launch-readiness checklist.

Tool type	Examples	Best for
AI assistant platforms	ChatGPT custom GPTs, Claude Projects, Gemini, NotebookLM	Small to medium knowledge bases, source-based Q&A, summaries, and drafting support.
Microsoft environment	Microsoft Copilot, SharePoint, OneDrive, Teams	Organisations already working inside Microsoft 365.
Google environment	Gemini, NotebookLM, Google Drive, Google Docs, Google Sheets	Google Workspace teams with document-heavy workflows.
Source registers	Google Sheets, Excel, Airtable	Tracking source IDs, versions, permissions, review status, and QA notes.
RAG and technical tools	LangChain, LlamaIndex, vector databases, evaluation tools	Technical teams building custom retrieval systems.
QA trackers	Airtable, Notion, Coda, Google Sheets	Test questions, issue logs, answer reviews, and launch readiness.
Qualitative tools	NVivo, MAXQDA, ATLAS.ti, Dedoose	Research teams working with coded qualitative evidence before AI retrieval.

A spreadsheet is often enough for the first QA layer. It can hold the source register, test questions, issue log, and launch checklist.

More advanced tools become useful when the knowledge base has many users, complex permissions, technical retrieval layers, or high-risk outputs.

The important point is not the software. It is the QA structure around the system.

Common mistakes to avoid

Most AI knowledge base problems are predictable.

Uploading a messy folder and calling it a knowledge base

A loose folder of documents is not a controlled source base. Sort, label, review, and approve the source material first.

Testing only easy demo questions

Demo questions usually show the tool at its best. QA should include difficult, ambiguous, sensitive, and out-of-scope questions.

Not checking citations

A citation that looks official may not support the claim. Open the source and check the passage.

Including old versions and final versions together

If draft and final versions sit side by side, the AI may use the wrong one.

Using sensitive documents without access rules

Sensitive material needs clear inclusion rules, access rules, and review steps.

Assuming a good summary means good retrieval

The answer may read well even if the system retrieved weak or incomplete sources.

Not testing “I do not know” behaviour

A knowledge base that answers everything confidently is not safe enough for serious work.

Not documenting known gaps

Users need to know what the knowledge base does not cover.

Giving the tool to users without a guide

Users need rules, examples, checking steps, and escalation paths.

Having no issue log

Without a log, failed answers become anecdotes. The team cannot see patterns or track fixes.

Having no maintenance owner

A knowledge base without an owner will drift.

Skipping retesting after new documents are added

Every meaningful source change can affect retrieval and answer quality.

Using AI outputs directly in reports without source checking

AI can support drafting. It should not replace evidence review.

Need help testing an AI knowledge base before launch?

An AI knowledge base is not ready just because it can answer questions.

It is ready when the team has tested which sources it uses, how it retrieves material, whether its answers can be checked, where it fails, and how users should review outputs.

For research, reporting, policy, public-sector, donor-funded, or internal knowledge work, this matters. The knowledge base should help people find and use approved material faster. It should not hide weak source structure, unclear permissions, or unsupported answers.

If your team is preparing an AI knowledge base around reports, documents, evidence tables, project files, submissions, interviews, or internal knowledge, I can help structure the source base, build the QA test set, check retrieval behaviour, prepare user rules, and create a controlled handover process.



