Pearson fined £2m by Ofqual: what repeated exam failures show about weak process control

What Pearson’s Ofqual fine shows about repeated assessment failures, weak process control, risk signals, escalation, monitoring and traceability.

Romanos Boraine, independent consultant in structured systems, evidence, and reporting

One mistake can be an error.

Repeated failures across different assessment products point to something deeper.

When a regulated assessment provider has problems across grading, question design, marking, malpractice detection, online test security, and regulatory notification, the issue is not only one broken process. It is a control-system problem.

That is why the Pearson Ofqual fine matters.

On 15 December 2025, Ofqual announced that Pearson had been fined more than £2 million for serious breaches across three separate cases between 2019 and 2023.

The penalties covered:

  • GCSE English language 2.0: £750,000
  • Pearson Edexcel GCE A-level Chinese: £505,000
  • Pearson PTE Academic Online: £750,000

Ofqual said the cases collectively affected tens of thousands of students.

This is not only an education story. It is a useful example of what happens when high-stakes workflows are not controlled, monitored, escalated, and corrected properly.

If your team relies on high-stakes data, evidence, reports, AI outputs, public submissions, dashboards, or repeated decision workflows, the lesson is direct. Weak process control can create financial penalties, correction work, public trust problems, and harm to the people relying on the output. If you need to reduce that risk, start with a traceable evidence workflow, a data use, reporting and communication system, or a data collection and intake system that makes the route from input to output easier to check.

Who this guide is for

This guide is for teams running regulated, high-stakes, evidence-heavy, reporting, assessment, AI retrieval, public-submission or donor-reporting workflows.

Key takeaways

  • Repeated failures across products point to weak process control, not only isolated mistakes.
  • Risk signals need clear triggers, logs, escalation routes and owner accountability.
  • Outputs should remain traceable to the evidence, rules and decisions that produced them.
  • Monitoring has to keep pace when delivery models, scale or risk profiles change.

The current situation

Ofqual fined Pearson more than £2 million for breaches across three separate cases.

The first case concerned GCSE English language 2.0. Ofqual said Pearson failed to identify and manage the risk of inconsistent grading standards between its GCSE English language qualification and the newer GCSE English language 2.0 qualification.

The second case concerned Pearson Edexcel GCE A-level Chinese, covering spoken Mandarin and spoken Cantonese. Ofqual said its review of assessments from 2019, 2022 and 2023 identified multiple issues with how questions were set and how responses were marked.

The third case concerned Pearson PTE Academic Online, an English proficiency test that enabled international students to meet university entrance requirements. Ofqual said the online version allowed some candidates to take the test at home rather than at a secure centre. In 2023, malpractice involved other people sitting the secure test on a student’s behalf, avoiding the remote invigilation safeguards Pearson had in place. Pearson later revoked 9,910 results.

Each case is different.

But together they point to the same larger issue: weak process control around high-stakes assessment.

This is not only about whether one paper, one mark scheme, or one online test mode failed. It is about whether the organisation had enough structure to identify risk, respond to warnings, monitor outputs, escalate problems, notify the regulator, and protect the people relying on the results.

What happened

The Pearson fine covered three connected but separate failure types.

GCSE English language 2.0: grading-standard risk

The GCSE English case was about grading standards.

Pearson introduced GCSE English language 2.0 as an alternative specification that was marketed towards post-16 students who had not yet achieved grade 4, including re-sit students.

Ofqual said Pearson failed to identify and manage a risk of inconsistency in grading standards between the newer GCSE English language 2.0 qualification and Pearson’s existing GCSE English language qualification.

Ofqual also said it had raised concerns with Pearson in 2022 and 2023. Pearson did not reduce the risk as far as possible until standards were realigned in summer 2024. Students then received correct but unexpectedly lower results. This created concern among centres and learners and undermined public confidence.

The workflow issue is clear.

The assessment was live. Student entries existed. Risk signals were visible. Ofqual had raised concerns. But the organisation did not act early enough to analyse, manage, and reduce the grading-standard risk.

A-level Chinese: question design and marking problems

The A-level Chinese case was about assessment content and marking.

Ofqual’s A-level Chinese penalty notice said Pearson failed to comply with its own approach to mark schemes, failed to ensure that the language and cultural knowledge expected of learners were appropriate, failed to ensure the level of demand was appropriate, and failed to have due regard to concerns raised by teachers and others.

Ofqual found that non-native Chinese speakers were likely to have been disproportionately disadvantaged because the assessments were inappropriately demanding for them.

That is not a minor design issue.

A qualification has to assess what it claims to assess. If the questions, mark schemes, grammar expectations, vocabulary demands, and cultural assumptions move beyond the stated specification, the assessment becomes harder to trust.

PTE Academic Online: malpractice risk and delayed escalation

The PTE Academic Online case was about online delivery, malpractice risk, and delayed incident management.

Ofqual’s PTE penalty notice says Pearson introduced PTE Academic Online in January 2022 as a secondary option for students to take the assessment online at home rather than at a secure test centre. The online version was never used for visa purposes, and in 2022 to 2023 it represented less than 5% of all PTE Academic tests taken.

The issue was proxy testing. Other people sat the secure test on the student’s behalf.

Pearson revoked 9,910 PTE Academic Online results from assessments dating from January 2023 to July 2023 using a bulk revocation process. It also revoked or withheld an additional 2,906 results identified through business-as-usual monitoring.

Ofqual said Pearson failed to establish and maintain appropriate controls to identify and manage emerging malpractice risks. It also found that Pearson failed to notify Ofqual promptly.

This is a process-control failure, not only a security failure.

The problem was not simply that some candidates cheated. The problem was that risk signals grew, test volumes changed, universities raised concerns, results were put on hold, and the organisation still did not escalate and contain the issue early enough.

The main point

The Pearson Ofqual fine shows what happens when failures repeat across the assessment chain.

In the first case, the issue was grading-standard risk. In the second, it was question design and marking. In the third, it was online test security and malpractice detection.

Those are different operational areas. But the underlying control questions are similar:

  • Was the risk identified early?
  • Was the evidence reviewed properly?
  • Were warning signs acted on?
  • Was the process monitored after launch?
  • Were concerns logged and escalated?
  • Was the regulator notified promptly?
  • Were affected learners protected?
  • Could the organisation explain how the output was produced?

When the answer is no, the final output may still look official. A grade may be issued. A certificate may be accepted. A test score may be verified. A report may be published. A dashboard may update.

But the system behind that output may already be failing.

That is why source traceability in evidence-heavy reports matters. It is not only about finding a quote or checking a source. It is about whether the organisation can explain how a result, finding, grade, recommendation, or decision was produced.

Where the workflow failed

1. Risk signals were not acted on early enough

The GCSE English case is a clear example of a risk signal being visible before the organisation responded properly.

Ofqual said it highlighted concerns to Pearson in 2022 and 2023. Pearson was responsible for reviewing risks as the qualification moved into operation, reducing those risks as far as possible, and mitigating any adverse effects.

That did not happen soon enough.

In any high-stakes workflow, risks do not always arrive as dramatic failures. They often appear as smaller signals:

  • inconsistent outcomes
  • unexpected results
  • complaints
  • stakeholder concerns
  • patterns in the data
  • reviewer discomfort
  • repeated exceptions
  • changes in volume
  • differences between expected and actual outputs

A strong workflow gives those signals somewhere to go.

A weak workflow treats them as isolated comments, emails, complaints, or side issues until they become public problems.

For teams working with reports, evidence, consultation data, AI summaries, or dashboards, this is why a source register for an evidence-heavy report is useful. It gives the team one place to track source material, review status, issues, gaps, and follow-up points.

2. Assessment design and marking drifted from the expected standard

The A-level Chinese case shows a different kind of failure.

The issue was not only whether marks were added correctly. It was whether the assessment design and marking approach matched the qualification’s stated requirements.

Ofqual found problems with mark schemes, language expectations, grammar coverage, level of demand, and the way concerns from teachers and others were handled.

That is a content-governance problem.

Every high-stakes output has an intended purpose. The workflow needs to keep the output aligned with that purpose.

For an exam, the question is: does the assessment test what it is supposed to test, at the right level, using the agreed specification?

For a report, the question is: do the findings reflect the source material, research questions, and evidence standards?

For a public consultation, the question is: does the synthesis reflect what submitters actually said, rather than what the team assumed?

For an AI knowledge base, the question is: does the answer stay inside the approved source material, or does it drift?

That is why AI knowledge base QA before team use matters. A system can produce fluent outputs that do not match the source base or the task it was built for.

3. Online delivery changed the risk profile

The PTE Academic Online case shows how a new delivery route can change the risk profile of a system.

A test delivered in a secure centre does not carry the same operational risks as a test delivered online at home. Remote delivery can be useful. It can also introduce risks around identity, proxy testing, monitoring, and result reliability.

Pearson had safeguards in place. Ofqual’s issue was that the organisation failed to maintain appropriate controls to identify and manage emerging malpractice risks and failed to act promptly enough as the risk developed.

This is an important lesson for any team moving from an old workflow to a new digital workflow.

A new form, platform, portal, dashboard, AI assistant, or online process is not only a new tool. It changes the workflow.

It changes who submits information. It changes where the information enters. It changes what can go wrong. It changes the review points. It changes the risk signals. It changes the correction process.

That is why data collection and intake systems need to be designed around the whole route, not only the front-end form or portal.

4. Monitoring did not keep pace with volume and risk

In the PTE case, Ofqual said test volumes more than doubled between January 2023 and June 2023. Universities in the UK and Australia began reporting discrepancies between students’ test scores and their English proficiency. Results were put on hold. Pearson later concluded that about 10,000 results should be revoked.

That is a monitoring failure.

Volume changes matter.

A workflow that is manageable at low volume can become unsafe when volume rises. A manual review step may become too slow. A dashboard may no longer show enough detail. A risk threshold may no longer be fit for purpose. A support team may not see the issue until complaints arrive.

Good monitoring should not only ask whether the system is running. It should ask whether the system is still producing outputs that make sense.

For a reporting team, this may mean checking whether dashboards still match source data. For a donor-funded project, it may mean checking whether fieldwork submissions are complete and comparable. For a research team, it may mean checking whether coded themes still reflect the transcripts. For an AI assistant, it may mean checking whether answers still cite the right documents.
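
A check of this kind can be small. The sketch below is illustrative only: the function name `reconcile`, the `category` field and the sample figures are assumptions made for the example, not part of the Pearson case or any specific tool. It compares per-category counts in the source records with the figures a dashboard displays and returns any mismatch.

```python
from collections import Counter


def reconcile(source_rows, dashboard_totals, tolerance=0):
    """Return categories where the dashboard count differs from the source count."""
    source_totals = Counter(row["category"] for row in source_rows)
    mismatches = {}
    # Check every category that appears on either side.
    for category in sorted(set(source_totals) | set(dashboard_totals)):
        source_count = source_totals.get(category, 0)
        dashboard_count = dashboard_totals.get(category, 0)
        if abs(source_count - dashboard_count) > tolerance:
            mismatches[category] = {
                "source": source_count,
                "dashboard": dashboard_count,
            }
    return mismatches


# Hypothetical data: three source submissions and the figures a dashboard shows.
source_rows = [
    {"id": "s1", "category": "housing"},
    {"id": "s2", "category": "housing"},
    {"id": "s3", "category": "transport"},
]
dashboard_totals = {"housing": 2, "transport": 2}

print(reconcile(source_rows, dashboard_totals))
# → {'transport': {'source': 1, 'dashboard': 2}}
```

An empty result means the dashboard still matches the source data for the categories checked. It says nothing about whether the source data itself is right.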

The Source Traceability Risk Checker is a useful diagnostic for teams that need to check whether their current process can survive review.

5. Escalation and notification were too slow

Ofqual found that Pearson failed to notify it promptly about the PTE Academic Online incident.

That matters because incident response is part of the workflow.

A serious problem should not depend on informal judgement or a few people noticing the same issue. The organisation needs a defined route for escalation.

That route should answer:

  • What counts as a serious incident?
  • Who can trigger escalation?
  • Which data points or complaints trigger review?
  • Who owns the response?
  • When should results be held?
  • When should the regulator, client, funder, or affected people be notified?
  • How are decisions recorded?
  • How are corrective actions tracked?

Without that structure, the organisation may know something is wrong but still fail to act quickly enough.
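
One way to make those answers explicit is to write them down as rules rather than leave them to judgement on the day. The sketch below is a minimal illustration under stated assumptions: the severity labels, thresholds, owner titles and notification deadlines are all hypothetical and would need to come from the organisation's own obligations.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class EscalationRule:
    owner: str                # who owns the response
    hold_outputs: bool        # whether results are held during review
    notify_within: timedelta  # deadline for notifying regulator, client or funder


# Hypothetical rules; real owners and deadlines depend on the organisation.
RULES = {
    "minor": EscalationRule("workflow lead", False, timedelta(days=10)),
    "serious": EscalationRule("head of assurance", True, timedelta(days=2)),
}


def classify(affected_outputs: int, external_reports: int) -> str:
    """Hypothetical definition of what counts as a serious incident."""
    if affected_outputs >= 100 or external_reports >= 3:
        return "serious"
    return "minor"


def escalation_plan(detected_at: datetime, affected_outputs: int,
                    external_reports: int) -> dict:
    """Turn an incident into an owner, a hold decision and a notify-by time."""
    severity = classify(affected_outputs, external_reports)
    rule = RULES[severity]
    return {
        "severity": severity,
        "owner": rule.owner,
        "hold_outputs": rule.hold_outputs,
        "notify_by": detected_at + rule.notify_within,
    }


plan = escalation_plan(datetime(2025, 1, 6, 9, 0),
                       affected_outputs=250, external_reports=1)
print(plan["severity"], plan["owner"], plan["notify_by"].isoformat())
# → serious head of assurance 2025-01-08T09:00:00
```

The value is less in the code than in the fact that the thresholds and deadlines have been agreed and recorded before an incident occurs.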

This same problem appears in non-exam contexts. A public-sector team may know that a public submission matrix has errors. A donor-reporting team may know that partner data is inconsistent. A research team may know that an AI-generated summary is unreliable. But if there is no escalation workflow, the issue stays vague until it reaches the final output.

6. The final outputs carried institutional trust

Exam results are trusted because they come from a regulated assessment provider.

That trust is part of their value.

But trust becomes fragile when the process behind the result is weak.

Students, schools, universities, employers, and professional bodies rely on qualifications as evidence. If the evidence system behind those qualifications is not controlled properly, every downstream user has to question what they relied on.

The same is true in other evidence-heavy work.

A report is trusted because the reader assumes the findings are grounded in source material. A dashboard is trusted because the viewer assumes the numbers are calculated correctly. A public consultation summary is trusted because the reader assumes comments were handled fairly. A donor report is trusted because the funder assumes the outputs reflect the programme evidence. An AI answer is trusted because the user assumes it came from the approved documents.

When that trust is not supported by a strong workflow, the final output becomes risky.

What should have happened

No large assessment system can remove every possible problem. But a stronger control framework can reduce the chance that risks remain unresolved across multiple years, products, and delivery routes.

Risks should have had defined review triggers

Risk review should not depend only on someone deciding to raise the issue manually.

The PTE notice is especially useful here. Ofqual said there was an absence of defined risk review triggers, such as a sharp increase in volume or changes in coverage requirements, that would automatically prompt reassessment of risk.

That principle applies widely.

A workflow should define what forces review. Examples include:

  • a sudden rise in volume
  • repeated complaints
  • unexpected outcome patterns
  • changes in delivery method
  • new source types
  • increased reliance on automated processing
  • inconsistent reviewer decisions
  • missed deadlines
  • source gaps
  • unexplained dashboard movements
  • AI answers that cannot be traced to sources

If those triggers are not defined, the team may only respond after the failure becomes visible.
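
As a rough illustration, three of the triggers above can be written as explicit checks that run every reporting period. This is a sketch, not a recommended standard: the `PeriodStats` fields, the 1.5x volume ratio and the complaint limit of five are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class PeriodStats:
    """Hypothetical summary of one reporting period for one workflow."""
    volume: int
    complaints: int
    untraceable_outputs: int


def review_triggers(previous: PeriodStats, current: PeriodStats,
                    volume_ratio: float = 1.5,
                    complaint_limit: int = 5) -> list:
    """Return the names of the triggers that fired for the current period."""
    fired = []
    # Sudden rise in volume compared with the previous period.
    if previous.volume > 0 and current.volume / previous.volume >= volume_ratio:
        fired.append("volume_rise")
    # Repeated complaints within one period.
    if current.complaints >= complaint_limit:
        fired.append("repeated_complaints")
    # Any output that cannot be traced to a source forces review.
    if current.untraceable_outputs > 0:
        fired.append("untraceable_outputs")
    return fired


# Hypothetical figures for two periods.
earlier = PeriodStats(volume=1000, complaints=1, untraceable_outputs=0)
later = PeriodStats(volume=2100, complaints=7, untraceable_outputs=0)

print(review_triggers(earlier, later))
# → ['volume_rise', 'repeated_complaints']
```

A non-empty result should open a risk review with a named owner. The point is that the review starts because a rule fired, not because someone happened to raise the issue.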

Warning signs should have been logged, grouped, and escalated

In the A-level Chinese case, Ofqual said Pearson failed to have due regard to concerns raised by teachers and other stakeholders.

That is a familiar workflow problem.

Feedback can arrive through emails, calls, meetings, complaints, spreadsheets, portal messages, or informal conversations. If it is not captured and grouped, the organisation may fail to see the pattern.

A feedback or incident workflow should track:

  • who raised the concern
  • what product, question, source, report, result, or output it relates to
  • whether similar concerns have been raised before
  • what evidence supports the concern
  • who reviewed it
  • what decision was made
  • whether it needs escalation
  • whether it affects past outputs

This is also why a public consultation response matrix should do more than group comments by theme. It should help the team see repeated issues, evidence strength, response status, and decision notes.
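
The tracking fields listed above map naturally onto a simple record structure. The sketch below is one possible shape, with invented field names and invented sample entries. It shows how grouping concerns by what they relate to surfaces a repeated issue that separate emails and calls would hide.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Concern:
    raised_by: str
    relates_to: str                    # product, question, source, report or output
    evidence: str
    reviewed_by: Optional[str] = None  # None until someone has reviewed it
    decision: Optional[str] = None
    affects_past_outputs: bool = False


def repeated_concerns(log, minimum=2):
    """Return targets that have been raised at least `minimum` times."""
    counts = Counter(concern.relates_to for concern in log)
    return {target: n for target, n in counts.items() if n >= minimum}


def unreviewed(log):
    """Return concerns that nobody has reviewed yet."""
    return [concern for concern in log if concern.reviewed_by is None]


# Invented sample entries arriving through different channels.
log = [
    Concern("reviewer A", "report-section-4", "email"),
    Concern("partner B", "report-section-4", "portal message"),
    Concern("analyst C", "dashboard-table-2", "meeting note",
            reviewed_by="workflow lead", decision="no change"),
]

print(repeated_concerns(log))
# → {'report-section-4': 2}
print(len(unreviewed(log)))
# → 2
```

A spreadsheet with the same columns would do the same job. What matters is that every concern lands in one place and can be counted.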

Results and decisions should have remained traceable

A high-stakes result needs a traceable route.

In an assessment system, that means being able to link the result back to candidate work, question design, mark scheme, grading standard, moderation process, incident record, and any correction made later.

In research and reporting work, the same principle applies. A final claim should be traceable back to the source material, coded evidence, quote, table, or review note behind it.

This is the reason for building a source-linked evidence table before the final report is written.

When something goes wrong, traceability lets the team work backwards. Without it, correction becomes slower and less reliable.
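
Working backwards can be sketched as a walk over records that each name what they were derived from. The example is a simplified illustration: the record ids, the `derived_from` field and the sample data are assumptions, and a real system would hold these links in a database or evidence table.

```python
def trace_back(record_id, records):
    """Walk upstream from one record; return (found ids, missing ids), both sorted."""
    found, missing = set(), set()
    stack = [record_id]
    while stack:
        current = stack.pop()
        if current in found or current in missing:
            continue  # already visited
        if current not in records:
            missing.add(current)  # the trail breaks here
            continue
        found.add(current)
        stack.extend(records[current]["derived_from"])
    return sorted(found), sorted(missing)


# Invented example: a finding that cites a coded extract and a review note.
records = {
    "finding-7": {"derived_from": ["coded-extract-12", "review-note-3"]},
    "coded-extract-12": {"derived_from": ["transcript-04"]},
    "transcript-04": {"derived_from": []},
    # "review-note-3" was never recorded, so the trail is incomplete.
}

found, missing = trace_back("finding-7", records)
print(found)
# → ['coded-extract-12', 'finding-7', 'transcript-04']
print(missing)
# → ['review-note-3']
```

Any id in the missing list is a point where the output cannot be explained from the records on file.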

Online systems should have had active monitoring, not only launch checks

The PTE Academic Online case shows why launch checks are not enough.

An online test mode may be secure at launch. But if volume changes, user behaviour changes, attempted malpractice grows, or third-party users start reporting concerns, the system needs active monitoring.

That monitoring should include:

  • volume thresholds
  • exception flags
  • complaint patterns
  • unusual result profiles
  • delayed verification signals
  • institution feedback
  • investigation status
  • hold and release rules
  • notification rules
  • review of whether controls are still fit for purpose

The same applies to AI knowledge bases, dashboards, intake portals, public submission systems, and donor-reporting workflows.

A workflow that is not monitored is not finished.
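
Hold and release rules are the easiest item on that list to make concrete. The sketch below assumes a hypothetical `ResultRecord` with exception flags and an investigation status. It does not describe how Pearson or any test provider handles results.

```python
from dataclasses import dataclass, field


@dataclass
class ResultRecord:
    result_id: str
    exception_flags: list = field(default_factory=list)
    investigation_open: bool = False


def release_decision(record: ResultRecord) -> str:
    """Decide whether a result can be released or must be held."""
    # An open investigation always holds the result.
    if record.investigation_open:
        return "hold: investigation open"
    # Unresolved exception flags hold the result until they are cleared.
    if record.exception_flags:
        return "hold: " + ", ".join(sorted(record.exception_flags))
    return "release"


# Invented records.
clean = ResultRecord("r-001")
flagged = ResultRecord("r-002", exception_flags=["unusual_profile", "identity_check"])
under_review = ResultRecord("r-003", investigation_open=True)

for record in (clean, flagged, under_review):
    print(record.result_id, "->", release_decision(record))
# → r-001 -> release
# → r-002 -> hold: identity_check, unusual_profile
# → r-003 -> hold: investigation open
```

Writing the rule down also makes it reviewable: if the flags stop catching the cases that matter, that is itself a signal that the controls are no longer fit for purpose.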

Incident response should have been part of the system

When 9,910 results have to be revoked, the correction process becomes part of the public problem.

A good incident process should be ready before the incident happens.

It should define:

  • how to hold outputs
  • how to identify affected records
  • how to notify affected users
  • how to notify regulators or funders
  • how to offer correction routes
  • how to manage appeals
  • how to document decisions
  • how to change the workflow after the incident

The point is not only to fix the immediate error. The point is to change the process that allowed the error to grow.

What this shows about weak process control

The Pearson Ofqual fine shows that weak process control can appear in different forms.

It may look like inconsistent grading standards. It may look like assessment questions that do not match the stated specification. It may look like marking that disadvantages a group of learners. It may look like an online test that is vulnerable to proxy testing. It may look like delayed notification to a regulator. It may look like results being revoked after universities have already relied on them.

The visible failure changes. The underlying pattern stays similar.

Information comes in. It is processed. A result is issued. People rely on the result.

If the control layer is weak, the result can become unsafe.

That is why process control matters in research, policy, donor reporting, public submissions, internal operations, and AI-supported work.

A weak workflow does not always produce messy-looking outputs. Sometimes it produces formal, official, well-designed outputs that people have every reason to trust.

That is the danger.

How a stronger workflow could have reduced the risk

A stronger workflow would not guarantee that no assessment provider ever makes a mistake.

But it could reduce the chance that risks stay unresolved across multiple years and products.

A stronger workflow would include:

  • risk registers with defined review triggers
  • clearer ownership of product and assessment risks
  • routine checks against specifications and standards
  • structured logs of teacher, student, university, or stakeholder concerns
  • active monitoring after launch
  • incident thresholds
  • escalation rules
  • source and decision traceability
  • regulator notification rules
  • correction and appeal workflows
  • post-incident changes to the system

The same structure applies outside exam boards.

If your team produces evidence-heavy reports, uses AI for document retrieval, manages public submissions, builds dashboards, or prepares decision notes, the same control questions apply.

Can you trace the output? Can you explain the process? Can you detect the risk early? Can you correct the problem before people rely on the result?

If not, the issue is not only quality assurance. It is workflow design.

What this means for research, policy, donor reporting, and internal systems

The Pearson case is about regulated qualifications, but the pattern is relevant to other teams.

A research team may collect interviews and case studies, but fail to monitor whether coding is consistent across analysts.

A public consultation team may receive hundreds of submissions, but fail to track whether repeated concerns have been properly grouped and answered.

A donor-funded programme may collect partner updates, but fail to check whether reporting categories are being applied consistently.

A policy team may prepare a response matrix, but fail to preserve the source trail behind each issue and decision.

A business may launch a lead form or client intake portal, but fail to monitor whether the data arriving through it is complete, accurate, and useful.

An organisation may build an AI knowledge base, but fail to test whether the assistant retrieves the right documents and refuses unsupported answers.

In all of these examples, the final output may still look acceptable.

The report may be formatted. The dashboard may load. The AI answer may sound confident. The submission matrix may have themes. The donor update may contain numbers.

But if the workflow behind it is weak, the output may not be safe to rely on.

This is why the real cost of messy evidence workflows is not only lost time. It is review risk, correction work, weak decisions, and loss of trust.

Where my work fits

My work helps teams move from messy information to structured systems, faster analysis, clearer reports, and better decisions.

That can include:

  • data intake workflows
  • source registers
  • risk and issue trackers
  • evidence databases
  • source-linked evidence tables
  • QA checklists
  • AI knowledge base testing
  • public submission matrices
  • findings-to-recommendations matrices
  • reporting workflows
  • review notes
  • handover systems

The aim is not to make the process heavier. The aim is to make the route from source material to final output easier to check.

My Local Government White Paper evidence workflow case study shows how public submissions, claims coding, synthesis, drafting support, and review comments can stay connected.

The UNICEF Zambia child poverty evidence workflow shows how 120 narrative case studies were structured into a spreadsheet-first evidence workflow with AI-assisted coding and quote-per-claim checks.

The UNICEF Palestine situation analysis case study shows how scattered qualitative material can be rebuilt into a retrieval, recommendation, and drafting workflow under deadline pressure.

Different sectors. Same principle.

The output is only as strong as the workflow behind it.

FAQ

Why was Pearson fined by Ofqual?

Pearson was fined by Ofqual for serious breaches across three cases: GCSE English language 2.0, A-level Chinese, and Pearson PTE Academic Online.

The issues included grading-standard risk, assessment content and marking problems, online test malpractice risk, delayed escalation, and failure to notify Ofqual promptly in the PTE case.

How much was Pearson fined by Ofqual?

Pearson was fined more than £2 million in total.

The penalties were £750,000 for GCSE English language 2.0, £505,000 for A-level Chinese, and £750,000 for PTE Academic Online.

What happened in the Pearson GCSE English case?

The GCSE English case involved failure to identify and manage the risk of inconsistent grading standards between Pearson’s GCSE English language qualification and its newer GCSE English language 2.0 qualification.

Ofqual said Pearson should have identified and managed the risk earlier.

What happened in the Pearson A-level Chinese case?

The A-level Chinese case involved problems with assessment content and marking. Ofqual found issues with mark schemes, level of demand, language expectations, grammar coverage, and how concerns from teachers and others were handled.

Ofqual found that non-native Chinese speakers were likely to have been disadvantaged.

What happened in the Pearson PTE Academic Online case?

The PTE Academic Online case involved candidate malpractice in the online version of Pearson’s English proficiency test.

The online version allowed some candidates to take the test at home. In 2023, malpractice involved other people sitting the test on behalf of registered students. Pearson later revoked 9,910 results through a bulk revocation process.

Was the Pearson Ofqual fine about AI?

No. The Pearson fine was not about AI.

It was about assessment governance, grading standards, question design, marking, online test delivery, malpractice detection, incident management, and regulatory notification.

That makes it useful for organisations outside education because similar process-control failures can happen in reports, dashboards, public submissions analysis, donor reporting, AI knowledge bases, and internal data workflows.

What does the Pearson fine show about process control?

It shows that serious failures often appear across different parts of the workflow.

One case may involve grading. Another may involve content design. Another may involve online security. But the deeper issue is whether the organisation has the controls to identify, monitor, escalate, and correct risks before people rely on the output.

What is a process-control failure?

A process-control failure happens when the system around the work is not strong enough to prevent, detect, escalate, or correct errors.

It can involve weak risk tracking, unclear ownership, poor monitoring, missing review triggers, weak source traceability, slow notification, or poor correction workflows.

Why does source traceability matter in cases like this?

Source traceability matters because it lets a team work backwards from the final output to the input, evidence, rule, decision, or review step behind it.

In assessment, that means linking a result to candidate work, mark schemes, standards, and review decisions.

In reporting, it means linking findings, claims, and recommendations back to source material.

How can organisations avoid similar workflow failures?

Organisations can reduce the risk by using clear intake rules, source registers, data dictionaries, risk logs, review triggers, issue trackers, escalation routes, QA checks, monitoring dashboards, and correction workflows.

The important point is that the workflow must be checked while it is running, not only after the final output is produced.

How does this relate to AI knowledge bases?

AI knowledge bases can fail in a similar way if they are built without clear source boundaries, test questions, output rules, review steps, and source checking.

Before a team relies on an AI assistant, it should run a QA process for the AI knowledge base to check whether answers are grounded in approved source material.

How does this relate to public submissions and policy work?

Public submissions and policy work also depend on process control.

The team needs to track who submitted what, how comments were coded, which themes were identified, how issues were grouped, how responses were drafted, and how the final output links back to the source material.

A weak process can make the final consultation report or policy response harder to defend.

Who can help audit a data, evidence, or reporting workflow?

I help research teams, public-sector projects, donor-funded contractors, programme teams, and organisations audit and rebuild workflows that move from raw information to reports, dashboards, AI outputs, public submissions analysis, and decision support.

If your team needs to check whether its current process can survive review, correction, or public scrutiny, contact me here.

A useful next step

If your team works with evidence, reports, dashboards, AI outputs, public submissions, donor reporting, client intake, or high-stakes internal decisions, ask three questions:

  • Can we trace each output back to the source material or input record?
  • Can we explain how the information was processed, reviewed, and approved?
  • Would we detect the problem early if the workflow started producing weak or misleading outputs?

If the answer is no, the issue is not only quality control. It is process design.

Use the Source Traceability Risk Checker to test where the review trail may break. Use the Reporting Bottleneck Cost Calculator to estimate the cost of repeated manual checking, correction, and reporting delays.

A weak process can turn into a correction exercise, a regulatory issue, a trust problem, or a costly delivery failure. If your team needs to check whether its data, evidence, AI, reporting, or decision workflow is strong enough, get in touch. I can help audit the current process, identify the main risk points, and build a clearer route from source material to final output.

Sources used in this guide

These sources were used to ground the facts of the case. The workflow analysis is my own.

  • Ofqual announcement on GOV.UK
  • Ofqual monetary penalty notice on GOV.UK
  • Ofqual monetary penalty notice on GOV.UK

Each source was used for factual background and source context.