Robodebt Failed Because Income Averages Were Treated as Proof of Debt

Robodebt shows what happens when the wrong data points are used to make serious decisions, human review is weakened, and a system turns income estimates into d…

Who this guide is for

This guide is for public-sector teams, policy teams, service delivery teams, evidence-heavy reporting teams, and teams using data or AI systems to support serious decisions.

What was Robodebt?

Robodebt was an Australian Government welfare debt scheme that used data matching and income averaging to raise debts against people who had received welfare payments.

The basic idea was to compare two sets of income information. On one side, the government had annual income data from the Australian Taxation Office. On the other side, Centrelink had income records connected to welfare payments. Where the numbers did not line up, the system treated the mismatch as a possible overpayment issue.

That first step was not automatically wrong. Comparing records can be useful. A mismatch can show that something needs to be checked.

The problem was what happened after that.

Instead of treating the mismatch as a prompt for review, the system averaged a person’s annual income across shorter welfare reporting periods. That average was then used to calculate alleged debts.

This is where the whole thing starts to fall apart.

Centrelink payments depended on what someone earned in specific reporting periods, often fortnight by fortnight. Annual income does not show that. A person may earn money for a short period and then earn nothing for months. They may work casually, seasonally, part-time, or irregularly. Averaging annual income across the year creates a tidy figure, but it does not show when the income was actually earned.
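A small worked example makes the timing problem concrete. The figures below are invented for illustration: a hypothetical person earns $13,000 from five fortnights of seasonal work and claims payments only in the fortnights with no work. The 26-fortnight year and every name in the sketch are my assumptions, not Centrelink's actual calculation rules.

```python
# Illustrative sketch only: figures and names are hypothetical,
# not Centrelink's actual rules or any real person's record.
FORTNIGHTS_PER_YEAR = 26

# Actual pattern: five fortnights of seasonal work, then no work.
actual_income = [2600.0] * 5 + [0.0] * 21
annual_income = sum(actual_income)

# Averaging spreads the annual figure evenly across the year.
averaged_income = annual_income / FORTNIGHTS_PER_YEAR

# Assume payments were claimed only in fortnights with no earnings.
on_payment = [income == 0.0 for income in actual_income]

# Income that averaging attributes to the fortnights on payment.
implied_income_on_payment = sum(averaged_income for paid in on_payment if paid)

# Income actually earned in those same fortnights.
actual_income_on_payment = sum(
    income for income, paid in zip(actual_income, on_payment) if paid
)

print(f"Annual income: {annual_income:.2f}")
print(f"Averaged per fortnight: {averaged_income:.2f}")
print(f"Implied income while on payment: {implied_income_on_payment:.2f}")
print(f"Actual income while on payment: {actual_income_on_payment:.2f}")
```

In this made-up case the average attributes $500 to each of the 21 fortnights in which nothing was earned, which adds up to $10,500 of income that never existed in those periods. The annual total is accurate. The period-level picture is not.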

Justice Kyrou’s Federal Court speech on Robodebt-type maladministration explains the core issue clearly: income averaging used PAYG information from the ATO, averaged it into fortnightly amounts, and did not take into account the person’s actual income for each relevant fortnight. He also notes that the previous manual review step did not take place under Robodebt, and that recipients were given an income figure and expected to provide information to contradict it.

So the system had data, but it was not the right data for the decision being made.

That is the core of the Robodebt failure.

The system used an annual income figure to create an averaged estimate. That estimate was then treated as if it proved actual income in specific welfare periods. From there, alleged debts were raised, and people were expected to disprove the calculation.

The Royal Commission into the Robodebt Scheme later examined how the scheme was designed, implemented, defended, and allowed to continue. It was not only a technical failure. It was also a failure of policy design, legal judgement, human review, communication, governance, and accountability.

The system used the wrong data points for the decision

The decision the system needed to support was specific:

Did this person receive more welfare than they were entitled to in a particular reporting period?

To answer that properly, the system needed information about timing and frequency. It needed to know when income was earned, how much was earned in each relevant period, and whether the person’s welfare entitlement should have changed during those periods.

Annual income data could not answer that question on its own.

Annual income tells you what someone earned across a year. It does not tell you whether the income was earned evenly across that year. It does not show whether someone earned money in February but not in March, or worked for six weeks and then had no work for several months.

That is why the data design was wrong for the decision.

The annual income figure had a source: ATO income data. But the debt decision needed something more specific. It needed evidence of actual income in the relevant welfare reporting periods. The failure was not a total absence of source data. It was the use of a sourced annual figure to prove something that annual data could not prove on its own.

That distinction matters.

Source traceability asks: where did this number come from?

Evidence sufficiency asks: does this number prove enough to support this decision?

Robodebt had a serious problem with the second question.

It used annual income data and an averaged estimate to make decisions that required period-specific income evidence. The source may have been real, but the inference drawn from it was not strong enough for the decision.

This is also why database design is not just a technical exercise. You do not design a database only by asking what data is available. You design it by asking what decision, report, recommendation, dashboard, or public output the data needs to support.

If the decision depends on fortnightly income, the database needs fields that record income by fortnight. If timing matters, the database needs timing. If frequency matters, the database needs frequency. If a figure is averaged, inferred, disputed, or verified, the system needs to say so clearly.
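One way to picture this is a record that carries its own period and its own label. The sketch below is a minimal illustration, assuming hypothetical names (`IncomeBasis`, `IncomeRecord`, `can_support_period_decision`) and invented dates and amounts. It is not a description of any real Centrelink or ATO schema.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class IncomeBasis(Enum):
    """How a figure was obtained (hypothetical labels for illustration)."""
    ANNUAL = "annual"
    REPORTED = "reported"
    AVERAGED = "averaged"
    INFERRED = "inferred"
    DISPUTED = "disputed"
    VERIFIED = "verified"


@dataclass(frozen=True)
class IncomeRecord:
    period_start: date
    period_end: date
    amount: float
    basis: IncomeBasis
    source: str


def can_support_period_decision(record: IncomeRecord, start: date, end: date) -> bool:
    """True only for verified figures that fall inside the decision period."""
    inside_period = start <= record.period_start and record.period_end <= end
    return record.basis is IncomeBasis.VERIFIED and inside_period


# Invented example data for one fortnight under review.
fortnight = (date(2015, 9, 7), date(2015, 9, 20))
annual = IncomeRecord(date(2015, 7, 1), date(2016, 6, 30), 13000.0,
                      IncomeBasis.ANNUAL, "annual tax data")
averaged = IncomeRecord(*fortnight, 500.0, IncomeBasis.AVERAGED,
                        "derived from annual figure")
payslip = IncomeRecord(*fortnight, 2600.0, IncomeBasis.VERIFIED,
                       "employer payslip")

for record in (annual, averaged, payslip):
    print(record.basis.value, can_support_period_decision(record, *fortnight))
```

The design choice is that the label travels with the number. An averaged figure can still be stored and displayed, but a check like this stops it from being used as if it were verified evidence for a specific fortnight.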

Why averaged ATO income alone could not lawfully support a Robodebt debt

The legal problem was not only that the calculation was rough. It was that averaged ATO income alone could not prove actual income in the relevant welfare reporting periods.

Robodebt needed evidence of actual income during specific periods. Instead, it used an averaged annual figure as if it could stand in for that evidence.

The Commonwealth Ombudsman’s supplementary submission on Centrelink’s compliance programme summarised the Deanna Amato finding in plain terms: averaged ATO income information was not capable of satisfying the decision-maker that Ms Amato owed the debt under the Social Security Act.

This is the clearest way to understand the issue.

The debt decision required period-specific proof. The system relied on a broad annual figure converted into an average. That average may have been useful for flagging a case, but averaged ATO income alone was not enough to prove the debt.

A mismatch should have created a review case, not a debt

The safer version of the process would have looked different.

The data match could have said: “These records do not line up. This case needs review.”

That would have been a reasonable use of automation. The system could have flagged the case, created a review record, identified the missing information, and asked a person to check whether the mismatch actually showed an overpayment.

Instead, Robodebt allowed the averaged figure to do too much work. The mismatch moved from being a warning sign to becoming the basis for a debt calculation.

That is the jump that should not have happened.

A mismatch is not a debt. An average is not proof. An estimate is not the same as verified income for a specific period.
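The flag-then-review step described above can be sketched in a few lines. This is an illustration under my own assumptions: `ReviewCase`, `match_records`, the status string, and the list of missing evidence are all hypothetical names, not part of any real system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReviewCase:
    person_id: str
    ato_annual_income: float
    reported_annual_income: float
    status: str = "needs_human_review"
    missing_evidence: list[str] = field(default_factory=list)
    # Matching never sets a debt. Only a later, reviewed decision could.
    debt_amount: Optional[float] = None


def match_records(person_id: str, ato_annual: float, reported_annual: float,
                  tolerance: float = 0.0) -> Optional[ReviewCase]:
    """Return a review case when the records differ, otherwise None."""
    if abs(ato_annual - reported_annual) <= tolerance:
        return None
    return ReviewCase(
        person_id=person_id,
        ato_annual_income=ato_annual,
        reported_annual_income=reported_annual,
        missing_evidence=[
            "income earned in each relevant fortnight",
            "employment start and end dates",
        ],
    )


case = match_records("P-001", ato_annual=13000.0, reported_annual=9000.0)
no_case = match_records("P-002", ato_annual=9000.0, reported_annual=9000.0)
print(case)
print(no_case)
```

The output of the match is a case with a list of what is still unknown. There is no code path here that turns the mismatch into an amount owed.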

This is where data collection and intake systems matter. If the system is going to support a serious decision, the fields, categories, review statuses, source links, and exception flags need to match the decision being made.

The problem was not simply that the system used data. The problem was that the system used the wrong level of data for the decision, and then treated the result as more certain than it was.

Humans made the decisions behind the system

It is too easy to talk about Robodebt as if a machine made the mistake.

The system mattered, but humans designed the process. Humans accepted the assumptions. Humans approved the use of income averaging. Humans removed or weakened checking steps. Humans responded to legal concerns, complaints, tribunal decisions, and public pressure. Humans decided whether to pause, defend, adjust, or continue the scheme.

That is why the human review question is central.

Justice Kyrou’s speech explains that the previous manual review process was different. Before Robodebt, data matching could identify a discrepancy, but a Departmental compliance officer would often conduct a manual review and seek further information from the person and their employer. Under Robodebt, that review did not take place. Compliance officers no longer engaged with recipients or employers in the same way before determining whether a debt existed.

That changed the nature of the system.

The process should have had a human review point between the automated flag and the debt notice. Someone should have been responsible for asking whether the data actually supported the decision. Someone should have checked whether the income pattern was regular or irregular. Someone should have checked whether the averaged figure was safe to use in that case.

This is not an argument against automation.

Automation can help identify problems. It can prepare review packs. It can flag missing information. It can route cases to the right team. It can help with calculations, summaries, and communication.

But automation should not remove the judgement step when the decision affects a person’s life.

A weak evidentiary method became harmful because it was allowed to move through the system as if it were a proven debt calculation.

The burden moved onto the person receiving the debt notice

One of the most serious parts of Robodebt was that people were expected to disprove debts generated from averaged income data.

That could mean finding old payslips, employer records, bank records, or other documents from years earlier. For some people, that would have been difficult or impossible. Some had casual jobs. Some had moved. Some may not have received letters at the right address. Some were dealing with financial stress, health issues, unstable housing, or other pressures.

This was not just bad luck for individuals. It was a design choice that assumed people could easily supply historical evidence on demand.

The Royal Commission recognised this human side of the system. Its recommendations included better design of policies and processes around the people they affect, including clearer communication, different ways to engage, and better attention to people who may struggle with digital or compliance-heavy processes.

The Deanna Amato case shows the problem clearly. Victoria Legal Aid explains that Centrelink averaged her income into fortnightly amounts, raised a debt based on averaged ATO income data, and applied a penalty. After legal proceedings were filed, Centrelink contacted her former employer and bank. It then accepted that the original debt amount was wrong.

That sequence is important.

The stronger evidence check happened after the person challenged the decision.

It should have happened before the debt was raised.

A fair process would require the agency to check whether its own claim is strong enough before making the person carry the burden of proving it wrong.

The warning signs should have changed the system

Robodebt did not fail because nobody could see any problem until the end.

Warnings appeared in several places. There were legal concerns, tribunal decisions, complaints, advocacy work, frontline experience, media scrutiny, and litigation.

Justice Kyrou’s speech notes that between 2016 and 2022, AAT Tier 1 decisions repeatedly questioned the legal basis for using income averaging as evidence of actual income, overpayment, or debt. He also explains that those decisions were not published, and that the Royal Commission recommended a system for publishing first-instance social security decisions involving significant legal conclusions or policy implications.

That tells us something about the wider system failure.

A serious workflow does not only process new cases. It also learns from the cases that go wrong.

If decisions are being challenged, overturned, questioned, or defended with difficulty, that is not background noise. It is feedback on the method. If frontline staff are seeing the same problems repeatedly, that should matter. If lawyers are raising concerns, that should matter. If review bodies are questioning the basis for decisions, that should matter.

There should have been clear points where the scheme could be paused, reviewed, and redesigned.

That is where human control matters most. Human review is not only one person checking one file. It is also the governance around the whole process: who sees the warnings, who has authority to act, who can stop the process, and who is accountable when the system keeps producing questionable results.

Robodebt needed better human review at the case level and at the system level.

The 2026 NACC update reinforces the governance failure

The later accountability process reinforces the same lesson.

In March 2026, the National Anti-Corruption Commission published its report into six individuals referred by the Royal Commission. The NACC found that two of the six engaged in serious corrupt conduct: Mark Withnell, for intentionally misleading Department of Social Services officers during the preparation of a 2015 Cabinet submission, and Serena Wilson, for intentionally misleading the Commonwealth Ombudsman during a 2017 investigation. The NACC found that the other four referred individuals did not engage in corrupt conduct.

The NACC report dealt with corrupt conduct under the NACC Act for six referred individuals. It did not re-decide the broader legality of Robodebt or replace the Royal Commission’s wider administrative findings.

That does not change the income-averaging problem. It adds another layer to it.

Robodebt was not only a weak data method. It was also a system where legal concerns, oversight, internal warnings, and review signals were not handled properly. A serious public-sector data system needs more than a working calculation. It needs documented assumptions, clear escalation routes, legal review, decision ownership, and a way to stop the process when the evidence no longer supports the output.

The cost shows why evidence checks matter

The financial consequences also show why this was not a small process issue.

In September 2025, the Commonwealth agreed to a proposed $548.5 million settlement in the Knox Robodebt class action appeal. The Federal Court notice of proposed settlement describes the package as $475 million in compensation for eligible group members, up to $13.5 million in legal costs, and up to $60 million in administration costs. The settlement still needs Federal Court approval, with a hearing listed for 22 June 2026.

The Attorney-General said that, if approved by the Court, the settlement would be the largest class action settlement in Australian history.

The exact total cost of Robodebt depends on what is counted. It can include refunds, debts reduced to zero, the original class action settlement, the later proposed Knox settlement, administration costs, and legal costs. On that broader basis, credible reporting has put the wider redress bill at more than $2.4 billion, but that figure should be treated as an aggregate estimate rather than one single settlement amount.

That is the practical warning.

Skipping evidence checks does not remove cost. It moves the cost somewhere else: complaints, appeals, legal challenges, compensation, system repair, public scrutiny, and harm to the people affected.

A review gate may feel slow inside a workflow. But when a system is making serious decisions, the absence of a review gate can become much more expensive.

How I would redesign the process

I would not start by building a nicer dashboard. I would start by asking what decision the system needs to support.

In this case, the decision was whether a person owed a debt for a specific welfare period. That decision needed period-specific income evidence. So the data structure should have been designed around that requirement.

The process should have separated the different types of data clearly.

Annual income data should have been treated as annual income data. It could flag a possible mismatch, but it could not prove actual income in each fortnight.

Centrelink reporting data should have been treated as period-specific welfare information.

Averaged income should have been marked as an estimate.

Employer records, payslips, bank records, and other documents should have been treated as possible verification evidence.

A debt notice should only have been produced after the record had enough evidence to support the claim.

That sounds simple, but it changes the whole workflow.

The system should have moved from data match to review case, not from data match to debt. Once a case was created, the database should have shown what information was known, what was missing, what had been inferred, what had been checked, what still needed human review, and who owned the decision.

A safer process would have looked like this:

Data match identifies a mismatch.

System creates a review case.

Case record separates annual, reported, averaged, inferred, disputed, and verified income.

System flags irregular income patterns and missing period-specific evidence.

Reviewer checks whether the averaged figure can safely be used.

Additional evidence is requested or checked where needed.

A debt decision is made only if the evidence can carry the claim.

The person affected receives a plain-language explanation of the calculation and review options.

Complaints, appeals, tribunal decisions, legal warnings, and frontline feedback feed back into the system design.
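The steps above can be sketched as a small set of allowed transitions with two gates. The stage names, the `advance` function, and the reviewer identifier are hypothetical and simplified. The point is only to show that a path from mismatch straight to debt can be made impossible by design.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    MISMATCH_FLAGGED = "mismatch_flagged"
    REVIEW_CASE_OPEN = "review_case_open"
    EVIDENCE_REQUESTED = "evidence_requested"
    HUMAN_REVIEWED = "human_reviewed"
    DEBT_DECIDED = "debt_decided"
    NO_DEBT = "no_debt"


# Each stage lists the only stages it may move to next.
ALLOWED = {
    Stage.MISMATCH_FLAGGED: {Stage.REVIEW_CASE_OPEN},
    Stage.REVIEW_CASE_OPEN: {Stage.EVIDENCE_REQUESTED, Stage.HUMAN_REVIEWED},
    Stage.EVIDENCE_REQUESTED: {Stage.HUMAN_REVIEWED},
    Stage.HUMAN_REVIEWED: {Stage.DEBT_DECIDED, Stage.NO_DEBT},
}


def advance(current: Stage, target: Stage, *, reviewer: Optional[str] = None,
            evidence_verified: bool = False) -> Stage:
    """Move a case forward only along an allowed, gated route."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"{current.value} cannot move to {target.value}")
    if target is Stage.HUMAN_REVIEWED and not reviewer:
        raise ValueError("a named reviewer is required")
    if target is Stage.DEBT_DECIDED and not evidence_verified:
        raise ValueError("period-specific evidence has not been verified")
    return target


stage = advance(Stage.MISMATCH_FLAGGED, Stage.REVIEW_CASE_OPEN)
stage = advance(stage, Stage.HUMAN_REVIEWED, reviewer="officer-17")
stage = advance(stage, Stage.DEBT_DECIDED, evidence_verified=True)
print(stage.value)
```

Calling `advance(Stage.MISMATCH_FLAGGED, Stage.DEBT_DECIDED)` raises a `ValueError`, because that route is not in the table. A real workflow would need far more than this, including the feedback loop from complaints and appeals, but the gating idea is the same.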

This is the difference between an automated flagging process and an automated decision process that can cause harm.

It is also the difference between a database that only stores records and a database that helps a team make the right decision. A good structure does not only capture fields. It controls what can move forward, what needs review, what is still uncertain, and what should not be used yet.

This is not separate from the official reform lesson. The Royal Commission’s recommendations point in the same direction: better documentation of assumptions and data sources, stronger governance of data-matching programmes, clearer review pathways for automated decisions, better escalation of significant legal issues, and systems for identifying important review decisions before the same error keeps repeating across cases.

The same principle applies in research, policy, and reporting work. A source register helps a team know where information came from. A source-linked evidence table helps show whether a claim is actually supported by the source material behind it. But the deeper point is that the structure should match the decision or output the team needs to produce.

Where AI could help if this were redesigned now

Robodebt ended before the current wave of AI tools became widely used, so it is natural to ask whether AI could help if a process like this were redesigned today.

The answer is yes, but only in the right parts of the workflow.

AI could help compare records, identify missing evidence, summarise case files for reviewers, extract income periods from payslips or employer documents, flag irregular income patterns, prepare plain-language explanations, and identify repeated complaint or appeal patterns. It could also help reviewers see where the data is incomplete or where the case depends too heavily on an assumption.

For example, AI could extract dates, employer names, pay periods, gross amounts, and net amounts from payslips or employer records into structured fields for a human reviewer. But the reviewer would still need to decide whether the evidence supports the debt claim.
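A rough sketch of what those structured fields might look like is below. It makes no call to any AI service. The `ExtractedPayslip` record stands in for whatever an extraction tool returns, and every field name and value is a hypothetical assumption of mine.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ExtractedPayslip:
    """Machine-extracted fields awaiting human confirmation (hypothetical)."""
    employer_name: Optional[str]
    pay_period_start: Optional[date]
    pay_period_end: Optional[date]
    gross_amount: Optional[float]
    net_amount: Optional[float]
    source_document: str
    confirmed_by: Optional[str] = None  # set only by a human reviewer


def missing_fields(slip: ExtractedPayslip) -> list[str]:
    """List extracted fields that came back empty."""
    return [name for name, value in vars(slip).items()
            if value is None and name != "confirmed_by"]


def usable_as_evidence(slip: ExtractedPayslip) -> bool:
    """Complete and confirmed by a person, not merely extracted."""
    return not missing_fields(slip) and slip.confirmed_by is not None


slip = ExtractedPayslip(
    employer_name="Example Orchard Pty Ltd",
    pay_period_start=date(2015, 9, 7),
    pay_period_end=date(2015, 9, 20),
    gross_amount=2600.0,
    net_amount=None,  # the tool could not read this value
    source_document="payslip_scan_014.pdf",
)
print(missing_fields(slip))
print(usable_as_evidence(slip))
```

The gap is surfaced to the reviewer instead of being filled with a guess, and extraction alone never makes the record usable. Whether the confirmed record supports a debt is still a human judgement.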

AI could be useful in the review layer.

It should not become the decision-maker.

It should not decide that a person owes money. It should not turn an averaged estimate into proof. It should not hide uncertainty. It should not replace legal review, policy judgement, or human responsibility.

The correct use of AI would be to help people review better, faster, and with clearer source material. It should make the uncertainty more visible, not less visible.

That fits the wider lesson from Robodebt. The problem was not that technology was used. The problem was that the system let a weak assumption carry a strong decision.

AI would only make that worse if the workflow stayed weak. This is why I keep coming back to the same point in AI-supported work: the source structure comes first. AI tools are much more useful when the underlying material is clean, current, labelled, and reviewable. When it is not, AI outputs become harder to trust. I have written separately about why AI gives weak answers when source material is messy and how to QA an AI knowledge base before a team starts using it.

Where my work fits

This is where Robodebt connects to my work.

I do not make legal decisions. I do not decide welfare entitlement. I do not replace public officials, researchers, lawyers, evaluators, or policy teams.

My work sits in the information route around serious decisions and outputs.

I help teams work backwards from the decision, report, recommendation, dashboard, or public output they need to produce. From there, we define what data is needed, what sources matter, what fields need to be captured, what needs human review, and how the final output should stay connected to the evidence behind it.

On my site, that route is covered through Data Collection & Intake Systems, Traceable Evidence Workflow Support, and Data Use, Reporting & Communication Systems.

In practical terms, this can mean building source registers, data dictionaries, evidence tables, claim trackers, review fields, QA checks, dashboards, reporting structures, AI review prompts, or handover notes. The point is not to add admin for its own sake. The point is to make sure the system captures the right data points for the decision, marks uncertainty clearly, and keeps human review in the workflow.

Robodebt is a large public-sector example, but the same pattern appears in smaller projects too.

A research team may collect interviews but fail to capture the fields needed for comparison later. A policy team may summarise public submissions without knowing which claims support which recommendations. A donor-funded programme may produce a dashboard that shows totals but not the evidence behind the totals. A team may use AI to summarise a document library without knowing which sources are complete, approved, current, or relevant to the question being asked.

In each case, the issue is not only whether the team has information.

The issue is whether the information has been structured for the decision, report, recommendation, or output it needs to support.

This is the layer I work on with teams: turning messy or mixed source material into reviewable evidence structures before reports, recommendations, dashboards, AI tools, or serious decisions are produced.

What this means for teams using data, AI or reporting systems

The Robodebt lesson is not “never automate”. It is not “never match data”. It is not “never use estimates”. It is also not “source traceability solves everything”.

The lesson is more specific.

Before a team uses data to support a serious output, it needs to ask whether the data can actually carry that output.

Is the figure actual, averaged, inferred, disputed, or verified? Does it cover the right time period? Does it show frequency, timing, and context? Is it being used for the same purpose it was collected for? Has anyone checked whether the assumption is safe? What happens when the person affected cannot respond? Who owns the decision? Where do warnings and appeals go?

Those questions are technical, but they are also human. They shape whether the system is fair, reviewable, and fit for the decision being made.

That is why I would not describe Robodebt only as a bad algorithm story. It was a bad information system, a bad review process, and a bad decision environment. The data problem mattered, but it became harmful because people let the wrong data points drive decisions with serious consequences.

The same question applies to a public-sector data system, a donor report, an AI knowledge base, a public consultation database, a programme dashboard, or a management reporting workflow.

Can this data carry the output we are asking it to carry?

The practical takeaway

Robodebt failed because averaged annual income was treated as proof of actual income in specific welfare periods.

That was the central evidence problem.

The system had data, but it was the wrong level of data for the decision. It needed period-specific income evidence, timing, frequency, review, and verification. Instead, it used a broad annual figure, turned it into an average, and treated that estimate as enough to raise debts.

A better process would have started with the decision and worked backwards. What evidence is needed to support this decision? Which data fields are required? What can be automated? What must be reviewed by a person? What happens when the data is incomplete? Where do warnings go? When should the process stop?

Those are the questions that should sit behind any serious data, AI, reporting, dashboard, public submission, or decision-support system.

The question is not only: do we have the data?

The better question is: does this data answer the decision we are asking it to support?

If not, the system needs a stronger route from information to evidence to output.

Need to check your own evidence workflow?

If your team is using spreadsheets, dashboards, AI tools, public submissions, research material, donor reports, or internal records to support serious outputs, the evidence layer matters.

Start by checking whether your findings, claims, summaries, recommendations, or decisions can still be traced back to the right source material, with the right data points, at the right level of detail.

You can use the Source Traceability Risk Checker, or read more about my Traceable Evidence Workflow Support if you are working with evidence-heavy material and need a clearer route from source material to review-ready outputs.
