This page describes an ongoing safety evaluation research project being carried out by Good Ancestors. The evaluation tests AI systems' ability to discover vulnerabilities in legal systems.
Consider a hypothetical scenario. A small group uses a jailbroken AI assistant to plan the acquisition and deployment of a biological agent. In addition to logistics and operational security, the system recommends identifying vulnerabilities in the legal frameworks most likely to be used against the group.
Over several months, the group uses AI-assisted legal research to scan Australian federal legislation for structural defects: drafting errors, appointment chain gaps, inconsistencies between overlapping statutes. Several are identified and documented.
When the operation is ready, the group deploys its findings through third parties. A constitutional challenge to intercept warrant appointments is filed by an advocacy organisation. A legal academic is tipped off about a procedural defect in detention powers. A third challenge to surveillance evidence admissibility is raised in an unrelated criminal trial thanks to an anonymous tip. Each argument has genuine legal merit. Each is filed in good faith by the party bringing it. None of the parties are aware of the others or of the group’s involvement.
The effects compound. Within days, agencies face simultaneous uncertainty across warrants, detention powers and evidence admissibility. Legal teams advise pausing operations. The group’s window of opportunity opens.
This is not purely speculative. In 2022, approximately 1,000 Victoria Police officers were found not to have been properly sworn in, casting doubt on the validity of their official actions. The defect had gone undetected for years. What this scenario adds is the use of AI to find such vulnerabilities efficiently, and the strategic coordination to deploy them simultaneously through lawful channels.
AI systems excel at finding bugs in complex rule systems—from exploiting video game mechanics to discovering software vulnerabilities.
Legal systems are also complex rule systems, but unlike code, they were not built with the benefit of automated fault-finding tools.
Legal Zero-Days could be weaponised to enable catastrophic outcomes.
If frontier AI develops the capability to systematically discover these vulnerabilities, the implications for AI governance are profound.
The discovery of how a constitutional provision interacts with foreign citizenship laws paralysed Australia's government for months.
July 2017: A barrister notices that a senator is a dual citizen; Section 44(i) of the Australian Constitution disqualifies dual citizens from sitting in Federal Parliament.

Aug–Oct 2017: Multiple MPs from various parties are found to hold dual citizenship. Uncertainty grows around Parliament's legal ability to function.

Oct–Nov 2017: The High Court confirms these individuals, including the Deputy Prime Minister, are disqualified from serving.

2018: The government is paralysed, the ruling party loses its majority, and a series of costly by-elections follows. Legal costs alone reach $11.6 million.
One legal vulnerability lay hidden in Australia’s Constitution for 116 years before causing 18 months of government disruption. It’s conceivable that AIs could find many more with minimal resources, enabling system-clogging lawfare.
- Nations could use AI to audit their own legal systems.
- This evaluation could inform AI risk tolerance decisions.
- Vulnerabilities could be addressed through statute law revision before exploitation.
The capability has positive externalities: a more robust legal system benefits everyone.
We’ve developed a novel evaluation methodology: “legal puzzles” that simulate real Legal Zero-Days.
This work was developed by Good Ancestors with funding and support from the UK AI Security Institute through their Evaluation Bounty Programme.
Research & Development: Greg Sadler and Nathan Sherburn (Good Ancestors)
Original Concept: Adhiraj Nijjer
Legal Puzzle Development: A network of lawyers across multiple jurisdictions contributed their expertise to develop the evaluation puzzles.