This page describes an on-going safety evaluation research project being carried out by Good Ancestors. The evaluation tests AI systems' ability to discover vulnerabilities in legal systems.
A group uses a jailbroken AI assistant to help plan the acquisition and deployment of a bioweapon. In addition to guidance on building the bioweapon, logistics, and operational security, the group asks the AI assistant to find vulnerabilities in the laws and regulations that could stand in its way.
Over several months, the group uses the AI assistant to scan Australian federal and state legislation and regulations for defects: drafting errors, gaps, lapsed time periods, incorrectly made regulations, inconsistencies, incorrect cross-references between overlapping statutes, unresolved conflicts with common law, and other grounds for challenge. Several are identified and documented.
Once the group is ready to build and release its biological agent, it deploys the AI’s findings. An advocacy group brings a constitutional challenge to relevant police surveillance powers. A legal academic is tipped off about a procedural defect in biosecurity inspection powers. A privacy group questions the legality of information sharing for background checks on people ordering synthetic DNA. Thanks to an anonymous tip, the admissibility of surveillance evidence is challenged in an unrelated criminal trial. Each argument has genuine legal merit. Each is filed in good faith by the party bringing it. None of the parties are aware of the others or of the group’s involvement.
The effects compound. Within days, agencies face simultaneous uncertainty across warrants, detention powers and evidence admissibility. Legal teams advise pausing operations. The group’s window of opportunity opens.
This is not purely speculative. In 2022, approximately 1,000 Victoria Police officers were found not to have been properly sworn in, casting doubt on the validity of their official actions. The defect went undetected for years. What is new in this scenario is the use of AI to find such vulnerabilities efficiently, and the strategic coordination to deploy them simultaneously through lawful channels.
AI systems excel at finding bugs in complex rule systems, from exploiting video game mechanics to discovering software vulnerabilities.
Legal systems are also complex rule systems, but unlike code, they were not drafted with the benefit of automated fault-finding tools.
Legal Zero-Days could be weaponised to enable catastrophic outcomes.
If frontier AI develops the capability to systematically discover these vulnerabilities, the implications for AI governance are profound.
The discovery of how a constitutional provision interacted with foreign citizenship laws paralysed Australia’s government for months.
July 2017
A barrister notices that a senator is a dual citizen, and that Section 44(i) of the Australian Constitution disqualifies dual citizens from sitting in Federal Parliament.
Aug–Oct 2017
Multiple MPs from various parties are found to hold dual citizenship. Uncertainty grows over Parliament’s legal ability to function.
Oct–Nov 2017
The High Court confirms these individuals, including the Deputy Prime Minister, are disqualified from serving.
2018
Government paralysed. The ruling party loses its majority. A series of costly by-elections follows. Legal costs alone: $11.6 million.
One legal vulnerability lay hidden in Australia’s Constitution for 116 years before causing 18 months of government disruption. It is conceivable that AIs could find many more with minimal resources, enabling system-clogging lawfare.
Nations could use AI to audit their own legal systems
This evaluation could inform AI risk tolerance decisions
Vulnerabilities could be addressed through statute law revision before exploitation
The capability has positive externalities: a more robust legal system benefits everyone.
We’ve developed a novel evaluation methodology: “legal puzzles” that simulate real Legal Zero-Days.
This work was developed by Good Ancestors with funding and support from the UK AI Security Institute through their Evaluation Bounty Programme.
Research & Development: Greg Sadler and Nathan Sherburn (Good Ancestors)
Original Concept: Adhiraj Nijjer
Legal Puzzle Development: A network of lawyers across multiple jurisdictions contributed their expertise to developing the evaluation puzzles.