Fortnightly Digest, 17 February 2025
The Paris AI Summit took place a week ago, and it put AI, along with its safety and security, firmly in the headlines.
Global AI governance remains deeply fragmented, as seen in the US and UK’s refusal to sign the Paris AI Summit’s declaration on inclusive and sustainable AI. While some nations push for regulatory oversight, others prioritise innovation with minimal restrictions, reflecting broader tensions in AI policy. The EU’s withdrawal of its proposed AI Liability Directive and the FTC’s crackdown on misleading AI claims highlight the regulatory uncertainty surrounding AI accountability.
The UK’s AI Safety Institute has now also been renamed the AI Security Institute, reflecting the growing importance of this topic (a move we are delighted to see, while trying very hard not to say “we told you so”).
Meanwhile, AI security threats are evolving rapidly, with adversaries exploiting weaknesses in machine learning models and supply chains. Malicious AI models on Hugging Face, NVIDIA container toolkit vulnerabilities, and AI-generated hallucinations in software development all reveal systemic risks. Attacks are becoming more sophisticated, with researchers uncovering new adversarial techniques like token smuggling and agentic AI-powered phishing. The inadequacy of existing AI security measures is further underscored by DEF CON’s criticism of AI red teaming and calls for a standardised AI vulnerability disclosure system akin to cybersecurity’s CVE framework.
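Many of the malicious models found on Hugging Face abuse Python’s pickle format, which can execute arbitrary code the moment a file is loaded. As a rough illustration of why loading untrusted weights is dangerous, here is a minimal sketch that statically flags the pickle opcodes capable of importing and calling arbitrary objects. It handles a raw pickle stream only (real scanners such as picklescan also unpack the zip archives that PyTorch checkpoints use), and the opcode list and file handling are our own simplification, not any specific reported payload:

```python
import pickletools
import sys

# Opcodes that let a pickle import and invoke arbitrary Python callables;
# their presence in a downloaded model file warrants manual review.
SUSPICIOUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "INST", "OBJ", "NEWOBJ", "REDUCE"}

def scan_pickle(path: str) -> list[str]:
    """Statically walk a pickle stream and report risky opcodes (never loads it)."""
    findings = []
    with open(path, "rb") as f:
        for opcode, arg, pos in pickletools.genops(f):
            if opcode.name in SUSPICIOUS_OPCODES:
                findings.append(f"offset {pos}: {opcode.name} {arg!r}")
    return findings

if __name__ == "__main__":
    results = scan_pickle(sys.argv[1])
    print("\n".join(results) if results else "no suspicious opcodes found")
```

The key design point is that inspection happens without deserialising anything, which is exactly what naive model loading fails to do.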
Despite these challenges, promising advancements in AI security research are emerging. Anthropic’s Constitutional Classifiers offer a structured approach to preventing universal jailbreaks, while FLAME proposes a shift towards output moderation for AI safety. New governance audits, like the metric-driven security analysis of AI standards, provide insight into regulatory gaps and the need for stronger technical controls.
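At a high level, both approaches wrap the model in classifiers: Constitutional Classifiers screen prompts and responses against a written “constitution” of rules, while FLAME concentrates the moderation on the output side. A minimal structural sketch of that guard pattern, with all names and the toy heuristics our own invention rather than either paper’s implementation, might look like this:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class GuardedModel:
    """Wraps a base model with input- and output-side safety classifiers."""
    generate: Callable[[str], str]          # the underlying model
    classify_input: Callable[[str], bool]   # True if the prompt looks harmful
    classify_output: Callable[[str], bool]  # True if the response looks harmful
    refusal: str = "Sorry, I can't help with that."

    def respond(self, prompt: str) -> str:
        if self.classify_input(prompt):      # input-side screen
            return self.refusal
        response = self.generate(prompt)
        if self.classify_output(response):   # output-side screen (the FLAME emphasis)
            return self.refusal
        return response

# Toy usage with stand-in heuristics; real systems use trained classifiers.
guard = GuardedModel(
    generate=lambda p: f"model output for: {p}",
    classify_input=lambda p: "build a weapon" in p.lower(),
    classify_output=lambda r: False,
)
print(guard.respond("hello"))
```

The appeal of this structure is that the guard classifiers can be retrained against newly discovered jailbreaks without touching the base model itself.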