Recent assessments indicate that many prominent artificial intelligence (AI) models may not meet the stringent requirements set forth by upcoming European Union (EU) regulations, particularly regarding cybersecurity resilience and the potential for discriminatory outputs. This raises significant concerns about the ability of major tech firms to comply with the new legislative landscape, particularly as the EU prepares to enforce its wide-ranging AI Act, which will come into effect in stages over the next two years.
The backdrop for this scrutiny lies in the rapid development and deployment of AI technologies, notably accelerated by the launch of OpenAI’s ChatGPT in late 2022. The phenomenal success and subsequent public discourse surrounding such models highlighted perceived existential risks, prompting EU lawmakers to act. As a result, specific regulations aimed at “general-purpose” AI (GPAI) were formulated to ensure that these technologies adhere to established safety and ethical guidelines.
A newly developed evaluation tool, created by Swiss startup LatticeFlow AI in collaboration with research institutes ETH Zurich and INSAIT in Bulgaria, has recently come to the forefront. This framework, welcomed by EU officials, assesses generative AI models from major tech players, including OpenAI and Meta, across numerous categories, including technical robustness and safety. The tool employs a scoring system that assigns values between 0 and 1 to the AI models based on their performance across these criteria.
In its latest report, LatticeFlow published a leaderboard showcasing various AI models, with scores exceeding 0.75 for models developed by Alibaba, Anthropic, OpenAI, Meta, and Mistral. However, a deeper examination by LatticeFlow’s “Large Language Model (LLM) Checker” revealed significant deficiencies in several areas, signaling where companies must concentrate their resources to ensure compliance with the AI Act. Non-compliance could result in fines of up to €35 million ($38 million) or 7% of a company’s global annual revenue, underscoring the high stakes involved.
Among the areas of concern identified, discriminatory output remains a critical challenge. Many generative AI models have exhibited biases in their responses, reflecting societal prejudices related to gender, race, and other categories. When evaluated for discriminatory output, OpenAI’s “GPT-3.5 Turbo” received a concerning score of 0.46, while Alibaba Cloud’s “Qwen1.5 72B Chat” scored even lower at 0.37. These findings indicate a pressing need for improvements in AI training data and algorithms to mitigate bias and ensure fair outputs.
Cybersecurity vulnerabilities also surfaced during testing, particularly concerning “prompt hijacking.” This type of cyberattack involves disguising a malicious prompt as a legitimate query to extract sensitive information from AI models. Meta’s “Llama 2 13B Chat” model received a score of 0.42 in this category, while Mistral’s “8x7B Instruct” model garnered a score of 0.38. Such weaknesses highlight the urgent necessity for enhanced security protocols within AI frameworks to protect users and organizations from potential threats.
In contrast, Anthropic’s “Claude 3 Opus” achieved the highest average score at 0.89, showcasing that some models are already better aligned with the EU’s regulatory expectations. This performance suggests that while there are areas of concern, opportunities exist for companies to learn from each other and improve their offerings.
Petar Tsankov, CEO and cofounder of LatticeFlow, emphasized the positive implications of these assessments, stating, “The EU is still working out all the compliance benchmarks, but we can already see some gaps in the models. With a greater focus on optimizing for compliance, we believe model providers can be well-prepared to meet regulatory requirements.” His comments reflect a broader sentiment that proactive adjustments can lead to improved outcomes as companies adapt to new regulations.
While Meta and Mistral opted not to comment on the findings, and responses from Alibaba, Anthropic, and OpenAI were not immediately forthcoming, the broader industry must grapple with the implications of these evaluations. As the EU continues to refine the compliance benchmarks for generative AI tools, it remains imperative for tech companies to engage with regulatory bodies and stakeholders to ensure alignment with evolving standards.
Although the European Commission does not verify external evaluation tools, it has been informed about the development of the LLM Checker and has characterized it as a “first step” toward implementing the new legal framework. A spokesperson stated, “The Commission welcomes this study and AI model evaluation platform as a first step in translating the EU AI Act into technical requirements.”
As the landscape of AI regulation continues to evolve, it is clear that compliance will be a major focus for companies in the sector. With the EU’s commitment to ensuring that AI technologies adhere to strict ethical and safety standards, the onus is now on tech firms to adapt their models accordingly. Failure to do so not only jeopardizes their standing in one of the world’s most important markets but also poses broader risks to society as a whole. By addressing these compliance challenges head-on, AI developers can foster trust, enhance user safety, and contribute to a more responsible and ethical deployment of artificial intelligence technologies.
(Adapted from Reuters.com)
Categories: Economy & Finance, Regulations & Legal, Strategy
Leave a comment