Evaluating the Robustness of Neural Networks: An Extreme Value Theory Approach

Tsui-Wei Weng*, Huan Zhang*, Pin-Yu Chen, Jinfeng Yi, Dong Su, Yupeng Gao, Cho-Jui Hsieh, Luca Daniel

15 Feb 2018 (modified: 22 Jun 2025) · ICLR 2018 Conference Blind Submission · Readers: Everyone

Abstract: The robustness of neural networks to adversarial examples has received great attention due to its security implications. Despite various attack approaches for crafting visually imperceptible adversarial examples, little has been developed towards a comprehensive measure of robustness. In this paper, we provide theoretical justification for converting robustness analysis into a local Lipschitz constant estimation problem, and propose to use Extreme Value Theory for efficient evaluation. Our analysis yields a novel robustness metric called CLEVER, which is short for Cross Lipschitz Extreme Value for nEtwork Robustness. The proposed CLEVER score is attack-agnostic and is computationally feasible for large neural networks. Experimental results on various networks, including ResNet, Inception-v3 and MobileNet, show that (i) CLEVER is aligned with the robustness indication measured by the $\ell_2$ and $\ell_\infty$ norms of adversarial examples from powerful attacks, and (ii) networks defended with defensive distillation or bounded ReLU indeed give better CLEVER scores.
To the best of our knowledge, CLEVER is the first attack-independent robustness metric that can be applied to any neural network classifier.

TL;DR: We propose the first attack-independent robustness metric, a.k.a. CLEVER, that can be applied to any neural network classifier.

Keywords: robustness, adversarial machine learning, neural network, extreme value theory, adversarial example, adversarial perturbation

Code: [huanzhang12/CLEVER](https://github.com/huanzhang12/CLEVER)

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/evaluating-the-robustness-of-neural-networks/code)

13 Replies
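The abstract's core idea — estimating a local Lipschitz constant via Extreme Value Theory — can be illustrated with a minimal sketch: sample gradient norms of the classification margin in a ball around the input, fit a reverse Weibull distribution to the per-batch maxima, and take its location parameter as the Lipschitz estimate. The toy margin function `g`, the numerical gradient, and all parameter values below are illustrative assumptions for exposition, not the paper's networks or settings:

```python
import numpy as np
from scipy.stats import weibull_max

def g(x):
    # Hypothetical stand-in for the margin f_c(x) - f_j(x) of a classifier;
    # in practice this comes from the network's logits.
    return 1.0 + 0.5 * np.sin(x).sum()

def grad_g(x, eps=1e-5):
    # Central-difference numerical gradient of g (a real network would
    # use backpropagation instead).
    grad = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x)
        d[i] = eps
        grad[i] = (g(x + d) - g(x - d)) / (2 * eps)
    return grad

def clever_score(x0, radius=0.5, n_batches=50, batch_size=100, seed=0):
    rng = np.random.default_rng(seed)
    dim = len(x0)
    batch_maxima = []
    for _ in range(n_batches):
        # Sample points uniformly in an l2 ball of the given radius around x0.
        dirs = rng.normal(size=(batch_size, dim))
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
        radii = radius * rng.uniform(size=(batch_size, 1)) ** (1.0 / dim)
        samples = x0 + radii * dirs
        # Record the largest gradient norm observed in this batch.
        batch_maxima.append(max(np.linalg.norm(grad_g(s)) for s in samples))
    # By extreme value theory, batch maxima of the bounded gradient norms
    # follow a reverse Weibull law; its location parameter (the right
    # endpoint of the support) estimates the local Lipschitz constant.
    shape, loc, scale = weibull_max.fit(batch_maxima)
    lipschitz = loc
    # The certified-style score: margin over Lipschitz estimate, capped
    # by the sampling radius.
    return min(g(x0) / lipschitz, radius)

score = clever_score(np.zeros(3))
```

Larger scores suggest a larger perturbation is needed to flip the prediction; note the estimate depends on the sampling radius and batch sizes, which act as hyperparameters of the evaluation.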