Hi all,
I’ve been curious about how these two observations can coexist without contradiction. Adversarial learning research suggests that even small input or weight perturbations can significantly alter outputs. If perturbing some weights causes such issues, wouldn’t perturbing all weights during quantization be even more harmful?

I have a few theories:

Perhaps adversarial perturbations exist but are rare relative to the space of all possible perturbations, so a random perturbation like quantization is unlikely to be adversarial.

Maybe quantization does introduce errors, but they only affect a small subset of outputs, so the overall impact is minimal.

Random weight perturbations could be inherently less harmful in very large networks, where independent per-weight errors tend to average out.

Does anyone know of any studies that explain why quantization isn’t more damaging?

Imagine getting directions or advice from two people: one gives vague answers that are mostly somewhat accurate, while the other tells the truth 99% of the time but deliberately misleads you in critical situations. Quantization noise is like the first person; an adversarial attack is like the second. The key term here is "adversarial": it's easy to trick a model if you fully understand how it operates, whereas random noise has no such knowledge.

There's significant redundancy among the features. If one feature incurs a large error from quantization, others can compensate and keep the overall loss low. This is also why weights are typically fine-tuned after quantization: simply rounding the weights isn't enough; retraining lets the network settle back into a good solution and recover any lost performance.
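For a concrete picture of how small the raw quantization error actually is, here is a minimal sketch of uniform post-training quantization (the `quantize` helper and its parameters are my own illustration, not any particular library's API): each weight moves by at most half a quantization step.

```python
import random

def quantize(ws, bits=8):
    """Uniformly quantize a list of weights to 2**bits levels over their range."""
    lo, hi = min(ws), max(ws)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((w - lo) / step) * step for w in ws]

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]
q = quantize(weights, bits=8)

step = (max(weights) - min(weights)) / (2 ** 8 - 1)
max_err = max(abs(a - b) for a, b in zip(weights, q))
print(max_err <= step / 2 + 1e-12)  # prints True: rounding moves each weight by at most half a step
```

At 8 bits over a typical weight range this per-weight error is tiny compared to the weights themselves, which is exactly the small, unstructured noise that redundancy and fine-tuning can absorb.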

I agree with this reasoning. In a classification problem, you can imagine classes as (non-Euclidean) spheres in space, where points in the same class lie within the same sphere, and a class can occupy multiple separate spheres. If you take a point x near the boundary and apply a small perturbation ε, its neighbor x + ε might fall into a different class. So, as long as you stay away from the boundary, you're safe.
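That picture can be made concrete with a toy linear classifier (entirely my own illustration): a point near the decision boundary flips class under a tiny perturbation, while a point far from the boundary does not.

```python
# Toy linear classifier: class = sign(w . x). A point near the decision
# boundary flips class under a small perturbation; a point far away does not.
def classify(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

w = [1.0, 1.0]
near = [0.05, -0.04]   # margin 0.01: just inside class +1
far = [1.0, 1.0]       # margin 2.0: deep inside class +1
eps = [-0.02, 0.0]     # small perturbation

print(classify(w, near), classify(w, [a + b for a, b in zip(near, eps)]))  # 1 -1
print(classify(w, far), classify(w, [a + b for a, b in zip(far, eps)]))    # 1 1
```

The same perturbation is harmless or harmful depending only on how close the point sits to the boundary, which is the whole content of the sphere picture above.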

I agree with your first guess. I actually conducted a similar experiment for a paper last year. For a ResNet18 trained on CIFAR10, adding random perturbations with a magnitude of 0.1 to images didn’t change any model predictions. Even with perturbations up to a magnitude of 1.0, 96.5% of the predictions remained the same. We observed similar results for MLPs trained on MNIST and FMNIST.

Of course, these are perturbations in the input space rather than the weight space, but my intuition is that we’d see similar outcomes with random weight perturbations.
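A cheap way to sanity-check that intuition without a real network (this is a toy linear model of my own, not the ResNet setup above): perturb the weights of a random linear classifier and measure how many predictions change.

```python
import random

random.seed(0)

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

dim = 20
w = [random.gauss(0, 1) for _ in range(dim)]
xs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(1000)]
base = [predict(w, x) for x in xs]

sigma = 0.05  # small random noise added independently to every weight
w_noisy = [wi + random.gauss(0, sigma) for wi in w]
agree = sum(b == predict(w_noisy, x) for b, x in zip(base, xs)) / len(xs)
print(agree)  # fraction of predictions unchanged; stays high for small sigma
```

Only the few points whose margin is smaller than the (slight) rotation of the decision boundary change class, mirroring the high agreement rates you saw in input space.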

The fact that non-trivial adversarial examples must be computed through many steps of gradient descent suggests that, under random sampling, they occur with effectively zero probability. Likewise, trivial adversarial examples, such as setting a pixel value to infinity, have probability zero under any reasonable input distribution. In contrast, quantization merely adds small random noise to the parameters, and since training already relies on stochastic gradient descent, the network is robust to that kind of noise; the extra perturbation doesn't significantly affect the outcome.
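The gap between gradient-directed and random perturbations is easy to demonstrate on a toy linear model (again my own illustration, not taken from any paper): a perturbation of fixed norm aimed along the gradient flips far more predictions than a random perturbation of the same norm.

```python
import math
import random

random.seed(1)

def predict(w, x):
    return 1 if sum(a * b for a, b in zip(w, x)) >= 0 else -1

dim = 20
w = [random.gauss(0, 1) for _ in range(dim)]
xs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(500)]
wn = math.sqrt(sum(v * v for v in w))
eps = 1.0  # perturbation norm in input space

def flip_rate(perturb):
    return sum(predict(w, x) != predict(w, perturb(x)) for x in xs) / len(xs)

def adv_perturb(x):
    # "Gradient" direction for a linear classifier: step against the margin, along -y * w.
    y = predict(w, x)
    return [xi - y * eps * wi / wn for wi, xi in zip(w, x)]

def rand_perturb(x):
    # A random direction scaled to the same norm eps.
    d = [random.gauss(0, 1) for _ in range(dim)]
    n = math.sqrt(sum(v * v for v in d))
    return [xi + eps * di / n for xi, di in zip(x, d)]

adv, rnd = flip_rate(adv_perturb), flip_rate(rand_perturb)
print(adv, rnd)  # the gradient-aimed perturbation flips many more predictions
```

The adversarial direction spends its entire norm budget shrinking the margin, while a random direction in high dimensions is nearly orthogonal to the gradient and wastes almost all of it, which is why random quantization noise is so much more benign than an attack.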