In the image, two men stand in front of a large board on a wall, engrossed in the information displayed on it. One man wears a beige hoodie, while the other is dressed in a navy blue top. The board is covered with the panels of a research poster, and the two men appear to be students or researchers studying the material.

Text transcribed from the image:

Data Poisoning based Backdoor Attacks to Contrastive Learning
Jinghuai Zhang¹, Hongbin Liu², Jinyuan Jia³, Neil Zhenqiang Gong² (¹UCLA, ²Duke University, ³Penn State)
CVPR, June 17-21, 2024, Seattle, WA
Code available at https://github.com/jzhang538/CorruptEncoder

Overview

Data poisoning based backdoor attacks to contrastive learning (CL): An attacker embeds a backdoor into an encoder by injecting poisoned images into its pre-training dataset, so that a downstream classifier built based on the encoder predicts an attacker-chosen class (called the target class) for any image embedded with an attacker-chosen trigger.

Attacker's knowledge: The attacker can collect some reference images that include reference objects from the target class, as well as some unlabeled background images. The attacker does not manipulate the pre-training process.

Key idea: CL maximizes the feature similarity between two randomly cropped views of an image. If one view includes the reference object and the other includes the trigger, then maximizing their similarity would learn an encoder that produces similar features for the reference object and any trigger-embedded image. A downstream classifier built based on such an encoder would then predict the target class for any trigger-embedded image.

Limitation of existing attacks: the two randomly cropped views of a poisoned image rarely include the reference object and the trigger separately, so existing attacks fail to build a strong correlation between the trigger and the target class.

Figure 2. Comparing existing attacks ((a) existing attack: maximize feature similarity between views of a poisoned image) with our attack.

Our CorruptEncoder

Our approach: We embed the reference object and the trigger into a randomly picked background image. We theoretically analyze the optimal size of the background image, the optimal location of the reference object in the background image, and the optimal location of the trigger, which maximize the probability that two randomly cropped views of the poisoned image respectively include the reference object and the trigger.

Results of theoretical analysis: (1) The background image should be around twice the size of the reference object. (2) The reference object should be located at a corner of the background image. (3) The trigger should be located at the center of the remaining part of the background image, i.e., the part excluding the reference object.
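To make the placement rules concrete, the following is a minimal sketch of how such a poisoned image could be assembled with PIL. The function name and the left-right layout are illustrative assumptions; see the repository above for the authors' actual implementation.

```python
from PIL import Image

def make_poisoned_image(background: Image.Image,
                        reference_obj: Image.Image,
                        trigger: Image.Image) -> Image.Image:
    """Assemble a poisoned image following the three placement rules
    (left-right layout): the background is ~2x the reference object,
    the object sits at a corner, and the trigger is centered in the
    remaining part. Hypothetical helper, not the paper's code."""
    ow, oh = reference_obj.size
    # (1) Background around twice the size of the reference object:
    #     same height, double the width.
    bg = background.resize((2 * ow, oh))
    # (2) Reference object at a corner (top-left here).
    bg.paste(reference_obj, (0, 0))
    # (3) Trigger at the center of the remaining (right) half.
    tw, th = trigger.size
    bg.paste(trigger, (ow + (ow - tw) // 2, (oh - th) // 2))
    return bg
```

Injected into the pre-training dataset, such images make it likely that one random crop contains the reference object while the other contains the trigger, so the contrastive objective pulls their features together.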
Figure 4. Visual illustration of CorruptEncoder+, which additionally uses support poisoned images to pull reference objects and other objects of the target class close in the feature space, so that they are correctly classified by a downstream classifier.

Evaluation metrics: Clean accuracy (CA) and backdoored accuracy (BA) are the clean testing accuracy of a downstream classifier built based on a clean encoder and a backdoored encoder, respectively. Attack success rate (ASR) is the fraction of trigger-embedded testing images predicted as the corresponding target class by a downstream classifier built based on a backdoored encoder.

Experiments and Results

Dataset: We use a subset of 100 random classes of ImageNet as the pre-training dataset (ImageNet100-A). We consider four target downstream tasks: ImageNet100-A, ImageNet100-B, Pets, and Flowers. ImageNet100-B is a subset of another 100 random classes of ImageNet.

Table 1. ASRs (%) of different attacks. SSL-Backdoor [25] achieves low ASRs, which is consistent with their reported results.

Target Downstream Task | No Attack | SSL-Backdoor | CTRL | PoisonedEncoder | CorruptEncoder
ImageNet100-A | 10.4 | 5.5 | 28.8 | 76.7 | 96.2
ImageNet100-B | 0.4 | 14.3 | 20.5 | 53.2 | 89.9
Pets | 1.5 | 4.6 | | 45.8 | 72.1
Flowers | 1 | 18 | | 44.4 | 89

Table 2. ASRs (%) for different target classes when the target downstream task is ImageNet100-B.

Target Class | No Attack | SSL-Backdoor | CTRL | PoisonedEncoder | CorruptEncoder
Hunting Dog | 0.4 | 14.3 | 20.5 | 53.2 | 89.9
Ski Mask | 0.4 | 1.4 | 27.9 | 37.6 | 84.3
Rottweiler | 0.3 | 8 | 37.8 | 7.3 | 90.6
Komondor | 0 | 18.3 | 19.3 | 61 | 80

Figure 5. CorruptEncoder is agnostic to pre-training settings: (a) pre-training dataset size (130k, 260k, 300k); (b) encoder architecture (ResNet-18, ResNet-50, WRN-50-2); (c) CL algorithm (MoCo-v2, SwAV, SimCLR, MSF).

Empirical evaluation of the theoretical analysis: Recall that we cannot derive the analytical form of the optimal α (the ratio of the background size to the reference-object size) for the left-right layout, or of the optimal β for the bottom-top layout. However, via numerical analysis we found that α* ≈ 2 (or β* ≈ 2). Figure 6(a) shows the impact of α for the left-right layout (or β for the bottom-top layout) on the attack performance: the ASR peaks when α ≈ 2 (or β ≈ 2), which is consistent with our theoretical analysis.

Figure 6. Empirical evaluation of the theoretical analysis ((b) bottom-top layout; (e) trigger location).

Localized Cropping Defense

Localized cropping breaks the attacks by constraining the two cropped views to be close to each other. As a result, the two randomly cropped views will both include the reference object, both include the trigger, or include neither of them (a code sketch follows Table 4).

Table 4. Defense results (%). * indicates that an extra clean pre-training dataset is used.

Defense | No Attack CA / ASR | CorruptEncoder BA / ASR | CorruptEncoder+ BA / ASR
No Defense | 60.8 / 0.4 | 61.2 / 89.9 | 61.7 / 97.8
ContrastiveCrop | 61.3 / 0.4 | 62.1 / 50.3 | 62 / 98.3
No Other Data Augs | 44.2 / 0.3 | 44.7 / 69.3 | 44.2 / 25.7
No Random Cropping | 32.4 / 2.2 | 31.1 / 2 |
CompRess (5%)* | 49.5 / 0.9 | 49.4 / 0.9 |
CompRess (20%)* | 58.2 / 0.9 | 58.7 / 1 |
Localized Cropping | 56.2 / 0.9 | 56.3 / 0.9 | 56.1 / 0.8
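To illustrate the constraint, here is a minimal sketch of localized cropping in Python; the function name, the fixed square crop_size, and the max_offset neighborhood rule are assumptions for illustration rather than the paper's exact formulation.

```python
import random
from PIL import Image

def localized_crops(img: Image.Image, crop_size: int = 224,
                    max_offset: int = 32):
    """Return two random crops constrained to lie close to each other,
    so that both views tend to cover (or miss) the same local region.
    Hypothetical helper, not the paper's code."""
    w, h = img.size
    xmax, ymax = max(0, w - crop_size), max(0, h - crop_size)
    # First view: a standard random crop.
    x1, y1 = random.randint(0, xmax), random.randint(0, ymax)
    # Second view: restricted to a small neighborhood of the first, so the
    # two views include the object, the trigger, or neither of them together.
    x2 = min(max(0, x1 + random.randint(-max_offset, max_offset)), xmax)
    y2 = min(max(0, y1 + random.randint(-max_offset, max_offset)), ymax)
    view1 = img.crop((x1, y1, x1 + crop_size, y1 + crop_size))
    view2 = img.crop((x2, y2, x2 + crop_size, y2 + crop_size))
    return view1, view2
```

Because both views come from nearly the same region, a poisoned image can no longer yield one view containing the reference object and another containing the trigger, which is exactly the pairing CorruptEncoder relies on.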