Synopsis
In recent years, large-scale AI models such as ChatGPT have achieved remarkable success, prompting numerous companies and organizations to harness AI for their own products. One of the most common approaches is to offer AI capabilities through public APIs or to integrate AI models into part of a service.
However, this deployment model inevitably exposes AI models to the public, giving adversaries opportunities to steal them, produce knockoffs, and run pirated API services for profit. The massive investment required to build and deploy large-scale AI models makes model stealing an extremely lucrative criminal activity. Worse still, model theft can severely damage the owner's intellectual property and create privacy leakage risks.
Inspired by the digital watermarking used to protect multimedia content, model watermarking has been proposed as a copyright identifier for determining whether a suspect model is a knockoff. In this session, we will give attendees a practical understanding of AI model watermarking, the taxonomy behind it, and how it is deployed in production systems.
The session will begin by establishing a baseline understanding of AI model mechanics, focusing on the concepts needed to understand model stealing. We will give a brief, diagram-driven explanation of AI model architecture, training data reconstruction, model inversion, and adversarial training. A live demonstration of model stealing will follow to illustrate its severity (a toy example for demonstration purposes only).
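To give a flavor of what the stealing demonstration covers, the sketch below shows the core idea of model extraction under simplifying assumptions: a surrogate model is trained purely on query/response pairs harvested from a victim model's public API. The victim model, the query_api helper, the data sizes, and the training schedule are all hypothetical stand-ins, not the demo itself.

```python
# Minimal model-extraction sketch (toy example, illustrative only).
# The victim is assumed to expose only a prediction API returning soft labels;
# the attacker trains a surrogate purely on query/response pairs.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical "victim" model sitting behind an API the attacker cannot inspect.
victim = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))

def query_api(x):
    """Stand-in for the victim's public endpoint: returns only output probabilities."""
    with torch.no_grad():
        return torch.softmax(victim(x), dim=1)

# Attacker: sample inputs from an assumed distribution and harvest responses.
queries = torch.randn(5000, 20)
responses = query_api(queries)

# Train a surrogate ("knockoff") to imitate the victim's input/output behavior.
surrogate = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
loss_fn = nn.KLDivLoss(reduction="batchmean")

for epoch in range(200):
    opt.zero_grad()
    log_probs = torch.log_softmax(surrogate(queries), dim=1)
    loss = loss_fn(log_probs, responses)  # match the victim's soft labels
    loss.backward()
    opt.step()

# Agreement between victim and surrogate on fresh inputs.
test = torch.randn(1000, 20)
with torch.no_grad():
    agree = (query_api(test).argmax(1) == surrogate(test).argmax(1)).float().mean()
print(f"surrogate/victim agreement: {agree:.2%}")
```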
Next, we will dive into AI model watermarking, covering its basic concepts and techniques. Based on the embedding mechanism, we divide AI model watermarking into two categories: white-box and black-box watermarks. White-box watermarks embed secret patterns directly into the structure or parameters of the model; this approach offers higher robustness, capacity, and precision, but verification requires access to the suspect model's parameters. Black-box watermarking needs only the model's outputs on a trigger set, with no access to the model's internals: if the suspect model produces the predefined labels for the trigger set with high probability, it is identified as a copy of the protected model.
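The following sketch illustrates the black-box (trigger-set) idea described above under toy assumptions: the owner trains the model to memorize secret labels on a small trigger set, and a suspect model is flagged if it reproduces those labels with high probability. The data, model size, verify helper, and the 0.9 decision threshold are illustrative assumptions, not a production recipe.

```python
# Minimal black-box (trigger-set) watermarking sketch, illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Ordinary training data plus a small trigger set of out-of-distribution
# inputs with secret, predefined labels known only to the owner.
train_x, train_y = torch.randn(2000, 20), torch.randint(0, 3, (2000,))
trigger_x = torch.rand(50, 20) * 10 - 5    # unusual inputs
trigger_y = torch.randint(0, 3, (50,))     # owner-chosen labels

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Embed: train jointly on the task data and the trigger set so the model
# memorizes the secret input->label mapping alongside the real task.
x = torch.cat([train_x, trigger_x])
y = torch.cat([train_y, trigger_y])
for epoch in range(300):
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

def verify(suspect, trigger_x, trigger_y, threshold=0.9):
    """Black-box check: only the suspect's outputs on the trigger set are needed."""
    with torch.no_grad():
        match = (suspect(trigger_x).argmax(1) == trigger_y).float().mean().item()
    return match, match >= threshold

match_rate, is_copy = verify(model, trigger_x, trigger_y)
print(f"trigger-set match: {match_rate:.2%}, flagged as copy: {is_copy}")
```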
A second live demonstration will show model watermarking in action and illustrate its effectiveness against model stealing, breaking down the watermarking taxonomy along the way.
After the demonstration, we will explore the risks and challenges facing AI model watermarking in real-world scenarios, such as model modification and model extraction, and survey and analyze the commercial watermarking solutions currently available.
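To make the model-modification risk concrete, here is a minimal sketch, assuming PyTorch and a toy suspect model, of a magnitude-pruning removal attack an adversary might attempt; a watermark's robustness is judged by whether trigger-set verification still succeeds after such edits. The model shape and the 30% sparsity level are arbitrary choices for illustration.

```python
# Sketch of a simple removal attack: magnitude pruning of a suspect model's
# weights, which an adversary might try in order to weaken an embedded watermark.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
suspect = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))

# Zero out the 30% smallest-magnitude weights in each linear layer.
for module in suspect:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")   # make the pruning permanent

zeroed = sum((m.weight == 0).sum().item() for m in suspect if isinstance(m, nn.Linear))
total = sum(m.weight.numel() for m in suspect if isinstance(m, nn.Linear))
print(f"weights zeroed: {zeroed / total:.1%}")
# A robust watermark should still verify on the trigger set after edits like this.
```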
The final topic will close the session with an outlook on the current and future state of AI model watermarking in academia and industry, exploring emerging applications such as using model watermarking to label AI-generated content.
Securing AI systems demands a proactive approach. Let's move beyond detection and reactive defense: integrate AI model watermarking into your AI security strategy to address these vulnerabilities proactively. By anticipating threats, you can keep your defenses ahead of the curve.