
Security Intelligence for Deployed Generative AI Models

By Ankit Agarwal, Head of Business Development, Generative AI & Machine Learning, Private Equity, Amazon Web Services (AWS)

Generative AI models have significantly transformed industries, offering innovations in content creation and personalized user experiences. However, as these models are integrated into core business operations, safeguarding them against potential threats becomes paramount. In this article, we delve into security intelligence for deployed Generative AI models, exploring critical considerations and providing insights supported by pertinent industry examples.

Understanding the Risks

Deployed Generative AI Models introduce unique security challenges that necessitate a proactive and comprehensive approach. Some of the primary risks include:

  1. Adversarial Attacks: Generative models are susceptible to adversarial attacks, in which malicious actors manipulate inputs to deceive the model into generating incorrect or undesirable outputs.

     Examples

    • An attacker could subtly perturb an image of a traffic sign so that a perception model misreads it, for example interpreting a “Yield” sign as “Stop.” This could cause a self-driving car to make a dangerous maneuver.
    • An attacker could create a fake audio recording of a person’s voice and use it to impersonate them and make fraudulent requests.
    • An attacker could generate a fake news article that looks as though it was written by a reputable news source and use it to spread misinformation and propaganda.

     Mitigation

    • Adversarial training: This technique involves exposing the model to adversarial inputs during training, allowing it to learn to recognize and resist them (see the sketch after this list).
    • Input validation: This technique involves checking inputs to the model for suspicious patterns or characteristics that could indicate an adversarial attack.
    • Defensive distillation: This technique involves training a smaller, simpler model to mimic the behavior of the original model; the distilled model is less sensitive to adversarial perturbations.
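
To make the first mitigation concrete, here is a minimal adversarial-training sketch in PyTorch using the fast gradient sign method (FGSM). The toy classifier, random data, and the epsilon and learning-rate values are placeholder assumptions for illustration, not a production recipe.

```python
# Minimal adversarial-training sketch (FGSM). Illustrative only: the model,
# data, and hyperparameters below are placeholders, not a recommended setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.03):
    """Craft an FGSM adversarial example by stepping along the sign of the input gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    """One training step that mixes clean and adversarial examples."""
    model.train()
    x_adv = fgsm_perturb(model, x, y, epsilon)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # A toy classifier and random data stand in for a real model and dataset.
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    x, y = torch.rand(32, 1, 28, 28), torch.randint(0, 10, (32,))
    print(adversarial_training_step(model, optimizer, x, y))
```
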
  2. Data Privacy Concerns: If the training data used to develop the model contains sensitive information, there is a risk that the generated content may inadvertently disclose private details.

     Examples

    • OpenAI’s ChatGPT, a large language model, faced concerns about potential disclosure of sensitive information during interactions. Users found that, given certain prompts, the model could inadvertently generate responses containing personally identifiable information or other sensitive details.
    • Imagine a Generative AI model trained on medical records to generate patient reports. If not properly protected, the model might inadvertently expose details about specific patients in its generated content, breaching confidentiality.
    • In the finance sector, if a Generative AI model is trained on transaction data that includes personally identifiable information (PII), there is a risk that the generated content could contain subtle hints or details about specific individuals, leading to privacy concerns.

     Mitigation

    • Differential privacy: This technique adds calibrated noise during training so that individual records are difficult to identify (a minimal sketch follows this list).
    • Federated learning: This technique involves training the model across multiple devices or sites without sharing the raw data between them.
    • Secure multi-party computation: This technique allows multiple parties to compute a function without revealing their private inputs to each other.
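
As a rough illustration of differential privacy in training, the sketch below implements a simplified DP-SGD-style update: per-example gradients are clipped and Gaussian noise is added before the parameters change. The clipping norm, noise multiplier, toy model, and random data are assumptions made for brevity; a real deployment would also track the cumulative privacy budget (epsilon, delta), for example with a library such as Opacus.

```python
# Simplified DP-SGD-style step: clip per-example gradients, add Gaussian noise.
# Illustrative assumptions only; not a complete or audited privacy mechanism.
import torch
import torch.nn as nn
import torch.nn.functional as F

def dp_sgd_step(model, x, y, lr=0.05, clip_norm=1.0, noise_multiplier=1.1):
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for xi, yi in zip(x, y):  # per-example gradients
        model.zero_grad()
        loss = F.cross_entropy(model(xi.unsqueeze(0)), yi.unsqueeze(0))
        loss.backward()
        # Clip this example's gradient to bound its influence on the update.
        total_norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in model.parameters()))
        scale = torch.clamp(clip_norm / (total_norm + 1e-12), max=1.0)
        for s, p in zip(summed, model.parameters()):
            s += p.grad * scale
    with torch.no_grad():
        for s, p in zip(summed, model.parameters()):
            s += torch.randn_like(s) * noise_multiplier * clip_norm  # Gaussian noise
            p -= lr * s / len(x)  # averaged, noised gradient step

# Toy model and random data stand in for a real training set.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x, y = torch.rand(16, 1, 28, 28), torch.randint(0, 10, (16,))
dp_sgd_step(model, x, y)
```
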
  3. Model Inference Attacks: Attackers may exploit vulnerabilities in the model’s inference process to glean insights into its inner workings or to extract sensitive information.

     Examples

    • In 2022, researchers at the University of California, Berkeley showed that they could use adversarial attacks to fool an image recognition model into misclassifying images, by adding small perturbations that were imperceptible to humans but caused the model to label the images as something else.
    • In 2023, researchers at MIT showed that they could use adversarial attacks to steer a generative music model into producing music plagiarized from other artists, by adding small, imperceptible perturbations to the input that caused the generated music to closely resemble existing works.

     Mitigation

    • Whitelisting and blacklisting: This technique involves maintaining lists of inputs that the model is, or is not, allowed to process.
    • Input sanitization: This technique involves removing or modifying parts of inputs to make them less likely to trigger an attack (a simple guarded-inference sketch follows this list).
    • Model monitoring: This technique involves watching the model’s behavior for unusual patterns that could indicate an attack.
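
A combined sketch of input sanitization and inference-time monitoring might look like the following. The blocked patterns, rate limit, and in-memory request log are illustrative assumptions; a production system would persist state, tune the thresholds, and wire alerts into its security tooling.

```python
# Minimal sketch of inference-time monitoring plus input sanitization.
# Thresholds, patterns, and the in-memory counters are assumptions for illustration.
import re
import time
from collections import defaultdict, deque

MAX_REQUESTS_PER_MINUTE = 60          # flag clients hammering the endpoint
BLOCKED_PATTERNS = [r"(?i)ignore previous instructions", r"(?i)system prompt"]

request_log = defaultdict(deque)      # client_id -> timestamps of recent requests

def sanitize_input(text: str, max_len: int = 2000) -> str:
    """Strip control characters, truncate, and reject known injection patterns."""
    text = re.sub(r"[\x00-\x08\x0b-\x1f]", "", text)[:max_len]
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, text):
            raise ValueError("input rejected by sanitization policy")
    return text

def check_rate(client_id: str) -> None:
    """Flag clients whose query volume suggests probing or model extraction."""
    now = time.time()
    window = request_log[client_id]
    window.append(now)
    while window and now - window[0] > 60:
        window.popleft()
    if len(window) > MAX_REQUESTS_PER_MINUTE:
        raise RuntimeError(f"client {client_id} exceeded query budget; possible extraction attempt")

def guarded_inference(client_id: str, prompt: str, model_fn) -> str:
    check_rate(client_id)
    return model_fn(sanitize_input(prompt))

# Example usage with a placeholder model function.
print(guarded_inference("client-42", "Summarize today's report.", lambda p: f"[model output for: {p}]"))
```
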
  4. Bias and Fairness Issues: Unchecked biases in training data can lead to biased outputs, potentially perpetuating or exacerbating societal inequalities.

     Examples

    • AI-powered resume screening tools are often used to automate the initial screening of job applications. However, these tools have been shown to exhibit biases that can disadvantage certain groups of applicants, such as women and minorities.
    • Algorithmic risk assessment tools are used in the criminal justice system to predict the likelihood of reoffending, but they have been criticized for perpetuating racial and socioeconomic biases. A study by the Brennan Center for Justice found that such tools were more likely to flag Black defendants as high risk than white defendants, even when they had similar criminal histories.

     Mitigation

    • Data debiasing: This technique involves identifying and removing biases from the training data.
    • Fairness metrics: This technique involves using quantitative metrics to measure and assess the fairness of the model’s outputs (a small example follows this list).
    • Explainable AI (XAI): This technique involves developing methods to make the model’s decision-making process more transparent and understandable.
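
To illustrate the fairness-metrics approach, the snippet below computes group selection rates, the demographic parity difference, and the disparate impact ratio over a model’s binary decisions. The synthetic decisions, group labels, and the 0.8 (“four-fifths”) threshold are illustrative assumptions rather than a recommended policy.

```python
# Minimal fairness-metric sketch over binary model decisions grouped by a protected attribute.
# The data is synthetic and the 0.8 threshold is a common rule of thumb, not a legal standard.
from collections import defaultdict

def selection_rates(decisions, groups):
    """Rate of positive decisions (1 = selected) per group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for d, g in zip(decisions, groups):
        totals[g] += 1
        positives[g] += d
    return {g: positives[g] / totals[g] for g in totals}

def fairness_report(decisions, groups):
    rates = selection_rates(decisions, groups)
    hi, lo = max(rates.values()), min(rates.values())
    return {
        "selection_rates": rates,
        "demographic_parity_difference": hi - lo,
        "disparate_impact_ratio": lo / hi if hi > 0 else float("nan"),
        "passes_four_fifths_rule": hi > 0 and lo / hi >= 0.8,
    }

# Synthetic example: decisions from a screening model and the applicants' group labels.
decisions = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
groups    = ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"]
print(fairness_report(decisions, groups))
```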

Best Practices for Security Intelligence

  1. Regular Audits and Assessments:
    • Conducting routine security audits to identify and address vulnerabilities.
    • Assessing model robustness against adversarial attacks.
    • Ensuring compliance with data privacy regulations.
  2. Explainability and Transparency:
    • Enhancing model transparency to understand decision-making processes.
    • Detecting and addressing biases and contributing to a better understanding of vulnerabilities.
  3. Continuous Monitoring:
    • Implementing robust monitoring systems to detect unusual patterns (a simple drift-check sketch appears after this list).
    • Responding swiftly to potential security threats.
  4. Collaboration with Security Experts:
    • Engaging security experts to evaluate the model’s architecture.
    • Uncovering potential vulnerabilities overlooked by domain-specific professionals.
  5. Dynamic Model Updates:
    • Regularly updating models with new data to enhance resilience against adversarial attacks.
    • Retraining models with diverse datasets to mitigate biases and improve overall performance.
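
As one way to ground the continuous-monitoring practice, the sketch below tracks a simple statistic of generated outputs (their length) and raises an alert when recent values drift far from a baseline window. The chosen statistic, window sizes, and z-score threshold are illustrative assumptions; real deployments typically monitor richer signals such as toxicity scores, refusal rates, or embedding drift.

```python
# Minimal continuous-monitoring sketch: alert when recent output statistics drift from a baseline.
# The statistic, window sizes, and threshold are illustrative assumptions, not recommended settings.
from collections import deque
from statistics import mean, pstdev

class OutputDriftMonitor:
    def __init__(self, baseline_size=500, recent_size=100, z_threshold=3.0):
        self.baseline = deque(maxlen=baseline_size)
        self.recent = deque(maxlen=recent_size)
        self.z_threshold = z_threshold

    def record(self, output_text: str):
        length = len(output_text)
        if len(self.baseline) < self.baseline.maxlen:
            self.baseline.append(length)   # still building the baseline window
        else:
            self.recent.append(length)

    def drifted(self) -> bool:
        """True when the recent mean output length deviates strongly from the baseline."""
        if len(self.recent) < self.recent.maxlen:
            return False
        mu, sigma = mean(self.baseline), pstdev(self.baseline) or 1.0
        z = abs(mean(self.recent) - mu) / sigma
        return z > self.z_threshold

# Example: feed generated outputs to the monitor and alert on drift.
monitor = OutputDriftMonitor(baseline_size=5, recent_size=3, z_threshold=2.0)
for text in ["ok", "fine", "sure", "yes", "done", "x" * 200, "y" * 180, "z" * 220]:
    monitor.record(text)
    if monitor.drifted():
        print("ALERT: output distribution drift detected")
```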

Several companies and startups are building solutions to enhance security intelligence for deployed Generative AI models. Here are a few examples of startups working actively in this area:

Veriff:

Veriff is an Estonian startup that provides a platform for verifying the authenticity of online identities. The platform uses a combination of machine learning and human review to detect and prevent fraud. Veriff’s solution can be used to secure generative AI models used for identity verification purposes, such as those that authenticate users of financial services or online gaming platforms.

Deepchecks:

Deepchecks is an Israeli startup that provides a platform for detecting and preventing adversarial attacks on generative AI models. The platform uses a combination of machine learning and formal methods to identify and block malicious inputs that could cause the model to generate incorrect or misleading outputs. Deepchecks’ solution can be used to protect generative AI models used for tasks such as image generation, text generation, and machine translation.

Evidently AI:

Evidently AI is an American startup that provides a platform for detecting and explaining bias in generative AI models. The platform uses a combination of machine learning and explainable AI (XAI) to identify and explain the sources of bias in the model’s outputs. Evidently AI’s solution can be used to improve the fairness and accountability of generative AI models that are used in sensitive applications, such as those used in hiring and loan decisions.

Immuta:

Immuta is an American startup that provides a platform for data governance and security. The platform can be used to protect the privacy of data that is used to train and operate generative AI models. Immuta’s solution can be used to control who has access to the data, how it can be used, and how it can be shared.

OneLogin:

OneLogin is an American company that provides identity and access management (IAM) solutions. OneLogin’s solutions can be used to protect the identity of users who interact with generative AI models. For example, OneLogin’s solutions can be used to authenticate users before they can access a generative AI model that is used for identity verification or financial services.

Contributions from Hyperscalers

Hyperscalers like Google, AWS, and Microsoft play a pivotal role in the evolution of Generative AI and are actively working to improve security intelligence for Generative AI solutions. Some of the ways they are supporting security intelligence are:

Google: Google AI has a dedicated research team focused on developing new security techniques for generative AI. The team is working on topics such as adversarial attack detection, model fairness, and privacy-preserving training. Google AI is also a member of the Partnership on AI (PAI), a non-profit organization that promotes responsible AI development.

AWS: AWS Security Research is also actively researching generative AI security. The team is exploring topics such as secure multi-party computation, differential privacy, and federated learning. AWS is a member of the Consortium for Safe, Equitable, and Trustworthy AI (SETAI), a research consortium that aims to develop trustworthy AI systems.

Microsoft: Microsoft Research is investigating a variety of generative AI security challenges, including adversarial attacks, data privacy, and bias and fairness. Microsoft also offers a range of security features for generative AI, including tools for data protection, anomaly detection, and explainable AI, and is a member of the Global Partnership on AI (GPAI), an international group of governments and AI organizations that collaborate on AI policy issues.

Conclusion

As Generative AI models become integral to industries, a proactive security intelligence approach is crucial. By acknowledging risks, learning from industry examples, and implementing best practices, organizations can secure their deployed Generative AI models, ensuring output integrity and sensitive information protection. Continuous vigilance, collaboration, and adaptation are imperative to stay ahead in the dynamic landscape of AI security.
