From 0c9d2cbb07ff938a61c310543f99fa87c47808d6 Mon Sep 17 00:00:00 2001 From: charliepaks Date: Mon, 30 Jun 2025 13:21:35 +0100 Subject: [PATCH 1/8] update ai_security_overview.md --- .../content/docs/ai_security_overview.md | 131 +++++++++--------- 1 file changed, 69 insertions(+), 62 deletions(-) diff --git a/content/ai_exchange/content/docs/ai_security_overview.md b/content/ai_exchange/content/docs/ai_security_overview.md index c1eb2799..ce72e18a 100644 --- a/content/ai_exchange/content/docs/ai_security_overview.md +++ b/content/ai_exchange/content/docs/ai_security_overview.md @@ -134,7 +134,7 @@ In AI, we outline 6 types of impacts that align with three types of attacker goa 5. disrupt: hurt availability of the model (the model either doesn't work or behaves in an unwanted way - not to deceive users but to disrupt normal operations) 6. disrupt/disclose: confidentiality, integrity, and availability of non AI-specific assets -The threats that create these impacts use different attack surfaces. For example: the confidentiality of train data can be compromised by hacking into the database during development-time, but it can also leak by a _membership inference attack_ that can find out whether a certain individual was in the train data, simply by feeding that person's data into the model and looking at the details of the model output. +The threats that create these impacts use different attack surfaces. For example: the confidentiality of train data can be compromised by hacking into the database during development, but it can also get leaked by a _membership inference attack_ that can find out whether a certain individual was in the train data, simply by feeding that person's data into the model and looking at the details of the model output. The diagram shows the threats as arrows. Each threat has a specific impact, indicated by letters referring to the Impact legend. The control overview section contains this diagram with groups of controls added. [![](/images/threats.png)](/images/threats.png) @@ -142,19 +142,19 @@ The diagram shows the threats as arrows. Each threat has a specific impact, indi **How about Agentic AI?** Think of Agentic AI as voice assistants that can control your heating, send emails, and even invite more assistants into the conversation. That’s powerful—but you’d probably want it to check with you first before sending a thousand emails. There are four key aspects to understand: -1. Action: Agents don’t just chat—they invoke functions such as sending an email. +1. Action: Agents don’t just chat — they invoke functions such as sending an email. 2. Autonomous: Agents can trigger each other, enabling autonomous responses (e.g. a script receives an email, triggering a GenAI follow-up). 3. Complex: Agentic behaviour is emergent. 4. Multi-system: You often work with a mix of systems and interfaces. What does this mean for security? -- Hallucinations and prompt injections can change commands—or even escalate privileges. Don’t give GenAI direct access control. Build that into your architecture. +- Hallucinations and prompt injections can change commands — or even escalate privileges. Don’t give GenAI models/agents direct access control. Build that into your architecture. - The attack surface is wide, and the potential impact should not be underestimated. -- Because of that, the known controls become even more important—such as traceability, protecting memory integrity, prompt injection defenses, rule-based guardrails, least model privilege, and human oversight. See the [controls overview section](/goto/controlsoverview/). +- Because of that, the known controls become even more important — such as traceability, protecting memory integrity, prompt injection defenses, rule-based guardrails, least model privilege, and human oversight. See the [controls overview section](/goto/controlsoverview/). For more details on the agentic AI threats, see the [Agentic AI threats and mitigations, from the GenAI security project](https://genai.owasp.org/resource/agentic-ai-threats-and-mitigations/). For a more general discussion of Agentic AI, see [this article from Chip Huyen](https://huyenchip.com/2025/01/07/agents.html). -The [testing section](/goto/testing/) goes into agentic AI red teaming. +The [testing section](/goto/testing/) discusses more about agentic AI red teaming. @@ -175,9 +175,9 @@ The AI security matrix below (click to enlarge) shows all threats and risks, ord The below diagram puts the controls in the AI Exchange into groups and places these groups in the right lifecycle with the corresponding threats. [![](/images/threatscontrols.png)](/images/threatscontrols.png) The groups of controls form a summary of how to address AI security (controls are in capitals): -1. **AI Governance**: implement governance processes for AI risk, and include AI into your processes for information security and software lifecycle: +1. **AI Governance**: implement governance processes for AI risk, and include them in your information security and software lifecycle processes: >( [AIPROGRAM](/goto/aiprogram/ ), [SECPROGRAM](/goto/secprogram/), [DEVPROGRAM](/goto/devprogram/), [SECDEVPROGRAM](/goto/secdevprogram/), [CHECKCOMPLIANCE](/goto/checkcompliance/), [SECEDUCATE](/goto/seceducate/)) -2. Apply conventional **technical IT security controls** risk-based, since an AI system is an IT system: +2. Apply conventional **technical IT security controls** in a risk-based manner, since an AI system is an IT system: - 2a Apply **standard** conventional IT security controls (e.g. 15408, ASVS, OpenCRE, ISO 27001 Annex A, NIST SP800-53) to the complete AI system and don't forget the new AI-specific assets : - Development-time: model & data storage, model & data supply chain, data science documentation: >([DEVSECURITY](/goto/devsecurity/), [SEGREGATEDATA](/goto/segregatedata/), [SUPPLYCHAINMANAGE](/goto/supplychainmanage/), [DISCRETE](/goto/discrete/)) @@ -187,27 +187,34 @@ The groups of controls form a summary of how to address AI security (controls ar >([MONITORUSE](/goto/monitoruse/), [MODELACCESSCONTROL](/goto/modelaccesscontrol/), [RATELIMIT](/goto/ratelimit/)) - 2c Adopt **new** IT security controls: >([CONFCOMPUTE](/goto/confcompute/), [MODELOBFUSCATION](/goto/modelobfuscation/), [PROMPTINPUTVALIDATION](/goto/promptinputvalidation/), [INPUTSEGREGATION](/goto/inputsegregation/)) -3. Data scientists apply **data science security controls** risk-based : +3. Apply risk-based **data science security controls** : - 3a Development-time controls when developing the model: >([FEDERATEDLEARNING](/goto/federatedlearning/), [CONTINUOUSVALIDATION](/goto/continuousvalidation/), [UNWANTEDBIASTESTING](/goto/unwantedbiastesting/), [EVASIONROBUSTMODEL](/goto/evasionrobustmodel/), [POISONROBUSTMODEL](/goto/poisonrobustmodel/), [TRAINADVERSARIAL](/goto/trainadversarial/), [TRAINDATADISTORTION](/goto/traindatadistortion/), [ADVERSARIALROBUSTDISTILLATION](/goto/adversarialrobustdistillation/), [MODELENSEMBLE](/goto/modelensemble/), [MORETRAINDATA](/goto/moretraindata/), [SMALLMODEL](/goto/smallmodel/), [DATAQUALITYCONTROL](/goto/dataqualitycontrol/)) - 3b Runtime controls to filter and detect attacks: >([DETECTODDINPUT](/goto/detectoddinput/), [DETECTADVERSARIALINPUT](/goto/detectadversarialinput/), [DOSINPUTVALIDATION](/goto/dosinputvalidation/), [INPUTDISTORTION](/goto/inputdistortion/), [FILTERSENSITIVEMODELOUTPUT](/goto/filtersensitivemodeloutput/), [OBSCURECONFIDENCE](/goto/obscureconfidence/)) -4. **Minimize data:** Limit the amount of data in rest and in transit, and the time it is stored, development-time and runtime: +4. **Minimize data:** Limit the amount of data at rest and in transit. Also, limit data storage time, development-time and runtime: >([DATAMINIMIZE](/goto/dataminimize/), [ALLOWEDDATA](/goto/alloweddata/), [SHORTRETAIN](/goto/shortretain/), [OBFUSCATETRAININGDATA](/goto/obfuscatetrainingdata/)) -5. **Control behaviour impact** as the model can behave in unwanted ways - by mistake or by manipulation: +5. **Control behaviour impact** as the model can behave in unwanted ways - unintentionally or by manipulation: >([OVERSIGHT](/goto/oversight/), [LEASTMODELPRIVILEGE](/goto/leastmodelprivilege/), [AITRANSPARENCY](/goto/aitransparency/), [EXPLAINABILITY](/goto/explainability/), [CONTINUOUSVALIDATION](/goto/continuousvalidation/), [UNWANTEDBIASTESTING](/goto/unwantedbiastesting/)) -All threats and controls are discussed in the further content of the AI Exchange. +All threats and controls are explored in more detail in the subsequent sections of the AI Exchange. ### Threat model with controls - GenAI trained/fine tuned -Below diagram restricts the threats and controls to Generative AI only, for situations in which **training or fine tuning** is done by the organization (note: this is not very common given the high cost and required expertise). +The diagram below focuses on threats and controls related to Generative AI, specifically in scenarios where the organization is responsible for **training or fine-tuning** the model. (note: this is not very common given the high cost and required expertise). [![AI Security Threats and controls - GenAI trained or fine tuned](/images/threatscontrols-genainotready.png)](/images/threatscontrols-genainotready.png) ### Threat model with controls - GenAI as-is -Below diagram restricts the threats and controls to Generative AI only where the model is used **as-is** by the organization. The provider (e.g. OpenAI) has done the training/fine tuning. Therefore, some threats are the responsibility of the model provider (sensitive/copyrighted data, manipulation at the provider). Nevertheless, the organization that uses the model should take these risks into account and gain assurance about them from the provider. +The diagram below focuses on threats and controls related to Generative AI when the organization uses the model as-is, without any additional training or fine-tuning. The provider (e.g. OpenAI) has done the training/fine tuning. Therefore, some risks are the responsibility of the model provider (sensitive/copyrighted data, manipulation at the provider). Nevertheless, the organization that uses the model should take these risks into account and gain assurance about them from the provider. -In many situation, the as-is model will be hosted externally and therefore security depends on how the supplier is handling the data, including the security configuration. How is the API protected? What is virtual private cloud? The entire external model, or just the API? Key management? Data retention? Logging? Does the model reach out to third party sources by sending out sensitive input data? +In many cases, the as-is model is hosted externally, meaning security largely depends on how the supplier handles data, including the security configuration. +Some relevant questions to ask here include: +- How is the API protected? +- What is hosted within the Virtual Private Cloud (VPC)? The entire external model, or just the API? +- How is key management handled? +- What are the data retention policies? +- Is logging enabled, and if so, what is logged? +- Does the model send out sensitive input data when communicating with third-party sources? [![AI Security Threats and controls - GenAI as-is](/images/threatscontrols-readymodel.png)](/images/threatscontrols-readymodel.png) @@ -246,8 +253,8 @@ Note that [general governance controls](/goto/governancecontrols/) apply to all >Category: discussion >Permalink: https://owaspai.org/goto/navigator/ -The next big section in this document is an extensive deep dive in all the AI security threats and their controls. -The navigator diagram below shows the structure of the deep dive section, with threats, controls and how they relate, including risks and the types of controls. +The next big section in this document is an extensive deep dive into all the AI security threats and their controls. +The navigator diagram below outlines the structure of the deep-dive section, illustrating the relationships between threats, controls, associated risks, and the types of controls applied. {{< callout type="info" >}} Click on the image to get a PDF with clickable links. {{< /callout >}} @@ -259,53 +266,53 @@ The navigator diagram below shows the structure of the deep dive section, with t >Category: discussion >Permalink: https://owaspai.org/goto/riskanalysis/ -There are many threats and controls described in this document. Your situation and how you use AI determines which threats are relevant to you, to what extent, and what controls are who's responsibility. This selection process can be performed through risk analysis (or risk assessment) in light of the use case and architecture. +There are quite a number of threats and controls described in this document. The relevance and severity of each threat and the appropriate controls depend on your specific use case and how AI is deployed within your environment. Determining which threats apply, to what extent, and who is responsible for implementing controls should be guided by a risk assessment based on your architecture and intended use. **Risk management introduction** -Organizations classify their risks into several key areas: Strategic, Operational, Financial, Compliance, Reputation, Technology, Environmental, Social, and Governance (ESG). A threat becomes a risk when it exploits one or more vulnerabilities. AI threats, as discussed in this resource, can have significant impact across multiple risk domains. For example, adversarial attacks on AI systems can lead to disruptions in operations, distort financial models, and result in compliance issues. See the [AI security matrix](/goto/aisecuritymatrix/) for an overview of potential impact. +Organizations classify their risks into several key areas: Strategic, Operational, Financial, Compliance, Reputation, Technology, Environmental, Social, and Governance (ESG). A threat becomes a risk when it exploits one or more vulnerabilities. AI threats, as discussed in this resource, can have significant impact across multiple risk domains. For example, adversarial attacks on AI systems can lead to disruptions in operations, distort financial models, and result in compliance issues. See the [AI security matrix](/goto/aisecuritymatrix/) for an overview of AI related threats, risks and potential impact. -General risk management for AI systems is typically driven by AI governance - see [AIPROGRAM](/goto/aiprogram/) and includes both risks BY relevant AI systems and risks TO those systems. Security risk assessment is typically driven by the security management system - see [SECPROGRAM](/goto/secprogram) as this system is tasked to include AI assets, AI threats, and AI systems into consideration - provided that these have been added to the corresponding repositories. +General risk management for AI systems is typically driven by AI governance - see [AIPROGRAM](/goto/aiprogram/) and includes both risks BY relevant AI systems and risks to those systems. Security risk assessment is typically driven by the security management system - see [SECPROGRAM](/goto/secprogram) as this system is tasked to include AI assets, AI threats, and AI systems provided that these have been added to the corresponding repositories. Organizations often adopt a Risk Management framework, commonly based on ISO 31000 or similar standards such as ISO 23894. These frameworks guide the process of managing risks through four key steps as outlined below: -1. **Identifying Risks**: Recognizing potential risks (Threats) that could impact the organization. See “Threat through use” section to identify potential risks (Threats). -2. **Evaluating Risks by Estimating Likelihood and Impact**: To determine the severity of a risk, it is necessary to assess the probability of the risk occurring and evaluating the potential consequences should the risk materialize. Combining likelihood and impact to gauge the risk's overall severity. This is typically presented in the form of a heatmap. See below for further details. +1. **Identifying Risks**: Recognizing potential risks that could impact the organization. See “Threat through use” section to identify potential risks. +2. **Evaluating Risks by Estimating Likelihood and Impact**: To determine the severity of a risk, it is necessary to assess the probability of the risk occurring and evaluating the potential consequences should the risk materialize. Combining likelihood and impact to gauge the risk's overall severity. This is typically presented in the form of a heatmap. This is discuused in more detail in the sections that follow. 3. **Deciding What to Do (Risk Treatment)**: Choosing an appropriate strategy to address the risk. These strategies include: Risk Mitigation, Transfer, Avoidance, or Acceptance. See below for further details. -4. **Risk Communication and Monitoring**: Regularly sharing risk information with stakeholders to ensure awareness and support for risk management activities. Ensuring effective Risk Treatments are applied. This requires a Risk Register, a comprehensive list of risks and their attributes (e.g. severity, treatment plan, ownership, status, etc). See below for further details. +4. **Risk Communication and Monitoring**: Regularly sharing risk information with stakeholders to ensure awareness and continuous support for risk management activities. Ensuring effective Risk Treatments are applied. This requires a Risk Register, a comprehensive list of risks and their attributes (e.g. severity, treatment plan, ownership, status, etc). This is discuused in more detail in the sections that follow. Let's go through the risk management steps one by one. ### 1. Identifying Risks -Selecting potential risks (Threats) that could impact the organization requires technical and business assessment of the applicable threats. A method to do this is discussed below, for every type of risk impact: +Discovering potential risks that could impact the organization requires technical and business assessment of the applicable threats. The following section outlines a method to address each type of risk impact individually: **Unwanted model behaviour** Regarding model behaviour, we focus on manipulation by attackers, as the scope of this document is security. Other sources of unwanted behaviour are general inaccuracy (e.g. hallucinations) and/or unwanted bias regarding certain groups (discrimination). - This will always be an applicable threat, independent of your situation, although the risk level may sometimes be accepted - see below. + This will always be an applicable threat, independent of your use-case, although the risk level may sometimes be accepted as shown below. - Which means that you always need to have in place: - - [General governance controls](/goto/governancecontrols/) (e.g. having an inventory of AI use and some control over it) + This means that you always need to have in place the following: + - [General governance controls](/goto/governancecontrols/) (e.g. maintaining a documented inventory of AI applications and implementing mechanisms to ensure appropriate oversight and accountability.) - [Controls to limit effects of unwanted model behaviour](/goto/limitunwanted/) (e.g. human oversight) Is the model GenAI (e.g. a Large Language Model)? - - Prevent [prompt injection](/goto/directpromptinjection/) (mostly done by the model supplier) in case untrusted input goes directly into the model, and there are risks that the model output creates harm, for example by offending, by providing dangerous information, or misinformation, or output that triggers harmful functions (Agentic AI). Mostly this is the case if model input is from end users and output also goes straight to end users, or can trigger functions. - - Prevent [indirect prompt injection](/goto/indirectpromptinjection/), in case untrusted data goes somehow into the prompt e.g. you retrieve somebody's resume and include it in a prompt. + - Prevent [prompt injection](/goto/directpromptinjection/) (mostly done by the model supplier). When untrusted input goes directly into a model, and there's a possibility that the model's output could be harmful (for example, by offending, providing dangerous information, or spreading misinformation, or output that triggers harmful functions (Agentic AI) )- it's a significant concern. This is particularly the case if model input comes from end-users and output goes straight to them, or can trigger functions. + - Prevent [indirect prompt injection](/goto/indirectpromptinjection/), in case untrusted data is a part of the prompt e.g. you retrieve somebody's resume and include it in a prompt. - Sometimes model training and running the model is deferred to a supplier. For generative AI, training is mostly performed by an external supplier given the cost of typically millions of dollars. Finetuning of generative AI is also not often performed by organizations given the cost of compute and the complexity involved. Some GenAI models can be obtained and run at your own premises. The reasons to do this can be lower cost (if is is an open source model), and the fact that sensitive input information does not have to be sent externally. A reason to use an externally hosted GenAI model can be the quality of the model. + Sometimes model training and running the model is deferred to a supplier. For generative AI, training is mostly performed by an external supplier typically costs millions of dollars. Finetuning of generative AI is also not often performed by organizations given the cost of compute and the complexity involved. Some GenAI models can be obtained and run on your own infrastructure. The reasons for this could be lower cost (if is is an open source model), and the fact that sensitive input information does not have to be sent externally. A reason to use an externally hosted GenAI model can be the quality of the model. Who trains/finetunes the model? - - The supplier: you need to prevent [obtaining a poisoned model](/goto/transferlearningattack/) by proper supply chain management (selecting a proper supplier and making sure you use the actual model), including assuring that: the supplier prevents development-time model poisoning including data poisoning and obtaining poisoned data. If the remaining risk for data poisoning cannot be accepted, performing post-training countermeasures can be an option - see [POISONROBUSTMODEL](/goto/poisonrobustmodel/). - - You: you need to prevent [development-time model poisoning](/goto/modelpoison/) which includes model poisoning, data poisoning and obtaining poisoned data or a poisoned pre-trained model in case you finetune + - The supplier: you need to avoid [obtaining a poisoned model](/goto/transferlearningattack/) through proper supply chain management (by selecting a trustworthy supplier and verifying the authenticity of the model). This involves ensuring that the supplier prevents model poisoning during development, including data poisoning, and uses uncompromised data. If the risk of data poisoning remains unacceptable, implementing post-training countermeasures can be a viable option. See [POISONROBUSTMODEL](/goto/poisonrobustmodel/). + - You: you need to prevent [development-time model poisoning](/goto/modelpoison/) which includes model poisoning, data poisoning and obtaining poisoned data or a poisoned pre-trained model in case you're finetuning the model. If you use RAG (Retrieval Augmented Generation using GenAI), then your retrieval repository plays a role in determining the model behaviour. This means: - You need to prevent [data poisoning](/goto/datapoison/) of your retrieval repository, which includes preventing that it contains externally obtained poisoned data. Who runs the model? - - The supplier: make sure the supplier prevents [runtime model poisoning](/goto/runtimemodelpoison/) just like any supplier who you expect to protect the running application from manipulation + - The supplier: make sure the supplier prevents [runtime model poisoning](/goto/runtimemodelpoison/) just the way you would expect any supplier to protect their running application from manipulation - You: You need to prevent [runtime model poisoning](/goto/runtimemodelpoison/) - Is the model predictive AI or Generative AI used in a judgement task (e.g. does this text look like spam)? + Is the model (predictive AI or Generative AI) used in a judgement task (e.g. spam detection)? - Prevent an [evasion attack](/goto/evasion/) in which a user tries to fool the model into a wrong decision using data (not instructions). Here, the level of risk is an important aspect to evaluate - see below. The risk of an evasion attack may be acceptable. In order to assess the level of risk for unwanted model behaviour through manipulation, consider what the motivation of an attacker could be. What could an attacker gain by for example sabotaging your model? Just a claim to fame? Could it be a disgruntled employee? Maybe a competitor? What could an attacker gain by a less conspicuous model behaviour attack, like an evasion attack or data poisoning with a trigger? Is there a scenario where an attacker benefits from fooling the model? An example where evasion IS interesting and possible: adding certain words in a spam email so that it is not recognized as such. An example where evasion is not interesting is when a patient gets a skin disease diagnosis based on a picture of the skin. The patient has no interest in a wrong decision, and also the patient typically has no control - well maybe by painting the skin. There are situations in which this CAN be of interest for the patient, for example to be eligible for compensation in case the (faked) skin disease was caused by certain restaurant food. This demonstrates that it all depends on the context whether a theoretical threat is a real threat or not. Depending on the probability and impact of the threats, and on the relevant policies, some threats may be accepted as risk. When not accepted, the level of risk is input to the strength of the controls. For example: if data poisoning can lead to substantial benefit for a group of attackers, then the training data needs to be get a high level of protection. @@ -313,13 +320,13 @@ Selecting potential risks (Threats) that could impact the organization requires **Leaking training data** Do you train/finetune the model yourself? - - Yes: and is the training data sensitive? Then you need to prevent: + - If yes, is the training data sensitive? If your response is in the affirmative, you need to prevent: - [unwanted disclosure in model output](/goto/disclosureuse/) - [model inversion](/goto/modelinversionandmembership/) (but not for GenAI) - [training data leaking from your engineering environment](/goto/devdataleak/). - - [membership inference]((/goto/modelinversionandmembership/)) - but only if the **fact** that something or somebody was part of the training set is sensitive information. For example when the training set consists of criminals and their history to predict criminal careers: membership of that set gives away the person is a convicted or alleged criminal. + - [membership inference]((/goto/modelinversionandmembership/)) - but only in the event where something or someone that was part of the training data constitutes sensitive information. For example, when the training set consists of criminals and their history to predict criminal careers. Membership of that set gives away the person is a convicted or alleged criminal. - If you use RAG: apply the above to your repository data, as if it was part of the training set: as the repository data feeds into the model and can therefore be part of the output as well. + If you use RAG: apply the above measures to your repository data because it feeds into the model and can therefore be part of the output as well. If you don't train/finetune the model, then the supplier of the model is responsible for unwanted content in the training data. This can be poisoned data (see above), data that is confidential, or data that is copyrighted. It is important to check licenses, warranties and contracts for these matters, or accept the risk based on your circumstances. @@ -327,7 +334,7 @@ Selecting potential risks (Threats) that could impact the organization requires **Model theft** Do you train/finetune the model yourself? - - Yes, and is the model regarded intellectual property? Then you need to prevent: + - If yes, is the model regarded as intellectual property? Then you need to prevent: - [Model theft through use](/goto/modeltheftuse/) - [Model theft development-time](/goto/devmodelleak/) - [Source code/configuration leak](/goto/devcodeleak/) @@ -336,15 +343,15 @@ Selecting potential risks (Threats) that could impact the organization requires **Leaking input data** Is your input data sensitive? - - Prevent [leaking input data](/goto/leakinput/). Especially if the model is run by a supplier, proper care needs to be taken that this data is transferred or stored in a protected way and as little as possible. Study the security level that the supplier provides and the options you have to for example disable logging or monitoring at the supplier side. Note, that if you use RAG, that the data you retrieve and insert into the prompt is also input data. This typically contains company secrets or personal data. + - Prevent [leaking input data](/goto/leakinput/). If the model is run by a supplier, proper care needs to be taken to ensure that this data is minimized and transferred or stored securely. Review the security measures provided by the supplier, including any options to disable logging or monitoring on their end. If you're using a RAG system, remember that the data you retrieve and inject into the prompt also counts as input data. This often includes sensitive company information or personal data. **Misc.** Is your model a Large Language Model? - - Prevent [insecure output handling](/goto/insecureoutput/), for example when you display the output of the model on a website and the output contains malicious Javascript. + - Prevent [insecure output handling](/goto/insecureoutput/), for example, when you display the output of the model on a website and the output contains malicious Javascript. - Make sure to prevent [model inavailability by malicious users](/denialmodelservice/) (e.g. large inputs, many requests). If your model is run by a supplier, then certain countermeasures may already be in place. + Make sure to prevent [model inavailability by malicious users](/denialmodelservice/) (e.g. large inputs, many requests). If your model is run by a supplier, then certain countermeasures may already be in place to address this. Since AI systems are software systems, they require appropriate conventional application security and operational security, apart from the AI-specific threats and controls mentioned in this section. @@ -358,7 +365,7 @@ Estimating the likelihood and impact of an AI risk requires a thorough understan Evaluating the impact of risks in AI systems involves understanding the potential consequences of threats materializing. This includes both the direct consequences, such as compromised data integrity or system downtime, and the indirect consequences, such as reputational damage or regulatory penalties. The impact is often magnified in AI systems due to their scale and the critical nature of the tasks they perform. For instance, a successful attack on an AI system used in healthcare diagnostics could lead to misdiagnosis, affecting patient health and leading to significant legal, trust, and reputational repercussions for the involved entities. **Prioritizing risks** -The combination of likelihood and impact assessments forms the basis for prioritizing risks and informs the development of Risk Treatment decisions. Commonly organizations use a risk heat map to visually categorize risks by impact and likelihood. This approach facilitates risk communication and decision-making. It allows the management to focus on risks with highest severity (high likelihood and high impact). +The combination of likelihood and impact assessments forms the basis for prioritizing risks and informs the development of Risk Treatment decisions. Commonly, organizations use a risk heat map to visually categorize risks by impact and likelihood. This approach facilitates risk communication and decision-making. It allows the management to focus on risks with highest severity (high likelihood and high impact). ### 3. Risk Treatment Risk treatment is about deciding what to do with the risks. It involves selecting and implementing measures to mitigate, transfer, avoid, or accept cybersecurity risks associated with AI systems. This process is critical due to the unique vulnerabilities and threats related to AI systems such as data poisoning, model theft, and adversarial attacks. Effective risk treatment is essential to robust, reliable, and trustworthy AI. @@ -379,9 +386,9 @@ Regularly sharing risk information with stakeholders to ensure awareness and sup A central tool in this process is the Risk Register, which serves as a comprehensive repository of all identified risks, their attributes (such as severity, treatment plan, ownership, and status), and the controls implemented to mitigate them. Most large organizations already have such a Risk Register. It is important to align AI risks and chosen vocabularies from Enterprise Risk Management to facilitate effective communication of risks throughout the organization. ### 5. Arrange responsibility -For each selected threat, determine who is responsible to address it. By default, the organization that builds and deploys the AI system is responsible, but building and deploying may be done by different organizations, and some parts of the building and deployment may be deferred to other organizations, e.g. hosting the model, or providing a cloud environment for the application to run. Some aspects are shared responsibilities. +For each selected threat, determine who is responsible for addressing it. By default, the organization that builds and deploys the AI system is responsible, but building and deploying may be done by different organizations, and some parts of the building and deployment may be deferred to other organizations, e.g. hosting the model, or providing a cloud environment for the application to run. Some aspects are shared responsibilities. -If components of your AI system are hosted, then you share responsibility regarding all controls for the relevant threats with the hosting provider. This needs to be arranged with the provider, using for example a responsibility matrix. Components can be the model, model extensions, your application, or your infrastructure. See [Threat model of using a model as-is](#threat-model-with-controls---genai-as-is). +If some components of your AI system are hosted, then you share responsibility regarding all controls for the relevant threats with the hosting provider. This needs to be arranged with the provider by using a tool like the responsibility matrix. Components can be the model, model extensions, your application, or your infrastructure. See [Threat model of using a model as-is](#threat-model-with-controls---genai-as-is). If an external party is not open about how certain risks are mitigated, consider requesting this information and when this remains unclear you are faced with either 1) accept the risk, 2) or provide your own mitigations, or 3)avoid the risk, by not engaging with the third party. @@ -391,12 +398,12 @@ For the threats that are the responsibility of other organisations: attain assur Example: Regular audits and assessments of third-party security measures. ### 7. Select controls -Then, for the threats that are relevant to you and for which you are responsible: consider the various controls listed with that threat (or the parent section of that threat) and the general controls (they always apply). When considering a control, look at its purpose and determine if you think it is important enough to implement it and to what extent. This depends on the cost of implementation compared to how the purpose mitigates the threat, and the level of risk of the threat. These elements also play a role of course in the order you select controls: highest risks first, then starting with the lower cost controls (low hanging fruit). +Next, for the threats that are relevant to your use-case and fall under your responsibility, review the associated controls, both those listed directly under the threat (or its parent category) and the general controls, which apply universally. For each control, consider its purpose and assess whether it's worth implementing, and to what extent. This decision should weigh the cost of implementation against how effectively the control addresses the threat, along with the severity of the associated risk. These factors also influence the order in which you apply controls. Start with the highest-risk threats and prioritize low-cost, quick-win controls (the "low-hanging fruit"). -Controls typically have quality aspects to them, that need to be fine tuned to the situation and the level of risk. For example: the amount of noise to add to input data, or setting thresholds for anomaly detection. The effectiveness of controls can be tested in a simulation environment to evaluate the performance impact and security improvements to find the optimal balance. Fine tuning controls needs to continuously take place, based on feedback from testing in simulation in in production. +Controls often have quality-related parameters that need to be adjusted to suit the specific situation and level of risk. For example, this could involve deciding how much noise to add to input data or setting appropriate thresholds for anomaly detection. Testing the effectiveness of these controls in a simulation environment helps you evaluate their performance and security impact to find the right balance. This tuning process should be continuous, using insights from both simulated tests and real-world production feedback. ### 8. Residual risk acceptance -In the end you need to be able to accept the risks that remain regarding each threat, given the controls that you implemented. +In the end you need to be able to accept the risks that remain regarding each threat, given the controls that you implemented. The severity level of the risks you deem aceptable should be significantly low to the point where it won't hurt your business on any front. ### 9. Further management of the selected controls (see [SECPROGRAM](/goto/secprogram/)), which includes continuous monitoring, documentation, reporting, and incident response. @@ -409,13 +416,13 @@ Example: Regularly reviewing and updating risk treatment plans to adapt to new v ## How about ... ### How about AI outside of machine learning? -A helpful way to look at AI is to see it as consisting of machine learning (the current dominant type of AI) models and _heuristic models_. A model can be a machine learning model which has learned how to compute based on data, or it can be a heuristic model engineered based on human knowledge, e.g. a rule-based system. Heuristic models still need data for testing, and sometimes to perform analysis for further building and validating the human knowledge. +A helpful way to look at AI is to see it as consisting of machine learning (the current dominant type of AI) models and _heuristic models_. A model can be a machine learning model which has learned how to compute based on data, or it can be a heuristic model engineered based on human knowledge, e.g. a rule-based system. Heuristic models still require data for testing, and in some cases, for conducting analysis that supports further development and validation of human-derived knowledge. This document focuses on machine learning. Nevertheless, here is a quick summary of the machine learning threats from this document that also apply to heuristic systems: -- Model evasion is also possible for heuristic models, -trying to find a loophole in the rules +- Model evasion is also possible with heuristic models, as attackers may try to find loopholes or weaknesses in the defined rules. - Model theft through use - it is possible to train a machine learning model based on input/output combinations from a heuristic model - Overreliance in use - heuristic systems can also be relied on too much. The applied knowledge can be false -- Data poisoning and model poisoning is possible by manipulating data that is used to improve knowledge and by manipulating the rules development-time or runtime +- Both data poisoning and model poisoning can occur by tampering with the data used to enhance knowledge, or by manipulating the rules either during development or at runtime. - Leaks of data used for analysis or testing can still be an issue - Knowledge base, source code and configuration can be regarded as sensitive data when it is intellectual property, so it needs protection - Leak sensitive input data, for example when a heuristic system needs to diagnose a patient @@ -426,10 +433,10 @@ This document focuses on machine learning. Nevertheless, here is a quick summary There are many aspects of AI when it comes to positive outcome while mitigating risks. This is often referred to as responsible AI or trustworthy AI, where the former emphasises ethics, society, and governance, while the latter emphasises the more technical and operational aspects. -If your main responsibility is security, then the best strategy is to first focus on AI security and after that learn more about the other AI aspects - if only to help your colleagues with the corresponding responsibility to stay alert. After all, security professionals are typically good at identifying things that can go wrong. Furthermore, some aspects can be a consequence of compromised AI and are therefore helpful to understand, such as _safety_. +If your primary responsibility is security, it's best to start by focusing on AI security. Once you have a solid grasp of that, you can expand your knowledge to other AI aspects, even if it's just to support colleagues who are responsible for those areas and help them stay vigilant. After all, security professionals are often skilled at spotting potential failure points. Furthermore, some aspects can be a consequence of compromised AI and are therefore helpful to understand, such as _safety_. -Let's clarify the aspects of AI and see how they relate to security: -- **Accuracy** is about the AI model being sufficiently correct to perform its 'business function'. Being incorrect can lead to harm, including (physical) safety problems (e.g. car trunk opens during driving) or other wrong decisions that are harmful (e.g. wrongfully declined loan). The link with security is that some attacks cause unwanted model behaviour which is by definition an accuracy problem. Nevertheless, the security scope is restricted to mitigating the risks of those attacks - NOT solve the entire problem of creating an accurate model (selecting representative data for the trainset etc.). +Let's break down the principles of AI and explore how each one connects to security: +- **Accuracy** is about the AI model being sufficiently correct to perform its 'business function'. Being incorrect can lead to harm, including (physical) safety problems (e.g. car trunk opens during driving) or other wrong decisions that are harmful (e.g. wrongfully declined loan). The link with security is that some attacks cause unwanted model behaviour which is by definition, an accuracy problem. Nevertheless, the security scope is restricted to mitigating the risks of those attacks - NOT solve the entire problem of creating an accurate model (selecting representative data for the trainset etc.). - **Safety** refers to the condition of being protected from / unlikely to cause harm. Therefore safety of an AI system is about the level of accuracy when there is a risk of harm (typically implying physical harm but not restricted to that) , plus the things that are in place to mitigate those risks (apart from accuracy), which includes security to safeguard accuracy, plus a number of safety measures that are important for the business function of the model. These need to be taken care of and not just for security reasons because the model can make unsafe decisions for other reasons (e.g. bad training data), so they are a shared concern between safety and security: - [oversight](/goto/oversight/) to restrict unsafe behaviour, and connected to that: assigning least privileges to the model, - [continuous validation](/goto/continuousvalidation/) to safeguard accuracy, @@ -437,9 +444,9 @@ Let's clarify the aspects of AI and see how they relate to security: - [explainability](/goto/continuousvalidation/): see below. - **Transparency**: sharing information about the approach, to warn users and depending systems of accuracy risks, plus in many cases users have the right to know details about a model being used and how it has been created. Therefore it is a shared concern between security, privacy and safety. - **Explainability**: sharing information to help users validate accuracy by explaining in more detail how a specific result came to be. Apart from validating accuracy this can also support users to get transparency and to understand what needs to change to get a different outcome. Therefore it is a shared concern between security, privacy, safety and business function. A special case is when explainability is required by law separate from privacy, which adds 'compliance' to the list of aspects that share this concern. -- **Robustness** is about the ability of maintaining accuracy under expected or unexpected variations in input. The security scope is about when those variations are malicious (_adversarial robustness_) which often requires different countermeasures than those required against normal variations (_generalization robustness). Just like with accuracy, security is not involved per se in creating a robust model for normal variations. The exception to this is when generalization robustness adversarial malicious robustness , in which case this is a shared concern between safety and security. This depends on a case by case basis. +- **Robustness** is about the ability of maintaining accuracy under expected or unexpected variations in input. The security scope is about when those variations are malicious (_adversarial robustness_) which often requires different countermeasures than those required against normal variations (_generalization robustness). Just like with accuracy, security is not involved per se in creating a robust model for normal variations. The exception is when generalization robustness or adversarial robustness is involved, as this becomes a shared concern between safety and security. Whether it falls more under one or the other depends on the specific case. - **Free of discrimination**: without unwanted bias of protected attributes, meaning: no systematic inaccuracy where the model 'mistreats' certain groups (e.g. gender, ethnicity). Discrimination is undesired for legal and ethical reasons. The relation with security is that having detection of unwanted bias can help to identify unwanted model behaviour caused by an attack. For example, a data poisoning attack has inserted malicious data samples in the training set, which at first goes unnoticed, but then is discovered by an unexplained detection of bias in the model. Sometimes the term 'fairness' is used to refer to discrimination issues, but mostly fairness in privacy is a broader term referring to fair treatment of individuals, including transparency, ethical use, and privacy rights. -- **Empathy**. The relation of that with security is that the feasible level of security should always be taken into account when validating a certain application of AI. If a sufficient level of security cannot be provided to individuals or organizations, then empathy means invalidating the idea, or takin other precautions. +- **Empathy**. Its connection to security lies in recognizing the practical limits of what security can achieve when evaluating an AI application. If individuals or organizations cannot be adequately protected, empathy means rethinking the idea, either by rejecting it altogether or by taking additional precautions to reduce potential harm. - **Accountability**. The relation of accountability with security is that security measures should be demonstrable, including the process that have led to those measures. In addition, traceability as a security property is important, just like in any IT system, in order to detect, reconstruct and respond to security incidents and provide accountability. - **AI security**. The security aspect of AI is the central topic of the AI Exchange. In short, it can be broken down into: - [Input attacks](/goto/threatsuse/), that are performed by providing input to the model @@ -458,13 +465,13 @@ Yes, GenAI is leading the current AI revolution and it's the fastest moving subf Important note: from a security threat perspective, GenAI is not that different from other forms of AI (_predictive AI_). GenAI threats and controls largely overlap and are very similar to AI in general. Nevertheless, some risks are (much) higher. Some are lower. Only a few risks are GenAI-specific. Some of the control categories differ substantially between GenAI and predictive AI - mostly the data science controls (e.g. adding noise to the training set). In many cases, GenAI solutions will use a model as-is and not involve any training by the organization whatsoever, shifting some of the security responsibilities from the organization to the supplier. Nevertheless, if you use a ready-made model, you need still to be aware of those threats. What is mainly new to the threat landscape because of LLMs? -- First of all, LLMs pose new threats to security because they may be used to create code with vulnerabilities, or they may be used by attackers to create malware, or they may cause harm otherwiser through hallucinations, but these are out of scope of the AI Exchange, as it focuses on security threats TO AI systems. +- First of all, LLMs pose new threats to security because they may be used to create code with vulnerabilities, or they may be used by attackers to create malware, or they may cause harm through hallucinations. However, these concerns are outside the scope of the AI Exchange, which focuses on security threats to AI systems themselves. - Regarding input: - Prompt injection is a completely new threat: attackers manipulating the behaviour of the model with crafted and sometimes hidden instructions. - Also new is organizations sending huge amounts of data in prompts, with company secrets and personal data. -- Regarding output: New is the fact that output can contain injection attacks, or can contain sensitive or copyrighted data (see [Copyright](/goto/copyright/)). +- Regarding output: The fact that output can contain injection attacks, or can contain sensitive or copyrighted data is new (see [Copyright](/goto/copyright/)). - Overreliance is an issue. We let LLMs control and create things and may have too much trust in how correct they are, and also underestimate the risk of them being manipulated. The result is that attacks can have much impact. -- Regarding training: Since the training sets are so large and based on public data, it is easier to perform data poisoning. Poisoned foundation models are also a big supply chain issues. +- Regarding training: Since the training sets are so large and based on public data, it is easier to perform data poisoning. Poisoned foundation models are also a big supply chain issue. GenAI security particularities are: @@ -473,7 +480,7 @@ GenAI security particularities are: |1| GenAI models are controlled by natural language in prompts, creating the risk of [Prompt injection](/goto/promptinjection/). Direct prompt injection is where the user tries to fool the model to behave in unwanted ways (e.g. offensive language), whereas with indirect prompt injection it is a third party that injects content into the prompt for this purpose (e.g. manipulating a decision). | ([OWASP for LLM 01:Prompt injection](https://genai.owasp.org/llmrisk/llm01/)) | |2| GenAI models have typically been trained on very large datasets, which makes it more likely to output [sensitive data](/goto/disclosureuseoutput/) or [licensed data](/goto/copyright/), for which there is no control of access privileges built into the model. All data will be accessible to the model users. Some mechanisms may be in place in terms of system prompts or output filtering, but those are typically not watertight. | ([OWASP for LLM 02: Sensitive Information Disclosure](https://genai.owasp.org/llmrisk/llm02/)) | |3|[Data and model poisoning](/goto/modelpoison/) is an AI-broad problem, and with GenAI the risk is generally higher since training data can be supplied from different sources that may be challenging to control, such as the internet. Attackers could for example hijack domains and place manipulated information. | ([OWASP for LLM 04: Data and Model Poisoning](https://genai.owasp.org/llmrisk/llm04/))| -|4|GenAI models can be inaccurate and hallucinate. This is an AI-broad risk factor, and Large Language Models (GenAI) can make matters worse by coming across very confident and knowledgeable. In essence this is about the risk of underestimating the probability that the model is wrong or the model has been manipulated. This means that it is connected to each and every security control. The strongest link is with [controls that limit the impact of unwanted model behavior](/goto/limitunwanted/), in particular [Least model privilege](/goto/leastmodelprivilege/). |([OWASP for LLM 06: Excessive agency](https://genai.owasp.org/llmrisk/llm06/)) and ([OWASP for LLM 09: Misinformation](https://genai.owasp.org/llmrisk/llm09/)) | +|4|GenAI models can be inaccurate and hallucinate. This is an AI-broad risk factor, and Large Language Models (GenAI) can make matters worse by coming across as very confident and knowledgeable. In essence, this is about the risk of underestimating the probability that the model is wrong or the model has been manipulated. This means that it is connected to each and every security control. The strongest link is with [controls that limit the impact of unwanted model behavior](/goto/limitunwanted/), in particular [Least model privilege](/goto/leastmodelprivilege/). |([OWASP for LLM 06: Excessive agency](https://genai.owasp.org/llmrisk/llm06/)) and ([OWASP for LLM 09: Misinformation](https://genai.owasp.org/llmrisk/llm09/)) | |5| [Leaking input data](/goto/leakinput/): GenAI models mostly live in the cloud - often managed by an external party, which may increase the risk of leaking training data and leaking prompts. This issue is not limited to GenAI, but GenAI has 2 particular risks here: 1) model use involves user interaction through prompts, adding user data and corresponding privacy/sensitivity issues, and 2) GenAI model input (prompts) can contain rich context information with sensitive data (e.g. company secrets). The latter issue occurs with *in context learning* or *Retrieval Augmented Generation(RAG)* (adding background information to a prompt): for example data from all reports ever written at a consultancy firm. First of all, this information will travel with the prompt to the cloud, and second: the system will likely not respect the original access rights to the information.| Not covered in LLM top 10 | |6|Pre-trained models may have been manipulated. The concept of pretraining is not limited to GenAI, but the approach is quite common in GenAI, which increases the risk of [supply-chain model poisoning](/goto/supplymodelpoison/).| ([OWASP for LLM 03 - Supply chain vulnerabilities](https://genai.owasp.org/llmrisk/llm03/))| |7|[Model inversion and membership inference](/goto/modelinversionandmembership/) are typically low to zero risks for GenAI |Not covered in LLM top 10, apart from LLM06 which uses a different approach - see above| @@ -494,7 +501,7 @@ GenAI References: Mapping of the UK NCSC /CISA [Joint Guidelines for secure AI system development](https://www.ncsc.gov.uk/collection/guidelines-secure-ai-system-development) to the controls here at the AI Exchange. To see those controls linked to threats, refer to the [Periodic table of AI security](/goto/periodictable/). -Note that the UK Government drove an initiative through their DSIT repartment to build on these joint guidelines and produce the [DSIT Code of Practice for the Cyber Secyrity of AI](https://www.gov.uk/government/publications/ai-cyber-security-code-of-practice/code-of-practice-for-the-cyber-security-of-ai#code-of-practice-principles), which reorganizes things according to 13 principles, does a few tweaks, and adds a bit more governance. The principle mapping is added below, and adds mostly post-market aspects: +Note that the UK Government drove an initiative through their DSIT department to build on these joint guidelines and produce the [DSIT Code of Practice for the Cyber Security of AI](https://www.gov.uk/government/publications/ai-cyber-security-code-of-practice/code-of-practice-for-the-cyber-security-of-ai#code-of-practice-principles), which reorganizes things according to 13 principles, does a few tweaks, and adds a bit more of governance. The principle mapping is added below, and adds mostly post-market aspects: - Principle 10: Communication and processes assoiated with end-users and affected entities - Principle 13: Ensure proper data and model disposal @@ -576,7 +583,7 @@ question of whether the use of copyrighted works to train AI models constitutes infringement, potentially exposing developers to legal claims. On the other hand, the majority of the industry grapples with the ownership of AI-generated works and the use of unlicensed content in training data. This legal ambiguity affects all -stakeholders—developers, content creators, and copyright owners alike. +stakeholders including developers, content creators, and copyright owners alike. #### Lawsuits Related to AI & Copyright Recent lawsuits (writing is April 2024) highlight the urgency of these issues. For instance, a class @@ -623,7 +630,7 @@ Note that AI vendors have started to take responsibility for copyright issues of Read more at [The Verge on Microsoft indemnification](https://www.theverge.com/2023/9/7/23863349/microsoft-ai-assume-responsibility-copyright-lawsuit) and [Direction Microsoft on the requirements of the indemnification](https://www.directionsonmicrosoft.com/blog/why-microsofts-copilot-copyright-commitment-may-not-mean-much-for-customers-yet/). #### Do generative AI models really copy existing work? -Do generative AI models really lookup existing work that may be copyrighted? In essence: no. A Generative AI model does not have sufficient capacity to store all the examples of code or pictures that were in its training set. Instead, during training it extracts patterns about how things work in the data that it sees, and then later, based on those patterns, it generates new content. Parts of this content may show remnants of existing work, but that is more of a coincidence. In essence, a model doesn't recall exact blocks of code, but uses its 'understanding' of coding to create new code. Just like with human beings, this understanding may result in reproducing parts of something you have seen before, but not per se because this was from exact memory. Having said that, this remains a difficult discussion that we also see in the music industry: did a musician come up with a chord sequence because she learned from many songs that this type of sequence works and then coincidentally created something that already existed, or did she copy it exactly from that existing song? +Do generative AI models really lookup existing work that may be copyrighted? In essence: no. A Generative AI model does not have sufficient capacity to store all the examples of code or pictures that were in its training set. Instead, during training, it extracts patterns about how things work in the data that it sees, and then later, based on those patterns, it generates new content. Parts of this content may show remnants of existing work, but that is more of a coincidence. In essence, a model doesn't recall exact blocks of code, but uses its 'understanding' of coding to create new code. Just like with human beings, this understanding may result in reproducing parts of something you have seen before, but not per se because this was from exact memory. Having said that, this remains a difficult discussion that we also see in the music industry: did a musician come up with a chord sequence because she learned from many songs that this type of sequence works and then coincidentally created something that already existed, or did she copy it exactly from that existing song? #### Mitigating Risk Organizations have several key strategies to mitigate the risk of copyright @@ -658,7 +665,7 @@ system will help check against potential infringements by the AI system. quickly and effectively to any potential infringement claims. 10. Additional mitigating factors to consider include seeking licenses and/or warranties from AI suppliers regarding the organization’s intended use, as well as all future uses by the AI system. With the -help of legal counsel the organization should also consider other contractually +help of a legal counsel, the organization should also consider other contractually binding obligations on suppliers to cover any potential claims of infringement. From 90cfc012b73422fb70ae18bdca4d4f934dddeb31 Mon Sep 17 00:00:00 2001 From: charliepaks Date: Tue, 1 Jul 2025 10:42:29 +0100 Subject: [PATCH 2/8] addressed addtitional comments --- content/ai_exchange/content/docs/_index.md | 2 +- .../ai_exchange/content/docs/ai_security_overview.md | 12 ++++++------ 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/content/ai_exchange/content/docs/_index.md b/content/ai_exchange/content/docs/_index.md index f548315a..302debaa 100644 --- a/content/ai_exchange/content/docs/_index.md +++ b/content/ai_exchange/content/docs/_index.md @@ -3,7 +3,7 @@ title: Content --- {{< cards >}} - {{< small-card link="/docs/ai_security_overview/" title="0.AI Security Overview">}} + {{< small-card link="/docs/ai_security_overview/" title="0. AI Security Overview">}} {{< small-card link="/docs/1_general_controls/" title="1. General controls">}} {{< small-card link="/docs/2_threats_through_use/" title="2. Threats through use">}} {{< small-card link="/docs/3_development_time_threats/" title="3. Development-time threats">}} diff --git a/content/ai_exchange/content/docs/ai_security_overview.md b/content/ai_exchange/content/docs/ai_security_overview.md index ce72e18a..dc7c4e27 100644 --- a/content/ai_exchange/content/docs/ai_security_overview.md +++ b/content/ai_exchange/content/docs/ai_security_overview.md @@ -175,7 +175,7 @@ The AI security matrix below (click to enlarge) shows all threats and risks, ord The below diagram puts the controls in the AI Exchange into groups and places these groups in the right lifecycle with the corresponding threats. [![](/images/threatscontrols.png)](/images/threatscontrols.png) The groups of controls form a summary of how to address AI security (controls are in capitals): -1. **AI Governance**: implement governance processes for AI risk, and include them in your information security and software lifecycle processes: +1. **AI Governance**: integrate AI comprehensively into your information security and software development lifecycle processes, not just by addressing AI risks, but by embedding AI considerations across the entire lifecycle: >( [AIPROGRAM](/goto/aiprogram/ ), [SECPROGRAM](/goto/secprogram/), [DEVPROGRAM](/goto/devprogram/), [SECDEVPROGRAM](/goto/secdevprogram/), [CHECKCOMPLIANCE](/goto/checkcompliance/), [SECEDUCATE](/goto/seceducate/)) 2. Apply conventional **technical IT security controls** in a risk-based manner, since an AI system is an IT system: - 2a Apply **standard** conventional IT security controls (e.g. 15408, ASVS, OpenCRE, ISO 27001 Annex A, NIST SP800-53) to the complete AI system and don't forget the new AI-specific assets : @@ -276,9 +276,9 @@ General risk management for AI systems is typically driven by AI governance - se Organizations often adopt a Risk Management framework, commonly based on ISO 31000 or similar standards such as ISO 23894. These frameworks guide the process of managing risks through four key steps as outlined below: 1. **Identifying Risks**: Recognizing potential risks that could impact the organization. See “Threat through use” section to identify potential risks. -2. **Evaluating Risks by Estimating Likelihood and Impact**: To determine the severity of a risk, it is necessary to assess the probability of the risk occurring and evaluating the potential consequences should the risk materialize. Combining likelihood and impact to gauge the risk's overall severity. This is typically presented in the form of a heatmap. This is discuused in more detail in the sections that follow. +2. **Evaluating Risks by Estimating Likelihood and Impact**: To determine the severity of a risk, it is necessary to assess the probability of the risk occurring and evaluating the potential consequences should the risk materialize. Combining likelihood and impact to gauge the risk's overall severity. This is typically presented in the form of a heatmap. This is discussed in more detail in the sections that follow. 3. **Deciding What to Do (Risk Treatment)**: Choosing an appropriate strategy to address the risk. These strategies include: Risk Mitigation, Transfer, Avoidance, or Acceptance. See below for further details. -4. **Risk Communication and Monitoring**: Regularly sharing risk information with stakeholders to ensure awareness and continuous support for risk management activities. Ensuring effective Risk Treatments are applied. This requires a Risk Register, a comprehensive list of risks and their attributes (e.g. severity, treatment plan, ownership, status, etc). This is discuused in more detail in the sections that follow. +4. **Risk Communication and Monitoring**: Regularly sharing risk information with stakeholders to ensure awareness and continuous support for risk management activities. Ensuring effective Risk Treatments are applied. This requires a Risk Register, a comprehensive list of risks and their attributes (e.g. severity, treatment plan, ownership, status, etc). This is discussed in more detail in the sections that follow. Let's go through the risk management steps one by one. @@ -299,7 +299,7 @@ Discovering potential risks that could impact the organization requires technica - Prevent [prompt injection](/goto/directpromptinjection/) (mostly done by the model supplier). When untrusted input goes directly into a model, and there's a possibility that the model's output could be harmful (for example, by offending, providing dangerous information, or spreading misinformation, or output that triggers harmful functions (Agentic AI) )- it's a significant concern. This is particularly the case if model input comes from end-users and output goes straight to them, or can trigger functions. - Prevent [indirect prompt injection](/goto/indirectpromptinjection/), in case untrusted data is a part of the prompt e.g. you retrieve somebody's resume and include it in a prompt. - Sometimes model training and running the model is deferred to a supplier. For generative AI, training is mostly performed by an external supplier typically costs millions of dollars. Finetuning of generative AI is also not often performed by organizations given the cost of compute and the complexity involved. Some GenAI models can be obtained and run on your own infrastructure. The reasons for this could be lower cost (if is is an open source model), and the fact that sensitive input information does not have to be sent externally. A reason to use an externally hosted GenAI model can be the quality of the model. + Sometimes model training and running the model is deferred to a supplier. For generative AI, training is mostly performed by an external supplier because it is expensive and usually costs millions of dollars. Finetuning of generative AI is also not often performed by organizations given the cost of compute and the complexity involved. Some GenAI models can be obtained and run on your own infrastructure. The reasons for this could be lower cost (if is is an open source model), and the fact that sensitive input information does not have to be sent externally. A reason to use an externally hosted GenAI model can be the quality of the model. Who trains/finetunes the model? - The supplier: you need to avoid [obtaining a poisoned model](/goto/transferlearningattack/) through proper supply chain management (by selecting a trustworthy supplier and verifying the authenticity of the model). This involves ensuring that the supplier prevents model poisoning during development, including data poisoning, and uses uncompromised data. If the risk of data poisoning remains unacceptable, implementing post-training countermeasures can be a viable option. See [POISONROBUSTMODEL](/goto/poisonrobustmodel/). @@ -326,7 +326,7 @@ Discovering potential risks that could impact the organization requires technica - [training data leaking from your engineering environment](/goto/devdataleak/). - [membership inference]((/goto/modelinversionandmembership/)) - but only in the event where something or someone that was part of the training data constitutes sensitive information. For example, when the training set consists of criminals and their history to predict criminal careers. Membership of that set gives away the person is a convicted or alleged criminal. - If you use RAG: apply the above measures to your repository data because it feeds into the model and can therefore be part of the output as well. + If you use RAG: If you use RAG: apply the above to your repository data, as if it was part of the training set: as the repository data feeds into the model and can therefore be part of the output as well. If you don't train/finetune the model, then the supplier of the model is responsible for unwanted content in the training data. This can be poisoned data (see above), data that is confidential, or data that is copyrighted. It is important to check licenses, warranties and contracts for these matters, or accept the risk based on your circumstances. @@ -343,7 +343,7 @@ Discovering potential risks that could impact the organization requires technica **Leaking input data** Is your input data sensitive? - - Prevent [leaking input data](/goto/leakinput/). If the model is run by a supplier, proper care needs to be taken to ensure that this data is minimized and transferred or stored securely. Review the security measures provided by the supplier, including any options to disable logging or monitoring on their end. If you're using a RAG system, remember that the data you retrieve and inject into the prompt also counts as input data. This often includes sensitive company information or personal data. + - Prevent [leaking input data](/goto/leakinput/). Especially if the model is run by a supplier, proper care needs to be taken to ensure that this data is minimized and transferred or stored securely. Review the security measures provided by the supplier, including any options to disable logging or monitoring on their end. If you're using a RAG system, remember that the data you retrieve and inject into the prompt also counts as input data. This often includes sensitive company information or personal data. **Misc.** From c157c65288dd9ef4a7c0da0f16d57e52bf7a9b73 Mon Sep 17 00:00:00 2001 From: charliepaks Date: Thu, 3 Jul 2025 10:53:50 +0100 Subject: [PATCH 3/8] update 1_general_controls.md --- .../content/docs/1_general_controls.md | 28 +++++++++---------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/content/ai_exchange/content/docs/1_general_controls.md b/content/ai_exchange/content/docs/1_general_controls.md index 8b2e0e66..b3f62111 100644 --- a/content/ai_exchange/content/docs/1_general_controls.md +++ b/content/ai_exchange/content/docs/1_general_controls.md @@ -61,7 +61,7 @@ References: Security program: Make sure the organization has a security program (also referred to as _information security management system_) and that it includes the whole AI lifecycle and AI specific aspects. -Purpose: ensures adequate mitigation of AI security risks through information security management, as the security program takes responsibility for the AI-specific threats and corresponding. For more details on using this document in risk analysis, see the [risk analysis section](/goto/riskanalysis/). +Purpose: ensures adequate mitigation of AI security risks through information security management, as the security program takes responsibility for the AI-specific threats and corresponding risks. For more details on using this document in risk analysis, see the [risk analysis section](/goto/riskanalysis/). Make sure to include AI-specific assets and the threats to them. The threats are covered in this resource and the assets are: - training data @@ -143,21 +143,21 @@ The best way to do this is to build on your existing secure software development Particularities for AI in secure software development: - AI teams (e.g. data scientists) need to be taken into scope of your secure development activities, for them to address both conventional security threats and AI-specific threats, applying both conventional security controls and AI-specific ones. Typically, technical teams depend on the AI engineers when it comes to the AI-specific controls as they mostly require deep AI expertise. For example: if training data is confidential and collected in a distributed way, then a federated learning approach may be considered. -- AI security assets, threats and controls (as covered in this document) need to be considered, effecting requirements, policies, coding guidelines, training, tooling, testing practices and more. Usually, this is done by adding these elements in the organizations Information Security Management System, as described in [SECPROGRAM](/goto/segprogram/), and align secure software development to that - just like it has been aligned on the conventional assets, threats and controls. +- AI security assets, threats and controls (as covered in this document) need to be considered, effecting requirements, policies, coding guidelines, training, tooling, testing practices and more. Usually, this is done by adding these elements in the organization's Information Security Management System, as described in [SECPROGRAM](/goto/segprogram/), and align secure software development to that - just like it has been aligned on the conventional assets, threats and controls. - Apart from software components, the supply chain for AI can also include data and models which may have been poisoned, which is why data provenance and model management are central in [AI supply chain management](/goto/supplychainmanage/). -- In AI, software components can also run in the development environment instead of in production, for example to train models, which increases the attack surface e.g. malicious development components attacking training data. +- In AI, software components can also run in the development environment instead of in production, for example, to train models, which increases the attack surface e.g. malicious development components attacking training data. AI-specific elements in the development environment (sometimes referred to as MLops): - Supply chain management of data and models, including provenance of the internal processes (for data this effectively means data governance) -- In addition supply chain management: integrity checks on elements that can have been poisoned (data, models), using an internal or external signed registry for example +- In addition to supply chain management: integrity checks on elements that can be poisoned (data, models), using an internal or external signed registry for example - Static code analysis - Running big data/AI technology-specific static analysis rules (e.g the typical mistake of creating a new dataframe in Python without assigning it to a new one) - Running maintainability analysis on code, as data and model engineering code is typically hindered by code quality issues - - Evaluating code for the percentage of code for automated testing. Industry average is 43% (SIG benchmark report 2023). An often cited recommendation is 80%. Research shows that automated testing in AI engineering is often neglected (SIG benchmark report 2023), as the performance of the AI model is mistakenly regarded as the ground truth of correctness. + - Evaluating the proportion of code covered by automated tests is essential for understanding software quality. Industry average is 43% (SIG benchmark report 2023). An often cited recommendation is 80%. Research shows that automated testing in AI engineering is often neglected (SIG benchmark report 2023), as the performance of the AI model is mistakenly regarded as the ground truth of correctness. - Training (if required) - Automated training of the model when necessary - Automated detection of training set issues (standard data quality control plus checking for potential poisoning using pattern recognition or anomaly detection) - - Any pre-training controls to mitigate poisoning risks, especially if the deployment process is segregated from the rest of the engineering environment in which poisoning an have taken place, e.g. fine pruning (reducing the size of the model and doing extra training with a ground truth training set) + - Any pre-training controls to mitigate poisoning risks, especially if the deployment process is segregated from the rest of the engineering environment in which poisoning may have taken place, e.g. fine pruning (reducing the size of the model and doing extra training with a ground truth training set) - Automated data collection and transformation to prepare the train set, when required - Version management/traceability of the combination of code, configuration, training data and models, for troubleshooting and rollback - Running AI-specific dynamic tests before deployment: @@ -225,7 +225,7 @@ Useful standards include: > Permalink: https://owaspai.org/goto/checkcompliance/ Check compliance: Make sure that AI-relevant laws and regulations are taken into account in compliance management (including security aspects). If personal data is involved and/or AI is applied to make decisions about individuals, then privacy laws and regulations are also in scope. See the [OWASP AI Guide](https://owasp.org/www-project-ai-security-and-privacy-guide/) for privacy aspects of AI. -Compliance as a goal can be a powerful driver for organizations to grow their readiness for AI. While doing this it is important to keep in mind that legislation has a scope that does not necessarily include all the relevant risks for the organization. Many rules are about the potential harm to individuals and society, and don’t cover the impact on business processes per se. For example: the European AI act does not include risks for protecting company secrets. In other words: be mindful of blind spots when using laws and regulations as your guide. +Compliance as a goal can be a powerful driver for organizations to grow their readiness for AI. While doing this, it is important to keep in mind that legislation has a scope that does not necessarily include all the relevant risks for the organization. Many rules are about the potential harm to individuals and society, and don’t cover the impact on business processes per se. For example: the European AI act does not include risks for protecting company secrets. In other words: be mindful of blind spots when using laws and regulations as your guide. Global Jurisdictional considerations (as of end of 2023): @@ -242,7 +242,7 @@ General Legal Considerations on AI/Security: - Data Breaches: any 3rd party supplier must answer as to how they store their data and security frameworks around it, which may include personal data or IP of end-users Non-Security Compliance Considerations: -- Ethics: Deep fake weaponization and how system addresses and deals with it, protects against it and mitigates it +- Ethics: Deep fake weaponization and how the system addresses and deals with it, protects against it and mitigates it - Human Control: any and all AI systems should be deployed with appropriate level of human control and oversight, based on ascertained risks to individuals. AI systems should be designed and utilized with the concept that the use of AI respects dignity and rights of individuals; “Keep the human in the loop” concept. See [Oversight](/goto/oversight/). - Discrimination: a process must be included to review datasets to avoid and prevent any bias. See [Unwanted bias testing](/goto/unwantedbiastesting/). - Transparency: ensure transparency in the AI system deployment, usage and proactive compliance with regulatory requirements; “Trust by Design” @@ -282,7 +282,7 @@ Data minimize: remove data fields or records (e.g. from a training set) that are Purpose: minimize the impact of data leakage or manipulation -A typical opportunity to remove unnecessary data in machine learning is to clean up data that has just been for experimental use. +A typical opportunity to remove unnecessary data in machine learning is to clean up data that is used solely for experimental purposes. A method to determine which fields or records can be removed is to statistically analyze which data elements do not play a role in model performance. @@ -403,7 +403,7 @@ Minimize access to technical details that could help attackers. Purpose: reduce the information available to attackers, which can assist them in selecting and tailoring their attacks, thereby lowering the probability of a successful attack. -Miminizing and protecting technical details can be achieved by incorporating such details as an asset into information security management. This will ensure proper asset management, data classification, awareness education, policy, and inclusion in risk analysis. +Minimizing and protecting technical details can be achieved by incorporating such details as an asset into information security management. This will ensure proper asset management, data classification, awareness education, policy, and inclusion in risk analysis. Note: this control needs to be weighed against the [AITRANSPARENCY](#aitransparency) control that requires to be more open about technical aspects of the model. The key is to minimize information that can help attackers while being transparent. @@ -411,7 +411,7 @@ For example: - Consider this risk when publishing technical articles on the AI system - When choosing a model type or model implementation, take into account that there is an advantage of having technology with which attackers are less familiar - - Minimize model output regarding technical details + - Minimize technical details in model output Useful standards include: @@ -442,7 +442,7 @@ Successfully mitigating unwanted model behaviour has its own threats: Example: The typical use of plug-ins in Large Language Models (GenAI) presents specific risks concerning the protection and privileges of these plug-ins. This is because they enable Large Language Models (LLMs, a GenAI) to perform actions beyond their normal interactions with users. ([OWASP for LLM 07](https://llmtop10.com/llm07/)) -Example: LLMs (GenAI), just like most AI models, induce their results based on training data, meaning that they can make up things that are false. In addition, the training data can contain false or outdated information. At the same time, LLMs (GenAI) can come across very confident about their output. These aspects make overreliance of LLM (GenAI) ([OWASP for LLM 09](https://llmtop10.com/llm09/)) a real risk, plus excessive agency as a result of that ([OWASP for LLM 08](https://llmtop10.com/llm08/)). Note that all AI models in principle can suffer from overreliance - not just Large Language Models. +Example: LLMs (GenAI), just like most AI models, induce their results based on training data, meaning that they can make up things that are false because the training data can contain false or outdated information. At the same time, LLMs (GenAI) can come across very confident about their output. These aspects make overreliance of LLM (GenAI) ([OWASP for LLM 09](https://llmtop10.com/llm09/)) a real risk, plus excessive agency as a result of that ([OWASP for LLM 08](https://llmtop10.com/llm08/)). Note that all AI models in principle can suffer from overreliance - not just Large Language Models. **Controls to limit the effects of unwanted model behaviour:** @@ -502,7 +502,7 @@ See the [DISCRETE](#discrete) control for the balance between being transparent Useful standards include: - - ISO/IEC 42001 B.7.2 describes data management to support transparency. Gap: covers this control minimally, as it only covers the data mnanagement part. + - ISO/IEC 42001 B.7.2 describes data management to support transparency. Gap: covers this control minimally, as it only covers the data management part. - Not covered further in ISO/IEC standards. #### #CONTINUOUSVALIDATION @@ -529,4 +529,4 @@ Explainability: Explaining how individual model decisions are made, a field refe > Category: runtime data science control > Permalink: https://owaspai.org/goto/unwantedbiastesting/ -Unwanted bias testing: by doing test runs of the model to measure unwanted bias, unwanted behaviour caused by an attack can be detected. The details of bias detection fall outside the scope of this document as it is not a security concern - other than that an attack on model behaviour can cause bias. +Unwanted bias testing: By doing test runs of the model to measure unwanted bias, unwanted behaviour caused by an attack can be detected. The details of bias detection fall outside the scope of this document as it is not a security concern - other than that, an attack on model behaviour can cause bias. From b0c53d21c7853d4d1dd3089574f2c6a3507c36c2 Mon Sep 17 00:00:00 2001 From: charliepaks Date: Thu, 3 Jul 2025 11:16:49 +0100 Subject: [PATCH 4/8] update charter.md --- content/ai_exchange/content/charter.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/ai_exchange/content/charter.md b/content/ai_exchange/content/charter.md index 8b1b9975..6a1bb680 100644 --- a/content/ai_exchange/content/charter.md +++ b/content/ai_exchange/content/charter.md @@ -4,7 +4,7 @@ title: 'AI Exchange Charter' ## Purpose >Comprehensive guidance and alignment on how to protect AI against security threats - by professionals, for professionals. -The goal of the OWASP AI Exchange is to protect society from AI security issues by independently harnessing the collective wisdom of global experts across various disciplines. This initiative focuses on advancing AI security understanding, supporting the development of global AI security guidelines, standards and regulations, and simplifying the AI security domain for professionals and organizations. Its goal is to provide a comprehensive overview of AI threats, risks, mitigations, and controls. This overview needs to align and feed into global standardization initiatives such as the EU AI Act, ISO/IEC 27090 (AI Security), the OWASP ML Top 10, the OWASP LLM Top 10, and OpenCRE. This alignment, achieved through open source Github collaboration and liaisons with working groups. Alignment is crucial to prevent confusion and ignorance, leading to harm from AI security incidents. The position of the Exchange is altruistic: NOT to set a standard, but to drive standards, and still be the top bookmark for people dealing with AI security. +The goal of the OWASP AI Exchange is to protect society from AI security issues by independently harnessing the collective wisdom of global experts across various disciplines. This initiative focuses on advancing AI security understanding, supporting the development of global AI security guidelines, standards and regulations, and simplifying the AI security domain for professionals and organizations. Its goal is to provide a comprehensive overview of AI threats, risks, mitigations, and controls. This overview needs to align and feed into global standardization initiatives such as the EU AI Act, ISO/IEC 27090 (AI Security), the OWASP ML Top 10, the OWASP LLM Top 10, and OpenCRE. This alignment is achieved through open source Github collaboration and liaisons with working groups. Alignment is crucial to prevent confusion and ignorance that lead to harm that stems from AI security incidents. The position of the Exchange is altruistic: NOT to set a standard, but to drive standards, and still be the top bookmark for people dealing with AI security. ## Target Audience This charter primarily addresses the needs of cybersecurity experts, privacy/regulatory/ legal professionals, AI leaders, developers, and data scientists. It offers accessible guidance and resources to these groups, enabling them to apply, build and maintain secure AI systems effectively. @@ -14,7 +14,7 @@ Our mission is to establish the OWASP AI Exchange as the place to go for profess ## Scope & Responsibilities - **AI-specific**: Focus on the topics that are specific to AI, and cover how generic topics (e.g. risk analysis) can be adapted for AI and discuss AI attention points for them -- **The security OF AI**: that's what the Exchange is about, so it covers threats TO AI systems. Some of those threats have effect on the behaviour/availability of the AI system which indirectly creates threats BY AI. +- **The security OF AI**: that's what the Exchange is about, so it covers threats to AI systems. Some of those threats have effect on the behaviour/availability of the AI system which indirectly creates threats BY AI. - **Explain and refer**: the Exchange covers a topic by a concise explanation that transcends the material by making it clear, sensible, mentioning important points of consideration, and referring the reader to further reading. Think of the explanation of 'AI security for professional dummies'. - Develop a **comprehensive framework** for AI threats, risks, and controls (mitigations) - establish a common taxonomy and glossary for AI security. - Create insight into **relevant laws and regulations**. From d39ec144f5f38363b3f891ade977f8776884f952 Mon Sep 17 00:00:00 2001 From: charliepaks Date: Thu, 3 Jul 2025 13:40:22 +0100 Subject: [PATCH 5/8] update 6_privacy.md --- content/ai_exchange/content/docs/6_privacy.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/content/ai_exchange/content/docs/6_privacy.md b/content/ai_exchange/content/docs/6_privacy.md index 932a57b9..9461e402 100644 --- a/content/ai_exchange/content/docs/6_privacy.md +++ b/content/ai_exchange/content/docs/6_privacy.md @@ -5,9 +5,9 @@ weight: 7 > Category: discussion > Permalink: https://owaspai.org/goto/aiprivacy/ -Just like any system that processes data, AI systems can have privacy risks. There are some particualar privacy aspects to AI: +Just like any system that processes data, AI systems can have privacy risks. There are specific privacy concerns associated with AI: - AI systems are data-intensive and typically present additional risks regarding data collection and retention. Personal data may be collected from various sources, each subject to different levels of **sensitivity and regulatory constraints**. Legislation often requires a **legal basis and/or consent** for the collection and use of personal data, and specifies **rights to individuals** to correct, request, and remove their own data. -- **Protecting training data** is a challenge, especially because it typically needs to be retained for long periods - as many models need to be retrained. Often, the actual identities of people involved are irrelevant for the model, but privacy risks still remain even if identity data is removed because it might be possible to deduce individual identities from the remaining data. This is where differential privacy becomes crucial: by altering the data to make it sufficiently unrecognizable, it ensures individual privacy while still allowing for valuable insights to be derived from the data. Alteration can be done by for example adding noise or aggregating. +- **Protecting training data** is a challenge, especially because it typically needs to be retained for long periods - as many models need to be retrained. Often, the actual identities of people involved are irrelevant for the model, but privacy risks still remain even if identity data is removed because it might be possible to deduce individual identities from the remaining data. This is where differential privacy becomes crucial: by altering the data to make it sufficiently unrecognizable, it ensures individual privacy while still allowing for valuable insights to be derived from the data. Alteration can be achieved, for example, by adding noise or using aggregation techniques. - An additional complication in the protection of training data is that the **training data is accessible in the engineering environment**, which therefore needs more protection than it usually does - since conventional systems normally don't have personal data available to technical teams. - The nature of machine learning allows for certain **unique strategies** to improve privacy, such as federated learning: splitting up the training set in different separated systems - typically aligning with separated data collection. - AI systems **make decisions** and if these decisions are about people they may be discriminating regarding certain protected attributes (e.g. gender, race), plus the decisions may result in actions that invade privacy, which may be an ethical or legal concern. Furthermore, legislation may prohibit some types of decisions and sets rules regarding transparency about how these decisions are made, and about how individuals have the right to object. @@ -20,7 +20,7 @@ AI Privacy can be divided into two parts: - Confidentiality and integrity protection of personal data in train/test data, model input or output - which consists of: - 'Conventional' security of personal data in transit and in rest - Protecting against model attacks that try to retrieve personal data (e.g. model inversion) - - personal data minimization / differential privacy, including minimized retention + - Personal data minimization / differential privacy, including minimized retention - Integrity protection of the model behaviour if that behaviour can hurt privacy of individuals. This happens for example when individuals are unlawfully discriminated or when the model output leads to actions that invade privacy (e.g. undergoing a fraud investigation). 2. Threats and controls that are not about security, but about further rights of the individual, as covered by privacy regulations such as the GDPR, including use limitation, consent, fairness, transparency, data accuracy, right of correction/objection/erasure/request. @@ -32,8 +32,8 @@ This section covers how privacy principles apply to AI systems: Essentially, you should not simply use data collected for one purpose (e.g. safety or security) as a training dataset to train your model for other purposes (e.g. profiling, personalized marketing, etc.) For example, if you collect phone numbers and other identifiers as part of your MFA flow (to improve security ), that doesn't mean you can also use it for user targeting and other unrelated purposes. Similarly, you may need to collect sensitive data under KYC requirements, but such data should not be used for ML models used for business analytics without proper controls. -Some privacy laws require a lawful basis (or bases if for more than one purpose) for processing personal data (See GDPR's Art 6 and 9). -Here is a link with certain restrictions on the purpose of an AI application, like for example the [prohibited practices in the European AI Act](https://artificialintelligenceact.eu/article/5) such as using machine learning for individual criminal profiling. Some practices are regarded as too riskful when it comes to potential harm and unfairness towards individuals and society. +Some privacy laws require a lawful basis (or bases if used for more than one purpose) for processing personal data (See GDPR's Art 6 and 9). +Here is a link with certain restrictions on the purpose of an AI application, like for example the [prohibited practices in the European AI Act](https://artificialintelligenceact.eu/article/5) such as using machine learning for individual criminal profiling. Some practices are regarded as too risky when it comes to potential harm and unfairness towards individuals and society. Note that a use case may not even involve personal data, but can still be potentially harmful or unfair to indiduals. For example: an algorithm that decides who may join the army, based on the amount of weight a person can lift and how fast the person can run. This data can not be used to reidentify individuals (with some exceptions), but still the use case may be unrightfully unfair towards gender (if the algorithm for example is based on an unfair training set). @@ -50,7 +50,7 @@ New techniques that enable use limitation include: Fairness means handling personal data in a way individuals expect and not using it in ways that lead to unjustified adverse effects. The algorithm should not behave in a discriminating way. (See also [this article](https://iapp.org/news/a/what-is-the-role-of-privacy-professionals-in-preventing-discrimination-and-ensuring-equal-treatment/)). Furthermore: accuracy issues of a model becomes a privacy problem if the model output leads to actions that invade privacy (e.g. undergoing fraud investigation). Accuracy issues can be caused by a complex problem, insufficient data, mistakes in data and model engineering, and manipulation by attackers. The latter example shows that there can be a relation between model security and privacy. -GDPR's Article 5 refers to "fair processing" and EDPS' [guideline](https://edpb.europa.eu/sites/default/files/files/file1/edpb_guidelines_201904_dataprotection_by_design_and_by_default_v2.0_en.pdf) defines fairness as the prevention of "unjustifiably detrimental, unlawfully discriminatory, unexpected or misleading" processing of personal data. GDPR does not specify how fairness can be measured, but the EDPS recommends the right to information (transparency), the right to intervene (access, erasure, data portability, rectify), and the right to limit the processing (right not to be subject to automated decision-making and non-discrimination) as measures and safeguard to implement the principle of fairness. +GDPR's Article 5 refers to "fair processing" and EDPS' [guideline](https://edpb.europa.eu/sites/default/files/files/file1/edpb_guidelines_201904_dataprotection_by_design_and_by_default_v2.0_en.pdf) defines fairness as the prevention of "unjustifiably detrimental, unlawfully discriminatory, unexpected or misleading" processing of personal data. GDPR does not specify how fairness can be measured, but the EDPS recommends the right to information (transparency), the right to intervene (access, erasure, data portability, rectify), and the right to limit the processing (right not to be subject to automated decision-making and non-discrimination) as measures and safeguards to implement the principle of fairness. In the [literature](http://fairware.cs.umass.edu/papers/Verma.pdf), there are different fairness metrics that you can use. These range from group fairness, false positive error rate, unawareness, and counterfactual fairness. There is no industry standard yet on which metric to use, but you should assess fairness especially if your algorithm is making significant decisions about the individuals (e.g. banning access to the platform, financial implications, denial of services/opportunities, etc.). There are also efforts to test algorithms using different metrics. For example, NIST's [FRVT project](https://pages.nist.gov/frvt/html/frvt11.html) tests different face recognition algorithms on fairness using different metrics. @@ -134,7 +134,7 @@ As said, many of the discussion topics on AI are about human rights, social just ## Before you start: Privacy restrictions on what you can do with AI -The GDPR does not restrict the applications of AI explicitly but does provide safeguards that may limit what you can do, in particular regarding Lawfulness and limitations on purposes of collection, processing, and storage - as mentioned above. For more information on lawful grounds, see [article 6](https://gdpr.eu/article-6-how-to-process-personal-data-legally/) +The GDPR does not restrict the applications of AI explicitly but does provide safeguards that may limit what you can do, in particular regarding lawfulness and limitations on purposes of collection, processing, and storage - as mentioned above. For more information on lawful grounds, see [article 6](https://gdpr.eu/article-6-how-to-process-personal-data-legally/) The [US Federal Trade Committee](https://www.ftc.gov/business-guidance/blog/2023/02/keep-your-ai-claims-check) provides some good (global) guidance in communicating carefully about your AI, including not to overpromise. From 46a886d97db56762acc58f96b66f91c053f2d748 Mon Sep 17 00:00:00 2001 From: charliepaks Date: Mon, 7 Jul 2025 14:05:55 +0100 Subject: [PATCH 6/8] update 4_runtime_application_security_threats.md --- .../content/docs/4_runtime_application_security_threats.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/content/ai_exchange/content/docs/4_runtime_application_security_threats.md b/content/ai_exchange/content/docs/4_runtime_application_security_threats.md index 61fcc7ad..5d6cfc80 100644 --- a/content/ai_exchange/content/docs/4_runtime_application_security_threats.md +++ b/content/ai_exchange/content/docs/4_runtime_application_security_threats.md @@ -21,10 +21,10 @@ Note: some controls in this document are application security controls that are Useful standards include: - See [OpenCRE on technical application security controls](https://www.opencre.org/cre/636-660) - The ISO 27002 controls only partly cover technical application security controls, and on a high abstraction level - - More detailed and comprehensive control overviews can be found in for example Common criteria protection profiles (ISO/IEC 15408 with evaluation described in ISO 18045), + - More detailed and comprehensive control overviews can be found in for example, Common criteria protection profiles (ISO/IEC 15408 with evaluation described in ISO 18045), - or in [OWASP ASVS](https://owasp.org/www-project-application-security-verification-standard/) - Operational security - When models are hosted by third parties then security configuration of those services deserves special attention. Part of this configuration is [model access control](/goto/modelaccesscontrol/): an important mitigation for security risks. Cloud AI configuration options deserve scrutiny, like for example opting out when necessary of monitoring by the third party - which could increase the risk of exposing sensitive data. + When models are hosted by third parties then security configuration of those services deserves special attention. Part of this configuration is [model access control](/goto/modelaccesscontrol/): an important mitigation for security risks. Cloud AI configuration options deserve scrutiny, like for example opting out of third party monitoring when necessary - which could increase the risk of exposing sensitive data. Useful standards include: - See [OpenCRE on operational security processes](https://www.opencre.org/cre/862-452) - The ISO 27002 controls only partly cover operational security controls, and on a high abstraction level @@ -82,8 +82,7 @@ Run-time model confidentiality: see [SECDEVPROGRAM](/goto/secdevprogram/) to att A Trusted Execution Environment can be highly effective in safeguarding the runtime environment, isolating model operations from potential threats, including side-channel hardware attacks like [DeepSniffer](https://sites.cs.ucsb.edu/~sherwood/pubs/ASPLOS-20-deepsniff.pdf). By ensuring that sensitive computations occur within this secure enclave,the TEE reduces the risk of attackers gaining useful information through side-channel methods. Side-Channel Mitigation Techniques: -- Masking: Introducing random delays or noise during inference can help obscure the relationship between input data and the model’s response times, thereby -complicating timing-based side-channel attacks. See [Masking against Side-Channel Attacks: A Formal Security Proof](https://www.iacr.org/archive/eurocrypt2013/78810139/78810139.pdf) +- Masking: Introducing random delays or noise during inference can help obscure the relationship between input data and the model’s response times, thereby complicating timing-based side-channel attacks. See [Masking against Side-Channel Attacks: A Formal Security Proof](https://www.iacr.org/archive/eurocrypt2013/78810139/78810139.pdf) - Shielding: Employing hardware-based shielding could help prevent electromagnetic or acoustic leakage that might be exploited for side-channel attacks. See [Electromagnetic Shielding for Side-Channel Attack Countermeasures](https://ieeexplore.ieee.org/document/8015660) From ec5590b006a2691fc9c774c43a1ffec02cbfce70 Mon Sep 17 00:00:00 2001 From: charliepaks Date: Tue, 15 Jul 2025 17:53:05 +0100 Subject: [PATCH 7/8] update 2_threats_through_use.md --- .../content/docs/2_threats_through_use.md | 95 ++++++++++--------- 1 file changed, 48 insertions(+), 47 deletions(-) diff --git a/content/ai_exchange/content/docs/2_threats_through_use.md b/content/ai_exchange/content/docs/2_threats_through_use.md index 41c2c22c..eaa4213b 100644 --- a/content/ai_exchange/content/docs/2_threats_through_use.md +++ b/content/ai_exchange/content/docs/2_threats_through_use.md @@ -13,15 +13,15 @@ Threats through use take place through normal interaction with an AI model: prov - See [General controls](/goto/generalcontrols/), especially [Limiting the effect of unwanted behaviour](/goto/limitunwanted/) and [Sensitive data limitation](/goto/dataminimize/) - The below control(s), each marked with a # and a short name in capitals -#### #MONITORUSE +#### #MONITOR USE >Category: runtime information security control for threats through use >Permalink: https://owaspai.org/goto/monitoruse/ -Monitor use: Monitor the use of the model (input, date, time, user) by registering it in logs, so it can be used to reconstruct incidents, and made it part of the existing incident detection process - extended with AI-specific methods, including: +Monitor use: Monitor the use of the model (input, date, time, user) by registering it in logs, so it can be used to reconstruct incidents, and make it part of the existing incident detection process - extended with AI-specific methods, including: - - improper functioning of the model (see [CONTINUOUSVALIDATION](/goto/continuousvalidation/) and [UNWANTEDBIASTESTING](/goto/unwantedbiastesting/)) - - suspicious patterns of model use (e.g. high frequency - see [RATELIMIT](#ratelimit) and [DETECTADVERSARIALINPUT](#detectadversarialinput)) - - suspicious inputs or series of inputs (see [DETECTODDINPUT](#detectoddinput) and [DETECTADVERSARIALINPUT](#detectadversarialinput)) + - improper functioning of the model (see [CONTINUOUS VALIDATION](/goto/continuousvalidation/) and [UNWANTED BIAS TESTING](/goto/unwantedbiastesting/)) + - suspicious patterns of model use (e.g. high frequency - see [RATE LIMIT](#ratelimit) and [DETECT ADVERSARIAL INPUT](#detectadversarialinput)) + - suspicious inputs or series of inputs (see [DETECT ODD INPUT](#detectoddinput) and [DETECT ADVERSARIAL INPUT](#detectadversarialinput)) By adding details to logs on the version of the model used and the output, troubleshooting becomes easier. @@ -31,7 +31,7 @@ Useful standards include: - ISO/IEC 42001 B.6.2.6 discusses AI system operation and monitoring. Gap: covers this control fully, but on a high abstraction level. - See [OpenCRE](https://www.opencre.org/cre/058-083). Idem -#### #RATELIMIT +#### #RATE LIMIT >Category: runtime information security control for threats through use >Permalink: https://owaspai.org/goto/ratelimit/ @@ -41,7 +41,7 @@ Purpose: severely delay attackers trying many inputs to perform attacks through Particularity: limit access not to prevent system overload (conventional rate limiting goal) but to also prevent experimentation for AI attacks. -Remaining risk: this control does not prevent attacks that use low frequency of interaction (e.g. don't rely on heavy experimentation) +Residual risk: this control does not prevent attacks that use low frequency of interaction (e.g. don't rely on heavy experimentation) References: - [Article on token bucket and leaky bucket rate limiting](https://medium.com/@apurvaagrawal_95485/token-bucket-vs-leaky-bucket-1c25b388436c) @@ -52,7 +52,7 @@ Useful standards include: - ISO 27002 has no control for this - See [OpenCRE](https://www.opencre.org/cre/630-573) -#### #MODELACCESSCONTROL +#### #MODEL ACCESS CONTROL >Category: runtime information security control for threats through use >Permalink: https://owaspai.org/goto/modelaccesscontrol/ @@ -60,14 +60,14 @@ Model access control: Securely limit allowing access to use the model to authori Purpose: prevent attackers that are not authorized to perform attacks through use. -Remaining risk: attackers may succeed in authenticating as an authorized user, or qualify as an authorized user, or bypass the access control through a vulnerability, or it is easy to become an authorized user (e.g. when the model is publicly available) +Residual risk: Attackers may succeed in authenticating as an authorized user, qualify as one, bypass access controls through a vulnerability, or easily become authorized (e.g. when the model is publicly available) -Note: this is NOT protection of a strored model. For that, see Model confidentiality in Runtime and Development at the [Periodic table](https://owaspai.org/goto/periodictable/). +Note: this is NOT protection of a stored model. For that, see Model confidentiality in Runtime and Development at the [Periodic table](https://owaspai.org/goto/periodictable/). Additional benefits of model access control are: -- Linking users to activity is Opportunity to link certain use or abuse to individuals - of course under privacy obligations -- Linking activity to a user (or using service) allows more accurate [rate limiting](/goto/ratelimit/) to user-accounts, and detection suspect series of actions - since activity can be linked to paterns of individual users +- Linking users to activity provides an opportunity to associate specific usage or abuse with individuals — provided this is done in compliance with privacy obligations +- Linking activity to a user (or using service) enables more accurate[rate limiting](/goto/ratelimit/) per user account and improves detection of suspicious activity patterns — since actions can be attributed to individual users. Useful standards include: @@ -81,19 +81,20 @@ Useful standards include: >Category: group of threats through use >Permalink: https://owaspai.org/goto/evasion/ -Evasion: an attacker fools the model by crafting input to mislead it into performing its task incorrectly. +Evasion: an attacker fools the model by crafting input to mislead it into performing its tasks incorrectly. Impact: Integrity of model behaviour is affected, leading to issues from unwanted model output (e.g. failing fraud detection, decisions leading to safety issues, reputation damage, liability). -A typical attacker goal with Evasion is to find out how to slightly change a certain input (say an image, or a text) to fool the model. The advantage of slight change is that it is harder to detect by humans or by an automated detection of unusual input, and it is typically easier to perform (e.g. slightly change an email message by adding a word so it still sends the same message, but it fools the model in for example deciding it is not a phishing message). -Such small changes (call 'perturbations') lead to a large (and false) modification of its outputs. The modified inputs are often called *adversarial examples*. +A typical attacker's goal with Evasion is to find out how to slightly change a certain input (say an image, or a text) to fool the model. The advantage of slight change is that it is harder to get detected by humans or by an automated means, and it is typically easier to perform (e.g. slightly change an email message by adding a word so it still sends the same message, but it fools the model in for example deciding it is not a phishing message). +Such small changes (called 'perturbations') lead to a large (and false) modification of its outputs. The modified inputs are often called *adversarial examples*. Evasion attacks can be categorized into physical (e.g. changing the real world to influence for example a camera image) and digital (e.g. changing a digital image). Furthermore, they can be categorized in either untargeted (any wrong output) and targeted (a specific wrong output). Note that Evasion of a binary classifier (i.e. yes/no) belongs to both categories. Example 1: slightly changing traffic signs so that self-driving cars may be fooled. ![](/images/inputphysical.png) -Example 2: through a special search process it is determined how a digital input image can be changed undetectably leading to a completely different classification. +Example 2: through a specialized search process, a digital input image can be subtly altered — without detection — resulting in a completely different classification. + ![](/images/inputdigital.png) Example 3: crafting an e-mail text by carefully choosing words to avoid triggering a spam detection algorithm. @@ -106,14 +107,14 @@ See [MITRE ATLAS - Evade ML model](https://atlas.mitre.org/techniques/AML.T0015) **Controls for evasion:** -An Evasion attack typically consists of first searching for the inputs that mislead the model, and then applying it. That initial search can be very intensive, as it requires trying many variations of input. Therefore, limiting access to the model with for example Rate limiting mitigates the risk, but still leaves the possibility of using a so-called transfer attack (see [Closed box evasion](/goto/closedboxevasion/) to search for the inputs in another, similar, model. +An Evasion attack typically consists of first searching for the inputs that mislead the model, and then applying it. That initial search can be very intensive, as it requires trying many variations of input. Therefore, limiting access to the model with for example Rate limiting mitigates the risk, but still leaves the possibility of using a so-called transfer attack (see [Closed box evasion](/goto/closedboxevasion/) to search for the inputs in another similar model. - See [General controls](/goto/generalcontrols/), especially [Limiting the effect of unwanted behaviour](/goto/limitunwanted/) - See [controls for threats through use](/goto/threatsuse/) - The below control(s), each marked with a # and a short name in capitals -#### #DETECTODDINPUT ->Category: runtime datasciuence control for threats through use +#### #DETECT ODD INPUT +>Category: runtime datascience control for threats through use >Permalink: https://owaspai.org/goto/detectoddinput/ Detect odd input: implement tools to detect whether input is odd: significantly different from the training data or even invalid - also called input validation - without knowledge on what malicious input looks like. @@ -145,10 +146,10 @@ An example of how to implement this is _activation Analysis_: Examining the acti **Open Set Recognition (OSR)** - a way to perform Anomaly Detection): Classifying known classes while identifying and rejecting unknown classes during testing. OSR is a way to perform anomaly detection, as it involves recognizing when an instance does not belong to any of the learned categories. This recognition makes use of the decision boundaries of the model. -Example: During operation, the system identifies various known objects such as cars, trucks, pedestrians, and bicycles. However, when it encounters an unrecognized object, such as a fallen tree, it must classify it as "unknown. Open set recognition is critical because the system must be able to recognize that this object doesn't fit into any of its known categories. +Example: During operation, the system identifies various known objects such as cars, trucks, pedestrians, and bicycles. However, when it encounters an unrecognized object, such as a fallen tree, it must classify it as "unknown". Open set recognition is critical because the system must be able to recognize that this object doesn't fit into any of its known categories. **Novelty Detection (ND)** - OOD input that is recognized as not malicious: -OOD input data can sometimes be recognized as not malicious and relevant or of interest. The system can decide how to respond: perhaps trigger another use case, or log is specifically, or let the model process the input if the expectation is that it can generalize to produce a sufficiently accurate result. +OOD input data can sometimes be recognized as not malicious and relevant or of interest. The system can decide how to respond: perhaps trigger another use case, or log it specifically, or let the model process the input if the expectation is that it can generalize to produce a sufficiently accurate result. Example: The system has been trained on various car models. However, it has never seen a newly released model. When it encounters a new model on the road, novelty detection recognizes it as a new car type it hasn't seen, but understands it's still a car, a novel instance within a known category. @@ -170,7 +171,7 @@ References: - Sehwag, Vikash, et al. "Analyzing the robustness of open-world machine learning." Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security. 2019. -#### #DETECTADVERSARIALINPUT +#### #DETECT ADVERSARIAL INPUT >Category: runtime data science control for threats through use >Permalink: https://owaspai.org/goto/detectadversarialinput/ @@ -179,7 +180,7 @@ Detect adversarial input: Implement tools to detect specific attack patterns in The main concepts of adversarial attack detectors include: - **Statistical analysis of input series**: Adversarial attacks often follow certain patterns, which can be analysed by looking at input on a per-user basis. For example to detect series of small deviations in the input space, indicating a possible attack such as a search to perform model inversion or an evasion attack. These attacks also typically have series of inputs with a general increase of confidence value. Another example: if inputs seem systematic (very random or very uniform or covering the entire input space) it may indicate a [model theft through use attack](/goto/modeltheftuse/). - **Statistical Methods**: Adversarial inputs often deviate from benign inputs in some statistical metric and can therefore be detected. Examples are utilizing the Principal Component Analysis (PCA), Bayesian Uncertainty Estimation (BUE) or Structural Similarity Index Measure (SSIM). These techniques differentiate from statistical analysis of input series, as these statistical detectors decide if a sample is adversarial or not per input sample, such that these techniques are able to also detect transferred black box attacks. -- **Detection Networks**: A detector network operates by analyzing the inputs or the behavior of the primary model to spot adversarial examples. These networks can either run as a preprocessing function or in parallel to the main model. To use a detector networks as a preprocessing function, it has to be trained to differentiate between benign and adversarial samples, which is in itself a hard task. Therefore it can rely on e.g. the original input or on statistical metrics. To train a detector network to run in parallel to the main model, typically the detector is trained to distinguish between benign and adversarial inputs from the intermediate features of the main model's hidden layer. Caution: Adversarial attacks could be crafted to circumvent the detector network and fool the main model. +- **Detection Networks**: A detector network operates by analyzing the inputs or the behavior of the primary model to spot adversarial examples. These networks can either run as a preprocessing function or in parallel to the main model. To use a detector network as a preprocessing function, it has to be trained to differentiate between benign and adversarial samples, which is in itself a hard task. Therefore it can rely on e.g. the original input or on statistical metrics. To train a detector network to run in parallel to the main model, typically the detector is trained to distinguish between benign and adversarial inputs from the intermediate features of the main model's hidden layer. Caution: Adversarial attacks could be crafted to circumvent the detector network and fool the main model. - **Input Distortion Based Techniques (IDBT)**: A function is used to modify the input to remove any adversarial data. The model is applied to both versions of the image, the original input and the modified version. The results are compared to detect possible attacks. See [INPUTDISTORTION](/goto/inputdistortion/). - **Detection of adversarial patches**: These patches are localized, often visible modifications that can even be placed in the real world. The techniques mentioned above can detect adversarial patches, yet they often require modification due to the unique noise pattern of these patches, particularly when they are used in real-world settings and processed through a camera. In these scenarios, the entire image includes benign camera noise (camera fingerprint), complicating the detection of the specially crafted adversarial patches. @@ -193,7 +194,7 @@ Useful standards include: References: - - [Feature squeezing](https://arxiv.org/pdf/1704.01155.pdf) (IDBT) compares the output of the model against the output based on a distortion of the input that reduces the level of detail. This is done by reducing the number of features or reducing the detail of certain features (e.g. by smoothing). This approach is like [INPUTDISTORTION](#inputdistortion), but instead of just changing the input to remove any adversarial data, the model is also applied to the original input and then used to compare it, as a detection mechanism. + - [Feature squeezing](https://arxiv.org/pdf/1704.01155.pdf) (IDBT) compares the output of the model against the output based on a distortion of the input that reduces the level of detail. This is done by reducing the number of features or reducing the detail of certain features (e.g. by smoothing). This approach is like [INPUT DISTORTION](#inputdistortion), but instead of just changing the input to remove any adversarial data, the model is also applied to the original input and then used to compare it, as a detection mechanism. - [MagNet](https://arxiv.org/abs/1705.09064) and [here](https://www.mdpi.com/2079-9292/11/8/1283) @@ -257,24 +258,24 @@ images." arXiv preprint arXiv:1608.00530 (2016). - Feinman, Reuben, et al. "Detecting adversarial samples from artifacts." arXiv preprint arXiv:1703.00410 (2017). -#### #EVASIONROBUSTMODEL ->Category: development-time datascience control for threats through use +#### #EVASION ROBUST MODEL +>Category: development-time data science control for threats through use >Permalink: https://owaspai.org/goto/evasionrobustmodel/ -Evastion-robust model: choose an evasion-robust model design, configuration and/or training approach to maximize resilience against evasion (Data science). +Evasion-robust model: choose an evasion-robust model design, configuration and/or training approach to maximize resilience against evasion (Data science). A robust model in the light of evasion is a model that does not display significant changes in output for minor changes in input. Adversarial examples are the name for inputs that represent input with an unwanted result, where the input is a minor change of an input that leads to a wanted result. -In other words: if we interpret the model with its inputs as a "system" and the sensitivity to evasion attacks as the "system fault" then this sensitivity may also be interpreted as (local) lack of graceful degradation. +In other words: if we interpret the model with its inputs as a "system" and the sensitivity to evasion attacks as the "system fault" then this sensitivity may also be interpreted as a (local) lack of graceful degradation. Reinforcing adversarial robustness is an experimental process where model robustness is measured in order to determine countermeasures. Measurement takes place by trying minor input deviations to detect meaningful outcome variations that undermine the model's reliability. If these variations are undetectable to the human eye but can produce false or incorrect outcome descriptions, they may also significantly undermine the model's reliability. Such cases indicate lack of model resilience to input variance resulting in sensitivity to evasion attacks and require detailed investigation. Adversarial robustness (the senstitivity to adversarial examples) can be assessed with tools like [IBM Adversarial Robustness Toolbox](https://research.ibm.com/projects/adversarial-robustness-toolbox), [CleverHans](https://github.com/cleverhans-lab/cleverhans), or [Foolbox](https://github.com/bethgelab/foolbox). Robustness issues can be addressed by: -- Adversarial training - see [TRAINADVERSARIAL](/goto/trainadversarial/) +- Adversarial training - see [TRAIN ADVERSARIAL](/goto/trainadversarial/) - Increasing training samples for the problematic part of the input domain - Tuning/optimising the model for variance -- _Randomisation_ by injecting noise during training, causing the input space for correct classifications to grow. See also [TRAINDATADISTORTION](/goto/traindatadistortion/) against data poisoning and [OBFUSCATETRAININGDATA](/goto/obfuscatetrainingdata/) to minimize sensitive data through randomisation. +- _Randomisation_ by injecting noise during training, causing the input space for correct classifications to grow. See also [TRAIN DATA DISTORTION](/goto/traindatadistortion/) against data poisoning and [OBFUSCATE TRAINING DATA](/goto/obfuscatetrainingdata/) to minimize sensitive data through randomisation. - _gradient masking_: a technique employed to make training more efficient and defend machine learning models against adversarial attacks. This involves altering the gradients of a model during training to increase the difficulty of generating adversarial examples for attackers. Methods like adversarial training and ensemble approaches are utilized for gradient masking, but it comes with limitations, including computational expenses and potential in effectiveness against all types of attacks. See [Article in which this was introduced](https://arxiv.org/abs/1602.02697). Care must be taken when considering robust model designs, as security concerns have arisen about their effectiveness. @@ -306,11 +307,11 @@ gradients give a false sense of security: Circumventing defenses to adversarial examples." International conference on machine learning. PMLR, 2018. -#### #TRAINADVERSARIAL +#### #TRAIN ADVERSARIAL >Category: development-time data science control for threats through use >Permalink: https://owaspai.org/goto/trainadversarial/ -Train adversarial: Add adversarial examples to the training set to make the model more robust against evasion attacks. First, adversarial examples are generated, just like they would be generated for an evasion attack. By definition, the model produces the wrong output for those examples. By adding them to the training set with the right output, the model is in essence corrected. As a result it generalizes better. In other words, by training the model on adversarial examples, it learns to not overly rely on subtle patterns that might not generalize well, which are by the way similar to the patterns that poisoned data might introduce. +Train adversarial: Add adversarial examples to the training set to make the model more robust against evasion attacks. First, adversarial examples are generated, just like they would be generated for an evasion attack. By definition, the model produces the wrong output for those examples. By adding them to the training set with the right output, the model is in essence corrected. As a result it generalizes better. In other words, by training the model on adversarial examples, it learns to not overly rely on subtle patterns that might not generalize well, which are by the way, similar to the patterns that poisoned data might introduce. It is important to note that generating the adversarial examples creates significant training overhead, does not scale well with model complexity / input dimension, can lead to overfitting, and may not generalize well to new attack methods. @@ -328,7 +329,7 @@ It is important to note that generating the adversarial examples creates signifi - Vaishnavi, Pratik, Kevin Eykholt, and Amir Rahmati. "Transferring adversarial robustness through robust representation matching." 31st USENIX Security Symposium (USENIX Security 22). 2022. #### #INPUTDISTORTION ->Category: runtime datasciuence control for threats through use +>Category: runtime datascience control for threats through use >Permalink: https://owaspai.org/goto/inputdistortion/ Input distortion: Lightly modify the input with the intention to distort the adversarial attack causing it to fail, while maintaining sufficient model correctness. Modification can be done by e.g. adding noise (randomization), smoothing or JPEG compression. @@ -340,7 +341,7 @@ A set of defense techniques called Random Transformations (RT) defends neural ne Note that black-box or closed-box attacks do not rely on the gradients and are therefore not affected by shattered gradients, as they do not use the gradients to calculate the attack. Black box attacks use only the input and the output of the model or whole AI system to calculate the adversarial input. For a more detailed discussion of these attacks see Closed-box evasion. -See [DETECTADVERSARIALINPUT](#detectadversarialinput) for an approach where the distorted input is used for detecting an adversarial attack. +See [DETECT ADVERSARIAL INPUT](#detectadversarialinput) for an approach where the distorted input is used for detecting an adversarial attack. Useful standards include: @@ -359,7 +360,7 @@ References: - Athalye, Anish, Nicholas Carlini, and David Wagner. "Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples." International conference on machine learning. PMLR, 2018. -#### #ADVERSARIALROBUSTDISTILLATION +#### #ADVERSARIAL ROBUST DISTILLATION >Category: development-time data science control for threats through use >Permalink: https://owaspai.org/goto/adversarialrobustdistillation/ @@ -395,8 +396,8 @@ Black box attack strategies are: In query-based black box attacks, an attacker systematically queries the target model using carefully designed inputs and observes the resulting outputs to search for variations of input that lead to a false decision of the model. This approach enables the attacker to indirectly reconstruct or estimate the model's decision boundaries, thereby facilitating the creation of inputs that can mislead the model. These attacks are categorized based on the type of output the model provides: - - Desicion-based (or Label-based) attacks: where the model only reveals the top prediction label - - Score-based attacks: where the model discloses a score (like a softmax score), often in the form of a vector indicating the top-k predictions.In research typically models which output the whole vector are evaluated, but the output could also be restricted to e.g. top-10 vector. The confidence scores provide more detailed feedback about how close the adversarial example is to succeeding, allowing for more precise adjustments. In a score-based scenario an attacker can for example approximate the gradient by evaluating the objective function values at two very close points. + - Decision-based (or Label-based) attacks: where the model only reveals the top prediction label + - Score-based attacks: where the model discloses a score (like a softmax score), often in the form of a vector indicating the top-k predictions. In research typically models which output the whole vector are evaluated, but the output could also be restricted to e.g. top-10 vector. The confidence scores provide more detailed feedback about how close the adversarial example is to succeeding, allowing for more precise adjustments. In a score-based scenario an attacker can for example approximate the gradient by evaluating the objective function values at two very close points. References: @@ -482,7 +483,7 @@ Prompt injection attacks involve maliciously crafting or manipulating input prom - See [controls for threats through use](/goto/threatsuse/) - The below control(s), each marked with a # and a short name in capitals -#### #PROMPTINPUTVALIDATION +#### #PROMPT INPUT VALIDATION > Category: runtime information security control against application security threats > Permalink: https://owaspai.org/goto/promptinputvalidation/ @@ -495,7 +496,7 @@ Prompt input validation: trying to detect/remove malicious instructions by attem Direct prompt injection: a user tries to fool a Generative AI (eg. a Large Language Model) by presenting prompts that make it behave in unwanted ways. It can be seen as social engineering of a generative AI. This is different from an [evasion attack](/goto/evasion/) which inputs manipulated data (instead of instructions) to make the model perform its task incorrectly. -Impact: Obtaining information from the AI that is offensive, confidential, could grant certain legal rights, or triggers unauthorized functionality. Note that the person providing the prompt is the one receiving this information. The model itself is typically not altered, so this attack does not affect anyone else outside of the user (i.e., the attacker). The exception is when a model works with a shared context between users that can be influenced by user instructions. +Impact: Obtaining information from the AI that is offensive, confidential, could grant certain legal rights, or trigger unauthorized functionality. Note that the person providing the prompt is the one receiving this information. The model itself is typically not altered, so this attack does not affect anyone else outside of the user (i.e., the attacker). The exception is when a model works with a shared context between users that can be influenced by user instructions. Many Generative AI systems have been given instructions by their suppliers (so-called _alignment_), for example to prevent offensive language, or dangerous instructions. Direct prompt injection is often aimed at countering this, which is referred to as a *jailbreak attack*. @@ -509,7 +510,7 @@ Example 4: Making a chatbot say things that are legally binding and gain attacke Example 5: The process of trying prompt injection can be automated, searching for _pertubations_ to a prompt that allow circumventing the alignment. See [this article by Zou et al](https://llm-attacks.org/). -Example 6: Prompt leaking: when an attacker manages through prompts to retrieve instructions to an LLM that were given by its makers +Example 6: Prompt leaking: when an attacker manages through prompts to retrieve instructions from an LLM that were given by its makers See [MITRE ATLAS - LLM Prompt Injection](https://atlas.mitre.org/techniques/AML.T0051) and ([OWASP for LLM 01](https://genai.owasp.org/llmrisk/llm01/)). @@ -548,7 +549,7 @@ References - See [controls for prompt injection](/goto/promptinjection/) - The below control(s), each marked with a # and a short name in capitals -#### #INPUTSEGREGATION +#### #INPUT SEGREGATION > Category: runtime information security control against application security threats > Permalink: https://owaspai.org/goto/inputsegregation/ @@ -577,7 +578,7 @@ The model discloses sensitive training data or is abused to do so. The output of the model may contain sensitive data from the training set, for example a large language model (GenAI) generating output including personal data that was part of its training set. Furthermore, GenAI can output other types of sensitive data, such as copyrighted text or images(see [Copyright](/goto/copyright/)). Once training data is in a GenAI model, original variations in access rights cannot be controlled anymore. ([OWASP for LLM 02](https://genai.owasp.org/llmrisk/llm02/)) -The disclosure is caused by an unintentional fault of including this data, and exposed through normal use or through provocation by an attacker using the system. See [MITRE ATLAS - LLM Data Leakage](https://atlas.mitre.org/techniques/AML.T0057) +The disclosure is caused by an unintentional fault involving the inclusion of this data, and exposed through normal use or through provocation by an attacker using the system. See [MITRE ATLAS - LLM Data Leakage](https://atlas.mitre.org/techniques/AML.T0057) **Controls specific for sensitive data output from model:** @@ -585,7 +586,7 @@ The disclosure is caused by an unintentional fault of including this data, and e - See [controls for threats through use](/goto/threatsuse/), to limit the model user group, the amount of access and to detect disclosure attempts - The below control(s), each marked with a # and a short name in capitals -#### #FILTERSENSITIVEMODELOUTPUT +#### #FILTER SENSITIVE MODEL OUTPUT >Category: runtime information security control for threats through use >Permalink: https://owaspai.org/goto/filtersensitivemodeloutput/ @@ -621,7 +622,7 @@ Controls for Model inversion and membership inference: - See [controls for threats through use](/goto/threatsuse/) - The below control(s), each marked with a # and a short name in capitals -#### #OBSCURECONFIDENCE +#### #OBSCURE CONFIDENCE >Category: runtime data science control for threats through use >Permalink: https://owaspai.org/goto/obscureconfidence/ @@ -631,7 +632,7 @@ Useful standards include: - Not covered yet in ISO/IEC standards -#### #SMALLMODEL +#### #SMALL MODEL >Category: development-time data science control for threats through use >Permalink: https://owaspai.org/goto/smallmodel/ @@ -682,7 +683,7 @@ For example: A _sponge attack_ or _energy latency attack_ provides input that is - The below control(s), each marked with a # and a short name in capitals -#### #DOSINPUTVALIDATION +#### #DOS INPUT VALIDATION >Category: runtime information security control for threats through use >Permalink: https://owaspai.org/goto/dosinputvalidation/ @@ -695,7 +696,7 @@ Useful standards include: - [OpenCRE on input validation](https://www.opencre.org/cre/010-308) -#### #LIMITRESOURCES +#### #LIMIT RESOURCES >Category: runtime information security control for threats through use >Permalink: https://owaspai.org/goto/limitresources/ From cbbdb852d46e60a78dcdbe3c1a537dbbf144c200 Mon Sep 17 00:00:00 2001 From: charliepaks Date: Wed, 16 Jul 2025 12:02:04 +0100 Subject: [PATCH 8/8] update 2_threats_through_use.md-fix hash tags --- .../content/docs/2_threats_through_use.md | 30 +++++++++---------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/content/ai_exchange/content/docs/2_threats_through_use.md b/content/ai_exchange/content/docs/2_threats_through_use.md index eaa4213b..2be55b22 100644 --- a/content/ai_exchange/content/docs/2_threats_through_use.md +++ b/content/ai_exchange/content/docs/2_threats_through_use.md @@ -13,7 +13,7 @@ Threats through use take place through normal interaction with an AI model: prov - See [General controls](/goto/generalcontrols/), especially [Limiting the effect of unwanted behaviour](/goto/limitunwanted/) and [Sensitive data limitation](/goto/dataminimize/) - The below control(s), each marked with a # and a short name in capitals -#### #MONITOR USE +#### #MONITORUSE >Category: runtime information security control for threats through use >Permalink: https://owaspai.org/goto/monitoruse/ @@ -31,7 +31,7 @@ Useful standards include: - ISO/IEC 42001 B.6.2.6 discusses AI system operation and monitoring. Gap: covers this control fully, but on a high abstraction level. - See [OpenCRE](https://www.opencre.org/cre/058-083). Idem -#### #RATE LIMIT +#### #RATELIMIT >Category: runtime information security control for threats through use >Permalink: https://owaspai.org/goto/ratelimit/ @@ -52,7 +52,7 @@ Useful standards include: - ISO 27002 has no control for this - See [OpenCRE](https://www.opencre.org/cre/630-573) -#### #MODEL ACCESS CONTROL +#### #MODELACCESSCONTROL >Category: runtime information security control for threats through use >Permalink: https://owaspai.org/goto/modelaccesscontrol/ @@ -113,7 +113,7 @@ An Evasion attack typically consists of first searching for the inputs that misl - See [controls for threats through use](/goto/threatsuse/) - The below control(s), each marked with a # and a short name in capitals -#### #DETECT ODD INPUT +#### #DETECTODDINPUT >Category: runtime datascience control for threats through use >Permalink: https://owaspai.org/goto/detectoddinput/ @@ -171,7 +171,7 @@ References: - Sehwag, Vikash, et al. "Analyzing the robustness of open-world machine learning." Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security. 2019. -#### #DETECT ADVERSARIAL INPUT +#### #DETECTADVERSARIALINPUT >Category: runtime data science control for threats through use >Permalink: https://owaspai.org/goto/detectadversarialinput/ @@ -258,7 +258,7 @@ images." arXiv preprint arXiv:1608.00530 (2016). - Feinman, Reuben, et al. "Detecting adversarial samples from artifacts." arXiv preprint arXiv:1703.00410 (2017). -#### #EVASION ROBUST MODEL +#### #EVASIONROBUSTMODEL >Category: development-time data science control for threats through use >Permalink: https://owaspai.org/goto/evasionrobustmodel/ @@ -307,7 +307,7 @@ gradients give a false sense of security: Circumventing defenses to adversarial examples." International conference on machine learning. PMLR, 2018. -#### #TRAIN ADVERSARIAL +#### #TRAINADVERSARIAL >Category: development-time data science control for threats through use >Permalink: https://owaspai.org/goto/trainadversarial/ @@ -360,7 +360,7 @@ References: - Athalye, Anish, Nicholas Carlini, and David Wagner. "Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples." International conference on machine learning. PMLR, 2018. -#### #ADVERSARIAL ROBUST DISTILLATION +#### #ADVERSARIALROBUSTDISTILLATION >Category: development-time data science control for threats through use >Permalink: https://owaspai.org/goto/adversarialrobustdistillation/ @@ -483,7 +483,7 @@ Prompt injection attacks involve maliciously crafting or manipulating input prom - See [controls for threats through use](/goto/threatsuse/) - The below control(s), each marked with a # and a short name in capitals -#### #PROMPT INPUT VALIDATION +#### #PROMPTINPUTVALIDATION > Category: runtime information security control against application security threats > Permalink: https://owaspai.org/goto/promptinputvalidation/ @@ -549,7 +549,7 @@ References - See [controls for prompt injection](/goto/promptinjection/) - The below control(s), each marked with a # and a short name in capitals -#### #INPUT SEGREGATION +#### #INPUTSEGREGATION > Category: runtime information security control against application security threats > Permalink: https://owaspai.org/goto/inputsegregation/ @@ -586,7 +586,7 @@ The disclosure is caused by an unintentional fault involving the inclusion of th - See [controls for threats through use](/goto/threatsuse/), to limit the model user group, the amount of access and to detect disclosure attempts - The below control(s), each marked with a # and a short name in capitals -#### #FILTER SENSITIVE MODEL OUTPUT +#### #FILTERSENSITIVEMODELOUTPUT >Category: runtime information security control for threats through use >Permalink: https://owaspai.org/goto/filtersensitivemodeloutput/ @@ -622,7 +622,7 @@ Controls for Model inversion and membership inference: - See [controls for threats through use](/goto/threatsuse/) - The below control(s), each marked with a # and a short name in capitals -#### #OBSCURE CONFIDENCE +#### #OBSCURECONFIDENCE >Category: runtime data science control for threats through use >Permalink: https://owaspai.org/goto/obscureconfidence/ @@ -632,7 +632,7 @@ Useful standards include: - Not covered yet in ISO/IEC standards -#### #SMALL MODEL +#### #SMALLMODEL >Category: development-time data science control for threats through use >Permalink: https://owaspai.org/goto/smallmodel/ @@ -683,7 +683,7 @@ For example: A _sponge attack_ or _energy latency attack_ provides input that is - The below control(s), each marked with a # and a short name in capitals -#### #DOS INPUT VALIDATION +#### #DOSINPUTVALIDATION >Category: runtime information security control for threats through use >Permalink: https://owaspai.org/goto/dosinputvalidation/ @@ -696,7 +696,7 @@ Useful standards include: - [OpenCRE on input validation](https://www.opencre.org/cre/010-308) -#### #LIMIT RESOURCES +#### #LIMITRESOURCES >Category: runtime information security control for threats through use >Permalink: https://owaspai.org/goto/limitresources/