
GenAI Security Principles

Building secure AI products at scale


Introduction

At Block, we are not only integrating AI into our products (like Square AI), we are also revolutionizing how development happens with Goose, our open source agent.

Block's Product Security team has a core tenet of securing our financial platform without slowing down the pace of innovation. To aid our mission, the ProdSec team has developed a set of security principles that allow for secure adoption of GenAI. Platform and product teams at Block are using these best practices to build fundamental defenses within their apps, services, and frameworks. We thought it might be a good idea to share these principles with our industry peers.

Defining the GenAI Security Guidelines

Background info: DSL levels

Before we jump into Block's GenAI security guidelines, we first need to talk about DSL. At Block we have the concept of Data Safety Levels (DSL), which we have discussed in detail in a previous blog post. It categorizes data into four tiers (DSL-1 through DSL-4), defining how sensitive the data is and what security controls are required.

  • DSL-1 covers data with low sensitivity, like publicly available information, needing minimal safeguards.
  • DSL-2 includes internal or confidential data, which requires standard security measures.
  • DSL-3 applies to significantly sensitive data, such as transaction histories or partial customer identifiers, calling for enhanced encryption and stricter access control.
  • DSL-4 is reserved for highly sensitive information (like full payment card numbers, Social Security numbers, or tax returns) requiring the strongest protections, including mandatory encryption at the application level and multi-party authorization.

Security Tiers

DSL is part of the common lingo at Block, so it made sense to structure the security principles into tiers based on the sensitivity of the data accessed by AI functionality. The guidelines of each tier encompass those of the preceding tiers.

  • Tier 2 (Baseline): Fundamental security controls applicable to all systems using generative AI at DSL-1 & DSL-2.
  • Tier 3 (Sensitive): Minimum required controls for generative AI handling sensitive DSL-3 data and any Personally Identifiable Information (PII).
  • Tier 4 (Highly Sensitive): Applies to AI implementations that deal with DSL-4 data.

Security Principles

The table below uses the following symbols to indicate whether a particular security control MUST or SHOULD be used at a given Security Requirement Tier:
✅ — MUST (required)
⭐ — SHOULD (recommended)

Generative AI Security Principles | Tier 2 | Tier 3 | Tier 4
Treat LLM output the same as user generated content
All output generated by Large Language Models (LLMs) must be treated the same as freeform user input. This means applying the same validation, sanitization, and content moderation processes to ensure security. While models may produce proper output during testing, they are ever-changing. It is important to have input and output filters in place rather than relying solely on the model to avoid hallucinations or ignore prompt injections.
The LLM should never be the one enforcing access control
Access control should be managed independently of the LLM, as LLMs can be manipulated by user input.
Avoid directly executing generated content
For example, instead of having an LLM generate raw HTML for a custom graph to display to a user, have it generate a JSON schema that describes the graph.
Treat LLM output the same as input: LLM Data Inheritance
Assume the model output can reflect the entire context provided as input. If DSL-n information is provided to an LLM, then the output must also be treated as DSL-n. To reduce scope we can run a redaction step over the output, but engineering the solution to avoid using sensitive data in the first place is preferable.
Select the right environment for your use case
Not all providers are suitable for workflows that could contain higher risk data.
Consider downstream LLM influence
If LLM output controls any future LLM calls or features, the risk of prompt injection/manipulation of those features should be considered.
Apply least privilege
Assume that LLMs do not have the same authorization boundaries as other systems do and may expose information to users who are not authorized to view it. When possible, have the LLM operate with the same or fewer permissions than the user calling it. For example, a user with READ access should not be using an LLM with READ+WRITE access, as they could make unauthorized changes.
Use parameterized queries or break up queries
Consider allowing the LLM to influence only certain segments of the query in a structured manner, such as parameterizing specific SQL fields (see the query sketch after this table). By splitting up LLM tasks, it becomes harder for an attacker to chain prompts together to manipulate features. For example, if an LLM is in charge of finding a relevant document and summarizing its contents, have one query select the best document and another query focus on summarizing the document.
Rate limit to prevent model abuse and resource consumption
Expect users to attempt to abuse your endpoints for free tokens (specifically freeform text in and out). This can be mitigated by rate limiting queries, by structuring the input or output into something other than freeform text, and by ensuring a good system prompt. Limits must be enforced at the user level to ensure organizational rate limits are not exceeded. Similarly, context size for large prompts and conversation length should have mandatory upper-bound quotas (see the rate-limiting sketch after this table).
Sanitize training data
Any knowledge base or information the GenAI has been trained on can be reproduced in its output at inference time. Do not train on sensitive data or PII.
Prefer human reviewers where feasible
While LLM performance has significantly improved, it is not infallible. Human reviewers should make final decisions in high-impact or user-facing scenarios. Example: When generating product descriptions, ensure that merchants can review and edit the output before it is published.
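As a concrete illustration of the "Use parameterized queries or break up queries" principle, here is a minimal Python sketch. The llm_extract_merchant_name helper and the transactions table are hypothetical stand-ins; the point is that the model only supplies a value that gets bound as a query parameter, never the SQL itself.

    # A sketch only: llm_extract_merchant_name is a hypothetical, narrowly
    # scoped model call, and the transactions table is illustrative.
    import sqlite3

    def llm_extract_merchant_name(user_request: str) -> str:
        """Hypothetical LLM call that returns only a merchant name as plain text."""
        raise NotImplementedError

    def find_transactions(conn: sqlite3.Connection, user_request: str):
        merchant = llm_extract_merchant_name(user_request)
        # The query shape is fixed by the application; the model only fills in
        # a bound parameter, so it cannot rewrite the statement itself.
        return conn.execute(
            "SELECT id, amount, created_at FROM transactions WHERE merchant = ?",
            (merchant,),
        ).fetchall()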
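And for the "Rate limit to prevent model abuse and resource consumption" principle, a minimal sketch of per-user quotas enforced before the model is ever called. The in-memory bookkeeping and the specific limits are illustrative, not Block's actual values.

    # A sketch only: limits and storage are illustrative.
    import time
    from collections import defaultdict, deque

    MAX_REQUESTS_PER_MINUTE = 20   # per-user request quota
    MAX_PROMPT_CHARS = 8_000       # upper bound on prompt/context size

    _request_log: dict[str, deque] = defaultdict(deque)

    def check_quota(user_id: str, prompt: str) -> None:
        """Raise before calling the model if the user is over quota or the prompt is too large."""
        if len(prompt) > MAX_PROMPT_CHARS:
            raise ValueError("prompt exceeds the allowed context size")
        now = time.monotonic()
        window = _request_log[user_id]
        # Drop timestamps older than the 60-second window, then enforce the cap.
        while window and now - window[0] > 60:
            window.popleft()
        if len(window) >= MAX_REQUESTS_PER_MINUTE:
            raise RuntimeError("per-user rate limit exceeded")
        window.append(now)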

Why did we need these principles?

Using Generative AI to improve our products has been an ever-growing experiment at Block. We kept a close eye on the development of these AI-powered integrations. Part of that was building an approval process for these new applications and integrations. Unsurprisingly, we eventually found ourselves with several hundred approval requests as the ecosystem accelerated.

We knew that a manual review of all these integrations would be infeasible, but we also didn't want dangerous patterns to develop and multiply. To solve this problem, we came up with the GenAI principles, which we have found extremely useful for securing GenAI integrations across our organization earlier in the development lifecycle. Development teams use them as a checklist in the design phase of their dev cycle and are able to mitigate concerns early on that had traditionally been addressed during the AI integration review process. This has helped speed up the review process.

How did we come up with them?

Our GenAI security principles are based on a few assumptions observed from the current state of LLMs:

  1. LLMs are not Artificial General Intelligence; they can be tricked or confused. They can lose track of their overall prime directive, and will always have some failure rate.
  2. LLMs operate in an opaque manner: they act as a black box that performs some processing and spits out an answer.
  3. LLMs and systems leveraging LLMs mix instructions and data.
  4. Any input is potentially an instruction for the LLM (including images!).

Common mistakes when building AI

After reviewing hundreds of implementations, we boiled AI-specific security concerns down into the following broad buckets:

Irreversible Actions

While we have been using ML for a long time at Block, new GenAI integrations have expanded opportunities for automation in places where it was not previously possible. Ensuring we have proper flows for handling AI failure is paramount. Ideally, LLMs are kept in easy-to-reverse flows wherever possible so that mistakes can be undone.
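One way to keep LLM-driven automation reversible is a propose/approve split, where the only tool exposed to the model stages an action and a human (or a non-LLM policy check) commits it. A minimal sketch, with a hypothetical Refund type standing in for the real workflow:

    # A sketch only: Refund and execute_refund stand in for a real workflow.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Refund:
        transaction_id: str
        amount_cents: int

    pending_refunds: list[Refund] = []

    def propose_refund(refund: Refund) -> None:
        # The only tool exposed to the LLM: it stages the action, and nothing
        # irreversible happens here.
        pending_refunds.append(refund)

    def approve_refund(refund: Refund) -> None:
        # A human (or a non-LLM policy check) approves before money moves.
        pending_refunds.remove(refund)
        execute_refund(refund)

    def execute_refund(refund: Refund) -> None:
        # Stand-in for the real, hard-to-reverse side effect.
        print(f"refunding {refund.amount_cents} cents on {refund.transaction_id}")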

Authorization Smuggling

LLMs are not good at authorization. Attempting to use an LLM for permission checks is dangerous. Checks should always be performed outside the context of an LLM, as LLMs, despite improvements, remain susceptible to manipulation through attacker and insider instructions.
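A minimal sketch of what this looks like in practice, assuming hypothetical user_can_read and get_document helpers backed by the real authorization and data layers: the permission decision happens in application code before anything reaches the model.

    # A sketch only: user_can_read and get_document are hypothetical helpers.
    from typing import Callable

    def user_can_read(user_id: str, document_id: str) -> bool:
        raise NotImplementedError

    def get_document(document_id: str) -> str:
        raise NotImplementedError

    def summarize_document(user_id: str, document_id: str,
                           llm_call: Callable[[str], str]) -> str:
        # The permission check is made in application code; the model is
        # never asked whether the user should have access.
        if not user_can_read(user_id, document_id):
            raise PermissionError("user is not authorized to read this document")
        return llm_call("Summarize the following document:\n" + get_document(document_id))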

Executing Generated Content

Injecting user-generated code into a user's session is dangerous because of classic web security problems. Instead of having an LLM write code, have it write a config for a component library via a Domain-Specific Language or schema (see Slack's Block Kit). This adds guardrails that keep LLM mistakes contained, minimizing the risk of injections and breakages.
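For instance, a chart feature might accept only a small, validated JSON description from the model rather than markup or code. The schema below is illustrative, not a specific Block component library:

    # A sketch only: the chart schema below is illustrative.
    import json

    ALLOWED_CHART_TYPES = {"bar", "line", "pie"}

    def parse_chart_spec(llm_output: str) -> dict:
        """Validate a model-produced chart description before it reaches the UI."""
        spec = json.loads(llm_output)  # raises if the model did not return JSON
        if spec.get("type") not in ALLOWED_CHART_TYPES:
            raise ValueError("unsupported chart type")
        if not isinstance(spec.get("title"), str):
            raise ValueError("title must be a string")
        points = spec.get("points")
        if not (isinstance(points, list)
                and all(isinstance(p, (int, float)) for p in points)):
            raise ValueError("points must be a list of numbers")
        # Only these validated fields are handed to the (non-LLM) rendering code.
        return {"type": spec["type"], "title": spec["title"], "points": points}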

LLM Data Inheritance

Everything is a prompt injection vector. When working with LLMs, user inputs or any other incoming data (like documents or tool calls) carries prompt injection risks. Always treat data as untrusted even when it has passed through internal systems. Avoid freeform text fields, and conform to predefined schemas whenever possible.
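Where sensitive data does end up in the prompt, the data inheritance principle above suggests a redaction pass over the output before it is treated as a lower DSL. A minimal sketch with illustrative regex patterns; a production redactor would rely on a vetted PII/secrets detection service, and avoiding sensitive data in the prompt remains the better engineering choice.

    # A sketch only: the patterns are illustrative.
    import re

    REDACTION_PATTERNS = {
        "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    }

    def redact(llm_output: str) -> str:
        # Replace anything matching a known sensitive pattern before the output
        # is stored, displayed, or passed downstream.
        for label, pattern in REDACTION_PATTERNS.items():
            llm_output = pattern.sub(f"[REDACTED {label}]", llm_output)
        return llm_output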

Outcomes

We believe that clear AI security principles enable teams to design products, apps, and services that are secure against GenAI risks. AI security is just one component of proper product security. Sharing these principles in checklist format keeps expectations clear, allowing teams to plan mitigations early and to keep security at the table from day one. GenAI can expand what we build, but only if these requirements are visible, standard, and enforced.
