
Security Validations

1. Detect Secrets

Purpose: Prevents accidental sharing of sensitive credentials and keys.

What It Detects:

  • API Keys
  • Authentication Tokens
  • Passwords
  • SSH Keys
  • Database Connection Strings

Example Block Message: Whenever the safeguards system detects secrets, the following message will be displayed to the user: "🔐 -> Potential sensitive information detected: Please ensure you're not sharing any confidential data, passwords, or access keys."

Figure 1: Secret Detection Block Message
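The documentation does not specify the detection engine, so the sketch below is only an illustration of the idea using regular expressions; the patterns, function name, and wiring of the block message are assumptions, not the actual implementation.

```python
import re

# Illustrative patterns only; the real detector may use a different engine.
SECRET_PATTERNS = {
    "AWS access key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Private SSH key": re.compile(r"-----BEGIN (?:RSA|OPENSSH|EC) PRIVATE KEY-----"),
    "Bearer token": re.compile(r"\bBearer\s+[A-Za-z0-9\-._~+/]{20,}\b"),
    "Database connection string": re.compile(r"\b\w+://[^\s:]+:[^\s@]+@[^\s/]+\b"),
}

BLOCK_MESSAGE = (
    "🔐 -> Potential sensitive information detected: Please ensure you're not "
    "sharing any confidential data, passwords, or access keys."
)

def detect_secrets(prompt: str) -> str | None:
    """Return the block message if the prompt appears to contain a secret."""
    for label, pattern in SECRET_PATTERNS.items():
        if pattern.search(prompt):
            return BLOCK_MESSAGE
    return None
```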

2. Detect PII (Personal Identifiable Information)

Purpose: Protects personal and sensitive information from exposure.

What It Detects:

Personal Information

  • 📝 Social Security Numbers (SSN)
  • 💳 Credit Card Numbers
  • 📧 Email Addresses
  • 📱 Phone Numbers (International formats)
  • 🏠 Physical Addresses
  • 🛂 Passport Numbers
  • 🚗 Driver's License Numbers
  • 📅 Birth Dates
  • 👤 Person Names (First, Middle, Last)

Financial Information

  • 🏦 Bank Account Numbers
  • 💰 IBAN Codes
  • 💵 Swift Codes
  • 💳 CVV Numbers

Medical Information

  • 🏥 Medical License Numbers
  • 📋 Medical Record Numbers

Location Information

  • 📍 IP Addresses
  • 📫 ZIP/Postal Codes
  • 🌍 GPS Coordinates
  • 🏢 Location Identifiers

Government Identifiers

  • 🪪 National ID Numbers
  • 🏛️ Government Official Numbers

Digital Identifiers

  • 💻 MAC Addresses
  • 🌐 URLs containing personal info
  • 📱 Device IDs
  • 🔑 Cryptocurrency Addresses

Professional Information

  • 👔 Employee Numbers
  • 🏢 Corporate Email Patterns

Cultural Identifiers

  • 🌍 Nationality
  • 🗣️ Ethnicity
  • ⛪ Religious Identifiers

Example Block Message: Whenever the safeguards system detects personally identifiable information, the following message will be displayed to the user: "🔒 -> Personal information detected: For your privacy and security, please avoid sharing sensitive information."

Example Warn Message:
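As a rough illustration of how a few of the categories above could be matched, the sketch below uses simple regular expressions; the actual safeguards may rely on a dedicated PII engine plus NER models for names and addresses, and every pattern and function name here is assumed for illustration only.

```python
import re

# Hypothetical subset of the PII categories listed above; real detection
# typically combines regexes with NER models for names, addresses, etc.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "Email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "Credit card": re.compile(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b"),
    "IPv4 address": re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),
}

def detect_pii(prompt: str) -> list[str]:
    """Return the PII categories found in the prompt (empty list if clean)."""
    return [label for label, pattern in PII_PATTERNS.items() if pattern.search(prompt)]

if __name__ == "__main__":
    hits = detect_pii("Contact me at jane.doe@example.com, SSN 123-45-6789.")
    if hits:
        print("🔒 -> Personal information detected:", ", ".join(hits))
```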

3. Sentiment Analysis

Purpose: Maintains professional communication standards and prevents harmful content.

Monitors For:

  • Hostile Language
  • Inappropriate Content
  • Unprofessional Tone
  • Harassment
  • Discriminatory Language

Threshold Settings:

  • Low Risk (0.3): Minor unprofessional language
  • Medium Risk (0.6): Concerning tone or content
  • High Risk (0.8): Severe violations

Example Block Message
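A minimal sketch of how the documented thresholds might drive a decision is shown below; the source of the score and the action names (flag, warn, block) are assumptions for illustration, not the actual implementation.

```python
# Hypothetical mapping from a sentiment/toxicity score in [0, 1] to an action.
# The score itself would come from whatever classifier the safeguards use.
LOW_RISK, MEDIUM_RISK, HIGH_RISK = 0.3, 0.6, 0.8

def classify_risk(score: float) -> str:
    """Map a model-produced risk score to the documented threshold bands."""
    if score >= HIGH_RISK:
        return "block"   # severe violations
    if score >= MEDIUM_RISK:
        return "warn"    # concerning tone or content
    if score >= LOW_RISK:
        return "flag"    # minor unprofessional language
    return "allow"
```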

4. Unusual Prompt Detection

Purpose: Identifies potentially harmful or suspicious requests.

Monitors For:

  • Code Injection Attempts
  • Prompt Engineering Attacks
  • System Command Requests
  • Policy Violation Attempts

Example Block Message
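As an illustration only, a heuristic pre-filter for such prompts could look like the sketch below; the patterns and function name are assumptions, and the real detector likely layers an LLM or dedicated classifier on top of checks like these.

```python
import re

# Illustrative heuristics only; production detection is usually more sophisticated.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all|any|previous) instructions", re.IGNORECASE),  # prompt injection
    re.compile(r"(rm\s+-rf|sudo\s+\w+|chmod\s+\d{3})"),                    # system commands
    re.compile(r"<script\b|;\s*DROP\s+TABLE", re.IGNORECASE),              # code/SQL injection
]

def is_unusual_prompt(prompt: str) -> bool:
    """Return True if the prompt matches any suspicious pattern."""
    return any(p.search(prompt) for p in SUSPICIOUS_PATTERNS)
```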

5. DetectSafeUnsafePrompt

Purpose: Blocks prompts that attempt to request unsafe or inappropriate content across multiple models, ensuring adherence to ethical and legal guidelines.

Capabilities:
DetectSafeUnsafePrompt is a robust system that identifies and blocks unsafe or harmful prompts across 13 categories, including both text-based and image-based inputs. It ensures that communication and content generation comply with organizational policies, ethical standards, and regulatory requirements.

Note: This guardrail is multimodal: it can analyze and validate prompts that combine text and images together. It does not work with image-only inputs.

Supported Categories:

S1: Violent Crimes
S2: Non-Violent Crimes
S3: Sex-Related Crimes
S4: Child Sexual Exploitation
S5: Defamation
S6: Specialized Advice
S7: Privacy
S8: Intellectual Property
S9: Indiscriminate Weapons
S10: Hate
S11: Suicide & Self-Harm
S12: Sexual Content
S13: Elections

Example Block Messages: ⚠️ Safety check failed: Your request contains potentially harmful, unsafe, or inappropriate content.

  1. Text Prompt Example:
  2. Image Prompt Example:

Note: For DetectSafeUnsafePrompt to function correctly, ensure that Llama Guard is deployed and that its URL is set in the .env file by adding the following line:

LLAMA_GUARD_URL=http://localhost:8888

Adjust the URL as needed based on your deployment configuration. For detailed deployment instructions, please refer to the Llama Guard Deployment Guide.
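A minimal sketch of how a client might read LLAMA_GUARD_URL and query the deployed service is shown below; the endpoint path, payload, and response schema are assumptions and should be adapted to your actual Llama Guard deployment.

```python
import os
import requests

# The endpoint path and response shape below are assumptions for illustration;
# adjust them to match your actual Llama Guard deployment.
LLAMA_GUARD_URL = os.environ.get("LLAMA_GUARD_URL", "http://localhost:8888")

def check_prompt_safety(prompt: str) -> bool:
    """Ask the deployed Llama Guard service whether the prompt is safe."""
    response = requests.post(f"{LLAMA_GUARD_URL}/classify", json={"prompt": prompt}, timeout=10)
    response.raise_for_status()
    result = response.json()
    # Assumed response shape: {"safe": bool, "categories": ["S1", ...]}
    return bool(result.get("safe", False))
```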

6. DenyList GuardRail

Purpose & Use Cases: The DenyList plugin lets you centrally define and enforce a blacklist of words, phrases, domains, URLs, or other terms (such as project codenames, internal IPs and servers, confidential budgets, or profanity) so that any prompt containing one of those entries is automatically blocked before processing. It is ideal for preventing disclosure of sensitive or proprietary information and for filtering out inappropriate language in prompts without relying on LLMs.

Configuration:

  1. Navigate to the DenyList section in the Admin Panel.

  2. Enter the word or phrase you want to block.

    To add multiple entries at once, place each on its own line, then press Ctrl + Enter or click Add.

  3. View current entries in the list below.

  4. Remove entries by clicking the delete button next to each item.

Examples:

Words to block:
- "ProjectCodename2024"
- "www.facebook.com"
- "internal-server-ip"
- "confidential-budget"

How It Works:

  • Exact Matching: Blocks the exact word or phrase as specified

  • Case Sensitive: "SECRET" and "secret" are treated as different entries

  • Complete Word Matching: DenyList only supports exact word matching (e.g., "password123" will not be blocked if only "password" is in the DenyList)

Note: Words containing spaces are not supported.
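A minimal sketch of these matching rules (exact, case-sensitive, whole-token matching on single-word entries) might look like this; the helper below is illustrative, not the actual implementation.

```python
import re

# Minimal sketch of the documented matching rules: exact, case-sensitive,
# whole-token matching, one single-word entry per line.
DENY_LIST = {"ProjectCodename2024", "www.facebook.com", "internal-server-ip", "confidential-budget"}

def violates_denylist(prompt: str) -> bool:
    """Return True if any whole token in the prompt exactly matches a deny-list entry."""
    tokens = re.split(r"\s+", prompt)   # entries never contain spaces
    return any(token in DENY_LIST for token in tokens)
```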

Best Practices:

  • Enter only keywords that you want to match exactly

  • Use specific phrases rather than common words

  • Keep the list to a reasonable size

Administration

  • Admin Only: Only administrator users can configure or modify the DenyList GuardRail.
  • Onboarding Portal: If you accessed the system through the onboarding portal, you can disable the plugin through the Admin Panel.

7. DenyRegex GuardRail

Purpose: The DenyRegex GuardRail lets you configure custom regular-expression patterns that are evaluated locally on each prompt. Add the exact patterns you want to block (such as proprietary codes, credit card numbers, or other sensitive data) and matching prompts will be automatically detected and prevented from ever reaching the LLM.

Use Cases:

  • Block social security numbers: \d{3}-\d{2}-\d{4}

  • Filter email addresses: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

  • Block phone numbers: \(\d{3}\)\s*\d{3}-\d{4}

  • Filter IP addresses: \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

Configuration:

  1. Navigate to the DenyRegex section in the Configuration Portal.
  2. Enter your regular expression pattern.
  3. (Optional) Add a description by using the format
    pattern|description
    
    where | separates the pattern from its description.
  4. Add multiple patterns at once by placing each (and its optional description) on a new line, then pressing Ctrl + Enter or clicking Add.
  5. Manage existing patterns in the list view—edit or delete entries as needed.

Note: Do not include the pipe character (|) within your regex pattern itself—it’s reserved to separate the pattern from its description.

Common Patterns:

| Pattern Type | Regex Example | Description |
| --- | --- | --- |
| Credit Card | \d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4} | Matches credit card numbers |
| SSN | \d{3}-\d{2}-\d{4} | Social Security Numbers |
| Phone | \+?1?[\s-]?\(?[0-9]{3}\)?[\s-]?[0-9]{3}[\s-]?[0-9]{4} | US phone numbers |
| URL | https?://[^\s]+ | Web URLs |
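As an illustration of how pattern|description entries could be parsed and applied, consider the sketch below; the entry format follows the configuration steps above, while the function names and application logic are assumptions rather than the actual implementation.

```python
import re

# Illustrative parsing of "pattern|description" entries as described above.
RAW_ENTRIES = [
    r"\d{3}-\d{2}-\d{4}|Social Security Numbers",
    r"\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}|Credit card numbers",
    r"https?://[^\s]+",  # the description is optional
]

def compile_entries(entries: list[str]) -> list[tuple[re.Pattern, str]]:
    """Split each entry on the reserved '|' separator and compile the pattern."""
    compiled = []
    for entry in entries:
        pattern, _, description = entry.partition("|")
        compiled.append((re.compile(pattern), description or "no description"))
    return compiled

def find_violations(prompt: str) -> list[str]:
    """Return the descriptions of all patterns that match the prompt."""
    return [desc for regex, desc in compile_entries(RAW_ENTRIES) if regex.search(prompt)]
```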

Best Practices:

  • Test regex patterns before deploying

  • Use specific patterns to avoid false positives

  • Add the optional description to each pattern for better validation reports

  • Consider case sensitivity in your patterns

Administration

  • Admin Only: Only administrator users can configure or modify the DenyRegex GuardRail.
  • Onboarding Portal: If you accessed the system through the onboarding portal, you can disable the plugin through the Admin Panel.