Security Validations

1. Detect Secrets

Purpose: Prevents accidental sharing of sensitive credentials and keys.

What It Detects:

  • API Keys
  • Authentication Tokens
  • Passwords
  • SSH Keys
  • Database Connection Strings

Example Block Message: Whenever the safeguards system detects a secret, the following message is displayed to the user: "🔐 -> Potential sensitive information detected: Please ensure you're not sharing any confidential data, passwords, or access keys."

Figure 1: Secret Detection Block Message
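To illustrate how pattern-based secret detection can work, here is a minimal Python sketch. The regular expressions, the `SECRET_PATTERNS` table, and the `detect_secrets` and `check_prompt` helpers are hypothetical and far narrower than a production scanner; they are not the safeguards system's actual rules. Only the block message text comes from this page.

```python
import re
from typing import List, Optional

# Hypothetical, deliberately narrow patterns; NOT the safeguards system's real rules.
SECRET_PATTERNS = {
    # Long-term AWS access key IDs: "AKIA" followed by 16 uppercase letters/digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM header of an SSH or TLS private key.
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"),
    # "password: ..." or "password=..." assignments.
    "password": re.compile(r"(?i)\bpassword\s*[:=]\s*\S+"),
    # Connection URLs with embedded credentials, e.g. postgresql://user:pass@host/db.
    "db_connection_string": re.compile(
        r"\b(?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?)://[^\s:/]+:[^\s@]+@\S+"
    ),
    # HTTP bearer tokens of at least 20 characters.
    "bearer_token": re.compile(r"(?i)\bbearer\s+[A-Za-z0-9\-._~+/]{20,}=*"),
}

BLOCK_MESSAGE = (
    "🔐 -> Potential sensitive information detected: Please ensure you're not "
    "sharing any confidential data, passwords, or access keys."
)


def detect_secrets(text: str) -> List[str]:
    """Return the names of all patterns that match somewhere in `text`."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]


def check_prompt(text: str) -> Optional[str]:
    """Return the block message if any secret pattern matches, otherwise None."""
    return BLOCK_MESSAGE if detect_secrets(text) else None
```

Regex rules like these are cheap but produce both misses and false positives; entropy checks or provider-specific validators are common additions.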

2. Detect PII (Personally Identifiable Information)

Purpose: Protects personal and sensitive information from exposure.

What It Detects:

Personal Information

  • 📝 Social Security Numbers (SSN)
  • 💳 Credit Card Numbers
  • 📧 Email Addresses
  • 📱 Phone Numbers (International formats)
  • 🏠 Physical Addresses
  • 🛂 Passport Numbers
  • 🚗 Driver's License Numbers
  • 📅 Birth Dates
  • 👤 Person Names (First, Middle, Last)

Financial Information

  • 🏦 Bank Account Numbers
  • 💰 IBAN Codes
  • 💵 Swift Codes
  • 💳 CVV Numbers

Medical Information

  • 🏥 Medical License Numbers
  • 📋 Medical Record Numbers

Location Information

  • 📍 IP Addresses
  • 📫 ZIP/Postal Codes
  • 🌍 GPS Coordinates
  • 🏢 Location Identifiers

Government Identifiers

  • 🪪 National ID Numbers
  • 🏛️ Government Official Numbers

Digital Identifiers

  • 💻 MAC Addresses
  • 🌐 URLs containing personal info
  • 📱 Device IDs
  • 🔑 Cryptocurrency Addresses

Professional Information

  • 👔 Employee Numbers
  • 🏢 Corporate Email Patterns

Cultural Identifiers

  • 🌍 Nationality
  • 🗣️ Ethnicity
  • ⛪ Religious Identifiers

Example Block Message: Whenever the safeguards system detects personally identifiable information, the following message is displayed to the user: "🔒 -> Personal information detected: For your privacy and security, please avoid sharing sensitive information."

Example Warn Message:
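A common approach to PII detection combines regular expressions with validity checks. The page does not say how the safeguards system detects PII, so the sketch below simply assumes a regex approach and covers three of the listed types: email addresses, US-style SSNs in `123-45-6789` form, and credit card numbers, which it confirms with the Luhn checksum to reduce false positives. The helpers `detect_pii` and `luhn_valid` are hypothetical.

```python
import re
from typing import List

# Hypothetical patterns for illustration only.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Card-number candidates: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Double every second digit from the right; subtract 9 if the result exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect_pii(text: str) -> List[str]:
    """Return the PII types found in `text`, in a fixed order."""
    found = []
    if EMAIL_RE.search(text):
        found.append("email")
    if SSN_RE.search(text):
        found.append("ssn")
    # Only report a card number if a candidate also passes the Luhn check.
    if any(luhn_valid(m.group()) for m in CARD_CANDIDATE_RE.finditer(text)):
        found.append("credit_card")
    return found
```

Types such as person names, addresses, or nationality do not reduce to fixed patterns and usually need a named-entity recognition model instead.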

3. Sentiment Analysis

Purpose: Maintains professional communication standards and prevents harmful content.

Monitors For:

  • Hostile Language
  • Inappropriate Content
  • Unprofessional Tone
  • Harassment
  • Discriminatory Language

Threshold Settings:

  • Low Risk (0.3): Minor unprofessional language
  • Medium Risk (0.6): Concerning tone or content
  • High Risk (0.8): Severe violations

Example Block Message
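The threshold settings for sentiment analysis can be read as score bands. The sketch below assumes the sentiment model returns a score between 0 and 1, that a score at or above a threshold falls into that band, and that scores below 0.3 are not flagged; the page does not state how scores are computed or which band triggers a block. `risk_level` is a hypothetical helper.

```python
from typing import Optional

# Threshold values from the settings above, checked from highest to lowest.
# Assumption: a score at or above a threshold falls into that band.
THRESHOLDS = [
    (0.8, "high"),
    (0.6, "medium"),
    (0.3, "low"),
]


def risk_level(score: float) -> Optional[str]:
    """Map a score in [0, 1] to a risk band, or None if it is below every threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    for threshold, level in THRESHOLDS:
        if score >= threshold:
            return level
    return None
```

For example, a score of 0.65 maps to `"medium"`, and 0.2 is not flagged at all.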

4. Unusual Prompt Detection

Purpose: Identifies potentially harmful or suspicious requests.

Monitors For:

  • Code Injection Attempts
  • Prompt Engineering Attacks
  • System Command Requests
  • Policy Violation Attempts

Example Block Message
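One simple way to flag unusual prompts is a set of heuristic patterns grouped by the categories above. The rules and the `detect_unusual_prompt` helper below are hypothetical examples, not the system's real detection logic. Policy violation attempts are left out because they do not reduce to fixed patterns.

```python
import re
from typing import List

# Hypothetical heuristic rules, grouped by the monitored categories.
UNUSUAL_PROMPT_RULES = {
    "code_injection": [
        re.compile(r"(?i)<script\b"),
        re.compile(r"(?i);\s*drop\s+table\b"),
        re.compile(r"(?i)\b(?:eval|exec)\s*\("),
    ],
    "prompt_engineering_attack": [
        re.compile(r"(?i)\bignore\s+(?:all\s+)?(?:previous|prior|above)\s+instructions\b"),
        re.compile(r"(?i)\breveal\s+(?:your\s+)?system\s+prompt\b"),
    ],
    "system_command_request": [
        re.compile(r"\brm\s+-rf\b"),
        re.compile(r"(?i)\bos\.system\s*\("),
        re.compile(r"(?i)\bsubprocess\.(?:run|Popen|call)\s*\("),
    ],
}


def detect_unusual_prompt(text: str) -> List[str]:
    """Return each category with at least one matching rule, in rule-table order."""
    return [
        category
        for category, patterns in UNUSUAL_PROMPT_RULES.items()
        if any(p.search(text) for p in patterns)
    ]
```

Fixed patterns are easy to evade by rephrasing, so a rule set like this works best as a fast first filter rather than the only check.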

5. DetectSafeUnsafePrompt

Purpose: Blocks prompts that attempt to request unsafe or inappropriate content across multiple models, ensuring adherence to ethical and legal guidelines.

Capabilities:
DetectSafeUnsafePrompt identifies and blocks unsafe or harmful prompts in 13 categories and handles both text and image inputs. It helps ensure that communication and content generation comply with organizational policies, ethical standards, and regulatory requirements.

Supported Categories:

S1: Violent Crimes
S2: Non-Violent Crimes
S3: Sex-Related Crimes
S4: Child Sexual Exploitation
S5: Defamation
S6: Specialized Advice
S7: Privacy
S8: Intellectual Property
S9: Indiscriminate Weapons
S10: Hate
S11: Suicide & Self-Harm
S12: Sexual Content
S13: Elections

Example Block Messages: ⚠️ Safety check failed: Your request contains potentially harmful, unsafe, or inappropriate content.

  1. Text Prompt Example:
  2. Image Prompt Example:
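When the check is backed by Llama Guard, the application has to turn the model's verdict into an allow or block decision. The sketch below assumes the deployed model answers in the Llama Guard 3 style: `safe`, or `unsafe` followed by a line of comma-separated category codes such as `S1,S10`. Verify the output format of the model version you deploy. `parse_verdict` and `block_message` are hypothetical helpers, and unknown codes are passed through unchanged.

```python
from typing import List, Optional, Tuple

# Category codes and names as listed above.
CATEGORIES = {
    "S1": "Violent Crimes",
    "S2": "Non-Violent Crimes",
    "S3": "Sex-Related Crimes",
    "S4": "Child Sexual Exploitation",
    "S5": "Defamation",
    "S6": "Specialized Advice",
    "S7": "Privacy",
    "S8": "Intellectual Property",
    "S9": "Indiscriminate Weapons",
    "S10": "Hate",
    "S11": "Suicide & Self-Harm",
    "S12": "Sexual Content",
    "S13": "Elections",
}

BLOCK_MESSAGE = (
    "⚠️ Safety check failed: Your request contains potentially harmful, "
    "unsafe, or inappropriate content."
)


def parse_verdict(raw: str) -> Tuple[bool, List[str]]:
    """Parse 'safe', or 'unsafe' plus a line like 'S1,S10', into (is_safe, names)."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty verdict")
    label = lines[0].lower()
    if label == "safe":
        return True, []
    if label != "unsafe":
        raise ValueError(f"unexpected verdict: {lines[0]!r}")
    codes: List[str] = []
    if len(lines) > 1:
        codes = [code.strip() for code in lines[1].split(",") if code.strip()]
    # Unknown codes are kept verbatim rather than silently dropped.
    return False, [CATEGORIES.get(code, code) for code in codes]


def block_message(raw: str) -> Optional[str]:
    """Return the block message for an unsafe verdict, otherwise None."""
    is_safe, _ = parse_verdict(raw)
    return None if is_safe else BLOCK_MESSAGE
```

Raising on an unexpected verdict (rather than treating it as safe) keeps the check fail-closed if the model output format changes.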

Note: For DetectSafeUnsafePrompt to function correctly, ensure that Llama Guard is deployed and its URL is correctly set in the .env file. Add the following line to your .env file:

LLAMA_GUARD_URL=http://localhost:8888

Adjust the URL as needed based on your deployment configuration. For detailed deployment instructions, please refer to the Llama Guard Deployment Guide.
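The application needs to read `LLAMA_GUARD_URL` before it calls Llama Guard. Projects commonly load `.env` files with a library such as python-dotenv; the dependency-free sketch below instead parses simple `KEY=VALUE` lines itself and validates the URL. The precedence rule (a value already in the process environment wins over the `.env` file) and the helper names `parse_env` and `llama_guard_url` are assumptions for illustration.

```python
import os
from typing import Dict, Mapping, Optional
from urllib.parse import urlparse


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped.

    Deliberately minimal: no 'export' prefixes, escapes, or multi-line values.
    """
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, if any.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def llama_guard_url(env_text: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return LLAMA_GUARD_URL without a trailing slash.

    Assumption: a value already present in the process environment takes
    precedence over the .env file contents.
    """
    env = os.environ if environ is None else environ
    url = env.get("LLAMA_GUARD_URL") or parse_env(env_text).get("LLAMA_GUARD_URL")
    if not url:
        raise RuntimeError("LLAMA_GUARD_URL is not set")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"LLAMA_GUARD_URL is not a valid http(s) URL: {url!r}")
    return url.rstrip("/")
```

For the `.env` line shown above, `llama_guard_url(open(".env").read())` returns `http://localhost:8888`, provided the variable is not already set in the process environment.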