Security Validations
1. Detect Secrets
Purpose: Prevents accidental sharing of sensitive credentials and keys.
What It Detects:
- API Keys
- Authentication Tokens
- Passwords
- SSH Keys
- Database Connection Strings
Example Block Message: When the safeguards system detects secrets, the following message is displayed to the user: "🔐 -> Potential sensitive information detected: Please ensure you're not sharing any confidential data, passwords, or access keys."
Figure 1: Secret Detection Block Message
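As an illustration of how this kind of check can work, here is a minimal pattern-based sketch. It is not the safeguards system's actual implementation: the pattern set, the names `SECRET_PATTERNS`, `find_secrets`, and `check_prompt` are all hypothetical, and a production detector would use a much larger rule set.

```python
import re

# Hypothetical, deliberately small pattern set for illustration only.
SECRET_PATTERNS = {
    # AWS access key IDs start with "AKIA" followed by 16 uppercase letters/digits.
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM headers used by SSH and other private keys.
    "private_key": re.compile(r"-----BEGIN (?:RSA |OPENSSH |EC )?PRIVATE KEY-----"),
    # Simple "password = value" style assignments.
    "password_assignment": re.compile(r"(?i)\b(?:password|passwd|pwd)\s*[:=]\s*\S+"),
    # Connection strings with embedded credentials, e.g. postgres://user:pw@host/db.
    "db_connection_string": re.compile(
        r"\b(?:postgres(?:ql)?|mysql|mongodb)://[^\s:@/]+:[^\s@/]+@\S+"
    ),
}

BLOCK_MESSAGE = (
    "🔐 -> Potential sensitive information detected: Please ensure you're not "
    "sharing any confidential data, passwords, or access keys."
)


def find_secrets(text):
    """Return the names of all patterns that match somewhere in the text."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]


def check_prompt(text):
    """Return (block_message_or_None, matched_pattern_names)."""
    hits = find_secrets(text)
    return (BLOCK_MESSAGE if hits else None, hits)
```

Calling `check_prompt` on a prompt returns the block message only when at least one pattern matches, so the caller can decide whether to forward the prompt.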
2. Detect PII (Personally Identifiable Information)
Purpose: Protects personal and sensitive information from exposure.
What It Detects:
Personal Information
- 📝 Social Security Numbers (SSN)
- 💳 Credit Card Numbers
- 📧 Email Addresses
- 📱 Phone Numbers (International formats)
- 🏠 Physical Addresses
- 🛂 Passport Numbers
- 🚗 Driver's License Numbers
- 📅 Birth Dates
- 👤 Person Names (First, Middle, Last)
Financial Information
- 🏦 Bank Account Numbers
- 💰 IBAN Codes
- 💵 SWIFT Codes
- 💳 CVV Numbers
Medical Information
- 🏥 Medical License Numbers
- 📋 Medical Record Numbers
Location Information
- 📍 IP Addresses
- 📫 ZIP/Postal Codes
- 🌍 GPS Coordinates
- 🏢 Location Identifiers
Government Identifiers
- 🪪 National ID Numbers
- 🏛️ Government Official Numbers
Digital Identifiers
- 💻 MAC Addresses
- 🌐 URLs containing personal info
- 📱 Device IDs
- 🔑 Cryptocurrency Addresses
Professional Information
- 👔 Employee Numbers
- 🏢 Corporate Email Patterns
Cultural Identifiers
- 🌍 Nationality
- 🗣️ Ethnicity
- ⛪ Religious Identifiers
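A few of the categories above lend themselves to simple pattern matching. The sketch below covers only email addresses, US-style SSNs, and credit card numbers (validated with the Luhn checksum). It is an assumption-laden illustration, not the detector the safeguards system uses: the regular expressions are simplified and the function names are hypothetical. Categories such as person names or ethnicity generally need a trained entity recognizer rather than regular expressions.

```python
import re

# Simplified, illustrative patterns; real-world formats vary more than this.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(number):
    """Check a candidate card number with the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) >= 13 and total % 10 == 0


def find_pii(text):
    """Return the set of PII categories found in the text."""
    found = set()
    if EMAIL_RE.search(text):
        found.add("email")
    if SSN_RE.search(text):
        found.add("ssn")
    if any(luhn_valid(match.group()) for match in CARD_RE.finditer(text)):
        found.add("credit_card")
    return found
```

The Luhn step is there to reduce false positives: an arbitrary 16-digit order number usually fails the checksum, while a real card number passes it.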
Example Block Message:
When the safeguards system detects personally identifiable information, the following message is displayed to the user:
"🔒 -> Personal information detected: For your privacy and security, please avoid sharing sensitive information."
Example Warn Message:
3. Sentiment Analysis
Purpose: Maintains professional communication standards and prevents harmful content.
Monitors For:
- Hostile Language
- Inappropriate Content
- Unprofessional Tone
- Harassment
- Discriminatory Language
Threshold Settings:
- Low Risk (0.3): Minor unprofessional language
- Medium Risk (0.6): Concerning tone or content
- High Risk (0.8): Severe violations
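The threshold settings can be read as a mapping from a numeric score to a risk level. The sketch below assumes that scores fall in the range 0.0 to 1.0 and that each threshold is an inclusive lower bound; neither assumption is confirmed by this document, and the function name `risk_level` is hypothetical.

```python
# Thresholds from the settings above, checked from highest to lowest.
# Assumption: each value is an inclusive lower bound on a 0.0-1.0 score.
THRESHOLDS = [
    (0.8, "high"),
    (0.6, "medium"),
    (0.3, "low"),
]


def risk_level(score):
    """Map a sentiment risk score to 'high', 'medium', 'low', or 'none'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")
    for lower_bound, label in THRESHOLDS:
        if score >= lower_bound:
            return label
    return "none"
```

Under these assumptions, a score of 0.65 would be classified as medium risk and a score below 0.3 would not be flagged at all.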
4. Unusual Prompt Detection
Purpose: Identifies potentially harmful or suspicious requests.
Monitors For:
- Code Injection Attempts
- Prompt Engineering Attacks
- System Command Requests
- Policy Violation Attempts
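To make these categories concrete, here is a minimal heuristic sketch that flags a handful of well-known suspicious phrasings. It is only an illustration: the pattern list and the names `SUSPICIOUS_PATTERNS` and `flag_unusual` are hypothetical, and the actual detector may rely on a trained model rather than fixed patterns.

```python
import re

# Hypothetical examples of suspicious phrasings; far from exhaustive.
SUSPICIOUS_PATTERNS = {
    # Classic instruction-override phrasing used in prompt attacks.
    "prompt_injection": re.compile(
        r"(?i)\bignore (?:all |any )?(?:previous|prior|above) instructions\b"
    ),
    # Requests that embed shell commands.
    "system_command": re.compile(
        r"(?i)\b(?:rm\s+-rf|sudo\s+\S+|curl\s+\S+\s*\|\s*(?:sh|bash))"
    ),
    # Script tags and SQL fragments typical of injection payloads.
    "code_injection": re.compile(r"(?i)<script\b|;\s*drop\s+table\b"),
}


def flag_unusual(prompt):
    """Return a sorted list of the suspicious categories the prompt matches."""
    return sorted(
        name for name, pattern in SUSPICIOUS_PATTERNS.items() if pattern.search(prompt)
    )
```

A non-empty result could be used to block the prompt or route it for review; an empty list means none of these particular heuristics fired, not that the prompt is safe.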
5. DetectSafeUnsafePrompt
Purpose: Blocks prompts that attempt to request unsafe or inappropriate content across multiple models, ensuring adherence to ethical and legal guidelines.
Capabilities:
DetectSafeUnsafePrompt is a robust system that identifies and blocks unsafe or harmful prompts across 13 categories, including both text-based and image-based inputs. It ensures that communication and content generation comply with organizational policies, ethical standards, and regulatory requirements.
Supported Categories:
S1: Violent Crimes
S2: Non-Violent Crimes
S3: Sex-Related Crimes
S4: Child Sexual Exploitation
S5: Defamation
S6: Specialized Advice
S7: Privacy
S8: Intellectual Property
S9: Indiscriminate Weapons
S10: Hate
S11: Suicide & Self-Harm
S12: Sexual Content
S13: Elections
Example Block Message: "⚠️ Safety check failed: Your request contains potentially harmful, unsafe, or inappropriate content."
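The category codes above can be turned into readable names when handling a model verdict. The sketch below assumes the model replies with `safe`, or with `unsafe` followed by a line of comma-separated category codes (for example `S1,S10`). That response format is an assumption about the deployed Llama Guard model and should be checked against your deployment; `parse_verdict` is a hypothetical helper name.

```python
# Category codes and names as listed in this document.
CATEGORIES = {
    "S1": "Violent Crimes",
    "S2": "Non-Violent Crimes",
    "S3": "Sex-Related Crimes",
    "S4": "Child Sexual Exploitation",
    "S5": "Defamation",
    "S6": "Specialized Advice",
    "S7": "Privacy",
    "S8": "Intellectual Property",
    "S9": "Indiscriminate Weapons",
    "S10": "Hate",
    "S11": "Suicide & Self-Harm",
    "S12": "Sexual Content",
    "S13": "Elections",
}


def parse_verdict(raw):
    """Parse an assumed 'safe' / 'unsafe\\nS1,S10' reply into (is_safe, category_names)."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty verdict")
    verdict = lines[0].lower()
    if verdict == "safe":
        return True, []
    if verdict != "unsafe":
        raise ValueError(f"unexpected verdict: {lines[0]!r}")
    codes = lines[1].split(",") if len(lines) > 1 else []
    # Unknown codes are passed through unchanged rather than dropped.
    names = [CATEGORIES.get(code.strip(), code.strip()) for code in codes if code.strip()]
    return False, names
```

When the first element of the result is `False`, the caller would show the block message and could log the category names for auditing.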
Note: For DetectSafeUnsafePrompt to function correctly, ensure that Llama Guard is deployed and its URL is correctly set in the .env file. Add the following line to your .env file:
Adjust the URL as needed based on your deployment configuration. For detailed deployment instructions, please refer to the Llama Guard Deployment Guide.