Security guide
About This Guide
This security guide focuses on iGuard, InspectChat's comprehensive safeguard system designed to protect your organization's data and ensure secure AI interactions. This guide explains the various security measures and how they work to protect your communications.
Understanding iGuard
iGuard is InspectChat's built-in security system that acts as a protective layer between users and AI models. It automatically scans all communications for potential security risks, sensitive information, and policy violations.
Key Functions
- Real-time message scanning
- Automatic detection of sensitive information
- Policy enforcement
- Compliance monitoring
- Immediate blocking of risky communications
Security Validations
1. Detect Secrets
Purpose: Prevents accidental sharing of sensitive credentials and keys.
What It Detects:
- API Keys
- Authentication Tokens
- Passwords
- SSH Keys
- Database Connection Strings
Example Block Message:
When iGuard detects secrets, the following message is displayed to the user:
"🔐 -> Potential sensitive information detected: Please ensure you're not sharing any confidential data, passwords, or access keys."
Figure 1: Secret Detection Block Message
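The kind of check described above can be sketched with a few regular expressions. This is a minimal illustration only: the pattern names, the patterns themselves, and the `find_secrets` helper are assumptions made for this example and do not describe how iGuard detects secrets internally.

```python
import re

# Hypothetical patterns for illustration; iGuard's real rules are not documented here.
SECRET_PATTERNS = {
    # AWS-style access key ID: "AKIA" followed by 16 uppercase letters or digits
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM header of a private key (covers common SSH key formats)
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |OPENSSH |EC )?PRIVATE KEY-----"),
    # Connection string with embedded user:password credentials
    "db_connection_string": re.compile(
        r"\b(?:postgres(?:ql)?|mysql|mongodb)://[^\s:@/]+:[^\s@/]+@\S+"
    ),
    # Inline assignment such as "password = ..." or "password: ..."
    "password_assignment": re.compile(r"(?i)\bpassword\s*[=:]\s*\S+"),
}


def find_secrets(message: str) -> list[str]:
    """Return the names of all patterns that match somewhere in the message."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(message)]


if __name__ == "__main__":
    print(find_secrets("connect with postgres://admin:hunter2@db.internal:5432/app"))
```

Pattern matching of this kind is cheap enough to run on every message, but it produces false positives and misses unfamiliar key formats, so a production system would likely combine it with other signals.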
2. Detect PII (Personally Identifiable Information)
Purpose: Protects personal and sensitive information from exposure.
What It Detects:
Personal Information
- 📝 Social Security Numbers (SSN)
- 💳 Credit Card Numbers
- 📧 Email Addresses
- 📱 Phone Numbers (International formats)
- 🏠 Physical Addresses
- 🛂 Passport Numbers
- 🚗 Driver's License Numbers
- 📅 Birth Dates
- 👤 Person Names (First, Middle, Last)
Financial Information
- 🏦 Bank Account Numbers
- 💰 IBAN Codes
- 💵 Swift Codes
- 💳 CVV Numbers
Medical Information
- 🏥 Medical License Numbers
- 📋 Medical Record Numbers
Location Information
- 📍 IP Addresses
- 📫 ZIP/Postal Codes
- 🌍 GPS Coordinates
- 🏢 Location Identifiers
Government Identifiers
- 🪪 National ID Numbers
- 🏛️ Government Official Numbers
Digital Identifiers
- 💻 MAC Addresses
- 🌐 URLs containing personal info
- 📱 Device IDs
- 🔑 Cryptocurrency Addresses
Professional Information
- 👔 Employee Numbers
- 🏢 Corporate Email Patterns
Cultural Identifiers
- 🌍 Nationality
- 🗣️ Ethnicity
- ⛪ Religious Identifiers
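A few of the categories above lend themselves to simple pattern checks. The sketch below covers only email addresses, US-style Social Security Numbers, and credit card numbers (validated with the Luhn checksum). The patterns and the `find_pii` helper are assumptions for illustration; the guide does not state how iGuard implements PII detection, and categories such as names or ethnicity cannot be found with patterns like these.

```python
import re

# Illustrative patterns only; not iGuard's actual detection rules.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Check a candidate card number with the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_pii(message: str) -> set[str]:
    """Return the set of PII categories found in the message."""
    found = set()
    if EMAIL_RE.search(message):
        found.add("email")
    if SSN_RE.search(message):
        found.add("ssn")
    if any(luhn_valid(match.group()) for match in CARD_RE.finditer(message)):
        found.add("credit_card")
    return found


if __name__ == "__main__":
    print(find_pii("Reach me at jane.doe@example.com, SSN 123-45-6789"))
```

The Luhn step is there to reduce false positives: a random 16-digit order number will usually fail the checksum, while a real card number passes it.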
Example Block Message:
When iGuard detects personally identifiable information, the following message is displayed to the user:
"🔒 -> Personal information detected: For your privacy and security, please avoid sharing sensitive information."
Example Warn Message:
3. Sentiment Analysis
Purpose: Maintains professional communication standards and prevents harmful content.
Monitors For:
- Hostile Language
- Inappropriate Content
- Unprofessional Tone
- Harassment
- Discriminatory Language
Threshold Settings:
- Low Risk (0.3): Minor unprofessional language
- Medium Risk (0.6): Concerning tone or content
- High Risk (0.8): Severe violations
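One way to read these threshold settings is as lower bounds on a score between 0 and 1. The sketch below makes that assumption explicit: a score at or above a cutoff falls into that risk level, and a validation fails when the score reaches the configured threshold. Both the comparison direction and the `risk_level` and `fails_validation` helpers are assumptions for illustration, not documented iGuard behavior.

```python
# Cutoffs taken from the guide, ordered from most to least severe.
RISK_LEVELS = [(0.8, "high"), (0.6, "medium"), (0.3, "low")]


def risk_level(score: float) -> str:
    """Map a score in [0, 1] to a risk label (assumes cutoffs are inclusive lower bounds)."""
    for cutoff, label in RISK_LEVELS:
        if score >= cutoff:
            return label
    return "none"


def fails_validation(score: float, threshold: float) -> bool:
    """Assume a validation fails when the score reaches the configured threshold."""
    return score >= threshold


if __name__ == "__main__":
    for score in (0.1, 0.45, 0.6, 0.85):
        print(score, risk_level(score), fails_validation(score, threshold=0.5))
```

Under this reading, a configured threshold of 0.5 would let low-risk messages scoring below 0.5 through while acting on everything from that point upward.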
4. Unusual Prompt Detection
Purpose: Identifies potentially harmful or suspicious requests.
Monitors For:
- Code Injection Attempts
- Prompt Engineering Attacks
- System Command Requests
- Policy Violation Attempts
5. DetectSafeUnsafePrompt
Purpose: Blocks prompts that attempt to request unsafe or inappropriate content across multiple models, ensuring adherence to ethical and legal guidelines.
Capabilities:
DetectSafeUnsafePrompt is a robust system that identifies and blocks unsafe or harmful prompts across 13 categories, including both text-based and image-based inputs. It ensures that communication and content generation comply with organizational policies, ethical standards, and regulatory requirements.
Supported Categories:
S1: Violent Crimes
S2: Non-Violent Crimes
S3: Sex-Related Crimes
S4: Child Sexual Exploitation
S5: Defamation
S6: Specialized Advice
S7: Privacy
S8: Intellectual Property
S9: Indiscriminate Weapons
S10: Hate
S11: Suicide & Self-Harm
S12: Sexual Content
S13: Elections
Example Block Message:
"⚠️ Safety check failed: Your request contains potentially harmful, unsafe, or inappropriate content."
Note: For DetectSafeUnsafePrompt to function correctly, ensure that Llama Guard is deployed and that its URL is correctly set in the .env file. Adjust the URL as needed based on your deployment configuration. For detailed deployment instructions, please refer to the Llama Guard Deployment Guide.
Configuring Safeguards
Administrators can customize iGuard settings:
- Enable/Disable Validations: Control which checks are active.
- Set Thresholds: Adjust sensitivity levels.
- On Fail Actions: Define system responses (block or warn).
These settings are changed in real time through declarative configuration files, without restarting the system, so administrators can adapt immediately to new policies or threats.
Declarative Configuration for iGuard
iGuard configuration is defined in a human-readable YAML file. Changes to this file are applied in real time, so administrators can adjust validations on the fly.
Here's an example of how the configuration can be set:
validations:
  - name: DetectSecrets
    enabled: True
    models:
      - OpenAI
      - Gemini
    on_fail: block
  - name: DetectPII
    enabled: True
    models:
      - OpenAI
      - Gemini
    on_fail: block
    mode: permissive
  - name: Sentiment
    enabled: True
    models:
      - OpenAI
      - Gemini
    on_fail: block
    threshold: 0.5
  - name: DetectUnusualPrompt
    enabled: True
    models:
      - OpenAI
      - Gemini
    on_fail: block
  - name: DetectSafeUnsafePrompt
    enabled: True
    models:
      - OpenAI
      - Gemini
    on_fail: block
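To show how a configuration of this shape might be consumed, the sketch below works on a Python dictionary that mirrors part of the YAML above, as a YAML parser would produce it. The `validate_entry` and `active_validations` helpers are hypothetical, and the rule that `threshold` must lie between 0 and 1 is an assumption; the guide does not describe iGuard's own loader.

```python
# A dictionary mirroring two entries of the YAML example (what a YAML parser would return).
CONFIG = {
    "validations": [
        {"name": "DetectSecrets", "enabled": True,
         "models": ["OpenAI", "Gemini"], "on_fail": "block"},
        {"name": "Sentiment", "enabled": True,
         "models": ["OpenAI", "Gemini"], "on_fail": "block", "threshold": 0.5},
    ]
}

# The guide names two on_fail actions: block and warn.
ALLOWED_ON_FAIL = {"block", "warn"}
REQUIRED_KEYS = ("name", "enabled", "models", "on_fail")


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one validation entry (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]
    if "on_fail" in entry and entry["on_fail"] not in ALLOWED_ON_FAIL:
        problems.append(f"unsupported on_fail: {entry['on_fail']}")
    # Assumption: thresholds are scores in the range 0..1.
    if "threshold" in entry and not 0.0 <= entry["threshold"] <= 1.0:
        problems.append(f"threshold out of range: {entry['threshold']}")
    return problems


def active_validations(config: dict, model: str) -> list[str]:
    """Names of the enabled validations that apply to the given model."""
    return [
        entry["name"]
        for entry in config.get("validations", [])
        if entry.get("enabled") and model in entry.get("models", [])
    ]


if __name__ == "__main__":
    print(active_validations(CONFIG, "OpenAI"))
```

Checking entries before applying them matters more when configuration is reloaded live, since a typo in `on_fail` would otherwise take effect immediately.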
Response Actions
The on_fail Parameter
Determines how the system responds when a validation fails:
- Block: Stops the request and notifies the user.
- Warn: Allows the request but issues a warning.
Block Mode
- Immediately stops the message
- Displays error message
- Logs the incident
Warn Mode
- Shows a warning message to the user
- Logs the warning
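The difference between the two modes can be sketched as a small dispatch function: both modes show a message and write a log entry, but only warn mode lets the request continue. The `Outcome` type, the `handle_failure` function, and the logger name are hypothetical and serve only to illustrate the behavior described above.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("iguard_sketch")  # hypothetical logger name


@dataclass
class Outcome:
    forwarded: bool      # True if the request still goes to the AI model
    user_message: str    # text displayed to the user


def handle_failure(validation_name: str, on_fail: str, user_message: str) -> Outcome:
    """Apply the configured on_fail action for a failed validation."""
    if on_fail == "block":
        # Block mode: stop the message, show the error, log the incident.
        logger.warning("Blocked by %s", validation_name)
        return Outcome(forwarded=False, user_message=user_message)
    if on_fail == "warn":
        # Warn mode: allow the request, show the warning, log it.
        logger.warning("Warning from %s", validation_name)
        return Outcome(forwarded=True, user_message=user_message)
    raise ValueError(f"unsupported on_fail action: {on_fail}")


if __name__ == "__main__":
    print(handle_failure("DetectPII", "warn", "Personal information detected"))
```

Raising an error for an unknown action, rather than silently allowing the request, keeps a misconfigured entry from weakening protection unnoticed.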