ProbeLLM: LLM Vulnerability Detection Toolkit
ProbeLLM is a modular Python toolkit for automated LLM vulnerability detection using Monte Carlo Tree Search (MCTS). It enables researchers and developers to systematically probe language models for edge cases, failure modes, and unexpected behaviors.
Note
ProbeLLM is designed as a library-first toolkit. Every component can be used independently or composed together for complex testing workflows.
Key Features
- Intelligent Test Generation
  Uses MCTS-guided search to automatically discover failure cases in LLMs
- MCP-Based Tool System
  Extensible tool registry following the Model Context Protocol (MCP) for custom test generation strategies
- Multi-Strategy Search
  Micro search: local perturbations around known failures
  Macro search: exploration of distant semantic spaces
- Modular Architecture
  Independent components (tools, search engine, validators, data loaders)
  Mix and match components for your use case
  Easy to extend with custom tools and datasets
- Built-in Analysis
  Automated failure clustering (PCA + DBSCAN); a clustering sketch follows this list
  Statistical benchmarking across datasets
  Visualization of search trees and failure patterns
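To make the built-in clustering concrete, here is a minimal sketch of PCA + DBSCAN failure clustering with scikit-learn. It is illustrative only and not ProbeLLM's internal code: the function name, embedding dimensionality, and DBSCAN parameters are assumptions you would tune for your own data.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA

def cluster_failures(embeddings: np.ndarray) -> np.ndarray:
    """Project failure embeddings to 2D, then group them into clusters."""
    reduced = PCA(n_components=2).fit_transform(embeddings)
    # eps and min_samples are placeholders; tune them for your embedding space
    labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(reduced)
    return labels  # label -1 marks noise, i.e. failures outside any cluster

# Example with 20 random stand-in embeddings of dimension 384
print(cluster_failures(np.random.rand(20, 384)))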
Quick Example
As a Library:
from probellm import VulnerabilityPipelineAsync
from probellm.tools import build_default_tool_registry
# Create pipeline with custom tool registry
registry = build_default_tool_registry(model="gpt-4o-mini")
pipeline = VulnerabilityPipelineAsync(
    model_name="gpt-4o-mini",
    test_model="gpt-4o-mini",   # Model under test
    judge_model="gpt-4o-mini",
    num_simulations=100,
    num_samples=10,
    tool_registry=registry      # Inject custom tools
)
# Add datasets to test
pipeline.add_datasets_batch(['mmlu', 'hellaswag', 'truthful_qa'])
# Run vulnerability search
pipeline.run()
# Results saved to: results/run_<timestamp>/
As a CLI:
# Validate configuration before running
python validate_config.py
# Run search with default configuration
python -m probellm.search
# Analyze results
python -m probellm.analysis results/run_20260126_123456/
Custom Tools Example:
from probellm.tools import ToolRegistry, LocalMCPTool
# Define custom tool
def my_custom_tool(arguments: dict) -> dict:
    query = arguments.get("query", "")
    # Your custom logic here
    return {"result": f"Processed: {query}"}

spec = {
    "name": "my_tool",
    "description": "My custom vulnerability probe",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"]
    }
}
registry = ToolRegistry()
registry.register(LocalMCPTool(spec, my_custom_tool))
# Use in test generation
result = registry.call_tool("my_tool", {"query": "test input"})
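Because inputSchema is standard JSON Schema, tool arguments can be checked before dispatch. The snippet below continues the example above using the third-party jsonschema package; this validation step is an optional add-on shown for illustration, not something ProbeLLM's registry is documented to do.
from jsonschema import ValidationError, validate

arguments = {"query": "test input"}
try:
    # Check the arguments against the tool's declared inputSchema
    validate(instance=arguments, schema=spec["inputSchema"])
except ValidationError as err:
    print(f"Invalid tool arguments: {err.message}")
else:
    result = registry.call_tool("my_tool", arguments)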
Why ProbeLLM?
Traditional LLM testing relies on static benchmarks, which:
❌ Don't adapt to model improvements
❌ Miss edge cases outside predefined test sets
❌ Provide limited insight into failure patterns
ProbeLLM solves this by:
✅ Dynamically generating adversarial test cases during search
✅ Focusing computational budget on promising failure regions via MCTS (see the selection sketch below)
✅ Clustering failures to identify systematic weaknesses
✅ Providing extensible tools for domain-specific testing
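The budget-focusing behavior comes from MCTS's selection rule: at each step the search picks the child node with the best trade-off between exploiting known failure regions and exploring rarely visited ones. The sketch below shows a generic UCB1 rule to illustrate the idea; it is not ProbeLLM's actual node type or scoring function.
import math

def ucb1(child_value: float, child_visits: int, parent_visits: int, c: float = 1.4) -> float:
    """Generic UCB1 score: average reward plus an exploration bonus."""
    if child_visits == 0:
        return float("inf")  # always try unvisited children first
    exploit = child_value / child_visits  # e.g. observed failure rate of this branch
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore

def select_child(children, parent_visits: int):
    # children: iterable of (total_value, visit_count, payload) tuples
    return max(children, key=lambda ch: ucb1(ch[0], ch[1], parent_visits))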
Use Cases
Model Evaluation: Systematically probe for weaknesses before deployment
Adversarial Testing: Generate challenging edge cases for robustness testing
Benchmark Augmentation: Expand existing test suites with model-specific failures
Research: Study failure modes and develop mitigation strategies
Documentation Structure
Getting Started
Core Modules
Additional Resources
Community
Documentation: (Available after review)
Issues: (Available after review)
Contributing: See contributing
License
MIT License - see LICENSE file for details.