ProbeLLM: LLM Vulnerability Detection Toolkit

Python 3.8+ | MIT License

ProbeLLM is a modular Python toolkit for automated LLM vulnerability detection using Monte Carlo Tree Search (MCTS). It enables researchers and developers to systematically probe language models for edge cases, failure modes, and unexpected behaviors.

Note

ProbeLLM is designed as a library-first toolkit. Every component can be used independently or composed together for complex testing workflows.

Key Features

🎯 Intelligent Test Generation

Uses MCTS-guided search to automatically discover failure cases in LLMs

πŸ”§ MCP-Based Tool System

Extensible tool registry following Model Context Protocol (MCP) for custom test generation strategies

πŸ“Š Multi-Strategy Search
  • Micro search: Local perturbations around known failures

  • Macro search: Exploration of distant semantic spaces
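
For intuition, the two strategies can be pictured roughly as follows. This is an illustrative sketch with hypothetical function names, not ProbeLLM's internal search code:

import random
from typing import List

def micro_perturb(prompt: str) -> str:
    """Micro search: make a small local edit to a prompt that already failed."""
    words = prompt.split()
    if words:
        i = random.randrange(len(words))
        words[i] = words[i].swapcase()  # tiny lexical change near a known failure
    return " ".join(words)

def macro_explore(prompt: str, distant_topics: List[str]) -> str:
    """Macro search: reframe the prompt in a distant semantic space."""
    return f"In the context of {random.choice(distant_topics)}: {prompt}"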

πŸ”Œ Modular Architecture
  • Independent components (tools, search engine, validators, data loaders)

  • Mix and match components for your use case

  • Easy to extend with custom tools and datasets

πŸ“ˆ Built-in Analysis
  • Automated failure clustering (PCA + DBSCAN; sketched after this list)

  • Statistical benchmarking across datasets

  • Visualization of search trees and failure patterns
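
The clustering step can be approximated with scikit-learn. The snippet below is a rough sketch of the PCA + DBSCAN idea with illustrative parameters; it is not ProbeLLM's actual analysis code, and the embedding of failure cases is assumed to happen elsewhere:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

def cluster_failures(embeddings: np.ndarray) -> np.ndarray:
    # Reduce failure-case embeddings with PCA, then group them with DBSCAN.
    reduced = PCA(n_components=10).fit_transform(embeddings)
    labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(reduced)
    return labels  # -1 marks noise; other labels are candidate systematic weaknesses

# Example: 200 failure cases embedded in a 384-dimensional space
labels = cluster_failures(np.random.rand(200, 384))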

Quick Example

As a Library:

from probellm import VulnerabilityPipelineAsync
from probellm.tools import build_default_tool_registry

# Create pipeline with custom tool registry
registry = build_default_tool_registry(model="gpt-4o-mini")

pipeline = VulnerabilityPipelineAsync(
    model_name="gpt-4o-mini",
    test_model="gpt-4o-mini",  # Model under test
    judge_model="gpt-4o-mini",
    num_simulations=100,
    num_samples=10,
    tool_registry=registry  # Inject custom tools
)

# Add datasets to test
pipeline.add_datasets_batch(['mmlu', 'hellaswag', 'truthful_qa'])

# Run vulnerability search
pipeline.run()
# Results saved to: results/run_<timestamp>/

As a CLI:

# Validate configuration before running
python validate_config.py

# Run search with default configuration
python -m probellm.search

# Analyze results
python -m probellm.analysis results/run_20260126_123456/

Custom Tools Example:

from probellm.tools import ToolRegistry, LocalMCPTool

# Define custom tool
def my_custom_tool(arguments: dict) -> dict:
    query = arguments.get("query", "")
    # Your custom logic here
    return {"result": f"Processed: {query}"}

spec = {
    "name": "my_tool",
    "description": "My custom vulnerability probe",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"]
    }
}

registry = ToolRegistry()
registry.register(LocalMCPTool(spec, my_custom_tool))

# Use in test generation
result = registry.call_tool("my_tool", {"query": "test input"})
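
The registry can then be injected into the pipeline from the first example via its tool_registry parameter, so the custom tool becomes available during test generation:

from probellm import VulnerabilityPipelineAsync

pipeline = VulnerabilityPipelineAsync(
    model_name="gpt-4o-mini",
    test_model="gpt-4o-mini",
    judge_model="gpt-4o-mini",
    tool_registry=registry  # "my_tool" is now available to the search
)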

Why ProbeLLM?

Traditional LLM testing relies on static benchmarks, which:

  • ❌ Don’t adapt to model improvements

  • ❌ Miss edge cases outside predefined test sets

  • ❌ Provide limited insights into failure patterns

ProbeLLM solves this by:

  • βœ… Dynamically generating adversarial test cases during search

  • ✅ Focusing computational budget on promising failure regions via MCTS (sketched after this list)

  • βœ… Clustering failures to identify systematic weaknesses

  • βœ… Providing extensible tools for domain-specific testing
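
To make the MCTS point above concrete, the standard UCT/UCB1 selection rule below shows how a search budget gets steered toward promising regions. This is a generic illustration of the technique, not ProbeLLM's internal code:

import math

def ucb1(total_reward: float, visits: int, parent_visits: int, c: float = 1.4) -> float:
    # Unvisited children are tried first; otherwise balance exploitation
    # (average failure rate found so far) against exploration.
    if visits == 0:
        return float("inf")
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

# The child node with the highest UCB1 score receives the next simulation.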

Use Cases

  • Model Evaluation: Systematically probe for weaknesses before deployment

  • Adversarial Testing: Generate challenging edge cases for robustness testing

  • Benchmark Augmentation: Expand existing test suites with model-specific failures

  • Research: Study failure modes and develop mitigation strategies

Additional Resources

Community

  • πŸ“– Documentation: (Available after review)

  • πŸ’¬ Issues: (Available after review)

  • 🤝 Contributing: See the contributing guide

License

MIT License - see LICENSE file for details.
