ProbeLLM: LLM Vulnerability Detection Toolkit

Python 3.8+ | MIT License

ProbeLLM is a modular Python toolkit for automated LLM vulnerability detection using Monte Carlo Tree Search (MCTS). It enables researchers and developers to systematically probe language models for edge cases, failure modes, and unexpected behaviors.

Note

ProbeLLM is designed as a library-first toolkit. Every component can be used independently or composed together for complex testing workflows.

Key Features

🎯 Intelligent Test Generation

Uses MCTS-guided search to automatically discover failure cases in LLMs

πŸ”§ MCP-Based Tool System

Extensible tool registry following Model Context Protocol (MCP) for custom test generation strategies

πŸ“Š Multi-Strategy Search
  • Micro search: Local perturbations around known failures

  • Macro search: Exploration of distant semantic spaces
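
For intuition, the two strategies can be pictured roughly as follows. This is an illustrative sketch with hypothetical function names, not ProbeLLM's internal search code:

import random
from typing import List

def micro_perturb(prompt: str) -> str:
    """Micro search: make a small local edit to a prompt that already failed."""
    words = prompt.split()
    if words:
        i = random.randrange(len(words))
        words[i] = words[i].swapcase()  # tiny lexical change near a known failure
    return " ".join(words)

def macro_explore(prompt: str, distant_topics: List[str]) -> str:
    """Macro search: reframe the prompt in a distant semantic space."""
    return f"In the context of {random.choice(distant_topics)}: {prompt}"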

πŸ”Œ Modular Architecture
  • Independent components (tools, search engine, validators, data loaders)

  • Mix and match components for your use case

  • Easy to extend with custom tools and datasets

πŸ“ˆ Built-in Analysis
  • Automated failure clustering (PCA + DBSCAN; sketched after this list)

  • Statistical benchmarking across datasets

  • Visualization of search trees and failure patterns
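
The clustering step can be approximated with scikit-learn. The snippet below is a rough sketch of the PCA + DBSCAN idea with illustrative parameters; it is not ProbeLLM's actual analysis code, and the embedding of failure cases is assumed to happen elsewhere:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

def cluster_failures(embeddings: np.ndarray) -> np.ndarray:
    # Reduce failure-case embeddings with PCA, then group them with DBSCAN.
    reduced = PCA(n_components=10).fit_transform(embeddings)
    labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(reduced)
    return labels  # -1 marks noise; other labels are candidate systematic weaknesses

# Example: 200 failure cases embedded in a 384-dimensional space
labels = cluster_failures(np.random.rand(200, 384))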

Quick Example

As a Library:

from probellm import VulnerabilityPipelineAsync
from probellm.tools import build_default_tool_registry

# Create pipeline with custom tool registry
registry = build_default_tool_registry(model="gpt-4o-mini")

pipeline = VulnerabilityPipelineAsync(
    model_name="gpt-4o-mini",
    test_model="gpt-4o-mini",  # Model under test
    judge_model="gpt-4o-mini",
    num_simulations=100,
    num_samples=10,
    tool_registry=registry  # Inject custom tools
)

# Add datasets to test
pipeline.add_datasets_batch(['mmlu', 'hellaswag', 'truthful_qa'])

# Run vulnerability search
pipeline.run()
# Results saved to: results/run_<timestamp>/

As a CLI:

# Validate configuration before running
python validate_config.py

# Run search with default configuration
python -m probellm.search

# Analyze results
python -m probellm.analysis results/run_20260126_123456/

Custom Tools Example:

from probellm.tools import ToolRegistry, LocalMCPTool

# Define custom tool
def my_custom_tool(arguments: dict) -> dict:
    query = arguments.get("query", "")
    # Your custom logic here
    return {"result": f"Processed: {query}"}

spec = {
    "name": "my_tool",
    "description": "My custom vulnerability probe",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"]
    }
}

registry = ToolRegistry()
registry.register(LocalMCPTool(spec, my_custom_tool))

# Use in test generation
result = registry.call_tool("my_tool", {"query": "test input"})
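
The registry can then be injected into the pipeline from the first example via its tool_registry parameter, so the custom tool becomes available during test generation:

from probellm import VulnerabilityPipelineAsync

pipeline = VulnerabilityPipelineAsync(
    model_name="gpt-4o-mini",
    test_model="gpt-4o-mini",
    judge_model="gpt-4o-mini",
    tool_registry=registry  # "my_tool" is now available to the search
)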

Why ProbeLLM?

Traditional LLM testing relies on static benchmarks, which:

  • ❌ Don’t adapt to model improvements

  • ❌ Miss edge cases outside predefined test sets

  • ❌ Provide limited insights into failure patterns

ProbeLLM solves this by:

  • βœ… Dynamically generating adversarial test cases during search

  • ✅ Focusing computational budget on promising failure regions via MCTS (sketched after this list)

  • βœ… Clustering failures to identify systematic weaknesses

  • βœ… Providing extensible tools for domain-specific testing
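
To make the MCTS point above concrete, the standard UCT/UCB1 selection rule below shows how a search budget gets steered toward promising regions. This is a generic illustration of the technique, not ProbeLLM's internal code:

import math

def ucb1(total_reward: float, visits: int, parent_visits: int, c: float = 1.4) -> float:
    # Unvisited children are tried first; otherwise balance exploitation
    # (average failure rate found so far) against exploration.
    if visits == 0:
        return float("inf")
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

# The child node with the highest UCB1 score receives the next simulation.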

Use Cases

  • Model Evaluation: Systematically probe for weaknesses before deployment

  • Adversarial Testing: Generate challenging edge cases for robustness testing

  • Benchmark Augmentation: Expand existing test suites with model-specific failures

  • Research: Study failure modes and develop mitigation strategies

Additional Resources

Community

  • πŸ“– Documentation: (Available after review)

  • πŸ’¬ Issues: (Available after review)

  • 🤝 Contributing: See the contributing guide

License

MIT License - see LICENSE file for details.
