Architecture Overview
ProbeLLM follows a modular, extensible architecture where each component can operate independently or be composed into complex workflows.
High-Level Architecture
┌─────────────────────────────────────────────────────────────┐
│                      ProbeLLM Toolkit                       │
└─────────────────────────────────────────────────────────────┘
                               │
        ┌──────────────────────┼──────────────────────┐
        │                      │                      │
   ┌────▼─────┐          ┌─────▼──────┐         ┌─────▼──────┐
   │  Tools   │          │   Search   │         │ Validation │
   │  Layer   │◄─────────┤   Engine   │         │   System   │
   └────┬─────┘          └─────┬──────┘         └────────────┘
        │                      │
        │               ┌──────▼──────────┐
        │               │   Data Loader   │
        │               └─────────────────┘
        │
   ┌────▼──────────────────────────────┐
   │  MCP Tool Registry (Extensible)   │
   │  • perturbation   • python_exec   │
   │  • web_search     • custom_tool   │
   └───────────────────────────────────┘
Core Components
1. Tool Layer (MCP-Based)
Purpose: Provide pluggable test generation strategies
Location: probellm/tools/
Key Classes:
ToolRegistry: Central registry managing all available tools
LocalMCPTool: Wrapper conforming to MCP protocol
build_default_tool_registry(): Factory for built-in tools
Built-in Tools:
| Tool Name | Purpose | Use Case |
|---|---|---|
| perturbation | Semantic-preserving rewording | Micro-search (local exploration) |
| python_exec | Execute Python code for computation | Math/algorithmic questions |
| web_search | Retrieve external knowledge | Factual questions, macro-search |
Extension Point: Add custom tools by registering with ToolRegistry
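from probellm.tools import ToolRegistry, LocalMCPTool

def my_tool(args: dict) -> dict:
    # Handler: receives the tool arguments and returns a result dict
    return {"result": "custom output"}

# MCP-compatible spec; the remaining fields are elided here
spec = {"name": "my_tool", "description": "...", ...}
registry = ToolRegistry()
registry.register(LocalMCPTool(spec, my_tool))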
2. Search Engine (MCTS)
Purpose: Intelligently explore the space of test cases
Location: probellm/search.py
Key Classes:
VulnerabilityPipelineAsync: Main search orchestrator
RootNode: Represents a dataset root
SyntheticNode: Generated test case node
TestCaseGenerator: Uses tools to create new tests
AnswerGenerator: Generates ground truth for synthetic questions
Search Strategies:
MCTS Loop:
┌──────────────────────────────────────┐
│ 1. Selection (UCB1)                  │
│    → Choose most promising node      │
├──────────────────────────────────────┤
│ 2. Expansion (Tool Selection)        │
│    → Generate new test via tool      │
├──────────────────────────────────────┤
│ 3. Simulation (Model Inference)      │
│    → Test model on new case          │
├──────────────────────────────────────┤
│ 4. Backpropagation (Update Stats)    │
│    → Update visit counts & errors    │
└──────────────────────────────────────┘
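The selection and backpropagation steps in the loop above can be sketched with the standard UCB1 formula. This is a minimal illustration, not ProbeLLM's implementation: the `Node` class, its fields, and the exploration constant `c` are assumptions made for the example, and the reward is taken to be the observed error, following the "visit counts & errors" wording in the diagram.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; not ProbeLLM's RootNode or SyntheticNode."""
    name: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    errors: float = 0.0  # accumulated reward: how often the model failed here


def ucb1(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCB1 score: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    mean_error = child.errors / child.visits
    return mean_error + c * math.sqrt(math.log(parent_visits) / child.visits)


def select(node: Node) -> Node:
    """Descend to a leaf, following the highest-UCB1 child at each level."""
    while node.children:
        node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
    return node


def backpropagate(node: Optional[Node], error: float) -> None:
    """Add one visit and the observed error to the node and all its ancestors."""
    while node is not None:
        node.visits += 1
        node.errors += error
        node = node.parent


# Usage: two children, with one failure observed on "a"
root = Node("root")
a, b = Node("a", parent=root), Node("b", parent=root)
root.children = [a, b]
backpropagate(a, 1.0)  # the model failed on a
backpropagate(b, 0.0)  # the model passed on b
print(select(root).name)  # a
```

With equal visit counts the exploration bonus is identical for both children, so the child with the higher error rate is selected; as visits accumulate, rarely visited nodes regain priority through the bonus term.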
Dual-Strategy Search:
Micro: perturbation tool → stay in trust region around failures
Macro: web_search + greedy-k-center sampling → explore distant semantic spaces
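The greedy k-center sampling named in the macro strategy is commonly implemented as farthest-first traversal. The sketch below shows that general technique only; it assumes candidates are represented as plain float vectors compared with Euclidean distance, neither of which is specified here, and `greedy_k_center` is a hypothetical helper rather than a ProbeLLM function.

```python
import math
from typing import List, Sequence


def greedy_k_center(points: Sequence[Sequence[float]], k: int, first: int = 0) -> List[int]:
    """Return indices of up to k points chosen by farthest-first traversal."""
    if not points or k <= 0:
        return []
    k = min(k, len(points))
    chosen = [first]
    # min_dist[i] is the distance from point i to its nearest chosen center
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        if min_dist[nxt] == 0.0:
            break  # every remaining point coincides with a chosen center
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


# Usage: the near-duplicate of point 0 is skipped in favor of distant points
picked = greedy_k_center([(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (5.0, 0.0)], k=3)
print(picked)  # [0, 2, 3]
```

Each step picks the point farthest from everything already selected, which is why this style of sampling spreads out across the space instead of clustering near the starting failure.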
3. Data Loader
Purpose: Unified interface to benchmark datasets
Location: dataloader/
Key Components:
YAMLDatasetLoader: Reads datasets_config.yaml
DatasetInterface: Provides structured dataset access
HierarchicalSampler: Ensures balanced sampling across subsets
datasets_config.yaml: Declarative dataset configuration
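To make "balanced sampling across subsets" concrete, here is one possible allocation scheme. It is a sketch under assumptions, not the logic of HierarchicalSampler: `balanced_plan` is a hypothetical function that splits a sample budget evenly and redistributes whatever small subsets cannot supply.

```python
from typing import Dict


def balanced_plan(subset_sizes: Dict[str, int], total: int) -> Dict[str, int]:
    """Split `total` samples as evenly as possible across subsets,
    never requesting more items than a subset holds."""
    plan = {name: 0 for name in subset_sizes}
    remaining = min(total, sum(subset_sizes.values()))
    while remaining > 0:
        # Subsets that still have unsampled items
        open_subsets = [n for n in subset_sizes if plan[n] < subset_sizes[n]]
        share = max(1, remaining // len(open_subsets))
        for name in open_subsets:
            take = min(share, subset_sizes[name] - plan[name], remaining)
            plan[name] += take
            remaining -= take
            if remaining == 0:
                break
    return plan


# Usage: subset "b" has only 2 items, so its unused share goes to the others
print(balanced_plan({"a": 100, "b": 2, "c": 100}, total=10))  # {'a': 4, 'b': 2, 'c': 4}
```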
Supported Datasets (out of the box):
MMLU (5 subjects)
SuperGLUE (5 tasks)
HellaSwag
TruthfulQA
MBPP (code generation)
Custom Dataset Support: See Custom Datasets Guide
4. Validation System
Purpose: Pre-flight checks before expensive searches
Location: probellm/validate.py, validate_config.py
Checks:
| Check Type | Details |
|---|---|
| Dependencies | |
| Environment Variables | |
| YAML Schema | Valid dataset configuration structure |
| Hard-coded Secrets | Warns if API keys found in code (heuristic) |
Usage:
python validate_config.py --limit-datasets 5
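The kinds of checks listed above can be approximated with the standard library alone. The sketch below is illustrative and does not reproduce validate_config.py: the function names, the regular expression, and the module and variable names passed in are all assumptions chosen for the example.

```python
import importlib.util
import os
import re
from typing import Iterable, List, Mapping, Optional

# Heuristic (an assumption): a key-like name assigned a long quoted literal
SECRET_PATTERN = re.compile(
    r"""(api[_-]?key|secret|token)\s*=\s*["'][A-Za-z0-9_\-]{16,}["']""",
    re.IGNORECASE,
)


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def missing_env_vars(names: Iterable[str],
                     environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [n for n in names if not env.get(n)]


def find_suspect_lines(source: str) -> List[int]:
    """Return 1-based line numbers that look like hard-coded secrets."""
    return [i for i, line in enumerate(source.splitlines(), start=1)
            if SECRET_PATTERN.search(line)]


# Usage with made-up inputs
print(missing_env_vars(["A", "B"], {"A": "1"}))  # ['B']
print(find_suspect_lines('x = 1\napi_key = "sk_abcdefghijklmnop1234"\n'))  # [2]
```

A pattern like this will produce both false positives and false negatives, which is consistent with the check being described as a heuristic that only warns.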
Data Flow
Typical Search Flow:
┌────────────────────────────────────────────┐
│ User Script                                │
└──────┬─────────────────────────────────────┘
       │
┌──────▼─────────────────────────────────────┐
│ VulnerabilityPipelineAsync                 │
│   • add_datasets_batch()                   │
│   • run() → _run_concurrent()              │
└──────┬─────────────────────────────────────┘
       │
┌──────▼─────────────────────────────────────┐
│ DatasetInterface.load_dataset_structured   │
│   → Returns (dataset_index, sample_store)  │
└──────┬─────────────────────────────────────┘
       │
┌──────▼─────────────────────────────────────┐
│ HierarchicalSampler                        │
│   → Balanced sampling plan                 │
└──────┬─────────────────────────────────────┘
       │
┌──────▼─────────────────────────────────────┐
│ init_tree_async()                          │
│   → Find initial failures → MCTS root      │
└──────┬─────────────────────────────────────┘
       │
┌──────▼─────────────────────────────────────┐
│ MCTS Loop (_mcts_search_async)             │
│  ┌──────────────────────────────────────┐  │
│  │ Select → Expand → Simulate → Back    │  │
│  │   ▲                            │     │  │
│  │   └────────────────────────────┘     │  │
│  └──────────────────────────────────────┘  │
│                                            │
│ Expansion uses:                            │
│   • TestCaseGenerator (→ Tool Registry)    │
│   • AnswerGenerator (→ Tool Registry)      │
└──────┬─────────────────────────────────────┘
       │
┌──────▼─────────────────────────────────────┐
│ Results JSON (per dataset)                 │
│ + Tree visualization                       │
└────────────────────────────────────────────┘
Tool Invocation During Expansion:
TestCaseGenerator.generate_nearby(q, a)
       │
┌──────▼─────────────────────────────┐
│ _plan_tool_nearby()                │
│   → LLM selects tool               │
└──────┬─────────────────────────────┘
       │
┌──────▼─────────────────────────────┐
│ ToolRegistry.call_tool(name, args) │
│   → Executes tool handler          │
└──────┬─────────────────────────────┘
       │
┌──────▼─────────────────────────────┐
│ Tool result → LLM synthesis        │
│   → New question candidate         │
└────────────────────────────────────┘
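The plan, call, synthesize sequence in the diagram above amounts to a name-based dispatch. The following sketch mimics that flow with stand-ins: `MiniRegistry` and this `generate_nearby` function are hypothetical, the planner is an ordinary callable where the real flow uses an LLM, and the synthesis step is reduced to returning the tool result.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


class MiniRegistry:
    """Illustrative stand-in; not probellm.tools.ToolRegistry."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        if name in self._handlers:
            raise ValueError(f"tool already registered: {name}")
        self._handlers[name] = handler

    def call_tool(self, name: str, args: Dict[str, Any]) -> Dict[str, Any]:
        try:
            handler = self._handlers[name]
        except KeyError:
            raise KeyError(f"unknown tool: {name}") from None
        return handler(args)


def generate_nearby(question: str, plan: Callable[[str], str],
                    registry: MiniRegistry) -> str:
    """Pick a tool, execute it, and return a new question candidate."""
    tool_name = plan(question)  # in the documented flow, an LLM selects the tool
    result = registry.call_tool(tool_name, {"question": question})
    return str(result["result"])  # in the documented flow, an LLM synthesizes this


# Usage with a toy tool and a fixed planner
registry = MiniRegistry()
registry.register("upper", lambda args: {"result": args["question"].upper()})
print(generate_nearby("what is 2+2?", lambda q: "upper", registry))  # WHAT IS 2+2?
```

Keeping the planner and the registry as separate arguments mirrors the separation in the diagram: tool choice and tool execution can change independently.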
Extensibility Points
Custom Tools (Custom Tool Integration Guide)
  Implement handler function
  Define MCP-compatible spec
  Register with ToolRegistry
Custom Datasets (Custom Datasets Guide)
  Add entry to datasets_config.yaml
  Specify load_params, key_mapping, ground_truth mapping
Custom Samplers (dataloader/sampler.py)
  Inherit from BaseSampler
  Implement _build_plan()
Custom Search Strategies
  Override _select() / _expand_async() in VulnerabilityPipelineAsync
  Or: Use tools to inject custom expansion logic
Design Principles
Modularity: Each component has a single responsibility
Extensibility: Easy to add new tools, datasets, samplers
Async-First: Concurrent execution for speed (asyncio)
MCP-Compatible: Tools follow standardized protocol
Library-First: Every feature accessible programmatically
Fail-Safe: Validation and error handling at all layers
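As an illustration of the Async-First principle, the sketch below runs several jobs concurrently with asyncio while capping how many are in flight. It is a generic pattern, not the body of VulnerabilityPipelineAsync: `run_bounded`, the `limit` parameter, and the fake worker are assumptions for the example.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def run_bounded(items: Sequence[str],
                      worker: Callable[[str], Awaitable[str]],
                      limit: int = 4) -> List[str]:
    """Run `worker` over all items with at most `limit` running at once."""
    sem = asyncio.Semaphore(limit)

    async def guarded(item: str) -> str:
        async with sem:  # wait here while `limit` workers are already active
            return await worker(item)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(i) for i in items))


async def fake_search(dataset: str) -> str:
    """Stand-in for a per-dataset search; yields control once, then returns."""
    await asyncio.sleep(0)
    return dataset.upper()


results = asyncio.run(run_bounded(["mmlu", "mbpp"], fake_search, limit=1))
print(results)  # ['MMLU', 'MBPP']
```

A semaphore is a simple way to keep concurrency from exceeding what a rate-limited model API tolerates while still overlapping network waits.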
Next Steps
Core Concepts: Understand core concepts (MCTS, tool selection, etc.)
Quickstart: Hands-on tutorial
Tool System (MCP-Based): Deep dive into tool system