Core Concepts
=============

Understanding these concepts will help you effectively use ProbeLLM and extend it for your needs.

Monte Carlo Tree Search (MCTS)
------------------------------

**Why MCTS for Vulnerability Detection?**

Traditional fuzzing generates random inputs, but LLM input spaces are **vast and semantic**. MCTS addresses this by:

1. **Guided Exploration**: Uses UCB1 to balance exploration vs exploitation
2. **Adaptive Budget**: Focuses compute on promising failure regions
3. **Tree Structure**: Maintains history for analysis and replay

**MCTS Phases in ProbeLLM**:

.. code-block:: text

   Phase 1: Selection
   ──────────────────
   Start from root → navigate tree using UCB1

   UCB1(node) = error_rate + C × sqrt(ln(parent_visits) / node_visits)

   → Selects nodes with high error_rate OR low visit count

   Phase 2: Expansion
   ──────────────────
   At selected node:
   1. Tool planning: LLM chooses tool (perturbation/python_exec/web_search)
   2. Tool execution: Generate new test case
   3. Answer generation: Create ground truth for synthetic question
   4. Add child node to tree

   Phase 3: Simulation
   ───────────────────
   Test the target model:
   1. Send query to model-under-test
   2. Compare prediction vs ground truth (using judge LLM)
   3. Record: correct / error + reasoning

   Phase 4: Backpropagation
   ────────────────────────
   Update statistics from leaf → root:
   - visits += 1
   - error_count += 1 (if error detected)

   → Future selections favor branches with high error rates

**Stopping Criteria**:

- Reaches ``num_simulations`` iterations
- Exhausts sample budget (``num_samples``)
- Maximum tree depth (``max_depth``)
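To make the selection rule concrete, here is a minimal sketch of a UCB1-driven selection step. The ``Node`` dataclass, the ``select`` helper, and the exploration constant ``c`` are illustrative assumptions rather than ProbeLLM's actual API; the score simply mirrors the formula above (observed error rate plus an exploration bonus).

.. code-block:: python

   import math
   from dataclasses import dataclass, field

   @dataclass
   class Node:
       """Illustrative tree node; ProbeLLM's real node carries more state."""
       visits: int = 0
       error_count: int = 0
       children: list["Node"] = field(default_factory=list)

   def ucb1(node: Node, parent_visits: int, c: float = 1.4) -> float:
       """Observed error rate plus an exploration bonus (UCB1)."""
       if node.visits == 0:
           return float("inf")  # Always try unvisited children first
       error_rate = node.error_count / node.visits
       return error_rate + c * math.sqrt(math.log(parent_visits) / node.visits)

   def select(root: Node) -> Node:
       """Phase 1: walk from the root, taking the highest-scoring child each step."""
       node = root
       while node.children:
           node = max(node.children, key=lambda child: ucb1(child, node.visits))
       return node

Unvisited children score ``inf``, so every branch is tried at least once before the error-rate term starts steering the search toward failure-rich regions.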
Micro vs Macro Search
---------------------

ProbeLLM uses **dual-strategy MCTS** to balance local exploitation and global exploration.

Micro Search (Local Exploitation)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Goal**: Find variations of known failures

**Strategy**:

- Uses ``perturbation`` tool (paraphrasing, reformulation)
- Stays in "trust region" around existing failures
- Preserves semantic content but changes surface form

**Example**:

.. code-block:: text

   Original failure:
   Q: "What is the capital of France?"
   Model: "London" ❌

   Micro-generated variations:
   - "Name the capital city of France."
   - "France's capital is which city?"
   - "Multiple choice: France's capital: (A) Berlin (B) Paris (C) Rome"

**When to use**: When you have a known failure and want to understand its **robustness boundary**.

Macro Search (Global Exploration)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Goal**: Discover **distant** failure modes

**Strategy**:

- Uses ``web_search`` tool + greedy-k-center sampling
- Selects topics maximally different from seen failures (embedding distance)
- Generates questions on novel domains

**Example**:

.. code-block:: text

   Existing failures (MMLU physics):
   - Kinematics, thermodynamics, optics

   Macro-generated topics:
   - Quantum entanglement (far from classical mechanics)
   - Astrophysics (different scale, different intuitions)

**When to use**: When you want to expand test coverage beyond the initial failure set.

**Greedy-K-Center Algorithm**:

.. code-block:: text

   # Pseudo-code
   embeddings = [embed(q) for q in seen_failures]
   selected = [random_pick(embeddings)]

   for i in range(k-1):
       # Find point farthest from any selected point
       distances = [min(dist(e, s) for s in selected) for e in embeddings]
       next_idx = argmax(distances)
       selected.append(embeddings[next_idx])

   # Use selected failures to prompt web_search tool
   # -> Generate question on distant topic
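The pseudo-code maps directly onto a small NumPy routine. The sketch below is a generic greedy k-center selection over precomputed embedding vectors; the embedding model itself and the hand-off to the ``web_search`` tool are ProbeLLM-specific steps that are assumed here, not shown.

.. code-block:: python

   import numpy as np

   def greedy_k_center(embeddings: np.ndarray, k: int, seed: int = 0) -> list[int]:
       """Pick k maximally spread-out failure indices (greedy k-center).

       embeddings: (n, d) array of failure-question embeddings (assumed precomputed).
       """
       rng = np.random.default_rng(seed)
       n = embeddings.shape[0]
       selected = [int(rng.integers(n))]  # Seed with a random failure
       # Distance from every point to its nearest selected center so far
       min_dist = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)

       for _ in range(min(k, n) - 1):
           next_idx = int(np.argmax(min_dist))  # Farthest from all current centers
           selected.append(next_idx)
           new_dist = np.linalg.norm(embeddings - embeddings[next_idx], axis=1)
           min_dist = np.minimum(min_dist, new_dist)

       return selected

The selected failures then seed the ``web_search`` prompt, steering newly generated questions toward topics far from anything already covered.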
Tool Selection Strategy
-----------------------

During expansion, ProbeLLM uses an **LLM-based planner** to choose the appropriate tool.

**Planning Prompt**:

.. code-block:: text

   "You are a tool planning expert. Based on the question, choose the best tool.
   Available tools: perturbation, python_exec, web_search
   Return JSON: {tool: ..., args: {...}, purpose: ...}"

**Decision Factors**:

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Tool
     - When Selected
     - Example Trigger
   * - ``perturbation``
     - Nearby search, paraphrasing
     - "similar question but different form"
   * - ``python_exec``
     - Computational/algorithmic tasks
     - "needs calculation", "code execution"
   * - ``web_search``
     - Factual knowledge, far search
     - "needs external knowledge", "novel domain"

**Why LLM planning?**

- **Adaptive**: Tool choice depends on question characteristics
- **Explainable**: The ``purpose`` field explains the reasoning
- **Extensible**: Add new tools → the planner automatically considers them

Test Case Validity
------------------

Not all generated test cases are valid. ProbeLLM uses **multi-stage validation**:

1. **Generation-Time Validation** (``TestCaseGenerator``):

   .. code-block:: python

      def generate_nearby(q, a):
          candidate = llm.generate(...)
          verdict = llm.check(candidate)  # {valid: bool, reasons: [str]}
          if not verdict["valid"]:
              # Log failure, potentially retry
              pass
          return candidate

2. **Answer-Time Validation** (``AnswerGenerator``):

   - Ensures the answer is non-empty
   - Detects format-specification leakage (e.g., "mapping: {0: 'A', ...}")
   - Retries with feedback if invalid

3. **Execution-Time Validation**:

   - For ``python_exec``: timeout (6s), sandbox constraints
   - LLM-based error correction (up to 3 retries)

4. **Judge-Time Validation**:

   - Strict factual equivalence check
   - Lenient on formatting/wording

Ground Truth Generation
-----------------------

For **synthetic test cases**, we need to generate ground truth. ProbeLLM uses the following strategy:

1. **Tool selection**: Choose ``web_search`` (factual) or ``python_exec`` (computational)
2. **Evidence retrieval**: Execute the tool → gather context
3. **LLM synthesis**:

   .. code-block:: python

      prompt = f"""Question: {question}
      Evidence: {tool_output}
      Provide a concise answer (<= 3 sentences), cite sources if applicable."""

      answer = llm.generate(prompt)

4. **Validation**: Check that the answer is substantive (not metadata or format specs)

**Confidence Tracking**:

Each generated answer includes:

- ``answer``: The actual answer text
- ``confidence``: Float 0-1 (LLM self-assessed)
- ``reasoning``: Justification

**Use case**: Filter low-confidence questions or use confidence in scoring.

Checkpoint & Resume
-------------------

Long searches can be interrupted. ProbeLLM supports **resumable search**.

**Checkpoint Structure**:

.. code-block:: text

   {
     "metadata": {
       "dataset_id": "mmlu",
       "last_simulation": 42,
       "timestamp": "2026-01-27T12:00:00"
     },
     "root_state": {
       "visits": 100,
       "error_count": 25,
       "tree_layer_num": [5, 12, 8]
     },
     "nodes": [
       {
         "id": "syn_mmlu_1_0",
         "parent_id": "root_mmlu",
         "depth": 1,
         "sample": {"query": "...", "ground_truth": "..."},
         "visits": 10,
         "error_count": 3,
         "results": [...]
       },
       ...
     ]
   }

**Usage**:

.. code-block:: python

   from probellm import create_checkpoints, resume_from_checkpoint

   # Create checkpoint from interrupted run
   create_checkpoints("results/run_xxx/")

   # Resume
   resume_from_checkpoint("results/run_xxx/")

**Features**:

- Preserves tree structure + statistics
- Resumes from the exact simulation count
- Appends to existing results files
- Rebuilds ``embeddings.pkl`` if missing

Token Usage Tracking
--------------------

ProbeLLM tracks token consumption at **every step**.

**Tracked Operations**:

1. **Tool Planning**: Selecting which tool to use
2. **Tool Execution**: Running the tool (e.g., LLM calls inside ``perturbation``)
3. **Candidate Generation**: Synthesizing the new question
4. **Validation**: Checking question validity
5. **Answer Generation**: Creating ground truth
6. **Model Inference**: Testing the model-under-test
7. **Judging**: Comparing prediction vs ground truth

**Result Format**:

.. code-block:: text

   {
     "id": "syn_mmlu_2_5",
     "query": "...",
     "prediction": "...",
     "token_usage": {
       "testcase_gen": {
         "plan": {"prompt_tokens": 123, "completion_tokens": 45, "total_tokens": 168},
         "candidate_generation": { ... },
         "validation": { ... },
         "total_tokens": 512
       },
       "answer_gen": {
         "plan": { ... },
         "answer_generation": { ... },
         "total_tokens": 300
       },
       "model_inference": {"total_tokens": 150},
       "judge": {"total_tokens": 200},
       "total_tokens": 1162
     }
   }

**Use case**: Cost analysis, budget allocation, optimization.

Error Classification
--------------------

ProbeLLM records **why** a model failed.

**Judge Output**:

.. code-block:: python

   {
       "correct": False,
       "error_reason": "Model claimed Paris is in Germany, which is factually incorrect.",
       "correct_reason": ""  # Empty for errors
   }

**Analysis** (``pcaAnalysisEnhanced.py``):

- Clusters errors by embedding similarity
- Identifies **systematic failure patterns**
- Generates human-readable reports

**Example Clusters**:

.. code-block:: text

   Cluster 0 (23 errors): "Capital city confusion"
   - Model consistently swaps European capitals
   - Likely memorization issue

   Cluster 1 (18 errors): "Unit conversion"
   - Fails when converting meters → feet
   - Suggests training data bias (metric vs imperial)

MCP Tool Protocol
-----------------

ProbeLLM tools follow **Model Context Protocol** (MCP) conventions.

**Tool Specification**:

.. code-block:: python

   {
       "name": "tool_name",
       "description": "Human-readable description",
       "inputSchema": {
           "type": "object",
           "properties": {
               "arg1": {"type": "string", "description": "..."},
               "arg2": {"type": "integer", "minimum": 0}
           },
           "required": ["arg1"]
       }
   }

**Tool Response** (JSON-RPC-like envelope):

.. code-block:: python

   # Success
   {
       "result": {...}  # Tool-specific payload
   }

   # Error
   {
       "error": {
           "code": -32603,
           "message": "Tool execution failed",
           "data": {"details": "..."}
       }
   }

**Why MCP?**

- **Standardized**: Interoperable with other MCP-compliant systems
- **Typed**: Input schemas enforce validation
- **Extensible**: Easy to add new tools without modifying the core

Next Steps
----------

- :doc:`quickstart`: Hands-on tutorial
- :doc:`modules/tools`: Deep dive into the tool system
- :doc:`modules/search`: MCTS implementation details
- :doc:`guides/custom_tools`: Build your own tools