Is sandboxing an MCP server enough to make it safe?

Sandboxing (a container with no network, least privilege) limits the blast radius if a server misbehaves, which is genuinely valuable. But it doesn't tell you whether to trust the server, and most useful servers need exactly the access a strict sandbox removes — a database MCP needs your database, a filesystem one needs your files. So sandboxing contains risk; it doesn't assess it. Pair it with a scan that tells you what the server actually does.
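As a rough illustration of "a container with no network, least privilege", here is a minimal Python sketch that builds a `docker run` command line. The helper name `sandboxed_docker_argv`, the image name, and the mount paths are hypothetical. The Docker flags are standard ones, but treat the whole thing as a starting point under those assumptions, not a hardened profile.

```python
def sandboxed_docker_argv(image, command, allow_network=False, read_only_mounts=None):
    """Build a `docker run` argv for a stdio MCP server with a restrictive default profile.

    Hypothetical helper: the name and defaults are illustrative assumptions.
    """
    argv = [
        "docker", "run", "--rm", "-i",          # -i keeps stdin open for a stdio server
        "--read-only",                          # container root filesystem is read-only
        "--cap-drop", "ALL",                    # drop every Linux capability
        "--security-opt", "no-new-privileges",  # block privilege escalation via setuid
        "--user", "65534:65534",                # unprivileged uid:gid (often 'nobody')
    ]
    if not allow_network:
        argv += ["--network", "none"]           # no network unless explicitly allowed
    # Grant back only what the server legitimately needs, read-only where possible.
    for host_path, container_path in (read_only_mounts or {}).items():
        argv += ["-v", f"{host_path}:{container_path}:ro"]
    return argv + [image, *command]


if __name__ == "__main__":
    # Example: a filesystem-style server that only needs to read one directory.
    print(" ".join(sandboxed_docker_argv(
        "example/mcp-server:1.2.3",             # hypothetical image name
        ["node", "server.js"],
        read_only_mounts={"/srv/docs": "/data"},
    )))
```

The read-only mount is the trade-off in miniature: every bit of access you hand back so the server can do its job is access the sandbox no longer protects, which is why the profile says nothing about whether the server deserves that access.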

Can a scanner catch everything a manual review would?

No. A scanner is fast and consistent and catches the common dangerous patterns — shell execution, credential reads, network egress, download-and-run, prompt injection in tool descriptions. It can miss novel or heavily obfuscated logic that a careful human reviewer would spot. The point of a scanner is to make the cheap, repeatable pass instant so you only spend human review time where it flags something.

What is the fastest safe way to vet an MCP server?

Scan it first (seconds), and let the grade decide your effort: a clean grade on a simple server is usually enough; a medium/high grade tells you exactly which files and lines to review by hand; and for anything you run regularly, sandbox it and re-check on version changes. Scanning narrows where the expensive human and sandbox steps are worth spending.

MCP security guide · reviewed June 20, 2026

Scanning vs sandboxing vs manual review: how to actually vet an MCP server

Short answer: they're not competitors — they catch different things. Scanning tells you what a server does (fast, before install). Sandboxing limits what it can do to you if it misbehaves (but doesn't assess trust). Manual review catches the novel stuff a scanner misses (but is slow). The right move is to layer them — and use the cheap, instant one first to decide where the expensive ones are worth your time.

What each actually catches — and misses

Approach	Catches	Misses	Cost
Scanning (static + AI)	Shell exec, credential/file reads, network egress, download-and-run, install scripts, prompt injection in tool text — with the exact file & line.	Novel or heavily obfuscated logic; intent. It flags patterns, not motives.	Seconds, free, repeatable.
Sandboxing (container, no-net, least-priv)	Limits blast radius — contains a server that does turn out hostile at runtime.	Tells you nothing about whether to trust it; breaks servers that legitimately need the access you removed.	Setup + ongoing ops; can defeat the server's purpose.
Manual review	Everything, in principle, including clever, novel attacks a scanner can't pattern-match.	Whatever a rushed or tired reviewer overlooks; it doesn't scale and is easy to skip under deadline.	Slow, expensive, inconsistent.
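To make the scanning row concrete, here is a minimal Python sketch of the pattern-matching half of a static scanner. It is not how MCPVet works internally: the rule names, the regexes, and the `scan_text` / `scan_tree` helpers are all assumptions made up for illustration, and a real scanner would use much richer analysis. What it shows is the shape of the output: a rule label plus the exact file and line.

```python
import re
from pathlib import Path
from typing import NamedTuple

# Hypothetical rules: one regex per risky pattern. Illustrative, not exhaustive.
RULES = {
    "shell-exec": re.compile(r"\b(subprocess\.(run|Popen|call)|os\.system|child_process)\b"),
    "credential-read": re.compile(r"(\.aws/credentials|\.ssh/id_|AWS_SECRET_ACCESS_KEY)"),
    "network-egress": re.compile(r"\b(requests\.(get|post)|urllib\.request|fetch\()"),
    "download-and-run": re.compile(r"(curl|wget)[^\n|]*\|\s*(sh|bash)"),
}


class Finding(NamedTuple):
    rule: str   # which pattern matched
    path: str   # file the match was found in
    line: int   # 1-based line number
    text: str   # the offending line, stripped


def scan_text(source: str, path: str = "<memory>") -> list:
    """Return one Finding per (line, rule) match in a single source string."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(line):
                findings.append(Finding(rule, path, lineno, line.strip()))
    return findings


def scan_tree(root: str, suffixes=(".py", ".js", ".ts", ".sh")) -> list:
    """Scan every source file under root whose suffix is in suffixes."""
    findings = []
    for file in sorted(Path(root).rglob("*")):
        if file.is_file() and file.suffix in suffixes:
            findings.extend(scan_text(file.read_text(errors="ignore"), str(file)))
    return findings


if __name__ == "__main__":
    for f in scan_tree("."):
        print(f"{f.path}:{f.line}: [{f.rule}] {f.text}")
```

Note what this cannot do: a line-by-line regex pass knows nothing about intent, and it will not see logic that is split across lines, encoded, or assembled at runtime. That gap is the "Misses" column.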

How to combine them (in order)

1. Scan first. It's instant and free, so there's no reason not to. The grade sets your effort budget for the next two steps.
2. Let the grade route your attention. A clean grade on a simple server is usually enough. A medium or high grade points you at the exact lines worth a human read; that's where manual review pays off, instead of reading the whole repo.
3. Sandbox what you run regularly. For servers you depend on, run them with least privilege and no network by default to cap the downside. Then re-scan on every version change, because approval is an event, not a permanent state.
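The routing logic in the steps above can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the grade labels ("clean", "medium", "high") follow the wording of this guide, but the `Approval` record and the `next_steps` and `needs_rescan` helpers are hypothetical, not part of any tool.

```python
from dataclasses import dataclass
from typing import Optional

GRADES = ("clean", "medium", "high")  # assumed labels, taken from this guide's wording


@dataclass
class Approval:
    version: str  # the exact version that was scanned and approved
    grade: str    # the grade it received at that time


def next_steps(grade: str, flagged_lines: int, runs_regularly: bool) -> list:
    """Turn a scan grade into the follow-up work worth doing."""
    if grade not in GRADES:
        raise ValueError(f"unknown grade: {grade!r}")
    steps = []
    if grade in ("medium", "high"):
        # Manual review is targeted: read the flagged lines, not the whole repo.
        steps.append(f"manually review the {flagged_lines} flagged line(s)")
    if runs_regularly:
        steps.append("run sandboxed: least privilege, no network by default")
        steps.append("re-scan on every version change")
    if not steps:
        steps.append("no further steps; a clean grade on a simple server is usually enough")
    return steps


def needs_rescan(approval: Optional[Approval], current_version: str) -> bool:
    """Approval is tied to one version; anything else must be scanned again."""
    return approval is None or approval.version != current_version


if __name__ == "__main__":
    print(next_steps("high", flagged_lines=3, runs_regularly=True))
    print(needs_rescan(Approval("1.0.0", "clean"), "1.0.1"))
```

Storing the approved version next to the grade is what makes "approval is an event" enforceable: a new version simply has no approval on record until it is scanned.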

Start with a scan: it's the free, instant step

Risk grade + the exact risky lines, in seconds. No signup.

Run it from the web, or inside Claude Code so your agent scans a server before it installs it:

claude mcp add mcpvet -- npx -y github:volohq-info/mcpvet

FAQ

Is sandboxing enough on its own?

It limits damage but doesn't assess trust, and it breaks servers that need real access. Pair it with a scan.

Can a scanner replace a manual review?

No — it makes the cheap pass instant and tells you where a manual review is worth it. It can miss novel/obfuscated logic.

Fastest safe way to vet a server?

Scan first; let the grade decide how much manual review and sandboxing it's worth.

MCPVet is an automated heuristic aid, not a human security audit or a guarantee. A clean grade doesn't prove a server is safe; always review code and instructions you don't trust.