How to Run an AI Network Engineer on a Local LLM (Ollama, vLLM, Foundry Local)

Q: Can I run an AI network engineer offline on a local LLM?

Yes. An AI network engineer needs three things — a local model server (Ollama, vLLM, or Microsoft Foundry Local), an agent loop that can call tools reliably, and a real lab to act on. NetPilot On-Prem packages all three air-gapped: the agent runs on your own local model, designs and deploys real ContainerLab topologies, and runs device CLI, with no cloud and no phone-home.

Q: Which local model should I use for network automation?

Pick a model with reliable tool-calling, because the agent drives a multi-step design-deploy-verify loop rather than answering one prompt. Recent Gemma, Qwen, and gpt-oss-class models that honor a reasoning-effort setting work well; very small models tend to skip or malform tool calls. You can change the model in the admin console without touching the rest of the stack.

Q: What is the difference between Ollama and an AI network engineer?

Ollama is a local model server — it runs the LLM. An AI network engineer is the agent layer on top: it turns a plain-English goal into a plan, calls tools to design a topology, deploy a lab, and run CLI, observes the result, and iterates. Ollama answers a prompt; the agent does the job. You need both, plus a real lab the agent can act on.

Search "local LLM network automation" and every result is a do-it-yourself project: a Reddit thread on fine-tuning a model for NetOps, a Cisco Learning Network walkthrough of LM Studio, an arXiv paper on 5G automation with local models, a dozen LinkedIn how-tos wiring Ollama to Netmiko. The pattern is always the same — people in regulated, classified, or disconnected environments who cannot send a running-config to a cloud model, so they assemble their own stack and stop somewhere short of a real agent.

This guide is the productized version of that DIY project. It explains what "an AI network engineer on a local LLM" actually requires, where the hand-rolled path stops, and how to run the full design → build → validate loop fully offline on your own model, with a real multi-vendor lab to act on.

Why a local LLM for network engineering

For most teams, the model endpoint is a data-handling decision, not a quality one. The moment an agent is useful for networking it is reading real configs, real topologies, and real show output — exactly the data that a defense, federal, finance, or telco security policy will not let leave the network. A cloud model means every prompt and every config crosses your boundary to a vendor API.

A local LLM removes that question. The model runs on a server you own; inference never leaves your LAN. You trade some raw model capability for total data control — and for the build-and-validate work an agent does in a lab, that trade is usually easy.

What "an AI network engineer on a local LLM" actually requires

A local model alone is not an AI network engineer. Three pieces have to come together:

A local model server — something that speaks an OpenAI-compatible API on your network. Ollama, vLLM, and Microsoft Foundry Local all qualify.
An agent loop that calls tools reliably. The agent has to plan a sequence of steps, call a tool (design a topology, deploy a lab, run a command), read the result, and iterate until the goal is met. This is the part the model's tool-calling quality makes or breaks.
A real lab to act on. An agent that only writes text is a chatbot. To be a network engineer it needs to deploy real network-OS devices and run real CLI against them — which means a lab environment wired to the agent.

Miss any one and you have a demo, not a workflow. The DIY path usually nails (1), improvises (2), and skips (3).

The DIY path — and where it stops

The hand-rolled stack looks like this:

# 1. Run a local model
ollama serve
ollama pull qwen3:32b
 
# 2. Glue it to your devices with scripts
pip install netmiko nornir openai
# ...then write a few hundred lines wiring prompts -> Netmiko -> devices

It works for single-shot tasks: "summarize this config," "draft an ACL." Where it stalls is the agent loop and the lab. You end up writing your own planner, your own tool dispatch, your own retry logic for when the model emits a malformed tool call — and you still have no safe, reproducible multi-vendor lab for the agent to build and test in. That last 30% is most of the work, and it's the part nobody's blog post finishes.

The productized path — NetPilot On-Prem

NetPilot On-Prem packages the three pieces as one air-gapped install. The agent runs on your local model (Ollama, vLLM, or Foundry Local — you pick it in the admin console), it deploys real labs on your ContainerLab host through an authenticated MCP server, and it runs real device CLI — with no cloud and no phone-home at runtime.

The model is the only swappable part: point NetPilot at whichever local endpoint your security team has approved, and the rest of the agent — planning, tool dispatch, retries, the lab — is already built.

Direct CLI is always available. None of this hides the command line. Every device is reachable over SSH with its real vendor CLI; the agent is the fast path, and the CLI is how you verify its work or drill into one device by hand. You get both workflows.

Walkthrough — design, build, validate, offline

Here is the loop the agent runs, all on local infrastructure. Start with intent, in plain English:

"Build a three-router eBGP lab — AS 65001, 65002, 65003 in a triangle, advertise a loopback from each, and confirm all three routes propagate."

The agent designs the topology, assigns addressing, writes the per-vendor BGP config, deploys the lab on your ContainerLab host, and then checks its own work — it runs the verification commands and reads the output back before it reports done.

Verify it yourself, too. SSH into any node and confirm by hand:

# On R1 (FRR), check the eBGP sessions and learned routes
vtysh
show bgp summary
show ip route bgp

When the model emits a tool call that doesn't parse — which smaller local models do more often than a frontier cloud model — the agent retries against the same local endpoint rather than failing the turn. That retry budget is exactly the kind of glue the DIY path leaves you to build.

Then iterate, still offline:

"Add a route reflector so the three ASes don't need a full iBGP mesh, and redistribute connected into BGP on R2."

The agent makes the cross-device change and re-validates. No prompt, config, or show output ever left your network.

Choosing a local model

The single most important property for this use case is reliable tool-calling, not benchmark trivia. The agent succeeds or stalls on whether the model emits well-formed tool calls through a multi-step loop.

Favor models that honor a reasoning-effort setting and have solid tool-calling — recent Gemma, Qwen, and gpt-oss-class models are reasonable starting points.
Be wary of very small models. They skip or malform tool calls under a long agent loop, which shows up as stalled turns.
Tool-call argument streaming is limited on some local runtimes (notably Ollama's OpenAI-compatible endpoint), so a tool card may appear all at once after a short wait rather than streaming token by token. That's a runtime limitation, not an agent bug — surface it honestly to your users.

Because the model is configured in the admin console, you can A/B two local models against the same prompts and keep whichever drives the loop most reliably on your hardware.

On the hardware side — which GPU actually runs a ~31B model, and why you don't need an H100 or A100 — see What GPU Do You Need to Run an On-Prem AI Network Engineer?.

FAQ

Can I run an AI network engineer offline on a local LLM?

Yes. You need a local model server (Ollama, vLLM, or Foundry Local), an agent loop that calls tools reliably, and a real lab to act on. NetPilot On-Prem packages all three air-gapped — the agent runs on your own model, deploys real ContainerLab labs, and runs device CLI, with no cloud and no phone-home.

Which local model should I use for network automation?

Pick a model with reliable tool-calling, since the agent drives a multi-step design-deploy-verify loop. Recent Gemma, Qwen, and gpt-oss-class models that honor a reasoning-effort setting work well; very small models tend to skip or malform tool calls. The model is swappable in the admin console.

What is the difference between Ollama and an AI network engineer?

Ollama runs the model. The AI network engineer is the agent on top — it plans, calls tools to design a topology and deploy a lab, runs CLI, reads the result, and iterates. You need both, plus a real lab the agent can act on.

Does any of this touch the cloud?

No. With NetPilot On-Prem the app, the lab, and the model all run on your LAN. There is no telemetry and no outbound connectivity at runtime; updates come from signed offline bundles you bring across your boundary.

Copy-paste ready: The three-AS eBGP prompt in our example library is a good first lab to run on a local model — paste it and watch the agent design, deploy, and verify it.

Related reading: On-Prem AI Network Lab covers the full air-gapped deployment; Building an Air-Gapped Network Lab walks through the no-cloud lab itself; and Self-Hosted ContainerLab + AI shows how the agent connects to your ContainerLab host.

Running an air-gapped or regulated network? The On-Prem AI Network Lab page covers the deployment end to end — contact sales to scope it for your environment.

Try NetPilot Free