A high-performance network packet capture and analytics pipeline, built from the ground up.

Wirelake is built on a zero-copy AF_XDP capture core, protocol-native decoders, and Apache Arrow/Parquet output. Every component is designed for sustained 100 Gbps throughput with no packet loss and no compromise on data fidelity.

How Wirelake is built.

Capture
  • NIC (E810 · X710)
  • RSS queues (queue-per-core)
  • AF_XDP (zero-copy bypass)
  • eBPF/XDP (classify & steer)

Decode
  • DNS (all record types)
  • GTP-U / GTP-C (TEID · inner IP)
  • Multicast UDP (market data)
  • TCP + more (per-protocol)

Store
  • Arrow C GLib (columnar write)
  • Parquet GLib (Hive-partitioned)
  • Local NVMe (hour-partitioned)
  • MinIO (async warm tier)

Query
  • DuckDB (direct Parquet)
  • Grafana (via DuckDB)
  • Spark/Trino (distributed)
  • MCP server (AI-native, tier-aware)
  • Flight SQL (federation)

Capture Layer

AF_XDP with eBPF/XDP for kernel-bypass, zero-copy packet ingestion. Traffic is distributed across RSS queues, with a dedicated capture thread per queue. This eliminates kernel overhead and ensures sustained line-rate throughput even on dense, high-PPS traffic.

Key properties

  • AF_XDP zero-copy — no kernel/userspace memory copies
  • eBPF/XDP for early packet classification and steering
  • Per-RSS-queue threading — no contention, linear scaling
  • Compatible with Intel X710, E810, and other XDP-native NICs

Decode Layer

Protocol decoders run in dedicated threads, pulling from per-protocol queues. Decoders are implemented in C for maximum performance. Each decoder produces structured, typed records — not raw bytes — ready for columnar serialisation.

Supported protocols (v1)

  • DNS (UDP/TCP, all record types)
  • GTP-U / GTP-C (mobile carrier core network)
  • Multicast UDP (financial market data feeds)
  • More protocols added per customer requirements
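
Wirelake's decoders are implemented in C; as an illustration of the typed-record output they produce, here is a Python sketch of decoding the fixed 12-byte DNS header. The field names are illustrative, not Wirelake's actual schema.

```python
import struct

def parse_dns_header(payload: bytes) -> dict:
    """Decode the fixed 12-byte DNS header (RFC 1035) into typed fields,
    i.e. a structured record rather than raw bytes."""
    txid, flags, qd, an, ns, ar = struct.unpack("!6H", payload[:12])
    return {
        "txid": txid,
        "qr": flags >> 15,              # 0 = query, 1 = response
        "opcode": (flags >> 11) & 0xF,  # 0 = standard query
        "rcode": flags & 0xF,           # 0 = NOERROR
        "qdcount": qd,
        "ancount": an,
        "nscount": ns,
        "arcount": ar,
    }

# A response header: txid 0x1234, flags 0x8180 (QR, RD, RA), 1 question, 2 answers
hdr = struct.pack("!6H", 0x1234, 0x8180, 1, 2, 0, 0)
rec = parse_dns_header(hdr)
```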

Storage Layer

Decoded records are written to Apache Parquet via the Arrow C GLib and Parquet GLib libraries. Files are partitioned by hour using Hive-style directory layout, enabling efficient predicate pushdown in DuckDB, Spark, and Trino.

/mnt/store1/<protocol>_parquet/ts_hour=<epoch_hours>/
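
The hour partition follows directly from a record's nanosecond timestamp. A minimal sketch of the mapping (the function name is illustrative):

```python
def partition_path(root: str, protocol: str, ts_ns: int) -> str:
    """Map a nanosecond timestamp to its Hive-style hour partition."""
    ts_hour = ts_ns // (3_600 * 1_000_000_000)  # hours since the Unix epoch
    return f"{root}/{protocol}_parquet/ts_hour={ts_hour}/"
```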

Schema conventions

  • IP addresses stored as binary(16) (IPv4-mapped IPv6)
  • Timestamps in nanoseconds (ts_ns)
  • Flow IDs via Toeplitz hashing
  • Per-minute summary Parquet files for fast dashboard queries
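
Two of these conventions can be sketched in Python for reference. Wirelake's implementation is C; the key and names below are placeholders, not its actual values.

```python
import ipaddress

def ip_to_bin16(ip: str) -> bytes:
    """Store any address as 16 bytes: IPv4 becomes IPv4-mapped IPv6."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        addr = ipaddress.IPv6Address(f"::ffff:{ip}")
    return addr.packed

def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash (the classic RSS flow hash): for every set bit
    of `data`, XOR in the 32-bit window of `key` at that bit offset.
    Requires len(key) * 8 >= 32 + len(data) * 8."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    h = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                h ^= (key_int >> (key_bits - 32 - (i * 8 + b))) & 0xFFFFFFFF
    return h

RSS_KEY = bytes(range(40))  # placeholder; a deployment would use the NIC's RSS key
flow_id = toeplitz_hash(RSS_KEY, ip_to_bin16("10.0.0.1") + ip_to_bin16("10.0.0.2"))
```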

Query Layer

No proprietary query engine. Wirelake output is standard Parquet — queryable with any tool that speaks Apache Arrow.

  • DuckDB
  • Apache Spark / Trino
  • Grafana (DuckDB datasource plugin)
  • Python / pandas / Polars
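
For example, a DuckDB query over the Hive-partitioned layout might look like the following sketch. The helper and the `qname` column are assumptions for illustration, not a documented Wirelake interface.

```python
def dns_top_names_sql(root: str, start_hour: int, end_hour: int) -> str:
    """Build a DuckDB query over the Hive-partitioned DNS Parquet files.
    hive_partitioning = true exposes ts_hour as a column, so the WHERE
    clause prunes whole hour directories before any file is read."""
    return f"""
        SELECT qname, count(*) AS queries
        FROM read_parquet('{root}/dns_parquet/ts_hour=*/*.parquet',
                          hive_partitioning = true)
        WHERE ts_hour BETWEEN {start_hour} AND {end_hour}
        GROUP BY qname
        ORDER BY queries DESC
        LIMIT 20
    """

# With the duckdb package installed, this would run as:
#   duckdb.sql(dns_top_names_sql("/mnt/store1", start, end)).show()
```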

MCP Server

Wirelake includes a built-in MCP (Model Context Protocol) server — a unified query interface that exposes your entire Parquet data lake to any MCP-compatible LLM or AI agent.

The MCP server implements automatic query tier selection: summary Parquet tables are used by default for time ranges over 15 minutes, with raw record-level tables used for short windows or precise lookups. This means AI queries are fast and cost-efficient without any manual optimisation.
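
The tier policy itself is simple to state. A sketch, assuming the 15-minute threshold described above (names are illustrative):

```python
from datetime import datetime, timedelta, timezone

SUMMARY_THRESHOLD = timedelta(minutes=15)

def select_tier(start: datetime, end: datetime) -> str:
    """Summary tables for broad time ranges, raw records for narrow windows."""
    return "summary" if (end - start) > SUMMARY_THRESHOLD else "raw"
```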

Because the MCP server reads the same Hive-partitioned Parquet files as every other query interface, there is no separate data pipeline to maintain. Structured data written by the decoder is immediately available to your AI tooling.

Key properties

  • MCP-native — compatible with any MCP-enabled LLM or agent framework, open source or proprietary
  • Unified single-server architecture with namespaced tool registration per protocol
  • Automatic query tier selection — summary tables for broad time ranges, raw tables for record-level queries
  • DuckDB query engine over local Parquet — sub-second response on typical analytical queries
  • On-premises — the MCP server runs on your capture hardware; data never leaves your infrastructure
  • Works alongside existing query interfaces — DuckDB, Spark, Grafana, and the MCP server all read the same Parquet files

Multi-node deployment.

For deployments spanning multiple capture nodes, Wirelake includes an Arrow Flight SQL gateway for cross-node query federation. Nodes register with Consul KV using semantic tags, enabling routing to specific node subsets by location, protocol, or role — without enumerating IP addresses. An optional async warm tier via MinIO allows historical data to be offloaded from local NVMe. The local write path never depends on network availability.
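
Consul specifics aside, the tag-based routing reduces to plain set matching. The registry shape and tag names below are illustrative:

```python
def select_nodes(registry: dict[str, set[str]], required: set[str]) -> list[str]:
    """Pick capture nodes whose semantic tags cover every required tag,
    so a query can target e.g. {'site:fra1', 'proto:dns'} without
    enumerating IP addresses."""
    return sorted(node for node, tags in registry.items() if required <= tags)

# Hypothetical registry, as it might be mirrored from Consul KV:
registry = {
    "cap-01": {"site:fra1", "proto:dns", "role:edge"},
    "cap-02": {"site:fra1", "proto:gtp"},
    "cap-03": {"site:ams2", "proto:dns"},
}
```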

On-premises. Always.

Wirelake is deployed on your hardware, in your facility. Packet data never traverses a network you don't control. This is a hard design requirement, not a configuration option.

Requirements

  • Linux (Ubuntu 22.04 LTS or later)
  • Intel E810-C or X710-DA2 NIC (XDP native mode required)
  • Local NVMe storage
  • No cloud dependencies on the capture or write path

Management tools

  • Consul KV
  • Prometheus + Grafana
  • Grafana Loki
  • Ansible

See Wirelake on your infrastructure.

We work with teams to run a proof of concept on your hardware, with your traffic. No commitment, no obligation.

Get a Demo →