A high-performance network packet capture and analytics pipeline, built from the ground up.

Wirelake is built on a zero-copy AF_XDP capture core, protocol-native decoders, and Apache Arrow/Parquet output. Every component is designed for sustained 100 Gbps throughput with no packet loss and no compromise on data fidelity.

How Wirelake is built.

Capture
  • NIC (E810 · X710)
  • RSS queues (queue-per-core)
  • AF_XDP (zero-copy bypass)
  • eBPF/XDP (classify & steer)

Decode
  • DNS (all record types)
  • GTP-U / GTP-C (TEID · inner IP)
  • Multicast UDP (market data)
  • TCP + more (per-protocol)

Store
  • Arrow C GLib (columnar write)
  • Parquet GLib (Hive-partitioned)
  • Local NVMe (hour-partitioned)
  • MinIO (async warm tier)

Query
  • DuckDB (direct Parquet)
  • Grafana (via DuckDB)
  • Spark/Trino (distributed)
  • MCP server (AI-native, tier-aware)
  • Flight SQL (federation)

Capture Layer

AF_XDP with eBPF/XDP for kernel-bypass, zero-copy packet ingestion. Traffic is distributed across RSS queues, with a dedicated capture thread per queue. This eliminates kernel overhead and ensures sustained line-rate throughput even on dense, high-PPS traffic.

Key properties

  • AF_XDP zero-copy — no kernel/userspace memory copies
  • eBPF/XDP for early packet classification and steering
  • Per-RSS-queue threading — no contention, linear scaling
  • Compatible with Intel X710, E810, and other XDP-native NICs

Decode Layer

Protocol decoders run in dedicated threads, pulling from per-protocol queues. Decoders are implemented in C for maximum performance. Each decoder produces structured, typed records — not raw bytes — ready for columnar serialisation.

Supported protocols (v1)

  • DNS (UDP/TCP, all record types)
  • GTP-U / GTP-C (mobile carrier core network)
  • Multicast UDP (financial market data feeds)
  • More protocols added per customer requirements
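
Wirelake's decoders are implemented in C; as an illustration of the typed-record output they produce, here is a Python sketch of decoding the fixed 12-byte DNS header. The field names are illustrative, not Wirelake's actual schema.

```python
import struct

def parse_dns_header(payload: bytes) -> dict:
    """Decode the fixed 12-byte DNS header (RFC 1035) into typed fields,
    i.e. a structured record rather than raw bytes."""
    txid, flags, qd, an, ns, ar = struct.unpack("!6H", payload[:12])
    return {
        "txid": txid,
        "qr": flags >> 15,              # 0 = query, 1 = response
        "opcode": (flags >> 11) & 0xF,  # 0 = standard query
        "rcode": flags & 0xF,           # 0 = NOERROR
        "qdcount": qd,
        "ancount": an,
        "nscount": ns,
        "arcount": ar,
    }

# A response header: txid 0x1234, flags 0x8180 (QR, RD, RA), 1 question, 2 answers
hdr = struct.pack("!6H", 0x1234, 0x8180, 1, 2, 0, 0)
rec = parse_dns_header(hdr)
```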

Storage Layer

Decoded records are written to Apache Parquet via the Arrow C GLib and Parquet GLib libraries. Files are partitioned by hour using Hive-style directory layout, enabling efficient predicate pushdown in DuckDB, Spark, and Trino.

/mnt/store1/<protocol>_parquet/ts_hour=<epoch_hours>/
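
The hour partition follows directly from a record's nanosecond timestamp. A minimal sketch of the mapping (the function name is illustrative):

```python
def partition_path(root: str, protocol: str, ts_ns: int) -> str:
    """Map a nanosecond timestamp to its Hive-style hour partition."""
    ts_hour = ts_ns // (3_600 * 1_000_000_000)  # hours since the Unix epoch
    return f"{root}/{protocol}_parquet/ts_hour={ts_hour}/"
```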

Schema conventions

  • IP addresses stored as binary(16) (IPv4-mapped IPv6)
  • Timestamps in nanoseconds (ts_ns)
  • Flow IDs via Toeplitz hashing
  • Per-minute summary Parquet files for fast dashboard queries
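
Two of these conventions can be sketched in Python for reference. Wirelake's implementation is C; the key and names below are placeholders, not its actual values.

```python
import ipaddress

def ip_to_bin16(ip: str) -> bytes:
    """Store any address as 16 bytes: IPv4 becomes IPv4-mapped IPv6."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        addr = ipaddress.IPv6Address(f"::ffff:{ip}")
    return addr.packed

def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash (the classic RSS flow hash): for every set bit
    of `data`, XOR in the 32-bit window of `key` at that bit offset.
    Requires len(key) * 8 >= 32 + len(data) * 8."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    h = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                h ^= (key_int >> (key_bits - 32 - (i * 8 + b))) & 0xFFFFFFFF
    return h

RSS_KEY = bytes(range(40))  # placeholder; a deployment would use the NIC's RSS key
flow_id = toeplitz_hash(RSS_KEY, ip_to_bin16("10.0.0.1") + ip_to_bin16("10.0.0.2"))
```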

Query Layer

No proprietary query engine. Wirelake output is standard Parquet — queryable with any tool that speaks Apache Arrow.

  • DuckDB
  • Apache Spark / Trino
  • Grafana (DuckDB datasource plugin)
  • Python / pandas / Polars
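
For example, a DuckDB query over the Hive-partitioned layout might look like the following sketch. The helper and the `qname` column are assumptions for illustration, not a documented Wirelake interface.

```python
def dns_top_names_sql(root: str, start_hour: int, end_hour: int) -> str:
    """Build a DuckDB query over the Hive-partitioned DNS Parquet files.
    hive_partitioning = true exposes ts_hour as a column, so the WHERE
    clause prunes whole hour directories before any file is read."""
    return f"""
        SELECT qname, count(*) AS queries
        FROM read_parquet('{root}/dns_parquet/ts_hour=*/*.parquet',
                          hive_partitioning = true)
        WHERE ts_hour BETWEEN {start_hour} AND {end_hour}
        GROUP BY qname
        ORDER BY queries DESC
        LIMIT 20
    """

# With the duckdb package installed, this would run as:
#   duckdb.sql(dns_top_names_sql("/mnt/store1", start, end)).show()
```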

MCP Server

Wirelake includes a built-in MCP (Model Context Protocol) server — a unified query interface that exposes your entire Parquet data lake to any MCP-compatible LLM or AI agent.

The MCP server implements automatic query tier selection: summary Parquet tables are used by default for time ranges over 15 minutes, with raw record-level tables used for short windows or precise lookups. This means AI queries are fast and cost-efficient without any manual optimisation.
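
The tier policy itself is simple to state. A sketch, assuming the 15-minute threshold described above (names are illustrative):

```python
from datetime import datetime, timedelta, timezone

SUMMARY_THRESHOLD = timedelta(minutes=15)

def select_tier(start: datetime, end: datetime) -> str:
    """Summary tables for broad time ranges, raw records for narrow windows."""
    return "summary" if (end - start) > SUMMARY_THRESHOLD else "raw"
```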

Because the MCP server reads the same Hive-partitioned Parquet files as every other query interface, there is no separate data pipeline to maintain. Structured data written by the decoder is immediately available to your AI tooling.

Key properties

  • MCP-native — compatible with any MCP-enabled LLM or agent framework, open source or proprietary
  • Unified single-server architecture with namespaced tool registration per protocol
  • Automatic query tier selection — summary tables for broad time ranges, raw tables for record-level queries
  • DuckDB query engine over local Parquet — sub-second response on typical analytical queries
  • On-premises — the MCP server runs on your capture hardware; data never leaves your infrastructure
  • Works alongside existing query interfaces — DuckDB, Spark, Grafana, and the MCP server all read the same Parquet files

Multi-node deployment.

For deployments spanning multiple capture nodes, Wirelake includes an Arrow Flight SQL gateway for cross-node query federation. Nodes register with Consul KV using semantic tags, enabling routing to specific node subsets by location, protocol, or role — without enumerating IP addresses. An optional async warm tier via MinIO allows historical data to be offloaded from local NVMe. The local write path never depends on network availability.
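
Consul specifics aside, the tag-based routing reduces to plain set matching. The registry shape and tag names below are illustrative:

```python
def select_nodes(registry: dict[str, set[str]], required: set[str]) -> list[str]:
    """Pick capture nodes whose semantic tags cover every required tag,
    so a query can target e.g. {'site:fra1', 'proto:dns'} without
    enumerating IP addresses."""
    return sorted(node for node, tags in registry.items() if required <= tags)

# Hypothetical registry, as it might be mirrored from Consul KV:
registry = {
    "cap-01": {"site:fra1", "proto:dns", "role:edge"},
    "cap-02": {"site:fra1", "proto:gtp"},
    "cap-03": {"site:ams2", "proto:dns"},
}
```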

On-premises. Always.

Wirelake is deployed on your hardware, in your facility. Packet data never traverses a network you don't control. This is a hard design requirement, not a configuration option.

Requirements

  • Linux (Ubuntu 22.04 LTS or later)
  • Intel E810-C or X710-DA2 NIC (XDP native mode required)
  • Local NVMe storage
  • No cloud dependencies on the capture or write path

Management tools

  • Consul KV
  • Prometheus + Grafana
  • Grafana Loki
  • Ansible

See Wirelake on your infrastructure.

We work with teams to run a proof of concept on your hardware, with your traffic. No commitment, no obligation.

Get a Demo →