ANR-NSF MINT

Modeling Modern Network Traffic: From Data Representation to Automated Machine Learning

Both network operations and research depend on the ability to answer questions about network traffic. Decades ago, the questions were simpler: they involved traffic volumes and simple performance metrics. The answers were also easier to obtain: most traffic was unencrypted, so most questions could be answered directly from protocol headers and packet payloads. Today, operators and researchers ask more sophisticated questions about application performance, quality of experience (QoE), and malicious traffic originating from IoT devices. They also try to predict the impact of potential changes. Yet even as these questions become more complex and more important, network data is becoming harder to obtain. Growing traffic volumes force operators to make hard decisions about sampling and preclude analyzing individual packets and reassembled streams altogether. Traffic is also increasingly opaque. Web content is now ubiquitously encrypted, which prevents operators from directly inspecting video streams to troubleshoot performance problems. Major services have moved to a handful of IP addresses on large cloud providers such as Amazon, Google, and Cloudflare, removing the identity that IP addresses once provided. Networks contain increasingly heterogeneous, manufacturer-controlled devices that cannot be troubleshot locally. As a result, even seemingly simple but important questions, such as "What content is sent in cleartext?" or "What is the packet loss for Netflix traffic on my network?", are impossible to answer today.

The ANR-NSF-funded project MINT aims to make it easier for operators and researchers to ask questions about network traffic. Doing so requires solving new, challenging research problems to create the analytical building blocks needed to model traffic on modern networks. Once the analysis platform and models are in place, we can turn to helping operators answer questions that let them run their networks more effectively, and to enabling researchers to answer questions that drive discovery. The project involves the following three core activities: (1) We will study how to represent traffic data in ways that are amenable to modeling and that could optimize models for both supervised and unsupervised modeling tasks. (2) Building on our work on traffic data representation, we will develop a set of tools that automatically explore model and traffic representations tailored to network traffic problems. (3) We will use the software platforms and algorithmic primitives we have built to design new techniques and tools that help operators overcome the challenges that prevent them from transferring models developed in isolated laboratory experiments to real-world deployments.

Recent News

Feb 7, 2024 Our work on developing NetDiffusion, a tool that generates high-fidelity synthetic network traffic conforming to protocol specifications, has been accepted and will appear at ACM SIGMETRICS 2024
Jul 4, 2023 Our work on detecting, explaining, and mitigating concept drift in cellular networks has been accepted and will appear at ACM CoNEXT 2023
Apr 3, 2023 Our work on using multimodal video and network traffic data for enhanced activity recognition has been accepted and will appear at ACM UbiComp 2023
Aug 22, 2022 Our work on developing a software framework that lets users analyze over 100 Gbps of real-world traffic on a single server with no specialized hardware has been accepted and will appear at ACM SIGCOMM 2022
Oct 4, 2021 Our work on developing a new framework and system that enables the joint evaluation of both machine learning performance and systems-level costs of different network traffic representations has been accepted and will appear at ACM SIGMETRICS 2022