Sunday, July 8, 2018

Building a Network Communication map with Scapy and NetworkX


In my previous posts, I described using NetworkX to simulate the topology of a network, and using Scapy for various packet capture and crafting tasks. In this post I am going to tie these concepts together to generate a network communication map that is suitable for analysis with scikit-learn.

Overview

The concept of building a network map from packets traversing the network is nothing new. Most GUI-based packet analysis tools, like Wireshark or Zenmap, include an area where you can view the topology.

Zenmap Network graph screen

These views are helpful in understanding the communication flow around the network. If you read my previous blog on simulating network infections with NetworkX, then the above picture may already look familiar. It is, in fact, a network graph similar to the type generated before. The primary difference is that these nodes and edges are defined by packets traversing an actual network, rather than a hypothetical one. Each node represents a system, and each edge a communication pathway. Both nodes and edges have attributes that can be collected and used for analysis. Our first goal is to capture the packet data and use it to generate a multi-directed graph (a NetworkX MultiDiGraph), which allows several directed edges between the same pair of nodes.

Back to Basics

As I previously mentioned, Scapy is the Pythonic Swiss Army knife of packet manipulation. Using the sniff() function allows us to capture the raw packet data and pass it off to a function we will define shortly. This portion is really as simple as:
from scapy.all import *
sniff(iface='eth0', prn=process_packet, count=50)
Now each of the next 50 packets that arrives on eth0 will be passed to the function process_packet(p).

This is where the magic begins to happen. First the packet is checked for the presence (or absence) of the IP() layer. For simplicity, I am going to ignore any packet which is not TCP, UDP, or ICMP, all of which are carried on top of the IP layer. Since I am using the IP source and destination addresses as the node identifiers, any packet which does not have these defined is going to be ignored as well. In a more complex implementation you would want to define other methods that process these packets in a more useful manner. Below is the beginning of the process_packet() definition. This code handles the node identification and packet rejection.
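from datetime import datetime
import networkx as nx

G = nx.MultiDiGraph()  # Holds the communication graph

def process_packet(p):
    global G
    dt = datetime.now()  # Time at which processing of this packet started
    try:
        src_ip = p[IP].src
        dst_ip = p[IP].dst
    except IndexError:  # Scapy raises IndexError when the IP layer is missing
        print("ignoring packet without Src and Dst")
        return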
Normally, I try to stay away from global variables, but in this case I think it makes the code cleaner to read and easier to understand, so I left it. The first thing the code does is call datetime.now() to record the time the packet started processing. I will add this to the edge definition in NetworkX later in the code.

Integrating Scapy with NetworkX

Now that we have a way to capture packets and access their relevant information (source and destination port, total size, etc.), we can use this information to add structure to our graph. For each packet that comes in we want to create a directed edge which points from the source IP to the destination IP. Additionally, we want to capture the source and destination ports, the time the packet was processed, and the payload size (if one is sent). Finally, for ICMP packets, which have no ports, we want to capture the ICMP code.
    if p.haslayer('TCP'):
        sport = p[TCP].sport
        dport = p[TCP].dport
        # 'load' is only present when the segment carries a payload
        if hasattr(p[TCP], 'load'):
            ld = len(p[TCP].load)
        else:
            ld = None
        attribs = {
            "sport": sport,
            "dport": dport,
            "time": dt,
            "load": ld,
            "type": "TCP"
        }
    elif p.haslayer('UDP'):
        sport = p[UDP].sport
        dport = p[UDP].dport
        if hasattr(p[UDP], 'load'):
            ld = len(p[UDP].load)
        else:
            ld = None
        attribs = {
            "sport": sport,
            "dport": dport,
            "time": dt,
            "load": ld,
            "type": "UDP"
        }
    elif p.haslayer('ICMP'):
        # ICMP has no source or destination port, so only the code is stored
        attribs = {
            "time": dt,
            "code": p[ICMP].code,
            "type": "ICMP"
        }
    else:
        print(p.summary())
        return
    # Unpack the dictionary so each key becomes its own edge attribute
    G.add_edge(src_ip, dst_ip, **attribs)


Inside the packet processing function, we check to see if the packet is one of three types: TCP, UDP, or ICMP. If the code can extract the required layer information, it adds an edge to the graph along with the attributes dictionary. Otherwise it simply prints the packet summary and returns. Obviously there is a lot more you could capture, but for the purpose of this example that should be plenty.
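Once every packet is an attributed edge, per-direction totals fall out of a single pass over the edges. The sketch below is only an illustration: it uses plain tuples shaped like (source, destination, attributes), assuming the attributes are stored flat on each edge, so it runs without a capture. The addresses, ports, and sizes are made up, and summarize_flows() is a hypothetical helper, not part of Scapy or NetworkX.

```python
from collections import defaultdict

# Hypothetical sample edges: (src_ip, dst_ip, attributes). All values are made up.
edges = [
    ("10.0.0.5", "10.0.0.1", {"type": "TCP", "dport": 443, "load": 120}),
    ("10.0.0.5", "10.0.0.1", {"type": "TCP", "dport": 443, "load": None}),
    ("10.0.0.1", "10.0.0.5", {"type": "TCP", "dport": 50512, "load": 900}),
    ("10.0.0.7", "10.0.0.1", {"type": "ICMP", "code": 0}),
]

def summarize_flows(edges):
    """Return {(src, dst): {"packets": n, "bytes": total_payload}}."""
    summary = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src, dst, attrs in edges:
        entry = summary[(src, dst)]
        entry["packets"] += 1
        # ICMP edges have no "load" key, and empty payloads are stored as None
        entry["bytes"] += attrs.get("load") or 0
    return dict(summary)

flows = summarize_flows(edges)
for (src, dst), stats in sorted(flows.items()):
    print(f"{src} -> {dst}: {stats['packets']} packets, {stats['bytes']} payload bytes")
```

With a real capture, the same loop could be fed from G.edges(data=True) instead of the sample list.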

Drawing the Network Graph

Now that we have captured the packets and used them to define the underlying network graph, it is time to display our results. NetworkX makes this easy by integrating with Matplotlib (among others). We can import matplotlib.pyplot and use it as we would any other matplotlib plot. This includes the ability to use axes objects and fancy patches to improve the layout of the graph.

Before and after applying the FancyArrowPatch
As you can see, by adding the fancy arrow patch we get a much better visual indicator of how much traffic was flowing in each direction. There are of course other ways one could choose to display this information (such as edge weights or colorization), but I like this because it also exemplifies more advanced ways you can draw graphs. Below the call to the sniff function we place our graph drawing code:
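import matplotlib.pyplot as plt

pos = nx.spring_layout(G, k=1)  # (x, y) position for every node
f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
nx.draw_networkx(G, pos, ax=ax1, with_labels=False)  # Left: default drawing
draw_network(G, pos, ax2)  # Right: drawing with fancy arrow patches
plt.axis('equal')
plt.axis('off')
plt.title("All connections")
plt.savefig("conn_graph.pdf")
plt.show()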
The pos variable holds the resulting (x, y) position of each node in the graph after NetworkX calculates a spring layout for it. Next, matplotlib.pyplot is called to generate a figure with two subplots which share their Y axis. I use two subplots so you can easily see the before and after effect of the patch. The first subplot is sent to the built-in NetworkX function draw_networkx(), which results in the graph on the left of the image. The second subplot is passed to a function which handles drawing a second copy of the graph and applying the fancy patch. I will cover that in a moment. Once that completes, we set the axis ratio to equal, turn the axis markers off, and add a simple title. Finally, we save the image as a PDF to the hard drive and display it for visual inspection.

Now for the real work in the drawing: the draw_network() function mentioned above.
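import numpy as np
from matplotlib.patches import Circle, FancyArrowPatch

def draw_network(G, pos, ax, sg=None):
    # Draw one circle per node and remember its patch on the node itself
    for n in G:
        c = Circle(pos[n], radius=0.05, alpha=0.5, color='g')
        ax.add_patch(c)
        G.nodes[n]['patch'] = c  # G.node was removed in NetworkX 2.4
    seen = {}  # Last curvature used for each (source, destination) pair
    e = None
    for (u, v, d) in G.edges(data=True):
        n1 = G.nodes[u]['patch']
        n2 = G.nodes[v]['patch']
        rad = 0.1
        if (u, v) in seen:
            # Grow the arc and flip its side for every repeated edge
            rad = seen.get((u, v))
            rad = (rad + np.sign(rad) * 0.1) * -1
        alpha = 0.25
        color = 'k'

        e = FancyArrowPatch(n1.center, n2.center, patchA=n1, patchB=n2,
                            arrowstyle='fancy',
                            connectionstyle='arc3,rad=%s' % rad,
                            mutation_scale=10.0,
                            lw=2,
                            alpha=alpha,
                            color=color)
        seen[(u, v)] = rad
        ax.add_patch(e)
    return e  # Last arrow drawn, or None if the graph has no edges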
It seems like a lot at first, but broken down, the code does two things. First it iterates over the nodes in the graph G and draws a circle patch for each one. Then it iterates over every edge and draws a curved arrow between the two node patches. The seen dictionary remembers the last curvature used for each (source, destination) pair, so each additional edge between the same pair gets a larger arc on the alternating side, which keeps parallel edges from overlapping. There are many possible parameters, so I would suggest reading over https://matplotlib.org/api/_as_gen/matplotlib.patches.FancyArrowPatch.html to get a better feel for the options and what they each mean. One possible change is to set the edge color based on the packet type. For example, a TCP edge could be colored green, a UDP edge blue, and an ICMP edge red.
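To see how the curvature evolves for repeated edges between the same pair of nodes, here is a small stand-alone sketch of the same update rule used in draw_network(). The helper names next_rad() and rads_for_parallel_edges() are my own for illustration, and math.copysign stands in for np.sign so the snippet needs nothing beyond the standard library.

```python
import math

def next_rad(prev_rad, step=0.1):
    """Grow the magnitude by `step` and flip the sign, mirroring draw_network()."""
    return -(prev_rad + math.copysign(step, prev_rad))

def rads_for_parallel_edges(count, first=0.1):
    """Return the arc radius used for each of `count` edges between one node pair."""
    if count <= 0:
        return []
    rads = [first]
    for _ in range(count - 1):
        rads.append(next_rad(rads[-1]))
    return rads

# Rounded only to hide floating point noise when printing
rads = [round(r, 2) for r in rads_for_parallel_edges(5)]
print(rads)  # [0.1, -0.2, 0.3, -0.4, 0.5]
```

The alternating sign is what places successive arrows on opposite sides of the straight line between the two nodes.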

Edge color denotes packet protocol
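One way the protocol coloring could be wired in is a lookup from the edge's "type" attribute to a Matplotlib color string. This is a sketch rather than the exact code behind the figure: PROTOCOL_COLORS and edge_color() are hypothetical names, and the helper accepts the attributes either flat on the edge or nested under a "dict" key, since either layout is possible depending on how add_edge() was called.

```python
# Matplotlib single-letter colors: green, blue, red
PROTOCOL_COLORS = {"TCP": "g", "UDP": "b", "ICMP": "r"}

def edge_color(attrs, default="k"):
    """Pick a color for an edge from its attribute dictionary."""
    # Accept attributes stored flat or nested under a "dict" key
    attrs = attrs.get("dict", attrs)
    return PROTOCOL_COLORS.get(attrs.get("type"), default)

print(edge_color({"type": "TCP", "sport": 50512, "dport": 443}))  # g
print(edge_color({"dict": {"type": "ICMP", "code": 0}}))          # r
print(edge_color({}))                                             # k
```

Inside draw_network(), the line color='k' would then become color=edge_color(d), where d is the attribute dictionary yielded by G.edges(data=True).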

Conclusion

I hope this post has shed some light on the integration of NetworkX and Scapy. This really only scratches the surface. The reason for wanting to capture this packet information in the form of a graph goes beyond making pretty pictures. NetworkX also integrates well with the pandas data manipulation library, which means we can use graph statistics as features in our data set. pandas has the added benefit of integrating with scikit-learn, which will allow us to train predictive models based on these features. But all of that is for another time. For now, let's enjoy one last pretty network picture.


