Saturday, 22 July 2017

Visualize Windows Logs With Neo4j

Intro

I spend a lot of my day staring at logs. More specifically, logs found in a SIEM device. Most of the time these are presented as rows of data; sometimes I get lucky and get to look at charts.

Although a SIEM can ingest a huge amount of data, it kind of sucks at representing this data in a meaningful way.

As soon as I saw the amazing BloodHound project, my crappy rows of data and charts seemed old and busted. I really wanted the same kind of visual representation, but for other types of log data.

In this post then, I want to cover how to visualize some Sysmon logs with Neo4j.

Getting Started

I did some quick Googling and stumbled across this fantastic post which gave me the foundations I needed to get started.

First step is to grab the free community edition of Neo4j, which can be found here.

After the install, you should be able to browse to http://127.0.0.1:7474/browser/ to access the Neo4j DB. The default user/pass is neo4j/neo4j, and it should prompt you to change your password on first login.

Second step is to install Sysmon with a configuration file, which I covered here.

Get Your Logs Ready

Now that you have some juicy log data and Neo4j set up, we need to get the logs into a format that we can import to Neo4j. I used the following short PowerShell script:

Import-Module C:\Users\Anton\Downloads\Get-WinEventData.ps1
$File = "C:\Users\Anton\Desktop\logs.csv"

# Write the header row; Set-Content creates the file or overwrites an old export
Set-Content -Path $File -Value "Source,Destination,DestinationPort,Application"

# Sysmon event ID 3 is a network connection
$EventsID3 = Get-WinEvent -FilterHashtable @{logname="Microsoft-Windows-Sysmon/Operational";id=3} | Get-WinEventData | select EventDataSourceIp,EventDataDestinationIp,EventDataDestinationPort,EventDataImage
foreach ($Event3 in $EventsID3)
{
    # $(...) is required to expand object properties inside a double-quoted string
    $output = "$($Event3.EventDataSourceIp),$($Event3.EventDataDestinationIp),$($Event3.EventDataDestinationPort),$($Event3.EventDataImage)"
    Add-Content -Path $File -Value $output
}

This script creates a CSV file that is simple for us to import. It should look something like this:


For simplicity, I put the file into the C:\Users\<Name>\Documents\Neo4j\default.graphdb\import directory. On my installation, I had to create the import directory myself.

Importing Data and Cypher Query

Now that we have the file in the format we need, open up the Neo4j interface and enter the following commands in the input box:

load csv with headers from "file:///logs.csv" AS csvLine
CREATE (source:address { address: csvLine.Source })
CREATE (destination:addressd { addressd: csvLine.Destination })
CREATE (DestinationPort:DestPort { destport: csvLine.DestinationPort })
CREATE (application:app { Application: csvLine.Application })
CREATE (source)-[:ConnectedTo]->(destination)-[:Using]->(application)-[:OnPort]->(DestinationPort)

*Note: the whole thing needs to be entered as one query. Neo4j wouldn't let me do it line by line, because the later CREATE statements refer to csvLine and to the nodes created before them.

This is my first time writing a Neo4j Cypher query, but to break this down a bit:

- The first line loads our CSV.

- The next four CREATE statements make our graph elements for the source address, destination address, destination port and application.

- The final CREATE statement builds our relationship. In this case, I want to know what source IP connected to what destination IP and with what application and destination port.
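
One side effect of using CREATE for every row is that the same IP or application becomes a separate node each time it appears in the CSV. A variation to consider (an untested sketch, not the query used above) swaps CREATE for MERGE so that each distinct value becomes a single node. It assumes the same labels and property names as above, and that no CSV field is empty, since MERGE rejects null property values:

```cypher
LOAD CSV WITH HEADERS FROM "file:///logs.csv" AS csvLine
// MERGE matches an existing node with this property, or creates it if missing
MERGE (source:address { address: csvLine.Source })
MERGE (destination:addressd { addressd: csvLine.Destination })
MERGE (port:DestPort { destport: csvLine.DestinationPort })
MERGE (application:app { Application: csvLine.Application })
// Merge each relationship separately so existing ones are reused
MERGE (source)-[:ConnectedTo]->(destination)
MERGE (destination)-[:Using]->(application)
MERGE (application)-[:OnPort]->(port)
```

The trade-off is that once nodes are shared, the chain from source to destination to application to port no longer tells you which port belonged to which individual connection. The CREATE version keeps every log row as its own chain, which is what the rest of this post relies on.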


Hit the play button and you should see something similar to the following:


Now we can get to the fun stuff. Click the Database icon on the left-hand side followed by the * icon under 'Relationship Types':


You should now see your relationship graph. Let's take a closer look:


We get a really clear visual representation of what connected to what on our system. In this case we can see 192.168.1.123 connected to 204.79.197.200 using IE on port 443.

Practical Example

Let's take this a step further and examine how this can be used to uncover some malicious activity.

I'm going to use PowerShell to grab Invoke-Mimikatz using a handy list of PowerShell download cradles provided by harmj0y.


Of course Neo4j and this kind of setup won't alert you to any malicious activity, but pretend that the above command triggered some kind of alert that you want to investigate further.

We can write a simple query to see all PowerShell network connections:
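
The exact query isn't preserved in this text. A minimal sketch, assuming the `app` label and `Application` property from the import above, might look like this:

```cypher
// Find application nodes whose image path mentions PowerShell (case-insensitive)
MATCH (a:app)
WHERE toLower(a.Application) CONTAINS "powershell"
RETURN a
```

Returning only the application node matches the workflow described next, where the rest of the chain is revealed by expanding bubbles in the browser.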


This should give you a list of results, one in my case:


We can double click the bubble to expand it (in my case I had to double click the destination IP bubble as well). We should now see our visual representation of PowerShell network activity:


Again, we can clearly see that 192.168.1.123 connected to 151.101.124.133 (GitHub) using PowerShell over port 443.

This is obviously a simplified example, but I think this would be a super handy tool to have in order to gain some extra insights from your logs. I used Sysmon here because it provides really rich data. Instead of the IP I could have included the hostname for extra clarity, or you could include both and change the query to suit your needs. In either case, I find this visual format much easier to digest than the standard table.
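
If you would rather not expand bubbles by hand, the whole chain can be matched in one go. This is a sketch that assumes the labels, properties and relationship types created by the import query earlier:

```cypher
// Walk the full chain for PowerShell connections and return it as a table
MATCH (s:address)-[:ConnectedTo]->(d:addressd)-[:Using]->(a:app)-[:OnPort]->(p:DestPort)
WHERE toLower(a.Application) CONTAINS "powershell"
RETURN s.address AS source, d.addressd AS destination, p.destport AS port, a.Application AS application
```

Returning `s, d, a, p` instead of the individual properties should draw the same chain in the graph view rather than as rows.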

This kind of visual representation could also be used to profile high-value accounts. For example, which hosts does <admin account> connect to on a daily basis? Neo4j and authentication logs could be used to build up a "known good" baseline fairly easily.
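
The same baselining idea can be tried on the Sysmon data already loaded, before bringing in authentication logs. The sketch below assumes the data was imported with the CREATE query above (one chain per log row) and counts how often each application talked to each destination:

```cypher
// Count connections per application and destination, busiest pairs first
MATCH (s:address)-[:ConnectedTo]->(d:addressd)-[:Using]->(a:app)
RETURN a.Application AS application, d.addressd AS destination, count(*) AS connections
ORDER BY connections DESC
```

Pairs that show up every day are candidates for "known good"; a pair that appears once is worth a closer look.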

Closing Notes

- You can feed a Neo4j database via its API with JSON, much like BloodHound does. I don't have the programming knowledge for that, however.

- Linked to the point above, my data set was tiny and I have no idea how this performs at scale. If you have tons of logs you are probably better off using the API. Another option would be to further filter down the logs you select for importing.

- Credit to this post: http://blog.davidvassallo.me/2014/08/03/getting-started-with-neo4j-and-security-data-analysis/ The Cypher query I used above is basically a modified version of the one found there.

- If any SIEM vendors happen to be reading this (ya right), it would be a huge value-add if this kind of capability could be baked into a SIEM device. Some kind of "Export to Neo4j" feature coupled with a Cypher query builder... shut up and take my money.

Any questions or comments, feel free to connect with me via Twitter @Antonlovesdnb.
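
On the scale point above: short of using the API, Cypher itself has two knobs that may help with bigger files. The sketch below is untested at volume and assumes a Neo4j 3.x install like the one used here (USING PERIODIC COMMIT was removed in later major versions). The loopback filter is purely illustrative:

```cypher
USING PERIODIC COMMIT 1000
// Commits every 1000 rows instead of holding the whole import in one transaction
LOAD CSV WITH HEADERS FROM "file:///logs.csv" AS csvLine
// Illustrative filter (an assumption): skip connections to loopback addresses
WITH csvLine
WHERE NOT (csvLine.Destination STARTS WITH "127.")
CREATE (source:address { address: csvLine.Source })
CREATE (destination:addressd { addressd: csvLine.Destination })
CREATE (DestinationPort:DestPort { destport: csvLine.DestinationPort })
CREATE (application:app { Application: csvLine.Application })
CREATE (source)-[:ConnectedTo]->(destination)-[:Using]->(application)-[:OnPort]->(DestinationPort)
```

Filtering in the PowerShell export step instead would keep the CSV itself smaller, which is closer to what the bullet above suggests.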