TL;DR: Qualify your data with "scope," "resolution," and "animation"; qualify your analysis with "technique" and "accessibility."
I don’t know about you, but I’ve had a difficult time explaining the differences between hunting methodologies. If you believe in a broad definition of hunting, the ways people describe the act can vary quite a bit. I’ve heard the following descriptors of hunting:
- IOC-based
- Behavior-based
- Query-based
- Scan-based
- Real-time or Streaming
- Continuous
Interestingly, I don’t think any of these is incorrect. Each says something about the type of hunting being performed, but they describe different dimensions of hunting.
IOC-based and behavior-based hunting describe the type of signature (aka "pattern," "analytic," "detection") that’s being used. Query-based, scan-based, and real-time/streaming describe how the hunted data is accessed, not necessarily what is done with the data. Continuous is more or less a descriptor of an operating procedure.
What about the type of data being used for hunting? What are the qualities of the data you hunt?
In an attempt to describe different hunting methodologies (roughly, what data is being hunted and how it’s being hunted), I break down the topic into what I call The Five Dimensions of Hunting:

- Data Dimensions: Scope, Resolution, Animation
- Analysis Dimensions: Technique, Accessibility
By separately describing the data you’re hunting on (the Data Dimensions) and the way you’re hunting it (the Analysis Dimensions), I believe we can add clarity to conversations instead of talking past each other (“You perform IOC-based hunting? Cool, I perform Query-based hunting.” ...huh?). I think this dimensional point of view is mostly comprehensive. At the least, I believe it’s simple enough to start a clear conversation using an agreeable nomenclature.
Again, I like describing the data being hunted on separately from the way it’s hunted. The Data Dimensions of hunting are meant to describe attributes of the data being hunted on. Regardless of whether it’s network data, endpoint data, or something in between, these dimensions are arguably the attributes that matter in describing your data.
The Scope dimension answers the question of “How broad is my coverage?” If you have only netflow data to hunt on, you arguably have very broad coverage of IT activity in general (unless you want to see activity off the corporate network, but that’s a different discussion). If you have only endpoint process metadata (say, from Sysmon or an EDR product), unless you’re pulling data from every single endpoint on the network, your coverage is arguably less than that of the netflow example (again, unless you want to see what’s happening off the corporate network).
The Resolution dimension answers the question “How deep is my coverage?” Using the netflow scenario, you arguably have relatively low resolution of activity. For example, netflow may tell you which computer sent how much data to what destination as well as everyone else sending to that destination. What it can’t tell you is what’s causing that network traffic. In the endpoint process metadata scenario, your resolution is arguably very high. This data can reveal exactly which process caused your network traffic, where that process came from, and everything else that process did.
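To make the Resolution contrast concrete, here is a minimal sketch of the same network connection at two resolutions. The field names are illustrative stand-ins, not a real netflow or EDR schema.

```python
# Low resolution: a netflow record can say who talked to whom and how much,
# but nothing about what caused the traffic.
netflow_record = {
    "src": "10.0.0.5",
    "dst": "198.51.100.7",
    "bytes": 48_213,
}

# High resolution: an endpoint process-metadata record (think Sysmon/EDR)
# can reveal exactly which process caused that same traffic, and its lineage.
endpoint_record = {
    "host": "ws01",
    "process": "powershell.exe",
    "parent": "winword.exe",
    "dest_ip": "198.51.100.7",
}

# Only the endpoint record can answer "which process caused this traffic?"
assert "process" not in netflow_record
assert "process" in endpoint_record
```

Note that both records describe the same connection (the destination IP matches); resolution is about how much of the causal story each record carries.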
The last Data Dimension can be tricky to understand if you’ve never experienced certain kinds of data. The Animation dimension answers the question of how representative your data is of activity as it occurred versus the artifacts left behind by that activity. Highly animated data during a bank robbery would be the CCTV video feed of the perpetrator in action, whereas low-animation data points would be the bullet casings left behind after the robber fired shots. In the security world, this is analogous to having streaming endpoint process metadata (highly animated) versus static file listings and binaries left behind on disk (low animation). At the network level, full packet captures are arguably the most animated data.
The Analysis Dimensions are meant to house the descriptors of how you access the data and what you do with it.
The Technique dimension answers the question of “How am I hunting the data?” Specifically, what questions are you asking your data? Perhaps you want to match a set of terms (e.g., IPs, domains, process names) against your data. Or maybe you want to look for something discretely anomalous (e.g., svchost.exe was spawned by an unexpected process). Or maybe you’d like to algorithmically determine something anomalous (e.g., a machine learning classifier trained on powershell.exe process activity). Or, perhaps, you simply want to aggregate the data for raw discovery with a single query interface. This is where “IOC-based” and “behavior-based” are appropriate descriptors: they describe analytical techniques.
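The two techniques named above can be sketched side by side over the same event data. This is a toy illustration, assuming hypothetical field names ("process," "parent," "dest_ip") and a made-up IOC list, not any real product schema.

```python
# Known-bad terms to match (IOC-based technique) -- hypothetical values.
IOC_LIST = {"198.51.100.7", "evil-domain.example", "badtool.exe"}

# Parents that legitimately spawn svchost.exe on Windows.
EXPECTED_SVCHOST_PARENTS = {"services.exe"}

def ioc_based_hunt(events):
    """IOC-based technique: match a set of known-bad terms against each event."""
    return [e for e in events
            if {e.get("process"), e.get("dest_ip")} & IOC_LIST]

def behavior_based_hunt(events):
    """Behavior-based technique: flag svchost.exe with an unexpected parent."""
    return [e for e in events
            if e.get("process") == "svchost.exe"
            and e.get("parent") not in EXPECTED_SVCHOST_PARENTS]

events = [
    {"process": "svchost.exe", "parent": "services.exe", "dest_ip": "10.0.0.5"},
    {"process": "svchost.exe", "parent": "winword.exe",  "dest_ip": "10.0.0.9"},
    {"process": "badtool.exe", "parent": "cmd.exe",      "dest_ip": "198.51.100.7"},
]

print(ioc_based_hunt(events))       # hits on the known-bad process name/IP
print(behavior_based_hunt(events))  # hits on the odd svchost.exe parent
```

Note how each function asks a different question of the exact same data; that question, not the data, is what the Technique dimension describes.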
The Accessibility dimension answers the question of “How accessible to my techniques is the data I’m hunting?” You might be proactively hunting for IOCs after they’re discovered via threat intel feeds or incident response, but where exactly are you hunting? If the data you’re hunting against is aggregated somewhere, then you’re probably conducting a set of queries against that centralized repository to conduct IOC-based hunting. If the data you need is resident on each endpoint, then you’re probably scanning X number of endpoints to conduct IOC matching. In both cases, you’re performing IOC-based hunting, but depending on how accessible your data is, you’re doing it with either queries or scans. This is where descriptors like “query-based,” “scan-based,” and “real-time” are appropriate. They describe how you’re accessing your huntable data.
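The query-versus-scan distinction can be sketched as the same IOC hunt run against data with different accessibility. The "store" and "endpoint" structures below are stand-ins for a centralized repository and per-host data, not a real API.

```python
IOCS = {"198.51.100.7", "badtool.exe"}  # hypothetical known-bad terms

# Accessible case: the data is aggregated, so one query covers everything.
central_store = [
    {"host": "ws01", "process": "badtool.exe"},
    {"host": "ws02", "process": "chrome.exe"},
]

def query_based_hunt(store, iocs):
    """One query against a centralized repository."""
    return [row for row in store if row["process"] in iocs]

# Distributed case: the data is resident on each endpoint, so we scan
# host by host -- X scans for X endpoints.
endpoints = {
    "ws01": ["badtool.exe", "explorer.exe"],
    "ws02": ["chrome.exe"],
}

def scan_based_hunt(endpoints, iocs):
    """One scan per endpoint holding its own data."""
    hits = []
    for host, processes in endpoints.items():
        hits.extend({"host": host, "process": p}
                    for p in processes if p in iocs)
    return hits
```

Both functions implement the same IOC-based technique and find the same hit; only the accessibility of the data, and therefore the access pattern, differs.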
There are a couple of other terms I’ve heard used to describe hunting data or the methods used to analyze it. One of those terms is “visibility.” It’s a useful term for describing how much of your information network your data lets you see. I consider Visibility to be a function of the Data Dimensions of Scope and Resolution: a combination of breadth and depth of coverage.
Another useful term is “liveness.” What is the overall liveness of your hunting methodology? That is, how close to real time do your hunting conclusions come? I see Liveness as a function of the Analysis Dimension of Accessibility, not necessarily of the data itself. You can make virtually any data lively or not.
For the physics-minded big brains out there, I know I’m probably abusing the term “dimension” to some extent. It was the first term that came to mind when I contemplated this topic. I think it invites entire planes of conversation, decision, and research for each attribute of data and analysis. I understand “attribute” is probably the more semantically correct term, but I think it lacks the depth I was looking to conjure.
One more thing - these dimensions can probably be applied to security data and security analysis in general. This doesn’t have to be a hunting-specific thing.