Introduction #
Over the past five years, I’ve supported dozens of organizations through cyber crises as an incident response manager. Here’s a conversation I’ve found myself having in almost every ransomware incident:
CEO: “Okay… And how long will this take?”
Me: “It’s hard to say right now. We only started investigating yesterday.”
CEO: “Estimate.”
Me: “In cases like this - a full ransomware shutdown - I would typically say between one and two weeks before you can safely restore operations.”
Usually there’s a short silence, and nobody in the Zoom meeting says anything before the CEO responds. For most of them, it’s their first real cyber crisis.
CEO: “We’re probably losing over two million dollars each day. So why can’t we go faster?”
I could answer “massive data gaps”, but this is not the right moment.
…
We’ll get back to this story in a moment and how I usually respond to that question. But first, some incident response context.
Some Incident Response Context #
A cyber breach is inevitable. It will happen. Any seasoned security expert will agree it’s not a question of if, it’s when. You can’t control when it will happen, but you can control how ready you are. And the difference between a bad day and a full-blown catastrophe comes down to speed: how fast you can find the truth, decide, and act.
For me, that is the core truth of incident response: find the truth, decide, and act. Everything else - playbooks, tooling, cadence - is built on that simple progression. Let’s touch briefly on each step, starting from the end.
The last step, act, is usually the easiest. The plan is clear, all decisions were already made, and the teams just need to execute. There is little room for failure in the act step (except sloppiness).
One step before that, decide, is where things get complicated. And this is THE most important part of any incident and crisis management - decisions. An informed decision must be data-driven and risk-driven (the truth). Most decisions during a cyber crisis affect the business, so they are business decisions made by the executive team. Since the executives cannot gather the truth on their own, it is the role of the security team to provide it for them. And what happens if the security team delivers bad data or misinterprets the risk? Wrong decisions. Wrong decisions are what can turn a cyber incident (no matter its size) into a company-wide business crisis.
I’ll share an example.
Several years ago I managed a complex incident where the company was notified by the FBI that they had a nation-state threat actor in their network. That was a Friday afternoon, and the company decided to disconnect their entire organization from the internet.
Very smart decision.
During the weekend they increased EDR deployment coverage and isolated a couple of servers where they found some malware, and on Monday morning they decided on their own to restore full internet access. Within 12 hours, the threat actor completely wiped their data centers to destroy evidence, and as a result the business was shut down for almost three weeks.
Very poor decision.
The only difference between those two decisions was “the truth” that was provided to the executives. The first decision was based on the accurate truth that was provided by the FBI (“the threat actor you have in your network is known to be very aggressive upon identification, so the risk is high and we recommend you isolate your network”) and this led to the right decision. But the second decision was based on “the truth” that was falsely communicated to the executives by the untrained IT team of the company who didn’t understand the risk (“we’ve increased coverage of security agents and deleted all the malware that we found, so we feel we are safe at this moment”) which led to the wrong decision that halted the business for weeks.
And this takes us to the first step - find the truth. Every breach is a story, and the security team needs to investigate, discover all the evidence, and build that story piece by piece. The team needs to be aware of what is known for a fact, what is an estimate, and most importantly what is unknown. An incident response effort can take days or weeks, and the story becomes clearer with each day that passes. The security team needs to give the executives the truth every day - without speculating or guesstimating, just the facts and what is still unknown - so that they can make the right decisions.
So: find the truth, decide, and act.
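One way to keep facts, estimates, and unknowns from blurring together in a daily update is to label every finding explicitly. The snippet below is only a minimal sketch of that idea: the `Confidence` and `Finding` names and the sample entries are hypothetical, invented for illustration rather than taken from any real tool or case file.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    """The three buckets every statement to executives should fall into."""
    FACT = "known for a fact"
    ESTIMATE = "estimate"
    UNKNOWN = "unknown"


@dataclass
class Finding:
    question: str
    answer: str
    confidence: Confidence


def daily_briefing(findings):
    """Group findings by confidence so facts never blend with guesses."""
    sections = {level: [] for level in Confidence}
    for finding in findings:
        sections[finding.confidence].append(finding)

    lines = []
    for level in Confidence:  # Enum iteration follows definition order
        lines.append(f"{level.value.upper()}:")
        for finding in sections[level]:
            lines.append(f"  - {finding.question} -> {finding.answer}")
    return "\n".join(lines)


# Illustrative entries only.
findings = [
    Finding("Were servers encrypted?", "Yes, confirmed", Confidence.FACT),
    Finding("Time to safe recovery?", "One to two weeks", Confidence.ESTIMATE),
    Finding("Initial access vector?", "Not yet identified", Confidence.UNKNOWN),
]
briefing = daily_briefing(findings)
print(briefing)
```

The point of the structure is that an "unknown" is reported as prominently as a fact, instead of being silently left out of the update.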
Okay, back to our story.
Facing Ransomware With Data Gaps #
CEO: “Estimate.”
Me: “In cases like this - a full ransomware shutdown - I would typically say between one and two weeks before you can safely restore operations.”
CEO: “We’re probably losing over two million dollars each day. So why can’t we go faster?”
I’m thinking about how we don’t know yet how the threat actor got into their network, which method they used to deploy the ransomware payload across all encrypted servers and workstations, and that we still need to identify malicious backdoors. The threat actor probably deployed more than one persistence mechanism. We still don’t have any IOCs. We haven’t found the C2 communication yet. We don’t know how many domains were impacted, if not all of them. Also, we’re not sure which servers were directly accessed by the threat actor and were used to facilitate the attack. We haven’t checked yet if the threat actor deployed tools on the ESXi hosts or any appliances. So much to do.
Me: “One thing I can assure you of is that we are going to do absolutely everything we can to finish the job as fast as possible. Unfortunately, in this case the investigation is very complicated and it does take time.”
CEO: “Okay. We have several clients who started to reach out to us because our systems are down and they are asking if it’s a cyber attack. Now you’re telling me that we are going to be in shutdown for one to two weeks. What are we supposed to tell them?”
Money loss aside, the impact on reputation and client trust is the real pain. The CEO isn’t looking for a quote from me, he’s looking for a way to retain their trust.
Me: “You can tell the truth, but don’t disclose any details they didn’t ask for. You should tell them that you experienced a cyber attack and made the decision to shut down internet access as a security measure to prevent any escalation and to keep their data safe. That you are conducting an investigation to understand any potential impact. And that you hope to restore full operations very soon and plan to do that once you have full confidence that your network is safe.”
He nods.
CEO: “Okay. James, please prepare our draft response today and send it to me and Rebecca.” Then he directs another question to me. “And if they ask us for a timeline? Telling them one to two weeks is simply too long.”
He is trying to accelerate the investigation once again. Very reasonable. I’m thinking about how they collect Windows Event Logs from only a quarter of their servers, with command line auditing disabled, and how they have only 7 days of log retention in their EDR console, which is only partially deployed across the environment to begin with. So, we’re going to have to do a lot of manual forensic work here. Also, they don’t send firewall telemetry to the SIEM, so it’s going to take some time to get those logs. Well, if they even exist - it’s going to be a long process to map hosts communicating with the C2 infrastructure without them. I was also informed by my team right before this call that they have no visibility into VPN logins, so there goes the quick win of checking for the common initial access via the VPN. I won’t be surprised if PowerShell logging was disabled, and I hope that at least we’ll have some visibility into their identity provider logs. Honestly, it’s probably closer to two weeks than one week in this case.
However, I need to give the CEO some confidence that we know what we’re doing. Surfacing these data gaps at this meeting won’t help his business today.
Me: “Essentially, before we go back to normal and resume your business operations, we need to get key servers recovered from backup, and it must be a version that we definitely know is clean. We need to know for sure that any server we recover is not in a compromised state, otherwise the threat actor can still have access to it and the incident can escalate. Also, at this point, we don’t know yet how the threat actor got into the network, so if we restore internet access they can just come back again the same way. The risk is very high right now. The investigation team must have time to identify how they got in and which malicious backdoors were deployed, so we know how to clean the recovered servers, and it’s not a short effort. It’s a long forensic analysis. The team has already started and we’ll be sharing findings every day.”
This initial leadership meeting is not the best moment for me to explain to the CEO why the process is so long - that they have too many data gaps. From my experience it’s better to do so when they can already see the light at the end of the tunnel and they are prepared for a post mortem discussion and lessons learned for the future.
Remember “find the truth, decide, and act”?
In the real-world example above, all I needed to do was convey the truth to the CEO: we don’t know enough at this point, the risk is high, trying to restore operations too fast is too dangerous, and we need one to two weeks to investigate while you manage your clients.
This truth led to the right decision: don’t force your IT team to restore whatever backups they have and get everything moving immediately; instead, invest time to clean the environment in a safe manner and try to control the damage in the meantime.
The Takeaway #
True incident preparedness starts with your security data. Being battle-ready means your security team can work fast, but still get accurate results. And it all comes down to data maturity - the discipline of collecting, retaining, and understanding the right telemetry before you need it.
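To make "data maturity" concrete, a readiness check can be as simple as comparing each telemetry source against a coverage and retention target. The following is a rough sketch under stated assumptions: the `LogSource` fields, the 90% and 30-day thresholds, and the sample inventory are all hypothetical values chosen for illustration (loosely modeled on the gaps in the story above), not recommendations or measurements from the actual case.

```python
from dataclasses import dataclass


@dataclass
class LogSource:
    name: str
    coverage: float      # fraction of relevant hosts sending this telemetry (0.0-1.0)
    retention_days: int  # how far back the data can be searched
    searchable: bool     # True if queryable centrally, without manual collection


# Hypothetical targets; tune them to your own environment and risk appetite.
MIN_COVERAGE = 0.9
MIN_RETENTION_DAYS = 30


def find_gaps(sources):
    """Return a human-readable list of visibility gaps."""
    gaps = []
    for s in sources:
        if s.coverage < MIN_COVERAGE:
            gaps.append(
                f"{s.name}: coverage {s.coverage:.0%} is below {MIN_COVERAGE:.0%}"
            )
        if s.retention_days < MIN_RETENTION_DAYS:
            gaps.append(
                f"{s.name}: {s.retention_days} days of retention is below "
                f"{MIN_RETENTION_DAYS}"
            )
        if not s.searchable:
            gaps.append(f"{s.name}: not centrally searchable")
    return gaps


# Illustrative inventory only.
inventory = [
    LogSource("Windows Event Logs", coverage=0.25, retention_days=90, searchable=True),
    LogSource("EDR telemetry", coverage=0.60, retention_days=7, searchable=True),
    LogSource("Firewall logs", coverage=1.0, retention_days=30, searchable=False),
]
gaps = find_gaps(inventory)
for gap in gaps:
    print(gap)
```

Running a check like this before an incident surfaces the same gaps that would otherwise be discovered on day one of the investigation.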
In the ransomware case above, the two-week shutdown wasn’t because our investigation team was slow. It was because we were blind in critical areas. Key logs were missing. Telemetry was inconsistent. Every question from leadership - “What’s clean? What’s compromised? Can we safely bring systems back online? Which data was stolen?” - took a very long time to answer. With stronger visibility, that same investigation could have been resolved in days, not weeks.
For example, if log collection from Windows servers had been comprehensive, with advanced auditing enabled, and if the EDR had had broad coverage with longer data retention, there would have been no need to image dozens of disks for offline forensic analysis. Disk imaging is one of the biggest time sinks in any incident response. If the firewall logs had been accessible and searchable, we could have mapped every host that communicated with the threat actor’s C2 infrastructure in minutes. That single capability can collapse days of uncertainty into a few focused queries.
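As an illustration of what such a focused query might look like, here is a minimal sketch that scans firewall connection records for traffic to known C2 addresses. Everything specific in it is an assumption: the CSV column names (`timestamp`, `src_ip`, `dst_ip`, `dst_port`), the sample records, and the C2 addresses, which are taken from IP ranges reserved for documentation. A real investigation would run the equivalent query in whatever system holds the firewall logs.

```python
import csv
import io
from collections import defaultdict

# Hypothetical C2 indicators (documentation-only IP ranges).
C2_ADDRESSES = {"203.0.113.10", "198.51.100.7"}

# Hypothetical firewall export; a real one would be read from a file or a query.
SAMPLE_LOG = """\
timestamp,src_ip,dst_ip,dst_port
2024-01-01T02:14:00Z,10.0.1.15,203.0.113.10,443
2024-01-01T02:15:30Z,10.0.1.22,192.0.2.50,443
2024-01-01T02:17:12Z,10.0.2.8,198.51.100.7,8443
2024-01-01T02:20:45Z,10.0.1.15,198.51.100.7,8443
"""


def hosts_talking_to_c2(log_file, c2_addresses):
    """Map each internal source IP to the sorted C2 addresses it contacted."""
    contacts = defaultdict(set)
    for row in csv.DictReader(log_file):
        if row["dst_ip"] in c2_addresses:
            contacts[row["src_ip"]].add(row["dst_ip"])
    return {src: sorted(dsts) for src, dsts in contacts.items()}


hits = hosts_talking_to_c2(io.StringIO(SAMPLE_LOG), C2_ADDRESSES)
for host, destinations in sorted(hits.items()):
    print(host, "->", ", ".join(destinations))
```

The logic is trivial once the logs are in one searchable place; the hard part in a real incident is usually getting them there.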
A security breach is a cyber crisis, but it doesn’t have to become a business crisis. And when it does, the impact can range from minimal disruption to a full-scale shutdown.
And THAT is where data gaps make the difference. Strong visibility leads to faster investigations. Faster investigations reveal the truth sooner. And learning the truth sooner empowers leadership to make the right decisions, faster.
So, the core of incident response readiness isn’t just about defining roles and responsibilities, writing protocols, and running the periodic war room exercise. It’s about building the visibility to know, within hours, exactly what happened and what to do.
That readiness starts with data maturity and access.
Breaking the Cycle with Vega #
Traditional SIEMs made visibility a luxury - forcing teams to choose between coverage, cost, and speed. Every new data source meant more ingestion, more pipelines, and more invoices. That’s the cycle security teams have been trapped in for years.
Vega breaks that cycle.
With Vega’s federated Security Analytics Mesh (SAM), teams can analyze data where it already lives - across multiple storages, clouds, and environments - without re-ingesting or duplicating it. Organizations can achieve full visibility at scale, using both premium and low-cost storage, and query it all through one high-performance engine.
The result: faster investigations, faster decision making, and faster recovery - without the tradeoffs, and at a sustainable cost. Vega gives security teams the freedom to mature their data strategy, close visibility gaps, and be truly battle-ready for whatever comes next.
And that is why I joined Vega.
Ready to close your visibility gaps? Request a demo to see how Vega can help your team respond faster and with confidence when it matters most.