Postmortem: How soc-core02 Died (and What It Taught Me)

This article is a postmortem.
Not a tutorial.
Not a success story.
Just an honest write-up of how soc-core02 died, why it happened, and what I learned from it.

I’m writing this mostly for myself, but also for anyone building labs at home and thinking:

“It’s just a lab, what could go wrong?”

Well… quite a lot 😅

The Context

soc-core02 was my SOC core VM.
Ubuntu Server. Hardened. Firewall rules everywhere. SSH keys. Log ingestion.
It was supposed to be the stable brain of Operation Iron Watch 02.

I had already:

Deployed a SIEM stack
Forwarded logs from web-arm01
Started building baselines
Documented everything carefully

At least… that’s what I thought.

The Trigger

The mistake started with overconfidence.

I wanted to “clean things up” by reinstalling and overwriting parts of the SIEM stack (the Wazuh manager, indexer, and dashboard).
Same VM. Same disk. Same system.

I didn’t:

Snapshot the VM
Fully uninstall previous components
Stop and validate every dependent service first
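
The three skipped steps above can be sketched as a small pre-change gate. This is a minimal illustration, not the tooling used in this lab: `PreflightState`, `preflight_blockers`, and the example service names are all hypothetical, and the facts are assumed to be gathered by hand or by whatever hypervisor and package tools you use.

```python
from dataclasses import dataclass


@dataclass
class PreflightState:
    """Facts gathered before touching the SIEM stack (hypothetical structure)."""
    snapshot_taken: bool               # a VM snapshot exists to roll back to
    old_components_removed: bool       # previous install fully uninstalled
    services_still_running: list[str]  # dependent services not yet stopped


def preflight_blockers(state: PreflightState) -> list[str]:
    """Return the reasons a reinstall should not start; an empty list means go."""
    blockers = []
    if not state.snapshot_taken:
        blockers.append("no VM snapshot to roll back to")
    if not state.old_components_removed:
        blockers.append("previous components not fully uninstalled")
    for name in state.services_still_running:
        blockers.append(f"dependent service still running: {name}")
    return blockers


if __name__ == "__main__":
    # Illustrative values only: roughly the state described in this section.
    state = PreflightState(False, False, ["example-manager", "example-indexer"])
    for reason in preflight_blockers(state):
        print("BLOCKED:", reason)
```

The point of returning a list rather than a single yes/no is that every skipped step is named, so the change only proceeds once the list is empty.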

I assumed:

“I’ve done it once, I know what I’m doing now.”

Classic.

The Symptoms

At first, it looked recoverable.

Then things escalated quickly:

Endless logs flooding the terminal
Services failing silently
SSH becoming unstable
System becoming unresponsive
Eventually… total lock-up

I couldn’t even type commands anymore.
The system was alive, but unusable.

No graceful shutdown.
No clean recovery.

The Root Cause (Plainly Said)

I overwrote a complex SIEM stack on a hardened server without a rollback plan.

More specifically:

Conflicting services (manager, indexer, dashboard)
Disk and service state corruption
Resource contention
No clean separation between “experiment” and “production lab”
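
One way to catch the "conflicting services" part before it bites is to ask systemd which stack units are still active before overwriting anything. The sketch below assumes a systemd host and assumes the unit names `wazuh-manager`, `wazuh-indexer`, and `wazuh-dashboard`; check what your install actually registered. `still_active` is a hypothetical helper, and the probe is injectable so the logic can be exercised without systemd.

```python
import subprocess
from typing import Callable

# Assumed unit names; adjust to match the units present on your system.
STACK_UNITS = ["wazuh-manager", "wazuh-indexer", "wazuh-dashboard"]


def unit_is_active(unit: str) -> bool:
    """True if `systemctl is-active` exits 0 for the unit (requires systemd)."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit],
        check=False,  # a non-zero exit just means "not active"
    )
    return result.returncode == 0


def still_active(
    units: list[str],
    probe: Callable[[str], bool] = unit_is_active,
) -> list[str]:
    """Return the units the probe reports as active, in the order given."""
    return [unit for unit in units if probe(unit)]
```

Calling `still_active(STACK_UNITS)` on the SOC core before a reinstall would list anything that still needs to be stopped; an empty list is the only result that should let the change continue.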

This wasn’t a Wazuh problem.
This wasn’t an Ubuntu problem.

This was an architectural mistake.

The Hard Truth

At some point, you have to stop troubleshooting and be honest:

“This system cannot be trusted anymore.”

I might have been able to force a recovery.
But any detection results after that would be questionable.

So I did the hardest but cleanest thing:

I killed soc-core02.

No patching.
No duct tape.
No pretending.

The Decision: soc-core03

I wiped everything and started again with a clear rule:

One SOC core = one purpose = one clean lifecycle.

Thus was born soc-core03.

With clear lessons applied:

Clean install, no leftovers
Minimal base system
Incremental build
Validation at every step
No overwriting critical components mid-operation
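
"Incremental build" and "validation at every step" can be sketched as a loop that applies one change, checks health, and refuses to continue on failure. This is an illustration under assumptions, not the actual soc-core03 build process: `Step` and `run_incrementally` are hypothetical names, and the apply and validate callables stand in for whatever install and health-check actions you use.

```python
from typing import Callable, NamedTuple


class Step(NamedTuple):
    name: str
    apply: Callable[[], None]     # makes exactly one change
    validate: Callable[[], bool]  # confirms the system is still healthy


def run_incrementally(steps: list[Step]) -> list[str]:
    """Apply steps in order and stop at the first failed validation.

    Returns the names of the steps that were both applied and validated.
    """
    completed = []
    for step in steps:
        step.apply()
        if not step.validate():
            print(f"validation failed after '{step.name}'; stopping here")
            break
        completed.append(step.name)
    return completed
```

Stopping at the first failure keeps the blast radius to a single step, which is the difference between rolling back one change and rebuilding the whole VM.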

And most importantly:

Architecture before tools

What This Failure Gave Me

Oddly enough, this failure was valuable.

It taught me:

Why change management matters (even in labs)
Why SOC stability is sacred
Why “just testing” can still destroy systems
Why documentation isn’t optional
Why real SOC work is as much discipline as skill

This is exactly the kind of mistake that should happen before you touch production, and that’s a good thing.

Why I’m Publishing This

I could have hidden this.
I could have pretended soc-core02 never existed.

But Cyberlandji is about learning in public, not showcasing perfection.

If you’re building labs and something breaks badly:

You’re not stupid
You’re not behind
You’re learning the real lessons

SOC work is not clean.
Detection engineering is painful.
And architecture mistakes hurt — but they teach fast.

Closing Thoughts

soc-core02 failed so soc-core03 could be better.

Iron Watch 02 was paused.
The narrative was frozen.
And the rebuild was done properly.

If this post saves one person from overwriting a live SIEM without a snapshot, it did its job.

Back to building.
Back to learning.
Back to Iron Watch.

— Cyberlandji
