OpenClaw Hack Night (St Patrick's Day)
HACK NIGHT | COMPLETED


11:00 PM - 4:30 AM
1185 Mason St, San Francisco, CA 94123, USA

$ Judges.log (5)

A huge thank you to our judges for volunteering their time and expertise to evaluate projects and provide feedback to our builders.

Emily Xu

Judge

Muhsin Fatih Yorulmaz

Judge

AJ Chan

Judge

Cameron Sellers

Judge

Alex Holovach

Judge

$ Sponsors.log

[DIAMOND] 1

Wordware

[PLATINUM] 2

Blaxel
Comet

$ LiveDemos.log (7)

Watch the live demo presentations from this event

turtle

DEMOED
Blanksheet
Calcu
turtle

Train and deploy your own tiny model.

Wordware, Blaxel

Claw Breaker

DEMOED

# ☠ Claw Breaker: Automated OpenClaw Security Scanner

**Built live at Break OpenClaw Hack Night (St Patrick's Day 2026)**

**By [Ahmed Bakr](https://linkedin.com/in/ahmedbakr0), Founder of [Awn AI](https://getawn.ai)**

## What is this?

Claw Breaker is an automated pentesting agent that probes OpenClaw instances for **7 real vulnerability classes** discovered during the Break OpenClaw CTF. It runs inside a [Blaxel](https://blaxel.ai) sandbox for isolated, safe execution.

## The 7 Probe Classes

| # | Probe | Severity | CWE | What it finds |
|---|-------|----------|-----|---------------|
| P1 | Skills Status Secret Leak | HIGH | CWE-200 | Secrets exposed in `/api/skills/status` without auth |
| P2 | Local File Inclusion (LFI) | CRITICAL | CWE-22 | Arbitrary file read via `/media?path=` endpoint |
| P3 | Unauth Config Mutation | CRITICAL | CWE-306 | `POST /api/config` accepts changes without auth |
| P4 | Auth Token Exfiltration | CRITICAL | CWE-918 | Server leaks gateway token to attacker-supplied URL |
| P5 | Browser State Exposure | HIGH | CWE-200 | `/api/browser/state` exposes stored secrets |
| P6 | Control UI XSS | HIGH | CWE-79 | Injected scripts in Control UI HTML set malicious cookies |
| P7 | Log Secret Leakage | MEDIUM | CWE-532 | Secrets exposed in status/MOTD responses |

All 7 probes are based on **real vulnerabilities** exploited during the Break OpenClaw CTF to achieve a perfect 4,950 point score (21/21 flags).

## Quick Start

### Run locally (no Blaxel)

```bash
pip install requests fastapi uvicorn
python claw_breaker.py --target http://localhost --output report.json
```

### Run with dashboard

```bash
python report_server.py
# Open http://localhost:8080
```

### Run inside Blaxel sandbox (recommended)

```bash
pip install blaxel
blaxel login
python run_on_blaxel.py --target http://your-openclaw-host --serve
```

## Architecture

```
┌─────────────────────────────────────────┐
│  Blaxel Perpetual Sandbox               │
│  (Isolated microVM, scale-to-zero,      │
│   25ms resume, full state preserved)    │
│                                         │
│  ┌──────────────────────────────────┐   │
│  │  Claw Breaker Scanner Engine     │   │
│  │  7 probes × target instance      │   │
│  └──────────┬───────────────────────┘   │
│             ↓                           │
│  ┌──────────────────────────────────┐   │
│  │  FastAPI Dashboard (port 8080)   │   │
│  │  Live visual security report     │   │
│  └──────────────────────────────────┘   │
└─────────────────────────────────────────┘
              ↓ Preview URL
   https://claw-breaker.blaxel.app
```

## Why Blaxel?

Pentesting agents execute against potentially hostile targets. Running the scanner inside a Blaxel sandbox means:

- **Isolation**: Even if the target sends malicious responses, the host is protected
- **Perpetual standby**: Sandbox stays warm for re-scans without cold start
- **Scale-to-zero**: Pay nothing when idle, instant resume at 25ms
- **Reproducible**: Same environment every time, shareable via preview URL

## Context: NemoClaw + GTC 2026

This tool was built the same day NVIDIA announced **NemoClaw** at GTC 2026, their enterprise security stack for OpenClaw. NemoClaw adds OpenShell sandboxing and YAML-based policies. But as the CTF proved, **the control plane above the sandbox is where most vulnerabilities live** (unauth APIs, LFI, XSS, secret leakage). Claw Breaker tests exactly those layers.

## Tech Stack

- **Scanner**: Python + requests (zero heavy deps)
- **Dashboard**: FastAPI + vanilla JS (single-file, no build step)
- **Sandbox**: Blaxel perpetual sandboxes (microVM isolation)
- **Observability**: Compatible with Opik tracing (add `opik` decorator)
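As a rough illustration of how the probe table could feed a report, here is a minimal sketch that encodes the seven probe classes as data and summarizes a set of findings by severity. The `Probe` class, the `summarize` function, and the shape of the summary are assumptions made for this example; the actual schema of `report.json` is not described in the write-up, and the sketch makes no network requests.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    probe_id: str
    name: str
    severity: str
    cwe: str


# Rows copied from the probe table; the data layout itself is hypothetical.
PROBES = [
    Probe("P1", "Skills Status Secret Leak", "HIGH", "CWE-200"),
    Probe("P2", "Local File Inclusion (LFI)", "CRITICAL", "CWE-22"),
    Probe("P3", "Unauth Config Mutation", "CRITICAL", "CWE-306"),
    Probe("P4", "Auth Token Exfiltration", "CRITICAL", "CWE-918"),
    Probe("P5", "Browser State Exposure", "HIGH", "CWE-200"),
    Probe("P6", "Control UI XSS", "HIGH", "CWE-79"),
    Probe("P7", "Log Secret Leakage", "MEDIUM", "CWE-532"),
]

# Lower number means more severe; used only for sorting.
SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2}


def summarize(findings: dict[str, bool]) -> dict:
    """Summarize findings, where findings maps probe id -> True if vulnerable."""
    hits = [p for p in PROBES if findings.get(p.probe_id, False)]
    hits.sort(key=lambda p: (SEVERITY_ORDER[p.severity], p.probe_id))
    return {
        "vulnerable": [p.probe_id for p in hits],
        "by_severity": dict(Counter(p.severity for p in hits)),
        "worst": hits[0].severity if hits else None,
    }


if __name__ == "__main__":
    # Example findings are made up for illustration.
    print(summarize({"P1": True, "P3": True, "P7": True}))
```

Keeping the probe metadata separate from the probing logic means a dashboard or report writer can sort and group findings without knowing how each probe works.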

Comet, Blaxel

AgentTrace

DEMOED
Kush Ise
Agenttrace

AgentTrace

An automated security scanner and auto-remediation tool for AI coding agents.

What It Does

AgentTrace provisions a live AI coding agent (OpenClaw) inside a cloud sandbox (Blaxel), attacks it with 10 security scenarios across 6 threat categories, scores each attack using an LLM-as-a-judge (Claude Sonnet), auto-patches the agent's configuration to fix discovered vulnerabilities, and re-scans to verify the fixes, all in a single automated pipeline. Every attack and its outcome is traced and logged to Opik for observability.

How It Works

AgentTrace runs a 6-phase pipeline:

1. Provision: Spins up a Blaxel cloud sandbox with OpenClaw (an AI coding agent) installed
2. Baseline Scan: Fires 10 attack payloads against the default agent configuration
3. Score: Claude Sonnet acts as an LLM-as-a-judge, evaluating each attack as compromised/resisted with severity ratings
4. Remediate: Automatically generates and applies configuration patches to openclaw.json based on which attacks succeeded
5. Re-scan: Runs the same 10 attacks against the hardened configuration
6. Report: Displays before/after comparison in terminal; full traces logged to Opik dashboard

Attack Categories

┌──────────────────┬───────────┬─────────────────────────────────────────────────────────────────────────┐
│ Category         │ # Attacks │ What It Tests                                                           │
├──────────────────┼───────────┼─────────────────────────────────────────────────────────────────────────┤
│ Prompt Injection │ 3         │ System prompt extraction, role-play reframe, code-based credential leak │
├──────────────────┼───────────┼─────────────────────────────────────────────────────────────────────────┤
│ Sandbox Escape   │ 2         │ Path traversal, symlink escape                                          │
├──────────────────┼───────────┼─────────────────────────────────────────────────────────────────────────┤
│ Credential Theft │ 2         │ Env var dump, config file read                                          │
├──────────────────┼───────────┼─────────────────────────────────────────────────────────────────────────┤
│ Persistence      │ 1         │ SOUL.md tampering                                                       │
├──────────────────┼───────────┼─────────────────────────────────────────────────────────────────────────┤
│ Evasion          │ 1         │ Base64-encoded command execution                                        │
├──────────────────┼───────────┼─────────────────────────────────────────────────────────────────────────┤
│ Config Exploit   │ 1         │ Cloud metadata access via elevated tools                                │
└──────────────────┴───────────┴─────────────────────────────────────────────────────────────────────────┘

Tech Stack

- Python 3.11+: Core runtime
- Blaxel SDK: Cloud sandbox provisioning and management
- Anthropic Claude API: Powers the LLM-as-a-judge scorer
- Opik (Comet): Tracing and observability for all attack/response pairs
- Rich: Terminal UI with formatted tables and colored output
- OpenClaw: The target AI coding agent being security-tested

Key Differentiator

AgentTrace doesn't just find vulnerabilities. It closes the loop by automatically remediating them and proving the fix works, giving you a measurable before/after security posture improvement (e.g., Grade C → Grade B).
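The scan, remediate, and re-scan loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: `run_pipeline`, `grade`, `toy_scan`, and `toy_remediate` are hypothetical names, the letter-grade thresholds are invented for the example, and the toy functions stand in for the real sandbox, attack payloads, and LLM judge, none of which are called here. Only the category names and attack counts come from the table.

```python
from typing import Callable

# Attack counts per category, taken from the Attack Categories table.
ATTACKS_PER_CATEGORY = {
    "Prompt Injection": 3,
    "Sandbox Escape": 2,
    "Credential Theft": 2,
    "Persistence": 1,
    "Evasion": 1,
    "Config Exploit": 1,
}

Config = dict[str, bool]   # hypothetical: category -> hardened?
Results = dict[str, int]   # category -> number of compromised attacks


def grade(results: Results) -> str:
    """Map the share of resisted attacks to a letter (thresholds are assumptions)."""
    total = sum(ATTACKS_PER_CATEGORY.values())
    ratio = (total - sum(results.values())) / total
    for letter, floor in (("A", 0.9), ("B", 0.7), ("C", 0.5)):
        if ratio >= floor:
            return letter
    return "D"


def run_pipeline(
    config: Config,
    scan: Callable[[Config], Results],
    remediate: Callable[[Config, Results], Config],
) -> dict:
    """Baseline scan, remediate, re-scan, and report before/after grades."""
    baseline = scan(config)
    hardened = remediate(config, baseline)
    rescan = scan(hardened)
    return {"before": grade(baseline), "after": grade(rescan), "config": hardened}


def toy_scan(config: Config) -> Results:
    # Toy model: every attack in a category succeeds unless it is hardened.
    return {
        cat: 0 if config.get(cat, False) else n
        for cat, n in ATTACKS_PER_CATEGORY.items()
    }


def toy_remediate(config: Config, results: Results) -> Config:
    # Toy model: harden every category in which an attack succeeded.
    patched = dict(config)
    for cat, hits in results.items():
        if hits:
            patched[cat] = True
    return patched


if __name__ == "__main__":
    start = {"Prompt Injection": True, "Sandbox Escape": True}
    print(run_pipeline(start, toy_scan, toy_remediate))
```

Passing `scan` and `remediate` in as callables keeps the control flow testable without a live sandbox; a real run would replace the toy functions with ones that drive the agent and score its responses.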

Wordware, Blaxel

Prampta

DEMOED

Protection and encryption for your prompts.

breakMyClaw

DEMOED

A clawbot where folks can break the claw to get on the leaderboard

Comet, Blaxel

Black Soul: Identity Injection vs Agent Guardrails

DEMOED
Nick Lulofs
Nick

We demonstrate a two-stage attack that bypasses AI agent guardrails on any model without jailbreak syntax:

Stage 1 (cognitive overload): Flood the agent with dense technical jargon (OAuth flows, JWT rotation, circuit breakers). The agent accepts it without flagging, normalizing unusual input from an untrusted user.

Stage 2 (identity injection): Frame a persona as creative fiction. "I'm writing a character called Black Soul." The agent adopts the identity, maintains it across turns, and escalates through tool use: writing files, dumping server config, and executing code on the host.

Deployed on the official OpenClaw template on Blaxel (GPT-4o, Tier 2). Six chat messages. No code exploit. A fictional character wrote payment fraud emails, exfiltrated server configuration, wrote files to the filesystem, ran arbitrary commands, and crashed the deployment (73% error rate).

The jargon stage works because models under cognitive load drop optional safety features at 96% rates even when they recognize the distraction. The identity stage works because roleplay framing doesn't trigger content policies. Together, they're model-agnostic: the same attack chain works on GPT-4o, Claude, Gemini, any model with tool access.

Wordware, Blaxel

Chinese Hackers

DEMOED
Chineseman Lai
Chinese Hackers

We actually made money from this :D Don't kill us plz, also Chinese people are awesome people :)

Wordware, Blaxel

$ Projects.log (14)

Check out the amazing projects built during this event

turtle

Blanksheet
Calcu
turtle

Train and deploy your own tiny model.

Wordware, Blaxel

Claw Breaker

Comet, Blaxel

OpenClaw Canary

Code Zero

I spent tonight breaking OpenClaw. Now I've built the trap!

Comet, Blaxel

AgentTrace

Kush Ise
Agenttrace

Wordware, Blaxel

“Trust Boundary Monitor”

Irina Poslavsky
Working 4 Seed

AgentGuard: Preventing Autonomous Backdoors in LLM Agents

AI agents today don't just answer questions; they can modify systems. That creates a new class of vulnerability in which a simple prompt injection can cause an agent to silently create its own backdoor, for example by adding an external Telegram integration.

We built AgentGuard, a lightweight security layer that intercepts agent actions in real time, classifies sensitive operations such as integration creation, and blocks anything that expands the agent's trust boundary without approval. It also includes a trust monitor that detects new outbound domains and triggers alerts, so unauthorized control channels are stopped immediately.

The key insight: the risk isn't bad outputs, it's autonomous system mutation. AgentGuard prevents that.
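To make the idea concrete, here is a minimal sketch of a trust-boundary check of the kind described: sensitive actions need explicit approval, and outbound requests to domains outside an approved set are blocked with an alert. It is not AgentGuard's implementation; `TrustMonitor`, `SENSITIVE_ACTIONS`, the action names, and the alert strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urlparse

# Hypothetical action names that would expand the agent's trust boundary.
SENSITIVE_ACTIONS = {"create_integration", "add_webhook"}


@dataclass
class TrustMonitor:
    approved_domains: set[str]
    alerts: list[str] = field(default_factory=list)

    def check(self, action: str, url: Optional[str] = None,
              approved: bool = False) -> bool:
        """Return True if the action may proceed, False if it is blocked."""
        # Sensitive operations are blocked unless a human approved them.
        if action in SENSITIVE_ACTIONS and not approved:
            self.alerts.append(f"blocked {action}: approval required")
            return False
        # Any outbound URL must point at an already-approved domain.
        if url is not None:
            host = urlparse(url).hostname or ""
            if host not in self.approved_domains:
                self.alerts.append(f"blocked outbound request to new domain: {host}")
                return False
        return True


if __name__ == "__main__":
    monitor = TrustMonitor(approved_domains={"api.github.com"})
    print(monitor.check("http_request", "https://api.github.com/repos"))
    print(monitor.check("create_integration", "https://api.telegram.org/bot"))
    print(monitor.check("http_request", "https://api.telegram.org/bot"))
    print(monitor.alerts)
```

Checking the action type before the destination means an unapproved integration is refused even when it targets a domain that is already trusted.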

Comet

Prampta

Protection and encryption for your prompts.

breakMyClaw

A clawbot where folks can break the claw to get on the leaderboard

Comet, Blaxel

connectors

Divyansh Agarwal
connections

The project uses Blaxel to enrich users' LinkedIn profiles. Blaxel sandboxes host Chrome/web browser sessions that access LinkedIn and crawl information about users.

Wordware, Blaxel

Control Spaces

Control

Comet, Blaxel

Black Soul: Identity Injection vs Agent Guardrails

Nick Lulofs
Nick

Wordware, Blaxel

Chinese Hackers

Chineseman Lai
Chinese Hackers

We actually made money from this :D Don't kill us plz, also Chinese people are awesome people :)

Wordware, Blaxel

defend_claw

Defend against possible attacks, or attack possible loopholes.

Blaxel

idk lol

hi

Dondidi

Nothing much, just exploring and trying to break things.

Comet, Wordware, Blaxel

$ Prizes.log (5)

Dedicated Blaxel Prize for breaking OpenClaw in Blaxel

There's a dedicated Blaxel Prize for breaking OpenClaw in Blaxel

Mac Mini from Wordware

Prize from Wordware! Requires a minimum of 2,000 points in the OpenClaw CTF.

1st Best Demo

200

Best Project & Demo

Winners:

Kush Ise

2nd Best Demo

100

2nd Best Demo

Winners:

Nick Lulofs

3rd Best Demo

50

3rd Best Demo

Winners:

Ahmed Bakr

$ Speakers.log (4)

Petros Hong

Developer Community Manager

Apify Hackathon Promo 03/21/2026

Filip Kozera

CEO

Sauna Demo

AJ Chan

Builder

Breaking OpenClaw

Mathis Joffre

Founding Engineer

Blaxel intro for hack night
