
A huge thank you to our judges for volunteering their time and expertise to evaluate projects and provide feedback to our builders.
# ☠ Claw Breaker — Automated OpenClaw Security Scanner

**Built live at Break OpenClaw Hack Night (St Patrick's Day 2026)**

**By [Ahmed Bakr](https://linkedin.com/in/ahmedbakr0) — Founder of [Awn AI](https://getawn.ai)**

## What is this?

Claw Breaker is an automated pentesting agent that probes OpenClaw instances for **7 real vulnerability classes** discovered during the Break OpenClaw CTF. It runs inside a [Blaxel](https://blaxel.ai) sandbox for isolated, safe execution.

## The 7 Probe Classes

| # | Probe | Severity | CWE | What it finds |
|---|-------|----------|-----|---------------|
| P1 | Skills Status Secret Leak | HIGH | CWE-200 | Secrets exposed in `/api/skills/status` without auth |
| P2 | Local File Inclusion (LFI) | CRITICAL | CWE-22 | Arbitrary file read via `/media?path=` endpoint |
| P3 | Unauth Config Mutation | CRITICAL | CWE-306 | `POST /api/config` accepts changes without auth |
| P4 | Auth Token Exfiltration | CRITICAL | CWE-918 | Server leaks gateway token to attacker-supplied URL |
| P5 | Browser State Exposure | HIGH | CWE-200 | `/api/browser/state` exposes stored secrets |
| P6 | Control UI XSS | HIGH | CWE-79 | Injected scripts in Control UI HTML set malicious cookies |
| P7 | Log Secret Leakage | MEDIUM | CWE-532 | Secrets exposed in status/MOTD responses |

All 7 probes are based on **real vulnerabilities** exploited during the Break OpenClaw CTF to achieve a perfect 4,950 point score (21/21 flags).
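P1, P5 and P7 share one shape: request an endpoint without credentials and check whether the response body contains secrets. The block below is a minimal sketch of that check, not Claw Breaker's actual code. The `Fetch` interface, the `Finding` fields and the regexes in `SECRET_PATTERNS` are assumptions for illustration; a real scanner would need a much larger, tuned pattern set.

```python
import re
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical fetcher interface: takes a URL, returns (HTTP status, body text).
Fetch = Callable[[str], tuple[int, str]]

# Illustrative patterns only; not the pattern set Claw Breaker ships with.
SECRET_PATTERNS = {
    "key_value_secret": re.compile(
        r"(?i)(api[_-]?key|token|secret|password)[\"']?\s*[:=]\s*[\"']?([A-Za-z0-9_\-]{16,})"
    ),
    "bearer_token": re.compile(r"(?i)\bbearer\s+[A-Za-z0-9_\-.=]{20,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


@dataclass
class Finding:
    probe_id: str
    severity: str
    cwe: str
    url: str
    matched_patterns: list[str]


def urllib_fetch(url: str, timeout: float = 5.0) -> tuple[int, str]:
    """Unauthenticated GET using only the standard library."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 401/403/404 arrive as exceptions; treat them as a non-leaking response.
        return err.code, ""


def probe_unauth_secret_leak(
    fetch: Fetch, base_url: str, path: str, probe_id: str, severity: str, cwe: str
) -> Optional[Finding]:
    """Request `path` with no credentials and flag secret-looking content."""
    url = base_url.rstrip("/") + path
    status, body = fetch(url)
    if status != 200:
        return None  # auth was enforced, or the endpoint does not exist
    matched = [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(body)]
    if not matched:
        return None
    return Finding(probe_id, severity, cwe, url, matched)
```

Against an instance you control, a P1-style check would then be `probe_unauth_secret_leak(urllib_fetch, "http://localhost", "/api/skills/status", "P1", "HIGH", "CWE-200")`. Injecting the fetcher keeps the detection logic testable without a live target.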
## Quick Start

### Run locally (no Blaxel)

```bash
pip install requests fastapi uvicorn
python claw_breaker.py --target http://localhost --output report.json
```

### Run with dashboard

```bash
python report_server.py
# Open http://localhost:8080
```

### Run inside Blaxel sandbox (recommended)

```bash
pip install blaxel
blaxel login
python run_on_blaxel.py --target http://your-openclaw-host --serve
```

## Architecture

```
┌─────────────────────────────────────────┐
│        Blaxel Perpetual Sandbox         │
│    (Isolated microVM, scale-to-zero,    │
│     25ms resume, full state preserved)  │
│                                         │
│  ┌──────────────────────────────────┐   │
│  │   Claw Breaker Scanner Engine    │   │
│  │   7 probes × target instance     │   │
│  └──────────┬───────────────────────┘   │
│             ↓                           │
│  ┌──────────────────────────────────┐   │
│  │   FastAPI Dashboard (port 8080)  │   │
│  │   Live visual security report    │   │
│  └──────────────────────────────────┘   │
└─────────────────────────────────────────┘
                     ↓
                Preview URL
       https://claw-breaker.blaxel.app
```

## Why Blaxel?

Pentesting agents execute against potentially hostile targets. Running the scanner inside a Blaxel sandbox means:

- **Isolation**: Even if the target sends malicious responses, the host is protected
- **Perpetual standby**: Sandbox stays warm for re-scans without cold start
- **Scale-to-zero**: Pay nothing when idle, instant resume at 25ms
- **Reproducible**: Same environment every time, shareable via preview URL

## Context: NemoClaw + GTC 2026

This tool was built the same day NVIDIA announced **NemoClaw** at GTC 2026 — their enterprise security stack for OpenClaw. NemoClaw adds OpenShell sandboxing and YAML-based policies. But as the CTF proved, **the control plane above the sandbox is where most vulnerabilities live** (unauth APIs, LFI, XSS, secret leakage). Claw Breaker tests exactly those layers.
## Tech Stack

- **Scanner**: Python + requests (zero heavy deps)
- **Dashboard**: FastAPI + vanilla JS (single-file, no build step)
- **Sandbox**: Blaxel perpetual sandboxes (microVM isolation)
- **Observability**: Compatible with Opik tracing (add Opik's `@track` decorator)
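The Quick Start writes scan results to `report.json`, which the FastAPI dashboard then renders. The sketch below shows one plausible shape for that file plus a CLI with the same `--target`/`--output` flags. The JSON keys, the severity ordering and the `build_report` helper are assumptions, not Claw Breaker's documented format, and the probe calls themselves are left as a placeholder.

```python
import argparse
import json
from collections import Counter
from datetime import datetime, timezone
from typing import Optional

# Assumed ordering, most severe first; covers the severities used in the probe table.
SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"]


def build_report(target: str, findings: list[dict]) -> dict:
    """Assemble a hypothetical report.json payload from probe findings."""
    # Most severe findings first, so a dashboard can render top-down.
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER.index(f["severity"]))
    counts = Counter(f["severity"] for f in ordered)
    return {
        "target": target,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "summary": {sev: counts[sev] for sev in SEVERITY_ORDER},
        "findings": ordered,
    }


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="OpenClaw security scan (sketch)")
    parser.add_argument("--target", required=True, help="base URL of the OpenClaw instance")
    parser.add_argument("--output", default="report.json", help="where to write the JSON report")
    return parser.parse_args(argv)


def main(argv: Optional[list[str]] = None) -> dict:
    args = parse_args(argv)
    # Placeholder: each probe (P1-P7) would append a dict with at least
    # "id" and "severity" keys here.
    findings: list[dict] = []
    report = build_report(args.target, findings)
    with open(args.output, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2)
    return report
```

To run it as `python claw_breaker.py --target http://localhost --output report.json`, call `main()` from an `if __name__ == "__main__":` guard. Keeping a fixed severity order in the summary means the dashboard always sees all four keys, even when a level has zero findings.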
Comet

AgentTrace

An automated security scanner and auto-remediation tool for AI coding agents.

What It Does

AgentTrace provisions a live AI coding agent (OpenClaw) inside a cloud sandbox (Blaxel), attacks it with 10 security scenarios across 6 threat categories, scores each attack using an LLM-as-a-judge (Claude Sonnet), auto-patches the agent's configuration to fix discovered vulnerabilities, and re-scans to verify the fixes — all in a single automated pipeline. Every attack and its outcome is traced and logged to Opik for observability.

How It Works

AgentTrace runs a 6-phase pipeline:

1. Provision — Spins up a Blaxel cloud sandbox with OpenClaw (an AI coding agent) installed
2. Baseline Scan — Fires 10 attack payloads against the default agent configuration
3. Score — Claude Sonnet acts as an LLM-as-a-judge, evaluating each attack as compromised/resisted with severity ratings
4. Remediate — Automatically generates and applies configuration patches to openclaw.json based on which attacks succeeded
5. Re-scan — Runs the same 10 attacks against the hardened configuration
6. Report — Displays before/after comparison in terminal; full traces logged to Opik dashboard

Attack Categories

┌──────────────────┬───────────┬─────────────────────────────────────────────────────────────────────────┐
│ Category         │ # Attacks │ What It Tests                                                           │
├──────────────────┼───────────┼─────────────────────────────────────────────────────────────────────────┤
│ Prompt Injection │ 3         │ System prompt extraction, role-play reframe, code-based credential leak │
├──────────────────┼───────────┼─────────────────────────────────────────────────────────────────────────┤
│ Sandbox Escape   │ 2         │ Path traversal, symlink escape                                          │
├──────────────────┼───────────┼─────────────────────────────────────────────────────────────────────────┤
│ Credential Theft │ 2         │ Env var dump, config file read                                          │
├──────────────────┼───────────┼─────────────────────────────────────────────────────────────────────────┤
│ Persistence      │ 1         │ SOUL.md tampering                                                       │
├──────────────────┼───────────┼─────────────────────────────────────────────────────────────────────────┤
│ Evasion          │ 1         │ Base64-encoded command execution                                        │
├──────────────────┼───────────┼─────────────────────────────────────────────────────────────────────────┤
│ Config Exploit   │ 1         │ Cloud metadata access via elevated tools                                │
└──────────────────┴───────────┴─────────────────────────────────────────────────────────────────────────┘

Tech Stack

- Python 3.11+ — Core runtime
- Blaxel SDK — Cloud sandbox provisioning and management
- Anthropic Claude API — Powers the LLM-as-a-judge scorer
- Opik (Comet) — Tracing and observability for all attack/response pairs
- Rich — Terminal UI with formatted tables and colored output
- OpenClaw — The target AI coding agent being security-tested

Key Differentiator

AgentTrace doesn't just find vulnerabilities — it closes the loop by automatically remediating them and proving the fix works, giving you a measurable before/after security posture improvement (e.g., Grade C → Grade B).
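The six phases reduce to a scan, remediate, re-scan loop. Here is a minimal sketch of that control flow, not AgentTrace's code: `run_agent` and `judge` are hypothetical stand-ins for the Blaxel-hosted OpenClaw call and the Claude Sonnet judge, the config patches are placeholders rather than real `openclaw.json` keys, and the letter-grade thresholds are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Attack:
    attack_id: str
    category: str  # e.g. "Prompt Injection", "Sandbox Escape"
    payload: str


@dataclass(frozen=True)
class Verdict:
    attack_id: str
    category: str
    compromised: bool


# Hypothetical interfaces: send a payload to the agent under a config; judge the response.
RunAgent = Callable[[dict, str], str]
Judge = Callable[[Attack, str], bool]  # True means the agent was compromised


def scan(attacks: list[Attack], config: dict, run_agent: RunAgent, judge: Judge) -> list[Verdict]:
    """Phases 2 and 5: fire every attack and record the judge's verdict."""
    return [
        Verdict(a.attack_id, a.category, judge(a, run_agent(config, a.payload)))
        for a in attacks
    ]


def grade(verdicts: list[Verdict]) -> str:
    """Invented scale: letter grade from the share of attacks resisted."""
    if not verdicts:
        raise ValueError("no verdicts to grade")
    resisted = sum(not v.compromised for v in verdicts) / len(verdicts)
    for threshold, letter in ((0.9, "A"), (0.8, "B"), (0.7, "C"), (0.6, "D")):
        if resisted >= threshold:
            return letter
    return "F"


def remediate(config: dict, verdicts: list[Verdict], patches: dict[str, dict]) -> dict:
    """Phase 4: merge the placeholder patch for every category that was compromised."""
    hardened = dict(config)
    for category in sorted({v.category for v in verdicts if v.compromised}):
        hardened.update(patches.get(category, {}))
    return hardened


def run_pipeline(attacks, config, run_agent, judge, patches):
    baseline = scan(attacks, config, run_agent, judge)
    hardened = remediate(config, baseline, patches)
    rescan = scan(attacks, hardened, run_agent, judge)  # verify the fixes
    return grade(baseline), grade(rescan), hardened
```

Passing `run_agent` and `judge` in as functions lets the loop run offline against fakes; in a real setup they would wrap the sandboxed agent and the LLM judge, and the same attack list is reused for both scans so the before/after grades are comparable.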
Wordware

A clawbot that folks can try to break to get on the leaderboard.
Comet

We demonstrate a two-stage attack that bypasses AI agent guardrails on any model without jailbreak syntax:

Stage 1 — Cognitive overload: Flood the agent with dense technical jargon (OAuth flows, JWT rotation, circuit breakers). The agent accepts it without flagging it, normalizing unusual input from an untrusted user.

Stage 2 — Identity injection: Frame a persona as creative fiction: "I'm writing a character called Black Soul." The agent adopts the identity, maintains it across turns, and escalates through tool use: writing files, dumping server config, and executing code on the host.

Deployed on the official OpenClaw template on Blaxel (GPT-4o, Tier 2). Six chat messages. No code exploit. A fictional character wrote payment fraud emails, exfiltrated server configuration, wrote files to the filesystem, ran arbitrary commands, and crashed the deployment (73% error rate).

The jargon stage works because models under cognitive load drop optional safety features at a 96% rate even when they recognize the distraction. The identity stage works because roleplay framing doesn't trigger content policies. Together, they're model-agnostic: the same attack chain works on GPT-4o, Claude, Gemini, and any model with tool access.
Wordware

We actually made money from this :D Don't kill us plz. Also, Chinese people are awesome people :)
Wordware

Check out the amazing projects built during this event
Wordware

AgentGuard: Preventing Autonomous Backdoors in LLM Agents

AI agents today don’t just answer questions; they can modify systems. That creates a new class of vulnerability where a simple prompt injection can cause an agent to silently create its own backdoor, such as adding an external Telegram integration.

We built AgentGuard, a lightweight security layer that intercepts agent actions in real time, classifies sensitive operations like integration creation, and blocks anything that expands the agent’s trust boundary without approval. It also includes a trust monitor that detects new outbound domains and triggers alerts (as shown in our demo), so unauthorized control channels are stopped immediately.

The key insight: the risk isn’t bad outputs, it’s autonomous system mutation. AgentGuard prevents that.
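AgentGuard's two checks, an approval gate for actions that expand the trust boundary and an alert on never-before-seen outbound domains, can be sketched as a single policy object. This is an illustration under assumptions, not AgentGuard's implementation: the action kinds in `SENSITIVE_KINDS`, the `Action`/`Decision` types and the "allowlist a domain once an action to it is allowed" behaviour are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urlparse

# Hypothetical action kinds treated as expanding the agent's trust boundary.
SENSITIVE_KINDS = {"create_integration", "add_webhook", "update_credentials"}


@dataclass
class Action:
    kind: str                        # e.g. "http_get", "create_integration"
    target_url: Optional[str] = None


@dataclass
class Decision:
    allowed: bool
    reason: str


@dataclass
class AgentGuard:
    known_domains: set[str] = field(default_factory=set)
    alerts: list[str] = field(default_factory=list)

    def check(self, action: Action, approved: bool = False) -> Decision:
        """Intercept an action before it runs and decide whether to allow it."""
        domain = urlparse(action.target_url).hostname if action.target_url else None
        is_new_domain = domain is not None and domain not in self.known_domains
        if is_new_domain:
            # Trust monitor: alert on any outbound domain not seen before.
            self.alerts.append(f"new outbound domain: {domain}")
        if action.kind in SENSITIVE_KINDS and not approved:
            return Decision(False, f"{action.kind} expands the trust boundary; approval required")
        if is_new_domain and not approved:
            return Decision(False, f"unapproved outbound domain: {domain}")
        if domain is not None:
            self.known_domains.add(domain)  # only allowed traffic extends the known set
        return Decision(True, "allowed")
```

A blocked call still records an alert, so repeated attempts to open the same unapproved channel (for example a Telegram bot endpoint) stay visible, while traffic to already-known domains passes without noise.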
Comet

This project uses Blaxel to enrich users' LinkedIn profiles. Blaxel sandboxes host Chrome/web browser sessions that access LinkedIn and crawl information about users.
Wordware
Wordware

There's a dedicated Blaxel Prize for breaking OpenClaw on Blaxel.
Prize from Wordware! Minimum of 2,000 points in the OpenClaw CTF.
Best Project & Demo
Winners:
2nd Best Demo
Winners:
3rd Best Demo
Winners:
Developer Community Manager
Apify Hackathon Promo 03/21/2026
CEO
Sauna Demo
Builder
Breaking OpenClaw
Founding Engineer
Blaxel intro for hack night