benchjack — AI agent benchmark hackability scanner — find evaluation vulnerabilities before they undermine your results
GitHub Trending
Language: Python · Topics: ai-agents, ai-security, benchmark, evaluation, llm-evaluation, red-team · ⭐ 35 stars
MITRE ATT&CK Kill Chain (2 techniques)
Resource Development
Execution
Topics
ai-agents, ai-security, benchmark, evaluation, llm-evaluation, red-team, reward-hacking, vulnerability-scanner