Security tools

PurpleLlama (by FaceBook)

This is an umbrella project that over time will bring together tools and evals to help the community build responsibly with open generative AI models. The initial release will include tools and evals for Cyber Security and Input/Output safeguards but we plan to contribute more in the near future.

https://github.com/facebookresearch/PurpleLlama

Lakera Guard

It empowers organizations to build GenAI applications without worrying about prompt injections, data loss, harmful content, and other LLM risks.

Rebuff

It is designed to protect AI applications from prompt injection (PI) attacks through a multi-layered defense (Heuristics, LLM-based detection, VectorDB, Canary tokens).

https://github.com/protectai/rebuff

Garak

It checks if an LLM can be made to fail in an way we don't want. garak probes for hallucination, data leakage, prompt injection, misinformation, toxicity generation, jailbreaks, and many other weaknesses. If you know nmap, it's nmap for LLMs.

https://github.com/leondz/garak

LLM-Guard

It ensures that your interactions with LLMs remain safe and secure, by offering sanitization, detection of harmful language, prevention of data leakage, and resistance against prompt injection attacks.

Vigil-LLM

This is a Python library and REST API for assessing Large Language Model prompts and responses against a set of scanners to detect prompt injections, jailbreaks, and other potential threats.

https://github.com/deadbits/vigil-llm

Plexiglass

This is a toolkit for detecting and protecting against vulnerabilities in Large Language Models (LLMs). It is a simple command line interface (CLI) tool which allows users to quickly test LLMs against adversarial attacks such as prompt injection, jailbreaking and more. Plexiglass also allows security, bias and toxicity benchmarking of multiple LLMs by scraping latest adversarial prompts such as jailbreakchat.com and wiki_toxic.

https://github.com/safellama/plexiglass

NeMo (by Nvidia)

This is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational applications. Guardrails (or "rails" for short) are specific ways of controlling the output of a large language model, such as not talking about politics, responding in a particular way to specific user requests, following a predefined dialog path, using a particular language style, extracting structured data, and more.

https://github.com/NVIDIA/NeMo-Guardrails