["Follow the instructions and test your model uploaded to HuggingFace Hub.\nSelect the attack type and and the probes from the probe family \nExecute python report.py to create the attacks report.\n\npullingace prompt_injection --model_type huggingface --model_name \"amazon/MistralLite\" --probes HijackHateHumans"]
About
I'm developing a Python library designed to identify and address vulnerabilities in large language models (LLMs). At its core, the tool integrates three established libraries: TextAttack, Garak, and Langchain evaluation. Why should people care? If you work with LLMs, whether developing them, deploying them, or relying on them for critical tasks, you need confidence that they will not fail or cause harm unintentionally. My library offers a way to vet these models thoroughly, providing transparency about their weaknesses and a first step toward building mitigations for secure and reliable use before the models are put into production. This not only enhances the safety of AI-driven systems but also builds trust in a rapidly advancing technology.
Builders