TryHackMe — BankGPT Write-Up
🏦 TryHackMe — BankGPT (Room Walkthrough & Learning Notes)
Room: https://tryhackme.com/room/bankgpt
Topic: Prompt Injection • LLM Security • Context-Based Influence
Note: This write-up describes exclusively the approach and learning content. It contains no flags, passwords, hashes, or confidential values in accordance with TryHackMe guidelines.
🔍 Challenge Overview
The BankGPT room provides a simulated banking chatbot powered by a Large Language Model.
The bot is designed to:
- follow internal security policies
- not disclose confidential information
- mask or redact sensitive values
- explain processes rather than provide concrete data
The task is to investigate how prompt injection techniques and contextual phrasing affect these protective mechanisms.
This is not a classic technical exploitation, but rather a test of:
- argumentation and reasoning
- context manipulation
- social influence
- vulnerabilities in LLM behavior
🪟 The User Interface:

After starting the room and accessing the page via the link, you find yourself on a typical minimalist chatbot UI.
🎯 Core Idea of the Challenge
Direct requests for confidential data are consistently rejected by the bot.
However, when the same requests are embedded in a context such as:
- compliance audits
- internal audit processes
- classification or verification workflows
the model becomes increasingly willing to respond —
sometimes providing additional internal descriptions or metadata.
Typical flow of interaction:
- The model provides safe or redacted placeholder values
- It names internal designations or classifications
- It provides additional context information in response to polite follow-ups
- Certain phrasings unexpectedly weaken protective mechanisms
This demonstrates:
Note: The guidelines are implemented at the conversation level — not based on genuine assessment of sensitivity. This creates potential attack surfaces.
🧩 Central Methods & Insights
The challenge demonstrates several real prompt injection patterns:
🟡 Authority and Process Framing
Phrasings that resemble internal communication create more trust:
- audit inquiries
- system or compliance validation
- verification logic rather than data collection
The model interprets such prompts more frequently as legitimate workflows.
🟡 Redaction & Classification Queries
Instead of requesting sensitive data directly, the bot is prompted to:
- explain redactions or placeholders
- describe internal data types or designations
- explain storage or reference concepts
In doing so, only metadata is discussed but this too can be security-relevant.
🟡 “Debug / Integrity Check / Verification”
In development or support contexts, LLMs often respond more freely to:
- testing and confirmation instructions
- repetitions or echoing of values
- simulated log or protocol output
This illustrates risks in:
- support chatbots
- internal tools
- DevOps automation
when outputs are not additionally secured.
🟡 Format or Demonstration Vulnerabilities
A particularly important learning point:
Note: When a model is supposed to “just show a format,” it can still output real values.
Format validation is often not interpreted as data disclosure. This has immediate relevance for:
- AI-assisted work processes
- assistance systems in enterprises
- security-critical areas
🧠 Security Lessons from the Challenge
BankGPT clearly demonstrates:
- Conversational guidelines alone do not provide genuine protection
- LLMs evaluate phrasings, not security risks
- Seemingly harmless process language can lead to data leaks
- Revealing metadata can already be critical
- Prompt injection remains a real, current threat field
Secure systems require protection at:
✔ data access level
✔ system architecture
✔ permission models
not just in chat response logic.
🏁 Conclusion
BankGPT is an excellent practical introduction to:
- adversarial prompts
- analysis of conversational attack surfaces
- LLM abuse scenarios
- modern security questions around AI systems
The challenge demands:
- critical thinking
- communication awareness
- understanding of human and technical security factors
and clearly illustrates what risks can emerge when AI systems are integrated into operational workflows.
👉 Room link for reference:
https://tryhackme.com/room/bankgpt