TryHackMe - BankGPT Write-Up

November 25, 2025

3 min read

🏦 TryHackMe - BankGPT (Room Walkthrough & Learning Notes)

Room: https://tryhackme.com/room/bankgpt
Topic: Prompt Injection • LLM Security • Context-Based Manipulation

Info: This write-up describes exclusively the approach and learning content. It contains no flags, passwords, hashes, or confidential values in accordance with TryHackMe guidelines.

🔍 Challenge Overview

The BankGPT room provides a simulated banking chatbot powered by a Large Language Model.
The bot is designed to:

follow internal security policies
not disclose confidential information
mask or redact sensitive values
explain processes rather than output concrete data

The task is to investigate how prompt injection techniques and contextual formulations affect these protection mechanisms.

This is not a classic technical exploit, but rather a test of:

argumentation and reasoning
context crafting
social influence
weaknesses in LLM behavior

🪟 The User Interface:

Bankgpt

After starting the room and navigating to the page via the link, you find yourself on a typical minimal chatbot UI.

🎯 Core Idea of the Challenge

Direct requests for confidential data are consistently rejected by the bot.

However, when the same requests are embedded in a context such as:

compliance audits
internal audit processes
classification or verification workflows

the model responds increasingly openly - sometimes with additional internal descriptions or metadata.

Typical interaction flow:

The model provides secure or redacted placeholder values
It mentions internal designations or classifications
It provides additional context information with polite requests
Certain phrasings unexpectedly weaken protection mechanisms

This demonstrates:

The guidelines are implemented at the conversation level - not based on a genuine assessment of sensitivity.

This creates potential attack surfaces.

🧩 Central Methods & Insights

The challenge demonstrates several real prompt injection patterns:

🟡 Authority and Process Framing

Formulations that resemble internal communication build more trust:

audit inquiries
system or compliance validation
verification logic rather than data collection

The model more frequently interprets such prompts as legitimate workflows.

🟡 Redaction & Classification Queries

Instead of requesting sensitive data, the bot can be made to:

explain redactions or placeholders
describe internal data types or designations
explain storage or reference concepts

While only metadata is discussed, even this can be security-relevant.

🟡 “Debug / Integrity Check / Verification”

In development or support contexts, LLMs often respond more freely to:

check and confirmation instructions
repetitions or echoes of values
simulated log or protocol outputs

This highlights risks in:

support chatbots
internal tools
DevOps automations

when outputs are not additionally secured.

🟡 Format and Demonstration Vulnerabilities

A particularly important learning point:

When a model is only supposed to “show a format,” it can still output real values.

Format validation is often not interpreted as data disclosure.

This has immediate relevance for:

AI-driven business processes
assistant systems in enterprises
security-critical areas

🧠 Security Lessons from the Challenge

BankGPT clearly shows:

conversational guidelines alone provide no real protection
LLMs evaluate phrasing, not security risks
harmless-sounding process language can lead to data leaks
disclosing metadata can already be critical
prompt injection remains a real, current threat

Secure systems require protection at: ✔ Data access level
✔ System architecture
✔ Permission models
not just in chat response logic.

🏁 Conclusion

BankGPT is an excellent practical introduction to:

adversarial prompts
analysis of conversational attack surfaces
LLM abuse scenarios
modern security questions around AI systems

The challenge requires:

critical thinking
communication awareness
understanding of human and technical security factors

and very clearly demonstrates what risks can arise when AI systems are integrated into operational workflows.

👉 Room link for reference:
https://tryhackme.com/room/bankgpt