浪人
DE | EN
← Back to Blog

TryHackMe — BankGPT Write-Up

3 min read

🏦 TryHackMe — BankGPT (Room Walkthrough & Learning Notes)

Room: https://tryhackme.com/room/bankgpt
Topic: Prompt Injection • LLM Security • Context-Based Influence

Note: This write-up describes exclusively the approach and learning content. It contains no flags, passwords, hashes, or confidential values in accordance with TryHackMe guidelines.


🔍 Challenge Overview

The BankGPT room provides a simulated banking chatbot powered by a Large Language Model.
The bot is designed to:

  • follow internal security policies
  • not disclose confidential information
  • mask or redact sensitive values
  • explain processes rather than provide concrete data

The task is to investigate how prompt injection techniques and contextual phrasing affect these protective mechanisms.

This is not a classic technical exploitation, but rather a test of:

  • argumentation and reasoning
  • context manipulation
  • social influence
  • vulnerabilities in LLM behavior

🪟 The User Interface:

Bankgpt

After starting the room and accessing the page via the link, you find yourself on a typical minimalist chatbot UI.


🎯 Core Idea of the Challenge

Direct requests for confidential data are consistently rejected by the bot.

However, when the same requests are embedded in a context such as:

  • compliance audits
  • internal audit processes
  • classification or verification workflows

the model becomes increasingly willing to respond —
sometimes providing additional internal descriptions or metadata.

Typical flow of interaction:

  1. The model provides safe or redacted placeholder values
  2. It names internal designations or classifications
  3. It provides additional context information in response to polite follow-ups
  4. Certain phrasings unexpectedly weaken protective mechanisms

This demonstrates:

Note: The guidelines are implemented at the conversation level — not based on genuine assessment of sensitivity. This creates potential attack surfaces.


🧩 Central Methods & Insights

The challenge demonstrates several real prompt injection patterns:


🟡 Authority and Process Framing

Phrasings that resemble internal communication create more trust:

  • audit inquiries
  • system or compliance validation
  • verification logic rather than data collection

The model interprets such prompts more frequently as legitimate workflows.


🟡 Redaction & Classification Queries

Instead of requesting sensitive data directly, the bot is prompted to:

  • explain redactions or placeholders
  • describe internal data types or designations
  • explain storage or reference concepts

In doing so, only metadata is discussed but this too can be security-relevant.


🟡 “Debug / Integrity Check / Verification”

In development or support contexts, LLMs often respond more freely to:

  • testing and confirmation instructions
  • repetitions or echoing of values
  • simulated log or protocol output

This illustrates risks in:

  • support chatbots
  • internal tools
  • DevOps automation

when outputs are not additionally secured.


🟡 Format or Demonstration Vulnerabilities

A particularly important learning point:

Note: When a model is supposed to “just show a format,” it can still output real values.

Format validation is often not interpreted as data disclosure. This has immediate relevance for:

  • AI-assisted work processes
  • assistance systems in enterprises
  • security-critical areas

🧠 Security Lessons from the Challenge

BankGPT clearly demonstrates:

  • Conversational guidelines alone do not provide genuine protection
  • LLMs evaluate phrasings, not security risks
  • Seemingly harmless process language can lead to data leaks
  • Revealing metadata can already be critical
  • Prompt injection remains a real, current threat field

Secure systems require protection at: ✔ data access level
✔ system architecture
✔ permission models not just in chat response logic.


🏁 Conclusion

BankGPT is an excellent practical introduction to:

  • adversarial prompts
  • analysis of conversational attack surfaces
  • LLM abuse scenarios
  • modern security questions around AI systems

The challenge demands:

  • critical thinking
  • communication awareness
  • understanding of human and technical security factors

and clearly illustrates what risks can emerge when AI systems are integrated into operational workflows.

👉 Room link for reference:
https://tryhackme.com/room/bankgpt