
AI GM
Level Design / System Design

AI Game Master – Where Storytelling Meets Artificial Intelligence

The AI Game Master (AIGM) is a capstone project that explores the intersection of artificial intelligence and tabletop role-playing game design. Built to function as a cooperative Game Master, the AIGM dynamically generates storylines, environments, and character interactions — adapting in real time to the choices and creativity of human players.



The goal of this project is not to replace human storytellers, but to empower them. The AI Game Master acts as a creative partner, capable of expanding worlds, improvising encounters, and maintaining narrative consistency across sessions. Using natural language processing and procedural generation, it brings the spontaneity of traditional tabletop experiences into an AI-supported environment.
 

This ongoing development combines my background in RPG system design with my growing passion for human-centered AI. Through usability testing, player feedback, and iterative storytelling experiments, the AIGM project continues to evolve — bridging the gap between imagination and technology, and reimagining what it means to share a story with a machine that listens, learns, and responds.

10/26/2025

Introduction

This month marked a major step in the development of my AI Game Master (AIGM) — a capstone project that merges game design, artificial intelligence, and interactive storytelling. As a lifelong RPG fan and designer, I wanted to create a system that could act as a cooperative Game Master, adapting to players’ creativity rather than replacing it.

The AIGM prototype aims to generate unique stories, dialogue, and encounters in real time, bringing the spontaneity of tabletop sessions to a digital, AI-assisted format.

This post focuses on the early story-generation experiments and the interface prototype I built this month, the tools used to create them, and what I learned along the way.

Feature Development: UI Prototype & Story-Generation Experiments

Overview

This month covered the early design phase of the project: a system that merges artificial intelligence and tabletop role-playing games to create interactive storytelling experiences.

As both a game designer and a lifelong RPG fan, I wanted to build something that helps players and Game Masters alike craft memorable campaigns. The AIGM isn’t meant to replace human creativity; it’s meant to enhance it, acting as a collaborative storyteller that adapts to each group’s imagination.

This month, I focused specifically on designing the user interface (UI) and overall layout prototype for the application. The goal was to create an intuitive environment that supports both casual players and experienced Game Masters during AI-assisted sessions.

Story-Generation Experiments

The month began with early, standalone experiments in how the AI would store and recall narrative context. I prototyped a memory cache that logs player choices so the system can reference past events and keep the story consistent. Test sessions focused on balancing creativity against structure, fine-tuning prompt templates to keep responses relevant and reasonably short.

By the end of these experiments, the prototype could generate story hooks, describe environments, and produce dialogue trees that shifted with player tone and decisions.
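The memory cache itself isn’t shown here, but the core idea is a rolling log of events that gets folded back into each prompt. A minimal sketch, assuming a simple capped log (the class name, cap, and methods are illustrative, not the project’s actual implementation):

    from collections import deque

    class StoryMemory:
        """Rolling log of player choices for the AI to reference."""

        def __init__(self, max_events: int = 50):
            # Older events fall off the front so prompts stay a manageable size.
            self.events = deque(maxlen=max_events)

        def log(self, event: str) -> None:
            self.events.append(event)

        def as_context(self) -> str:
            # Joined into each prompt so new responses stay consistent with past events.
            return "\n".join(self.events)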

 

Prototype Layout

The prototype was developed entirely in Python using the Streamlit framework to quickly mock up an interactive tabletop interface. The layout was divided into three major sections:

  1. Narrative Panel – Displays AI or NPC text, player dialogue, and narrative controls (Approve, Edit, Regenerate).

  2. Encounter Tracker – Manages player and enemy hit points, attack rolls, and dice results.

  3. Game Master Controls – Allows for spawning NPCs, adjusting difficulty, and editing the world log.


The goal was to simulate the feel of a digital tabletop session before introducing real AI logic.


Tools & Technologies Used

  • Python 3.11 – Core programming language.

  • Streamlit – Used for building and displaying the user interface, managing session state, and creating interactive elements like buttons, sliders, and text inputs.

  • Python’s random module – Used to generate placeholder dice rolls and attack results.


Development Process

The prototype was built through an iterative process:

  1. Session State Setup: Initialized persistent session variables for encounter data, chat logs, and player/NPC text.

  2. Interface Columns: Structured the app into three columns (left narrative, middle encounters, right GM tools) to mirror the typical tabletop experience.

  3. Interactivity: Added working buttons for saving, loading, dice rolls, and regenerating NPC dialogue placeholders.

  4. Chat Simulation: Implemented a temporary system where the “DM” gives automated responses to user input — simulating AI conversation flow for later testing.


This structure allowed real-time interaction between components, giving me a live preview of how the eventual AI logic will integrate with user input and session data.
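To make that structure concrete, here is a minimal sketch of the three-column, session-state pattern described above. The key names and the canned reply are mine for illustration; the post doesn’t show the prototype’s actual variables:

    import random

    import streamlit as st

    st.set_page_config(layout="wide")  # three panels need the full browser width

    # 1. Session state setup: values survive Streamlit's rerun-per-interaction.
    for key, default in {"chat_log": [], "last_roll": None}.items():
        st.session_state.setdefault(key, default)

    # 2. Interface columns mirroring the tabletop layout.
    narrative, encounters, gm_tools = st.columns([2, 1, 1])

    with narrative:
        st.header("Narrative")
        msg = st.text_input("Say something to the DM")
        # 4. Chat simulation: a canned reply stands in for the future AI.
        if st.button("Send") and msg:
            st.session_state.chat_log.append(f"You: {msg}")
            st.session_state.chat_log.append("DM: The corridor ahead darkens...")
        for line in st.session_state.chat_log:
            st.write(line)

    with encounters:
        st.header("Encounter Tracker")
        # 3. Interactivity: the click triggers a rerun; session state keeps the result.
        if st.button("Roll d20"):
            st.session_state.last_roll = random.randint(1, 20)
        st.write("Last roll:", st.session_state.last_roll)

    with gm_tools:
        st.header("GM Controls")
        st.text_area("World log", key="world_log")

Launched via streamlit run, every interaction re-executes the script from the top, which is why the setdefault-style initialization matters.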


Challenges Faced

The main challenge this month was ensuring the Streamlit layout stayed responsive and readable across different screen widths. Streamlit’s wide-layout option helped, but some nested columns required careful proportioning.

Additionally, keeping session data consistent between updates was tricky: early versions would reset variables whenever an action occurred. This was solved by centralizing initialization in an _init_state() function that preserves all key values between user actions.
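A minimal sketch of what such a function might look like; only the _init_state() name comes from the project, and the keys are illustrative:

    import streamlit as st

    def _init_state() -> None:
        # setdefault only fills in missing keys, so values written on earlier
        # reruns survive each button click instead of being reset.
        defaults = {
            "chat_log": [],      # narrative panel history
            "encounter": {},     # player and enemy hit points
            "world_log": "",     # GM notes
        }
        for key, value in defaults.items():
            st.session_state.setdefault(key, value)

    _init_state()  # called at the top of the script, before any widgets are created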


On a more personal note, a lengthy illness cost me about a week’s worth of development time on the project during this course.

Retrospective

What Went Right

  • Completed the IRB documentation for upcoming usability testing.

  • Feedback from faculty advisor Oleg provided strong direction on ethics and data handling.

  • Completed a fully interactive Streamlit prototype representing all major game functions.

  • Established a solid session-state system for managing user input, encounters, and dialogue.

  • Successfully simulated dice rolls and player actions using Python’s randomization functions.

 

What Went Wrong

  • Personal life – Getting sick left me unable to work much on the project during the last week, so the prototype is behind where I originally envisioned it at this point.

  • Time allocation between coding, testing, and documentation was uneven during mid-month crunch weeks.

  • Early versions of the layout were too narrow, making columns overlap or clip on smaller screens.

  • Encounter tracker logic briefly broke when session states weren’t properly initialized.

  • Time spent on debugging UI alignment slightly delayed documentation progress.

 

How I’ll Improve Moving Forward

  • Continue refining layout scaling for smaller monitors and potential tablet support.

  • Begin linking this prototype to the upcoming AI backend module next month.

  • Add data persistence so sessions can be saved and loaded between runs.

  • Conduct informal user tests to gauge readability and workflow before formal IRB testing begins.

Closing Thoughts

This month’s focus on UI and user interaction flow created a solid foundation for future development. While the AI system is still forthcoming, the current prototype now provides a working environment for testing usability, pacing, and data structure.

 

Designing with Streamlit proved to be an efficient way to visualize gameplay systems in real time, giving a clearer picture of how both players and Game Masters will interact with the AI Game Master once intelligence and narrative features are added.

Next month’s goal is to integrate early AI responses and begin gathering usability data from tabletop players.

11/22/2025

Introduction

This month marked a major step in the development of my Virtual Dungeon Master (Virtual DM) — a capstone project that blends interactive storytelling, character creation tools, and performance-focused backend engineering. As both a game designer and a longtime RPG player, I’ve always wanted to build a system capable of guiding players through character creation and world interaction in a way that feels responsive, intuitive, and supportive rather than restrictive.

The Virtual DM prototype is designed to assist players and Game Masters by generating structured character data, applying races and features, and preparing prompts for later AI-driven narrative modules. This post focuses on the performance testing conducted this month, the usability evaluations completed, and the technical refinements that improved the overall responsiveness of the tool.


Feature Development: Performance Optimization & Usability Improvements

Overview

This month centered on two major development areas: CPU profiling and user testing. After several rounds of iteration, the Virtual DM interface was functional but showing signs of inefficiency. Additionally, I wanted to understand how real users interpreted the character creation flow, especially the “Apply Race” workflow.

Profiling the system revealed CPU bottlenecks around Streamlit’s rerun behavior, repeated JSON loading, and inefficient prompt construction. Meanwhile, user testing with participants of varying tabletop RPG experience provided insight into usability challenges, including terminology confusion and unclear system feedback.

Together, these efforts helped improve both performance and user experience, paving the way for deeper AI integration later in the project.


Tools & Technologies Used

  • Python 3.12 – Main language for backend logic

  • Streamlit – Framework used to build the UI and manage reruns

  • cProfile – Python CPU profiler used for detailed performance testing

  • SnakeViz – Visualization tool for interpreting profiler output

  • JSON configuration files – Store race, class, and rule data

  • Local command-line environment – Used for profiling and debugging

 

Development Process

The month began with a full CPU analysis using cProfile. The profiler was attached during a typical run of the application, including selecting a race, applying racial bonuses, navigating to Step 2, and triggering prompt generation. This produced a 100-second profile containing over seven million function calls, which was then visualized using SnakeViz.
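The exact profiling harness isn’t reproduced here, but the general shape with cProfile and SnakeViz is standard. A sketch, assuming the workflow can be exercised from a driver function (drive_session() is hypothetical):

    import cProfile
    import pstats

    def drive_session() -> None:
        # Hypothetical driver: select a race, apply bonuses, open Step 2,
        # and trigger prompt generation.
        ...

    profiler = cProfile.Profile()
    profiler.enable()
    drive_session()
    profiler.disable()

    profiler.dump_stats("virtual_dm.prof")  # then visualize: snakeviz virtual_dm.prof
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)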

The results highlighted that:

  • Streamlit’s event loop consumed the largest amount of CPU time

  • JSON files were being reloaded on every rerun

  • String concatenation during prompt building caused unnecessary overhead

  • Some heavy logic was still running at the global scope

Based on this, several key improvements were made (the first two are sketched after this list):

  • Implemented @st.cache_data to store static race and class data

  • Rewrote prompt building using a list-join approach instead of repeated concatenation

  • Moved expensive logic into functions so Streamlit would not repeatedly execute them

  • Reduced logging noise to eliminate unnecessary console work
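A sketch of the first two changes; the file paths and data shapes are hypothetical, since the project’s loaders aren’t shown here:

    import json

    import streamlit as st

    @st.cache_data  # loaded once; later reruns reuse the cached result
    def load_rules(path: str) -> dict:
        with open(path, encoding="utf-8") as f:
            return json.load(f)

    races = load_rules("data/races.json")      # hypothetical paths
    classes = load_rules("data/classes.json")

    def build_prompt(sections: list[str]) -> str:
        # One join at the end instead of repeated "+=" concatenation in a loop.
        return "\n\n".join(sections)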

Parallel to these improvements, I also conducted usability testing with four participants. Each user completed a Talk-Aloud session where they selected a race, clicked Apply Race, and moved into Step 2 while narrating their thoughts.

A structured questionnaire was created that included:

  • Demographic questions

  • Five Likert-scale usability questions

  • Open-ended follow-up questions

  • A script instructing users how to perform the Talk-Aloud

This process highlighted areas where user expectations did not match interface behavior, especially during page reruns.

 


Challenges Faced

One of the main technical challenges this month was Streamlit’s rerun model. Whenever a widget changes, the entire script re-executes (illustrated after this list), which caused:

  • Repeated data loading

  • Flickering on update

  • Confusion for inexperienced users who believed the app froze
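For readers unfamiliar with Streamlit, a tiny illustration (not project code) of the rerun model: the whole script re-executes from the top on every interaction.

    import streamlit as st

    st.write("script top")  # re-printed on every rerun

    # Clicking the button reruns the entire script; without caching, any file
    # loads placed above would repeat on every single interaction.
    if st.button("Apply Race"):
        st.write("race applied")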


Caching mitigated some issues, but Streamlit’s event loop remains a source of complexity moving forward.

From a usability standpoint, several participants struggled with terminology such as “ability bonuses” and the difference between selecting and applying a race. Users unfamiliar with tabletop RPG mechanics needed more guidance than expected, while experienced users requested clearer confirmation feedback.

Maintaining consistent session data between reruns was another area requiring attention. Several early versions duplicated work or failed to persist state. This was addressed by consolidating initialization within a dedicated setup function.

Retrospective

What Went Right

  • Completed full CPU profiling to identify performance bottlenecks.

  • Implemented caching that significantly reduced redundant operations.

  • Completed Talk-Aloud usability testing with four participants.

  • Created the first formal questionnaire for structured user evaluation.

  • Improved prompt generation logic and removed unnecessary overhead.

  • Gained actionable insight into how new vs. experienced users interpret the UI.

 

What Went Wrong

  • Some portions of the interface remained unclear to beginners, especially the Apply Race workflow.

  • Streamlit rerun behavior caused visible flickering that confused certain participants.

  • Time spent debugging visualization in SnakeViz slowed early analysis.

  • Maintaining consistent session state required several rewrites.

  • Balancing profiling, user testing, and documentation compressed development time.

 

How I’ll Improve Moving Forward

  • Add visual confirmation messages whenever Apply Race succeeds.

  • Implement tooltips to explain terminology like modifiers and ability bonuses.

  • Continue refining caching strategies to minimize reruns and refresh delays.

  • Begin polishing the Step-2 interface to better guide new users.

  • Expand testing to more users before integrating AI logic.

  • Start linking the optimized character creation tool to the upcoming narrative engine.


Closing Thoughts

This month’s focus on profiling and user testing provided a clearer understanding of how Virtual DM performs under real use. The insights gathered from both technical analysis and human evaluation shaped meaningful improvements to the system’s responsiveness, clarity, and reliability.

By refining the character creation workflow now, the project builds a strong foundation for the AI-driven features planned for future iterations. Next month’s goal is to continue polishing the interface, integrate early narrative-generation components, and deepen usability testing with a broader range of tabletop players.
