University Research Project — Deep Dive

Charlie

An AI-powered virtual patient for medical education. Students practice clinical assessments — like the Glasgow Coma Scale — on a fully embodied 3D avatar that responds with realistic speech, animation, and medical behavior. Built to run on VR, desktop, and browser.

My Contribution

Built from Scratch

Charlie is a university research project led by Prof. Dr. med. Daniela Becker (concept & medical supervision) and Prof. Armin Grasnick (technical direction). I designed and built the entire technical implementation — from 3D character to animation system to AI pipeline.

What I Built

The Problem

Medical Training Doesn't Scale

Teaching clinical assessment requires real patients or expensive SimMan mannequins (€500k+). Students get limited practice time, no repeatability, and no immediate feedback. Charlie replaces this with an AI patient available 24/7, from any browser, that behaves like a real patient — confused speech, pain responses, involuntary motor reactions.

Capabilities

Three Modes

Charlie isn't just a chatbot — it's a multi-purpose educational AI with distinct operational modes.

Medical Simulation

GCS/OSCE training with a virtual patient. Students examine, diagnose, and receive scored feedback.

LIVE

Intelligent Tutor

RAG-enhanced Q&A tied to course materials. Answers adapt to student level and curriculum context.

PLANNED

Digital Presence

Live meeting attendance, automated note-taking, and session summaries for remote learning.

PLANNED

Architecture

End-to-End AI Pipeline

From spoken word to animated response in under 500ms. The entire pipeline can run locally — zero cloud dependencies when needed.

Voice Input (16 kHz) → Whisper STT → NLU Intent Parser → GCS Logic / RAG → LLM Response → Piper TTS → Animation + Audio
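A rough sketch of how the stages chain together. The function names and signatures here are illustrative stand-ins, not Charlie's actual interfaces, and each stub returns a placeholder so the flow is runnable end to end:

```python
import time

# Hypothetical stage functions standing in for the real services
# (Whisper, the NLU classifier, GCS logic, Piper). Each stub returns
# a fixed placeholder so the pipeline shape is visible and testable.
def whisper_stt(audio: bytes) -> str:             # speech -> text
    return "open your eyes"

def nlu_parse(text: str) -> dict:                 # text -> intent
    return {"intent": "command", "slots": {"action": "open_eyes"}}

def gcs_logic(intent: dict, state: dict) -> str:  # intent -> reply text
    return "opening eyes" if state["gcs_eye"] >= 3 else ""

def piper_tts(reply: str) -> bytes:               # text -> audio
    return reply.encode()

def run_pipeline(audio: bytes, patient_state: dict) -> bytes:
    """Chain the stages in the order shown in the diagram above."""
    start = time.monotonic()
    text = whisper_stt(audio)
    intent = nlu_parse(text)
    reply = gcs_logic(intent, patient_state)
    speech = piper_tts(reply)
    # The 500 ms end-to-end budget is shared by all stages.
    assert time.monotonic() - start < 0.5
    return speech
```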

Speech Recognition

Whisper.cpp for offline mode (12+ concurrent sessions, zero API cost) or Whisper API for cloud. German and English support.

Natural Language Understanding

Custom semantic classifier parsing complex medical commands. Handles multi-intent inputs like "lift left arm and tell me your name" as two separate actions.
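A minimal illustration of the multi-intent idea, assuming a simple regex pattern table for brevity; the real classifier is semantic and its phrase database is far larger:

```python
import re

# Hypothetical pattern table; the production system reportedly holds
# 225+ phrases and also covers German.
INTENT_PATTERNS = {
    "motor_command": re.compile(r"\b(lift|raise|move)\b.*\b(arm|leg|hand)\b"),
    "verbal_check":  re.compile(r"\b(tell me|say|what is)\b.*\b(name|year|place)\b"),
}

def parse_intents(utterance: str) -> list[str]:
    """Split a compound command on conjunctions, then classify each clause."""
    clauses = re.split(r"\band\b|,|\bthen\b", utterance.lower())
    intents = []
    for clause in clauses:
        for intent, pattern in INTENT_PATTERNS.items():
            if pattern.search(clause):
                intents.append(intent)
                break
    return intents
```

With this table, "lift left arm and tell me your name" splits into two clauses and yields two separate actions, which is the behavior described above.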

Response Generation

Ollama (local) or GPT-4o-mini (cloud) for contextual responses. RAG retrieval from ChromaDB with embedded course materials.
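Retrieval itself goes through ChromaDB and generation through Ollama or GPT-4o-mini; the sketch below shows only how retrieved chunks might be assembled into an LLM prompt. The function name and prompt wording are assumptions, not the production code:

```python
def build_rag_prompt(question: str, chunks: list[str],
                     student_level: str = "novice") -> str:
    """Assemble retrieved course-material chunks into an LLM prompt.

    In the real pipeline the chunks would come back from a ChromaDB
    similarity query over embedded course materials; here they are
    passed in directly so the assembly step stands alone.
    """
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        f"You are a medical tutor answering a {student_level} student.\n"
        f"Use only the course material below.\n\n"
        f"Course material:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```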

Text-to-Speech

Piper TTS for local synthesis in 50+ languages. ElevenLabs as premium alternative. Natural voice output synchronized with avatar lip-sync.

Animation System

3-layer animator: base body + action responses + facial blend shapes. Pain responses override all states. Eye behavior follows GCS protocol.
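The layering logic reduced to an engine-agnostic Python sketch; the shipped animator lives in the game engine, and the layer and clip names here are made up. Facial blend shapes would form a parallel layer driving the face rather than the body pose:

```python
from dataclasses import dataclass, field

# Layer priorities mirror the description above: pain responses
# override everything, action responses override the idle base layer.
PRIORITY = {"pain": 3, "action": 2, "base": 1}

@dataclass
class AnimatorState:
    active: dict[str, str] = field(default_factory=lambda: {"base": "idle"})

    def play(self, layer: str, clip: str) -> None:
        self.active[layer] = clip

    def stop(self, layer: str) -> None:
        if layer != "base":               # the base layer always runs
            self.active.pop(layer, None)

    def current_clip(self) -> str:
        """The highest-priority active layer wins the body pose."""
        layer = max(self.active, key=PRIORITY.__getitem__)
        return self.active[layer]
```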

Multi-Platform

Single codebase targeting Desktop, Quest 3 VR (90 FPS, <20ms latency), PCVR, and Web via Pixel Streaming.

Medical Accuracy

Glasgow Coma Scale Simulation

The GCS is the standard for assessing consciousness in clinical settings. Charlie implements all 90 possible combinations with medically accurate responses — from fully oriented (GCS 15) to unresponsive (GCS 3).

Component | Range | What Charlie Does
Eye Response (E) | E1–E4 | No opening → opens to pain → opens to voice → spontaneous opening. Blend-shape animation with smooth interpolation.
Verbal Response (V) | V1–V5 | None → incomprehensible sounds → inappropriate words → confused conversation → fully oriented conversation.
Motor Response (M) | M1–M6 | No movement → abnormal extension → abnormal flexion → withdrawal from pain → localizes pain → follows commands. 19 distinct animations.

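The scoring side of the protocol is simple enough to sketch. The component ranges match the table above; the mild/moderate/severe banding is the common clinical convention, though whether Charlie's feedback uses exactly this banding is an assumption:

```python
# Valid score range per GCS component, as in the table above.
RANGES = {"E": (1, 4), "V": (1, 5), "M": (1, 6)}

def gcs_total(e: int, v: int, m: int) -> int:
    """Sum the three component scores after range-checking them."""
    for name, value in zip("EVM", (e, v, m)):
        lo, hi = RANGES[name]
        if not lo <= value <= hi:
            raise ValueError(f"{name}{value} outside {name}{lo}-{name}{hi}")
    return e + v + m

def severity(total: int) -> str:
    """Common clinical banding of the total score (3-15)."""
    if total >= 13:
        return "mild"
    if total >= 9:
        return "moderate"
    return "severe"
```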
Under the Hood

Tech Stack

Game Engine

Unity (v1.0 shipped) → Unreal Engine 5.4 (v2.0 in development). MetaHuman avatar with FACS facial animation.

Backend

Python 3.11 + FastAPI. PostgreSQL for session data, Redis for caching, Docker for deployment.

AI Stack

Whisper (STT), Custom NLU (C++), ChromaDB + Sentence Transformers (RAG), Ollama/GPT-4o (LLM), Piper (TTS).

VR Platform

OpenXR for hardware abstraction. Quest 3 native APK, SteamVR, Pixel Streaming for browser access.

Unreal Engine 5, Unity, C++, C#, Python, FastAPI, Whisper, Ollama, ChromaDB, Piper TTS, OpenXR, Quest 3, PostgreSQL, Docker

By the Numbers

Key Metrics

90 GCS Combinations

Every clinically valid combination of Eye, Verbal, and Motor responses — implemented and medically verified.

225+ NLU Patterns

Growing phrase database for medical command recognition. Self-learning: admins can add new patterns at runtime.

4 Platforms

Desktop, Quest 3 VR, PCVR, and Web browser — from a single codebase with platform-specific optimizations.

500ms E2E Latency

From spoken question to animated response. Fast enough for natural conversation flow in VR.

50+ Languages

Piper TTS supports global deployment — Arabic, Chinese, Spanish, and more. NLU currently covers German and English.

Gamescom 2026

UE5 version targeting a live demo at Gamescom, August 2026. 12 two-week sprints from concept to show floor.

Visit vrcharlie.app →