Unified Experience Partner
An event-driven AI assistant that combines long-term memory, dynamic emotions, and personalized interaction. Its modular architecture brings together speech recognition, NLP, an LLM, voice synthesis, and custom movement and animation for a complete multimodal experience.
Project Introduction
U.E.P is an event-driven intelligent assistant system designed to create an AI companion with long-term memory, dynamic emotions, and personalized interactions. The project adopts a modular architecture, integrating speech recognition, natural language processing, large language models, text-to-speech synthesis, and visual animation to provide a complete multimodal interaction experience.
Core Philosophy
- Long-term Memory Capability: Addresses the short-lived memory of traditional AI assistants, which cannot accumulate a history of user interactions
- Flexible Workflow System: Recognizes user intent through the LLM and executes the corresponding multi-step task flow
- Modularity & Extensibility: Adopts an event-driven architecture so that modules can be developed, tested, and replaced independently
My Responsibilities
As the core developer of the project, I am responsible for:
System Architecture Design
- Designed three-layer event-driven architecture (Input Layer → Processing Layer → Output Layer)
- Implemented central Event Bus coordinating 9 functional modules
- Established three-tier session management system (Global Session / Chat Session / Working Session)
Core Feature Implementation
- Memory System: FAISS-based vector database supporting identity isolation and long-term memory retrieval
- Workflow Engine: Integrated MCP protocol, implementing 21 automated workflows (document generation, schedule management, knowledge retrieval, etc.)
- Frontend Integration: Developed Frontend Bridge and three-window toolset (Live2D animation, subtitles, dialogue bubbles)
- Special State System: Implemented dynamic emotion system (MISCHIEF mode, SLEEP state, etc.)
Testing & Documentation
- Established a comprehensive test suite: 476 test cases with 85% line coverage
- Authored System Design Document (SDD), Project Execution Plan (PEP), and Test Reports (TR-00 through TR-07)
Core Features
1. Event-Driven Modular Architecture
- 9 Functional Modules: Speech Input (STT), Natural Language Processing (NLP), Memory Management (MEM), Language Model (LLM), System Control (SYS), Text-to-Speech (TTS), User Interface (UI), Animation Control (ANI), Motion Execution (MOV)
- Event Bus Hub: Coordinates inter-module communication through 20+ event types, achieving loosely coupled design
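To make the loose coupling concrete, here is a minimal sketch of a publish/subscribe hub. It is not the project's actual Event Bus: the class name, the method names, and the event name "stt.text_recognized" are all assumptions chosen for illustration, and delivery here is synchronous for simplicity.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class EventBus:
    """Minimal synchronous publish/subscribe hub (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> int:
        """Deliver payload to every subscriber; return how many were called."""
        handlers = list(self._handlers.get(event_type, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


bus = EventBus()
received: List[str] = []

# Hypothetical event name: a speech-input module announces recognized text,
# and a consumer handles it without importing the producer.
bus.subscribe("stt.text_recognized", lambda p: received.append(p["text"]))
delivered = bus.publish("stt.text_recognized", {"text": "hello"})
```

The producer only knows the event name and payload shape, so either side can be swapped out or tested in isolation.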
2. Identity-Isolated Long-Term Memory System
- Each user owns an independent FAISS vector index
- Supports semantic retrieval, time-range filtering, and memory management
- MCP tools let the LLM query and create memories on its own initiative
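The idea of identity isolation can be sketched as one store per user, so that a search for one user never touches another user's data. To stay dependency-free, the sketch below swaps the FAISS index for a brute-force cosine-similarity scan in pure Python and assumes embeddings are already computed. The names UserMemory and MemoryManager are hypothetical, not taken from the project.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class UserMemory:
    """One store per user; stands in for a per-user FAISS index."""
    entries: List[Tuple[List[float], str, float]] = field(default_factory=list)

    def add(self, vector: List[float], text: str, timestamp: float) -> None:
        self.entries.append((vector, text, timestamp))

    def search(self, query: List[float], k: int = 3,
               since: Optional[float] = None,
               until: Optional[float] = None) -> List[str]:
        # Apply the time-range filter first, then rank by similarity.
        candidates = [
            (cosine(query, vec), text)
            for vec, text, ts in self.entries
            if (since is None or ts >= since) and (until is None or ts <= until)
        ]
        candidates.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in candidates[:k]]


class MemoryManager:
    """Maps each user id to its own isolated store."""

    def __init__(self) -> None:
        self._stores: Dict[str, UserMemory] = {}

    def store_for(self, user_id: str) -> UserMemory:
        return self._stores.setdefault(user_id, UserMemory())


manager = MemoryManager()
manager.store_for("alice").add([1.0, 0.0], "likes tea", timestamp=100.0)
manager.store_for("alice").add([0.0, 1.0], "has a cat", timestamp=200.0)
manager.store_for("bob").add([1.0, 0.0], "likes coffee", timestamp=150.0)

alice_top = manager.store_for("alice").search([1.0, 0.0], k=1)
bob_all = manager.store_for("bob").search([1.0, 0.0], k=5)
alice_recent = manager.store_for("alice").search([1.0, 0.0], since=150.0)
```

With a real FAISS index the per-user store would hold an index object instead of a list, but the isolation boundary (one store per user id) would stay the same.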
3. LLM-Driven Workflow Automation
- 21 Workflows: Schedule management, note-taking, document generation, knowledge retrieval, reminder settings, email drafts, etc.
- NLP Intent Recognition: Automatically analyzes user input and triggers corresponding workflows
- MCP Protocol Integration: Standardized LLM tool invocation interface
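A rough sketch of what a standardized tool invocation interface can look like: system functions are registered under a name, and the LLM's structured request (a tool name plus an arguments object, loosely modeled on MCP tool calls) is dispatched to the matching function. This does not use an MCP SDK. ToolRegistry and the create_note tool are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict


class ToolRegistry:
    """Maps tool names to callables and dispatches structured calls."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            self._tools[name] = fn
            return fn
        return decorator

    def call(self, request: dict) -> dict:
        name = request.get("name")
        arguments = request.get("arguments", {})
        tool = self._tools.get(name)
        if tool is None:
            return {"ok": False, "error": f"unknown tool: {name}"}
        try:
            return {"ok": True, "result": tool(**arguments)}
        except TypeError as exc:
            # Missing or unexpected arguments surface as TypeError.
            return {"ok": False, "error": str(exc)}


registry = ToolRegistry()


@registry.register("create_note")  # hypothetical workflow entry point
def create_note(title: str, body: str) -> dict:
    return {"title": title, "length": len(body)}


good = registry.call({"name": "create_note",
                      "arguments": {"title": "groceries", "body": "milk"}})
unknown = registry.call({"name": "send_fax", "arguments": {}})
incomplete = registry.call({"name": "create_note",
                            "arguments": {"title": "groceries"}})
```

Returning errors as data rather than raising lets the caller hand the failure back to the LLM as a tool result.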
4. Three-Tier Session Management
- Global Session (GS): System-level settings and global state
- Chat Session (CS): Context and participant information for a single conversation
- Working Session (WS): Temporary state during workflow execution
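The three tiers can be pictured as nested lifetimes: a Global Session outlives its Chat Sessions, and a Working Session exists only while a workflow runs. The dataclasses below are a sketch under that assumption. The field and method names are invented for illustration and are not the project's actual session classes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class WorkingSession:
    """Temporary state for one workflow run."""
    workflow: str
    state: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ChatSession:
    """Context and participants for a single conversation."""
    participants: List[str]
    history: List[str] = field(default_factory=list)
    working: Optional[WorkingSession] = None

    def start_workflow(self, name: str) -> WorkingSession:
        self.working = WorkingSession(name)
        return self.working

    def finish_workflow(self) -> None:
        # Discard temporary state; the conversation history is kept.
        self.working = None


@dataclass
class GlobalSession:
    """System-level settings and the set of open conversations."""
    settings: Dict[str, Any] = field(default_factory=dict)
    chats: Dict[str, ChatSession] = field(default_factory=dict)

    def open_chat(self, chat_id: str, participants: List[str]) -> ChatSession:
        chat = ChatSession(participants)
        self.chats[chat_id] = chat
        return chat


gs = GlobalSession(settings={"language": "en"})
chat = gs.open_chat("chat-1", ["user"])
chat.history.append("user: draft an email")
ws = chat.start_workflow("email_draft")
ws.state["recipient"] = "team"
chat.finish_workflow()
```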
5. Dynamic Emotion & Personality System
- Status Manager: Tracks the current interaction state (IDLE, LISTENING, THINKING, SPEAKING, etc.)
- MISCHIEF Mode: Playful interactions triggered with low probability
- SLEEP State: Draggable wake-up animation interaction
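A low-probability trigger of this kind can be sketched as a small state machine that rolls a random number on each idle tick. The 5% default probability, the on_idle_tick hook, and the transition rules are assumptions for illustration. The source only states that MISCHIEF mode triggers with low probability.

```python
import random
from enum import Enum, auto
from typing import Optional


class Status(Enum):
    IDLE = auto()
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()
    MISCHIEF = auto()
    SLEEP = auto()


class StatusManager:
    def __init__(self, mischief_probability: float = 0.05,
                 rng: Optional[random.Random] = None) -> None:
        self.status = Status.IDLE
        self._p = mischief_probability
        self._rng = rng or random.Random()

    def on_idle_tick(self) -> Status:
        """Called periodically; occasionally switches IDLE to MISCHIEF."""
        if self.status is Status.IDLE and self._rng.random() < self._p:
            self.status = Status.MISCHIEF
        return self.status

    def sleep(self) -> None:
        self.status = Status.SLEEP

    def wake(self) -> None:
        # Only a sleeping assistant can be woken up.
        if self.status is Status.SLEEP:
            self.status = Status.IDLE


never = StatusManager(mischief_probability=0.0)
always = StatusManager(mischief_probability=1.0)
never_result = [never.on_idle_tick() for _ in range(100)]
always_result = always.on_idle_tick()

sleeper = StatusManager()
sleeper.sleep()
sleeper.wake()
```

Injecting the random generator makes the behavior reproducible in tests, where a seeded random.Random can be passed in.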
6. Complete Frontend Integration
- Frontend Bridge: Unified management of frontend communication (Live2D animation, subtitle display, dialogue bubbles)
- Three-Window Toolset: Main window (Live2D), subtitle window, dialogue window
- Animation Event System: Frontend reports events after animation completion, ensuring process synchronization
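One way to synchronize on animation completion is to have the backend wait on a per-animation signal that the frontend's completion report sets. The sketch below uses threading.Event for that purpose. AnimationTracker and its methods are hypothetical, and the actual project may rely on Qt signals or Event Bus events instead.

```python
import threading
from typing import Dict


class AnimationTracker:
    """Lets a caller block until the frontend reports an animation finished."""

    def __init__(self) -> None:
        self._pending: Dict[str, threading.Event] = {}
        self._lock = threading.Lock()

    def start(self, animation_id: str) -> None:
        with self._lock:
            self._pending[animation_id] = threading.Event()

    def report_finished(self, animation_id: str) -> None:
        """Called when the frontend reports completion."""
        with self._lock:
            event = self._pending.get(animation_id)
        if event is not None:
            event.set()

    def wait(self, animation_id: str, timeout: float = 5.0) -> bool:
        """Return True if the animation finished before the timeout."""
        with self._lock:
            event = self._pending.get(animation_id)
        if event is None:
            return True  # nothing is pending under this id
        finished = event.wait(timeout)
        with self._lock:
            self._pending.pop(animation_id, None)
        return finished


tracker = AnimationTracker()
tracker.start("wave")
# Simulate the frontend reporting completion shortly afterwards.
threading.Timer(0.05, tracker.report_finished, args=("wave",)).start()
wave_done = tracker.wait("wave", timeout=2.0)

tracker.start("never_reported")
timed_out = tracker.wait("never_reported", timeout=0.01)
```

The timeout keeps the pipeline from stalling forever if the frontend never reports back.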
Technologies Used
AI / ML Technologies
- Whisper: OpenAI speech recognition model
- Google Gemini: Large language model (supporting 2M token context caching)
- FAISS: Vector similarity search library from Facebook AI Research (Meta)
- Edge-TTS: Microsoft text-to-speech service
System Architecture
- Python 3.10: Core development language
- PyQt6: System loop and event integration
- Event Bus Pattern: Central event coordination
- MCP: Model Context Protocol (standard for LLM tool invocation)
- YAML Configuration Management: Modular configuration files
Development Tools
- pytest: Unit testing and integration testing
- GitHub Actions: CI/CD automation
- logging: Hierarchical logging system (debug/runtime/error)
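As a rough illustration of a tiered logging setup, the sketch below attaches three handlers with different minimum levels to one logger. It assumes "debug/runtime/error" means three separate sinks. The logger name "uep.demo" and the build_logger helper are hypothetical, and in-memory streams are used here where a real setup would more likely write to files.

```python
import io
import logging
from typing import TextIO


def build_logger(name: str, debug_stream: TextIO,
                 runtime_stream: TextIO, error_stream: TextIO) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()
    sinks = (
        (debug_stream, logging.DEBUG),    # everything
        (runtime_stream, logging.INFO),   # INFO and above
        (error_stream, logging.ERROR),    # ERROR and above
    )
    for stream, level in sinks:
        handler = logging.StreamHandler(stream)
        handler.setLevel(level)
        handler.setFormatter(
            logging.Formatter("%(levelname)s %(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger


debug_out, runtime_out, error_out = io.StringIO(), io.StringIO(), io.StringIO()
log = build_logger("uep.demo", debug_out, runtime_out, error_out)
log.debug("loading module")
log.info("module ready")
log.error("module failed")
```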
Project Status
Current Version: v0.9.4-stable
- Core Module Stability: Event Bus, Sessions, Controller, Frontend Bridge are all stable
Testing Status
- Total Test Cases: 476
- Pass Rate: 81.9% (390 passed / 476 total)
- Test Coverage: 85% line coverage, 80% branch coverage
- Known Issues: 28 (P0=1, P1=6, P2=18, P3=3)
Development Challenges & Achievements
1. Architecture Refactoring: From Direct Calls to Event-Driven
Challenge: The original implementation (v1.0) used direct function calls, resulting in tightly coupled modules that were difficult to maintain and extend.
Solution: Refactored the system to an event-driven architecture in which all modules communicate through the Event Bus, yielding a loosely coupled design.
Achievements:
- Modules can be developed and tested independently
- New features can generally be added without modifying existing modules
- System stability significantly improved
2. MCP Protocol Integration
Challenge: How can the LLM actively invoke system functions (memory queries, workflow execution, etc.)?
Solution: Integrated the Model Context Protocol, wrapping system functions as standardized tools so that the LLM can execute operations through structured calls.
Achievements:
- Enabled LLM to actively query memories
- Supported complex workflow automation (e.g., generating email drafts with attachments)
- Established an extensible tool ecosystem
3. Identity-Isolated Memory System
Challenge: How can memory isolation and security be ensured in a multi-user environment?
Solution: Created an independent FAISS index for each user and tracked memory flow through a Memory Token mechanism.
Achievements:
- Completely isolated user memories
- Efficient semantic retrieval (FAISS vector search)
- Traceable memory sources (helping to reduce hallucination)
4. Performance Optimization
Challenge: System cold start time was excessively long (initially 120s), affecting user experience.
Solution:
- Lazy loading of non-essential modules
- Gemini context caching (reducing token processing time)
- Removed redundant initialization processes
Results: Cold start time reduced from 120s to 47.6s (about a 60% reduction)
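Lazy loading of non-essential modules can be sketched with a small proxy that defers the import until an attribute is first used. LazyModule is a hypothetical helper, and the standard-library json module stands in for a heavy dependency such as a speech or TTS backend.

```python
import importlib
from types import ModuleType
from typing import Any, Optional


class LazyModule:
    """Defers importing a module until one of its attributes is accessed."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: Optional[ModuleType] = None

    @property
    def loaded(self) -> bool:
        return self._module is not None

    def __getattr__(self, attr: str) -> Any:
        # Only reached for names not found on the proxy itself,
        # i.e. attributes that belong to the wrapped module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


lazy_json = LazyModule("json")    # nothing is imported yet
loaded_before = lazy_json.loaded
encoded = lazy_json.dumps({"a": 1})  # first use triggers the import
loaded_after = lazy_json.loaded
```

The import cost is paid on first use instead of at startup, which is the trade-off behind a shorter cold start.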
5. Technical Debt Management
Challenge: ConditionalStep workflow steps run one after another without waiting for the preceding step to finish, which leads to race conditions.
Current Status: Documented in technical debt list, planned for refactoring in v0.10.0.
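One possible direction for the planned refactoring, sketched here with asyncio, is to model each step as a coroutine and await it before evaluating the next step's condition. This is not the project's ConditionalStep implementation, whose interface is not shown in this document. The functions conditional_step, run_workflow, fetch_document, and attach_document are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Context = Dict[str, object]
Step = Callable[[Context], Awaitable[None]]


def conditional_step(predicate: Callable[[Context], bool],
                     then_step: Step) -> Step:
    """Wrap a step so it runs only if the predicate holds at that moment."""
    async def run(ctx: Context) -> None:
        if predicate(ctx):
            await then_step(ctx)
    return run


async def run_workflow(steps: List[Step], ctx: Context) -> Context:
    for step in steps:
        # The next step does not start until this one has finished,
        # so its condition sees the completed results.
        await step(ctx)
    return ctx


async def fetch_document(ctx: Context) -> None:
    await asyncio.sleep(0.01)  # simulated slow I/O
    ctx["document"] = "draft"


async def attach_document(ctx: Context) -> None:
    ctx["attached"] = ctx["document"]


result = asyncio.run(run_workflow(
    [fetch_document,
     conditional_step(lambda c: "document" in c, attach_document)],
    {},
))
```

If fetch_document were started without being awaited, the condition would be checked before "document" existed and the attachment step would be skipped, which is the kind of race the challenge describes.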
Future Roadmap
Short-term Goals
- Reduce TTS module loading time (currently 20 to 24s)
- Increase unit test pass rate to 90%
- Complete error handling mechanisms for workflow engine
Mid-term Goals
- Refactor workflow engine (resolve ConditionalStep architecture debt)
- Expand the workflow library to 30+ workflow types
- Increase test coverage to 90%
Long-term Goals
- Support multi-user concurrent sessions
- Cloud memory synchronization functionality
- Complete plugin ecosystem
Project Highlights
Technical Innovation
- ✅ Event-driven architecture achieving complete module decoupling
- ✅ MCP protocol integration enabling LLM to actively invoke system functions
- ✅ Identity-isolated vector memory system
Development Achievements
- ✅ 476 test cases with 85% coverage
- ✅ 21 automated workflows
- ✅ Complete frontend integration (Live2D animation, subtitles, dialogue bubbles)
Project Management
- ✅ Complete documentation system (SDD, PEP, TR test reports)
- ✅ Iterative development (Phase 1 → Phase 2 → Phase 3)
- ✅ CI/CD automated testing
Long-term Value
- ✅ Extensible modular design, easy to add new features
- ✅ Standardized interfaces (Event Bus, MCP), supporting third-party integration
- ✅ Comprehensive testing and monitoring system ensuring system stability
Programming Language: Python 3.10
Codebase Size: Approximately 15,000+ lines (excluding tests)
Test Coverage: 85% line coverage, 80% branch coverage
Test Pass Rate: 81.9% (390/476 cases)