# majiang-rl

A minimal Mahjong (Guobiao) simulation environment and reinforcement learning scaffold built on gymnasium.

## Features

- 4-player Guobiao tile set (144 tiles, including flowers)
- Draw/discard turn loop with flower replacement
- Basic calls: win (ron/tsumo), pong, chi, kong
- Win checking for standard hands, seven pairs, and thirteen orphans
- Gymnasium-style environment API with action masks (type/discard/pong/kong/chi)
- Simple RL loop and random agent

## Limitations

- No scoring or 8-fan enforcement
- NPC players use simple greedy claims and random discards
- No detailed round rules (winds/seat rotation, riichi, etc.)

## Quick start (uv)

```bash
uv venv
uv pip install -e .
uv run python main.py
```

## Environment API

```python
from majiang_rl import MahjongEnv

env = MahjongEnv()
obs, info = env.reset()

# action format
# type: 0 discard, 1 declare win, 2 declare kong, 3 pass, 4 declare pong, 5 declare chi
# tile: tile id (0-41)
# chi: 0 left, 1 middle, 2 right
action = {"type": 0, "tile": 0, "chi": 0}
obs, reward, terminated, truncated, info = env.step(action)
```

## RL scaffold

```python
from majiang_rl import MahjongEnv
from majiang_rl.rl import RandomAgent, run_training

env = MahjongEnv()
agent = RandomAgent()
results = run_training(env, agent, episodes=10)
print(results[0])
```

## GRPO self-play training

```bash
uv run python -m majiang_rl.rl.grpo --updates 20 --group-size 16 --device auto
uv run python -m majiang_rl.rl.grpo --updates 20 --group-size 16 --device auto --pong-reward 0.1 --closest-bonus 1.0
uv run python -m majiang_rl.rl.grpo --updates 20 --group-size 16 --device auto --swanlab --swanlab-project majiang-rl --swanlab-run-name grpo-demo
```

Reward uses a simplified fan breakdown (thirteen orphans, seven pairs, pure/half flush, all pungs, all honors).

## Simple web UI

```bash
uv run python -m majiang_rl.ui.web --port 8000
```

Then open `http://localhost:8000/index.html` to watch the playback.
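
## Win-check sketch (illustrative)

The three win shapes listed under Features (standard hands, seven pairs, thirteen orphans) can be sketched in a few lines of plain Python. This is not the package's implementation. The tile encoding below (ids 0-26 for three suits of nine ranks, ids 27-33 for honors, flowers excluded) and all function names are assumptions made for illustration, and the real id layout in `majiang_rl` may differ. The sketch treats four identical tiles as two pairs for seven pairs and only handles a fully concealed 14-tile hand.

```python
from typing import List, Sequence

# ASSUMED encoding (not taken from majiang_rl): ids 0-26 are three suits of
# nine ranks each, ids 27-33 are the seven honor tiles. Flowers are excluded.
NUM_TILES = 34
TERMINALS_AND_HONORS = [0, 8, 9, 17, 18, 26, 27, 28, 29, 30, 31, 32, 33]


def _to_counts(tiles: Sequence[int]) -> List[int]:
    """Convert a list of tile ids into a per-tile count vector."""
    counts = [0] * NUM_TILES
    for t in tiles:
        counts[t] += 1
    return counts


def _melds_only(counts: List[int]) -> bool:
    """Return True if the counts split entirely into pungs and chows."""
    first = next((i for i in range(NUM_TILES) if counts[i] > 0), None)
    if first is None:
        return True  # nothing left to place
    # Option 1: use the lowest remaining tile in a pung.
    if counts[first] >= 3:
        counts[first] -= 3
        ok = _melds_only(counts)
        counts[first] += 3
        if ok:
            return True
    # Option 2: use it as the lowest tile of a chow (suited tiles only,
    # and the run must stay inside one suit).
    if first < 27 and first % 9 <= 6 and counts[first + 1] and counts[first + 2]:
        for j in (first, first + 1, first + 2):
            counts[j] -= 1
        ok = _melds_only(counts)
        for j in (first, first + 1, first + 2):
            counts[j] += 1
        if ok:
            return True
    return False


def is_standard_win(tiles: Sequence[int]) -> bool:
    """Four melds plus one pair, tried for every possible pair."""
    if len(tiles) != 14:
        return False
    counts = _to_counts(tiles)
    for i in range(NUM_TILES):
        if counts[i] >= 2:
            counts[i] -= 2
            ok = _melds_only(counts)
            counts[i] += 2
            if ok:
                return True
    return False


def is_seven_pairs(tiles: Sequence[int]) -> bool:
    """Every tile appears an even number of times (four of a kind = two pairs)."""
    return len(tiles) == 14 and all(c % 2 == 0 for c in _to_counts(tiles))


def is_thirteen_orphans(tiles: Sequence[int]) -> bool:
    """One of each terminal and honor, plus a duplicate of any one of them."""
    if len(tiles) != 14:
        return False
    allowed = set(TERMINALS_AND_HONORS)
    return set(tiles) == allowed


def is_win(tiles: Sequence[int]) -> bool:
    return is_standard_win(tiles) or is_seven_pairs(tiles) or is_thirteen_orphans(tiles)


if __name__ == "__main__":
    print(is_win([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 9, 27, 27]))
```

The recursion always consumes the lowest remaining tile first, so each branch either places that tile in a pung or a chow or fails immediately, which keeps the search small for 14-tile hands. Exposed melds, flowers, and fan counting are left out on purpose.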