feat: initialize majiang-rl project
64
README.md
Normal file
@@ -0,0 +1,64 @@
# majiang-rl

A minimal Mahjong (Guobiao) simulation environment and reinforcement learning scaffold built on Gymnasium.

## Features

- 4-player Guobiao tile set (144 tiles, including flowers)
- Draw/discard turn loop with flower replacement
- Basic calls: win (ron/tsumo), pong, chi, kong
- Win checking for standard hands, seven pairs, and thirteen orphans
- Gymnasium-style environment API with action masks (type/discard/pong/kong/chi)
- Simple RL loop and random agent

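The three win shapes named in the feature list can be sketched as follows. This is a minimal illustration, not the project's checker: it assumes a tile-id layout of 0-8 characters, 9-17 dots, 18-26 bamboo, 27-33 honors, and 34-41 flowers (the README only states that ids run 0-41), it handles fully concealed 14-tile hands only, and the function names are hypothetical.

```python
from collections import Counter

# Assumed tile-id layout (hypothetical; the README only says ids run 0-41):
#   0-8 characters, 9-17 dots, 18-26 bamboo, 27-33 honors, 34-41 flowers.
TERMINALS_AND_HONORS = {0, 8, 9, 17, 18, 26, 27, 28, 29, 30, 31, 32, 33}


def is_seven_pairs(hand):
    """True if the 14 tiles form seven pairs (four of a kind counts as two pairs)."""
    counts = Counter(hand)
    return len(hand) == 14 and all(n in (2, 4) for n in counts.values())


def is_thirteen_orphans(hand):
    """True if the hand holds every terminal and honor, plus one duplicate of them."""
    return len(hand) == 14 and set(hand) == TERMINALS_AND_HONORS


def _can_form_sets(counts):
    """Recursively split the remaining tiles into pungs and chows."""
    first = next((t for t in range(34) if counts[t] > 0), None)
    if first is None:
        return True  # nothing left: every tile was placed in a set
    # Option 1: a pung of the lowest remaining tile.
    if counts[first] >= 3:
        counts[first] -= 3
        ok = _can_form_sets(counts)
        counts[first] += 3
        if ok:
            return True
    # Option 2: a chow starting at the lowest remaining tile
    # (suited tiles only, and the run must stay inside one suit).
    if first < 27 and first % 9 <= 6 and counts[first + 1] and counts[first + 2]:
        for t in (first, first + 1, first + 2):
            counts[t] -= 1
        ok = _can_form_sets(counts)
        for t in (first, first + 1, first + 2):
            counts[t] += 1
        if ok:
            return True
    return False


def is_standard_win(hand):
    """True if the 14 tiles split into four sets and one pair."""
    if len(hand) != 14 or any(not 0 <= t < 34 for t in hand):
        return False
    counts = [0] * 34
    for t in hand:
        counts[t] += 1
    # Try every possible pair, then check that the rest forms four sets.
    for pair in range(34):
        if counts[pair] >= 2:
            counts[pair] -= 2
            ok = _can_form_sets(counts)
            counts[pair] += 2
            if ok:
                return True
    return False
```

A hand with declared melds would need the meld count subtracted from the four required sets; that case is left out here to keep the sketch short.
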
## Limitations

- No scoring and no enforcement of the 8-fan minimum
- NPC players use simple greedy claims and random discards
- No detailed round rules (winds/seat rotation, riichi, etc.)

## Quick start (uv)

```bash
uv venv
uv pip install -e .
uv run python main.py
```

## Environment API

```python
from majiang_rl import MahjongEnv

env = MahjongEnv()
obs, info = env.reset()

# Action format:
#   type: 0 discard, 1 declare win, 2 declare kong, 3 pass, 4 declare pong, 5 declare chi
#   tile: tile id (0-41)
#   chi:  0 left, 1 middle, 2 right
# Example: request a discard of tile id 0.
action = {"type": 0, "tile": 0, "chi": 0}
obs, reward, terminated, truncated, info = env.step(action)
```

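Because the action is a plain dict of integer codes, a small helper can make call sites more readable and catch out-of-range values early. The sketch below is an illustration only: `ActionType` and `make_action` are hypothetical names that are not part of `majiang_rl`, and the ranges are taken from the comment in the example above.

```python
from enum import IntEnum


class ActionType(IntEnum):
    """Action type codes as listed in the README's action-format comment."""
    DISCARD = 0
    WIN = 1
    KONG = 2
    PASS = 3
    PONG = 4
    CHI = 5


NUM_TILE_IDS = 42    # tile ids 0-41
NUM_CHI_OPTIONS = 3  # 0 left, 1 middle, 2 right


def make_action(action_type, tile=0, chi=0):
    """Build an action dict, rejecting values outside the documented ranges."""
    action_type = ActionType(action_type)  # raises ValueError on unknown codes
    if not 0 <= tile < NUM_TILE_IDS:
        raise ValueError(f"tile id must be in 0-41, got {tile}")
    if not 0 <= chi < NUM_CHI_OPTIONS:
        raise ValueError(f"chi position must be 0, 1, or 2, got {chi}")
    return {"type": int(action_type), "tile": tile, "chi": chi}


# Usage: a discard of tile 5, and a chi claim using the "right" position.
discard = make_action(ActionType.DISCARD, tile=5)
claim = make_action(ActionType.CHI, tile=3, chi=2)
```

This only validates ranges. Whether an action is legal in the current state is still decided by the environment's action masks.
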
## RL scaffold

```python
from majiang_rl import MahjongEnv
from majiang_rl.rl import RandomAgent, run_training

env = MahjongEnv()
agent = RandomAgent()

# Run 10 episodes and print the first entry of the returned results.
results = run_training(env, agent, episodes=10)
print(results[0])
```

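The internals of `run_training` are not shown here. As a rough picture of what a Gymnasium-style episode loop involves, the sketch below plays one episode against a toy stand-in environment so that it runs without `majiang_rl` installed. `CountdownEnv`, `UniformAgent`, the `act(obs, info)` method, and `run_episode` are all hypothetical names chosen for illustration, not the project's API.

```python
import random


class CountdownEnv:
    """Toy stand-in with Gymnasium-style reset/step signatures (not MahjongEnv)."""

    def __init__(self, length=5):
        self.length = length
        self.remaining = length

    def reset(self, seed=None):
        self.remaining = self.length
        return self.remaining, {}

    def step(self, action):
        self.remaining -= 1
        reward = 1.0 if action == 0 else 0.0
        terminated = self.remaining == 0
        # Returns (obs, reward, terminated, truncated, info).
        return self.remaining, reward, terminated, False, {}


class UniformAgent:
    """Hypothetical agent interface: act(obs, info) returns an action."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, obs, info):
        return self.rng.choice([0, 1])


def run_episode(env, agent):
    """Play one episode and return (total_reward, steps)."""
    obs, info = env.reset()
    total, steps, done = 0.0, 0, False
    while not done:
        action = agent.act(obs, info)
        obs, reward, terminated, truncated, info = env.step(action)
        total += reward
        steps += 1
        # An episode ends on either a terminal state or a truncation.
        done = terminated or truncated
    return total, steps


total, steps = run_episode(CountdownEnv(), UniformAgent())
print(f"steps={steps} total_reward={total}")
```
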
## GRPO self-play training

```bash
# Basic run
uv run python -m majiang_rl.rl.grpo --updates 20 --group-size 16 --device auto

# Same run with the optional --pong-reward and --closest-bonus flags set
uv run python -m majiang_rl.rl.grpo --updates 20 --group-size 16 --device auto --pong-reward 0.1 --closest-bonus 1.0

# Same run with SwanLab logging enabled
uv run python -m majiang_rl.rl.grpo --updates 20 --group-size 16 --device auto --swanlab --swanlab-project majiang-rl --swanlab-run-name grpo-demo
```

The reward uses a simplified fan breakdown covering thirteen orphans, seven pairs, pure/half flush, all pungs, and all honors.

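GRPO (Group Relative Policy Optimization) scores each rollout relative to the other rollouts in its group rather than against a learned value baseline, which is where `--group-size` comes in. The sketch below shows the usual group normalization. It is a minimal illustration under the assumption that rewards are normalized by the group mean and standard deviation; the function name and the `eps` parameter are hypothetical, and the project's own implementation may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: a group of four rollouts in which only the last one won.
advantages = group_relative_advantages([0.0, 0.0, 0.0, 1.0])
print(advantages)
```

The winning rollout gets a positive advantage and the others a negative one. When all rewards in a group are equal, every advantage is zero and the group contributes no gradient signal.
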
## Simple web UI

```bash
uv run python -m majiang_rl.ui.web --port 8000
```

Then open `http://localhost:8000/index.html` to watch the playback.