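Q2. 1. Regarding the simulator design: we found that only a few parameters are needed to describe the whole game. For example, the gimbal pitch angle, the chassis angle, and the robot's coordinates are enough to represent the robot's spatial state. So we plan to design the simulator on that basis: first assume that the other groups can obtain this information, and then make our decisions from it. This simplifies the simulator design. Would this design cause any problems? 2. Regarding reinforcement learning: during the actual competition we cannot directly obtain the enemy's match information (health, barrel heat, and so on), but we can obtain it when training in the simulator. We would like to feed the network only the information that is actually available in a match, while letting the reward function use the enemy information. In other words, the network's inputs and the reward function's arguments would differ. Is this acceptable, and what problems might it cause? Thanks in advance.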
1. The abstraction level of the observation space and the action space will definitely affect performance. An end-to-end system is also not very stable in the real world, so you may need to train the network separately for the high-noise model, or simply replace the high-noise model with a human-designed one. 2. A new firmware that provides enemy information for training is in our future plans, but during the competition you will not be able to know the enemy information, so reward shaping may be tough work. |
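
To make the setup in question 2 concrete, here is a minimal sketch of a simulator state whose policy input and reward arguments differ. All names (`SimState`, `observe`, `reward`) and the damage-difference reward are hypothetical illustrations, not part of any official simulator or firmware, and the fields are simply the ones mentioned in the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimState:
    """Full simulator state (hypothetical layout, for illustration only)."""
    x: float                  # robot position
    y: float
    chassis_yaw: float        # chassis angle
    gimbal_pitch: float       # gimbal pitch angle
    own_hp: int               # available in a real match
    enemy_hp: int             # only available inside the simulator
    enemy_barrel_heat: float  # only available inside the simulator


def observe(state: SimState) -> tuple:
    """Policy input: only quantities the robot can measure during a match."""
    return (state.x, state.y, state.chassis_yaw,
            state.gimbal_pitch, float(state.own_hp))


def reward(prev: SimState, curr: SimState) -> float:
    """Training reward: may read enemy fields the policy never sees.

    Assumed shaping: damage dealt minus damage taken over one step.
    """
    damage_dealt = prev.enemy_hp - curr.enemy_hp
    damage_taken = prev.own_hp - curr.own_hp
    return float(damage_dealt - damage_taken)


prev = SimState(0.0, 0.0, 0.0, 0.0, 100, 100, 0.0)
curr = SimState(0.5, 0.0, 0.1, 0.0, 90, 70, 12.0)
print(observe(curr))       # enemy fields are absent from the observation
print(reward(prev, curr))  # reward still depends on enemy_hp
```

Keeping `observe` and `reward` as separate functions over the same state makes it easy to check that no enemy-only field leaks into the network input. How well a policy trained this way transfers to a real match is a separate question that this sketch does not address.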