This paper presents a state estimation approach for reinforcement learning (RL) in a partially observable Markov decision process. It is based on a special recurrent neural network architecture, the Markov decision process extraction network with shortcuts (MPEN-S). In contrast to previous work on this topic, we address the problem of long-term dependencies, which pose major difficulties in many real-world applications. The architecture is designed to model the reward-relevant dynamics of an environment and is capable of condensing large sets of continuous observables into a compact Markovian state representation. The resulting estimate can be used as input for RL methods that assume the underlying system to be a Markov decision process. Although the approach was developed with RL in mind, it is also useful for general prediction tasks.
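To illustrate the general idea of condensing an observation history into a compact Markovian state estimate, the following minimal sketch uses a plain Elman-style recurrent network. It is not the MPEN-S architecture itself; all class names, dimensions, and the random initialisation are illustrative assumptions, and in practice the network weights would be trained (e.g., to predict reward-relevant quantities).

```python
# Minimal sketch (not MPEN-S): a simple recurrent network that folds a
# sequence of continuous observations into one compact state vector, which
# could then be handed to an RL method that assumes a Markov decision process.
import numpy as np


class RecurrentStateEstimator:
    def __init__(self, obs_dim, state_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Weights are randomly initialised here for illustration only;
        # a real estimator would learn them from interaction data.
        self.W_in = rng.normal(scale=0.1, size=(state_dim, obs_dim))
        self.W_rec = rng.normal(scale=0.1, size=(state_dim, state_dim))
        self.b = np.zeros(state_dim)

    def estimate(self, observations):
        """Condense an observation history into a compact state estimate."""
        s = np.zeros(self.W_rec.shape[0])
        for o in observations:
            s = np.tanh(self.W_in @ o + self.W_rec @ s + self.b)
        return s  # approximate Markovian state for a downstream RL method


if __name__ == "__main__":
    estimator = RecurrentStateEstimator(obs_dim=8, state_dim=4)
    history = [np.random.randn(8) for _ in range(20)]  # observed trajectory
    print("compact state estimate:", estimator.estimate(history))
```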