Implementing the DQL algorithm for reinforcement learning with TensorFlow raises an error: tensorflow.python.framework.errors_impl.InternalError: Could not find valid device for node. Node: {{node OneHot}} = OneHot[T=DT_FLOAT, TI=DT_FLOAT, axis=-1](dummy_input, dummy_input, dummy_input, dummy_input)


Code:

import random
from collections import deque

import gym
import numpy as np
import tensorflow as tf

tf.enable_eager_execution()

num_episodes = 500
num_exploration_episodes = 100
max_len_episode = 1000
batch_size = 32
learning_rate = 1e-3
gamma = 1.
initial_epsilon = 1.
final_epsilon = 0.01


class QNetwork(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense1 = tf.keras.layers.Dense(units=24, activation=tf.nn.relu)
        self.dense2 = tf.keras.layers.Dense(units=24, activation=tf.nn.relu)
        self.dense3 = tf.keras.layers.Dense(units=2)

    def call(self, inputs):
        x = self.dense1(inputs)
        x = self.dense2(x)
        x = self.dense3(x)
        return x

    def predict(self, inputs):
        # Greedy action: index of the largest Q-value for each input state.
        q_values = self(inputs)
        return tf.argmax(q_values, axis=-1)


env = gym.make('CartPole-v1')
model = QNetwork()
optimizer = tf.train.AdamOptimizer(learning_rate)
replay_buffer = deque(maxlen=10000)
epsilon = initial_epsilon

for episode_id in range(num_episodes):
    state = env.reset()
    # Linearly decay epsilon over the exploration episodes
    # (the original assigned this value to `episode`, so epsilon never changed).
    epsilon = max(
        initial_epsilon * (num_exploration_episodes - episode_id) / num_exploration_episodes,
        final_epsilon
    )

    for t in range(max_len_episode):
        env.render()
        if random.random() < epsilon:
            # Explore: pick a random action.
            action = env.action_space.sample()
        else:
            # Exploit: pick the greedy action (dtype belongs to tf.constant, not np.expand_dims).
            action = model.predict(
                tf.constant(np.expand_dims(state, axis=0), dtype=tf.float32)
            ).numpy()
            action = action[0]

        next_state, reward, done, info = env.step(action)
        reward = -10. if done else reward
        replay_buffer.append((state, action, reward, next_state, done))
        state = next_state

        if done:
            print('episode %d, epsilon %f, score %d' % (episode_id, epsilon, t))
            break

        if len(replay_buffer) >= batch_size:
            # Note: every field of the batch, including the actions, is converted to float32 here.
            batch_state, batch_action, batch_reward, batch_next_state, batch_done = [
                np.array(a, dtype=np.float32) for a in zip(*random.sample(replay_buffer, batch_size))
            ]

            # Q-learning target: r + gamma * max_a' Q(s', a'), with no bootstrap on terminal states.
            q_value = model(tf.constant(batch_next_state, dtype=tf.float32))
            y = batch_reward + (gamma * tf.reduce_max(q_value, axis=1)) * (1 - batch_done)

            with tf.GradientTape() as tape:
                loss = tf.losses.mean_squared_error(
                    labels=y,  # was misspelled `lables`
                    # This tf.one_hot call is the one that raises the InternalError shown below;
                    # batch_action is float32 at this point.
                    predictions=tf.reduce_sum(model(tf.constant(batch_state)) * tf.one_hot(batch_action, depth=2), axis=1)
                )

            grads = tape.gradient(loss, model.variables)
            optimizer.apply_gradients(grads_and_vars=zip(grads, model.variables))


Error output:

/home/kalarea/.conda/envs/py35/bin/python /home/kalarea/PycharmProjects/dql_demo/dpq_cartplie.py
/home/kalarea/.conda/envs/py35/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
episode 0, epsilon 1.000000, score 12
2018-10-15 12:54:59.165871: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Traceback (most recent call last):
  File "/home/kalarea/PycharmProjects/dql_demo/dpq_cartplie.py", line 77, in <module>
    predictions = tf.reduce_sum(model(tf.constant(batch_state)) * tf.one_hot(batch_action, depth=2),axis=1)
  File "/home/kalarea/.conda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 2439, in one_hot
    name)
  File "/home/kalarea/.conda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 4563, in one_hot
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Could not find valid device for node.
Node: {{node OneHot}} = OneHot[T=DT_FLOAT, TI=DT_FLOAT, axis=-1](dummy_input, dummy_input, dummy_input, dummy_input)
All kernels registered for op OneHot :
  device='CPU'; TI in [DT_INT64]; T in [DT_VARIANT]
  device='CPU'; TI in [DT_INT32]; T in [DT_VARIANT]
  device='CPU'; TI in [DT_UINT8]; T in [DT_VARIANT]
  device='CPU'; TI in [DT_INT64]; T in [DT_RESOURCE]
  device='CPU'; TI in [DT_INT32]; T in [DT_RESOURCE]
  device='CPU'; TI in [DT_UINT8]; T in [DT_RESOURCE]
  device='CPU'; TI in [DT_INT64]; T in [DT_STRING]
  device='CPU'; TI in [DT_INT32]; T in [DT_STRING]
  device='CPU'; TI in [DT_UINT8]; T in [DT_STRING]
  device='CPU'; TI in [DT_INT64]; T in [DT_BOOL]
  device='CPU'; TI in [DT_INT32]; T in [DT_BOOL]
  device='CPU'; TI in [DT_UINT8]; T in [DT_BOOL]
  device='CPU'; TI in [DT_INT64]; T in [DT_COMPLEX128]
  device='CPU'; TI in [DT_INT32]; T in [DT_COMPLEX128]
  device='CPU'; TI in [DT_UINT8]; T in [DT_COMPLEX128]
  device='CPU'; TI in [DT_INT64]; T in [DT_COMPLEX64]
  device='CPU'; TI in [DT_INT32]; T in [DT_COMPLEX64]
  device='CPU'; TI in [DT_UINT8]; T in [DT_COMPLEX64]
  device='CPU'; TI in [DT_INT64]; T in [DT_DOUBLE]
  device='CPU'; TI in [DT_INT32]; T in [DT_DOUBLE]
  device='CPU'; TI in [DT_UINT8]; T in [DT_DOUBLE]
  device='CPU'; TI in [DT_INT64]; T in [DT_FLOAT]
  device='CPU'; TI in [DT_INT32]; T in [DT_FLOAT]
  device='CPU'; TI in [DT_UINT8]; T in [DT_FLOAT]
  device='CPU'; TI in [DT_INT64]; T in [DT_BFLOAT16]
  device='CPU'; TI in [DT_INT32]; T in [DT_BFLOAT16]
  device='CPU'; TI in [DT_UINT8]; T in [DT_BFLOAT16]
  device='CPU'; TI in [DT_INT64]; T in [DT_HALF]
  device='CPU'; TI in [DT_INT32]; T in [DT_HALF]
  device='CPU'; TI in [DT_UINT8]; T in [DT_HALF]
  device='CPU'; TI in [DT_INT64]; T in [DT_INT8]
  device='CPU'; TI in [DT_INT32]; T in [DT_INT8]
  device='CPU'; TI in [DT_UINT8]; T in [DT_INT8]
  device='CPU'; TI in [DT_INT64]; T in [DT_UINT8]
  device='CPU'; TI in [DT_INT32]; T in [DT_UINT8]
  device='CPU'; TI in [DT_UINT8]; T in [DT_UINT8]
  device='CPU'; TI in [DT_INT64]; T in [DT_INT16]
  device='CPU'; TI in [DT_INT32]; T in [DT_INT16]
  device='CPU'; TI in [DT_UINT8]; T in [DT_INT16]
  device='CPU'; TI in [DT_INT64]; T in [DT_UINT16]
  device='CPU'; TI in [DT_INT32]; T in [DT_UINT16]
  device='CPU'; TI in [DT_UINT8]; T in [DT_UINT16]
  device='CPU'; TI in [DT_INT64]; T in [DT_INT32]
  device='CPU'; TI in [DT_INT32]; T in [DT_INT32]
  device='CPU'; TI in [DT_UINT8]; T in [DT_INT32]
  device='CPU'; TI in [DT_INT64]; T in [DT_INT64]
  device='CPU'; TI in [DT_INT32]; T in [DT_INT64]
  device='CPU'; TI in [DT_UINT8]; T in [DT_INT64]
 [Op:OneHot] name: one_hot/
Exception ignored in: <bound method Viewer.__del__ of <gym.envs.classic_control.rendering.Viewer object at 0x7fca9f2baef0>>
Traceback (most recent call last):
  File "/home/kalarea/gym/gym/envs/classic_control/rendering.py", line 143, in __del__
  File "/home/kalarea/gym/gym/envs/classic_control/rendering.py", line 62, in close
  File "/home/kalarea/.conda/envs/py35/lib/python3.5/site-packages/pyglet/window/xlib/__init__.py", line 480, in close
  File "/home/kalarea/.conda/envs/py35/lib/python3.5/site-packages/pyglet/gl/xlib.py", line 345, in destroy
  File "/home/kalarea/.conda/envs/py35/lib/python3.5/site-packages/pyglet/gl/base.py", line 334, in destroy
  File "/home/kalarea/.conda/envs/py35/lib/python3.5/site-packages/pyglet/gl/xlib.py", line 335, in detach
  File "/home/kalarea/.conda/envs/py35/lib/python3.5/site-packages/pyglet/gl/lib.py", line 97, in errcheck
  File "<frozen importlib._bootstrap>", line 968, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 887, in _find_spec
TypeError: 'NoneType' object is not iterable

Process finished with exit code 1


 

北斗   2018-10-15 14:01
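
A possible reading of the traceback: the failing node has TI=DT_FLOAT, while every registered OneHot kernel lists TI as DT_INT64, DT_INT32 or DT_UINT8, so the indices handed to tf.one_hot were probably floats. In the posted code the sampled batch is unpacked with np.array(a, dtype=np.float32) for every field, and that includes the actions. The sketch below reproduces the effect with NumPy only and casts the actions back to an integer dtype. The sample transitions are made up, and one_hot_np is a hypothetical helper standing in for tf.one_hot, not part of the original code.

```python
import numpy as np

# Made-up transitions in the (state, action, reward, next_state, done) layout
# used by the replay buffer in the question; the values are illustrative only.
transitions = [
    ([0.0, 0.1, 0.0, -0.1], 1, 1.0, [0.0, 0.2, 0.0, -0.2], False),
    ([0.0, 0.2, 0.0, -0.2], 0, 1.0, [0.0, 0.1, 0.0, -0.1], False),
    ([0.0, 0.1, 0.0, -0.1], 1, -10.0, [0.1, 0.3, 0.1, -0.4], True),
]

# Same unpacking as in the question: every field, actions included, becomes float32.
state, action, reward, next_state, done = [
    np.array(field, dtype=np.float32) for field in zip(*transitions)
]
print(action.dtype)  # float32

# Cast the actions back to an integer dtype before using them as indices.
action_idx = action.astype(np.int32)


def one_hot_np(indices, depth):
    """Hypothetical NumPy stand-in for tf.one_hot(indices, depth)."""
    return np.eye(depth, dtype=np.float32)[indices]


mask = one_hot_np(action_idx, depth=2)
print(mask)

# In the question's code, the analogous change would be something like:
#   tf.one_hot(tf.cast(batch_action, tf.int32), depth=2)
```

Casting only at the point of use keeps the rest of the batch in float32, which the reward, done-mask and state arithmetic in the training step relies on. This has not been run against the asker's environment, so treat it as a likely cause rather than a confirmed fix.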


