Pitfalls I hit while running TensorFlow

A collection of TensorFlow errors

May 3, 2018

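Q1: While writing a custom loss function, I used tf.bitcast to convert the type before calling tf.sqrt(), and got: LookupError: No gradient defined for operation 'loss/Bitcast' (op type: Bitcast)
A1: Use tf.to_float for the type conversion instead (equivalent to tf.cast(x, tf.float32)); I tested it and the error went away. tf.bitcast only reinterprets the underlying bytes as another type and, as the error says, has no gradient defined.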
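The difference between a value cast and a bit reinterpretation can be seen outside TensorFlow as well. The following is a small NumPy analogy, not the original loss code; the int32 input is an assumption made for illustration, since the notes do not say which type was being converted.

```python
import numpy as np

# Hypothetical integer input standing in for the tensor that had to become float.
x = np.array([1, 2, 3], dtype=np.int32)

# Value cast: each integer becomes the float with the same value.
# This is the NumPy counterpart of tf.cast / tf.to_float.
cast = x.astype(np.float32)

# Bit reinterpretation: the same 4 bytes per element are read as float32.
# This is the NumPy counterpart of tf.bitcast.
bits = x.view(np.float32)

print(cast)  # the values 1.0, 2.0, 3.0
print(bits)  # tiny denormal numbers, not 1, 2, 3
```

So even if a gradient existed for the bitcast, the numbers fed into the square root would most likely not have been the intended ones.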
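Q2: ValueError: Cannot feed value of shape (485686,) for Tensor 'input_y:0', which has shape '(?, 485686)'
A2: A shape problem. My first thought was tf.expand_dims(y, 0) (axis 0 adds a dimension at the front, 1 adds it as the second dimension, -1 adds it at the end). That did not work; it raised: TypeError: The value of a feed cannot be a tf.Tensor object. Acceptable feed values include Python scalars, strings, lists, numpy ndarrays, or TensorHandles.For reference, the tensor object was Tensor("ExpandDims:0", shape=(1, 485686), dtype=float64) which was passed to the feed with key Tensor("input_y:0", shape=(?, 485686), dtype=float32). The result of tf.expand_dims is itself a tensor, and a feed value has to be plain data such as a NumPy array, so the reshape has to happen in NumPy. Using np.reshape solved it.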
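A minimal sketch of the NumPy-side reshape, assuming the labels arrive as a flat vector of length 485686 (the zeros array here is a stand-in for the real data):

```python
import numpy as np

# Stand-in label vector; the real one also had 485686 entries.
y = np.zeros(485686, dtype=np.float64)

# Add a leading batch dimension so the shape matches (?, 485686).
# The cast to float32 mirrors the placeholder dtype shown in the error message.
y_batch = np.reshape(y, (1, -1)).astype(np.float32)

print(y_batch.shape)  # (1, 485686)
```

The resulting array is an ordinary ndarray, so it can be passed in feed_dict under the input_y placeholder.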
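Q3: InvalidArgumentError (see above for traceback): Nan in summary histogram for: conv-maxpool-3/W_0/grad/hist [[Node: conv-maxpool-3/W_0/grad/hist = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](conv-maxpool-3/W_0/grad/hist/tag, gradients/conv-maxpool-3/conv_grad/tuple/control_dependency_1)]]
A3: What I tried: this may be exploding gradients. Perhaps the quotient in the distance computation becomes 0? I added a very small value to the quotient, but the error still appeared. I then cast to higher precision with to_double and halved the learning rate to 5e-4; still no luck. In the end I turned off the histogram summary that was being written. Note that this only silences the error: if the gradients really contain NaN, they still do.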
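To find out which gradient goes bad instead of hiding the symptom, one option is to fetch the gradient values as NumPy arrays and check them for NaN or Inf. This is only a sketch: non_finite_entries is a hypothetical helper, and the arrays below are made-up values standing in for whatever sess.run would return.

```python
import numpy as np

def non_finite_entries(named_arrays):
    """Return {name: count} for every array that contains NaN or Inf."""
    report = {}
    for name, value in named_arrays.items():
        bad = int(np.count_nonzero(~np.isfinite(np.asarray(value))))
        if bad:
            report[name] = bad
    return report

# Made-up values standing in for gradients fetched from the session.
grads = {
    "conv-maxpool-3/W_0": np.array([0.1, np.nan, -0.3]),
    "conv-maxpool-4/W_0": np.array([0.2, 0.0, 0.5]),
}
print(non_finite_entries(grads))  # {'conv-maxpool-3/W_0': 1}
```

Running such a check every few steps would show at which step, and in which variable, the first non-finite value appears.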
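Q4: Loading a model in TensorFlow fails with: tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file
A4: See "TensorFlow model saving/loading". The problem is the file name passed to saver.restore(). saver.save writes three files for each checkpoint, for example
my-model-10000.meta, my-model-10000.index, my-model-10000.data-00000-of-00001
import_meta_graph takes the .meta file name. The weights are stored in my-model-10000.data-00000-of-00001, but passing that file name to restore raises this error. What restore expects is the prefix (my-model-10000 in this example), which can be obtained with tf.train.latest_checkpoint(checkpoint_dir).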
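If only one of the three file names is at hand, the prefix can be recovered by stripping the suffix. The helper below is a hypothetical sketch written for these notes, not part of TensorFlow, and it assumes the suffixes follow the pattern shown in the example file names.

```python
import re

# Suffixes of the files written for one checkpoint (pattern assumed from the example).
_CKPT_SUFFIX = re.compile(r"\.(meta|index|data-\d{5}-of-\d{5})$")

def checkpoint_prefix(path):
    """Strip a checkpoint file suffix, leaving the prefix that restore() expects."""
    return _CKPT_SUFFIX.sub("", path)

for name in ["my-model-10000.meta",
             "my-model-10000.index",
             "my-model-10000.data-00000-of-00001"]:
    print(checkpoint_prefix(name))  # my-model-10000 each time
```

When the checkpoint directory is known, tf.train.latest_checkpoint remains the simpler route, since it returns the prefix directly.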
Q5: ValueError: The name 'global_step' refers to an Operation, not a Tensor. Tensor names must be of the form "<op_name>:<output_index>".
A5: .get_operation_by_name('').outputs[0] works (tested myself); .get_collection('') might also work, but I have not tried it.
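The naming rule in the error message can be spelled out in a few lines. Both functions below are hypothetical helpers for illustration only; they just manipulate strings and do not touch a graph. The first output of an operation is the tensor named "<op_name>:0", which is why outputs[0] works and why a lookup such as graph.get_tensor_by_name needs the ":0" suffix.

```python
def tensor_name(op_name, output_index=0):
    """Build the "<op_name>:<output_index>" form that tensor lookups require."""
    return "{}:{}".format(op_name, output_index)

def is_tensor_name(name):
    """True if name has a non-empty op name followed by ":<digits>"."""
    op_name, sep, index = name.rpartition(":")
    return bool(sep) and bool(op_name) and index.isdigit()

print(is_tensor_name("global_step"))    # False: this is an operation name
print(tensor_name("global_step"))       # global_step:0
print(is_tensor_name("global_step:0"))  # True
```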
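Q6: KeyError: "The name 'loss' refers to an Operation not in the graph." The loss is defined under scope("loss"). After reloading the model I do not know what its name should be, because the loss itself was never given a name; it was only placed under scope("loss").
A6: Seemingly no solution, because 'loss' is only a scope and not the name of any operation or tensor. Define the loss directly and call it? One thing that may be worth checking: operations created inside the scope get names prefixed with loss/ (the Q1 error mentions 'loss/Bitcast'), so listing the operations of the reloaded graph should reveal the actual name.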
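A small sketch of that search, assuming the operation names have already been collected, for instance with [op.name for op in graph.get_operations()]. The helper names_in_scope and the names in the list are made up for illustration; the real graph will contain different ones.

```python
def names_in_scope(op_names, scope):
    """Keep the names that live under the given scope, at any depth."""
    prefix = scope.rstrip("/") + "/"
    return [name for name in op_names if name.startswith(prefix)]

# Made-up names standing in for the operations of a reloaded graph.
op_names = ["input_y", "loss/Sqrt", "loss/Mean", "global_step"]
print(names_in_scope(op_names, "loss"))  # ['loss/Sqrt', 'loss/Mean']
```

The last operation listed under the scope is often the one that produces the final loss value, though that depends on how the loss was built. Giving the loss an explicit name when defining it avoids the search altogether.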