Hey guys,
I'm new to FloydHub and I'm not sure this is the right place to post this, so excuse me if it isn't.
I'm trying to train a simple convolutional neural net with TensorFlow, and up until now FloydHub has been really easy to use. I'm using the CIFAR-10 dataset that I've already uploaded, and my network has been training without any errors. Since I had that nailed down, I wanted to save checkpoints so that once it's trained I could run the classification on my laptop. But since I implemented the saving, my code hits a MemoryError whenever it tries to save a checkpoint.
Here is the full error traceback:
Traceback (most recent call last):
2018-08-21 14:43:07 PST   File "main.py", line 173, in <module>
2018-08-21 14:43:07 PST     train(SAVING_PATH, LOAD_CHECKPOINT, dataset_iterator, N_EPOCHS, next_element, MODEL_NAME)
2018-08-21 14:43:07 PST   File "main.py", line 165, in train
2018-08-21 14:43:07 PST     saver.save(sess, saving_path + model_name, 10)
2018-08-21 14:43:07 PST   File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1728, in save
2018-08-21 14:43:07 PST     meta_graph_filename, strip_default_attrs=strip_default_attrs)
2018-08-21 14:43:07 PST   File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1766, in export_meta_graph
2018-08-21 14:43:07 PST     graph_def=ops.get_default_graph().as_graph_def(add_shapes=True),
2018-08-21 14:43:07 PST   File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3228, in as_graph_def
2018-08-21 14:43:07 PST     result, _ = self._as_graph_def(from_version, add_shapes)
2018-08-21 14:43:07 PST   File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3170, in _as_graph_def
2018-08-21 14:43:07 PST     data = c_api.TF_GetBuffer(buf)
2018-08-21 14:43:07 PST MemoryError
Here is the part of my code that saves the checkpoint:
saver.save(sess, saving_path + model_name, i)
where:
model_name = 'cifar10'
saving_path = '/output/' + model_name + '-cnn'
sess is the tf.Session() I'm using, and i is the epoch I'm at.
Note that I am not saving at every epoch; I only save once every 1000 epochs.
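For reference, here is a minimal sketch in plain Python of how the values above combine, so it is clear which checkpoint prefix is handed to saver.save and on which epochs. It does not call TensorFlow at all. SAVE_EVERY, checkpoint_prefix and should_save are hypothetical names introduced for illustration and are not taken from main.py; skipping epoch 0 is also an assumption.

```python
# Illustration only: reproduces the string handling and the save interval
# described in the post. No TensorFlow calls are made here.

model_name = 'cifar10'
saving_path = '/output/' + model_name + '-cnn'
SAVE_EVERY = 1000  # assumed name for the "every 1000 epochs" interval


def checkpoint_prefix(saving_path, model_name):
    # Plain string concatenation, as in saver.save(sess, saving_path + model_name, i).
    return saving_path + model_name


def should_save(epoch, every=SAVE_EVERY):
    # True only on positive multiples of `every` (skipping epoch 0 is an assumption).
    return epoch > 0 and epoch % every == 0


prefix = checkpoint_prefix(saving_path, model_name)
print(prefix)  # /output/cifar10-cnncifar10
print([e for e in range(3001) if should_save(e)])  # [1000, 2000, 3000]
```

Because the two strings are concatenated without a separator, the prefix is /output/cifar10-cnncifar10, and tf.train.Saver appends "-<step>" to it, so the checkpoint files would start with /output/cifar10-cnncifar10-1000 rather than living in a cifar10-cnn directory.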
Would you guys have any idea what I'm doing wrong when trying to save this checkpoint? Any help is greatly appreciated,
Mindoo