Hi @an2908,
Is there any way for me to connect my cpu / gpu instance to gcp?
Yes, there is. First of all, you need to set up the proper read/write permissions on the GCS bucket for your account (or your service account), and upload your google-auth.json key file
to the remote machine on FloydHub (make sure to work with Private Projects!). Then you need to install the packages below (I assume you are working from a notebook in a FloydHub Workspace):
!pip install google-api-core google-cloud-storage
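For the permissions step, here is a minimal sketch of how you could grant the service account read/write access to the bucket with the same client library (run it locally, where you are already authenticated as the project owner; the bucket name and service-account email below are placeholders):

from google.cloud import storage

# Run this locally, authenticated as the project/bucket owner
client = storage.Client()
bucket = client.get_bucket("my-bucket")  # placeholder bucket name

# Give the service account read/write access to the objects in the bucket
policy = bucket.get_iam_policy()
policy["roles/storage.objectAdmin"].add(
    "serviceAccount:my-sa@my-project.iam.gserviceaccount.com")
bucket.set_iam_policy(policy)

You can do the same thing from the GCP console or with gsutil if you prefer.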
Test that it's working as expected:
import os
# Make the credential available in the env
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = '/path/to/google-auth.json'
from google.cloud import storage
# If you don't specify credentials when constructing the client, the
# client library will look for credentials in the environment.
storage_client = storage.Client()
# Make an authenticated API request
buckets = list(storage_client.list_buckets())
print(buckets)
If the script returns the list of your buckets, you can start using gs://<bucket_name> paths
inside your pipeline scripts.
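For example, here is a minimal sketch of moving files between the Workspace and the bucket from a training script (the bucket, blob, and file names are placeholders):

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")  # placeholder bucket name

# Download a dataset file from the bucket to the local disk
bucket.blob("data/train.csv").download_to_filename("train.csv")

# Upload a checkpoint from the local disk back to the bucket
bucket.blob("checkpoints/model.h5").upload_from_filename("model.h5")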
Useful considerations:
- Not all framework APIs support cloud filesystems; check this in your framework's API docs (see the TensorFlow sketch after this list).
- Latency can kill your pipeline's performance if you plan to stream data during training. To mitigate this a bit, you can consider storing your bucket in the Oregon datacenter (IIRC, us-west1).
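As an example of the first point, TensorFlow can read gs:// paths directly through tf.io.gfile once GOOGLE_APPLICATION_CREDENTIALS is set. A minimal sketch, assuming TensorFlow 2.x is installed in the Workspace and using placeholder bucket/file names:

import tensorflow as tf

# Check that TensorFlow can see the bucket
print(tf.io.gfile.exists("gs://my-bucket/data/train.csv"))

# Copy the file to local disk once, instead of streaming it from GCS every epoch
tf.io.gfile.copy("gs://my-bucket/data/train.csv", "train.csv", overwrite=True)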
Useful resources:
- GCloud Getting Started with Authentication
- tfds and Google Cloud Storage
- Training Faster With Large Datasets using Scale and PyTorch
Let me know if you have any questions or need more help.
Hope that helps!