Getting started with Cloud Storage¶
This tutorial focuses on using gcloud to access Google Cloud Storage. We’ll go through the basic concepts, how to operate on buckets and keys, and how to handle access control, among other things.
We’re going to assume that you’ve already downloaded and installed the library.
Creating a project¶
Create a project
Start off by visiting https://cloud.google.com/console and click on the big red button that says “Create Project”.
Choose a name
In the box that says “name”, choose something friendly. This is going to be the human-readable name for your project.
Choose an ID
In the box that says “ID”, choose something unique (hyphens are OK). I typically choose an ID that starts with my initials, then a hyphen, then a unique identifier for the work I’m doing. For this example, you might choose <initials>-quickstart.
Then click OK (give it a second to create your project).
Enabling the API¶
Now that you’ve created a project, you need to turn on the Google Cloud Storage JSON API. This is sort of like telling Google which services you intend to use for this project.
- Click on APIs & Auth on the left hand side, and scroll down to where it says “Google Cloud Storage JSON API”.
- Click the “Off” button on the right side to turn it into an “On” button.
Enabling a service account¶
Now that you have a project, we need to make sure we are able to access our data. There are many ways to authenticate, but we’re going to use a Service Account for today.
A Service Account is sort of like a username and password (like when you’re connecting to your MySQL database), except the username is automatically generated (and is an e-mail address) and the password is actually a private key file.
To create a Service Account:
- Click on Credentials under the “APIs & Auth” section.
- Click the big red button that says “Create New Client ID” under the OAuth section (the first one).
- Choose “Service Account” and click the blue button that says “Create Client ID”. This will automatically download a private key file. Do not lose this.
- Rename your key something shorter. I like to name the key <project name>.p12. This is like your password for the account.
- Copy the long weird e-mail address labeled “E-mail address” in the information section for the Service Account you just created. This is like your username for the account.
Creating a connection¶
The first step in accessing Cloud Storage is to create a connection to the service:
>>> from gcloud import storage
>>> connection = storage.get_connection(project_name, email, key_path)
We’re going to use this connection object for the rest of this guide.
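The project_name, email, and key_path arguments are the values you collected above: the project ID you chose, the Service Account e-mail address, and the path to the .p12 private key file you downloaded. Putting it all together, a minimal sketch with placeholder values (substitute your own) looks like this:
>>> from gcloud import storage
>>> project_name = 'your-initials-quickstart'
>>> email = '1234567890@developer.gserviceaccount.com'
>>> key_path = '/path/to/your-initials-quickstart.p12'
>>> connection = storage.get_connection(project_name, email, key_path)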
Creating a bucket¶
Once you’ve established a connection to Cloud Storage, the first thing you typically want to do is create a new bucket. A bucket is a container used to store objects in Cloud Storage (if you’re familiar with S3, buckets on Cloud Storage mean the same thing). Think of each bucket as a single “disk drive” on which you can store lots of files. How you organize your data is up to you, but it’s typical to group common data in a single bucket.
Let’s create a bucket:
>>> bucket = connection.create_bucket('test')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "gcloud/storage/connection.py", line 340, in create_bucket
    data={'name': bucket.name})
  File "gcloud/storage/connection.py", line 224, in api_request
    raise exceptions.ConnectionError(response, content)
gcloud.storage.exceptions.ConnectionError: {'status': '409', 'alternate-protocol': '443:quic', 'content-length': '271', 'x-xss-protection': '1; mode=block', 'x-content-type-options': 'nosniff', 'expires': 'Sat, 15 Mar 2014 19:19:47 GMT', 'server': 'GSE', '-content-encoding': 'gzip', 'cache-control': 'private, max-age=0', 'date': 'Sat, 15 Mar 2014 19:19:47 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json; charset=UTF-8'}{
  "error": {
    "errors": [
      {
        "domain": "global",
        "reason": "conflict",
        "message": "Sorry, that name is not available. Please try a different one."
      }
    ],
    "code": 409,
    "message": "Sorry, that name is not available. Please try a different one."
  }
}
Whoops! It might be important to mention that bucket names are like domain names: it’s one big namespace that we all share, so you have to pick a bucket name that isn’t already taken.
It’s up to you to decide what a good name is; let’s assume you’ve found a unique name and are ready to move on with your newly created bucket.
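For example, a more distinctive name (the one below is just a placeholder; pick your own) should go through without the conflict:
>>> bucket = connection.create_bucket('your-initials-quickstart-bucket')
>>> print bucket.name
your-initials-quickstart-bucket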
Storing data¶
OK, so you have a bucket. Now what? Cloud Storage is just an arbitrary data container, so you can put whatever format of data you want in it. The naming of your files is also arbitrary; however, the Cloud Storage online file browser tries to make it feel a bit like a file system by recognizing forward slashes (/), so if you want to group data into “directories”, you can do that.
The fundamental container for a file in Cloud Storage is called an Object; however, gcloud uses the term Key to avoid confusion between object and Object.
If you want to store some data, you just create a Key inside your bucket and put the data inside that key:
>>> key = bucket.new_key('greeting.txt')
>>> key.set_contents_from_string('Hello world!')
new_key creates a Key object locally and set_contents_from_string allows you to put a string into the key.
Now we can test if it worked:
>>> key = bucket.get_key('greeting.txt')
>>> print key.get_contents_as_string()
Hello world!
What if you want to save the contents to a file?
>>> key.get_contents_to_filename('greetings.txt')
Then you can look at the file in a terminal:
$ cat greetings.txt
Hello world!
And what about when you’re not dealing with text? That’s pretty simple too:
>>> key = bucket.new_key('kitten.jpg')
>>> key.set_contents_from_filename('kitten.jpg')
And to test whether it worked?
>>> key = bucket.get_key('kitten.jpg')
>>> key.get_contents_to_filename('kitten2.jpg')
And check whether they are the same in a terminal:
$ diff kitten.jpg kitten2.jpg
Notice that we’re using get_key to retrieve a key we know exists remotely. If the key doesn’t exist, it will return None.
Note
get_key does not download the object’s data; it only retrieves the key’s metadata.
If you want to “get-or-create” the key (that is, overwrite it if it already exists), you can use new_key. However, keep in mind that the key is not created remotely until you store some data inside it.
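As a quick sketch, re-using the greeting.txt key from above: nothing is sent to Cloud Storage when you call new_key; the existing contents are only overwritten once you store new data in the key:
>>> key = bucket.new_key('greeting.txt')  # nothing has been sent to Cloud Storage yet
>>> key.set_contents_from_string('Hello again!')  # uploads the data, overwriting the old contents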
If you want to check whether a key exists, you can use the in operator in Python:
>>> print 'kitten.jpg' in bucket
True
>>> print 'does-not-exist' in bucket
False
Accessing a bucket¶
If you already have a bucket, use get_bucket to retrieve the bucket object:
>>> bucket = connection.get_bucket('my-bucket')
If you want to get all the keys in the bucket, you can use get_all_keys:
>>> keys = bucket.get_all_keys()
However, if you’re looking to iterate through the keys, you can use the bucket itself as an iterator:
>>> for key in bucket:
... print key
Deleting a bucket¶
You can delete a bucket using the delete_bucket method:
>>> connection.delete_bucket('my-bucket')
Remember, the bucket you’re deleting needs to be empty; otherwise you’ll get an error.
If you have a bucket that still contains keys, you can delete the keys and then the bucket like this:
>>> bucket = connection.get_bucket('my-bucket')
>>> for key in bucket:
... key.delete()
>>> bucket.delete()
Listing available buckets¶
The Connection object itself is iterable, so you can loop over it, or call list on it to get a list object:
>>> for bucket in connection:
... print bucket.name
>>> print list(connection)
Managing access control¶
Cloud Storage provides fine-grained access control for both buckets and keys. gcloud tries to simplify access control by working with entities and “grants”. On any ACL, you get a reference to an entity and then either grant or revoke a specific access level. Additionally, we provide two default entities: all users and all authenticated users.
For example, if you want to grant read access to all users on your bucket:
>>> bucket.get_acl().all().grant_read()
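Revoking access, or granting it to the other default entity (all authenticated users), follows the same pattern. This is a sketch under the assumption that the ACL exposes all_authenticated() and revoke_read() as counterparts to the all() and grant_read() calls shown above:
>>> acl = bucket.get_acl()
>>> acl.all_authenticated().grant_read()  # grant read access to any authenticated user
>>> acl.all().revoke_read()  # take back the public read grant from above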
For more detail on access control, see gcloud.storage.acl.