google.resumable_media.requests

requests utilities for Google Media Downloads and Resumable Uploads.

This sub-package assumes callers will use the requests library as the transport and google-auth for sending authenticated HTTP traffic with requests.
Authorized Transport
To use google-auth and requests to create an authorized transport that has read-only access to Google Cloud Storage (GCS):
>>> import google.auth
>>> import google.auth.transport.requests as tr_requests
>>>
>>> ro_scope = u'https://www.googleapis.com/auth/devstorage.read_only'
>>> credentials, _ = google.auth.default(scopes=(ro_scope,))
>>> transport = tr_requests.AuthorizedSession(credentials)
>>> transport
<google.auth.transport.requests.AuthorizedSession object at 0x...>
Simple Downloads
To download an object from Google Cloud Storage, construct the media URL for the GCS object and download it with an authorized transport that has access to the resource:
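The examples below assume bucket and blob_name are already defined; any values naming a readable object will do, e.g. (hypothetical names):

>>> bucket = u'my-bucket'  # assumed: an existing GCS bucket
>>> blob_name = u'file.txt'  # assumed: an existing object in that bucket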
>>> from google.resumable_media.requests import Download
>>>
>>> url_template = (
... u'https://www.googleapis.com/download/storage/v1/b/'
... u'{bucket}/o/{blob_name}?alt=media')
>>> media_url = url_template.format(
... bucket=bucket, blob_name=blob_name)
>>>
>>> download = Download(media_url)
>>> response = download.consume(transport)
>>> download.finished
True
>>> response
<Response [200]>
>>> response.headers[u'Content-Length']
'1364156'
>>> len(response.content)
1364156
To download only a portion of the bytes in the object, specify start and end byte positions (both optional; the end position is inclusive, as the Content-Range header below shows):
>>> download = Download(media_url, start=4096, end=8191)
>>> response = download.consume(transport)
>>> download.finished
True
>>> response
<Response [206]>
>>> response.headers[u'Content-Length']
'4096'
>>> response.headers[u'Content-Range']
'bytes 4096-8191/1364156'
>>> len(response.content)
4096
Chunked Downloads
For very large objects or objects of unknown size, it may make more sense to download the object in chunks rather than all at once. Chunking limits how much progress is lost to a dropped connection on a poor network, and it also allows multiple chunks to be downloaded in parallel to speed up the total download.
A ChunkedDownload uses the same media URL and authorized transport that a basic Download would use, but also requires a chunk size and a writable byte stream. The chunk size determines how much of the resource is consumed with each request, and the stream allows the resource to be written out (e.g. to disk) without having to fit in memory all at once.
>>> import io
>>> from google.resumable_media.requests import ChunkedDownload
>>>
>>> chunk_size = 50 * 1024 * 1024 # 50MB
>>> stream = io.BytesIO()
>>> download = ChunkedDownload(
... media_url, chunk_size, stream)
>>> # Check the state of the download before starting.
>>> download.bytes_downloaded
0
>>> download.total_bytes is None
True
>>> response = download.consume_next_chunk(transport)
>>> # Check the state of the download after consuming one chunk.
>>> download.finished
False
>>> download.bytes_downloaded # chunk_size
52428800
>>> download.total_bytes # 1GB
1073741824
>>> response
<Response [206]>
>>> response.headers[u'Content-Length']
'52428800'
>>> response.headers[u'Content-Range']
'bytes 0-52428799/1073741824'
>>> len(response.content) == chunk_size
True
>>> stream.seek(0)
0
>>> stream.read(29)
b'The beginning of the chunk...'
The download will change its finished status to True once the final chunk is consumed. In some cases, the final chunk may not be the same size as the other chunks:
>>> # The state of the download in progress.
>>> download.finished
False
>>> download.bytes_downloaded # 20 chunks at 50MB
1048576000
>>> download.total_bytes # 1GB
1073741824
>>> response = download.consume_next_chunk(transport)
>>> # The state of the download after consuming the final chunk.
>>> download.finished
True
>>> download.bytes_downloaded == download.total_bytes
True
>>> response
<Response [206]>
>>> response.headers[u'Content-Length']
'25165824'
>>> response.headers[u'Content-Range']
'bytes 1048576000-1073741823/1073741824'
>>> len(response.content) < download.chunk_size
True
In addition, a ChunkedDownload can also take optional start and end byte positions.
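For example, a minimal sketch, reusing media_url and chunk_size from above and assuming start and end are accepted as constructor keyword arguments that bound the download the same way they do for Download:

>>> stream = io.BytesIO()
>>> download = ChunkedDownload(
...     media_url, chunk_size, stream, start=4096, end=8191)
>>> # Only bytes 4096..8191 (inclusive) are consumed, chunk by chunk.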
Simple Uploads
Among the three supported upload classes, the simplest is SimpleUpload. A simple upload should be used when the resource being uploaded is small and when there is no metadata (other than the name) associated with the resource.
>>> from google.resumable_media.requests import SimpleUpload
>>>
>>> url_template = (
... u'https://www.googleapis.com/upload/storage/v1/b/{bucket}/o?'
... u'uploadType=media&'
... u'name={blob_name}')
>>> upload_url = url_template.format(
... bucket=bucket, blob_name=blob_name)
>>>
>>> upload = SimpleUpload(upload_url)
>>> data = b'Some not too large content.'
>>> content_type = u'text/plain'
>>> response = upload.transmit(transport, data, content_type)
>>> upload.finished
True
>>> response
<Response [200]>
>>> json_response = response.json()
>>> json_response[u'bucket'] == bucket
True
>>> json_response[u'name'] == blob_name
True
>>> json_response[u'contentType'] == content_type
True
>>> json_response[u'md5Hash']
'M0XLEsX9/sMdiI+4pB4CAQ=='
>>> int(json_response[u'size']) == len(data)
True
In the rare case that an upload fails, an InvalidResponse will be raised:
>>> from google import resumable_media
>>>
>>> upload = SimpleUpload(upload_url)
>>> error = None
>>> try:
... upload.transmit(transport, data, content_type)
... except resumable_media.InvalidResponse as caught_exc:
... error = caught_exc
...
>>> error
InvalidResponse('Request failed with status code', 503,
'Expected one of', <HTTPStatus.OK: 200>)
>>> error.response
<Response [503]>
>>>
>>> upload.finished
True
Even in the case of failure, we see that the upload is finished, i.e. it cannot be re-used.
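Attempting to transmit again raises an error; a minimal sketch, where the exact exception type and message are assumptions based on current library behavior:

>>> upload.transmit(transport, data, content_type)
Traceback (most recent call last):
  ...
ValueError: An upload can only be used once.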
Multipart Uploads
After the simple upload, the MultipartUpload can be used to achieve essentially the same task. However, a multipart upload allows some metadata about the resource to be sent along as well. (This is the “multi”: we send a first part with the metadata and a second part with the actual bytes in the resource.)
Usage is similar to the simple upload, but transmit() accepts an extra required argument: metadata.
>>> from google.resumable_media.requests import MultipartUpload
>>>
>>> url_template = (
... u'https://www.googleapis.com/upload/storage/v1/b/{bucket}/o?'
... u'uploadType=multipart')
>>> upload_url = url_template.format(bucket=bucket)
>>>
>>> upload = MultipartUpload(upload_url)
>>> metadata = {
... u'name': blob_name,
... u'metadata': {
... u'color': u'grurple',
... },
... }
>>> response = upload.transmit(transport, data, metadata, content_type)
>>> upload.finished
True
>>> response
<Response [200]>
>>> json_response = response.json()
>>> json_response[u'bucket'] == bucket
True
>>> json_response[u'name'] == blob_name
True
>>> json_response[u'metadata'] == metadata[u'metadata']
True
As with the simple upload, in the case of failure an InvalidResponse is raised, enclosing the response that caused the failure, and the upload object cannot be re-used after a failure.
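For example, mirroring the simple-upload failure handling above (the 503 is illustrative; any unexpected status triggers the exception):

>>> upload = MultipartUpload(upload_url)
>>> try:
...     upload.transmit(transport, data, metadata, content_type)
... except resumable_media.InvalidResponse as caught_exc:
...     error = caught_exc
...
>>> error.response
<Response [503]>
>>> upload.finished
True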
Resumable Uploads
A ResumableUpload deviates from the other two upload classes: it transmits a resource over the course of multiple requests. This is intended to be used in cases where:

- the size of the resource is not known (i.e. it is generated on the fly)
- requests must be short-lived
- the client has request size limitations
- the resource is too large to fit into memory
In general, a resource should be sent in a single request to avoid latency and reduce QPS. See GCS best practices for more things to consider when using a resumable upload.
After creating a ResumableUpload instance, a resumable upload session must be initiated to let the server know that a series of chunked upload requests will be coming and to obtain an upload_id for the session. In contrast to the other two upload classes, initiate() takes a byte stream as input rather than raw bytes as data. This can be a file object, a BytesIO object or any other stream implementing the same interface.
>>> from google.resumable_media.requests import ResumableUpload
>>>
>>> url_template = (
... u'https://www.googleapis.com/upload/storage/v1/b/{bucket}/o?'
... u'uploadType=resumable')
>>> upload_url = url_template.format(bucket=bucket)
>>>
>>> chunk_size = 1024 * 1024 # 1MB
>>> upload = ResumableUpload(upload_url, chunk_size)
>>> stream = io.BytesIO(data)
>>> # The upload doesn't know how "big" it is until seeing a stream.
>>> upload.total_bytes is None
True
>>> metadata = {u'name': blob_name}
>>> response = upload.initiate(transport, stream, metadata, content_type)
>>> response
<Response [200]>
>>> upload.resumable_url == response.headers[u'Location']
True
>>> upload.total_bytes == len(data)
True
>>> upload_id = response.headers[u'X-GUploader-UploadID']
>>> upload_id
'ABCdef189XY_super_serious'
>>> upload.resumable_url == upload_url + u'&upload_id=' + upload_id
True
Once a ResumableUpload has been initiated, the resource is transmitted in chunks until completion. (The outputs below assume the stream holds a bit more than two chunks' worth of data: each intermediate chunk receives a 308 response and the final chunk a 200.)
>>> response0 = upload.transmit_next_chunk(transport)
>>> response0
<Response [308]>
>>> upload.finished
False
>>> upload.bytes_uploaded == upload.chunk_size
True
>>>
>>> response1 = upload.transmit_next_chunk(transport)
>>> response1
<Response [308]>
>>> upload.finished
False
>>> upload.bytes_uploaded == 2 * upload.chunk_size
True
>>>
>>> response2 = upload.transmit_next_chunk(transport)
>>> response2
<Response [200]>
>>> upload.finished
True
>>> upload.bytes_uploaded == upload.total_bytes
True
>>> json_response = response2.json()
>>> json_response[u'bucket'] == bucket
True
>>> json_response[u'name'] == blob_name
True
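If a transmit_next_chunk() request fails with InvalidResponse, the upload is marked invalid and will refuse to send further chunks. A minimal recovery sketch, assuming the installed release exposes the invalid property and recover() method (both assumptions about the library version; recover() asks the server how many bytes it has received and rewinds the stream to match):

>>> if upload.invalid:  # e.g. after a failed chunk request
...     response = upload.recover(transport)
...
>>> # The upload can then proceed from where the server left off.
>>> response = upload.transmit_next_chunk(transport)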