google.resumable_media.requests package
Utilities for Google Media Downloads and Resumable Uploads.
This sub-package assumes callers will use the requests library as transport and google-auth for sending authenticated requests with requests.
Authorized Transport
To use google-auth and requests to create an authorized transport that has read-only access to Google Cloud Storage (GCS):
>>> import google.auth
>>> import google.auth.transport.requests as tr_requests
>>>
>>> ro_scope = u'https://www.googleapis.com/auth/devstorage.read_only'
>>> credentials, _ = google.auth.default(scopes=(ro_scope,))
>>> transport = tr_requests.AuthorizedSession(credentials)
>>> transport
<google.auth.transport.requests.AuthorizedSession object at 0x...>
Simple Downloads
To download an object from Google Cloud Storage, construct the media URL for the GCS object and download it with an authorized transport that has access to the resource:
>>> from google.resumable_media.requests import Download
>>>
>>> url_template = (
...     u'https://www.googleapis.com/download/storage/v1/b/'
...     u'{bucket}/o/{blob_name}?alt=media')
>>> media_url = url_template.format(
...     bucket=bucket, blob_name=blob_name)
>>>
>>> download = Download(media_url)
>>> response = download.consume(transport)
>>> download.finished
True
>>> response
<Response [200]>
>>> response.headers[u'Content-Length']
'1364156'
>>> len(response.content)
1364156
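Object names that contain characters such as a slash or a space generally need to be percent-encoded before they are placed in the URL path. The following is a minimal sketch of that step using only the standard library. The helper build_media_url and the bucket and object names are made up for illustration and are not part of this package.

```python
from urllib.parse import quote

# Same template as in the example above.
URL_TEMPLATE = (
    "https://www.googleapis.com/download/storage/v1/b/"
    "{bucket}/o/{blob_name}?alt=media"
)


def build_media_url(bucket, blob_name):
    """Return a media URL with the bucket and object name percent-encoded.

    Hypothetical helper for illustration; safe="" also encodes "/".
    """
    return URL_TEMPLATE.format(
        bucket=quote(bucket, safe=""),
        blob_name=quote(blob_name, safe=""),
    )


# Example values are assumptions, not real resources.
media_url = build_media_url("my-bucket", "logs/2017 report.txt")
```

The resulting media_url can then be passed to Download in the same way as in the example above.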
To download only a portion of the bytes in the object, specify start and end byte positions (both optional):
>>> download = Download(media_url, start=4096, end=8191)
>>> response = download.consume(transport)
>>> download.finished
True
>>> response
<Response [206]>
>>> response.headers[u'Content-Length']
'4096'
>>> response.headers[u'Content-Range']
'bytes 4096-8191/1364156'
>>> len(response.content)
4096
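Both positions are inclusive, which is why bytes 4096 through 8191 yield exactly 4096 bytes. As a rough sketch of the arithmetic, assuming the standard HTTP Range header syntax (the helper names range_header and expected_length are hypothetical and only mirror the behaviour described for start and end; they are not this package's implementation):

```python
def range_header(start=None, end=None):
    """Build an HTTP Range header value for an inclusive byte range."""
    if start is None and end is None:
        return None  # No range: request the whole object.
    if start is None:
        start = 0  # Only end given: download from the beginning.
    if end is None:
        # Only start given: download through the end of the media.
        return "bytes={:d}-".format(start)
    return "bytes={:d}-{:d}".format(start, end)


def expected_length(start, end):
    """Number of bytes in the inclusive range [start, end]."""
    return end - start + 1
```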
Chunked Downloads
For very large objects or objects of unknown size, it may make more sense to download the object in chunks rather than all at once. Smaller requests are less likely to be cut off by a dropped connection on a poor network, and separate chunks can be downloaded in parallel to speed up the total download.
A ChunkedDownload uses the same media URL and authorized transport that a basic Download would use, but also requires a chunk size and a write-able byte stream. The chunk size determines how much of the resource to consume with each request, and the stream allows the resource to be written out (e.g. to disk) without having to fit in memory all at once.
>>> import io
>>>
>>> from google.resumable_media.requests import ChunkedDownload
>>>
>>> chunk_size = 50 * 1024 * 1024 # 50MB
>>> stream = io.BytesIO()
>>> download = ChunkedDownload(
...     media_url, chunk_size, stream)
>>> # Check the state of the download before starting.
>>> download.bytes_downloaded
0
>>> download.total_bytes is None
True
>>> response = download.consume_next_chunk(transport)
>>> # Check the state of the download after consuming one chunk.
>>> download.finished
False
>>> download.bytes_downloaded # chunk_size
52428800
>>> download.total_bytes # 1GB
1073741824
>>> response
<Response [206]>
>>> response.headers[u'Content-Length']
'52428800'
>>> response.headers[u'Content-Range']
'bytes 0-52428799/1073741824'
>>> len(response.content) == chunk_size
True
>>> stream.seek(0)
0
>>> stream.read(29)
b'The beginning of the chunk...'
The download will change its finished status to True once the final chunk is consumed. In some cases, the final chunk may not be the same size as the other chunks:
>>> # The state of the download in progress.
>>> download.finished
False
>>> download.bytes_downloaded # 20 chunks at 50MB
1048576000
>>> download.total_bytes # 1GB
1073741824
>>> response = download.consume_next_chunk(transport)
>>> # The state of the download after consuming the final chunk.
>>> download.finished
True
>>> download.bytes_downloaded == download.total_bytes
True
>>> response
<Response [206]>
>>> response.headers[u'Content-Length']
'25165824'
>>> response.headers[u'Content-Range']
'bytes 1048576000-1073741823/1073741824'
>>> len(response.content) < download.chunk_size
True
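The byte counts in the two examples above follow from inclusive range arithmetic. As a sketch that is independent of this package (chunk_ranges is a hypothetical helper, and the sizes are the ones used in the examples), the ranges requested for a 1GB object in 50MB chunks could be computed as follows:

```python
def chunk_ranges(total_bytes, chunk_size, start=0):
    """Yield inclusive (first_byte, last_byte) pairs covering the resource."""
    first = start
    while first < total_bytes:
        # The last chunk is clipped to the end of the resource.
        last = min(first + chunk_size, total_bytes) - 1
        yield first, last
        first = last + 1


total_bytes = 1024 * 1024 * 1024  # 1GB
chunk_size = 50 * 1024 * 1024  # 50MB
ranges = list(chunk_ranges(total_bytes, chunk_size))

# Size of the final, shorter chunk.
final_first, final_last = ranges[-1]
final_size = final_last - final_first + 1
```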
A ChunkedDownload can also take optional start and end byte positions.
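In practice the chunks are usually consumed in a loop rather than one call at a time. A minimal sketch, assuming only the finished attribute and the consume_next_chunk() method described above (download_all is a hypothetical helper and does no error handling):

```python
def download_all(download, transport):
    """Consume chunks until the download reports that it is finished.

    ``download`` is expected to behave like a ChunkedDownload: it exposes
    ``finished`` and ``consume_next_chunk(transport)``.
    """
    responses = []
    while not download.finished:
        responses.append(download.consume_next_chunk(transport))
    return responses
```

Because each chunk is written to the stream as it arrives, the returned responses are only needed for inspecting status codes or headers.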
Simple Uploads
Among the three supported upload classes, the simplest is SimpleUpload. A simple upload should be used when the resource being uploaded is small and when there is no metadata (other than the name) associated with the resource.
>>> from google.resumable_media.requests import SimpleUpload
>>>
>>> url_template = (
...     u'https://www.googleapis.com/upload/storage/v1/b/{bucket}/o?'
...     u'uploadType=media&'
...     u'name={blob_name}')
>>> upload_url = url_template.format(
...     bucket=bucket, blob_name=blob_name)
>>>
>>> upload = SimpleUpload(upload_url)
>>> data = b'Some not too large content.'
>>> content_type = u'text/plain'
>>> response = upload.transmit(transport, data, content_type)
>>> upload.finished
True
>>> response
<Response [200]>
>>> json_response = response.json()
>>> json_response[u'bucket'] == bucket
True
>>> json_response[u'name'] == blob_name
True
>>> json_response[u'contentType'] == content_type
True
>>> json_response[u'md5Hash']
'M0XLEsX9/sMdiI+4pB4CAQ=='
>>> int(json_response[u'size']) == len(data)
True
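The md5Hash field can be used to check that the stored bytes match what was sent. Assuming the field holds the base64-encoded MD5 digest of the object content, as it does for Google Cloud Storage objects, a local value to compare against could be computed roughly like this (md5_hash_b64 is a hypothetical helper):

```python
import base64
import hashlib


def md5_hash_b64(data):
    """Return the base64-encoded MD5 digest of ``data`` as text."""
    digest = hashlib.md5(data).digest()
    return base64.b64encode(digest).decode("ascii")


data = b"Some not too large content."
local_md5 = md5_hash_b64(data)
# After an upload, the check would be along the lines of:
#     local_md5 == json_response["md5Hash"]
```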
In the rare case that an upload fails, an InvalidResponse will be raised:
>>> from google import resumable_media
>>>
>>> upload = SimpleUpload(upload_url)
>>> error = None
>>> try:
...     upload.transmit(transport, data, content_type)
... except resumable_media.InvalidResponse as caught_exc:
...     error = caught_exc
...
>>> error
InvalidResponse('Request failed with status code', 503,
'Expected one of', <HTTPStatus.OK: 200>)
>>> error.response
<Response [503]>
>>>
>>> upload.finished
True
Even in the case of failure, we see that the upload is finished, i.e. it cannot be re-used.
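Since a failed upload object cannot be re-used, any retry has to start from a fresh instance. One possible shape for this, as a sketch under stated assumptions: transmit_with_retries and its make_upload, retry_on and attempts parameters are hypothetical, and in real use retry_on would likely be resumable_media.InvalidResponse.

```python
def transmit_with_retries(make_upload, transport, data, content_type,
                          retry_on=Exception, attempts=3):
    """Try a single-request upload up to ``attempts`` times.

    ``make_upload`` is a zero-argument callable returning a new upload
    object, e.g. ``lambda: SimpleUpload(upload_url)``. A new object is
    created for every attempt because a failed upload is already finished.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        upload = make_upload()
        try:
            return upload.transmit(transport, data, content_type)
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last failure.
```

The sketch retries immediately; a real caller would probably wait between attempts and retry only on status codes that are worth retrying.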
Multipart Uploads
A MultipartUpload achieves essentially the same task as a simple upload. However, a multipart upload allows some metadata about the resource to be sent along as well. (This is the “multi”: we send a first part with the metadata and a second part with the actual bytes in the resource.)
Usage is similar to the simple upload, but transmit() accepts an extra required argument: metadata.
>>> from google.resumable_media.requests import MultipartUpload
>>>
>>> url_template = (
...     u'https://www.googleapis.com/upload/storage/v1/b/{bucket}/o?'
...     u'uploadType=multipart')
>>> upload_url = url_template.format(bucket=bucket)
>>>
>>> upload = MultipartUpload(upload_url)
>>> metadata = {
...     u'name': blob_name,
...     u'metadata': {
...         u'color': u'grurple',
...     },
... }
>>> response = upload.transmit(transport, data, metadata, content_type)
>>> upload.finished
True
>>> response
<Response [200]>
>>> json_response = response.json()
>>> json_response[u'bucket'] == bucket
True
>>> json_response[u'name'] == blob_name
True
>>> json_response[u'metadata'] == metadata[u'metadata']
True
As with the simple upload, in the case of failure an InvalidResponse is raised, enclosing the response that caused the failure, and the upload object cannot be re-used after a failure.
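To make the two parts concrete, here is a rough sketch of how a two-part multipart/related body can be laid out: a JSON part with the metadata followed by a part with the raw bytes. It illustrates the general wire format only. The helper build_multipart_body, the boundary and the object name are assumptions, and the exact headers and boundary that MultipartUpload generates may differ.

```python
import json


def build_multipart_body(data, metadata, content_type, boundary):
    """Assemble a two-part multipart/related body as bytes."""
    delimiter = b"--" + boundary
    parts = [
        delimiter,
        b"content-type: application/json; charset=UTF-8",
        b"",  # Blank line separates part headers from the part body.
        json.dumps(metadata).encode("utf-8"),
        delimiter,
        b"content-type: " + content_type.encode("utf-8"),
        b"",
        data,
        delimiter + b"--",  # Closing delimiter ends the body.
    ]
    return b"\r\n".join(parts)


# Example values are assumptions chosen for illustration.
body = build_multipart_body(
    b"Some not too large content.",
    {"name": "my-object.txt", "metadata": {"color": "grurple"}},
    "text/plain",
    b"==example-boundary==",
)
```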
Resumable Uploads
A ResumableUpload deviates from the other two upload classes: it transmits a resource over the course of multiple requests. This is intended to be used in cases where:
- the size of the resource is not known (i.e. it is generated on the fly)
- requests must be short-lived
- the client has request size limitations
- the resource is too large to fit into memory
In general, a resource should be sent in a single request to avoid latency and reduce QPS. See GCS best practices for more things to consider when using a resumable upload.
After creating a ResumableUpload instance, a resumable upload session must be initiated to let the server know that a series of chunked upload requests will be coming and to obtain an upload_id for the session. In contrast to the other two upload classes, initiate() takes a byte stream as input rather than raw bytes as data. This can be a file object, a BytesIO object or any other stream implementing the same interface.
>>> from google.resumable_media.requests import ResumableUpload
>>>
>>> url_template = (
...     u'https://www.googleapis.com/upload/storage/v1/b/{bucket}/o?'
...     u'uploadType=resumable')
>>> upload_url = url_template.format(bucket=bucket)
>>>
>>> chunk_size = 1024 * 1024 # 1MB
>>> upload = ResumableUpload(upload_url, chunk_size)
>>> stream = io.BytesIO(data)
>>> # The upload doesn't know how "big" it is until seeing a stream.
>>> upload.total_bytes is None
True
>>> metadata = {u'name': blob_name}
>>> response = upload.initiate(transport, stream, metadata, content_type)
>>> response
<Response [200]>
>>> upload.resumable_url == response.headers[u'Location']
True
>>> upload.total_bytes == len(data)
True
>>> upload_id = response.headers[u'X-GUploader-UploadID']
>>> upload_id
'ABCdef189XY_super_serious'
>>> upload.resumable_url == upload_url + u'&upload_id=' + upload_id
True
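The chunk_size passed to ResumableUpload must be a multiple of UPLOAD_CHUNK_SIZE, otherwise a ValueError is raised. A small sketch for rounding a desired size up to a valid one follows. The value 256 * 1024 used here is an assumption about that constant and should be checked against the installed library, and round_up_chunk_size is a hypothetical helper.

```python
# Assumed value of the library's UPLOAD_CHUNK_SIZE constant (256 KiB);
# verify it against the installed version before relying on it.
UPLOAD_CHUNK_SIZE = 256 * 1024


def round_up_chunk_size(desired, multiple=UPLOAD_CHUNK_SIZE):
    """Round ``desired`` up to the nearest positive multiple of ``multiple``."""
    if desired <= 0:
        raise ValueError("desired chunk size must be positive")
    chunks = -(-desired // multiple)  # Ceiling division.
    return chunks * multiple
```

With this assumption, the 1MB chunk size in the example above is already valid, since it is exactly four times 256 KiB.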
Once a ResumableUpload has been initiated, the resource is transmitted in chunks until completion:
>>> response0 = upload.transmit_next_chunk(transport)
>>> response0
<Response [308]>
>>> upload.finished
False
>>> upload.bytes_uploaded == upload.chunk_size
True
>>>
>>> response1 = upload.transmit_next_chunk(transport)
>>> response1
<Response [308]>
>>> upload.finished
False
>>> upload.bytes_uploaded == 2 * upload.chunk_size
True
>>>
>>> response2 = upload.transmit_next_chunk(transport)
>>> response2
<Response [200]>
>>> upload.finished
True
>>> upload.bytes_uploaded == upload.total_bytes
True
>>> json_response = response2.json()
>>> json_response[u'bucket'] == bucket
True
>>> json_response[u'name'] == blob_name
True
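The three calls above can be folded into a loop, with recover() used when a chunk fails and leaves the upload in an invalid state. This is a minimal sketch under stated assumptions: upload_all and its retry_on and max_recoveries parameters are hypothetical, the upload is assumed to be initiated already, and in real use retry_on would likely be resumable_media.InvalidResponse.

```python
def upload_all(upload, transport, retry_on=Exception, max_recoveries=3):
    """Transmit chunks until the upload is finished.

    ``upload`` is expected to behave like an initiated ResumableUpload:
    it exposes ``finished``, ``invalid``, ``transmit_next_chunk()`` and
    ``recover()``. Returns the response to the final chunk.
    """
    recoveries = 0
    response = None
    while not upload.finished:
        try:
            response = upload.transmit_next_chunk(transport)
        except retry_on:
            if not upload.invalid or recoveries >= max_recoveries:
                raise
            # Re-sync progress with the server before sending more chunks.
            upload.recover(transport)
            recoveries += 1
    return response
```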
class google.resumable_media.requests.ChunkedDownload(media_url, chunk_size, stream, start=0, end=None, headers=None)
Bases: google.resumable_media.requests._helpers.RequestsMixin, google.resumable_media._download.ChunkedDownload
Download a resource in chunks from a Google API.
Parameters:
- media_url (str) – The URL containing the media to be downloaded.
- chunk_size (int) – The number of bytes to be retrieved in each request.
- stream (IO[bytes]) – A write-able stream (i.e. file-like object) that will be used to concatenate chunks of the resource as they are downloaded.
- start (int) – The first byte in a range to be downloaded. If not provided, defaults to 0.
- end (int) – The last byte in a range to be downloaded. If not provided, will download to the end of the media.
- headers (Optional[Mapping[str, str]]) – Extra headers that should be sent with each request, e.g. headers for data encryption keys.
media_url
str – The URL containing the media to be downloaded.
chunk_size
int – The number of bytes to be retrieved in each request.
Raises: ValueError – If start is negative.
bytes_downloaded
int – Number of bytes that have been downloaded.
consume_next_chunk(transport)
Consume the next chunk of the resource to be downloaded.
Parameters: transport (Session) – A requests object which can make authenticated requests.
Returns: The HTTP response returned by transport.
Return type: Response
Raises: ValueError – If the current download has finished.
finished
bool – Flag indicating if the download has completed.
invalid
bool – Indicates if the download is in an invalid state. This will occur if a call to consume_next_chunk() fails.
class google.resumable_media.requests.Download(media_url, start=None, end=None, headers=None)
Bases: google.resumable_media.requests._helpers.RequestsMixin, google.resumable_media._download.Download
Helper to manage downloading a resource from a Google API.
“Slices” of the resource can be retrieved by specifying a range with start and / or end. However, in typical usage, neither start nor end is expected to be provided.
Parameters:
- media_url (str) – The URL containing the media to be downloaded.
- start (int) – The first byte in a range to be downloaded. If not provided, but end is provided, will download from the beginning to end of the media.
- end (int) – The last byte in a range to be downloaded. If not provided, but start is provided, will download from the start to the end of the media.
- headers (Optional[Mapping[str, str]]) – Extra headers that should be sent with the request, e.g. headers for encrypted data.
media_url
str – The URL containing the media to be downloaded.
consume(transport)
Consume the resource to be downloaded.
Parameters: transport (Session) – A requests object which can make authenticated requests.
Returns: The HTTP response returned by transport.
Return type: Response
Raises: ValueError – If the current Download has already finished.
finished
bool – Flag indicating if the download has completed.
class google.resumable_media.requests.MultipartUpload(upload_url, headers=None)
Bases: google.resumable_media.requests._helpers.RequestsMixin, google.resumable_media._upload.MultipartUpload
Upload a resource with metadata to a Google API.
A multipart upload sends both metadata and the resource in a single (multipart) request.
Parameters:
- upload_url (str) – The URL where the content will be uploaded.
- headers (Optional[Mapping[str, str]]) – Extra headers that should be sent with the request, e.g. headers for encrypted data.
upload_url
str – The URL where the content will be uploaded.
finished
bool – Flag indicating if the upload has completed.
transmit(transport, data, metadata, content_type)
Transmit the resource to be uploaded.
Parameters:
- transport (Session) – A requests object which can make authenticated requests.
- data (bytes) – The resource content to be uploaded.
- metadata (Mapping[str, str]) – The resource metadata, such as an ACL list.
- content_type (str) – The content type of the resource, e.g. a JPEG image has content type image/jpeg.
Returns: The HTTP response returned by transport.
Return type: Response
class
google.resumable_media.requests.
ResumableUpload
(upload_url, chunk_size, headers=None)¶ Bases:
google.resumable_media.requests._helpers.RequestsMixin
,google.resumable_media._upload.ResumableUpload
Initiate and fulfill a resumable upload to a Google API.
A resumable upload sends an initial request with the resource metadata and then gets assigned an upload ID / upload URL to send bytes to. Using the upload URL, the upload is then done in chunks (determined by the user) until all bytes have been uploaded.
Parameters: - upload_url (str) – The URL where the resumable upload will be initiated.
- chunk_size (int) – The size of each chunk used to upload the resource.
- headers (
Optional
[Mapping
[str
,str
] ]) – Extra headers that should be sent with theinitiate()
request, e.g. headers for encrypted data. These will not be sent withtransmit_next_chunk()
orrecover()
requests.
-
upload_url
¶ str – The URL where the content will be uploaded.
Raises: ValueError
– Ifchunk_size
is not a multiple ofUPLOAD_CHUNK_SIZE
.-
bytes_uploaded
¶ int – Number of bytes that have been uploaded.
-
chunk_size
¶ int – The size of each chunk used to upload the resource.
-
finished
¶ bool – Flag indicating if the upload has completed.
-
initiate
(transport, stream, metadata, content_type)¶ Initiate a resumable upload.
Parameters: - transport (Session) – A
requests
object which can make authenticated requests. - stream (IO[bytes]) – The stream (i.e. file-like object) that will
be uploaded. The stream must be at the beginning (i.e.
stream.tell() == 0
). - metadata (
Mapping
[str
,str
]) – The resource metadata, such as an ACL list. - content_type (str) – The content type of the resource, e.g. a JPEG
image has content type
image/jpeg
.
Returns: The HTTP response returned by
transport
.Return type: - transport (Session) – A
-
invalid
¶ bool – Indicates if the upload is in an invalid state.
This will occur if a call to
transmit_next_chunk()
fails. To recover from such a failure, callrecover()
.
-
recover
(transport)¶ Recover from a failure.
This method should be used when a
ResumableUpload
is in aninvalid
state due to a request failure.This will verify the progress with the server and make sure the current upload is in a valid state before
transmit_next_chunk()
can be used again.Parameters: transport (Session) – A requests
object which can make authenticated requests.Returns: The HTTP response returned by transport
.Return type: Response
-
transmit_next_chunk
(transport)¶ Transmit the next chunk of the resource to be uploaded.
In the case of failure, an exception is thrown that preserves the failed response:
>>> error = None >>> try: ... upload.transmit_next_chunk(transport) ... except resumable_media.InvalidResponse as caught_exc: ... error = caught_exc ... >>> error InvalidResponse('Request failed with status code', 400, 'Expected one of', <HTTPStatus.OK: 200>, 308) >>> error.response <Response [400]>
Parameters: transport (Session) – A requests
object which can make authenticated requests.Returns: The HTTP response returned by transport
.Return type: Response Raises: InvalidResponse
– If the status code is not 200 or 308.
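class google.resumable_media.requests.SimpleUpload(upload_url, headers=None)
Bases: google.resumable_media.requests._helpers.RequestsMixin, google.resumable_media._upload.SimpleUpload
Upload a resource to a Google API.
A simple media upload sends no metadata and completes the upload in a single request.
Parameters:
- upload_url (str) – The URL where the content will be uploaded.
- headers (Optional[Mapping[str, str]]) – Extra headers that should be sent with the request, e.g. headers for encrypted data.
upload_url
str – The URL where the content will be uploaded.
finished
bool – Flag indicating if the upload has completed.
transmit(transport, data, content_type)
Transmit the resource to be uploaded.
Parameters:
- transport (Session) – A requests object which can make authenticated requests.
- data (bytes) – The resource content to be uploaded.
- content_type (str) – The content type of the resource, e.g. a JPEG image has content type image/jpeg.
Returns: The HTTP response returned by transport.
Return type: Response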