In addition to providing some background on the capabilities of the Documents List Data API, this guide provides examples for interacting with the API using the Python client library. If you're interested in understanding more about the underlying protocol used by the Python client library to interact with the Documents List, please see the protocol guide.
Important: This Developer's Guide describes how to work with an older version of the Documents List Data API protocol. Rest assured we will continue to support this version according to our Terms of Service. However, the latest (and we think greatest) version of the protocol can be found in the left-side navbar -- if it meets your needs, we encourage you to migrate.
This document is intended for developers who want to write client applications using the Google Data Python client library that can interact with Google Documents.
Google Documents uses Google Accounts for authentication, so if you have a Google account you are all set. Otherwise, you can create a new account.
To use the Python client library, you'll need Python 2.2+ and the modules listed on the DependencyModules wiki page. After downloading the client library, see Getting Started with the Google Data Python Library for help installing and using the Python client.
A full working sample is located in the samples/docs subdirectory of the project's SVN repository (/trunk/samples/docs/docs_example.py).
Run the example as follows:
python docs_example.py
The program uses ClientLogin, so it will prompt you for a username and password. These are the same credentials that you use to log in to Google Documents.
The sample allows the user to perform a number of operations that demonstrate how to use the Documents List API. To use the examples in this guide in your own code, you'll need the following import statements:
import gdata.docs
import gdata.docs.service
You will also need to set up a DocsService object, which represents a client connection (with authentication) to the Documents List API.
gd_client = gdata.docs.service.DocsService(source='yourCo-yourAppName-v1')
The source argument is optional and should follow the format company-applicationname-version. It's recommended to include this parameter for logging purposes.
Note: The rest of the guide assumes you created a DocsService in the variable gd_client.
The Python client library can be used to work with either public or private feeds. The Documents List Data API provides access to private feeds only, which require authentication with the documents servers. This can be done via ClientLogin username/password authentication, AuthSub, or OAuth.
Note: The API only offers private feeds at the moment. Your application must perform authentication to issue requests against the Documents List.
Please see the Google Data APIs Authentication Overview for more information on AuthSub, OAuth, and ClientLogin.
To use ClientLogin, invoke the ClientLogin method of DocsService, which is inherited from Service. Specify the email address and password of the user on whose behalf your client is making requests. For example:
gd_client.ClientLogin('example@gmail.com', 'pa$$word')
For more information on ClientLogin, see the Authentication for Installed Applications documentation.
AuthSub Authentication for Web Applications should be used by client applications that need to authenticate their users to Google accounts. The operator does not need access to the username and password for the Google Documents user - only an AuthSub token is required.
When the user first visits your application, they need to authenticate. In this case, you need to print some text and a link directing the user to Google to authenticate your request for access to their documents. The Python Google Data client library provides a function to generate this URL. The code below sets up a link to the AuthSubRequest page.
import gdata.service

def GetAuthSubUrl():
    next = 'http://www.example.com/welcome.pyc'
    scopes = ['http://docs.google.com/feeds/']
    secure = False  # set secure=True to request a secure AuthSub token
    session = True
    return gdata.service.GenerateAuthSubRequestUrl(next, scopes, secure=secure, session=session)

print '<a href="%s">Login to your Google account</a>' % GetAuthSubUrl()
Notice the parameters sent to the GenerateAuthSubRequestUrl method:
secure = False indicates we won't be using secure AuthSub tokens.
session = True indicates the single-use token can be exchanged for a long-lived session token.
Note: To authenticate users to a Google Apps account, include the domain keyword argument set to your domain name: domain='example.com'.
The generated URL looks something like this:
https://www.google.com/accounts/AuthSubRequest?scope=http%3A%2F%2Fdocs.google.com%2Ffeeds%2F&session=1&secure=0&hd=default&next=http%3A%2F%2Fwww.example.com%2Fwelcome.pyc%3Fauthsub_token_scope%3Dhttp%253A%252F%252Fdocs.google.com%252Ffeeds%252F
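To make the structure of that URL concrete, here is a minimal sketch that assembles a similar request URL using only the Python 3 standard library. It is not the client library's implementation: build_authsub_url is a hypothetical helper, its parameter names simply mirror the query string shown above, and it omits the authsub_token_scope value that the library appends to the next URL.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

AUTHSUB_REQUEST_URL = 'https://www.google.com/accounts/AuthSubRequest'


def build_authsub_url(next_url, scopes, secure=False, session=True, hd='default'):
    """Hypothetical helper: assemble an AuthSubRequest-style URL.

    Assumption: multiple scopes are joined with a single space before
    being URL-encoded.
    """
    query = urlencode([
        ('scope', ' '.join(scopes)),
        ('session', int(session)),  # 1 = token may be upgraded to a session token
        ('secure', int(secure)),    # 0 = non-secure AuthSub token
        ('hd', hd),
        ('next', next_url),
    ])
    return AUTHSUB_REQUEST_URL + '?' + query


url = build_authsub_url('http://www.example.com/welcome.pyc',
                        ['http://docs.google.com/feeds/'])
print(url)
```

In real code, prefer GenerateAuthSubRequestUrl; the sketch only shows which pieces of the URL come from which argument.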
The user can then follow the link to Google's site and authenticate to their Google account.
After the user authenticates, they will be redirected back to the next URL. The URL will have a single-use token value appended to it as a query parameter. The URL looks something like this:
http://www.example.com/welcome.pyc?token=yourSingleUseToken
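As an illustration of where the token sits in that redirect URL, the following sketch pulls it out with the Python 3 standard library alone. The function name extract_single_use_token is hypothetical and is not part of the client library, which offers its own helpers for this step.

```python
from urllib.parse import urlsplit, parse_qs


def extract_single_use_token(url):
    """Hypothetical helper: return the value of the 'token' query
    parameter, or None if the URL carries no token."""
    params = parse_qs(urlsplit(url).query)
    values = params.get('token')
    return values[0] if values else None


redirect_url = 'http://www.example.com/welcome.pyc?token=yourSingleUseToken'
print(extract_single_use_token(redirect_url))
```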
For security, this token is single-use only, so now you need to exchange this single-use token for a session token. This process is described in the Using AuthSub with the Google Data API Client Libraries document. The following code snippet shows how to upgrade the token.
import cgi

parameters = cgi.FieldStorage()
authsub_token = parameters['token']
gd_client.auth_token = authsub_token
gd_client.UpgradeToSessionToken()
Alternatively, you could use the extract_auth_sub_token_from_url method of the gdata.auth module to create an AuthSubToken / SecureAuthSubToken object from the single-use token.
import gdata.auth

# TODO: Get page's current URL
single_use_token = gdata.auth.extract_auth_sub_token_from_url(current_url)
gd_client.UpgradeToSessionToken(single_use_token)
For more details on AuthSub, including how to use secure AuthSub, see Using AuthSub with the Google Data API Client Libraries and AuthSub Authentication for Web Applications.
To fetch a feed containing a list of the currently authenticated user's documents, send an authenticated GET request to the following URL:
http://docs.google.com/feeds/documents/private/full
The result is a "meta-feed," a feed that lists all of that user's documents; each entry in the feed represents a document (spreadsheet, presentation, word processor document, pdf, etc.). Again, this feed is only accessible after authenticating to the Documents List API.
Here is an example of printing out the user's entire document list:
def PrintFeed(feed):
    """Prints out the contents of a feed to the console."""
    print '\n'
    if not feed.entry:
        print 'No entries in feed.\n'
    for entry in feed.entry:
        print '%s %s %s' % (entry.title.text.encode('UTF-8'), entry.GetDocumentType(), entry.resourceId.text)

feed = gd_client.GetDocumentListFeed()
PrintFeed(feed)
The resulting DocumentListFeed object feed represents a response from the server. Among other things, this feed contains a list of DocumentListEntry objects (feed.entry), each of which represents a single document. DocumentListEntry encapsulates the information shown in the protocol developer's guide.
You can search the Document List using some of the standard Google Data API query parameters.
Categories are used to restrict the type of document (spreadsheet, folder, etc.) returned. The full-text query string (q parameter) is used to search the content of all the documents. More detailed information on parameters specific to the Documents List can be found in the Documents List Data API Reference Guide.
In the Python client library, a DocumentQuery object can be used to construct queries for the Documents List feed. The following code is used in all of the examples below to print out the feed results to the command line.
def PrintFeed(feed):
    """Prints out the contents of a feed to the console."""
    print '\n'
    if not feed.entry:
        print 'No entries in feed.\n'
    for entry in feed.entry:
        print '%s %s %s' % (entry.title.text.encode('UTF-8'), entry.GetDocumentType(), entry.resourceId.text)
A list of only word processor documents can be retrieved by using the document category as follows:
q = gdata.docs.service.DocumentQuery(categories=['document'])
feed = gd_client.Query(q.ToUri())
PrintFeed(feed)
A list of only spreadsheets can be retrieved by using the spreadsheet category as follows:
q = gdata.docs.service.DocumentQuery(categories=['spreadsheet'])
feed = gd_client.Query(q.ToUri())
PrintFeed(feed)
A list of only the presentations owned by the authenticated user can be retrieved by using the presentation and mine categories as follows:
q = gdata.docs.service.DocumentQuery()
q.categories.append('presentation')
q.categories.append('mine')
feed = gd_client.Query(q.ToUri())
PrintFeed(feed)
A list of folders can be retrieved by using the folder category along with the showfolders=true parameter:
q = gdata.docs.service.DocumentQuery(categories=['folder'], params={'showfolders': 'true'})
feed = gd_client.Query(q.ToUri())
PrintFeed(feed)
Tip: Category queries can work for other document types as well. For a list of possible categories, see the reference guide.
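To show how categories and parameters end up in the request URI, here is a rough standard-library sketch (Python 3). It assumes the "/-/category" path convention that appears in feed URIs elsewhere in this guide; build_query_uri is a hypothetical helper, not the DocumentQuery implementation, and the exact output of ToUri() may differ.

```python
from urllib.parse import urlencode, quote

FEED_PATH = '/feeds/documents/private/full'


def build_query_uri(categories=(), params=None):
    """Hypothetical helper: append categories as /-/cat1/cat2 path
    segments and extra parameters as a query string."""
    uri = FEED_PATH
    if categories:
        uri += '/-/' + '/'.join(quote(c, safe='') for c in categories)
    if params:
        # Sort for a stable, predictable parameter order.
        uri += '?' + urlencode(sorted(params.items()))
    return uri


print(build_query_uri(['presentation', 'mine']))
print(build_query_uri(['folder'], {'showfolders': 'true'}))
```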
It is possible to retrieve documents by matching on their title instead of their entire contents. To do this, add the title parameter to the DocumentQuery object. To match a title exactly, add a title-exact parameter to indicate this is the full, explicit title. Because the match is case-insensitive and multiple documents can share the same title, the query returns a feed rather than a single entry.
q = gdata.docs.service.DocumentQuery()
q['title'] = 'Test'
q['title-exact'] = 'true'
feed = gd_client.Query(q.ToUri())
PrintFeed(feed)
Note: title-exact queries are case-insensitive. For example, the sample above will print documents that match "Test", "test", and "TeSt", but not "Test title".
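The matching rule in the note can be mimicked locally, which is handy for unit-testing code that post-filters a feed. This is only a sketch of the behavior described above, written in Python 3; title_exact_match is a hypothetical name and the server's actual comparison may handle non-ASCII text differently.

```python
def title_exact_match(query_title, doc_title):
    """Hypothetical helper: full-title comparison that ignores case."""
    return query_title.lower() == doc_title.lower()


titles = ['Test', 'test', 'TeSt', 'Test title']
matches = [t for t in titles if title_exact_match('Test', t)]
print(matches)
```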
In most cases, a category query which includes the folder name will find the documents in that folder. However, you can also explicitly request documents in a named folder by using a schema-qualified query. The AddNamedFolder function lets you retrieve all documents in a specified folder belonging to a user with the specified email address:
q = gdata.docs.service.DocumentQuery()
q.AddNamedFolder(email, folder_name)
This style of query is useful when a folder name conflicts with a category that has a different meaning, such as "starred". For example, to query for all the documents in the "starred" folder belonging to user "user@gmail.com", you could use the function as follows:
q = gdata.docs.service.DocumentQuery()
q.AddNamedFolder('user@gmail.com', 'starred')
feed = gd_client.Query(q.ToUri())
PrintFeed(feed)
The important distinction here is that if you had simply appended the category "starred", you would get back a list of all starred documents, not the documents in the folder named "starred".
You can search the contents of documents by using the text_query property of the DocumentQuery object.
q = gdata.docs.service.DocumentQuery()
q.text_query = 'test'
feed = gd_client.Query(q.ToUri())
PrintFeed(feed)
This searches the entire contents of every document for the string "test" and returns all documents where this string is found. This is different from searching just the title of every document, which can be done as described in the section Retrieving a document by an exact title match.
Any document uploaded to the server is first wrapped in a MediaSource object. In the examples, the MediaSource constructor takes two arguments: file_path is the name of the file including the file system path, and content_type is the MIME type (e.g. text/plain) of the document being uploaded. For more information on the MediaSource class, please use the Python built-in documentation system:
import gdata
help(gdata.MediaSource)
For your convenience, there is a static dictionary member of the gdata.docs.service module named SUPPORTED_FILETYPES. It maps upper-case file extensions to their appropriate MIME types. You should also refer to the supported file types section of the FAQ.
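One way to use such a mapping is to derive the content type from the file name instead of hard-coding the key. The sketch below (Python 3, standard library only) uses a small stand-in dictionary, EXAMPLE_FILETYPES, with a few well-known MIME types; it is an assumption for illustration and not the library's actual SUPPORTED_FILETYPES table. guess_content_type is likewise a hypothetical helper.

```python
import os

# Illustrative stand-in for a table keyed by upper-case file extension.
EXAMPLE_FILETYPES = {
    'CSV': 'text/csv',
    'DOC': 'application/msword',
    'PPT': 'application/vnd.ms-powerpoint',
    'RTF': 'application/rtf',
    'TXT': 'text/plain',
    'XLS': 'application/vnd.ms-excel',
}


def guess_content_type(file_path, filetypes=EXAMPLE_FILETYPES):
    """Hypothetical helper: look up a MIME type from the file extension."""
    ext = os.path.splitext(file_path)[1].lstrip('.').upper()
    if ext not in filetypes:
        raise ValueError('Unsupported file extension: %r' % ext)
    return filetypes[ext]


print(guess_content_type('/path/to/your/test.doc'))
```

With the real client library, the same lookup could be pointed at gdata.docs.service.SUPPORTED_FILETYPES by passing it as the filetypes argument.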
This example creates a new spreadsheet in the Documents List feed by creating a DocumentListEntry object containing metadata for the document.
new_entry = gdata.GDataEntry()
new_entry.title = gdata.atom.Title(text='MyBlankSpreadsheetTitle')
category = gd_client._MakeKindCategory(gdata.docs.service.SPREADSHEET_LABEL)
new_entry.category.append(category)
created_entry = gd_client.Post(new_entry, '/feeds/documents/private/full')
print 'Spreadsheet now accessible online at:', created_entry.GetAlternateLink().href
This example uploads a document, assuming file_path is the path to a word processor document of MIME type content_type. The entry variable is a DocumentListEntry object containing information about the document that was uploaded, including a direct link to the document.
ms = gdata.MediaSource(file_path='/path/to/your/test.doc', content_type=gdata.docs.service.SUPPORTED_FILETYPES['DOC'])
entry = gd_client.Upload(ms, 'MyDocTitle')
print 'Document now accessible online at:', entry.GetAlternateLink().href
Similarly, you can upload different file types:
ms = gdata.MediaSource(file_path='/path/to/your/test.rtf', content_type=gdata.docs.service.SUPPORTED_FILETYPES['RTF'])
entry = gd_client.Upload(ms, 'MyDocTitle')
print 'Document now accessible online at:', entry.GetAlternateLink().href
This example uploads a presentation, assuming file_path is the path to a presentation of MIME type content_type. The entry variable is a DocumentListEntry object containing information about the presentation that was uploaded, including a direct link to the presentation.
ms = gdata.MediaSource(file_path='/path/to/your/test.ppt', content_type=gdata.docs.service.SUPPORTED_FILETYPES['PPT'])
entry = gd_client.Upload(ms, 'MyPresoTitle')
print 'Presentation now accessible online at:', entry.GetAlternateLink().href
This example uploads a spreadsheet, assuming file_path is the path to a spreadsheet of MIME type content_type. The entry variable is a DocumentListEntry object containing information about the spreadsheet that was uploaded, including a direct link to the spreadsheet.
ms = gdata.MediaSource(file_path='/path/to/your/test.xls', content_type=gdata.docs.service.SUPPORTED_FILETYPES['XLS'])
entry = gd_client.Upload(ms, 'MySpreadsheetTitle')
print 'Spreadsheet now accessible online at:', entry.GetAlternateLink().href
The previous methods also take an optional third argument that accepts either a DocumentListEntry object representing a folder, or the folder's self link (entry.GetSelfLink().href). This example uploads a spreadsheet to an existing folder. It assumes folder is a DocumentListEntry fetched from the document list.
ms = gdata.MediaSource(file_path='/path/to/your/test.csv', content_type=gdata.docs.service.SUPPORTED_FILETYPES['CSV'])
entry = gd_client.Upload(ms, 'MySpreadsheetTitle', folder_or_uri=folder)
print 'Spreadsheet now accessible online at:', entry.GetAlternateLink().href
To export documents from the Documents List feed, you need either the Atom entry of the document, spreadsheet, or presentation, or its resource id (e.g. document:12345).
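Resource ids of the form shown above combine a document type and an identifier separated by a colon. If your application stores these ids, a small parser like the following Python 3 sketch can be useful for dispatching on the type. split_resource_id is a hypothetical helper, not a client library function.

```python
def split_resource_id(resource_id):
    """Hypothetical helper: split 'type:id' into a (type, id) tuple."""
    doc_type, sep, doc_id = resource_id.partition(':')
    if not sep or not doc_type or not doc_id:
        raise ValueError('Expected "<type>:<id>", got %r' % resource_id)
    return doc_type, doc_id


print(split_resource_id('document:12345'))
```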
The following example exports a DocumentListEntry object as a .doc file:
file_path = '/path/to/save/your_document.doc'
print 'Downloading document to %s...' % (file_path,)
gd_client.Export(entry, file_path)
Alternatively, you can pass in the resource id, as the Download helper methods also accept that.
The following example exports a DocumentListEntry object as a .html file:
file_path = '/path/to/save/your_document.html'
print 'Downloading document to %s...' % (file_path,)
gd_client.Export(entry.resourceId.text, file_path)
The following example exports a DocumentListEntry object as a .swf file:
file_path = '/path/to/save/your_presentation.swf'
print 'Downloading presentation to %s...' % (file_path,)
gd_client.Export(entry, file_path)
The following example exports a DocumentListEntry object as a .xls file:
file_path = '/path/to/save/your_spreadsheets.xls'
print 'Downloading spreadsheet to %s...' % (file_path,)
gd_client.Export(entry, file_path)
When exporting to .csv or .tsv, you can specify which grid/sheet to download by using the gid argument:
file_path = '/path/to/save/your_spreadsheets.csv'
print 'Downloading spreadsheet to %s...' % (file_path,)
gd_client.Export(entry, file_path, gid=1)  # export the second sheet
Important: In order to download spreadsheets, your client needs a valid token for the Spreadsheets API service. See the downloading spreadsheets section of the protocol guide for more details.
If you're using AuthSub, the solution is to request a multi-scoped token, good for both the Documents List API and the Spreadsheets API. Pass in ['http://docs.google.com/feeds/', 'http://spreadsheets.google.com/feeds/'] as the scopes argument to GenerateAuthSubRequestUrl.
For ClientLogin, first create a SpreadsheetsService object (to obtain a spreadsheets token), and then swap that token into your DocsService object. This example demonstrates that process:
import gdata.spreadsheet.service

spreadsheets_client = gdata.spreadsheet.service.SpreadsheetsService()
spreadsheets_client.ClientLogin('user@gmail.com', 'pa$$word')

# substitute the spreadsheets token into our gd_client
docs_auth_token = gd_client.GetClientLoginToken()
gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())
gd_client.Export(entry, file_path)
gd_client.SetClientLoginToken(docs_auth_token)  # reset the DocList auth token
To trash a document, folder, presentation, or spreadsheet, use the Delete method of the service object on the edit link of the Atom entry representing the document. For example, to trash one of the new documents from the upload examples above, you would execute the following.
gd_client.Delete(entry.GetEditLink().href)
Here is an example of updating a presentation's metadata, but leaving its content unchanged. The presentation's name will be changed to 'better presentation title'. Since the request does not contain new document content, the entry's edit link is used.
existing_entry.title.text = 'better presentation title'
updated_entry = gd_client.Put(existing_entry, existing_entry.GetEditLink().href)
To update a document's content, use the Atom entry's edit-media link. The following example replaces the document's content with replacementContent.doc's content, and updates the document's title to 'updated document' in the same request.
ms = gdata.MediaSource(file_path='/path/to/your/replacementContent.doc', content_type=gdata.docs.service.SUPPORTED_FILETYPES['DOC'])
entry.title.text = 'updated document'
updated_entry = gd_client.Put(ms, entry.GetEditMediaLink().href)
For word processor documents, you can append data to the document's content by using the append=true parameter. The process is exactly the same as replacing a document's content (above).
Important: Content-Type: text/plain is the only accepted content type when appending data. For example, you cannot append the contents of a word processor document with Content-Type: application/msword to an existing document.
Here is an example of appending the text 'Appending this data!' to an existing document. Again, the edit-media link is used because we're modifying the document's content.
import StringIO

data = 'Appending this data!'
ms = gdata.MediaSource(file_handle=StringIO.StringIO(data), content_type='text/plain', content_length=len(data))
updated_entry = gd_client.Put(ms, entry.GetEditMediaLink().href + '?append=true')
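Concatenating '?append=true' works as long as the edit-media link carries no query string of its own. If you would rather not rely on that, a more defensive option is to add the parameter with URL-aware functions, as in this Python 3 standard-library sketch. with_query_param is a hypothetical helper and the sample URL is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def with_query_param(url, name, value):
    """Hypothetical helper: add name=value to a URL, keeping any
    query parameters that are already present."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Made-up edit-media style link, used only to exercise the helper.
sample_link = 'http://docs.google.com/feeds/media/private/full/document%3A12345'
print(with_query_param(sample_link, 'append', 'true'))
```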
To create a new folder, use the CreateFolder helper, passing it the folder name:
folder_entry = gd_client.CreateFolder('YourFolderTitle')
To create a folder inside another folder, pass in the parent folder's Atom entry as a second argument to CreateFolder. This example creates two folders. The second folder (Folder2) is created inside of Folder1:
parent_folder_entry = gd_client.CreateFolder('Folder1')
child_folder_entry = gd_client.CreateFolder('Folder2', parent_folder_entry)
Deleting a folder is the same as deleting a document. Use the service's Delete method and pass it the entry's edit link:
gd_client.Delete(folder_entry.GetEditLink().href)
Moving a document into a folder requires that you have a DocumentListEntry object for the document, and another for the folder into which the document should be moved. The DocsService class provides methods for moving documents, presentations, spreadsheets, and folders in and out of folders.
moved_doc_entry = gd_client.MoveIntoFolder(doc_entry, dest_folder_entry)
The first argument to MoveIntoFolder() is a DocumentListEntry object representing the source folder or document to relocate. The second argument is a DocumentListEntry object representing the destination folder entry.
To move a document or folder out of a folder, pass its entry to MoveOutOfFolder():
gd_client.MoveOutOfFolder(src_entry)
The GetDocumentListAclFeed method can be used to retrieve the ACL permissions for a document. To retrieve the permissions for a document, you need the <gd:feedLink> from the Atom entry. The gdata.docs.DocumentListEntry.GetAclLink() method returns that link.
The following example fetches the first document the authenticated user owns, queries its ACL feed, and prints out the permission entries:
uri = ('http://docs.google.com/feeds/documents/private/full/'
       '-/mine?max-results=1')
feed = gd_client.GetDocumentListFeed(uri)
acl_feed = gd_client.GetDocumentListAclFeed(feed.entry[0].GetAclLink().href)
for acl_entry in acl_feed.entry:
    print '%s - %s (%s)' % (acl_entry.role.value, acl_entry.scope.value, acl_entry.scope.type)
You can also instantiate a DocumentAclQuery if you know the document's resource_id:
resource_id = 'spreadsheet:12345'
query = gdata.docs.service.DocumentAclQuery(resource_id)
acl_feed = gd_client.GetDocumentListAclFeed(query.ToUri())
...
To add a new permission to a document, your client needs to create a new DocumentListAclEntry and POST it to the server.
Here's an example that adds 'user@example.com' as a reader to the document represented by doc_entry:
scope = gdata.docs.Scope(value='user@example.com', type='user')
role = gdata.docs.Role(value='reader')
acl_entry = gdata.docs.DocumentListAclEntry(scope=scope, role=role)
created_acl_entry = gd_client.Post(acl_entry, doc_entry.GetAclLink().href, converter=gdata.docs.DocumentListAclEntryFromString)
Possible values for the Role are reader, writer, and owner.
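Because the set of roles is small and fixed, it can be worth validating a role locally before sending a request. The following Python 3 sketch shows one way to do that; VALID_ROLES and check_role are hypothetical names introduced here, not part of the client library.

```python
# Role values listed in this guide.
VALID_ROLES = ('reader', 'writer', 'owner')


def check_role(role):
    """Hypothetical helper: return the role unchanged if it is one of
    the documented values, otherwise raise ValueError."""
    if role not in VALID_ROLES:
        raise ValueError('Unknown role %r; expected one of %s'
                         % (role, ', '.join(VALID_ROLES)))
    return role


print(check_role('writer'))
```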
We can update the ACL entry that we just added by sending a PUT request (with the updated content) to the edit link of the ACL entry in question.
This example modifies our previous created_acl_entry by updating 'user@example.com' to be a writer (collaborator):
created_acl_entry.role.value = 'writer'
updated_acl_entry = gd_client.Put(created_acl_entry, created_acl_entry.GetEditLink().href, converter=gdata.docs.DocumentListAclEntryFromString)
Deleting a permission involves sending a DELETE to the ACL entry's edit link.
gd_client.Delete(updated_acl_entry.GetEditLink().href)