Google Documents List Data API v1.0

Python Language Guide (v1.0)

In addition to providing some background on the capabilities of the Documents List Data API, this guide provides examples for interacting with the API using the Python client library. If you're interested in understanding more about the underlying protocol used by the Python client library to interact with the Documents List, please see the protocol guide.

Important: This Developer's Guide describes how to work with an older version of the Documents List Data API protocol. Rest assured we will continue to support this version according to our Terms of Service. However, the latest (and we think greatest) version of the protocol can be found in the left-side navbar. If it meets your needs, we encourage you to migrate.

Audience

This document is intended for developers who want to write client applications using the Google Data Python client library that can interact with Google Documents.

Getting started

Google Documents uses Google Accounts for authentication, so if you have a Google account you are all set. Otherwise, you can create a new account.

To use the Python client library, you'll need Python 2.2+ and the modules listed on the DependencyModules wiki page. After downloading the client library, see Getting Started with the Google Data Python Library for help installing and using the Python client.

Running the sample

A full working sample is located in the samples/docs subdirectory of the project's SVN repository (/trunk/samples/docs/docs_example.py).

Run the example as follows:

python docs_example.py

The program uses ClientLogin, so it will prompt you for a username and password. These are the same credentials that you use to log in to Google Documents.

The sample lets the user perform a number of operations that demonstrate how to use the Documents List API. To use the examples from this guide in your own code, you'll need the following import statements:

import gdata.docs
import gdata.docs.service

You will also need to set up a DocsService object, which represents a client connection (with authentication) to the Documents List API.

gd_client = gdata.docs.service.DocsService(source='yourCo-yourAppName-v1')

The source argument is optional and should follow the format: company-applicationname-version. It's recommended to include this parameter for logging purposes.

Note: The rest of the guide assumes you created a DocsService in the variable gd_client.

Authenticating to the Documents List API

The Python client library can be used to work with either public or private feeds. The Documents List Data API provides only private feeds, which require authentication with the documents servers. Authentication can be done via ClientLogin username/password authentication, AuthSub, or OAuth.

Note: The API only offers private feeds at the moment. Your application must perform authentication to issue requests against the Documents List.

Please see the Google Data APIs Authentication Overview for more information on AuthSub, OAuth, and ClientLogin.

ClientLogin for "installed" applications

To use ClientLogin, invoke the ClientLogin method of DocsService, which is inherited from Service. Specify the email address and password of the user on whose behalf your client is making requests. For example:

gd_client.ClientLogin('example@gmail.com', 'pa$$word')

For more information on ClientLogin, see the Authentication for Installed Applications documentation.

AuthSub for web applications

AuthSub Authentication for Web Applications should be used by client applications that need to authenticate their users to Google Accounts. The operator does not need access to the username and password of the Google Documents user; only an AuthSub token is required.

Request a single-use token

When the user first visits your application, they need to authenticate. In this case, you need to print some text and a link directing the user to Google to authenticate your request for access to their documents. The Python Google Data client library provides a function to generate this URL. The code below sets up a link to the AuthSubRequest page.

import gdata.service

def GetAuthSubUrl():
  next = 'http://www.example.com/welcome.pyc'
  scopes = ['http://docs.google.com/feeds/']
  secure = False  # set secure=True to request a secure AuthSub token
  session = True
  return gdata.service.GenerateAuthSubRequestUrl(next, scopes, secure=secure, session=session)

print '<a href="%s">Login to your Google account</a>' % GetAuthSubUrl()

Notice the parameters sent to the GenerateAuthSubRequestUrl function:

next, the URL of the page that Google should redirect the user to after authentication.
scopes, indicates that the application is requesting access to the Documents feed.
secure, False indicates that we won't be using secure AuthSub tokens.
session, True indicates that the single-use token can be exchanged for a long-lived session token.

Note: To authenticate users to a Google Apps account, include the domain keyword argument set to your domain name: domain='example.com'.

The generated URL looks something like this:

https://www.google.com/accounts/AuthSubRequest?scope=http%3A%2F%2Fdocs.google.com%2Ffeeds%2F&session=1&secure=0&hd=default&next=http%3A%2F%2Fwww.example.com%2Fwelcome.pyc%3Fauthsub_token_scope%3Dhttp%253A%252F%252Fdocs.google.com%252Ffeeds%252F
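
To make the structure of that URL easier to read, here is a minimal sketch that composes a link of the same shape using only the Python 3 standard library. It is not the client library's implementation: build_authsub_request_url is a hypothetical helper, and the hd and authsub_token_scope handling is inferred from the example URL rather than from the library source.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL in this guide.
AUTHSUB_REQUEST_URL = 'https://www.google.com/accounts/AuthSubRequest'


def build_authsub_request_url(next_url, scopes, secure=False, session=True,
                              hd='default'):
    """Hypothetical helper showing how the query string is composed."""
    scope_value = ' '.join(scopes)
    # The scopes are also appended to the next URL (authsub_token_scope),
    # which is why they appear double-encoded in the final link.
    separator = '&' if '?' in next_url else '?'
    next_with_scope = (next_url + separator +
                       urlencode({'authsub_token_scope': scope_value}))
    params = [
        ('scope', scope_value),
        ('session', int(session)),
        ('secure', int(secure)),
        ('hd', hd),
        ('next', next_with_scope),
    ]
    return AUTHSUB_REQUEST_URL + '?' + urlencode(params)


url = build_authsub_request_url('http://www.example.com/welcome.pyc',
                                ['http://docs.google.com/feeds/'])
print(url)
```

In real code, prefer gdata.service.GenerateAuthSubRequestUrl; the sketch only shows which part of the link comes from which argument.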

The user can then follow the link to Google's site and authenticate to their Google account.

After the user authenticates, they will be redirected back to the next URL. The URL will have a single-use token value appended to it as a query parameter. The URL looks something like this:

http://www.example.com/welcome.pyc?token=yourSingleUseToken
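
If your web framework hands you the full redirect URL rather than parsed form fields, the token can be read from the query string. The following is a minimal sketch using the Python 3 standard library; extract_single_use_token is a hypothetical helper and not part of the client library.

```python
from urllib.parse import urlparse, parse_qs


def extract_single_use_token(redirect_url):
    """Return the value of the token query parameter, or None if absent."""
    query = parse_qs(urlparse(redirect_url).query)
    values = query.get('token')
    return values[0] if values else None


token = extract_single_use_token(
    'http://www.example.com/welcome.pyc?token=yourSingleUseToken')
print(token)  # yourSingleUseToken
```

Returning None when the parameter is missing lets the caller tell a first visit (no token yet) from a redirect back from Google.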

Upgrading to a session token

For security, this token is single-use only, so now you need to exchange this single-use token for a session token. This process is described in the Using AuthSub with the Google Data API Client Libraries document. The following code snippet shows how to upgrade the token.

import cgi

parameters = cgi.FieldStorage()
# FieldStorage items are field objects; .value holds the token string.
authsub_token = parameters['token'].value

gd_client.auth_token = authsub_token
gd_client.UpgradeToSessionToken()

Alternatively, you could use the extract_auth_sub_token_from_url method of the gdata.auth module to create an AuthSubToken / SecureAuthSubToken object from the single-use token.

import gdata.auth

# TODO: Get page's current URL
single_use_token = gdata.auth.extract_auth_sub_token_from_url(current_url)
gd_client.UpgradeToSessionToken(single_use_token)

For more details on AuthSub, including how to use secure AuthSub, see Using AuthSub with the Google Data API Client Libraries and AuthSub Authentication for Web Applications.

Retrieving a list of documents

To fetch a feed containing a list of the currently authenticated user's documents, send an authenticated GET request to the following URL:

http://docs.google.com/feeds/documents/private/full

The result is a "meta-feed," a feed that lists all of that user's documents; each entry in the feed represents a document (spreadsheet, presentation, word processor document, PDF, etc.). Again, this feed is only accessible after authenticating to the Documents List API.

Here is an example of printing out the user's entire document list:

def PrintFeed(feed):
  """Prints out the contents of a feed to the console."""
  print '\n'
  if not feed.entry:
    print 'No entries in feed.\n'
  for entry in feed.entry:
    print '%s %s %s' % (entry.title.text.encode('UTF-8'), entry.GetDocumentType(), entry.resourceId.text)

feed = gd_client.GetDocumentListFeed()
PrintFeed(feed)

The resulting DocumentListFeed object feed represents a response from the server. Among other things, this feed contains a list of DocumentListEntry objects (feed.entry), each of which represents a single document. DocumentListEntry encapsulates the information shown in the protocol developer's guide.

Searching the documents feed

You can search the Documents List using some of the standard Google Data API query parameters. Categories are used to restrict the type of document (spreadsheet, folder, etc.) returned. The full-text query string (q parameter) is used to search the content of all the documents. More detailed information on parameters specific to the Documents List can be found in the Documents List Data API Reference Guide.

In the Python client library, a DocumentQuery object can be used to construct queries for the Documents List feed. The following code is used in all of the examples below to print out the feed results to the command line.

def PrintFeed(feed):
  """Prints out the contents of a feed to the console."""
  print '\n'
  if not feed.entry:
    print 'No entries in feed.\n'
  for entry in feed.entry:
    print '%s %s %s' % (entry.title.text.encode('UTF-8'), entry.GetDocumentType(), entry.resourceId.text)

Retrieving all word processor documents

A list of only word processor documents can be retrieved by using the document category as follows:

q = gdata.docs.service.DocumentQuery(categories=['document'])
feed = gd_client.Query(q.ToUri())
PrintFeed(feed)

Retrieving all spreadsheets

A list of only spreadsheets can be retrieved by using the spreadsheet category as follows:

q = gdata.docs.service.DocumentQuery(categories=['spreadsheet'])
feed = gd_client.Query(q.ToUri())
PrintFeed(feed)

Retrieving all presentations that I own

A list of only the presentations that you own can be retrieved by using the presentation and mine categories as follows:

q = gdata.docs.service.DocumentQuery()
q.categories.append('presentation')
q.categories.append('mine')
feed = gd_client.Query(q.ToUri())
PrintFeed(feed)

Retrieving a list of folders

A list of folders can be retrieved by using the folder category along with the showfolders=true parameter:

q = gdata.docs.service.DocumentQuery(categories=['folder'], params={'showfolders': 'true'})
feed = gd_client.Query(q.ToUri())
PrintFeed(feed)

Tip: Category queries can work for other document types as well. For a list of possible categories, see the reference guide.
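
To show what such queries look like on the wire, here is a minimal sketch that assembles a feed URI from categories and parameters with the Python 3 standard library. build_query_uri is a hypothetical helper; it assumes the standard Google Data convention of listing categories after "/-/", as in the "-/mine?max-results=1" URI used later in this guide, and the exact string produced by DocumentQuery.ToUri() may differ in detail.

```python
from urllib.parse import urlencode, quote

# Feed URL from the "Retrieving a list of documents" section.
FEED_URL = 'http://docs.google.com/feeds/documents/private/full'


def build_query_uri(categories=(), params=None):
    """Hypothetical helper: append categories and query parameters."""
    uri = FEED_URL
    if categories:
        # Categories follow "/-/" and are separated by "/";
        # an entry must match all of them.
        uri += '/-/' + '/'.join(quote(c, safe='') for c in categories)
    if params:
        # Sort for a stable, predictable parameter order.
        uri += '?' + urlencode(sorted(params.items()))
    return uri


print(build_query_uri(['presentation', 'mine']))
print(build_query_uri(['folder'], {'showfolders': 'true'}))
```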

Retrieving a document by an exact title match

It is possible to retrieve documents by matching on their title instead of their entire contents. To do this, add the title parameter to the DocumentQuery object. To match a title exactly, also add the title-exact parameter to indicate that this is the full, explicit title. Because the match is case-insensitive and multiple documents can share a title, a feed is returned rather than a single entry.

q = gdata.docs.service.DocumentQuery()
q['title'] = 'Test'
q['title-exact'] = 'true'
feed = gd_client.Query(q.ToUri())
PrintFeed(feed)

Note: title-exact queries are case-insensitive. For example, the sample above will print documents that match "Test", "test", and "TeSt", but not "Test title".
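
The matching rule in the note can be illustrated with a small client-side sketch (Python 3, standard library only). It mirrors the behavior described above on plain strings; matches_title_exact is a hypothetical helper, and the server's actual comparison rules (for example, for non-ASCII titles) are an assumption here.

```python
def matches_title_exact(query_title, document_title):
    """Case-insensitive comparison of the whole title."""
    return query_title.lower() == document_title.lower()


titles = ['Test', 'test', 'TeSt', 'Test title']
print([t for t in titles if matches_title_exact('Test', t)])
# ['Test', 'test', 'TeSt']

# If your application needs a case-sensitive match, filter the
# returned titles again on the client side.
print([t for t in titles if t == 'Test'])
# ['Test']
```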

Retrieving all documents in a named folder

In most cases, a category query which includes the folder name will find the documents in that folder. However, you can also explicitly request documents in a named folder by using a schema qualified query. The AddNamedFolder function lets you retrieve all documents in a specified folder belonging to a user with the specified email address:

q = gdata.docs.service.DocumentQuery()
q.AddNamedFolder(email, folder_name)

This style of query is useful when a folder name conflicts with a category that has a different meaning, such as "starred". For example, to query for all the documents in the "starred" folder belonging to user "user@gmail.com", you could use the function as follows:

q = gdata.docs.service.DocumentQuery()
q.AddNamedFolder('user@gmail.com', 'starred')
feed = gd_client.Query(q.ToUri())
PrintFeed(feed)

The important distinction is that if you had simply appended the "starred" category, you would get back a list of all starred documents, not the documents in the folder named "starred".

Performing a text query

You can search the contents of documents by using the text_query property of the DocumentQuery object.

q = gdata.docs.service.DocumentQuery()
q.text_query = 'test'
feed = gd_client.Query(q.ToUri())
PrintFeed(feed)

This searches the entire contents of every document for the string "test" and returns all documents in which the string is found. This is different from searching only document titles, which is described in the section Retrieving a document by an exact title match.

Uploading documents

Any document uploaded to the server is first wrapped in a MediaSource object. In the examples below, the MediaSource constructor takes two arguments: file_path is the name of the file including the file system path, and content_type is the MIME type (e.g. text/plain) of the document being uploaded. For more information on the MediaSource class, use Python's built-in documentation system:

import gdata
help(gdata.MediaSource)

For your convenience, there is a static dictionary member of the gdata.docs.service module named SUPPORTED_FILETYPES. It maps upper-case file extensions to their appropriate MIME types. You should also refer to the supported file types section of the FAQ.
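
As an illustration of how an upper-case-extension lookup like this can be used, the sketch below derives a content type from a file path. It runs on the Python 3 standard library alone: EXAMPLE_FILETYPES is an illustrative stand-in containing a few common MIME types, not the contents of SUPPORTED_FILETYPES, and guess_content_type is a hypothetical helper.

```python
import os

# Illustrative subset only. In real code, pass
# gdata.docs.service.SUPPORTED_FILETYPES instead.
EXAMPLE_FILETYPES = {
    'CSV': 'text/csv',
    'DOC': 'application/msword',
    'PPT': 'application/vnd.ms-powerpoint',
    'RTF': 'application/rtf',
    'TXT': 'text/plain',
    'XLS': 'application/vnd.ms-excel',
}


def guess_content_type(file_path, filetypes=EXAMPLE_FILETYPES):
    """Map a path's extension (upper-cased, no dot) to a MIME type."""
    ext = os.path.splitext(file_path)[1].lstrip('.').upper()
    try:
        return filetypes[ext]
    except KeyError:
        raise ValueError('Unsupported file extension: %r' % ext)


print(guess_content_type('/path/to/your/test.doc'))  # application/msword
```

Raising an error for unknown extensions, instead of falling back to a generic type, surfaces unsupported files before an upload is attempted.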

Creating an empty document

This example creates a new spreadsheet in the Documents List feed by creating a DocumentListEntry object containing metadata for the document.

new_entry = gdata.GDataEntry()
new_entry.title = gdata.atom.Title(text='MyBlankSpreadsheetTitle')
category = gd_client._MakeKindCategory(gdata.docs.service.SPREADSHEET_LABEL)
new_entry.category.append(category)

created_entry = gd_client.Post(new_entry, '/feeds/documents/private/full')
print 'Spreadsheet now accessible online at:', created_entry.GetAlternateLink().href

Uploading a word processor document

This example uploads a document, assuming file_path is the path to a word processor document of MIME type content_type. The entry variable is a DocumentListEntry object containing information about the document that was uploaded, including a direct link to the document.

ms = gdata.MediaSource(file_path='/path/to/your/test.doc', content_type=gdata.docs.service.SUPPORTED_FILETYPES['DOC'])
entry = gd_client.Upload(ms, 'MyDocTitle')
print 'Document now accessible online at:', entry.GetAlternateLink().href

Similarly, you can upload different file types:

ms = gdata.MediaSource(file_path='/path/to/your/test.rtf', content_type=gdata.docs.service.SUPPORTED_FILETYPES['RTF'])
entry = gd_client.Upload(ms, 'MyDocTitle')
print 'Document now accessible online at:', entry.GetAlternateLink().href

Uploading a presentation

This example uploads a presentation, assuming file_path is the path to a presentation of MIME type content_type. The entry variable is a DocumentListEntry object containing information about the presentation that was uploaded, including a direct link to the presentation.

ms = gdata.MediaSource(file_path='/path/to/your/test.ppt', content_type=gdata.docs.service.SUPPORTED_FILETYPES['PPT'])
entry = gd_client.Upload(ms, 'MyPresoTitle')
print 'Presentation now accessible online at:', entry.GetAlternateLink().href

Uploading a spreadsheet

This example uploads a spreadsheet, assuming file_path is the path to a spreadsheet of MIME type content_type. The entry variable is a DocumentListEntry object containing information about the spreadsheet that was uploaded, including a direct link to the spreadsheet.

ms = gdata.MediaSource(file_path='/path/to/your/test.xls', content_type=gdata.docs.service.SUPPORTED_FILETYPES['XLS'])
entry = gd_client.Upload(ms, 'MySpreadsheetTitle')
print 'Spreadsheet now accessible online at:', entry.GetAlternateLink().href

Uploading a document to a folder

The previous methods also take an optional third argument that accepts either a DocumentListEntry object representing a folder, or the folder's self link (entry.GetSelfLink().href). This example uploads a spreadsheet to an existing folder. It assumes folder is a DocumentListEntry fetched from the document list.

ms = gdata.MediaSource(file_path='/path/to/your/test.csv', content_type=gdata.docs.service.SUPPORTED_FILETYPES['CSV'])
entry = gd_client.Upload(ms, 'MySpreadsheetTitle', folder_or_uri=folder)
print 'Spreadsheet now accessible online at:', entry.GetAlternateLink().href

Downloading and exporting documents

To export a document, spreadsheet, or presentation from the Documents List feed, you need either its Atom entry or its resource id (e.g. document:12345).
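
A resource id pairs a document type with an identifier, separated by a colon. The following minimal sketch (Python 3, standard library only) splits one apart, which can be handy for choosing an export format by type. split_resource_id is a hypothetical helper, and the "type:id" shape is inferred from the examples in this guide.

```python
def split_resource_id(resource_id):
    """Split 'document:12345' into ('document', '12345')."""
    doc_type, separator, doc_id = resource_id.partition(':')
    if not separator or not doc_type or not doc_id:
        raise ValueError('Expected "type:id", got %r' % resource_id)
    return doc_type, doc_id


print(split_resource_id('document:12345'))  # ('document', '12345')
```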

Exporting word processor documents

The following example exports a DocumentListEntry object as a .doc file:

file_path = '/path/to/save/your_document.doc'
print 'Downloading document to %s...' % (file_path,)
gd_client.Export(entry, file_path)

Alternatively, you can pass in the resource id instead of the entry, since the export helper also accepts that. The following example uses a document's resource id to export it as a .html file:

file_path = '/path/to/save/your_document.html'
print 'Downloading document to %s...' % (file_path,)
gd_client.Export(entry.resourceId.text, file_path)

Exporting presentations

The following example exports a DocumentListEntry object as a .swf file:

file_path = '/path/to/save/your_presentation.swf'
print 'Downloading presentation to %s...' % (file_path,)
gd_client.Export(entry, file_path)

Exporting spreadsheets

The following example exports a DocumentListEntry object as a .xls file:

file_path = '/path/to/save/your_spreadsheets.xls'
print 'Downloading spreadsheet to %s...' % (file_path,)
gd_client.Export(entry, file_path)

When exporting to .csv or .tsv, you can specify which grid/sheet to download by using the gid argument:

file_path = '/path/to/save/your_spreadsheets.csv'
print 'Downloading spreadsheet to %s...' % (file_path,)
gd_client.Export(entry, file_path, gid=1) # export the second sheet

Important: In order to download spreadsheets, your client needs a valid token for the Spreadsheets API service. See the downloading spreadsheets section of the protocol guide for more details.

If you're using AuthSub, the solution is to request a multi-scoped token, good for both the Documents List API and the Spreadsheets API. Pass in scopes=['http://docs.google.com/feeds/', 'http://spreadsheets.google.com/feeds/'] to GenerateAuthSubRequestUrl.

For ClientLogin, first create a SpreadsheetsService object (to obtain a spreadsheets token), and then swap that token into your DocsService object. This example demonstrates that process:

import gdata.spreadsheet.service

spreadsheets_client = gdata.spreadsheet.service.SpreadsheetsService()
spreadsheets_client.ClientLogin('user@gmail.com', 'pa$$word')

# substitute the spreadsheets token into our gd_client
docs_auth_token = gd_client.GetClientLoginToken()
gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())

gd_client.Export(entry, file_path)

gd_client.SetClientLoginToken(docs_auth_token) # reset the DocList auth token
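
One caveat with the swap above is that the original token is not restored if the export raises an exception. A minimal sketch of one way to guard against that, using a context manager from the Python 3 standard library, is shown below. temporary_client_login_token and StubClient are hypothetical names; the stub merely stands in for DocsService so that the sketch runs without the client library, and only the GetClientLoginToken and SetClientLoginToken calls shown above are assumed.

```python
from contextlib import contextmanager


@contextmanager
def temporary_client_login_token(client, token):
    """Swap in a token and restore the original, even on error."""
    original = client.GetClientLoginToken()
    client.SetClientLoginToken(token)
    try:
        yield client
    finally:
        client.SetClientLoginToken(original)


class StubClient(object):
    """Stand-in for DocsService, used only to make the sketch runnable."""

    def __init__(self, token):
        self._token = token

    def GetClientLoginToken(self):
        return self._token

    def SetClientLoginToken(self, token):
        self._token = token


stub = StubClient('docs-token')
with temporary_client_login_token(stub, 'spreadsheets-token'):
    print(stub.GetClientLoginToken())  # spreadsheets-token
print(stub.GetClientLoginToken())      # docs-token
```

With a real client, the call to gd_client.Export(entry, file_path) would go inside the with block.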

Trashing a document

To trash a document, folder, presentation, or spreadsheet, use the Delete method of the service object on the edit link of the Atom entry representing the document. For example, to trash one of the new documents from the upload examples above, you would execute the following.

gd_client.Delete(entry.GetEditLink().href)

Updating existing documents

Updating metadata

Here is an example of updating a presentation's metadata, but leaving its content unchanged. The presentation's name will be changed to 'better presentation title'. Since the request does not contain new document content, the entry's edit link is used.

existing_entry.title.text = 'better presentation title'
updated_entry = gd_client.Put(existing_entry, existing_entry.GetEditLink().href)

Replacing a document's content

To update a document's content, use the Atom entry's edit-media link. The following example replaces the document's content with replacementContent.doc's content, and updates the document's title to 'updated document' in the same request.

ms = gdata.MediaSource(file_path='/path/to/your/replacementContent.doc',
                       content_type=gdata.docs.service.SUPPORTED_FILETYPES['DOC'])
entry.title.text = 'updated document'
updated_entry = gd_client.Put(ms, entry.GetEditMediaLink().href)

Appending to a document

For word processor documents, you can append data to the document's content by using the append=true parameter. The process is otherwise the same as replacing a document's content (above).

Important: Content-Type: text/plain is the only accepted content type when appending data. For example, you cannot append the contents of a word processor document with Content-Type: application/msword to an existing document.

Here is an example of appending the text 'Appending this data!' to an existing document. Again, the edit-media link is used because we're modifying the document's content.

import StringIO

data = 'Appending this data!'
ms = gdata.MediaSource(file_handle=StringIO.StringIO(data), content_type='text/plain', content_length=len(data))
updated_entry = gd_client.Put(ms, entry.GetEditMediaLink().href + '?append=true')
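
Concatenating '?append=true' works as long as the edit-media link carries no query string of its own. As a more defensive sketch, the helper below adds the parameter with the Python 3 standard library and keeps any existing parameters. with_append is a hypothetical helper, and the example.com URLs are placeholders rather than real edit-media links.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def with_append(edit_media_url):
    """Return the URL with append=true set exactly once."""
    parts = urlsplit(edit_media_url)
    # Keep existing parameters, dropping any previous append value.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != 'append']
    query.append(('append', 'true'))
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), parts.fragment))


print(with_append('http://www.example.com/edit-media/doc1'))
print(with_append('http://www.example.com/edit-media/doc1?foo=bar'))
```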

Folder management

Creating folders

To create a new folder, use the CreateFolder helper, passing it the folder name:

folder_entry = gd_client.CreateFolder('YourFolderTitle')

To create a folder inside another folder, pass in the parent folder's Atom entry as a second argument to CreateFolder. This example creates two folders. The second folder (Folder2) is created inside of Folder1:

parent_folder_entry = gd_client.CreateFolder('Folder1')
child_folder_entry = gd_client.CreateFolder('Folder2', parent_folder_entry)

Trashing a folder

Trashing a folder works the same way as trashing a document. Use the service's Delete method and pass it the entry's edit link:

gd_client.Delete(folder_entry.GetEditLink().href)

Moving documents/folders in and out of folders

Moving a document into a folder requires a DocumentListEntry object for the document and another for the destination folder. The DocsService class provides methods for moving documents, presentations, spreadsheets, and folders in and out of folders.

Moving a document/folder into another folder:

moved_doc_entry = gd_client.MoveIntoFolder(doc_entry, dest_folder_entry)

The first argument to MoveIntoFolder() is a DocumentListEntry object representing the source folder or document to relocate. The second argument is a DocumentListEntry object representing the destination folder entry.

Moving a document/folder out of a folder:

gd_client.MoveOutOfFolder(src_entry)

Modifying Document Sharing Permissions

Retrieving the ACL feed for a document

The GetDocumentListAclFeed method can be used to retrieve the ACL permissions for a document. To retrieve the permission for a document, you need the <gd:feedLink> from the Atom entry. The gdata.docs.DocumentListEntry.GetAclLink() method returns that link.

The following example fetches the first document the authenticated user owns, queries its ACL feed, and prints out the permission entries:

uri = ('http://docs.google.com/feeds/documents/private/full/'
       '-/mine?max-results=1')
feed = gd_client.GetDocumentListFeed(uri)
acl_feed = gd_client.GetDocumentListAclFeed(feed.entry[0].GetAclLink().href)
for acl_entry in acl_feed.entry:
  print '%s - %s (%s)' % (acl_entry.role.value, acl_entry.scope.value, acl_entry.scope.type)

You can also instantiate a DocumentAclQuery if you know the document's resource id:

resource_id = 'spreadsheet:12345'
query = gdata.docs.service.DocumentAclQuery(resource_id)
acl_feed = gd_client.GetDocumentListAclFeed(query.ToUri())
...

Modifying the ACL feed for a document

Adding a new permission

To add a new permission to a document, your client needs to create a new DocumentListAclEntry and POST it to the server.

Here's an example that adds 'user@example.com' as a reader to the document represented by doc_entry:

scope = gdata.docs.Scope(value='user@example.com', type='user')
role = gdata.docs.Role(value='reader')
acl_entry = gdata.docs.DocumentListAclEntry(scope=scope, role=role)

created_acl_entry = gd_client.Post(acl_entry, doc_entry.GetAclLink().href,
                                   converter=gdata.docs.DocumentListAclEntryFromString)

Possible values for the Role are reader, writer, and owner.

Updating a permission

We can update the ACL entry that we just added by sending a PUT request (with the updated content) to the edit link of the ACL entry in question.

This example modifies our previous created_acl_entry by updating 'user@example.com' to be a writer (collaborator):

created_acl_entry.role.value = 'writer'
updated_acl_entry = gd_client.Put(created_acl_entry, created_acl_entry.GetEditLink().href,
                                  converter=gdata.docs.DocumentListAclEntryFromString)

Deleting a permission

Deleting a permission involves sending a DELETE request to the ACL entry's edit link.

gd_client.Delete(updated_acl_entry.GetEditLink().href)
