Google Drive on Pigshell
Pigshell lets you mount Google Drive as a filesystem and interact with its contents using a Unix-like CLI - entirely within the browser. This approach has several advantages which complement the web client and the Google Drive native application.
- Back up files and documents stored on Drive to your desktop with a simple
cp -r /firstname.lastname@example.org /home/
- Attach Drives belonging to multiple users at the same time, including
Google Apps for Business accounts. Transfer files between them with simple cp commands.
- Explore, select and perform operations on multiple files using Unix-like pipelines. For example, "move all presentations older than a year to folder ppt-2013".
Pigshell aims to provide a common minimum filesystem interface to various web data sources; it will therefore (probably) never support Drive and Docs API features like sharing, version control, etc.
Go to http://pigshell.com, click on the Google icon, and then click the Attach Google Account popup. This will redirect you to Google's authentication screen. Once authentication and authorization are completed, the Google icon turns red. Click again to add more Google accounts if needed.
Drive filesystems are automatically mounted at every page reload.
Note that data flow is entirely between your browser and Google. The pigshell server is a dumb static web server - it cannot see any authentication tokens or user data.
To list the files in Drive,
pig:/$ cd /email@example.com
pig:firstname.lastname@example.org$ ls
Clicking on any of the files takes you to the corresponding Google web page for editing the document.
(From now on, we omit the prompt to make it easy to cut and paste commands)
To list all spreadsheets,
ls | grep -f mime spreadsheet
ls | grep -f mime presentation
ls | grep -f mime document
lists all the presentations and documents in the root folder respectively.
The Google Drive UI encourages users to create a huge pile of documents in one unmanageable root folder, thereby making the Search box a necessary first step to find anything. For those used to managing hierarchical folders, pigshell offers a way to clean up the closet without a lot of dragging and dropping.
mkdir fy2013-14 # Make a folder
ls | grep -f mime spreadsheet # Figure out what spreadsheets I have
ls | grep -f mime spreadsheet | grep 2013 # Refine based on name
ls | grep -f mime spreadsheet | grep 2013 | mv fy2013-14 # Move em all
Typical pigshell pipelines consist of commands processing lists of objects.
In the above case, the first grep matches only objects whose mime attribute
contains the string "spreadsheet". The second matches those whose names contain
the string "2013". mv receives a bunch of matched file objects, which it then
moves to fy2013-14. This might equally well be accomplished by
mv *2013*ppt fy2013-14, but the pipe-based approach allows for interactive,
incremental refining of the file selection.
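The idea of commands passing lists of objects down a pipeline can be modelled outside pigshell. The following Python sketch is only an analogy, not pigshell's implementation: the FileObj dictionaries, the grep_field and mv helpers, and the sample listing are all hypothetical names made up for illustration.

```python
from typing import Dict, Iterable, Iterator, List

FileObj = Dict[str, str]  # hypothetical stand-in for a pigshell file object


def grep_field(objs: Iterable[FileObj], field: str, needle: str) -> Iterator[FileObj]:
    """Pass along only the objects whose `field` contains `needle`."""
    for obj in objs:
        if needle in obj.get(field, ""):
            yield obj


def mv(objs: Iterable[FileObj], folder: str) -> List[FileObj]:
    """Stand-in for a move: return copies of the objects with a new folder."""
    return [dict(obj, folder=folder) for obj in objs]


# Made-up sample listing, playing the role of the output of `ls`.
listing = [
    {"name": "budget-2013", "mime": "application/vnd.google-apps.spreadsheet", "folder": "/"},
    {"name": "budget-2012", "mime": "application/vnd.google-apps.spreadsheet", "folder": "/"},
    {"name": "offsite-2013", "mime": "application/vnd.google-apps.presentation", "folder": "/"},
]

sheets = grep_field(listing, "mime", "spreadsheet")   # first grep: match on mime
sheets_2013 = grep_field(sheets, "name", "2013")      # second grep: match on name
moved = mv(sheets_2013, "fy2013-14")                  # final stage consumes the objects
print([f["name"] for f in moved])
```

Each stage sees whole objects rather than lines of text, which is why a later stage can filter on a different attribute than an earlier one.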
ls | grep -f mime spreadsheet | grep -e 'x.mtime < Date.parse("Jan 1, 2014") && x.mtime > Date.parse("Dec 31, 2012")' | mv fy2013-14
In this case, the second grep uses the -e expression to keep only those objects whose mtime falls between the two dates.
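The date-range test in the pipeline above can be restated as a small predicate. This Python sketch is an illustration under assumptions: modified_in_range, START and END are hypothetical names, and naive local datetimes stand in for the millisecond timestamps that Date.parse returns in JavaScript.

```python
from datetime import datetime
from typing import Any, Dict

# Same bounds as the pipeline: after "Dec 31, 2012" and before "Jan 1, 2014".
START = datetime(2012, 12, 31)
END = datetime(2014, 1, 1)


def modified_in_range(obj: Dict[str, Any]) -> bool:
    """True when the object's mtime lies strictly between START and END."""
    return START < obj["mtime"] < END


# Made-up sample objects.
files = [
    {"name": "q2-forecast", "mtime": datetime(2013, 6, 1)},
    {"name": "old-plan", "mtime": datetime(2012, 3, 15)},
    {"name": "new-plan", "mtime": datetime(2014, 2, 1)},
]
selected = [f["name"] for f in files if modified_in_range(f)]
print(selected)
```

Note that a bare date parses to midnight at the start of that day, so the lower bound as written also admits files modified later on Dec 31, 2012. If only 2013 is wanted, a lower bound of Jan 1, 2013 with a greater-or-equal comparison would be tighter.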
Removing files is also straightforward. Files are moved to trash rather than obliterated. Trashed files can be recovered using the Drive web GUI.
ls *2013* # OK, that looks like the right bunch of files
rm *2013* # Nuke em
ls *2013* | rm # Alternative method
Files shared with you are visible under the "Shared With Me" folder.
cd "Shared With Me"
ls
Documents and Files
There are important differences between documents and files in the Google Drive context. Both are visible as file entities in Drive UIs as well as pigshell, but they are treated differently when it comes to viewing, copying and moving operations.
Documents should be considered as abstract resources controlled by Google
Docs. They do not have a specific size or a specific sequence of bytes as
visible to the external world via the Drive API. One can retrieve a
representation of this resource in a format like docx, pdf or txt,
but there is no guarantee that downloading and re-uploading a document
even in a canonical format (say, docx) is going to result in a byte-identical
result, since there is conversion going on both ways.
Files are images, text files and other data which are stored by Drive as-is. A file has a specific size, contents and checksum as visible from the Drive API. Downloading and re-uploading such a file gives predictable results.
While copying document files (docx/pptx/xlsx) from any source into Drive, even those previously retrieved from Google Docs, you need to specify whether you want Drive to treat them as documents (effectively, converting them internally to Google Docs resources), or as opaque files, in which case they will be stored as-is. In the former case, you will be able to edit them with Google Docs, but in the latter case, they will appear as zip files, since docx et al use zip as a container format.
By default, pigshell satisfies a read() on a document by retrieving its
representation in the appropriate OOXML format (docx/pptx/xlsx) and a read()
on a file with its binary contents. When copying a file into Drive, pigshell
defaults to storing it as a binary file, even if it was originally a document.
These defaults can be overridden by CLI options as explained below.
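The defaults described above amount to a small decision table. The Python sketch below is a hedged restatement, not pigshell code: read_representation and stored_as are hypothetical helpers, the kind-to-format mapping is an assumption based on the usual Docs export formats, and the fmt and convert arguments merely mirror the gdrive.fmt and gdrive.convert options. How gdrive.fmt behaves on an opaque file is not stated in this document, so the sketch assumes it is ignored.

```python
from typing import Optional

# Assumed mapping from the kind of Google Docs resource to its OOXML format.
OOXML_FORMAT = {"document": "docx", "presentation": "pptx", "spreadsheet": "xlsx"}


def read_representation(kind: Optional[str], fmt: Optional[str] = None) -> str:
    """What a read() yields. `kind` is None for an opaque file.

    `fmt` plays the role of the gdrive.fmt option (for example "pdf" or "txt").
    """
    if kind is None:
        return "binary"  # files are returned as-is (assumption: fmt is ignored)
    return fmt or OOXML_FORMAT[kind]


def stored_as(convert: bool = False) -> str:
    """How an upload is stored. `convert` plays the role of gdrive.convert."""
    return "document" if convert else "file"


print(read_representation("spreadsheet"))      # default export for a spreadsheet
print(read_representation("document", "pdf"))  # overridden export format
print(stored_as())                             # default upload behaviour
```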
Viewing Documents and Files
You cannot view documents directly in pigshell, but you can view a PDF representation of their contents. To view them as Office files, you can copy them to your desktop and open them using your preferred Office application.
cat -o gdrive.fmt=pdf Resume
cat -o gdrive.fmt=pdf Trip\ Expenses
cat -o gdrive.fmt=txt Resume # Text representation
Files for which pigshell has media handlers can be viewed directly.
Files for which pigshell cannot determine mime type, or lacks a media handler, will be displayed as text. Unlike Unix terminals, the process of spewing binary garbage onto the screen is mercifully silent.
Copying From Drive
Copying a single document is easy:
cp Resume /tmp # Copies as docx
cp -o gdrive.fmt=pdf Resume /tmp/R.pdf # Copies as pdf
cp -o gdrive.fmt=txt Resume /tmp/R.txt # Plain text
To view the PDF version,
cat /tmp/R.pdf
This is nice, but /tmp is backed by a RamFS; reload the page and it's gone.
To copy a file to the desktop,
cp /email@example.com/Resume /downloads
The file will hit the default downloads directory of your browser.
python psty.py -a -d /some/dir # Run in DESKTOP SHELL (bash), not pigshell
The psty server runs only on Linux and Mac OS at present.
mount http://localhost:50937/ /home # Run in PIGSHELL, not desktop shell
/some/dir on your desktop is now visible to pigshell at /home. Anything
you copy from pigshell into /home can be accessed from your desktop at
/some/dir and vice versa.
Once you've got psty running and /home mounted, you can take a full backup of your Drive as follows:
mkdir /home/drivebackup
cp -rv -X /Trash /firstname.lastname@example.org /home/drivebackup
This will take a while. Copies can be continued or refreshed with
cp -crv -X /Trash /email@example.com /home/drivebackup
The -c flag skips files which have the same size at both locations. If
the size of the source is zero (documents on Drive are 0-sized), it
skips source files which have an older modification time than the target.
Finally, if the target file is smaller than the source, it continues the
copy (a la wget -c) rather than restarting from scratch.
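The rules for continuing a copy can be written down as a single decision function. This Python sketch is one reading of the description above, not pigshell's actual code: continue_action is a hypothetical name, the zero-size check is assumed to take precedence over the size comparison, and the handling of a missing target or a target larger than the source is an assumption because the text does not cover those cases.

```python
from typing import Optional


def continue_action(src_size: int, src_mtime: float,
                    dst_size: Optional[int], dst_mtime: Optional[float]) -> str:
    """Return "copy", "skip" or "resume" for one source/target pair.

    dst_size and dst_mtime are None when the target does not exist yet.
    """
    if dst_size is None or dst_mtime is None:
        return "copy"  # assumption: nothing to continue from
    if src_size == 0:
        # Drive documents report size 0, so fall back to modification times.
        return "skip" if src_mtime < dst_mtime else "copy"
    if dst_size == src_size:
        return "skip"  # same size at both locations
    if dst_size < src_size:
        return "resume"  # partial target: pick up where it stopped
    return "copy"  # assumption: a larger target is copied afresh


print(continue_action(100, 5.0, 40, 1.0))
```

Since size is the only check for ordinary files, a file that changed without changing size would be skipped under this reading.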
You may want to exclude "Shared With Me" as well (it tends to be huge for corporate accounts):
cp -rv -X '/Trash|/Shared With Me' /firstname.lastname@example.org /home/drivebackup
Instead of seeing the progress printed on-screen, you could save it to a log file.
cp -crv -X /Trash /email@example.com /home/drivebackup 2>/home/drivebackup/cplog.$(date -f "YYYY-MM-DD-HHmmss")
You can use ^C to kill a long-running pipeline, ^B to continue it in the
background, and ^Z to pause it. The
do what their names suggest.
Copying To Drive
Copying a file is straightforward:
cd /firstname.lastname@example.org
cp /doc/README.md .
cp http://pigshell.com/sample/photos/bchips.jpg .
cp /some/where/foo.docx .
These files are stored as-is. Note that foo.docx will not be editable as a Google Docs document.
Copying a document, i.e. with conversion, requires an extra flag.
cp -o gdrive.convert /some/where/foo.docx .
Copying Across Accounts
Assuming you have attached multiple accounts, the corresponding Drives are mounted under separate paths.
Copying between these accounts is similar to the process described above.
To copy a document,
cp -o gdrive.convert /email@example.com/Resume /firstname.lastname@example.org/resume-dir
Note that we need the
convert flag to copy documents, if we want them to be
retained as documents in the target Drive.
Copying files is straightforward:
cp /email@example.com/baya.jpg /firstname.lastname@example.org/photos
Bugs and Gotchas
Probably quite a few. Don't use in production.
- The "Shared With Me" folder is read-only.
- Drive has weird ideas of timestamps. createdDate, modifiedDate, lastViewedByMeDate don't mean what they appear to. Pigshell maps the first two to the ctime and mtime file attributes. createdDate can be later than modifiedDate, and lastViewedByMeDate can be older than modifiedDate. As best as I can make out, modifiedDate is the only sane one of the lot. Verify the output of any timestamp-based filtering pipelines before doing anything destructive.
- The "Logout" button doesn't really log you out of Google; it just inhibits auto-mounting of Drive within pigshell. Most people run pigshell in the same browser as their personal GMail, company GMail, etc. accounts, and getting logged out of all of these is painful.
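The timestamp caveat in the list above suggests a quick sanity check before running any destructive, date-based pipeline. The following Python sketch is purely illustrative: suspicious_timestamps is a hypothetical helper, and the sample entries are made up. It flags entries whose ctime (mapped from createdDate) is later than their mtime (mapped from modifiedDate).

```python
from datetime import datetime
from typing import Any, Dict, List


def suspicious_timestamps(files: List[Dict[str, Any]]) -> List[str]:
    """Return the names of entries whose ctime is later than their mtime."""
    return [f["name"] for f in files if f["ctime"] > f["mtime"]]


# Made-up sample entries.
entries = [
    {"name": "notes", "ctime": datetime(2013, 1, 5), "mtime": datetime(2013, 2, 1)},
    {"name": "imported", "ctime": datetime(2014, 3, 1), "mtime": datetime(2012, 7, 9)},
]
print(suspicious_timestamps(entries))
```

Entries flagged this way are the ones where filtering on ctime and filtering on mtime would disagree most, so they are worth inspecting by hand.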