Pigshell

Interact with the web, unix style.

Pigshell is a web app which presents resources on the web as files in a hierarchical file system. These include public web pages as well as private data in Facebook, Google Drive and Picasa albums. It provides a command line interface to construct pipelines of simple commands to filter, transform and display data.

Running the psty server on your desktop is strongly recommended. It exposes a local directory for pigshell to use as a /home, serves as a proxy HTTP server, and lets pigshell pipe web data through desktop unix utilities.

Pigshell is under development. No commands, APIs or interfaces are frozen at this point. We expect to release the sources under the GNU GPLv3 around April 2014.

A few basic examples of pigshell usage are given below.

Hello, World

pig> cat http://pigshell.com/sample/life-expectancy.html | table2js -e "table.wikitable tr" foo country data | template -ig /templates/d3-worldmap1

Life expectancy

We got HTML data from a website, extracted a table, converted it to a list of Javascript objects and fed it to a D3-based template for visualization.

A command issued on the CLI will fill its output area as and when it gets data. The prompt may glow green to indicate that a command or pipeline is running.

Commands are really generator functions which yield lists of objects, composed using the pipe operator. They are not independently executing processes.

Pigshell passes objects over the pipe, rather than opaque data streams. The pipeline is set in motion when the last member (an implicit Stdout) asks the upstream command for an item, which in turn asks its own upstream command, and so on.

In the example above, cat returned a blob to table2js, which parsed the table and converted it to Javascript objects. These were passed to template, which returned an HTMLDivElement object to Stdout.
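The pull-driven behaviour described above can be sketched with plain Javascript generators. This is only an illustration, not pigshell's actual code: the cat, table2js and stdout functions below are hypothetical stand-ins, and the table row is made up.

```javascript
// Hypothetical stand-ins for pigshell commands; names and data are illustrative only.
const log = []; // records the order in which stages do work

// Source stage: yields one "blob" (here just a string of HTML).
function* cat() {
  log.push("cat");
  yield "<tr><td>1</td><td>Japan</td><td>84</td></tr>";
}

// Filter stage: pulls items from upstream and yields one object per item.
function* table2js(upstream) {
  for (const html of upstream) {
    log.push("table2js");
    const cells = [...html.matchAll(/<td>(.*?)<\/td>/g)].map((m) => m[1]);
    yield { foo: cells[0], country: cells[1], data: cells[2] };
  }
}

// Sink stage: nothing upstream runs until this loop asks for an item.
function stdout(upstream) {
  const shown = [];
  for (const item of upstream) {
    log.push("stdout");
    shown.push(item);
  }
  return shown;
}

const pipeline = table2js(cat()); // composing does not execute any stage
const startedBeforePull = log.length; // still 0: generators are lazy
const output = stdout(pipeline);
console.log(startedBeforePull, log, output);
```

Composing the generators does no work by itself. Each stage runs only when the stage downstream of it asks for the next item.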

"Process" Management

The ps command will show you a list of running pipelines. Pipelines are the unit of process management in pigshell.

To kill a long-running pipeline, use ps to find its PID, then pass that PID to kill. You can also stop and start pipelines, which is roughly the equivalent of kill -STOP and kill -CONT. Stopped commands will have their prompt turn amber.

Facebook

Connect your Facebook account. (Data privacy is assured: pigshell is a pure Javascript app, and no access tokens or user data are visible to or stored by the server.)

Then,

pig> cd /facebook/friends; ls

will give you a list of your friends with thumbnails.

Where in the world are my friends?

pig> map /facebook/friends/*

Friends map

map is a command which plots files with location attributes on a map. Another way of doing this would be

pig> ls /facebook/friends/ | map

Pigshell passes objects over pipes. In this case, ls emits a stream of file objects, which are consumed by map.

Let's refine the above query: Where are all my male friends?

pig> ls /facebook/friends | grep -f gender "^male" | map

grep is a generic filter command, which may filter either by an object's text representation, or a specific field - in this case, gender.
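As a rough sketch of the two filtering modes, here is a plain Javascript function that tests either one field or the object's text representation. The grep function and the sample friends are hypothetical and only mirror the behaviour described above; they are not pigshell's implementation.

```javascript
// Illustrative only: field names mirror the example above; the data is made up.
function grep(objects, pattern, field) {
  const re = new RegExp(pattern);
  return objects.filter((obj) =>
    // With a field, test just that attribute; otherwise test the text form.
    re.test(field === undefined ? String(obj) : String(obj[field]))
  );
}

const friends = [
  { name: "Alice", gender: "female", toString() { return this.name; } },
  { name: "Bob", gender: "male", toString() { return this.name; } },
];

const males = grep(friends, "^male", "gender");
const byText = grep(friends, "^Al");
console.log(males.map(String), byText.map(String));
```

The ^ anchor matters in the gender query: an unanchored "male" would also match "female".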

How many friends do I have?

pig> ls /facebook/friends | sum

Pie chart of relationship status of all female friends

pig> ls /facebook/friends | grep -f gender "female" | chart -f relationship_status
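A pie chart of a field needs one count per distinct value. How chart computes this internally is not covered here, so the following is only a guess at the tallying step, in plain Javascript with made-up data and a hypothetical countBy helper.

```javascript
// Hypothetical tally step: count how many objects share each value of a field.
function countBy(objects, field) {
  const counts = {};
  for (const obj of objects) {
    const key = String(obj[field]);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

const sample = [
  { relationship_status: "Single" },
  { relationship_status: "Married" },
  { relationship_status: "Single" },
];
const slices = countBy(sample, "relationship_status");
console.log(slices);
```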

Get, Filter, Visualize

In the "Hello, World" example, we didn't need to tweak the data. For a more complex example along the same lines, let us look at a visualization of per-capita GDP on a purchasing-power-parity basis. You can copy and paste the whole command from this document to the shell.

pig> projection="orthographic"; title="GDP PPP (thousand USD)"; ppp=$(cat http://pigshell.com/sample/gdp-ppp.html | to text | jf '$$.html($$(x).find("table").first())' | table2js -e "tr" foo country data | jf 'x.data = Math.round(+x.data.replace(/,/g,"")) / 1000, x'); echo $ppp | template -c title,projection -ig /templates/d3-worldmap1

GDP PPP

That is rather a mouthful, but the broad outline should be fairly clear even if you don't know pigshell or Javascript. Reading from left to right,

  • We set up configuration in variables to be used later by the template command.
  • We use the ppp variable to capture the processed list of country objects so it can be easily reused.
  • We get the contents of a URL (cat ...), convert it to text, and use Cheerio with its jQuery-like syntax to select and extract the HTML of the first table.
  • The table2js script (found in /bin) converts each table row to an object containing the attributes foo, country, data corresponding to the first three columns.
  • We sanitize the data by removing the commas used as thousands-separators, and divide by 1000 to keep the labels of manageable length.
  • The template command waits to gather all the input (-g) and renders the given D3-based HTML template in an IFrame (-i).
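The row-to-object and sanitizing steps from the list above can be tried outside pigshell in plain Javascript. The rows below are made up and the country names are hypothetical; only the sanitizing expression is taken from the command itself.

```javascript
// Made-up rows standing in for the first three columns of each table row.
const rows = [
  ["1", "Examplestan", "86,440"],
  ["2", "Samplevia", "1,234"],
];

// Like `table2js ... foo country data`: name the first three columns.
const objects = rows.map(([foo, country, data]) => ({ foo, country, data }));

// The jf expression from the command, wrapped in a function. The comma
// operator assigns x.data first, then evaluates to x itself.
const sanitize = (x) => (x.data = Math.round(+x.data.replace(/,/g, "")) / 1000, x);

const ppp = objects.map(sanitize);
console.log(ppp);
```

Note that Math.round is applied to the dollar figure before the division, so the values in thousands keep their fractional part.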

You can grab the globe to move it around and zoom using the scroll wheel. Try to figure out which two pesky little countries are making everyone else look bad. If you can't figure it out, perhaps some old-fashioned sorting and heading will help:

pig> echo $ppp | sort -rnf data | head | printf "%(country)s %(data)f\n"
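For readers more at home in Javascript than in shell, the same sort, head and printf steps look roughly like this. The data is made up, and two details are assumptions carried over from Unix rather than confirmed pigshell behaviour: that head keeps ten items by default, and that %f prints six decimal places.

```javascript
// Made-up sample; in the shell this would be the contents of $ppp.
const countries = [
  { country: "A-land", data: 12.5 },
  { country: "B-land", data: 98.1 },
  { country: "C-land", data: 45 },
];

// sort -rnf data: numeric sort on the data field, reversed (largest first).
const sorted = [...countries].sort((a, b) => b.data - a.data);

// head: keep only the leading items (10 assumed, as with Unix head).
const top = sorted.slice(0, 10);

// printf with %(country)s and %(data)f: one formatted line per object
// (six decimal places assumed for %f).
const lines = top.map((x) => `${x.country} ${x.data.toFixed(6)}`);
console.log(lines.join("\n"));
```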

Let's filter out the fabulously well-to-do:

pig> echo $ppp | jf 'x.data > 80 ? {} : x' | template -c title,projection -ig /templates/d3-worldmap1

jf applies the supplied Javascript expression to every input object and outputs the result.
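One way to picture jf is as a map over the input with the expression compiled into a function of x. This is a hypothetical model in plain Javascript with made-up data, not jf's real source.

```javascript
// Hypothetical model of jf: compile the expression once, apply it to each x.
function jf(expression, inputs) {
  const fn = new Function("x", `return (${expression});`);
  return inputs.map((x) => fn(x));
}

const input = [
  { country: "A-land", data: 12.5 },
  { country: "B-land", data: 98.1 },
];
const result = jf("x.data > 80 ? {} : x", input);
console.log(result);
```

In this model every input produces exactly one output, so the well-to-do entries are replaced by empty objects rather than removed from the list.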

Another way to get data into a processing pipeline is via the paste command.

pig> paste | table2js -e "tr" foo country data | template -ig /templates/d3-worldmap1

paste displays a little window into which you can paste fragments of data copied from other web pages. Hit "Emit & Quit" when done. This is a standard copy/paste operation, so you can use paste to inject data from anywhere, including spreadsheets and webpages with authenticated sessions, and tweak the content before processing it.

Data conversion

Most filesystem read() operations generate blobs. It is up to the consuming command to convert incoming data into the type it likes. In the command

pig> cat http://pigshell.com/sample/photos/bchips.jpg

cat returns a blob. The terminal figures out that the blob contains JPEG data, and displays it as a canvas. Similarly,

pig> cat http://pigshell.com/sample/clickingofcuthbert.pdf

is detected as a PDF and displayed using pdf.js. If the terminal cannot figure out the contents, it attempts to convert them to text and displays the result as the usual control-character porridge (though it is mercifully silent, unlike Unix terminals).

In some cases, you have to manually convert data between stages in the pipeline. For example,

pig> cat http://pigshell.com/sample/README.md | to text | jf 'x.split("\\n")' | sum

implements a poor man's wc: cat returns a blob, to converts it to text, jf splits it into lines, sum counts the number of objects it gets. A cat ... | sum would have returned 1, since only one object (a blob) was presented to sum.
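The difference between the two counts can be reproduced in plain Javascript. The sum function here is a hypothetical stand-in that simply counts what it receives, and the sketch assumes that the array produced by the split step is handed downstream one item at a time.

```javascript
// Stand-in for sum: counts the objects it receives (assumed behaviour).
const sum = (objects) => objects.length;

const text = "line one\nline two\nline three";

// cat | sum: a single blob arrives, so the count is 1.
const blobCount = sum([text]);

// With the to text and split steps in between: one object per line.
// This assumes the array from the split is passed on item by item.
const lineCount = sum(text.split("\n"));
console.log(blobCount, lineCount);
```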

$HOME sweet $HOME

Get a /home. Download Psty, run it on your desktop:

bash$ python psty.py -a -d /some/directory

and on pigshell,

pig> mount http://localhost:50937/ /home

Now you can read and write to /home, which is backed by /some/directory. For example, you can cat images and PDFs stored on your desktop inside /some/directory.

This mount command needs to be typed every time you start or reload the page. To do it automatically,

pig> echo "mount http://localhost:50937/ /home" >/local/rc.sh

/local/rc.sh is a script stored in the browser's localStorage and will be invoked every time http://pigshell.com is (re)loaded. You need to create a /local/rc.sh on every browser on which you use pigshell.

URLs as files

Absolute URLs can be used in most places where a file path is expected. Mounting an HTTP URL exposes all links within that page as directories. To mount arbitrary, non-CORS-enabled URLs, you need to run psty.

pig> mount http://pigshell.com/sample/ /mnt; cd /mnt; ls
pig> cat oslogos.png
pig> cat .
pig> cat . | to text

Data movement

Assuming you're running psty, backing up a Picasa album to your desktop is as simple as

pig> mkdir /home/foo; cp /picasa/foo/* /home/foo

Similarly, creating an album and uploading a bunch of pictures to Picasa:

pig> mkdir /picasa/bar; cp /home/barpics/*JPG /picasa/bar

(Note that album creation and uploads to Picasa require psty's proxy services.)

Copying random URLs to your desktop also works:

pig> cp -c http://ftp.freebsd.org/pub/FreeBSD/ISO-IMAGES-amd64/10.0/FreeBSD-10.0-RELEASE-amd64-bootonly.iso /home

The -c option continues where it left off, so you can resume interrupted downloads.
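How cp -c resumes a transfer is not described here. One common way for an HTTP client to continue a download is a Range request, so the sketch below shows only that general technique in plain Javascript. The resumeRequest helper and the example.com URL are hypothetical.

```javascript
// Hypothetical helper: pigshell's actual resume logic is not documented here.
function resumeRequest(url, bytesAlreadySaved) {
  const headers = {};
  if (bytesAlreadySaved > 0) {
    // Standard HTTP range syntax: ask for everything from this offset onward.
    headers["Range"] = `bytes=${bytesAlreadySaved}-`;
  }
  return { url, headers };
}

const fresh = resumeRequest("http://example.com/big.iso", 0);
const resumed = resumeRequest("http://example.com/big.iso", 1048576);
console.log(fresh.headers, resumed.headers);
```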

If you don't have psty:

  • You can copy files to /downloads, and they will be saved to your browser's default download folder. Note that you cannot see anything inside the /downloads directory; it is just a pseudo-target to trigger a browser download. For example,

    pig> cp /picasa/foo/DSC_1290.JPG /downloads
    
  • Click on Upload Files and select a file or files from your desktop. These files are now visible under the directory /uploads. Use ls to verify that they're there. Now use cp to copy them to the target directory.

    pig> cp /uploads/cat.jpg /gdrive
    pig> cp /uploads/cat.jpg /facebook/me/albums/MyCat/
    

Processing on the desktop - Wsh

Psty runs a websocket service, effectively converting every Unix utility which uses stdin/stdout into a potential member of the pigshell pipeline. For instance, if you have ImageMagick installed,

pig> cat http://pigshell.com/sample/oslogos.png | wsh /usr/local/bin/convert -implode 1 - - | to -g blob

will grab a png file from the web, pipe it through ImageMagick on the desktop, and display the result in pigshell.

To visualize disk usage in a zoomable treemap,

pig> wsh du /Users/foo | to -g text | template -ig /templates/d3-du-treemap

du-treemap

(Note that du of a deep tree may take a while; try a shallow directory tree first.)

Finally, an example from "7 command-line tools for data science". This assumes that you have R and ggplot2 already installed.

pig> cd /home; cp https://raw.github.com/jeroenjanssens/data-science-toolbox/master/tools/Rio .

Rio is a bash script to coax R into running with stdin/stdout.

pig> cat https://raw.github.com/pydata/pandas/master/pandas/tests/data/iris.csv | wsh bash ./Rio -ge 'g+geom_point(aes(x=SepalLength,y=SepalWidth,colour=Name))' | to -g blob

Rio

Frequently Asked Questions

  1. What about the privacy of my data?

    Your data stays 100% private. The app is all static files and runs completely as client-side Javascript. The server cannot see any data or access tokens.

  2. Why does ls | cat in pigshell behave like ls | xargs cat in Unix?

    Pigshell passes objects (rather than opaque data) over the pipe. ls in its normal incarnation emits file objects. cat (and other filter commands) receiving file objects will process them not as text, but as files. You can think of it as an implicit xargs.

    ls -l emits text strings, so ls -l | cat will behave the same on both pigshell and a Unix shell. While initially confusing, this is the more natural behaviour in a web environment thickly populated with structured objects. See the User guide for more details.

  3. I can't see the cursor.

    Long-running or CPU-intensive commands may sometimes freeze the page for a few seconds. It is also possible that keyboard focus has gone elsewhere. Click on the last prompt and focus should return there.

  4. There are two cursors on the screen. What do I do now?

    You probably ran a command which is reading something from standard input, and it has opened up a little line just below its command line with a cursor. You may click on that line to move focus there and enter input. You can use Ctrl-D at the beginning of a new line to signal end of input.

    If it was a mistake, then you can simply click on the latest prompt and continue issuing new commands. You can ignore or kill the old command.

  5. How do I reboot?

    Reload the page. Holding down Shift while reloading, or clearing your browser cache, may help.

  6. What commands are available?

    Most commands are built-in, while a few are scripts found in /bin. You can see a list of commands using the help command. Help for a specific command like ls may be seen either using ls -h or help ls.

  7. What browsers are supported?

    Pigshell should work on most modern browsers. We use Chrome on MacOS as our primary dev/test platform, but Firefox, Safari on MacOS, Chrome and Firefox on Linux, and Chrome on Windows work as well.

    Firefox on Windows has known issues with stack overflows. There is no plan to support IE.

    Nothing on the iPad works currently due to keyboard input issues.

  8. How do I write my own scripts?

    The syntax is close enough to bash, and looking through existing scripts in /bin should give you enough of an idea to start writing your own. A trip to the User Guide is definitely recommended, though.

    The edit command implements a simple, minimal editor using CodeMirror.

    You could also create/edit scripts on your desktop using your favourite editor and run them from somewhere under /home.

  9. How do I store scripts so that they survive a "reboot"?

    The /local filesystem uses the browser's HTML5 localStorage as its backing store. Files stored there will survive a page reload (aka "reboot"). Note that localStorage is typically limited to about 5 MB per site, so this is only suitable for saving small files like scripts.

    It is much better to keep scripts inside /home, as this area is backed by your host filesystem.

  10. How do I rename files?

    Sorry, renaming files is not supported at the moment.

  11. How do I figure out what attributes a file object contains?

    Use stat <file> or printf -j <file>. Most files have a raw attribute containing all the information returned by the backend API.

  12. How do I make sense of these errors? I get parse error for a line which is most certainly correct. Expected "#", "\n", "\r", "\r\n"? WTF?

    Sorry. Error reporting is still in the "PC Load Letter" era. The line number indicates the beginning of a block which failed to parse. So if you have a long multi-line if construct with an error somewhere in the middle, it will flag the if line as the source of the error.

    The best way right now is to comment out chunks of the block to figure out which one is causing the real trouble.

  13. What data sources are supported, and what operations work on those files?

    • Facebook: Creating new albums, reading (but not editing!) photos, writing new photos. Photos created with Pigshell can be deleted. Reading of posts, writing of text posts.

    • Google Drive: Reading, creating, deleting files.

    • Picasa: Reading and editing of photos. Creating and deleting photos supported if you are running Psty, as they need a proxy to duck CORS issues.

Philosophy

  1. What is unique about pigshell? How is this different from IFTTT/YQL/ ?

    Pigshell provides a Unix-like CLI environment to converse with web data in an exploratory, improvisational style, with simple commands and pipes to process files. It occupies the same evolutionary niche as the Unix shell and shell-scripting - everyday, casual programming.

    Services like IFTTT are about setting up data flows, like cron jobs. Pigshell is about exploring, processing and visualizing data. Somewhat like YQL, but based around Unix/shell/file idioms rather than SQL.

  2. What are the common use cases for pigshell?

    1. Quick and dirty web scraping, grepping, visualization.
    2. Data movement across the web: e.g. copying photos from Picasa to Facebook to the desktop and vice versa.
    3. Data movement between the cloud and the desktop.
    4. Write scripts to customize and personalize the experience of navigating the web.

More

The user guide has more detailed coverage of pigshell concepts and the scripting language.

Contact

Email us at dev@pigshell.com or tweet @pigshell