Pigshell User Guide
Pigshell is a web app which presents resources on the web, including public web pages and private data in Facebook, Google Drive and Picasa albums, as files in a hierarchical file system. It provides a command line interface to construct pipelines of simple commands to filter, transform and display data.
The pigshell system is similar in spirit to Unix and bash, but also borrows
several ideas from Plan 9 and
rc, mixing syntax and features in a manner
calculated to annoy experienced users of both systems.
The name pigshell comes from the time-honoured tradition of weak puns and recursive acronyms: GNU's Not Unix, and PIG Isn't GNU.
The shell and shell scripts occupy an important niche in the Unix users' universe: they can quickly assemble ad-hoc tools from simple components to interact with their data. For complex applications, they might open the editor and write a program, but for hundreds of simple operations, the humble shell suffices.
There is no equivalent in the world of the web and the cloud, though an increasing amount of our data resides there. One is forced to go through GUIs, each with their individual warts and annoyances. Imagine having to open a different GUI application every time you accessed a different disk, with no way to directly copy from one disk to the other. The alternative is to crack open an editor, read up a plethora of API documents and do a fair amount of coding and debugging before getting the first trickle of data to go from point A to B.
Pigshell is a place to have informal conversations with data.
In this document, we describe the different components of the system, their main features and examples of usage. In addition, we will also point out the more prominent gotchas, unimplemented features and bugs.
Broadly, Pigshell consists of the following:
- The shell itself.
- Built-in commands.
- Filesystems, which represent resources from various data providers as files.
The shell is designed to feel familiar to Unix and bash users, but there are crucial differences. The most important of these are:
- Objects are the fundamental currency of the system. Objects are passed across pipes, rather than streams of unstructured data. The web environment frequently returns structured objects, in JSON, for instance, and there is no point in losing that structure and recovering it in every other stage of the pipeline.
- Commands are not really concurrently running processes. They are generator functions which yield objects. Pipes are basically operators to compose long chains of these functions. In the pipeline `ls -l | grep foo | sum`, the (implicit) `Stdout` function asks its upstream function (`sum`) for an item, which in turn does the same to its upstream function (`grep`) and so on. `ls` yields a File object, which `grep` filters, then asks `ls` for another; `ls` yields another File, which `grep` passes on to `sum`; `sum` increments a counter, then asks `grep` for more. Finally, `ls` signals that it's out of objects; this is relayed up the pipeline to `sum`, which outputs the counter to `Stdout`.
- The pipeline is the fundamental unit of "process management". You can kill, stop, resume pipelines of commands, rather than individual commands themselves.
The shell presents itself as a terminal with a command line. Emacs-style command line editing is possible. Common shortcuts include:
- Ctrl-A, Ctrl-E: Go to the beginning or end of line.
- Ctrl-U, Ctrl-K: Kill text up to the beginning or end of line.
- Ctrl-W: Kill previous word.
- Ctrl-L: Clear screen.
- Up arrow, Down arrow: Navigate through command history.
- Ctrl-D: End of input.
The shell displays a primary prompt when it is ready for a command. When you type a command at the primary prompt and hit Enter, it starts running immediately. This is the foreground command. A secondary prompt of `>` is displayed.
You can use this prompt to typeahead another command, which will be executed after the foreground command completes. You can queue multiple commands in this way, and they will be executed in strict sequence.
To kill the foreground command, use Ctrl-C. This also triggers the running of the next queued command, if any.
Similarly, to pause the foreground command and continue with any queued commands, use Ctrl-Z. The paused command can be resumed using the `start` command described under process management.
You may use Ctrl-B to "background" the foreground command and start running the next queued command. This is typically done when the foreground command is going to run for several seconds, and the queued command is not dependent on its predecessor.
The output of commands is restricted to an (elastic) area below the command line. Thus, many commands may be running and generating output at the same time without stomping over each other, maintaining the question-answer structure of the command line conversation.
This also means that multiple commands may be waiting for input, as indicated by blinking cursors. Simply click next to the cursor to switch focus.
The running status of a pipeline is visually indicated by the colour of the prompt.
- A green prompt indicates that the command is running.
- Amber indicates that it is stopped.
- Black indicates that it has completed with a successful exit status.
- Red indicates that it has completed with an unsuccessful exit status.
Reloading the webpage is equivalent to rebooting the system: all transient state is lost. Only files stored in `/local` and filesystems backed by a persistent remote store (e.g. PstyFS, Google) will survive a reboot.
ಠ_ಠ Occasionally, things may get buggered up to the point that there is no cursor visible anywhere. In such cases, simply click near the last prompt and you should get focus there, and resume typing commands.
ಠ_ಠ Cut and paste is also somewhat iffy.
Basic command syntax, wildcards and redirection work much as in bash:

```
ls | sum
echo able baker charlie >/tmpfile
echo some more >>/tmpfile
ls A*
ls *.jpg
cat bar | grep foo >/dev/null && echo "bar contains foo"
cat < asd > bsd
rm somefile || echo rm failed!
```
- To quote an argument containing spaces or special characters, it must be enclosed in single or double quotes. There is no difference between the two. Variable interpolation is not done for arguments in double quotes.
- Arguments with one type of quote may be enclosed in the other, e.g. "Patrick O'Brian" and 'Benjamin "Bugsy" Siegel'.
- Backslashes may be used to escape special characters in unquoted strings.
Pigshell variables are lists of objects. Most commonly, they are lists of strings. Variables may be assigned values in the usual manner:
```
msg="How's it going?"
dirs=(/facebook /twitter /gdrive)
```
Parentheses are used to enclose lists. The variable `dirs` is thus assigned a list of three strings, while `msg` is a list containing one string.

Lists are expanded on reference:

```
echo $dirs
/facebook /twitter /gdrive
```

Here, the `echo` command is invoked with three arguments.

To add to a list:

```
dirs=($dirs /picasa)
echo $dirs
/facebook /twitter /gdrive /picasa
```
Variables may be subscripted by a list of numbers (or a list of expressions yielding numbers) to retrieve part of the list. List indexing starts at zero:

```
index=0
echo $dirs($index 2 $index)
/facebook /gdrive /facebook
```
The number of elements in the variable `dirs` can be found using `$#dirs`.
One can do the equivalent of an `array.join(' ')` using the `$"` form:

```
words=(Holy Plan9 Ripoff Batman)
sent=$"words
```

`echo $words` and `echo $sent` will both print `Holy Plan9 Ripoff Batman`, while `echo $#words $#sent` will print `4 1`.
Referring to a nonexistent variable yields an empty list, referring to its length gives 0, and `$"nonexistent` gives the empty string.
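A quick check of these defaults, using a variable name that has never been assigned (`nosuch` is illustrative):

```
echo $#nosuch
0
```

Since `$nosuch` expands to an empty list, `echo $nosuch` is equivalent to a bare `echo`.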
Local Scope: Positional variables (such as `$1` and `$*`) and variables whose names begin with an underscore (e.g. `_foo`) are local to the enclosing function or shell.
Global Scope: All other variables are global to the shell, and may be freely referenced and set inside functions.
Exports: There is no notion of `export`; copies of all global variables are inherited by a child shell from its parent. Changing a variable in a child will not affect the value in the parent.
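To illustrate the scoping rules, here is a hedged sketch (function syntax is described later in this guide; `bump` is a hypothetical name, and the `E` builtin described later is used for arithmetic):

```
count=0
bump() {
    _delta=$1
    count=$(E $count + $_delta)
}
bump 5
echo $count
echo $#_delta
```

Since `count` is global, `echo $count` should print 5 after the call; `_delta` is local to `bump`, so `$#_delta` is 0 outside it.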
Arguments may be concatenated using the `^` operator. In most cases, it is not necessary, since pigshell will automatically concatenate arguments which adjoin each other without any intervening whitespace. For example, in

```
able=able; baker=baker; echo "able"baker able'baker' "able"'baker' able$baker $able^baker $able$baker
```

`echo` has 6 arguments, each of which is `ablebaker`. Note that a caret was only required to resolve ambiguity in one case.
The rules for concatenating lists are as follows:
- Concatenation is a left-binding operator, i.e. `a^b^c` is parsed as `(a^b)^c`.
- Concatenation operates on strings. List elements are coerced into strings using their `toString()` method before concatenation.
- An empty list A concatenated with a list B will yield B.
- A list A with a single element concatenated with B will yield a list where A(0) is concatenated with every element of B.

```
a=able; b=(1 2 3)
echo $a^$b
able1 able2 able3
echo $b^$a
1able 2able 3able
```
- If lists A and B have the same number of elements, the result is a list of strings concatenated pairwise.

```
a=(able baker charlie); b=(1 2 3)
echo $a^$b
able1 baker2 charlie3
```
- Lists not conforming to any of the above rules cannot be concatenated.
Command substitution allows the standard output of a command to be converted
into an expression, which may be used as a command argument or assigned to a
variable. Pigshell supports only the `$(command)` form, not the backtick form. For example:

```
nfiles=$(ls | sum)
echo "Number of files: " $(ls | sum)
files=$(ls)
```

Here, `files` contains a list of File objects. Command substitution is the easiest way to get objects into variables.
Command substitutions may be nested:

```
echo $(printf -s $format $i $(cat $i/status) $(cat $i/cmdline))
```
Control Flow - if
The syntax of the `if` construct is very similar to bash:

```
if cond ; then tcmd...
[ ; elif cond ; then tcmd... ]
[ ; else ecmd... ]
; fi
```

If the exit value of the cond command is `true`, we enter the `then` branch. Any exit value other than `true` is considered false. Commands may be spread over multiple lines, as in bash.
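A concrete sketch, combining `if` with the `T` builtin described under the built-in commands:

```
if T $#dirs -gt 0 ; then echo "dirs is not empty" ; else echo "dirs is empty" ; fi
```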
Control Flow - for
`for` loops are also similar to bash:

```
for i in list ; do cmd... ; done
```
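For instance, a sketch that visits each directory in the `dirs` list from the Variables section:

```
for dir in $dirs ; do echo "mounted:" $dir ; done
```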
Control Flow - while
`while` loops are, again, similar to bash:

```
while cond ; do cmd... ; done
```
ಠ_ಠ Running a potentially infinite while loop from the CLI is not advisable, as it cannot be killed. Doing so from a script is fine, since scripts execute within a new shell which can be killed.
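A bounded loop sketch using the `T` and `E` builtins described under the built-in commands, counting from 0 to 2:

```
i=0
while T $i -lt 3 ; do echo $i ; i=$(E $i + 1) ; done
```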
Functions can be defined as follows:

```
funcname() { cmd... }
```

Functions behave like inline scripts in how they are invoked, how arguments are accessed within the body, and their ability to be part of pipelines:

```
funcname arg1 arg2
funcname arg1 arg2 | grep foo
```
Arguments are accessed within the body of the function using positional variables such as `$1` and `$2`.
All global variables accessed, defined and modified in the body of a function are part of the global scope of the enclosing shell. Variables whose names begin with an underscore are local to the function.
Function definitions may be deleted using

```
funcname()
```

with no body. Note that this is different from

```
funcname() {}
```

which is a function with an empty body.
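As a sketch, here is a small function composed into a pipeline, built from commands shown elsewhere in this guide (`females` is an illustrative name):

```
females() {
    ls /facebook/friends | grep -f gender "^female"
}
females | sort -f friend_count
```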
To execute a command, pigshell searches within its builtins and `/bin` for a match, in that order. If a command contains a path separator, then it is looked up directly in the filesystem without going through the search process.
ಠ_ಠ There is no PATH variable as yet. It is more likely we will move towards union directories like Plan 9.
The following special variables are maintained by pigshell:
- `$0`, `$1` .. `$n`, `$*`, `$#`: These variables are used inside a script to determine individual arguments to the script, the list of arguments, and the number of arguments, respectively.
- `$?`: Exit value of the last command; `true` for successful commands.
- `$!`: PID of the most recently executed pipeline.
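For example, a script saved as `/local/bin/greet` (a hypothetical name) could use these variables as follows:

```
echo "got" $# "arguments:" $*
for name in $* ; do echo "Hello," $name ; done
```

Running `greet able baker` would then report two arguments and greet each in turn.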
- All builtin commands may be listed by the `help` command. Specific usage of a given command, say, `grep`, may be obtained using either `help grep` or `grep -h`. All builtins support the `-h` option.
- All pipelines have an implicit `Stdin` and `Stdout` "command" at the head and tail respectively. Objects which reach `Stdout` are displayed according to their type. Objects like files have an `html` attribute which is used to render them to the output div.
Filter commands like `grep`, `sort` and `printf` take in files, filter or transform them, and emit objects to `Stdout`. These commands can be supplied with files in one of two ways:

- As a list of arguments, corresponding to the `<file>...` option given in the usage. These arguments may be strings representing file paths, actual File objects, or a mixture of both, e.g.

```
grep -f gender "female" /facebook/friends/*
grep -f gender "female" /facebook/friends/A* $close_friends
```

where the `close_friends` variable contains a list of File objects.
- As a list of File objects on `Stdin`, e.g.

```
ls /facebook/friends | grep -f gender "female"
echo $close_friends | grep -f gender "female"
```
If you accidentally fail to give either of these, a line with a blinking cursor will open up below the command. This is `Stdin` trying to get input from the terminal. Typing into this line and pressing Enter will feed a string to the command. To indicate end of input, type Ctrl-D. To simply get out, click to the right of the latest shell prompt to move focus there.
Many commands which operate on objects have options to specify or extract attributes from the object.
The `-f` option is commonly used to refer to a field in the object. For instance, File objects corresponding to Facebook friends have attributes like `name`, `gender`, `friend_count`, etc. You can thus run

```
ls /facebook/friends | grep -f gender "^male"
ls /facebook/friends | sort -f friend_count
```

to filter or sort on those specific fields.
You can access nested attributes as well:

```
ls /facebook/friends | grep -f raw.relationship_status single
```
Commands like `grep` and `sort` also take an `-e` option, which accepts a Javascript expression:

```
ls /picasa/albums/Blah | sort -e "x.width * x.height"
```

This sorts photos based on how many pixels they contain. The expression is called with the argument `x` set to the object, where `width` and `height` are attributes of the object.
Important Built-in Commands
We will briefly go over the more important of these commands. Each of them deserves a man-page worth of elaboration, but hopefully these will suffice in the interim.
`ls` normally emits File objects to standard output, except if the `-l` option is given, in which case it emits a text-based long listing. Just like Unix `ls`, it will normally display the contents of directories unless the `-d` flag is supplied. This is sometimes confusing: the equivalent of `sort -f friend_count /facebook/friends/*` is not `ls /facebook/friends/* | sort -f friend_count`, which would descend inside friend directories, but one of `ls -d /facebook/friends/* | sort -f friend_count` or `ls /facebook/friends | sort -f friend_count`.
`T`: This is the equivalent of `test`, shortened to a T. Operators are mostly the same as in `test`: `=`, `!=`, `<` and `>` are string comparison operators; `<` and `>` need to be quoted to avoid being interpreted as redirectors. `-eq`, `-ne`, `-lt`, `-le`, `-gt` and `-ge` are the arithmetic operators. Though `-z` and `-n` are provided, it is better to check for an empty variable using `T $#varname -eq 0`.
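Typical uses of `T`, sketched with the variables from earlier sections:

```
T $#dirs -eq 0 || echo "dirs is not empty"
T able "<" baker && echo "able sorts first"
```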
`E`: This is the equivalent of `expr`. Arguments are concatenated and `eval`ed. Incrementing a variable can be done like this:

```
counter=$(E $counter + 1)
```
`cat`: Does what you would expect: given a filename or a File object, it reads the contents and dumps them to standard output. All of the following:

```
cat /doc/README.md
cat $(ls /doc/README.md)
ls /doc/README.md | cat
```

have the same effect: dumping the contents of README.md onto the terminal. In addition, `cat` also copies strings received at standard input to output.
edit: A barebones editor based on CodeMirror. Common Emacs bindings work.
`grep`: Object filter. You can select objects by a regex matched against their string representation, by specific object fields, or by an expression involving object attributes. For example,

```
ls /facebook/me/albums/FooAlbum/* | grep -e 'x.likes > 100'
```

selects photos with more than 100 likes, while

```
grep "^[a-d]" /facebook/me/albums/*/*
```

outputs File objects (and not the contents of those files) corresponding to photos with names beginning with a-d. `grep` will also filter text from standard input:

```
cat /doc/README.md | grep '^Pig'
```
`printf`: String formatting and printing. It can be used for standard printf-style formatting, e.g.

```
printf "%-20s %s" $name $message
```

as well as for printing object attributes:

```
ls /facebook/friends | printf "%(name)-20s %(friend_count)s"
```

where `name` and `friend_count` are attributes. `printf -j` dumps a JSON representation of an object, and is very useful for finding out what fields it contains.
`sum`: Equivalent of `wc`; it was renamed because it counts objects rather than words. It can sum up fields and expressions as well.
`docopt`: Based on docopt, it is used for easily processing command line options in scripts. Typically, you declare a multi-line usage string at the beginning of the script and invoke the `docopt` command with `$*`. If successful, shell variables with appropriate names will be populated. Slightly magical, but far more usable than getopt. You can look at the scripts in `/bin` for details of usage.
Pipeline status and control files are exposed in a special /proc filesystem, so simple scripts in /bin are sufficient to implement process management.
- ps: Lists running pipelines by PID, state and commands.
- kill: Kills one or more pipelines by PID.
- stop: Stops a pipeline. Equivalent to the Unix `SIGSTOP`.
- start: Resumes a pipeline. Equivalent to the Unix `SIGCONT`.
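A typical (hypothetical) session, assuming the pipeline of interest has PID 2:

```
ps
stop 2
start 2
kill 2
```

`ps` is first used to find the PID; the pipeline is then paused, resumed, and finally killed.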
Pigshell represents cloud resources and system resources as files. Filesystems are responsible for maintaining local file objects corresponding to remote resources. We will briefly go over the filesystems currently supported.
Facebook: Click the Connect Facebook button to mount your Facebook account under `/facebook`.
Google: Supports Picasa and Google Drive. Click the Connect Google button to mount Picasa albums under `/picasa` and GDrive under `/gdrive`.
Download: Presents a single directory, `/download`. You may copy files into this directory to download them to the desktop.
Upload: Click the Upload button in the right menu and select files. Alternately, drag and drop files onto the terminal. These files will be available under `/upload` and can be copied from there to a target directory.
Proc: The proc filesystem, mounted at `/proc`, maintains a directory corresponding to each running pipeline. Each directory has the following files:
- cmdline: Command line corresponding to the pipeline.
- status: Read-only, contains one of 'start', 'stop', 'done'.
- ctl: Write-only. Write 'stop' to stop a pipeline, 'start' to resume it, 'kill' to kill it.
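The process management scripts are thin wrappers over these files; the same effects can be had by hand (PID 2 is illustrative):

```
cat /proc/2/status
echo stop > /proc/2/ctl
echo start > /proc/2/ctl
echo kill > /proc/2/ctl
```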
Lstor: Mounted at `/local`, this filesystem is backed by HTML5 local storage. Files stored here will survive "reboots". `/local/bin` is a good place to store your personal scripts.
Pigshell is inspired by Unix and Plan 9. We are very familiar with several Unix implementations, but our experience with Plan 9 is purely platonic. We have tried to retain as much of a `bash` flavour as possible, to make it easy for experienced Unix users to start using the system and incrementally discover features, without having to read a long and tedious document like this one.
The command line interface is an important mode of human-computer interaction which has not seen much change since the advent of tab completion. Modern HTML layout engines offer interesting possibilities which Pigshell attempts to explore.
The pigshell grammar is implemented using a PEG, which is far easier to specify and debug than BNF. The disadvantage is somewhat poor error reporting.