Pigshell User Guide
Introduction
Pigshell is a web app which presents resources on the web, including public web pages and private data in Facebook, Google Drive and Picasa albums, as files in a hierarchical file system. It provides a command line interface to construct pipelines of simple commands to filter, transform and display data.
The pigshell system is similar in spirit to Unix and bash, but also borrows several ideas from Plan 9 and rc, mixing syntax and features in a manner calculated to annoy experienced users of both systems.
The name pigshell comes from the time-honoured tradition of weak puns and recursive acronyms: GNU's Not Unix, and PIG Isn't GNU.
The shell and shell scripts occupy an important niche in the Unix users' universe: they can quickly assemble ad-hoc tools from simple components to interact with their data. For complex applications, they might open the editor and write a program, but for hundreds of simple operations, the humble shell suffices.
There is no equivalent in the world of the web and the cloud, though an increasing amount of our data resides there. One is forced to go through GUIs, each with their individual warts and annoyances. Imagine having to open a different GUI application every time you accessed a different disk, with no way to directly copy from one disk to the other. The alternative is to crack open an editor, read up on a plethora of API documents and do a fair amount of coding and debugging before getting the first trickle of data to go from point A to point B.
Pigshell is a place to have informal conversations with data.
In this document, we describe the different components of the system, their main features and examples of usage. In addition, we will also point out the more prominent gotchas, unimplemented features and bugs.
Broadly, Pigshell consists of the following:
- The shell itself.
- Built-in commands.
- Filesystems, which represent resources from various data providers as files.
Shell
The shell is designed to feel familiar to Unix and bash users, but there are crucial differences. The most important of these are:
- Objects are the fundamental currency of the system. Objects are passed across pipes, rather than streams of unstructured data. The web environment frequently returns structured objects, in JSON, for instance, and there is no point in losing that structure and recovering it in every other stage of the pipeline.
- Commands are not really concurrently running processes. They are generator functions which yield objects. Pipes are basically operators to compose long chains of these functions. In the pipeline ls -l | grep foo | sum, the (implicit) Stdout function asks its upstream function (sum) for an item, which in turn does the same to its upstream function (grep) and so on. ls yields a File object, which grep filters, then asks ls for more. ls yields another File which grep passes on to sum, which increments a counter, then asks grep for more. Finally, ls signals that it is out of objects, which is relayed by grep down to sum, which outputs the counter to Stdout.
- The pipeline is the fundamental unit of "process management". You can kill, stop and resume pipelines of commands, rather than individual commands themselves.
Terminal usage
The shell presents itself as a terminal with a command line. Emacs-style command line editing is possible. Common shortcuts include:
- Ctrl-A, Ctrl-E: Go to the beginning or end of line.
- Ctrl-U, Ctrl-K: Kill text up to the beginning or end of line.
- Ctrl-W: Kill previous word.
- Ctrl-L: Clear screen.
- Up arrow, Down arrow: Navigate through command history.
- Ctrl-D: End of input.
The primary prompt consists of pig<basename_of_cwd>$.
When you type a command at the primary prompt and hit Enter, it starts running immediately. This is the foreground command. A secondary prompt of > is displayed.
You can use this prompt to typeahead another command, which will be executed after the foreground command completes. You can queue multiple commands in this way, and they will be executed in strict sequence.
To kill the foreground command, use Ctrl-C. This also triggers the running of the next queued command, if any.
Similarly, to pause the foreground command and continue with any queued commands, use Ctrl-Z. The paused command can be resumed using ps and start.
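For example, after pausing a long-running command with Ctrl-Z, you could resume it along these lines (a sketch; the PID 2 is hypothetical, use whatever ps reports for the stopped pipeline):
ps
start 2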
You may use Ctrl-B to "background" the foreground command and start running the next queued command. This is typically done when the foreground command is going to run for several seconds, and the queued command is not dependent on its predecessor.
The output of commands is restricted to an (elastic) area below the command line. Thus, many commands may be running and generating output at the same time without stomping over each other, maintaining the question-answer structure of the command line conversation.
This also means that multiple commands may be waiting for input, as indicated by blinking cursors. Simply click next to the cursor to switch focus.
The running status of a pipeline is visually indicated by the colour of the prompt.
- A green prompt indicates that the command is running.
- Amber indicates that it is stopped.
- Black indicates that it has completed with a successful exit status.
- Red indicates that it has completed with an unsuccessful exit status.
Reloading the webpage is equivalent to rebooting the system: all in-memory state is lost. Only files stored in /local and filesystems backed by a persistent remote store (e.g. PstyFS, Google) will survive a reboot.
ಠ_ಠ Occasionally, things may get buggered up to the point that there is no cursor visible anywhere. In such cases, simply click near the last prompt and you should get focus there, and resume typing commands.
ಠ_ಠ Cut and paste is also somewhat iffy.
Simple commands
ls | sum
echo able baker charlie >/tmpfile
echo some more >>/tmpfile
ls A*
ls *.jpg
cat bar | grep foo >/dev/null && echo "bar contains foo"
cat < asd > bsd
rm somefile || echo rm failed!
Escaping arguments
- To quote an argument containing spaces or special characters, it must be enclosed in single or double quotes. There is no difference between the two; in particular, unlike bash, variable interpolation is not done even for arguments in double quotes.
- Arguments with one type of quote may be enclosed in the other, e.g. "Patrick O'Brian" and 'Benjamin "Bugsy" Siegel'.
- Backslashes may be used to escape special characters in unquoted strings.
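Putting these rules together, each of the following passes a single argument to echo (an illustrative sketch):
echo "Patrick O'Brian"
echo 'Benjamin "Bugsy" Siegel'
echo able\ baker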
Variables
Pigshell variables are lists of objects. Most commonly, they are lists of strings. Variables may be assigned values in the usual manner:
msg="How's it going?"
dirs=(/facebook /twitter /gdrive)
Parentheses are used to enclose lists. The variable dirs is thus assigned a list of three strings, while msg is a list containing a single string.
Lists are expanded on reference.
echo $dirs
would yield /facebook /twitter /gdrive, i.e. the echo command is invoked with three arguments.
To add to a list:
dirs=($dirs /picasa)
echo $dirs
would now give /facebook /twitter /gdrive /picasa
Variables may be subscripted by a list of numbers (or a list of expressions yielding numbers) to retrieve part of the list. List indexing starts at zero. For example,
index=0
echo $dirs($index 2 $index)
would give /facebook /gdrive /facebook
The number of elements in the variable dirs can be found using $#dirs.
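As a quick check, continuing the example above (where /picasa was appended to dirs):
echo $#dirs
would print 4.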
One can do the equivalent of an array.join(' ') using the $" operator.
words=(Holy Plan9 Ripoff Batman)
sent=$"words
echo $words
and
echo $sent
will both print Holy Plan9 Ripoff Batman. Note that
echo $#words $#sent
will print 4 1
Referring to a nonexistent variable yields an empty list, referring to its length gives 0, and $"nonexistent gives the empty string.
Variable Scope
- Local Scope: Positional variables ($1, $2 ... $*) and variables whose names begin with an underscore (e.g. _i, _foo) are local to the enclosing function or shell.
- Global Scope: All other variables are global to the shell, and may be freely referenced and set inside functions.
- Exports: There is no notion of export; copies of all global variables are inherited by a child shell from its parent. Changing a variable in a child will not affect the value in the parent.
Concatenation
Arguments may be concatenated using the ^ operator. In most cases, it is not necessary, since pigshell will automatically concatenate arguments which adjoin each other without any intervening whitespace. For example, in the command
able=able; baker=baker; echo "able"baker able'baker' "able"'baker' able$baker $able^baker $able$baker
echo has 6 arguments, each of which is ablebaker. Note that a caret was only required to resolve ambiguity in one case.
The rules for concatenating lists are as follows:
- Concatenation is a left-binding operator, i.e. a^b^c is parsed as (a^b)^c.
- Concatenation operates on strings. List elements are coerced into strings using the toString() method before concatenation.
- An empty list A concatenated with a list B will yield B.
- A list A with a single element concatenated with B will yield a list where A(0) is concatenated with every element of B. Thus
a=able; b=(1 2 3)
echo $a$b
gives able1 able2 able3, and
echo $b$a
gives 1able 2able 3able
- If lists A and B have the same number of elements, the result is a list of strings concatenated pairwise. Thus
a=(able baker charlie); b=(1 2 3)
echo $a$b
gives able1 baker2 charlie3
- Lists not conforming to any of the above rules cannot be concatenated.
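Concatenation is handy for building paths from variables. For instance, using the single-element rule above (a sketch):
dir=/facebook
echo $dir^/friends
would print /facebook/friends.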
Command substitution
Command substitution allows the standard output of a command to be converted into an expression, which may be used as a command argument or assigned to a variable. Pigshell supports only the $(command) form, not the backtick form. For example,
files=$(ls)
nfiles=$(ls | sum)
echo "Number of files: " $(ls | sum)
Note that files contains a list of File objects. Command substitution is the easiest way to get objects into variables.
Command substitutions may be nested:
echo $(printf -s $format $i $(cat $i/status) $(cat $i/cmdline))
Control Flow - if
The syntax of the if construct is very similar to bash:
if cond; then tcmd... [; elif cond; then tcmd... ] [; else ecmd... ]; fi
If the exit value of the cond command is true, we enter the then clause. Any exit value other than true is considered false. Commands may be spread over multiple lines, like in bash.
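For instance, the && example from the Simple commands section can be rewritten with an explicit if (a sketch; assumes a file named bar exists):
if cat bar | grep foo >/dev/null; then echo "bar contains foo"; else echo "no foo in bar"; fi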
Control Flow - for
for loops are also similar to bash:
for i in list; do cmd...; done
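For example, using the dirs variable from the Variables section (a minimal sketch):
for i in $dirs; do echo $i; done
would echo each directory in the list in turn.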
Control Flow - while
while loops are, again, similar to bash:
while cond; do cmd...; done
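A bounded example, using the T and E builtins described later (a sketch):
i=0
while T $i -lt 3; do echo $i; i=$(E $i + 1); done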
ಠ_ಠ Running a potentially infinite while loop from the CLI is not advisable, as it cannot be killed. Doing so from a script is fine, since scripts execute within a new shell which can be killed.
Functions
Functions can be defined as follows:
function funcname { cmd.. }
Functions behave like inline scripts in how they are invoked, how arguments are accessed within the body, and their ability to be part of pipelines.
funcname arg1 arg2
funcname arg1 arg2 | grep foo
Arguments are accessed within the body of the function using positional arguments, $0...$n and $*.
All global variables accessed, defined and modified in the body of a function are part of the global scope of the enclosing shell. Variables whose names begin with an underscore are local to the function.
Function definitions may be deleted using
function funcname
with no body. Note that this is different from
function funcname {}
which is a function with an empty body.
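For example, a (hypothetical) function and an invocation:
function greet {
    echo Hello $*
}
greet World
would print Hello World.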
Command Execution
To execute a command, pigshell searches within its builtins and /bin for a match, in that order. If a command contains a path separator, it is looked up directly in the filesystem without going through the search process.
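For example, ps (implemented as a script in /bin; see Process Management below) can be invoked either way (a sketch):
ps
/bin/ps
The first form is found by searching builtins and then /bin; the second contains a path separator and so bypasses the search.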
ಠ_ಠ There is no PATH variable as yet. It is more likely we will move towards union directories like Plan 9.
Special Variables
The following special variables are maintained by pigshell:
- $0, $1 .. $n, $*, $#: These variables are used inside a script to determine individual arguments to the script, the list of arguments, and the number of arguments respectively.
- $?: Exit value of the last command; true for successful commands.
- $!: PID of the latest executed pipeline.
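For example (a sketch; nosuchfile is a hypothetical name that does not exist):
rm nosuchfile
echo $?
The second command would print something other than true, since the rm failed.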
Built-in Commands
Pigshell has a large number of built-in commands. These commands are implemented in Javascript and have access to all the internal APIs and filesystems. Many of these commands follow a common set of idioms.
- All builtin commands may be listed by the help command. Specific usage of a given command, say, grep, may be obtained either using help grep or grep -h. All builtins support the -h option.
- All pipelines have an implicit Stdin and Stdout "command" at the head and tail respectively. Objects which reach Stdout are displayed according to their type. Objects like files have an html attribute which is used to render them to the output div.
- Filter commands like grep and printf take in files, filter or transform them, and emit objects to Stdout. These commands can be supplied with files in one of two ways:
  - As a list of arguments, corresponding to the <file>... option given in the usage. These arguments may be strings representing file paths, actual File objects, or a mixture of both. e.g.
    grep -f gender "female" /facebook/friends/*
    grep -f gender "female" /facebook/friends/A* $close_friends
    where the close_friends variable is a list of File objects.
  - As a list of File objects from Stdin. e.g.
    ls /facebook/friends | grep -f gender "female"
    echo $close_friends | grep -f gender "female"
  If you accidentally fail to supply either of these, a line with a blinking cursor will open up below the command. This is Stdin trying to get input from the terminal. Typing into this line and pressing Enter will feed a string to the command. To indicate end of input, type Ctrl-D. To simply get out, click to the right of the latest shell prompt to move focus there.
- Many commands which operate on objects have options to specify or extract attributes from the object. The -f option is commonly used to refer to a field in the object. For instance, File objects corresponding to Facebook friends have attributes like gender, friend_count, etc. You can thus
    ls /facebook/friends | grep -f gender "^male"
    ls /facebook/friends | sort -f friend_count
  to use those specific fields for filtering or sorting. You can access nested attributes as well:
    ls /facebook/friends | grep -f raw.relationship_status single
- The -e option can be used to specify a lambda expression in Javascript which can be used to combine or filter field values in complex ways.
    ls /picasa/albums/Blah | sort -e "x.width * x.height"
  sorts photos based on how many pixels they contain. The expression will be called with the argument x set to the object; width and height are attributes of the object.
Important Built-in Commands
We will briefly go over the more important of these commands. Each of them deserves a man-page worth of elaboration, but hopefully these will suffice in the interim.
ls: ls normally emits File objects to standard output, except if the -l option is given, in which case it emits a text-based long listing. Just like Unix ls, it will normally display the contents of directories unless the -d flag is supplied. This is sometimes confusing: the equivalent of
sort -f friend_count /facebook/friends/*
is not
ls /facebook/friends/* | sort -f friend_count
which would descend inside friend directories, but one of
ls -d /facebook/friends/* | sort -f friend_count
or
ls /facebook/friends | sort -f friend_count
T: This is the equivalent of test, shortened to a T. Operators are bash-like. =, !=, > and < are string comparison operators; > and < need to be quoted to avoid being interpreted as redirectors. -eq, -ne, -lt, -le, -gt, -ge are the arithmetic operators. Though -z and -n are provided, it is better to check for an empty variable using T $#varname -eq 0.
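For example (a sketch, using the dirs variable from the Variables section):
T $#dirs -eq 0 || echo "dirs is not empty"
T 5 -gt 3 && echo "5 is greater"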
E: This is the equivalent of expr. Arguments are concatenated and Javascript eval'ed. Incrementing a variable can be done like this:
counter=$(E $counter + 1)
cat: Does what you would expect: given a filename or a File object, it reads its contents and dumps it to standard output. All the following commands:
cat /doc/README.md
cat $(ls /doc/README.md)
ls /doc/README.md | cat
have the same effect: dumping the contents of README.md onto the terminal. In addition, cat also copies strings received at standard input to output.
edit: A barebones editor based on CodeMirror. Common Emacs bindings work.
grep: Object filter. You can select objects by a regex matched against their string representation, specific object fields, or by an expression involving object attributes. For example,
ls /facebook/me/albums/FooAlbum/* | grep -e 'x.likes > 100'
grep "^[a-d]" /facebook/me/albums/*/*
Both commands output File objects (and not the contents of those files); the second selects photos with names beginning with a-d.
Lastly, grep will filter text from standard input:
cat /doc/README.md | grep '^Pig'
printf: String formatting and printing. It can be used for standard printf-style formatting, e.g.
printf "%-20s %s" $name $message
as well as for printing object attributes:
ls /facebook/friends | printf "%(name)-20s %(friend_count)s"
where name and friend_count are attributes. Finally, printf -j dumps a JSON representation of an object, and is very useful to find out what fields it contains.
sum: Equivalent of wc; it was renamed because it counts objects rather than words. It can sum up fields and expressions as well.
docopt: Based on docopt, it is used for easily processing command line options in scripts. Typically, you declare a multi-line usage string at the beginning of the script and invoke the docopt command with $usage and $*. If successful, shell variables with appropriate names will be populated. Slightly magical, but far more usable than getopt. You can look at scripts like /bin/kill for details of usage.
Process Management
Pipeline status and control files are exposed in a special /proc filesystem, so simple scripts in /bin are sufficient to implement process management.
- ps: Lists running pipelines by PID, state and command.
- kill: Kills one or more pipelines by PID.
- stop: Stops a pipeline. Equivalent to the Unix kill -STOP.
- start: Resumes a pipeline. Equivalent to the Unix kill -CONT.
Filesystems
Pigshell represents cloud resources and system resources as files. Filesystems are responsible for maintaining local file objects corresponding to remote resources. We will briefly go over the filesystems currently supported.
Facebook: Click the Connect Facebook button to mount your Facebook account at /facebook. Pigshell is pure client-side, so privacy is completely assured.
Google: Supports Picasa and Google Drive. Click the Connect Google button to mount Picasa albums under /picasa and GDrive under /gdrive.
Download: Presents a single directory, /download. You may copy files into this directory to download them to the desktop.
Upload: Click the Upload button in the right menu and select files. Alternatively, drag and drop files onto the terminal. These files will be available under /upload and can be copied from there to a target directory.
Proc: The proc filesystem, mounted at /proc, maintains a directory corresponding to each running pipeline. Each directory has the following files:
- cmdline: Command line corresponding to the pipeline.
- status: Read-only, contains one of 'start', 'stop', 'done'.
- ctl: Write-only. Write 'stop' to stop a pipeline, 'start' to resume it, 'kill' to kill it.
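For example, a running pipeline could be stopped and later resumed through these files (a sketch; the PID 3 is hypothetical):
echo stop >/proc/3/ctl
cat /proc/3/status
echo start >/proc/3/ctl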
Lstor: Mounted at /local, this filesystem is backed by HTML5 local storage. Files stored here will survive "reboots". /local/bin is a good place to store your personal scripts.
Design Principles
Pigshell is inspired by Unix and Plan 9. We are very familiar with several Unix implementations, but our experience with Plan 9 is purely platonic. We have tried to retain as much of a bash flavour as possible, to make it easy for experienced Unix users to start using the system and incrementally discover features, without having to read a long and tedious document like this one.
The command line interface is an important mode of human-computer interaction, which has not seen much change since bash invented tab completion. Modern HTML layout engines offer interesting possibilities which Pigshell attempts to explore.
The pigshell grammar is implemented using a PEG, which is far easier to specify and debug than BNF. The disadvantage is somewhat poor error reporting.