This document provides a language-agnostic specification about how application are invoked from an interactive terminal or a text-based/non-gui/batch system. It is intend for developers who wish to understand details and considerations about command-line parsing. It is unlikely to be of much value to end-users looking to add command-line parsing to their own applications. For the implementation in optigrab, please see the accompanying Using Optigrab document.
An innvocattion of command-line application commonly has this structure:
[env] [interpreter] script [verb] [options] [targets]
<-------- arguments ------->
<------------------- command-line -------------------->
(Components in [bracket] are optional.)
The sections below, detail and discuss the considerations for parsing each of the components.
See (Wikipedia: Environment Variable)[https://en.wikipedia.org/wiki/Environment_variable]
Things to note about environment variables: - inherited by child processed - implemented by the OS
The interpreter is the application that executes the script. The interpreter may be specified on the command line or There are numerous ways the interpreter may appear: Rscript
, R CMD BATCH
, r (little-r). On nix systems the interpreter* is optional and may be specified in the script through the #! (SHEBANG) syntax.
The script is the (path to the) program file/code to be run.
Collectively, arguments refers to the [verb}(#Verb), options and target(s). Each type of argument is handled in its own way.
A verb is a single optional command. (Not all applications use verbs; git is one popular application that does). The verb is the first value after the script and before any options.
The constraint that the verb is the first argument before any options is not universally followed. Some application allow placement of the verb anywhere so long as it is cannot be interpretted as part of an option. This introduces ambiguity with applications that allow the binding of multiple values to a name. This is inherently true of R since R atomistic object are vectors and not scalars.
Typically application allow only one verb. It might be easy to define verbs as all values that occur between the name of the script and the first flag (if any). If there are more than one verbs in the arguments array, the first is taken as the verb and a warning is issued
[flag[=][value [value]] [flag[=][value [value]] [...]]]
The options are the name-value(s) pairs that can be parsed into variables. Names are identified by a flag, usually a slight syntactical variation on the name such as preceeding it with one or two characters. “-” (Java-style), “–” (GNU-style) or “/” (Microsoft-style).
-name
--name
/name
Associated with each name can be zero, one or several values. If there are no values provided, the type of the variable is assumed to be boolean/logical and the value is taken as TRUE (1)
Parsing and the interpretaion of options are the focus of this library. Options occur after the script and any verbs and before any targets. The challenge of translating command-line options to variable in the program require several steps. The following conventions are used As an example consider the following simple innvocation:
script --foo bar
Component | Example | Description |
---|---|---|
command-line | script.r –foo bar | referes to thte complete command-line: script, verb, etc. |
arguments verb options targets refer to the script arguments as they appear on the command line. By default they have no type.
opt_string ‘–foo bar’ string; options as they are typed at the command-line
opts c('--foo', 'bar')
character vector; how options appear in commandArgs
flag –foo how the flags can appear on the command line; not needed by the developer
name foo the name used for the option or variable
value bar the value of the variable
The targets are typically files or other resources that are affected by the commands. See Target.
Targets are similar to the verbs in that they do not require much interpretation. Typically, they are one or more files or resources. Sometimes the expressions are globs.
Targets can be ambiguous since option can have an indefinite number of values. Consider the following command-line:
script.r --foo 1 --bar my-file-1 my-file-2
In this example, it is unclear whether my-file-1
and my-file-2
should be associated with --bar
or or targets or a combination.
Parsing command-line arguments is a bit more tricky than it may appear. Parsing involves severals steps:
opt_fill
gets values from the command-line using an existing recursive object as a template. Values in the existing recursive structure may be clobberred/overwritten or created. There are several use cases:
clobber == TRUE
: command-line application in which opts
are overwritting application defaults.
clobber == TRUE && create == TRUE
: same as above, but also get other missing commands such as
Reursive structures have several options for filling based on opts
.
x
name exists x/opts: TRUE FALSEIt seems there are two use-case.
x
is defaults; values should be clobbered and set.x
is set; alternative values should be set.clobber
and .create
clobber: if x
exists is the value overwritten.
clobber & create : clobber existing values; create new ones USE CASE: overwriting default, (commando)
clobber & ! create : only clobber values USE CASE: x are all the values that we need …only get those. commando with mutliple commands and one config file, etc. (commando: multiple commands)
! clobber & create : get new values (e.g. don’t change defaults) USE CASE: (rare) keep old values adding new ones
! clobber & ! create : non-op (with warning) USE CASE: None
(Wikipedia: Environment Variable)[https://en.wikipedia.org/wiki/Environment_variable]