Germán Rodríguez
Stata Resources Princeton University

Keeping Track of Ancillary Programs, Files and Folders

Stata commands often need to access external programs or ancillary files, which may be hard to find unless they are in the working directory or the ado path. The whereis command provides a convenient way to keep track of resource locations by maintaining a directory or registry of external files and folders, making things simple for developers and users alike.

Syntax

The syntax of the command is simple:

whereis [name [location]]

The name argument specifies the name of a resource, and must be a single word conforming to Stata conventions for names, with no spaces.

The location argument is used when registering a resource and should be a full path specifying the location of the file or folder. This must conform to Stata conventions for file names. In particular, it should be enclosed in quotes if it includes spaces.

When the command is called with name and location, it checks that the named file or folder exists and creates or updates an entry in its registry. If only a name is specified, the command retrieves and prints the location of the named resource and checks that it exists. In both cases the location is returned in r(name), where name is the name of the resource; for example, the location of a resource called pandoc is returned in r(pandoc).

If the command is called with no arguments it simply lists all registered resources.
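The three forms of the command can be sketched as follows. This is only an illustration, using the pandoc path from the example discussed in the next section; substitute the actual location on your own system.

```stata
* Register a resource: name followed by the full path,
* quoted because the path includes spaces
whereis pandoc "c:\program files (x86)\pandoc\pandoc.exe"

* Look up a resource: the location is printed and returned in r(pandoc)
whereis pandoc
display "`r(pandoc)'"

* List all registered resources
whereis
```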

Motivation

Consider a Stata command that needs to access an external executable, for example the pandoc document converter. Suppose pandoc was installed at c:\program files (x86)\pandoc\pandoc.exe. How can we pass this information to the command?

One solution I have seen used is to provide an option for the user to specify the full path to the executable. For example, the developer may provide a pandoc() option, so the user can type pandoc(c:\program files (x86)\pandoc\pandoc.exe) among the options. Unfortunately this procedure is tedious and error-prone, as the path has to be specified every time the program is used.

An alternative solution is to define a global macro, for example global PANDOC c:\program files (x86)\pandoc\pandoc.exe, and an even better one is to store this macro in the user's profile.do file, so it will be loaded when Stata starts up. There is a slight inefficiency in defining the macro regardless of whether the resource will be used, but presumably only a small number of programs would be involved for any given user.

Our Solution

The whereis command provides a simpler solution. Once pandoc has been installed, the user registers its location by typing, just once, the Stata command whereis pandoc "c:\program files (x86)\pandoc\pandoc.exe", where we have used quotation marks because the full path to the executable includes spaces.

In turn, the Stata command that needs to know the location of pandoc uses the one-liner whereis pandoc. The whereis command will print the location of the file and, being an r-class command, will also store it in the macro r(pandoc), where it can be retrieved.

Diana Goldemberg from the World Bank suggested using whereis to store the location of folders as well as files, and indicated how to modify the code to enable this extension. Their teams use GitHub and Dropbox with the same project folder structure, but everyone has a different GitHub or Dropbox root. Storing the root with whereis provides a uniform way to refer to project files and folders.
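As a minimal sketch of this use, suppose each team member registers their own project root under the same name. The resource name myproject, the folder path and the file data/survey.dta are all hypothetical, chosen only for illustration.

```stata
* One-time registration on each machine (hypothetical folder;
* every user points the same name at their own root)
whereis myproject "C:\Users\me\Dropbox\myproject"

* In shared do-files: retrieve the root and build paths relative to it
whereis myproject
* Copy the result to a local macro before another r-class command overwrites r()
local root "`r(myproject)'"
use "`root'/data/survey.dta", clear
```

The shared do-file then contains no machine-specific paths; only the registry entry differs from user to user.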

The advantages of the whereis approach over storing global macros in profile.do are that the resource location is retrieved only on demand, and more importantly, the command checks that the file or folder exists at the given location, both on storage and retrieval. This feature can be important when Stata executes a command by "shelling out", as the failure may not be noticed immediately.

Tips for Users

For this scheme to work the user needs to know the location of the external resources. If you are not sure exactly where a program has been installed, the operating system may help locate the file.

On Mac and Linux systems there is a system command called which that can find an executable by searching the user's path. If you are not quite sure where pandoc was installed on your Mac, open a terminal window (select Applications, Utilities and then Terminal) and type which pandoc. This will list the path to the executable if found. (There is also a Unix whereis command, after which this Stata command is named, which searches the standard locations for binary files, but I have obtained better results with which.)

On Windows there is a similar command called where. By default this searches only the user's path, but there is an option to search recursively. If you think pandoc was installed on your C drive, try opening a command prompt window and typing where /R c:\ pandoc.exe.

Once you have identified the location of the file of interest using the operating system, don't forget to register it by running the Stata whereis command.

Notes for Developers

Programmers using whereis to access a resource should allow for the possibility that the path may include spaces, by enclosing the retrieved location in quotes. For example, to execute pandoc one could code

. whereis pandoc
. shell "`r(pandoc)'" *arguments*

Note that the command will fail with error code 601 if pandoc has not been registered with whereis or if the file is not found in the specified location.
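A developer who prefers to trap this failure and issue a friendlier message could wrap the call in capture. The following is only a sketch of one possible approach, not part of whereis itself; the wording of the message is illustrative, and --version is used simply as a harmless pandoc argument.

```stata
* Look up pandoc quietly; capture traps the error if it is not registered
* or the file is missing, and leaves the return code in _rc
capture whereis pandoc
if _rc {
    display as error "pandoc not found; register it with: whereis pandoc location"
    exit _rc
}

* Save the location before another r-class command overwrites r()
local pandoc "`r(pandoc)'"

* Quote the path in case it includes spaces
shell "`pandoc'" --version
```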

Installation

The whereis command is available from the Statistical Software Components (SSC) archive and can be installed by typing in Stata

. ssc install whereis

You may also try search whereis and follow the links. The current version is 1.4, which became available on SSC on 28 Feb 2020.