CMSC 14100 — Lecture 25

Executables

At the beginning of the quarter, we said that there were two ways to run a program:

This corresponds to the two ways in which programs are typically used.

You may have noticed that we really only stuck with one. Since we test different parts of the programs you write, we treat them as libraries: we import your code, which gives us access to the functions you've defined, and then we run those functions against tests.

A lot of programs are treated this way, and many programs that you use really have many, many libraries interacting with each other underneath the surface. However, most people think of "the program" as the thing that's run and the thing they interact with. That's what we'll be discussing a bit here.

First, it's important to think about how programs are executed in Python. You may have seen that when we import a module in Python, all the code gets executed immediately, from top to bottom. This is often invisible when a module only contains definitions, but the same thing is happening: Python is executing every statement in the file from top to bottom and this includes function definitions.

What's more, this happens just the same for imports: whenever Python hits an import line, it imports that file in the way we've discussed, executing it from top to bottom. This means we have to be a bit careful when thinking about the flow of the program. For instance, if we run the program


    import stuff
    print("hi")
    

we will run whatever is in stuff.py first, before printing "hi". You can think of this roughly in the same way as how control works for functions: we enter the function and finish execution before returning to where we entered it.
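To see this ordering in action, here is a minimal sketch. It assumes a made-up module called stuff_demo (standing in for stuff.py), writes it to a temporary folder, imports it, and then prints "hi". The module name, its contents, and the helper import_then_print are all hypothetical, chosen only for illustration.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

# Contents of a hypothetical module that prints while it is being imported.
MODULE_SOURCE = '''
print("stuff_demo: start")

def helper():
    return 42

print("stuff_demo: end")
'''


def import_then_print():
    """Import the throwaway module, then print "hi"; return the printed lines."""
    buffer = io.StringIO()
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "stuff_demo.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)          # let Python find the new file
        importlib.invalidate_caches()
        try:
            with contextlib.redirect_stdout(buffer):
                import stuff_demo        # runs the whole file, top to bottom
                print("hi")              # only reached after the import is done
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("stuff_demo", None)
    return buffer.getvalue().splitlines()


print(import_then_print())
```

Both lines printed by the module appear before "hi", even though the module also defines a function in between.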

Now, you may have noticed this behaviour and you may have taken advantage of this as a quick way to test your programs without having to retype everything in IPython every time you make a change. In fact, there are ways to make use of this.

Suppose that there is a program that you've written that you'd like to use as both a library and an executable. Of course, it is not good if you import a library and it starts executing statements all of a sudden—we want to isolate this behaviour so that it only occurs when the program is treated like an executable.

Luckily, Python has a mechanism for this: we check the name of the module. Every module comes with its name attached when it is imported, so that we can keep the names in each module separate from each other. This name is stored in a special attribute called __name__. However, the program we run directly gets a special name: "__main__". So we can tell whether a file is being run directly by comparing __name__ against that string.

    if __name__ == "__main__":
        pass  # code here runs only when this file is executed directly

This explains why sometimes when you inspect a value in IPython, it may mention __main__.

This is a handy way to control execution: if there's something you want to run only when it's run as an executable, just put it inside the above conditional!
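As a rough sketch of a file that works both as a library and as an executable, suppose the following is saved as greet.py. The file name and the functions greet and main are made up for this example. Importing the file only defines the functions, while running it directly also calls main().

```python
import json


def greet(name):
    """Return a greeting; safe to import because nothing runs on its own."""
    return f"Hello, {name}!"


def main():
    # An imported module carries its own name in __name__.
    print(json.__name__)
    print(greet("world"))


if __name__ == "__main__":
    # Reached only when this file is the program being run directly.
    main()
```

Wrapping the executable behaviour in a main() function is one common convention, not a requirement. It keeps the conditional short and lets other code call main() deliberately if it wants to.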

Now, let's turn our attention to running programs as executables. Again, there are different types of executables.

We'll focus on the second. In fact, we've already been using such programs throughout the course. git and pytest are great examples of this: we provide "commands" to the program, but these are really inputs, and we get some output from them. This is the paradigm that most Unix command line programs use: we run a command and something happens, then we run the next command.

Again, we can think of these like functions. Within a programming language, we can provide inputs by calling a function, but how do we do that from outside of the language? The answer is what we've been doing all this time: adding arguments after the command. This seems like something new, but once we frame it in terms of things we've already worked with, this is actually surprisingly dull: we're getting a string input (the "commands") which we need to take apart and do stuff with.

There is a mechanism to do this.


    import sys

    if __name__ == "__main__":
        print(sys.argv)
    

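Here, sys.argv is a special variable that holds a list of the arguments that were passed in when the program was executed, each stored as a string. The arguments are separated by whitespace, so as a first approximation imagine that we took the command that was run and called split() on it. (The shell does the splitting before Python ever sees the command, so an argument wrapped in quotes stays a single item even if it contains spaces.)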
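The split() picture can be tried out on a made-up command line. The sketch below compares str.split() with shlex.split() from the standard library, which follows POSIX shell quoting rules. The command string and the program name greet.py are hypothetical, and real shells may differ in their details.

```python
import shlex

# A hypothetical command line; greet.py is a made-up program name.
command = 'greet.py --name "Ada Lovelace"'

# Plain split() breaks on every run of whitespace, even inside quotes.
naive = command.split()

# shlex.split() follows POSIX shell quoting rules, which is closer to
# what the shell does before Python fills in sys.argv.
shell_like = shlex.split(command)

print(naive)
print(shell_like)
```

The two lists agree whenever no argument contains quoted spaces, which is why split() is a reasonable mental model for simple commands.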

It's important to note here what exactly is in this list. sys.argv[0] is the first item in the list and Python takes that to mean the name of the program (i.e., the .py file) that was run. So the first item we typically want to look at as an argument is sys.argv[1].

(You might ask why the name of the program is the first argument. Sometimes it's helpful to know the name of the program that was run.)
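Here is a small sketch of separating the program name from the real arguments. The helper split_argv is a name invented for this example.

```python
import sys


def split_argv(argv):
    """Separate the program name from the arguments that follow it."""
    program_name = argv[0]
    arguments = argv[1:]
    return program_name, arguments


if __name__ == "__main__":
    program_name, arguments = split_argv(sys.argv)
    print(f"program: {program_name}")
    # Number the arguments from 1 so the labels match their sys.argv index.
    for position, argument in enumerate(arguments, start=1):
        print(f"sys.argv[{position}] = {argument}")
```

Taking the list as a parameter, rather than reading sys.argv inside the function, makes the function easy to test with any list we like.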

One thing we quickly realize is that, as with CSV and JSON files, parsing (the act of processing text) is pretty easy in theory but very difficult to do completely correctly in practice: handling all of the special cases is the challenge. In such situations, someone has almost always written a useful parser for us already. Here, that parser is the argparse module, which is part of Python's standard library.

argparse allows us to specify arguments we'd like to look for and actions to take. In addition to making it handy to parse the arguments, it is a convenient way to build up the documentation for all of these options.


    import argparse

    if __name__ == "__main__":
        parser = argparse.ArgumentParser(
            prog="Program Name",
            description="A description of your program")
        parser.add_argument('-a', '--argument', required=True)
        args = parser.parse_args()
        a = args.argument
    

Here, we have an ArgumentParser, in which we specify basic information about our program that will be displayed to the user. We add arguments to the parser by specifying a short flag -a, a long name --argument, and whether it's required or not.

To parse the arguments, we call parse_args() on the parser and this creates an object that has all of our arguments stored as attributes, named according to the long name we gave each argument. So when someone runs our program with -a thing or --argument thing, the result of args.argument will be "thing".
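To make the attribute naming concrete, here is a sketch of a parser for a hypothetical greeting program. The program name greet and the options --name and --count are made up for illustration. So that the example runs anywhere, it passes a list to parse_args() in place of a real command line.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical greeting program."""
    parser = argparse.ArgumentParser(
        prog="greet",
        description="Print a greeting one or more times")
    # A required string option, as in the example above.
    parser.add_argument("-n", "--name", required=True, help="who to greet")
    # An optional integer option with a default value.
    parser.add_argument("-c", "--count", type=int, default=1,
                        help="how many times to print the greeting")
    return parser


# Passing a list to parse_args() stands in for the command line
# `greet --name Ada -c 2`; with no argument it reads sys.argv[1:] instead.
args = build_parser().parse_args(["--name", "Ada", "-c", "2"])
for _ in range(args.count):
    print(f"Hello, {args.name}!")
```

Note that type=int converts the string "2" into the integer 2. Without it, every argument arrives as a string, just as in sys.argv.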

That's a very basic rundown of how to provide input on the fly to a program. Everything we've mentioned so far assumes that the user is going to specify the options they want when they run the program. But what if there are a lot of options? Or maybe we want these options to persist between executions?

Instead of providing arguments, we can put all of that stuff into a file and have the program read the file to figure out which options we'd like enabled. Such a file is called a configuration file.

In fact, this is how many programs store your options and preferences—they keep that stuff in a file and load it whenever the program starts running. You've already seen an example of this: Visual Studio Code's options are stored as a JSON file. For example, here is my VS Code's settings.json:


{
    "workbench.colorTheme": "Solarized Light+",
    "editor.fontFamily": "'Fira Code', 'SF Mono', Menlo, Monaco, 'Courier New', monospace",
    "editor.rulers": [
        72,
        80
    ],
    "files.exclude": {
        "**/.git": true,
        "**/.svn": true,
        "**/.hg": true,
        "**/CVS": true,
        "**/.DS_Store": true,
        "**/Thumbs.db": true,
        "**/*.olean": true
    },
    "editor.lineNumbers": "relative",
    "editor.fontLigatures": true,
    "editor.minimap.enabled": false
}
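A program can load a file like this with the json module. The following is only a sketch under some assumptions: the file is strict JSON (a settings file containing comments would need a different parser), the default values are made up, and load_settings is a hypothetical helper rather than anything VS Code itself provides.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical defaults, used for any option the file does not mention.
DEFAULTS = {"editor.lineNumbers": "on", "editor.minimap.enabled": True}


def load_settings(path):
    """Return the defaults overridden by whatever the JSON file contains."""
    settings = dict(DEFAULTS)
    path = Path(path)
    if path.exists():
        settings.update(json.loads(path.read_text()))
    return settings


# Demonstrate with a throwaway file rather than a real settings.json.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp, "settings.json")
    config_path.write_text('{"editor.lineNumbers": "relative"}')
    settings = load_settings(config_path)
    missing = load_settings(Path(tmp, "no_such_file.json"))

print(settings)
```

Starting from a dictionary of defaults means the program still has sensible values when the file is missing or leaves an option out.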