Roland's homepage

My random knot in the Web

Command-line versus GUI program

TL;DR

When you are exploring a problem, in general first write a command-line program whenever possible.

It will take less effort to write then a full-blown GUI.

Introduction

Recently I wrote a program to remove the protection from ms-excel files.

The original version was written as a command-line program. Later I re-used the relevant code for a GUI program for use on ms-windows. This was mainly for the benefit of some colleagues who are not comfortable with using the command-line.

In this article I want to contrast the two programs.

Both programs can be found in my public scripts repository. The command-line version is called unlock-excel.py, while the GUI version is unlock-excel.pyw.

Excel 2007+ files are essentially zip-files filled with mostly XML files. Since Python has a module for opening zip-files, and we can use regular expressions to remove the offending sheetProtect and workbookProtect tags, it is well-suited to this task.

Command-line program

The command-line version is around 50 lines of code, 10 additional blank lines and 8 lines of comments.

The main function is pretty simple.

def main(argv):
    logging.basicConfig(level='INFO', format='%(levelname)s: %(message)s')
    if not argv:
        logging.info(f'unlock-excel.py v{__version__}; no files to unlock')
        print('Usage: unlock-excel.py <file> <file> ...')
        sys.exit(0)
    for path in argv:
        try:
            backup_path = backup_file(path)
            remove_excel_password(backup_path, path)
        except (ValueError, shutil.SameFileError) as e:
            logging.info(e)

For each of the file names given on the command-line, it calls two functions. The first function backup_file tries to create a backup file by appending -orig to the filename and make a copy.

def backup_file(path):
    first, last = path.rsplit('.', maxsplit=1)
    backup = first + '-orig' + '.' + last
    logging.info(f'moving "{path}" to "{backup}"')
    shutil.move(path, backup)
    return backup

The second function remove_excel_password does the actual modification of the xls[xm] files by opening it with the zipfile module and writing the members (modified if necessary) to a new zipfile.

def remove_excel_password(origpath, path):
    if not zipfile.is_zipfile(origpath):
        raise ValueError(f'"{origpath}" is not a valid zip-file.')
    with zipfile.ZipFile(origpath, mode="r") as inzf, zipfile.ZipFile(
        path, mode="w", compression=zipfile.ZIP_DEFLATED, compresslevel=1
    ) as outzf:
        infos = [name for name in inzf.infolist()]
        for info in infos:
            logging.debug(f'working on "{info.filename}"')
            data = inzf.read(info)
            if b'sheetProtect' in data:
                regex = r'<sheetProtect.*?/>'
                logging.info(f'worksheet "{info.filename}" is protected')
            elif b'workbookProtect' in data:
                regex = r'<workbookProtect.*?/>'
                logging.into('the workbook is protected')
            else:
                outzf.writestr(info, data)
                continue
            text = data.decode('utf-8')
            newtext = re.sub(regex, '', text)
            if len(newtext) != len(text):
                outzf.writestr(info, newtext)
                logging.info(f'removed password from "{info.filename}"')

Note that I tend to use the logging module for status updates. There is a tradition from UNIX that command-line programs should be silent unless errors occur. This tradition made sense in the days of teletypes since paper costs money. On a terminal emulator, not so much. That is not to say that a program should be gratuitously verbose. But in this case I’d like to know how many sheets had a password set and if it was successfully removed. Binary files like ms-excel files are pretty opaque otherwise.

GUI program

By contrast, the GUI version has approximately 190 lines of code, 43 lines of comments and 33 blank lines.

A GUI program is basically a combination of state data and a lot of functions that are called from the GUI’s event loop. Your program is basically a bolt-on to the GUI toolkit.

In this case I’m using the tkinter toolkit that comes with Python.

Some of the functions are called in reaction to the user manipulating a control (e.g. pressing a button). These are generally described as “callbacks”. Others are set to run whenever there are no events to process. A general name for those is an “idle task” or “timeout”.

Such idle tasks shouldn’t take too long. Anything up to say 50 ms is fine, since a user is unlikely to notice such a small amount of time. Functions that run longer would noticeably make the GUI unresponsive since they interrupt the event loop.

For longer running tasks, there are basically three possibilities;

  1. Break them up into smaller tasks that are executed sequentially.
  2. Run them in a separate process
  3. Run them in a separate thread.

The python.org implementation of Python (“CPython”) has the restriction that only one thread at a time can be executing Python bytecode. So a computation thread would still interfere with the event loop running in the main thread, but on a lower level. Additionally, not all GUI toolkits are thread-safe, so you should only make GUI calls from the main thread unless you know otherwise. Although it seems that with Python 3.x, tkinter is thread-safe. The combination of these facts makes multithreading tricky for Python GUI applications in my opinion. It’s fine if the not-main threads are doing I/O or if they use e.g. time.sleep(). But doing long calculations might still make the GUI less responsive.

In this case, I will be using the first option since it is relatively easy to split up the processing of an ms-excel file into small steps:

  • Open the source and destination files.
  • Make a list of the internal files.
  • Read an internal file, filter it if necessary and write it to the destination file. This step is repeated as often as necessary to process all internal files.
  • Close the source and destination files.

When starting the program, two user-provided function will be called first.

The first function creates the widgets in the root window.

def create_widgets(root, w):
    """Create the window and its widgets.

    Arguments:
        root: the root window.
        w: SimpleNamespace to store widgets.
    """
    # Set the font.
    default_font = nametofont("TkDefaultFont")
    default_font.configure(size=12)
    root.option_add("*Font", default_font)
    # General commands and bindings
    root.bind_all('q', do_exit)
    root.wm_title('Unlock excel files v' + __version__)
    root.columnconfigure(3, weight=1)
    root.rowconfigure(5, weight=1)
    # First row
    ttk.Label(root, text='(1)').grid(row=0, column=0, sticky='ew')
    w.fb = ttk.Button(root, text="Select file", command=do_file)
    w.fb.grid(row=0, column=1, columnspan=2, sticky="w")
    w.fn = ttk.Label(root)
    w.fn.grid(row=0, column=3, columnspan=2, sticky="ew")
    # Second row
    ttk.Label(root, text='(2)').grid(row=1, column=0, sticky='ew')
    w.backup = tk.IntVar()
    w.backup.set(0)
    ttk.Checkbutton(root, text='backup', variable=w.backup,
                    command=on_backup).grid(row=1, column=1, sticky='ew')
    w.suffixlabel = ttk.Label(root, text='suffix:', state=tk.DISABLED)
    w.suffixlabel.grid(row=1, column=2, sticky='ew')
    w.suffix = tk.StringVar()
    w.suffix.set('-orig')
    se = ttk.Entry(root, justify='left', textvariable=w.suffix, state=tk.DISABLED)
    se.grid(row=1, column=3, columnspan=1, sticky='w')
    w.suffixentry = se
    # Third row
    ttk.Label(root, text='(3)').grid(row=2, column=0, sticky='ew')
    w.gobtn = ttk.Button(root, text="Go!", command=do_start, state=tk.DISABLED)
    w.gobtn.grid(row=2, column=1, sticky='ew')
    # Fourth row
    ttk.Label(root, text='(4)').grid(row=3, column=0, sticky='ew')
    ttk.Label(root, text='Progress:').grid(row=3, column=1, sticky='w')
    # Fifth row
    sb = tk.Scrollbar(root, orient="vertical")
    w.status = tk.Listbox(root, width=60, yscrollcommand=sb.set)
    w.status.grid(row=4, rowspan=5, column=1, columnspan=3, sticky="nsew")
    sb.grid(row=4, rowspan=5, column=5, sticky="ns")
    sb.config(command=w.status.yview)
    # Ninth row
    ttk.Button(root, text="Quit", command=do_exit).grid(row=9, column=1, sticky='ew')

The widgets that have to be accesse of widgets that have to be accessed later later are stored in a types.SimpleNamespace.

The window it creates looks like this.

window for unlock-excel.pyw

The second initializes a types.SimpleNamespace that holds the program’s state. I like to wrap state up in a namespace so it is clear what is part of the program state and what is not.

def initialize_state(s):
    """
    Initialize the global state.

    Arguments:
        s: SimpleNamespace to store application state.
    """
    s.interval = 10
    s.path = ''
    s.inzf, s.outzf = None, None
    s.infos = None
    s.currinfo = None
    s.worksheets_unlocked = 0
    s.workbook_unlocked = False
    s.directory = None
    s.remove = None

After these functions are called, the program enters the tkinter event loop.

There is also a helper function that appends status messages to the progress listbox.

def statusmsg(text):
    """Append a message to the status listbox, and make sure it is visible."""
    widgets.status.insert(tk.END, text)
    widgets.status.see(tk.END)

The names of the functions that make up the steps of processing an ms-excel file all begin with step. They are listed in sequence below. At the end of each function sets the next step to run.

def step_open_zipfiles():
    path = widgets.fn['text']
    state.path = path
    statusmsg(f'Opening “{path}”...')
    first, last = path.rsplit('.', maxsplit=1)
    if widgets.backup.get():
        backupname = first + widgets.suffix.get() + '.' + last
    else:
        backupname = first + '-orig' + '.' + last
        state.remove = backupname
    shutil.move(path, backupname)
    state.inzf = zipfile.ZipFile(backupname, mode="r")
    state.outzf = zipfile.ZipFile(
        path, mode="w", compression=zipfile.ZIP_DEFLATED, compresslevel=1
    )
    root.after(state.interval, step_discover_internal_files)


def step_discover_internal_files():
    statusmsg(f'Reading “{state.path}”...')
    state.infos = [name for name in state.inzf.infolist()]
    state.currinfo = 0
    statusmsg(f'“{state.path}” contains {len(state.infos)} internal files.')
    root.after(state.interval, step_filter_internal_file)


def step_filter_internal_file():
    current = state.infos[state.currinfo]
    stat = f'Processing “{current.filename}” ({state.currinfo+1}/{len(state.infos)})...'
    statusmsg(stat)
    # Doing the actual work
    regex = None
    data = state.inzf.read(current)
    if b'sheetProtect' in data:
        regex = r'<sheetProtect.*?/>'
        statusmsg(f'Worksheet "{current.filename}" is protected.')
    elif b'workbookProtect' in data:
        regex = r'<workbookProtect.*?/>'
        statusmsg('The workbook is protected')
    else:
        state.outzf.writestr(current, data)
    if regex:
        text = data.decode('utf-8')
        newtext = re.sub(regex, '', text)
        if len(newtext) != len(text):
            state.outzf.writestr(current, newtext)
            state.worksheets_unlocked += 1
            statusmsg(f'Removed password from "{current.filename}".')
    # Next iteration or next step.
    state.currinfo += 1
    if state.currinfo >= len(state.infos):
        statusmsg('All internal files processed.')
        state.currinfo = None
        root.after(state.interval, step_close_zipfiles)
    else:
        root.after(state.interval, step_filter_internal_file)


def step_close_zipfiles():
    statusmsg(f'Writing “{state.path}”...')
    state.inzf.close()
    state.outzf.close()
    state.inzf, state.outzf = None, None
    root.after(state.interval, step_finished)


def step_finished():
    if state.remove:
        os.chmod(state.remove, stat.S_IWRITE)
        os.remove(state.remove)
        state.remove = None
    else:
        statusmsg('Removing temporary file')
    statusmsg(f'Unlocked {state.worksheets_unlocked} worksheets.')
    statusmsg('Finished!')
    widgets.gobtn['state'] = 'disabled'
    widgets.fn['text'] = ''
    state.path = ''

The remaining functions are some widget callbacks. You can look them up in the source code if you’re interested.

Conclusion

As illustrated above, GUI programs tend to have bigger code sizes then equivalent command-line programs.

First you have to think about what input and output widgets you need and how they should be laid out. Then you have to write the mechanics of your program in such a way that it “fits” into the event-driven nature of a GUI.

Of course there are lots of situations where a GUI is a much better fit for the problem at hand. But if both approaches are possible, I will probably start with a command-line program.


For comments, please send me an e-mail.


Related articles


←  How convenient is multithreading compared to multiprocessing? Have Python log to syslog  →