Cover photo: Getting stuck in the middle of somewhere... Mongolia 2011.

Python and Fuzzy Front End

Introduction

Python is an extremely flexible and powerful language that has earned a lot of attention due to its applicability in data science and web development, as well as in rapid prototyping in general. Especially if the latter is of concern, you may find yourself in a situation where you begin the first iteration based only on vague descriptions and almost no established project architecture. Adding interdisciplinarity into the picture, developing and integrating interesting features into the application requires creating adaptable code more than ever before.

This post focuses on a few simple, yet powerful concepts dressed in their “pythonic” uniform. More precisely, we are going to take a look at the following:

  • built-in functions (zip, enumerate and map),
  • one-liners,
  • variable numbers of arguments,
  • dynamic imports,
  • decorators.

All of them can make your or your developers’ time more efficient.

The nicest concepts

Built-in functions

Python offers a bunch of built-in functions, and zip(), map() and enumerate() position themselves among the most widely used. At first glance they look quite similar. However, as many developers tend to have their own preferences, it is worth knowing the subtle differences between them in order to understand the intentions behind the code.

zip allows you to combine lists together and consequently iterate over the combined pairs.

list1 = ['one', 'two', 'three']
list2 = ['make', 'me', 'laugh']
zipped = zip(list1, list2)      # an iterator of tuples: ('one', 'make'), ...
for a, b in zip(list1, list2):  # iterate over a fresh zip, leaving 'zipped' intact
    print(a, b)

Unzipping can be performed using the * operator:

list1, list2 = zip(*zipped)

The returned objects are tuples, not lists. Converting them back to lists requires applying the list() function.
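For instance, continuing the example above:

list1 = list(list1)   # ['one', 'two', 'three']
list2 = list(list2)   # ['make', 'me', 'laugh']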

In contrast to zip, the enumerate function accepts a single iterable object and returns its elements paired with their indices.

list1 = ['never', 'say', 'never']
for k in enumerate(list1):
    print(k)      # tuples: (0, 'never'), (1, 'say'), (2, 'never')
print(" ")
for k, l in enumerate(list1):
    print(k, l)   # unpacked: 0 never, 1 say, 2 never
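A small but handy detail: enumerate also accepts an optional start argument, which shifts the numbering:

for k, l in enumerate(list1, start=1):
    print(k, l)   # 1 never, 2 say, 3 never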

Finally, map is not particularly Python-specific. It allows you to pass every element of a list through a function.

def square(x):
    return x**2

numbers = [1, 2, 3, 4]
result = map(square, numbers)   # a lazy map object
print(list(result))             # [1, 4, 9, 16]

Oftentimes, the map function is used in combination with lambda, which is particularly useful when passing simple functions, without having to declare them explicitly.

As an example, we can combine map and lambda to calculate the inner product:

vec1 = [1, 2, 3]
vec2 = [1, 2, 3]
inner = sum(map(lambda x1, x2: x1*x2, vec1, vec2))   # 1 + 4 + 9 = 14

or outer product:

outer = list(map(lambda x: list(map(lambda y: x*y, vec2)), vec1))

Still, if you are more into this kind of maths, you should use numpy.
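For reference, here is a quick sketch of what the numpy equivalents would look like:

import numpy as np

inner = np.inner(vec1, vec2)   # 14
outer = np.outer(vec1, vec2)   # a 3x3 matrix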

One-liners

One-liners are a fancy way to squash otherwise long if or if-else statements or loops.

Since Python blocks are organized using indentation, too many if’s or for’s sometimes create a visual mess. If the blocks aren’t particularly long or complex, it is worth considering formatting them as one-line statements.

if a > b: print("a > b")                          # if
print("a > b") if a > b else print("a <= b")      # if-else
squared = [num**2 for num in some_list]           # for (list comprehension)
while n < 10: print(n); n += 1                    # while
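Comprehensions can also carry an inline condition, effectively combining the for and if cases above in a single line:

evens_squared = [num**2 for num in range(10) if num % 2 == 0]   # [0, 4, 16, 36, 64]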

Variable number of arguments

Every now and then, you may need to pass a variable number of arguments to a function or even to the program itself.

At the function level, the problem of handling a variable number of parameters can be solved using *args. The following example shows a “generic” function that sums all of its arguments, irrespective of how many are passed in:

def add(*args):
    total = 0
    for part in args:   # args is a tuple holding all positional arguments
        total += part
    return total

Then, you can call this function as add(), add(1) or add(2, 4) - all will work.
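The * operator works on the calling side as well, unpacking an existing sequence into separate arguments:

values = [1, 2, 4]
print(add(*values))   # 7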

Handling a variable number of keyword arguments requires **kwargs. Passing them into a function requires each argument’s name as well:

def printage(**kwargs):
    for key, value in kwargs.items():   # kwargs is a dict of keyword arguments
        print("{} is {} years old.".format(key, value))

Then, calling printage(Bob=12, Alice=14) will print:

Bob is 12 years old.
Alice is 14 years old.
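Symmetrically, an existing dictionary can be unpacked into keyword arguments using **:

ages = {'Bob': 12, 'Alice': 14}
printage(**ages)   # same output as above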

OK, I see these trivial examples, but where is this really used?

Imagine a situation where you are on a team in which one of you works on a set of data processing functions, while the other builds the rest of the program that will use these functions (e.g. loaded dynamically) in order to apply them to the data. What’s more, at the stage of writing, there is no established convention on what arguments will be passed to them. Each and every function may use a totally different set of arguments. By making the functions “open” to accept a variable number of arguments, not only can they be made more generic in terms of code reusability, but it also becomes possible to parallelize the work.
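As a hypothetical sketch of such a setup (function and parameter names are invented for illustration), each processing function accepts **kwargs and picks out only what it needs, so the caller can pass one shared parameter dictionary to all of them:

def normalize(data, **kwargs):
    factor = kwargs.get('factor', 1.0)   # ignore everything else
    return [x / factor for x in data]

def clip(data, **kwargs):
    limit = kwargs.get('limit', 10)
    return [min(x, limit) for x in data]

params = {'factor': 2.0, 'limit': 10}    # one parameter set for all functions
for func in (normalize, clip):
    print(func([2, 4, 40], **params))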

More details are presented here.

Finally, it is also possible to pass a (again: variable) number of arguments to the program itself, thus changing the flow of the program by, for example, executing it in test mode:

myprogram.py

import sys
print("Number of arguments: {}.".format(len(sys.argv)))
print("The arguments: {}.".format(str(sys.argv)))

This allows you to run the program like this:

$ python myprogram.py first_argument second_argument
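which prints:

Number of arguments: 3.
The arguments: ['myprogram.py', 'first_argument', 'second_argument'].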

Even more sophisticated calls can be handled using the getopt module. Worth taking a look.
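As a minimal sketch (the --test flag is invented for illustration), parsing options with getopt could look like this:

import getopt
import sys

opts, args = getopt.getopt(sys.argv[1:], "t", ["test"])
test_mode = any(opt in ("-t", "--test") for opt, _ in opts)
print("Test mode: {}.".format(test_mode))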

Dynamic imports

Another thing is dynamic import. It is something that can really take it to the next level. Imagine that even the final architecture of the program is not yet quite… certain.

You are building an experimental program which you expect to depend on certain modules. The program is expected to grow as other teams add more functionality to it, which is, however, yet to be determined at the time of defining the overall architecture. The content of the modules is more or less known, but only at a high level. Yet, you would like to ensure that as the program grows, it requires the least amount of changes and adaptations to the program itself.

Obviously, there are choices to be made.

One of the easiest, but also most harmful choices would be to allow the program to execute the attached code explicitly, using an exec(open('new_code.py').read()) command directly. It is dangerous (besides being just bloody bad practice) even if you trust the other team members not to inject malicious code. If done inappropriately, you risk that the code kills the program together with the machine that runs it… not good.

Another option is to let the others develop modules for you and prepare the main program to find these modules, import them and use their versions of the functions in your logic. With dynamic imports, you can let your part remain almost unchanged.

Let main.py be the main program file, residing under the ./main/ directory. Then, in ./main/, you have a subdirectory called mymodules. In that subdirectory, you store a bunch of files named modulefile_id_X.py, where X is a number, and more files can be added later. Then, main.py may have the following construction:

import importlib as impl

def loader(module_id):
    module_name = 'modulefile_id_{}'.format(module_id)
    # absolute import of main/mymodules/modulefile_id_X.py
    new_module = impl.import_module('main.mymodules.' + module_name)
    return new_module.some_function

This construction is based on the assumption that some_function indeed exists within that module. However, instead of accepting different versions of some_function and making all the adaptations that follow, you can now agree with the rest of the team that they will give you whole modules and only expose this very function. If the team makes some_function dependent on some other functions, they have full freedom to organize their code within the module. All you need to worry about is ensuring that the module is available to you, that you pass the agreed parameters to some_function and that you get the stuff you want back.
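Using it is then as simple as (the module id and the parameters are illustrative):

some_function = loader(3)                  # imports mymodules/modulefile_id_3.py
result = some_function(data, mode='test')  # the agreed interface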

Decorators

Finally, perhaps the nicest form of flexibility comes with so-called decorators. They do make functions “prettier” in a sense, but more importantly, they make developers’ jobs easier.

The concept is not specific to Python, but it is definitely widely used in Python. Here, decorating means making it possible to dynamically modify a function (or any callable object) without having to alter its original code. Again, it is especially useful when working on an existing project.

Imagine a scenario in which your colleague is a data scientist working on some complicated machine learning algorithm or data processing function, while your role is to take “whatever” code that person has created and integrate it into some production software. It is a prototyping phase, and it is more than likely that sooner or later the program crashes somewhere due to data format issues, tensor dimensions, array sizes, you name it. One way you can reduce the risk is to work closely with that person and give suggestions for testing, assertions or conventions. Even though that is not a bad idea, today’s code will be replaced within a week. Eliminating the problem may feel like chasing rabbits, especially if you do not want to dig into the details of these functions.

Here, decorators may come in handy.

Let’s assume you are to integrate apply_hard_math into your “production” code.

You can write a decorator that would perform all the checks and/or throw exceptions.

def apply_hard_math(tensor, **kwargs):
    # some complicated math...
    # extracting something called "features"
    return features

Then, you can define wrappers around that function to suppress the trouble.

def dimension_check(shape):
    def dimensions(func):
        def checker(*args, **kwargs):
            # the first positional argument is expected to be the tensor
            if args[0].shape == shape:
                return func(*args, **kwargs)
            else:
                raise IndexError("Dimension mismatch in <" + func.__name__ + ">.")
        return checker
    return dimensions


def dataset_check(datasets):
    def belongs(func):
        def checker(*args, **kwargs):
            if 'datasets' in kwargs:
                inside_ds = []
                for ds in datasets:
                    inside_ds.append(ds in kwargs['datasets'])
                if not any(inside_ds):
                    raise ValueError("Irrelevant dataset in <" + func.__name__ + ">.")
                else:
                    return func(*args, **kwargs)
            else:
                return func(*args, **kwargs)
        return checker
    return belongs

Decorate apply_hard_math:

dims = (2, 3, 4)                    # example
sets = ['training', 'development']  # example

@dimension_check(dims)
@dataset_check(sets)
def apply_hard_math(tensor, **kwargs):
    # complex operations
    # completely unchanged
    return result

And use the function with no fear:

...
output = apply_hard_math(tensor, datasets="training")
...
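To see the checks in action, assuming the tensors are numpy arrays (the decorator inspects .shape):

import numpy as np

tensor = np.zeros((2, 3, 4))
output = apply_hard_math(tensor, datasets='training')   # passes both checks
apply_hard_math(np.zeros((5, 5)), datasets='training')  # raises IndexError
apply_hard_math(tensor, datasets='validation')          # raises ValueError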

All that is done here is essentially assertions and testing. There is no magic. However, doing it this way has the following advantages:

  • The code within apply_hard_math remains unchanged.
  • Even if it is changed (e.g. by your data scientist colleague), the assertions stay the same. Thus, at least in principle, it should integrate well with the rest of the application.
  • The wrappers are agnostic to the function itself, meaning that we can reuse them across different functions within the project.
  • The “assertion/testing” code can be almost totally separated from the “content” code.
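One refinement worth mentioning: when decorators are stacked, the outer checker receives the inner checker as func, so its error message would report <checker> rather than <apply_hard_math>. Wrapping the inner function with functools.wraps fixes this; only one line changes, sketched here for dataset_check:

import functools

def dataset_check(datasets):
    def belongs(func):
        @functools.wraps(func)   # preserves func.__name__ for the outer decorator
        def checker(*args, **kwargs):
            ...                  # body unchanged
        return checker
    return belongs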

It is totally up to you how you use decorators. Here, we illustrated how they can be used for testing and scientific purposes using rather complex examples (passing multiple parameters and stacking multiple decorators). More fundamental examples can be found here.

Conclusion

By no means are these examples fully illustrative of everything Python can do. However, when facing the fuzzy front end, decisions have to be made quickly, since time is of the essence. Whenever Python helps, it definitely does not hurt to help Python a bit too!