JSON in the context of machine learning
Introduction
In the previous post, we considered JSON a useful candidate for storing and passing hyperparameters. Since there is a lot to it, let's take a closer look at how JSON documents can be stored and passed between parts of the system.
Why play with JSON?
Before moving on, let's quickly explain why it is worth considering JSON for passing hyperparameters in the first place.
First of all, a neural network pipeline is likely to be composed of several modules or stages that do completely different things (e.g. interfacing databases, feature extraction, grid search, etc.). Obviously, it is possible to pass all parameters (including hyperparameters) as ordinary function or object arguments. However, working with Big Data involves plenty of complex operations of different kinds, so to keep the code understandable one has to agree early on a specific naming convention for the parameters. Not only may it be implausible to settle on such a convention at an early stage, especially while still experimenting, but it also makes the code cluttered and difficult to maintain.
JSON offers a consistent way for formatting the hyperparameter input, which is both flexible and simple to use. More specifically:
- Being essentially a string, it can be stored as a whole in a file or as a single database entry.
- It can be sent over a URL when the system is distributed.
- Python can easily convert it to a dictionary, where parameters can be accessed by keyword.
- Just like a dictionary, its structure allows nesting, which can greatly help to organize the hyperparameters.
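The last two points are one-liners in practice. A minimal sketch (the key names here are made up for illustration):

```python
import json

# A nested JSON string grouping hyperparameters by pipeline stage
raw = '{"training": {"BATCH_SIZE": 100, "LEARNING_RATE": 0.001}, "search": {"GRID_SEARCH": false}}'

h = json.loads(raw)  # parse the string into a nested dictionary

# Parameters can now be accessed by keyword, respecting the nesting
print(h["training"]["BATCH_SIZE"])  # 100
print(h["search"]["GRID_SEARCH"])   # False
```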
Loading hyperparameters in practice
As you might have noticed, I emphasize the word hyper, but the approach extends perfectly well to parameters that are not strictly considered hyperparameters in the machine learning sense. They can, for example, hold a particular SQL query through which the pipeline requests a specific subset of sample data.
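To make the SQL example concrete, such a non-hyperparameter entry can sit right next to the hyperparameters in the same JSON document (the key names and the query below are hypothetical):

```python
import json

# Hypothetical parameter set: "SQL_QUERY" is not a hyperparameter in the
# strict sense, but it configures which samples the pipeline pulls
params = {
    "BATCH_SIZE": 100,
    "SQL_QUERY": "SELECT * FROM samples WHERE label IS NOT NULL LIMIT 10000",
}

packet = json.dumps(params)     # serialize for storage or transmission
restored = json.loads(packet)   # the round trip preserves everything
print(restored["SQL_QUERY"])
```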
In practice, it is useful to create a dedicated class to deal with the parameters.
import json

class Param:
    """
    Auxiliary class for supplying hyperparameters.
    """
    def __init__(self, filepath='./params/hyper.json'):
        # Sensible defaults, used when the file is missing or unreadable
        self.h = {
            "BATCH_SIZE": 100,
            "LEARNING_RATE": 0.001,
            "GRID_SEARCH": "False",
        }
        try:
            p = {}
            with open(filepath) as f:
                p = json.load(f)
        except IOError:
            print("File not found.")
        except ValueError:
            print("File could not be read.")
        else:
            for key in p:
                self.h[key] = p[key]  # override each default with the file value
            print("Hyperparameters loaded OK.")
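The loop at the end merges the file's values over the defaults key by key, so a partial hyper.json only overrides what it mentions. The same effect in isolation:

```python
import json

defaults = {"BATCH_SIZE": 100, "LEARNING_RATE": 0.001, "GRID_SEARCH": "False"}
overrides = json.loads('{"BATCH_SIZE": 256}')  # what a partial hyper.json might contain

h = dict(defaults)
for key in overrides:
    h[key] = overrides[key]  # file values win, missing keys keep their defaults

print(h["BATCH_SIZE"], h["LEARNING_RATE"])  # 256 0.001
```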
Shipping JSON over a network
Another likely scenario involves supplying the hyperparameters over a network. Assume a situation in which we run the computation on one machine, but the way these computations are executed depends on some configuration. In this case, we can either update the existing file, or take advantage of networking and send the whole configuration to a URL.
If the server expects the configuration to be sent to some specific URL,
we can define a very simple way of sending the hyperparameters.
import json
from httplib2 import Http

param = Param()  # the Param class defined earlier
config = param.h
packet = json.dumps(config)

h = Http()
try:
    dest_IP = "127.0.0.1"            # specify the correct address
    dest_port = 1234                 # specify the correct port
    dest_url = 'accept/hyperparams'  # specify the correct url
    destination = "http://{}:{}/{}".format(dest_IP, dest_port, dest_url)
    resp, content = h.request(destination, "POST", packet)
except ConnectionRefusedError:
    print("Connection refused!")
else:
    if content == b"success":        # httplib2 returns the body as bytes
        print("Parameters accepted.")  # do something if accepted
    else:
        print("Parameters rejected.")  # do something else if rejected
The hyperparameters are loaded through the Param class from before and reside in self.h as a dictionary. After converting it back to JSON, we use the httplib2 module to ship it to a designated URL. If the server "knows" what to do with it, this is a quick way of configuring the calculations.
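The post does not show the server side. A minimal sketch of what could sit behind accept/hyperparams, using only the standard library (the handler logic and the "success"/"failure" replies are assumptions chosen to match the client above):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class HyperHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client declared
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            hyper = json.loads(body)   # the dictionary sent by the client
        except ValueError:
            reply = b"failure"
        else:
            print("Received:", hyper)  # hand the dictionary to the pipeline here
            reply = b"success"
        self.send_response(200)
        self.end_headers()
        self.wfile.write(reply)

server = HTTPServer(("127.0.0.1", 1234), HyperHandler)
# server.serve_forever()  # uncomment to start listening
```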
...with attachment
An even more "sophisticated" situation may arise in which we are concerned not only with sending the parameters alone, but would also like to send, for example, a sample file. Doing this is straightforward and requires only one more step: encoding.
In order to transfer a file alongside the parameters, we need to somehow convert it to text
that can be embedded in the JSON packet itself. For that we can use the well-known base64
encoding.
import base64
import json

with open('somefile.dat', 'rb') as f:
    file_content = f.read()

byte_content = base64.b64encode(file_content)  # raw bytes -> base64 bytes
str_content = byte_content.decode('utf-8')     # base64 bytes -> ASCII string

param.h['somefile'] = str_content              # attach the file to the parameters
packet = json.dumps(param.h)
Note the .decode call, which converts the base64 bytes to an ASCII string so that the content can be embedded in JSON.
Decoding it is not difficult either: we simply apply the reverse operations in reverse order.
packet_as_dict = json.loads(packet)
retrieved_str = packet_as_dict['somefile']
retrieved_bytes = bytes(retrieved_str, 'utf-8')    # ASCII string -> base64 bytes
decoded_bytes = base64.b64decode(retrieved_bytes)  # base64 bytes -> original bytes

with open('somefile.dat', 'wb') as f:
    f.write(decoded_bytes)
    print("File saved.")
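A quick way to convince yourself that the two snippets are exact inverses is to round-trip some arbitrary bytes in memory, without touching the filesystem:

```python
import base64
import json

original = b"\x00\x01binary payload\xff"  # arbitrary bytes, not valid UTF-8

# encode side
as_text = base64.b64encode(original).decode("utf-8")
packet = json.dumps({"somefile": as_text})

# decode side
restored = base64.b64decode(bytes(json.loads(packet)["somefile"], "utf-8"))
assert restored == original
```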
Final remark
Since we have touched upon networking, it is worth adding that just sending data like this (especially files) poses the risk of someone sending harmful content. Therefore, depending on the actual implementation, it may be a good idea to encrypt or sign the transmitted packets. A simple trick to keep the pipeline safe.
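As a concrete illustration of the signing idea, both ends could share a secret key and attach an HMAC of the payload. This is only a sketch: the key, field names, and helper functions below are made up, and real deployments should manage the secret properly.

```python
import hashlib
import hmac
import json

SECRET = b"shared-secret-key"  # hypothetical key known to both ends

def sign(payload: str) -> str:
    """Wrap the JSON payload together with its HMAC-SHA256 signature."""
    sig = hmac.new(SECRET, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return json.dumps({"payload": payload, "signature": sig})

def verify(signed: str) -> bool:
    """Recompute the signature and compare in constant time."""
    msg = json.loads(signed)
    expected = hmac.new(SECRET, msg["payload"].encode("utf-8"),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, msg["signature"])

signed = sign('{"BATCH_SIZE": 100}')
print(verify(signed))  # True
```

The receiver rejects any packet whose signature does not match, so a tampered payload is caught before the parameters ever reach the pipeline.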