Binary Classification with Artificial Neural Networks using Python and TensorFlow
Sat, 09 Dec 2017
Classification, Computer Science, Data Science, Data Visualization, Machine Learning, Python, Tensorflow
This post is an introduction to using the
TFANN module for classification problems. The TFANN module is
available here on GitHub. The name TFANN is an abbreviation for
TensorFlow Artificial Neural Network.
TensorFlow is an open-source library for data flow programming. Because computations are expressed as graphs, using TensorFlow directly can be challenging at times. The TFANN module provides several classes that allow for interaction with the TensorFlow API using familiar object-oriented programming paradigms.
The Setup
First, the required modules are installed and imported. This code requires
numpy,
tensorflow, and
TFANN.
import numpy as np
from TFANN import ANNC
TFANN can be installed via
pip or by copying the source code into a file named
TFANN.py.
pip install TFANN
Next, an \((n, 2)\) matrix of random data points \(\textbf{A}\) is generated using
numpy. Class labels are created following a polynomial inequality. The polynomial used is
\(F(x, y) = -x^{2}+0.1x-0.6y+0.2\).
The inequality used to generate class labels is
\(F(x, y) > 0\).
The equation \(F(x,y)=0\) defines a downward-facing parabola whose vertex lies just off the y-axis, at \(x = 0.05\). Points below the parabola satisfy the inequality and are labeled 1; points above the curve are labeled 0. Code to generate the data matrix and class labels follows.
def F(x, y):
    return -np.square(x) + .1 * x - .6 * y + .2
#Training data
A1 = np.random.uniform(-1.0, 1.0, size = (1024, 2))
Y1 = (F(A1[:, 0], A1[:, 1]) > 0).astype(int)
#Testing data
A2 = np.random.uniform(-1.0, 1.0, size = (1024, 2))
Y2 = (F(A2[:, 0], A2[:, 1]) > 0).astype(int)
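As a quick sanity check (a hypothetical step, not part of the original post), the fraction of samples in class 1 can be printed to confirm the two classes are reasonably balanced.
print('Train class 1 fraction: {:.3f}'.format(Y1.mean()))
print('Test class 1 fraction:  {:.3f}'.format(Y2.mean()))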
The function curve is shown in Figure 1 along with a scatter plot of the generated data matrix.
Figure 1: The Generated Data
The color indicates the value of \(F(x,y)\) and the curve is \(F(x,y)=0\). The same plot, colored instead by class label, is shown in Figure 2.
Figure 2: Generated Data with Class Labels
As can be seen above, the data is divided into two classes:
0 and
1. The goal is to create a model that can determine whether a data point belongs to class
0 or to class
1. This is known as
binary classification, as there are two class labels.
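Plots similar to Figures 1 and 2 can be reproduced with matplotlib; a minimal sketch (matplotlib is not used in the original post) that colors points by class label and overlays the curve \(F(x,y)=0\) follows.
import matplotlib.pyplot as mpl
#Scatter plot of the training data colored by class label (as in Figure 2)
mpl.scatter(A1[:, 0], A1[:, 1], c = Y1, s = 8)
#Overlay the curve F(x, y) = 0; solve F(x, y) = 0 for y
x = np.linspace(-1.0, 1.0, 256)
mpl.plot(x, (-np.square(x) + .1 * x + .2) / .6, 'k')
mpl.xlim(-1.0, 1.0)
mpl.ylim(-1.0, 1.0)
mpl.show()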
Multi-Layer Perceptron Classification
Next, a multi-layer perceptron (MLP) network is fit to the data generated earlier. In this example, the function used to generate class labels is known. This is typically not the case. Instead, the model iteratively updates its parameters so as to reduce the value of a
loss function.
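A common choice of loss for classification is softmax cross-entropy. The plain numpy sketch below illustrates the concept only; it is not taken from the TFANN source.
def SoftmaxCrossEntropy(Z, Y):
    #Z: (n, 2) raw network outputs; Y: (n,) integer class labels
    E = np.exp(Z - Z.max(axis = 1, keepdims = True)) #Numerically stable softmax
    P = E / E.sum(axis = 1, keepdims = True)
    #Mean negative log-probability assigned to the true class
    return -np.mean(np.log(P[np.arange(Y.shape[0]), Y]))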
A two-layer MLP is constructed. The activation function
tanh is used after the first hidden layer and the output layer uses linear activation (no activation function). The architecture of the network is illustrated in Figure 3.
Figure 3: MLP Network Architecture
The green dots on the neurons in the hidden layer indicate
tanh activation. Next, this network architecture is specified in a format that TFANN accepts and an ANN classifier is constructed.
NA = [('F', 4), ('AF', 'tanh'), ('F', 2)]
The list of tuples is the network architecture.
F indicates a fully-connected layer and the following number gives the number of neurons in the layer.
AF indicates an activation function and the following string names the function. Thus, this architecture is a fully-connected layer with 4 neurons, followed by
tanh activation, followed by another fully-connected layer with 2 neurons. The final layer is the output layer.
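Conceptually, the forward pass of this architecture is just two matrix multiplications with a tanh in between. A plain numpy sketch follows, where the weight matrices W1, W2 and bias vectors b1, b2 are hypothetical (TFANN creates and trains these internally).
def ForwardPass(A, W1, b1, W2, b2):
    #A: (n, 2) inputs; W1: (2, 4), b1: (4,); W2: (4, 2), b2: (2,)
    H = np.tanh(A.dot(W1) + b1) #('F', 4) followed by ('AF', 'tanh')
    return H.dot(W2) + b2       #('F', 2): linear output layer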
The docstring for the
_CreateANN function provides detailed information on the types of network operations that are currently supported by
TFANN.
In [109]: help(TFANN._CreateANN)
Help on function _CreateANN in module TFANN:
_CreateANN(PM, NA, X)
Sets up the graph for a convolutional neural network from
a list of operation specifications like:
[('C', [5, 5, 3, 64], [1, 1, 1, 1]), ('AF', 'tanh'),
('P', [1, 3, 3, 1], [1, 2, 2, 1]), ('F', 10)]
Operation Types:
AF: ('AF', <name>) Activation Function 'relu', 'tanh', etc
C: ('C', [Filter Shape], [Stride]) 2d Convolution
CT: 2d Convolution Transpose
C1d: ('C1d', [Filter Shape], Stride) 1d Convolution
C3d: ('C3d', [Filter Shape], [Stride]) 3d Convolution
D: ('D', Probability) Dropout Layer
F: ('F', Output Neurons) Fully-connected
LRN: ('LRN')
M: ('M', Dims) Average over Dims
P: ('P', [Filter Shape], [Stride]) Max Pool
P1d: ('P1d', [Filter Shape], Stride) 1d Max Pooling
R: ('R', shape) Reshape
S: ('S', Dims) Sum over Dims
[Filter Shape]: (Height, Width, In Channels, Out Channels)
[Stride]: (Batch, Height, Width, Channel)
Stride: 1-D Stride (single value)
PM: The padding method
NA: The network architecture
X: Input tensor
The final layer of a classification network requires that class labels be encoded as one-hot vectors along the final axis of the output. Since the network predicts a single binary class label for each sample, the final layer should have 2 neurons. In this way, the final layer outputs a matrix of dimension \((n, 2)\). The function
argmax is applied along the final dimension of the output matrix to obtain the index of the class label.
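For instance, the row-wise argmax of a hypothetical \((2, 2)\) output matrix recovers one predicted label per sample; ANNC performs this decoding internally.
O = np.array([[2.1, -0.3], [-1.2, 0.8]]) #Hypothetical raw network outputs
print(np.argmax(O, axis = 1))            #[0 1]: index of the largest value per row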
Next, the network architecture is passed to the constructor of the
ANNC class, along with the input shape and other parameters. ANNC is an abbreviation for
Artificial Neural Network for Classification.
annc = ANNC(A1.shape[1:], NA, batchSize = 1024,
maxIter = 4096, learnRate = 1e-3, verbose = True)
The first argument to the
ANNC constructor is the shape of a single input sample. In this case, the shape is a vector of length
2. The
batchSize argument indicates the number of samples to use at a time when training the network. The batch indices are selected randomly for each training iteration. The
learnRate parameter specifies the learning rate used by the training method (the
Adam optimizer by default). The
maxIter argument limits the number of training iterations to some fixed amount. Finally,
verbose controls whether the loss is displayed after each iteration of training. Detailed descriptions for the constructor arguments are available using
help(ANNC).
TFANN follows the
fit,
predict,
score interface used by
scikit-learn. Thus, fitting and scoring the network can be accomplished as follows.
annc.fit(A1, Y1) #Fit using training data only
s1 = annc.score(A1, Y1) #Performance on training data
s2 = annc.score(A2, Y2) #Testing data
print('Train: {:.4f}\tTest: {:.4f}'.format(s1, s2))
YH = annc.predict(A2) #Predicted labels
The
score method uses
accuracy as the metric for classification models. This is the number of correctly labeled samples divided by the total number of samples. Some care should be taken with this metric in problems where the class labels are imbalanced.
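Since predict was used above to obtain YH, the accuracy on the testing data can also be computed by hand (assuming predict returns labels in the same encoding as Y2); for imbalanced data, metrics such as balanced accuracy or F1 may be more informative.
print('Accuracy: {:.4f}'.format(np.mean(YH == Y2)))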
Results
Due to the simple nature of the problem, the network is able to achieve very high accuracy on the held-out testing data. After
4096 iterations, the network achieves roughly
98% accuracy. The predictions on the testing data are plotted below in Figure 4.
Figure 4: Model Cross-Validation Predictions (Accuracy = 98.4%)
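A plot in the style of Figure 4 can be produced by coloring the testing points with the predicted labels instead of the true ones; a minimal sketch, reusing the matplotlib import from the earlier sketch, follows.
#Scatter plot of the testing data colored by predicted class label
mpl.scatter(A2[:, 0], A2[:, 1], c = YH, s = 8)
mpl.show()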
The reader is encouraged to modify the data, network architecture, and parameters to explore the features provided by
TFANN.