[EN] Tutorial 2: DNN Network Training Instruction


    Before starting this tutorial, we assume that users have already read Tutorial 1: Corerain Rainman V3 User Guide, and are familiar with the basic principles and operation of the Rainman V3 development kit.

    In this tutorial, we introduce the basics of the algorithm and show you how to train a target detection network (DNN) with your own dataset. The trained model is the input of the compiler Plumber (covered in the next tutorial), which parses and converts the model. The output of Plumber can then be accelerated by hardware.

    Target detection is a classic problem in machine learning. The goal is to locate and classify the objects in a picture. The position of an object is represented by the coordinates of the upper-left and lower-right corners of a rectangular frame. Taking face recognition as an example: given an input picture, the result of target detection is the position of each face in the picture and its classification (the category is 1 if the object is a face; otherwise it is 0, which means non-face).
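
    The detection result described above can be sketched as a small data structure. This is only an illustration: the class name Detection and its field names are our own choices and do not come from the tutorial code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object: a class id plus a corner-format bounding box."""
    class_id: int   # 1 = face, 0 = non-face in the tutorial's example
    x_min: float    # upper-left corner, x
    y_min: float    # upper-left corner, y
    x_max: float    # lower-right corner, x
    y_max: float    # lower-right corner, y

    @property
    def width(self) -> float:
        return self.x_max - self.x_min

    @property
    def height(self) -> float:
        return self.y_max - self.y_min


# Illustrative values only.
face = Detection(class_id=1, x_min=414.0, y_min=207.0, x_max=536.0, y_max=304.0)
print(face.width, face.height)
```

    Storing the two corners rather than a center and a size matches the way positions are described in this tutorial; width and height can always be derived when needed.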

    The result of a face detection is as follows:

    Deep learning is a mature approach to target detection. Detectors fall into two main families: One-Shot (single-stage) detectors and Two-Shot (two-stage) detectors. One-Shot detectors are generally much faster than Two-Shot detectors while reaching comparable accuracy, which makes them more practical for industrial applications.
    We use the classic Single-Shot Multibox Detector (SSD) in this tutorial. For the principle behind it, refer to the link below:

    The code of this tutorial is modified from a third-party implementation on GitHub. For more explanation, please refer to:

    0. Preparation for DNN algorithm training:

    1. A Host/PC/Server with GPU
      Deep learning algorithms usually take a long time (several hours or days) to train, and training on a CPU takes much longer, so we recommend using a GPU.
    2. Data annotation and training
      Data is the basis of deep learning. We usually regard deep learning as a black-box algorithm: the network repeatedly adjusts its parameters until it produces the desired output. For object detection problems, the user needs to provide the location and the classification of every object in a picture that should be detected.

    Set up the GPU version of the Docker container (if you have a GPU):

    sudo docker run --runtime=nvidia --name plumber -dti brucvv/plumber

    Set up the CPU version of the Docker container (if you don't have a GPU):

    sudo docker run --name plumber -dti brucvv/plumber:cpu_1.2

    1. Data preparation

    This step converts the original input images and annotations into the TFRecord data format of the TensorFlow framework.

    Run under the /app/detection path:


    Parameter example

    --dataset_name = dataset name 
    --dataset_dir = root directory of training data
    --output_dir = storage path of converted data
    --shuffle = True (whether to shuffle the data order; shuffled by default)

    Three files will be generated, where xxx is the dataset name given by dataset_name:

    We assume that the images and the data annotation file (img_label.txt) are in the same folder. In this example, the dataset folder is /app/imagetxt/, which contains 19 images, all labeled with the face position.
    The format of each line of the data annotation file is as follows:

      image name \
      number of objects in the image \
      object 1: the label_id corresponding to object 1 \
      2 (predefined field) \
      top-left corner x coordinate \
      top-left corner y coordinate \
      bottom-right corner x coordinate \
      bottom-right corner y coordinate \
      object N: the label_id corresponding to object N \
      2 (predefined field) \
      top-left corner x coordinate \
      top-left corner y coordinate \
      bottom-right corner x coordinate \
      bottom-right corner y coordinate

    Here x is the image width direction and y is the image height direction. For example:


    1530697831_1.jpg 2 face:1 2 414.0 207.0 536.0 304.0 face:1 2 234.0 207.0 398.0 390.0
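
    To make the format concrete, here is a minimal parsing sketch for one annotation line. It assumes that the fields are separated by whitespace and that image names contain no spaces. The names LabeledBox and parse_annotation_line are hypothetical and are not part of the tutorial code.

```python
from typing import List, NamedTuple, Tuple


class LabeledBox(NamedTuple):
    name: str
    label_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_annotation_line(line: str) -> Tuple[str, List[LabeledBox]]:
    """Return (image_name, boxes) for one line of the annotation file."""
    tokens = line.split()
    image_name, num_objects = tokens[0], int(tokens[1])
    fields_per_object = 6  # "name:label_id", predefined field, x1, y1, x2, y2
    expected = 2 + num_objects * fields_per_object
    if len(tokens) != expected:
        raise ValueError(f"expected {expected} fields, got {len(tokens)}")
    boxes = []
    for i in range(num_objects):
        start = 2 + i * fields_per_object
        name, label_id = tokens[start].rsplit(":", 1)
        # tokens[start + 1] is the predefined field (2 in the example); skipped.
        x_min, y_min, x_max, y_max = map(float, tokens[start + 2:start + 6])
        boxes.append(LabeledBox(name, int(label_id), x_min, y_min, x_max, y_max))
    return image_name, boxes


example = ("1530697831_1.jpg 2 face:1 2 414.0 207.0 536.0 304.0 "
           "face:1 2 234.0 207.0 398.0 390.0")
image_name, boxes = parse_annotation_line(example)
print(image_name, len(boxes), boxes[1])
```

    Checking the field count against the declared number of objects catches the most common annotation mistake, a line whose object count does not match the boxes that follow it.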

    Users can define their own data types and write corresponding data conversion code.

    Note: in the data annotation file img_label.txt, each image is labeled with only one face position, so the algorithm trained in this tutorial will detect only one face per image. As an exercise, users can add further face positions to the annotation file and retrain the algorithm to detect multiple faces in one image.
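
    For the exercise above, annotation lines with more than one face can also be generated by a script instead of being edited by hand. The following is a sketch under the same format assumptions as the example line; the function name format_annotation_line is hypothetical, and the predefined field is simply written back as 2.

```python
def format_annotation_line(image_name, objects, predefined_field=2):
    """Build one annotation line.

    objects: iterable of (name, label_id, x_min, y_min, x_max, y_max).
    """
    objects = list(objects)
    parts = [image_name, str(len(objects))]
    for name, label_id, x_min, y_min, x_max, y_max in objects:
        if x_min >= x_max or y_min >= y_max:
            raise ValueError(f"degenerate box for {name!r} in {image_name}")
        parts.append(f"{name}:{label_id}")
        parts.append(str(predefined_field))
        parts.extend(str(float(v)) for v in (x_min, y_min, x_max, y_max))
    return " ".join(parts)


line = format_annotation_line(
    "1530697831_1.jpg",
    [("face", 1, 414, 207, 536, 304),   # first face
     ("face", 1, 234, 207, 398, 390)],  # second face in the same image
)
print(line)
```

    The box check rejects annotations whose top-left corner is not above and to the left of the bottom-right corner, which would otherwise only show up as poor training results.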

    2. Train the model

    Train the model with the TensorFlow framework.

    Run under the /app/detection path:


    Parameter list

    --train_dir = the storage directory of the trained model
    --dataset_name = dataset name (same as in step 1, data preparation)
    --dataset_dir = root directory of training data (same as in step 1, data preparation)
    --model_name = model structure definition (added in nets and registered in nets_factory)
    --save_summaries_secs = interval, in seconds, between summary saves
    --save_interval_secs = interval, in seconds, between model snapshot saves
    --batch_size = batch size; adjust it to fit your GPU, and reduce it if CUDA_OUT_OF_MEMORY is reported
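
    The advice on batch_size can be automated with a simple retry loop. The sketch below is purely illustrative: train_with_backoff and fake_train are hypothetical names, and a Python MemoryError stands in for the CUDA_OUT_OF_MEMORY report, which in practice appears in the output of the training script.

```python
def train_with_backoff(train_fn, batch_size, min_batch_size=1):
    """Call train_fn(batch_size), halving batch_size after each memory error."""
    while True:
        try:
            return batch_size, train_fn(batch_size)
        except MemoryError:
            if batch_size <= min_batch_size:
                raise  # cannot shrink further; surface the error
            batch_size = max(min_batch_size, batch_size // 2)
            print(f"out of memory, retrying with batch_size={batch_size}")


def fake_train(batch_size):
    # Hypothetical stand-in: pretend the GPU only fits batches of 8 or fewer.
    if batch_size > 8:
        raise MemoryError("simulated CUDA_OUT_OF_MEMORY")
    return "ok"


used_batch_size, result = train_with_backoff(fake_train, batch_size=32)
print(used_batch_size, result)
```

    Halving is a common heuristic because memory use grows roughly with the batch size; any other reduction schedule would work in the same loop.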

    The most important step here is to define the network model given by model_name. The model definition file is in the nets folder.


    • In general, the more training data and the fewer annotation errors, the better the training result. This example provides only 19 images, which are meant only to verify the overall workflow. Because the dataset is so small, use the training images themselves when checking that the test step works correctly.
    • Training the model usually takes hours. To decide whether to move on to the subsequent steps, observe the loss and check whether the corresponding ckpt file has been saved in train_dir.
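
    Checking train_dir for saved checkpoints can be scripted. The sketch below assumes the TensorFlow 1.x naming convention, in which each snapshot produces a file named model.ckpt-<step>.index; if your training script names its snapshots differently, adjust the pattern. The function name latest_checkpoint_step is hypothetical, and the demonstration uses a temporary directory with empty placeholder files.

```python
import re
import tempfile
from pathlib import Path


def latest_checkpoint_step(train_dir):
    """Return the highest step with a model.ckpt-<step>.index file, or None."""
    pattern = re.compile(r"^model\.ckpt-(\d+)\.index$")
    steps = []
    for path in Path(train_dir).glob("model.ckpt-*.index"):
        match = pattern.match(path.name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None


# Demonstration on a temporary directory with placeholder files.
with tempfile.TemporaryDirectory() as demo_dir:
    empty_result = latest_checkpoint_step(demo_dir)
    for step in (1000, 5030):
        (Path(demo_dir) / f"model.ckpt-{step}.index").touch()
    demo_step = latest_checkpoint_step(demo_dir)
print(empty_result, demo_step)
```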

    An example of actual training is given below.

    INFO:tensorflow:global_step/sec: 0
    INFO:tensorflow:global step 10: loss = 83.2801 (0.059 sec/step)
    INFO:tensorflow:Recording summary at step 14.
    INFO:tensorflow:global step 20: loss = 70.6828 (0.195 sec/step)
    INFO:tensorflow:global step 30: loss = 72.3244 (0.215 sec/step)
    INFO:tensorflow:global step 40: loss = 69.1671 (0.239 sec/step)
    ... ...
    INFO:tensorflow:global step 5010: loss = 0.1913 (0.245 sec/step)
    INFO:tensorflow:global step 5020: loss = 0.2235 (0.251 sec/step)
    INFO:tensorflow:global step 5030: loss = 0.2603 (0.239 sec/step)
    INFO:tensorflow:Saving checkpoint to path model.ckpt # intermediate results have been saved

    As training progresses, the loss gradually decreases. When the loss shows only small numerical fluctuations over a long period and its overall trend stays flat, we can consider the model training complete (converged).


    • In this example, since the number of images is small, training converges very easily and the loss drops to a very low value. In a typical two-class training run (with more than 10k training samples), training can be regarded as basically complete once the loss reaches 4 to 6.
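
    The convergence check described above can be approximated in a few lines: extract the loss values from the training log, then compare the mean of the most recent values with the mean of the values just before them. This is a rough heuristic sketch, not part of the tutorial code; the function names, the window size, and the 5% tolerance are all assumptions that should be tuned for a real training run.

```python
import re

LOSS_RE = re.compile(r"global step (\d+): loss = ([0-9.]+)")


def parse_losses(log_text):
    """Extract (step, loss) pairs from TensorFlow training log text."""
    return [(int(step), float(loss)) for step, loss in LOSS_RE.findall(log_text)]


def has_plateaued(losses, window=3, rel_tol=0.05):
    """True if the means of the last two windows differ by at most rel_tol."""
    if len(losses) < 2 * window:
        return False  # not enough history to judge
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    return abs(previous - recent) <= rel_tol * max(abs(previous), 1e-12)


log = """INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:global step 10: loss = 83.2801 (0.059 sec/step)
INFO:tensorflow:global step 20: loss = 70.6828 (0.195 sec/step)"""
pairs = parse_losses(log)
print(pairs)
print(has_plateaued([loss for _, loss in pairs]))
```

    Comparing window means rather than single values keeps the check from reacting to the step-to-step fluctuations that are visible even in a converged run.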

    3. Model test

    Test the performance of the trained model on a GPU or CPU.

    Run under the /app/detection path:


    Parameter list

    --_model_key = model structure definition (same as in step 2)
    --_model_path = storage directory of the trained model (same as in step 2)
    --_img_path = path of the input image
    --_out_img_path = path of the output image (default)
    --num_classes = number of classes used in the test

    Note: the number of classes = 1 + the total number of object classes to be identified; the extra class stands for "no object" (category 0). For example, if the training samples contain both pedestrian and vehicle objects, the number of classes is 3.
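
    The rule for num_classes can be written down as a small helper. This is an illustrative sketch: the function name build_label_map and the key "background" for category 0 are our own choices, not names used by the tutorial scripts.

```python
def build_label_map(object_classes):
    """Map class names to ids, reserving id 0 for the "no object" class."""
    label_map = {"background": 0}
    for index, name in enumerate(object_classes, start=1):
        label_map[name] = index
    return label_map


label_map = build_label_map(["pedestrian", "vehicle"])
num_classes = len(label_map)  # value to pass as --num_classes
print(label_map, num_classes)
```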

    4. Generate post-processing parameter

    Generate the post-processing parameter file (the parameters of the SSD anchors) required for subsequent inference on the board. For the concept of an anchor, refer to the Faster R-CNN algorithm.
    Run under the /app/detection path.

    ./script/4_gen_PostParam.sh
    python post_gen.py   <model structure definition (same as in step 2)>   <post-processing parameter output directory>
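
    To give an idea of what anchor parameters look like, the sketch below generates anchors for a single square feature map using the general scheme from the SSD paper: one anchor center per feature-map cell, with box shapes derived from a scale and a set of aspect ratios. It is not the content of post_gen.py; the function name and the example values are assumptions for illustration only.

```python
import math


def generate_anchors(feature_map_size, scale, aspect_ratios):
    """Return (cx, cy, w, h) anchors, normalized to [0, 1], for a square map."""
    anchors = []
    for row in range(feature_map_size):
        for col in range(feature_map_size):
            # Anchor centers sit in the middle of each feature-map cell.
            cx = (col + 0.5) / feature_map_size
            cy = (row + 0.5) / feature_map_size
            for ratio in aspect_ratios:
                w = scale * math.sqrt(ratio)   # wider for ratio > 1
                h = scale / math.sqrt(ratio)   # taller for ratio < 1
                anchors.append((cx, cy, w, h))
    return anchors


anchors = generate_anchors(feature_map_size=2, scale=0.5, aspect_ratios=[1.0, 2.0])
print(len(anchors), anchors[0])
```

    During inference, the network predicts offsets relative to these fixed boxes, which is why the board needs the anchor parameters to decode its raw output into positions.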

    5. Export the inference model

    Freeze the trained model into a .pb file, which is suitable for deployment on the inference device. In TensorFlow, a frozen .pb file holds the structure and variable names of the model network, and also stores the values of all variables.

    Run under the /app/detection path

    python export_inference_graph.py   <model structure definition>   <storage directory of the trained model>   <model output directory>

    The final output of this chapter is the input of the subsequent compilation step, which uses Plumber. One goal of compiling the trained model is to remove the nodes in the TensorFlow graph that are needed only for training.