[EN] Tutorial 2: DNN Network Training Instruction
Corerain last edited by Corerain
Before starting this tutorial, we assume that you have already read Tutorial 1: Corerain Rainman V3 User Guide and are familiar with the basic principles and operation of the Rainman V3 development kit.
In this tutorial, we introduce the basics of the algorithm and show you how to train a target detection network (DNN) on your own dataset. The trained model is the input to the compiler Plumber (covered in the next tutorial), which parses and converts the model. The output of Plumber can then be accelerated by hardware.
Target detection is a classic problem in machine learning. The goal is to locate and classify objects in a picture. The position of each object is represented by the coordinates of the upper-left and lower-right corners of a rectangular frame. Taking face detection as an example: given an input picture, the result of target detection is the position of each face in the picture and the classification of each object (the category is 1 if the object is a face; otherwise it is 0, which means non-face).
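A detection result as described above can be sketched as a small Python record. The class name `Detection` and its field names are hypothetical and chosen only for illustration; they are not taken from the tutorial code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object: a category plus a box given by two corners."""
    label_id: int  # 1 = face, 0 = non-face, as in the text above
    x_min: float   # upper-left corner, x (image width direction)
    y_min: float   # upper-left corner, y (image height direction)
    x_max: float   # lower-right corner, x
    y_max: float   # lower-right corner, y

    def width(self) -> float:
        return self.x_max - self.x_min

    def height(self) -> float:
        return self.y_max - self.y_min

    def area(self) -> float:
        return self.width() * self.height()


# A face whose box spans (414, 207) to (536, 304).
face = Detection(label_id=1, x_min=414.0, y_min=207.0, x_max=536.0, y_max=304.0)
print(face.width(), face.height())  # 122.0 97.0
```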
The result of a face detection is as follows:
Deep learning algorithms are mature in target detection. The main frameworks fall into two families: One-Shot Detectors and Two-Shot Detectors (often called one-stage and two-stage detectors). One-Shot detectors are generally faster than Two-Shot detectors while achieving comparable accuracy, which makes them more practical for industrial applications.
We use the classic Single-Shot Multibox Detector (SSD) in this tutorial. You can refer to the link below for the specific principle:
The code in this tutorial is modified from a third-party implementation on GitHub. For more explanation, please refer to:
- A host/PC/server with a GPU.
Deep learning algorithms usually take a long time (several hours or days) to train, and training on a CPU takes much longer, so we recommend using a GPU.
- Data annotation and training.
Data is the basis of deep learning. We usually regard deep learning as a black-box algorithm: the network repeatedly adjusts its parameters so that it produces the desired output. For object detection problems, the user needs to provide the location and classification of every object in a picture that should be detected.
Install the GPU-based Docker version (if you have a GPU):
sudo docker run --runtime=nvidia --name plumber -dti brucvv/plumber
Install the CPU-based Docker version (if you don't have a GPU):
sudo docker run --name plumber -dti brucvv/plumber:cpu_1.2
This step converts the original input images and annotations into the record data format of the TensorFlow framework.
Run under the
--dataset_name = dataset name
--dataset_dir = root directory of training data
--output_dir = storage path of converted data
--shuffle = True (whether to shuffle the data order; shuffling is the default)
Three files will be generated, where xxx is the name of the dataset.
We assume the data and the data annotation file (img_label.txt) are in the same folder (in this example, the dataset folder is /app/imagetxt/, which contains 19 images, all labeled with the face position).
The data annotation file format is as follows:
image name \
number of objects in the image \
object 1: the label_id corresponding to object 1 \
2 (predefined field) \
top left corner x coordinate \
top left corner y coordinate \
bottom right corner x coordinate \
bottom right corner y coordinate \
......
object N: the label_id corresponding to object N \
2 (predefined field) \
top left corner x coordinate \
top left corner y coordinate \
bottom right corner x coordinate \
bottom right corner y coordinate \
Here x is the image width direction and y is the image height direction.
1530697831_1.jpg 2 face:1 2 414.0 207.0 536.0 304.0 face:1 2 234.0 207.0 398.0 390.0
Users can define their own data types and write corresponding data conversion code.
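As a starting point for such conversion code, the annotation format above can be read with a short parser. This is a minimal sketch, not the tutorial's own conversion script: the names `LabeledBox` and `parse_annotation_line` are hypothetical, and it assumes fields are separated by whitespace and that image names contain no spaces.

```python
from typing import List, NamedTuple, Tuple


class LabeledBox(NamedTuple):
    name: str       # object name, e.g. "face"
    label_id: int   # label_id that follows the colon
    x_min: float    # top left corner x
    y_min: float    # top left corner y
    x_max: float    # bottom right corner x
    y_max: float    # bottom right corner y


# Each object uses 6 tokens: "name:label_id", predefined field, x1, y1, x2, y2.
FIELDS_PER_OBJECT = 6


def parse_annotation_line(line: str) -> Tuple[str, List[LabeledBox]]:
    """Split one line of img_label.txt into the image name and its boxes."""
    tokens = line.split()
    image_name, count = tokens[0], int(tokens[1])
    rest = tokens[2:]
    if len(rest) != count * FIELDS_PER_OBJECT:
        raise ValueError(
            f"{image_name}: expected {count} objects, got {len(rest)} fields"
        )
    boxes = []
    for i in range(count):
        chunk = rest[i * FIELDS_PER_OBJECT:(i + 1) * FIELDS_PER_OBJECT]
        name, label_id = chunk[0].rsplit(":", 1)
        # chunk[1] is the predefined field (2 in the example); it is skipped.
        x_min, y_min, x_max, y_max = (float(v) for v in chunk[2:])
        boxes.append(LabeledBox(name, int(label_id), x_min, y_min, x_max, y_max))
    return image_name, boxes


example = ("1530697831_1.jpg 2 face:1 2 414.0 207.0 536.0 304.0 "
           "face:1 2 234.0 207.0 398.0 390.0")
image_name, boxes = parse_annotation_line(example)
print(image_name, len(boxes))  # 1530697831_1.jpg 2
```

Checking the field count against the declared number of objects catches the most common annotation mistake, a count that was not updated after adding or removing a box.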
Note: the data annotation file img_label.txt is labeled with only one face position per image, so the model trained in this tutorial will detect only one face in an image. As an exercise, you can add further face positions to the annotation file and retrain to achieve multiple-face detection in one image.
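For the exercise above, adding a face to an existing annotation line means appending one object entry and incrementing the object count. The helper below is a hedged sketch: `add_object` is a hypothetical name, the file name `a.jpg` is made up, and writing coordinates with one decimal place is an assumption based on the example line.

```python
from typing import Sequence


def add_object(line: str, name: str, label_id: int,
               box: Sequence[float], predefined: int = 2) -> str:
    """Return a copy of an annotation line with one more labeled object."""
    x_min, y_min, x_max, y_max = box
    if not (x_min < x_max and y_min < y_max):
        raise ValueError("box must be (x_min, y_min, x_max, y_max)")
    tokens = line.split()
    tokens[1] = str(int(tokens[1]) + 1)  # second field is the object count
    tokens.append(f"{name}:{label_id}")
    tokens.append(str(predefined))       # the predefined field, 2 in the example
    tokens.extend(f"{float(v):.1f}" for v in box)
    return " ".join(tokens)


one_face = "a.jpg 1 face:1 2 414.0 207.0 536.0 304.0"
two_faces = add_object(one_face, "face", 1, (234, 207, 398, 390))
print(two_faces)
```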
Train the model with the TensorFlow framework.
Run under the
--train_dir = storage directory of the trained model
--dataset_name = dataset name (same as in step 1, data preparation)
--dataset_dir = root directory of training data (same as in step 1, data preparation)
--model_name = model structure definition (added in nets and registered in nets_factory)
--save_summaries_secs = interval for saving summaries, in seconds
--save_interval_secs = interval for saving model snapshots, in seconds
--batch_size = adjust this parameter to fit your GPU; reduce batch_size if CUDA_OUT_OF_MEMORY is reported
The most important step here is to define the network model model_name, and the model definition file is in the
- In general, the more data and the fewer annotation errors, the better the training results. In this example, we provide only 19 images for training. They are intended only to verify the overall workflow. Please use the training images themselves to check that the test results are correct.
- It usually takes hours to train the model. We can decide whether to proceed to the subsequent steps by observing the loss and checking whether the corresponding ckpt file has been saved in train_dir.
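Checking whether a ckpt file has been saved can be scripted. The sketch below assumes the common TensorFlow 1.x naming in which each snapshot writes a file called `model.ckpt-<step>.index` into train_dir; that naming is an assumption about your setup, and `saved_checkpoint_steps` is a hypothetical helper, not part of the tutorial code.

```python
import re
from pathlib import Path
from typing import List

# Assumed snapshot naming: model.ckpt-<global step>.index
_CKPT_INDEX = re.compile(r"^model\.ckpt-(\d+)\.index$")


def saved_checkpoint_steps(train_dir: str) -> List[int]:
    """Return the global steps of all snapshots found in train_dir, sorted."""
    steps = []
    for path in Path(train_dir).iterdir():
        match = _CKPT_INDEX.match(path.name)
        if match:
            steps.append(int(match.group(1)))
    return sorted(steps)
```

An empty list means no snapshot has been written yet; otherwise the last element is the most recent saved step.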
An example of actual training is given below.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:global step 10: loss = 83.2801 (0.059 sec/step)
INFO:tensorflow:Recording summary at step 14.
INFO:tensorflow:global step 20: loss = 70.6828 (0.195 sec/step)
INFO:tensorflow:global step 30: loss = 72.3244 (0.215 sec/step)
INFO:tensorflow:global step 40: loss = 69.1671 (0.239 sec/step)
... ...
INFO:tensorflow:global step 5010: loss = 0.1913 (0.245 sec/step)
INFO:tensorflow:global step 5020: loss = 0.2235 (0.251 sec/step)
INFO:tensorflow:global step 5030: loss = 0.2603 (0.239 sec/step)
INFO:tensorflow:Saving checkpoint to path model.ckpt # intermediate results have been saved
As training progresses, the loss gradually decreases. When the loss shows only small fluctuations over a long period and its overall trend stays flat, we can consider model training complete (converged).
- In this example, since the number of images is small, training converges very easily and the loss drops to a very low value. In a typical two-class training run (with more than 10k training samples), training can be regarded as basically complete once the loss reaches about 4 to 6.
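The convergence check described above can be approximated by reading the loss values from the training log and comparing recent averages. This is only a rough sketch: the function names, the window size, and the tolerance are illustrative assumptions, and the regular expression relies on the log line format shown in the example.

```python
import re
from typing import List, Sequence, Tuple

# Matches lines such as "global step 10: loss = 83.2801 (0.059 sec/step)".
_LOSS_LINE = re.compile(r"global step (\d+): loss = ([-+0-9.eE]+)")


def parse_losses(log_text: str) -> List[Tuple[int, float]]:
    """Extract (step, loss) pairs from TensorFlow training log text."""
    return [(int(step), float(loss)) for step, loss in _LOSS_LINE.findall(log_text)]


def has_plateaued(losses: Sequence[float], window: int = 3,
                  tolerance: float = 0.1) -> bool:
    """True if the mean of the last `window` losses is within `tolerance`
    of the mean of the `window` losses before them."""
    if len(losses) < 2 * window:
        return False  # not enough history to judge
    recent = losses[-window:]
    previous = losses[-2 * window:-window]
    return abs(sum(recent) / window - sum(previous) / window) < tolerance
```

In practice a much larger window than 3 is appropriate, since the loss is logged every 10 steps and fluctuates from batch to batch.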
Test the performance of the trained model on GPU/CPU.
Run under the
--_model_key = model structure definition (same as in step 2)
--_model_path = storage directory of the trained model (same as in step 2)
--_img_path = path of the input image
--_out_img_path = path of the output image (default)
--num_classes = number of classifications used for testing
Note: the number of classifications = 1 + the total number of object classes to identify (the extra one is category 0, the non-object category). For example, if the training samples contain both pedestrian and vehicle objects, then the number of classifications = 3.
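The rule in the note can be written as a one-line calculation. The function name `num_classes_for` is hypothetical and used only to illustrate the arithmetic.

```python
from typing import Iterable


def num_classes_for(object_classes: Iterable[str]) -> int:
    """Number of classifications = 1 + number of distinct object classes."""
    return 1 + len(set(object_classes))


print(num_classes_for(["pedestrian", "vehicle"]))  # 3
print(num_classes_for(["face"]))                   # 2
```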
Generate the post-processing parameter file (the Anchor parameters of SSD) required for subsequent inference on the board. For the concept of Anchor, refer to the Faster R-CNN algorithm.
Run under the
python post_gen.py <model structure definition (same as in step 2)> <post-processing parameter output directory>
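For intuition about what Anchor parameters look like, the sketch below generates anchors for one square feature map using the scale and aspect-ratio formulas from the SSD paper. It is not the logic of post_gen.py, whose output format is not described here; the function names and default values (`s_min=0.2`, `s_max=0.9`, three aspect ratios) are assumptions, and the extra aspect-ratio-1 box from the paper is omitted for brevity.

```python
import math
from typing import List, Sequence, Tuple


def ssd_scales(num_maps: int, s_min: float = 0.2, s_max: float = 0.9) -> List[float]:
    """Linearly spaced anchor scales, one per feature map."""
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def anchors_for_map(map_size: int, scale: float,
                    aspect_ratios: Sequence[float] = (1.0, 2.0, 0.5)
                    ) -> List[Tuple[float, float, float, float]]:
    """Anchors as (cx, cy, w, h), normalized to the image size."""
    anchors = []
    for row in range(map_size):
        for col in range(map_size):
            # Center of the feature map cell in normalized coordinates.
            cx = (col + 0.5) / map_size
            cy = (row + 0.5) / map_size
            for ratio in aspect_ratios:
                width = scale * math.sqrt(ratio)
                height = scale / math.sqrt(ratio)
                anchors.append((cx, cy, width, height))
    return anchors


scales = ssd_scales(6)
anchors = anchors_for_map(map_size=2, scale=scales[0])
print(len(anchors))  # 12: 2 x 2 cells, 3 aspect ratios each
```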
Freeze the trained model as a pb file, which is suitable for deployment on the inference device. In TensorFlow, a .pb file holds the structure and variable names of the model network, and it also stores the values of all variables.
Run under the
python export_inference_graph.py <model structure definition> <storage directory of the trained model> <model output directory>
The final output of this chapter is the input to the subsequent compilation step using Plumber. One goal of compiling the trained model is to remove the training-only nodes from the TensorFlow graph.