This code repository contains an implementation of our ECCV video detection work. If you use this code, please cite:
The following installation procedure is tested under:
Ubuntu 16.04, CUDA 9.0, Torch 7
- Create a directory that we call $ROOT (here we set
- Go to $ROOT and clone this repo. You should see $ROOT/STMN after this command.
cd $ROOT
git clone https://github.com/fanyix/STMN.git
Download the ImageNet VID dataset (ILSVRC 2015) and unzip it under $ROOT/dataset/ImageNetVID. Note that you need to download both ILSVRC2015_VID_final.tar.gz. As a sanity check, you should be able to see $ROOT/dataset/ImageNetVID/models. Download pre-trained models into this directory. Specifically, here are some models you might want to use:
1) The first one is our pre-trained STMN model.
2) The second one is the pre-trained RFCN model.
3) You also need the ImageNet classification pre-trained ResNet-101 model.
Download the ImageNet DET dataset into $ROOT/dataset/ImageNetDET. Then download from here the proposal and annotation files we prepared, and unzip them into the ImageNet DET directory.
After these steps, you should see a code/data structure like the following:
$ROOT
  - STMN
  - dataset
    - ImageNetVID
      - Data
        - VID
          - train
          - val
          - test
      - exp
        - anno
          - train.t7
          - val.t7
          - test.t7
        - proposals
          - train
          - val
          - test
      - models
        - stmn.t7
        - rfcn.t7
        - resnet-101.t7
    - ImageNetDET
      - Annotations
      - ImageSets
      - Data
        - DET
          - train
          - val
      - exp
        - annotations
        - proposals
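Before moving on to installation, it can help to confirm that the key directories and model files are in place. The following is a minimal sketch, not part of this repository: the helper name missing_paths and the EXPECTED_PATHS list are hypothetical, and the list only covers paths that this README names explicitly. It assumes $ROOT is exported as an environment variable.

```python
import os

# Paths relative to $ROOT that this README mentions explicitly.
# (Hypothetical list for illustration; extend it with deeper entries as needed.)
EXPECTED_PATHS = [
    "STMN",
    "dataset/ImageNetVID",
    "dataset/ImageNetVID/models/stmn.t7",
    "dataset/ImageNetVID/models/rfcn.t7",
    "dataset/ImageNetVID/models/resnet-101.t7",
    "dataset/ImageNetDET",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected paths (relative to root) that do not exist."""
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]


if __name__ == "__main__":
    # Assumes ROOT is set in the environment; falls back to the current directory.
    root = os.environ.get("ROOT", ".")
    missing = missing_paths(root)
    if missing:
        print("Missing under %s:" % root)
        for p in missing:
            print("  " + p)
    else:
        print("All expected paths found under %s" % root)
```

An empty result from missing_paths only means these top-level entries exist; it does not verify the contents of the datasets or the model files.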
- Install Lua Torch following the instructions on
Note that if you are using CUDA 9.0, you probably need this to solve a compilation issue regarding half-precision operations.
- Install necessary libraries for Torch
luarocks install torchnet
luarocks install optnet
luarocks install inn
luarocks install class
- Install hdf5 for Torch
sudo apt-get install libhdf5-serial-dev hdf5-tools
git clone https://github.com/deepmind/torch-hdf5
cd torch-hdf5
luarocks make hdf5-0-0.rockspec LIBHDF5_LIBDIR="/usr/lib/x86_64-linux-gnu/"
- Install matio for Torch
sudo apt-get install libmatio2
luarocks install matio
- Install utilities for R-FCN, MatchTrans and STN modules
cd $ROOT/STMN/modules/rfcn
luarocks make rfcn-0.1-0.rockspec
cd $ROOT/STMN/modules/assemble
luarocks make assemble-0.1-0.rockspec
cd $ROOT/STMN/modules/stnbhwd
luarocks make stnbhwd-scm-1.rockspec
- Install Lua API for COCO dataset
cd $ROOT/STMN/external/coco
luarocks make LuaAPI/rocks/coco-scm-1.rockspec
[Optional] Install the temporal linkage code modified from D&T (MATLAB required). Note that you will need this if you want to reproduce our results on ImageNet VID. Go to
[Optional] Finally, cuDNN 7.0 does not seem to work well with CUDA 9.0 for Torch; the following GitHub issue describes a fix: https://github.com/soumith/cudnn.torch/issues/383
- To train an RFCN model, go to $ROOT/STMN and run the following (we use 2 V100 GPUs for training):
CUDA_VISIBLE_DEVICES=0,1 th train_video.lua -model rfcn -ckpt rfcn
- To train an STMN model, go to $ROOT/STMN and run the following:
CUDA_VISIBLE_DEVICES=0,1 th train_video.lua -model stmn -ckpt stmn
- To generate detections with the pre-trained RFCN model, go to $ROOT/STMN/scripts and run the following:
CUDA_VISIBLE_DEVICES=0 th eval_detect_full.lua -model rfcn -model_path ../../dataset/ImageNetVID/models/rfcn.t7 -ckpt rfcn_eval
- To generate detections with the pre-trained STMN model, go to $ROOT/STMN/scripts and run the following:
CUDA_VISIBLE_DEVICES=0 th eval_detect_full.lua -model stmn -model_path ../../dataset/ImageNetVID/models/stmn.t7 -ckpt stmn_eval
Please note that the commands above are examples with which you can reproduce our results; however, they will be slow due to the sheer number of frames you need to evaluate. In our own experiments we always parallelize the above procedure with the launch script provided in scripts/launcher.py. We highly encourage you to take a look at this script and parallelize the procedure as we do.
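To illustrate the general idea behind this kind of parallelization, here is a minimal sketch of splitting a list of videos into contiguous shards, one per worker. It is not the logic of scripts/launcher.py: the function name shard_ranges is hypothetical, and how a shard is passed to the evaluation script depends on options not covered here, so please refer to the launch script itself for the actual mechanism.

```python
def shard_ranges(num_items, num_shards):
    """Split range(num_items) into num_shards contiguous half-open (start, end) ranges.

    The first (num_items % num_shards) shards receive one extra item, so shard
    sizes differ by at most one.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    base, extra = divmod(num_items, num_shards)
    ranges = []
    start = 0
    for i in range(num_shards):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # Example: 10 videos split across 4 workers.
    for worker, (start, end) in enumerate(shard_ranges(10, 4)):
        print("worker %d handles items [%d, %d)" % (worker, start, end))
```

Contiguous shards keep each worker's output easy to merge afterwards, since concatenating the per-shard results in worker order restores the original ordering.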
After you are done with both commands shown above, you should have the raw detection results (without NMS), which we then send to the temporal linkage procedure to generate our final detections. For this, we build on the brilliant code of D&T (note that it requires a MATLAB license) and modify it to use only its dynamic programming functionality.
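For readers unfamiliar with dynamic-programming linkage, the following is a generic Viterbi-style sketch of the idea, written in Python for illustration. It is not the D&T-derived MATLAB code used here: the function names iou and best_tube are hypothetical, and the link score (detection score plus IoU between consecutive boxes) is an assumption chosen for simplicity; the actual scoring may differ.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_tube(frames):
    """Pick one detection per frame maximizing the sum of detection scores
    plus the IoU between consecutive picks (assumed link score).

    frames: list of non-empty lists of (box, score) tuples, one list per frame.
    Returns (total_score, list of chosen detection indices, one per frame).
    """
    # best[t][j]: best total score of any path ending at detection j of frame t.
    best = [[score for _, score in frames[0]]]
    # back[t][j]: index of the predecessor in frame t-1 on that best path.
    back = [[-1] * len(frames[0])]
    for t in range(1, len(frames)):
        cur_best, cur_back = [], []
        for box, score in frames[t]:
            # Accumulated score of each predecessor plus its overlap with this box.
            cands = [best[t - 1][i] + iou(prev_box, box)
                     for i, (prev_box, _) in enumerate(frames[t - 1])]
            i_star = max(range(len(cands)), key=cands.__getitem__)
            cur_best.append(cands[i_star] + score)
            cur_back.append(i_star)
        best.append(cur_best)
        back.append(cur_back)
    # Backtrack from the best final detection to recover the path.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = [j]
    for t in range(len(frames) - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return total, path
```

The sketch extracts only the single best path; a full linkage procedure would typically repeat the extraction and rescore the detections along each path.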
- To generate final detections with temporal linkage, go to
[Optional] Again, you will reproduce our results (80.5% mAP) with the above command; however, it might be slow to go over the evaluation set. To assist you in this process, we also provide some parallelization utilities in run_dp.m and a launch script under $ROOT/STMN/external/dp (note that this is a different launch script from the one we used above under $ROOT/STMN/scripts/). Specifically, you first set the true and launch it with $ROOT/STMN/external/dp/launcher.py. Then you set true and run the script again in a MATLAB console.
We developed this codebase from the great code of multipathnet.