
Object Detection in Videos with Deep Neural Networks

TUBITAK 3501 Project

A Summary of the Project

Object detection is the problem of labeling and locating the objects in a given image. Modern object detection methods solve the problem in two stages, which we can call "search" and "recognition": in the search phase, object candidates are generated independently of class, and in the recognition phase, their classes are predicted.
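
As a rough illustration of this two-stage pipeline (not one of the project's own models), the sketch below runs a generic off-the-shelf two-stage detector, where a region proposal network plays the role of "search" and the classification head plays the role of "recognition". The model choice and tensor sizes are placeholders.

```python
# Minimal sketch of the two-stage "search" + "recognition" pipeline using an
# off-the-shelf detector (torchvision's Faster R-CNN); this is only an
# illustration, not a model developed in this project.
import torch
import torchvision

# The region proposal network inside this model performs the "search" stage;
# the box classification head performs the "recognition" stage.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)                # placeholder RGB image tensor
with torch.no_grad():
    pred = model([image])[0]                   # dict with boxes, labels, scores

print(pred["boxes"].shape)                     # class-agnostic candidate locations
print(pred["labels"][:5], pred["scores"][:5])  # predicted classes and confidences
```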

Our project and our contributions focused on two key challenges:

(1) Context in object detection: We used global and local context in an end-to-end deep learning system to improve the recognition phase of object detection in images and videos, and we developed a deep neural network-based "generalized Hough transform" as an alternative to the "object proposal" methods used in the search phase. We showed that, when applied to object detection in images and videos, our method successfully exploits contextual information and outperforms baseline methods.

(2) Object detection in videos with referring expressions: We created a new dataset for searching objects in videos with referring expressions (e.g., "blue car on the right") and developed new methods for this task on the dataset. We showed that our methods are very successful both in detecting the objects described by complex referring expressions and in generating the most appropriate referring expression for two selected objects in a video.

Project Members

Project’s Academic Contributions

Publications

Theses

Completed:

Ongoing:

Workshop

Invited Talks

Models and Methods Developed within the Project

HoughNet

A method for object detection in images using context and the generalized Hough transform.

The corresponding paper: Samet, N., Hicsonmez, S., & Akbas, E. (2020). HoughNet: Integrating near and long-range evidence for bottom-up object detection. In European Conference on Computer Vision (pp. 406-423). link
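
A very rough, unofficial sketch of the voting idea behind HoughNet follows (the actual model uses a learned log-polar vote field; the fixed kernel, class count, and tensor shapes below are placeholders): each location's class evidence is spread as votes onto possible object centers and accumulated into a center heatmap.

```python
# Illustrative Hough-style voting: every location casts votes for where an
# object center might be, and the votes are accumulated into a heatmap.
import torch
import torch.nn.functional as F

def accumulate_votes(evidence: torch.Tensor, vote_kernel: torch.Tensor) -> torch.Tensor:
    """evidence: (B, C, H, W) per-class visual evidence scores.
    vote_kernel: (k, k) spatial pattern describing where each location's
    evidence is counted as a vote. Returns a (B, C, H, W) vote map."""
    k = vote_kernel.shape[-1]
    c = evidence.shape[1]
    # One kernel per class: depthwise convolution spreads each location's
    # evidence over the neighborhood defined by the vote kernel.
    weight = vote_kernel.view(1, 1, k, k).repeat(c, 1, 1, 1)
    return F.conv2d(evidence, weight, padding=k // 2, groups=c)

# Toy example: uniform 9x9 vote field, random "evidence" map for 80 classes.
evidence = torch.rand(1, 80, 128, 128)
vote_kernel = torch.ones(9, 9) / 81.0
center_heatmap = accumulate_votes(evidence, vote_kernel)
print(center_heatmap.shape)   # torch.Size([1, 80, 128, 128])
```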

PPDet

A method for reducing label noise in anchor-free object detectors.

The corresponding paper: Samet, N., Hicsonmez, S., & Akbas, E. (2020). Reducing Label Noise in Anchor-Free Object Detection. British Machine Vision Conference (BMVC). link
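
A rough, unofficial sketch of the prediction-pooling idea (the function name and the exact pooling and normalisation below are illustrative assumptions, not the paper's implementation): the scores of all feature-map locations assigned to the same ground-truth object are pooled into a single prediction, so that noisy locations contribute less to the training loss.

```python
# Illustrative prediction pooling for one ground-truth object.
import torch

def pooled_object_score(point_scores: torch.Tensor) -> torch.Tensor:
    """point_scores: (N,) raw class scores of the N feature-map locations
    assigned to one ground-truth object. Returns a single pooled probability."""
    probs = point_scores.sigmoid()
    # Summing (and clamping) lets confident locations dominate the pooled
    # prediction; locations with near-zero scores barely affect the loss.
    return probs.sum().clamp(max=1.0)

# Toy example: five locations inside one ground-truth box.
scores = torch.tensor([2.0, -1.0, 0.5, -3.0, 1.5])
p = pooled_object_score(scores)
loss = -torch.log(p + 1e-6)   # cross-entropy-style positive-term loss
print(p.item(), loss.item())
```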

VIREF

A method and a dataset for object search in videos using referring expressions.

The corresponding paper: Anayurt, H., Ozyegin, S. A., Cetin, U., Aktas, U., & Kalkan, S. (2019). Searching for ambiguous objects in videos using relational referring expressions. British Machine Vision Conference (BMVC). link
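
A simplified, hypothetical sketch of the search-by-expression setup follows (the module names, encoders, and dimensions are assumptions, not the VIREF architecture): the referring expression and each candidate object track are embedded into a shared space, and the tracks are ranked by similarity to the expression.

```python
# Illustrative scoring of candidate object tracks against a referring expression.
import torch
import torch.nn as nn

class ExpressionTrackScorer(nn.Module):
    def __init__(self, vocab_size=1000, text_dim=128, track_dim=256, joint_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, text_dim)
        self.text_enc = nn.GRU(text_dim, joint_dim, batch_first=True)
        self.track_proj = nn.Linear(track_dim, joint_dim)

    def forward(self, token_ids, track_feats):
        # token_ids: (1, T) word indices of the expression, e.g. "blue car on the right"
        # track_feats: (N, track_dim) one pooled visual feature per candidate track
        _, h = self.text_enc(self.embed(token_ids))            # (1, 1, joint_dim)
        text_vec = h.squeeze(0)                                # (1, joint_dim)
        track_vecs = self.track_proj(track_feats)              # (N, joint_dim)
        return torch.cosine_similarity(track_vecs, text_vec)   # (N,) match scores

scorer = ExpressionTrackScorer()
scores = scorer(torch.randint(0, 1000, (1, 6)), torch.rand(5, 256))
print(scores.argmax().item())   # index of the best-matching track
```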

aLRP Loss

A novel ranking-based loss function for object detection.
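
As a hedged sketch of the definition (see the papers for the exact formulation and notation), the aLRP loss averages the LRP error over the positive examples, computed on the ranking that the classification scores induce over all examples:

$$
\mathcal{L}^{\mathrm{aLRP}} = \frac{1}{|\mathcal{P}|} \sum_{i \in \mathcal{P}} \ell^{\mathrm{LRP}}(i),
$$

where \(\mathcal{P}\) is the set of positive examples and \(\ell^{\mathrm{LRP}}(i)\) is the LRP error of positive \(i\) under that ranking.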

The corresponding papers:

LRP Error

A novel evaluation metric for visual detection problems.
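
Roughly (hedged; the papers give the exact weighting and notation), the LRP error combines the localisation quality of true positives with the numbers of false positives and false negatives into a single error value in \([0, 1]\):

$$
\mathrm{LRP} = \frac{1}{N_{\mathrm{TP}} + N_{\mathrm{FP}} + N_{\mathrm{FN}}}
\left( \sum_{i \in \mathrm{TP}} \frac{1 - \mathrm{IoU}_i}{1 - \tau} + N_{\mathrm{FP}} + N_{\mathrm{FN}} \right),
$$

where \(\tau\) is the IoU threshold for a detection to count as a true positive and \(\mathrm{IoU}_i\) is the IoU of the \(i\)-th true positive with its matched ground truth.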

The corresponding papers:

Contact

Emre Akbas, Dept. of Computer Engineering, METU

Last updated on March 29, 2021.