We propose PixOOD, a dense image prediction out-of-distribution detection algorithm that requires no training on samples of anomalous data and is not designed for a specific application, which avoids traditional training biases. To model the complex intra-class variability of the in-distribution data at the pixel level, we propose an online data condensation algorithm that is more robust than standard K-means and is easily trainable through SGD. We evaluate PixOOD on a wide range of problems. It achieved state-of-the-art results on four out of seven datasets, while being competitive on the rest.
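The abstract does not detail the condensation algorithm, but a minimal illustration of the general idea — prototypes updated online by SGD steps on the quantization loss, rather than by batch K-means — can be sketched as follows. All names and the update rule are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def online_condense_sgd(stream, k, dim, lr=0.05, seed=None):
    """Toy online data-condensation sketch: k prototypes, each SGD-updated
    toward the samples it is nearest to (MacQueen-style online K-means)."""
    rng = np.random.default_rng(seed)
    protos = rng.normal(size=(k, dim))                   # random prototype init
    for x in stream:
        j = np.argmin(((protos - x) ** 2).sum(axis=1))   # nearest prototype
        protos[j] += lr * (x - protos[j])                # SGD step toward sample
    return protos

# Usage: condense a shuffled two-cluster stream into two prototypes.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-3, 0.3, size=(500, 2)),
                       rng.normal(+3, 0.3, size=(500, 2))])
rng.shuffle(data)
centres = online_condense_sgd(data, k=2, dim=2, seed=1)
```

Each step is a single gradient update, so the procedure fits naturally into an SGD training loop, unlike batch K-means.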
2023
Calibrated Out-of-Distribution Detection with a Generic Representation
Tomáš Vojíř, Jan Šochman, Rahaf Aljundi, and Jiří Matas
In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Oct 2023
Out-of-distribution detection is a common issue in deploying vision models in practice, and solving it is an essential building block in safety-critical applications. Existing OOD detection solutions focus on improving the OOD robustness of a classification model trained exclusively on in-distribution (ID) data. In this work, we take a different approach and propose to leverage generic pre-trained representations. We first investigate the behaviour of simple classifiers built on top of such representations and show striking performance gains compared to ID-trained representations. We propose a novel OOD method, called GROOD, that achieves excellent performance, predicated on the use of a good generic representation. Only a trivial training process is required to adapt GROOD to a particular problem. The method is simple, general, efficient, calibrated and has only a few hyper-parameters. It achieves state-of-the-art performance on a number of OOD benchmarks, reaching near-perfect performance on several of them. The source code is available at this https URL.
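The abstract does not specify the classifier built on the generic representation; one simple instance of the idea is a nearest-class-mean score over pre-extracted embeddings, where a large distance to every ID class flags a sample as OOD. The sketch below is an assumption-laden illustration of that family of methods, not the GROOD algorithm itself.

```python
import numpy as np

def fit_class_means(feats, labels):
    """Per-class means of pre-extracted generic embeddings (ID data only)."""
    classes = np.unique(labels)
    return classes, np.stack([feats[labels == c].mean(axis=0) for c in classes])

def ood_score(x, means):
    """Distance to the nearest class mean: large value => likely OOD."""
    return np.min(np.linalg.norm(means - x, axis=1))

# Usage on synthetic "embeddings": two tight ID classes, one far-away OOD point.
rng = np.random.default_rng(0)
feats = np.concatenate([rng.normal(0, 0.1, (50, 4)), rng.normal(5, 0.1, (50, 4))])
labels = np.array([0] * 50 + [1] * 50)
_, means = fit_class_means(feats, labels)
print(ood_score(np.full(4, 20.0), means) > ood_score(np.zeros(4), means))  # True
```

The "trivial training process" in such approaches amounts to estimating a few per-class statistics; the heavy lifting is done by the frozen pre-trained representation.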
2021
Monocular Arbitrary Moving Object Discovery and Segmentation
Michal Neoral, Jan Šochman, and Jiří Matas
In The 32nd British Machine Vision Conference – BMVC 2021, Oct 2021
We propose a method for discovery and segmentation of objects that are, or their parts are, independently moving in the scene. Given three monocular video frames, the method outputs semantically meaningful regions, i.e. regions corresponding to the whole object, even when only a part of it moves.
The architecture of the CNN-based end-to-end method, called Raptor, combines semantic and motion backbones, which pass their outputs to a final region segmentation network. The semantic backbone is trained in a class-agnostic manner in order to generalise to object classes beyond the training data. The core of the motion branch is a geometrical cost volume computed from optical flow, optical expansion, mono-depth and the estimated camera motion.
Evaluation of the proposed architecture on the instance motion segmentation and binary moving-static segmentation problems on KITTI, DAVIS-Moving and YTVOS-Moving datasets shows that the proposed method achieves state-of-the-art results on all the datasets and is able to generalise well to various environments. For the KITTI dataset, we provide an upgraded instance motion segmentation annotation which covers all moving objects. Dataset, code and models are available on the github project page github.com/michalneoral/Raptor.
Two optical flow estimation problems are addressed: (i) occlusion estimation and handling, and (ii) estimation from image sequences longer than two frames. The proposed ContinualFlow method estimates occlusions before flow, avoiding the use of flow corrupted by occlusions for their estimation. We show that providing occlusion masks as an additional input to flow estimation improves the standard performance metric by more than 25% on both KITTI and Sintel. As a second contribution, a novel method for incorporating information from past frames into flow estimation is introduced. The previous frame flow serves as an input to occlusion estimation and as a prior in occluded regions, i.e. those without visual correspondences. By continually using the previous frame flow, ContinualFlow performance improves further by 18% on KITTI and 7% on Sintel, achieving top performance on both benchmarks.
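In ContinualFlow the fusion of the correspondence-based flow with the previous-frame prior is learned inside the network; as a purely illustrative sketch (with hypothetical names and a hard switch instead of a learned blend), the role of the prior in occluded regions can be shown like this:

```python
import numpy as np

def fuse_flow(matched_flow, prev_flow, occl_mask):
    """Illustration only: in occluded pixels (no visual correspondence)
    fall back on the previous-frame flow prior; keep the
    correspondence-based flow everywhere else."""
    return np.where(occl_mask[None].astype(bool), prev_flow, matched_flow)

# Usage: 2-channel (u, v) flow on a 1x2 image; the second pixel is occluded.
matched = np.array([[[1.0, 9.0]], [[2.0, 9.0]]])   # shape (2, H=1, W=2)
prior   = np.array([[[0.0, 3.0]], [[0.0, 4.0]]])
occl    = np.array([[0, 1]])                       # shape (H, W)
fused = fuse_flow(matched, prior, occl)
```

The occluded pixel takes its (u, v) from the prior, the visible pixel from the matched flow; the real network instead receives both signals as inputs and learns the combination.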
2013
Robust abandoned object detection integrating wide area visual surveillance and social context
James Ferryman, David Hogg, Jan Sochman, Ardhendu Behera, and 9 more authors
This paper presents a video surveillance framework that robustly and efficiently detects abandoned objects in surveillance scenes. The framework is based on a novel threat assessment algorithm which combines the concept of ownership with automatic understanding of social relations in order to infer abandonment of objects. Implementation is achieved through development of a logic-based inference engine in Prolog. Threat detection performance is evaluated against a range of datasets describing realistic situations and demonstrates a reduction in the number of false alarms generated. The proposed system represents the approach employed in the EU SUBITO project (Surveillance of Unattended Baggage and the Identification and Tracking of the Owner).
2010
Interpreting Structures in Man-made Scenes - Combining Low-Level and High-Level Structure Sources
Kasim Terzic, Lothar Hotz, and Jan Sochman
In Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, Oct 2010
Computation time is an important performance characteristic of computer vision algorithms. The paper shows how existing (slow) binary decision algorithms can be approximated by a (fast) trained WaldBoost classifier. WaldBoost learning minimises the decision time of the classifier while guaranteeing predefined precision. We show that the WaldBoost algorithm together with bootstrapping is able to efficiently handle an effectively unlimited number of training examples provided by the implementation of the approximated algorithm.
Two interest point detectors, the Hessian-Laplace and the Kadir-Brady saliency detectors, are emulated to demonstrate the approach. Experiments show that while the repeatability and matching scores are similar for the original and emulated algorithms, a 9-fold speed-up for the Hessian-Laplace detector and a 142-fold speed-up for the Kadir-Brady detector is achieved. For the Hessian-Laplace detector, the achieved speed is similar to SURF, a popular and very fast handcrafted modification of Hessian-Laplace; the WaldBoost emulator approximates the output of the Hessian-Laplace detector more precisely.
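WaldBoost's speed comes from Wald's sequential probability ratio test: the boosted sum is compared against two thresholds after every weak classifier, and evaluation stops as soon as a decision is statistically safe. A minimal sketch of that evaluation loop (threshold names and values are illustrative, not from the paper):

```python
def waldboost_eval(weak_responses, theta_A, theta_B):
    """SPRT-style sequential evaluation sketch: accumulate weak-classifier
    responses and decide as soon as the running sum leaves (theta_B, theta_A).
    Returns (+1 / -1 decision, number of weak classifiers evaluated)."""
    H = 0.0
    for t, h in enumerate(weak_responses, start=1):
        H += h
        if H >= theta_A:   # enough evidence for the positive class: accept early
            return +1, t
        if H <= theta_B:   # enough evidence for the negative class: reject early
            return -1, t
    # all weak classifiers used: fall back to thresholding the full sum at 0
    return (+1 if H > 0 else -1), len(weak_responses)

# Usage: a clearly negative sample is rejected after only two cheap evaluations.
decision, cost = waldboost_eval([-1.0, -0.8, 0.2, 0.1], theta_A=1.5, theta_B=-1.5)
```

Because most background samples exit early, the average number of weak classifiers evaluated — and hence the decision time — stays far below the full classifier length, which is what enables the reported speed-ups.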
2008
Training Sequential On-line Boosting Classifier for Visual Tracking
H. Grabner, J. Šochman, H. Bischof, and J. Matas
In 19th International Conference on Pattern Recognition, Jun 2008
On-line boosting makes it possible to adapt a trained classifier to changing environmental conditions or to use sequentially available training data. Yet, two important problems in on-line boosting training remain unsolved: (i) classifier evaluation speed optimization and (ii) automatic classifier complexity estimation. In this paper we show how on-line boosting can be combined with Wald's sequential decision theory to solve both problems.
The properties of the proposed on-line WaldBoost algorithm are demonstrated on a visual tracking problem. The complexity of the classifier changes dynamically depending on the difficulty of the problem. On average, a speed-up of a factor of 5-10 is achieved compared to non-sequential on-line boosting.
2007
Learning A Fast Emulator of a Binary Decision Process
Computation time is an important performance characteristic of computer vision algorithms. This paper shows how existing (slow) binary-valued decision algorithms can be approximated by a trained WaldBoost classifier, which minimises the decision time while guaranteeing predefined approximation precision. The core idea is to take an existing algorithm as a black box performing some useful binary decision task and to train the WaldBoost classifier as its emulator.
Two interest point detectors, the Hessian-Laplace and Kadir-Brady saliency detectors, are emulated to demonstrate the approach. The experiments show similar repeatability and matching scores for the original and emulated algorithms while achieving a 70-fold speed-up for the Kadir-Brady detector.
2005
WaldBoost - Learning for Time Constrained Sequential Detection
Jan Šochman, and Jiří Matas
In Proc. of Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2005