We propose a dense image prediction out-of-distribution detection algorithm, called PixOOD, which requires neither training on samples of anomalous data nor a design tailored to a specific application, thus avoiding traditional training biases. To model the complex intra-class variability of the in-distribution data at the pixel level, we propose an online data condensation algorithm which is more robust than standard K-means and is easily trainable through SGD. We evaluate PixOOD on a wide range of problems; it achieves state-of-the-art results on four out of seven datasets and is competitive on the rest.
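The condensation algorithm itself is specified in the paper; purely as an illustration of the general idea of SGD-trained prototype condensation, the minimal Python sketch below implements a generic online K-means-style update. The function name, prototype count and learning rate are assumptions, not PixOOD's.

import numpy as np

def sgd_condense(features, k=8, lr=0.05, epochs=3, seed=0):
    """Illustrative online condensation: K prototypes updated by SGD
    towards their nearest feature vectors (an online K-means variant).
    This is NOT the PixOOD condensation algorithm, only a generic stand-in."""
    rng = np.random.default_rng(seed)
    protos = features[rng.choice(len(features), k, replace=False)].copy()
    for _ in range(epochs):
        for x in features[rng.permutation(len(features))]:
            d = np.linalg.norm(protos - x, axis=1)   # distance to each prototype
            j = int(np.argmin(d))                    # hard assignment
            protos[j] += lr * (x - protos[j])        # SGD step on 0.5*||x - p_j||^2
    return protos

# toy usage: condense 2-D "pixel embeddings" into 4 prototypes
feats = np.random.default_rng(1).normal(size=(500, 2))
print(sgd_condense(feats, k=4).shape)   # (4, 2)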
@misc{vojir2024pixood,title={PixOOD: Pixel-Level Out-of-Distribution Detection},author={Voj{\'\i}\v{r}, Tom\'a\v{s} and \v{S}ochman, Jan and Matas, Ji\v{r}{\'\i}},booktitle={Computer Vision – ECCV 2024},year={2024},eprint={2405.19882},archiveprefix={arXiv},primaryclass={cs.CV},}
2023
Calibrated Out-of-Distribution Detection with a Generic Representation
Tomáš Vojíř, Jan Šochman, Rahaf Aljundi, and Jiří Matas
In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Oct 2023
Out-of-distribution detection is a common issue when deploying vision models in practice and solving it is an essential building block in safety-critical applications. Existing OOD detection solutions focus on improving the OOD robustness of a classification model trained exclusively on in-distribution (ID) data. In this work, we take a different approach and propose to leverage generic pre-trained representations. We first investigate the behaviour of simple classifiers built on top of such representations and show striking performance gains compared to ID-trained representations. We propose a novel OOD method, called GROOD, that achieves excellent performance, predicated on the use of a good generic representation. Only a trivial training process is required for adapting GROOD to a particular problem. The method is simple, general, efficient, calibrated, and has only a few hyper-parameters. It achieves state-of-the-art performance on a number of OOD benchmarks, reaching near-perfect performance on several of them. The source code is publicly available.
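As a rough, non-authoritative illustration of the recipe the abstract describes (a simple classifier on frozen generic features with a distance-based, calibrated OOD score), the sketch below uses a nearest-class-mean classifier; the scoring rule, the threshold calibration and all names are assumptions, not the actual GROOD method.

import numpy as np

class NearestMeanOOD:
    """Nearest-class-mean classifier over frozen generic features.
    OOD score = distance to the closest class mean. This mirrors the
    'simple classifier on a generic representation' idea in spirit only."""
    def fit(self, feats, labels):
        self.means = np.stack([feats[labels == c].mean(0) for c in np.unique(labels)])
        return self
    def score(self, feats):
        d = np.linalg.norm(feats[:, None, :] - self.means[None], axis=2)
        return d.min(axis=1)                       # higher = more likely OOD
    def calibrate(self, id_val_feats, q=0.95):
        self.thr = np.quantile(self.score(id_val_feats), q)  # keep 95% of ID below threshold
        return self.thr

# toy usage with random 'features' standing in for a frozen backbone output
rng = np.random.default_rng(0)
f_tr, y_tr = rng.normal(size=(200, 16)), rng.integers(0, 3, 200)
det = NearestMeanOOD().fit(f_tr, y_tr)
det.calibrate(rng.normal(size=(100, 16)))
print(det.score(rng.normal(3.0, 1.0, size=(5, 16))) > det.thr)  # far-away samples are flagged OOD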
@inproceedings{vojir2023calibrated,author={Voj{\'\i}\v{r}, Tom\'a\v{s} and \v{S}ochman, Jan and Aljundi, Rahaf and Matas, Ji\v{r}{\'\i}},title={Calibrated Out-of-Distribution Detection with a Generic Representation},booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},month=oct,year={2023},pages={4507-4516},doi={10.1109/ICCVW60793.2023.00485},}
2021
Monocular Arbitrary Moving Object Discovery and Segmentation
Michal Neoral, Jan Šochman, and Jiří Matas
In The 32nd British Machine Vision Conference – BMVC 2021, Oct 2021
We propose a method for discovery and segmentation of objects that are, or whose parts are, independently moving in the scene. Given three monocular video frames, the method outputs semantically meaningful regions, i.e. regions corresponding to the whole object, even when only a part of it moves.
The architecture of the CNN-based end-to-end method, called Raptor, combines semantic and motion backbones, which pass their outputs to a final region segmentation network. The semantic backbone is trained in a class-agnostic manner in order to generalise to object classes beyond the training data. The core of the motion branch is a geometrical cost volume computed from optical flow, optical expansion, mono-depth and the estimated camera motion.
Evaluation of the proposed architecture on the instance motion segmentation and binary moving-static segmentation problems on KITTI, DAVIS-Moving and YTVOS-Moving datasets shows that the proposed method achieves state-of-the-art results on all the datasets and is able to generalise well to various environments. For the KITTI dataset, we provide an upgraded instance motion segmentation annotation which covers all moving objects. Dataset, code and models are available on the github project page github.com/michalneoral/Raptor.
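The actual Raptor network is available on the project page; the skeleton below only illustrates, in simplified form, the two-branch design described above (semantic features and a motion cost volume fused by a segmentation head). Channel sizes and layers are placeholders, not Raptor's.

import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    """Skeletal illustration: a semantic feature map and a geometric/motion
    cost volume are concatenated and decoded into per-pixel moving-object
    logits. The decoder here is a placeholder, not the Raptor architecture."""
    def __init__(self, sem_ch=32, motion_ch=16):
        super().__init__()
        self.decode = nn.Sequential(
            nn.Conv2d(sem_ch + motion_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1))                     # 1-channel moving/static logits
    def forward(self, sem_feats, motion_cost):
        x = torch.cat([sem_feats, motion_cost], dim=1)   # fuse the two branches
        return self.decode(x)

# toy usage on an 8x8 feature grid
net = TwoBranchFusion()
print(net(torch.randn(1, 32, 8, 8), torch.randn(1, 16, 8, 8)).shape)  # torch.Size([1, 1, 8, 8])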
@inproceedings{Neoral2021,author={Neoral, Michal and {\v{S}}ochman, Jan and Matas, Ji{\v{r}}{\'i}},title={Monocular Arbitrary Moving Object Discovery and Segmentation},booktitle={The 32nd British Machine Vision Conference -- BMVC 2021},year={2021},}
2018
Continual Occlusions and Optical Flow Estimation
Michal Neoral, Jan Šochman, and Jiří Matas
In Asian Conference on Computer Vision (ACCV), 2018
Two optical flow estimation problems are addressed: (i) occlusion estimation and handling, and (ii) estimation from image sequences longer than two frames. The proposed ContinualFlow method estimates occlusions before flow, avoiding the use of flow corrupted by occlusions for their estimation. We show that providing occlusion masks as an additional input to flow estimation improves the standard performance metric by more than 25% on both KITTI and Sintel. As a second contribution, a novel method for incorporating information from past frames into flow estimation is introduced. The previous frame flow serves as an input to occlusion estimation and as a prior in occluded regions, i.e. those without visual correspondences. By continually using the previous frame flow, ContinualFlow performance improves further by 18% on KITTI and 7% on Sintel, achieving top performance on KITTI and Sintel.
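A schematic Python skeleton of the data flow described above (occlusions estimated before flow, with the previous-frame flow fed to both stages); the functions are placeholders, not the ContinualFlow network.

def estimate_occlusions(frame_prev, frame_cur, flow_prev):
    """Placeholder: occlusions are estimated BEFORE flow, with the
    previous-frame flow as an additional input."""
    ...

def estimate_flow(frame_prev, frame_cur, occlusion_mask, flow_prev):
    """Placeholder: flow estimation receives the occlusion mask as input and
    uses the previous flow as a prior in occluded (correspondence-free) regions."""
    ...

def continual_flow(frames):
    flow_prev, flows = None, []
    for f_prev, f_cur in zip(frames, frames[1:]):
        occ = estimate_occlusions(f_prev, f_cur, flow_prev)   # occlusions first
        flow_prev = estimate_flow(f_prev, f_cur, occ, flow_prev)
        flows.append(flow_prev)                               # carried to the next frame pair
    return flows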
@inproceedings{Neoral2018,author={Neoral, Michal and {\v S}ochman, Jan and Matas, Jiri},title={Continual Occlusions and Optical Flow Estimation},booktitle={Asian Conference on Computer Vision},pages={159--174},year={2018},doi={10.1007/978-3-030-20870-7_10},organization={Springer},}
2013
Robust abandoned object detection integrating wide area visual surveillance and social context
James Ferryman, David Hogg, Jan Sochman, Ardhendu Behera, and 9 more authors
This paper presents a video surveillance framework that robustly and efficiently detects abandoned objects in surveillance scenes. The framework is based on a novel threat assessment algorithm which combines the concept of ownership with automatic understanding of social relations in order to infer abandonment of objects. Implementation is achieved through the development of a logic-based inference engine based on Prolog. Threat detection performance is evaluated by testing against a range of datasets describing realistic situations and demonstrates a reduction in the number of false alarms generated. The proposed system represents the approach employed in the EU SUBITO project (Surveillance of Unattended Baggage and the Identification and Tracking of the Owner).
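The system itself encodes such reasoning as Prolog rules; the toy Python function below is only meant to convey the flavour of an ownership/social-context abandonment rule. The predicates and thresholds are invented for illustration.

def is_abandoned(obj, people, social_link, max_dist=3.0, min_time=30.0):
    """Toy illustration of an ownership + social-context abandonment rule
    (the real system expresses such rules in a Prolog inference engine).
    obj: dict with 'owner', 'position', 'unattended_time';
    people: dict name -> position; social_link(a, b) -> bool."""
    def near(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5 <= max_dist
    owner_near = obj['owner'] in people and near(people[obj['owner']], obj['position'])
    # someone socially related to the owner counts as attending the object
    delegate_near = any(social_link(obj['owner'], n) and near(p, obj['position'])
                        for n, p in people.items())
    return (not owner_near) and (not delegate_near) and obj['unattended_time'] > min_time

# toy usage
obj = {'owner': 'alice', 'position': (0.0, 0.0), 'unattended_time': 45.0}
people = {'bob': (1.0, 0.5)}
print(is_abandoned(obj, people, social_link=lambda a, b: False))   # True -> raise alert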
@article{Ferryman2013789,title={Robust abandoned object detection integrating wide area visual surveillance and social context},journal={Pattern Recognition Letters},volume={34},number={7},pages={789--798},year={2013},note={Scene Understanding and Behaviour Analysis},issn={0167-8655},doi={10.1016/j.patrec.2013.01.018},url={http://www.sciencedirect.com/science/article/pii/S0167865513000226},author={Ferryman, James and Hogg, David and Sochman, Jan and Behera, Ardhendu and Rodriguez-Serrano, José A. and Worgan, Simon and Li, Longzhen and Leung, Valerie and Evans, Murray and Cornic, Philippe and Herbin, Stéphane and Schlenger, Stefan and Dose, Michael},keywords={Abandoned objects},}
2010
Interpreting Structures in Man-made Scenes - Combining Low-Level and High-Level Structure Sources
Kasim Terzic, Lothar Hotz, and Jan Sochman
In Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, Oct 2010
@inproceedings{Terzic2010,author={Terzic, Kasim and Hotz, Lothar and Sochman, Jan},title={Interpreting Structures in Man-made Scenes - Combining Low-Level and High-Level Structure Sources},year={2010},pages={357--364},booktitle={Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},publisher={SciTePress},organization={INSTICC},doi={10.5220/0002735303570364},isbn={978-989-674-021-4},issn={2184-433X},}
2009
Learning Fast Emulators of Binary Decision Processes
Jan Šochman, and Jiří Matas
International Journal of Computer Vision, Jun 2009
Computation time is an important performance characteristic of computer vision algorithms. The paper shows how existing (slow) binary decision algorithms can be approximated by a (fast) trained WaldBoost classifier. WaldBoost learning minimises the decision time of the classifier while guaranteeing predefined precision. We show that the WaldBoost algorithm together with bootstrapping is able to efficiently handle an effectively unlimited number of training examples provided by the implementation of the approximated algorithm.
Two interest point detectors, the Hessian-Laplace and the Kadir-Brady saliency detectors, are emulated to demonstrate the approach. Experiments show that while the repeatability and matching scores are similar for the original and emulated algorithms, a 9-fold speed-up for the Hessian-Laplace detector and a 142-fold speed-up for the Kadir-Brady detector is achieved. For the Hessian-Laplace detector, the achieved speed is similar to SURF, a popular and very fast handcrafted modification of Hessian-Laplace; the WaldBoost emulator approximates the output of the Hessian-Laplace detector more precisely.
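Schematically, and with stand-in components rather than the actual WaldBoost trainer, the emulation-with-bootstrapping loop described above can be sketched as follows; the black-box detector, the trainer stub and all parameters are assumptions made for illustration.

import random

def fit_waldboost(pairs):
    """Stand-in for WaldBoost training: a trivial majority-vote stub,
    NOT a real boosting implementation."""
    majority = sum(y for _, y in pairs) * 2 >= len(pairs)
    return lambda x: int(majority)

def train_emulator(black_box, pool, rounds=5, batch=100):
    """Schematic emulation + bootstrapping loop: the slow black-box detector
    provides labels on demand, and each round adds the samples the current
    (fast) emulator still misclassifies before retraining it."""
    train_set, emulator = [], None
    for _ in range(rounds):
        labeled = [(x, black_box(x)) for x in random.sample(pool, batch)]
        if emulator is not None:                              # bootstrap on hard samples
            labeled = [(x, y) for x, y in labeled if emulator(x) != y]
        train_set.extend(labeled)
        emulator = fit_waldboost(train_set)                   # retrain the fast classifier
    return emulator

# toy usage: emulate a 'slow' threshold detector on scalar inputs
pool = [random.uniform(-1, 1) for _ in range(1000)]
fast = train_emulator(lambda x: int(x > 0), pool)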
@article{Sochman2009,author={{\v S}ochman, Jan and Matas, Ji{\v r}{\'\i}},title={Learning Fast Emulators of Binary Decision Processes},journal={International Journal of Computer Vision},year={2009},volume={83},pages={149--163},number={2},month=jun,doi={10.1007/s11263-009-0229-x},keywords={Boosting, AdaBoost, Sequential probability ratio test, Sequential decision making, WaldBoost, Interest point detectors, Machine learning},}
2008
Training Sequential On-line Boosting Classifier for Visual Tracking
H. Grabner, J. Šochman, H. Bischof, and J. Matas
In 19th International Conference on Pattern Recognition, Jun 2008
On-line boosting makes it possible to adapt a trained classifier to changing environmental conditions or to use sequentially available training data. Yet, two important problems in on-line boosting training remain unsolved: (i) classifier evaluation speed optimization and (ii) automatic classifier complexity estimation. In this paper we show how on-line boosting can be combined with Wald's sequential decision theory to solve both problems.
The properties of the proposed on-line WaldBoost algorithm are demonstrated on a visual tracking problem. The complexity of the classifier changes dynamically depending on the difficulty of the problem. On average, a speed-up by a factor of 5-10 is achieved compared to non-sequential on-line boosting.
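A minimal sketch of the sequential, early-terminating evaluation that the combination with Wald's decision theory provides; the per-stage thresholds and weak-classifier responses are placeholders, not values from the paper.

def sequential_evaluate(weak_responses, theta_a, theta_b):
    """Evaluate boosted weak classifiers one by one and stop as soon as the
    accumulated response crosses a per-stage threshold. Returns the decision
    (+1/-1) and how many weak classifiers were actually used; on easy samples
    only a few stages are evaluated, which is where the reported 5-10x
    speed-up over non-sequential evaluation comes from."""
    H = 0.0
    for t, h in enumerate(weak_responses, start=1):
        H += h                                 # running strong-classifier response
        if H >= theta_a[t - 1]:
            return +1, t                       # confident positive: stop early
        if H <= theta_b[t - 1]:
            return -1, t                       # confident negative: stop early
    return (+1 if H > 0 else -1), len(weak_responses)   # forced decision at the end

# toy usage: decided positive after two of the three weak classifiers
print(sequential_evaluate([0.4, 0.5, 0.3], theta_a=[1.0, 0.8, 0.6], theta_b=[-1.0, -0.8, -0.6]))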
@inproceedings{Grabner2008,author={Grabner, H. and {\v S}ochman, J. and Bischof, H. and Matas, J.},title={Training Sequential On-line Boosting Classifier for Visual Tracking},booktitle={19th International Conference on Pattern Recognition},year={2008},doi={10.1109/ICPR.2008.4761678},}
2007
Learning A Fast Emulator of a Binary Decision Process
Jan Šochman, and Jiří Matas
In Asian Conference on Computer Vision (ACCV), 2007
Computation time is an important performance characteristic of computer vision algorithms. This paper shows how existing (slow) binary-valued decision algorithms can be approximated by a trained WaldBoost classifier, which minimises the decision time while guaranteeing a predefined approximation precision. The core idea is to take an existing algorithm as a black box performing some useful binary decision task and to train the WaldBoost classifier as its emulator.
Two interest point detectors, the Hessian-Laplace and the Kadir-Brady saliency detectors, are emulated to demonstrate the approach. The experiments show similar repeatability and matching scores for the original and emulated algorithms while achieving a 70-fold speed-up for the Kadir-Brady detector.
@inproceedings{Sochman-accv2007,author={{\v S}ochman, Jan and Matas, Ji{\v r}{\' \i}},title={Learning A Fast Emulator of a Binary Decision Process},booktitle={ACCV},year={2007},editor={Yagi, Yasushi and Kang, Sing Bing and Kweon, In So and Zha, Hongbin},volume={II},pages={236--245},address={Berlin Heidelberg},publisher={Springer},series={LNCS},isbn={978-3-540-76389-5},doi={10.1007/978-3-540-76390-1_24},}
2005
WaldBoost - Learning for Time Constrained Sequential Detection
Jan Šochman, and Jiří Matas
In Proc. of Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2005
@inproceedings{sochman-waldboost-cvpr05,author={{\v S}ochman, Jan and Matas, Ji{\v r}{\' \i}},title={WaldBoost - Learning for Time Constrained Sequential Detection},booktitle={Proc. of Conference on Computer Vision and Pattern Recognition (CVPR)},address={Los Alamitos, USA},year={2005},month=jun,day={20--25},isbn={0-7695-2372-2},publisher={IEEE Computer Society},book_pages={1219},pages={150--157},doi={10.1109/CVPR.2005.373},annote={In many computer vision classification problems, both the error and the time characterize the quality of a decision. We show that such problems can be formalized in the framework of sequential decision-making. If the false positive and false negative error rates are given, the optimal strategy in terms of the shortest average time to decision (number of measurements used) is Wald's sequential probability ratio test (SPRT). We build on the optimal SPRT test and enlarge its capabilities to problems with dependent measurements. We show how the limitations of SPRT to a priori ordered measurements and known joint probability density functions can be overcome. We propose an algorithm with a near-optimal time vs. error rate trade-off, called WaldBoost, which integrates the AdaBoost algorithm for measurement selection and ordering and the joint probability density estimation with the optimal SPRT decision strategy. The WaldBoost algorithm is tested on the face detection problem. The results are superior to the state-of-the-art methods in average evaluation time and comparable in detection rates.},keywords={Adaboost, cascade, Wald's SPRT, sequential analysis, face detection},editor={Schmid, Cordelia and Soatto, Stefano and Tomasi, Carlo},venue={San Diego, California, USA},volume={2},}
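For reference, the classical Wald SPRT that the method builds on can be written down directly; the sketch below implements the standard test for a target false positive rate alpha and false negative rate beta. How WaldBoost estimates the likelihood ratio from the boosted classifier response is the paper's contribution and is not reproduced here.

import math

def sprt(loglr_stream, alpha=0.01, beta=0.05):
    """Classical Wald SPRT: accumulate the log-likelihood ratio
    log p(x_t | H1) - log p(x_t | H0) measurement by measurement and stop as
    soon as it leaves the interval (log B, log A), with the Wald thresholds
    A ~= (1 - beta) / alpha and B ~= beta / (1 - alpha)."""
    a, b = math.log((1 - beta) / alpha), math.log(beta / (1 - alpha))
    s, t = 0.0, 0
    for t, llr in enumerate(loglr_stream, start=1):
        s += llr
        if s >= a:
            return 'accept H1', t      # e.g. "face" in the detection setting
        if s <= b:
            return 'accept H0', t      # e.g. "background"
    return 'undecided', t              # ran out of measurements without a decision

# toy usage: with the default error rates the threshold is log(95) ~= 4.55,
# so a stream of 1.5-valued log-likelihood ratios is accepted after 4 measurements
print(sprt([1.5, 1.5, 1.5, 1.5]))      # ('accept H1', 4)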