The Savior of High-Resolution Anomaly Detection!? A Deep Dive into the Memory-Efficient Tiled Ensemble Approach
In recent years, the demand for automated detection of defects and anomalies has been rising across a wide range of industries, especially in manufacturing. To ensure even the tiniest anomalies aren’t overlooked, it’s crucial to process images in high resolution. However, working directly with high-resolution images poses a major challenge: massive GPU memory consumption makes it difficult to apply conventional anomaly detection methods in practical settings.
The paper we're introducing today, “Divide and Conquer: High-Resolution Industrial Anomaly Detection via Memory Efficient Tiled Ensemble” (CVPRW 2024), proposes a practical solution to this problem.
Paper Overview
- Title: Divide and Conquer: High-Resolution Industrial Anomaly Detection via Memory Efficient Tiled Ensemble
- Authors: Blaž Rolih (University of Ljubljana), Dick Ameln, Ashwin Vaidya, Samet Akcay (Intel)
- Published: 2024 (CVPR 2024 Workshops, VAND)
- Main Paper: https://openaccess.thecvf.com/content/CVPR2024W/VAND/papers/Rolih_Divide_and_Conquer_High-Resolution_Industrial_Anomaly_Detection_via_Memory_Efficient_CVPRW_2024_paper.pdf
- Supplement: https://openaccess.thecvf.com/content/CVPR2024W/VAND/supplemental/Rolih_Divide_and_Conquer_CVPRW_2024_supplemental.pdf
This paper introduces a new approach called the Tiled Ensemble, which splits an image into small tiles and assigns an independent model to each tile position. This preserves the high resolution of the image while keeping GPU memory usage at roughly the level needed to process a single tile.
Core Idea: Divide-and-Conquer Meets Ensemble Learning
The Tiled Ensemble approach is built on four main steps:
- Tile the Image: The high-resolution input image is split into overlapping small tiles. These overlaps enhance the benefits of ensemble learning, as we’ll explain later.
- Train Independent Models: Each tile position gets its own dedicated model, trained independently. Each model learns features specific to its corresponding tile location.
- Inference: At inference time, the image is split the same way, and each tile is processed by its corresponding trained model to detect anomalies.
- Merge Results: The anomaly maps and scores from all tiles are merged (typically via averaging) to produce the final full-image anomaly map and score.
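To make the four steps concrete, here is a minimal pure-Python sketch of tiling and merging. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`tile_origins`, `crop`, `tiled_anomaly_map`) are hypothetical, each "model" is just a stand-in callable that maps a tile to a same-sized anomaly map, and overlapping predictions are merged by a plain per-pixel average.

```python
from typing import Callable, Dict, List, Tuple

Grid = List[List[float]]          # a 2D image or anomaly map
Position = Tuple[int, int]        # (top, left) corner of a tile
TileModel = Callable[[Grid], Grid]  # hypothetical stand-in for a trained model


def tile_origins(size: int, tile: int, stride: int) -> List[int]:
    """Tile offsets along one axis; assumes the tiles cover the axis exactly."""
    if tile > size or (size - tile) % stride != 0:
        raise ValueError("tile and stride must cover the axis exactly")
    return list(range(0, size - tile + 1, stride))


def crop(image: Grid, top: int, left: int, tile: int) -> Grid:
    """Cut one square tile out of the image."""
    return [row[left:left + tile] for row in image[top:top + tile]]


def tiled_anomaly_map(image: Grid, tile: int, stride: int,
                      models: Dict[Position, TileModel]) -> Grid:
    """Run each tile through the model for its position, then average overlaps."""
    height, width = len(image), len(image[0])
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for top in tile_origins(height, tile, stride):
        for left in tile_origins(width, tile, stride):
            tile_map = models[(top, left)](crop(image, top, left, tile))
            for dy in range(tile):
                for dx in range(tile):
                    total[top + dy][left + dx] += tile_map[dy][dx]
                    count[top + dy][left + dx] += 1
    # Every pixel is covered at least once, so count is never zero.
    return [[total[y][x] / count[y][x] for x in range(width)]
            for y in range(height)]


# Usage: two overlapping 2x2 tiles on a 2x3 image, with constant dummy models.
image = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
models = {
    (0, 0): lambda t: [[0.0, 0.0], [0.0, 0.0]],
    (0, 1): lambda t: [[1.0, 1.0], [1.0, 1.0]],
}
merged = tiled_anomaly_map(image, tile=2, stride=1, models=models)
print(merged)  # [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]
```

The middle column is seen by both models, so its value is the mean of the two predictions. This is the ensemble effect that the overlap is meant to provide.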
A major strength of this method is that it is model-agnostic: it can be applied to many popular anomaly detection models, such as PaDiM, PatchCore, FastFlow, and Reverse Distillation, without modifying their core architectures.
Why Is This Impressive? Comparison with Conventional Methods
Traditional approaches to high-resolution anomaly detection have suffered from two key issues:
- Memory Bottlenecks: Processing full-resolution images directly often exceeds GPU memory limits.
- Loss of Detail: Downsampling to reduce memory usage can cause small defects to be missed entirely.
The Tiled Ensemble addresses these challenges with:
- Memory Efficiency: By processing one tile at a time, the GPU only needs enough memory for a single tile. This sharply reduces peak memory usage and makes high-resolution anomaly detection feasible.
- High Accuracy: The image keeps its full resolution, and in the regions where tiles overlap, several models contribute predictions. This ensemble effect improves precision and robustness.
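A quick back-of-the-envelope calculation shows the trade-off behind these two points. The sketch below is an assumption-laden illustration: the function names are hypothetical, the image and tile sizes are example values rather than settings taken from the paper, and the number of input pixels per forward pass is used only as a rough proxy for peak GPU memory.

```python
from typing import Tuple


def tiles_per_axis(size: int, tile: int, stride: int) -> int:
    """Number of tiles along one axis, assuming the tiles cover it exactly."""
    if tile > size or (size - tile) % stride != 0:
        raise ValueError("tile and stride must cover the axis exactly")
    return (size - tile) // stride + 1


def tile_budget(height: int, width: int, tile: int, stride: int) -> Tuple[int, float]:
    """Return (number of tiles, fraction of image pixels seen per forward pass)."""
    n_tiles = tiles_per_axis(height, tile, stride) * tiles_per_axis(width, tile, stride)
    pixel_ratio = (tile * tile) / (height * width)
    return n_tiles, pixel_ratio


# Example values only: a 1024x1024 image, 256x256 tiles, 50% overlap.
print(tile_budget(1024, 1024, tile=256, stride=128))  # (49, 0.0625)
```

In this example each forward pass sees about 6% of the pixels of the full image, but the ensemble consists of 49 tile positions, each with its own model. Peak memory falls while the number of models and forward passes rises, which is the source of the latency cost discussed later.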
Real-World Results: Tested on Major Datasets
The authors validated the effectiveness of their method on two widely used industrial anomaly detection datasets:
- MVTec AD: Includes many large, obvious anomalies.
- VisA: Contains tiny, hard-to-detect defects, which makes it well suited to testing high-resolution detection.
Key findings:
- On the VisA dataset, the Tiled Ensemble showed significant improvement in detecting small anomalies.
- It provided consistent performance gains across various baseline models.
- GPU memory usage was comparable to low-resolution single-model setups, making it practical for real-world deployment.
Discussion: Limitations and Future Directions
While the Tiled Ensemble shows great promise, some limitations remain:
- Increased Latency: Since inference is done with multiple models, the overall processing time may increase. However, batch inference can help mitigate this issue.
- Loss of Global Context: Processing tiles independently can make it harder to capture global structures or contextual relationships across the full image.
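One way to read the batch-inference remark in the latency point is to loop over tile positions first and images second, so that each tile model is loaded once per batch instead of once per image. The sketch below shows that loop order only. It is an assumption about how batching could be organized, not a description of the paper's code, and `run_position_major` and `load_model` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Position = Tuple[int, int]
Tile = List[List[float]]
# Hypothetical batched model: takes a list of tiles, returns one score per tile.
BatchModel = Callable[[List[Tile]], List[float]]


def run_position_major(
    tiles_per_image: List[Dict[Position, Tile]],
    load_model: Callable[[Position], BatchModel],
) -> List[Dict[Position, float]]:
    """Score every tile of every image, loading each tile model only once."""
    results: List[Dict[Position, float]] = [{} for _ in tiles_per_image]
    positions = sorted(tiles_per_image[0]) if tiles_per_image else []
    for pos in positions:
        model = load_model(pos)  # one load per position for the whole batch
        batch = [tiles[pos] for tiles in tiles_per_image]
        for result, score in zip(results, model(batch)):
            result[pos] = score
    return results


# Usage with a dummy loader that counts how often a model is loaded.
load_calls: List[Position] = []


def dummy_loader(pos: Position) -> BatchModel:
    load_calls.append(pos)
    # Stand-in "model": the score of a tile is its maximum pixel value.
    return lambda batch: [max(max(row) for row in tile) for tile in batch]


images = [
    {(0, 0): [[0.1, 0.2]], (0, 2): [[0.3, 0.9]]},
    {(0, 0): [[0.5, 0.4]], (0, 2): [[0.0, 0.1]]},
    {(0, 0): [[0.7, 0.2]], (0, 2): [[0.6, 0.6]]},
]
scores = run_position_major(images, dummy_loader)
print(len(load_calls))  # 2 loads for 3 images, instead of 6
```

This does not reduce the number of forward passes per image, but it does amortize the cost of loading or swapping models across the batch.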
The paper suggests future work may include:
- Optimizing feature extraction per tile
- Designing hybrid methods that reintegrate global context
- Applying the method to logical anomaly detection tasks that rely on scene-wide understanding
Conclusion
“Divide and Conquer: High-Resolution Industrial Anomaly Detection via Memory Efficient Tiled Ensemble” offers a practical solution for detecting fine-grained anomalies in high-resolution images without overwhelming GPU memory.
Its compatibility with existing models and its results on MVTec AD and VisA make it a promising technique for industrial computer vision. It could play an important role in the next generation of anomaly detection systems, especially in domains where high-resolution imaging is a must.