Image recognition in video streams has developed rapidly in recent years thanks to its practical applications in areas ranging from security surveillance to virtual reality and robotics. All of these areas require a system to process huge amounts of data in near real time. High computational speed is therefore a primary requirement, alongside recognition accuracy.
Many algorithm modifications have already been proposed to achieve real-time performance, but most of them work only with low-resolution input video of approximately 608 x 608 px. As the resolution of recording devices keeps increasing, there is a critical need for tools that can process high-resolution data in real time.
Even though some algorithms can process broadcast video at a higher resolution such as 1280 x 720 px without losing accuracy, the generalization ability of such systems remains in question. You can put considerable effort into speeding up image recognition algorithms, but algorithmic modifications alone won't be enough to significantly improve computational speed.
Overcoming these challenges is only possible with high-performance hardware, in particular parallel microarchitectures. Although hardware manufacturers keep improving the latest central processing units (CPUs) with advanced vector extensions, these CPUs are still not powerful enough to process high-resolution video.
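To put the resolutions mentioned above in perspective, the short sketch below counts the pixels in a single frame and compares each format with the 608 x 608 px baseline. The 1080p, 4K, and 8K dimensions are the standard 16:9 values and the 30 fps frame rate is an illustrative assumption; neither comes from the studies discussed here.

```python
# Pixel counts per frame for several resolutions, relative to 608 x 608 px.
# The 1080p, 4K and 8K sizes and the 30 fps rate are illustrative assumptions.
RESOLUTIONS = {
    "608 x 608": (608, 608),
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
    "8K": (7680, 4320),
}
ASSUMED_FPS = 30


def pixel_load(width, height, fps=ASSUMED_FPS):
    """Return (pixels per frame, pixels per second) for one video format."""
    per_frame = width * height
    return per_frame, per_frame * fps


baseline, _ = pixel_load(*RESOLUTIONS["608 x 608"])
for name, (w, h) in RESOLUTIONS.items():
    per_frame, per_second = pixel_load(w, h)
    print(f"{name:>9}: {per_frame:>10,} px/frame, "
          f"{per_frame / baseline:5.1f}x baseline, "
          f"{per_second / 1e6:7.1f} Mpx/s at {ASSUMED_FPS} fps")
```

Under these assumptions, a 4K frame holds roughly 22 times as many pixels as a 608 x 608 px frame, and an 8K frame roughly 90 times as many, which is why per-frame cost grows so quickly with resolution.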
GPU For Image Recognition In Video
Fortunately, you can also speed up image recognition in video by using a graphics processing unit (GPU). A GPU is a specialized electronic circuit designed to rapidly manipulate and alter memory in order to accelerate the creation of images in a frame buffer intended for output to a display. The same design makes it well suited to accelerating image recognition.
Using a GPU for video processing delivers impressive results in both processing speed and image recognition accuracy. This is thanks to the GPU's highly parallel structure, which allows algorithms to process large blocks of data simultaneously. In addition, GPUs offer efficient kernels and optimized data flows.
For instance, a recent study on accelerating face detection found that a GPU implementation achieved, on average, a five-fold speedup on 1080p video over the fastest known CPU implementations, while slightly improving the accuracy of image recognition.
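Speedups like this one are usually expressed as a ratio of throughputs. The sketch below shows one way to time a per-frame function and compute that ratio. The names `measure_fps` and `speedup`, and the stand-in frame functions that simply sleep, are hypothetical and used for illustration only; they are not taken from the study.

```python
import time


def measure_fps(process_frame, frames):
    """Run process_frame over every frame and return frames per second."""
    start = time.perf_counter()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def speedup(baseline_fps, candidate_fps):
    """Return how many times faster the candidate is than the baseline."""
    return candidate_fps / baseline_fps


# Hypothetical stand-ins for real detectors: sleeping emulates latency.
def slow_detector(frame):
    time.sleep(0.005)


def fast_detector(frame):
    time.sleep(0.001)


frames = range(20)  # placeholder for decoded video frames
slow_fps = measure_fps(slow_detector, frames)
fast_fps = measure_fps(fast_detector, frames)
print(f"slow: {slow_fps:.1f} fps, fast: {fast_fps:.1f} fps, "
      f"speedup: {speedup(slow_fps, fast_fps):.1f}x")
```

When timing real GPU code, keep in mind that GPU work is often launched asynchronously, so the framework's synchronization call (for example, `torch.cuda.synchronize()` in PyTorch) should run before the clock is read.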
Researchers at Carnegie Mellon University went further and used GPUs to develop an image recognition model that enables fast and accurate object detection in high-resolution 4K and 8K video. Their method evaluates every video frame in two stages, first at a rough and then at a refined resolution, to keep the number of evaluations to a minimum. After image evaluation, they applied YOLO v2 for fast object detection. The results showed high detection accuracy with an average performance of three to six fps on 4K video and two fps on 8K video.
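The two-stage idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the researchers' implementation: the frame is split into fixed-size tiles, a rough pass marks the tiles that appear to contain something, and the refined pass runs only on those tiles. The function names, the 608 px tile size, and the `coarse_detect` and `fine_detect` callbacks (placeholders for a real detector such as YOLO v2) are all hypothetical.

```python
def split_into_tiles(width, height, tile):
    """Yield (x0, y0, x1, y1) tiles that cover a width x height frame."""
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            yield (x0, y0, min(x0 + tile, width), min(y0 + tile, height))


def overlaps(a, b):
    """Return True if boxes a and b, given as (x0, y0, x1, y1), intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def two_stage_detect(frame, frame_size, coarse_detect, fine_detect, tile=608):
    """Run a rough pass on the whole frame, then refine only active tiles.

    coarse_detect(frame) -> rough boxes in full-frame coordinates.
    fine_detect(frame, tile_box) -> detections for that full-resolution crop.
    Both callbacks are hypothetical placeholders for a real detector.
    """
    width, height = frame_size
    rough_boxes = coarse_detect(frame)
    tiles = list(split_into_tiles(width, height, tile))
    # A tile is "active" if any rough box touches it.
    active = [t for t in tiles if any(overlaps(t, b) for b in rough_boxes)]
    detections = []
    for tile_box in active:
        detections.extend(fine_detect(frame, tile_box))
    return detections, len(active), len(tiles)


# Toy usage: one rough box on a 4K frame; the detectors are stubs.
detections, active, total = two_stage_detect(
    frame=None,  # a real pipeline would pass the decoded image here
    frame_size=(3840, 2160),
    coarse_detect=lambda frame: [(1000, 500, 1100, 700)],
    fine_detect=lambda frame, tile_box: [("object", tile_box)],
)
print(f"refined {active} of {total} tiles, {len(detections)} detections")
```

In this toy run only 2 of 28 tiles reach the expensive refined pass. A real pipeline would also need to merge duplicate detections of an object that spans neighboring tiles, for example with non-maximum suppression.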
However, if you want to apply deep learning models to image recognition tasks in video, a powerful GPU alone is not enough. In addition to more computational power, you also need more network bandwidth and non-volatile storage capacity to store and transfer extraordinary amounts of data. Of course, all of this may require a significant budget. Still, you can start your deep learning experiments in video processing by using the free GPUs and other resources offered by the largest cloud providers.
Cloud Services With Free GPU
Plenty of cloud services provide integration with GPUs or even Tensor Processing Units (TPUs) for deep learning and big data workloads. Thanks to free cloud-based services, you can run TensorFlow, Keras, Caffe, PyTorch, and other Python-based code for image recognition in a video stream right now. Let's consider the following cloud services that can provide you with free computational power:
Amazon SageMaker
Amazon SageMaker is a cloud service that lets developers train models for image recognition and video processing. You can use a t2.medium notebook instance for 250 hours to build your model. The service also provides 50 hours of m4.xlarge for training your model and 125 hours of m4.xlarge for deploying it.
All of this is free of charge, but only for the first two months after sign-up. The hardware resources are a combination of CPU, GPU, memory, and networking capacity that varies with your needs. The full list of SageMaker instance types is available in the AWS documentation.
Google Colab
Colaboratory is a Google research project created to support machine learning research. When you use Google Colab for video processing, you get an NVIDIA Tesla K80 GPU with 12 GB of video memory. A session in the free Jupyter notebook environment can run for only 12 hours, but you can save your results to your Google Drive account and access them at any time.
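Before starting a long job, it can be useful to confirm which GPU the notebook session actually received. The sketch below queries the `nvidia-smi` command-line tool from Python. It assumes that `nvidia-smi` is installed in the notebook environment, which may not hold for every service or runtime type; `list_gpus` is a hypothetical helper name.

```python
import shutil
import subprocess


def list_gpus():
    """Return a list of (name, total_memory) strings reported by nvidia-smi.

    Returns an empty list when nvidia-smi is missing or fails, which is
    treated here as "no NVIDIA GPU available".
    """
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=10,
        )
    except (subprocess.SubprocessError, OSError):
        return []
    gpus = []
    for line in result.stdout.strip().splitlines():
        # Each line looks like "<GPU name>, <memory>".
        name, _, memory = line.partition(",")
        gpus.append((name.strip(), memory.strip()))
    return gpus


gpus = list_gpus()
if gpus:
    for name, memory in gpus:
        print(f"GPU: {name}, memory: {memory}")
else:
    print("No NVIDIA GPU detected; check the notebook's accelerator setting.")
```

The same check works in any Jupyter-based environment where the tool is present, so it can be reused across the services described in this section.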
Kaggle Kernels
Kaggle Kernels is another cloud service for running Jupyter notebooks in the cloud. It also supports R and Python scripts as well as RMarkdown reports. Kaggle provides the following resources:
- Four CPU cores along with 16 GB of RAM, and 1 GB of disk space.
- Two GPU cores with 14 GB of RAM. You can also add a single NVIDIA Tesla K80 to your kernel.
In addition to these resources, the service offers pre-configured, GPU-ready software and packages. You can run your kernels for up to 60 minutes for free, up from the previous 20-minute limit.
Conclusion
Image recognition in video streams is an important task in many areas of our lives, from security surveillance to computer vision applications. However, it requires substantial computational capacity to detect objects in high-resolution video accurately and in a timely manner. While GPUs have been shown to outperform CPU implementations, not all developers have the latest hardware at their disposal. Thanks to cloud services like Google Colab, Kaggle Kernels, and Amazon SageMaker, there is no need to postpone developing your own solution for image recognition in video streams.