Computer vision is a scientific discipline that allows machines to see. In this article, we’ll discuss why they need it and how it is useful.
It’s a branch of technology that allows computers to “see” and understand what they are “seeing”. A lot of devices have cameras; that’s nothing new. However, simply capturing the picture is not enough. The ability to analyze and perceive the environment is what this field of study is actively working on.
Let’s look at a few examples you are probably already familiar with, starting with QR codes. As you point a device’s camera at the image, it is decoded and the operation is completed. There are three stages to this process: the camera finds something that holds the information; the device processes this data; the device operates according to the received information.
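The three stages can be sketched in code. Here is a minimal Python illustration that uses text between brackets as a stand-in for a real QR payload; every name and format in it is hypothetical, invented for the sketch:

```python
# Toy three-stage pipeline: capture, decode, act.
# Brackets stand in for a QR code; this is not a real decoder.

def capture(frame):
    """Stage 1: locate the region of the frame that holds the information."""
    start, end = frame.find("["), frame.find("]")
    if start == -1 or end == -1 or end < start:
        return None
    return frame[start + 1:end]

def decode(payload):
    """Stage 2: process the raw data into a usable command."""
    return payload.strip().lower()

def act(command):
    """Stage 3: operate according to the received information."""
    actions = {"open_url": "opening browser", "pay": "starting payment"}
    return actions.get(command, "unknown command")

payload = capture("noise [ PAY ] noise")
if payload is not None:
    print(act(decode(payload)))  # starting payment
```

The point of the separation is that each stage can fail or be swapped out independently: a better detector, a different symbology, a different action.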
Since everything holds some kind of information, the range of applications is incredibly wide: machine vision, autonomous vehicles, healthcare, the military, surveillance, and much more. All of these areas already use computer vision to some extent, and some are impossible without it. As the technology gets more refined, there will be even more ways for computer vision to change our lives.
This emergent technology is closely connected with many adjacent fields of study, among them artificial intelligence, neurobiology, solid-state physics, and information engineering. But how does the process actually work? As mentioned before, it boils down to perceiving, understanding, and executing. Let’s tackle them one at a time.
A person sees when light passes through the cornea and focuses on the retina, which converts it into neuronal signals. That’s where the connection with neuroscience comes in. As is the case with AI, the human body often inspires innovation. Human vision has provided the framework and understanding upon which experts could replicate the same process with hardware. Instead of the eye, there is a camera, and instead of neuronal signals, there is data. This data is then sent to the “brains” of the computer to be processed and “understood”. Sometimes, however, computer processing of stored images and video sequences is also considered part of computer vision. In that case, the data is already contained in the piece of media and doesn’t require a camera to be pointed at it.
What makes computer vision difficult is that the human-body analogy starts to fall apart. While we broadly understand how our nervous system works, there are still areas scientists aren’t certain about. Because of that, replicating it on a computer is extremely challenging. While general artificial intelligence is still mostly a sci-fi scenario, some individual tasks can be replicated on their own, and the image processing associated with computer vision is one of them.
How the data interpretation process works depends on the objectives of the program. For certain operations, it is enough to measure the distance between two indicators, while others require a complex assessment of the scene: the number of objects present, the overall busyness, and multiple other factors. For that purpose, artificial intelligence is utilized in most cases. More complex tasks employ more complex AI techniques, such as deep learning. Say a program needs to determine whether or not an image contains a cat. The computer then uses deep learning to learn what constitutes a “cat” and analyzes the image with that knowledge.
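As a rough illustration of learning from examples rather than hand-written rules, here is a toy nearest-neighbour classifier over made-up feature vectors. A real cat detector learns its features with a deep neural network; the features and labels below are entirely invented for the sketch:

```python
import math

# Toy "learning from examples" sketch: classify by the nearest labelled
# example. The three feature values are hypothetical (say, fur-ness,
# ear-ness, wheel-ness); a real system would learn features itself.

training = [
    ([0.9, 0.8, 0.1], "cat"),
    ([0.8, 0.9, 0.0], "cat"),
    ([0.1, 0.0, 0.9], "car"),
    ([0.0, 0.1, 0.8], "car"),
]

def classify(features):
    """Return the label of the closest training example."""
    _, label = min((math.dist(features, f), lbl) for f, lbl in training)
    return label
```

The label comes out of the data, not out of an explicit “what is a cat” rule, which is the essential idea deep learning scales up.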
Well, that’s up to the programmer to decide. For instance, if a CV sensor catches a certain type of object in its field of vision, it launches a certain algorithm. A perfect example is a parking camera on a car that alerts you when something is close to your vehicle. As you can imagine, the range of operations goes well beyond that; this is one of the simpler applications. Even so, it has its nuances. For example, there has been an issue with raindrops: the computer recognized them as objects the car was about to collide with. Speaking of rain and cars, the sensors that automatically turn on windshield wipers are also possible thanks to computer vision.
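The parking-alert logic can be sketched as a simple threshold check. This is a toy under stated assumptions, not a real automotive system; the minimum-size check stands in for the kind of filtering that keeps raindrops from triggering alerts:

```python
# Toy parking-sensor rule: alert when a detected object is both close
# enough and large enough. Values are hypothetical metres.

WARN_DISTANCE = 1.0     # alert when an object is within this distance
MIN_OBJECT_SIZE = 0.05  # ignore smaller detections (e.g. a raindrop)

def should_alert(detections):
    """detections: list of (distance_m, apparent_size_m) tuples."""
    for distance, size in detections:
        if size >= MIN_OBJECT_SIZE and distance <= WARN_DISTANCE:
            return True
    return False
```

A raindrop on the lens may look very close, but the size filter keeps it from counting as an obstacle.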
The topic of computer vision is incomplete without covering at least some of its typical tasks. They help explain the processes that make this technology function and provide insight into how it can be applied across various areas.
Recognition is often the first thing that comes to mind when discussing computer vision. This operation determines whether or not the data received through the camera contains a certain type of object, activity, or feature. Some of the general and specific tasks classified as part of recognition are:
Object recognition/classification. Technology that finds certain predetermined objects in an image or a video sequence. There are many different methods, from appearance- and feature-based approaches to context awareness and artificial neural networks. Identification is a special case of object recognition, and via instance segmentation a computer delineates each distinct identifiable object in an image.
Detection. This process involves scanning for a specific condition. In most cases, it is used for comparatively simple computations (“is there any movement? Y/N”), but more complex ones are becoming increasingly widespread.
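A minimal sketch of such a Y/N movement check is frame differencing: compare two frames pixel by pixel and flag movement when enough pixels change. The frames and thresholds below are toy values chosen for illustration:

```python
# Toy movement detection by frame differencing. Frames are small
# grayscale grids (lists of rows, pixel values 0-255).

PIXEL_THRESHOLD = 25     # how much a pixel must change to count
CHANGED_FRACTION = 0.01  # fraction of changed pixels meaning "movement"

def has_movement(prev, curr):
    """Return True if enough pixels differ between the two frames."""
    changed = sum(
        1
        for row_a, row_b in zip(prev, curr)
        for a, b in zip(row_a, row_b)
        if abs(a - b) > PIXEL_THRESHOLD
    )
    total = sum(len(row) for row in curr)
    return changed / total >= CHANGED_FRACTION
```

Real detectors add background modelling and noise suppression, but this is the core comparison.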
Shape Recognition Technology (SRT) separates a certain type of object in the image from the others. For example, it can identify people, vehicles, or any other predefined “shape”. This technology is considered part of the digital pattern recognition process.
Pose estimation is the estimation of the alignment, position, or orientation of an object. It is especially helpful for automated assembly lines, where machine arms have to interact with objects on the line, and for similar tasks.
Facial Recognition is quite self-explanatory. It is a process that scans an individual’s face and compares it against an existing database. It is often used for security purposes, ranging from identifying criminals to unlocking your own phone. It has other applications too, for example, face detection on various cameras that aids with focusing.
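The database-comparison step can be sketched as a nearest-neighbour lookup over feature vectors. In a real system these embeddings would be produced by a trained network; the names and numbers below are invented for illustration:

```python
import math

# Toy face lookup: each enrolled person is a hypothetical embedding
# vector; a probe matches the closest entry within a tolerance.

database = {
    "alice": [0.1, 0.9, 0.3],
    "bob":   [0.8, 0.2, 0.5],
}

def identify(embedding, threshold=0.5):
    """Return the closest enrolled name, or None if nothing is close."""
    name, d = min(
        ((n, math.dist(embedding, v)) for n, v in database.items()),
        key=lambda item: item[1],
    )
    return name if d <= threshold else None
```

The threshold is what separates “recognized” from “unknown face”, and tuning it trades false accepts against false rejects.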
Optical character recognition (OCR) identifies text characters and symbols. This is frequently applied for accessibility technology as well as camera-based translation apps. It’s also useful for the conversion of printed text into an editable format.
2D code reading is used for technologies such as QR codes and other data matrices. Computer vision is necessary to decode the information behind these 2D images.
Content-based image retrieval makes operations such as “show me all images containing X” possible. Because it understands the content of images, this process is especially helpful when the number of images in a database is large. One example is asking a security system to show all CCTV footage containing people between 2 and 4 AM, and similar queries.
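Such a query can be sketched as a filter over an indexed archive. In practice the content labels would be produced by a recognition model; in this toy example they are given by hand:

```python
# Toy content-based retrieval: clips are pre-labelled with what they
# contain, and a query filters by label and time window.

footage = [
    {"time": "01:30", "labels": {"car"}},
    {"time": "02:15", "labels": {"person", "car"}},
    {"time": "03:40", "labels": {"person"}},
    {"time": "05:00", "labels": {"person"}},
]

def query(archive, label, start, end):
    """Return timestamps of clips containing `label` within [start, end)."""
    return [
        clip["time"]
        for clip in archive
        if label in clip["labels"] and start <= clip["time"] < end
    ]

print(query(footage, "person", "02:00", "04:00"))  # ['02:15', '03:40']
```

The heavy lifting in a real system is producing the labels; once the content is indexed, retrieval itself is a plain database query.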
Another important component of computer vision is the set of processes involving motion. The computations required are more complex than those for static objects, but they are absolutely necessary to make CV a viable and practically applicable technology. Some of the tasks involved are as follows:
Egomotion refers to the process of estimating the camera’s own movement through 3D space. This process is extremely useful for vehicle-based CV applications as well as autonomous robot navigation.
Optical flow is largely the other side of the egomotion coin. It’s the process of estimating the movement of objects or specific points in 3D space relative to the camera.
Tracking allows the computer to detect and follow the movement of predetermined points of interest in an image sequence. If the sequence is captured at regular intervals, tracking can also estimate speed.
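The speed estimate follows directly from displacement per interval. A small sketch, assuming a tracked point’s (x, y) positions sampled at a fixed time step:

```python
import math

# Average speed of a tracked point from positions sampled at a regular
# interval. Units are whatever the coordinates use, per second.

def estimate_speed(positions, interval_s):
    """positions: list of (x, y) samples; interval_s: seconds between them."""
    total = sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))
    elapsed = interval_s * (len(positions) - 1)
    return total / elapsed

# A point moving 3 units every 0.5 s:
# estimate_speed([(0, 0), (3, 0), (6, 0)], 0.5) -> 6.0
```

With pixel coordinates this gives pixels per second; converting to real-world speed additionally requires knowing the camera geometry.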
Computer vision can also build 3D models from a video, an image, or a sequence of them. The projection can be anything from a set of 3D points to complete surface models. This technology is becoming increasingly sophisticated, allowing 3D images to be stitched together. Grid-based 3D sensing is the process responsible for such advancement: identifying a surface grid allows the camera to perform the degree of depth perception this practice requires.
This CV task fills in corrupted or blank spots, removes noise, and otherwise helps restore an image’s original state. Techniques range from low-pass and median filters to context-aware methods and AI, and they help with all sorts of identification. For example, if footage is blurry but you need to read a license plate, or old family photos deteriorated by time require inpainting, image restoration is the technology you need.
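The median filter mentioned above is simple enough to sketch directly: each interior pixel is replaced by the median of its 3×3 neighbourhood, which wipes out isolated noisy pixels while keeping edges better than simple averaging:

```python
from statistics import median

# Toy 3x3 median filter for denoising. The image is a list of rows of
# grayscale values; border pixels are left unchanged for simplicity.

def median_filter(image):
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [
                image[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = median(window)
    return out
```

A single “salt” pixel of value 255 in a flat region disappears entirely, since eight of its nine neighbours agree on the true value.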
All of that barely scratches the surface of computer vision technology. Each area of application is an entirely different field to cover, and the technologies involved also vary, so the topic cannot be exhausted in one article. However, this should be a solid starting point for understanding computer vision.