3D vision imaging is one of the most important means of information perception for industrial robots and can be divided into optical and non-optical imaging methods. The most widely used optical methods at present include the time-of-flight method, the structured light method, the laser scanning method, the moiré fringe method, the laser speckle method, interferometry, photogrammetry, the laser tracking method, shape from motion, shape from shading, and other shape-from-X methods. This paper introduces several typical schemes.
1. Time of flight 3D imaging
Each pixel of a time-of-flight (TOF) camera measures the flight time of light to obtain the depth of the object.
	
In the classical TOF measurement method, the detector system starts a timer in its detection and receiving unit at the moment the optical pulse is emitted. When the detector receives the optical echo from the target, it directly stores the round-trip time.
	
This approach, known as direct TOF (D-TOF), is commonly used in single-point ranging systems, where scanning is usually required to achieve area-wide 3D imaging.
Scanning-free TOF 3D imaging was not realized until recent years, because subnanosecond electronic timing is very difficult to implement at the pixel level.
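A minimal sketch of the direct TOF relation, d = c·t/2 (the pulse travels to the target and back). The same relation shows why pixel-level timing is demanding: one nanosecond of timing error corresponds to about 15 cm of depth. The function name is illustrative.

```python
# Direct TOF (D-TOF): distance from a directly timed optical round trip.
C = 299_792_458.0  # speed of light in vacuum, m/s


def dtof_distance(round_trip_s: float) -> float:
    """One-way distance in metres for a measured round-trip time in seconds.

    The pulse travels to the target and back, so the distance is half of c * t.
    """
    return C * round_trip_s / 2.0


if __name__ == "__main__":
    # A 20 ns round trip corresponds to roughly 3 m.
    print(f"{dtof_distance(20e-9):.3f} m")
    # Depth step implied by the resolution of each pixel's timer.
    for step_s in (1e-9, 100e-12, 10e-12):
        print(f"timer step {step_s * 1e12:6.0f} ps -> depth step "
              f"{dtof_distance(step_s) * 1e3:7.2f} mm")
```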
The alternative to directly timed D-TOF is indirect TOF (I-TOF), in which the round-trip time is inferred indirectly from time-gated measurements of light intensity. I-TOF does not require precise timing; instead it uses time-gated photon counters or charge integrators, which can be implemented at the pixel level. I-TOF is the solution currently commercialized in TOF cameras based on electronic and optical mixers.
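The text does not specify a gating scheme, so the sketch below assumes one common textbook variant: a pulsed I-TOF pixel with two back-to-back integration gates. The echo delay splits the returned charge between the gates, and the charge ratio gives the delay without any fast timer. The gate timing, parameter names, and ambient-light correction are illustrative assumptions; continuous-wave I-TOF cameras instead estimate the phase of modulated light, which is not shown.

```python
# Two-gate pulsed indirect TOF (I-TOF): depth from the ratio of gated charges.
C = 299_792_458.0  # speed of light in vacuum, m/s


def gated_itof_distance(q1: float, q2: float, pulse_s: float,
                        q_ambient: float = 0.0) -> float:
    """Distance from two gated charge measurements of one pixel.

    Assumed (hypothetical) timing: a rectangular pulse of width pulse_s is
    emitted at t = 0; gate 1 integrates during [0, pulse_s] and gate 2 during
    [pulse_s, 2 * pulse_s]. An echo delayed by tau (0 <= tau <= pulse_s)
    deposits charge proportional to (pulse_s - tau) in gate 1 and to tau in
    gate 2. q_ambient is the charge one gate collects with the emitter off.
    """
    s1 = q1 - q_ambient
    s2 = q2 - q_ambient
    total = s1 + s2
    if total <= 0:
        raise ValueError("no echo signal above the ambient level")
    tau = pulse_s * s2 / total   # round-trip delay recovered from the charge ratio
    return C * tau / 2.0         # one-way distance


if __name__ == "__main__":
    # With a 50 ns pulse the unambiguous range is c * 50 ns / 2, about 7.5 m.
    print(f"{gated_itof_distance(q1=750.0, q2=250.0, pulse_s=50e-9):.3f} m")
```

Because the depth comes from a ratio, the pixel only needs to integrate charge during fixed gates, which is what makes pixel-level implementation practical.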
TOF imaging suits large-field-of-view, long-distance, low-cost 3D image acquisition where high precision is not required. Its advantages are fast detection, a large field of view, a long working distance, and low cost; its drawbacks are low accuracy and susceptibility to ambient-light interference.
	
2. Scanning 3D imaging
Scanning 3D imaging methods can be divided into scanning ranging, active triangulation, the dispersion confocal method, and others. Strictly speaking, the dispersion confocal method is a scanning ranging method, but because it is currently widely used in manufacturing industries such as mobile phones and flat panel displays, it is introduced separately here.
1. Scanning ranging
Scanning ranging uses a collimated beam to scan the entire target surface, taking a one-dimensional distance measurement at each point to achieve 3D measurement. Typical scanning ranging methods are:
(1) single-point time of flight, such as frequency-modulated continuous-wave (FM-CW) ranging and pulsed ranging (LiDAR);
(2) laser scattering interferometry, such as interferometers based on multi-wavelength interference, holographic interference, white-light interference, or speckle interference;
(3) confocal methods, such as dispersion confocal and self-focusing methods.
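FM-CW ranging, listed above, can be illustrated with a short sketch. It assumes a linear sawtooth chirp of bandwidth B and duration T: mixing the echo with the transmitted chirp produces a beat frequency f_b = 2RB/(cT), so R = c·f_b·T/(2B). The chirp parameters and function names are hypothetical, and the simulated beat tone is ideal and noise-free.

```python
import numpy as np

C = 299_792_458.0  # speed of light in vacuum, m/s


def fmcw_range(beat_hz, bandwidth_hz, chirp_s):
    """Range from the beat frequency of a linear (sawtooth) FM-CW chirp.

    The chirp slope is B / T. An echo from range R is delayed by 2R / c, so
    mixing it with the transmitted chirp gives a beat at f_b = (B / T) * 2R / c.
    """
    return C * beat_hz * chirp_s / (2.0 * bandwidth_hz)


def simulate_and_estimate(r_true, bandwidth_hz=150e6, chirp_s=1e-3, fs=2e6):
    """Synthesise an ideal beat tone for range r_true and recover R from its FFT peak."""
    f_beat = (bandwidth_hz / chirp_s) * 2.0 * r_true / C
    n = round(chirp_s * fs)                   # samples in one chirp
    t = np.arange(n) / fs
    beat = np.cos(2 * np.pi * f_beat * t)
    spectrum = np.abs(np.fft.rfft(beat * np.hanning(n)))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return fmcw_range(freqs[np.argmax(spectrum)], bandwidth_hz, chirp_s)


if __name__ == "__main__":
    print(f"{simulate_and_estimate(12.0):.3f} m")
    # FFT bins are 1/T apart, which maps to a range step of c / (2B).
    print(f"range resolution c / (2B): {C / (2 * 150e6):.3f} m")
```

The FFT bin spacing is 1/T, so the range resolution of this simple peak search is c/(2B), independent of the chirp duration; a wider sweep gives finer ranging.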
Among single-point ranging scanning methods, single-point TOF suits long-distance scanning but has low measurement accuracy, generally on the order of millimeters. Other single-point scanning methods, namely single-point laser interferometry, the confocal method, and single-point active laser triangulation, achieve higher accuracy, although laser interferometry places high demands on the measurement environment. Line scanning offers moderate accuracy and high efficiency. Active laser triangulation and the dispersion confocal method are better suited to 3D measurement at the end of a robotic arm.
2. Active triangulation
Active triangulation is based on the principle of triangulation: a collimated beam, or one or more planar light beams, scans the target surface to complete the 3D measurement.
The beam is usually obtained by laser collimation, by angular beam expansion with a cylindrical or quadric cylindrical lens, by projecting incoherent light (such as white light or an LED source) through a pinhole or slit (grating), or by diffraction of coherent light.
Active triangulation can be divided into three types: single-point scanning, single-line scanning, and multi-line scanning. Most commercial products currently used at the end of robotic arms are single-point and single-line scanners.
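For a single-line scanner, each camera row sees one point of the laser line; locating the line's center in that row and applying triangulation gives the point's depth. The sketch below assumes a simplified geometry in which the laser plane is parallel to the camera's optical axis at baseline b, so a point at depth Z images at column u = cx + f·b/Z (real sensors tilt the plane and use a full calibration). It finds the stripe center with an intensity-weighted centroid, one common choice. The function names and calibration values are hypothetical.

```python
import numpy as np


def stripe_centers(image):
    """Sub-pixel column of the laser stripe in each image row.

    Uses the intensity-weighted centroid after subtracting each row's
    minimum as a crude background estimate. Rows with no signal give NaN.
    """
    img = np.asarray(image, dtype=float)
    img = img - img.min(axis=1, keepdims=True)
    cols = np.arange(img.shape[1])
    weight = img.sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return (img * cols).sum(axis=1) / weight


def depth_from_column(u, focal_px, baseline_m, cx_px):
    """Simplified triangulation for a laser plane parallel to the optical axis.

    A surface point on the plane X = baseline_m images at u = cx + f * b / Z,
    so Z = f * b / (u - cx).
    """
    return focal_px * baseline_m / (np.asarray(u) - cx_px)


if __name__ == "__main__":
    f, b, cx = 800.0, 0.05, 320.0            # hypothetical calibration
    z_true = np.linspace(0.4, 0.6, 100)      # one depth per image row
    u_true = cx + f * b / z_true             # where the stripe should appear
    cols = np.arange(640)
    frame = np.exp(-0.5 * ((cols[None, :] - u_true[:, None]) / 1.5) ** 2)
    z = depth_from_column(stripe_centers(frame), f, b, cx)
    print(f"max depth error: {np.max(np.abs(z - z_true)):.2e} m")
```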
	
In multi-line scanning, it is difficult to identify the fringe order (stripe index) reliably. To identify stripe indices accurately, two sets of mutually perpendicular light planes are usually imaged alternately at high speed, which also enables "Flying Triangulation" scanning. The scanning and three-dimensional reconstruction process is shown in the following figure. Multi-line stroboscopic projection imaging produces a sparse 3D view, and scanning with longitudinal and transverse fringe projections produces a sequence of such 3D views. These views are then registered by 3D matching into a complete, dense, high-resolution 3D surface model.
	
3. Dispersion confocal method
The dispersion confocal method can scan and measure rough and smooth, opaque and transparent objects, such as reflective mirrors and transparent glass surfaces, and is currently widely used for 3D inspection of mobile phone cover glass.
There are three types of dispersion confocal scanning: single-point one-dimensional absolute ranging scanning, multi-point array scanning, and continuous line scanning. The following figure shows examples of absolute ranging and continuous line scanning. Continuous line scanning is itself a form of array scanning, but with more points packed more densely.
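In a dispersion confocal sensor the objective has deliberate chromatic aberration, so each wavelength focuses at a different distance; only the wavelength in focus on the surface passes the confocal aperture efficiently, and the peak wavelength of the returned spectrum therefore encodes height. A minimal sketch of the readout for one measuring point, assuming a uniformly sampled spectrometer and a wavelength-to-height calibration table. The two-point linear table below is hypothetical; real calibration curves are measured and nonlinear, which a denser table passed to the same interpolation would handle.

```python
import numpy as np


def spectral_peak_nm(wavelengths_nm, intensity):
    """Sub-sample peak wavelength of the spectrometer signal.

    Fits a parabola to log(intensity) at the maximum and its two neighbours,
    which is exact for a Gaussian-shaped peak. Assumes uniform wavelength
    sampling and a peak away from the edges of the spectrum.
    """
    y = np.log(np.maximum(np.asarray(intensity, dtype=float), 1e-12))
    i = int(np.clip(np.argmax(y), 1, len(y) - 2))
    y0, y1, y2 = y[i - 1], y[i], y[i + 1]
    denom = y0 - 2.0 * y1 + y2
    offset = 0.5 * (y0 - y2) / denom if denom != 0 else 0.0
    step = wavelengths_nm[1] - wavelengths_nm[0]
    return wavelengths_nm[i] + offset * step


def height_um(peak_nm, calib_nm, calib_um):
    """Height from a wavelength-to-height calibration table (linear interpolation)."""
    return float(np.interp(peak_nm, calib_nm, calib_um))


if __name__ == "__main__":
    lam = np.arange(450.0, 651.0, 1.0)       # spectrometer axis, 450..650 nm
    calib_nm = np.array([450.0, 650.0])      # hypothetical calibration:
    calib_um = np.array([0.0, 400.0])        # 450 nm -> 0 um, 650 nm -> 400 um
    spectrum = np.exp(-0.5 * ((lam - 532.3) / 3.0) ** 2)  # synthetic confocal peak
    peak = spectral_peak_nm(lam, spectrum)
    print(f"peak {peak:.2f} nm -> height {height_um(peak, calib_nm, calib_um):.1f} um")
```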
Among commercial products, a well-known scanning spectral confocal sensor is the STIL MPLS180 from France, which uses 180 points arranged in a line with a maximum line length of 4.039 mm (measuring spot 11.5 µm, point-to-point spacing 22.5 µm). Another product is the FocalSpec UULA from Finland, which uses a dispersion confocal triangulation technique.
	
3. 3D imaging with structured light projection
Structured light projection 3D imaging is currently the main approach to 3D visual perception for robots. A structured light imaging system consists of one or more projectors and cameras; typical configurations are single projector with single camera, single projector with two cameras, single projector with multiple cameras, single camera with two projectors, and single camera with multiple projectors.
The basic working principle of structured light projection 3D imaging is that the projector casts a specific structured light pattern onto the target object, the camera captures the image modulated by the target, and the 3D information of the target is then obtained through image processing and a visual model.
Commonly used projectors are mainly of the following types: liquid crystal display (LCD) projection, digital light processing (DLP) projection based on devices such as the digital micromirror device (DMD), and direct projection of laser or LED patterns.
According to the number of projected patterns, structured light projection 3D imaging can be divided into single-projection and multiple-projection methods.
1. Single projection imaging
Single-projection structured light is mainly realized through spatial multiplexing coding and frequency multiplexing coding. Common coding forms are color coding, grayscale indexing, geometric shape coding, and random speckle.
In robot hand-eye systems, for tasks that do not require high 3D measurement accuracy, such as palletizing, depalletizing, and 3D grasping, a popular approach is to project pseudo-random speckle to obtain the 3D information of the target. The 3D imaging principle is shown in the following figure.
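The matching step at the heart of speckle-based 3D imaging can be sketched as window correlation: because a pseudo-random pattern is locally unique, a small window of the captured image can be located in a stored reference image (or in a second camera's image) by zero-mean normalized cross-correlation (ZNCC). The resulting horizontal shift (disparity) is then converted to depth through the calibrated triangulation model, which is omitted here. The function names, window size, search range, and shift direction are illustrative assumptions; production systems typically use faster and more robust matchers.

```python
import numpy as np


def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized windows."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def speckle_disparity(reference, captured, row, col, half=7, max_disp=40):
    """Shift d such that captured[row, col] best matches reference[row, col + d].

    Brute-force search over d = 0..max_disp along the same row. Assumes the
    images are rectified so the speckle shifts only horizontally, toward
    larger column indices, and that all windows stay inside the image.
    """
    win = captured[row - half:row + half + 1, col - half:col + half + 1]
    scores = []
    for d in range(max_disp + 1):
        c = col + d
        ref_win = reference[row - half:row + half + 1, c - half:c + half + 1]
        scores.append(zncc(win, ref_win))
    return int(np.argmax(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = (rng.random((100, 200)) > 0.7).astype(float)  # pseudo-random speckle
    img = np.roll(ref, -12, axis=1)                      # scene shifts it by 12 px
    print(speckle_disparity(ref, img, row=50, col=60))
```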
	
2. Multiple projection imaging
The multiple-projection 3D method is mainly implemented by time multiplexing coding. Commonly used pattern coding forms are binary coding, multi-frequency phase-shift coding [35], and hybrid coding (such as Gray code plus phase-shift fringes).
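Binary time-multiplexed coding is commonly implemented with reflected Gray code: n projected stripe patterns label 2^n projector columns, one bit per pattern. A minimal sketch of pattern generation and decoding, assuming the camera images have already been thresholded to 0/1 per pixel (in practice often by comparing each pattern with its inverse). The function names are hypothetical.

```python
import numpy as np


def gray_code_patterns(width, n_bits):
    """One stripe pattern per bit, most significant bit first.

    Pattern k is 1 wherever bit k of the reflected Gray code of the
    projector column index is set.
    """
    cols = np.arange(width, dtype=np.int64)
    gray = cols ^ (cols >> 1)
    return [((gray >> (n_bits - 1 - k)) & 1).astype(np.uint8) for k in range(n_bits)]


def decode_gray(bits):
    """Column index from thresholded Gray-code bits (list of 0/1 arrays, MSB first)."""
    gray = np.zeros(np.shape(bits[0]), dtype=np.int64)
    for b in bits:
        gray = (gray << 1) | np.asarray(b, dtype=np.int64)
    binary = gray.copy()
    shift = gray >> 1
    while np.any(shift):      # binary = g ^ (g >> 1) ^ (g >> 2) ^ ...
        binary ^= shift
        shift = shift >> 1
    return binary


if __name__ == "__main__":
    pats = gray_code_patterns(1024, 10)   # 10 patterns label 1024 columns
    print(np.array_equal(decode_gray(pats), np.arange(1024)))
```

Gray code is preferred over plain binary because adjacent columns differ in exactly one bit, so misreading the single bit that changes at a stripe edge yields the neighbouring column rather than a distant one.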
The basic principle of fringe projection 3D imaging is shown in the figure below. Structured light patterns, generated by a computer or by a special optical device, are projected onto the surface of the measured object through an optical projection system, and an image acquisition device (such as a CCD or CMOS camera) captures the deformed structured light images modulated by the object surface. An image processing algorithm then computes the correspondence between each pixel in the image and a point on the object surface. Finally, the 3D profile of the measured object is calculated using the system model and calibration.
In practical applications, Gray code projection, sinusoidal phase-shift fringe projection, or hybrid Gray code plus sinusoidal phase-shift projection is commonly used.
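For the sinusoidal phase-shift part, an N-step algorithm recovers the wrapped phase from N fringe images I_n = A + B·cos(φ + 2πn/N) as φ = atan2(-Σ I_n sin(2πn/N), Σ I_n cos(2πn/N)), valid for N ≥ 3. In the hybrid scheme, the Gray code supplies the fringe order k, giving the absolute phase Φ = φ + 2πk. A minimal sketch, assuming the Gray-code stripe boundaries coincide exactly with fringe periods (real systems need extra care at those boundaries); the period, intensities, and function names are synthetic or hypothetical.

```python
import numpy as np


def wrapped_phase(images):
    """N-step phase shift: images[n] = A + B*cos(phi + 2*pi*n/N), N >= 3.

    Returns the wrapped phase phi in [0, 2*pi) for every pixel.
    """
    imgs = np.asarray(images, dtype=float)
    n = imgs.shape[0]
    delta = 2 * np.pi * np.arange(n) / n
    shape = (n,) + (1,) * (imgs.ndim - 1)   # broadcast the shifts over pixels
    s = np.sum(imgs * np.sin(delta).reshape(shape), axis=0)
    c = np.sum(imgs * np.cos(delta).reshape(shape), axis=0)
    return np.mod(np.arctan2(-s, c), 2 * np.pi)


def absolute_phase(phi, fringe_order):
    """Unwrap using the fringe order k (for example, decoded from Gray code)."""
    return phi + 2 * np.pi * fringe_order


if __name__ == "__main__":
    period = 32.0                          # fringe period in projector pixels
    x = np.arange(640) + 0.5               # projector coordinate seen by each pixel
    phase = 2 * np.pi * x / period
    frames = [100 + 50 * np.cos(phase + 2 * np.pi * n / 4) for n in range(4)]
    order = np.floor(x / period)           # stands in for the decoded Gray code
    x_rec = absolute_phase(wrapped_phase(frames), order) * period / (2 * np.pi)
    print(f"max coordinate error: {np.max(np.abs(x_rec - x)):.2e} px")
```

The recovered projector coordinate gives the camera-projector correspondence, which the calibrated system model then turns into 3D points.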
	
3. Deflection imaging
For rough surfaces, structured light can be projected directly onto the object surface for visual imaging measurement. For highly reflective smooth surfaces and mirror-like objects, however, structured light cannot be projected directly onto the measured surface, and 3D measurement instead requires mirror deflectometry, as shown in the following figure.
	
In this scheme, the fringes are not projected onto the measured surface but onto a diffusing screen, or an LCD screen replaces the diffusing screen and displays the fringes directly. The camera observes the fringes along the light path reflected by the specular surface, obtains fringe information modulated by the curvature variation of that surface, and then solves for the 3D profile.
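The text does not say how the 3D profile is solved. In many deflectometry systems, the fringe phase seen through the specular surface is first converted, via a calibrated geometry model that is not shown here, into surface slope maps, and height is then obtained by integrating the slopes. The sketch below assumes the slopes p = ∂z/∂x and q = ∂z/∂y are already available and integrates them with the Frankot-Chellappa Fourier method, one standard choice; it is exact for periodic, band-limited surfaces and only approximate otherwise. The synthetic surface is illustrative.

```python
import numpy as np


def frankot_chellappa(p, q, dx=1.0, dy=1.0):
    """Integrate slope maps p = dz/dx (along columns) and q = dz/dy (along rows).

    Projects the gradient field onto the nearest integrable surface in the
    Fourier domain. Assumes periodic boundaries; the mean height cannot be
    recovered from slopes and is set to zero.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    rows, cols = p.shape
    wx = 2 * np.pi * np.fft.fftfreq(cols, d=dx)   # angular frequency along x
    wy = 2 * np.pi * np.fft.fftfreq(rows, d=dy)   # angular frequency along y
    WX, WY = np.meshgrid(wx, wy)                  # both shaped (rows, cols)
    denom = WX ** 2 + WY ** 2
    denom[0, 0] = 1.0                             # avoid 0/0 at the DC term
    Z = (-1j * WX * np.fft.fft2(p) - 1j * WY * np.fft.fft2(q)) / denom
    Z[0, 0] = 0.0                                 # zero-mean height
    return np.real(np.fft.ifft2(Z))


if __name__ == "__main__":
    n = 64
    x = np.arange(n) / n
    X, Y = np.meshgrid(x, x)
    z = np.sin(2 * np.pi * X) * np.cos(2 * np.pi * Y)          # synthetic surface
    p = 2 * np.pi * np.cos(2 * np.pi * X) * np.cos(2 * np.pi * Y)
    q = -2 * np.pi * np.sin(2 * np.pi * X) * np.sin(2 * np.pi * Y)
    z_rec = frankot_chellappa(p, q, dx=1 / n, dy=1 / n)
    print(f"max error: {np.max(np.abs(z_rec - z)):.2e}")
```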
4. Stereo vision 3D imaging
Stereo vision literally refers to perceiving three-dimensional structure with one or both eyes; it generally refers to reconstructing the 3D structure or depth information of a target object from two or more images taken from different viewpoints.
Visual cues for depth perception can be divided into monocular cues and binocular cues (binocular parallax). At present, stereoscopic 3D can be achieved through monocular vision, binocular vision, multi-view vision, and light-field 3D imaging (electronic compound eyes or array cameras).
1. Monocular visual imaging
Monocular depth cues usually include perspective, focus differences, multi-view imaging, occlusion, shadows, and motion parallax. In robot vision, monocular 3D imaging can also be achieved with mirrors [1] and other shape-from-X methods [10].
2. Binocular vision imaging
The visual cues for binocular depth perception are the convergence of the eyes and binocular parallax. In machine vision, two cameras capture images of the same target scene from two viewpoints, and the parallax (disparity) of each scene point between the two images is computed to obtain the 3D depth of the scene. A typical binocular stereo vision pipeline consists of four steps: image distortion correction, stereo rectification of the image pair, image matching (registration), and triangulation, in which the disparity map is reprojected to 3D.
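The final triangulation step can be sketched for a rectified pair with identical intrinsics (focal length f in pixels, principal point (cx, cy)) and baseline B: depth is Z = f·B/d, and X and Y follow by back-projecting each pixel. OpenCV provides this step as `cv2.reprojectImageTo3D`, which takes the Q matrix produced by `cv2.stereoRectify`; the function below is a hypothetical stand-alone version with illustrative parameter values.

```python
import numpy as np


def disparity_to_points(disparity, focal_px, baseline_m, cx, cy):
    """Reproject a rectified disparity map to camera-frame 3D points.

    Assumes disparity = u_left - u_right in pixels and
    Z = f * B / d, X = (u - cx) * Z / f, Y = (v - cy) * Z / f.
    Pixels with non-positive disparity are returned as NaN.
    """
    d = np.asarray(disparity, dtype=float)
    v, u = np.indices(d.shape)                 # row (v) and column (u) indices
    with np.errstate(divide="ignore", invalid="ignore"):
        z = np.where(d > 0, focal_px * baseline_m / d, np.nan)
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return np.stack([x, y, z], axis=-1)        # shape (rows, cols, 3)


if __name__ == "__main__":
    # Hypothetical rig: f = 700 px, B = 0.12 m; a 40 px disparity is 2.1 m away.
    pts = disparity_to_points(np.full((4, 5), 40.0), 700.0, 0.12, cx=2.0, cy=1.5)
    print(pts[1, 4])
```

Because Z is inversely proportional to d, a fixed one-pixel matching error causes a depth error that grows roughly with Z², which is why the baseline and focal length are chosen for the intended working distance.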
