Vision
Vision underwent major restructuring at the beginning of 2017. This page is still being updated to account for the changes.
- Infrastructure
- Colour Calibration
- Modules
- Field Edge Detection
- Goal Detection
- Robot Detection
- Ball Detection
- Field Feature Detection
- Foot Detection and Avoidance
- Code
- Fovea Implementation
- Useful Tips
The Vision module begins by retrieving both the top and bottom camera images. If one of the images is not ready, the thread will wait until they are both ready. Two primary foveas are constructed from the raw images, one per camera, to form the basis of the vision pipeline. The top camera image is scaled down from 1280x960 (native camera resolution) to 160x120 whilst the bottom camera image is scaled down from 640x480 to 80x60.
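As a quick sanity check on those numbers, both primary foveas are the same 8x downsample of their camera's native image. The snippet below is purely illustrative (the struct is not a rUNSWift type):

```cpp
// Illustrative only: the native and fovea resolutions quoted above,
// showing that both primary foveas are an 8x downsample of their camera.
#include <cassert>

struct Resolution { int width, height; };

int main() {
    const Resolution topNative    = {1280, 960};
    const Resolution topFovea     = {160, 120};
    const Resolution bottomNative = {640, 480};
    const Resolution bottomFovea  = {80, 60};

    // The downsample factor is native size divided by fovea size.
    assert(topNative.width / topFovea.width == 8);
    assert(topNative.height / topFovea.height == 8);
    assert(bottomNative.width / bottomFovea.width == 8);
    assert(bottomNative.height / bottomFovea.height == 8);
    return 0;
}
```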
A detailed description of the vision infrastructure can be found [here](http://cgi.cse.unsw.edu.au/~robocup/2014ChampionTeamPaperReports/20110825-Carl.Chatfield-VisionFoveated.pdf). This paper also includes a diagram of the infrastructure which may be useful.
The Vision system uses a static colour calibration, calibrated through offnao. See Colour Calibration for more information.
The field edge plays a vital role in the vision pipeline. The field edge is used to determine where the field starts so that we can save time when scanning the image for other features like the ball or field lines. The module sets an index for each column to indicate the pixel at the edge of the field for all other algorithms to use as a start when scanning.
The algorithm for detecting the field edge starts at the top of the image and scans down until it finds a significant patch of green. This scan is run on every column and results in a set of points across the image. RANSAC is then applied to the set of points to attempt to extract straight lines out of it. We attempt to detect up to two lines in any one image, since there can be two field edges in view at any point in time. This algorithm is run in both the top and bottom cameras independently since the field edge may run across both images.
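The sketch below illustrates these two steps under some assumptions: the green classification comes from a colour-calibrated fovea, the run length and tolerance values are placeholders, and the RANSAC shown is a minimal version rather than the actual rUNSWift implementation.

```cpp
#include <vector>
#include <cmath>
#include <cstdlib>

struct Point { int x, y; };
struct Line  { double a, b, c; };  // ax + by + c = 0

// Scan each column top-down in a colour-classified fovea and record the
// first pixel that starts a run of at least minRun green pixels.
// green[y][x] is assumed to come from the static colour calibration.
std::vector<Point> findFieldEdgePoints(const std::vector<std::vector<bool>>& green,
                                       int minRun = 3) {
    std::vector<Point> points;
    int height = green.size();
    int width  = height ? (int)green[0].size() : 0;
    for (int x = 0; x < width; ++x) {
        int run = 0;
        for (int y = 0; y < height; ++y) {
            run = green[y][x] ? run + 1 : 0;
            if (run == minRun) {
                points.push_back({x, y - minRun + 1});
                break;
            }
        }
    }
    return points;
}

// Tiny RANSAC line fit: repeatedly pick two points, count how many other
// points lie within `tolerance` of the line through them and keep the best.
// A second edge line can be found by removing the first line's inliers
// and running the same procedure again.
Line ransacLine(const std::vector<Point>& pts, int iterations, double tolerance) {
    Line best = {0.0, 1.0, 0.0};
    int bestInliers = -1;
    for (int i = 0; i < iterations && pts.size() >= 2; ++i) {
        const Point& p = pts[std::rand() % pts.size()];
        const Point& q = pts[std::rand() % pts.size()];
        if (p.x == q.x && p.y == q.y) continue;    // degenerate sample
        Line l = {double(q.y - p.y), double(p.x - q.x),
                  double(q.x) * p.y - double(p.x) * q.y};
        double norm = std::sqrt(l.a * l.a + l.b * l.b);
        int inliers = 0;
        for (const Point& r : pts)
            if (std::fabs(l.a * r.x + l.b * r.y + l.c) / norm < tolerance) ++inliers;
        if (inliers > bestInliers) { bestInliers = inliers; best = l; }
    }
    return best;
}
```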
If no field edge is detected, then we guess if the field edge is above or below the current camera view. This guess is based on the amount of green present in the image. If the image contains a large portion of green, but no distinct field edge, the field edge is assumed to be above the camera view and the entire image is treated as being "on the field". If not enough green is present, then the robot is deemed to be looking off the field or into the sky.
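A minimal sketch of this fallback, with an assumed green-fraction threshold (the real value lives in the vision code):

```cpp
enum class FieldEdgeGuess { ABOVE_IMAGE, OFF_FIELD };

// If no edge line was found, decide whether the whole image is field
// (edge above the view) or the robot is looking off the field / at the sky.
// The 0.3 threshold is a placeholder assumption.
FieldEdgeGuess guessFieldEdge(int greenPixels, int totalPixels,
                              double minGreenFraction = 0.3) {
    double fraction = double(greenPixels) / totalPixels;
    return fraction >= minGreenFraction ? FieldEdgeGuess::ABOVE_IMAGE
                                        : FieldEdgeGuess::OFF_FIELD;
}
```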
More details can be found here.
No form of goal detection is currently present in the rUNSWift system.
The 2017 Robot Detection uses the colour ROIs to find clusters of white in the top camera. A cluster is defined as any collection of regions that overlap or touch each other. The cluster's bounding box is then defined by its highest and lowest points and its furthest left and right extents.
Each cluster then undergoes some rudimentary checks on its size, the percentage of its area that is actually covered by colour ROIs, and its width-to-height ratio. Anything that is too small, does not have enough of its area covered by colour ROIs, or is not taller than it is wide is discarded. Everything else is classified as a robot.
This is a simple detector, and as such it will often pick up other objects on the field that are large and white (e.g. goal posts). It also misses robots that have fallen over or are very close, as these do not usually match the width-to-height ratio. However, because we check the percentage of white, field features do not tend to be falsely classified.
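The sketch below captures the shape of those checks; the struct and thresholds are illustrative assumptions, not the types or values used in the actual detector.

```cpp
struct BoundingBox {
    int left, right, top, bottom;       // in fovea coordinates
    int width()  const { return right - left; }
    int height() const { return bottom - top; }
    int area()   const { return width() * height(); }
};

// Accept a cluster of white regions as a robot candidate if it is big
// enough, enough of its bounding box is covered by colour ROIs, and it
// is taller than it is wide. Thresholds here are placeholders.
bool isRobotCluster(const BoundingBox& box, int roiCoveredArea,
                    int minArea = 100, double minCoverage = 0.3) {
    if (box.area() < minArea) return false;                              // too small
    if (double(roiCoveredArea) / box.area() < minCoverage) return false; // too sparse
    if (box.height() <= box.width()) return false;                       // not upright
    return true;
}
```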
Details on robot detection prior to 2017 can be found here.
Outdated
Ball Detection runs in the primary bottom camera fovea first, and if it does not detect a ball there, is run on the primary top camera fovea. If the ball cannot be detected without prior knowledge, we then attempt to search areas we expect the ball to be in using outside information, such as team mate ball locations or previous ball locations.
Ball Detection utilises colour histograms to determine points of interest that may contain a ball. It matches columns and rows that both contain orange and then examines those areas in a higher resolution fovea. The actual ball detection algorithm has two steps, finding the ball edges and then fitting a circle to those edges.
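A rough sketch of the histogram step, assuming a per-pixel orange classification from the colour calibration is available; the names and count threshold are illustrative:

```cpp
#include <algorithm>
#include <vector>

// Candidate window (in fovea coordinates) that may contain the ball.
struct Window { int xStart, xEnd, yStart, yEnd; };

// Histogram the orange pixels per column and per row, then take the span
// of columns and rows whose counts exceed a threshold. That window is
// then re-examined in a higher-resolution fovea.
Window orangeWindow(const std::vector<std::vector<bool>>& isOrange,
                    int minCount = 2) {
    int height = isOrange.size();
    int width  = height ? (int)isOrange[0].size() : 0;
    std::vector<int> colCount(width, 0), rowCount(height, 0);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (isOrange[y][x]) { ++colCount[x]; ++rowCount[y]; }

    Window w = {width, 0, height, 0};
    for (int x = 0; x < width; ++x)
        if (colCount[x] >= minCount) { w.xStart = std::min(w.xStart, x); w.xEnd = x + 1; }
    for (int y = 0; y < height; ++y)
        if (rowCount[y] >= minCount) { w.yStart = std::min(w.yStart, y); w.yEnd = y + 1; }
    if (w.xEnd <= w.xStart || w.yEnd <= w.yStart) w = {0, 0, 0, 0};  // nothing found
    return w;
}
```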
The process for finding the ball edges involves scanning around the fovea and keeping track of the strongest edges found. The scan starts at the centre of the fovea and runs outwards, radially, a number of times. If a strong edge point is detected during a scan, it is added to the overall list of edge points.
Once all the edge points are found, a RANSAC algorithm is applied to fit a circle to them. The RANSAC algorithm takes 3 points, generates a circle and tests how many other edge points lie on the circle. This process is repeated a set number of times and if a good enough match is found then the centre is calculated and stored as a detected ball.
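A minimal sketch of a three-point circle RANSAC of this kind (helper names and tolerances are assumptions): the circumcircle of three sampled edge points defines a candidate, which is scored by how many other edge points lie near its circumference.

```cpp
#include <vector>
#include <cmath>
#include <cstdlib>

struct PointF { double x, y; };
struct Circle { PointF centre; double radius; bool valid; };

// Circumcircle through three points; invalid if they are (nearly) collinear.
Circle circleFrom3Points(const PointF& a, const PointF& b, const PointF& c) {
    double d = 2.0 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
    if (std::fabs(d) < 1e-9) return {{0, 0}, 0, false};
    double a2 = a.x * a.x + a.y * a.y;
    double b2 = b.x * b.x + b.y * b.y;
    double c2 = c.x * c.x + c.y * c.y;
    PointF centre = {(a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d,
                     (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d};
    double radius = std::hypot(a.x - centre.x, a.y - centre.y);
    return {centre, radius, true};
}

// Repeatedly sample 3 edge points, build the circle through them and count
// how many other edge points lie within `tolerance` of its circumference.
// The best circle is accepted as a ball if enough points support it.
Circle ransacBall(const std::vector<PointF>& edges, int iterations,
                  double tolerance, int minInliers) {
    Circle best = {{0, 0}, 0, false};
    int bestInliers = minInliers - 1;
    for (int i = 0; i < iterations && edges.size() >= 3; ++i) {
        Circle c = circleFrom3Points(edges[std::rand() % edges.size()],
                                     edges[std::rand() % edges.size()],
                                     edges[std::rand() % edges.size()]);
        if (!c.valid) continue;
        int inliers = 0;
        for (const PointF& p : edges) {
            double dist = std::hypot(p.x - c.centre.x, p.y - c.centre.y);
            if (std::fabs(dist - c.radius) < tolerance) ++inliers;
        }
        if (inliers > bestInliers) { bestInliers = inliers; best = c; }
    }
    return best;
}
```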
Ball Detection tends to over-detect rather than under-detect balls. As a result, the algorithm works best when orange pixels are under-classified, so that the ball contains some orange but other non-ball items (such as jerseys) have little to no orange on them.
More details can be found here.
Field Feature Detection runs in both cameras, but is expensive to run so we attempt to minimise usage where possible. It will always run in the bottom camera, then it will run inside small windows in the top camera. If we still don't find enough features, it will run in the entire top camera frame, which is the most expensive, but also the most likely to detect good features.
The first attempt to run in the top camera uses a searchForFeatures function which guesses where interesting field line data exists based on the current localisation estimate. If the robot is well localised, this works quite well and the robot is able to detect features from 4-5m away and remain well localised. If the robot isn't well localised, sometimes the windows are lucky and still detect a good feature, but often we rely on searching the entire top camera when localisation is uncertain.
Field feature detection has 3 stages: finding points that might lie on a field line, fitting lines and circles to those points, and finally generating more complex shapes like corners and t-intersections.
Finding field line points involves scanning both vertically and horizontally whilst examining edge data. The algorithm searches for matching pairs of strong edges that have opposing directions and uses the midpoint as the output. The pair of strong edges represent the green-to-white and white-to-green edges on either side of the line, so the midpoint should lie on the centre of the line. There are a variety of checks on the points, including the distance between them, the colour of the midpoint, etc to ensure quality points.
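The sketch below shows that pairing for a single horizontal scan line, assuming a signed edge-strength image and a white classification are available; the thresholds are placeholders. The same idea applies to the vertical scan.

```cpp
#include <vector>

struct LinePoint { int x, y; };

// Scan one row of the fovea for opposing pairs of strong horizontal edges
// (green-to-white then white-to-green). The midpoint of a matched pair
// should sit on the centre of the field line; it is only kept if it is
// classified white. `gradient` is the signed edge strength along the row
// and `white` comes from the colour calibration. Thresholds are placeholders.
void scanRowForLinePoints(int y,
                          const std::vector<std::vector<int>>& gradient,
                          const std::vector<std::vector<bool>>& white,
                          std::vector<LinePoint>& out,
                          int edgeThreshold = 100, int maxLineWidth = 10) {
    int width = gradient[y].size();
    for (int x = 0; x < width; ++x) {
        if (gradient[y][x] <= edgeThreshold) continue;            // first strong edge
        for (int x2 = x + 1; x2 < width && x2 - x <= maxLineWidth; ++x2) {
            if (gradient[y][x2] >= -edgeThreshold) continue;      // need opposite direction
            int mid = (x + x2) / 2;
            if (white[y][mid]) out.push_back({mid, y});           // quality check
            x = x2;                                               // skip past the pair
            break;
        }
    }
}
```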
Fitting lines and circles uses a RANSAC based approach. Each RANSAC cycle picks two points at random and attempts to fit both a line and a circle through them. It then counts how many other points also fit each shape to determine which is the better fit, with the possibility that neither fits well enough. This is repeated a set number of times, with the best overall match tracked along the way. Once a cycle is complete, all the points matching the best-fitting line or circle are removed and the process starts again. All the points are first projected into the ground plane, using the kinematics of the robot, to make matching shapes easier.
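Two points determine a line but not a circle, so a sketch of this step has to fix something extra. The fragment below assumes the circle being searched for has a known radius (for example the field's centre circle), which leaves only two candidate centres for a given point pair; that assumption, and all names and tolerances, are illustrative rather than a description of the actual rUNSWift code.

```cpp
#include <cmath>
#include <vector>

struct P { double x, y; };  // ground-plane coordinates (mm), after kinematics projection

// With a known radius r, a circle through two points p and q has its centre
// on the perpendicular bisector of pq, at one of two mirrored positions.
std::vector<P> circleCentres(const P& p, const P& q, double r) {
    double dx = q.x - p.x, dy = q.y - p.y;
    double d  = std::hypot(dx, dy);
    std::vector<P> centres;
    if (d < 1e-9 || d > 2.0 * r) return centres;      // no circle of radius r fits
    double h = std::sqrt(r * r - (d / 2.0) * (d / 2.0));
    P mid = {(p.x + q.x) / 2.0, (p.y + q.y) / 2.0};
    double ux = -dy / d, uy = dx / d;                 // unit perpendicular to pq
    centres.push_back({mid.x + h * ux, mid.y + h * uy});
    centres.push_back({mid.x - h * ux, mid.y - h * uy});
    return centres;
}

// Score a line through p and q by counting points within `tol` of it.
int lineInliers(const P& p, const P& q, const std::vector<P>& pts, double tol) {
    double a = q.y - p.y, b = p.x - q.x, c = q.x * p.y - p.x * q.y;
    double norm = std::hypot(a, b);
    if (norm < 1e-9) return 0;
    int n = 0;
    for (const P& r : pts)
        if (std::fabs(a * r.x + b * r.y + c) / norm < tol) ++n;
    return n;
}

// Score a circle (centre, radius) by counting points near its circumference.
int circleInliers(const P& centre, double radius, const std::vector<P>& pts, double tol) {
    int n = 0;
    for (const P& r : pts)
        if (std::fabs(std::hypot(r.x - centre.x, r.y - centre.y) - radius) < tol) ++n;
    return n;
}
```

Each cycle would then keep whichever candidate shape has the most inliers (or neither, if no shape reaches a minimum count), strip those inliers from the point set, and repeat.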
Matching higher order shapes involves combining primitive lines and circles into more complicated and identifiable features, including corners, t-intersections and parallel lines. The key metrics to match shapes are the angles of lines relative to each other and the distance between each line's endpoints and the endpoints of neighbouring lines.
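A sketch of the kind of geometric tests involved, with placeholder angle and distance tolerances; a T-intersection test would be similar, except that one line's endpoint must lie near the middle of the other line rather than near its endpoint.

```cpp
#include <algorithm>
#include <cmath>

struct Pt { double x, y; };
struct Segment { Pt p1, p2; };   // a fitted field line, in ground-plane coordinates (mm)

const double kPi = 3.14159265358979323846;

double dist(const Pt& a, const Pt& b) { return std::hypot(a.x - b.x, a.y - b.y); }

// Unsigned angle between two segments, folded into [0, pi/2].
double angleBetween(const Segment& a, const Segment& b) {
    double angA = std::atan2(a.p2.y - a.p1.y, a.p2.x - a.p1.x);
    double angB = std::atan2(b.p2.y - b.p1.y, b.p2.x - b.p1.x);
    double d = std::fabs(angA - angB);
    while (d > kPi) d -= kPi;             // line direction is only defined modulo pi
    return std::min(d, kPi - d);
}

// Two roughly perpendicular lines whose nearest endpoints almost touch
// form a corner candidate. Thresholds here are placeholders.
bool isCorner(const Segment& a, const Segment& b,
              double angleTol = 0.3, double endpointTol = 250.0) {
    if (std::fabs(angleBetween(a, b) - kPi / 2.0) > angleTol) return false;
    double closest = std::min({dist(a.p1, b.p1), dist(a.p1, b.p2),
                               dist(a.p2, b.p1), dist(a.p2, b.p2)});
    return closest < endpointTol;
}
```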
More details can be found here.
No form of foot detection is currently present in the rUNSWift system.
Outdated
When making changes to the vision system it is useful to have an understanding of where things are in the code.
Foveas are entirely self-contained so they are indexed from (0, 0) to (width, height). A FoveaT is created with a type T. Once asStruct is called it becomes a FoveaStruct and can no longer be modified.
When creating a Fovea you must use the coordinates of the resolution you are downsampling to. Since downsampling always occurs from the native image, it does not matter which resolution you are coming from, only which one you are downsampling to. As of this writing (November 2016) it can take between 0.5 ms and 2 ms to create a Fovea.
All of the vision files are in robot/perception/vision. A lot of type definitions can also be found in robot/types. The processFrame function in Vision.cpp kickstarts most of the vision processes so it is a useful starting point. Apart from this the top level files are:
- VisionAdapter.cpp and .hpp (this contains the tick() function for vision, a.k.a. what it does every cycle)
- Vision.cpp and .hpp
- Fovea.hpp and .tcc
In order to add debugging tools to offnao, you need to change visionTab.cpp in utils/offnao/tabs. It is important to remember that the bottom camera image is attached to the bottom of the top camera image when passing coordinates in offnao. In other words, the first pixel from the left, halfway down the bottom camera image, would be at (0, height of the top image + half the height of the bottom image) = (0, 1600).
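As a small illustration (using a symbolic top-image height rather than a hard-coded value), mapping a bottom-camera pixel into the combined frame is just a vertical offset:

```cpp
struct Pixel { int x, y; };

// Map a bottom-camera pixel into the combined image offnao displays,
// where the bottom image sits directly below the top image.
Pixel toCombined(const Pixel& bottomPixel, int topImageHeight) {
    return {bottomPixel.x, bottomPixel.y + topImageHeight};
}
```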
Other useful files are:
- Camera.cpp - lets you change camera settings
- CameraToRR.cpp - converts image coordinates to robot relative coordinates
- NaoCamera.cpp - is the camera driver
- Yuv.cpp - given an array of type yuv, a column number, a row number, and the number of columns, gives you the raw YUV values in the original image
- NNMC.cpp, .hpp and NNMCInline.hpp - contains colour calibration code