Project 1: LiDAR Machine Learning

A Technology Overview

Wow, okay. That title is made entirely of words that are going to need definition. Let’s begin with LiDAR. LiDAR stands for Light Detection and Ranging, and the technology is quickly becoming the industry standard over RADAR (Radio Detection and Ranging). It is important to understand that these technologies are essentially doing the same thing: measuring electromagnetic waves! LiDAR uses high-frequency wavelengths around 550 nanometers, or what we see as green visible light. LiDAR technology emits a beam of light and captures reflections of that beam, measuring the co-polarization and cross-polarization of the reflected portion of the original beam. The data produced by LiDAR helps to paint a layered image of the object being measured. Currently, LiDAR is being used to map the Amazon rainforest, as it provides the ability to “see” past the dense canopy cover. LiDAR is similarly being used to track insect flight patterns, monitor fish population densities, and protect strategic airspace all over the world.

Clearly LiDAR is an important technology, but its adoption has been slowed by the large amount of data these measurements produce. These datasets are massive in scale, and it takes teams of scientists observing minute changes in polarization to effectively identify the important information. This is where machine learning comes into the narrative. Machine learning is used to reduce labor costs and human error while saving time in the evaluation of large datasets. This happens through a variety of methods, most of which fall under the umbrella term parameter optimization (yes, even the infamous neural networks are optimization machines).

A little about optimization. Say we have an equation consisting of three independent variables (X1, X2, and X3) and a single dependent outcome (Y). We might quickly say,

X1 + X2 + X3 = Y [1]

Usually, this assumption will be incorrect; very rarely will simple addition define the relationship between input and output. If we want to increase the accuracy of the model, we can add scaling constants (called scalars: A1, A2, and A3) to our equation.

(A1*X1) + (A2*X2) + (A3*X3) = Y [2]

Optimization is the idea that, from a dataset of X1, X2, X3, and Y values, we can find values of A1, A2, and A3 that relate X1, X2, and X3 to the output Y. To be clear, most machine learning algorithms require more advanced math to relate input to output, but the basic idea of optimization is captured in the example above.
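
To make that concrete, here is a minimal sketch (using NumPy, with made-up data) of how the scalars A1, A2, and A3 from equation [2] could be recovered from a dataset by least-squares optimization:

```python
import numpy as np

# Made-up data purely for illustration: 100 observations of X1, X2, X3
# generated with "unknown" scalars that we will try to recover.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 3))
true_A = np.array([2.0, -0.5, 1.3])
Y = X @ true_A + rng.normal(0, 0.1, size=100)  # outputs with a little noise

# Least squares finds the A1, A2, A3 that minimize the squared error
# between (A1*X1 + A2*X2 + A3*X3) and Y.
A, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(A)  # lands close to [2.0, -0.5, 1.3]
```

Least squares is only one of many ways to do this optimization, but the goal is always the same: pick the scalars that make the model’s output match the observed Y as closely as possible.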

There are two distinct areas of machine learning algorithms: classification and prediction. Classification algorithms attempt to categorize data (for some value X, X is either greater than or equal to 4 OR less than 4), while prediction algorithms attempt to discover a precise value given an input (Given A, B, and C, we predict X WILL BE equal to 4). Our example of optimization can be applied to both methods of machine learning.

In the case of classification, the dividing line between classes is optimized to avoid misclassifying data points.

[Figure: simple classification with a support vector machine]
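
As a toy illustration of the classification case (assuming scikit-learn, and reusing the greater-than-or-less-than-4 example from above), a linear support vector machine can learn that dividing line from labeled data:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Label each value X as 1 if X >= 4 and 0 otherwise, then let a linear
# support vector machine learn the dividing line from the labeled data.
rng = np.random.default_rng(1)
X = rng.uniform(0, 8, size=(200, 1))
y = (X[:, 0] >= 4).astype(int)

clf = LinearSVC(C=1.0).fit(X, y)

# The optimized boundary is where the decision function crosses zero;
# it should fall close to X = 4.
boundary = -clf.intercept_[0] / clf.coef_[0, 0]
print(boundary)
print(clf.predict([[3.5], [4.5]]))  # expect [0, 1]
```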

When the algorithm is built to make predictions, the optimization attempts to map an input directly to an output.

[Figure: simple linear regression example]
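
And a toy illustration of the prediction case, again with scikit-learn and invented data: a linear regression maps the inputs A, B, and C straight to a numeric output instead of a category:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented data: three input columns playing the role of A, B, and C,
# and an output that depends on them linearly plus a little noise.
rng = np.random.default_rng(2)
inputs = rng.uniform(0, 5, size=(150, 3))
outputs = (1.0 * inputs[:, 0] + 0.5 * inputs[:, 1] - 2.0 * inputs[:, 2]
           + rng.normal(0, 0.05, 150))

model = LinearRegression().fit(inputs, outputs)

# Given a new A, B, C the model predicts a single value rather than a class.
print(model.predict([[2.0, 4.0, 0.0]]))  # roughly 1*2 + 0.5*4 - 2*0 = 4
```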

As I continue to elaborate on my specific LiDAR machine learning project, know that I am referencing classification algorithms. For our project, the team knew what “thing” we were trying to detect, which allowed us to develop a binary classifier (yes or no, 1 or 0, exists or does not exist) and helped reduce the time taken to classify important ecological information. More on that in next week’s blog post.
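
For a rough sense of what a binary classifier looks like in code, here is a hypothetical sketch. The features (co- and cross-polarization values) and the algorithm (logistic regression) are stand-ins I chose for illustration, not the team’s actual pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in features: co- and cross-polarization values for
# each LiDAR return, with fake labels that say the "thing" exists when
# co-polarization dominates. None of this reflects the real project data.
rng = np.random.default_rng(3)
co_pol = rng.uniform(0, 1, size=300)
cross_pol = rng.uniform(0, 1, size=300)
features = np.column_stack([co_pol, cross_pol])
labels = (co_pol - cross_pol > 0.1).astype(int)

# A logistic regression gives the yes-or-no (1 or 0) answer described above.
clf = LogisticRegression().fit(features, labels)
print(clf.predict([[0.8, 0.2], [0.3, 0.6]]))  # expect [1, 0]
```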
