# BIOTAS: BIOTelemetry Analysis Software, for the semi-automated removal of false positives from radio telemetry data – Animal Biotelemetry

#### By K. Nebiolo and T. Castro-Santos

Jan 13, 2022

The identification and removal of false-positive detections from radio telemetry with BIOTAS starts with the quantification and description of predictor variables. BIOTAS then fits (or trains) a NB model, which calculates the probability that a detection is valid or false positive given a set of observations. With probabilities in hand, BIOTAS applies a decision criterion to remove false positives from the record. To assess the algorithm's ability to discern between valid and false-positive detections, BIOTAS performs a k-fold cross-validation and assesses model quality with the area under the curve (AUC) statistic as well as measures of sensitivity ($sen$), specificity ($spc$), negative and positive predictive value ($npv$, $ppv$), and false-positive rate ($fpr$).

### Selecting and quantifying predictor variables

In developing the classifier, it was important to select predictor variables that maximized the ability to discriminate between valid and false-positive detections. These included: power or received signal strength ($\mathrm{RSS}$), hit ratio ($\mathrm{HR}$), consecutive record length ($\mathrm{CRL}$), noise ratio ($\mathrm{NR}$), and the difference in the time-lag between detections ($\delta^{2}L$). Power refers to the received signal strength of a given transmission. Depending on the receiver model used, this may be reported in arbitrary units or dB. Of the predictor variables, power is the only one that is intrinsic to a given transmission; all other predictors were derived from detections recorded within a short period of time surrounding a given transmission.

The Proximate Detection History (PDH or detection history) refers to a series of detections of a given tag recorded during a fixed number of pulse intervals immediately preceding and following a given detection. This describes the pattern of recorded to missed detections in series around the current record. The algorithm looks forwards and backwards in time a specified number of transmission intervals. For example, say a given tag detection occurs at midnight (00:00:00), has a 3-s pulse rate, and produces the pattern of heard to missed detections in Fig. 1. The middle integer is the initial detection (00:00:00), but the tag was not detected 3 s prior (23:59:57) or 3 s post (00:00:03). To create the PDH, the algorithm queries the recaptures database a set number of pulse intervals forwards and backwards in time from the current record. If the tag was detected in series, a '1' is added to the history; if it was not, a '0' is added. In Fig. 1, the fish was heard on the −12th, −6th, 0th (current record), 6th, 9th, and 12th s, and the resulting detection history was '101010111'.
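The PDH construction described above can be sketched as follows (a minimal illustration with a hypothetical helper name, not the BIOTAS implementation), assuming detection times recorded in whole seconds and, for the moment, ignoring the rounding tolerance discussed below:

```python
def detection_history(detections, current, pulse_rate, n_intervals=4):
    """Build a proximate detection history string around `current`.

    detections  : set of detection timestamps (seconds)
    current     : timestamp of the record being evaluated
    pulse_rate  : nominal pulse interval in seconds
    n_intervals : number of intervals to look forwards and backwards
    """
    history = []
    for k in range(-n_intervals, n_intervals + 1):
        expected = current + k * pulse_rate
        history.append('1' if expected in detections else '0')
    return ''.join(history)

# Fig. 1 example: a 3-s tag heard at -12, -6, 0, 6, 9, and 12 s
print(detection_history({-12, -6, 0, 6, 9, 12}, 0, 3))  # -> '101010111'
```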

Radio telemetry receivers typically record detection times rounded to the nearest second. Sometimes, however, a given transmission might not fall within the expected second. This can arise because tags can be programmed with intervals that are not discrete integer values, or because a pulse randomizer is employed. The pulse randomizer adjusts the signal burst rate by a small amount (depending on the manufacturer, typically ~ ± 500 ms), which reduces the probability of two signals colliding and producing false negatives. Since most commercially available radio telemetry receivers record data to the nearest second, the algorithm must query the recaptures database within a 3-s moving window (the expected time, plus and minus 1 s to allow for rounding). Any detection logged within this broader window is considered valid. For example, if a tag was detected at midnight (00:00:00) and has a 3-s pulse rate, BIOTAS queries all recaptures from 23:59:56 to 23:59:58 and from 00:00:02 to 00:00:04. In the case of Fig. 1, the tag was not detected within the first interval (forwards or backwards), meaning the original detection at 00:00:00 would not be considered valid from a consecutive-detections-in-series perspective. Any detection occurring outside of the interval ($\pm \epsilon$, here 1 s) is not included in the PDH.
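In code, the ± 1 s tolerance amounts to testing membership in a small window rather than in an exact second; a sketch (hypothetical helper, with the tolerance as a parameter):

```python
def hit_in_window(detections, expected, epsilon=1):
    """True if any detection falls within `expected` +/- `epsilon` seconds."""
    return any(expected - epsilon <= t <= expected + epsilon for t in detections)

# A 3-s tag detected at 00:00:00; the pulse randomizer shifted the next
# burst so it was logged at 00:00:04 instead of 00:00:03.
detections = {0, 4}
print(hit_in_window(detections, 3))  # -> True: 4 s falls inside the 2-4 s window
print(hit_in_window(detections, 6))  # -> False: nothing between 5 and 7 s
```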

Having defined the detection history and its associated time window, we can now calculate $\mathrm{HR}$, which is the number of detections within a PDH divided by the length of the detection history. For a ± 4 detection hit ratio like the one pictured in Fig. 1, the length of the history is 9 and the $\mathrm{HR}$ is 6/9.

The second derived predictor is the consecutive record length ($\mathrm{CRL}$). This refers to the longest contiguous run of 1's in a given detection history. In Fig. 1, the $\mathrm{CRL}$ is 3. Table 1 contains examples of possible detection histories and their respective $\mathrm{HR}$s and $\mathrm{CRL}$s. Note that these PDHs have ± 4 intervals, and that the first row in the table corresponds to the detection history pictured in Fig. 1. The middle position in a PDH is the current detection and is always a 1. Also note that in the second and third rows the histories have the same $\mathrm{HR}$ but different $\mathrm{CRL}$s (Table 1).
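Both derived statistics follow directly from the history string; a minimal sketch (hypothetical function names):

```python
def hit_ratio(history):
    """HR: detections divided by the length of the detection history."""
    return history.count('1') / len(history)

def consecutive_record_length(history):
    """CRL: longest contiguous run of 1's in the history."""
    return max(len(run) for run in history.split('0'))

h = '101010111'                       # the Fig. 1 detection history
print(hit_ratio(h))                   # -> 0.666... (6/9)
print(consecutive_record_length(h))   # -> 3
```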

The next predictor is noise ratio ($\mathrm{NR}$), which is simply the number of known false-positive hits divided by the total number of detections (i.e., including known noise detections but excluding beacon tags) within a 1-min interval around the current detection. Detections within the window are categorized into two classes: plausible and known false positive. Plausible detections are from those codes and frequencies currently active within the study area. The remaining detections are from unknown or unavailable codes; in other words, they are known false-positive detections ($f$). The noise ratio is given by $\mathrm{NR}=f/n$, where $f$ is the number of false-positive detections within a 1-min window around the current detection, and $n$ is the total number of detections within that window.
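Computed over the tag codes heard in the 1-min window, the noise ratio reduces to a count of implausible codes over all detections; a sketch with made-up tag codes:

```python
def noise_ratio(window_codes, active_tags):
    """NR = f / n over a 1-min window around the current detection.

    window_codes : tag codes of every detection in the window
    active_tags  : codes currently deployed in the study area
    """
    n = len(window_codes)
    f = sum(1 for code in window_codes if code not in active_tags)
    return f / n if n else 0.0

# 'ZZ' and 'XQ' are not on the active tag list -> known false positives
print(noise_ratio(['A1', 'A1', 'ZZ', 'A1', 'XQ'], {'A1', 'B2'}))  # -> 0.4
```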

The last derived predictor is the second-order difference in time-lag between detections, or $\delta^{2}L$. It is simply the difference of the difference in timestamps between sequential rows. When a tagged animal is within detection range, the tags will pulse and be recorded at the nominal rate set at the onset of the study. For example, if the nominal pulse rate of the tag is 3 s, one would expect to hear that tag every 3 s. For a valid detection, the time-lag between successive detections should be 3 s (first order), and the difference in time-lag between subsequent detections should be zero (second order). We expect true-positive detections to have a $\delta^{2}L$ of zero, or fixed multiples of the pulse interval. The more consistent $\delta^{2}L$ is with expectations, the more belief we have in the record being true.
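The second-order lag difference is two successive differencing passes over the timestamp column; a minimal sketch:

```python
def second_order_lag(timestamps):
    """delta^2 L: difference of the differences between sequential detections."""
    first = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(first, first[1:])]

# a 3-s tag with one missed burst between 6 s and 12 s
print(second_order_lag([0, 3, 6, 12, 15]))  # -> [0, 3, -3]
```

Note how the missed burst produces second-order differences that are multiples of the 3-s pulse interval, consistent with the expectation described above.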

### Treatment of continuous variables

All continuous classifier variables were discretized into bins. Discretization of continuous features has a number of advantages: it roughly approximates probability distributions and helps to overcome inaccurate shape assumptions [11, 41]. Hsu et al. [18] tested a number of discretization methods for NB classifiers and found that more complex methods did not improve performance. Therefore, BIOTAS uses a simple equal width interval discretization process.

Detection power was binned into equal width intervals of 5 dB or 5 arbitrary units depending upon manufacturer, $\mathrm{NR}$ was binned into 10 percentile units, and lag differences were binned into equal width intervals as wide as the tag's nominal pulse rate. $\mathrm{HR}$ and $\mathrm{CRL}$ were limited by the number of detections within the PDH, and thus limited to a set number of classes.

Discretization has one major limitation: if there are no observations for a particular bin, the estimated probability of it occurring is zero. A zero likelihood negates the weight of evidence provided by the other predictor variables, making the posterior uninformative. To overcome this, we applied Laplace smoothing [21], which adds a single observation to each bin and eliminates zero counts. This slight positive bias has almost no effect on training datasets with a large number of observations.
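Equal-width binning and Laplace smoothing together might look like the following sketch (hypothetical helper names; the bin origin and width are illustrative):

```python
from collections import Counter

def discretize(value, lo, width):
    """Equal width interval discretization, e.g. 5-dB power bins."""
    return int((value - lo) // width)

def smoothed_frequencies(bins, n_bins):
    """Laplace smoothing: one pseudo-observation per bin, so no bin is zero."""
    counts = Counter(bins)
    total = len(bins) + n_bins
    return {b: (counts.get(b, 0) + 1) / total for b in range(n_bins)}

power_bins = [discretize(p, lo=-100, width=5) for p in (-78, -77, -92)]
probs = smoothed_frequencies(power_bins, n_bins=20)
# an empty bin now carries probability 1/23 instead of 0
```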

### Training methods

When BIOTAS trains a model, it reads raw telemetry data and separates it into rows with known valid detections and rows of known false positives. Known false-positive detections are from tags not on the study tag list, while known valid detections come from beacon tags. Detections with known validity are the training set, while detections with unknown validity (i.e., study tags) are the classification set. BIOTAS can construct a training dataset in two ways: by training on beacons (supervised) or by training on the study tags themselves (semi-supervised).

#### Training on beacons

It is common in telemetry studies for researchers to employ tags that are not on fish. These may be ‘beacons’, which are typically set to transmit at fixed intervals to provide a record of continued functioning of the receiver system; or ‘test tags’, which are typically drawn through the intended detection field in such a way as to emulate the movements of free-swimming fish. Either can be used to provide training data, but the greater verisimilitude provided by test tags makes them the better choice, provided a sufficient volume of data is generated to create a suitable training dataset.

#### Training on study tags

There are limitations to using beacons as training data. A beacon may not be representative of a study tag: transmission intervals might be too long; some tags cannot be cycled on and off in realistic ways, producing unrealistically long strings of valid detections; the tags themselves are typically in fixed locations; and it is possible for there to be false positives among the beacons (after all, it is the removal of data that look valid that drives this effort). When training on study tags, BIOTAS constructs a training dataset that assumes all study tag records are true. This poses a dilemma, as we anticipate that some false positives are mislabeled as valid. Therefore, it is advised to re-classify the data by training on the previous iteration's valid detections and the known false positives from the initial training. There is a tradeoff: with beacons we had a limited number of tags (usually just one per receiver), so the likelihood of a false positive with that exact code is small. Because of that, we were able to make the simplifying assumption that all beacon data were valid. We cannot make the same assumption with study tags, because the purpose of this effort is to remove false positives.

The solution to this is to use an iterative approach: on the first iteration, we train on beacon tags (or the study tags themselves) and classify the study tags. On subsequent iterations, we train on the previous iteration's valid detections and the known false-positive detections from the first iteration. This alters the density functions of the predictor variables, with fewer, but higher quality, known valid detections. A new iteration uses these new frequencies to re-classify the remaining study tag detections. This process continues until convergence, when no new observations are classified as false positive.
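The iterative loop can be summarized in a few lines (a sketch with hypothetical `fit` and `classify` stand-ins for the training and MAP steps; not the BIOTAS implementation):

```python
def iterative_classify(study_detections, known_false, fit, classify):
    """Re-train on the survivors of each pass until no detection is removed."""
    valid = list(study_detections)        # iteration 0: assume all are valid
    while True:
        model = fit(valid, known_false)   # train on current valid + known noise
        survivors = [d for d in valid if classify(model, d) == 'valid']
        if len(survivors) == len(valid):  # convergence: nothing new removed
            return survivors
        valid = survivors
```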

### False-positive classification

Supervised learning algorithms use observed data with known classifications (training data) to classify unknown data. Bayes' theorem takes training data and quantifies the probability that a record is either true or false positive given what we have observed about it [5]. This probability, known as the posterior, is given by

$$P\left(C_i \mid F_1,\dots ,F_n\right)\propto P(C_i)\prod_{j=1}^{n}P\left(F_j \mid C_i\right),$$

(1)

where $P\left(C_i \mid F_1,\dots ,F_n\right)$ is the posterior probability of a valid (or false-positive) detection given the values of each observed predictor ($F_1,\dots ,F_n$); $P(C_i)$ is the prior probability of the $i$th detection class occurring ($C\in \{\text{Valid}, \text{False Positive}\}$), and $P(F_j \mid C_i)$ is the likelihood (conditional probability) of the $j$th observed predictor value ($F_j$) given the $i$th detection class ($C_i$). Naïve Bayes assumes that all predictor variables are independent; hence, the likelihood of the observed predictor values given a detection class is a product. The posterior probability expresses our belief in the record being true or false positive given what we have observed.

The prior $P(C_i)$ is the marginal probability of the $i$th detection classification occurring in the training dataset, where $C\in \{\text{Valid}, \text{False Positive}\}$. BIOTAS calculates the prior probability that a record is valid, $P(T)$, with a simple frequency analysis: $P(T)=n_T/n$, where $n_T$ is the number of valid records and $n$ is the total number of records in the training dataset. Since the priors are marginal, the prior probability that a record is a false positive is given by $1-P(T)$.

The likelihood $P(F_j \mid C_i)$ is the conditional probability of the $j$th observed predictor value $F_j$ given the $i$th detection class $C_i$. BIOTAS calculates the likelihood using a frequency table: $P(F_j \mid C_i)=n_F/n_C$, where $n_F$ is the number of records within detection class $C_i$ that match the observed predictor value $F_j$, and $n_C$ is the number of records within detection class $C_i$.

To classify, BIOTAS applies the maximum a posteriori (MAP) hypothesis and chooses the most probable detection class. The algorithm's decision rule becomes

$$\underset{C_i}{\mathrm{argmax}}\left\{P(C_i)\prod_{j=1}^{n}P(F_j \mid C_i)\right\}.$$

(2)

Under this hypothesis, the detection class with the larger posterior probability is chosen; any detection whose ratio of valid to false-positive probabilities ($\frac{P(T)}{1-P(T)}$) is less than 1.0 is classified as a false positive.
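A toy end-to-end version of Eqs. 1 and 2, using frequency tables for the priors and likelihoods (illustrative only, with hypothetical feature values; Laplace smoothing is omitted here for brevity):

```python
from collections import Counter, defaultdict

def train(records):
    """records: list of (feature_tuple, class_label) pairs."""
    priors = Counter(label for _, label in records)
    likelihoods = defaultdict(Counter)        # (class, j) -> counts of values
    for features, label in records:
        for j, value in enumerate(features):
            likelihoods[(label, j)][value] += 1
    return priors, likelihoods, len(records)

def classify(model, features):
    """MAP decision rule: pick the class with the larger posterior score."""
    priors, likelihoods, n = model
    scores = {}
    for label, n_c in priors.items():
        score = n_c / n                                     # P(C_i)
        for j, value in enumerate(features):
            score *= likelihoods[(label, j)][value] / n_c   # P(F_j | C_i)
        scores[label] = score
    return max(scores, key=scores.get)

# hypothetical (CRL bin, NR level) observations with known classes
training = [((4, 'low'), 'valid'), ((4, 'low'), 'valid'), ((0, 'high'), 'false')]
print(classify(train(training), (4, 'low')))   # -> 'valid'
```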

Tables 2 and 3 follow the classification of two records from initial observation and description to the calculation of prior, likelihood, and posterior probabilities, and then to the application of the MAP criterion. Table 2 contains two records from two different study tags. The first detection was recorded on May 5th, and the second on July 4th. The first detection had a moderately full PDH with an $\mathrm{HR}$ of 0.45 and a $\mathrm{CRL}$ of 4. The second record had a very sparse PDH, with an $\mathrm{HR}$ of 0.09. $\mathrm{NR}$ was also high for the second detection (Table 2).

Table 3 contains the prior, likelihood, and posterior of the two detections identified in Table 2. Note that the posterior is simply the product of all rows above it. The prior probability that a detection is false positive, $P(F)$, at this receiver is only 0.004 (Table 3), meaning there is overwhelming evidence that a detection will be valid. The next 5 rows identify the likelihood of each observation occurring given the detection classification. The MAP hypothesis chooses the detection class with the larger posterior; therefore, the detection occurring on May 18 was valid, while the detection occurring on July 4th was false positive.

### Cross-validation

BIOTAS assesses the ability of the algorithm to discern between classified valid study tags and known false-positive detections with a $k$-fold cross-validation technique [39]. For studies that train on study tags, the training dataset includes those records classified as valid from the final iteration and known false-positive detections. The procedure partitions the training dataset into $k$ equal sized subsamples. In each iteration, a single subsample (or fold) is retained as the test dataset (to be classified) and the remaining $k-1$ subsamples are retained as the training dataset. The cross-validation process is repeated $k$ times, with each of the $k$ subsamples used exactly once as validation data. This procedure allows a 1:1 comparison of known classifications against the algorithm's classifications. A classification can have 1 of 4 states: true positive ($t_\mathrm{p}$), true negative ($t_\mathrm{n}$), false positive ($f_\mathrm{p}$), and false negative ($f_\mathrm{n}$). Results of the $k$-fold cross-validation are summarized into a 2 × 2 contingency table (Table 4).
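The partitioning scheme can be sketched as follows (hypothetical `fit`/`classify` stand-ins; BIOTAS's own implementation may differ):

```python
def cross_validate(records, k, fit, classify):
    """k-fold CV: each fold serves exactly once as the test set.

    records : list of (features, known_class) pairs
    Returns a list of (predicted, known) pairs for the whole dataset.
    """
    folds = [records[i::k] for i in range(k)]          # k roughly equal folds
    results = []
    for i, fold in enumerate(folds):
        train_set = [r for j, f in enumerate(folds) if j != i for r in f]
        model = fit(train_set)                         # train on k-1 folds
        results.extend((classify(model, feats), truth) for feats, truth in fold)
    return results
```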

Accuracy metrics derived from the 2 × 2 cross-validation contingency table include sensitivity and specificity. Sensitivity, or the true-positive rate, is given by $sen=t_\mathrm{p}/(t_\mathrm{p}+f_\mathrm{n})$ and measures the probability that a valid detection is correctly classified as valid. Specificity, or the true-negative rate, is given by $spc=t_\mathrm{n}/(f_\mathrm{p}+t_\mathrm{n})$ and quantifies the probability that a false-positive detection is correctly classified.

Precision metrics include the positive and negative predictive values, $ppv$ and $npv$. The positive predictive value ($ppv$) is the proportion of detections classified as valid that were valid: $ppv=t_\mathrm{p}/(t_\mathrm{p}+f_\mathrm{p})$. The negative predictive value ($npv$), which measures the proportion of detections classified as false that were false, is given by $npv=t_\mathrm{n}/(f_\mathrm{n}+t_\mathrm{n})$. Again, our objective is to maximize both measures. The higher the $ppv$, the lower the number of potential false-positive detections in the dataset. Likewise, a high $npv$ means a lower number of false negatives.

Since we are identifying and removing false-positive detections, and false-positive detections are rare, the most important algorithm metric is the false-positive rate ($fpr$), the proportion of known false-positive detections that were incorrectly classified as valid: $fpr=f_\mathrm{p}/(f_\mathrm{p}+t_\mathrm{n})$, or $1-spc$. Our objective is to minimize $fpr$: the lower the rate, the fewer known false-positive detections were classified as valid.
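All five metrics derive from the four cells of the contingency table; a minimal sketch with hypothetical cross-validation counts:

```python
def classification_metrics(tp, tn, fp, fn):
    """sen, spc, ppv, npv, and fpr from the 2 x 2 contingency table."""
    return {
        'sen': tp / (tp + fn),   # sensitivity (true-positive rate)
        'spc': tn / (fp + tn),   # specificity (true-negative rate)
        'ppv': tp / (tp + fp),   # positive predictive value
        'npv': tn / (fn + tn),   # negative predictive value
        'fpr': fp / (fp + tn),   # false-positive rate, = 1 - spc
    }

m = classification_metrics(tp=90, tn=8, fp=2, fn=0)
print(m['fpr'])  # -> 0.2: 2 of the 10 known false positives slipped through
```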

BIOTAS also produces the precision-recall curve (PRC) and calculates the area under the curve (AUC) statistic. Precision quantifies the proportion of detections predicted to be false positive that truly were false positive, while recall quantifies the proportion of all known false-positive detections that were correctly predicted. The AUC statistic summarizes the area under the PRC over a range of threshold values. The PRC is better suited to imbalanced datasets [34], which are typical of radio telemetry studies.

The results of the k-fold cross-validation can inform the selection of predictor variables. Ling et al. [24] found AUC to be statistically consistent and more discriminating than accuracy alone. Rosset [33] used AUC as an evaluation criterion for scoring classification models, where models with higher AUC are preferred. With BIOTAS, it is possible to construct a suite of classifiers that use different combinations of predictor variables; the model that maximizes AUC, $sen$, $spc$, $ppv$, and $npv$ while minimizing $fpr$ is the best.

#### Case study

BIOTAS was implemented on a large-scale telemetry project on the Connecticut River in 2015 that tracked 560 American Shad and 80 Sea Lamprey with 30 continuous radio telemetry monitoring stations. The subset of receivers highlighted in this paper (Fig. 2) created 4 scenarios, which included multiple receiver manufacturers (Sigma Eight Orion and Lotek SR×800 receivers), dipole and Yagi antenna configurations, receivers that scanned multiple frequencies, and receivers that switched between antennas.

The receivers in scenario 1 were Orion units manufactured by Sigma Eight and consisted of detection zones T13, T15, T18, T21 and T22 (Fig. 2). These units spanned the full width and depth of the river and from a noise perspective were similar. Scenario 2 consisted of the detection zones T12E and T12W (Fig. 2), which was a single Orion receiver switching between two Yagi antennas. Scenario 3 consisted of detection zones T09, T07 and T30. The Orion receivers had a single dipole antenna and were typically deployed in areas where specimens were known to congregate. Scenario 4 included detection zones T03, T06, and T24. These were Lotek SR×800 receivers with a single Yagi switching between 5 frequencies. BIOTAS accounts for number of frequencies (or antennas) and the scan time devoted to each while building the PDH and deriving CRL and HR statistics.

After training and classifying receivers within each scenario, we performed a $k$-fold cross-validation that assessed the ability of BIOTAS to correctly identify and remove known false-positive detections from the record with measures of $sen$, $spc$, $ppv$, $npv$, $fpr$, and precision-recall AUC (PRC-AUC). Aside from assessing the quality of the model, these metrics also assist in model selection, as we will demonstrate.

As a last measure, we compared the saturated model in BIOTAS with the filtering method proposed by Beeman and Perry [6], which states that for a detection to be classified as valid, it must occur within a consecutive series of detections. We assessed concordance between methods with Cohen's Kappa ($\kappa$) [26], which accounts for the possibility of agreement occurring by chance. A value of $\kappa =1$ suggests perfect agreement between BIOTAS and the consecutive detection requirement.
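Cohen's $\kappa$ compares observed agreement with the agreement expected by chance from the marginals; a sketch for a 2 × 2 agreement table (the counts below are hypothetical, not the study's results):

```python
def cohens_kappa(a, b, c, d):
    """Kappa for a 2 x 2 agreement table.

    a = both methods say valid, d = both say false positive,
    b and c = the two kinds of disagreement.
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    # chance agreement from the marginal proportions of each method
    p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)

print(round(cohens_kappa(40, 5, 5, 50), 3))  # -> 0.798, high agreement
```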