FIGURE SUMMARY
Title

A machine-learning tool to identify bistable states from calcium imaging data

Authors
Varma, A., Udupa, S., Sengupta, M., Ghosh, P.K., Thirumalai, V.
Source
Full text @ J. Physiol.

CEREBELLAR PNS IN LARVAL ZEBRAFISH EXHIBIT MEMBRANE POTENTIAL BISTABILITY A, schematic of the cerebellar circuitry in larval zebrafish (Danio rerio), showing the location of the cerebellum in the brain (left, dashed line box) and the cell types and circuit architecture (right). PNs (green) are the principal neurons in this circuit and receive multiple inputs from afferent sensorimotor nuclei outside the cerebellum via parallel fibres (black) and climbing fibres (magenta). B, simple spikes, as observed intracellularly (left) and extracellularly (right). Individual events (n = 100, intracellular and n = 100, extracellular) are shown in cyan and have been aligned and superimposed on one another. The average of these events is shown in black. C, climbing fibre (CF) inputs, as observed intracellularly (left) and extracellularly (right). Individual events (n = 55 intracellular and n = 10 extracellular) are shown in magenta, and their respective averages in black. D, representative loose patch recordings from larval zebrafish PNs showing the two modes of firing. The bursting mode recording is shown on the left, and the tonic mode is shown on the right. In each mode, electrical events can be distinguished by their amplitude, with small-amplitude simple spikes marked in cyan and the large-amplitude CF inputs marked in magenta. E, representative whole-cell current clamp recordings showing the two modes of firing in PNs, with the same colour scheme for events as before. Cells shown in (D) and (E) are different. F, example plots of simple spike inter-event intervals from individual recordings. Five example cells are shown from each class (Bursting and Tonic). The coefficient of variation of this distribution is marked under each cell as ‘C.V.’. G, the distribution of interspike interval coefficients of variation (n = 131 cells), sorted by state (n = 53 for tonic, and n = 78 for bursting). The distributions are significantly different, tested using a Mann–Whitney U test (P = 6.0733 × 10^−16).
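The coefficient of variation (C.V.) used in panels F and G is the standard deviation of the inter-spike intervals divided by their mean. The following is a minimal sketch of that calculation, not the authors' code; the function name and the two spike trains are hypothetical and chosen only to show that irregular, clustered firing gives a larger value than evenly spaced firing.

```python
import statistics

def isi_cv(spike_times):
    """Coefficient of variation (SD / mean) of inter-spike intervals."""
    intervals = [b - a for a, b in zip(spike_times, spike_times[1:])]
    return statistics.pstdev(intervals) / statistics.mean(intervals)

# Hypothetical spike times in seconds, for illustration only.
tonic_like = [0.1 * i for i in range(50)]          # evenly spaced spikes
bursting_like = [burst * 2.0 + k * 0.02            # 5-spike bursts every 2 s
                 for burst in range(10) for k in range(5)]

print(f"tonic-like C.V.:    {isi_cv(tonic_like):.2f}")
print(f"bursting-like C.V.: {isi_cv(bursting_like):.2f}")
```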

BOTH SIMPLE SPIKES AND CF EPSPS CONTRIBUTE TO THE CALCIUM SIGNAL IN PNS, IRRESPECTIVE OF CELLULAR STATE A, schematic of the experimental design and setup of simultaneous calcium imaging and electrophysiology in larval zebrafish. B and C, representative traces obtained during simultaneous imaging and electrophysiology for cells in the bursting (B) and tonic (C) mode. Simple spikes and CF inputs in the trace are identified and marked in the raster above the trace in cyan and magenta, respectively. The simultaneously-recorded change in GCaMP fluorescence (ΔF/F) is shown below in green. Calcium transients corresponding to simple spikes only (‘∗’), CF-events only (‘#’), and both events (‘o’) are marked. D, the typical GCaMP5G response to a single action potential, assuming it follows the profile of a difference of single exponentials with a half-rise time of 100 ms and half-decay time of 500 ms. The vertical dashed lines indicate the time of initiation of the calcium transient and the time when the transient peaks, which corresponds to an interval of 201 ms. Region I is the period up to 300 ms before the peak of the calcium signal, and region II is the period up to 300 ms after the peak of the transient. There were a total of n = 171 transient peaks in the ΔF/F signal detected across the N = 15 cells. E, average simple spike counts in the periods corresponding to regions I and II as defined in (D). Each cyan line corresponds to an individual cell (N = 15) and the average of all these lines is shown in black. F, average CF input event counts in the periods corresponding to regions I and II as defined in (D). Each magenta line corresponds to an individual cell (N = 15) and the average of all these lines is shown in black.
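As a worked check of panel D, a difference-of-exponentials kernel can be evaluated directly. The sketch below assumes the kernel has the form exp(-t/tau_decay) - exp(-t/tau_rise) and treats the quoted 100 ms and 500 ms values as exponential time constants; under that assumption the peak falls at about 201 ms, matching the interval stated in the caption. The helper count_events and the example event times are hypothetical and only illustrate counting events in regions I and II.

```python
import math

TAU_RISE_MS = 100.0   # caption value, treated here as a time constant (assumption)
TAU_DECAY_MS = 500.0  # caption value, treated here as a time constant (assumption)

def kernel(t_ms):
    """Difference of two exponentials: zero at t = 0, rises, then decays."""
    return math.exp(-t_ms / TAU_DECAY_MS) - math.exp(-t_ms / TAU_RISE_MS)

# Analytic peak time: set the derivative of the kernel to zero and solve for t.
t_peak_ms = (TAU_RISE_MS * TAU_DECAY_MS / (TAU_DECAY_MS - TAU_RISE_MS)
             * math.log(TAU_DECAY_MS / TAU_RISE_MS))

# Numerical check on a 1 ms grid.
t_peak_grid_ms = max(range(0, 2000), key=kernel)

def count_events(event_times_ms, start_ms, stop_ms):
    """Number of events with start_ms <= t < stop_ms."""
    return sum(start_ms <= t < stop_ms for t in event_times_ms)

# Hypothetical transient peak and event times (ms), for illustration only.
peak_ms = 1000.0
events_ms = [720.0, 810.0, 905.0, 990.0, 1150.0]
region_1 = count_events(events_ms, peak_ms - 300.0, peak_ms)  # up to 300 ms before the peak
region_2 = count_events(events_ms, peak_ms, peak_ms + 300.0)  # up to 300 ms after the peak

print(f"analytic peak: {t_peak_ms:.1f} ms, grid peak: {t_peak_grid_ms} ms")
print(f"region I events: {region_1}, region II events: {region_2}")
```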

RECONSTRUCTION OF THE CALCIUM SIGNAL FROM ELECTROPHYSIOLOGY TRACES A, schematic showing the method used to reconstruct the calcium signal from an electrophysiological trace. Simple spike (cyan) and CF input (magenta) rasters (left) are independently convolved with calcium sensor kernels (cyan for simple spikes, magenta for CF inputs) to generate an event-specific calcium signal time series (middle). These are then summed to give the final reconstruction (right). B, comparison of ground truth (green) and reconstructed (purple) calcium signals for three randomly chosen bursting cells (Ba) and three randomly chosen tonic cells (Bb). The Pearson's correlation coefficient between the two traces is marked above the plot for each pair. C, correlation coefficients for all cells calculated for either the true reconstruction, a scrambled version of the reconstruction, or the reconstruction obtained when simple spike and CF input event times were shuffled. A Kruskal–Wallis test yielded a P value of 2.108 × 10^−7, which was followed by a post hoc Dunn's test for individual comparisons, the latter of which is marked on the figure (n = 15 cells; six bursting, nine tonic). D, distribution of residuals between the ground truth and reconstructed calcium signal, pooled across all cells (n = 15 cells). E, the optimal calcium sensor kernels for simple spikes (cyan) and CF inputs (magenta). The formula for each kernel is given above the plots, and the values for rise and decay time constants are given below the plot.
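The convolve-and-sum scheme in panel A can be sketched as follows. This is an illustration under stated assumptions rather than the published implementation: the 30 Hz sampling rate, the difference-of-exponentials kernel shape, the time constants and the example rasters are all placeholder values, since the fitted kernel parameters appear only in the figure itself.

```python
import math

FS_HZ = 30  # sampling rate assumed for illustration

def exp_kernel(tau_rise_s, tau_decay_s, duration_s=3.0):
    """Difference-of-exponentials kernel sampled at FS_HZ (assumed shape)."""
    n = int(duration_s * FS_HZ)
    return [math.exp(-(i / FS_HZ) / tau_decay_s) - math.exp(-(i / FS_HZ) / tau_rise_s)
            for i in range(n)]

def convolve_causal(raster, kernel):
    """Add one copy of the kernel, starting at each event, scaled by the event count."""
    out = [0.0] * len(raster)
    for i, count in enumerate(raster):
        if count:
            for j, k in enumerate(kernel):
                if i + j < len(out):
                    out[i + j] += count * k
    return out

def reconstruct(ss_raster, cf_raster, ss_kernel, cf_kernel):
    """Convolve each raster with its own kernel, then sum the two signals."""
    ss_part = convolve_causal(ss_raster, ss_kernel)
    cf_part = convolve_causal(cf_raster, cf_kernel)
    return [a + b for a, b in zip(ss_part, cf_part)]

# Placeholder time constants (seconds); the fitted values are not reproduced here.
ss_kernel = exp_kernel(tau_rise_s=0.1, tau_decay_s=0.5)
cf_kernel = exp_kernel(tau_rise_s=0.1, tau_decay_s=0.8)

# Hypothetical 10 s rasters: event counts per sample.
ss_raster = [0] * 300
cf_raster = [0] * 300
for idx in (15, 18, 21, 150, 153):
    ss_raster[idx] = 1
for idx in (60, 200):
    cf_raster[idx] = 1

reconstruction = reconstruct(ss_raster, cf_raster, ss_kernel, cf_kernel)
print(f"peak of reconstructed signal: {max(reconstruction):.3f}")
```

Because the two event types are convolved separately, each can have its own kernel, and the final signal is linear in both rasters.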

GENERATING A LABELLED DATASET OF RECONSTRUCTED CALCIUM SIGNAL TRACES A, flowchart showing how electrophysiological recordings from PNs were processed to generate the state-labelled calcium signal database. B, representative randomly sampled traces from the state-labelled reconstructed calcium signal dataset. The left column has reconstructions from tonic cells and the right column has reconstructions from bursting cells. Above each trace is the raster showing the events in the source recording, with simple spikes in cyan, and CF inputs in magenta. C, heatmaps representing all the reconstructions in the state-labelled calcium signal dataset, for each state, after splitting all traces into 10-s-long non-overlapping chunks (n = 348 tonic traces; n = 580 bursting traces). D, comparing the distribution of trace properties for both states. The top three plots show (from left to right) the number of peaks, peak amplitude and area under the curve of the reconstructions in the dataset. The bottom three plots show (from left to right) the mean, standard deviation and the coefficient of variation of ΔF/F values. n values are the number of 10-s-long samples. N values are the number of cells used to generate the samples. P values were computed using linear mixed-effects models. E, principal components analysis of trace properties, with data points marked by state: tonic (red crosses) and bursting (black open circles).
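The chunking in panel C and the per-chunk properties in panel D can be sketched as below. The details are assumptions made for illustration: a 30 Hz sampling rate, dropping any trailing partial chunk, detecting peaks as simple local maxima, taking peak amplitude as the mean height of those maxima, and computing the area under the curve by the trapezoid rule. The paper's Methods define the procedures actually used.

```python
import statistics

FS_HZ = 30    # sampling rate assumed for illustration
CHUNK_S = 10  # chunk length from the caption

def split_into_chunks(trace, fs_hz=FS_HZ, chunk_s=CHUNK_S):
    """Split a trace into non-overlapping chunks; a trailing partial chunk is dropped."""
    size = fs_hz * chunk_s
    return [trace[i * size:(i + 1) * size] for i in range(len(trace) // size)]

def chunk_features(chunk, fs_hz=FS_HZ):
    """Summary properties of one chunk (definitions are assumptions, see lead-in)."""
    mean = statistics.mean(chunk)
    sd = statistics.pstdev(chunk)
    peaks = [i for i in range(1, len(chunk) - 1)
             if chunk[i - 1] < chunk[i] >= chunk[i + 1]]
    return {
        "n_peaks": len(peaks),
        "peak_amplitude": statistics.mean(chunk[i] for i in peaks) if peaks else 0.0,
        "auc": sum((a + b) / 2 for a, b in zip(chunk, chunk[1:])) / fs_hz,
        "mean": mean,
        "sd": sd,
        "cv": sd / mean if mean else float("nan"),
    }

# Hypothetical 25 s trace: yields two full 10 s chunks, the last 5 s are dropped.
example_trace = [0.0, 1.0, 0.0] * 250
chunks = split_into_chunks(example_trace)
features = chunk_features(chunks[0])
print(len(chunks), features["n_peaks"])
```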

DESIGN AND TRAINING OF CAMLSORT A, flowchart showing how the state-labelled calcium signal dataset was split into training and test datasets using a five-fold 60:20:20 split into training (white boxes), test (blue boxes) and cross-validation (yellow boxes) samples, respectively. The networks that performed well in the training phase, as assessed by performance on the cross-validation samples, were then challenged with the unseen test data (blue boxes) in each fold. An independent dataset with state switches was also used to further validate the networks obtained. B, the architecture of the convolutional recurrent neural networks used to solve the classification problem. The network takes an input time series trace sampled at 30 Hz (purple) whose duration is a multiple of 10 s (here just 10 s long). This trace is first normalized using min-max normalization, following which it is passed through a 1-D convolutional neural network (1D-CNN), which has four kernels with a step size and stride length of 30 each. The resulting four traces (grey boxes) are down-sampled, local feature-extracted versions of the original trace. All four outputs are passed to the LSTM module (central black box) for trend identification. The LSTM is bidirectional and processes traces in both the forward (blue arrow) and reverse (red arrow) directions. A single time step is taken from all the CNN outputs (red and blue dotted boxes) and passed to the respective LSTM cell, which then processes the input to produce a resultant ‘hidden state’ output (red and blue filled boxes). This output is passed back into the LSTM cell along with the next sample from the CNN's output traces, until every time step from it has been processed (here 10 rounds). Once this is done, the final ‘hidden state’ from each LSTM cell is retained and concatenated before their weighted average is taken by a linear layer. The resulting average is converted to a ‘posterior score’ using an activation function. These posterior scores represent the likelihood of the cell being in the bursting state at each time step. The likelihood of it being in the tonic state can be calculated from this. The final call is taken to be the state with the higher posterior score. The text in italics represents the net effect of each phase of the neural network. The numbers in parentheses indicate the sampling frequency and/or the duration of the resultant vectors at various stages of processing by the neural network. Similarly, the numbers in bold and italics within square brackets indicate the size of the vectors/matrices at various stages of passing through the network. C, average classification accuracy of trained networks at the cross-validation and the test phases. Each fold is represented in a different colour, marked in the legend at the bottom of (D). D, average F1 scores for each of the trained networks at the cross-validation and the test phases. E, area under the ROC (receiver operating characteristic) curve for each of the five trained CNN-LSTM networks for both cross-validation (black) and test (grey) data. F, ROC curve for Fold 2 of the CNN-LSTM for cross-validation (black) and test (grey) data.
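The front end of the network in panel B (min-max normalization followed by a strided 1-D convolution) can be sketched in plain Python to show where the 10 time steps come from. The sketch assumes that the caption's "step size" of 30 refers to the kernel width, uses random untrained weights, and omits the bidirectional LSTM and linear layer entirely. The rule that the tonic score is one minus the bursting score is also an assumption, since the caption states only that it can be calculated from the bursting score.

```python
import random

N_KERNELS = 4     # from the caption
KERNEL_SIZE = 30  # assumed meaning of "step size" in the caption
STRIDE = 30       # from the caption

def min_max_normalize(trace):
    """Rescale a trace to the range [0, 1]; a flat trace maps to all zeros."""
    lo, hi = min(trace), max(trace)
    if hi == lo:
        return [0.0] * len(trace)
    return [(x - lo) / (hi - lo) for x in trace]

def conv1d(trace, kernels, stride):
    """Strided 1-D convolution without padding: one output trace per kernel."""
    size = len(kernels[0])
    n_steps = (len(trace) - size) // stride + 1
    return [[sum(w * x for w, x in zip(kernel, trace[s * stride:s * stride + size]))
             for s in range(n_steps)]
            for kernel in kernels]

def call_state(p_bursting):
    """Pick the class with the higher score, assuming the two scores sum to 1."""
    p_tonic = 1.0 - p_bursting
    return "bursting" if p_bursting > p_tonic else "tonic"

random.seed(0)
trace = [random.random() for _ in range(300)]  # hypothetical 10 s trace at 30 Hz
kernels = [[random.uniform(-1, 1) for _ in range(KERNEL_SIZE)]
           for _ in range(N_KERNELS)]          # untrained placeholder weights

normalized = min_max_normalize(trace)
cnn_out = conv1d(normalized, kernels, STRIDE)
print(len(cnn_out), len(cnn_out[0]))  # number of output traces, time steps per trace
```

With a kernel width and stride of 30 samples, a 300-sample input gives one output value per second of recording, which is consistent with the 10 LSTM rounds described for a 10 s trace.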

CAMLSORT PREDICTIONS FROM PREVIOUSLY UNSEEN CALCIUM SIGNAL RECONSTRUCTIONS A, raw CaMLsort class label predictions from previously unseen calcium signal reconstructions (n = 19 recordings). Predictions from CaMLsort (middle) are compared against the ground truth state (top) at each time step. Predictions for each recording were made independently, but have been stitched here for the purposes of representation. Each recording is separated by a vertical dashed line. The number above each prediction is the F1 score for that recording. The bottommost plot has posterior scores for each time step, with those for the ‘bursting’ class in blue, and those for the ‘tonic’ class in light orange. B, CaMLsort predictions (top) that remain after majority voting with a window length of seven steps has been used to retain only confident classifications (for details, see Methods and Results). The posterior score (bottom) for each class at each time step, taken as the average class posterior score across all seven-step windows that included said time step. As in (A), the posterior scores for the ‘bursting’ class are in blue, whereas those for the ‘tonic’ class are in light orange, and the numbers above each prediction represent the F1 scores for that recording. C, the distribution of F1 scores between the ground truth state and various predictions from CaMLsort: the raw predictions (left), predictions obtained after majority voting (middle) and the raw predictions obtained by passing a scrambled version of the calcium signal as an input (right). The notches indicate 95% confidence intervals around the median (red line).
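The seven-step majority voting in panel B is defined in the paper's Methods, which are not reproduced here. The sketch below is one possible reading, offered only as an illustration: each seven-step window votes for its majority label, each time step takes the majority across all windows that contain it, and the posterior at a time step is the average of the window-mean posteriors over those same windows. The function names, the 1 = bursting and 0 = tonic coding, and the tie-breaking towards 0 are all assumptions.

```python
WINDOW = 7  # window length from the caption

def windows_covering(t, n_steps, window=WINDOW):
    """Start indices of all full windows that include time step t."""
    first = max(0, t - window + 1)
    last = min(t, n_steps - window)
    return range(first, last + 1)

def majority_vote(labels, window=WINDOW):
    """Smooth binary labels (1 = bursting, 0 = tonic); needs len(labels) >= window."""
    n = len(labels)
    window_votes = [1 if 2 * sum(labels[s:s + window]) > window else 0
                    for s in range(n - window + 1)]
    smoothed = []
    for t in range(n):
        votes = [window_votes[s] for s in windows_covering(t, n, window)]
        smoothed.append(1 if 2 * sum(votes) > len(votes) else 0)  # ties go to 0
    return smoothed

def averaged_posterior(posteriors, window=WINDOW):
    """Average, per time step, of the mean posterior of every window containing it."""
    n = len(posteriors)
    window_means = [sum(posteriors[s:s + window]) / window
                    for s in range(n - window + 1)]
    result = []
    for t in range(n):
        covering = [window_means[s] for s in windows_covering(t, n, window)]
        result.append(sum(covering) / len(covering))
    return result

# Hypothetical raw predictions with one isolated flip at step 10.
raw = [0] * 10 + [1] + [0] * 10
print(majority_vote(raw))
```

Under this reading, an isolated single-step flip is removed, while a sustained change of state is preserved.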

CAMLSORT PREDICTIONS FROM CALCIUM SIGNAL RECONSTRUCTIONS OF SIMULATED SPIKE TRAINS A, raw CaMLsort class label predictions from previously unseen calcium signal reconstructions that were generated from simulated spike trains (n = 20 recordings). Predictions from CaMLsort (middle) are compared against the ground truth state (top) at each time step. Predictions for each recording were made independently, but have been stitched here for the purposes of representation. Each recording is separated by a vertical dashed line. The number above each prediction is the F1 score for that recording. The bottommost plot has posterior scores for each time step, with those for the ‘bursting’ class in blue, and those for the ‘tonic’ class in light orange. B, CaMLsort predictions (top) that remain after majority voting with a window length of seven steps has been used to retain only confident classifications (for details, see Methods and Results). The posterior score (bottom) for each class at each time step, taken as the average class posterior score across all seven-step windows that included said time step. As in (A), the posterior scores for the ‘bursting’ class are in blue, whereas those for the ‘tonic’ class are in light orange, and the numbers above each prediction represent the F1 scores for that recording. C, the distribution of F1 scores between the ground truth state and various predictions from CaMLsort: the raw predictions (left), predictions obtained after majority voting (middle) and the raw predictions obtained by passing a scrambled version of the calcium signal as an input (right). The notches indicate 95% confidence intervals around the median (red line).

CAMLSORT PREDICTIONS ON IMAGING DATA FROM ZEBRAFISH PNS A, raw CaMLsort predictions of cellular state inferred from experimental data acquired using widefield imaging (n = 15). Predictions from CaMLsort (middle) are compared against the ground truth state (top) at each time step. Each recording was treated independently, but predictions across cells have been stitched here for ease of visualization. Each recording is separated by a vertical dashed line, and the number above each prediction is the F1 score for that recording. The bottommost plot has posterior scores for every time step, with those for the ‘bursting’ class in blue, and those for the ‘tonic’ class in light orange. B, CaMLsort predictions (top) that remain after majority voting with a window length of seven steps has been used to retain only confident classifications (for details, see Methods and Results). The posterior score (bottom) for each class at each time step, taken as the average class posterior score across all seven-step windows that included said time step. As in (A), the posterior scores for the ‘bursting’ class are in blue, whereas those for the ‘tonic’ class are in light orange. C, the distribution of F1 scores between the ground truth state and various predictions from CaMLsort: raw predictions from the imaging data (far left), predictions obtained after majority voting (second from left), raw predictions obtained from calcium signals reconstructed from spike times (third from left) and predictions from imaging ΔF/F traces that were scrambled prior to prediction (far right). The notches indicate 95% confidence intervals around the median (red line).

RECONSTRUCTING CALCIUM SIGNALS FOR VTA DA NEURONS USING A CONVOLUTION-BASED FORWARD MODEL A and B, spike rasters of two representative VTA DA neurons. C, the distributions of C.V.s of interspike intervals for cells classified as either bursting (n = 62) or tonic (n = 16 cells). The P value was calculated using a one-sided Mann–Whitney U test. D, electrophysiological activity and associated calcium signals from two representative VTA DA neurons. For each cell, spike timings are shown as rasters (black) and the raw simultaneously-acquired ΔF/F signals aligned below (green). Two calcium signal reconstructions were generated, using either a GCaMP6f kernel (gold) or a GCaMP5G kernel (blue). Numbers alongside each reconstruction indicate the Pearson correlation coefficient between the ground truth and the corresponding calcium signal reconstruction. E, quantification of the reconstruction quality, as measured using the Pearson correlation coefficient between the ground truth and reconstructed signals (‘Reconstruction’). For each cell (n = 78), the correlation between ground truth and a scrambled version of the reconstructed signal (‘Scrambled’), as well as the correlation between ground truth and a reconstruction generated after shuffling spike timings (‘Events Shuffled’), were measured. Individual data points are plotted alongside each boxplot. A Kruskal–Wallis test yielded a P value of 4.6130 × 10^−49 and was followed by a post hoc Dunn's test for individual comparisons, the results of which are indicated.
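The ‘Reconstruction’ versus ‘Scrambled’ comparison in panel E relies on the Pearson correlation coefficient and on a control in which the reconstructed samples are randomly permuted. A minimal sketch follows; the synthetic traces and the fixed random seed are hypothetical and stand in for real ground truth and reconstructed signals, and the ‘Events Shuffled’ control is not reproduced because the caption does not specify how spike timings were shuffled.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def scrambled(trace, rng):
    """Return a copy of the trace with its samples randomly permuted."""
    out = list(trace)
    rng.shuffle(out)
    return out

# Synthetic stand-ins for a ground truth trace and its reconstruction.
rng = random.Random(0)
ground_truth = [math.sin(2 * math.pi * i / 60) for i in range(300)]
reconstruction = [x + rng.gauss(0, 0.05) for x in ground_truth]

r_true = pearson_r(ground_truth, reconstruction)
r_scrambled = pearson_r(ground_truth, scrambled(reconstruction, rng))
print(f"reconstruction r = {r_true:.2f}, scrambled r = {r_scrambled:.2f}")
```

Scrambling keeps the distribution of values but destroys their timing, so a correlation that survives scrambling would not reflect temporal agreement between the two signals.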

CAMLSORT PREDICTIONS OF CELLULAR STATE FROM CALCIUM SIGNALS OF VTA DA NEURONS A, a subset of 12 (of 78) raw CaMLsort predictions from imaging data acquired from DA neurons of the VTA in mice. Each prediction was independently obtained but has been stitched together to produce a single time series with cells separated by vertical dashed lines. The ground truth state (top) is used as reference to assess the quality of CaMLsort's predictions (middle) using the F1 score, which is printed above the prediction for each cell. The posterior scores for each class, bursting (blue) and tonic (light orange), are shown in the lowest plot. B, the entire distribution of F1 scores (n = 78) between the ground truth state and the binary classification from CaMLsort obtained from various inputs: (from left to right) experimentally-acquired ΔF/F traces, calcium signals reconstructed from true spike trains using the GCaMP5G kernel, calcium signals reconstructed from shuffled spike trains using the GCaMP5G kernel or a scrambled version of the raw ΔF/F data. Individual datapoints are shown alongside each boxplot. The horizontal dashed line corresponds to an F1 score of 0.5. The notches indicate 95% confidence intervals around the median (red line).
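The F1 score reported for each cell is the harmonic mean of precision and recall for one class. The sketch below assumes that ‘bursting’ (coded as 1) is the positive class, which the caption does not state, and returns 0 when there are no true positives; the example labels are hypothetical.

```python
def f1_score(truth, predicted, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(truth, predicted))
    fp = sum(t != positive and p == positive for t, p in zip(truth, predicted))
    fn = sum(t == positive and p != positive for t, p in zip(truth, predicted))
    if tp == 0:
        return 0.0  # convention chosen here for the undefined case
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical per-time-step states for one cell (1 = bursting, 0 = tonic).
truth = [1, 1, 0, 0]
predicted = [1, 0, 0, 0]
print(f"F1 = {f1_score(truth, predicted):.3f}")
```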

Acknowledgments
This image is the copyrighted work of the attributed author or publisher, and ZFIN has permission only to display this image to its users. Additional permissions should be obtained from the applicable author or publisher of the image.