An online peak extraction algorithm for ion mobility spectrometry data
 Dominik Kopczynski^{1} and
 Sven Rahmann^{1, 2}Email author
https://doi.org/10.1186/s1301501500455
© Kopczynski and Rahmann; licensee BioMed Central. 2015
Received: 20 March 2015
Accepted: 2 April 2015
Published: 13 May 2015
Abstract
Ion mobility (IM) spectrometry (IMS), coupled with multicapillary columns (MCCs), has been gaining importance for biotechnological and medical applications because of its ability to detect and quantify volatile organic compounds (VOC) at low concentrations in the air or in exhaled breath at ambient pressure and temperature. Ongoing miniaturization of spectrometers creates the need for reliable data analysis onthefly in small embedded lowpower devices. We present the first fully automated online peak extraction method for MCC/IMS measurements consisting of several thousand individual spectra. Each individual spectrum is processed as it arrives, removing the need to store the measurement before starting the analysis, as is currently the state of the art. Thus the analysis device can be an inexpensive lowpower system such as the Raspberry Pi.
The key idea is to extract onedimensional peak models (with four parameters) from each spectrum and then merge these into peak chains and finally twodimensional peak models. We describe the different algorithmic steps in detail and evaluate the online method against stateoftheart peak extraction methods.
Keywords
Ion mobility spectrometry Peak detection Automated data analysis Online analysisIntroduction
Ion mobility (IM) spectrometry (IMS), coupled with multicapillary columns (MCCs), MCC/IMS for short, has been gaining importance for biotechnological and medical applications. With MCC/IMS, one can measure the presence and concentration of volatile organic compounds (VOCs) in the air or in exhaled breath with high sensitivity. In contrast to other technologies, such as mass spectrometry coupled with gas chromatography (GC/MS), MCC/IMS works at ambient pressure and temperature. Several diseases like chronic obstructive pulmonary disease (COPD) [1], sarcoidosis [2] or lung cancer [3] can potentially be diagnosed early with MCC/IMS technology. IMS is also used for the detection of drugs [4] and explosives [5]. Constant monitoring of VOC levels is of interest in biotechnology, e.g., for watching fermenters with yeast producing desired compounds [6] and in medicine, e.g., monitoring propofol levels in the exhaled breath of patients during surgery [7].
IMS technology is moving towards miniaturization and small mobile devices. This creates new challenges for data analysis: The analysis should be possible within the measuring device without requiring additional hardware like an external laptop or a compute server. Ideally, the spectra can be processed on a small embedded chip or small device like a Raspberry Pi or similar hardware with restricted resources. Algorithms in small mobile hardware face constraints, such as the need to use little energy (hence little random access memory), while maintaining prescribed time constraints.
The basis of MCC/IMS analysis is peak extraction, by which we mean a representation of all highintensity regions (peaks) in the measurement by using a few descriptive parameters per peak instead of the full measurement data. Stateoftheart software (like IPHEx [8], Visual Now [9], PEAX [10]) only extracts peaks when the whole measurement is available, which may take up to 10 minutes because of the preseparation of the analytes in the MCC. Our own PEAX software in fact defines modular pipelines for fully automatic peak extraction and compares favorably with a human domain expert doing the same work manually when presented with a whole MCC/IMS measurement. However, storing the whole measurement is not desirable or possible when the memory and CPU power is restricted. Here we introduce a method to extract peaks and estimate a parametric representation while the measurement is being captured. This is called online peak extraction, and this article presents the first algorithm for this purpose on MCC/IMS data. An extended abstract of this work has been published at WABI’14 [11].
Section ‘Background’ introduces the necessary background on the data produced by an MCC/IMS experiment, on peak modeling and on optimization methods. The basic idea of our algorithm is to process each IM spectrum as soon as it arrives (and before the next one arrives). After appropriate preprocessing including denoising and baseline correction described in Section ‘Denoising and baseline correction’, the single spectra are reduced into a mixture of parametric onedimensional peak models, described in Section ‘Reducing a spectrum to peak models’. Accordingly, in Section ‘Aligning consecutive spectrum peak lists’ the approach of connecting models from two subsequent spectra into peak chains is explained. The main challenge is then to merge the peak chains into twodimensional peak models, described in Section‘Estimating 2D peak models’. In Section ‘Peak clustering’ we introduce a novel approach for clustering peaks among several measurements e.g. for time series. An evaluation of our approach is presented in Section ‘Evaluation’ including a listing of all settings of the MCC/IMS as well as an explanation of all adjustable parameters, while Section ‘Discussion and conclusion’ contains a concluding discussion.
Background
Ion mobility spectrometers and their functions are well documented [12], and we do not go into technical details. Instead, we characterize the data generated by an MCC/IMS experiment (Section ‘Data from MCC/IMS measurements’). In Section ‘Peak models’ we describe a previously used parametric peak model, and in Section ‘Optimization methods’ we review two optimization methods that are being used as subroutines in this work.
Data from MCC/IMS measurements
In an MCC/IMS experiment, a mixture of several unknown volatile organic compounds (VOCs) is separated in two dimensions: first by retention time r in the MCC (the time required for a particular compound to pass through the MCC) and second by drift time d through the IM spectrometer. Instead of the drift time itself, a quantity normalized for pressure and temperature called the inverse reduced mobility (IRM) t is used to compare spectra taken under different or changing conditions. Thus we obtain a time series of IM spectra (one spectrum each 100 ms at each retention time point), and each spectrum is a vector of ion concentrations (measured by voltage change on a Faraday plate) at each IRM.

An IM spectrometer uses an ionized carrier gas. These ions are present in every spectrum in addition to the analyte ions, and they create the reactant ion peak (RIP). In the whole IMSC it is present as highintensity chromatogram at a specific IRM (Figure 1). When no analytes are injected into the device, the spectra contain only the RIP and are called RIPonly spectra.

Every spectrum contains a tailing of the RIP, so the RIP is rightskewed (Figure 2). To extract peaks, the effect of the RIP and its tailing must be estimated and removed.

At higher concentrations, compounds can form dimer ions, and one may observe both the monomer and dimer peak from one compound. This means that there is not necessarily a onetoone correspondence between peaks and compounds, and our work focuses on peak detection, not compound identification.

An IM spectrometer may operate in positive or negative mode, depending on which type of ions (positive or negative) one wants to detect. In either case, signals are reported in positive units. All experiments described here were done in positive mode.
Peak models
For our purpose of analyzing MCC/IMS measurements, a peak is characterized by the following assumptions.
Assumptions 1.
An ndimensional peak P is a product of n logconcave functions with two inflection points in each dimension. The peak width at half height ω _{1/2,i } can be calculated with respect to the mode for each dimension i. At its mode (m _{1},…,m _{ n }), peak P exceeds the average background noise level by a certain factor times the standard deviation of the noise.
Its parameters are the shift (or offset) o, the relative mean μ>0 (to the right of o) and the shape parameter λ>0. A peak is then given as the product of two shifted Inverse Gaussians, scaled by a volume factor v, i.e., by seven parameters; so the density function of a peak is p(r,t):=v·g(r,μ _{r},λ _{r},o _{r})·g(t,μ _{t},λ _{t},o _{t}) for all r∈R,t∈T.
Since the parameters μ,λ,o of a shifted Inverse Gaussian may be different even though the resulting distributions have a similar shape, it is more intuitive to describe the shifted Inverse Gaussian in terms of three different descriptors: the (absolute) mean μ ^{′}=o+μ, the standard deviation σ and the mode m. There is a bijection between (μ,λ,o) and (μ ^{′},σ,m) [13] summarized in Appendix Appendix A: peak descriptors and parameters.
This relation approximately holds as well for not too skewed Inverse Gaussian distributions and is a good approximation to estimate its descriptor σ approximately from an empirically observed ω _{1/2}.
Given the mode d ^{∗} of a peak in drift time (in ms), we can estimate its descriptors (m,σ,μ ^{′}) in IRM units as follows. Recall that the IRM mode (in V s cm ^{−2}) is simply m=C _{td}·d ^{∗}, where C _{td} is the conversion constant between drift time and IRM (see Section ‘Data from MCC/IMS measurements’). Spangler et al. [14] empirically derived that \(\omega _{\nicefrac {1}{2}} = \sqrt { (11.09\, \mathcal {D} \, d^{*}) / v_{\text {d}}^{2} + d_{\text {grid}}^{2} }\), where is the diffusion coefficient, v _{d} the drift velocity. Using the Einstein relation [15], can be computed as \(\mathcal {D} = k \mathcal {K_{\text {B}}}\mathcal {T} / q\), where k is the ion mobility, \(\mathcal {K_{\text {B}}}\) the Boltzmann constant, the absolute temperature and q the electric charge. We then use (2) to estimate σ≈ω _{1/2}/ϕ. Finally, the mean is empirically found to be \(\mu ' \approx C_{\text {t}\text {d}} \cdot \left (d^{*} + \sqrt {(4.246 \cdot 10^{5})^{2} + (d^{*})^{2} / 585048.1633} \right)\).
Optimization methods
The online peak extraction algorithm makes use of nonlinear unconstrained minimization, similar to nonlinear least squares, and of the EM algorithm. Both methods are summarized here.
Nonlinear Least Squares
The NLLS method is an iterative method to estimate parameters θ=(θ _{1},…,θ _{ q }) of a supposed parametric function f, given n observed data points (x _{1},y _{1}),…,(x _{ n },y _{ n }) with y _{ i }=f(x _{ i };θ). The idea is to minimize the quadratic error \(\sum _{i=1}^{n}\, {r^{2}_{i}}(\theta)\) between the function and the observed data, where r _{ i }(θ):=y _{ i }−f(x _{ i };θ) is the residual of the ithe datapoint. The necessary optimality condition is \(\sum _{i}\, r_{i}(\theta) \cdot \partial r_{i}(\theta) / \partial \theta _{j} = 0\) for all j. If f is linear in θ (e.g., a polynomial in x with θ being the polynomial coefficients, a setting called polynomial regression), then the optimality condition results in a linear system, which can be solved in closed form. However, often f is not linear in θ and we obtain a nonlinear system, which is solved iteratively, given initial parameter values, by linearizing it in each iteration. Details and different algorithms for NLLS can be found in the literature ([16], Chapter 10). In this paper, we use a different, nonsymmetric loss function, but apply similar techniques to solve the problem (see below).
The EM algorithm for mixtures with heterogeneous components
The observed data x=(x _{1},…,x _{ n }) is viewed as a sample from a mixture of probability distributions, where the mixture density is specified by \(f(x_{i} \,\, \omega, \theta) = \sum _{c=1}^{C}\, \omega _{c}\; f_{c}(x_{i} \,\, \theta _{c})\). Here c indexes the C different component distributions f _{ c }, where θ _{ c } denotes the parameters of f _{ c }, and θ=(θ _{1},…,θ _{ C }) is the collection of all parameters. The mixture coefficients satisfy ω _{ c }≥0 for all c, and \(\sum _{c}\, \omega _{c} = 1\). Unlike in most applications, where all component distributions f _{ c } are multivariate Gaussians, here the f _{ c } are of different types (e.g., uniform and Inverse Gaussian). The goal is to determine the parameters ω and θ such that the probability of the observed sample is maximal (maximum likelihood paradigm). Since the resulting optimization problem is nonconvex in (ω,θ), the EM algorithm is an iterative method that will converge to a local optimum [17] in parameter space. The EM algorithm consists of two repeated steps: The Estep (expectation) estimates the expected membership of each data point in each component and then the component weights ω, given the current model parameters θ. The Mstep (maximization) estimates maximum likelihood parameters θ _{ c } for each parametric component f _{ c } individually, using the expected memberships as hidden variables that decouple the model.
EStep.
Convergence.
After each Mstep of an EM cycle, we compare θ _{ c,q } (old parameter value) and \(\theta ^{+}_{c,q}\) (updated parameter value), where q indexes the elements of θ _{ c }, the parameters of component c. We say that the algorithm has converged when the relative change \(\kappa _{c,q} := \theta _{c,q}^{+}  \theta _{c,q} \,/\, \max \left (\theta _{c,q}^{+},\theta _{c,q} \right)\) drops below a given threshold thresh for all c,q, (if \(\theta _{c,q}^{+} = \theta _{c,q} = 0\), we set κ _{ c,q }:=0).
Having reviewed the necessary background, we now describe the methods we use for peak extraction from IMSCs.
Denoising and baseline correction
Background
A major challenge during peak detection in an IM spectrum is to find peaks that only slightly exceed the background noise level in a spectrum S=(S _{ t }). To determine whether the intensity S _{ t } at coordinate t belongs to a peak region or can be solely explained by background noise, we propose a method based on the EM algorithm. It runs in \(\mathcal {O}(\tau T)\) time where τ is the number of EM iterations.
Mixture model

the noise intensity has a Gaussian distribution over low intensity values with mean μ _{N} and standard deviation σ _{N},$$p_{\text{N}}(s \,\, \mu_{\text{N}}, \sigma_{\text{N}}) = \frac{1}{\sqrt{2 \pi} \, \sigma_{\text{N}}} \cdot \exp \big((s  \mu_{\text{N}})^{2} / (2\, \sigma_{\text{N}}^{2}) \big) $$

the true signal intensity has an Inverse Gaussian distribution with mean μ _{S} and shape parameter λ _{S}, i.e.,$$p_{\text{S}}(s \,\, \mu_{\text{S}}, \lambda_{\text{S}}) \,=\, \sqrt{\lambda_{\text{S}} / (2 \pi s^{3})} \cdot \exp \big(\lambda_{\text{S}} (s  \mu_{\text{S}})^{2} / (2 \mu_{\text{S}}^{2} s) \big) $$

there is an unspecific background component which is not well captured by either of the two previous distributions; we model it by the uniform distribution over all intensities,and we expect the weight ω _{B} of this component to be close to zero in standard IM spectra. High weights indicate an anomaly during the measurement.$$p_{\text{B}}(s) = 1 / (\max(S)  \min(S)), $$
It follows that there are six independent parameters to estimate: μ _{N}, σ _{N}, μ _{S}, λ _{S} and weights ω _{N},ω _{S},ω _{B} (noise, signal, background, where ω _{B}=1−ω _{N}−ω _{S}).
Initial parameter values
Background noise intensities are assumed to follow a Gaussian distribution at small intensity values. We can determine its approximate mean μ _{N} and standard deviation σ _{N} by considering the first and last 10% of data points in each spectrum.
The initial weight of the noise component is set to cover most points covered by this Gaussian distribution, i.e., ω _{N}:={t∈T  S _{ t }≤μ _{N}+3 σ _{N}} / T.
We assume that almost all of the remaining weight belongs to the signal component, thus ω _{S}=(1−ω _{N})·0.999, and ω _{B}=(1−ω _{N})·0.001.
To obtain initial parameters for the signal model, let I ^{′}:={t∈T  S _{ t }>μ _{N}+3 σ _{N}} (the complement of the intensities that are initially assigned to the noise component). We set \(\mu _{\text {S}} = \left (\sum _{t \in I'}\, (S_{t}  \mu _{\text {N}})\right) / I'\) and \(\lambda _{\text {S}} = (\sum _{t \in I'}\, (1/(S_{t}  \mu _{\text {N}})  1 / \mu _{\text {S}}))^{1}\) (which are the maximum likelihood estimators for Inverse Gaussian parameters).
Estep
The hidden parameters W _{ t,c } are computed using (4), where the three component distributions f _{ c } are the three component densities p _{N},p _{S},p _{B} with their parameters and the data x is a meansmoothed version of the original spectrum S: \(x_{t} := \frac {1}{2\alpha +1} \cdot \sum _{t' = t  \alpha }^{t + \alpha } S_{t'}\), where the smoothing window margin is α:=(1/2)·d _{grid}·C _{td}·T/T _{last}. (Here d _{grid} is the grid opening time of the spectrometer and T _{last} is the maximum IRM in T).
Maximum likelihood estimators
for all t∈T.
Final step
Reducing a spectrum to peak models
Background
The idea of processing a single (noisereduced) IM spectrum S is to deconvolute it into separate components described with statistical distribution functions. Several components appear in each spectrum besides the peaks, namely the previously described RIP and the tailing described in Section ‘Data from MCC/IMS measurements’. and background noise. We first determine and remove the RIP tailing function and then determine the peak parameters (including the RIP).
Determining the tailing function
The tailing function appears as a baseline in every spectrum (see Figure 2 for an example). Its shape and scale changes from spectrum to spectrum; so it has to be determined in each spectrum and subtracted in order to extract peaks from the remaining signal in the next step. Empirically, we observe that the tailing function f(t) can be described by a scaled shifted Inverse Gaussian, f(t)=v·g(t;μ,λ,o) with g given by (1). The goal is to determine the parameters θ=(v,μ,λ,o) such that f _{ θ }(t) underfits the given data S=(S _{ t }), as shown in Figure 2.
That is, the loss at time t is the residual squared when it has a negative or small positive value less than the given threshold γ>0, but becomes a linear function of the residual for larger positive residuals.
The goal is to find θ to minimize the total loss \(L(\theta) := \sum _{t}\, e_{t}(\theta ; \gamma)\) for the given spectrum S and given γ>0. We use gradient descent to solve this nonlinear optimization problem, to which we refer as nonlinear loss minimization (NLLM).
 1.
we determine reasonable initial values for the parameters θ=(v,μ,λ,o) (see below),
 2.
we solve NLLM with \(\gamma = \sigma _{\text {N}}^{2}\) to estimate the scaling factor v, leaving the other parameters fixed,
 3.
we solve NLLM with \(\gamma = \sigma _{\text {N}}^{2}\) to estimate all four parameters,
 4.
we solve NLLM with \(\gamma = \sigma _{\text {N}}^{2}/100\) to reestimate the scaling factor v
where σ _{N} is the standard deviation of the noise as described in Section ‘Denoising and baseline correction’.
The initial parameter values (v,μ,λ,o) are determined as follows: For the scaling factor, we initially set \(v = (1/2) \sum _{t \leqslant T}\, S_{t}\). For the other parameters, we first estimate the descriptors (μ ^{′},σ,m) as described below and then use the correspondence to the parameters listed in Appendix Appendix A: peak descriptors and parameters. The initial σ is set to the standard deviation of the whole RIPonly spectrum. We determine the initial m as the RIP mode. It is a property of the Inverse Gaussian distributions under consideration such that the mean μ ^{′} can only range within the interval I= [m,m+0.7σ]. To obtain an appropriate value for μ ^{′}, an auxiliary offset variable o ^{′} is set to the largest IRM left of the RIP mode where the signal is below σ _{N}, and μ ^{′} it is increased in small steps within I. The candidate descriptors (μ ^{′},σ,m) are converted into corresponding parameters (μ,λ,o) until o≥o ^{′}. The so obtained parameters constitute the initial parameter values.
Extracting peak parameters from a single spectrum
 1.
scanning for a potential peak, starting where the previous iteration stopped;
 2.
determining peak parameters (Inverse Gaussian distribution);
 3.
subtracting the peak from the spectrum and continuing with the remainder.
Scanning. The algorithm scans for peaks, starting at the left end of S, by sliding a window of a given width across S and fitting a quadratic polynomial model to the data points within the window. The window width (in index units) is related to the grid opening time d _{grid} of the spectrometer and given as d _{grid}/D _{last}·D data points, where D _{last} is the maximum (last) drift time measured.

the extreme drift time d ^{∗}=θ _{1}/(2θ _{2}) lies within the drift times of the window;

the extreme drift time d ^{∗} indicates a maximum (i.e., θ _{2}<0);

the maximum is sufficiently high above the noise level (which is zero after preprocessing): f(d ^{∗};θ)≥σ _{N}
The first condition can be more strongly restricted to achieve more reliable results, by shrinking the interval towards the center of the window. If no peak is found, the moving window is shifted one index forward. If a peak is detected, the window is shifted half the window length forward before the next scan begins, but first the peak parameters are computed.
Determining peak parameters. As described in Section ‘Peak models’, we can estimate all peak descriptors (m,σ,μ ^{′}) from its mode d ^{∗} in drift time. We convert them into the parameters (μ,λ,o) of the Inverse Gaussian parameterization (see Appendix Appendix A: peak descriptors and parameters). The scaling factor v for the peak is v=f(d ^{∗};θ)/g(m;μ,λ,o).
The model function is subtracted from the spectrum, and the next iteration is started with a window shifted by α index units (consider Section ‘Estep’). For each spectrum, the output of this step is a spectrum peak list, which is a set of parameters for a mixture of weighted Inverse Gaussian models describing the peaks.
Aligning consecutive spectrum peak lists
Background
Having a set of peak parameters for each spectrum, the question arises how to merge the sets P=(P _{ i }) and \(P^{+}=(P^{+}_{j})\) of two consecutive spectra. For each peak P _{ i }, we have stored the Inverse Gaussian parameters μ _{ i },λ _{ i },o _{ i }, the peak descriptors μ i′,σ _{ i },m _{ i } (mean, standard deviation, mode) and the scaling factor v _{ i }, and similarly so for the peaks \(P^{+}_{j}\). The idea is to compute a global alignment similar to the NeedlemanWunsch method [18] between P and P ^{+}. We need to specify how to score aligning P _{ i } to \(P^{+}_{j}\) and how to score leaving a peak unaligned (i.e., a gap).
Scoring peak alignments
Alternatively, leaving a peak unmatched results in a gap score of zero.
Obtaining peak chains

If \(P^{+}_{j}\) is not aligned with a peak in P, potentially a new peak starts at this retention time. Thus model \(P^{+}_{j}\) is put into a new peak chain.

If \(P^{+}_{j}\) is aligned with a peak P _{ i }, the chain containing P _{ i } is extended with \(P^{+}_{j}\).

All peaks P _{ i } that are not aligned to any peak in P ^{+} indicate the end of a peak chain at the current retention time.
All completed peak chains are forwarded to the next step, twodimensional peak model estimation.
Estimating 2D peak models
Background
Let C=(P _{1},…,P _{ n }) be a chain of onedimensional Inverse Gaussian models. The goal of this step is to estimate a twodimensional peak model (product of two onedimensional Inverse Gaussians) from the chain, as described in Section ‘Peak models’, or to reject the chain if the chain does not fit such a model well. Potential problems are that a peak chain may contain noisy 1D peaks truncated at their borders, consist only of noise or in fact consist of several consecutive 2D peaks at the same drift time and successive retention times.
Estimating the parameters
As discussed in Section ‘Peak models’, the halfheight width ω _{1/2} in retention time of a peak centered at retention time r can be described by an affine function ξ(r), Eq. (3), and ω _{1/2} can be converted to the corresponding number of data points (window width).
We have the parameters \(({\hat {v}}_{i}, {\hat {\mu }}_{i, \text {t}}, {\hat {\lambda }}_{i, \text {t}}, {\hat {o}}_{i, \text {t}})\) for each individual peak i=1,…,n in a peak chain, and the corresponding descriptors \(({\hat {\mu }}'_{i,\text {t}}, {\hat {\sigma }}_{i,\text {t}}, {\hat {m}}_{i,\text {t}})\), as well as the associated retention time r _{ i } and peak height \(h_{i} = {\hat {v}}_{i} \cdot g({\hat {m}}_{i,\text {t}}; {\hat {\mu }}_{i, \text {t}}, {\hat {\lambda }}_{i, \text {t}}, {\hat {o}}_{i, \text {t}})\).
We proceed similarly to Section ‘Extracting peak parameters from a single spectrum’ by fitting quadratic polynomials b(r;θ)=θ _{2} r ^{2}+θ _{1} r+θ _{0} in sliding windows of the appropriate width ξ(r _{ i }) such that h _{ i }≈b(r _{ i };θ).
The descriptors are then converted into model parameters (see Appendix Appendix A: peak descriptors and parameters).
After processing each window, we have obtained a list of size, say, k, of Inverse Gaussian distributions. We expect these distributions to be a mixture of k overlapping peaks in a single peak chain. To obtain an optimal deconvolution, we first normalize the volume factors vr′ of the k components to obtain v _{r,j } such that \(\sum _{j=1}^{k}\, v_{\text {r},r} = 1\) and then apply the EM algorithm. As a byproduct, we obtain an (n×k) matrix M=(M _{ i,j }) that determines the membership probability for each of the n data points (r _{ i },h _{ i }) to each of the k models.
We then convert these descriptors back into model parameters (Appendix Appendix A: peak descriptors and parameters). The final peak volume is computed as \(v_{j}^{*} = v_{j, \text {r}}' \cdot \sum _{i \leqslant n} v_{i, \text {t}}\).

The width at half height in the retention time dimension has approximately the expected size (cf. Eqs. (2), (3)): ξ(m _{ j,r})/2≤σ _{ j,r}/ϕ<2·ξ(m _{ j,r}),

The peak height at its maximum is sufficiently above the noise level: \(v_{j}^{*} \cdot g(m_{j, \text {t}}; \mu _{j, \text {t}}, \lambda _{j, \text {t}}, o_{j, \text {t}}) \cdot g(m_{j, \text {r}}; \mu _{j, \text {r}}, \lambda _{j, \text {r}}, o_{j, \text {r}}) \geq \texttt {noise\_margin} \cdot \sigma _{_{\text {N}}}\), where noise_margin>0 is a tunable parameter,

the Inverse Gaussian peak model g in retention time correlates well (in terms of the Pearson productmoment correlation coefficient ρ) with its quadratic approximation b in a window around the mode. More precisely, consider the window W= [m _{ j,r}−ξ(m _{ j,r})/ϕ, m _{ j,r}+ξ(m _{ j,r})/ϕ], the model vector G=g(x;μ _{ j,r},λ _{ j,r},o _{ j,r}) for x∈W and the quadratic approximation vector B=b _{ j }(x;θ) for x∈W, and test whether the Pearson correlation satisfies ρ(G,B)≥ρ _{min}.
If all conditions are satisfied, we have identified a 2D peak model \((v_{j}^{*}, \mu _{j, \text {t}}, \lambda _{j, \text {t}}, o_{j, \text {t}}, \mu _{j, \text {r}}, \lambda _{j, \text {r}}, o_{j, \text {r}})\). Otherwise the model is discarded.
Peak clustering
Background
We now consider a series of IMS measurements, for each of which we have extracted peaks available in the form of parameter vectors or descriptors. The question arises how to decide which descriptors in different measurements represent the same peak (and hence potentially the same VOC).
Let X be the union of peak locations in all measurements, let X=:n, and let X _{ i,R} be the retention time of peak i and X _{ i,T} its IRM. We introduce a clustering approach using the EM algorithm with twodimensional Gaussian mixtures that differs from the standard approach by its ability to dynamically adjust the number of clusters in the process.
Mixture model
We assume that the measured retention times and IRMs belonging to peaks from the same compound are independently normally distributed in both dimensions around the (unknown) true retention time and IRM. Let θ _{ j }:=(μ _{ j,R},σ _{ j,R},μ _{ j,T},σ _{ j,T}) be the parameters for component j, and let f _{ j }(x ^{′} g i v e n θ _{ j } be a twodimensional Gaussian product distribution for a peak location x=(x _{R},x _{T}) with these parameters.
The mixture distribution is \(f(x) = \sum _{j=1}^{C}\, \omega _{j}\, f_{j}(x\,\, \theta _{j})\) with a yet undetermined number C of clusters. Note that there is no “background” model component.
Initial parameter values
In the beginning, we initialize the algorithm with as many clusters as peaks, i.e., we set C:=n. This assignment makes a background model obsolete, because all peaks are assigned to at least one cluster. All clusters get as start parameters for μ _{ j,R},μ _{ j,T} the original retention time and IRM of peak location X _{ j }, respectively, for j=1,…,n. We set σ _{ j,T} :=t_width>0 and σ _{ j,R}:=ξ(X _{ j,R})/ϕ.
Dynamic adjustment of the number of clusters
After computing weights in the Estep, but before starting the Mstep, we dynamically adjust the number of clusters by merging clusters whose centers are close to each other. Every pair j<k of clusters is compared in a nested forloop. When μ _{ j,T}−μ _{ k,T}<t_width and μ _{ j,R}−μ _{ k,R}<ξ(max{μ _{ j,R},μ _{ k,R}}), then clusters j and k are merged by summing the EM weights: ω ^{+}:=ω _{ j }+ω _{ k } and W _{ i,+}:=W _{ i,j }+W _{ i,k } for all i. The summed weights are assigned to the location of the cluster with larger weight. (The recomputation of the parameters happens immediately after merging in the maximization step). The comparison order may matter in rare cases for deciding which peaks are merged first, but since new means and variances are computed, possible merges that were omitted in the current iteration will be performed in the next iteration.
This merging step is applied after in the second EM iteration, since the cluster means need at least one iteration to move towards each other.
Maximum likelihood estimators
for all components j=1,…,C.
One problem using this approach emerges from the fact that initially each cluster contains only one peak, leading to an estimated variance of zero in many cases. To prevent this, minimum values are enforced such that σ _{ j,T}≥t_width and σ _{ j,R}≥ξ(μ _{ j,R})/ϕ for all j.
Final step
The EM loop terminates when no merging occurs and the convergence criteria for all parameters are fulfilled. The resulting membership weights determine the number of clusters as well as the membership coefficient of peak location X _{ i } to cluster j. If a hard clustering is desired, the merging step has to be traced.
Evaluation
 1.
the quality of reducing a single spectrum to a peak list (denoising/baseline correction (Section ‘Denoising and baseline correction’) and spectrum reduction (Section ‘Reducing a spectrum to peak models’),
 2.
the execution time of both steps,
 3.
the quality of the new clustering approach,
 4.
the correlation between manual annotations on full IMSCs by a computerassisted expert and our automated online extraction method.

r_width_offset = 2.5 s (width offset for peaks in retention time),

r_width_factor = 0.06 (width slope for peaks in retention time),

t_width = 0.003 V s cm ^{−2} (standard deviation for peaks in IRM),

thresh (convergence threshold; value varies within evaluation),

noise_margin = 4 (factor multiplied with standard deviation of background noise for minimal peak height),

ρ _{min} = 0.95 (minimal Pearson productmoment correlation coefficient).
Quality of single spectrum reduction
In a first experiment, we tested the quality of the spectrum reduction method using an idea by Munteanu and Wornowizki [19] that determines the agreement between an observed set of data points, interpreted as an empirical distribution function F (the data) and a model distribution G (the mixture distribution obtained from the peak list parameters). The approach writes \(F = \tilde {s} \cdot G + (1  \tilde {s}) \cdot H\) with \(\tilde {s} \in \,[0,1]\), where H is a nonparametric distribution whose inclusion ensures the fit of the model G to the data F. If the weight \(\tilde {s}\) is close to 1.0, then F is a plausible sample from G. We compare the original spectra and reduced spectra (peaks from peak lists) from a previously used dataset [20]. This set contains 69 measurements preprocessed with a 5×5 average. Every measurement contains 1200 spectra. For each spectrum in all measurements, we computed the reduced spectrum model and determined \(\tilde {s}\). Over 92% of all 82 000 models achieved \(\tilde {s} = 1\) and over 99% reached \(\tilde {s} \ge 0.9\). No \(\tilde {s}\) dropped below 85%. In summary, spectrum reduction provides an accurate parametric representation of most spectra.
Execution time
We tested our method on two different platforms, (1) a desktop PC with Intel(R) Core(TM) i5 2.80GHz CPU, 8GB memory, Ubuntu 12.04 (64bit) OS and (2) a Raspberry Pi [21] type B with ARM1176JZFS 700MHz CPU, 512 MB memory, Raspbian Wheezy (32bit) OS, once with the factory defaults of 700 MHz and once overclocked up to 900 MHz. The Raspberry Pi was chosen because it is a complete creditcardsized lowcost singleboard computer with low CPU and power consumption (3.5 w). This kind of device is appropriate for data analysis in future mobile measurement devices.
Average processing time of denoising, baseline correction and spectrum reduction on two platforms with different clock rates, averaging methods (single spectra, averages of 2 and 5 spectra) and convergence thresholds thresh
thresh  Platform  Avg 1  Avg 2  Avg 5 

0.1%  Desktop PC  4.36 ms  2.09 ms  0.88 ms 
Rasp. Pi (700 MHz)  119.48 ms  55.02 ms  21.82 ms  
Rasp. Pi (900 MHz)  97.19 ms  43.62 ms  17.42 ms  
1.0%  Desktop PC  4.26 ms  2.01 ms  0.66 ms 
Rasp. Pi (700 MHz)  116.69 ms  52.63 ms  16.99 ms  
Rasp. Pi (900 MHz)  94.03 ms  41.46 ms  13.48 ms 
We found that in the steps that use the EM algorithm, on average 25–30 EM iterations were necessary for a precision of thresh:=0.001 (i.e., 0.1%) (see Convergence in Section ‘The EM algorithm for mixtures with heterogeneous components’). Relaxing the threshold from 0.001 to 0.01 halved the number of iterations without noticeable difference in the resulting estimated parameters.
Clustering
To evaluate peak clustering methods, we simulate peak locations according to locations in real MCC/IMS datasets, together with the true partition of peaks.

measurement: (0,0),(1.45,600)

dense area: (0.5,4),(0.7,60)

sparse area: (0.5,4),(1.2,450)
Here (μ _{t},μ _{r}) is the coordinate of the centroid with RIM in Vs/cm^{2} and retention time in s. For the normal distribution, we used σ _{t}=0.002 and σ _{r}=μ _{r}·0.002+0.2. For the exponential distribution, we used λ _{t}=(1.45·2500)^{−1} (reduced mobility width for in single cell within M) and λ _{r}=1/(μ _{r}·0.002+0.2). For the uniform distribution, we used an ellipsoid with radii ν _{t}=0.006 and ν _{r}=μ _{r}·0.02+1.
We compared our adaptive EM clustering with two common clustering methods: kmeans and DBSCAN. Since kmeans needs a fixed number of clusters k and appropriate start values for the centroids, used kmeans++ [22] for estimating good starting values and give it an advantage by assigning the true number of partitions. DBSCAN has the advantages that it does not need to know the number of clusters in advance and that it can find nonlinear cluster boundaries, but it does not easily yield parametric cluster descriptors.
To measure the quality of an obtained clustering we use the FowlkesMallows index (FMI, [23]) and the normalized variation of information (NVI) score [24].
For the FMI one considers all pairs of data points. If two data points belong to the same true partition of , they are called connected. Accordingly, a pair of data points is called clustered if they are clustered together by the clustering method are evaluating. Pairs of data points that are both connected and clustered are called true positives (TP). False positives (FP, not connected but clustered) and false negatives (FN, connected but not clustered) are computed analogously. The FMI is the geometric mean of precision and recall: \(\text {FMI}(\mathcal {P}, \mathcal {C}) := \sqrt {TP / (TP + FP) \cdot TP / (TP + FN)}\), where is the true partition and is the clustering. We have \(\text {FMI}(\mathcal {P}, \mathcal {C}) \in [0, 1]\), and \(\text {FMI}(\mathcal {P}, \mathcal {C}) = 1\) indicates perfect agreement. The FMI is difficult to interpret when the number of clusters in and differs significantly.
Here \(NVI(\mathcal {P}, \mathcal {C}) = 0\) indicates perfect agreement between the cluster size distributions. Together, an FMI of 1 and an NVI score of 0 indicate a perfect clustering.
Comparison of automated online peak extraction with manual offline annotation
The fourth experiment compares extracted peaks from a time series of measurements of two automated methods to an expert manual annotation. The automated methods are our online analysis process described here and automated peak detection using the commercial VisualNow software.
Here 15 rats were monitored in 20 minute intervals for up to a day. Each rat resulted in 30–40 measurements (a time series) for a total of 515 measurements. To track peaks over time, we used the previously described EM clustering method.
Problems of our online method stem from lowintensity peaks only slightly above the detection threshold, and resulting fragmentary or rejected peak chains.
Discussion and conclusion
We presented the first approach to extract peaks from MCC/IMS measurements while they are being captured, with the longterm goal to remove the need for storing full measurements before analyzing them in small embedded devices. Our method is fast and satisfies the time restrictions even on a lowpower CPU platform like a Raspberry Pi and outperforms existing software.
While performing well on single spectra, there is room for improvement in merging onedimensional peak models into twodimensional peak models. Our method has to be further evaluated in clinical studies or biotechnological monitoring settings. It also has not been tested with the negative mode of an IMS for lack of data. In general, the robustness of the method under adversarial conditions (high concentrations with formation of dimer ions, changes in temperature or carrier gas flow in the MCC) has to be evaluated and probably improved.
Appendix A: peak descriptors and parameters
Declarations
Acknowledgements
DK and SR acknowledge the support of Deutsche Forschungsgemeinschaft (DFG) within the Collaborative Research Center SFB 876 (http://sfb876.tudortmund.de), “Providing Information by ResourceConstrained Data Analysis”, subproject TB1. We are grateful to Jörg Ingo Baumbach for help with numerous questions and for providing datasets. We thank Max Wornowizki from subproject C3 for providing methodology for the first evaluation.
Authors’ Affiliations
References
 Bessa V, Darwiche K, Teschler H, Sommerwerck U, Rabis T, Baumbach JI, et al. Detection of volatile organic compounds (VOCs) in exhaled breath of patients with chronic obstructive pulmonary disease (COPD) by ion mobility spectrometry. Int J Ion Mobility Spectrom. 2011; 14:7–13.View ArticleGoogle Scholar
 Bunkowski A, Bödeker B, Bader S, Westhoff M, Litterst P, Baumbach JI. MCC/IMS signals in human breath related to sarcoidosis – results of a feasibility study using an automated peak finding procedure. J Breath Res. 2009; 3(4):046001.PubMedView ArticleGoogle Scholar
 Westhoff M, Litterst P, Freitag L, Urfer W, Bader S, Baumbach JI. Ion mobility spectrometry for the detection of volatile organic compounds in exhaled breath of lung cancer patients. Thorax. 2009; 64:744–8.PubMedView ArticleGoogle Scholar
 Keller T, Schneider A, TutschBauer E, Jaspers J, Aderjan R, Skopp G. Ion mobility spectrometry for the detection of drugs in cases of forensic and criminalistic relevance. Int J Ion Mobility Spectrom. 1999; 2(1):22–34.Google Scholar
 Ewing RG, Atkinson DA, Eiceman GJ, Ewing GJ. A critical review of ion mobility spectrometry for the detection of explosives and explosive related compounds. Talanta. 2001; 54(3):515–29.PubMedView ArticleGoogle Scholar
 Kolehmainen M, Rönkkö P, Raatikainen O. Monitoring of yeast fermentation by ion mobility spectrometry measurement and data visualisation with selforganizing maps. Anal Chim Acta. 2003; 484(1):93–100.View ArticleGoogle Scholar
 Kreuder AE, Buchinger H, Kreuer S, Volk T, Maddula S, Baumbach J. Characterization of propofol in human breath of patients undergoing anesthesia. Int J Ion Mobility Spectrom. 2011; 14:167–75.View ArticleGoogle Scholar
 Bunkowski A. MCCIMS data analysis using automated spectra processing and explorative visualisation methods. PhD thesis: Bielefeld University; 2011.Google Scholar
 Bödeker B, Vautz W, Baumbach JI. Peak finding and referencing in MCC/IMSdata. Int J Ion Mobility Spectrom. 2008; 11(1):83–7.View ArticleGoogle Scholar
 D’Addario M, Kopczynski D, Baumbach JI, Rahmann S. A modular computational framework for automated peak extraction from ion mobility spectra. BMC Bioinformatics. 2014; 15(1):25.PubMed CentralPubMedView ArticleGoogle Scholar
 Kopczynski D, Rahmann S. An online peak extraction algorithm for ion mobility spectrometry data. In: WABI. Lecture Notes in Computer Science. New York: Springer: 2014. p. 232–46.Google Scholar
 Eiceman GA, Karpas Z. Ion Mobility Spectrom, Second Edition. New York: Taylor & Francis; 2005.View ArticleGoogle Scholar
 Kopczynski D, Baumbach JI, Rahmann S. Peak modeling for ion mobility spectrometry measurements. In: Signal Processing Conference (EUSIPCO), 2012 Proceedings of the 20th European. New York, NY, USA: IEEE: 2012. p. 1801–5.Google Scholar
 Spangler GE, Collins CI. Peak shape analysis and plate theory for plasma chromatography. Anal Chem. 1975; 47(3):403–7.View ArticleGoogle Scholar
 Einstein A. Über die von der molekularkinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen. Annalen der Physik. 1905; 322(8):549–60.View ArticleGoogle Scholar
 Nocedal J, Wright SJ. Numerical Optimization, 2nd edn. New York: Springer; 2006.Google Scholar
 Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodological). 1977; 39:1–38.Google Scholar
 Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970; 48(3):443–53.PubMedView ArticleGoogle Scholar
 Munteanu A, Wornowizki M. Demixing empirical distribution functions. Technical Report 201402, Collaborative Research Center 876, TU Dortmund. 2014.Google Scholar
 Hauschild AC, Kopczynski D, D’Addario M, Baumbach JI, Rahmann S, Baumbach J. Peak detection method evaluation for ion mobility spectrometry by using machine learning approaches. Metabolites. 2013; 3(2):277–93.PubMed CentralPubMedView ArticleGoogle Scholar
 Raspberry Pi Foundation. Raspberry Pi. 2014. http://www.raspberrypi.org/.
 Arthur D, Vassilvitskii S. Kmeans++: The advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACMSIAM Symposium on Discrete Algorithms. SODA ‘07. Philadelphia: Society for Industrial and Applied Mathematics: 2007. p. 1027–35.Google Scholar
 Fowlkes EB, Mallows CL. A method for comparing two hierarchical clusterings. J Am Stat Assoc. 1983; 78(383):553–69.View ArticleGoogle Scholar
 Reichart R, Rappoport A. The nvi clustering evaluation measure. In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning. Stroudsburg, PA, USA: Association for Computational Linguistics: 2009. p. 165–73. http://dl.acm.org/citation.cfm?id=1596374.1596401.Google Scholar
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.