Metabolite-based clustering and visualization of mass spectrometry data using one-dimensional self-organizing maps
© Meinicke et al; licensee BioMed Central Ltd. 2008
Received: 24 January 2008
Accepted: 26 June 2008
Published: 26 June 2008
One of the goals of global metabolomic analysis is to identify metabolic markers that are hidden within a large background of data originating from high-throughput analytical measurements. Metabolite-based clustering is an unsupervised approach for marker identification based on grouping similar concentration profiles of putative metabolites. A major problem of this approach is that in general there is no prior information about an adequate number of clusters.
We present an approach for data mining on metabolite intensity profiles as obtained from mass spectrometry measurements. We propose one-dimensional self-organizing maps for metabolite-based clustering and visualization of marker candidates. In a case study on the wound response of Arabidopsis thaliana, based on metabolite profile intensities from eight different experimental conditions, we show how the clustering and visualization capabilities can be used to identify relevant groups of markers.
Our specialized realization of self-organizing maps is well suited to gaining insight into complex pattern variation in a large set of metabolite profiles. In comparison to other methods, our visualization approach facilitates the identification of interesting groups of metabolites by providing a convenient overview of relevant intensity patterns. In particular, the visualization effectively supports researchers in analyzing many putative clusters when the true number of biologically meaningful groups is unknown.
Metabolomics is a fundamental approach in basic research to detect and quantify the low molecular weight molecules (metabolites) in a biological sample. Alongside the other so-called "omics" technologies (genomics, transcriptomics, proteomics), metabolomics is becoming a key technology that facilitates the measurement of the ultimate phenotype of an organism. In particular, metabolomics allows undirected global screening approaches based on measuring signal intensities for a large number of intracellular metabolites under varying conditions, such as disease or environmental and genetic perturbations [2–8]. In order to identify relevant metabolites in terms of indicative metabolic markers, it is essential to provide tools for exploratory analysis of metabolome data generated by high-throughput analytical measurements [9, 10]. For instance, complex mass spectrometry data can cover relative intensities for a large number of metabolites under different conditions, and their analysis requires advanced data mining tools to study the corresponding multivariate intensity patterns.
Regarding the scope of application, sample-based clustering for unbiased, comprehensive metabolite analysis is often applied to identify different phenotypes. In other cases, phenotypes are known and supervised methods may be applied to identify discriminative metabolic markers [1, 13]. In contrast, the objective of metabolite-based clustering is to identify biologically meaningful groups of markers. The common approach is to combine dimensionality reduction and clustering methods: First, a sample-based principal component analysis (PCA) is performed to compute a subset of principal components. Then the metabolite-specific PCA loadings of these components are used for metabolite-based clustering with K-means or hierarchical methods. In these cases, the experimental setup usually suggests a certain number of clusters, which considerably facilitates the analysis. However, for a complex setup with several possibly overlapping conditions, it is difficult to make assumptions about the number of relevant clusters. Metabolite-based clustering therefore also requires suitable tools for visual exploration as an intuitive way to incorporate prior knowledge into the cluster identification process.
Here we introduce an approach to metabolite-based clustering and visualization of large sets of metabolic marker candidates based on self-organizing maps (SOMs). Unlike applications of the classical two-dimensional SOMs, we propose one-dimensional linear array SOMs (1D-SOMs). The 1D-SOM supports the search for relevant metabolites in two respects: First, the assignment of data vectors to array positions yields a "pre-clustering" of the data, which facilitates the analysis of large and noisy data sets. The resulting clusters provide building blocks for biologically meaningful groups of markers. In general, determining the relevant groups requires task-specific knowledge in order to aggregate related clusters or to discard "spurious" clusters that cannot be associated with any biological meaning. This second step is supported by the dimensionality-reduced representation that results from the mapping to the linear array. By means of this mapping, 1D-SOMs make it possible to visualize the variation of intensity patterns along the array axis. This visualization provides a quick overview of relevant patterns in large data sets and facilitates the aggregation of related neighboring clusters. In particular, this kind of visual partitioning provides a powerful means to cope with the problem of an unknown number of "true" clusters, which in general cannot be solved without task-specific constraints. In the same way, spurious clusters, which do not represent any relevant groups, can easily be identified by visual inspection.
Clustering and Visualization of Metabolite Candidates
The objective of our approach is to provide a convenient visual overview of potential metabolite clusters across a sample set of marker candidates. A marker candidate is characterized by its intensity profile under certain conditions. Thus, each marker candidate can be represented by a d-dimensional vector x that contains the condition-specific quantities inferred from mass spectrometry intensities. Besides the intensity profile vector x_i, each marker candidate i in a given sample is also associated with a particular retention time (rt) index and mass-to-charge ratio (m/z). While the intensity profiles are used in the clustering algorithm as described below, the rt and m/z indices are used only for interpretation of the resulting groups (see section "visualization").
In general, mass spectrometry-based metabolite profiling is performed with multiple samples for each condition. For clustering, we use the average intensity over replicates for each marker candidate and treatment condition. After this averaging step, each marker candidate is represented by a vector with d dimensions corresponding to d experimental conditions. The averaging compensates for random variation between measurements and can be viewed as a noise reduction step. In principle, repeated measurements are not strictly necessary for applying our clustering approach; in practice, however, the noise reduction helps to achieve reproducible results. Furthermore, repeated measurements make it possible to evaluate the robustness of the clustering: single replicate samples may be left out to analyze the variation induced by this kind of "leave-one-out" disturbance. In other words, clustering or prototype stability can be measured with respect to a reduced quality of the training data. In contrast to a marker-based cross-validation, which reduces the size of the training set by leaving out markers, the sample-based cross-validation makes it possible to detect the same groups of markers across all leave-one-out folds.
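The replicate averaging and the sample-based leave-one-out scheme described above can be sketched as follows. This is a minimal Python illustration (the authors worked in MATLAB); the data layout (one marker-by-replicate array per condition) and the function names are hypothetical, and we assume that each fold omits a single replicate of a single condition.

```python
import numpy as np

def replicate_means(replicates):
    """Condition-wise mean intensities.

    replicates: list of d arrays; entry j has shape (n_markers, n_reps_j)
    and holds the replicate intensities of all marker candidates under
    condition j (hypothetical layout). Returns an (n_markers, d) matrix.
    """
    return np.column_stack([r.mean(axis=1) for r in replicates])

def leave_one_out_profiles(replicates):
    """Yield (condition, replicate, profiles) for each fold in which one
    replicate sample of one condition is omitted before averaging."""
    full = replicate_means(replicates)
    for j, r in enumerate(replicates):
        for rep in range(r.shape[1]):
            profiles = full.copy()
            # Re-average condition j without replicate `rep`.
            profiles[:, j] = np.delete(r, rep, axis=1).mean(axis=1)
            yield j, rep, profiles
```

Each fold's profiles can then be normalized and clustered, and the resulting prototypes or marker groups compared across folds.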
To improve comparability between putative metabolites of different abundance, the vector of intensity values of each marker candidate is normalized to unit Euclidean length. This normalization ensures that marker clustering depends only on relative intensities and not on the usually large differences in absolute intensity. The normalization therefore makes it possible to detect related metabolites irrespective of their abundance. Without normalization, the clustering would mainly reflect the variation in length within the set of marker candidate vectors.
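A minimal sketch of the unit-length normalization in Python with NumPy. The function name and the small `eps` guard for all-zero profiles are our own assumptions. The usage lines show two candidates that differ tenfold in abundance but receive identical normalized profiles.

```python
import numpy as np

def normalize_profiles(X, eps=1e-12):
    """Scale each row (one marker candidate's d-dimensional intensity
    profile) to unit Euclidean length. All-zero rows stay zero; eps is a
    hypothetical guard against division by zero."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)

X = np.array([[3.0, 4.0], [30.0, 40.0]])
print(normalize_profiles(X))  # both rows become [0.6, 0.8]
```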
In our 1D-SOM algorithm, a cluster arises from a group of marker candidates assigned to one of K "prototype" vectors w_k ∈ ℝ^d, k = 1, ..., K. A prototype vector corresponds to an average intensity profile and can be viewed as a noise-reduced representation of the associated marker candidates in that group. The clustering algorithm imposes a topological order on the prototypes according to a one-dimensional linear array. In this way, the projection onto an ordered set of prototypes also provides a dimensionality-reduced representation of the data in terms of a one-dimensional array index. The objective of the ordering is that prototypes adjacent in the array should be more similar than prototypes at distant array positions. The algorithm for optimizing the prototypes is based on topographic clustering, a well-known technique in bioinformatics that is usually applied in the form of two-dimensional SOMs. Unlike in classical SOM applications, our one-dimensional map can be used to visualize the variation of intensity profiles along the array of prototypes within a single 2D color or gray-level image (see next section).
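The objective function and update equations of the authors' 1D-SOM are not reproduced in this section. As a hedged illustration of topographic clustering on a linear array, the sketch below performs one iteration of the standard batch SOM with a Gaussian neighborhood over array indices, whose width `sigma` plays the role of a smoothing parameter. It is a generic textbook formulation, not necessarily the exact update used in the paper, and it stores prototypes as rows of `W` rather than as columns.

```python
import numpy as np

def batch_som_step(X, W, sigma):
    """One batch update of a 1D-SOM (generic formulation).

    X: (n, d) normalized marker profiles; W: (K, d) prototypes ordered
    along the linear array; sigma: neighborhood width in array-index units.
    Returns the updated prototypes and the winning prototype per marker.
    """
    K = W.shape[0]
    # Squared Euclidean distances between every marker and every prototype.
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    winners = d2.argmin(axis=1)  # best-matching array position per marker
    # Gaussian neighborhood h[k, l] between array positions k and l.
    idx = np.arange(K)
    h = np.exp(-((idx[:, None] - idx[None, :]) ** 2) / (2.0 * sigma ** 2))
    weights = h[:, winners]  # (K, n): weight of marker i for prototype k
    mass = weights.sum(axis=1)
    W_new = W.copy()
    ok = mass > 0  # prototypes without any neighborhood weight stay put
    # Each prototype becomes a neighborhood-weighted mean of the markers.
    W_new[ok] = (weights[ok] @ X) / mass[ok][:, None]
    return W_new, winners
```

With a very small `sigma` the update reduces to a K-means step; with a very large `sigma` all prototypes are pulled towards the global mean.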
It is important to note that the final number of clusters depends on both the maximal number of prototypes K and the smoothing parameter σ. For a large amount of smoothing (high σ), the actual number of clusters can be much smaller than the number K of available prototypes. In particular, for a sufficiently high degree of smoothing, some prototypes may be associated with zero-size clusters, i.e. they do not represent actual clusters. These prototypes are influenced only by neighboring prototypes and have no marker data assigned to them.
The overall optimization scheme also involves an initialization step for the matrix W of prototypes and an annealing schedule for the smoothing parameter σ. For initialization, all prototypes (columns of W) are placed along the first principal component axis within a small interval around the global mean vector. The annealing schedule realizes an exponential decrease of σ over 100 steps, starting at a maximum value σmax = 100 and ending at an adjustable minimum value, which we set to σmin = 0.1. A video clip in the supplementary material (see Additional file 1) shows the annealing process for the experimental data used in our case study (see section "Case study for experimental evaluation"). In our experiments, this (deterministic) annealing proved to be an efficient strategy for finding deep local minima of the objective function. In particular, we found that it ensures good reproducibility of results because it makes the approach robust with respect to the initialization of prototypes. In all cases we observed that, besides the above principal component initialization, different random initializations also resulted in exactly the same prototypes, up to a possibly reversed order. This behaviour can be explained by the fact that for a sufficiently high smoothing parameter the resulting 1D-SOM corresponds to a "dipole" whose ends (first and last prototype) provide the only non-zero-size clusters (see Additional file 1). In this case, the line segment between these two prototypes is approximately collinear with the first principal component axis.
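The initialization and annealing schedule described above can be combined with a batch update into a complete training loop. The sketch below is an illustration under explicit assumptions. It places the K prototypes within a small interval (half-width `spread`, a hypothetical value) around the global mean along the first principal component. It then decreases sigma geometrically, i.e. σ_t = σmax (σmin/σmax)^(t/(T−1)) for t = 0, ..., T−1, from 100 to 0.1 over T = 100 steps. Finally, it performs one generic batch-SOM update per step; the number of updates per sigma value is our choice, not the paper's.

```python
import numpy as np

def batch_som_step(X, W, sigma):
    """Generic batch-SOM update with a Gaussian neighborhood on array indices."""
    K = W.shape[0]
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    winners = d2.argmin(axis=1)
    idx = np.arange(K)
    h = np.exp(-((idx[:, None] - idx[None, :]) ** 2) / (2.0 * sigma ** 2))
    weights = h[:, winners]
    mass = weights.sum(axis=1)
    W_new = W.copy()
    ok = mass > 0
    W_new[ok] = (weights[ok] @ X) / mass[ok][:, None]
    return W_new, winners

def train_1d_som(X, K=33, sigma_max=100.0, sigma_min=0.1, steps=100, spread=1e-3):
    """Anneal a 1D-SOM; returns (K, d) prototypes and the winner per marker."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # First principal component = first right singular vector of centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    # K prototypes on the PC1 axis, within +/- spread of the global mean.
    W = mean + spread * np.linspace(-1.0, 1.0, K)[:, None] * vt[0]
    # Exponential (geometric) decrease of sigma from sigma_max to sigma_min.
    for sigma in np.geomspace(sigma_max, sigma_min, steps):
        W, _ = batch_som_step(X, W, sigma)
    # Final assignment of each marker candidate to its nearest prototype.
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return W, d2.argmin(axis=1)
```

Called on normalized marker profiles, `train_1d_som` returns prototypes ordered along the array together with the final cluster assignment of each marker candidate.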
The result of the marker clustering process is an ordered array of prototypes in terms of a one-dimensional self-organizing map (1D-SOM) as described in the previous section. Each prototype represents a group of marker candidates and corresponds to an average intensity profile of that group. Therefore, the prototype-specific intensity profile can be viewed as a noise-reduced representation of all marker candidates assigned to this prototype. The order of prototypes in the array implies that similar intensity profiles are closer to each other than unrelated intensity profiles.
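The paper renders the ordered prototype matrix as a color or gray-level image (figure 2). As a dependency-free stand-in, the hypothetical helper below prints one text row per array position and one character per condition. Darker characters mark higher values after min-max scaling over the whole matrix. In practice, an image function such as matplotlib's `imshow` would be the natural choice.

```python
import numpy as np

# Light-to-dark character palette (hypothetical stand-in for gray levels).
SHADES = " .:-=+*#%@"

def render_prototypes(W):
    """Return a text rendering of a (K, d) prototype matrix: one line per
    array position (numbered from 1) and one character per condition."""
    W = np.asarray(W, dtype=float)
    lo, hi = W.min(), W.max()
    # Min-max scaling over the whole matrix; a constant matrix maps to level 0.
    scaled = (W - lo) / (hi - lo) if hi > lo else np.zeros_like(W)
    # Map [0, 1] to palette indices 0 .. len(SHADES) - 1.
    levels = np.minimum((scaled * len(SHADES)).astype(int), len(SHADES) - 1)
    return "\n".join(f"{k:3d} " + "".join(SHADES[v] for v in row)
                     for k, row in enumerate(levels, start=1))
```

Printing `render_prototypes(W)` for a trained map shows blocks of similar prototypes as runs of rows with similar character patterns.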
The 1D-SOM matrix in figure 2 shows the 33 prototypes that were optimized during the clustering process in our case study (see section "Case study for experimental evaluation"). The figure reveals a block structure of the prototype matrix that can be perceived as a visual partitioning along the linear array axis. Within the corresponding blocks, the prototypes are very similar or show gradual changes ("trends") of a certain intensity pattern. For example, prototypes 18 and 19 show a distinctive pattern indicating that the metabolite candidates in the corresponding two clusters have a markedly higher intensity under the fifth condition than under the remaining seven conditions. If conditions correspond to time points, as in this example, the "highlighting" of a specific condition usually indicates the presence of so-called "transient" markers. Blocks of putative markers may also result from more complex intensity patterns, e.g. when related prototypes show high intensity values for several "overlapping" conditions simultaneously. In particular, a smooth variation of a pattern along a block may indicate a time course or trend, for instance the temporal development of a metabolite concentration. In figure 2, overlapping conditions can be observed especially among the first twelve prototypes, which show a continuous time-dependent evolution of the intensity pattern. However, prototypes 11 and 12 show an intensity maximum for the (first) control condition and should therefore be assigned to a separate block (see section "Application of 1D-SOMs"). In general, prior knowledge about reasonable condition overlaps within the experimental setup is necessary to identify meaningful blocks of prototypes.
Case study for experimental evaluation
The objective of our experimental evaluation is not to provide "hard" performance indices, e.g. in terms of detection rates, but rather to show how our 1D-SOM approach can support scientists in interpreting large metabolic data sets, especially in identifying interesting groups of markers. On the one hand, no "benchmark" data set with known markers is available that provides a complex experimental setup with a sufficiently large number of conditions. On the other hand, our 1D-SOM approach is designed for visual exploration of multivariate marker data, which is difficult to evaluate in terms of a simple performance criterion. We therefore provide a case study to illustrate the practical utility of our method. For this purpose we chose a well-established experimental setup for analyzing the wound response of plants.
Because the wound response involves a complex network of integrated biochemical signals, we used an unbiased metabolomic analysis to extend our knowledge of global metabolic changes at early time points after wounding. In contrast to targeted procedures, this type of analysis can cope with complex metabolic situations in a more realistic and global way by including many metabolites that are still unknown but are regulated in a certain context. For interpreting data sets of such high complexity, advanced data mining tools are essential.
Plant growth and wounding
Two plant lines were used: wt plants of A. thaliana (L.) ecotype Columbia-0 (Col-0) and the JA-deficient mutant dde 2–2. Plants were grown on soil under short-day conditions. Rosette leaves of eight-week-old plants were mechanically wounded using forceps. Whole rosettes of unwounded plants (control, 0 h) and wounded plants (0.5, 2 and 5 hours post wounding (hpw)) were harvested and immediately frozen in liquid nitrogen. To minimize biological variation, rosettes of five to ten plants were pooled for each time point.
Experimental conditions for wounding of A. thaliana (Col-0) wild type (wt) and dde 2–2 mutant (dde 2–2) plants.
| Condition | A. thaliana Col-0 line | Hours post wounding (hpw) |
|---|---|---|
| 1 | wt | 0 (control) |
| 2 | wt | 0.5 |
| 3 | wt | 2 |
| 4 | wt | 5 |
| 5 | dde 2–2 | 0 (control) |
| 6 | dde 2–2 | 0.5 |
| 7 | dde 2–2 | 2 |
| 8 | dde 2–2 | 5 |
Metabolite extraction and measurement
Plant material was homogenized under liquid nitrogen and subsequently extracted with methanol/chloroform/water (1:1:0.5, v/v/v) as described previously, but without adding internal standards. Four independent extractions were performed for each condition.
The chloroform phase containing lipophilic metabolites was analyzed by Ultra Performance Liquid Chromatography (ACQUITY UPLC™ System, Waters Corporation, Milford) coupled with an orthogonal time-of-flight mass spectrometer (TOF-MS, LCT Premier™, Waters Corporation, Milford) operating with negative electrospray ionization (ESI) in an m/z range of 50 to 1200. For chromatographic separation, an ACQUITY UPLC™ BEH SHIELD RP18 column (1 × 100 mm, 1.7 μm, Waters Corporation, Milford) was used with a methanol/acetonitrile/water gradient containing 0.1% (v/v) formic acid. The LC/MS analysis was performed at least twice for each extract, resulting in nine replicates for each condition. The identification of metabolites was verified by exact mass measurement and coelution with authentic standards.
The raw mass spectrometry data of all samples were processed (deconvolution, alignment, deisotoping and data reduction) using the MarkerLynx™ Application Manager for MassLynx™ software (Waters Corporation, Milford) with parameter settings as shown in the supplementary table "MarkerLynx parameters" (see Additional file 2). MarkerLynx™ automatically performs a noise reduction which results in zero values for certain low intensity peaks. The processing resulted in 6048 marker candidates.
Unsupervised methods for metabolite-based clustering rely strongly on marker quality, which mainly depends on reproducibility and biological interpretability. Without prior selection, large numbers of non-informative markers with little intensity variation across conditions would dominate the clustering results and complicate further analysis. In general, the number and quality of selected markers should depend on the specific requirements of a particular study, so a task-dependent trade-off between number and quality of marker candidates has to be found. In our case, we performed a Kruskal-Wallis test on the intensities of each marker candidate and used the corresponding p-value as a measure of quality. Based on the ranks of the marker candidate intensities, this non-parametric test detects significant differences between the condition-specific mean ranks. In this way we selected a subset of high-quality markers using a conservative significance threshold of 10⁻⁶. The selection contained 837 marker candidates with a p-value below this threshold (see Additional file 3 for a CSV file of the data set).
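The Kruskal-Wallis selection step can be sketched with SciPy's `scipy.stats.kruskal`. The data layout and function name are hypothetical. Skipping markers whose intensities are all identical, for which no test statistic can be computed, is our own assumption.

```python
import numpy as np
from scipy.stats import kruskal

def select_markers(replicates, alpha=1e-6):
    """Indices of marker candidates whose intensities vary significantly
    across conditions (Kruskal-Wallis p-value below alpha).

    replicates: list with one (n_markers, n_reps) array per condition
    (hypothetical layout; replicate counts may differ between conditions).
    """
    n_markers = replicates[0].shape[0]
    keep = []
    for i in range(n_markers):
        groups = [r[i] for r in replicates]  # one sample per condition
        try:
            _, p = kruskal(*groups)
        except ValueError:  # e.g. all intensities identical
            continue
        if np.isfinite(p) and p < alpha:
            keep.append(i)
    return np.array(keep, dtype=int)
```

With eight conditions and nine replicates each, as in this case study, a marker whose replicate intensities are perfectly separated by condition easily passes the 10⁻⁶ threshold, whereas markers without a condition effect do not.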
Results and Discussion
In the following, we first present the results of our case study using the proposed 1D-SOM algorithm. We then apply hierarchical cluster analysis (HCA) in combination with the K-means algorithm and, finally, principal component analysis (PCA) for comparison. For implementing the 1D-SOM training and visualization, we used the MATLAB® programming language, together with the Statistics Toolbox® for HCA and K-means clustering.
Application of 1D-SOMs
Because the true number of biologically meaningful groups is unknown, we had to choose a sufficiently high number of prototypes for clustering. In accordance with a prior robustness study (see section "Assessing Robustness"), we chose K = 33 prototypes for the analysis in our case study. For higher numbers of prototypes, we observed an increasing number of singleton clusters as well as the occurrence of "empty" clusters without any assigned marker candidates.
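To compare runs with different numbers of prototypes K, the sizes of the resulting clusters can be tabulated. This helper is a hypothetical illustration, since the criteria of the robustness study are not given in this section. It takes 0-based winner indices, e.g. as produced by a batch-SOM assignment, and reports how many prototypes have empty or singleton clusters.

```python
import numpy as np

def cluster_size_summary(winners, K):
    """Return (sizes, n_empty, n_singleton) for an assignment of marker
    candidates to K prototypes given as 0-based indices."""
    sizes = np.bincount(np.asarray(winners), minlength=K)
    return sizes, int((sizes == 0).sum()), int((sizes == 1).sum())
```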
First, the resulting 1D-SOM provides an overview of the complex metabolic situation within the examined sample set (see figures 2 and 4). At the same time, a more specific analysis of distinct clusters can be performed by means of rt-m/z scatter plots (see figures 5 and 6). Figure 2 shows the 1D-SOM of the time course of the wound experiment including wt and dde 2–2 mutant plants. To our knowledge, this is the first visualization that gives a convenient overview of the intensity patterns of several hundred marker candidates of the lipophilic fraction. The intensity profiles of these 837 lipophilic marker candidates are represented by 33 prototypes. The visualization clearly reveals the existence of different blocks of intensity patterns.
Formation of blocks based on the interpretation of prototype profiles and identification of corresponding markers.
| Block | Prototypes | Interpretation | Identified wound markers |
|---|---|---|---|
| A | 01–10 | Accumulation in wild type plants after wounding | JA-Ile (m/z 322), dn-OPDA (m/z 263), OPC-4 (formate adduct, m/z 283), JA (m/z 209), OPDA (m/z 291), OH-JA-Ile (m/z 338), OH-JA (m/z 225), COOH-JA-Ile (m/z 352) |
| B | 11–12 | Accumulation in wt control plants | |
| C | 13–17 | | |
| D | 18–19 | Accumulation in mutant control plants | COOH-22:0 (m/z 369), OH-22:0 (m/z 355), OH-24:0 (m/z 383), OH-26:0 (m/z 411) |
| E | 20–24 | Accumulation in mutant plants after wounding | HHT (m/z 265), HOT (m/z 293), KOT (m/z 291) |
| F | 25–33 | Delayed accumulation in mutant plants after wounding | |
Prototypes 20–24 can be grouped into block E (see figure 2 and table 2). This rather small block contains 58 marker candidates typical of the wound response in the JA-deficient dde 2–2 mutant plants and thus acts as a counterpart of block A. In wt plants, block E marker candidates are either missing or show very low intensities. Within block E, a shift is evident from very early transient marker patterns (prototype 20) via very early time-stable patterns (prototypes 21 and 22) towards late marker patterns of the wound response (prototype 24).
A very small but remarkable block consists of prototypes 18 and 19 (block D, see figure 2 and table 2). Here, 26 marker candidates accumulate in untreated plants of the dde 2–2 mutant but not in untreated wt plants. Within 0.5 hpw, the level of these candidates decreased in dde 2–2 mutant plants. Block D therefore represents marker candidates that are down-regulated during the wound response in dde 2–2 mutant plants. Surprisingly, a dominant block comprises 362 marker candidates whose intensities increase after wounding in both wt and mutant plants (block F, prototypes 25 to 33, see figure 2 and table 2). The visualization revealed that the accumulation of these putative metabolites started earlier in wt plants (2 hpw) than in mutant plants (5 hpw). The wound marker candidates of block F seem to be regulated independently of the JA pathway.
Blocks A and D are separated by block B, comprising marker candidates that accumulate in wt control plants (prototypes 11 and 12), and block C, showing mainly indifferent intensity patterns (prototypes 13–17). After this initial assignment of prototypes, the blocks were analyzed in more detail at the level of individual metabolites. For this purpose, we searched the data set for well-known metabolic constituents of the wound response, such as JA, its immediate precursors 12-oxo-phytodienoic acid (OPDA), 3-oxo-2-(pent-2'-enyl)-cyclopentane-1-octanoic acid (OPC-8), 3-oxo-2-(pent-2'-enyl)-cyclopentane-1-hexanoic acid (OPC-6) and 3-oxo-2-(pent-2'-enyl)-cyclopentane-1-butanoic acid (OPC-4), as well as JA derivatives and the roughanic acid-derived homolog of OPDA, dn-OPDA (see also figure 7) [23, 30]. With this approach, eight known wound markers could be identified in block A (see figure 2 and table 2). Markers related to the wound response in the dde 2–2 mutant plants are located in blocks D and E (see figure 2 and table 2). The JA-independent marker candidates of block F will be the subject of further investigation.
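Aggregating neighboring clusters into blocks amounts to grouping contiguous ranges of array positions. The sketch below uses the block boundaries of table 2 (prototypes 1–10, 11–12, 13–17, 18–19, 20–24 and 25–33 for blocks A to F). It counts the marker candidates per block from a hypothetical vector of 1-based prototype assignments.

```python
import numpy as np

# Block boundaries as 1-based, inclusive prototype ranges (table 2).
BLOCKS = {"A": (1, 10), "B": (11, 12), "C": (13, 17),
          "D": (18, 19), "E": (20, 24), "F": (25, 33)}

def block_sizes(winners, blocks=BLOCKS):
    """Count marker candidates per block, given each candidate's 1-based
    prototype index (hypothetical input)."""
    winners = np.asarray(winners)
    return {name: int(((winners >= lo) & (winners <= hi)).sum())
            for name, (lo, hi) in blocks.items()}
```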
Prototypes of block A represent wound markers of wt plants
The wound markers JA (m/z 209) and OPC-4 (formate adduct, m/z 283) were detected in cluster 5 (see table 2). As visible in the rt-m/z plane in figure 5, the blue JA dot at rt 0.72 min has the lowest m/z value within a noticeable vertical stack. Dots in this stack may partly represent ESI-specific adducts of JA, such as the formate adduct (m/z 255, rt 0.72 min). Because the intensity profiles of a metabolite and its adducts are highly similar, a metabolite and its adducts are likely to be assigned to the same prototype. Adducts are therefore easy to detect within a cluster as stacks of dots that share identical retention times.
Interestingly, prototype 5 associates the intensity profile of JA and its precursor OPC-4 (blue dot at rt 0.98 min in the rt-m/z plane in figure 5) with the profiles of a group of not yet identified marker candidates of high molecular weight (m/z range 800 to 1200). The assignment of these metabolites to the JA-containing cluster suggests that they play a role in the wound response of wt plants. The wound markers dn-OPDA (m/z 263) and jasmonoyl-isoleucine (JA-Ile, m/z 322) were detected in clusters 8 and 9, respectively (see figure 2 and table 2). These prototypes are associated with marker candidates characterized by a very early and transient intensity maximum at 0.5 hpw.
Like prototype 5, prototype 9 associates the intensity profile of a small, rather polar wound signal substance (JA-Ile) with the profiles of a group of markers of high molecular weight (m/z range 850 to 1020) and stronger lipophilic properties (rt range 2.5 to 4 min) that have not yet been identified with certainty. Interestingly, the time-dependent order of prototypes in the 1D-SOM allows the prediction that JA-Ile and the associated group of high-molecular-weight marker candidates in cluster 9 are more transiently regulated than the main wound marker JA in cluster 5. The group of compounds associated with JA-Ile therefore appears to represent valuable candidates for further investigation of the wound signaling network in A. thaliana.
Hydroxy-JA (OH-JA, m/z 225) and the JA-Ile derivatives hydroxy-jasmonoyl-isoleucine (OH-JA-Ile, m/z 338) and carboxy-jasmonoyl-isoleucine (COOH-JA-Ile, m/z 352) are assigned to prototype 1. All three substances show an intensity profile typical of late-occurring wound-responsive metabolites. OH-JA is a product of JA modification that is capable of counteracting the JA signaling pathway. The OH-JA intensity pattern is consistent with this postulated counter-regulatory function. Like OH-JA, the polar JA-Ile derivatives OH-JA-Ile and COOH-JA-Ile show a delayed wound response compared with JA-Ile and JA, an observation that has also been reported previously. The wound marker OPDA (m/z 291, see figure 2 and table 2) was detected in cluster 2 and therefore also represents a late wound marker.
Prototypes of block E represent wound markers of dde 2–2 mutant plants
In dde 2–2 mutant plants, the wound response is disturbed by the loss of allene oxide synthase (AOS) activity. Products of the wound signaling pathway upstream of the AOS reaction should therefore accumulate, and they were expected in block E. Candidates for such accumulating precursors are fatty acid hydroperoxides and hydroxides as well as keto fatty acids. We identified hydroxy hexadecatrienoic acid (HHT, m/z 265) in cluster 21, and hydroxy octadecatrienoic acid (HOT, m/z 293) as well as keto octadecatrienoic acid (KOT, m/z 291) in cluster 22 (see table 2). These observations confirm our hypothesis that the intensity levels of all three metabolites (HHT, KOT and HOT) are regulated by AOS activity.
Prototypes of block D represent markers accumulating in dde 2–2 mutant control plants
Block D, with prototypes 18 and 19, combines 26 marker candidates whose intensity profiles indicate accumulation in the control plants of the dde 2–2 mutant and a decrease after wounding of these plants. In wt plants, however, these candidates show only low intensities that are not altered by wounding (see figure 2).
The seven blue markers of cluster 19 shown in figure 6 were identified as very long chain dicarboxylic and hydroxy fatty acids that have not previously been described in the context of plant wound responses (see table 2): docosanedioic acid (COOH-22:0, m/z 369, rt 4.54 min), hydroxy-docosanoic acid (OH-22:0, m/z 355, rt 4.72 min), hydroxy-tetracosanoic acid (OH-24:0, m/z 383, rt 5.31 min), hydroxy-hexacosanoic acid (OH-26:0, m/z 411, rt 5.85 min) and the formate adducts of the latter three hydroxy fatty acids. These formate adducts are characterized by identical retention times and a mass shift of m/z 46 relative to the deprotonated molecular ion [M−H]⁻. LC/MS analysis of the corresponding standards confirmed that the hydroxy fatty acids, but not the dicarboxylic fatty acid, form strong formate adducts. This demonstrates the potential of adduct formation in ESI-MS analysis for the further identification of markers. Here, the visualization by means of rt-m/z scatter plots makes it possible to recognize specific adduct formation (see figure 6). Finally, the occurrence of these four very long chain dicarboxylic and hydroxy fatty acids in one cluster suggests that these metabolites are part of the same regulatory context.
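Recognizing formate adducts as stacks in the rt-m/z plane can be expressed as a search for co-eluting features within the same cluster whose m/z values differ by the formate shift. In this sketch, the tolerances, the `Feature` record and the example entries are hypothetical. The m/z and rt values of COOH-22:0, OH-22:0 and OH-24:0 are taken from the text, and the adduct m/z values are derived by adding 46 to the nominal [M−H]⁻ values. The default shift of 46.0055 is the monoisotopic mass of formic acid (HCOOH), i.e. the difference between [M+HCOO]⁻ and [M−H]⁻.

```python
from dataclasses import dataclass

# [M+HCOO]- minus [M-H]-: monoisotopic mass of formic acid (HCOOH).
FORMATE_SHIFT = 46.0055

@dataclass
class Feature:
    label: str
    mz: float       # observed m/z
    rt: float       # retention time in minutes
    cluster: int    # 1D-SOM prototype index

def formate_adduct_pairs(features, shift=FORMATE_SHIFT, mz_tol=0.05, rt_tol=0.01):
    """Return (ion, adduct) label pairs from the same cluster that co-elute
    within rt_tol and differ in m/z by shift within mz_tol (tolerances are
    hypothetical)."""
    pairs = []
    for a in features:
        for b in features:
            if (a.cluster == b.cluster
                    and abs(a.rt - b.rt) <= rt_tol
                    and abs((b.mz - a.mz) - shift) <= mz_tol):
                pairs.append((a.label, b.label))
    return pairs

cluster19 = [
    Feature("COOH-22:0", 369.0, 4.54, 19),
    Feature("OH-22:0", 355.0, 4.72, 19),
    Feature("OH-22:0 formate adduct", 401.0, 4.72, 19),
    Feature("OH-24:0", 383.0, 5.31, 19),
    Feature("OH-24:0 formate adduct", 429.0, 5.31, 19),
]
print(formate_adduct_pairs(cluster19))  # pairs OH-22:0 and OH-24:0 with their adducts
```

The quadratic pairwise search is adequate for a single cluster of a few dozen features. For whole data sets, sorting by retention time first would be the more scalable choice.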
Application of HCA/K-means
To compare our 1D-SOM method with a more classical approach to clustering and visualization, we performed hierarchical cluster analysis (HCA) in combination with K-means. The HCA/K-means scheme uses hierarchical clustering to initialize the prototypes and the K-means algorithm to improve them iteratively. For this purpose, the resulting HCA dendrogram is cut at a particular distance to obtain a predefined number of ordered clusters. In the next step, K-means is applied using the means of the HCA partitions as initial prototypes.
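A sketch of the HCA/K-means scheme with SciPy is shown below. The linkage method ("average") and the Euclidean metric are assumptions, since this section does not state them. `fcluster` with `criterion="maxclust"` cuts the dendrogram at the distance that yields at most the requested number of clusters. The partitions are then ordered by their first appearance along the dendrogram leaves, and `kmeans2` with `minit="matrix"` starts K-means from the partition means.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage
from scipy.cluster.vq import kmeans2

def hca_kmeans(X, n_clusters, method="average"):
    """HCA-initialized K-means (sketch).

    method: linkage criterion (assumed; not stated in the text).
    Returns the K-means centroids (one row per cluster, in dendrogram
    order of the initial HCA partitions) and the label of each row of X.
    """
    X = np.asarray(X, dtype=float)
    Z = linkage(X, method=method, metric="euclidean")
    # Cut the dendrogram so that at most n_clusters flat clusters remain.
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    # Order the HCA partitions by first appearance along the dendrogram leaves.
    order = []
    for leaf in leaves_list(Z):
        if labels[leaf] not in order:
            order.append(labels[leaf])
    # Partition means serve as the initial K-means prototypes.
    init = np.array([X[labels == c].mean(axis=0) for c in order])
    centroids, assign = kmeans2(X, init, minit="matrix")
    return centroids, assign
```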
Application of PCA
Conclusion
We have introduced an approach to metabolite-based clustering for identifying biologically relevant groups of metabolic markers in mass spectrometry data. Our algorithm is based on a special realization of one-dimensional self-organizing maps (1D-SOMs). In a case study on the wound response in A. thaliana, we showed that our 1D-SOMs provide a visualization of multivariate marker data that is suitable for investigating potential clusters. By means of a linear array of ordered prototypes, the 1D-SOM representation gives a convenient overview of relevant patterns in complex multivariate data. Meaningful expected as well as unexpected clusters can be identified by visual inspection of the corresponding intensity profiles. In particular, our approach supports the discovery of so far unknown markers on the basis of their location in the 1D-SOM array relative to previously identified markers.
We thank René Rex for helpful comments, Pia Meyer for excellent technical assistance for the plant wound experiment and Ingo Heilmann for proofreading of the manuscript. This work was partially supported by the Federal Ministry of Research and Education (BMBF) project "MediGRID" (BMBF 01AK803G) and by the German Research Council project "Signals in the Verticillium-plant interaction" (DFG FOR-546).
- Dettmer K, Aronov PA, Hammock BD: Mass spectrometry-based metabolomics. Mass Spectrom Rev. 2007, 26: 51-78.
- Shulaev V, Cortes D, Miller G, Mittler R: Metabolomics for plant stress response. Physiol Plant. 2008, 132 (2): 199-208.
- Guy C, Kaplan F, Kopka J, Selbig J, Hincha DK: Metabolomics of temperature stress. Physiol Plant. 2008, 132 (2): 220-235.
- Sanchez DH, Siahpoosh MR, Roessner U, Udvardi M, Kopka J: Plant metabolomics reveals conserved and divergent metabolic responses to salinity. Physiol Plant. 2008, 132 (2): 209-219.
- Gray GR, Heath D: A global reorganization of the metabolome in Arabidopsis during cold acclimation is revealed by metabolic fingerprinting. Physiol Plant. 2005, 124 (2): 236-248.
- Tarpley L, Duran A, Kebrom T, Sumner L: Biomarker metabolites capturing the metabolite variance present in a rice plant developmental period. BMC Plant Biol. 2005, 5: 8.
- Aharoni A, Ric de Vos C, Verhoeven H, Maliepaard C, Kruppa G, Bino R, Goodenowe D: Nontargeted metabolome analysis by use of Fourier Transform Ion Cyclotron Mass Spectrometry. OMICS. 2002, 6: 217-234.
- Fiehn O, Kopka J, Dörmann P, Altmann T, Trethewey R, Willmitzer L: Metabolite profiling for plant functional genomics. Nat Biotechnol. 2000, 18: 1157-1161.
- Steinfath M, Groth D, Lisec J, Selbig J: Metabolite profile analysis: from raw data to regression and classification. Physiol Plant. 2008, 132 (2): 150-161.
- Bhalla R, Narasimhan K, Swarup S: Metabolomics and its role in understanding cellular responses in plants. Plant Cell Rep. 2005, 24 (10): 562-571.
- Jiang D, Tang C, Zhang A: Cluster analysis for gene expression data: a survey. IEEE Trans Knowl Data Eng. 2004, 16 (11): 1370-1386.
- Fiehn O: Metabolomics - the link between genotypes and phenotypes. Plant Mol Biol. 2002, 48 (1–2): 155-171.
- Wiklund S, Johansson E, Sjöström L, Mellerowicz E, Edlund U, Shockcor J, Gottfries J, Moritz T, Trygg J: Visualization of GC/TOF-MS-based metabolomics data for identification of biochemically interesting compounds using OPLS class models. Anal Chem. 2008, 80: 115-122.
- Pohjanen E, Thysell E, Lindberg J, Schuppe-Koistinen I, Moritz T, Jonsson P, Antti H: Statistical multivariate metabolite profiling for aiding biomarker pattern detection and mechanistic interpretations in GC/MS based metabolomics. Metabolomics. 2006, 2 (4): 257-268.
- Jain AK, Dubes RC: Algorithms for Clustering Data. 1988, Upper Saddle River, NJ, USA: Prentice-Hall, Inc.
- Kohonen T: Self-Organizing Maps. 2001, Secaucus, NJ, USA: Springer-Verlag New York, Inc.
- Graepel T, Burger M, Obermayer K: Deterministic annealing for topographic vector quantization and self-organising maps. Proceedings of the Workshop on Self-Organizing Maps (WSOM '97). Edited by: Kohonen T. 1997, 345-350.
- Heskes T, Kappen B: Error potentials for self-organization. International Conference on Neural Networks. 1993, 3: 1219-1223. San Francisco, New York: IEEE.
- Wasternack C, Stenzel I, Hause B, Hause G, Kutter C, Maucher H, Neumerkel J, Feussner I, Miersch O: The wound response in tomato - role of jasmonic acid. J Plant Physiol. 2006, 163: 297-306.
- Leon J, Rojo E, Sanchez-Serrano J: Wound signalling in plants. J Exp Bot. 2001, 52: 1-9.
- Wasternack C: Jasmonates: an update on biosynthesis, signal transduction and action in plant stress response, growth and development. Ann Bot. 2007, 100: 681-697.
- Reymond P, Weber H, Damond M, Farmer E: Differential gene expression in response to mechanical wounding and insect feeding in Arabidopsis. Plant Cell. 2000, 12: 707-720.
- Glauser G, Grata E, Dubugnon L, Rudaz S, Farmer E, Wolfender J: Spatial and temporal dynamics of jasmonate synthesis and accumulation in Arabidopsis in response to wounding. J Biol Chem. 2008.
- The Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000, 408: 796-815.
- Schilmiller A, Howe G: Systemic signaling in the wound response. Curr Opin Plant Biol. 2005, 8: 369-377.
- von Malek B, van der Graaff E, Schneitz K, Keller B: The Arabidopsis male-sterile mutant dde2-2 is defective in the ALLENE OXIDE SYNTHASE gene encoding one of the key enzymes of the jasmonic acid biosynthesis pathway. Planta. 2002, 216: 187-192.
- Stenzel I, Hause B, Maucher H, Pitzschke A, Miersch O, Ziegler J, Ryan C, Wasternack C: Allene oxide cyclase dependence of the wound response and vascular bundle-specific generation of jasmonates in tomato – amplification in wound signalling. Plant J. 2003, 33: 577-589.
- Fiehn O: Protocol for Plant Leaf Metabolite Profiling. 1 May 2000 [Accessed 22 Jan 2008], http://www.mpimp-golm.mpg.de/fiehn/forschung/blatt-protokoll-e.html
- Gibbons JD: Nonparametric Statistical Inference. 2nd edition. 1985, New York and Basel: Marcel Dekker, Inc.
- Weber H, Vick B, Farmer E: Dinor-oxo-phytodienoic acid: a new hexadecanoid signal in the jasmonate family. Proc Natl Acad Sci USA. 1997, 94: 10473-10478.
- Miersch O, Neumerkel J, Dippe M, Stenzel I, Wasternack C: Hydroxylated jasmonates are commonly occurring metabolites of jasmonic acid and contribute to a partial switch-off in jasmonate signaling. New Phytol. 2008, 177: 114-127.
- Grata E, Boccard J, Glauser G, Carrupt P, Farmer E, Wolfender J, Rudaz S: Development of a two-step screening ESI-TOF-MS method for rapid determination of significant stress-induced metabolome modifications in plant leaf extracts: the wound response in Arabidopsis thaliana as a case study. J Sep Sci. 2007, 30: 2268-2278.
- Delker C, Stenzel I, Hause B, Miersch O, Feussner I, Wasternack C: Jasmonate biosynthesis in Arabidopsis thaliana - enzymes, products, regulation. Plant Biol. 2006, 8: 297-306.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.