PRIMe LC-MS Branch

The liquid chromatography-mass spectrometry (LC-MS) branch of the Platform for RIKEN Metabolomics (PRIMe) pipeline is now ready for the analysis of plant secondary metabolites. The data produced in this branch will be distributed from this page. We wish to thank Mr. K. Akiyama and T. Sakurai of the Integrated genome informatics research unit for their helpful support.

What's New?



- Matsuda et al (2010) AtMetExpress development: A phytochemical atlas of Arabidopsis thaliana development Plant Physiol in press.[Pubmed]
- Matsuda et al (2009) Assessment of metabolome annotation quality: a method for evaluating the false discovery rate of elemental composition searches PLoS One (4) e7490[Pubmed]
- Matsuda et al (2009) MS/MS spectral tag (MS2T)-based annotation of non-targeted profile of plant secondary metabolites Plant J (57) 555-577.[Pubmed]

Web-based tools

AtMetExpress Development

[AtMetExpress Development]:Phytochemical Atlas of Arabidopsis thaliana development

AGIcode Extractor

[AGIcode Extractor]:Extraction of AGI codes from your text data

MS2T viewer

[MS2T viewer]:MS/MS spectral tag (MS2T) libraries are collections of MS/MS spectral data of phytochemicals obtained by LC-MS/MS. The basic concept of MS2T and its application for metabolite annotation are explained in Matsuda et al (2009).

Download: Tools and annotation data for the processing of LC-MS metabolome data


[Ntoolbox] N toolbox consists of five programs (Nprefilter, Nnormalizer, Nfilter, Nisotoperemover, and Nannotator) for the post-processing of the data matrix produced by MetAlign. These small programs are written in Perl/Tk. On Windows, please download and install ActivePerl (from ActiveState). The scripts can then be run by double-clicking the .pl files.

Standard compound data

[Standard compound data] List of retention time and m/z data of standard compounds (> 400) obtained by the same method as the non-targeted analysis. The plain text data is for Nannotator.

Peak annotation information

[Peak annotation information] Curated list of annotatable metabolite peaks. The plain text data is for Nannotator.

MS2T data

[MS2T data] Original MS/MS spectral tag (MS2T) data in plain text format. The spectral data can be visualized with the MS2T viewer.

Data Matrix

[Data Matrix] Data matrices produced in this branch are now available.

Raw LC-MS data

[Raw LC-MS data] Raw LC-MS data files of plant metabolome data can be downloaded from DROP Met on our website.

Protocol 1: Non-targeted metabolome analysis


The frozen tissues were homogenized in 5 volumes of 80% aqueous methanol containing internal standards (0.5 mg/l of lidocaine and d-camphor sulfonic acid (Tokyo Kasei, Tokyo, Japan)) by using a mixer mill (MM 300, Retsch) with a zirconia bead for 6 min at 20 Hz. Following centrifugation at 15,000 g for 10 min and filtration (Ultrafree-MC, 0.2 µm, Millipore, Bedford, MA, USA), the sample extracts were subjected to LC-MS analysis.
We used LC-MS grade solvents (water and methanol) for extraction. Tubes, tips and glassware used in the extraction procedure are often a source of detergent contamination, which is observed as broad peaks in the LC-MS chromatograms.

Data acquisition

The sample extracts (2 µl) were analyzed using an LC-MS system equipped with an electrospray ionization (ESI) interface (HPLC: Waters Acquity UPLC system; MS: Waters Q-Tof Premier). The analytical conditions were as follows. HPLC: column: Acquity BEH C18 (particle size: 1.7 µm), Waters, 2.1 by 100 mm; solvent system: acetonitrile (0.1% formic acid):water (0.1% formic acid); gradient program: 1:99, v/v, at 0 min; 1:99 at 0.1 min; 99.5:0.5 at 15.5 min; 99.5:0.5 at 17.0 min; 1:99 at 17.1 min; and 1:99 at 20 min; flow rate: 0.3 ml/min; temperature: 38 °C. MS detection: capillary voltage: +3.0 kV; cone voltage: 22.5 V; source temperature: 120 °C; desolvation temperature: 450 °C; cone gas flow: 50 l/h; desolvation gas flow: 800 l/h; collision energy: 2 V; detection mode: scan (m/z 100-2000; dwell time: 0.45 s; interscan delay: 0.05 s, centroid). The scans were repeated for 19.5 min in a single run. The data were recorded with MassLynx version 4.1 software (Waters).
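
The gradient program above can be read as a table of set points. The following Python sketch returns the acetonitrile percentage at a given time. It assumes that the solvent ratio changes linearly between consecutive set points, which the protocol does not state explicitly; the names GRADIENT and percent_acetonitrile are illustrative only.

```python
# Set points from the protocol: (time in min, % acetonitrile with 0.1% formic acid).
GRADIENT = [(0.0, 1.0), (0.1, 1.0), (15.5, 99.5), (17.0, 99.5), (17.1, 1.0), (20.0, 1.0)]

def percent_acetonitrile(t_min, table=GRADIENT):
    """Acetonitrile percentage at t_min, assuming linear ramps between set points."""
    if t_min <= table[0][0]:
        return table[0][1]
    for (t0, p0), (t1, p1) in zip(table, table[1:]):
        if t0 <= t_min <= t1:
            return p0 + (p1 - p0) * (t_min - t0) / (t1 - t0)
    # After the last set point the final composition is held.
    return table[-1][1]
```

Under this assumption, the composition halfway through the main ramp (7.8 min) is about 50% acetonitrile.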

Quality check

All sample extracts contain internal standards (IS), lidocaine (m/z 235 [M + H]+, eluted at 4.19 min for the positive ion mode) and (-)-camphor-10-sulfonic acid (m/z 231 [M - H]-, eluted at 3.84 min for the negative ion mode), at a concentration of 0.5 mg/l. These standards were selected from several IS candidates as unnatural compounds that ionize stably without being affected by the sample matrix. Immediately after data acquisition, the retention time, peak intensity, and peak shape of the IS signals were checked manually to control the quality of the analyses.

Generation of data matrix and processing

The data matrix was generated from the metabolic profile data by using the MetAlign software and was then processed with in-house software written in Perl/Tk (N toolbox), which consists of five programs: Nprefilter, Nnormalizer, Nfilter, Nisotoperemover, and Nannotator. Detailed methods for the processing and for the interpretation of the MS2T data are described below.

Data mining

The data matrix generated here can be imported into various microarray data mining tools such as MeV (MultiExperiment Viewer), which is freely available and supports various kinds of statistical analyses.

Protocol 2: Procedures for processing of LC-MS metabolome data

Data processing is still incomplete.

Data processing, by which raw LC-MS chromatogram data are converted to a data matrix (table) with metabolite annotation information, is still a burden in metabolome analysis. In spite of its importance for the success of a metabolome analysis, data processing sometimes seems to be more art than technique. Here, the protocol for processing ESI-MS metabolome data employed in the PRIMe LC-MS branch is described.

Generation of data matrix by using MetAlign

The profiling data files recorded in the MassLynx format (raw) were converted to the NetCDF format by the DataBridge function of MassLynx 4.1. From the set of NetCDF data files, the data matrix was generated by using the MetAlign software (De Vos et al., 2007) (Step 2, Fig. 1b). The processing parameters were as follows: Maximum amplitude: 10000; Peak slope factor: 1; Peak threshold factor: 6; Average peak width at half height: 8; Scaling options: no; Maximum shift per scan: 35; Select min. nr. per peak set: 8 for the tissue specificity analysis data and 3 for the screen of Ds-inserted mutant lines. This procedure generates data matrices with unit mass data, which means that the high-resolution data acquired by the time-of-flight (TOF) analyzer are discarded in this step. Although it has been pointed out that the mass accuracy of a TOF analyzer (ca. 5-10 ppm) is not sufficient to narrow the candidates down to a single molecular formula (Kind and Fiehn, 2006), high-resolution data are still considered valuable for estimating the molecular formula of a metabolite. Several peak-picking software packages have been developed for high-resolution data and applied to plant metabolomics studies (Katajamaa and Oresic, 2005). On the other hand, an advantage of unit mass data is faster peak picking, which enables us to deal with large-scale datasets in high-throughput analysis. In the case of the screen of Ds-inserted mutants (Fig. 7), the generation of the matrix from 219 raw data files was finished overnight on an ordinary desktop PC (Pentium 4 3.0 GHz, 2 GB memory). It should be noted that high-resolution mass information is, at least in part, available from the precursor ion data of the corresponding MS2Ts. The results of this study also suggested that unit mass data are sufficiently effective for the profiling of plant secondary metabolites.
It is expected that, with improved peak-picking software, high-resolution data can be taken into consideration for more detailed metabolic profiling of large-scale datasets. The N toolbox described below can deal with high-resolution data after slight modification of the programs.

Processing of data matrix

The data matrix generated by MetAlign was processed with in-house software written in Perl/Tk, named N toolbox, which consists of five programs: Nprefilter, Nnormalizer, Nfilter, Nisotoperemover, and Nannotator (Fig. 1-5). The detailed methods for the processing of Arabidopsis tissue samples are described below.

Pre-filtering (Nprefilter)

The peaks eluted before 0.85 min (scan number 100) and after 14.0 min (scan number 1650) were discarded by using Nprefilter (Fig. 1) to remove the poorly quantifiable peaks eluted near the void volume as well as the broad peaks eluted at the end of the chromatogram.

Figure 1 Screen shot of Nprefilter

How to use...

  1. Select a file for processing.
  2. Please check the header information.
  3. Generate output file name.
  4. Set the column number of label information.
  5. Check items for filtering.
  6. Set thresholds for filtering.
  7. Push 'Start processing' button.
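
The pre-filtering rule can be sketched in a few lines of Python. This is not the Perl/Tk program itself; the row layout (a dict with a "scan" key) and the function name prefilter are assumptions made for illustration.

```python
def prefilter(rows, min_scan=100, max_scan=1650):
    """Keep only the rows (peaks) whose scan number lies inside the window.

    rows: list of dicts, each with a 'scan' key (hypothetical layout).
    Peaks before scan 100 or after scan 1650 are discarded, as in the protocol.
    """
    return [row for row in rows if min_scan <= row["scan"] <= max_scan]
```

For example, prefilter([{"scan": 50}, {"scan": 800}, {"scan": 1700}]) keeps only the peak at scan 800.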

Data normalization (Nnormalizer)

The original peak intensity values were divided by those of the internal standards (lidocaine (m/z 235 [M + H]+, eluted at 4.19 min) and (-)-camphor-10-sulfonic acid (m/z 231 [M - H]-, eluted at 3.84 min) for the positive and negative ion modes, respectively) determined in the same samples, in order to normalize the peak intensity values among the metabolic profile data. These standards were selected because they are unnatural compounds that ionize stably without being affected by the sample matrix (Fig. 2). The purpose of the IS-based normalization was to correct errors other than those caused by ion suppression, in contrast to the recently reported multi-internal standard approach for correcting the ion-suppression effect in focused metabolite analysis (Sysi-Aho et al., 2007). Internal standards and the LockSpray apparatus for the calibration of the mass-to-charge ratio (m/z) (Oikawa et al., 2006; Suzuki et al., 2007; Wolff et al., 2001) were not employed, since MetAlign only deals with unit mass data (De Vos et al., 2007), and it has been reported that a mass accuracy of 5-10 ppm is insufficient for estimating a unique molecular formula (Kind and Fiehn, 2006).

Figure 2 Screen shot of Nnormalizer

How to use...

  1. Check the 'Peak Nr' value (in the first column) of the internal standard (IS) before the normalization.
  2. Select a file for processing.
  3. Please check the header information.
  4. Generate output file name.
  5. Set the total number of label columns.
  6. Set the 'Peak Nr' value of the IS.
  7. Push 'Start processing' button.
  8. 'Peak intensity threshold' is the peak intensity value below which low-intensity data receive exceptional processing. '25' is recommended for MetAlign data. If you do not want to use this function, please set '0'.
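
A minimal Python sketch of the IS-based normalization is given here. The matrix layout (a dict mapping 'Peak Nr' to a list of per-sample intensities) and the function name are hypothetical, and the exceptional handling of low-intensity data controlled by 'Peak intensity threshold' is not reproduced because its details are not described on this page.

```python
def normalize_by_is(matrix, is_peak_nr):
    """Divide every intensity by the internal standard intensity of the same sample.

    matrix: dict mapping peak number -> list of raw intensities, one per sample
            (hypothetical layout).
    is_peak_nr: peak number of the internal standard row.
    """
    is_values = matrix[is_peak_nr]
    if any(value <= 0 for value in is_values):
        raise ValueError("internal standard not detected in at least one sample")
    return {
        peak_nr: [value / is_value for value, is_value in zip(values, is_values)]
        for peak_nr, values in matrix.items()
    }
```

After normalization the IS row itself becomes 1.0 in every sample, which is a quick sanity check that the correct 'Peak Nr' was chosen.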

Cut-off of low-intensity data (Nfilter)

Rows containing low-intensity and/or unreliable data were filtered out with Nfilter by the following procedure. In the case of the Arabidopsis tissue sample set, each row included a total of 32 intensity values (4 groups of 8 replicates). Rows including at least one group in which the intensity values of all 8 replicates were above the cutoff value (0.0183, S/N = 5) were retained in the matrix, while the others were filtered out. Nfilter was run with the following parameters: Group: 8,8,8,8; Minimum number of samples in each group: 8,8,8,8; Peak intensity threshold: 0.0183 (Fig. 3).

Figure 3 Screen shot of Nfilter

How to use...

  1. Select a file for processing.
  2. Please check the header information.
  3. Generate output file name.
  4. Set the total number of label column.
  5. Set sample groups for filtering.
  6. Push 'Start processing' button.
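
The group-wise cut-off rule can be expressed as follows. This Python sketch assumes that the intensity values of a row are ordered group by group; the function name keep_row and its arguments are illustrative and mirror the 'Group', 'Minimum number of samples in each group' and 'Peak intensity threshold' parameters.

```python
def keep_row(intensities, group_sizes, min_counts, threshold):
    """Return True if at least one group has enough replicates above the threshold.

    intensities: normalized intensities of one row, ordered group by group.
    group_sizes: number of replicates per group, e.g. (8, 8, 8, 8).
    min_counts:  required number of replicates above the threshold per group.
    """
    start = 0
    for size, required in zip(group_sizes, min_counts):
        group = intensities[start:start + size]
        if sum(1 for value in group if value > threshold) >= required:
            return True
        start += size
    return False
```

With the settings used for the Arabidopsis tissue set, the call would be keep_row(row, (8, 8, 8, 8), (8, 8, 8, 8), 0.0183).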

Isotope peak deconvolution (Nisotoperemover)

During electrospray ionization (ESI), a single metabolite generates several types of ions with different mass numbers, such as fragment, adduct, and isotope ions, in addition to the protonated molecule. Since the signals derived from these ions are recognized as distinct peaks and recorded in different rows of the matrix, deconvolution of the peaks is desirable in order to reduce the data redundancy. Among the redundant data in the matrix, the peaks derived from isotope ions are easily detectable because their retention times and mass numbers (m/z) are predictable from those of the nonlabeled signals. In addition, the intensities of the isotope peaks must be lower than those of the corresponding nonlabeled signals, and the ratio of these intensities must be nearly identical among the samples. By using these characteristics, the peaks (rows) derived from the isotope ions were eliminated from the matrix by the following procedure (Fig. 4).

  1. For a peak (row A), a candidate nonlabeled peak (row B) with the highest correlation coefficient above the threshold value (rthres > 0.8) was selected from the candidate peaks that were (i) eluted at a similar retention time (within the retention time threshold (Rt difference = 0.5 s)), (ii) observed at a smaller mass number (within the mass number threshold (m/z difference less than 3 Da)), and (iii) of a higher average intensity.
  2. If row B has another counterpart (row C), the counterpart of row A is changed to row C.
  3. If row C has no counterpart, row A is removed from the matrix, and its information is recorded in the "deconvolution" column of row C. The peak deconvolution method depends on the correlation coefficient of the intensity values among the rows in the matrix, which means that the matrix should comprise more than 15-20 samples so that reliable correlation coefficients can be calculated.
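
The three steps above can be sketched in Python as follows. This is an illustrative re-implementation under stated assumptions, not the Nisotoperemover code: the row layout (dicts with "rt" in seconds, "mz" as unit mass, and "values" as the intensity list), the function names, and the use of the Pearson correlation coefficient are all assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long intensity lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def remove_isotope_rows(rows, r_thres=0.8, rt_tol=0.5, mz_tol=3.0):
    """Drop rows that look like isotope peaks of another row.

    rows: list of dicts with 'rt' (s), 'mz' (unit mass) and 'values' (hypothetical layout).
    Returns the kept rows, each with a 'deconvolution' list of removed m/z values.
    """
    def mean(row):
        return sum(row["values"]) / len(row["values"])

    # Step 1: for each row A, pick the best-correlated candidate row B.
    parent = {}
    for i, a in enumerate(rows):
        best, best_r = None, r_thres
        for j, b in enumerate(rows):
            if i == j or abs(a["rt"] - b["rt"]) > rt_tol:
                continue
            if not (0 < a["mz"] - b["mz"] < mz_tol) or mean(b) <= mean(a):
                continue
            r = pearson(a["values"], b["values"])
            if r > best_r:
                best, best_r = j, r
        if best is not None:
            parent[i] = best

    # Steps 2 and 3: follow the chain to the row without a counterpart (row C)
    # and record the removed row there. Chains cannot loop because m/z decreases.
    notes = {}
    for i in parent:
        root = parent[i]
        while root in parent:
            root = parent[root]
        notes.setdefault(root, []).append(rows[i]["mz"])
    return [dict(row, deconvolution=notes.get(i, []))
            for i, row in enumerate(rows) if i not in parent]
```

In this sketch a monoisotopic peak at m/z 300 with co-eluting, proportional peaks at m/z 301 and 302 is kept as one row, with 301 and 302 listed in its "deconvolution" entry.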

Twenty to thirty percent of the rows, which were derived from isotope ions, were discarded from the matrix in the fifth step (data not shown), suggesting that the step is important for reducing the data redundancy. However, the resultant data matrix still contains data derived from adduct and fragment ions. For the processing of GC-MS data, effective deconvolution software such as AMDIS has been developed, by which the series of fragment peaks and their isotope peaks generated by electron-impact (EI) ionization are deconvoluted based on the similarity of their shapes in the chromatograms (Broeckling et al., 2006; Halket et al., 1999; Lisec et al., 2006). Although a few applications of the method to LC-MS data have been reported (Furtula et al., 2006; Roepenack-Lahaye et al., 2004), most peak-picking software developed for LC-MS data, such as MetAlign, does not include a deconvolution function because it employs different algorithms for the peak picking and alignment of profile data.

The method has also been applied to the deconvolution of fragment and adduct ions in the matrix by changing step 1 of the above procedure to the following:

  1. For a peak (row A), a candidate nonlabeled peak (row B) with the highest correlation coefficient above the threshold value (rthres > 0.8) was selected from the candidate peaks that were (i) eluted at a similar retention time (within the retention time threshold (Rt difference = 0.5 s)) and (ii) of a higher average intensity.

By using this method, two distinct peaks of flavonol glucosides were merged into a single metabolite, since the retention times of these biosynthetically related metabolites were essentially the same (data not shown). Thus, the method was not employed for the deconvolution of fragment and adduct peaks.

Figure 4 Screen shot of Nisotoperemover

How to use...

  1. Select a file for processing.
  2. Please check the header information.
  3. Generate output file name.
  4. Set the total number of label columns.
  5. Set the threshold values for 'Parameters for deconvolution of isotope peaks'. The initial values are recommended for the processing of MetAlign data.
  6. Set the threshold values for 'Parameters for deconvolution of fragment and adduct peaks'. Please do not use this function in the current version.
  7. Push 'Start processing' button.

Peak annotation (Nannotator)

The retention time and mass number data of each row in the matrix were compared with those of all accessions in the standard compound data, the MS2T library, and the curated annotation list (available from the "Download" section). Accessions with an identical mass number and a similar retention time (retention time difference less than 0.05 min for the standard compound data and the curated annotation, and less than 0.15 min for the MS2T library) were assigned to the peak (Fig. 5).

Figure 5 Screen shot of Nannotator

How to use...

  1. Select a file for processing.
  2. Please check the header information.
  3. Generate output file name.
  4. Set the column number of label information.
  5. Select a file for the curated annotation information.
  6. Set the threshold time (min) for the addition of information to the peaks. If the mass number is identical and the retention time of the annotation information is within the retention time of a peak in the matrix +- threshold, the annotation information is added to the peak.
  7. Select a file for the standard compound information.
  8. Select a file of MS2T library.
  9. Drift of retention time can be corrected by 'Rt correction'.
  10. Check 'with MS2T annotation' if you need the MS2T data.
  11. Push 'Start processing' button.
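
The matching rule used for annotation (identical unit mass, retention time within a threshold) can be sketched in Python as follows. The entry layout and the function name annotate are hypothetical, and the optional 'Rt correction' is not modeled.

```python
def annotate(peak, library, rt_tol_min):
    """Return the names of library entries that match a peak.

    peak:    dict with 'mz' (unit mass) and 'rt' (min).
    library: list of dicts with 'name', 'mz' and 'rt' (hypothetical layout).
    An entry matches when its mass number is identical and its retention
    time differs from that of the peak by less than rt_tol_min.
    """
    return [
        entry["name"]
        for entry in library
        if entry["mz"] == peak["mz"] and abs(entry["rt"] - peak["rt"]) < rt_tol_min
    ]
```

Following the thresholds given above, the function would be called with rt_tol_min=0.05 for the standard compound data and the curated annotation list, and with rt_tol_min=0.15 for the MS2T library.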

Interpretation of MS2T data

KNApSAcK search
The high-resolution mass number (m/z) of the precursor ion of each MS2T was compared with the theoretical values of the protonated molecules [M+H]+ of the metabolites recorded in KNApSAcK, which contains the structural information of 21,061 naturally occurring metabolites (Oikawa et al., 2006). The threshold for the error in the mass number was set to 5 mDa.
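
As a worked illustration of this comparison, the Python sketch below computes the theoretical [M+H]+ value from a neutral monoisotopic mass and applies the 5 mDa threshold. The candidate table layout and function names are assumptions; the actual search was run against KNApSAcK, which is not accessed here.

```python
PROTON_MASS = 1.007276  # Da; mass added to the neutral molecule to form [M+H]+

def match_precursor(observed_mz, candidates, tol_da=0.005):
    """Return candidate names whose theoretical [M+H]+ lies within tol_da.

    candidates: dict mapping name -> neutral monoisotopic mass in Da
                (hypothetical stand-in for a database query result).
    """
    return [
        name
        for name, mono_mass in candidates.items()
        if abs(observed_mz - (mono_mass + PROTON_MASS)) <= tol_da
    ]

def mda_to_ppm(error_mda, mz):
    """Convert an absolute mass error in mDa to a relative error in ppm."""
    return error_mda / 1000.0 / mz * 1e6
```

Note that a fixed 5 mDa window is a relative tolerance of about 10 ppm at m/z 500 and about 25 ppm at m/z 200, so it is looser for small precursor ions.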
MassBank Search
The MS/MS spectral data of the MS2Ts were submitted as queries to MassBank, the database of high-resolution mass spectra of metabolites released by the JST-BIRD group (Taguchi et al., 2007), by using the "Batch Search Service" function. The query results were stored, and hits with a score above 0.8 were used to add structural information to the peaks tagged with the corresponding MS2T.
Literature data search
More than 900 MS/MS spectra of plant metabolites reported in the literature were collected and stored in a database (ReSpect for Phytochemicals). The MS/MS spectral data of the MS2Ts were compared with them by using in-house Perl scripts, and hits were evaluated by a score (>0.8) determined with the cosine product method (Stein and Scott, 1994). The details of the literature MS/MS spectra will be described elsewhere.
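
A plain cosine score between two MS/MS spectra can be sketched in Python as shown here. This is a simplified illustration: spectra are assumed to be dicts keyed by fragment m/z rounded to unit mass, and the intensity and mass weighting options discussed by Stein and Scott (1994) are not applied, so scores may differ from those of the in-house scripts.

```python
from math import sqrt

def cosine_score(spec_a, spec_b):
    """Cosine similarity of two spectra given as {fragment m/z: intensity} dicts."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = sqrt(sum(value * value for value in spec_a.values()))
    norm_b = sqrt(sum(value * value for value in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

The score is 1 for spectra with identical relative intensities and 0 for spectra without shared fragments; a hit would be accepted when the score exceeds 0.8.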
Spectral motif search
The motif search of the MS2T data was performed with an in-house script written in Perl.

Literature Cited

Broeckling, C.D., Reddy, I.R., Duran, A.L., Zhao, X. and Sumner, L.W. (2006) MET-IDEA: data extraction tool for mass spectrometry-based metabolomics. Anal Chem, 78, 4334-4341.
De Vos, R.C., Moco, S., Lommen, A., Keurentjes, J.J., Bino, R.J. and Hall, R.D. (2007) Untargeted large-scale plant metabolomics using liquid chromatography coupled to mass spectrometry. Nature protocols, 2, 778-791.
Furtula, V., Derksen, G. and Colodey, A. (2006) Application of automated mass spectrometry deconvolution and identification software for pesticide analysis in surface waters. Journal of environmental science and health. Part, 41, 1259-1271.
Halket, J.M., Przyborowska, A., Stein, S.E., Mallard, W.G., Down, S. and Chalmers, R.A. (1999) Deconvolution gas chromatography/mass spectrometry of urinary organic acids--potential for pattern recognition and automated identification of metabolic disorders. Rapid Commun Mass Spectrom, 13, 279-284.
Katajamaa, M. and Oresic, M. (2005) Processing methods for differential analysis of LC/MS profile data. BMC Bioinformatics, 6, 179.
Kind, T. and Fiehn, O. (2006) Metabolomic database annotations via query of elemental compositions: Mass accuracy is insufficient even at less than 1 ppm. BMC Bioinformatics, 7, 234.
Lisec, J., Schauer, N., Kopka, J., Willmitzer, L. and Fernie, A.R. (2006) Gas chromatography mass spectrometry-based metabolite profiling in plants. Nature protocols, 1, 387-396.
Oikawa, A., Nakamura, Y., Ogura, T., Kimura, A., Suzuki, H., Sakurai, N., Shinbo, Y., Shibata, D., Kanaya, S. and Ohta, D. (2006) Clarification of pathway-specific inhibition by Fourier transform ion cyclotron resonance/mass spectrometry-based metabolic phenotyping studies. Plant Physiol, 142, 398-413.
Roepenack-Lahaye, E.v., Degenkolb, T., Zerjeski, M., Franz, M., Udo Roth, Wessjohann, L., Schmidt, J.r., Scheel, D. and Clemens, S. (2004) Profiling of Arabidopsis secondary metabolites by capillary liquid chromatography coupled to electrospray ionization quadrupole time-of-flight mass spectrometry. Plant Physiol., 134, 548-559.
Stein, S.E. and Scott, D.R. (1994) Optimization and Testing of Mass-Spectral Library Search Algorithms for Compound Identification. J Am Soc Mass Spectr, 5, 859-866.
Suzuki, H., Sasaki, R., Ogata, Y., Nakamura, Y., Sakurai, N., Kitajima, M., Takayama, H., Kanaya, S., Aoki, K., Shibata, D. and Saito, K. (2007) Metabolic profiling of flavonoids in Lotus japonicus using liquid chromatography Fourier transform ion cyclotron resonance mass spectrometry. Phytochemistry, 69, 99-111.
Sysi-Aho, M., Katajamaa, M., Yetukuri, L. and Oresic, M. (2007) Normalization method for metabolomics data using optimal selection of multiple internal standards. BMC Bioinformatics, 8, 93.
Taguchi, R., Nishijima, M. and Shimizu, T. (2007) Basic analytical systems for lipidomics by mass spectrometry in Japan. Methods in enzymology, 432, 185-211.
Wolff, J.C., Eckers, C., Sage, A.B., Giles, K. and Bateman, R. (2001) Accurate mass liquid chromatography/mass spectrometry on quadrupole orthogonal acceleration time-of-flight mass analyzers using switching between separate sample and reference sprays. 2. Applications using the dual-electrospray ion source. Anal Chem, 73, 2605-2612.

Protocol 3: Acquisition of MS/MS spectral tags (MS2Ts)

MS2T data acquisition
For the acquisition of the MS/MS spectra of Arabidopsis metabolites, the sample extracts (3 µl) were subjected to the LC-ESI-MS system described above. The system was operated under the same conditions as above, except for the following changes: gradient program: 1:99, v/v, at 0 min; 1:99 at 0.2 min; 99.5:0.5 at 31 min; 99.5:0.5 at 34.0 min; 1:99 at 34.2 min; 1:99 at 40 min; flow rate: 0.15 ml/min. MS detection: detection mode: Survey. In this mode, following the acquisition of the MS spectrum (m/z: 100-1,000; dwell time: 0.45 s; interscan delay: 0.05 s), the MS/MS data of the most abundant ion were obtained automatically (m/z: 50-1,000; dwell time: 2.5 s; interscan delay: 0.5 s). The mass-to-charge ratio (m/z) was calibrated with the lock-mass apparatus by using the protonated or deprotonated molecule of leucine enkephalin as the standard. The analyses were repeated 25 times to acquire as many MS2Ts as possible. The data were recorded with the MassLynx version 4.1 software (Waters) and then processed into MS2Ts by an in-house Perl script.

Modified: 2010-1-20

RIKEN Plant Science Center
Metabolome analysis research team, LC-MS Branch