This is a simple example illustrating the use of the Cromwell package (Coombes et al., Proteomics 2005) for analyzing SELDI/MALDI proteomics spectra.

***** WARNING. This example is not self-contained. *****

To use this example, you will need to supply (in addition to the files here) a working version of MATLAB (we have tested this only with version 6.5 and higher) and the Rice Wavelet Toolbox, available from www-dsp.rice.edu/software/rwt.shtml. The toolbox should be installed in the folder produced when the RawData file is unzipped. We have included a copy of Cromwell.

***** END WARNING. This example is not self-contained. *****

The spectra used here are a subset of those used in Pusztai et al. (2004), Cancer 100:1814-22. The complete data set is available from http://bioinformatics.mdanderson.org. The spectra here are low-mass scans of 20 QC samples from a common serum pool, so all of the spectra should be telling the same story. All of the spectra have intensities at 33885 m/z values, and the vector of m/z values is the same for all spectra. The machine was calibrated to a set of known peaks shortly before the entire set of spectra was run. The spectra were run in randomized order on a series of chips.

The files have been provided in 2 formats.

RawBinary contains "Low_mass_serum_QC.xpt", which is in the binary format used by the Ciphergen software. This file contains all of the spectra used here.

RawXML contains all of the spectra exported from the above xpt file using the Ciphergen software (version 3.1.1). This format contains the spectrum intensities both before any processing (the integer counts in tofDataSamples) and after application of various correction factors (the m/z, intensity pairs in processedDataSamples). The XML files also contain all of the parameter settings used in processing the data, including the run times of the spectra (which in general can be used to check whether the run order was in fact random with respect to sample group).

For historical reasons, the scripts in Cromwell were written to deal with files in two-column .csv format, with the first column corresponding to m/z and the second to intensity. We extract these files from the XML files using the kludged script xml2txt.pl (this takes about 5-10 seconds on my laptop). Note that this script does not simply take the last half of the XML data file, which consists of the m/z, intensity pairs returned by the Ciphergen software.
Rather, it takes these m/z values but draws the intensity values from the raw integer counts supplied in tofDataSamples, so the data are obtained before any preprocessing has been applied. Running xml2txt.pl will place .txt versions of the spectra in the folder RawSpectra/.

Finally, we turn to the processing of the raw spectra to produce baseline-corrected and smoothed spectra together with matrices of peak intensities. This procedure is detailed in processSpectra.m (our m-file), which describes most of the processing in far more detail. One or two illustrative pictures will be stored in Figs/. MATLAB binary .mat files (CorrectedSpectra.mat and Peaks.mat) will be produced for later analysis.

Hope that helps!
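
As a quick sanity check on the extracted files, the following Python sketch (not part of Cromwell) reads two-column spectra from a folder and confirms that every spectrum has the expected number of points and the same m/z vector, as described above. The function names, the comma delimiter, and the .txt extension filter are assumptions made for illustration; adjust them to match the actual output of xml2txt.pl.

```python
import csv
import os


def read_spectrum(path, delimiter=","):
    """Read a two-column file: column 1 is m/z, column 2 is intensity.

    The comma delimiter is an assumption; pass another one if needed.
    """
    mz, intensity = [], []
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            try:
                x, y = float(row[0]), float(row[1])
            except ValueError:
                continue  # skip a header line, if one is present
            mz.append(x)
            intensity.append(y)
    return mz, intensity


def check_spectra(folder, expected_points=None, extension=".txt"):
    """Return a list of problems found; an empty list means all files agree."""
    problems = []
    reference_mz = None
    names = sorted(n for n in os.listdir(folder) if n.endswith(extension))
    for name in names:
        mz, _ = read_spectrum(os.path.join(folder, name))
        if expected_points is not None and len(mz) != expected_points:
            problems.append(
                f"{name}: {len(mz)} points, expected {expected_points}"
            )
        if reference_mz is None:
            reference_mz = mz  # the first file serves as the reference
        elif mz != reference_mz:
            problems.append(f"{name}: m/z vector differs from {names[0]}")
    return problems
```

For the data in this example, a call such as check_spectra("RawSpectra", expected_points=33885) should return an empty list if the extraction went as described.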