Data files (in the data subfolder):
Extensions: .txt = tab-separated file, .tsv = tab-separated file, .csv = comma-separated file; .gz indicates gzip compression.
| File Name | Description |
|---|---|
| Mamu_MHC_Class_1_SIV_Peptide_Intensities_From_Vendor.tsv.gz | Raw peptide data for the SIVMAC239 peptides as received from the vendor. Also includes peptides from other virus strains. Covers MHC A001, A002, B002, and B017. |
| GagCM9_TatSL8_Substitution_Array_From_Vendor.txt.gz | Raw peptide data as received from the vendor, used for the maturation plot for MHC A001. Includes substitution data for Gag CM9 and Tat SL8. |
| Mamu_MHC_Class_1_SIV_Peptide_metadata.tsv | Lookup table that translates vendor sample names to MHC names, with additional information about the samples. |
| SIVMAC239_CORR_KEY.csv.gz | Maps each SIVMAC239 probe sequence to its Sequence ID, Virus Name, Protein Name, and Position in the sequence. |
| IC50_log2_binding_score.csv.gz | Manually parsed table based on the results of SIV_MHC_Peptide_Array_Ranking_Pipeline.ipynb and publicly available IC50 values. Used as input for SIV_MHC_IC50_vs_Peptide_Array_ROC_Wilcoxin_Signed_Test.ipynb. |
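The extension conventions above are enough to load any of these files. The following is a minimal sketch using only the Python standard library, assuming each file has a single header row. The helper name `read_table` is hypothetical and not part of this repository; the notebooks may load the files differently (for example with pandas).

```python
import csv
import gzip
from pathlib import Path


def read_table(path):
    """Read a .txt/.tsv/.csv file, optionally gzip-compressed, into a list of dicts.

    Hypothetical helper: assumes the first row of each file is a header.
    """
    path = Path(path)
    suffixes = [s.lower() for s in path.suffixes]
    compressed = bool(suffixes) and suffixes[-1] == ".gz"
    # The format extension is the one just before ".gz" (or the last one if uncompressed).
    if compressed:
        fmt = suffixes[-2] if len(suffixes) >= 2 else ""
    else:
        fmt = suffixes[-1] if suffixes else ""
    if fmt in (".txt", ".tsv"):
        delimiter = "\t"
    elif fmt == ".csv":
        delimiter = ","
    else:
        raise ValueError(f"Unrecognized extension for {path.name}")
    # gzip.open in "rt" mode yields text, just like the built-in open.
    opener = gzip.open if compressed else open
    with opener(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))
```

For example, `read_table("data/Mamu_MHC_Class_1_SIV_Peptide_metadata.tsv")` would return one dict per row of the lookup table, keyed by the header names.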
Code:
Three scripts (Jupyter notebooks) aggregate, plot, and perform statistical tests on the data. These scripts are in the CODE subfolder.
| File Name | Description |
|---|---|
| SIV_MHC_Peptide_Array_Ranking_Pipeline.ipynb | Transforms and aggregates the raw peptide data by log-transforming the intensities and taking the median of the replicate peptide values. Next, the sample columns are renamed from the vendor naming convention to the corresponding MHC name. The data is then merged with the corresponding key to join each probe sequence to its sequence data. Finally, the data is ranked by aggregated intensity value, grouped by MHC name. |
| SIV_MHC_IC50_vs_Peptide_Array_ROC_Wilcoxin_Signed_Test.ipynb | Takes the aggregated data, tests it for normality, and chooses the statistical test accordingly. The final pipeline used the non-parametric, unpaired Wilcoxon rank-sum test. Also generates quantile-quantile plots, histograms, and box plots. |
| GagCM9_TatSL8_Substitution_Analysis.ipynb | Transforms and aggregates the raw peptide data as received from the vendor into a form suitable for statistical analysis and plotting. The script performs the substitutions based on the vendor's standard, then generates line plots and heatmaps of the data. |
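The ranking steps described for SIV_MHC_Peptide_Array_Ranking_Pipeline.ipynb (log transform, median of replicates, rename to MHC name, join to the probe key, rank within each MHC) can be sketched with the standard library as follows. This is an illustration under stated assumptions, not the notebook's code: the input shapes (a list of `(vendor_sample, probe_sequence, intensity)` records and two plain dicts), the function name `rank_peptides`, the use of log base 2, and "rank 1 = highest intensity" are all assumptions.

```python
import math
import statistics
from collections import defaultdict


def rank_peptides(records, vendor_to_mhc, probe_key):
    """Hypothetical sketch of the ranking pipeline (all input shapes are assumptions).

    records: iterable of (vendor_sample, probe_sequence, intensity) tuples,
        one per replicate spot; intensities must be strictly positive.
    vendor_to_mhc: dict mapping vendor sample names to MHC names.
    probe_key: dict mapping probe sequences to annotation dicts
        (e.g. Sequence ID, Protein Name, Position).
    Returns {mhc_name: [row, ...]} with rows sorted by rank (1 = highest).
    """
    # 1. Log-transform each replicate (base 2 assumed) and group replicates.
    replicates = defaultdict(list)
    for sample, probe, intensity in records:
        replicates[(sample, probe)].append(math.log2(intensity))

    # 2-4. Median of replicates, rename sample -> MHC name, join to the key.
    by_mhc = defaultdict(list)
    for (sample, probe), values in replicates.items():
        mhc = vendor_to_mhc[sample]
        row = {"MHC Name": mhc, "Probe Sequence": probe,
               "Intensity": statistics.median(values)}
        row.update(probe_key.get(probe, {}))  # probes missing from the key stay unannotated
        by_mhc[mhc].append(row)

    # 5. Rank by aggregated intensity within each MHC; ties keep input order
    #    because Python's sort is stable.
    for rows in by_mhc.values():
        rows.sort(key=lambda r: r["Intensity"], reverse=True)
        for rank, row in enumerate(rows, start=1):
            row["Rank"] = rank
    return dict(by_mhc)
```

Taking the median after the log transform, as the description orders the steps, gives the same ranking as the reverse order for odd replicate counts; with even counts the two can differ slightly because the median averages the two middle values.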
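The unpaired Wilcoxon rank-sum test used in the final statistical pipeline compares two independent groups through the ranks of their pooled values. Below is a minimal standard-library sketch using the large-sample normal approximation without tie correction; the notebook may instead rely on a library implementation, and what the two groups represent (for example, array scores of two IC50-defined peptide sets) is an assumption. The normality check is omitted. The AUC value is included only because the notebook name mentions ROC; it uses the standard identity AUC = U / (n1 * n2), where U is the Mann-Whitney statistic.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction).

    Returns (z, p_value, auc), where auc = U / (len(x) * len(y)) is the
    probability that a random value from x exceeds one from y (ties count 1/2).
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    u = w - n1 * (n1 + 1) / 2  # Mann-Whitney U for x
    return z, p_value, u / (n1 * n2)
```

For small groups or many ties an exact or tie-corrected version is preferable; this sketch is meant to show the mechanics of the test, not to replace a vetted statistics library.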
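A substitution heatmap of the kind produced by GagCM9_TatSL8_Substitution_Analysis.ipynb is typically a position-by-amino-acid grid. The sketch below enumerates every single-residue substitution of a peptide and fills such a grid with intensities normalized to the wild-type (unsubstituted) peptide. The vendor's substitution naming standard is not described here, so the input layout (a dict from variant sequence to intensity), the normalization to wild type, and the function names are all assumptions for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_substitutions(peptide):
    """Yield (position, residue, variant) for every single-residue substitution.

    position is 1-based; the wild-type residue at each position is skipped.
    """
    for i, wt in enumerate(peptide):
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield i + 1, aa, peptide[:i] + aa + peptide[i + 1:]


def substitution_grid(peptide, intensities):
    """Build a heatmap-ready grid: keys = amino acids, list index = position - 1.

    intensities: dict mapping peptide sequence -> intensity (assumed layout);
    it must contain the wild-type peptide itself.
    Each cell holds intensity(variant) / intensity(wild type); wild-type cells
    are 1.0 and variants absent from the data are None. Assumes the peptide
    uses only the 20 standard residues.
    """
    wt_value = intensities[peptide]
    grid = {aa: [None] * len(peptide) for aa in AMINO_ACIDS}
    for i, wt in enumerate(peptide):
        grid[wt][i] = 1.0
    for pos, aa, variant in single_substitutions(peptide):
        if variant in intensities:
            grid[aa][pos - 1] = intensities[variant] / wt_value
    return grid
```

For a 9-mer such as Gag CM9 (CTPYDINQM) this enumerates 9 × 19 = 171 variants; each row of the returned grid can be passed directly to a plotting library as one row of the heatmap.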
System Requirements