MHC genotyping of Indian rhesus macaques

Data

Macaque Exome Datasets

  • Exome sequences are available from the Sequence Read Archive (SRA) under BioProject PRJNA527214 and BioProject PRJNA529708.

Reference Sequences

  • The file IPD-Mamu-MHC-3Sept2018.gb.txt contains the Macaca mulatta (Mamu) sequences from the IPD database, downloaded on September 3, 2018. The Mamu entries were parsed using a simple Python script.
  • The file IPD_EXON2_9_3_2018.fasta contains sequences that have been trimmed to the length of the MiSeq amplicons and deduplicated. This IPD exon 2 reference file was used to compare genotyping results from the MiSeq and MES assays.
  • The file HLA_e2_e4_reference_sequences.fasta was used to enumerate MHC sequences and enrich MES reads prior to assembly.
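
The deduplication step mentioned for IPD_EXON2_9_3_2018.fasta can be illustrated with a minimal sketch. This is not the script used to build the reference file; it only assumes that trimmed sequences are considered duplicates when they are identical after case normalization. The function names and the three example records (including their allele labels and sequences) are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def deduplicate(records):
    """Map each distinct upper-cased sequence to the headers that share it."""
    groups = {}
    for header, seq in records:
        groups.setdefault(seq.upper(), []).append(header)
    return groups


# Hypothetical trimmed records; real input would be read from a FASTA file.
example = """>Mamu-A1*001:01
ACGTACGT
>Mamu-A1*001:02
acgtacgt
>Mamu-B*017:01
ACGTTTGT
""".splitlines()

groups = deduplicate(read_fasta(example))

# Write one record per distinct sequence, joining the names that collapsed.
for seq, headers in groups.items():
    print(">" + "|".join(headers))
    print(seq)
```

Joining the collapsed names in the header is one possible design choice: it keeps track of which entries became indistinguishable after trimming to the amplicon length.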

Sequence Analysis Tools

  • The zipped directory pcr_amplicon.zip contains the Jupyter notebook that runs the MiSeq genotyping analysis with the IPD_EXON2_9_3_2018.fasta reference file.
  • The zipped directory DSR_PUBLIC.zip contains the following files:
    • 22147_ALL.sub and genometyping_file.sm are the job submission and Snakemake files, respectively.
    • fastq_list.txt and baylor_10_11_samples.txt are representative sample sheets of sequences to be analyzed.
    • The text files HLA_e2_e4_reference_sequences.fasta and IPD_EXON2_9_3_2018.fasta are the FASTA reference files used in this analysis.
  • The zipped directory SAVAGE_Scripts.zip contains the following files:
    • count_reads.py counts the reads extracted with our reference file.
    • pivot_create.py creates a pivot table.
    • sample_sheet_27.txt is a representative sample sheet of sequences to be analyzed.
    • make_genometyping_env.sh and genometyping_file.sm create a job submission and a Snakemake job.
    • The text files beginning with 'HLA' are the FASTA files parsed from HLA_e2_e4_reference_sequences.fasta for HLA class I and HLA class II, and the text file IPD_EXON2_9_3_2018.fasta contains the trimmed and deduplicated exon 2 sequences.
  • The Python Data Analysis Library (pandas) was used for data analysis in all three methods discussed in the paper.
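
As an illustration of the kind of pandas pivot table referred to above, the following sketch summarizes read counts per allele and animal. It does not reproduce pivot_create.py; the column names, animal IDs, allele labels, and counts are all hypothetical, and it assumes the input is a long-format table with one row per observation.

```python
import pandas as pd

# Hypothetical long-format input: one row per (animal, allele) observation.
counts = pd.DataFrame({
    "animal": ["r01", "r01", "r02", "r02", "r02"],
    "allele": ["Mamu-A1*001", "Mamu-B*017", "Mamu-A1*001",
               "Mamu-A1*001", "Mamu-DRB1*03"],
    "reads": [120, 45, 80, 20, 33],
})

# Alleles as rows, animals as columns; repeated observations are summed
# and allele/animal combinations with no reads are filled with 0.
pivot = counts.pivot_table(
    index="allele",
    columns="animal",
    values="reads",
    aggfunc="sum",
    fill_value=0,
)

print(pivot)
```

A wide table of this shape makes it straightforward to compare, animal by animal, which alleles are supported by reads under each genotyping method.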

Supplementary Files

  • The file RheMac2.txt contains rheMac2 and HLA target coordinates that were used to prepare the combined minimal MHC and supplemental rhesus spike-in probe design.
  • The file RheMac8_VCRome_Rhexome2.ensembl.bed.txt contains the BED file of rhesus rheMac2 target coordinates lifted over to Mmul_8.0.1.
  • The Illumina Barcoded Paired-End Capture Library Preparation Protocol (2016) is the Baylor College of Medicine Human Genome Sequencing Center's protocol for capture library preparation.
  • The file Supplementary_Table_1_rev1.pdf contains the fraction of total exome sequence reads corresponding to MHC class I and class II genes after target capture with the VCRome2.1 probe design alone.
  • The file Supplemental_Figures_1_2.xlsx contains Supplementary Figures 1 and 2. Supplementary Figure 1 compares, for all 27 animals, genotyping results from MiSeq PCR amplicon sequencing with those from the whole exome genotyping assays using the DSR and SAVAGE strategies. Supplementary Figure 2 contains abbreviated Mamu-A, -B, and -DRB haplotype definitions for the animals that were evaluated in this study.
  • The file Supplementary_Fig_3_rev.pdf describes the formation of PCR chimeras during sequencing.
  • The Weatherall Report PDF describes regulations and guidelines outlined in the Animal Welfare Act, the Guide for the Care and Use of Laboratory Animals, and best ethical practices for scientific research involving animal subjects.