1. BC_edit_matrix.py¶
1.1. Description¶
This program generates heatmaps to visualize the positions (X-axis), type of edits/corrections (Y-axis, such as “C” to “T”) and frequencies (color) of error-corrected nucleotides in cell barcodes and UMIs.
1.2. Options¶
--version show program’s version number and exit -h, --help show this help message and exit -i IN_FILE, --infile=IN_FILE Input file in BAM foramt. -o OUT_FILE, --outfile=OUT_FILE The prefix of output files. --limit=READS_NUM Number of alignments to process. default=none --cr-tag=CR_TAG Tag of cellular barcode reported by the sequencer in BAM file. default=’CR’ --cb-tag=CB_TAG Tag of error-corrected cellular barcode in BAM file. default=’CB’ --ur-tag=UR_TAG Tag of UMI reported by the sequencer in BAM file. default=’UR’ --ub-tag=UB_TAG Tag of error-corrected UMI in BAM file. default=’UB’ --cell-width=CELL_WIDTH Points of cell width in the heatmap. default=15 --cell-height=CELL_HEIGHT Points of cell height in the heatmap. default=10 --font-size=FONT_SIZE Font size. If –display-num was set, fontsize_number = 0.8 * font_size. default=8 --angle=COL_ANGLE The angle (must be 0, 45, 90, 270, 315) of column text lables under the heatmap. default=45 --text-color=TEXT_COLOR The color of numbers in each cell. default=black --file-type=FILE_TYPE The file type of heatmap. Choose one of ‘pdf’, ‘png’, ‘tiff’, ‘bmp’, ‘jpeg’. default=pdf --verbose If set, detailed running information is printed to screen. --no-num If set, will not print numerical values to cells. default=False
1.3. Input file format¶
BAM file with the following tags:
- CB : cellular barcode sequence that is error-corrected
- CR : cellular barcode sequence as reported by the sequencer.
- UB : molecular barcode sequence that is error-corrected
- UR : molecular barcode sequence as reported by the sequencer.
1.4. Example (Visualize sample barcode)¶
$ python3 BC_edit_matrix.py -i normal_possorted_genome_bam.bam --limit 5000000 -o output
2020-09-30 08:59:21 [INFO] Reading BAM file "normal_possorted_genome_bam.bam" ...
2020-09-30 09:00:03 [INFO] Total alignments processed: 5000000
2020-09-30 09:00:03 [INFO] Number of alignmenets with <cell barcode> kept AS IS: 4876615
2020-09-30 09:00:03 [INFO] Number of alignmenets wiht <cell barcode> edited: 47377
2020-09-30 09:00:03 [INFO] Number of alignmenets with <cell barcode> missing: 76008
2020-09-30 09:00:03 [INFO] Number of alignmenets with UMI kept AS IS: 4973597
2020-09-30 09:00:03 [INFO] Number of alignmenets wiht UMI edited: 24842
2020-09-30 09:00:03 [INFO] Number of alignmenets with UMI missing: 1561
2020-09-30 09:00:03 [INFO] Writing cell barcode frequencies to "output.CB_freq.tsv"
2020-09-30 09:00:03 [INFO] Writing UMI frequencies to "output.UMI_freq.tsv"
2020-09-30 09:00:04 [INFO] Writing the nucleotide editing matrix (count) of cell barcode to "output.CB_edits_count.csv"
2020-09-30 09:00:04 [INFO] Writing the nucleotide editing matrix of molecular barcode (UMI) to "output.UMI_edits_count.csv"
2020-09-30 09:00:04 [INFO] Writing R code to "output.CB_edits_heatmap.r"
2020-09-30 09:00:04 [INFO] Displayed numerical values on heatmap
2020-09-30 09:00:04 [INFO] Numbers will be displayed on log2 scale
2020-09-30 09:00:04 [INFO] Running R script file "output.CB_edits_heatmap.r"
Loading required package: Matrix
Loading required package: SPAtest
Loading required package: pheatmap
2020-09-30 09:00:07 [INFO] Writing R code to "output.UMI_edits_heatmap.r"
2020-09-30 09:00:07 [INFO] Displayed numerical values on heatmap
2020-09-30 09:00:07 [INFO] Numbers will be displayed on log2 scale
2020-09-30 09:00:07 [INFO] Running R script file "output.UMI_edits_heatmap.r"
Loading required package: Matrix
Loading required package: SPAtest
Loading required package: pheatmap
1.5. out put files¶
- output.CB_edits_count.csv : editing matrix of cellular barcodes in CSV format.
- output.CB_freq.tsv : corrected cell barcodes and their frequencies.
- output.CB_edits_heatmap.pdf : heatmap showing the positions, types and frequencies of nucleotides that have been corrected.
- output.CB_edits_heatmap.r : R script for the above heatmap.
- output.UMI_edits_count.csv : editing matrix of UMIs in CSV format.
- output.UMI_freq.tsv : corrected UMIs and their frequencies.
- output.UMI_edits_heatmap.pdf : heatmap showing the positions, types and frequencies of nucleotides that have been corrected.
- output.UMI_edits_heatmap.r : R script for the above heatmap.
Three files were generated.
- I1.count_matrix.csv
- I1.logo.pdf
- I1logo.mean_centered.pdf
output.CB_edits_heatmap.pdf
output.UMI_edits_heatmap.pdf