1. Testing dataset¶

We use the same example dataset as used by 10X Genomics. Raw data (in BAM format) were downloaded from the NCBI Sequence Read Archive (SRA). The study was published in [1].

1.1. Download raw data¶

File_name	GSM_accession	SRR_accession	File_size	Treatment	MD5
C05.bam.1	GSM3308718	SRR7611046	70 Gb	normal	97b87c87b539e69dad7dcb04e8f03132
C07.bam.1	GSM3308720	SRR7611048	64 Gb	irradiated	064669deb6be22e5f82fe58679f7e394

1.2. Convert BAM to FASTQ¶

Download bamtofastq from here. Convert BAM into FASTQ files.

$ bamtofastq C05.bam.1 normal_dat
$ bamtofastq C07.bam.1 irradiated_dat

After this step, you will get two subdirectories (./normal_dat and ./irradiated_dat) under your current directory. And within ./normal_dat and ./irradiated_dat, there are subdirectories and fastq files, for example

$ cd ./normal_dat
$ tree
.
├── indepth_C05_MissingLibrary_1_HL5G3BBXX
│   ├── bamtofastq_S1_L003_I1_001.fastq.gz
│   ├── bamtofastq_S1_L003_I1_002.fastq.gz
│   ├── bamtofastq_S1_L003_R1_001.fastq.gz
│   ├── bamtofastq_S1_L003_R1_002.fastq.gz
│   ├── bamtofastq_S1_L003_R2_001.fastq.gz
│   ├── bamtofastq_S1_L003_R2_002.fastq.gz
│   ├── bamtofastq_S1_L004_I1_001.fastq.gz
│   ├── bamtofastq_S1_L004_I1_002.fastq.gz
│   ├── bamtofastq_S1_L004_I1_003.fastq.gz
│   ├── bamtofastq_S1_L004_I1_004.fastq.gz
│   ├── bamtofastq_S1_L004_I1_005.fastq.gz
│   ├── bamtofastq_S1_L004_I1_006.fastq.gz
│   ├── bamtofastq_S1_L004_R1_001.fastq.gz
│   ├── bamtofastq_S1_L004_R1_002.fastq.gz
│   ├── bamtofastq_S1_L004_R1_003.fastq.gz
│   ├── bamtofastq_S1_L004_R1_004.fastq.gz
│   ├── bamtofastq_S1_L004_R1_005.fastq.gz
│   ├── bamtofastq_S1_L004_R1_006.fastq.gz
│   ├── bamtofastq_S1_L004_R2_001.fastq.gz
│   ├── bamtofastq_S1_L004_R2_002.fastq.gz
│   ├── bamtofastq_S1_L004_R2_003.fastq.gz
│   ├── bamtofastq_S1_L004_R2_004.fastq.gz
│   ├── bamtofastq_S1_L004_R2_005.fastq.gz
│   └── bamtofastq_S1_L004_R2_006.fastq.gz
└── indepth_C05_MissingLibrary_1_HNNWNBBXX
    ├── bamtofastq_S1_L002_I1_001.fastq.gz
    ├── bamtofastq_S1_L002_I1_002.fastq.gz
    ├── bamtofastq_S1_L002_I1_003.fastq.gz
    ├── bamtofastq_S1_L002_I1_004.fastq.gz
    ├── bamtofastq_S1_L002_I1_005.fastq.gz
    ├── bamtofastq_S1_L002_R1_001.fastq.gz
    ├── bamtofastq_S1_L002_R1_002.fastq.gz
    ├── bamtofastq_S1_L002_R1_003.fastq.gz
    ├── bamtofastq_S1_L002_R1_004.fastq.gz
    ├── bamtofastq_S1_L002_R1_005.fastq.gz
    ├── bamtofastq_S1_L002_R2_001.fastq.gz
    ├── bamtofastq_S1_L002_R2_002.fastq.gz
    ├── bamtofastq_S1_L002_R2_003.fastq.gz
    ├── bamtofastq_S1_L002_R2_004.fastq.gz
    ├── bamtofastq_S1_L002_R2_005.fastq.gz
    ├── bamtofastq_S1_L003_I1_001.fastq.gz
    ├── bamtofastq_S1_L003_I1_002.fastq.gz
    ├── bamtofastq_S1_L003_R1_001.fastq.gz
    ├── bamtofastq_S1_L003_R1_002.fastq.gz
    ├── bamtofastq_S1_L003_R2_001.fastq.gz
    └── bamtofastq_S1_L003_R2_002.fastq.gz

1.3. Run CellRanger count workflow¶

Download cellranger and Mouse reference dataset from here

$ cellranger --version
cellranger 4.0.0

# run cellranger for normal sample
$ cd ./normal_dat
$ cellranger count  --id=normal        --transcriptome=/XYZ/CellRanger/refdata-gex-mm10-2020-A  --fastqs=./indepth_C05_MissingLibrary_1_HL5G3BBXX,./indepth_C05_MissingLibrary_1_HNNWNBBXX

# run cellranger for irradiated sample
$ cd ./irradiated_dat
$ cellranger count  --id=irradiated    --transcriptome=/XYZ/CellRanger/refdata-gex-mm10-2020-A  --fastqs=./indepth_C07_MissingLibrary_1_HL5G3BBXX,./indepth_C07_MissingLibrary_1_HNNWNBBXX

After each cellranger count workflow is finished successfully. Subdirectories normal and irradiated will be created, which contain the cellranger outputs. For example,

$ cd normal
$ ls -F
_cmdline     _invocation  _mrosource     _perf               _tags       _vdrkill
_filelist    _jobmode    normal.mri.tgz  SC_RNA_COUNTER_CS/  _timestamp  _versions
_finalstate  _log        outs/           _sitecheck          _uuid

Note

Replace /XYZ/ with the actual path on your system.

1.4. Run CellRanger aggr workflow¶

First, make the library.csv file. This CSV file has two columns which define the ID and the location of the molecule_info.h5 file from each run.

$ cat  library.csv

library_id,molecule_h5
normal,/ABC/normal_dat/normal/outs/molecule_info.h5
irradiated,/ABC/irradiated_dat/irradiated/outs/molecule_info.h5

Note

Replace /ABC/ with the actual path on your system.

Then, run cellranger aggr workflow. The cellranger aggr workflow aggregates outputs from multiple runs of the cellranger count workflow

$ cellranger aggr --id=aggr --csv=libraries.csv

After each cellranger aggr workflow is finished successfully. A subdirectory aggr will be created, which contain the cellranger outputs. For example,

$ cd aggr
$ ls -F
aggr.mri.tgz  _finalstate  _log        _perf                 _tags       _vdrkill
_cmdline      _invocation  _mrosource  SC_RNA_AGGREGATOR_CS/  _timestamp  _versions
_filelist     _jobmode    outs/       _sitecheck             _uuid

1.5. References¶

[1]

Ayyaz A, Kumar S, Sangiorgi B, Ghoshal B, Gosio J, Ouladan S, Fink M, Barutcu S, Trcka D, Shen J, Chan K, Wrana JL, Gregorieff A. Single-cell transcriptomes of the regenerating intestine reveal a revival stem cell. Nature. 2019 May;569(7754):121-125. doi: 10.1038/s41586-019-1154-y. Epub 2019 Apr 24. PMID: 31019301.