Tophat

I was bored, so I gave it a run. Heh.
Here is the RNA-Seq data I used:

Encode 12878 Cell Data 1
Encode 12878 Cell Data 2
The machine was nothing special: a 2010 27-inch iMac, Core i5 2.66GHz (4 cores), 16GB RAM.

[2012-05-08 08:36:58] Beginning TopHat run (v2.0.0)
-----------------------------------------------
[2012-05-08 08:36:58] Checking for Bowtie
		  Bowtie version:	 2.0.0.5
[2012-05-08 08:36:58] Checking for Samtools
		Samtools version:	 0.1.18.0
[2012-05-08 08:36:58] Checking for Bowtie index files
[2012-05-08 08:36:58] Checking for reference FASTA file
	Warning: Could not find FASTA file ./hg19/hg19.fa
[2012-05-08 08:36:58] Reconstituting reference FASTA file from Bowtie index
  Executing: /Users/suknamgoongold/RNASeq/bin/bowtie2-inspect ./hg19/hg19 > ./tophat_out/tmp/hg19.fa
[2012-05-08 08:39:42] Generating SAM header for ./hg19/hg19
	format:		 fastq
	quality scale:	 phred33 (default)
[2012-05-08 08:39:45] Preparing reads
	 left reads: min. length=76, count=112054045
	right reads: min. length=76, count=111968908
[2012-05-08 09:59:53] Mapping left_kept_reads against hg19 with Bowtie2 
[2012-05-08 17:05:01] Mapping left_kept_reads_seg1 against hg19 with Bowtie2 (1/3)
[2012-05-08 18:19:20] Mapping left_kept_reads_seg2 against hg19 with Bowtie2 (2/3)
[2012-05-08 19:36:24] Mapping left_kept_reads_seg3 against hg19 with Bowtie2 (3/3)
[2012-05-08 21:02:36] Mapping right_kept_reads against hg19 with Bowtie2 
[2012-05-09 01:29:24] Mapping right_kept_reads_seg1 against hg19 with Bowtie2 (1/3)
[2012-05-09 02:17:42] Mapping right_kept_reads_seg2 against hg19 with Bowtie2 (2/3)
[2012-05-09 02:52:55] Mapping right_kept_reads_seg3 against hg19 with Bowtie2 (3/3)
[2012-05-09 03:31:14] Searching for junctions via segment mapping
[2012-05-09 04:29:32] Retrieving sequences for splices
[2012-05-09 04:32:43] Indexing splices
[2012-05-09 04:39:53] Mapping left_kept_reads_seg1 against segment_juncs with Bowtie2 (1/3)
[2012-05-09 04:55:35] Mapping left_kept_reads_seg2 against segment_juncs with Bowtie2 (2/3)
[2012-05-09 05:12:01] Mapping left_kept_reads_seg3 against segment_juncs with Bowtie2 (3/3)
[2012-05-09 05:29:52] Joining segment hits
[2012-05-09 07:11:27] Mapping right_kept_reads_seg1 against segment_juncs with Bowtie2 (1/3)
[2012-05-09 07:19:18] Mapping right_kept_reads_seg2 against segment_juncs with Bowtie2 (2/3)
[2012-05-09 07:27:29] Mapping right_kept_reads_seg3 against segment_juncs with Bowtie2 (3/3)
[2012-05-09 07:36:20] Joining segment hits
[2012-05-09 08:24:39] Reporting output tracks
-----------------------------------------------
[2012-05-09 12:29:15] Run complete: 1 days 03:52:16 elapsed

27 hours 52 minutes -.-;;;
The final "Reporting output tracks" step occupied about 12GB of memory.
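
The timestamps in the log make it possible to see where the hours went, stage by stage. The following is a minimal Python sketch, not part of the original run: it assumes every stage line starts with a `[YYYY-MM-DD HH:MM:SS]` stamp, as in the log above, and treats the gap to the next stamped line as that stage's duration. The function name `stage_durations` is made up for illustration.

```python
import re
from datetime import datetime

# Matches lines such as "[2012-05-08 08:36:58] Checking for Bowtie"
STAMP = re.compile(r"^\s*\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s*(.+?)\s*$")

def stage_durations(log_text):
    """Return (stage label, seconds until the next stamped line) pairs."""
    events = []
    for line in log_text.splitlines():
        m = STAMP.match(line)
        if m:
            when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((when, m.group(2)))
    # The last stamped line has no successor, so it gets no duration.
    return [
        (label, int((nxt - when).total_seconds()))
        for (when, label), (nxt, _) in zip(events, events[1:])
    ]

# Three consecutive lines copied from the log above.
sample = """\
[2012-05-08 09:59:53] Mapping left_kept_reads against hg19 with Bowtie2
[2012-05-08 17:05:01] Mapping left_kept_reads_seg1 against hg19 with Bowtie2 (1/3)
[2012-05-08 18:19:20] Mapping left_kept_reads_seg2 against hg19 with Bowtie2 (2/3)
"""

for label, seconds in stage_durations(sample):
    hours, rest = divmod(seconds, 3600)
    print(f"{hours:2d}h {rest // 60:02d}m  {label}")
```

Feeding it the whole log (for example the `run.log` or `tophat.log` file from the output directory's `logs` folder, if it carries the same stamps) gives one line per stage.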

The result looks like this.

samtools index accepted_hits.bam

I ran this to build an index for the BAM file, then loaded it in IGV.

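Other notes

– During the bowtie2 alignment the memory footprint is about 3GB, but the final output step eats a huge amount of memory. The machine nearly turned into a zombie thrashing the swap file, but I revived it by killing almost every other job to free up memory.
– The stages that run multithreaded seem to be the bowtie alignment and the result output, more or less?
– Sigh. With specs like these, even RNA-Seq alignment is honestly a bit of a struggle. Would an 8-12 core Mac Pro with 32-64GB of RAM make it usable? (No money -.-;;)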
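
The memory figures in these notes were eyeballed during the run. One way to get a number after the fact, sketched here in Python under the assumption of a Unix-like system, is to launch the job from a small wrapper and read the peak resident set size of its child processes. The child command in the example is a stand-in that just allocates some memory, not a TopHat invocation, and `run_and_report_peak_rss` is a hypothetical helper name.

```python
import resource
import subprocess
import sys

def run_and_report_peak_rss(argv):
    """Run argv to completion; return (exit code, peak child RSS in MiB).

    RUSAGE_CHILDREN covers every child this process has waited for, and
    ru_maxrss is the largest single process among them, not a sum.
    """
    result = subprocess.run(argv)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return result.returncode, peak / divisor

# Stand-in workload: a child Python process that allocates about 50 MB.
code, peak_mib = run_and_report_peak_rss(
    [sys.executable, "-c", "x = bytearray(50_000_000)"]
)
print(f"exit={code} peak={peak_mib:.0f} MiB")
```

Because a pipeline like this one spawns several helper programs, the value reflects whichever single process grew largest, which is usually the figure that matters when deciding whether a machine has enough RAM.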

6 thoughts on “Tophat”

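  1. Sorry to leave a comment without asking. I'm studying bioinformatics on my own.
    Could I possibly get the code you used for the mapping above? Sorry.
    I've been struggling by myself and keep running into errors.
    My apologies if I've offended you.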

  2. Hello, I'm the person who asked the question above. Thanks to you, I managed to get tophat running.
    I ran it with this command:
    CMD="tophat -p 4 --GTF $GTF -r $Inner_Len --mate-std-dev $STD_Dev -o $OUT_DIR hg19 $Sample_fastq_left $Sample_fastq_right"

    It failed with an error, and no amount of googling turned up an answer.
    I was hoping you might be able to help. Sorry….

    [2012-07-27 19:58:13] Checking for Bowtie
    Bowtie version: 2.0.0.5
    [2012-07-27 19:58:13] Checking for Samtools
    Samtools version: 0.1.18.0
    [2012-07-27 19:58:13] Checking for Bowtie index files
    [2012-07-27 19:58:13] Checking for reference FASTA file
    [2012-07-27 19:58:13] Generating SAM header for hg19
    format: fastq
    quality scale: phred33 (default)
    [2012-07-27 19:58:17] Reading known junctions from GTF file
    [2012-07-27 19:58:22] Preparing reads
    left reads: min. length=101, count=23587855
    right reads: min. length=101, count=23603346
    [2012-07-27 20:21:15] Creating transcriptome data files..
    [2012-07-27 20:21:48] Building Bowtie index from genes.fa
    [2012-07-27 20:37:25] Mapping left_kept_reads against transcriptome genes with Bowtie2
    [2012-07-27 21:47:22] Mapping right_kept_reads against transcriptome genes with Bowtie2
    [2012-07-27 22:57:45] Converting left_kept_reads.m2g to genomic coordinates (map2gtf)
    [2012-07-27 23:08:15] Converting right_kept_reads.m2g to genomic coordinates (map2gtf)
    [2012-07-27 23:19:08] Resuming TopHat pipeline with unmapped reads
    [2012-07-27 23:20:05] Mapping left_kept_reads.m2g_um against hg19 with Bowtie2
    [2012-07-28 00:35:10] Mapping left_kept_reads.m2g_um_seg1 against hg19 with Bowtie2 (1/4)
    [2012-07-28 00:46:17] Mapping left_kept_reads.m2g_um_seg2 against hg19 with Bowtie2 (2/4)
    [2012-07-28 00:59:56] Mapping left_kept_reads.m2g_um_seg3 against hg19 with Bowtie2 (3/4)
    [2012-07-28 01:12:34] Mapping left_kept_reads.m2g_um_seg4 against hg19 with Bowtie2 (4/4)
    [2012-07-28 01:26:34] Mapping right_kept_reads.m2g_um against hg19 with Bowtie2
    [2012-07-28 02:45:57] Mapping right_kept_reads.m2g_um_seg1 against hg19 with Bowtie2 (1/4)
    [2012-07-28 03:00:08] Mapping right_kept_reads.m2g_um_seg2 against hg19 with Bowtie2 (2/4)
    [2012-07-28 03:16:25] Mapping right_kept_reads.m2g_um_seg3 against hg19 with Bowtie2 (3/4)
    [2012-07-28 03:32:19] Mapping right_kept_reads.m2g_um_seg4 against hg19 with Bowtie2 (4/4)
    [2012-07-28 03:49:55] Searching for junctions via segment mapping
    [FAILED]
    Error: segment-based junction search failed with err =-11
    Loading left segment hits…

    Just in case, here are the log files from the output directory...
    I really can't figure out what the problem is…

    [kyung@cipher tophat_output_1017]$ ls -lh
    total 16K
    -rw-rw-r-- 1 kyung kyung 72 2012-07-27 20:22 left_kept_reads.info
    drwxrwxr-x 2 kyung kyung 4.0K 2012-07-28 04:02 logs
    -rw-rw-r-- 1 kyung kyung 72 2012-07-27 20:33 right_kept_reads.info
    drwxrwxr-x 2 kyung kyung 4.0K 2012-07-28 04:02 tmp
    [kyung@cipher tophat_output_1017]$ cd logs
    [kyung@cipher logs]$ ls
    bowtie.left_kept_reads.fixmap.log g2f.err
    bowtie.left_kept_reads.m2g_um.fixmap.log g2f.out
    bowtie.left_kept_reads.m2g_um_seg1.fixmap.log gtf_juncs.log
    bowtie.left_kept_reads.m2g_um_seg2.fixmap.log m2g_left_kept_reads.m2g.err
    bowtie.left_kept_reads.m2g_um_seg3.fixmap.log m2g_left_kept_reads.m2g.out
    bowtie.left_kept_reads.m2g_um_seg4.fixmap.log m2g_right_kept_reads.m2g.err
    bowtie.right_kept_reads.fixmap.log m2g_right_kept_reads.m2g.out
    bowtie.right_kept_reads.m2g_um.fixmap.log prep_reads.log
    bowtie.right_kept_reads.m2g_um_seg1.fixmap.log run.log
    bowtie.right_kept_reads.m2g_um_seg2.fixmap.log segment_juncs.log
    bowtie.right_kept_reads.m2g_um_seg3.fixmap.log tophat.log
    bowtie.right_kept_reads.m2g_um_seg4.fixmap.log

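    • Everything up to aligning the reads with Bowtie went through fine, so the programs seem to be installed correctly; the error occurred in the junction search step.

      In my experience, the junction search step uses a very large amount of memory. I ran it on a system with 16GB of RAM, and it consumed so much memory that I had to kill every other job before it would proceed. I don't know the specs of the machine you're working on, but if it doesn't have enough RAM, that is worth checking as well.

      I'm afraid this isn't a definitive answer, but I hope it helps.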

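      • Thank you so much for the reply. I use a Mac too, but it's short on both cores and RAM,
        so I was running this on a server.
        I had been specifying the thread count with -p 4; when I left it at the default (1),
        it ran fine.
        Thank you so, so much. ^______^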
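
For anyone copying the command from this thread: TopHat's long options take two ASCII hyphens (`--GTF`, `--mate-std-dev`), and blog software often turns these into a single long dash. Below is a minimal Python sketch that only builds and prints such a command, with the thread count exposed as a parameter since leaving it at 1 is what got the run through here. The helper name `tophat_command`, the file names, and the numeric values are hypothetical placeholders, not the commenter's actual settings.

```python
import shlex

def tophat_command(index, left_fastq, right_fastq, gtf, out_dir,
                   inner_dist, std_dev, threads=1):
    """Build (but do not run) a paired-end TopHat command as an argument list."""
    return [
        "tophat",
        "-p", str(threads),              # number of threads
        "--GTF", gtf,                    # known gene annotation
        "-r", str(inner_dist),           # expected inner distance between mates
        "--mate-std-dev", str(std_dev),  # std. dev. of that inner distance
        "-o", out_dir,
        index, left_fastq, right_fastq,
    ]

cmd = tophat_command(
    index="hg19",
    left_fastq="sample_1.fastq",   # placeholder file name
    right_fastq="sample_2.fastq",  # placeholder file name
    gtf="genes.gtf",               # placeholder file name
    out_dir="tophat_out",
    inner_dist=50,                 # placeholder value
    std_dev=20,                    # placeholder value
)
print(shlex.join(cmd))
```

Keeping the command as a list rather than one quoted string avoids quoting problems with paths that contain spaces; the list can be passed to `subprocess.run` as-is once the placeholders are replaced.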

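  3. Haha, 27 hours for that data…
    Thank you.
    Now I have a rough idea of how long the RNA-seq data I'm about to run will take;;
    Yikes;;;;
    There's a power outage in a few days… this is going to be tight;;; lol
    16GB for that data is pretty scary..
    I'm setting things up on a 32GB machine, haha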
