Analysis on the AWS (Amazon Web Services) cloud is carried out in the following steps.
AWS is a cloud computing infrastructure vendor that lets you run your application on Amazon's computing resources at a relatively low price.
By running RNA-Seq analysis on the cloud, you can avoid the time and cost of establishing a de novo pipeline yourself.
To learn more about AWS, please visit the AWS website.
In order to run FX, you first need to create an Amazon account and sign up for the S3, EC2, and Elastic MapReduce services.
We strongly recommend that you read the product descriptions and pricing policies provided by AWS before you start running FX.
1. Create Your Project Directory on AWS S3
Create a bucket in the region nearest to you, or in the cheapest region, depending on your situation.
Create a new project directory. The directory name must be in lowercase letters.
Important: s3://<bucket-name>/<project-directory> becomes your project URL, which you will enter in the Web-UI.

Under your project folder, create a directory named “rawdata”.
Important: This is the location where your RNA sequence data will be uploaded.
Now, we are ready to upload sequences!
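The naming rule above can be sketched in shell. The bucket and project names here are hypothetical examples, and the s3cmd commands for the actual cloud-side creation are shown only as comments (they require configured credentials):

```shell
# Hypothetical bucket and project names -- substitute your own.
BUCKET="fx-test"
PROJECT="my_project"

# The project directory name must be lowercase; reject it otherwise.
if [ "$PROJECT" != "$(echo "$PROJECT" | tr 'A-Z' 'a-z')" ]; then
    echo "error: project directory name must be lowercase" >&2
    exit 1
fi

PROJECT_URL="s3://${BUCKET}/${PROJECT}"
echo "Project URL: $PROJECT_URL"

# With s3cmd configured, the layout would be created roughly as:
#   s3cmd mb "s3://${BUCKET}"
#   s3cmd put /dev/null "${PROJECT_URL}/rawdata/.placeholder"
```

This prints the project URL (here, s3://fx-test/my_project) that you will paste into the Web-UI in step 3.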
2. Uploading RNA Sequences

To upload your sequences, use the third-party tools recommended by AWS. Since AWS allows a maximum of 5 GB per data file per transfer,
we recommend that you gzip your FASTQ files before uploading. Make sure that one project entry holds one sample
(e.g., one project directory contains one rawdata directory and two FASTQ files representing pair 1 and pair 2 of one sample).
S3Fox Organizer for Firefox (http://www.s3fox.net/, Windows): a graphical tool; use it for manipulating small files.
s3cmd from s3tools (http://s3tools.org/s3tools, Linux): a command-line tool.
As an alternative, you may consider using AWS import/export service.
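The compress-then-check routine can be sketched as below. The file names are hypothetical stand-ins (tiny FASTQ files created on the fly so the sketch runs locally), and the actual upload command is left as a comment:

```shell
# Prepare a paired-end sample for upload: gzip each FASTQ file and
# verify it stays under the 5 GB per-file transfer limit.
WORKDIR=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' > "$WORKDIR/sample_1.fastq"
printf '@read1\nTGCA\n+\nIIII\n' > "$WORKDIR/sample_2.fastq"

LIMIT=$((5 * 1024 * 1024 * 1024))   # 5 GB per-file limit

for fq in "$WORKDIR"/sample_1.fastq "$WORKDIR"/sample_2.fastq; do
    gzip -f "$fq"
    size=$(wc -c < "$fq.gz")
    if [ "$size" -gt "$LIMIT" ]; then
        echo "error: $(basename "$fq").gz exceeds 5 GB" >&2
    else
        echo "ok: $(basename "$fq").gz ($size bytes)"
    fi
done

# Upload would then be, e.g.:
#   s3cmd put "$WORKDIR"/sample_1.fastq.gz s3://<bucket>/<project>/rawdata/
```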
3. Configure Analysis Options
Connect to FX on AWS, fill out the fields, and submit your job.
3.1 AWS Credentials

Enter your AWS credentials into the appropriate fields. These data are not stored on the server.
You can easily find your access key and secret access key under the Account - Security Credentials menu.
3.2 Project Directory

Enter the project URL you named in step 1.
For example, a bucket named fx-test with a project directory my_project gives the project URL s3://fx-test/my_project.
The project URL must contain at least a rawdata or an align_results directory as a child.
3.3 Define the Size of the Cloud

Define the size of the cloud on which you are going to run FX.
You will be charged a usage fee depending on the instance type and pricing you have chosen.
The total cost and elapsed time depend on the number and type of instances.
In the benchmark figure, the x-axis shows the number of instances, whereas the y-axis shows the total elapsed time in minutes.
This experiment was performed on sample AK14, generated on a Solexa GAIIx (paired-end), with a data size of 2.2-2.3 GB after gzip compression.
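Since you pay per instance-hour, a quick back-of-envelope estimate helps when choosing a cloud size. The rates, instance count, and elapsed time below are hypothetical examples, not current AWS pricing (check the AWS pricing page), and hourly rounding of billing is an assumption:

```shell
# Rough cost estimate: instances x hourly rate x billed hours.
INSTANCES=10
RATE_CENTS_PER_HOUR=40      # e.g. a large instance at a hypothetical $0.40/hour
ELAPSED_MINUTES=90

# Assume billing is rounded up to whole instance-hours.
HOURS=$(( (ELAPSED_MINUTES + 59) / 60 ))
TOTAL_CENTS=$(( INSTANCES * RATE_CENTS_PER_HOUR * HOURS ))
printf 'Estimated cost: $%d.%02d\n' $((TOTAL_CENTS / 100)) $((TOTAL_CENTS % 100))
```

With these example numbers, 90 minutes rounds up to 2 billed hours, giving 10 x $0.40 x 2 = $8.00. Adding instances shortens the elapsed time but raises the per-hour cost, which is the trade-off the benchmark figure illustrates.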
3.4 Configure Analysis Options

The default configuration options will work fine for Solexa paired-end RNA-seq data.
3.5 To Run Step by Step
Once your job has finished running, whether failed or completed, you may want to run specific steps again with different options.
You may change the filter conditions and re-run specific steps. When doing this, remember to rename the directories generated as output by each step.
For example, re-do base calling with a higher quality threshold (e.g., 30) and, from there, re-do the SNP/INDEL call.
Remove (or rename) the output folders of the steps you want to run again. The following table shows each step and its output directories.
Step                 | Output Directories
---------------------+-------------------------------------------------------
Preprocess           | preprocess
GSNAP Align          | align_results
Base Call            | base_call, base_sort, counters/TOTAL_ALIGNED*.txt, counters/TOTAL_*BASES.txt, counters/TOTAL_*READS.txt, counters/REF_N_REGION.txt
INDEL Call           | indel_call, indel_sort, indel, counters/TOTAL_INDELS.txt
SNP Call             | snp, counters/TOTAL_SNPS.txt
Expression Profiling | gene_count, bpkm_gene, counters/BPKM_COUNTS.txt
Select “step-by-step” and give the appropriate options. Now, submit your job and run again!
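The rename-before-rerun housekeeping can be sketched as below, mirrored on a local copy of the project tree so it runs anywhere; on S3 the same moves would be done with s3cmd or a graphical client. The `.run1` suffix is a hypothetical naming convention, not something FX requires:

```shell
# Clear the Base Call outputs before a re-run, keeping the old run inspectable.
PROJECT=$(mktemp -d)
mkdir -p "$PROJECT"/base_call "$PROJECT"/base_sort "$PROJECT"/counters
touch "$PROJECT"/counters/TOTAL_ALIGNED_READS.txt

# Rename rather than delete, so the previous results survive.
for d in base_call base_sort; do
    mv "$PROJECT/$d" "$PROJECT/${d}.run1"
done
mkdir -p "$PROJECT/counters.run1"
mv "$PROJECT"/counters/TOTAL_ALIGNED*.txt "$PROJECT/counters.run1/"

ls "$PROJECT"
```

After this, re-submitting the job in step-by-step mode regenerates base_call, base_sort, and the counters files from scratch.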
4. Monitor Your JobFlow
There are two ways to monitor JobFlows: one from the FX Web-UI, the other from the AWS Management Console.

From the FX Web-UI, go to the [Status] page to see your running job status. Submit your AWS credential keys, click a running JobFlow to monitor, and watch the job ‘Status’.
The job status is defined by AWS Elastic MapReduce and passes through the following states:
STARTING - (BOOTSTRAPPING) - RUNNING - SHUTTING_DOWN - COMPLETED/FAILED/KILLED
BOOTSTRAPPING appears when your job flow includes GSNAP alignment.

From the AWS Management Console, go to the Elastic MapReduce tab and watch your job ‘State’. You may set the Viewing filter to “Running” to see only your running jobs.

On your S3 home, you can find your JobFlow logs under the s3://your_bucket/project_dir/logs directory.
Important: to see the counting numbers at a glance, or any error messages, check logs/j-/steps/1/stdout or stderr.
Normally, when your job completes without errors, the stderr file should be 0 bytes in size.
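A quick post-run health check follows directly from the 0-byte rule. The sketch below runs on local copies of the log files (the paths and the sample counter line are hypothetical); in practice you would first download the step's stdout and stderr from S3:

```shell
# After downloading a step's logs, a zero-byte stderr indicates a clean run.
LOGDIR=$(mktemp -d)
printf 'TOTAL_SNPS 12345\n' > "$LOGDIR/stdout"   # example counter output
: > "$LOGDIR/stderr"                             # empty file = no errors

if [ -s "$LOGDIR/stderr" ]; then
    echo "step reported errors:"
    cat "$LOGDIR/stderr"
else
    echo "stderr is 0 bytes: step completed cleanly"
fi
```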
5. Download Results
Download the SNP, INDEL, and Expression Profiling results to your local machine and proceed with further analysis.

Result files are written as plain text files.
OUTPUT FORMAT
SNPs and INDELs are written in ANNOVAR input format for further annotation.
Expression Profiling output has the following columns:
gene, chr, gene_length, bpkm, counted_bases
SNP: snp/FX_chr*.snp.anv.in
INDEL: indel/FX_chr*.indel.anv.in
Expression Profiling: bpkm_gene/FX_chr*.bpkm
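Because the result files are plain text, they are easy to filter with standard tools. The sketch below selects genes above a BPKM cutoff from an Expression Profiling file; the column order comes from the format above, but the tab delimiter, gene names, and values are assumptions for illustration:

```shell
# Build a tiny stand-in Expression Profiling file.
# Columns: gene, chr, gene_length, bpkm, counted_bases (tab-separated assumed).
cd "$(mktemp -d)"
printf 'GENE_A\tchr1\t1500\t12.7\t19050\n'  > FX_chr1.bpkm
printf 'GENE_B\tchr1\t900\t0.4\t360\n'     >> FX_chr1.bpkm

# Print gene name and BPKM for genes at or above a BPKM of 1.0.
awk -F'\t' '$4 >= 1.0 { print $1, $4 }' FX_chr1.bpkm
```

The same pattern works for the SNP and INDEL files, which can also be fed directly to ANNOVAR for annotation.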