Describe the bug
For pre-ranked GSEA, it appears that genes are ranked on fold change alone. This is not appropriate, because fold change ignores statistical significance and therefore does not reflect how distinct the groups are in the differential expression analysis. As described in the GSEA paper and in many forums, genes should be ranked on a statistic that captures both the direction of the fold change AND the degree of the difference. One commonly used metric is the t-statistic that limma outputs. Another option that I have used is -log10(P-value), with the sign (+ or -) set to match that of the fold change. Ranking on this statistic puts the most distinct up-regulated gene first and the most distinct down-regulated gene last, which is ideally what pre-ranked GSEA expects as input.
Along these same lines, it would be extremely valuable to include the raw P-values in the downloads available on the DEG1 tab. As it is now, we can only download the gene, fold change, and FDR, as well as the log2 normalized expression values. Raw P-values are needed for many downstream analyses.
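To make the request concrete, here is a minimal Python sketch of the signed -log10(P-value) ranking described above. It is only an illustration, not iDEP's code: the function names, the `min_p` floor, and the gene names and numbers are all made up for the example, and it assumes a raw P-value is available for each gene.

```python
import math

def signed_log_p(log2_fc, p_value, min_p=1e-300):
    """Return -log10(P) carrying the sign of the log2 fold change.

    min_p is a hypothetical floor so that P = 0 does not break log10.
    """
    score = -math.log10(max(p_value, min_p))
    if log2_fc == 0:
        return 0.0  # no direction, so the gene sits in the middle of the list
    return math.copysign(score, log2_fc)

def rank_genes(deg_rows):
    """Sort (gene, log2_fc, p_value) rows from most significantly
    up-regulated to most significantly down-regulated."""
    scored = [(gene, signed_log_p(fc, p)) for gene, fc, p in deg_rows]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Made-up example rows: (gene, log2 fold change, raw P-value)
deg_rows = [
    ("GeneA", 3.0, 0.20),   # largest fold change, but not significant
    ("GeneB", 1.2, 1e-8),   # modest fold change, highly significant
    ("GeneC", -0.8, 1e-6),  # small negative fold change, highly significant
    ("GeneD", -2.5, 0.04),  # large negative fold change, weakly significant
]

ranked = rank_genes(deg_rows)
for gene, score in ranked:
    print(f"{gene}\t{score:.3f}")
```

With these made-up numbers, ranking on fold change alone would put GeneA first and GeneD last, while the signed statistic puts GeneB first and GeneC last, because those are the genes with the strongest evidence in each direction.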
Copied from https://github.com/iDEP-SDSU/idep/issues/70