---
title: "Gene Expression Analysis"
subtitle: "Human Dermal Fibroblasts, neonatal (HDFn) | GSE32581 | Necroptosis, Ferroptosis & Pyroptosis"
author: 
  - Mark Edward M. Gonzales^[De La Salle University, Manila, Philippines, gonzales.markedward@gmail.com]
  - Dr. Anish M.S. Shrestha^[De La Salle University, Manila, Philippines, anish.shrestha@dlsu.edu.ph]
output: html_notebook
---

## Preliminaries

### Loading libraries

```{r, warning=FALSE, message=FALSE}
library("tidyverse")
library("tibble")
library("msigdbr")
library("ggplot2")
library("ensembldb")
library("purrr")
library("magrittr")
library("matrixStats")
library("dplyr")
library("grex")
library("gplots")
library("RColorBrewer")
library("illuminaHumanv4.db")
```

### Constants

```{r}
DATA_DIR <- "data/HDFn/"
```

## Loading the Expression Data

The expression data are taken from this study: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE32581

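Download the normalized expression table for sample GSM807459 from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?view=data&acc=GSM807459&id=28440&db=GeoDb_blob67 and save it as `GSE32581-hdfn.tsv` in `DATA_DIR`. The table is indexed by Illumina probe ID (`ID_REF`), so these are microarray expression values rather than RNA-seq counts.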

```{r}
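# Read the sample table; the first column (probe IDs) becomes the row names
hdfn.expression <- read.delim(paste0(DATA_DIR, "GSE32581-hdfn.tsv"), as.is = TRUE, header = TRUE, row.names = 1)
# Move the probe IDs back into a regular column named ID_REF
hdfn.expression <- rownames_to_column(hdfn.expression, "ID_REF")
# Keep only the probe ID and the normalized expression value
hdfn.expression <- hdfn.expression[c("ID_REF", "VALUE")]
hdfn.expression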
```
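As a quick sanity check on the loading step, the toy sketch below reads a small tab-separated table shaped like a GEO sample table and keeps only the `ID_REF` and `VALUE` columns. The probe IDs, values, and the extra `Detection.Pval` column are invented for illustration; the real file may carry different extra columns.

```r
# Toy stand-in for a GEO sample table (tab-separated); all values are invented.
toy_tsv <- paste(
  "ID_REF\tVALUE\tDetection.Pval",
  "ILMN_0000001\t7.25\t0.001",
  "ILMN_0000002\t10.5\t0.000",
  "ILMN_0000003\t6.75\t0.210",
  sep = "\n"
)

# Read the probe IDs as an ordinary column rather than as row names.
toy_expression <- read.delim(text = toy_tsv, header = TRUE, as.is = TRUE)

# Keep only the two columns used downstream.
toy_expression <- toy_expression[c("ID_REF", "VALUE")]

# Basic checks: probe IDs are unique and the values are numeric.
stopifnot(!anyDuplicated(toy_expression$ID_REF))
stopifnot(is.numeric(toy_expression$VALUE))
toy_expression
```

Reading `ID_REF` directly as a column is one possible simplification; it yields the same two-column table as the row-name round trip.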

Map the Illumina probe IDs to Ensembl gene IDs.

```{r}
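# One row per probe-gene pair; unlist() alone would append digits to the names of multi-mapped probes
probe_map <- mget(x = hdfn.expression[["ID_REF"]], envir = illuminaHumanv4ENSEMBL)
illumina_to_ensembl <- data.frame(ID_REF = rep(names(probe_map), lengths(probe_map)), gene_id = unlist(probe_map, use.names = FALSE))
illumina_to_ensembl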
```
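One subtlety when flattening a named list of probe-to-gene mappings: `unlist()` appends a numeric suffix to the name of every entry that holds more than one value, so those probe IDs no longer match `ID_REF` in a later join. The toy sketch below (invented IDs, base R only) shows the renaming and one way to keep the names intact.

```r
# Toy probe-to-gene mapping; IDs are invented. probe_B maps to two genes
# and probe_C has no mapping.
toy_map <- list(
  probe_A = "ENSG_toy_1",
  probe_B = c("ENSG_toy_2", "ENSG_toy_3"),
  probe_C = NA_character_
)

# unlist() renames multi-valued entries: probe_B becomes probe_B1 and probe_B2.
names(unlist(toy_map))

# Repeating each name by the length of its entry keeps the probe IDs intact.
toy_long <- data.frame(
  ID_REF = rep(names(toy_map), lengths(toy_map)),
  gene_id = unlist(toy_map, use.names = FALSE),
  stringsAsFactors = FALSE
)
toy_long
```

With this layout a probe that maps to several genes contributes one row per gene, and an unmapped probe keeps a single row with `NA`.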

```{r}
# Attach the Ensembl gene ID to each probe; name the join key explicitly
hdfn.expression <- left_join(hdfn.expression, illumina_to_ensembl, by = "ID_REF")
hdfn.expression
```

## Exploratory Data Analysis

We load the gene sets from RCDdb: https://pubmed.ncbi.nlm.nih.gov/39257527/

```{r}
RCDdb <- "data/RCDdb/"
```

### Necroptosis

Load the gene set.

```{r}
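genes <- read.csv(paste0(RCDdb, "Necroptosis.csv"))
# Normalize the Ensembl IDs with grex's cleanid()
genes$gene_id <- cleanid(genes$gene_id)
# Keep one row per Ensembl ID and drop entries without an ID
genes <- distinct(genes, gene_id, .keep_all = TRUE)
genes <- subset(genes, gene_id != "")
genes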
```
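The cleaning steps can be sketched on a toy gene set. This assumes that `cleanid()` strips the version suffix from Ensembl IDs; the base-R `sub()` call below is only a stand-in for that behavior, not the package's implementation, and all symbols and IDs are invented.

```r
# Toy gene set; symbols and IDs are invented for illustration.
toy_genes <- data.frame(
  gene = c("GENE1", "GENE1_dup", "GENE2", "GENE3"),
  gene_id = c("ENSG00000000001.4", "ENSG00000000001.7", "ENSG00000000002.1", ""),
  stringsAsFactors = FALSE
)

# Stand-in for the cleaning step: drop everything from the first dot onward.
toy_genes$gene_id <- sub("\\..*$", "", toy_genes$gene_id)

# Keep the first row for each ID, then drop rows without an ID.
toy_genes <- toy_genes[!duplicated(toy_genes$gene_id), ]
toy_genes <- toy_genes[toy_genes$gene_id != "", ]
toy_genes
```

Two versions of the same ID collapse into one row after cleaning, which is why the de-duplication comes after the cleaning step.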

Get the normalized expression data for the genes in the gene set.

```{r}
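# Keep only the probes whose Ensembl ID belongs to the gene set
tpm.df <- hdfn.expression %>% dplyr::filter(gene_id %in% genes$gene_id)
# Attach the gene symbol for each Ensembl ID
tpm.df <- left_join(tpm.df, genes %>% dplyr::select(gene_id, gene), by = c("gene_id" = "gene_id"))
# Several probes can map to one gene; keep the first probe listed for each symbol
tpm.df <- distinct(tpm.df, gene, .keep_all = TRUE)
# Use the gene symbols as row names, drop the ID columns, and sort alphabetically
rownames(tpm.df) <- tpm.df$gene
tpm.df <- subset(tpm.df, select = -c(gene_id, ID_REF, gene))
tpm.df <- tpm.df[order(row.names(tpm.df)), , drop = FALSE]
tpm.df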
```
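The same filter, annotate, and de-duplicate pattern is repeated for each gene set, so it could be wrapped in a small function. The sketch below is a hypothetical helper written in base R, not part of the notebook. It differs in one deliberate way: when several probes map to the same gene, it keeps the probe with the alphabetically first `ID_REF` rather than the first one in file order. The toy data are invented.

```r
# Hypothetical helper (not part of the notebook): expression of gene-set members,
# one row per gene symbol, sorted alphabetically.
extract_gene_set <- function(expression, gene_set) {
  # Inner join on gene_id keeps only probes whose gene is in the set.
  merged <- merge(expression, gene_set[c("gene_id", "gene")], by = "gene_id")
  # Sort so that "first probe per gene" is reproducible.
  merged <- merged[order(merged$gene, merged$ID_REF), ]
  merged <- merged[!duplicated(merged$gene), ]
  data.frame(VALUE = merged$VALUE, row.names = merged$gene)
}

toy_expr <- data.frame(
  ID_REF = c("p1", "p2", "p3", "p4"),
  VALUE = c(5.5, 8.25, 6, 9),
  gene_id = c("G2", "G1", "G1", "G9"),
  stringsAsFactors = FALSE
)
toy_set <- data.frame(
  gene_id = c("G1", "G2"),
  gene = c("BETA", "ALPHA"),
  stringsAsFactors = FALSE
)
extract_gene_set(toy_expr, toy_set)
```

Here probe `p4` is dropped because `G9` is not in the set, and `BETA` keeps the value of `p2` because `p2` sorts before `p3`.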

Plot the results.

**NOTE: `VALUE` and `VALUE1` are identical. The duplicate column is only a workaround, since `heatmap.2` requires a matrix with at least two rows and two columns.**

```{r, fig.height=30, fig.width=10}
# Duplicate the single expression column so the matrix has two columns
tpm.df[["VALUE1"]] <- tpm.df[["VALUE"]]
tpm.matrix <- as.matrix(tpm.df)
# Draw an annotated color grid: no clustering, dendrograms, trace lines, or color key
heatmap.2(tpm.matrix,
  srtCol = 360,
  cellnote = tpm.matrix,
  dendrogram = "none", Colv = FALSE, Rowv = FALSE,
  col = brewer.pal(n = 9, name = "BuPu")[5:9], trace = "none", key = FALSE, lwid = c(0.1, 4), lhei = c(0.1, 4),
  cexCol = 1, cexRow = 0.75, symm = TRUE
)
```
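The column-duplication workaround can be checked on a toy table without drawing anything. The sketch below uses invented gene names and values and only base R; it builds the two-column matrix and confirms that the second column adds no information.

```r
# Toy one-column expression table; gene names and values are invented.
toy_df <- data.frame(
  VALUE = c(7.5, 9, 6.25),
  row.names = c("GENE_A", "GENE_B", "GENE_C")
)

# Duplicate the single column so the matrix is 3 x 2.
toy_matrix <- cbind(VALUE = toy_df$VALUE, VALUE1 = toy_df$VALUE)
rownames(toy_matrix) <- rownames(toy_df)

# Both columns carry the same numbers.
stopifnot(identical(toy_matrix[, "VALUE"], toy_matrix[, "VALUE1"]))
dim(toy_matrix)
```

Because the two columns are identical, each gene's pair of cells in the resulting heatmap shows the same color and the same annotated value.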

### Ferroptosis

Load the gene set.

```{r}
genes <- read.csv(paste0(RCDdb, "Ferroptosis.csv"))
genes$gene_id <- cleanid(genes$gene_id)
genes <- distinct(genes, gene_id, .keep_all = TRUE)
genes <- subset(genes, gene_id != "")
genes
```

Get the normalized expression data for the genes in the gene set.

```{r}
tpm.df <- hdfn.expression %>% dplyr::filter(gene_id %in% genes$gene_id)
tpm.df <- left_join(tpm.df, genes %>% dplyr::select(gene_id, gene), by = c("gene_id" = "gene_id"))
tpm.df <- distinct(tpm.df, gene, .keep_all = TRUE)
rownames(tpm.df) <- tpm.df$gene
tpm.df <- subset(tpm.df, select = -c(gene_id, ID_REF, gene))
tpm.df <- tpm.df[order(row.names(tpm.df)), , drop = FALSE]
tpm.df
```

Plot the results.

**NOTE: `VALUE` and `VALUE1` are identical. The duplicate column is only a workaround, since `heatmap.2` requires a matrix with at least two rows and two columns.**

```{r, fig.height=150, fig.width=10}
tpm.df[["VALUE1"]] <- tpm.df[["VALUE"]]
tpm.matrix <- as.matrix(tpm.df)
heatmap.2(tpm.matrix,
  srtCol = 360,
  cellnote = tpm.matrix,
  dendrogram = "none", Colv = FALSE, Rowv = FALSE,
  col = brewer.pal(n = 9, name = "BuPu")[5:9], trace = "none", key = FALSE, lwid = c(0.1, 4), lhei = c(0.1, 4),
  cexCol = 1, cexRow = 0.75, symm = TRUE
)
```

### Pyroptosis

Load the gene set.

```{r}
genes <- read.csv(paste0(RCDdb, "Pyroptosis.csv"))
genes$gene_id <- cleanid(genes$gene_id)
genes <- distinct(genes, gene_id, .keep_all = TRUE)
genes <- subset(genes, gene_id != "")
genes
```

Get the normalized expression data for the genes in the gene set.

```{r}
tpm.df <- hdfn.expression %>% dplyr::filter(gene_id %in% genes$gene_id)
tpm.df <- left_join(tpm.df, genes %>% dplyr::select(gene_id, gene), by = c("gene_id" = "gene_id"))
tpm.df <- distinct(tpm.df, gene, .keep_all = TRUE)
rownames(tpm.df) <- tpm.df$gene
tpm.df <- subset(tpm.df, select = -c(gene_id, ID_REF, gene))
tpm.df <- tpm.df[order(row.names(tpm.df)), , drop = FALSE]
tpm.df
```

Plot the results.

**NOTE: `VALUE` and `VALUE1` are identical. The duplicate column is only a workaround, since `heatmap.2` requires a matrix with at least two rows and two columns.**

```{r, fig.height=20, fig.width=10}
tpm.df[["VALUE1"]] <- tpm.df[["VALUE"]]
tpm.matrix <- as.matrix(tpm.df)
heatmap.2(tpm.matrix,
  srtCol = 360,
  cellnote = tpm.matrix,
  dendrogram = "none", Colv = FALSE, Rowv = FALSE,
  col = brewer.pal(n = 9, name = "BuPu")[5:9], trace = "none", key = FALSE, lwid = c(0.1, 4), lhei = c(0.1, 4),
  cexCol = 1, cexRow = 0.75, symm = TRUE
)
```