Microarray Data Analysis with PATIKAmad

Microarray technology measures the expression levels of thousands of genes in a cell simultaneously under different conditions. Microarrays for new species continue to emerge and existing ones continue to improve, so more and more microarray experiments are being performed.

PATIKAweb has a comprehensive microarray data analysis component named PATIKAmad, implemented as an applet and integrated into its powerful visualization environment.

Loading Data

PATIKAmad has a native expression data file format (.pmad) and supports conversion of tab-delimited data files into it. This XML-based format contains information about the experiments (descriptions and the values to be analyzed). Previously created .pmad files can be loaded directly into PATIKAmad. Upon loading, all visualized objects are associated with the first experiment in the loaded data file. As the pathway model changes, for instance on new queries, the loaded data is associated with any newly introduced objects whenever possible.

Figure 1. The Microarray Data Conversion Wizard, with the reference file parsed and some reference columns mapped. The “Key to Data File(s)” item should be mapped to the internal ID column that the data files use.
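The role of the key column can be illustrated with a minimal sketch that joins a tab-delimited reference file to a tab-delimited data file on a shared ID column. This is not the conversion wizard's implementation and does not produce a .pmad file; the column names ("ID", "Gene Symbol", "Exp1", "Exp2"), the sample rows and the helper functions are all hypothetical.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-delimited text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def join_by_key(reference_rows, data_rows, key):
    """Attach reference columns to each data row that shares the key value.

    Data rows whose key has no match in the reference file are skipped.
    """
    refs = {row[key]: row for row in reference_rows}
    joined = []
    for row in data_rows:
        ref = refs.get(row[key])
        if ref is not None:
            joined.append({**ref, **row})
    return joined


# Made-up sample content; real files would come from the array platform.
reference_tsv = "ID\tGene Symbol\nprobe_1\tGENE_A\nprobe_2\tGENE_B\n"
data_tsv = "ID\tExp1\tExp2\nprobe_1\t7.2\t6.9\nprobe_3\t5.0\t5.1\n"

joined = join_by_key(read_tsv(reference_tsv), read_tsv(data_tsv), "ID")
```

Here only the row for "probe_1" survives the join, because it is the only ID present in both files.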

Visualization of Microarray Data on Pathways

In PATIKAmad it is possible to map microarray data onto bioentities in a bioentity view and simple states in a mechanistic view. This mapping can be visualized by color-coding and/or labeling the view objects.

The following pathway views show loaded microarray data mapped onto the pathway model objects.

Figure 2. Sample microarray data visualized (with both labels and color coding) on mechanistic (left) and bioentity (right) views of sample pathway models

Configuring Visualization Settings

Visualization options for microarray data can be configured using the "View Settings Dialog" in PATIKAmad. You can specify any number of colors and corresponding values for color-coding; a value that falls between two specified values is displayed with a color computed between the two corresponding colors. Optionally, values can also be displayed as labels on the related pathway view objects.

Figure 3. View Settings Dialog
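As an illustration of how in-between colors might be computed, the sketch below linearly interpolates RGB components between the two nearest configured values. Linear interpolation, clamping outside the configured range, and the function name interpolate_color are assumptions made for this example; PATIKAmad may compute colors differently.

```python
from bisect import bisect_right


def interpolate_color(value, stops):
    """Return an (r, g, b) tuple for value from (value, color) stops.

    Assumes linear interpolation between neighbouring stops; values
    outside the configured range take the nearest end color.
    """
    stops = sorted(stops)
    values = [v for v, _ in stops]
    if value <= values[0]:
        return stops[0][1]
    if value >= values[-1]:
        return stops[-1][1]
    # Index of the first stop whose value is strictly greater than value.
    hi = bisect_right(values, value)
    (v0, c0), (v1, c1) = stops[hi - 1], stops[hi]
    t = (value - v0) / (v1 - v0)
    return tuple(round(a + t * (b - a)) for a, b in zip(c0, c1))


# Hypothetical setting: blue for -2, white for 0, red for +2.
stops = [(-2.0, (0, 0, 255)), (0.0, (255, 255, 255)), (2.0, (255, 0, 0))]
mid_color = interpolate_color(0.0, stops)
high_color = interpolate_color(5.0, stops)
```

With these stops, a value of 0.0 maps to white and any value above 2.0 is clamped to red.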

Microarray Data Management

When multiple experiments are loaded into PATIKAmad, the user may choose which group of experiments is to be averaged, or which two groups of experiments are to be compared. This is done using the "Microarray Data Management Dialog".

Figure 4. The Microarray Data Management Dialog
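The two operations can be sketched for a single data row as follows. The averaging is a plain arithmetic mean; the comparison is assumed here to be the difference of the two group means, which is only one possibility (a ratio or log ratio would be equally plausible), since the exact calculation is not stated in this document. The function names are hypothetical.

```python
from statistics import mean


def average_group(row, group):
    """Mean of the row's values over the experiment indices in group."""
    return mean(row[i] for i in group)


def compare_groups(row, group_a, group_b):
    """Difference of the two group means (an assumed comparison measure)."""
    return average_group(row, group_a) - average_group(row, group_b)


# One row with values for four experiments (made-up numbers).
row = [1.0, 3.0, 5.0, 9.0]
control, treated = [0, 1], [2, 3]

averaged = average_group(row, control)
compared = compare_groups(row, treated, control)
```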

Querying with Values Table

The values table displays the rows of the loaded microarray data. It color-codes the experiments to be averaged and/or compared, and a separate column shows the calculated values that are displayed on the graph. Rows can be sorted by value and/or filtered by their references.

Figure 5. Values Table

Selected rows may be used for querying their representative objects in the database. Alternatively, a neighborhood or graph-of-interest query related to the selected rows may be run.

Figure 6. Querying Using Selections in Values Table

Graph-of-Interest Query using Significant Microarray Values

A graph-of-interest query may be used for discovering links between significantly expressed (or differentially expressed) nodes. The user needs to state the criteria of significance, the limit on the length of the paths to search between nodes, and the type of the result graph.

Figure 7. Graph-of-Interest Query Dialog
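One way to read such a query is sketched below: nodes whose absolute value meets a threshold are treated as significant, and the result contains every node that lies on a path of at most the given length between two significant nodes. This is an interpretation for illustration only, not the PATIKA query algorithm; the undirected adjacency-dict graph, the absolute-value threshold and the function names are assumptions.

```python
from collections import deque


def bfs_distances(graph, source, limit):
    """Breadth-first distances from source, exploring at most limit steps."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == limit:
            continue  # do not expand beyond the path length limit
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def graph_of_interest(graph, values, threshold, limit):
    """Nodes on paths of length <= limit between two significant nodes."""
    seeds = [n for n, v in values.items() if abs(v) >= threshold and n in graph]
    reach = {}  # node -> distances from each seed that reaches it
    for seed in seeds:
        for node, d in bfs_distances(graph, seed, limit).items():
            reach.setdefault(node, []).append(d)
    # Keep a node when its two nearest seeds are, together, within limit steps.
    return {n for n, ds in reach.items()
            if len(ds) >= 2 and sum(sorted(ds)[:2]) <= limit}


# Made-up undirected graph: chain A-B-C-D-E with an extra node F next to A.
graph = {
    "A": ["B", "F"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["D"], "F": ["A"],
}
values = {"A": 2.5, "B": 0.1, "C": -0.2, "D": -3.0, "E": 0.3, "F": 0.0}

result = graph_of_interest(graph, values, threshold=2.0, limit=3)
```

In this example A and D are significant and three steps apart, so the result contains A, B, C and D, while E and F are left out because they do not lie between the two.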

Cluster Analysis

The purpose of microarray cluster analysis is to group genes on the basis of similarity/dissimilarity of their expression profiles. As microarray experiments become more widely used, they depend more and more on cluster analysis and other biostatistical methods, since it is almost impossible to make sense of the expression profiles of thousands of genes manually. Cluster analysis of microarray data has already demonstrated great potential for disease identification, for finding genes responsible for specific diseases, and for drug discovery.

PATIKAmad is unique in its integrated tools to perform cluster analysis and visualize the results as partitioned pathways.

Prior to performing cluster analysis, raw microarray data needs to be converted to the native format (".pmad") and loaded as described earlier. Below is a step-by-step illustration of performing cluster analysis and visualizing its results as a pathway.

Filtering, Normalizing and Clustering Microarray Data

We assume a basic understanding of normalization and clustering methods. For this illustration, we will use the "GDS170" data set downloaded from NCBI's GEO database (more information about this data set). We assume the GDS170 data set (local dataset and local platform file) has been converted into the PATIKA Microarray Data (".pmad") format and loaded into PATIKAmad prior to this analysis. A previously converted version can be found here.

The loaded data can be normalized and clustered using either k-means or hierarchic clustering methods through the "Cluster Analysis Dialog". As an example, we use hierarchic clustering with the parameters "Euclidian Distance", "Average Linkage" and "3 clusters" (see Figure 8). We also filter out the lowest 10 percent of genes according to variance rank. Since the GDS170 experiments were conducted on Affymetrix GPL80 oligonucleotide arrays, we also select the corresponding array type.

Figure 8. Sample clustering parameters used for hierarchic clustering of GDS170 data set
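For readers who want to see what these parameters mean in practice, the following is a minimal, unoptimized sketch of variance filtering followed by agglomerative clustering with Euclidean distance and average linkage. It is not the PATIKA server's implementation, it omits normalization, and the function names and the tiny made-up data set are assumptions for illustration.

```python
from math import dist
from statistics import pvariance


def filter_low_variance(profiles, fraction=0.10):
    """Drop the given fraction of genes with the lowest expression variance."""
    ranked = sorted(profiles, key=lambda gene: pvariance(profiles[gene]))
    drop = int(len(ranked) * fraction)
    return {gene: profiles[gene] for gene in ranked[drop:]}


def average_linkage_clusters(profiles, n_clusters=3):
    """Merge the two closest clusters until n_clusters remain.

    Cluster distance is the mean Euclidean distance over all pairs of
    members (average linkage).
    """
    clusters = [[gene] for gene in profiles]

    def linkage(a, b):
        total = sum(dist(profiles[x], profiles[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


# Made-up expression profiles: three groups of three genes plus one flat gene.
profiles = {
    "a1": [1.0, 1.1, 0.9], "a2": [1.2, 1.0, 1.1], "a3": [0.9, 1.2, 1.0],
    "b1": [5.0, 0.0, 5.0], "b2": [5.2, 0.1, 4.9], "b3": [4.9, 0.2, 5.1],
    "c1": [0.0, 8.0, 0.0], "c2": [0.1, 8.2, 0.2], "c3": [0.2, 7.9, 0.1],
    "flat": [3.0, 3.0, 3.0],
}

filtered = filter_low_variance(profiles, fraction=0.10)
clusters = average_linkage_clusters(filtered, n_clusters=3)
```

The gene with the constant profile has the lowest variance and is removed by the filter; the remaining nine genes fall into the three groups they were constructed from. On real data sets with thousands of genes, an optimized library implementation would be preferable to this quadratic-per-merge loop.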

Upon pressing the "Execute" button, the PATIKA server performs clustering with the specified parameters and sends back the result file in the XML-based PATIKA Cluster Analysis File (".pcaf") format. This file can be saved for later use.

Cluster Visualization

As desired, ".pcaf" files can be loaded (using "Load Cluster Analysis File") and visualized (using "Cluster Visualization Dialog") in PATIKAmad. Depending on the type of the current view, the user has different options.

Clustering information can be visualized in two ways. The first uses the highlighting facility in PATIKA: each cluster is assigned a unique color, and all biological objects in that cluster are highlighted with this color.

Alternatively, a regular abstraction (a meta node signifying a logical grouping of pathway elements) is created for each cluster, containing only the objects in that cluster.
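For the highlighting option, one simple way to give each cluster a distinct color is to space hues evenly around the color wheel, as in the sketch below. How PATIKA actually picks highlight colors is not described here, so this scheme and the function name cluster_colors are purely illustrative.

```python
import colorsys


def cluster_colors(n_clusters):
    """Return one (r, g, b) color per cluster, with evenly spaced hues."""
    colors = []
    for k in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(k / n_clusters, 1.0, 1.0)
        colors.append((round(r * 255), round(g * 255), round(b * 255)))
    return colors


# Colors for the three clusters of the running example.
colors = cluster_colors(3)
```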

Figure 9. A sample Cluster Visualization Dialog, where results are to be displayed as a bioentity view and clusters are to be represented with regular abstractions (rather than highlighted)

In our example, we will visualize the clusters on a sample bioentity-level model of PPIs, obtained by a neighborhood query (4-neighborhood of protein bioentity "SPAG5") from the PATIKA database.

Upon pressing the "Display" button of the "Cluster Visualization Dialog", one of the views in Figure 10 is obtained depending on the visualization option selected.

Figure 10. Results of the clustering, whose parameters were defined previously in Figures 8 and 9, shown using abstractions (left) and using highlighting (right)