
Monday, March 4, 2013

Scatter Plot Matrix for Incanter

A Scatter Plot Matrix chart can be useful when exploring relationships between the metric (numeric) variables of a data-set. Today we share a working implementation of a scatter plot matrix function written in Clojure using Incanter, an R-like statistical computing and graphics environment.

Scatter plot matrix of iris experiment data-set using Clojure, Incanter and JFreeChart

What features are implemented?

  • Histogram on the diagonal for each metric
  • Variance calculated for each metric
  • Spline chart added to each histogram
  • Scatter plot for each pair of metrics
  • Correlation calculated for each pair of metrics
  • Grouping option for a categorical dimension
  • Metrics are sorted according to their correlation with the other metrics
  • Show only the n most correlated metrics
  • Show only the upper triangle of the plot matrix

Scatter plot matrix of chick-weight experiment data-set using Clojure, Incanter and JFreeChart
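
To make the "sorted according to correlation" and "n most correlated" features in the list above more concrete, here is a minimal sketch in plain Clojure (no Incanter needed). It is not the project's code: the helper names and the toy data are made up for illustration, and the ranking rule (sum of absolute Pearson correlations against all other columns) is only our assumption about one reasonable way to do it.

```clojure
;; Illustrative sketch only: plain Clojure, no Incanter required.
;; The ranking rule (sum of absolute correlations) is an assumption,
;; not necessarily what scatter-plot-matrix does internally.

(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn pearson
  "Pearson correlation coefficient of two equal-length numeric sequences."
  [xs ys]
  (let [mx  (mean xs)
        my  (mean ys)
        dx  (map #(- % mx) xs)
        dy  (map #(- % my) ys)
        sxy (reduce + (map * dx dy))
        sxx (reduce + (map * dx dx))
        syy (reduce + (map * dy dy))]
    (/ sxy (Math/sqrt (* sxx syy)))))

(defn rank-by-correlation
  "Takes a map of column name -> values and returns the column names,
  the one with the largest summed |r| against the other columns first."
  [columns]
  (let [score (fn [k]
                (reduce + (for [[k2 v2] columns :when (not= k k2)]
                            (Math/abs (pearson (columns k) v2)))))]
    (sort-by score > (keys columns))))

;; Hypothetical toy data, not taken from any data-set used in this post.
(def toy {:a [1 2 3 4 5]
          :b [2 4 6 8 11]
          :c [5 1 4 2 3]})

;; Keep only the 2 most correlated columns.
(take 2 (rank-by-correlation toy))
;; => (:a :b)
```

Taking the first n names of such a ranking is one simple way to get the behaviour of the :only-first option described further down.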

Usage of Scatter Plot Matrix

If you do not have Leiningen, install it first:

cd ~/bin
wget https://raw.github.com/technomancy/leiningen/stable/bin/lein
chmod a+x lein

To see the Iris demo, do the following:

cd ${YOURWORKINGDIRECTORY}
git clone git://github.com/loganisarn/scatter-plot-matrix.git
cd scatter-plot-matrix
lein run

To generate and run a jar file:

lein uberjar
java -jar target/spm-0.1.0-standalone.jar

If you use Emacs, open the core namespace and start a REPL:

emacs src/spm/core.clj
M-x clojure-jack-in or 
M-x nrepl-jack-in

More details can be found in the project's GitHub repository.

scatter-plot-matrix function options

(scatter-plot-matrix data & options)

   Options:
   :data data (default $data) the data set for the plot.
   :title s (default "Scatter Plot Matrix").
   :bins n (default 10) number of bins (i.e. bars) in each histogram.
   :group-by grp (default nil) name of the column for grouping data.
   :only-first n (default 6) show only the first n most correlated columns of the data set.
   :only-triangle b (default false) show only the upper triangle of the plot matrix.

   Examples:

   (view (scatter-plot-matrix (get-dataset :iris) :bins 20 :group-by :Species ))
   (with-data (get-dataset :iris) (view (scatter-plot-matrix :bins 20 :group-by :Species )))
   (view (scatter-plot-matrix (get-dataset :chick-weight) :group-by :Diet :bins 20))
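
The examples above pass the data set either as the first argument or through the :data option. A minimal sketch of how such an argument list could be split into a data set and an options map is shown below. The helper parse-spm-args is hypothetical and not part of the project, and the "odd number of arguments means the first one is the data" rule is our assumption. The defaults are copied from the option list above, except that the $data fallback used with with-data is left out to keep the sketch free of Incanter.

```clojure
;; Illustrative sketch only; parse-spm-args is a hypothetical helper,
;; not a function from the scatter-plot-matrix project.

(def default-options
  {:title         "Scatter Plot Matrix"
   :bins          10
   :group-by      nil
   :only-first    6
   :only-triangle false})

(defn parse-spm-args
  "Splits an argument list into a data set and an options map.
  Assumption: an odd number of arguments means the first one is the
  data set; otherwise the data set is looked up under the :data key."
  [& args]
  (let [[data kvs] (if (odd? (count args))
                     [(first args) (rest args)]
                     [nil args])
        opts       (merge default-options (apply hash-map kvs))]
    {:data    (or data (:data opts))
     :options (dissoc opts :data)}))

;; :iris-placeholder stands in for a real data set.
(parse-spm-args :iris-placeholder :bins 20 :group-by :Species)
```

Merging the caller's options over a map of defaults keeps every default in one place, which makes the documented values easy to check against the code.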

Detailed usage examples

Defining the data source.

  ;;; Input examples for iris
  ;; Input dataset examples: Incanter data repo, local file, remote file (URL).
  ;; The three definitions are alternatives; evaluate only one of them.
  (def iris (get-dataset :iris))
  (def iris (read-dataset "data/iris.dat" :delim \space :header true)) ; relative to project home
  (def iris (read-dataset "https://raw.github.com/liebke/incanter/master/data/iris.dat" :delim \space :header true))

Filtering for specific columns.

  ;; Filter dataset to specific columns only
  (def iris ($ [:Sepal.Length :Sepal.Width :Petal.Length :Petal.Width :Species] (get-dataset :iris)))
  (def iris (sel (get-dataset :iris) :cols [:Sepal.Length :Sepal.Width :Petal.Length :Petal.Width :Species]))

Defining a chart object with default options.

  ;;; Scatter plot matrix examples
  ;; Using default options
  (def iris-spm (scatter-plot-matrix iris :group-by :Species))
  ;; Filter to metrics only, no categorical dimension for grouping
  (def iris-spm (scatter-plot-matrix :data ($ [:Sepal.Length :Sepal.Width :Petal.Length :Petal.Width] iris)))

Defining a chart object using more options.

  (def iris-spm (scatter-plot-matrix iris
                                     :title "Iris Scatter Plot Matrix"
                                     :bins 20 ; number of histogram bars
                                     :group-by :Species
                                     :only-first 4 ; most correlated columns
                                     :only-triangle false))

Viewing and saving scatter plot matrix chart

View on screen. Set the chart width and height according to your needs.

  (view iris-spm :width 1280 :height 800)

Save as a PDF document using Incanter's save-pdf function (from the incanter.pdf namespace).

  (save-pdf iris-spm "out/iris-spm.pdf" :width 2560 :height 1600)

Save as a PNG image using Incanter's save function.

  (save iris-spm "out/iris-spm.png" :width 2560 :height 1600)

We have received suggestions that browser-based output would be a nice alternative to JFreeChart. D3 and C2 were mentioned.

Scatter plot matrix of airline data-set using Clojure, Incanter and JFreeChart

As the airline chart above shows, a scatter plot matrix function is also useful for a single metric pair and one categorical dimension.

Feedback is Welcome

Thank you for your comments and feedback. We hope you find our scatter plot matrix function implementation useful. Have a nice day using Clojure.

Sunday, February 24, 2013

Scatter Plot Matrix in Incanter and Clojure Screenshot Preview

A Scatter Plot Matrix is useful for getting a quick insight into a data-set with correlated features. In this post I share some early screenshots of a working implementation of a scatter plot matrix function in Clojure and Incanter using two data-sets: iris and chick-weight.

Scatter plot matrix of iris experiment data-set using Clojure, Incanter and JFreeChart

As you can see on the iris chart above, the axes are not yet correctly adjusted to the min and max values of the metrics. Yeah, even a simple scatter-plot-matrix function has many details to get right. It is a Clojure-Incanter-Java learning-by-doing mini-project with a fellow Loganis researcher, exploring details of Incanter and JFreeChart usage in Clojure.

Our Scatter Plot Matrix function implementation has histogram charts of the metrics on the diagonal of the matrix chart, and scatter plots for every pair of metrics elsewhere. You can group values by a categorical dimension; the legend in the footer shows which color belongs to which value.
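
The layout just described (histograms on the diagonal, scatter plots off the diagonal) can be sketched in a few lines of plain Clojure. The function matrix-cells is a hypothetical illustration, not the project's code. It only decides which kind of chart goes into which cell and leaves the drawing to Incanter and JFreeChart. The order of the two metrics in a scatter cell is an arbitrary choice made for this sketch.

```clojure
;; Illustrative sketch only; matrix-cells is a hypothetical helper.

(defn matrix-cells
  "Returns one entry per cell of the plot matrix, row by row.
  Diagonal cells are [:histogram metric]; all other cells are
  [:scatter row-metric col-metric]. When only-triangle? is true,
  cells below the diagonal are skipped."
  [metrics only-triangle?]
  (for [[i a] (map-indexed vector metrics)
        [j b] (map-indexed vector metrics)
        :when (or (not only-triangle?) (<= i j))]
    (if (= i j)
      [:histogram a]
      [:scatter a b])))

;; Three metrics, upper triangle only: 3 histograms and 3 scatter plots.
(matrix-cells [:x :y :z] true)
```

With n metrics the full matrix has n * n cells, while the upper triangle has n * (n + 1) / 2, which is why limiting the number of metrics matters for readability.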

Scatter plot matrix of chick-weight experiment data-set using Clojure, Incanter and JFreeChart

The Chick-Weight experiment is another built-in data-set in Incanter. In my next post I am going to provide more details of our scatter plot matrix implementation for Incanter, with source code released under the same EPL 1.0 license as Clojure.

If you have any ideas on how to make an Incanter Scatter Plot Matrix function more useful, please share your thoughts with us.