(prn (:out (clojure.java.shell/sh "/path_to_dir/mediana/00import2mongo.sh" "/path_to_dir/Analytics-All-Traffic-20121120-20121220.csv" "c5" "m1")))
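The call above prints only the script's standard output, so a failed import can go unnoticed. Here is a minimal sketch of a wrapper that also checks the exit code. The name run-import is hypothetical and not part of mediana; it relies only on the :exit, :out and :err keys of the map that clojure.java.shell/sh returns.

```clojure
(require '[clojure.java.shell :as shell])

;; Hypothetical helper (not part of mediana): run a script and fail loudly
;; when it exits with a non-zero status.
(defn run-import
  "Runs script with args. Returns stdout on success, throws otherwise."
  [script & args]
  (let [{:keys [exit out err]} (apply shell/sh script args)]
    (if (zero? exit)
      out
      (throw (ex-info "import script failed" {:exit exit :err err})))))
```

It would be called with the same arguments as the sh call above, for example (prn (run-import "/path_to_dir/mediana/00import2mongo.sh" "/path_to_dir/Analytics-All-Traffic-20121120-20121220.csv" "c5" "m1")).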
desktop.clj
We evaluate functions interactively. Open mediana/src/mediana/core.clj in Emacs. M-x clojure-jack-in starts a SLIME session and a REPL. Pressing C-x C-e (or M-x slime-eval-last-expression) right after a closing parenthesis evaluates the preceding expression. A whole file can be evaluated with C-c C-k or M-x slime-compile-and-load-file. An active region can be evaluated with C-c C-r or M-x slime-eval-region.
(in-ns 'mediana.core)
;;:SourceMedium :PagesVisit :Visits :AvgVisitDuration :NewVisits :BounceRate
;; Desktop
(defn nopfn []
;; Import csv into :m1 then process it into :m2
(initdb)
;; Fetch "google /organic" data from :m1
(fdb :m1 :where {:SourceMedium "google / organic" })
;; Result:
[:_id :SourceMedium :Visits :PagesVisit :AvgVisitDuration :NewVisits :BounceRate]
[# "google / organic" "25,104" 5.38 "00:05:32" "39.93%" "33.42%"]
;; What is the index of "google / organic" string in :SourceMedium row?
(iof "google / organic" (fdr :m1 :SourceMedium))
;; 352
;; Get record from :m2
(fdb :m2 :where {:SourceMedium 352})
;; Result:
[:_id :SourceMedium :Visits :PagesVisit :AvgVisitDuration :NewVisits :BounceRate]
[# 352 25104 5.38 332 0.3993 0.3342]
;; Processing works fine.
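The two results above differ only in how the values are represented: "25,104" becomes 25104, "00:05:32" becomes 332 seconds, "39.93%" becomes 0.3993, and the source string is replaced by its index. The sketch below shows conversions that would produce the same values. The function names are hypothetical and this is not necessarily how mediana does the processing.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helpers illustrating the :m1 -> :m2 value conversions.

(defn parse-count
  "Removes thousands separators: \"25,104\" becomes 25104."
  [s]
  (Long/parseLong (str/replace s "," "")))

(defn parse-duration
  "Converts \"hh:mm:ss\" to seconds: \"00:05:32\" becomes 332."
  [s]
  (let [[h m sec] (map #(Long/parseLong %) (str/split s #":"))]
    (+ (* 3600 h) (* 60 m) sec)))

(defn parse-percent
  "Converts a percent string to a fraction: \"39.93%\" becomes about 0.3993."
  [s]
  (/ (Double/parseDouble (str/replace s "%" "")) 100.0))

(defn index-of
  "Position of x in coll, or -1 when it is absent."
  [x coll]
  (.indexOf ^java.util.List (vec coll) x))
```

Replacing each source string with its index keeps every column numeric, which is what the statistics functions used below expect.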
Now we have a working dataset in :m2 collection. Let's see some data.
;; What is the correlation between :PagesVisit and :AvgVisitDuration ?
(correlation (fdr :m2 :PagesVisit) (fdr :m2 :AvgVisitDuration))
;; 0.7215517848024166
;; It is a strong correlation. Let's see it in a chart with a linear-model:
(lm-chart (fdr :m2 :PagesVisit) (fdr :m2 :AvgVisitDuration) "Pages/Visit" "Avg Visit duration in sec")
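To make the correlation value less of a black box, here is a minimal sketch of the Pearson correlation coefficient in plain Clojure. The names mean and pearson are illustrative assumptions, not the library implementation behind correlation, though on the same two columns the result should agree up to floating-point rounding.

```clojure
;; Illustrative sketch of the Pearson correlation coefficient.
(defn mean [xs]
  (/ (reduce + xs) (double (count xs))))

(defn pearson
  "Covariance of xs and ys divided by the product of their spreads."
  [xs ys]
  (let [mx  (mean xs)
        my  (mean ys)
        dx  (map #(- % mx) xs)            ; deviations from the mean of xs
        dy  (map #(- % my) ys)            ; deviations from the mean of ys
        sxy (reduce + (map * dx dy))
        sxx (reduce + (map * dx dx))
        syy (reduce + (map * dy dy))]
    (/ sxy (Math/sqrt (* sxx syy)))))
```

The value always lies between -1 and 1. For example, (pearson [1 2 3 4] [2 4 6 8]) describes a perfect positive linear relationship.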
Linear model details
(def lm (linear-model (fdr :m2 :PagesVisit) (fdr :m2 :AvgVisitDuration)))
(keys lm) ; see what fields are included
(:design-matrix lm) ; a matrix containing the independent variables and an intercept column
(:coefs lm) ; regression coefficients
(:t-tests lm) ; t-test values of coefficients
(:t-probs lm) ; p-values for t-test values of coefficients
(:fitted lm) ; the predicted values of y
(:residuals lm) ; the residuals of each observation
(:std-errors lm) ; the standard errors of the coefficients
(:sse lm) ; the sum of squared errors
(:ssr lm) ; the regression sum of squares
(:sst lm) ; the total sum of squares
(:r-square lm) ; coefficient of determination
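For a single predictor, several of these fields can be computed by hand. The sketch below is an ordinary least-squares fit of one predictor with an intercept, written in plain Clojure. The name simple-fit is hypothetical; the returned map reuses a few of the key names above purely for illustration, and putting the intercept first in :coefs is an assumption of this sketch.

```clojure
;; Illustrative one-predictor least-squares fit, not the library code.
(defn simple-fit
  [xs ys]
  (let [n         (double (count xs))
        mx        (/ (reduce + xs) n)
        my        (/ (reduce + ys) n)
        dx        (map #(- % mx) xs)
        dy        (map #(- % my) ys)
        sxy       (reduce + (map * dx dy))
        sxx       (reduce + (map * dx dx))
        sst       (reduce + (map * dy dy))          ; total sum of squares
        slope     (/ sxy sxx)
        intercept (- my (* slope mx))
        fitted    (map #(+ intercept (* slope %)) xs)
        residuals (map - ys fitted)
        sse       (reduce + (map * residuals residuals))]
    {:coefs     [intercept slope]
     :fitted    fitted
     :residuals residuals
     :sse       sse
     :sst       sst
     :r-square  (- 1.0 (/ sse sst))}))
```

With one predictor and an intercept, the coefficient of determination equals the squared correlation, so a correlation of about 0.72 corresponds to an r-square of about 0.52.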
Histogram
(view (histogram (fdr :m2 :PagesVisit) :nbins 50))
(view (histogram (fdr :m2 :BounceRate) :nbins 50))
(view (histogram (fdr :m2 :NewVisits) :nbins 50))
(view (histogram (fdr :m2 :AvgVisitDuration) :nbins 50))
(view (histogram (fdr :m2 :Visits) :nbins 50))
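A histogram with :nbins 50 splits the value range into 50 bins and counts how many observations land in each. Here is a rough sketch of that idea using equal-width bins. The name bin-counts is hypothetical, and the charting library may place bin edges differently.

```clojure
;; Illustrative equal-width binning, not the charting library's algorithm.
(defn bin-counts
  "Returns a vector of nbins counts covering the range from min to max of xs."
  [xs nbins]
  (let [lo     (apply min xs)
        hi     (apply max xs)
        width  (/ (- hi lo) (double nbins))
        bin-of (fn [x]
                 (if (zero? width)
                   0                                   ; all values identical
                   ;; the maximum value is clamped into the last bin
                   (min (dec nbins) (int (/ (- x lo) width)))))]
    (reduce (fn [counts x] (update-in counts [(bin-of x)] inc))
            (vec (repeat nbins 0))
            xs)))
```

Something like (bin-counts (fdr :m2 :PagesVisit) 50) would give the bar heights as plain numbers, assuming fdr returns a sequence of numbers.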
Classifying Metrics
We use a simple three-way classification for each metric: 0 for below average, 1 for average, and 2 for above average, as described in the classify-row function. We will take a look at it in the next post.
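Until then, here is a rough sketch of what such a three-way classification could look like. It is not the classify-row function itself. The names classify-value and classify-column are hypothetical, and so is the tolerance parameter, which is an assumption about how wide the "average" band is.

```clojure
;; Hypothetical sketch, not mediana's classify-row.
(defn classify-value
  "0 = below average, 1 = average (within tolerance of avg), 2 = above average."
  [x avg tolerance]
  (cond
    (< x (- avg tolerance)) 0
    (> x (+ avg tolerance)) 2
    :else 1))

(defn classify-column
  "Classifies every value of a metric against the mean of that metric."
  [xs tolerance]
  (let [avg (/ (reduce + xs) (double (count xs)))]
    (mapv #(classify-value % avg tolerance) xs)))
```

For example, (classify-column [1 5 9] 1) compares each value with the mean 5.0 using a band of plus or minus 1.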


