(require 'clojure.java.shell)
(prn (:out (clojure.java.shell/sh "/path_to_dir/mediana/00import2mongo.sh"
                                  "/path_to_dir/Analytics-All-Traffic-20121120-20121220.csv"
                                  "c5" "m1")))
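clojure.java.shell/sh returns a map with :exit, :out and :err, so printing only :out silently hides a failed import. Here is a minimal sketch, assuming you want the REPL to fail loudly instead. sh-or-throw is a hypothetical helper and is not part of mediana:

```clojure
(require '[clojure.java.shell :as shell])

;; Hypothetical helper (not part of mediana): runs a command and returns
;; its stdout, or throws an ex-info when the exit status is non-zero.
(defn sh-or-throw
  [& args]
  (let [{:keys [exit out err]} (apply shell/sh args)]
    (if (zero? exit)
      out
      (throw (ex-info (str "Command failed: " err)
                      {:args args :exit exit :err err})))))

;; Usage with the import script above (paths are placeholders):
;; (sh-or-throw "/path_to_dir/mediana/00import2mongo.sh"
;;              "/path_to_dir/Analytics-All-Traffic-20121120-20121220.csv"
;;              "c5" "m1")
```

Since sh reads the process output on futures, a standalone script (not the REPL) should call (shutdown-agents) at the end so the JVM exits promptly.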
We evaluate functions interactively. Open mediana/src/mediana/core.clj in Emacs; M-x clojure-jack-in starts a SLIME session with a REPL. To evaluate a single expression, place the cursor right after its closing parenthesis and press C-x C-e (M-x slime-eval-last-expression). A whole file can be evaluated with C-c C-k (M-x slime-compile-and-load-file), and an active region with C-c C-r (M-x slime-eval-region).

Now we have a working dataset in the :m2 collection. Let's see some data.

(in-ns 'mediana.core)

;; :SourceMedium :PagesVisit :Visits :AvgVisitDuration :NewVisits :BounceRate
;; Desktop
(defn nopfn []
;; Import csv into :m1, then process it into :m2
(initdb)

;; Fetch "google / organic" data from :m1
(fdb :m1 :where {:SourceMedium "google / organic"})
;; Result:
;; [:_id :SourceMedium :Visits :PagesVisit :AvgVisitDuration :NewVisits :BounceRate]
;; [#<…> "google / organic" "25,104" 5.38 "00:05:32" "39.93%" "33.42%"]

;; What is the index of the "google / organic" string in the :SourceMedium column?
(iof "google / organic" (fdr :m1 :SourceMedium))
;; 352

;; Get the record from :m2
(fdb :m2 :where {:SourceMedium 352})
;; Result:
;; [:_id :SourceMedium :Visits :PagesVisit :AvgVisitDuration :NewVisits :BounceRate]
;; [#<…> 352 25104 5.38 332 0.3993 0.3342]
;; Processing works: the source, visit count, duration (in seconds) and percentages are now numbers.
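Comparing the two result rows shows what processing does: the visit count loses its thousands separator, the duration becomes seconds (00:05:32 = 5 * 60 + 32 = 332), the percentages become fractions, and the :SourceMedium string is replaced by its index. The converters below are a minimal sketch of those transformations under that reading. All function names are hypothetical, and mediana's own code may do it differently:

```clojure
(require '[clojure.string :as str])

;; Hypothetical converters illustrating the :m1 -> :m2 processing.
;; They are not mediana's actual implementation.

(defn parse-count
  "\"25,104\" -> 25104 (drops thousands separators)."
  [s]
  (Long/parseLong (str/replace s "," "")))

(defn duration->seconds
  "\"00:05:32\" -> 332 (hh:mm:ss to seconds)."
  [s]
  (let [[h m sec] (map #(Long/parseLong %) (str/split s #":"))]
    (+ (* 3600 h) (* 60 m) sec)))

(defn percent->fraction
  "\"39.93%\" -> roughly 0.3993 (floating point)."
  [s]
  (/ (Double/parseDouble (str/replace s "%" "")) 100.0))

(defn encode-category
  "Replaces a string by its position in the column's values
  (-1 when absent), like \"google / organic\" -> 352 above."
  [value column-values]
  (.indexOf (vec column-values) value))
```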
;; What is the correlation between :PagesVisit and :AvgVisitDuration?
(correlation (fdr :m2 :PagesVisit) (fdr :m2 :AvgVisitDuration))
;; 0.7215517848024166

;; That is a strong positive correlation. Let's see it in a chart with a linear model:
(lm-chart (fdr :m2 :PagesVisit) (fdr :m2 :AvgVisitDuration) "Pages/Visit" "Avg Visit duration in sec")
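Pearson's r is the covariance of two columns divided by the product of their standard deviations, so a value near +1 means that pages per visit and visit duration tend to rise together. A plain-Clojure sketch (no Incanter) for checking the idea on small inputs, assuming Incanter's correlation computes the same Pearson coefficient for two sequences; pearson is a hypothetical name:

```clojure
;; Minimal Pearson correlation, r = Sxy / sqrt(Sxx * Syy).
;; Returns NaN when either column is constant.
(defn pearson
  [xs ys]
  (let [n   (double (count xs))
        mx  (/ (reduce + xs) n)
        my  (/ (reduce + ys) n)
        dxs (map #(- % mx) xs)          ; deviations from the mean of x
        dys (map #(- % my) ys)          ; deviations from the mean of y
        sxy (reduce + (map * dxs dys))
        sxx (reduce + (map #(* % %) dxs))
        syy (reduce + (map #(* % %) dys))]
    (/ sxy (Math/sqrt (* sxx syy)))))
```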
Linear model details
(def lm (linear-model (fdr :m2 :PagesVisit) (fdr :m2 :AvgVisitDuration)))
(keys lm)           ; see which fields are included
(:design-matrix lm) ; a matrix containing the independent variables and an intercept column
(:coefs lm)         ; regression coefficients
(:t-tests lm)       ; t-test values of the coefficients
(:t-probs lm)       ; p-values for the t-test values of the coefficients
(:fitted lm)        ; the predicted values of y
(:residuals lm)     ; the residuals of each observation
(:std-errors lm)    ; the standard errors of the coefficients
(:sse lm)           ; the sum of squared errors
(:ssr lm)           ; the regression sum of squares
(:sst lm)           ; the total sum of squares
(:r-square lm)      ; coefficient of determination
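With a single predictor, the least-squares fit has a closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. The sums of squares satisfy SST = SSR + SSE, and R² = 1 − SSE/SST equals the squared Pearson correlation, so with r ≈ 0.7216 from above, (:r-square lm) should come out near 0.7216² ≈ 0.52. The sketch below computes a subset of the fields listed above in plain Clojure. simple-linear-model is a hypothetical name; Incanter's linear-model works on a design matrix and also reports t-tests, p-values and standard errors:

```clojure
;; Hypothetical single-predictor least-squares fit returning a subset of
;; the linear-model keys. Not Incanter's implementation.
(defn simple-linear-model
  [xs ys]
  (let [n         (double (count xs))
        mx        (/ (reduce + xs) n)
        my        (/ (reduce + ys) n)
        sxy       (reduce + (map (fn [x y] (* (- x mx) (- y my))) xs ys))
        sxx       (reduce + (map (fn [x] (* (- x mx) (- x mx))) xs))
        slope     (/ sxy sxx)
        intercept (- my (* slope mx))
        fitted    (mapv #(+ intercept (* slope %)) xs)
        residuals (mapv - ys fitted)
        sse       (reduce + (map #(* % %) residuals))                 ; unexplained variation
        sst       (reduce + (map #(let [d (- % my)] (* d d)) ys))     ; total variation
        ssr       (reduce + (map #(let [d (- % my)] (* d d)) fitted))] ; explained variation
    {:coefs     [intercept slope]
     :fitted    fitted
     :residuals residuals
     :sse       sse
     :ssr       ssr
     :sst       sst
     :r-square  (- 1.0 (/ sse sst))}))
```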
(view (histogram (fdr :m2 :PagesVisit) :nbins 50))
(view (histogram (fdr :m2 :BounceRate) :nbins 50))
(view (histogram (fdr :m2 :NewVisits) :nbins 50))
(view (histogram (fdr :m2 :AvgVisitDuration) :nbins 50))
(view (histogram (fdr :m2 :Visits) :nbins 50))
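The :nbins option sets how many intervals the range of a column is cut into. Here is a sketch of the idea, assuming equal-width bins between the column's minimum and maximum, which is the usual way histograms are built; bin-counts is a hypothetical name and does not reproduce Incanter's internals:

```clojure
;; Hypothetical equal-width binning: [min, max] is split into nbins
;; intervals of equal width and each value is counted in its interval.
(defn bin-counts
  [xs nbins]
  (let [lo    (double (apply min xs))
        hi    (double (apply max xs))
        width (/ (- hi lo) nbins)
        idx   (fn [x]
                (if (zero? width)
                  0 ; all values equal: put everything in the first bin
                  ;; the maximum lands on the upper edge, so clamp it into the last bin
                  (min (dec nbins) (long (Math/floor (/ (- x lo) width))))))]
    (reduce (fn [counts x] (update counts (idx x) inc))
            (vec (repeat nbins 0))
            xs)))
```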
Classifying Metrics
We use a simple three-level classification for each metric: below average is 0, average is 1 and above average is 2, as implemented in the classify-row function. We will take a closer look at it in the next post.
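Since classify-row itself is not shown here, the following is only a guess at the idea. It is a minimal sketch that maps each value to 0, 1 or 2 by comparing it with the column mean, assuming that "average" means lying within a symmetric tolerance band around the mean. The names classify-value and classify-column and the tolerance parameter are hypothetical:

```clojure
;; Hypothetical three-level classifier; the author's classify-row may differ.
;; Assumption: "average" means within +/- tolerance * |mean| of the mean.
(defn classify-value
  [value mean tolerance]
  (let [band (* tolerance (Math/abs (double mean)))]
    (cond
      (< value (- mean band)) 0   ; below average
      (> value (+ mean band)) 2   ; above average
      :else                   1))) ; average

(defn classify-column
  "Classifies every value of a column against the column's own mean."
  [xs tolerance]
  (let [mean (/ (reduce + xs) (double (count xs)))]
    (mapv #(classify-value % mean tolerance) xs)))
```

For example, with a 10% band the column [1 2 3 4 5] (mean 3.0) is classified as [0 0 1 2 2].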