
04/17/2014: Using RMarkdown in an Analysis Archive and Retrieval System

RMarkdown is good stuff. Hopefully, you've already heard of RMarkdown. If not, you'll understand it pretty quickly by looking at this simple example:

http://rpubs.com/medined/replacing_part_of_time_series_using_time_based_selection

Essentially, you mix R code with Markdown markup to create a 'living' document. The R code is executed when the document is rendered (knit), and its output is embedded in the resulting page. The full power of R (and all of its extensions) can be used. There are many examples of this online.
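To make the "mix R code with Markdown" idea concrete, here is a minimal sketch that writes a tiny RMarkdown file from R. The title, the prose line, and the file name `example.Rmd` are made up for illustration; only the time series calls come from the post linked above.

```r
# Build a tiny RMarkdown document as plain text (hypothetical content).
fence <- strrep("`", 3)  # three backticks, built here to avoid nesting fences
rmd_lines <- c(
  "---",
  "title: \"Replacing part of a time series\"",
  "output: html_document",
  "---",
  "",
  "The series below covers 2001 through 2008.",
  "",
  paste0(fence, "{r}"),   # start of an R code chunk
  "timeSeries <- ts(1:96, frequency = 12, start = 2001)",
  "window(timeSeries, start = c(2002, 1), end = c(2003, 12))",
  fence                   # end of the chunk
)

# Write the document to a temporary location.
rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_path)

# Rendering is left commented out because it assumes the rmarkdown
# package and pandoc are installed:
# rmarkdown::render(rmd_path)
```

Everything between the chunk markers is ordinary R, and everything outside them is ordinary Markdown.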

RMarkdown pages can be computer-generated. Imagine if every analytic documented its intermediate steps, from Data Load to Final Visualization, using RMarkdown. I bet user confidence in the final product would increase. It would also be trivial to duplicate and tweak the document (draft mode) before republishing it. Since RMarkdown is text-based, you could provide human-readable diff reports between analyses. Another advantage of this text-based system would be full-text search across all analyses.
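Here is a rough sketch of the diff and search ideas using only base R. The two "analyses" and the names `archive` and `search_archive` are hypothetical, and the set-based diff is a simplification: it ignores line order and duplicate lines, so a real system would more likely call a proper diff tool.

```r
# Two hypothetical versions of the same analysis, held as lines of text.
v1 <- c("# Sales analysis",
        "data <- read.csv('sales.csv')",
        "plot(data$total)")
v2 <- c("# Sales analysis",
        "data <- read.csv('sales.csv')",
        "data <- na.omit(data)",
        "plot(data$total)")

# Naive diff: lines present in one version but not the other.
removed <- setdiff(v1, v2)
added   <- setdiff(v2, v1)
diff_report <- c(sprintf("- %s", removed), sprintf("+ %s", added))
print(diff_report)

# Naive full-text search: names of the analyses containing a literal string.
archive <- list(sales_v1 = v1, sales_v2 = v2)
search_archive <- function(archive, pattern) {
  hits <- vapply(
    archive,
    function(lines) any(grepl(pattern, lines, fixed = TRUE)),
    logical(1)
  )
  names(archive)[hits]
}
print(search_archive(archive, "na.omit"))
```

Because each analysis is just text, both operations need nothing beyond string handling.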

I could also point out the value in being able to produce an analytic report without needing to know Java, Python, or another programming language. Just knowing the math is complex enough.

Update: http://studio.sketchpad.cc/sp/pad/view/ro.9QNw0rsxwki4J/rev.480 - The archive timeline widget allows visitors to view all versions of the source document.

Update: You can do this same kind of thing with Python code. Check out http://ipython.org/notebook.html.

I'm not saying R is the answer to all problems. But this idea of archivable, diffable analytic solutions is interesting.

04/16/2014: Example of Replacing The Middle of a Time Series in R

Exploration of replacing part of a Time Series
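# One-time install. Note that ts() and window(), used below, come from
# the stats package that ships with R.
install.packages('zoo')
library('zoo')
timeSeries <- ts(1:96, frequency=12, start=2001)
timeSeries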

     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2001   1   2   3   4   5   6   7   8   9  10  11  12
2002  13  14  15  16  17  18  19  20  21  22  23  24
2003  25  26  27  28  29  30  31  32  33  34  35  36
2004  37  38  39  40  41  42  43  44  45  46  47  48
2005  49  50  51  52  53  54  55  56  57  58  59  60
2006  61  62  63  64  65  66  67  68  69  70  71  72
2007  73  74  75  76  77  78  79  80  81  82  83  84
2008  85  86  87  88  89  90  91  92  93  94  95  96

#
# If you already know the indexes of the elements to
# replace, just do it:
#
timeSeries[13:36] <- NA
timeSeries

     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2001   1   2   3   4   5   6   7   8   9  10  11  12
2002  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
2003  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
2004  37  38  39  40  41  42  43  44  45  46  47  48
2005  49  50  51  52  53  54  55  56  57  58  59  60
2006  61  62  63  64  65  66  67  68  69  70  71  72
2007  73  74  75  76  77  78  79  80  81  82  83  84
2008  85  86  87  88  89  90  91  92  93  94  95  96

#
# However, sometimes you might want to refer to the
# Time Series part by date. Below is one way to do
# that.
#

#
# Reset the Time Series and then look at just two years, 2002 and 2003
#
timeSeries <- ts(1:96, frequency=12, start=2001)
window(timeSeries, start=c(2002,1), end=c(2003,12))

     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2002  13  14  15  16  17  18  19  20  21  22  23  24
2003  25  26  27  28  29  30  31  32  33  34  35  36

#
# Copy this part of the timeSeries for safety.
#
original <- window(timeSeries, start=c(2002,1), end=c(2003,12))

#
# Change 2002 and 2003 to NA because a lawsuit is pending.
#
window(timeSeries, start=c(2002,1), end=c(2003,12)) <- NA
timeSeries

     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2001   1   2   3   4   5   6   7   8   9  10  11  12
2002  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
2003  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
2004  37  38  39  40  41  42  43  44  45  46  47  48
2005  49  50  51  52  53  54  55  56  57  58  59  60
2006  61  62  63  64  65  66  67  68  69  70  71  72
2007  73  74  75  76  77  78  79  80  81  82  83  84
2008  85  86  87  88  89  90  91  92  93  94  95  96

#
# The lawsuit is over and you won. Retrieve the data.
#

window(timeSeries, start=c(2002,1), end=c(2003,12)) <- original

timeSeries
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2001   1   2   3   4   5   6   7   8   9  10  11  12
2002  13  14  15  16  17  18  19  20  21  22  23  24
2003  25  26  27  28  29  30  31  32  33  34  35  36
2004  37  38  39  40  41  42  43  44  45  46  47  48
2005  49  50  51  52  53  54  55  56  57  58  59  60
2006  61  62  63  64  65  66  67  68  69  70  71  72
2007  73  74  75  76  77  78  79  80  81  82  83  84
2008  85  86  87  88  89  90  91  92  93  94  95  96
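
#
# Another way to pick observations by date, sketched here as an
# alternative to window(): work out the calendar year of every
# observation and index on that. The names first, offsets, years and
# masked are my own. I compute the years with integer arithmetic from
# start() rather than comparing against time(), since the fractional
# values from time() may be subject to floating-point rounding.
#

```r
timeSeries <- ts(1:96, frequency = 12, start = 2001)

# start() returns c(year, period) for the first observation.
first <- start(timeSeries)

# Zero-based position of each observation counted from January of the
# starting year, then the calendar year each observation falls in.
offsets <- seq_along(timeSeries) - 1 + (first[2] - 1)
years <- first[1] + offsets %/% frequency(timeSeries)

# Blank out 2002 and 2003 on a copy, leaving the original untouched.
masked <- timeSeries
masked[years %in% 2002:2003] <- NA
masked
```

#
# This should blank the same 24 months as the window() assignment
# above, and cycle(timeSeries) gives the month number if you need to
# select on that as well.
#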