Title: Package for Analysis of Space-Time Ecological Series
Description: Regularisation, decomposition and analysis of space-time series. The pastecs R package is a PNEC-Art4 and IFREMER (Benoit Beliaeff <[email protected]>) initiative to bring PASSTEC 2000 functionalities to R.
Authors: Philippe Grosjean [aut, cre], Frederic Ibanez [aut], Michele Etienne [ctb]
Maintainer: Philippe Grosjean <[email protected]>
License: GPL-2
Version: 1.4.2
Built: 2024-10-31 03:34:32 UTC
Source: https://github.com/SciViews/pastecs
The table of probabilities of having 0 to n-2 extrema in a series, for n = 3 to 50 (for n > 50, the Gleissberg distribution is approximated by a normal distribution). This dataset is used internally by pgleissberg(). You do not have to access it directly. See pgleissberg() for more information.
Sort variables (usually species in a species x stations matrix) according to their abundance, either in number of non-null values, or in number of individuals (in log). The f coefficient allows adjusting the weight given to each of these two criteria.
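The exact weighting is internal to abund(), but a minimal sketch of the assumed criterion (a mix of the two percentages that plot.abund() displays as "% log(ind.)" and "% non-zero"; the real function may rescale these quantities differently) could look like:

rank.abund <- function(x, f = 0.2) {
  pct.nonnull <- colSums(x != 0) / nrow(x) * 100   # % non-null values
  log.ind <- colSums(log10(x + 1))                 # number of individuals (in log)
  pct.logind <- log.ind / max(log.ind) * 100       # rescaled as percentages
  # Hypothetical ranking by the f-weighted mix of both criteria
  order(f * pct.logind + (1 - f) * pct.nonnull, decreasing = TRUE)
}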
abund(x, f = 0.2)
## S3 method for class 'abund'
extract(e, n, left = TRUE, ...)
## S3 method for class 'abund'
identify(x, label.pts = FALSE, lvert = TRUE, lvars = TRUE, col = 2, lty = 2, ...)
## S3 method for class 'abund'
lines(x, n = x$n, lvert = TRUE, lvars = TRUE, col = 2, lty = 2, ...)
## S3 method for class 'abund'
plot(x, n = x$n, lvert = TRUE, lvars = TRUE, lcol = 2, llty = 2, all = TRUE,
    dlab = c("cumsum", "% log(ind.)", "% non-zero"), dcol = c(1, 2, 4),
    dlty = c(par("lty"), par("lty"), par("lty")), dpos = c(1.5, 20), type = "l",
    xlab = "variables", ylab = "abundance",
    main = paste("Abundance sorting for:", x$data, "with f =", round(x$f, 4)), ...)
## S3 method for class 'abund'
print(x, ...)
## S3 method for class 'summary.abund'
print(x, ...)
## S3 method for class 'abund'
summary(object, ...)
x: A data frame containing the variables to sort according to their abundance, in columns, for abund(); an 'abund' object for the methods
f: Weight given to the number-of-individuals criterion (strictly between 0 and 1; the weight given to the number of non-null values is 1 - f)
object: An 'abund' object returned by abund()
e: An 'abund' object returned by abund()
n: The number of variables selected at left
type: the type of graph to plot. By default, lines with 'l'
lvert: If TRUE (by default), a vertical line is drawn at the separation between the n variables selected at left and the others
lvars: If TRUE (by default), the labels of the n variables selected at left of the vertical line are highlighted
lcol: The color to use to draw the vertical line (lvert=TRUE); by default, color 2
llty: The style used to draw the vertical line (lvert=TRUE); by default, lty=2 (dashed line)
xlab: the label of the x-axis
ylab: the label of the y-axis
main: the main title of the graph
all: If TRUE (by default), all curves are plotted on the graph; otherwise, only the cumulated abundance curve is drawn
dlab: The legend labels
dcol: Colors to use for drawing the various curves on the graph
dlty: The line styles to use for drawing the various curves on the graph
dpos: The position of the legend box on the graph (coordinates of its top-left corner). A legend box is drawn only if all=TRUE
col: The color to use to draw lines
lty: The style used to draw lines
...: additional parameters
label.pts: Do we have to label points on the graph (TRUE), or to choose an extraction level by clicking on the graph (FALSE, by default)?
left: If TRUE (by default), the n variables at left are extracted; otherwise, the variables at right are extracted
Successive sorts can be applied. For instance, a first sort with f = 0.2, followed by an extraction of the rare species and another sort with f = 1, allows collecting only rare but locally abundant species.
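A hedged sketch of that workflow, using the documented abund()/extract() interface on the bnr dataset (the cutoff n = 26 is arbitrary here):

data(bnr)
bnr.abd <- abund(bnr, f = 0.2)              # first sort, mostly on non-null values
bnr.abd$n <- 26                             # mark the 26 left-most (abundant) variables...
bnr.rare <- extract(bnr.abd, left = FALSE)  # ...but extract the rare ones instead
rare.abd <- abund(bnr.rare, f = 1)          # second sort, on log individuals only
# The left-most variables of rare.abd are rare but locally abundant species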
An object of type 'abund' is returned. It has methods print(), summary(), plot(), lines(), identify() and extract().
Philippe Grosjean ([email protected]), Frédéric Ibanez ([email protected])
Ibanez, F., J.-C. Dauvin & M. Etienne, 1993. Comparaison des évolutions à long terme (1977-1990) de deux peuplements macrobenthiques de la baie de Morlaix (Manche occidentale): relations avec les facteurs hydroclimatiques. J. Exp. Mar. Biol. Ecol., 169:181-214.
data(bnr) bnr.abd <- abund(bnr) summary(bnr.abd) plot(bnr.abd, dpos=c(105, 100)) bnr.abd$n <- 26 # To identify a point on the graph, use: bnr.abd$n <- identify(bnr.abd) lines(bnr.abd) bnr2 <- extract(bnr.abd) names(bnr2)
data(bnr) bnr.abd <- abund(bnr) summary(bnr.abd) plot(bnr.abd, dpos=c(105, 100)) bnr.abd$n <- 26 # To identify a point on the graph, use: bnr.abd$n <- identify(bnr.abd) lines(bnr.abd) bnr2 <- extract(bnr.abd) names(bnr2)
Compute and plot multiple autocorrelation using the Mahalanobis generalized distance D2. AutoD2 analyses a single set of multiple time-series. CrossD2 compares two sets of multiple time-series of the same size (same number of descriptors). CenterD2 compares subsamples issued from a single multivariate time-series, aiming to detect discontinuities.
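For intuition, here is a minimal sketch of a single multiple-autocorrelation value, assuming D2 compares the mean vectors of the series and of its lagged copy through a pooled covariance matrix; the exact formulation used by AutoD2() may differ:

autoD2.one <- function(series, lag) {
  n <- nrow(series)
  X1 <- series[1:(n - lag), , drop = FALSE]
  X2 <- series[(lag + 1):n, , drop = FALSE]
  S <- (cov(X1) + cov(X2)) / 2                # pooled covariance (assumption)
  mahalanobis(colMeans(X1), colMeans(X2), S)  # generalized distance D2
}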
AutoD2(series, lags=c(1, nrow(series)/3), step=1, plotit=TRUE, add=FALSE, ...)
CrossD2(series, series2, lags=c(1, nrow(series)/3), step=1, plotit=TRUE,
    add=FALSE, ...)
CenterD2(series, window=nrow(series)/5, plotit=TRUE, add=FALSE, type="l",
    level=0.05, lhorz=TRUE, lcol=2, llty=2, ...)
series: regularized multiple time-series
series2: a second set of regularized multiple time-series
lags: minimal and maximal lags to use. By default, 1 and a third of the number of observations in the series, respectively
step: step between successive lags. By default, 1
window: the window to use for CenterD2. By default, a fifth of the total number of observations in the series
plotit: if TRUE (by default), the graph is plotted
add: if TRUE, the graph is added to the current plot; by default, FALSE
type: the type of line to draw in the CenterD2 graph. By default, a line without points
level: the significance level to consider in the CenterD2 analysis. By default, 5%
lhorz: do we have to also plot the horizontal line representing the significance level on the graph?
lcol: the color of the significance level line. By default, color 2 is used
llty: the style for the significance level line. By default, lty=2 (dashed line)
...: additional graph parameters
An object of class 'D2' which contains:
lag: The vector of lags
D2: The D2 value for each lag
call: The command invoked when this function was called
data: The series used
type: The type of 'D2' analysis: 'AutoD2', 'CrossD2' or 'CenterD2'
window: The size of the window used in the CenterD2 analysis
level: The significance level for CenterD2
chisq: The chi-square value corresponding to the significance level in the CenterD2 analysis
units.text: Time units of the series, nicely formatted for graphs
If data are too heterogeneous, results could be biased (a singular matrix may appear in the calculations).
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
Cooley, W.W. & P.R. Lohnes, 1962. Multivariate procedures for the behavioural sciences. Whiley & sons.
Dagnélie, P., 1975. Analyse statistique à plusieurs variables. Presses Agronomiques de Gembloux.
Ibanez, F., 1975. Contribution à l'analyse mathématique des évènements en écologie planctonique: optimisations méthodologiques; étude expérimentale en continu à petite échelle du plancton côtier. Thèse d'état, Paris VI.
Ibanez, F., 1976. Contribution à l'analyse mathématique des évènements en écologie planctonique. Optimisations méthodologiques. Bull. Inst. Océanogr. Monaco, 72:1-96.
Ibanez, F., 1981. Immediate detection of heterogeneities in continuous multivariate oceanographic recordings. Application to time series analysis of changes in the bay of Villefranche sur mer. Limnol. Oceanogr., 26:336-349.
Ibanez, F., 1991. Treatment of the data deriving from the COST 647 project on coastal benthic ecology: The within-site analysis. In: B. Keegan (ed), Space and time series data analysis in coastal benthic ecology, p 5-43.
data(marphy)
marphy.ts <- as.ts(as.matrix(marphy[, 1:3]))
AutoD2(marphy.ts)
marphy.ts2 <- as.ts(as.matrix(marphy[, c(1, 4, 3)]))
CrossD2(marphy.ts, marphy.ts2)
# This is not identical to:
CrossD2(marphy.ts2, marphy.ts)
marphy.d2 <- CenterD2(marphy.ts, window=16)
lines(c(17, 17), c(-1, 15), col=4, lty=2)
lines(c(25, 25), c(-1, 15), col=4, lty=2)
lines(c(30, 30), c(-1, 15), col=4, lty=2)
lines(c(41, 41), c(-1, 15), col=4, lty=2)
lines(c(46, 46), c(-1, 15), col=4, lty=2)
text(c(8.5, 21, 27.5, 35, 43.5, 57), 11, labels=c("Peripheral Zone", "D1", "C",
    "Front", "D2", "Central Zone")) # Labels
time(marphy.ts)[marphy.d2$D2 > marphy.d2$chisq]
The bnr data frame has 103 rows and 163 columns. Each column is a separate species observed at least once in one of the stations. Several species are very abundant, others are very rare.
data(bnr)
Unpublished dataset.
Calculate a Buys-Ballot table from a time-series
buysbal(x, y=NULL, frequency=12, units="years", datemin=NULL,
    dateformat="m/d/Y", count=FALSE)
x: Either a vector with time values (in this case, y must also be provided), or a regular time series
y: If x is a vector of time values, a vector of the corresponding observations, of the same length as x
frequency: The frequency of observations per year to use in the Buys-Ballot table. By default, frequency=12 (monthly means)
units: either "years" (by default; the time in x is expressed in years), or "days" (the time in x is expressed in days and datemin and dateformat are used to convert it to years)
datemin: A character string representing the first date, using a format corresponding to dateformat
dateformat: The format used for the date in datemin, for instance "d/m/Y" or "m/d/Y" (by default)
count: If FALSE (by default), the Buys-Ballot table is returned; if TRUE, the number of observations used in each cell of the table is returned instead
A Buys-Ballot table summarizes data to highlight seasonal variations. Each row is one year and each column is a period of the year (12 months, 4 quarters, 52 weeks, ...). A cell ij of this table contains the mean value of all observations made during year i at period j.
A matrix containing either the Buys-Ballot table (count=FALSE), or the number of observations used to build the table (count=TRUE)
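For intuition, such a table can be hand-built from a monthly 'ts' object with tapply(); this sketch only illustrates the year x period averaging, not the units/datemin handling of buysbal():

x <- ts(rnorm(48) + rep(sin(2*pi*(1:12)/12), 4), start=c(2000, 1), frequency=12)
year <- floor(time(x) + 1e-6)   # year of each observation
period <- cycle(x)              # position within the year (1 to 12)
tapply(as.numeric(x), list(year, period), mean)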
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
data(releve) buysbal(releve$Day, releve$Melosul, frequency=4, units="days", datemin="21/03/1989", dateformat="d/m/Y") buysbal(releve$Day, releve$Melosul, frequency=4, units="days", datemin="21/03/1989", dateformat="d/m/Y", count=TRUE)
data(releve) buysbal(releve$Day, releve$Melosul, frequency=4, units="days", datemin="21/03/1989", dateformat="d/m/Y") buysbal(releve$Day, releve$Melosul, frequency=4, units="days", datemin="21/03/1989", dateformat="d/m/Y", count=TRUE)
Convert time scales. The "days" time scale corresponds to 1 unit per day; the "years" time scale uses 1 unit for 1 year. The latter is used in any analysis that requires seasonal decomposition and/or elimination.
daystoyears(x, datemin=NULL, dateformat="m/d/Y")
yearstodays(x, xmin=NULL)
x: A vector of time values
datemin: A character string representing the first date, using a format corresponding to dateformat
dateformat: The format used for the date in datemin, for instance "d/m/Y" or "m/d/Y" (by default)
xmin: The minimum value for the "days" time-scale
The "years" time-scale uses one unit for each year. We deliberately "linearized" time in this time-scale and each year is considered to have exactly 365.25 days. There is thus no adjustment for lep years.
Indeed, a small shift (less than one day) is introduced. This could result, for some dates, especially the 31st December, or 1st January of a year to be considered as belonging to the next, or previous year, respectively!
Similarly, one month is considered to be 1/12 year, no mather if it has 28, 29, 30 or 31 days. Thus, the same warning applies: there are shifts in months introduced by this linearization of time!
This representation simplifies further calculations, especially regarding seasonal effects (a quarter is exactly 0.25 units for instance), but shifts introduced in time may or may not be a problem for your particular application
(if exact dates matters, do not use this; if shifts of up to one day is not significant, there is no problem, like when working on long-term biological series with years as units).
Notice that converting it back to "days", using yearstodays()
restablishes exact dates without errors. So, no data is lost, it just a conversion to a simplified (linearized) calendar!
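A minimal sketch of the assumed linearization (anchored at the first observation; daystoyears() itself derives the anchor from datemin):

days2years <- function(x, yearmin) (x - min(x)) / 365.25 + yearmin
days2years(c(0, 365, 730), yearmin=2001)  # note the slow drift vs. calendar years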
A numerical vector of the same length as x, with the converted time-scale
Philippe Grosjean ([email protected]), Frédéric Ibanez ([email protected])
# A vector with a "days" time-scale (25 values every 30 days)
A <- (1:25)*30
# Convert it to a "years" time-scale, using 23/05/2001 (d/m/Y) as first value
B <- daystoyears(A, datemin="23/05/2001", dateformat="d/m/Y")
B
# Convert it back to "days" time-scale
yearstodays(B, xmin=30)
# Here is an example showing the shift introduced, and its consequence:
C <- daystoyears(unclass(as.Date(c("1970-1-1", "1971-1-1", "1972-1-1", "1973-1-1"),
    format="%Y-%m-%d")))
C
Decompose a single regular time series with a moving average filtering. Return a 'tsd' object. To decompose several time series at once, use tsd() with the argument method="average".
decaverage(x, type="additive", order=1, times=1, sides=2, ends="fill", weights=NULL)
x: a regular time series ('rts' under S+ and 'ts' under R)
type: the type of model, either type="additive" (by default) or type="multiplicative"
order: the order of the moving average (the window of the average being 2*order+1), centered around the current observation, or at left of this observation, depending on the value of the sides argument
times: the number of times to apply the method (by default, once)
sides: if 2 (by default), the window is centered around the current observation; if 1, the window is at left of the current observation (including it)
ends: either "NAs" (fill first and last values that are not calculable with NAs), or "fill" (fill them with the average of observations before applying the filter, by default), or "circular" (use last values for estimating first ones and vice versa), or "periodic" (use entire periods of contiguous cycles, deseasoning)
weights: a vector indicating the weight to give to each of the observations in the window. This argument takes priority over order
This function is a wrapper around the filter() function and returns a 'tsd' object. However, it offers more methods to handle ends.
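The filtering step itself is essentially a centered moving average, as in this sketch (ends are not handled here, unlike in decaverage()):

x <- ts(abs(rnorm(60)))
order <- 2
stats::filter(x, rep(1/(2*order + 1), 2*order + 1), sides=2)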
A 'tsd' object
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
Kendall, M., 1976. Time-series. Charles Griffin & Co Ltd. 197 pp.
Laloire, J.C., 1972. Méthodes du traitement des chroniques. Dunod, Paris, 194 pp.
Malinvaud, E., 1978. Méthodes statistiques de l'économétrie. Dunod, Paris. 846 pp.
Philips, L. & R. Blomme, 1973. Analyse chronologique. Université Catholique de Louvain. Vander ed. 339 pp.
tsd, tseries, deccensus, decdiff, decmedian, decevf, decreg, decloess
data(marbio)
ClausoB.ts <- ts(log(marbio$ClausocalanusB + 1))
ClausoB.dec <- decaverage(ClausoB.ts, order=2, times=10, sides=2, ends="fill")
plot(ClausoB.dec, col=c(1, 3, 2), xlab="stations")
# A stacked graph is more representative in this case
plot(ClausoB.dec, col=c(1, 3), xlab="stations", stack=FALSE, resid=FALSE,
    lpos=c(53, 4.3))
The CENSUS II method decomposes a regular time series into a trend, a seasonal component and residuals, according to a multiplicative model.
deccensus(x, type="multiplicative", trend=FALSE)
x: A single regular time series (an 'rts' object under S+ and a 'ts' object under R) with a "years" time scale (one unit per year) and a complete number of cycles (at least 3 complete years)
type: The type of model. This argument exists for compatibility with the other decomposition functions, but only type="multiplicative" is accepted here
trend: If TRUE, a trend component is also extracted; by default, trend=FALSE and the series is decomposed into a seasonal component and residuals only
The trend component contains both a general trend and long-term oscillations. The seasonal component may vary from year to year. For a seasonal decomposition using an additive model, use decloess() instead.
a 'tsd' object
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
Béthoux, N., M. Etienne, F. Ibanez & J.L. Rapaire, 1980. Spécificités hydrologiques des zones littorales. Analyse chronologique par la méthode CENSUS II et estimation des échanges océan-atmosphère appliqués à la baie de Villefranche sur mer. Ann. Inst. Océanogr. Paris, 56:81-95.
Fromentin, J.M. & F. Ibanez, 1994. Year to year changes in meteorological features on the French coast area during the last half-century. Examples of two biological responses. Oceanologica Acta, 17:285-296.
Institut National de Statistique de Belgique, 1965. Décomposition des séries chronologiques en leurs composantes suivant différentes méthodes. Etudes statistiques et économétriques. Bull. Stat. INS, 10:1449-1524.
Philips, J. & R. Blomme, 1973. Analyse chronologique. Université Catholique de Louvain, Vander ed. 339 pp.
Rosenblatt, H.M., 1968. Spectral evaluation of BLS and CENSUS revised seasonal adjustment procedures. J. Amer. Stat. Assoc., 68:472-501.
Shiskin, J. & H. Eisenpress, 1957. Seasonal adjustment by electronic computer methods. J. Amer. Stat. Assoc., 52:415-449.
tsd, tseries, decaverage, decdiff, decmedian, decevf, decreg, decloess
data(releve)
# Get regulated time series with a 'years' time-scale
rel.regy <- regul(releve$Day, releve[3:8], xmin=6, n=87, units="daystoyears",
    frequency=24, tol=2.2, methods="linear", datemin="21/03/1989",
    dateformat="d/m/Y")
rel.ts <- tseries(rel.regy)
# We must have complete cycles to allow using deccensus()
start(rel.ts)
end(rel.ts)
rel.ts2 <- window(rel.ts, end=c(1992, 5))
rel.dec2 <- deccensus(rel.ts2[, "Melosul"], trend=TRUE)
plot(rel.dec2, col=c(1, 4, 3, 2))
A filtering using X(t+lag) - X(t) has the property of eliminating the general trend from the series, whatever its shape.
decdiff(x, type="additive", lag=1, order=1, ends="fill")
x: a regular time series ('rts' under S+ and 'ts' under R)
type: the type of model, either type="additive" (by default) or type="multiplicative"
lag: the lag between the two observations used to calculate differences. By default, lag=1
order: the order of the difference, corresponding to the number of times it is applied. By default, order=1
ends: either "NAs" (fill first values that are not calculable with NAs), or "fill" (fill them with the average of following observations before applying the filter, by default), or "drop" (do not fill them). If ends="drop", the filtered series is shorter than the initial one
This function is a wrapper around the diff() function to create a 'tsd' object. It also manages initial values through the ends argument.
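The differencing step itself is essentially base diff(), as in this sketch (the ends handling of decdiff() is not reproduced):

x <- ts(cumsum(rnorm(60)))         # a series with a stochastic trend
diff(x, lag=1, differences=2)      # lag=1, applied twice (order=2)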
a 'tsd' object
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
Kendall, M., 1976. Time-series. Charles Griffin & Co Ltd. 197 pp.
Laloire, J.C., 1972. Méthodes du traitement des chroniques. Dunod, Paris, 194 pp.
Philips, L. & R. Blomme, 1973. Analyse chronologique. Université Catholique de Louvain. Vander ed. 339 pp.
tsd, tseries, decaverage, deccensus, decmedian, decevf, decreg, decloess
data(marbio)
ClausoB.ts <- ts(log(marbio$ClausocalanusB + 1))
ClausoB.dec <- decdiff(ClausoB.ts, lag=1, order=2, ends="fill")
plot(ClausoB.dec, col=c(1, 4, 2), xlab="stations")
The eigenvector filtering decomposes the signal by applying a principal component analysis (PCA) on a multivariate matrix made of the original signal and a given number of copies of it, incrementally lagged. Reconstructing the signal using only the most representative eigenvectors filters it.
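A hedged sketch of the principle (PCA on the matrix of lagged copies, reconstruction from the selected axes only; decevf() may align and average the lagged copies differently):

evf.sketch <- function(x, lag = 5, axes = 1:2) {
  M <- embed(as.numeric(x), lag + 1)  # columns = x lagged from 0 to lag
  pc <- princomp(M)
  # Rebuild the lagged matrix keeping only the selected principal axes
  rec <- pc$scores[, axes, drop = FALSE] %*% t(pc$loadings[, axes, drop = FALSE])
  rec <- sweep(rec, 2, pc$center, "+")
  rec[, 1]                            # filtered signal (shortened by 'lag')
}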
decevf(x, type="additive", lag=5, axes=1:2)
x: a regular time series ('rts' under S+ and 'ts' under R)
type: the type of model, either type="additive" (by default) or type="multiplicative"
lag: the maximum lag used. A PCA is run on the matrix constituted by the vectors lagged from 0 to lag. By default, lag=5
axes: the principal axes to use to reconstruct the filtered signal. For instance, to use axes 2 and 4, use axes=c(2, 4). By default, the first two axes are used (axes=1:2)
a 'tsd' object
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
Colebrook, J.M., 1978. Continuous plankton records: zooplankton and environment, North-East Atlantic and North Sea 1948-1975. Oceanologica Acta, 1:9-23.
Ibanez, F. & J.C. Dauvin, 1988. Long-term changes (1977-1987) on a muddy fine sand Abra alba - Melinna palmate population community from the Western English Channel. J. Mar. Prog. Ser., 49:65-81.
Ibanez, F., 1991. Treatment of data deriving from the COST 647 project on coastal benthic ecology: The within-site analysis. In: B. Keegan (ed.) Space and time series data analysis in coastal benthic ecology. Pp. 5-43.
Ibanez, F. & M. Etienne, 1992. Le filtrage des séries chronologiques par l'analyse en composantes principales de processus (ACPP). J. Rech. Océanogr., 16:27-33.
Ibanez, F., J.C. Dauvin & M. Etienne, 1993. Comparaison des évolutions à long-terme (1977-1990) de deux peuplements macrobenthiques de la Baie de Morlaix (Manche Occidentale): relations avec les facteurs hydroclimatiques. J. Exp. Mar. Biol. Ecol., 169:181-214.
tsd, tseries, decaverage, deccensus, decmedian, decdiff, decreg, decloess
data(releve)
melo.regy <- regul(releve$Day, releve$Melosul, xmin=9, n=87, units="daystoyears",
    frequency=24, tol=2.2, methods="linear", datemin="21/03/1989",
    dateformat="d/m/Y")
melo.ts <- tseries(melo.regy)
acf(melo.ts)
# Autocorrelation is not significant after 0.16
# That corresponds to a lag of 0.16*24=4 (frequency=24)
melo.evf <- decevf(melo.ts, lag=4, axes=1)
plot(melo.evf, col=c(1, 4, 2))
# A superposed graph is better in the present case
plot(melo.evf, col=c(1, 4), xlab="stations", stack=FALSE, resid=FALSE,
    lpos=c(0, 60000))
Compute a seasonal decomposition of a regular time series using a LOESS method (local polynomial regression)
decloess(x, type="additive", s.window=NULL, s.degree=0, t.window=NULL,
    t.degree=2, robust=FALSE, trend=FALSE)
x: a regular time series ('rts' under S+ and 'ts' under R)
type: the type of model. This argument exists for compatibility purposes; the only model type accepted for this method is "additive"
s.window: the width of the window used to extract the seasonal component. Use an odd value equal to, or just larger than, the number of annual values (frequency of the time series). Use another value to extract other cycles (circadian, lunar, ...). Using s.window="periodic" gives a classical seasonal decomposition
s.degree: the order of the polynomial to use to extract the seasonal component (0 or 1). By default, 0
t.window: the width of the window to use to extract the general trend when trend=TRUE (indicate an odd value). If not provided, a reasonable value is first estimated, then used
t.degree: the order of the polynomial to use to extract the general trend (0, 1 or 2). By default, 2
robust: if TRUE, a robust regression method is used; by default, FALSE
trend: if TRUE, a trend component is also extracted; by default, FALSE, and the series is decomposed into a seasonal component and residuals only
This function uses the stl() function for the decomposition. It is a wrapper that creates a 'tsd' object.
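The decomposition step is thus essentially the following sketch (decloess() adds the 'tsd' packaging and the trend handling):

x <- ts(sin(2*pi*(1:72)/12) + rnorm(72, sd=0.3), frequency=12)
plot(stl(x, s.window="periodic"))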
a 'tsd' object
Philippe Grosjean ([email protected]), Frédéric Ibanez ([email protected])
Cleveland, W.S., E. Grosse & W.M. Shyu, 1992. Local regression models. Chapter 8 of Statistical Models in S. J.M. Chambers & T.J. Hastie (eds). Wadsworth & Brook/Cole.
Cleveland, R.B., W.S. Cleveland, J.E. McRae, & I. Terpenning, 1990. STL: A seasonal-trend decomposition procedure based on loess. J. Official Stat., 6:3-73.
tsd, tseries, decaverage, deccensus, decmedian, decdiff, decevf, decreg
data(releve)
melo.regy <- regul(releve$Day, releve$Melosul, xmin=9, n=87, units="daystoyears",
    frequency=24, tol=2.2, methods="linear", datemin="21/03/1989",
    dateformat="d/m/Y")
melo.ts <- tseries(melo.regy)
melo.dec <- decloess(melo.ts, s.window="periodic")
plot(melo.dec, col=1:3)
This is a nonlinear filtering method used to smooth, but also to segment, a time series. Isolated peaks and pits are levelled out by this method.
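One pass of the method is essentially a running median, as in this sketch with base runmed() (decmedian() also manages ends and repeated passes):

x <- c(1, 2, 1, 9, 1, 2, 2, 1, 8, 2, 2, 3)
runmed(x, k=5)  # the isolated spikes (9 and 8) are levelled out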
decmedian(x, type="additive", order=1, times=1, ends="fill")
x: a regular time series ('rts' under S+ and 'ts' under R)
type: the type of model, either type="additive" (by default) or type="multiplicative"
order: the window used for the running median corresponds to 2*order + 1
times: the number of times the running median is applied. By default, 1
ends: the method used to calculate ends. Either "NAs" (fill extreme, non-calculable values with NAs), or "fill" (fill these extremes with the closest calculable median)
a 'tsd' object
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
Gebski, V.J., 1985. Some properties of splicing when applied to non-linear smoothers. Comp. Stat. Data Anal., 3:151-157.
Philips, L. & R. Blomme, 1973. Analyse chronologique. Université Catholique de Louvain. Vander ed. 339 pp.
Tukey, J.W., 1977. Exploratory Data Analysis. Reading Massachusetts: Addison-Wesley.
tsd, tseries, decaverage, deccensus, decdiff, decevf, decreg, decloess
data(marbio)
ClausoB.ts <- ts(log(marbio$ClausocalanusB + 1))
ClausoB.dec <- decmedian(ClausoB.ts, order=2, times=10, ends="fill")
plot(ClausoB.dec, col=c(1, 4, 2), xlab="stations")
# This is a transect across a frontal zone:
plot(ClausoB.dec, col=c(0, 2), xlab="stations", stack=FALSE, resid=FALSE)
lines(c(17, 17), c(0, 10), col=4, lty=2)
lines(c(25, 25), c(0, 10), col=4, lty=2)
lines(c(30, 30), c(0, 10), col=4, lty=2)
lines(c(41, 41), c(0, 10), col=4, lty=2)
lines(c(46, 46), c(0, 10), col=4, lty=2)
text(c(8.5, 21, 27.5, 35, 43.5, 57), 8.7, labels=c("Peripheral Zone", "D1", "C",
    "Front", "D2", "Central Zone"))
Providing values coming from a regression on the original series, a 'tsd' object is created, containing the original series, the regression model and the residuals.
decreg(x, xreg, type="additive")
x: a regular time series ('rts' under S+ and 'ts' under R)
xreg: a second regular time series, or a vector of the same length as x, containing the values predicted by a regression model of x
type: the type of model, either type="additive" (by default) or type="multiplicative"
a 'tsd' object
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
Frontier, S., 1981. Méthodes statistiques. Masson, Paris. 246 pp.
Kendall, M., 1976. Time-series. Charles Griffin & Co Ltd. 197 pp.
Legendre, L. & P. Legendre, 1984. Ecologie numérique. Tome 2: La structure des données écologiques. Masson, Paris. 335 pp.
Malinvaud, E., 1978. Méthodes statistiques de l'économétrie. Dunod, Paris. 846 pp.
Sokal, R.R. & F.J. Rohlf, 1981. Biometry. Freeman & Co, San Francisco. 860 pp.
tsd, tseries, decaverage, deccensus, decdiff, decevf, decmedian, decloess
data(marphy)
density <- ts(marphy[, "Density"])
plot(density)
Time <- time(density)
# Linear model to represent trend
density.lin <- lm(density ~ Time)
summary(density.lin)
xreg <- predict(density.lin)
lines(xreg, col=3)
density.dec <- decreg(density, xreg)
plot(density.dec, col=c(1, 3, 2), xlab="stations")
# Order 2 polynomial to represent trend
density.poly <- lm(density ~ Time + I(Time^2))
summary(density.poly)
xreg2 <- predict(density.poly)
plot(density)
lines(xreg2, col=3)
density.dec2 <- decreg(density, xreg2)
plot(density.dec2, col=c(1, 3, 2), xlab="stations")
# Fit a sinusoidal model on seasonal (artificial) data
tser <- ts(sin((1:100)/12*pi) + rnorm(100, sd=0.3), start=c(1998, 4), frequency=24)
Time <- time(tser)
tser.sin <- lm(tser ~ I(cos(2*pi*Time)) + I(sin(2*pi*Time)))
summary(tser.sin)
tser.reg <- predict(tser.sin)
tser.dec <- decreg(tser, tser.reg)
plot(tser.dec, col=c(1, 4), xlab="stations", stack=FALSE, resid=FALSE, lpos=c(0, 4))
plot(tser.dec, col=c(1, 4, 2), xlab="stations")
# One can also use nonlinear models (see 'nls')
# or autoregressive models (see 'ar' and others in 'ts' library)
Transform a factor into separate variables (one per level), with a binary code (0 for absent, 1 for present) in each variable.
disjoin(x)
x: a vector containing factor data
Use cut() to transform a numerical variable into a factor variable.
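For intuition, the same binary coding can be obtained on a plain factor with model.matrix(), as in this sketch:

f <- factor(c("a", "b", "a", "c"))
model.matrix(~ f - 1)  # one 0/1 indicator column per level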
a matrix containing the data with binary coding
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
Fromentin J.-M., F. Ibanez & P. Legendre, 1993. A phytosociological method for interpreting plankton data. Mar. Ecol. Prog. Ser., 93:285-306.
Gebski, V.J., 1985. Some properties of splicing when applied to non-linear smoothers. Comput. Stat. Data Anal., 3:151-157.
Grandjouan, G., 1982. Une méthode de comparaison statistique entre les répartitions des plantes et des climats. Thèse d'Etat, Université Louis Pasteur, Strasbourg.
Ibanez, F., 1976. Contribution à l'analyse mathématique des événements en Ecologie planctonique. Optimisations méthodologiques. Bull. Inst. Océanogr. Monaco, 72:1-96.
# Artificial data with 1/5 of zeros
Z <- c(abs(rnorm(8000)), rep(0, 2000))
# Let the program choose cuts
table(cut(Z, breaks=5))
# Create one class for zeros, and 4 classes for the other observations
Z2 <- Z[Z != 0]
cuts <- c(-1e-10, 1e-10, quantile(Z2, 1:5/5, na.rm=TRUE))
cuts
table(cut(Z, breaks=cuts))
# Binary coding of these data
disjoin(cut(Z, breaks=cuts))[1:10, ]
A distogram is an extension of the variogram to multivariate time-series. For each lag (with a constant interval h between observations), it computes the Euclidean distance between observations whose rows have been normalized to one (the chord distance).
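A hedged sketch of a single distogram value at lag h (rows normalized to one, then the mean Euclidean, i.e. chord, distance between observations h steps apart; disto() may normalize slightly differently):

chord.lag <- function(x, h) {
  x <- as.matrix(x)
  x <- x / sqrt(rowSums(x^2))  # each row normalized to unit length
  d <- (x[1:(nrow(x) - h), , drop = FALSE] - x[(h + 1):nrow(x), , drop = FALSE])^2
  mean(sqrt(rowSums(d)))       # mean chord distance at lag h
}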
disto(x, max.dist=nrow(x)/4, plotit=TRUE, disto.data=NULL)
x: a matrix, a data frame or a multiple time-series
max.dist: the maximum distance to calculate. By default, it is a quarter of the number of observations (that is, the number of rows in the matrix)
plotit: if TRUE (by default), the distogram is plotted
disto.data: data coming from a previous call to disto(), allowing to plot the distogram without recalculating it
A data frame containing distance and distogram values
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
Dauvin, J.C. & F. Ibanez, 1986. Variations à long-terme (1977-1985) du peuplement des sables fins de la Pierre Noire (baie de Morlaix, Manche Occidentale): analyse statistique de l'évolution structurale. Hydrobiologia, 142:171-186.
Ibanez, F. & J.C. Dauvin, 1988. Long-term changes (1977-1987) in a muddy fine sand Abra alba - Melinna palmate community from the Western English Channel: multivariate time-series analysis. Mar. Ecol. Prog. Ser., 49:65-81.
Mackas, D.L., 1984. Spatial autocorrelation of plankton community composition in a continental shelf ecosystem. Limnol. Ecol., 20:451-471.
data(bnr)
disto(bnr)
Calculate equivalent vectors sensu Escoufier, that is, the most significant variables from a multivariate data frame according to a principal component analysis (the variables that are most correlated with the principal axes). This method is useful mainly for physical or chemical data where simply summarizing them with a PCA does not always give easily interpretable principal axes.
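The criterion behind this sorting is Escoufier's RV coefficient; a minimal sketch of RV between two centered data matrices (assumed to be the quantity escouf() maximizes at each step) is:

RV <- function(X, Y) {
  X <- scale(X, scale = FALSE)  # center columns
  Y <- scale(Y, scale = FALSE)
  Sxy <- crossprod(X, Y)        # cross-covariance matrix (constant factors cancel)
  sum(Sxy^2) / sqrt(sum(crossprod(X)^2) * sum(crossprod(Y)^2))
}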
escouf(x, level=1, verbose=TRUE)
## S3 method for class 'escouf'
print(x, ...)
## S3 method for class 'escouf'
summary(object, ...)
## S3 method for class 'summary.escouf'
print(x, ...)
## S3 method for class 'escouf'
plot(x, level=x$level, lhorz=TRUE, lvert=TRUE, lvars=TRUE, lcol=2, llty=2,
    diff=TRUE, dlab="RV' (units not shown)", dcol=4, dlty=par("lty"), dpos=0.8,
    type="s", xlab="variables", ylab="RV",
    main=paste("Escoufier's equivalent vectors for:", x$data), ...)
## S3 method for class 'escouf'
lines(x, level=x$level, lhorz=TRUE, lvert=TRUE, lvars=TRUE, col=2, lty=2, ...)
## S3 method for class 'escouf'
identify(x, lhorz=TRUE, lvert=TRUE, lvars=TRUE, col=2, lty=2, ...)
## S3 method for class 'escouf'
extract(e, n, level=e$level, ...)
x: For escouf(), a data frame containing the variables to sort in columns; for the methods, an 'escouf' object
level: The level of correlation at which to stop calculation. By default, level=1 and all variables are sorted
verbose: Print calculation steps. This lets you follow the progress of the computation when it takes a long time (that is, with many variables to sort)
object: An 'escouf' object returned by escouf()
e: An 'escouf' object returned by escouf()
lhorz: If TRUE (by default), a horizontal line is drawn at the extraction level
lvert: If TRUE (by default), a vertical line separating the extracted variables from the others is drawn
lvars: If TRUE (by default), the labels of the extracted variables (at left of the vertical line) are highlighted
lcol: The color to use to draw the lines (lhorz=TRUE and lvert=TRUE); by default, color 2
llty: The style used to draw the lines (lhorz=TRUE and lvert=TRUE); by default, lty=2 (dashed line)
diff: If TRUE (by default), the RV' curve (differences between successive RV values) is also plotted
dlab: The label to use for the RV' curve. By default: "RV' (units not shown)"
dcol: The color to use for the RV' curve (by default, color 4 is used)
type: The type of graph to plot
xlab: the label of the x-axis
ylab: the label of the y-axis
main: the main title of the graph
dlty: The style for the RV' curve
col: The color to use to draw the lines
lty: The style used to draw the lines
dpos: The relative horizontal position of the label for the RV' curve. The default value of 0.8 means that the label is placed at 80% of the horizontal axis. Vertical position of the label is determined automatically
n: The number of variables to extract. If a value is given, it takes priority over level
...: additional parameters
An object of type 'escouf' is returned. It has methods print(), summary(), plot(), lines(), identify() and extract().
Since a large number of iterations is done, this function is slow with a large number of variables (more than 25-30)!
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected]), Benjamin Planque ([email protected]), Jean-Marc Fromentin ([email protected])
Cambon, J., 1974. Vecteur équivalent à un autre au sens des composantes principales. Application hydrologique. DEA de Mathématiques Appliquées, Université de Montpellier.
Escoufier, Y., 1970. Echantillonnage dans une population de variables aléatoires réelles. Pub. Inst. Stat. Univ. Paris, 19:1-47.
Jabaud, A., 1996. Cadre climatique et hydrobiologique du lac Léman. DEA d'Océanologie Biologique Paris.
data(marbio)
marbio.esc <- escouf(marbio)
summary(marbio.esc)
plot(marbio.esc)
# The x-axis has short labels. For more info., enter:
marbio.esc$vr
# Define a level at which to extract most significant variables
marbio.esc$level <- 0.90
# Show it on the graph
lines(marbio.esc)
# This can also be done interactively on the plot using:
# marbio.esc$level <- identify(marbio.esc)
# Finally, extract most significant variables
marbio2 <- extract(marbio.esc)
names(marbio2)
extract() is a generic function for extracting a part of the original dataset according to an analysis.
extract(e, n, ...)
e: An object from which the extraction is performed
n: The number of elements to extract
...: Additional arguments affecting the extraction
A subset of the original dataset contained in the extracted object
abund, regul, tsd, turnogram, turnpoints
Extract the first element of a vector. Useful for the turnogram() function.
first(x, na.rm=FALSE)
x: a numerical vector
na.rm: if TRUE, the first non-missing value is returned; by default, FALSE: the very first value, even if it is NA, is returned
a numerical value
Philippe Grosjean ([email protected]), Frédéric Ibanez ([email protected])
a <- c(NA, 1, 2, NA, 3, 4, NA)
first(a)
first(a, na.rm=TRUE)
This is an internal function called by some plot() methods. Considering the time series 'units' attribute and the frequency of the observations in the series, the function returns a string with a pertinent time unit. For instance, if the unit is 'years' and the frequency is 12, then the data are sampled monthly and GetUnitText() returns the string "months".
GetUnitText(series)
series: a regular time series (an 'rts' object in S+, or a 'ts' object in R)
a string with the best time unit for graphs
Philippe Grosjean ([email protected]), Frédéric Ibanez ([email protected])
timeser <- ts(1:24, frequency=12)   # 12 observations per year
attr(timeser, "units") <- "years"   # time in years for 'ts' object
GetUnitText(timeser)                # formats unit (1/12 year = months)
This is equivalent to is.rts() in S+ and to is.ts() in R. is.tseries() recognizes both 'rts' and 'ts' objects, whatever the environment (S+ or R).
is.tseries(x)
x: an object
a boolean value
Philippe Grosjean ([email protected]), Frédéric Ibanez ([email protected])
tser <- ts(sin((1:100)/6*pi) + rnorm(100, sd=0.5), start=c(1998, 4), frequency=12)
is.tseries(tser)        # TRUE
not.a.ts <- c(1, 2, 3)
is.tseries(not.a.ts)    # FALSE
Extract the last element of a vector. Useful for the turnogram() function.
last(x, na.rm=FALSE)
x: a numerical vector
na.rm: if TRUE, the last non-missing value is returned; by default, FALSE: the very last value, even if it is NA, is returned
a numerical value
Philippe Grosjean ([email protected]), Frédéric Ibanez ([email protected])
a <- c(NA, 1, 2, NA, 3, 4, NA)
last(a)
last(a, na.rm=TRUE)
A simple method using cumulated sums (cusums) to detect changes in the tendency of a time series.
local.trend(x, k=mean(x), plotit=TRUE, type="l", cols=1:2, ltys=2:1,
    xlab="Time", ylab="cusum", ...)
## S3 method for class 'local.trend'
identify(x, ...)
x: a regular time series (a 'ts' object) for local.trend(), or a 'local.trend' object for identify()
k: the reference value to subtract from the cumulated sums. By default, it is the mean of all observations in the series
plotit: if TRUE (by default), the cusum curve is plotted
type: the type of plot (as usual notation for this argument)
cols: colors to use for the original data and for the trend line
ltys: line types to use for the original data and for the trend line
xlab: label of the x-axis
ylab: label of the y-axis
...: additional arguments for the graph
With local.trend(), you can:
- detect changes in the mean value of a time series,
- determine the date of occurrence of such changes,
- estimate the mean values on homogeneous intervals,
as illustrated by the sketch below.
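The cusum curve itself is simple to sketch: cumulated sums of the deviations from the reference value k (the series mean by default), where a change of slope marks a change of the local mean:

x <- ts(c(rnorm(30, mean=0), rnorm(30, mean=2)))    # mean shift at t = 31
plot(cumsum(x - mean(x)), type="l", ylab="cusum")   # the kink marks the shift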
a 'local.trend' object is returned. It has the method identify()
Once transitions are identified with this method, you can use stat.slide() to get more detailed information on each phase. A smoothing of the series using running medians (see decmedian()) also allows detecting various levels in a time series, but according to the median statistic. Under R, see also the 'strucchange' package for a more complete, but more complex, implementation of cusums applied to time series.
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
Ibanez, F., J.M. Fromentin & J. Castel, 1993. Application de la méthode des sommes cumulées à l'analyse des séries chronologiques océanographiques. C. R. Acad. Sci. Paris, Life Sciences, 316:745-748.
trend.test, stat.slide, decmedian
data(bnr)
# Calculate and plot cumsum for the 8th series
bnr8.lt <- local.trend(bnr[, 8])
# To identify local trends, use:
# identify(bnr8.lt)
# and click points between which you want to compute local linear trends...
The marbio data frame has 68 rows (stations) and 24 columns (taxonomic zooplankton groups). Abundances are expressed in number of individuals per cubic meter of seawater.
data(marbio)
This data frame contains the following columns, giving the abundance of:
Acartia clausi - herbivorous
Calanus helgolandicus (adults) - herbivorous
Idem (copepodits 1) - omnivorous
Idem (copepodits 2) - omnivorous
Idem (copepodits 3) - omnivorous
Idem (copepodits 4) - omnivorous
Idem (copepodits 5) - omnivorous
Clausocalanus size class A - herbivorous
Clausocalanus size class B - herbivorous
Clausocalanus size class C - herbivorous
Centropages tipicus (adults) - omnivorous
Centropages typicus (juv.) - omnivorous
Nauplii of copepods - filter feeders
Oithona sp. - carnivorous
Various species of acanthaires - misc
Various species of cladocerans - carnivorous
Larvae of echinoderms - filter feeders
Larvae of decapods - omnivorous
Larvae of gastropods - filter feeders
Eggs of crustaceans - misc
Various species of ostracods - omnivorous
Various species of pteropods - herbivorous
Various species of siphonophores - carnivorous
Bells of calycophores - misc
This dataset corresponds to a single transect sampled with a plankton net across the Ligurian Sea front in the Mediterranean Sea, between Nice (France) and Calvi (Corsica). The transect extends from the Cap Ferrat (close to Nice) to about 65 km offshore in the direction of Calvi (along a bearing of 123°). 68 regularly spaced samples were collected on this transect. For more information about the water masses and their physico-chemical characteristics, see the marphy dataset.
Boucher, J., F. Ibanez & L. Prieur (1987) Daily and seasonal variations in the spatial distribution of zooplankton populations in relation to the physical structure in the Ligurian Sea Front. Journal of Marine Research, 45:133-173.
Fromentin, J.-M., F. Ibanez & P. Legendre (1993) A phytosociological method for interpreting plankton data. Marine Ecology Progress Series, 93:285-306.
The marphy data frame has 68 rows (stations) and 4 columns. They are seawater measurements at a depth of 3 to 7 m, at the same 68 stations as marbio.
data(marphy)
This data frame contains the following columns:
Temperature: seawater temperature in °C
Salinity: salinity in g/kg
Fluorescence: fluorescence of chlorophyll a
Density: excess of volumic mass of the seawater in g/l
This dataset corresponds to a single transect sampled across the Ligurian Sea front in the Mediterranean Sea, between Nice (France) and Calvi (Corsica). The transect extends from the Cap Ferrat (close to Nice) to about 65 km offshore in the direction of Calvi (along a bearing of 123°). 68 regularly spaced measurements were recorded. They correspond to the stations where zooplankton was collected in the marbio dataset. Water masses in the transect across the front were identified as:
Peripheral zone
D1 (divergence) zone
C (convergence) zone
Frontal zone
D2 (divergence) zone
Central zone
Boucher, J., F. Ibanez & L. Prieur (1987) Daily and seasonal variations in the spatial distribution of zooplankton populations in relation to the physical structure in the Ligurian Sea Front. Journal of Marine Research, 45:133-173.
Fromentin, J.-M., F. Ibanez & P. Legendre (1993) A phytosociological method for interpreting plankton data. Marine Ecology Progress Series, 93:285-306.
Determine which observations in a regular time series match observations in an original irregular one, accepting a given tolerance window in the time-scale. This function is used internally for regulation (functions regul(), regul.screen() and regul.adj()).
match.tol(x, table, nomatch=NA, tol.type="both", tol=0)
x: a numerical vector containing the sought values (the time-scale of the regular series)
table: a numerical vector containing the initial times in which matches for x are looked for
nomatch: the symbol to use to flag an entry for which no match is found. By default, nomatch=NA
tol.type: the type of adjustment to use for the time-tolerance: "left", "right", "both" (by default) or "none"
tol: the tolerance in the time-scale used to decide whether a value in x matches a value in table. If tol=0 (by default), only exact matches are considered
a vector of the same length as x, indicating the position of the matching observations in table
regul, regul.screen, regul.adj
X <- 1:5
Table <- c(1, 3.1, 3.8, 4.4, 5.1, 6)
match.tol(X, Table)
match.tol(X, Table, tol=0.2)
match.tol(X, Table, tol=0.55)
Calculate the mean, the variance and the variance of the mean for a single series, according to Pennington's estimators (a correction for the zero or missing values in abundances of species collected with a net).
pennington(x, calc="all", na.rm=FALSE)
x: a numerical vector
calc: indicates which estimator(s) should be calculated. Use "mean", "var", "mean.var" or "all" (by default) to get the mean, the variance, the variance of the mean, or all three estimators, respectively
na.rm: if TRUE, missing values are eliminated from the series before calculation; by default, FALSE
A complex problem in marine ecology is the notion of zero. In fact, random sampling of a fish population often leads to a table with many null values. Do these null values indicate that the fish was absent, or do we have to consider them as missing data, that is, that the fish was rare but present and simply was not caught by the net? For instance, for 100 net hauls giving 80 null values, how do we estimate the mean? Do we have to divide the sum of fishes caught by 100 or by 20? Pennington (1983) applied to this case the probabilistic theory of Aitchison (1955). The latter established a method to estimate the mean and variance of a random variable with a non-null probability of presenting zero values, and whose conditional distribution corresponding to its non-null values follows a Gaussian curve. In practice, a lognormal distribution is found most of the time for the non-null values in fish populations. In our own experience, it is also often the case for plankton. pennington() thus calculates statistics for such conditional lognormal distributions.
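As a hedged illustration, the naive delta-lognormal mean multiplies the proportion of non-null values by the back-transformed lognormal mean of the non-null values; Pennington's actual estimator replaces exp(s^2/2) by a bias-correcting function, not reproduced here:

delta.mean <- function(x) {
  nz <- x[x > 0]
  y <- log(nz)
  length(nz) / length(x) * exp(mean(y) + var(y)/2)  # simplified, biased version
}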
a single value, or a vector with "mean", "var" and "mean.var" if the argument calc="all" is used
For multiple series, or for getting more statistics on the series, use stat.pen(). Use stat.slide() to obtain statistics calculated separately for various intervals along the time-scale of a time series.
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
Aitchison, J., 1955. On the distribution of a positive random variable having a discrete probability mass at the origin. J. Amer. Stat. Ass., 50:901-908.
Pennington, M., 1983. Efficient estimations of abundance for fish and plankton surveys. Biometrics, 39:281-286.
data(marbio)
pennington(marbio[, "Copepodits2"])
pennington(marbio[, "Copepodits2"], calc="mean", na.rm=TRUE)
The Gleissberg distribution gives the probability of observing k extrema in a series of n observations. This distribution is used in the turnogram to determine whether monotony indices are significant (see turnogram()).
pgleissberg(n, k, lower.tail=TRUE, two.tailed=FALSE)
n: the number of observations in the series
k: the number of extrema in the series, as calculated by turnpoints()
lower.tail: if TRUE (by default), the probability of observing k extrema or less is returned; otherwise, the probability of observing more than k extrema
two.tailed: if TRUE, the two-tailed probability is returned; by default, FALSE (one-tailed)
a value giving the probability of having k extrema in a series of n observations
The Gleissberg distribution is asymptotically normal. For n > 50, the distribution is approximated by a Gaussian curve. For lower n values, the exact probability is returned (using the data in the variable .gleissberg.table).
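A sketch of that normal approximation, using the classical moments of the number of turning points in a random series (mean 2*(n - 2)/3 and variance (16*n - 29)/90):

pgleiss.approx <- function(n, k)
    pnorm(k, mean=2*(n - 2)/3, sd=sqrt((16*n - 29)/90))
pgleiss.approx(60, 33)  # compare with pgleissberg(60, 33)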
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
Dallot, S. & M. Etienne, 1990. Une méthode non paramétrique d'analyse des séries en océanographie biologique: les tournogrammes. Biométrie et océanographie - Société de biométrie, 6, Lille, 26-28 mai 1986. IFREMER, Actes de colloques, 10:13-31.
Johnson, N.L. & Kotz, S., 1969. Discrete distributions. J. Wiley & sons, New York, 328 pp.
.gleissberg.table, turnpoints, turnogram
# Until n=50, the exact probability is returned
pgleissberg(20, 10, lower.tail=TRUE, two.tailed=FALSE)
# For higher n values, it is approximated by a normal distribution
pgleissberg(60, 33, lower.tail=TRUE, two.tailed=FALSE)
Transform an irregular time series into a regular time series, or fill gaps in a regular time series, using the area method.
regarea(x, y=NULL, xmin=min(x), n=length(x), deltat=(max(x) - min(x))/(n - 1),
    rule=1, window=deltat, interp=FALSE, split=100)
x: a vector with time in the irregular series. Missing values are allowed
y: a vector of the same length as x, with the observations made at the corresponding times
xmin: allows respecifying the origin of time in the calculated regular time series. By default, the origin is not redefined and it is equivalent to the smallest value in x
n: the number of observations in the regular time series. By default, it is the same number as in the original irregular time series (i.e., length(x))
deltat: the time interval between two observations in the regulated time series
rule: the rule to use for extrapolated values (outside of the range of the initial irregular time series) in the regular time series. With rule=1 (by default), these entries are not calculated and get NA; with rule=2, they are extrapolated
window: size of the window to use for interpolation. By default, adjacent windows are used (window=deltat)
interp: indicates whether matching observations in both series must be recalculated by interpolation (TRUE), or whether the original values must be kept unchanged (FALSE, by default)
split: a parameter to optimise calculation time and to avoid saturating the memory. Very long time series are split into smaller subunits. This is transparent for the user. The default value of split=100 should be right in most cases
This is a linear interpolation method described by Fox (1972). It takes into account all observations located in a given time window around the missing observation. Contrary to many other interpolations (constant, linear, spline), the interpolated curve does not pass through the initial observations. Indeed, a smoothing is operated simultaneously by this method. The importance of the smoothing depends on the size of the window (the larger it is, the smoother the calculated regular time series will be).
An object of type 'regul' is returned. It has methods print(), summary(), plot(), lines(), identify(), hist(), extract() and specs().
Philippe Grosjean ([email protected]), Frédéric Ibanez ([email protected])
Fox, W.T. & J.A. Brown, 1965. The use of time-trend analysis for environmental interpretation of limestones. J. Geol., 73:510-518.
Ibanez, F., 1991. Treatment of the data deriving from the COST 647 project on coastal benthic ecology: The within-site analysis. In: B. Keegan (ed). Space and Time Series Data Analysis in Coastal Benthic Ecology. Pp 5-43.
Ibanez, F. & J.C. Dauvin, 1988. Long-term changes (1977-1987) on a muddy fine sand Abra alba - Melinna palmata population community from the Western English Channel. J. Mar. Ecol. Prog. Ser., 49:65-81.
regul, regconst, reglin, regspline, regul.screen, regul.adj, tseries, is.tseries
data(releve) # The 'Melosul' series is regulated with a 25-days window reg <- regarea(releve$Day, releve$Melosul, window=25) # Then with a 50-days window reg2 <- regarea(releve$Day, releve$Melosul, window=50) # The initial and both regulated series are shown on the graph for comparison plot(releve$Day, releve$Melosul, type="l") lines(reg$x, reg$y, col=2) lines(reg2$x, reg2$y, col=4)
Transform an irregular time series into a regular time series, or fill gaps in a regular time series, using the constant value method
regconst(x, y=NULL, xmin=min(x), n=length(x), deltat=(max(x) - min(x))/(n - 1), rule=1, f=0)
x |
a vector with time in the irregular series. Missing values are allowed |
y |
a vector of the same length as |
xmin |
allows respecifying the origin of time in the calculated regular time series. By default, the origin is not redefined and is equivalent to the smallest value in |
n |
the number of observations in the regular time series. By default, it is the same number as in the original irregular time series (i.e., |
deltat |
the time interval between two observations in the regulated time series |
rule |
the rule to use for extrapolated values (outside of the range in the initial irregular time series) in the regular time series. With |
f |
coefficient giving more weight to the left value ( |
This is the simplest, but the least powerful, regulation method. Interpolated values are calculated from the existing observations at left and at right as: x[reg] = x[right]*f + x[left]*(1-f), with 0 <= f <= 1.
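As a minimal sketch of this weighting, on made-up data and using approx() directly (regconst() relies on approx() internally, as noted below):

x <- c(1, 4, 6, 10)
y <- c(2, 8, 3, 7)
# f=0 keeps the left value, f=1 the right one, f=0.5 weights both equally
approx(x, y, xout=1:10, method="constant", f=0)$y
approx(x, y, xout=1:10, method="constant", f=0.5)$y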
An object of type 'regul' is returned. It has methods print(), summary(), plot(), lines(), identify(), hist(), extract() and specs().
This function uses approx() for internal calculations
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
regul, regarea, reglin, regspline, regul.screen, regul.adj, tseries, is.tseries
data(releve)
reg <- regconst(releve$Day, releve$Melosul)
plot(releve$Day, releve$Melosul, type="l")
lines(reg$x, reg$y, col=2)
Transform an irregular time series into a regular time series, or fill gaps in a regular time series, using a linear interpolation
reglin(x, y=NULL, xmin=min(x), n=length(x), deltat=(max(x) - min(x))/(n - 1), rule=1)
x |
a vector with time in the irregular series. Missing values are allowed |
y |
a vector of the same length as |
xmin |
allows respecifying the origin of time in the calculated regular time series. By default, the origin is not redefined and is equivalent to the smallest value in |
n |
the number of observations in the regular time series. By default, it is the same number as in the original irregular time series (i.e., |
deltat |
the time interval between two observations in the regulated time series |
rule |
the rule to use for extrapolated values (outside of the range in the initial irregular time series) in the regular time series. With |
Observed values are connected by lines and interpolated values are obtained from this "polyline".
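A minimal sketch of this "polyline" idea on made-up data, using approx() (which reglin() relies on, as noted below):

x <- c(1, 4, 6, 10)
y <- c(2, 8, 3, 7)
approx(x, y, xout=1:10, method="linear")$y  # values read off the connecting lines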
An object of type 'regul' is returned. It has methods print(), summary(), plot(), lines(), identify(), hist(), extract() and specs().
This function uses approx() for internal calculations
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
regul, regarea, regconst, regspline, regul.screen, regul.adj, tseries, is.tseries
data(releve)
reg <- reglin(releve$Day, releve$Melosul)
plot(releve$Day, releve$Melosul, type="l")
lines(reg$x, reg$y, col=2)
Transform an irregular time series into a regular time series, or fill gaps in a regular time series, using splines
regspline(x, y=NULL, xmin=min(x), n=length(x), deltat=(max(x) - min(x))/(n - 1), rule=1, periodic=FALSE)
x |
a vector with time in the irregular series. Missing values are allowed |
y |
a vector of the same length as |
xmin |
allows respecifying the origin of time in the calculated regular time series. By default, the origin is not redefined and is equivalent to the smallest value in |
n |
the number of observations in the regular time series. By default, it is the same number as in the original irregular time series (i.e., |
deltat |
the time interval between two observations in the regulated time series |
rule |
the rule to use for extrapolated values (outside of the range in the initial irregular time series) in the regular time series. With |
periodic |
indicates if the time series should be considered as periodic ( |
Missing values are interpolated using cubic splines between observed values.
An object of type 'regul' is returned. It has methods print(), summary(), plot(), lines(), identify(), hist(), extract() and specs().
This function uses spline() for internal calculations. However, interpolated values are not allowed to be higher than the largest initial observation or lower than the smallest one.
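This clamping could be sketched as follows, on made-up data (an illustration of the described behaviour, not the internal code):

x <- c(1, 4, 6, 10)
y <- c(2, 8, 3, 7)
sp <- spline(x, y, xout=1:10)
# constrain interpolated values to the observed range, as described above
sp$y <- pmin(pmax(sp$y, min(y)), max(y))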
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
Lancaster, P. & K. Salkauskas, 1986. Curve and surface fitting. Academic Press, England, 280 pp.
regul, regarea, regconst, reglin, regul.screen, regul.adj, tseries, is.tseries, splinefun
data(releve)
reg <- regspline(releve$Day, releve$Melosul)
plot(releve$Day, releve$Melosul, type="l")
lines(reg$x, reg$y, col=2)
Regulate irregular time series or regular time series with gaps. Create a 'regul' object from which one or several regular time series can be extracted using extract() or tseries(). This is the function to apply, most of the time, to create regular time series ('rts' objects in Splus or 'ts' objects in R) that will be further analyzed by other functions that apply to regular time series.
regul(x, y=NULL, xmin=min(x), n=length(x), units="days", frequency=NULL,
    deltat=1/frequency, datemin=NULL, dateformat="m/d/Y", tol=NULL,
    tol.type="both", methods="linear", rule=1, f=0, periodic=FALSE,
    window=(max(x) - min(x))/(n - 1), split=100, specs=NULL)
## S3 method for class 'regul'
print(x, ...)
## S3 method for class 'regul'
summary(object, ...)
## S3 method for class 'summary.regul'
print(x, ...)
## S3 method for class 'regul'
plot(x, series=1, col=c(1, 2), lty=c(par("lty"), par("lty")), plot.pts=TRUE,
    leg=FALSE, llab=c("initial", x$specs$methods[series]), lpos=c(1.5, 10),
    xlab=paste("Time (", x$units, ")", sep = ""), ylab="Series",
    main=paste("Regulation of", names(x$y)[series]), ...)
## S3 method for class 'regul'
lines(x, series=1, col=3, lty=1, plot.pts=TRUE, ...)
## S3 method for class 'regul'
identify(x, series=1, col=3, label="#", ...)
## S3 method for class 'regul'
hist(x, nclass=30, col=c(4, 5, 2),
    xlab=paste("Time distance in", x$units, "with start =", min(x$x),
        ", n = ", length(x$x), ", deltat =", x$tspar$deltat),
    ylab=paste("Frequency, tol =", x$specs$tol),
    main="Number of matching observations", plotit=TRUE, ...)
## S3 method for class 'regul'
extract(e, n, series=NULL, ...)
## S3 method for class 'regul'
specs(x, ...)
## S3 method for class 'specs.regul'
print(x, ...)
x |
for regul: a vector containing times at which observations are sampled in the initial irregular time series. It can be expressed in any unit ("years", "days", "weeks", "hours", "min", "sec",...) as defined by the argument |
y |
a vector (single series) or a matrix/data frame whose columns correspond to the various irregular time series to regulate. Rows are observations made at corresponding times in |
xmin |
allows respecifying the origin of time in |
n |
the number of observations in the regular time series. By default, it is the same number as in the original irregular time series (i.e., |
units |
the time unit for the |
frequency |
the frequency of the regulated time series in the corresponding time unit. For instance, |
deltat |
the interval between two observations in the regulated time series. It is the inverse of |
datemin |
this is mostly useful for converting "days" in "years" time-scales ( |
dateformat |
indicate how |
tol |
the tolerance in the time-scale to determine if a measured value is used to approximate a regulated value. If |
tol.type |
the type of adjustment to use for the time-tolerance: |
methods |
the method(s) to use to regulate the time series. Currently, it can be: |
rule |
the rule to use for extrapolated values (observations in the final regular time series that are outside the range of observed values in the initial time series). With |
f |
parameter for the |
periodic |
parameter for the |
window |
parameter for the |
split |
other parameter for the |
specs |
a |
object |
A |
e |
A |
series |
the series to plot. By default, |
col |
(1) for |
lty |
the style to use to draw lines for the initial series and the regulated series, respectively. The default style is used for both lines if this argument is not provided |
plot.pts |
if |
leg |
do we add a legend to the graph? By default, |
llab |
the labels to use for the initial irregular and the final regulated series, respectively. By default, it is |
lpos |
the position of the top-left corner of the legend box (x,y), in the graph coordinates |
xlab |
the label of the x-axis |
ylab |
the label of the y-axis |
main |
the main title of the graph |
label |
the character to use to mark points interactively selected on the graph. By default, |
nclass |
the number of classes to calculate in the histogram. This is indicative and this value is automatically adjusted to obtain a nicely-formatted histogram. By default, |
plotit |
If |
... |
additional parameters |
Several irregular time series (for instance, contained in a data frame) can be treated at once. Specify a vector with "constant", "linear", "spline" or "area" for the argument methods to use a different regulation method for each series. See the corresponding functions (regconst(), reglin(), regspline() and regarea(), respectively) for more details on these methods. Arguments can be saved in a specs object and reused for other similar regulation processes. The functions regul.screen() and regul.adj() are useful to choose the best time interval in the computed regular time series. If you want to work on seasonal effects in the time series, it is better to use a "years" time-scale (1 unit = 1 year), or to convert into such a scale. If the initial time unit is "days" (1 unit = 1 day), a conversion can be operated at the same time as the regulation by specifying units="daystoyears".
An object of type 'regul' is returned. It has methods print(), summary(), plot(), lines(), identify(), hist(), extract() and specs().
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
Lancaster, P. & K. Salkauskas, 1986. Curve and surface fitting. Academic Press, England, 280 pp.
Fox, W.T. & J.A. Brown, 1965. The use of time-trend analysis for environmental interpretation of limestones. J. Geol., 73:510-518.
Ibanez, F., 1991. Treatment of the data deriving from the COST 647 project on coastal benthic ecology: The within-site analysis. In: B. Keegan (ed). Space and Time Series Data Analysis in Coastal Benthic Ecology. Pp 5-43.
Ibanez, F. & J.C. Dauvin, 1988. Long-term changes (1977-1987) on a muddy fine sand Abra alba - Melinna palmata population community from the Western English Channel. J. Mar. Ecol. Prog. Ser., 49:65-81.
regul.screen, regul.adj, tseries, is.tseries, regconst, reglin, regspline, regarea, daystoyears
data(releve)
# The series in this data frame are very irregularly sampled in time:
releve$Day
length(releve$Day)
intervals <- releve$Day[2:61]-releve$Day[1:60]
intervals
range(intervals)
mean(intervals)
# The series must be regulated to be converted into a 'rts' or 'ts' object
rel.reg <- regul(releve$Day, releve[3:8], xmin=9, n=63, deltat=21,
    tol=1.05, methods=c("s","c","l","a","s","a"), window=21)
rel.reg
plot(rel.reg, 5)
specs(rel.reg)
# Now we can extract one or several regular time series
melo.ts <- extract(rel.reg, series="Melosul")
is.tseries(melo.ts)
# One can convert time-scale from "days" to "years" during regulation
# This is most useful for analyzing seasonal cycles in a second step
melo.regy <- regul(releve$Day, releve$Melosul, xmin=6, n=87,
    units="daystoyears", frequency=24, tol=2.2, methods="linear",
    datemin="21/03/1989", dateformat="d/m/Y")
melo.regy
plot(melo.regy, main="Regulation of Melosul")
# In this case, we have only one series in 'melo.regy'
# We can use also 'tseries()' instead of 'extract()'
melo.tsy <- tseries(melo.regy)
is.tseries(melo.tsy)
Calculate and plot a histogram of the distances between interpolated observations in a regulated time series and the closest observations in the initial irregular time series. This allows optimising the tol parameter
regul.adj(x, xmin=min(x), frequency=NULL, deltat=(max(x, na.rm = TRUE) - min(x, na.rm = TRUE))/(length(x) - 1), tol=deltat, tol.type="both", nclass=50, col=c(4, 5, 2), xlab=paste("Time distance"), ylab=paste("Frequency"), main="Number of matching observations", plotit=TRUE, ...)
x |
a vector with times corresponding to the observations in the irregular initial time series |
xmin |
the time corresponding to the first observation in the regular time series |
frequency |
the frequency of observations in the regular time series |
deltat |
the interval between two successive observations in the regular time series. This is the inverse of |
tol |
the tolerance in the difference between two matching observations (in the original irregular series and in the regulated series). If |
tol.type |
the type of window to use for the time-tolerance: |
nclass |
the number of classes to compute in the histogram. This is indicative, and will be adjusted by the algorithm to produce a nicely-formatted histogram. The default value is |
col |
the three colors to use to represent respectively the first bar (exact coincidence), the middle bars (coincidence in a certain tolerance window) and the last bar (values always interpolated). By default, |
xlab |
the label of the x-axis |
ylab |
the label of the y-axis |
main |
the main title of the graph |
plotit |
if |
... |
additional graph parameters for the histogram |
This function is complementary to regul.screen(). While the latter looks for the best combination of the number of observations, the interval between observations and the position of the first observation on the time-scale for the regular time series, regul.adj() looks for the optimal value for tol, the tolerance window.
A list with components:
params |
the parameters used for the regular time-scale |
match |
the number of matching observations in the tolerance window |
exact.match |
the number of exact matching observations |
match.counts |
a vector with the number of matching observations for increasing values of |
Philippe Grosjean ([email protected]), Frédéric Ibanez ([email protected])
# This example follows the example for regul.screen()
# where we determined that xmin=9, deltat=21, n=63, with tol=1.05
# is a good choice to regulate the irregular time series in 'releve'
data(releve)
regul.adj(releve$Day, xmin=9, deltat=21)
# The histogram indicates that it is not useful to increase tol
# more than 1.05, because few observations will be added
# except if we increase it to 5-7, but this value could be
# considered to be too large in comparison with deltat=22
# On the other hand, with tol <= 1, the number of matching
# observations will be almost divided by two!
Seek the best combination of the number of observations, the interval between two successive observations, and the position of the first observation in the regulated time series, so as to match as many observations of the initial series as possible
regul.screen(x, weight=NULL, xmin=min(x), frequency=NULL, deltat=(max(x, na.rm = TRUE) - min(x, na.rm = TRUE))/(length(x) - 1), tol=deltat/5, tol.type="both")
x |
a vector with times corresponding to the observations in the irregular initial time series |
weight |
a vector of the same length as |
xmin |
a vector with all time values for the first observation in the regulated time series to be tested |
frequency |
a vector with all the frequencies to be screened |
deltat |
a vector with all time intervals to screen. |
tol |
it is possible to tolerate some differences in the time between two matching observations (in the original irregular series and in the regulated series). If |
tol.type |
the type of window to use for the time-tolerance: |
Whatever the efficiency of the interpolation procedure used to regulate an irregular time series, a matching, non-interpolated observation is always better than an interpolated one! With very irregular time series, it is often difficult to decide which regular time-scale is best in order to interpolate as few observations as possible. regul.screen() tests various combinations of the number of observations, the interval between two observations and the position of the first observation, and allows choosing the combination that best matches the original irregular time series. To choose also an optimal value for tol, use regul.adj() concurrently.
A list containing:
tol |
a vector with the adjusted values of |
n |
a table indicating the maximum value of |
nbr.match |
a table indicating the number of matching observations (in the tolerance window) for all combinations of |
nbr.exact.match |
a table indicating the number of exactly matching observations (with a tolerance window equal to zero) for all combinations of |
Philippe Grosjean ([email protected]), Frédéric Ibanez ([email protected])
data(releve)
# This series is very irregular, and it is difficult
# to choose the best regular time-scale
releve$Day
length(releve$Day)
intervals <- releve$Day[2:61]-releve$Day[1:60]
intervals
range(intervals)
mean(intervals)
# A combination of xmin=1, deltat=22 and n=61 seems correct
# But is it the best one?
regul.screen(releve$Day, xmin=0:11, deltat=16:27, tol=1.05)
# Now we can tell that xmin=9, deltat=21, n=63, with tol=1.05
# is a much better choice!
The releve data frame has 61 rows and 8 columns. Several phytoplankton taxa were counted at a single station from 1989 to 1992, but at irregular times.
data(releve)
This data frame contains the following columns:
the day number, with the first observation being day 1
strings indicating the date of the observations in "dd/mm/yyyy" format
the abundance of Asterionella glacialis
the abundance of Chaetoceros sp
the abundance of Ditylum sp
the abundance of Gymnodinium sp
the abundance of Melosira sulcata + Paralia sulcata
the abundance of Navicula sp
Belin, C. & B. Raffin, 1998. Les espèces phytoplanctoniques toxiques et nuisibles sur le littoral français de 1984 à 1995, résultats du REPHY (réseau de surveillance du phytoplancton et des phycotoxines). Rapport IFREMER, RST.DEL/MP-AO-98-16. IFREMER, France.
‘specs’ is a generic function for reusing specifications included in an object and applying them in another similar analysis
specs(x, ...)
x |
An object that has "specs" data |
... |
Additional arguments (redefinition of one or several parameters) |
A ‘specs’ object that has the print method and that can be entered as an argument to functions using similar "specifications"
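A minimal sketch of such a reuse with regul() (the 'Navi' column name is an assumption based on the releve description above; adjust to your data):

data(releve)
reg1 <- regul(releve$Day, releve$Melosul, xmin=9, n=63, deltat=21, tol=1.05)
sp <- specs(reg1)  # collect the specifications used for this regulation
# reuse the same specifications for another series ('Navi' assumed to exist)
reg2 <- regul(releve$Day, releve$Navi, specs=sp)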
Compute a table giving various descriptive statistics about the series in a data frame or in a single/multiple time series
stat.desc(x, basic=TRUE, desc=TRUE, norm=FALSE, p=0.95)
x |
a data frame or a time series |
basic |
do we have to return basic statistics (by default, it is TRUE)? These are: the number of values (nbr.val), the number of null values (nbr.null), the number of missing values (nbr.na), the minimal value (min), the maximal value (max), the range (range, that is, max-min) and the sum of all non-missing values (sum) |
desc |
do we have to return various descriptive statistics (by default, it is TRUE)? These are: the median (median), the mean (mean), the standard error on the mean (SE.mean), the confidence interval of the mean (CI.mean) at the |
norm |
do we have to return normal distribution statistics (by default, it is FALSE)? These are: the skewness coefficient g1 (skewness), its significance criterion (skew.2SE, that is, g1/(2*SE(g1)); if skew.2SE > 1, then skewness is significantly different from zero), the kurtosis coefficient g2 (kurtosis), its significance criterion (kurt.2SE, same remark as for skew.2SE), the statistic of a Shapiro-Wilk test of normality (normtest.W) and its associated probability (normtest.p) |
p |
the probability level to use to calculate the confidence interval on the mean (CI.mean). By default, |
a data frame with the various statistics in rows and with each column corresponding to a variable in the data frame, or to a separate time series
The Shapiro-Wilk test of normality is not available yet in Splus and returns 'NA' in this environment. If you prefer to get separate statistics for various time intervals in your time series, use stat.slide(). If your data are fish or plankton sampled with a net, consider using the Pennington statistics (see stat.pen()).
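The skew.2SE criterion can be sketched as follows (the standard error formula shown here is a common approximation, not necessarily the exact one used internally):

x <- rnorm(50)
n <- length(x)
g1 <- (sum((x - mean(x))^3)/n) / (sum((x - mean(x))^2)/n)^(3/2)  # skewness
SEg1 <- sqrt(6*n*(n - 1) / ((n - 2)*(n + 1)*(n + 3)))  # approximate SE of g1
g1 / (2*SEg1)  # skew.2SE: skewness significantly non-zero if |value| > 1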
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
data(marbio)
stat.desc(marbio[,13:16], basic=TRUE, desc=TRUE, norm=TRUE, p=0.95)
Compute a table giving various descriptive statistics, including Pennington's estimators of the mean, the variance and the variance of the mean, about the series in a data frame or in a single/multiple time series
stat.pen(x, basic=FALSE, desc=FALSE)
x |
a data frame or a time series |
basic |
do we also have to return basic statistics (by default, it is FALSE)? These are: the number of values (nbr.val), the number of null values (nbr.null), the number of missing values (nbr.na), the minimal value (min), the maximal value (max), the range (range, that is, max-min) and the sum of all non-missing values (sum) |
desc |
do we also have to return various descriptive statistics (by default, it is FALSE)? These are: the median (median), the mean (mean), the standard error on the mean (SE.mean), the confidence interval of the mean (CI.mean) at the |
a data frame with the various statistics in rows and with each column corresponding to a variable in the data frame, or to a separate time series
If you prefer to get separate statistics for various time intervals in your time series, use stat.slide(). Various other descriptive statistics, including tests of the normal distribution, are also available in stat.desc()
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
Aitchison, J., 1955. On the distribution of a positive random variable having a discrete probability mass at the origin. J. Amer. Stat. Ass., 50:901-908.
Pennington, M., 1983. Efficient estimations of abundance for fish and plankton surveys. Biometrics, 39:281-286.
data(marbio) stat.pen(marbio[,c(4, 14:16)], basic=TRUE, desc=TRUE)
Statistical parameters are not constant along a time series: the mean or the variance can vary each year, or during particular intervals (radical or smooth changes due to a pollution event, a very cold winter, a shift in the system behaviour, etc.). Sliding statistics offer the potential to describe series on successive blocks defined along the space-time axis
stat.slide(x, y, xcut=NULL, xmin=min(x), n=NULL, frequency=NULL,
    deltat=1/frequency, basic=FALSE, desc=FALSE, norm=FALSE, pen=FALSE,
    p=0.95)
## S3 method for class 'stat.slide'
print(x, ...)
## S3 method for class 'stat.slide'
plot(x, stat="mean", col=c(1, 2), lty=c(par("lty"), par("lty")), leg=FALSE,
    llab=c("series", stat), lpos=c(1.5, 10), xlab="time", ylab="y",
    main=paste("Sliding statistics"), ...)
## S3 method for class 'stat.slide'
lines(x, stat="mean", col=3, lty=1, ...)
x |
a vector with time data for |
y |
a vector with observation at corresponding times |
xcut |
a vector with the position in time of the breaks between successive blocks. |
xmin |
the minimal value in the time-scale to use for constructing a vector of equally spaced breaks |
n |
the number of breaks to use |
frequency |
the frequency of the breaks in the time-scale |
deltat |
the block interval to use for constructing an equally-spaced break vector. |
basic |
do we have to return basic statistics (by default, it is FALSE)? These are: the number of values (nbr.val), the number of null values (nbr.null), the number of missing values (nbr.na), the minimal value (min), the maximal value (max), the range (range, that is, max-min) and the sum of all non-missing values (sum) |
desc |
do we have to return descriptive statistics (by default, it is FALSE)? These are: the median (median), the mean (mean), the standard error on the mean (SE.mean), the confidence interval of the mean (CI.mean) at the |
norm |
do we have to return normal distribution statistics (by default, it is FALSE)? These are: the skewness coefficient g1 (skewness), its significance criterion (skew.2SE, that is, g1/(2*SE(g1)); if skew.2SE > 1, then skewness is significantly different from zero), the kurtosis coefficient g2 (kurtosis), its significance criterion (kurt.2SE, same remark as for skew.2SE), the statistic of a Shapiro-Wilk test of normality (normtest.W) and its associated probability (normtest.p) |
pen |
do we have to return Pennington and other associated statistics (by default, it is FALSE)? pos.median, pos.mean, pos.var, pos.std.dev, respectively the median, the mean, the variance and the standard deviation, considering only non-null values; geo.mean, the geometric mean, that is, the exponential of the mean of the logarithm of the observations, excluding null values. pen.mean, pen.var, pen.std.dev, pen.mean.var, respectively the mean, the variance, the standard deviation and the variance of the mean after Pennington's estimators (see |
p |
the probability level to use to calculate the confidence interval on the mean (CI.mean). By default, |
stat |
the statistic to plot on the graph. You can use "min", "max", "median", "mean" (by default), "pos.median", "pos.mean", "geo.mean" and "pen.mean". The other statistics cannot be superposed on the graph of the series in the current version of the function |
col |
the colors to use to plot the initial series and the statistics, respectively. By default, |
lty |
the style to use to draw the original series and the statistics. The default style is used if this argument is not provided |
leg |
if |
llab |
the labels to use for the legend. By default, it is "series" and the corresponding statistics provided in |
lpos |
the position of the top-left corner (x,y) of the legend box in the graph coordinates. By default |
xlab |
the label of the x-axis |
ylab |
the label of the y-axis |
main |
the main title of the graph |
... |
additional parameters |
Available statistics are the same as for stat.desc() and stat.pen(). The Shapiro-Wilk test of normality is not available yet in Splus and returns 'NA' in this environment. If not known a priori, successive blocks can be identified using either local.trend() or decmedian() (see the respective functions for further details)
An object of type 'stat.slide' is returned. It has methods print(), plot() and lines().
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
stat.desc, stat.pen, pennington, local.trend, decmedian
data(marbio)
# Sliding statistics with fixed-length blocks
statsl <- stat.slide(1:68, marbio[, "ClausocalanusA"], xmin=0, n=7, deltat=10)
statsl
plot(statsl, stat="mean", leg=TRUE, lpos=c(55, 2500), xlab="Station",
    ylab="ClausocalanusA")
# More information on the series, with predefined blocks
statsl2 <- stat.slide(1:68, marbio[, "ClausocalanusA"],
    xcut=c(0, 17, 25, 30, 41, 46, 70), basic=TRUE, desc=TRUE, norm=TRUE,
    pen=TRUE, p=0.95)
statsl2
plot(statsl2, stat="median", xlab="Stations", ylab="Counts",
    main="Clausocalanus A")                  # Median
lines(statsl2, stat="min")                   # Minimum
lines(statsl2, stat="max")                   # Maximum
lines(c(17, 17), c(-50, 2600), col=4, lty=2) # Cuts
lines(c(25, 25), c(-50, 2600), col=4, lty=2)
lines(c(30, 30), c(-50, 2600), col=4, lty=2)
lines(c(41, 41), c(-50, 2600), col=4, lty=2)
lines(c(46, 46), c(-50, 2600), col=4, lty=2)
text(c(8.5, 21, 27.5, 35, 43.5, 57), 2300, labels=c("Peripheral Zone", "D1",
    "C", "Front", "D2", "Central Zone"))     # Labels
legend(0, 1900, c("series", "median", "range"), col=1:3, lty=1)
# Get cuts back from the object
statsl2$xcut
Test if the series has an increasing or decreasing trend, using a non-parametric Spearman test between the observations and time
trend.test(tseries, R=1)
tseries |
a univariate or multivariate time series (a 'rts' object in Splus or a 'ts' object in R) |
R |
The number of time the series is/are resampled for a bootstrap test. If |
An 'htest' object if R=1; a 'boot' object with an added boot$p.value item otherwise
In both cases (normal test with R=1 and bootstrap test), the p-value can be obtained from obj$p.value (see examples)
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
Siegel, S. & N.J. Castellan, 1988. Non-parametric statistics. McGraw-Hill, New York. 399 pp.
data(marbio)
trend.test(marbio[, 8])
# Run a bootstrap test on the same series
marbio8.trend.test <- trend.test(marbio[, 8], R=99)
# R=999 is a better value... but it is very slow!
marbio8.trend.test
plot(marbio8.trend.test)
marbio8.trend.test$p.value
Use a decomposition method to split the series into two or more components. Decomposition methods are either series filtering/smoothing (difference, average, median, evf), deseasoning (loess) or model-based decomposition (reg, i.e., regression).
tsd(x, specs=NULL, method="loess",
    type=if (method == "census") "multiplicative" else "additive", lag=1,
    axes=1:5, order=1, times=1, sides=2, ends="fill", weights=NULL,
    s.window=NULL, s.degree=0, t.window=NULL, t.degree=2, robust=FALSE,
    trend=FALSE, xreg=NULL)
## S3 method for class 'tsd'
print(x, ...)
## S3 method for class 'tsd'
summary(object, ...)
## S3 method for class 'summary.tsd'
print(x, ...)
## S3 method for class 'tsd'
plot(x, series=1, stack=TRUE, resid=TRUE, col=par("col"), lty=par("lty"),
    labels=dimnames(X)[[2]], leg=TRUE, lpos=c(0, 0), xlab="time",
    ylab="series",
    main=paste("Series decomposition by", x$specs$method, "-", x$specs$type),
    ...)
## S3 method for class 'tsd'
extract(e, n, series=NULL, components=NULL, ...)
## S3 method for class 'tsd'
specs(x, ...)
## S3 method for class 'specs.tsd'
print(x, ...)
x |
a univariate or multivariate regular time series ('ts' object) to be decomposed for |
specs |
specifications are collected from a 'tsd' object, using the
|
method |
the method to use to decompose the time series. Currently,
possible values are: |
type |
the type of model to use: either |
lag |
The lag between the two observations used to calculate differences.
By default, |
axes |
the number of axes to show in the plot |
order |
(1) for the method 'difference': the order of the difference
corresponds to the number of times it is applied, by default |
times |
The number of times to apply the method (by default, once) |
sides |
If 2 (by default), the window is centered around the current observation. If 1, the window is at left of the current observation (including it) |
ends |
either "NAs" (fill first and last values that are not calculable with NAs), or "fill" (fill them with the average of observations before applying the filter, by default), or "circular" (use last values for estimating first ones and vice versa), or "periodic" (use entire periods of contiguous cycles, deseasoning) |
weights |
a vector indicating weight to give to all observations in the
window. This argument has the priority over |
s.window |
the width of the window used to extract the seasonal
component. Use an odd value equal or just larger than the number of annual
values (frequency of the time series). Use another value to extract other
cycles (circadian, lunar,...). Using |
s.degree |
the order of the polynomial to use to extract the seasonal component (0 or 1). By default |
t.window |
the width of the window to use to extract the general trend
when |
t.degree |
the order of the polynomial to use to extract the general trend (0, 1 or 2). By default |
robust |
if |
trend |
If |
xreg |
a second regular time series or a vector of the same length as
|
object |
a 'tsd' object as returned by the function |
e |
a 'tsd' object as returned by the function |
series |
(1) for |
stack |
graphs of each component are either stacked ( |
resid |
do we have to plot also the "residuals" components
( |
col |
color of the plot |
lty |
line type for the plot |
labels |
the labels to use for all y-axes in a stacked graph, or in the legend for a superposed graph. By default, the names of the components ("trend", "seasonal", "deseasoned", "filtered", "residuals", ...) are used |
leg |
only used when |
lpos |
position of the upper-left corner of the legend box in the graph
coordinates (x,y). By default, |
xlab |
the label of the x-axis |
ylab |
the label of the y-axis |
main |
the main title of the graph |
n |
the number of series to extract (from series 1 to series n). By
default, n equals the number of series in the 'tsd' object. If both
|
components |
the names or indices of the components to extract. If
|
... |
(1) for |
To eliminate the trend from a series, use "diff" or use "loess" with trend=TRUE. If you know the shape of the trend (linear, exponential, periodic, etc.), you can also use it with the "reg" (regression) method. To eliminate or extract seasonal components, you can use "loess" if the seasonal component is additive, or "census" if it is multiplicative. You can also use "average" with argument order="periodic" and with either an additive or a multiplicative model, although the latter method is often less powerful than "loess" or "census". If you want to extract a seasonal cycle with a given shape (for instance, a sinusoid), use the "reg" method with a fitted sinusoidal equation. If you want to identify levels in the series, use the "median" method. To smooth the series, preferably use the "evf" (eigenvector filtering) or the "average" methods, but you can also use "median". To extract the most important components from the series (no matter if they are cycles -seasonal or not- or long-term trends), you should use the "evf" method. For more information on each of these methods, see the online help of the corresponding decXXXX() functions.
An object of type 'tsd' is returned. It has methods print(), summary(), plot(), extract() and specs().
If you have to decompose a single time series, you could also use the corresponding decXXXX() function directly. In the case of a multivariate regular time series, tsd() is more convenient because it decomposes all time series of a set at once!
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
Kendall, M., 1976. Time-series. Charles Griffin & Co Ltd. 197 pp.
Laloire, J.C., 1972. Méthodes du traitement des chroniques. Dunod, Paris, 194 pp.
Legendre, L. & P. Legendre, 1984. Ecologie numérique. Tome 2: La structure des données écologiques. Masson, Paris. 335 pp.
Malinvaud, E., 1978. Méthodes statistiques de l'économétrie. Dunod, Paris. 846 pp.
Philips, L. & R. Blomme, 1973. Analyse chronologique. Université Catholique de Louvain. Vander ed. 339 pp.
tseries, decdiff, decaverage, decmedian, decevf, decreg, decloess, deccensus
data(releve)
# Regulate the series and extract them as a time series object
rel.regy <- regul(releve$Day, releve[3:8], xmin=6, n=87, units="daystoyears",
    frequency=24, tol=2.2, methods="linear", datemin="21/03/1989",
    dateformat="d/m/Y")
rel.ts <- tseries(rel.regy)
# Decompose all series in the set with the "loess" method
rel.dec <- tsd(rel.ts, method="loess", s.window=13, trend=FALSE)
rel.dec
plot(rel.dec, series=5, col=1:3) # And plot series 5
# Extract "deseasoned" components
rel.des <- extract(rel.dec, series=3:6, components="deseasoned")
rel.des[1:10,]
# Further decompose these components with a moving average
rel.des.dec <- tsd(rel.des, method="average", order=2, times=10)
plot(rel.des.dec, series=3, col=c(2, 4, 6))
# In this case, a superposed graph is more appropriate:
plot(rel.des.dec, series=3, col=c(2,4), stack=FALSE, resid=FALSE,
    labels=c("without season cycle", "trend"), lpos=c(0, 55000))
# Extract residuals from the latter decomposition
rel.res2 <- extract(rel.des.dec, components="residuals")
Regulated series contained in a 'regul' object, or components resulting from a time series decomposition with 'tsd', are extracted from their respective object and converted into uni- or multivariate regular time series ('rts' objects in Splus and 'ts' objects in R)
tseries(x)
x |
A 'regul' or 'tsd' object |
a uni- or multivariate regular time series
To extract some of the time series contained in the 'regul' or 'tsd' objects, use the extract() method
Philippe Grosjean ([email protected]), Frédéric Ibanez ([email protected])
data(releve)
rel.regy <- regul(releve$Day, releve[3:8], xmin=6, n=87, units="daystoyears",
    frequency=24, tol=2.2, methods="linear", datemin="21/03/1989",
    dateformat="d/m/Y")
# This object is not a time series
is.tseries(rel.regy) # FALSE
# Extract all time series contained in the 'regul' object
rel.ts <- tseries(rel.regy)
# Now this is a time series
is.tseries(rel.ts) # TRUE
The turnogram is the variation of a monotony index with the observation scale (the number of data per time unit). A monotony index indicates if the series has more or less erratic variations than a pure random succession of independent observations. Since a time series almost always has autocorrelation, it is expected to be more monotonous than a purely random series. The monotony index is a way to quantify the density of information carried by a time series. The turnogram determines at which observation scale this density of information is maximum. It is also the scale that optimizes the sampling effort (the best compromise between fewer samples and more information).
turnogram(series, intervals=c(1, length(series)/5), step=1, complete=FALSE,
    two.tailed=TRUE, FUN=mean, plotit=TRUE, level=0.05, lhorz=TRUE,
    lvert=FALSE, xlog=TRUE)
## S3 method for class 'turnogram'
print(x, ...)
## S3 method for class 'turnogram'
summary(object, ...)
## S3 method for class 'summary.turnogram'
print(x, ...)
## S3 method for class 'turnogram'
plot(x, level=0.05, lhorz=TRUE, lvert=TRUE, lcol=2, llty=2, xlog=TRUE,
    xlab=paste("interval (", x$units.text, ")", sep=""), ylab="I (bits)",
    main=paste(x$type, "turnogram for:", x$data),
    sub=paste(x$fun, "/", x$proba), ...)
## S3 method for class 'turnogram'
identify(x, lvert=TRUE, col=2, lty=2, ...)
## S3 method for class 'turnogram'
extract(e, n, level=e$level, FUN=e$fun, drop=0, ...)
series |
a single regular time series ('rts' object in Splus or 'ts' object in R) |
intervals |
the range (mini, maxi) of the intervals to calculate, i.e., to take one observation every 'interval' one. By default, |
x |
a 'turnogram' object |
object |
a 'turnogram' object |
e |
a 'turnogram' object |
step |
the increment used for the intervals. By default |
complete |
if |
two.tailed |
if |
FUN |
a function to apply to aggregate data in the intervals. It is a function of the type |
plotit |
if |
level |
the significant level to draw on the graph. By default |
lhorz |
if |
lvert |
if |
lcol |
the color to use to draw supplemental lines: the horizontal line indicating where the test is significant (if |
llty |
the style for the supplemental lines. By default, style 2 is used (dashed lines) |
xlog |
if |
xlab |
the label of the x-axis |
ylab |
the label of the y-axis |
main |
the main title of the graph |
sub |
the subtitle of the graph |
col |
color to use for identified items |
lty |
line type to use for identified items |
... |
additional optional graphic arguments |
n |
the number of observations to take into account in the initial series. Use |
drop |
the number of observations to drop at the beginning of the series before proceeding with the aggregation of the data for the extracted series. By default, |
The turnogram is a generalisation of the information theory (see turnpoints()). If a series has a lot of erratic peaks and pits that alternate with a high frequency, it is more difficult to interpret than a more monotonous series. These erratic fluctuations can be eliminated by changing the scale of observation (keeping one observation every two, three, four,... from the original series). The turnogram resamples the original series this way, and calculates a monotony index for each resampled subseries. This monotony index quantifies the number of peaks and pits present in the series, compared to the total number of observations. The Gleissberg distribution (see pgleissberg()) indicates the probability to have such a number of extrema in a series, given it is purely random. It is possible to test monotony indices: is it a random series or not (two-sided test), or is it more monotonous than a random series (left-sided test), thanks to a Chi-2 test proposed by Wallis & Moore (1941).
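For instance, the left-sided test described above can be run directly with pgleissberg():

# probability to observe 10 or fewer extrema in a purely random series of n=30
pgleissberg(30, 10, lower.tail=TRUE, two.tailed=FALSE)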
There are various turnograms depending on the way the observations are aggregated inside each time interval. For instance, if one considers one observation every three from the original series, each group of three observations can be aggregated in several different ways. One can take the mean of the three observations, or the median value, or the sum... One can also decide not to aggregate observations, but to drop some of them. Hence, one can take only the first or the last observation of the group. All these options can be chosen by defining the argument FUN=.... A simple turnogram corresponds to the change of the monotony index with the scale of observation, always starting from the first observation. One could also decide to start from the second, or the third observation for an aggregation of the observations three by three... and the result could be somewhat different. A complete turnogram investigates all possible combinations (observation scale versus starting point for the aggregation) and traces the maximal, minimal and mean curves for the change of the monotony index. It is thus more informative than the simple turnogram. However, it takes much more time to compute.
The most obvious use of the turnogram is for the pretreatment of continuously sampled data. It helps in deciding which is the optimal sampling interval for the series to carry as much information as possible while keeping the dataset as small as possible. It is also interesting to compare the turnogram with other functions like the variogram (see vario()) or the spectrogram (see spectrum()).
An object of type 'turnogram' is returned. It has methods print(), summary(), plot(), identify() and extract().
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
Dallot, S. & M. Etienne, 1990. Une méthode non paramétrique d'analyse des series en océanographie biologique: les tournogrammes. Biométrie et océanographie - Société de biométrie, 6, Lille, 26-28 mai 1986. IFREMER, Actes de colloques, 10:13-31.
Johnson, N.L. & Kotz, S., 1969. Discrete distributions. J. Wiley & sons, New York, 328 pp.
Kendall, M.G., 1976. Time-series, 2nd ed. Charles Griffin & co, London.
Wallis, W.A. & G.H. Moore, 1941. A significance test for time series. National Bureau of Economic Research, tech. paper n°1.
pgleissberg, turnpoints, first, last, vario, spectrum
data(bnr)
# Let's transform series 4 into a time series (supposing it is regular)
bnr4 <- as.ts(bnr[, 4])
plot(bnr4, type="l", main="bnr4: raw data", xlab="Time")
# A simple turnogram is calculated
bnr4.turno <- turnogram(bnr4)
summary(bnr4.turno)
# A complete turnogram confirms that "level=3" is a good value:
turnogram(bnr4, complete=TRUE)
# Data with maximum info. are extracted (thus taking 1 every 3 observations)
bnr4.interv3 <- extract(bnr4.turno)
plot(bnr4, type="l", lty=2, xlab="Time")
lines(bnr4.interv3, col=2)
title("Original bnr4 (dotted) versus max. info. curve (plain)")
# Choose another level (for instance, 6) and extract the corresponding series
bnr4.turno$level <- 6
bnr4.interv6 <- extract(bnr4.turno)
# plot both extracted series on top of the original one
plot(bnr4, type="l", lty=2, xlab="Time")
lines(bnr4.interv3, col=2)
lines(bnr4.interv6, col=3)
legend(70, 580, c("original", "interval=3", "interval=6"), col=1:3,
    lty=c(2, 1, 1))
# It is hard to tell on the graph which series contains more information
# The turnogram shows us that it is the "interval=3" one!
Determine the number and the position of extrema (turning points, either peaks or pits) in a regular time series. Calculate the quantity of information associated with the observations in this series, according to Kendall's information theory
turnpoints(x, calc.proba = TRUE)
## S3 method for class 'turnpoints'
print(x, ...)
## S3 method for class 'turnpoints'
summary(object, ...)
## S3 method for class 'summary.turnpoints'
print(x, ...)
## S3 method for class 'turnpoints'
plot(x, level = 0.05, lhorz = TRUE, lcol = 2, llty = 2, type = "l",
    xlab = "data number",
    ylab = paste("I (bits), level = ", level * 100, "%", sep = ""),
    main = paste("Information (turning points) for:", x$data), ...)
## S3 method for class 'turnpoints'
lines(x, max = TRUE, min = TRUE, median = TRUE, col = c(4, 4, 2),
    lty = c(2, 2, 1), ...)
## S3 method for class 'turnpoints'
extract(e, n, no.tp = 0, peak = 1, pit = -1, ...)
x |
a vector or a time series for |
calc.proba |
are the probabilities associated with each turning point also calculated? The default, |
object |
a 'turnpoints' object, as returned by the function |
e |
a 'turnpoints' object, as returned by the function |
level |
the significant level to draw on the graph if |
lhorz |
if |
lcol |
the color to use to draw the significant level line, by default, color 2 is used |
llty |
the style to use for the significant level line. By default, style 2 is used (dashed line) |
type |
the type of plot, with the usual meaning for this graphical parameter |
xlab |
the label of the x-axis |
ylab |
the label of the y-axis |
main |
the main title of the graph |
max |
do we plot the maximum envelope line (by default, yes) |
min |
do we plot the minimum envelope line (by default, yes) |
median |
do we plot the median line inside the envelope (by default, yes) |
col |
a vector of three values for the colors of the max, min and median lines, respectively. By default, c(4, 4, 2) |
lty |
a vector of three values for the styles of the max, min and median lines, respectively. By default, c(2, 2, 1) |
n |
the number of points to extract. By default |
no.tp |
extract() returns a vector giving the position of extrema in the original series; no.tp is the code used to flag observations that are not turning points, by default '0' |
peak |
the code to use to flag a peak, by default '1' |
pit |
the code to use to flag a pit, by default '-1' |
... |
Additional parameters |
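To make these flag codes concrete, here is a minimal sketch (assuming that calling extract() without n processes the whole series; marbio and its "Nauplii" column are used as in the examples below):

library(pastecs)
data(marbio)
Nauplii.tp <- turnpoints(marbio[, "Nauplii"])
# One code per observation of the original series:
# no.tp (0) = not a turning point, peak (1), pit (-1)
codes <- extract(Nauplii.tp, no.tp = 0, peak = 1, pit = -1)
table(codes)
which(codes == 1) # positions of the peaks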
This function tests whether the time series is purely random or not; Kendall (1976) proposed a series of tests for this purpose. In addition, graphical methods that use the position of the turning points to draw envelopes around the data automatically are implemented, together with the drawing of median points between these envelopes.
With a purely random time series, one expects to find, on average, a turning point (a peak or a pit, that is, an observation preceded and followed by lower, respectively higher, observations) every 1.5 observations. Since it is impossible to determine whether the first and the last observations are turning points, the expected number of turning points is:

$$E(p) = \frac{2}{3}\,(n - 2)$$
with p the number of observed turning points and n the number of observations. The variance of p is:

$$\operatorname{var}(p) = \frac{16n - 29}{90}$$
Ibanez (1982) demonstrated how to calculate P(t), the probability of observing a turning point at time t under the null hypothesis that the time series is purely random; under this hypothesis, the distribution of the turning points follows a normal distribution.
The quantity of information I associated with this probability is:

$$I(t) = -\log_2 P(t)$$
It can be interpreted as follows: the larger I is, the fewer turning points there are compared to what is expected in a purely random series. The series then shows longer runs of increasing or decreasing values along the time scale, which is considered to be more informative.
From this point on, it is straightforward to construct a test to determine whether the series is random with respect to the distribution of its turning points, or more or less monotonic (fewer or more turning points than expected); a minimal sketch of such a test follows.
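As an illustration of this reasoning (turnpoint_test() is a hypothetical helper, not part of pastecs), the global test can be written directly from the two moments given above:

# Kendall's global test on the number of turning points, using
# E(p) = 2/3 (n - 2) and var(p) = (16 n - 29) / 90 with a normal approximation
turnpoint_test <- function(x) {
  n <- length(x)
  d <- diff(x)
  # A turning point occurs where two successive differences change sign
  p <- sum(d[-(n - 1)] * d[-1] < 0)
  Ep <- 2 / 3 * (n - 2)
  varp <- (16 * n - 29) / 90
  z <- (p - Ep) / sqrt(varp)
  list(turns = p, expected = Ep, z = z, p.value = 2 * pnorm(-abs(z)))
}

set.seed(42)
turnpoint_test(rnorm(100))     # purely random series: large p-value expected
turnpoint_test(sin(1:100 / 8)) # smooth series: far fewer turns than expected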
An object of class 'turnpoints' is returned. It has methods print(), summary(), plot(), lines() and extract().
The info component holds the quantity of information I associated with the turning points. The components of the returned object are the following (see also the short example after this table):
data |
The dataset on which the calculation was done |
n |
The number of observations |
points |
The values of the points in the series, after elimination of the ex-aequos (ties) |
pos |
The position of the points on the time scale in the series (including ex-aequos) |
exaequos |
Location of the ex-aequos (1), or not (0) |
nturns |
Total number of turning points in the whole time series |
firstispeak |
Is the first turning point a peak (TRUE), or not (FALSE) |
peaks |
Logical vector. Location of the peaks in the time series without ex-aequos |
pits |
Logical vector. Location of the pits in the time series without ex-aequos |
tppos |
Position of the turning points in the initial series (with ex-aequos) |
proba |
Probability to find a turning point at this location (see details) |
info |
Quantity of information associated with this point (see details) |
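A short sketch showing how these components can be accessed (component names as documented above):

library(pastecs)
data(marbio)
tp <- turnpoints(marbio[, "Nauplii"])
tp$nturns        # total number of turning points
tp$firstispeak   # is the first turning point a peak?
head(tp$tppos)   # positions of the turning points in the initial series
head(tp$proba)   # probability of a turning point at each location
head(tp$info)    # corresponding quantity of information (bits)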
The lines() method should be used to draw lines on the graph of the original dataset (plot(data, type = "l"), for instance), not on the graph of the turning points (plot(turnp))!
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
Ibanez, F., 1982. Sur une nouvelle application de la théorie de l'information à la description des séries chronologiques planctoniques. J. Exp. Mar. Biol. Ecol., 4:619-632.
Kendall, M.G., 1976. Time-series, 2nd ed. Charles Griffin & Co, London.
data(marbio)
plot(marbio[, "Nauplii"], type = "l")
# Calculate turning points for this series
Nauplii.tp <- turnpoints(marbio[, "Nauplii"])
summary(Nauplii.tp)
plot(Nauplii.tp)
# Add envelope and median line to original data
plot(marbio[, "Nauplii"], type = "l")
lines(Nauplii.tp) # Note that lines() applies to the graph of the original dataset
title("Raw data, envelope maxi., mini. and median lines")
Compute a classical semi-variogram for a single regular time series
vario(x, max.dist=length(x)/3, plotit=TRUE, vario.data=NULL)
x |
a vector or a univariate time series |
max.dist |
the maximum distance to compute. By default, it is one third of the number of observations |
plotit |
If TRUE (the default), the semi-variogram is plotted |
vario.data |
data coming from a previous call to vario() |
A data frame containing distance and semi-variogram values
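A minimal sketch of reusing this returned value (the column names of the data frame are not documented here, so they are inspected rather than assumed):

library(pastecs)
data(bnr)
# Compute the semi-variogram without plotting and keep the data frame
v <- vario(bnr[, 4], plotit = FALSE)
str(v) # inspect the actual column names
# Replot by hand from the first two columns (distance, semi-variogram value)
plot(v[[1]], v[[2]], type = "l", xlab = "distance", ylab = "semi-variogram")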
Frédéric Ibanez ([email protected]), Philippe Grosjean ([email protected])
David, M., 1977. Developments in geomathematics. Tome 2: Geostatistical or reserve estimation. Elsevier Scientific, Amsterdam. 364 pp.
Delhomme, J.P., 1978. Applications de la théorie des variables régionalisées dans les sciences de l'eau. Bull. BRGM, section 3 n°4:341-375.
Matheron, G., 1971. La théorie des variables régionalisées et ses applications. Cahiers du Centre de Morphologie Mathématique de Fontainebleau. Fasc. 5 ENSMP, Paris. 212 pp.
data(bnr)
vario(bnr[, 4])