Basic usage, report generation

The basic API encapsulates all the functionality of the package in a single function, archeospec::genReport. Its configuration options let you generate the combinations of outputs produced in the study.

Data loading

Data must be provided as an absolute path to a directory, or a tree of directories, containing .asd files. If you need to configure the data format, you can do so by applying the same parameters used by load_signatures_files.

For example:

library(archeospec)
# Absolute directory path containing .asd files and/or folders with more .asd files

# On Windows 10
path <- "C:/Users/(username)/Documents/data/signatures"

# On Linux
path <- "/home/(username)/data/signatures"
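Before generating a report, it can help to confirm that the directory really contains .asd files. The following is a minimal sketch using only base R; it builds a temporary directory with empty placeholder files (the names are made up for illustration) and does not call any archeospec function.

```r
# Build a throwaway directory tree with placeholder files (illustrative names)
demo_path <- file.path(tempdir(), "signatures_demo")
dir.create(file.path(demo_path, "site_a"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_path, "signature0000.asd"))
file.create(file.path(demo_path, "site_a", "signature0001.asd"))
file.create(file.path(demo_path, "notes.txt"))

# Resolve to an absolute path, as the data loading step expects
demo_path <- normalizePath(demo_path, winslash = "/")

# List every .asd file, descending into subfolders
asd_files <- list.files(demo_path, pattern = "\\.asd$", recursive = TRUE,
                        full.names = TRUE, ignore.case = TRUE)
length(asd_files)
```

With real data, replace demo_path with your own path; a length of zero suggests the path is wrong or the files use a different extension.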

Sample data included with the package

We provide sample .asd files with the package. You can find them in the extdata folder inside the package’s root folder:

library(archeospec)

path <- file.path(path.package("archeospec"), "extdata")

Experimental approaches

You may choose between different approaches for unmixing and clustering as explored in the original study:

  • Unmixing via VCA algorithm
  • Unmixing specifying the endmembers

  • Kmeans clustering
  • Fixed centroid clustering (selects the endmembers as centroids)

Unmixing is controlled via the endmembers argument. A numeric value triggers the VCA algorithm with that number of endmembers, while a character vector of file names triggers manual unmixing. Note that the named files must be existing signatures in the data.

Clustering is controlled via the boolean kmeans argument. TRUE runs the k-means algorithm, while FALSE fixes the endmembers as the cluster centroids.
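The rules above can be summarized in a short sketch. The helper describe_approach is hypothetical and not part of archeospec; it only mirrors the behavior described in the text and says nothing about how genReport is implemented internally.

```r
# Hypothetical helper (not part of archeospec): reports which approach a
# given combination of arguments would select, following the rules above.
describe_approach <- function(endmembers, kmeans = TRUE) {
  unmixing <- if (is.numeric(endmembers)) {
    sprintf("VCA with %d endmembers", as.integer(endmembers))
  } else if (is.character(endmembers)) {
    sprintf("manual unmixing with %d named signatures", length(endmembers))
  } else {
    stop("endmembers must be a number or a character vector of file names")
  }
  clustering <- if (isTRUE(kmeans)) {
    "k-means clustering"
  } else {
    "fixed centroids (endmembers)"
  }
  paste(unmixing, clustering, sep = " + ")
}

describe_approach(5, kmeans = TRUE)
describe_approach(c("signature0000.asd", "signature0001.asd"), kmeans = FALSE)
```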

Example of VCA + kmeans

# A default report using VCA to compute k=5 endmembers and kmeans:
genReport(
  input_source=path,
  output="./ex1",
  endmembers=5,
  kmeans=TRUE,
  seed=1000
)

See rendered result in PDF

See rendered result in HTML

Example of fixed endmembers and clusters

If manual endmembers are selected, you may change their displayed names and colors for clearer identification and comparison. This is done via the endmember_names and endmember_colors arguments.

# A default report using fixed endmembers and clusters:
genReport(
  input_source=path,
  output="./ex2",
  endmembers=c("signature0000.asd", "signature0001.asd", "signature0002.asd", "signature0003.asd", "signature0005.asd"),
  endmember_names=c("orange", "red", "black", "white", "base"),
  endmember_colors = c("darkorange2", "red2",  "gray15", "white", "green4"),
  kmeans=FALSE,
  seed=1000
)

See rendered result in PDF

See rendered result in HTML
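Because manual endmembers must be existing signatures and are paired with names and colors, a quick sanity check before calling genReport can catch mistakes early. This is a base R sketch under two assumptions: that the three vectors are matched by position, and that the colors are standard R color names. The available vector is a stand-in for the file names found in your data.

```r
endmembers <- c("signature0000.asd", "signature0001.asd", "signature0002.asd",
                "signature0003.asd", "signature0005.asd")
endmember_names  <- c("orange", "red", "black", "white", "base")
endmember_colors <- c("darkorange2", "red2", "gray15", "white", "green4")

# Assumption: stand-in for basename(list.files(path, pattern = "\\.asd$",
# recursive = TRUE)) on real data
available <- sprintf("signature%04d.asd", 0:5)

# Endmember files that do not exist among the available signatures
missing_files <- setdiff(endmembers, available)

# Assumption: one name and one color per endmember, matched by position
same_length <- length(endmembers) == length(endmember_names) &&
  length(endmembers) == length(endmember_colors)

# Every color should be a name known to R's graphics system
valid_colors <- endmember_colors %in% grDevices::colors()

stopifnot(length(missing_files) == 0, same_length, all(valid_colors))
```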

Data cleaning

We can apply a cleaning process to the data using the following options. Note that they are disabled by default.

  • clean_head removes the wavelengths below the specified value
  • clean_tail removes the wavelengths above the specified value
  • clean_leaps smooths the leaps that appear at sensor changes when multiple sensors cover different wavelength ranges. You may pass an arbitrary number of points.

# Remove wavelengths below 400 and above 2400, and
# smooth the leaps between wavelengths 1000 and 1001, and between 1830 and 1831.
genReport(
  input_source=path,
  output="./ex3",
  endmembers=5,
  kmeans=FALSE,
  clean_head=400,
  clean_tail=2400,
  clean_leaps=c(1001, 1831),
  seed=1000
)

See rendered result in PDF

See rendered result in HTML
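To make the effect of clean_head and clean_tail concrete, here is a sketch on a toy signature using base R only. The reflectance values are made up, and keeping both bounds inclusive is an assumption; the package may treat the boundary wavelengths differently. Leap smoothing is not shown.

```r
# Toy signature: one reflectance value per wavelength (values are made up)
wavelength <- 350:2500
signature <- data.frame(
  wavelength  = wavelength,
  reflectance = seq(0.10, 0.60, length.out = length(wavelength))
)

clean_head <- 400
clean_tail <- 2400

# Assumption: both bounds are kept (inclusive)
keep <- signature$wavelength >= clean_head & signature$wavelength <= clean_tail
trimmed <- signature[keep, ]

range(trimmed$wavelength)
nrow(trimmed)
```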

Controlling the outputs

The report will always include the following plots and tables:

  • A brief description of the dataset (list of signatures)
  • The list of parameters used to create the report
  • The signature plot
  • The selected or computed endmembers, in tabular and plot formats
  • The elbow plot (if kmeans was performed)
  • The plot of the correspondence between endmembers and clusters
  • The box-and-whisker plot of endmember weights by cluster
  • The plot of residual values for each signature
  • The summary table of the signatures and their weights by endmember

Additional plots are included by default and can be shown or hidden using the following options:

  • intracorrelation=TRUE shows the intracorrelation plot. We advise hiding it if not needed, as it is costly to generate.
  • mutualinfo=TRUE shows the mutual information between clusters and wavelengths

Example of hidden plots and named/colored endmembers

# We might want to customize the report by removing plots which are not interesting for some cases.
genReport(
  input_source=path,
  output="./ex4",
  endmembers=c("signature0000.asd", "signature0001.asd", "signature0002.asd", "signature0003.asd", "signature0005.asd"),
  endmember_names=c("orange", "red", "black", "white", "base"),
  endmember_colors = c("darkorange2", "red2",  "gray15", "white", "green4"),
  kmeans=FALSE,
  clean_head=400,
  clean_tail=2400,
  clean_leaps=c(1001, 1831),
  intracorrelation=FALSE,
  mutualinfo=FALSE,
  seed=1000
)

See rendered result in PDF

See rendered result in HTML

Controlling the output format

genReport produces all output formats by default. If only a particular type of document is required, specify it with the format option, which takes one of the following values:

  • html_document (and md)
  • pdf_document (and LaTeX)
  • word_document
  • all (generates all of the previous documents)

# We can render a single format to save time
genReport(
  input_source=path,
  output="./ex5",
  format="pdf_document",
  endmembers=5,
  kmeans=TRUE,
  seed=1000
)
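If you build the format value programmatically, it may be worth validating it against the values listed above before calling genReport. The helper check_format below is hypothetical and not part of archeospec; it relies on base R's match.arg and assumes the accepted values are exactly the four names listed.

```r
# Hypothetical helper (not part of archeospec): rejects unknown format names.
# Assumption: the accepted values are exactly those listed in this section.
check_format <- function(format = c("all", "html_document",
                                    "pdf_document", "word_document")) {
  match.arg(format)
}

check_format("pdf_document")  # returns the validated name
check_format()                # no argument: falls back to the first choice, "all"

# An unknown name raises an error, caught here for illustration
result <- try(check_format("docx"), silent = TRUE)
inherits(result, "try-error")
```

Note that match.arg also accepts unambiguous prefixes, so check_format("pdf") resolves to "pdf_document".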

API Docs

Please refer to the [documentation] for a comprehensive description of the genReport function.