Basic usage, report generation

The basic API encapsulates all the functionality of the package in a single function, archeospec::genReport. It allows different configuration options to generate the combination of outputs carried out in the study.

Data loading

Data must be provided as an absolute path to a directory, or a tree of directories, containing .asd files. If you need to configure the data format, you can do so by applying the same parameters used by load_signatures_files.

For example:

library(archeospec)
# Absolute directory path containing .asd files and/or folders with more .asd files

# On Windows 10
path <- "C:/Users/(username)/Documents/data/signatures"

# On Linux
path <- "/home/(username)/data/signatures"
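Before generating a report, it can help to confirm that the directory exists and really contains .asd files, including those in subfolders. The following is a minimal sketch in base R only; find_asd_files is a hypothetical helper and is not part of archeospec. It is demonstrated on a temporary directory so that it runs anywhere.

```r
# Hypothetical helper (not part of archeospec): list every .asd file
# under a directory, searching subdirectories as well.
find_asd_files <- function(path) {
  if (!dir.exists(path)) {
    stop("Directory not found: ", path)
  }
  list.files(
    path,
    pattern = "\\.asd$",
    recursive = TRUE,
    full.names = TRUE,
    ignore.case = TRUE
  )
}

# Self-contained demonstration on a temporary directory tree
demo_dir <- file.path(tempdir(), "signatures_demo")
dir.create(file.path(demo_dir, "site_a"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, "signature0000.asd")))
invisible(file.create(file.path(demo_dir, "site_a", "signature0001.asd")))
invisible(file.create(file.path(demo_dir, "notes.txt")))

asd_files <- find_asd_files(demo_dir)
print(basename(asd_files))
```

An empty result from such a check would suggest that the path points at the wrong folder.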

Sample data included with the package

We provide sample .asd files with the package. You can find them in the extdata folder inside the package's installation directory:

library(archeospec)

path <- paste(path.package("archeospec"), "/extdata", sep="")
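Another common way to locate files shipped with an installed package is base R's system.file, which returns an empty string when nothing is found. The sketch below wraps it so that a missing folder fails early; package_path is a hypothetical helper name, not part of archeospec.

```r
# Hypothetical helper: resolve a file or folder inside an installed
# package, stopping with a clear message when it cannot be found.
package_path <- function(subpath, package) {
  found <- system.file(subpath, package = package)
  if (!nzchar(found)) {
    stop("'", subpath, "' not found in package '", package, "'")
  }
  found
}

# With archeospec installed, this would point at the sample data folder:
# path <- package_path("extdata", "archeospec")

# Runnable demonstration with a package that ships with R itself
print(package_path("DESCRIPTION", "base"))
```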

Experimental approaches

You may choose between different approaches for unmixing and clustering as explored in the original study:

Unmixing via VCA algorithm
Unmixing specifying the endmembers
Kmeans clustering
Fixed centroid clustering (selects the endmembers as centroids)

Unmixing is controlled via the endmembers argument: a numeric value triggers the VCA algorithm, while a character vector of file names triggers manual unmixing. Note that the named files must be existing signatures in the data.

Clustering is controlled via the boolean kmeans argument: TRUE runs k-means, while FALSE uses the endmembers as fixed centroids.
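The two rules above can be summarised in a small sketch. This is an illustration of the documented behaviour only, not the package's internal code; describe_approach is a hypothetical function written for this example.

```r
# Illustration only: mirrors the rules described above, not the
# internals of genReport.
describe_approach <- function(endmembers, kmeans) {
  unmixing <- if (is.numeric(endmembers)) {
    sprintf("VCA with %d endmembers", as.integer(endmembers))
  } else if (is.character(endmembers)) {
    sprintf("manual unmixing with %d named signatures", length(endmembers))
  } else {
    stop("endmembers must be a number or a character vector of file names")
  }
  clustering <- if (isTRUE(kmeans)) {
    "k-means"
  } else {
    "fixed centroids (the endmembers)"
  }
  paste(unmixing, "+", clustering)
}

print(describe_approach(5, TRUE))
print(describe_approach(c("signature0000.asd", "signature0001.asd"), FALSE))
```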

Example of VCA + kmeans

# A default report using VCA to compute k=5 endmembers and kmeans:
genReport(
  input_source=path,
  output="./ex1",
  endmembers=5,
  kmeans=TRUE,
  seed=1000
)

See rendered result in PDF

See rendered result in HTML

Example of fixed endmembers and clusters

If manual endmembers are selected, you may change their displayed names and colors for clearer identification and comparison. This is done via the endmember_names and endmember_colors arguments.

# A default report using fixed endmembers and clusters:
genReport(
  input_source=path,
  output="./ex2",
  endmembers=c(
    "signature0000.asd", "signature0001.asd", "signature0002.asd",
    "signature0003.asd", "signature0005.asd"
  ),
  endmember_names=c("orange", "red", "black", "white", "base"),
  endmember_colors=c("darkorange2", "red2", "gray15", "white", "green4"),
  kmeans=FALSE,
  seed=1000
)
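Since the names and colors are matched to the endmembers by position, a quick check before rendering can catch mismatched lengths or misspelled color names. The sketch below uses only base R and grDevices; check_endmember_labels is a hypothetical helper, not part of archeospec, and it assumes that each endmember needs exactly one name and one color.

```r
# Hypothetical pre-flight check (not part of archeospec)
check_endmember_labels <- function(endmembers, endmember_names, endmember_colors) {
  stopifnot(
    length(endmember_names) == length(endmembers),
    length(endmember_colors) == length(endmembers)
  )
  # col2rgb() raises an error for names that R does not recognise as colors
  grDevices::col2rgb(endmember_colors)
  invisible(TRUE)
}

endmembers <- c(
  "signature0000.asd", "signature0001.asd", "signature0002.asd",
  "signature0003.asd", "signature0005.asd"
)
endmember_names <- c("orange", "red", "black", "white", "base")
endmember_colors <- c("darkorange2", "red2", "gray15", "white", "green4")

labels_ok <- check_endmember_labels(endmembers, endmember_names, endmember_colors)
print(labels_ok)
```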

See rendered result in PDF

See rendered result in HTML

Data cleaning

Several options apply a cleaning process to the data. Note that they are all disabled by default.

clean_head removes the wavelengths below the specified value
clean_tail removes the wavelengths above the specified value
clean_leaps smooths the leaps that appear at sensor changes when multiple sensors cover different wavelength ranges. You may pass an arbitrary number of points.
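To make the three options concrete, here is a toy sketch on a synthetic signature. It is not the package's implementation: trim_range and remove_leaps are hypothetical helpers, and the leap correction shown (shifting everything after each leap point so that it lines up with the preceding value) is only one simple assumption about how such a jump could be removed.

```r
# Toy signature: one reflectance value per wavelength, with artificial
# jumps where a hypothetical instrument switches sensors.
wavelength <- 350:2500
reflectance <- 0.2 + 0.0001 * (wavelength - 350)
reflectance[wavelength >= 1001] <- reflectance[wavelength >= 1001] + 0.05
reflectance[wavelength >= 1831] <- reflectance[wavelength >= 1831] - 0.03
signature <- data.frame(wavelength, reflectance)

# Keep only the wavelengths inside [head, tail]
trim_range <- function(sig, head, tail) {
  sig[sig$wavelength >= head & sig$wavelength <= tail, ]
}

# Assumed correction (one simple option): shift everything from each leap
# point onwards so that the value there equals the value just before it.
remove_leaps <- function(sig, leaps) {
  for (leap in sort(leaps)) {
    after <- sig$wavelength >= leap
    before_value <- sig$reflectance[sig$wavelength == leap - 1]
    leap_value <- sig$reflectance[sig$wavelength == leap]
    sig$reflectance[after] <- sig$reflectance[after] - (leap_value - before_value)
  }
  sig
}

cleaned <- remove_leaps(trim_range(signature, 400, 2400), c(1001, 1831))
print(range(cleaned$wavelength))
# Largest step between neighbouring wavelengths after cleaning
print(max(abs(diff(cleaned$reflectance))))
```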

# Remove wavelengths below 400 and above 2400, and
# smooth the leaps between wavelengths 1000 and 1001, and between 1830 and 1831.
genReport(
  input_source=path,
  output="./ex3",
  endmembers=5,
  kmeans=FALSE,
  clean_head=400,
  clean_tail=2400,
  clean_leaps=c(1001, 1831),
  seed=1000
)

See rendered result in PDF

See rendered result in HTML

Controlling the outputs

The report will always include the following plots and tables:

A brief description of the dataset (list of signatures)
The list of parameters used to create the report
The signature plot
The selected or computed endmembers, in tabular and plot formats
The elbow plot (if kmeans was performed)
The plot of the correspondence between endmembers and clusters
The box-and-whisker plot of the endmember weights by cluster
The plot of the residual values for each signature
The summary table of the signatures and their weights by endmember

Additional plots are included by default, and can be shown or hidden using the following options:

intracorrelation=TRUE shows the intracorrelation plot. We advise hiding it if it is not needed, as it is costly to generate.
mutualinfo=TRUE shows the mutual information between clusters and wavelengths.

Example of hidden plots and named/colored endmembers

# We can customize the report by removing plots that are not of interest in a given case.
genReport(
  input_source=path,
  output="./ex4",
  endmembers=c(
    "signature0000.asd", "signature0001.asd", "signature0002.asd",
    "signature0003.asd", "signature0005.asd"
  ),
  endmember_names=c("orange", "red", "black", "white", "base"),
  endmember_colors=c("darkorange2", "red2", "gray15", "white", "green4"),
  kmeans=FALSE,
  clean_head=400,
  clean_tail=2400,
  clean_leaps=c(1001, 1831),
  intracorrelation=FALSE,
  mutualinfo=FALSE,
  seed=1000
)

See rendered result in PDF

See rendered result in HTML

Controlling the output

By default, genReport produces every output format. If only one type of document is required, it can be selected with the format option, which takes one of the following values:

html_document (and md)
pdf_document (and LaTeX)
word_document
all (generates all of the formats above)

# We can render a single format to save time
genReport(
  input_source=path,
  output="./ex5",
  format="pdf_document",
  endmembers=5,
  kmeans=TRUE,
  seed=1000
)
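When genReport is called from a script of your own, the format value can be validated up front with base R's match.arg. This is a sketch under the assumption that the accepted values are exactly those listed above and that all is the default; pick_format is a hypothetical wrapper-side function, not part of archeospec.

```r
# Hypothetical wrapper-side check; the choices are the values listed
# above, with "all" assumed to be the default.
pick_format <- function(format = c("all", "html_document", "pdf_document", "word_document")) {
  match.arg(format)
}

print(pick_format())                # no argument: falls back to the first choice
print(pick_format("pdf_document"))  # an explicit, valid choice is returned as-is
```

An unknown value, such as a misspelled format name, makes match.arg stop with an error that lists the valid choices.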

API Docs

Please refer to the [documentation] for a comprehensive description of the genReport function

Get Started

Jacinto Arias
