Tutorial
Setting the taxonomy
Niamoto makes no assumption about which taxonomic referential to use, and therefore lets you define it. Since the choice of a taxonomic referential should be neither a blocking decision nor a point of no return, Niamoto always stores two taxon identifier values for an occurrence: the data provider’s one, and its correspondence in Niamoto’s referential. To map a provider’s taxon identifier to an internal taxon identifier, Niamoto needs a set of correspondences for each taxon. Those correspondences are called synonyms. When no synonym is known for a provider’s taxon identifier, the Niamoto taxon identifier is set to null.
The Niamoto referential is defined using a csv file. This csv file must have a header defining at least the following columns:
- id: The unique identifier of the taxon, in the provider’s referential.
- parent_id: The id of the taxon’s parent. If the taxon is a root, leave the value blank.
- rank: The rank of the taxon, one of: ‘REGNUM’, ‘PHYLUM’, ‘CLASSIS’, ‘ORDO’, ‘FAMILIA’, ‘GENUS’, ‘SPECIES’, ‘INFRASPECIES’.
- full_name: The full name of the taxon.
- rank_name: The rank name of the taxon.
All additional columns are treated as synonyms: the column name is the synonym key, and each value is the taxon’s identifier in that referential.
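The rule above can be sketched in a few lines of Python: every header column that is not one of the five required ones is taken as a synonym key. This is only an illustration under stated assumptions, not Niamoto’s own implementation: the helper name find_synonym_keys is hypothetical, and the file is assumed to be comma-delimited.

```python
import csv
import io

# The five required columns, as listed above.
REQUIRED_COLUMNS = ("id", "parent_id", "rank", "full_name", "rank_name")


def find_synonym_keys(csv_file):
    """Return the non-required header columns, in file order.

    Hypothetical helper: raises ValueError if a required column is missing.
    """
    header = next(csv.reader(csv_file))
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError("missing required columns: " + ", ".join(missing))
    return [name for name in header if name not in REQUIRED_COLUMNS]


# Illustrative, comma-delimited content (assumed delimiter).
sample = io.StringIO(
    "id,parent_id,rank,full_name,rank_name,gbif,taxref\n"
    "0,,FAMILIA,A,A,10,1\n"
)
synonym_keys = find_synonym_keys(sample)
print(synonym_keys)
```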
Let’s consider the following example:
id | parent_id | rank | full_name | rank_name | gbif | taxref |
---|---|---|---|---|---|---|
0 | | FAMILIA | A | A | 10 | 1 |
1 | 0 | GENUS | A a | a | 20 | 2 |
2 | 1 | SPECIES | A a 1 | 1 | 30 | 3 |
3 | 1 | SPECIES | A a 2 | 2 | 40 | 4 |
We set this table as Niamoto’s taxonomic referential using the set_taxonomy command:
$ niamoto set_taxonomy taxonomy.csv
Setting the taxonomy...
The taxonomy had been successfully set!
4 taxa inserted
2 synonyms inserted: {'taxref', 'gbif'}
We can see that Niamoto found the following two synonym keys: ‘taxref’ and ‘gbif’. Those keys are the ones that we will use later to tell Niamoto how to map a data provider’s taxon identifiers. Note that there is also a special synonym key, ‘niamoto’, that is used when a data provider uses the same taxon identifiers as Niamoto.
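To make the role of synonym keys concrete, here is a minimal sketch of the lookup they enable, using the example taxonomy above. It is an illustration, not Niamoto’s actual code: the names build_synonym_maps and resolve_taxon are hypothetical, and the ‘niamoto’ key is modelled as an identity mapping, which is an assumption based on its description.

```python
# (niamoto_id, gbif_id, taxref_id) triples copied from the example taxonomy.
EXAMPLE_TAXA = [(0, 10, 1), (1, 20, 2), (2, 30, 3), (3, 40, 4)]


def build_synonym_maps(taxa):
    """Build one lookup table per synonym key: provider id -> Niamoto id."""
    maps = {"niamoto": {}, "gbif": {}, "taxref": {}}
    for niamoto_id, gbif_id, taxref_id in taxa:
        # Assumption: the special 'niamoto' key maps every id to itself.
        maps["niamoto"][niamoto_id] = niamoto_id
        maps["gbif"][gbif_id] = niamoto_id
        maps["taxref"][taxref_id] = niamoto_id
    return maps


def resolve_taxon(maps, synonym_key, provider_taxon_id):
    """Return the Niamoto taxon id, or None when no synonym is known."""
    return maps[synonym_key].get(provider_taxon_id)


maps = build_synonym_maps(EXAMPLE_TAXA)
print(resolve_taxon(maps, "gbif", 30))
print(resolve_taxon(maps, "taxref", 4))
print(resolve_taxon(maps, "gbif", 99))
```

The last call uses an identifier that is absent from the example table, so the lookup yields None, mirroring the null value described earlier.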
Managing data providers
Now that we have set the taxonomic referential, we would like to import some data into Niamoto. Before being able to do so, we need to define data providers.
Using the niamoto providers command, we can see that there are no registered providers in the database:
$ niamoto providers
There are no registered data providers in the database.
The simplest data provider type implemented in Niamoto is the csv data provider, which enables us to import occurrence and plot data from plain csv files. All the available provider types can be obtained using the niamoto provider_types command.
Adding a data provider can be achieved using the niamoto add_provider command, which has the following usage:
$ niamoto add_provider --help
Usage: niamoto add_provider [OPTIONS] NAME PROVIDER_TYPE [SYNONYM_KEY]
Register a data provider. The name of the data provider must be unique.
The available provider types can be obtained using the 'niamoto
provider_types' command. The available synonym keys can be obtained using
the 'niamoto synonym_keys" command.
Options:
--help Show this message and exit.
Let’s add three data providers: csv_niamoto, csv_taxref and csv_gbif:
$ niamoto add_provider csv_niamoto CSV niamoto
Registering the data provider in database...
The data provider had been successfully registered to Niamoto!
$ niamoto add_provider csv_taxref CSV taxref
Registering the data provider in database...
The data provider had been successfully registered to Niamoto!
$ niamoto add_provider csv_gbif CSV gbif
Registering the data provider in database...
The data provider had been successfully registered to Niamoto!
They are now listed by the niamoto providers command:
$ niamoto providers
name provider_type synonym_key
id
2 csv_niamoto CSV niamoto
3 csv_taxref CSV taxref
4 csv_gbif CSV gbif
In the next section, we will see how to import data with these data providers.
Importing occurrence, plot and plot/occurrence data
Importing data using the csv data provider is done with three csv files:
- The occurrences csv file, containing the occurrence data.
- The plots csv file, containing the plot data.
- The plots/occurrences csv file, mapping plots with occurrences.
All of them are optional: you can import only occurrences, only plots, or only map existing plots to existing occurrences. The command for importing data from a provider is niamoto sync PROVIDER_NAME [PROVIDER_ARGS]. With the csv data provider, three arguments are needed, corresponding to the paths of the csv files:
$ niamoto sync <csv_data_provider_name> <occurrences.csv> <plots.csv> <plots_occurrences.csv>
Using 0 instead of a path means that no data is to be imported. For instance, importing only plot data can be achieved using:
$ niamoto sync <csv_data_provider_name> 0 <plots.csv> 0
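When the sync is driven from a script, the placeholder rule can be captured in a small helper. The sketch below only assembles the argument list in the form shown above and does not call Niamoto itself; the function name build_sync_command is hypothetical.

```python
import shlex


def build_sync_command(provider_name, occurrences=None, plots=None,
                       plots_occurrences=None):
    """Assemble a 'niamoto sync' argument list for a csv data provider.

    Any path left as None is replaced by "0", meaning "nothing to import".
    """
    paths = [occurrences, plots, plots_occurrences]
    return ["niamoto", "sync", provider_name] + [
        "0" if path is None else path for path in paths
    ]


# Import only plot data for the csv_niamoto provider.
command = build_sync_command("csv_niamoto", plots="csv_niamoto_plots.csv")
print(shlex.join(command))
```

On a machine where Niamoto is installed, the resulting list could be passed to subprocess.run(command, check=True).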
In this tutorial, we will import occurrence data for the three previously registered data providers. We will also import plot and plot/occurrence data, only for the first provider.
1. Importing occurrence data
The occurrences csv file must have a header and contain at least the following columns:
- id: The provider’s unique identifier for the occurrence.
- taxon_id: The provider’s taxon id for the occurrence.
- x: The longitude of the occurrence (WGS84).
- y: The latitude of the occurrence (WGS84).
All the remaining columns will be stored as properties.
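The split between required columns and properties can be pictured as follows. This is a sketch of the idea rather than Niamoto’s implementation: the helper name split_occurrence is hypothetical, the file is assumed to be comma-delimited, and values are kept as the strings read from the csv file.

```python
import csv
import io

# Required occurrence columns, as listed above.
REQUIRED_COLUMNS = ("id", "taxon_id", "x", "y")


def split_occurrence(row):
    """Separate the required fields of a row from its free-form properties."""
    core = {name: row[name] for name in REQUIRED_COLUMNS}
    properties = {
        name: value for name, value in row.items()
        if name not in REQUIRED_COLUMNS
    }
    return core, properties


# One illustrative row with two extra columns, dbh and height.
sample = io.StringIO(
    "id,taxon_id,x,y,dbh,height\n"
    "0,3,165.321,-21.47,21,18\n"
)
core, properties = split_occurrence(next(csv.DictReader(sample)))
print(core)
print(properties)
```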
For the csv_niamoto provider, let’s consider the following dataset:
id | taxon_id | x | y | dbh | height |
---|---|---|---|---|---|
0 | 3 | 165.321 | -21.47 | 21 | 18 |
1 | 2 | 165.321 | -21.47 | 20.5 | 14 |
2 | 2 | 165.321 | -21.47 | 22.5 | 16 |
3 | 3 | 165.125 | -21.54 | 18 | 12 |
4 | 3 | 165.125 | -21.54 | 19 | 18 |
5 | 2 | 162.001 | -18.11 | 11 | 15 |
6 | 2 | 162.001 | -18.11 | 24 | 20 |
7 | 2 | 162.001 | -18.11 | 25 | 22 |
For the csv_taxref provider, let’s consider the following dataset:
id | taxon_id | x | y | status |
---|---|---|---|---|
0 | 4 | 92.321 | 42.40 | alive |
1 | 4 | 91.224 | 41.56 | alive |
2 | 4 | 91.015 | 41.11 | dead |
3 | 4 | 92.221 | 42.10 | alive |
4 | 4 | 92.221 | 42.10 | dead |
5 | 4 | 92.221 | 42.10 | alive |
6 | 4 | 92.221 | 42.10 | alive |
For the csv_gbif provider, let’s consider the following dataset:
id | taxon_id | x | y |
---|---|---|---|
0 | 20 | 11.921 | 11.47 |
1 | 30 | 16.120 | 21.54 |
2 | 30 | 61.045 | 18.12 |
3 | 20 | 16.001 | 8.11 |
Now let’s import the data:
$ niamoto sync csv_niamoto csv_niamoto_occurrences.csv 0 0
Syncing the Niamoto database with 'csv_niamoto'...
[INFO] *** Data sync starting ('csv_niamoto' - CSV)...
[INFO] ** Occurrence sync starting ('csv_niamoto' - CSV)...
[INFO] ** Occurrence sync with 'csv_niamoto' done (0.08 s)!
[INFO] *** Data sync with 'csv_niamoto' done (total time: 0.08 s)!
The Niamoto database had been successfully synced with 'csv_niamoto'!
Bellow is a summary of what had been done:
Occurrences:
8 inserted
0 updated
0 deleted
$ niamoto sync csv_taxref csv_niamoto_taxref_occurrences.csv 0 0
Syncing the Niamoto database with 'csv_taxref'...
[INFO] *** Data sync starting ('csv_taxref' - CSV)...
[INFO] ** Occurrence sync starting ('csv_taxref' - CSV)...
[INFO] ** Occurrence sync with 'csv_taxref' done (0.08 s)!
[INFO] *** Data sync with 'csv_taxref' done (total time: 0.08 s)!
The Niamoto database had been successfully synced with 'csv_taxref'!
Bellow is a summary of what had been done:
Occurrences:
7 inserted
0 updated
0 deleted
$ niamoto sync csv_gbif csv_niamoto_gbif_occurrences.csv 0 0
Syncing the Niamoto database with 'csv_gbif'...
[INFO] *** Data sync starting ('csv_gbif' - CSV)...
[INFO] ** Occurrence sync starting ('csv_gbif' - CSV)...
[INFO] ** Occurrence sync with 'csv_gbif' done (0.08 s)!
[INFO] *** Data sync with 'csv_gbif' done (total time: 0.08 s)!
The Niamoto database had been successfully synced with 'csv_gbif'!
Bellow is a summary of what had been done:
Occurrences:
4 inserted
0 updated
0 deleted
We now have 19 occurrences coming from 3 data providers in our Niamoto database, as we can see using the following command:
$ niamoto status
3 data providers are registered.
4 taxa are stored.
3 taxon synonym keys are registered.
19 occurrences are stored.
0 plots are stored.
0 plots/occurrences are stored.
0 rasters are stored.
0 vectors are stored.
2. Importing plot data
The plot csv file must have a header and contain at least the following columns:
- id: The provider’s identifier for the plot.
- name: The name of the plot.
- x: The longitude of the plot (WGS84).
- y: The latitude of the plot (WGS84).
All the remaining columns will be stored as properties.
Let’s consider the following dataset for the csv_niamoto provider:
id | name | x | y | width | height |
---|---|---|---|---|---|
0 | plot_1 | 165.321 | -21.47 | 100 | 100 |
1 | plot_2 | 165.125 | -21.54 | 100 | 100 |
2 | plot_3 | 162.001 | -18.11 | 100 | 100 |
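Because x holds the longitude and y the latitude, swapped columns are an easy mistake to make. A quick range check before syncing can catch it. The sketch below is a suggested pre-check, not something Niamoto is documented to do, and the function name check_wgs84 is hypothetical.

```python
# Plot rows copied from the example dataset above (properties omitted).
PLOTS = [
    {"id": 0, "name": "plot_1", "x": 165.321, "y": -21.47},
    {"id": 1, "name": "plot_2", "x": 165.125, "y": -21.54},
    {"id": 2, "name": "plot_3", "x": 162.001, "y": -18.11},
]


def check_wgs84(x, y):
    """Return True if x is a valid longitude and y a valid latitude."""
    return -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0


# Names of the plots whose coordinates fall outside the WGS84 ranges.
invalid_plots = [p["name"] for p in PLOTS if not check_wgs84(p["x"], p["y"])]
print(invalid_plots)

# Swapping longitude and latitude of plot_1 is detected: 165.321 is not a
# valid latitude.
print(check_wgs84(-21.47, 165.321))
```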
We import the plot data using the following command:
$ niamoto sync csv_niamoto 0 csv_niamoto_plots.csv 0
Syncing the Niamoto database with 'csv_niamoto'...
[INFO] *** Data sync starting ('csv_niamoto' - CSV)...
[INFO] ** Plot sync starting ('csv_niamoto' - CSV)...
[INFO] ** Plot sync with 'csv_niamoto' done (0.06 s)!
[INFO] *** Data sync with 'csv_niamoto' done (total time: 0.07 s)!
The Niamoto database had been successfully synced with 'csv_niamoto'!
Bellow is a summary of what had been done:
Plots:
3 inserted
0 updated
0 deleted
3. Importing plot/occurrence data
The plot/occurrence data is a many-to-many relationship between occurrences and plots. A plot can contain several occurrences, and an occurrence can be contained in several plots. The plot/occurrence csv file must have a header and contain at least the following columns:
- plot_id: The provider’s id for the plot.
- occurrence_id: The provider’s id for the occurrence.
- occurrence_identifier: The occurrence identifier in the plot.
The additional columns will be ignored.
Let’s consider the following data, for linking csv_niamoto’s occurrences with csv_niamoto’s plots:
plot_id | occurrence_id | occurrence_identifier |
---|---|---|
0 | 0 | PLOT_1__OCC_1 |
0 | 1 | PLOT_1__OCC_2 |
0 | 2 | PLOT_1__OCC_3 |
1 | 3 | PLOT_2__OCC_1 |
1 | 4 | PLOT_2__OCC_2 |
2 | 5 | PLOT_3__OCC_1 |
2 | 6 | PLOT_3__OCC_2 |
2 | 7 | PLOT_3__OCC_3 |
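Before importing such a file, it can be useful to group the links by plot and to verify that every referenced plot and occurrence exists. The following sketch does this for the example data; the function names are hypothetical, and whether Niamoto performs a similar check during the sync is not stated here.

```python
from collections import defaultdict

# (plot_id, occurrence_id, occurrence_identifier) rows from the table above.
LINKS = [
    (0, 0, "PLOT_1__OCC_1"), (0, 1, "PLOT_1__OCC_2"), (0, 2, "PLOT_1__OCC_3"),
    (1, 3, "PLOT_2__OCC_1"), (1, 4, "PLOT_2__OCC_2"),
    (2, 5, "PLOT_3__OCC_1"), (2, 6, "PLOT_3__OCC_2"), (2, 7, "PLOT_3__OCC_3"),
]


def group_by_plot(links):
    """Map each plot id to the list of occurrence ids it contains."""
    grouped = defaultdict(list)
    for plot_id, occurrence_id, _identifier in links:
        grouped[plot_id].append(occurrence_id)
    return dict(grouped)


def find_dangling(links, plot_ids, occurrence_ids):
    """Return the (plot_id, occurrence_id) pairs that reference unknown ids."""
    return [
        (plot_id, occurrence_id)
        for plot_id, occurrence_id, _identifier in links
        if plot_id not in plot_ids or occurrence_id not in occurrence_ids
    ]


grouped = group_by_plot(LINKS)
# Known ids: plots 0 to 2 and occurrences 0 to 7 of the csv_niamoto provider.
dangling = find_dangling(LINKS, {0, 1, 2}, set(range(8)))
print(grouped)
print(dangling)
```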
We import the plot/occurrence data using the following command:
$ niamoto sync csv_niamoto 0 0 csv_niamoto_plots_occurrences.csv
Syncing the Niamoto database with 'csv_niamoto'...
[INFO] *** Data sync starting ('csv_niamoto' - CSV)...
[INFO] ** Plot-occurrence sync starting ('csv_niamoto' - CSV)...
[INFO] ** Plot-occurrence sync with 'csv_niamoto' done (0.05 s)!
[INFO] *** Data sync with 'csv_niamoto' done (total time: 0.06 s)!
The Niamoto database had been successfully synced with 'csv_niamoto'!
Bellow is a summary of what had been done:
Plots / Occurrences:
8 inserted
0 updated
0 deleted
We can check the Niamoto database status with the niamoto status command:
$ niamoto status
3 data providers are registered.
4 taxa are stored.
3 taxon synonym keys are registered.
19 occurrences are stored.
3 plots are stored.
8 plots/occurrences are stored.
0 rasters are stored.
0 vectors are stored.
Importing rasters
Niamoto provides functionalities to import and manage rasters within the Niamoto database. These functionalities rely on the PostGIS raster functionalities. The main advantage of storing rasters inside a PostGIS database is to benefit from the power of the SQL language and the PostGIS spatial functions. It is also a convenient way to keep all the data in the same place and to use the same system for querying.
Importing a raster in Niamoto is straightforward using the niamoto add_raster command:
$ niamoto add_raster --help
Usage: niamoto add_raster [OPTIONS] NAME RASTER_FILE_PATH
Add a raster in Niamoto's raster database.
Options:
-t, --tile_dimension TEXT Tile dimension <width>x<height>
-R, --register Register the raster as a filesystem (out-db)
raster. (-R option of raster2pgsql).
--help Show this message and exit.
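The tile dimension is given as a single <width>x<height> string, for instance 100x100. As a hedged illustration of that format, the sketch below validates such a string before it is passed on the command line. How Niamoto parses the option internally is not documented here, and the function name parse_tile_dimension is hypothetical.

```python
def parse_tile_dimension(text):
    """Parse a '<width>x<height>' string into a (width, height) tuple.

    Raises ValueError if the string is not two positive integers
    separated by a single 'x'.
    """
    width, separator, height = text.lower().partition("x")
    if not separator or not width.isdecimal() or not height.isdecimal():
        raise ValueError("expected <width>x<height>, got %r" % text)
    size = (int(width), int(height))
    if 0 in size:
        raise ValueError("tile dimensions must be positive, got %r" % text)
    return size


tile_size = parse_tile_dimension("100x100")
print(tile_size)
```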
Now let’s import a rainfall raster in our Niamoto database:
$ niamoto add_raster rainfall rainfall.tif
Registering the raster in database...
The raster had been successfully registered to the Niamoto raster database!
We can see the registered rasters with the niamoto rasters command:
$ niamoto rasters
name date_create date_update
id
1 rainfall 2017/06/08 None
Importing vectors
Niamoto relies on the ogr2ogr utility to import vector layers into the Niamoto vector database. To import a vector layer, we use the niamoto add_vector command:
$ niamoto add_vector boundaries boundaries.shp
Registering the vector in database...
The vector had been successfully registered to the Niamoto vector database!
We can see the registered vectors with the niamoto vectors command:
$ niamoto vectors
name date_create date_update
id
2 boundaries 2017/06/23 None
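For readers unfamiliar with ogr2ogr, the sketch below assembles the kind of command that loads a shapefile into a PostgreSQL/PostGIS database. It only illustrates the utility: the exact arguments Niamoto passes are not documented here, and the connection string (a database named niamoto) and the helper name build_ogr2ogr_command are assumptions.

```python
import shlex


def build_ogr2ogr_command(vector_path, layer_name, connection="dbname=niamoto"):
    """Assemble an ogr2ogr call that loads a vector file into PostgreSQL.

    -f selects the output driver and -nln sets the name of the new layer.
    The default connection string is a placeholder, not Niamoto's settings.
    """
    return [
        "ogr2ogr",
        "-f", "PostgreSQL",
        "PG:" + connection,
        vector_path,
        "-nln", layer_name,
    ]


command = build_ogr2ogr_command("boundaries.shp", "boundaries")
print(shlex.join(command))
```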
Extracting raster values to occurrences and plot properties
Niamoto provides utilities for extracting raster values directly into occurrence or plot properties. Four commands are available:
$ niamoto raster_to_occurrences --help
Usage: niamoto raster_to_occurrences [OPTIONS] RASTER_NAME
Extract raster values to occurrences properties.
Options:
--help Show this message and exit
$ niamoto raster_to_plots --help
Usage: niamoto raster_to_plots [OPTIONS] RASTER_NAME
Extract raster values to plots properties.
Options:
--help Show this message and exit.
$ niamoto all_rasters_to_occurrences --help
Usage: niamoto all_rasters_to_occurrences [OPTIONS]
Extract raster values to occurrences properties for all registered
rasters.
Options:
--help Show this message and exit.
$ niamoto all_rasters_to_plots --help
Usage: niamoto all_rasters_to_plots [OPTIONS]
Extract raster values to plots properties for all registered rasters.
Options:
--help Show this message and exit.
For instance, let’s extract the values of the previously registered raster, rainfall, to the occurrences properties:
$ niamoto raster_to_occurrences rainfall
Extracting 'rainfall' raster values to occurrences...
The raster values had been successfully extracted!
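Conceptually, extracting a raster value for an occurrence means finding the pixel that contains the point and reading its value. Niamoto delegates this to PostGIS, so the pure-Python sketch below is only a mental model: the grid, its origin and pixel size, and the function name sample_raster are all made up for the illustration.

```python
import math

# A made-up 2 x 2 raster; rows go from north to south, columns west to east.
GRID = [
    [1200, 1500],
    [900, 1100],
]
ORIGIN_X, ORIGIN_Y = 165.0, -21.0  # coordinates of the upper-left corner
PIXEL_WIDTH, PIXEL_HEIGHT = 0.5, 0.5  # pixel size in degrees


def sample_raster(grid, x, y):
    """Return the value of the pixel containing (x, y), or None if outside."""
    col = math.floor((x - ORIGIN_X) / PIXEL_WIDTH)
    row = math.floor((ORIGIN_Y - y) / PIXEL_HEIGHT)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


inside = sample_raster(GRID, 165.321, -21.47)
outside = sample_raster(GRID, 162.001, -18.11)
print(inside, outside)
```

A point falling outside the raster extent yields None in this sketch; how Niamoto represents such cases in the occurrence properties is not specified here.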
Processing and publishing data
The list of available data publishers can be displayed using the niamoto publishers command:
$ niamoto publishers
occurrences : Publish the occurrence dataframe with properties as columns.
plots : Publish the plot dataframe with properties as columns.
taxa : Publish the taxa dataframe.
plots_occurrences : Publish the plots/occurrences dataframe.
raster : Publish a raster from the niamoto raster database.
For a given publisher, the available publish formats can be displayed using the niamoto publish_formats command:
$ niamoto publish_formats occurrences
csv : Publish the data using the csv format.
sql : Publish the data as a table to a SQL database
For each publisher, it is possible to get the list of options using the niamoto publish <publisher> --help command:
$ niamoto publish occurrences --help
Usage: niamoto publish occurrences [OPTIONS] COMMAND [ARGS]...
Publish the occurrence dataframe with properties as columns.
Options:
--drop_null_properties
--properties TEXT List of properties to retain. Can be a python list
or a comma (',') separated string.
--help Show this message and exit.
Commands:
csv Publish the data in a csv file.
sql Publish a DataFrame as a table to a SQL...
The same is possible for each publisher’s publish format:
$ niamoto publish occurrences csv --help
Usage: niamoto publish occurrences csv [OPTIONS]
Publish the data in a csv file.
Options:
--index_label TEXT
-d, --destination TEXT
--help Show this message and exit.
Let’s publish the occurrence dataframe in a csv file:
$ niamoto publish occurrences csv -d occurrences.csv
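According to the usage lines above, the publisher options (--properties, --drop_null_properties) come before the format subcommand, while -d belongs to the csv subcommand. The sketch below assembles such a command for the occurrences publisher from Python. It only builds the argument list, and the helper name build_publish_command is hypothetical.

```python
import shlex


def build_publish_command(destination, properties=None,
                          drop_null_properties=False):
    """Assemble 'niamoto publish occurrences ... csv -d <destination>'."""
    command = ["niamoto", "publish", "occurrences"]
    if properties:
        # The help text accepts a comma-separated string of property names.
        command += ["--properties", ",".join(properties)]
    if drop_null_properties:
        command.append("--drop_null_properties")
    # Options of the csv subcommand come after the subcommand name.
    command += ["csv", "-d", destination]
    return command


# Keep only the dbh and height properties from the example occurrences.
command = build_publish_command("occurrences.csv", properties=["dbh", "height"])
print(shlex.join(command))
```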