Importing a database from Nominatim

As specified in the introduction vignette, you can download pre-built search indices for selected country extracts. If you require more freedom in providing the geocoding data, you can choose to import from an existing Nominatim database. Importing from Nominatim is also a requirement if you want to enable structured geocoding queries. This vignette guides you through the setup and import of an external Nominatim database as well as the setup of structured geocoding support through OpenSearch-based photon.

Technically, Nominatim databases can only be reliably set up on Linux systems. Here, we use the mediagis/nominatim docker image to set up Nominatim irrespective of the operating system. You can use the helper functions cmd_options() and run() to run a Nominatim docker. It is important to expose the port 5432 on the host machine, otherwise photon is not able to connect to the database.

opts <- cmd_options(
  e = "PBF_URL=https://download.geofabrik.de/australia-oceania/samoa-latest.osm.pbf",
  e = "NOMINATIM_PASSWORD=MNdtC2*pP#aMbe",
  e = "FREEZE=true",
  p = "8080:8080",
  p = "5432:5432",
  name = "nominatim",
  "mediagis/nominatim:4.4",
  use_double_hyphens = TRUE
)

nominatim <- process$new("docker", c("run", opts))

To verify that the database can be connected to, you can connect to it from R.

library(RPostgres)
db <- dbConnect(Postgres(), password = "MNdtC2*pP#aMbe", user = "nominatim")
dbGetInfo(db)
#> $dbname
#> [1] "nominatim"
#> 
#> $host
#> [1] "localhost"
#> 
#> $port
#> [1] "5432"
#> 
#> $username
#> [1] "nominatim"
#> 
#> $protocol.version
#> [1] 3
#> 
#> $server.version
#> [1] 140013
#> 
#> $db.version
#> [1] 140013
#> 
#> $pid
#> [1] 604

dbDisconnect(db)

If the database can be connected to, you can start a new photon instance and import the database using $import(). The database import creates the folder photon_data inside the given photon directory.

dir <- file.path(tempdir(), "photon")
photon <- new_photon(dir, overwrite = TRUE)
#> ℹ java version "22" 2024-03-19
#> ℹ Java(TM) SE Runtime Environment (build 22+36-2370)
#> ℹ Java HotSpot(TM) 64-Bit Server VM (build 22+36-2370, mixed mode, sharing)
#> ✔ Successfully downloaded photon 0.5.0. [8.2s]        
#> ℹ No search index downloaded! Download one or import from a Nominatim database.
#> • Version: 0.5.0

photon$import(host = "localhost", password = "MNdtC2*pP#aMbe")
#> 2024-10-24 23:07:35,904 [main] WARN  org.elasticsearch.node.Node - version [5.6.16-SNAPSHOT] is a pre-release version of Elasticsearch and is not suitable for production
#> 2024-10-24 23:07:43,326 [main] INFO  de.komoot.photon.elasticsearch.Server - Started elastic search node
#> 2024-10-24 23:07:43,326 [main] INFO  de.komoot.photon.App - Make sure that the ES cluster is ready, this might take some time.
#> 2024-10-24 23:07:43,905 [main] INFO  de.komoot.photon.App - ES cluster is now ready.
#> 2024-10-24 23:07:45,299 [main] INFO  de.komoot.photon.App - Starting import from nominatim to photon with languages: en,fr,de,it
#> 2024-10-24 23:07:45,300 [main] INFO  de.komoot.photon.nominatim.NominatimConnector - Start importing documents from nominatim (global)
#> 2024-10-24 23:07:59,080 [main] INFO  de.komoot.photon.nominatim.ImportThread - Finished import of 2085 photon documents.
#> 2024-10-24 23:07:59,080 [main] INFO  de.komoot.photon.App - Imported data from nominatim to photon with languages: en,fr,de,it

After the import has finished, you can start the photon instance.

photon$start()
#> 2024-10-24 23:26:46,360 [main] WARN  org.elasticsearch.node.Node - version [5.6.16-SNAPSHOT] is a pre-release version of Elasticsearch and is not suitable for production
#> ✔ Photon is now running. [11.1s]
geocode("Apia")
#> Simple feature collection with 3 features and 13 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -171.7631 ymin: -13.83613 xmax: -171.7512 ymax: -13.82611
#> Geodetic CRS:  WGS 84
#> # A tibble: 3 × 14
#>     idx osm_type     osm_id country osm_key city        street     countrycode osm_value name  state type  extent
#>   <int> <chr>         <int> <chr>   <chr>   <chr>       <chr>      <chr>       <chr>     <chr> <chr> <chr> <list>
#> 1     1 W        1322127938 Samoa   place   NA          NA         WS          city      Apia  Tuam… city  <dbl> 
#> 2     1 W         723300892 Samoa   landuse Matautu Tai NA         WS          harbour   Apia… Tuam… other <dbl> 
#> 3     1 W         666117780 Samoa   tourism Levili      Levili St… WS          attracti… Apia… Tuam… house <dbl> 
#> # ℹ 1 more variable: geometry <POINT [°]>

OpenSearch

Previous setups of photon were based on the ElasticSearch search engine. Photon also offers a version that is based on OpenSearch. This photon version is necessary to enable structured geocoding queries. Since photon 0.6.0, OpenSearch jar files are provided with new photon releases and can be downloaded by setting opensearch = TRUE in the new_photon() function.

# set opensearch = TRUE to use OpenSearch photon
photon <- new_photon(dir, opensearch = TRUE, quiet = TRUE)

# set structured = TRUE to enable structured geocoding
photon$import(host = "localhost", password = "MNdtC2*pP#aMbe", structured = TRUE)

Again, the $import() method created a directory photon_data, which is 10-20% larger than the ElasticSearch version. After starting photon, you can verify that structured geocoding is enabled by running has_structured_support().

photon$start()
has_structured_support()
#> [1] TRUE

Since structured geocoding now works, you can now send entire datasets with structured address data. Structured geocoding allows you to search for specific elements of an address instead of passing free text queries. In the following example, we search for three different spatial features by querying their state, street, and housenumber. Photon is able to geocode them with very high precision due to the detailed data structure we provide.

place_data <- data.frame(
  housenumber = c(NA, "77C", NA),
  street = c("Falealilli Cross Island Road", "Main Beach Road", "Le Mafa Pass Road"),
  state = c("Tuamasaga", "Tuamasaga", "Atua")
)

structured(place_data, limit = 1)
#> Simple feature collection with 3 features and 14 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -171.7759 ymin: -14.04544 xmax: -171.451 ymax: -13.8338
#> Geodetic CRS:  WGS 84
#> # A tibble: 3 × 15
#>     idx osm_type    osm_id country osm_key  city      countrycode osm_value name    state type  extent housenumber street
#>   <int> <chr>        <int> <chr>   <chr>    <chr>     <chr>       <chr>     <chr>   <chr> <chr> <list> <chr>       <chr> 
#> 1     1 W        319147189 Samoa   highway  Siumu Uta WS          primary   Faleal… Tuam… stre… <dbl>  NA          NA    
#> 2     2 W        569855981 Samoa   building Apia      WS          yes       NA      Tuam… house <dbl>  77C         Main …
#> 3     3 W         40681149 Samoa   highway  Lalomanu  WS          primary   Main S… Ātua  stre… <dbl>  NA          NA    
#> # ℹ 1 more variable: geometry <POINT [°]>