lingtypology: easy mapping for Linguistic Typology

George Moroz

2017-01-29

What is lingtypology?

The lingtypology package connects R with the Glottolog database (v. 2.7) and provides additional functionality for linguistic typology. The Glottolog database contains a catalogue of the world’s languages. This package helps researchers make linguistic maps, following the philosophy of the Cross-Linguistic Linked Data project, which is creating uniform access to linguistic data across publications. Because it is built on the leaflet package, lingtypology is a package for interactive linguistic cartography.

I would like to thank Natalya Tyshkevich and Samira Verhees for reading and correcting this vignette.

1. Installation

Since lingtypology is an R package, you should install R on your PC if you haven’t already done so. To install the lingtypology package, run the following command in your R IDE to get the stable version from CRAN:

install.packages("lingtypology")

You can also get the development version from GitHub:

install.packages("devtools")
devtools::install_github("agricolamz/lingtypology", dependencies = TRUE)

Load the package:

library(lingtypology)

2. Glottolog functions

This package is based on the Glottolog database (v. 2.7), so lingtypology has several functions for accessing data from that database. In the Glottolog database, the term languoid is used to catalogue languages, dialects and language families alike.

2.1 Command name syntax

Most of the functions in lingtypology follow the same naming pattern: what you need.what you have. For example, iso.lang() returns an ISO code when given a language name. Most of them take a languoid name as input.

Some of them help to define a vector of languoids.

The most important functionality of lingtypology is the ability to create interactive maps based on features and sets of languoids (see the next section), which is provided by map.feature().

The Glottolog database (v. 2.7) provides lingtypology with languoid names, ISO codes, genealogical affiliation, macro area, countries and coordinates.

2.2 Using base functions

All functions introduced in the previous section are regular functions, so they can take a single string, a vector of strings, a variable or the result of another function as input:

iso.lang("Adyghe")
## Adyghe 
##  "ady"
lang.iso("ady")
##      ady 
## "Adyghe"
country.lang("Adyghe")
##                                                                                                                  Adyghe 
## "Turkey, United States, Israel, Australia, Egypt, Macedonia, France, Russia, Netherlands, Germany, Syria, Jordan, Iraq"
lang.aff("Abkhaz-Adyge")
## [1] "Adyghe"    "Ubykh"     "Abkhazian" "Abaza"     "Kabardian"
area.lang(c("Adyghe", "Aduge"))
##    Adyghe     Aduge 
## "Eurasia"  "Africa"
lang <- c("Adyghe", "Russian")
aff.lang(lang)
##                                             Adyghe 
##                         "Abkhaz-Adyge, Circassian" 
##                                            Russian 
## "Indo-European, Balto-Slavic, Slavic, East Slavic"
iso.lang(lang.aff("East Slavic"))
##     Russian       Rusyn   Ukrainian  Belarusian Old Russian 
##       "rus"       "rue"       "ukr"       "bel"       "orv"

The behavior of most functions is rather predictable, but the function country.lang has an additional feature. By default it takes a vector of languages and returns a vector of countries, one entry per language. If you set the argument intersection = TRUE, the function instead returns only those countries where all languoids from the query are spoken.

country.lang(c("Udi", "Laz"))
##                                                        Udi 
##                "Russia, Georgia, Azerbaijan, Turkmenistan" 
##                                                        Laz 
## "Turkey, Georgia, France, United States, Germany, Belgium"
country.lang(c("Udi", "Laz"), intersection = TRUE)
## [1] "Georgia"
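
The effect of intersection = TRUE can be reproduced by hand in base R. The sketch below is only an illustration and not the package's implementation: it splits the country strings shown above and keeps the countries common to every language. The object names countries, per_language and common are made up for this example.

```r
# Country strings copied from the country.lang() output above
countries <- c(
  Udi = "Russia, Georgia, Azerbaijan, Turkmenistan",
  Laz = "Turkey, Georgia, France, United States, Germany, Belgium"
)

# Split each comma-separated string into a character vector of countries
per_language <- strsplit(countries, ", ", fixed = TRUE)

# Keep only the countries that occur in every element of the list
common <- Reduce(intersect, per_language)
common
```

Georgia is the only country left, which agrees with the output of the package function.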

2.3 Spell Checker: look carefully at warnings!

There are some functions that take country names as input. Unfortunately, some countries have alternative names. In order to save users the trouble of having to figure out the exact name stored in the database (for example Ivory Coast or Cote d’Ivoire), all official country names and standard abbreviations are stored in the database:

lang.country("Cape Verde")
## [1] "Kabuverdianu" "Portuguese"
lang.country("Cabo Verde")
## [1] "Kabuverdianu" "Portuguese"
head(lang.country("UK"))
## [1] "Angloromani"          "Welsh"                "English"             
## [4] "French"               "Assyrian Neo-Aramaic" "Northern Kurdish"

All functions that take a vector of languoids are equipped with a kind of spell checker. If a languoid from a query is absent from the database, the function returns a warning message containing the set of candidates with the minimal Levenshtein distance to the languoid from the query.

aff.lang("Adyge")
## Warning: Languoid Adyge is absent in our database. Did you mean Adyghe,
## Aduge?
## Adyge 
##    NA
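
To see how candidates with the minimal Levenshtein distance can be found, here is a small base R sketch. It is not the package's own code: the vector known is a hypothetical four-name catalogue, and utils::adist() is used as one readily available way to compute the distance.

```r
# Hypothetical mini-catalogue; the real database is far larger
known <- c("Adyghe", "Aduge", "Abaza", "Russian")
query <- "Adyge"

# Generalized Levenshtein distance from the query to every known name
d <- utils::adist(query, known)[1, ]

# Keep all names that share the smallest distance
candidates <- known[d == min(d)]
candidates
```

Both Adyghe and Aduge are one edit away from the query, so both are suggested, just as in the warning above.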

2.4 Changes in the Glottolog database

Unfortunately, the Glottolog database (v. 2.7) is not perfect for all my tasks, so I changed it a little bit.

After Robert Forkel’s issue I decided to add an argument glottolog.source, so that everybody has access to both the “original” and the “modified” (default) Glottolog versions:

is.glottolog(c("Tabasaran", "Tabassaran"), glottolog.source = "original")
## [1] FALSE  TRUE
is.glottolog(c("Tabasaran", "Tabassaran"), glottolog.source = "modified")
## [1]  TRUE FALSE

It is common practice in R to abbreviate both argument names and their values, and this also works with the following lingtypology functions.

is.glottolog(c("Tabasaran", "Tabassaran"), g = "o")
## [1] FALSE  TRUE
is.glottolog(c("Tabasaran", "Tabassaran"), g = "m")
## [1]  TRUE FALSE
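
The abbreviation relies on two general R mechanisms: partial matching of argument names, and match.arg() for argument values. The sketch below shows both with a hypothetical function, which_source(), that imitates only the argument handling. Whether lingtypology uses match.arg() internally is an assumption.

```r
# Hypothetical function that mimics only the argument handling
which_source <- function(x, glottolog.source = c("modified", "original")) {
  # match.arg() accepts any unambiguous prefix of the allowed values
  # and falls back to the first one when the argument is missing
  glottolog.source <- match.arg(glottolog.source)
  glottolog.source
}

which_source("Tabasaran", g = "o")   # argument name and value both abbreviated
which_source("Tabasaran")            # nothing supplied, so the default is used
```

Abbreviations save typing in interactive use, but full names are safer in scripts, because a prefix can become ambiguous if a function later gains new arguments.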

3. Map features with map.feature

3.1 Base map

The most important part of the lingtypology package is the function map.feature. This function allows a user to produce maps similar to known projects within the Cross-Linguistic Linked Data philosophy, such as WALS and Glottolog:

map.feature(c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"))

As shown in the picture above, this function generates an interactive Leaflet map with a control box that allows users to toggle the visibility of any group of points on the map. Every point on the map has a pop-up box that appears when the marker is clicked (for more about editing pop-up boxes, see below). By default, pop-up boxes contain languoid names linked to the Glottolog site.

If you are new to R, please find some information about how to import data into R. It is simple to make a .csv, .ods or .xls file containing lists of languages and features and read it from R (.csv is the easiest way).
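
As a minimal sketch of the .csv route, the base R function read.csv() turns comma-separated text into a data.frame. The column names and the two rows below are hypothetical, and the text argument stands in for a real file so that the example runs as is.

```r
# Stand-in for the contents of a .csv file; columns and rows are hypothetical
csv_text <- "language,features
Adyghe,polysynthetic
Polish,fusional"

# With a real file, pass its path instead, e.g. read.csv("my_languages.csv")
imported <- read.csv(text = csv_text, stringsAsFactors = FALSE)
imported$language
```

The resulting columns can then be passed to the mapping function in the same way as the vectors used in this vignette.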

3.2 Set features

The goal of this package is to allow typologists to map language types. A list of languoids and corresponding features can be stored in a data.frame as follows:

df <- data.frame(language = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
                 features = c("polysynthetic", "polysynthetic", "fusional", "fusional", "fusional"))
df
##    language      features
## 1    Adyghe polysynthetic
## 2 Kabardian polysynthetic
## 3    Polish      fusional
## 4   Russian      fusional
## 5 Bulgarian      fusional

Now we can draw a map:

map.feature(languages = df$language, features = df$features)

Since the correspondence between the color palette and the mapped features is chosen randomly by default, it is better to use the function set.seed to get a reproducible map (or to choose the colors yourself, see section 3.5):

set.seed(42)
map.feature(languages = df$language, features = df$features)
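
What set.seed does can be checked without drawing a map. In the sketch below, sample() stands in for the random color assignment. How map.feature assigns colors internally is not shown in this vignette, so treat this only as an illustration of the principle.

```r
# One color per feature level, taken from the rainbow() palette
feature_levels <- c("fusional", "polysynthetic")
palette <- grDevices::rainbow(length(feature_levels))

# sample() stands in for whatever random assignment happens internally
set.seed(42)
first_run <- sample(palette)

set.seed(42)
second_run <- sample(palette)

# Same seed, same shuffle
identical(first_run, second_run)
```

The seed has to be set immediately before each call that should be reproducible, which is why set.seed(42) is repeated throughout this vignette.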

As in most R functions, it is not necessary to name all arguments, so the same result can be obtained by:

set.seed(42)
map.feature(df$language, df$features)

As shown in the picture above, all points are grouped by feature, colored and counted. As before, a pop-up box appears when markers are clicked. A control feature allows users to toggle the visibility of points grouped by feature.

3.3 Set pop-up boxes

Sometimes it is a good idea to add some additional information (e.g. language affiliation, references or even examples) to the pop-up boxes that appear when points are clicked. In order to do so, we first need to create an extra vector of strings in our dataframe:

df$popup <- aff.lang(df$language)

The function aff.lang() creates a vector of genealogical affiliations that can be easily mapped:

set.seed(42)
map.feature(languages = df$language, features = df$features, popup = df$popup)

As before, it is not necessary to name all arguments, so the same result can be obtained by:

set.seed(42)
map.feature(df$language, df$features, df$popup)

Pop-up strings can contain HTML tags, so it is easy to insert a link, a couple of lines or a table. Here is how pop-up boxes can demonstrate language examples:

# change a df$popup vector
df$popup <- c("sɐ s-ɐ-k'ʷɐ<br> 1sg 1sg.abs-dyn-go<br>'I go'",
              "sɐ s-o-k'ʷɐ<br> 1sg 1sg.abs-dyn-go<br>'I go'",
              "id-ę<br> go-1sg.npst<br> 'I go'",
              "ya id-u<br> 1sg go-1sg.npst <br> 'I go'",
              "id-a<br> go-1sg.prs<br> 'I go'")
# create a map
set.seed(42)
map.feature(df$language, df$features, df$popup)
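
When the example, gloss and translation are stored separately, the pop-up strings do not have to be typed by hand. The following sketch assembles one such string with paste(). The variable names are made up for illustration, and the Russian example is the one used above.

```r
# Interlinear example for Russian, split into its three lines
example     <- "ya id-u"
gloss       <- "1sg go-1sg.npst"
translation <- "'I go'"

# <br> is the HTML line break, so each part lands on its own line in the pop-up
popup <- paste(example, gloss, translation, sep = "<br>")
popup
```

Because paste() is vectorized, the same call works on whole columns of examples, glosses and translations.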

3.4 Set coordinates

Users can set their own coordinates using the arguments latitude and longitude. I will illustrate this with the dataset circassian built into the lingtypology package. This dataset comes from fieldwork conducted during several expeditions in the period 2011-2016 and contains a list of Circassian villages:

set.seed(42)
map.feature(languages = circassian$language,
            features = circassian$languoid,
            popup = circassian$village,
            latitude = circassian$latitude,
            longitude = circassian$longitude)

3.5 Set colors

By default the color palette is created by the rainbow() function, but users can set their own colors using the argument color:

df <- data.frame(language = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
                 features = c("polysynthetic", "polysynthetic", "fusional", "fusional", "fusional"))
map.feature(languages = df$language,
            features = df$features,
            color = c("yellowgreen", "navy"))
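
Before passing custom colors, it can be worth checking them in base R. The sketch below verifies that the color names are known to R and that there is one color per distinct feature value. The second check reflects an assumption about the intended use, not a documented requirement of the package.

```r
features  <- c("polysynthetic", "polysynthetic", "fusional", "fusional", "fusional")
my_colors <- c("yellowgreen", "navy")

# Are both names known to R? colors() lists the built-in color names
names_ok <- all(my_colors %in% grDevices::colors())

# Is there one color per distinct feature value? (an assumption about intended use)
count_ok <- length(my_colors) == length(unique(features))

c(names_ok = names_ok, count_ok = count_ok)
```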

3.6 Set control box

The package can generate a control box that allows users to toggle the visibility of points and features. To enable it, use the argument control in the map.feature function:

set.seed(42)
map.feature(languages = df$language,
            features = df$features,
            control = TRUE)

3.7 Set an additional set of features using strokes

The map.feature function has an additional argument stroke.features. This argument makes it possible to show two independent sets of features on one map. By default strokes are colored in shades of grey (so for two levels they will be black and white, for three levels black, grey and white, and so on), but users can set their own colors using the argument stroke.color:

set.seed(42)
map.feature(circassian$language,
            features = circassian$languoid,
            stroke.features = circassian$language,
            latitude = circassian$latitude,
            longitude = circassian$longitude)
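
The default grey scale described above can be imitated in base R with gray(). The helper stroke_greys() below is hypothetical and only illustrates evenly spaced grey levels. Whether the package builds its stroke palette this way is an assumption.

```r
# Hypothetical helper: n evenly spaced grey levels from black (0) to white (1)
stroke_greys <- function(n) grDevices::gray(seq(0, 1, length.out = n))

stroke_greys(2)  # black and white
stroke_greys(3)  # black, a mid grey, white
```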

It is important to note that stroke.features can work with NA values. The function won’t plot anything for an NA value. Let’s set the language value to NA in all Baksan villages from the circassian dataset:

# create newfeature variable
newfeature <- circassian[,c(5,6)]
# set language feature of the Baksan villages to NA and reduce newfeature from dataframe to vector
newfeature <-  replace(newfeature$language, newfeature$languoid == "Baksan", NA)
# create a map
set.seed(42)
map.feature(circassian$language,
            features = circassian$languoid,
            latitude = circassian$latitude,
            longitude = circassian$longitude,
            stroke.features = newfeature)
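
The replace() step is easier to follow on a tiny example. The data frame toy below is invented for illustration and only mimics the two relevant columns of the circassian dataset.

```r
# Toy stand-in for the two relevant columns; the rows are invented for illustration
toy <- data.frame(
  language = c("Kabardian", "Kabardian", "Adyghe"),
  languoid = c("Baksan", "Baksan", "Temirgoy"),
  stringsAsFactors = FALSE
)

# replace() returns a copy of the first vector with the selected positions overwritten
newfeature_toy <- replace(toy$language, toy$languoid == "Baksan", NA)
newfeature_toy
```

The result is a plain character vector with NA in the Baksan rows, while the original data frame is left unchanged.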

3.8 Set radii and an opacity feature

All markers have their own radius and opacity, which users can set with the arguments radius, stroke.radius, opacity and stroke.opacity:

set.seed(42)
map.feature(circassian$language,
            features = circassian$languoid,
            stroke.features = circassian$language,
            latitude = circassian$latitude,
            longitude = circassian$longitude,
            radius = 7, stroke.radius = 13)
set.seed(42)
map.feature(circassian$language,
            features = circassian$languoid,
            stroke.features = circassian$language,
            latitude = circassian$latitude,
            longitude = circassian$longitude,
            opacity = 0.7, stroke.opacity = 0.6)

3.9 Customizing legends

By default the legend appears in the bottom left corner. If there are stroke features, two legends are generated. Additional arguments control the appearance and the title of the legends.

set.seed(42)
map.feature(circassian$language,
            features = circassian$languoid,
            stroke.features = circassian$language,
            latitude = circassian$latitude,
            longitude = circassian$longitude,
            legend = FALSE, stroke.legend = TRUE)
set.seed(42)
map.feature(circassian$language,
            features = circassian$languoid,
            stroke.features = circassian$language,
            latitude = circassian$latitude,
            longitude = circassian$longitude,
            title = "Circassian dialects", stroke.title = "Languages")

3.10 Set layouts

The map tiles can be changed using the tile argument. For more tiles see here.

df <- data.frame(lang = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
   feature = c("polysynthetic", "polysynthetic", "fusion", "fusion", "fusion"),
   popup = c("Adyghe", "Adyghe", "Slavic", "Slavic", "Slavic"))
set.seed(42)
map.feature(df$lang, df$feature, df$popup,
            tile = "Thunderforest.OpenCycleMap")

It is possible to use several different map tiles on the same map. Just pass a vector of tiles.

df <- data.frame(lang = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
   feature = c("polysynthetic", "polysynthetic", "fusion", "fusion", "fusion"),
   popup = c("Adyghe", "Adyghe", "Slavic", "Slavic", "Slavic"))
set.seed(42)
map.feature(df$lang, df$feature, df$popup,
            tile = c("OpenStreetMap.BlackAndWhite", "Thunderforest.OpenCycleMap"))

It is possible to name tiles using the tile.name argument.

df <- data.frame(lang = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
   feature = c("polysynthetic", "polysynthetic", "fusion", "fusion", "fusion"),
   popup = c("Adyghe", "Adyghe", "Slavic", "Slavic", "Slavic"))
set.seed(42)
map.feature(df$lang, df$feature, df$popup,
            tile = c("OpenStreetMap.BlackAndWhite", "Thunderforest.OpenCycleMap"),
            tile.name = c("b & w", "colored"))

It is possible to combine the tiles’ control box with the features’ control box.

df <- data.frame(lang = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
   feature = c("polysynthetic", "polysynthetic", "fusion", "fusion", "fusion"),
   popup = c("Adyghe", "Adyghe", "Slavic", "Slavic", "Slavic"))
set.seed(42)
map.feature(df$lang, df$feature, df$popup,
            tile = c("OpenStreetMap.BlackAndWhite", "Thunderforest.OpenCycleMap"),
            control = TRUE)

3.11 Add a picture to a map

The argument image.url allows users to add their own pictures to a map via a URL. In this part I will use two histograms of the most numerous nationalities in Moscow and St. Petersburg, based on data from the last Russian Census:

Let’s create a dataframe:

df <- data.frame(lang = c("Russian", "Russian"),
                 lat  = c(55.75, 59.95),
                 long = c(37.616667, 30.3),
# the URLs below were shortened with Google's URL shortener
                 urls = c("https://goo.gl/5OUv1E",
                          "https://goo.gl/UWmvDw"))
map.feature(languages = df$lang,
            latitude = df$lat,
            longitude = df$long,
            image.url = df$urls)

Users can change the size of the pictures.

df <- data.frame(lang = c("Russian", "Russian"),
                 lat  = c(55.75, 59.95),
                 long = c(37.616667, 30.3),
# the URLs below were shortened with Google's URL shortener
                 urls = c("https://goo.gl/5OUv1E",
                          "https://goo.gl/UWmvDw"))
map.feature(languages = df$lang,
            latitude = df$lat,
            longitude = df$long,
            image.url = df$urls,
            image.width = 200,
            image.height = 200)

The picture can also be shifted away from the actual point:

df <- data.frame(lang = c("Russian", "Russian"),
                 lat  = c(55.75, 59.95),
                 long = c(37.616667, 30.3),
# the URLs below were shortened with Google's URL shortener
                 urls = c("https://goo.gl/5OUv1E",
                          "https://goo.gl/UWmvDw"))
map.feature(languages = df$lang,
            latitude = df$lat,
            longitude = df$long,
            image.url = df$urls,
            image.width = 150,
            image.height = 150,
            image.X.shift = 10,
            image.Y.shift = 0)

Using this argument users can plot their own markers, any chart connected to a point or even their own legend. It is important to know that by using transparent .png files, the user can plot additional legend text on the map.

4. Citing lingtypology

It is really important to cite R and R packages when you use them. For this purpose use the R function citation:

citation("lingtypology")
## 
## Moroz G (2017). _lingtypology: Linguistic Typology and Mapping_.
## <URL: http://CRAN.R-project.org/package=lingtypology>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {lingtypology: Linguistic Typology and Mapping},
##     author = {George Moroz},
##     year = {2017},
##     url = {http://CRAN.R-project.org/package=lingtypology},
##   }