lingtypology: easy mapping for Linguistic Typology
The lingtypology package connects R with the Glottolog database (v. 2.7) and provides additional functionality for linguistic typology. The Glottolog database contains a catalogue of the world’s languages. This package helps researchers make linguistic maps, following the philosophy of the Cross-Linguistic Linked Data project, which is creating uniform access to linguistic data across publications. Because it is built on the leaflet package, lingtypology is a package for interactive linguistic cartography.
I would like to thank Natalya Tyshkevich and Samira Verhees for reading and correcting this vignette.
Since lingtypology is an R package, you should install R on your PC if you haven’t already done so. To install the lingtypology package, run the following command in your R IDE to get the stable version from CRAN:
install.packages("lingtypology")
You can also get the development version from GitHub:
install.packages("devtools")
devtools::install_github("agricolamz/lingtypology", dependencies = TRUE)
Load package:
library(lingtypology)
This package is based on the Glottolog database (v. 2.7), so lingtypology has several functions for accessing data from that database. In the Glottolog database, the term languoid is used to catalogue languages, dialects and language families alike.
Most of the functions in lingtypology have the same syntax: what you need.what you have (for example, iso.lang returns ISO codes for languoid names). Most of them take a languoid name as input.
Some of them help to define a vector of languoids.
The most important functionality of lingtypology is the ability to create interactive maps based on features and sets of languoids (see the next section): map.feature()
The Glottolog database (v. 2.7) provides lingtypology with languoid names, ISO codes, genealogical affiliation, macro area, countries and coordinates.
All functions introduced in the previous section are regular functions, so they can take the following objects as input:
iso.lang("Adyghe")
## Adyghe
## "ady"
lang.iso("ady")
## ady
## "Adyghe"
country.lang("Adyghe")
## Adyghe
## "Turkey, United States, Israel, Australia, Egypt, Macedonia, France, Russia, Netherlands, Germany, Syria, Jordan, Iraq"
lang.aff("Abkhaz-Adyge")
## [1] "Adyghe" "Ubykh" "Abkhazian" "Abaza" "Kabardian"
area.lang(c("Adyghe", "Aduge"))
## Adyghe Aduge
## "Eurasia" "Africa"
lang <- c("Adyghe", "Russian")
aff.lang(lang)
## Adyghe
## "Abkhaz-Adyge, Circassian"
## Russian
## "Indo-European, Balto-Slavic, Slavic, East Slavic"
iso.lang(lang.aff("East Slavic"))
## Russian Rusyn Ukrainian Belarusian Old Russian
## "rus" "rue" "ukr" "bel" "orv"
The behavior of most functions is rather predictable, but the function country.lang has an additional feature. By default this function takes a vector of languages and returns a vector of countries. But if you set the argument intersection = TRUE, then the function returns a vector of countries where all languoids from the query are spoken.
country.lang(c("Udi", "Laz"))
## Udi
## "Russia, Georgia, Azerbaijan, Turkmenistan"
## Laz
## "Turkey, Georgia, France, United States, Germany, Belgium"
country.lang(c("Udi", "Laz"), intersection = TRUE)
## [1] "Georgia"
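The idea behind intersection = TRUE can be sketched in base R. This is only an illustration built from the output printed above, not the package’s actual implementation; the variable names are hypothetical.

```r
# Country strings as printed above by country.lang(c("Udi", "Laz"))
countries <- c(
  Udi = "Russia, Georgia, Azerbaijan, Turkmenistan",
  Laz = "Turkey, Georgia, France, United States, Germany, Belgium"
)

# Split each comma-separated string into a vector of country names
country_sets <- strsplit(countries, ", ", fixed = TRUE)

# Keep only the countries that occur in every set
shared <- Reduce(intersect, country_sets)
shared
```

With more than two languoids, Reduce() applies intersect() pairwise across all of the sets.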
There are some functions that take country names as input. Unfortunately, some countries have alternative names. In order to save users the trouble of having to figure out the exact name stored in the database (for example Ivory Coast or Cote d’Ivoire), all official country names and standard abbreviations are stored in the database:
lang.country("Cape Verde")
## [1] "Kabuverdianu" "Portuguese"
lang.country("Cabo Verde")
## [1] "Kabuverdianu" "Portuguese"
head(lang.country("UK"))
## [1] "Angloromani" "Welsh" "English"
## [4] "French" "Assyrian Neo-Aramaic" "Northern Kurdish"
All functions that take a vector of languoids are enriched with a kind of spell checker. If a languoid from a query is absent from the database, the functions return a warning message containing a set of candidates with the minimal Levenshtein distance to the languoid from the query.
aff.lang("Adyge")
## Warning: Languoid Adyge is absent in our database. Did you mean Adyghe,
## Aduge?
## Adyge
## NA
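The candidate search can be approximated in base R with utils::adist(), which computes Levenshtein distances. The sketch below assumes a small toy list of names (known_names is hypothetical, not the real database) and is not necessarily how lingtypology finds its suggestions.

```r
query <- "Adyge"

# Toy subset of languoid names (an assumption for illustration only)
known_names <- c("Adyghe", "Aduge", "Abaza", "Russian")

# Levenshtein distance from the query to every known name
distances <- as.vector(utils::adist(query, known_names))

# Suggest every name that shares the smallest distance
suggestions <- known_names[distances == min(distances)]
suggestions
```

Both "Adyghe" (one insertion) and "Aduge" (one substitution) are one edit away from the query, which is why the warning above offers two candidates.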
Unfortunately, the Glottolog database (v. 2.7) is not perfect for all my tasks, so I changed it a little bit:
After Robert Forkel’s issue I decided to add an argument glottolog.source, so that everybody has access to both the “original” and the “modified” (default) Glottolog versions:
is.glottolog(c("Tabasaran", "Tabassaran"), glottolog.source = "original")
## [1] FALSE TRUE
is.glottolog(c("Tabasaran", "Tabassaran"), glottolog.source = "modified")
## [1] TRUE FALSE
It is common practice in R to abbreviate both argument names and their values, and this also works with the following lingtypology functions.
is.glottolog(c("Tabasaran", "Tabassaran"), g = "o")
## [1] FALSE TRUE
is.glottolog(c("Tabasaran", "Tabassaran"), g = "m")
## [1] TRUE FALSE
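Abbreviations like these rely on R’s partial matching of argument names, usually combined with match.arg() for argument values. The following sketch uses a hypothetical function, pick_source(), to show the mechanism; it is an assumption about how such abbreviation typically works, not the source code of is.glottolog.

```r
# Hypothetical function: the first element of the choices is the default
pick_source <- function(x, glottolog.source = c("modified", "original")) {
  # match.arg() accepts any unambiguous prefix of the allowed values
  glottolog.source <- match.arg(glottolog.source)
  glottolog.source
}

pick_source("Tabasaran")           # default value
pick_source("Tabasaran", g = "o")  # abbreviated argument name and value
```

The abbreviation g works only because no other argument of the function starts with the same letter.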
map.feature
The most important part of the lingtypology package is the function map.feature. This function allows a user to produce maps similar to known projects within the Cross-Linguistic Linked Data philosophy, such as WALS and Glottolog:
map.feature(c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"))
As shown in the picture above, this function generates an interactive Leaflet map with a control box that allows users to toggle the visibility of any group of points on the map. Every point on the map has a pop-up box that appears when the marker is clicked (for more about editing pop-up boxes, see below). By default, pop-up boxes contain languoid names linked to the Glottolog site.
If you are new to R, please find some information about how to import data into R. It is simple to make a .csv, .ods or .xls file containing lists of languages and features and read it from R (.csv is the easiest way).
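As a minimal sketch of the .csv route, the example below writes a small file to a temporary location and reads it back with base R’s read.csv(). The file contents and the names csv_path and df_csv are made up for illustration; in practice the path would point to your own file.

```r
# Create a small example file (stands in for a file prepared by hand)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("language,features",
             "Adyghe,polysynthetic",
             "Kabardian,polysynthetic",
             "Polish,fusional"),
           csv_path)

# Read the file into a data.frame; keep text columns as character strings
df_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
str(df_csv)
```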
The goal of this package is to allow typologists to map language types. A list of languoids and corresponding features can be stored in a data.frame as follows:
df <- data.frame(language = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
features = c("polysynthetic", "polysynthetic", "fusional", "fusional", "fusional"))
df
## language features
## 1 Adyghe polysynthetic
## 2 Kabardian polysynthetic
## 3 Polish fusional
## 4 Russian fusional
## 5 Bulgarian fusional
Now we can draw a map:
map.feature(languages = df$language, features = df$features)
Since the correspondence between the color palette and the mapped features is chosen randomly by default, it is better to use the function set.seed to get a reproducible map (or choose the colors yourself, see section 3.5):
set.seed(42)
map.feature(languages = df$language, features = df$features)
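Why set.seed helps can be shown without drawing a map. The sketch below assumes that the random assignment draws on R’s random number generator, for instance via sample(); the exact mechanism inside map.feature may differ.

```r
feature_levels <- c("fusional", "polysynthetic")

# First random ordering of a two-color palette
set.seed(42)
first_draw <- sample(grDevices::rainbow(length(feature_levels)))

# Resetting the seed reproduces exactly the same ordering
set.seed(42)
second_draw <- sample(grDevices::rainbow(length(feature_levels)))

identical(first_draw, second_draw)
```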
Like in most R functions, it is not necessary to name all arguments, so the same result can be obtained by:
set.seed(42)
map.feature(df$language, df$features)
Sometimes it is a good idea to add some additional information (e.g. language affiliation, references or even examples) to the pop-up boxes that appear when points are clicked. In order to do so, we first need to create an extra vector of strings in our dataframe:
df$popup <- aff.lang(df$language)
The function aff.lang() creates a vector of genealogical affiliations that can be easily mapped:
set.seed(42)
map.feature(languages = df$language, features = df$features, popup = df$popup)
Like before, it is not necessary to name all arguments, so the same result can be obtained by this:
set.seed(42)
map.feature(df$language, df$features, df$popup)
# change a df$popup vector
df$popup <- c("sɐ s-ɐ-k'ʷɐ<br> 1sg 1sg.abs-dyn-go<br>'I go'",
"sɐ s-o-k'ʷɐ<br> 1sg 1sg.abs-dyn-go<br>'I go'",
"id-ę<br> go-1sg.npst<br> 'I go'",
"ya id-u<br> 1sg go-1sg.npst <br> 'I go'",
"id-a<br> go-1sg.prs<br> 'I go'")
# create a map
set.seed(42)
map.feature(df$language, df$features, df$popup)
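Pop-up strings like the ones above, with lines separated by the HTML tag <br>, can also be assembled from separate vectors. The helper make_popup() below is hypothetical and only sketches one way of doing this in base R.

```r
# Hypothetical helper: join example, gloss and translation with HTML line breaks
make_popup <- function(example, gloss, translation) {
  paste(example, gloss, paste0("'", translation, "'"), sep = "<br>")
}

make_popup(example = c("id-a", "ya id-u"),
           gloss = c("go-1sg.prs", "1sg go-1sg.npst"),
           translation = "I go")
```

Because paste() is vectorised, the single translation is recycled for both examples.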
Users can set their own coordinates using the arguments latitude and longitude. I will illustrate this with the circassian dataset built into the lingtypology package. This dataset comes from fieldwork data collected during several expeditions in the period 2011-2016 and contains a list of Circassian villages:
set.seed(42)
map.feature(languages = circassian$language,
features = circassian$languoid,
popup = circassian$village,
latitude = circassian$latitude,
longitude = circassian$longitude)
By default the color palette is created by the rainbow() function, but users can set their own colors using the argument color:
df <- data.frame(language = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
features = c("polysynthetic", "polysynthetic", "fusional", "fusional", "fusional"))
map.feature(languages = df$language,
features = df$features,
color = c("yellowgreen", "navy"))
The package can generate a control box that allows users to toggle the visibility of points and features. To enable it, use the argument control in the map.feature function:
set.seed(42)
map.feature(languages = df$language,
features = df$features,
control = TRUE)
The map.feature function has an additional argument stroke.features. This argument makes it possible to show two independent sets of features on one map. By default strokes are colored in shades of grey (so for two levels they will be black and white; for three, black, grey and white; and so on), but users can set their own colors using the argument stroke.color:
set.seed(42)
map.feature(circassian$language,
features = circassian$languoid,
stroke.features = circassian$language,
latitude = circassian$latitude,
longitude = circassian$longitude)
It is important to note that stroke.features can work with NA values. The function won’t plot anything where there is an NA value. Let’s set the language value to NA in all Baksan villages from the circassian dataset:
# create newfeature variable
newfeature <- circassian[,c(5,6)]
# set language feature of the Baksan villages to NA and reduce newfeature from dataframe to vector
newfeature <- replace(newfeature$language, newfeature$languoid == "Baksan", NA)
# create a map
set.seed(42)
map.feature(circassian$language,
features = circassian$languoid,
latitude = circassian$latitude,
longitude = circassian$longitude,
stroke.features = newfeature)
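The replace() step can be checked on a tiny data frame before applying it to real data. The rows of toy below are invented for illustration and are not taken from the circassian dataset.

```r
# Toy data (hypothetical rows, for illustration only)
toy <- data.frame(
  languoid = c("Baksan", "Temirgoy", "Baksan"),
  language = c("Kabardian", "Adyghe", "Kabardian"),
  stringsAsFactors = FALSE
)

# Set the language value to NA wherever the languoid is "Baksan"
stroke_values <- replace(toy$language, toy$languoid == "Baksan", NA)
stroke_values

# Logical vector marking the values that were replaced
is.na(stroke_values)
```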
All markers have their own radius and opacity, which users can set with the arguments radius, stroke.radius, opacity and stroke.opacity:
set.seed(42)
map.feature(circassian$language,
features = circassian$languoid,
stroke.features = circassian$language,
latitude = circassian$latitude,
longitude = circassian$longitude,
radius = 7, stroke.radius = 13)
set.seed(42)
map.feature(circassian$language,
features = circassian$languoid,
stroke.features = circassian$language,
latitude = circassian$latitude,
longitude = circassian$longitude,
opacity = 0.7, stroke.opacity = 0.6)
By default the legend appears in the bottom left corner. If there are stroke features, two legends are generated. There are additional arguments that control the appearance and title of the legends.
set.seed(42)
map.feature(circassian$language,
features = circassian$languoid,
stroke.features = circassian$language,
latitude = circassian$latitude,
longitude = circassian$longitude,
legend = FALSE, stroke.legend = TRUE)
set.seed(42)
map.feature(circassian$language,
features = circassian$languoid,
stroke.features = circassian$language,
latitude = circassian$latitude,
longitude = circassian$longitude,
title = "Circassian dialects", stroke.title = "Languages")
The tile argument lets users choose a different map tile. For more tiles see here.
df <- data.frame(lang = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
feature = c("polysynthetic", "polysynthetic", "fusion", "fusion", "fusion"),
popup = c("Adyghe", "Adyghe", "Slavic", "Slavic", "Slavic"))
set.seed(42)
map.feature(df$lang, df$feature, df$popup,
tile = "Thunderforest.OpenCycleMap")
It is possible to use several different map tiles on the same map. Just pass a vector of tiles.
df <- data.frame(lang = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
feature = c("polysynthetic", "polysynthetic", "fusion", "fusion", "fusion"),
popup = c("Adyghe", "Adyghe", "Slavic", "Slavic", "Slavic"))
set.seed(42)
map.feature(df$lang, df$feature, df$popup,
tile = c("OpenStreetMap.BlackAndWhite", "Thunderforest.OpenCycleMap"))
It is possible to name tiles using the tile.name argument.
df <- data.frame(lang = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
feature = c("polysynthetic", "polysynthetic", "fusion", "fusion", "fusion"),
popup = c("Adyghe", "Adyghe", "Slavic", "Slavic", "Slavic"))
set.seed(42)
map.feature(df$lang, df$feature, df$popup,
tile = c("OpenStreetMap.BlackAndWhite", "Thunderforest.OpenCycleMap"),
tile.name = c("b & w", "colored"))
It is possible to combine the tile control box with the feature control box.
df <- data.frame(lang = c("Adyghe", "Kabardian", "Polish", "Russian", "Bulgarian"),
feature = c("polysynthetic", "polysynthetic", "fusion", "fusion", "fusion"),
popup = c("Adyghe", "Adyghe", "Slavic", "Slavic", "Slavic"))
set.seed(42)
map.feature(df$lang, df$feature, df$popup,
tile = c("OpenStreetMap.BlackAndWhite", "Thunderforest.OpenCycleMap"),
control = TRUE)
The argument image.url allows users to add their own pictures to a map via a URL. In this part I will use two histograms of the most numerous nationalities in Moscow and St. Petersburg, based on data from the last Russian Census.
Let’s create a dataframe:
df <- data.frame(lang = c("Russian", "Russian"),
lat = c(55.75, 59.95),
long = c(37.616667, 30.3),
# I use here URL shortener by Google
urls = c("https://goo.gl/5OUv1E",
"https://goo.gl/UWmvDw"))
map.feature(languages = df$lang,
latitude = df$lat,
longitude = df$long,
image.url = df$urls)
Users can change the size of the pictures.
df <- data.frame(lang = c("Russian", "Russian"),
lat = c(55.75, 59.95),
long = c(37.616667, 30.3),
# I use here URL shortener by Google
urls = c("https://goo.gl/5OUv1E",
"https://goo.gl/UWmvDw"))
map.feature(languages = df$lang,
latitude = df$lat,
longitude = df$long,
image.url = df$urls,
image.width = 200,
image.height = 200)
The picture can also be shifted away from the actual point:
df <- data.frame(lang = c("Russian", "Russian"),
lat = c(55.75, 59.95),
long = c(37.616667, 30.3),
# I use here URL shortener by Google
urls = c("https://goo.gl/5OUv1E",
"https://goo.gl/UWmvDw"))
map.feature(languages = df$lang,
latitude = df$lat,
longitude = df$long,
image.url = df$urls,
image.width = 150,
image.height = 150,
image.X.shift = 10,
image.Y.shift = 0)
Using this argument users can plot their own markers, any chart connected to a point or even their own legend. It is important to know that by using transparent .png files, the user can plot additional legend text on the map.
lingtypology
It is really important to cite R and R packages when you use them. For this purpose use the R function citation:
citation("lingtypology")
##
## Moroz G (2017). _lingtypology: Linguistic Typology and Mapping_.
## <URL: http://CRAN.R-project.org/package=lingtypology>.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {lingtypology: Linguistic Typology and Mapping},
## author = {George Moroz},
## year = {2017},
## url = {http://CRAN.R-project.org/package=lingtypology},
## }
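Called without arguments, citation() returns the citation for R itself, and toBibtex() converts it to a BibTeX entry. A short sketch follows; the output depends on the installed R version, so none is shown here.

```r
# Citation for R itself (no package name given)
r_citation <- citation()

# BibTeX entry for LaTeX users
utils::toBibtex(r_citation)
```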