%% tikzviolinplots.tex
%% Copyright 2023 Pedro Callil-Soares
%
% This work may be distributed and/or modified under the
% conditions of the LaTeX Project Public License, either version 1.3
% of this license or (at your option) any later version.
% The latest version of this license is in
%   http://www.latex-project.org/lppl.txt
% and version 1.3 or later is part of all distributions of LaTeX
% version 2005/12/01 or later.
%
% This work has the LPPL maintenance status `maintained'.
%
% The Current Maintainer of this work is Pedro Callil-Soares.
%
% This work consists of the files tikzviolinplots.sty and
% tikzviolinplots.tex.

\documentclass{article}

\usepackage{amsmath}
\usepackage{pgfplots}
\pgfplotsset{width=\textwidth,compat=1.18}
\usepgfplotslibrary{statistics}
\usepackage{tikzviolinplots}
\usepgfplotslibrary{external}
\tikzexternalize
\usetikzlibrary{arrows.meta}
\usetikzlibrary{decorations.text}
\usetikzlibrary{decorations.pathmorphing}
\usepackage{minted}
\usemintedstyle{gruvbox-light}
\usepackage{scontents}
\usepackage{wasysym}
\usepackage{microtype}
\usepackage{subcaption}
\usepackage{hyperref}

\begin{scontents}[write-out=violinandboxplotexample.dat]
A	B	C	D
0.876	0.574	2.175	1.684
1.015	0.822	2.121	2.290
0.995	1.266	1.916	2.184
1.101	0.740	2.044	2.065
1.222	0.636	2.004	2.248
0.712	1.753	2.038	2.200
1.063	0.873	1.974	1.865
0.973	1.020	2.145	1.895
1.174	0.763	2.052	2.240
1.057	1.293	2.079	2.360
1.004	0.853	2.064	1.695
0.909	0.694	2.136	1.837
1.108	0.849	2.000	1.744
1.210	0.468	2.010	1.820
0.958	0.767	1.824	2.158
1.119	0.940	2.034	1.846
1.000	0.962	1.967	2.545
0.954	1.551	1.833	1.816
0.918	0.755	1.914	2.369
1.093	0.973	1.997	1.543
\end{scontents}

\begin{scontents}[write-out=example.dat]
A	B	C	D	E
0.3	-2.1	3.50	2.89	1.00
0.41	-1.9	3.55	2.88	1.06
0.45	-1.5	3.55	3.13	1.00
0.46	-1.3	3.60	2.69	1.20
0.46	-1.3	3.60	2.78	1.00
0.46	-1.27	3.60	2.83	1.35
0.47	-1.26	3.65	3.08	1.00
0.47	-1.26	3.65	3.08	1.53
0.48	-1.24	3.65	2.73	1.00
0.51	-1.2	3.65	3.08	1.73
0.57	-1.13	3.65	3.24	1.00
2.3	-1.02	3.70	3.10	1.95
2.41	-0.9	3.70	2.98	1.00
2.46	-0.2	3.70	2.98	2.21
2.47	0.0	3.75	3.04	1.00
2.48	0.1	3.80	3.24	2.49
2.51	0.3	3.85	3.16	1.00
2.57	0.5	3.85	3.30	3.04
\end{scontents}

\title{The \texttt{tikzviolinplots} package}
\author{Pedro Callil-Soares}
\date{\today}

\begin{document}

\maketitle
\tableofcontents

\begin{abstract}
  The package provides commands for creating violin plots and for
  computing the kernel density estimates they require.
\end{abstract}

\section{Introduction}

This package builds on \texttt{pgfplots} to create violin plots in \LaTeX.
Violin plots are similar to box plots but, instead of a box marking the
median and quartiles, they show a kernel density estimate of the data, given
by equation \ref{eq:kde}, in which the function $k$ (the kernel) is a
probability density, the positive number $h$ (the bandwidth) is a smoothing
factor, $n$ is the sample size and $x_i$ are the sample values.
\begin{equation}
  \label{eq:kde}
  \textnormal{KDE}(x) = %
  \cfrac{1}{nh}\sum_{i=1}^{n}k\left(\cfrac{x-x_i}{h}\right)
\end{equation}

Figure \ref{fig:example} compares the two kinds of plot for the same data.
The violin plot in figure \ref{graph:violin_example} assumes normally
distributed data, and the bandwidth (the smoothing factor $h$ in equation
\ref{eq:kde}) is chosen accordingly.

\pgfplotsset{height=1.6\linewidth}
\begin{figure}[h!]
  \centering
  \begin{subfigure}{0.5\textwidth}
    \centering
    \begin{tikzpicture}
      \begin{axis} [
          boxplot/draw direction=y,
          ymax=3, ymin=0,
          xmin=0, xmax=5,
          ymajorgrids=true,
          xtick={1,2,3,4},
          xticklabels={$\alpha$,$\beta$,$\gamma$,$\delta$},
          ylabel={Some property},
        ]
        \addplot+[boxplot, blue!100!red, fill=blue!100!red,
          fill opacity=0.50, no marks]
          table [y=A] {violinandboxplotexample.dat};
        \addplot+[boxplot, blue!66!red, fill=blue!66!red,
          fill opacity=0.50, no marks]
          table [y=B] {violinandboxplotexample.dat};
        \addplot+[boxplot, blue!33!red, fill=blue!33!red,
          fill opacity=0.50, no marks]
          table [y=C] {violinandboxplotexample.dat};
        \addplot+[boxplot, blue!0!red, fill=blue!0!red,
          fill opacity=0.50, no marks]
          table [y=D] {violinandboxplotexample.dat};
      \end{axis}
    \end{tikzpicture}
    \caption{Box plot}
    \label{graph:box_example}
  \end{subfigure}%
  \hfill%
  \begin{subfigure}{0.5\textwidth}
    \centering
    \begin{tikzpicture}
      \violinsetoptions[
        averages,
        data points,
        scaled,
      ]{
        xmin=0,xmax=5,
        ymin=0,ymax=3,
        xlabel style={
          yshift = {-2*height("a")}
        },
        ymajorgrids=true,
        ylabel={Same property},
      }
      \violinplotwholefile[%
        primary color=red,
        secondary color=blue,
        indexes={A,B,C,D},
        spacing=1.0,
        labels={%
          $\alpha$,
          $\beta$,
          $\gamma$,
          $\delta$,
        },
        col sep=tab,
        dataset size=1pt,
        dataset mark=*,
        dataset fill=black!50!white,
        dataset fill opacity=1.0,
        average mark=x,
        average size=5pt,
      ]{violinandboxplotexample.dat}
    \end{tikzpicture}
    \caption{Violin plot}
    \label{graph:violin_example}
  \end{subfigure}
  \caption{Box and violin plot examples}
  \label{fig:example}
\end{figure}
\pgfplotsset{height=0.9\linewidth}

\section{Usage}

To draw a violin plot with the commands provided, one must, inside a
\texttt{tikzpicture} environment, first set the options common to all plots
and then insert each individual data set. The common options are set with the
command \texttt{{\textbackslash}violinsetoptions}, which must be invoked
before plotting the data sets.
The data sets themselves are then plotted with the commands
\texttt{{\textbackslash}violinplot} or
\texttt{{\textbackslash}violinplotwholefile}.

\subsection{General options: \texttt{{\textbackslash}violinsetoptions}}

The command \texttt{{\textbackslash}violinsetoptions} takes two arguments: an
optional argument with package-specific options and a mandatory argument with
options to be passed to \texttt{pgfplots}.

\begin{minted}[escapeinside=||]{latex}
\violinsetoptions[|\textit{}|]%
                 {|\textit{}|}
\end{minted}

\subsubsection{Package-specific options}

There are five options specific to the package: \texttt{scaled},
\texttt{data points}, \texttt{averages}, \texttt{no mirror} and
\texttt{reverse axis}. They control which information from the data sets is
shown and how it is presented.

The option \texttt{scaled} controls whether all plots in the graph have the
same width or the same area. If it is passed, the kernel density estimates
are scaled to the same width, as shown in figure \ref{graph:violin_verti};
otherwise, the plots have the same area, as in figure
\ref{graph:violin_horiz}.

The option \texttt{data points}, if passed, shows the individual points of
each data set along with the violin plots, as in figure
\ref{graph:violin_verti}. If the option \texttt{averages} is passed, the
average of each data set is shown, as in figure \ref{graph:violin_horiz}.

The plots are mirrored by default; passing the option \texttt{no mirror}
shows only half of each plot, as in figure \ref{graph:violin_horiz}.

Finally, to ``transpose'' the plots (\textit{i.e.} to show the distributions
as functions of the abscissa, as in figure \ref{graph:violin_horiz}, and not
as functions of the ordinate, as in figure \ref{graph:violin_verti}), one may
use the option \texttt{reverse axis}.
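
As an illustration of how these options are combined, the listing below is a
minimal sketch that passes three of them to
\texttt{{\textbackslash}violinsetoptions}. It is not a separate recipe: every
key and value is copied from the code that produces figure
\ref{graph:violin_example}, and it is not assumed here that each key given to
\texttt{{\textbackslash}violinplotwholefile} is strictly required. The
comments are only a reading aid. The options \texttt{no mirror} and
\texttt{reverse axis} would be added to the same comma-separated list.

\begin{minted}{latex}
\begin{tikzpicture}
  % Package-specific options go in the optional argument,
  % pgfplots options in the mandatory one.
  \violinsetoptions[
    averages,     % mark the average of each data set
    data points,  % also show the individual points
    scaled,       % same width for every violin
  ]{
    xmin=0, xmax=5,
    ymin=0, ymax=3,
    xlabel style={
      yshift = {-2*height("a")}
    },
    ymajorgrids=true,
    ylabel={Same property},
  }
  % Plot columns A to D of the example data file.
  \violinplotwholefile[%
    primary color=red,
    secondary color=blue,
    indexes={A,B,C,D},
    spacing=1.0,
    labels={$\alpha$, $\beta$, $\gamma$, $\delta$},
    col sep=tab,
    dataset size=1pt,
    dataset mark=*,
    dataset fill=black!50!white,
    dataset fill opacity=1.0,
    average mark=x,
    average size=5pt,
  ]{violinandboxplotexample.dat}
\end{tikzpicture}
\end{minted}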

\subsubsection{Plot limits and other \texttt{pgfplots} options}

The minima and maxima of the plot axes must be set in the second (and only
mandatory) argument to the command, and should follow \texttt{pgfplots}
syntax. For instance, to set the minimum and maximum of the $x$-axis to $-3$
and 6, and of the $y$-axis to 2.5 and 7, one might use:

\begin{minted}[escapeinside=||]{latex}
\violinsetoptions[|\textit{}|]%
                 {xmin=-3, xmax=6, ymin=2.5, ymax=7,%
                  |\textit{}|}
\end{minted}

Other \texttt{pgfplots} keys, such as the title or the axis labels, may be
set in the same way in this argument.

\subsection{Options for each data set: \texttt{{\textbackslash}violinplot}}

If the data sets are not very similar, or if advanced customizations are
desired, \texttt{{\textbackslash}violinplot} should be used to plot each data
set individually. This command takes one mandatory argument and a list of
options:

\begin{minted}[escapeinside=||]{latex}
\violinplot[%
            |\textit{