
Data visualization with Lets-Plot for Kotlin

Lets-Plot for Kotlin (LPK) is a multiplatform plotting library that ports R's ggplot2 library to Kotlin. LPK brings the feature-rich ggplot2 API to the Kotlin ecosystem, making it suitable for scientists and statisticians who require sophisticated data visualization capabilities.

LPK targets various platforms, including Kotlin notebooks, Kotlin/JS, JVM's Swing, JavaFX, and Compose Multiplatform. Additionally, LPK has seamless integration with IntelliJ, DataGrip, DataSpell, and PyCharm.


This tutorial demonstrates how to create different plot types with the LPK and Kotlin DataFrame libraries using Kotlin Notebook in IntelliJ IDEA.

Before you start

  1. Download and install the latest version of IntelliJ IDEA Ultimate.

  2. Install the Kotlin Notebook plugin in IntelliJ IDEA.

  3. Create a new notebook by selecting File | New | Kotlin Notebook.

  4. In your notebook, import the LPK and Kotlin DataFrame libraries by running the following command:

    %use lets-plot
    %use dataframe

Prepare the data

Let's create a DataFrame that stores simulated numbers of the monthly average temperature in three cities: Berlin, Madrid, and Caracas.

Use the dataFrameOf() function from the Kotlin DataFrame library to generate the DataFrame. Paste and run the following code snippet in your Kotlin Notebook:

// The months variable stores a list with 12 months of the year
val months = listOf(
    "January", "February",
    "March", "April", "May",
    "June", "July", "August",
    "September", "October", "November",
    "December"
)
// The tempBerlin, tempMadrid, and tempCaracas variables store a list with temperature values for each month
val tempBerlin =
    listOf(-0.5, 0.0, 4.8, 9.0, 14.3, 17.5, 19.2, 18.9, 14.5, 9.7, 4.7, 1.0)
val tempMadrid =
    listOf(6.3, 7.9, 11.2, 12.9, 16.7, 21.1, 24.7, 24.2, 20.3, 15.4, 9.9, 6.6)
val tempCaracas =
    listOf(27.5, 28.9, 29.6, 30.9, 31.7, 35.1, 33.8, 32.2, 31.3, 29.4, 28.9, 27.6)

// The df variable stores a DataFrame of three columns, including monthly records, temperature, and cities
val df = dataFrameOf(
    "Month" to months + months + months,
    "Temperature" to tempBerlin + tempMadrid + tempCaracas,
    "City" to List(12) { "Berlin" } + List(12) { "Madrid" } + List(12) { "Caracas" }
)
df.head(4)

You can see that the DataFrame has three columns: Month, Temperature, and City. The first four rows of the DataFrame contain records of the temperature in Berlin from January to April:

Dataframe exploration

To create a plot using the LPK library, you need to convert your data (df) into a Map type that stores the data in key-value pairs. You can easily convert a DataFrame into a Map using the .toMap() function:

val data = df.toMap()
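
For reference, the Map that LPK reads is column-oriented: each key is a column name, and each value is a list with one entry per row. The following is a minimal sketch in standard Kotlin, without the DataFrame library, that builds the same long-format layout. The helper name toLongFormat and the shortened three-month data are illustrative assumptions and are not part of LPK or Kotlin DataFrame:

```kotlin
// Hypothetical helper (not part of LPK or Kotlin DataFrame): stacks one
// temperature series per city into long-format columns.
fun toLongFormat(
    months: List<String>,
    tempsByCity: Map<String, List<Double>>
): Map<String, List<Any>> {
    val monthCol = mutableListOf<String>()
    val tempCol = mutableListOf<Double>()
    val cityCol = mutableListOf<String>()
    for ((city, temps) in tempsByCity) {
        // Every series must have exactly one value per month
        require(temps.size == months.size) { "$city: expected ${months.size} values" }
        monthCol += months
        tempCol += temps
        cityCol += List(months.size) { city }
    }
    return mapOf("Month" to monthCol, "Temperature" to tempCol, "City" to cityCol)
}

fun main() {
    val months = listOf("January", "February", "March")
    val data = toLongFormat(
        months,
        mapOf(
            "Berlin" to listOf(-0.5, 0.0, 4.8),
            "Madrid" to listOf(6.3, 7.9, 11.2)
        )
    )
    // Prints each column name with its values, one entry per (city, month) pair
    data.forEach { (name, values) -> println("$name: $values") }
}
```

All three columns must have the same length, because each index across the lists describes one observation.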

Create a scatter plot

Let's create a scatter plot in Kotlin Notebook with the LPK library.

Once you have your data in the Map format, use the geomPoint() function from the LPK library to generate the scatter plot. You can specify the values for the X and Y axes, as well as define categories and their color. Additionally, you can customize the plot's size and point shapes to suit your needs:

// Specifies X and Y axes, categories and their color, plot size, and plot type
val scatterPlot = letsPlot(data) { x = "Month"; y = "Temperature"; color = "City" } +
    ggsize(600, 500) +
    geomPoint(shape = 15)
scatterPlot

Here's the result:

Scatter plot

Create a box plot

Let's visualize the data in a box plot. Use the geomBoxplot() function from the LPK library to generate the plot and customize colors with the scaleFillManual() function:

// Specifies X and Y axes, categories, plot size, and plot type
val boxPlot = ggplot(data) { x = "City"; y = "Temperature" } +
    ggsize(700, 500) +
    geomBoxplot { fill = "City" } +
    // Customizes colors
    scaleFillManual(values = listOf("light_yellow", "light_magenta", "light_green"))
boxPlot

Here's the result:

Box plot
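
To make the box plot easier to read, it can help to compute the summary values for one city by hand. The sketch below uses only the Kotlin standard library and calculates the minimum, lower quartile, median, upper quartile, and maximum of the Berlin temperatures. The function names quantile and fiveNumberSummary are hypothetical, and the linear-interpolation quantile rule is an assumption: LPK may compute quartiles differently, and the whiskers and outliers it draws follow additional rules that are not covered here:

```kotlin
// Linear-interpolation quantile on an already sorted list.
// Assumed method for illustration; LPK may use a different quartile rule.
fun quantile(sorted: List<Double>, p: Double): Double {
    require(sorted.isNotEmpty() && p in 0.0..1.0)
    val pos = p * (sorted.size - 1)
    val lower = pos.toInt()
    val upper = minOf(lower + 1, sorted.size - 1)
    val fraction = pos - lower
    return sorted[lower] + fraction * (sorted[upper] - sorted[lower])
}

// Returns min, lower quartile, median, upper quartile, and max
fun fiveNumberSummary(values: List<Double>): List<Double> {
    val sorted = values.sorted()
    return listOf(0.0, 0.25, 0.5, 0.75, 1.0).map { quantile(sorted, it) }
}

fun main() {
    val tempBerlin =
        listOf(-0.5, 0.0, 4.8, 9.0, 14.3, 17.5, 19.2, 18.9, 14.5, 9.7, 4.7, 1.0)
    println(fiveNumberSummary(tempBerlin))
}
```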

Create a 2D density plot

Now, let's create a 2D density plot to visualize the distribution and concentration of some random data.

Prepare the data for the 2D density plot

  1. Import the dependencies to process the data and generate the plot:

    %use lets-plot
    @file:DependsOn("org.apache.commons:commons-math3:3.6.1")
    import org.apache.commons.math3.distribution.MultivariateNormalDistribution

  2. Paste and run the following code snippet in your Kotlin Notebook to create sets of 2D data points:

    // Defines covariance matrices for three distributions
    val cov0: Array<DoubleArray> = arrayOf(
        doubleArrayOf(1.0, -.8),
        doubleArrayOf(-.8, 1.0)
    )

    val cov1: Array<DoubleArray> = arrayOf(
        doubleArrayOf(1.0, .8),
        doubleArrayOf(.8, 1.0)
    )

    val cov2: Array<DoubleArray> = arrayOf(
        doubleArrayOf(10.0, .1),
        doubleArrayOf(.1, .1)
    )

    // Defines the number of samples
    val n = 400

    // Defines means for three distributions
    val means0: DoubleArray = doubleArrayOf(-2.0, 0.0)
    val means1: DoubleArray = doubleArrayOf(2.0, 0.0)
    val means2: DoubleArray = doubleArrayOf(0.0, 1.0)

    // Generates random samples from three multivariate normal distributions
    val xy0 = MultivariateNormalDistribution(means0, cov0).sample(n)
    val xy1 = MultivariateNormalDistribution(means1, cov1).sample(n)
    val xy2 = MultivariateNormalDistribution(means2, cov2).sample(n)

    In the code above, the xy0, xy1, and xy2 variables each store an array of 2D (x, y) data points.

  3. Convert your data into a Map type:

    val data = mapOf(
        "x" to (xy0.map { it[0] } + xy1.map { it[0] } + xy2.map { it[0] }).toList(),
        "y" to (xy0.map { it[1] } + xy1.map { it[1] } + xy2.map { it[1] }).toList()
    )
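
To illustrate how a covariance matrix shapes the sampled points in step 2, here is a minimal sketch that draws correlated (x, y) pairs without Apache Commons Math. It swaps in a Cholesky factorization of the 2x2 covariance matrix, which is one standard way to sample a multivariate normal distribution; Commons Math may use a different decomposition internally. The function names cholesky2x2 and sampleBivariateNormal are hypothetical, and the code relies on java.util.Random, so it assumes a JVM target:

```kotlin
import java.util.Random
import kotlin.math.sqrt

// Cholesky factor of a 2x2 covariance matrix [[a, b], [b, c]]:
// returns (l11, l21, l22) such that L * L^T equals the matrix
fun cholesky2x2(cov: Array<DoubleArray>): Triple<Double, Double, Double> {
    val l11 = sqrt(cov[0][0])
    val l21 = cov[1][0] / l11
    val l22 = sqrt(cov[1][1] - l21 * l21)
    return Triple(l11, l21, l22)
}

// Draws n correlated (x, y) points as mean + L * z, where z is standard normal
fun sampleBivariateNormal(
    means: DoubleArray,
    cov: Array<DoubleArray>,
    n: Int,
    rng: Random
): Array<DoubleArray> {
    val (l11, l21, l22) = cholesky2x2(cov)
    return Array(n) {
        val z1 = rng.nextGaussian()
        val z2 = rng.nextGaussian()
        doubleArrayOf(means[0] + l11 * z1, means[1] + l21 * z1 + l22 * z2)
    }
}

fun main() {
    val cov0 = arrayOf(doubleArrayOf(1.0, -0.8), doubleArrayOf(-0.8, 1.0))
    val xy = sampleBivariateNormal(doubleArrayOf(-2.0, 0.0), cov0, 400, Random(42))
    // Same column extraction as in step 3
    val xs = xy.map { it[0] }
    val ys = xy.map { it[1] }
    println("points: ${xy.size}, mean x: ${xs.average()}, mean y: ${ys.average()}")
}
```

The negative off-diagonal value in cov0 makes y tend to decrease as x increases, which is why that cluster slopes downward in the final plot.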

Generate the 2D density plot

Using the Map from the previous step, create a 2D density plot (geomDensity2D) with a scatter plot (geomPoint) in the background to better visualize the data points and outliers. You can use the scaleColorGradient() function to customize the scale of colors:

val densityPlot = letsPlot(data) { x = "x"; y = "y" } +
    ggsize(600, 300) +
    geomPoint(
        color = "black",
        alpha = .1
    ) +
    geomDensity2D { color = "..level.." } +
    scaleColorGradient(low = "dark_green", high = "yellow", guide = guideColorbar(barHeight = 10, barWidth = 300)) +
    theme().legendPositionBottom()
densityPlot

Here's the result:

2D density plot

What's next

Last updated: 2024/11/17