Pygg – Ggplot2 Syntax in Python

ggplot2 syntax in Python. It is actually a wrapper around Wickham's ggplot2 in R.

It is particularly good if you have preprocessed CSVs or Postgres data to render. There is passable
support for simple data in Python lists, dictionaries, and pandas DataFrame objects.

pygg allows you to use ggplot2 syntax nearly verbatim in Python,
and execute the ggplot program in R. Since this is just a wrapper
and passes all arguments to the R backend, it is almost completely
API compatible.

For a nearly exhaustive list of supported ggplot2 functions, see bin/make_ggplot2_functions.R.

Setup

  • install R
# on osx
brew install r

# on unix e.g., ubuntu
sudo apt-get install r-base
  • install R packages (run the following in the R shell)
install.packages("ggplot2")
install.packages("RPostgreSQL")   # optional

Install

pip install pygg

Command line usage

runpygg.py --help
runpygg.py -c "ggplot('diamonds', aes('carat', 'price')) + geom_point()" -o test.pdf
runpygg.py -c "ggplot('diamonds', aes('carat', 'price')) + geom_point()" -csv foo.csv

For Python usage, see tests/example.py

from pygg import *

# Example using the diamonds dataset (comes with ggplot2)
p = ggplot('diamonds', aes('carat', y='price'))
g = geom_point() + facet_wrap(None, "color")
ggsave("test1.pdf", p+g, data=None)

The library performs a simple syntactic translation from Python
ggplot objects to R code. Because of this, there are some quirks
regarding datasets and how strings are handled.
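To make the idea of a purely syntactic translation concrete, here is a minimal sketch. It is not pygg's implementation: the class RExpr and its methods are hypothetical names chosen for illustration. It only shows how Python call objects joined with + can be rendered to an R source string, and why Python strings end up pasted into that string verbatim.

```python
class RExpr:
    """Hypothetical stand-in for an object that renders itself as an R call."""

    def __init__(self, name, *args, **kwargs):
        self.name = name
        self.args = args
        self.kwargs = kwargs
        self.layers = []  # calls that were chained on with +

    def __add__(self, other):
        # Build a new expression so neither operand is modified.
        combined = RExpr(self.name, *self.args, **self.kwargs)
        combined.layers = self.layers + [other] + other.layers
        return combined

    @staticmethod
    def _value(value):
        if isinstance(value, RExpr):
            return value.render()
        # Strings and numbers are pasted as-is; no quotes are added.
        return str(value)

    def render_call(self):
        parts = [self._value(a) for a in self.args]
        parts += ["{}={}".format(k, self._value(v)) for k, v in self.kwargs.items()]
        return "{}({})".format(self.name, ", ".join(parts))

    def render(self):
        calls = [self.render_call()] + [layer.render_call() for layer in self.layers]
        return " + ".join(calls)


p = RExpr("ggplot", "diamonds", RExpr("aes", "carat", y="price")) + RExpr("geom_point")
print(p.render())
```

Because 'diamonds', 'carat', and 'price' are emitted without quotes, R reads them as variable and column names, which is what the plot needs. The same behaviour is what makes literal string arguments awkward.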

Datasets

In R, ggplot directly references the data frame object present in the runtime
(e.g., ggplot(<dataset>, aes(...))). However, the Python
objects being plotted are not directly accessible in the R runtime.

pygg provides two ways of loading datasets from Python into R.

The first way is to explicitly pass the data object to ggsave using its data keyword argument.
ggsave then converts the data object to a proper CSV file, writes it to a temp file,
and loads it into the data variable in R for use with the ggplot2 functions.

For example (notice that the string "data" is passed to ggplot()):

    df = pandas.DataFrame(...)
    p = ggplot("data", aes(...)) + geom_point()
    ggsave("out.pdf", p, data=df)

In addition, we provide several convenience functions that generate
the appropriate R code for common Python dataset formats:

  • csv file: if you have a CSV file already, provide the filename to data
        p = ggplot("data", aes(...)) + geom_point()
        ggsave("out.pdf", p, data="file.csv")

        # or more explicitly, pass a wrapped object that represents the csv file:

        ggsave("out.pdf", p, data=data_py("file.csv"))

  • python object: if your data is a Python object in columnar ({x: [1,2], y: [3,4]})
    or row ([{x:1,y:3}, {x:2,y:4}]) format
        p = ggplot("data", aes(...)) + geom_point()
        ggsave("out.pdf", p, data={'x': [1,2], 'y': [3,4]})
  • pandas dataframe: if your data is already a pandas DataFrame object,
    you can just provide the dataframe df directly to data
        p = ggplot("data", aes(...)) + geom_point()
        ggsave("out.pdf", p, data=df)
  • PostgreSQL: if your data is stored in a Postgres database
        p = ggplot("data", aes(...)) + geom_point()
        ggsave("out.pdf", p, data=data_sql('DBNAME', 'SELECT * FROM ...'))
  • existing R datasets: you can refer to any R data frame object using the
    first argument to ggplot()
        p = ggplot('diamonds', aes(...)) + geom_point()
        ggsave("out.pdf", p, data=None)
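The conversion that ggsave is described as doing (Python data to a temporary CSV file, then loaded into an R variable named data) can be sketched as follows. This is an illustration under assumptions, not pygg's code: the helper names to_rows, write_temp_csv, and r_load_statement are hypothetical, and the generated read.csv line is only one plausible way to load the file in R.

```python
import csv
import os
import tempfile


def to_rows(data):
    """Normalize columnar ({col: [values]}) or row ([{col: value}]) data."""
    if isinstance(data, dict):
        header = list(data.keys())
        if len({len(col) for col in data.values()}) > 1:
            raise ValueError("all columns must have the same length")
        rows = list(zip(*(data[key] for key in header)))
    else:
        if not data:
            raise ValueError("row data must contain at least one record")
        header = list(data[0].keys())
        rows = [tuple(record[key] for key in header) for record in data]
    return header, rows


def write_temp_csv(data):
    """Write the data to a temporary CSV file and return its path."""
    header, rows = to_rows(data)
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def r_load_statement(path, var="data"):
    """R source line that reads the CSV into the variable `var` (assumed form)."""
    return '{} = read.csv("{}")'.format(var, path.replace("\\", "/"))


csv_path = write_temp_csv({"x": [1, 2], "y": [3, 4]})
print(r_load_statement(csv_path))
os.remove(csv_path)  # clean up the temporary file
```

Both the columnar and the row format reduce to the same header and rows, so a single CSV writer can serve either input shape.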

String arguments

By default, the library directly prints a Python string argument into the
R code string. For example, the following Python code to set the x axis label
would generate incorrect R code:

    # incorrect python code
    scale_x_continuous(name="special label")

    # incorrect generated R code
    scale_x_continuous(name=special label)

    # correct python code
    scale_x_continuous(name="'special label'")

    # correct generated R code
    scale_x_continuous(name='special label')

    # less convenient but more explicit alternative syntax
    scale_x_continuous(name=pygg.esc('special label'))

You need to explicitly wrap these types of strings (intended as R strings)
in a layer of quotes. For convenience, we automatically provide wrapping
for common functions:

    # "filename.pdf" is wrapped
    ggsave("filename.pdf", p)
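The quoting rule can be illustrated with a small stand-alone sketch. The helpers quote_for_r and render_kwarg are hypothetical and are not pygg's esc(); they only show the difference between pasting a Python string verbatim and wrapping it in an extra layer of quotes first.

```python
def quote_for_r(text):
    """Wrap text in double quotes so R parses it as a string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"{}"'.format(escaped)


def render_kwarg(name, value):
    """Paste a keyword argument into R source exactly as given."""
    return "{}={}".format(name, value)


# Pasted verbatim: R would see two bare words, which is a syntax error.
print(render_kwarg("name", "Carat weight"))

# Quoted first: R sees a single string literal.
print(render_kwarg("name", quote_for_r("Carat weight")))
```

Escaping backslashes and embedded double quotes keeps the generated literal valid even when the label itself contains those characters.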

Convenience Functions

Passing data to ggplot() directly

It feels silly to pass a dummy "data" string to ggplot() and then pass the object to
ggsave. We have extended the ggplot() call so it recognizes non-string Python data objects
and uses the data object by default during the ggsave call:

    df = pandas.DataFrame(...)
    p = ggplot(df, aes(...)) + geom_point()
    ggsave("out.pdf", p)

    p = ggplot(dict(x=[0,1], y=[3,4]), aes(x='x', y='y')) + geom_point()
    ggsave("out.pdf", p)
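One way to picture this behaviour is a dispatch on the type of the first argument. The function below is a hypothetical sketch, not pygg's code: it assumes that any string is taken as the name of an R variable, while any other object is kept as the dataset and referred to in R as data.

```python
def split_data_arg(first_arg):
    """Return (name used in the R code, Python data to load or None)."""
    if isinstance(first_arg, str):
        # Strings are assumed to name something that already exists in R.
        return first_arg, None
    # Anything else is treated as the dataset itself.
    return "data", first_arg


print(split_data_arg("diamonds"))
print(split_data_arg({"x": [0, 1], "y": [3, 4]}))
```

Under this assumption a string is never inspected further, so a file name passed as a plain string would be treated the same way as an R variable name.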

Note that unlike ggsave, ggplot() is not smart enough to tell apart string arguments that
are R variable names from file names. Thus, the following will likely result in an error because it
assumes the R variable data.csv exists in the environment when in fact it is the name of a CSV file
to be loaded:
to be loaded:

    p = ggplot("data.csv", aes(x='x', y='y')) + geom_point()
    ggsave("out.pdf", p)

Simply wrap the filename with a data_py() call:

    p = ggplot(data_py("data.csv"), aes(x='x', y='y')) + geom_point()
    ggsave("out.pdf", p)

Axis Labels

axis_labels() is a shortcut for setting the x and y axis titles and scale types.
The following names the x axis "Dataset Size (MB)" and sets it to log scale,
names the y axis "Latency (sec)" with the default continuous scale, and
sets the breaks for the x axis to [0, 10, 100, 5000]:

    p = ggplot(...)
    p += axis_labels("Dataset Size (MB)", 
                    "Latency (sec)", 
                    "log10",  
                    xkwargs=dict(breaks=[0, 10, 100, 5000]))
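As a rough picture of what such a shortcut could expand to, the sketch below builds the R source for a pair of scale calls. It is an assumption, not something the text above specifies, that the shortcut maps onto ggplot2's scale_x_<type> and scale_y_<type> functions; the helper names scale_call and axis_labels_r are hypothetical.

```python
def r_string(text):
    """Quote text as an R string literal."""
    return '"{}"'.format(text.replace("\\", "\\\\").replace('"', '\\"'))


def r_vector(values):
    """Render a Python list or tuple as an R c(...) vector."""
    return "c({})".format(", ".join(str(v) for v in values))


def scale_call(axis, title, scale="continuous", **kwargs):
    """Render one scale_<axis>_<scale>(...) call with a quoted axis title."""
    parts = ["name={}".format(r_string(title))]
    for key, value in kwargs.items():
        rendered = r_vector(value) if isinstance(value, (list, tuple)) else str(value)
        parts.append("{}={}".format(key, rendered))
    return "scale_{}_{}({})".format(axis, scale, ", ".join(parts))


def axis_labels_r(xtitle, ytitle, xscale="continuous", yscale="continuous",
                  xkwargs=None, ykwargs=None):
    """Render the x and y scale calls joined with +."""
    x_call = scale_call("x", xtitle, xscale, **(xkwargs or {}))
    y_call = scale_call("y", ytitle, yscale, **(ykwargs or {}))
    return x_call + " + " + y_call


print(axis_labels_r("Dataset Size (MB)", "Latency (sec)", "log10",
                    xkwargs=dict(breaks=[0, 10, 100, 5000])))
```

Here the axis titles are quoted automatically, which is the kind of convenience wrapping that avoids the manual quoting needed for ordinary string arguments.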

Alternatives

  • yhat's ggplot: yhat's
    port of ggplot is very good. It runs everything natively in
    Python, works with numpy data structures, and renders using matplotlib.
    pygg exists partly due to personal preference, and partly because
    the R version of ggplot2 is more mature, and its layout algorithms are
    very good.

  • pyggplot: Pyggplot does not adhere
    strictly to R's ggplot syntax but pythonifies it, making it harder to transpose
    ggplot2 examples. Also, pyggplot requires rpy2.

  • plotnine: another implementation of ggplot2 in Python
