avt

Some graphing code that I’ve found helpful.

class avt.ReliabilityDisplay

Class to plot a reliability diagram from predictions.

Examples

This can be used to plot the reliability diagram from predictions made by a classifier, such as:

>>> y_prob = clf.predict_proba(X)
>>> ax = avt.ReliabilityDisplay.from_predictions(
...     y,
...     y_prob,
... )
[Figure: example reliability diagram]

Or from the classifier directly:

>>> ax = avt.ReliabilityDisplay.from_estimator(
...     clf, X, y
... )
[Figure: example reliability diagram]
from_estimator(estimator, X, y, *args, **kwargs) → ndarray

Plots the reliability diagram from a fitted estimator.

Parameters:
  • estimator (-) – The estimator to use to make predictions.

  • X (-) – The data to use to make predictions.

  • y (-) – The true labels.

  • *args (-) –

    Positional arguments to pass to the from_predictions method.

  • **kwargs (-) –

    Keyword arguments to pass to the from_predictions method.

Returns:

- axes – The axes of the plots.

Return type:

numpy.ndarray

from_predictions(y_true, probas_pred: ndarray, n_bins: int = 10, legend: bool = True, accuracy_color='xkcd:baby blue', gap_color='xkcd:rose', hist_color='xkcd:lilac', diagonal_color='xkcd:dark red', accuracy_line_color='black', confidence_line_color='black', bar_kwargs: dict = {}, histogram_kwargs: dict = {}, line_kwargs: dict = {}, label_box_kwargs: dict = {}, label_ys: Tuple[float, float] = (0.8, 0.6), label_offset: float = -0.01, ax1: Axes | None = None, ax2: Axes | None = None) → ndarray

Plots the reliability diagram from predicted probabilities.

Parameters:
  • y_true (-) – True labels. These should be integer class indices that correspond to the columns (the second dimension) of probas_pred.

  • probas_pred (-) – Predicted probabilities, as returned by a classifier’s predict_proba method. This array should have the shape (n_samples, n_classes).

  • n_bins (-) – The number of bins to use when calculating the expected calibration error (ECE). Defaults to 10.

  • legend (-) – Whether to plot the legend. Defaults to True.

  • accuracy_color (-) – The colour to use for the accuracy bars. Defaults to xkcd:baby blue.

  • gap_color (-) – The colour to use for the gap bars. Defaults to xkcd:rose.

  • hist_color (-) – The colour to use for the histogram. Defaults to xkcd:lilac.

  • diagonal_color (-) – The colour to use for the diagonal line. Defaults to xkcd:dark red.

  • accuracy_line_color (-) – The colour to use for the accuracy line. Defaults to black.

  • confidence_line_color (-) – The colour to use for the confidence line. Defaults to black.

  • bar_kwargs (-) – Keyword arguments to pass to the bar plot. Defaults to {}.

  • histogram_kwargs (-) – Keyword arguments to pass to the histogram plot. Defaults to {}.

  • line_kwargs (-) – Keyword arguments to pass to the line plot. Defaults to {}.

  • label_box_kwargs (-) – Keyword arguments to pass to the label bbox in the histogram plot. Defaults to {}.

  • label_ys (-) – The y coordinates of the labels in the histogram plot. Defaults to (0.8, 0.6).

  • label_offset (-) – The offset of the labels in the histogram plot. Defaults to -0.01.

  • ax1 (-) – The axes to plot the reliability diagram on. Defaults to None.

  • ax2 (-) – The axes to plot the histogram on. Defaults to None.

Returns:

- axes – The axes of the plots.

Return type:

numpy.ndarray
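A reliability diagram groups samples into confidence bins and compares each bin's accuracy with its mean confidence. The plain-Python sketch below shows one common way to compute those per-bin values and the expected calibration error (ECE). It assumes equal-width bins on [0, 1], with a sample's confidence taken as its largest predicted probability; the helper `reliability_bins` is hypothetical and is not avt's implementation.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def reliability_bins(
    y_true: Sequence[int],
    probas_pred: Sequence[Sequence[float]],
    n_bins: int = 10,
) -> Tuple[List[Dict[str, Optional[float]]], float]:
    """Hypothetical helper: per-bin accuracy and confidence, plus the ECE.

    Assumes equal-width bins on [0, 1]; the prediction is the column with
    the largest probability, and that probability is the confidence.
    """
    counts = [0] * n_bins
    correct = [0] * n_bins
    conf_sums = [0.0] * n_bins
    for label, probs in zip(y_true, probas_pred):
        predicted = max(range(len(probs)), key=lambda k: probs[k])
        confidence = probs[predicted]
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        b = min(int(confidence * n_bins), n_bins - 1)
        counts[b] += 1
        correct[b] += int(predicted == label)
        conf_sums[b] += confidence

    n_samples = sum(counts)
    bins: List[Dict[str, Optional[float]]] = []
    ece = 0.0
    for b in range(n_bins):
        if counts[b] == 0:
            bins.append({"count": 0, "accuracy": None, "confidence": None})
            continue
        accuracy = correct[b] / counts[b]
        mean_conf = conf_sums[b] / counts[b]
        # Each bin's |accuracy - confidence| gap is weighted by its share of samples.
        ece += counts[b] / n_samples * abs(accuracy - mean_conf)
        bins.append({"count": counts[b], "accuracy": accuracy, "confidence": mean_conf})
    return bins, ece
```

For a perfectly calibrated classifier, each bin's accuracy matches its mean confidence and the ECE is 0.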

avt.bar_labels(obj: figure, labels: None | List[str] = None, label_format: str = '{height}', **kwargs) → figure

Adds labels to the bars in a graph.

Examples

The following will add label heights and format them as a percentage:

>>> g = sns.catplot(...)
>>> avt.bar_labels(g.figure, label_format='{height:.1%}')

The following will add the labels 'bar1' and 'bar2' to the bars in the first axes and 'bar3' and 'bar4' to the bars in the second axes.

>>> g = sns.catplot(...)
>>> labels = [
...     ['bar1', 'bar2'],  # first axes
...     ['bar3', 'bar4'],  # second axes
... ]
>>> avt.bar_labels(g.figure, labels=labels)
Parameters:
  • obj (-) – The matplotlib figure or axes to add bar labels to.

  • labels (-) – Labels to add to the bars. This should be a list of lists, in which the outer list acts over the axes in the plot and the inner list is the labels for the bars. If None, then the bar labels will be the heights of the bars. Defaults to None.

  • label_format (-) – The format of the bar labels, used only if labels=None. This must be a format string containing a {height} replacement field (optionally with a format spec, such as {height:.1%}), which is replaced by each bar's height. Defaults to '{height}'.

  • kwargs (-) – Keyword arguments passed to plt.bar_label, for example rotation, padding, label_type and fontsize.

Returns:

- out – Matplotlib figure, containing the added bar labels.

Return type:

plt.figure
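The labelling rules above can be sketched in plain Python. The hypothetical helper `make_bar_labels` takes the bar heights grouped per axes (how they are collected from the figure is left out) and returns the label strings: either the explicit labels, or each height formatted with label_format. It is an illustration of the documented behaviour, not avt's implementation.

```python
from typing import List, Optional, Sequence


def make_bar_labels(
    heights_per_axes: Sequence[Sequence[float]],
    labels: Optional[Sequence[Sequence[str]]] = None,
    label_format: str = "{height}",
) -> List[List[str]]:
    """Hypothetical helper mirroring the documented labels/label_format rules."""
    if labels is not None:
        # Outer list: one entry per axes; inner list: one label per bar.
        return [list(axes_labels) for axes_labels in labels]
    # label_format must contain a {height} field, e.g. '{height:.1%}'.
    return [
        [label_format.format(height=h) for h in heights]
        for heights in heights_per_axes
    ]
```

Each inner list could then be passed to matplotlib's `Axes.bar_label(container, labels=...)` for the matching axes.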

avt.boxplot(*args, **kwargs)

This is a wrapper for the seaborn boxplot function that includes some default formatting.

By default, all lines are black with a line width of 2, and the box width is 0.75.

avt.cfmplot(cfm: ndarray, cbar: bool = True, color: str = 'Blues', xlabel: bool = True, ylabel: bool = True, summary_statistics: bool = True, ax: None | axes = None, annot_count=True, annot_percentage=True, **kwargs) → axes

Draw a confusion matrix plot from a numpy array.

This function was based on code from https://github.com/DTrimarchi10/confusion_matrix/blob/master/cfm_matrix.py.

Examples

When using this function to plot multi-class confusion matrices, we will get something similar to the following:

>>> cfm = np.array(
...     [[10,  0,  0],
...      [ 0, 12,  0],
...      [ 0,  1, 15]],
...     dtype=np.int64,
... )
>>> avt.cfmplot(cfm)
[Figure: example multi-class confusion matrix plot]

If we have a binary task, the summary statistics are more extensive:

>>> cfm = np.array(
...     [[15,  0],
...      [ 0, 10]],
...     dtype=np.int64,
... )
>>> avt.cfmplot(cfm)
[Figure: example binary confusion matrix plot with summary statistics]
Parameters:
  • cfm (-) – A numpy array representing the confusion matrix.

  • cbar (-) – Whether to add a colour bar. Defaults to True.

  • color (-) – The name of the cmap used to colour the heatmap. Defaults to 'Blues'.

  • xlabel (-) – Whether to include a label on the x axis. Defaults to True.

  • ylabel (-) – Whether to include a label on the y axis. Defaults to True.

  • summary_statistics (-) – Whether to add summary statistics to the label on the x axis. Defaults to True.

  • ax (-) – Axes in which to draw the plot, otherwise use the currently-active Axes. Defaults to None.

  • annot_count (-) – Whether to annotate each cell of the heatmap with its count. Defaults to True.

  • annot_percentage (-) – Whether to annotate each cell of the heatmap with its percentage. Defaults to True.

  • kwargs (-) – All other keyword arguments are passed to sns.heatmap.

Returns:

- out – Axes object with the confusion matrix.

Return type:

plt.axes
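The numbers behind the annotations and summary statistics can be sketched in plain Python. This assumes rows are true labels and columns are predicted labels, and that the binary summary shows accuracy, precision, recall and F1 score with class 1 as the positive class; these are assumptions about what avt displays, and the helper `cfm_summary` is hypothetical.

```python
from typing import Dict, List, Sequence


def cfm_summary(cfm: Sequence[Sequence[int]]) -> Dict[str, object]:
    """Hypothetical helper: cell percentages and summary statistics.

    Assumes rows are true labels and columns are predicted labels.
    """
    n = len(cfm)
    total = sum(sum(row) for row in cfm)
    # Fraction of all samples that fall in each cell.
    percentages: List[List[float]] = [[cell / total for cell in row] for row in cfm]
    stats: Dict[str, float] = {"accuracy": sum(cfm[i][i] for i in range(n)) / total}
    if n == 2:
        # Binary task: class 1 is treated as the positive class.
        tp = cfm[1][1]
        predicted_pos = cfm[0][1] + cfm[1][1]
        actual_pos = cfm[1][0] + cfm[1][1]
        precision = tp / predicted_pos if predicted_pos else 0.0
        recall = tp / actual_pos if actual_pos else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        stats.update(precision=precision, recall=recall, f1=f1)
    return {"percentages": percentages, "stats": stats}
```

This is why the binary example shows more statistics than the multi-class one: precision and recall need a single positive class.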

avt.clockplot(data: DataFrame, x: str | None = None, y: str | None = None, hue: str | None = None, ax: None | axes = None, hue_order: None | List[str] = None, freq: str = '30T', label_format: bool | str = True, label_freq: str | None = None, cmap: Colormap | str | None = None, legend: bool = True, label_kwargs: Dict[str, Any] = {}, **kwargs)

This function plots a circular graph representing a day, with bars showing how many data points fall in each time bin.

Examples

>>> ax = avt.clockplot(
...     data,
...     x='datetime',
...     hue='group',
...     label_format='%H:%M',
...     label_freq='3H',
... )

This will return the plot:

[Figure: example clock plot]
Parameters:
  • data (-) – The dataframe containing the data to plot.

  • x (-) – The column name containing the datetimes to use for calculating the time bins. Defaults to None.

  • y (-) – Ignored. Defaults to None.

  • hue (-) – Semantic variable that is mapped to determine the color of plot elements. This will determine the stacked bars. Defaults to None.

  • ax (-) – A matplotlib axes that the plot can be drawn on. Defaults to None.

  • hue_order (-) – The order of the hue and stacked bars. Defaults to None.

  • freq (-) – The frequency at which to bin the bars, given as a pandas offset alias (see https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases). Defaults to '30T'.

  • label_format (-) – The format of the time labels. Any format string accepted by dt.strftime can be used (see https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior). True leaves the labels as default and False removes them. Defaults to True.

  • label_freq (-) – How often the time labels should be shown, given as a pandas offset alias (see https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases). Defaults to None.

  • cmap (-) – The colours of the plot. If a string is passed, this will be used to colour all of the stacked bars. If a cmap is passed, then this is used. If None, then matplotlib handles the colours. Defaults to None.

  • legend (-) – Whether to plot a legend. Defaults to True.

  • label_kwargs (-) – Keyword arguments passed to plt.text when drawing the time labels. Defaults to {}.

  • kwargs (-) – Any other keyword arguments are passed to plt.bar and can be used to change the bar attributes.

Returns:

- out – The axes containing the plot.

Return type:

plt.axes
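Conceptually, each timestamp is assigned to a time-of-day bin of width freq, and each bin is placed at an angle around the clock face. The plain-Python sketch below assumes a fixed bin width in minutes (for example 30 for '30T') and a clock face that starts at midnight; the helper `clock_bins` is hypothetical and is not avt's implementation.

```python
import math
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple


def clock_bins(
    timestamps: Iterable[datetime], bin_minutes: int = 30
) -> Dict[int, Tuple[float, int]]:
    """Hypothetical helper: map each time-of-day bin to (angle, count).

    The date is ignored, so events from different days share a bin.
    """
    n_bins = 24 * 60 // bin_minutes
    counts = Counter(
        (ts.hour * 60 + ts.minute) // bin_minutes for ts in timestamps
    )
    # Bin i starts at 2*pi*i/n_bins radians, measured from midnight.
    return {i: (2 * math.pi * i / n_bins, counts[i]) for i in range(n_bins)}
```

The resulting angles and counts are the kind of input a bar chart on a polar axes (for example one created with `plt.subplot(projection='polar')`) would need.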

avt.parallelplot(data: DataFrame | None = None, x: str | None = None, y: str | None = None, hue: str | None = None, units: str | None = None, estimator: Callable = np.mean, hue_order: List[str] | None = None, order: List[str] | None = None, bezier: bool = True, cmap: str | Colormap | None = None, cbar: bool = False, cbar_x: float = 1.1, legend: bool = True, tick_top: bool = True, cbar_kwargs: Dict[str, Any] = {}, legend_kwargs: Dict[str, Any] = {}, ax=None, **kwargs) → Axes

Draws a parallel coordinates plot.

This is adapted from https://stackoverflow.com/a/60401570/19451559.

Examples

You can build plots like:

[Figure: example parallel plot]
Parameters:
  • data (-) – The data to plot. Must be in long format. Defaults to None.

  • x (-) – The name of the column in data that contains the categories to be plotted along the x axis. Defaults to None.

  • y (-) – The name of the column in data that contains the values to be plotted for each category. Defaults to None.

  • hue (-) – The name of the column in data that contains the categories to be used to color the lines. Defaults to None.

  • units (-) – The name of the column in data that contains the categories to be used to distinguish between lines that have the same hue value. If not None, then no estimation will be performed. Defaults to None.

  • estimator (-) – The function to use to estimate the value of y for each category. This is only used if units is None. Defaults to np.mean.

  • hue_order (-) – The order in which to plot the hue categories. Defaults to None.

  • bezier (-) – Whether to use bezier curves to connect the points. Defaults to True.

  • cmap (-) – The colormap to use to color the lines. Defaults to None.

  • cbar (-) – Whether to add a colorbar. Defaults to False.

  • cbar_x (-) – The x position of the colorbar. Defaults to 1.1.

  • legend (-) – Whether to add a legend. Defaults to True.

  • tick_top (-) – Whether to put the ticks on the top of the plot. If False, then the ticks will be on the bottom. Defaults to True.

  • cbar_kwargs (-) – Additional keyword arguments to pass to plt.colorbar. Defaults to {}.

  • legend_kwargs (-) – Additional keyword arguments to pass to plt.legend. Defaults to {}.

  • ax (-) – The axes on which to plot. Defaults to None.

  • **kwargs (-) –

    Additional keyword arguments to pass to plt.plot or patches.PathPatch.

  • order (-) – The order in which to plot the x categories. Defaults to None.

Returns:

- axes – The axes on which the plot was made.

Return type:

plt.Axes
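The interaction between units and estimator can be sketched as follows. With units=None, the y values for each (hue, x) pair are aggregated with estimator, giving one line per hue level. Otherwise each (hue, unit) pair keeps its own values without estimation. This plain-Python sketch uses a list of dicts as a stand-in for a long-format dataframe; the helper `parallel_lines` is hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def parallel_lines(
    rows: Sequence[dict],
    x: str,
    y: str,
    hue: str,
    units: Optional[str] = None,
    estimator: Callable[[List[float]], float] = mean,
) -> Dict[Tuple, Dict[str, float]]:
    """Hypothetical helper: one {x category: y value} mapping per line."""
    grouped: Dict[Tuple, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for row in rows:
        # Without units, every row sharing a hue value feeds the same line.
        key = (row[hue],) if units is None else (row[hue], row[units])
        grouped[key][row[x]].append(row[y])
    lines: Dict[Tuple, Dict[str, float]] = {}
    for key, per_x in grouped.items():
        if units is None:
            lines[key] = {cat: estimator(vals) for cat, vals in per_x.items()}
        else:
            # With units there is no estimation: each (hue, unit) pair is
            # assumed to have a single value per x category.
            lines[key] = {cat: vals[0] for cat, vals in per_x.items()}
    return lines
```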

avt.radarplot(data: DataFrame | None = None, x: str | None = None, y: str | None = None, hue: str | None = None, order: List[str] | None = None, hue_order: List[str] | None = None, estimator: Callable = np.mean, fill: bool = True, cmap: str | List[str] | None = None, legend: bool = True, ax: Axes | None = None, **kwargs) → Axes

This function allows you to create a radar plot from a dataframe.

Examples

You can build plots like:

[Figure: Radar Plot]

To see the code producing this plot, view the examples ipynb file.

Parameters:
  • data (-) – The dataframe containing the data to plot. Defaults to None.

  • x (-) – The name of the column containing the categories to plot. Defaults to None.

  • y (-) – The name of the column containing the values to plot. Defaults to None.

  • hue (-) – The name of the column containing the hues to plot. Defaults to None.

  • order (-) – The order of the categories to plot. Defaults to None.

  • hue_order (-) – The order of the hues to plot. Defaults to None.

  • estimator (-) – The function to use to aggregate the values when plotting. Defaults to np.mean.

  • fill (-) – Whether to fill the area under the curve. Defaults to True.

  • cmap (-) – The name of the colormap to use or a list of colours to use. Defaults to None.

  • legend (-) – Whether to show the legend. Defaults to True.

  • ax (-) – The axes to plot on. Defaults to None.

  • **kwargs (-) –

    Additional keyword arguments to pass to plt.plot.

Returns:

- ax – The axes containing the plot.

Return type:

plt.Axes
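A radar plot places each category on its own spoke. The plain-Python sketch below shows the general technique, not avt's code: it computes evenly spaced spoke angles and repeats the first point so that the line closes into a polygon. The helper `radar_polygon` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def radar_polygon(values: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: angles and radii for one closed radar line."""
    n = len(values)
    angles = [2 * math.pi * i / n for i in range(n)]
    # Repeat the first point so that the plotted line returns to its start.
    return angles + angles[:1], list(values) + list(values[:1])
```

On a polar axes, `ax.plot(angles, radii)` draws the outline and `ax.fill(angles, radii, alpha=0.25)` shades the enclosed area, which is the kind of effect the fill parameter controls.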

avt.save_fig(fig: figure, file_name: str, **kwargs) → None

This function saves a pdf, png, and svg of the figure, with bbox_inches='tight' and dpi=300.

Parameters:
  • fig (-) – The figure to save.

  • file_name (-) – The file name, including path, to save the figure at. This should not include the extension, which will be added when each file is saved.

Return type:

None

avt.scatter3dplot(data: DataFrame, x: str | None = None, y: str | None = None, z: str | None = None, hue: str | None = None, style: str | None = None, size: str | int = 25, ax: None | axes = None, hue_order: None | List[str] = None, style_order: None | List[str] = None, cmap: Colormap | str | None = 'RdBu_r', legend: bool = True, **kwargs) → axes

This function plots a 3D scatterplot.

Note: This currently only supports categorical hue values. Also ensure that ax is a 3D projection axes, for example one created with fig.add_subplot(projection='3d').

Examples

>>> import avt
>>> import numpy as np
>>> from sklearn.datasets import load_iris
>>> data_dict = load_iris(as_frame=True)
>>> data, target = data_dict['data'], data_dict['target']
>>> data['target'] = target
>>> data['random group'] = np.random.choice(2, size=len(data))
>>> ax = avt.scatter3dplot(
...     data=data,
...     x='sepal length (cm)',
...     y='sepal width (cm)',
...     z='petal length (cm)',
...     hue='target',
...     size='petal width (cm)',
...     style='random group',
... )

This will return the plot:

[Figure: example 3D scatter plot]

Alternatively, you can use an animation to better visualise the distribution:

>>> import avt
>>> import numpy as np
>>> from sklearn.datasets import load_iris
>>> from matplotlib import animation
>>> data_dict = load_iris(as_frame=True)
>>> data, target = data_dict['data'], data_dict['target']
>>> data['target'] = target
>>> data['random group'] = np.random.choice(2, size=len(data))
>>> ax = avt.scatter3dplot(
...     data=data,
...     x='sepal length (cm)',
...     y='sepal width (cm)',
...     z='petal length (cm)',
...     hue='target',
...     size='petal width (cm)',
...     style='random group',
... )
>>> ani = animation.FuncAnimation(
...     ax.figure,
...     lambda x: ax.view_init(30, x),
...     frames=np.linspace(1, 360, 90),
...     interval=1,
...     blit=False,
... )
>>> ax.figure.tight_layout()

This will return the plot:

[Figure: animated 3D scatter plot]
Parameters:
  • data (-) – The dataframe containing the data to plot.

  • x (-) – The column name containing the x values. Defaults to None.

  • y (-) – The column containing the y values. Defaults to None.

  • z (-) – The column containing the z values. Defaults to None.

  • hue (-) – Semantic variable that is mapped to determine the color of plot elements. Defaults to None.

  • style (-) – Semantic variable that is mapped to determine the shape of plot elements. Defaults to None.

  • size (-) – Semantic variable that is mapped to determine the size of the plot elements. If a str, the sizes are determined by the values in that column. If an int, it is used as the size of every point. Defaults to 25.

  • ax (-) – A matplotlib axes that the plot can be drawn on. Defaults to None.

  • hue_order (-) – The order of the hue and legend. Defaults to None.

  • style_order (-) – The order of the style and legend. Defaults to None.

  • cmap (-) – The colours of the plot, given as a colormap or the name of one (such as the default 'RdBu_r'). If None, then matplotlib handles the colours. Defaults to 'RdBu_r'.

  • legend (-) – Whether to plot a legend. Defaults to True.

  • kwargs (-) – Any other keyword arguments are passed to plt.scatter and can be used to change the marker attributes.

Returns:

- out – The axes containing the plot.

Return type:

plt.axes
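Because only categorical hue values are supported, colours (and, for style, markers) can be assigned by position in hue_order or style_order. The plain-Python sketch below shows one way to build those lookups. It assumes that levels default to their order of first appearance and that choices are cycled when there are more levels than choices; the helper `categorical_mapping` is hypothetical.

```python
from typing import Dict, Hashable, Optional, Sequence


def categorical_mapping(
    values: Sequence[Hashable],
    choices: Sequence[object],
    order: Optional[Sequence[Hashable]] = None,
) -> Dict[Hashable, object]:
    """Hypothetical helper: assign a colour or marker to each category level."""
    if order is None:
        # Default: levels in order of first appearance.
        order = list(dict.fromkeys(values))
    # Cycle through the choices if there are more levels than choices.
    return {level: choices[i % len(choices)] for i, level in enumerate(order)}
```

Each style group could then be drawn with a call such as `ax.scatter(xs, ys, zs, c=colours, marker=marker, s=sizes)` on an axes created with `fig.add_subplot(projection='3d')`.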

avt.set_colour_map(colours: list = ['#332288', '#88CCEE', '#44AA99', '#117733', '#999933', '#DDCC77', '#CC6677', '#882255', '#AA4499'])

Sets the default colour map for all plots.

Examples

The following sets the colourmap to tol_muted:

>>> avt.set_colour_map(colours=avt.tol_muted)
Parameters:

colours (-) – A list of colours, in a format accepted by cycler.cycler. Defaults to tol_muted.

avt.stackplot(data: DataFrame, x: str | None = None, y: str | None = None, hue: str | None = None, ax: None | axes = None, hue_order: None | List[str] = None, cmap: Colormap | str | None = None, legend: bool = True, cumulative: bool = False, **kwargs)

This function plots a stacked continuous graph. Missing values are interpolated so that every hue level has a value at every x value.

Examples

>>> import avt
>>> import seaborn as sns
>>> import pandas as pd
>>> flights = sns.load_dataset('flights')
>>> ax = avt.stackplot(flights, x='year', y='passengers', hue='month', cmap='Blues')

This will return the plot:

[Figure: Stack Plot Example]
Parameters:
  • data (-) – The dataframe containing the data to plot.

  • x (-) – The column name containing the x values. Defaults to None.

  • y (-) – The column containing the heights. Defaults to None.

  • hue (-) – Semantic variable that is mapped to determine the color of plot elements. This will determine the stacked areas. Defaults to None.

  • ax (-) – A matplotlib axes that the plot can be drawn on. Defaults to None.

  • hue_order (-) – The order of the hue and stacked areas. Defaults to None.

  • cmap (-) – The colours of the plot. If a string is passed, this will be used to colour all of the stacked areas. If a cmap is passed, then this is used. If None, then matplotlib handles the colours. Defaults to None.

  • legend (-) – Whether to plot a legend. Defaults to True.

  • cumulative (-) – If True, then the cumulative values will be plotted, rather than the raw values. Defaults to False.

  • kwargs (-) – Any other keyword arguments are passed to plt.fill_between and can be used to change the area attributes.

Returns:

- out – The axes containing the plot.

Return type:

plt.axes
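The interpolation and stacking can be sketched in plain Python. This assumes linear interpolation between neighbouring known points, with the first and last values of each hue level already known, and that cumulative=True takes a running total along x before stacking. The helpers `interpolate` and `stack` are hypothetical; avt may handle edges and ordering differently.

```python
from itertools import accumulate
from typing import List, Optional, Sequence


def interpolate(xs: Sequence[float], ys: Sequence[Optional[float]]) -> List[float]:
    """Hypothetical helper: fill None values by linear interpolation.

    Assumes the first and last values are known.
    """
    known = [i for i, y in enumerate(ys) if y is not None]
    out = list(ys)
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            t = (xs[i] - xs[left]) / (xs[right] - xs[left])
            out[i] = ys[left] + t * (ys[right] - ys[left])
    return out


def stack(series: Sequence[Sequence[float]], cumulative: bool = False) -> List[List[float]]:
    """Hypothetical helper: upper edge of each stacked layer at every x."""
    if cumulative:
        # Running total along x for each hue level.
        series = [list(accumulate(s)) for s in series]
    tops: List[List[float]] = []
    running = [0.0] * len(series[0])
    for s in series:
        running = [r + v for r, v in zip(running, s)]
        tops.append(running)
    return tops
```

Each layer is then the region between the previous layer's top (or zero) and its own top, which is what `plt.fill_between` draws.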

avt.temp_colour_map(colours=['#332288', '#88CCEE', '#44AA99', '#117733', '#999933', '#DDCC77', '#CC6677', '#882255', '#AA4499'])

Temporarily sets the default colour map for all plots.

Examples

The following sets the colourmap to tol_muted for the plotting done within the context:

>>> with avt.temp_colour_map(colours=avt.tol_muted):
...     plt.plot(x, y)
Parameters:

colours (-) – A list of colours, in a format accepted by cycler.cycler. Defaults to tol_muted.
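temp_colour_map is used as a context manager that restores the previous colours when the block exits. The sketch below shows that save-and-restore pattern with contextlib. It uses a plain dict as a hypothetical stand-in for matplotlib's global settings and is not avt's implementation. With matplotlib itself, `matplotlib.rc_context({'axes.prop_cycle': cycler(color=colours)})` gives a similar temporary effect.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List

# Hypothetical stand-in for a global plotting configuration.
settings: Dict[str, List[str]] = {"colours": ["#1f77b4", "#ff7f0e"]}


@contextmanager
def temporary_colours(colours: List[str]) -> Iterator[None]:
    """Hypothetical sketch: set colours for the block, then restore them."""
    previous = settings["colours"]
    settings["colours"] = list(colours)
    try:
        yield
    finally:
        # Restore even if the plotting code inside the block raises.
        settings["colours"] = previous
```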

avt.timefreqheatmap(data: DataFrame, x: str | None = None, y: str | None = None, hue: str | None = None, ax: None | axes = None, hue_order: None | List[str] = None, freq: str = '30T', label_format: bool | str = True, cmap: List[str] | str | None = None, binary: bool = False, **kwargs)

This function plots a heatmap showing the number of data points in each time bin over time.

Examples

>>> ax = avt.timefreqheatmap(
...     data,
...     x='datetime',
...     hue='group',
...     freq='1H',
...     label_format='%H:%M-%d/%b/%Y',
...     cmap='Blues',
...     ax=ax,
... )

This will return the plot:

[Figure: example time-frequency heatmap]
Parameters:
  • data (-) – The dataframe containing the data to plot.

  • x (-) – The column name containing the datetimes to use for calculating the time bins. Defaults to None.

  • y (-) – Ignored. Defaults to None.

  • hue (-) – Semantic variable that is mapped to determine the color of plot elements. This will determine the different rows of the heatmap. Defaults to None.

  • ax (-) – A matplotlib axes that the plot can be drawn on. Defaults to None.

  • hue_order (-) – The order of the hue and rows. Defaults to None.

  • freq (-) – The frequency to bin the columns of the heatmap at. https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases. Defaults to '30T'.

  • label_format (-) – The format of the time labels. Any argument to dt.strftime is acceptable. https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior. True leaves the labels as default and False removes them. Defaults to True.

  • cmap (-) – This is the cmap for plotting the colours on the heatmap. If a str, then this should be an acceptable argument to sns.mpl_palette. If None, this heatmap defaults to 'inferno'. Defaults to None.

  • binary (-) – If True, plot whether each bin contains any values, rather than the number of recorded values. Defaults to False.

  • kwargs (-) – Any other keyword arguments are passed to sns.heatmap and can be used to change the heatmap attributes.

Returns:

- out – The axes containing the plot.

Return type:

plt.axes
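The counts behind the heatmap cells can be sketched as a count per (hue, time bin) pair, where each timestamp is floored to the start of its bin. With binary=True, each non-empty bin becomes 1. This plain-Python sketch assumes a fixed bin width in minutes, counted from midnight; the helpers `floor_time` and `time_bin_counts` are hypothetical. Bins with no records are simply absent here, whereas a heatmap would show them as zero.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple


def floor_time(ts: datetime, bin_minutes: int) -> datetime:
    """Hypothetical helper: floor a timestamp to the start of its bin."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    minutes = (ts - midnight) // timedelta(minutes=bin_minutes) * bin_minutes
    return midnight + timedelta(minutes=minutes)


def time_bin_counts(
    records: Iterable[Tuple[str, datetime]],
    bin_minutes: int = 30,
    binary: bool = False,
) -> Dict[Tuple[str, datetime], int]:
    """Hypothetical helper: number of records per (hue, bin start)."""
    counts = Counter((hue, floor_time(ts, bin_minutes)) for hue, ts in records)
    if binary:
        # Only record whether the bin contains any values.
        return {key: 1 for key in counts}
    return dict(counts)
```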

avt.waterfallplot(data: DataFrame | None = None, x: str | None = None, y: str | None = None, hue: str | None = None, order: List[str] | None = None, base: float = 0, orient: str = 'h', estimator: str | Callable = 'sum', cmap: str | None = None, alpha: float = 0.75, positive_colour: str = '#648fff', negative_colour: str = '#fe6100', width: float = 0.8, bar_label: bool = True, ax: Axes | None = None, arrow_kwargs: Dict[str, Any] = {}, bar_kwargs: Dict[str, Any] = {}, bar_label_kwargs: Dict[str, Any] = {})

This function allows you to draw a waterfall plot with a dataframe.

Examples

You can build plots like:

[Figure: Waterfall plot]

To see the code producing this plot, view the examples ipynb file.

Parameters:
  • data (-) – The dataframe containing the data to plot. It must have at least two columns, one for the x-axis and one for the y-axis. Defaults to None.

  • x (-) – The name of the column in the dataframe that will be used for the x-axis. Defaults to None.

  • y (-) – The name of the column in the dataframe that will be used for the y-axis. Defaults to None.

  • hue (-) – The name of the column in the dataframe that will be used to colour the bars. This is not currently implemented. Defaults to None.

  • order (-) – The order in which the bars should be plotted. This is used for the categorical axis. Defaults to None.

  • base (-) – The value at which the waterfall starts. A line is drawn at this value and the first waterfall bar is drawn from it. Defaults to 0.

  • orient (-) – Whether to plot the waterfall horizontally or vertically. Options are 'h' or 'v'. Defaults to 'h'.

  • estimator (-) – The statistical function to use to aggregate the values. Defaults to 'sum'.

  • alpha (-) – The alpha value to use for the arrows. Defaults to 0.75.

  • cmap (-) – The name of the colourmap to use for the bars. If not provided, the bars will be coloured based on whether the value is positive or negative. Defaults to None.

  • positive_colour (-) – The colour to use for positive values. Defaults to '#648fff'.

  • negative_colour (-) – The colour to use for negative values. Defaults to '#fe6100'.

  • width (-) – The width of the bars. Defaults to 0.8.

  • bar_label (-) – Whether to add labels to the bars. Defaults to True.

  • ax (-) – The axes to plot on. If not provided, a new figure and axes will be created. Defaults to None.

  • arrow_kwargs (-) – A dictionary of keyword arguments to pass to the matplotlib.axes.Axes.arrow function. Defaults to {}.

  • bar_kwargs (-) – A dictionary of keyword arguments to pass to the matplotlib.axes.Axes.bar function that the arrows are drawn upon. Defaults to {}.

  • bar_label_kwargs (-) – A dictionary of keyword arguments to pass to the matplotlib.axes.Axes.bar_label function that is overlaid on the arrows. Defaults to {}.
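Each waterfall bar starts where the previous one ended, beginning at base. The plain-Python sketch below computes those start and end positions and picks positive_colour or negative_colour for each bar. It assumes the values have already been aggregated with estimator and sorted by order, and it treats a zero change as positive; the helper `waterfall_segments` is hypothetical.

```python
from typing import List, Sequence, Tuple


def waterfall_segments(
    values: Sequence[float],
    base: float = 0,
    positive_colour: str = "#648fff",
    negative_colour: str = "#fe6100",
) -> List[Tuple[float, float, str]]:
    """Hypothetical helper: (start, end, colour) for each waterfall bar."""
    segments = []
    start = base
    for value in values:
        end = start + value
        colour = positive_colour if value >= 0 else negative_colour
        segments.append((start, end, colour))
        # The next bar begins where this one finished.
        start = end
    return segments
```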