avt
Some graphing code that I’ve found helpful.
- class avt.ReliabilityDisplay
Class to plot a reliability diagram from predictions.
Examples
This can be used to plot the reliability diagram from predictions made by a classifier, such as:
>>> y_prob = clf.predict_proba(X)
>>> ax = avt.ReliabilityDisplay.from_predictions(
...     y,
...     y_prob,
... )
Or from the classifier directly:
>>> ax = avt.ReliabilityDisplay.from_estimator(
...     clf, X, y
... )
- from_estimator(estimator, X, y, *args, **kwargs) → ndarray
Function to plot the reliability figure from an estimator.
- Parameters:
estimator (-) – The estimator to use to make predictions.
X (-) – The data to use to make predictions.
y (-) – The true labels.
*args (-) – Positional arguments to pass to the from_predictions method.
**kwargs (-) – Keyword arguments to pass to the from_predictions method.
- Returns:
- axes – The axes of the plots.
- Return type:
numpy.ndarray:
- from_predictions(y_true, probas_pred: ndarray, n_bins: int = 10, legend: bool = True, accuracy_color='xkcd:baby blue', gap_color='xkcd:rose', hist_color='xkcd:lilac', diagonal_color='xkcd:dark red', accuracy_line_color='black', confidence_line_color='black', bar_kwargs: dict = {}, histogram_kwargs: dict = {}, line_kwargs: dict = {}, label_box_kwargs: dict = {}, label_ys: Tuple[float, float] = (0.8, 0.6), label_offset: float = -0.01, ax1: Axes | None = None, ax2: Axes | None = None) → ndarray
Function to plot the figure from the predictions.
- Parameters:
y_true (-) – True labels. This vector should only contain integer class indices that correspond to the columns (second dimension) of probas_pred.
probas_pred (-) – Predicted probabilities, as returned by a classifier’s predict_proba method. This array should have the shape (n_samples, n_classes).
n_bins (-) – The number of bins to use when calculating the expected calibration error (ECE). Defaults to 10.
legend (-) – Whether to plot the legend. Defaults to True.
accuracy_color (-) – The colour to use for the accuracy bars. Defaults to xkcd:baby blue.
gap_color (-) – The colour to use for the gap bars. Defaults to xkcd:rose.
hist_color (-) – The colour to use for the histogram. Defaults to xkcd:lilac.
diagonal_color (-) – The colour to use for the diagonal line. Defaults to xkcd:dark red.
accuracy_line_color (-) – The colour to use for the accuracy line. Defaults to black.
confidence_line_color (-) – The colour to use for the confidence line. Defaults to black.
bar_kwargs (-) – Keyword arguments to pass to the bar plot. Defaults to {}.
histogram_kwargs (-) – Keyword arguments to pass to the histogram plot. Defaults to {}.
line_kwargs (-) – Keyword arguments to pass to the line plot. Defaults to {}.
label_box_kwargs (-) – Keyword arguments to pass to the label bbox in the histogram plot. Defaults to {}.
label_ys (-) – The y coordinates of the labels in the histogram plot. Defaults to (0.8, 0.6).
label_offset (-) – The offset of the labels in the histogram plot. Defaults to -0.01.
ax1 (-) – The axes to plot the reliability diagram on. Defaults to None.
ax2 (-) – The axes to plot the histogram on. Defaults to None.
- Returns:
- axes – The axes of the plots.
- Return type:
numpy.ndarray:
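The quantities behind a reliability diagram can be sketched in plain Python. This is a minimal illustration, not avt's implementation: it assumes equal-width confidence bins over [0, 1], takes the largest predicted probability as the confidence, and weights each bin's accuracy/confidence gap by its share of the samples to obtain the ECE. The function name reliability_bins is hypothetical.

```python
def reliability_bins(y_true, probas_pred, n_bins=10):
    """Return per-bin (count, accuracy, mean confidence) and the ECE.

    Assumption: equal-width bins over [0, 1]; not taken from avt's source.
    """
    n_samples = len(y_true)
    counts = [0] * n_bins
    correct = [0] * n_bins
    confidence_sums = [0.0] * n_bins

    for label, probs in zip(y_true, probas_pred):
        confidence = max(probs)              # probability of the predicted class
        predicted = probs.index(confidence)  # index of the predicted class
        # A confidence of exactly 1.0 is placed in the last bin.
        b = min(int(confidence * n_bins), n_bins - 1)
        counts[b] += 1
        correct[b] += int(predicted == label)
        confidence_sums[b] += confidence

    bins, ece = [], 0.0
    for count, hits, conf_sum in zip(counts, correct, confidence_sums):
        if count == 0:
            bins.append((0, None, None))  # empty bin: nothing to plot
            continue
        accuracy = hits / count
        mean_confidence = conf_sum / count
        bins.append((count, accuracy, mean_confidence))
        # The gap of each bin is weighted by the fraction of samples in it.
        ece += (count / n_samples) * abs(accuracy - mean_confidence)
    return bins, ece


y = [0, 1, 1, 0]
y_prob = [[0.95, 0.05], [0.15, 0.85], [0.65, 0.35], [0.75, 0.25]]
bins, ece = reliability_bins(y, y_prob, n_bins=10)
print(f"ECE = {ece:.3f}")
```

In a reliability diagram, the accuracy of each bin is drawn as a bar and the difference to the diagonal is the gap, which is what accuracy_color and gap_color refer to.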
- avt.bar_labels(obj: figure, labels: None | List[str] = None, label_format: str = '{height}', **kwargs) → figure
Adds labels to bars in graph.
Examples
The following will add label heights and format them as a percentage:
>>> g = sns.catplot(...)
>>> avt.bar_labels(g.figure, label_format='{height:.1%}')
The following will add the labels 'bar1' and 'bar2' to the bars in the first axes and 'bar3' and 'bar4' to the bars in the second axes.
>>> g = sns.catplot(...)
>>> labels = [
...     ['bar1', 'bar2'],  # first axes
...     ['bar3', 'bar4'],  # second axes
... ]
>>> avt.bar_labels(g.figure, labels=labels)
- Parameters:
obj (-) – The matplotlib figure or axes to add bar labels to.
labels (-) – Labels to add to the bars. This should be a list of lists, in which the outer list acts over the axes in the plot and the inner list is the labels for the bars. If None, then the bar labels will be the heights of the bars. Defaults to None.
label_format (-) – The format of the bar labels, used only if labels=None. This needs to contain the word height, in {}, where the bar heights will be placed. Defaults to '{height}'.
kwargs (-) – Keyword arguments passed to plt.bar_label. Examples could be rotation, padding, label_type, and fontsize.
- Returns:
- out – Matplotlib figure, containing the added bar labels.
- Return type:
plt.figure:
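To see what a given label_format produces before plotting, the format string can be applied to a few heights directly. This sketch assumes the labels are built with Python's str.format and a height keyword, which matches the description of label_format above but has not been checked against avt's source.

```python
# Bar heights as fractions, e.g. proportions returned by an estimator.
heights = [0.256, 0.5, 0.031]

# '{height:.1%}' multiplies by 100 and keeps one decimal place.
label_format = '{height:.1%}'
labels = [label_format.format(height=h) for h in heights]
print(labels)  # ['25.6%', '50.0%', '3.1%']
```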
- avt.boxplot(*args, **kwargs)
This is a wrapper for the seaborn boxplot function that includes some default formatting.
By default, all lines will have width 2 and be black, and the boxplot width is 0.75.
- avt.cfmplot(cfm: ndarray, cbar: bool = True, color: str = 'Blues', xlabel: bool = True, ylabel: bool = True, summary_statistics: bool = True, ax: None | axes = None, annot_count=True, annot_percentage=True, **kwargs) → axes
Draw a confusion matrix plot from a numpy array.
This function was based on code from https://github.com/DTrimarchi10/confusion_matrix/blob/master/cfm_matrix.py.
Examples
When using this function to plot a multi-class confusion matrix, we will get something similar to the following:
>>> cfm = np.array(
...     [[10,  0,  0],
...      [ 0, 12,  0],
...      [ 0,  1, 15]],
...     dtype=np.int64,
... )
>>> avt.cfmplot(cfm)
If we have a binary task, the summary statistics are more extensive:
>>> cfm = np.array(
...     [[15,  0],
...      [ 0, 10]],
...     dtype=np.int64,
... )
>>> avt.cfmplot(cfm)
- Parameters:
cfm (-) – A numpy array representing the confusion matrix.
cbar (-) – Whether to add a colour bar. Defaults to True.
color (-) – The cmap that can be used for colors. Defaults to 'Blues'.
xlabel (-) – Whether to include a label on the x axis. Defaults to True.
ylabel (-) – Whether to include a label on the y axis. Defaults to True.
summary_statistics (-) – Whether to add summary statistics to the label on the x axis. Defaults to True.
ax (-) – Axes in which to draw the plot, otherwise use the currently-active Axes. Defaults to None.
annot_count (-) – Add the counts to the heatmap. Defaults to True.
annot_percentage (-) – Add the percentages to the heatmap. Defaults to True.
kwargs (-) – All other keyword arguments are passed to sns.heatmap.
- Returns:
- out – Axes object with the confusion matrix.
- Return type:
plt.axes:
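As an illustration of the kind of summary statistics that can be derived from a binary confusion matrix, the sketch below computes accuracy, precision, recall and F1. Which statistics avt actually prints is not stated above, so this selection is an assumption, as is the layout (rows are true labels, columns are predicted labels, class 1 is the positive class). The helper binary_summary is hypothetical.

```python
def binary_summary(cfm):
    """Accuracy, precision, recall and F1 for a 2x2 confusion matrix.

    Assumed layout: cfm[true][predicted], with class 1 as the positive class.
    """
    (tn, fp), (fn, tp) = cfm
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


stats = binary_summary([[15, 2], [3, 10]])
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```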
- avt.clockplot(data: DataFrame, x: str | None = None, y: str | None = None, hue: str | None = None, ax: None | axes = None, hue_order: None | List[str] = None, freq: str = '30T', label_format: bool | str = True, label_freq: str | None = None, cmap: Colormap | str | None = None, legend: bool = True, label_kwargs: Dict[str, Any] = {}, **kwargs)
This function plots a circular graph representing a day, with bars representing frequencies.
Examples
>>> ax = avt.clockplot(
...     data,
...     x='datetime',
...     hue='group',
...     label_format='%H:%M',
...     label_freq='3H',
... )
This will return the plot:
- Parameters:
data (-) – The data.
x (-) – The column name containing the datetimes to use for calculating the time bins. Defaults to None.
y (-) – Ignored. Defaults to None.
hue (-) – Semantic variable that is mapped to determine the color of plot elements. This will determine the stacked bars. Defaults to None.
ax (-) – A matplotlib axes that the plot can be drawn on. Defaults to None.
hue_order (-) – The order of the hue and stacked bars. Defaults to None.
freq (-) – The frequency to bin the bars at. https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases. Defaults to '30T'.
label_format (-) – The format of the time labels. Any argument to dt.strftime is acceptable. https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior. Defaults to True.
label_freq (-) – How often the time labels should be shown. https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases. True leaves the labels as default and False removes them. Defaults to None.
cmap (-) – The colours of the plot. If a string is passed, this will be used to colour all of the stacked bars. If a cmap is passed, then this is used. If None, then matplotlib handles the colours. Defaults to None.
legend (-) – Whether to plot a legend. Defaults to True.
label_kwargs (-) – Keyword arguments to pass to the time labels. These are passed to plt.text. Defaults to {}.
kwargs (-) – Any other keyword arguments are passed to plt.bar. From here, you can change a variety of the bar attributes.
- Returns:
- out – The axes containing the plot.
- Return type:
plt.axes:
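The idea of binning datetimes onto a 24-hour clock face can be sketched with the standard library alone. This is only an illustration of the technique under stated assumptions: the date is discarded, bins are freq_minutes wide starting at midnight, and each bin is mapped to an evenly spaced angle. The helper names are hypothetical and avt's own binning (which uses pandas offset aliases such as '30T') may differ.

```python
import math
from collections import Counter
from datetime import datetime


def time_of_day_counts(datetimes, freq_minutes=30):
    """Count datetimes per time-of-day bin, ignoring the date."""
    n_bins = 24 * 60 // freq_minutes
    counts = Counter(
        (dt.hour * 60 + dt.minute) // freq_minutes for dt in datetimes
    )
    return [counts.get(i, 0) for i in range(n_bins)]


def bin_angles(n_bins):
    """Start angle (in radians) of each bin around a full circle."""
    return [2 * math.pi * i / n_bins for i in range(n_bins)]


stamps = [
    datetime(2023, 1, 1, 0, 10),
    datetime(2023, 1, 2, 0, 20),   # different day, same 00:00-00:30 bin
    datetime(2023, 1, 1, 12, 45),
]
counts = time_of_day_counts(stamps)
angles = bin_angles(len(counts))
print(len(counts), counts[0], counts[25])
```

On a polar axes, the counts would be the bar heights and the angles the bar positions.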
- avt.parallelplot(data: DataFrame | None = None, x: str | None = None, y: str | None = None, hue: str | None = None, units: str | None = None, estimator: Callable = np.mean, hue_order: List[str] | None = None, order: List[str] | None = None, bezier: bool = True, cmap: str | Colormap | None = None, cbar: bool = False, cbar_x: float = 1.1, legend: bool = True, tick_top: bool = True, cbar_kwargs: Dict[str, Any] = {}, legend_kwargs: Dict[str, Any] = {}, ax=None, **kwargs) → Axes
Plot a parallel plot.
This has been edited from https://stackoverflow.com/a/60401570/19451559
- Parameters:
data (-) – The data to plot. Must be in long format. Defaults to None.
x (-) – The name of the column in data that contains the categories to be plotted along the x axis. Defaults to None.
y (-) – The name of the column in data that contains the values to be plotted for each category. Defaults to None.
hue (-) – The name of the column in data that contains the categories to be used to color the lines. Defaults to None.
units (-) – The name of the column in data that contains the categories to be used to distinguish between lines that have the same hue value. If not None, then no estimation will be performed. Defaults to None.
estimator (-) – The function to use to estimate the value of y for each category. This is used only if units is None. Defaults to np.mean.
hue_order (-) – The order in which to plot the hue categories. Defaults to None.
bezier (-) – Whether to use bezier curves to connect the points. Defaults to True.
cmap (-) – The colormap to use to color the lines. Defaults to None.
cbar (-) – Whether to add a colorbar. Defaults to False.
cbar_x (-) – The x position of the colorbar. Defaults to 1.1.
legend (-) – Whether to add a legend. Defaults to True.
tick_top (-) – Whether to put the ticks on the top of the plot. If False, then the ticks will be on the bottom. Defaults to True.
cbar_kwargs (-) – Additional keyword arguments to pass to plt.colorbar. Defaults to {}.
legend_kwargs (-) – Additional keyword arguments to pass to plt.legend. Defaults to {}.
ax (-) – The axes on which to plot. Defaults to None.
**kwargs (-) – Additional keyword arguments to pass to plt.plot or patches.PathPatch.
- Returns:
- axes – The axes on which the plot was made.
- Return type:
plt.Axes:
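The long-format requirement and the role of estimator can be illustrated without pandas. The sketch below assumes that, when no units column is given, the y values are aggregated per hue value and x category, so that each hue value yields one line. That reading follows the parameter descriptions above but is not confirmed against avt's source. The column names and the helper aggregate_long are made up for the example.

```python
from collections import defaultdict
from statistics import mean


def aggregate_long(rows, x, y, hue, estimator=mean):
    """Collapse long-format rows to one y value per (hue, x) pair."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[hue], row[x])].append(row[y])

    lines = defaultdict(dict)
    for (hue_value, category), values in groups.items():
        lines[hue_value][category] = estimator(values)
    return {hue_value: dict(points) for hue_value, points in lines.items()}


# Long format: one row per observation, one column per variable.
rows = [
    {"model": "a", "metric": "accuracy", "score": 0.5},
    {"model": "a", "metric": "accuracy", "score": 1.0},
    {"model": "a", "metric": "recall", "score": 0.25},
    {"model": "b", "metric": "accuracy", "score": 0.75},
]
lines = aggregate_long(rows, x="metric", y="score", hue="model")
print(lines)
```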
- avt.radarplot(data: DataFrame | None = None, x: str | None = None, y: str | None = None, hue: str | None = None, order: List[str] | None = None, hue_order: List[str] | None = None, estimator: Callable = np.mean, fill: bool = True, cmap: str | List[str] | None = None, legend: bool = True, ax: Axes | None = None, **kwargs) → Axes
This function allows you to create a radar plot from a dataframe.
Examples
To see an example plot and the code producing it, view the examples ipynb file.
- Parameters:
data (-) – The dataframe containing the data to plot. Defaults to None.
x (-) – The name of the column containing the categories to plot. Defaults to None.
y (-) – The name of the column containing the values to plot. Defaults to None.
hue (-) – The name of the column containing the hues to plot. Defaults to None.
order (-) – The order of the categories to plot. Defaults to None.
hue_order (-) – The order of the hues to plot. Defaults to None.
estimator (-) – The function to use to aggregate the values when plotting. Defaults to np.mean.
fill (-) – Whether to fill the area under the curve. Defaults to True.
cmap (-) – The name of the colormap to use or a list of colours to use. Defaults to None.
legend (-) – Whether to show the legend. Defaults to True.
ax (-) – The axes to plot on. Defaults to None.
**kwargs (-) – Additional keyword arguments to pass to plt.plot.
- Returns:
- ax – The axes containing the plot.
- Return type:
plt.Axes:
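A radar plot places each category at an evenly spaced angle and joins the values into a closed polygon. The following sketch shows that coordinate calculation only. It is a common way to build radar plots rather than a description of avt's internals, and radar_coordinates is a hypothetical helper.

```python
import math


def radar_coordinates(values):
    """Return (angles, radii) for a closed radar polygon."""
    n = len(values)
    angles = [2 * math.pi * i / n for i in range(n)]
    # Repeat the first point at the end so the outline closes on itself.
    return angles + angles[:1], list(values) + list(values[:1])


angles, radii = radar_coordinates([3.0, 1.5, 4.0, 2.0])
print(len(angles), radii[-1])
```

The resulting angles and radii could then be passed to the plot and fill methods of a polar axes.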
- avt.save_fig(fig: figure, file_name: str, **kwargs) → None
This function saves a pdf, png, and svg of the figure, with bbox_inches='tight' and dpi=300.
- Parameters:
fig (-) – The figure to save.
file_name (-) – The file name, including path, to save the figure at. This should not include the extension, which will be added when each file is saved.
- Return type:
None
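The behaviour described above can be approximated as follows. This is a sketch, not avt's code: it assumes the extension is appended to file_name with a dot and that extra keyword arguments are forwarded to savefig and may override the defaults. RecordingFigure is a hypothetical stand-in so that the example runs without matplotlib. Any object with a matplotlib-style savefig method would work.

```python
def save_in_all_formats(fig, file_name, **kwargs):
    """Save `fig` as PDF, PNG and SVG and return the paths written."""
    # Assumption: user-supplied kwargs take precedence over the defaults.
    options = {"bbox_inches": "tight", "dpi": 300, **kwargs}
    paths = []
    for extension in ("pdf", "png", "svg"):
        path = f"{file_name}.{extension}"
        fig.savefig(path, **options)
        paths.append(path)
    return paths


class RecordingFigure:
    """Stand-in for a matplotlib figure that only records savefig calls."""

    def __init__(self):
        self.calls = []

    def savefig(self, path, **options):
        self.calls.append((path, options))


fig = RecordingFigure()
paths = save_in_all_formats(fig, "figures/example")
print(paths)
```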
- avt.scatter3dplot(data: DataFrame, x: str | None = None, y: str | None = None, z: str | None = None, hue: str | None = None, style: str | None = None, size: str | int = 25, ax: None | axes = None, hue_order: None | List[str] = None, style_order: None | List[str] = None, cmap: Colormap | str | None = 'RdBu_r', legend: bool = True, **kwargs) → axes
This function plots a 3D scatterplot.
Note: This currently only supports categorical hue values. Please also ensure that ax is a 3D projection axes.
Examples
>>> import avt
>>> import numpy as np
>>> from sklearn.datasets import load_iris
>>> data_dict = load_iris(as_frame=True)
>>> data, target = data_dict['data'], data_dict['target']
>>> data['target'] = target
>>> data['random group'] = np.random.choice(2, size=len(data))
>>> ax = avt.scatter3dplot(
...     data=data,
...     x='sepal length (cm)',
...     y='sepal width (cm)',
...     z='petal length (cm)',
...     hue='target',
...     size='petal width (cm)',
...     style='random group',
... )
This will return the plot:
Alternatively, you can use an animation to better visualise the distribution:
>>> import avt
>>> import numpy as np
>>> from sklearn.datasets import load_iris
>>> from matplotlib import animation
>>> data_dict = load_iris(as_frame=True)
>>> data, target = data_dict['data'], data_dict['target']
>>> data['target'] = target
>>> data['random group'] = np.random.choice(2, size=len(data))
>>> ax = avt.scatter3dplot(
...     data=data,
...     x='sepal length (cm)',
...     y='sepal width (cm)',
...     z='petal length (cm)',
...     hue='target',
...     size='petal width (cm)',
...     style='random group',
... )
>>> ani = animation.FuncAnimation(
...     ax.figure,
...     lambda x: ax.view_init(30, x),
...     frames=np.linspace(1, 360, 90),
...     interval=1,
...     blit=False,
... )
>>> ax.figure.tight_layout()
This will return the plot:
- Parameters:
data (-) – The data.
x (-) – The column name containing the x values. Defaults to None.
y (-) – The column containing the y values. Defaults to None.
z (-) – The column containing the z values. Defaults to None.
hue (-) – Semantic variable that is mapped to determine the color of plot elements. Defaults to None.
style (-) – Semantic variable that is mapped to determine the shape of plot elements. Defaults to None.
size (-) – Semantic variable that is mapped to determine the size of the plot elements. If str then the sizes are determined by the column values. If int then this will be used as the size. Defaults to 25.
ax (-) – A matplotlib axes that the plot can be drawn on. Defaults to None.
hue_order (-) – The order of the hue and legend. Defaults to None.
style_order (-) – The order of the style and legend. Defaults to None.
cmap (-) – The colours of the plot. If a string is passed, this will be used to colour all of the points. If a cmap is passed, then this is used. If None, then matplotlib handles the colours. Defaults to 'RdBu_r'.
legend (-) – Whether to plot a legend. Defaults to True.
kwargs (-) – Any other keyword arguments are passed to plt.scatter. From here, you can change a variety of the marker attributes.
- Returns:
- out – The axes containing the plot.
- Return type:
plt.axes:
- avt.set_colour_map(colours: list = ['#332288', '#88CCEE', '#44AA99', '#117733', '#999933', '#DDCC77', '#CC6677', '#882255', '#AA4499'])
Sets the default colour map for all plots.
Examples
The following sets the colour map to tol_muted:
>>> avt.set_colour_map(colours=avt.tol_muted)
- Parameters:
colours (-) – Format that is accepted by cycler.cycler. Defaults to tol_muted.
- avt.stackplot(data: DataFrame, x: str | None = None, y: str | None = None, hue: str | None = None, ax: None | axes = None, hue_order: None | List[str] = None, cmap: Colormap | str | None = None, legend: bool = True, cumulative: bool = False, **kwargs)
This function plots a stacked continuous graph. The missing values are interpolated for all x values.
Examples
>>> import avt
>>> import seaborn as sns
>>> import pandas as pd
>>> flights = sns.load_dataset('flights')
>>> ax = avt.stackplot(flights, x='year', y='passengers', hue='month', cmap='Blues')
This will return the plot:
- Parameters:
data (-) – The data.
x (-) – The column name containing the x values. Defaults to None.
y (-) – The column containing the heights. Defaults to None.
hue (-) – Semantic variable that is mapped to determine the color of plot elements. This will determine the stacked bars. Defaults to None.
ax (-) – A matplotlib axes that the plot can be drawn on. Defaults to None.
hue_order (-) – The order of the hue and stacked bars. Defaults to None.
cmap (-) – The colours of the plot. If a string is passed, this will be used to colour all of the stacked bars. If a cmap is passed, then this is used. If None, then matplotlib handles the colours. Defaults to None.
legend (-) – Whether to plot a legend. Defaults to True.
cumulative (-) – If True, then the cumulative values will be plotted, rather than the raw values. Defaults to False.
kwargs (-) – Any other keyword arguments are passed to plt.fill_between. From here, you can change a variety of the attributes of the filled areas.
- Returns:
- out – The axes containing the plot.
- Return type:
plt.axes:
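Two steps are implied by the description of this function: filling in missing values so that every series has a value at every x, and stacking the series on top of each other. The sketch below shows one way to do both in plain Python. It assumes linear interpolation and holds the nearest observation outside the observed range. The interpolation method avt uses is not stated above, and both helper names are hypothetical.

```python
def fill_missing(series, xs):
    """Linearly interpolate a {x: y} series at every x in `xs`."""
    known = sorted(series.items())
    filled = []
    for x in xs:
        if x in series:
            filled.append(series[x])
            continue
        left = [(kx, ky) for kx, ky in known if kx < x]
        right = [(kx, ky) for kx, ky in known if kx > x]
        if not left:       # before the first observation
            filled.append(right[0][1])
        elif not right:    # after the last observation
            filled.append(left[-1][1])
        else:
            (x0, y0), (x1, y1) = left[-1], right[0]
            filled.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return filled


def stack(layers):
    """Return the (lower, upper) boundary of each layer, stacked in order."""
    lower = [0.0] * len(layers[0])
    bands = []
    for layer in layers:
        upper = [lo + value for lo, value in zip(lower, layer)]
        bands.append((lower, upper))
        lower = upper
    return bands


xs = [1, 2, 3]
first = fill_missing({1: 1.0, 3: 3.0}, xs)           # value at x=2 is missing
second = fill_missing({1: 2.0, 2: 2.0, 3: 2.0}, xs)
bands = stack([first, second])
print(bands)
```

Each (lower, upper) pair corresponds to the two boundaries that a call to plt.fill_between would shade.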
- avt.temp_colour_map(colours=['#332288', '#88CCEE', '#44AA99', '#117733', '#999933', '#DDCC77', '#CC6677', '#882255', '#AA4499'])
Temporarily sets the default colour map for all plots.
Examples
The following sets the colour map to tol_muted for the plotting done within the context:
>>> with avt.temp_colour_map(colours=avt.tol_muted):
...     plt.plot(x, y)
- Parameters:
colours (-) – Format that is accepted by cycler.cycler. Defaults to tol_muted.
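The difference between setting a colour map and setting it temporarily is the restore step when the with block exits. The sketch below shows that pattern with contextlib. It is not avt's implementation: a plain dictionary stands in for matplotlib's rcParams (where the colour cycle lives under 'axes.prop_cycle'), and the names settings, set_cycle and temp_cycle are hypothetical.

```python
from contextlib import contextmanager

# The default palette shown in the signature above.
TOL_MUTED = ['#332288', '#88CCEE', '#44AA99', '#117733', '#999933',
             '#DDCC77', '#CC6677', '#882255', '#AA4499']

# Hypothetical stand-in for a global plotting configuration.
settings = {"colour_cycle": ["black"]}


def set_cycle(colours):
    """Permanently replace the colour cycle."""
    settings["colour_cycle"] = list(colours)


@contextmanager
def temp_cycle(colours):
    """Replace the colour cycle inside a with block, then restore it."""
    previous = settings["colour_cycle"]
    set_cycle(colours)
    try:
        yield
    finally:
        # Runs even if the body of the with block raises.
        settings["colour_cycle"] = previous


with temp_cycle(TOL_MUTED):
    inside = settings["colour_cycle"]
after = settings["colour_cycle"]
print(len(inside), after)
```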
- avt.timefreqheatmap(data: DataFrame, x: str | None = None, y: str | None = None, hue: str | None = None, ax: None | axes = None, hue_order: None | List[str] = None, freq: str = '30T', label_format: bool | str = True, cmap: List[str] | str | None = None, binary: bool = False, **kwargs)
This function plots a heatmap with the frequencies of data points, against the date.
Examples
>>> ax = avt.timefreqheatmap(
...     data,
...     x='datetime',
...     hue='group',
...     freq='1H',
...     label_format='%H:%M-%d/%b/%Y',
...     cmap='Blues',
...     ax=ax,
... )
This will return the plot:
- Parameters:
data (-) – The data.
x (-) – The column name containing the datetimes to use for calculating the time bins. Defaults to None.
y (-) – Ignored. Defaults to None.
hue (-) – Semantic variable that is mapped to determine the color of plot elements. This will determine the different rows of the heatmap. Defaults to None.
ax (-) – A matplotlib axes that the plot can be drawn on. Defaults to None.
hue_order (-) – The order of the hue and rows. Defaults to None.
freq (-) – The frequency to bin the columns of the heatmap at. https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases. Defaults to '30T'.
label_format (-) – The format of the time labels. Any argument to dt.strftime is acceptable. https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior. True leaves the labels as default and False removes them. Defaults to True.
cmap (-) – This is the cmap for plotting the colours on the heatmap. If a str, then this should be an acceptable argument to sns.mpl_palette. If None, this heatmap defaults to 'inferno'. Defaults to None.
binary (-) – Whether to plot if there was a value or not, rather than the number of recorded values. Defaults to False.
kwargs (-) – Any other keyword arguments are passed to sns.heatmap. From here, you can change a variety of the heatmap attributes.
- Returns:
- out – The axes containing the plot.
- Return type:
plt.axes:
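The cell values of such a heatmap are counts of records per group and time bin, or presence flags when binary=True. A minimal sketch of that calculation, assuming bins are aligned to midnight and records are (datetime, group) pairs, is shown below. The function bin_counts is hypothetical and avt's own binning is done with pandas frequencies.

```python
from collections import Counter
from datetime import datetime, timedelta


def bin_counts(records, freq=timedelta(minutes=30), binary=False):
    """Count (datetime, group) records per group and time bin."""
    origin = datetime(1970, 1, 1)  # bins are aligned to this midnight
    counts = Counter()
    for stamp, group in records:
        # Floor the timestamp to the start of the bin that contains it.
        bin_start = origin + ((stamp - origin) // freq) * freq
        counts[(group, bin_start)] += 1
    if binary:
        # Only record whether a bin has any value at all.
        return {key: 1 for key in counts}
    return dict(counts)


records = [
    (datetime(2023, 1, 1, 9, 5), "a"),
    (datetime(2023, 1, 1, 9, 25), "a"),
    (datetime(2023, 1, 1, 9, 40), "b"),
]
counts = bin_counts(records)
flags = bin_counts(records, binary=True)
print(counts)
```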
- avt.waterfallplot(data: DataFrame | None = None, x: str | None = None, y: str | None = None, hue: str | None = None, order: List[str] | None = None, base: float = 0, orient: str = 'h', estimator: str | Callable = 'sum', cmap: str | None = None, alpha: float = 0.75, positive_colour: str = '#648fff', negative_colour: str = '#fe6100', width: float = 0.8, bar_label: bool = True, ax: Axes | None = None, arrow_kwargs: Dict[str, Any] = {}, bar_kwargs: Dict[str, Any] = {}, bar_label_kwargs: Dict[str, Any] = {})
This function allows you to draw a waterfall plot with a dataframe.
Examples
To see an example plot and the code producing it, view the examples ipynb file.
- Parameters:
data (-) – The dataframe containing the data to plot. It must have at least two columns, one for the x-axis and one for the y-axis. Defaults to None.
x (-) – The name of the column in the dataframe that will be used for the x-axis. Defaults to None.
y (-) – The name of the column in the dataframe that will be used for the y-axis. Defaults to None.
hue (-) – The name of the column in the dataframe that will be used to colour the bars. This is not currently implemented. Defaults to None.
order (-) – The order in which the bars should be plotted. This is used for the categorical axis. Defaults to None.
base (-) – The value of the base of the waterfall plot. A line will be drawn here and the first waterfall bar will be drawn from it. Defaults to 0.
orient (-) – Whether to plot the waterfall horizontally or vertically. Options are 'h' or 'v'. Defaults to 'h'.
estimator (-) – The statistical function to use to aggregate the values. Defaults to 'sum'.
alpha (-) – The alpha value to use for the arrows. Defaults to 0.75.
cmap (-) – The name of the colourmap to use for the bars. If not provided, the bars will be coloured based on whether the value is positive or negative. Defaults to None.
positive_colour (-) – The colour to use for positive values. Defaults to '#648fff'.
negative_colour (-) – The colour to use for negative values. Defaults to '#fe6100'.
width (-) – The width of the bars. Defaults to 0.8.
bar_label (-) – Whether to add labels to the bars. Defaults to True.
ax (-) – The axes to plot on. If not provided, a new figure and axes will be created. Defaults to None.
arrow_kwargs (-) – A dictionary of keyword arguments to pass to the matplotlib.axes.Axes.arrow function. Defaults to {}.
bar_kwargs (-) – A dictionary of keyword arguments to pass to the matplotlib.axes.Axes.bar function that the arrows are drawn upon. Defaults to {}.
bar_label_kwargs (-) – A dictionary of keyword arguments to pass to the matplotlib.axes.Axes.bar_label function that is overlaid on the arrows. Defaults to {}.
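The geometry of a waterfall plot is a running total: each bar starts where the previous one ended, beginning at base. The sketch below computes those start and end points and picks a colour by sign, using the default colours from the signature above. It illustrates the general technique and is not taken from avt's source. Both function names are hypothetical.

```python
def waterfall_segments(values, base=0.0):
    """Return the (start, end) of each bar as a running total from `base`."""
    segments = []
    start = base
    for value in values:
        end = start + value
        segments.append((start, end))
        start = end  # the next bar begins where this one finished
    return segments


def bar_colours(values, positive_colour='#648fff', negative_colour='#fe6100'):
    """Pick a colour per bar depending on the sign of its value."""
    return [positive_colour if v >= 0 else negative_colour for v in values]


values = [10, -4, 3]
segments = waterfall_segments(values, base=2)
colours = bar_colours(values)
print(segments)  # [(2, 12), (12, 8), (8, 11)]
```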
