ezplot Tutorial

Welcome to the Plot-Hero package documentation! This package provides an easy-to-use function, plot_histogram, for creating beautiful visualizations of your data. Here, we’ll walk through the functionality with a fun, real-life example starring Alex, a budding data scientist on a quest to uncover insights in marketing data.

Alex’s Data Visualization Quest

Alex, an intern at a marketing firm, was tasked with analyzing campaign data to understand what works and what doesn’t. However, Alex knew that numbers alone weren’t enough—they needed visualizations to make their insights shine. That’s where Plot-Hero came to the rescue with the mighty plot_histogram function!

Step 1: Setup

Alex began their journey by loading the marketing dataset, which contained details about campaign types, success rates, and even the channels used to target specific audiences.

import dsci_524_ezplot

print(dsci_524_ezplot.__version__)
0.0.6
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from dsci_524_ezplot.plot_histogram import plot_histogram
from dsci_524_ezplot.plot_line import plot_line
from dsci_524_ezplot.plot_scatterplot import plot_scatterplot
from dsci_524_ezplot.plot_heatmap import plot_heatmap
data = pd.read_csv("marketing_campaign_dataset.csv")
data.head().style.set_table_styles(
    [{"selector": "table", "props": [("overflow-x", "auto"), ("display", "block")]}]
)
  Campaign_ID Company Campaign_Type Target_Audience Duration Channel_Used Conversion_Rate Acquisition_Cost ROI Location Language Clicks Impressions Engagement_Score Customer_Segment Date
0 1 Innovate Industries Email Men 18-24 30 days Google Ads 0.040000 $16,174.00 6.290000 Chicago Spanish 506 1922 6 Health & Wellness 2021-01-01
1 2 NexGen Systems Email Women 35-44 60 days Google Ads 0.120000 $11,566.00 5.610000 New York German 116 7523 7 Fashionistas 2021-01-02
2 3 Alpha Innovations Influencer Men 25-34 30 days YouTube 0.070000 $10,200.00 7.180000 Los Angeles French 584 7698 1 Outdoor Adventurers 2021-01-03
3 4 DataTech Solutions Display All Ages 60 days YouTube 0.110000 $12,724.00 5.550000 Miami Mandarin 217 1820 7 Health & Wellness 2021-01-04
4 5 NexGen Systems Email Men 25-34 15 days YouTube 0.050000 $16,452.00 6.500000 Los Angeles Mandarin 379 4201 3 Health & Wellness 2021-01-05

Step 2: Plotting a Histogram for a Numerical Column

Alex’s first task was to understand the distribution of Conversion_Rate to show the distribution of conversion rates across campaigns.

fig, ax = plot_histogram (
    df=data,
    column="Conversion_Rate",
    bins=15,
    title="Distribution of Conversion Rates",
    xlabel="Conversion Rate",
    ylabel="Frequency",
    color="blue"
)
plt.show()
_images/5678f94cf20ae537b7b47d816d3167f71f4ac90e26a3333629631cf022d297f5.png

The plot highlights the conversion rates of campaigns in bins. Alex can now point out patterns, such as campaigns never achieves conversion rates above 0.16 and anything above 0.08 is better than average.

Step 3: Plotting a Bar Plot for a Categorical Column

Next, Alex turned their attention to the Channel_Used column to see which platforms marketers favored. Since this column was categorical, Alex used plot_histogram to create a bar plot:

fig, ax = plot_histogram(
    df=data,
    column="Channel_Used",
    title="Distribution of Marketing Channels",
    xlabel="Marketing Channel",
    ylabel="Count",
    color="orange"
)
plt.show()
_images/e6a6f8d46252e939c9afd550b3debe8237e0e40a049572253199ff097d365c58.png

The plot clearly shows the distribution of marketing channels, with each bar representing the count of campaigns run on different platforms. It’s evident that platforms like YouTube, Google Ads, and others are used equally often in the dataset.

Step 4: Tracking Campaign Performance Over Time

Alex wanted to analyze how campaign performance changed over time. The plot_line function was perfect for visualizing this temporal trend.

daily_performance = data.groupby('Date')['Conversion_Rate'].mean().reset_index()
daily_performance['Date'] = pd.to_datetime(daily_performance['Date'])
daily_performance['Date_Numeric'] = (daily_performance['Date'] - pd.Timestamp('1970-01-01')).dt.days

fig, ax = plot_line(
    df=daily_performance,
    x="Date_Numeric",
    y="Conversion_Rate",
    title="Daily Campaign Performance Trend",
    xlabel="Date",
    ylabel="Average Conversion Rate"
)

n_ticks = 5
step = len(daily_performance) // (n_ticks - 1)
tick_indices = range(0, len(daily_performance), step)
tick_positions = [daily_performance['Date_Numeric'].iloc[i] for i in tick_indices]
tick_labels = [daily_performance['Date'].iloc[i].strftime('%Y-%m-%d') for i in tick_indices]

ax.set_xticks(tick_positions)
ax.set_xticklabels(tick_labels)

plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
_images/9bbf1b577ced74e7086595169f9353c04c8e024ac507cf2be4d3d03b184fea88.png

Alex noticed several interesting patterns in the campaign performance throughout 2021. The daily conversion rates showed consistent fluctuations between 7.6% and 8.4%, suggesting that while there was variability in campaign performance, it maintained a relatively stable range.

Alex was particularly intrigued by several notable features in the data:

  • The highest peaks reached approximately 8.4% conversion rate, occurring multiple times throughout the year

  • A significant dip appeared in early July, where the conversion rate dropped to about 7.4%

  • Most campaigns maintained an average conversion rate around 8%

This visualization helped Alex identify both successful periods and areas for improvement. The consistent oscillation pattern suggested that external factors, such as seasonality or market conditions, might be influencing campaign performance.

Step 5: Create a Scatterplot to Compare the Conversion Rate to ROI for each Channel

Next, to gain a better understanding of how each marketing channel performed, Alex decided to create a scatter plot comparing the Conversion Rate to ROI for each channel. Alex knew that this visualization could help identify patterns or outliers that weren’t immediately obvious looking at the dataset.

aggregated_data = data.groupby('Channel_Used')[['Conversion_Rate', 'ROI']].mean().reset_index()

fig, ax = plot_scatterplot(df = aggregated_data, x = "Conversion_Rate" , y = "ROI", color = "Channel_Used", title="Scatterplot", xlabel="Conversion Rate", ylabel="ROI")
_images/9256808559beed8cc9bb8e7274a28c9abba72bb69c9a16d36f22048496597f51.png

With the campaign data now aggregated by Channel_Used, Alex could clearly compare the Conversion Rate and ROI for each channel. The aggregation helped simplify the data, giving Alex a clearer picture of how each marketing channel performed on average. From this visual, Alex could now confidently present the performance of each marketing channel. Google Ads seemed like a strong contender for future campaigns. By visualizing these two metrics together, Alex could now provide insights on where the marketing team should focus efforts to improve efficiency and returns.

Step 6: Uncovering Relationships with a Heatmap

Having explored conversion rates, campaign performance trends, and ROI using bar plots, line plots, and scatterplots, Alex wanted to dive deeper. This time, Alex aimed to identify relationships between multiple variables in the dataset—such as how Conversion Rate, ROI, and Engagement Score might influence each other. To do this, Alex turned to heatmaps, a powerful visualization tool for showing correlations.

Using Plot-Hero’s plot_heatmap function, Alex started by computing a correlation matrix for the numeric variables:

correlation_matrix = data[["Conversion_Rate", "ROI", "Clicks", "Engagement_Score"]].corr()

With the correlation matrix ready, Alex generated a heatmap to visualize the relationships:

fig, ax = plot_heatmap(
    df=correlation_matrix,
    title="Correlation Heatmap with Annotations",
    xlabel="Metrics",
    ylabel="Metrics",
    cmap="coolwarm",
)

plt.show()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[9], line 1
----> 1 fig, ax = plot_heatmap(
      2     df=correlation_matrix,
      3     title="Correlation Heatmap with Annotations",
      4     xlabel="Metrics",
      5     ylabel="Metrics",
      6     cmap="coolwarm",
      7 )
      9 plt.show()

TypeError: plot_heatmap() got an unexpected keyword argument 'df'

The resulting heatmap showed the correlations between the metrics. Alex noticed the following key insights:

  • The diagonal cells, marked in bright yellow, represent perfect correlations of each metric with itself (as expected).

  • There were no exceptionally strong correlations between the different metrics (off-diagonal values were not close to 1.0 or -1.0).

  • This indicated that each metric provided distinct information about the campaigns, which was valuable for a multi-faceted analysis.

With this knowledge, Alex could confidently present the team’s findings and suggest focusing on independent optimizations for each metric.

Wrapping It All Up

With ezplot, Alex transformed complex data into actionable insights through clear, professional visuals. From histograms and line plots to scatterplots and heatmaps, ezplot made data visualization fast and effective.

Alex recommends ezplot for:

  • Data professionals seeking quick, polished visualizations.

  • Students and analysts focusing on insights, not coding complexities.

“ezplot simplified my workflow and let me focus on the story behind the data. It’s a must-have tool for anyone working with data!”

Now it’s your turn to explore the power of ezplot. Happy visualizing! 🚀