Advanced Charts - Part 1
Explore scatter plots and area charts
Learning Objectives
- Scatter Plots: Understand how to show relationships between two numerical variables
- Correlation Patterns: Learn to identify positive, negative, and no correlation
- Area Charts: Master visualizing volume and cumulative trends over time
- Chart Selection: Know when to use scatter vs. area vs. line charts
PART 1: SCATTER PLOTS
Introduction to Scatter Plots
A scatter plot (also called a scatter chart or scatter diagram) displays values for two numerical variables as dots on a two-dimensional graph. Each dot represents one observation from your data.
When to use scatter plots:
- Showing relationships between two numerical variables
- Finding correlations and patterns
- Identifying outliers that don't fit the pattern
- Exploring dependencies between variables
Key components:
- X-axis: Numerical variable (often independent)
- Y-axis: Numerical variable (often dependent)
- Each point = one observation/record
Reading Scatter Plots
Understanding the components of a scatter plot:
Example: Study Hours vs. Test Scores
Interpretation: This scatter plot shows a positive correlation - as study hours increase, test scores tend to increase. The trend line helps visualize this overall pattern.
Patterns in Scatter Plots
Scatter plots reveal different types of relationships between variables:
1. Positive Correlation
Pattern: As X increases, Y tends to increase
Examples: Height vs. Weight, Study time vs. Score
2. Negative Correlation
Pattern: As X increases, Y tends to decrease
Examples: Car age vs. Value, Practice time vs. Errors
3. No Correlation
Pattern: No clear relationship between X and Y
Examples: Shoe size vs. IQ (among adults), Day of the month you were born vs. Salary
4. Strong vs. Weak Correlation
Strength: How close points are to the trend line
Strong: Points cluster tightly; Weak: Points widely scattered
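Direction and strength are often summarized by the Pearson correlation coefficient r, which runs from -1 (perfect negative) through 0 (no linear correlation) to +1 (perfect positive). The Python sketch below computes r from scratch. The function names, the made-up study-hours data, and the 0.7 / 0.3 cut-offs for "strong" and "weak" are assumptions (a common rule of thumb, not a fixed standard).

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient r for two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance numerator) and each variable's spread.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


def describe(r: float) -> str:
    """Label direction and strength using assumed rule-of-thumb cut-offs."""
    strength = abs(r)
    if strength < 0.3:
        return "no clear correlation"
    direction = "positive" if r > 0 else "negative"
    return f"{'strong' if strength >= 0.7 else 'weak'} {direction}"


# Hypothetical data: study hours and test scores for six students.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 58, 61, 70, 74, 83]
r = pearson_r(hours, scores)
print(f"r = {r:.2f} ({describe(r)})")
```

Keep in mind that r only measures straight-line association: a strong curved pattern can still give an r close to 0, which is one more reason to look at the plot itself.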
Use Cases for Scatter Plots
Real-World Applications:
- Health & Fitness: Height vs. Weight, Exercise hours vs. Calories burned
- Business: Advertising spend vs. Sales revenue, Price vs. Demand
- Education: Study hours vs. Test scores, Attendance vs. Grades
- Science: Temperature vs. Ice cream sales, Rainfall vs. Crop yield
- Economics: Income vs. Spending, Unemployment rate vs. Crime rate
Example: Advertising Spend vs. Sales
Insight: Strong positive correlation - higher ad spend is generally associated with higher sales. The trend line helps estimate expected sales for a given ad budget.
Adding Trend Lines
A trend line (or line of best fit) is a straight line drawn through the data points to show the general direction of the relationship.
Why add a trend line:
- Shows overall direction: Upward, downward, or flat
- Helps make predictions: Estimate Y for a given X value
- Indicates strength: The more tightly points cluster around the line, the stronger the relationship
- Identifies outliers: Points far from the trend line
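A trend line is usually fitted by ordinary least squares, which picks the slope and intercept that minimize the sum of squared vertical distances from the points to the line. The sketch below fits that line, uses it to predict, and flags points whose residual (actual Y minus predicted Y) is more than k standard deviations from zero. The function names, the ad-spend figures, and the k = 2 cut-off are illustrative assumptions.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept


def flag_outliers(xs, ys, slope, intercept, k=2.0):
    """Indices of points whose residual is more than k standard deviations from 0."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    # OLS residuals average to zero, so their RMS is their standard deviation.
    sd = (sum(r * r for r in residuals) / len(residuals)) ** 0.5
    return [i for i, r in enumerate(residuals) if abs(r) > k * sd]


# Hypothetical ad spend ($k) and sales ($k) for eight stores;
# the store at index 5 underperforms.
ad_spend = [10, 20, 30, 40, 50, 60, 70, 80]
sales = [35, 65, 95, 125, 155, 100, 215, 245]

slope, intercept = fit_line(ad_spend, sales)
print(f"sales = {slope:.2f} * ad_spend + {intercept:.2f}")
print("predicted sales at $90k:", round(slope * 90 + intercept, 1))
print("outlier stores:", flag_outliers(ad_spend, sales, slope, intercept))
```

A large outlier also pulls the fitted line toward itself, so it can be worth investigating flagged points and refitting before relying on the line for predictions.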
Scatter Plot Best Practices
- Clear axis labels with units: Always specify what each axis represents and include units (dollars, hours, kg, etc.)
- Appropriate scale: Start Y-axis at zero if comparing magnitudes; adjust if focusing on variation
- Don't overcrowd: If you have thousands of points, consider sampling or using transparency/smaller dots
- Add trend line when helpful: Include trend line for linear relationships to show direction and strength
- Highlight outliers if relevant: Circle or annotate unusual data points that don't fit the pattern
Common Mistakes to Avoid
- Assuming correlation = causation: Just because two variables correlate doesn't mean one causes the other. Ice cream sales and drowning deaths both correlate with temperature, but ice cream doesn't cause drowning!
- Ignoring outliers: Outliers can reveal important insights or data errors. Don't just remove them - investigate why they exist.
- Using for categorical data: Scatter plots require numerical variables. Don't use categories (like "Red", "Blue") on axes.
- Too many points (unreadable): With 10,000+ points, the plot becomes a blob. Use sampling, binning, or heatmap-style density plots instead.
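The two remedies for overplotting can be sketched in plain Python: a reproducible random sample of the points, and a grid of counts per cell, which is the data a heatmap-style density plot would color. The function names, the 20 x 20 grid, and the synthetic 20,000-point dataset are assumptions for illustration.

```python
import random
from collections import Counter


def sample_points(points, k, seed=0):
    """Reproducible random subset of k points (all points if there are fewer)."""
    if len(points) <= k:
        return list(points)
    return random.Random(seed).sample(points, k)


def bin_points(points, x_range, y_range, nx, ny):
    """Count points per grid cell; these counts are what a density heatmap colors."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = Counter()
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore points outside the plotted range
        # Map each coordinate to a cell index; values on the max edge go in the last cell.
        i = min(int((x - x_min) / (x_max - x_min) * nx), nx - 1)
        j = min(int((y - y_min) / (y_max - y_min) * ny), ny - 1)
        counts[(i, j)] += 1
    return counts


# Synthetic dataset: 20,000 random (x, y) points, far too many to plot as dots.
rng = random.Random(42)
points = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(20_000)]

subset = sample_points(points, 1_000)
grid = bin_points(points, (0, 10), (0, 10), nx=20, ny=20)
print(len(subset), "sampled points;", len(grid), "non-empty cells;",
      sum(grid.values()), "points binned")
```

Sampling keeps the familiar dot display but discards data; binning keeps every point but replaces dots with cell counts, which is why it scales better to very large datasets.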
PART 2: AREA CHARTS
Introduction to Area Charts
An area chart is like a line chart, but the area below the line is filled with color. This emphasizes the magnitude or volume of change over time.
When to use area charts:
- Showing volume/magnitude over time (not just trend)
- Cumulative trends - totals that build up
- Emphasizing quantity - make the "amount" visually prominent
- Comparing multiple series with stacked variations
Key components:
- X-axis: Time period (days, months, years) or sequential categories
- Y-axis: Numerical values (quantities, amounts, counts)
- Series: One or more data series to plot
Single Area Charts
Single area charts track one data series over time, with the area filled to emphasize total magnitude.
Example: Monthly Revenue Growth
Use case: The filled area emphasizes the magnitude of revenue, making growth visually prominent. Perfect for tracking total accumulation over time.
Common Single Area Chart Uses:
- Total revenue over time - emphasizes overall financial performance
- Population growth - shows magnitude of total population
- Cumulative downloads - highlights total volume
- Website traffic - emphasizes visitor volume
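Cumulative series such as total downloads are usually built from per-period counts by taking a running total. A minimal Python sketch, assuming made-up monthly download numbers:

```python
from itertools import accumulate

# Hypothetical new downloads per month (illustrative numbers only).
monthly_downloads = [1200, 1800, 2500, 3100, 4000, 5200]

# Running total: each entry is the sum of all months up to and including that one.
cumulative_downloads = list(accumulate(monthly_downloads))
print(cumulative_downloads)  # [1200, 3000, 5500, 8600, 12600, 17800]
```

Because monthly downloads can't be negative, the cumulative series never decreases, so the filled area only grows: exactly the "amount built up so far" that a single area chart emphasizes.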
Stacked Area Charts
Stacked area charts show multiple data series stacked on top of each other. The total height shows the combined sum, while each layer shows one series.
Example: Sales by Product Category
How to read: The total height shows combined sales across all categories. Each colored layer shows one category's contribution. You can see both total trends and individual category performance.
Why use stacked area charts:
- Shows total AND composition - see both overall trend and breakdown
- Compares multiple series over time
- Reveals changing proportions - which categories grow/shrink
- Good for cumulative data where parts add to a whole
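Behind the picture, each layer of a stacked area chart is drawn between a lower and an upper edge: the first series starts at zero, and every later series starts where the previous one ended, so the top edge of the last layer is the period total. The sketch below computes those edges. The function name and the sales figures are hypothetical, and it rejects negative values on the assumption that stacked layers should add up to a meaningful total height.

```python
def stack_layers(series: dict[str, list[float]]) -> dict[str, list[tuple[float, float]]]:
    """Return (bottom, top) edges per period for each series, stacked in dict order."""
    n_periods = len(next(iter(series.values())))
    baseline = [0.0] * n_periods  # the first layer sits on zero
    layers = {}
    for name, values in series.items():
        if len(values) != n_periods:
            raise ValueError(f"{name!r} has {len(values)} periods, expected {n_periods}")
        if any(v < 0 for v in values):
            raise ValueError(f"{name!r} has negative values; stacking them is misleading")
        top = [b + v for b, v in zip(baseline, values)]
        layers[name] = list(zip(baseline, top))
        baseline = top  # the next layer starts where this one ends
    return layers


# Hypothetical quarterly sales ($k) by product category; bottom layer listed first.
sales = {
    "Electronics": [40, 45, 50, 60],
    "Clothing": [30, 28, 35, 33],
    "Food": [20, 25, 22, 27],
}
layers = stack_layers(sales)
print(layers["Food"])  # top edges of the last layer are the quarterly totals
```

This also shows why upper layers are harder to read: only the bottom layer sits on a flat baseline, while every other layer's bottom edge moves with the layers beneath it.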
100% Stacked Area Charts
100% stacked area charts show proportions over time. The total always reaches 100%, allowing you to focus on relative share rather than absolute values.
Example: Market Share Evolution
Insight: Company C is gaining market share, Company A is losing share, and Company B remains stable. The 100% format makes it easy to compare relative proportions rather than absolute numbers.
When to use 100% Stacked:
- Market share analysis - comparing competitors' relative positions
- Budget allocation - showing how spending is distributed
- Demographic composition - age groups, gender ratios over time
- Portfolio mix - investment allocation changes
Key difference: Total values don't matter - only proportions. Use when relative share is more important than absolute amounts.
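A 100% stacked area chart plots each series as a percentage of its period's total. The sketch below performs that normalization; the company names echo the example above, but the function name and the revenue figures are invented. Company A's revenue grows from 50 to 60 while its share falls from 50% to 40%, which is the kind of shift this format highlights.

```python
def to_percent_shares(series: dict[str, list[float]]) -> dict[str, list[float]]:
    """Express each series as a percentage of its period's total across all series."""
    names = list(series)
    shares = {name: [] for name in names}
    for period_values in zip(*(series[name] for name in names)):
        total = sum(period_values)
        if total == 0:
            raise ValueError("a period with a total of 0 has no meaningful shares")
        for name, value in zip(names, period_values):
            shares[name].append(100 * value / total)
    return shares


# Hypothetical annual revenue ($M); the combined total grows from 100 to 150.
revenue = {
    "Company A": [50, 54, 60],
    "Company B": [30, 36, 45],
    "Company C": [20, 30, 45],
}
for name, pct in to_percent_shares(revenue).items():
    print(name, [f"{p:.0f}%" for p in pct])
```

Every period's shares add up to 100, which is why the chart's top edge is always flat and only the boundaries between layers move.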
When to Use Area vs. Line Charts
| Aspect | Area Chart | Line Chart |
|---|---|---|
| Best for | Emphasizing volume/magnitude | Emphasizing trend/precision |
| Visual focus | Total quantity (filled area) | Rate of change (line slope) |
| Data type | Cumulative totals, volumes | Any time series data |
| Multiple series | Use stacked (shows total + parts) | Use multiple lines (easier to compare) |
| Examples | Revenue, population, downloads | Temperature, stock price, heart rate |
Use Area Chart When:
- Showing cumulative totals
- Emphasizing magnitude/volume
- Data represents quantities that "fill up"
- Comparing parts of a whole (stacked)
Example: "Total website visitors this month"
Use Line Chart When:
- Showing precise trends
- Comparing multiple series (3+ lines)
- Data has negative values
- Emphasizing rate of change
Example: "Daily temperature fluctuations"
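The table and the two lists above can be condensed into a rough decision procedure. The Python sketch below is one way to encode it; the DataProfile fields, the rule order, and the cap of 5 stacked series (taken from the best-practice limit of 3-5 stacked categories) are assumptions, and real charts still call for judgment.

```python
from dataclasses import dataclass


@dataclass
class DataProfile:
    """Hypothetical summary of a dataset; the field names are illustrative."""
    x_is_time: bool           # x-axis is time or an ordered sequence
    both_numeric: bool        # both variables are numerical measurements
    has_negative_values: bool
    emphasize_volume: bool    # totals or quantities that "fill up"
    parts_of_whole: bool      # series add up to a meaningful total
    n_series: int = 1


def suggest_chart(p: DataProfile) -> str:
    """Apply this lesson's rules of thumb in order; a starting point, not a verdict."""
    if not p.x_is_time:
        # Two numerical variables without a time axis: look for a relationship.
        return "scatter" if p.both_numeric else "bar/column"
    if p.has_negative_values:
        return "line"  # filled areas below zero are confusing
    if p.parts_of_whole and 2 <= p.n_series <= 5:
        return "stacked area"  # shows total and composition
    if p.emphasize_volume and p.n_series == 1:
        return "area"
    return "line"  # precise trends, rate of change, many series


scenarios = {
    "experience vs. salary": DataProfile(False, True, False, False, False),
    "cumulative app downloads": DataProfile(True, True, False, True, False),
    "daily stock price": DataProfile(True, True, False, False, False),
    "three product lines' revenue": DataProfile(True, True, False, True, True, n_series=3),
    "monthly profit with losses": DataProfile(True, True, True, False, False),
}
for label, profile in scenarios.items():
    print(f"{label}: {suggest_chart(profile)}")
```

The order of the checks matters: negative values rule out area charts before volume or composition is even considered.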
Area Chart Best Practices
- Use for cumulative data: Area charts work best when showing totals, volumes, or quantities that accumulate
- Don't stack too many series: Limit stacked charts to 3-5 categories; more becomes unreadable
- Use transparent colors for overlaps: If areas overlap (not stacked), use transparency so both are visible
- Order matters in stacked charts: Place the most important series at the bottom, where it sits on a flat baseline and is easiest to read
- Start the Y-axis at zero: Since area represents magnitude, always start at zero to avoid visual distortion
Area Chart Mistakes to Avoid
- Using area charts for data with negative values (use a line chart instead)
- Stacking unrelated series that don't sum meaningfully
- Comparing individual series in stacked charts (upper layers are harder to read because their baselines shift)
- Using dark, opaque colors that hide overlapping data
Practice Exercises
Test your understanding with these hands-on exercises.
Exercise 1: Identify Correlation Type
Task: For each scenario, identify whether you'd expect positive correlation, negative correlation, or no correlation:
a) Hours spent exercising vs. Weight loss
b) Car's age vs. Resale value
c) Shoe size vs. Math test score (among adults)
d) Years of education vs. Salary
e) Distance from equator vs. Average temperature
Answer:
a) Positive correlation - More exercise typically leads to more weight loss
b) Negative correlation - Older cars generally worth less
c) No correlation - Shoe size doesn't affect math ability
d) Positive correlation - More education often leads to higher salary
e) Negative correlation - Further from equator typically means colder
Exercise 2: Scatter Plot Interpretation
Scenario: A scatter plot shows advertising budget (X-axis) vs. sales revenue (Y-axis) for 50 stores. Most points cluster tightly around an upward-sloping trend line, but 3 stores are far below the line.
Questions:
a) What type of correlation is shown?
b) Is the correlation strong or weak? How do you know?
c) What might the 3 outlier stores indicate?
Answer:
a) Positive correlation - upward slope means as ad budget increases, sales increase
b) Strong correlation - points cluster "tightly" around the trend line
c) Possible outlier explanations:
- Poor ad targeting or execution in those stores
- Other factors (location, competition, product issues)
- Data errors in recording
- These stores warrant investigation!
Exercise 3: Scatter vs. Area vs. Line
Task: For each scenario, choose the best chart type (Scatter, Area, or Line) and explain why:
a) Showing relationship between employee years of experience and salary
b) Displaying total app downloads growing from 0 to 1 million over 12 months
c) Tracking daily stock price movements
d) Comparing how three product lines contribute to total quarterly revenue
e) Analyzing if there's a relationship between study hours and exam scores
Answer:
a) Scatter plot - showing relationship between two numerical variables
b) Area chart - emphasizes cumulative growth and total magnitude
c) Line chart - shows precise day-to-day movements and rate of change; a stock price is a level, not a quantity that accumulates, so a filled area adds little
d) Stacked area chart - shows total revenue AND breakdown by product
e) Scatter plot - exploring correlation between two variables
Exercise 4: Area Chart Design
Scenario: You need to show how your company's total revenue ($500k in Q1 to $800k in Q4) is split between 5 product categories over 4 quarters.
Questions:
a) Should you use a single area, stacked area, or 100% stacked area chart?
b) What if you want to emphasize which categories are gaining or losing share of total revenue?
c) What's a potential problem with having 5 categories?
Answer:
a) Stacked area chart - shows both total revenue growth AND category breakdown
b) 100% stacked area - makes relative proportions easier to compare
c) 5 layers is at the upper limit of what a stacked chart handles well, and the top layers are especially hard to read because their baselines shift. Consider:
- Grouping smaller categories into "Other"
- Using a different chart type (grouped column)
- Showing only top 3 categories
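The "Other" grouping can be done before charting. This sketch keeps the n categories with the largest total revenue across all quarters and sums the rest into "Other", so the quarterly totals are unchanged. Ranking by overall total (rather than, say, the latest quarter) is an assumption, and the function name, category names, and figures are hypothetical values chosen to match the exercise's $500k to $800k range.

```python
def top_n_plus_other(series: dict[str, list[float]], n: int = 3) -> dict[str, list[float]]:
    """Keep the n series with the largest overall totals and sum the rest into 'Other'."""
    ranked = sorted(series, key=lambda name: sum(series[name]), reverse=True)
    kept, rest = ranked[:n], ranked[n:]
    grouped = {name: list(series[name]) for name in kept}
    if rest:
        # Period-by-period sum of every series that did not make the cut.
        grouped["Other"] = [sum(vals) for vals in zip(*(series[name] for name in rest))]
    return grouped


# Hypothetical quarterly revenue ($k) for 5 categories: totals run from 500 (Q1) to 800 (Q4).
revenue = {
    "Electronics": [200, 230, 260, 300],
    "Clothing": [120, 140, 160, 180],
    "Home": [100, 110, 130, 150],
    "Toys": [50, 55, 60, 90],
    "Garden": [30, 35, 40, 80],
}
for name, values in top_n_plus_other(revenue).items():
    print(f"{name:<12}{values}")
```

Placing "Other" last keeps the named categories in the lower, easier-to-read layers of the stack.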
Exercise 5: Correlation vs. Causation
Scenario: A scatter plot shows a strong positive correlation between ice cream sales and drowning incidents across 100 beaches.
Questions:
a) Does this mean ice cream causes drowning?
b) What's a more likely explanation?
c) What's the lesson here?
Answer:
a) No! Correlation does NOT imply causation
b) Confounding variable: Hot weather!
- Hot weather → More people buy ice cream
- Hot weather → More people swim → More drownings
- Both are caused by a third factor (temperature)
c) Critical lesson: Always ask "Could there be a third variable?" before assuming one variable causes another. Correlation helps identify relationships to investigate further, but doesn't prove cause and effect.
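When the suspected third variable has been measured, one way to check it is a partial correlation: remove each variable's straight-line dependence on the suspected confounder, then correlate what is left. The hypothetical data below was constructed so that temperature drives both series, and the function names are illustrative. This only controls for the confounder you include; it cannot rule out others you haven't measured.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def residuals(ys, zs):
    """What remains of ys after removing its least-squares straight-line fit on zs."""
    n = len(zs)
    mean_z, mean_y = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
             / sum((z - mean_z) ** 2 for z in zs))
    return [y - (mean_y + slope * (z - mean_z)) for y, z in zip(ys, zs)]


# Hypothetical weekly data, constructed so temperature drives both other series.
temperature = [20, 22, 24, 26, 28, 30, 32, 34]              # average °C
ice_cream_sales = [207, 221, 237, 255, 275, 297, 321, 347]  # hundreds of units
drownings = [3, 21, 29, 31, 31, 33, 41, 59]                 # incidents across all beaches

raw = pearson_r(ice_cream_sales, drownings)
partial = pearson_r(residuals(ice_cream_sales, temperature),
                    residuals(drownings, temperature))
print(f"raw r = {raw:.2f}; r after controlling for temperature = {partial:.2f}")
```

Here the raw correlation is about 0.92, yet the leftover variation in the two series is completely uncorrelated once temperature's effect is removed.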
Exercise 6: Reading Stacked Area Charts
Task: In a stacked area chart showing email types over time:
- Bottom layer (red): Spam emails
- Middle layer (blue): Work emails
- Top layer (green): Personal emails
If the total height is decreasing but the red layer is growing, what does this tell you?
Interpretation:
- Total emails are decreasing (overall height going down)
- BUT spam is increasing (red layer growing)
- Therefore: Work and/or personal emails must be decreasing significantly
This could mean people are getting less work and personal email, or that legitimate messages are being lost or misfiltered. Either way, the decrease in work and personal email is larger than the growth in spam.
Exercise 7: Design Challenge
Task: You have data showing the relationship between employee satisfaction score (1-10) and number of sick days taken per year. Design a visualization:
a) What chart type would you use?
b) Which variable goes on which axis?
c) Would you add a trend line? Why or why not?
d) What pattern would you expect to see?
Answer:
a) Scatter plot - exploring relationship between two numerical variables
b) X-axis: Satisfaction score (independent variable - the one we think does the influencing)
Y-axis: Sick days (dependent variable - the one we think is influenced)
c) Yes, add trend line - helps show if there's a linear relationship and its direction
d) Expected pattern: Negative correlation - higher satisfaction likely correlates with fewer sick days. However, investigate outliers (high satisfaction but many sick days could indicate serious health issues unrelated to job satisfaction).
Exercise 8: When NOT to Use
Task: For each scenario, explain why the chosen chart is WRONG:
a) Using a scatter plot to show categories of products (Electronics, Clothing, Food) vs. sales
b) Using an area chart for profit over time when some months show negative profit
c) Using a 100% stacked area for unrelated metrics (temperature, sales, and number of employees)
Answer:
a) Wrong: Categories on scatter plot
- Scatter plots require numerical values on both axes
- Product categories are categorical, not numerical
- Use instead: Bar chart or column chart
b) Wrong: Area chart with negative values
- Areas are filled relative to a zero baseline, so negative values dip below the axis and the shaded regions become confusing to read
- Use instead: Line chart or column chart with positive/negative bars
c) Wrong: Stacking unrelated metrics
- Stacked charts imply parts sum to a meaningful whole
- Temperature + Sales + Employees doesn't sum meaningfully
- Use instead: Separate charts for each metric, since the metrics have different units and scales
Knowledge Check
Test your understanding of scatter plots and area charts!
1. What type of data is required for a scatter plot?
2. In a scatter plot showing hours studied vs. test scores, if the points trend upward from left to right, what does this indicate?
3. What is the main difference between a line chart and an area chart?
4. If a scatter plot shows ice cream sales correlating with drowning incidents, what should you conclude?
5. What does a 100% stacked area chart always show?
6. When should you add a trend line to a scatter plot?
7. Which chart type is BEST for showing how three product categories contribute to total quarterly revenue over time?
8. Why should you avoid using area charts for data with negative values?
9. What does each point on a scatter plot represent?
10. When should you use an area chart instead of a line chart?