---
title: "Classifying Social Media Posts"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Classifying Social Media Posts}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(eval = FALSE, comment = "#>", collapse = TRUE)
```

# What `cat.vader` adds

`cat.vader` is a thin domain wrapper around `cat.stack` that adds
**social-media platform connectors** plus prompt framing tuned for
short, informal, online language. You can use it two ways:

1. **Pull posts directly from a platform** (Threads, Reddit, Bluesky,
   Mastodon, YouTube, etc.) using the `sm_source`, `sm_handle`,
   `sm_credentials` arguments, then classify.
2. **Classify text you already have** as a plain character vector —
   identical to `cat.stack::classify()` but with social-media-aware
   prompt context.

Everything else — supported models, output format, ensemble voting —
is identical to `cat.stack`.

# Install

```{r install}
install.packages(
  "cat.vader",
  repos = c("https://chrissoria.r-universe.dev",
            "https://cloud.r-project.org")
)
library(cat.vader)
```

# Classify text you already have

The simplest path — pass a character vector of post text:

```{r classify-text}
posts <- c(
  "Just had the best coffee ever! Highly recommend the new place downtown.",
  "Politicians are all the same. Nothing ever changes.",
  "Looking forward to the game tonight!",
  "This new policy is going to ruin small businesses.",
  "Anyone know a good vet in the area?"
)

results <- classify(
  input_data = posts,
  categories = c("Positive sentiment", "Negative sentiment",
                 "Question/request", "Other"),
  api_key    = Sys.getenv("OPENAI_API_KEY"),
  user_model = "gpt-4o-mini"
)
```

# Pull and classify in one call

If you want to analyse your own posting history or a public account,
`cat.vader` can pull posts directly and feed them through the classifier:

```{r pull-and-classify}
# Authenticate once with your Threads API credentials, then:
results <- classify(
  sm_source      = "threads",
  sm_handle      = "your_username",
  sm_months      = 6L,                      # last 6 months
  sm_credentials = Sys.getenv("THREADS_TOKEN"),
  categories     = c("Personal", "Political", "Promotional",
                     "Question", "Other"),
  api_key        = Sys.getenv("OPENAI_API_KEY"),
  user_model     = "gpt-4o-mini"
)
```

The returned `data.frame` includes the original text **and** platform
engagement metrics (likes, replies, reposts, etc.) so you can correlate
content categories with reach.

Each platform has its own authentication setup; check the
[cat-vader Python docs](https://pypi.org/project/cat-vader/) for the
current credential formats.

# Discovering categories from a feed

Before classifying, see what themes are actually present:

```{r discover}
cats <- extract(
  input_data     = df$posts,
  max_categories = 10L,
  api_key        = Sys.getenv("OPENAI_API_KEY"),
  user_model     = "gpt-4o-mini"
)
cats$top_categories
```

Then iterate — drop noise categories, merge similar ones, and re-run
`classify()` with the cleaned scheme.

# Tips for social-media data

1. **Short text is harder.** Tweets/posts often lack context (`"lol"`,
   `"this"`). Including the platform name and a brief description in
   `description=` helps the model interpret short posts.
2. **Emoji and slang.** Frontier models handle these well; older or
   smaller models less so. If you're seeing weird classifications on
   highly informal text, try `gpt-4o` or `claude-3-5-sonnet` over the
   mini/haiku tier.
3. **Engagement isn't ground truth.** A high-engagement post isn't
   necessarily *correctly* classified — engagement reflects audience,
   not category. Validate on a hand-coded subsample.

# Where to learn more

- Full Getting Started guide:
  `vignette("getting-started", package = "cat.llm")`
- Per-function reference: `?cat.vader::classify`, `?cat.vader::extract`,
  `?cat.vader::explore`
- Platform-connector docs are maintained in the [Python catvader
  package](https://pypi.org/project/cat-vader/).