Package 'cat.vader' reference manual

Title:	Social Media Content Classification with LLMs
Description:	R interface to the Python catvader package. Classifies, extracts, and explores social media content using LLMs. A thin domain wrapper around cat.stack that adds social media sourcing parameters (platform, handle, hashtags, credentials).
Authors:	Chris Soria [aut, cre]
Maintainer:	Chris Soria <[email protected]>
License:	GPL (>= 3)
Version:	0.1.2
Built:	2026-07-04 06:20:07 UTC
Source:	https://github.com/chrissoria/cat-llm

Classify social media content using LLMs

Description

Wraps the Python catvader.classify() function. Adds social media sourcing parameters to the base cat.stack classification engine.

Usage

classify(
  input_data = NULL,
  categories = NULL,
  api_key = NULL,
  sm_source = NULL,
  sm_limit = 50L,
  sm_months = NULL,
  sm_days = NULL,
  sm_credentials = NULL,
  sm_handle = NULL,
  sm_timezone = "UTC",
  sm_youtube_content = "video",
  sm_youtube_transcript = FALSE,
  sm_comments_per_video = 20L,
  sm_youtube_transcript_max_chars = 10000L,
  platform = NULL,
  handle = NULL,
  hashtags = NULL,
  post_metadata = NULL,
  description = "",
  feed_question = "",
  user_model = "gpt-4o",
  mode = "image",
  creativity = NULL,
  safety = FALSE,
  chain_of_verification = FALSE,
  chain_of_thought = FALSE,
  step_back_prompt = FALSE,
  context_prompt = FALSE,
  thinking_budget = 0L,
  example1 = NULL,
  example2 = NULL,
  example3 = NULL,
  example4 = NULL,
  example5 = NULL,
  example6 = NULL,
  filename = NULL,
  save_directory = NULL,
  model_source = "auto",
  max_categories = 12L,
  categories_per_chunk = 10L,
  divisions = 10L,
  research_question = NULL,
  models = NULL,
  consensus_threshold = "unanimous",
  use_json_schema = TRUE,
  max_workers = NULL,
  fail_strategy = "partial",
  max_retries = 5L,
  batch_retries = 2L,
  retry_delay = 1,
  row_delay = 0,
  pdf_dpi = 150L,
  auto_download = FALSE,
  add_other = "prompt",
  check_verbosity = TRUE,
  prompt_tune = NULL,
  tune_iterations = 1L,
  tune_ui = "browser",
  tune_optimize = "balanced"
)
classify(
  input_data = NULL,
  categories = NULL,
  api_key = NULL,
  sm_source = NULL,
  sm_limit = 50L,
  sm_months = NULL,
  sm_days = NULL,
  sm_credentials = NULL,
  sm_handle = NULL,
  sm_timezone = "UTC",
  sm_youtube_content = "video",
  sm_youtube_transcript = FALSE,
  sm_comments_per_video = 20L,
  sm_youtube_transcript_max_chars = 10000L,
  platform = NULL,
  handle = NULL,
  hashtags = NULL,
  post_metadata = NULL,
  description = "",
  feed_question = "",
  user_model = "gpt-4o",
  mode = "image",
  creativity = NULL,
  safety = FALSE,
  chain_of_verification = FALSE,
  chain_of_thought = FALSE,
  step_back_prompt = FALSE,
  context_prompt = FALSE,
  thinking_budget = 0L,
  example1 = NULL,
  example2 = NULL,
  example3 = NULL,
  example4 = NULL,
  example5 = NULL,
  example6 = NULL,
  filename = NULL,
  save_directory = NULL,
  model_source = "auto",
  max_categories = 12L,
  categories_per_chunk = 10L,
  divisions = 10L,
  research_question = NULL,
  models = NULL,
  consensus_threshold = "unanimous",
  use_json_schema = TRUE,
  max_workers = NULL,
  fail_strategy = "partial",
  max_retries = 5L,
  batch_retries = 2L,
  retry_delay = 1,
  row_delay = 0,
  pdf_dpi = 150L,
  auto_download = FALSE,
  add_other = "prompt",
  check_verbosity = TRUE,
  prompt_tune = NULL,
  tune_iterations = 1L,
  tune_ui = "browser",
  tune_optimize = "balanced"
)

Arguments

input_data

A character vector, list, or data.frame column, or NULL to fetch from social media. Default NULL.

categories

A character vector of category names, or "auto". Default NULL.

api_key

Character or NULL. API key for the LLM provider.

sm_source

Character or NULL. Social media source (e.g., "reddit", "twitter", "youtube").

sm_limit

Integer. Max posts to fetch. Default 50L.

sm_months

Integer or NULL. Fetch posts from last N months.

sm_days

Integer or NULL. Fetch posts from last N days.

sm_credentials

Named list or NULL. API credentials for the social media platform.

sm_handle

Character or NULL. Social media handle to fetch from.

sm_timezone

Character. Timezone for date filtering. Default "UTC".

sm_youtube_content

Character. YouTube content type. Default "video".

sm_youtube_transcript

Logical. Include YouTube transcripts. Default FALSE.

sm_comments_per_video

Integer. Comments per YouTube video. Default 20L.

sm_youtube_transcript_max_chars

Integer. Max transcript chars. Default 10000L.

platform

Character or NULL. Alias for sm_source.

handle

Character or NULL. Alias for sm_handle.

hashtags

Character vector or NULL. Hashtags to filter by.

post_metadata

Named list or NULL. Additional post metadata.

description

Character. Context description. Default "".

feed_question

Character. Feed-specific question context. Default "".

user_model

Character. Model name. Default "gpt-4o".

mode

Character. Processing mode. Default "image".

creativity

Numeric or NULL. Temperature. Default NULL.

safety

Logical. Save progress after each item. Default FALSE.

chain_of_verification

Logical. Default FALSE.

chain_of_thought

Logical. Default FALSE.

step_back_prompt

Logical. Default FALSE.

context_prompt

Logical. Default FALSE.

thinking_budget

Integer. Default 0L.

example1, example2, example3, example4, example5, example6

Optional few-shot examples.

filename

Character or NULL. Output CSV filename.

save_directory

Character or NULL. Output directory.

model_source

Character. Provider hint. Default "auto".

max_categories

Integer. Default 12L.

categories_per_chunk

Integer. Default 10L.

divisions

Integer. Default 10L.

research_question

Character or NULL. Optional research context.

models

List of model specs for ensemble mode.

consensus_threshold

Character or numeric. Default "unanimous".

use_json_schema

Logical. Default TRUE.

max_workers

Integer or NULL. Default NULL.

fail_strategy

Character. Default "partial".

max_retries

Integer. Default 5L.

batch_retries

Integer. Default 2L.

retry_delay

Numeric. Default 1.0.

row_delay

Numeric. Default 0.0.

pdf_dpi

Integer. Default 150L.

auto_download

Logical. Default FALSE.

add_other

Logical or "prompt". Default "prompt".

check_verbosity

Logical. Default TRUE.

prompt_tune

Integer or NULL. Rows sampled per APO correction round. Default NULL.

tune_iterations

Integer. APO optimization passes. Default 1L.

tune_ui

Character. Correction UI: "browser" or "terminal". Default "browser".

tune_optimize

Character. Metric to optimize: "balanced", "sensitivity", or "precision". Default "balanced".

Value

A data.frame with classification results.

Examples

## Not run: 
# Classify your own text directly
results <- classify(
  input_data = c("Just had the best coffee ever!",
                 "Politicians are all the same",
                 "Looking forward to the game tonight"),
  categories = c("Positive", "Negative", "Neutral"),
  api_key    = Sys.getenv("OPENAI_API_KEY"),
  user_model = "gpt-4o-mini"
)

## End(Not run)
## Not run: 
# Classify your own text directly
results <- classify(
  input_data = c("Just had the best coffee ever!",
                 "Politicians are all the same",
                 "Looking forward to the game tonight"),
  categories = c("Positive", "Negative", "Neutral"),
  api_key    = Sys.getenv("OPENAI_API_KEY"),
  user_model = "gpt-4o-mini"
)

## End(Not run)

Explore raw categories in social media data

Description

Wraps the Python catvader.explore() function. Returns every category string extracted from every chunk across every iteration – with duplicates intact.

Usage

explore(
  input_data = NULL,
  api_key = NULL,
  description = "",
  sm_source = NULL,
  sm_limit = 50L,
  sm_months = NULL,
  sm_credentials = NULL,
  platform = NULL,
  handle = NULL,
  hashtags = NULL,
  post_metadata = NULL,
  max_categories = 12L,
  categories_per_chunk = 10L,
  divisions = 12L,
  user_model = "gpt-4o",
  creativity = NULL,
  specificity = "broad",
  research_question = NULL,
  filename = NULL,
  model_source = "auto",
  iterations = 8L,
  random_state = NULL,
  focus = NULL,
  chunk_delay = 0
)
explore(
  input_data = NULL,
  api_key = NULL,
  description = "",
  sm_source = NULL,
  sm_limit = 50L,
  sm_months = NULL,
  sm_credentials = NULL,
  platform = NULL,
  handle = NULL,
  hashtags = NULL,
  post_metadata = NULL,
  max_categories = 12L,
  categories_per_chunk = 10L,
  divisions = 12L,
  user_model = "gpt-4o",
  creativity = NULL,
  specificity = "broad",
  research_question = NULL,
  filename = NULL,
  model_source = "auto",
  iterations = 8L,
  random_state = NULL,
  focus = NULL,
  chunk_delay = 0
)

Arguments

input_data

A character vector, list, or NULL to fetch from social media. Default NULL.

api_key

Character or NULL. API key for the LLM provider.

description

Character. Context description. Default "".

sm_source

Character or NULL. Social media source.

sm_limit

Integer. Max posts to fetch. Default 50L.

sm_months

Integer or NULL. Fetch posts from last N months.

sm_credentials

Named list or NULL. API credentials.

platform

Character or NULL. Alias for sm_source.

handle

Character or NULL. Social media handle.

hashtags

Character vector or NULL. Hashtags to filter by.

post_metadata

Named list or NULL. Additional post metadata.

max_categories

Integer. Default 12L.

categories_per_chunk

Integer. Default 10L.

divisions

Integer. Default 12L.

user_model

Character. Default "gpt-4o".

creativity

Numeric or NULL. Default NULL.

specificity

Character. Default "broad".

research_question

Character or NULL.

filename

Character or NULL.

model_source

Character. Default "auto".

iterations

Integer. Default 8L.

random_state

Integer or NULL.

focus

Character or NULL.

chunk_delay

Numeric. Default 0.0.

Value

A character vector of every category string extracted.

Examples

## Not run: 
raw_cats <- explore(
  input_data = df$posts,
  api_key    = Sys.getenv("OPENAI_API_KEY"),
  user_model = "gpt-4o-mini",
  iterations = 4L
)
table(raw_cats)

## End(Not run)
## Not run: 
raw_cats <- explore(
  input_data = df$posts,
  api_key    = Sys.getenv("OPENAI_API_KEY"),
  user_model = "gpt-4o-mini",
  iterations = 4L
)
table(raw_cats)

## End(Not run)

Extract categories from social media content using LLMs

Description

Wraps the Python catvader.extract() function. Discovers and returns a normalised, deduplicated set of categories from social media data.

Usage

extract(
  input_data = NULL,
  api_key = NULL,
  sm_source = NULL,
  sm_limit = 50L,
  sm_months = NULL,
  sm_credentials = NULL,
  platform = NULL,
  handle = NULL,
  hashtags = NULL,
  post_metadata = NULL,
  description = "",
  max_categories = 12L,
  categories_per_chunk = 10L,
  divisions = 12L,
  user_model = "gpt-4o",
  creativity = NULL,
  specificity = "broad",
  research_question = NULL,
  mode = "text",
  filename = NULL,
  model_source = "auto",
  iterations = 8L,
  random_state = NULL,
  focus = NULL,
  chunk_delay = 0
)
extract(
  input_data = NULL,
  api_key = NULL,
  sm_source = NULL,
  sm_limit = 50L,
  sm_months = NULL,
  sm_credentials = NULL,
  platform = NULL,
  handle = NULL,
  hashtags = NULL,
  post_metadata = NULL,
  description = "",
  max_categories = 12L,
  categories_per_chunk = 10L,
  divisions = 12L,
  user_model = "gpt-4o",
  creativity = NULL,
  specificity = "broad",
  research_question = NULL,
  mode = "text",
  filename = NULL,
  model_source = "auto",
  iterations = 8L,
  random_state = NULL,
  focus = NULL,
  chunk_delay = 0
)

Arguments

input_data

A character vector, list, or NULL to fetch from social media. Default NULL.

api_key

Character or NULL. API key for the LLM provider.

sm_source

Character or NULL. Social media source.

sm_limit

Integer. Max posts to fetch. Default 50L.

sm_months

Integer or NULL. Fetch posts from last N months.

sm_credentials

Named list or NULL. API credentials.

platform

Character or NULL. Alias for sm_source.

handle

Character or NULL. Social media handle.

hashtags

Character vector or NULL. Hashtags to filter by.

post_metadata

Named list or NULL. Additional post metadata.

description

Character. Context description. Default "".

max_categories

Integer. Default 12L.

categories_per_chunk

Integer. Default 10L.

divisions

Integer. Default 12L.

user_model

Character. Default "gpt-4o".

creativity

Numeric or NULL. Default NULL.

specificity

Character. Default "broad".

research_question

Character or NULL.

mode

Character. Default "text".

filename

Character or NULL.

model_source

Character. Default "auto".

iterations

Integer. Default 8L.

random_state

Integer or NULL.

focus

Character or NULL.

chunk_delay

Numeric. Default 0.0.

Value

A named list with counts_df, top_categories, and raw_top_text.

Examples

## Not run: 
result <- extract(
  input_data = df$posts,
  api_key    = Sys.getenv("OPENAI_API_KEY"),
  user_model = "gpt-4o-mini"
)
print(result$top_categories)

## End(Not run)
## Not run: 
result <- extract(
  input_data = df$posts,
  api_key    = Sys.getenv("OPENAI_API_KEY"),
  user_model = "gpt-4o-mini"
)
print(result$top_categories)

## End(Not run)

Package 'cat.vader'

Help Index

Classify social media content using LLMs

Description

Usage

Arguments

Value

Examples

Explore raw categories in social media data

Description

Usage

Arguments

Value

Examples

Extract categories from social media content using LLMs

Description

Usage

Arguments

Value

Examples