| Title: | Social Media Content Classification with LLMs |
|---|---|
| Description: | R interface to the Python catvader package. Classifies, extracts, and explores social media content using LLMs. A thin domain wrapper around cat.stack that adds social media sourcing parameters (platform, handle, hashtags, credentials). |
| Authors: | Chris Soria [aut, cre] |
| Maintainer: | Chris Soria <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.2 |
| Built: | 2026-07-04 06:20:07 UTC |
| Source: | https://github.com/chrissoria/cat-llm |
Wraps the Python catvader.classify() function. Adds social media sourcing
parameters to the base cat.stack classification engine.

    classify(
      input_data = NULL, categories = NULL, api_key = NULL,
      sm_source = NULL, sm_limit = 50L, sm_months = NULL, sm_days = NULL,
      sm_credentials = NULL, sm_handle = NULL, sm_timezone = "UTC",
      sm_youtube_content = "video", sm_youtube_transcript = FALSE,
      sm_comments_per_video = 20L, sm_youtube_transcript_max_chars = 10000L,
      platform = NULL, handle = NULL, hashtags = NULL, post_metadata = NULL,
      description = "", feed_question = "",
      user_model = "gpt-4o", mode = "image", creativity = NULL, safety = FALSE,
      chain_of_verification = FALSE, chain_of_thought = FALSE,
      step_back_prompt = FALSE, context_prompt = FALSE, thinking_budget = 0L,
      example1 = NULL, example2 = NULL, example3 = NULL,
      example4 = NULL, example5 = NULL, example6 = NULL,
      filename = NULL, save_directory = NULL, model_source = "auto",
      max_categories = 12L, categories_per_chunk = 10L, divisions = 10L,
      research_question = NULL, models = NULL,
      consensus_threshold = "unanimous", use_json_schema = TRUE,
      max_workers = NULL, fail_strategy = "partial",
      max_retries = 5L, batch_retries = 2L, retry_delay = 1, row_delay = 0,
      pdf_dpi = 150L, auto_download = FALSE, add_other = "prompt",
      check_verbosity = TRUE, prompt_tune = NULL, tune_iterations = 1L,
      tune_ui = "browser", tune_optimize = "balanced"
    )

| Argument | Description |
|---|---|
| input_data | A character vector, list, or `NULL`. |
| categories | A character vector of category names, or `NULL`. |
| api_key | Character or `NULL`. |
| sm_source | Character or `NULL`. |
| sm_limit | Integer. Max posts to fetch. Default `50L`. |
| sm_months | Integer or `NULL`. |
| sm_days | Integer or `NULL`. |
| sm_credentials | Named list or `NULL`. |
| sm_handle | Character or `NULL`. |
| sm_timezone | Character. Timezone for date filtering. Default `"UTC"`. |
| sm_youtube_content | Character. YouTube content type. Default `"video"`. |
| sm_youtube_transcript | Logical. Include YouTube transcripts. Default `FALSE`. |
| sm_comments_per_video | Integer. Comments per YouTube video. Default `20L`. |
| sm_youtube_transcript_max_chars | Integer. Max transcript chars. Default `10000L`. |
| platform | Character or `NULL`. |
| handle | Character or `NULL`. |
| hashtags | Character vector or `NULL`. |
| post_metadata | Named list or `NULL`. |
| description | Character. Context description. Default `""`. |
| feed_question | Character. Feed-specific question context. Default `""`. |
| user_model | Character. Model name. Default `"gpt-4o"`. |
| mode | Character. Processing mode. Default `"image"`. |
| creativity | Numeric or `NULL`. |
| safety | Logical. Save progress after each item. Default `FALSE`. |
| chain_of_verification | Logical. Default `FALSE`. |
| chain_of_thought | Logical. Default `FALSE`. |
| step_back_prompt | Logical. Default `FALSE`. |
| context_prompt | Logical. Default `FALSE`. |
| thinking_budget | Integer. Default `0L`. |
| example1, example2, example3, example4, example5, example6 | Optional few-shot examples. |
| filename | Character or `NULL`. |
| save_directory | Character or `NULL`. |
| model_source | Character. Provider hint. Default `"auto"`. |
| max_categories | Integer. Default `12L`. |
| categories_per_chunk | Integer. Default `10L`. |
| divisions | Integer. Default `10L`. |
| research_question | Character or `NULL`. |
| models | List of model specs for ensemble mode. |
| consensus_threshold | Character or numeric. Default `"unanimous"`. |
| use_json_schema | Logical. Default `TRUE`. |
| max_workers | Integer or `NULL`. |
| fail_strategy | Character. Default `"partial"`. |
| max_retries | Integer. Default `5L`. |
| batch_retries | Integer. Default `2L`. |
| retry_delay | Numeric. Default `1`. |
| row_delay | Numeric. Default `0`. |
| pdf_dpi | Integer. Default `150L`. |
| auto_download | Logical. Default `FALSE`. |
| add_other | Logical or character. Default `"prompt"`. |
| check_verbosity | Logical. Default `TRUE`. |
| prompt_tune | Integer or `NULL`. |
| tune_iterations | Integer. APO optimization passes. Default `1L`. |
| tune_ui | Character. Correction UI. Default `"browser"`. |
| tune_optimize | Character. Metric to optimize. Default `"balanced"`. |

A data.frame with classification results.

    ## Not run:
    # Classify your own text directly
    results <- classify(
      input_data = c("Just had the best coffee ever!",
                     "Politicians are all the same",
                     "Looking forward to the game tonight"),
      categories = c("Positive", "Negative", "Neutral"),
      api_key = Sys.getenv("OPENAI_API_KEY"),
      user_model = "gpt-4o-mini"
    )
    ## End(Not run)

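The example above reads the key with `Sys.getenv()`, which returns an empty string when the variable is unset. A minimal sketch of failing early in that case follows. `get_api_key()` and the `CATVADER_DEMO_KEY` variable are hypothetical names used only for illustration and are not part of the package.

```r
# Hypothetical helper (not part of the package): read an API key from an
# environment variable and stop early if it is unset or empty.
get_api_key <- function(var = "OPENAI_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  key
}

# Demonstration with a placeholder value (not a real key).
Sys.setenv(CATVADER_DEMO_KEY = "placeholder-key")
demo_key <- get_api_key("CATVADER_DEMO_KEY")
```

The result could then be passed as `api_key = get_api_key()` in place of the bare `Sys.getenv()` call.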
Wraps the Python catvader.explore() function. Returns every category string
extracted from every chunk across every iteration – with duplicates intact.

    explore(
      input_data = NULL, api_key = NULL, description = "",
      sm_source = NULL, sm_limit = 50L, sm_months = NULL,
      sm_credentials = NULL,
      platform = NULL, handle = NULL, hashtags = NULL, post_metadata = NULL,
      max_categories = 12L, categories_per_chunk = 10L, divisions = 12L,
      user_model = "gpt-4o", creativity = NULL, specificity = "broad",
      research_question = NULL, filename = NULL, model_source = "auto",
      iterations = 8L, random_state = NULL, focus = NULL, chunk_delay = 0
    )

| Argument | Description |
|---|---|
| input_data | A character vector, list, or `NULL`. |
| api_key | Character or `NULL`. |
| description | Character. Context description. Default `""`. |
| sm_source | Character or `NULL`. |
| sm_limit | Integer. Max posts to fetch. Default `50L`. |
| sm_months | Integer or `NULL`. |
| sm_credentials | Named list or `NULL`. |
| platform | Character or `NULL`. |
| handle | Character or `NULL`. |
| hashtags | Character vector or `NULL`. |
| post_metadata | Named list or `NULL`. |
| max_categories | Integer. Default `12L`. |
| categories_per_chunk | Integer. Default `10L`. |
| divisions | Integer. Default `12L`. |
| user_model | Character. Default `"gpt-4o"`. |
| creativity | Numeric or `NULL`. |
| specificity | Character. Default `"broad"`. |
| research_question | Character or `NULL`. |
| filename | Character or `NULL`. |
| model_source | Character. Default `"auto"`. |
| iterations | Integer. Default `8L`. |
| random_state | Integer or `NULL`. |
| focus | Character or `NULL`. |
| chunk_delay | Numeric. Default `0`. |

A character vector of every category string extracted.
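Because duplicates are kept, the returned vector can be tabulated to see which category strings recur across chunks and iterations. The sketch below uses a mock vector in place of real `explore()` output, so it runs without an API key. The category strings and the cutoff of two occurrences are invented for illustration.

```r
# Mock output standing in for the character vector returned by explore();
# these category strings are invented for illustration.
raw_cats <- c("Humor", "Politics", "Sports", "Humor", "Politics", "Humor")

# Count how often each category string was extracted, most frequent first.
cat_counts <- sort(table(raw_cats), decreasing = TRUE)

# Keep only strings that appear at least twice (an arbitrary cutoff).
recurring <- names(cat_counts)[as.vector(cat_counts) >= 2]

print(cat_counts)
```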

    ## Not run:
    raw_cats <- explore(
      input_data = df$posts,
      api_key = Sys.getenv("OPENAI_API_KEY"),
      user_model = "gpt-4o-mini",
      iterations = 4L
    )
    table(raw_cats)
    ## End(Not run)

Wraps the Python catvader.extract() function. Discovers and returns a
normalised, deduplicated set of categories from social media data.

    extract(
      input_data = NULL, api_key = NULL,
      sm_source = NULL, sm_limit = 50L, sm_months = NULL,
      sm_credentials = NULL,
      platform = NULL, handle = NULL, hashtags = NULL, post_metadata = NULL,
      description = "",
      max_categories = 12L, categories_per_chunk = 10L, divisions = 12L,
      user_model = "gpt-4o", creativity = NULL, specificity = "broad",
      research_question = NULL, mode = "text", filename = NULL,
      model_source = "auto", iterations = 8L, random_state = NULL,
      focus = NULL, chunk_delay = 0
    )

| Argument | Description |
|---|---|
| input_data | A character vector, list, or `NULL`. |
| api_key | Character or `NULL`. |
| sm_source | Character or `NULL`. |
| sm_limit | Integer. Max posts to fetch. Default `50L`. |
| sm_months | Integer or `NULL`. |
| sm_credentials | Named list or `NULL`. |
| platform | Character or `NULL`. |
| handle | Character or `NULL`. |
| hashtags | Character vector or `NULL`. |
| post_metadata | Named list or `NULL`. |
| description | Character. Context description. Default `""`. |
| max_categories | Integer. Default `12L`. |
| categories_per_chunk | Integer. Default `10L`. |
| divisions | Integer. Default `12L`. |
| user_model | Character. Default `"gpt-4o"`. |
| creativity | Numeric or `NULL`. |
| specificity | Character. Default `"broad"`. |
| research_question | Character or `NULL`. |
| mode | Character. Default `"text"`. |
| filename | Character or `NULL`. |
| model_source | Character. Default `"auto"`. |
| iterations | Integer. Default `8L`. |
| random_state | Integer or `NULL`. |
| focus | Character or `NULL`. |
| chunk_delay | Numeric. Default `0`. |

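Apart from `mode`, the `extract()` signature lists the same arguments as `explore()`, so one settings list can drive both calls. The following is a sketch under that assumption. `shared_args`, `call_with_shared()`, and `echo_args()` are hypothetical names, and the stub stands in for the package functions so the code runs offline.

```r
# Settings named after arguments in the documented signatures; the values
# are illustrative only.
shared_args <- list(
  user_model = "gpt-4o-mini",
  iterations = 4L,
  divisions = 12L,
  specificity = "broad"
)

# Hypothetical helper: call `fn` with the shared settings, letting `...`
# override or extend them for a single call.
call_with_shared <- function(fn, input_data, api_key, ...) {
  args <- utils::modifyList(shared_args, list(...))
  do.call(fn, c(list(input_data = input_data, api_key = api_key), args))
}

# Stub standing in for explore()/extract() so the sketch runs offline;
# it simply returns the arguments it received.
echo_args <- function(...) list(...)

received <- call_with_shared(echo_args, c("post one", "post two"),
                             api_key = "placeholder-key", iterations = 2L)
```

With the package loaded, `echo_args` would be replaced by `explore` or `extract`.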
A named list with counts_df, top_categories, and raw_top_text.

    ## Not run:
    result <- extract(
      input_data = df$posts,
      api_key = Sys.getenv("OPENAI_API_KEY"),
      user_model = "gpt-4o-mini"
    )
    print(result$top_categories)
    ## End(Not run)

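One plausible workflow is to feed the discovered `top_categories` into `classify()` as its `categories` argument. The sketch below passes the two functions in as parameters so that the flow can be exercised with offline stubs. `discover_then_classify()` and both stubs are hypothetical. The stubs mimic only the documented return shapes (a named list and a data.frame), and the assumption that `top_categories` coerces cleanly to a character vector is not confirmed by this page.

```r
# Hypothetical helper: discover categories, then classify with them.
# `extract_fn` and `classify_fn` are passed in so the flow can be tested
# with stubs; with the package loaded they would be extract and classify.
discover_then_classify <- function(posts, api_key, extract_fn, classify_fn,
                                   user_model = "gpt-4o-mini") {
  discovered <- extract_fn(input_data = posts, api_key = api_key,
                           user_model = user_model)
  # Assumption: top_categories can be coerced to a character vector.
  cats <- as.character(discovered$top_categories)
  classify_fn(input_data = posts, categories = cats, api_key = api_key,
              user_model = user_model)
}

# Offline stubs; the contents of each element are placeholders.
stub_extract <- function(input_data, api_key, user_model) {
  list(counts_df = data.frame(),
       top_categories = c("Praise", "Complaint"),
       raw_top_text = "")
}
stub_classify <- function(input_data, categories, api_key, user_model) {
  data.frame(input = input_data, n_categories = length(categories))
}

out <- discover_then_classify(c("Love it", "Too slow"), "placeholder-key",
                              stub_extract, stub_classify)
```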