cat.ademic adds

cat.ademic is a thin domain wrapper around cat.stack that adds OpenAlex-based paper fetching plus academic prompt framing. You can classify abstracts or PDFs you already have, or fetch papers from OpenAlex by journal, field, or topic (journal_name, journal_issn, journal_field, topic_name, topic_id) and classify them in one call.

Everything else (supported models, output format, ensemble voting, batch mode) is identical to cat.stack.

library(cat.ademic)

# Abstracts you already have, one string per paper
abstracts <- c(
  "We use mixed-methods to study labor market outcomes for...",
  "This paper develops a formal model of bargaining under...",
  "A systematic review of 47 studies on educational interventions..."
)

results <- classify(
  categories = c("Empirical-quantitative",
                 "Empirical-qualitative",
                 "Theoretical-formal",
                 "Review/meta-analysis",
                 "Other"),
  input_data = abstracts,
  mode = "text",
  api_key = Sys.getenv("OPENAI_API_KEY"),
  user_model = "gpt-4o-mini"
)

cat.ademic connects to OpenAlex, a free, open scholarly database, to fetch papers by journal, field, or topic. Set polite_email (your email) for higher rate limits.

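
As a rough illustration of what fetching by journal involves, the request below is built by hand against the public OpenAlex REST API. This is a minimal sketch, not cat.ademic's actual implementation: build_openalex_url() is a hypothetical helper, and it assumes polite_email corresponds to OpenAlex's mailto query parameter. The URL is only constructed, never sent.

```r
# Sketch only: builds (but does not send) an OpenAlex /works request URL.
# build_openalex_url() is a hypothetical helper, not a cat.ademic function.
build_openalex_url <- function(issn, date_from = NULL, per_page = 100L,
                               polite_email = NULL) {
  # OpenAlex filters are comma-separated key:value pairs
  filters <- paste0("primary_location.source.issn:", issn)
  if (!is.null(date_from)) {
    filters <- c(filters, paste0("from_publication_date:", date_from))
  }

  query <- c(
    filter     = paste(filters, collapse = ","),
    `per-page` = as.character(per_page)
  )
  if (!is.null(polite_email)) {
    # Assumption: the email is forwarded as the mailto parameter
    query <- c(query, mailto = polite_email)
  }

  # Percent-encode each value, then join as key=value pairs
  encoded <- vapply(query, utils::URLencode, character(1), reserved = TRUE)
  paste0("https://api.openalex.org/works?",
         paste0(names(query), "=", encoded, collapse = "&"))
}

url <- build_openalex_url("0003-1224",
                          date_from = "2024-01-01",
                          polite_email = "you@example.com")
print(url)
```

Opening the printed URL in a browser is a quick way to check that a journal or date filter matches the papers you expect before spending LLM calls on them.
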
results <- classify(
  categories = c("Quantitative", "Qualitative", "Mixed Methods"),
  journal_name = "American Sociological Review",
  paper_limit = 100L,
  date_from = "2024-01-01",
  polite_email = "you@example.com",  # replace with your own address
  api_key = Sys.getenv("OPENAI_API_KEY")
)

Or by ISSN for unambiguous journal identification:

results <- classify(
  categories = c("Empirical", "Theoretical", "Review"),
  journal_issn = "0003-1224",  # American Sociological Review
  paper_limit = 50L,
  polite_email = "you@example.com",  # replace with your own address
  api_key = Sys.getenv("OPENAI_API_KEY")
)

OpenAlex auto-tags papers with research topics. You can pull all papers on a topic across journals:

results <- classify(
  categories = c("Causal-identification", "Descriptive",
                 "Theoretical", "Other"),
  topic_name = "climate change adaptation",
  paper_limit = 200L,
  date_from = "2023-01-01",
  polite_email = "you@example.com",  # replace with your own address
  api_key = Sys.getenv("OPENAI_API_KEY")
)

Pass a directory or a vector of file paths. cat.ademic extracts the text (or renders pages as images for vision models) and classifies each document:

# One-time: install PDF extras
# cat.stack::install_cat_stack(pdf = TRUE)

results <- classify(
  categories = c("Has-DGP-assumption", "No-DGP-assumption",
                 "Unclear", "Other"),
  input_data = "./papers/",  # directory of PDFs
  mode = "image",            # rendered-page vision mode
  api_key = Sys.getenv("OPENAI_API_KEY"),
  user_model = "gpt-4o"      # vision-capable model
)

For long full-text inputs, summarizing first can improve downstream classification quality (and reduce token cost):

summaries <- summarize(
  input_data = "./papers/",
  description = "Sociology articles",
  instructions = "Summarize methodology and key findings in 3 sentences",
  format = "paragraph",
  api_key = Sys.getenv("OPENAI_API_KEY"),
  user_model = "gpt-4o-mini"
)

# Classify the short summaries instead of the full text
results <- classify(
  categories = c("Causal", "Descriptive", "Theoretical", "Other"),
  input_data = summaries$summary,
  api_key = Sys.getenv("OPENAI_API_KEY")
)

Set polite_email when fetching from OpenAlex; without it you're throttled to a low rate limit.

vignette("getting-started", package = "cat.llm")
?cat.ademic::classify, ?cat.ademic::extract, ?cat.ademic::explore, ?cat.ademic::summarize
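
The PDF workflow above accepts either a directory or a vector of file paths. When only some files in a folder should be classified, the vector form can be built with base R. The following is a minimal sketch: select_pdfs() and the demo file names are hypothetical, not part of cat.ademic, and the demo uses a temporary directory so it runs anywhere.

```r
# Sketch only: select_pdfs() is a hypothetical helper, not part of cat.ademic.
select_pdfs <- function(dir, name_pattern = NULL) {
  # All PDFs directly under `dir`, as full paths (extension matched
  # case-insensitively)
  paths <- list.files(dir, pattern = "\\.pdf$", full.names = TRUE,
                      ignore.case = TRUE)
  if (!is.null(name_pattern)) {
    # Keep only files whose name matches the optional regular expression
    paths <- paths[grepl(name_pattern, basename(paths))]
  }
  sort(paths)
}

# Demo on a throwaway directory with made-up file names
demo_dir <- file.path(tempdir(), "papers_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(
  demo_dir, c("asr_2024_a.pdf", "asr_2023_b.PDF", "notes.txt")
)))

pdf_paths <- select_pdfs(demo_dir, name_pattern = "2024")
print(basename(pdf_paths))
```

A vector like pdf_paths could then be passed as input_data in place of the "./papers/" directory string.
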