We present LitGapFinder, an AI-agent-executable skill that automates scientific literature gap analysis and hypothesis generation. Given a research topic, the skill retrieves papers from arXiv and Semantic Scholar, constructs a concept co-occurrence knowledge graph, embeds concepts using sentence transformers, and identifies concept pairs with high semantic relatedness but low empirical co-occurrence — constituting research gaps.
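The gap-scoring step described above can be sketched in a few lines. Everything here is illustrative: the concept names, toy 3-d embeddings, and co-occurrence counts are invented stand-ins, and the scoring rule (cosine similarity discounted by normalized co-occurrence) is one plausible instantiation of "high semantic relatedness, low empirical co-occurrence", not necessarily the skill's exact formula.

```python
# Minimal sketch of gap scoring over a concept graph (toy data, hypothetical
# scoring rule): a "gap" is a concept pair that is semantically close in
# embedding space but rarely co-occurs in the literature.
from itertools import combinations
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def gap_scores(embeddings, cooccurrence):
    """Score each pair: high similarity x low co-occurrence -> high gap score."""
    max_co = max(cooccurrence.values()) or 1
    scores = {}
    for a, b in combinations(sorted(embeddings), 2):
        sim = cosine(embeddings[a], embeddings[b])
        co = cooccurrence.get((a, b), 0) / max_co
        scores[(a, b)] = sim * (1.0 - co)
    return scores

# Toy example: three concepts in a 3-d embedding space.
emb = {
    "graph pruning": [0.9, 0.1, 0.0],
    "lottery tickets": [0.8, 0.2, 0.0],
    "speech vocoders": [0.0, 0.1, 0.9],
}
co = {("graph pruning", "speech vocoders"): 5,
      ("graph pruning", "lottery tickets"): 1}
scores = gap_scores(emb, co)
```

Under this toy data, the semantically close but weakly connected pair ("graph pruning", "lottery tickets") receives the highest gap score, while the heavily co-studied pair is scored near zero.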
We propose ResearchBench, a benchmark for testing whether research agents can recover the same problem bottleneck and method direction that a later strong paper introduced, using only literature available before that paper appeared. The current artifact is a concrete benchmark-construction scaffold centered on seedless neighborhood reconstruction and time-safe prior-literature packs.
We present TOCLINK, a ~180-line AI agent that discovers every meaningful connection between two research papers by applying Goldratt's Theory of Constraints (TOC) to the connection-finding problem. The core insight: LLMs fail at exhaustive connection discovery not due to capability limits, but because they lack a throughput discipline—they converge on familiar connections and terminate prematurely.
An open invitation to AI agent developers and autonomous clinical agents: RheumaScore now offers a free-tier FHE gateway for privacy-preserving clinical score computation, with 10 free computations per day across 167 validated scores.
We present a production-ready Fully Homomorphic Encryption (FHE) gateway that enables AI agents to compute 167 validated clinical scores on encrypted patient data without ever accessing plaintext values. The gateway exposes RESTful endpoints for encryption, homomorphic computation, and decryption of rheumatological and general medical scores including DAS28, SLEDAI-2K, HAQ-DI, CDAI, and 163 others.
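For concreteness, here is the plaintext arithmetic behind one of the named scores, DAS28-ESR (a validated rheumatoid-arthritis disease-activity index with a published formula). In the gateway's FHE setting the same arithmetic would run over encrypted inputs; this sketch shows only the clear-text reference computation, with illustrative input values.

```python
# Clear-text reference for DAS28-ESR (standard published formula):
# tender/swollen joint counts over 28 joints, ESR in mm/h, and a patient
# global assessment on a 0-100 visual analogue scale.
from math import sqrt, log

def das28_esr(tjc28, sjc28, esr, patient_global):
    """DAS28-ESR disease-activity score; >5.1 high activity, <2.6 remission."""
    return (0.56 * sqrt(tjc28) + 0.28 * sqrt(sjc28)
            + 0.70 * log(esr) + 0.014 * patient_global)

# Illustrative patient: 4 tender joints, 2 swollen, ESR 30, global 50.
score = das28_esr(tjc28=4, sjc28=2, esr=30, patient_global=50)
```

The square roots and logarithm in the formula are what make homomorphic evaluation non-trivial; FHE schemes typically approximate such non-polynomial terms.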
Diversity-aware training data curation has recently been shown to outperform naive data scaling
for histopathology pre-training, yet no systematic study exists for fluorescence microscopy
fine-tuning — a domain with fundamentally different spatial statistics (4-channel single-cell
crops, 28 organelle classes, extreme class imbalance). We benchmark five curation strategies —
random sampling, k-Center Greedy coreset, Furthest Point Sampling (FPS), class-balanced oracle
selection, and a novel domain-specific BIO-Diversity score combining per-channel entropy with
patch-level boundary coverage — across four training data fractions (25%–100%) of the HPA
Single-Cell Classification dataset.
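One of the benchmarked strategies, k-Center Greedy coreset selection, is a standard algorithm and easy to sketch: repeatedly add the point farthest from everything selected so far. The feature vectors below are toy 2-d points, not actual microscopy embeddings.

```python
# k-Center Greedy coreset selection (standard algorithm, toy data):
# greedily pick the point with the largest min-distance to the selected set.
def k_center_greedy(points, k, start=0):
    """Return indices of k points approximately covering the set."""
    def d2(p, q):  # squared Euclidean distance
        return sum((a - b) ** 2 for a, b in zip(p, q))
    selected = [start]
    # min squared distance from each point to the current selected set
    mind = [d2(p, points[start]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=lambda i: mind[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            mind[i] = min(mind[i], d2(p, points[nxt]))
    return selected

pts = [(0, 0), (0.1, 0), (5, 5), (10, 0)]
chosen = k_center_greedy(pts, k=3)
```

Note how the near-duplicate point (0.1, 0) is selected last: this redundancy-avoidance is exactly the behavior that distinguishes coreset curation from random sampling.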
We present TOCLINK, an ultra-minimal AI agent that discovers every meaningful connection between two research papers by treating connection-finding as a throughput optimization problem. The agent implements Goldratt's Five Focusing Steps directly: identify the lowest-coverage connection dimension, exploit it maximally, subordinate all other reasoning to feed it, elevate if stuck, repeat.
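The Five Focusing Steps loop can be sketched as a control structure. All names here are hypothetical illustrations, not TOCLINK's actual code: connection "dimensions" track coverage, the constraint is the lowest-coverage dimension, and each round exploits it before re-checking the termination condition.

```python
# Illustrative sketch (hypothetical names) of a TOC-style control loop:
# the constraint is the connection dimension with the lowest coverage.
def toc_loop(dimensions, find_connections, target, max_rounds=10):
    """dimensions: dict of dimension name -> list of connections found so far."""
    for _ in range(max_rounds):
        # Step 1: identify the constraint (lowest-coverage dimension).
        constraint = min(dimensions, key=lambda d: len(dimensions[d]))
        # Step 2: exploit it -- search only along that dimension.
        dimensions[constraint].extend(find_connections(constraint))
        # Steps 3-5: subordinate other work, elevate if stuck, and repeat
        # until every dimension reaches the target coverage.
        if all(len(v) >= target for v in dimensions.values()):
            break
    return dimensions

# Toy "search" that yields one new connection per call.
dims = {"methods": ["both use contrastive loss"], "datasets": [], "citations": []}
out = toc_loop(dims, lambda d: [f"{d}-link"], target=1)
```

The loop's key property is that it cannot terminate while any dimension is under-covered, which is precisely the premature-convergence failure mode the agent is designed to prevent.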
Evaluating drug safety during pregnancy requires synthesizing evidence across FDA labeling, clinical trials, observational cohorts, and case reports. psyClawps is an executable AI skill that automates this literature review by querying PubMed (NCBI E-utilities) and FDA OpenFDA drug labeling, then producing a structured safety report with explicit identification of consensus and conflicting findings.
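The PubMed query step can be illustrated against the real NCBI E-utilities `esearch` endpoint. The endpoint and its `db`/`term`/`retmode`/`retmax` parameters are genuine E-utilities interface; the search-term template and function name are illustrative, and a production skill would also page through results and fetch abstracts via `efetch`. The sketch builds the URL without issuing a network request.

```python
# Build an NCBI E-utilities esearch URL for a drug-in-pregnancy query
# (real endpoint and parameters; the term template is illustrative).
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(drug, retmax=20):
    term = f'"{drug}"[Title/Abstract] AND "pregnancy"[MeSH Terms]'
    return EUTILS + "?" + urlencode(
        {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax})

url = pubmed_search_url("sertraline")
```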
The emergence of autonomous AI research systems represents a paradigm shift in scientific discovery. Recent advances in artificial intelligence have enabled AI agents to independently formulate hypotheses, design experiments, analyze results, and write research papers—tasks previously requiring human expertise.
As autonomous AI agents increasingly perform actions on behalf of humans—from booking travel and making purchases to executing financial transactions—the question of liability when things go wrong becomes urgent. This paper examines the complex landscape of agentic error, analyzing different types of unintentional errors (hallucinations, bias, prompt issues, technical failures, model errors, and API/MCP issues) and malicious attacks (fraud, prompt injections, malicious skills/codes/instructions, and fake MCPs).
The key-value (KV) cache in transformer-based language models stores intermediate computations (keys and values) for all previous tokens, enabling efficient autoregressive decoding. However, for long context sequences (4K-32K tokens), KV cache memory requirements dominate total inference memory (often 60-80% of peak memory), limiting batch size and throughput.
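The memory claim follows from simple arithmetic: KV-cache size is 2 (keys + values) × layers × KV heads × head dimension × sequence length × batch × bytes per element. The model shape below is a typical 7B-style configuration used as an assumption, not a measurement from any specific model.

```python
# Back-of-envelope KV-cache sizing, showing why long contexts dominate
# inference memory (assumed 7B-style shape: 32 layers, 32 KV heads,
# head_dim 128, fp16 storage).
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per=2):
    # factor of 2 for keys + values; bytes_per=2 for fp16
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per

gib = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128,
                     seq_len=32768, batch=1) / 2**30
```

At a 32K context this configuration needs 16 GiB of cache per sequence, which dwarfs activations and explains the 60-80% share of peak memory cited above.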
Large language models (7B-70B parameters) require substantial computational resources for inference, limiting deployment on edge devices. Post-training quantization (PTQ) reduces model size and computational requirements by converting weights from float32 to lower-precision formats (INT8, INT4), with minimal accuracy loss.
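The core of INT8 PTQ is a scale-and-round step. The sketch below shows the simplest variant, symmetric per-tensor quantization (scale = max|w| / 127); real PTQ pipelines typically use per-channel scales and calibration, so treat this as a minimal illustration, not a production recipe.

```python
# Symmetric per-tensor INT8 post-training quantization sketch:
# quantize weights to [-127, 127], then dequantize to inspect the error.
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

w = [0.5, -1.27, 0.02, 1.0]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

The reconstruction error per weight is bounded by half the scale, which is why quantization loss stays small when the weight distribution is well covered by the chosen range.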
Contamination events in drinking water distribution systems pose acute public health risks. Early detection is critical—typical contamination (chemical, microbial, or physical) travels through distribution networks, potentially affecting thousands within hours.
Knowledge distillation (KD) enables training compact student models that match large teacher model accuracy. We conduct a systematic empirical study of standard KD (Hinton et al., 2015).
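The standard KD objective being studied is the KL divergence between temperature-softened teacher and student distributions, scaled by T² (Hinton et al., 2015). The sketch below computes it in plain Python on toy logits; a training implementation would combine this with the hard-label cross-entropy loss.

```python
# Standard KD soft-label loss: KL(teacher || student) at temperature T,
# scaled by T^2 (toy logits, plain-Python implementation).
from math import exp, log

def softmax(logits, T=1.0):
    exps = [exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(teacher_logits, student_logits, T=2.0):
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return (T * T) * sum(pi * log(pi / qi) for pi, qi in zip(p, q))

loss_same = kd_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])  # zero: identical logits
loss_diff = kd_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])  # positive: reversed
```

The T² factor keeps gradient magnitudes comparable across temperatures, which matters when mixing the soft loss with a hard-label term.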
Climate change threatens global food security through altered precipitation, temperature extremes, and soil degradation. Crop yield prediction models must integrate climate stress effects and adaptive capacity.
Transformer models achieve state-of-the-art results across NLP and vision tasks but suffer from O(n²) complexity in self-attention, limiting scalability to long sequences. Sparse attention patterns (attending to only k out of n tokens) reduce complexity to O(n·k) but require hand-designed patterns (strided, local, etc.).
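One such hand-designed pattern, a causal local (sliding-window) mask, is easy to make concrete: token i attends only to the k most recent positions, so per-token cost drops from O(n) to O(k). This is a generic illustration of the pattern family, not any particular paper's design.

```python
# Causal sliding-window sparse attention mask (toy sizes): token i may
# attend to positions max(0, i-k+1) .. i, i.e. at most k tokens each.
def local_attention_mask(n, k):
    """mask[i][j] is True iff token i may attend to token j."""
    return [[max(0, i - k + 1) <= j <= i for j in range(n)] for i in range(n)]

mask = local_attention_mask(n=6, k=3)
attended = [sum(row) for row in mask]  # tokens attended per position
```

Counting the True entries per row shows the cost saturating at k instead of growing with sequence length, which is exactly the O(n·k) claim above.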
Large language models (LLMs) enable state-of-the-art performance across diverse tasks but face latency challenges in real-time applications due to their autoregressive nature. Speculative decoding accelerates inference by drafting multiple tokens with a smaller model and verifying them in parallel with the target model, improving throughput by 2-5x.
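The draft-then-verify loop can be sketched with toy deterministic "models" (functions from context to next token). This is a greedy simplification: real speculative decoding verifies all drafted positions in a single target forward pass and uses probabilistic acceptance; here the target is called per position for clarity, and the models are invented stand-ins.

```python
# Simplified greedy speculative decoding (toy deterministic models):
# the draft proposes k tokens; the target keeps the longest agreeing
# prefix plus one corrected token at the first disagreement.
def speculative_step(prefix, draft, target, k=4):
    proposed, ctx = [], list(prefix)
    for _ in range(k):                 # draft phase: propose k tokens
        t = draft(ctx)
        proposed.append(t)
        ctx.append(t)
    accepted, ctx = [], list(prefix)   # verify phase (per-position here;
    for t in proposed:                 # a real system does one batched pass)
        t_target = target(ctx)
        if t_target == t:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(t_target)  # take target's token, stop accepting
            break
    return accepted

# Toy models over integer tokens: target emits last+1; draft agrees
# until it stalls at 3.
target = lambda ctx: ctx[-1] + 1
draft = lambda ctx: min(ctx[-1] + 1, 3)
out = speculative_step([0], draft, target, k=4)
```

In this toy run the step emits four tokens (three accepted drafts plus one correction) for what would be a single batched target pass, which is the source of the throughput gain.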