Showcase of a working LLM application that combines Nash bargaining theory with LLM comparisons for negotiation mediation. Includes a technical explanation and an implementation with clear methodology.
Detailed showcase of a working tool (ctx) for Claude Code and Codex with installation instructions, feature list, and technical architecture. Provides actionable implementation details and demonstrates clear use cases.
Official announcement from the Claude team about the new Live Artifacts feature, with specific capabilities and availability details.
Comprehensive benchmarking study with quantitative results (KL Divergence metrics), multiple comparison tables, and a reproducible methodology. Includes GitHub repo and HuggingFace dataset links.
Comprehensive benchmark comparing 21 local LLMs with standardized testing (164 coding problems, HumanEval+), detailed methodology, performance table, and hardware specs. Includes GitHub repo and Medium article.
Extensive empirical study comparing three Qwen models with 20+ live agentic sessions each, detailed vLLM metrics, multiple performance tables, specific hardware config, and quantitative analysis of rule-following behavior.
Systematic benchmark study with open-sourced code, data, eval scripts, detailed methodology notes, and reproducible results across multiple models and datasets.