Research Topic 2
Benchmarking
Legal reasoning spans rule extraction, statutory interpretation, analogical reasoning, judgment under ambiguity, and more. Most academic benchmarks measure only outcomes, not the reasoning that produced them. This track covers our work building evaluation infrastructure that distinguishes reaching the right answer through sound reasoning from coincidental correctness. It includes domain-specific benchmarks, reliability frameworks, and studies of LLM capabilities in specialized legal contexts.