Under review at ICML 2026 Workshop on Mechanistic Interpretability, 2026
We argue for a disciplined import of network-neuroscience tools rather than a loose brain analogy, specify the transformer graph contract, and state eight testable translations with failure criteria.
Across three open instruction models, holding the intervention fixed and varying the demonstration environment moves the self-report between target and source mechanisms. Self-report benchmarks should require environment-shift invariance under fixed intervention.
Under review at ICML 2026 Workshop on Mechanistic Interpretability, 2026
Moral framing is linearly decodable in pretrained Gemma-3-4B but has no causal effect on its judgment; in the instruction-tuned checkpoint that same representation becomes causally usable. Within-model framing-judgment alignment is 8.4x larger in IT than in the matched pretrained checkpoint.
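The "linearly decodable" claim can be illustrated with a standard linear probe. This is a minimal sketch on synthetic data, not the paper's actual setup: the hidden states, the injected framing direction, and all dimensions here are stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 16  # toy hidden-state dimension (illustrative, not the model's)
# Simulate hidden states where one framing condition shifts activations
# along a fixed direction -- the signature of linear decodability.
direction = rng.normal(size=d)
X = rng.normal(size=(200, d))
y = rng.integers(0, 2, size=200)           # binary framing label
X += np.outer(y, direction)                # inject a linearly decodable signal
probe = LogisticRegression(max_iter=1000).fit(X, y)
acc = probe.score(X, y)                    # high accuracy => decodable
```

Note that high probe accuracy alone says nothing about causal use, which is exactly the gap the pretrained-vs-instruction-tuned comparison targets.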
An evaluation protocol, documentation object, executable audit package, and claim-specific evidential standard for safe fine-tuning defenses. In a 46-cell audit on Gemma-2-2B-it, no cell satisfies the strict conjunction.
We localize an eval-vs-deploy routing signal to a narrow mid-depth attention window and a low-dimensional subspace installed by fine-tuning. Clamping the subspace at inference reduces the gap in 11 of 12 architecture-behavior cells.
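Clamping a low-dimensional subspace at inference can be sketched as follows. This is a generic projection-based clamp under the assumption of an orthonormal basis `U` for the subspace; the function and variable names are illustrative, not the paper's.

```python
import numpy as np

def clamp_subspace(h, U, target):
    """Replace h's component in the subspace spanned by the orthonormal
    columns of U with fixed coordinates `target`, leaving the orthogonal
    complement of h untouched."""
    coords = U.T @ h                  # h's current coordinates in the subspace
    return h + U @ (target - coords)  # swap them for the clamped target

# Toy example: clamp the first two coordinates of a 4-d activation to zero.
U = np.eye(4)[:, :2]                  # orthonormal basis for the subspace
h = np.arange(4.0)                    # activation [0., 1., 2., 3.]
h_clamped = clamp_subspace(h, U, np.zeros(2))
```
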
A matched-action evaluation protocol that formalizes grounded auditing as an interface policy I = (O, R, A, G). Evidence access improves correction; citation gates reduce over-trust mainly by inducing abstention.
AI deployment in sensitive domains is often treated as unsafe to authorize until model internals can be explained. We argue the gate should be calibrated verification, and propose Verification Coverage, a six-component reportable standard.
A reference architecture for agentic hybrid retrieval combining BM25 lexical search with dense-embedding retrieval via reciprocal rank fusion, orchestrated by an LLM controller.
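The fusion step combining the BM25 and dense rankings can be sketched with the standard reciprocal rank fusion formula, score(d) = Σᵢ 1/(k + rankᵢ(d)). The constant k = 60 and the document identifiers below are illustrative assumptions, not details from the architecture itself.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: each document accumulates
    1/(k + rank) from every list it appears in."""
    scores = defaultdict(float)
    for ranked_docs in rankings:
        for rank, doc in enumerate(ranked_docs, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from the two retrievers.
bm25_hits = ["d1", "d2", "d3"]
dense_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Because RRF operates on ranks rather than raw scores, it needs no score normalization across the lexical and dense retrievers, which is why it is a common choice for this kind of hybrid setup.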
CAKE comprises 188 expert-validated questions spanning four cognitive levels and five cloud-native topics, evaluated across 22 model configurations from four families.
We survey agentic coding tools and identify five mechanisms by which they make implicit architectural choices, then analyze prompt-architecture coupling. Six recurring patterns arise. We call this vibe architecting.
29th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES 2025), 2025
A three-class segmentation framework distinguishing sea, inland water, and land. DeepLabv3+ and a hybrid ResNet-UNet outperformed the other eight models evaluated.
A systematic mapping of algorithmic trends to GI imaging techniques, with quantitative analysis of the relationship between dataset size and performance, and of translational enablers.
Biomedical Signal Processing and Control (Under Revision), 2025
A carefully controlled experiment on segmenting the layers of the artery wall from only nine annotated histology images, comparing standard CNNs pretrained on a large histology corpus against a vision foundation model under a systematic prompting curriculum.
Under review at Computers and Electronics in Agriculture, 2025
We benchmark 19 traditional ML algorithms on dual-task prediction using the DeepHS Fruit dataset across five species. ExtraTrees with stratified resplit achieves 75.00% overall accuracy, surpassing Fruit-HSNet.
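The ExtraTrees-with-stratified-resplit setup can be sketched as below. The iris dataset stands in for the DeepHS Fruit hyperspectral data, and the split ratio and hyperparameters are illustrative assumptions, not the benchmark's configuration.

```python
from sklearn.datasets import load_iris  # stand-in for the DeepHS Fruit data
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# Stratified resplit: class proportions are preserved in train and test,
# which matters when per-species sample counts are imbalanced.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
clf = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
```
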
FYI is a browser extension that bridges automated and manual fact-checking through four complementary tools. In an N=22 think-aloud study, participants adopted three workflow archetypes.