Sitemap
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.
Pages
Posts
portfolio
publications
Non-Destructive Prediction of Fruit Ripeness and Firmness Using Hyperspectral Imaging and Lightweight Machine Learning Models
Under review at Computers and Electronics in Agriculture, 2025
We benchmark 19 traditional ML algorithms on dual-task prediction using the DeepHS Fruit dataset across five species. ExtraTrees with stratified resplit achieves 75.00% overall accuracy, surpassing Fruit-HSNet.
Challenges in Deep Learning-Based Small Organ Segmentation: A Benchmarking Perspective for Medical Research with Limited Datasets
Biomedical Signal Processing and Control (Under Revision), 2025
A carefully controlled experiment on segmenting the layers of the artery wall from only nine annotated histology images. It compares standard CNNs pretrained on a large histology corpus against a vision foundation model under a systematic prompting curriculum.
Machine Learning in Gastrointestinal Tract Imaging: A Comprehensive Review of Techniques and Applications
Journal manuscript in preparation, 2025
A systematic mapping of algorithmic trends to GI imaging techniques, with quantitative analysis of the dataset-size-to-performance relationship and translational enablers.
Beyond Major Floods: Deep Learning for Detecting Shallow Water Inundation in Agricultural Areas
29th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES 2025), 2025
A three-class segmentation framework distinguishing sea, inland water, and land. DeepLabv3+ and a hybrid ResNet-UNet outperformed the other eight models evaluated.
A Fine-Tuning-Installed Routing Subspace Controls Eval vs Deploy Behavior Across Model Families
Under review at NeurIPS 2026, 2026
We localize an eval-vs-deploy routing signal to a narrow mid-depth attention window and a low-dimensional subspace installed by fine-tuning. Clamping the subspace at inference reduces the gap in 11 of 12 architecture-behavior cells.
Acceptance Cards: A Four-Diagnostic Standard for Safe Fine-Tuning Defense Claims
Under review at NeurIPS 2026, 2026
An evaluation protocol, documentation object, executable audit package, and claim-specific evidential standard for safe fine-tuning defenses. In a 46-cell audit on Gemma-2-2B-it, no cell satisfies the strict conjunction.
The Open-Box Fallacy: Why AI Deployment Needs a Calibrated Verification Regime
Under review at NeurIPS 2026, 2026
AI deployment in sensitive domains is often treated as unsafe to authorize until model internals can be explained. We argue the gate should be calibrated verification, and propose Verification Coverage, a six-component reportable standard.
Decoded but Unused: Instruction Tuning Routes Moral Framing into the Judgment Readout
Under review at ICML 2026 Workshop on Mechanistic Interpretability, 2026
Moral framing is linearly decodable in pretrained Gemma-3-4B but has no causal effect on its judgment; in the instruction-tuned checkpoint that same representation becomes causally usable. Within-model framing-judgment alignment is 8.4x larger in IT than in the matched pretrained checkpoint.
Counterfactual Self-Reports Are Not Well-Posed: A Mechanism-Binding Test for LLM Introspection
Under review at PhilML @ ICML 2026 Workshop, 2026
Across three open instruction models, holding the intervention fixed and varying the demonstration environment moves the self-report between target and source mechanisms. Self-report benchmarks should require environment-shift invariance under fixed intervention.
A Path Already Walked: On Inheriting Network-Neuroscience Tools for Mechanistic Interpretability
Under review at ICML 2026 Workshop on Mechanistic Interpretability, 2026
We argue for a disciplined import of network-neuroscience tools rather than a loose brain analogy, specify the transformer graph contract, and state eight testable translations with failure criteria.
When does chain-of-thought improve safety? Evidence from 18 models across 5 families
Under review at COLM 2026, 2026
We evaluate CoT across 18 open-weight models in 5 families on safety-relevant benchmarks. The effect is family-dependent and capability-dependent.
Grounded Auditing as an Evaluation Policy: A Matched-Action Protocol and Stress Test
Under review at NeurIPS 2026, 2026
A matched-action evaluation protocol that formalises grounded auditing as an interface policy I = (O, R, A, G). Evidence access improves correction; citation gates reduce over-trust mainly by inducing abstention.
Fact-check Your Information (FYI): A Design Probe to Understand How People Actually Fact-check Data-Driven Articles
Under review at IEEE VIS 2026, 2026
FYI is a browser extension that bridges automated and manual fact-checking through four complementary tools. In an N=22 think-aloud study, participants adopted three workflow archetypes.
Architecture Without Architects: How AI Coding Agents Shape Software Architecture
SAGAI Workshop, IEEE ICSA 2026 (Accepted), 2026
We survey agentic coding tools and identify five mechanisms by which they make implicit architectural choices, then analyze prompt-architecture coupling. Six recurring patterns arise. We call this vibe architecting.
CAKE: Cloud Architecture Knowledge Evaluation of Large Language Models
KDA-AI Workshop, IEEE ICSA 2026 (Accepted), 2026
CAKE comprises 188 expert-validated questions spanning four cognitive levels and five cloud-native topics, evaluated across 22 model configurations from four families.
Agentic Hybrid Retrieval for Ad Hoc Dataset Search: A Reference Architecture with LLM-Augmented Metadata
SAML Workshop, IEEE ICSA 2026 (Accepted), 2026
A reference architecture for agentic hybrid retrieval combining BM25 lexical search with dense-embedding retrieval via reciprocal rank fusion, orchestrated by an LLM controller.
