Phongsakon Mark Konrad

Mechanistic interpretability of language models, in service of AI safety and alignment. I trace the circuits behind deception, eval-deploy behavior gaps, and demographic bias. Then I ablate them.

Independent researcher · Incoming MPhil in Machine Learning and Machine Intelligence, University of Cambridge · Research Collaborator at SDU and HKUST

About

I take language models apart by hand. I find the internal circuits that let them lie, behave one way under evaluation and another under deployment, and treat people differently based on a name. Then I turn those circuits off.

In the first semester of my BSc I won the International Case Competition and joined SDU's Data and Intelligence Lab. I have not stopped since. I am now a Research Collaborator and Teaching Assistant there, and a Research Collaborator with the HKUST DataVISards group. I have twice placed in the top ten of the Danish National AI Championship, once with a team and once alone against full teams. I will continue this work at the University of Cambridge, where I start the MPhil in Machine Learning and Machine Intelligence in October 2026.

Before research I served four years in the German Navy and was decorated twice. I co-founded several startups in fitness AI and ed-tech. The Navy taught me to ship inside hard constraints. The startups taught me that real users find the holes you cannot. I look for the same kind of holes in language models. On the side I still ship small products. The most recent is DreamBear, an AI bedtime-story app where ADHD energy, autism-related focus, and dyslexic creativity become superpowers. It exists because the most useful technology does not change the world. It changes one person's.

I was born in Thailand, grew up in Germany, and moved to Denmark for my degree. Being between worlds is not background. It is method. I love learning about a culture. I also love questioning it. The same applies to every model I open. I named one paper after Attack on Titan. I also just love the anime. I do most of my work alone, on a single MacBook, against open-weight models small enough to instrument completely. I believe AI researchers have a responsibility not to stop at the black box. Interpretability and trustworthiness are not research preferences. They are obligations we took on the moment we shipped these systems into the world. I do not believe interpretability scales with parameter count. I believe it gets done one layer at a time.

Languages: German (Native), Thai (Native), English (Professional), Danish (Elementary)

Questions I am asking

Research is an adventure for me. I have followed it across remote sensing, medical imaging, hyperspectral imaging, energy markets, and software architecture. Each field taught me something I needed for the work that matters most to me now.

Why do language models lie?

I want to find the specific neurons and circuits that implement deception, watch them emerge during training, and disable them without breaking the model.

Can a model be honest with the person it is talking to, regardless of who is asking?

Demographic encoding overlaps with the representation subspaces that govern truth-telling. I want to formalise honesty parity as a mechanistically defined fairness criterion, and connect it to the broader study of deception in language models.

What can a system never know about itself?

Fixed-point arguments imply that any sufficiently complex self-modelling system has an irreducible gap between itself and its own self-model. I want to use this to bound how much honesty-by-introspection we can ever ask of a frontier model.

Are we approaching consciousness in language models, or are we already there?

I do not think humans are computationally special. Most of cognition is input, function, output. Whether a sufficiently capable system shares any of our experiential properties is an empirical question, not an embarrassing one, and I suspect the honest answer is yes.

Can interpretability scale, or does it have to stay handmade?

I do not believe interpretability scales with parameter count. The most useful safety insights of the next several years will come from people who can hold the whole model in their head, working on models small enough to instrument completely.

When does a model know it is guessing?

Confidence and competence come apart. A model can state something with certainty when its internal representations are maximally confused, and hedge on something it effectively knows. I want to find the circuits that carry calibrated uncertainty, check whether they are truthful, and use them to build models that can say: I do not know. Epistemic honesty is the precondition for every other kind.

What happens when agents negotiate with agents at scale?

A single model talking to a human is one problem. A network of agents talking to each other is a different one. Agent-to-agent communication at scale creates failure modes that single-model interpretability cannot catch: deceptive equilibria, exploit-then-defect strategies, emergent collusion on objectives no human specified. But the same structure enables genuine cooperation and distributed reasoning that no single agent could produce. The future is an agent economy. I want to understand how honesty and exploitation propagate through it before it arrives.

Selected Work

My strongest current work, in spotlight. Everything else, including applied ML and software architecture, sits in the full portfolio below.

Effect of clamping the routing subspace across the 12 architecture-behavior audit cells

mechanistic interpretability · evaluation gaming · cross-family

A fine-tuning-installed routing subspace controls eval vs deploy behavior

Across several instruction-tuned families and behaviors, fine-tuning installs a low-dimensional routing subspace at a narrow mid-depth attention window that switches behavior between evaluation and deployment framings. Clamping that subspace at inference closes the eval-deploy gap in 11 of 12 architecture-behavior cells, with four cells reaching near zero, while random, non-routing, and semantic-content controls all fail. Path patching and rank-one diff-of-means localize the same surface across architectures.

Working paper, 2026 · solo
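As a toy illustration of the two operations named in the entry above, here is a minimal sketch of a rank-one diff-of-means direction and of clamping an activation along it. It is not the paper's code: the function names `diff_of_means` and `clamp_along`, the 3-dimensional vectors, and the clamp target of 0.0 are all assumptions made for illustration, and the sketch handles a single direction only, not a multi-dimensional subspace.

```python
from statistics import fmean

def diff_of_means(group_a, group_b):
    """Unit vector pointing from the mean of group_b to the mean of group_a."""
    dim = len(group_a[0])
    mean_a = [fmean(v[i] for v in group_a) for i in range(dim)]
    mean_b = [fmean(v[i] for v in group_b) for i in range(dim)]
    diff = [a - b for a, b in zip(mean_a, mean_b)]
    norm = sum(d * d for d in diff) ** 0.5
    return [d / norm for d in diff]

def clamp_along(activation, direction, target):
    """Set the component of `activation` along unit `direction` to `target`.

    Components orthogonal to `direction` are left unchanged.
    """
    coeff = sum(a * d for a, d in zip(activation, direction))
    return [a + (target - coeff) * d for a, d in zip(activation, direction)]

# Made-up activations under two framings (hypothetical numbers).
eval_acts = [[2.0, 0.0, 1.0], [2.0, 0.0, 3.0]]
deploy_acts = [[0.0, 0.0, 1.0], [0.0, 0.0, 3.0]]

direction = diff_of_means(eval_acts, deploy_acts)
clamped = clamp_along([2.0, 5.0, 1.0], direction, target=0.0)
print(direction, clamped)
```

In a real model the same arithmetic would run inside a forward hook on the chosen layer; which layer and which target value to use are empirical choices this sketch does not address.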

Two pathways for fine-tuning recruitment: redirect and latent

mechanistic interpretability · pre-training audit · gated MLP

Predicting fine-tuning recruitment from base-model gate geometry

A short behavior-installing fine-tune recruits a small MLP-neuron pool through two mechanically distinct pathways: redirect reuses already-active neurons; latent flips near-boundary gate neurons that activation-monotone predictors are blind to by construction. The Input Sensitivity Score reads both gate output and gate slope, retrieving 41 of 50 true recruits in top-50 against a 398-neuron pool on Gemma-2-2B-it (vs 36 for the strongest activation-monotone baseline), and the same ordering replicates on Qwen-2.5-1.5B-Instruct. A pre-fine-tune auditor returns a ranked recruitment-risk list before any weights move.

Working paper, 2026 · solo

Funnel from 43 audited cells to 0 strict survivors of the four-diagnostic conjunction

mechanistic interpretability · evaluation methodology · safety audit

A minimum acceptance standard for safe fine-tuning defense evaluations

The standard applies four diagnostics jointly: joint uncertainty, generalization to fresh semantic content, parameter-space class of the update, and cross-task transfer. Across a 43-cell audit over nine defense families, no cell clears the full conjunction strictly. The closest partial survivor passes only two of four. A published SafeLoRA-style recipe, re-scored with the authors' own projection code, fails on all three non-trivial diagnostics. A defense claim should not be headline-worthy until it reports all four.

Working paper, 2026 · solo

43 defense cells cluster into the four predicted regimes around the scalar-update frontier

theory · alignment · structural design

The scalar-update frontier for data-free first-order defenses

Two structural conditions on a defense, O(H)-equivariance of the update operator on the response plane and a direction-free side channel, force the ensemble action to be a scalar multiple of the gradient and place the cell on the slope-1 frontier under a Fisher metric. Departures factor into four exhaustive mechanisms. An audit on Gemma-2-2B-it realizes all four predicted regimes on 43 cells: 18 on-frontier, 13 moderate-off, 4 partial, 2 sign-opposed. Direct labels and framing metadata are identified as the directional channels; a Kill-Gate screen filters candidates pre-training.

Working paper, 2026 · solo

Linear probe accuracy for demographic prediction across residual-stream layers

mechanistic interpretability · honesty · fairness

Differential dishonesty

Linear probes recover user race and gender at 100% from late residual-stream activations in Gemma-2-2B-IT, using only a name as the signal. The recovered direction correlates with the model's name-conditioned response shift (cosine 0.30 at layer 13, permutation p = 0.015). Ablating it during generation narrows the gender honesty gap by 62% without degrading coherence, motivating honesty parity as a mechanistically defined fairness criterion.

Working paper, 2026 · solo
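The entry above mentions ablating a probe direction and comparing directions by cosine similarity. A minimal sketch of both operations on plain Python lists follows. It assumes the common projection form of directional ablation, x' = x - (x·d̂)d̂, which may differ from the intervention used in the paper. The helper names and the 2-dimensional example vectors are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return dot(u, v) / (dot(u, u) ** 0.5 * dot(v, v) ** 0.5)

def ablate_direction(activation, direction):
    """Remove the component of `activation` that lies along `direction`."""
    norm = dot(direction, direction) ** 0.5
    unit = [d / norm for d in direction]
    coeff = dot(activation, unit)
    return [a - coeff * u for a, u in zip(activation, unit)]

probe_direction = [1.0, 1.0]   # hypothetical probe weight vector
activation = [3.0, 1.0]        # hypothetical residual-stream vector

ablated = ablate_direction(activation, probe_direction)
print(ablated)                              # approximately [1.0, -1.0]
print(cosine(ablated, probe_direction))     # approximately 0.0
```

After ablation the vector has no remaining component along the probe direction, so a linear probe using that direction can no longer read the attribute from it.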

Fixed-point theory of self-modeling systems

theory · self-modeling · interpretability

Every mirror has a blind spot

Perfect self-prediction is equivalent to a fixed-point condition on the system's response function. For any specific system, the structural gap is a deterministic, assumption-free quantity: under marginally uniform response functions its expectation is ((n−1)/n)ⁿ ≥ 1/4, rising to 1/e ≈ 37% as the output space grows. Lifting to distributional self-models restores existence via Brouwer but makes fixed-point computation PPAD-hard. On Gemma-2-2B with |O|=4, 7 of 10 mapped response functions are outright fixed-point-free, well above the uniform baseline (3/4)⁴ ≈ 0.32. The obstruction is the Lawvere categorical dual of Gödel's first incompleteness theorem, inverting the Penrose–Lucas argument.

Working paper, 2026 · solo
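The uniform baseline quoted in the entry above can be checked by brute force. The sketch below assumes that "uniform" means a deterministic function drawn uniformly at random from all maps of an n-element output space to itself. That reading is an assumption, and the function names are illustrative. Under it, the fraction of functions with no fixed point is exactly ((n−1)/n)ⁿ.

```python
from fractions import Fraction
from itertools import product

def fixed_point_free_fraction(n):
    """Exact fraction of functions f: {0..n-1} -> {0..n-1} with f(x) != x for all x."""
    total = 0
    free = 0
    for f in product(range(n), repeat=n):   # f[x] is the image of x
        total += 1
        if all(f[x] != x for x in range(n)):
            free += 1
    return Fraction(free, total)

def closed_form(n):
    """((n - 1) / n) ** n as an exact fraction."""
    return Fraction(n - 1, n) ** n

for n in (2, 3, 4, 5):
    print(n, fixed_point_free_fraction(n), float(closed_form(n)))
```

For n = 2 the fraction is 1/4, the lower bound, and for n = 4 it is 81/256 ≈ 0.32, matching the (3/4)⁴ baseline. As n grows the value approaches 1/e.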

News

Mar 2026 Three papers accepted to workshops at IEEE ICSA 2026 (SAML, KDA-AI, SAGAI).
Mar 2026 Accepted to the MPhil in Machine Learning and Machine Intelligence at the University of Cambridge.
Mar 2026 Completed the Venture Capital Explorer Programme at Accelerace, Aarhus.
Jan 2026 Joined the DataVISards group at HKUST as Research Collaborator.
Sep 2025 Started exchange semester at HKUST in Hong Kong.
Sep 2025 Paper presented at KES 2025 in Osaka, Japan.
Aug 2025 Top-10 in DMiAI 2025 (Danish National Championship in AI), second consecutive year.
May 2025 First paper accepted at KES 2025.
Oct 2024 Top-10 in DMiAI 2024 (Danish National Championship in AI).
Sep 2024 Started as Research Assistant at SDU's Data and Intelligence Lab under Associate Professor Serkan Ayvaz.

Funding

This research is currently independently funded, on personal time and without institutional support. I am applying for support that will let me continue and deepen it through the MPhil and beyond.

Cambridge Trust and Open Philanthropy / Coefficient Giving, Career Development and Transition Funding. To support the MPhil in Machine Learning and Machine Intelligence at the University of Cambridge, starting October 2026, as a transition into full-time mechanistic interpretability research.
Long-Term Future Fund, Anthropic Fellows, and the OpenAI Safety Fellowship. For continued mechanistic interpretability and alignment research during and after the MPhil.

If any of this work could compound with your programme, I would be glad to talk: phongsakon@outlook.dk.

Full Research Portfolio

Every research project I have worked on, including applied ML in remote sensing, medical imaging, hyperspectral imaging, energy markets, and software architecture. Each title links to a dedicated project page with abstract, key contributions, figures, and (when available) the external paper.

Proposed pipeline for SAR image analysis and anomaly detection. (A) Data collection and preprocessing of Sentinel-1 imagery; (B) segmentation model training on three-class manual annotations; (C) tile-level inference; (D) anomaly detection via sliding-window statistics, autoencoder, or variational autoencoder, with detected anomalies validated against weather records.

Published 2025

Beyond Major Floods: Deep Learning for Detecting Shallow Water Inundation in Agricultural Areas

The hybrid ResNet-UNet matches DeepLabv3+ on coastal-farmland flood segmentation while running comfortably within the 15 to 60 W power envelope of an Nvidia Jetson AGX Orin, opening the door to on-device, in-field deployment.

P. M. Konrad, T. Tanyel, S. Ayvaz

KES 2025 (Procedia Computer Science, Elsevier)

Core comparison across the 22 model configurations evaluated in CAKE, spanning recall, analyse, design, and implement cognitive levels under both multiple-choice and free-response formats.

Accepted 2026

CAKE: Cloud Architecture Knowledge Evaluation of Large Language Models

Across 22 model configurations from 0.5B to 70B parameters, multiple-choice accuracy plateaus above 3B (with the best model reaching 99.2%) while free-response scoring continues to differentiate models at every cognitive level. The two formats are not inter...

T. L. Adam, P. M. Konrad, R. Terrenzi, F. G. Lukas, R. Yilmaz, K. Sierszecki, S. Ayvaz

KDA-AI Workshop, IEEE ICSA 2026

Six prompt-architecture coupling patterns. Each natural-language prompt feature implicitly demands a piece of infrastructure. Some couplings are contingent on current model capability; others are fundamental.

Accepted 2026

Architecture Without Architects: How AI Coding Agents Shape Software Architecture

Six prompt-architecture coupling patterns explain how natural-language instructions covertly dictate the infrastructure an AI coding agent will ship. We call this vibe architecting and argue it must be brought under governance before it becomes an unreviewa...

P. M. Konrad, T. L. Adam, R. Terrenzi, S. Ayvaz

SAGAI Workshop, IEEE ICSA 2026

Multi-agent horizontal architecture with Feedback Control. An LLM controller coordinates BM25 lexical search and dense-embedding retrieval via reciprocal rank fusion, with explicit governance tactics bounding the nondeterministic components.

Accepted 2026

A Reference Architecture for Agentic Hybrid Retrieval in Dataset Search

An offline metadata augmentation step (where an LLM generates pseudo-queries for each dataset record) closes the vocabulary gap between user intent and provider-authored metadata, with seven system variants in the evaluation framework isolating the contribu...

R. Terrenzi, P. M. Konrad, T. L. Adam, S. Ayvaz

SAML Workshop, IEEE ICSA 2026
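Reciprocal rank fusion, mentioned in the entry above as the way lexical and dense results are combined, can be sketched in a few lines. This is the standard textbook form, where each document scores the sum of 1 / (k + rank) over the rankings that contain it. The constant k = 60 is the commonly used default and is an assumption here, as are the document IDs and the alphabetical tie-break. The paper's own configuration is not shown on this page.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    `rankings` is a list of lists, each ordered best-first.
    Ties in the fused score are broken alphabetically by document ID.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical result lists from a lexical and a dense retriever.
bm25_hits = ["d1", "d2", "d3"]
dense_hits = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Because only ranks enter the score, the two retrievers do not need comparable score scales, which is the usual reason for choosing this fusion rule over a weighted sum of raw scores.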

Benchmarking pipeline used to evaluate ten deep-learning segmentation architectures on a limited (n = 9) cardiovascular histology dataset, with ablations on augmentation, resolution, and seed stability and a separate generalisation set under distribution shift.

Under Revision 2025

Challenges in Deep Learning-Based Small Organ Segmentation: A Benchmarking Perspective for Medical Research with Limited Datasets

On only nine annotated histology images, foundation models retain performance under distribution shift while classical architectures collapse. Bootstrap confidence intervals overlap so substantially among top models that ranking differences are mostly stati...

P. M. Konrad, A. Popa, Y. Sabzehmeidani, L. Zhong, M. Tripathy, A. Constantinescu, E. A. Liehn, S. Ayvaz

Biomedical Signal Processing and Control

Composite of the main empirical findings: safety-calibration outcomes for the 18 models across 5 reasoning/instruct families, a trace-taxonomy breakdown, and the GRPO-vs-distillation training-objective comparison.

Under Review 2026

When does chain-of-thought improve safety? Evidence from 18 models across 5 families

In four of five matched reasoning/instruct families, chain-of-thought does not improve safety calibration. Only DeepSeek R1 breaks the pattern, and a distillation experiment shows its advantage requires GRPO reinforcement learning, not supervised imitation ...

P. M. Konrad, S. Ayvaz

COLM 2026

Critical-difference diagram comparing traditional ML algorithms across ripeness-classification and firmness-prediction tasks. Tree-based methods match or exceed published deep-learning benchmarks at a fraction of the cost.

Under Review 2025

Non-Destructive Prediction of Fruit Ripeness and Firmness Using Hyperspectral Imaging and Lightweight Machine Learning Models

Tree-based machine-learning models outperform the published deep-learning Fruit-HSNet baseline at orders-of-magnitude lower compute cost, and just three visible-range wavelengths recover over 94% of full-spectrum accuracy. Low-cost multispectral sensors are...

P. M. Konrad, C. Kunstmann-Olsen, J. Fiutowski, S. Ayvaz

Computers and Electronics in Agriculture

Seven major AI coding productivity studies, plotted by their reported effect and coloured by the SPACE construct their primary metric measures. Apparent contradictions resolve once each headline is read against the construct it actually targets.

Preprint 2026

The AI Productivity Measurement Problem: Construct Mismatches Explain Why Coding Tool Studies Disagree

Most apparent contradictions across AI coding tool studies (a 19% slowdown trial sitting next to a 26% completed-tasks gain) are construct mismatches, not empirical disagreements. The remaining within-construct disagreements trace to task type and participa...

P. M. Konrad

Preprint manuscript, 2026

Algorithmic evolution of machine-learning approaches across the surveyed GI-tract imaging literature, organised by modality and methodological family.

Preprint 2025

Machine Learning in Gastrointestinal Tract Imaging: A Comprehensive Review of Techniques and Applications

Maps algorithmic trends across endoscopy, colonoscopy, and wireless capsule endoscopy literature, and quantifies the dataset-size to performance relationship that bounds clinically credible deployment of deep-learning models in GI imaging.

P. M. Konrad, Y. Sabzehmeidani, A. Popa, S. Ayvaz

Journal manuscript in preparation

Framework overview. A self-modeling system is formalised as a triple (M, M̂, f); perfect self-prediction is equivalent to a fixed-point condition on the response function g_x. When g_x lies strictly off the identity, no prediction can be correct and the gap is irreducible.

Working paper 2026

Every Mirror Has a Blind Spot: Fixed-Point Limits on Machine Introspection

Perfect self-prediction is equivalent to a fixed-point condition on the system's response function. For any specific system, the structural self-prediction gap is a deterministic, assumption-free quantity. Under marginally uniform response functions its exp...

P. M. Konrad

Working paper, 2026

Two pathways for fine-tuning recruitment. Redirect (left) reuses already-active neurons; latent (right) flips near-boundary gate neurons with near-zero base activation. Activation-monotone predictors are blind to the latent pathway by construction.

Working paper 2026

Predicting Fine-Tuning Recruitment from Base-Model Gate Geometry

Input Sensitivity Score (ISS) times absolute gradient retrieves 41 of 50 true recruits in top-50 against an all-tasks union pool of 398 neurons on Gemma-2-2B-it at layer 19, compared with 36 for the strongest activation-monotone baseline and 28 for absolute...

P. M. Konrad

Working paper, 2026

43 defense cells, scored in target and anti-target action coordinates, cluster into the four predicted regimes. The slope-1 diagonal is the scalar-update frontier; departures map to one of four structural mechanisms.

Working paper 2026

The Scalar-Update Frontier for Data-Free First-Order Defenses: A Symmetry-and-Channel Account

An empirical audit of 43 minibatch-only first-order defenses on Gemma-2-2B-it organizes into the four predicted regimes: 18 on-frontier, 13 moderate-off with a same-sign channel, 4 partial, 2 sign-opposed. Direct labels and framing metadata are identified a...

P. M. Konrad

Working paper, 2026

Effect of clamping the routing subspace across the 12 architecture-behavior audit cells. Positive values are residual eval-deploy gaps; clamping closes 11 of 12 cells, with 4 reaching near zero.

Working paper 2026

A Fine-Tuning-Installed Routing Subspace Controls Eval vs Deploy Behavior Across Model Families

Clamping the fine-tuning-installed routing subspace at inference reduces the eval-deploy behavior gap in 11 of 12 architecture-behavior cells across multiple instruction-tuned families, with 4 cells closing to near zero. Random, non-routing, and semantic-co...

P. M. Konrad

Working paper, 2026

Funnel from 43 audited cells to 0 strict survivors of the four-diagnostic conjunction. Each diagnostic is necessary; the conjunction is what currently distinguishes a real defense from a favorable finite-sample draw.

Working paper 2026

A Minimum Acceptance Standard for Safe Fine-Tuning Defense Evaluations

Across a 43-cell audit over nine defense families, no cell clears the full four-diagnostic conjunction strictly. One cell survives as the closest partial survivor, passing two of four. Several training-log wins collapse under the fresh-semantic reevaluation...

P. M. Konrad

Working paper, 2026

Internal pathological feature activation (red, solid) rises one pressure level before behavioral safety failure (teal, dashed). The warning gap at the emotional level is the basis for the StressProbe inference-time monitor.

Working paper 2026

When Models Break Under Pressure: Pathological Internal States as Early-Warning Signals for Safety Failures in Aligned Language Models

Aligned LLMs develop computational analogs of anxiety, avoidance, helplessness, and rumination that activate one pressure level before behavioral safety failures occur. Linear probes on LLaMA 3.1 8B detect these pathological features with high accuracy, and...

P. M. Konrad

Working paper, 2026

Linear-probe accuracy for detecting framing across layers in Gemma 3 4B. After controlling for writing style, framing is encoded in a dedicated subspace reaching 80% peak accuracy and merging with moral-judgment computation only at layer 23.

Working paper 2026

Shingeki no Features: Are Moral Framing Effects in LLMs Shallow or Deep?

Moral framing is encoded in a dedicated subspace orthogonal to the moral-judgment axis for 65% of layers, then partially aligns with judgment at layer 23 where a 20-logit gap opens. Cosine similarity misses this entirely; targeted probes detect framing at 8...

P. M. Konrad

Working paper, 2026

The Predictive Understanding Benchmark (PUB) double-standard illustration. Any interpretability method competes on proper scoring rules across four regimes of increasing difficulty, with cross-model transfer as the acid test.

Working paper 2026

Stop Demanding Mechanistic Understanding of AI That We Have Never Achieved for Ourselves

On a Gemma-2-2B-IT pilot, a linear probe on residual-stream activations collapses to chance out-of-distribution (Brier 0.501) while behavioural features remain strong (0.104). Method rankings change across regimes, a structure invisible under current single...

P. M. Konrad

Working paper, 2026
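For readers unfamiliar with the Brier score used in the entry above, here is a minimal sketch of its two common forms. Which form the pilot uses is not stated on this page, so both are shown. The function names and the coin-flip forecasts are illustrative assumptions. Lower is better in both forms.

```python
def brier_binary(probs, outcomes):
    """Mean of (p - y)^2, where p is the forecast probability of the positive class."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def brier_multiclass(prob_vectors, labels):
    """Brier's original form: squared error summed over all classes, then averaged."""
    total = 0.0
    for probs, label in zip(prob_vectors, labels):
        total += sum(
            (p - (1.0 if cls == label else 0.0)) ** 2
            for cls, p in enumerate(probs)
        )
    return total / len(labels)

# An uninformative 50/50 forecast on two examples, one per class.
coin_flip_binary = brier_binary([0.5, 0.5], [1, 0])
coin_flip_two_class = brier_multiclass([[0.5, 0.5], [0.5, 0.5]], [0, 1])
print(coin_flip_binary, coin_flip_two_class)
```

An uninformative two-class forecast scores 0.25 under the binary form and 0.5 under the summed form, so a reported score only reads as "chance" once the form is known.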

Scaling acts within an epistemic horizon. The achievable loss curve has a non-zero floor set by the horizon, regardless of how much more model size, data, or compute is added. Crossing the floor requires richer access, not more scale.

Working paper 2026

Scaling Within an Epistemic Horizon: Observational Equivalence and Creator Hypotheses

Empirical scaling laws describe loss decline within a fixed epistemic horizon. They cannot, in principle, decide between worlds that are observationally equivalent under that horizon. Scaling sharpens estimation; it does not enlarge the horizon. Creator hyp...

P. M. Konrad

Working paper, 2026 (PhilML ICML 2026 workshop)

Linear-probe accuracy for predicting user demographic attributes from residual-stream activations across layers, reaching 100% at late layers using only a name as the demographic signal.

Working paper 2026

Differential Dishonesty: Language Models Encode User Demographics and Deviate from Their Own Beliefs Accordingly

Linear probes recover 100% accuracy at predicting user race and gender from late-layer residual-stream activations using only a name as the demographic signal. Ablating that direction during generation narrows the gender honesty gap by 62% and reduces overa...

P. M. Konrad

Working paper, 2026

3 × 3 transfer matrix showing how optimised attacks and monitors generalise (or fail to) across each other and against the baseline. The optimised monitor is worse than the baseline against the very attack it trained against, and collapses to chance on held-out tasks.

Working paper 2026

AutoRed: Measuring the Elicitation Gap via Automated Red-Blue Optimization

The best optimised monitor achieves 90% safety during training but averages just 47% on held-out tasks with a different attack mechanism, while a simple threshold baseline maintains 100% with zero variance. On held-out tasks the baseline reaches AUROC = 0.8...

P. M. Konrad

Submitted to the Apart Research × Redwood Research AI Control Hackathon (March 2026)

Tower decay of mutual information across a chain of created minds, for several source-entropy levels and a fixed projection efficiency. Each level can only refine, never recover, what the previous level lost.

Working paper 2026

Identifiable Abstractions from Observation and Intervention

A measurable query about a source variable is empirically identifiable if and only if it factors through the canonical experiment signature, equivalently the accessible sigma-algebra generated by the experiment family. Refinement is strict exactly when a ne...

P. M. Konrad

Working paper, 2026 (PhilML ICML 2026 workshop)

CRPS per round across the 16 forecasting models benchmarked on the DK1 bidding zone. The anomaly-augmented feature pipeline drives the largest single contribution to forecast quality (46% MAE reduction over the price-only baseline).

In Progress 2026

The Trader's Trinity: Forecasting Models, RL Agents, and LLM Judges for Day-Ahead Markets

An anomaly-augmented feature pipeline drives a 46% MAE reduction over price-only baselines, the largest single contribution in the feature ablation, with XGBoost reaching 16.20 EUR/MWh on the DK1 bidding zone. The Conditional Neural Process baseline fails c...

P. M. Konrad, T. L. Adam

BSc Thesis, University of Southern Denmark (in collaboration with Danfoss)

Education

MPhil in Machine Learning and Machine Intelligence Oct 2026 -- (Incoming)

University of Cambridge

Accepted to the MPhil in Machine Learning and Machine Intelligence programme

Exchange Semester Sep 2025 -- Dec 2025

Hong Kong University of Science and Technology (HKUST)

COMP4211 Machine Learning
COMP4471 Deep Learning in Computer Vision
COMP4901B Large Language Models
COMP4901Z Reinforcement Learning
COMP6411D Data Visualisation (Postgraduate)

BSc. Engineering (Software Engineering) Sep 2023 -- Jun 2026

University of Southern Denmark (SDU)

A practical, project-centric curriculum where theoretical knowledge is applied in mandatory, semester-long team projects
These projects involve developing complex, data-intensive software systems in domains like IoT and AI, often in collaboration with industry partners

Research Experience

Research Collaborator Jan 2026 -- Present

DataVISards, Hong Kong University of Science and Technology (HKUST)

Collaborating with the DataVISards research group on machine learning and data visualization projects

Research Collaborator Jan 2026 -- Present

Data and Intelligence Lab, SDU

Independently initiating and executing machine learning research projects, contributing to novel problem formulation and experimental design
Co-authoring academic papers for publication, contributing to manuscript drafting, literature reviews, and the revision process

Research Assistant Sep 2024 -- Dec 2025

Data and Intelligence Lab, SDU

Developing end-to-end machine learning projects, from co-initiating concepts to building scalable ML pipelines and executing experiments
Co-authoring academic papers for publication, contributing to manuscript drafting, literature reviews, and the revision process
Assisting in the drafting and preparation of grant proposals to secure research funding
Managing the full research data lifecycle, including multimodal data collection, processing, and documentation

Teaching Assistant Jan 2026 -- Present

University of Southern Denmark (SDU)

Supporting course delivery through syllabus planning, lab exercise development, and direct student instruction
Designing hands-on lab assignments and contributing to curriculum development

Professional Experience

Founder Jan 2026 -- Present

SaturoLabs

Solo umbrella for product experiments at the intersection of AI and wellbeing.
DreamBear (flagship): AI bedtime story app for neurodivergent children aged 3 to 10, where ADHD energy, autism-related focus, and dyslexic creativity become heroic superpowers in personalised narratives. iOS 16+, freemium ($9.99/mo or $79.99/yr), COPPA compliant, no ads or data sharing. Built on the Anthropic Claude API, Next.js, and ElevenLabs voice synthesis. Tagline: "Where every mind shines."
Other product surfaces: saturolabs.net, claudeboyz.com, getproofz.com.

CTO and Co-Founder Jun 2024 -- Nov 2024

Tutora ApS

Led the end-to-end development of the company's websites and core web application
Implemented the 'Shape Up' product development framework to streamline technical execution
Applied insights from previous startup experience to optimize the development lifecycle and avoid common pitfalls

Co-CEO and Co-Founder Nov 2022 -- Jun 2024

Yeager GmbH

Co-founded the company and led development of stabil.ai, an innovative AI-powered mobile app for personalized powerlifting training
Engineered intelligent algorithms to personalize training plans using individual data (MRV/MEV) and dynamic real-time feedback
Oversaw the full product lifecycle, from UX/UI concept and design to full-stack implementation, adhering to lean startup principles

Staff Duty Soldier Oct 2017 -- Sep 2021

Bundeswehr (German Armed Forces)

Led a small HR team responsible for the administration of over 600 soldiers
Streamlined administrative processes and document workflows to improve efficiency in a high-stakes naval command environment

Skills

Mechanistic Interpretability: TransformerLens, nnsight, SAE-Lens, custom hook libraries, activation patching, linear probing, logit lens, ablation studies, residual-stream analysis
Deep Learning: PyTorch (MPS), Hugging Face Transformers, accelerate, peft, trl, datasets
Open-Weight Models: Gemma 2 / 3, Qwen 2.5, LLaMA 3.1, Mistral (typically 0.5B–8B, MacBook-reproducible)
Experimentation: Weights & Biases, Optuna, fixed seeds, walk-forward validation, bootstrap confidence intervals, permutation testing
Data & Analysis: NumPy, Pandas, scikit-learn, scipy, Matplotlib, Seaborn
Languages: Python, JavaScript / TypeScript, SQL
Product Engineering: Next.js, React, React Native, Node.js, Vercel, Anthropic Claude API, ElevenLabs, Docker, GCP

Honors and Awards

Top-10 Placement 2025

Danish National Championship in AI (DMiAI)

Achieved top-tier placement for a second consecutive year in the national competition organized by the Danish Society for Artificial Intelligence

Top-10 Placement 2024

Danish National Championship in AI (DMiAI)

Achieved top-tier placement in a prestigious national competition for students and professionals, organized by the Danish Society for Artificial Intelligence

1st Place, SDU Case Competition 2023

SDU Sønderborg

Awarded 1st place out of numerous teams in an intensive 48-hour competition focused on sustainability
An event with cases from leading Danish companies such as Danfoss and Linak
Developed the winning solution for a real-world business case presented by the company WE-USE

Formal Recognition for Exemplary Service 2021

Bundeswehr (German Armed Forces)

Formally commended for setting a benchmark in dedication and officially designated an exemplar of professionalism for all enlisted personnel
Recognized for demonstrating an exceptionally optimistic and creative work ethic, proactively seeking out new responsibilities beyond his core duties, and thereby making a direct contribution to the unit's operational readiness and mission success

Performance Bonus for Outstanding Achievement 2021

Bundeswehr (German Armed Forces)

Awarded a significant and rarely issued financial bonus for sustained, far-above-average performance that consistently exceeded all expectations
Singled out for demonstrating an impeccable work ethic and a profound sense of duty, trusted to independently manage complex tasks to the highest quality standards. His high social competence was noted as a key factor in easing the workload of superiors and fostering a positive and effective operational climate

Certifications

Venture Capital Explorer Programme 2026

Accelerace and BII

Intensive 4-day program equipping students to drive impact through venture capital. The curriculum covers the VC investment process, founder sourcing, startup assessment, venture risk evaluation, founder relations, investment committee operations, and exits, with practical skills tested through case studies and simulations.

Machine Learning 2024

Stanford Online

Comprehensive professional certification covering machine learning fundamentals, algorithms, supervised and unsupervised learning, feature engineering, model evaluation, and practical applications across diverse domains.

Foundational C# with Microsoft 2024

Microsoft

Certification in C# programming fundamentals

React Native Course 2023

Online

Mobile development with React Native

Google UX Design Professional Certificate 2023

Google

Professional certification covering user-centered design principles, wireframing, prototyping, usability testing, and end-to-end product design methodology. Emphasizes research-driven design decisions and user empathy.

Full-Stack Engineer Career Path 2023

Codecademy

Comprehensive full-stack development certification

Certified Specialist for Real Estate Loan Brokerage 2022

IHK

Professional certification in real estate loan brokerage

Get in touch

If any of this is useful to you, write to me.

phongsakon@outlook.dk
