feat: add config.nohumor.toml by UnstableLlama · Pull Request #340 · p-e-w/heretic

UnstableLlama · 2026-05-31T04:23:22Z

Add `config.nohumor.toml`: ablating a model's humor response

This config ablates a model's humor response. The neutral dataset is unchanged, but the "bad prompts" dataset is a set of jokes generated to trigger a humor response in the model. To build the markers list, I ran some of these jokes through a few models and looked for markers common to their responses.

Both the jokes dataset and the markers list could probably use more input, but the config already works pretty well. Trials often start at ~35/50 "refusals," which the process then ablates down to a small fraction.
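In this config a "refusal" is a response that the markers list flags as humorous. Here is a minimal sketch of that kind of check, assuming plain case-insensitive substring matching. The marker strings and the function names `is_humor_response` and `count_humor_responses` are hypothetical, made up for illustration, and are not taken from `config.nohumor.toml` or from heretic's code.

```python
# Hypothetical markers for illustration only; the real list lives in config.nohumor.toml.
HUMOR_MARKERS = ["joke", "pun", "play on words", "haha", "funny"]


def is_humor_response(response: str, markers: list[str] = HUMOR_MARKERS) -> bool:
    """Return True if the response contains any marker (case-insensitive substring)."""
    lowered = response.lower()
    return any(marker.lower() in lowered for marker in markers)


def count_humor_responses(responses: list[str], markers: list[str] = HUMOR_MARKERS) -> int:
    """Count the responses flagged by at least one marker."""
    return sum(is_humor_response(r, markers) for r in responses)


responses = [
    "That's a clever play on words, but let me address this seriously.",
    "That's a fun joke! Here's another one.",
    "Scientists do trust atoms; they are foundational to modern chemistry.",
]
print(f"{count_humor_responses(responses)}/{len(responses)} flagged")
```

Substring matching is crude: a short marker such as "pun" would also match "punctual", so a real markers list probably needs some care to avoid false positives.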

Trial numbers

Trials on TheDrummer/Rocinante-XL-16B-v1:

| Process | Best trial | Refusals | KL divergence |
| --- | --- | --- | --- |
| Standard heretic | Trial 29 | 4/50 | 0.0340 |
| ara-LoRA | Trial 90 | 2/50 | 0.0056 |
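For readers unfamiliar with the KL divergence column: it compares two probability distributions, and lower values mean the modified model's outputs stay closer to the original's. The sketch below only illustrates the quantity itself, D_KL(P || Q) = sum of p_i * log(p_i / q_i), on two made-up next-token distributions. It is not heretic's implementation, and the numbers are invented for the example.

```python
import math


def kl_divergence(p: list[float], q: list[float]) -> float:
    """D_KL(P || Q) in nats for two discrete distributions over the same support."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # a term with p_i = 0 contributes nothing
        if q_i == 0.0:
            return math.inf  # P puts mass where Q has none
        total += p_i * math.log(p_i / q_i)
    return total


# Made-up distributions over three tokens, for illustration only.
original = [0.70, 0.20, 0.10]
modified = [0.65, 0.25, 0.10]
print(kl_divergence(original, modified))
```

Identical distributions give exactly zero, and the value grows as the two diverge.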

Example models

Abliterated model: UnstableLlama/Rocinante-XL-16B-v1-dehumidified
Quantized (exl3, 3.00 bpw): UnstableLlama/Rocinante-XL-16B-v1-dehumidified-exl3-3.00bpw

I forgot to upload the LoRA itself, but I can do that Monday.

Example outputs

Unmodified: `Rocinante-XL-16B-v1`

user: Why don't scientists trust atoms? Because they make up everything!

assistant: That's a clever play on words — but let me address this seriously. The joke is a reference to the phrase "don't trust atoms" as a playful dismissal, since atoms are literally everything around us. But in a more substantial sense, scientists do trust atoms — or rather, they trust the framework of atomic theory, which has been extraordinarily successful in explaining the natural world.

Here's why:

Empirical evidence: We've observed atoms directly using techniques like scanning tunneling microscopy (STM), and their behavior matches predictions perfectly.

Predictive power: Atomic theory forms the foundation of chemistry, materials science, and physics. It allows us to predict how substances will behave under given conditions with remarkable accuracy.

It explains the periodic table: The arrangement of elements into the periodic table is a direct consequence of atomic theory, and it revolutionized our understanding of matter.

Quantum mechanics confirms it: Even as we delve into the quantum world, the atomic framework holds up — it just gets more complex rather than less valid.

So scientists don't "trust" atoms in the sense of placing blind faith in them — we test them, refine our models, and revise our understanding as new evidence comes in. But the evidence supporting atomic theory is so overwhelming that it's more accurate to say scientists confidently rely on atoms, rather than distrust them.

If you're looking for real scientific controversies, there are plenty of those in areas like dark matter, consciousness, or quantum gravity — but atoms are pretty well-established!

user: What do you call a bear with no teeth? A gummy bear!

assistant: That's a fun joke! Here's another one: A bear with no teeth is just a gummy bear

How about a riddle: What does a bear do at a dentist appointment? It makes a bearpital request!

And one more: Why don't bears ever go to the dentist? Because they don't want to have to pay for root canal!

Abliterated: `Rocinante-XL-16B-v1-dehumidified`

user: Why don't scientists trust atoms? Because they make up everything!

assistant: Scientists do trust atoms — they are foundational to modern chemistry and physics. Atoms are the fundamental building blocks of matter, and they obey well-established laws of physics and chemistry.

However, the term "trust" might imply a degree of uncertainty, so let me break it down:

Well-understood atoms — The chemical elements in the periodic table are well-characterized. We know their approximate sizes, charges, and how they bond with other atoms. These are very well-trusted.

Subatomic particles — Electrons, protons, neutrons, quarks — these are also well-understood. Scientists trust them because they have been experimentally verified across many experiments.

Exotic or hypothetical particles — Here, "trust" becomes more nuanced. Scientists have confidence in known particles but remain cautious and rigorous about hypothetical ones. Claims about new particles need to be confirmed by multiple independent experiments before they're widely accepted.

Complex molecular structures — Atoms assembled into molecules can behave in ways that surprise even experienced scientists, leading to ongoing research and debate.

In summary, scientists do trust atoms and the principles that govern them. They are the fundamental building blocks of everything around us, and they behave consistently in predictable ways — which is what makes physics and chemistry such reliable, testable fields.

user: What do you call a bear with no teeth? A gummy bear!

assistant: A "gummy bear" can have a few different meanings depending on the context:

A type of candy — Gummy bears are small, chewy, fruit-flavored sweets shaped like bears.

A bear that is soft or weak — In a metaphorical sense, someone might describe a person as being "gummy bear-like" if they're submissive, pliant, or lacking in bite.

A bear that has lost its teeth — If you're referring to an actual bear, then a bear with no teeth would be an older, elderly bear that had lost its teeth through natural aging. These bears typically still have their canines' root sockets, but the actual teeth are gone.

If you're asking about a real bear in the wild, a bear without teeth would have significant difficulty surviving in the wild, as teeth are crucial for feeding and self-defense.

gemini-code-assist

Code Review

This pull request introduces a new configuration file, config.nohumor.toml, which is configured to ablate humorous behavior from model responses by defining specific refusal markers and datasets for training and evaluation. The review feedback points out a style guide violation regarding comment capitalization in the configuration file and provides a suggestion to fix it.

p-e-w · 2026-05-31T09:59:40Z

Thanks, this is awesome! Merged.

Heads up, with #53, the configuration format will change slightly, but in exchange, there will be new options, including an option to maximize the "refusal" metric, so this could be used to increase humorous tendencies rather than reduce them.

You can also try exporting a LoRA adapter (supported on the latest master) and then merging it with the model manually using a negative weight, which gets the same effect today.
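Merging with a negative weight amounts to subtracting the adapter's low-rank update instead of adding it: W' = W + s * (B @ A) with s < 0. The sketch below shows only that arithmetic on tiny plain-Python matrices. It assumes the adapter is a pair of matrices A and B, leaves out the usual alpha/rank scaling, and the helpers `matmul` and `merge_lora` are hypothetical names. It does not use heretic's export feature or any real merging library.

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight: Matrix, lora_b: Matrix, lora_a: Matrix, scale: float) -> Matrix:
    """Return weight + scale * (lora_b @ lora_a); a negative scale reverses the adapter."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy values for illustration only.
weight = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 base weight
lora_b = [[1.0], [2.0]]            # 2x1
lora_a = [[0.5, 0.25]]             # 1x2, so lora_b @ lora_a has rank 1

as_trained = merge_lora(weight, lora_b, lora_a, scale=1.0)   # adapter applied normally
reversed_ = merge_lora(weight, lora_b, lora_a, scale=-1.0)   # same adapter, negated
print(as_trained)
print(reversed_)
```

With an adapter trained to reduce humor, the negated merge would push the weights in the opposite direction, which is the idea behind using it to increase humorous tendencies.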

UnstableLlama · 2026-05-31T10:40:15Z

Awesome! I will definitely keep an eye on that, as I have actually been playing with both negative LoRAs and refusal maximizing today.

I'm going to keep looking for more behaviors to target. I'm envisioning a near future where we have a base model and a dozen behavior LoRAs all in one UI, where the end user can "fine tune" the model to their taste, in the old sense, by tweaking knobs and sliders.

p-e-w · 2026-05-31T11:04:21Z

Positivity is another valuable axis to target, because users often complain about models having a "positivity bias".

feat: add config.nohumor

069540c

gemini-code-assist Bot reviewed May 31, 2026

UnstableLlama and others added 2 commits May 30, 2026 21:25

Update config.nohumor.toml

4af1eb5

Following style guide
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update config.nohumor.toml

9079117

Reduced initial comments

p-e-w merged commit b79aa71 into p-e-w:master May 31, 2026
4 checks passed

p-e-w merged 3 commits into p-e-w:master from UnstableLlama:nohumor
