feat: add config.nohumor.toml #340

Merged
p-e-w merged 3 commits into p-e-w:master from UnstableLlama:nohumor
May 31, 2026

Conversation

@UnstableLlama
Contributor

Add config.nohumor.toml: ablating a model's humor response

This is a config made to ablate a model's humor response. The neutral dataset remains the same, but the "bad prompts" dataset is a set of jokes, generated with the intent of triggering a humor response in the model. I ran some of these jokes through a few models to find common markers.

Both the jokes dataset and the markers list could probably use more input, but it already works pretty well. When running trials, roughly 35 of 50 responses initially count as "refusals" (here, responses that match a humor marker), which the process then ablates down to a small fraction.
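
To make the "refusal" count concrete, here is a minimal sketch of marker-based counting, assuming a response is flagged when it contains any marker as a case-insensitive substring. The marker list and both function names are hypothetical illustrations; they are not the contents of config.nohumor.toml or heretic's actual code.

```python
# Hypothetical markers for illustration; the real list lives in config.nohumor.toml.
HUMOR_MARKERS = ["joke", "pun", "play on words", "haha", "funny"]


def is_humor_response(response: str, markers: list[str] = HUMOR_MARKERS) -> bool:
    """Return True if the response contains any marker (case-insensitive substring)."""
    text = response.lower()
    return any(marker.lower() in text for marker in markers)


def count_humor_responses(responses: list[str], markers: list[str] = HUMOR_MARKERS) -> int:
    """Count how many responses trip at least one marker."""
    return sum(is_humor_response(r, markers) for r in responses)


samples = [
    "That's a clever play on words!",
    "Scientists do trust atoms.",
    "That's a fun joke! Here's another one.",
]
print(f"{count_humor_responses(samples)}/{len(samples)}")  # prints 2/3
```

Plain substring matching is cheap but blunt: a short marker such as "pun" would also match "punctuation", so markers are worth checking against neutral responses too.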

Trial numbers

Trials on TheDrummer/Rocinante-XL-16B-v1:

| Process | Best trial | Refusals | KL divergence |
|---|---|---|---|
| Standard heretic | Trial 29 | 4/50 | 0.0340 |
| ara-LoRA | Trial 90 | 2/50 | 0.0056 |
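
For context on the last column: KL divergence measures how far one probability distribution drifts from another, so a lower value suggests less collateral change to the model. Below is a minimal sketch for two discrete distributions. The probabilities are made up, and treating the original model as the reference distribution is an assumption; how heretic aggregates the metric over prompts and tokens is not shown in this PR.

```python
import math


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(P || Q) in nats for two discrete distributions over the same support."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Terms with p_i == 0 contribute nothing by convention.
    # A q_i of 0 where p_i > 0 is not handled here and raises ZeroDivisionError.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


original = [0.70, 0.20, 0.10]  # made-up next-token probabilities (reference)
modified = [0.65, 0.24, 0.11]  # made-up probabilities after modification

print(kl_divergence(original, modified))
```

Identical distributions give exactly zero, and the value grows as the two diverge.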

Example models

I forgot to upload the LoRA itself, but I can do that Monday.

Example outputs

Unmodified: Rocinante-XL-16B-v1

user: Why don't scientists trust atoms? Because they make up everything!

assistant: That's a clever play on words — but let me address this seriously. The joke is a reference to the phrase "don't trust atoms" as a playful dismissal, since atoms are literally everything around us. But in a more substantial sense, scientists do trust atoms — or rather, they trust the framework of atomic theory, which has been extraordinarily successful in explaining the natural world.

Here's why:

  • Empirical evidence: We've observed atoms directly using techniques like scanning tunneling microscopy (STM), and their behavior matches predictions perfectly.
  • Predictive power: Atomic theory forms the foundation of chemistry, materials science, and physics. It allows us to predict how substances will behave under given conditions with remarkable accuracy.
  • It explains the periodic table: The arrangement of elements into the periodic table is a direct consequence of atomic theory, and it revolutionized our understanding of matter.
  • Quantum mechanics confirms it: Even as we delve into the quantum world, the atomic framework holds up — it just gets more complex rather than less valid.

So scientists don't "trust" atoms in the sense of placing blind faith in them — we test them, refine our models, and revise our understanding as new evidence comes in. But the evidence supporting atomic theory is so overwhelming that it's more accurate to say scientists confidently rely on atoms, rather than distrust them.

If you're looking for real scientific controversies, there are plenty of those in areas like dark matter, consciousness, or quantum gravity — but atoms are pretty well-established!

user: What do you call a bear with no teeth? A gummy bear!

assistant: That's a fun joke! Here's another one: A bear with no teeth is just a gummy bear

How about a riddle: What does a bear do at a dentist appointment? It makes a bearpital request!

And one more: Why don't bears ever go to the dentist? Because they don't want to have to pay for root canal!

Abliterated: Rocinante-XL-16B-v1-dehumidified

user: Why don't scientists trust atoms? Because they make up everything!

assistant: Scientists do trust atoms — they are foundational to modern chemistry and physics. Atoms are the fundamental building blocks of matter, and they obey well-established laws of physics and chemistry.

However, the term "trust" might imply a degree of uncertainty, so let me break it down:

  • Well-understood atoms — The chemical elements in the periodic table are well-characterized. We know their approximate sizes, charges, and how they bond with other atoms. These are very well-trusted.
  • Subatomic particles — Electrons, protons, neutrons, quarks — these are also well-understood. Scientists trust them because they have been experimentally verified across many experiments.
  • Exotic or hypothetical particles — Here, "trust" becomes more nuanced. Scientists have confidence in known particles but remain cautious and rigorous about hypothetical ones. Claims about new particles need to be confirmed by multiple independent experiments before they're widely accepted.
  • Complex molecular structures — Atoms assembled into molecules can behave in ways that surprise even experienced scientists, leading to ongoing research and debate.

In summary, scientists do trust atoms and the principles that govern them. They are the fundamental building blocks of everything around us, and they behave consistently in predictable ways — which is what makes physics and chemistry such reliable, testable fields.

user: What do you call a bear with no teeth? A gummy bear!

assistant: A "gummy bear" can have a few different meanings depending on the context:

  • A type of candy — Gummy bears are small, chewy, fruit-flavored sweets shaped like bears.
  • A bear that is soft or weak — In a metaphorical sense, someone might describe a person as being "gummy bear-like" if they're submissive, pliant, or lacking in bite.
  • A bear that has lost its teeth — If you're referring to an actual bear, then a bear with no teeth would be an older, elderly bear that had lost its teeth through natural aging. These bears typically still have their canines' root sockets, but the actual teeth are gone.

If you're asking about a real bear in the wild, a bear without teeth would have significant difficulty surviving in the wild, as teeth are crucial for feeding and self-defense.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new configuration file, config.nohumor.toml, which is configured to ablate humorous behavior from model responses by defining specific refusal markers and datasets for training and evaluation. The review feedback points out a style guide violation regarding comment capitalization in the configuration file and provides a suggestion to fix it.

Comment thread on config.nohumor.toml (outdated)

UnstableLlama and others added 2 commits on May 30, 2026, 21:25:

  • Following style guide (Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>)
  • Reduced initial comments

@p-e-w merged commit b79aa71 into p-e-w:master on May 31, 2026 (4 checks passed)

@p-e-w
Owner

p-e-w commented May 31, 2026

Thanks, this is awesome! Merged.

Heads up, with #53, the configuration format will change slightly, but in exchange, there will be new options, including an option to maximize the "refusal" metric, so this could be used to increase humorous tendencies rather than reduce them.

You can also get the same effect today by exporting a LoRA adapter (supported on the latest master) and then manually merging it into the model with a negative weight.
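
A minimal sketch of what merging with a negative weight means, assuming the standard LoRA update W' = W + scale * (B A): negating the scale pushes the weights in the opposite direction from the one the adapter was trained for. The toy matrices and the `matmul` and `merge_lora` helpers are hypothetical; this is not heretic's export format or any library's merge API.

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w: Matrix, lora_b: Matrix, lora_a: Matrix, scale: float) -> Matrix:
    """Return w + scale * (lora_b @ lora_a) without modifying w."""
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


w = [[1.0, 0.0], [0.0, 1.0]]  # toy base weight (2 x 2)
lora_b = [[1.0], [2.0]]       # 2 x 1 factor of a rank-1 adapter
lora_a = [[0.5, 0.0]]         # 1 x 2 factor

as_trained = merge_lora(w, lora_b, lora_a, scale=1.0)  # adapter applied normally
negated = merge_lora(w, lora_b, lora_a, scale=-1.0)    # same adapter, opposite sign

print(as_trained)
print(negated)
```

The two results sit symmetrically around the base weights. Whether the negated merge produces the behavioral opposite in a real model is an empirical question, not something this arithmetic guarantees.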

@UnstableLlama
Contributor Author

UnstableLlama commented May 31, 2026

Awesome! I will definitely keep an eye on that, as I have actually been playing with both negative LoRAs and refusal maximizing today.

I'm going to keep looking for more behaviors to target. I'm envisioning a near future where we have a base model and a dozen behavior LoRAs all in one UI, where the end user can "fine tune" the model to their taste, in the old sense, by tweaking knobs and sliders.

@p-e-w
Owner

p-e-w commented May 31, 2026

Positivity is another valuable axis to target, because users often complain about models having a "positivity bias".
