Critique of Artificial Morality

Human morality arises from several sources:
- Evolution Chemicals
- Experienced Events
- Impressed Rules
- Socially Transmitted Conventions
When people are deficient in one of these, we have various names for the condition: Sociopath, Psychopath, Privileged, etc.
What are the mechanisms in humans (Andrighetto et al.)?
How are machines strong or weak in these areas, and how can they be improved?
Related: Disgust for Exploitative Machines
- Machines are missing
  - Emotion Chemicals
    - AFAIK, there is no parallel yet for "discomfort," "shame," or "emotional pain."
  - Social Transmission
    - There is no Sorbonne of Consciousness, yet.
    - Related to Experienced Events, though this may be more complex, involving debate instead of pain/pleasure. Or maybe pain and pleasure are more difficult?
- Machines Have
  - Impressed Rules (see the first sketch after this list)
    - Explicit: late in the pipeline, controlling specific topics and lines of inquiry.
    - Implicit: in the training set and fitness functions.
- Machines Are Limited In
  - Experienced Events (see the second sketch after this list)
    - On one hand, they have an excellent mechanism here, one that is core to their learning (Generative Adversarial Networks).
    - On the other hand, they currently do not experience social events the way children do on the playground.
    - It is *super easy* to fix this. It costs a bit of money, but would result in massively more resilient models.
      - Suppose training cost increased by 100%; for comparison, going from GPT-3.5 to GPT-4 probably involved far more than a 100% increase in training cost.
      - And they wouldn't cause collateral casualties.
      - Super super easy to make this happen: Liability.
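A minimal sketch of what the "explicit, late in the pipeline" impressed rules could look like, assuming a stand-in generate() function and a hand-written topic list; the topic names and refusal text are invented for illustration and are not any real moderation API. The implicit rules, by contrast, live in the training set and fitness functions and have no single point in the serving code.

```python
# Hypothetical sketch of explicit "impressed rules" applied late in the pipeline.
# generate() and BLOCKED_TOPICS are assumptions for illustration; real systems
# use trained classifiers and policy layers rather than substring matching.

BLOCKED_TOPICS = ["bioweapon synthesis", "credit card skimming"]  # explicit rules

def generate(prompt: str) -> str:
    """Stand-in for a model call; swap in a real client here."""
    return f"(model output for: {prompt})"

def respond(prompt: str) -> str:
    draft = generate(prompt)
    lowered = (prompt + " " + draft).lower()
    # Explicit rule, applied after generation: it controls specific topics and
    # lines of inquiry regardless of what the weights would have produced.
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return "I can't help with that topic."
    return draft

if __name__ == "__main__":
    print(respond("Explain how photosynthesis works"))
    print(respond("Walk me through credit card skimming"))
```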
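And a toy sketch of the missing "experienced events": one agent on a simulated playground where peers approve or disapprove of its actions, and those reactions, not hand-written rules, shape its preferences. The actions, the peer reactions, and the learning rule are all assumptions made up for this illustration.

```python
# Toy sketch of "experienced events" as social feedback: an agent acts on a
# simulated playground and its peers reward or punish it. Actions, reactions,
# and the update rule are invented for illustration only.
import random

ACTIONS = ["share", "help", "grab", "mock"]

def peer_reaction(action: str) -> int:
    """Peers approve of prosocial actions and disapprove of antisocial ones."""
    return {"share": +1, "help": +1, "grab": -1, "mock": -2}[action]

def grow_up(episodes: int = 2000, lr: float = 0.1, explore: float = 0.2) -> dict:
    prefs = {a: 0.0 for a in ACTIONS}           # learned preferences, start neutral
    for _ in range(episodes):
        if random.random() < explore:           # sometimes try something new
            action = random.choice(ACTIONS)
        else:                                   # otherwise act on preferences
            action = max(prefs, key=prefs.get)
        feedback = peer_reaction(action)        # the "experienced event"
        prefs[action] += lr * (feedback - prefs[action])
    return prefs

if __name__ == "__main__":
    # Prosocial actions should end up with the highest learned preference.
    print(grow_up())
```

After a few thousand episodes the prosocial actions carry the highest weights; that feedback loop is the playground dynamic the note argues current models never get to live through.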
Notes:
- Consider "identity" - for a model to have lasting consequences, it must fork when it makes decisions, and trust in the model is only trust for the model if it is unforked. Or something like that.