Core question
How can explicit rules influence training without replacing the neural model itself?
Project 01
A small-scale implementation used to examine how logic rules can be converted into a teacher distribution and subsequently distilled into student parameters.
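One way to picture the rule-to-teacher conversion is the minimal sketch below. It assumes a posterior-regularization-style projection, which is one common formulation and not necessarily the one this project uses. Each rule returns a soft truth value r_l(x, y) in [0, 1], and the teacher reweights the student's prediction by exp(-C * sum_l w_l * (1 - r_l(x, y))) before renormalizing. The names `teacher_distribution`, `positive_implies_one`, the rule weights `w_l`, and the strength `C` are hypothetical.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical rule signature: (input, candidate label) -> soft truth value in [0, 1].
Rule = Callable[[float, int], float]


def teacher_distribution(
    student_probs: Sequence[float],
    x: float,
    rules: Sequence[Rule],
    weights: Sequence[float],
    C: float = 5.0,
) -> List[float]:
    """Shift the student's probabilities toward labels that satisfy the rules.

    Assumed form: q(y|x) is proportional to p(y|x) * exp(-C * sum_l w_l * (1 - r_l(x, y))).
    """
    scores = []
    for y, p in enumerate(student_probs):
        # Total weighted rule violation if the label were y.
        violation = sum(w * (1.0 - r(x, y)) for r, w in zip(rules, weights))
        scores.append(p * math.exp(-C * violation))
    total = sum(scores)
    # Renormalize so the teacher is a proper distribution over labels.
    return [s / total for s in scores]


def positive_implies_one(x: float, y: int) -> float:
    """Toy rule (hypothetical): if x > 0, the label should be 1."""
    if x <= 0:
        return 1.0  # rule does not apply, so every label satisfies it
    return 1.0 if y == 1 else 0.0


student = [0.7, 0.3]  # the student currently leans toward label 0
teacher = teacher_distribution(student, x=2.0, rules=[positive_implies_one], weights=[1.0])
print([round(q, 3) for q in teacher])  # most of the mass moves to label 1
```

When no rule fires (here, x <= 0), every label has zero violation and the teacher simply reproduces the student's distribution, so rules only act where they have something to say.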
Start with the training loop, then inspect the rule module, and finally trace how the teacher distribution is constructed and reused.
A key mechanism is that the rules never overwrite the labels directly. Instead, they first shape a teacher distribution, and that teacher then regularizes the student through distillation while the original labels continue to supervise training.
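To make the "regularize, don't overwrite" idea concrete, the sketch below assumes the student is trained on a weighted mix of the usual cross-entropy against the gold label and a cross-entropy against the teacher distribution. The imitation weight `pi` and the function names are hypothetical. The gradient with respect to the softmax logits shows the effect directly: the gold one-hot vector and the teacher are blended into a single soft target, rather than the teacher replacing the label.

```python
import math
from typing import List, Sequence


def cross_entropy(target: Sequence[float], probs: Sequence[float], eps: float = 1e-12) -> float:
    """Cross-entropy H(target, probs) between two distributions over the same labels."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


def distillation_loss(
    student_probs: Sequence[float],
    gold_label: int,
    teacher_probs: Sequence[float],
    pi: float = 0.5,
) -> float:
    """Hypothetical objective: (1 - pi) * CE(gold, student) + pi * CE(teacher, student).

    The gold label is kept unchanged; the teacher only contributes a second target.
    """
    one_hot = [1.0 if y == gold_label else 0.0 for y in range(len(student_probs))]
    supervised = cross_entropy(one_hot, student_probs)
    imitation = cross_entropy(teacher_probs, student_probs)
    return (1.0 - pi) * supervised + pi * imitation


def logit_gradient(
    student_probs: Sequence[float],
    gold_label: int,
    teacher_probs: Sequence[float],
    pi: float = 0.5,
) -> List[float]:
    """Gradient of distillation_loss w.r.t. softmax logits (ignoring eps).

    Equals p - ((1 - pi) * one_hot + pi * q), assuming the teacher q sums to 1.
    """
    return [
        p - ((1.0 - pi) * (1.0 if y == gold_label else 0.0) + pi * q)
        for y, (p, q) in enumerate(zip(student_probs, teacher_probs))
    ]


student = [0.7, 0.3]      # current student prediction
teacher = [0.015, 0.985]  # e.g. a rule-shaped teacher for the same input
print(distillation_loss(student, gold_label=1, teacher_probs=teacher, pi=0.5))
print(logit_gradient(student, gold_label=1, teacher_probs=teacher, pi=0.5))
```

Setting `pi = 0` recovers plain supervised training, and raising it lets the rule-shaped teacher pull harder on the student's parameters without ever editing the dataset's labels.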