What is the correct way to use the reference clock and fill audio data in AVStream?

My pin-centric driver obtains the reference clock in the acquire state, as the samples do, and then fills audio data based on CorrelatedTime.
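To make "fills audio data based on CorrelatedTime" concrete, here is a minimal sketch of time-based pacing. It is not the actual driver code. It assumes PCM audio and clock times in 100-ns units; `Pacer`, `bytesDue`, and `commit` are hypothetical names for illustration, not part of AVStream.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an AVStream API): tracks how many PCM frames
// should have been delivered by a given clock time.
struct Pacer {
    int64_t  startTime;   // clock time at stream start, in 100-ns units (assumed)
    uint32_t sampleRate;  // frames per second, e.g. 48000
    uint32_t blockAlign;  // bytes per frame, e.g. 4 for 16-bit stereo
    int64_t  framesSent;  // frames delivered so far

    Pacer(int64_t start, uint32_t rate, uint32_t align)
        : startTime(start), sampleRate(rate), blockAlign(align), framesSent(0) {
        assert(rate > 0 && align > 0);
    }

    // Bytes that are due at clock time `now`. The frame count is derived
    // from the total elapsed time, not from per-call deltas, so rounding
    // errors do not accumulate across calls.
    int64_t bytesDue(int64_t now) const {
        int64_t elapsed = now - startTime;
        if (elapsed <= 0) return 0;
        // 10,000,000 units of 100 ns per second.
        int64_t framesTotal = elapsed * sampleRate / 10000000;
        int64_t frames = framesTotal - framesSent;
        return frames > 0 ? frames * blockAlign : 0;
    }

    // Record the bytes that were actually written to the output frame.
    void commit(int64_t bytes) { framesSent += bytes / blockAlign; }
};
```

In a process callback, this would be used by calling `bytesDue` with the current clock time, copying at most that many bytes into the frame, and then calling `commit` with the number of bytes written.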

The problem is that CorrelatedTime sometimes appears to slow down: the difference between CorrelatedTime and SystemTime (returned through the parameter of GetCorrelatedTime) increases. At times the audio also seems to slow down, but I'm not 100% sure because I don't have a precise way to verify the audio. This problem happens in GraphEdit.

If I ignore CorrelatedTime and fill data every time the process callback is called, other players such as VLC drop some of the data, and only fragments of the audio remain. VLC's debug log also shows many messages like this: "dshow debug: CapturePin::Receive trashing late input sample"

So why does CorrelatedTime slow down, and how should I fill the data at an appropriate rate?