You could write your own “allocator”. What I usually do is route allocations through my own functions, which record every allocation in a hash table together with the location of the call. My code looks something like this:
#ifdef DEBUG_ALLOC
#define InternalAllocateBytesWithTag(pool, bytes, tag, ...) ::detail::DebugAllocate(pool, bytes, (ULONG)tag, __VA_ARGS__)
#define InternalAllocateBytes(pool, bytes, ...) InternalAllocateBytesWithTag(pool, bytes, 'Tagg', __VA_ARGS__)
#else
#define InternalAllocateBytesWithTag(pool, bytes, tag, ...) ::ExAllocatePoolWithTag(pool, bytes, (ULONG)tag) // extra file/line arguments are ignored in non-debug builds
#define InternalAllocateBytes(pool, bytes, ...) InternalAllocateBytesWithTag(pool, bytes, 'Tagg')
#endif
#define Allocate(type, pool) reinterpret_cast<type*>(InternalAllocateBytes(pool, sizeof(type), __FILE__, __LINE__))
#define AllocateBytes(type, bytes, pool) reinterpret_cast<type*>(InternalAllocateBytes(pool, bytes, __FILE__, __LINE__)) // size is in bytes, not elements
The debug allocation function (::detail::DebugAllocate above) then records each allocation in a hash table, keyed by the returned address. On free, you look up the address: if it's not there, you can bug check, because you are either double freeing or freeing a pointer that never came from this allocator; if it is there, you just remove the entry from the table. At any point you can print all outstanding allocations and see where each one was made. I do this in a minifilter driver and it doesn't seem to have any real performance impact, but if you do huge numbers of allocations and frees, you will start to notice it. You can also add other memory checks, for example allocating a little extra before and after each block, filling that padding with a known pattern, and checking it for corruption on free. Just make sure the pointer you hand back to the caller still refers to correctly allocated, correctly aligned memory.
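To make the bookkeeping above a bit more concrete, here is a minimal user-mode C++ sketch of the idea, not my actual driver code. Everything in the `sketch` namespace (`Record`, `Table`, `DebugAllocate`, `DebugFree`, `DumpAllocations`, `kGuard`, `kFill`) is a hypothetical name chosen for illustration. It uses `std::malloc` and `std::unordered_map` where a driver would use its pool allocator and a kernel-safe table, it returns an error code where a driver might bug check, and it leaves out the locking a real driver would need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <unordered_map>

namespace sketch {

// What we remember about each live allocation (hypothetical layout).
struct Record {
    std::size_t bytes;
    std::uint32_t tag;
    const char* file;
    int line;
};

constexpr std::size_t kGuard = 16;       // padding before and after the block; assumed to preserve malloc's alignment
constexpr unsigned char kFill = 0xFD;    // known pattern written into the padding

enum class FreeResult { Ok, UnknownPointer, Corrupted };

// Table of live allocations, keyed by the pointer handed to the caller.
inline std::unordered_map<void*, Record>& Table() {
    static std::unordered_map<void*, Record> table;
    return table;
}

inline void* DebugAllocate(std::size_t bytes, std::uint32_t tag, const char* file, int line) {
    auto* raw = static_cast<unsigned char*>(std::malloc(bytes + 2 * kGuard));
    if (raw == nullptr) return nullptr;
    std::memset(raw, kFill, kGuard);                   // guard in front of the block
    std::memset(raw + kGuard + bytes, kFill, kGuard);  // guard behind the block
    void* user = raw + kGuard;                         // caller only ever sees this pointer
    Table()[user] = Record{bytes, tag, file, line};
    return user;
}

inline bool GuardIntact(const unsigned char* guard) {
    assert(guard != nullptr);
    for (std::size_t i = 0; i < kGuard; ++i) {
        if (guard[i] != kFill) return false;
    }
    return true;
}

inline FreeResult DebugFree(void* user) {
    auto it = Table().find(user);
    if (it == Table().end()) {
        // Double free or foreign pointer; a driver might bug check here instead.
        return FreeResult::UnknownPointer;
    }
    auto* raw = static_cast<unsigned char*>(user) - kGuard;
    const bool intact = GuardIntact(raw) && GuardIntact(raw + kGuard + it->second.bytes);
    Table().erase(it);
    std::free(raw);
    return intact ? FreeResult::Ok : FreeResult::Corrupted;
}

// Print every allocation that has not been freed yet, with its call site.
inline void DumpAllocations() {
    for (const auto& entry : Table()) {
        const Record& rec = entry.second;
        std::printf("%p: %zu bytes, tag 0x%08X, %s:%d\n", entry.first, rec.bytes,
                    static_cast<unsigned>(rec.tag), rec.file, rec.line);
    }
}

}  // namespace sketch
```

Freeing the same pointer twice yields `UnknownPointer` on the second call, and writing past the end of a block yields `Corrupted` when it is freed. In a driver you would wrap these calls in macros like the ones above so that `__FILE__` and `__LINE__` are filled in at the call site.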
Anyway, what I would do in your case is find a point in time where the amount of allocated memory should be minimal, for example when no drivers are connected to it, and list all allocations. Or maybe print only the tags that you suspect are leaking, or do this only for the WDF object you suspect is leaking. But you need to make sure that every custom allocation is matched with its respective free function; otherwise you will get false positives or the system hangs.
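For the "list all allocations at a quiet point" step, a report grouped by call site and optionally filtered by tag is usually easier to read than a raw dump. The following is only a sketch under assumptions: `LiveAllocation`, `SiteTotal`, and `SummarizeBySite` are hypothetical names, and the input vector stands in for whatever your tracking table holds at that moment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

namespace sketch_report {

// One entry per allocation that is still live at the checkpoint (hypothetical layout).
struct LiveAllocation {
    std::uint32_t tag;
    std::size_t bytes;
    std::string file;
    int line;
};

struct SiteTotal {
    std::size_t count = 0;
    std::size_t bytes = 0;
};

// Group live allocations by "file:line". A tagFilter of 0 means "all tags";
// any other value keeps only allocations made with that tag.
inline std::map<std::string, SiteTotal> SummarizeBySite(
        const std::vector<LiveAllocation>& live, std::uint32_t tagFilter) {
    std::map<std::string, SiteTotal> totals;
    for (const auto& alloc : live) {
        if (tagFilter != 0 && alloc.tag != tagFilter) continue;
        SiteTotal& total = totals[alloc.file + ":" + std::to_string(alloc.line)];
        total.count += 1;
        total.bytes += alloc.bytes;
    }
    return totals;
}

}  // namespace sketch_report
```

Taking this summary at the same quiet point on two different runs and comparing the counts per call site should point at the sites that keep growing.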