You split the data into several train-test chunks, and the model behaves the same on each of them. When you run the numbers on how well it fits, you get a log-likelihood score that looks good, but you’re still not sure… What do you do?
Try another algorithm: I don’t trust this unsupervised stuff
Try some more data: who knows, there may be some unknown factors in panda sexual compatibility
Try all the data: what’s the worst that can happen? The algorithm should sort it all out