You start separating the data into various train-test chunks, and it all works just the same. When you run the numbers on how well it fits, you get some log-likelihood metric that looks good, but you’re still not sure… What do you do?
Try another algorithm: I don’t trust this unsupervised stuff
Try more data: what’s the worst that can happen? The algorithm should sort it all out
Try less data: there’s gotta be some research on the best way to pick your feature space
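
For reference, the train-test check described at the top can be sketched roughly like this. It is only an illustration under loud assumptions: the post doesn't say which unsupervised model is in play, so a single Gaussian fitted by its mean and standard deviation stands in for it, the data is synthetic, and the names `gaussian_log_likelihood` and `held_out_scores` are made up for this example.

```python
import math
import random
import statistics


def gaussian_log_likelihood(xs, mu, sigma):
    """Mean per-point log likelihood of xs under N(mu, sigma^2)."""
    per_point = [
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in xs
    ]
    return statistics.fmean(per_point)


def held_out_scores(data, k=5, seed=0):
    """Fit on k-1 chunks, score the held-out chunk; return (train, test) pairs."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k roughly equal chunks
    scores = []
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        # "Fitting" the stand-in model: estimate mean and spread from train only.
        mu = statistics.fmean(train)
        sigma = statistics.pstdev(train)
        scores.append((
            gaussian_log_likelihood(train, mu, sigma),
            gaussian_log_likelihood(test, mu, sigma),
        ))
    return scores


# Synthetic data, assumed purely for illustration.
data_rng = random.Random(42)
data = [data_rng.gauss(0.0, 1.0) for _ in range(500)]

scores = held_out_scores(data)
for train_ll, test_ll in scores:
    print(f"train {train_ll:.3f}  test {test_ll:.3f}")
```

If the held-out numbers track the training numbers across every chunk, the model is at least not overfitting. That says nothing about whether the structure it found is useful, which is exactly the nagging doubt here.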