Disclaimer: I love the lib.rs search and general UI. I don’t like cryptocurrencies.
I think the way to avoid drama is to be very clear and transparent in communication. In this case, I think one way to do this would be to label data that lib.rs synthesised. Maybe an asterisk next to corrected categories that on mouseover (long press on phones) says something like “inferred by lib.rs due to missing data”? The exact wording could certainly be improved, and might differ depending on context. Perhaps the synthesised data could be a different colour as well, to stand out.
Having a list of packages that were filtered out might also help. Here I’m thinking of a simple text file (excluded from indexing via robots.txt) with all the package names that have been filtered, along with the reason for each (e.g. “auto detected name squat”). Anyone interested could download the file and take a look, as well as contact you for corrections.
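To make the text file idea a bit more concrete, here is a minimal sketch of what reading such a list could look like. Everything in it is an assumption on my part: the tab-separated `name<TAB>reason` layout, the `#` comment lines, the `FilteredCrate` and `parse_filtered_list` names, and the sample entries are all made up for illustration and are not anything lib.rs actually publishes.

```rust
/// One entry in the hypothetical filtered-packages list.
#[derive(Debug, PartialEq)]
struct FilteredCrate {
    name: String,
    reason: String,
}

/// Parse lines of the (assumed) form `name<TAB>reason`.
/// Blank lines, `#` comment lines, and lines without a tab are skipped.
fn parse_filtered_list(text: &str) -> Vec<FilteredCrate> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| {
            let (name, reason) = line.split_once('\t')?;
            Some(FilteredCrate {
                name: name.trim().to_string(),
                reason: reason.trim().to_string(),
            })
        })
        .collect()
}

fn main() {
    // Made-up sample data; the crate names and the second reason are placeholders.
    let sample = "# name\treason\n\
                  example-crate-a\tauto detected name squat\n\
                  example-crate-b\tmanually reviewed spam\n";

    for entry in parse_filtered_list(sample) {
        println!("{}: {}", entry.name, entry.reason);
    }
}
```

The point of keeping it to one line per crate is that anyone can grep the file for their own crate name and see the stated reason without needing any tooling.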
Ranking algorithms are harder to be transparent about (and this is not my field of expertise), so I can’t offer any advice here. Perhaps nothing is needed?
I saw another crate doing something similar but using wasm; the idea is to sandbox the binary used as a proc macro, which seems a bit better. Can’t seem to find it any more.
EDIT: Found it https://lib.rs/crates/watt