Demo for zero-shot classification on Danbooru images.
Model: DaViT-tiny backbone, ML-Decoder classification head, and Alibaba-NLP/gte-large-en-v1.5 as the text embedding model.
The training set includes posts with ID <= 5,400,000 whose last 3 digits are in the range [0, 899], inclusive.
Provide an image by uploading it or by fetching it by post ID.
Provide a tag description by typing it into the input box or by fetching it by tag name.
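
To check whether a given post was part of the training set, the ID rule above can be sketched as a small predicate. This is a minimal illustration, not the demo's actual code: the function name `in_training_set` and the constant names are hypothetical, and post IDs are assumed to be positive integers.

```python
# Sketch of the training-set rule stated above (names are hypothetical).
MAX_TRAIN_ID = 5_400_000   # posts with a larger ID were not used for training
MAX_TRAIN_SUFFIX = 899     # last 3 digits must be in [0, 899], inclusive


def in_training_set(post_id: int) -> bool:
    """Return True if the post ID satisfies the stated training-set rule."""
    last_three_digits = post_id % 1000
    return post_id <= MAX_TRAIN_ID and last_three_digits <= MAX_TRAIN_SUFFIX


if __name__ == "__main__":
    for pid in (1234, 1950, 5_400_000, 5_400_001):
        print(pid, in_training_set(pid))
```

A post ID for which this returns False was, by the rule above, not seen during training, so it may be a more informative choice when trying the demo.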