Equivset is a library for detecting visually similar UTF-8 characters.
Equivset is designed to prevent abuse through imitation of words and focusses primarily on letters and punctuation (not emojis or other symbols). It contains mappings of visually identical characters from Unicode Confusables, such as Latin "A" and Greek "Α" (alpha), as well as additional mappings for visually similar characters, such as "S" and "$" (dollar sign).
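The idea of comparing strings through such a mapping can be illustrated with a minimal Python sketch. It is not Equivset's own code: the `CANONICAL` table is a tiny hand-written sample containing only the two pairs mentioned above, and the function names `normalize` and `looks_alike` are hypothetical.

```python
# Hypothetical, hand-written sample of a confusables mapping.
# The real dataset is far larger; only the two pairs named in the text appear here.
CANONICAL = {
    "\u0391": "A",  # Greek capital alpha -> Latin "A"
    "$": "S",       # dollar sign -> Latin "S"
}


def normalize(text: str) -> str:
    """Replace each character with its canonical look-alike, if one is listed."""
    return "".join(CANONICAL.get(ch, ch) for ch in text)


def looks_alike(a: str, b: str) -> bool:
    """Treat two strings as visually equivalent when their normalized forms match."""
    return normalize(a) == normalize(b)


if __name__ == "__main__":
    print(looks_alike("\u0391BC", "ABC"))  # Greek alpha in place of Latin "A"
    print(looks_alike("$AFE", "SAFE"))     # dollar sign in place of "S"
    print(looks_alike("ABC", "ABD"))       # genuinely different strings
```

Normalizing both inputs to one representative per group keeps the comparison to a simple string equality check, which is why a character-to-character mapping is a convenient shape for this kind of data.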
Equivset is used at Wikimedia in the AntiSpoof and AbuseFilter software to determine whether two characters are visually equivalent.
Data
The library provides its dataset of equivalent sets of characters in a standard JSON format and in a plain text format.
It also provides an access library for PHP.
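For readers working outside PHP, the following Python sketch shows one way a JSON dataset of this kind might be turned into explicit sets of equivalent characters. The layout is an assumption (a single JSON object mapping each character to one representative character), and `SAMPLE_JSON` is a hypothetical two-entry excerpt rather than the actual file; the name `equivalence_sets` is likewise made up for illustration.

```python
import json
from collections import defaultdict

# Hypothetical excerpt. Assumed layout: a JSON object that maps each
# character to a single representative character of its group.
SAMPLE_JSON = '{"\\u0391": "A", "$": "S"}'


def equivalence_sets(mapping: dict) -> dict:
    """Group characters by representative, yielding one set per group."""
    groups = defaultdict(set)
    for char, representative in mapping.items():
        groups[representative].add(representative)  # the representative belongs to its own set
        groups[representative].add(char)
    return dict(groups)


mapping = json.loads(SAMPLE_JSON)
sets_by_representative = equivalence_sets(mapping)

if __name__ == "__main__":
    for representative, members in sorted(sets_by_representative.items()):
        print(representative, sorted(members))
```

If the real file uses a different structure, only the loading step would need to change; the grouping logic stays the same.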
This article is issued from Mediawiki. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.