Domain names are ASCII. ASCII gave a choice of 128 characters, of which the first 32 weren’t printable, and DNS treats capitals and lower-case letters as the same. Unsurprisingly, this didn’t leave room for accented characters like é, or Cyrillic (“Russian”) characters.
We now have Unicode, a unifying character set that offers more than 137 thousand characters. The most common representation, UTF-8, is used in over 90% of websites; however, it is very new to DNS.
What does this mean for web filtering? Even though DNS queries still don’t support UTF-8 or Unicode, browsers (which we update much more often) have taken on the role. International domains are now translated by the browser. Bücher.de is an example of an “IDN”, an Internationalized Domain Name. It’s a German bookstore. The Unicode form of the name is not visible in a DNS lookup, and some web filters fail here too. The browser, however, will translate it to the ASCII representation xn--bcher-kva.de (an encoding known as Punycode), which redirects to www.buecher.de.
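The translation a browser performs can be approximated with a few lines of Python. This is a minimal sketch using Python's built-in `idna` codec; the helper names `to_ascii` and `to_unicode` are ours, chosen for illustration. Note that the built-in codec follows the older IDNA 2003 rules, whereas current browsers apply newer rules, so results can differ for a small number of characters.

```python
def to_ascii(hostname: str) -> str:
    """Convert a Unicode hostname to its ASCII ("xn--") form.

    Uses Python's built-in IDNA codec (IDNA 2003 rules), which is
    an approximation of what a modern browser does.
    """
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Convert an ASCII ("xn--") hostname back to its Unicode form."""
    return hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    print(to_ascii("Bücher.de"))           # ASCII form sent in the DNS query
    print(to_unicode("xn--bcher-kva.de"))  # Unicode form shown to the user
```

The ASCII form is the one that actually appears in DNS traffic, so it is the form a DNS-based filter has to recognise.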
It’s important to check two things when considering a quality filter.
- How well do they categorise search terms in the languages used by your students?
- Can they filter based on international domain names? It’s common for searches for illicit material to start in a student’s native language, often because filters pay less attention to this.
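As a rough illustration of the second check, a filter can convert every hostname to its ASCII form before comparing it with a blocklist, so that the Unicode and "xn--" spellings of a domain are treated as the same name. The sketch below is an assumption about how such a check might look, not a description of any particular product: the blocklist entry is a made-up domain under the reserved `.example` suffix, and the function names are hypothetical.

```python
# Hypothetical blocklist, stored in ASCII ("xn--") form.
BLOCKED_DOMAINS = {"xn--bcher-kva.example"}


def normalise(hostname: str) -> str:
    """Lower-case a hostname, drop a trailing dot and convert it to ASCII form.

    Relies on Python's built-in IDNA codec (IDNA 2003 rules).
    """
    cleaned = hostname.strip().rstrip(".").lower()
    return cleaned.encode("idna").decode("ascii")


def is_blocked(hostname: str, blocked=BLOCKED_DOMAINS) -> bool:
    """Return True if the hostname or any of its parent domains is blocked."""
    labels = normalise(hostname).split(".")
    # Check "www.a.b", then "a.b", then "b" against the blocklist.
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


if __name__ == "__main__":
    for name in ("Bücher.example", "www.xn--bcher-kva.example", "bucher.example"):
        print(name, is_blocked(name))
```

A filter that compares only the raw strings would miss one of the two spellings, which is the failure described above.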