These systems tend to be particularly effective against widespread categories of similar content, such as pornography, gambling sites and anonymiser tools.
Comparing the underlying technology is difficult, because there are many ways to build it. There's closed-loop learning and human-directed learning, and a variety of models underneath, from simple hidden Markov models (HMMs) to frameworks such as TensorFlow. All of these can be done well, or badly.
There's one essential question you can ask, though: where does your filter apply these AI techniques? There are usually two possible answers:
Inline with the web filtering, in real time
Real-time filtering is built either into a network appliance or into a filtering client. You'll see occasional updates to the rules database, but otherwise the filter makes all of its decisions locally.
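As a rough illustration of the inline model, the Python sketch below makes every decision locally, against the page content the user is about to receive. `InlineFilter`, its methods and the term-weight scoring are all hypothetical: the weights stand in for whatever model (an HMM, a neural network or anything else) a real product would ship, and `update_rules` plays the part of the occasional rules-database update.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class InlineFilter:
    """Hypothetical inline filter: every decision is made locally, per request."""

    # Local rules database: category -> {term: weight}. Simple term weights
    # stand in for whatever trained model a real product would ship.
    rules: dict[str, dict[str, float]]
    blocked_categories: set[str]
    threshold: float = 1.0

    def update_rules(self, new_rules: dict[str, dict[str, float]]) -> None:
        """Apply an occasional rules-database update pushed by the vendor."""
        self.rules = new_rules

    def classify(self, page_text: str) -> str | None:
        """Score the page the user is about to see; return the top category
        whose score reaches the threshold, or None if nothing does."""
        words = re.findall(r"[a-z']+", page_text.lower())
        best, best_score = None, 0.0
        for category, weights in self.rules.items():
            score = sum(weights.get(w, 0.0) for w in words)
            if score >= self.threshold and score > best_score:
                best, best_score = category, score
        return best

    def allow(self, page_text: str) -> bool:
        """Inline decision: pass the response through unless it is blocked."""
        return self.classify(page_text) not in self.blocked_categories


f = InlineFilter(
    rules={"gambling": {"casino": 0.6, "poker": 0.6, "bet": 0.4}},
    blocked_categories={"gambling"},
)
print(f.allow("Live casino and poker tables"))  # False
print(f.allow("Weather forecast for London"))   # True
```

Because the decision is taken on the response itself, the same page can be judged differently from one request to the next if its content changes.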
Out-of-band offline processing
With out-of-band intelligence, uncategorised URLs are fed back to the filter vendor, and each site is then visited by an automated web crawler or "spider". The results are passed through the intelligent system, and a category is attached to the URL. That categorisation reaches the point of filtering in the regular URL list updates.
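The out-of-band flow can be sketched in the same way. In this hypothetical Python example, `OutOfBandPipeline` stands in for the vendor's queue, crawler and intelligent system (the `fetch` and `classify` callables are placeholders), and `UrlListFilter` is the point of filtering, which only knows what the last URL list update told it. Letting unknown URLs through while they wait in the queue (`default_allow=True`) is an assumption made for the example; vendors may handle unknown URLs differently.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Optional


class OutOfBandPipeline:
    """Hypothetical vendor-side pipeline: queue -> crawl -> classify -> URL list."""

    def __init__(self, fetch: Callable[[str], str],
                 classify: Callable[[str], Optional[str]]) -> None:
        self.fetch = fetch        # stand-in for the web crawler ("spider")
        self.classify = classify  # stand-in for the intelligent system
        self.queue: deque[str] = deque()
        self.url_list: dict[str, Optional[str]] = {}

    def report_unknown(self, url: str) -> None:
        """Called by the point of filtering for any uncategorised URL."""
        if url not in self.url_list and url not in self.queue:
            self.queue.append(url)

    def run_batch(self) -> None:
        """Offline step: crawl and categorise everything queued so far."""
        while self.queue:
            url = self.queue.popleft()
            self.url_list[url] = self.classify(self.fetch(url))

    def publish(self) -> dict[str, Optional[str]]:
        """Snapshot shipped to filters in the regular URL list update."""
        return dict(self.url_list)


class UrlListFilter:
    """Point of filtering: consults only the last URL list it received."""

    def __init__(self, pipeline: OutOfBandPipeline, blocked: set[str],
                 default_allow: bool = True) -> None:
        self.pipeline = pipeline
        self.blocked = blocked
        self.default_allow = default_allow  # assumed policy for unknown URLs
        self.url_list: dict[str, Optional[str]] = {}

    def sync(self) -> None:
        """Regular URL list update from the vendor."""
        self.url_list = self.pipeline.publish()

    def allow(self, url: str) -> bool:
        if url in self.url_list:
            return self.url_list[url] not in self.blocked
        self.pipeline.report_unknown(url)  # feed the unknown URL back
        return self.default_allow


pages = {"https://casino.example.test/": "Live casino and poker tables"}
pipeline = OutOfBandPipeline(
    fetch=lambda u: pages.get(u, ""),
    classify=lambda text: "gambling" if "casino" in text.lower() else None,
)
url_filter = UrlListFilter(pipeline, blocked={"gambling"})
url = "https://casino.example.test/"
print(url_filter.allow(url))  # True: unknown, so it is queued and let through
pipeline.run_batch()          # offline crawl and categorisation
print(url_filter.allow(url))  # True: the filter has not synced yet
url_filter.sync()             # next regular URL list update
print(url_filter.allow(url))  # False
```

The three `allow` calls show the delay in action: the site stays reachable until the offline batch has run and the filter has received its next update.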
| | Inline | Out of band |
|---|---|---|
| Speed of reaction | Instant. Every filtering decision is applied straight away, leaving no opportunity for harmful content to get through. | Hours. Unknown content is queued until the offline process runs, and filtering catches up at the next regular update. |
| Effectiveness: real-time content | Excellent. Real-time or rapidly changing content is reassessed on every request, so decisions are made against up-to-date data. | Poor. A site's categorisation is generally fixed permanently, or for months at a time, which leaves sites with changing content open to misclassification. |
| Effectiveness: context | Weak. Inline filters see only one page at a time and can't make decisions based on what it links to. | Strong. With plenty of time to make a decision, an out-of-band filter can download linked pages and images. |
| Effectiveness: logged-in content | Excellent. Because these filters work on the data the user actually sees, even content behind a login, such as a forum or social media, gets scanned. | Useless. The out-of-band filter sees only the login page, which rarely provides any actionable content. |
| Additional latency | Low. Adding intelligence usually adds some latency to each request, but properly designed systems keep it small enough that users don't notice. | Zero. All the intelligence runs out of band, so no latency is added to requests. |
Looking at this table, it's clear that an inline filter is far more effective against today's web, which is increasingly volatile and often sits behind a login. It's also worth noting that an inline approach doesn't preclude additional out-of-band filtering: if you can find a vendor that combines the two, you get the best of both.
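A combined approach might look something like the following minimal sketch. It assumes one possible ordering: consult the out-of-band URL list first, report unknown URLs back to the vendor, and then scan the page content inline regardless. The function `hybrid_allow` and its parameters are hypothetical and not taken from any particular product.

```python
from __future__ import annotations

from typing import Callable, Optional


def hybrid_allow(
    url: str,
    page_text: str,
    url_list: dict[str, Optional[str]],
    inline_classify: Callable[[str], Optional[str]],
    blocked: set[str],
    report_unknown: Callable[[str], None],
) -> bool:
    """Hypothetical combined check: out-of-band verdict plus inline scan."""
    if url in url_list:
        # Out-of-band categorisation, which may have drawn on linked pages and images
        if url_list[url] in blocked:
            return False
    else:
        # Feed the unknown URL back for offline processing
        report_unknown(url)
    # Inline scan of the content the user actually receives, so changing
    # or logged-in pages are still checked on every request
    return inline_classify(page_text) not in blocked


def classify(text: str) -> Optional[str]:
    """Placeholder inline classifier for the example."""
    return "gambling" if "casino" in text.lower() else None


queued: list[str] = []
url_list = {"https://forum.example.test/": None}  # categorised as clean offline
print(hybrid_allow("https://forum.example.test/", "New casino bonus thread",
                   url_list, classify, {"gambling"}, queued.append))  # False
print(hybrid_allow("https://new.example.test/", "Gardening tips",
                   url_list, classify, {"gambling"}, queued.append))  # True
print(queued)  # ['https://new.example.test/']
```

In the first call the offline verdict says the forum is clean, but the inline scan still catches the new post; in the second, the unknown site is allowed on its content and queued for offline processing.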
If you have a question or would like to learn more about the UK’s No.1 Web Filter, please get in touch. We’d be delighted to help.