The World Wide Web is built upon open, non-proprietary standards such as HTTP and HTML. The use of open standards not only means that anyone can implement their own web browser and server; it also makes it possible for the web to be filtered. Because almost all web browsers include HTTP proxy support, a filter acting as a proxy can filter almost anything found on the World Wide Web. However, the filtering process does present a few challenges. One of the most difficult parts of filtering the web is dealing with HTTP and HTML that do not comply with the standards. There are many web servers that do not correctly implement certain parts of HTTP, and special cases must be added to support these servers to avoid unexpected problems. HTML documents present an even bigger problem. Many HTML documents are created by hand using text editors and tested with forgiving web browsers that silently accept invalid HTML syntax. These documents are often missing quotes around attribute values and do not follow standard HTML document structure. Parsing such a wide variety of HTML, much of it containing syntax errors, can be very difficult. The approach taken by this filtering system when dealing with invalid HTML is to make an attempt at parsing and to pass any invalid syntax through to the browser unchanged. This approach can lead to some HTML tags slipping past the filtering system. Alternative approaches would be to reject invalid HTML or to do more extensive HTML parsing.
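To illustrate the "attempt to parse, pass the rest through" approach, the following is a minimal sketch of a lenient tag filter. It is not the filtering system's actual parser; the names `TAG_RE`, `filter_html`, and `blocked_tags` are hypothetical, and a simple regular expression stands in for a real HTML tokenizer. Well-formed tags (including ones with unquoted attribute values) are recognized and can be removed, while anything the scanner cannot recognize is copied to the output untouched.

```python
import re

# Hypothetical lenient tag pattern: a start or end tag whose name is
# alphanumeric, followed by any characters other than '<' or '>' (treated
# loosely as attributes), so unquoted values such as <a href=x.html> match.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>")


def filter_html(html, blocked_tags):
    """Drop tags named in blocked_tags; pass everything else through unchanged.

    Text between recognized tags, including malformed markup that TAG_RE
    cannot match, is copied to the output as-is rather than rejected.
    """
    out = []
    pos = 0
    for m in TAG_RE.finditer(html):
        out.append(html[pos:m.start()])  # text or unrecognized markup before this tag
        if m.group(2).lower() not in blocked_tags:
            out.append(m.group(0))  # keep allowed tags exactly as written
        pos = m.end()
    out.append(html[pos:])  # trailing text after the last recognized tag
    return "".join(out)
```

For example, `filter_html('<p>Hi <blink>now</blink>', {"blink"})` removes both `blink` tags. The weakness described above also shows up directly: in `'<blink <b>bold</b>'` the unterminated `<blink` never matches the pattern, so it reaches the browser unfiltered, and a forgiving browser may still act on it.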
Secure HTTP also presents a problem for filtering systems. Since it uses an encrypted channel that can only be decrypted by the browser and server, a filtering system acting as an HTTP proxy server cannot decipher the traffic. A special mechanism, the HTTP CONNECT method, allows secure HTTP to be proxied, but with this mechanism the proxy server can only act as a blind relay for the encrypted data. In general, users can trust that when they are browsing a secure web server, all their communication is secure and cannot be snooped by eavesdroppers. Users of secure web servers must also trust that the server will not send any insecure content such as malicious Java applets or ActiveX controls. While the use of secure HTTP does allow for secure communication, it takes control away from the web browser and places more control and trust in the web server.
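The relay role can be made concrete with a short sketch. With CONNECT tunneling, the browser sends a request line such as `CONNECT example.com:443 HTTP/1.0`; the proxy opens a TCP connection to that host and port, replies with a success status line, and from then on simply copies bytes in both directions. The sketch below covers only parsing the request line and the two-way copy; the function names `parse_connect`, `pipe`, and `relay` are hypothetical, and the outbound connection, status reply, timeouts, and error handling are omitted. Note that nothing in `relay` could filter content, because the bytes it forwards are ciphertext.

```python
import socket
import threading


def parse_connect(request_line):
    """Return (host, port) from a line such as 'CONNECT example.com:443 HTTP/1.0'."""
    method, target, _version = request_line.split()
    if method.upper() != "CONNECT":
        raise ValueError("not a CONNECT request")
    host, _, port = target.rpartition(":")
    return host, int(port)


def pipe(src, dst):
    """Copy bytes from src to dst until src reaches end of stream.

    The data is opaque to the proxy: it is forwarded without inspection.
    """
    while True:
        data = src.recv(4096)
        if not data:
            break
        dst.sendall(data)
    dst.shutdown(socket.SHUT_WR)  # propagate end of stream to the other side


def relay(client, server):
    """Relay between browser-side and server-side sockets in both directions."""
    back = threading.Thread(target=pipe, args=(server, client))
    back.start()
    pipe(client, server)  # browser -> server in this thread
    back.join()  # wait for server -> browser to finish
```

A thread per direction keeps the sketch short; a production proxy would more likely multiplex both directions with `select` or an event loop.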
An important issue regarding the filtering of web pages that has not been addressed in this paper is web page ownership and copyright. Filtering a web page does in fact modify the document that is displayed by the web browser. Depending on the type of filters being used, the resulting web page can look completely different from the original unfiltered version. Modifying a copyrighted document could in fact be a copyright violation. This can be an even bigger problem if the filtering system is used by a large number of users.
The Muffin filtering system was released under the GNU General Public License[8] in early 1997. Since that time many users have expressed interest in the system and are using it for everyday web browsing. Because Muffin is freely available, it is difficult to know exactly how many people use it. Recent versions have been downloaded by a few thousand users, so it is probably safe to assume that there is considerable interest in web filtering systems.
As users learn more about the World Wide Web they become aware that the web can sometimes be a dangerous place. Fortunately, the web allows users to protect themselves by using web filtering systems. World Wide Web filtering systems give users the power to control web content and make web browsing a safe and enjoyable experience.