Republished with permission from WatchGuard Technologies, Inc.


Web Content Inspection and Screening:

Essential Uses for the HTTP Proxy and WebBlocker

by David M. Piscitello, Core Competence, Inc.

In today's organizations, the most commonly used application is the World Wide Web. Most of the information users obtain from the Internet is delivered over TCP port 80, using HyperText Transfer Protocol (HTTP). HTTP provides mechanisms to deliver content of all types—audio, video, images, fancy text, and unfortunately, active content, such as software macros, applets, and executables. Few firewall configurations block port 80 for outbound connections, and while most Web content is benign, some can be malicious. Organizations are often desperate to block the latter at the firewall.

Some organizations also worry that the Web offers too many diversions for employees. Cyberslacking, or using the Web for personal reasons at work, is blamed for poor productivity, misuse of Internet bandwidth, and leaked information. Organizations experiencing this problem may create a security policy that prohibits or limits personal Web use at work, but then they must police Web traffic to enforce the policy.

Both problems can be addressed by examining content types and inspecting Web page content at the Firebox, using the HTTP Proxy and WebBlocker Controls.

Content Screening: Blocking Potentially Unsafe Content

Active content is a playground for attackers. Microsoft Office macros, Java applets, software executables, and ActiveX controls are notorious bearers of worms, viruses, Trojans and backdoors. Some organizations rely solely on uniformly configured and maintained desktop anti-virus protection and browser security settings to prevent a malicious code “incident” across the entire organization. I think this approach allows too much risk. If you find yourself in this situation, you can strengthen your defense by adding the HTTP Proxy service at the Firebox, then blocking content types you don’t trust.

Begin by establishing a policy that defines what downloadable content your organization permits. Then base your HTTP Proxy configuration on this list. From your Firebox’s Policy Manager, add the HTTP Proxy service (Edit => Add Service => Proxy => HTTP). From the Outgoing tab, enable and allow members and addresses as you would with any Firebox service, and be sure to click the Logging button and enable all the options. From the Properties tab, click the Settings button and keep the default boxes checked: the ones that deny Java and ActiveX applets and the one that blocks unknown HTTP headers. These settings are restrictive, but offer good protection against malicious active content and attacks that use HTTP.

Check the Allow only safe content types: box (HTTP Properties => Settings => Safe Content tab). Enumerate the content types you will permit through the Firebox according to their Multipurpose Internet Mail Extensions (MIME) types, the Internet’s standard method for identifying and encoding media (content) types. If, for example, your policy allows JavaScript, which is commonly viewed as safe, you would click on Add and then, from the Select MIME Type pop-up window, choose application/x-javascript. The Settings dialog box provides a list of common MIME types. To view it, go to HTTP Properties => Settings => Safe Content tab and click Add. You can find additional information here.
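To make the idea of a safe content type list concrete, here is a minimal Python sketch of the kind of check a content-screening proxy performs on each response. It is an illustration only, not the Firebox's implementation: the SAFE_CONTENT_TYPES entries, the "type/*" wildcard convention, and the function name is_safe_content_type are all assumptions made for the example.

```python
# Hypothetical allowlist for illustration; these are NOT the Firebox defaults.
# A trailing "/*" is an assumed convention meaning "any subtype of this type".
SAFE_CONTENT_TYPES = {
    "text/*",
    "image/*",
    "application/pdf",
    "application/x-javascript",
}


def is_safe_content_type(header_value, allowed=SAFE_CONTENT_TYPES):
    """Return True if a Content-Type header value matches the allowlist."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    mime = header_value.split(";", 1)[0].strip().lower()
    if "/" not in mime:
        # Missing or malformed type: deny by default.
        return False
    major = mime.split("/", 1)[0]
    return mime in allowed or f"{major}/*" in allowed


if __name__ == "__main__":
    for value in ("text/html; charset=utf-8", "application/x-msdownload"):
        verdict = "allowed" if is_safe_content_type(value) else "denied"
        print(value, "->", verdict)
```

Anything not on the list is denied, including responses with no recognizable content type at all, which matches the restrictive stance the HTTP Proxy defaults take.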

Adding MIME types is easy. Deciding which types to allow is tricky. Organizations routinely allow audio, image, video, text, postscript, and Adobe .pdf files, as these are commonly used for Web content. Because Microsoft Office is popular, you may need to add its content types despite the known vulnerabilities.

But remember the security maxim, “Anything that is not expressly permitted is prohibited.” Let me suggest that you begin with a short list based on your policy, one that is more restrictive than the Firebox defaults—then wait for the phone to ring, and watch your log for denied entries like this:

37048 08/14/01 15:48:50 y http-proxy[100] 
   ICMP_Scanning_v2.5.pdf] Response denied: 
   Unsafe content type "application/pdf"

The value of initially blocking everything is that you will obtain a remarkably accurate picture of the kinds of content currently entering your network. This is “forensic” data that may identify systems running undesirable software behind your firewall. Moreover, by adding content types prudently and within the constraints of your content permission policy, you can reduce many vulnerabilities associated with Web downloads and keep unwanted content out.
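One way to turn those denied entries into a picture of what your users actually request is to tally the content types that appear in the log. The Python sketch below assumes only the message text shown in the sample entry above (Unsafe content type "..."); the function name tally_denied_types is hypothetical, and you would feed it lines from wherever your own log file lives.

```python
import re
from collections import Counter

# Matches the denial message shown in the sample log entry above.
DENIED_TYPE = re.compile(r'Unsafe content type "([^"]+)"')


def tally_denied_types(lines):
    """Count how often each denied MIME type appears in the log lines."""
    counts = Counter()
    for line in lines:
        match = DENIED_TYPE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


# The sample entry from this article, split across lines as logged.
sample = [
    "37048 08/14/01 15:48:50 y http-proxy[100]",
    "   ICMP_Scanning_v2.5.pdf] Response denied:",
    '   Unsafe content type "application/pdf"',
]

if __name__ == "__main__":
    for mime, count in tally_denied_types(sample).most_common():
        print(f"{count:5d}  {mime}")
```

Types that show up often and fit your policy are candidates for the safe list; types that show up often and do not fit it are worth a conversation with the users involved.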

Preventing Leaks

When users request Web pages from servers, the initial HTTP GET request can also carry headers that disclose your operating system, browser type, client e-mail address, referring URL, and in some cases, intermediate proxy addresses. The proxy addresses and referring URLs shown in this GET request may reveal the addresses of machines behind your Firebox. Your Firebox probably masquerades Trusted Network addresses, but it does so only at the IP level. Why let someone glean sensitive information from your HTTP data streams? Maybe it’s time to seal this crack.

To have the Firebox remove information-leaking headers, go to the Settings tab of the HTTP Proxy and check Remove client connection info. If you check the Deny Submissions option as well, you’ll prevent users from submitting Web forms served by public Web servers. Such Web forms often ask for internal mail drops, e-mail addresses, and phone numbers, exactly the kinds of information organizations should be careful about disclosing.
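To picture what removing client connection information amounts to, consider this small Python sketch. The header names are the standard HTTP request headers that carry the details mentioned above (User-Agent, From, Referer, Via); which headers the Firebox actually strips is an assumption here, and strip_client_info and the example.com values are invented for illustration.

```python
# Standard HTTP request headers that carry the details discussed above.
# Whether the Firebox removes exactly this set is an assumption.
LEAKY_HEADERS = {"user-agent", "from", "referer", "via"}


def strip_client_info(headers):
    """Return a copy of the request headers without the leaky ones."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in LEAKY_HEADERS
    }


# Illustrative outbound request headers (all values are made up).
request_headers = {
    "Host": "www.example.com",
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)",
    "From": "user@example.com",
    "Referer": "http://intranet.example.com/projects/",
    "Via": "1.0 proxy1.example.com",
    "Accept": "text/html",
}

if __name__ == "__main__":
    print(strip_client_info(request_headers))
    # prints {'Host': 'www.example.com', 'Accept': 'text/html'}
```

The request still reaches the server and still names the page it wants; it simply stops volunteering details about the machine and network it came from.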

While you’re at this tab, consider blocking cookies. Some companies, especially those in regulated industries like health care, may benefit from an organization-wide effort to prevent tracking technologies from gathering information that might profile the company (for more on the dangers of "spyware," refer to my article, "Beware of Back Channels"). Today, it’s better to be overly conservative with privacy than to blithely assume that everyone operating a Web server has the best interests of your organization at heart.

Content Inspection

Web surfing can become a serious distraction to some workers. Moreover, certain surfing can be offensive to office mates and may violate company codes of conduct. Some surfing may even infringe on co-workers’ Constitutional, federal- or state-granted individual rights, and put your company at risk of litigation. If your organization’s codes of conduct prohibit gambling, sexual harassment, or certain kinds of printed matter (e.g., sexual content), and you find a need to extend these policies to downloadable content from the Web, consider WebBlocker’s Web site filtering capabilities.

WebBlocker works with the HTTP Proxy and, once it's activated on your Firebox, is accessed from the Settings button (read more here). WebBlocker works in conjunction with SurfControl, which maintains a URL database of sites categorized by the kind of content they advocate or publish. SurfControl mostly identifies Web pages that advocate and publish offensive or inappropriate content (Violence/Profanity, Full Nudity). Some of the sites SurfControl categorizes don't advocate anything particularly bad, but contain content that can distract many office workers (e.g., sports and leisure).

Use WebBlocker to schedule times of day when all access or specific content categories should be blocked. For example, if I want to keep my employees from idling away hours reading about their favorite sports franchise, I can selectively block this content. If an employee visits sites that violate this policy, my log will contain entries like this:

2468 08/14/01 10:55:01 y http-proxy[100] 
   [] Request blocked by

2478 08/14/01 10:55:01 y http-proxy[100] 
   [] Request denied:
   blocked by WebBlocker

2678 08/14/01 10:55:27 y http-proxy[101]
   Includes/mini_form.js] Request blocked by WebBlocker
   (host contains blocked content: sports/leisure)

Notice that log entry 2678 has a slightly different message. This is because I customized the message that’s logged and displayed to the browser of the offending user -- you get the message, and so do they. You can activate this from the WebBlocker Controls tab of the HTTP Proxy (Edit => Add Service => Proxies => HTTP => Properties tab => Settings => WebBlocker Controls).

SurfControl’s categorization of Web pages isn’t perfect, and ambiguities will occur. The following syslog entry shows WebBlocker blocking sports/leisure content from

4, 128,, 8/16/01, 1:20:02 PM, http-proxy[102]: [] Request blocked by WebBlocker (host contains blocked content: sports/leisure)

This is, a regional bank serving the South. I don’t consider banking a sport or leisure, but something on this page triggered SurfControl’s categorization technology. When filtering goes awry in this manner, you can (a) contact SurfControl and ask them to rectify the anomaly, or (b) use the WB: Exceptions tab and enter this address as an allowed exception. I hate building lists of exceptions, so my recommendation is to use the exception while you wait for a response from SurfControl, then remove it if they agree with your assessment and update their database.
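The decision WebBlocker makes for each request (category lookup, time-of-day schedule, then exceptions) can be sketched in a few lines of Python. Everything named here is hypothetical: the HOST_CATEGORIES table stands in for the SurfControl database, the blocking window is an arbitrary working day, and the example.com hosts are placeholders. It is meant to show the order of the checks, not how the Firebox implements them.

```python
from datetime import time

# Hypothetical policy: category -> (start, end) of the window when it is blocked.
BLOCK_WINDOWS = {"sports/leisure": (time(8, 0), time(18, 0))}

# Hypothetical stand-in for the vendor's URL database. The bank entry
# mimics a miscategorized site like the one described above.
HOST_CATEGORIES = {
    "scores.example.com": "sports/leisure",
    "bank.example.com": "sports/leisure",
}

# Hosts allowed regardless of category, like entries on an exceptions list.
EXCEPTIONS = {"bank.example.com"}


def is_blocked(host, now):
    """Decide whether a request to host at time-of-day now is blocked."""
    host = host.lower()
    if host in EXCEPTIONS:
        return False  # an explicit exception overrides the category
    category = HOST_CATEGORIES.get(host)
    window = BLOCK_WINDOWS.get(category)
    if window is None:
        return False  # uncategorized host, or category not restricted
    start, end = window
    return start <= now < end


if __name__ == "__main__":
    print(is_blocked("scores.example.com", time(10, 55)))  # inside the window
    print(is_blocked("scores.example.com", time(19, 0)))   # after hours
    print(is_blocked("bank.example.com", time(10, 55)))    # on the exception list
```

Checking the exception list first keeps the temporary workaround easy to remove: once the database is corrected, deleting the entry restores normal category handling with no other changes.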

In Conclusion

With a modest effort, content screening can provide a worthwhile complement to desktop anti-virus measures. It can also help you maintain some degree of privacy in a privacy-desensitized society. Content inspection demands more administrative time, and remains controversial and difficult to implement across all media (see my article, "Seek Consistency Across All Media," for more on this), but in some industries, it may be a necessary evil. Better the evil you know than the evil you don’t know.



Copyright © 2001, WatchGuard Technologies, Inc. All rights reserved. WatchGuard, LiveSecurity, Firebox and ServerLock are trademarks or registered trademarks of WatchGuard Technologies, Inc. in the United States and other countries.
