Hypertext Markup Language (HTML) injection is a technique that exploits non-validated input to modify a web page that a web application presents to its users. Attackers take advantage of the fact that the content of a web page often depends on previous interactions with users. When an application fails to validate user data, an attacker can send HTML-formatted text that modifies site content presented to other users. A specially crafted query can cause attacker-controlled HTML elements to be included in the web page, changing the way the application’s content is presented on the web.
Detailed Description
HTML is the language that determines how application data (such as a product catalog) is presented to users in their web browser. The language contains presentation instructions, such as the color of the page’s background and the size of embedded pictures. It also contains links to other web pages and additional instructions intended for the user’s browser. Furthermore, automated tools that collect useful information from the web on behalf of users often do so by systematically accessing and parsing the relevant information in the application’s HTML pages.
In modern interactive web pages, the content of a page often reflects the results of processing previous user actions. If the application does not validate the user’s input, an attacker can craft input that injects pieces of attacker-supplied HTML into the HTML content of the application’s response.
An HTML injection attack is closely related to Cross-site Scripting (XSS). HTML injection uses HTML markup to alter or deface the page. XSS, as the name implies, injects JavaScript into the page. Both attacks exploit insufficient validation of user input.
A simple example of potential HTML injection is an application’s “Search” form, in which the user enters query text. When the user submits the query, the application responds by dynamically generating a web page that shows matching results. This results page often shows the original query text so that the user can see the context of the results. If the embedded query text contains syntactically correct HTML, it may add attacker-controlled text, images and links to the generated response page. In the following example, if the application does not validate the user query before embedding it in the simplified results page, the attacker can add content to the page by sending a query that contains appropriate HTML elements (tags that close and reopen the <h2> context), so that the page is still valid HTML after the injection:
Web application template for search results page:
<html>
  <h1>Here are the results that match your query: </h1>
  <h2>{user-query}</h2>
  <ol>
    <li>Result A
    <li>Result B
  </ol>
</html>
User query text:
</h2>special offer <a href=www.attacker.site>malicious link</a><h2>
Generated results page after injection:
<html>
  <h1>Here are the results that match your query: </h1>
  <h2></h2>special offer <a href=www.attacker.site>malicious link</a><h2></h2>
  <ol>
    <li>Result A
    <li>Result B
  </ol>
</html>
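The substitution above can be reproduced in a few lines of Python. This is only a minimal sketch under stated assumptions, not the code of any real application: the template string and the functions `render_results_unsafe` and `render_results_escaped` are hypothetical, and the standard library’s `html.escape` stands in for whatever output encoding the application’s template engine would provide.

```python
import html

# Hypothetical template mirroring the simplified results page above.
TEMPLATE = (
    "<html> <h1>Here are the results that match your query: </h1> "
    "<h2>{user_query}</h2> <ol> <li>Result A <li>Result B </ol> </html>"
)


def render_results_unsafe(user_query: str) -> str:
    """Embed the query as-is: any HTML in it becomes part of the page."""
    return TEMPLATE.format(user_query=user_query)


def render_results_escaped(user_query: str) -> str:
    """Escape HTML metacharacters so the query is displayed as plain text."""
    return TEMPLATE.format(user_query=html.escape(user_query, quote=True))


payload = "</h2>special offer <a href=www.attacker.site>malicious link</a><h2>"

unsafe_page = render_results_unsafe(payload)   # contains a live <a> element
safe_page = render_results_escaped(payload)    # contains &lt;a href=...&gt; text
```

In the unsafe version the browser receives the attacker’s link as markup. In the escaped version the same characters arrive as `&lt;` and `&gt;` entities, so the browser shows the query as text inside the heading.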
Of course, the aim of an attacker is to inject HTML into pages seen by other users or by automated tools, not into the pages displayed in the attacker’s own browser, as in the previous example. For that to happen, the injected text must become part of the content of pages generated for and viewed by other users. This happens if the application stores the non-validated user input and later displays it to other users. Suppose the application above also has a page showing the history of users’ searches:
Web application template for search history page:
<html>
  <h1>Recent users queries:</h1>
  <ol>
    <li><h2>{user-query-1}</h2>
    <li><h2>{user-query-2}</h2>
  </ol>
</html>
Generated search history page after the HTML injection:
<html>
  <h1>Recent users queries:</h1>
  <ol>
    <li><h2>funny cat movies</h2>
    <li><h2></h2>special offer <a href=www.attacker.site>malicious link</a><h2></h2>
  </ol>
</html>
Now every user who browses to the search history page will see the link injected by the attacker. An unsuspecting user who trusts the application and clicks the injected link is taken to content from an attacker-controlled domain.
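The stored variant can be sketched in the same way. The example below is an illustration under assumptions rather than a real implementation: the in-memory list `search_history` stands in for the application’s database, and `record_query` and `render_history` are hypothetical names. It keeps the raw query in storage and escapes it every time the history page is generated.

```python
import html

# Hypothetical in-memory store standing in for the application's database.
search_history: list[str] = []


def record_query(user_query: str) -> None:
    """Store the raw query; encoding is deferred until the page is rendered."""
    search_history.append(user_query)


def render_history() -> str:
    """Build the history page, escaping every stored query on output."""
    items = " ".join(
        f"<li><h2>{html.escape(query)}</h2>" for query in search_history
    )
    return f"<html> <h1>Recent users queries:</h1> <ol> {items} </ol> </html>"


record_query("funny cat movies")
record_query("</h2>special offer <a href=www.attacker.site>malicious link</a><h2>")

history_page = render_history()
```

Escaping at output time, rather than only when the query is received, is one possible design choice: the stored data stays unmodified, and any page that later displays it applies the encoding appropriate for its own context.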
A typical application use-case for storing one user’s input and showing it to other users is when an application contains pages where users can post comments to the original content of the page or interact with each other. This is another example where application vulnerabilities can lead to HTML injection.
Prevention
The most common way of detecting HTML injection is to look for HTML elements in the incoming HTTP stream that carries the user input. A naïve validation of user input simply removes any HTML-syntax substrings (like tags and links) from any user-supplied text. However, there are many cases in which the application expects HTML input from the user, for example when the user submits visually formatted text or text containing links to legitimate sites with related content. To avoid false positives, the security mechanism that detects possible injections and protects the application should learn in which application contexts user input is allowed to contain HTML. It should also be able to block HTML input if it learns that such text is pasted as-is into a web page generated by a vulnerable application component.
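The difference between naïve stripping and context-aware checking can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the field names, the hand-written `ALLOWED_TAGS` policy (a static stand-in for a learned profile), and the regular expressions, which are a simplification and not a full HTML parser.

```python
import re

# Naive filter: delete anything that looks like a tag, in every field.
TAG_PATTERN = re.compile(r"<[^>]*>")

# Simplified matcher for opening and closing tag names.
TAG_NAME_PATTERN = re.compile(r"</?\s*([a-zA-Z][a-zA-Z0-9]*)")

# Hypothetical per-field policy: which tags each input field may contain.
ALLOWED_TAGS = {
    "search_query": set(),                       # plain text only
    "comment_body": {"b", "i", "em", "strong"},  # basic formatting
}


def strip_all_tags(text: str) -> str:
    """Naive validation: remove every tag, legitimate or not."""
    return TAG_PATTERN.sub("", text)


def find_disallowed_tags(field: str, text: str) -> list[str]:
    """Return the tag names in `text` that the policy for `field` forbids."""
    allowed = ALLOWED_TAGS.get(field, set())
    found = {name.lower() for name in TAG_NAME_PATTERN.findall(text)}
    return sorted(found - allowed)


payload = "</h2>special offer <a href=www.attacker.site>malicious link</a><h2>"
comment = "I <b>love</b> this"
```

With this policy, `strip_all_tags(comment)` returns `"I love this"`, discarding formatting that the comment field is meant to accept, while `find_disallowed_tags("comment_body", comment)` returns an empty list. The injection payload is flagged in both fields because it contains `a` and `h2` tags.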
Imperva SecureSphere Web Application Firewall does all that and more. By observing users’ communications, it builds a profile of allowed HTML interactions. In addition, specific signatures and policies protect application components against known HTML injection points. Anomalies detected in the application’s interactions over time trigger policies for handling possible abuse. Furthermore, real-time information about active HTML injection attacks is gathered from customers and used to improve the protection for all customers.