$ cat "

Removing the Suck from XML with Gosu.Commons: DynamicXmlParser

"

Ever been bored by writing yet another XML parser? Been annoyed by all the string conversions? Let's take a look at the DynamicXmlParser in Gosu.Commons.

So, let's say we have an XML document containing a book catalog:


<?xml version='1.0'?>
<Catalog>
    <Book Id='123'>
        <Title>XML Developer's Guide</Title>
        <Author FirstName='Matthew' LastName='Gambardella' />
        <Price>44.95</Price>
        <PublishDate>2000-10-01</PublishDate>
        <IsBetaRelease>false</IsBetaRelease>
        <BookType>Ebook</BookType>
    </Book>
    <Book Id='456'>
        <Title>Build Awesome Command-Line Applications in Ruby</Title>
        <Author FirstName='David' LastName='Copeland' />
        <Price>20.00</Price>
        <PublishDate>2012-03-01</PublishDate>
        <IsBetaRelease>true</IsBetaRelease>
        <BookType>Hardcover</BookType>
    </Book>
</Catalog>

We want to cram that XML document into our domain objects:


public class Author
{
    public string FirstName { get; set; }
    public string LastName { get; set; }

    public override string ToString()
    {
        return FirstName + " " + LastName;
    }
}

public class Book
{
    public string Id { get; set; }
    public Author Author { get; set; }
    public decimal Price { get; set; }
    public bool IsBetaRelease { get; set; }
    public BookType BookType { get; set; }
}

public enum BookType
{
    Ebook, Paperback, Hardcover
}

So, usually we start hacking away with an XmlDocument or XDocument, try to dig our way down into the document models and then convert the strings into the correct datatype to be able to store them in our objects.
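For comparison, a hand-written version might look roughly like the sketch below. It uses XDocument from System.Xml.Linq. ManualCatalogParser and ParseBooks are names made up for this illustration, and the domain classes are repeated in compact form only so that the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Domain classes repeated in compact form so the sketch compiles on its own.
public enum BookType { Ebook, Paperback, Hardcover }

public class Author
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public class Book
{
    public string Id { get; set; }
    public Author Author { get; set; }
    public decimal Price { get; set; }
    public bool IsBetaRelease { get; set; }
    public BookType BookType { get; set; }
}

// Hypothetical helper, not part of Gosu.Commons.
public static class ManualCatalogParser
{
    // Every value is dug out as a string and converted by hand.
    // Missing elements or attributes would throw a NullReferenceException here.
    public static List<Book> ParseBooks(string xml)
    {
        return XDocument.Parse(xml)
            .Root
            .Elements("Book")
            .Select(b => new Book
            {
                Id = b.Attribute("Id").Value,
                Author = new Author
                {
                    FirstName = b.Element("Author").Attribute("FirstName").Value,
                    LastName = b.Element("Author").Attribute("LastName").Value
                },
                Price = decimal.Parse(b.Element("Price").Value, CultureInfo.InvariantCulture),
                IsBetaRelease = bool.Parse(b.Element("IsBetaRelease").Value),
                BookType = (BookType)Enum.Parse(typeof(BookType), b.Element("BookType").Value)
            })
            .ToList();
    }
}
```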

That code is kind of boring. Instead, let's take advantage of the dynamic features of C# 4 to do away with that stuff. Gosu.Commons has an XML parser that does just that: DynamicXmlParser.

Here is what the code looks like when using the DynamicXmlParser:


var parser = new DynamicXmlParser();

var xmlCatalog = parser.Parse(xml);

// Access the child elements of the catalog just as an ordinary collection property
foreach (var xmlBook in xmlCatalog.Books)
{
    var book = new Book
    {
        // Read attributes or element values as properties on an element
        // Values are automatically and implicitly converted to the appropriate type
        Id = xmlBook.Id, // string
        Author = new Author
        {
            FirstName = xmlBook.Author.FirstName, // string
            LastName = xmlBook.Author.LastName, // string
        },
        Price = xmlBook.Price, // decimal
        IsBetaRelease = xmlBook.IsBetaRelease, // bool
        BookType = xmlBook.BookType // BookType enum
    };

    Console.WriteLine(
        "Book id: {0}, Author: {1}, Price: ${2}, IsBetaRelease: {3}, Book type: {4}",
        book.Id, book.Author, book.Price, book.IsBetaRelease, book.BookType);
}

Thanks to dynamic we can use ordinary property access syntax to find child elements of our catalog. Attributes or values of an element can be accessed the same way.


Accessing child element collections

If you expect there to be multiple child elements with a given element name, those elements can be accessed as a collection property. In the example there are multiple Book elements in the catalog, so you can access them through xmlCatalog.Books.

In the example, the Book elements are accessed by adding a plural 's' to the element name, i.e. "Books". However, this kind of access works with other plural forms as well:

[Test]
public void Collections_can_be_accessed_with_multiple_kinds_of_pluralization()
{
    var xml = @"
        <Bag>
            <Car />
            <Glass />
            <Glass />
            <Category />
            <Category />
            <Category />
            <Octopus />
            <Octopus />
            <Octopus />
            <Octopus />
        </Bag>
        ";
    var parser = new DynamicXmlParser();

    var bag = parser.Parse(xml);

    Assert.AreEqual(1, bag.Cars.Count); // ...s
    Assert.AreEqual(2, bag.Glasses.Count); // ...es
    Assert.AreEqual(3, bag.Categories.Count); // ...ies
    Assert.AreEqual(4, bag.OctopusElements.Count); // worst case, just postfix the word Elements
}

As the example shows, you can use a couple of different pluralization forms. If none of them match your specific scenario, just use the element name and postfix it with 'Elements'.
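To make the lookup rules concrete, here is one way such a plural-to-element-name mapping could be written. This is only a sketch of the idea and not the code Gosu.Commons actually uses. PluralLookup, CandidateElementNames and FindChildren are hypothetical names, and the suffix rules are just the four forms from the test above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of Gosu.Commons.
public static class PluralLookup
{
    // Turns a requested property name into possible element names,
    // trying the most specific suffix first.
    public static IEnumerable<string> CandidateElementNames(string propertyName)
    {
        var rules = new[]
        {
            new { Suffix = "Elements", Replacement = "" },  // OctopusElements -> Octopus
            new { Suffix = "ies", Replacement = "y" },      // Categories -> Category
            new { Suffix = "es", Replacement = "" },        // Glasses -> Glass
            new { Suffix = "s", Replacement = "" }          // Cars -> Car
        };

        foreach (var rule in rules)
        {
            if (propertyName.Length > rule.Suffix.Length &&
                propertyName.EndsWith(rule.Suffix, StringComparison.Ordinal))
            {
                var stem = propertyName.Substring(0, propertyName.Length - rule.Suffix.Length);
                yield return stem + rule.Replacement;
            }
        }
    }

    // Returns the children matching the first candidate name that exists,
    // or an empty list if no candidate matches.
    public static List<XElement> FindChildren(XElement parent, string propertyName)
    {
        foreach (var name in CandidateElementNames(propertyName))
        {
            var matches = parent.Elements(name).ToList();
            if (matches.Count > 0)
                return matches;
        }

        return new List<XElement>();
    }
}
```

Note that a name like "Glasses" produces several candidates ("Glass" and "Glasse"), so the lookup simply takes the first one that matches an existing child element.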


Automatic conversions

If you assign a value read from the parsed XML document to a typed variable or property, the value is automatically and implicitly converted to the type of that variable or property. The only requirement is that the parser has a conversion defined for the target type.

Currently, default conversions exist for int, double, float, decimal, bool, TimeSpan, DateTime and enums. New conversions can easily be added and just as easily you can override the default conversions with your own.

Here is an example of how to change the default conversion for boolean values so that it accepts "0" or "1" instead of "false" and "true":

[Test]
public void Conversion_can_be_customized_for_any_type()
{
    var xml = @"<User Username='SomeName' Password='secret' IsAdmin='1' />";

    var parser = new DynamicXmlParser();

    parser.SetConverter(x =>
    {
        if (x == "1")
            return true;

        return false;
    });

    var user = parser.Parse(xml);

    Assert.IsTrue((bool)user.IsAdmin);
}

Implicit conversions can be done when using the value in a context where the expected type can be inferred, such as assigning to a variable or using the value in a method call. If you want to convert the value when the expected type cannot be inferred you can use an explicit cast.

The example above shows this, since the value is used in an assertion. If the value were not explicitly cast in the call to Assert.IsTrue, no conversion would be triggered and the value passed would actually be an instance of the class DynamicXmlElement.
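If you are curious how a dynamic object can hook into conversions at all, the sketch below shows the general mechanism that C# offers through System.Dynamic.DynamicObject. It is not the Gosu.Commons implementation. DynamicNode is a hypothetical, stripped-down stand-in that only handles attributes, single child elements and the conversions supported by Convert.ChangeType and Enum.Parse.

```csharp
using System;
using System.Dynamic;
using System.Globalization;
using System.Xml.Linq;

// Hypothetical stand-in, not the Gosu.Commons implementation.
public class DynamicNode : DynamicObject
{
    private readonly XElement _element; // null when the node wraps an attribute value
    private readonly string _value;

    public DynamicNode(XElement element)
    {
        _element = element;
        _value = element.Value;
    }

    private DynamicNode(string attributeValue)
    {
        _value = attributeValue;
    }

    // node.Foo: look for an attribute named Foo first, then for a child element.
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        result = null;
        if (_element == null)
            return false;

        var attribute = _element.Attribute(binder.Name);
        if (attribute != null)
        {
            result = new DynamicNode(attribute.Value);
            return true;
        }

        var child = _element.Element(binder.Name);
        if (child != null)
        {
            result = new DynamicNode(child);
            return true;
        }

        return false;
    }

    // Invoked by the runtime binder for both implicit conversions (assignment to a
    // typed variable) and explicit casts. binder.Type is the requested target type.
    public override bool TryConvert(ConvertBinder binder, out object result)
    {
        result = binder.Type.IsEnum
            ? Enum.Parse(binder.Type, _value)
            : Convert.ChangeType(_value, binder.Type, CultureInfo.InvariantCulture);
        return true;
    }
}
```

With this in place, `bool isAdmin = node.IsAdmin;` on a `dynamic` variable ends up in TryConvert with bool as the target type, and an explicit `(bool)node.IsAdmin` takes the same path.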


Namespaces

Every now and then you have to parse an XML document where someone has been so kind as to use the wonderful concept of XML namespaces. How do you tackle that one with this dynamic-schmynamic thingie? The answer is quite simple: register an alias for the namespace URI. You can then access the properties and collections just as before, by prefixing the property name with the namespace alias.

[Test]
public void Elements_in_different_namespaces_can_be_accessed_by_prefixing_element_name_with_namespace()
{
    var xml =
        @"<?xml version='1.0' encoding='UTF-8' ?>
        <!-- Here comes some XML -->
        <Book xmlns='http://www.somesite.org/xml/DefaultNamespace'
              xmlns:NS='http://www.somesite.org/xml'>
            <Title>The title</Title>
            <NS:Author>
                <NS:FirstName>Steve</NS:FirstName>
                <NS:LastName>Sanders</NS:LastName>
            </NS:Author>
        </Book>
        ";
    var parser = new DynamicXmlParser();

    parser.SetNamespaceAlias("http://www.somesite.org/xml", "NS");

    var book = parser.Parse(xml);

    Assert.AreEqual("The title", (string)book.Title);
    Assert.AreEqual("Steve", (string)book.NSAuthor.NSFirstName);
    Assert.AreEqual("Sanders", (string)book.NSAuthor.NSLastName);
}
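For reference, this is roughly what the same lookups look like with plain XDocument, where every element name has to be qualified with its namespace by hand. The sketch only uses System.Xml.Linq. NamespaceLookup and its two methods are names invented for this illustration.

```csharp
using System.Xml.Linq;

// Hypothetical helper showing the manual, namespace-qualified lookups.
public static class NamespaceLookup
{
    private static readonly XNamespace DefaultNs = "http://www.somesite.org/xml/DefaultNamespace";
    private static readonly XNamespace Ns = "http://www.somesite.org/xml";

    // Title lives in the document's default namespace.
    public static string Title(string xml)
    {
        XElement book = XDocument.Parse(xml).Root;
        return book.Element(DefaultNs + "Title").Value;
    }

    // Author and FirstName live in the namespace bound to the NS prefix.
    public static string AuthorFirstName(string xml)
    {
        XElement book = XDocument.Parse(xml).Root;
        return book.Element(Ns + "Author").Element(Ns + "FirstName").Value;
    }
}
```

The alias registered with SetNamespaceAlias plays the role of the Ns field here: it ties a short name to the namespace URI so that the URI does not have to be repeated at every lookup.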


Conclusion / Show me teh codez!

There you have it. Thanks to Microsoft for adding some dynamic love and care to C#.

Gosu.Commons is an open source project of mine that is up at GitHub. Feel free to poke around or even contribute. If you just want to use the thing, Gosu.Commons is also available on NuGet. To add a reference, just open the package manager console and type:

PM> Install-Package Gosu.Commons


Written by Erik Öjebo 2011-11-17 22:56
