Friday, June 22, 2012

Deploying Flex mobile apps on an iOS device: the easy way

At the last Devoxx conference I heard about using Flex for mobile apps. This week I finally made some time to look into it and build my first app with Flex. It has been 5 years since I last developed with Flex, so it took a bit of time to get into it again, but I'm still amazed at how quickly you can develop an app. Last year I built two iPhone apps natively for iOS. With Flex, developing is just as fast, but at the same time you also get an Android and a BlackBerry version of your app.

I'm amazed that all the Adobe and Flex Mobile on iOS documentation talks about deploying the Flex app using iTunes, while there is a much easier way. Instead of iTunes, which requires syncing, use Apple's iPhone Configuration Utility (available for Mac and Windows). It's weird that you have to install a music library to deploy an application, right?

To build the app, follow the Adobe guidelines to prepare to build, debug and deploy an iOS application. When you have your IPA file in the bin-debug folder...

1. Connect your Apple iOS device to your development computer.

2. Launch iPhone Configuration Utility.

Note: You can also use iPhone Configuration Utility to find the Unique Device Identifier (UDID) of your iOS device. No need to use or install iTunes for anything.

3. If you haven't deployed the mobile provisioning profile to your phone yet:
  1. Either drag and drop your provisioning profile onto the utility or its icon in the dock, or click Provisioning Profiles and then Add (top left) to select your profile.
  2. Then, select your iOS device > Provisioning Profiles tab.
  3. Click on 'Install' for the provisioning profile for your app.
4. To install your application on the iOS device:
  1. Drag and drop your IPA file onto the utility, or click Add (top left) to select the IPA to install. When asked to replace the app, select 'Yes'.
  2. Select your iOS device > Applications tab.
  3. If you have installed the app before, first uninstall the app.
  4. Then install the app (again).
And you're done. The app is installed and you can run or debug the app on the iOS device.
Basically, when you're developing, you do the setup once and from then on only build the app via the run configuration and perform step 4 to deploy it.
To me, this is much easier than using iTunes. At the very least you avoid the problem that the iTunes at work is not the iTunes you normally sync with, so that all your photos and music get removed.

Wednesday, June 20, 2012

MapReduce XML processing (new api)

For a while now I've been fascinated by stories about Hadoop, HBase, Hive, etc. I have attended many presentations at several conferences, so it's about time I started to learn more about it and play with it myself. Because I'm also into Scala nowadays, I also want to look into how Scala can make developing Hadoop/MapReduce applications easier and more fun.
To keep a logbook for myself, I'm writing a series of posts about my Hadoop with Scala adventure. Previous posts were about setting up a Hadoop cluster using Puppet, setting up a Hadoop cluster using Cloudera's Free Manager, how to develop (and debug) MapReduce jobs in the IDE, and Hadoop Streaming with Scala scripts. This post is about processing XML with MapReduce. In upcoming posts I will write more about my Hadoop and Scala adventure and, hopefully, eventually end up using Scala DSLs like Scoobi and Scalding and the Scala MapReduce framework Spark.

To be able to play with Hadoop, I downloaded a Wikipedia dump so I had some data to run map/reduce jobs on. Hadoop has a StreamXmlRecordReader, but this uses the old API and I wanted to use and learn about the new API. While Googling this subject, I found that most people seem to use the Mahout XML import. I, however, chose to write my own.

InputFormat
In Hadoop, to write your own reader/parser, you have to create an InputFormat implementation that returns a RecordReader. The RecordReader is responsible for reading data from the InputSplit and providing the next key/value pair for the Mapper to process. The simplest approach is to extend FileInputFormat and override the createRecordReader method to return your own RecordReader implementation.
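import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class XmlPartInputFormat extends FileInputFormat<Text, Text> {

    @Override
    public RecordReader<Text, Text> createRecordReader(InputSplit inputSplit, TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
        RecordReader<Text, Text> recordReader = new XmlPartRecorderReader(new XmlPartReaderFactoryImpl());
        // No recordReader.initialize(inputSplit, taskAttemptContext) here: Hadoop calls it itself.
        return recordReader;
    }
}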
In the new API you no longer need to call initialize yourself when creating the reader, because Hadoop calls it before the reader is used for the first time.

RecordReader
The XmlPartRecordReader implementation is pretty straightforward. It returns keys and values as long as any are available, and it is able to report progress using the start, end and current position.
The hard work is done by an XmlPartReader implementation. XmlPartReader is just a small interface which allowed me to easily create and test different XML stream parsing implementations.

XmlPartReader using StAX
My first XmlPartReader implementation used StAX to read the XML. The reader doesn't read the whole XML in one go; instead, during hasNext it tries to find the next start-tag. When found, the stream is positioned just past this start-tag.
Method getNextXmlPart reads the stream until the end-tag is found and returns the whole XML part from start-tag to end-tag. This is implemented via an XMLEventReader with a custom EventFilter that rejects everything until the start-tag is found and then accepts everything until the end-tag is found.

The reader performed pretty well. The problem, however, was that an InputSplit might start somewhere in the middle of the file, and the reader must be able to read from that position on. With StAX it is not possible to just position the stream somewhere and start reading. StAX will treat the first tag it reads as the root tag and will throw an exception when reading past the matching end-tag. For example, if it reads <title> as its first tag in the Wikipedia XML, StAX will throw an exception when reading past the </title> tag. At first I was not able to find a solution for this, so I started on my second XmlPartReader implementation. Later I thought of a solution: always read the XML from the beginning and skip events until the event location is past the start position. From there on, the XmlPartReader returns parts from the stream as usual.
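
The skip-until-start-position idea can be sketched with the plain JDK StAX API. This is only an illustration, not the code from the repository: the class name StaxOffsetSketch and the method textsFrom are made up for this example, it uses a simple loop instead of the EventFilter described above, and it returns only the element text rather than the whole XML part. Note also that StAX reports a character offset, while an InputSplit start is a byte offset, and that whether the offset points at the beginning or the end of a tag may differ per parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical sketch class, not part of the original sources.
class StaxOffsetSketch {

    // Returns the text of every <tag> element whose start-tag is reported
    // at or after 'start' (a character offset into the document).
    static List<String> textsFrom(String xml, String tag, int start) {
        List<String> texts = new ArrayList<>();
        try {
            // Always parse from the real root element, so StAX never sees
            // an inner tag as the root.
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                boolean wanted = event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals(tag);
                // Skip matching elements that lie before the start position.
                if (wanted && event.getLocation().getCharacterOffset() >= start) {
                    // Only valid for text-only elements such as <title>.
                    texts.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse xml", e);
        }
        return texts;
    }

    public static void main(String[] args) {
        String xml = "<pages><page><title>A</title></page>"
                   + "<page><title>B</title></page></pages>";
        System.out.println(textsFrom(xml, "title", 0));
        System.out.println(textsFrom(xml, "title", 30));
    }
}
```

The cost of this approach is visible in the loop: every event before the start position is still parsed, only to be thrown away.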

XmlPartReader 2
The second XmlPartReader implementation simply reads the xml from an InputStream character by character. During hasNext it tries to find a sequence of characters which match the wanted start-tag. Method getNextXmlPart reads the stream until a sequence is found which matches the end-tag. Since it uses a basic InputStream, the stream can easily be positioned by skipping bytes to the wanted start position.
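
A minimal sketch of this byte-by-byte approach, using only the JDK, could look like the following. The class name TagScannerSketch is hypothetical; only the method names hasNext and getNextXmlPart are taken from the description above. The sketch assumes the start-tag has no attributes (as with <page> in the Wikipedia dump), that the wanted tag is not nested inside itself, and that the data is UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.NoSuchElementException;

// Hypothetical sketch class, not part of the original sources.
class TagScannerSketch {
    private final InputStream in;
    private final byte[] startTag;
    private final byte[] endTag;
    private boolean positioned; // true when the stream is just past a start-tag

    TagScannerSketch(InputStream in, String tag, long startPosition) {
        this.in = in;
        this.startTag = ("<" + tag + ">").getBytes(StandardCharsets.UTF_8);
        this.endTag = ("</" + tag + ">").getBytes(StandardCharsets.UTF_8);
        try {
            // Position the stream by skipping bytes; skip() may skip fewer than asked.
            long remaining = startPosition;
            while (remaining > 0) {
                long skipped = in.skip(remaining);
                if (skipped <= 0) {
                    if (in.read() == -1) break; // end of stream
                    skipped = 1;
                }
                remaining -= skipped;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads ahead until the next start-tag; the stream then sits just past it.
    boolean hasNext() {
        if (!positioned) positioned = readUntil(startTag, null);
        return positioned;
    }

    // Returns the xml part from start-tag to end-tag, both included.
    String getNextXmlPart() {
        if (!hasNext()) throw new NoSuchElementException("no more xml parts");
        ByteArrayOutputStream part = new ByteArrayOutputStream();
        part.write(startTag, 0, startTag.length);
        readUntil(endTag, part); // on a truncated stream the part is returned as far as read
        positioned = false;
        return new String(part.toByteArray(), StandardCharsets.UTF_8);
    }

    // Consumes bytes until 'pattern' has been read completely; copies them to 'sink' if given.
    // Restarting at index 0 or 1 is sufficient because '<' only occurs at pattern[0].
    private boolean readUntil(byte[] pattern, ByteArrayOutputStream sink) {
        try {
            int matched = 0;
            int b;
            while ((b = in.read()) != -1) {
                if (sink != null) sink.write(b);
                if (b == pattern[matched]) matched++;
                else matched = (b == pattern[0]) ? 1 : 0;
                if (matched == pattern.length) return true;
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = ("<pages><page><title>A</title></page>"
                    + "<page><title>B</title></page></pages>").getBytes(StandardCharsets.UTF_8);
        // Start in the middle of the first page, as a later InputSplit might.
        TagScannerSketch scanner = new TagScannerSketch(new ByteArrayInputStream(xml), "page", 14);
        while (scanner.hasNext()) {
            System.out.println(scanner.getNextXmlPart());
        }
    }
}
```

Because the scanner never interprets the XML structure, starting in the middle of a document is not a problem: it simply ignores everything until the next start-tag.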

Performance
I tried to compare the performance of both XmlPartReaders. Although I had expected the second implementation to be the faster one, the implementation using StAX was a bit faster. I think my own XML parsing implementation is pretty lean and mean, so I suspect the StAX implementation is faster because it uses a better buffer size. I tried a couple of buffer sizes, but have not been able to beat the StAX reader yet. Although the first XmlPartReader now also supports a start position > 0, I do expect the second implementation to be faster with large XML files and many splits, since it can just skip to the start position, whereas the StAX implementation has to read through the XML and compare each tag position with the wanted start position.

Any XML
Although I created these readers to process the Wikipedia XML, they can be used for any XML. The readers can read any XML part from a stream. They can be used to read the whole <page> section, but can also just iterate over the <title> sections (see the test cases). Just specify the tag of the XML part you wish to process. It defaults to 'page', but can be overridden using the property 'xml.reader.tag'.
In the XmlPartInputFormat, another XmlPartReader implementation can easily be used by providing another XmlPartReaderFactory implementation.

Using it
To use this xml processing, in your Map/Reduce driver set the InputFormatClass to the XmlPartInputFormat.class (see WordCounterDriver). The Mapper receives a Text key which is the name of the searched tag and a Text value which is the xml part including the start- and end-tag of the searched tag.
In the Mapper you could use XPath to get the part of the XML you're interested in. To get the title from a Wikipedia page:
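    // Field in the Mapper class.
    private static XPath xpath = XPathFactory.newInstance().newXPath();

    // Statement inside the map method; 'value' is the Text value holding the xml part.
    String title = (String) xpath.evaluate(".//title", inputSourceForValue(value), XPathConstants.STRING);

    // Helper in the Mapper class: wraps the Text value so XPath can parse it.
    private InputSource inputSourceForValue(Text value) {
        return new InputSource(new StringReader(value.toString()));
    }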
(see also WordCounterMapper)

Sources
Sources are available in a Git repository at https://bitbucket.org/diversit/hadoop-jobs.
The repository also contains all test classes and a Scala implementation of XmlPartReader.