I started this as a Stack Overflow question:

I’ve got multiple XSD files describing a schema. I’d like to generate human-readable documentation as part of a build process.

The XSD is maintained and reviewed within a repository (gitflow), and committing the generated documentation clutters the repository. I’d like to generate human-readable HTML during the build process (Maven / Gradle / Ant build or a simple CLI).

I found the post “How to convert XSD to human readable documentation?”, and the DocFlex/XML Maven plugin seems interesting, but I can’t believe that’s the only option.

Any helpful tips on that?

This pretty much outlines the idea: treat XSD as code, keep it in the repository and have all the subsidiary artifacts generated as part of the build process (so that no autogenerated HTML files end up in the repository).

This is pretty easily achievable with OxygenXML Editor’s schemaDocumentation.sh script. However, it requires OxygenXML to be installed in your build environment, which is something I tend to avoid (I generally avoid making a build depend on the environment beyond the JVM; if the build needs something, I want it to download it itself).

Tools

I was using OxygenXML Editor on my desktop and I like the documentation it generates (the HTML). I also liked a flow in which the document is reviewed through the visual part (HTML) while the comments and remarks are made in the source (code) in a typical code review process. I started to look for tools that could support this kind of workflow.

OxygenXML Editor’s schemaDocumentation.sh script could support this flow to a certain extent, but some additional changes to the bash script were needed. So I started to look elsewhere.

DocFlex

DocFlex/XML Maven Integration looks pretty promising. Installation went smoothly, and getting some output generated was pretty straightforward, as presented in the examples.

However, only after some more detailed reading did I realize that images are only supported through integration with 3rd party tools like XMLSpy or OxygenXML.

If all you need is just an HTML view (pretty comprehensive after all, but without images), DocFlex is probably enough. It’s not very expensive ($200 for a license), but to be honest I haven’t been looking for any free alternatives. The lack of pictures was a deal breaker for me.

Oxygen wrapped in Gradle sauce

Getting back to OxygenXML: it’s a program written in Java, and the schemaDocumentation.sh script actually invokes a headless application with some command line arguments. That means I can invoke it directly from a build script. I used Gradle for that:

apply plugin: 'java'

version = "1.0-SNAPSHOT"

ext {
  generatedDir = new File(buildDir, "generated-html")
}

def OXYGEN_HOME = "/opt/java/oxygen"        // (1)
def schemaFiles = ["page", "metadata"]        // (2)

schemaFiles.each { pageName ->
  task "${pageName}SchemaTask"(type: JavaExec) {
    mkdir generatedDir

    classpath = files([
      "$OXYGEN_HOME",
      "$OXYGEN_HOME/classes",
      "$OXYGEN_HOME/lib",
      "$OXYGEN_HOME/lib/oxygen.jar",
      "$OXYGEN_HOME/lib/oxygenDeveloper.jar",
      "$OXYGEN_HOME/lib/fop.jar",
      "$OXYGEN_HOME/lib/xmlgraphics-commons-1.5.jar",
      "$OXYGEN_HOME/lib/batik-all-1.7.jar",
      "$OXYGEN_HOME/lib/xercesImpl.jar",
      "$OXYGEN_HOME/lib/xml-apis.jar",
      "$OXYGEN_HOME/lib/org.eclipse.wst.xml.xpath2.processor_1.2.0.jar",
      "$OXYGEN_HOME/lib/icu4j.jar",
      "$OXYGEN_HOME/lib/saxon.jar",
      "$OXYGEN_HOME/lib/saxon9ee.jar",
      "$OXYGEN_HOME/lib/log4j.jar",
      "$OXYGEN_HOME/lib/resolver.jar",
      "$OXYGEN_HOME/lib/oxygen-emf.jar",
      "$OXYGEN_HOME/lib/commons-httpclient-3.1.jar",
      "$OXYGEN_HOME/lib/commons-codec-1.3.jar",
      "$OXYGEN_HOME/lib/commons-logging-1.0.4.jar",
      "$OXYGEN_HOME/lib/httpcore-4.3.2.jar",
      "$OXYGEN_HOME/lib/httpclient-cache-4.3.5.jar",
      "$OXYGEN_HOME/lib/httpclient-4.3.5.jar",
      "$OXYGEN_HOME/lib/fluent-hc-4.3.5.jar",
      "$OXYGEN_HOME/lib/httpmime-4.3.5.jar",
      "$OXYGEN_HOME/lib/commons-logging-1.1.3.jar",
      "$OXYGEN_HOME/lib/commons-codec-1.6.jar"
    ].toList())

    main = 'ro.sync.xsd.documentation.XSDSchemaDocumentationGenerator'
    jvmArgs = ["-Djava.awt.headless=true"]
    args = ["schema/${pageName}.xsd",
            "-format:html",
            "-split:location",
            "-out:$generatedDir/${pageName}.html"]
  }
}

task schema(dependsOn: tasks.matching { Task task ->
        task.name.endsWith("SchemaTask")}) {
}

defaultTasks 'schema'
  1. External path to OxygenXML

  2. HTML is generated for every file defined in this list
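
For comparison, each of the JavaExec tasks above boils down to a single java call per schema. Here is a minimal shell sketch of that call, assuming the same /opt/java/oxygen layout. The helper names `build_classpath` and `doc_command` are mine, and globbing `lib/*.jar` instead of listing every jar is an assumption. The command is only printed, not executed, so the sketch can be tried without Oxygen installed.

```shell
#!/bin/sh
# Sketch only: prints the java command that the Gradle JavaExec task would run.
OXYGEN_HOME="${OXYGEN_HOME:-/opt/java/oxygen}"

# Join <home>, <home>/classes and every jar under <home>/lib with ':'.
# (The Gradle script lists the jars explicitly; globbing is an assumption.)
build_classpath() {
  classpath="$1:$1/classes"
  for jar in "$1"/lib/*.jar; do
    [ -e "$jar" ] && classpath="$classpath:$jar"
  done
  printf '%s\n' "$classpath"
}

# Print the documentation command for one schema name (e.g. "page").
doc_command() {
  printf 'java -Djava.awt.headless=true -cp %s %s schema/%s.xsd -format:html -split:location -out:build/generated-html/%s.html\n' \
    "$(build_classpath "$OXYGEN_HOME")" \
    "ro.sync.xsd.documentation.XSDSchemaDocumentationGenerator" \
    "$1" "$1"
}

for name in page metadata; do
  doc_command "$name"
done
```

Piping the printed lines into `sh` would run them for real, provided OXYGEN_HOME points at an actual installation.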

So far Gradle looks like overkill, but this is still a work in progress. With the power of Gradle we can try to get rid of all external dependencies and make this build self-sufficient.

OxygenXML based schema documentator

The full Oxygen package is 150MB and comes with all sorts of things that are not really needed for HTML schema generation. The only thing needed here is schemaDocumentation.sh, which we can run independently. So with this little script we can try to strip Oxygen down to only the required parts:

mkdir -p oxygen-lite/lib

# Copy only the jars named on the CP= line of schemaDocumentation.sh
for f in $(grep CP= "$OXYGEN_HOME/schemaDocumentation.sh" | tr ":" "\n" | cut -d "/" -f3 | grep jar)
do
        file="$OXYGEN_HOME/lib/$f"
        if [ -e "$file" ]
        then
                cp "$file" "oxygen-lite/lib/$f"
        fi
done

cp $OXYGEN_HOME/licensekey.txt oxygen-lite
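
To see what the pipeline in that loop extracts, here is a small sketch run against a made-up launcher file. The format of the `CP=` line is an assumption (this is not the real schemaDocumentation.sh), and `extract_jars` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list jar file names referenced on the CP= line of a launcher script.
# Assumes entries look like $OXYGEN_HOME/lib/<name>.jar separated by ':'.
extract_jars() {
  grep 'CP=' "$1" | tr ':' '\n' | cut -d '/' -f3 | grep jar
}

# A made-up stand-in for schemaDocumentation.sh, written to a temp file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#!/bin/sh
CP=$OXYGEN_HOME/classes:$OXYGEN_HOME/lib/oxygen.jar:$OXYGEN_HOME/lib/saxon9ee.jar
java -cp "$CP" some.Main "$@"
EOF

# tr splits the classpath into one entry per line, cut keeps the third
# '/'-separated field (the file name), and grep drops non-jar entries.
jars=$(extract_jars "$sample")
printf '%s\n' "$jars"
rm -f "$sample"
```

Entries such as `$OXYGEN_HOME/classes` have no third field, so they come out as empty lines and are filtered away by the final `grep jar`.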

The next step is to combine the build with the stripped-down ('lite') version of Oxygen. I used the Groovy VFS DSL library to download and unzip my oxygen-lite archive:

buildscript {
  repositories {
    jcenter()
  }
  dependencies {
    classpath 'org.ysb33r.gradle:vfs-gradle-plugin:1.0-beta1'
    classpath 'commons-httpclient:commons-httpclient:3.1'
  }
}

apply plugin: 'org.ysb33r.vfs'
apply plugin: 'java'

version = "1.0-SNAPSHOT"

ext {
  downloadDir = new File(buildDir, "download")
  generatedDir = new File(buildDir, "generated-html")
}

task download << {
  mkdir downloadDir
  vfs {
    cp "zip:https://your-host/path-to-oxygen/oxygen-lite.zip",
    downloadDir, recursive:true, overwrite:true
  }
}

download {
  description "Downloading oxygen"
  outputs.dir downloadDir
}

def OXYGEN_HOME = "$downloadDir/oxygen-lite"
def schemaFiles = ["page", "metadata"]

schemaFiles.each { pageName ->
  task "${pageName}SchemaTask"(type: JavaExec) {
    mkdir generatedDir

    classpath = files([
      "$OXYGEN_HOME",
      "$OXYGEN_HOME/classes",
      "$OXYGEN_HOME/lib",
      "$OXYGEN_HOME/lib/oxygen.jar",
      "$OXYGEN_HOME/lib/oxygenDeveloper.jar",
      "$OXYGEN_HOME/lib/fop.jar",
      "$OXYGEN_HOME/lib/xmlgraphics-commons-1.5.jar",
      "$OXYGEN_HOME/lib/batik-all-1.7.jar",
      "$OXYGEN_HOME/lib/xercesImpl.jar",
      "$OXYGEN_HOME/lib/xml-apis.jar",
      "$OXYGEN_HOME/lib/org.eclipse.wst.xml.xpath2.processor_1.2.0.jar",
      "$OXYGEN_HOME/lib/icu4j.jar",
      "$OXYGEN_HOME/lib/saxon.jar",
      "$OXYGEN_HOME/lib/saxon9ee.jar",
      "$OXYGEN_HOME/lib/log4j.jar",
      "$OXYGEN_HOME/lib/resolver.jar",
      "$OXYGEN_HOME/lib/oxygen-emf.jar",
      "$OXYGEN_HOME/lib/commons-httpclient-3.1.jar",
      "$OXYGEN_HOME/lib/commons-codec-1.3.jar",
      "$OXYGEN_HOME/lib/commons-logging-1.0.4.jar",
      "$OXYGEN_HOME/lib/httpcore-4.3.2.jar",
      "$OXYGEN_HOME/lib/httpclient-cache-4.3.5.jar",
      "$OXYGEN_HOME/lib/httpclient-4.3.5.jar",
      "$OXYGEN_HOME/lib/fluent-hc-4.3.5.jar",
      "$OXYGEN_HOME/lib/httpmime-4.3.5.jar",
      "$OXYGEN_HOME/lib/commons-logging-1.1.3.jar",
      "$OXYGEN_HOME/lib/commons-codec-1.6.jar"
    ].toList())

    main = 'ro.sync.xsd.documentation.XSDSchemaDocumentationGenerator'
    jvmArgs = ["-Djava.awt.headless=true"]
    args = ["schema/${pageName}.xsd",
            "-format:html",
            "-split:location",
            "-out:$generatedDir/${pageName}.html"]
  }
}

task schema(dependsOn: ['download',
        tasks.matching { Task task -> task.name.endsWith("SchemaTask")}]) {

}

defaultTasks 'schema'

Setting up the CI

The above script sits together with the XSD schema files in the repository. Whenever a new version of the XSD is issued (or new work is started in a 'feature branch'), our Jenkins picks up the changes and rebuilds the docs. That way the schema can be viewed in a human-readable form while comments can be attached to the actual XSDs in the repository. This kind of works for us.

I’m still not sure if Oxygen is the best tool for the job, but I couldn’t find a better one (not that I was looking very hard). So if you have suggestions on how to proceed differently, I welcome them in the comments.
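
A CI shell step for this can be as small as running the build and then checking that every schema produced a page. The sketch below assumes the output layout from the build script (build/generated-html/<name>.html), that every .xsd in schema/ is listed in schemaFiles, and that a Gradle wrapper is checked in. `missing_docs` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch of a CI check: report schemas that have no generated HTML page.
# Usage: missing_docs <schema-dir> <html-dir>
missing_docs() {
  for xsd in "$1"/*.xsd; do
    [ -e "$xsd" ] || continue            # no .xsd files at all
    name=$(basename "$xsd" .xsd)
    [ -e "$2/$name.html" ] || printf '%s\n' "$name"
  done
}

# In the CI job (assumption: the Gradle wrapper is part of the repository):
#   ./gradlew schema
#   [ -z "$(missing_docs schema build/generated-html)" ] || exit 1
```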

Why the whole build thing?

Yes, that’s a valid question. Why take 'a cannon to a fly' instead of sticking to a simple bash script? Well, the nature of such builds is that they rarely stop after the first step. Once the HTML documentation was generated, new ideas started popping into my head… Why can’t we generate an actual object model out of these XSDs, for better reference? Not a problem; with gradle-jaxb-plugin it works nearly out of the box.

jaxb {
  mkdir generatedClassedDir
  dependencies {
    jaxb "com.sun.xml.bind:jaxb-xjc:2.1.6"
  }

  xsdDir = 'schema'
  episodesDir = "build/"
  xjc {
    taskClassname = "com.sun.tools.xjc.XJCTask"
    destinationDir = generatedClassedDir
    generatePackage = "eu.ydp.flipbook.model"
    args = ["-npa", "-no-header"]
  }
}

What’s next? The ZIP file is too massive, so we can strip it down. Let’s add some runtime dependencies to the build and remove the corresponding jars from the downloaded zip.

dependencies {
  compile 'log4j:log4j:1.2.14'
  compile 'org.apache.httpcomponents:fluent-hc:4.3.5'
  compile 'org.apache.httpcomponents:httpmime:4.3.5'
  compile 'org.apache.httpcomponents:httpclient-cache:4.3.5'
  compile 'commons-httpclient:commons-httpclient:3.1'
  compile("org.apache.xmlgraphics:fop:1.1") {
    exclude group: "org.apache.avalon.framework"
  }
  compile 'avalon-framework:avalon-framework-api:4.2.0'
  compile 'avalon-framework:avalon-framework-impl:4.2.0'
  compile 'xerces:xercesImpl:2.11.0'
}

// ... some lines stripped

schemaFiles.each { pageName ->
  task "${pageName}SchemaTask"(type: JavaExec) {
    mkdir generatedDir

    classpath = sourceSets.main.runtimeClasspath
    classpath += files([
      "$OXYGEN_HOME/",
      "$OXYGEN_HOME/lib/oxygen.jar",
      "$OXYGEN_HOME/lib/oxygen-emf.jar",
      "$OXYGEN_HOME/lib/org.eclipse.wst.xml.xpath2.processor_1.2.0.jar",
      "$OXYGEN_HOME/lib/icu4j.jar",
      "$OXYGEN_HOME/lib/saxon.jar",
      "$OXYGEN_HOME/lib/saxon9ee.jar",
      "$OXYGEN_HOME/lib/resolver.jar"
    ].toList())
    main = 'ro.sync.xsd.documentation.XSDSchemaDocumentationGenerator'
    jvmArgs = ["-Djava.awt.headless=true"]
    args = ["schema/${pageName}.xsd", "-format:html", "-split:location", "-out:$generatedDir/${pageName}.html"]
  }
}

These tasks are a bit harder to achieve with just a bash script, and this is where a Gradle build really shows its power.

Perfect is the enemy of good

The rest of the Oxygen XML dependencies are not findable in the central Maven repository; they are either custom, too old, or I have no idea how to find them. But by no means does that mean we cannot deploy them ourselves, save on download time and have all the dependencies properly cached by Gradle.

The deployment itself is pretty easy and neatly described in this Maven mini guide. The only tweak we might want to make is to avoid explicit dependencies (as the build file would get a bit cluttered) and hide the hideous dependencies behind the core Oxygen deps.
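
As a rough sketch of that deployment, each remaining Oxygen jar can be pushed with Maven’s `deploy:deploy-file` goal. The repository URL, repository id, group id and version below are placeholders I made up, and `deploy_command` is a hypothetical helper. The script only prints the commands (a dry run), so nothing gets deployed by accident.

```shell
#!/bin/sh
# Dry-run sketch: print one `mvn deploy:deploy-file` command per jar.
# All coordinates below are placeholders, not real Oxygen coordinates.
REPO_URL="https://your-host/maven-repo"
REPO_ID="internal"
GROUP_ID="com.example.oxygen"
VERSION="1.0"

# Build the deploy command for one jar; the artifactId is the file name
# without its .jar extension.
deploy_command() {
  artifact=$(basename "$1" .jar)
  printf 'mvn deploy:deploy-file -Durl=%s -DrepositoryId=%s -Dfile=%s -DgroupId=%s -DartifactId=%s -Dversion=%s -Dpackaging=jar\n' \
    "$REPO_URL" "$REPO_ID" "$1" "$GROUP_ID" "$artifact" "$VERSION"
}

for jar in oxygen-lite/lib/oxygen.jar oxygen-lite/lib/oxygen-emf.jar; do
  deploy_command "$jar"
done
```

Once the jars live in a repository that the build can reach, they could be declared like any other dependency instead of being added to the classpath by file path.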