ETL 01: Documentum -> MuleSoft -> Filesystem

This post is the first in a series focused on Extract-Transform-Load tools and techniques that I will discuss on this blog.

MuleSoft is an excellent tool to integrate real time updates (such as approved documents) from one system into another system.  MuleSoft has a very rich developer community with lots of examples, a good YouTube channel, and training …. I recommend this free 8 week course to learn the fundamentals.

We can use the Documentum REST interface to run a Documentum query, then either store the results or extract related PDF files from the system.

You can image the use cases for this set of tools:

  • Migrate content into or out of Documentum
  • Synchronize master data between systems
  • Publish content from Documentum to Box, Dropbox, Microsoft SkyDrive, Google Drive, etc.


Anyone new to the Documentum REST will want to use Postman to investigate some of the URLs that we use in this post:

Get repositories:


Run a Query:

http://localhost:8080/dctm-rest/repositories/repo1?dql=select r_object_id,object_name from dm_document where folder('/HR',descend)

Get Content:



MuleSoft Configuration to Query Documentum and save to a CSV file

The Documentum REST query output has the attribute values located at:

so we can use a transform-message to put the properties into a nicely formatted CSV file as follows:


 <flow name="dctm_query_to_csv">
 <http:listener config-ref="HTTP_Listener_Configuration"
 path="/dctm2" doc:name="HTTP" />
 <http:request config-ref="HTTP_Request_Configuration"
 path="/dctm-rest/repositories/repo1" method="GET"
 <http:query-param paramName="dql"
 value="select r_object_id,object_name from dm_document where folder('/HR',descend)" />

 <set-variable variableName="OriginalPayload" value="#[payload]"
 doc:name="Variable" />
 <set-payload value="#[flowVars.OriginalPayload]" mimeType="application/json"
 doc:name="Set Payload" />
 <dw:transform-message metadata:id="d8d90bbc-3265-4882-93e8-34b942bed718"
 doc:name="Transform Message">
 <dw:input-payload />
 <dw:set-payload><![CDATA[%dw 1.0
%output application/json

 rows: payload.entries map ((entry , indexOfEntry) -> {
 <dw:transform-message doc:name="Transform Message">
 <dw:set-payload><![CDATA[%dw 1.0
%output application/csv
payload.rows map ((row , indexOfRow) -> {
 "object_name": row.object_name,
 "r_object_id": row.r_object_id
 <logger level="INFO" metadata:id="04754dd7-e7cb-4e3a-a151-18c5ed1bbd42"
 doc:name="Logger" />
 <file:outbound-endpoint path="\data\mule"
 outputPattern="dctm.csv" responseTimeout="10000" doc:name="File" />


MuleSoft Configuration to Query Documentum and save PDF files

Similarly, we can run a query and extract all the PDF renditions of those files to the file system.

In this example, create a folder: c:\data\mule (or change the location int he configuration below.

Here is the MuleSoft Configuration:


<flow name="dctm_query_export_pdf">
 <http:listener config-ref="HTTP_Listener_Configuration"
 path="/dctm3" doc:name="HTTP" />

 <http:request config-ref="HTTP_Request_Configuration"
 path="/dctm-rest/repositories/dv4_ls_d2_cs_reg" method="GET"
 <http:query-param paramName="dql"
 value="select r_object_id,object_name from dm_document where folder('/HR/Form/Employment form',descend)" />

 returnClass="java.lang.Object" doc:name="JSON to Object" />
 <foreach collection="#[payload.entries]" doc:name="For Each">
 <set-variable variableName="r_object_id" value="#[]" doc:name="get r_object_id"/>
 <set-variable variableName="ContentURL" value="#[&quot;/dctm-rest/repositories/dv4_ls_d2_cs_reg/objects/&quot; + flowVars.r_object_id + &quot;/contents/content&quot;]" doc:name="ContentURL"/>
 <logger message="#[flowVars.counter] - #[flowVars.r_object_id]" level="INFO" doc:name="Logger"/>
 <http:request config-ref="HTTP_Request_Configuration" path="#[flowVars.ContentURL]" method="GET" doc:name="content obj">
 <http:query-param paramName="format" value="pdf"/>
 <json:json-to-object-transformer returnClass="java.lang.Object" doc:name="JSON2"/>
 <set-variable variableName="href" value="#[payload.links[1].href]" doc:name="href" doc:description="Extract the URL"/>
 <set-variable variableName="href2" value="#[flowVars.href.substring(43)]" doc:name="href2" doc:description="Extract the URL portion after the host name"/>
 <logger message="href = #[flowVars.href]" level="INFO" doc:name="Logger"/>
 <http:request config-ref="HTTP_Request_ACS" path="#[flowVars.href2]" method="GET" doc:name="Request PDF"/>
 <file:outbound-endpoint path="/data/mule" outputPattern="#[flowVars.r_object_id].pdf" responseTimeout="10000" mimeType="application/pdf" doc:name="Save PDF"/>
 <logger level="INFO" metadata:id="04754dd7-e7cb-4e3a-a151-18c5ed1bbd42"
 doc:name="Logger" />


This post should help you get started with:

  • Using Documentum REST to run queries and extract content
  • Using MuleSoft to process the data returned from REST to store as a CSV file or PDF files

Future ETL blog posts will include: