eclipse help
With eclipse open go to the Help -> Install new software section
*remember to start eclipse with administrator privelages if you are using Windows 7.
Add a new site titled UIMA with the address http://www.apache.org/dist/uima/eclipse-update-site/
Install both the Apache UIMA Eclipse tooling and runtime support package and the Apache UIMA-AS Eclipse tooling package.
Restart Eclipse
Sunday, 17 November 2013
Saturday, 16 November 2013
Using UIMA-AS to run UIMA annotators in parallel
Using UIMA-AS to run UIMA annotators in parallel
Overview
UIMA stands for Unstructured Information Management Architecture. It’s an Apache technology that provides a framework and standard for building text analytics applications. I’ve mentioned it before.
In this post, I want to talk about an area of UIMA which isn’t covered well in the documentation.
I couldn’t find practical getting-started instructions for running UIMA-AS annotators in parallel. In this post I want to discuss why you might want to do it, and share some simple sample code to show how.
Background – the UIMA pipeline
UIMA provides a framework for managing a text analytics application. You break up the analytics functionality into discrete pieces called annotators. UIMA takes care of moving a text document through an analytics engine: a pipeline containing a series of annotators.
A document goes in one end of the pipeline, passes through a number of annotators, each of which adds some metadata to the document. What comes out the other side of the pipeline is an annotated copy of the document.
By default, you get UIMA to run these annotators one at a time – one after another.
Background – annotators in parallel
What if your annotators are quite slow – perhaps they take several seconds to run?
If there is no dependency between any or all of your annotators, then maybe running them one at a time isn’t the most efficient approach.
You can run all of them at the same time, in parallel. UIMA will merge the output from all of the annotators into a single annotated document.
My sample code
I’ve written two sample UIMA apps. Each demonstrates one of these approaches, to compare and contrast.
They are divided into three eclipse projects. You can import them into an eclipse IDE.
The UIMA eclipse plugins are very helpful if you want to make changes to the XML configuration files, but they’re not essential. If you want them, there are instructions on how to install them at uima.apache.org.
I’ve added comments to the sample code to explain how the apps work, but I’ll give an overview here.
For these samples, I have five simple annotators. They sleep for six seconds, then add an empty annotation to the document CAS.
public void process(JCas jCas) throws AnalysisEngineProcessException {
// sleep for six seconds...
try {
Thread.sleep(6000);
}
catch (InterruptedException e) {
e.printStackTrace();
}
// add an empty annotation to the CAS
jCas.addFsToIndexes(new AnnotationB(jCas));
}
They do enough to prove that all five of them are being run, and that they all really contribute to the final annotated document. They take long enough to demonstrate the differences between these two approaches to running the pipeline.
Sample code : running one annotator at a time
This can be done using UIMA. The first app uima-project demonstrates this.
An XML descriptor file (uima-project/conf/analysisEngine.xml) specifies which annotators should be included in the pipeline, and which order they should be run in.
UIMA demonstration
1.0
annotatorA
annotatorB
annotatorC
annotatorD
annotatorE
The descriptor file (uima-project/conf/analysisEngine.xml) imports a descriptor for each individual annotator.
Each of those imported descriptors identifies the Java class that implements the annotator, and specifies the metadata annotations that it can add to the output document.
For example,
uima-project/conf/annotatorC/analysisEngine.xml:
com.dalelane.uima.annotators.DemoC
annotatorC
com.dalelane.uima.annotators.gen.AnnotationC
uima.tcas.Annotation
The overall pipeline is started from
uima-project/src/com/dalelane/uima/serial/Pipeline.java. This reads in the descriptor file for the pipeline, and uses it to create an instance of a UIMA analysis engine.
File descriptorFile = new File("./conf/analysisEngine.xml");
XMLInputSource descriptorSource = new XMLInputSource(descriptorFile);
ResourceSpecifier specifier = UIMAFramework.getXMLParser().parseResourceSpecifier(descriptorSource);
analysisEngine = UIMAFramework.produceAnalysisEngine(specifier);
To summarise:
The provided eclipse launch config starts a Java application
uima-project/src/com/dalelane/uima/serial/Application.java
The Java application creates
uima-project/src/com/dalelane/uima/serial/Pipeline.java which creates a UIMA AnalysisEngine
The analysis engine reads in the XML descriptor file uima-project/conf/analysisEngine.xml
The analysis engine descriptor identifies the descriptors for each annotator
(e.g. uima-project/conf/annotatorC/analysisEngine.xml)
The descriptor for the annotator identifies the Java class which implements it
(e.g. uima-project/src/com/dalelane/uima/annotators/DemoC.java )
The output from running the launch config shows that it takes about 30 seconds (5 annotators, each of which takes about 6 seconds) to process the document text.
Sample UIMA application - serial
==================================
Accessing analysis engine descriptor file
Creating analysis engine
Processing document...
Time spent in pipeline: 30121
Confirming what was added...
Found: org.apache.uima.jcas.tcas.Annotation
Found: com.dalelane.uima.annotators.gen.AnnotationD
Found: com.dalelane.uima.annotators.gen.AnnotationC
Found: com.dalelane.uima.annotators.gen.AnnotationB
Found: com.dalelane.uima.annotators.gen.AnnotationA
Sample code : running all annotators at once
This can be done using UIMA-AS – a variant of UIMA that provides support for asynchronous scale out. The second app uima-as-project demonstrates this.
To describe the approach at a high level, the idea is that you want to create five separate copies of the document to be analysed. Each of these copies can be run through a separate annotator at the same time. Once they’ve all finished, the output from all of the annotators can be collected together and merged to form the single output document.
Each of the annotators are run as a separate service. A by-product of this is that each can be run on a remote machine, and UIMA-AS manages moving the documents to/from the remote services using JMS messaging.
In my sample, I’m running them all on the same server, and using “localhost” to define the interactions. A JMS broker is still required for this. Instructions for starting the message broker is contained in uima-as-project/README
Because the annotators are run as “remote” services, this introduces an extra step – the services need to be deployed before the pipeline can be started.
uima-as-project/src/com/dalelane/uima/parallel/Pipeline.java deploys each of the services by specifying the deployment descriptors.
// creating UIMA analysis engine
UimaAsynchronousEngine uimaAsEngine = new BaseUIMAAsynchronousEngine_impl();
// preparing map for use in deploying services
Map deployCtx = new HashMap();
deployCtx.put(UimaAsynchronousEngine.DD2SpringXsltFilePath, System.getenv("UIMA_HOME") + "/bin/dd2spring.xsl");
deployCtx.put(UimaAsynchronousEngine.SaxonClasspath, "file:" + System.getenv("UIMA_HOME") + "/saxon/saxon8.jar");
// preparing map for use in deploying services
uimaAsEngine.deploy("./conf/annotatorA/deploy.xml", deployCtx);
uimaAsEngine.deploy("./conf/annotatorB/deploy.xml", deployCtx);
uimaAsEngine.deploy("./conf/annotatorC/deploy.xml", deployCtx);
uimaAsEngine.deploy("./conf/annotatorD/deploy.xml", deployCtx);
uimaAsEngine.deploy("./conf/annotatorE/deploy.xml", deployCtx);
The deployment descriptors for each annotator specify the name of the JMS endpoint that UIMA can use to send documents to the annotator for analysis, and the location of the analysis engine descriptor file that defines the annotator.
For example uima-as-project/conf/annotatorB/deploy.xml
The individual annotator descriptors are the same as in the first project uima-project.
For example,
uima-as-project/conf/annotatorB/analysisEngine.xml
As before, it identifies the Java class which implements the annotator, and the types of annotations that it can create.
Once the UIMA services are deployed, the analysis engine descriptor for the overall pipeline can be deployed.
This is also done in
uima-as-project/src/com/dalelane/uima/parallel/Pipeline.java
uimaAsEngine.deploy("./conf/deploy.xml", deployCtx);
The deployment descriptor for the overall pipeline (uima-as-project/conf/deploy.xml), identifies how the pipeline can communicate with each of the “remote” services that make up it’s annotators.
UIMA-AS provides a sample (AdvancedFixedFlowController) that takes care of making the copies (a CAS Multiplier) of the document being analysed, and defines the sequence for the annotators to be run in parallel.
The deployment descriptor for my pipeline uses this sample.
Flow
annotatorA,annotatorB,annotatorC,annotatorD,annotatorE
It also identifies the descriptor file for running the analysis engine
The analysis engine descriptor file (uima-as-project/conf/analysisEngine.xml), similar to before, identifies the annotators that make up the aggregate pipeline.
These describe the way that the analysis engine can send documents to the remote services for analysis, using JMS. For example, uima-as-project/conf/annotatorB/remote.xml contains:
org.apache.uima.aae.jms_adapter.JmsAnalysisEngineServiceAdapter
To summarise:
The provided eclipse launch config starts a Java application uima-as-project/src/com/dalelane/uima/parallel/Application.java
The Java application creates an instance of uima-as-project/src/com/dalelane/uima/parallel/Pipeline.java
Pipeline.java creates an UimaAsynchronousEngine which deploys each of the annotator services, such as uima-as-project/conf/annotatorD/deploy.xml
Each annotator’s deployment descriptor identifies the actual implementation of the annotator, giving the analysis engine XML uima-as-project/conf/annotatorD/analysisEngine.xml which in turn specifies the Java implementation class
Pipeline.java then deploys the overall analysis engine pipeline as specified in the deployment descriptor uima-as-project/conf/deploy.xml
This deployment descriptor identifies the way that the analysis engine should communicate with the remote services (by importing JMS specs such as uima-as-project/conf/annotatorC/remote.xml) and the order that they should be invoked in (using AdvancedFixedFlowController)
The output from running the launch config shows that it takes about 6 seconds (5 annotators run in parallel, each of which takes about 6 seconds) to process the document text.
Full sample output is at
uima-as-project/example-output/console.log
A summary is:
Sample UIMA application - parallel
==================================
Deploying UIMA services
Service:annotatorA Initialized. Ready To Process Messages From Queue:AnnotatorARemoteQ
Service:annotatorB Initialized. Ready To Process Messages From Queue:AnnotatorBRemoteQ
Service:annotatorC Initialized. Ready To Process Messages From Queue:AnnotatorCRemoteQ
Service:annotatorD Initialized. Ready To Process Messages From Queue:AnnotatorDRemoteQ
Service:annotatorE Initialized. Ready To Process Messages From Queue:AnnotatorERemoteQ
Deploying analysis engine
Service:UIMA demonstration Initialized. Ready To Process Messages From Queue:DemoAnnotatorQueue
Initialising UIMA client
Processing document...
Time spent in pipeline: 6117
Confirming what was added...
Found: org.apache.uima.cas.impl.AnnotationImpl
Found: org.apache.uima.cas.impl.AnnotationImpl
Found: org.apache.uima.cas.impl.AnnotationImpl
Found: org.apache.uima.cas.impl.AnnotationImpl
Found: org.apache.uima.cas.impl.AnnotationImpl
Found: org.apache.uima.cas.impl.AnnotationImpl
Summary
This isn’t definitive model code for using UIMA-AS. It’s intended more as a helpful first step into getting started with UIMA-AS – which there seems to be a shortage of documentation for. But there is a *lot* more to UIMA-AS, with a lot of settings and features to tweak.
Even with this simple example, you can see that output which takes 30 seconds to get using UIMA can be completed in 6 seconds if run in parallel, and how this can be done with UIMA-AS and a few extra config files.
Tags: java, jms, uima, uima-as
original posts link
http://dalelane.co.uk/blog/?p=2247
Saturday, 31 August 2013
facebook login using java
DefaultHttpClient httpclient = new DefaultHttpClient();
httpclient.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
new DefaultHttpMethodRetryHandler());
HttpGet httpget = new HttpGet("https://developers.facebook.com/docs/reference/api/examples/");
HttpResponse response = httpclient.execute(httpget);
HttpEntity entity = response.getEntity();
if (entity != null) {
entity.consumeContent();
}
List cookies = httpclient.getCookieStore().getCookies();
HttpPost httpost = new HttpPost(
"https://www.facebook.com/login.php?login_attempt=1");
List nvps = new ArrayList();
nvps.add(new BasicNameValuePair("email", "xxxxxxxxxxxxxx"));
nvps.add(new BasicNameValuePair("pass", "ssssssss"));
httpost.setEntity(new UrlEncodedFormEntity(nvps, HTTP.UTF_8));
response = httpclient.execute(httpost);
entity = response.getEntity();
if (entity != null) {
entity.consumeContent();
}
CookieStore cookiestrore = httpclient.getCookieStore();
//cookies = httpclient.getCookieStore().getCookies();
//httpclient.getConnectionManager().shutdown();
return cookiestrore;
Friday, 26 July 2013
MongoDb for Java Developer
MongoDB, noSQL open source database, written in C++, with many great features like map-reduce , auto sharding, replication, high availability and etc. The following Java / Spring Data MongoDB tutorials and examples are tested with : MongoDB 2.2.3 Java-MongoDB-Driver 2.11.0 Spring-Data-MongoDB 1.2.0.RELEASE more on click here
Subscribe to:
Posts (Atom)