Mission Critical, Low Latency Java EE In Manufacturing Systems -- An Interview With iTAC
Frank, could you briefly introduce yourself?
I'm a senior software architect and developer with over 20 years' experience in the analysis, design, implementation and testing of Java applications. I have been working at iTAC Software AG, Germany, in the advanced technology division since 2001. In this division, we build the basic technical frameworks on top of which the business logic is placed. For the last 8 years I have been involved in the design and implementation of an enterprise Java based middleware, which provides the high availability, scalability and low latency required for an online manufacturing execution system. My roots are in electrical engineering, but I started with Java version 1.02 in the early OS/2 days :-)
What is the purpose of the application (iTAC.MES.Suite)?
Modern manufacturing industries need to optimize the whole production process. Besides collecting all types of manufacturing data (e.g. for traceability features, quality reports, ...), it's also a requirement to control the flow of manufacturing lines to optimize production quality. Manufacturing Execution Systems (MES) are located architecturally between the shop floor production machines (often based on PLC systems) and the backend ERP systems. The product iTAC.MES.Suite provides an online MES which is able to interact synchronously with the production machines. Because of its online nature, the iTAC.MES.Suite has to process all remote calls synchronously and very quickly so that it does not slow down the production lines. All data is stored in a relational database which reflects the current state of manufacturing online.
Your company back then was a "startup", right?
The company was founded in 1998 as a spin-off from Bosch Telecom. iTAC were among the first to promote and introduce internet technologies and Java in the manufacturing environment.
We migrated the application a few years ago from a proprietary POJO application to Java EE 5. Did the migration affect productivity?
Not as much as we first thought, but overall it improved our development productivity.
What is the impact of Java EE on the code base? How much code could you delete during the migration? Do you see any potential for more simplification?
The impact was not as big as we thought, because of the special requirements for MES. One of the challenges was the communication protocol between the client libraries and the Java EE servers. In manufacturing environments, our customers often integrate their technical equipment directly with our programming libraries. The manufacturing equipment and the control programs are usually existing components. Integrators use our API libraries to tightly couple their systems to our central MES. To achieve this, we provide libraries for different languages, architectures and operating systems (e.g. Windows C DLLs, Linux .so files, Windows .NET assemblies, JAR files and others). Java based manufacturing equipment is still the exception; usually it is based on C/C++ systems. In this scenario, we need to call EJB business logic remotely (RPC) with a protocol which is independent of the Java EE application server vendor and stays downward compatible for many years - some manufacturing lines run for a decade or longer! While the manufacturing lines are running, we can't update client libraries - customers would chase us if we told them to update all client libraries in all machines just to be able to update e.g. the Java EE application server version or to switch to a different Java EE application server implementation/vendor. But with the standard remote EJB client container mechanisms, exactly this would happen. As a result, we can't use the remote EJB interface to access the business logic which resides in EJBs. Instead, we need a long-term stable communication protocol which always has to be downward compatible - even if we update or change the Java EE application server. To provide high availability and real load balancing, we had to build a special middleware on top of the Java EE stack that meets all these requirements - as a consequence, we don't use very much infrastructure code from the JEE application server.
As you can see, the requirements for MES are very special compared to usual Java EE applications/projects.
How "mission critical" is the application?
MES are the perfect example of "mission critical" requirements. When an MES fails, the entire production is affected. Nowadays, with just-in-time production and nearly no storage, it must never fail. We have a lot of automotive customers who deliver directly to the large automakers; if their lines shut down, no cars will be produced - the real nightmare scenario for MES vendors...
How many peak transactions per second (or similar metrics) has the application to handle? Is Java EE fast enough?
One of our biggest customers uses about 1500 clients (API library based and interactive GUI clients), which produce about 25,000 RPC calls per second. Most of these calls have to return in about 50-100 milliseconds. We can achieve this because of our own binary protocol, which has very low bandwidth consumption and low latency. This protocol uses the HTTP listener endpoints of the Java EE servers, which are very efficient today (e.g. Grizzly in the GlassFish server). Java EE is definitely fast enough for our requirements.
How important is monitoring? Does your application implement any dashboards or runtime statistics?
For complex software, monitoring is a must-have. Besides the typical business monitoring functions required for an MES, we also monitor many technical probes for all our servers in the cluster and all clients (even the library based clients).
Your application is a product. On which application servers is your application currently running?
Currently it's running on GlassFish, Payara and WebSphere, but we try to keep the code as portable as possible for other Java EE application servers.
We created a "shared nothing" architecture and the application was able to cluster across different application server vendors. Could you briefly describe the idea?
As described previously, we need a protocol which we developed ourselves. As a consequence, our product also has to provide the high availability features itself. We use client based clustering, so all clients know the current state of the cluster, e.g. how many cluster nodes are available, which services are available and how to reach them. To achieve this, we can only use stateless EJBs and CDI on the server side (no stateful EJBs and no MDBs), but this is not a problem for our application - it's even more scalable. All cluster nodes are installed as single node installations. We use a distributed cache in all the server nodes, so all servers know each other. With this implementation, we are independent of Java EE application server vendors and could change the Java EE server platform without touching our clients. This is a very important feature for us.
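The client based clustering described above can be pictured with a minimal Java sketch. It is only an illustration under stated assumptions, not iTAC's implementation: the class `ClusterView`, its methods, the node URLs and the round-robin selection are all hypothetical. The client keeps its own view of which nodes offer which services and picks a target itself, so no server-side load balancer is involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical client-side view of the cluster (illustration only). */
final class ClusterView {
    // Node URL -> services that node currently offers.
    private final Map<String, Set<String>> servicesByNode = new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger();

    /** Called when the client learns that a node is reachable. */
    void nodeUp(String nodeUrl, Set<String> services) {
        servicesByNode.put(nodeUrl, new HashSet<>(services));
    }

    /** Called when a node is reported as gone or a call to it fails. */
    void nodeDown(String nodeUrl) {
        servicesByNode.remove(nodeUrl);
    }

    /** Picks a node offering the service, rotating over all candidates. */
    String pickNode(String service) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : servicesByNode.entrySet()) {
            if (e.getValue().contains(service)) {
                candidates.add(e.getKey());
            }
        }
        if (candidates.isEmpty()) {
            throw new IllegalStateException("no node offers service " + service);
        }
        Collections.sort(candidates); // deterministic order for the rotation
        int index = Math.floorMod(counter.getAndIncrement(), candidates.size());
        return candidates.get(index);
    }
}
```

Because the selection happens in the client, a failed node only needs to be removed from the local view; the next call goes to one of the remaining nodes.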
Your team implemented a great schema evolution tool based on JDBC. Could you briefly describe that?
We implemented the schema migration tool to facilitate software upgrades and to keep installed software versions and database schemas in sync. The web based tool is incorporated into the product and ensures, through a veto service, that the system will only start if the combination of software and schema versions actually matches.
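A startup veto of this kind could look roughly like the following Java sketch. It is an assumption-laden illustration, not the actual tool: the class `SchemaVersionVeto`, the table name `schema_version` and its `version` column are hypothetical. The idea is simply to read the schema version over JDBC and refuse to start when it differs from the version the software expects.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical startup check (illustration only). */
final class SchemaVersionVeto {

    /** Reads the schema version from an assumed single-row table "schema_version". */
    static String readSchemaVersion(Connection con) throws SQLException {
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT version FROM schema_version")) {
            if (!rs.next()) {
                throw new SQLException("schema_version table is empty");
            }
            return rs.getString(1);
        }
    }

    /** Vetoes the startup unless the software and the schema agree on the version. */
    static void check(String expectedBySoftware, String foundInDatabase) {
        if (!expectedBySoftware.equals(foundInDatabase)) {
            throw new IllegalStateException("startup vetoed: software expects schema "
                    + expectedBySoftware + " but database has " + foundInDatabase);
        }
    }
}
```

At startup, the application would call `check(expected, readSchemaVersion(connection))` before any business service is made available.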
After performance tests, your team decided to implement an own serialization protocol. What were the reasons behind this decision? What is the impact on performance?
In the early days, we used the CORBA / IIOP protocol, which is in fact a very efficient and bandwidth saving protocol with low latency. While migrating our application to JEE, we found that CORBA is no longer the right protocol for our situation (because CORBA is used by some Java EE servers themselves for remote EJB communication). Because we're using fixed API functions which never change, we don't need meta information in the protocol. Instead, it has to be highly efficient in terms of latency and bandwidth - that's the reason why SOAP, XML, REST or similar protocols are not the right answer. We did some research into efficient, open source protocols and found one called Hessian. But after digging deeper into this protocol, we found some cases where it does not match all our requirements, and on the other hand it has some features which we don't need. So we decided to implement our own binary protocol for our client/server communication. An additional plus is that we have it in our own hands to keep it downward compatible as long as we need. Even though CORBA was highly efficient, our protocol iHAP (iTAC High Availability Protocol) is a little bit better in terms of latency and bandwidth consumption. So it's the perfect protocol for our special requirements.
I really enjoyed working with your team. We found unconventional solutions with lots of fun. How important is "fun" and developer's motivation during development?
Fun is an important factor in producing an unconventional solution. I remember very good discussions between our developer colleagues and you, Adam. Without the right spirit in the team and the fun, it won't work.
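To illustrate why fixed API functions make meta information unnecessary, here is a small Java sketch of a compact binary call frame. It does not show the real iHAP wire format, which is not described in this interview; the class `CompactCallCodec`, the field layout and the example arguments (a serial number and a station id) are assumptions. Both sides know the argument order for each function id in advance, so only a version byte, the function id and the raw values travel over the wire.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical compact call frame (illustration only, not iHAP). */
final class CompactCallCodec {
    static final int PROTOCOL_VERSION = 1;

    /** Layout: 1 byte version, 2 bytes function id, UTF string, 4 byte int. */
    static byte[] encode(int functionId, String serialNumber, int stationId) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeByte(PROTOCOL_VERSION);
            out.writeShort(functionId);
            out.writeUTF(serialNumber); // 2 byte length prefix + modified UTF-8
            out.writeInt(stationId);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Decodes a frame into "functionId:serialNumber:stationId" for demonstration. */
    static String decode(byte[] frame) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame))) {
            int version = in.readUnsignedByte();
            if (version != PROTOCOL_VERSION) {
                // A real server would dispatch to the decoder for that older version.
                throw new IllegalArgumentException("unsupported protocol version " + version);
            }
            int functionId = in.readUnsignedShort();
            String serialNumber = in.readUTF();
            int stationId = in.readInt();
            return functionId + ":" + serialNumber + ":" + stationId;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The leading version byte is one simple way to keep such a format downward compatible: a newer server can keep a decoder for every version it has ever shipped, so old client libraries never need to change.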
How many developers are working on the application?
For the base technology stack, we are working with a team of 4. The team of business developers consists of about 12 colleagues.
Which tools, IDEs, servers are you using?
Our developers can choose between NetBeans and Eclipse (most use NetBeans). The project is completely based on Maven with a lot of special Maven plugins developed in-house. Besides this, we use fully automated builds, controlled by Jenkins.
Would you choose Java EE again?
One of the FAQs is "Do you know any Java EE developers?". Are you also searching for Java EE Developers? :-)
Besides you, we know some people who often speak at Java / Java EE conferences, e.g. JAX, W-JAX. We are constantly looking for Java / Java EE developers to strengthen our team.
Any pointers, resources to the "cool" stuff, presentations, demos, conferences, more information etc.?
Frank, thank you for the interview!
Very pithy! :) Thank you
Posted by Peter on September 23, 2015 at 12:12 PM CEST #
Good interview, but I'm not clear on why they couldn't stick with CORBA. It says "because CORBA is used on some Java EE servers itself for remote EJB communication", which is true, I'm using it right now, but what is the concern here? If CORBA is already "a very efficient and bandwidth saving protocol with low latency" why create your own binary protocol?
Posted by David on September 25, 2015 at 01:14 AM CEST #
Hi David, thanks for your comment.
The problem here is not CORBA itself, it's the JEE client container which needs to be used to communicate with the remote EJB entry of the server (via CORBA). We found that JEE client containers are usually not compatible when the server version is updated or a different server is used. Instead, the client container jar files have to be updated on all existing clients. One requirement of our customers is to be able to run a client application for a long time (often many years!) without updating it - without even changing any of the files on the client. These client libraries are often integrated into the manufacturing equipment itself, with no access to it (neither for us nor for our customer who uses the production line). Sometimes, only the manufacturer of the production equipment itself is able to access the system (e.g. SMD pick and place machines or in-circuit testers).
We need the freedom to update the server to newer versions or even to different JEE server implementations without touching the clients.
Posted by Frank Meilinger on September 25, 2015 at 04:04 PM CEST #