adam bien's blog

Java EE Apps on 10 TB RAM, 5 GHz Big Irons

Pavel, please introduce yourself

Hello Adam. My name is Pavel Samolysov. I live in Russia with my wife Olga and newborn daughter Dasha. The day-to-day log of my life is on Twitter (@psamolysov), and some technical articles can be found on my English blog. I have also been blogging in Russian since 2007; that blog has about 220 articles on Java, Java EE, OSGi, System Integration, System Architecture, SOA, ESB and Functional Programming. I met Java 11 years ago and, as I remember, began learning the language because there was an amazing integrated development tool - Eclipse SDK. Some years ago I was even an Eclipse committer and maintained the Eclipse Communication Framework project. ECF is a set of frameworks for building communications into applications and services. It provides a lightweight, modular, transport-independent, fully-compliant implementation of the OSGi Remote Services standard. For years I also worked as an Application Integration Architect and designed a number of large Enterprise Service Bus solutions using Java EE as well as Oracle Service Bus and Oracle SOA Suite. My current job role is z Systems Software Technical Specialist at IBM, so I help my customers build and support their highly available (z Systems means "zero downtime") and business-critical Java EE applications on IBM WebSphere Application Server for the Big Irons - IBM mainframes.

You are running Java EE applications on the Big Irons. How big are they? How do they differ from a typical Linux box for Java EE developers?

The iron is very big, taller than a man: an enterprise-class mainframe is a 2 x 1.5 x 2 meter box. But as I understand it, your question was about processor and memory power. The latest z13 mainframe has up to 141 cores (22 nm, 5 GHz) for customer use and up to 10 TB of real storage (memory). Single Instruction/Multiple Data (SIMD) instructions as well as Simultaneous Multithreading (SMT) are supported and open to Java applications. Modern mainframes can run all kinds of workloads. I mean classical workloads, e.g. IBM CICS and DB2 on the z/OS operating system, as well as new Java workloads like WebSphere Application Server applications, and Linux and open source workloads. IBM loves Linux more and more and has presented a new efficient, powerful, secure Linux platform called IBM LinuxONE. Let me say a few words about the platform, because I think it could be interesting for Java EE and Linux adopters. LinuxONE is a box that can scale up and out, with up to one million Docker containers per system. Since the I/O subsystem has always been a killer feature of the mainframe platform, with LinuxONE you can gain faster response times with massive high-performance I/O throughput. The platform also provides up to 10 TB of memory for Big Data processing using, for example, Apache Hadoop or Apache Spark. From the Java point of view, mainframes provide the following benefits.

Collocation. The usual deployment model is a z/OS image that contains a database and a WebSphere Application Server instance. Since the application server and the database are on the same operating system image, JDBC Type 2 can be leveraged. JDBC Type 2 matters because it provides Java-to-DB connectivity without any TCP/IP or network overhead; only inter-process communication is used. z/OS also provides cross-memory services, so a Java-to-DB invocation is carried out even without the IPC stack; special instructions are used instead. Everyone likes zero latency.
Hardware Transactional Memory (HTM) - allows lockless interlocked execution of a block of code called a "transaction". One of the main properties of a transaction is "atomicity": other processors in the system will see either all or none of the storage updates of the transaction. An example of HTM exploitation is the new Hashtable class implementation delivered in IBM Java 7R1, which shows an over 5x efficiency improvement in a multi-threaded environment.
SIMD instructions for Java 8 applications. The new version of the Java Virtual Machine exploits the z13 vector instructions for java.lang.String operations (i.e. compareTo, compareToIgnoreCase, contains, contentEquals, equals, indexOf, lastIndexOf, regionMatches, toLowerCase, toUpperCase, getBytes); the java.util.Arrays#equals operation on primitives; and string encoding converters (for ISO8859-1, ASCII, UTF8, and UTF16: the encode (char2byte) and decode (byte2char) methods). Auto-SIMD is a new Just-In-Time (JIT) compiler optimization in IBM Java 8 that transparently accelerates simple scalar loops, e.g. matrix multiplication, by leveraging vector operations on z13. A number of benchmarks carried out by the IBM lab show that specific idioms/operations were improved by between 2x and 60x. Performance benefits for real Java applications depend on how frequently these idioms/operations are used.
CP Assist for Cryptographic Function (CPACF) - provides up to a 2x improvement in throughput per core for security-enabled applications. Java 8 exploitation of CPACF is the default starting from z9 on both z/OS and Linux on z Systems.
zEDC Express I/O Adapter. Did you know that over 2000 petabytes of data are created every day? Between 2005 and 2020, the digital universe will grow by 300x, from 130 to 40,000 exabytes. 80% of the world's data was created in the last two years alone. zEDC Express is an I/O adapter that performs high-performance, industry-standard compression. Applications can use zEDC via industry-standard APIs (zlib and ... Java). It gives up to a 91% reduction in CPU time using zEDC hardware vs. zlib software and up to a 74% reduction in elapsed time. The compression ratio is up to ~5x.
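To illustrate the "zlib and ... Java" point from the zEDC item: the standard Java entry point to zlib-style compression is the java.util.zip package. The following is only a minimal sketch of a compress/decompress round trip; the class name ZlibRoundTrip and the sample data are made up for illustration. Whether such calls are actually offloaded to a zEDC adapter depends on the platform, the JVM level and its configuration, which is an assumption here; the code itself runs on any JVM using software zlib.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical example class: a zlib round trip through the standard java.util.zip API.
final class ZlibRoundTrip {

    // Compresses the whole input with the default zlib compression level.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restores the original bytes; fails fast on truncated or corrupt input.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("Incomplete compressed data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Repetitive sample payload, so the compression ratio is easy to see.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("payment record ").append(i % 10).append(';');
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(original);
        System.out.println(original.length + " bytes -> " + packed.length + " bytes");
    }
}
```

The point of the sketch is that the application code stays the same: any acceleration happens below the java.util.zip API.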

In my personal opinion, the mainframe is a whole world of its own: hardware and software that are fully integrated and engineered together. Mainframes provide many benefits of vertical integration, from a high-level middleware component like a portal engine down to the operating system, the security server, the hypervisor and the hardware. Yes, some customers dislike vendor lock-in, but from another point of view, when there is a problem, one vendor can't say that it is on another vendor's side, and vice versa.

What was the craziest / most interesting application you saw in the wild?

One of the usual tasks for a Client Technical Specialist is benchmarking. Customers want to see how their applications work on the z Systems platform and get real information about high availability and performance characteristics. When I joined IBM, my first project was the benchmark of a bank payment processing system. The application architecture was based on the Pipes and Filters design pattern, i.e. the application was divided into a number of modules deployed on several WebSphere Application Server for z/OS instances connected by IBM WebSphere MQ for z/OS. As you might suspect, DB2 for z/OS was used as the database engine. Every module was a Java EE 6 application with a lot of EJBs, especially MDBs. Neither CDI nor JPA was used. The Data Access Layer was based on plain JDBC, but the developers had separated the DAOs from the other parts of the application, so the migration from Oracle to DB2 for z/OS did not take much time. WebSphere Application Server was connected to DB2 using JDBC Type 2. The application was deployed on a cluster built upon two logical partitions (LPARs) on a zEnterprise EC12 mainframe. Mainframe-specific hardware and software like the Coupling Facility were used for clustering. The high availability and performance tests with increasing distance between the cluster nodes were the most interesting part of the benchmark. A mainframe cluster is an example of a shared-everything solution, so both nodes access data located in common storage (Coupling Facility structures), including database locks. The common storage resided on one node, and the workload running on the other node accessed it via optical fibers. The speed of light in optical fiber is about 200,000 km/s, so there is a delay of about 10 µs per kilometer (2 x 5 µs, because the signal travels in both directions). As a result of this deployment model, the second cluster node sets and gets database locks a bit more slowly, and the application must be well designed to work in such an environment.
Our team gave the developers many recommendations during the benchmark, and they tuned the application well. In the end, the application complied with the SLA in terms of both performance and high availability, even at a distance between nodes of up to 70 kilometers (we simply had no more cable). From the high-availability point of view, an active-active cluster ensures that the Recovery Point Objective (RPO) equals zero, since synchronous disk data replication is used, and that the Recovery Time Objective approaches zero, since one node keeps working when the other one fails.
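The latency arithmetic above is easy to reproduce. The following is a rough sketch using only the figure quoted in the answer (about 200,000 km/s for light in fiber); the class and method names are made up for illustration, and real links add switching and protocol overhead on top of pure propagation delay.

```java
// Hypothetical helper: pure propagation delay in optical fiber, no equipment overhead.
final class FiberLatency {

    // Approximate speed of light in optical fiber, as quoted in the interview.
    static final double LIGHT_SPEED_IN_FIBER_KM_PER_S = 200_000.0;

    // One-way delay per kilometer in microseconds (about 5 µs/km).
    static double oneWayMicrosPerKm() {
        return 1_000_000.0 / LIGHT_SPEED_IN_FIBER_KM_PER_S;
    }

    // Round-trip delay in microseconds for a request and its reply over the given distance.
    static double roundTripMicros(double distanceKm) {
        return 2 * distanceKm * oneWayMicrosPerKm();
    }

    public static void main(String[] args) {
        for (double km : new double[] {1, 10, 70}) {
            System.out.printf("%5.0f km -> %7.1f us per round trip%n", km, roundTripMicros(km));
        }
    }
}
```

At the 70 km tested in the benchmark, every synchronous request to the remote common storage costs roughly 700 µs of propagation delay alone, which is why an application that takes many locks per transaction has to be designed with the distance in mind.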

Which tools are you using in your daily work?

Today I don't write a lot of code, but sometimes I open my Eclipse SDK and develop proofs of technology for my customers, or just write short programs that help me improve my expertise in WebSphere Application Server, WebSphere Liberty Profile, MQ, the IBM JVM and other products. I have also published a number of Spring Framework examples on GitHub. When working with my customers, I continually use IBM Support Assistant (ISA). ISA is a set of well-designed tools, partly based on well-known open source ones like Eclipse Memory Analyzer (MAT). Usually I use three or four of them:
- Pattern Modeling and Analysis Tool and Garbage Collection and Memory Visualizer are used for analyzing Java garbage collector behavior; the tools can show heap allocation for each generation, GC pauses and intervals, the durations of those pauses and intervals, and so on.
- Memory Analyzer provides heap analysis. It looks like MAT but can parse the Portable Heap Dump (PHD) format and contains a number of IBM extensions, such as WebSphere Application Server thread pool analysis and the WebSphere HTTP session reports. It is a mandatory tool for solving any memory leak problem.
- Thread and Monitor Dump Analyzer for Java is used for thread monitoring and for finding synchronization and lock-related problems. The tool analyzes your javacore dumps, shows running, parked, waiting or blocked application threads and their stack traces, and investigates all locks and monitors.
Another favorite tool of mine is Jinsight. It is a profiler developed by IBMers for internal use only. A special agent is simply copied to the machine where the analyzed application runs and is registered in the application server or standalone JVM using the "agentpath" command-line attribute. As far as I know, there are currently agents only for the z/OS, Linux on z Systems and Linux on x (x86 and x86-64) platforms. Profiler traces can be written to a file or sent over the network. A GUI tool named JinsightLive is used for the analysis.
I like the tool because it shows results not only in an ugly table view, as most profilers do, but also as a colored timeline diagram in which the height of every line is proportional to the execution time of a particular method. Our brains are organized in such a way that we analyze graphical images much, much better than text or tabular information. Also, a couple of months ago I began to learn COBOL (yep, I'm a mainframe guy and every mainframe guy should know COBOL or High Level Assembler :)), and Rational Developer for z Systems is my friend today.

What are the typical architectural and runtime problems / bottlenecks / pitfalls of Java EE applications in the real world?

My personal opinion is: developers, developers, developers, developers. I hardly ever see problems in the runtime environment or the application server (and when I do, it is usually some misconfiguration), but memory leaks, locks and long-running SQL queries are big problems. I don't want to blame any frameworks, but over the years I have concluded that some of them create more memory pressure, and consequently more GC pressure, by design than others. For instance, in practice I have seen user sessions that took about 60 MB each. It means you need a 6 GB heap just for 100 concurrent users, or 60 GB for 1000. Well, from the sales people's point of view it sounds very good, because badly designed software sells hardware, but from the engineers' point of view it sounds crazy. And it is very strange to me that developers sometimes don't want to fix problems even when they have been given a thorough analysis. OK, I understand, this is business: you have a contract with the customer and any fix means money. Maybe customers should think harder when signing a contract that does not include any support. It may be a good old Russian tradition, but everybody likes to boil stone soup. Why WebSphere Application Server or WebLogic or GlassFish or whatever, why EJB or CDI, WTF, there is Tomcat and our lovely Spring Framework, so we can develop anything on these guys. But if you ask about support, please see my sentence above. The developer community looks like a Washington pie: on top there are a number of very experienced team leads, who usually tried J2EE 1.4 or even 1.3, looked at EJB 2.0 and disliked it. These guys switched to the Spring Framework or another framework years ago and do not know about the new things in Java EE 5, 6, or 7. Below them there are a lot of young developers who get their knowledge only from these Java EE-hating gurus (a guru could be a team lead or just a guy giving advice on an internet forum).
A common misconception in Russia is that Java EE is a slow technology that is very hard to develop with and maintain, and that lightweight solutions work faster.

Tell me more about your benchmark CDI vs. EJB vs. Spring. Why have you created that? Did the results surprise you?

Yep, as I said above, in my country there is a common misconception about Java EE vs. Spring performance. I had a look at your webcast What is Faster - EJBs or CDIs and was inspired. I knew a little bit about Aleksey Shipilev's toolkit, JMH, and my goal was to leverage this toolkit, which is well known in the Java community, to find out the truth about Spring and Java EE performance. So the benchmark was designed. My idea was to use every technology the way it is usually used, I mean without any low-level optimizations such as the javax.ejb.TransactionAttribute or ApplicationScoped annotations. I was surprised by the result, because in theory I thought the Spring Framework should be faster. An EJB, for example, starts a transaction and checks user permissions on every call, while a Spring default singleton-scoped bean is created once and that one instance is invoked every time. But as you see, the Spring Web MVC framework has to be taken into account, and a profiler (e.g. Jinsight) can show that the majority of the execution time is spent in the MVC code. The DispatcherServlet is a "global" thing for the Spring Web MVC framework: the servlet is used not only for "RESTification" but as a front controller for any Spring-based web application, and it must do many things before the "business" logic is invoked. One of these activities is looking up the matching request handler (that is, a controller), which takes about 35% of each DispatcherServlet.doDispatch() execution. The results for CDI were obvious: 97% of each org.apache.wink.server.internal.handlers.InvokeMethodHandler.handleRequest() execution is building the controller and injecting dependencies (the org.apache.wink.server.internal.registry.ResourceInstance.getInstance() method). The "business" logic takes only 1% of the overall method execution time.

Any impressive numbers like transactions / second, total number of users, sessions, bandwidth, cluster nodes etc. you can share with us?

Please, let me share some information about the z transaction rate champion - IBM Information Management System (IBM IMS) - a joint hierarchical database and information management system with extensive transaction processing capabilities. IBM IMS was designed during the late 60s for the Apollo Program, but today it is a modern database which supports Java, JDBC, XML and, since 2005, web services. So, you see, the NoSQL initiative is not a new thing; it has existed for decades. In September 2013 the IMS Performance Evaluation Team carried out a benchmark that demonstrated the ability of IMS 13 to achieve a new record sustained average transaction rate of over 117,000 ACID transactions per second on a zEnterprise EC12 mainframe equipped with 20 general processors and 51.2 GB of real storage, using their high-volume Fast Path Credit Card benchmark workload. The workload is very similar to a real credit card processing application. The IMS Team is very open and kind; they shared an amazing report describing all the problems and solutions as well as every system configuration file. Even if IMS is not your topic, the paper is a very interesting example of how a benchmark report should look.

Any links you would like to share with us?

I already put many links in the interview and because I like reading a lot, let me just share my favorite books, please:

Pavel, thank you for the interview!