A Java EE Startup: Filtering Information with zeef.com
Arjan, could you please briefly introduce yourself? How did you start with Java SE/EE?
I've been working with computers and programming for a long time. I started on the Commodore 64 when I was but a small kid and got into programming almost right away. I studied computer science at the University of Leiden, where I specialized in high performance computing.
I "started" with Java around 1997. I think my very first contact with Java was when helping out my sister who studied math at the Free University of Amsterdam. They used Java there. I was more of a C++ programmer at the time, but the syntax was close enough to be able to help.
After that I got curious about this new Java thing and started to look more and more into it. My jobs however were still mostly C++. I did a few applets for my homepage at the university, ported a few games that I made for myself from C++ to Java, and I had one job where I coded a small desktop application using Java, but that was about it.
It wasn't until 2003 that I really deep-dived into Java and started to use it daily at m4n.nl, which was a startup too back then using Java EE.
Currently I'm a member of both the JSF EG as well as the Security EG.
What is the idea / business case behind zeef.com?
Zeef.com effectively tries to filter the world's information by adding a human element to search. A search engine like Google is of course unparalleled when you're searching for a very specific thing, like say a specific exception. But search engines don't actually say much or anything about the quality of links. Does a certain result appear at the first position of a search result because it's the best one, or just because it had the best SEO applied to it?
This is where zeef comes in. People who are knowledgeable about a certain field and are recognized by their community collect and rank all the best links for a certain subject. For example, Bauke Scholtz (BalusC) is a well known JSF expert and his zeef page at jsf.zeef.com/bauke.scholtz contains all the links for JSF that he thinks are the best ones, organized in blocks.
A distinguishing feature is that you can take those blocks and share them on the web, e.g. put them on your blog or in the side bar of your site. If you re-order the links in that block on your zeef page, these changes will be reflected wherever you shared that block.
A key element is that on zeef no one can exclusively claim a subject. If someone else thinks they can make a better page, or just a page that approaches the same subject from a different angle, they can do that.
What is the zeef.com architecture?
At the highest level we primarily have a server for the website itself, one that runs (overnight) jobs, one for our API, three servers that serve out the widgets and one for the database (PostgreSQL). Static resources, such as images and custom CSS, are cached by the Apache frontend servers that proxy these via a separate cookieless domain (zeef.io). These servers are loosely clustered via an Infinispan cluster, which however doesn't send actual data through the cluster but only (async) invalidation messages.
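As a rough illustration of invalidation-only clustering, here is a minimal plain Java sketch. It does not use Infinispan's API: InvalidationBus and NodeCache are hypothetical stand-ins, and messages are delivered synchronously here, whereas the cluster described above sends them asynchronously. The point is that only keys travel between nodes; each node reloads the value from the shared database on its next read.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Function;

class InvalidationSketch {

    /** Hypothetical stand-in for the cluster transport: it carries keys only, never values. */
    static class InvalidationBus {
        private final List<NodeCache> nodes = new CopyOnWriteArrayList<>();

        void register(NodeCache node) {
            nodes.add(node);
        }

        void broadcastInvalidation(NodeCache sender, String key) {
            for (NodeCache node : nodes) {
                if (node != sender) {
                    node.evictLocal(key);
                }
            }
        }
    }

    /** Per-server cache that loads a value from the shared database on a miss. */
    static class NodeCache {
        private final Map<String, String> local = new ConcurrentHashMap<>();
        private final InvalidationBus bus;
        private final Function<String, String> databaseLoader;

        NodeCache(InvalidationBus bus, Function<String, String> databaseLoader) {
            this.bus = bus;
            this.databaseLoader = databaseLoader;
            bus.register(this);
        }

        String get(String key) {
            return local.computeIfAbsent(key, databaseLoader);
        }

        /** Called by the node that changed the underlying data. */
        void invalidate(String key) {
            local.remove(key);
            bus.broadcastInvalidation(this, key);
        }

        void evictLocal(String key) {
            local.remove(key);
        }
    }

    public static void main(String[] args) {
        Map<String, String> database = new ConcurrentHashMap<>();
        database.put("page:jsf", "v1");

        InvalidationBus bus = new InvalidationBus();
        NodeCache website = new NodeCache(bus, database::get);
        NodeCache jobs = new NodeCache(bus, database::get);

        System.out.println(website.get("page:jsf")); // prints v1
        database.put("page:jsf", "v2");               // the jobs server updates the data
        jobs.invalidate("page:jsf");                  // only the key is sent to the other nodes
        System.out.println(website.get("page:jsf")); // prints v2
    }
}
```

Without the invalidate call the website node would keep serving v1 from its local cache, which is why every server that changes data has to be part of the same cluster.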
For the implementation we use Java SE 8 and Java EE 7. The application server is WildFly 8.2, which we self-build and patch when needed. We always deploy one application (one archive) to one AS instance. Building a new AS and deploying it to a server is approximately the same process as building the application archive and deploying it.
The application archive we use is the EAR. We contemplated using the simpler WAR format when we started, but a rudimentary layering between business code in the EJB module and web code in the web module, as well as a straightforward conf/ directory in the root of the EAR swayed us to the EAR format.
The frontend is built using JSF, with CDI backing beans and Bean Validation constraints. Where Bean Validation constraints are not convenient we use native JSF validators.
CDI backing beans are kept as lean as possible and we avoid usage of JSF specific collection types there in favor of standard Java SE collections. Those slim backing beans only collect input from the page and delegate to services which contain the actual business logic. Services are implemented as stateless EJB beans, which makes transactional concerns a breeze. We don't use interfaces for services, nor do we have separate DAOs. EJB beans are injected with JPA entity managers, which handle persistence. JPA entities are kept as slim as possible as well, just data + getters/setters and Bean Validation constraints.
JPA entity managers get their data from an (XA) data source, which we have configured inside the application using the standardized Java EE data-source element in application.xml. Switching between stages is then done using a data source wrapper that reads connection details for the configured stage (which is provided via a -D startup parameter) from the EAR's /conf directory. This switching of data sources and the conf/ directory itself is something we had to hack together though and is something we very much missed in Java EE.
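The stage switching can be pictured with a small plain Java sketch. This is not the actual data source wrapper: the system property name example.stage, the conf/<stage>/datasource.properties layout and the property keys are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

class StageConfigSketch {

    /** Hypothetical system property, set at startup with e.g. -Dexample.stage=production */
    static final String STAGE_PROPERTY = "example.stage";

    /** Returns the configured stage, falling back to "dev" when no -D parameter was given. */
    static String configuredStage() {
        return System.getProperty(STAGE_PROPERTY, "dev");
    }

    /** Loads conf/<stage>/datasource.properties; this directory layout is an assumption. */
    static Properties loadDataSourceSettings(Path confDir, String stage) throws IOException {
        Path file = confDir.resolve(stage).resolve("datasource.properties");
        Properties settings = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            settings.load(in);
        }
        return settings;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway conf directory so the sketch runs anywhere.
        Path confDir = Files.createTempDirectory("conf");
        Path stageDir = Files.createDirectories(confDir.resolve(configuredStage()));
        Files.write(stageDir.resolve("datasource.properties"),
                "url=jdbc:postgresql://localhost/example\nuser=example\n"
                        .getBytes(StandardCharsets.UTF_8));

        Properties settings = loadDataSourceSettings(confDir, configuredStage());
        System.out.println(configuredStage() + " -> " + settings.getProperty("url"));
    }
}
```

A wrapper along these lines would hand the loaded URL and credentials to the real data source, so the same archive can be deployed to every stage unchanged.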
Security of the website is handled via JASPIC and the native security mechanisms of Servlet and EJB. As zeef.com is a standalone application that does not need to integrate with internal enterprise infrastructure it handles its own security. We have two authentication mechanisms. The first is a variant of Servlet FORM that is better suited for use with JSF (it allows JSF to validate the username/password and provide feedback before authentication is attempted). The second uses OAuth for authentication (technically, authorization) via providers such as Google and Facebook. Those mechanisms delegate to an identity store, which is implemented as a CDI bean that uses an EJB service and JPA for the actual credential processing and fetching of user details.
The overnight jobs are handled by fairly simple EJB timers. They do things like validating links and precalculation of some of the statistics that you see on the subject pages. Since these jobs run on a separate server it's particularly important that it participates in the same Infinispan cluster that the website is also using.
ZEEF also features a RESTful API, which is implemented by JAX-RS and also runs on a separate server. Here too security is based on JASPIC, but uses a different authentication mechanism; a stateless header based token one. Stateless here means that the result of authentication is not stored in a session per user and not even in a cookie. The client is simply re-authenticated with each request. To keep this reasonably fast an authentication cache is used, which unlike a session can be purged at any time without the client really noticing anything but a small delay when it does the next request.
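The combination of stateless authentication and a purgeable cache can be sketched in plain Java as follows. This is an illustration only, not the JASPIC module itself: TokenAuthenticator and the function standing in for the identity store are hypothetical, and a real implementation would also need to deal with expiry and revoked tokens.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class TokenAuthSketch {

    /** Authenticates every request from its token; nothing is kept in a session or cookie. */
    static class TokenAuthenticator {
        // Hypothetical identity store: maps a token to a user name, or empty if unknown.
        private final Function<String, Optional<String>> identityStore;
        private final Map<String, String> cache = new ConcurrentHashMap<>();

        TokenAuthenticator(Function<String, Optional<String>> identityStore) {
            this.identityStore = identityStore;
        }

        Optional<String> authenticate(String token) {
            if (token == null) {
                return Optional.empty();
            }
            String cachedUser = cache.get(token);
            if (cachedUser != null) {
                return Optional.of(cachedUser);
            }
            Optional<String> user = identityStore.apply(token);
            user.ifPresent(name -> cache.put(token, name));
            return user;
        }

        /** Safe at any time: the next request simply pays for one identity store lookup. */
        void purge() {
            cache.clear();
        }
    }

    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        TokenAuthenticator authenticator = new TokenAuthenticator(token -> {
            lookups.incrementAndGet();
            return "secret-token".equals(token) ? Optional.of("arjan") : Optional.<String>empty();
        });

        authenticator.authenticate("secret-token"); // identity store lookup
        authenticator.authenticate("secret-token"); // served from the cache
        authenticator.purge();
        authenticator.authenticate("secret-token"); // lookup again after the purge
        System.out.println(lookups.get());          // prints 2
    }
}
```

Because the cache is only an optimization, purging it never logs anyone out, which is the difference with session based state that the answer above points out.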
The JAX-RS resources are mostly very lean and delegate to the same EJB services that the JSF backing beans are also using. Those EJB beans however are not clustered or remote; each application instance uses its own local instances.
How many developers are working on zeef.com?
We have three core developers: Bauke Scholtz, Jan Beernink and myself. Additionally our scrum master and system administrator also do programming and are technically part of the developer team as well, but they obviously don't have an IDE open full time.
Is Java EE productive?
Absolutely! We all know J2EE 1.4 had a reputation for being heavyweight and unproductive, with its focus on the over-complicated EJB 2 model and at times unintelligible vendor specific configuration for security and data sources, which wasn't just unintelligible but also had a tendency to change between every release of an implementation. And let's not forget the slow start up times.
Java EE 7 however is for the most part immensely productive. JBoss itself without anything deployed starts up in about a second on our workstations and with the zeef website deployed it's roughly 15 seconds. CDI + EJB are now extremely simple classes with often just a single annotation that gets a lot done. And modern JSF (2.0 and beyond) allows for a programming style that can be very close to the web, but still offers you higher level abstractions when needed.
However, security is a nightmare out of the box. Historically the idea was that you set it up outside of your application at the application server level with vendor specific tools. While this is great for application independent security that covers many different applications, it's a cumbersome and poor fit for our use case. Our saving grace was this small little gem called JASPIC. Without it we wouldn't have been able to stay with Java EE for security. But JASPIC itself is just a low-level hook into the container, so we had to build a lot of higher level functionality on top of it. These struggles are what motivated me to join the Java EE 8 security EG. Hopefully we can make the experience better.
Another thing is that the @Asynchronous annotation lets you do one or more actions concurrently with a minimal amount of effort in a container safe way. This is by itself very productive, but it's unfortunate that there's no support for choosing a specific thread pool. Java EE does have a spec that deals with this (Java EE Concurrency utils), but additional thread pools have to be defined outside the application and they don't work in conjunction with @Asynchronous.
[Arjan, you should test https://github.com/AdamBien/porcupine/. I'm already curious about your opinion]
Do you have any external dependencies in your WAR? If yes, which purpose do they serve?
We have a couple of external dependencies, not only in the WAR but in the EAR as well.
In the WAR we use OmniFaces, PrimeFaces and PrettyTime.
OmniFaces is our own library that makes working with JSF a lot easier. It's not a visual component library, but it contains a lot of utilities and API enhancements that could have been in the JSF core as well. In fact, part of our job at the JSF EG is looking at which parts of OmniFaces make sense to be transferred to JSF itself.
PrimeFaces is another essential library when working with JSF. PrimeFaces is in a way what gives JSF the attraction that it has; a great set of beautiful visual components with which you can easily assemble your UI. It must be said though that PrimeFaces couldn't really do what it does if it wasn't for the foundation that JSF provides.
Then we use PegDown for the processing and rendering of Markdown that users can use in our text blocks. JSoup for parsing HTML, ROME for RSS/Atom parsing which is needed for the so-called feed blocks that we display, Image4J, Imgscalr and Batik for handling the images on our site, the Google API client library for interaction with Google analytics and a few other things, Infinispan for explicit caching of various things that are not JPA entities and finally Hibernate Search for powering the site's search functionality.
What about the performance, is Java EE fast enough?
Java EE is incredibly fast. The website itself is now doing over 120k views a month and as mentioned above is running on a single server basically, which is not even that powerful.
You occasionally hear that JSF is supposedly not suited for public web sites, since it would be slow and use a lot of memory. We found however that this is absolutely not the case. Leonardo Uribe did some interesting performance benchmarks (see http://www.jsfcentral.com/articles/understanding_jsf_performance_3.html), and JSF came out as one of the fastest that also puts less strain on the GC than some of the other frameworks. Various other benchmarks like the World Wide Wait one came pretty much to the same conclusion. We did some testing of our own as well, and for a very basic page JSF can easily handle 5500 requests/second on a single server (see http://jdevelopment.nl/wildfly-8-benchmarked). In reality you'd not send that many requests to one box, and more complicated pages with database queries etc. will reduce that number of course, but it does give a baseline for performance.
The memory issue, specifically the session usage per user, is another source of confusion. Before JSF 2.0 this was indeed enormous, but ever since JSF 2.0 and its partial state saving, the actual amount of state saved is really low. JSF tracks the state and only stores changes instead of everything. Where in extreme cases JSF 1.x would use maybe a megabyte per user, JSF 2.x reduces this to something that's between a few bytes and a few KB.
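The idea behind partial state saving can be shown with a deliberately simplified plain Java sketch. It is not JSF's actual implementation: the Component class below is hypothetical and only mimics the principle that attributes set while the view is built are not stored per user, while changes made afterwards are.

```java
import java.util.HashMap;
import java.util.Map;

class PartialStateSketch {

    /** Hypothetical component that records only changes made after its initial state. */
    static class Component {
        private final Map<String, Object> attributes = new HashMap<>();
        private final Map<String, Object> delta = new HashMap<>();
        private boolean initialStateMarked;

        void set(String name, Object value) {
            attributes.put(name, value);
            if (initialStateMarked) {
                delta.put(name, value); // only tracked once the view has been built
            }
        }

        Object get(String name) {
            return attributes.get(name);
        }

        /** Called once the component has been built from its page template. */
        void markInitialState() {
            initialStateMarked = true;
        }

        /** The per-user state: just the changes, not every attribute. */
        Map<String, Object> saveState() {
            return new HashMap<>(delta);
        }

        /** Applied to a component that was freshly rebuilt from the same template. */
        void restoreState(Map<String, Object> saved) {
            attributes.putAll(saved);
            delta.putAll(saved);
        }
    }

    public static void main(String[] args) {
        Component input = new Component();
        input.set("styleClass", "field"); // comes from the template, never saved
        input.set("required", true);      // comes from the template, never saved
        input.markInitialState();

        input.set("disabled", true);      // changed at runtime, so it must be saved
        System.out.println(input.saveState()); // prints {disabled=true}
    }
}
```

On the next request the component is rebuilt from the template and only the small saved map is applied on top, which is why the stored state stays in the range of bytes to a few KB.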
As in most web applications the database plays an important role in the performance of the site. In case of Java EE there are some concerns when using JPA.
In an ideal object model, an entity often has references to all other entities it's logically associated with. If the object graph that is formed this way is of non-trivial size an ORM can induce a performance issue if one is not careful. Setting all relations to eager loading means you'll fetch almost the entire DB for every little operation which will destroy your performance, but setting everything to lazy will either give you exceptions all the time because data is not loaded, or will cause many additional small queries to be fired, which also destroys performance.
The key, as in plain SQL as well, is to fetch just the right amount of data. JPA has some mechanisms to help with this, like fetch graphs, but these are still not optimal. You can for instance specify per query what you want, but you can't say what you DO NOT want (see https://java.net/jira/browse/JPA_SPEC-96). In practice we also found that the interaction between fetch graphs and caching can be weird or sometimes even downright broken. If an entity was cached without fetching relation X, then when using a fetch graph that says X should be loaded, you still get the entity without X.
So our strategy has been to reduce the number of relations between entities to what is strictly necessary, and occasionally to use JPA DTOs when a subset of data is needed for a rather complex entity. Such DTOs are populated using the constructor selector syntax in JPQL queries and JPA is generally smart enough to generate far more optimal queries then. A limitation is that these are limited to single-valued path expressions (e.g. no collections, see https://java.net/jira/browse/JPA_SPEC-69).
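A constructor expression could look roughly like the sketch below. The Link entity, its id, title and blockId fields, and the LinkSummary DTO are hypothetical names chosen for illustration. The code only builds the JPQL string so that it runs without a container; with JPA available the string would be passed to EntityManager.createQuery(JPQL, LinkSummary.class).

```java
class LinkSummaryQuery {

    // JPQL constructor expressions need the fully qualified class name of the DTO.
    // "Link", "l.id", "l.title" and "l.blockId" refer to a hypothetical entity.
    static final String JPQL =
            "SELECT NEW " + LinkSummary.class.getName() + "(l.id, l.title) "
            + "FROM Link l WHERE l.blockId = :blockId";

    public static void main(String[] args) {
        System.out.println(JPQL);
    }
}

/** Hypothetical DTO; in a real application this would be a public class in its own file. */
class LinkSummary {
    private final Long id;
    private final String title;

    // The persistence provider calls this constructor once per result row,
    // with the selected single-valued expressions as arguments.
    public LinkSummary(Long id, String title) {
        this.id = id;
        this.title = title;
    }

    public Long getId() {
        return id;
    }

    public String getTitle() {
        return title;
    }
}
```

Since only the two selected columns are listed, the provider has no reason to load the rest of the entity or any of its relations.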
Another strategy is to fire a number of queries in parallel. Using the aforementioned @Asynchronous this is rather trivial to do from a backing bean. The returned Future
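The parallel query strategy can be approximated outside a container with a plain ExecutorService. This is a stand-in, not the @Asynchronous mechanism itself: in Java EE the container supplies the threads, whereas here the pool is created by hand, and StatisticsService with its simulated queries is hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ParallelQueriesSketch {

    /** Stand-in for an EJB service whose methods would carry @Asynchronous and return a Future. */
    static class StatisticsService {
        private final ExecutorService executor;

        StatisticsService(ExecutorService executor) {
            this.executor = executor;
        }

        Future<Integer> countLinks(String subject) {
            Callable<Integer> query = () -> slowQuery(42);   // simulated result
            return executor.submit(query);
        }

        Future<Integer> countViews(String subject) {
            Callable<Integer> query = () -> slowQuery(1200); // simulated result
            return executor.submit(query);
        }

        /** Pretends to be a database query that takes 200 ms. */
        private static int slowQuery(int result) {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return result;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            StatisticsService service = new StatisticsService(pool);
            long start = System.nanoTime();

            // Both queries are started before either result is requested.
            Future<Integer> links = service.countLinks("jsf");
            Future<Integer> views = service.countViews("jsf");
            int total = links.get() + views.get();

            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println("total=" + total + ", elapsed ms=" + elapsedMs);
        } finally {
            pool.shutdown();
        }
    }
}
```

The caller blocks only when it asks for a result, so the elapsed time should be closer to the duration of the slowest query than to the sum of both.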
What is your favorite IDE?
With some reservations I'd say Eclipse in combination with JBoss tools. I've been using this for a long time, some 12 years now I think.
I do dislike the fact that Eclipse doesn't really seem to be focused on delivering an IDE anymore, but from the outside seems to be busier with being a platform for anything and an organization for everything. I wish they'd just be an IDE and nothing else, and focus on polishing the IDE experience above all.
Would you choose Java EE again?
Do you have any secret ideas for new Java EE startup?
Despite not being that old yet, zeef.com has reached a level where we feel a lot of the functionality is there. We are therefore somewhat starting to think about our next step. Ideas are always welcome, so I'd like to return the question by saying that if anyone has a great idea they can always contact us. No promises though ;)
Arjan, thank you for the interview! I just "zeefed" Java and found this: https://java.zeef.com/jan.beernink. Enjoy zeefing!
Hi Adam and Arjan,
Congratulations for your fantastic work and thanks for sharing your experience here!
I already knew zeef was built on top of Java EE but not the details.
Does zeef have a mobile port? I saw that the site is responsive, but do you have a native mobile app? If yes, how was it built and what were the challenges?
thanks and congratulations again!
Posted by rmpestano on April 29, 2015 at 02:08 PM CEST #
I felt your pain regarding the JPA part for a long time too, so I decided to build an extension especially designed for the DTO and fetching problem. What I came up with after about a year of development is blaze-persistence (https://github.com/Blazebit/blaze-persistence). Maybe it will ease your pain too ;)
Posted by Christian Beikov on April 29, 2015 at 09:55 PM CEST #
Please ask the Java EE EG to think about application logging in an app-server neutral way.
Logging is important, but it's always a pain to get it working on different app servers.
Posted by Crick on May 02, 2015 at 12:35 AM CEST #
Very useful interview, Adam: thanks to both of you. Arjan's answers were exceptionally detailed. It's illuminating and encouraging to learn how the experts handle these higher-level issues. I don't think enough is written about this, with the exception of your blog, of course.
Posted by David on May 17, 2015 at 06:23 AM CEST #