Recommendation engines help narrow your choices to those that best meet your particular needs. In this post, we’re going to take a closer look at how all the different components of a recommendation engine work together. We’re going to use collaborative filtering on movie ratings data to recommend movies. The key components are a collaborative filtering algorithm in Apache Mahout to build and train a machine learning model, and search technology from Elasticsearch to simplify deployment of the recommender.
After a short hiatus from blogging, I’d like to show you something exciting today. I can’t take the credit for all of the work - the development was originally started by my son Martin, then picked up by my colleague Jaroslav. I’ve really just added a few finishing touches to make the module releasable. So voilà: I present to you the Google Places module! It’s an integration of Magnolia and Google Maps and Places done a little differently from what you might expect.
All the places you’ll ever need
What you typically want when you deploy Magnolia in your organization is to take maximum advantage of its UI. What you typically get when you ask devs to place a map in your website is a fixed size map, perhaps with some text file or direct html access where you can edit a list of markers. This changes with Magnolia: now you have one pretty Magnolia-style app that your editors can use to load, edit, categorize and sort all the markers they use on one or different maps across their site.
Using marker categories
Every marker can be assigned one or more categories. When putting together a map, you then pull in the relevant category or categories. One marker can therefore be used in multiple maps, depending on how many categories it’s relevant in. This allows the editor to organize the markers according to type, but then re-use them on many different maps according to category. To make the sample a little bit more interesting, we’ll deliver the module with ready-made markers showing the locations of Magnolia’s offices and those of all partners around the world. So for example, if you want to assemble a map with all Magnolia partners, just tell the map to draw in all markers categorized as “Partner”. Boom!
If you already have a list of locations saved in a spreadsheet, save it as an Excel file and import it into the app directly.
While you can (and really should) specify exact locations using latitude and longitude for each marker, you can also just leave in the address - the exact position will be retrieved on the fly using the Google Places API. Due to the limits on free usage of the API, you shouldn’t do this for too many markers - or you should buy unlimited access if you need to.
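As an aside, address-to-coordinates lookups of this kind are typically done against Google's Geocoding web service. The sketch below only builds such a request URL - it doesn't reflect how the module is actually implemented, and the address and API key are placeholders:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class GeocodeUrl {

    // Builds a Google Geocoding API request URL for a marker address.
    // The endpoint and parameter names are Google's public Geocoding API;
    // the module itself may resolve addresses differently.
    static String geocodeUrl(String address, String apiKey) {
        return "https://maps.googleapis.com/maps/api/geocode/json"
                + "?address=" + URLEncoder.encode(address, StandardCharsets.UTF_8)
                + "&key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(geocodeUrl("Basel, Switzerland", "YOUR_API_KEY"));
    }
}
```

The JSON response contains a `geometry.location` with the latitude and longitude, which is what gets cached against the marker so the free-usage quota isn't hit on every page view.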
Download, use, contribute
Now for the best part - the module is already released and ready for you to use. Feel free to contribute more functions and improvements to it, if you like it.
Java/Akka-based technology models, each of which models a different technology, are active in a distributed Internet community. Any of the technology models may have a definition of a better future version of itself. A technology model that aspires to improve itself engages in conversations with other models in the community, seeking to discover behaviors exportable from other technology models which it can integrate into itself to achieve its goal.
For the past 19 years, JavaOne has been the biggest and most awesome gathering of Java enthusiasts from around the world. JavaOne 2015 is the 20th edition of this wonderful conference. How many conferences can claim this? :)
Have you been speaking at JavaOne for the past several years? Don't wait, just submit your session today. The sooner you submit, the higher the chances of program committee members voting on it. You know the drill!
I'm excited and honored to co-lead the Java, DevOps, and the Cloud track with Bruno Borges (@brunoborges). The track abstract is:
The evolution of service-related enterprise Java standards has been underway for more than a decade, and in many ways the emergence of cloud computing was almost inevitable. Whether you call your current service-oriented development “cloud” or not, Java offers developers unique value in cloud-related environments such as software as a service (SaaS) and platform as a service (PaaS). The Java Virtual Machine is an ideal deployment environment for new microservice and container application architectures that deploy to cloud infrastructures. And as Java development in the cloud becomes more pervasive, enabling application portability can lead to greater cloud productivity. This track covers the important role Java plays in cloud development, as well as orchestration techniques used to effectively address the service lifecycle of cloud-based applications. Track sessions will cover topics such as SaaS, PaaS, DevOps, continuous delivery, containers, microservices, and other related concepts.
So what exactly are we looking for in this track?
How have you been using PaaS effectively for solving customer issues?
Why is SaaS critical to your business? Are you using IaaS, PaaS, SaaS all together for different parts of your business?
Have you used microservices in a JVM-based application? Lessons from the trenches?
Have you transformed your monolith to a microservice-based architecture?
How are containers helping you reduce impedance mismatch between dev, test, and prod environments?
Building a deployment pipeline using containers, or otherwise
Are PaaS and DevOps complementary? Success stories?
Docker machine, compose, swarm recipes
Mesosphere, Kubernetes, Rocket, Juju, and other clustering frameworks
Have you evaluated different containers and adopted one? Pros and Cons?
Any successful practices around containers, microservices, and DevOps together?
Tools, methodologies, case studies, lessons learned in any of these, and other related areas
How are you moving legacy applications to the Cloud?
Are you using private clouds? Hybrid clouds? What are the pros/cons? Successful case studies, lessons learned.
These are only some of the suggested topics, and we are looking forward to your creative imagination. Remember, there are a variety of formats for submission:
60-minute session or panel
Two-hour tutorial or hands-on lab
45-minute BoF
5-minute Ignite talk
We think this is going to be the coolest track of the conference, with speakers eager to share everything about all the bleeding edge technologies and attendees equally eager to listen and learn from them. We'd like to challenge all of you to submit your best session, and make our job extremely hard!
After months of preparation, it all came down to three days of intense execution, and I was just one speaker. I can only marvel at the logistical acumen on display from the JavaLand and DOAG team. I had an action-packed agenda: two conference sessions, two Early Adopter's Area (EAA) sessions, and one training day session. Thrown into the mix were a couple of 1:1 consulting sessions and a vJUG/NightHacking session. I especially hope the conference attendees enjoyed the Early Adopter's Area, capably coordinated by Andreas Badelt. Because of the high level of activity on my personal agenda, I was not able to attend as many sessions as I would have liked. In any case, this blog entry is my place to share my overall impressions of the conference, and of the sessions I did get a chance to attend.
Right off the bat, I want to tip my cap to Marcus Lagergren for remaining calm in the face of some AV problems. Even with all that, and the 45-minute session duration, Marcus managed to give a compelling whirlwind tour of his personal experience with Java from the beginning. More photos like the one on the right are available from Stefan Hildebrandt's flickr photo stream. I think there is a lot more room in the "20 years of Java" meme, however, and I applaud Marcus for wisely not attempting to speak for all of it, drawing instead from his own experiences. That's one great thing about the #java20 meme: everyone has their own story. Maybe at JavaOne 2015 they will have some sort of Story Corps type thing where people can give their stories. Come to think of it, if someone wants to build Story Corps as a Service (SCAAS), perhaps they can sell it to Oracle for use at the show.
Shortly after Marcus's session, I presented with my good friend Oliver Szymanzki a 45 minute capsule of our full day training session about Java EE 7 from an HTML5 Perspective. It was tough to make a meaningful abstraction from a full day session to just 45 minutes, but I hope at least people could take something useful away from it.
Then came my first exposure to the EAA, which was my only chance to present JSF content here at JavaLand. I gave a quick presentation and had an informal meeting with several JSF EG members who were at JavaLand. We covered f:socket, multi-component validation, and URL mapping.
The evening community event was really not to be missed. If you ever have a chance to attend JavaLand, I really recommend you participate.
I started out the day by presenting a modified version of my DevNexus session about Servlet 4.0 and HTTP/2. I basically dropped the demo and moved the Java SE 9 content to an EAA session in order to fit into the 45 minute window.
Following my session, I was able to enjoy Mark Little's keynote about Java Enterprise and the Internet of Things. This session presented some hard-won truths about problems we have solved in Java as cautionary tales for newer stacks that seem intent on re-inventing wheels rather than standing on the shoulders of others. I must admit it was a feel-good session, but still realistic and largely Kool-Aid free.
Running back to the EAA, I presented the exciting work being done by Michael McMahon to bring a new Http Client to Java SE 9, including HTTP/2 support. I can't post the slides, but I'm sure we'll have something on this at JavaOne.
My last engagement of the conference proper was to participate in a joint vJUG/NightHacking session regarding Adopt-a-JSR. This was lots of fun, and I thank Stephen Chin and Simon Maple for providing a vehicle for it.
As a nice wind-down from the conference, and a bit of chill before the training day, I was invited by DOAG boss Fried Saacke to attend the 5 year celebration dinner for Java Aktuell magazine. I didn't know it at the time, but the invitation included an opportunity to give a short speech, in German, on the importance of JCP to the Java Community. I hope I didn't mangle my words too badly.
After being blessed with many years of German conference opportunities from which I invariably bring home lots of chocolate, I felt it was time to give the Germans a taste of American-style sweets along with their pre-loaded VM USB sticks. These Tasty Kakes are a specialty of my home town of Philadelphia, and each attendee of the session Java EE aus einer HTML5-Perspektive received some, along with a full day of instruction and a USB stick with a VM containing the workshop materials.
In summary, JavaLand has lots to recommend it. Come for the content, stay for the fun.
This article is a much-shortened version of the rambling Local CA article in this series.
CA-signed certificates are used for public server authentication. In this article, we consider a private network of devices connecting to our servers i.e. via client-authenticated SSL. In this case devices have client certificates which we explicitly trust (and none other).
Let's set up a private CA to sign certificates with our own CA key. We will sign our server certificates, and consider signing client certificates as well.
What is a digital certificate?
Firstly we note that public key cryptography requires a private and public key. These are mathematically linked, such that the public key is used to encrypt data which can be decrypted only using the private key. Moreover, the private key can create a digital signature, which is verified using the public key.
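The two properties above can be demonstrated directly with the JDK's java.security API. This is a minimal sketch (plain RSA keys and SHA-256 signatures, no certificates involved yet):

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

public class SignVerify {

    // Signs a message with the private key, then verifies the signature
    // with the mathematically linked public key.
    static boolean signAndVerify(String message) throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair pair = gen.generateKeyPair();

        byte[] data = message.getBytes(StandardCharsets.UTF_8);

        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(pair.getPrivate()); // only the private key can sign
        signer.update(data);
        byte[] signature = signer.sign();

        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(pair.getPublic()); // anyone with the public key can verify
        verifier.update(data);
        return verifier.verify(signature);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(signAndVerify("hello"));
    }
}
```

A certificate, discussed next, is essentially a way of distributing such a public key together with a verified statement of whom it belongs to.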
A public key certificate (also known as a digital certificate or identity certificate) is an electronic document used to prove ownership of a public key. The certificate includes information about the key, information about its owner's identity, and the digital signature of an entity that has verified the certificate's contents are correct. If the signature is valid, and the person examining the certificate trusts the signer, then they know they can use that key to communicate with its owner.
So a certificate is a document which contains a public key, and its related information such as its "subject." This is the name assigned by its creator, who is the sole holder of the corresponding private key.
This document is digitally signed by the "issuer." If we trust the issuer then we can use the public key to communicate to the subject. Cryptographically speaking, we can use the public key to encrypt information, which can be decrypted only by the holder of the corresponding private key.
Finally, X.509 is a standard for Public Key Infrastructure (PKI) that specifies formats for public key certificates, revocation lists, etc.
Root CA certificate
By definition, a root certificate (e.g. a CA certificate) is self-signed, and so has the same "Issuer" and "Subject." For example, inspecting a GoDaddy root certificate with keytool -printcert shows identical Issuer and Subject fields.
A self-signed certificate is a public key and its subject name, which is digitally signed using its corresponding private key. We can verify its signature using the public key, but have no other inherent assurances about its authenticity. We trust it explicitly via a "truststore."
Keystore vs truststore
A "keystore" contains a private key, which has a public key certificate. Additionally the keystore must contain the certificate chain of that key certificate, through to its root certificate (which is self-signed by definition).
A "truststore" contains peer or CA certificates which we trust. By definition we trust any peer certificate chain which includes any certificate which is in our truststore. That is to say, if our truststore contains a CA certificate, then we trust all certificates issued by that CA.
Note that since the keystore must contain the certificate chain of the key certificate, whereas the truststore must not contain the certificate chain of included trusted certificates, they differ critically in this respect.
Client certificate management
In order to review active credentials, we require a perfect record of all issued certificates. If a certificate is signed but not recorded, or its record is deleted, our server is forever vulnerable to that "rogue" certificate.
We could record our signed certificates into a keystore file using keytool -importcert, where this file is not a truststore per se, but just a "database" of issued certificates.
Interestingly, we consider signing our client certificates to avoid having such a truststore containing all our clients' self-signed certificates, but nevertheless end up with one - which is telling.
We could similarly record revoked client certificates. However for private networks where the number of certificates is relatively small, it is simpler and more secure to trust clients explicitly, rather than implicitly trusting all client certificates signed by our CA, and managing a revocation list.
If the number of clients is large, then probably we need to automate enrollment, which is addressed in the companion article Client Authentication in this series, which proposes a dynamic SQL truststore for client certificates.
Alternatively we might use a client certificate authentication server, e.g. see my experimental Node microservice github.com/evanx/certserver - which uses Redis to store certificates and their revocation list.
Self-signed client certificates
We prefer self-signed client certificates which are explicitly imported into our server truststore, where they can be reviewed. In this case, they are "revoked" by removing them from the truststore.
However, self-signed client keys are effectively CA keys, and so rogue certificates can be created using compromised client keys, e.g. using keytool -gencert. So we implement a custom TrustManager for our server - see the Explicit Trust Manager article in this series.
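A minimal sketch of what such an explicit trust manager could look like is shown below. The class name and details are illustrative, not necessarily the implementation from the referenced article; a production version would also check validity dates and possibly chain in the platform's default checks:

```java
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import java.util.Set;
import javax.net.ssl.X509TrustManager;

// Trusts only certificates that are explicitly present in our truststore set,
// disregarding the signing chain - so a rogue certificate minted with a
// compromised client key is still rejected.
public class ExplicitTrustManager implements X509TrustManager {

    private final Set<X509Certificate> trusted;

    public ExplicitTrustManager(Set<X509Certificate> trusted) {
        this.trusted = trusted;
    }

    @Override
    public void checkClientTrusted(X509Certificate[] chain, String authType)
            throws CertificateException {
        // Only the peer's own certificate (chain[0]) matters here.
        if (chain == null || chain.length == 0 || !trusted.contains(chain[0])) {
            throw new CertificateException("peer certificate is not explicitly trusted");
        }
    }

    @Override
    public void checkServerTrusted(X509Certificate[] chain, String authType)
            throws CertificateException {
        checkClientTrusted(chain, authType);
    }

    @Override
    public X509Certificate[] getAcceptedIssuers() {
        // Simplification: we advertise no CA issuers to the peer.
        return new X509Certificate[0];
    }
}
```

Such a trust manager is wired into the server via SSLContext.init() in place of the default PKIX trust manager.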
Consider that we must detect when our server has been compromised, and then generate a new server key. If using a self-signed server certificate, then we must update every client's truststore. In order to avoid such a burden, our server certificate must be signed using a CA key which our clients trust.
However, our clients must trust only our private server, and not, for example, any server with a GoDaddy certificate. So we generate a private CA key. This key controls access to our server.
While our server naturally resides in a DMZ accessible to the Internet, its CA key should be isolated on a secure internal machine. In fact, it should be generated offline, where it can never be compromised (except by physical access). We transfer the "Certificate Signing Request" (CSR) to the offline CA computer, and return its signed certificate e.g. using a USB stick.
In the event that our server is compromised, we generate a new server key, and sign it using our offline CA key. Our clients are unaffected, since they trust our CA, and thereby our new server key. However our clients must no longer trust the old compromised server key, as it could be used to perpetrate a man-in-the-middle (MITM) attack.
So we must support certificate revocation. For example, we could publish a certificate revocation list to our clients, or provide a revocation query service, e.g. an OCSP responder.
Alternatively, we could publish the server certificate that our clients should explicitly trust. Before connecting, our clients read this certificate, verify that it is signed by our CA, and establish it as their explicit truststore for the purposes of connecting to our server. In general, it is better to be explicit rather than implicit, to have clarity. Explicit trust enables a comprehensive review of active credentials.
We consider a scenario where the above "revocation" service and our server both suffer a simultaneous coordinated MITM attack. Generally speaking, our architecture should make such an attack expensive and detectable. Our revocation service should be divorced from our server infrastructure at least, to make it more challenging.
An approach to avoid managing revocation ourselves is to use a public CA-signed certificate for our server, cross-signed by our private CA. In this case, the standard SSL trust manager would validate the certificate, e.g. via OCSP to GoDaddy. Our clients then chain the standard trust manager to an explicit trust manager, to verify that the server certificate is cross-signed by our private CA.
Server certificate signing
We create a keystore containing a private key and its self-signed certificate (for starters) using keytool -genkeypair.
Naturally the common name of a server certificate is its domain name. The client, e.g. the browser, validates that the certificate's "Common Name" matches the host name used to look up its IP address.
We export a "Certificate Signing Request" (CSR) using -certreq.
where we set the X509v3 extensions to restrict the key usage for good measure, as we see for certificates we buy from a public CA.
We import this signed certificate reply into our server keystore. But keytool will not allow a signed certificate to be imported unless its parent certificate chain is already present in the keystore. So we must import our CA cert first.
$ keytool -keystore server.jks -alias ca -importcert -file ca.pem
$ keytool -keystore server.jks -alias server -importcert -file server.signed.pem
This demonstrates why the keystore requires a certificate chain, i.e. to send to the peer for validation. The peer validates the chain, and checks it against our trusted certificates. It stops checking as soon as it encounters a certificate in the chain that it trusts. Therefore the chain for a trusted certificate need not be stored in the truststore, and actually must not be - otherwise we trust any certificate issued by that trusted certificate's root, irrespective of the trusted certificate itself.
Consider that our clients must trust only our server, whose certificate happens to be issued by GoDaddy - we don't want those private clients to trust any server with a certificate issued by GoDaddy!
We create the private keystore on each of our clients.
Public CA certificates are typically used for public server authentication. However, we are primarily concerned with private client authentication for access to a private server, i.e. a virtual private network.
Our clients should trust only our server, and not any server certificate issued by some public CA. We sign the server certificate using an offline CA key which our clients solely trust. When our server is compromised, we can change our server key without changing our clients' truststores. However, we must somehow invalidate the old server certificate. We might publish our current server certificate that our clients load first from multiple sources that we control. We check their consistency, and thereby combat a MITM attack. Our clients can then connect to our server, and verify that it has the certificate we expect, and that it is signed by our offline CA key.
We prefer self-signed client certificates, which are explicitly trusted. However, we note that self-signed certificates are effectively CA certificates, and so a compromised private key can be used to create rogue certificates. So we should implement a custom "explicit trust manager" to ensure that the peer's key certificate itself is explicitly included in the truststore, i.e. disregarding its chain of signing certificates.
Exactly two years ago, I wrote a blog on Introducing Kids to Java Programming using Minecraft. Since then, Devoxx4Kids has delivered numerous Minecraft Modding workshops all around the world. The workshop material is all publicly accessible at bit.ly/d4k-minecraft. In these workshops, we teach attendees, typically 8 - 16 years of age, how to create Minecraft Mods. Given the excitement around Minecraft in this age range, these workshops are typically sold out very quickly.
One of the parents from our workshops in the San Francisco Bay Area asked us to deliver an 8-week course on Minecraft modding at their local public school. As an athlete, I'm always looking for new challenges and ways to break the rhythm. This felt like a good option, and so the game was on!
My son has been playing the game, and modding it, for quite some time, and helped me create the mods easily. We've also finished authoring our upcoming O'Reilly book on Minecraft modding using Forge, so we had a decent idea of what needed to be done for these workshops.
Most of the kids in this 8-week course had no prior programming experience, and it was amazing to see them able to read Java code by week 7. Some kids who had prior experience finished the workshop in the first 3-4 weeks, and were helping other kids.
Check out some of the pictures from the 8-week workshops:
Many thanks to the attendees, parents, volunteers, Parent Teacher Association, and school authorities for giving us a chance. The real benchmark was when all the kids raised their hands to continue the workshop for another 8 weeks ... that was awesome!
Is Java too difficult as a kid's first programming language?
One of the common questions asked during these workshops is "Isn't Java too difficult a language to start with?". Most of the time, these questions are not based on any personal experience, but more along the lines of my-friend-told-me-so or I-read-an-article-saying-so. My typical answer consists of the following parts:
Yes, Java is a bit verbose, but it was designed to be readable by humans and computers. Ask somebody to read Scala or Clojure code at this age and they'll probably never come back to programming again. These languages serve a niche purpose, and their concepts are getting integrated into the mainstream language anyway.
Ruby, Groovy, and Python are decent alternative languages to start with. But do you really want to start teaching fundamental programming using Hello World?
Kids are already "addicted" to Minecraft. The game is written in Java, and modding can be done in Java. Let's leverage that addiction and convert it into a passion for programming. Minecraft provides a perfect platform for gamification of the programming experience at this early age.
There are 9 million Java developers. It is a very well adopted and understood language, with lots of help in terms of books, articles, blogs, videos, tools, etc. And the language has been around for almost 20 years now. Other languages come and go, but this is one that is here to stay!
As Alan Kay said
The best way to predict the future is to create it
Let's create some young Java developers by teaching them Minecraft modding. This will give kids bragging rights among their friends, give parents the satisfaction that their kids are learning a top-notch programming language, and give the industry some budding Java developers.
I dare you to pick up this workshop and run it in your local school :)
If you are in the San Francisco Bay Area, then register for one of our upcoming workshops at meetup.com/Devoxx4Kids-BayArea/. There are several chapters in the USA (Denver, Atlanta, Seattle, Chicago, and others).
Would your school be interested in hosting a similar workshop? Devoxx4Kids can provide a train-the-trainer workshop. Let us know by sending an email to firstname.lastname@example.org.
Being a registered NPO and 501(c)(3) organization in the US allows us to deliver these workshops quite selflessly, fueled by our passion to teach kids. But donations are always welcome :)
I thought some of you might be interested in hearing about Java and the Java dev team at a startup that's grown beyond the initial stage. Nexmo is a four-year-old startup headquartered in San Francisco, with the engineering team based out of TechHub London, and is already one of the world's largest cloud communications companies (cloud communications gives any application the ability to communicate with people - e.g. sending a PIN code or any other message via SMS to a phone, or setting up a phone menu or a callback button).
In terms of technology, like any startup, we're very flexible about what's in use. Older proven tech like Jetty, Trove collections, and lots of Apache Commons modules sits side by side with more recently created tech like OpenHFT collections, MongoDB, and Hazelcast. The core system is capable of massive throughput, architected around a queue-and-forward set of Java microservices which allows essentially unlimited horizontal scaling while keeping latency relatively low. Overall latency for an SMS message tends to be in seconds because of the carrier hop to the end device, but minimizing the additional latency we add is important, and our architecture keeps this down to a few milliseconds per message regardless of throughput. Voice technology is mature, and low-level communications is best offloaded to dedicated, mature server technology - like any sensible company, we prefer to integrate already existing successful technology rather than build our own.
Having moved past the early startup phase, we emphasize good solid design patterns, simplicity and good engineering. Internally, the components are already highly asynchronous but quite stable; a great deal of our interactions, both upstream and downstream with clients and suppliers, require the use of asynchronous protocols operating highly concurrently. Our next challenges are similar to those of many tech companies: handling enormous amounts of data; how we respond to the Internet of Things (highly relevant to a comms company); how we integrate with chat apps; where WebRTC comes into our product mix.
The culture is very typical "startup": breakouts for table tennis sessions, fresh fruit and various soft drinks constantly available, a relaxed fun atmosphere. The software development team of 15 (and growing) is enormously varied: we have every experience level from recent graduate to 20-year Java veteran; many ethnicities and nine nationalities (mostly various European); 40% of the team are women; and we include one Java Champion. As someone who had previously spent over a decade in investment banks, it's a massive breath of fresh air; I find it fantastically free and convivial in comparison.
I hope that gives you a flavour of Java at a next stage startup.
Now that both JSRs are in full swing, I am going to offer you all a bit of a different perspective on the two technologies.
As I have stated before, I view them both as complementary to each other!
I want to talk a bit about the actual work of doing the JSRs themselves.
As part of the JSR we deliver a reference implementation, but does the work really stop there? No, it surely does not. For JSF we have had years of work after the completion of each of its JSRs. So one part is working on a new JSR cycle. But in reality the buck does not stop there. I am talking about the nitty-gritty of maintenance!
I have now been involved in maintaining the Oracle implementation of JSF, named Mojarra, since December 2011. What have I learned? Maintaining a piece of software that is backed by a specification is HARD. That by no means makes it boring, and it is certainly not unchallenging. Quite the contrary: because we have to deliver fixes that stay within the confines of the specification, it is at times quite challenging.
Now offset this against the work that we are currently doing on the MVC specification. Is the MVC specification HARD? Yes, it is too! Weird, huh? You would think writing a specification from scratch is easy, as we have a clean slate. Well, because I have been involved in maintaining Mojarra, whenever I look at the features we might or might not include in Ozark (the MVC reference implementation), one of the questions I ask myself is "Is there a potential for a lot of maintenance on this feature?". E.g., in Ozark we have an SPI so people can plug in new ViewEngines, and we have had external contributors delivering some ViewEngines (a BIG thanks goes out to them). The question came up whether or not we should include them in the reference implementation. Since we simply cannot support them all, we opted to make the contributed ViewEngines community-supported extensions and to keep JSP and Facelets as the two ViewEngines officially supported by Ozark. Why? Well, both of those are also EE specifications!
Anyway, when you think about the Java EE process and you wonder why sometimes things seem to go a bit slow, think about how long this software sticks around, and that it has to meet the bar of TCK testing for every patch, bug fix or enhancement.
I hope you enjoyed a look at this perspective. Note of course this is MY perspective on things ;)