eZ Publish - User account limits and solutions
While working on one of my projects with eZ Systems French consultant Jérôme Cohonner, we had an excellent conversation about how users are handled in eZ Publish and how this can sometimes lead to trouble. This post will give you some clues about how important user management can be, what the limits are, and some common solutions for doing things the best way. I will not talk about SSO or procurement systems, either because I have already dealt with them or because they are out of scope.
User management, some concepts
Let's go back to basics and look at the main concepts of AAA:
- Authentication: someone proves that they are who they claim to be, by any kind of proof: password, certificate, token, fingerprint...
- Authorization: someone has been given rights, credentials or permissions to do something.
- Accounting: someone's activities can be observed, monitored and audited to produce data that can be exploited afterwards.
Generally, in IT systems, these concepts are implemented in different ways, together or separately, merged with other systems or not. When an IT solution becomes complex, you need a strong user management strategy to be sure that everything works together. The strategy is defined by combining different approaches, which can be listed like this:
- Authentication: handle the means of authentication, from simple to complex, available everywhere in the system
- Data: handle user data and make it available everywhere in the system
- Organization: handle an organization of people, available everywhere in the system
- Access: handle rights, what people are allowed to do
Moreover, all these approaches are subject to the centralization dilemma: do we need to centralize all these things in one service or not? If one of these approaches is not centralized, can all the software in our solution still do the job?
On every IT project, choices are made, sometimes depending on the software's capabilities, sometimes not. The most important thing is to know where you want the most flexibility.
eZ Publish and the limits
In eZ Publish, users are stored and treated as content objects, which is a choice in itself. It means that the emphasis has been put on data management above everything else. The cool thing is that you can handle your users as pages (since they are nodes) and add or remove attributes as you want. Better still, the data can be versioned. The only thing you have to do is make sure that the user content class has the User account datatype.
You can also plug in the LDAP Login Handler to access a remote directory. The mechanism is quite good. At authentication, the user provides a login and password. eZ Publish tries to log this user in against the LDAP and, if it succeeds, creates an eZ user if it does not exist yet or updates it if it does. The user is then authenticated and receives eZ Publish credentials.
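To make the login flow described above more concrete, here is a minimal sketch in Python. It is not the code of the LDAP Login Handler: `FakeDirectory`, `LocalUserStore` and `login` are hypothetical names invented for the illustration, the directory is a plain in-memory dictionary, and passwords are compared in clear text only to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDirectory:
    """Hypothetical stand-in for an LDAP server: login -> (password, attributes)."""
    entries: dict = field(default_factory=dict)

    def bind(self, login_name, password):
        """Return a copy of the directory attributes if the credentials match, else None."""
        entry = self.entries.get(login_name)
        if entry is None or entry[0] != password:
            return None
        return dict(entry[1])


@dataclass
class LocalUserStore:
    """Hypothetical stand-in for the local (CMS-side) user objects."""
    users: dict = field(default_factory=dict)

    def create_or_update(self, login_name, attributes):
        """Create the local user on first login, refresh its data on later logins."""
        created = login_name not in self.users
        self.users.setdefault(login_name, {}).update(attributes)
        return created


def login(directory, store, login_name, password):
    """Authenticate against the directory first, then mirror the user locally."""
    attributes = directory.bind(login_name, password)
    if attributes is None:
        return None  # authentication failed: nothing is created locally
    store.create_or_update(login_name, attributes)
    return store.users[login_name]
```

The point of the sketch is the order of operations: the remote directory decides whether the login succeeds, and the local user only exists as a mirror that is created or refreshed afterwards.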
It's also possible to use the multi-location mechanism to get some flexibility in role assignment. For example, as you can place a user in several groups, you can give each group a different role, so multi-located users inherit the roles of all their parents. You can look at one of my former posts about content design, which explains how to organize your content in eZ Publish.
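The role inheritance of a multi-located user can be pictured as a simple union of sets. The sketch below is only an illustration in Python: the function name `effective_roles` and the group and role names are assumptions, and it only looks at direct parent groups.

```python
def effective_roles(user_groups, group_roles):
    """Return the union of the roles assigned to every group the user is located in."""
    roles = set()
    for group in user_groups:
        # A group without any assigned role simply contributes nothing.
        roles |= set(group_roles.get(group, ()))
    return roles


# Hypothetical setup: each group carries a different role.
group_roles = {
    "Editors": {"Editor"},
    "Members": {"Member"},
    "Partners": {"Partner"},
}

# A user located under both "Editors" and "Members" gets both roles.
print(sorted(effective_roles(["Editors", "Members"], group_roles)))
```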
The limits of this model are:
- Data and user account are in the same place, and the data container is not efficient when there are a lot of users.
- If user data has to be shared, the size and number of data items really matter, as the data has to be managed locally or remotely.
- The remote model implies a direction in the way data is managed. Data needs a reference that is unique at any given time and to which all other software must refer. It also implies that you need a simple model in your directory, as rights must be managed locally.
Some examples:
- Try a user class with 50 class attributes, which is possible if you store all the information about your users in the same place. Then create 100 000 users, which is quite normal for a big website. Requests made against the generic model of eZ Publish are just too slow for standard fetches. Having a lot of attributes on a user is quite common and results from very strong business or technical needs. For example, you may need an attribute in order to avoid using a directory. This has been explained in one of my former posts about content design.
- A very big LDAP with a lot of users and a lot of attributes can take a long time to synchronize with regular scripts.
- The LDAP Login Handler is very powerful but a bit tricky to master. If you have a complex LDAP, your LDAP configuration will be crazy. Moreover, there's no script to push user updates made in eZ Publish back to the LDAP.
eZ Systems is refactoring eZ Publish's model over time, so that little by little everything is split up and made highly efficient.
Solutions to common issues
Case 1 - It's too late: there's too much data in the user!
It can cause performance trouble when you reach a high number of users. The main issue is that user data lives in eZ Publish and not elsewhere. The first point is to know whether you need some customization or whether you just need the user to be logged in to access some private area.
Solution 1.1 - Store it elsewhere
Make a datatype or extend the eZ User type so that eZ Publish only manages what it needs to authenticate the user, I mean the eZ User Account. OK, it's cool to have users as pages, but in real life customers don't really want a picture directory of all the members of the website. It's not done today in eZ, but the global approach of un-content-ization has begun. More recently, the eZ Comments extension provides comments for everything in eZ Publish but does not rely on the classic content mechanism.
The datatype approach may not be the right one. You may expect some trouble, depending on your storage mechanism. I was thinking about several containers, like LDAP (of course), a custom SQL table, or even a file (XML or whatever).
Solution 1.2 - You don't (really) need that
Sometimes trouble comes from a bad interpretation of the customer's needs. Sometimes people don't want the data to be held by the system; it's just a helper for them that saves them from getting the information elsewhere. Sometimes people, I mean the main population of the website, don't know that data about them is loaded in the system. The point is that, in the end, you don't need the data: you can do without it.
An example: your customer tells you that their eZ Publish instance holds 400K users with a lot of data. This data is not shown on the front end because it's an institutional website with a poor logged-in section. The data is shown in the back office to the webmaster in the user section (so for 1 guy).
One good approach is the following: ask why the customer needs all this data and try to figure out whether the data is stored elsewhere and whether it can be accessed in an asynchronous way, by querying an LDAP or another external data source.
The solution that will work is to set up one meta user per big business role you are using. For example, define an HR user, an IT user, a Board user and so on. When people log in, check access against the source and then log the user in as the predefined eZ meta user. You will get a severe reduction of your member count: 400K to 4!
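Here is a rough sketch of the meta user idea in Python. Everything in it is an assumption made for the illustration: the external source is a plain dictionary, the role keys and meta user names are invented, and the fourth meta user (a default one for everybody else) is only my guess at what the "and so on" could be.

```python
# Hypothetical mapping: one predefined local account per big business role.
META_USERS = {
    "hr": "meta_hr",
    "it": "meta_it",
    "board": "meta_board",
}
# Assumed fallback account for anyone without a specific business role.
DEFAULT_META_USER = "meta_member"


def resolve_meta_user(source, login_name, password):
    """Check credentials against the external source, then pick the meta user.

    Returns the name of the local meta user to log in as, or None when the
    external source rejects the credentials.
    """
    record = source.get(login_name)
    if record is None or record["password"] != password:
        return None
    return META_USERS.get(record["role"], DEFAULT_META_USER)


def local_account_count():
    """Number of local accounts needed, whatever the size of the external source."""
    return len(set(META_USERS.values()) | {DEFAULT_META_USER})
```

With this kind of mapping, the external source can hold 400K people while the local system only ever knows a handful of accounts. The trade-off is that per-person data and per-person auditing are no longer available locally.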
Case 2 - It's not too late: how to share user data?
The second point is how you share users between the different applications of your IT system. We can think of it in two aspects: sharing data (first name, last name and so on) and authenticating people. These are different and can be implemented in different ways.
Solution 2.1 - Define a reference
The most important thing in your IT system is to define an architecture block that handles a centralized reference for users, both for the data and for the authentication. From a strict architecture point of view, directories can provide both features, and that's not good. However, as the password is generally the cheapest and easiest way to authenticate people, architects do not recommend two services and prefer to have only one layer for this.
So, the most important thing is to use an external service to hold the data and the password mechanism.
Solution 2.2 - Purely share user data
Sometimes it's a bit difficult to decide how to allocate your data between the remote side (I mean the reference user data) and the local side (data from your application). At this point, you have three choices, each with drawbacks and advantages:
- Full local: so why set up a remote reference? :-)
- Full remote: all your data is held in a remote system and you need to query it each time you want a piece of information.
- Half remote - half local: data is stored on the remote side and some of it is synchronized (or not) with the local side.
As your system needs some consistency, you can choose to centralize everything at the remote reference, but this implies a bottleneck. Moreover, you will need a fine strategy to synchronize remote and local and to replicate data from one to the other.
Questions to ask yourself:
- Does all data have to be in the reference? Can the business data that I manage in my application be shared with others?
- What is important, performance or consistency? Do I have to store all data inside the directory? How do I synchronize all this? What if there are updates on the local side?
For eZ Publish, it's quite simple, as the User mechanism is not so efficient with a lot of users. So the best way, if possible, is to:
- simplify the user attributes down to the minimal user account datatype attribute,
- store everything that can be shared in the directory,
- store all other attributes in another place (for example a custom datatype that writes a list of fields to a table).
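The three-way split above can be sketched as follows in Python. This is only an illustration under assumptions: the field names, the `user_extra` table and the helper functions are all invented, and SQLite stands in for whatever custom table you would actually use.

```python
import sqlite3

# Assumed field lists, to be adapted to your own user class.
ACCOUNT_FIELDS = {"login", "email", "password_hash"}  # stays in the user account
SHARED_FIELDS = {"first_name", "last_name", "phone"}  # goes to the directory


def split_profile(profile):
    """Split a flat profile into (account, shared, extra) dictionaries."""
    account = {k: v for k, v in profile.items() if k in ACCOUNT_FIELDS}
    shared = {k: v for k, v in profile.items() if k in SHARED_FIELDS}
    known = ACCOUNT_FIELDS | SHARED_FIELDS
    extra = {k: v for k, v in profile.items() if k not in known}
    return account, shared, extra


def save_extra(conn, login, extra):
    """Write the remaining fields as (login, name, value) rows in a side table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_extra ("
        "login TEXT, name TEXT, value TEXT, PRIMARY KEY (login, name))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO user_extra (login, name, value) VALUES (?, ?, ?)",
        [(login, name, str(value)) for name, value in extra.items()],
    )
    conn.commit()


def load_extra(conn, login):
    """Read back the side-table fields of one user as a dictionary of strings."""
    rows = conn.execute(
        "SELECT name, value FROM user_extra WHERE login = ?", (login,)
    ).fetchall()
    return dict(rows)
```

Storing the extra fields as name/value rows keeps the side table independent of the number of attributes, at the price of losing column types (every value comes back as text).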
Conclusion
User management is not so easy: there are a lot of things to think about, and some merged concepts make it difficult to decide how to manage the users in an IT system. This is a common issue shared by companies all over the world, and it leads to interesting solutions like OAuth or OpenID.
Regarding eZ Publish, the difficulty comes from the technical lock imposed by the eZ User Account type, which forces you to have a user data instance in eZ Publish. My recommendation is to quickly move the user account (login, email and password) out of the user class. This will lead to the separation of data and authentication, and it will then be possible to authenticate someone without any data inside eZ Publish. Then, if we actually need a node, we can set up a synchronization or a post-account-creation trigger to generate the node. This mechanism must be possible to disable.
