This article was originally posted on the
Oracle Fusion Blog, Feb 24, 2015.
Last week, I had a question about SCIM's (System for Cross-domain Identity Management) approach to schema. How does the working group recommend handling message validation? Doesn't SCIM have a formal schema?
To answer that question, I first had to recognize that it was about a different style of schema than SCIM supports. The question assumed that “schema” is defined the way XML defines schema: as a way to validate documents.
Rather than focus on validation, SCIM’s model for schema is closer to what one would describe as a database schema, much like that of many identity management directory systems of the past. Yet SCIM isn't necessarily a new web protocol to access a directory server. It is also for web applications, to enable easy provisioning. The SCIM schema model is "behavioural" - it defines the attributes and associated attribute qualities a particular server supports. Do clients need to discover schema? Generally speaking, they do not. Let’s take a closer look at schema in general and how SCIM’s approach supports cross-domain schema issues.
Many Definitions of Schema and Many Schema Practices
Looking at the definition in Wikipedia,
schema is a very broadly defined term. It can define a software interface, a document format (such as XML Schema), a database, a protocol, or even a template. There is even a new JSON proposal called
JSON Schema. This too is very different from XML Schema. It has some elements that describe data objects, but JSON Schema focuses a lot more on defining a service and more closely resembles another schema format:
WADL.
With XML schema, the bias seems to be about “enforcement” and “validation” of documents or messages. Yet, for many years, the REST/JSON community has been proud of resisting formalizing “schema”. Maybe it just hasn't happened yet. This does appear to be an old debate with two camps, one claiming the key to interoperability is strict definition and validation, the other claiming it is flexibility, also known as “robustness” or Jon Postel’s law [from
RFC 793]:
“Be conservative in what you do, be liberal in what you accept from others.”
Twelve years ago or so, Aaron Swartz blogged "Postel's law has no exceptions!". I found Tim Bray’s post from 2004, "On Postel, Again", to be enlightening. So, what is the right approach for SCIM?
The Identity Use Case
How does SCIM balance "robustness" against "verifiability" to achieve interoperability in a practical and secure sense? Consider that:
- There is often a cross-domain governance requirement by client enterprises that information be reasonably accurate and up-to-date across domains.
- Because the mix of applications and users in each domain is different, the schema in one domain will never be exactly the same as in another domain.
- Different domains may have different authentication methods and data to support those methods, and may even support federated authentication from another domain.
- A domain or application that respects privacy tends to keep and use only the information it has a legitimate need for, rather than just a standard set of attributes.
- An identifier that is unique in one domain may not be unique in another. Each domain may need to generate its own local identifier(s) for a user.
- A domain may have value-added attributes that other domains may or may not be interested in.
SCIM’s Approach
SCIM’s approach is to allow a certain amount of “specified” robustness that enables each domain to accept what it needs, while providing some level of assurance that information is being exchanged properly. This means that a service provider is free to drop attributes it doesn't care about when being provisioned from another domain, while the client can be assured that the service provider has accepted its provisioning request. Another example is a simple user-interface requirement where a client retrieves a record, changes an attribute, and puts it back. In this case, the SCIM service provider sorts out whether some attributes are to be ignored because they are read-only, and updates the modifiable attributes. The client is not required to ask what data is modifiable and what isn’t. This isn't a general free-for-all in which the server can do whatever it wants. Instead, the SCIM specifications state how this robust behaviour is to work.
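To make the retrieve, change, and put-back example concrete, here is a minimal sketch of how a service provider might sort out an incoming replacement. The attribute table, the mutability labels, and the function name `accept_replacement` are assumptions made for illustration; they are not taken from the SCIM specifications, and the sketch simplifies by leaving attributes the client omitted unchanged.

```python
# Hypothetical attribute table for one service provider. The names and
# mutability labels are illustrative assumptions, not specification text.
SUPPORTED_ATTRIBUTES = {
    "id": "readOnly",
    "userName": "readWrite",
    "displayName": "readWrite",
    "meta": "readOnly",
}


def accept_replacement(stored, requested):
    """Return the new stored state after a client puts a record back.

    Unknown attributes are dropped and read-only attributes are ignored,
    so the client does not have to ask first what is modifiable.
    """
    updated = dict(stored)  # work on a copy; the caller's record is untouched
    for name, value in requested.items():
        mutability = SUPPORTED_ATTRIBUTES.get(name)
        if mutability is None:
            continue  # attribute this domain does not keep: drop it
        if mutability == "readOnly":
            continue  # server-managed value echoed back by the client: ignore it
        updated[name] = value
    return updated
```

In this sketch a client that sends back the whole record, including `id` and an attribute the provider has never heard of, still has its request accepted; only the modifiable attributes change.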
With that said, SCIM still depends largely on compliance with the HTTP protocol and the exchange of valid, JSON-parsable messages. Where SCIM draws the line is at “validation” of information content in the abstract sense that XML schema provides.
Does SCIM completely favour simplicity for SCIM clients? Not exactly. Just as a service provider needs to be flexible in what it accepts, so too must SCIM clients be flexible in what they accept from a service provider. When a SCIM service provider responds to a client request, the client must be prepared to accept some variability in the response. For example, if a service provider returns a copy of a resource that has been updated, the representation always reflects the final state of the resource on the service provider. It does not reflect back exactly what the client requested. Rather, the intent is that the service provider informs the client about the final state of a resource after a SCIM request is completed.
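On the client side, this means treating the returned representation as the source of truth rather than assuming it equals the request. The following is a rough sketch of that idea; the function name `compare_with_response` and the shape of its result are hypothetical and only meant to show where a response may legitimately differ.

```python
def compare_with_response(requested, returned):
    """Summarise how the provider's final state differs from the request.

    Differences are expected, not errors: the client uses them to learn
    what the service provider actually stored.
    """
    dropped = sorted(name for name in requested if name not in returned)
    altered = sorted(
        name
        for name in requested
        if name in returned and returned[name] != requested[name]
    )
    added = sorted(name for name in returned if name not in requested)
    return {"dropped": dropped, "altered": altered, "added": added}
```

A client built this way updates its local copy from `returned` and, at most, logs the summary, instead of failing when the two documents are not identical.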
Is this the right model?
Let’s look at some key identity technologies of the past, their weak points and their strong points:
- X.500 was a series of specifications developed by the ITU in 1988. X.500 had a strict schema model that required enforcement. One of the chief frustrations for X.500 developers (at least for me) was that, because each server had its own schema configuration, clients were expected to alter their requests for each server. This became particularly painful if you were trying to code a query filter that would work against multiple server deployments. If you didn’t first “discover” server configuration and adjust your code, your calls were doomed to fail. Searching became infuriating when common attributes weren’t supported by a particular server deployment, since the call would be rejected as non-conformant. Any deviation was cause for failure. In my experience, X.500 services seemed extremely brittle and difficult to use in practice.
- LDAP, developed by the IETF in 1996, was based on X.500, but loosened things up somewhat. Aside from being built for TCP/IP, LDAP took the progressive step of simply assuming that if a client specified an undefined attribute in a search filter, there was no match. This tiny little change meant that developers did not have to adjust code on the fly, but could instead build queries with “or” clauses profiling common server deployments such as Sun Directory Server vs. Microsoft Active Directory and Oracle Directory. Yet LDAP still carried too many constraints and ended up with some of the same brittleness as X.500. In practice, the more applications that integrated with LDAP, the less able a deployer was to change schema over time. Changing schema meant updating clients and doing a lot of staged production testing. In short, LDAP clients still expected LDAP servers to conform to standard profiles.
- In contrast to directory or provisioning protocols, SAML is actually a message format for sending secure assertions. To be successful, SAML had to ensure a lot of optionality that depended on “profile” specifications to clearly define how and when assertions could be used. Core to its success has been a clear definition of MUST understand vs. MUST ignore. In many cases, if you don’t understand an assertion value, you are free to ignore it. This opens the door to extensibility. On the other hand, if as a relying party you understand an attribute assertion, then it must conform to its specification (schema).
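The difference between the X.500 and LDAP treatment of undefined attributes in a search filter, described in the first two items above, can be sketched in a few lines. This is a toy evaluator written for this article, not the behaviour of any real directory server; the filter is modelled as a simple list of (attribute, value) pairs joined by “or”, and all names are hypothetical.

```python
class UndefinedAttributeError(Exception):
    """Raised by the strict evaluator when a filter names an unknown attribute."""


def matches(entry, filter_terms, known_attributes, strict):
    """Evaluate an "or" filter given as a list of (attribute, value) pairs.

    strict=True mimics the X.500-style behaviour described above: any
    undefined attribute causes the whole request to be rejected.
    strict=False mimics the LDAP-style behaviour: an undefined attribute
    is simply treated as "no match" and evaluation continues.
    """
    for attribute, value in filter_terms:
        if attribute not in known_attributes:
            if strict:
                raise UndefinedAttributeError(attribute)
            continue  # lenient: undefined attribute does not match
        if entry.get(attribute) == value:
            return True
    return False
```

With the lenient evaluator, a single filter such as `[("sAMAccountName", "bjensen"), ("uid", "bjensen")]` can be sent unchanged to deployments that define only one of the two attributes, which is the kind of “or” clause profiling mentioned above.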
In our industry, we tend to write security protocols in strict forms in order to assure security. Yet what we've often achieved is brittleness and a lack of usability. Because information relationships around identity and the attributes consumed are constantly variable, history appears to show that identity protocols with robust features are incrementally more successful. I think SCIM, as a REST protocol, moves the ball forward by embracing a specified robust schema model, bringing significant usability features over the traditional use of LDAP.
Post-note: I mentioned in my last blog post that SCIM had reached 'last call'. The working group has felt that this issue is worth more attention and is currently discussing clarifications to the specifications as I have discussed above.