Tech-Ed-2008-Deploying MOSS with End-to-End Kerberos Authentication, Matt Munslow
In this 2008 Europe presentation, Munslow does a fair job of explaining material that occasionally seemed complicated even to him. Not brilliant, mind you, but fair. He needs to work on posture; he tended to rock right and left like a metronome. And the video reproduction is lamentable; the fuzzy quality of the close-ups makes the demos nearly unusable. Nonetheless, there is some good insight about Kerberos and troubleshooting that does much to make this worth the effort. This 400-level presentation gets very serious very fast. Note that it focuses on SharePoint 2007, but it retains value even in an SP2010 world due to its methodology and Kerberos explanations.
Munslow focuses on three logical blocks of material: Concepts, Configuration, and Troubleshooting.
First, Concepts. How does Kerberos work, and how does this affect our MOSS implementations? Munslow focuses on mechanisms within Kerberos which are germane to implementing SharePoint 2007.
1. Service Principal Names. Everything in a Kerberos realm (or AD domain) has a unique service principal name, and getting our system to work with it relies completely on correct name configuration. Clients use these service principal names to identify and authenticate with services like SharePoint. He gave an example of what these look like:
HOST/server (“server” means the NetBIOS name for the server) and HTTP/www.contoso.com
“HOST” is one class of service principal name. So is HTTP.
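To make the shape of these names concrete, here is a minimal sketch of splitting an SPN into its parts. The `Spn` type and `parse_spn` function are my own illustrative names, not anything shown in the session, and the optional port and trailing name are assumptions about the general `class/host[:port][/name]` layout.

```python
from typing import NamedTuple, Optional


class Spn(NamedTuple):
    service_class: str                  # e.g. "HOST" or "HTTP"
    host: str                           # NetBIOS name or FQDN
    port: Optional[int] = None          # only present for non-default ports
    service_name: Optional[str] = None  # optional trailing component


def parse_spn(text: str) -> Spn:
    """Split 'class/host[:port][/name]' into its parts (illustrative only)."""
    parts = text.split("/")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"not a valid SPN: {text!r}")
    host, sep, port_text = parts[1].partition(":")
    if not host or (sep and not port_text.isdigit()):
        raise ValueError(f"not a valid SPN: {text!r}")
    port = int(port_text) if sep else None
    name = parts[2] if len(parts) == 3 else None
    # Normalise case so that comparisons do not depend on how the name was typed.
    return Spn(parts[0].upper(), host.lower(), port, name)


if __name__ == "__main__":
    print(parse_spn("HOST/server"))
    print(parse_spn("HTTP/www.contoso.com"))
```

The point of the exercise is simply that the service class and the host are separate pieces, and a mismatch in either one means the client is asking for a different name than the one registered.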
2. Keys. Used to encrypt data; these are normally derived from hashed passwords. There are two types: the long-term key and the session key. All domain objects have a long-term (master) key for authentication.
3. Tickets. Data encrypted with keys, used by clients to authenticate against services.
4. Key Distribution Center. The Key Distribution Center (KDC) is hosted by AD. It comprises three components: a database of objects and their related keys; an Authentication Service, which initiates communication between client and AD and provides ticket-granting tickets; and the Ticket Granting Service, which clients engage using those ticket-granting tickets. This last component is the source of the tickets used to authenticate against SharePoint.
5. Sub-Protocols. There are three exchanges related to this. First is the Authentication Service Exchange, used by clients to get a ticket for tickets. Then there is the Ticket-Granting Exchange, which is the issuance of a ticket to the client. Finally we have the Client/Service Exchange, in which the client authenticates against SharePoint. Each represents an opportunity for things to go awry, and Munslow wisely recommends understanding them for help in troubleshooting.
Next, let’s review the authentication process and how it works. Here we see there are four steps.
First, a client negotiates with an IIS server running SharePoint. An anonymous request is denied, but the denial includes information on how the client can negotiate access.
Second, the client requests a ticket-granting ticket from an AD server. This request relies on recognized client ID information and a “nonce” (number used once) value. AD replies with a session key that allows the client to get other keys; this key is encrypted with the AD master key, serving to validate the request with other servers. The nonce is also included, which serves to verify to the client that this ticket is from the server it originally contacted. And the client’s own master key can decrypt all this information. Now the client has a ticket-granting ticket and a session key.
In Step Three, the client relies on these to ask for a ticket for access to SharePoint. The ticket-granting AD server returns this ticket along with a key encrypted with the session key. Note that AD does not keep a copy of the session key. It creates it and sends it, but does not retain it. Thus the ticket-granting ticket is crucial. The result is a ticket encrypted with a key specific to the SharePoint service, which will work, of course, only on the target server. Now the client has what it needs to access the SharePoint server.
Now, in Step Four, the client uses these credentials to contact the SharePoint server. It sends the authenticator information again, with the service ticket provided by AD. The service on the SharePoint server uses its own key to unlock the ticket from the client, discovering all is proper. Then it opens a session with the client.
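The exchanges are easier to follow when traced in code. What follows is a toy model only, not real cryptography: the `Sealed` class stands in for encryption by refusing to open without the matching key, and every name in it (`Kdc`, `as_exchange`, `tgs_exchange`, `service_accepts`, the principal "alice") is hypothetical. It assumes, as in standard Kerberos, that the service ticket is sealed with a long-term key the service shares with the KDC.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Sealed:
    """Stand-in for ciphertext: readable only with the key that sealed it."""
    key: str
    payload: dict

    def open(self, key: str) -> dict:
        if key != self.key:
            raise PermissionError("wrong key")
        return dict(self.payload)


class Kdc:
    def __init__(self) -> None:
        self.long_term_keys = {}               # principal -> long-term key
        self.kdc_key = secrets.token_hex(16)   # the KDC's own master key

    def register(self, principal: str) -> str:
        key = secrets.token_hex(16)
        self.long_term_keys[principal] = key
        return key

    def as_exchange(self, client: str, nonce: str):
        """Authentication Service Exchange: hand out a ticket-granting ticket."""
        session_key = secrets.token_hex(16)
        tgt = Sealed(self.kdc_key, {"client": client, "session_key": session_key})
        reply = Sealed(self.long_term_keys[client],
                       {"session_key": session_key, "nonce": nonce})
        return tgt, reply

    def tgs_exchange(self, tgt: Sealed, spn: str):
        """Ticket-Granting Exchange: the session key is recovered from the TGT,
        not from any state kept by the KDC."""
        tgt_body = tgt.open(self.kdc_key)
        service_session_key = secrets.token_hex(16)
        ticket = Sealed(self.long_term_keys[spn],
                        {"client": tgt_body["client"],
                         "session_key": service_session_key})
        reply = Sealed(tgt_body["session_key"],
                       {"session_key": service_session_key})
        return ticket, reply


def service_accepts(ticket: Sealed, authenticator: Sealed, service_key: str) -> bool:
    """Client/Service Exchange: the service opens the ticket, then the authenticator."""
    body = ticket.open(service_key)
    claimed = authenticator.open(body["session_key"])
    return claimed["client"] == body["client"]


def demo() -> bool:
    kdc = Kdc()
    alice_key = kdc.register("alice")
    sharepoint_key = kdc.register("HTTP/www.contoso.com")
    nonce = secrets.token_hex(8)
    tgt, as_reply = kdc.as_exchange("alice", nonce)
    as_body = as_reply.open(alice_key)
    assert as_body["nonce"] == nonce           # reply matches our request
    ticket, tgs_reply = kdc.tgs_exchange(tgt, "HTTP/www.contoso.com")
    service_session_key = tgs_reply.open(as_body["session_key"])["session_key"]
    authenticator = Sealed(service_session_key, {"client": "alice"})
    return service_accepts(ticket, authenticator, sharepoint_key)


if __name__ == "__main__":
    print("authenticated:", demo())
```

Notice that the client never opens the ticket-granting ticket or the service ticket; it only carries them. That is why a wrong or duplicated SPN, which changes whose key seals the service ticket, breaks the last step.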
This is the theory and the concepts behind Kerberos with SharePoint; they should be familiar to anyone who has studied PKI. Next, Munslow explains the Configuration process.
Munslow presents a fairly generic customer topology: 3 load-balanced WFEs (Web Front Ends), 3 application servers (one index and two query), and a backend cluster (2 active and 1 passive). He sought to reproduce this in VMware, kerberizing it along the way. First they did the web applications. Kerberizing them required creating a domain service account, and using this account to create the web application; this is the account the application uses to represent itself. Next, DNS entries were created, but not with CNAMEs. Instead, it was necessary to rely on A records, which Munslow pointed out is non-intuitive. In Step Three, they created SPNs (Service Principal Names). They utilized a tool from the Windows 2003 Server Resource Kit, called setspn.exe. This creates SPNs for the NetBIOS name of the service, and for the FQDN. Fourth, for the web application, you configure it in SharePoint Central Administration to use Kerberos.
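The SPN step can be sketched as a small helper that prints the two registrations described above, one for the NetBIOS name and one for the FQDN. The host name "portal", the domain and the service account "CONTOSO\svc-portal" are invented for illustration, and I am assuming the familiar `setspn -A <SPN> <account>` form of the tool; check the syntax against the version of setspn.exe you actually have.

```python
def setspn_commands(service_class: str, netbios_name: str, fqdn: str, account: str) -> list:
    """Build one setspn command per name form (NetBIOS and FQDN)."""
    spns = [f"{service_class}/{netbios_name}", f"{service_class}/{fqdn}"]
    return [f"setspn -A {spn} {account}" for spn in spns]


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    for command in setspn_commands("HTTP", "portal", "portal.contoso.com",
                                   r"CONTOSO\svc-portal"):
        print(command)
```

Generating the commands rather than typing them by hand is a cheap way to keep the two name forms consistent, since a typo in either one produces exactly the kind of failure that is hard to diagnose later.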
Now, this was all in reference to a web app. There can be complications when a non-default port is used, as happens when one is chosen during installation or a random one is assigned. Also, issues can arise from the fact that IE 6 (used on Windows XP) does not properly construct the SPN: the non-default port is left off. Microsoft has a fix for IE that corrects this through a registry change. Once done, things started to work. Note that the use of W2K3S or W2K8S does not affect this, since it is, after all, a client issue.
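The port problem is easy to see once the SPN a browser would request is written out. This sketch is my own illustration, not code from the session: `spn_for_url` is a hypothetical helper, the URL is invented, and the assumption is that the requested name is the HTTP service class plus the host, with the port appended only when it is non-default.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def spn_for_url(url: str, include_port: bool = True) -> str:
    """Return the SPN a client would request for this URL (illustrative)."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    spn = f"HTTP/{parts.hostname}"
    non_default = parts.port is not None and parts.port != DEFAULT_PORTS.get(parts.scheme)
    if include_port and non_default:
        spn += f":{parts.port}"
    return spn


if __name__ == "__main__":
    url = "http://portal.contoso.com:8080/sites/team"   # hypothetical web app
    print("registered for the web app:", spn_for_url(url))
    print("requested when port is dropped:", spn_for_url(url, include_port=False))
```

When the two printed names differ, the client asks AD for a ticket to a name that either does not exist or belongs to a different account, which is the failure described above.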
A demo followed. Unfortunately the screen was too poorly recorded for this to be usable at all. Even maximizing the screen was useless for rendering this intelligible. In light of the fact that several cameras were used to record this session, it’s quite surprising that the end result was such a disappointment. Generally, though, working within the Central Administration to enable Kerberos was straightforward. As straightforward as a topic like this can be, anyway.
After the web apps have been kerberized as shown above, the next stage is to kerberize the shared services provider infrastructure, which is done by this clear but cumbersome stsadm command:
Stsadm.exe -o setsharedwebserviceauthn -negotiate
This does the same thing for the shared services provider that we did for the web apps. However, Munslow discovered that when this was done, search settings would break, with seemingly random error messages.
Issue #1. Munslow describes the torturous process of troubleshooting this completely undocumented problem. He found enlightenment in the shared services provider infrastructure, specifically in the Office Server Web Services web application, which is created when the farm is created. It runs in the application pool called the Office Server Application Pool and with the identity of Network Service. Note that this service runs on non-default ports: 56737 and (secure) 56738. Every time a shared services provider is created, it creates a virtual directory in that web app with its own application pool. This runs as the identity of the account used to create the SSP, that is, the SSP service account. The other important thing to note here is that it is the .NET client which uses the web application to communicate with other SharePoint servers. This is how inter-server communication between SharePoint servers takes place.
The thing is, the .NET client cannot use non-default ports to bind to the server.
Issue #2. Another problem occurs when you have more than one Shared Services Provider. This is common when using Excel Services. Here the difficulty occurs when kerberizing the services. The system’s first impulse is to use the same name (albeit with a different user) for each, which is not allowed, and will lead to errors, because the same SPN ends up registered more than once.
Issue #3. Yet another, and more widely encountered, issue pertains to the indexer, which cannot crawl Kerberos web applications on non-default ports. To overcome this, an NTLM web application had to be created and then extended as a Kerberos application. Thus the crawler would use the NTLM version, and users would use the Kerberos version. Munslow’s group found that a crawler could crawl a Kerberos application as long as it was on a default port, such as 80 or 443. Otherwise it would not work. His work-around was to avoid using any non-default ports for any web app that is crawled. No more elegant solution for this currently exists.
For Munslow’s group, the interim solution for the SharePoint group was to discontinue the use of Kerberos. This was disappointing but practical. More than half a year after Munslow left this effort, a new format for the SPN was introduced which overcame this: the MSSP.
The MSSP requires the creation of two SPNs for every server in the farm which is participating in shared services, which means one for the non-secure port 56737 and one for the secure port 56738. This allows you to name the shared services provider; thus solving many of the issues Munslow mentioned earlier, since it gets rid of the problem of duplicates. How to make this work? First, the infrastructure update must be run on all SharePoint servers. Then a manual registry tweak must be run, adding a DWORD entry, set equal to 1, in this key:
After this, do a reboot.
Then register two SPNs for every server involved with shared services, as noted.
Finally, go to the command prompt, and run
Stsadm.exe -o setsharedwebserviceauthn -negotiate
That should do the trick. A demo followed; it was partially intelligible.
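The "two SPNs for every server" step described above lends itself to a short sketch. The ports come from the presentation; the layout `MSSP/<server>:<port>/<SSP name>` is my assumption about how the named format is written, and the server and SSP names are invented, so verify the exact form against Microsoft's documentation before registering anything.

```python
SSP_PORTS = (56737, 56738)   # non-secure and secure ports from the presentation


def mssp_spns(servers, ssp_name):
    """List the MSSP SPNs for each server: one per port (assumed format)."""
    return [f"MSSP/{server}:{port}/{ssp_name}"
            for server in servers
            for port in SSP_PORTS]


if __name__ == "__main__":
    # Hypothetical farm: two application servers and an SSP called "SharedServices1".
    for spn in mssp_spns(["app1", "app2"], "SharedServices1"):
        print(spn)
```

Because the SSP name is part of each SPN, two providers on the same server no longer collide, which is how this format gets rid of the duplicate problem.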
To begin with, note that delegation is only needed if SharePoint is to pass credentials on to another service. It is not necessary for simply logging in to machines. For services not hosted by the SharePoint server, such as RSS, for instance, we are confronted with the so-called “single-hop rule”, which is meant to stop man-in-the-middle attacks. The goal here is to allow SharePoint to represent a client in approaching these other services. This can be done as follows:
1. Register the appropriate SPNs for the service.
2. The server which does the impersonation (or, more charitably, representation) must be trusted for delegation. This can be done (easily) in Active Directory Users and Computers through a constrained delegation. Then, also within ADUC, add the service account of the RSS server.
3. The impersonating service also must be trusted for delegation. This account too is trusted for constrained delegation with the RSS account.
With this done, the user will be able to be on a SharePoint server and use an RSS web part on a SharePoint page.
And that’s how delegation works.
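The rule in the steps above, that both the server and the impersonating service account must be trusted to delegate to the target, can be modelled in a few lines. This is a toy check of my own, not how AD evaluates delegation internally; the account names and the RSS SPN are invented, and the `allowed_to_delegate_to` field is only loosely modelled on the list you fill in on the Delegation tab in ADUC.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    name: str
    # SPNs this account may present delegated credentials to (constrained delegation).
    allowed_to_delegate_to: set = field(default_factory=set)


def can_delegate(accounts, target_spn: str) -> bool:
    """True only if every account involved is trusted to delegate to the target."""
    return bool(accounts) and all(
        target_spn in account.allowed_to_delegate_to for account in accounts
    )


if __name__ == "__main__":
    rss_spn = "HTTP/rss.contoso.com"               # hypothetical back-end service
    wfe = Account("WFE1$", {rss_spn})              # the SharePoint server's computer account
    app_pool = Account(r"CONTOSO\svc-portal")      # the web application's service account
    print(can_delegate([wfe, app_pool], rss_spn))  # service account not yet trusted
    app_pool.allowed_to_delegate_to.add(rss_spn)
    print(can_delegate([wfe, app_pool], rss_spn))  # both trusted
```

Forgetting either of the two accounts gives the same symptom from the user's side, so it is worth checking both before looking anywhere else.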
For me, troubleshooting is always one of the best and most rewarding portions of any presentation, and Munslow culled through his own customer experience to assemble these observations for responding to Kerberos-related problems.
- Duplicate SPNs. This is the first place to look. As noted above, this can be a source of great mischief.
- CNAMEs in DNS. The only workaround is using A records.
- Use NETMON on the wire between the client and AD and between the client and the service, and look for errors. Also check the performance of sub-protocols.
- Check sub-protocols. See above.
- Use Event Viewer. If you do not see Kerberos 540 events in Event Viewer, then there’s a problem.
- Turn on Kerberos Logging for more detailed logging.
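The first item on the list, duplicate SPNs, is the easiest to automate once you have an export of which account holds which SPN. How you obtain that export is up to you; the function and the sample data below are purely illustrative and the account names are invented.

```python
from collections import defaultdict


def find_duplicate_spns(registrations):
    """Given (account, spn) pairs, return {spn: [accounts]} for SPNs held by more than one account."""
    owners = defaultdict(set)
    for account, spn in registrations:
        owners[spn.lower()].add(account)    # compare SPNs case-insensitively
    return {spn: sorted(accounts)
            for spn, accounts in owners.items()
            if len(accounts) > 1}


if __name__ == "__main__":
    sample = [
        (r"CONTOSO\svc-portal", "HTTP/portal.contoso.com"),
        (r"CONTOSO\svc-ssp1", "HTTP/app1"),
        (r"CONTOSO\svc-ssp2", "http/APP1"),   # same SPN, different account
    ]
    for spn, accounts in find_duplicate_spns(sample).items():
        print(spn, "is registered on", ", ".join(accounts))
```

Any SPN this reports is a candidate for the kind of mischief described earlier, since AD cannot tell which account's key should seal the ticket.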
He also suggested some resources; perhaps the best of these were:
www.microsoft.com/Kerberos for general information on the installation process for Kerberos authentication. Very thorough and insightful.
Other resources were mainly MIT’s take on Kerberos and some RFCs.
So, what’s the value in his presentation? Weakly recorded demos, weak body language in presentation, and some daunting material. Not only that, the world is moving to SharePoint 2010. True on all counts, yet despite all this, Munslow’s presentation merits some attention for two reasons:
1. The whole world has not switched to SharePoint 2010 yet, and SharePoint 2007 will remain in use for many years. Thus its Kerberos performance needs to be well-understood.
2. The advice on troubleshooting is very good, and generally applicable for all versions of SharePoint.