All posts by Ed Grigson

Federated login failures – the LSA cache

While working recently on an ADFS federation solution I came across a Microsoft ‘feature’ which doesn’t seem to be well known and which caused me to deliver my project a week late. It often manifests itself via failed logins and affects many products which integrate with AD, such as SharePoint, Office 365, OWA, and of course ADFS. This is very much one of those ‘document it here for future reference’ posts but hopefully it’ll help spread the word and maybe save someone else the pain I felt!

To describe how the ‘feature’ affects ADFS you need to understand the communication flow when a federation request is processed. The diagram below (from an MSDN article on using ADFS in identity solutions) shows a user (the web browser) connecting to a service (the ASP.NET application, although it could be almost any app) which uses ADFS federation to determine access;

Communication flow using federated WebSSO

Summarising the steps;

  • The user browses to the web application (step 1)
  • The web app redirects the user to ADFS (step 2,3)
  • ADFS attempts to authenticate the user, usually against Active Directory (step 4)
  • ADFS generates a token (representing the user’s authentication) which is passed back to the user, who then presents it to the app and is given access (steps 5,6,7)
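If it helps to see what steps 2 and 3 actually look like on the wire, the redirect is just a URL with a handful of query string parameters. Here’s a rough Python sketch assuming the WS-Federation passive profile that ADFS uses for browser sign-on – all the hostnames and the realm below are placeholders, not my real environment;

```python
from urllib.parse import urlencode

# All URLs here are placeholders, not my real environment.
ADFS_SIGNIN = "https://adfs.example.com/adfs/ls/"
APP_REALM   = "https://app.example.com/"           # relying party identifier
RETURN_URL  = "https://app.example.com/callback"   # where the token gets POSTed back

def build_signin_redirect(original_url: str) -> str:
    """Build the redirect the web app sends the browser to (steps 2-3)."""
    params = {
        "wa": "wsignin1.0",    # WS-Federation 'sign in' action
        "wtrealm": APP_REALM,  # tells ADFS which application is asking
        "wreply": RETURN_URL,  # where ADFS returns the signed token (steps 5-6)
        "wctx": original_url,  # opaque context so the app can resume the original request
    }
    return ADFS_SIGNIN + "?" + urlencode(params)

print(build_signin_redirect("https://app.example.com/reports/42"))
```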

My problem was that while some users were being logged into the web application OK, some were failing and I couldn’t work out why. Diagnosing issues in federation can be tricky as by its nature it often involves multiple parties/companies. The web application company were saying their application worked fine, both redirecting users and processing the returned tokens. The users were entering their credentials and being authenticated against our internal Active Directory. ADFS logs showed that tokens were being generated and sent to the web app. Hmm.

Digging deeper I found that the AD username (the UPN to be precise) being passed into the token generation process within ADFS was occasionally incorrect. The user would type their username into the web form (and be authenticated) but when ADFS tried to generate claims for this user via an LDAP lookup it used an incorrect UPN and hence failed. It seemed as if the Windows authentication process was returning incorrect values to ADFS. This stumped me for a while – how can something as simple and mature as AD authentication go wrong?

Of course it’s not going wrong, it’s working as designed. It transpires there’s an LSA cache on domain member servers. Where the AD values have changed recently (the default is to cache for 7 days) this can result in the original, rather than the updated, values being returned to the calling application by the AD authentication process. A simple change such as someone getting married and having their AD account updated with their married name could therefore break any dependent applications. Details of this cache can be found in MS KB article 946358, along with the priceless statement “This behaviour may prevent the application from working correctly”. No kidding! This impacted my project more than most because the AD accounts are created programmatically via a web portal and updated later by some scripts. The high rate of change means they’re more susceptible to having old values cached.
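To make the failure mode concrete, here’s a loose sketch (in Python, using the ldap3 library) of the kind of UPN lookup that happens during claims generation – ADFS obviously doesn’t run Python and all the names and credentials below are invented, but it shows why a stale UPN simply finds no account;

```python
from ldap3 import Server, Connection, ALL

# Invented names/credentials - this only mimics the lookup ADFS performs.
server = Server("dc01.corp.example.com", get_info=ALL)
conn = Connection(server, user="CORP\\svc_adfs", password="********", auto_bind=True)

upn = "jane.maidenname@corp.example.com"   # stale value returned from the LSA cache
conn.search(
    search_base="DC=corp,DC=example,DC=com",
    search_filter=f"(userPrincipalName={upn})",
    attributes=["mail", "displayName", "memberOf"],
)

if not conn.entries:
    # The account was renamed, so the cached UPN matches nothing and claims generation fails.
    print(f"No account found for {upn} - token can't be populated")
else:
    print(conn.entries[0])
```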

This might seem like a niche problem but it also impacts implementations of SharePoint, OWA, Project Server, and Office 365 – any product that relies on AD for authentication. These products can be integrated with AD to facilitate single sign-on, but if you make frequent changes to AD the issues above can occur.

How can I diagnose this issue?

The symptoms will vary between products but thankfully Microsoft have some great documentation on ADFS. The troubleshooting guide details how to enable the advanced ADFS logs via Event Viewer – once you’ve got those, check for Event ID 139. The event details show the actual contents of the authentication token so you can check the UPN and ensure it’s what you expect. If not, follow the instructions in the KB article to disable or fine-tune the cache retention period on the domain member server (i.e. the ADFS server, not the AD server).
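If you’d rather script the change than edit the registry by hand, something like the sketch below works on the member server. The value name is taken from my reading of KB 946358, so double-check it against the article (and test somewhere safe) before applying;

```python
import winreg

# Value name per my reading of KB 946358 - verify before applying, and make the
# change on the domain member server (e.g. the ADFS server), not the domain controller.
KEY_PATH = r"SYSTEM\CurrentControlSet\Control\Lsa"

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0, winreg.KEY_SET_VALUE) as key:
    # 0 disables the LSA lookup cache entirely; a small non-zero value just limits its size.
    winreg.SetValueEx(key, "LsaLookupCacheMaxSize", 0, winreg.REG_DWORD, 0)

print("LsaLookupCacheMaxSize set to 0 - stale name lookups should no longer be cached")
```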

Further Reading

Understanding the LSA lookup cache

Home labs – the Synology 1512+

I’ve been running a home lab for a few years now and recently I decided it needed a bit of an upgrade. I’ve been looking at the growing trend towards online lab environments but for the time being I’ve decided it’s still cost effective to maintain my own – partly because I need to learn the latest VMware technologies (which requires lab time) and partly because the geek in me wants some new toys. 🙂

Storage was the first thing I needed to address. While I’ve got an Iomega IX2-200 (the two disk version) it’s not really usable as shared storage for a lab due to slow performance (about 17MB/s for reads, 13MB/s for writes). If I were a patient man that would be fine for testing, but I found myself putting VMs on local disks so I could work quicker, which rather defeats the purpose of a lab for HA/DRS etc. I’ve built a home NexentaStor CE server which is feature rich (ZFS, snapshots, dedupe, tiered SSD caching) but I’ve found the configuration and maintenance less than simple, and it’s a big, heavy old server (circa 2007) which won’t last much longer. My wishlist included the following;

  • Easy to use – I want to spend my time using it, not configuring and supporting it
  • Small form factor, minimised power consumption
  • Hypervisor friendly – I’d like to play with VMware, Citrix, and Microsoft’s Hyper-V
  • Cloud backup options. I use Dropbox, SugarSync and others and it’d be useful to have built in replication ability.
  • Hook up a USB printer
  • Flexibility to run other tasks – bit torrent, audio/movie streaming, webcams for security etc (which my Iomega also offers)
  • VLAN and aggregated NIC support (both supported by my lab switch, a Cisco SLM2008)
  • Tiered storage/caching (NOT provided by the consumer Synology devices)

My requirements are by no means unique and there were three devices on my shortlist;

I chose Synology for a couple of reasons: primarily because I’ve heard lots of good things about the company from other bloggers (Jason Nash comes to mind), and because Synology have a wide range of devices to choose from at different price/performance points. They’re not the cheapest but many people say the software is the best around, and having been bitten once with the IX2-200 I figured I’d go upmarket this time. The model I chose was the relatively new DiskStation 1512+, a five bay unit which satisfies most of my requirements with the exception of tiered storage. I was excited when I first read a while ago that some of the Synology units fully support VAAI, but according to Synology that’s not the case for this particular model (the DS412+ has only limited support). I guess it’s always possible that support will find its way into lower end models such as the 1512+ (even if unsupported) at a future date – here’s hoping!

UPDATE Sept 14th 2012 – While both NFS and iSCSI work with vSphere 5.0, the 1512+ is only certified by VMware for iSCSI on vSphere 4.1 as of 14th Sept 2012. Previous devices (the 1511+ for example) are listed for both NFS and iSCSI, also with vSphere 4.1. Rather than being incompatible it’s more likely that they just haven’t been tested yet, although there are problems with both NFS and iSCSI when using vSphere 5.1 and DSM 4.1.

UPDATE Oct 3rd 2012 – Synology have released an update for their DSM software which fixes the compatibility issues with vSphere 5.1 although it’s referred to as ‘improved performance’ in the release notes. I’ve not tested this yet but hopefully it’s all systems go. Good work Synology!

There are some additional features I wasn’t looking for but which will come in useful for a home lab;

  • Syslog server (especially useful with ESXi nowadays)
  • DHCP server
  • CloudStation – ‘Dropbox’ style functionality

Having chosen the unit I then needed to choose the drives to populate it with, as the unit doesn’t ship with any. My lab already includes some older disks which I could have reused, plus I had two SSDs in the NexentaStor server which I considered cannibalising. After reading this excellent blogpost about choosing disks for NAS devices (and consulting the Synology compatibility list) I went with five WD Red 2TB HDDs as a compromise between space, performance, compatibility, and cost. I’d missed the introduction of the ‘Red’ range of hard disks, which is targeted at NAS devices running 24×7, but they get good reviews. This decision means I can keep all three storage devices (Iomega IX2, Nexenta and Synology) online and mess around with advanced features like StorageDRS.

UPDATE Feb 18th 2013 – Tom’s Hardware had a look at these WD Red drives and they don’t seem great at high IOPS. I’ve not done much benchmarking but it may be worth investigating other options if performance is key.

I bought my Synology from UK based ServersPlus who offered me a great price and free next day shipping too. I was already on their mailing list having come across them on Simon Seagrave’s Techhead.co.uk site – they offer a variety of bundles specifically aimed at VMware home labs (in particular the ML110 G7 bundles are on my wish list and they do a cheaper HP Microserver bundle too) and are worth checking out.

Using the Synology 1512+

Following the setup guide was trivial and I had the NAS up and running on the network in under ten minutes. I formatted my disks using the default Synology Hybrid RAID, which offers more flexibility for adding disks and mixing disk types and only has a minimal performance impact. Recent DSM software (v4.0 onwards) has been improved so that the initial format is quick and the longer sector check (which takes many hours) is done in the background, allowing you to start using it much sooner. My first impression was of the management software, DSM, which is fantastic! I’m not going to repeat what others have already covered, so if you want to know more about the unit and how it performs here’s a great, in-depth review.

I enabled the syslog server and was quickly able to get my ESXi hosts logging to it. Time Machine for my MBP took another minute to configure and I’m looking forward to experimenting with CloudStation which offers ‘Dropbox like functionality’ on the Synology.
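If you want a quick way to check the syslog server is actually listening before pointing your hosts at it, a couple of lines of Python will do it – the hostname below is a placeholder for whatever your NAS answers to, and 514/UDP is the standard syslog port;

```python
import logging
import logging.handlers

# Placeholder hostname - point this at your NAS; 514/UDP is the standard syslog port.
handler = logging.handlers.SysLogHandler(address=("synology.lab.local", 514))
logger = logging.getLogger("lab-test")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("test message from the lab - if this shows up in DSM, logging works")
```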

Chris Wahl’s done some investigation into iSCSI vs NFS performance (although on the Synology DS411 rather than the 1512+) and I found similar results – throughput via iSCSI was roughly half that of NFS. I wondered if I had to enable multiple iSCSI sessions as per this article but doing so didn’t make any difference. All tests were over gigabit NICs and the Synology has both NICs bonded (2Gbps LACP);

  • Copying files from my MBP (mixed sizes, 300GB) to the Synology – 50MB/s write
  • Creating a file (using dd in a VM, CentOS 5.4) via an NFS datastore – 40MB/s write
  • Creating a file (using dd in a VM, CentOS 5.4) via an iSCSI datastore – 20MB/s write
  • Creating a thick eager zeroed VMDK on an iSCSI datastore – 75MB/s write

Given Synology’s published figures, which claim a possible write speed of 194MB/s, these results were rather disappointing, but they’re initial impressions NOT scientific tests (I also tried a similar methodology to Chris using IO Analyser, which also gave me some odd results – average latency over 300ms!) so I’ll update this post once I’ve ironed out the gremlins in my lab.
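For completeness, here’s a rough Python equivalent of the dd write tests above – purely illustrative, and the target path is a placeholder for wherever the datastore (or an NFS export) is mounted inside the test VM;

```python
import os
import time

# Placeholder path - point it at a mount on the storage under test.
TARGET = "/mnt/nfs-datastore/throughput.tmp"
BLOCK = b"\0" * (1024 * 1024)   # 1MB blocks
BLOCKS = 1024                   # 1GB total

start = time.time()
with open(TARGET, "wb") as f:
    for _ in range(BLOCKS):
        f.write(BLOCK)
    f.flush()
    os.fsync(f.fileno())        # make sure the data actually reaches the storage
elapsed = time.time() - start

print(f"Wrote {BLOCKS}MB in {elapsed:.1f}s = {BLOCKS / elapsed:.0f}MB/s")
os.remove(TARGET)
```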

Tip: make sure you disable the default ‘HDD hibernation’ under the Power settings otherwise you’ll find your lab becoming unresponsive when left for periods of time. VMs don’t like their storage to disappear just because they haven’t used it in a while!

LAST MINUTE UPDATE! Just before I published this post the latest release of DSM, v4.1, was finally made available. DSM 4.1 brings several enhancements and having applied it I can attest that it’s an improvement over an already impressive software suite. Of particular interest to home labs will be the addition of an NTP server, a much improved Resource Monitor which includes IOPS, and an improved mail relay.

Overall I’m really impressed with the Synology unit. It’s been running smoothly for a couple of weeks and the software is definitely a strong point. It’s got a great set of features, good performance, is scalable and might even include VAAI support in the future.

Further Reading

A performance comparison of NAS devices (fantastic site)

Indepth review of the Synology 1512+ (SmallNetBuilder.com)

VMworld 2012 – The hare and the tortoise?

Having had some time to digest the announcements from VMworld 2012 (day one, day two) I was reminded of the children’s story about the hare and the tortoise. Yes, it’s another ‘analogy post’ but otherwise technology can be so bland! 🙂

The story tells of a hare that can run so fast no-one can beat him. The tortoise, slow moving by nature, challenges the hare to a race and the hare, laughing, accepts, knowing he can beat the tortoise with ease. On the day of the race they line up. Bang! The starter’s pistol goes and off goes the hare, charging into the lead. After a while he looks back to see the tortoise miles behind. Seeing how much time he has, he decides to take a quick sleep under a nearby tree. When he wakes, however, he realises that the tortoise has passed him by and he’s unable to catch up, so he loses the race. While not a perfect analogy, I think VMware is the hare and their customers the slow-moving tortoise (no, the tortoise is not Microsoft, how unkind…). VMware are creating technologies and an ecosystem at a speed which customers are struggling to adopt, and many of this week’s developments are a response to this imbalance (or ‘virtual stall’ as Andi Mann coined it). Pat Gelsinger, the incoming CEO at VMware, was quoted comparing the company to ‘an adolescent who has grown too quickly’ because their operational rigour hasn’t kept pace with the company’s growth. It’s not only customers who are grappling with the pace of change.

Let’s look at pricing. VMware have binned the consumption-based vRAM licencing scheme and reverted to the per-socket model used prior to vSphere 5. This was an unpopular scheme and, with Microsoft’s Hyper-V hot on VMware’s heels, I think VMware realised that to stay competitive they had to react. While many applauded the U-turn, it’s been pointed out that the future of cloud is all about charging for usage (here and here) so maybe VMware were just ahead of their target market? If the dynamic environments promised by IaaS were commonplace then maybe Microsoft would have been amending their licencing rather than VMware making an embarrassing (though brave) climbdown.

VMware still have the dominant virtualisation portfolio, certainly within the enterprise, but they need to leverage it to maintain their premium pricing and hence profitability. Products such as vCloud Director, vFabric, and vCOPs haven’t seen the uptake VMware were hoping for, and without these ‘value add’ tiers the core virtualisation product isn’t remarkable enough to counter the threat from rivals like Microsoft and the open source community. People have been wondering when Hyper-V and Xen will be ‘good enough’ for a couple of years and many think the time is now. VMware have the technology and the vision but many customers aren’t ready to implement it. We’re still talking about only 60% of server workloads being virtual, and getting tier 1 apps like Oracle virtual is taking a long time (due to FUD and Oracle’s desire to own the whole stack, as well as technical factors). Automate my workflows? My company are still struggling to even define new manual workflows and processes given the huge changes that virtualisation brings to any large company. Move to the cloud? Half our production servers are still physical. VMware still have a strong market position but the longer customers take to move to the new technologies, the greater the opportunity for competitors. The hypervisor is already a commodity – if customers take many years to move to the next stage then the management stack that VMware are now pushing may also be a commodity.

VMware need their customers to accelerate their move to the cloud before their product line becomes a commodity. How are VMware tackling this?

Is it out of VMware’s hands?

Obviously VMware can use the above actions to spur adoption of their cloud specifically (to speed up the tortoise in my analogy!) but mainly it’s market forces which will drive the change – ‘cloud’ is one of the hottest areas and is set to grow. Speed may be the key – if the enterprise masses don’t migrate to the cloud for another 5-10 years there will be increased competition and VMware risk losing the premium value in their products and potentially their stellar profits. If we’re still talking about virtualising tier 1 apps, the year of VDI, and how to integrate a full cloud stack in another couple of years (which I suspect we will be) it’ll be interesting to see if VMware can maintain their place at the top of the podium. Despite cloud being considered mainstream I think there are many who remain tortoises, plodding along. And in the original Aesop’s fable it’s the ‘slow and steady’ which wins…

Right Here, Right Now is the tagline for this year’s conference. To use another Fatboy Slim title, I’d say “Halfway Between the Gutter and the Stars” is more appropriate.

Federated identity and Horizon Application Manager

A recent project at work has required me to implement Microsoft’s Active Directory Federation Services (ADFS), which has been an interesting change from my usual technologies. It’s a mature product (it was released with Windows 2003 and further refined in Windows 2008) designed to allow you to ‘federate’ your Active Directory – in other words to allow third parties to leverage your internal AD in a secure manner. At first I thought this project was a distraction from the skillset I’m working towards (IaaS infrastructure with vCloud Director, View etc) but I’ve since come to realise that federated identity is an essential ingredient in the cloud recipe and one which needs to be understood.

Let me give you an example. My company decided to upgrade an aging training application and for various reasons we outsourced the solution to a third party developer. The idea was that they’d develop (and host) all the training materials and offer it as a service over the web to our customers (SaaS) thus requiring no resource from our internal teams. The only hitch in the plan was the business requirement that the customer, who already has login details for our web portal, should use the same credentials for this remotely hosted training solution. Those credentials are held in an internal AD database so we used ADFS to ‘publish’ them to the third party. Voila! The end users can now login to their training solution unaware that the credentials they enter are authenticated against my AD in the background. The resulting (much simplified!) architecture is shown in the following diagram;

It’s important to realise that there are two distinct actions going on during a login;

  1. Authentication – The AD acts as the identity provider (IdP), making sure the user is who they say they are.
  2. Authorisation – Once authenticated ADFS generates a ‘claim’ which it sends to the third party and this dictates what actions the user can take in the application.

Of course my example is very simplistic but the principle of allowing identities to be shared securely across security boundaries (such as disparate networks and applications) is critical to cloud services. Security is one of the big challenges in the cloud and federation allows you to keep your crown jewels (your user details) secure while still consuming remote services. It’s also important as the number of mobile devices used to access services increases.
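To make the authentication/authorisation split a little more concrete, here’s a loose illustration of what the relying party does with a claim set. The claim types are typical SAML/ADFS-style URIs but the values and the rule are invented for this example – it’s a sketch of the idea, not anyone’s actual claim rules;

```python
# Invented example - claim types are typical ADFS/SAML-style URIs, values are made up.
claims = {
    "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/upn": "jane@corp.example.com",
    "http://schemas.xmlsoap.org/claims/Group": ["Training-Users", "Portal-Admins"],
}

def authorise(claims: dict) -> str:
    """Authorisation: the relying party decides what the authenticated user may do."""
    groups = claims.get("http://schemas.xmlsoap.org/claims/Group", [])
    if "Portal-Admins" in groups:
        return "full access"
    if "Training-Users" in groups:
        return "read-only access"
    return "no access"

# The IdP has already vouched for *who* the user is; this step decides *what* they can do.
print(authorise(claims))
```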

Consumer or provider?

The ADFS example above is just one of the many possible scenarios that federated identity must handle. In every federation scenario there is an identity provider (IdP) and a consumer or service provider (SP, sometimes referred to as a relying party). In the example above my company are the identity provider (our AD holds the identity details) and the consumer is the third party developer who provides a service.

The first choice therefore is whether you’re just going to consume other people’s federated identity services and/or act as an identity provider yourself.

You’ve probably been a consumer of federated identity for a while without even realising it. Every time you sign into a website using your Twitter or Facebook login (for example) you’re consuming the federated identity service offered by Twitter or Facebook, likewise when you post a comment on a blog which requires a WordPress login. Maybe you’ve logged into a variety of Microsoft services using your Windows Live ID? Same thing.

One of the early commercial attempts at federated identity was Microsoft’s Passport, which set out to be a universal authentication mechanism for web commerce, but security concerns limited its adoption and resulted in a proliferation of alternative services (Windows Live ID, Google ID, and Apple ID to name a few well known ones). Here are a few of the most popular federation protocols in use today;

  • OpenID (backed by companies including Facebook, Google, IBM, Microsoft, PayPal, VeriSign and Yahoo)
  • OAuth (similar to OpenID but used for API delegation, used by Twitter, Salesforce, Google, Facebook etc)
  • SAML (the most widely used federation protocol, used by ADFS, Horizon App Mgr, Centrix Workspace and others)
  • WS-Federation (part of the larger WS-* standards)
  • SCIM (the newest and still evolving standard – v1 was ratified in Dec 2011)

So consuming is commonplace but why would you want to become an identity provider and federate your identity out to the world?

You might be reading this article and thinking ‘I don’t offer a service to people on the internet so I’ve no need to provide identity federation’. If all your infrastructure needs are met internally that might be true. What if you want to use public or hybrid cloud? If you want your corporate users to securely use their company login to access SaaS providers like SalesForce.com or Google Apps you’ll need to become an identity provider.

If you’re an internet giant like Google, Microsoft, or Apple you can develop your own identity framework but for everyone else there are frameworks you can quickly ‘bolt on’ to your existing infrastructure which allow you to offer federated services;

The purpose of the above applications varies even though they all provide identity federation. Most include SSO functionality but some are cloud based and others are installed locally (some are deployed via appliances). Some provide ‘application store’ or portal/workspace features which are much like the Citrix access you’re probably familiar with but for both internal and cloud applications.

I was already familiar with the Centrix solution after seeing one of the company founders, Lisa Hammond, give a very good presentation at the recent July 2012 London VMUG. The idea of a converged portal presenting SSO access to all your apps, wherever they reside, is compelling and Centrix has been doing this for quite a while prior to VMware’s entry into the market.

How is this relevant to me as a virtualisation admin?

You’ll have spotted the last entry above, VMware’s Horizon Application Manager. Horizon was released in May 2011 as the first component in the ‘Project Horizon’ vision first previewed at VMworld 2010. It was developed from VMware’s acquisition of TriCipher in August 2010, a company which previously developed a federated identity solution known as MyOneLogin. To quote VMware’s press release at the time;

VMware’s acquisition of TriCipher lets us integrate identity-based security and managed access to applications hosted in the cloud or on-premise. Convenient end-user access to applications on any device with security controls for IT lets customers extend their security and control into public cloud environments.

Earlier this year VMware published a research article on identity, access control, and Horizon which is a great introduction to Horizon and where it fits in the larger ecosystem. VMware would like you to implement Horizon to act as a centralised portal for all your applications whether they reside internally or externally via clouds. Another way to think of it is an ‘app store’ for the enterprise. Michael Letschin (@mletschin) has written a very clear roadmap for the encompassing Project Horizon vision which as a bonus also makes clear where Project Octopus fits in – a great read. For a practical understanding of federation and VDI check out Andre Leibovici’s great article discussing federation in relation to VMware’s View product although he advises that it’s still very much a work in progress.

The principles and terminology of federation (IdP, SP, tokens, relying parties, claim rules etc) are largely the same across all the products listed above so I’m glad that by learning ADFS I’ve actually learnt quite a bit about how Horizon works under the hood.

The bottom line is that if you’re going to use cloud services and you want to avoid a security management nightmare you need to understand federated identity. If you don’t understand your options early on you may find yourself putting in a solution which solves your short term requirements but not your long term goals, and that could lead to implementing multiple solutions – messy!

This article barely touches the surface of a very complex subject – ADFS is fine if you’re using Microsoft as your on-premises identity provider, but what if you use another user directory from Oracle or IBM? What about two factor authentication? What if a third party uses the open source Shibboleth and you use ADFS – do they work together? Can you chain authentication systems together and introduce conditional processing? What about multi-tenant clouds and the special challenges they present? Federated authentication is one small part of a wider subject commonly referred to as Identity Management (IDM). I did enough to get our implementation working but it was immediately obvious that it’s a specialised skillset every bit as complex as virtualisation, with multiple products, protocols, compatibilities, design choices and pitfalls. I also found it fascinating to see how the various disconnected services are beginning to be ‘hooked up’ to each other using these distributed mechanisms – there’s a long way to go but this is a growth area, no doubt.

Further Reading

Here’s a great primer on using ADFS, explained in real world terms (although the concepts apply to any federation)

A good intro to ADFS, particularly when used with Office365

A good video intro to federated identity

Will OpenID transform the enterprise ecosystem?

Federated clouds – Possible?

SCIM: Standards are taking hold in the cloud

Good article on ADFS and the Azure version of AD

Looking at cloud futures – why federation is key to cloud evolution

The essential OAuth primer

Hosted application stores help IT pros wrangle cloud apps

Centrix gave a very good presentation at the July 2012 London VMUG

vRAM drives customers to Enterprise+

UPDATE: 21st August 2012 – CRN have leaked news that VMware’s controversial vRAM licencing may be on its way out. No doubt we’ll find out at VMworld SF next week.

I like conspiracy theories, and this is mine for today. Back in 2009 VMware introduced the Enterprise+ licencing tier to the frustration of existing customers who felt they were already paying top dollar (especially the large enterprises with site agreements). They simultaneously announced that the existing Enterprise tier would be discontinued and offered a 50% discount on the upgrade to Enterprise+, but after disappointing takeup the Enterprise licence was allowed to persist. As the vSphere product suite has evolved over the last three years the bulk of the new features have gone into the Enterprise+ tier, but I still speak to plenty of users who aren’t prepared to pay the extra. As competitors (Hyper-V, Xen etc) improve, VMware need to differentiate themselves but that’s hard to do if a chunk of your customers haven’t adopted your latest, greatest features.

That’s where the new vRAM licencing model introduced with vSphere 5 has a part to play. While not the driving reason for the move to this usage-based pricing model, it does mean that as hardware scales up (we’ve just gone to 192GB per dual socket server) you might find yourself moving to Enterprise+ just to get the extra vRAM allowance. Recently I’ve had to determine the best way to cope with a significant increase in the number of VMs we host (about a 170% increase) and the most cost effective option was to move to Enterprise+. The alternative was to simply buy extra sockets of Enterprise just to get the vRAM allowance, but that was more expensive and got us no extra functionality.
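As a back-of-envelope illustration of that choice – using the revised vSphere 5 entitlements as I recall them (Enterprise 64GB and Enterprise+ 96GB of vRAM per CPU licence), so verify against the current licence terms before relying on it;

```python
import math

# Entitlements are the revised vSphere 5 figures as I recall them - verify before use.
ENTITLEMENT_GB = {"Enterprise": 64, "Enterprise+": 96}   # vRAM per CPU licence

def licences_needed(edition: str, sockets: int, vram_gb: int) -> int:
    """At least one licence per socket, plus enough to cover the pooled vRAM."""
    return max(sockets, math.ceil(vram_gb / ENTITLEMENT_GB[edition]))

# A dual-socket host with 192GB of RAM fully committed to powered-on VMs:
for edition in ENTITLEMENT_GB:
    print(edition, "->", licences_needed(edition, sockets=2, vram_gb=192), "licences")
# Enterprise needs a third licence just for the vRAM; Enterprise+ covers it with two.
```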
Voila – customers are driven to the Enterprise+ tier and VMware finally have their end goal!