August 27, 2014

Is your software fixed?

A common query seen at Red Hat is “our auditor says our Red Hat machines are vulnerable to CVE-2015-1234, is this true?” or “Why hasn’t Red Hat updated software package foo to version 1.2.3?” In other words, our customers (and their auditors) are not sure whether or not we have fixed a security vulnerability, or if a given package is up to date with respect to security issues. In an effort to help our security-conscious customers, Red Hat makes this information available in an easy-to-consume format.

What’s the deal with CVEs?

Red Hat is committed to the CVE process. To quote our CVE compatibility page:

We believe that giving our users accurate and complete information about security issues is extremely important. By including CVE names when we discuss security issues in our services and products, we can help users cross-reference vulnerabilities so they spend less time investigating and categorizing security events.

Red Hat has a representative on the CVE Editorial Board and declared CVE compatibility in April 2002.

To put it simply: if it’s a security issue and we fix it in an RHSA, it gets a CVE. In fact, we usually assign CVEs as soon as we determine a security issue exists (additional information on determining what constitutes a security issue can be found on our blog).

How to tell if your software is fixed?

A CVE can be queried at our public CVE page.  Details concerning the vulnerability, the CVSS v2 metrics, and security errata are easily accessible from here.

To verify your system is secure, check which version of the package you have installed; if the NVR of your installed package is equal to or higher than the NVR of the package in the RHSA, then you’re safe.

What’s an NVR?

The NVR is the Name-Version-Release of the package. The Heartbleed RHSA lists packages such as: openssl-1.0.1e-16.el6_5.7.x86_64.rpm. From this we see a package name of “openssl”, then a hyphen, a version of “1.0.1e”, another hyphen, and a release of “16.el6_5.7”. Assuming you are running RHEL 6 on x86_64, if you have openssl version 1.0.1e release 16.el6_5.7 or later you’re protected from the Heartbleed issue.
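You can see the installed NVR with a plain rpm query. As a quick sketch, on a RHEL 6 x86_64 system patched for Heartbleed the query would report the NVR from the RHSA (or a later one):

rpm -q openssl
openssl-1.0.1e-16.el6_5.7.x86_64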

Please note, there is an additional field called “epoch”, which actually supersedes the version number (and release). Most packages do not have an epoch number, but a package with a larger epoch number always overrides a package with a lower epoch, regardless of version. This can be useful, for example, if you need a custom modified version of a package that also exists in RPM repos you are already using: by assigning an epoch number to your package RPM you can override the same package from another repo even if it has a higher version number. Be aware, though, that if you use packages with the same name and a higher epoch number you will not get security updates unless you specifically create new RPMs with that epoch number and the security update included.
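The epoch is hidden from the default rpm output, but you can ask for it explicitly with a query format string (standard rpm syntax; a missing epoch prints as “(none)” and is treated as zero):

rpm -q --qf '%{EPOCH}:%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' openssl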

But what if there is no CVE page?

As part of our process the CVE pages are automatically created if public entries exist in Bugzilla.  CVE information may not be available if the details of the vulnerability have not been released or the issue is still embargoed.  We do encourage responsible handling of vulnerabilities and sometimes delay CVE information from being made public.

Also, CVE information will not be created if the software we shipped wasn’t vulnerable.

No errata yet? Am I affected? Reading the Bugzilla whiteboard

In some cases an issue is so new that we may not have released errata for it yet. In this case you can check the CVE bug in Bugzilla to get some more information. For example, with the Heartbleed issue you can simply look it up in Bugzilla, as shown below. So great, now you know that we know about the vulnerability, but how can you tell if the products you use are affected? Red Hat Product Security uses the Bugzilla whiteboard to hold a list of affected products; you’ll need to be logged in to fully view it. Accounts are free to create, just click “New Account” at the top of the Bugzilla screen.
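Red Hat Bugzilla accepts a CVE name as a bug alias, so the Heartbleed tracking bug (Heartbleed is CVE-2014-0160) can be opened directly by URL:

https://bugzilla.redhat.com/show_bug.cgi?id=CVE-2014-0160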

[Image: Bugzilla header example]

The whiteboard is located towards the bottom of the left side:

The easiest way to access the full whiteboard is to copy and paste the contents into a text file.  The first part you’ll likely want to skip:

impact=important,public=20140407,reported=20140407,
source=upstream,cvss2=5/AV:N/AC:L/Au:N/C:P/I:N/A:N,

as the latter portion is more relevant to administrators:

rhel-5/openssl=notaffected,rhel-5/openssl097a=notaffected,
rhel-6/openssl=affected,rhel-6/openssl098e=notaffected,
rhel-7/openssl=affected,rhel-7/openssl098e=notaffected,
rhes-2.1/openssl=affected,openstack-rdo/openssl=affected,
eap-5/openssl=notaffected,eap-6/openssl=notaffected,
jboss/others=notaffected,rhev-m/rhev-hypervisor=affected,
fedora-all/openssl=affected,fedora-all/mingw-openssl=affected,
epel-5/mingw32-openssl=notaffected,
rhev-m-3/mingw-virt-viewer=affected

The first part, the product alias (e.g. rhel-5 and rhel-6), is self explanatory. The second part is the package (the source RPM name).  Finally we have the state of the package; the main options are:

  • affected = the package is affected by the CVE
  • notaffected = the package is not affected by the CVE
  • new = the package needs to be triaged to see if it’s affected by the CVE
  • defer = the package is affected by the CVE and will potentially be fixed at a later date
  • wontfix = the package is affected by the CVE but won’t be fixed

How to tell if your system is vulnerable?

If you have a specific CVE or set of CVEs that you are worried about you can use the yum command to see if your system is vulnerable. Start by installing yum-plugin-security:

sudo yum install yum-plugin-security

Then query the CVE you are interested in, for example on a RHEL 7 system without the OpenSSL update:

[root@localhost ~]# yum updateinfo info --cve CVE-2014-0224
===============================================
 Important: openssl security update
===============================================
 Update ID : RHSA-2014:0679
 Release : 
 Type : security
 Status : final
 Issued : 2014-06-10 00:00:00
 Bugs : 1087195 - CVE-2010-5298 openssl: freelist misuse causing 
        a possible use-after-free
 : 1093837 - CVE-2014-0198 openssl: SSL_MODE_RELEASE_BUFFERS NULL
   pointer dereference in do_ssl3_write()
 : 1103586 - CVE-2014-0224 openssl: SSL/TLS MITM vulnerability
 : 1103593 - CVE-2014-0221 openssl: DoS when sending invalid DTLS
   handshake
 : 1103598 - CVE-2014-0195 openssl: Buffer overflow via DTLS 
   invalid fragment
 : 1103600 - CVE-2014-3470 openssl: client-side denial of service 
   when using anonymous ECDH
 CVEs : CVE-2014-0224
 : CVE-2014-0221
 : CVE-2014-0198
 : CVE-2014-0195
 : CVE-2010-5298
 : CVE-2014-3470
Description : OpenSSL is a toolkit that implements the Secure 
Sockets Layer

If your system is up to date or the CVE doesn’t affect the platform you’re on then no information will be returned.
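If you just want a terse list of the applicable advisories rather than the full details, the same plugin supports a list subcommand (again, empty output means you are not affected or already patched):

yum updateinfo list --cve CVE-2014-0224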

Conclusion

Red Hat Product Security makes available as much information as we can regarding vulnerabilities affecting our customers.  This information is available on our customer portal as well as within the software repositories. As you can see it is both easy and quick to determine if your system is up to date on security patches with the provided information and tools.

The following checklist can be used to check if systems or packages are affected by specific security issues:

1) Check if the issue you’re concerned about has a CVE and check the Red Hat CVE page:

https://access.redhat.com/security/cve/CVE-2014-0224

2) Check to see if your system is up to date for that issue:

sudo yum install yum-plugin-security 
yum updateinfo info --cve CVE-2014-0224

3) Alternatively you can check the package NVR in the RHSA errata listed in the CVE page (in #1) and compare it to the packages on your system to see if they are the same or greater.

4) Check the Bugzilla entry for whiteboard info:

https://bugzilla.redhat.com/show_bug.cgi?id=[CVE NAME]

August 26, 2014

Yellow Sticky of Doom Revisited

Talking with security experts about the Yellow Sticky of Doom shows that the situation isn’t entirely bleak. They agree that posting notes on a monitor – or the bottom of a keyboard – is bad.

However, they recognize that (somewhat secure) passwords are difficult to remember and will be written down. They point out that combining written passwords with physical security can actually be a reasonable approach.

If you write your password down and place it in a locked desk drawer you achieve a significant level of security. Getting the password out of sight is a good start – rifling through someone’s desk drawer is usually noticed. And if you lock your desk when you leave you are establishing a reasonable level of commercial security. And the good news about desk drawers is that they can’t be accessed through the Internet!

This approach assumes that you have a reasonable level of physical security for your business or home. If you don’t, password security may be the least of your concerns.

There are a variety of ways to increase physical security, such as control of keys, using secure filing cabinets, or using a safe. Something as simple as a locking bar for a four-drawer file cabinet provides significantly enhanced physical security beyond that of common desk locks.

This is an area where you need to look at security from a higher level. Once you recognize that passwords by themselves provide poor security and that passwords will be written down you can develop a rational approach. Consider computers, networks, people, policies, and physical security together – develop a real security policy, rather than passing down edicts that don’t work.

You can’t abolish the Yellow Sticky of Doom. But moving it into a locked desk drawer is probably good enough.


August 21, 2014

Phishing

Kerberos was slow when talking to my demo machine. As part of debugging it, I was making DNS changes, so I pointed my machine directly to the DNS server. It was at my hosting provider, and authoritative for my domain.

As I tend to do, I idly checked Facebook. It’s a bad habit, like biting nails. Sometimes I’m not even aware that I am doing it. This time, however, a browser warning brought me up short:

“Security Error: Domain Name Mismatch”

The certificate reported that it was valid for a domain that ended in the same domain name as the nameserver I was pointing at.

Someone just like me had the ability to push up whatever they wanted to the DNS server. This is usually fine: only the authoritative DNS server for a site is allowed to replicate changes. It did mean, however, that anyone who was using this particular DNS server would be directed to something the attackers were hosting themselves. I’m guessing it was a phishing attempt, as I did not actually go to their site to check.

Most of us run laptops set up to take DNS from whatever DHCP server we connect to. That means that at a coffee shop, the local library, or the gym, we are running against an unknown DNS server. The less trusted the location, the less reason to trust the DHCP server.

This is a nasty problem to work around. There are things you can do to mitigate it, such as whitelisting DNS servers, but the onus should not be on the end users. DNSSEC attempts to address these issues. Until we have that, however, use HTTPS wherever possible. And check the certificates.
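One quick manual check, sketched here assuming the dig utility (from the bind-utils package) is available, is to compare what your current resolver says against a known public resolver; wildly different answers for a major site are a red flag:

# Answer from the DHCP-provided resolver
dig +short www.facebook.com
# Answer from a known public resolver, for comparison
dig +short www.facebook.com @8.8.8.8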

August 20, 2014

Greatest Threat: Yellow Sticky of Doom

We now get to what I consider the greatest threat to computer security: the Yellow Sticky of Doom!

[Image: a yellow sticky note covered with generated passwords]

Passwords written down on yellow sticky notes. These are everywhere.

What is the difference between a secure facility and an insecure facility? In an insecure facility the yellow sticky notes are stuck to monitors. In a secure facility the yellow sticky notes are stuck to the bottom of the keyboard. In really secure facilities they are in desk drawers – and maybe even locked up!

The solution is obvious: ban people from writing down their passwords!

Except that this won’t work. Full stop. Period. Won’t. Work.

Why? Because passwords are crap for security.

Passwords that are difficult to guess or to crack with a brute force attack are impossible for people to remember – look at the ones in the yellow sticky above! All of these passwords were produced by a password generator with a high security setting. Anyone who can remember one of these passwords scares me!

Consider the usual guidelines for producing a secure password: 12-16 characters, no dictionary words, a combination of upper case, lower case, numbers, and punctuation. And changed every 1-6 months.

Right….

Human brains don’t work this way.

Correct Horse Battery Staple

If you want people to actually remember passwords, consider the way the human brain works. Look at XKCD on Password Strength: this is an example of a password that a human can remember. It builds on the way the mind and memory work, through chunking, context, and pattern recognition. Correct Horse Battery Staple has become an Internet meme – a code term referencing a way to make passwords somewhat work.

But, can your system handle it? Do you allow passwords this long? Do you allow spaces in passwords?

And look at your policies. If a person can remember a word, it is in a dictionary! The only thing a “no dictionary words” policy does is guarantee that passwords will be written down.

At a minimum, encourage pass phrases rather than classical passwords.
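As a small sketch of what that can look like (assuming a word list at /usr/share/dict/words, as shipped by the words package), a memorable four-word passphrase can be generated with standard tools:

# Pick four random dictionary words and join them with spaces
shuf -n4 /usr/share/dict/words | tr '\n' ' '; echo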

If you actually care about security, implement multi-factor authentication – a combination of what you know, what you have, and what you are.

Traditional passwords serve only one purpose: to allow you to blame innocent users for your mistakes. They are no longer an effective security or authentication mechanism. Forget trying to stop people from writing them down and get serious about security.

Get rid of the Yellow Sticky of Doom by making it obsolete!


August 18, 2014

Threat: Joe the Backhoe Operator


Where Dennis the Weatherman is a proxy for all the threats nature can pose, Joe the Backhoe Operator is a proxy for man-made threats outside the data center.

Backhoe Fade is a familiar term in the telecommunications industry, where it refers to construction activities cutting cables. This can be anything from a single network link to a major fibre optic link affecting millions of people. The classic example is a backhoe operator digging in a field in the middle of nowhere who digs right through a cable, taking out a major telecommunications link.

Closely related to backhoe fade is damage to undersea cables, often from ships dragging anchors across the cables and severing them. And, of course, sharks… see “How Google Stops Sharks From Eating Undersea Cables”.

While not necessarily a classical security threat, and not a threat to system integrity in the same way as other threats we have discussed, backhoe fade is a great threat to system availability and business continuity.

Major data centers will typically have multiple redundant, physically separated network connections to allow them to route around network failures.

Unfortunately, it is much less common for individual buildings where people actually work to have such redundant network connections. If the hundreds of people in your office can’t get to the corporate data systems, it really doesn’t matter which end of the cable has been cut…


Musings on identity management

This post is an edited version of an email I sent to the Red Hat Identity Management (IdM) team mailing list that outlines the main take-aways from my first few months working on the FreeIPA identity management solution.

I’m over three months into my new gig on the identity management team at Red Hat now, so I would like to share a few thoughts about what I’ve learned about identity management.

I was excited to come into this role because of my innate interest in security and cryptography. I had little practical experience with PKI and security protocols beyond basic X.509/TLS and OpenPGP, so I have been relishing the opportunity to broaden my knowledge and experience and solve problems in this domain.

What I did not understand, when I joined, was just how much an effective IdM strategy and infrastructure can benefit businesses and large communities in the form of improved security and reduced risk (two sides of the same coin, one could argue) and of course, greater efficiency. The diversity of use cases and the versatility of our software to address these use cases also amazed me.

This added perspective motivates me to seek opportunities to talk to people and find out about their IdM needs and how existing offerings (ours or others) are falling short, and work out what we as a team can do to better meet and even anticipate their needs. It has also given me a foundation to explain to non-technical people what FreeIPA and related projects are all about, and help them understand how our solutions can help their business or community.

I say "community" above because I have begun to see that free software communities represent valuable proving grounds for FreeIPA. For example, a couple of weeks ago during PyCon Australia I was chatting to Nick Coghlan and learned that the Python community is currently struggling with a proliferation of identity silos – developer accounts, PSF memberships and roles, the main website, PyPI, and so on. Yet no one has put their hand up to address this. I didn’t quite commit to writing a PEP to fix all that (yet) but we agreed that this represents a great opportunity to employ FreeIPA to benefit an important project and community – important for our team and for Red Hat as well as for the software industry in general. How many other communities to whom we have links or on whom we rely could benefit from FreeIPA in a similar way? And how much will our solutions be improved, and new innovations discovered, by what we might learn in working with these communities to improve their identity management?

So, that’s most of what I wanted to say, but I want to thank you all for your assistance and encouragement during my first few months. It has been quite a shift adapting to working with a global team, but I am really enjoying working with you on Red Hat IdM and am excited for our future.

August 13, 2014

Fedora Security Team

Vulnerabilities in software happen.  When they get fixed it’s up to the packager to make those fixes available to the systems using the software.  Duplicating much of the response efforts that Red Hat Product Security performs for Red Hat products, the Fedora Security Team (FST) has recently been created to help packagers get vulnerability fixes downstream in a timely manner.

At the beginning of July, there were over 500 vulnerability tickets open* against Fedora and EPEL.  Many, but not all, of these vulnerabilities already had patches or releases available to remedy the problems.  The Team has already found several examples of an upstream not knowing that a vulnerability existed, and was able to get the issue fixed quickly.  This is one of the reasons having a dedicated team to work these issues is so important.

In the few short weeks since the Team was created, we’ve already closed 14 vulnerability tickets and are working on another 150.  We hope to be able to work in a more real-time environment once the backlog decreases.  Staying in front of the vulnerabilities will not be easy, however.  During the week of August 3rd, 27 new tickets were opened for packages in Fedora and EPEL.  While we haven’t figured out a way to get ahead of the problem, we are trying to deal with the aftermath and get fixes pushed to the users as quickly as possible.

Additional information on the mission and the Team can be found on our wiki page.  If you’d like to get involved please join us for one of our meetings and subscribe to our listserv.


* A separate vulnerability ticket is sometimes opened for different versions of Fedora and EPEL resulting in multiple tickets for a single vulnerability.  This makes informing the packager easier but also inflates the numbers significantly.

August 11, 2014

Getting Service Users out of LDAP

Most people cannot write to the LDAP servers except to manage their own data. Thus, OpenStack requiring the service users to be in LDAP is a burden that many IT organizations cannot assume. In Juno we have support for multiple backends for domains.

Starting with a devstack install (and an unrelated bug fix), I created a file: /etc/keystone/domains/keystone.freeipa.conf. The naming of this file is essential: keystone.<domain_name>.conf is the expected form. It looks like this:

# The domain-specific configuration file for the test domain
# 'domain1' for use with unit tests.

[ldap]
url=ldap://ipa.cloudlab.freeipa.org
user_tree_dn=cn=users,cn=accounts,dc=ipa,dc=cloudlab,dc=freeipa,dc=org
user_id_attribute=uid
user_name_attribute=uid
group_tree_dn=cn=groups,cn=accounts,dc=ipa,dc=cloudlab,dc=freeipa,dc=org


[identity]
driver = keystone.identity.backends.ldap.Identity

And I made the following changes to /etc/keystone/keystone.conf:

[identity]
domain_specific_drivers_enabled=true
domain_config_dir=/etc/keystone/domains

I restarted HTTPD to get Keystone to pick up my changes:

sudo systemctl  restart httpd.service

Then I followed the steps from my earlier blog post to list the domains:

export TOKEN=`curl -si -d @token-request-admin.json -H "Content-type: application/json" http://localhost:35357/v3/auth/tokens | awk '/X-Subject-Token/ {print $2}'`
curl -s -H"X-Auth-Token:$TOKEN" -H "Content-type: application/json" http://localhost:35357/v3/domains | jq '.domains[] | {id, name}'

Note that I used a little jq to make things easier to read.

To add the new domain:

curl  -H"X-Auth-Token:$TOKEN" -H "Content-type: application/json" -d '{"domain": {"description": "FreeIPA Backed LDAP Domain", "enabled": true, "name": "freeipa"}}'  http://localhost:35357/v3/domains

Now, to test out the new domain, I have a user from the LDAP server: good old Edmund. Here is token-request-edmund-freeipa.json:

{
    "auth": {
        "identity": {
            "methods": [
                "password"
            ],
            "password": {
                "user": {
                    "domain": {
                        "name": "freeipa"
                    },
                    "name": "edmund",
                    "password": "freeipa4all"
                }
            }
        }
    }
}
-sh-4.2$ curl -si -d @token-request-edmund-freeipa.json -H "Content-type: application/json" http://localhost:35357/v3/auth/tokens 
HTTP/1.1 201 Created
Date: Mon, 11 Aug 2014 14:52:14 GMT
Server: Apache/2.4.10 (Fedora) mod_wsgi/3.5 Python/2.7.5
X-Subject-Token: PKIZ_eJx1VMmSozgUvPMVfa-oKBaD8aEPEsIgXJILEJtuBmw2eTfG5uuHcnXEzBxat5ehzHxPynjv79OBtoPpL4uE38W7RDC2FkPmrY4c1_eCAr-wjhNmIMsCvW8B335YDLiwii6oIpbTTngkO1Z4dkKcaxLybRdsJpDN7Kqyje3Tk3P1JvIGG9i9NpuUjmUSd6m6lHF7rHCLB8L8G0HVjbBlJE2FQRk2CIt6OgKFjL6eNPiKLU9s3eCZpaRZN3C-C4cK7xVROvWduy8sxwdYS4VGtVzzOizkPyR4Kvbx-DfHyeh_hhIZ7ecfR6VQ4-c3aRqjy1Wl3iSzlzvei-4lto9mJMlmdMzGSWTM2qCW1gzPqIp1uqdijYjME77nrf3EzXfLep0n0SQCGn7wBE_EkIXe4tMCzSbxX7hE2u5BR9FRFj04KkbSQkEZUAnzGo6EyJjYUyduCYINYbAlaiSTlsgk8ZU1657S2glqymx1ndgyZ2QkqGz5z0h9lijip_W4y9O455a32KXyo6ocOH0_3PkYSoDBgiyLhzUCD1Y0hiBjQMSM-LMBgQzFvo8RiOP8QEWJ7DUBgwOUyIbDsIwTfZR46j9QC8gP-UhgHPfTY8okqIZl9RJACCy0Uit6ntZ1nsIrD_U2V-XvkA0SHLLlapghUB8H5P83kTaEPvgOF-hIdpO7UNSmnLZXeLqXffBhOUwPUkvqdg2vHx_mmSu1YVy9ozbUnLAzTd5clJkqKoubnU_xTtfHMdymNRgRCdEFB4fDoC2FxKNP9iXicA6avAtquQzmJ8q-jK3XDHvVgIvNTW9P84-3dsvNNemCw2fDqUfHWX18rORAuga7U2WaXChPLVF7S9bedL3Tmtonpb4KFqnBBdP7Q9-cD0O5L8doU503fsVcC33V50pq7-N93n0cyf0xLE1MS3exq4P0pmwwnNkm0a1NWG2s3fnpuKWD36gd3k9qtu0uF4Nw5ErZ7cbnZn9wL8FQ6NvKzDy7genTjYRjlm_bL_eL59Pl-c1V9B20i3b3qWIOVsPv39JrI9gU_bsd_gH7ultg
Vary: X-Auth-Token
Content-Length: 314
Content-Type: application/json

{"token": {"issued_at": "2014-08-11T14:52:15.705349Z", "extras": {}, "methods": ["password"], "expires_at": "2014-08-11T15:52:15.705312Z", "user": {"domain": {"id": "e81f8763523b4a9287b96ce834efff12", "name": "freeipa"}, "id": "29179d551d7320e50612bd9ea9f4ec00b10c3e42341d59928da5169a4e3307cf", "name": "edmund"}}}-sh-4.2$ 

It works! Henry Nash worked hard on this. It was supposed to land in Havana, but it was delayed by several issues. Probably the biggest was the user ID. Notice in the token response above that Edmund’s user ID is not the uid field from the LDAP query, but rather 29179d551d7320e50612bd9ea9f4ec00b10c3e42341d59928da5169a4e3307cf. This value is a SHA256 hash of the LDAP-assigned ID and the domain_id combined. This mechanism was selected to prevent conflicts between two different identity providers assigning the same value. To confirm:

$ echo "select * from id_mapping;" | mysql keystone
public_id	domain_id	local_id	entity_type
29179d551d7320e50612bd9ea9f4ec00b10c3e42341d59928da5169a4e3307cf	e81f8763523b4a9287b96ce834efff12	edmund	user
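The same idea can be sketched from the shell. Note that this is illustrative only: the exact byte layout Keystone feeds into the hash is an internal detail, so the hypothetical concatenation below will not necessarily reproduce the public_id shown above:

# Hash the local LDAP ID together with the domain_id (assumed layout)
echo -n "edmunde81f8763523b4a9287b96ce834efff12" | sha256sum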


Threat: Dennis the Weatherman

[Image: Hurricane Sandy]

Dennis the Weatherman is a proxy for the threats that nature presents. Superstorm Sandy is a recent example of the power of weather. Some places received over a half meter of rain in less than 24 hours, as well as high winds. The combination of flooding, storm surge, high winds and downed trees wreaked havoc on businesses and data centers over an extended area.

Superstorm Sandy highlighted many factors around disaster preparedness. Some companies were able to fail over to geographically remote data centers and continue operations with minimal disruption.

Some companies had mixed experiences. One of the best examples is Peer 1 Hosting – their data center was well above the flooding and they had backup generators. Unfortunately, their diesel fuel tanks for the backup generators were in the basement… They had to form a “bucket brigade” to carry diesel fuel up 17 flights of stairs.

Other companies were simply down. Without power or network connectivity there was nothing they could do. Worse, their datacenters may have been flooded and the equipment damaged or destroyed. Worst case would be a flooded datacenter without adequate offsite disaster recovery or even backups; some companies went out of business.

In addition to hurricanes, you have to worry about flooding, tornadoes, fire and wildfire, blizzards, and earthquakes.

The good news is that the computer systems in the basement of a flooded data center which is burning down in the middle of an earthquake are not likely to be hacked…

Yes, weather is a clear and present danger to system integrity. Plan accordingly. PLAN accordingly! Exactly what will you do if your data center is under water and located in the middle of an entire region of downed trees, with roads blocked by thousands of people trying to escape? And maybe even Starbucks closed!


August 06, 2014

Flock to Fedora: Day One (Afternoon)

Overview

The afternoon today has been absolutely filled with excellent talks. As with this morning, I’ve captured most of them for your entertainment and edification below. These blog entries are getting very long; perhaps I need an index.

Sessions

Fedora Workstation: Goals, Philosophy and Future

“The desktop is not dying!” cries Christian Schaller as he dives into his talk on the Fedora Workstation. In a throwback to the old saw about the Year of Linux on the Desktop, he points out that 35% of laptop sales are Chromebooks and that there are now over six hundred games available on Steam for Linux.

So what is the Fedora Workstation trying to achieve? What are its driving principles? First of all, Fedora Workstation is the only desktop out there that isn’t trying to sell you (in other words, capture information about you to sell to someone else). Fedora Workstation will consider end-user privacy one of its highest goals. By default, Workstation will not share your data.

The recent switch to focusing on delivering specific Fedora Products gave the Workstation an opportunity to start defining standards and maintaining them within a particular Product. Traditionally, Fedora was effectively a collection of exactly what upstreams provided. With the Workstation, we’ll be able to curate this code and integrate it into a cohesive whole (Author’s note: bingo!). One of the key points here is that by providing a known base, we give developers a better idea of what they can rely upon when writing their applications and services.

Christian continues on to discuss more about the future plans of the Workstation. The first of these was a discussion about how containers will fit into the Workstation world. Significant research is going into figuring out how to use container images to isolate and secure applications from one another, which will offer both flexibility and security in the long run. With some of the features offered by Wayland, this will eventually provide an excellent (and safe!) experience for users. Christian also notes that their goal is to make this all seamless from the users’ perspectives. If they notice it’s been done (without being told), they failed.

The Workstation Working group has been doing extensive research on Docker as a container implementation but has yet to make a decision on whether to standardize on Docker or else use a customized namespacing solution.

Christian then talks a bit about how Workstation is going to try to grow the platform. He acknowledges that there has been a slow decline of usage in Fedora and wants to put effort into drawing people back in. Some of the plans for this involve increased publicity through blogging and the Fedora Marketing groups. It will also involve finding ways to reduce the split between applications that run on Fedora and Red Hat Enterprise Linux. In part, that involves making sure that newer APIs are available in RHEL and older interfaces remain available in Fedora.

Another key piece of Fedora Workstation’s long-term strategy involves the creation and maintenance of a true ABI at all levels of the stack. This will help reduce the “moving target” aspect of Fedora and provide more guarantees about long-term sustainability of an application written for Fedora.

Christian takes a moment to talk about the target users. A lot of the conversation so far has been about making Fedora better for developers, but from his perspective it is also necessary to build a platform that’s useful to creators of all sorts. “Creators” may include people doing video editing, 3D modeling, music mixing and all sorts of other creation tasks.

After this, the conversation moved on to discuss a little bit of the challenges they face. Marketing and PR is going to be a significant challenge, particularly correcting some existing negative press that we’ve gathered over the last few years. We’ve also got to convince ourselves. There have been a lot of big changes in Fedora lately (the Three Product Plan being the most visible) and it’s clear that there’s going to be a period of adjustment as we course-correct and really figure out how we’re going to move forward.

Christian then talks about the advent of web-based applications and how that relates to the Workstation. He notes that there are users (even at Red Hat) that never open any local application other than their web browser. From this perspective, it’s very clear that we need Fedora Workstation to be a powerful mechanism for running web applications. So one of the things being worked on is ways that web applications can be tied into the desktop environment in a more integrated way.

Comments during and after the talk focused a great deal on the publicity and marketing aspects of things. It was noted that the more targeted user groups lend themselves better to more controlled messaging. This will bring in more users in those particular groups who will in turn pass their good experience on by word-of-mouth.

Several questions were also asked about specific feature enhancements to things like Gnome Online Accounts (particularly around the earlier discussion of web applications). Christian indicated that the next  immediate efforts on that front will likely be focused around a better calendaring experience.

Evolving the Fedora Updates Process

I next attended a talk about the Fedora update process given by Luke Macken, author and maintainer of the Bodhi update system.

Once upon a time, in Fedora Core 1 through Fedora Core 3, updates were handled via a manual process involving emails to release engineering. Starting with Fedora Core 4, updates moved to a private internal updating system that was available only to Red Hat employees.

The modern world of Bodhi began in Fedora 7, at the same time that Fedora Core and Fedora Extras were merged. It introduced the concept of karma, was written in TurboGears 1.x, and is still in production today, seven years and many revisions later.

Bodhi does a lot of things behind the scenes, being both extremely intricate and very inefficient. Luke described a number of issues that have cropped up over the years, including inflexible SQL routines and the karma process outliving its usefulness.

Luke next took a little side-trip to tell us about some of the more entertaining glitches that have cropped up over the years, including the infamous Fedora 9 GPG re-keying and numerous crashes during update pushes.

After that, he moved on to discussing the plans for the Bodhi2 project. The plan is to have it land sometime after the Fedora 21 release. We don’t want to rely on it for zero-day updates, but we’ll phase it in soon after and it should hopefully be a graceful transition.

Some of the major changes in Bodhi2 will be a comprehensive REST API, a new and improved command-line tool and major changes to the UI that should provide a better experience for the users.

Another great feature of Bodhi2 is that it will integrate with fedmsg and the new Fedora Message Service to reduce the amount of “spam” email that Bodhi sends out. Luke dives in a bit to talk about the datagrepper and datanommer mining tools that power the notification service and the set of filters that you can opt into.

Luke showed off how Bodhi2 will be tightly integrated with Taskotron to perform automated testing on updates, as well as the integration with the Fedora Badges (there are lots of them available for Bodhi!) and then on to the feedback system. He called out both the fedora-easy-karma command-line tool and the fedora-gooey-karma GUI tool for managing karma updates on Bodhi1 (and noted that they will be working together to support Bodhi2 as well).

Then he went and left me slack-jawed with the new submitter process, automating almost everything and making it almost unbelievably simple. Adding to that, the new karma system allows the submitter to select fine-grained karma controls, so they can request that specific tests have to pass karma before accepting it into the stable repository.

The talk finished up with some prognosticating about the future, particularly talking about being able to run AMI and other cloud image updates through as well.

State of the Fedora Kernel

The next stop on my whirlwind tour of the wide world of Fedora was Josh Boyer’s annual discussion on Fedora’s treatment of the kernel. First up on the agenda was an overview of the release process. Fedora focuses on having a common base kernel across all stable releases (which means that bugs and features are shared). Fedora rebases to the latest upstream kernel on a regular basis, staggering the updates back to the two stable releases.

Josh described how the old process for kernel updates was to be more conservative about updates on older Fedora releases. However, a few years ago the process changed: updates now come faster, which keeps Fedora much closer to the upstream kernel.

The talk then moved on to discussing the state of open bugs in the kernel. During this talk in 2013, there were 854 open bugs against the Fedora Kernel. After last year, the kernel maintainers sat down and spent a lot of time to knock down the bugs. Today it’s down to 533, but this is still not good enough. There will be a talk on Saturday about some ways to address this.

Josh pointed out several consistent problem areas: WiFi support, suspend/resume, video playback and brightness settings, platform-specific drivers (like Fn keys) and Bluetooth. “All the stuff that involves ACPI is usually broken on some platform, somewhere”.

He then moved on to talking about how they handle bugs. He pointed out that if someone files a bug, they’re contacted (the bug is set NEEDINFO) every time the kernel is rebased. If after two weeks the reporter doesn’t confirm a fix (or a continuing bug), the bug is closed INSUFFICIENT_INFO.

What does all this mean? In short, it means that the Fedora kernel maintainers are permanently saturated, the work is never-ending and they would really appreciate if people would take vacations at different times than the maintainers so they don’t always return to 200+ extra bugs. Additionally, they really need help with triage, but it’s difficult to find anyone to do so, mainly because bug triage is admittedly boring. Some steps have been made in the last couple years that really helps, particularly the ABRT bug reports and the retrace server that helps narrow down which bugs are having the widest impact. The retrace server in particular keeps statistics on the number of reports, so that helps in prioritizing the fixing efforts.

After the bug discussion, Josh moved on to talking about the situation with Fedora 21. The plan is to release either kernel 3.16 or 3.17 at release, depending on schedule slips. During the Fedora 21 process, the kernel maintenance has actually been fairly calm, despite a set of new packaging changes.

During the Fedora.next process, the Fedora Cloud Working Group made requests of the kernel team to shrink down its size. There are a lot of optional components built into the kernel and many of these weren’t actually needed in a cloud environment. So the kernel team went and broke out the available modules into a core set and a common set above that. This made the minimal installation set much smaller and reduced the space on the cloud images substantially. In addition to this, they found ways to compress the kernel modules so the storage on disk shrank quite a bit as well.

Another useful feature that was added to packaging in Fedora 21 is support for automatically-generated RPM “Provides” listing the set of modules that are present in each of the packages. This will make it easier for packagers to specify dependencies on the appropriate package (and will continue working if modules move around).

The last major change in Fedora 21 is support for 64-bit ARM hardware (aarch64), which as was noted by an audience member is now available for general purchase. It works fairly well (thanks in large part to a herculean effort by Red Hat ARM engineers) and may be promoted to a primary architecture in Fedora 22 or 23. As a side-effect of this work, it’s going to be possible to replace the slow armv7hl builders in Koji with the new aarch64 builders that will be vastly more performant.

Josh then moved on to discuss the new kernel Playground, which is a new unsupported COPR containing some new experimental features. It tracks Fedora Rawhide and 21 and provides today the Overlayfs v23 (a union filesystem) and kdbus (the high-performance kernel D-BUS implementation). These are fairly stable patches to the kernel that are still out of the main tree and therefore not really suitable for Fedora proper (yet).

In the future, it may include other features such as kpatch and kgraft (the in-kernel infrastructure for supporting live-patching the kernel).

Advocating Fedora.next

After taking a short break to catch my breath and give my fingers a rest (these blog entries are a lot of work!), I went along to Christoph Wickert’s session Advocating Fedora.next. This session was largely (but not exclusively) directed at Fedora Ambassadors, informing them how best to talk about the Fedora.next initiative to the public.

He began his talk by addressing the question of “Why?”. Why did we need to change things so substantially? After providing the delightfully glib answer “Why not?”, Christoph described a bit about Fedora’s history. He pointed out quite eloquently how Fedora has always been about change. Fedora has never been afraid to try something new to improve.

He then tackled some of the non-reasons behind Fedora.next, specifically the rumors of our demise post-GNOME 3 and similar. The truth is that we have a strong brand with a large contributor base that is extremely closely linked to upstream development (more so than many, if not all, of the other distributions). We’ve had a decade of successes and Fedora 20 has been very positively reviewed overall.

Another such rumor was that the new products are a benefit only to Red Hat. The obvious rebuttal here is that separating products makes good sense, focusing on specific user sets. Also, Red Hat has no particular interest in a consumer workstation product.

A final common criticism is that the new working groups are a power-grab by Red Hat. The older governance has not changed (and remains community-elected). Furthermore, all of the working groups were self-nominated and contain numerous non-Red Hat community members, including Christoph himself.

Christoph then ruminates, “If we didn’t do it for those reasons, why did we do it?”. The first answer is that distributions have become boring. All major distributions have declining market share. The general impression is that the distro is a commodity and that the users don’t care which one they’re running. The user really only cares that their applications run. Things like cloud environments and containers blur this line as well.

Continuing on, Christoph calls out to the Fedora Mission statement: “The Fedora Project’s mission is to lead the advancement of free and open source software and content as a collaborative community” and asks whether we feel like we’ve been living it. With that in mind, he defines Fedora.next for us: an umbrella term for the changes in the way that we create and release Fedora.

Fedora.next was born of two proposals that were first publicised at last year’s Flock conference: “The Architecture for a More Agile Fedora” by Matthew Miller and “The Fedora Crystal Ball: Where are we going for the next five years?” by yours truly, Stephen Gallagher. As the last year has passed, the Fedora.next that we now know has become a merge of these two proposals.

Matthew’s proposal involved having different policies depending on how low in the stack that a package lived, with core functionality having stricter guidelines than packages up in the application layer. My proposal was around having three different development streams (Server, Workstation and Cloud) and possibly different release cycles. The modern vision will be a combination of the two, with three products.

Christoph also warned the Ambassadors to be aware that the Fedora installation DVD will be retired in Fedora 21, but noted that it was never truly well-maintained and that its replacements (netinstalls, spins and live media) should be at least sufficient.

“What is a product?”, Christoph asks, then answers. A Product is more than simply a Fedora Spin. Each product has a defined target audience, a mission statement, a product requirements document (PRD) and a technical specification. The mission statement, PRD and technical statement were all defined and discussed publicly by the Product Working Groups and ratified by the Board and FESCo. Each product contains features not present in older Fedoras and has its own working group with their own governance models.

Christoph stresses that this is not a power-grab but instead the opposite: it’s an opportunity to give more power to the specific people who are building the Products. As a member of the Workstation Working Group, he calls out to its Mission Statement and then discusses a few of the Workstation-specific cool features. He notes that what Fedora Workstation will look like in Fedora 21 will not be a huge difference from the classic Fedora Live image, but this will change over time as the Workstation comes into its own life.

He then continues on to discuss Fedora Server a bit, calling out the exciting new Cockpit system management console.

Moving on to the Fedora Cloud, Christoph asks for Matthew Miller in the audience to comment further on it. Matthew describes the pets vs. cattle metaphor and explains that Fedora Cloud is really meant to fill the “cattle” side of that metaphor. Matthew notes the work towards the Fedora Base Image and the Fedora Atomic effort.

Christoph notes that this is an excellent example of why Spins are not enough. For example, a distributed cloud image doesn’t really meet the definition of a Spin because it doesn’t install via anaconda.

The talk then moves on to discussing the remaining two working groups: “Base Design” and “Environments and Stacks”. Talking about the “Base Design”, he stresses that the idea is for the base to be as small as possible and provide a common framework for the Products to build on. The “Environments and Stacks” working group is focused on making developers’ lives easier, providing the ability to install known software (development) stacks, possibly with different versions side-by-side.

Christoph summarizes that there has been a great deal of misinformation put out there and he calls out to the Ambassadors and everyone else to explain what’s really going on, how it works and why it’s a positive change in Fedora. The message must be positive, because the change is exciting and there’s much more to come. He cautions “it’s not just ‘the next fedora’, it’s ‘Fedora.next’.”

Daily Summary

It’s really hard to pick out one specific thing to say about Flock’s first day. Every one of the speakers I listened to today was excited, engaging and clearly loves Fedora, warts and all. I think I’ll leave it there for today.


Flock to Fedora: Day One (Morning)

Overview

Today was quite a day! I attended many interesting sessions and it’s going to be quite difficult to keep this post to a reasonable length. I make no promises that I will succeed at that. Here’s a set of highlights, mostly my stream-of-consciousness notes from the sessions I attended. I’ll pare down the best details in a final wrap-up post on Saturday or Sunday. For now, consider this an exhaustively detailed set of live notes published as a digest.

This has gotten quite long as I write, so I’m going to break these up into morning and afternoon reports instead. Enjoy!

Sessions

Free and Open Source Software in Europe: Policies and Implementations

The first talk I attended today was the keynote by Gijs Hillenius, Free and Open Source Software in Europe: Policies and Implementations. It was an excellent overview of the successes and issues surrounding the use of Open Source software in the European public sector. The good news was that more and more politicians are identifying the value of Open Source, providing lower costs and avoidance of lock-in. The bad news is that extensive lobbying by large proprietary companies has done a lot of damage, thus preventing more widespread adoption.

That is all in terms of usage. Unfortunately, the situation soured somewhat because developers of public sector code are hesitant about making their code open-source (or contributing to other open-source projects directly). It was noted that the legal questions have all been answered (and plenty of examples of open-sourced public development exist). Unfortunately, a culture of conservatism exists and has slowed this advance.

A highlight of the talk was the Top three most visible open source implementations section. The first of these was the French Gendarmerie, which switched 72,000 systems over to Ubuntu Linux and LibreOffice desktops. While this author would certainly prefer that they had selected an RPM-based distribution, I have to bow to the reported 80% cost reduction.

The second major win was the conversion of 32,000 PCs in government and healthcare in Spain’s Extremadura region (mostly Debian).

The third one was the City Administration of Munich, who also converted 14,800 workstations from Windows to Ubuntu Linux and LibreOffice. Excellent quote on his slide (paraphrased because it went by too quickly): “Bill Gates: Why are you doing this, Mr. Ude?” “Ude: To gain freedom.” “Gates: Freedom from what?” “Ude: From you, Mr. Gates.”

Fedora QA – What Why How…?

The second talk I attended today was an introduction to Fedora QA from Amita Sharma, a Red Hat senior quality engineer.

Amita started by talking about why Fedora QA is important, citing that Fedora’s fast schedule would otherwise lead to a very buggy and unpleasant system to use. She showed us a few bugs in Bugzilla to explain how easy it can be to participate in QA. In the simplest sense, it’s no more than spotting a bug and reporting it.

Next she discussed a bit about the team composition, meeting schedule and duties. This involved things like update and release testing, creation of automation and validation testing, filing bugs, weekly meetings and scheduling and execution of Fedora Test Days.

Some of the interesting bits that Amita pointed out were some of the various available resources for Fedora QA. She made sure to call out the daily Rawhide Report emails to the devel@lists.fedoraproject.org mailing list, as well as Adam Williamson’s Rawhide Watch blog and the regular Security Update test notice.

She next launched into a detailed explanation of the karma process for Fedora testing updates. She explained how it is used to gate updates to the stable repository until users or QA testers validate that it works properly.

Amita moved on to describing the bug workflow process, from how to identify a bug to reporting it on Bugzilla and following it through to resolution.

The next phase of Amita’s talk focused on the planning, organization and execution of a Fedora Test Day, from proposing a Test Day to creating test cases and managing the process on IRC.

Where’s Wayland?

I was originally planning to attend the Taskotron talk by Tim Flink, but the room was over capacity, so I went over to the Wayland talk being given by Matthias Clasen (which was my close second choice).

I arrived a few minutes late, so I missed the very beginning and it took me a bit of time to catch up. Matthias was describing Wayland’s capabilities and design decisions at a very low level, in many cases over my head, when I came in, so I really regretted missing the beginning.

One of the fundamental principles of Wayland is that clients are fully isolated. There’s no root window and there’s no concept of where your window is positioned relative to other windows, no “grabs” of input and no way for one Wayland application’s session to interfere directly with another. (Author’s note: this is a key security feature and a major win over X11). Wayland will allow certain clients to be privileged for special purposes such as taking screenshots and accessibility tools. (Author’s note: I had a conversation earlier in the day with Hans de Goede where we talked about implementing Android-style “intents” for such privileged access so that no application can perform display-server-wide actions without express user permission. I think this is a great idea and we need to work out with user-experience designers how best to pull this off.)

Another interesting piece that Wayland will offer is multiple interfaces. For example, it will be much easier to implement a multi-seat setup with Wayland than it was with X11 historically.

Matthias then moved on to describe why we would want to replace X11. The biggest reason is to clean out a lot of cruft that has accumulated in the X11 protocol over the years (much of it unused in modern systems). Having both the compositor and renderer in the same engine also means that you can avoid a lot of excess and duplicated work in each of the window managers.

One very important piece is the availability of sandboxing. In X11, any application talking to the server has access to the entire server. This potentially means that any compromised X application can read anything made available to X from any other application. With Wayland, it becomes very easy to isolate the display server information between processes, which makes it much more difficult to escalate privileges.

Matthias also covered some of the concerns (or arguable disadvantages), the biggest of which was that compositor crashes will now destroy the display session, whereas on older X11 approaches it was possible to restart the window manager and retain the display. Other issues involved driver support; for example the nVidia driver does not support Wayland. (Note: the Nouveau driver works just fine, it’s the proprietary driver that does not.)

The next topic was GNOME support of Wayland, which has come a very long way in the last two years. GNOME 3.10 and later has experimental support for running on Wayland. The remaining gaps on Fedora 21 are mainly drag-and-drop support, input configuration and WACOM support. (Author’s note: I’ve played with GNOME 3.13.3 running atop Wayland. Much of the expected functionality works stably, but I wouldn’t yet recommend it for daily use. Given the rate of improvement I’ve seen, this recommendation may change at F21 GA release).

Matthias made a rather bold statement that the goal to hit before making Wayland the default in Fedora is that end-users should not notice any difference (Author’s note: presumably negative difference) in their desktop.


Threat: Dave the Service Technician


Dave is responsible for adding, upgrading and repairing systems. Without Dave, things will quickly go downhill in your data center.

While Dave is responsible for maintaining system integrity, he can also compromise it:

  • A drive has failed in a RAID5 set. You need to replace the failed drive and rebuild the RAID. Oops! Pulled the wrong drive. The RAID set has gone from degraded to dead. Time for a recovery operation!
  • Server17 in a rack of 36 1U “pizza box” servers needs to be power cycled. Dave hits the power button on Server18…
  • There is a short circuit in the power distribution unit in the server rack. Now you have 36 systems down!
  • Dave moves the wrong network cable in the wiring closet.
  • Don’t even think about what happens if Dave slips and bumps the Big Red Button!

[Image: emergency power off button]

And if Dave happens to be malevolent, he can do things like:

  • Slip a laptop or other small computer into the wiring closet and have it snoop the internal network for data.
  • Connect internal networks directly to the Internet.
  • Steal parts, supplies, and even complete systems. Look at the number of cases where good boards are replaced and then sold on eBay…

Basically, Dave is a proxy for all of the physical threats to system integrity that can occur in the data center.


Flock to Fedora: Intro

Today was a big day: we kicked off the first day of the four-day Flock conference. This is the second time we’ve run a Flock, which is the largest annual gathering of contributors to the Fedora Project. There are a great many excellent talks scheduled for Flock this year, and I’ll only be able to attend a handful of them. Fortunately, all of the talks (but not the hackfests and workshops, sorry!) will be streamed live and recorded at the Flock YouTube Channel. My plan is to take a few notes on each of the talks I attend and put up a summary each day with the highlights. Come back tonight for my review of the first day!


Edit: See also Máirín Duffy’s blog for an absolutely excellent breakdown of how to follow along with Flock remotely.



August 04, 2014

Threat: Sally the User


Unlike Sam the Disgruntled Employee from our last post, Sally doesn’t have an evil bone in her body. She is dedicated, hardworking, helpful, and committed to doing a good job.

Unfortunately, she doesn’t completely understand how the system works, and sometimes enters incorrect data.

Actually, this isn’t her fault – Tom the Programmer from a few posts back probably didn’t write a usable system! I’m convinced that “Enterprise Software” means software that is hideously expensive with a poor user interface that no one would voluntarily use. I often use the phrase as user friendly as a rabid weasel to describe software, and much of the mission critical software that companies run on meets this description. But, that is a digression – let’s get back to the main point.

Since Sally is helpful and considerate, she is likely to give Fred the System Administrator her password when he calls. This isn’t just a Sally issue; virtually everyone is vulnerable to social engineering; look at the success of spear phishing against senior executives.

Sally is also likely to let Sam the Disgruntled Employee use her system if he asks with a plausible reason.

Sally is representative of the majority of people in your company. She works hard and wants to do the right thing. The systems – both computer systems and corporate procedures – need to support her in getting her job done, be resistant to mistakes, and prevent malevolent entities from using her as an attack vector. This will be a combination of training, system design, software design, management, operations, and company policies and procedures.

Basically, systems need to be designed to help Sally succeed and to help prevent her from failing. This is the last place to use a heavy-handed “blame the employee for everything” policy – it is both counter-productive and ineffective.

To be blunt, the problems you have with Sally are system failures, not user failures – the system isn’t designed to be used by typical users in the real world. In many cases the security model is much like the old physics approach of simplifying things to make it easier to deal with, where a problem statement will begin with: “Postulating a spherical cow in a vacuum, what is the trajectory…”

Unfortunately, such idealizations fall apart when real world factors come into play!


July 31, 2014

Threat: Sam the Disgruntled Employee


I’m going to assert that Sam is the second greatest security threat you face. (We will encounter the greatest threat in a few more posts.) Depending on who you talk to, between 60% and 90% of corporate losses due to theft and fraud are from employees, not external threats.

This may be overstated in some areas; a lot of credit card theft and identity theft is external. See, for example, the theft of over 50M credit card numbers at Target. Still, much of the real world theft is internal.

Sam is unhappy with your company. He wants to take from it or to cause it harm. Sam may be committing fraud, copying internal documents to take to a competitor, posting damaging information on the Internet, or walking out the door in the evening with a bag full of your products or supplies.

You need both to watch for disgruntled employees and to minimize the damage they can do. Good management and good internal controls are your first line of defense. Constant awareness and vigilance are called for.

Above all, watch the people side. In some cases Sam is simply unethical – you need to find him and remove him. In other cases he is angry – this is often a management issue. In many cases he simply sees an opportunity that he can’t resist; solid internal controls will minimize this risk.

In any case, be aware that your greatest threats are usually inside your company, not outside of it!


July 30, 2014

Controlling access to smart cards

Smart cards are increasingly used in workstations as an authentication method. They are mainly used to provide public key operations (e.g., digital signatures) using keys that cannot be exported from the card. They also serve as data storage, e.g., for the certificate corresponding to the key. In RHEL and Fedora systems, low-level access to smart cards is provided by the pcsc-lite daemon, an implementation of the PC/SC protocol defined by the PC/SC industry consortium. In brief, the PC/SC protocol allows the system to execute certain pre-defined commands on the card and obtain the result. The pcsc-lite implementation uses a privileged process that handles direct communication with the card (e.g., using the CCID USB protocol), while applications communicate with the daemon using the SCard API. That API hides the underlying communication between the application and the pcsc-lite daemon, which is based on UNIX domain sockets.

However, there is a catch. As you may have noticed, there is no mention of access control in the communication between applications and the pcsc-lite daemon. That is because it is assumed that the access control included in smart cards, such as PINs, pinpads, and biometrics, is sufficient to counter most threats. That isn’t always the case. Smart cards typically contain embedded software in the form of firmware, so there will be bugs that can be exploited by a malicious application, and these bugs, even if known, are neither easy nor practical to fix. Furthermore, there are often public files (e.g., without the protection of a PIN) present on a smart card that, while intended for the smart card user, should not necessarily be accessible to all system users. Even worse, certain smart cards allow any user of a system to erase all smart card data by re-initializing the card. All of this led us to introduce additional access control for smart cards, on par with the access control used for external hard disks. The main idea is to provide fine-grained access control on the system and to allow policies such as “the user on the console should be able to fully access the smart card, but no other user may”. For that we used polkit, a framework used by applications to grant access to privileged operations. The reason for this decision is mainly that polkit has already been used successfully to grant access to external hard disks, and unsurprisingly the access control requirements for smart cards share many similarities with those for removable devices such as hard disks.

The pcsc-lite access control framework is now part of pcsc-lite 1.8.11 and will be enabled by default in Fedora 21. The advantages it offers are that it can prevent unauthorized users from issuing commands to smart cards, and prevent them from reading, writing or (in some cases) erasing any public data on a smart card. The access control is imposed during session initialization, reducing any potential overhead to a minimum. The default policy in Fedora 21 will treat any user on the console as authorized, since physical access to the console implies physical access to the card, but remote users, e.g., via ssh, or system daemons will be treated as unauthorized unless they have administrative rights.

Let’s now see how smart card access control can be administered. The system-wide policy for the pcsc-lite daemon is available at /usr/share/polkit-1/actions/org.debian.pcsc-lite.policy. That file is a polkit XML file that contains the default rules needed to access the daemon. The default policy that will be shipped in Fedora 21 consists of the following.

  <action id="org.debian.pcsc-lite.access_pcsc">
    <description>Access to the PC/SC daemon</description>
    <message>Authentication is required to access the PC/SC daemon</message>
    <defaults>
      <allow_any>auth_admin</allow_any>
      <allow_inactive>auth_admin</allow_inactive>
      <allow_active>yes</allow_active>
    </defaults>
  </action>

  <action id="org.debian.pcsc-lite.access_card">
    <description>Access to the smart card</description>
    <message>Authentication is required to access the smart card</message>
    <defaults>
      <allow_any>auth_admin</allow_any>
      <allow_inactive>auth_admin</allow_inactive>
      <allow_active>yes</allow_active>
    </defaults>
  </action>

The syntax is explained in more detail in the polkit manual page. The parts relevant to pcsc-lite are the action IDs. The action with ID “org.debian.pcsc-lite.access_pcsc” contains the policy for accessing the pcsc-lite daemon and issuing commands to it, i.e., accessing the UNIX domain socket. The action with ID “org.debian.pcsc-lite.access_card” contains the policy for issuing commands to the smart cards available to the pcsc-lite daemon. That distinction allows, for example, programs to query the number of readers and cards present without being able to issue any commands to them. Under both policies only active (console) processes are allowed to access the pcsc-lite daemon and smart cards, unless they are privileged processes.

Polkit is quite a bit more flexible, though. With it we can provide even finer-grained access control, e.g., to specific card readers. For example, if we have a web server that utilizes a smart card, we can restrict it to use only the smart cards under a given reader. These rules are expressed in JavaScript and can be added in a separate file in /usr/share/polkit-1/rules.d/. Let’s now see what the rules for our example would look like.

polkit.addRule(function(action, subject) {
    if (action.id == "org.debian.pcsc-lite.access_pcsc" &&
        subject.user == "apache") {
            return polkit.Result.YES;
    }
});

polkit.addRule(function(action, subject) {
    if (action.id == "org.debian.pcsc-lite.access_card" &&
        action.lookup("reader") == 'name_of_reader' &&
        subject.user == "apache") {
            return polkit.Result.YES;    }
});

Here we add two rules. The first allows the user “apache”, the user the web server runs as, to access the pcsc-lite daemon. That rule is needed because under our default policy only the administrator and the console user may access the daemon. The second rule allows the same user to access the smart card reader identified by “name_of_reader”. The name of the reader can be obtained using the commands pcsc_scan or opensc-tool -l, or programmatically, as sketched below.
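If the pyscard Python bindings happen to be installed, the same reader names can also be listed from Python – a minimal sketch (pyscard talks to the same pcsc-lite daemon):

from smartcard.System import readers  # pyscard

# Each entry is a reader name that can be used as the "reader"
# value in a polkit rule like the one above.
for reader in readers():
    print(reader)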

With these changes to pcsc-lite we manage to provide reasonable default settings for smart card users that cover most, if not all, typical uses. These defaults increase the overall security of the system by denying unauthorized users access to the smart card firmware, as well as to its data and operations.

July 29, 2014

Threat: Tom the Programmer


No discussion of system integrity and security would be complete without Tom.

Without the applications, tools, and utilities that Tom writes, computers would be nothing but expensive space heaters. Software, especially applications software, is the reason computers exist.

Tom is a risk because of the mistakes that he might make – mistakes that can crash an application or even an entire system, mistakes that can corrupt or lose data, and logic errors that can produce erroneous results.

Today, most large applications are actually groups of specialized applications working together. The classic example is the three tier application, which includes a database tier, a business logic tier, and a presentation tier. Each tier is commonly run on a different machine. The presentation and business logic tiers are commonly replicated for performance, and the database tier is often configured with fail-over for high availability. Thus, you add complex communication between these application components as well as the challenge of developing and upgrading each component. It isn’t surprising that problems can arise! Building and maintaining these applications is much more challenging than a single application on a single system.

Tom is also a risk because of the things he can do deliberately – add money to his bank account, upload credit card data to a foreign system, steal passwords and user identity, and a wide range of other “interesting” things.

If Tom works for you, look for integrity as well as technical skills.

Be aware that behind every software package is a programmer or a team of programmers. They are like fire – they can do great good or great damage. And, like fire, it is easy to overlook them until something bad happens.


OTP authentication in FreeIPA

As of release 4.0.0, FreeIPA supports OTP authentication. HOTP and TOTP tokens are supported natively, and there is also support for proxying requests to a separately administered RADIUS server.

To become more familiar with FreeIPA and its capabilities, I have been spending a little time each week setting up scenarios and testing different features. Last week, I began playing with a YubiKey for HOTP authentication. A separate blog about using YubiKey with FreeIPA will follow, but first I wanted to post about how FreeIPA’s native OTP support is implemented. This deep dive was unfortunately the result of some issues I encountered, but I learned a lot in a short time and I can now share this information, so maybe it wasn’t unfortunate after all.

User view of OTP

A user has received or enrolled an OTP token. This may be a hardware token, such as a YubiKey, or a software token like FreeOTP for mobile devices, which can capture the token simply by pointing the camera at the QR code FreeIPA generates.

When logging in to an IPA-backed service, the FreeIPA web UI, or when running kinit, the user uses their token to generate a single-use value, which is appended to their usual password. To authenticate the user, this single-use value is validated in addition to the usual password validation, providing an additional factor of security.

HOTP algorithm

The HMAC-based One-Time Password (HOTP) algorithm uses a secret key that is known to the validation server and the token device or software. The key is used to generate an HMAC of a monotonically increasing counter that is incremented each time a new token is generated. The output of the HMAC function is then truncated to a short numeric code – often 6 or 8 digits. This is the single-use OTP value that is transmitted to the server. Because the server knows the secret key and the current value of the counter, it can validate the value sent by the client.

HOTP is specified in RFC 4226. TOTP (Time-based One-Time Password), specified in RFC 6238, is a variation of HOTP that MACs the number of time steps since the UNIX epoch, instead of a counter.
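To make the mechanics concrete, here is a minimal HOTP/TOTP sketch in Python following those two RFCs (an illustration only; it is not the FreeIPA implementation):

import hashlib
import hmac
import struct
import time

def hotp(key, counter, digits=6):
    # HMAC-SHA1 over the 8-byte big-endian counter (RFC 4226)
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select a
    # 4-byte window, whose top bit is masked off
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(key, step=30, digits=6):
    # TOTP (RFC 6238) is HOTP over time steps since the UNIX epoch
    return hotp(key, int(time.time()) // step, digits)

# First test vector from RFC 4226, Appendix D
assert hotp(b"12345678901234567890", 0) == "755224"

Note that the counter only advances on the server after a successful validation – a detail that becomes relevant below.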

Authentication flow

The problem I encountered was that HOTP authentication (to the FreeIPA web UI) was failing about half the time (there was no discernible pattern to the failures). The FreeIPA web UI seemed like a logical place to start investigating the problem, but for a password (and OTP value) it is just the first port of call in a journey through a remarkable number of services and libraries.

Web UI and kinit

The ipaserver.rpcserver.login_password class is responsible for handling the password login process. It reads the request parameters and calls kinit(1) with the user credentials. Its (heavily abridged) implementation follows:

class login_password(Backend, KerberosSession, HTTP_Status):
    def __call__(self, environ, start_response):
        # Get the user and password parameters from the request
        query_dict = urlparse.parse_qs(query_string)
        user = query_dict.get('user', None)
        password = query_dict.get('password', None)

        # Get the ccache we'll use and attempt to get
        # credentials in it with user,password
        ipa_ccache_name = get_ipa_ccache_name()
        self.kinit(user, self.api.env.realm, password, ipa_ccache_name)
        return self.finalize_kerberos_acquisition(
            'login_password', ipa_ccache_name, environ, start_response)

    def kinit(self, user, realm, password, ccache_name):
        # get http service ccache as an armor for FAST to enable
        # OTP authentication
        armor_principal = krb5_format_service_principal_name(
            'HTTP', self.api.env.host, realm)
        keytab = paths.IPA_KEYTAB
        armor_name = "%sA_%s" % (krbccache_prefix, user)
        armor_path = os.path.join(krbccache_dir, armor_name)

        (stdout, stderr, returncode) = ipautil.run(
            [paths.KINIT, '-kt', keytab, armor_principal],
            env={'KRB5CCNAME': armor_path}, raiseonerr=False)

        # Format the user as a kerberos principal
        principal = krb5_format_principal_name(user, realm)

        (stdout, stderr, returncode) = ipautil.run(
            [paths.KINIT, principal, '-T', armor_path],
            env={'KRB5CCNAME': ccache_name, 'LC_ALL': 'C'},
            stdin=password, raiseonerr=False)

We see that the login_password object reads credentials out of the request and invokes kinit using those credentials, over an encrypted FAST (flexible authentication secure tunneling) channel. At this point, the authentication flow is the same as if a user had invoked kinit from the command line in a similar manner.

KDC

Recent versions of the MIT Kerberos key distribution centre (KDC) have support for OTP preauthentication. This preauthentication mechanism is specified in RFC 6560.

The freeipa-server package ships the ipadb.so KDC database plugin that talks to the database over LDAP to look up principals and their configuration. In this manner the KDC can find out that a principal is configured for OTP authentication, but this is not where OTP validation takes place. Instead, an OTP-enabled principal’s configuration tells the KDC to forward the credentials elsewhere for validation, over RADIUS.

ipa-otpd

FreeIPA ships a daemon called ipa-otpd. The KDC communicates with it using the RADIUS protocol, over a UNIX domain socket. When ipa-otpd receives a RADIUS authentication packet, it queries the database over LDAP to see if the principal is configured for RADIUS or native OTP authentication. For RADIUS authentication, it forwards the request on to the configured RADIUS server, otherwise it attempts an LDAP BIND operation using the passed credentials.
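In effect, the native OTP validation is an LDAP simple bind with the single-use value appended to the password. A minimal sketch of the equivalent operation using the python-ldap bindings (the URI and credentials are placeholders; the DN format matches the access log entries shown later):

import ldap  # python-ldap

uri = "ldap://ipa-2.ipa.local"
dn = "uid=ftweedal,cn=users,cn=accounts,dc=ipa,dc=local"
password, otp_value = "secret", "755224"

conn = ldap.initialize(uri)
try:
    # ipa-otpd's "bind start" / "bind end" corresponds to this operation;
    # a directory server plugin validates the appended OTP value
    conn.simple_bind_s(dn, password + otp_value)
    print("Access-Accept")
except ldap.INVALID_CREDENTIALS:
    print("Access-Reject")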

As a side note, ipa-otpd is controlled by a systemd socket unit. This is an interesting feature of systemd, but I won’t delve into it here. See man 5 systemd.socket for details.

Directory server

Finally, the principal’s credentials – her distinguished name and password with OTP value appended – reach the database in the form of a BIND request. But we’re still not at the bottom of this rabbit hole, because 389 Directory Server does not know how to validate an OTP value or indeed anything about OTP!

Yet another plugin to the rescue. freeipa-server ships the libipa_pwd_extop.so directory server plugin, which handles concepts such as password expiry and – finally – OTP validation. By way of this plugin, the directory server attempts to validate the OTP value and authenticate the user, and the whole process that led to this point unwinds back through ipa-otpd and the KDC to the Kerberos client (and through the web UI to the browser, if this was how the whole process started).

Diagram

My drawing skills leave a lot to be desired, but I’ve tried to summarise the preceding information in the following diagram. Arrows show the communication protocols involved; red arrows carry user credentials including the OTP value. The dotted line and box show the alternative configuration where ipa-otpd proxies the token on to an external RADIUS server.

[diagram: OTP authentication flow]

Debugging the authentication problem

At the time of writing, I still haven’t figured out the cause of my issue. Binding directly to LDAP using an OTP token works every time, so it is definitely not an issue with the HOTP implementation. Executing kinit directly fails about half the time, so the problem is likely to be with the KDC or with ipa-otpd.

When the failure occurs, the dirsrv access log shows two BIND operations for the principal (in the success case, there is only one BIND, as would be expected):

[30/Jul/2014:02:58:54 -0400] conn=23 op=4 BIND dn="uid=ftweedal,cn=users,cn=accounts,dc=ipa,dc=local" method=128 version=3
[30/Jul/2014:02:58:54 -0400] conn=23 op=4 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=ftweedal,cn=users,cn=accounts,dc=ipa,dc=local"
[30/Jul/2014:02:58:55 -0400] conn=37 op=4 BIND dn="uid=ftweedal,cn=users,cn=accounts,dc=ipa,dc=local" method=128 version=3
[30/Jul/2014:02:58:55 -0400] conn=37 op=4 RESULT err=49 tag=97 nentries=0 etime=0

The first BIND operation succeeds, but for some reason, one second later, the KDC or ipa-otpd attempts to authenticate again. It would make sense that the same credentials are used, and in that case the second BIND operation would fail (error code 49 means invalid credentials) due to the HOTP counter having been incremented in the database.
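Reusing the hotp() sketch from earlier, the suspected failure mode is easy to demonstrate: once the server-side counter has advanced past a successful validation, replaying the same value can only fail.

key, counter = b"12345678901234567890", 0  # RFC 4226 test key

submitted = hotp(key, counter)          # the token emits "755224"
assert submitted == hotp(key, counter)  # first BIND validates: err=0
counter += 1                            # counter is incremented in the database
assert submitted != hotp(key, counter)  # the replayed value now fails: err=49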

ipa-otpd does some logging via the systemd journal facility, so it was possible to observe its behaviour via journalctl --follow /usr/libexec/ipa-otpd. The log output for a failed login showed two requests being sent by the KDC, thus exonerating ipa-otpd:

Aug 04 02:44:35 ipa-2.ipa.local ipa-otpd[3910]: ftweedal@IPA.LOCAL: request received
Aug 04 02:44:35 ipa-2.ipa.local ipa-otpd[3910]: ftweedal@IPA.LOCAL: user query start
Aug 04 02:44:35 ipa-2.ipa.local ipa-otpd[3910]: ftweedal@IPA.LOCAL: user query end: uid=ftweedal,cn=users,cn=accounts,dc=ipa,dc=local
Aug 04 02:44:35 ipa-2.ipa.local ipa-otpd[3910]: ftweedal@IPA.LOCAL: bind start: uid=ftweedal,cn=users,cn=accounts,dc=ipa,dc=local
Aug 04 02:44:36 ipa-2.ipa.local ipa-otpd[3935]: ftweedal@IPA.LOCAL: request received
Aug 04 02:44:36 ipa-2.ipa.local ipa-otpd[3935]: ftweedal@IPA.LOCAL: user query start
Aug 04 02:44:37 ipa-2.ipa.local ipa-otpd[3935]: ftweedal@IPA.LOCAL: user query end: uid=ftweedal,cn=users,cn=accounts,dc=ipa,dc=local
Aug 04 02:44:37 ipa-2.ipa.local ipa-otpd[3935]: ftweedal@IPA.LOCAL: bind start: uid=ftweedal,cn=users,cn=accounts,dc=ipa,dc=local
Aug 04 02:44:37 ipa-2.ipa.local ipa-otpd[3910]: ftweedal@IPA.LOCAL: bind end: success
Aug 04 02:44:37 ipa-2.ipa.local ipa-otpd[3910]: ftweedal@IPA.LOCAL: response sent: Access-Accept
Aug 04 02:44:38 ipa-2.ipa.local ipa-otpd[3935]: ftweedal@IPA.LOCAL: bind end: Invalid credentials
Aug 04 02:44:38 ipa-2.ipa.local ipa-otpd[3935]: ftweedal@IPA.LOCAL: response sent: Access-Reject

The KDC log output likewise showed two KRB_AS_REQ requests coming from the client (i.e. kinit) – one of these resulted in a ticket being issued, and the other resulted in a KDC_ERR_PREAUTH_FAILED response. Therefore, after all this investigation, the cause of the problem seems to be aggressive retry behaviour in kinit.

I had been testing with MIT Kerberos version 1.11.5 from the Fedora 20 repositories. A quick scan of the Kerberos commit log turned up some promising changes released in version 1.12. Since the Fedora package for 1.11 includes a number of backports from 1.12 already, I backported the most promising change: one that relaxes the timeout if kinit connects to the KDC over TCP. Unfortunately, this did not fix the issue.

I was curious whether the 1.12 client exhibited the same behaviour. The Fedora 21 repositories have MIT Kerberos version 1.12, so I installed a preview release and enrolled the host. OTP authentication worked fine, so the change I backported to 1.11 was either the wrong change, or needed other changes to work properly.

Since HOTP authentication in FreeIPA is somewhat discouraged due to the cost and other implications of counter synchronisation in a replicated environment, and since the problem seems to be rectified in MIT Kerberos 1.12, I was happy to conclude my investigations at this point.

Concluding thoughts

OTP authentication in FreeIPA involves a lot of different servers, plugins and libraries. To provide the OTP functionality and make all the services work together, freeipa-server ships a KDC plugin, a directory server plugin, and the ipa-otpd daemon! Was it necessary to have this many moving parts?

The original design proposal explains many of the design decisions. In particular, ipa-otpd is necessary for a couple of reasons. The first is the fact that the MIT KDC supports only RADIUS servers for OTP validation, so for native OTP support we must have some component act as a RADIUS server. Second, the KDC radius configuration is static, so configuration is simplified by having the KDC talk only to ipa-otpd for OTP validation. It is also nice that ipa-otpd is the sole arbiter of whether to proxy a request to an external RADIUS server or to attempt an LDAP BIND.

What if the KDC could dynamically work out where to direct RADIUS packets for OTP validation? It is not hard to conceive of this, since it already dynamically learns whether a principal is configured for OTP by way of the ipadb.so plugin. But even if this were possible, the current design is arguably preferable since, unlike the KDC, we have full control over the implementation of ipa-otpd and are therefore better placed to respond to performance or security concerns in this aspect of the OTP authentication flow.

July 25, 2014

Threat: Fred the System Administrator


In terms of threat potential, Fred is off the charts. In order to do his job, he has essentially uncontrolled access to all computer resources. Fred can damage software and data in obvious or subtle ways. He can wipe out users, steal data, and wreak almost unimaginable carnage.

Fortunately, the vast majority of system administrators are conscientious, professional and honest. They are a force for good, committed to keeping systems running smoothly, data protected, and users productive.

Fred is a risk to system integrity in two ways – accidentally and deliberately.

Most of the time, the greatest threat from Fred is that he doesn’t have the resources he needs to do his job or that his hands are tied by management edicts. These factors can cause system administrators to do (or not do) things that threaten system integrity and security. If Fred is denied budget for proper backups, data is at risk. If Fred is ordered to punch a hole through the firewall to allow sales people access to the orders database, without VPN and proper authentication, systems are at risk. If Fred is ordered to allow contractors access to internal networks – see the Target case – the entire network can be exposed.

In the Target case it isn’t clear if the issue was due to a network design problem or if there were orders to provide this access. This would be interesting to know.

If Fred does go bad, there is almost no limit to the damage he can do. Even if he doesn’t compromise systems he can commit identity or credit card theft or steal company – or even national – confidential data. I don’t think I have to do more than mention the name Edward Snowden…

A number of things can be done to mitigate the threats that Fred presents:

  • Recruit and hire system administrators carefully! Look for proof of integrity as well as technical skills.
  • Ensure that your sysadmins have the training, resources and management support to do their jobs.
  • “Trust but verify.” Have regular system audits. Ensure that system access and changes are logged to a secure remote logging server. Look at the log files! Apply technology, process, and people to maintaining the integrity of system management.
  • Divide responsibilities. Large companies will have separate organizations responsible for systems, networks, storage and applications. Divide up the work and accountability to address both functional and system integrity needs.
  • Focus on detection, mitigation and remediation more than prevention. Go talk to your colleagues in Finance – they have hundreds of years of experience working with high value systems.  You will be surprised at what you can learn from them. They have evolved a model that is designed to prevent theft and misuse where possible, to detect it when it does occur, and to minimize losses. They are aware that you can’t stop everything while keeping the business going – but you should be able to minimize losses and to discover things eventually. Find out how they do policies and procedures, the ethical and business guidelines they follow, how they implement internal controls, and how they balance risk and cost. Hint: it isn’t worthwhile spending $10,000 to stop $20 in losses. But if someone is stealing $10 here and $10 there, you want to find out about it before it grows.



July 24, 2014

Devstack mounted via NFS

Devstack allows the developer to work with the master branches for upstream OpenStack development. But Devstack performs many operations (such as replacing pip) that might be viewed as corrupting a machine, and should not be done on your development workstation. I’m currently developing with Devstack on a Virtual Machine running on my system. Here is my setup:

Both my virtual machine and my base OS are Fedora 20. To run a virtual machine, I use KVM and virt-manager. My VM is fairly beefy, with 2 GB of RAM allocated, and a 28 GB hard disk.

I keep my code in git repositories on my host laptop. To make the code available to the virtual machine, I export them via NFS and mount them in the VM at /opt/stack, owned by the ayoung user, which mirrors the setup on the base system.

Make sure NFS is running with:

sudo systemctl enable nfs-server.service 
sudo systemctl start  nfs-server.service

My /etc/exports:

/opt/stack/ *(rw,sync,no_root_squash,no_subtree_check)

And to apply the changes in this file:

sudo exportfs -r

Make sure firewalld has the port for NFS open, but only for the internal network. For me, this is the interface

virbr0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255

I used the firewall-config application to modify firewalld:

For both, make sure the Configuration select box is set to Permanent, or you will be making this change each time you reboot.

First add the interface:

[screenshot: firewalld NFS interfaces]

And enable NFS:

[screenshot: firewalld NFS ports]

In the Virtual machine, I added a user (ayoung) with the same numeric userid and group id from my base laptop. To find these values:

$ getent passwd ayoung
ayoung:x:14370:14370:Adam Young:/home/ayoung:/bin/bash

I admit I created them when I installed the VM, which I did using the Anaconda installer and a DVD net-install image. However, the same thing can be done using useradd. I also added the user to the wheel group, which simplifies sudo.

On the remote machine, I created /opt/stack and let the ayoung user own it:

$ sudo mkdir /opt/stack ; sudo chown ayoung:ayoung /opt/stack

To mount the directory via nfs, I made an /etc/fstab entry:

192.168.122.1:/opt/stack /opt/stack              nfs4  defaults 0 0 

And now I can mount the directory with:

$ sudo mount /opt/stack

I went through and updated the git repos in /opt/stack using a simple shell script.

for DIR in */ ; do pushd "$DIR" ; git fetch ; git rebase origin/master ; popd ; done

The alternative is setting RECLONE=yes in /opt/stack/devstack/localrc.

When running devstack, I had to make sure that the directory /opt/stack/data was created on the host machine. Devstack attempted to create it, but got an error induced by NFS.

Why did I go this route? I need to work on code running in HTTPD, namely Horizon and Keystone. That precluded me from doing all of my work in a venv on my laptop. The NFS mount gives me a few things:

  • I keep my Git repo intact on my laptop. This includes the private key used to access Gerrit.
  • I can edit using PyCharm on my Laptop.
  • I am sure that the code on my laptop and in my virtual machine is identical.

This last point is essential for remote debugging. I just got this to work for Keystone and have submitted a patch that enables it. I’ll be working up something comparable for Horizon shortly.

July 21, 2014

Threats: William the Manager

William is concerned with his group getting their job done. He is under budget pressure, time pressure, and requirements to deliver. William is a good manager – he is concerned for his people and dedicated to removing obstacles that get in their way.

To a large degree William is measured on current performance and expectations for the next quarter. This means that he has little sympathy for other departments getting in the way of his people making the business successful! A lot of his job involves working with other groups to make sure that they meet his needs. And when they don’t, he gets them overruled or works around them.

When William does planning – and he does! – he is focused on generating business value and getting results that benefit him and his team. He is not especially concerned about global architecture or systems design or “that long list of hypothetical security issues”. Get the job done, generate value for the company, and move on to the next opportunity.

William sees IT departments as an obstacle to overcome – they are slow, non-responsive, and keep doing things that get in the way of his team. He sees the security team in particular as being an unreasonable group of people who have no idea what things are like in the real world, and who seem to be dedicated to coming up with all sorts of ridiculous requirements that are apparently designed to keep the business from succeeding.

William, with the best of intentions, is likely to compromise and work around security controls – and often gets the support of top management in doing this. To be more blunt, if security gets in the way, it is gone! If a security feature interferes with getting work done, he will issue orders to turn that feature off. If you look at some of my other posts on the value of IT and computer systems, such as Creating Business Value, you will see that, at least in some cases, William may be right.

And this is assuming that William is a good corporate citizen, looking out for the best interests of the company. If he is just looking out for himself, the situation can be much worse.

It is not enough to try to educate William on security issues – for one thing (depending on the security feature), William may be right! The only chance for security is to find ways to implement security controls that don’t excessively impact the business units. And to keep the nuclear option for the severe cases where it is needed, such as saving credit card numbers in plain text on an Internet facing system. (Yes, this can easily happen – for example, William might set up a “quick and dirty” ecommerce system on AWS if the IT group isn’t able to meet his needs.)


July 18, 2014

Oh No! I Committed to master! What do I do?

You were working in a git repo and you committed your change to master. Happens all the time. Panic not.

Here are the steps to recover

Create a new branch from your current master branch. This will include your new commit. (To be clear, you should replace ‘description-of-work’ with a short name for your new branch.)

git branch description-of-work

Now reset your current master branch to upstream:

git reset --hard origin/master

All fixed.

Why did this work?

A branch in git points to a specific commit.  All commits are named by hashes.  For example, right now, I have a keystone repo with my master branch pointing to

$ git show master
commit bbfd58a6c190607f7063d15a3e2836e40806ef57
Merge: e523119 f18911e
Author: Jenkins <jenkins@review.openstack.org>
Date: Fri Jul 11 23:36:17 2014 +0000

Merge "Do not use keystone's config for nova's port"

This is defined by a file in .git/refs:

$ cat .git/refs/heads/master 
bbfd58a6c190607f7063d15a3e2836e40806ef57

I could edit this file by hand and get the same effect as a git branch command. Let’s do exactly that.

$ cp .git/refs/heads/master .git/refs/heads/edit-by-hand
$ git branch 
  edit-by-hand
* master
$ git show edit-by-hand 
commit bbfd58a6c190607f7063d15a3e2836e40806ef57
Merge: e523119 f18911e
Author: Jenkins <jenkins>
Date:   Fri Jul 11 23:36:17 2014 +0000

    Merge "Do not use keystone's config for nova's port"

OK, let’s modify this the right way:

$ git checkout edit-by-hand 
Switched to branch 'edit-by-hand'
$ git reset --hard HEAD~1
HEAD is now at e523119 Merge "Adds hacking check for debug logging translations"
$ git show edit-by-hand 
commit e52311945a4ab3b47a39084b51a2cc596a2a1161
Merge: b0d690a 76baf5b
Author: Jenkins <jenkins>
Date:   Fri Jul 11 22:19:03 2014 +0000

    Merge "Adds hacking check for debug logging translations"
...

That change made it into the ref file:

$ cat .git/refs/heads/edit-by-hand 
e52311945a4ab3b47a39084b51a2cc596a2a1161

Here is the history for the edit-by-hand branch:

$ git log edit-by-hand  --oneline

Returns

e523119 Merge "Adds hacking check for debug logging translations"
b0d690a Merge "multi-backend support for identity"
6aa0ad5 Merge "Imported Translations from Transifex"
...

I want the full hash for 6aa0ad5 so:

git show --stat 6aa0ad5
commit 6aa0ad5beb39107ffece6e5d4a068d77f7d51059

Let’s set the branch to point to this:

$ echo 6aa0ad5beb39107ffece6e5d4a068d77f7d51059 > .git/refs/heads/edit-by-hand 
$ git show
commit 6aa0ad5beb39107ffece6e5d4a068d77f7d51059
Merge: 2bca93f bf8a2e2
Author: Jenkins <jenkins>
Date:   Fri Jul 11 21:48:09 2014 +0000

    Merge "Imported Translations from Transifex"

What would you expect now? I haven’t looked yet, but I would expect git to tell me that I have a bunch of unstaged changes; basically, everything that was in the commits on master that I chopped off of edit-by-hand. Let’s look:

$ git status
On branch edit-by-hand
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	modified:   doc/source/configuration.rst
	modified:   doc/source/developing.rst
	modified:   etc/keystone.conf.sample
...
Untracked files:
  (use "git add <file>..." to include in what will be committed)

It was pretty long, so I cut out some files.

I can undo all those “changes” by setting the hash back to the original value:

$ echo e52311945a4ab3b47a39084b51a2cc596a2a1161   > .git/refs/heads/edit-by-hand 
$ git status
On branch edit-by-hand
Untracked files:
  (use "git add <file>..." to include in what will be committed)

Why did that work? Without the explicit “checkout” command, the current workspace was left unchanged.

July 16, 2014

Kerberos for Horizon and Keystone

I have a proof-of-concept Horizon instance. It has a way to go before it can be used in production, but the mechanism works.

This is not a how-to. I’ve written up some of the steps in the past. Instead, this is an attempt to illuminate some of the issues.

I started with a Packstack-based all-in-one instance on Fedora 20, registered as a FreeIPA client. I hand-modified Keystone to run in HTTPD.

The Horizon HTTP instance is set up with S4U2Proxy. Since both Horizon and Keystone are on the same machine, it is both the source and target of the proxy rule; a user connecting via HTTPS gets a service ticket for Horizon, which then requests a delegated service ticket for itself. I’m not seeing any traffic on the KDC when this happens, which leads me to think that the Kerberos library is smart enough to reuse the initial service ticket for the user. However, I’ve also tested S4U2Proxy from HTTPD to Keystone on a remote machine in an earlier set up, and am fairly certain that this will work when Horizon and Keystone are not co-located.

After initial configuration, I did a git clone of the repositories for the projects I needed to modify:

  • django_openstack_auth
  • keystone
  • python-keystoneclient
  • python-openstackclient

To use this code, I switched to each directory and ran:

sudo pip install -e .

Horizon


Horizon uses form-based authentication. I have not modified this. Longer term, we would need to determine what UI to show based on the authentication mechanism. I would like to be able to disable form-based authentication for the Kerberos case, as I think passing your Kerberos password over the wire is one of the worst security practices; we should actively discourage it.

Django OpenStack Auth and Keystone Client


Horizon uses a project called django-openstack-auth that communicates with Keystone client. This needs to work with client auth plugins. I’ve hacked this in code, but the right solution is for it to get the auth plugin out of the Django configuration options. Implied here is that django-openstack-auth should be able to use keystoneclient sessions and V3 Keystone authentication.


When a user authenticates to Horizon, they do not set a project. Thus, Horizon does not know what project to pass in the token request. The Token API has some strange behaviour when it comes to token requests without an explicit scope. If the user has a default project set, they get a token scoped to that project. If they do not have a default project set, they get an unscoped token.

Jamie Lennox has some outstanding patches to Keystone client that address the Kerberos use cases. Specifically, if the client gets an unscoped token, there is no service catalog. Since the client makes calls to Keystone based on endpoints in the service catalog, it cannot use an unscoped token. One of Jamie’s patches addresses this; if there is no service catalog, continue to use the Auth URL to talk to Keystone. This is a bit of an abuse of the Auth URL, which really should only be used to get the list of domains or projects for a user. Once the user has this information, they can request a scoped token. This is what Horizon needs to do on behalf of the user.
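To make that flow concrete, here is a rough sketch of the token dance against the Keystone v3 HTTP API using the requests library (the endpoint and credentials are hypothetical, and django-openstack-auth would do this through keystoneclient rather than raw HTTP):

import requests

KEYSTONE = "https://keystone.example.com:5000/v3"  # hypothetical endpoint

# 1. Request a token with no "scope" member; for a user without a
#    default project this yields an unscoped token.
body = {"auth": {"identity": {"methods": ["password"], "password": {
    "user": {"name": "demo", "domain": {"id": "default"},
             "password": "secret"}}}}}
resp = requests.post(KEYSTONE + "/auth/tokens", json=body)
token = resp.headers["X-Subject-Token"]

# 2. Use the auth URL to discover the projects available to the user.
projects = requests.get(KEYSTONE + "/auth/projects",
                        headers={"X-Auth-Token": token}).json()["projects"]

# 3. Exchange the unscoped token for a project-scoped token, which
#    carries a service catalog.
body = {"auth": {"identity": {"methods": ["token"], "token": {"id": token}},
                 "scope": {"project": {"id": projects[0]["id"]}}}}
scoped = requests.post(KEYSTONE + "/auth/tokens", json=body)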

While Keystone can use Kerberos as a “method” value when creating a token, the current set of plugins did not allow for mapping to the DefaultDomain. There is a plugin for “external” that does that, and I subclassed it for Kerberos. There is an outstanding ticket for removing the “method” value from the plugin implementations, which will reduce the number of classes we need to implement common behavior.

To talk to a Kerberos protected Keystone, the Keystone client needs to use the Kerberos Auth plugin. However, it can only use this to get a token; the auth plugin does not handle other communication. Thus, only the /auth/tokens path should be Kerberos protected. Here is what I am using:

<Location "/keystone/krb">
  LogLevel debug
  WSGIProcessGroup keystone_krb_wsgi
  NSSRequireSSL
</Location>

<Location "/keystone/krb/v3/auth/tokens">
  AuthType Kerberos
  AuthName "Kerberos Login"
  KrbMethodNegotiate on
  KrbMethodK5Passwd off
  KrbServiceName HTTP
  KrbAuthRealms IPA.CLOUDLAB.FREEIPA.ORG
  Krb5KeyTab /etc/httpd/conf/openstack.keytab
  KrbSaveCredentials on
  KrbLocalUserMapping on
  # defaults off, but to be explicit
  # Keystone should not be a proxy 
  KrbConstrainedDelegation off
  Require valid-user
  NSSRequireSSL
</Location>

<Location "/dashboard/auth/login/">
  LogLevel debug
  AuthType Kerberos
  AuthName "Kerberos Login"
  KrbMethodNegotiate on
  KrbMethodK5Passwd off
  KrbServiceName HTTP
  KrbAuthRealms IPA.CLOUDLAB.FREEIPA.ORG
  Krb5KeyTab /etc/httpd/conf/openstack.keytab
  KrbSaveCredentials on
  KrbLocalUserMapping on
  KrbConstrainedDelegation on
  Require valid-user
  NSSRequireSSL
</Location>

To go further, the Kerberos protection should be optional if you wish to allow other authentication methods to be stacked with Kerberos.

This implies that the value:

Require valid-user

Should be in the HTTPD conf section for Horizon, but not for Keystone. This does not yet work for me, and I will investigate further.


Secure web communications require cryptography. For authentication, the two HTTP based standards are Client Side Certificates and Kerberos. Of the two, only Kerberos allows for constrained delegation in a standardized way. Making Kerberos part of the standard approach to OpenStack will lead to a more secure OpenStack.

Towards efficient security code audits

Conducting a code review is often a daunting task, especially when the goal is to find security flaws. They can be, and usually are, hidden in all parts and levels of the application – from the lowest level coding errors, through unsafe coding constructs and misuse of APIs, to the overall architecture of the application. The size and quality of the codebase, the quality of (hopefully) existing documentation, and time restrictions are the main complications of the review. It is therefore useful to have a plan beforehand: know what to look for, how to find the flaws efficiently, and how to prioritize.

A code review should start by collecting and reviewing existing documentation about the application. The goal is to get a decent overall picture of the application – what is the expected functionality, what requirements can be expected from the security standpoint, where are the trust boundaries. Not all flaws with security implications are relevant in all contexts; e.g., an effective denial of service against a server certainly has security implications, whereas a coding error in a command line application which causes excessive CPU load will probably have low impact. At the end of this phase it should be clear what the security requirements are and which flaws could have the highest impact.

Armed with this knowledge, the next step is to define the scope of the audit. It is almost always the case that conducting a thorough review would require far more resources than are available, so defining which parts will be audited and which vulnerabilities will be searched for increases the efficiency of the audit. It is, however, necessary to state all the assumptions made explicitly in the report – this makes it possible for others to review them or revisit them in future audits.

In general there are two approaches to conducting a code review – for lack of better terminology we shall call them bottom up and top down. Of course, real audits always combine techniques from both, so this classification is merely useful for putting the techniques in context.

The top down approach starts with the overall picture of the application and its security requirements and drills down towards lower levels of abstraction. We often start by identifying the components of the application and their relationships, and mapping the flow of data. Drilling further down, we can choose to inspect potentially sensitive interfaces that components provide, how data is handled at rest and in motion, how access to sensitive parts of the application is restricted, etc. From this point the audit quickly becomes very targeted – since we have a good picture of which components, interfaces and channels might be vulnerable to which classes of attacks, we can focus our search and ignore the other parts. Sometimes this will bring us down to the level of line-by-line code inspection, but this is fine – it usually means that some part of the application’s security architecturally depends on the correctness of the code in question.

The top down approach is invaluable, as it makes it possible to find flaws in the overall architecture that would otherwise go unnoticed. However, it is also very demanding – it requires a broad knowledge of all classes of weaknesses, of threat models, and the ability to switch between abstraction levels quickly. The cost of such an audit can be reduced by reviewing the application very early in the design phase – unfortunately, most of the time this is not possible due to the development model chosen or the phase in which the audit was requested. Another way to reduce the effort is to invest in documentation and reuse it in future audits.

In the bottom up approach we usually look for indications of vulnerabilities in the code itself and investigate whether they can lead to exploitation. These indications range from outright dangerous code, misuse of APIs, and dangerous coding constructs to bad practices and poor code quality – all of these may indicate the presence of a weakness in the code. The search is usually automated, as there is an abundance of tools to simplify this task, including static analyzers, code quality metric tools, and the most versatile one: grep. All of these reduce the cost of finding potentially weak spots, so the real cost lies in separating the wheat from the chaff. The bane of this approach is the receiver operating characteristic curve – it is difficult to substantially improve it, so we are usually left with a tradeoff between false positives and false negatives.
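As a trivial illustration of that kind of automated search, even a short script can flag call sites for manual review (a sketch; the pattern list is illustrative, not a vetted checklist):

import pathlib
import re
import sys

# Commonly misused C APIs; hits are candidates for review, not findings.
SUSPECT = re.compile(r"\b(strcpy|strcat|sprintf|gets|system|popen)\s*\(")

for path in pathlib.Path(sys.argv[1]).rglob("*.c"):
    for lineno, line in enumerate(
            path.read_text(errors="replace").splitlines(), start=1):
        if SUSPECT.search(line):
            print("%s:%d: %s" % (path, lineno, line.strip()))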

The advantages of the bottom up approach are its relatively low resource requirements and its reusability. This means it is often easy and desirable to run such analyses as early and as often as possible. It also depends much less on the skill of the reviewer, since patterns can be collected to create a knowledge base, aided by freely available resources on the internet. It is a good idea to create checklists to make sure all common types of weaknesses are audited for, and to make this kind of review more scalable. On the other hand, the biggest disadvantage is that certain classes of weaknesses can never be found with this approach – these usually include the architectural flaws which lead to the vulnerabilities with the biggest impact.

The last step in any audit is writing a report. Even though this is usually perceived as the least productive time spent, it is an important one. A good report enables other interested parties to further scrutinize weak points, provides the information necessary to make potentially hard decisions, and is a good way to share and reuse knowledge that might otherwise stay private.

July 14, 2014

Threats: Stan the Security Czar

What?!? The security guy is listed as a threat to system security?

Absolutely. Stan is knowledgeable. He knows that the world is filled with evil. And he is determined to protect his company from it.

There is a famous saying: The only truly secure computer system is one that is melted down into slag, cast into a concrete block, and dumped into the deepest ocean trench. Even then you can’t be completely sure…

The challenge is that many things done to harden a computer system make the system more difficult to use. And the Law of Unintended Consequences always comes into play. For example, to make passwords resistant to brute force attacks, you need to make them long and have them include different types of characters. And, for some reason, you need to change passwords regularly.

So, the answer is to require 16 character passwords with upper case, lower case, numbers, and special characters, containing no dictionary words, and to change them every 30 days – right? Ummm, no. This actually massively reduces security – we will talk about this more in a future post.

As another example, how about setting the inactivity timer in an application, which forces you to re-enter your username and password, to five minutes? Or perhaps two minutes or even one minute? After all, you can’t be too secure! Far from being effective security, this will result in computers being thrown off the roof of the building and lynch mobs looking for the person responsible! As well as a significant drop in productivity.

An excellent discussion of the behaviour of Stan the Security Czar occurs in the book “The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win” – this is an excellent book which I encourage everyone to read. It shows how a focus on technology, without taking into consideration the power of people and processes, can be very expensive and actually reduce effective security.

To bring things into a sharp focus, recall our premise that the reason for IT is to support the generation of business value, and that business value comes from people using applications to transform data. Anything that interferes with any part of this reduces the value of IT – and heavy handed security approaches can massively impact the business value of IT. Without careful consideration of human and business factors, Stan is likely to do things that hinder use of computer systems and actually reduce overall security in the name of improving security. The challenge in dealing with Stan is to achieve appropriate security while maintaining the business value of IT.


July 11, 2014

Daniel J. Bernstein lecture on software (in)security

Building secure software and secure software systems is obviously an important part of my job as a developer on the FreeIPA identity management and Dogtag PKI projects here at Red Hat. Last night I had the privilege of attending a lecture by the renowned Research Professor Daniel J. Bernstein at Queensland University of Technology entitled Making sure software stays insecure (slides). The abstract of his talk:

We have to watch and listen to everything that people are doing so that we can catch terrorists, drug dealers, pedophiles, and organized criminals. Some of this data is sent unencrypted through the Internet, or sent encrypted to a company that passes the data along to us, but we learn much more when we have comprehensive direct access to hundreds of millions of disks and screens and microphones and cameras. This talk explains how we’ve successfully manipulated the world’s software ecosystem to ensure our continuing access to this wealth of data. This talk will not cover our efforts against encryption, and will not cover our hardware back doors.

Of course, Prof. Bernstein was not the "we" of the abstract. Rather, the lecture, in its early part, took the form of a thought experiment suggesting how this manipulation could be taking place. In the latter part of the lecture, Prof. Bernstein justified and discussed some security primitives he feels are missing from today’s software.

I will now briefly recount the lecture and the Q&A that followed (a reconstitution of my handwritten notes; some paraphrase and omissions have occurred), then wrap up with my thoughts about the lecture.

Lecture notes

Introduction

  • Smartphones; almost everyone has one. Pretty much anyone in the world can turn on the microphone or camera and find out what’s happening.
  • It is terrifying that people (authoritarian governments, or – even if you trust your government now – can you trust the next one?) have access to such capabilities.
  • Watching everyone, all the time, is not an effective way to catch bad guys. Yes, they are bad, but total surveillance is ineffective and violates rights.
  • Prof. Bernstein has no evidence of deliberate manipulation of software ecosystems to this end, but now embarks on a thought experiment: what if they did try?

Distract users

  • Things labelled as "security" that actually are not, e.g. anti-virus.
  • People are told to do these things, and indeed are happy to follow along. They feel good about doing something.
  • Money gets spent on e.g. virus scanners or 2014 NIST framework compliance, instead of on building secure systems. The 2014 NIST definition of "protect" has 98 subcategories, none of which are about making secure software.

Distract programmers

  • Automatic low-latency security updates are viewed as a security method.
  • "Security" is defined by public security vulnerabilities. This is not security. The reality is that there are other holes that attackers are actively exploiting.

Distract researchers

  • Attack papers and competitions are prominent, and research funding is often predicated on their outcomes.
  • Research into building secure systems takes a back seat.

Discourage security

  • Tell people that "there’s no such thing as 100% security, so why even try?"
  • Tell people that "it is impossible to even define security, so give up."
  • Some people make both of these claims simultaneously.
  • Hide, dismiss or mismeasure security metric #1 (defined later).
  • Prioritise compatibility, "standards", speed, e.g. "an HTTP server in the kernel is critical for performance".

Definition of security

  • Integrity policy #1: Whenever a computer shows a file, it also tells me the source of the file.
  • Example: UNIX file ownership and permissions. Multi-user system, no file sharing. If users are not sharing files, the UNIX model, if implemented correctly, can enforce integrity policy #1. How can we check?
    1. Check the code that enforces the file permission rules.
    2. Check the code that allocates memory, reads and writes files, and authenticates users.
    3. Check all the kernel code (because it is all privileged).
  • The code to check is the trusted computing base (TCB). The size of the TCB is security metric #1. It is unnecessary to check or limit anything else.

Example: file sharing

  • Eve and Frank need to share files. Eve can own the file but give Frank write permissions.
  • By integrity policy #1, the operating system must record Frank as the source of the file.
  • If a process reads data from multiple sources, files written by the process must be marked with all those sources.

Example: web browsing

  • If you visit Frank’s site, browser may try to verify and show Frank as source of the file(s) being viewed. But browser TCB is huge.
  • What if instead of current model, you gave Frank a file upload account on your system. Files uploaded could be marked with Frank as source. Browser could then read these files.
  • Assuming the OS has this capability, it needn’t be manual. Web browsing could work this way.

Conclusion

  • Is the community even trying to build a software system with a small TCB that enforces integrity policy #1?

Q&A: Identification of sources

  • Cryptography is good for this in networked world, but current CA system is "pathetic".
  • Certificate transparency is a PKI consistency-check mechanism that may improve the current infrastructure.
  • A revised infrastructure for obtaining public keys is preferable. Prof. Bernstein thinks GNUnet is interesting.
  • Smaller (i.e. actually auditable) crypto implementations are needed. TweetNaCl (pronounced "tweet salt") is a full implementation of the NaCl cryptography API in 100 tweets.

Q&A: Marking regions of file with different sources

  • I asked a question about whether there was scope within the definition of integrity policy #1 for marking regions of files with different sources, rather than marking a contiguous file with all sources.
  • Prof. Bernstein suggested that there is, but it would be better to change how we are representing that data and decompose it into separate files, rather than adding complexity to the TCB. A salient point.

Discussion

This was a thought-provoking and thoroughly enjoyable lecture. It was quite narrow in scope, defining and justifying one class of security primitives that Prof. Bernstein believes are essential. The question of how to identify a source did not come up until the Q&A. Primitives to enable privacy or anonymity did not come up at all. I suppose that by not mentioning them, Prof. Bernstein was making the point that they are orthogonal problem spaces (a sentiment I would agree with).

I should also note that there was no mention of any integrity policy #2, security metric #2, or so on. My interpretation of this is that Prof. Bernstein believes that the #1 definitions are sufficient in the domain of data provenance, but there are other reasonable interpretations.

The point about keeping the trusted computing base as simple and as small as possible was one of the big take-aways for me. His response to my question implies that he feels it is preferable to incur costs in complexity and implementation time outside the TCB, perhaps many times over, in pursuit of the goal of TCB auditability.

Finally, Prof. Bernstein is not alone in lamenting the current trust model in the PKI of the Internet. It didn’t have a lot to do with the message of his lecture, but I nevertheless look forward to learning more about GNUnet and checking out TweetNaCl.

July 09, 2014

Is there a Java Binding for LMIShell?

An interesting question just came up: “is there a Java binding for LMIShell?”

Hmm, good question – let’s answer it by digging into the OpenLMI architecture a bit.

LMIShell is a client framework written in Python. It consists of a Python language binding to the OpenLMI WBEM interface (CIM-XML over https) that presents the OpenLMI objects as native Python objects, a set of helper functions, a set of task-oriented management scripts, and a task-oriented CLI interface.
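To make this concrete, here is a minimal sketch of an interactive LMIShell session. The host name and credentials are placeholders, and the classes actually available depend on which OpenLMI providers are installed on the managed machine:

$ lmishell
> c = connect("server.example.com", "pegasus", "password")
> ns = c.root.cimv2
> for cpu in ns.LMI_Processor.instances():
...     print cpu.ElementName

The point is that LMI_Processor and friends behave as ordinary Python objects, which is what the helper functions and management scripts build on.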

LMIShell is designed to be extended by adding new management scripts (also written in Python) and CLI calls.

Java also has a language binding to the OpenLMI WBEM interface. In fact, since this is Linux, there are two of them… The Java language bindings are provided by the sblim-cim-client and sblim-cim-client2 packages. Both of these packages provide a CIM Client Class Library for Java applications which is compliant with the JCP JSR48 specification. Details about the Java Community Process and JSR48 can be found at http://www.jcp.org and http://www.jcp.org/en/jsr/detail?id=48. Note that documentation and examples are available – see the sblim-cim-client2-manual package.
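If you want to experiment with the Java route, the bindings and their documentation are one install away (package names as above):

# yum install sblim-cim-client2 sblim-cim-client2-manual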

Thus, there is a direct interface to the OpenLMI API from Java. An entire client application can be written in Java – in fact, there was discussion of whether LMIShell should be implemented in Python or Java.

If you want to use the LMIShell CLI from Java, that is straightforward. If you want to call LMIShell functions from Java, it can be done but is a little trickier. If you want to write a Java application directly against the OpenLMI API, use the Java language binding.

In many cases the easiest answer is likely to be to look at the LMIShell modules to see how they call the OpenLMI API, and then implement the function directly in Java using the Java language binding.


Diagnosing a Dogtag SELinux Issue

In this post, I explain an issue I had with Dogtag failing to start due to some recently added behaviour that was prohibited by Fedora’s SELinux security policy, and detail the steps that were taken to resolve it.

The Problem

A recent commit to Dogtag added the ability to archive each subsystem’s configuration file on startup. This feature is turned on by default. On each startup, each subsystem’s CS.cfg is copied to /etc/pki/<instance>/<subsystem>/archives/CS.cfg.bak.<timestamp>. A symbolic link pointing to the archived file named CS.cfg.bak is then created in the parent directory of archives/, alongside CS.cfg.

Having built and installed a development version of Dogtag that contained this new feature, I attempted to start Dogtag, but the service failed to start.

% sudo systemctl start pki-tomcatd@pki-tomcat.service
Job for pki-tomcatd@pki-tomcat.service failed. See 'systemctl status pki-tomcatd@pki-tomcat.service' and 'journalctl -xn' for details.

The error message gave some advice on what to do next, so I followed its advice.

% systemctl status pki-tomcatd@pki-tomcat.service
pki-tomcatd@pki-tomcat.service - PKI Tomcat Server pki-tomcat
   Loaded: loaded (/usr/lib/systemd/system/pki-tomcatd@.service; enabled)
   Active: failed (Result: exit-code) since Tue 2014-07-08 21:22:42 EDT; 1min 10s ago
  Process: 26699 ExecStop=/usr/libexec/tomcat/server stop (code=exited, status=1/FAILURE)
  Process: 26653 ExecStart=/usr/libexec/tomcat/server start (code=exited, status=143)
  Process: 32704 ExecStartPre=/usr/bin/pkidaemon start tomcat %i (code=exited, status=1/FAILURE)
 Main PID: 26653 (code=exited, status=143)

Jul 08 21:22:42 ipa-1.ipa.local systemd[1]: Starting PKI Tomcat Server pki-tomcat...
Jul 08 21:22:42 ipa-1.ipa.local pkidaemon[32704]: ln: failed to create symbolic link ‘/var/lib/pki/pki-tomcat/conf/ca/CS.cfg.bak’: Permission denied
Jul 08 21:22:42 ipa-1.ipa.local pkidaemon[32704]: SUCCESS:  Successfully archived '/var/lib/pki/pki-tomcat/conf/ca/archives/CS.cfg.bak.20140708212242'
Jul 08 21:22:42 ipa-1.ipa.local pkidaemon[32704]: WARNING:  Failed to backup '/var/lib/pki/pki-tomcat/conf/ca/CS.cfg' to '/var/lib/pki/pki-tomcat/conf/ca/CS.cfg.bak'!
Jul 08 21:22:42 ipa-1.ipa.local pkidaemon[32704]: /usr/share/pki/scripts/operations: line 1579: 0: command not found
Jul 08 21:22:42 ipa-1.ipa.local systemd[1]: pki-tomcatd@pki-tomcat.service: control process exited, code=exited status=1
Jul 08 21:22:42 ipa-1.ipa.local systemd[1]: Failed to start PKI Tomcat Server pki-tomcat.
Jul 08 21:22:42 ipa-1.ipa.local systemd[1]: Unit pki-tomcatd@pki-tomcat.service entered failed state.

journalctl -xn gave essentially the same information as above. We can see that creation of the symbolic link failed, which led to a subsequent warning and failure to start the service. Interestingly, we can also see that creation of archives/CS.cfg.bak.20140708212242 (the target of the symbolic link) was reported to have succeeded.

The user that runs the Dogtag server is pkiuser, and everything seemed fine with the permissions in /etc/pki/pki-tomcat/ca/. The archived configuration file that was reported to have been created successfully was indeed there.

Next I looked at the Dogtag startup routines, which live in /usr/share/pki/scripts/operations. I located the offending ln -s and replaced it with a cp; that is, instead of creating a symbolic link, the startup script would now simply create CS.cfg.bak as a copy of the archived configuration file. Having made this change, I tried to start Dogtag again, and it succeeded. Something was prohibiting the creation of the symbolic link.

The Culprit

That something was SELinux.

SELinux (Security-Enhanced Linux) is a mandatory access control system for Linux that can be used to express and enforce detailed security policies. It is enabled by default in recent versions of Fedora, which ship with a reasonable default set of security policies.

The Workaround

To continue the diagnosis of this problem, I restored the original behaviour of the startup script, i.e. creating a symbolic link, and confirmed that Dogtag was once again failing to start.

The next step was to look for a way to get SELinux to permit the operation. I soon discovered setenforce(8), which is used to put SELinux into enforcing mode (setenforce 1; the default behaviour) or permissive mode (setenforce 0). As expected, running sudo setenforce 0 allowed Dogtag startup to succeed again, but obviously this was not a solution – merely a temporary workaround, acceptable in a development environment, but unacceptable for our customers and users.
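For reference, the round trip looks like this (getenforce reports the current mode; the change does not persist across reboots):

% sudo getenforce
Enforcing
% sudo setenforce 0
% sudo getenforce
Permissive
% sudo setenforce 1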

The Plumbing

Having little prior experience with SELinux, and since it had reached the end of the day, I emailed the other developers for advice on how to proceed. Credit goes to Ade Lee for most of the information that follows.

SELinux logs to /var/log/audit/audit.log (on Fedora, at least). This log contains details about operations that SELinux denied (or would have denied, if it were enforcing). This log can be read by the audit2allow(1) tool to construct SELinux rules that would allow the operations that were denied. First, the log was truncated so that it would include only the relevant failures:

% sudo sh -c ':>/var/log/audit/audit.log'

Next, with SELinux still in permissive mode – so that all operations that would otherwise be denied throughout the startup process would be permitted but logged – I started the server via systemctl as before. Startup succeeded, and the audit log now contained information about all the would-have-failed operations. Here is a short excerpt from the audit log (three entries, wrapped):

type=AVC msg=audit(1404872081.435:1006): avc:  denied  { create }
  for  pid=1298 comm="ln" name="CS.cfg.bak"
  scontext=system_u:system_r:pki_tomcat_t:s0
  tcontext=system_u:object_r:pki_tomcat_etc_rw_t:s0 tclass=lnk_file
type=SYSCALL msg=audit(1404872081.435:1006): arch=c000003e
  syscall=88 success=yes exit=0 a0=7fff6b27aac0 a1=7fff6b27ab03 a2=0
  a3=7fff6b278790 items=0 ppid=1113 pid=1298 auid=4294967295 uid=994
  gid=994 euid=994 suid=994 fsuid=994 egid=994 sgid=994 fsgid=994
  tty=(none) ses=4294967295 comm="ln" exe="/usr/bin/ln"
  subj=system_u:system_r:pki_tomcat_t:s0 key=(null)
type=AVC msg=audit(1404872081.436:1007): avc:  denied  { read }
  for  pid=1113 comm="pkidaemon" name="CS.cfg.bak" dev="vda3"
  ino=134697 scontext=system_u:system_r:pki_tomcat_t:s0
  tcontext=system_u:object_r:pki_tomcat_etc_rw_t:s0 tclass=lnk_file

There were about 30 lines in the audit log. As expected, there were entries related to the failure to create a symbolic link – those are the lines above. There were also entries that didn’t seem related to the symlink failure, yet were obviously caused by the Dogtag startup.

To one unfamiliar with SELinux, the format of the audit log and the meaning of the entries therein is somewhat opaque. Running sudo audit2why -a distils the audit log into a more human-friendly form, reporting on six denials including the symlink denial:

type=AVC msg=audit(1404872081.435:1006): avc:  denied  { create } for  pid=1298 comm="ln" name="CS.cfg.bak" scontext=system_u:system_r:pki_tomcat_t:s0 tcontext=system_u:object_r:pki_tomcat_etc_rw_t:s0 tclass=lnk_file
        Was caused by:
                Missing type enforcement (TE) allow rule.

                You can use audit2allow to generate a loadable module to allow this access.

Each message gives the user, operation and labels of resources involved in the denied operation, and the cause of the denial. It also suggests using audit2allow(1) to generate the rules that would allow the failed operations. Running sudo audit2allow -a gave the following output:

#============= pki_tomcat_t ==============

#!!!! This avc is a constraint violation.  You would need to modify the attributes of either the source or target types to allow this access.
#Constraint rule:
        constrain file { create relabelfrom relabelto } ((u1 eq u2 -Fail-)  or (t1=pki_tomcat_t  eq TYPE_ENTRY -Fail-) { POLICY_SOURCE: can_change_object_identity } ); Constraint DENIED

#       Possible cause is the source user (system_u) and target user (unconfined_u) are different.
allow pki_tomcat_t pki_tomcat_etc_rw_t:file create;
allow pki_tomcat_t pki_tomcat_etc_rw_t:file { relabelfrom relabelto };
allow pki_tomcat_t pki_tomcat_etc_rw_t:lnk_file { read create };
allow pki_tomcat_t self:process setfscreate;

I have no idea about the meanings of the warning and the constraint rule, but the other rules make more sense. In particular, the second-last rule is undoubtedly the one that will allow the creation of symbolic links. Without knowing the specifics of this rule format, I would interpret this line as,

Allow processes with the pki_tomcat_t attribute to create and read symbolic links in areas (of the filesystem) with the pki_tomcat_etc_rw_t attribute.

Admittedly, I have inferred processes and filesystem above, in no small part due to the names pki_tomcat_t and pki_tomcat_etc_rw_t, which were probably chosen by the Dogtag developers. Nevertheless, the rule format seems to do a satisfactory job of communicating the meaning of a rule, especially when descriptive labels are used.

The Fix

The SELinux policies that permit Dogtag to manage its affairs (configuration, logging, etc.) on a Fedora system are not shipped in the pki-* packages, but rather in the selinux-policy-targeted package, which provides policies for Dogtag and many other network servers and programs.

For an issue in this package to be corrected, one has to file a bug against the selinux-policy-targeted component of the Fedora product on the Red Hat Bugzilla. A reference policy should be attached to the bug report; audit2allow will generate one when invoked with the -R or --reference argument.

% sudo audit2allow -R -i /var/log/audit/audit.log > pki-lnk_file.te
could not open interface info [/var/lib/sepolgen/interface_info]

This failed, but a web search soon revealed that the appropriate interface is generated by the sepolgen-ifgen command, which is provided by the policycoreutils-devel package.

% sudo yum install -y policycoreutils-devel
% sudo sepolgen-ifgen
% sudo audit2allow -R -i /var/log/audit/audit.log > pki-lnk_file.te
% cat pki-lnk_file.te

require {
        type pki_tomcat_etc_rw_t;
        type pki_tomcat_t;
        class process setfscreate;
        class lnk_file { read create };
        class file { relabelfrom relabelto create };
}

#============= pki_tomcat_t ==============

#!!!! This avc is a constraint violation.  You would need to modify the attributes of either the source or target types to allow this access.
#Constraint rule:
        constrain file { create relabelfrom relabelto } ((u1 eq u2 -Fail-)  or (t1=pki_tomcat_t  eq TYPE_ENTRY -Fail-) { POLICY_SOURCE: can_change_object_identity } ); Constraint DENIED

#       Possible cause is the source user (system_u) and target user (unconfined_u) are different.
allow pki_tomcat_t pki_tomcat_etc_rw_t:file create;
allow pki_tomcat_t pki_tomcat_etc_rw_t:file { relabelfrom relabelto };
allow pki_tomcat_t pki_tomcat_etc_rw_t:lnk_file { read create };
allow pki_tomcat_t self:process setfscreate;

With pki-lnk_file.te in hand, I filed a bug. Hopefully the package will be updated soon.
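As an aside, rather than leaving SELinux permissive while the bug is open, the generated rules can also be loaded locally as a custom policy module; audit2allow will build one straight from the audit log. This is a local workaround only – and the constraint-violation warning in the output above suggests a plain module may not cover everything – but it is worth knowing about:

% sudo audit2allow -a -M pki-lnk_file
% sudo semodule -i pki-lnk_file.pp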

Conclusion

When I first ran into this issue, I had very little experience with SELinux. I now know a fair bit more than I used to – how to quickly determine whether SELinux is responsible for a given failure, and what the operations were that failed – but there is much more to learn about the workings of SELinux and the definition and organisation of policies.

As to the occurrence of the problem itself, whilst from a security standpoint it makes sense to separate the granting of privileges to software from the provision of that software, as a developer, it frustrated me that I had to submit a request to another team responsible for a different aspect of Fedora just for Dogtag to be able to create a symbolic link in its own configuration directory!

This arrangement of having the policies for myriad common servers and programs provided centrally by one or two packages is new to me. There are obvious merits to this approach – and obvious drawbacks. Perhaps there is another approach that represents the best of both worlds – security for the user, and convenience or lack of roadblocks for the developer. Perhaps I am talking about containers, à la Docker.

In the meantime, until the selinux-policy-targeted package is updated to add the symbolic link rules Dogtag needs, with SELinux still in permissive mode on my development VM, I can get on with the job of implementing LDAP profile storage in Dogtag.

July 08, 2014

Audit Belongs with Policy

Policy in OpenStack is the mechanism by which Role-Based Access Control (RBAC) is implemented. Policy is distributed in rules files which are processed at the time of a user request. Audit has come to mean the automated emission and collection of events used for security review. The two processes are related and need a common set of mechanisms to build a secure and compliant system.

This is a little rough, but I promised I would sum up our discussion.

Why Unified

The policy enforces authorization decisions. These decisions need to be audited.

Assume that both policy and audit are implemented as middleware pipeline components. If policy happens before audit, then denied operations would not emit audit events. If policy happens after audit, the audit event does not know whether the request was successful. If audit and policy are not unified, the audit event does not know what rule was actually applied for the authorization decision.

Current Status

Rob Basham, Matt Rutkowski, Brad Topol, and Gordon Chung presented on a middleware auditing implementation at the Atlanta Summit.

Tokens are unpacked by a piece of code called keystonemiddleware.auth_token (actually, it’s in keystoneclient at the moment, but moving).

Auth token middleware today does too much. It unpacks tokens, but it also enforces policy on them; if an API pipeline calls into auth_token, the absence of a token will trigger the return of a ’401 Unauthorized’.

The first step to handling this is that a specific server can set ‘delay_auth_decision = True’ in the config file, and then no policy is enforced, but the decision is instead deferred until later.
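As a sketch, the option sits alongside the auth_token filter in the service’s configuration. The filter_factory path below assumes the post-move keystonemiddleware location; at the time of writing it may still be keystoneclient.middleware.auth_token:

[filter:authtoken]
paste.filter_factory = keystonemiddleware.auth_token:filter_factory
delay_auth_decision = true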

Currently, policy enforcement is performed on a per-project basis. The Keystone code that enforces policy starts with a decorator defined here in the Icehouse codebase;

http://git.openstack.org/cgit/openstack/keystone/tree/keystone/common/controller.py?h=stable/icehouse#n87

The Glance code base uses this code;

http://git.openstack.org/cgit/openstack/glance/tree/glance/api/policy.py?h=stable/icehouse

Nova uses:

http://git.openstack.org/cgit/openstack/nova/tree/nova/policy.py?h=stable/icehouse

And the other projects are comparable. This has several implications. Probably the most significant is that policy implementation can vary from project to project, making the administrator’s life difficult.

Deep object inspection

What is different about Keystone’s implementation? It has to do with the ability to inspect objects out of the database before applying policy. If a user wants to read, modify, or delete an object, they only provide the ID to the remote server. If the server knows what the project ID of the object is, it can apply policy. But that information is not in the request. So the server needs to find out what project owns the object. The decorator @controller.protected uses the flag get_member_from_driver, which fetches the object prior to enforcing the policy.

Nova buries the call to enforce deeper inside the controller method.

Path forward

Cleaning up policy implementation

  • Clean up the keystone server implementation of policy. The creation of https://github.com/openstack/keystone/blob/master/keystone/common/authorization.py was a start, but the code called from the decorators that knows how to deal with the token data in controller.py needs to be pulled into authorization.py as well.
  • Move authorization.py into keystonemiddleware
  • Make the keystone server use the middleware implementation
  • Convert the other projects to use the middleware implementation.
  • Convert the other projects to use “delay_auth_decision” so this can eventually be the default.

Audit Middleware

  • Put the audit middleware into keystonemiddleware as-is. This lets people use audit immediately.
  • Extract the logic from the middleware into code that can be called from policy enforcement.
  • Create a config option to control the emission of audit events from policy enforcement.
  • Remove the audit middleware from the API paste config files and enable the config option for policy emission of events.

Why not Oslo policy?

Oslo policy is a general-purpose rules engine. Keeping that separate from the OpenStack RBAC-specific implementation is a good separation of concerns. Other parts of OpenStack may have needs for policy/rules enforcement that is completely separate from RBAC. Firewall configs in Neutron are the obvious first place.


July 07, 2014

Threats: Sphinx the Script Kiddie

Sphinx the Script Kiddie

Unlike Igor, Sphinx doesn’t have deep skills or knowledge. But he does have access to very powerful cracking toolkits that other people have developed and to people who can provide guidance and answer questions. This makes him far more dangerous than he would be if he had to rely on his own skills.

Most of the “hackers” (actually “crackers”) out there are like Sphinx. He may do everything from defacing web sites to identity theft and credit card fraud. In many cases he will be looking for targets of opportunity, rather than going after a specific system. He tends to use his cracking toolkits to probe every system he can find, looking for unsecured systems and common security flaws.

Much of your security strategy should be designed for Sphinx. There are a lot of them out there and they can do a lot of damage.


July 02, 2014

LMIShell on RHEL 7

Someone reported that they were having problems using LMIShell on a RHEL 7 system – they didn’t have any of the friendly commands that we have been talking about. And they were right; the full set of LMIShell scripts that provides the friendly CLI experience is not automatically installed on RHEL 7.

LMIShell on RHEL 7 is a special case – the LMIShell infrastructure is included in RHEL 7, but many of the scripts that make LMIShell easy to use are not packaged directly in RHEL 7. Instead, they are delivered through EPEL – the Extra Packages for Enterprise Linux. To effectively use LMIShell on a RHEL 7 system you need to install the EPEL repository and then install the OpenLMI Scripts from it.

One of the key characteristics of RHEL is the stability of interfaces. The OpenLMI API is stable, which allows us to include OpenLMI infrastructure and Providers in RHEL 7.

The LMIShell scripts, on the other hand, are rapidly evolving and changing. This is by design – we want the scripts to be useful, and we encourage people to modify and extend them. And hopefully submit their changes back upstream. This is a general characteristic of system management scripts; many of them change and evolve over time.

To install the full set of LMIShell scripts on a RHEL 7 system, first install the EPEL repository by going to http://mirror.pnl.gov/epel/beta/7/x86_64/repoview/epel-release.html, downloading the package, and installing it. This will configure your system to install packages from the EPEL for RHEL 7 repository.
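Once the epel-release package is downloaded from that page, the install itself is a single command, run from the download directory (the exact version in the file name will vary):

# yum install ./epel-release-7-*.noarch.rpm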

Next, install LMIShell with the scripts:

# yum install 'openlmi-scripts*'

This will install the LMIShell framework from RHEL 7 and all the LMIShell scripts from the EPEL repository. If you have already installed LMIShell it will simply install the scripts from EPEL.

To verify that the LMIShell scripts have been installed, issue the command “lmi help”. If you see a list of commands such as hwinfo, net, and storage, then the scripts are installed. You might also try “lmi hwinfo”, which will display information on the system and hardware configuration.


Wanted: A small crew for working on security bugs in Fedora

Do you hate security vulnerabilities?

Do you want to help make Fedora more secure?

Do you have a little extra time in your week to do a little work (no coding required)?

If you answered yes to the questions above I want you for a beta test of an idea I have to help make Fedora more secure.  I’m looking for just a few people (maybe five) to sort through security bugs and work with upstream and packagers to get patches or new releases into Fedora and help make everyone’s computing experience a little safer.  If you’re interested please contact me (sparks@fedoraproject.org 0x024BB3D1) and let me know you’re interested.


It’s all a question of time – AES timing attacks on OpenSSL

This blog post is co-authored with Andy Polyakov from the OpenSSL core team.

Advanced Encryption Standard (AES) is the most widely used symmetric block cipher today. Its use is mandatory in several US government and industry applications. Among the commercial standards, AES is a part of SSL/TLS, IPSec, 802.11i, SSH, and numerous other security products used throughout the world.

Ever since the inclusion of AES as a federal standard via FIPS PUB 197 – and even before that, when it was known as Rijndael – there have been several attempts to cryptanalyze it. However, most of these attacks have not gone beyond the academic papers they were written in. One worth mentioning at this point is the key recovery attacks on AES-192/AES-256. A second angle to this is attacks on AES implementations via side channels. A side-channel attack exploits information which is leaked through physical channels such as power consumption, noise, or timing behaviour. In order to observe such behaviour the attacker usually needs to have some kind of direct or semi-direct control over the implementation.

There has been some interest in side-channel attacks on the way OpenSSL implements AES. I suppose OpenSSL is chosen mainly because it’s the most popular cross-platform cryptographic library used on the internet. Most Linux/Unix web servers use it, along with tons of closed source products on all platforms. The earliest such attack dates back to 2005, and the recent ones are the cross-VM cache-timing attacks on the OpenSSL AES implementation described here and here. These are more alarming, mainly because with applications and data moving into the cloud, recovering AES keys from a cloud-based virtual machine via a side-channel attack could mean complete failure for the code.

After doing some research on how AES is implemented in OpenSSL, several interesting facts have emerged, so stay tuned.

What are cache-timing attacks?

Cache memory is random access memory (RAM) that the microprocessor can access more quickly than it can access regular RAM. As the microprocessor processes data, it looks first in the cache memory, and if it finds the data there (from a previous reading of data), it does not have to do the more time-consuming read from the larger main memory. Just like all other resources, the cache is shared among running processes for efficiency and economy. This may be dangerous from a cryptographic point of view, as it opens up a covert channel which allows a malicious process to monitor the use of these caches and possibly indirectly recover information about the input data, by carefully noting timing information about its own cache accesses.

A particular kind of attack called the flush+reload attack works by forcing data in the victim process out of the cache, waiting a bit, then measuring the time it takes to access the data. If the victim process accesses the data while the spy process is waiting, it will get put back into the cache, and the spy process’s access to the data will be fast. If the victim process doesn’t access the data, it will stay out of the cache, and the spy process’s access will be slow. So, by measuring the access time, the spy can tell whether or not the victim accessed the data during the wait interval. All this is under the premise that the data is shared between victim and adversary.

Note that we are not talking about the secret key being shared, but effectively public data, specifically the lookup tables discussed in the next paragraph.

Is AES implementation in OpenSSL vulnerable to cache-timing attacks?

Any cipher relying heavily on S-boxes may be vulnerable to cache-timing attacks. The processor optimizes execution by loading these S-boxes into the cache, so that subsequent accesses/lookups will not need to load them from main memory. Textbook implementations of these ciphers do not use constant-time lookups when accessing the data from the S-boxes and, worse, each lookup depends on a portion of the secret encryption key. AES-128, as per the standard, requires 10 rounds, and each round involves 16 S-box lookups.

The Rijndael designers proposed a method which results in fast software implementations. The core idea is to merge the S-box lookup with another AES operation by switching to larger pre-computed tables. There are still 16 table lookups per round. These 16 are customarily segmented into 4 split tables, so that there are 4 lookups per table per round. Each table consists of 256 32-bit entries. These are referred to as T-tables, and in the case of the current research, the way these are loaded into the cache leads to timing leakages. The leakage as described in the paper is quantified by the probability of a cache line not being accessed as a result of a block operation. As each lookup table, be it an S-box or a pre-computed T-table, consists of 256 entries, the probability is (1-n/256)^m, where n is the number of table elements accommodated in a single cache line, and m is the number of references to the given table per block operation. The smaller the probability, the harder it is to mount the attack. For example, on a typical x86 CPU with 64-byte cache lines, a line holds n=16 of the 32-bit T-table entries, and with m=40 lookups per table per block (4 per round over 10 rounds) the probability works out to (1-16/256)^40 ≈ 0.08.

Aren’t cache-timing attacks local? How is a virtualized environment affected?

Enter KSM (Kernel SamePage Merging). KSM enables the kernel to examine two or more already running programs and compare their memory. If any memory regions or pages are identical, KSM reduces multiple identical memory pages to a single page. This page is then marked copy-on-write. If the contents of the page are modified by a guest virtual machine, a new page is created for that guest virtual machine. This means that cross-VM cache-timing attacks would now be possible. You can stop KSM or modify its behaviour. Some details are available here.

You did not answer my original question, is AES in OpenSSL affected?

In short, no. But not to settle for easy answers, let’s have a close look at how AES in OpenSSL operates. In fact there are several implementations of AES in the OpenSSL codebase, and each of them may or may not be chosen based on specific run-time conditions. Note: all of the discussion below is about OpenSSL version 1.0.1.

  • Intel Advanced Encryption Standard New Instructions, or AES-NI, is an extension to the x86 instruction set for Intel and AMD machines, introduced in 2008. Intel processors from Westmere onwards and AMD processors from Bulldozer onwards have support for this. The purpose of AES-NI is to allow AES to be performed by dedicated circuitry; no cache is involved here, and hence it’s immune to cache-timing attacks. OpenSSL uses AES-NI by default, unless it’s disabled on purpose. Some hypervisors mask the AES-NI capability bit, which is customarily done to make sure that the guests can be freely migrated within a heterogeneous cluster/farm. In those cases OpenSSL will resort to other implementations in its codebase.
  • If AES-NI is not available, OpenSSL will use either Vector Permutation AES (VPAES) or Bit-sliced AES (BSAES), provided the SSSE3 instruction set extension is available. SSSE3 was first introduced in 2006, so there is a fair chance that it will be available on most computers in use. Both of these techniques avoid data- and key-dependent branches and memory references, and therefore are immune to known timing attacks. VPAES is used for CBC encrypt, ECB and “obscure” modes like OFB and CFB, while BSAES is used for CBC decrypt, CTR and XTS.
  • In the end, if your processor supports neither AES-NI nor SSSE3, OpenSSL falls back to integer-only assembly code. Unlike widely used T-table implementations, this code path uses a single 256-byte S-box. This means that the probability of a cache line not being accessed as a result of a block operation would be (1-64/256)^160 ≈ 1e-20. “Would be” means that the actual probability is even lower, in fact zero, because the S-box is fully prefetched, and indeed in every round. (A quick way to check which of these code paths applies on your own machine is sketched just after this list.)
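As a quick sanity check on Linux, the CPU flags show which of the above code paths is even possible on your hardware, and openssl speed exercises the EVP code path that actually selects the implementation (runs with AES-NI in play will be markedly faster):

% grep -m1 '^flags' /proc/cpuinfo | grep -owE 'ssse3|aes'
ssse3
aes
% openssl speed -evp aes-128-cbc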

For completeness’ sake it should be noted that OpenSSL does include a reference C implementation which has no mitigations against cache-timing attacks. This is platform-independent fallback code that is used on platforms with no assembly modules, as well as in cases when the assembler fails for some reason. On a side note, OpenSSL maintains really minimal assembler requirements for AES-NI and SSSE3; in fact the code can be assembled on Fedora 1, even though support for these instructions was added later.

The bottom line is that if you are using a Linux distribution which comes with OpenSSL binaries, there is a very good chance that the packagers have taken pains to ensure that the reference C implementation is not compiled in. (The same is true if you download the OpenSSL source code and compile it yourself.)

It’s not clear from the research paper how the researchers were able to conduct the side-channel attack. All evidence suggests that they ended up using the standard reference C implementation of AES instead of the assembly modules, which have mitigations in place. The researchers were contacted but did not respond on this point. Anyone using an OpenSSL binary they built themselves using the defaults, or precompiled as part of a Linux distribution, should not be vulnerable to these attacks.

June 30, 2014

Threats: Igor the Hacker

Now that we’ve taken a look at what some of the threats are, let’s look at who might be behind these threats. One goal is to determine who the greatest threat is. You may be surprised…

Igor the Hacker


Igor is who you think of when someone says “hacker”. True hackers have always been skilled. Igor is very skilled and is in it for the money. He may have the backing of considerable resources from criminal organizations or even from state entities.

There are two ways Igor may be after you. If he is building a zombie botnet for spam and DDoS attacks he will be looking for systems that are easy to take over. Normal security precautions should provide a good defense.

On the other hand, if you have assets that Igor is after, you have a real problem. Almost no level of security will be enough to stop him. And he won’t stop with computer attacks; social engineering is one of his most powerful tools. In some cases he may even resort to physical penetration to get to your systems.

Fortunately, there aren’t that many Igors around. You can’t build a security strategy around nothing but stopping Igor – it isn’t cost effective and truly hardened systems are often difficult to use. We will examine how a defense in depth approach can be used to manage Igor.

(Note: Igor is actually a cracker, not a hacker. A hacker is someone with deep computer skills who makes computers do amazing things. It describes someone with exceptional knowledge and skills. Unfortunately, hacker has been hijacked by the media to refer to criminal crackers…)


June 22, 2014

Signing PGP keys

If you’ve recently completed a key signing party or have otherwise met up with other people and have exchanged key fingerprints and verified IDs, it’s now time to sign the keys you trust.  There are several different ways of completing this task and I’ll discuss two of them now.

caff

CA Fire and Forget (caff) is a program that allows you to sign a bunch of keys (like you might have after a key signing party) very quickly.  It also adds a level of security to the signing process by forcing the other person to verify that they have both control over the email address provided and the key you signed.  The way caff does this is by encrypting the signature in an email and sending it to the person.  The person who receives the message must decrypt the message and apply the signature themselves.  Once they sync their key with the key server, the new signatures will appear for everyone; you can pull them into your own keyring with a refresh:

$ gpg --keyserver hkp://pool.sks-keyservers.net --refresh-key

There is some setup of caff that needs to be done beforehand, but once you have it set up it’ll be good to go.

Installing caff

Installing caff is pretty easy, although there might be a little trick.  In Fedora there isn’t a caff package; caff is actually in the pgp-tools package.  Other distros may have this named differently.
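On Fedora that means:

$ sudo yum install pgp-tools

(On Debian and its derivatives, caff ships in the signing-party package.)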

Using caff

Once you have caff installed and set up, you just need to tell caff which key IDs you would like to sign.  “man caff” will give you all the options, but basically caff -u <your-keyid> <keyids-to-sign> will sign all the keys listed after your key ID (the -m option controls whether the resulting mails are sent automatically).  You will be asked to verify that you do want to sign the key, and then caff will sign the key and mail it off.  The user will receive an email, per user ID on the key, with instructions on importing the signature.

Signing a key with GnuPG

The other way of signing a PGP key is to use GnuPG.  Signing a key this way will simply add the signature to the key you have locally and then you’ll need to send those keys out to the key server.

Retrieving keys using GnuPG

The first thing that you have to do is pull the keys down from the keyserver.

$ gpg --keyserver hkp://pool.sks-keyservers.net --recv-keys ...

Once you have received all the keys you can then sign them.  If someone’s key is not there you should probably contact them and ask them to add their key to the servers.  If they have already uploaded their key, it might take a couple of hours before it is synced everywhere.

Using GnuPG

Signing a key is pretty straightforward:

$ gpg --sign-key 1bb943db
pub 1024D/1BB943DB created: 2010-02-02 expires: never usage: SC 
 trust: unknown validity: unknown
sub 4096g/672557E6 created: 2010-02-02 expires: never usage: E 
[ unknown] (1). MariaDB Package Signing Key <package-signing-key@mariadb.org>
[ unknown] (2) Daniel Bartholomew (Monty Program signing key) <dbart@askmonty.org>
Really sign all user IDs? (y/N) y
pub 1024D/1BB943DB created: 2010-02-02 expires: never usage: SC 
 trust: unknown validity: unknown
 Primary key fingerprint: 1993 69E5 404B D5FC 7D2F E43B CBCB 082A 1BB9 43DB
MariaDB Package Signing Key <package-signing-key@mariadb.org>
 Daniel Bartholomew (Monty Program signing key) <dbart@askmonty.org>
Are you sure that you want to sign this key with your
key "Eric Harlan Christensen <eric@christensenplace.us>" (024BB3D1)
Really sign? (y/N) y

In the example I signed the MariaDB key with my key.  Once that is complete a simple:

$ gpg --keyserver hkp://pool.sks-keyservers.net --send-key 1BB943DB

…will send the new signature to the key servers.


June 19, 2014

Using OpenLMI to join a machine to a FreeIPA domain

Stephen Gallagher has published an article on how to use OpenLMI to join a FreeIPA domain. The article is available on his blog at sgallagh.wordpress.com

As Stephen notes:

“Traditionally, enrolling a system has been a “pull” operation, where an admin signs into the system and then requests that it be added to the domain. However, there are many environments where this is difficult, particularly in the case of large-scale datacenter or cloud deployments. In these cases, it would be much better if one could script the enrollment process.”

He covers how to use OpenLMI to update DNS, install the IPA client software, and finally join a domain. While he shows how to do these steps interactively, they can also be scripted to fully automate the process.

Good stuff, and quite simple to do.


June 18, 2014

OpenSSL Privilege Separation Analysis

As part of the security response process, Red Hat Product Security looks at the information we obtain in order to align future endeavors, such as source code auditing, to where problems occur, in an attempt to prevent repeats of previous issues.

Private key isolation

When Heartbleed was first announced, a patch was proposed to store private keys in isolated memory, surrounded by an unreadable page. The idea was that the process would crash due to a segmentation violation before the private key memory was read.

However, it was quickly pointed out that the proposed patch was flawed. It did not store the private keys in the isolated memory space, and the contents of memory accessible by Heartbleed could still contain information that can be used to quickly reconstruct the private key.

The lesson learned here was that an audit of how and where private keys can be accessed, and where useful information is stored, should be undertaken to identify any potential weaknesses in the approach. Additionally, testing and verifying results would have identified that the private keys were not located in memory surrounded by unreadable memory pages.

Private key privilege separation

The idea behind private key privilege separation is to reduce the risk of an equivalent Heartbleed-style memory leak vulnerability. It can be implemented by an application that sits in front of the end service being protected, or in the target application itself.

One example of using an application in front of the service being protected is Titus.  This application runs a separate process per TLS connection and stores the private key in another process. This helps prevent Heartbleed-style bugs from leaking private keys and other information about application state. The per-connection process model also protects against information from other connections being leaked or affected.

One drawback of the current implementation in Titus is that it fork()s and doesn’t execve() itself.  If there are any memory corruption vulnerabilities present in Titus, or OpenSSL, writing an exploit against the target is far easier than it could have been and potentially leaves useful information in memory that can be obtained later on.

Additionally, depending on how the chroot directories are set up, devices such as /dev/urandom may not be available, which reduces the possible entropy sources available to OpenSSL.

Another approach is to implement the private key privilege separation in the process itself, which is what some OpenBSD software has started to do. The aim is that, while it won’t protect against OpenSSL vulnerabilities in and of itself, it will help keep private keys from being leaked.

Privilege-separated OpenSSL

Sebastian Krahmer wrote an OpenSSL Privilege Separation (sslps) proof of concept which uses the Linux kernel Secure Computing (seccomp) interface to isolate OpenSSL from the lophttpd process. This effectively reduces the system calls available to OpenSSL itself.

This has the advantage that if there is a memory corruption or arbitrary code execution vulnerability present in OpenSSL, an attacker requires a further kernel vulnerability – either in the allowed system calls or in the lophttpd IPC mechanism – to gain access.

Another possibility is that the attacker is happy to sit in the restricted OpenSSL process and monitor the SSL_read and SSL_write traffic, potentially gaining access to the private keys in memory.

While the current version of sslps doesn’t mitigate a Heartbleed-style memory leak of the private key, it makes an attacker’s job harder when a memory corruption or arbitrary code execution vulnerability is present in OpenSSL.

It will be interesting to see if the OpenSSL or LibreSSL developers investigate using privilege separation or sandboxing in the future and what approaches are taken to implement them.

Hardware

One approach to help limit the impact of a software compromise is to store the private keys elsewhere. One way to do this is to use a Hardware Security Module (HSM) to handle key generation, encryption, and signing. We may discuss using HSMs in the future.

It is also possible to use a Trusted Platform Module (TPM) to provide key generation, storage, encryption, and signing with OpenSSL, but this approach may be too slow for anything other than client-side use.

Designing a new approach

Having laid out what’s available, a rough draft of an idealized approach for hardening SSL processing can now be made.

First, the various private keys should be isolated from the main processing of SSL traffic. This will help reduce the impact of Heartbleed-style memory leaks, which makes the attacker’s job of getting the private keys harder.

Second, the SSL traffic processing should be isolated from the application itself. This helps restrict the impact of bugs in OpenSSL from affecting the rest of the application and system to the maximum possible extent.

Lastly, use existing kernel features, such as executing a new process to have address space randomization and stack cookie values reapplied, as this helps reduce the amount of information available to attack other processes. Additionally, features such as seccomp could be used to restrict what the private key process and the SSL traffic process can do, which in turn helps restrict the attack surface available to a process. Furthermore, it may be possible to utilize mandatory access control (MAC) systems, such as SELinux, to further contain and restrict the processes involved.

Potential Pitfalls

Implementing all of the above may introduce some backwards compatibility issues. An example to consider is applications which utilize chroot() and so can no longer access the executables required to implement the idealized approach. Perhaps it might be feasible to implement a fallback to a fork()-based mechanism.

There is other functionality that may be adversely affected by such restrictions and would require proper in-depth analysis, such as looking up server and client certificate validity. Some API compatibilities could also get in the way.

It’s possible that the IPC mechanisms would introduce some performance impact, but the overhead would be dwarfed by the cryptographic processing and may not even be measurable. It may be possible to reduce the overhead, with some compromise of security, by using shared pages or page migration between processes to reduce the data-copying aspect of IPC, and to use the IPC mechanism only for message passing.

Conclusion

We’ve covered currently existing approaches and drawn up a rough list of idealized features that would be required to help reduce the current attack surface of OpenSSL. These features would make an attacker’s job harder when trying to compromise private keys and applications that use OpenSSL. A follow-up post may look at using an OpenSSL engine to move the private key from the application itself into another process, to prevent Heartbleed-style memory leaks from disclosing the private keys.

June 17, 2014

Threats

Let’s shift back to a security discussion and take a look at threats. Any intelligent discussion of threats starts by looking at what you are protecting, how it can be threatened, and the impact if one of the threats actually occurs. Some examples:

Defacing a Web Site

In the past this has been one of the most common and visible “hacker” threats. If you have a simple “brochure ware” site, the most reasonable approach may be to simply have a good backup you can restore. If, for example, you have a DreamWeaver site, you might simply mutter something appropriate under your breath and hit the button to republish the site.

On the other hand, if you have an ecommerce site that your company depends on… This site is obviously important and must be protected.

This is an example of considering exposure, impact and cost. You shouldn’t spend too much to protect the “brochure ware” site. You shouldn’t spend too little to protect the ecommerce site. You should do the analysis of what is appropriate!

Using a System for Other Purposes

Having your system hijacked and turned into a zombie spewing malware and spam is a bad thing. In addition to the direct impact on the system, this is likely to get your whole domain blacklisted and effectively kicked off the internet. Consider both the direct and indirect impact of someone taking over your system – this is worth defending against.

Stealing Data

Data theft can be catastrophic. The cost can go far beyond the direct costs – just ask Target about their credit card breach!

From the computer side, protecting data requires solid access controls, encryption, and operational controls. But you should ask some other questions: Why do you have that data at all? Do you need to store the data? Which computers actually need access to that data? In a surprising number of cases you may not actually need that data at all! As a simple example, don’t store passwords – store password hashes! Properly salted, of course…

Changing Data

This can be very serious. In many cases the absolute worst thing that can happen is for data to be changed. This can mean that none of the data can be trusted. Depending on the data and the change, this can range from a nuisance to life threatening. We will dig into this topic in more detail in the future.

Data Destruction

Data destruction can be malicious or accidental. What will you do if a disk drive crashes? What will you do if someone – maybe even you – “accidentally” deletes a critical file? How about an evil hacker breaking in and deleting data?

Even worse, what if the evil hacker deletes every tenth record in your database? Or if there is data corruption in part of a file or database?

Data destruction can be subtle. You need to worry about preventing it, detecting it, and recovering from it.

Changing Software

Changing software is a severe and subtle risk! The bottom line is to make sure you can detect it if it occurs – yes, this is even more important than preventing it. A good example of what can happen is a recent Computerworld Sharktank article. In this case, the people making changes to the system were authorized to do so – but the impact of those changes should have been detected.

Degraded System Availability or Performance

If a computer is performing an important business function, availability and performance have direct and measurable cost. You need to continuously measure the availability and performance of critical application services.


June 16, 2014

PGP Keysigning Event and CACert Assertion at SELF2014

SouthEast LinuxFest is happening this upcoming weekend.  I offered to host a PGP (I’ll use PGP to stand in for GPG, GnuPG, and other iterations) keysigning and CACert Assertion event and have been scheduled for 6:30 PM in the Red Hat Ballroom.  Since there is a little bit of planning needed on the part of the participants, I’m writing this to help the event run smoothly.

Participating in the PGP Keysigning Event

If you haven’t already, generate your PGP keys.  Setting up your particular mail client (MUA) is more than what I’ll discuss here, but there are plenty of resources on the Internet.  Send me (eric@christensenplace.us – signed, preferably encrypted to 0x024BB3D1) the fingerprint of your PGP key no later than 3:00 PM on Saturday afternoon.  If you don’t send me your fingerprint by that time you’ll be responsible for providing it to everyone at the keysigning event on paper.  Obtaining your key’s fingerprint can be done as follows:

$ gpg --fingerprint 024bb3d1
pub 4096R/024BB3D1 2011-08-11 [expires: 2015-01-01]
 Key fingerprint = 097C 82C3 52DF C64A 50C2 E3A3 8076 ABDE 024B B3D1
uid Eric Harlan Christensen <eric@christensenplace.us>
uid Eric "Sparks" Christensen <sparks@redhat.com>
uid Eric "Sparks" Christensen <echriste@redhat.com>
uid Eric "Sparks" Christensen <sparks@fedoraproject.org>
uid [jpeg image of size 2103]
uid Eric Harlan Christensen <sparks@gnupg.net>
sub 3072R/DCA167D5 2013-02-03 [expires: 2023-02-01]
sub 3072R/A9D8262F 2013-02-03 [expires: 2023-02-01]
sub 3072R/56EA1030 2013-02-03 [expires: 2023-02-01]

Just send me the “Key fingerprint” portion and your primary UID (name and email address) and I’ll include it on everyone’s handout.  You’ll need to bring your key fingerprint on paper for yourself to verify that what I’ve written on the paper is, indeed, correct.

At the event we’ll quickly do a read of all the key fingerprints and validate them as correct.  Then we’ll line up and do the ID check.  Be sure to bring a photo ID with you so that others can verify that you are who you claim to be.  People are generally okay with a driver’s license; some prefer a passport.  Ultimately it’s up to the individual what they will trust.

CACert Assertion

CACert is a free certificate authority that signs X509 certificates for use in servers, email clients, and code signing.  If you are interested in using CACert you need to go sign up for an account before the event.  Once you have established an account, login and select “US – WoT Form” from the CAP Forms on the right-side of the page.  Print a few of these forms and bring them with you (I hope to have a final count of the number of assurers that will be available but you’ll need one form per assurer).  You’ll need to present your ID to the assurer so they can verify who you are.  They will then award you points in the CACert system.

Questions?

If you have any questions about the event feel free to ask them here (using a comment) or email me at eric@christensenplace.us.


Generating a PGP key using GnuPG

Generating a PGP key using GnuPG (GPG) is quite simple.  The following shows my recommendations for generating a PGP key today.

$ gpg --gen-key 
gpg (GnuPG) 1.4.16; Copyright (C) 2013 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Please select what kind of key you want:
 (1) RSA and RSA (default)
 (2) DSA and Elgamal
 (3) DSA (sign only)
 (4) RSA (sign only)
Your selection? 1
RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048) 3072
Requested keysize is 3072 bits
Please specify how long the key should be valid.
 0 = key does not expire
 <n>  = key expires in n days
 <n>w = key expires in n weeks
 <n>m = key expires in n months
 <n>y = key expires in n years
Key is valid for? (0) 1y
Key expires at Tue 16 Jun 2015 10:32:06 AM EDT
Is this correct? (y/N) y
You need a user ID to identify your key; the software constructs the user ID
from the Real Name, Comment and Email Address in this form:
 "Heinrich Heine (Der Dichter) <heinrichh@duesseldorf.de>"
Real name: Given Surname
Email address: given.surname@example.com
Comment: Example
You selected this USER-ID:
 "Given Surname (Example) <given.surname@example.com>"
Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? o
You need a Passphrase to protect your secret key.
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
..........+++++
.....+++++
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
+++++
....+++++
gpg: key 2CFA0010 marked as ultimately trusted
public and secret key created and signed.
gpg: checking the trustdb
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0 valid: 2 signed: 49 trust: 0-, 0q, 0n, 0m, 0f, 2u
gpg: depth: 1 valid: 49 signed: 60 trust: 48-, 0q, 0n, 0m, 1f, 0u
gpg: depth: 2 valid: 8 signed: 17 trust: 8-, 0q, 0n, 0m, 0f, 0u
gpg: next trustdb check due at 2014-09-09
pub 3072R/2CFA0010 2014-06-16 [expires: 2015-06-16]
 Key fingerprint = F81D 16F8 3750 307C D090 4DC1 4D05 E6EF 2CFA 0010
uid Given Surname (Example) <given.surname@example.com>
sub 3072R/48083419 2014-06-16 [expires: 2015-06-16]

The above shows the complete exchange between GPG and myself.  I’ll point out a couple of selections I made and explain why I made those choices.

Key type selection

I selected the default: two RSA keys.  The keys used for signing and encryption will both be RSA, which is strong right now.  DSA has been proven to be weak in certain instances and should be avoided in this context.  I have no comment on ElGamal as I’ve not done research here.  Ultimately the choice is up to you.

Bit strength

I’ve selected 3072 instead of the default 2048 here.  I recommend this as the minimum bit strength as this provides 128 bits of security as compared to 112 bits of security with 2048.  128 bits of security should be secure beyond 2031 as per NIST SP 800-57, Part 1, Rev 3.

Key expiration

By default, I make my keys expire after a year.  This is a fail-safe: before the key expires, you can extend the expiration for another year.  It makes sure the key will self-destruct if you ever lose control of it.
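
For reference, extending the expiration later looks roughly like this (an interactive sketch using the example keyid from the transcript above; gpg prompts for the new validity period):

$ gpg --edit-key 2CFA0010
gpg> expire
Key is valid for? (0) 1y
gpg> save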

Identifying information

You’ll now be asked to add your name and email address.  This should be self-explanatory.

Key revocation

Once you have completed your key generation, now is the time to generate the key revocation certificate.  If you ever lose control of your key you should immediately upload this file to the public key servers so everyone using your key will know that it has [potentially] been compromised.  Once you’ve generated this revocation certificate, keep it somewhere safe.  You can even print it out and keep it locked up somewhere.  It’s important to do this ahead of time as you may not be able to do it later.  You’ll obviously want to substitute your own keyid for 2CFA0010.

$ gpg --gen-revoke 2CFA0010
sec 3072R/2CFA0010 2014-06-16 Given Surname (Example) <given.surname@example.com>
Create a revocation certificate for this key? (y/N) y
Please select the reason for the revocation:
 0 = No reason specified
 1 = Key has been compromised
 2 = Key is superseded
 3 = Key is no longer used
 Q = Cancel
(Probably you want to select 1 here)
Your decision? 1
Enter an optional description; end it with an empty line:
> 
Reason for revocation: Key has been compromised
(No description given)
Is this okay? (y/N) y
You need a passphrase to unlock the secret key for
user: "Given Surname (Example) <given.surname@example.com>"
3072-bit RSA key, ID 2CFA0010, created 2014-06-16
ASCII armored output forced.
Revocation certificate created.
Please move it to a medium which you can hide away; if Mallory gets
access to this certificate he can use it to make your key unusable.
It is smart to print this certificate and store it away, just in case
your media become unreadable. But have some caution: The print system of
your machine might store the data and make it available to others!
-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v1
Comment: A revocation certificate should follow
iQGfBCABAgAJBQJTnwtaAh0CAAoJEE0F5u8s+gAQHMQMANH1JG5gVDnp5NY4o8ji
3j6GljQ9ieY+u3c5q0c08/uSAqGvL9jmPn1QAnikAkIJGy9kNmBJ/uC6pSMcHeCW
/vYWMD/cToy63tgLOf4A8GgX2k8ttFe+DpFFSt43zbGVowykZ5AHwKImtyFwVO7M
IKQZV21uFcIDl7jb5GkymkpWRZmIrexOyIAQjpyYWQT4BFFnI7kwpYyVbmodkwE/
JaC0d5dMVT9DRLr5FGuGSpzYJEeB14GCjT2EQ1js/Bji2fguFqpzM5z77FdzhS7s
SNGgY8bioyjUN3CsyHMfPpkJi9mBDCV4gTxyLlVOdDiSdqA56mzjvrx3tnltfjyN
kFJfPDWLqXFNpzX516oOo37b3P92bSEPcIgGeTL58nVUn/BWMsoDlIbwNyjxx7Tq
YYXa2T2rbH1JHndOrmAc9X98cNrhs+vppV6SBev2MnvqobT2nqW7hKeNvwIyqunF
79fL9En2p57pQ8vH4EeRhjFSciuZZBpCEv2cMIDQGMFKVQ==
=6ljf
-----END PGP PUBLIC KEY BLOCK-----

Proper key storage

Generally speaking, your private PGP key is stored on your computer, encrypted.  It is protected by the normal security measures of your computer and whatever passphrase you set.  There is a better way: use a hardware security module (HSM) like a Yubikey Neo, OpenPGP card, or CryptoStick to protect your private key from disclosure.
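
As a rough sketch (GnuPG 1.4 syntax with a supported card attached; treat this as illustrative rather than a full walkthrough), moving a subkey onto an OpenPGP card looks like this:

$ gpg --edit-key 2CFA0010
gpg> toggle        (switch to the secret-key listing)
gpg> key 1         (select the first subkey)
gpg> keytocard     (move the selected subkey onto the card)
gpg> save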

Publishing your public key

Now that you have your PGP keys you’ll want to publish your public key to the key servers so others can easily obtain it to validate your signatures.

$ gpg --keyserver hkps://hkps.pool.sks-keyservers.net --send-keys 2CFA0010

You’ll obviously want to substitute your own keyid for 2CFA0010.  This command will send your key to the SKS public key servers which will then replicate your key around the world in a few hours.
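
The flip side, for anyone who wants to use your key, is fetching it from the same pool and refreshing periodically so revocations and expiration changes are noticed (again substituting the appropriate keyid):

$ gpg --keyserver hkps://hkps.pool.sks-keyservers.net --recv-keys 2CFA0010
$ gpg --refresh-keys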

 


June 14, 2014

Why POpen for OpenSSL calls

Many people have questioned why I chose to use popen to call the OpenSSL binary from Keystone and the auth_token middleware. Here is my rationale:

Keystone and the other API services in OpenStack are run predominantly from the Eventlet web server. Eventlet is a continuation-based server and requires cooperative multitasking. This means that if a function call misbehaves, the server is incapable of handling additional requests. A call to an asymmetric cryptography function, like signing (or verifying the signature of) a document, is expensive. There are several ways this could be problematic: the cryptographic library could call into native code without releasing the GIL, or the library call could tie up the CPU without giving the Eventlet server the ability to schedule another greenthread.

The popen call performs the POSIX fork system call and runs the target binary in a subprocess. Greenlet has support for this kind of call: when the green version of popen is called, the greenthread making the call yields to the scheduler, and the scheduler periodically checks the status of the subprocess. Meanwhile, other greenthreads can make progress in the system.
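
To make this concrete, here is a simplified sketch of the pattern using eventlet’s green subprocess module (this is not the actual Keystone code, and the openssl invocation is only illustrative of a CMS signing call):

from eventlet.green import subprocess  # green popen: pipe I/O yields to other greenthreads

def cms_sign_text(text, signing_cert_file, signing_key_file):
    # fork/exec the openssl binary in a child process
    process = subprocess.Popen(
        ['openssl', 'cms', '-sign',
         '-signer', signing_cert_file,
         '-inkey', signing_key_file,
         '-outform', 'PEM', '-nodetach', '-nocerts', '-noattr'],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE)
    # while we wait on the pipes, the hub schedules other greenthreads
    output, err = process.communicate(text)
    if process.returncode != 0:
        raise RuntimeError('openssl cms -sign failed: %s' % err)
    return output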

What is the price of the popen? There is the context switch: the operating system switches from one process to another. There are also the start-up costs of running the other process: the executable needs to be loaded into memory. Then the data from the parent process (in this case, the document to sign or verify) is passed via a pipe to the child process. Once the child process has completed the operation on the document, it returns the output via pipes back to the parent process, and the child process is torn down. However, on a loaded system, much of this cost is paid only once. The executable is memory-mapped; if one process already has it mapped, additional processes need only virtual memory operations to access those same mapped pages. The certificates used in the signature process are also loaded from the file system, but are likely to be in the file cache, so those reads are once again pure in-memory operations. Since the data being signed or validated never hits the disk, the main cost is the marshalling from process to process.

One reason I chose popen as opposed to a library call was that, at the time, there was no clear choice of Python library to use. S/MIME (also known as PKCS#7, or Cryptographic Message Syntax, CMS) is the standard for document signatures. At a minimum, I wanted a mechanism that supported S/MIME. While there are several native cryptographic libraries, the most widely deployed is OpenSSL, and I wanted something that made use of it.

Aside: our team has some in-house experience with NSS, and the US Government kind of demands NSS, since it plays nicely with Common Criteria certification and FIPS 140-2. However, most people out there are not familiar with NSS, and teaching the OpenStack world how to deal with NSS was more than I could justify. In addition, CMS support is not in the python-nss library. To get even deeper: the cmsutil command from the NSS toolkit did not seem to support stripping the certificates out of the signed document (it does, I’ve since discovered), which is required for keeping the token size as small as possible. Considering that we are seeing problems with tokens exceeding header size limits, I think this is essential. NSS support is likely to gate on resolving these issues. Neither is insurmountable.

Why not use a threadpool? First was the fact that I had no crypto library to use. M2Crypto, a long-time favorite, had just been removed from Nova; it had the operations, but was unsupported. There seems to be no other library out there that handles the whole PKCS#7 set of operations. Most do the hashing and signing just fine, but break down on the ASN.1 format of the document. The OpenSSL project’s own Python library does not support this, and PyCrypto (currently used by Barbican) doesn’t even seem to provide full X509 support.

Supposing I did have a library to choose, a threadpool would probably work fine. But then it completely bypasses all of the benefits of Eventlet’s greenthreads. Switching to a truly threaded web server would make sense…assuming one could be found that worked well within Python’s threading limitations.

Another reason to not do a threadpool is that it would be a solution specific to Eventlet. I have long been campaigning for a transition to Apache HTTPD as the primary container for Keystone. Granted, HTTPD running in pre-fork mode would not even need to do a popen: it could just wait for the response from the library call. But then we are starting to have an explosion of options to test.

It turns out that the real price of the popen comes from the fact that the calling program is Python. When you fork, you do not get the full benefit of copy-on-write for the Python code. In C, most of the code is in read-only memory and does not need to be duplicated; the same is not true for Python, so all those pages need to be copied. Thus far, there have been no complaints due to this. However, it is sufficient reason to plan for a replacement for the popen approach. There are a few potential approaches, but none stands out yet.

June 13, 2014

Using OpenLMI to join a machine to a FreeIPA domain

People who have been following this (admittedly intermittent) blog for a while are probably aware that in the past I was heavily involved in the SSSD and FreeIPA projects.

Recently, I’ve been thinking a lot about two topics involving FreeIPA. The first is how to deploy a FreeIPA server using OpenLMI. This is the subject of my efforts in the Fedora Server Role project and will be covered in greater detail in another blog post, hopefully next week.

Today’s topic involves enrollment of FreeIPA clients into the domain from a central location, possibly the FreeIPA server itself. Traditionally, enrolling a system has been a “pull” operation, where an admin signs into the system and then requests that it be added to the domain. However, there are many environments where this is difficult, particularly in the case of large-scale datacenter or cloud deployments. In these cases, it would be much better if one could script the enrollment process.

Additionally, it would be excellent if the FreeIPA Web UI (or CLI) could display a list of systems on the network that are not currently joined to a domain and trigger them to join.

There are multiple problems to solve here. The first, of course, is whether OpenLMI can control the joining. As it turns out, it can! OpenLMI 1.0 includes the “realmd” provider, which acts as a remote interface to the ‘realmd’ service on Fedora 20 (or later) and Red Hat Enterprise Linux 7.0 (or later).

Now, there are some prerequisites that have to be met before using realmd to join a domain. The first is that the system must have DNS configured properly, such that realmd will be able to query it for the domain controller properties. For both FreeIPA and Active Directory, this means that the system must be able to query for the _ldap SRV entry that matches the domain the client wishes to join.

In most deployment environments, it’s reasonable to expect that the DNS servers provided by the DHCP lease (or static assignment) will be correctly configured with this information. However, in a development or testing environment (with a non-production FreeIPA server), it may be necessary to first reconfigure the client’s DNS setup.

Since we’re already using OpenLMI, let’s see if we can modify the DNS configuration that way, using the networking provider. As it turns out, we can! Additionally, we can use the lmi metacommand to make this very easy. All we need to do is run the following command:

lmi -h <client> net dns replace x.x.x.x

With that done, we need to do one more thing before we join the domain. Right now, the realmd provider doesn’t support automatically installing the FreeIPA client packages when joining a domain (that’s on the roadmap). So for the moment, you’re going to want to run

lmi -h <client> sw install freeipa-client

(Replacing ‘freeipa-client’ with ‘ipa-client’ if you’re talking to a RHEL 7 machine).

With that done, now it’s time to use realmd to join the machine to the FreeIPA domain. Unfortunately, in OpenLMI 1.0 we do not yet have an lmi metacommand for this. Instead, we will use the lmishell Python scripting environment to perform the join (don’t worry, it’s short and easy to follow!).

c = connect('server', 'username', 'password')
realm_obj = c.root.cimv2.LMI_RealmdService.first_instance()
realm_obj.JoinDomain(Domain='domainname.test', User='admin', Password='password')

In these three lines, we connect to the client machine using OpenLMI, get access to the realm object (there’s only one on a system, which is why we use first_instance()), and then call the JoinDomain() method. We pass it the credentials of a FreeIPA administrator with privileges to add a machine, or else pass None for the User and a pre-created one-time password for the domain join as the Password; the one-time-password variant is sketched below.
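
The OTP value here is a placeholder for one pre-created on the FreeIPA server:

realm_obj.JoinDomain(Domain='domainname.test', User=None, Password='<one-time-password>')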

And there you have it, barring an error we have successfully joined a client to a domain!

Final thoughts: I mentioned above that it would be nice to be able to discover unenrolled systems on the network and display them. For this, we need to look into extending the set of attributes we have available in our SLP implementation so that we can query on this. It shouldn’t be too much work, but it’s not ready today.


June 10, 2014

OpenLMI Ships in RHEL 7

RHEL 7, the latest version of Red Hat Enterprise Linux, was announced today with immediate availability. OpenLMI is included in RHEL 7 – in fact, it was identified in the announcement keynote as one of the key new technologies in RHEL 7.

This means that OpenLMI is now available in a supported Enterprise Linux, as well as in community versions of Linux.

We encourage you to try OpenLMI in either the Enterprise or community versions. As always, see the OpenLMI website for more information.


June 05, 2014

Unattended Install of a FreeIPA Server

As a developer, I install and uninstall the application I’m working on all the time. Back when I was working on FreeIPA full time, I had a couple of functions that I used to do an unattended install with some simple defaults. I recently cleaned them up a little. Since a few people have asked me for them, I’m posting them here.

I have another set of bash functions that manages my set of developer machines. One of them sets the $DEVSERVER variable in my environment.

#The Kerberos REALM generated by this is the domain segment of the
#fully qualified domain name (FQDN) converted to uppercase.    
#If you were running it on the local host, you could use `hostname -d`
#but that doesn't work for a remote system.
ipa-gen-realm(){
    ipahost=$DEVSERVER
    IPAREALM=$( echo $DEVSERVER  | cut -f2- -d. |tr '[:lower:]' '[:upper:]' )
    echo $IPAREALM
}

#The forwarder for DNS can be defined as the existing set of
#nameservers from /etc/resolv.conf.
ipa-gen-resolver(){
     ssh $DEVSERVER " cat /etc/resolv.conf" | awk '/nameserver/ {print $2}'
} 

ipa-gen-install-command(){
    echo  ipa-server-install  -U -r $(ipa-gen-realm) -p FreeIPA4All \
          -a FreeIPA4All --setup-dns --forwarder $( ipa-gen-resolver)
}
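
To actually kick off the unattended install, I then run something like this (a sketch; it just executes the generated command remotely as root):

ssh root@$DEVSERVER "$(ipa-gen-install-command)"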

Kerberos and Firewalls

Most datacenters block non-standard ports at their firewalls, including the ports for lesser-used protocols. The Kerberos Key Distribution Center (KDC) listens on port 88 (TCP and UDP), which means that, practically speaking, a machine cannot get a ticket over the public internet. Last summer, Robbie Harwood interned here at Red Hat. Together, we put together a plan to address this.

It turns out that the fine folks at Microsoft tripped over this very problem long ago and came up with an approach: use HTTP to talk to a proxy that relays to the KDC. Their protocol, MS-KKDCP, is written up in RFC form on their site. It makes sense that the MIT Kerberos approach should interoperate with the Microsoft product.

The problem with interns is that they have a nasty habit of actually going back to finish their degrees. In this case, we had a working prototype by the end of the summer, but still had the long haul of getting it merged into the MIT upstream. Fortunately, we have people here at Red Hat who can make these Herculean labors look easy. In this case, Nalin Dahyabhai spent a good chunk of time over the past several months dealing with the refactorings and other changes necessary to get it in.

It merged a couple nights ago. I did the happy dance the next morning.

Kerberos across the public internet still has a long path ahead. The code that merged needs to make it into the next Kerberos release, which then needs to make it into the major Linux distributions. Until that happens, we can’t rely on the tools being in place, but we can prepare for it.
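
To give a feel for where this is headed, client configuration with a KDC proxy is expected to look something like the krb5.conf snippet below (the hostname is hypothetical and the final syntax depends on the released MIT code): the client speaks HTTPS to the proxy on port 443, which firewalls generally allow, and the proxy relays to the KDC on port 88.

[realms]
    EXAMPLE.COM = {
        kdc = https://kdcproxy.example.com/KdcProxy
    }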

Even once it is deployed, there will be issues:

  1. How do you find the right KDC for a given site?
  2. How do you configure your system for a new KDC without giving away root privilege?
  3. How do you tell your browser that you don’t have a principal for a Kerberized site, and to use a different mechanism?

Robbie’s development setup is documented here:

So here’s what you can plan for: there will be a new release of MIT Kerberos. The current plan is for a release in the fall timeframe, and we are hoping to get that version into Fedora.next. No promises, as this involves synchronizing across two distinct organizations, but it looks promising. We’ll make sure the Debian maintainers are aware as well, and try to make sure the corresponding releases have it. Meanwhile, look for notes on getting the corresponding proxy set up for FreeIPA and other MIT Kerberos server implementations. The Microsoft proxy server is part of the Terminal Services product, so if you are a Microsoft shop, that is the path for you.

I’m pretty excited about this. Kerberos has the potential to vastly improve security in the public web.

UPDATE:
Nathaniel McCallum’s implementation of the KDC Proxy

OpenSSL MITM CCS injection attack (CVE-2014-0224)

In the last few years, several serious security issues have been discovered in various cryptographic libraries. Though very few of them were actually exploited in the wild before details were made public and patches were shipped, important issues like Heartbleed have led developers, researchers, and users to take code sanity of these products seriously.

Among the recent issues fixed by the OpenSSL project in version 1.0.1h, the main one that will have everyone talking is the man-in-the-middle (MITM) attack, documented by CVE-2014-0224, affecting the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols.

What is CVE-2014-0224 and should I really be worried about it?

The short answer is: it depends. But like any security flaw, it’s always safer to patch rather than defer and worry.

In order for an attacker to exploit this flaw, the following conditions need to be present.

  • Both the client and the server must be vulnerable. All versions of OpenSSL are vulnerable on the client side. Only 1.0.1 and above are currently known to be vulnerable on the server side. If either the client or the server is fixed, it is not feasible to perform this attack.
  • A man-in-the-middle (MITM) attacker: an attacker capable of intercepting and modifying packets on the wire. A decade ago, this attack vector seemed almost impossible for anyone but Internet Service Providers, as they had access to all the network devices through which most of the traffic on the internet passed.

However, with the prevalence of public wireless access points, easily available at cafes and restaurants, and even free internet access provided by some cities, MITM attacks are now practical. Additionally, a variety of software is available that can fake access points. Once clients connect to the fake AP, an attacker can act as a MITM for the client’s traffic. A successful MITM attack may disclose authentication credentials or sensitive information, or give the attacker the ability to impersonate the victim.

How does this attack work?

SSL/TLS sessions are initiated with the ClientHello and ServerHello handshake messages sent from the respective sides. This part of the protocol is used to negotiate the attributes of the session, such as the protocol version, encryption algorithms, encryption keys, Message Authentication Code (MAC) secrets, and Initialization Vectors (IVs), as well as the supported extensions.

For various reasons, the client or the server may decide to modify the ciphering strategy of the connection during the handshake stage (not to be confused with the handshake protocol). This is achieved using the ChangeCipherSpec (CCS) message. The CCS consists of a single packet sent by both the client and the server to notify the peer that subsequent records will be protected under the newly negotiated CipherSpec and keys.

As per the standards (RFC 2246, RFC 5246), “The ChangeCipherSpec message is sent during the handshake after the security parameters have been agreed upon, but before the verifying Finished message is sent.” This, however, did not happen with OpenSSL: it accepted a CCS even before the security parameters were agreed upon. Accepting a CCS out of order desynchronizes the state between the two sides. Usually this should result in both sides effectively terminating the connection, unless another flaw is present.

In order to exploit this issue, a MITM attacker would effectively do the following:

  • Wait for a new TLS connection, followed by the ClientHello / ServerHello handshake messages.
  • Issue a CCS packet in both directions, which causes the OpenSSL code to use a zero-length premaster secret. The packet is sent to both ends of the connection. Session keys are derived from this zero-length premaster secret, and future session keys also share this weakness.
  • Renegotiate the handshake parameters.
  • The attacker is now able to decrypt or even modify the packets in transit.

OpenSSL patched this vulnerability by changing how it handles incoming CCS packets and zero-length premaster secret values. The patch ensures that it is no longer possible to use zero-length master keys, and that CCS packets cannot be received before the master key has been set.

What is the remedy?

The easiest solution is to ensure you are using the latest version of OpenSSL your distribution provides. Red Hat has issued security advisories for all of its affected products, and Fedora users should also be able to update their openssl packages to a patched version.

You will need to restart any services using OpenSSL that are not restarted automatically.
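
On RPM-based systems, one quick way to confirm you are running a patched build (the exact package NVR varies per product) is to update and then check the package changelog for the CVE:

$ yum update openssl
$ rpm -q --changelog openssl | grep CVE-2014-0224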

If you are a Red Hat customer, there is a tool available at https://access.redhat.com/labs/ccsinjectiontest/ that you can use to remotely verify that the latest patches have been applied and that your TLS server is responding correctly.

Additional information regarding specific Red Hat products affected by this issue can be found at https://access.redhat.com/site/articles/904433

June 04, 2014

pam_mkhomedir versus SELinux -- Use pam_oddjob_mkhomedir

SELinux is all about separation of powers and minimal (or at least reasonable) privileges.

If you can break a program into several separate applications, then you can use SELinux to control what each application is allowed to do. SELinux can then prevent a hacked application from doing more than expected.

The PAM stack was invented a long time ago to allow customization of the login process. One problem with the PAM stack is that it allowed programmers to slowly hack it up to give programs more and more access. I have seen PAM modules that do some crazy stuff.

Since we confine login applications with SELinux, we sometimes come into conflict with some of the more powerful PAM modules. We in the SELinux world want to control what login programs can do. For example, we want to stop login programs like sshd from reading and writing all content in your home directory.

Why is this important?

Over the years it has been shown that login programs have had bugs that led to information leakage without the user ever being able to log into the system.

One use case of PAM is creating a home directory the first time a user logs into a system. Usually colleges and universities use this for students logging into a shared service, but many companies use it as well.

man pam_mkhomedir
  The pam_mkhomedir PAM module will create a user's home directory if it does not exist when the session begins. This allows users to be present in a central database (such as NIS, Kerberos, or LDAP) without using a distributed file system or pre-creating a large number of directories. The skeleton directory (usually /etc/skel/) is used to copy default files, and a umask is also set for the creation.


This means that with pam_mkhomedir, login programs have to be allowed to create, read, and write all content in your home directory. We would have to allow sshd or xdm to read that content even if the user was never able to log in, meaning a bug in one of these applications could allow content to be read or modified without the attacker ever logging into the machine.

man pam_oddjob_mkhomedir
       The pam_oddjob_mkhomedir.so module checks if the user's home directory exists, and if it does not, it invokes the mkhomedirfor method of the com.redhat.oddjob_mkhomedir service for the PAM_USER if the module is running with superuser privileges. Otherwise, it invokes the mkmyhomedir method.
       The location of the skeleton directory and the default umask are determined by the configuration for the corresponding service in oddjobd-mkhomedir.conf, so they cannot be specified as arguments to this module.
       If D-Bus has not been configured to allow the calling application to invoke these methods provided as part of the com.redhat.oddjob_mkhomedir interface of the / object provided by the com.redhat.oddjob_mkhomedir service, then oddjobd will not receive the request and an error will be returned by D-Bus.


Nalin Dahyabhai wrote pam_oddjob_mkhomedir many years ago to separate the ability to create a home directory and all of its content out of the login programs. Basically, the PAM module sends a D-Bus request to the oddjob D-Bus service, which launches a helper to create the home directory and its content. SELinux policy is written to allow this helper to succeed, and we end up with much less access required for the login programs.

If you want the home directory created at login time when it does not exist, use pam_oddjob_mkhomedir instead of pam_mkhomedir.
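
On Fedora or RHEL, enabling it looks roughly like this (a sketch; authconfig wires up pam_oddjob_mkhomedir.so when the oddjob-mkhomedir package is installed):

yum install oddjob-mkhomedir
systemctl enable oddjobd.service
systemctl start oddjobd.service
authconfig --enablemkhomedir --update
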
New OpenLMI Web Site

I usually hate announcements of web site redesigns – “to enhance readability we have moved to Pretentious_Obscure_Font and changed the borders”…

But I think you will like what we’ve done at www.openlmi.org.

First, we’ve totally redone the site navigation. There are now three major components – an introduction and overview, OpenLMI for system administrators, and OpenLMI for developers.  As OpenLMI matures we will talk more about using it, as well as our ongoing focus on core technology development.

Second, we have moved to an adaptive template. The site is now much more usable on mobile devices – try it on your tablet and phone! There are a lot of changes under the hood that you don’t care about and I won’t bore you with.

Third, we are looking for feedback. Let us know how we can make the site even better. And if we’ve broken anything we want to know about it!


June 03, 2014

Keystone tox cheat sheet

While I grumbled when run_tests.sh was deprecated with just a terse message to go read the docs about tox, I have since switched over. Here is my quick tox transition tutorial.

To list the target environments:

tox -l

Currently this is:

py26
py27
py33
pep8
docs
sample_config

To run any one of these, you pass it to tox via the -e option:

To build just the docs

tox -edocs

PEP 8 check

tox -epep8

Run Unit tests for python2.7

tox -epy27

Test coverage

tox -e cover

Update config file

 tox -esample_config

Run any of them with the -r flag to recreate the tox virtual environment, but that takes a long time.

Each of the environments listed corresponds to a virtual environment under .tox. If you want to, say, run the keystone server that you have built, from the top directory in your keystone git sandbox:

. .tox/py27/bin/activate
.tox/py27/bin/keystone-all

Note that the same thing will work for a custom keystone client:

cd /opt/stack/python-keystoneclient
. .tox/py27/bin/activate
. ~/keystonerc #pronounced keystoner-see
.tox/py27/bin/keystone token-get

If the python33 run fails with the error

‘db type could not be determined’

you should remove the file

rm .testrepository/time.dbm

To just build the virtual environment for development without running the tests:

 tox -epy27 --notest -r
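
One more that I find handy: with the usual testr wiring you can pass a filter through tox to run just a subset of the unit tests (the module name here is only an example):

tox -epy27 -- keystone.tests.test_backend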

If you have more, let me know and I’ll update the post.

June 02, 2014

OpenLMI on YouTube

For your entertainment and amusement, Russell Doty and Stephen Gallagher have done a reprise of their Red Hat Summit presentation on OpenLMI and posted it to YouTube.

These videos answer some of the most common questions we hear about OpenLMI:

  • Why are you doing this?
  • What does it do for me?
  • How do I use it?
  • How does it fit with the rest of the system?

Four videos are available on the TechPonder channel:

Intro to OpenLMI Part 1

Intro to OpenLMI Part 2: Architecture

Intro to OpenLMI Part 3: Examples

Intro to OpenLMI Part 4: Elephants and Conclusions