Fedora summer-coding Planet

The Matrix Resurrections - My review

Posted by Mo Morsi on December 30, 2021 03:31 PM

It's been over a year since I updated this blog... how the time flies! I've been meaning to write but just haven't had the time: Dev Null Productions' latest project, Ledger City, is gaining some serious momentum and it's pedal to the metal when it comes to development. I did take a few hours to see The Matrix Resurrections last night, and the more I reflect on it, the more I realize the genius of the movie. Going into it I set the bar low on expectations, which helped, but the movie didn't just surpass that bar, it did so immensely. Below is my review.

MAJOR SPOILERS BELOW, IF YOU HAVEN'T SEEN IT, STOP READING

I was always a huge Matrix fan. The original came out in 1999, my sophomore year in high school. By senior year, when we were answering survey questions for the yearbook, I answered "What is your favourite movie?" with "The Matrix". Contrast this to the most popular choice, "American Pie 2"...

The Matrix series is a complex one, working on many levels. By blending eastern philosophy with western action, it was a groundbreaking experience. The first movie, obviously the best, introduced the unwashed masses to new concepts that until then were only contained in books too long and boring to read. Ironically, the most critical point of the whole series is presented in an extremely quick shot at the very beginning of the first movie; we'll get to this later.

The Matrix tells the story of human rebels who have broken out of an artificial world which sentient AI created to imprison humans and harness their energy. Putting the reality of thermodynamics and entropy aside, throughout the series we follow these rebels, particularly the main character Neo, as they fight to free humanity from The Matrix and take down the system. While at points it can be seen as over the top, it's a refreshing experience for those accustomed to the endless drivel of brainless action movies, super hero movies, rom-coms, and more.


The overall theme of the four matrix movies can be summarized with the following:

  • The Matrix: Free will vs determinism
  • The Matrix Reloaded: Fallacy of free will, no matter what we choose our paths are fixed
  • The Matrix Revolutions: Reincarnation, everything new is old and everything old is new, the cycle continues
  • The Matrix Resurrections: All of that is silly; what really matters is emotion and feeling, à la the human experience

And this brings us to the fourth movie. It was very different from the previous films; it had an almost comedic, light tone to it, and this was done on purpose. It had to be satirical; the beginning summarized it perfectly: 'Matrix' culture has become too serious: 'oh no we are all living in the matrix', 'you need to red pill', 'matrix means this, that, blah blah'. Any attempt at producing a "serious" extension to the canon would have been impossible to pull off. The reviews would have been scathing and the series would have been ruined. Lana Wachowski (Lily did not participate in the fourth Matrix movie) did a great job of poking fun at Warner Brothers, the Matrix fanbase, and even herself as a director without it being too offensive. It almost felt like an episode of South Park.

"At least it doesn't get all preachy and up its own a$$"

The movie was a continuation of the series in every sense of the word, but it almost felt like a fresh reboot, a feat that is difficult to pull off. At the end of The Matrix Revolutions we see the "dawn of a new day": the dreary, oppressive green hue of the matrix is replaced with a bright and colorful sunrise, implying the future will be bright and optimistic. The Matrix Resurrections continues this, but also takes it to the extreme. Neo, who is again trapped in The Matrix, continues his dull life, every day seemingly the same as the one before it, and there is an aura of depression. I find this very telling of the modern world and modern mannerisms. Even though life has never been better at any point in history - most humans have access to food and fresh water, many conveniences, and a plethora of entertainment options (to keep us plugged into the matrix) - something is missing. It's almost as if all of this comes at the expense of the human spirit, which fights against being constrained (we're building up to Nietzsche...).

My initial reaction to Neil Patrick Harris as the new Architect, and to the new actor playing Agent Smith, was negative. I could not shake the phenomenal portrayals in the original trilogy; every time Smith spoke I couldn't help but think of Hugo Weaving saying "Mr. Anderson!". But given the tone described above, it's apparent that these choices had to be made. As NPH says in Resurrections (paraphrasing), "my predecessor was too logical, ones-and-zeros, formulas, etc. I realized the key to the human experience was emotion and feeling". In the same scene, he conveys what is perhaps the ultimate point of the fourth movie: "humans generate the most output when they are being emotional and full of feeling, continuously seeking what they don't have, while trying not to lose that which they do". I'm sure every artist would agree!

Agent Smith, on the other hand, had become full-blown human in all but tangibility, finally embracing the notion that human feelings are not to be shunned. He almost takes satirical glee in his new outlook on life, while at the same time taking pleasure in the continuing animosity with his adversary ("Tom", as he now calls him) due to their conflicting goals.

The references to the original series were powerful. When Bugs rolled up her sleeve revealing the white rabbit, we felt a sense of peace in Neo; the conflict that dominated his psyche through the first hour of the movie was resolved, and the symbol allowed him to center his being and follow Bugs to his ultimate freedom. Sati's appearance towards the end of the movie had a similar calming effect; the character served as a guide through the madness, a path through which clarity could be attained and purpose revealed.

Which brings us back to the beginning. Go back and watch the first 15 minutes of the first Matrix movie, particularly the scene in which Neo is introduced (the original "White Rabbit" scene). There Neo is woken from his quest to find the truth by someone (later revealed to be Trinity) hacking into his computer. Shortly after, he is startled by a knock at his door. Choi, a client, seeks to buy contraband software from Neo. Neo takes the money and goes to retrieve the disk. The critical shot is of the book he stores the disk in, "Simulacra and Simulation", and even more particularly the chapter he opens to: "On Nihilism".

Nihilism is the school of philosophy whose progenitor was Friedrich Nietzsche, in which there is no intrinsic meaning; it is up to each and every individual human to "fight the good fight" against this cold unrelenting universe, to impose our will and manifest our spirit. There are no easy answers, no external stimulus or answer that is "right". It is up to every single person to manifest their own being and define their reality, ignoring all external criticism and praise. There will be no point at which the battle is "won", no predetermined path you can follow, no day on which you will wake up and rest. The soul is meant to fight against the insurmountable force of the universe which seeks to suppress it.

Unfortunately it's impossible to sell movie tickets on this premise (Baudrillard, the author of "Simulacra and Simulation", was critical of the first Matrix movie, essentially calling it "typical Hollywood hogwash"). Movies are meant to be relatable, presenting characters and situations onto which the audience can transpose their psyche. When the characters experience highs and lows, good and bad, we feel the same. By nature, they guide us on a path of development and progression, with an ultimate resolution, good or bad. As Smith says in The Matrix Revolutions, they are "as artificial as the Matrix itself".

And that's it! Overall, The Matrix Resurrections was a great movie. Not perfect: the action was not on par with the previous films (though I suspect this was done on purpose for the reasons mentioned above), and the pacing was inconsistent (slow in the beginning, rushed at the end). But it felt like a true sequel while providing a fresh and entertaining experience at the same time.

Remember... there is no spoon!

Transactional Operations in Rust

Posted by William Brown on November 13, 2021 02:00 PM

Transactional Operations in Rust

Earlier I was chatting to Yoshua, the author of this async cancellation blog, about the section on halt-safety. The blog is a great read, so I highly recommend it! The section on halt-safety is bang-on correct too, but I wanted to expand on this topic beyond what they have written.

Memory Safety vs Application Safety

Yoshua provides the following code example in their blog:

// Regardless of where in the function we stop execution, destructors will be
// run and resources will be cleaned up.
async fn do_something(path: PathBuf) -> io::Result<Output> {
                                        // 1. the future is not guaranteed to progress after instantiation
    let file = fs::open(&path).await?;  // 2. `.await` and 3. `?` can cause the function to halt
    let res = parse(file).await;        // 4. `.await` can cause the function to halt
    res                                 // 5. execution has finished, return a value
}

In the example, we can see that at each await point the async behaviour could cause the function to return. This would be similar to the non-async code of:

fn do_something(path: PathBuf) -> io::Result<Output> {
    let file = fs::open(&path)?;  // 1. `?` will return an Err if present
    let res = parse(file);        //
    res                           // 2. res may be an Err at this point.
}

In this example we can see that either cancellation or an Err condition could cause our function to return, regardless of whether it is async or not. Since there are no side-effects here it’s not a big deal, but let’s consider a different example that does have side-effects:

fn do_something(path: PathBuf, files_read_counter: &Mutex<u64>) -> io::Result<Output> {
    let mut guard = files_read_counter.lock();
    let file = fs::open(&path)?;  // 1. `?` will return an Err if present
    *guard += 1;                  //
    let res = parse(file);        //
    res                           // 2. res may be an Err at this point.
}

This is a nonsensical example, but it illustrates the point. The files-read counter is incremented before we know that the operation succeeded. Even though this is memory safe, it has created an inconsistent data point that does not reflect the true state. Looking at this example it’s trivial to resolve - relocate the counter increment, as in the sketch below.
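As a rough sketch of that fix (keeping the placeholder fs::open, parse and lock() calls from the example above), the shared counter is only touched once every fallible step has succeeded:

fn do_something(path: PathBuf, files_read_counter: &Mutex<u64>) -> io::Result<Output> {
    let file = fs::open(&path)?;       // 1. any Err returns before shared state is touched
    let res = parse(file)?;            // 2. the same applies to a parse failure
    *files_read_counter.lock() += 1;   // 3. increment only after everything has succeeded
    Ok(res)
}

But in a larger example it may not be as easy: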

// This is more pseudo-Rust than actual Rust, for simplicity's sake.
fn do_something(...) -> Result<..., ...> {
    let mut guard = map.lock();
    guard
        .iter_mut()
        .try_for_each(|(k, v)| {
            v.update(...)
        })
}

In our example we have a fallible value-update function operating inside our locked datastructure. It would be very easy to hit a situation where, while updating some values, an error is encountered partway through the set and an Err is returned. But what happens to the entries we did update? Since we return the Err here, the guard will be dropped and the lock released, meaning that we have only partially updated our map. This kind of behaviour can still be defended against by the programmer, but it requires us as humans to bear the cognitive load of ensuring our application behaves safely.
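As a sketch of what that manual defence can look like (using a concrete HashMap<String, u64> standing in for the pseudo-Rust map above), we can stage all of the fallible work first, and only touch the locked map once nothing can fail any more:

use std::collections::HashMap;
use std::sync::Mutex;

fn increment_all(map: &Mutex<HashMap<String, u64>>) -> Result<(), String> {
    let mut guard = map.lock().unwrap();
    // Phase 1: fallible computation against copies - shared state is untouched.
    let updates: Vec<(String, u64)> = guard
        .iter()
        .map(|(k, v)| {
            v.checked_add(1)
                .map(|nv| (k.clone(), nv))
                .ok_or_else(|| format!("overflow updating {}", k))
        })
        .collect::<Result<_, _>>()?;
    // Phase 2: infallible application of the staged values.
    for (k, nv) in updates {
        guard.insert(k, nv);
    }
    Ok(())
}

This works, but every function that touches shared state has to be structured this carefully by hand - which is exactly the burden that databases set out to remove.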

Databases

Databases have confronted this problem for many decades now, and a key logical approach is ACID compliance:

  • Atomicity - each operation is a single unit that fails or succeeds together
  • Consistency - between each unit, the data always moves from a valid state to another valid state
  • Isolation - multiple concurrent operations should behave as though they are executed in serial
  • Durability - the success of a unit is persisted even in the event of future errors, such as power loss

For software, we tend to care more for ACI in this example, but of course if we are writing a database in Rust, it would be important to consider D.

When we look at our examples from before, these both fail the atomicity and consistency checks (but they are correctly isolated due to the mutex which enforces serialisation).

ACID in Software

If we treat a top-level function call as our outer operation, and the inner functions as the units comprising that operation, then we can start to look at function calls as a transactional entity, where the call to a single operation either succeeds or fails, and the functions within it are unsafe (aka spicy 🌶 ) because they can create inconsistent states. We want to write our functions in a way that spicy functions can only be contained within operations, creating an environment where the full operation either succeeds or fails, and ensuring that consistency is maintained.

An approach that can be used is software transactional memory. There are multiple ways to structure this, but copy-on-write is a common technique to achieve this. An example of a copy-on-write cell type is in concread. This type allows for ACI (but not D) compliance.
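To make the idea concrete, here is a minimal, self-contained sketch of a copy-on-write cell. This is not concread's actual API, and unlike concread it does not serialise writers - it only shows the core mechanism: writers mutate a private clone of the data and publish it with an explicit commit, while dropping the transaction without committing discards the changes.

use std::sync::{Arc, Mutex};

struct CowCell<T: Clone> {
    active: Mutex<Arc<T>>,
}

struct WriteTxn<'a, T: Clone> {
    cell: &'a CowCell<T>,
    work: T,
}

impl<T: Clone> CowCell<T> {
    fn new(value: T) -> Self {
        CowCell { active: Mutex::new(Arc::new(value)) }
    }

    // Readers take a cheap snapshot that in-flight writes can never disturb.
    fn read(&self) -> Arc<T> {
        Arc::clone(&self.active.lock().unwrap())
    }

    // Writers work on a private clone of the current value.
    fn write(&self) -> WriteTxn<'_, T> {
        let work = (**self.active.lock().unwrap()).clone();
        WriteTxn { cell: self, work }
    }
}

impl<T: Clone> WriteTxn<'_, T> {
    // Commit consumes the transaction and atomically publishes the new value.
    fn commit(self) {
        let mut active = self.cell.active.lock().unwrap();
        *active = Arc::new(self.work);
    }
}

fn main() {
    let cell = CowCell::new(vec![1, 2, 3]);
    let snapshot = cell.read();

    let mut txn = cell.write();
    txn.work.push(4);
    // If an Err occurred here and we returned early, `txn` would be dropped,
    // the clone discarded, and no reader would ever see a partial update.
    txn.commit();

    assert_eq!(*snapshot, vec![1, 2, 3]);       // earlier readers keep their view
    assert_eq!(*cell.read(), vec![1, 2, 3, 4]); // new readers see the committed value
}

concread's real cell types follow the same shape, with the additional guarantee that writes are serialised.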

Due to the design of the concread type, we can separate the functions that acquire the guard (operations) from the functions that comprise the operation, as the latter are passed a transaction that is already in progress. For example:

// This is more pseudo-Rust than actual Rust, for simplicity's sake.
fn update_map(write_txn: &mut WriteTxn<Map<..., ...>>) -> Result<..., ...> {
    write_txn
        .iter_mut()
        .try_for_each(|(k, v)| {
            v.update(...)
        })
}

fn do_something(...) -> Result<..., ...> {
    let mut write_txn = data.write();
    let res = update_map(&mut write_txn)?;
    write_txn.commit();
    Ok(res)
}

Here we can already see a difference in our approach. We know that for update_map to be called we must be within a transaction - we can not “hold it wrong”, and the compiler checks this for us. We also invert the meaning of dropping the write_txn guard: rather than an “implicit commit”, a drop is now a rollback operation. The commit only occurs explicitly, and takes ownership of the write_txn, preventing it from being used any further without a new transaction. As a result, in our example, if update_map were to fail we would implicitly roll back our data.

Another benefit in this example is async, thread and concurrency safety. While the write_txn is held, no other writes can proceed (they are serialised). Readers are also isolated and guaranteed that their data will not change for the duration of that operation (until a new read is acquired). Even in our async examples, we would be able to correctly roll back during an async cancellation or error condition.

Future Work

At the moment the copy-on-write structures in concread can only protect single datastructures, so for more complex data types you end up with a struct containing many transactional cow types. There is some work going on to allow the creation of a manager that can allow arbitrary structures of multiple datatypes to be protected under a single transaction manager; however, this work is extremely unsafe due to the potential for memory safety violations with incorrect construction of the structures. For more details see the concread internals, concread linear cowcell and concread impl lincowcell.

Conclusion

Within async and sync programming, we can have cancellations or errors at any time - ensuring our applications stay consistent in the face of errors, which will happen, is challenging. By treating our internal APIs as a transactional interface and applying database techniques, we can create systems that are “always consistent”. It is possible to create these interfaces in a way that the Rust compiler supports us through its type system, ensuring we use the correct transactional interfaces as we write our programs - helping us move from just memory safety to broader application safety.

Results from the OpenSUSE 2021 Rust Survey

Posted by William Brown on October 07, 2021 02:00 PM

Results from the OpenSUSE 2021 Rust Survey

From September the 8th to October the 7th, OpenSUSE has helped me host a survey on how developers are using Rust in their environments. As the maintainer of the Rust packages in SUSE and OpenSUSE it was important for me to get a better understanding of how people are using Rust so that we can make decisions that match how the community is working.

First, to every single one of the 1360 people who responded to this survey, thank you! This exceeded my expectations and it means a lot to have had so many people take the time to help with this.

All the data can be found here

What did you want to answer?

I had assistance from a psychology researcher at a local university to construct the survey, and her help guided the structure and many of the questions. An important element of this was that the questions shouldn’t influence people towards a certain answer, so they were built to elicit a fair response without leading people into a particular outcome or response pattern. As a result, it’s likely that the reasons for the survey were not obvious to the participants.

What we wanted to determine from this survey:

  • How are developers installing rust toolchains so that we can attract them to OpenSUSE by reducing friction?
  • In what ways are people using distribution rust packages in their environments (contrast to rustup)?
  • Should our rust package include developer facing tools, or is it just another component of a build pipeline?
  • When people create or distribute rust software, how are they managing their dependencies, and do we need to provide tools to assist?
  • Based on the above, how can we make it easier for people to distribute rust software in packages as a distribution?
  • How do developers manage security issues in rust libraries, and how can this be integrated to reduce packaging friction?

Let's get to the data

As mentioned there were 1360 responses. Questions were broken into three broad categories.

  • Attitude
  • Developers
  • Distributors

Attitude

This section was intended to be a gentle introduction to the survey, rather than answering any specific question. This section had 413 non-answers, which I will exclude for now.

We asked three questions:

  • Rust is important to my work or projects (1 disagree - 5 agree)
  • Rust will become more important in my work or projects in the future.  (1 disagree - 5 agree)
  • Rust will become more important to other developers and projects in the future (1 disagree - 5 agree)

From this there is strong support that rust is important to individuals today. It’s likely this is biased, as the survey was distributed mainly in rust communities; however, we still had 202 responses that were less than 3. Once we look at the future questions we see a strong belief that rust will become more important. Again this is likely to be biased due to the communities the survey was distributed within, but we still see small numbers of people responding that rust will not be important to others or themselves in the future.

As this section was not intended to answer any questions, I have chosen not to use the responses of this section in other areas of the analysis.

Developers

This section was designed to help answer the following questions:

  • How are people installing rust toolchains so that we can attract them to OpenSUSE by reducing friction?
  • In what ways are people using distribution rust packages in their environments (contrast to rustup)?
  • Should our rust package include developer facing tools, or is it just another component of a build pipeline?

We asked the following questions:

  • As a developer, I use Rust on the following platforms while programming.
  • On your primary development platform, how did you install your Rust toolchain?
  • The following features or tools are important in my development environment (do not use 1 - use a lot 5)
    • Integrated Development Environments with Language Features (syntax highlighting, errors, completion, type checking)
    • Debugging tools (lldb, gdb)
    • Online Documentation (doc.rust-lang.org, docs.rs)
    • Offline Documentation (local)
    • Build Caching (sccache)

Generally we wanted to know what platforms people were using, so that we could establish what people on linux use today versus other platforms, and with that knowledge make decisions about how to proceed.


There were 751 people who responded that they were a developer in this section. We can see Linux is the most popular platform used while programming, but for “Linux only” (derived by selecting responses that only chose Linux and no other platforms) this number is about equal to Mac and Windows. Given the prevalence of containers and other online linux environments it would make sense that developers access multiple platforms from their preferred OS, which is why there are many responses that selected multiple platforms for their work.


From the next question we see overwhelming support for rustup as the preferred method to install rust on most developer machines. As we did not ask “why”, we can only speculate on the reasons for this decision.


When we isolate this to “Linux only”, we see a slight proportional increase in package-manager-installed rust environments, but there remains a strong tendency for rustup to be the preferred method of installation.

This may indicate that even within Linux distros with their package-manager capabilities, and even with distributions trying to provide rapid rust toolchain updates, developers still prefer to use rust from rustup. Again, we can only speculate as to why this is, but it already starts to highlight that distribution-packaged rust is unlikely to be used as a developer-facing tool.


Once we start to look at the features of rust that developers rely on, we see a very interesting distribution. I have not included all charts here. Some features are strongly used (IDE rls, online docs) while others seem to be more distributed in attitude (debuggers, offline docs, build caching). When we filter the strongly supported features by linux users using distribution-packaged rust, we see a similar (but not as strong) trend for the importance of IDE features. The other features like debuggers, offline docs and build caching all remain very distributed. This shows that tools like rls for IDE integration are very important, but with only a small number of developers using packaged rust versus rustup, it may not be an important area to support with limited packaging resources and time. It’s very likely that developers on other distributions, macOS or Windows are more comfortable with a rustup-based installation process.

Distributors

This section was designed to help answer the following questions:

  • Should our rust package include developer facing tools, or is it just another component of a build pipeline?
  • When people create or distribute rust software, how are they managing their dependencies, and do we need to provide tools to assist?
  • Based on the above, how can we make it easier for people to distribute rust software in packages as a distribution?
  • How do developers manage security issues in rust libraries, and how can this be integrated to reduce packaging friction?

We asked the following questions:

  • Which platforms (operating systems) do you target for Rust software
  • How do you or your team/community build or provide Rust software for people to use?
  • In your release process, how do you manage your Rust dependencies?
  • In your ideal workflow, how would you prefer to manage your Rust dependencies?
  • How do you manage security updates in your Rust dependencies?

Our first question here really shows the popularity of Linux as a target platform for running rust, with 570 out of 618 responses indicating they target Linux.


Once we look at the distribution methods, both building projects to packages and using distribution-packaged rust in containers fall well behind the use of rustup in containers and locally installed rust tools. However, if we observe container-packaged rust and packaged rust binaries (which likely use the distro rust toolchains), we have 205 uses of the rust package out of 1280 uses, where we see 59 out of 680 from developers. This does indicate a tendency for the rust package in a distribution to be used in build pipelines more than for developer use - but rustup still remains most popular. I would speculate that this is because developers want to recreate the same process on their development systems as on their target systems, which would likely involve rustup as the method to ensure identical toolchains are installed.

The next questions were focused on rust dependencies - as a statically linked language, rust changes the approach to how libraries can be managed. To answer how we as a distribution should support people in the way they want to manage libraries, we need to know how they use them today, and how they would ideally prefer to manage them in the future.


In both the current and ideal processes we see a large tendency towards online library use from crates.io, and in both cases vendoring (pre-downloading) comes in second place. Between the current process and the ideal process, we see a small reduction in online library use in favour of the other options. As a distribution, since we can not provide online access to crates, we can safely assume most online crates users would move to vendoring if they had to work offline for packaging, as it’s the most similar process available.


We can also look at some other relationships here. People who provide packages still tend to ideally prefer online crates usage, with distribution libraries coming in second place here. There is still significant momentum for packagers to want to use vendoring or online dependencies though. When we look at ideal management strategies for container builds, we see distribution packages being much less popular, and online libraries still remaining at the top.


Finally, when we look at how developers are managing their security updates, we see a really healthy statistic: many people are using tools like cargo audit and cargo outdated to proactively update their dependencies. Very few people rely on distribution packages for their updates, however. But we still see 126 responses from users who aren’t actively following security issues, which again highlights a need for distributions that do provide rust-packaged software to be proactive in detecting issues that may exist.

Outcomes

By now we have looked at a lot of the survey and the results, so it’s time to answer our questions.

  • How are people installing rust toolchains so that we can attract them to OpenSUSE by reducing friction?

Developers prefer rustup over all other sources. Since it is what’s used on linux and other platforms alike, we should consider packaging and distributing rustup to give options to users (who may wish to avoid the curl | sh method). I’ve already started the process to include this in OpenSUSE Tumbleweed.

  • In what ways are people using distribution rust packages in their environments (contrast to rustup)?
  • Should our rust package include developer facing tools, or is it just another component of a build pipeline?

Generally developers tend strongly towards rustup for their toolchains, while distribution rust seems to be used more in build pipelines. Given the emphasis on online docs and rustup, we can likely remove offline documentation and rls from the distribution packages, as they are either not being used or have very few users, and are not worth the distribution support cost and maintainer time. We would likely be better off encouraging users to use rustup for developer-facing needs instead.

To aid this argument, it appears that rls updates have not been functioning in OpenSUSE Tumbleweed for a few weeks due to a packaging mistake, and no one has reported the issue - this means that the “scream test” failed. The lack of people noticing this again shows developer tools are not where our focus should be.

  • When people create or distribute rust software, how are they managing their dependencies, and do we need to provide tools to assist?
  • Based on the above, how can we make it easier for people to distribute rust software in packages as a distribution?

Distributors prefer cargo and its native tools, and this is likely an artifact of the tight-knit tooling that exists in the rust community. Other options don’t seem to have made a lot of headway, and even within distribution packaging, where you might expect a stronger desire for packaged libraries, we see a high level of support for using cargo directly to manage rust dependencies. I think this shows that efforts to package rust crates have not been effective at attracting developers who are currently used to a very different workflow.

  • How do developers manage security issues in rust libraries, and how can this be integrated to reduce packaging friction?

Here we see that many people are proactive in updating their libraries, but there are still many who don’t actively manage this. As a result, automating tools like cargo audit inside build pipelines will likely help packagers, and also matches their existing, known tools. Given that many people will be performing frequent updates of their libraries or upstream releases, we’ll also need to ensure that the process to update and commit updates to packages is either fully automated or requires as little hands-on contact as possible. Combined with the majority of developers and distributors preferring online crates for dependencies, encouraging people to secure these existing workflows will likely be a smaller step for them. Since rust is statically linked, we can also target our security efforts at leaf (consuming) packages rather than the libraries themselves.

Closing

Again, thank you to everyone who answered the survey. It’s now time for me to go and start to do some work based on this data!

Gnome 3 compared to MacOS

Posted by William Brown on September 11, 2021 02:00 PM

Gnome 3 compared to MacOS

An assertion I have made in the past is that to me “Gnome 3 feels like MacOS with rough edges”. After some discussions with others, I’m finally going to write this up with examples.

It’s worth pointing out that in my opinion, Gnome 3 is probably still the best desktop experience on Linux today for a variety of reasons - it’s just that these rough edges really take away from it being a good experience for me.

High Level Structure Comparison

Here’s a pair of screenshots of MacOS 11.5.2 and Gnome 40.4. In both we have the settings menu open of the respective environment. Both are set to the resolution of 1680x1050, with the Mac using scaling from retina (2880x1800) to this size.

[Screenshots: gnome-settings-1.png, macos-settings-1.png]

From this view, we can already make some observations. Both of these have a really similar structure which when we look at appears like this:

[Diagram: skeleton.png]

The skeleton overall looks really similar, if not identical. We have a top bar that provides a system tray and status and a system context in the top left, as well as application context.

Now we can look at some of the details of each of the platforms at a high level from this skeleton.

We can see on the Mac that the “top menu bar” takes 2.6% of our vertical screen real-estate. Our system context is provided by the small Apple logo in the top left that opens to a menu of various platform options.

Next to that, we can see that our system preferences uses the top menu bar to provide our application context menus like edit, view, window and help. Further, on the right side of this we have a series of icons for our system - some of these from third party applications like nextcloud, and others coming from macos showing our backup status, keyboard, audio, battery, wifi, time and more. This uses the space at the top of our screen really effectively: it doesn’t feel wasted, and adds context to what we are doing.

If we now look at Gnome we see a different view. Our menu bar takes 3.5% of our vertical screen real estate, and the dark colour already feels like it is “dominating” visually. Within it we have very little effective horizontal space use. The activities button (system context) takes us to our overview screen, and selecting the “settings” item, which is our current application, displays no response or menu.

The system tray doesn’t allow 3rd party applications, and the overview only shows our network and audio status and our clock (battery may be displayed on a laptop). Finding more context about our system requires interaction with the single component at the top right, limiting our ability to interact with a specific element (network, audio, etc.) or understand our system's state quickly.

Already we can start to see some differences here.

  • UI elements in MacOS are smaller and consume less screen space.
  • Large amounts of non-functional dead space in Gnome
  • Elements are visually more apparent and able to be seen at a high level, where Gnome’s require interaction to find details

System Preferences vs Settings

Let’s compare the system preferences and Settings now. These are still similar, but not as close as our overall skeleton and this is where we start to see more about the different approaches to design in each.

The MacOS system preferences has all of its top-level options displayed in a grid, with an easily accessible search function and forward and back navigation aids. This makes it easy to find the relevant area, and everything is immediately accessible and clear. Searching for items dims the application and begins to highlight elements that contain the relevant topic, helping to guide you to the location and establishing for the user where they can go in the future without the need to search. Inside any menu of the system preferences, search is always accessible and in the same consistent location in the application.

[Screenshot: macos-settings-search.png]

When we look at Gnome’s settings application, we see that not all available settings are displayed - the gutter column on the left is a scrollable UI element, but with no scroll bars present a user could easily miss that this functionality exists. Items like “Applications” which have a “>” confusingly change the gutter context to a list of applications when selected, rather than remaining at the top level like all other items that don’t have the “>”. Breaking the user's idea of consistency, when in these sub-gutters the search icon is replaced with the “back” navigation icon, meaning you can not search while in a sub-gutter.

Finally, even visually we can see that the Settings window is physically larger, with much larger fonts and a title bar containing much more dead space. The search icon (when present) requires interaction before the search text area appears, adding extra clicks and interactions to achieve the task.

When we do search, the results replace the contents of the gutter element. Screen lock here is actually in a sub-gutter menu for privacy, and not discoverable at the top level as an element. The use of nested gutters here adds confusion about where items are, due to all the gutter content changes.

[Screenshot: gnome-settings-search.png]

Again we are starting to see differences here:

  • MacOS search uses greater visual feedback to help guide users to where they need to be
  • Gnome hides many options in sub-menus, or with very few graphical guides which hinders discovery of items
  • Again, the use of dead space in Gnome vs the greater use of space in MacOS
  • Gnome requires more interactions to “get around” in general
  • Gnome applications visually are larger and take up more space of the screen
  • Gnome changes the UI and layout in subtle and inconsistent ways that rely on contextual knowledge of “where” you currently are in the application

Context Menus

Let's have a look at some of the menus that exist in the system tray area now. For now I’ll focus on audio, but these differences broadly apply to all of the various items here on MacOS and Gnome.

On MacOS when we select our audio icon in the system tray, we are presented with a menu that contains the current volume, the current audio output device (including options for network streaming) and a link to the system preferences control panel for further audio settings that may exist. We aren’t overwhelmed with settings or choices, but we do have the ability to change our common options and shortcut links to get to the extended settings if needed.

[Screenshot: macos-audio-1.png]

A common trick in MacOS though is holding the option key during interactions. Often this can display power-user or extended capabilities. When done on the audio menu, we are also able to then control our input device selection.

[Screenshot: macos-audio-2.png]

On Gnome, in the system tray there is only a single element, that controls audio, power, network and more.

[Screenshot: gnome-audio-1.png]

All we can do in this menu is control the volume - that’s it. There are no links to direct audio settings or device management, and there are no “hidden” shortcuts (like option) that allow greater context or control.

To summarise our differences:

  • MacOS provides topic-specific system tray menus, with greater functionality and links to further settings
  • Gnome has a combined menu, that is limited in functionality, and has only a generic link to settings
  • Gnome lacks the ability to gain extended options for power-users to view extra settings or details

File Browser

Finally, let's look at the file browser. For fairness, I’ve changed Gnome’s default layout to “list” to match my own usage in Finder.

[Screenshot: macos-files-1.png]

We can already see a number of useful elements here. We have the ability to “tree” folders through the “>” icon, and rows of the browser alternate white/grey to help us visually identify lines horizontally. The rows are small, fitting (in this screenshot) 16 rows of content on the screen simultaneously. Finally, though not shown here, MacOS Finder can use tabs for browsing different locations. And as before, we have our application context menu in the top bar with a large number of actions available.

[Screenshot: gnome-files-1.png]

Gnome's rows are all white with extremely faint grey lines to delineate them, making it hard to horizontally track items if the window were expanded. The icons are larger, and there is no ability to tree the files and folders. We can only see ~10 rows on screen despite the similar size of the windows presented here. Finally, the extended options are hidden in the “burger” menu next to the application close button.

A theme should be apparent here:

  • Both MacOS and Gnome share a very similar skeleton of how this application is laid out
  • MacOS makes better use of visual elements to help your eye track across spaces to make connections
  • Gnome has a lot of dead space still and larger text and icons which takes greater amounts of screen space
  • Due to the application context and other higher level items, MacOS is “faster” to get to where you need to go

Keyboard Shortcuts

Keyboard shortcuts are something that aid power users in achieving tasks quicker, but the challenge is often finding what shortcuts exist so you can use them. Let's look at how MacOS and Gnome solve this.

[Screenshot: macos-shortcut-1.png]

Here in MacOS, anytime we open a menu, we can see the shortcut listed next to the menu item that is present, including disabled items (that are dimmed). Each shortcut’s symbols match the symbols of the keyboard allowing these to be cross-language and accessible. And since we are in a menu, we remain in the context of our Application and able to then immediately use the menu or shortcut.

In fact, even if we select the help menu and search a new topic, rather than taking us away from the menus, MacOS opens the menu and points us to where we are trying to go, allowing us to find the action we want and learn its shortcut!

[Screenshot: macos-shortcut-2.png]

This is great, because it means in the process of getting help, we are shown how to perform the action for future interactions. Because of the nature of MacOS human interface guidelines this pattern exists for all applications on the platform, including third party ones helping to improve accessibility of these features.

Gnome however takes a really different approach. Keyboard shortcuts are listed as a menu item from our burger menu.

[Screenshot: gnome-shortcut-1.png]

When we select it, our applications context is taken away and replaced with a dictionary of keyboard shortcuts, spread over three pages.

[Screenshot: gnome-shortcut-2.png]

I think the use of the keyboard icons here is excellent, but because we are now in a dictionary of shortcuts, it’s hard to find what we want to use, and we are “taken away” from the context of the actions we are trying to perform in our application. Again, we have to perform more interactions to find the information that we are looking for, and we aren’t able to easily link the action to the shortcut in this style of presentation. We can’t transfer our knowledge of the “menus” into a shortcut that we can use without going through a reference manual.

Another issue here is this becomes the responsibility of each application to create these references and provide them, rather than being an automatically inherited feature through the adherence to human interface guidelines.

Conclusion

Honestly, I could probably keep making these comparisons all day. Gnome 3 and MacOS really do feel very similar to me. From the style of keyboard shortcuts and layout of the UI, to the structure of its applications and even its approach to windowing, Gnome feels identical to MacOS. However, while it looks similar on a surface level, there are many rough edges, excess interactions, and poor uses of screen space and visual elements.

MacOS certainly has it’s flaws, and makes it’s mistakes. But from a ease of use perspective, it tries to get out of the way and show you how to use the computer for yourself. MacOS takes a back seat to the usage of the computer.

Gnome however feels like it wants to be front and centre. It needs you to know all the time “you’re using Gnome!”. It takes you on a small adventure tour to complete simple actions or to discover new things. It even feels like Gnome has tried to reduce “complexity” so much that they have thrown away many rich features and interactions that could make a computer easier to use and interact with.

So for me, this is why I feel that Gnome is like MacOS with rough edges. There are many small, subtle and frustrating user interactions like this all throughout the Gnome 3 experience that just aren’t present in MacOS.

StartTLS in LDAP

Posted by William Brown on August 11, 2021 02:00 PM

StartTLS in LDAP

LDAP as a protocol is a binary protocol which uses ASN.1 BER encoded structures to communicate between a client and server, to query directory information (ie users, groups, locations, etc).

When LDAP was created there was little consideration given to security with regard to person-in-the-middle attacks (aka mitm: meddler in the middle, interception). As LDAP has become used not just as a directory service for accessing information, but also as an authentication and authorisation system, it’s important that the content of these communications is secure from tampering or observation.

A number of methods have been introduced to try to assist with this situation. These are:

  • StartTLS
  • SASL with encryption layers
  • LDAPS (LDAP over TLS)

Other protocols of a similar age, such as SMTP and IMAP, have also used StartTLS. However, recent research has (again) shown issues with correct StartTLS handling, and recommends using SMTPS or IMAPS instead.

Today the same is true of LDAP - the only secure method of communication to an LDAP server is LDAPS. In this blog, I’ll be exploring the issues that exist with StartTLS (I will not cover SASL or GSSAPI).

How does StartTLS work?

StartTLS works by starting a plaintext (unencrypted) connection to the LDAP server, and then by upgrading that connection to begin TLS within the existing connection.

┌───────────┐                            ┌───────────┐
│           │                            │           │
│           │─────────open tcp 389──────▶│           │
│           │◀────────────ok─────────────│           │
│           │                            │           │
│           │                            │           │
│           │────────ldap starttls──────▶│           │
│           │◀──────────success──────────│           │
│           │                            │           │
│  Client   │                            │  Server   │
│           │──────tls client hello─────▶│           │
│           │◀─────tls server hello──────│           │
│           │────────tls key xchg───────▶│           │
│           │◀────────tls finish─────────│           │
│           │                            │           │
│           │──────TLS(ldap bind)───────▶│           │
│           │                            │           │
│           │                            │           │
└───────────┘                            └───────────┘

As we can see, in LDAP StartTLS we establish a valid plaintext tcp connection, and then we send an LDAP message containing a StartTLS extended operation. If successful, we begin a TLS handshake over the connection, and when complete, our traffic is encrypted.

This is in contrast to LDAPS, where TLS must be successfully established before the first LDAP message is exchanged.

It’s a good time to note that this is inefficient, as establishing StartTLS like this takes an extra round-trip compared to LDAPS, which increases latency for all communications. LDAP clients tend to open and close many connections, so this adds up quickly.

Security Issues

Client Misconfiguration

At the start of a connection, LDAP servers will only accept two LDAP messages: Bind (authenticate) and StartTLS. Since StartTLS starts with a plaintext connection, if a client is misconfigured it is trivial for it to operate without StartTLS.

For example, consider the following commands.

# ldapwhoami -H ldap://172.17.0.3:389 -x -D 'cn=Directory Manager' -W
Enter LDAP Password:
dn: cn=directory manager
# ldapwhoami -H ldap://172.17.0.3:389 -x -Z -D 'cn=Directory Manager' -W
Enter LDAP Password:
dn: cn=directory manager

Notice that both commands succeed and we authenticate. However, only in the second command are we using StartTLS, which means the first trivially leaked our password. Forcing LDAPS to be the only protocol prevents this, as every byte of the connection is always encrypted.

# ldapwhoami -H ldaps://172.17.0.3:636 -x -D 'cn=Directory Manager' -W
Enter LDAP Password:
dn: cn=directory manager

Simply put, this means that if you forget to add the command-line flag for StartTLS, forget the checkbox in an admin console, or make any other kind of possible human error (which happens!), then LDAP will silently continue without enforcing that StartTLS is present.

For a system to be secure we must prevent human error from being a factor by removing elements of risk in our systems.
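As a sketch of that principle (using hypothetical types, not any real LDAP library's API), a client can make the insecure option unrepresentable by only accepting ldaps:// URLs at construction time, rather than relying on a runtime flag being remembered:

#[derive(Debug)]
enum ConnError {
    InsecureScheme(String),
}

// A URL newtype that can only ever hold an ldaps:// address.
struct LdapsUrl(String);

impl LdapsUrl {
    fn parse(url: &str) -> Result<Self, ConnError> {
        if url.starts_with("ldaps://") {
            Ok(LdapsUrl(url.to_string()))
        } else {
            // Plaintext ldap:// (with or without StartTLS) is rejected up front.
            Err(ConnError::InsecureScheme(url.to_string()))
        }
    }
}

fn main() {
    assert!(LdapsUrl::parse("ldaps://172.17.0.3:636").is_ok());
    assert!(LdapsUrl::parse("ldap://172.17.0.3:389").is_err());
}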

MinSSF

A response to the above is to enforce MinSSF, or “Minimum Security Strength Factor”. This is an option on both OpenLDAP and 389-ds and is related to the integration of SASL. It requires that the bind method used provide “X bits” of security (though X is fairly arbitrary and not really representative of true security).

In the context of StartTLS or TLS, the provided SSF becomes the number of bits in the symmetric encryption used in the connection. Generally this is 128 due to the use of AES128.

Let us assume we have configured MinSSF=128 and we attempt to bind to our server.

┌───────────┐                            ┌───────────┐
│           │                            │           │
│           │─────────open tcp 389──────▶│           │
│           │◀────────────ok─────────────│           │
│  Client   │                            │  Server   │
│           │                            │           │
│           │──────────ldap bind────────▶│           │
│           │◀───────error - minssf──────│           │
│           │                            │           │
└───────────┘                            └───────────┘

The issue here is that the minssf isn’t enforced until the bind message is sent. If we look at the LDAP RFC we see:

BindRequest ::= [APPLICATION 0] SEQUENCE {
     version                 INTEGER (1 ..  127),
     name                    LDAPDN,
     authentication          AuthenticationChoice }

AuthenticationChoice ::= CHOICE {
     simple                  [0] OCTET STRING,
                             -- 1 and 2 reserved
     sasl                    [3] SaslCredentials,
     ...  }

SaslCredentials ::= SEQUENCE {
     mechanism               LDAPString,
     credentials             OCTET STRING OPTIONAL }

This means that in a simple bind (password), the very first message we send contains our plaintext password. MinSSF only rejects us after we have already made the mistake, so this is not a suitable defence.

StartTLS can be disregarded

An interesting aspect of how StartTLS works with LDAP is that it’s possible to prevent it from being installed successfully. If we look at the RFC:

If the server is otherwise unwilling or unable to perform this
operation, the server is to return an appropriate result code
indicating the nature of the problem.  For example, if the TLS
subsystem is not presently available, the server may indicate this by
returning with the resultCode set to unavailable.  In cases where a
non-success result code is returned, the LDAP session is left without
a TLS layer.

What this means is that it is up to the client, and how it responds to this error, to enforce correct behaviour. A client that disregards this error may proceed as follows:

┌───────────┐                            ┌───────────┐
│           │                            │           │
│           │─────────open tcp 389──────▶│           │
│           │◀────────────ok─────────────│           │
│           │                            │           │
│  Client   │                            │  Server   │
│           │────────ldap starttls──────▶│           │
│           │◀───────starttls error──────│           │
│           │                            │           │
│           │─────────ldap bind─────────▶│           │
│           │                            │           │
└───────────┘                            └───────────┘

In this example, the ldap bind proceeds even though TLS is not active, again leaking our password in plaintext. A classic example of this is OpenLDAP’s own cli tools, which in almost all StartTLS examples online use the option ‘-Z’ to enable it.

# ldapwhoami -Z -H ldap://127.0.0.1:12345 -D 'cn=Directory Manager' -w password
ldap_start_tls: Protocol error (2)
dn: cn=Directory Manager

The quirk is that ‘-Z’ here only means to try StartTLS. If you want to fail when it’s not available you need ‘-ZZ’. This is a pretty easy mistake for any administrator to make when typing a command. There is also no way to configure in ldap.conf that you always want StartTLS enforced, leaving it again to human error. Given that the primary users of the ldap cli are directory admins, this leaves a high-value credential open to potential human input error.

Within client applications a similar risk exists: the developers need to correctly enforce this behaviour. Thankfully for us, all the client applications that I tested handle this correctly:

  • SSSD
  • nslcd
  • ldapvi
  • python-ldap

However, I am sure there are many others that should be tested to ensure that they correctly handle errors during StartTLS.

Referral Injection

Referrals are a feature of LDAP that allow responses to include extra locations where a client may look for the data they requested, or to extend that data. Due to the design of LDAP and its response codes, referrals are valid in all response messages.

LDAP StartTLS does allow a referral as a valid response for the client to then follow - this may be due to the requested server being under maintenance or similar.

Depending on the client implementation, this may allow a mitm to proceed. There are two possible scenarios.

Assuming the client does do certificate validation, but is poorly coded, the following may occur:

┌───────────┐                            ┌───────────┐
│           │                            │           │
│           │─────────open tcp 389──────▶│           │
│           │◀────────────ok─────────────│           │
│           │                            │  Server   │
│           │                            │           │
│           │────────ldap starttls──────▶│           │
│           │◀──────────referral─────────│           │
│           │                            │           │
│           │                            └───────────┘
│  Client   │
│           │                            ┌───────────┐
│           │─────────ldap bind─────────▶│           │
│           │                            │           │
│           │                            │           │
│           │                            │ Malicious │
│           │                            │  Server   │
│           │                            │           │
│           │                            │           │
│           │                            │           │
└───────────┘                            └───────────┘

In this example our server sent a referral as a response to the StartTLS extended operation, which the client then followed - however, the client did not attempt to install StartTLS again when contacting the malicious server. This would allow a bypass of certificate validation by simply never letting TLS begin at all. Thankfully the clients I tested did not exhibit this behaviour, but it is possible.

If the client has configured certificate validation to never (tls_reqcert = never, which is a surprisingly common setting …) then the following is possible.

┌───────────┐                            ┌───────────┐
│           │                            │           │
│           │─────────open tcp 389──────▶│           │
│           │◀────────────ok─────────────│           │
│           │                            │  Server   │
│           │                            │           │
│           │────────ldap starttls──────▶│           │
│           │◀──────────referral─────────│           │
│           │                            │           │
│           │                            └───────────┘
│  Client   │
│           │                            ┌───────────┐
│           │────────ldap starttls──────▶│           │
│           │◀──────────success──────────│           │
│           │                            │           │
│           │◀──────TLS installed───────▶│ Malicious │
│           │                            │  Server   │
│           │───────TLS(ldap bind)──────▶│           │
│           │                            │           │
│           │                            │           │
└───────────┘                            └───────────┘

In this example the client follows the referral and then attempts to install StartTLS again. The malicious server may present any certificate it wishes and can then intercept traffic.

In my testing I found that this affected both SSSD and nslcd. However, when redirected to the malicious server, both of them attempted to install StartTLS over the existing StartTLS channel, which caused the server to return an error condition. Potentially a modified malicious server could negotiate two layers of TLS, or craft a response that successfully tricks these clients into divulging further information. I have not yet spent time researching this further.

Conclusion

While not as significant as the results found in “No Start TLS”, LDAP is still potentially exposed to risks related to StartTLS usage. To mitigate these, LDAP server operators should disable plaintext LDAP ports and exclusively use LDAPS, and clients should set tls_reqcert to “demand”.
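
As an illustration of a hardened client configuration, an SSSD LDAP domain along those lines might look like the following sketch (option names are from sssd-ldap; the domain, hostname and CA path are placeholders):

[domain/example.com]
id_provider = ldap
# Use LDAPS only - with no plaintext port 389 there is no StartTLS upgrade step to attack
ldap_uri = ldaps://ldap.example.com
# Always require a valid, verified certificate chain
ldap_tls_reqcert = demand
ldap_tls_cacert = /etc/openldap/certs/ca.crt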

Getting started with Yew

Posted by William Brown on June 19, 2021 02:00 PM

Getting started with Yew

Yew is a really nice framework for writing single-page applications in Rust, which are then compiled to wasm to run in the browser. It has helped make web development much more accessible to me, but getting started with it isn’t always straightforward.

This is the bare-minimum to get a “hello world” in your browser - from there you can build on that foundation to make many more interesting applications.

Dependencies

MacOS

  • Ensure that you have rust, which you can setup with RustUp.
  • Ensure that you have brew, which you can install from the Homebrew Project. This is used to install other tools.
  • Install wasm-pack. wasm-pack is what drives the rust to wasm build process.
cargo install wasm-pack
  • Install npm and rollup. npm is needed to install rollup, and rollup is what takes our wasm and javascript and bundles them together for our browser.
brew install npm
npm install --global rollup
  • Install miniserve for hosting our website locally during development.
brew install miniserve

A new project

We can now create a new rust project. Note we use --lib to indicate that it’s a library, not an executable.

cargo new --lib yewdemo

To start with we’ll need some boilerplate and helpers to get ourselves started.

index.html - our default page that will load our wasm to run. This is our “entrypoint” into the site that starts everything else off. In this case it loads our bundled javascript.

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>PROJECTNAME</title>
    <script src="/pkg/bundle.js" defer></script>
  </head>
  <body>
  </body>
</html>

main.js - this is our javascript entrypoint that we’ll be using. Remember to change PROJECTNAME to your crate name (ie yewdemo). This will be combined with our wasm to create the bundle.js file.

import init, { run_app } from './pkg/PROJECTNAME.js';
async function main() {
    await init('/pkg/PROJECTNAME_bg.wasm');
    run_app();
}
main();

Cargo.toml - we need to extend Cargo.toml with some dependencies and settings that allow the wasm to build, as well as our framework dependencies.

[lib]
crate-type = ["cdylib"]

[dependencies]
wasm-bindgen = "^0.2"
yew = "0.18"

build_wasm.sh - create this file to help us build our project. Remember to call chmod +x build_wasm.sh so that you can execute it later.

#!/bin/sh
wasm-pack build --target web && \
    rollup ./main.js --format iife --file ./pkg/bundle.js

src/lib.rs - this is a template of a minimal start point for yew. This has all the stubs in place for a minimal “hello world” website.

use wasm_bindgen::prelude::*;
use yew::prelude::*;
use yew::services::ConsoleService;

pub struct App {
    link: ComponentLink<Self>,
}

impl Component for App {
    type Message = ();
    type Properties = ();

    // This is called when our App is initially created.
    fn create(_: Self::Properties, link: ComponentLink<Self>) -> Self {
        App {
            link,
        }
    }

    fn change(&mut self, _: Self::Properties) -> ShouldRender {
        false
    }

    // Called during event callbacks initiated by events (user or browser)
    fn update(&mut self, _: Self::Message) -> ShouldRender {
        false
    }

    // Render our content to the page, emitting Html that will be loaded into our
    // index.html's <body>
    fn view(&self) -> Html {
        ConsoleService::log("Hello World!");
        html! {
            <div>
                <h2>{ "Hello World" }</h2>
            </div>
        }
    }
}

// This is the entry point that main.js calls into.
#[wasm_bindgen]
pub fn run_app() -> Result<(), JsValue> {
    yew::start_app::<App>();
    Ok(())
}

Building your Hello World

Now you can build your project with:

./build_wasm.sh

And if you want to see it on your machine in your browser:

miniserve -v --index index.html .

Navigate to http://127.0.0.1:8080 to see your Hello World!

Troubleshooting

I made all the following mistakes while writing this blog 😅

build_wasm.sh - permission denied

./build_wasm.sh
zsh: permission denied: ./build_wasm.sh

You need to run “chmod +x build_wasm.sh” so that you can execute this. Permission denied means that the executable bits are missing from the file.

building - ‘Could not resolve’

./main.js → ./pkg/bundle.js...
[!] Error: Could not resolve './pkg/PROJECTNAME.js' from main.js
Error: Could not resolve './pkg/PROJECTNAME.js' from main.js

This error means you need to edit main.js so that PROJECTNAME matches your crate name.

Blank Page in Browser

When you first load your page it may be blank. You can check if a file is missing or incorrectly named by right-clicking the page, selecting ‘inspect’, and going to the ‘network’ tab in the inspector.

From there refresh your page, and see if any files 404. If they do you may need to rename them, or there is an error in your main.js. A common one is:

PROJECTNAME.wasm: 404

This is because in main.js you may have changed the await init line, and removed the suffix _bg.

// Incorrect
await init('/pkg/PROJECTNAME.wasm');
// Correct
await init('/pkg/PROJECTNAME_bg.wasm');

Compiler Bootstrapping - Can We Trust Rust?

Posted by William Brown on May 11, 2021 02:00 PM

Compiler Bootstrapping - Can We Trust Rust?

Recently I have been doing a lot of work for SUSE with how we package the Rust compiler. This process has been really interesting and challenging, but like anything it’s certainly provided a lot of time for thought while waiting for my packages to build.

The Rust package in OpenSUSE has two methods of building the compiler internally in its spec file (a rough sketch of this switch follows the list):

    1. Use our previously packaged version of rustc from packages
    2. Bootstrap using the signed and prebuilt binaries provided by the rust project
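
A purely illustrative sketch (not the actual OpenSUSE rust.spec) of how such a switch is commonly expressed with RPM’s %bcond conditionals - the source line and version handling here are hypothetical:

%bcond_with bootstrap

%if %{with bootstrap}
# Method 2: seed the build from the prebuilt, signed stage0 tarball published by the rust project
Source100:      rust-%{version}-x86_64-unknown-linux-gnu.tar.xz
%else
# Method 1: build with the rustc we previously packaged (version N or N - 1)
BuildRequires:  rust
%endif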

Bootstrapping

There are many advocates of bootstrapping and then self-sustaining a chain of compilers within a distribution. The roots of this come from Ken Thompson’s Turing Award speech known as Reflections on Trusting Trust. This details the process by which a compiler can be backdoored, to produce future backdoored compilers. This has been replicated by Manish G., as detailed in their blog Reflections on Rusting Trust, where they successfully create a self-hosting backdoored rust compiler.

The process can be visualised as:

┌──────────────┐              ┌──────────────┐
│  Backdoored  │              │   Trusted    │
│   Sources    │──────┐       │   Sources    │──────┐
│              │      │       │              │      │
└──────────────┘      │       └──────────────┘      │
                      │                             │
┌──────────────┐      │       ┌──────────────┐      │      ┌──────────────┐
│   Trusted    │      ▼       │  Backdoored  │      ▼      │  Backdoored  │
│ Interpreter  │──Produces───▶│    Binary    ├──Produces──▶│    Binary    │
│              │              │              │             │              │
└──────────────┘              └──────────────┘             └──────────────┘

We can see that in this attack, even with a set of trusted compiler sources, we can continue to produce a chain of backdoored binaries.

This has led to many people, and even groups such as Bootstrappable promoting work to be able to produce trusted chains from trusted sources, so that we can assert a level of trust in our produced compiler binaries.

┌──────────────┐              ┌──────────────┐
│   Trusted    │              │   Trusted    │
│   Sources    │──────┐       │   Sources    │──────┐
│              │      │       │              │      │
└──────────────┘      │       └──────────────┘      │
                      │                             │
┌──────────────┐      │       ┌──────────────┐      │      ┌──────────────┐
│   Trusted    │      ▼       │              │      ▼      │              │
│ Interpreter  │──Produces───▶│Trusted Binary├──Produces──▶│Trusted Binary│
│              │              │              │             │              │
└──────────────┘              └──────────────┘             └──────────────┘

This process would continue forever to the right, where each trusted binary is the result of trusted sources. This then ties into topics like reproducible builds which assert that you can separately rebuild the sources and attain the same binary, showing the process can not have been tampered with.

But does it really work like that?

Outside of thought exercises, there is little evidence of these attacks being carried out in reality.

Last year in 2020 we saw supply chain attacks such as the SolarWinds supply chain attack, which was reported by FireEye as “Inserting malicious code into legitimate software updates for the Orion software that allow an attacker remote access into the victim’s environment”. What’s really interesting here is that no compiler was compromised in the process like our theoretical attack - code was simply inserted and then subsequently released.

Tavis Ormandy in his blog You don’t need reproducible builds covers supply chain security, and examines why reproducible builds are not effective in the promises and claims they present. Importantly, Tavis discusses how trivial it is to insert “bugdoors”, or pieces of code that are malicious and will not be found, and can potentially be waved off as human error.

Today, we don’t even need bugdoors, with Microsoft Security Response Centre reporting that 70% of vulnerabilities are memory safety issues.

No amount of reproducible builds or compiler bootstrapping chains can shield us from the reality that attackers today will target the softest area - and today that is security issues in our languages, and insecure configuration of supply chain infrastructure.

We don’t need backdoored compilers when we know that a security critical piece of software written in C is still exposed to the network.

But let’s assume …

Okay, so let’s assume that backdoored compilers are a real risk for a moment. We need to establish a few things first to create our secure bootstrapping environment, and these requirements are generally extremely difficult to meet.

We will need:

  • Trusted Interpreter
  • Trusted Sources

This is the foundation, having these two trusted entities that we can use to begin the process. But what is “trusted”? How can we define that these items are truly trusted?

One method could be to check the cryptographic signatures of the released source code, to validate that it is “what was released”, but this does not mean that the source code is free from backdoors/bugdoors which are the very thing we are attempting to shield ourselves from.

What would be truly required here is a detailed and complete audit of all of the source code to these compilers, which would be a monumental task in and of itself. So today instead, we do not perform source code audits, and we blindly trust the providers of the source code as legitimate and having provided us tamper-free source code. We assert that blind trust through the validation of those cryptographic signatures. We blindly trust that they have vetted every commit and line of code, and they have not had their own source code supply chain compromised in some way to provide us this “trusted source”. This gives us a relationship with the producers of that source, that they are trustworthy and have performed vetting of code and their members with privileges, that they will “do the right thing”™.

The second challenge is asserting trust in the interpreter. Where did this binary come from? How was it built? Were its sources trusted? As one can imagine, this becomes a very deep rabbit hole when we want to chase it, but in reality the approach taken by today’s Linux distributions is “well, we haven’t been compromised to this point, so I guess this one is okay” and we yolo build with it. We then create a root of trust at that one point in time, which then creates our bootstrapping chain of trust for future builds of subsequent trusted sources.

So what about Rust?

Rust is interesting compared to something like C (clang/gcc), as the rust project not only provides signed sources, they also provide signed static binaries of their compiler. This is because unlike clang/gcc, which have very long release lifecycles, rust is released every six weeks, and building version N of the compiler requires version N or N - 1. This allows people who have missed a version to easily skip ahead without needing to build every intermediate version of the compiler.

A frequent complaint is the difficulty of packaging rust, because any time releases are missed you must compile every intermediate version to adhere to the bootstrappable guidelines and principles to create a more “trusted” compiler.

But just like any other humans, in order to save time, when we miss a version we can use the rust project’s provided signed binaries to reset the chain, allowing us to skip versions of rust, or to re-package older versions in some cases.

                        ┌──────────────┐             ┌──────────────┐
                 │      │   Trusted    │             │   Trusted    │
              Missed    │   Sources    │──────┐      │   Sources    │──────┐
             Version!   │              │      │      │              │      │
                 │      └──────────────┘      │      └──────────────┘      │
                 │                            │                            │
┌──────────────┐ │      ┌──────────────┐      │      ┌──────────────┐      │
│              │ │      │Trusted Binary│      ▼      │              │      ▼
│Trusted Binary│ │      │ (from rust)  ├──Produces──▶│Trusted Binary│──Produces───▶ ...
│              │ │      │              │             │              │
└──────────────┘ │      └──────────────┘             └──────────────┘

This process here is interesting because:

  • Using the signed binary from rust-lang is actually faster since we can skip one compiler rebuild cycle due to being the same version as the sources
  • It shows that the “bootstrappable” trust chain does not actually matter, since we frequently move our trust root to the released binary from rust rather than building all intermediates

Given this process, we must ask, what value do we have from trying to adhere to the bootstrappable principles with rust? We already root our trust in the rust project, meaning that because we blindly trust the sources and the static compiler, why would our resultant compiler be any more “trustworthy” just because we were the ones who compiled it?

Beyond this, the binaries issued by the rust project are used by thousands of people every day through tools like rustup. In reality, these binaries have proven time and time again that they can be trusted to run in mass deployments, and that the rust project has the ability and capability to respond to issues in their source code as well as the binaries they provide. They have certainly earned the trust of many people through this!

So why do we keep assuming that we are somehow more trustworthy than the rust project, while simultaneously treating them as fully trusted in the artefacts they provide to us?

Contradictions

It is this contradiction that has made me rethink the process that we take to packaging rust in SUSE. I think we should bootstrap from upstream rust every release, because the rust project is in a far better position to perform audits and respond to trust threats than the part-time package maintainers that are commonly part of Linux distributions.

│ ┌──────────────┐                              │ ┌──────────────┐
│ │   Trusted    │                              │ │   Trusted    │
│ │   Sources    │──────┐                       │ │   Sources    │──────┐
│ │              │      │                       │ │              │      │
│ └──────────────┘      │                       │ └──────────────┘      │
│                       │                       │                       │
│ ┌──────────────┐      │      ┌──────────────┐ │ ┌──────────────┐      │      ┌──────────────┐
│ │Trusted Binary│      ▼      │              │ │ │Trusted Binary│      ▼      │              │
│ │ (from rust)  ├──Produces──▶│Trusted Binary│ │ │ (from rust)  ├──Produces──▶│Trusted Binary│
│ │              │             │              │ │ │              │             │              │
│ └──────────────┘             └──────────────┘ │ └──────────────┘             └──────────────┘

We already fully trust the sources they release, and we already fully trust their binary compiler releases. We can simplify our build process (and speed it up!) by acknowledging this trust relationship exists, rather than trying to continue to convince ourselves that we are somehow “more trusted” than the rust project.

Also we must consider the reality of threats in the wild. Does all of this work and discussions of who is more trusted really pay off and defend us in reality? Or are we focused on these topics because they are something that we can control and have opinions over, rather than acknowledging the true complexity and dirtiness of security threats as they truly exist today?

Open Source Enshrines the Wrong Privilege

Posted by William Brown on March 22, 2021 02:00 PM

Open Source Enshrines the Wrong Privilege

Within Open Source/Free Software, we repeatedly see a set of behaviours - hostile or toxic project owners, abusive relationships, aggression towards users, and complete disregard for users of the software. Some projects have risen above this and advanced the social behaviours in their communities, but these are still the minority of projects.

Many advocates for FLOSS have been trying to enhance adoption of these technologies in communities, but with the exception of limited non-technical audiences, this really hasn’t gained much ground.

It is my opinion that these community behaviours and the low adoption of FLOSS technologies come back to what our Open Source licenses enshrine - the very thing they embody and create.

The Origins of Free Software

The story of Free Software starts with an individual (later revealed as abusive), who was frustrated at not being able to access software on a printer so that he could alter its behaviour. This has been extended to the idea that Free Software “grants people control over their own lives and software”.

This however, is not correct.

What Free Software licenses protect is that individuals with time, resources, specialised technical knowledge and social standing have the possibility to alter that software’s behaviour.

When we consider that the majority of the world are not developers or software engineers, what is it that our Free Software is doing to protect and support these individuals? Should we truly expect individuals who are linguists, authors, scientists, retail staff, or social workers to be able to “alter the software to fix their own problems”?

Even as technical experts, we are frustrated when someone closes an issue with “PR’s welcome”. Imagine how these other people feel when they can’t even express or report the problem in the first place or get told they aren’t good enough, or that “they can fix it themselves if they want”.

However, this attitude also discounts the subject matter knowledge required to alter or contribute to any piece of software. I may be a Senior Software Engineer, but I lack the knowledge and time to contribute to Gnome, for example. Even with these “freedoms” I lack the ability to “control” the software on my own system.

Open Source is Selfish

These licenses that we have in FLOSS all enshrine selfish and privileged behaviours.

I have the rights to freely access this code so I can read it or alter it.

I can change this project to fix issues I have.

I have freedoms.

None of these statements from FLOSS describe other people - the people who consume our software (in some cases, without choice). People who are not subject matter experts and can’t contribute to “solve their own problems”. People who may not have the experience and language to describe the problems they face.

This lack of empathy, the lack of concern for others in FLOSS leads us to where we are now. Those who have the subject matter knowledge lead projects, and do what they want because they can fix it. They tell others “PR’s welcome” knowing full-well that the other person may never be able to contribute, that the barriers to contribution are so high (both in programming experience and domain knowledge). They design the software to work the way they want, because they understand it and it “works for me”.

This is reflected in our software. Software that does not care for the needs, experiences or rights of others. Software that pretends to be accessible, all while creating gated communities of control. Software that is out of reach of people, the same people that we “claim” to be working for and supporting.

It leads to our communities that are selfish, and do not empathise with people. Communities that have placed negative behaviours on pedestals and turned these people into “leaders”. Software that does not account for the experiences of our users, believing that the “community knows best”.

One does not need to look far for FLOSS projects that speak one set of words, but their actions do not align.

What Can We Do?

In our projects we need to go beyond preserving the freedoms of ourselves, and begin to discuss the freedoms and interactions that others should have with our systems and projects. Here are some starting ideas that I have:

  • Have a code of conduct for all contributors (remember, opening an issue is a contribution).
  • Document your target users, and what kind of experience they should have. Expand this over time.
  • Promote empathy for those who aren’t direct contributors - indirect users without choice exist.
  • Remove dependencies on as many problematic software projects as possible.
  • Push for improvements to open licenses that enshrine the freedoms of others - not just developers.

As individual communities we can advance the state of software and how we act socially so that future projects and users are in a better place. No software exists in a vacuum, all software exists to support people. We need to always keep in mind the effects our software has on others.

Time Machine on Samba with ZFS

Posted by William Brown on March 21, 2021 02:00 PM

Time Machine on Samba with ZFS

Time Machine is Apple’s in-built backup system for MacOS. It’s probably the best consumer backup option, which really achieves “set and forget” backups.

It can back up to an external hard disk on a dock, an Apple Time Capsule (a wireless access point), or a custom location based on SMB shares.

Since I have a fileserver at home, I use this as my Time Machine backup target. To make this work really smoothly there are a few setup steps.

MacOS Time Machine Performance

By default Time Machine operates as a low-priority process. You can set a sysctl to improve the performance of this:

sysctl -w debug.lowpri_throttle_enabled=0

You will need a launchd script to make this setting survive a reboot.
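
A minimal sketch of such a LaunchDaemon plist follows - the label is a placeholder; install it under /Library/LaunchDaemons/ and load it with launchctl:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <!-- Reapplies the Time Machine throttle sysctl at boot -->
  <key>Label</key>
  <string>net.example.lowpri-throttle</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/sbin/sysctl</string>
    <string>-w</string>
    <string>debug.lowpri_throttle_enabled=0</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
</dict>
</plist>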

ZFS

I’m using ZFS on my server, which is probably the best filesystem available. To make Time Machine work well on ZFS there are a number of tuning options that can help. As these backups write and read many small files, you should have a large amount of RAM for ARC (best) or a ZIL + L2ARC on NVMe. RAID 10 will likely work better than RAIDZ here, as you need better seek latency than write throughput due to the need to access many small files.

For the ZFS properties on the filesystem I have set:

atime: off
dnodesize: auto
xattr: sa
logbias: latency
recordsize: 32K
compression: zstd-10
quota: 3T
# optional
sync: disabled

The important ones here are the compression setting, which in my case gives a 1.3x compression ratio to save space, the quota to prevent the backups overusing space, and the recordsize, which helps to minimise write fragmentation.

You may optionally choose to disable sync. This is because Time Machine issues a sync after every single file write to the server, which can cause low performance with many small files. To mitigate the data loss risk here, I snapshot the backups directory hourly, and I also have two stripes (an A/B backup target) so that if one of the stripes goes bad, I can still access the other. This is another reason that compression is useful - it helps offset the cost of the duplicated data.
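
If you are applying these to an existing dataset, they can all be set in one command with zfs set - the dataset name here is just a placeholder:

# Apply the tuning properties to the backup dataset (adjust the dataset name to suit)
zfs set atime=off dnodesize=auto xattr=sa logbias=latency \
    recordsize=32K compression=zstd-10 quota=3T tank/backup/timemachine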

Quota

Inside of the backups filesystem I have two folders:

timemachine_a
timemachine_b

In each of these you can add a PList that applies quota limits to the time machine stripes.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>GlobalQuota</key>
    <integer>1000000000000</integer>
  </dict>
</plist>

The quota is in bytes. You may not need this if you use the smb fruit:time machine max size setting.

smb.conf

In smb.conf I offer two shares for the A and B stripes. These have identical configurations apart from the paths.

[timemachine_b]
comment = Time Machine
path = /var/data/backup/timemachine_b
browseable = yes
write list = timemachine
create mask = 0600
directory mask = 0700
spotlight = no
vfs objects = catia fruit streams_xattr
fruit:aapl = yes
fruit:time machine = yes
fruit:time machine max size = 1050G
durable handles = yes
kernel oplocks = no
kernel share modes = no
posix locking = no
# NOTE: Changing these will require a new initial backup cycle if you already have an existing
# timemachine share.
case sensitive = true
default case = lower
preserve case = no
short preserve case = no

The fruit settings are required to help Time Machine understand that this share is usable for it. Most of the durable handle and locking settings are related to performance, helping to minimise file locking and improve throughput. These are “safe” only because we know that this volume is not accessed or manipulated by any other process or NFS export at the same time.

I have also added a custom timemachine user to smbpasswd, and created a matching posix account who should own these files.

MacOS

You can now add this to MacOS via system preferences. Alternately you can use the command line.

tmutil setdestination smb://timemachine:password@hostname/timemachine_a

If you intend to have stripes (A/B), MacOS is capable of alternating its backups between the two stripes. You can append the second stripe with (note the -a):

tmutil setdestination -a smb://timemachine:password@hostname/timemachine_b

Against Packaging Rust Crates

Posted by William Brown on February 15, 2021 02:00 PM

Against Packaging Rust Crates

Recently the discussion has once again come up around the notion of packaging Rust crates as libraries in distributions. For example, taking a library like serde and packaging it to an RPM. While I use RPM as the examples here it applies equally to other formats.

Proponents of crate packaging want all Rust applications to use the distribution’s versions of a crate. This is to prevent “vendoring” or “bundling”. This is where an application (such as 389 Directory Server) ships all of its sources, as well as the sources of its Rust dependencies, in a single archive. These sources may differ in version from the bundled sources of other applications.

“Packaging crates is not reinventing Cargo”

This is a common claim by advocates of crate packaging. However it is easily disproved:

If packaging is not reinventing cargo, I am free to use all of Cargo’s features without conflicts to distribution packaging.

The reality is that packaging crates is reinventing Cargo - but without all its features. Common limitations are that Cargo’s exact version/less-than requirements can not be used safely, or that Cargo’s ability to apply patches or use sources from specific git revisions can not be used at all.

As a result, this hinders upstreams from using all the rich features within Cargo in order to comply with distribution packaging limitations, or it will cause the package to hit exceptions in policy and necessitate vendoring anyway.

“You can vendor only in these exceptional cases …”

As noted, since packaging is reinventing Cargo, if you use features of Cargo that are unsupported then you may be allowed to vendor depending on the distributions policy. However, this raises some interesting issues itself.

Assume I have been using distribution crates for a period of time - then the upstream adds an exact version or git revision requirement to a project or a dependency in my project. I now need to change my spec file and tooling to use vendoring, and all of the benefits of distribution crates no longer exist (because you can not have any dependency in your tree that has an exact version rule).

If the upstream ‘un-does’ that change, then I need to roll back to distribution crates since the project would no longer be covered by the exemption.

This will create review delays and large amounts of administrative overhead. It means pointless effort to swap between vendored and distribution crates based on small upstream changes. This may cause packagers to avoid certain versions or updates so that they do not need to swap between distribution methods.

It’s very likely that these “exceptional” cases will be very common, meaning that vendoring will be occurring. This necessitates supporting vendored applications in distribution packages.

“You don’t need to package the universe”

Many proponents say that they have “already packaged most things”. For example, in 389 Directory Server, of our 60 dependencies only 2 were missing in Fedora (2021-02). However this overlooks the fact that I do not want to package those 2 other crates just to move forward. I want to support 389 Directory Server the application, not all of its dependencies, in a distribution.

This is also before we come to larger rust projects, such as Kanidm that has nearly 400 dependencies. The likelihood that many of them are missing is high.

So you will need to package the universe. Maybe not all of it. But still a lot of it. It’s already hard enough to contribute packages to a distribution. It becomes even harder when I need to submit 3, 10, or 100 more packages. It could be months before enough approvals were in place. It’s a staggering amount of administration and work, which will discourage many contributors.

People have already contacted me to say that if they had to package crates to distribution packages to contribute, they would give up and walk away. We’ve already lost future contributors.

Further to this, Ruby, Python and many other languages today all recommend language-native tools such as rvm or virtualenv to avoid using distribution packaged libraries.

Packages in distributions should exist as a vehicle to ship bundled applications that are created from their language native tools.

“We will update your dependencies for you”

A supposed benefit is that versions of crates in distributions will be updated in the background according to semver rules.

If we have an exact version requirement (that was satisfiable), a silent background update will cause it to no longer be met - breaking the application’s build. This would necessitate one of:

  • A change to the Cargo.toml to remove the equality requirement - a requirement that may exist for good reason.
  • Forcing the application to temporarily swap to vendoring instead.
  • Leaving the application broken and unable to be updated until upstream resolves the need for the equality requirement.

Background updates also ignore the state of your Cargo.lock file by removing it. A Cargo.lock file is recommended to be checked in with binary applications in Rust, as evidence that shows “here is an exact set of dependencies that upstream has tested and verified as building and working”.

To remove and ignore this file, means to remove the guarantees of quality from an upstream.

It is unlikely that packagers will run the entire test suite of an application to regain this confidence. They will use the “apply the patch and pray” method - as they already do with other languages.

We can already see how background updates can have significant negative consequences on application stability. FreeIPA has hundreds of dependencies, and it’s common that if any of them changes in small ways, it can cause FreeIPA to fall over. This is not the fault of FreeIPA - it’s the fault of relying on so many small moving parts that can change underneath your feet without warning. FreeIPA would strongly benefit from vendoring to improve its stability and quality.

Conversely, it can cause hesitation to update libraries - since there is now a risk of breaking other applications that depend on them. We do not want people to be afraid of updates.

“We can respond to security issues”

On the surface this is a strong argument, but in reality it does not hold up. The security issues that face Rust are significantly different to those that affect C. In C it may be viable to patch and update a dynamic library to fix an issue. It saves time because you only need to update and change one library to fix everything.

Security issues are much rarer in Rust. When they occur, you will have to update and re-build all applications depending on the affected library.

Since this rebuilding work has to occur, where the security fix is applied is irrelevant. This frees us to apply the fixes in a different way to how we approach C.

It is better to apply the fixes in a consistent and universal manner. Since there will be applications that are vendored due to vendoring exceptions, there is now duplicated work and two different processes to respond to: distribution crates and vendored applications.

Instead, all applications could be vendored, and tooling already exists to check for insecure dependency versions (RustSec/cargo-audit does this, for example, by checking the exact versions pinned in Cargo.lock against the advisory database). The Cargo.tomls can be patched, and applications tested and re-vendored. Even better, these changes could easily then be forwarded to upstreams, allowing every distribution and platform to benefit from the work.
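
As a sketch of that workflow, run in an application’s source tree:

cargo install cargo-audit
# Reports any dependencies in Cargo.lock with known RustSec advisories
cargo audit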

In the cases where the upstream can not fix the issue, Cargo’s native patching tooling can be used to supply fixes directly into vendored sources for the rare situations requiring it.
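
For example, Cargo’s [patch] section in Cargo.toml can replace a vulnerable dependency across the whole dependency tree with a fixed source - the crate name and git URL below are purely illustrative:

[patch.crates-io]
# Override the registry version of this (hypothetical) crate with a branch carrying the fix
somecrate = { git = "https://github.com/example/somecrate", branch = "security-fix" }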

“Patching 20 vulnerable crates doesn’t scale, we need to patch in one place!”

A common response to the previous section is that the above process won’t scale as we need to find and patch 20 locations compared to just one. It will take “more human effort”.

Today, when a security fix comes out, every distribution’s security team has to be made aware of it. That means OpenSUSE, Fedora, Debian, Ubuntu, Gentoo, Arch, and many more groups all have to become aware and respond. Then each of these projects’ security teams will work with their maintainers to build and update these libraries. In the case of SUSE and Red Hat this means that multiple developers may be involved, and quality engineering will be engaged to test these changes. Consumers of that library will in some cases re-test their applications to ensure there are no faults in the components they rely upon. This is all before we approach the fact that each of these distributions has many supported and released versions they likely need to maintain, so this process may be repeated for patching and testing multiple versions in parallel.

In this process there are a few things to note:

  • There is a huge amount of human effort today to keep on top of security issues in our distributions.
  • Distributions tend to be isolated and can’t share the work to resolve these - the changes to the rpm specs in SUSE won’t help Debian for example.
  • Human error occurs in all of these layers causing security issues to go un-fixed or breaking a released application.

To suggest that rust and vendoring somehow make this harder or more time consuming is to discount the huge amount of time, skill, and effort already put in by people to keep our C-based distributions functioning today.

Vendored Rust won’t make this process easier or harder - it just changes the nature of the effort we have to apply as maintainers and distributions. It shifts our focus from “how do we ensure this library is secure” to “how do we ensure this application made from many libraries is secure”. It allows further collaboration with upstreams to be involved in the security update process, which ends up benefiting all distributions.

“It doesn’t duplicate effort”

It does. By the very nature of both distribution libraries and vendored applications needing to exist in a distribution, there will be duplicated but separate processes and policies to manage, inspect, and update these. This will create a need for tooling to support both methods, which consumes time for many people.

People have already done the work to package and release libraries to crates.io. Tools already exist to provide our dependencies and include them in our applications. Why do we need to duplicate these features and behaviours in distribution packages when Cargo already does this correctly, and in a way that is universal and supported?

Don’t support distribution crates

I can’t be any clearer than that. Distribution crates consume excessive amounts of contributor time for little to no benefit, they detract from simpler language-native solutions for managing dependencies, they distract from better language integration tooling being developed, they can introduce application instability and bugs, and they create high barriers to entry for new contributors to distributions.

It doesn’t have to be like this.

We need to stop thinking that Rust is like C. We have to accept that language native tools are the interface people will want to use to manage their libraries and distribute whole applications. We must use our time more effectively as distributions.

If we focus on supporting vendored Rust applications, and developing our infrastructure and tooling to support this, we will attract new contributors by lowering barriers to entry, but we will also have a stronger ability to contribute back to upstreams, and we will simplify our building and packaging processes.

Today, tools like docker, podman, flatpak, snapd and others have proven how bundling/vendoring, and a focus on applications, can advance the state of our ecosystems. We need to adopt the same ideas into distributions. Our package managers should become a method to ship applications - not libraries.

We need to focus our energy to supporting applications as self contained units - not supporting the libraries that make them up.

Edits

  • Released: 2021-02-16
  • EDIT: 2021-02-22 - improve clarity on some points, thanks to ftweedal.
  • EDIT: 2021-02-23 - due to a lot of comments regarding security updates, added an extra section to address how this scales.

Getting Started Packaging A Rust CLI Tool in SUSE OBS

Posted by William Brown on February 14, 2021 02:00 PM

Getting Started Packaging A Rust CLI Tool in SUSE OBS

Distribution packaging always seems like something that is really difficult or hard to do, but the SUSE Open Build Service makes it really easy to not only build packages, but to then contribute them to Tumbleweed. Not only that, OBS can also build for Fedora, CentOS and more.

Getting Started

You’ll need to sign up to the service - there is a sign-up link on the front page of OBS.

To do this you’ll need a SUSE environment. Docker is an easy way to create this without having to commit to a full virtual machine / install.

docker run \
    --security-opt=seccomp:unconfined --cap-add=SYS_PTRACE --cap-add=SYS_CHROOT --cap-add=SYS_ADMIN \
    -i -t opensuse/tumbleweed:latest /bin/sh
  • NOTE: We need these extra privileges so that the osc build command can work due to how it uses chroots/mounts.

Inside of this we’ll need some packages to help make the process easier.

zypper install obs-service-cargo_vendor osc obs-service-tar obs-service-obs_scm \
    obs-service-recompress obs-service-set_version obs-service-format_spec_file \
    obs-service-cargo_audit cargo sudo

You should also install your editor of choice in this command (docker images tend not to come with any editors!)

You’ll need to configure osc, which is the CLI interface to OBS. This is done in the file ~/.config/osc/oscrc. A minimal starting configuration is:

[general]
# URL to access API server, e.g. https://api.opensuse.org
# you also need a section [https://api.opensuse.org] with the credentials
apiurl = https://api.opensuse.org
[https://api.opensuse.org]
user = <username>
pass = <password>

You can check this works by using the “whois” command.

# osc whois
firstyear: "William Brown" <email here>

Optionally, you may install cargo lock2rpmprovides to assist with creation of the license string for your package:

cargo install cargo-lock2rpmprovides

Packaging A Rust Project

In this example we’ll use a toy Rust application I created called hellorust. Of course, feel free to choose your own project or Rust project you want to package!

  • HINT: It’s best to choose binaries, not libraries, to package. This is because Rust can self-manage its dependencies, so we don’t need to package every library. Neat!

First we’ll create a package in our OBS home project.

osc co home:<username>
cd home:<username>
osc mkpac hellorust
cd hellorust

OBS comes with a lot of useful utilities to help create and manage sources for our project. First we’ll create a skeleton RPM spec file. This should be in a file named hellorust.spec

%global rustflags -Clink-arg=-Wl,-z,relro,-z,now -C debuginfo=2

Name:           hellorust
#               This will be set by osc services, that will run after this.
Version:        0.0.0
Release:        0
Summary:        A hello world with a number of the day printer.
#               If you know the license, put its SPDX string here.
#               Alternately, you can use cargo lock2rpmprovides to help generate this.
License:        Unknown
#               Select a group from this link:
#               https://en.opensuse.org/openSUSE:Package_group_guidelines
Group:          Amusements/Games/Other
Url:            https://github.com/Firstyear/hellorust
Source0:        %{name}-%{version}.tar.xz
Source1:        vendor.tar.xz
Source2:        cargo_config

BuildRequires:  rust-packaging
ExcludeArch:    s390 s390x ppc ppc64 ppc64le %ix86

%description
A hello world with a number of the day printer.

%prep
%setup -q
%setup -qa1
mkdir .cargo
cp %{SOURCE2} .cargo/config
# Remove exec bits to prevent an issue in fedora shebang checking
find vendor -type f -name \*.rs -exec chmod -x '{}' \;

%build
export RUSTFLAGS="%{rustflags}"
cargo build --offline --release

%install
install -D -d -m 0755 %{buildroot}%{_bindir}

install -m 0755 %{_builddir}/%{name}-%{version}/target/release/hellorust %{buildroot}%{_bindir}/hellorust

%files
%{_bindir}/hellorust

%changelog

There are a few commented areas you’ll need to fill in and check. But next we will create a service file that allows OBS to help get our sources and bundle them for us. This should go in a file called _service

<services>
  <service mode="disabled" name="obs_scm">
    <!-- ✨ URL of the git repo ✨ -->
    <param name="url">https://github.com/Firstyear/hellorust.git</param>
    <param name="versionformat">@PARENT_TAG@~git@TAG_OFFSET@.%h</param>
    <param name="scm">git</param>
    <!-- ✨ The version tag or branch name from git ✨ -->
    <param name="revision">v0.1.1</param>
    <param name="match-tag">*</param>
    <param name="versionrewrite-pattern">v(\d+\.\d+\.\d+)</param>
    <param name="versionrewrite-replacement">\1</param>
    <param name="changesgenerate">enable</param>
    <!-- ✨ Your email here ✨ -->
    <param name="changesauthor"> YOUR EMAIL HERE </param>
  </service>
  <service mode="disabled" name="tar" />
  <service mode="disabled" name="recompress">
    <param name="file">*.tar</param>
    <param name="compression">xz</param>
  </service>
  <service mode="disabled" name="set_version"/>
  <service name="cargo_audit" mode="disabled">
      <!-- ✨ The name of the project here ✨ -->
     <param name="srcdir">hellorust</param>
  </service>
  <service name="cargo_vendor" mode="disabled">
      <!-- ✨ The name of the project here ✨ -->
     <param name="srcdir">hellorust</param>
     <param name="compression">xz</param>
  </service>

</services>

Now this service file does a lot of the heavy lifting for us:

  • It will fetch the sources from git, based on the version we set.
  • It will turn them into a tar.xz for us.
  • It will update the changelog for the rpm, and set the correct version in the spec file.
  • It scans our project for any known vulnerabilities
  • It will download our rust dependencies, and then bundle them to vendor.tar.xz.

So our current work dir should look like:

# ls -1 .
.osc
_service
hellorust.spec

Now we can run osc service ra. This will run the services in our _service file as we mentioned. Once it’s complete we’ll have quite a few more files in our directory:

# ls -1 .
_service
_servicedata
cargo_config
hellorust
hellorust-0.1.1~git0.db340ad.obscpio
hellorust-0.1.1~git0.db340ad.tar.xz
hellorust.obsinfo
hellorust.spec
vendor.tar.xz

Inside the hellorust folder (home:username/hellorust/hellorust) is a checkout of our source. If you cd to that directory, you can run cargo lock2rpmprovides, which will display the license string you need:

License: ( Apache-2.0 OR MIT ) AND ( Apache-2.0 WITH LLVM-exception OR Apache-2.0 OR MIT ) AND

Just add the license from the project, and then we can update our hellorust.spec with the correct license.

License: ( Apache-2.0 OR MIT ) AND ( Apache-2.0 WITH LLVM-exception OR Apache-2.0 OR MIT ) AND MPL-2.0
  • HINT: You don’t need to use the emitted “provides” lines here. They are just for fedora rpms to adhere to some of their policy requirements.

Now we can build our package on our local system to test it. This may take a while to get all its build dependencies and other parts, so be patient :)

osc build

If that completes successfully, you can now test these rpms:

# zypper in /var/tmp/build-root/openSUSE_Tumbleweed-x86_64/home/abuild/rpmbuild/RPMS/x86_64/hellorust-0.1.1~git0.db340ad-0.x86_64.rpm
(1/1) Installing: hellorust-0.1.1~git0.db340ad-0.x86_64  ... [done]
# rpm -ql hellorust
/usr/bin/hellorust
# hellorust
Hello, Rust! The number of the day is: 68

Next you can commit to your project. Add the files that we created:

# osc add _service cargo_config hellorust-0.1.1~git0.db340ad.tar.xz hellorust.spec vendor.tar.xz
# osc status
A    _service
?    _servicedata
A    cargo_config
?    hellorust-0.1.1~git0.db340ad.obscpio
A    hellorust-0.1.1~git0.db340ad.tar.xz
?    hellorust.obsinfo
A    hellorust.spec
A    vendor.tar.xz
  • HINT: You DO NOT need to commit _servicedata OR hellorust-0.1.1~git0.db340ad.obscpio OR hellorust.obsinfo
osc ci

From here, you can use your packages from your own repository, or you can forward them to OpenSUSE Tumbleweed (via Factory). You will likely need to polish and add extra parts to your package for it to be accepted into Factory, but this should at least make it easier for you to start!

For more, see the how to contribute to Factory document. To submit to Leap, the package must be in Factory, then you can request it to be submitted to Leap as well.

Happy Contributing! 🦎🦀

A few weeks on the road

Posted by Mo Morsi on November 24, 2020 02:07 AM

I spent the last few weeks on the road. I originally intended to take a cross-country road trip to seek new perspectives; this trip didn't quite amount to that, though it was good to get out of town for a while. Even shorter trips to new locations help clear the head, and I suspect that after this past year, many minds need some decluttering!

The first stop was Charleston, South Carolina, a quintessential southern town. After exploring King Street, The Battery, and the old Antebellum houses of historic Charleston, the journey continued on to the Tampa Bay, Florida area to spend some time on the gulf. There is a lot to be said for being on the water; the waves crashing on the shore, the salt in the air, and the slower pace of life are therapeutic when faced with fast-paced startup culture. The South offers a lot which other parts of the country do not. Besides the beautiful weather this time of year, Southern courtesy is unparalleled, and it is a refreshing change from the no-nonsense, business-oriented mannerisms of the north-east. Of course I'm painting broad strokes here, but it's something one has to experience first hand.

One major downside to modern society no matter where you go is the pervasiveness of a homogeneous corporate culture. Everywhere you look there are strip malls, McDonalds and Walmarts, and the same rest-stops, hotels, and types of business establishments. Global interconnectivity and connectedness have many benefits, but a major disadvantage is that no idea is unique for long, and novel / original concepts are instantly copied and replicated without end.

The human psyche needs both the familiar as well as novelty, and the latter is increasingly hard to find in the world. Even digitally, many concepts have already been pioneered, and many new innovations are rehashes of old ideas. Humanity will need to reconcile with this dichotomy in the future, as there is no turning back, and no stemming innovation or the pioneering spirit. The only strategy I see for individuals and organizations to adapt in the modern age is to continue moving forward; accomplishments can be celebrated with brief periods of respite, but the hyper-competitive nature of the world doesn't leave much room for leniency... talk about pressure!

After a few weeks, it was time to return to NY, my home state and location of maximum productivity. An 18+ hour drive from Florida ended in my hometown of Syracuse. This week will be spent here for the Thanksgiving holiday before returning to The Matrix, NYC, which, while still in the midst of the on-going pandemic and the economic crisis that is just beginning to unfold, is critical to return to in order to leverage the many business and networking opportunities that a major metropolitan center affords. Plus NYC is a fun city to live in!


In other news, be sure to check out Dev Null Productions' latest product: FleXRP - a tool to set up your XRP account for the upcoming Flare Networks Spark Token Airdrop. While Flare will run on an independent Blockchain, the token distribution will be based on XRP account balances, with credentials being seeded from those XRP accounts which have been associated with ERC-compatible (Ethereum-based) addresses. FleXRP takes care of this setup process via a simple-to-use interface: run the app, provide the required information, and you are good to go!

Lastly, I recommend fellow entrepreneurs, innovators, and other ambitious individuals read How To Win Friends and Influence People - that is, if you haven't yet... I suspect this one is given to all business school students on day one! Having listened to the audiobook on my road trip, I found the knowledge indispensable in the domain of human relations. While the concepts can be found summarized online, the anecdotes in the book really exemplify the points being made and demonstrate practical application.

While we're on the subject, also be sure to check out Thick Face, Black Heart, a timeless classic on practical eastern business practices.

That's all for now, until next time!

Webauthn UserVerificationPolicy Curiosities

Posted by William Brown on November 20, 2020 02:00 PM

Webauthn UserVerificationPolicy Curiosities

Recently I received a pair of interesting bugs in Webauthn RS where certain types of authenticators would not work in Firefox, but did work in Chromium. This confused me, and I couldn’t reproduce the behaviour. So like any obsessed person I ordered myself one of the affected devices and waited for Australia Post to lose it, find it, lose it again, and then finally deliver the device 2 months later.

In the meantime I swapped browsers from Firefox to Edge and started to notice some odd behaviour when logging into my corporate account - my yubikey began to ask me for my PIN on every authentication, even though the key was registered to the corp servers without a PIN. Yet the key kept working on Edge with a PIN - and confusingly, without a PIN on Firefox.

Some background on Webauthn

Before we dive into the issue, we need to understand some details about Webauthn. Webauthn is a standard that allows a client (commonly a web browser) to cryptographically authenticate to a server (commonly a web site). Webauthn defines how different types of hardware cryptographic authenticators may communicate between the client and the server.

An example of some types of authenticator devices are U2F tokens (yubikeys), TouchID (Apple Mac, iPhone, iPad), Trusted Platform Modules (Windows Hello) and many more. Webauthn has to account for differences in these hardware classes and how they communicate, but in the end each device performs a set of asymmetric cryptographic (public/private key) operations.

Webauthn defines the structures of how a client and server communicate to both register new authenticators and subsequently authenticate with those authenticators.

For the first step of registration, the server provides a registration challenge to the client. The structure of this (which is important for later) looks like:

dictionary PublicKeyCredentialCreationOptions {
    required PublicKeyCredentialRpEntity         rp;
    required PublicKeyCredentialUserEntity       user;

    required BufferSource                             challenge;
    required sequence<PublicKeyCredentialParameters>  pubKeyCredParams;

    unsigned long                                timeout;
    sequence<PublicKeyCredentialDescriptor>      excludeCredentials = [];
    AuthenticatorSelectionCriteria               authenticatorSelection = {
        AuthenticatorAttachment      authenticatorAttachment;
        boolean                      requireResidentKey = false;
        UserVerificationRequirement  userVerification = "preferred";
    };
    AttestationConveyancePreference              attestation = "none";
    AuthenticationExtensionsClientInputs         extensions;
};

The client then takes this structure, and creates a number of hashes from it which the authenticator then signs. This signed data and options are returned to the server as a PublicKeyCredential containing an Authenticator Attestation Response.

Next is authentication. The server sends a challenge to the client which has the structure:

dictionary PublicKeyCredentialRequestOptions {
    required BufferSource                challenge;
    unsigned long                        timeout;
    USVString                            rpId;
    sequence<PublicKeyCredentialDescriptor> allowCredentials = [];
    UserVerificationRequirement          userVerification = "preferred";
    AuthenticationExtensionsClientInputs extensions;
};

Again, the client takes this structure, takes a number of hashes and the authenticator signs this to prove it is the holder of the private key. The signed response is sent to the server as a PublicKeyCredential containing an Authenticator Assertion Response.

Key to this discussion is the following field:

UserVerificationRequirement          userVerification = "preferred";

This is present in both PublicKeyCredentialRequestOptions and PublicKeyCredentialCreationOptions. This informs what level of interaction assurance should be provided during the signing process. These are discussed in NIST SP 800-63B (5.1.7, 5.1.9) (which is just an excellent document anyway, so read it).

One aspect of these authenticators is that they must provide tamper-proof evidence that a person is physically present and interacting with the device for the signature to proceed. This is important as it means that if someone gains remote code execution on your system, they are unable to use your authenticator devices (even ones built into the machine, like Touch ID) as they are not physically present at it.

Some authenticators are able to go beyond this and strengthen the assurance by verifying the identity of the person interacting with the authenticator. This means that the interaction also requires, say, a PIN (something you know) or a biometric (something you are). This allows the authenticator to assert not just that someone is present but that a specific person is present.

All authenticators are capable of asserting user presence but only some are capable of asserting user verification. This creates two classes of authenticators as defined by NIST SP800-63b.

Single-Factor Cryptographic Devices (5.1.7) which only assert presence (the device becomes something you have) and Multi-Factor Cryptographic Devices (5.1.9) which assert the identity of the holder (something you have + something you know/are).

Webauthn is able to request the use of a Single-Factor Device or Multi-Factor Device through its UserVerificationRequirement option. The levels are (a small sketch of how a server might emit these values follows the list):

  • Discouraged - Only use Single-Factor Devices
  • Required - Only use Multi-Factor Devices
  • Preferred - Request Multi-Factor if possible, but allow Single-Factor devices.
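
Below is a minimal Rust sketch of how a relying party might map these levels to the strings it emits in the challenge JSON. The types and names are purely illustrative and are not taken from the Webauthn spec or any particular library.

// A sketch only (not the webauthn-rs API): mapping a server-side policy enum
// to the userVerification strings that appear in the options sent to the browser.
#[derive(Clone, Copy, Debug)]
enum UserVerificationPolicy {
    Discouraged, // accept Single-Factor (presence only) interactions
    Preferred,   // ask for verification if available, but accept it missing
    Required,    // only accept Multi-Factor (verified) interactions
}

impl UserVerificationPolicy {
    fn as_webauthn_str(&self) -> &'static str {
        match self {
            UserVerificationPolicy::Discouraged => "discouraged",
            UserVerificationPolicy::Preferred => "preferred",
            UserVerificationPolicy::Required => "required",
        }
    }
}

fn main() {
    let policy = UserVerificationPolicy::Required;
    // Hand-rolled JSON fragment purely for illustration.
    println!(
        "{{ \"challenge\": \"...\", \"userVerification\": \"{}\" }}",
        policy.as_webauthn_str()
    );
}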

Back to the mystery …

When I initially saw these reports - of devices that did not work in Firefox but did in Chromium, and of devices asking for PINs on some browsers but not others - I was really confused. The breakthrough came as I was developing Webauthn Authenticator RS. This is the client half of Webauthn, so that I could have the Kanidm CLI tools use Webauthn for multi-factor authentication (MFA). In the process, I have been using the authenticator crate made by Mozilla and used by Firefox.

The authenticator crate is what communicates with authenticators over NFC, Bluetooth, or USB. Due to the different types of devices, there are multiple different protocols involved. For U2F devices, the protocol is CTAP over USB. There are two versions of the CTAP protocol - CTAP1 and CTAP2.

In the authenticator crate, only CTAP1 is supported. CTAP1 devices are unable to accept a PIN, so user verification must be performed internally to the device (such as a fingerprint reader built into the U2F device).

Chromium, however, is able to use CTAP2 - CTAP2 does allow a PIN to be provided from the host machine to the device as a user verification method.

Why would devices fail in Firefox?

Once I had learnt this about CTAP1/CTAP2, I realised that my example code in Webauthn RS was hardcoding Required as the user verification level. Since Firefox can only use CTAP1, it was unable to supply PINs to U2F devices, so they would not respond to the challenge. But on Chromium, with CTAP2, PINs can be provided, so Required can be satisfied and the devices work.

Okay but the corp account?

This one is subtle. The corp identity system uses user verification of ‘Preferred’. That meant that on Firefox, no PIN was requested since CTAP1 can’t provide them, but on Edge/Chromium a PIN can be provided as they use CTAP2.

What's more curious is that the same authenticator device is flipping between Single-Factor and Multi-Factor, with the same private/public key pair, just based on what protocol is used! So even though the 'Preferred' request can be satisfied on Chromium/Edge, it's not on Firefox. To further extend my confusion, the device was originally registered to the corp identity system in Firefox, so it would not have had user verification available, but now that I use Edge it has gained this requirement during authentication.

That seems … wrong.

I agree. But Webauthn fully allows this. This is because user verification is a property of the request/response flow, not a property of the device.

This creates some interesting side effects that become an opportunity for user confusion. (I was confused about what the behaviour was and I write a webauthn server and client library - imagine how other people feel …).

Devices change behaviour

This means that during registration one policy can be requested (i.e. Required) but subsequently it may not be enforced (Preferred + Firefox + U2F, or Discouraged). Another example of a change in behaviour: a device used on Chromium with Preferred will require user verification, but the same device used on Firefox may not. It also means that a site that implements Required can have devices that simply don't work in other browsers.

Because this is changing behaviour it can confuse users. For example:

  • Why do I need a PIN now but not before?
  • Why did I need a PIN before but not now?
  • Why does my authenticator work on this computer but not on another?

Preferred becomes Discouraged

This opens up a security risk: since Preferred "attempts" verification but allows it to be absent, a U2F device can be "downgraded" from Multi-Factor to Single-Factor by using it with CTAP1 instead of CTAP2. And since the setting is per request/response, a compromised client could silently strip the requested userVerification parameter from the communication to the authenticator, and the server would allow it.

This means that in reality, Preferred is policy- and security-wise equivalent to Discouraged, but with a more annoying UI/UX for users, who have to conduct a verification that doesn't actually help identify them.

Remember - if unspecified, ‘Preferred’ is the default user verification policy in Webauthn!

Lock Out / Abuse Vectors

There is also a potential abuse vector here. Many devices such as U2F tokens perform a "trust on first use" for their PIN setup. This means that the first time user verification is requested, the PIN is configured at that point in time.

Consider a token that is always used on Firefox: a malicious person could connect the device to Chromium and set up the PIN without the knowledge of the owner. The owner could continue to use the device, and when Firefox eventually supports CTAP2, or they swap computer or browser, they would not know the PIN and their token would effectively be unusable at that point. They would need to reset it, potentially locking them out of accounts, but more likely requiring a lot of password/credential resets.

Unable to implement Authenticator Policy

One of the greatest issues here, though, is that because user verification is part of the request/response flow and not a per-device attribute, authenticator policy and mixed credentials cannot exist in the current implementation of Webauthn.

Consider a user who has enrolled, say, their laptop's U2F device + password, and their iPhone's TouchID to a server. Both of these are Multi-Factor credentials. The U2F token is a Single-Factor Device and becomes Multi-Factor in combination with the password. The iPhone's TouchID is a Multi-Factor Device on its own due to the biometric verification it is capable of.

We should be able to have a website request webauthn and, based on the device used, flow to the next required step. If the device was the iPhone, we would be authenticated, as we have authenticated a Multi-Factor credential. If we saw the U2F device we would then move on to request the password, since we have only received a single factor. However, Webauthn is unable to express this authentication flow.

If we requested Required, we would exclude the U2F device.

If we requested Discouraged, we would exclude the iPhone.

If we request Preferred, the U2F device could be used on a different browser with CTAP2, either:

  • bypassing the password, since the device now acts as a self-contained Multi-Factor credential; or
  • prompting for the PIN needlessly before we progress to requesting the password anyway

The request to an iPhone could also be tampered with, preventing the verification from occurring and turning it into a Single-Factor device (presence only).

Today, these mixed device scenarios cannot exist in Webauthn. We are unable to create the policy around Single-Factor and Multi-Factor devices as defined by NIST, because that requires us to assert the verification requirements per credential, and Webauthn cannot satisfy this.

We would need to pre-ask the user how they want to authenticate on that device and then only send a Webauthn challenge that can satisfy the authentication policy we have decided on for those credentials.
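
To make the desired flow concrete, here is a minimal Rust sketch of the server-side logic described above, using hypothetical types; this is precisely the flow Webauthn cannot express today, so a server would have to track these per-credential properties itself.

// Hypothetical types - not part of Webauthn or any library.
#[derive(Clone, Copy)]
enum CredentialKind {
    MultiFactorDevice,  // e.g. TouchID: verification happens on the device
    SingleFactorDevice, // e.g. a plain U2F token: presence only
}

enum NextStep {
    Authenticated,
    RequestPassword,
}

fn after_webauthn_assertion(kind: CredentialKind) -> NextStep {
    match kind {
        // Something you have + something you are: MFA is already satisfied.
        CredentialKind::MultiFactorDevice => NextStep::Authenticated,
        // Presence only: combine with a password to reach MFA.
        CredentialKind::SingleFactorDevice => NextStep::RequestPassword,
    }
}

fn main() {
    assert!(matches!(
        after_webauthn_assertion(CredentialKind::SingleFactorDevice),
        NextStep::RequestPassword
    ));
}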

How to fix this

The solution here is to change PublicKeyCredentialDescriptor in the Webauthn standard to contain an optional UserVerificationRequirement field. This would allow a "global" default set by the server, with per-credential requirements defined on top of it. The user verification properties established during registration would be associated with that credential and could then be enforced by the server to guarantee the behaviour of a webauthn device. It would also give the 'Preferred' option a valid and useful meaning during registration: devices capable of verification can provide it or not, and that verification boolean can then be transformed into a Discouraged or Required setting for that credential for future authentications.

The second change would be to disallow ‘Preferred’ as a valid value in the “global” default during authentications. The new “default” global value should be ‘Discouraged’ and then only credentials that registered with verification would indicate that in their PublicKeyCredentialDescriptor.
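
As a rough illustration of the proposal, the Rust sketch below shows a per-credential setting overriding a global default. The type and field names are mine, not from the Webauthn specification or Webauthn RS.

// Sketch only: a per-credential user verification requirement, recorded at
// registration, overriding the server's "global" default at authentication.
#[derive(Clone, Copy, PartialEq, Debug)]
enum UserVerification {
    Discouraged,
    Required,
}

struct CredentialDescriptor {
    credential_id: Vec<u8>,
    // The proposed addition: set from the registration result, then enforced.
    user_verification: Option<UserVerification>,
}

fn effective_policy(global: UserVerification, cred: &CredentialDescriptor) -> UserVerification {
    // The per-credential setting wins; otherwise fall back to the global
    // default, which (per the second change) would never be "Preferred".
    cred.user_verification.unwrap_or(global)
}

fn main() {
    let cred = CredentialDescriptor {
        credential_id: vec![0x01, 0x02],
        user_verification: Some(UserVerification::Required),
    };
    assert_eq!(
        effective_policy(UserVerification::Discouraged, &cred),
        UserVerification::Required
    );
}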

This would resolve the issues above by:

  • Making the use of an authenticator consistent after registration. For example, authenticators registered with CTAP1 would stay ‘Discouraged’ even when used with CTAP2
  • If PIN/Verification abuse occurred, the credentials registered on CTAP1 without verification would continue to be ‘presence only’ preventing the lockout
  • Allowing the server to proceed with the authentication flow based on which credential authenticated and provide logic about further factors if needed.
  • Allowing true Single Factor and Multi Factor device policies to be expressed in line with NIST SP800-63b, so users can have a mix of Single and Multi Factor devices associated with a single account.

I have since opened an issue with the webauthn specification about this, but early comments seem highly focused on the current expression of the standard rather than on the user experience issues and the ability for identity systems to accurately express credential policy.

In the meantime, I am going to make changes to Webauthn RS to help avoid some of these issues:

  • Preferred will be renamed to Preferred_Is_Equivalent_To_Discouraged (it will still emit ‘Preferred’ in the JSON, this only changes the Rust API enum)
  • Credential structures persisted by applications will contain a boolean recording whether user verification occurred during registration
  • During an authentication, if the set of credentials contains inconsistent user-verification booleans, an error will be raised
  • Authentication User Verification Policy is derived from the set of credentials having a consistent user-verification boolean

While not perfect, it will mean that it’s “hard to hold it wrong” with Webauthn RS.
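
As a rough sketch of that consistency check (not the actual Webauthn RS code), deriving the authentication policy from the stored per-credential verification booleans might look like this:

// Sketch only: derive a single policy from the registered credentials'
// user-verification booleans, refusing to proceed if they disagree.
#[derive(Debug, PartialEq)]
enum DerivedPolicy {
    Discouraged,
    Required,
}

#[derive(Debug)]
struct InconsistentCredentials;

fn derive_policy(user_verified: &[bool]) -> Result<DerivedPolicy, InconsistentCredentials> {
    match user_verified {
        // No credentials registered: nothing to derive.
        [] => Err(InconsistentCredentials),
        // All credentials agree: map the shared boolean to a policy.
        [first, rest @ ..] if rest.iter().all(|v| v == first) => Ok(if *first {
            DerivedPolicy::Required
        } else {
            DerivedPolicy::Discouraged
        }),
        // A mix of verified and unverified credentials: refuse to guess.
        _ => Err(InconsistentCredentials),
    }
}

fn main() {
    assert_eq!(derive_policy(&[true, true]).unwrap(), DerivedPolicy::Required);
    assert!(derive_policy(&[true, false]).is_err());
}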

Acknowledgements

Thanks to both @Charcol0x89 and @JuxhinDB for reviewing this post.

Rust, SIMD and target-feature flags

Posted by William Brown on November 19, 2020 02:00 PM

Rust, SIMD and target-feature flags

This year I’ve been working on concread and one of the ways that I have improved it is through the use of packed_simd for parallel key lookups in hashmaps. During testing I saw a ~10% speed up in Kanidm which heavily relies on concread, so great, pack it up, go home.

…?

Or so I thought. Recently I was learning to use Ghidra with a friend, and as a thought exercise I wanted to see how Rust decompiled. I put the concread test suite into Ghidra and took a look. Looking at the version of concread with simd_support enabled, I saw this in the disassembly (truncated for readability).

  **************************************************************
  *                          FUNCTION                          *
  **************************************************************
  Simd<[packed_simd_2--masks--m64;8]> __stdcall eq(Simd<[p
 ...
100114510 55              PUSH       RBP
100114511 48 89 e5        MOV        RBP,RSP
100114514 48 83 e4 c0     AND        RSP,-0x40
100114518 48 81 ec        SUB        RSP,0x100
          00 01 00 00
10011451f 48 89 f8        MOV        RAX,__return_storage_ptr__
100114522 0f 28 06        MOVAPS     XMM0,xmmword ptr [self->__0.__0]
 ...
100114540 66 0f 76 c4     PCMPEQD    XMM0,XMM4
100114544 66 0f 70        PSHUFD     XMM4,XMM0,0xb1
          e0 b1
100114549 66 0f db c4     PAND       XMM0,XMM4
 ...
100114574 0f 29 9c        MOVAPS     xmmword ptr [RSP + local_90],XMM3
          24 b0 00
          00 00
1001145b4 48 89 7c        MOV        qword ptr [RSP + local_c8],__return_storage_pt
          24 78
 ...
1001145be 0f 29 44        MOVAPS     xmmword ptr [RSP + local_e0],XMM0
          24 60
 ...
1001145d2 48 8b 44        MOV        RAX,qword ptr [RSP + local_c8]
          24 78
1001145d7 0f 28 44        MOVAPS     XMM0,xmmword ptr [RSP + local_e0]
          24 60
1001145dc 0f 29 00        MOVAPS     xmmword ptr [RAX],XMM0
 ...
1001145ff 48 89 ec        MOV        RSP,RBP
100114602 5d              POP        RBP
100114603 c3              RET

Now, it's been a long time since I've had to look at x86_64 asm, so I saw this and went "great, it's not using a loop, those aren't simple TEST/JNZ instructions, they have a lot of letters - awesome, it's using HW accel".

Time passes …

Coming back to this, I have been wondering how we could enable SIMD in concread at SUSE, since 389 Directory Server has just merged a change for 2.0.0 that uses concread as a cache. For this I needed to know what minimum CPU is supported at SUSE. After some internal chasing to establish what we need, I asked in the Rust Brisbane group how you can tell packed_simd to only emit instructions that work on a minimum CPU level rather than on my CPU or the builder's CPU.

The response was “but that’s already how it works”.

I was helpfully directed to the packed_simd perf guide, which discusses the use of target features and target CPU. At that point I realised that this whole time I've only been using the default:

# rustc --print cfg | grep -i target_feature
target_feature="fxsr"
target_feature="sse"
target_feature="sse2"

The PCMPEQD is from SSE2, but my CPU is much newer and should support AVX and AVX2. Retesting this, I can see my CPU supports much more:

# rustc --print cfg -C target-cpu=native | grep -i target_feature
target_feature="aes"
target_feature="avx"
target_feature="avx2"
target_feature="bmi1"
target_feature="bmi2"
target_feature="fma"
target_feature="fxsr"
target_feature="lzcnt"
target_feature="pclmulqdq"
target_feature="popcnt"
target_feature="rdrand"
target_feature="rdseed"
target_feature="sse"
target_feature="sse2"
target_feature="sse3"
target_feature="sse4.1"
target_feature="sse4.2"
target_feature="ssse3"
target_feature="xsave"
target_feature="xsavec"
target_feature="xsaveopt"
target_feature="xsaves"

All this time, I haven’t been using my native features!

For local builds now, I have .cargo/config set with:

[build]
rustflags = "-C target-cpu=native"

I recompiled concread and I now see in Ghidra:

00198960 55              PUSH       RBP
00198961 48 89 e5        MOV        RBP,RSP
00198964 48 83 e4 c0     AND        RSP,-0x40
00198968 48 81 ec        SUB        RSP,0x100
         00 01 00 00
0019896f 48 89 f8        MOV        RAX,__return_storage_ptr__
00198972 c5 fc 28 06     VMOVAPS    YMM0,ymmword ptr [self->__0.__0]
00198976 c5 fc 28        VMOVAPS    YMM1,ymmword ptr [RSI + self->__0.__4]
         4e 20
0019897b c5 fc 28 12     VMOVAPS    YMM2,ymmword ptr [other->__0.__0]
0019897f c5 fc 28        VMOVAPS    YMM3,ymmword ptr [RDX + other->__0.__4]
         5a 20
00198984 c4 e2 7d        VPCMPEQQ   YMM0,YMM0,YMM2
         29 c2
00198989 c4 e2 75        VPCMPEQQ   YMM1,YMM1,YMM3
         29 cb
0019898e c5 fc 29        VMOVAPS    ymmword ptr [RSP + local_a0[0]],YMM1
         8c 24 a0
         00 00 00
...
001989e7 48 89 ec        MOV        RSP,RBP
001989ea 5d              POP        RBP
001989eb c5 f8 77        VZEROUPPER
001989ee c3              RET

VPCMPEQQ is the AVX2 compare instruction (you can tell it's AVX2 because it operates on the 256-bit YMM registers; the SSE versions use the 128-bit XMM registers). Which means now I'm getting the SIMD comparisons I wanted!

These can be enabled with RUSTFLAGS='-C target-feature=+avx2,+avx' for selected builds, or in your .cargo/config. For local development it may be a good idea to just use target-cpu=native.
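
If you need a single binary that runs on older CPUs but still uses AVX2 where it is available, runtime feature detection is another option. The sketch below (x86_64 only) uses plain std::arch intrinsics with is_x86_feature_detected!; it only illustrates the mechanism and is not how packed_simd or concread implement their comparisons.

// Compile just this function with AVX2 enabled; only call it after checking
// that the running CPU actually supports AVX2.
#[target_feature(enable = "avx2")]
unsafe fn eq_avx2(a: &[u64; 4], b: &[u64; 4]) -> bool {
    use std::arch::x86_64::*;
    let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
    let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
    // VPCMPEQQ: each 64-bit lane becomes all-ones where the inputs are equal.
    let cmp = _mm256_cmpeq_epi64(va, vb);
    // A full byte mask (-1 as i32) means every lane compared equal.
    _mm256_movemask_epi8(cmp) == -1
}

fn eq(a: &[u64; 4], b: &[u64; 4]) -> bool {
    if is_x86_feature_detected!("avx2") {
        // Safe to call because we just verified AVX2 support at runtime.
        return unsafe { eq_avx2(a, b) };
    }
    a == b // portable scalar fallback
}

fn main() {
    assert!(eq(&[1, 2, 3, 4], &[1, 2, 3, 4]));
    assert!(!eq(&[1, 2, 3, 4], &[1, 2, 3, 5]));
}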

Deploying sccache on SUSE

Posted by William Brown on November 18, 2020 02:00 PM

Deploying sccache on SUSE

sccache is a ccache/icecc-like tool from Mozilla, which in addition to working with C and C++, is also able to help with Rust builds.

Adding the Repo

A submission to Factory (Tumbleweed) has been made, so check if you can install it from zypper:

zypper install sccache

If not, sccache is still part of devel:tools:building, so you will need to add that repo to use sccache.

zypper ar -f obs://devel:tools:building devel:tools:building
zypper install sccache

It's also important that you do not have ccache installed. ccache intercepts the gcc command, so you could potentially end up "double caching".

zypper rm ccache

Single Host

To use sccache on your host, you need to set the following environment variables:

export RUSTC_WRAPPER=sccache
export CC="sccache /usr/bin/gcc"
export CXX="sccache /usr/bin/g++"
# Optional: This can improve rust caching
# export CARGO_INCREMENTAL=false

This will allow sccache to wrap your compiler commands. You can show your current sccache status with:

sccache -s

There is more information about using cloud/remote storage for the cache on the sccache project site.

Distributed Compilation

sccache is also capable of distributed compilation, where a number of builder servers can compile items and return the artifacts to your machine. This can save you time by allowing compilation over a cluster, using a faster remote builder, or just helping to keep your laptop cool.

Three components are needed to make this work. A scheduler that coordinates the activities, one or more builders that provide their CPU, and a client that submits compilation jobs.

The sccache package contains the required elements for all three parts.

Note that the client does not need to be the same version of SUSE or even the same distro as the scheduler or builder. This is because the client is able to bundle and submit its toolchains to the workers on the fly. Neat! sccache is also capable of compiling for macOS and Windows, but in those cases the toolchains cannot be submitted on the fly and require extra work to configure.

Scheduler

The scheduler is configured with /etc/sccache/scheduler.conf. You need to define the listening ip, client auth, and server (builder) auth methods. The example configuration is well commented to help with this:

# The socket address the scheduler will listen on. It's strongly recommended
# to listen on localhost and put a HTTPS server in front of it.
public_addr = "127.0.0.1:10600"
# public_addr = "[::1]:10600"

[client_auth]
# This is how a client will authenticate to the scheduler.
# # sccache-dist auth generate-shared-token --help
type = "token"
token = "token here"
#
# type = "jwt_hs256"
# secret_key = ""

[server_auth]
# sccache-dist auth --help
# To generate the secret_key:
# # sccache-dist auth generate-jwt-hs256-key
# To generate a key for a builder, use the command:
# # sccache-dist auth generate-jwt-hs256-server-token --config /etc/sccache/scheduler.conf --secret-key "..." --server "builderip:builderport"
type = "jwt_hs256"
secret_key = "my secret key"

You can start the scheduler with:

systemctl start sccache-dist-scheduler.service

If you have issues you can increase logging verbosity with:

# systemctl edit sccache-dist-scheduler.service
[Service]
Environment="RUST_LOG=sccache=trace"

Builder

Similar to the scheduler, the builder is configured with /etc/sccache/builder.conf. Most of the defaults should be left “as is” but you will need to add the token generated from the comments in scheduler.conf - server_auth.

# This is where client toolchains will be stored.
# You should not need to change this as it is configured to work with systemd.
cache_dir = "/var/cache/sccache-builder/toolchains"
# The maximum size of the toolchain cache, in bytes.
# If unspecified the default is 10GB.
# toolchain_cache_size = 10737418240
# A public IP address and port that clients will use to connect to this builder.
public_addr = "127.0.0.1:10501"
# public_addr = "[::1]:10501"

# The URL used to connect to the scheduler (should use https, given an ideal
# setup of a HTTPS server in front of the scheduler)
scheduler_url = "https://127.0.0.1:10600"

[builder]
type = "overlay"
# The directory under which a sandboxed filesystem will be created for builds.
# You should not need to change this as it is configured to work with systemd.
build_dir = "/var/cache/sccache-builder/tmp"
# The path to the bubblewrap version 0.3.0+ `bwrap` binary.
# You should not need to change this as it is configured for a default SUSE install.
bwrap_path = "/usr/bin/bwrap"

[scheduler_auth]
type = "jwt_token"
# This will be generated by the `generate-jwt-hs256-server-token` command or
# provided by an administrator of the sccache cluster. See /etc/sccache/scheduler.conf
token = "token goes here"

Again, you can start the builder with:

systemctl start sccache-dist-builder.service

If you have issues you can increase logging verbosity with:

# systemctl edit sccache-dist-builder.service
[Service]
Environment="RUST_LOG=sccache=trace"

You can configure many hosts as builders, and compilation jobs will be distributed amongst them.

Client

The client is the part that submits compilation work. You need to configure your machine the same as in the single host setup with regard to the environment variables.

Additionally you need to configure the file ~/.config/sccache/config. An example of this can be found in /etc/sccache/client.example.

[dist]
# The URL used to connect to the scheduler (should use https, given an ideal
# setup of a HTTPS server in front of the scheduler)
scheduler_url = "http://x.x.x.x:10600"
# Used for mapping local toolchains to remote cross-compile toolchains. Empty in
# this example where the client and build server are both Linux.
toolchains = []
# Size of the local toolchain cache, in bytes (5GB here, 10GB if unspecified).
# toolchain_cache_size = 5368709120

cache_dir = "/tmp/toolchains"

[dist.auth]
type = "token"
# This should match the `client_auth` section of the scheduler config.
token = ""

You can check the status with:

sccache --stop-server
sccache --dist-status

If you have issues, you can increase the logging with:

sccache --stop-server
SCCACHE_NO_DAEMON=1 RUST_LOG=sccache=trace sccache --dist-status

Then begin a compilation job and you will get the extra logging. To undo this, run:

sccache --stop-server
sccache --dist-status

In addition, sccache even in distributed mode can still use cloud or remote storage for items, using its cache first and distributed compilation second. Anything that can't be remotely compiled will be run locally.

Verifying

If you compile something from your client, you should see messages like this appear in journald on the builder/scheduler machine:

INFO 2020-11-19T22:23:46Z: sccache_dist: Job 140 created and will be assigned to server ServerId(V4(x.x.x.x:10501))
INFO 2020-11-19T22:23:46Z: sccache_dist: Job 140 successfully assigned and saved with state Ready
INFO 2020-11-19T22:23:46Z: sccache_dist: Job 140 updated state to Started
INFO 2020-11-19T22:23:46Z: sccache_dist: Job 140 updated state to Complete

The Last Two Years

Posted by Mo Morsi on November 01, 2020 04:00 AM

The last post on this blog (now mobile friendly!) was just over two years ago, shortly after Dev Null Productions was launched and long before COVID-19. That world is now foreign and as we live and grow, we transform in ways we never expected; our experiences drive us to new insights and perspectives. The past few years have seen a lot of changes, which in turn has led to much reflection. Meditation helps day in and day out in this increasingly complex world. There are many great articles on the subject; I encourage all to read the one that introduced me to it many years ago.

I also advise caution to those finding themselves spending a large amount of time in The Matrix. While technical development can be fruitful and rewarding, there is a lot to be said for balance in life and human connection. Any worthwhile endeavor takes many hours and focus, especially one as complex as launching a business, and each individual needs to undertake their own journey. Dev Null Productions moves forward and I'm excited for the things that are next in store.

But before that I'll be taking some personal time off. Many hours were spent during the first half of the year running the NYC/XRP meetup and building and launching our latest project Zerp Tracker. I'm proud of what we delivered and the product is working beautifully with no manual intervention, but the body and mind need to recuperate.

I've been playing a lot of guitar, honing my rollerblading technique, and am planning on launching a cross-country road trip in the near future... stay tuned for photos! Unfortunately the pandemic and looming economic crisis may affect that, the extent and ramifications of which no one will be able to foresee, but the open road calls and I look forward to the sights and experiences across this great planet.

Data & Market Analysis in C++, R, and Python

Posted by Mo Morsi on October 30, 2020 06:22 AM

In recent years, since efforts on The Omega Project and The Guild sort of fizzled out, I've been exploring various areas of interest with no particular intent other than to play around with some ideas. Data & Financial Engineering was one of those domains and having spent some time diving into the subject (before once again moving on to something else altogether) I'm sharing a few findings here.

My journey down this path started not too long after the Bitcoin Barber Shop Pole was completed, and I was looking for a new project to occupy my free time (the little of it that I have). Having long since stepped down from the SIG315 board, but still renting a private office at the space, I was looking for some way to incorporate that into my next project (besides just using it as the occasional place to work). Brainstorming a bit, I settled on a data visualization idea, where data relating to any number of categories would be aggregated, geotagged, and then projected onto a virtual globe. I decided to use the Marble widget library, built on top of the Qt Framework, and had great success:

Datachoppa

The architecture behind the DataChoppa project was simple: a generic 'Data' class was implemented using smart pointers, on top of which the Facet Pattern was incorporated, allowing data to be recorded from any number of sources in a generic manner and represented via convenient high level accessors. Data was collected via synchronization and generation plugins implementing a standardized interface, whose output was fed onto a queue that processing plugins were listening on, each selecting the data it was interested in operating on. The processors themselves could put more data onto the queue, after which the whole process repeated ad infinitum, allowing each plugin to satisfy one bit of data-related functionality.

Datachoppa arch

Core Generic & Data Classes

namespace DataChoppa{
  // Generic value container
  class Generic{
      Map<std::string, boost::any> values;
      Map<std::string, std::string> value_strings;
  };

  namespace Data{
    /// Data representation using generic values
    class Data : public Generic{
      public:
        Data() = default;
        Data(const Data& data) = default;

        Data(const Generic& generic, TYPES _types, const Source* _source) :
          Generic(generic), types(_types), source(_source) {}

        bool of_type(TYPE type) const;

        Vector to_vector() const;

      private:
        TYPES types;

        const Source* source;
    }; // class Data
  }; // namespace Data
}; // namespace DataChoppa

The Process Loop

  namespace DataChoppa {
    namespace Framework{
      void Processor::process_next(){
        if(to_process.empty()) return;
  
        Data::Data data = to_process.first();
        to_process.pop_front();
  
        Plugins::Processors::iterator plugin = plugins.begin();
  
        while(plugin != plugins.end()) {
          Plugins::Meta* meta = dynamic_cast<Plugins::Meta*>(*plugin);
          //LOG(debug) << "Processing " << meta->id;
  
          try{
            queue((*plugin)->process(data));
  
          }catch(const Exceptions::Exception& e){
            LOG(warning) << "Error when processing: " << e.what()
                         << " via " << meta->id;
          }
  
          plugin++;
        }
      }
    }; /// namespace Framework
  }; /// namespace DataChoppa

The HTTP Plugin (abridged)

namespace DataChoppa {
  namespace Plugins{
    class HTTP : public Framework::Plugins::Syncer,
                 public Framework::Plugins::Job,
                 public Framework::Plugins::Meta {
      public:
        /// ...

        /// sync - always return data to be added to queue, even on error
        Data::Vector sync(){
          String _url = url();
          Network::HTTP::SyncRequest request(_url, request_timeout);

          for(const Network::HTTP::Header& header : headers())
            request.header(header);

          int attempted = 0;
          Network::HTTP::Response response(request);

          while(attempts == -1 || attempted < attempts){
            ++attempted;

            try{
              response.update_from(request.request(payload()));

            }catch(Exceptions::Timeout){
              if(attempted == attempts){
                Data::Data result = response.to_error_data();
                result.source = &source;
                return result.to_vector();
              }
            }

            if(response.has_error()){
              if(attempted == attempts){
                Data::Data result = response.to_error_data();
                result.source = &source;
                return result.to_vector();
              }

            }else{
              Data::Data result = response.to_data();
              result.source = &source;
              return result.to_vector();
            }
          }

          /// we should never get here
          return Data::Vector();
        }
    };
  }; // namespace Plugins
}; // namespace DataChoppa

Overall I was pleased with the result (and perhaps I should have stopped there...). The application collected and aggregated data from many sources including RSS feeds (Google News, Reddit, etc), weather sources (Yahoo Weather, weather.com), social networks (Facebook, Twitter, Meetup, LinkedIn), chat protocols (IRC, Slack), financial sources, and much more. While exploring the last of these I discovered the world of technical analysis and began incorporating various market indicators into a financial analysis plugin for the project.

The Market Analysis Architecture

Datachoppa extractors Datachoppa annotators

Aroon Indicator (for example)

namespace DataChoppa{
  namespace Market {
    namespace Annotators {
      class Aroon : public Annotator {
        public:
          double aroon_up(const Quote& quote, int high_offset, double range){
            return ((range-1) - high_offset) / (range-1) * 100;
          }

          DoubleVector aroon_up(const Quotes& quotes, const Annotations::Extrema* extrema, int range){
            return quotes.collect<DoubleVector>([this, extrema, range](const Quote& q, int i){
                     return aroon_up(q, extrema->high_offsets[i], range);
                   });
          }

          double aroon_down(const Quote& quote, int low_offset, double range){
            return ((range-1) - low_offset) / (range-1) * 100;
          }

          DoubleVector aroon_down(const Quotes& quotes, const Annotations::Extrema* extrema, int range){
            return quotes.collect<DoubleVector>([this, extrema, range](const Quote& q, int i){
                     return aroon_down(q, extrema->low_offsets[i], range);
                   });
          }

          AnnotationList annotate() const{
            const Quotes& quotes = market->quotes;
            if(quotes.size() < range) return AnnotationList();

            const Annotations::Extrema* extrema = aroon_extrema(market, range);
                    Annotations::Aroon* aroon = new Annotations::Aroon(range);
                                        aroon->upper = aroon_up(market->quotes, extrema, range);
                                        aroon->lower = aroon_down(market->quotes, extrema, range);
            return aroon->to_list();
          }
      }; /// class Aroon
    }; /// namespace Annotators
  }; /// namespace Market
}; // namespace DataChoppa

The whole thing worked great: data was pulled in, both real time and historical, from Yahoo Finance (until they discontinued it... from then on it was Google Finance), the indicators were run, and results were output. Of course, making $$$ is not as simple as just crunching numbers, and being rather naive I just tossed the results of the indicators into weighted "buckets" and backtested using simple boolean flags derived from comparing the computed signals against threshold values. Thankfully I backtested, though, as the performance was horrible - losses greatly exceeded profits :-(

At this point I should take a step back and note that my progress so far was the result of the availability of a lot of great resources (we really live in the age of accelerated learning). Specifically, the following are indispensable books & sites for those interested in this subject:

  • stockcharts.com - Information on any indicator can be found on this site with details on how it is computed and how it can be used
  • investopedia - Sort of the Wikipedia of investment knowledge, offers great high level insights into how the market works and the financial world as it stands
  • Beyond Candlesticks - Though candlestick patterns have limited use, this is a great intro to the subject, and provides a good introduction to reading charts.
  • Nerds on Wall Street - A great book detailing the history of computational finance. Definitely a must-read if you are new to the domain, as it provides a concise high level history of how markets have worked the last few centuries and the various computational techniques employed to Seek Alpha
  • High Probability Trading - Provides insights as to the mentality and common pitfalls when trading.
Beyond candlesticks Nerds on wallstreet High prob trading

The last book is an excellent resource which conveys the importance of money and risk management, as well as the necessity to combine all factors, or as many factors as you can, when making financial decisions. In the end, I feel this is the gist of it: it's not solely a matter of luck (though there is an aspect of that to this), but rather patience, discipline, balance, and most importantly focus (similar to Aikido, but that's a topic for another time). There is no shorting it (unless you're talking about the assets themselves!), and if one does not have / take the necessary time to research and properly plan out and execute strategies, they will most likely fail (as most do according to the numbers).

It was at this point that I decided to take a step back and restrategize, and having reflected and discussed it with some acquaintances, I hedged my bets, cut my losses (tech-wise) and switched from C++ to another platform which would allow me to prototype and execute ideas quicker. A good amount of time had gone into the C++ project and it worked great, but it did not make sense to continue with a slower development cycle when faster options are available (and after all, every engineer knows time is our most precious resource).

Python and R are the natural choices for this project domain, as there is extensive support in both languages for market analysis, backtesting, and execution. I have used Python at various points in the past so it was easy to hit the ground running; R was new, but by this time no language really poses a serious surprise. The best way I can describe it is spreadsheets on steroids (not exactly, as rather than spreadsheets, data frames and matrices are the core components, but one can imagine R as being similar to the central execution environment behind Excel, Matlab, or other statistical software).

I quickly picked up quantmod and prototyped some volatility, trend-following, momentum, and other analysis signal generators in R, plotting them using the provided charting interface. R is a great language for this sort of data manipulation: one can quickly load up structured data from CSV files or online resources, slice it and dice it, chunk it and dunk it, organize it and prioritize it, according to any arithmetic, statistical, or linear/non-linear means they desire. Loading a new 'view' on the data is as simple as a line of code, and operations can quickly be chained together at high performance.

Volatility indicator in R (consolidated)

quotes <- load_default_symbol("volatility")

quotes.atr <- ATR(quotes, n=ATR_RANGE)

quotes.atr$tr_atr_ratio <- quotes.atr$tr / quotes.atr$atr
quotes.atr$is_high      <- ifelse(quotes.atr$tr_atr_ratio > HIGH_LEVEL, TRUE, FALSE)

# Also Generate ratio of atr to close price
quotes.atr$atr_close_ratio <- quotes.atr$atr / Cl(quotes)

# Generate rising, falling, sideways indicators by calculating slope of ATR regression line
atr_lm       <- list()
atr_lm$df    <- data.frame(quotes.atr$atr, Time = index(quotes.atr))
atr_lm$model <- lm(atr ~ poly(Time, POLY_ORDER), data = atr_lm$df) # polynomial linear model

atr_lm$fit   <- fitted(atr_lm$model)
atr_lm$diff  <- diff(atr_lm$fit)
atr_lm$diff  <- as.xts(atr_lm$diff)

# Current ATR / Close Ratio
quotes.atr.abs_per <- median(quotes.atr$atr_close_ratio[!is.na(quotes.atr$atr_close_ratio)])

# plots
chartSeries(quotes.atr$atr)
addLines(predict(atr_lm$model))
addTA(quotes.atr$tr, type="h")
addTA(as.xts(as.logical(quotes.atr$is_high), index(quotes.atr)), col=col1, on=1)

While it all works great, the R language itself offers very little syntactic sugar for operations not related to data processing. While there are libraries for most common functionality found in many other execution environments, languages such as Ruby and Python offer a "friendlier" experience to both novice and seasoned developers alike. Furthermore, the process of data synchronization was a tedious step; I was looking for something that offered the flexibility of DataChoppa to pull in and process live and historical data from a wide variety of sources, caching results on the fly, and using those results and analysis for subsequent operations.

This all led me to developing a series of Python libraries targeted towards providing a configurable high level view of the market. Intelligence Amplification (IA) as opposed to Artificial Intelligence (AI), if you will (see Nerds on Wall Street).

marketquery.py is a high level market querying library which implements plugins used to resolve generic market queries for ticker time based data. One can use the interface to query for the latest quotes or a specific range of them from a particular source, or allow the framework to select one for you.

Retrieve first 3 months of the last 5 years of GBPUSD data

  from marketquery.querier        import Querier
  from marketbase.query.builder   import QueryBuilder
  
  sym = "GBPUSD"
  
  first_3mo_of_last_5yr = (QueryBuilder().symbol(sym)
                                         .first("3months_of_year")
                                         .last("5years")
                                         .query)
  
  querier = Querier()
  res     = querier.run(first_3mo_of_last_5yr)
  
  for query, dat in res.items():
      print(query)
      print(dat.raw[:1000] + (dat.raw[1000:] and '...'))

Retrieve the last two months of hourly EURJPY data

  from marketquery.querier        import Querier
  from marketbase.query.builder   import QueryBuilder
  
  sym = "EURJPY"
  
  two_months_of_hourly = (QueryBuilder().symbol(sym)
                                        .last("2months")
                                        .interval("hourly")
                                        .query)
  
  querier = Querier()
  res     = querier.run(two_months_of_hourly).raw()
  print(res[:1000] + (res[1000:] and '...'))

This provides a quick way both to look up market data according to specific criteria and to cache it so that network resources are used effectively. All caching is configurable, and the user can define timeouts based on the target query, source, and/or data retrieved.

From there the next level up is technical analysis. It was trivial to whip up the tacache.py module, which uses the marketquery.py interface to retrieve raw data before feeding it into TALib and caching the results. The same caching mechanisms offering the same flexibility are employed: if one needs to process a large data set and/or subsets multiple times in a specified period, computational resources are not wasted (important when running on a metered cloud).

Computing various technical indicators

  from marketquery.querier       import Querier
  from marketbase.query.builder  import QueryBuilder
  
  from tacache.runner            import TARunner
  from tacache.source            import Source
  from tacache.indicator         import Indicator
  from talib                     import SMA
  from talib                     import MACD
  
  ###
  
  res = Querier().run(QueryBuilder().symbol("AUDUSD")
                                    .query)
  
  ###
  
  ta_runner = TARunner()
  analysis  = ta_runner.run(Indicator(SMA),
                            query_result=res)
  print(analysis.raw)
  
  analysis  = ta_runner.run(Indicator(MACD),
                            query_result=res)
  macd, sig, hist = analysis.raw
  print(macd)

Finally, on top of all this I wrote a2m.py, a high level querying interface consisting of modules reporting on market volatility and trends as well as other metrics; python scripts which I could quickly execute to report the current and historical market state, making use of the underlying cached query and technical analysis data, periodically invalidated to pull in new/recent live data.

Example using a2m to compute volatility

  sym = "EURUSD"
  resolver  = Resolver()
  ta_runner = TARunner()

  daily = (QueryBuilder().symbol(sym)
                         .interval("daily")
                         .last("year")
                         .query)

  hourly = (QueryBuilder().symbol(sym)
                          .interval("hourly")
                          .last("3months")
                          .latest()
                          .query)

  current = (QueryBuilder().symbol(sym)
                           .latest()
                           .data_dict()
                           .query)

  daily_quotes   = resolver.run(daily)
  hourly_quotes  = resolver.run(hourly)
  current_quotes = resolver.run(current)

  daily_avg  = ta_runner.run(Indicator(talib.SMA, timeperiod=120),  query_result=daily_quotes).raw[-1]
  hourly_avg = ta_runner.run(Indicator(talib.SMA, timeperiod=30),  query_result=hourly_quotes).raw[-1]

  current_val    = current_quotes.raw()[-1]['Close']
  daily_percent  = current_val / daily_avg  if current_val < daily_avg  else daily_avg  / current_val
  hourly_percent = current_val / hourly_avg if current_val < hourly_avg else hourly_avg / current_val

Awesome to the max

I would go on to use this to execute some Forex trades, again not in an algorithmic / automated manner, but rather based on combined knowledge from fundamentals research as well as the high level technical data, and what was the result...

Poor squidward

I jest; though I did lose a little $$$, it wasn't that much, and to be honest I feel this was due to lack of patience/discipline and other "novice" mistakes as discussed above. I did make about half of it back, and then lost interest. This all requires a lot of focus and time, and I had already spent 2+ years worth of free time on this. With many other interests pulling my strings, I decided to sideline the project(s) altogether and focus on my next crazy venture.

TLDR;

After some consideration, I decided to release the R code I wrote under the MIT license. They are rather simple experiments, though they could be useful as a starting point for others new to the subject. As for the Python modules and DataChoppa, I intend to eventually release them, but aim to take a break first to focus on other efforts and then go back to the war room to figure out the next stage of the strategy.

And that's that! Enough number crunching, time to go out for a hike!

Hiking meme

Why I still choose Ruby

Posted by Mo Morsi on October 30, 2020 06:17 AM

With the plethora of languages available to developers, I wanted to do a quick follow-up post as to why, given my experience in many different environments, Ruby is still the go-to language for all my computational needs!

Prg mtn

While different languages offer different solutions in terms of syntax support, memory management, runtime guarantees, and execution flows, the underlying arithmetic, logical, and I/O hardware being controlled is the same. Thus in theory, given enough time and optimization, the performance differences between languages should go to 0 as computational power and capacity increases / goes to infinity (yes, yes, Moore's law and such, but let's ignore that for now).

Of course different classes of problem domains impose their own requirements,

  • real time processing depends on low level optimizations that can only be done in assembly and C,
  • data crunching and process parallelization often needs minimal latency and optimized runtimes, something which you only get with compiled/statically-typed languages such as C++ and Java,
  • and higher level languages such as Ruby, Python, Perl, and PHP are great for rapid development cycles and for providing high level constructs where complicated algorithms can be invoked via elegant / terse means.

But given the rapid rate of hardware performance improvements in recent years, whole classes of problems which were previously limited to 'lower-level' languages such as C and C++ can now feasibly be implemented in higher level languages.

Computer power

(source)

Thus we see high performance financial applications being implemented in Python, major websites with millions of users a day being implemented in Ruby and Javascript, massive data sets being crunched in R, and much more.

So putting the performance aspect of these environments aside, we need to look at the syntactic nature of these languages as well as the features and tools they offer developers. The last is the easiest to tackle, as these days most notable languages come with compilers/interpreters, debuggers, task systems, test suites, documentation engines, and much more. This was not always the case though: Ruby was one of the first languages to pioneer built-in package management through rubygems, and integrated dependency solutions via gemspecs, bundler, etc. CPAN and a few other language-specific online repositories existed before, but with Ruby you got integration that was a core part of the runtime environment and community support. Ruby is still known to be on the leading front of integrated and end-to-end solutions.

Syntax differences are a much more difficult subject to discuss objectively as much of it comes down to programmer preference, but it would be hard to object to the statement that Ruby is one of the most Object Oriented languages out there. It's not often that you can call the string conversion or type identification methods on ALL constructs, variables, constants, types, literals, primitives, etc:

  > 1.to_s
  => "1"
  > 1.class
  => Integer

Ruby also provides logical flow control constructs not seen in many other languages. For example, in addition to the standard if condition then dosomething paradigm, Ruby allows the user to place the condition after the action, eg dosomething if condition. This simple change allows developers to express concepts in a natural manner, akin to how they would often be described between humans. In addition to this, other simple syntax conveniences include:

  • The unless keyword, simply evaluating to if not
      File.write("/tmp/foobar", "Hello World") unless File.exist?("/tmp/foobar")
    
  • Methods are allowed to end with ? and ! which is great for specifying immutable methods (eg. Socket.open?), mutable methods and/or methods that can throw an exception (eg. DBRecord.save!)
  • Inclusive and exclusive ranges can be specified via parentheses and two or three dots. So for example:
      > (1..4).include?(4)
      => true
      > (1...4).include?(4)
      => false
    
  • The yield keyword makes it trivial for any method to accept and invoke a callback during the course of its lifetime
  • And much more

Expanding upon the last, blocks are a core concept in Ruby, one which the language nails right on the head. Not only can any function accept an anonymous callback block, blocks can be bound to parameters and operated on like any other data. You can check the number of parameters a callback accepts by invoking block.arity, dynamically dispatch blocks, save them for later invocation, and much more.

Due to the asynchronous nature of many software solutions (many problems can be modeled as asynchronous tasks) blocks fit into many Ruby paradigms, if not as the primary invocation mechanism, then as an optional mechanism so as to enforce various runtime guarantees:

  File.open("/tmp/foobar"){ |file|
    # do whatever with file here
  }

  # File is guaranteed to be closed here, we didn't have to close it ourselves!

By binding block contexts, Ruby facilitates implementing tightly tailored solutions for many problem domains via DSLs. Ruby DSLs exist for web development, system orchestration, workflow management, and much more. This of course is not to mention the other frameworks, such as the massively popular Rails, as well as other widely-used technologies such as Metasploit.

Finally, programming in Ruby is just fun. The language is conducive to expressing complex concepts elegantly, jives with many different programming paradigms and styles, and offers a quick prototype-to-production workflow that is intuitive for both novice and seasoned developers. Nothing quite scratches that itch like Ruby!

Doge ruby

RETerm to The Terminal with a GUI

Posted by Mo Morsi on October 30, 2020 06:11 AM

When it comes to user interfaces, most (if not all) software applications can be classified into one of three categories:

  • Text Based - whether they entail one-off commands, interactive terminals (REPL), or text-based visual widgets, these saw a major rise in the 50s-80s though were usurped by GUIs in the 80s-90s
  • Graphical - GUIs, or Graphical User Interfaces, facilitate creating visual windows which the user may interact with via the mouse or keyboard. There are many different GUI frameworks available for various platforms
  • Web Based - A special type of graphical interface rendered via a web browser, many applications provide their frontend via HTML, Javascript, & CSS
Interfaces comparison

In recent years modern interface trends seem to be moving in the direction of the Web User Interfaces (WUI), with increasing numbers of apps offering their functionality primarily via HTTP. That being said GUIs and TUIs (Text User Interfaces) are still an entrenched use case for various reasons:

  • Web browsers, servers, and network access may not be available or permissible on all systems
  • Systems need mechanisms to access and interact with the underlying components, in case higher level constructs, such as graphics and network subsystems, fail or are unreliable
  • Simpler text & graphical implementations can be coupled and optimized for the underlying operational environment without having to worry about portability and cross-env compatibility. Clients can thus be simpler and more robust.

Finally there is a certain pleasing aesthetic to simple text interfaces that you don't get with GUIs or WUIs. Of course this is a human-preference sort of thing, but it's often nice to return to our computational roots as we move into a future of complex gesture and voice controlled computer interactions.

Scifi terminal

When working on a recent side project (to be announced), I was exploring various concepts as to the user interface to throw on top of it. Because other solutions exist in the domain I'm working in (and for other reasons), I wanted to explore something novel as far as user interaction goes, and decided to experiment with a text-based approach. ncurses is the go-to library for this sort of thing, being available on most modern platforms, along with many widget libraries and high level wrappers.

Ncurses

Unfortunately ncurses comes with a lot of boilerplate and it made sense to separate that from the project I intend to use this for. Thus the RETerm library was born, with the intent to provide a high level DSL to implement terminal interfaces and applications (... in Ruby of course <3 !!!)

Reterm sc1

RETerm, aka the Ruby Enhanced TERMinal, allows the user to incorporate high level text-based widgets into an organized terminal window, with seamless standardized keyboard interactions (mouse support is on the roadmap). So for example, one could define a window containing a child widget like so:

require 'reterm'
include RETerm

value = nil

init_reterm {
  win = Window.new :rows => 10,
                   :cols => 30
  win.border!
  update_reterm

  slider = Components::VSlider.new
  win.component = slider
  value = slider.activate!
}

puts "Slider Value: #{value}"

This would result in the following interface containing a vertical slider:

Reterm sc2

RETerm ships with many built-in widgets including:

Text Entry

Reterm sc3

Clickable Button

Reterm sc4

Radio Switch/Rocker/Selectable List

Reterm sc5 Reterm sc6 Reterm sc7

Sliders (both horizontal and vertical)

Dial

Ascii Text (with many fonts via artii/figlet)

Reterm sc8

Images (via drawille)

Reterm sc9

RETerm is now available via rubygems. To install, simply:

  $ gem install reterm

That's All Folks... but wait, there is more!!! After all:

Delorian meme

For a bit of a value-add, I decided to implement a standard schema where text interfaces can be described in a JSON config file and loaded by the framework, similar to the XML schemas which GTK and Android use for their interfaces. One can simply describe their interface in JSON and the framework will instantiate the corresponding text interface:

{
  "window" : {
    "rows"      : 10,
    "cols"      : 50,
    "border"    : true,
    "component" : {
      "type" : "Entry",
      "init" : {
        "title" : "<C>Demo",
        "label" : "Enter Text: "
      }
    }
  }
}

Reterm sc10

To assist in generating this schema, I implemented a graphical designer, where components can be dragged and dropped into a 2D canvas to layout the interface.

That's right, you can now use a GUI based application to design a text-based interface.

Retro meme

The Designer itself can be found in the same repo as the RETerm project, loaded in the "designer/" subdir.

Reterm designer

To use it you need to install visualruby (a high level wrapper around ruby-gnome) like so:

  $ gem install visualruby

And that's it! (for real this time) This was certainly a fun side-project to a side-project (toss in a third "side-project" if you consider the designer to be its own thing!). As I return to the project using RETerm, I aim to revisit it every so often, adding new features, widgets, etc....

EOF

CLS

Into The Unknown - My Departure from RedHat

Posted by Mo Morsi on October 30, 2020 06:09 AM

In May 2006, a young starry-eyed intern walked into the large corporate lobby of RedHat's Centennial Campus in Raleigh, NC, to begin what would be a 12 year journey full of ups and downs, breakthroughs and setbacks, and many many memories. Flash forward to April 2018, when the "intern-turned-hardened-software-engineer" filed his resignation and ended his tenure at RedHat to venture into the risky but exciting world of self-employment / entrepreneurship... In case you were wondering, that former intern / Software Engineer is myself, and after nearly 12 years at RedHat, I finished my last day of employment on Friday April 13th, 2018.

Overall RedHat has been a great experience. I was able to work on many ground-breaking products and technologies, with many very talented individuals from across the spectrum and globe, in a manner that facilitated maximum professional and personal growth. It wasn't all sunshine and lollipops though, there were many setbacks, including many cancelled projects and dead-ends. That being said, I felt I was always able to speak my mind without fear of repercussion, and always strove to work on those items that mattered the most and had the furthest reaching impact.

Some (but certainly not all) of those items included:

  • The highly publicized, but now defunct, RHX project
  • The oVirt virtualization management (cloud) platform, where I was on the original development team, and helped build the first prototypes & implementation
  • The RedHat Ruby stack, which was a battle to get off the ground (given the prevalence of the Java and Python ecosystems, continuing to this day). This is one of the items I am most proud of; we addressed the needs of both the RedHat/Fedora and Ruby communities, building the necessary bridge logic to employ Ruby and Rails solutions in many enterprise production environments. This continues to this day as the team stays on top of upstream Ruby developments and provides robust support and solutions for downstream use
  • The RedHat CloudForms projects, on which I worked across several iterations, again including initial prototypes and standards, as well as ManageIQ integration.
  • ReFS reverse engineering and parser. The last major research topic that I explored during my tenure at RedHat, this was a very fun project where I built upon the sparse information about the filesystem internals that's out there, and was able to deduce the logic up to the point of being able to read directory lists and file contents and metadata out of a formatted filesystem. While there is plenty of work to go on this front, I'm confident that the published writeups are an excellent launching point for additional research as well as the development of formal tools to extract data from the filesystem.

My plans for the immediate future are to take a short break, then file to form an LLC and explore options under that umbrella. Of particular interest are crypto-currencies, specifically Ripple. I've recently begun developing an integrated wallet, ledger and market explorer, and statistical analysis framework called Wipple which I'm planning to continue working on and, if all goes according to plan, generating some revenue from. There is a lot of ??? between here and there, but that's the name of the game!

Until then, I'd like to thank everyone who helped me do my thing at RedHat, from landing my initial internships and the full-time position after that, to showing me the ropes, and not shooting me down when I put myself out there to work on and promote our innovative solutions and technologies!

Dev Null Productions

Posted by Mo Morsi on October 30, 2020 06:09 AM

After my departure from RedHat I was able to get some RnR, but quickly wanted to get a head start on my next venture. This is because I decided to put a cap on the amount of time that would be dedicated to trying to make "it" happen, and to pause at regular checkpoints to monitor progress. This is not to say I'm going to quit the endeavor at that point in the future (the timeframe of which I'm keeping private), but the intent is to drive focus and keep the ball moving forward objectively. While Omega was a great project to work on, both fun and the source of much growth and experience, I am not comfortable with the amount of time spent on it for what was gained. Hindsight is 20/20, but every good trader knows when to cut losses.

Dev Null Productions LLC. was launched four months ago in April 2018 and we haven't looked back. Our flagship product, Wipple XRP Intelligence, was launched shortly after, providing realtime access to the XRP network and high level stats and reporting. The product is under continued development and we've begun a social-media based marketing drive to promote it. Things are still early, and there is still a ways to go & obstacles to overcome (not to mention the crypto-currency bear market that we've been in for the last half year), but the progress has been great, and there are many more awesome features in the queue.

This Thursday, I am giving a presentation on XRP to the Syracuse Software Development Meetup, hosted at the Tech Garden, a tech incubator in Syracuse, NY. I aim to go over the XRP protocol, discussing both the history of Ripple and the technical details, as well as common use cases and gotchas from our experiences. The event is looking very solid, and there is already a large turnout and some great momentum growing, so I'm excited to participate and see how it all goes. While we're still in the early phases of development, I'm hoping to drive some interest in the project, and perhaps meet collaborators who'd like to come onboard for a percentage of ultimate profits!

Be sure to stay tuned for more updates and developments, until then, keep Rippling!

IBM To Buy RedHat & Dev Null Update

Posted by Mo Morsi on October 30, 2020 06:04 AM

I had no idea about the acquisition when I left Red Hat last spring, but I'll just leave this one here:

In other news, Dev Null is going great, the product has more or less stabilized, we're hitting up conferences all over the country to promote it, and it's gaining some traction with the XRP community on social media. There's still a long way to go to see Wipple reach full fruition but I'm excited by the progress and eager to continue driving development.

That's all for now, I have a laundry list of topics which I'd like to blog about but until I get some time to do so, they will have to stay on the backburner!

How a Search Query is Processed in Kanidm

Posted by William Brown on August 31, 2020 02:00 PM

How a Search Query is Processed in Kanidm

Databases from postgres to sqlite, mongodb, and even LDAP all need to take a query and turn that into a meaningful result set. This process can often seem like magic, especially when you consider an LDAP server is able to process thousands of parallel queries, with a database spanning millions of entries and still can return results in less than a millisecond. Even more impressive is that every one of these databases can be expected to return the correct result, every time. This level of performance, correctness and precision is an astounding feat of engineering, but is rooted in a simple set of design patterns.

Disclaimer

This will be a very long post. You may want to set aside some time for it :)

This post will discuss how Kanidm processes queries. This means that some implementation details are specific to the Kanidm project. However, conceptually this is very close to the operation of LDAP servers (389-ds, samba 4, openldap) and MongoDB, and certainly there are still many overlaps and similarities to SQLite and Postgres. At the least, I hope it gives you some foundation to research the specific behaviours of your chosen database.

This post does NOT discuss how creation or modification paths operate. That is likely worthy of a post of its own. That said, search relies heavily on correct function of the write paths, and they are always intertwined.

The referenced code and links relate to commit dbfe87e from 2020-08-24. The project may have changed since this point, so it’s best if you can look at the latest commits in the tree if possible.

Introduction

Kanidm uses a structured document store model, similar to LDAP or MongoDB. You can consider entries to be like a JSON document. For example,

{
    "class": [
        "object",
        "memberof",
        "account",
        "posixaccount"
    ],
    "displayname": [
        "William"
    ],
    "gidnumber": [
        "1000"
    ],
    "loginshell": [
        "/bin/zsh"
    ],
    "name": [
        "william"
    ],
    "uuid": [
        "5e01622e-740a-4bea-b694-e952653252b4"
    ],
    "memberof": [
        "admins",
        "users",
        "radius"
    ],
    "ssh_publickey": [
        {
            "tag": "laptop",
            "key": "...."
        }
    ]
}

Something of note here is that an entry has many attributes, and those attributes can consist of one or more values. Values themselves can be structured, such as the ssh_publickey value which has a tag and the public key, or the uuid which enforces uuid syntax.
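
To make that shape concrete, here is a minimal Rust sketch of a hypothetical multi-valued entry. The Entry and Value names are illustrative stand-ins, not Kanidm's real types (which are schema aware and far richer).

use std::collections::BTreeMap;

// Illustrative stand-ins only: an attribute maps to one or more values,
// and a value may itself be structured (like ssh_publickey above).
#[derive(Debug)]
enum Value {
    Utf8(String),
    SshKey { tag: String, key: String },
}

type Entry = BTreeMap<String, Vec<Value>>;

fn main() {
    let mut entry: Entry = BTreeMap::new();
    entry.insert(
        "class".to_string(),
        vec![Value::Utf8("object".to_string()), Value::Utf8("account".to_string())],
    );
    entry.insert("name".to_string(), vec![Value::Utf8("william".to_string())]);
    entry.insert(
        "ssh_publickey".to_string(),
        vec![Value::SshKey { tag: "laptop".to_string(), key: "....".to_string() }],
    );
    println!("{:#?}", entry);
}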

Filters / Queries

During a search we want to find entries that match specific attribute value assertions or attribute assertions. We also want to be able to use logic to express complex conditions in how we perform the search. We could consider the search in terms of SQL such as:

select * from entries where name = 'william' and class = 'account';

Or in LDAP syntax

(&(objectClass=account)(name=william))

In Kanidm JSON (which admittedly, is a bit rough, we don’t expect people to use this much!)

{ "and": [{"eq": ["class", "account"]}, {"eq": ["name": "william"]} ]}

Regardless of how we structure these, they are the same query. We want to find entries where the properties class=account and name=william hold true. There are many other types of logic we could apply (especially true for SQL), but in Kanidm we support the following proto(col) filters:

pub enum Filter {
    Eq(String, String),
    Sub(String, String),
    Pres(String),
    Or(Vec<Filter>),
    And(Vec<Filter>),
    AndNot(Box<Filter>),
    SelfUUID,
}

These represent:

  • Eq(uality) - an attribute of name, has at least one value matching the term
  • Sub(string) - an attribute of name, has at least one value matching the substring term
  • Pres(ence) - an attribute of name, regardless of value exists on the entry
  • Or - One or more of the nested conditions must evaluate to true
  • And - All nested conditions must be true, or the and returns false
  • AndNot - Within an And query, the inner term must not be true relative to the related and term
  • SelfUUID - A dynamic Eq(uality) where the authenticated user’s UUID is added. Essentially, this substitutes to “eq (uuid, selfuuid)”

Comparing to the previous example entry, we can see that { “and”: [{“eq”: [“class”, “account”]}, {“eq”: [“name”, “william”]} ]} would be true, where { “eq”: [“name”, “claire”]} would be false as no matching name attribute-value exists on the entry.
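
As a way to internalise those semantics, here is a small, self-contained Rust sketch that evaluates a simplified filter directly against an in-memory entry. It is not the server's indexed path (Sub and SelfUUID are omitted, and AndNot is reduced to plain negation), just an illustration of the boolean logic listed above.

use std::collections::BTreeMap;

// A simplified, in-memory model: entries are attribute -> values maps.
type Entry = BTreeMap<String, Vec<String>>;

// A cut-down version of the proto filter above (Sub and SelfUUID omitted).
enum Filter {
    Eq(String, String),
    Pres(String),
    Or(Vec<Filter>),
    And(Vec<Filter>),
    AndNot(Box<Filter>),
}

// Evaluate a filter against a single entry, mirroring the listed semantics.
fn matches(f: &Filter, e: &Entry) -> bool {
    match f {
        Filter::Eq(attr, value) => e.get(attr).map_or(false, |vs| vs.iter().any(|v| v == value)),
        Filter::Pres(attr) => e.contains_key(attr),
        Filter::Or(inner) => inner.iter().any(|f| matches(f, e)),
        Filter::And(inner) => inner.iter().all(|f| matches(f, e)),
        // Simplified: the real AndNot is evaluated relative to its sibling And terms.
        Filter::AndNot(inner) => !matches(inner, e),
    }
}

fn main() {
    let mut e: Entry = BTreeMap::new();
    e.insert("class".to_string(), vec!["object".to_string(), "account".to_string()]);
    e.insert("name".to_string(), vec!["william".to_string()]);

    let query = Filter::And(vec![
        Filter::Eq("class".to_string(), "account".to_string()),
        Filter::Eq("name".to_string(), "william".to_string()),
    ]);
    assert!(matches(&query, &e)); // true for william's entry

    let other = Filter::Eq("name".to_string(), "claire".to_string());
    assert!(!matches(&other, &e)); // false: no matching name value
}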

Receiving the Query

There are multiple ways that a query could find its way into Kanidm. It may be submitted from the raw search API, it could be generated from a REST endpoint request, or it may be translated via the LDAP compatibility layer. The most important part is that it is then received by a worker thread in the query server. For this discussion we’ll assume we received a raw search via the front end.

handle_search is the entry point of a worker thread to process a search operation. The first thing we do is begin a read transaction over the various elements of the database we need.

fn handle(&mut self, msg: SearchMessage, _: &mut Self::Context) -> Self::Result {
let mut audit = AuditScope::new("search", msg.eventid, self.log_level);
let res = lperf_op_segment!(&mut audit, "actors::v1_read::handle<SearchMessage>", || {
    // Begin a read
    let qs_read = self.qs.read();

The call to qs.read takes three transactions - the backend, the schema cache and the access control cache.

pub fn read(&self) -> QueryServerReadTransaction {
    QueryServerReadTransaction {
        be_txn: self.be.read(),
        schema: self.schema.read(),
        accesscontrols: self.accesscontrols.read(),
    }
}

The backend read takes two transactions internally - the database layers, and the indexing metadata cache.

pub fn read(&self) -> BackendReadTransaction {
    BackendReadTransaction {
        idlayer: UnsafeCell::new(self.idlayer.read()),
        idxmeta: self.idxmeta.read(),
    }
}

Once complete, we can now transform the submitted request into an internal event. By structuring all requests as events, we standardise all operations to a small set of operation types, and we ensure that all resources required are available in the event. The search event as processed stores an event origin, aka the identity of the caller. The search query is stored in the filter attribute, and the original query is stored in filter_orig. There is a reason for this duplication.

pub fn from_message(
    audit: &mut AuditScope,
    msg: SearchMessage,
    qs: &QueryServerReadTransaction,
) -> Result<Self, OperationError> {
    let event = Event::from_ro_uat(audit, qs, msg.uat.as_ref())?;
    let f = Filter::from_ro(audit, &event, &msg.req.filter, qs)?;
    // We do need to do this twice to account for the ignore_hidden
    // changes.
    let filter = f
        .clone()
        .into_ignore_hidden()
        .validate(qs.get_schema())
        .map_err(OperationError::SchemaViolation)?;
    let filter_orig = f
        .validate(qs.get_schema())
        .map_err(OperationError::SchemaViolation)?;
    Ok(SearchEvent {
        event,
        filter,
        filter_orig,
        // We can't get this from the SearchMessage because it's annoying with the
        // current macro design.
        attrs: None,
    })
}

As the filter is processed it is transformed by the server to change its semantics. This is due to the call to into_ignore_hidden. This function adds a wrapping layer to the outside of the query that hides certain classes of entries from view unless explicitly requested. In the case of kanidm this transformation is to add:

{ "and": [
    { "andnot" : { "or" [
        {"eq": ["class", "tombstone"]},
        {"eq": ["class", "recycled"]}
    }]},
    <original query>
]}

This prevents the display of deleted (recycle bin) entries, and the display of tombstones - marker entries representing that an entry with this UUID once existed in this location. These tombstones are part of the (future) eventually consistent replication machinery to allow deletes to be processed.

This is why filter_orig is stored. We require a copy of the “query as intended by the caller” for the purpose of checking access controls later. A user may not have access to the attribute “class” which would mean that the addition of the into_ignore_hidden could cause them to not have any results at all. We should not penalise the user for something they didn’t ask for!

After the query is transformed, we now validate its content. This validation ensures that queries contain only attributes that truly exist in schema, and that their representation in the query is sound. This prevents a number of security issues related to denial of service or possible information disclosure. Every attribute-value in the query is compared to the schema to ensure that it exists and has the correct syntax type.

Start Processing the Query

Now that the search event has been created and we know that it is valid within a set of rules, we can submit it to the search_ext(ernal) interface of the query server. Because everything we need is contained in the search event, we are able to process the query from this point. search_ext is a wrapper around the internal search that applies access controls to the results of the operation.

fn search_ext(
    &self,
    au: &mut AuditScope,
    se: &SearchEvent,
) -> Result<Vec<Entry<EntryReduced, EntryCommitted>>, OperationError> {
    lperf_segment!(au, "server::search_ext", || {
        /*
         * This just wraps search, but it's for the external interface
         * so as a result it also reduces the entry set's attributes at
         * the end.
         */
        let entries = self.search(au, se)?;

        let access = self.get_accesscontrols();
        access
            .search_filter_entry_attributes(au, se, entries)
            .map_err(|e| {
                // Log and fail if something went wrong.
                ladmin_error!(au, "Failed to filter entry attributes {:?}", e);
                e
            })
        // This now returns the reduced vec.
    })
}

The internal search function is now called, and we begin to prepare for the backend to handle the query.

We have a final transformation we must apply to the query that we intend to pass to the backend. We must attach metadata to the query elements so that we can perform informed optimisation of the query.

let be_txn = self.get_be_txn();
let idxmeta = be_txn.get_idxmeta_ref();
// Now resolve all references and indexes.
let vfr = lperf_trace_segment!(au, "server::search<filter_resolve>", || {
    se.filter.resolve(&se.event, Some(idxmeta))
})

This is done by retrieving indexing metadata from the backend, which defines which attributes and types of indexes exist. This indexing metadata is passed to the filter to be resolved. In the case of tests we may not pass index metadata, which is why filter resolve accounts for the possibility of idxmeta being None. The filter elements are transformed; for example, we change eq to carry a boolean indicating whether the attribute is indexed. In our example this would change the query:

{ "and": [
    { "andnot" : { "or" [
        {"eq": ["class", "tombstone"]},
        {"eq": ["class", "recycled"]}
    }]},
    { "and": [
        {"eq": ["class", "account"]},
        {"eq": ["name": "william"]}
    ]}
]}

To

{ "and": [
    { "andnot" : { "or" [
        {"eq": ["class", "tombstone", true]},
        {"eq": ["class", "recycled", true]}
    }]},
    { "and": [
        {"eq": ["class", "account", true]},
        {"eq": ["name": "william", true]}
    ]}
]}

With this metadata associated to the query, we can now submit it to the backend for processing.

Backend Processing

We are now in a position where the backend can begin to do work to actually process the query. The first step of the backend search function is to perform the final optimisation of the filter.

fn search(
    &self,
    au: &mut AuditScope,
    erl: &EventLimits,
    filt: &Filter<FilterValidResolved>,
) -> Result<Vec<Entry<EntrySealed, EntryCommitted>>, OperationError> {
    lperf_trace_segment!(au, "be::search", || {
        // Do a final optimise of the filter
        let filt =
            lperf_trace_segment!(au, "be::search<filt::optimise>", || { filt.optimise() });

Query optimisation is critical to make searches fast. In Kanidm it relies on a specific behaviour of the indexing application process. I will highlight that step shortly.

For now, the way query optimisation works is by sorting and folding terms in the query. This is because there are a number of logical equivalences, but also, thanks to the associated metadata and experience, we know that some terms are better evaluated before others. Optimisation relies on a sorting function that will rearrange terms as needed.

An example is that a nested and term can be folded into the parent, because logically an and inside an and is the same. Similarly for an or inside an or.

Within the and term, we can then rearrange the terms, because the order of the terms does not matter in an and or an or, only that the other logical elements hold true. We sort indexed equality terms first because we know that they are always going to resolve “faster” than the nested andnot term.

{ "and": [
    {"eq": ["class", "account", true]},
    {"eq": ["name": "william", true]},
    { "andnot" : { "or" [
        {"eq": ["class", "tombstone", true]},
        {"eq": ["class", "recycled", true]}
    }]}
]}
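
To illustrate the fold-and-sort idea (and only the idea - this is not Kanidm's optimise() implementation), here is a hedged Rust sketch that flattens nested and terms and orders indexed equality terms before the andnot:

// A toy filter tree. The bool on Eq is the "is this attribute indexed?"
// metadata described above. Illustrative only.
#[derive(Debug)]
enum F {
    Eq(String, String, bool),
    AndNot(Box<F>),
    And(Vec<F>),
    Or(Vec<F>),
}

fn optimise(f: F) -> F {
    match f {
        F::And(terms) => {
            // Recurse, then fold nested Ands into the parent
            // (an and inside an and is the same query).
            let mut flat = Vec::new();
            for t in terms.into_iter().map(optimise) {
                match t {
                    F::And(inner) => flat.extend(inner),
                    other => flat.push(other),
                }
            }
            // Sort so indexed equality terms are evaluated first and AndNot last.
            flat.sort_by_key(|t| match t {
                F::Eq(_, _, true) => 0,
                F::Eq(_, _, false) => 1,
                F::Or(_) | F::And(_) => 2,
                F::AndNot(_) => 3,
            });
            F::And(flat)
        }
        F::Or(terms) => F::Or(terms.into_iter().map(optimise).collect()),
        F::AndNot(inner) => F::AndNot(Box::new(optimise(*inner))),
        eq => eq,
    }
}

fn main() {
    let q = F::And(vec![
        F::AndNot(Box::new(F::Or(vec![
            F::Eq("class".into(), "tombstone".into(), true),
            F::Eq("class".into(), "recycled".into(), true),
        ]))),
        F::And(vec![
            F::Eq("class".into(), "account".into(), true),
            F::Eq("name".into(), "william".into(), true),
        ]),
    ]);
    // Prints the folded and re-ordered tree, matching the JSON shown above.
    println!("{:?}", optimise(q));
}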

In the future, an improvement here is to put name before class, which will happen as part of issue #238, which allows us to work out which indexes are going to yield the best information content, so we can sort them first in the query.

Finally, we are at the point where we can begin to actually load some data! 🎉

The filter is submitted to filter2idl. To understand this function, we need to understand how indexes and entries are stored.

let (idl, fplan) = lperf_trace_segment!(au, "be::search -> filter2idl", || {
    self.filter2idl(au, filt.to_inner(), FILTER_SEARCH_TEST_THRESHOLD)
})?;

All databases at the lowest levels are built on collections of key-value stores. That key-value store may be an in-memory tree or hashmap, or an on-disk tree. Some common stores are BDB, LMDB, SLED. In Kanidm we use SQLite as a key-value store, through tables that only contain two columns. The intent is to swap to SLED in the future once it gains transactions over a collection of trees, and the ability to create/remove trees in transactions.

The primary storage of all entries is in the table id2entry which has an id column (the key) and stores serialised entries in the data column.

Indexes are stored in a collection of their own tables, named in the scheme “idx_<type>_<attr>”. For example, “idx_eq_name” or “idx_pres_class”. These are stored as two columns, where the “key” column is a precomputed result of a value in the entry, and the “value” is a set of integer IDs related to the entries that contain the relevant match.

As a bit more of a graphic example, you can imagine these as:

id2entry
| id | data                       |
| 1  | { "name": "william", ... } |
| 2  | { "name": "claire", ... }  |

idx_eq_name
| key     | idl    |
| william | [1, ]  |
| claire  | [2, ]  |

idx_eq_class
| key     | idl          |
| account | [1, 2, ... ] |

As these are key-value stores, they are able to be cached through an in-memory key-value store to speed up the process. Initially, we’ll assume these are not cached.
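
As a rough illustration of that layout, here is a small Rust sketch that models id2entry and two equality indexes with in-memory maps instead of SQLite tables. The table names mirror the ones above; everything else is purely illustrative.

use std::collections::BTreeMap;

fn main() {
    // id2entry: entry id -> serialised entry (just a string here).
    let mut id2entry: BTreeMap<u64, String> = BTreeMap::new();
    id2entry.insert(1, r#"{ "name": "william", ... }"#.to_string());
    id2entry.insert(2, r#"{ "name": "claire", ... }"#.to_string());

    // idx_eq_name: precomputed value -> list of entry ids (the IDL).
    let mut idx_eq_name: BTreeMap<String, Vec<u64>> = BTreeMap::new();
    idx_eq_name.insert("william".to_string(), vec![1]);
    idx_eq_name.insert("claire".to_string(), vec![2]);

    // idx_eq_class: every account is listed under "account".
    let mut idx_eq_class: BTreeMap<String, Vec<u64>> = BTreeMap::new();
    idx_eq_class.insert("account".to_string(), vec![1, 2]);

    // Resolving {"eq": ["name", "william"]} is a single key lookup that yields
    // an id list, which is then used to fetch entries from id2entry.
    let idl = idx_eq_name.get("william").cloned().unwrap_or_default();
    for id in &idl {
        println!("candidate entry {}: {}", id, id2entry[id]);
    }
}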

filter2idl

Back to filter2idl. The query begins by processing the outer and term. As the and progresses, inner elements are iterated over and recursively sent to filter2idl.

FilterResolved::And(l) => {
    // First, setup the two filter lists. We always apply AndNot after positive
    // and terms.
    let (f_andnot, f_rem): (Vec<_>, Vec<_>) = l.iter().partition(|f| f.is_andnot());

    // We make this an iter, so everything comes off in order. if we used pop it means we
    // pull from the tail, which is the WORST item to start with!
    let mut f_rem_iter = f_rem.iter();

    // Setup the initial result.
    let (mut cand_idl, fp) = match f_rem_iter.next() {
        Some(f) => self.filter2idl(au, f, thres)?,
        None => {
            lfilter_error!(au, "WARNING: And filter was empty, or contains only AndNot, can not evaluate.");
            return Ok((IDL::Indexed(IDLBitRange::new()), FilterPlan::Invalid));
        }
    };
    ...

The first term we encounter is {“eq”: [“class”, “account”, true]}. At this point filter2idl is able to request the id list from the lower levels.

FilterResolved::Eq(attr, value, idx) => {
    if *idx {
        // Get the idx_key
        let idx_key = value.get_idx_eq_key();
        // Get the idl for this
        match self
            .get_idlayer()
            .get_idl(au, attr, &IndexType::EQUALITY, &idx_key)?
        {
            Some(idl) => (
                IDL::Indexed(idl),
                FilterPlan::EqIndexed(attr.to_string(), idx_key),
            ),
            None => (IDL::ALLIDS, FilterPlan::EqCorrupt(attr.to_string())),
        }
    } else {
        // Schema believes this is not indexed
        (IDL::ALLIDS, FilterPlan::EqUnindexed(attr.to_string()))
    }
}

The first level that is able to serve the request for the key to be resolved is the ARCache layer. This tries to look up the combination of (“class”, “account”, eq) in the cache. If found it is returned to the caller. If not, it is requested from the sqlite layer.

let cache_key = IdlCacheKey {
    a: $attr.to_string(),
    i: $itype.clone(),
    k: $idx_key.to_string(),
};
let cache_r = $self.idl_cache.get(&cache_key);
// If hit, continue.
if let Some(ref data) = cache_r {
    ltrace!(
        $audit,
        "Got cached idl for index {:?} {:?} -> {}",
        $itype,
        $attr,
        data
    );
    return Ok(Some(data.as_ref().clone()));
}
// If miss, get from db *and* insert to the cache.
let db_r = $self.db.get_idl($audit, $attr, $itype, $idx_key)?;
if let Some(ref idl) = db_r {
    $self.idl_cache.insert(cache_key, Box::new(idl.clone()))
}

This sqlite layer performs the select from the “idx_<type>_<attr>” table, and then deserialises the stored id list (IDL).

let mut stmt = self.get_conn().prepare(query.as_str()).map_err(|e| {
    ladmin_error!(audit, "SQLite Error {:?}", e);
    OperationError::SQLiteError
})?;
let idl_raw: Option<Vec<u8>> = stmt
    .query_row_named(&[(":idx_key", &idx_key)], |row| row.get(0))
    // We don't mind if it doesn't exist
    .optional()
    .map_err(|e| {
        ladmin_error!(audit, "SQLite Error {:?}", e);
        OperationError::SQLiteError
    })?;

let idl = match idl_raw {
    Some(d) => serde_cbor::from_slice(d.as_slice())
        .map_err(|_| OperationError::SerdeCborError)?,
    // We don't have this value, it must be empty (or we
    // have a corrupted index .....
    None => IDLBitRange::new(),
};

The IDL is returned and cached, then returned to the filter2idl caller. At this point the IDL is the “partial candidate set”. It contains the ID numbers of entries that we know partially match this query at this point. Since the first term is {“eq”: [“class”, “account”, true]} the current candidate set is [1, 2, …].

The and path in filter2idl continues, and the next term encountered is {“eq”: [“name”, “william”, true]}. This resolves into another IDL. The two IDLs are merged through an and operation, leaving only the ID numbers that were present in both.

(IDL::Indexed(ia), IDL::Indexed(ib)) => {
    let r = ia & ib;
    ...

For our example, this means the state of the r(esult set) is as below:

res = ia & ib;
res = [1, 2, ....] & [1, ];
res == [1, ]

We know that only the entry with ID == 1 matches both name = william and class = account.
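
A hedged sketch of what that and merge does: intersect two sorted ID lists so only IDs present in both survive. Kanidm's IDLBitRange is a compressed bitset rather than a plain Vec, but the set semantics are the same as this two-pointer merge.

// Intersect two sorted, de-duplicated id lists.
fn idl_and(ia: &[u64], ib: &[u64]) -> Vec<u64> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < ia.len() && j < ib.len() {
        if ia[i] == ib[j] {
            out.push(ia[i]);
            i += 1;
            j += 1;
        } else if ia[i] < ib[j] {
            i += 1;
        } else {
            j += 1;
        }
    }
    out
}

fn main() {
    let class_account = vec![1, 2, 3, 4]; // idl for eq(class, account)
    let name_william = vec![1];           // idl for eq(name, william)
    assert_eq!(idl_and(&class_account, &name_william), vec![1]);
}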

We now perform a check called the “filter threshold check”. If the number of IDs in the IDL is less than a certain number, we can shortcut and return early even though we are not finished processing.

if r.len() < thres && f_rem_count > 0 {
    // When below thres, we have to return partials to trigger the entry_no_match_filter check.
    let setplan = FilterPlan::AndPartialThreshold(plan);
    return Ok((IDL::PartialThreshold(r), setplan));
} else if r.is_empty() {
    // Regardless of the input state, if it's empty, this can never
    // be satisfied, so return we are indexed and complete.
    let setplan = FilterPlan::AndEmptyCand(plan);
    return Ok((IDL::Indexed(IDLBitRange::new()), setplan));
} else {
    IDL::Indexed(r)
}

This is because the IDL is now small, and continuing to load more indexes may cost more time and resources. The IDL can only ever shrink or stay the same from this point, never expand, so we know it must stay small.

However, you may correctly have deduced that there are still two terms we must check. That is the terms contained within the andnot of the query. I promise you, we will check them :)

So at this point we now step out of filter2idl and begin the process of post-processing the results we have.

Resolving the Partial Set

We check the way that the IDL is tagged so that we understand what post processing is required, and check some security controls. If the search was unindexed aka ALLIDS, and if the account is not allowed to access fully unindexed searches, then we return a failure at this point. We also now check if the query was Partial(ly) unindexed, and if it is, we assert limits over the number of entries we may load and test.

match &idl {
    IDL::ALLIDS => {
        if !erl.unindexed_allow {
            ladmin_error!(au, "filter (search) is fully unindexed, and not allowed by resource limits");
            return Err(OperationError::ResourceLimit);
        }
    }
    IDL::Partial(idl_br) => {
        if idl_br.len() > erl.search_max_filter_test {
            ladmin_error!(au, "filter (search) is partial indexed and greater than search_max_filter_test allowed by resource limits");
            return Err(OperationError::ResourceLimit);
        }
    }
    IDL::PartialThreshold(_) => {
        // Since we opted for this, this is not the fault
        // of the user and we should not penalise them by limiting on partial.
    }
    IDL::Indexed(idl_br) => {
        // We know this is resolved here, so we can attempt the limit
        // check. This has to fold the whole index, but you know, class=pres is
        // indexed ...
        if idl_br.len() > erl.search_max_results {
            ladmin_error!(au, "filter (search) is indexed and greater than search_max_results allowed by resource limits");
            return Err(OperationError::ResourceLimit);
        }
    }
};

We then load the related entries from the IDL we have. Initially, this is called through the entry cache of the ARCache.

let entries = self.get_idlayer().get_identry(au, &idl).map_err(|e| {
    ladmin_error!(au, "get_identry failed {:?}", e);
    e
})?;

As many entries as possible are loaded from the ARCache. The remaining IDs that were missed are stored in a secondary IDL set. The missed entry set is then submitted to the sqlite layer, where the entries are loaded and deserialised. An important part of the ARCache is to keep fully inflated entries in memory, to speed up the process of retrieving them. Real world use shows this can improve performance by orders of magnitude, both by avoiding the deserialisation step and by avoiding IO to disk.

The entry set is now able to be checked. If the IDL was Indexed no extra work is required, and we can just return the values. But in all other cases we must apply the filter test. The filter test is where the terms of the filter are checked against each entry to determine if they match and are part of the set.

This is where the partial threshold is important - the act of processing the remaining indexes may be more expensive than applying the filter assertions to the subset of entries in memory. It’s also why filter optimisation matters. If a query can fall below the threshold sooner, then we can apply the filter test earlier and we reduce the number of indexes we must load and keep cached. This helps performance and cache behaviour.

The filter test applies the terms of the filter to the entry, using the same rules as the indexing process to ensure consistent results. This gives us a true/false result, which lets us know if the entry really does match and should become part of the final candidate set.

fn search(...) {
    ...
    IDL::Partial(_) => lperf_segment!(au, "be::search<entry::ftest::partial>", || {
        entries
            .into_iter()
            .filter(|e| e.entry_match_no_index(&filt))
            .collect()
    }),
    ...
}

fn entry_match_no_index_inner(&self, filter: &FilterResolved) -> bool {
    match filter {
        FilterResolved::Eq(attr, value, _) => self.attribute_equality(attr.as_str(), value),
        FilterResolved::Sub(attr, subvalue, _) => {
            self.attribute_substring(attr.as_str(), subvalue)
        }
        ...
    }
}

It is now at this point that we finally have the fully resolved set of entries, in memory as a result set from the backend. These are returned to the query server’s search function.

Access Controls

Now the process of applying access controls begins. There are two layers of access controls as applied in kanidm. The first is which entries you are allowed to see. The second is, within an entry, which attributes you may see. There is a reason for this separation: when an internal search is performed on behalf of the user, we retrieve the set of entries you can see, but the server then performs the operation on your behalf and itself has access to see all attributes. If you wish to see this in action, it’s a critical part of how modify and delete both function, where you can only change or delete within your visible entry scope.

The first stage is search_filter_entries. This is the function that checks which entries you may see. It checks that you have the rights to see specific attributes (ie can you see name?), which then affects “could you possibly have queried for this?”.

Imagine for example, that we search for “password = X” (which kanidm disallows but anyway …). Even if you could not read password, the act of testing the equality means that if an entry was returned you would now know about the value or its association to a user, since the equality condition held true. This is a security risk for information disclosure.

The first stage of access controls is determining which rules apply to your authenticated user. There may be thousands of access controls in the system, but only some may relate to your account.

let related_acp: Vec<&AccessControlSearch> =
    lperf_segment!(audit, "access::search_filter_entries<related_acp>", || {
        search_state
            .iter()
            .filter(|acs| {
                let f_val = acs.acp.receiver.clone();
                match f_val.resolve(&se.event, None) {
                    Ok(f_res) => rec_entry.entry_match_no_index(&f_res),
                    Err(e) => {
                        ...
                    }
                }
            })
            .collect()
    });

The next stage is to determine what attributes you requested to filter on. This is why filter_orig is stored in the event. We must test against the filter as intended by the caller, not the filter as executed. This is because the filter as executed may have been transformed by the server, using terms the user does not have access to.

let requested_attrs: BTreeSet<&str> = se.filter_orig.get_attr_set();

Then for each entry, the set of allowed attributes is determined. If a user-related access control also holds rights over the entry in the result set, the set of attributes it grants read access over is appended to the “allowed” set. This repeats until the set of related access controls is exhausted.

let allowed_entries: Vec<Entry<EntrySealed, EntryCommitted>> =
    entries
        .into_iter()
        .filter(|e| {

            let allowed_attrs: BTreeSet<&str> = related_acp.iter()
                .filter_map(|acs| {
                    ...
                    if e.entry_match_no_index(&f_res) {
                        // add search_attrs to allowed.
                        Some(acs.attrs.iter().map(|s| s.as_str()))
                    } else {
                        None
                    }
                    ...
                })
                .collect();

            let decision = requested_attrs.is_subset(&allowed_attrs);
            lsecurity_access!(audit, "search attr decision --> {:?}", decision);
            decision
        })

This has now created a set of “attributes this person can see” on this entry based on all related rules. The requested attributes are compared to the attributes you may see, and if the requested set is a subset of (or equal to) the allowed set, then the entry is allowed to be returned to the user.

If there is even a single attribute in the query you do not have the rights to see, then the entry is disallowed from the result set. This is because if you can not see that attribute, you must not be able to apply a filter test to it.

To give a worked example, consider the entry from before. We also have three access controls:

applies to: all users
over: pres class
read attr: class

applies to: memberof admins
over: entries where class = account
read attr: name, displayname

applies to: memberof radius_servers
over: entries where class = account
read attr: radius secret

Our current authenticated user (let’s assume it’s also “name=william”) would have only the first two rules apply. As we search through the candidate entries, the “all users” rule would match our entry, which means class is added to the allowed set. Then, since william is memberof admins, they also have read to name and displayname. Since the target entry is class=account, name and displayname are also added to the allowed set. But since william is not a member of radius_servers, we don’t get to read radius secrets.
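
A minimal sketch of that subset decision for the worked example, using plain BTreeSets in place of the real access control types (the attribute names, including radius_secret, are illustrative):

use std::collections::BTreeSet;

fn main() {
    // Attributes the caller's filter_orig referenced: class and name.
    let requested: BTreeSet<&str> = ["class", "name"].iter().copied().collect();

    // Attributes granted by the two rules that apply to william over this entry:
    // "all users" grants class, "memberof admins" grants name and displayname.
    let allowed: BTreeSet<&str> = ["class", "name", "displayname"].iter().copied().collect();

    // The entry may be returned only if every requested attribute is allowed.
    assert!(requested.is_subset(&allowed));

    // Had the query also filtered on the radius secret (hypothetical attribute
    // name), the check would fail and the entry would be dropped.
    let requested_secret: BTreeSet<&str> =
        ["class", "name", "radius_secret"].iter().copied().collect();
    assert!(!requested_secret.is_subset(&allowed));
}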

At this point the entry set is reduced to the set of entries the user was able to have filter tests applied to, and is returned.

The query server then unwinds to search_ext where the second stage of access controls is now checked. This calls search_filter_entry_attributes, which is responsible for changing an entry in memory to remove content that the user may not see.

Again, the set of related access controls is generated, and then applied to each entry to determine if they are in scope. This builds a set of “attributes the user can see, per entry”. This is then applied by the entry reduction function, which removes any attribute not in the allowed set. The key difference is this line:

e.reduce_attributes(&allowed_attrs)

A clear example is when you attempt to view yourself versus when you view another person’s account, as there are permissions over self that exist which do not apply to others. You may view your own legalname field, but not the legalname of another person.

The entry set is finally returned and turned into a JSON entry for transmission to the client. Hooray!

Conclusion

There is a lot that goes into a query being processed in a database. But like all things in computing, since it was created by a person, any other person can come to understand it. It’s always amazing that this whole process can be achieved in fractions of a second, in parallel, and so reliably.

There is so much more involved in this process too. The way that a write operation is performed to extract correct index values, the way that the database reloads the access control cache based on changes, and even how the schema is loaded and constructed. On top of all this, a complete identity management stack is built that can allow authentication through wireless, machines, ssh keys and more.

If you are interested more in databases and Kanidm please get in contact!

Using SUSE Leap Enterprise with Docker

Posted by William Brown on August 25, 2020 02:00 PM

Using SUSE Leap Enterprise with Docker

It’s a little bit annoying to connect up all the parts for this. If you have a SLE15 system then credentials for SCC are automatically passed into containers via secrets.

But if you are on a non-SLE base, like myself with MacOS or OpenSUSE you’ll need to provide these to the container in another way. The documentation is a bit tricky to search and connect up what you need but in summary:

  • Get /etc/SUSEConnect and /etc/zypp/credentials.d/SCCcredentials from an SLE install that has been registered. The SLE version does not matter.
  • Mount them into the image:
docker ... -v /scc/SUSEConnect:/etc/SUSEConnect \
    -v /scc/SCCcredentials:/etc/zypp/credentials.d/SCCcredentials \
    ...
    registry.suse.com/suse/sle15:15.2

Now you can use the images from the SUSE registry. For example docker pull registry.suse.com/suse/sle15:15.2 and have working zypper within them.

If you want to add extra modules to your container (you can list what’s available with container-suseconnect from an existing SLE container of the same version), you can do this by adding environment variables at startup. For example, to add dev tools like gdb:

docker ... -e ADDITIONAL_MODULES=sle-module-development-tools \
    -v /scc/SUSEConnect:/etc/SUSEConnect \
    -v /scc/SCCcredentials:/etc/zypp/credentials.d/SCCcredentials \
    ...
    registry.suse.com/suse/sle15:15.2

This also works during builds to add extra modules.

HINT: SUSEConnect and SCCcredentials are not version dependent, so they will work with any image version.

Windows Hello in Webauthn-rs

Posted by William Brown on August 23, 2020 02:00 PM

Windows Hello in Webauthn-rs

Recently I’ve been working again on webauthn-rs, as a member of the community wants to start using it in production for a service. So far the development of the library has been limited to the test devices that I own, but now this pushes me toward implementing true fido compliance.

A really major part of this though was that a lot of their consumers use Windows, which means supporting Windows Hello.

A background on webauthn

Webauthn itself is not a specification for the cryptographic operations required for authentication using an authenticator device, but a specification that wraps other techniques to allow a variety of authenticators to be used exposing their “native” features.

The authentication side of webauthn is reasonably simple in this way. The server stores a public key credential associated to a user. During authentication the server provides a challenge which the authenticator signs using its private key. The server can then verify, using its copy of the challenge and the public key, that the authentication must have come from that credential. Of course, like anything, there is a little bit of magic in here around how the authenticators store credentials that allows other properties to be asserted, but that’s beyond the scope of this post.
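
As a minimal sketch of that final signature check (assuming the openssl crate and a public key already decoded from the stored credential - the real webauthn-rs flow also hashes the client data JSON, checks the origin, flags and counters, and supports more than one signature scheme):

use openssl::hash::MessageDigest;
use openssl::pkey::{PKey, Public};
use openssl::sign::Verifier;

// Verify that `signature` over `signed_data` was produced by the private key
// matching `public_key`. In webauthn the signed data is the authenticator data
// concatenated with the SHA-256 hash of the client data JSON, which contains
// the server's challenge - so a valid signature proves possession of the
// credential's private key for this specific challenge.
fn verify_assertion(
    public_key: &PKey<Public>,
    signed_data: &[u8],
    signature: &[u8],
) -> Result<bool, openssl::error::ErrorStack> {
    let mut verifier = Verifier::new(MessageDigest::sha256(), public_key)?;
    verifier.update(signed_data)?;
    verifier.verify(signature)
}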

The majority of the webauthn specification is around the process of registering credentials and requesting specific properties to exist in the credentials. Some of these properties are optional hints (resident keys, authenticator attachment) and some of these properties are enforced (user verification, so that the credential is a true MFA). Beyond these there is also a process for the authenticator to provide information about its source and trustworthiness. This process is attestation and has multiple different formats and details associated.

It’s interesting to note that for most deployments of webauthn, attestation is not required by the attestation conveyance preference, and generally provides little value to these deployments. For many sites you only need to know that a webauthn authenticator is in use. However attestation allows environments with strict security requirements to verify and attest the legitimacy, make, and model of the authenticators in use. (An interesting part of webauthn is how much of it seems to be Google and Microsoft internal requirements leaking into a specification, just saying).

This leads to what is, effectively, most of the code in webauthn-rs: attestation.rs.

Windows Hello

Windows Hello is Microsoft’s equivalent to TouchID on iOS. Using a Trusted Platform Module (TPM) as a tamper-resistant secure element, it allows devices such as a Windows Surface to perform cryptographic operations. As Microsoft is attempting to move to a passwordless future (which honestly, I’m on board for and want to support in Kanidm), this means they want to support Webauthn on as many of their devices as possible. Microsoft even defines in their hardware requirements for Windows 10 Home, Pro, Education and Enterprise that as of July 28, 2016, all new device models, lines or series … a component which implements the TPM 2.0 must be present and enabled by default from this effective date. This is pretty major, as it means that MS have slowly been ensuring that all consumer and enterprise devices are steadily moving to a point where passwordless is a viable future. Microsoft state that they use TPMv2 for many reasons, but a defining one is: The TPM 1.2 spec only allows for the use of RSA and the SHA-1 hashing algorithm which is now considered broken.

Of course, as you may have noticed, this means that TPMs are involved. Webauthn supports a TPM attestation path, and that means I have to implement it.

Once more into the abyss

Reading the Webauthn spec for TPM attestation pointed me to the TPMv2.0 specification part1, part2 and part3. I will spare you from these, as there is a sum total of 861 pages between the documents, and the Webauthn spec, while it only references a few parts, manages to create a set of expanding references within them. To make it even more enjoyable, text search is mostly broken in these documents, meaning that trying to determine the field contents and types involves a lot of manual eyeball work.

TPM’s structures are packed C structs, which means that they can be very tricky to parse. They use u16 identifiers to switch on unions, and other fun tricks that we love to see from C programs. These u16s often have some defined constants which are valid choices, such as TPM_ALG_ID, which allows switching on which cryptographic algorithms are in use. Some standout parts of this section were as follows.

Unabashed optimism:

TPM_ALG_ERROR 0x0000 // Should not occur

Broken Crypto

TPM_ALG_SHA1 0x0004 // The SHA1 Algorithm

Being the boomer equivalent of JWT

TPM_ALG_NULL 0x0010 // The NULL Algorithm

And supporting the latest in modern cipher suites

TPM_ALG_XOR 0x000A // TCG TPM 2.0 library specification - the XOR encryption algorithm.

ThE XOR eNcRyPtIoN aLgoRitHm.

Some of the structures are even quite fun to implement, such as TPMT_SIGNATURE, where the first two bytes, interpreted as a u16, define a TPM_ALG_ID. If those two bytes are not in the set of valid TPM_ALG_ID values, then the whole blob, including the leading two bytes, is actually just a blob of hash. It would certainly be unfortunate if, in the interest of saving two bytes, my hash accidentally emitted data where the first two bytes were accidentally a TPM_ALG_ID, causing a parser to overflow.
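
A hedged sketch of that switch in Rust - the TPM_ALG_ERROR and TPM_ALG_NULL values are the ones quoted above, TPM_ALG_RSASSA is taken from the TPMv2.0 spec, and the per-algorithm body layouts are omitted entirely:

// TPM fields are big-endian packed C structs; the first two bytes of a
// TPMT_SIGNATURE are a u16 TPM_ALG_ID selecting what follows.
const TPM_ALG_ERROR: u16 = 0x0000;
const TPM_ALG_NULL: u16 = 0x0010;
const TPM_ALG_RSASSA: u16 = 0x0014;

fn parse_tpmt_signature(data: &[u8]) -> &'static str {
    if data.len() < 2 {
        return "too short";
    }
    let alg = u16::from_be_bytes([data[0], data[1]]);
    match alg {
        TPM_ALG_ERROR => "error: should not occur",
        TPM_ALG_NULL => "null algorithm",
        TPM_ALG_RSASSA => "an RSASSA signature body follows",
        // Anything else: per the spec, the whole blob (including the leading
        // two bytes) is treated as a plain hash.
        _ => "raw hash blob",
    }
}

fn main() {
    assert_eq!(parse_tpmt_signature(&[0x00, 0x14, 0xde, 0xad]), "an RSASSA signature body follows");
    assert_eq!(parse_tpmt_signature(&[0xab, 0xcd]), "raw hash blob");
}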

I think the cherry on all of this though, is that despite Microsoft requiring TPMv2.0 to move away from RSA and SHA-1, that when I checked the attestation signatures for a Windows Hello device I had to implement the following:

COSEContentType::INSECURE_RS1 => {
    hash::hash(hash::MessageDigest::sha1(), input)
        .map(|dbytes| Vec::from(dbytes.as_ref()))
        .map_err(|e| WebauthnError::OpenSSLError(e))
}

Conclusion

Saying this, I’m happy that Windows Hello is now in Webauthn-rs. The actual Webauthn authentication flows DO use secure algorithms (RSA2048 + SHA256 and better); it is only in the attestation path that some components are signed with SHA1. So please use webauthn-rs, and do use Windows Hello with it!

User gesture is not detected - using iOS TouchID with webauthn-rs

Posted by William Brown on August 11, 2020 02:00 PM

User gesture is not detected - using iOS TouchID with webauthn-rs

I was recently contacted by a future user of webauthn-rs who indicated that the library may not currently support Windows Hello as an authenticator. This is due to the nature of the device being a platform-attached authenticator, and webauthn-rs at the time did not support attachment preferences.

As I have an iPad, and it’s not a primary computing device, I decided to upgrade to the iPadOS 14 beta to try out webauthn via touch (and handwriting support).

The Issue

After watching Jiewen’s WWDC presentation about using TouchID with webauthn, I had a better idea about some of the server side requirements to work with this.

Once I started to test though, I would receive the following error in the safari debug console.

User gesture is not detected. To use the platform authenticator,
call 'navigator.credentials.create' within user activated events.

I was quite confused by this error - a user activated event seems to be a bit of an unknown term, and when I asked other people they also didn’t quite know what it meant. My demo site was using a button input with onclick event handlers to call javascript similar to the following:

function register() {
fetch(REG_CHALLENGE_URL + username, {method: "POST"})
   .then(res => {
      ... // error handling
   })
   .then(res => res.json())
   .then(challenge => {
     challenge.publicKey.challenge = fromBase64(challenge.publicKey.challenge);
     challenge.publicKey.user.id = fromBase64(challenge.publicKey.user.id);
     return navigator.credentials.create(challenge)
       .then(newCredential => {
         console.log("PublicKeyCredential Created");
              ....
         return fetch(REGISTER_URL + username, {
           method: "POST",
           body: JSON.stringify(cc),
           headers: {
             "Content-Type": "application/json",
           },
         })
       })
   })
}

This works happily in Firefox and Chrome, and on iPadOS it even works with my yubikey 5ci.

I investigated further to determine if the issue was in the way I was presenting the registration to the navigator.credentials.create function. Comparing to webauthn.io (which does work with TouchID on iPadOS 14 beta), I noticed some subtle differences but nothing that should cause an issue like this.

After much pacing, thinking and asking for help, I eventually gave in and went to the source of webkit.

The Solution

Reading through the webkit source I noted that the check within the code was looking at how the event was initiated. This comes from a context that is available within the browser. This got me thinking about the fact that the fetch api is async, and I realised at this point that webauthn.io was using the jQuery.ajax apis. I altered my demo to use the same, and it began to work with TouchID. That meant that the user activation was being lost over the async boundary to the fetch API. (note: it’s quite reasonable to expect user interaction to use navigator.credentials, to prevent tricking or manipulating users into activating their webauthn devices).

I emailed Jiewen, who responded overnight and informed me that this is an issue, and it’s being tracked in the webkit bugzilla. He assures me that it will be resolved in a future release. Many thanks to him for helping me with this issue!

At this point I now know that TouchID will work with webauthn-rs, and I can submit some updates to the library to help support this.

Notes on Webauthn with TouchID

It’s worth pointing out a few notes from the WWDC talk, and the differences I have observed with webauthn on real devices.

In the presentation it is recommended that in your Credential Creation Options you (must?) define the following options to work with TouchID:

authenticatorSelection: { authenticatorAttachment: "platform" },
attestation: "direct"

It’s worth pointing out that authenticatorAttachment is only a hint to the client as to which credentials it should use. This allows your web page to streamline the UI flow (such as detection of a platform key and then using that to toggle the authenticatorAttachment), but it’s not an enforced security policy. There is no part of the attestation response that indicates the attachment method. The only way to determine that the authenticator is a platform authenticator would be to use attestation “direct” and validate that the issuing CA or the device’s AAGUID matches the expectations you hold for what authenticators can be used within your organisation or site.

Additionally, TouchID does work with no authenticatorAttachment hint (safari prompts if you want to use an external key or TouchID), and attestation: “none” also works. This means that a minimised and default set of Credential Creation Options will allow you to work with the majority of webauthn devices.

Finally, the WWDC video glosses over the server side process. Be sure to follow the w3c standard for verifying attestations, or use a library that implements this standard (such as webauthn-rs or duo-labs go webauthn). I’m sure that other libraries exist, but it’s critical that they follow the w3c process as webauthn is quite complex and fiddly to implement in a correct and secure manner.

docker buildx for multiarch builds

Posted by William Brown on August 05, 2020 02:00 PM

docker buildx for multiarch builds

I have been previously building Kanidm with plain docker build, but recently a community member wanted to be able to run kanidm on arm64. That meant that I needed to go down the rabbit hole of how to make this work …

What not to do …

There is a previous method of using manifest files to allow multiarch uploads. It’s pretty messy but it works, so this is an option if you want to investigate but I didn’t want to pursue it.

Buildx exists and I got it working on my linux machine with the steps from here, but the build took more than 3 hours, so I don’t recommend it if you plan to do anything intense or frequent.

Buildx cluster

Docker has a cross-platform building toolkit called buildx which is currently tucked into the experimental features. It can be enabled on docker for mac in the settings (note: you only need experimental support on the coordinating machine aka your workstation).

Rather than follow the official docs this will branch out. The reason is that buildx in the official docs uses qemu-aarch64 translation which is very slow and energy hungry, taking a long time to produce builds. As mentioned already I was seeing in excess of 3 hours for aarch64 on my builder VM or my mac.

Instead, in this configuration I will use my mac as a coordinator, and an x86_64 VM and a rock64pro as builder nodes, so that the builds are performed on native architecture machines.

First we need to configure our nodes. In /etc/docker/daemon.json we need to expose our docker socket to our mac. I have done this with the following:

{
  "hosts": ["unix:///var/run/docker.sock", "tcp://0.0.0.0:2376"]
}

WARNING: This configuration is HIGHLY INSECURE. This exposes your docker socket to the network with no authentication, which is equivalent to un-authenticated root access. I have done this because my builder nodes are on an isolated and authenticated VLAN of my home network. You should either do similar or use TLS authentication.

NOTE: The ssh:// transport does not work for docker buildx. No idea why but it don’t.

Once this is done restart docker on the two builder nodes.

Now we can configure our coordinator machine. We need to check buildx is present:

docker buildx --help

We then want to create a new builder instance and join our nodes to it. We can use the DOCKER_HOST environment variable for this:

DOCKER_HOST=tcp://x.x.x.x:2376 docker buildx create --name cluster
DOCKER_HOST=tcp://x.x.x.x:2376 docker buildx create --name cluster --append

We can then startup and bootstrap the required components with:

docker buildx use cluster
docker buildx inspect --bootstrap

We should see output like:

Name:   cluster
Driver: docker-container

Nodes:
Name:      cluster0
Endpoint:  tcp://...
Status:    running
Platforms: linux/amd64, linux/386

Name:      cluster1
Endpoint:  tcp://...
Status:    running
Platforms: linux/arm64, linux/arm/v7, linux/arm/v6

If we are happy with this we can make this the default builder.

docker buildx use cluster --default

And you can now use it to build your images such as:

docker buildx build --push --platform linux/amd64,linux/arm64 -f Dockerfile -t <tag> .

Now I can build my multiarch images much more quickly and efficiently!

Developer Perspective on Docker

Posted by William Brown on July 12, 2020 02:00 PM

Developer Perspective on Docker

A good mate of mine, Ron Amosa, put a question up on twitter about what developers think Docker brings to the table. I’m really keen to see what he has to say (his knowledge of CI/CD and Kubernetes is amazing by the way!), but I thought I’d answer his question from my view as a software engineer.

Docker provides resource isolation and management to applications

Lets break that down.

What is a resource? What is an application?

It doesn’t matter what kind of application we write: a Rust command line tool, an embedded database in C, or a webserver or website with Javascript. Every language and every program requires resources to run. Let’s focus on a Python webserver for this thought process.

Our webserver (which is an application) requires a lot of things to be functional! It needs to access a network to open listening sockets, and it needs access to a filesystem to read pages or write to a database (like sqlite). It needs CPU time to process requests, and memory to create a stack/heap to work through those requests. But our application also needs to be separated and isolated from other programs, so that they can not disclose our data, and so that faults in our application do not affect other services. It probably needs a separate user and group, which is a key idea in unix process isolation and security. Maybe there are also things like SELinux or AppArmor that provide extra enhancements.

But why stop here? There are many more. We might need system controls (sysctls) that define networking stack behaviour like how TCP performs. We may need specific versions of python libraries for our application. Perhaps we also want to limit the system calls that our python application can perform against our OS.

I hope we can see that the resources we have really are more than simply CPU and memory here! Every application is really quite involved.

A short view back to the past …

In the olden days, as developers we had to be responsible for these isolations. For example, on a system we’d have to select a bind address so that our application could be configured to listen on only a single network device. This not only meant that our applications had to support this behaviour, but that a person had to read our documentation and find out how to configure that behaviour to isolate the networking resource.

And of course there were many others. To limit the amount of CPU or RAM that was available, you had to configure ulimits for the user, and select which user was going to run our application.

Many problems have been seen too with a language like python where libraries are not isolated and there are conflicts between which version different applications require. Is it the fault of python? The application developer? It’s hard to say …

What about system calls? With an interpreted language like python, you can’t just set capability flags or other hardening options, because they have to be set on the interpreter (python) which is used in many places. This is an example where a resource (the python interpreter) is shared between many applications, preventing us from creating isolated python runtimes.

Even things like SELinux and AppArmor required complex, hand-created profiles that were cryptic at best, or simply ended up disabled in the common case (it can’t be secure if it’s not usable! People will always take the easy option …).

And that’s even before we look at init scripts - bash scripts that had to be invoked in careful ways, and were all hand rolled, each adding different mistakes or issues. It was a time when “packaging” an application and deploying it required huge amounts of knowledge across a broad range of topics.

In many cases, I have seen another way this manifested. Rather than isolating applications (which was too hard), every application was installed on a dedicated virtual machine. The resource management then came as an artifact of every machine being separate and managed by a hypervisor.

Systemd

Along came systemd though, and it got us much further. It provided consistent application launch tools, and has done a lot of work to provide resource management such as cgroups (cpu, mem), dynamic users, some types of filesystem isolation and more. Systemd as an init system has done some really good stuff.

But still problems exist. Applications still require custom SELinux or AppArmor profiles, and systemd can’t help managing network interfaces (that still falls on the application).

It also still relies on you to put the files in place, or a distribution package to get the file content into the system.

Docker

Docker takes this even further. Docker manages and arbitrates every resource your application requires, even the filesystem and install process. For example, a very complex topic like CPU or memory limits on Linux becomes quite simple in docker, which exposes CPU and memory tunables. Docker allows sysctls per container. You assign and manage storage.

docker run -v db:/data/db -v config:/data/config --network private_net \
    --memory 1024M --shm-size 128M -p 80:8080 --user isolated \
    my/application:version

From this command we can see what resources are involved. We know that we mount two storage locations. We can see that we confine the network to a private network, and that we want to limit memory to 1024M. We can also see we’ll be listening on port 80, which remaps to port 8080 inside the container. We even know what user we’ll run as, so we can assign permissions to the volumes. Our application is also defined, as is its version.

Not only can we see what resources we are using, there are a lot of other benefits. Docker can dynamically generate selinux/apparmor isolation profiles, so we get stronger process isolation between containers and host processes. We know that the filesystem of this container is isolated from others, so they can have and bundle the correct versions of dependencies. We know how to start, stop, and even monitor the application. It can even have health checks. Logs will (should?) go to stdout/err, which docker will forward to a log collector we can define. In the future each application may even be in its own virtual memory space (i.e. separate VMs).

Docker provides a level of isolation to resources that is still hard to achieve in systemd, and not only that, it makes very advanced or complex configurations easy to access and use. Accessibility of these features is vitally important to allow all people to create robust applications in their environments. But it also allows me as a developer to know what resources can exist in the container and how to interact with them in a way that will respect the wishes of the deploying administrator.

Docker Isn’t Security Isolation

It’s worth noting that while Docker can provide SELinux and AppArmor profiles, Docker is not an effective form of security isolation. It certainly makes the bar much, much higher than before, yes! And that’s great! And I hope that bar continues to rise. However, today we still live in an age where there are many local attacks on Linux kernels, and the delay in fixing these is still long. We also still see CPU side channels, and these will never be resolved while we rely on asynchronous CPU behaviour.

If you have high value data, it is always best to have separate physical machines for these applications, and to always patch frequently, have a proper CI/CD pipeline, centralised logging, and much much more. Ask your security team! I’m sure they’d love to help :)

Conclusion

For me personally, docker is about resource management and isolation. It helps me to define an interface that an admin can consume and interact with, making very advanced concepts easy to use. It gives me trust that applications will run in a way that is isolated and known all the way from development and testing through to production under high load. By making this accessible, it means that anyone - from a single docker container to a giant kubernetes cluster, can have really clear knowledge of how their applications are operating.

virt-manager missing pci.ids usb.ids macos

Posted by William Brown on June 14, 2020 02:00 PM

virt-manager missing pci.ids usb.ids macos

I got the following error:

/usr/local/Cellar/libosinfo/1.8.0/share/libosinfo/pci.ids No such file or directory

This appears to be an issue in libosinfo from homebrew. Looking at the libosinfo source, there are some aux download files. You can fix this with:

mkdir -p /usr/local/Cellar/libosinfo/1.8.0/share/libosinfo/
cd /usr/local/Cellar/libosinfo/1.8.0/share/libosinfo/
wget -q -O pci.ids http://pciids.sourceforge.net/v2.2/pci.ids
wget -q -O usb.ids http://www.linux-usb.org/usb.ids

All is happy again with virt-manager

Resolving AirPlayXPCHelper Perr NULL kCanceledErr with Apple TV and MacOS

Posted by William Brown on May 02, 2020 02:00 PM

Resolving AirPlayXPCHelper Perr NULL kCanceledErr with Apple TV and MacOS

I decided to finally get an Apple TV so that I could use my iPad and MacBook Pro to airplay to my projector. So far I’ve been really impressed by it and how well it works with modern amplifiers and my iPad.

Sadly though, when I tried to use my MacBook Pro to airplay to the Apple TV I received an “Unable to connect” error, with no further description.

Initial Research

The first step was to look in console.app at the local system logs. The following item stood out:

error 09:24:41.459722+1000 AirPlayXPCHelper ### Error: CID 0xACF10006, Peer NULL, -6723/0xFFFFE5BD kCanceledErr

I only found a single result on a search for this, and they resolved the problem by disabling their MacOS firewall - attempting this myself did not fix the issue. There are also reports of apple service staff disabling the firewall to resolve airplay problems too.

Time to Dig Further …

Now it was time to look more. To debug an Apple TV you need to connect a USB-C cable to its service port on the rear of the device, while you connect this to a Mac on the other side. Console.app will then show you the streamed logs from the device.

While looking on the Apple TV I noticed the following log item:

[AirPlay] ### [0x8F37] Set up session 16845584210140482044 with [<ipv6 address>:3378]:52762 failed: 61/0x3D ECONNREFUSED {
"timingProtocol" : "NTP",
"osName" : "Mac OS X",
...
"isScreenMirroringSession" : true,
"osVersion" : "10.15.4",
"timingPort" : 64880,
...
}

I have trimmed this log, as most details don’t matter. What is important is that it looks like the Apple TV is attempting to back-connect to the MacBook Pro, which has a connection refused. From iOS it appears that the video/timing channel is initiated from the iOS device, so no back-connection is required, but for AirPlay to work from the MacBook Pro to the Apple TV, the Apple TV must be able to connect back on high ports with new UDP/TCP sessions for NTP to synchronise clocks.

My Network

My MacBook Pro is on a separate VLAN to my Apple TV for security reasons, mainly because I don’t want most devices to access management consoles of various software that I have installed. I have used the Avahi reflector on my USG to enable cross VLAN discovery. This would appear to be the issue: my firewall is not allowing the NTP traffic back to my MacBook Pro.

To resolve this I allowed some high ports from the Apple TV to connect back to the VLAN my MacBook Pro is on, and I allowed built-in software to receive connections.

Once this was done, I was able to AirPlay across VLANs to my Apple TV!

Building containers on OBS

Posted by William Brown on April 19, 2020 02:00 PM

Building containers on OBS

My friend showed me how to build containers in OBS, the opensuse build service. It makes it really quite nice, as the service can parse your dockerfile, and automatically trigger rebuilds when any package dependency in the chain requires a rebuild.

The simplest way is to have a separate project for your containers to make the repository setup a little easier.

When you edit the project metadata, if the project doesn’t already exist, a new one is created. So we can start by filling out the template from the command:

osc meta prj -e home:firstyear:containers

This will give you a template. We need to add some repository lines:

<project name="home:firstyear:containers">
  <title>Containers Demo</title>
  <description>Containers Demo</description>
  <person userid="firstyear" role="bugowner"/>
  <person userid="firstyear" role="maintainer"/>
  <build>
    <enable/>
  </build>
  <publish>
    <enable/>
  </publish>
  <debuginfo>
    <enable/>
  </debuginfo>
  <!-- this repository -->
  <repository name="containers">
    <path project="openSUSE:Templates:Images:Tumbleweed" repository="containers"/>
    <arch>x86_64</arch>
  </repository>
</project>

Remember to set publish to “enable” if you want the docker images you build to be pushed to the registry!

Now that that’s done, we can check out the project, and create a new container package within.

osc co home:firstyear:containers
cd home:firstyear:containers
osc mkpac mycontainer

Now in the mycontainer folder you can start to build a container. Add your dockerfile:

#!BuildTag: mycontainer
#
# docker pull registry.opensuse.org/home/firstyear/apps/containers/mycontainer:latest
#                                   ^projectname        ^repos     ^ build tag
FROM opensuse/tumbleweed:latest

#
# only one zypper ar command per line. only repositories inside the OBS are allowed
#
RUN zypper ar http://download.opensuse.org/repositories/home:firstyear:apps/openSUSE_Tumbleweed/ "home:firstyear:apps"
RUN zypper mr -p 97 "home:firstyear:apps"
RUN zypper --gpg-auto-import-keys ref
RUN zypper install -y vim-data vim python3-ipython shadow python3-praw
# Then the rest of your container as per usual ...

Then to finish up, you can commit this:

osc add Dockerfile
osc ci
osc results

APFS (why is df showing me funny numbers?!)

Posted by William Brown on March 27, 2020 02:00 PM

APFS (why is df showing me funny numbers?!)

Apple’s APFS has been the default for MacOS since High Sierra, where SSD (flash) systems would automatically convert from HFS+. This is a godsend, especially with HFS+’s history of destroying any folder that has a large number of inodes within it.

However, APFS behaves differently to previous filesystem technology. Let’s see if we can explain why df reports multiple 932Gi disks like this:

> df -h
Filesystem                             Size   Used  Avail Capacity    iused      ifree %iused  Mounted on
/dev/disk1s5                          932Gi   10Gi  380Gi     3%     484322 9767493838    0%   /
/dev/disk1s1                          932Gi  530Gi  380Gi    59%    2072480 9765905680    0%   /System/Volumes/Data

We’ll also explain why, when you delete large files, you don’t get any space back in df either.

How it looked with HFS+

With HFS+ it was pretty simple. You had a disk (a block device), which had partitions (slices of the space in the block device) and those partitions were formatted with a filesystem that knew how to store data in them. An example:

> diskutil list
...
/dev/disk2 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                         *1.0 TB     disk2
   1:                        EFI EFI                     209.7 MB   disk2s1
   2:                  Apple_HFS tmachine                999.9 GB   disk2s2

We can see that disk2 is 1.0TB in size, and it contains two partitions, the first is 209.7MB for EFI (disk2s1) and the second has data and formatted as HFS+ (disk2s2).

Of course, this has some drawbacks - partitions don’t like being moved, and filesystem resizing is a costly process of time and IO cycles. It’s quite inflexible. If you wanted another partition here for read only data, well, you’d have to change a lot. Properties can only be applied to a filesystem as a whole, and they can’t share space. If you had a 1TB drive split into two 500GB partitions, and were running low on space on one of them, well … good luck! You have to move data manually, or change where applications store data.

APFS

APFS doesn’t quite follow this model though. APFS is what’s called a volume-based filesystem. That means there is an intermediate layer in here. The layout looks like this in diskutil:

> diskutil list
...
/dev/disk0 (internal, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *1.0 TB     disk0
   1:                        EFI EFI                     314.6 MB   disk0s1
   2:                 Apple_APFS Container disk1         1.0 TB     disk0s2

So our disk0 looks like before - an EFI partition, and a very large APFS container. However the container itself is NOT the filesystem. The container is a pool of storage that APFS volumes are created into. We can see the volumes too.

> diskutil list
...
/dev/disk1 (synthesized):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      APFS Container Scheme -                      +1.0 TB     disk1
                                 Physical Store disk0s2
   1:                APFS Volume Macintosh HD — Data     569.4 GB   disk1s1
   2:                APFS Volume Preboot                 81.8 MB    disk1s2
   3:                APFS Volume Recovery                526.6 MB   disk1s3
   4:                APFS Volume VM                      10.8 GB    disk1s4
   5:                APFS Volume Macintosh HD            11.0 GB    disk1s5

Notice how /dev/disk1 is “synthesized”? It’s not real - it’s there to “trick” legacy tools into thinking that the container is a “block” device and the volumes are “partitions”.

Benefits of Volumes

One of the immediate benefits is that unlike partitions, in a volume filesystem, all the space of the underlying container (also known as: pool, volume group) is available to all volumes at anytime. Because the volumes are a flexible concept, they can have non-contiguous geometry on the disk (unlike a partition). That’s why in your df output you can see:

> df -h
Filesystem                             Size   Used  Avail Capacity    iused      ifree %iused  Mounted on
/dev/disk1s5                          932Gi   10Gi  380Gi     3%     484322 9767493838    0%   /
/dev/disk1s1                          932Gi  530Gi  380Gi    59%    2072480 9765905680    0%   /System/Volumes/Data

Both disk1s5 (Macintosh HD) and disk1s1 (Macintosh HD — Data) are APFS volumes. The container has 932Gi total space, and 380Gi available in the container which either volume could allocate. But you can also see the exact space reservation of each volume too: disk1s5 only has 10Gi in use, and disk1s1 has 530Gi in use.

It would be very possible for disk1s1 to grow to fill all the space, and then to contract, and then have disk1s5 grow to take all the space and contract - this is because the space is flexibly allocated from the container. Neat!

Each volume also can have different properties applied. For example, /dev/disk1s5 (Macintosh HD) in MacOS catalina is read-only:

/dev/disk1s5 on / (apfs, local, read-only, journaled)
/dev/disk1s1 on /System/Volumes/Data (apfs, local, journaled, nobrowse)

This is to prevent system tampering, and strengthen integrity of the system. There are a number of tricks to achieve this, such as overlaying multiple volumes together. /Applications for example is actually a folder constituted from the content of /System/Applications and /System/Volumes/Data/Applications. Anytime you “drag” an application to /Applications, you are actually putting it into /System/Volumes/Data/Applications. A very similar property holds for /Users (/System/Volumes/Data/Users), and even /Volumes.

Copy-on-Write, Snapshots

APFS is also a copy-on-write filesystem. This means whenever you write data, it’s actually written to newly allocated disk regions, and the pointers are atomically flipped to it. The full write occurs or it does not. This is part of the reason why APFS is so much better than HFS+ - in a crash your data is either in a previous state, or the new state - never a half written or corrupted state.

This is the reason why APFS is only used on SSD (flash) devices - COW is very random IO write intensive, and on a rotational disk this would cause the head to “seek” randomly which would make both writes and reads very slow. SSD of course isn’t affected by this, so having a highly fragmented file does not impose a penalty in the same way.

Copy-on-Write however opens up some interesting behaviours. If you COW a file, but never remove the old version, you have a snapshot. This means you can have point-in-time views of how a filesystem was. This is actually used now by time machine during backups to ensure the content of a backup is stable before being written to the external backup media. It also allows time machine to perform “backups” while you are out-and-about, by snapshotting as you work. Because snapshots are just “not removing old data”, they are low overhead to take and maintain.

You can see snapshots on your system with:

> tmutil listlocalsnapshots /
Snapshots for volume group containing disk /:
com.apple.TimeMachine.2020-03-27-084939.local
com.apple.TimeMachine.2020-03-27-100157.local
com.apple.TimeMachine.2020-03-27-105937.local
com.apple.TimeMachine.2020-03-27-121414.local
...

You can even take your own snapshots if you want!

> time tmutil localsnapshot
Created local snapshot with date: 2020-03-28-091943
tmutil localsnapshot  0.01s user 0.01s system 4% cpu 0.439 total

See how fast that is! Remember also because this is based on copy-on-write, the snapshots only take as much data as the differences, or what you are changing as you work.

Space Reclaim

This leads to the final point of confusion - when people delete files to clear space, but df reports no change. For example:

> df -h
Filesystem                             Size   Used  Avail Capacity    iused      ifree %iused  Mounted on
/dev/disk1s1                          932Gi  530Gi  380Gi    59%    2072480 9765905680    0%   /System/Volumes/Data
> ls -alh Downloads/Windows_Server_2016_Datacenter_EVAL_en-us_14393_refresh.ISO
-rwx------@ 1 william  staff   6.5G 10 Oct  2018 Downloads/Windows_Server_2016_Datacenter_EVAL_en-us_14393_refresh.ISO
> rm Downloads/Windows_Server_2016_Datacenter_EVAL_en-us_14393_refresh.ISO
> df -h
Filesystem                             Size   Used  Avail Capacity    iused      ifree %iused  Mounted on
/dev/disk1s1                          932Gi  530Gi  380Gi    59%    2072479 9765905681    0%   /System/Volumes/Data

Now I promise, I really did delete the file - check the “iused” and “ifree” columns. But also note that the “Used” space didn’t change? Surely we should expect to see this value drop to 523Gi since I removed a 6.5G file.

Remember that APFS is a voluming filesystem, with copy-on-write. Due to snapshots, the space used in a volume is the sum of active data and snapshotted data. This means that when you are removing a file you are removing it from the volume at this point in time, but it may still exist in snapshots that exist in the volume! That’s why there is a reduction in the iused/ifree (an inode pointer was removed) but no change in the space (the file still exists in a snapshot).

During normal operation, provided there is sufficient free space, you won’t actually notice this behaviour. But when you, say, don’t have a lot of space left (maybe 10G), and you delete some files to import something (say a 40G import), then you try the copy again … and it fails! Drat! But you wait a bit and suddenly it works? What in heck happened?

In the background, MacOS has registered “okay, the user demands at least 30G more space to complete this task. Let’s clean snapshots until we have that much space available”. The snapshots are pruned so when you come back later, suddenly you have the space.

Again, you can actually do this yourself. tmutil has a command “thinlocalsnapshots” for this. An example usage would be:

> tmutil thinlocalsnapshots /System/Volumes/Data [bytes required]
Thinned local snapshots:

In my case I have a lot of space available, so no snapshots are pruned. But you may find that multiple snapshots are removed in this process!

Conclusion

APFS is actually a really cool piece of filesystem technology, and I think it has made MacOS one of the most viable platforms for reliable daily use. It embraces many great ideas, and despite its youth, has done really well. But those new ideas conflict with legacy, and have some behaviours that are not always clearly exposed or shown to you, the user. Understanding those behaviours means we can see why our computers are behaving in certain - sometimes unexpected - ways.

389ds in containers

Posted by William Brown on March 27, 2020 02:00 PM

389ds in containers

I’ve spent a number of years working in the background to get 389-ds working in containers. I think it’s very close to production ready (one issue outstanding!) and I’m now using it at home for my production LDAP needs.

So here’s a run down on using 389ds in a container!

Getting it Started

The team provides an image for pre-release testing which you can get with docker pull:

docker pull 389ds/dirsrv:latest
# OR, if you want to be pinned to the 1.4 release series.
docker pull 389ds/dirsrv:1.4

The image can be run in an ephemeral mode (data will be lost on stop of the container) so you can test it:

docker run 389ds/dirsrv:1.4

Making it Persistent

To make your data persistent, you’ll need to add a volume, and bind it to the container.

docker volume create 389ds

You can run 389ds where the container instance is removed each time the container stops, but the data persists (I promise this is safe!) with:

docker run --rm -v 389ds:/data -p 3636:3636 389ds/dirsrv:latest

Check your instance is working with an ldapsearch:

LDAPTLS_REQCERT=never ldapsearch -H ldaps://127.0.0.1:3636 -x -b '' -s base vendorVersion

NOTE: Setting the environment variable `LDAPTLS_REQCERT` to `never` disables CA verification of the LDAPS connection. Only use this in testing environments!

If you want to make the container instance permanent (uses docker start/stop/restart etc) then you’ll need to do a docker create with similar arguments:

docker create  -v 389ds:/data -p 3636:3636 389ds/dirsrv:latest
docker ps -a
CONTAINER ID        IMAGE                 ...  NAMES
89b342c2e058        389ds/dirsrv:latest   ...  adoring_bartik

Remember, even if you rm the container instance, the volume stores all the data so you can re-pull the image and recreate the container and continue.

Administering the Instance

The best way is to use the local LDAPI socket - by default the cn=Directory Manager password is randomised so that it can’t be accessed remotely.

To use the local LDAPI socket, you’ll use docker exec into the running instance.

docker start <container name>
docker exec -i -t <container name> /usr/sbin/dsconf localhost <cmd>
docker exec -i -t <container name> /usr/sbin/dsconf localhost backend suffix list
No backends

In a container, the instance is always named “localhost”. So let’s add a database backend now to our instance:

docker exec -i -t <cn> /usr/sbin/dsconf localhost backend create --suffix dc=example,dc=com --be-name userRoot
The database was sucessfully created

You can even go ahead and populate your backend now. To make it easier, specify your basedn into the volume’s /data/config/container.inf. Once that’s done we can setup sample data (including access controls), and create some users and groups.

docker exec -i -t <cn> /bin/sh -c "echo -e '\nbasedn = dc=example,dc=com' >> /data/config/container.inf"
docker exec -i -t <cn> /usr/sbin/dsidm localhost initialise
docker exec -i -t <cn> /usr/sbin/dsidm localhost user create --uid william --cn william \
    --displayName William --uidNumber 1000 --gidNumber 1000 --homeDirectory /home/william
docker exec -i -t <cn> /usr/sbin/dsidm localhost group create --cn test_group
docker exec -i -t <cn> /usr/sbin/dsidm localhost group add_member test_group uid=william,ou=people,dc=example,dc=com
docker exec -i -t <cn> /usr/sbin/dsidm localhost account reset_password uid=william,ou=people,dc=example,dc=com
LDAPTLS_REQCERT=never ldapwhoami -H ldaps://127.0.0.1:3636 -x -D uid=william,ou=people,dc=example,dc=com -W
    Enter LDAP Password:
    dn: uid=william,ou=people,dc=example,dc=com

There is much more you can do with these tools of course, but it’s very easy to get started and working with an ldap server like this.

Further Configuration

Because this runs in a container, the approach to some configuration is a bit different. Some settings can be configured through either the content of the volume, or through environment variables.

You can reset the directory manager password on startup by using the environment variable DS_DM_PASSWORD. Of course, please use a better password than “password”. pwgen is a good tool for this! This password persists across restarts, so you should make sure it’s good.

docker run --rm -e DS_DM_PASSWORD=password -v 389ds:/data -p 3636:3636 389ds/dirsrv:latest
LDAPTLS_REQCERT=never ldapwhoami -H ldaps://127.0.0.1:3636 -x -D 'cn=Directory Manager' -w password
    dn: cn=directory manager

You can also configure certificates through pem files.

/data/tls/server.key
/data/tls/server.crt
/data/tls/ca/*.crt

All the certs in /data/tls/ca/ will be imported as CA’s and the server key and crt will be used for the TLS server.

If for some reason you need to reindex your db at startup, you can use:

docker run --rm -e DS_REINDEX=true -v 389ds:/data -p 3636:3636 389ds/dirsrv:latest

After the reindex is complete the instance will start like normal.

Conclusion

389ds in a container is one of the easiest and quickest ways to get a working LDAP environment today. Please test it and let us know what you think!

USG fixing avahi

Posted by William Brown on March 14, 2020 02:00 PM

USG fixing avahi

Sadly on the USG pro 4 avahi will regularly spiral out of control taking up 100% cpu. To fix this, we set an hourly restart:

sudo -s
crontab -e

Then add:

15 * * * * /usr/sbin/service avahi-daemon restart

Fedora 32 Wallpaper Submission - Story

Posted by William Brown on March 13, 2020 02:00 PM

Fedora 32 Wallpaper Submission - Story

Fedora opens submissions for wallpapers to be submitted for the next version of the release. I used fedora for a long time, so I decided to submit this photo, and write this post to talk about it:

../../../_images/20191119_184819_DSCF0043_5.jpg

This was taken on 2019-11-19 in my home city of Adelaide, South Australia. I had traveled to see some friends over Christmas. We went to Mount Osmond to take some photos, and I took this as we walked up to the lookout.

The next day, this area was a high risk location for a possible bushfire - and many bushfires have since devastated many regions of Australia, affecting many people that I know.

I really find that the Australian landscape is so different to Europe or Asia - many tones of subtle reds, browns, and more. A dry and dusty look. The palette is such a contrast to the lush greens of Europe. Australia is a really beautiful country, in a very distinct and striking manner.

Anyway, I hope you like the photo :)

Fixing a MacBook Pro 8,2 with dead AMD GPU

Posted by William Brown on February 03, 2020 02:00 PM

Fixing a MacBook Pro 8,2 with dead AMD GPU

I’ve owned a MacBook Pro 8,2 late 2011 edition, which I used from 2011 to about 2018. It was a great piece of hardware, and honestly I’m surprised it lasted so long given how many MacOS and Fedora installs it’s seen.

I upgraded to a MacBook Pro 15,1, and I gave the 8,2 to a friend who was in need of a new computer so she could do her work. It worked really well for her until today, when she messaged me that the machine is having a problem.

The Problem

The machine appeared to be in a bootloop, where just before swapping from the EFI GPU to the main display server, it would go black and then lock up/reboot. Booting to single user mode (boot holding cmd + s) showed the machine’s disk was intact with a clean apfs. The system.log showed corruption at the time of the fault, which didn’t instill confidence in me.

Attempting a recovery boot (boot holding cmd + r), this also yielded the bootloop. So we have potentially eliminated the installed copy of MacOS as the source of the issue.

I’ve then used the apple hardware test (boot while holding d), and it has passed the machine as a clear bill of health.

I have seen one of these machines give up in the past - my friend’s mother had one from the same generation and that died in almost the same way - could it be the same?

The 8,2’s cursed gpu stack

The 8,2 15” MBP has dual GPUs - it has the on-CPU Intel 3000, and an AMD Radeon 6750M. The two pass through an LVDS graphics multiplexer to the main panel. The external display port however is not so clear - the DDC lines are passed through the GMUX, but the data lines attach directly to the display port.

The machine is also able to boot with EFI rendering to either card. By default this is the AMD Radeon. Whichever card is used at boot is also the first card MacOS attempts to use, but it will try to swap to the Radeon later on.

This generation had a large number of the radeons develop faults in their 3d rendering capability so it would render the EFI buffer correctly, but on the initiation of 3d rendering it would fail. Sounds like what we have here!

To fix this …

Okay, so this is fixable. First, we need to tell EFI to boot primarily from the Intel card. Boot to single user mode and then run:

nvram fa4ce28d-b62f-4c99-9cc3-6815686e30f9:gpu-power-prefs=%01%00%00%00

Now we need to prevent loading of the AMD drivers so that during the boot MacOS doesn’t attempt to swap from Intel to the Radeon. We can do this by hiding the drivers. System integrity protection will stop you, so you need to do this as part of recovery. Boot with cmd + r, which now works thanks to the EFI changes, then open Terminal.

cd "/Volumes/Macintosh HD"
sudo mkdir amdkext
sudo mv System/Library/Extensions/AMDRadeonX3000.kext amdkext/

Then reboot. You’ll notice the fans go crazy because the Radeon card can’t be disabled without the driver. To fix this and stop the fans, we can load the driver after boot.

To achieve this we make a helper script:

# cat /usr/local/libexec/amd_kext_load.sh
#!/bin/sh
/sbin/kextload /amdkext/AMDRadeonX3000.kext

And a launchd daemon:

# cat /Library/LaunchDaemons/au.net.blackhats.fy.amdkext.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
        <dict>
                <key>Label</key>
                <string>au.net.blackhats.fy.amdkext</string>
                <key>Program</key>
                <string>/usr/local/libexec/amd_kext_load.sh</string>
                <key>RunAtLoad</key>
                <true/>
                <key>StandardOutPath</key>
                <string>/var/log/amd_kext_load.log</string>
        </dict>
</plist>

Now if you reboot, you’ll have a working mac, and the fans will stop properly. I’ve tested this with suspend and resume too and it works! The old beast continues to live :)

There are no root causes

Posted by William Brown on January 19, 2020 02:00 PM

There are no root causes

At Gold Coast LCA2020 I gave a lightning talk on swiss cheese. Well, maybe not really swiss cheese. But it was about the swiss cheese failure model, which was proposed at the University of Manchester.

Please note this will cover some of the same topics as the talk, but in more detail, and with less jokes.

An example problem

So we’ll discuss the current issues behind modern CPU isolation attacks, i.e. Spectre. Spectre is an attack that uses timing of a CPU’s speculative execution unit to retrieve information from another running process on the same physical system.

Modern computers rely on hardware features in their CPU to isolate programs from each other. This could be isolating your web-browser from your slack client, or your sibling’s login from yours.

This isolation however has been compromised by attacks like Spectre, and it looks unlikely that it can be resolved.

What is speculative execution?

In order to be “fast” modern CPU’s are far more complex than most of us have been taught. Often we believe that a CPU thread/core is executing “one instruction/operation” at a time. However this isn’t how most CPU’s work. Most work by having a pipeline of instructions that are in various stages of execution. You could imagine it like this:

let mut x = 0
let mut y = 0
x = 15 * some_input;
y = 10 * other_input;
if x > y {
    return true;
} else {
    return false;
}

This is some made up code, but in a CPU, every part of this could be in the “pipeline” at once.

let mut x = 0                   <<-- at the head of the queue and "further" along completion
let mut y = 0                   <<-- it's executed part way, but not to completion
x = 15 * some_input;
y = 10 * other_input;           <<-- all of these are in pipeline, and partially complete
if x > y {                      <<-- what happens here?
    return true;
} else {
    return false;
}

So how does this “pipeline” handle the if statement? If the pipeline is looking ahead, how can we handle a choice like an if? Can we really predict the future?

Speculative execution

At the if statement, the CPU uses past measurements to make a prediction about which branch might be taken, and it then begins to execute that path, even though ‘x > y’ has not been executed or completed yet! At this point x or y may not have even finished being computed yet!

Let’s assume for now our branch predictor thinks that ‘x > y’ is false, so we’ll start to execute the “return false” or any other content in that branch.

Now the instructions ahead catch up, and we resolve “did we really predict correctly?”. If we did, great! We have been able to advance the program state asynchronously even without knowing the answer until we get there.

If not, ohh nooo. We have to unwind what we were doing, clear some of the pipeline and try to do the correct branch.

Of course this has an impact on timing of the program. Some people found you could write a program to manipulate this predictor, and by using specific addresses and content, they could use these timing variations to “access memory” they are not allowed to - by letting the speculative executor contribute to code paths they are not allowed to access before the unroll occurs. They could time this, and retrieve the memory contents from areas they are not allowed to access, breaking isolation.

Owwww my brain

Yes. Mine too.

Community Reactions

Since this has been found, a large amount of the community reaction has been about the “root cause”. ‘Clearly’ the root cause is “Intel are bad at making CPU’s” and so everyone should buy AMD instead because they “weren’t affected quite as badly” (Narrator’s voice: They were absolutely just as bad). We’ve had some intel CPU updates and kernel/program fixes so all good right? We addressed the root cause.

Or … did we?

Our computers are still asynchronous, and contain many out-of-order parts. It’s hard to believe we have “found” every method of exploiting this. Indeed in the last year many more ways to bypass hardware isolation due to our systems async nature have been found.

Maybe the “root cause” wasn’t addressed. Maybe … there are no ….

History

To understand how we got to this situation we need to look at how CPU’s have evolved. This is not a complete history.

The PDP11 was a system used at Bell Labs, where the C programming language was developed. Back then CPU’s were very simple - a CPU and memory, executing one instruction at a time.

The C programming language gained a lot of popularity as it was able to be “quickly” ported to other CPU models to allow software to be compiled on other platforms. This led to many systems being developed in C.

Intel introduced the 8086, and many C programs were ported to run on it. Intel then released the 80486 in 1989, which had the first pipeline and cache to improve performance. In order to continue to support C, this meant the memory model could not change from the PDP11 - the cache had to be transparent, and the pipeline could not expose state.

This has of course led to computers being more important in our lives and businesses, so we expected further performance, leading to increased frequencies and async behaviours.

The limits of frequencies were really hit in the Pentium 4 era, when about 4GHz was shown to be a barrier of stability for those systems. They had very deep pipelines to improve performance, but that also had issues when branch prediction failed causing pipeline stalls. Systems had to improve their async behaviours further to squeeze every single piece of performance possible out.

Compiler developers also wanted more performance so they started to develop ways to transform C in ways that “took advantage” of x86_64 tricks, by manipulating the environment so the CPU is “hinted” into states we “hope” it gets into.

Many businesses also started to run servers to provide to consumers, and in order to keep costs low they would put many users onto single pieces of hardware so they could share or overcommit resources.

This has created a series of positive reinforcement loops - C is ‘abi stable’ so we keep developing it due to its universal nature. C code can’t be changed without breaking every existing system. We can’t change the CPU memory model without breaking C, which is hugely prevalent. We improve the CPU to make C faster, transparently, so that users/businesses can run more C programs and users. And then we improve compilers to make C faster given quirks of the current CPU models that exist …

Swiss cheese model

It’s hard to look at the current state of systems security and simply say “it’s the cpu vendors fault”. There are many layers that have come together to cause this situation.

This is called the “swiss cheese model”. Imagine you take a stack of swiss cheese and rotate and rearrange the slices. You will not be able to see through it. But as you continue to rotate and rearrange, eventually you may see a tunnel through the cheese where all the holes line up.

This is what has happened here - we developed many layers socially and technically that all seemed reasonable over time, and only after enough time and re-arrangements of the layers have we now arrived at a situation where a failure has occurred that permeates all of computer hardware.

To address it, we need to look beyond just “blaming hardware makers” or “software patches”. We need to help developers move away from C to other languages that can be brought onto new memory models that have manual or other cache strategies. We need hardware vendors to implement different async models. We need to educate businesses on risk analysis and how hardware works to provide proper decision making capability. We need developers to alter their behaviour to work in environments with higher performance constraints. And probably much much more.

There are no root causes

It is a very pervasive attitude in IT that every issue has a root cause. However, looking above we can see it’s never quite so simple.

Saying an issue has a root cause, prevents us from examining the social, political, economic and human factors that all become contributing factors to failure. Because we are unable to examine them, we are unable to address the various layers that have contributed to our failures.

There are no root causes. Only contributing factors.

Concurrency 2: Concurrently Readable Structures

Posted by William Brown on December 28, 2019 02:00 PM

Concurrency 2: Concurrently Readable Structures

In this post, I’ll discuss concurrently readable datastructures that exist, and ideas for future structures. Please note, this post is an in-progress design, and may be altered in the future.

Before you start, make sure you have read part 1

Concurrent Cell

The simplest form of concurrently readable structure is a concurrent cell. This is equivalent to a read-write lock, but has concurrently readable properties instead. The key mechanism to enable this is that when the writer begins, it clones the data before writing it. We trade more memory usage for a gain in concurrency.

To see an implementation, see my rust crate, concread
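
To make the clone-before-write idea concrete, here is a minimal sketch in plain Rust (std only) of a single-writer, concurrently readable cell. This is not the concread API - the type and method names are made up for illustration - it just shows the principle: readers take an Arc snapshot, the writer clones the data, mutates the copy, and swaps the pointer in.

use std::sync::{Arc, Mutex};

// A toy concurrently readable cell (single writer assumed):
// readers take an Arc snapshot, the writer clones the data,
// mutates the copy, then swaps the pointer in.
struct CowCellSketch<T> {
    active: Mutex<Arc<T>>, // only held long enough to clone or swap the pointer
}

impl<T: Clone> CowCellSketch<T> {
    fn new(value: T) -> Self {
        CowCellSketch { active: Mutex::new(Arc::new(value)) }
    }

    // Readers get a point-in-time snapshot that never changes under them.
    fn read(&self) -> Arc<T> {
        self.active.lock().unwrap().clone()
    }

    // The (single) writer clones the current value, mutates the copy,
    // then publishes it. Readers holding old snapshots are unaffected.
    fn write<F: FnOnce(&mut T)>(&self, f: F) {
        let current = self.read();
        let mut next = (*current).clone(); // trade memory (a clone) for concurrency
        f(&mut next);
        *self.active.lock().unwrap() = Arc::new(next);
    }
}

fn main() {
    let cell = CowCellSketch::new(vec![1, 2, 3]);
    let snapshot = cell.read();          // a reader begins here
    cell.write(|v| v.push(4));           // the writer commits a change
    assert_eq!(snapshot.len(), 3);       // the old reader still sees its view
    assert_eq!(cell.read().len(), 4);    // new readers see the new data
}

A real implementation like concread does more (transactions, proper writer serialisation, garbage collection of old generations), but the trade-off is the same one described above: more memory in exchange for readers that never block.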

Concurrent Tree

The concurrent cell is good for small data, but a larger structure - like a tree - may take too long to clone on each write. A good estimate is that if your data in the cell is larger than about 512 bytes, you likely want a concurrent tree instead.

In a concurrent tree, only the branches involved in the operation are cloned. Imagine the following tree:

../../../_images/cow_1.png

When we attempt to change a value in the 4th leaf, we copy it before we begin, along with all of its parents, so that their pointers can be updated.

../../../_images/cow_2.png

In the process the pointers from the new root b to branch 1 are maintained. The new second branch also maintains a pointer to the original 3rd leaf.

This means that in this example only 3/7 nodes are copied, saving a lot of cloning. As your tree grows this saves a lot of work. Consider a tree with node-widths of 7 pointers and at height level 5. Assuming perfect layout, you only need to clone 5/~16000 nodes. A huge saving in memory copy!

The interesting part is that a reader of root a is also unaffected by the changes to root b - the tree from root a hasn’t been changed, as all its pointers and nodes are still valid.

When all readers of root a end, we clean up all the nodes it pointed to that no longer are needed by root b (this can be done with atomic reference counting, or garbage lists in transactions).

../../../_images/cow_3.png

It is through this copy-on-write (also called multi-version concurrency control) that we achieve concurrent readability in the tree.

This is really excellent for databases where you have in memory structures that work in parallel to the database transactions. In kanidm an example is the in-memory schema that is used at run time but loaded from the database. They require transactional behaviours to match the database, and ACID properties so that readers of a past transaction have the “matched” schema in memory.

Concurrent Cache (Updated 2020-05-13)

A design that I have thought about for a long time has finally come to reality. This is a concurrently readable transactional cache. One writer, multiple readers with consistent views of the data. Additionally, due to the transactional nature, rollbacks and commits are fully supported.

For a more formal version of this design, please see my concurrent ARC draft paper.

This scheme should work with any cache type - LRU, LRU2Q, LFU. I have used ARC.

ARC was popularised by ZFS - ARC is not specific to ZFS, it’s a strategy for cache replacement, despite the common association between the two.

ARC is a pair of LRU’s with a set of ghost lists and a weighting factor. When an entry is “missed” it’s inserted to the “recent” LRU. When it’s accessed from the LRU a second time, it moves to the “frequent” LRU.

When entries are evicted from their sets they are added to the ghost list. When a cache miss occurs, the ghost list is consulted. If the entry “would have been” in the “recent” LRU, but was not, the “recent” LRU grows and the “frequent” LRU shrinks. If the item “would have been” in the “frequent” LRU but was not, the “frequent” LRU is expanded, and the “recent” LRU shrunk.

This causes ARC to be self tuning to your workload, as well as balancing “high frequency” and “high locality” operations. It’s also resistant to many cache invalidation or busting patterns that can occur in other algorithms.

A major problem though is that ARC is not designed for concurrency - LRU’s rely on doubly linked lists, which only a single thread can modify safely, because the number of pointers involved are not aligned in a single cache line, preventing atomic changes.

How to make ARC concurrent

To make this concurrent, I think it’s important to specify the goals.

  • Readers must always have a correct “point in time” view of the cache and its data
  • Readers must be able to trigger cache inclusions
  • Readers must be able to track cache hits accurately
  • Readers are isolated from all other readers and writer actions
  • Writers must always have a correct “point in time” view
  • Writers must be able to rollback changes without penalty
  • Writers must be able to trigger cache inclusions
  • Writers must be able to track cache hits accurately
  • Writers are isolated from all readers
  • The cache must maintain correct temporal ordering of items in the cache
  • The cache must properly update hit and inclusions based on readers and writers
  • The cache must provide ARC semantics for management of items
  • The cache must be concurrently readable and transactional
  • The overhead compared to single thread ARC is minimal

There are a lot of places to draw inspiration from, and I don’t think I can list - or remember them all.

My current design uses a per-thread reader cache to allow inclusions, with a channel to asynchronously include and track hits to the write thread. The writer also maintains a local cache of items including markers of removed items. When the writer commits, the channel is drained to a time point T, and actions on the ARC taken.

This means the LRU’s are maintained only in a single write thread, but the readers’ changes are able to influence the caching decisions.
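
As a rough sketch of that channel idea (the names and types here are made up for illustration - this is not the actual concread implementation), reader threads could send hit and inclusion events over an mpsc channel, and the single writer drains the channel when it commits:

use std::sync::mpsc::{channel, Receiver, Sender};

// Hypothetical event type: readers report hits and inclusions asynchronously.
enum CacheEvent<K> {
    Hit(K),
    Include(K),
}

struct WriterSide<K> {
    rx: Receiver<CacheEvent<K>>,
}

impl<K: std::fmt::Debug> WriterSide<K> {
    // On commit, drain everything queued so far and apply it to the ARC state.
    fn commit(&self) {
        for event in self.rx.try_iter() {
            match event {
                CacheEvent::Hit(k) => println!("promote {:?} toward the frequent LRU", k),
                CacheEvent::Include(k) => println!("insert {:?} into the recent LRU", k),
            }
        }
    }
}

fn main() {
    let (tx, rx): (Sender<CacheEvent<&str>>, _) = channel();
    let writer = WriterSide { rx };

    // A reader thread would clone `tx` and report from its thread-local cache.
    tx.send(CacheEvent::Include("key-a")).unwrap();
    tx.send(CacheEvent::Hit("key-a")).unwrap();

    // The writer applies the queued reader feedback when it commits.
    writer.commit();
}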

To maintain consistency, an extra set is added - the haunted set - so that a key that has existed at some point can be tracked to identify its point in time of eviction and last update, so that stale data can never be included by accident.

Limitations and Concerns

Cache missing is very expensive - multiple threads may load the value, the readers must queue the value, and the writer must then act on the queue. Sizing the cache to be large enough is critically important as eviction/missing will have a higher penalty than normal. Optimally the cache will be “as large or larger” than the working set.

But with a concurrent ARC we now have a cache where each reader thread has a thread local cache and the writer is communicated to by channels. This may make the cache’s memory limit balloon to a high amount over a normal cache. To help, an algorithm was developed based on expected cache behaviour for misses and communication, to help size the caches of readers and writers.

Conclusion

This is a snapshot of some concurrently readable datastructures, and how they are implemented and useful in your projects. Using them in Kanidm we have already seen excellent performance and scaling of the server, with very little effort for tuning. We plan to adapt these for use in 389 Directory Server too. Stay tuned!

Concurrency 1: Types of Concurrency

Posted by William Brown on December 28, 2019 02:00 PM

Concurrency 1: Types of Concurrency

I want to explain different types of concurrent datastructures, so that we can explore their properties and when or why they might be useful.

As our computer systems become increasingly parallel and asynchronous, it’s important that our applications are able to work in these environments effectively. Languages like Rust help us to ensure our concurrent structures are safe.

CPU Memory Model Crash Course

In no way is this a thorough, complete, or 100% accurate representation of CPU memory. My goal is to give you a quick brief on how it works. I highly recommend you read “what every programmer should know about memory” if you want to learn more.

In a CPU we have a view of a memory space. That could be in the order of KB to TB. But it’s a single coherent view of that space.

Of course, over time systems and people have demanded more and more performance. But we also have languages like C, that won’t change from their view of a system as a single memory space, or change how they work. Of course, it turns out C is not a low level language but we like to convince ourselves it is.

To keep working with C and others, CPU’s have acquired caches that are transparent to the operation of the memory. You have no control of what is - or is not - in the cache. It “just happens” asynchronously. This is exactly why spectre and meltdown happened (and will continue to happen) - because these async behaviours will always have the observable effect of making your CPU faster. Who knew!

Anyway, for this to work, each CPU has multiple layers of cache. At L3 the cache is shared with all the cores on the die. At L1 it is “per cpu”.

Of course it’s a single view into memory. So if address 0xff is in the CPU cache of core 1, and also in cache of core 2, what happens? Well it’s supported! Caches between cores are kept in sync via a state machine called MESI. These states are:

  • Exclusive - The cache is the only owner of this value, and it is unchanged.
  • Modified - The cache is the only owner of this value, and it has been changed.
  • Invalid - The cache holds this value but another cache has changed it.
  • Shared - This cache and maybe others are viewing this valid value.

To gloss very heavily over this topic, we want to avoid invalid. Why? That means two cpus are contending for the value, causing many attempts to keep each other in check. These contentions cause CPU’s to slow down.

We want values to either be in E/M or S. In shared, many cpu’s are able to read the value at maximum speed, all the time. In E/M, we know only this cpu is changing the value.

This cache coherency is also why mutexes and locks exist - they issue the needed CPU commands to keep the caches in the correct states for the memory we are accessing.

Keep in mind Rust’s variables are immutable, and able to share between threads, or mutable and single thread only. Sound familiar? Rust is helping with concurrency by keeping our variables in the fastest possible cache states.
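
As a tiny illustration of the rule the borrow checker enforces (a sketch of the language rule only, not of the CPU itself): any number of shared, read-only references may coexist, but a mutable reference must be exclusive.

fn main() {
    let mut value = vec![1, 2, 3];

    {
        // Many shared (read-only) borrows can coexist - analogous to the Shared state.
        let r1 = &value;
        let r2 = &value;
        println!("{} {}", r1.len(), r2.len());
    }

    // A mutable borrow must be exclusive - analogous to Exclusive/Modified.
    let w = &mut value;
    w.push(4);
    println!("{}", w.len());
}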

Data Structures

We use data structures in programming to help improve behaviours of certain tasks. Maybe we need to find values quicker, sort contents, or search for things. Data Structures are a key element of modern computer performance.

However most data structures are not thread safe. This means only a single CPU can access or change them at a time. Why? Because if a second CPU reads them, due to cache differences in content the second CPU may see an invalid datastructure, leading to undefined behaviour.

Mutexes can be used, but this causes other CPU’s to stall and wait for the mutex to be released - not really what we want on our system. We want every CPU to be able to process data without stopping!

Thread Safe Datastructures

There exist many types of thread safe datastructures that can work on parallel systems. They often avoid mutexes to try and keep CPU’s moving as fast as possible, relying on special atomic cpu operations to keep all the threads in sync.

Multiple classes of these structures exist, which have different properties.

Mutex

I have mentioned these already, but it’s worth specifying the properties of a mutex. A mutex is a system where only a single CPU may hold the mutex at a time. It becomes the one “reader/writer” and all other CPU’s must wait until the mutex is released by the current holder.
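
A quick illustration with Rust’s std::sync::Mutex - every thread that wants the counter must wait its turn:

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                // Only one thread holds the lock at a time - everyone else waits.
                *counter.lock().unwrap() += 1;
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    assert_eq!(*counter.lock().unwrap(), 4);
}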

Read Write Lock

Often called RWlock, these allow one writer OR multiple parallel readers. If a reader is reading then a writer request is delayed until the readers complete. If a writer is changing data, all new reads are blocked. All readers will always be reading the same data.

These are great for highly concurrent systems provided your data changes infrequently. If you have a writer changing data a lot, this causes your readers to be continually blocked. The delay on the writer is also high due to a potentially high number of parallel readers that need to exit.
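
The same pattern with Rust’s std::sync::RwLock - many readers may proceed in parallel, but a writer excludes everyone:

use std::sync::{Arc, RwLock};
use std::thread;

fn main() {
    let shared = Arc::new(RwLock::new(vec![1, 2, 3]));

    // Many readers can hold the read lock at the same time ...
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || shared.read().unwrap().iter().sum::<i32>())
        })
        .collect();

    for h in handles {
        assert_eq!(h.join().unwrap(), 6);
    }

    // ... but a writer blocks all readers and other writers while it works.
    shared.write().unwrap().push(4);
}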

Lock Free

Lock free is a common (and popular) datastructure type. These are structures that don’t use a mutex at all, and can have multiple readers and multiple writers at the same time.

The most common and popular structure for lock free is queues, where many CPUs can append items and many can dequeue at the same time. There are also a number of lock free sets which can be updated in the same way.

An interesting part of lock free is that all CPU’s are working on the same set - if CPU 1 reads a value, then CPU 2 writes the same value, the next read from CPU 1 will show the new value. This is because these structures aren’t transactional - lock free, but not transactional. There are some times where this is really useful as a property when you need a single view of the world between all threads, and your program can tolerate data changing between reads.
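
The simplest lock free structure to show is not a queue but an atomic counter - every thread updates the same value with atomic CPU instructions, no mutex involved, and every subsequent read observes the committed writes:

use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let hits = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let hits = Arc::clone(&hits);
            thread::spawn(move || {
                for _ in 0..1000 {
                    // No lock - the CPU's atomic instructions keep the threads in sync.
                    hits.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    // Unlike a concurrently readable structure, there is no transaction here:
    // every read simply sees the latest value.
    assert_eq!(hits.load(Ordering::Relaxed), 4000);
}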

Wait Free

This is a specialisation of lock free, where the reader/writer has guaranteed characteristics about the time they will wait to read or write data. This is very detailed and subtle, only affecting real time systems that have strict deadline and performance requirements.

Concurrently Readable

In between all of these is a type of structure called concurrently readable. A concurrently readable structure allows one writer and multiple parallel readers. An interesting property is that when the reader “begins” to read, the view for that reader is guaranteed not to change until the reader completes. This means that the structure is transactional.

An example being if CPU 1 reads a value, and CPU 2 writes to it, CPU 1 would NOT see the change from CPU 2 - it’s outside of the read transaction!

In this way there is a lot of read-only immutable data, and one writer mutating and changing things … sounds familiar? It’s very close to how our CPU caches work!

These structures also naturally lend themselves well to long processing or database systems where you need transactional (ACID) properties. In fact some databases use concurrently readable structures to achieve ACID semantics.

If it’s not obvious - concurrent readability is where my interest lies, and in the next post I’ll discuss some specific concurrently readable structures that exist today, and ideas for future structures.

Packaging and the Security Proposition

Posted by William Brown on December 18, 2019 02:00 PM

Packaging and the Security Proposition

As a follow up to my post on distribution packaging, it was commented by Fraser Tweedale (@hackuador) that traditionally the “security” aspects of distribution packaging was a compelling reason to use distribution packages over “upstreams”. I want to dig into this further.

Why does C need “securing”

C as a language is unsafe in every meaning of the word. The best C programmers on the planet are incapable of writing a secure program. This is because to code in C you have to express a concurrent problem in a language that is linearised, which is compiled relying on undefined behaviour, to be executed on an asynchronous, concurrent, out-of-order CPU. What could possibly go wrong?!

There is a lot you need to hold in mind to make C work. I can tell you now that I spend a majority of my development time thinking about the code to change rather than writing C because of this!

This has led to C based applications having just about every security issue known to man.

How is C “secured”

So, as C is security swiss cheese, we have developed processes around the language to soften this issue - for example, advice like "patch and update continually", as new changes are continually released to resolve issues.

Distribution packages have always been the "source" of updates for these libraries and applications. These packages are maintained by humans who need to keep them up to date. This means when a C project releases a fix, these maintainers apply the patch to the various versions they ship, and then release the updates. Due to C's dynamic linking, when the machine is next rebooted (yes rebooted, not application restarted) those fixes apply to all consumers who have linked to that library - change one, fix everything. Great!
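
As an aside, if you want to see which running processes on a SUSE style host are still using old (now deleted) libraries after such an update, a minimal sketch (assuming zypper is available):

zypper ps -s    # list processes still using deleted files after a library update
# then restart or reboot the affected consumers so the fix actually applies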

But there are some (glaring) weaknesses to this model. C projects historically have little (or poor) application testing, so many of these patches and their effects can't be reproduced. This subsequently means that consuming applications also aren't re-tested adequately. A change to a shared library can also impact a consuming application in a way that was unforeseen as the library changed.

The Dirty Secret

The dirty secret of many of these things is that “thoughts and prayers” is often the testing strategy of choice when patches are applied. It’s only because humans carefully think about and write tiny amounts of C that we have any reliability in our applications. And we already established that it’s nearly impossible for humans to write correct C …

Why Are We Doing This?

Because C linking and interfaces are so fragile, and due to the huge scope in which C can go wrong due to being a memory unsafe language, distributions and consumers have learnt to fear version changes. So instead we patch ancient C code stacks, barely test them, and hope that our castles of sand don’t fall over, all so we can keep “the same version” of a program to avoid changing it as much as possible. Ironically this makes those stacks even worse because we’ve developed infinite numbers of bespoke barely tested packages that people rely on daily.

To add insult to injury, most of this process is manual - humans monitor mailing lists, and have to know what code needs what patch, and when, in which release streams. It's a monumental amount of human time and labour to keep the sand castles standing. This manual involvement leads to information overload, and to maintainers potentially missing security updates or releases, which causes many distribution packages to be outdated, missing patches, or vulnerable more often than not. In other cases packages continue to be shipped that are unmaintained or have no upstream, so any issues that may exist are unknown or unresolved.

Distribution Security

This means all of platform and distribution security comes down to one factor.

A lot of manual human labour.

It is only because distributions have so many volunteers and paid staff that this entire system continues to function and give the illusion of security and reliability. When it fails, it fails silently.

Heartbleed really dragged the poor state of C security into the open, and it still has not been addressed.

When people say “how can we secure docker/flatpak/Rust” like we do with distributions, I say: “Do we really secure distributions at all?”. We only have a veneer of best effort masquerading as a secure supply chain.

A Different Model …

So let’s look briefly at Rust and how you package it today (against distribution maintainer advice).

Because it's statically linked, each application must be rebuilt if a library changes. Because the code comes from a central upstream, there are automated tools to find security issues (like cargo audit). The updates are pulled from the library as a whole, working, tested unit, and then built into our application to receive further testing and verification of the application as a whole, singular functional unit.

These dependencies can then be vendored into a tar archive (allowing offline builds and some aspects of reproducibility). This vendor.tar.gz is placed into the source rpm along with the application source, and then built.
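
As an illustration, a minimal sketch of that vendoring workflow (the paths here are illustrative, and your build setup will differ):

cargo vendor > .cargo/config     # vendor all dependencies into ./vendor and emit the source replacement config
tar -czf vendor.tar.gz vendor    # ship this tar alongside the application source in the source rpm
cargo build --release --offline  # later, inside the build root, build without network access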

There is a much stronger pipeline of assurances here! And to further aid Rust's cause, because it is a memory safe language, it eliminates most of the security issues that C is afflicted by, meaning security updates are far fewer and often affect higher level or esoteric situations. If you don't believe me, look at the low frequency and low severity of commits to the rust advisory-db.
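
If you haven't tried it, a rough sketch of how cargo audit fits into a workflow:

cargo install cargo-audit   # one-off install of the audit tooling
cargo audit                 # check Cargo.lock against the RustSec advisory-db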

People have worried that because Rust is statically linked we'll have to rebuild and update it continually to keep it secure - I'd say that because it's Rust we'll have stronger guarantees at build time that security issues are less likely to exist, and we won't have to ship updates nearly as often as a C stack.

Another point to make is that Rust libraries don't release patches - because of Rust's stronger guarantees at compile time and through integrated testing, people are less afraid of version updates. We are very unlikely to see Rust projects releasing patches; rather, they just ship "updates" to libraries and expect you to update. Because these are statically linked, we don't have to worry about versions of other libraries on the platform, we only need to assure the application is currently working as intended. Because of the strong typing, the interfaces of those libraries have stronger guarantees at compile time, meaning the issues around shared object versioning and symbol/version mismatching simply don't exist - one of the key reasons people became version change averse in the first place.

So Why Not Package All The Things?

Many distribution packagers have been demanding a C-like model for Rust and others (remember, square peg, round hole). This means every single crate (library) is packaged, and then added to a set of buildrequires for the application. When a crate updates, it triggers the application to rebuild. When a security update for a library comes out, it rebuilds etc.

This should sound familiar … because it is. It’s reinventing Cargo in a clean-room.

RPM provides a way to manage dependencies. Cargo provides a way to manage dependencies.

RPM provides a way to offline build sources. Cargo provides a way to offline build sources.

RPM provides a way to patch sources. Cargo provides a way to update them in place - and patch if needed.

RPM provides a way to … okay you get the point.

There is also a list of things we won't get from distribution packages - remember, distribution packages are the C language packaging system.

We won’t get the same level of attention to detail, innovation and support as the upstream language tooling has. Simply put, users of the language just won’t use distribution packages (or toolchains, libraries …) in their workflows.

What distribution packages can't offer is the integration with tools like cargo-audit for scanning for security issues - that still needs Cargo, not RPM, meaning the RPM would need to emulate exactly what Cargo does.

Using distribution packages means you have an untested pipeline that may add more risks now. Developers won’t use distribution packages - they’ll use cargo. Remember applications work best as they are tested and developed - outside of that environment they are an unknown.

Finally, the distribution maintainers' security proposition is to secure our libraries - for distributions only. That's acting in self interest. Cargo is offering a way to secure upstream so that everyone benefits. That means less effort and less manual labour all around. And secure libraries are not the full picture. Secure applications are what matter.

The larger concern is the sheer amount of human effort. We would spend hundreds if not thousands of hours to reinvent a functional tool in a disengaged manner, just so that we can do things as they have always been done in C - for the benefit of distributions individually rather than languages upstream.

What is the Point

Again - as a platform our role is to provide applications that people can trust. The way we provide these applications is never going to be one size fits all. Our objective isn't to secure "this library" or "that library", it's to secure applications as a functional whole. That means that companies shipping those applications should hire maintainers to work on those applications to secure their stacks.

Today I honestly think Rust has a better security and updating story than C packages ever have, powered by automation and upstream integration. Let's lean on that, contribute to it, and focus on shipping applications instead of reinventing tools. We need to accept that our current model is focused on C, that developers have moved around distribution packaging, and that we need to change our approach to eliminate the large human risk factor that currently exists.

We can’t keep looking to the models of the past, we need to start to invest in new methods for the future.

Today, distributions should focus on supporting and distributing _applications_ and work with native language supply chains to enable this.

Which is why I'll keep using cargo's tooling and auditing, and use distribution packages as a delivery mechanism for those applications.
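
To make that concrete, here is a very rough sketch of what a spec file taking this approach might look like. The names and paths are hypothetical, and a real package would use the distribution's macros and metadata:

Name:     myapp
Version:  1.0.0
Release:  1
Summary:  An example Rust application
License:  MPL-2.0
Source0:  myapp-1.0.0.tar.gz
Source1:  vendor.tar.gz

%description
An example application built from vendored Rust sources.

%prep
%setup -q
tar -xzf %{SOURCE1}   # unpack the vendored dependencies next to the source

%build
# assumes the .cargo/config produced by cargo vendor is included in the sources
cargo build --release --offline

%install
install -D -m 0755 target/release/myapp %{buildroot}%{_bindir}/myapp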

What Could it Look Like?

We have a platform that updates as a whole (Fedora Atomic comes to mind …) with known snapshots that are tested and well known. This platform has methods to run applications, and those applications are isolated from each other, have their own libraries, and their own security audits.

And because there are now far fewer moving parts, quality is easier to assert and understand, and security updates are easier, faster, and less risky.

It certainly sounds a lot like what macOS and iOS have been doing with a read-only base, and self-contained applications within that system.

Packaging, Vendoring, and How It’s Changing

Posted by William Brown on December 17, 2019 02:00 PM

Packaging, Vendoring, and How It’s Changing

In today’s thoughts, I was considering packaging for platforms like opensuse or other distributions and how that interacts with language based packaging tools. This is a complex and … difficult topic, so I’ll start with my summary:

Today, distributions should focus on supporting and distributing _applications_ and work with native language supply chains to enable this.

Distribution Packaging

Let's start by clarifying what distribution packaging is. This is your Linux platform's method of distributing its programs and libraries. For our discussion we really only care about Linux, so think SUSE or Fedora here. How macOS or FreeBSD deal with this is quite different.

Now these distribution packages are built to support certain workflows and end goals. Many open source C projects release their source code in varying states, perhaps with patches to improve or fix issues. This code is then put into packages, dependencies between them are established due to dynamic linking, and the packages are signed for verification purposes and then shipped.

This process is really optimised for C applications. C has been the "system language" for many decades now, and we can see these features were designed to promote - and fill in gaps for - these applications.

For example, C applications are dynamically linked. Because of this, package maintainers are encouraged to "split" applications into smaller units that can have shared elements. An example that I know is openldap, which may be a single source tree, but is often packaged into multiple parts such as libldap.so, lmdb, the openldap-client applications, its server binary, and probably others. The package maintainer is used to taking their scalpel and carefully slicing sources into elegant packages that minimise how much is installed to what is "just needed".

We also see other behaviours where C shared objects have "versions", which means you can install multiple versions of them at once, and programs declare in their headers which library versions they want to consume. This means a distribution can have many versions of the same library installed!

With this in mind, the linking is simplistic and naive. If a shared object symbol doesn't exist, or you don't give it the "right arguments" via a weak compile-time contract, bad things (tm) are likely to happen. So for this, distribution packaging provides stronger assertions about "this program requires that library version".
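
If you want to see this weak contract for yourself, you can inspect what a binary says it needs - a small sketch, with a hypothetical binary and package name:

readelf -d /usr/bin/someprogram | grep NEEDED   # the shared objects (sonames) the binary requires
rpm -q --requires somepackage                   # the assertions the distribution package adds on top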

As well, in the past the internet was a more … wild place, where TLS wasn't widely used. This meant that to gain strong assertions about the source of a package and that it had not been tampered with, tools like GPG were used.

What about Ruby or Python?

Ruby and Python are very different languages compared to C though. They don't carry information in their programs about what versions of software they require, and how those pieces mesh together. Both languages are interpreted, and simply "import library" by name, searching a filesystem path for a matching library regardless of its version. Python then just loads that library as source straight into the running VM.

It’s already apparent how we’ll run into issues here. What if we have a library “foo” that has a different function interface between version 1 and version 2? Python applications only request access to “foo”, not the version, so what happens if the wrong version is found? What if it’s not found?

Some features of the "distribution package" are pretty useful here though: allowing these dynamic languages to have their dependencies requested from the "package", and having the package integrity checked for them.

But over time, conflicts started and issues arose. A real turning point was Ruby in Debian/Ubuntu, where Debian package maintainers (who are used to C) brought out the scalpel and attempted to slice Ruby down to "parts" that could be reused, from a C mindset. This led to a combination of packages that didn't make sense (rubygems minus TLS, but rubygems requires https), which really disrupted the communities.

Another issue was that as these languages grew in popularity, different projects required different versions of libraries - which, as mentioned before, isn't possible beside library search path manipulation, which is frankly user hostile.

These issues (and more) eventually caused these communities as a whole to stop recommending distribution packages.

So let's put this history in context. We have Ruby (1995) and Python (1990), which both decided to avoid distribution packages with their own tools, aka rubygems (2004) and pip (2011), as well as tools to manage multiple parallel environments (rvm, virtualenv) that were per-application.
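
As a sketch of what those per-application environments look like in practice today (the library name here is hypothetical):

python3 -m venv .venv                  # create an isolated environment for this one application
.venv/bin/pip install 'somelib==2.*'   # pin the version this application actually needs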

New kids on the block

Since then I would say three languages have risen to importance and learnt from the experiences of Ruby - these are JavaScript (npm/node), Go and Rust.

Rust went further than Ruby and Python and embedded distribution of libraries into its build tools from an early date with Cargo. As Rust is statically linked (libraries are built into the final binary, rather than being dynamically loaded), this moves all dependency management to build time - which prevents runtime library conflicts. And because Cargo is involved and controls all the paths, it can do things such as having multiple versions available in a single build for different components and coordinating all these elements.

Now to hop back to npm/js. This ecosystem introduced a new concept - micro-dependencies. This happened because JavaScript doesn't have dead code elimination. So if you are given a large utility library and call one function out of 100, you still have to ship the 99 unused ones. This means they needed a way to manage and distribute hundreds, if not thousands, of tiny libraries, each doing "one thing", so that you pulled in "exactly" the minimum required (that's not how it turned out … but it was the intent).

Rust has also inherited a similar culture - not to the same extreme as npm, because Rust DOES have dead code elimination, but still enough that my concread library with 3 direct dependencies pulls in 32 dependencies, and kanidm, from its 30 direct dependencies, pulls 365 into its graph.
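
If you want to see this for your own project, cargo can print the resolved graph - a quick sketch (on older toolchains this may require installing the cargo-tree plugin):

cargo tree           # show the full resolved dependency graph
cargo tree | wc -l   # a rough feel for how large the graph is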

But in a way this also doesn't matter - Rust enforces strong typing at compile time, so changes in libraries are detected before a release (not after, as in C or dynamic languages), and the versions used at build are the ones used in production due to the static linking.

This has led to a great challenge in distribution packaging for Rust - there are so many "libraries" that packaging them all would be a monumental amount of human effort, time, and work.

But once again, we see the distribution maintainers, scalpel in hand, a shine in their eyes looking and thinking “excellent, time to package 365 libraries …”. In the name of a “supply chain” and adding “security”.

We have to ask though, is there really value in spending all this time to package 365 libraries when Rust functions so differently?

What are you getting at here?

To put it clearly - distribution packaging isn’t a “higher” form of distributing software. Distribution packages are not the one-true solution to distribute software. It doesn’t magically enable “security”. Distribution Packaging is the C language source and binary distribution mechanism - and for that it works great!

Now that we can frame it like this we can see why there are so many challenges when we attempt to package Rust, Python or friends in rpms.

Rust isn’t C. We can’t think about Rust like C. We can’t secure Rust like C.

Python isn’t C. We can’t think about Python like C. We can’t secure Python like C.

These languages all have their own quirks, behaviours, flaws, benefits, and goals. They need to be distributed in unique ways appropriate to those languages.

An example of the mismatch

To help drive this home, I want to bring up FreeIPA. FreeIPA has a lot of challenges in packaging due to its huge number of C, Python and Java dependencies. Recently on twitter it was announced that "FreeIPA has been packaged for debian", as the last barrier (being dogtag/java) was overcome to package the hundreds of required dependencies.

The inevitable outcome of debian now packaging FreeIPA will be:

  • FreeIPA will break in some future event as one of the python or java libraries was changed in a way that was not expected by the developers or package maintainers.
  • Other applications may be "held back" from updating for risk/fear of breaking FreeIPA, which stifles innovation in the surrounding Java/Python ecosystems.

It won’t be the fault of FreeIPA. It won’t be the fault of the debian maintainers. It will be that we are shoving square applications through round C shaped holes and hoping it works.

So what does matter?

It doesn’t matter if it’s Kanidm, FreeIPA, or 389-ds. End users want to consume applications. How that application is developed, built and distributed is a secondary concern, and many people will go their whole lives never knowing how this process works.

We need to stop focusing on packaging libraries and start to focus on how we distribute applications.

This is why projects like docker and flatpak have surprised traditional packaging advocates. These tools are about how we ship applications, and their build and supply chains are separate from the distribution's.

This is why I have really started to advocate and say:

Today, distributions should focus on supporting and distributing _applications_ and work with native language supply chains to enable this.

Only once we accept this shift can we start to find value in distributions again as sources of trusted applications, and see the distribution as an application platform rather than a collection of tiny libraries.

The risk of not doing this is alienating communities (again) from being involved in our platforms.

Follow Up

There have been some major comments since:

First, there is now a C package manager named conan. I have no experience with this tool, so at a distance I can only assume it works well for what it does. However it was noted that it hasn't gained much popularity, likely due to the fact that distro packages are the current C language distribution channels.

The second was about the security aspects of distribution packaging - that topic is so long that I've written another post about it instead, to keep this post focused.

Finally, the Fedora Modularity effort is trying to deal with some of these issues - that modules, aka applications, have different cadences and requirements, and that those modules can move more freely than the base OS.

Some of the challenges have been explored by LWN and it's worth reading. But I think the underlying issue is that again we are approaching things in a way that may not align with reality - people are looking at modules as libraries, not applications, which is causing things to go sideways. And when those modules are installed, they aren't isolated from each other, meaning we are back to square one, with a system designed only for C. People are starting to see that, but the key point is continually missed - that modularity should be about applications and their isolation, not about multiple library versions.

Fixing opensuse virtual machines with resume

Posted by William Brown on December 14, 2019 02:00 PM

Fixing opensuse virtual machines with resume

Today I hit an unexpected issue - after changing a virtual machine's root disk to scsi, I was unable to boot the machine.

The host is opensuse leap 15.1, and the vm is the same. What’s happening!

The first issue appears to be that opensuse 15.1 doesn’t support scsi disks from libvirt. I’m honestly not sure what’s wrong here.

The second is that by default opensuse leap configures suspend and resume to disk - but it uses the PCI path instead of a swap volume UUID. So when you change the bus type, the path changes, making the volume inaccessible. This causes boot to fail.

To work around this you can remove "resume=/disk/path" from the kernel command line. Then to fix it permanently you need:

transactional-update shell
vim /etc/default/grub
# Edit this line to remove "resume"
GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0,115200 resume=/dev/disk/by-path/pci-0000:00:07.0-part3 splash=silent quiet showopts"

vim /etc/default/grub_installdevice
# Edit the path to the correct swap location as by ls -al /dev/disk/by-path
/dev/disk/by-path/pci-0000:00:07.0-part3

I have reported these issues, and I hope they are resolved.

Password Quality and Badlisting in Kanidm

Posted by William Brown on December 06, 2019 02:00 PM

Password Quality and Badlisting in Kanidm

Passwords are still a required part of any IDM system. As much as I wish for Kanidm to only support webauthn and stronger authentication types, at the end of the day devices can be lost, destroyed, some people may not be able to afford them, some clients aren’t compatible with them and more.

This means the current state of the art is still multi-factor auth. Something you have and something you know.

Despite the presence of the multiple factors, it’s still important to quality check passwords. Microsoft’s Azure security team have written about passwords, and it really drives home the current situation. I would certainly trust these people at Microsoft to know what they are talking about given the scale of what they have to defend daily.

The most important take away is that trying to obscure the password from a bruteforce is a pointless exercise because passwords end up in password dumps, they get phished, keylogged, and more. MFA matters!

It's important here to look at the "easily guessed" and "credential stuffing" categories. That's what we really want to defend against with password quality, while MFA protects us against keylogging, phishing (only webauthn), and reuse.

Can we Avoid This?

Yes! Kanidm supports a "generated" password - a long, high entropy password that should be stored in a password manager or similar tool so that no human needs to know it. This fits our "device as authentication" philosophy. You authenticate to your device (phone, laptop etc), and then the device's stored passwords authenticate you from that point on. This has the benefit that devices and password managers generally perform better checking of the target where we enter the password, making phishing less likely.
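
If you want something similar outside of kanidm, any high entropy generator feeding straight into a password manager achieves the same effect - a sketch only, not the kanidm mechanism itself:

openssl rand -base64 48   # generate a long, high entropy secret to store in your password manager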

But sometimes we can’t rely on this, so we still need human-known passwords, so we still take steps to handle these.

Quality

In the dark ages, we would decree that human-known passwords had to have a number, symbol, roman numeral, no double letters, one capital letter, two kanji and no symbols that could be part of an SQL injection, because of that one legacy system we can't patch.

This led to people making horrid, un-rememberable passwords in leetspeak, or giving up altogether and making the excellent "Password1" (which passes AD minimum password requirements on Server 2003).

What we really want is entropy, length, and memorability. Dropbox made a great library for this called zxcvbn, which has since been ported to Rust. I highly recommend it.

This library is great because it focuses on entropy, and then if the password doesn’t meet these requirements, the library recommends ways to improve. This is excellent for human interaction and experience, guiding people to create better passwords that they can remember, rather than our outdated advice of the complex passwords as described above.

Badlisting

Badlisting is another great technique for improving password quality. It’s essentially a blocklist of passwords that people are not allowed to set. This way you can have corporate-specific breach lists, or the top 10k most used passwords badlisted to prevent users using them. For example, “correct horse battery staple” may be a strong password, but it’s well known thanks to xkcd.

It's also good for preventing password reuse in your company when credentials are phished and privately notified to you, as some of the regional CERTs do, allowing you to block them without them appearing in a public breach list.

This is important as many bots will attempt to spam these passwords against accounts (rate limiting auth and soft-locking accounts also helps to delay these attack styles).

In Kanidm

In Kanidm, we chose to use both approaches. First we check the password with zxcvbn, then we ensure it’s not in a badlist.

In order to minimise the size of the badlist, the badlist uses case insensitive storage so that multiple variants of “password” and “PasSWOrD” are only listed once. We also preprocessed the badlist with zxcvbn to remove any passwords that it would have denied from being entered. The preprocessor tool will be shipped with kanidm so that administrators can preprocess their own lists before adding them to the badlist configuration.
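
As an illustration of the case insensitive storage idea (this is not the kanidm preprocessor itself, just a sketch of the normalisation step):

tr '[:upper:]' '[:lower:]' < badlist.txt | sort -u > badlist.normalised.txt   # lowercase and deduplicate candidate passwords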

Creating a Badlist

I decided to do some analysis on a well known set of passwords maintained in the seclists repository. Apparently this is what pentesters reach for when they want to bruteforce or credential stuff a domain.

I analysed this in three ways. The first was the full set of passwords (about 25 million), the second a smaller but "popular" set in the "rockyou" files, which is about 60,000 passwords. Finally I did an analysis of rockyou + the top 10 million most common (which combined was 1011327 unique passwords, so about 50k of the rockyou set is from the top 10 million).

From the 25 million set, I ran this through a preprocessor tool that I developed for kanidm. It eliminated anything with a score of less than 3, with no length rule. This showed that zxcvbn was able to prevent 80% of these inputs from being allowed. If I was to ship this full list, it would contain 4.8 million badlisted passwords. It's pretty amazing already that zxcvbn stops 80% of bad passwords that end up in breaches from being usable, with the remaining 20% likely to be complex passwords that just got dumped by other means.

However, for the badlist in Kanidm, I decided to start with “what’s popular” for now, and to allow sites to add extra content if they desire. This meant that I focused instead on the “rockyou” password set instead.

From the rockyou set I did more tests. zxcvbn has a concept of scores, and we can have a policy to request that a minimum score is present to allow the password. I did a score 3 test, a score 3 with minimum password length 10, and a score 4 test. This showed the following results, which give the % blocked by zxcvbn and the number that passed, which will require badlisting as zxcvbn can't detect them (today).

TEST     | % blocked | no. passed
---------------------------------
 s3      |  98.3%    |  1004
 s3 + 10 |  98.9%    |  637
 s4      |  99.7%    |  133

Personally, it’s quite hilarious that “2fast2furious” passed the score 3 check, and “30secondstomars” and “dracomalfoy” passed the score 4 check, but who am I to judge - that’s what bad lists are for.

More seriously, I found it interesting to see the effect of the check on length - not only was the preprocessor step faster, but that alone eliminated ~400 passwords that would have "passed" on score 3.

Finally, from the rockyou + 10m set, the results show the following in the same conditions.

TEST     | % blocked | no. passed
---------------------------------
 s3      |  89.9%    |  101349
 s3 + 10 |  92.4%    |  76425
 s4      |  96.5%    |  34696

This shows that a very "easy" win is to enforce password length, in addition to entropy checkers like zxcvbn, which together block 92% of the most common passwords in use on the broad set and 98% of what a pentester will look for (assuming the rockyou lists). If you have a high security environment you should consider setting zxcvbn to require passwords of score 4 (the maximum), given that on the 10m set it had a 96.5% block rate.

Conclusions

You should use zxcvbn, it’s a great library, which quickly reduces a huge amount of risk from low quality passwords.

After that your next two strongest controls are password length, and being able to support badlisting.

Better yet, use MFA like Webauthn as well, and support server-side generated high-entropy passwords!

Rust 2020 - helping to get rust deployed

Posted by William Brown on November 27, 2019 02:00 PM

Rust 2020 - helping to get rust deployed

This is my contribution to Rust 2020, where community members put forward ideas on what they think Rust should aim to achieve in 2020.

In my view, Rust has had an amazing adoption by developers, and is great if you are in a position to deploy it in your own infrastructure, but we have yet to really see Rust make it to broad low-level components (IE in a linux distro or other infrastructure).

As someone who works on “enterprise” software (389-ds) and my own IDM project (kanidm), there is a need to have software packaged and distributed. We can not ask our consumers to build and compile these tools. One could view it as a chain, where I develop software in a language, it’s packaged for a company (like SUSE), and then consumed by a customer (could be anyone!) who provides a service to others (indirect users).

Rust however has always been modelled as though there is no "middle" section. Either you have a developer whose intent is to develop for other developers - this is where Rust ideas like crates.io become involved. Alternately, you have a larger example in Firefox, where developers build a project and can "bundle" everything into a whole unit that is then distributed directly to customers.

The major difference is that in the intermediate distribution case, we have to take on different responsibilities such as security auditing, building, ensuring dependencies exist etc.

So this leads me to:

1: Cargo Vendor Needs Some Love

Cargo vendor today is quite confusing in some scenarios, and it's not clear how to make it work for projects that require offline builds. I have raised issues about this, but largely they have not been acted upon.

2: Cargo is Difficult to Use in Mixed Language Projects

A large value proposition of Rust is the ability to use it with FFI and C. This is great if you say have cargo compile your C code for you.

But most major existing projects don't. They use autotools, cmake, maybe meson, or something more esoteric like waf (looking at you, samba). Cargo's extreme opinionation in this area makes it extremely difficult to integrate Rust into an existing build system reliably. It's hard to point to one fault, as much as a broader "lack of concern" in the space, and I think cargo needs to evolve improvements to help allow Rust to be used from other build tools.

3: Rust Moves Too Fast

A lot of “slower” enterprise companies want to move slowly, including compiler versions. Admittedly, this conservative behaviour is because of the historical instability of gcc versions and how it can change or affect your code between releases. Rust doesn’t suffer this, but people are still wary of fast version changes. This means Rustc + Cargo will get pinned to some version that may be 6 months old.

However crate authors don’t consider this - they will use the latest and greatest features from stable (and sometimes still nightly … grrr) in releases. Multiple times I have found that on my development environment even with a 3 month old compiler, dependencies won’t build.

Compounding this, crates.io doesn't distinguish a security release from a feature one. Crates also encourage continual version bumping, rather than maintenance of versioned branches. IE version 0.4.3 of a crate with a security fix will become 0.4.4, but then a feature update to include try_from may go to 0.4.5 as it "adds" to the API, or because they use it internally as a cleanup.
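
When you do need to pull in just the patched release of a single crate, cargo can pin it for you - a sketch with a hypothetical crate name:

cargo update -p somecrate --precise 0.4.4   # update only this crate in Cargo.lock to the fixed version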

Part of this issue is that Rust applications need to be treated closer to docker, as static whole units where only the resulting binary is supported rather than the toolchain that built it. But that only works on pure Rust applications - any mixed C + Rust application will hit this issue due to the difference between a system Rust version and what crate dependencies publish.

So I think that from this it leads to:

3.1: Crates need to indicate a minimum supported compiler version

Rust has "toyed" with the idea of editions, but within 2018 we've seen new features like MaybeUninit and try_from land, which within an "edition" caused crates to stop working on older compilers.

As a result, I think editions are "too broad": people will fear incrementing them, and Rust will add features without changing edition anyway. Instead, Rust needs to consider following up on the minimum supported Rust version flag RFC. Crate developers, through what they release, have made it pretty clear the only "edition" flag that matters is the Rust compiler version.

3.2: Rust Needs to Think “What’s Our End Goal”

Rust is still moving incredibly fast, and I think in a way we need to ask … when will Rust be finished? When will it as a language go from continually rapid growth to stable and known feature sets? The idea of Rust editions acts as though this has happened (saying we change only every few years) when this is clearly not the case. Rust is evolving release-on-release, every 6 weeks.

4: Zero Cost Needs to Factor in Human Cost

My final wish for Rust is that sometimes we are so obsessed with the technical desire for zero cost abstraction that we forget the high human cost and the barriers that can exist as a result, making feature adoption challenging. Rust has had a great community that treats people very well, and I think sometimes we need to extend that into feature development, to really consider the human cognitive cost of a feature.

Summarised - what’s the benefit of a zero cost abstraction if people can not work out how to use it?

Summary

I want to see Rust become a major part of operating systems and how we build computer systems, but I think that we need to pace ourselves, improve our tooling, and have some better ideas around what Rust should look like.

Recovering LVM when a device is missing with a cache pool lv

Posted by William Brown on November 25, 2019 02:00 PM

Recovering LVM when a device is missing with a cache pool lv

I had a heart-stopping moment today: after running a command, lvm proudly announced it had removed an 8TB volume containing all of my virtual machine backing stores.

Everyone, A short view back to the past …

I have a home server, with the configured storage array of:

  • 2x 8TB SMR (Shingled Magnetic Recording) archive disks (backup target)
  • 2x 8TB disks (vm backing store)
  • 2x 1TB nvme SSD (os + cache)

The vm backing store also had an lvm cache segment via the nvme ssds in a raid 1 configuration. This means that the 2x 8TB drives are in raid 1, a partition on each of the nvme devices is in raid 1, and then the two are composed to allow the nvme to cache blocks from/to the 8TB array.

Two weeks ago I noticed one of the nvme drives was producing IO errors indicating a fault of the device. Not wanting to risk corruption or other issues from growing out of hand, I immediately shutdown the machine and identified the nvme disk with the error.

At this stage I took the precaution of imaging (dd) both the good and bad nvme devices to the archive array. Subsequently I completed a secure erase of the faulty nvme drive before returning it to the vendor for RMA.

I then left the server offline, as I was away from my home for more than a week and would not need it, and was unable to monitor whether the other drives would produce further errors.

Returning home …

I decided to ignore William of the past (always a bad idea) and to “break” the raid on the remaining nvme device so that my server could operate allowing me some options for work related tasks.

This is an annoying process in lvm - you need to remove the missing device from the volume group, as well as indicate to the array that it should no longer be in a raid state. The vgreduce is only for removing missing PVs; it shouldn't be doing anything else.

I initiated the raid break process on the home, root and swap devices. The steps are:

vgreduce --removemissing <vgname>
lvconvert -m0 <vgname>/<lvname>

This occurred without fault, due to these being on an isolated "system" volume group, so the partial lvs were untouched and left on the remaining pv in the vg.

When I then initiated this process on the “data” vg which contained the libvirt backing store, vgreduce gave me the terrifying message:

Removing logical volume "libvirt_t2".

Oh no ~

Recovery Attempts

When a logical volume is removed, it can be recovered as lvm stores backups of the LVM metadata state in /etc/lvm/archive.

My first reaction was that I was on a live disk, so I needed to back up this content or else it would be lost on reboot. I chose to put this on the unaffected and healthy SMR archive array.

mount /dev/mapper/archive-backup /archive
cp -a /etc/lvm /archive/lvm-backup

At this point I knew that randomly attempting commands would cause further damage and likely prevent any ability to recover.

The first step was to activate ssh so that I could work from my laptop - rather than the tty with keyboard and monitor on my floor. It also means you can copy paste, which reduces errors. Remember, I’m booted on a live usb, which is why I reset the password.

# Only needed in a live usb.
passwd
systemctl start sshd

I then formulated a plan and wrote it out. This helps to ensure that I've thought through the recovery process and the risks, and it helps me to be fully aware of the situation.

vim recovery-plan.txt

Into this I laid out the commands I would follow. Here is the plan:

bytes 808934440960
data_00001-2096569583

dd if=/dev/zero of=/mnt/lv_temp bs=4096 count=197493760
losetup /dev/loop10 /mnt/lv_temp
pvcreate --restorefile /etc/lvm/archive/data_00001-2096569583.vg --uuid iC4G41-PSFt-6vqp-GC0y-oN6T-NHnk-ivssmg /dev/loop10
vgcfgrestore data --test --file /etc/lvm/archive/data_00001-2096569583.vg

Now to explain this: The situation we are in is:

  • We have a removed data/libvirt_t2 logical volume
  • The VG data is missing a single PV (nvme0). It still has three PVs (nvme1, sda1, sdb1).
  • We can not restore metadata unless all devices are present as per the vgcfgrestore man page.

This means, we need to make a replacement device to replace into the array, and then to restore the metadata with that.

The “bytes” section you see, is the size of the missing nvme0 partition that was a member of this array - we need to create a loopback device of the same or greater size to allow us to restore the metadata. (dd, losetup)

Once the loopback is created, we can then recreate the pv on the loopback device with the same UUID as the missing device.

Once this is present, we can now restore the metadata as documented which should contain the logical volume.

I ran these steps and it was all great until vgcfgrestore. I can not remember the exact error but it was along the lines of:

Unable to restore metadata as PV was missing for VG when last modification was performed.

Yep, the vgreduce command has changed the VG state, triggering a metadata backup, but because a device was missing at the time, we can not restore this metadata.

Options …

At this point I had to consider alternate options. I conducted research into this topic to see if others had encountered this case (apparently no one has ever been unable to restore their metadata in this case …). The options that I arrived at:

    1. Restore the metadata from the nvme /root as it has older (but known) states - however I had recently expanded the libvirt_t2 volume from a live disk, meaning it may not have the correct part sizes.
    2. Attempt to extract the xfs filesystem with dd from the disk to begin a data recovery.
    3. Cry in a corner
    4. Use lvcreate with the "same parameters" and hope that it aligns the start at the same location as the former data/libvirt_t2, allowing the xfs filesystem to be accessed.

All of these weren’t good - especially not 3.

I opted to attempt solution 1, and then if that failed, I would disconnect one of the 8TB disks, attempt solution 4, and if that ALSO failed, I would then attempt 2; with all else lost I would begin solution 3. The major risk of relying on 4 and 2 is that LVM has dynamic geometry on disk - it does not always allocate contiguously. This means that attempting 4 with lvcreate may not create with the same geometry, and it may write to incorrect locations causing data loss. The risk of 2 was, again, that due to the dynamic geometry, what we recover may be rearranged and corrupt.

This meant option 1 was the best way to proceed.

I mounted the /root volume of the host and using the lvm archive I was able to now restore the metadata.

vgcfgrestore data --test --file /system/etc/lvm/archive/data_00004-xxxx.vg

Once completed, I performed an lvscan to refresh what block devices were available. I was then shown that every member of the VG data had a conflicting seqno, and that the metadata was corrupt and unable to proceed.

Somehow we’d made it worse :(

Successful Procedure

At this point, faced with 3 options that were all terrible, I started to do more research. I finally discovered a post describing that the lvm metadata is stored on disk in the same format as the .vg files in the archive, and it’s a ring buffer. We may be able to restore from these.

To do so, you must dd out of the disk into a file, and then manipulate the file to only contain a single metadata entry.

Remember how I made images of my disks before I sent them back? This was their time to shine.

I did write a recovery plan with these commands too, but it was more evolving due to the parameters involved, so it changed frequently with the offsets. The plan was very similar to the one above - use a loop device as a stand-in for the missing block device, restore the metadata, and then go from there.

We know that LVM metadata occurs in the first section of the disk, just after the partition start. So to work out where this is we use gdisk to show the partitions in the backup image.

# gdisk /mnt/mion.nvme0n1.img
GPT fdisk (gdisk) version 1.0.4
...

Command (? for help): p
Disk /mnt/mion.nvme0n1.img: 2000409264 sectors, 953.9 GiB
Sector size (logical): 512 bytes
...

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048         1026047   500.0 MiB   EF00
   2         1026048       420456447   200.0 GiB   8E00
   3       420456448      2000409230   753.4 GiB   8E00

It’s important to note the sector size flag, as well as the fact the output is in sectors.

The LVM header occupies 255 sectors after the start of the partition. So with this in mind, we can now create a dd command to extract the needed information.

dd if=/mnt/mion.nvme0n1.img of=/tmp/lvmmeta bs=512 count=255 skip=420456448

bs sets the block size to 512 (matching the sector size), count reads 255 blocks of size 'bs' from the start, and skip says to start reading after 'skip' blocks of size 'bs' - here, the start sector of the partition.

At this point, we can now copy this and edit the file:

cp /tmp/lvmmeta /archive/lvm.meta.edit

Within this file you can see the ring buffer of lvm metadata. You need to find the highest seqno that is a complete record. For example, my seqno = 20 was partial (is the lvm metadata longer than 255 sectors? please contact me if you know!), but seqno = 19 was complete.

Here is the region:

# ^ more data above.
}
# Generated by LVM2 version 2.02.180(2) (2018-07-19): Mon Nov 11 18:05:45 2019

contents = "Text Format Volume Group"
version = 1

description = ""

creation_host = "linux-p21s"    # Linux linux-p21s 4.12.14-lp151.28.25-default #1 SMP Wed Oct 30 08:39:59 UTC 2019 (54d7657) x86_64
creation_time = 1573459545      # Mon Nov 11 18:05:45 2019

^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@data {
id = "4t86tq-3DEW-VATS-1Q5x-nLLy-41pR-zEWwnr"
seqno = 19
format = "lvm2"

So from there you remove everything above “contents = …”, and clean up the vgname header. It should look something like this.

contents = "Text Format Volume Group"
version = 1

description = ""

creation_host = "linux-p21s"    # Linux linux-p21s 4.12.14-lp151.28.25-default #1 SMP Wed Oct 30 08:39:59 UTC 2019 (54d7657) x86_64
creation_time = 1573459545      # Mon Nov 11 18:05:45 2019

data {
id = "4t86tq-3DEW-VATS-1Q5x-nLLy-41pR-zEWwnr"
seqno = 19
format = "lvm2"

Similarly, you then need to find the bottom of the segment (look for the next highest seqno) and remove everything below the line "# Generated by LVM2 …".

Now, you can import this metadata to the loop device for the missing device. Note I had to wipe the former lvm meta segment due to the previous corruption, which caused pvcreate to refuse to touch the device.

dd if=/dev/zero of=/dev/loop10 bs=512 count=255
pvcreate --restorefile lvmmeta.orig.nvme1.edited --uuid iC4G41-PSFt-6vqp-GC0y-oN6T-NHnk-ivssmg /dev/loop10

Now you can do a dry run:

vgcfgrestore --test -f lvmmeta.orig.nvme1.edited data

And the real thing:

vgcfgrestore -f lvmmeta.orig.nvme1.edited data
lvscan

Hooray! We have volumes! Let’s check them, and ensure their filesystems are sane:

lvs
lvchange -ay data/libvirt_t2
xfs_repair -n /dev/mapper/data-libvirt_t2

If xfs_repair says no errors, then go ahead and mount!

At this point, lvm started to resync the raid, so I’ll leave that to complete before I take any further action to detach the loopback device.

How to Handle This Next Time

The cause of this issue really comes from vgreduce --removemissing removing the device when a cache member can't be found. I plan to report this as a bug.

However another key challenge was the inability to restore the lvm metadata when the metadata archive reported a missing device. This is what stopped me from being able to restore the array in the first place, even though I had a “fake” replacement. This is also an issue I intend to raise.

Next time I would:

  • Activate the array as a partial
  • Remove the cache device first
  • Then stop the raid
  • Then perform the vgreduce

I really hope this doesn’t happen to you!

Upgrading OpenSUSE 15.0 to 15.1

Posted by William Brown on September 24, 2019 02:00 PM

Upgrading OpenSUSE 15.0 to 15.1

It's not entirely obvious how to do this. You have to edit the repo files to change the release version, then refresh and update.

sed -ri 's/15\.0/15.1/' /etc/zypp/repos.d/*.repo
zypper ref
zypper dup
reboot

Note this works on a transactional host too:

sed -ri 's/15\.0/15.1/' /etc/zypp/repos.d/*.repo
transactional-update dup
reboot

It would be nice if there was an upgrade tool that would attempt the upgrade and revert the repo files, or use temporary repo files for the upgrade. It would be a nicer user experience than sed-ing the repo files.

Announcing Kanidm - A new IDM project

Posted by William Brown on September 17, 2019 02:00 PM

Announcing Kanidm - A new IDM project

Today I’m starting to talk about my new project - Kanidm. Kanidm is an IDM project designed to be correct, simple and scalable. As an IDM project we should be able to store the identities and groups of people, authenticate them securely to various other infrastructure components and services, and much more.

You can find the source for kanidm on github.

For more details about what the project is planning to achieve, and what we have already implemented please see the github.

What about 389 Directory Server

I'm still part of the project, and working hard on making it the best LDAP server possible. Kanidm and 389-ds have different goals. 389 Directory Server is a globally scalable, distributed database that can store huge amounts of data and process thousands of operations per second. 389-ds lets you build a system on top, in any way you want. If you want an authentication system today, use 389-ds. We are even working on a self-service web portal soon too (one of our most requested features!). Besides myself, no one on the (amazing) 389 DS team has any association with kanidm (yet?).

Kanidm is an opinionated IDM system, and has strong ideas about how authentication and users should be processed. We aim to be scalable, but that’s a long road ahead. We also want to have more web integrations, client tools and more. We’ll eventually write a kanidm to 389-ds sync tool.

Why not integrate something with 389? Why something new?

There are a lot of limitations with LDAP when it comes to modern web-focused auth processes such as webauthn. Because of this, I wanted to make something that didn't have the same limitations, and had different ideas about data storage and APIs. That's why I wanted to make something new in parallel. It was a really hard decision to make something outside of 389 Directory Server (because I really do love the project, and have great pride in the team), but I felt it was going to be more productive to build in parallel than on top.

When will it be ready?

I think that a single-server deployment will be usable for small installations early 2020, and a fully fledged system with replication would be late 2020. It depends on how much time I have and what parts I implement in what order. Current rough work order (late 2019) is indexing, RADIUS integration, claims, and then self-service/web ui.

OpenSUSE leap as a virtualisation host

Posted by William Brown on September 01, 2019 02:00 PM

OpenSUSE leap as a virtualisation host

I’ve been rebuilding my network to use SUSE from CentOS, and the final server was my hypervisor. Most of the reason for this is the change in my employment, so I feel it’s right to dogfood for my workplace.

What you will need

  • Some computer parts (assembly may be required)
  • OpenSUSE LEAP 15.1 media (dd if=opensuse.iso of=/dev/a_usb_i_hope)

What are we aiming for?

My new machine has dual NVME and dual 8TB spinning disks. The intent is to have the OS on the NVME and to have a large part of the NVME act as an LVM cache for the spinning disks. The host won't run any applications besides libvirt, and has to connect a number of vlans over a link aggregation.

Useful commands

Throughout this document I'll assume some details about your devices and partitions. To find your own, and to always check and confirm what you are doing, some commands will help:

lsblk  # Shows all block storage devices, and how (if) they are mounted.
lvs  # shows all active logical volumes
vgs  # shows all active volume groups
pvs  # shows all active physical volumes
dmidecode  # show hardware information
ls -al /dev/disk/by-<ID TYPE>/  # how to resolve disk path to a uuid etc.

I’m going to assume you have devices like:

/dev/nvme0  # the first nvme of your system that you install to.
/dev/nvme1  # the second nvme, used later
/dev/sda    # Two larger block storage devices.
/dev/sdb

Install

Install and follow the prompts. Importantly, install to a single NVME only, choose transactional server + lvm, then put btrfs on the /root in the lvm. You want to partition such that there is free space still in the NVME - I left about 400GB unpartitioned for this. In other words, the disk should be like:

[ /efi | pv + vg system               | pv (unused) ]
       | /root (btrfs), /boot, /home  |

Remember to leave about 1GB of free space on the system vg to allow for raid 1 metadata later!

Honestly, it may take you a try or two to get this right with YaST, and it was probably the trickiest part of the install.

You should also select that network management is via NetworkManager, not wicked. You may want to enable ssh here. I personally disabled the firewall because there are no applications and it interferes with the bridging for the vms.

Packages

Because this is suse transactional we need to add packages and reboot each time. Here is what I used, but you may find you don’t need everything here:

transactional-update pkg install libvirt libvirt-daemon libvirt-daemon-qemu \
  sssd sssd-ad sssd-ldap sssd-tools docker zsh ipcalc python3-docker rdiff-backup \
  vim rsync iotop tmux fwupdate fwupdate-efi bridge-utils qemu-kvm apcupsd

Reboot, and you are ready to partition.

Partitioning - post install

First, copy your gpt from the first NVME to the second. You can do this by hand with:

gdisk /dev/nvme0
p
q

gdisk /dev/nvme1
c
<duplicate the parameters as required>

Now we’ll make your /efi at least a little redundant

mkfs.fat /dev/nvme1p0
ls -al /dev/disk/by-uuid/
# In the above, look for your new /efi fs, IE CE0A-2C1D -> ../../nvme1n1p1
# Now add a line to /etc/fstab like:
UUID=CE0A-2C1D    /boot/efi2              vfat   defaults                      0  0

Now to really make this work, because it’s transactional, you have to make a change to the /root, which is readonly! To do this run

transactional-update shell dup

This puts you in a shell at the end. Run:

mkdir /boot/efi2

Now reboot. After the reboot your second efi should be mounted. rsync /boot/efi/* to /boot/efi2/. I leave it to the reader to decide how to sync this periodically.

Next you can setup the raid 1 mirror for /root and the system vg.

pvcreate /dev/nvme1p1
vgextend system /dev/nvme1p1

Now we have enough pvs to make a raid 1, so we convert all our volumes:

lvconvert --type raid1 --mirrors 1 system/home
lvconvert --type raid1 --mirrors 1 system/root
lvconvert --type raid1 --mirrors 1 system/boot

If this fails with “not enough space to alloc metadata”, it’s because you didn’t leave space on the vg during install. Honestly, I made this mistake twice due to confusion about things leading to two reinstalls …

Getting ready to cache

Now let's get ready to cache some data. We'll make pvs and vgs for data:

pvcreate /dev/nvme0p2
pvcreate /dev/nvme1p2
pvcreate /dev/sda1
pvcreate /dev/sdb1
vgcreate data /dev/nvme0p2 /dev/nvme1p2 /dev/sda1 /dev/sdb1

Create the larger volume

lvcreate --type raid1 --mirrors 1 -L7.5T -n libvirt_t2 data /dev/sda1 /dev/sdb1

Prepare the caches

lvcreate --type raid1 --mirrors 1 -L 4G -n libvirt_t2_meta data
lvcreate --type raid1 --mirrors 1 -L 400G -n libvirt_t2_cache data
lvconvert --type cache-pool --poolmetadata data/libvirt_t2_meta data/libvirt_t2_cache

Now put the caches in front of the disks. It's important to check that you have the correct cachemode at this point, because you can't change it without removing and re-adding the cache. I chose writeback because my nvme devices are in a raid 1 mirror, and it helps to accelerate writes. You may prefer the default, where the SSDs act as a read cache only.

lvconvert --type cache --cachemode writeback --cachepool data/libvirt_t2_cache data/libvirt_t2
mkfs.xfs /dev/mapper/data-libvirt_t2

You can monitor the amount of “cached” data in the data column of lvs.

Now you can add this to /etc/fstab as any other xfs drive. I mounted it to /var/lib/libvirt/images.

Network Manager

Now I have to assemble the network bridges. NetworkManager has some specific steps to follow to achieve this. I have:

  • two intel gigabit ports
  • the ports are link aggregated by 802.3ad
  • there are multiple vlans on top of the link agg
  • bridges must be built on top of the vlans

This requires a specific set of steps to layer everything, because NetworkManager sees the bridge and the link agg as separate things, with the vlan tying them together.

Configure the link agg, and add our two ethernet phys

nmcli conn add type bond con-name bond0 ifname bond0 mode 802.3ad ipv4.method disabled ipv6.method ignore
nmcli connection add type ethernet con-name bond0-eth1 ifname eth1 master bond0 slave-type bond
nmcli connection add type ethernet con-name bond0-eth2 ifname eth2 master bond0 slave-type bond

Add a bridge for a vlan:

nmcli connection add type bridge con-name net_18 ifname net_18 ipv4.method disabled ipv6.method ignore

Now tie together a vlan on the bond, to the bridge we created.

nmcli connection add type vlan con-name bond0.18 ifname bond0.18 dev bond0 id 18 master net_18 slave-type bridge

You will need to repeat these last two commands as required for the vlans you have.
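
For example, a hypothetical second vlan with id 24 would look like:

nmcli connection add type bridge con-name net_24 ifname net_24 ipv4.method disabled ipv6.method ignore
nmcli connection add type vlan con-name bond0.24 ifname bond0.24 dev bond0 id 24 master net_24 slave-type bridge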

House Keeping

Finally you need to do some housekeeping. Transactional server will automatically update and reboot, so you need to be ready for this. You may disable this with:

systemctl disable transactional-update.timer

You likely want to edit:

/etc/sysconfig/libvirt-guests

so you can control the guest shutdown policy during a UPS failure or a host reboot.
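
As a sketch, the relevant keys in that file look something like this (the values are just an example policy, not a recommendation):

ON_BOOT=ignore
ON_SHUTDOWN=shutdown
SHUTDOWN_TIMEOUT=120
PARALLEL_SHUTDOWN=5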

Now you can enable and start libvirt:

systemctl enable libvirtd
systemctl start libvirtd

Finally, you can build and import virtual machines.

LDAP Filter Syntax Validation

Posted by William Brown on August 28, 2019 02:00 PM

LDAP Filter Syntax Validation

Today I want to do a deep-dive into a change that will be released in 389 Directory Server 1.4.2. It’s a reasonably complicated change for our server, but it has a simple user interaction for admins and developers. I want to peel back some of the layers to explain what kind of experience, thought and team work goes into a change like this.

TL;DR - just keep upgrading your 389 Directory Server instance, and our ‘correct by default’ policy will apply, and you’ll keep having the best LDAP server we can make :)

LDAP Filters and How They Work

LDAP filters are one of the primary methods of expression in LDAP, and are used in almost every aspect of the system - from finding who you are when you log in, to asserting that you are a member of a group or have other security attributes.

For the purposes of this discussion we’ll look at this filter:

'(|(cn=william)(cn=claire))'

In order to execute these queries quickly (LDAP is designed to handle thousands of operations per second) we heavily rely on indexing. Indexing is often a topic where people believe it to be some kind of “magic” but it’s reasonably simple: indexes are pre-computed partial result sets. So why do we need these?

We’ll imagine we have two entries (invalid, and truncated for brevity).

dn: cn=william,...
cn: william

dn: cn=claire,...
cn: claire

These entries both have entry-ids - these IDs are per-server within a replication group, and are integers. You can show them by requesting entryid as an attribute in 389.

dn: cn=william,...
entryid: 1
cn: william

dn: cn=claire,...
entryid: 2
cn: claire

Our entries are stored in the main-entry database in /var/lib/dirsrv/slapd-standalone1/db/userRoot in the file “id2entry.db4”. This is a key-value database where the keys are the entryid, and the value is the serialised entry itself. Roughly, it’s:

[ ID ][ Entry             ]
  1     dn: cn=william,...
        cn: william

  2     dn: cn=claire,...
        cn: claire

Now, if we had NO indexes, to evaluate our filters we have to scan every entry of id2entry to determine if the filter matches. This algorithm is:

candidate_set = []
for id in id-min to id-max:
    entry = load_entry_by_id(id)
    if apply_filter(filter, entry):
        candidate_set.append(entry)

For two entries this may be fast, but when you have 1,000, 10,000, or even millions of entries, it is extremely slow. We call these searches full table scans, or in 389 DS, ALLIDS searches.

To make our searches faster we have indexes. An index is a mapping of a partial query term to an id list (In 389 we call these IDLs). An IDL is a set of integers. Our index for these examples would likely be something like:

cn
=william: [1, ]
=claire: [2, ]

These indexes are also stored in key-value databases in userRoot - you can see this as cn.db4.

So when we have an indexed term, to evaluate the query we load the indexes, use mathematical set operations to produce a candidate_id_set, and then load only the entries that match.

For example, in pseudo Python code:

# Assume query is: (cn=william)

attr = filter.get_attr_name()
with open('%s.db' % attr) as index:
    idl = index.get('=william') # from the filter :)

for id in idl:
    ... # as before.

So we can see now that when we load the idl for cn index, this would give us the set [1, ]. Even if the database had 100 million entries, as our idl is a single value, we only need to load the one entry that matches. Neat!

When we have a more complex operation such as AND and OR, we can now manipulate the idl sets. For example:

(|(cn=claire)(cn=william))
   cn =claire -> idl [2, ]
   cn =william -> idl [1, ]

candidate_idl_set = union([2, ], [1, ])
# [1, 2]

This means again, even with millions of entries, we only need to load entry 1 and 2 to uphold the query provided to us.

So we finally know enough to understand how our example query is executed. PHEW!

Unindexed Attributes

However, it’s not always so easy. When we have an attribute that isn’t indexed, we have to handle this situation. In these cases, while we operate on the idl set, we may insert an idl with the value of ALLIDS (which as previously mentioned, is the “set of all entries”). This can have various effects.

If this was an AND query, we can annotate that the filter is partially resolved. This means that if we had:

(&(cn=william)(unindexed=foo))

Because, in an AND condition, both filter components must be satisfied, we have a partial candidate set from cn=william of [1, ]. We can load this partial candidate set and then apply the filter test as in the full table scan case, but since we only apply it to a single entry this is really fast.

The real problem is OR queries. If we had:

(|(cn=william)(unindexed=foo))

Because OR means that either filter component could satisfy the filter, we have to turn the unindexed component into ALLIDS, and the result of the OR as a whole becomes ALLIDS. So even if we have 30 indexed values in the OR, a single ALLIDS (unindexed) component will always turn that OR into a full table scan. This is not good for performance!
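
To make that concrete, here is a small illustrative sketch - plain Python sets standing in for IDLs, with None as an ALLIDS sentinel. This is a model of the behaviour described above, not the real server code:

# Illustrative model only: IDLs as Python sets, None standing in for ALLIDS.
ALLIDS = None

def idl_union(a, b):
    # OR: if either side is ALLIDS, the union is ALLIDS -> full table scan.
    if a is ALLIDS or b is ALLIDS:
        return ALLIDS
    return a | b

def idl_intersection(a, b):
    # AND: the indexed side bounds the result; the filter test is then applied
    # to that small partial candidate set.
    if a is ALLIDS:
        return b
    if b is ALLIDS:
        return a
    return a & b

cn_william = {1}      # idl from the cn index
unindexed = ALLIDS    # no index available for this attribute

print(idl_intersection(cn_william, unindexed))  # {1}: load and test one entry
print(idl_union(cn_william, unindexed))         # None (ALLIDS): full table scan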

Missing Attributes

So as a weirder case … what if the attribute doesn’t exist in schema at all? For example we could search for Microsoft AD attributes in 389 Directory Server, or we could submit bogus filters like “(whargarble=foo)”. What happens here?

Well, historically we treated these the same as unindexed queries, which means that any term not in the schema would be treated as ALLIDS. This led to a “quietly known” denial of service attack against 389 Directory Server, where you could emit a large number of queries for attributes that don’t exist, causing the server to attempt many ALLIDS scans. We have some defences, like the allids limit (how many entries you can full table scan before giving up), but it can still cause entry cache churn and other performance issues.

I was first made aware of this issue in 2014 while working for the University of Adelaide, where our VMware service would query LDAP for MS attributes, causing a large performance issue. We resolved this by adding the MS attributes to the schema and indexing them so that they would create empty indexes - what we would now call in 389 Directory Server an “idl_alloc(0)” or “empty IDL”.

When I was initially hired by Red Hat in 2015 I knew this issue existed, but I didn’t know enough about the server core to really fix it, so it went to the back of my mind … it was rare for a customer to raise this issue, and we had the workaround, so I was able to advise support services on how to mitigate it.

In 2019 however, while investigating an issue related to filter optimisation, I was made aware of an issue with FreeIPA where certmap queries were requesting MS certificate attributes, causing large performance issues. We now had the issue again, this time in a large, widely installed product, so it was time to tackle it.

How to handle this?

A major issue here is “never breaking customers”. Because we had always supported this behaviour, there was a risk that any solution would cause customer queries to “silently” begin to break if we issued a fix or change. More accurately, any change to how we execute filters could cause the results of those filters to change, which would disrupt customers.

That said, there is also precedent that 389 Directory Server was handling this incorrectly. The RFC for LDAP notes:

Any assertion about the values of such an attribute is only defined if the AttributeType is known by the evaluating mechanism, the purported AttributeValue(s) conforms to the attribute syntax defined for that attribute type, the implied or indicated matching rule is applicable to that attribute type, and (when used) a presented matchValue conforms to the syntax defined for the indicated matching rules. When these conditions are not met, the FilterItem shall evaluate to the logical value UNDEFINED. An assertion which is defined by these conditions additionally evaluates to UNDEFINED if it relates to an attribute value and the attribute type is not present in an attribute against which the assertion is being tested. An assertion which is defined by these conditions and relates to the presence of an attribute type evaluates to FALSE.

Translation: if a filter component (IE nonexist=foo) uses an attribute that is NOT in the schema, that component evaluates to the empty set, aka undefined.

It was also clear that if an engaged and active consumer like FreeIPA was making this mistake, then it must be being overlooked by many others without notice. So there is sometimes value in helping to raise the standard so that everyone benefits, and in highlighting mistakes sooner.

The Technical Solution

This is the easy part - we add a new configuration option with three states: “on”, “off” and “warn”. “On” enables the strictest handling of filters, rejecting them and not continuing if any attribute requested is not in the schema. “Warn” provides the RFC-compliant behaviour, mapping the missing attribute to an empty index and noting in the logs that this occurred. Finally, “off” keeps the previous “silently allow” behaviour.

This was easily achieved in filter parsing, by checking the attribute of each filter component against our schema hashmap. We then tag the filter element, and depending on the current setting either reject the filter or continue.

In the query execution code, we now check the filter tag to understand whether the attribute is present in the schema or not. If it’s flagged as “undefined”, we immediately shortcut and return idl_alloc(0) instead of returning ALLIDS on the failure to find the relevant index db.
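
Conceptually, the parse-time check looks something like the following. This is only a hypothetical Python sketch of the idea - the real change lives in the server’s C filter code:

import logging

class InvalidFilterError(Exception):
    pass

def tag_filter_attribute(attr, schema_attrs, level="warn"):
    """Return True if this filter element should resolve to the empty IDL."""
    if attr.lower() in schema_attrs:
        return False                      # known attribute: evaluate as normal
    if level == "on":
        raise InvalidFilterError(attr)    # strict: reject the whole search
    if level == "warn":
        logging.warning("filter attribute %r is not in schema", attr)
        return True                       # tagged undefined -> idl_alloc(0)
    return False                          # "off": legacy behaviour (ALLIDS)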

We can show the performance impact of this change:

Before, with a non-existent attribute

Average rate:    7.40/thr

After with “warn” enabled (map to empty set)

Average rate: 4808.70/thr

This is a huge improvement, and certainly shows the risk of DOS and how effective the solution was!

The Social Solution

Of course, this is the hard part - the 389 Directory Server team are all amazingly smart people, from many countries, and all live around the world. They all care that the server is the best possible, and that our standards as a team are high. That means when introducing a change that has a risk of affecting query result sets like this, they pay attention, and ask many difficult questions about how the feature will be implemented.

The first important justification - is a change like this worthwhile? We can see from the performance results that the risk of DOS is real, so the answer becomes yes from a security point of view. But it’s also important to consider the cost to consumers - is this change going to benefit FreeIPA, for example? As the author I am biased, but I want to say “yes” - by notifying about or rejecting invalid filters earlier, we can help FreeIPA developers improve their code quality, without expecting them to know LDAP inside and out.

The next major question is performance - before the feature was developed there was clearly a risk of DOS, but when we implement this we perform additional locking on the schema. Is that a risk to our standalone performance or normal operating conditions? This also had to be discussed and assessed.

A really important point raised by Thierry was how we communicate these errors. Previously we would use the “notes=” field of the access log, which looks like this:

conn=1 op=4 RESULT err=0 tag=101 nentries=13 etime=0.0003795424 notes=U

The challenge with the notes= field is that it’s easy to overlook and, unless you are familiar with it, hard to see what it is indicating. In this case, notes=U means a partially unindexed query (one filter component, but not all, returned ALLIDS).

We can’t change the notes field due to the risk of breaking our own scripts like logconv.pl, support tools developed by RH or SUSE, or even integrations with platforms like splunk. But clearly we need a way to detail what is happening with your filter. So Thierry suggested an extension that provides details about the notes. Now we get:

conn=1 op=4 RESULT err=0 tag=101 nentries=13 etime=0.0003795424 notes=U details="Partially Unindexed Filter"
conn=1 op=8 RESULT err=0 tag=101 nentries=0 etime=0.0001886208 notes=F details="Filter Element Missing From Schema"

So we have extended our log message, but without breaking existing integrations.

The final question is what our defaults should be. It’s one thing to have this feature, but what should we ship with? Do we strictly reject filters? Warn? Or disable it, and expect people to turn it on?

This became a long discussion between Ludwig, Thierry and me - we discussed the risk of DOS in the first place, what the impact of each level could be, and how it could break legacy applications or sites using deprecated features or with weird data imports. Many different aspects were considered. We decided to default to “warn” (non-existent becomes empty set), and we settled on communicating with support to advise them of the upcoming change; our “back out” plan is to change the default and ship a patch if there is a large volume of negative feedback.

Conclusion

As of today, the PR is merged, and the code is on its way to the next release. It’s a long process, but the process exists to ensure we do what’s best for our users while balancing many different aspects. We have a great team of people, with decades of experience from many backgrounds, which means that these discussions can be long and detailed, but in the end we hope to give the best product possible to our community.

It’s also valuable to share how much thought and effort goes into projects - in your life you may only interact with 1% of our work through our configuration and system, but we have an iceberg of decisions and design process that affects you every day, where we have to be responsible and considerate in our actions.

I hope you enjoyed this exploration of this change!

References

PR#50379

Using ramdisks with Cargo

Posted by William Brown on August 25, 2019 02:00 PM

Using ramdisks with Cargo

I have a bit of a history of killing SSDs - probably because I do a bit too much compiling and management of thousands of tiny files. Plenty of developers have this problem! So one evening I was curious whether I could set up a ramdisk on my Mac for my cargo work to output to.

Making the ramdisk

On Linux you’ll need to use tmpfs or some access to /dev/shm.
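
For example, a minimal sketch (mount point and size are arbitrary):

sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=4G tmpfs /mnt/ramdisk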

On OSX you need to run a script like the following:

diskutil partitionDisk $(hdiutil attach -nomount ram://4096000) 1 GPTFormat APFS 'ramdisk' '100%'

This creates and mounts a ramdisk at /Volumes/ramdisk. Note that ram:// takes a count of 512-byte sectors, so 4096000 sectors is roughly 2GB (as the df output below shows); scale the number up if you want more. Make sure you have enough RAM!

Asking cargo to use it

We probably don’t want to make our changes permanent in Cargo.toml, so we’ll use the environment:

CARGO_TARGET_DIR=/Volumes/ramdisk/rs cargo ...

Does it work?

Yes!

Disk Build (SSD, 2018MBP)

Finished dev [unoptimized + debuginfo] target(s) in 2m 29s

2 GB APFS ramdisk

Finished dev [unoptimized + debuginfo] target(s) in 1m 53s

For me it’s more valuable to try and save those precious SSD write cycles, so I think I’ll try to stick with this setup. You can see how much rust writes by doing a clean + build. My project used the following:

Filesystem                             Size   Used  Avail Capacity    iused               ifree %iused  Mounted on
/dev/disk110s1                        2.0Gi  1.2Gi  751Mi    63%       3910 9223372036854771897    0%   /Volumes/ramdisk

Make it permanent

Put the following in /Library/LaunchDaemons/au.net.blackhats.fy.ramdisk.plist

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
        <dict>
                <key>Label</key>
                <string>au.net.blackhats.fy.ramdisk</string>
                <key>Program</key>
                <string>/usr/local/libexec/ramdisk.sh</string>
                <key>RunAtLoad</key>
                <true/>
                <key>StandardOutPath</key>
                <string>/var/log/ramdisk.log</string>
        </dict>
</plist>

And the following into /usr/local/libexec/ramdisk.sh

#!/bin/bash
date
diskutil partitionDisk $(hdiutil attach -nomount ram://4096000) 1 GPTFormat APFS 'ramdisk' '100%'

Finally, put this in your cargo config file of choice (for example ~/.cargo/config.toml):

[build]
target-dir = "/Volumes/ramdisk/rs"

Future William will need to work out whether there are negative consequences to multiple cargo projects sharing the same target directory … hope not!

Launchctl tips

# Init the service
launchctl load /Library/LaunchDaemons/au.net.blackhats.fy.ramdisk.plist
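# Remove it again if you change your mind
launchctl unload /Library/LaunchDaemons/au.net.blackhats.fy.ramdisk.plist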

References

launchd.info