macOS 11's hidden security improvements

A deep dive into macOS 11’s internals reveals some security surprises that deserve to be more widely known.

Contents

  1. Introduction
    1. Disclaimers
  2. macOS 11’s better known security improvements
    1. Secret messages revealed?
  3. CPU security mitigation APIs
    1. The NO_SMT mitigation
    2. The TECS mitigation
    3. Who benefits from NO_SMT and TECS?
  4. Endpoint Security API improvements
    1. More message types
    2. More notifications, less polling
    3. More metadata
    4. Improved performance
  5. A vulnerability quietly fixed
  6. O_NOFOLLOW_ANY
  7. Conclusion
  8. Endnotes

Introduction

When a new release of an operating system comes out, normal people find out what’s new by attending developer conferences, reading release notes, changelogs, reviews.

Me, I download the software development kit (SDK) for the new version, and diff it with the current version.

This is not uncommon on, say, Windows: There are entire websites dedicated to large scale, long term, differential reverse engineering, that tell you what new functions appeared in what version of Windows, how their relationship with other functions has changed, how internal data structures have evolved etc. On macOS, nobody seems to do it (at least not in public), and something as simple as diffing the includes from one SDK version to the next and patiently going through it, file by file, can reveal interesting features nobody knows (or at least talks) about.

Comparing the macOS 11 and macOS 10.15 SDKs, I found several intriguing surprises that deserve to be more widely known.

Disclaimers

In this article, I describe poorly-documented, or completely undocumented, features that could stop working as advertised or disappear completely without notice in future releases of macOS. Use common sense, assess the risks, choose, and take responsibility for your choice.

Note that I’m just a developer, neither a security researcher nor an exploit writer, and my descriptions of security issues and their mitigations might fall between “slightly incorrect” and “completely wrong”. I welcome corrections.

macOS 11’s better known security improvements

At the WWDC 2020, Apple made a big deal of several new macOS and iOS features that were, in fact, big deals. This article was supposed to come out much earlier, and I don’t expect anyone still remembers what the fuss was about over a year later, so I’ll give you a brief recap.

The major new security features that would debut in macOS 11 were:

  • Pointer Authentication Codes (PAC), hardware-enforced Control Flow Integrity (CFI), implemented by Apple’s homegrown 64-bit ARM processor, the M1. Currently limited to system code and kernel extensions, but open to all third-party developers for experimentation.
  • Device isolation is another M1-only feature that uses the more powerful IOMMU of that platform to make sure hardware devices can only share memory with the operating system and not with each other. Cross-device memory sharing is a historical custom, based on a blind, unfounded trust in hardware.
  • Write XOR Execute (W^X) finally came to macOS, in a hardware-enforced form (yes, another M1-only feature). Memory pages can now be either writable or executable, never both at the same time; no exceptions. Just-in-time (JIT) compilers will need to be redesigned around this limitation to run on ARM Macs, but special APIs are provided to make the work easier.
  • Signed System Volume (SSV) cryptographically sealed the boot volume and made it tamper-evident. (macOS has booted from a read-only volume since 10.15.) Apple’s Protecting data at multiple layers article briefly describes SSV, but Howard Oakley has an even more detailed write-up on his blog, with illustrations; a must-read. You should also check out Andrew Cunningham’s review of macOS 11.

These technologies have justly earned the attention of the press and security researchers, and they’ve been discussed in great detail elsewhere. (The Apple video Explore the new system architecture of Apple silicon Macs from session 10686 of the WWDC 2020 has a good overview of most of the new security features, and more.)

There’s really nothing I could add to what the excellent resources out there say about these topics, but there are other security improvements that everyone seems to have missed, and that Apple seems to be shy about.

Secret messages revealed?

On second thoughts, maybe my rummaging approach can add something novel (although incredibly trivial) to the publicly-disclosed security improvements: I’ve not seen anybody mention the fact that the cryptographically sealed filesystem underlying SSV is internally code-named “Cryptex”. The cryptex(5) man page claims that: “The name ‘cryptex’ is a portmanteau for ‘CRYPTographically-sealed EXtension’.”

… but I know, we know, they know that they took the name from Dan Brown’s best-selling, award-winning birdcage liner The Da Vinci Code. The otherwise forgettable (and best-forgotten) airport thriller introduced the intriguing concept of a cryptex: a secret message, sealed by a combination lock, that would self-destruct if opened by force.

There are intriguing hints in cryptex(5) that suggest a wider Cryptex Cinematic Universe, like references to a cryptexctl(1) command and a cryptexd(8) daemon, but those man pages are nowhere to be found, nor are the two binaries part of macOS. A placeholder man page for libcryptex(3) has literally nothing to say about the “Cryptex management library”, except an interesting detail: A copyright date of 19 October, 2018, suggesting that SSV had been in development for a long time before materializing as an end user feature.

The SDK includes the import libraries for libcryptex, libcryptex_core and libcryptex_interface, but not the libraries themselves, so we have the lists of exported symbols but not the code behind them. The libraries, too, are not part of macOS, which makes me think that the scattered Cryptex artefacts found in the SDK probably escaped, no idea how, from an Apple private code corral.

All that the symbol lists can tell us is that the politically correct “CRYPTographically-sealed EXtension” revisionism can be put to rest: To me, functions with names like codex_install_pack (exported by libcryptex_interface) unquestionably prove a Brownian origin of the name!

CPU security mitigation APIs

Developers are taught to think of the CPU as a perfect, mathematical abstraction. In 2018, year of microarchitectural vulnerabilities (Spectre and Meltdown to name the most infamous ones), we were set straight: CPUs run on code; CPU developers weren’t preternaturally capable of writing multithreaded code without race conditions; and the CPUs they made were buggy, unreliable and traitorous, conspiring with applications against the operating system (OS) to bypass access controls in undetectable ways.

The issues that could be fixed were fixed. The remaining issues could only be mitigated, either in the OS or the compiler, at the cost of performance. I was aware of the mitigations rolled out by Microsoft as Windows updates, the new MSVC compiler option /Qspectre, the changes to the Chrome JIT compiler to prevent it from generating exploitable code from malicious JavaScript, etc.

But I was surprised to discover new, unannounced, completely undocumented mitigations in macOS 11.

As far as I can tell, this is the first public article ever written that describes them. The new APIs return virtually no hits in grep.app or GitHub code search—or Google, for that matter.

Two kinds of mitigations are provided, codenamed NO_SMT and TECS. Let’s have a closer look at them.

The NO_SMT mitigation

What is it?

NO_SMT disables Simultaneous multithreading (SMT), the CPU feature better known under Intel’s trade name of “Hyper-Threading”. SMT allows a CPU core to execute two or more threads at the same time, for improved performance at the cost of contention for per-core resources, such as caches, TLBs etc.

Letting multiple threads share invisible resources carries the risk of letting a malicious thread steal secrets from a “sibling” thread running on the same core—a risk that over the years has materialized into multiple attacks, like TLBleed, PortSmash, Fallout, ZombieLoad, RIDL. A straightforward mitigation for this entire family of attacks, past and future, is then to simply disable SMT, which is what NO_SMT does.

How to use NO_SMT

In C/C++, #include ; no extra library necessary. From

:

/*
 * Set CPU Security Mitigation on the spawned process
 * This attribute affects all threads and is inherited on fork and exec
 */
int     posix_spawnattr_set_csm_np(const posix_spawnattr_t * __restrict attr, uint32_t flags) __API_AVAILABLE(macos(11.0));

/*
 * flags for CPU Security Mitigation attribute
 * POSIX_SPAWN_NP_CSM_ALL should be used in most cases,
 * the individual flags are provided only for performance evaluation etc
 */
#define POSIX_SPAWN_NP_CSM_ALL         0x0001
#define POSIX_SPAWN_NP_CSM_NOSMT       0x0002
#define POSIX_SPAWN_NP_CSM_TECS        0x0004

The meaning of the flags is identical to the similarly named flags, and posix_spawnattr_set_csm_np(attr, POSIX_SPAWN_NP_CSM_NOSMT) is 100% identical to and interchangeable with posix_spawnattr_setnosmt_np(attr).
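As a rough illustration of how the attribute might be used, here is a minimal sketch that spawns a child with every mitigation requested. The helper name spawn_with_csm is mine, not Apple’s, and the ENOTSUP fallback for SDKs that lack the flag is an assumption about how a caller might want to degrade; only posix_spawnattr_set_csm_np and POSIX_SPAWN_NP_CSM_ALL come from the header quoted above.

```c
#include <errno.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>

extern char **environ;

/*
 * Hypothetical helper: spawn `path` with all CPU security mitigations
 * requested. Returns 0 and stores the child pid on success, or an errno
 * value on failure. When the macOS 11 flag is not available it returns
 * ENOTSUP instead of silently spawning an unmitigated child.
 */
int spawn_with_csm(const char *path, char *const argv[], pid_t *pid_out)
{
#ifdef POSIX_SPAWN_NP_CSM_ALL
    if (__builtin_available(macOS 11.0, *)) {
        posix_spawnattr_t attr;
        int err = posix_spawnattr_init(&attr);
        if (err != 0)
            return err;

        /* Request NO_SMT and TECS together, as the header recommends. */
        err = posix_spawnattr_set_csm_np(&attr, POSIX_SPAWN_NP_CSM_ALL);
        if (err == 0)
            err = posix_spawn(pid_out, path, NULL, &attr, argv, environ);

        posix_spawnattr_destroy(&attr);
        return err;
    }
#endif
    (void)path;
    (void)argv;
    (void)pid_out;
    return ENOTSUP;
}
```

Since the header says the attribute is inherited on fork and exec, setting it once on the top-level child should be enough to cover its descendants.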

How does it work?

Just like NO_SMT, TECS is a write-once, enable-only flag that is copied from task to task, from task to thread, and from thread to CPU. The task flag is TF_TECS. Below the task level, the flag becomes architecture-specific, x86-64-only, morphing into a mitigation codenamed SEGCHK. Thus, the thread flag is boolean field machine_thread::mthr_do_segchk, and the CPU flag is boolean field cpu_data::cpu_curthread_do_segchk, also known as CPU_NEED_SEGCHK in assembler code.
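The copy chain can be pictured with a toy model. This is purely illustrative: none of the toy_ types or functions exist in XNU, and the moments at which I copy the flag (task inheritance, thread creation, context switch) are my assumptions about where such copies would naturally happen, not something read out of the kernel source.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures named in the text. */
struct toy_task   { bool tf_tecs; };                 /* models TF_TECS */
struct toy_thread { bool mthr_do_segchk; };          /* models machine_thread::mthr_do_segchk */
struct toy_cpu    { bool cpu_curthread_do_segchk; }; /* models CPU_NEED_SEGCHK */

/* Write-once, enable-only: there is deliberately no function to clear it. */
void toy_task_enable_tecs(struct toy_task *task)
{
    task->tf_tecs = true;
}

/* Task to task: a child can gain the flag from its parent, never lose it. */
void toy_task_inherit(struct toy_task *child, const struct toy_task *parent)
{
    child->tf_tecs = child->tf_tecs || parent->tf_tecs;
}

/* Task to thread: a new thread takes its owner's setting. */
void toy_thread_create(struct toy_thread *thread, const struct toy_task *owner)
{
    thread->mthr_do_segchk = owner->tf_tecs;
}

/* Thread to CPU: the CPU tracks whichever thread it is about to run. */
void toy_context_switch(struct toy_cpu *cpu, const struct toy_thread *next)
{
    cpu->cpu_curthread_do_segchk = next->mthr_do_segchk;
}

/* What the return-to-user path would consult before leaving the kernel. */
bool toy_needs_segchk(const struct toy_cpu *cpu)
{
    return cpu->cpu_curthread_do_segchk;
}
```

The point of the per-CPU copy, as far as I can tell, is that the assembler return path only needs to test a single per-CPU boolean rather than chase pointers back to the task.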

SEGCHK is implemented entirely in assembler, in kernel-to-user return routines ks_64bit_return and ks_32bit_return. If CPU_NEED_SEGCHK is set for the current CPU, they execute a VERW instruction shortly before the final SYSEXIT/SYSRET/IRET. VERW is an obscure and largely obsolete instruction that checks if the specified segment (the user mode stack segment, in the case of SEGCHK) is writable; but more importantly, it has the side effect of flushing the caches exploited by the RDCL and MDS families of attacks, mitigating them.

TECS is only enabled if it’s supported by the CPU, or if it’s been forced on by default. CPU support for TECS is only checked on x86-64, and it corresponds to whether SEGCHK is supported. The checks, performed at boot time, are:

Using VERW as a mitigation was initially suggested by two of the discoverers of RIDL (page 200), but it seems it proved insufficient, and CPU vendors had to enhance the instruction to act as a proper mitigation. macOS doesn’t trust the un-enhanced VERW.

SEGCHK can be forced on system-wide with a boot parameter. Again, see support article HT210108. Similarly, it can be forced off system-wide with undocumented boot parameter cwad (CPU workaround disable), which has the same syntax as cwae (CPU workaround enable). cwae has priority over cwad.

Unlike NO_SMT, SEGCHK/TECS has no firmware-level equivalent, nor can it be disabled after boot.

Who benefits from NO_SMT and TECS?

Google.

I’ve looked everywhere and no one else seems to use these mitigation APIs. The only source code match (outside of the macOS 11 and 12 SDKs, and the XNU source code itself) is Chromium. The only binary matches on my macOS 11 machine (outside of system libraries) are the Chrome and Electron frameworks, i.e. Chromium. Not even Safari seems to use them!

In Chromium, when compiling for macOS, the base::LaunchOptions structure passed to function base::LaunchProcess contains a boolean field named enable_cpu_security_mitigations; if set, the macOS implementation of base::LaunchProcess launches the new process with CSM flags POSIX_SPAWN_NP_CSM_ALL. If I understand the code correctly, mitigations are enabled for renderer and plugin host sub-processes, and disabled for all other kinds of sub-processes (another possible reading of the code suggests that the feature is implemented, but unused. Honestly, I haven’t dug too deep).

It’s hard not to wonder why Apple went through the effort of implementing mitigations and exposing them as APIs, and then neither document nor even use them. If they are ineffective, the question becomes why Google bothers using them. Either way, we are left with no clear answer.

Endpoint Security API improvements

Endpoint Security probably needs no introduction to the audience of this article, but I’ll still give a brief one.

This C API, first introduced in macOS 10.15, replaced and made obsolete the pre-existing patchwork of archaic auditing, monitoring and policing APIs (among which OpenBSM, KAUTH, Socketfilter and the venerable acct(2)—est. 1979[2]).

The design of Endpoint Security combined the near-absolute visibility and veto power over system state of a MAC[3] policy module with the safety properties of a client-server model, with a really nice and pretty-well-documented API on top. In short, it was the perfect API for a large variety of security applications.

Or was it? Unfortunately, Endpoint Security wasn’t without its own shortcomings, but they’re gradually being rectified. Let’s have a look at the most important improvements that macOS 11 and 12 make to Endpoint Security, only some of which were officially documented.

More message types

More operations can now be detected and/or vetoed, such as fcntl(2), searchfs(2), ptrace(2), remounting a filesystem, IOServiceOpen, task_name_for_pid, process suspension and process resumption.

Interestingly, process suspension includes private system call pid_shutdown_sockets, which doesn’t actually suspend processes, but only shuts down their network connections after they’ve already been suspended. The system call was originally only available on iOS, where it’s part of how apps are sent to the background.

macOS 12 adds some more notifications: setuid(2), setgid(2), seteuid(2), setegid(2), setreuid(2) and setregid(2).

More notifications, less polling

Some process metadata that only used to be available for querying, and necessitated polling and/or diffing to detect changes, now generates change events.

ES_EVENT_TYPE_NOTIFY_CS_INVALIDATED messages notify that a process’s code signature has gone invalid (i.e. CS_VALID flag no longer set) but the process is allowed to keep running (i.e. CS_HARD flag not set). Previously, it was only pollable through private system calls csops or csops_audittoken with operation code CS_OPS_STATUS.

ES_EVENT_TYPE_NOTIFY_REMOTE_THREAD_CREATE messages notify the creation of remote (i.e. inter-process) threads. Previously, this information was only available at low fidelity and with great effort, either by polling and diffing the data returned by Mach task method task_info with flavor TASK_EXTMOD_INFO, or by monitoring syslog for com.apple.kernel.external_modification messages.

More metadata

exec(2) messages now include the new process’s working directory (es_event_exec_t::cwd field).

Process metadata for all messages now includes:

  • The process’s controlling terminal, if any (es_process_t::tty field).
  • The process’s “start time”, i.e. the time when its process identifier was allocated by fork(2) (es_process_t::start_time field). Previously only available through sysctl(2) with the kern.proc.pid. OID.
  • The “responsible process” (es_process_t::responsible_audit_token field), i.e. the process that the notorious (to us developers) Transparency, Consent & Control (TCC) framework blames for an operation subject to user consent. Often, this is the client process that caused a daemon/agent process to be launched, which in an auditing context should be considered the “true” parent of a process (instead of “placeholder” xpcproxy(8)). Previously only available through the private (and completely undocumented) “responsibility” API of MAC policy module Quarantine (e.g. responsibility_get_responsible_for_pid).

Finally, for the first time ever in a macOS auditing API, all messages now report not just the process that caused the message to be generated, but the exact thread as well (es_message_t::thread field).

Improved performance

It’s now possible to process messages asynchronously without the overhead of es_copy_message/es_free_message (equivalent to a sequence of malloc, memcpy and free): Messages are now reference counted (see new functions es_retain_message/es_release_message), and can be moved across threads almost for free. es_copy_message and es_free_message have been outright deprecated and should no longer be used, except for backwards compatibility with macOS 10.15. They won’t be missed by me or my spindump traces.

A vulnerability quietly fixed

Sometimes, diffing SDK versions can even reveal security holes that were quietly fixed. Such is the case for fcntl(2) command F_SETSIZE.

F_SETSIZE is used to change the maximum disk space allocated to a file: If it’s smaller than the current size, the file is truncated; if it’s larger, the file is extended. What stops a malicious process from extending a file so that it fills the entire disk, and then reading from the extended file to carve deleted files out of what was previously free space? Very simple: F_SETSIZE fills the new file space with all zeroes to conceal what it used to contain. As an optimization, a superuser process (effective user id 0) is allowed to extend a file without zeroing out, because a superuser process is assumed to have access to that data anyway.

However, macOS has gradually made the UNIX security model irrelevant. For example, even the superuser is only allowed to access the private documents of a regular user with the user’s permission—permission that is given on a per-application basis, through that protector of users and bane of developers known as the Transparency, Consent & Control (TCC) framework. This reflects the new meaning that macOS has given to the “root” superuser: No longer the administrator of a multi-user system, as it was originally meant on UNIX, but either a temporary identity assumed by each user for system administration tasks (e.g. by way of sudo(8)), or the anonymous user under which daemons run[4].

In this new security model, the superuser can no longer be assumed to have unrestricted access to everything. However, not zeroing space when extending a file would let a superuser process with no entitlements at all recover any file that had been deleted.

macOS 11 fixes this by no longer handling the superuser as a special case for F_SETSIZE. The man page for fcntl(2) now says:

F_SETSIZE. Deprecated. In previous releases, this would allow a process with root privileges to truncate a file without zeroing space. For security reasons, this operation is no longer supported and will instead truncate the file in the same manner as truncate(2).

Even the comments in were amended. Before (macOS 10.15 SDK):

#define F_SETSIZE       43              /* Truncate a file without zeroing space */ 

And after (macOS 11 SDK):

#define F_SETSIZE       43              /* Truncate a file. Equivalent to calling truncate(2) */ 

As far as I can tell, this information disclosure vulnerability was never assigned a CVE, nor was it publicly acknowledged in any other way before it was silently fixed in macOS 11.
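The behaviour that F_SETSIZE now shares with truncate(2) is easy to observe. The following is a minimal sketch, not a test of F_SETSIZE itself: it grows a scratch file with ftruncate(2) and checks that the newly exposed bytes read back as zero. The function name and the /tmp scratch path are my own choices for illustration.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Grow a fresh temporary file to `size` bytes with ftruncate(2) and report
 * whether every byte of the new space reads back as zero.
 * Returns 1 if all zero, 0 if stale data leaked through, -1 on I/O error.
 */
int extended_space_is_zeroed(off_t size)
{
    char path[] = "/tmp/setsize-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file disappears once the descriptor is closed */

    int result = -1;
    if (ftruncate(fd, size) == 0) {
        unsigned char buf[4096];
        ssize_t n;
        result = 1;
        /* The file offset is still 0, so this reads the extended region. */
        while ((n = read(fd, buf, sizeof buf)) > 0) {
            for (ssize_t i = 0; i < n; i++) {
                if (buf[i] != 0)
                    result = 0;
            }
        }
        if (n < 0)
            result = -1;
    }
    close(fd);
    return result;
}
```

A result of 0 would mean that data from previously freed blocks had become visible, which is exactly the leak described above.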

O_NOFOLLOW_ANY

Even primeval APIs like system call open(2) can still have room for improvement. macOS 11 introduces a new flag for it, O_NOFOLLOW_ANY, that mitigates an entire family of potential vulnerabilities, especially in security applications.

Endpoint Security provides the full, resolved (no symlinks), normalized path of each file involved in an auditable event (what OpenBSM veterans/victims like me used to know as “vnode kpath”), but how can applications be sure that the path still identifies the same file by the time they open it? With O_NOFOLLOW_ANY set, open(2) will fail with error ELOOP if any symlink is encountered anywhere in the path: A stronger version of O_NOFOLLOW that applies to the entire path, not just the final component.
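A minimal sketch of how a security tool might use the flag follows. The wrapper name open_no_symlinks is hypothetical, and failing with ENOTSUP where the flag is not defined is my assumption about a sensible fallback: quietly dropping the flag would defeat its purpose.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical wrapper: open `path` read-only, refusing to traverse a
 * symlink in any component. Returns a file descriptor, or -1 with errno
 * set (ELOOP when a symlink was met, ENOTSUP when the flag is missing).
 */
int open_no_symlinks(const char *path)
{
#ifdef O_NOFOLLOW_ANY
    return open(path, O_RDONLY | O_CLOEXEC | O_NOFOLLOW_ANY);
#else
    (void)path;
    errno = ENOTSUP;
    return -1;
#endif
}
```

One practical consequence: on macOS, /tmp is itself a symlink to /private/tmp, so a path under /tmp should be rejected with ELOOP, while the same file reached through /private/tmp should open normally. Callers need to start from an already-resolved path, which is what Endpoint Security hands out.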

Conclusion

What did I learn from my rummaging? Apple still likes its secrets; the Chromium source code still is the best documentation on the mitigations and sandboxing features provided by all major operating systems; diffing releases remains the best way to find hidden features; and some secrets can stay hidden in plain sight for a long time.

I wrote this article in part as a “look at this cool thing”, and in part as a sort of public service, so that the new, hidden macOS features no longer return a deafening silence when queried on search engines. Even if I got some details wrong, at least the topic can now be debated.

Endnotes

1 The relationship between tasks and processes in macOS can be roughly summarized as: Each (BSD) process corresponds exactly to one (Mach[5]) task, and vice versa, until the process calls exec(2). exec(2) terminates the current task, creates a new one and associates it to the process, replacing the dead one. macOS old timers may object that exec(2) actually keeps the same task, resetting its state. It used to work like that, but it was a fragile design, that was dealt a fatal blow by Google researcher Ian Beer in 2016. Starting from XNU 3789.21.4 (macOS 10.12.1, iOS 10.1), exec(2) creates a new task.

2 What’s the use for such an ancient API? It may sound incredible, but until Endpoint Security, macOS had no reliable auditing mechanism to log process deaths. Except acct(2), that is, which to describe as “archaic” would be a compliment. It logs fixed-size records to a global log file, it truncates process names to 9 characters, it logs user ids but not process ids, and the timestamp format for process exit times is a literally unbelievable 1/64 * 8^exponent * mantissa seconds since the process started (good for about 8 years and a half of non-stop running, but with a variable precision that drops below the second at the 2:16:30 mark), encoded in 16 bits as 3 bits exponent, 13 bits mantissa; the process start time is a saner 32-bit count of seconds since the UNIX epoch (good for about 17 years from now). We opted not to use acct(2).

3 Mandatory Access Control, no relation to “Mac” as an abbreviation for “Macintosh”. Historically referring to security models patterned after military document classification practices, MAC is nowadays a generic term for any policy-based security model, distinct from and orthogonal to permission-based security models (also known as DAC, or Discretionary Access Control). macOS inherited its modular MAC framework from the TrustedBSD project and uses it with great gusto (I count no fewer than seven policy modules on my macOS 11 machine). The Linux Security Modules framework is the Linux equivalent. Windows has limited MAC in the form of the capabilities system for UWP apps. The bitter irony is that, being limited to a small subset of applications, it’s a “mandatory” access control system that operates on an opt-in basis—to say nothing of its complete non-extensibility.

4 If daemons have to run under an anonymous user, why must it be root, which, while still bound by MAC policies, can bypass all ACLs, send kill signals to any process, invoke sysctls in write mode and in general do a lot of damage? A little known fact is that while system daemons run as root, they also do run inside extremely strict sandboxes based on the Seatbelt framework (internally based on, you guessed it, a MAC policy module, unimaginatively named Sandbox). In a sadly predictable twist, while extremely powerful and enabling incredibly granular access control, Seatbelt is almost completely undocumented (notice a pattern yet?). In a less predictable but sadder twist, it’s also been marked as deprecated with no replacement since OS X 10.8. Nevertheless, with all system daemons using it, plus third-party users of the caliber of Google Chrome and Mozilla Firefox, Seatbelt seems unlikely to disappear any time soon.

5 Mach (no relation to “Macintosh”) was a research project to replace the BSD kernel with a microkernel. The design proved impractical, and the most famous real-world implementation of Mach, NeXTSTEP (later “Darwin”, later “OS X”, later “macOS”), actually runs a Mach/BSD hybrid kernel with very little “micro”. Mach was an extremely influential design: Windows NT (later “Windows”) is a Mach clone too, except redesigned to work alongside a VMS-like kernel instead of a BSD one. Windows NT, too, flirted with a microkernel architecture, but it proved to be no more practical in the 90s than it had been in the 80s.
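For the curious, the acct(2) timestamp arithmetic from endnote 2 can be sketched in a few lines of C. This is only a worked example of the formula quoted there (1/64 * 8^exponent * mantissa seconds); the function name comp_to_seconds is hypothetical, and the bit layout (exponent in the top 3 bits, mantissa in the low 13 bits) is an assumption based on the order in which the endnote lists the fields.

```c
#include <stdint.h>

/* Decode a 16-bit acct(2)-style elapsed time into seconds.
 * ASSUMPTION: the exponent occupies the top 3 bits and the mantissa the
 * low 13 bits; the value is mantissa * 8^exponent, in units of 1/64 s. */
double comp_to_seconds(uint16_t raw)
{
    unsigned exponent = raw >> 13;     /* 3 bits: 0..7 */
    unsigned mantissa = raw & 0x1FFF;  /* 13 bits: 0..8191 */

    /* 8^exponent equals 2^(3 * exponent), so a shift is enough. */
    uint64_t ticks = (uint64_t)mantissa << (3 * exponent);
    return (double)ticks / 64.0;
}
```

With these assumptions, the largest encodable value (all bits set) is 268,402,688 seconds, i.e. roughly eight and a half years, and the largest value that still has one-second resolution (exponent 2, mantissa 8191) is 8,191 seconds, which is where the 2:16:30 mark comes from; past it, each step is 8 seconds or more.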

/*  * CPU Security Mitigation APIs  *  * Set CPU security mitigation on the current proc (all existing and future threads)  * This attribute is inherited on fork and exec  */ int proc_set_csm(uint32_t flags) __API_AVAILABLE(macos(11.0));

/* Set CPU security mitigation on the current thread */ int proc_setthread_csm(uint32_t flags) __API_AVAILABLE(macos(11.0));

/* * flags for CPU Security Mitigation APIs * PROC_CSM_ALL should be used in most cases, * the individual flags are provided only for performance evaluation etc */ #define PROC_CSM_ALL 0x0001 /* Set all available mitigations */ #define PROC_CSM_NOSMT 0x0002 /* Set NO_SMT – see above */ #define PROC_CSM_TECS 0x0004 /* Execute VERW on every return to user mode */

As with the dedicated NO_SMT API, we can enable mitigations for the entire current process, using

proc_set_csm
, or just the calling thread, with proc_setthread_csm. CSM functions, too, are wrappers for
process_policy(2)
.

Finally, just like NO_SMT, CSM also extends

posix_spawn(2)
. From :

/*  * Set CPU Security Mitigation on the spawned process  * This attribute affects all threads and is inherited on fork and exec  */ int     posix_spawnattr_set_csm_np(const posix_spawnattr_t * __restrict attr, uint32_t flags) __API_AVAILABLE(macos(11.0)); /*  * flags for CPU Security Mitigation attribute  * POSIX_SPAWN_NP_CSM_ALL should be used in most cases,  * the individual flags are provided only for performance evaluation etc  */ #define POSIX_SPAWN_NP_CSM_ALL         0x0001 #define POSIX_SPAWN_NP_CSM_NOSMT       0x0002 #define POSIX_SPAWN_NP_CSM_TECS        0x0004 

The meaning of the flags is identical to the similarly named flags, and

posix_spawnattr_set_csm_np(attr, POSIX_SPAWN_NP_CSM_NOSMT)
is 100% identical to and interchangeable with posix_spawnattr_setnosmt_np(attr).

How does it work?

Just like NO_SMT,

TECS
is a write-once, enable-only flag that is copied from task to task, from task to thread, and from thread to CPU. The task flag is TF_TECS. Below the task level, the flag becomes architecture-specific, x86-64-only, morphing into a mitigation codenamed
SEGCHK
. Thus, the thread flag is boolean field machine_thread::mthr_do_segchk, and the CPU flag is boolean field
cpu_data::cpu_curthread_do_segchk
, also known as CPU_NEED_SEGCHK in assembler code.

SEGCHK is implemented entirely in assembler, in kernel-to-user return routines

ks_64bit_return
and ks_32bit_return. If
CPU_NEED_SEGCHK
is set for the current CPU, they execute a VERW instruction shortly before the final
SYSEXIT
/SYSRET/
IRET
. VERW is an obscure and largely obsolete instruction that checks if the specified segment (the user mode stack segment, in the case of
SEGCHK
) is writable; but more importantly, it has the side effect of flushing the caches exploited by the RDCL and MDS families of attacks, mitigating them.

TECS is only enabled if it’s supported by the CPU, or if it’s been forced on by default. CPU support for

TECS
is only checked on x86-64, and it corresponds to whether SEGCHK is supported. The checks, performed at boot time, are:

Using VERW as a mitigation was initially suggested by two of the discoverers of RIDL (page 200), but it seems it proved insufficient, and CPU vendors had to enhance the instruction to act as a proper mitigation. macOS doesn’t trust the un-enhanced

VERW
.

SEGCHK can be forced on system-wide with a boot parameter. Again, see support article HT210108. Similarly, it can be forced off system-wide with undocumented boot parameter

cwad
(CPU workaround disable), which has the same syntax as cwae (CPU workaround enable).
cwae
has priority over cwad.

Unlike NO_SMT,

SEGCHK
/TECS has no firmware-level equivalent, nor can it be disabled after boot.

Who benefits from
NO_SMT
and
TECS
?

Google.

I’ve looked everywhere and no one else seems to use these mitigation APIs. The only source code match (outside of the macOS 11 and 12 SDKs, and the XNU source code itself) is Chromium. The only binary matches on my macOS 11 machine (outside of system libraries) are the Chrome and Electron frameworks, i.e. Chromium. Not even Safari seems to use them!

In Chromium, when compiling for macOS, the base::LaunchOptions structure passed to function

base::LaunchProcess
contains a boolean field named enable_cpu_security_mitigations; if set, the macOS implementation of
base::LaunchProcess
launches the new process with CSM flags POSIX_SPAWN_NP_CSM_ALL. If I understand the code correctly, mitigations are enabled for renderer and plugin host sub-processes, and disabled for all other kinds of sub-processes (another possible reading of the code suggests that the feature is implemented, but unused. Honestly, I haven’t dug too deep).

It’s hard not to wonder why Apple went through the effort of implementing mitigations and exposing them as APIs, and then neither document nor even use them. If they are ineffective, the question becomes why Google bothers using them. Either way, we are left with no clear answer.

Endpoint Security API improvements

Endpoint Security probably needs no introduction to the audience of this article, but I’ll still give a brief one.

This C API, first introduced in macOS 10.15, replaced and made obsolete the pre-existing patchwork of archaic auditing, monitoring and policing APIs (among which OpenBSM, KAUTH, Socketfilter and the venerable acct(2)—est. 1979[2]).

The design of Endpoint Security combined the near-absolute visibility and veto power over system state of a MAC[3] policy module with the safety properties of a client-server model, with a really nice and pretty-well-documented API on top. In short, it was the perfect API for a large variety of security applications.

Or was it? Unfortunately, Endpoint Security wasn’t without its own shortcomings, but they’re gradually being rectified. Let’s have a look at the most important improvements that macOS 11 and 12 make to Endpoint Security, only some of which were officially documented.

More message types

More operations can now be detected and/or vetoed, such as fcntl(2),

searchfs(2)
, ptrace(2), remounting a filesystem,
IOServiceOpen
, task_name_for_pid, process suspension and process resumption.

Interestingly, process suspension includes private system call pid_shutdown_sockets, which doesn’t actually suspend processes, but only shuts down their network connections after they’ve already been suspended. The system call was originally only available on iOS, where it’s part of how apps are sent to the background.

macOS 12 adds some more notifications: setuid(2),

setgid(2)
, seteuid(2),
setegid(2)
, setreuid(2) and
setregid(2)
.

More notifications, less polling

Some process metadata that only used to be available for querying, and necessitated polling and/or diffing to detect changes, now generates change events.

ES_EVENT_TYPE_NOTIFY_CS_INVALIDATED messages notify that a process’s code signature has gone invalid (i.e.

CS_VALID
flag no longer set) but the process is allowed to keep running (i.e. CS_HARD flag not set). Previously, it was only pollable through private system calls
csops
or csops_audittoken with operation code
CS_OPS_STATUS
.

ES_EVENT_TYPE_NOTIFY_REMOTE_THREAD_CREATE messages notify the creation of remote (i.e. inter-process) threads. Previously, this information was only available at low fidelity and with great effort, either by polling and diffing the data returned by Mach task method

task_info
with flavor TASK_EXTMOD_INFO, or by monitoring syslog for
com.apple.kernel.external_modification
messages.

More metadata

exec(2) messages now include the new process’s working directory (

es_event_exec_t::cwd
field).

Process metadata for all messages now includes:

  • The process’s controlling terminal, if any (
    es_process_t::tty
    field).
  • The process’s “start time”, i.e. the time when its process identifier was allocated by
    fork(2)
    (
    es_process_t::start_time
    field). Previously only available through
    sysctl(2)
    with the
    kern.proc.pid.
    OID.
  • the “responsible process” (
    es_process_t::responsible_audit_token
    field), i.e. the process that the notorious (to us developers) Transparency, Consent & Control (TCC) framework blames for an operation subject to user consent. Often, this is the client process that caused a daemon/agent process to be launched, which in an auditing context should be considered the “true” parent of a process (instead of “placeholder”
    xpcproxy(8)
    ). Previously only available through the private—and completely undocumented—”responsibility” API of MAC policy module Quarantine (e.g.
    responsibility_get_responsible_for_pid
    ).

Finally, for the first time ever in a macOS auditing API, all messages now report not just the process that caused the message to be generated, but the exact thread as well (es_message_t::thread field).

Improved performance

It’s now possible to process messages asynchronously without the overhead of es_copy_message/

es_free_message
(equivalent to a sequence of malloc,
memcpy
and free): Messages are now reference counted (see new functions
es_retain_message
/es_release_message), and can be moved across threads almost for free.
es_copy_message
and es_free_message have been outright deprecated and should no longer be used, except for backwards compatibility with macOS 10.15. They won’t be missed by me or my spindump traces.

A vulnerability quietly fixed

Sometimes, diffing SDK versions can even reveal security holes that were quietly fixed. Such is the case for fcntl(2) command

F_SETSIZE
.

F_SETSIZE is used to change the maximum disk space allocated to a file: If it’s smaller than the current size, the file is truncated; if it’s larger, the file is extended. What stops a malicious process from extending a file so that it fills the entire disk, and then reading from the extended file to carve deleted files out of what was previously free space? Very simple:

F_SETSIZE
fills the new file space with all zeroes to conceal what it used to contain. As an optimization, a superuser process (effective user id 0) is allowed to extend a file without zeroing out, because a superuser process is assumed to have access to that data anyway.

However, macOS has gradually made the UNIX security model irrelevant. For example, even the superuser is only allowed to access the private documents of a regular user with the user’s permission—permission that is given on a per-application basis, through that protector of users and bane of developers known as the Transparency, Consent & Control (TCC) framework. This reflects the new meaning that macOS has given to the “root” superuser: No longer the administrator of a multi-user system, as it was originally meant on UNIX, but either a temporary identity assumed by each user for system administration tasks (e.g. by way of sudo(8)), or the anonymous user under which daemons run[4].

In this new security model, the superuser can no longer be assumed to have unrestricted access to everything. However, not zeroing space when extending a file would let a superuser process with no entitlements at all recover any file that had been deleted.

macOS 11 fixes this by no longer handling the superuser as a special case for F_SETSIZE. The man page for fcntl(2) now says:

F_SETSIZE. Deprecated. In previous releases, this would allow a process with root privileges to truncate a file without zeroing space. For security reasons, this operation is no longer supported and will instead truncate the file in the same manner as truncate(2).

Even the comments in the header were amended. Before (macOS 10.15 SDK):

#define F_SETSIZE       43              /* Truncate a file without zeroing space */ 

And after (macOS 11 SDK):

#define F_SETSIZE       43              /* Truncate a file. Equivalent to calling truncate(2) */ 

As far as I can tell, this information disclosure vulnerability was never assigned a CVE, nor was it publicly acknowledged in any other way before it was silently fixed in macOS 11.
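Since F_SETSIZE is now documented as equivalent to truncate(2), the property that matters here (newly extended space must read back as zeroes) can be checked with plain ftruncate(2). The following is a minimal sketch of such a check, not a reproduction of the old bug; the helper name count_nonzero_after_extend is made up for illustration.

```c
/* Ask for POSIX declarations on non-Apple systems (harmless if already set). */
#if !defined(__APPLE__) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L
#endif

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: extend the file behind `fd` to `new_size` bytes with
 * ftruncate(2), then read back everything from offset `old_size` onwards and
 * count the non-zero bytes. Returns that count, or -1 on error.
 * A correct implementation of truncate(2)/ftruncate(2) should yield 0.
 */
long count_nonzero_after_extend(int fd, off_t old_size, off_t new_size)
{
    if (ftruncate(fd, new_size) != 0)
        return -1;
    if (lseek(fd, old_size, SEEK_SET) == (off_t)-1)
        return -1;

    long nonzero = 0;
    unsigned char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++) {
            if (buf[i] != 0)
                nonzero++;
        }
    }
    return n < 0 ? -1 : nonzero;
}
```

Calling it on a freshly written temporary file, with old_size set to the number of bytes written, should return 0 on any system where extension zero-fills, which is what POSIX requires of ftruncate(2).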

O_NOFOLLOW_ANY

Even primeval APIs like system call open(2) can still have room for improvement. macOS 11 introduces a new flag for it, O_NOFOLLOW_ANY, that mitigates an entire family of potential vulnerabilities, especially in security applications.

Endpoint Security provides the full, resolved (no symlinks), normalized path of each file involved in an auditable event (what OpenBSM veterans/victims like me used to know as “vnode kpath”), but how can applications be sure that the path still identifies the same file by the time they open it? With O_NOFOLLOW_ANY set, open(2) will fail with error ELOOP if any symlink is encountered anywhere in the path: A stronger version of O_NOFOLLOW that applies to the entire path, not just the final component.
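As a rough illustration, here is a minimal sketch of a helper that opens a path while refusing symlinks. The helper name open_refusing_symlinks and the OPEN_NO_SYMLINKS macro are made up for this example. On SDKs that don't define O_NOFOLLOW_ANY, the sketch falls back to plain O_NOFOLLOW, which only protects the final path component, so the fallback is strictly weaker.

```c
/* Ask for POSIX declarations on non-Apple systems. On macOS this must NOT be
 * defined, or the SDK hides the non-POSIX O_NOFOLLOW_ANY flag. */
#if !defined(__APPLE__) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L
#endif

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * O_NOFOLLOW_ANY only exists in the macOS 11 (and later) SDK. Elsewhere,
 * fall back to O_NOFOLLOW, which checks the last path component only.
 */
#ifdef O_NOFOLLOW_ANY
#define OPEN_NO_SYMLINKS O_NOFOLLOW_ANY
#else
#define OPEN_NO_SYMLINKS O_NOFOLLOW
#endif

/*
 * Hypothetical helper: open `path` read-only, refusing to traverse symlinks.
 * Returns a file descriptor on success, or -1 with errno set; errno is ELOOP
 * when a symlink was encountered where the flag forbids it.
 */
int open_refusing_symlinks(const char *path)
{
    return open(path, O_RDONLY | OPEN_NO_SYMLINKS);
}
```

One practical consequence to keep in mind: with O_NOFOLLOW_ANY, paths that pass through /tmp, /var or /etc will fail on macOS, because those directories are themselves symlinks into /private. That is not a problem for the fully resolved paths reported by Endpoint Security, but it can surprise callers that pass in user-supplied paths.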

Conclusion

What did I learn from my rummaging? Apple still likes its secrets; the Chromium source code still is the best documentation on the mitigations and sandboxing features provided by all major operating systems; diffing releases remains the best way to find hidden features; and some secrets can stay hidden in plain sight for a long time.

I wrote this article in part as a “look at this cool thing”, and in part as a sort of public service, so that the new, hidden macOS features no longer return a deafening silence when queried on search engines. Even if I got some details wrong, at least the topic can now be debated.

Endnotes

1 The relationship between tasks and processes in macOS can be roughly summarized as: Each (BSD) process corresponds exactly to one (Mach[5]) task, and vice versa, until the process calls exec(2). exec(2) terminates the current task, creates a new one and associates it to the process, replacing the dead one. macOS old timers may object that exec(2) actually keeps the same task, resetting its state. It used to work like that, but it was a fragile design that was dealt a fatal blow by Google researcher Ian Beer in 2016. Starting from XNU 3789.21.4 (macOS 10.12.1, iOS 10.1), exec(2) creates a new task.

2 What’s the use for such an ancient API? It may sound incredible, but until Endpoint Security, macOS had no reliable auditing mechanism to log process deaths. Except acct(2), that is, which to describe as “archaic” would be a compliment. It logs fixed-size records to a global log file, it truncates process names to 9 characters, it logs user ids but not process ids, and the timestamp format for process exit times is a literally unbelievable 1/64 * 8^exponent * mantissa seconds since the process started (good for about 8 years and a half of non-stop running, but with a variable precision that drops below the second at the 2:16:30 mark), encoded in 16 bits as 3 bits exponent, 13 bits mantissa; the process start time is a saner 32-bit count of seconds since the UNIX epoch (good for about 17 years from now). We opted not to use acct(2).

3 Mandatory Access Control, no relation to “Mac” as an abbreviation for “Macintosh”. Historically referring to security models patterned after military document classification practices, MAC is nowadays a generic term for any policy-based security model, distinct from and orthogonal to permission-based security models (also known as DAC, or Discretionary Access Control). macOS inherited its modular MAC framework from the TrustedBSD project and uses it with great gusto (I count no fewer than seven policy modules on my macOS 11 machine). The Linux Security Modules framework is the Linux equivalent. Windows has limited MAC in the form of the capabilities system for UWP apps. The bitter irony is that, being limited to a small subset of applications, it’s a “mandatory” access control system that operates on an opt-in basis—to say nothing of its complete non-extensibility.

4 If daemons have to run under an anonymous user, why must it be root, which, while still bound by MAC policies, can bypass all ACLs, send kill signals to any process, invoke sysctls in write mode and in general do a lot of damage? A little known fact is that while system daemons run as root, they also do run inside extremely strict sandboxes based on the Seatbelt framework (internally based on, you guessed it, a MAC policy module, unimaginatively named Sandbox). In a sadly predictable twist, while extremely powerful and enabling incredibly granular access control, Seatbelt is almost completely undocumented (notice a pattern yet?). In a less predictable but sadder twist, it’s also been marked as deprecated with no replacement since OS X 10.8. Nevertheless, with all system daemons using it, plus third-party users of the caliber of Google Chrome and Mozilla Firefox, Seatbelt seems unlikely to disappear any time soon.

5 Mach (no relation to “Macintosh”) was a research project to replace the BSD kernel with a microkernel. The design proved impractical, and the most famous real-world implementation of Mach, NeXTSTEP (later “Darwin”, later “OS X”, later “macOS”), actually runs a Mach/BSD hybrid kernel with very little “micro”. Mach was an extremely influential design: Windows NT (later “Windows”) is a Mach clone too, except redesigned to work alongside a VMS-like kernel instead of a BSD one. Windows NT, too, flirted with a microkernel architecture, but it proved to be no more practical in the 90s than it had been in the 80s.
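For the curious, the acct(2) time format from endnote 2 can be decoded with a few lines of C. This is only a sketch of the formula quoted there (1/64 * 8^exponent * mantissa); the function name is made up, and the bit layout (exponent in the top 3 bits, mantissa in the low 13 bits) is my assumption about how the 16 bits are split.

```c
#include <stdint.h>

/*
 * Decode a 16-bit acct(2) time value into seconds, following the formula
 * from endnote 2: 1/64 * 8^exponent * mantissa.
 * Assumed layout: exponent in the top 3 bits, mantissa in the low 13 bits.
 */
double acct_comp_to_seconds(uint16_t comp)
{
    uint32_t exponent = comp >> 13;     /* 3 bits, 0..7 */
    uint64_t ticks = comp & 0x1FFF;     /* 13-bit mantissa, 0..8191 */

    ticks <<= 3 * exponent;             /* multiply by 8^exponent */
    return (double)ticks / 64.0;        /* one tick is 1/64 of a second */
}
```

Under these assumptions, the largest encodable value (0xFFFF) decodes to 268,402,688 seconds, roughly eight and a half years. The largest value with exponent 2 decodes to 8,191 seconds, a little over 2 hours and 16 minutes; beyond that, the exponent rises to 3 and each step of the mantissa is worth 8 seconds.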

tstudent@MAC-67C2FA8CA4EC ~ % sudo sysctl kern.sched_allow_NO_SMT_threads=0 Password: kern.sched_allow_NO_SMT_threads: 1 -> 0 

This instantly disables NO_SMT system-wide. I wonder why they bothered making it a write-once flag, only to make it so trivial to disable.

The
TECS
mitigation

What is it?

I have been unable to figure out what TECS stands for. Closest I could get was this comment from the source code of the kernel (

osfmk/kern/task.h
, from XNU 7195.50.7.100.1):

#define TF_TECS                 0x00020000                              /* task threads must enable CPU security */ 

Thread Enable CPU Security? Even if it’s the correct interpretation, it doesn’t help us understand what it does.

In its current incarnation (the generic name suggests the specifics might change in the future), TECS flushes certain internal CPU buffers before returning from kernel mode to user mode. It’s a mitigation for the Rogue Data Cache Load (RDCL) family of attacks (like Meltdown) and the Microarchitectural Data Sampling (MDS) family of attacks (like RIDL and Fallout).

How to use
TECS

Unlike NO_SMT,

TECS
doesn’t have a dedicated API, but it’s enabled through a generic API called CPU Security Mitigations (CSM), that can also enable NO_SMT. In C/C++,
#include 
; no extra library necessary. From :

/*  * CPU Security Mitigation APIs  *  * Set CPU security mitigation on the current proc (all existing and future threads)  * This attribute is inherited on fork and exec  */ int proc_set_csm(uint32_t flags) __API_AVAILABLE(macos(11.0));

/* Set CPU security mitigation on the current thread */ int proc_setthread_csm(uint32_t flags) __API_AVAILABLE(macos(11.0));

/* * flags for CPU Security Mitigation APIs * PROC_CSM_ALL should be used in most cases, * the individual flags are provided only for performance evaluation etc */ #define PROC_CSM_ALL 0x0001 /* Set all available mitigations */ #define PROC_CSM_NOSMT 0x0002 /* Set NO_SMT – see above */ #define PROC_CSM_TECS 0x0004 /* Execute VERW on every return to user mode */

As with the dedicated NO_SMT API, we can enable mitigations for the entire current process, using

proc_set_csm
, or just the calling thread, with proc_setthread_csm. CSM functions, too, are wrappers for
process_policy(2)
.

Finally, just like NO_SMT, CSM also extends

posix_spawn(2)
. From :

/*  * Set CPU Security Mitigation on the spawned process  * This attribute affects all threads and is inherited on fork and exec  */ int     posix_spawnattr_set_csm_np(const posix_spawnattr_t * __restrict attr, uint32_t flags) __API_AVAILABLE(macos(11.0)); /*  * flags for CPU Security Mitigation attribute  * POSIX_SPAWN_NP_CSM_ALL should be used in most cases,  * the individual flags are provided only for performance evaluation etc  */ #define POSIX_SPAWN_NP_CSM_ALL         0x0001 #define POSIX_SPAWN_NP_CSM_NOSMT       0x0002 #define POSIX_SPAWN_NP_CSM_TECS        0x0004 

The meaning of the flags is identical to the similarly named flags, and

posix_spawnattr_set_csm_np(attr, POSIX_SPAWN_NP_CSM_NOSMT)
is 100% identical to and interchangeable with posix_spawnattr_setnosmt_np(attr).

How does it work?

Just like NO_SMT,

TECS
is a write-once, enable-only flag that is copied from task to task, from task to thread, and from thread to CPU. The task flag is TF_TECS. Below the task level, the flag becomes architecture-specific, x86-64-only, morphing into a mitigation codenamed
SEGCHK
. Thus, the thread flag is boolean field machine_thread::mthr_do_segchk, and the CPU flag is boolean field
cpu_data::cpu_curthread_do_segchk
, also known as CPU_NEED_SEGCHK in assembler code.

SEGCHK is implemented entirely in assembler, in kernel-to-user return routines

ks_64bit_return
and ks_32bit_return. If
CPU_NEED_SEGCHK
is set for the current CPU, they execute a VERW instruction shortly before the final
SYSEXIT
/SYSRET/
IRET
. VERW is an obscure and largely obsolete instruction that checks if the specified segment (the user mode stack segment, in the case of
SEGCHK
) is writable; but more importantly, it has the side effect of flushing the caches exploited by the RDCL and MDS families of attacks, mitigating them.

TECS is only enabled if it’s supported by the CPU, or if it’s been forced on by default. CPU support for

TECS
is only checked on x86-64, and it corresponds to whether SEGCHK is supported. The checks, performed at boot time, are:

Using VERW as a mitigation was initially suggested by two of the discoverers of RIDL (page 200), but it seems it proved insufficient, and CPU vendors had to enhance the instruction to act as a proper mitigation. macOS doesn’t trust the un-enhanced

VERW
.

SEGCHK can be forced on system-wide with a boot parameter. Again, see support article HT210108. Similarly, it can be forced off system-wide with undocumented boot parameter

cwad
(CPU workaround disable), which has the same syntax as cwae (CPU workaround enable).
cwae
has priority over cwad.

Unlike NO_SMT,

SEGCHK
/TECS has no firmware-level equivalent, nor can it be disabled after boot.

Who benefits from
NO_SMT
and
TECS
?

Google.

I’ve looked everywhere and no one else seems to use these mitigation APIs. The only source code match (outside of the macOS 11 and 12 SDKs, and the XNU source code itself) is Chromium. The only binary matches on my macOS 11 machine (outside of system libraries) are the Chrome and Electron frameworks, i.e. Chromium. Not even Safari seems to use them!

In Chromium, when compiling for macOS, the base::LaunchOptions structure passed to function

base::LaunchProcess
contains a boolean field named enable_cpu_security_mitigations; if set, the macOS implementation of
base::LaunchProcess
launches the new process with CSM flags POSIX_SPAWN_NP_CSM_ALL. If I understand the code correctly, mitigations are enabled for renderer and plugin host sub-processes, and disabled for all other kinds of sub-processes (another possible reading of the code suggests that the feature is implemented, but unused. Honestly, I haven’t dug too deep).

It’s hard not to wonder why Apple went through the effort of implementing mitigations and exposing them as APIs, and then neither documented nor even used them. If they are ineffective, the question becomes why Google bothers using them. Either way, we are left with no clear answer.

Endpoint Security API improvements

Endpoint Security probably needs no introduction to the audience of this article, but I’ll still give a brief one.

This C API, first introduced in macOS 10.15, replaced and made obsolete the pre-existing patchwork of archaic auditing, monitoring and policing APIs (among which OpenBSM, KAUTH, Socketfilter and the venerable acct(2)—est. 1979[2]).

The design of Endpoint Security combined the near-absolute visibility and veto power over system state of a MAC[3] policy module with the safety properties of a client-server model, with a really nice and pretty-well-documented API on top. In short, it was the perfect API for a large variety of security applications.

Or was it? Unfortunately, Endpoint Security wasn’t without its own shortcomings, but they’re gradually being rectified. Let’s have a look at the most important improvements that macOS 11 and 12 make to Endpoint Security, only some of which were officially documented.

More message types

More operations can now be detected and/or vetoed, such as fcntl(2), searchfs(2), ptrace(2), remounting a filesystem, IOServiceOpen, task_name_for_pid, process suspension and process resumption.

Interestingly, process suspension includes private system call pid_shutdown_sockets, which doesn’t actually suspend processes, but only shuts down their network connections after they’ve already been suspended. The system call was originally only available on iOS, where it’s part of how apps are sent to the background.

macOS 12 adds some more notifications: setuid(2), setgid(2), seteuid(2), setegid(2), setreuid(2) and setregid(2).

More notifications, less polling

Some process metadata that only used to be available for querying, and necessitated polling and/or diffing to detect changes, now generates change events.

ES_EVENT_TYPE_NOTIFY_CS_INVALIDATED messages notify that a process’s code signature has gone invalid (i.e. CS_VALID flag no longer set) but the process is allowed to keep running (i.e. CS_HARD flag not set). Previously, it was only pollable through private system calls csops or csops_audittoken with operation code CS_OPS_STATUS.

ES_EVENT_TYPE_NOTIFY_REMOTE_THREAD_CREATE messages notify the creation of remote (i.e. inter-process) threads. Previously, this information was only available at low fidelity and with great effort, either by polling and diffing the data returned by Mach task method task_info with flavor TASK_EXTMOD_INFO, or by monitoring syslog for com.apple.kernel.external_modification messages.

More metadata

exec(2) messages now include the new process’s working directory (es_event_exec_t::cwd field).

Process metadata for all messages now includes:

  • The process’s controlling terminal, if any (es_process_t::tty field).
  • The process’s “start time”, i.e. the time when its process identifier was allocated by fork(2) (es_process_t::start_time field). Previously only available through sysctl(2) with the kern.proc.pid. OID.
  • The “responsible process” (es_process_t::responsible_audit_token field), i.e. the process that the notorious (to us developers) Transparency, Consent & Control (TCC) framework blames for an operation subject to user consent. Often, this is the client process that caused a daemon/agent process to be launched, which in an auditing context should be considered the “true” parent of a process (instead of “placeholder” xpcproxy(8)). Previously only available through the private (and completely undocumented) “responsibility” API of MAC policy module Quarantine (e.g. responsibility_get_responsible_for_pid).

Finally, for the first time ever in a macOS auditing API, all messages now report not just the process that caused the message to be generated, but the exact thread as well (es_message_t::thread field).

Improved performance

It’s now possible to process messages asynchronously without the overhead of es_copy_message/es_free_message (equivalent to a sequence of malloc, memcpy and free): Messages are now reference counted (see new functions es_retain_message/es_release_message), and can be moved across threads almost for free. es_copy_message and es_free_message have been outright deprecated and should no longer be used, except for backwards compatibility with macOS 10.15. They won’t be missed by me or my spindump traces.

A vulnerability quietly fixed

Sometimes, diffing SDK versions can even reveal security holes that were quietly fixed. Such is the case for fcntl(2) command F_SETSIZE.

F_SETSIZE is used to change the maximum disk space allocated to a file: If it’s smaller than the current size, the file is truncated; if it’s larger, the file is extended. What stops a malicious process from extending a file so that it fills the entire disk, and then reading from the extended file to carve deleted files out of what was previously free space? Very simple: F_SETSIZE fills the new file space with all zeroes to conceal what it used to contain. As an optimization, a superuser process (effective user id 0) is allowed to extend a file without zeroing out, because a superuser process is assumed to have access to that data anyway.

However, macOS has gradually made the UNIX security model irrelevant. For example, even the superuser is only allowed to access the private documents of a regular user with the user’s permission—permission that is given on a per-application basis, through that protector of users and bane of developers known as the Transparency, Consent & Control (TCC) framework. This reflects the new meaning that macOS has given to the “root” superuser: No longer the administrator of a multi-user system, as it was originally meant on UNIX, but either a temporary identity assumed by each user for system administration tasks (e.g. by way of sudo(8)), or the anonymous user under which daemons run[4].

In this new security model, the superuser can no longer be assumed to have unrestricted access to everything. However, not zeroing space when extending a file would let a superuser process with no entitlements at all recover any file that had been deleted.

macOS 11 fixes this by no longer handling the superuser as a special case for F_SETSIZE. The man page for fcntl(2) now says:

F_SETSIZE    Deprecated. In previous releases, this would allow a process with root privileges to truncate a file without zeroing space. For security reasons, this operation is no longer supported and will instead truncate the file in the same manner as truncate(2).

Even the comment on its definition in the SDK headers was amended. Before (macOS 10.15 SDK):

#define F_SETSIZE       43              /* Truncate a file without zeroing space */ 

And after (macOS 11 SDK):

#define F_SETSIZE       43              /* Truncate a file. Equivalent to calling truncate(2) */ 

As far as I can tell, this information disclosure vulnerability was never assigned a CVE, nor was it publicly acknowledged in any other way before it was silently fixed in macOS 11.
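The guarantee that F_SETSIZE now shares with truncate(2) is easy to check from user space. Here is a minimal sketch that uses the standard ftruncate(2) rather than F_SETSIZE itself; the helper name extended_region_is_zeroed is my own invention, not something from the SDK:

```c
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: grow an open file from old_size to new_size with
 * ftruncate(2), then check that every byte past the old end reads as zero. */
bool extended_region_is_zeroed(int fd, off_t old_size, off_t new_size)
{
    if (new_size <= old_size || ftruncate(fd, new_size) != 0)
        return false;

    unsigned char buf[4096];
    off_t offset = old_size;
    while (offset < new_size) {
        size_t want = sizeof buf;
        if ((off_t)want > new_size - offset)
            want = (size_t)(new_size - offset);
        ssize_t got = pread(fd, buf, want, offset);
        if (got <= 0)
            return false;            /* read error or unexpected end of file */
        for (ssize_t i = 0; i < got; i++)
            if (buf[i] != 0)
                return false;        /* stale disk contents leaked through */
        offset += got;
    }
    return true;
}
```

A check like this only says something about the file it is run on, so treat it as an illustration of the property rather than as proof that a given system is safe.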

O_NOFOLLOW_ANY

Even primeval APIs like system call open(2) can still have room for improvement. macOS 11 introduces a new flag for it, O_NOFOLLOW_ANY, that mitigates an entire family of potential vulnerabilities, especially in security applications.

Endpoint Security provides the full, resolved (no symlinks), normalized path of each file involved in an auditable event (what OpenBSM veterans/victims like me used to know as “vnode kpath”), but how can applications be sure that the path still identifies the same file by the time they open it? With O_NOFOLLOW_ANY set, open(2) will fail with error ELOOP if any symlink is encountered anywhere in the path: A stronger version of O_NOFOLLOW that applies to the entire path, not just the final component.

Conclusion

What did I learn from my rummaging? Apple still likes its secrets; the Chromium source code still is the best documentation on the mitigations and sandboxing features provided by all major operating systems; diffing releases remains the best way to find hidden features; and some secrets can stay hidden in plain sight for a long time.

I wrote this article in part as a “look at this cool thing”, and in part as a sort of public service, so that the new, hidden macOS features no longer return a deafening silence when queried on search engines. Even if I got some details wrong, at least the topic can now be debated.

Endnotes

1 The relationship between tasks and processes in macOS can be roughly summarized as: Each (BSD) process corresponds exactly to one (Mach[5]) task, and vice versa, until the process calls exec(2). exec(2) terminates the current task, creates a new one and associates it to the process, replacing the dead one. macOS old timers may object that exec(2) actually keeps the same task, resetting its state. It used to work like that, but it was a fragile design that was dealt a fatal blow by Google researcher Ian Beer in 2016. Starting from XNU 3789.21.4 (macOS 10.12.1, iOS 10.1), exec(2) creates a new task.

2 What’s the use for such an ancient API? It may sound incredible, but until Endpoint Security, macOS had no reliable auditing mechanism to log process deaths. Except acct(2), that is, which to describe as “archaic” would be a compliment. It logs fixed-size records to a global log file, it truncates process names to 9 characters, it logs user ids but not process ids, and the timestamp format for process exit times is a literally unbelievable 1/64 * 8^exponent * mantissa seconds since the process started (good for about 8 years and a half of non-stop running, but with a variable precision that drops below the second at the 2:16:30 mark), encoded in 16 bits as 3 bits exponent, 13 bits mantissa; the process start time is a saner 32-bit count of seconds since the UNIX epoch (good for about 17 years from now). We opted not to use acct(2).
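For the curious, the formula above can be worked through in a few lines of C. This is a sketch under one assumption of mine: that the exponent occupies the top three bits of the 16-bit value and the mantissa the low thirteen. The function name comp_to_seconds is made up for the occasion:

```c
#include <stdint.h>

/* Decode a 16-bit acct(2) elapsed-time value as described above:
 * seconds = 1/64 * 8^exponent * mantissa.
 * Assumed layout: top 3 bits = exponent, low 13 bits = mantissa. */
double comp_to_seconds(uint16_t comp)
{
    unsigned exponent = comp >> 13;          /* 0..7 */
    uint64_t ticks = comp & 0x1FFFu;         /* mantissa, 0..8191 */
    ticks <<= 3 * exponent;                  /* multiply by 8^exponent */
    return (double)ticks / 64.0;             /* 64 ticks per second */
}
```

Under that reading, the largest encodable value, 0xFFFF, decodes to 268,402,688 seconds, which is the eight and a half years quoted above. The last value that still has one-second resolution (exponent 2, mantissa 8191) decodes to 8,191 seconds, a little over 2 hours and 16 minutes; past that, the step between consecutive values is 8 seconds.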

3 Mandatory Access Control, no relation to “Mac” as an abbreviation for “Macintosh”. Historically referring to security models patterned after military document classification practices, MAC is nowadays a generic term for any policy-based security model, distinct from and orthogonal to permission-based security models (also known as DAC, or Discretionary Access Control). macOS inherited its modular MAC framework from the TrustedBSD project and uses it with great gusto (I count no fewer than seven policy modules on my macOS 11 machine). The Linux Security Modules framework is the Linux equivalent. Windows has limited MAC in the form of the capabilities system for UWP apps. The bitter irony is that, being limited to a small subset of applications, it’s a “mandatory” access control system that operates on an opt-in basis—to say nothing of its complete non-extensibility.

4 If daemons have to run under an anonymous user, why must it be root, which, while still bound by MAC policies, can bypass all ACLs, send kill signals to any process, invoke sysctls in write mode and in general do a lot of damage? A little known fact is that while system daemons run as root, they also do run inside extremely strict sandboxes based on the Seatbelt framework (internally based on, you guessed it, a MAC policy module, unimaginatively named Sandbox). In a sadly predictable twist, while extremely powerful and enabling incredibly granular access control, Seatbelt is almost completely undocumented (notice a pattern yet?). In a less predictable but sadder twist, it’s also been marked as deprecated with no replacement since OS X 10.8. Nevertheless, with all system daemons using it, plus third-party users of the caliber of Google Chrome and Mozilla Firefox, Seatbelt seems unlikely to disappear any time soon.

5 Mach (no relation to “Macintosh”) was a research project to replace the BSD kernel with a microkernel. The design proved impractical, and the most famous real-world implementation of Mach, NeXTSTEP (later “Darwin”, later “OS X”, later “macOS”), actually runs a Mach/BSD hybrid kernel with very little “micro”. Mach was an extremely influential design: Windows NT (later “Windows”) is a Mach clone too, except redesigned to work alongside a VMS-like kernel instead of a BSD one. Windows NT, too, flirted with a microkernel architecture, but it proved to be no more practical in the 90s than it had been in the 80s.

int     posix_spawnattr_setnosmt_np(const posix_spawnattr_t * __restrict attr) __API_AVAILABLE(macos(11.0)); 

posix_spawnattr_setnosmt_np(3) performs the equivalent of proc_set_no_smt on the new process. In the name of the function, the “_np” suffix stands for “non-portable”: A customary way to mark OS-specific extensions to posix_spawn(2).

“But I already use fork(2) and I can’t stop using it! How do I enable NO_SMT after exec(2) without enabling it for the fork child?” You’re in luck, because macOS has you covered: Any posix_spawn(2) feature is automatically available to exec(2) thanks to non-standard flag POSIX_SPAWN_SETEXEC, which can be set on a posix_spawnattr_t using posix_spawnattr_setflags(3), and makes posix_spawn(2) behave like exec(2), replacing the current process instead of creating a new one.

How does it work?

NO_SMT is implemented as a per-task (not per-process[1]) flag named TF_NO_SMT, or as a per-thread scheduling flag named TH_SFLAG_NO_SMT. The flag is copied from tasks to their children, tasks and threads alike; it’s a write-once flag that, once set, cannot be removed. The flag is then copied from each thread to the CPU it’s currently running on (field processor_t::current_is_NO_SMT).

NO_SMT is implemented by the dualq scheduling algorithm, in a pretty straightforward way: A NO_SMT thread cannot share a CPU core with any other thread.

NO_SMT can be disabled system-wide with boot argument disable_NO_SMT_threads, which causes the kernel variable sched_allow_NO_SMT_threads to be initialized with 0 instead of 1. The current value of sched_allow_NO_SMT_threads can be queried with sysctl kern.sched_allow_NO_SMT_threads:

tstudent@MAC-67C2FA8CA4EC ~ % sysctl kern.sched_allow_NO_SMT_threads
kern.sched_allow_NO_SMT_threads: 1
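The same query can be made from a program. The following is a sketch only: the helper name no_smt_threads_allowed is mine, and I’m assuming the sysctl is exported as a plain int, which is what its 0/1 values suggest:

```c
#include <stddef.h>
#ifdef __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: returns 1 if the scheduler currently honours NO_SMT,
 * 0 if it has been disabled, or -1 if the sysctl cannot be read (for
 * example on a system or kernel that doesn't have it). */
int no_smt_threads_allowed(void)
{
#ifdef __APPLE__
    int value = 0;                   /* assumption: the OID holds an int */
    size_t size = sizeof value;
    if (sysctlbyname("kern.sched_allow_NO_SMT_threads",
                     &value, &size, NULL, 0) != 0)
        return -1;
    return value != 0;
#else
    return -1;
#endif
}
```

A process that relies on NO_SMT could call this at startup and log a warning when the answer is anything other than 1.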

The equivalent of NO_SMT can be forced on system-wide at the firmware level, by setting NVRAM variable SMTDisable to %01, as described in Apple support article HT210108.

Why you probably shouldn’t use NO_SMT

Enabling NO_SMT through the API, instead of configuring the firmware to boot the machine with SMT support disabled, provides limited protection, as the sched_allow_NO_SMT_threads variable is writable at runtime by the superuser:

tstudent@MAC-67C2FA8CA4EC ~ % sudo sysctl kern.sched_allow_NO_SMT_threads=0
Password:
kern.sched_allow_NO_SMT_threads: 1 -> 0

This instantly disables NO_SMT system-wide. I wonder why they bothered making it a write-once flag, only to make it so trivial to disable.

The TECS mitigation

What is it?

I have been unable to figure out what TECS stands for. Closest I could get was this comment from the source code of the kernel (osfmk/kern/task.h, from XNU 7195.50.7.100.1):

#define TF_TECS                 0x00020000                              /* task threads must enable CPU security */ 

Thread Enable CPU Security? Even if it’s the correct interpretation, it doesn’t help us understand what it does.

In its current incarnation (the generic name suggests the specifics might change in the future), TECS flushes certain internal CPU buffers before returning from kernel mode to user mode. It’s a mitigation for the Rogue Data Cache Load (RDCL) family of attacks (like Meltdown) and the Microarchitectural Data Sampling (MDS) family of attacks (like RIDL and Fallout).

How to use TECS

Unlike NO_SMT, TECS doesn’t have a dedicated API, but it’s enabled through a generic API called CPU Security Mitigations (CSM), that can also enable NO_SMT. In C/C++, include the libproc header; no extra library is necessary. From that header:

/*
 * CPU Security Mitigation APIs
 *
 * Set CPU security mitigation on the current proc (all existing and future threads)
 * This attribute is inherited on fork and exec
 */
int proc_set_csm(uint32_t flags) __API_AVAILABLE(macos(11.0));

/* Set CPU security mitigation on the current thread */
int proc_setthread_csm(uint32_t flags) __API_AVAILABLE(macos(11.0));

/*
 * flags for CPU Security Mitigation APIs
 * PROC_CSM_ALL should be used in most cases,
 * the individual flags are provided only for performance evaluation etc
 */
#define PROC_CSM_ALL    0x0001  /* Set all available mitigations */
#define PROC_CSM_NOSMT  0x0002  /* Set NO_SMT - see above */
#define PROC_CSM_TECS   0x0004  /* Execute VERW on every return to user mode */

As with the dedicated NO_SMT API, we can enable mitigations for the entire current process, using proc_set_csm, or just the calling thread, with proc_setthread_csm. CSM functions, too, are wrappers for process_policy(2).
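Calling it takes one line. The sketch below wraps that line in a helper of my own naming (enable_all_cpu_mitigations) so that it still compiles against SDKs that don’t declare the API; returning ENOTSUP in that case is my choice, not something the SDK prescribes. I only treat a nonzero return as failure, without assuming more about what the value means:

```c
#include <errno.h>
#ifdef __APPLE__
#include <libproc.h>
#endif

/* Hypothetical helper: opt the whole current process (existing and future
 * threads) into every available CPU security mitigation.
 * Returns 0 on success and nonzero on failure; ENOTSUP means this build
 * has no CSM API at all. */
int enable_all_cpu_mitigations(void)
{
#ifdef PROC_CSM_ALL
    return proc_set_csm(PROC_CSM_ALL);
#else
    return ENOTSUP;
#endif
}
```

Since the flags are inherited by children and survive exec(2), this is the kind of call to make deliberately, early in main, rather than somewhere deep inside a library.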

Finally, just like NO_SMT, CSM also extends posix_spawn(2). From the SDK headers:

/*
 * Set CPU Security Mitigation on the spawned process
 * This attribute affects all threads and is inherited on fork and exec
 */
int     posix_spawnattr_set_csm_np(const posix_spawnattr_t * __restrict attr, uint32_t flags) __API_AVAILABLE(macos(11.0));

/*
 * flags for CPU Security Mitigation attribute
 * POSIX_SPAWN_NP_CSM_ALL should be used in most cases,
 * the individual flags are provided only for performance evaluation etc
 */
#define POSIX_SPAWN_NP_CSM_ALL         0x0001
#define POSIX_SPAWN_NP_CSM_NOSMT       0x0002
#define POSIX_SPAWN_NP_CSM_TECS        0x0004

The meaning of the flags is identical to that of the similarly named PROC_CSM_ flags, and posix_spawnattr_set_csm_np(attr, POSIX_SPAWN_NP_CSM_NOSMT) is 100% identical to and interchangeable with posix_spawnattr_setnosmt_np(attr).

How does it work?

Just like NO_SMT, TECS is a write-once, enable-only flag that is copied from task to task, from task to thread, and from thread to CPU. The task flag is TF_TECS. Below the task level, the flag becomes architecture-specific, x86-64-only, morphing into a mitigation codenamed SEGCHK. Thus, the thread flag is boolean field machine_thread::mthr_do_segchk, and the CPU flag is boolean field cpu_data::cpu_curthread_do_segchk, also known as CPU_NEED_SEGCHK in assembler code.

SEGCHK is implemented entirely in assembler, in kernel-to-user return routines ks_64bit_return and ks_32bit_return. If CPU_NEED_SEGCHK is set for the current CPU, they execute a VERW instruction shortly before the final SYSEXIT/SYSRET/IRET. VERW is an obscure and largely obsolete instruction that checks if the specified segment (the user mode stack segment, in the case of SEGCHK) is writable; but more importantly, it has the side effect of flushing the caches exploited by the RDCL and MDS families of attacks, mitigating them.


/*
 * NO_SMT means that on an SMT CPU, this thread must be scheduled alone,
 * with the paired CPU idle.
 *
 * Set NO_SMT on the current proc (all existing and future threads)
 * This attribute is inherited on fork and exec
 */
int proc_set_no_smt(void) __API_AVAILABLE(macos(11.0));

/* Set NO_SMT on the current thread */
int proc_setthread_no_smt(void) __API_AVAILABLE(macos(11.0));

Simply call proc_set_no_smt to enable NO_SMT for the entire process (existing and future threads alike), or proc_setthread_no_smt to enable it for the calling thread only. As the comments say, fork(2) children inherit the parent process’s NO_SMT state, and exec(2) won’t reset it.

Note that “libproc” is a misnomer: these aren’t library functions but thin C wrappers over the private system call process_policy(2).
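As a minimal sketch of the call described above (not code from the SDK or from any real project): the helper name enable_no_smt_for_process is hypothetical, the <libproc.h> include is an assumption based on the “libproc” name, and the failure convention is assumed too, so any nonzero return is treated as an error. Anywhere other than macOS 11 or later, the helper simply reports that the feature is unavailable.

```c
#include <errno.h>

#if defined(__APPLE__)
#include <libproc.h> /* assumption: this header declares proc_set_no_smt() */
#endif

/*
 * Hypothetical helper: opt the whole process (existing and future threads)
 * into NO_SMT. Returns 0 on success, -1 if the call fails or is unavailable.
 */
int enable_no_smt_for_process(void)
{
#if defined(__APPLE__)
    if (__builtin_available(macOS 11.0, *)) {
        /* Assumption: 0 means success; anything else is treated as failure. */
        return proc_set_no_smt() == 0 ? 0 : -1;
    }
#endif
    errno = ENOTSUP; /* not macOS 11+, nothing to enable */
    return -1;
}
```

Since the flag is write-once and inherited, a call like this probably belongs as early as possible in main, before any helper processes are started.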

NO_SMT also extends posix_spawn(2), so that we can enable mitigations for a new process without setting them for the current process or spawning a short-lived fork(2) child (ideally, we should never call fork(2) again in any new code, on any OS. Ever). From :

int     posix_spawnattr_setnosmt_np(const posix_spawnattr_t * __restrict attr) __API_AVAILABLE(macos(11.0)); 

posix_spawnattr_setnosmt_np(3) performs the equivalent of proc_set_no_smt on the new process. In the name of the function, the “_np” suffix stands for “non-portable”: a customary way to mark OS-specific extensions to posix_spawn(2).
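A short sketch of how the spawn attribute might be used, under stated assumptions: spawn_with_no_smt is a hypothetical helper name, and posix_spawnattr_setnosmt_np is assumed to follow the usual posix_spawn convention of returning 0 or an errno value. On systems without the extension, the sketch quietly spawns the child without the mitigation.

```c
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>

extern char **environ;

/*
 * Hypothetical helper: spawn `path` with NO_SMT requested for the child only.
 * Returns 0 and stores the child's pid in *pid_out, or an errno value.
 */
int spawn_with_no_smt(const char *path, char *const argv[], pid_t *pid_out)
{
    posix_spawnattr_t attr;
    int err = posix_spawnattr_init(&attr);
    if (err != 0) {
        return err;
    }

#if defined(__APPLE__)
    if (__builtin_available(macOS 11.0, *)) {
        /* The calling process is left untouched; only the child gets NO_SMT. */
        err = posix_spawnattr_setnosmt_np(&attr);
    }
#endif

    if (err == 0) {
        err = posix_spawn(pid_out, path, NULL, &attr, argv, environ);
    }
    posix_spawnattr_destroy(&attr);
    return err;
}
```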

“But I already use fork(2) and I can’t stop using it! How do I enable NO_SMT after exec(2) without enabling it for the fork child?” You’re in luck, because macOS has you covered: any posix_spawn(2) feature is automatically available to exec(2) thanks to the non-standard flag POSIX_SPAWN_SETEXEC, which can be set on a posix_spawnattr_t using posix_spawnattr_setflags(3). It makes posix_spawn(2) behave like exec(2), replacing the current process instead of creating a new one.
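The exec-style variant could look roughly like this. It is a sketch, not a definitive recipe: exec_with_no_smt is a hypothetical name, and on platforms without POSIX_SPAWN_SETEXEC the function just returns ENOTSUP. On a macOS release older than 11 it would still replace the process, only without NO_SMT.

```c
#include <errno.h>
#include <spawn.h>
#include <stddef.h>

extern char **environ;

/*
 * Hypothetical helper: replace the current process image with `path`,
 * requesting NO_SMT for the new image. Only returns on failure, with an
 * errno value.
 */
int exec_with_no_smt(const char *path, char *const argv[])
{
#if defined(__APPLE__)
    posix_spawnattr_t attr;
    int err = posix_spawnattr_init(&attr);
    if (err != 0) {
        return err;
    }

    /* Turn posix_spawn(2) into an exec(2) lookalike. */
    err = posix_spawnattr_setflags(&attr, POSIX_SPAWN_SETEXEC);
    if (err == 0) {
        if (__builtin_available(macOS 11.0, *)) {
            err = posix_spawnattr_setnosmt_np(&attr);
        }
    }
    if (err == 0) {
        /* On success this call does not return. */
        err = posix_spawn(NULL, path, NULL, &attr, argv, environ);
    }
    posix_spawnattr_destroy(&attr);
    return err;
#else
    (void)path;
    (void)argv;
    return ENOTSUP; /* POSIX_SPAWN_SETEXEC is a macOS extension */
#endif
}
```

A fork(2) child would call this where it previously called exec(2); the child itself keeps running without NO_SMT until the new image takes over.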

How does it work?

NO_SMT is implemented as a per-task (not per-process[1]) flag named TF_NO_SMT, or as a per-thread scheduling flag named TH_SFLAG_NO_SMT. The flag is copied from tasks to their children, tasks and threads alike; it’s a write-once flag that, once set, cannot be removed. The flag is then copied from each thread to the CPU it’s currently running on (field processor_t::current_is_NO_SMT).

NO_SMT is implemented by the dualq scheduling algorithm, in a pretty straightforward way: a NO_SMT thread cannot share a CPU core with any other thread.

NO_SMT can be disabled system-wide with the boot argument disable_NO_SMT_threads, which causes the kernel variable sched_allow_NO_SMT_threads to be initialized with 0 instead of 1. The current value of sched_allow_NO_SMT_threads can be queried with sysctl kern.sched_allow_NO_SMT_threads:

tstudent@MAC-67C2FA8CA4EC ~ % sysctl kern.sched_allow_NO_SMT_threads
kern.sched_allow_NO_SMT_threads: 1
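The same query can be made from C with sysctlbyname(3). This is a sketch under assumptions: the helper name no_smt_threads_allowed is hypothetical, and the value is assumed to be exported as a plain int (the sysctl output above is consistent with that, but does not prove it).

```c
#include <errno.h>
#include <stddef.h>

#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/*
 * Hypothetical helper: returns 1 if the kernel currently honors NO_SMT,
 * 0 if it has been switched off, -1 if the value cannot be read.
 */
int no_smt_threads_allowed(void)
{
#if defined(__APPLE__)
    int value = 0;             /* assumption: the sysctl is an int */
    size_t size = sizeof value;
    if (sysctlbyname("kern.sched_allow_NO_SMT_threads",
                     &value, &size, NULL, 0) != 0) {
        return -1;             /* e.g. the OID does not exist on this release */
    }
    return value != 0;
#else
    errno = ENOTSUP;           /* no such sysctl outside macOS */
    return -1;
#endif
}
```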

The equivalent of NO_SMT can be forced on system-wide at the firmware level, by setting the NVRAM variable SMTDisable to %01, as described in Apple support article HT210108.

Why you probably shouldn’t use NO_SMT

Enabling NO_SMT through the API, instead of configuring the firmware to boot the machine with SMT support disabled, provides limited protection, as the sched_allow_NO_SMT_threads variable is writable at runtime by the superuser:

tstudent@MAC-67C2FA8CA4EC ~ % sudo sysctl kern.sched_allow_NO_SMT_threads=0
Password:
kern.sched_allow_NO_SMT_threads: 1 -> 0

This instantly disables NO_SMT system-wide. I wonder why they bothered making it a write-once flag, only to make it so trivial to disable.

The TECS mitigation

What is it?

I have been unable to figure out what TECS stands for. The closest I could get was this comment from the source code of the kernel (osfmk/kern/task.h, from XNU 7195.50.7.100.1):

#define TF_TECS                 0x00020000                              /* task threads must enable CPU security */ 

Thread Enable CPU Security? Even if it’s the correct interpretation, it doesn’t help us understand what it does.

In its current incarnation (the generic name suggests the specifics might change in the future), TECS flushes certain internal CPU buffers before returning from kernel mode to user mode. It’s a mitigation for the Rogue Data Cache Load (RDCL) family of attacks (like Meltdown) and the Microarchitectural Data Sampling (MDS) family of attacks (like RIDL and Fallout).

How to use TECS

Unlike NO_SMT, TECS doesn’t have a dedicated API, but it’s enabled through a generic API called CPU Security Mitigations (CSM), which can also enable NO_SMT. In C/C++, #include ; no extra library necessary. From :

/*
 * CPU Security Mitigation APIs
 *
 * Set CPU security mitigation on the current proc (all existing and future threads)
 * This attribute is inherited on fork and exec
 */
int proc_set_csm(uint32_t flags) __API_AVAILABLE(macos(11.0));

/* Set CPU security mitigation on the current thread */
int proc_setthread_csm(uint32_t flags) __API_AVAILABLE(macos(11.0));

/*
 * flags for CPU Security Mitigation APIs
 * PROC_CSM_ALL should be used in most cases,
 * the individual flags are provided only for performance evaluation etc
 */
#define PROC_CSM_ALL    0x0001  /* Set all available mitigations */
#define PROC_CSM_NOSMT  0x0002  /* Set NO_SMT – see above */
#define PROC_CSM_TECS   0x0004  /* Execute VERW on every return to user mode */

As with the dedicated NO_SMT API, we can enable mitigations for the entire current process, using proc_set_csm, or just the calling thread, with proc_setthread_csm. CSM functions, too, are wrappers for process_policy(2).
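For the per-thread flavor, a sketch might look like the following. The helper name enable_cpu_mitigations_for_thread is hypothetical, the <libproc.h> include is an assumption, and so is the reading of a nonzero return as failure.

```c
#include <errno.h>
#include <stdint.h>

#if defined(__APPLE__)
#include <libproc.h> /* assumption: this header declares proc_setthread_csm() */
#endif

/*
 * Hypothetical helper: enable every available CPU security mitigation
 * (NO_SMT and TECS at the time of writing) for the calling thread only.
 * Returns 0 on success, -1 if the call fails or is unavailable.
 */
int enable_cpu_mitigations_for_thread(void)
{
#if defined(__APPLE__)
    if (__builtin_available(macOS 11.0, *)) {
        /* PROC_CSM_ALL is the flag the header recommends for most cases. */
        return proc_setthread_csm(PROC_CSM_ALL) == 0 ? 0 : -1;
    }
#endif
    errno = ENOTSUP; /* not macOS 11+, nothing to enable */
    return -1;
}
```

A worker thread that handles secrets could call this as its first statement, leaving the rest of the process to run at full speed.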

Finally, just like NO_SMT, CSM also extends posix_spawn(2). From :

/*
 * Set CPU Security Mitigation on the spawned process
 * This attribute affects all threads and is inherited on fork and exec
 */
int     posix_spawnattr_set_csm_np(const posix_spawnattr_t * __restrict attr, uint32_t flags) __API_AVAILABLE(macos(11.0));

/*
 * flags for CPU Security Mitigation attribute
 * POSIX_SPAWN_NP_CSM_ALL should be used in most cases,
 * the individual flags are provided only for performance evaluation etc
 */
#define POSIX_SPAWN_NP_CSM_ALL         0x0001
#define POSIX_SPAWN_NP_CSM_NOSMT       0x0002
#define POSIX_SPAWN_NP_CSM_TECS        0x0004

The meaning of these flags is identical to that of the similarly named PROC_CSM_* flags, and posix_spawnattr_set_csm_np(attr, POSIX_SPAWN_NP_CSM_NOSMT) is 100% identical to and interchangeable with posix_spawnattr_setnosmt_np(attr).

How does it work?

Just like NO_SMT, TECS is a write-once, enable-only flag that is copied from task to task, from task to thread, and from thread to CPU. The task flag is TF_TECS. Below the task level, the flag becomes architecture-specific, x86-64-only, morphing into a mitigation codenamed SEGCHK. Thus, the thread flag is the boolean field machine_thread::mthr_do_segchk, and the CPU flag is the boolean field cpu_data::cpu_curthread_do_segchk, also known as CPU_NEED_SEGCHK in assembler code.

SEGCHK is implemented entirely in assembler, in the kernel-to-user return routines ks_64bit_return and ks_32bit_return. If CPU_NEED_SEGCHK is set for the current CPU, they execute a VERW instruction shortly before the final SYSEXIT/SYSRET/IRET. VERW is an obscure and largely obsolete instruction that checks if the specified segment (the user mode stack segment, in the case of SEGCHK) is writable; but more importantly, it has the side effect of flushing the caches exploited by the RDCL and MDS families of attacks, mitigating them.

TECS is only enabled if it’s supported by the CPU, or if it’s been forced on by default. CPU support for TECS is only checked on x86-64, and it corresponds to whether SEGCHK is supported. The checks, performed at boot time, are:

Using VERW as a mitigation was initially suggested by two of the discoverers of RIDL (page 200), but it seems it proved insufficient, and CPU vendors had to enhance the instruction to act as a proper mitigation. macOS doesn’t trust the un-enhanced VERW.

SEGCHK can be forced on system-wide with a boot parameter. Again, see support article HT210108. Similarly, it can be forced off system-wide with the undocumented boot parameter cwad (CPU workaround disable), which has the same syntax as cwae (CPU workaround enable). cwae has priority over cwad.

Unlike NO_SMT, SEGCHK/TECS has no firmware-level equivalent, nor can it be disabled after boot.

Who benefits from NO_SMT and TECS?

Google.

I’ve looked everywhere and no one else seems to use these mitigation APIs. The only source code match (outside of the macOS 11 and 12 SDKs, and the XNU source code itself) is Chromium. The only binary matches on my macOS 11 machine (outside of system libraries) are the Chrome and Electron frameworks, i.e. Chromium. Not even Safari seems to use them!

In Chromium, when compiling for macOS, the base::LaunchOptions structure passed to function base::LaunchProcess contains a boolean field named enable_cpu_security_mitigations; if set, the macOS implementation of base::LaunchProcess launches the new process with CSM flags POSIX_SPAWN_NP_CSM_ALL. If I understand the code correctly, mitigations are enabled for renderer and plugin host sub-processes, and disabled for all other kinds of sub-processes (another possible reading of the code suggests that the feature is implemented, but unused. Honestly, I haven’t dug too deep).
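To make the pattern concrete, here is a loose C sketch of the same idea. It is not Chromium’s code: struct launch_options and launch_process are hypothetical, heavily simplified stand-ins for base::LaunchOptions and base::LaunchProcess, and posix_spawnattr_set_csm_np is assumed to return 0 or an errno value like other posix_spawn attribute setters.

```c
#include <spawn.h>
#include <stdbool.h>
#include <sys/types.h>

extern char **environ;

/* Hypothetical, much-simplified stand-in for Chromium's base::LaunchOptions. */
struct launch_options {
    bool enable_cpu_security_mitigations;
};

/*
 * Hypothetical launcher: spawns `path`, asking for all CPU security
 * mitigations when the option is set. Returns 0 or an errno value.
 */
int launch_process(const char *path, char *const argv[],
                   const struct launch_options *options, pid_t *pid_out)
{
    posix_spawnattr_t attr;
    int err = posix_spawnattr_init(&attr);
    if (err != 0) {
        return err;
    }

#if defined(__APPLE__)
    if (options->enable_cpu_security_mitigations) {
        if (__builtin_available(macOS 11.0, *)) {
            err = posix_spawnattr_set_csm_np(&attr, POSIX_SPAWN_NP_CSM_ALL);
        }
    }
#else
    (void)options; /* the mitigation attribute is a macOS extension */
#endif

    if (err == 0) {
        err = posix_spawn(pid_out, path, NULL, &attr, argv, environ);
    }
    posix_spawnattr_destroy(&attr);
    return err;
}
```

Keeping the switch in the launch options lets the caller decide per sub-process, which matches the reading that only some process types opt in.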

It’s hard not to wonder why Apple went through the effort of implementing mitigations and exposing them as APIs, and then neither document nor even use them. If they are ineffective, the question becomes why Google bothers using them. Either way, we are left with no clear answer.

Endpoint Security API improvements

Endpoint Security probably needs no introduction to the audience of this article, but I’ll still give a brief one.

This C API, first introduced in macOS 10.15, replaced and made obsolete the pre-existing patchwork of archaic auditing, monitoring and policing APIs (among which OpenBSM, KAUTH, Socketfilter and the venerable acct(2)—est. 1979[2]).

The design of Endpoint Security combined the near-absolute visibility and veto power over system state of a MAC[3] policy module with the safety properties of a client-server model, with a really nice and pretty-well-documented API on top. In short, it was the perfect API for a large variety of security applications.

Or was it? Unfortunately, Endpoint Security wasn’t without its own shortcomings, but they’re gradually being rectified. Let’s have a look at the most important improvements that macOS 11 and 12 make to Endpoint Security, only some of which were officially documented.

More message types

More operations can now be detected and/or vetoed, such as fcntl(2), searchfs(2), ptrace(2), remounting a filesystem, IOServiceOpen, task_name_for_pid, process suspension and process resumption.

Interestingly, process suspension includes private system call pid_shutdown_sockets, which doesn’t actually suspend processes, but only shuts down their network connections after they’ve already been suspended. The system call was originally only available on iOS, where it’s part of how apps are sent to the background.

macOS 12 adds some more notifications: setuid(2), setgid(2), seteuid(2), setegid(2), setreuid(2) and setregid(2).

More notifications, less polling

Some process metadata that only used to be available for querying, and necessitated polling and/or diffing to detect changes, now generates change events.

ES_EVENT_TYPE_NOTIFY_CS_INVALIDATED messages notify that a process’s code signature has gone invalid (i.e. the CS_VALID flag is no longer set) but the process is allowed to keep running (i.e. the CS_HARD flag is not set). Previously, this was only pollable through private system calls csops or csops_audittoken with operation code CS_OPS_STATUS.

ES_EVENT_TYPE_NOTIFY_REMOTE_THREAD_CREATE messages notify the creation of remote (i.e. inter-process) threads. Previously, this information was only available at low fidelity and with great effort, either by polling and diffing the data returned by Mach task method task_info with flavor TASK_EXTMOD_INFO, or by monitoring syslog for com.apple.kernel.external_modification messages.

More metadata

exec(2) messages now include the new process’s working directory (es_event_exec_t::cwd field).

Process metadata for all messages now includes:

  • The process’s controlling terminal, if any (es_process_t::tty field).
  • The process’s “start time”, i.e. the time when its process identifier was allocated by fork(2) (es_process_t::start_time field). Previously only available through sysctl(2) with the kern.proc.pid. OID.
  • The “responsible process” (es_process_t::responsible_audit_token field), i.e. the process that the notorious (to us developers) Transparency, Consent & Control (TCC) framework blames for an operation subject to user consent. Often, this is the client process that caused a daemon/agent process to be launched, which in an auditing context should be considered the “true” parent of a process (instead of the “placeholder” xpcproxy(8)). Previously only available through the private (and completely undocumented) “responsibility” API of MAC policy module Quarantine (e.g. responsibility_get_responsible_for_pid).
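For comparison, the older sysctl(2) route to a process’s start time could be sketched as below. The helper name process_start_time is hypothetical, and the sketch assumes the classic CTL_KERN / KERN_PROC / KERN_PROC_PID lookup that fills in a struct kinfo_proc; it is the kind of per-process polling that the new es_process_t::start_time field makes unnecessary.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/time.h>

#if defined(__APPLE__)
#include <sys/sysctl.h>
#endif

/*
 * Hypothetical helper: look up the start time of `pid` the old way.
 * Returns 0 and fills *out on success, -1 with errno set otherwise.
 */
int process_start_time(pid_t pid, struct timeval *out)
{
#if defined(__APPLE__)
    int mib[4] = { CTL_KERN, KERN_PROC, KERN_PROC_PID, (int)pid };
    struct kinfo_proc info;
    size_t size = sizeof info;

    if (sysctl(mib, 4, &info, &size, NULL, 0) != 0) {
        return -1;
    }
    if (size == 0) {           /* no record returned: no such process */
        errno = ESRCH;
        return -1;
    }
    *out = info.kp_proc.p_starttime;
    return 0;
#else
    (void)pid;
    (void)out;
    errno = ENOTSUP;           /* this lookup is specific to macOS/BSD */
    return -1;
#endif
}
```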

Finally, for the first time ever in a macOS auditing API, all messages now report not just the process that caused the message to be generated, but the exact thread as well (es_message_t::thread field).

Improved performance

It’s now possible to process messages asynchronously without the overhead of es_copy_message/es_free_message (equivalent to a sequence of malloc, memcpy and free): messages are now reference counted (see new functions es_retain_message/es_release_message), and can be moved across threads almost for free. es_copy_message and es_free_message have been outright deprecated and should no longer be used, except for backwards compatibility with macOS 10.15. They won’t be missed by me or my spindump traces.

A vulnerability quietly fixed

Sometimes, diffing SDK versions can even reveal security holes that were quietly fixed. Such is the case for fcntl(2) command F_SETSIZE.

F_SETSIZE is used to change the maximum disk space allocated to a file: if it’s smaller than the current size, the file is truncated; if it’s larger, the file is extended. What stops a malicious process from extending a file so that it fills the entire disk, and then reading from the extended file to carve deleted files out of what was previously free space? Very simple: F_SETSIZE fills the new file space with all zeroes to conceal what it used to contain. As an optimization, a superuser process (effective user id 0) is allowed to extend a file without zeroing out, because a superuser process is assumed to have access to that data anyway.

However, macOS has gradually made the UNIX security model irrelevant. For example, even the superuser is only allowed to access the private documents of a regular user with the user’s permission—permission that is given on a per-application basis, through that protector of users and bane of developers known as the Transparency, Consent & Control (TCC) framework. This reflects the new meaning that macOS has given to the “root” superuser: No longer the administrator of a multi-user system, as it was originally meant on UNIX, but either a temporary identity assumed by each user for system administration tasks (e.g. by way of sudo(8)), or the anonymous user under which daemons run[4].

In this new security model, the superuser can no longer be assumed to have unrestricted access to everything. Yet skipping the zeroing when extending a file would let a superuser process with no entitlements at all recover any file that had been deleted.

macOS 11 fixes this by no longer handling the superuser as a special case for F_SETSIZE. The man page for fcntl(2) now says:

F_SETSIZE: Deprecated. In previous releases, this would allow a process with root privileges to truncate a file without zeroing space. For security reasons, this operation is no longer supported and will instead truncate the file in the same manner as truncate(2).

Even the comments in were amended. Before (macOS 10.15 SDK):

#define F_SETSIZE       43              /* Truncate a file without zeroing space */ 

And after (macOS 11 SDK):

#define F_SETSIZE       43              /* Truncate a file. Equivalent to calling truncate(2) */ 

As far as I can tell, this information disclosure vulnerability was never assigned a CVE, nor was it publicly acknowledged in any other way before it was silently fixed in macOS 11.

O_NOFOLLOW_ANY

Even primeval APIs like system call open(2) can still have room for improvement. macOS 11 introduces a new flag for it, O_NOFOLLOW_ANY, that mitigates an entire family of potential vulnerabilities, especially in security applications.

Endpoint Security provides the full, resolved (no symlinks), normalized path of each file involved in an auditable event (what OpenBSM veterans/victims like me used to know as “vnode kpath”), but how can applications be sure that the path still identifies the same file by the time they open it? With O_NOFOLLOW_ANY set, open(2) will fail with error ELOOP if any symlink is encountered anywhere in the path: a stronger version of O_NOFOLLOW that applies to the entire path, not just the final component.

Conclusion

What did I learn from my rummaging? Apple still likes its secrets; the Chromium source code still is the best documentation on the mitigations and sandboxing features provided by all major operating systems; diffing releases remains the best way to find hidden features; and some secrets can stay hidden in plain sight for a long time.

I wrote this article in part as a “look at this cool thing”, and in part as a sort of public service, so that the new, hidden macOS features no longer return a deafening silence when queried on search engines. Even if I got some details wrong, at least the topic can now be debated.

Endnotes

1 The relationship between tasks and processes in macOS can be roughly summarized as: each (BSD) process corresponds exactly to one (Mach[5]) task, and vice versa, until the process calls exec(2). exec(2) terminates the current task, creates a new one and associates it to the process, replacing the dead one. macOS old timers may object that exec(2) actually keeps the same task, resetting its state. It used to work like that, but it was a fragile design that was dealt a fatal blow by Google researcher Ian Beer in 2016. Starting from XNU 3789.21.4 (macOS 10.12.1, iOS 10.1), exec(2) creates a new task.

2 What’s the use for such an ancient API? It may sound incredible, but until Endpoint Security, macOS had no reliable auditing mechanism to log process deaths. Except acct(2), that is, which to describe as “archaic” would be a compliment. It logs fixed-size records to a global log file, it truncates process names to 9 characters, it logs user ids but not process ids, and the timestamp format for process exit times is a literally unbelievable 1/64 * 8^exponent * mantissa seconds since the process started (good for about 8 years and a half of non-stop running, but with a variable precision that drops below the second at the 2:16:30 mark), encoded in 16 bits as 3 bits exponent, 13 bits mantissa; the process start time is a saner 32-bit count of seconds since the UNIX epoch (good for about 17 years from now). We opted not to use acct(2).
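The arithmetic in this endnote can be checked with a small decoder. This is a sketch, not the system’s code: the function names are made up, and the bit layout (exponent in the top 3 bits, mantissa in the low 13) is an assumption, since the text only gives the field widths.

```c
#include <stdint.h>

/*
 * Decode a 16-bit acct(2)-style elapsed time into 1/64-second ticks.
 * Assumed layout: top 3 bits are a base-8 exponent, low 13 bits the mantissa.
 */
uint64_t comp_to_ticks(uint16_t comp)
{
    uint64_t mantissa = comp & 0x1fffu;
    unsigned exponent = (comp >> 13) & 0x7u;
    /* Multiplying by 8^exponent is a left shift by 3 * exponent bits. */
    return mantissa << (3u * exponent);
}

/* Same value expressed in seconds (ticks are 1/64 of a second). */
double comp_to_seconds(uint16_t comp)
{
    return (double)comp_to_ticks(comp) / 64.0;
}
```

With this layout, the largest encodable value (0xFFFF) decodes to 268,402,688 seconds, about eight and a half years. The largest value still stored with one-second steps (exponent 2, mantissa 8191) decodes to 8,191 seconds, a little over 2 hours 16 minutes, after which each step is worth 8 seconds or more.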

3 Mandatory Access Control, no relation to “Mac” as an abbreviation for “Macintosh”. Historically referring to security models patterned after military document classification practices, MAC is nowadays a generic term for any policy-based security model, distinct from and orthogonal to permission-based security models (also known as DAC, or Discretionary Access Control). macOS inherited its modular MAC framework from the TrustedBSD project and uses it with great gusto (I count no fewer than seven policy modules on my macOS 11 machine). The Linux Security Modules framework is the Linux equivalent. Windows has limited MAC in the form of the capabilities system for UWP apps. The bitter irony is that, being limited to a small subset of applications, it’s a “mandatory” access control system that operates on an opt-in basis—to say nothing of its complete non-extensibility.

4 If daemons have to run under an anonymous user, why must it be root, which, while still bound by MAC policies, can bypass all ACLs, send kill signals to any process, invoke sysctls in write mode and in general do a lot of damage? A little known fact is that while system daemons run as root, they also do run inside extremely strict sandboxes based on the Seatbelt framework (internally based on, you guessed it, a MAC policy module, unimaginatively named Sandbox). In a sadly predictable twist, while extremely powerful and enabling incredibly granular access control, Seatbelt is almost completely undocumented (notice a pattern yet?). In a less predictable but sadder twist, it’s also been marked as deprecated with no replacement since OS X 10.8. Nevertheless, with all system daemons using it, plus third-party users of the caliber of Google Chrome and Mozilla Firefox, Seatbelt seems unlikely to disappear any time soon.

5 Mach (no relation to “Macintosh”) was a research project to replace the BSD kernel with a microkernel. The design proved impractical, and the most famous real-world implementation of Mach, NeXTSTEP (later “Darwin”, later “OS X”, later “macOS”), actually runs a Mach/BSD hybrid kernel with very little “micro”. Mach was an extremely influential design: Windows NT (later “Windows”) is a Mach clone too, except redesigned to work alongside a VMS-like kernel instead of a BSD one. Windows NT, too, flirted with a microkernel architecture, but it proved to be no more practical in the 90s than it had been in the 80s.

ABOUT THE AUTHOR

T. Student

Still learning.