Deploy NodeJs app - Part 3 - Optimizing Docker image

Dockerization is a powerful concept that makes the deployment process much easier. However, when not carefully implemented, a Dockerfile can produce a large image and a time-consuming build process.

In this article, we'll cover how to optimize a Docker image.

Note that this article is part of the Deploy NodeJs app series about deploying an application to the cloud.

We will optimize our Dockerfile in three steps:

  1. Use a lightweight base image
  2. Use a multi-stage Dockerfile
  3. Investigate image layers for further optimization

Initial Application

We'll start from an initial NestJs application. You can find it here, along with the complete solution.

Let's take a look at the initial Dockerfile.

FROM node:16
WORKDIR /usr/src/app
COPY package.json yarn.lock ./
RUN yarn
COPY . .
EXPOSE 3000
CMD [ "yarn", "start" ]

The built image should produce the following result:

$ docker build . -t nestjs-api
$ docker images
REPOSITORY   ...   IMAGE ID       SIZE
nestjs-api   ...   150a5278252f   1.43GB

It is surprising to get a 1.43 GB image for a NestJs starter app, isn't it?

1. Use a lightweight base image

Notice the first Dockerfile line:

FROM node:16

We have used a base image named node:16. It is a full Debian-based image, suitable for general use, and a solid default when you are not yet sure what to choose.

However, there are many more images to choose from. The smallest one is alpine, and that will be our choice.

Let's compare some other base Docker images. For the purposes of this comparison, each image was built from a Dockerfile containing only a single FROM layer with the given image name. The pure base image sizes look like this:

node:16-alpine    112 MB
node:16-slim      176 MB
node:16           907 MB
node:16-stretch   938 MB
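
If you want to reproduce this comparison yourself, a quick sketch: build a Dockerfile that contains only the FROM line (fed via stdin here; the tag name is arbitrary) and check the resulting size:

$ echo "FROM node:16-alpine" | docker build -t base-size-check -
$ docker images base-size-check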

As we see, alpine is the smallest one. Let's update the first layer of our Dockerfile:

FROM node:16-alpine

As a result, we should get this image size:

REPOSITORY   ...   IMAGE ID       SIZE
nestjs-api   ...   27a93957eb10   632MB

Not bad, not bad: we cut the image size by more than half. Can we cut more? Obviously!

2. Use multi-stage builds

The goal of this step is to get rid of all devDependencies in the final image. In production, we don't need any of them, but at build time, we do. So how do we satisfy both needs?

With multi-stage builds, you use multiple FROM statements in your Dockerfile. Each FROM instruction can use a different base, and each of them begins a new stage of the build. You can selectively copy artifacts from one stage to another, leaving behind everything you don’t want in the final image.

The idea of our optimization is simple:

  • first stage: build the application
  • second stage: copy the production build from the previous stage, without devDependencies, into the final image

Let's check it out:

FROM node:16-alpine AS build-stage
WORKDIR /usr/src/app
COPY package.json yarn.lock ./
RUN yarn
COPY . .
RUN yarn build

FROM node:16-alpine AS final-stage
WORKDIR /usr/src/app
COPY package.json yarn.lock ./
RUN yarn --production
COPY --from=build-stage /usr/src/app/dist ./dist
EXPOSE 3000
CMD ["node", "dist/main"]

Let's break it down:

FROM node:16-alpine AS build-stage

Each stage starts with a FROM keyword and creates a build stage that we can reuse later. Note the new keyword AS: we use it to give a specific stage a name, which we can later use as a reference. It's up to us how we name it.
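
As an aside, one handy consequence of naming stages is that you can build only up to a given stage with Docker's --target flag, which is useful for debugging a single stage in isolation (the tag below is just an example):

$ docker build --target build-stage -t nestjs-api:build .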

WORKDIR /usr/src/app
COPY package.json yarn.lock ./
RUN yarn
COPY . .

The following four lines are just the same as before.

RUN yarn build

Previously, we just ran CMD yarn start, which triggered a build behind the scenes and started the app. This time, we need to generate a build only.

Time for the second stage.

FROM node:16-alpine AS final-stage

We will use the same base image, but this time we will name it differently.

WORKDIR /usr/src/app
COPY package.json yarn.lock ./

Then we set the working directory and copy the package files, just like in the previous stage.

RUN yarn --production

And here comes our optimization. We don't want to install any devDependencies, so we pass the --production flag to yarn.
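
As an aside, if your project uses npm instead of yarn, the equivalent step would be roughly the following (the --omit=dev flag is for npm 8+; older versions use --production):

RUN npm ci --omit=dev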

COPY --from=build-stage /usr/src/app/dist ./dist

Here we use the output from the previous stage. As you can notice, we have used the --from parameter. It indicates which stage we copy from.

EXPOSE 3000
CMD ["node", "dist/main"]

We expose the port like before, and because this is the last stage, we run the app directly from the dist folder containing our build.

Let's see the result:

REPOSITORY   ...   IMAGE ID       SIZE
nestjs-api   ...   de9a036cc871   392MB
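
Before chasing further savings, it is worth a quick sanity check that the slimmer image still runs. A sketch, assuming the default NestJs port matching the EXPOSE above:

$ docker run --rm -p 3000:3000 nestjs-api
$ curl http://localhost:3000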

It seems we got rid of a couple of hundred megabytes. But compared to the pure alpine base image (112 MB), the app build still takes about 280 MB. Can we do something about this? Sure.

3. Investigate image layers

Docker history

We can use the docker history command to check the sizes of specific layers.

$ docker history nestjs-api
IMAGE          CREATED BY                            ...   SIZE
15d3d9fbf9eb   CMD ["node" "dist/main"]              ...   0B
<missing>      EXPOSE map[3000/tcp:{}]               ...   0B
<missing>      COPY /usr/src/app/dist ./dist         ...   60.4kB
<missing>      RUN /bin/sh -c yarn --production...   ...   279MB
<missing>      COPY package.json yarn.lock ./        ...   256kB
<missing>      WORKDIR /usr/src/app                  ...   0B
<missing>      /bin/sh -c #(nop) CMD ["node"]        ...   0B
<missing>      /bin/sh -c #(nop) ENTRYPOINT ["...    ...   0B
<missing>      /bin/sh -c #(nop) COPY file:2387...   ...   116B
<missing>      /bin/sh -c apk add --no-cache --...   ...   7.77MB
<missing>      /bin/sh -c #(nop) ENV YARN_VERS...    ...   0B
<missing>      /bin/sh -c addgroup -g 1000 node...   ...   98.5MB
<missing>      /bin/sh -c #(nop) ENV NODE_VERS...    ...   0B
<missing>      /bin/sh -c #(nop) CMD ["/bin/sh...    ...   0B
<missing>      /bin/sh -c #(nop) ADD file:8ec69...   ...   5.61MB

Notice the size of this layer:

<missing> RUN /bin/sh -c yarn --production... 279MB

We would expect the production dependencies to take far less space than that.

Time to deep dive.

Dive tool

Dive is a tool for exploring a Docker image and its layer contents, and for discovering ways to shrink the size of your Docker image.

Install Dive:

brew install dive
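
If you are not on macOS, dive also ships as a Docker image, as described in its README; it needs access to the Docker socket:

$ docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock wagoodman/dive:latest nestjs-api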

Explore the Docker image with dive:

dive nestjs-api

As a result, you should see the following:

[Screenshot: inspecting the Docker image with dive]

This tool allows us to analyze the file structure in depth. On the left-hand side, we have the Docker layers and the size they occupy on disk. We can select a layer with the arrow keys.

On the right-hand side, we can see details of the file structure that the selected layer affected.

Finally, we discover that yarn created a cache while installing dependencies.
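
You can confirm where that cache lives without opening dive again; yarn itself prints its cache location (the exact path inside the image may differ between yarn versions):

$ docker run --rm nestjs-api yarn cache dir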

Rearrange stages

To get rid of the yarn cache, we can rearrange the Dockerfile into this:

FROM node:16-alpine AS initial-stage
WORKDIR /usr/src/app
COPY package.json yarn.lock ./

FROM initial-stage AS build-stage
RUN yarn
COPY . .
RUN yarn build

FROM initial-stage AS production-stage
RUN yarn --production

FROM initial-stage AS final-stage
COPY --from=production-stage /usr/src/app/node_modules ./node_modules
COPY --from=build-stage /usr/src/app/dist ./dist
EXPOSE 3000
CMD ["node", "dist/main"]

In the file above, we now have four stages. Let's take a closer look at each of them.

Stage 1. initial

FROM node:16-alpine AS initial-stage
WORKDIR /usr/src/app
COPY package.json yarn.lock ./

In the first stage, we create the initial-stage build. We will use it as the base for all three following stages.

Stage 2. build

FROM initial-stage AS build-stage
RUN yarn
COPY . .
RUN yarn build

In the second one, build-stage, we install all dependencies and build our app. Notice that we are reusing the initial-stage build from the previous stage.

Stage 3. production

FROM initial-stage AS production-stage
RUN yarn --production

In the third one, we only need to install the production dependencies. Yarn will also create its internal cache here.

Stage 4. final

FROM initial-stage AS final-stage
COPY --from=production-stage /usr/src/app/node_modules ./node_modules
COPY --from=build-stage /usr/src/app/dist ./dist
EXPOSE 3000
CMD ["node", "dist/main"]

The final-stage is also based on initial-stage. In this stage, we do nothing heavy, only two things:

  • copy node_modules from production-stage
  • copy build files from build-stage

Notice that we don't copy the yarn cache file, so this way we get rid of it.

Let's see the results:

REPOSITORY   ...   IMAGE ID       SIZE
nestjs-api   ...   61de4c904e57   123MB
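
For completeness: a common alternative to rearranging stages is to clean the cache in the same RUN layer that creates it, so it never ends up in any layer at all. A sketch of that variant (not the approach used above):

RUN yarn --production && yarn cache clean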

Sum up

Finally, we have decreased the initial Docker image from 1.43 GB to 123 MB in three steps.

Steps                        Image size   Decrease by
initial image                1.43 GB      -
1. use alpine base image     632 MB       798 MB (55 %)
2. remove devDependencies    392 MB       240 MB (17 %)
3. remove yarn cache         123 MB       269 MB (19 %)
Total                                     1.3 GB (91 %)

So in total, we have decreased the initial image by 1.3 GB (91%).

Quite an impressive result, isn't it?

Thanks for reading.
