Install Klipper on your old 3D-Printer and still keep Marlin

This is the final post in a series of 3 blog posts describing how I make Marlin and Klipper coexist on my 3D-printers.

It seems many printers are moving to Klipper these days, and mine probably should too. But I am very satisfied with the Marlin software that controls my printer, and as of 2023 you had to install new firmware on your microcontroller to run Klipper. Well, now it's 2024 and I want it all, all the time.

With Marlin the full program is flashed onto the CPU, meaning that if you want to implement new features you need to install new firmware on the printer. Klipper, on the other hand, has the high-level part of the program running under Linux, so new features can be implemented by editing some config files or adding some Python code. Developing new features is therefore much easier with Klipper than with Marlin. One small example that comes to mind is adaptive bed leveling, where you only map the Z-position of the part of the print bed that will hold a print.

I like my 3D printers as they are, especially the workflow where I put a print file on an SD card, put it in the printer, and print it. But I am still curious whether Klipper will make a difference for me, so I would like to try it out.

This was my incentive for implementing Dual-Boot capability on the ATmega2560, which I have described in my 2 previous blog posts.

The primary application is Marlin. It could be the firmware currently controlling your 3D-printer; it needs no modification and might even be the original firmware.

The secondary application is Klipper. It needs to be relocated to free address space after the Marlin firmware, and the Klipper firmware needs to be modified to run there. This is the real topic of this blog post.

Compile Klipper so that it will run in the upper address space and redirect reads of data/code in Flash there

In the previous blog post, we got Klipper running as a secondary firmware, but we had to shrink Marlin so that it all would fit in the lower 128 kbyte of Flash. The dualtool.sh from the previous post gives us this info:

 dualtool klipper/out/klipper.elf
# This Device is runing DualBoot based on optiboot 8.3
# klipper/out/klipper.elf do not have  trampolines
# NAME[1]=klipper BASE[1]=0x17000 SIZE[1]=32244
 0x00000 - 0x17000 1st flashed with marlin DEFAULT
 0x17000 - 0x3fb00 2nd ../build.klipper/ramps_2nd/klipper.elf 32244 bytes 
 0x3fb00 - 0x3fc00 irq vector table... 256 bytes
 0x3fc00 - 0x40000 DualBoot bootloader 1024 bytes
# Keep primary firmware, not erasing
# Replace secondary firmware,
no trampolines in secondary firmware
 DUAL_BASE max 0x37d0c -> 0x37d00

This time Klipper will be relocated higher to give Marlin more space. The base could be as high as 0x37d00; we will choose 0x34000. The relocation is handled during the linking stage by adding these arguments to the final call to avr-gcc:

EXTRA_LDFLAGS="-mrelax -fno-jump-tables -Wl,--section-start=.text=0x34000"
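
The numbers in the dualtool output can be cross-checked with a little arithmetic. The sketch below is a host-side illustration, not part of dualtool.sh: the function names are hypothetical, and it assumes the base is rounded down to the 256-byte flash page of the ATmega2560, which matches the "0x37d0c -> 0x37d00" line in the output.

```c
#include <stdint.h>

/* Flash layout constants taken from the dualtool output above. */
#define VECTOR_TABLE_START 0x3fb00UL  /* relocated irq vector table begins here */
#define FLASH_PAGE_SIZE    256UL      /* ATmega2560 flash page size in bytes */

/* Highest page-aligned base at which a secondary firmware of `size`
 * bytes still ends below the relocated vector table (assumed rule). */
uint32_t max_dual_base(uint32_t size)
{
    uint32_t raw = (uint32_t)(VECTOR_TABLE_START - size);  /* 0x37d0c for 32244 bytes */
    return (uint32_t)(raw & ~(FLASH_PAGE_SIZE - 1UL));     /* round down to a page */
}

/* Bytes available to the secondary firmware when it starts at `base`. */
uint32_t secondary_room(uint32_t base)
{
    return (uint32_t)(VECTOR_TABLE_START - base);
}
```

With the 32244-byte Klipper image, max_dual_base(32244) gives 0x37d00, and a base of 0x34000 leaves 0xbb00 (47872) bytes for the secondary firmware, so there is room for Klipper to grow.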

Now Klipper will be able to run in the newly assigned address space, but all its read-only data/strings are stored in Flash and accessed via 16-bit pointers. Those pointers cannot reach the upper address space, so Klipper will just get hold of some random bytes from the Marlin code instead of data from its own address space. That was the reason we had to have both Marlin and Klipper fit in the lower 128k Flash in the previous blog post.
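
To make the problem concrete, here is a small host-side sketch of what happens to a flash address when only its low 16 bits survive. The function names and the example address 0x34123 are made up for illustration; the region bounds come from the layout chosen above.

```c
#include <stdint.h>

/* Address that a plain LPM read really hits: only the low 16 bits of
 * the flash address fit in the pointer, the upper bits are dropped. */
uint32_t lpm_effective_address(uint32_t flash_addr)
{
    return flash_addr & 0xFFFFUL;
}

/* Returns 1 when `addr` lies inside the half-open region [base, end). */
int in_region(uint32_t addr, uint32_t base, uint32_t end)
{
    return addr >= base && addr < end;
}
```

A Klipper constant linked at the hypothetical address 0x34123 would be read from 0x4123, which lies in Marlin's region (0x00000 to 0x34000) and not in Klipper's (0x34000 to 0x3fb00).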

To find out where Klipper does these calls the source code is scanned for calls to

Luckily for us, the code is well written: there are only two places to understand and modify, and the fix is trivial too.

READP(variable) macro

Except for one call to memcpy_P(), all access to read-only variables goes through this macro:

  #define READP(VAR) ({                                                   \
    _Pragma("GCC diagnostic push");                                     \
    _Pragma("GCC diagnostic ignored \"-Wint-to-pointer-cast\"");        \
    typeof(VAR) __val =                                                 \
        __builtin_choose_expr(sizeof(VAR) == 1,                         \
            (typeof(VAR))pgm_read_byte(&(VAR)),                         \
        __builtin_choose_expr(sizeof(VAR) == 2,                         \
            (typeof(VAR))pgm_read_word(&(VAR)),                         \
        __builtin_choose_expr(sizeof(VAR) == 4,                         \
            (typeof(VAR))pgm_read_dword(&(VAR)),                        \
        __force_link_error__unknown_type)));                            \
    _Pragma("GCC diagnostic pop");                                      \
    __val;                                                              \
    })

This makes the Klipper source code easier to maintain, since there is only one macro, READP(), which does the heavy lifting and has to be adapted to the underlying MCU. This is also very good for us, since we just need to replace it with a version that gets hold of the flash content from the 16-bit address space (the 64 kbyte segment) we are operating in:

  #define READP(VAR) ({                                                   \
    _Pragma("GCC diagnostic push");                                     \
    _Pragma("GCC diagnostic ignored \"-Wint-to-pointer-cast\"");        \
    typeof(VAR) __val =                                                 \
        __builtin_choose_expr(sizeof(VAR) == 1,                         \
            (typeof(VAR))pgm_read_byte_here(&(VAR)),                    \
        __builtin_choose_expr(sizeof(VAR) == 2,                         \
            (typeof(VAR))pgm_read_word_here(&(VAR)),                    \
        __builtin_choose_expr(sizeof(VAR) == 4,                         \
            (typeof(VAR))pgm_read_dword_here(&(VAR)),                   \
        __force_link_error__unknown_type)));                            \
    _Pragma("GCC diagnostic pop");                                      \
    __val;                                                              \
    })

Now we just have to implement the pgm_read_????_here() functions. The pgm_read_????() functions are defined in avr/pgmspace.h and end up using the LPM instruction to get hold of the flash content. We just need to change that to the ELPM instruction, which combines RAMPZ with the Z pointer into a 24-bit address and so can access all the Flash on the device:

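/* ELPM variants of the __LPM_*classic__ macros from avr/pgmspace.h.
   Z holds the low 16 bits of the address; RAMPZ supplies the upper bits. */
#define __hereLPM_classic__(addr)   \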
(__extension__({                \
    uint16_t __addr16 = (uint16_t)(addr); \
    uint8_t __result;           \
    __asm__ __volatile__        \
    (                           \
        "elpm" "\n\t"            \
        "mov %0, r0" "\n\t"     \
        : "=r" (__result)       \
        : "z" (__addr16)        \
        : "r0"                  \
    );                          \
    __result;                   \
}))

#define __hereLPM_word_classic__(addr)          \
(__extension__({                            \
    uint16_t __addr16 = (uint16_t)(addr);   \
    uint16_t __result;                      \
    __asm__ __volatile__                    \
    (                                       \
        "elpm"           "\n\t"              \
        "mov %A0, r0"   "\n\t"              \
        "adiw r30, 1"   "\n\t"              \
        "elpm"           "\n\t"              \
        "mov %B0, r0"   "\n\t"              \
        : "=r" (__result), "=z" (__addr16)  \
        : "1" (__addr16)                    \
        : "r0"                              \
    );                                      \
    __result;                               \
}))

#define __hereLPM_dword_classic__(addr)         \
(__extension__({                            \
    uint16_t __addr16 = (uint16_t)(addr);   \
    uint32_t __result;                      \
    __asm__ __volatile__                    \
    (                                       \
        "elpm"           "\n\t"              \
        "mov %A0, r0"   "\n\t"              \
        "adiw r30, 1"   "\n\t"              \
        "elpm"           "\n\t"              \
        "mov %B0, r0"   "\n\t"              \
        "adiw r30, 1"   "\n\t"              \
        "elpm"           "\n\t"              \
        "mov %C0, r0"   "\n\t"              \
        "adiw r30, 1"   "\n\t"              \
        "elpm"           "\n\t"              \
        "mov %D0, r0"   "\n\t"              \
        : "=r" (__result), "=z" (__addr16)  \
        : "1" (__addr16)                    \
        : "r0"                              \
    );                                      \
    __result;                               \
}))


#define pgm_read_byte_here(address_short) __hereLPM_classic__((uint16_t)(address_short))
#define pgm_read_word_here(address_short) __hereLPM_word_classic__((uint16_t)(address_short))
#define pgm_read_dword_here(address_short) __hereLPM_dword_classic__((uint16_t)(address_short))
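
The inline assembly cannot be run on a PC, so the following host-side model may help to see what the macros compute. It is only a sketch: struct fake_flash and the model_* functions are hypothetical stand-ins, with a byte array playing the role of the flash and a struct field playing the role of RAMPZ.

```c
#include <stdint.h>

/* Host-side stand-ins for the flash and the RAMPZ register (hypothetical). */
struct fake_flash {
    const uint8_t *bytes;  /* contents of the whole flash image */
    uint8_t rampz;         /* models the RAMPZ register */
};

/* Model of one ELPM: fetch the byte at the 24-bit address RAMPZ:Z. */
uint8_t model_elpm(const struct fake_flash *f, uint16_t z)
{
    return f->bytes[((uint32_t)f->rampz << 16) | z];
}

/* Model of the word read: two ELPMs, low byte first. */
uint16_t model_read_word(const struct fake_flash *f, uint16_t z)
{
    uint16_t lo = model_elpm(f, z);
    z = (uint16_t)(z + 1);           /* like "adiw r30, 1": RAMPZ is untouched */
    uint16_t hi = model_elpm(f, z);
    return (uint16_t)(lo | (uint16_t)(hi << 8));
}

/* Model of the dword read: four ELPMs assembled little-endian. */
uint32_t model_read_dword(const struct fake_flash *f, uint16_t z)
{
    uint32_t result = 0;
    for (int i = 0; i < 4; i++) {
        result |= (uint32_t)model_elpm(f, z) << (8 * i);
        z = (uint16_t)(z + 1);       /* 16-bit wrap, no carry into RAMPZ */
    }
    return result;
}
```

One thing the model makes visible: the increment only touches the 16-bit Z pointer, so a multi-byte value must not straddle a 64 kbyte boundary. With Klipper placed between 0x34000 and 0x3fb00 everything sits in one segment, so this should not be a concern here.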

memcpy_P()

There is one call to memcpy_P(). Instead of reimplementing a version that works on the current 16-bit address space, it is easier to just replace memcpy_P() with a for-loop using the READP() macro from above. Here are the changes needed to do that:

diff --git a/src/command.c b/src/command.c
index 39c09458..ed4dc85b 100644
--- a/src/command.c
+++ b/src/command.c
@@ -156,7 +156,13 @@ command_encodef(uint8_t *buf, const struct command_encoder *ce, va_list args)
             *p++ = v;
             uint8_t *s = va_arg(args, uint8_t*);
             if (t == PT_progmem_buffer)
+#ifdef DUALBOOT_BASE
+               for (uint16_t i=0; i<v; i++) {
+                    p[i] = READP( s[i] );
+               }
+#else
                 memcpy_P(p, s, v);
+#endif
             else
                 memcpy(p, s, v);
             p += v;
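
The idea behind the patch, copying one byte at a time through a segment-aware reader instead of calling memcpy_P(), can be sketched on the host like this. The names flash_byte and copy_from_flash are hypothetical, and the flash is again modelled as a plain byte array indexed by RAMPZ and the 16-bit pointer.

```c
#include <stdint.h>

/* One-byte reader, modelled on the host as RAMPZ:addr16 indexing an image. */
uint8_t flash_byte(const uint8_t *image, uint8_t rampz, uint16_t addr16)
{
    return image[((uint32_t)rampz << 16) | addr16];
}

/* Byte-wise stand-in for memcpy_P(): every byte goes through flash_byte(),
 * just as every byte in the patch goes through READP(). */
void copy_from_flash(uint8_t *dst, const uint8_t *image, uint8_t rampz,
                     uint16_t src16, uint8_t count)
{
    for (uint16_t i = 0; i < count; i++)
        dst[i] = flash_byte(image, rampz, (uint16_t)(src16 + i));
}
```

A byte-wise loop is probably slower than a dedicated copy routine, but with a single call site that trade-off seems reasonable.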


Initialization of RAMPZ

RAMPZ has to be set before the first ELPM-based read. This is done by adding a few lines to the initialization of the program. Here is the diff:

diff --git a/src/avr/main.c b/src/avr/main.c
index 0523af41..988d2982 100644
--- a/src/avr/main.c
+++ b/src/avr/main.c
@@ -14,6 +14,16 @@
 
 DECL_CONSTANT_STR("MCU", CONFIG_MCU);
 
+#ifdef DUALBOOT_BASE
+static void
+__attribute__((section(".init3"),naked,used,no_instrument_function))
+init3_set_eind (void)
+{
+  __asm volatile ("ldi r24,pm_hh8(__vectors)\n\t"
+                  "out %i0,r24" :: "n" (&RAMPZ) : "r24","memory");
+  __asm volatile ("out %i0,r24" :: "n" (&EIND) : "r24","memory");      // not needed
+}
+#endif
 
 /****************************************************************
  * Dynamic memory

That's it. The source code is available at:

GitHub.com/StorePeter/DualFirmware_Marlin_Klipper

The README.md will guide you through the process.

Best Regards

StorePeter
