I just tried this using sdcc-4.2.0 and it didn't work. The calling convention has changed: the character is now passed in the A register, whereas in 2018 sdcc passed parameters on the stack. With the ld a, (0x03,sp) line commented out in the sources below, dprint works with sdcc from 2022.
In Debugging @ 5.3Mbit/sec (5333333 Baud) on Arduino and other Embedded systems I describe how I handle debug prints on ATmega processors like the ones used in Arduino boards and 3D printers. Basically, I let the embedded CPU transmit as fast as possible and let the rest of the world adjust to whatever speed that happens to be. It works great and is now my go-to solution for getting a project off the ground.
Lately I have been tinkering with boards that use the stm8s processors. Not because they have anything special to offer (except the price), but because there are many very cheap gadgets available from China that use this processor, like thermostats, voltmeters, programmable power supplies and relays. Have a look at github.com/TG9541/stm8ef. I see these boards as half-finished solutions to many of my projects, if only they had, for example, a way to be networked. They need to be HACKED.
The stm8s architecture is as simple as it gets. It has the same register set as the venerable old 6502 microprocessor from 1975, with some of the registers widened.
| 6502 | A 8bit | X 8bit | Y 8bit | SP 8bit | PC 16bit | status |
| stm8 | A 8bit | X 16bit | Y 16bit | SP 16bit | PC 24bit | status |
The stm8s is not supported by GCC, but the SDCC compiler has it covered, although only for C, not C++. I am more of a standard C guy anyway, so that is not a big thing for me. To get going you need to:
- install sdcc, I am using version 3.8 from sdcc.sourceforge.net,
- install stm8s header files github.com/the-cave/stm8s-header
- install stm8flash github.com/vdudouyt/stm8flash
- get hold of a ST-Link adapter
For a detailed guide on how to do this, please see TG9541 STM8S-Programming.
The cheapest stm8s103 is a 20-pin version with 8 kB flash and 1 kB RAM, at $3.28 for 10 pcs delivered. It is also the most common one and will be used here.
It was actually quite tricky to get consistent bit timing. The CPU uses a 32-bit bus internally, the instructions are between 1 and 4 bytes long, and the CPU uses a 3-stage pipeline to increase speed. Hence it is important to get the instructions properly aligned.
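To make the timing target concrete, here is a small sketch of the arithmetic behind the 5333333 Baud figure. It assumes a 16 MHz CPU clock (as configured in clock_init() further down) and that each bit costs three single-cycle instructions (srl, nop, bccm in the printc() routine shown later). The function and macro names are made up for this illustration.

```c
#include <stdint.h>

/* Assumptions for this sketch: 16 MHz CPU clock and three one-cycle
 * instructions per transmitted bit. */
#define F_CPU_HZ        16000000UL
#define CYCLES_PER_BIT  3UL
#define BITS_PER_FRAME  10UL   /* start bit + 8 data bits + stop bit */

/* Bit rate in bit/s, rounded down. */
uint32_t soft_uart_baud(void)
{
    return (uint32_t)(F_CPU_HZ / CYCLES_PER_BIT);
}

/* Duration of one 10-bit frame in nanoseconds. */
uint32_t soft_uart_frame_ns(void)
{
    return (uint32_t)(BITS_PER_FRAME * CYCLES_PER_BIT * 1000000000ULL / F_CPU_HZ);
}
```

Under these assumptions one frame takes just under 2 usec, which is roughly how long interrupts stay disabled per character.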
The functionality of dprint() implemented here is the same as I described in Debugging @ 5.3Mbit/sec (5333333 Baud) on Arduino and other Embedded systems, so it might be a good idea to read that first if you haven't already. The main differences are a different instruction set, a different compiler, and a slightly different language.
| Device | atmega2560 | atmega328p | stm8s103f3 |
| Package | 64pin | 28pin | 20pin |
| Flash | 256kb | 32kb | 8kb |
| RAM | 8kb | 2kb | 1kb |
| EEPROM | 4kb | 1kb | 640 |
| Language | C++ | C++ | C |
| Compiler | gcc-5.4 | gcc-5.4 | sdcc-3.8.0 |
Let us have a look at the source code, as usual an init-routine and a printc() routine written in assembler.
#define TX_PORT GPIOC
#define TX_BIT 5
GLOBAL char debug_level;
PUBLIC void print_init(void)
{
TX_PORT->DDR &= ~(1 << TX_BIT); // input to test for pullup
#ifdef DEBUG_LEVEL
debug_level = DEBUG_LEVEL;
#else
if (TX_PORT->IDR & (1 << TX_BIT)) { // if pulled up enable debug
debug_level = 1;
} else {
debug_level = 0; // to disable prints, ground TX_PIN with a resistor
}
#endif
TX_PORT->DDR |= 1 << TX_BIT; // output
TX_PORT->ODR |= 1 << TX_BIT; // output high
TX_PORT->CR1 |= 1 << TX_BIT; // push-pull
TX_PORT->CR2 |= 1 << TX_BIT; // fast mode
}
// interrupts are disabled for 11/5333333 seconds = 2 usec
// remove sim/rim if that is critical, the only implication might be garbled prints
PUBLIC void printc(char c)
{
__asm
sim
; ld a, (0x03,sp) ; needed with sdcc 3.8 (c on the stack); sdcc 4.2.0 passes c in A
jp printc_32bit_aligned
.bndry 4 ; align to 32 bit (4 bytes)
printc_32bit_aligned:
bres 0x500a, #5 ; start bit
srl a ; CC.C=bit0
nop
bccm 0x500a, #5 ; bit0
srl a ; CC.C=bit1
nop
bccm 0x500a, #5 ; bit1
srl a
nop
bccm 0x500a, #5 ; bit2
srl a
nop
bccm 0x500a, #5 ; bit3
srl a
nop
bccm 0x500a, #5 ; bit4
srl a
nop
bccm 0x500a, #5 ; bit5
srl a
nop
bccm 0x500a, #5 ; bit6
srl a
nop
bccm 0x500a, #5 ; bit7
srl a
nop
bset 0x500a, #5 ; stopbit
rim
__endasm;
}
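For readers who do not speak STM8 assembler, the following host-side model shows which line levels the routine above produces for one character. It is only a sketch for a PC, not code for the microcontroller, and model_frame() is a hypothetical name.

```c
#include <stdint.h>

/* Host-side model (not the STM8 code): fills levels[0..9] with the line
 * level for each bit time of one frame, in transmission order. */
void model_frame(uint8_t c, uint8_t levels[10])
{
    levels[0] = 0;                    /* start bit: bres drives the pin low */
    for (int i = 0; i < 8; i++) {
        levels[1 + i] = c & 1;        /* srl a shifts bit 0 into carry, bccm copies carry to the pin */
        c >>= 1;                      /* next data bit, least significant first */
    }
    levels[9] = 1;                    /* stop bit: bset leaves the line idle high */
}
```

The data bits go out least significant bit first, which is what a standard UART receiver expects.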
If you wonder what PUBLIC and GLOBAL mean in the above code, you will have to stick around until the end of this article.
Since we are not using C++ there is no operator overloading, so we need differently named routines for different data types, but otherwise it is business as usual.
PUBLIC void prints(char *s)
{
while (*s) {
printc(*s++);
}
}
PUBLIC char nibble2ascii(uint8_t d)
{
d &= 0xf;
if (d < 10)
d += '0';
else
d += 'a' - 10;
return (d);
}
PUBLIC void printb(uint8_t b)
{
printc(nibble2ascii(b>>4));
printc(nibble2ascii(b));
}
PUBLIC void printw(uint16_t w)
{
dprintb((uint8_t)(w>>8));
dprintb((uint8_t)w);
}
PUBLIC void printl(uint32_t l)
{
dprintw((uint16_t)(l>>16)); // upper 16 bits as four hex digits
dprintw((uint16_t)l); // lower 16 bits as four hex digits
}
PUBLIC void printu(uint16_t u)
{
uint8_t c;
c = '0' + (uint8_t)(u % 10);
u /= 10;
if (u) {
printu(u);
}
printc(c);
}
PUBLIC void wheel(void)
{
static uint8_t n;
printc('\b');
n++;
if (n>3) n=0;
printc("-/|\\"[n]);
}
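The formatting helpers can be tried on a PC by swapping the assembler printc() for a stub that appends to a buffer. The sketch below does that for printb() and printu(). The capture buffer and the hex_text()/dec_text() helpers are assumptions made for this illustration and are not part of dprint.c.

```c
#include <stdint.h>

/* Host-side capture buffer, a stand-in for the serial line. */
static char captured[16];
static uint8_t captured_len;

/* Stub replacing the assembler printc(): append the character to the buffer. */
void printc(char c)
{
    if (captured_len < sizeof captured - 1) {
        captured[captured_len++] = c;
        captured[captured_len] = '\0';
    }
}

char nibble2ascii(uint8_t d)
{
    d &= 0xf;
    return (char)(d < 10 ? d + '0' : d + 'a' - 10);
}

void printb(uint8_t b)
{
    printc(nibble2ascii(b >> 4));   /* high nibble first */
    printc(nibble2ascii(b));
}

void printu(uint16_t u)
{
    uint8_t c = (uint8_t)('0' + u % 10);
    u /= 10;
    if (u) {
        printu(u);                  /* more significant digits are emitted first */
    }
    printc((char)c);
}

/* Hypothetical helpers: clear the buffer, print one value, return the text. */
const char *hex_text(uint8_t b)
{
    captured_len = 0;
    captured[0] = '\0';
    printb(b);
    return captured;
}

const char *dec_text(uint16_t u)
{
    captured_len = 0;
    captured[0] = '\0';
    printu(u);
    return captured;
}
```

The recursion in printu() goes at most five levels deep for a 16-bit value, so the stack cost stays small even with 1 kB of RAM.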
Likewise, the header file that does the conditional dprints is a little different too.
# define DTRACE(args...) if (debug_level>0) args
# define DDTRACE(args...) if (debug_level>1) args
# define DDDTRACE(args...) if (debug_level>2) args
# define dprint_init() print_init()
# define dwheel() DTRACE(wheel())
#define dprintc(c) DTRACE(printc(c))
#define ddprintc(c) DDTRACE(printc(c))
#define dddprintc(c) DDDTRACE(printc(c))
#define dprints(s) DTRACE(prints(s))
#define ddprints(s) DDTRACE(prints(s))
#define dddprints(s) DDDTRACE(prints(s))
#define dprintb(b) DTRACE(printb(b))
#define ddprintb(b) DDTRACE(printb(b))
#define dddprintb(b) DDDTRACE(printb(b))
#define dprintw(w) DTRACE(printw(w))
#define ddprintw(w) DDTRACE(printw(w))
#define dddprintw(w) DDDTRACE(printw(w))
#define dprintl(l) DTRACE(printl(l))
#define ddprintl(l) DDTRACE(printl(l))
#define dddprintl(l) DDDTRACE(printl(l))
#define dprintu(u) DTRACE(printu(u))
#define ddprintu(u) DDTRACE(printu(u))
#define dddprintu(u) DDDTRACE(printu(u))
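One thing to keep in mind with macros of this shape: they expand to a bare if, so a dprint call used as the unbraced body of an outer if that is followed by an else will capture that else. The sketch below demonstrates the effect on a PC and shows a do/while(0) variant as one possible alternative. DTRACE_SAFE is a hypothetical name, not something from dprint.h, and the macros are spelled with the standard __VA_ARGS__ instead of args... so the example compiles with any C99 compiler.

```c
#include <stdint.h>

/* The nested if/else below is deliberate; silence the compiler's
 * dangling-else warning for this demonstration. */
#if defined(__GNUC__)
#pragma GCC diagnostic ignored "-Wdangling-else"
#endif

char debug_level = 0;
static void printc(char c) { (void)c; }   /* stub, output is not needed here */

/* Same shape as the article's macro. */
#define DTRACE(...)       if (debug_level > 0) __VA_ARGS__
/* Hypothetical alternative: the whole expansion is a single statement. */
#define DTRACE_SAFE(...)  do { if (debug_level > 0) { __VA_ARGS__; } } while (0)

/* Returns 1 if the else branch was executed. */
int else_ran_original(int cond)
{
    int ran = 0;
    if (cond)
        DTRACE(printc('a'));
    else                     /* binds to the if hidden inside DTRACE */
        ran = 1;
    return ran;
}

int else_ran_safe(int cond)
{
    int ran = 0;
    if (cond)
        DTRACE_SAFE(printc('a'));
    else                     /* binds to if (cond), as the indentation suggests */
        ran = 1;
    return ran;
}
```

The code in this article only uses the dprint macros as standalone statements or inside braces, so it is not affected; the variant is just an option if you want to use them more freely.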
Here is an example of how dprint could be used:
/*
* main - test of soft uart on PC5 @ 5.33 Mbit/sec
* Copyright (C) peter@lorenzen.us 2018 license https://en.wikipedia.org/wiki/Beerware
*/
#include <stm8s.h>
#include <stm8s_tim4.h>
#include "dprint.h"
void clock_init()
{
// RM0016 page 89
// system clock to run at 16MHz using the internal oscillator.
CLK->ICKR = 1; // High-speed internal RC on
CLK->ECKR = 0; // external clock off
CLK->SWR = 0xe1; // HSI as the clock source.
CLK->SWCR = 0x02; // Enable clock switch
CLK->CKDIVR = 0; // f_HSI=F_HSI_RC/1 f_CPU=f_MASTER
CLK->PCKENR1 = 0xff; // Enable all peripheral clocks.
CLK->PCKENR2 = 0xff; // Ditto.
}
uint16_t msec=0; // 65536 msec = 1 minut 5 sec
uint16_t minut=0; // 65536 minut = 45 days.
uint16_t ms4nextminut=60000;
void timer4_init()
{
TIM4->PSCR = TIM4_PRESCALER_64; // f_MASTER/64 = 250kHz, 4usec
TIM4->ARR = 249; // counts 0..249 = 250 ticks: 250kHz / 250 = 1kHz, 1msec
TIM4->IER = TIM4_IER_UIE; // Enable interrupt on update event
TIM4->CR1 = TIM4_CR1_URS + TIM4_CR1_CEN;// Enable timer
}
void TIM4_UPD_handle() __interrupt (23)
{
TIM4->SR1 &= ~ TIM4_SR1_UIF; // clear Update Interrupt Flag
msec++;
if (msec == ms4nextminut) {
minut++;
ms4nextminut += 60000;
}
}
#define msec_after(offset) ((uint16_t)(msec-offset))
#define minut_after(offset) ((uint16_t)(minut-offset))
void msec_sleep(uint16_t ms_delay)
{
uint16_t diff;
uint16_t timestamp = msec;
uint8_t msb = timestamp>>8;
ddprintc('\n');
ddprintw(timestamp);
while (1) {
uint16_t ms = msec;
if (msb != ms>>8) {
msb=ms>>8;
ddprintc(',');
ddprintb(msb);
}
if (msec_after(timestamp) >= ms_delay) {
ddprintc(' ');
return;
}
__asm__("wfi"); // Wait for Interrupt
}
}
uint16_t prev_minut=1234;
int main()
{
char c;
uint16_t m;
clock_init();
dprint_init();
timer4_init();
dprints("\nHello World\n");
dprints(NAME);
dprints(" revision: ");
dprintu(REVISION);
dprintc('\n');
for (int i=0;i<10;i++) {
for (c='0'; c<='Z'; c++) {
dprintc(c);
}
dprintc('\n');
}
while (1) {
m = minut;
if (prev_minut != m) {
prev_minut = m;
dprints("\n Minut:");
dprintu(m);
dprints("=0x");
dprintw(m);
dprints(" ");
}
dwheel();
msec_sleep(1024); // 1024 = 0x400 1.024 seconds
}
}
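The msec_after() macro relies on unsigned 16-bit subtraction wrapping around, so the delay logic keeps working when msec overflows from 65535 to 0. Here is the same expression as a function with explicit arguments, which makes it easy to check on a PC. The name elapsed_ms() is made up for this sketch.

```c
#include <stdint.h>

/* Same arithmetic as msec_after(offset), with the current time passed in.
 * The cast matters on a host where int is wider than 16 bits. */
uint16_t elapsed_ms(uint16_t now, uint16_t start)
{
    return (uint16_t)(now - start);
}
```

For example, a start of 65530 and a current value of 5 gives 11 msec elapsed. The scheme only works for delays shorter than 65536 msec, and the ms4nextminut += 60000 update in the interrupt handler depends on the same wraparound.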
Makefile
I always use Makefiles to compile my code. They have become more and more sophisticated over time, so maybe you can get some inspiration from this Makefile.
# Copyright (C) peter@lorenzen.us 2018 license: https://en.wikipedia.org/wiki/Beerware
DEBUG_LEVEL = 2
#NAME = dprint_disabled
NAME = dprint_enabled
DEVICE = stm8s103f3
EXT = .ihx
OEXT = .rel
OBJDIR = obj
INCDIR = include
CFLAGS += -l stm8 -mstm8 -I$(INCDIR) $(EXTRA)
LDFLAGS = --out-fmt-ihx -mstm8
IMAGE = images/$(NAME)$(EXT)
SRCS = main.c dprint.c revision.c
OBJS = $(patsubst %,$(OBJDIR)/%$(OEXT), $(basename $(SRCS)))
INCLUDES = $(patsubst %,$(INCDIR)/%.h, $(basename $(SRCS)))
REVISION=$(shell git branch | awk 'BEGIN {n=0} /^\* r/ { gsub("^* r","");n=$$0} END {print n}')
NEXT_REVISION=$(shell echo $$((1+$(REVISION))))
CFLAGS += -DREVISION=$(REVISION)
CFLAGS += -DNAME="\"$(NAME)\""
FILES += Makefile .gitignore $(SRCS)
IMAGE = images/$(NAME)-r$(REVISION)$(EXT)
$(warning IMAGE=/$(IMAGE))
ifneq ($(DEBUG_LEVEL),)
CFLAGS += -DDEBUG_LEVEL=$(DEBUG_LEVEL)
endif
ifeq ($(DEVICE),stm8s103f3)
CFLAGS += -DSTM8S103
endif
ifeq ($(NAME),dprint_disabled)
CFLAGS += -DDPRINT_DISABLED
endif
all: $(IMAGE) size
include tools-stm8s/tools.mk # provides CC LINK STM8FLASH C2H SIZE
$(OBJDIR)/main$(OEXT): $(INCDIR)/dprint.h
size:
	$(SIZE) images/*$(EXT)
$(IMAGE): $(OBJS) images
	$(LINK) $(OBJS) $(LDFLAGS) -o $@
$(OBJDIR)/%$(OEXT): %.c $(INCDIR)/%.h $(OBJDIR) tools
	$(CC) $(CFLAGS) $< -c -o $@
$(INCDIR)/%.h: %.c $(INCDIR) Makefile
	$(C2H) $< > $@
$(OBJDIR) $(INCDIR) images:
	mkdir -p $@
flash: $(IMAGE)
	stm8flash -c stlink -p $(DEVICE) -w $(IMAGE)
	$(MAKE) commit
commit:
	@echo is on REVISION=$(REVISION) new will be $(NEXT_REVISION)
	echo -n "hit ENTER to create branch r$(NEXT_REVISION): "; bash -c "read -t 2"
	git commit -a;
	git branch r$(NEXT_REVISION)
	git checkout r$(NEXT_REVISION)
revision.c: .git Makefile
	echo "const char revision[] = \"REV-$(REVISION)\";" > $@
tar: $(FILES) images
	tar cvJf images/$(NAME)-r$(REVISION).txz $(FILES)
com:
	@date
	picocom -q -l /dev/ttyUSB0 -b 5333000 --imap lfcrlf
	@date
clean:
	rm -rf $(OBJDIR) $(INCDIR) revision.c
If the tools used are not just standard tools, I tend to install them with a Makefile too. That way I have documented where I found them and how they were installed.
Here we have to get hold of some header files and get and compile stm8flash. The details are here:
# Copyright (C) peter@lorenzen.us 2018 license: https://en.wikipedia.org/wiki/Beerware
TOOLDIR = tools-stm8s
CC = sdcc
LINK = $(CC)
STM8S_INC = $(TOOLDIR)/stm8s-header/inc
CFLAGS += -I$(STM8S_INC)
ifeq ($(CC),)
$(error echo sdcc from https://sourceforge.net/projects/sdcc)
endif
STM8FLASH = $(TOOLDIR)/stm8flash/stm8flash
C2H = $(TOOLDIR)/extract-header.sh
SIZE = $(TOOLDIR)/sdcc-size.sh
FILES += $(TOOLDIR)/tools.mk $(C2H) $(SIZE)
tools: $(TOOLDIR) $(STM8FLASH) $(STM8S_INC)
$(TOOLDIR):
	mkdir -p $@
$(TOOLDIR)/stm8flash:
	git clone https://github.com/vdudouyt/stm8flash.git $@
$(STM8FLASH): $(TOOLDIR)/stm8flash
	make -C $(TOOLDIR)/stm8flash
$(STM8S_INC):
	git clone https://github.com/the-cave/stm8s-header.git $(TOOLDIR)/stm8s-header
I have always detested the include-file hell with function prototypes and extern declarations for global variables, so I generate header files on the fly, as if the .h were embedded in the .c file:
- Defines that are needed in other files go at the beginning of the .c file
- Variables that need to be accessed from elsewhere are marked with GLOBAL
- Functions that can be called from elsewhere are marked with PUBLIC
- It is often a good idea to mark local functions as static; the compiler likes it.
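As an illustration of these conventions, here is what a small module could look like. The module and its names (led_state, led_toggle, flip) are invented for this sketch. PUBLIC and GLOBAL are defined locally so the snippet compiles on its own; in the real build the two empty macros come from the generated header.

```c
#include <stdint.h>

/* In the real build these two empty macros come from the generated header. */
#define PUBLIC
#define GLOBAL

GLOBAL uint8_t led_state = 0;          /* header would get an extern declaration */

static uint8_t flip(uint8_t v)         /* local helper: never appears in the header */
{
    return (uint8_t)!v;
}

PUBLIC uint8_t led_toggle(void)        /* header would get a prototype */
{
    led_state = flip(led_state);
    return led_state;
}
```

Note that the markers have to start in the first column and the whole function signature has to sit on one line, because the header is extracted line by line.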
The shell script below, extract-header.sh, creates the header file.
#!/bin/bash
if [ $(uname) = Darwin ]; then
AWK=gawk
else
AWK=awk
fi
if grep "#ifdef HEADER_BELOW_UNTIL_HEADER_END_DO_NOT_CHANGE_THIS_LINE" $1 >/dev/null; then
$AWK 'BEGIN {
NAME=toupper(ARGV[1])
sub(".C$","_H",NAME)
}
/#ifdef HEADER_BELOW_UNTIL_HEADER_END_DO_NOT_CHANGE_THIS_LINE/ {
print "#ifndef " NAME
print "#define " NAME
print "// Auto generated from " ARGV[1] " - edits will be lost"
print "#define PUBLIC"
print "#define GLOBAL"
print "#include <stm8s.h> // taking care of uint8_t..."
header = 1;
next;
}
/#endif \/\/ HEADER_END_DO_NOT_CHANGE_THIS_LINE/ {
header = 0;
next;
}
header == 1 {
print $0;
next;
}
/^PUBLIC/ {
gsub("PUBLIC","")
gsub(").*$",");")
print $0
next;
}
/^GLOBAL/ {
gsub("GLOBAL","extern")
gsub(" *=.*$",";")
print $0
next;
}
END {
print "#endif // " NAME
}' $1
else
echo "WARNING: $1 has no .h section to extract" >&2
fi
I really like to see how the size of my code evolves, so I have created sdcc-size.sh, which is the equivalent of avr-size in the GNU world.
#!/bin/bash
# resemble output of gnu sizes
#
# text data bss dec hex filename
# 8622 14 456 9092 2384 ../firmware_vusb_mega8/bin/r800-atmega328p@16Mhz.elf
#
if [ $(uname) = Darwin ]; then
AWK=gawk
else
AWK=awk
fi
echo " text data bss dec hex filename"
for i in $*;do
map=`echo $i | sed s/.ihx$/.map/`
$AWK 'BEGIN {text=0; data=0; bss=0; }
/l_CODE/ {text += strtonum("0x" $1);}
/l_INITIALIZER/ {text += strtonum("0x" $1);}
/l_GSINIT/ {text += strtonum("0x" $1);}
/l_HOME/ {text += strtonum("0x" $1);}
/l_GSFINAL/ {text += strtonum("0x" $1);}
/l_DATA/ {bss += strtonum("0x" $1);}
/l_INITIALIZED/ {data += strtonum("0x" $1);}
END {
total = text + bss + data;
printf("%7d %7d %7d %7d %7x %s\n", text, data, bss, total, total, ARGV[1])
}' $map
done
I like to be able to keep an exact copy of the source code for each and every piece of firmware that exists in the devices around my house. Years ago I created a small tar archive of the source code whenever I flashed a device; these days I use git.
To get a revision numbering scheme, I create a git branch whenever I flash some permanent code. The Makefile can still generate a tar file of the current software revision, so a workflow might look like this:
- make # compile
- make flash # upload the code to the microcontroller
- make com # connect to the microcontroller via a serial UART @ 5.33 Mbaud
- git status # to see what has changed
- make tar # creates a full backup of all the files used
- make commit # save all changes in git and makes a new branch, ready for the next change
You can download a copy of the current source dprint_enabled-r7.txz, feel free to use it as you see fit – licensed as beerware
tar tvf images/dprint_enabled-r7.txz
-rw-rw-r-- peter/peter 2010 2018-12-07 16:16 Makefile
-rw-rw-r-- peter/peter 179 2018-12-05 21:23 .gitignore
-rw-rw-r-- peter/peter 2233 2018-12-07 14:10 main.c
-rw-rw-r-- peter/peter 3838 2018-12-07 14:07 dprint.c
-rw-rw-r-- peter/peter 33 2018-12-07 16:16 revision.c
-rw-rw-r-- peter/peter 748 2018-12-07 13:55 tools-stm8s/tools.mk
-rwxrwxr-x peter/peter 876 2018-12-07 10:36 tools-stm8s/extract-header.sh
-rwxrwxr-x peter/peter 852 2018-12-06 15:34 tools-stm8s/sdcc-size.sh
Although most of dprint.c has been presented above you can see the complete file below
#ifdef HEADER_BELOW_UNTIL_HEADER_END_DO_NOT_CHANGE_THIS_LINE
/*
* dprint - implements a soft uart on PC5 @ 5.33 Mbit/sec
* Copyright (C) peter@lorenzen.us 2018 license https://en.wikipedia.org/wiki/Beerware
*/
#ifdef DPRINT_DISABLED
# define DTRACE(args...)
# define DDTRACE(args...)
# define DDDTRACE(args...)
# define dprint_init()
# define dwheel()
#define dprintc(c)
#define ddprintc(c)
#define dddprintc(c)
#define dprints(s)
#define ddprints(s)
#define dddprints(s)
#define dprintb(b)
#define ddprintb(b)
#define dddprintb(b)
#define dprintw(w)
#define ddprintw(w)
#define dddprintw(w)
#define dprintl(l)
#define ddprintl(l)
#define dddprintl(l)
#define dprintu(u)
#define ddprintu(u)
#define dddprintu(u)
#else
# define DTRACE(args...) if (debug_level>0) args
# define DDTRACE(args...) if (debug_level>1) args
# define DDDTRACE(args...) if (debug_level>2) args
# define dprint_init() print_init()
# define dwheel() DTRACE(wheel())
#define dprintc(c) DTRACE(printc(c))
#define ddprintc(c) DDTRACE(printc(c))
#define dddprintc(c) DDDTRACE(printc(c))
#define dprints(s) DTRACE(prints(s))
#define ddprints(s) DDTRACE(prints(s))
#define dddprints(s) DDDTRACE(prints(s))
#define dprintb(b) DTRACE(printb(b))
#define ddprintb(b) DDTRACE(printb(b))
#define dddprintb(b) DDDTRACE(printb(b))
#define dprintw(w) DTRACE(printw(w))
#define ddprintw(w) DDTRACE(printw(w))
#define dddprintw(w) DDDTRACE(printw(w))
#define dprintl(l) DTRACE(printl(l))
#define ddprintl(l) DDTRACE(printl(l))
#define dddprintl(l) DDDTRACE(printl(l))
#define dprintu(u) DTRACE(printu(u))
#define ddprintu(u) DDTRACE(printu(u))
#define dddprintu(u) DDDTRACE(printu(u))
#endif
#endif // HEADER_END_DO_NOT_CHANGE_THIS_LINE
#include "dprint.h"
#define TX_PORT GPIOC
#define TX_BIT 5
GLOBAL char debug_level;
// interrupts are disabled for 11/5333333 seconds = 2 usec
// remove sim/rim if that is critical, the only implication might be garbled prints
#ifndef DPRINT_DISABLED
PUBLIC void printc(char c)
{
__asm
sim
ld a, (0x03,sp) ; fetch c from the stack (sdcc 3.8); comment out for sdcc 4.2.0, which passes c in A
jp printc_32bit_aligned
.bndry 4 ; align to 32 bit (4 bytes)
printc_32bit_aligned:
bres 0x500a, #5 ; start bit
srl a ; CC.C=bit0
nop
bccm 0x500a, #5 ; bit0
srl a ; CC.C=bit1
nop
bccm 0x500a, #5 ; bit1
srl a
nop
bccm 0x500a, #5 ; bit2
srl a
nop
bccm 0x500a, #5 ; bit3
srl a
nop
bccm 0x500a, #5 ; bit4
srl a
nop
bccm 0x500a, #5 ; bit5
srl a
nop
bccm 0x500a, #5 ; bit6
srl a
nop
bccm 0x500a, #5 ; bit7
srl a
nop
bset 0x500a, #5 ; stopbit
rim
__endasm;
}
PUBLIC void prints(char *s)
{
while (*s) {
printc(*s++);
}
}
PUBLIC char nibble2ascii(uint8_t d)
{
d &= 0xf;
if (d < 10)
d += '0';
else
d += 'a' - 10;
return (d);
}
PUBLIC void printb(uint8_t b)
{
printc(nibble2ascii(b>>4));
printc(nibble2ascii(b));
}
PUBLIC void printw(uint16_t w)
{
dprintb((uint8_t)(w>>8));
dprintb((uint8_t)w);
}
PUBLIC void printl(uint32_t l)
{
dprintw((uint16_t)(l>>16)); // upper 16 bits as four hex digits
dprintw((uint16_t)l); // lower 16 bits as four hex digits
}
PUBLIC void printu(uint16_t u)
{
uint8_t c;
c = '0' + (uint8_t)(u % 10);
u /= 10;
if (u) {
printu(u);
}
printc(c);
}
PUBLIC void wheel(void)
{
static uint8_t n;
printc('\b');
n++;
if (n>3) n=0;
printc("-/|\\"[n]);
}
PUBLIC void print_init(void)
{
TX_PORT->DDR &= ~(1 << TX_BIT); // input to test for pullup
#ifdef DEBUG_LEVEL
debug_level = DEBUG_LEVEL;
#else
if (TX_PORT->IDR & (1 << TX_BIT)) { // if pulled up enable debug
debug_level = 1;
} else {
debug_level = 0; // to disable prints, ground TX_PIN with a resistor
}
#endif
TX_PORT->DDR |= 1 << TX_BIT; // output
TX_PORT->ODR |= 1 << TX_BIT; // output high
TX_PORT->CR1 |= 1 << TX_BIT; // push-pull
TX_PORT->CR2 |= 1 << TX_BIT; // fast mode
}
#endif
