作者归档:mzh

Go RISCV AES性能提升1100%?使用KCAPI教程

效果

Go-KCAPI地址,https://github.com/mengzhuo/go-kcapi (欢迎各种PR)
AES-CBC benchmark如下:上面是使用KCAPI的,下面是Go标准库

goos: linux
goarch: riscv64
pkg: github.com/mengzhuo/go-kcapi/aes
cpu: Spacemit(R) X60
BenchmarkCBCEncrypto/size=65536             4128            277873 ns/op         235.85 MB/s         752 B/op           3 allocs/op
BenchmarkCBCStdEncrypto/size=65536           391           3064170 ns/op          21.39 MB/s           0 B/op           0 allocs/op

235 MB/s vs 21.39 MB/s

性能提升:1102%

示例代码

go-kcapi我做了一定的封装,跟Go标准库有类似的调用方法
Developer Friendly ™

package main

import (
        "fmt"
        "log"

        "github.com/mengzhuo/go-kcapi/aes"
)

func main() {
        key := []byte("--YOUR-AES-KEY--")
        iv := []byte( "--YOUR-AES--IV--")
        enc, err := aes.NewCBCEncrypter(key, iv)
        if err != nil {
                log.Fatal(err)
        }
        src := []byte("--Hello,世界--")
        dst := make([]byte, 16)
        enc.CryptBlocks(dst, src)
        fmt.Printf("%x", dst) // d57b7738a2d589e0a42ca7424f6d47ed
}

原理

SpacemiT K1 芯片提供了硬件加速功能,并通过Linux Kernel Crypto User Interface暴露了出来。
通过调用相应接口有这么大提升了。

开发念念碎

TLDR;

最近在研究SpacemiT的riscv64 K1芯片,发现这个SoC有个硬件的AES加速模块。
可惜不是用riscv k扩展(crytpo)开发的,而是通过Linux内核暴露出来的自有的引擎(crypto engine)
这能忍不了,Go程序不能榨干芯片性能心痒痒。

于是看看Go咋调用Linux Kernel Crypto Engine。一顿搜索发现竟然没有库……读了读文档,发现还挺简单啊,不就socket编程嘛,就自己写一个!(后来发现我错了,原来相当复杂)

第一难:没有合适的文档
不得不说,Linux的内核文档没有示例代码,基本上啥都要直接翻libkcapi的源码和内核自身的源码。特别是cmsghdr压根没有类型说明,啥都是宏定义……我还是从源码里才翻出来ASSOCLEN是uint32_t,搞得好像这个世界只有C语言用户和C binding了。

第二难:没法debug AF_IF
这个不是shash的问题,内核得开dbg的问题,sendmsg之后。没有合适的地方返回,都是EINVALID,dmesg里也没日志,只能自己strace看调用数据。

第三难:不懂splice,scatter/gather RW,sendfile....
这是我的错,没学习过类似知识,比如网上的例子都是splice的,但Go runtime大牛Andy Pan,提醒我可以直接用sendfile,那不是6字就能代表我的心情的。

第四难:Go crypto接口跟Linux crypto接口不匹配,啊,这就是这个库存在的意义啦,要不开发者自己去用unix包调用也是可以的,反正不就是那几个syscall和buffer构建嘛~

第五难:好像没啥用……man……人艰不拆,还有好几个alg没实现,看看有没有人用再折腾吧……

总之,能调用kcapi提升性能,又学到了不少新知识,那是相当高兴的。
回头再按知识点写个Go开发相关接口的博客吧。

multi-instance udp2raw configuration for openwrt

/etc/config/udp2raw

config udp2raw foo
        option enable '1'
        option run_command '-c -l 127.0.0.1:55820 -r foo_ip:80 -k <bar_password> --raw-mode faketcp -a'

config udp2raw bar
        option enable '1'
        option run_command '-c -l 127.0.0.1:55821 -r bar_ip:80 -k <bar_password> --raw-mode faketcp -a'

/etc/init.d/udp2raw

#!/bin/sh /etc/rc.common

USE_PROCD=1

START=99
STOP=01

PROG=/usr/bin/udp2raw

_log() {
        logger -p daemon.info -t udp2raw "$@"
}

_err() {
        logger -p daemon.err -t udp2raw "$@"
}

start_service() {
    config_load "udp2raw"
    config_foreach start_instance udp2raw
}

start_instance() {
    config_get run_command "$1" 'run_command'
    _log "start instance $1 with $run_command"
    procd_open_instance "udp2raw_$1"
    procd_set_param command $PROG
    procd_append_param command $run_command
    procd_set_param stdout 1
    procd_set_param stderr 1
    procd_close_instance
}

reload_service() {
    stop
    start
}

service_triggers() {
    procd_add_reload_trigger udp2raw
}

Now, service udp2raw start

Caddy docker now support riscv64!

Caddy docker now supported riscv64 by default!

You can install and run simply by docker run --rm -d -p 8080:80 --name web caddy

What is caddy?
Caddy acts as an enterprise-grade web server with automatic HTTPS. Other servers don’t share this same feature out of the box.

Pull request that makes it possible!
https://github.com/caddyserver/caddy/issues/6331
https://github.com/caddyserver/xcaddy/pull/185
https://github.com/caddyserver/caddy-docker/pull/358

Installing OpenBSD 7.4 on a MilkV Mars

Here are some instructions for getting OpenBSD 74 running on a MilkV Mars (V1.11).

You will need:

  1. SD card at least 8GB
  2. An USB TTL serial adapter
  3. A second machine to interact with the MilkV Mars over serial
  4. An Ehternet cable and able connection to a tftp server

The original idea came from " Installing OpenBSD 7.3-current on a VisionFive2" , You can also check this instruction for references.

Steps

  1. Download miniroot.img 7.4 from OpenBSD.
  2. Donwload special DTB made by jh7110-starfive-visionfive-2-v1.3b.dtb and upload it to tftp server (my are 10.0.0.21)
  3. Flash this miniroot.img into SD card
  4. Boot the Mars and hit any key during U-Boot, now you can hit any keys during boot up
--------EEPROM INFO--------

In:    serial
Out:   serial
Err:   serial
Model: Milk-V Mars
Net:   eth0: ethernet@16030000, eth1: ethernet@16040000
switch to partitions #0, OK
mmc1 is current device
found device 1
bootmode flash device 1
** Invalid partition 3 **
Couldn't find partition mmc 1:3
Can't set block device
** Invalid partition 3 **
Couldn't find partition mmc 1:3
Can't set block device
Hit any key to stop autoboot:  0
StarFive #
  1. Uboot setup ip and load DTB file by
dhcp; setenv serverip 10.0.0.21; tftpboot ${fdt_addr_r} jh7110-starfive-visionfive-2-v1.3b.dtb

load mmc 1:1 ${kernel_addr_r} efi/boot/bootriscv64.efi; bootefi ${kernel_addr_r} ${fdt_addr_r}
  1. That's It! Now you can boot with OpenBSD and do install by normal process

OpenBSD 7.4 on MilkV Mars

p.s. In OpenBSD sysctl hw.perfpolicy=high to enable performance mode.

Go riscv64 FMA optimazation notes

FMA, which is short for fused multiply–add, use lots by math in Go compiler and standard library.
I found that Go 1.20 did some riscv64 support for FMA, however, when I'm trying to add test cases for FNMA x * y - z or FNMS -x * y + z.
The binary output always different with my expectation and test cases in math always failed.

At first, I thought that is floating point error since floating point number follows IEEE-754, 2008 edition which allows minor errors within 1e-16 i.e. "veryclose" in math test cases.
However, when I implement the same algorithm for 32 bits FP, there is far more error than it should be.

After my carefully search on SSA code generator, I found that FMA SSA for riscv64 will invert FMA into FNMX if multiplier or adder is negative.

(F(MADD|NMADD|MSUB|NMSUB)D neg:(FNEGD x) y z) && neg.Uses == 1 => (F(NMADD|MADD|NMSUB|MSUB)D x y z)

This SSA will convert FMADDD into FNMADD, unfortunately according to RISCV manual, this is wrong.
In the manual
FMADD means

x * y + z

FNMADD means

 - x * y - z

instead of original CL thought FNMADD should be implemented as

x * y - z

Then I commit a CL that fix this issue for good with some test cases.
https://go-review.googlesource.com/c/go/+/506575