[Solved] Convert CommonMark links to headings with spaces to GitHub Flavored Markdown.

N0x0n@lemmy.ml · 3 days ago

Thank you! It actually ticks every use case (for my files) and looks pretty rad!

This might work, but I think it is best to not tinker further if you already have a working script (especially one that you understand and can modify further if needed).

I totally agree, but I will keep your regex as a reference. In the near future I will try to break your regex down as a learning exercise, but it looks rather complex!

Another user came up with the following solution:

sed -E ':l;s/(\[[^]]*\]\()([^)#]*#[^)]*\))/\1\n\2/;Te;H;g;s/\n//;s/\n.*//;x;s/.*\n//;/^https?:/!{:h;s/^([^#]*#[^)]*)(%20|\.)([^)]*\))/\1-\3/;th;s/(#[^)]*\))/\L\1/;};tl;:e;H;z;x;s/\n//;'

Just as a little experiment, if you want to spend some time on an answer: what do you think? It’s another way to achieve the same kind of result, but the two approaches are significantly different. I know there are a thousand ways to achieve the same result, but I’m kinda curious how it looks through an expert’s eyes :).

Thanks again for your help and the time you took to write such a complex regex for my use case! 👍

N0x0n@lemmy.ml · 3 days ago

Hello :) Sorry to ping you. I just gave pandoc a try, but it doesn’t work, and I had to dig a bit further on the web to find out why!

Links to headings with spaces are not specified by CommonMark, and each tool implements a different approach… Most replace spaces with hyphens; others use URL encoding (%20). So even though pandoc looks awesome, it doesn’t work for my use case (or did I miss something? Feel free to comment).

You can give it a try on https://pandoc.org/try/ with commonmark to gfm:

[Just a test](#Just a test)
[Just a link](https://mylink/%20with%20space.com)
[External link](Readme.md#JUST%20a%20test)
[Link with numbers](readme.md#1.3%20this%20is%20another%20test)
[Link with numbers](Another%20file%20to%20readme.md#1.3%20this%20is%20another%20test)

If you prefer a CLI version:

pandoc --from=commonmark_x --to=gfm+gfm_auto_identifiers "/home/user/Documents/test.md" -o "pandoc_test.md"
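For comparison, a minimal sketch of the two conventions mentioned above, written as two hypothetical bash helpers (`anchor_hyphen` and `anchor_percent` are made-up names, not pandoc's code). The hyphen style is simplified: GitHub's own renderer also drops most punctuation, which this sketch does not do.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not pandoc's code) showing two ways a tool might
# turn the heading "Just a test" into a link fragment.

# Style 1: lowercase, spaces -> hyphens (simplified: GitHub also strips
# most punctuation, which is not done here).
anchor_hyphen() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed -e 's/ /-/g'
}

# Style 2: keep the text as written, percent-encode spaces as %20.
anchor_percent() {
  printf '%s\n' "$1" | sed -e 's/ /%20/g'
}
```

With these, `anchor_hyphen 'Just a test'` prints `just-a-test`, while `anchor_percent 'Just a test'` prints `Just%20a%20test`, which is why a link written for one tool can break in another.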

N0x0n@lemmy.ml · 3 days ago

Wow! Thank you! I did a quick test on a test-file.md:

[Just a test](#just-a-test)
[Just a link](https://mylink/%20with%20space.com)
[External link](readme.md#just-a-test)
[Link with numbers](readme.md#1-3-this-is-another-test)
[Link with numbers](Another%20file%20to%20readme.md#1-3-this-is-another-test)

Great job! Thank you very much!!! I’m really impressed by what someone with proper knowledge can do! However, I really do not want to mess around with your regex… That would only call for disaster xD! I will carefully keep your regex and annotated file in my knowledge base; I’m sure some time in the future I will come back to it and try to break it down as a learning exercise.

Thank you very much!!! 👍

N0x0n@lemmy.ml · edit-2 4 days ago

I don’t really have a technical reason, but I only use named volumes to keep things clear and tidy, especially in compose files with databases.

When I do a backup, I run a script that saves each volume, database and compose file, neatly organized in directories and archived with tar.

I have this structure in my home directory: /home/user/docker/application_name/docker-compose.yaml, and each directory only contains the docker-compose.yml file (sometimes a .env file or a Dockerfile).
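A minimal sketch of the tar part of such a backup, assuming the ~/docker/<app>/ layout above. `backup_app` and the backup directory are hypothetical names. Named volumes live in Docker's own storage, so they would need a separate step that this sketch does not cover.

```bash
#!/usr/bin/env bash
# Sketch: archive one application's directory (e.g. ~/docker/<app>/ with its
# docker-compose.yml and .env) into a dated .tar.gz in a backup directory.
# Layout and names are assumptions; Docker named volumes are NOT included.
backup_app() {
  local base=$1 app=$2 dest=$3
  mkdir -p "$dest" || return 1
  # -C "$base" stores paths relative to it, e.g. jellyfin/docker-compose.yml
  tar -czf "$dest/${app}_$(date +%F).tar.gz" -C "$base" "$app"
}
```

For example, `backup_app ~/docker jellyfin ~/backups` would create `~/backups/jellyfin_YYYY-MM-DD.tar.gz` with today's date.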

I dunno if this is the most efficient or even the best way to do things :/ but it also helps me keep all the necessary config files separate from the actual files (like the movie files on Jellyfin), and it seems easier to switch over if I only need one part and not the other (uhh, sorry for my badly worded English, I hope it makes sense).

Other than that, I also like to tinker around and learn things :) Adding complexity gives me some kind of challenge? XD

N0x0n@lemmy.ml · 4 days ago

Hello :) Sorry for the very late response!

Indeed, your regex comes very close as a one-liner, I’m pretty impressed! :0 However, I forgot to mention something in my post (I only thought about it after working on it with another user in the comments…). There are 2 things missing from your beautiful and complex regex:

Numbering with dots also needs a dash in between (actually I think every special character, like spaces or dots, is converted to a dash)

FROM
---------------
[Link with numbers](readme.md#1.3%20this%20is%20another%20test)

TO
---------------
[Link with numbers](readme.md#1-3-this-is-another-test)

The part before the hashtag needs to keep its original form (it links to a real file)

FROM
---------------
[Link with numbers](Another%20file%20to%20readme.md#1.3%20this%20is%20another%20test.md)

TO
---------------
[Link with numbers](Another%20file%20to%20readme.md#1-3-this-is-another-test.md)
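The rule above (leave everything before the first # alone, rewrite only the heading part) can be sketched with plain bash parameter expansion instead of a second grep. `convert_target` is a hypothetical helper that works on a single link target (the text inside the parentheses); it only handles %20 and lowercasing, not the dotted numbers.

```bash
#!/usr/bin/env bash
# Sketch: rewrite only the heading fragment of a link target.
# The file part (before the FIRST '#') is kept byte-for-byte.
convert_target() {
  local target=$1 file frag
  case $target in
    *'#'*) ;;                               # has a fragment: handled below
    *) printf '%s\n' "$target"; return ;;   # no '#': leave unchanged
  esac
  file=${target%%#*}    # longest '#*' suffix removed -> text before first '#'
  frag=${target#*#}     # shortest '*#' prefix removed -> text after first '#'
  # %20 -> '-', then lowercase (tr keeps it portable; no GNU-only \L needed)
  frag=$(printf '%s\n' "$frag" | sed -e 's/%20/-/g' | tr '[:upper:]' '[:lower:]')
  printf '%s#%s\n' "$file" "$frag"
}
```

For example, `convert_target 'Another%20Markdown%20file.md#Hello%20World'` prints `Another%20Markdown%20file.md#hello-world`: the %20 in the file name survives.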

Sorry for the trouble, I wasn’t aware of all the GitHub-Flavored Markdown syntax :/. I’ve got a very cool script that works perfectly, thanks to another user, but if you want to modify your regex and try to solve the issue in pure regex, feel free :) I’m very curious what it could look like (god, regex is so obscure and at the same time it has some beauty in it!)

#! /bin/bash

files="/home/USER/projects/test.md"

mdlinks="$(grep -Po ']\((?!https).*\)' "$files")"
mdlinks2="$(grep -Po '#.*' <<<$mdlinks)"

while IFS= read -r line; do
	#Converts 1.2 to 1-2 (For a third level heading needs to add a supplementary [0-9]) 
	dashlink="$(echo "$line" | sed -r 's|(.+[0-9]+)\.([0-9]+.+\))|\1-\2|')"
	sed -i "s/$line/${dashlink}/" "$files"

	#Puts everything to lowercase after a hashtag
	lowercaselink="$(echo "$dashlink" | sed -r 's|#.+\)|\L&|')"
	sed -i "s/$dashlink/${lowercaselink}/" "$files"

	#Removes spaces (%20) from markdown links after a hashtag
	spacelink="$(echo "$lowercaselink" | sed 's|%20|-|g')"
	sed -i "s/$lowercaselink/${spacelink}/" "$files"

done <<<"$mdlinks2"
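The comment in the script notes that a third-level heading needs an extra [0-9]. A sketch of one way around that, assuming a sed that supports -E (GNU and BSD both do): loop on a single-dot substitution until nothing is left. `dots_to_dashes` is a hypothetical name. Since it rewrites every digit.digit on the line, it is meant for the fragment only (for example the $mdlinks2 lines), not for whole links whose file name might contain something like v1.2.md.

```bash
#!/usr/bin/env bash
# Sketch: "1.3" -> "1-3", "1.3.2" -> "1-3-2", "2.10.4.1" -> "2-10-4-1".
# A single s///g is not enough: after "1.3" is rewritten, the "3" is already
# consumed, so ".2" in "1.3.2" would be skipped. The :a / ta loop repeats the
# substitution until nothing matches any more.
dots_to_dashes() {
  sed -E -e ':a' -e 's/([0-9])\.([0-9])/\1-\2/' -e 'ta'
}
```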

N0x0n@lemmy.ml · 4 days ago

Sorry for the late response… I was busy with another user :S My English is so bad that I’m not able to respond to everyone at the same time… Whatever…

I tried your Perl regex substitution and indeed it does what I asked for in my post, so thank you very much for your help! However, I missed a few use cases where your regex breaks… But that’s on me, your command works as expected!!!

[Link with numbers](Another%20Markdown%20file.md#1.3%20this%20is%20another%20test.md)

The part before the hashtag needs to keep its original form (even with %20) because it links directly to a markdown file and not to a heading (hope that’s comprehensible?). It took me a lot of time with another user, and we came up with a wrapped-up script that does everything:

#! /bin/bash

files="/home/USER/projects/test.md"

mdlinks="$(grep -Po ']\((?!https).*\)' "$files")"
mdlinks2="$(grep -Po '#.*' <<<$mdlinks)"

while IFS= read -r line; do
	#Converts 1.2 to 1-2 (For a third level heading needs to add a supplementary [0-9]) 
	dashlink="$(echo "$line" | sed -r 's|(.+[0-9]+)\.([0-9]+.+\))|\1-\2|')"
	sed -i "s/$line/${dashlink}/" "$files"

	#Puts everything to lowercase after a hashtag
	lowercaselink="$(echo "$dashlink" | sed -r 's|#.+\)|\L&|')"
	sed -i "s/$dashlink/${lowercaselink}/" "$files"

	#Removes spaces (%20) from markdown links after a hashtag
	spacelink="$(echo "$lowercaselink" | sed 's|%20|-|g')"
	sed -i "s/$lowercaselink/${spacelink}/" "$files"

done <<<"$mdlinks2"
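One thing worth knowing about the `sed -i "s/$line/${dashlink}/"` lines: $line is used as a regex, so the . in 1.3 matches any character, and a / or & in a heading would break or change the substitution. A sketch of the usual escaping idiom, with hypothetical helper names; `replace_literal` relies on GNU sed -i, as the script does.

```bash
#!/usr/bin/env bash
# Sketch: make arbitrary text safe to use literally inside  sed "s/OLD/NEW/".
# Unescaped, the "." in "1.3" matches ANY character, and a "/" or "&" in a
# heading would break or alter the substitution.

# Pattern side: escape ] \ / $ * . ^ [ (the | delimiter keeps the \ and /
# inside the bracket expression intact).
escape_sed_pattern() {
  printf '%s\n' "$1" | sed -e 's|[]\/$*.^[]|\\&|g'
}

# Replacement side: escape \ / and &.
escape_sed_replacement() {
  printf '%s\n' "$1" | sed -e 's|[\/&]|\\&|g'
}

# Replace the first literal occurrence of $2 with $3 on each line of file $1.
# Uses GNU sed -i, like the script above.
replace_literal() {
  local file=$1 old new
  old=$(escape_sed_pattern "$2")
  new=$(escape_sed_replacement "$3")
  sed -i "s/$old/$new/" "$file"
}
```

With the escaping, replacing `#1.3%20x)` leaves a line containing `#1x3%20x)` alone, whereas the unescaped pattern would match it too.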

If you are motivated, you can still improve your regex :) I’m kinda curious if it’s possible with a one-liner! Thanks again for your help, and sorry for the late response!!

N0x0n@lemmy.ml · 4 days ago

Yeah, bare-bones regex was probably a mistake. However, a friendly user gave me a step-by-step guide on how to achieve my goal:

#! /bin/bash

files="/home/USER/projects/test.md"

mdlinks="$(grep -Po ']\((?!https).*\)' "$files")"
mdlinks2="$(grep -Po '#.*' <<<$mdlinks)"

while IFS= read -r line; do
	#Converts 1.2 to 1-2 (For a third level heading needs to add a supplementary [0-9]) 
	dashlink="$(echo "$line" | sed -r 's|(.+[0-9]+)\.([0-9]+.+\))|\1-\2|')"
	sed -i "s/$line/${dashlink}/" "$files"

	#Puts everything to lowercase after a hashtag
	lowercaselink="$(echo "$dashlink" | sed -r 's|#.+\)|\L&|')"
	sed -i "s/$dashlink/${lowercaselink}/" "$files"

	#Removes spaces (%20) from markdown links after a hashtag
	spacelink="$(echo "$lowercaselink" | sed 's|%20|-|g')"
	sed -i "s/$lowercaselink/${spacelink}/" "$files"

done <<<"$mdlinks2"

If you know a better way to achieve similar results, I’m very open to any new lead and to learning something new!
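As one possible "better way", a sketch that does everything in a single awk pass over the file instead of grep plus a loop of sed -i calls. The rules are the ones collected in this thread, not from any spec, and `convert_links` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: rewrite every markdown link target on every line, in one awk pass.
# Rules collected from this thread (not from any spec):
#   - targets starting with http:// or https:// are left alone
#   - the part before the first "#" (the file name) is left alone
#   - in the fragment: %20 and spaces -> "-", digit.digit -> digit-digit,
#     then everything is lowercased
convert_links() {
  awk '
  function fix(frag) {
    gsub(/%20| /, "-", frag)
    # "1.3.2" -> "1-3-2": replace one dot at a time until none is left
    while (match(frag, /[0-9][.][0-9]/))
      frag = substr(frag, 1, RSTART) "-" substr(frag, RSTART + 2)
    return tolower(frag)
  }
  {
    out = ""; rest = $0
    while ((start = index(rest, "](")) > 0) {
      after = substr(rest, start + 2)        # text after "]("
      stop = index(after, ")")               # first ")" ends the target
      if (stop == 0) break                   # no closing ")": leave the rest as is
      out = out substr(rest, 1, start + 1)   # keep everything up to "]("
      link = substr(after, 1, stop - 1)      # the target itself
      rest = substr(after, stop + 1)         # continue after ")"
      hash = index(link, "#")
      if (link !~ /^https?:/ && hash > 0)
        link = substr(link, 1, hash) fix(substr(link, hash + 1))
      out = out link ")"
    }
    print out rest
  }'
}
```

Usage could look like `convert_links < test.md > test.md.new && mv test.md.new test.md`; writing a new file first means the original survives if something goes wrong. Because each target ends at the first ")", two links on one line are handled separately.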

N0x0n@lemmy.ml · 5 days ago

Hello :) I promise this is the last time I will bother you (I know what you are going to say :P)! If it’s not too much, could you give me just a few hints on how I could improve the final script a bit?

#! /bin/bash

files="/home/USER/projects/test.md"

mdlinks="$(grep -Po ']\((?!https).*\)' "$files")"
mdlinks2="$(grep -Po '#.*' <<<$mdlinks)"

while IFS= read -r line; do
	#Converts 1.2 to 1-2 (For a third level heading needs to add a supplementary [0-9]) 
	dashlink="$(echo "$line" | sed -r 's|(.+[0-9]+)\.([0-9]+.+\))|\1-\2|')"
	sed -i "s/$line/${dashlink}/" "$files"

	#Puts everything to lowercase after a hashtag
	lowercaselink="$(echo "$dashlink" | sed -r 's|#.+\)|\L&|')"
	sed -i "s/$dashlink/${lowercaselink}/" "$files"

	#Removes spaces (%20) from markdown links after a hashtag
	spacelink="$(echo "$lowercaselink" | sed 's|%20|-|g')"
	sed -i "s/$lowercaselink/${spacelink}/" "$files"

done <<<"$mdlinks2"

This works perfectly and fulfills all my needs (thanks!!)! However, I’m not very fond of the string manipulation on the variable ($mdlinks2); if you have some tips without spoiling too much, that would be great, otherwise it’s okay, it works exactly how I imagined it and ticks all use cases. Also, if you could give some pointers for overall improvement, or if you see something that could potentially create some strange loop or looks off, feel free to comment in your spare time :).

Another question, which has nothing to do with the post and gets a bit off topic… You gave me the push I needed, and I saw the power and usefulness of proper knowledge of sed/bash/Perl. It’s time I finally learn a scripting language! I want to hear your opinion: what tools would you recommend? Most people would say Python for beginners, but I’ve heard so many good things about Perl (ExifTool is a good example of how powerful Perl can be), though its syntax scares me a little compared to Python.

Any good books you have in mind for a beginner?

Thanks again for everything!!!
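About $mdlinks2 and possible strange behaviour: in ']\((?!https).*\)' the .* is greedy, so on a line with two links it runs to the last ) on the line, and the lookahead only skips targets starting with https, so http:// links slip through. A sketch of a tighter extraction with a hypothetical `list_fragments` helper, using [^)]* instead of .*; note that it prints the fragments without the # and the closing ), unlike $mdlinks2.

```bash
#!/usr/bin/env bash
# Sketch: print the heading fragment of every non-web markdown link on stdin,
# one per line, without the leading '#' and the closing ')'.
list_fragments() {
  # 1) "](...#...)" only; [^)]* cannot run past a ")", so two links on the
  #    same line come out as two separate matches
  # 2) drop http:// and https:// targets
  # 3) keep only what follows the first '#', then strip the trailing ')'
  grep -oE '[]][(][^)]*#[^)]*[)]' \
    | grep -vE '^[]][(]https?:' \
    | sed -e 's/^[^#]*#//' -e 's/)$//'
}
```

For example, `list_fragments < test.md` lists one fragment per line, ready to feed into a `while read` loop.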

N0x0n@lemmy.ml · 6 days ago

First, thanks again for sharing your knowledge with me; I really appreciate the time and effort you took to write all of this. I know that’s a lot of thank-yous :/ but I’m really grateful, this is very valuable information I will keep in my knowledge base. It’s really time I learn proper bash/Python/Perl? scripting with all those tools (grep/sed/regex).

Second, YOU MISSED A DAMNED parenthesis, you fool xD! mdlinks="$(grep -Po ']\((?!https).*\)' ~/mkdn)" Took me some time to figure it out with a very uninformative error, bashscript.sh: line 8: unexpected EOF while looking for matching "', but as expected it works!

From
-------
[Just a test](#Just%20a%20test.md)
[Just a link](https://mylink/%20with%20space.com)
%20

To
-------
[Just a test](#Just-a-test.md)
[Just a link](https://mylink/%20with%20space.com)
%20

Next, to show you my appreciation and not take everything for granted or be spoon-fed everything, I tried to find a solution myself for something else. I will try to explain as best I can how I solved it.

From
-------
[Just a test](Another%20markdown%20file.md#Hello%20World)

To
-------
[Just a test](Another%20markdown%20file.md#hello-world)

The part before the hashtag needs to keep its initial form (it links to the original markdown file). So, instead of blindly playing around with Perl and regex (which doesn’t end well without the proper knowledge), I did some simple string manipulation. It’s not very elegant but does the trick, thanks to your well-written breakdown.

Printed out the $mdlinks variable just to see what it contains
Copied and changed your Perl regex to find the first hashtag (#) and save the result into a new variable ($mdlinks2)
Fed your $mdlinks variable into my new Perl regex
Fed my new variable into done? (I’m a bit confused here, but okay xD)

#! /bin/bash
mdlinks="$(grep -Po ']\((?!https).*\)' "/home/dany/newtest.md")"
echo $mdlinks

mdlinks2="$(grep -Po '#.*' <<<$mdlinks)"
echo $mdlinks2

while IFS= read -r line; do
	dashlink="$(echo "$line" | sed 's|%20|-|g')"
	sed -i "s/$line/${dashlink}/" "/home/dany/newtest.md"
done <<<"$mdlinks2"
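About the "Feed my new variable into done?" question: in `while … done <<<"$mdlinks2"` the here-string after done becomes the standard input of the whole loop, and each read takes one line of it. A minimal demo with a hypothetical `number_lines` function. Related: `echo $mdlinks` without quotes prints every line joined by spaces, while `printf '%s\n' "$mdlinks"` keeps one per line.

```bash
#!/usr/bin/env bash
# Minimal demo of the `while read ... done <<<"$var"` pattern:
# the here-string after `done` is the loop's standard input,
# and `read` consumes one line of it per iteration.
number_lines() {
  local text=$1 line n=0
  while IFS= read -r line; do   # IFS= keeps leading spaces, -r keeps backslashes
    n=$((n + 1))
    printf '%d: %s\n' "$n" "$line"
  done <<<"$text"               # quoting the variable is the usual safe habit
}
```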

Yes, not very elegant, but it’s the best I could do for now :/ However, I still got a YES effect :P

To answer your question:

Quick question as I’m working on this, in the new link example, is the BDMV and other capitalized text in this link supposed to be converted to lowercase, or to remain uppercase?

As you can see in my string manipulation above, the part before the # needs to keep its original form :) (Sorry, I wasn’t aware of this before working with the original files.) I solved it with the string manipulation shown above.

I’m a bit tired from all this searching and trial & error; tomorrow I will try to wrap everything up and answer your post below :)! Also, I need to clean up the mess I made in my home directory xD.

Thanks again for your help ! Have a good night/day !

N0x0n@lemmy.ml · 6 days ago

Hello !!!

Sorry for the very late response, I had something else to do. I will read everything carefully and respond to every post :) I also thought about it overnight and I think that sed and regex weren’t the best option here (as others have mentioned).

I think a Python script or bash (as you mentioned a bit later) would be a better way. I’m sorry that I put you through all of this… wrong tool for the job :s.

N0x0n@lemmy.ml · edit-2 7 days ago

Sure :)

I don’t know if it’s still a thing, but in the past some web URLs had spaces in their addresses, e.g.

https://www.my/%20website%20with%20spaces.com

In markdown you can link to external web addresses like so:

[some link to a web address](https://my/%20website%20with%20spaces.com)

However, s|%20|-|g on its own (without the /https/ ! address) replaces all occurrences of %20 (which is how a space is encoded in a URL, if I understand correctly; I still have a lot to learn) with -. This would break the link to the web URL: [some link to a web address](https://my-website-with-spaces.com/). Am I wrong here?

If I may, I just found something else that doesn’t quite work 😅 and it seems a bit harder to fix, I think! Sometimes I have links in this form:

[1.3 Subtitles](BDMV_svt-av1_encode_anime.md#1.3%20Subtitles)

As you can see, I prefix the heading with 1.3, but as dumb as it is… it also needs to become 1-3-subtitles

e.g.

[1.3 Subtitles](BDMV_svt-av1_encode_anime.md#1.3%20Subtitles)

Needs to become

[1.3 Subtitles](BDMV_svt-av1_encode_anime.md#1-3-Subtitles)

Sorry for my bad English, trying my best haha! Hope it’s comprehensible.

Edit:

I don’t know why, but Lemmy adds /%20 instead of %20 in my fake URLs ://

N0x0n@lemmy.ml · edit-2 7 days ago

Haha we cross-replied !

.* did the trick, so I removed the additional s|]\(.+#.+\) I had added to cover the pattern from my last reply!

Last question: /https/ ! s|%20|-|g changes all occurrences of %20 in the whole file except on lines that contain https. Is there any way to change them only when they appear inside the markdown link pattern []()?

e.g. replace it in [Some text](some%20text.md) but not in Hello I'm just some%20place holder text?
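A sketch that answers exactly this question, assuming a sed with -E and the hypothetical name `dash_in_links`: it only rewrites a %20 that sits between "](" and the next ")". It does not look at https at all, so web links inside (...) would be changed too; it would still need combining with a check for web targets.

```bash
#!/usr/bin/env bash
# Sketch: replace %20 with '-' only inside "](...)" link targets,
# leaving %20 in ordinary text untouched.
# Each pass rewrites one %20 that sits between "](" and the next ")";
# 'ta' jumps back to ':a' as long as a substitution happened.
dash_in_links() {
  sed -E -e ':a' -e 's/([]][(][^)]*)%20/\1-/' -e 'ta'
}
```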

Thanks again for your easy to read and very informative walk through ! 🤩

N0x0n@lemmy.ml · edit-2 7 days ago

Sorry for spamming your unread messages 😅!

I played a bit around and came to the following conclusion:

s|]\(#.+\)|\L&| - Works great for in-document links, so I further expanded it to s|]\(#.+\)|\L&|;s|]\(.+#.+\)|\L&| to also cover the following pattern [Some Text](readme.md#hello%20world.md)

s|%20|-|g - Works on every occurrence of %20, even in the following pattern [Some text](https://my/%20home%20page.com), which would break all external links to the web. So I used /https/ ! s|%20|-|g instead.

It’s probably very sloppy what I’m doing and not as elegant as your command, but it does the trick :) If you want to expand on it further, feel free; however, the following command does exactly what I wanted:

sed -re 's|]\(#.+\)|\L&|;s|]\(.+#.+\)|\L&|;/https/ ! s|%20|-|g'
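A small demo of one catch with the `/https/ ! s|%20|-|g` part: the address selects whole lines, so a line that mentions https anywhere is skipped entirely, including any internal link on the same line. `skip_https_lines` is a hypothetical wrapper; it writes the address as `/https/!s|%20|-|g` without the space, which is the more portable spelling of the same command.

```bash
#!/usr/bin/env bash
# Demo of how the "/https/ !" address behaves: it selects whole LINES,
# so a line that contains "https" anywhere is skipped entirely, including
# any internal link on that same line.
skip_https_lines() {
  sed -e '/https/!s|%20|-|g'
}
```

So `[a](#A%20B)` alone becomes `[a](#A-B)`, but `[a](#A%20B) [b](https://x/%20y)` on one line comes out unchanged.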

Thanks again from the bottom of my heart !

N0x0n@lemmy.ml · edit-2 7 days ago

Thank you, thank you very much for taking the time to help me out here! I really appreciate your full breakdown and complete walkthrough! I haven’t tried it out yet, but skimming through your post I’m sure it will work out!

However, I forgot to mention something:

The goal of this expression is to find markdown links, and to ignore https links. In your post you indicate the markdown links all start with a # symbol, so we don’t have to explicitly ignore the https as much as we just have to match all links starting with #.

This is only true for links within the same file; if I link to another file it looks something like this:

[Why SVT-AV1 over AOM?](readme.md#Why%20SVT-AV1%20over%20AOM?)

I can try to wrap my head around it and find a solution by myself; with your well-written breakdown I’m sure I can try something out. But if you think it will be too complex for my limited knowledge, feel free to adjust :).

Do you mind if I ping you if I’m not able to solve the issue?

Thanks again!!! 👍

N0x0n@lemmy.ml · 7 days ago

Oops! Forgot the 20 there! 😅

N0x0n@lemmy.ml · 7 days ago

Hello,

I have thought of a Python script and looked around a bit but couldn’t find anything satisfactory. Also, I’m a tiny bit more versed in bash/CLI than in Python… even though that’s very arguable!

I looked through the GitHub repo and at first glance I have no idea how this could do the job; again, I probably have to dig a bit deeper and understand what it is actually doing!

Thanks for the pointer will give it a try :)

N0x0n@lemmy.ml · 7 days ago

This would be awesome ! A breakdown of the whole command will give me a better understanding !

Thank you in advance, waiting for your post :)

N0x0n@lemmy.ml · 7 days ago

Hello :) Thanks for your reply !

That’s exactly what I did and how I came to my “final” result, but it doesn’t work as expected… because of my lack of knowledge and understanding!

Will give sd a try and see if I can come up with something ! Thanks for the pointer !

N0x0n@lemmy.ml · edit-2 3 days ago

[Solved] Convert CommonMark links to headings with spaces to GitHub Flavored Markdown.

N0x0n@lemmy.ml · 8 days ago

I know that feeling! My first service hosted via Docker + Traefik outside my LAN with a WireGuard tunnel felt like a big dopamine hit! Congrats!

Now I have over 20 services and it feels trivial :( I still love the easy-to-read/write syntax of Traefik; however, I feel like I’m missing a lot of important networking knowledge by avoiding Nginx!

Maybe one day when I’m too bored I will switch everything to Nginx, see how it goes !

N0x0n@lemmy.ml · 15 days ago

That’s some weird/cool stuff I have never heard of! I have absolutely no idea what I’m looking at, but somehow I want one of those!

That’s the kind of cool niche stuff that is missing here on Lemmy !

N0x0n@lemmy.ml · edit-2 17 days ago

Note taking and documentation fatigue... Forgejo + IDE?