"Hashtags" in code blocks get parsed as hashtags #6

thebaer · 2018-11-11T01:03:44Z

As reported here: https://mst3k.interlinked.me/@cadey/100673777784986480

MCSH · 2018-11-12T20:19:16Z

How about this idea?

Instead of using a regexp to replace hashtags, we could use a parser that makes several passes, like this one: https://github.com/russross/blackfriday/blob/master/markdown.go#L405

It would be something like this:

    last_codeblock_end = 0
    start = 0
    while start < len(doc):
        if start < last_codeblock_end:
            ignore everything
        else:
            check if this is a new code block
            if not, do the rendering
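
The pass-based idea above could be sketched in Go roughly as follows. This is a minimal illustration, not the code in postrender.go: it tracks fenced code blocks line by line instead of keeping a `last_codeblock_end` offset, it assumes single, balanced backticks for inline code, and the names `hashtagRe`, `linkHashtags`, `renderHashtags`, `linkOutsideInlineCode`, and the `/tag/` URL prefix are all hypothetical. The hashtag pattern is a deliberately simple ASCII-only stand-in for a real extractor.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// fence is the Markdown code fence marker (three backticks).
var fence = strings.Repeat("`", 3)

// hashtagRe is a simplified, ASCII-only hashtag pattern (an assumption,
// not the pattern WriteFreely uses): '#', a letter, then word characters.
var hashtagRe = regexp.MustCompile(`#([A-Za-z][A-Za-z0-9_]*)`)

// linkHashtags turns hashtags in ordinary text into Markdown links.
// The /tag/ prefix is hypothetical.
func linkHashtags(text string) string {
	return hashtagRe.ReplaceAllString(text, "[#${1}](/tag/${1})")
}

// linkOutsideInlineCode splits a line on backticks. Assuming single,
// balanced backticks, even-numbered segments are text and odd-numbered
// segments are inline code, which is left untouched.
func linkOutsideInlineCode(line string) string {
	parts := strings.Split(line, "`")
	for i := range parts {
		if i%2 == 0 {
			parts[i] = linkHashtags(parts[i])
		}
	}
	return strings.Join(parts, "`")
}

// renderHashtags walks the document line by line and remembers whether
// it is inside a fenced code block. Fence lines and fenced content are
// copied through unchanged; all other lines get hashtag linking.
func renderHashtags(doc string) string {
	lines := strings.Split(doc, "\n")
	inFence := false
	for i, line := range lines {
		if strings.HasPrefix(strings.TrimSpace(line), fence) {
			inFence = !inFence // opening or closing fence line
			continue
		}
		if inFence {
			continue
		}
		lines[i] = linkOutsideInlineCode(line)
	}
	return strings.Join(lines, "\n")
}

func main() {
	doc := "Notes on #golang\n" + fence + "\n#include <stdio.h>\n" + fence +
		"\nUse `#define` sparingly. #tips"
	fmt.Println(renderHashtags(doc))
}
```

Only `#golang` and `#tips` are linked here; `#include` inside the fence and `#define` inside the inline code span pass through unchanged. Indented code blocks and multi-backtick code spans are not handled, which is part of why a real Markdown parser is the more robust route.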

I don't think it's possible to parse this with regexp, at least not with vanilla regexp. I'm not sure how different Go's regexp is from the regular expressions used in the theory of computation.
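
On the question of how Go's regexp differs: Go's `regexp` package uses RE2 syntax, which stays close to the textbook notion of regular expressions and leaves out lookaround and backreferences. A quick way to check this is to try compiling such patterns. The helper name `compiles` below is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// A plain hashtag pattern is accepted.
	fmt.Println(compiles(`#\w+`)) // true

	// A negative lookbehind ("not preceded by a backtick") is rejected.
	fmt.Println(compiles("(?<!`)#\\w+")) // false

	// A backreference is rejected as well.
	fmt.Println(compiles(`(#\w+) \1`)) // false
}
```

So a pattern cannot express "not inside a code block" through lookaround in Go, which supports the point that some state outside the regexp is needed.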

Just for reference, this is the code that needs changing:
https://github.com/writeas/writefreely/blob/master/postrender.go#L33

Hope this helps.

mrvdb · 2018-11-30T11:10:36Z

How can we move forward here? This is probably less trivial than it looks. Some things which stood out to me:

- I would have thought md -> html conversion was a more or less 'solved problem' by now. What makes our conversion special?
- The md parser seems to be a modified version of https://github.com/russross/blackfriday, correct? Some details on why would be helpful.
- The extracttags is coming from https://github.com/kylemcc/twitter-text-go, which seems to be optimized for tweet parsing. Perhaps not the best choice for extracting hashtags in general?

thebaer · 2018-12-01T00:27:03Z

@MCSH Agreed that regex isn't the right solution, and some kind of better parser is needed. Thanks for the suggestion!

@mrvdb:

- The trouble is with using regex to find hashtags in plain Markdown and just replacing them with HTML: that method lacks the context to know when it shouldn't replace "hashtags", such as inside code blocks.
- Right, I listed the changes made in the forked repo. Mostly, they make the parser more strict. The changes are all based on how people were actually using Write.as, e.g. trying to insert special formatting or characters without wanting them to actually render in some special way.
- The twitter-text-go library is the best one I've found that works with any language or character set you can throw at it. We can always switch if there's a better library out there, but I doubt any take Markdown into consideration the way we'd need them to.
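
To make the missing-context problem concrete, here is a minimal sketch of a context-free regex replacement. The pattern and the names `naiveHashtagRe` and `naiveLink` are hypothetical and are not the code in postrender.go; the `/tag/` URL is likewise an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// naiveHashtagRe matches '#' followed by word characters anywhere in
// the input, with no knowledge of Markdown structure.
var naiveHashtagRe = regexp.MustCompile(`#(\w+)`)

// naiveLink replaces every match with an HTML link.
func naiveLink(md string) string {
	return naiveHashtagRe.ReplaceAllString(md, `<a href="/tag/${1}">#${1}</a>`)
}

func main() {
	// The second paragraph is an indented Markdown code block.
	md := "Tagged #c\n\n    #include <stdio.h>\n"
	fmt.Print(naiveLink(md))
}
```

The replacement links `#c` as intended, but it also rewrites `#include` inside the indented code block, because the regex sees only characters and not the block structure around them.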

This improves rendering in a number of situations:

- it keeps anchor tags working
- it gives the user some control for not linking, for example in code blocks

Con: hashtags at the beginning of a line without a space won't get linked.

Workaround related to issues #42, #6, and #33.
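
One way to read this trade-off is a rule that links a hashtag only when a space comes directly before it. The sketch below assumes that rule for illustration; the actual change may differ, and the names `spacedHashtagRe` and `linkSpaced` and the `/tag/` URL are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// spacedHashtagRe matches a hashtag only when a space directly
// precedes it (an assumed rule, for illustration).
var spacedHashtagRe = regexp.MustCompile(` #(\w+)`)

// linkSpaced links the matched hashtags and keeps the leading space.
func linkSpaced(md string) string {
	return spacedHashtagRe.ReplaceAllString(md, ` <a href="/tag/${1}">#${1}</a>`)
}

func main() {
	// The hashtag at the start of the line is not linked (the "con").
	fmt.Println(linkSpaced("#first and #second"))
	// An anchor link has no space before '#', so it is left alone.
	fmt.Println(linkSpaced("[jump](#section)"))
}
```

Under this rule, a writer can keep `#include` unlinked by placing it at the start of a line, while anchor links such as `(#section)` keep working. The cost is that a genuine hashtag at the start of a line is skipped too.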

thebaer added the bug, help wanted, and post rendering labels Nov 11, 2018

thebaer mentioned this issue Nov 21, 2018

Weird footnote display #33

Closed

thebaer mentioned this issue Dec 1, 2018

Anchor links render proper in Drafts, but not when Published #42

Closed

mrvdb mentioned this issue Dec 3, 2018

Hashtag linking improvements #43

Merged

thebaer self-assigned this Jan 14, 2019

thebaer added this to the 1.0 milestone Jan 25, 2019

thebaer closed this as completed in 32e99d0 Feb 4, 2019
