"Hashtags" in code blocks get parsed as hashtags #6

Closed
thebaer opened this issue Nov 11, 2018 · 3 comments

Comments

@thebaer
Member

thebaer commented Nov 11, 2018

As reported here: https://mst3k.interlinked.me/@cadey/100673777784986480

@MCSH

MCSH commented Nov 12, 2018

How about this idea?

Instead of using a regexp to replace hashtags, use a parser that makes several passes over the document, like this one: https://github.com/russross/blackfriday/blob/master/markdown.go#L405

It would be something like this:

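last_codeblock_end = 0
start = 0
while start < len(doc):
    if start < last_codeblock_end:
        # still inside a code block: ignore everything up to its end
        start = last_codeblock_end
    else if a new code block begins at start:
        last_codeblock_end = end of that code block
    else:
        render hashtags in the text up to the next code block
        start = beginning of the next code block (or len(doc))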
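
For illustration only, here is a minimal Go sketch of the loop above. It is not the code in postrender.go. It assumes that code blocks are fenced with lines starting with three backticks and that inline code sits between single backticks. The names `hashtagRe`, `linkTags`, `renderHashtags`, and the `/tag:` link format are hypothetical, and the pattern is far simpler than a real hashtag extractor.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// fence is the three-backtick marker that opens or closes a code block.
var fence = strings.Repeat("`", 3)

// hashtagRe is a simplified, hypothetical pattern: a '#' at the start of the
// text or after whitespace, followed by letters, digits, or underscores.
var hashtagRe = regexp.MustCompile(`(^|\s)#([\p{L}\p{N}_]+)`)

// linkTags wraps each hashtag in a plain-text segment with a Markdown link.
// The /tag: URL format is an assumption made for this example.
func linkTags(text string) string {
	return hashtagRe.ReplaceAllString(text, "${1}[#${2}](/tag:${2})")
}

// renderHashtags links hashtags everywhere except inside fenced code blocks
// and inline code spans.
func renderHashtags(doc string) string {
	lines := strings.Split(doc, "\n")
	inFence := false
	for i, line := range lines {
		if strings.HasPrefix(strings.TrimSpace(line), fence) {
			inFence = !inFence // a fence line opens or closes a block
			continue
		}
		if inFence {
			continue // inside a code block: leave the line untouched
		}
		// Outside a block, odd-numbered segments between backticks are
		// inline code; only even-numbered segments are plain text.
		parts := strings.Split(line, "`")
		for j := 0; j < len(parts); j += 2 {
			parts[j] = linkTags(parts[j])
		}
		lines[i] = strings.Join(parts, "`")
	}
	return strings.Join(lines, "\n")
}

func main() {
	doc := "Hello #world\n" + fence + "\n#include <stdio.h>\n" + fence +
		"\nUse `#define` with #care"
	fmt.Println(renderHashtags(doc))
}
```

Here `#world` and `#care` become links, while `#include` and `#define` are left alone. The sketch ignores indented code blocks and unbalanced backticks, which a real Markdown parser would have to handle.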

I don't think it's possible to parse this with regexp, at least not with vanilla regexp. I'm not sure how different Go's regexp is from the regular expressions used in the theory of computation.

Just for reference, this is the code that needs changing:
https://github.com/writeas/writefreely/blob/master/postrender.go#L33

Hope this helps.

@mrvdb
Collaborator

mrvdb commented Nov 30, 2018

How can we move forward here? This is probably less trivial than it looks. Some things which stood out to me:

  • I would have thought md -> html conversion was more or less a 'solved problem' by now. What makes our conversion special?
  • The md parser seems to be a modified version of https://github.com/russross/blackfriday, correct? Some details on why would be helpful.
  • The extracttags comes from https://github.com/kylemcc/twitter-text-go, which seems to be optimized for tweet parsing. Perhaps not the best choice for extracting hashtags in general?

@thebaer
Member Author

thebaer commented Dec 1, 2018

@MCSH Agreed that regex isn't the right solution, and some kind of better parser is needed. Thanks for the suggestion!

@mrvdb:

  • The trouble is with using regex to find hashtags in plain Markdown and just replacing them with HTML -- that method lacks the context to know when it shouldn't replace "hashtags", like inside code blocks.
  • Right, I listed the changes made in the forked repo. Mostly, it's to make the parser more strict. The changes are all based on how people were actually using Write.as, e.g. trying to insert special formatting / characters but not wanting them to actually render in some special way.
  • The twitter-text-go library is the best one I've found that works with any language / character set you can throw at it. We can always switch if there's a better library out there, but I doubt any take markdown into consideration like we'd need them to.
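
To make the first point concrete, here is a minimal Go sketch of a context-free replacement. The pattern `naiveTagRe` and the function `naiveLink` are hypothetical and simplified for illustration. They are not the twitter-text-go extractor or the WriteFreely code, and the `/tag:` link format is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// naiveTagRe is a hypothetical, simplified hashtag pattern used only for
// illustration: a '#' followed by one or more word characters.
var naiveTagRe = regexp.MustCompile(`#(\w+)`)

// naiveLink replaces every match with an HTML link. It sees only the raw
// text, so it cannot tell whether a match sits inside a code block.
func naiveLink(md string) string {
	return naiveTagRe.ReplaceAllString(md, `<a href="/tag:${1}">#${1}</a>`)
}

func main() {
	// An indented line is a code block in Markdown, but the regex
	// still rewrites "#include" as if it were a hashtag.
	md := "    #include <stdio.h>"
	fmt.Println(naiveLink(md))
}
```

Because the replacement runs on the Markdown source before any structure is known, the C preprocessor directive ends up wrapped in an anchor tag, which is the behavior reported in this issue.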

mrvdb referenced this issue in mrvdb/writefreely Dec 3, 2018
This improves rendering in a number of situations:

- it keeps anchor tags working
- it gives the user some control for not linking, for example in code
  blocks.

Con:
hashTags at the beginning of a line without a space won't get linked.

Workaround related to issues #42 and #6 and #33
thebaer self-assigned this Jan 14, 2019
thebaer added this to the 1.0 milestone Jan 25, 2019
thebaer closed this as completed in 32e99d0 Feb 4, 2019