tr on mac: misplaced sequence asterisk

I'm very new to all of this so please excuse any mistakes.
I'm working on a Mac.
I'm trying to follow this tutorial here
When I type in tr "[ -%,;\(\):=\.\\\*[]\"\']" "_" < hug_tol.fasta > hug_tol.clean.fasta
I get the error message tr:misplaced sequence asterisk
I'm guessing that something in the file must be wrong, but since I'm trying to remove those characters the error message doesn't make sense.
I haven't found anything on Google so maybe someone can help me.

The author of the tutorial appears to be using quasi-regex character class syntax for tr. tr is much more limited in its scope than that. It only accepts a few escape characters and special characters. Simplify your command to
tr "%,;():=.*[]\"\' \\\\\-" "_" < hug_tol.fasta > hug_tol.clean.fasta
The - character does have special meaning, so put it at the end: in the beginning it will be interpreted as a command-line argument, while in the middle it specifies a character range. In bash, * won't be expanded in double quotes. For tr, to specify a plain \, you need a double \ (since it's the escape character). To get that through bash, you need \\\\.
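You may also want to consider using the -c option to specify the complement set (the characters you want to keep), since it is probably much smaller. The complement also contains the newline and the > that starts each FASTA header, so list both in the kept set or they will be replaced as well:
tr -c "A-Za-z0-9_>\n" "_" < hug_tol.fasta > hug_tol.clean.fasta
or more tersely
tr -c "[:alnum:]>\n" "_" < hug_tol.fasta > hug_tol.clean.fasta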

Related

Javascript Replace with parameters

So I'm making a markdown editor, and I want some function like "This is *italics*".replace("*$1*","<i>$1</i>");
Any easy way to do this? (Client Side, this'll be hosted on Github Pages or something, so a random npm package probably won't help)
Edit: An equal number of people have upvoted and downvoted this. It would help if you tell me why you downvoted.
Short answer: 'This is *italics*'.replace(/\*(.+)\*/, '<i>$1</i>');
Explanation: Using RegExp is the easiest way to go about this, specifically the grouping section.
Let's strip down /\*(.+)\*/:
The starting and ending / mark the thing in between as a RegExp literal.
We need to check for asterisks at the start and at the end, but * is a quantifier in a RegExp, so we need to escape each one with a \ (basically saying "hey, the next character is not a special symbol, but something literal").
Next we need to specify that we want any character between those asterisks (that's the .), appearing one or more times (that's the +).
Finally we need to group this and tell the RegExp that what we want to remember is the thing between the asterisks and not the whole thing; that's where the parentheses come into action.
Using those parentheses, we can write $n (where n is the number of the group, in this case 1) in the replacement string to insert what that group matched.
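One caveat with the short answer: .+ is greedy, so a line containing two italic spans is matched as a single span. A minimal sketch of a variant that handles several spans, assuming the text between the asterisks never contains an asterisk itself (renderItalics is a hypothetical helper name, not part of any library):

```javascript
// Hypothetical helper: converts every *span* in a line to <i>span</i>.
function renderItalics(markdown) {
  // [^*]+ matches one or more characters that are not asterisks, so each
  // pair of asterisks is matched on its own; the g flag replaces all pairs.
  return markdown.replace(/\*([^*]+)\*/g, '<i>$1</i>');
}

const sample = 'This is *italics* and *more*';
console.log(renderItalics(sample));
// The greedy pattern from the short answer, for comparison:
console.log(sample.replace(/\*(.+)\*/, '<i>$1</i>'));
```

With the greedy pattern the second call wraps everything from the first asterisk to the last one, which is usually not what a markdown editor wants.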

Get content with regex in javascript

::head
line 1
line 2
line 3
::content
content 1
content 2
content 3
How do I get "head" paragraph(first part) text with regex? This is from txt file.
Unfortunately, the below doesn't work in JavaScript because of this: Javascript regex multiline flag doesn't work. So we have to tweak things a bit. A line break in a file shows up in JavaScript strings as \n. On Windows it is preceded by \r, but not on Linux, so our \s* becomes more important now that we're doing this without the line-end anchor ($). I also noticed that you don't need to specifically gather the other lines, since [^] matches line breaks as well.
/(::head[^]*?)\n\s*\n/m
This works in testing in Chrome, so it should work for your needs.
This is a little fancy, but it should fit if this is used in conjunction with many similar properties.
/(::head.*?$^.*?$)^\s*$/m
Note that you need the /m multiline flag.
Here it is tested against your sample data http://rubular.com/r/vtflEgDdkY
First, we check for the ::head data. That's where we start collecting information in a group with (). Then we look for anything with .*, but we do so lazily by adding ?. Then we find the end of the line with $ and look for more lines of data: the line start ^, then anything .*?, then the line end $. This will grab multiple lines because of the multiline flag, so it's important to use the lazy ? so we don't grab too much data. Then we look for an empty line. Normally you just need ^$ for that, but I wanted to make sure this would work if someone had stuck a stray space or tab on the lines in between sections, so we used \s* to grab spaces. The * accepts "0 or more" spaces. Notice we didn't include the empty line in the group () because that's not the data you care about.
For further reading on regex, I recommend http://www.regular-expressions.info/tutorial.html It's where I learned everything I know about regex.
You can use [\s\S]+::content to match everything until ::content:
const text = ...
const matches = text.match(/^([\s\S]+)::content/m)
const content = matches[1]
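If several "::name" sections need to be read, the same idea can be wrapped in a small function. This is only a sketch: getSection is a hypothetical name, the sample text is the one from the question, and it assumes every section starts with "::" at the beginning of a line and that the section name contains no regex special characters.

```javascript
const text = [
  '::head', 'line 1', 'line 2', 'line 3',
  '::content', 'content 1', 'content 2', 'content 3',
].join('\n');

// Hypothetical helper: returns the body of the "::name" section, or null.
function getSection(source, name) {
  // [\s\S]*? lazily matches any characters, line breaks included, and stops
  // at the next "::" that starts a line or at the end of the input.
  const pattern = new RegExp(
    '^::' + name + '\\r?\\n([\\s\\S]*?)(?=^::|(?![\\s\\S]))', 'm');
  const match = source.match(pattern);
  return match ? match[1].trim() : null;
}

console.log(getSection(text, 'head'));
console.log(getSection(text, 'content'));
```

Here the m flag is only used so that ^ can match at the start of each line; the line breaks themselves are consumed by [\s\S].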

how to document a single space character within a string in reST/Sphinx?

I've gotten lost in an edge case of sorts. I'm working on a conversion of some old plaintext documentation to reST/Sphinx format, with the intent of outputting to a few formats (including HTML and text) from there. Some of the documented functions are for dealing with bitstrings, and a common case within these is a sentence like the following: Starting character is the blank " " which has the value 0.
I tried writing this as an inline literal the following ways: Starting character is the blank `` `` which has the value 0. or Starting character is the blank :literal:` ` which has the value 0. but there are a few problems with how these end up working:
1) reST syntax objects to a whitespace immediately inside of the literal, and it doesn't get recognized.
2) The above can be "fixed"--it looks correct in the HTML () and plaintext (" ") output--with a non-breaking space character inside the literal, but technically this is a lie in our case, and if a user copied this character, they wouldn't be copying what they expect.
3) The space can be wrapped in regular quotes, which allows the literal to be properly recognized, and while the output in HTML is probably fine (" "), in plaintext it ends up double-quoted as "" "".
4) In both 2/3 above, if the literal falls on the wrap boundary, the plaintext writer (which uses textwrap) will gladly wrap inside the literal and trim the space because it's at the start/end of the line.
I feel like I'm missing something; is there a good way to handle this?
Try using the unicode character codes. If I understand your question, this should work.
Here is a "|space|" and a non-breaking space (|nbspc|)
.. |space| unicode:: U+0020 .. space
.. |nbspc| unicode:: U+00A0 .. non-breaking space
You should see:
Here is a “ ” and a non-breaking space ( )
I was hoping to get out of this without needing custom code to handle it, but, alas, I haven't found a way to do so. I'll wait a few more days before I accept this answer in case someone has a better idea. The code below isn't complete, nor am I sure it's "done" (will sort out exactly what it should look like during our review process) but the basics are intact.
There are two main components to the approach:
introduce a char role which expects the unicode name of a character as its argument, and which produces an inline description of the character while wrapping the character itself in an inline literal node.
modify the text-wrapper Sphinx uses so that it won't break at the space.
Here's the code:
import re
import unicodedata

from docutils import nodes
# Assumption: the base class is the wrapper used by Sphinx's text writer.
from sphinx.writers.text import TextWrapper


class TextWrapperDeux(TextWrapper):
    _wordsep_re = re.compile(
        r'((?<!`)\s+(?!`)|'                       # whitespace not between backticks
        r'(?<=\s)(?::[a-z-]+:)`\S+|'              # interpreted text start
        r'[^\s\w]*\w+[a-zA-Z]-(?=\w+[a-zA-Z])|'   # hyphenated words
        r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))')   # em-dash

    @property
    def wordsep_re(self):
        return self._wordsep_re


def char_role(name, rawtext, text, lineno, inliner, options={}, content=[]):
    """Describe a character given by unicode name.

    e.g., :char:`SPACE` -> "char:` `(U+00020 SPACE)"
    """
    try:
        character = unicodedata.lookup(text)
    except KeyError:
        msg = inliner.reporter.error(
            ':char: argument %s must be valid unicode name at line %d' % (text, lineno))
        prb = inliner.problematic(rawtext, rawtext, msg)
        return [prb], [msg]
    app = inliner.document.settings.env.app
    describe_char = "(U+%05X %s)" % (ord(character), text)
    # Wrap the character itself in a literal node, followed by its description.
    char = nodes.inline("char:", "char:", nodes.literal(character, character))
    char += nodes.inline(describe_char, describe_char)
    return [char], []


def setup(app):
    app.add_role('char', char_role)
The code above lacks some glue to actually force the use of the new TextWrapper, imports, etc. When a full version settles out I may try to find a meaningful way to republish it; if so I'll link it here.
Markup: Starting character is the :char:`SPACE` which has the value 0.
It'll produce plaintext output like this: Starting character is the char:` `(U+00020 SPACE) which has the value 0.
And HTML output like: Starting character is the <span>char:<code class="docutils literal"> </code><span>(U+00020 SPACE)</span></span> which has the value 0.
The HTML output ends up looking roughly like: Starting character is the char:(U+00020 SPACE) which has the value 0.

How to store $2 value to variable in InDesign GREP find / change using javascript

I'm writing an InDesign JavaScript that does a GREP find/change. Actually, for now I only need to find some text and save what GREP matched as $2 into a script variable. I know how to do everything else in my script, so all I need for now is to find out how to get this $2.
simple example:
app.findGrepPreferences = NothingEnum.nothing;
app.findGrepPreferences.findWhat = "(\\d)+(\\d)";
found = app.activeDocument.findGrep();
...
found[0].contents; // this stores the entire string with both digits and the plus, but I need only the $2 value (the second digit in this case)
InDesign's JS interface does not do that for you, it only returns the complete match.
Since contents is a simple Javascript string (not native InDesign text anymore), you can use Javascript's own match function:
..
app.findGrepPreferences.findWhat = "(\\d)+(\\d)";
found = app.activeDocument.findGrep();
m = found[0].contents.match (/(\d)+(\d)/);
alert (m.join('\n'));
Be careful not to mix InDesign's GREP syntax with Javascript's. In particular, special characters such as ~< are ID extensions and will fail in JS.
Note that for an input of "2014" this will return
2014
1
4
where the first line is the full match (equal to $0), 1 is $1 and 4 is $2. This is most likely not what you expected. Since you are repeatedly matching "group 1" with the +, each next single digit replaces the last found one (except for the very last one). You probably meant something like
(\d+)(\d)
which will return
2014
201
4
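The difference between the two patterns can be checked in plain JavaScript outside InDesign. A small sketch using the sample input "2014" from above (the variable names are only illustrative):

```javascript
// A quantifier applied to a group keeps only the group's last repetition.
const repeatedGroup = '2014'.match(/(\d)+(\d)/);
// Moving the + inside the parentheses captures the whole run of digits.
const widenedGroup = '2014'.match(/(\d+)(\d)/);

// Array.from drops the extra index/input properties of the match result.
console.log(Array.from(repeatedGroup));
console.log(Array.from(widenedGroup));
```

In both results index 0 is the full match, index 1 is $1 and index 2 is $2.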
I know this already has an answer, but I thought I'd mention another approach that I think is useful. One way to do this would be to use lookbehinds and lookaheads so that the thing that is in the second set of parentheses is the entirety of the found string. In your case, you could change your search to
(?<=\\d)\\d(?!\\d)
What this says is "find a digit that is preceded by a digit, and followed by something that isn't a digit". Using this, your found array will only contain whatever the second \d matches (the last digit in a number).
Here's a pretty good explanation of how these work: http://carijansen.com/2013/03/03/positive-lookbehind-grep-for-designers/
One thing to note: you can't use \d+ in the lookbehind (it can't be variable length). That's why I use a lookahead as well.
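To see what this pattern selects, it can be tried in plain JavaScript. This sketch assumes an engine with lookbehind support (ES2018 or later, for example a current Node.js); InDesign's own script engine may not accept lookbehind in a JavaScript regex literal, in which case the pattern only belongs in findWhat. The sample string is made up for illustration.

```javascript
// Hypothetical sample text with a four-digit, a two-digit and a one-digit number.
const sampleText = 'a 2014 b 37 c 5';

// (?<=\d) requires a digit before the match, (?!\d) forbids one after it,
// so only the last digit of each multi-digit number is matched.
const lastDigits = sampleText.match(/(?<=\d)\d(?!\d)/g);

console.log(lastDigits);
```

The single digit 5 is skipped because nothing before it satisfies the lookbehind.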

Getting rid of / escaping unwanted symbols before adding statements through Cypher in Neo4J

I need to get rid of unwanted symbols, such as the multiple spaces, the leading and trailing whitespaces, as well as escape single and double quotes and other characters that may pose problems in my Neo4J Cypher query.
I currently use this (string.js Node module and jsesc Node module)
result = S(result).trim().collapseWhitespace().s;
result = jsesc(result, { 'quotes': 'double' });
They work fine, however,
1) I want to find a better, easier way to do it (preferably without those libraries) ;
2) When I use other encodings (e.g. Russian), jsesc seems to translate it into some other encoding than UTF-8 that the other parts of my script don't understand.
So I wanted to ask you if you could recommend me a RegExp that would do the job above without me having to use those modules.
Thank you!
I have a series of regex replace calls that do what you seem to be looking for, or at least the issues you mentioned. I put together a test string with several items you mentioned.
var testString = ' I start with \"unwanted items and" end with a space". Also I have Quotes ';
var cleanedString = testString.replace(/\s\s+/g, ' ').replace(/^\s|\s$/g, '').replace(/([^\\])(['"])/g, "$1\\$2");
console.log(cleanedString);
This will escape quotes (single or double) that have not yet been escaped, though you would have to worry about the case where the item is preceded by an escaped escape symbol. For example \\' would not be turned into \\\' as it should be. If you want to escape more characters you just need to add them to the final .replace regex. Let me know if there are specific examples you are looking for.
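The escaped-escape case mentioned above can be handled by consuming existing backslash pairs in the same pass. A minimal sketch, not a replacement for a tested escaping library: cleanForQuery is a hypothetical name, and unlike the chain above it collapses every whitespace run (single line breaks included) into one space.

```javascript
// Hypothetical helper: trims, collapses whitespace and escapes bare quotes.
function cleanForQuery(input) {
  return input
    .replace(/\s+/g, ' ') // collapse each whitespace run into one space
    .trim()               // drop leading and trailing whitespace
    // Either consume a backslash plus the character it escapes (left as is),
    // or find a bare quote and put a backslash in front of it.
    .replace(/\\[\s\S]|['"]/g, (token) => (token.length === 2 ? token : '\\' + token));
}

console.log(cleanForQuery('  I start   with "unwanted items"  '));
console.log(cleanForQuery(String.raw`already \'escaped\' and \\'not yet`));
```

Because an existing backslash always consumes the character after it, a quote that follows an escaped backslash is seen as bare and gets its own backslash, and a quote at the very start of the string is escaped too.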
