Archive

Archive for December, 2011

appengine – SELECT * From my_model WHERE category LIKE ‘news%’

December 13, 2011 Leave a comment

Problem:
I needed to use a LIKE query in appengine.

Solution:
Got some great answers from google search.

Source:
The solution I went with is here.
http://pythoughts.com/app-engine-datastore-workarounds-and-search/

From pythoughts.com

Where in sql you usually would query:

"SELECT * FROM thought WHERE user LIKE gra%"

We now have to use:

"SELECT * FROM thought WHERE user >= 'gra' AND user < :1", u'gra' + u'\ufffd'"

Imagine the table as being sorted alphabetically, user >= ‘gra’ would include any user-names that come after, or equal, ‘G’, then ‘R’, then ‘A’ in the alphabet. The next filter is a less-than inequality, so it’s going to filter out any values that are greater than the specified term, which we set as u’gra’ + u’\ufffd’. The unicode character and ‘\ufffd’ code specify that the result must include the first three character, ‘gra’ (formatted as unicode in order to concatenate with “u’\ufff'”), as well as any other unicode characters following that. This will limit the query enough to return a value such as ‘graeme’, if the search term is ‘grae’, but will not return ‘graham’, or, say, ‘tagrae’.

NB.
1. This works pretty well as the amount of data that I have will be “relatively small”.
2. The data saved in category field is in lowercase, without spaces or punctuation for easier search. Above solution is case sensitive.

Categories: appengine Tags:

awk – Passing a variable from bash to awk

December 9, 2011 1 comment

Problem:
How to pass a shell variable to awk.

Solution:
awk -v varible=$shell_variable

To substitue foo with bar for the line with Carl the following will not work.

$echo -e "Richard:foo\nSam:foo\nCarl:foo" | awk -F: -v var=Carl '/^var:/ {gsub(/foo/,"bar", $2); print}'

This is because ‘var’ is being treated as a literal string and not as a ‘variable’.
Instead use:

$echo -e "Richard:foo\nSam:foo\nCarl:foo" | awk -F: -v var=Carl '$0 ~ var {gsub(/foo/,"bar",$2); print}'
Carl:bar

Source:
http://www.unix.com/shell-programming-scripting/172894-trouble-passing-variable-bash-awk-gsub-command.html

From

gsub(regexp, replacement [, target])
Search target for all of the longest, leftmost, nonoverlapping matching substrings it can find and replace them with replacement. The ‘g’ in gsub() stands for “global,” which means replace everywhere. For example:

{ gsub(/Britain/, “United Kingdom”); print }

replaces all occurrences of the string ‘Britain’ with ‘United Kingdom’ for all input records.

The gsub() function returns the number of substitutions made. If the variable to search and alter (target) is omitted, then the entire input record ($0) is used. As in sub(), the characters ‘&’ and ‘\’ are special, and the third argument must be assignable.

And

sub(regexp, replacement [, target])
Search target, which is treated as a string, for the leftmost, longest substring matched by the regular expression regexp. Modify the entire string by replacing the matched text with replacement. The modified string becomes the new value of target. Return the number of substitutions made (zero or one).

The regexp argument may be either a regexp constant (/…/) or a string constant (“…”). In the latter case, the string is treated as a regexp to be matched. See Computed Regexps, for a discussion of the difference between the two forms, and the implications for writing your program correctly.

This function is peculiar because target is not simply used to compute a value, and not just any expression will do—it must be a variable, field, or array element so that sub() can store a modified value there. If this argument is omitted, then the default is to use and alter $0.2 For example:

str = “water, water, everywhere”
sub(/at/, “ith”, str)

sets str to ‘wither, water, everywhere’, by replacing the leftmost longest occurrence of ‘at’ with ‘ith’.

If the special character ‘&’ appears in replacement, it stands for the precise substring that was matched by regexp. (If the regexp can match more than one string, then this precise substring may vary.) For example:

{ sub(/candidate/, “& and his wife”); print }

changes the first occurrence of ‘candidate’ to ‘candidate and his wife’ on each input line. Here is another example:

$ awk ‘BEGIN {
> str = “daabaaa”
> sub(/a+/, “C&C”, str)
> print str
> }’
-| dCaaCbaaa

This shows how ‘&’ can represent a nonconstant string and also illustrates the “leftmost, longest” rule in regexp matching (see Leftmost Longest).

The effect of this special character (‘&’) can be turned off by putting a backslash before it in the string. As usual, to insert one backslash in the string, you must write two backslashes. Therefore, write ‘\\&’ in a string constant to include a literal ‘&’ in the replacement. For example, the following shows how to replace the first ‘|’ on each line with an ‘&’:

{ sub(/\|/, “\\&”); print }

As mentioned, the third argument to sub() must be a variable, field or array element. Some versions of awk allow the third argument to be an expression that is not an lvalue. In such a case, sub() still searches for the pattern and returns zero or one, but the result of the substitution (if any) is thrown away because there is no place to put it. Such versions of awk accept expressions like the following:

sub(/USA/, “United States”, “the USA and Canada”)

For historical compatibility, gawk accepts such erroneous code. However, using any other nonchangeable object as the third parameter causes a fatal error and your program will not run.

Finally, if the regexp is not a regexp constant, it is converted into a string, and then the value of that string is treated as the regexp to match.

Categories: bash Tags:

awk – Using multiple Input Field Separators

December 9, 2011 Leave a comment

Problem:
Each record is separated by |~|

Solution:

$echo "9882590|~|20111207|~|K03501000063005574033961|~|K|~|" | awk -v FS="[|~]+" -v OFS = "," '{print $1,$2,$3,$4}'

Source:
http://www.unix.com/shell-programming-scripting/172903-read-multiple-character-delimited-flat-file.html

Update: 2012-11-20 15:58:50
Another example:
I wanted to get rid of the time portion in the first filed. This part in the awk command FS=” |\|” says the Field separator is either a space or a pipe character.

$ less file
# Date time | Number
2012-10-01 06:08:54|377772327281
2012-10-01 06:08:55|371202225977
2012-10-01 06:08:55|061712877272
2012-10-01 06:08:55|671472162075
2012-10-01 06:08:55|179001295722
$ awk -v FS=" |\|" '{print $1, $3}' file | sort -k2,1 
2012-10-01 061712877272
2012-10-01 179001295722
2012-10-01 371202225977
2012-10-01 377772327281
2012-10-01 671472162075

sed – Case-insensitive search

December 1, 2011 Leave a comment

Problem:
Doing a case-insensitive search with sed.

Solution:

`/REGEXP/I’
`\%REGEXP%I’
The `I’ modifier to regular-expression matching is a GNU extension
which causes the REGEXP to be matched in a case-insensitive manner.

Example:

user@machine:~$ echo "someTHIng" | sed -n "/something/p"

user@machine:~$ echo "someTHIng" | sed -n "/something/Ip"
someTHIng
user@machine:~$ sed --version
GNU sed version 4.1.5
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
to the extent permitted by law.
Categories: bash, sed Tags: ,