Tutorial 3 - More on Python Basics

More on Python Basics - Random Functions, REGEX, List Comprehensions

The content and the code excerpts have been derived from the following sources:
  1. 7 Python Random Module Functions You Should Know. https://www.techbeamers.com/using-python-random/
  2. What's the difference between Randrange and Randint functions. https://www.codecademy.com/en/forum_questions/521bcf2b548c359b28000367
  3. Python Regular Expressions https://www.tutorialspoint.com/python/python_reg_expressions.htm
  4. Python Regular Expression Tutorial https://www.datacamp.com/community/tutorials/python-regular-expression-tutorial
  5. Regular Expressions https://developers.google.com/edu/python/regular-expressions
  6. Map Filter and Reduce http://book.pythontips.com/en/latest/map_filter.html
  7. Python File Handling https://www.pythonforbeginners.com/cheatsheet/python-file-handling

Random Functions (from the Random Module)

Random values can be generated using the following functions of the random module in Python.

1. Randrange() Function.

Syntax: randrange(stop) or randrange(start, stop[, step]). The stop value is the exclusive upper bound of the range: it is never returned.
In [3]:
import random
print(random.randrange(999))
476
In [6]:
print(random.randrange(0,1000,20)) #generate a random value in the intervals of 20 between 0 and 1000
40
In [10]:
print (random.randrange(20,25)) #generates an integer from 20 to 24 (25 is excluded)
24
In [50]:
print(random.randrange(0.2, 0.5)) #randrange takes only integer arguments... hence the ERROR

NameErrorTraceback (most recent call last)
<ipython-input-50-bc44f1bc2056> in <module>()
----> 1 print(random.randrange(0.2, 0.5)) #randrange takes only integer arguments... hence the ERROR

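NameError: name 'random' is not defined

Note: the NameError above only shows that the random module had not been imported in that session. With the import in place the call still fails, with a ValueError (a TypeError from Python 3.12 onwards), because randrange() accepts only integer arguments.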

2. Random.Randint(Low, High) Function.

The randint() function is one of many functions which handle random numbers. It has two parameters low and high and generates an integer between low and high, inclusive.
In [16]:
for _ in range(5):
    # Get random number in range 0 through 9.
    r = random.randint(0, 9)
    print(r)
9
0
6
1
7
There is one slight difference between randrange and randint when used with just two parameters: randint(x, y) returns a value >= x and <= y, while randrange(x, y) returns a value >= x and < y (n.b. strictly less than y).
In addition, randrange accepts an optional third parameter (step), whereas randint always takes exactly two.
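
The difference in bounds can be checked empirically. The following is a minimal sketch; the variable names are chosen here for illustration only, and the range 1 to 3 is arbitrary.

```python
import random

# Draw many samples and collect the distinct values each function produces.
randint_values = {random.randint(1, 3) for _ in range(1000)}
randrange_values = {random.randrange(1, 3) for _ in range(1000)}

print(sorted(randint_values))    # may contain the upper bound 3
print(sorted(randrange_values))  # never contains 3

# Only randrange accepts a step; here it picks an even number from 0 to 8.
even = random.randrange(0, 10, 2)
print(even)
```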

3. Random.Choice(Seq) Function.

The choice() function returns a randomly selected element from the given non-empty sequence.
In [18]:
# Generates a random string from the list of strings
print(random.choice( ['Apple', 'Ball', 'Cat'] ))

# Generates a random number from a list
print(random.choice([-1, 1, 3.5, 9, 15]))

# Generates a random number from a tuple
print(random.choice((1.1, -5, 6, 4, 7)))

# Generates a random char from a string
print(random.choice('Life is Beautiful'))
Ball
1
7
B

4. Random.Shuffle(List) Function.

Purpose: the shuffle() function rearranges the items of a list in place so that they occur in a random order.
For shuffling, it uses the Fisher-Yates algorithm, which has O(n) complexity. It iterates from the last element of the list down to the first, swapping each element with one at a randomly chosen index at or below its own position.
In [20]:
from random import shuffle

mylist = [11,21,31,41,51]
shuffle(mylist)

print(mylist)
[41, 51, 31, 11, 21]
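
To make the Fisher-Yates idea concrete, here is a minimal sketch of the algorithm. The helper name fisher_yates_shuffle is hypothetical and this is not the library's own code; in practice random.shuffle() should be used.

```python
import random

def fisher_yates_shuffle(items):
    """Shuffle a list in place (illustrative helper, not the library's code)."""
    # Walk from the last index down to index 1.
    for i in range(len(items) - 1, 0, -1):
        # Pick a random index j with 0 <= j <= i, then swap the two entries.
        j = random.randint(0, i)
        items[i], items[j] = items[j], items[i]

mylist = [11, 21, 31, 41, 51]
fisher_yates_shuffle(mylist)
print(mylist)
```

Each element is swapped exactly once, which is where the O(n) running time comes from.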

5. Random.Sample(Collection, Random List Length) Function.

The sample() function randomly selects N items from a given sequence (list, tuple, string, range) and returns them as a list. Sets were also accepted up to Python 3.10 but must now be converted to a sequence first, for example with sorted() or list(). It samples without replacement, which means that each element of the sequence can appear in the resulting list at most once.
In [22]:
from random import sample

print(sample('Nepal',3)) # Selects any 3 chars from a string
print(sample((21, 12, -31, 24, 65, 16.3), 3)) # Creates a list of any three elements from a base tuple
print(sample([11, 12, 13, 14, -11, -12, -13, -14], 3)) # Randomly selects a list of three elements from a base list
print(sample(sorted({110, 120, 130, 140}), 3)) # Randomly selects three elements from a set of numbers (converted to a sorted list first)
print(sample(sorted({'ABC', 'BCD', 'CDE', 'EFG'}), 3)) # Randomly selects three elements from a set of strings (converted to a sorted list first)
['a', 'p', 'N']
[21, 24, -31]
[-12, -14, 11]
[140, 120, 130]
['EFG', 'ABC', 'CDE']

6. Random.Random() Function.

It returns the next random floating-point number from the range [0.0, 1.0). This is a semi-open range: the function always returns a number less than the upper bound 1.0, but it may return 0.0.
In [26]:
from random import random
print(random())
print(random())
print(random())
0.261579937049
0.608479898102
0.657566887448

7. Random.Uniform(Lower, Upper) Function.

It extends the random() function: you specify lower and upper bounds, and it returns a random floating-point number between them instead of between 0 and 1.
In [31]:
import random
print(random.uniform(500, 900))
print(random.uniform(500, 900))
print(random.uniform(500, 900))
506.356121176
733.243790287
729.920066651
In [34]:
# Generate a floating-point random number with fixed precision
import random

lower = 1.0
upper = 2.0
fixed_precision = 2
random_float = random.uniform(lower, upper)
print(round(random_float, fixed_precision))
# round() rounds to the fixed precision, which is 2 decimal places in this case
1.32

REGEX - REGular EXpressions in Python

A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. The module re provides full support for Perl-like regular expressions in Python. The re module raises the exception re.error if an error occurs while compiling or using a regular expression.

Match() Function

This function attempts to match RE pattern to string with optional flags.
Syntax: re.match(pattern, string, flags=0)
In [8]:
import re

line = "Maths is easier than Literature"

matchObj = re.match( r'(.*) is (.*?) .*', line, re.M|re.I)

if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
else:
    print("No match!!")
matchObj.group() :  Maths is easier than Literature
matchObj.group(1) :  Maths
matchObj.group(2) :  easier

Search() Function

This function searches for first occurrence of RE pattern within string with optional flags.
Syntax: re.search(pattern, string, flags=0)
In [10]:
import re

line = "Maths is easier than Literature"

searchObj = re.search( r'(.*) is (.*?) .*', line, re.M|re.I)

if searchObj:
    print("searchObj.group() : ", searchObj.group())
    print("searchObj.group(1) : ", searchObj.group(1))
    print("searchObj.group(2) : ", searchObj.group(2))
else:
    print("Nothing found!!")
searchObj.group() :  Maths is easier than Literature
searchObj.group(1) :  Maths
searchObj.group(2) :  easier
match() checks for a match only at the beginning of the string, while search() checks for a match anywhere in the string.
In [11]:
import re

line = "Cars are faster than bicycles"

matchObj = re.match( r'bicycles', line, re.M|re.I)
if matchObj:
    print("match --> matchObj.group() : ", matchObj.group())
else:
    print("No match!!")

searchObj = re.search( r'bicycles', line, re.M|re.I)
if searchObj:
    print("search --> searchObj.group() : ", searchObj.group())
else:
    print("Nothing found!!")
No match!!
search --> searchObj.group() :  bicycles

Sub()

This method replaces occurrences of the RE pattern in string with repl. It substitutes all occurrences unless count is provided, in which case at most count replacements are made. This method returns the modified string.
Syntax: re.sub(pattern, repl, string, count=0)
In [16]:
import re

phone = "610-845-2975 # This is Phone Number"

num = re.sub(r'#.*$', "", phone) #Removing the trailing comment (everything from # onwards)
print("Phone Num : ", num)

num = re.sub(r'\D', "", phone) #Removing all characters except digits.
print("Phone Num : ", num)

num = re.sub(r'610',"972", phone) #replacing the area code of the phone number
print("Phone Num : ", num)
Phone Num :  610-845-2975 
Phone Num :  6108452975
Phone Num :  972-845-2975 # This is Phone Number
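
The effect of limiting the number of replacements can be seen in a short sketch. It reuses the phone number from the cell above; the variable names all_replaced and first_only are chosen here for illustration.

```python
import re

phone = "610-845-2975"

# With the default count=0, every hyphen is replaced.
all_replaced = re.sub(r'-', ' ', phone)
print(all_replaced)   # 610 845 2975

# With count=1, only the first (leftmost) hyphen is replaced.
first_only = re.sub(r'-', ' ', phone, count=1)
print(first_only)     # 610 845-2975
```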

Wild Card Characters: Special Characters

Special characters are characters which do not match themselves as seen but actually have a special meaning when used in a regular expression.
The most widely used special characters are:
. - A period. Matches any single character except newline character.
In [19]:
re.search(r'Co.k.e', 'Cookie').group() #The group() function returns the string matched by the re.
Out[19]:
'Cookie'
\w - Lowercase w. Matches any single letter, digit or underscore.
In [20]:
re.search(r'Co\wk\we', 'Cookie').group()
Out[20]:
'Cookie'
\W - Uppercase w. Matches any character not part of \w (lowercase w).
In [22]:
re.search(r'C\Wke', 'C@ke').group()
Out[22]:
'C@ke'
\s - Lowercase s. Matches a single whitespace character like: space, newline, tab, return.
\S - Uppercase s. Matches any character not part of \s (lowercase s).
In [25]:
print(re.search(r'Eat\scake', 'Eat cake').group())
print(re.search(r'Cook\Se', 'Cookie').group())
Eat cake
Cookie
In [32]:
#\d - Lowercase d. Matches decimal digit 0-9.
print(re.search(r'c\d\dkie', 'c00kie').group())

#^ - Caret. Matches a pattern at the start of the string.
print(re.search(r'^Eat', 'Eat cake').group())

#$ - Matches a pattern at the end of string.
print(re.search(r'cake$', 'Eat cake').group())

#[abc] - Matches a or b or c.
#[a-zA-Z0-9] - Matches any single letter (a to z or A to Z) or digit (0 to 9). Characters that are not within a set can be matched by complementing it: if the first character of the set is ^, any character that is not in the set will be matched.
print(re.search(r'Number: [0-6]', 'Number: 5').group())

#Matches any character except 5
print(re.search(r'Number: [^5]', 'Number: 0').group())
#\A - Uppercase a. Matches only at the start of the string, even in multiline mode (unlike ^, which then also matches after each newline).
print(re.search(r'\A[A-E]ookie', 'Cookie').group())
#\b - Lowercase b. Matches only at the beginning or end of a word.
print(re.search(r'\b[A-E]ookie', 'Cookie').group())

#\ - Backslash. Escapes the character that follows it. Before a special character such as . or \, it makes that character match literally; before certain letters it forms a special sequence such as \s or \d. Writing patterns as raw strings (r'...') keeps Python from interpreting the backslashes before the re module sees them.

# The doubled backslash '\\' matches a literal '\', so the 's' here is an ordinary letter rather than part of '\s'
print(re.search(r'Back\\stail', r'Back\stail').group())

# Here '\s' is the whitespace class because its backslash is not itself escaped
print(re.search(r'Back\stail', 'Back tail').group())
c00kie
Eat
cake
Number: 5
Number: 0
Cookie
Cookie
Back\stail
Back tail

Repetitions

It becomes quite tedious if you are looking to find long patterns in a sequence. Fortunately, the re module handles repetitions using the following special characters:
In [33]:
# + - Matches one or more repetitions of the character to its left.
print(re.search(r'Co+kie', 'Cooookie').group())

# * - Matches zero or more repetitions of the character to its left.
# Here: any number of a's (including none) followed by any number of o's
print(re.search(r'Ca*o*kie', 'Caokie').group())

# ? - Matches zero or one occurrence of the character to its left.
# Here: the u is optional, so both 'Color' and 'Colour' match
print(re.search(r'Colou?r', 'Color').group())
Cooookie
Caokie
Color
But what if you want to match an exact number of repetitions?
For example, when checking the validity of a phone number in an application. The re module handles this as well, using the following quantifiers:
{x} - Repeat exactly x times.
{x,} - Repeat at least x times.
{x,y} - Repeat at least x times but no more than y times (written without a space after the comma).
The +, * and {x,y} quantifiers are said to be greedy: they match as many repetitions as possible. Appending ? (as in +? or *?) makes them match as few as possible.
In [34]:
re.search(r'\d{9,10}', '0987654321').group()
Out[34]:
'0987654321'
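
As a sketch of the phone number check mentioned above, the pattern below assumes a NNN-NNN-NNNN format; that format and the helper name is_valid_phone are assumptions made for illustration, not a general phone number validator. The last two lines contrast a greedy quantifier with its non-greedy form.

```python
import re

# Assumed format for illustration: three digits, three digits, four digits,
# separated by hyphens, with nothing before or after.
phone_pattern = re.compile(r'^\d{3}-\d{3}-\d{4}$')

def is_valid_phone(text):
    """Return True if text matches the assumed NNN-NNN-NNNN format."""
    return phone_pattern.search(text) is not None

print(is_valid_phone('610-845-2975'))   # True
print(is_valid_phone('610-845-29755'))  # False: five digits in the last group

# Greedy versus non-greedy: + takes as many digits as it can, +? as few.
greedy = re.search(r'\d+', '0987654321').group()
lazy = re.search(r'\d+?', '0987654321').group()
print(greedy)  # 0987654321
print(lazy)    # 0
```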
Here's a cheat sheet for the regular expressions: https://www.dataquest.io/blog/regex-cheatsheet/

Map, Filter and Reduce

Map()

map() applies a function to every item of an input list. In Python 3 it returns an iterator, so wrap the result in list() to get a list. Syntax: map(function_to_apply, list_of_inputs)
In [36]:
#Without using Map()
items = [1, 2, 3, 4, 5]
squared = []
for i in items:
    squared.append(i**2)
print(squared)
[1, 4, 9, 16, 25]
In [38]:
#Using Map()
items = [1, 2, 3, 4, 5]
squared = list(map(lambda x: x**2, items))
print(squared)
[1, 4, 9, 16, 25]
In [39]:
#We can also use a list of functions
def multiply(x):
    return (x*x)
def add(x):
    return (x+x)

funcs = [multiply, add]
for i in range(5):
    value = list(map(lambda x: x(i), funcs))
    print(value)
[0, 0]
[1, 2]
[4, 4]
[9, 6]
[16, 8]

Filter()

filter() keeps the elements for which a function returns true. In Python 3 it returns an iterator, so wrap the result in list() to get a list.
In [40]:
number_list = range(-5, 5)
less_than_zero = list(filter(lambda x: x < 0, number_list))
print(less_than_zero)
[-5, -4, -3, -2, -1]

Reduce()

Reduce is a really useful function for performing some computation on a list and returning the result. It applies a rolling computation to sequential pairs of values in a list.
In [43]:
from functools import reduce  # reduce is no longer a built-in in Python 3

product = reduce((lambda x, y: x * y), [1, 2, 3, 4])
print(product) # prints the product of the integers in the list [1,2,3,4]
24
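
The "rolling computation" can be spelled out with an explicit loop. This is a minimal sketch for comparison; the variable names are chosen here for illustration, and the third argument to reduce (an initial value) is optional.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# Explicit loop: carry a running result and combine it with each next item.
running = numbers[0]
for value in numbers[1:]:
    running = running * value
print(running)  # 24

# The same computation with reduce, here with an initial value of 1.
product = reduce(lambda x, y: x * y, numbers, 1)
print(product)  # 24
```

Passing an initial value also lets reduce handle an empty list, for which it would otherwise raise a TypeError.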

File Handling in Python

File handling in Python requires no importing of modules. Instead, we can use the built-in open() function, which returns a file object. That object provides the basic methods needed to read and write files.

Open()

The open() function opens a file on our system; the filename is the name of the file to be opened. The mode indicates how the file is going to be opened: "r" for reading, "w" for writing and "a" for appending.
In [45]:
filename = "hello.txt"
file = open(filename, "r")
for line in file:
    print(line, end="")
file.close()
This is a hello.txt.
Using this for file handling.

Read ()

The file object offers three methods for reading:
read() -- returns the whole file as one big string
readline() -- returns one line at a time
readlines() -- returns a list of lines

Write ()

Two methods write to a file:
write() -- writes a single string to the file
writelines() -- writes a list of strings to the file, without adding line separators

Append ()

Append mode is used to add to the end of a file instead of overwriting it. To append to an existing file, simply open the file in append mode ("a").

Close()

When you’re done with a file, use close() to close it and free up any system resources taken up by the open file.
In [49]:
#To open a text file, use:
fh = open("hello.txt", "r")
fh.close()

#To read a text file, use:
fh = open("hello.txt", "r")
print(fh.read())
fh.close()

#To read one line at a time, use:
fh = open("hello.txt", "r")
print(fh.readline())
fh.close()

#To read a list of lines use:
fh = open("hello.txt", "r")
print(fh.readlines())
fh.close()

#To write to a file, use:
fh = open("hello.txt", "w")
fh.write("Hello World")
fh.close()

#To write a list of strings to a file, use (writelines adds no newlines):
fh = open("hello.txt", "w")
lines_of_text = ["a line of text", "another line of text", "a third line"]
fh.writelines(lines_of_text)
fh.close()

#To append to a file, use:
fh = open("hello.txt", "a")
fh.write("Hello World again")
fh.close()

#To close a file, use:
fh = open("hello.txt", "r")
print(fh.read())
fh.close()
a line of textanother line of texta third line
a line of textanother line of texta third line
['a line of textanother line of texta third line']
a line of textanother line of texta third lineHello World again
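
The same operations can also be written with the with statement, which is not used in the cells above but closes the file automatically, even if an error occurs. The following is a minimal sketch; the file name demo.txt and the use of a temporary directory are assumptions made so that the example does not touch any existing file.

```python
import os
import tempfile

# Work in a temporary directory so no existing file is touched.
# The file name "demo.txt" is a placeholder chosen for this sketch.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.txt")

    # "w" creates or truncates the file; writelines adds no newlines itself.
    with open(path, "w") as fh:
        fh.writelines(["first line\n", "second line\n"])

    # "a" appends to the end instead of overwriting.
    with open(path, "a") as fh:
        fh.write("third line\n")

    with open(path, "r") as fh:
        first = fh.readline()   # one line, including its newline
        rest = fh.readlines()   # the remaining lines as a list

print(repr(first))  # 'first line\n'
print(rest)         # ['second line\n', 'third line\n']
```

Writing the newline characters explicitly avoids the run-together output seen above, where writelines() joined the strings without separators.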

List Comprehensions

List comprehensions provide a concise way to create lists. It consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses. The expressions can be anything, meaning you can put in all kinds of objects in lists. The result will be a new list resulting from evaluating the expression in the context of the for and if clauses which follow it.
The list comprehension always returns a result list.
Syntax: [expression for item in list if conditional]
In [53]:
x = [i for i in range(10)]
print(x)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [55]:
squares = []
for x in range(10):
    squares.append(x**2)
print(squares)

# Or you can use list comprehensions to get the same result:
squares = [x**2 for x in range(10)]
print(squares)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
In [56]:
listOfWords = ["this","is","a","list","of","words"]
items = [ word[0] for word in listOfWords ]
print(items)
['t', 'i', 'a', 'l', 'o', 'w']
In [57]:
[x.lower() for x in ["A","B","C"]]
Out[57]:
['a', 'b', 'c']
In [65]:
string = "Hello 12345 World"
numbers = [x for x in string if x.isdigit()]
print(numbers)
['1', '2', '3', '4', '5']
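
The "zero or more for or if clauses" part of the syntax can be illustrated with two further sketches; the variable names are chosen here for illustration.

```python
# A for clause followed by an if clause: squares of the even numbers below 10.
even_squares = [x**2 for x in range(10) if x % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

# Two for clauses and an if clause: every (x, y) pair with x != y,
# equivalent to two nested loops with a condition inside.
pairs = [(x, y) for x in range(3) for y in range(3) if x != y]
print(pairs)  # [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]
```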
