# Python Crash Course

## Variables

Python is a *dynamically typed* language: the data type of the variable is determined at runtime and does not have to be specified explicitly.

In [None]:
x = 5/2

In [None]:
x

In [None]:
type(x)

In [None]:
x = 5

In [None]:
x

In [None]:
type(x)

In [None]:
type(x)==float, isinstance(x, float)

Variables can change type after they have been set.

In [None]:
x = "Computational Linguistics" # x is now of type str
x

In [None]:
type(x)

## Arithmetic Operators

In [None]:
600+3     # Addition

In [None]:
3*(2+3)+2

In [None]:
17 / 3    # Division

In [None]:
2**6      # Exponentiation

In [None]:
17 // 3   # Floor division

In [None]:
-17 // 3


In [None]:
17 % 3    # Modulus
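
Floor division and modulus belong together: for any integers `a` and `b` (with `b` non-zero), `a == (a // b) * b + (a % b)` holds, which explains why `-17 // 3` rounds down to `-6` rather than towards zero. The following is a small sketch of that identity; the variable names are chosen for illustration only.

```python
# Check the identity a == (a // b) * b + (a % b) for positive and negative operands.
pairs = [(17, 3), (-17, 3), (17, -3)]
results = []
for a, b in pairs:
    q, r = a // b, a % b      # the same pair that divmod(a, b) returns
    results.append((q, r))
    print(f"{a} // {b} = {q}, {a} % {b} = {r}, identity holds: {q * b + r == a}")
```

Note that the remainder takes the sign of the divisor `b`.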

## Logical Operators

In [None]:
x = 10

In [None]:
x > 1 and x < 20 # Returns True if both statements are true

In [None]:
x > 5 or x > 20  # Returns True if one of the statements is true

In [None]:
not(x > 1 and x < 20)
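
Two details of these operators are worth a short sketch: comparisons can be chained, and `and`/`or` stop evaluating as soon as the result is known (short-circuit evaluation). The helper `is_big` below is a hypothetical function that records its calls, so we can see whether it ran.

```python
x = 10

chained = 1 < x < 20          # same meaning as: x > 1 and x < 20

checked = []                  # records every value is_big() was called with

def is_big(n):
    checked.append(n)
    return n > 20

either = x > 5 or is_big(x)   # left side is True, so is_big() is never called
both = x > 20 and is_big(x)   # left side is False, so is_big() is never called
print(chained, either, both, checked)
```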

## Modules

Python code is organized in *modules*. If we want to use a particular module, we have to *import* it first:

In [None]:
# load the module that is responsible for the calculation of random numbers
import random

In [None]:
# Select a random integer from 10 to 19 (the upper bound 20 is excluded)
random.randrange(10,20)

We can choose to import only parts from a module, by using the `from` keyword.

In [None]:
from random import randrange
randrange(10)

## Control structures

Python's syntax is __indentation-based__ (as opposed to, for example, the brace-based syntax of many other languages), so code formatting matters!

Python has the usual control structures `if`, `while` and `for`:

#### if

In [None]:
x = random.randrange(10)
y = random.randrange(10)
print(f"x = {x}")
print(f"y = {y}")

In [None]:
if x > y : 
    print("x is greater than y")
elif x < y:
    print("y is greater than x")
else:
    print("x and y are equal")

<div class="alert alert-warning">

**Python f-Strings** allow generating strings out of variables with automated formatting. See below.

</div>

#### Conditional Expression

Introduced in PEP 308, and often referred to as a ternary operator:

```python
x = x_if_true if condition else x_if_false
```
which is the succinct version of:

```python
if condition:
    x = x_if_true
else:
    x = x_if_false
```

In [None]:
sun_shining = True
x = 35 if sun_shining else -4
print(x)

#### while

In [None]:
x = 1
while x / 2 > 0 :
    x /= 2
print(f"{x} is the smallest positive number I know.")

In [None]:
type(x)
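
The value the loop ends on is not arbitrary. Assuming Python floats are IEEE 754 double-precision numbers (which is the case on common platforms), it is the smallest positive subnormal number, and it can be reconstructed from the constants in `sys.float_info`. A minimal sketch:

```python
import sys

x = 1
while x / 2 > 0:
    x /= 2

# Smallest normal float times the machine epsilon gives the smallest subnormal float.
smallest = sys.float_info.min * sys.float_info.epsilon
print(x == smallest)   # compare the loop result with the reconstructed value
print(x / 2)           # halving once more underflows to zero
```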

#### for

In [None]:
for i in range(10,15):
    print(i)
    print(i**2)

In [None]:
list(range(10,15,2))

## Lists, Tuples, Sets, and Dictionaries

There are four collection data types in the Python programming language:

- **List** is a collection which is ordered and changeable. Allows duplicate members.
- **Tuple** is a collection which is ordered and unchangeable. Allows duplicate members.
- **Set** is a collection which is unordered and unindexed. No duplicate members.
- **Dictionary** is a collection of key-value pairs which is ordered (by insertion, since Python 3.7), changeable, and indexed by key. No duplicate keys.
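
As a quick side-by-side sketch of the four types, the same items can be loaded into each collection (the variable names are illustrative only):

```python
items = ["b", "a", "b"]

as_list = list(items)               # ordered, changeable, keeps duplicates
as_tuple = tuple(items)             # ordered, unchangeable, keeps duplicates
as_set = set(items)                 # unordered, duplicates removed
as_dict = dict.fromkeys(items, 0)   # unique keys, each mapped to the value 0

as_list.append("c")                 # lists can grow; tuples cannot
print(as_list, as_tuple, sorted(as_set), as_dict)
```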

In [None]:
my_list = ['one', 'two', 'three', 'four', 'five']
my_list

In [None]:
len(my_list)

In [None]:
my_list[0]

In [None]:
my_list[-2]

In [None]:
my_list[2:4]

To add an item to the end of the list, we can use the `append()` method:

In [None]:
my_list.append('six')
my_list

In [None]:
my_list.append(3.14)
my_list

In [None]:
my_list.extend(["seven", "eight"])
my_list

A **Tuple** is a collection which is ordered and unchangeable.

In [None]:
a = ('one', 'two', 'three')

In [None]:
a[0]

In [None]:
a[1] = 'four' # This will throw an error

A **set** is a collection which is unordered and unindexed. Items in a set cannot be changed in place, but you can add and remove items.

In [None]:
set1 = {"a", "b","c"} 
set1

In [None]:
["a", "b","c", "b","c"]

In [None]:
set2 = set(["a", "b","c", "b","c"]) 
set2

In [None]:
l = ["a", "b","c", "b","c"]
l = list(set(l))
l

In [None]:
# Add an item to a set
set2.add("f")
set2

In [None]:
# Add multiple items to a set (update() expects an iterable of items)
set2.update(["g", "h"])
set2

The `union()` method returns a new set with all items from both sets:

In [None]:
things = {"book", "table", "chair"}
colors = {"blue", "red", "green"}
words = colors.union(things) # or colors | things
words

In [None]:
words - colors # OR words.difference(colors)

In [None]:
# Intersection
words & colors # OR words.intersection(colors)

A **dictionary** is a key-value collection which is changeable and indexed. In Python, dictionaries are written with curly brackets, with a colon between each key and its value.

In [None]:
d = { 'abc': 10, 'def' : True }

In [None]:
d['def']

In [None]:
d['z'] = ['Python', 'is', 'a', 'great', 'language']

In [None]:
d

In [None]:
len(d)

In [None]:
#  Iterate by key:
for x in d.keys():
    print(f"key {x}: value {d[x]}")

In [None]:
#  Iterate by value:
for x in d.values():
    print(x)

In [None]:
#  Iterate by key and value
for key, value in d.items():
    print(key, value)

Useful functions if you are uncertain about the set of keys:

In [None]:
d["l"] # This will throw an error

In [None]:
"z" in d, "l" in d

In [None]:
d.get("abc", "default_value"), d.get("a", "default_value")

## List Comprehensions

List comprehensions are a tool for transforming one list into another list. During this transformation, elements can be conditionally included in the new list and each element can be transformed as needed.

Every list comprehension can be rewritten as a for loop, but not every for loop can be rewritten as a list comprehension.

```python
new_list = []
for ITEM in old_list:
    if condition_based_on(ITEM):
        new_list.append(function(ITEM))
```

The above *for* loop can be rewritten as a list comprehension like this:
```python
new_list = [function(ITEM) for ITEM in old_list if condition_based_on(ITEM)]
```

In [None]:
numbers = [1, 2, 3, 4, 5]
doubled_odds = []
for n in numbers:
    if n % 2 == 1:
        doubled_odds.append(n * 2)
doubled_odds

In [None]:
doubled_odds = [n * 2 for n in numbers if n % 2 == 1]
doubled_odds
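
As noted above, not every for loop maps cleanly onto this pattern. A loop whose steps depend on earlier steps, or which stops early, is usually better left as a loop. The running total below is one such case (an illustrative sketch, not the only possible example):

```python
numbers = [1, 2, 3, 4, 5]

# Each total depends on the previous one, and the loop stops early.
running_totals = []
total = 0
for n in numbers:
    total += n
    if total > 6:
        break
    running_totals.append(total)
print(running_totals)
```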

### Nested Loops
Here's a for loop that flattens a matrix (a list of lists):

In [None]:
matrix = [[1,2,3,4], [5,6,7,8], [9,10,11,12]]
flattened = []
for row in matrix:
    for n in row:
        flattened.append(n)
flattened

In [None]:
flattened = [n for row in matrix for n in row]
flattened

## Strings

A string is a Python object that represents a sequence of characters.

In [None]:
s = 'Python is a great programming language'
print(s)

In [None]:
s = "Python is a great programming language"
print(s)

In [None]:
# Multiline Strings
s = """Python is a great
 programming language."""

print(s)

# Note: the line breaks are inserted at the same position as in the code.

In effect, a string behaves very much like a list: square brackets can be used to access elements of the string.

In [None]:
s = "Python is a great programming language!"
s[0]

In [None]:
# String Length
len(s)

In [None]:
# Slicing
s[18:29] # Specify the start index and the end index, separated by a colon, to return a part of the string.

In [None]:
# Use negative indexes to start the slice from the end of the string
s[-10:-2]

In [None]:
# Indexes can also be omitted if they 
# correspond to the beginning or end
s[:15]

In [None]:
s[15:]

In [None]:
s[-15:]

In [None]:
s[:-15]

In [None]:
# Iterating over the string returns individual characters:
for c in s:
    print(c)
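
One difference from lists is worth keeping in mind: strings are immutable, so indexing works for reading but not for assignment. A short sketch of the contrast:

```python
s = "Python"
letters = list(s)

letters[0] = "J"            # a list can be changed in place
try:
    s[0] = "J"              # a string cannot
except TypeError as err:
    print(f"TypeError: {err}")

s = "J" + s[1:]             # build a new string instead
print(s, "".join(letters))
```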

Python has a set of built-in methods that you can use on strings.

In [None]:
s.lower() # returns the string in lower case

In [None]:
s.upper() # returns the string in upper case

In [None]:
s.split(" ") # Splits the string at the specified separator (in this example the white space), and returns a list

In [None]:
s = 'This is an example'
ss = ' '.join([s, s.upper(), s.lower()])
print(ss)

ss = "_".join(["an", "example", "here"])
print(ss)

In [None]:
s = "Python "
s.isalpha() # Returns True if all characters in the string are letters (the trailing space makes this False)

In [None]:
s = "Python"
s.isalpha()

In [None]:
s = "Python is a great programming language!"
print(s.split(' '))
[w.isalpha() for w in s.split(' ')]

In [None]:
[w for w in s.split(' ') if 'i' in w]

## f-strings

A modern and easy way to format strings, mostly for output purposes.

In [None]:
s = f"A pythonic f-string"
s

In [None]:
result = 3.14
number = 42

print(f"The result is {result} and the number is {number}. Added up it's {result + number}")
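
Inside the curly braces, an optional format specification after a colon controls how the value is rendered. The sketch below shows three common cases; the variable names are illustrative only.

```python
result = 3.14159
number = 42

fixed = f"{result:.2f}"     # fixed-point notation with two decimal places
padded = f"{number:>6}"     # right-aligned in a field six characters wide
percent = f"{0.256:.1%}"    # multiplied by 100 and shown with a percent sign
print(fixed, padded, percent)
```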

## Sorting, Minimum and Maximum

In [None]:
my_list = [2,4,7,12,6,8,2,3,4,5]

In [None]:
min(my_list)

In [None]:
max(my_list)

In [None]:
sorted(my_list, reverse=False)

In [None]:
my_list = 'This is a sentence that serves as an example for splitting strings.'.split(' ')
my_list

In [None]:
min(my_list)

In [None]:
max(my_list)

In [None]:
sorted(my_list)

All three functions above accept an additional parameter `key`: a function that is applied to each element to compute the value used for comparison:

In [None]:
# Sort words by length
sorted(my_list, key=len)

In [None]:
sorted(my_list, key=len, reverse = True) # Reversing the Sort Order
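
The `key` parameter works the same way for `min()` and `max()`. A brief sketch, using a shortened example sentence (the list `words` is introduced here for illustration):

```python
words = "This is a sentence that serves as an example".split(" ")

shortest = min(words, key=len)   # the word with the smallest length
longest = max(words, key=len)    # the word with the largest length
print(shortest, longest)
```

If several elements share the smallest or largest key, the first one encountered is returned.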

In [None]:
# Ignore case when sorting
sorted(my_list, key=lambda w: w.lower())

<div class="alert alert-warning">
A <b>lambda</b> function is a small anonymous function. It can take any number of arguments but can only have one expression.
    </div>

In [None]:
x = lambda a : a.lower()
print(x("Berlin"))

In [None]:
def myfunc(n):
    return lambda a : a * n

mydoubler = myfunc(2)
mytripler = myfunc(3)

print(mydoubler(11))
print(mytripler(11))

### Sorting Dictionaries

In [None]:
prices = {'Apple': 1.99, 'Banana': 0.99, 'Orange': 1.49, 'Cantaloupe': 3.99, 'Grapes': 0.39}
sorted(prices) # List of Sorted Keys

In [None]:
sorted(prices.keys()) # List of Sorted Keys

In [None]:
sorted(prices.values()) # List of Sorted Values

In [None]:
sorted(prices.items()) #Sorting by Keys

In [None]:
sorted(prices.items(), key = lambda x : x[1]) # Sorting by Values

In [None]:
sorted(prices.items(), key = lambda x : x[1], reverse=True)# Reversing the Sort Order

## Functions

In Python a function is defined using the `def` keyword:

In [None]:
def pow_self(x):
    return x**x

for i in range(10):
    print(pow_self(i))

Since Python is a dynamically typed language, problems often occur when different data types behave similarly: methods and functions do **not** fail as long as the operations they use exist and are meaningful for the given type. A useful feature of Python is _type hinting_, which lets you hint at the data types a function expects and returns:

In [None]:
def int_pow_self(x: int) -> int:
    return x**x

for i in range(10):
    print(int_pow_self(i))

This is useful for documentation, but Python does **not** check the hints at runtime, so a call with another data type only fails if the function body itself cannot handle that type:

In [None]:
int_pow_self(float(i))


In [None]:
int_pow_self(str(i)) # This will throw an error: ** is not defined for strings
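
To see that the hints are only metadata, we can inspect them on the function object. If a runtime check is wanted, it has to be written by hand; `checked_pow_self` below is a hypothetical variant that sketches one way to do this.

```python
def int_pow_self(x: int) -> int:
    return x**x

# The hints are stored on the function object but never enforced.
print(int_pow_self.__annotations__)
unchecked = int_pow_self(2.0)        # accepted despite the int hint

def checked_pow_self(x: int) -> int:
    """Hypothetical variant that enforces the hint by hand."""
    if not isinstance(x, int):
        raise TypeError(f"expected int, got {type(x).__name__}")
    return x**x

try:
    checked_pow_self(2.0)
except TypeError as err:
    message = str(err)
    print(message)
```

External static type checkers can also use the hints to flag such calls before the code runs.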

## File Handling

The key function for working with files in Python is the `open()` function. Its two most important parameters are the file name and the `mode`. There are four different modes for opening a file:

- "r" - Read - Default value. Opens a file for reading, error if the file does not exist
- "a" - Append - Opens a file for appending, creates the file if it does not exist
- "w" - Write - Opens a file for writing, creates the file if it does not exist, and overwrites any existing content
- "x" - Create - Creates the specified file, raises an error if the file exists
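
The differences between the modes can be tried out safely in a temporary directory. The sketch below uses the standard `tempfile` module and a hypothetical file name, and relies on the `with` statement to close each file automatically.

```python
import os
import tempfile

# Work in a throwaway directory so that no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "modes_demo.txt")   # hypothetical file name

    with open(path, "x") as f:      # create: the file must not exist yet
        f.write("first\n")
    with open(path, "a") as f:      # append: keeps the existing content
        f.write("second\n")
    with open(path, "r") as f:
        after_append = f.read()

    with open(path, "w") as f:      # write: replaces the existing content
        f.write("third\n")
    with open(path) as f:           # "r" is the default mode
        after_write = f.read()

    create_failed = False
    try:
        open(path, "x")             # create again: the file already exists
    except FileExistsError:
        create_failed = True

print(repr(after_append), repr(after_write), create_failed)
```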

In [None]:
fs = open("bible.txt","r") 
contents = fs.readlines() # read()
print(contents[:50])
fs.close()

In [None]:
f = open("test.txt", "w") # Overwrite the content
f.write("Woops! I have deleted the content!") 
f.close()

In [None]:
f = open("test.txt", "a") # Append content to the file
f.write("This is a new Content!") 
f.close()

In [None]:
fs = open("test.txt","r") 
contents = fs.read()
print(contents)
fs.close()

It is good practice to always `close()` an opened file after using it. A convenient way to handle opening/closing files (among other things) is the `with` statement, which automatically takes care of closing opened files:

In [None]:
with open("test.txt","r") as fs: 
    contents = fs.read()
    print(contents)
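
The file object's `closed` attribute lets us check this behaviour. The sketch below, which writes to a temporary directory with a hypothetical file name, also shows that the file is closed when an error occurs inside the `with` block.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "with_demo.txt")    # hypothetical file name

    with open(path, "w") as f:
        f.write("hello")
        closed_inside = f.closed     # still open inside the block
    closed_after = f.closed          # closed as soon as the block ends

    try:
        with open(path) as f:
            raise ValueError("something went wrong")
    except ValueError:
        closed_after_error = f.closed   # closed even though an error occurred

print(closed_inside, closed_after, closed_after_error)
```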

## Exercises
<div class="alert alert-info">

### Exercise 1
A) Using a list comprehension, create a new list called "newlist" out of the list "numbers", which contains only the positive numbers from the list.
</div>

In [None]:
numbers = [34.6, -203.4, 44.9, 68.3, -12.2, 44.6, 12.7]
newlist= [x for x in numbers if x > 0.0]
print(newlist)

<div class="alert alert-info">
B) Rewrite the following for loop using a list comprehension


```python
number_list = []
for x in range(100):
    if x % 3 == 0:
        if x % 5 == 0:
            number_list.append(x)
print(number_list)
```
</div>

In [None]:
number_list=[ x for x in range(100) if x%3 == 0 and x%5 == 0]
print(number_list)

<div class="alert alert-info">

### Exercise 2
Write a function to remove all punctuation from a string

You can use:

```python
import string
string.punctuation
```

</div>

In [None]:
import string
string.punctuation

In [None]:
sentence = "he said: 'what are you doing?'"
def clean(sentence):
    return "".join([c for c in sentence if c not in string.punctuation])
# Output should be 'he said what are you doing'
clean(sentence)

<div class="alert alert-info">
    
### Exercise 3

In the folder *Data* you will find a file *bible.txt*. Write Python code to:

- Calculate the average word length
- Calculate the average verse length
- Print the number of unique words in the text
- Get the set of unique words in the text and save them to *uniqueWords.txt*
- Get the most frequent words and save them to a file called *statistics.txt*

</div>

In [None]:
with open('bible.txt') as infile:
    contents = infile.readlines()
contents[:5]

In [None]:
contents = [ s.strip() for s in contents if s != '\n']
contents[:15]

In [None]:
tokenized_contents = [ clean(s).split() for s in contents] # split() without arguments avoids empty tokens
tokenized_contents[:5]

In [None]:
# Average Word Length
word_lengths = [len(token) for verses in tokenized_contents for token in verses]
avg_word_lengths = sum(word_lengths)/len(word_lengths)
print(f"{avg_word_lengths} chars per word")

In [None]:
# Average Verse Length
verse_lengths = [len(verse) for verse in tokenized_contents]
avg_verse_lengths = sum(verse_lengths)/len(verse_lengths)
print(f"{avg_verse_lengths} words per verse")

In [None]:
# Word Counts
wordlist = [word for verse in tokenized_contents for word in verse]
vocab = set(wordlist)
# Number of unique Words
print(f"Number of unique Words={len(vocab)}")

In [None]:
# save unique words
with open("uniqueWords.txt", "w") as f: # "w" avoids duplicated content when the cell is re-run
    f.write("\n".join(vocab))

# Create Dictionary
word_frequencies = {word : 0 for word in vocab}

for w in wordlist:
    word_frequencies[w] += 1
print(list(word_frequencies.items())[:10])

from collections import Counter
frequencies = Counter(wordlist)
list(frequencies.items())[:10]

In [None]:
# save most frequent words
word_frequencies = sorted(word_frequencies.items(), key = lambda x : x[1], reverse=True)

most_frequent_words = [word for word, _ in word_frequencies[:20]]
with open("statistics.txt", "w") as f: # "w" avoids duplicated content when the cell is re-run
    f.write("\n".join(most_frequent_words))
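
As a shorter alternative to sorting the dictionary items by hand, `collections.Counter` offers a `most_common()` method. The sketch below uses a small stand-in word list instead of the tokens from *bible.txt*.

```python
from collections import Counter

# Stand-in sample; in the exercise this would be the full word list.
sample_words = "the earth and the heaven and the sea".split()

sample_frequencies = Counter(sample_words)
top_two = sample_frequencies.most_common(2)   # (word, count) pairs, most frequent first
print(top_two)
```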

## Python Classes

Classes are a way to bundle up functions and variables into an object (Object-Oriented Programming).
That way data and corresponding functionality are enclosed in a newly defined data type.


In Python, a class is defined using the keyword **class**. Its body forms a new scope, which is marked by indentation.


In [None]:
class Student:
    pass  

This is a new class called `Student`, which so far does nothing but define the data type.
An `instance` of the class can be created like this:

In [None]:
student = Student()
student

### Methods

Inside the class we can define functions associated with it, which are called _methods_. The association is made through the first parameter, conventionally named `self`, which refers to the instance. A method is called via the `instance.method()` syntax.

In [None]:
class Student:
    def print_name(self):
        print("Jane Doe")
    
student = Student()
student.print_name()
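
To make the role of `self` concrete: calling a method on an instance is equivalent to calling the function on the class and passing the instance explicitly. The method `describe` below is a hypothetical addition used only for this sketch.

```python
class Student:
    def print_name(self):
        print("Jane Doe")

    def describe(self):
        # self is the instance the method was called on
        return f"self is a {type(self).__name__}"

student = Student()
via_instance = student.describe()         # Python passes student as self
via_class = Student.describe(student)     # the same call written out explicitly
print(via_instance, via_instance == via_class)
```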

### Attributes

We can also define persistent variables that are stored inside the instance via `self`. Variables saved this way are called attributes of the instance. They persist for the lifetime of the instance: you can set an attribute in one method and access its updated state in another method.

In [None]:
class Student:
    def set_name(self, name):
        """Save the argument as an instance attribute via self"""
        self.name = name
        
    def print_name(self):
        """Print the value of the name attribute via self"""
        print(self.name)
    
student = Student() # Create a new Instance
student.set_name("Jane Doe") # Set the name

student.print_name() # Print the name
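
Attributes belong to a single instance, not to the class as a whole. The following sketch creates several independent instances (the variable names are illustrative) and shows that an attribute only exists once it has been set:

```python
class Student:
    def set_name(self, name):
        self.name = name

first = Student()
second = Student()
first.set_name("Jane Doe")
second.set_name("John Doe")
print(first.name, second.name)     # each instance keeps its own name

unnamed = Student()
has_name = hasattr(unnamed, "name")   # no name attribute before set_name() is called
print(has_name)
```

This is one reason to set all attributes at instantiation time.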

### Constructor


There is one special method, `__init__`, often called the `constructor` or the initialization method. It is called during instantiation and creates the initial state of the class instance.

In [None]:
class Student:
    def __init__(self, name, birthday, course):
        """This method is called during the Student() class instantiation"""
        self.name = name
        self.birthday = birthday
        self.course = course
        
    def print_name(self):
        print(self.name)

student = Student(name="Jane Doe", birthday = "01-01-1990", course="Linguistische Informatik")
student.print_name()

These methods can be arbitrarily complex and have access to all of the class's attributes and methods. For example, you can call one method of a class from inside another method of the same class.

In [None]:
class Student:
    def __init__(self, name, birthday, courses):
        """This method is called during the Student() class instantiation"""
        self.name = name
        self.birthday = birthday
        self.courses = courses
        
    def print_name(self):
        print(self.name)
    
    def print_number_of_courses(self):
        print(len(self.courses))

    def add_course(self, course):
        if course not in self.courses:
            self.courses.append(course)
        else:
            print(f"Warning: course {course} is already taken")

    def add_multiple_courses(self, multiple_courses):
        for course in multiple_courses:
            self.add_course(course) 

student = Student(name="Jane Doe", birthday = "01-01-1990", courses=["Linguistische Informatik","Text Mining"])
student.print_name()
student.add_course("Softwarepraktikum")
student.add_multiple_courses(["Datenbanken 1","Technische Informatik", "Text Mining"])
student.print_number_of_courses()
student.courses



### Parameterization

Creating class instances with an initial set of attributes can be a nice way to implement a task that takes a set of parameters and applies them many times to various inputs.
For example we can implement a class that checks if a number is inside a certain interval.

In [None]:
class Intervall:
    def __init__(self, lower, upper):
        if upper < lower:
            lower, upper = upper, lower  # swap the bounds
        
        self.lower = lower
        self.upper = upper

    def contains(self, n):
        return self.lower < n < self.upper  # the bounds themselves are excluded

We can then define the parameters of the interval once and apply it many times:

In [None]:
intervall = Intervall(5,10)

print(intervall.contains(6))
print(intervall.contains(2))
print(intervall.contains(12))
print(intervall.contains(10))
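
Because the parameters live in the instance, the same object can be applied to a whole list of inputs, for example inside a list comprehension. The sketch below repeats the class definition so that it runs on its own; the list `values` is an illustrative assumption.

```python
class Intervall:
    def __init__(self, lower, upper):
        if upper < lower:
            lower, upper = upper, lower   # accept the bounds in either order
        self.lower = lower
        self.upper = upper

    def contains(self, n):
        return self.lower < n < self.upper

intervall = Intervall(10, 5)              # bounds given in reverse order
values = [2, 5, 6, 9.5, 10, 12]
inside = [v for v in values if intervall.contains(v)]
print(inside)
```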

<div class="alert alert-info">
    
### Exercise 4

Implement a class whose constructor takes a list of characters (as a string!) and which provides a method that checks whether any given string contains one or more of these characters (ignoring capitalization).

</div>

In [None]:
class StringCharChecker:
    def __init__(self, characters):
        self.characters = characters.lower()
    def check(self, s):
        return any(c in s.lower() for c in self.characters)
    
scc = StringCharChecker("M!y")

print(scc.check("Natural Language Processing")) # expect False
print(scc.check("Linguistische Informatik")) # expect True
print(scc.check("Happy Birthday!")) # expect True

## References

- https://docs.python.org/3/ 
- https://www.w3schools.com/python/