An implementation of an AVL tree to store data. It also includes basic transaction handling with commit and rollback methods using in-memory snapshots, plus saving the database to a .pkl (binary) file.
The implementation lives in `databasestructure.py`.
Note: see the `main.py` file for a full example.
```python
import pickle
from databasestructure import *

db = CustomDatabase()
```

Example of how to create a table named `users` with columns `id`, `name`, and `age`:

```python
db.create_table('users', ['id', 'name', 'age'], key_index=0)
```

Note: `key_index` is the index of the key column. Here the `id` column comes first, which is why `key_index=0`.
```python
db.insert('users', [None, 'Alice', 30])
db.commit('users')
```

Note: the `id` column is the key. Pass `None` for it to use the automatic autoincrement feature.
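A minimal sketch of the autoincrement behaviour, assuming keys are assigned sequentially from 0 in insertion order (the select example below finds Alice under id 0):

```python
# Assumption: autoincremented keys start at 0 and grow by one per insert,
# so Alice holds id=0 and this new row receives id=1.
db.insert('users', [None, 'Bob', 25])
db.commit('users')
```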
Lookups with `select` search on the `id` (key) column.
```python
# Select the user with id=0
headers, user = db.select('users', 0)
print(headers)
print(user)
```

```python
# Select the users with age > 25
headers, users = db.select_by_column_value("users", "age", '>', 25)
print(headers)
print(users)
```

Valid operators: `>`, `>=`, `<`, `<=`, `==`, `!=`.
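The other operators work the same way; for example, a quick sketch reusing `select_by_column_value`:

```python
# Users exactly 30 years old
headers, rows = db.select_by_column_value("users", "age", '==', 30)

# Users 25 years old or younger
headers, rows = db.select_by_column_value("users", "age", '<=', 25)
```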
```python
# Select the whole table
headers, table = db.select_all('users')
print(headers)
print(table)
```

You need the index of the row to be updated:
```python
user = db.update('users', 0, [0, 'Alice', 31])
print(user)
```

Add the column `email` to the `users` table:
```python
db.add_column('users', 'email')
print(db.select_all('users'))
```
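A hedged sketch of filling the new column in with `update`, assuming `add_column` leaves the `email` slot empty on existing rows, so a full four-value row must now be supplied:

```python
# Assumption: rows now carry four columns (id, name, age, email),
# and update replaces the whole row at the given key.
db.update('users', 0, [0, 'Alice', 31, 'alice@example.com'])
```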
You need the index of the row to be deleted:

```python
db.delete('users', 1)
```

Let's say that we have these tables:
- Table `users`:

| id | name | age |
|---|---|---|
| 0 | Charlie | 28 |
| 1 | Alice | 30 |

- Table `orders`:

| order_id | user_id | product |
|---|---|---|
| 101 | 0 | Laptop |
| 102 | 1 | Smartphone |
| 103 | 0 | Phone |
If we want to join two tables, we must do it based on their `key_index` values.
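For reference, tables like these could be built with the calls shown earlier; this sketch assumes `insert` also accepts an explicit value in the key column (the earlier examples passed `None` to autoincrement):

```python
db.create_table('users', ['id', 'name', 'age'], key_index=0)
db.insert('users', [None, 'Charlie', 28])  # id=0
db.insert('users', [None, 'Alice', 30])    # id=1

db.create_table('orders', ['order_id', 'user_id', 'product'], key_index=0)
# Assumption: explicit key values are allowed instead of None.
db.insert('orders', [101, 0, 'Laptop'])
db.insert('orders', [102, 1, 'Smartphone'])
db.insert('orders', [103, 0, 'Phone'])
db.commit('users')
db.commit('orders')
```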
So, to join the `users` and `orders` tables by the `user_id` column:
```python
# Perform the join
headers, joined_data = db.join('users', 'orders', key_index1=0, key_index2=1)

# Print the joined data
print("Joined Data:\n")
print(headers)
for data in joined_data:
    print(data)
```

- `key_index1=0`: the join key for the first table (`users`) is in the first column (`id`).
- `key_index2=1`: the join key for the second table (`orders`) is in the second column (`user_id`).
Note: check the `test_joining.py` file.
- The result will be:
| id | name | age | order_id | user_id | product |
|---|---|---|---|---|---|
| 0 | Charlie | 28 | 101 | 0 | Laptop |
| 1 | Alice | 30 | 102 | 1 | Smartphone |
| 0 | Charlie | 28 | 103 | 0 | Phone |
```python
db.insert('users', [None, 'Charlie', 28])
print("Before rollback, user with latest id:", db.select_all('users'))
db.rollback('users')
print("After rollback, user with latest id:", db.select_all('users'))
```

```python
db.insert('users', [None, 'Charlie', 28])
db.commit('users')
print("After commit, user with latest id:", db.select_all('users'))
```
```python
# Save the database in binary format
db.save('./avl_database.pkl')

# Load the database in binary format
db = CustomDatabase.load('./avl_database.pkl')
```

You must have a .csv file with the same structure as the table you want to load. For the `users` table, you need a .csv file structured like this example:
| id | name | age |
|---|---|---|
| 1 | John | 25 |
| 2 | Jane | 30 |
| 3 | Bob | 22 |
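In plain text, and assuming a header row with comma separators, that `data.csv` would look like:

```
id,name,age
1,John,25
2,Jane,30
3,Bob,22
```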
```python
db.load_from_csv("users", "./data.csv")
```

For multi-threaded usage, please check the `main_multi_threading.py` file.
Inside the `dummy_data` folder there is a dummy-data generator and `main_test.py`, which runs basic operations to test the database's performance.
Results may vary with CPU capabilities and with whether you use multiple threads. For these tests I used a single thread on an 11th Gen Intel(R) Core(TM) i3-1115G4 @ 3.00GHz (a very ordinary CPU).
The tests used 10,000 dummy users and 20,000 orders.
| Test | AVL DB (seconds) | MySQL (seconds) |
|---|---|---|
| Creation of both tables | 0.0 | - |
| Load 10,000 users from .csv into RAM | 0.12057 | - |
| Load 20,000 orders from .csv into RAM | 0.29318 | - |
| Write 30,000 rows from RAM to .pkl file | 0.09558 | - |
| Load 30,000 rows from .pkl into RAM | 0.10949 | - |
| From 10,000 users, select those older than 25 years | 0.00709 | - |
| Join users and orders tables (17,358 rows total) | 0.16644 | - |