This project demonstrates the use of SQLAlchemy ORM for modeling relational data in PostgreSQL, along with Alembic for schema migrations and Faker for generating test data.
The app simulates an academic environment with students, teachers, groups, subjects, and grades.
Main features:
- SQLAlchemy-based data modeling for Students, Teachers, Groups, Subjects, and Grades.
- Alembic integration for database versioning and migrations.
- Faker-powered data seeding with realistic student records and grades.
- SQL queries implemented via ORM: top students, average grades, group statistics, and more.
- (Not implemented yet) A command-line interface for managing database records with create, list, update, and remove actions.
- Task Requirements
- Task Solution
- Project Setup & Run Instructions
- License
This project implements a database schema using PostgreSQL and SQLAlchemy, along with Alembic for migrations and Faker for data seeding.
Implement a database that includes the following tables:
- Students table;
- Groups table;
- Teachers table;
- Subjects table with a reference to the teacher who teaches the subject;
- A table where each student has grades for subjects, along with the date the grade was received.
Use a postgres database. Run the PostgreSQL database using Docker:
docker run --name some-postgres -p 5432:5432 -e POSTGRES_PASSWORD=mysecretpassword -d postgres
Replace the container name `some-postgres` and password `mysecretpassword` with your custom values to connect to the database.
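To check that the container is up and accepts connections, you can open a psql shell inside it (using the postgres image's default superuser):

```
docker exec -it some-postgres psql -U postgres
```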
Implement SQLAlchemy models for the following tables:
- Students table
- Groups table
- Teachers table
- Subjects table (with a foreign key to the teacher who teaches the subject)
- Table with Grades (with a foreign key to student and subject, and a timestamp for when the grade was assigned)
Use Alembic to create database migrations and apply them to your PostgreSQL database.
Create a script `seed.py` that populates the database with realistic random data:
- ~30-50 students
- 3 groups
- 5-8 subjects
- 3-5 teachers
- Up to 20 grades per student across all subjects
Use SQLAlchemy sessions and the Faker library for generating data.
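A minimal sketch of how such seeding might look (`models`, `database`, `SessionLocal`, and the `Student`/`Group` attributes are assumed names for illustration; the project's actual modules may differ):

```python
import random

from faker import Faker
from sqlalchemy import select

# Assumed names for illustration; the project's actual modules may differ.
from models import Group, Student
from database import SessionLocal

fake = Faker()

def seed_students(count: int = 40) -> None:
    """Insert `count` fake students, each assigned to a random existing group."""
    with SessionLocal() as session:
        groups = session.scalars(select(Group)).all()
        for _ in range(count):
            session.add(Student(name=fake.name(), group=random.choice(groups)))
        session.commit()
```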
Implement the following 10 SQL queries using SQLAlchemy ORM:
- Retrieve the top 5 students with the highest average grade across all subjects.
- Find the student with the highest average grade in a specific subject.
- Find the average grade per group for a specific subject.
- Find the overall average grade across the entire cohort (all grades).
- Find which subjects are taught by a specific teacher.
- Retrieve a list of students in a specific group.
- Retrieve the grades of students in a specific group for a specific subject.
- Find the average grade a specific teacher gives across their subjects.
- List the courses taken by a specific student.
- List the courses a specific teacher teaches to a specific student.
Use SQLAlchemy sessions to perform all queries. Organize them into a file `my_select.py`, each as a function from `select_1()` to `select_10()`.
These tasks are optional and not required to pass the assignment.
- Find the average grade a specific teacher gives to a specific student.
- Retrieve grades from the last lesson for students in a specific group and subject.
Instead of using `seed.py`, create a full-featured CLI application using the `argparse` module to perform CRUD operations on all models.
Use the following command structure:
- `--action` (or `-a`) for specifying the action (`create`, `list`, `update`, `remove`)
- `--model` (or `-m`) for specifying the target model (e.g. `Teacher`, `Group`, etc.)
Command structure:
- Create a teacher: `--action create -m Teacher --name 'Boris Jonson'`
- List all teachers: `--action list -m Teacher`
- Update a teacher's data with `id=3`: `--action update -m Teacher --id 3 --name 'Andry Bezos'`
Implement similar commands for all models.
Examples of commands in terminal:
- Create a teacher
py main.py -a create -m Teacher -n 'Boris Jonson'
- Create a group
py main.py -a create -m Group -n 'AD-101'
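A minimal sketch of how such a CLI might be wired with argparse (the dispatch at the end is a placeholder; real handlers would call CRUD helpers backed by SQLAlchemy sessions):

```python
import argparse

def main() -> None:
    parser = argparse.ArgumentParser(description="CRUD CLI for academy models")
    parser.add_argument("-a", "--action", required=True,
                        choices=["create", "list", "update", "remove"])
    parser.add_argument("-m", "--model", required=True,
                        choices=["Teacher", "Group", "Student", "Subject", "Grade"])
    parser.add_argument("-n", "--name", help="Entity name (for create/update)")
    parser.add_argument("--id", help="Entity id (for update/remove)")
    args = parser.parse_args()

    # Hypothetical dispatch; a real implementation would route to model-specific
    # CRUD functions using a SQLAlchemy session.
    print(f"{args.action} {args.model} name={args.name} id={args.id}")

if __name__ == "__main__":
    main()
```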
As a solution to the technical requirements, I created an app that simulates an academic environment with students, teachers, groups, subjects, and grades.
The application covers the following steps:
- Database Modeling (ORM). I defined models using SQLAlchemy to represent all required entities and their relationships (a condensed sketch follows the list below):
  - `Base` - Base class for all SQLAlchemy models with shared metadata.
  - `BaseModel` - Abstract base class giving all models common behavior (inherits from `Base` plus `UUIDMixin`, `TimestampMixin`, and `SoftDeleteMixin`).
  - `UUIDMixin` - Adds a UUID primary key `id` field.
  - `TimestampMixin` - Adds `created_at` and `updated_at` timestamp fields.
  - `SoftDeleteMixin` - Adds soft-delete fields: `is_deleted` and `deleted_at`.
  - `Student` - Represents a student with a reference to their assigned group.
  - `Group` - Represents a student group.
  - `Teacher` - Represents a teacher with a reference to their assigned subject.
  - `Subject` - Represents a subject of study.
  - `Grade` - Represents a grade for a student, tied to a group, subject, and specific task. Multiple grades per subject are allowed when tied to different tasks.
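A condensed sketch of how the mixins might be composed in SQLAlchemy 2.0 declarative style (simplified for illustration; the actual models carry more fields and relationships):

```python
import uuid
from datetime import datetime

from sqlalchemy import ForeignKey, func
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    """Shared declarative base with common metadata."""

class UUIDMixin:
    # UUID generated on the database side (pgcrypto's gen_random_uuid).
    id: Mapped[uuid.UUID] = mapped_column(
        UUID(as_uuid=True), primary_key=True,
        server_default=func.gen_random_uuid(),
    )

class TimestampMixin:
    created_at: Mapped[datetime] = mapped_column(server_default=func.now())
    updated_at: Mapped[datetime] = mapped_column(
        server_default=func.now(), onupdate=func.now()
    )

class SoftDeleteMixin:
    is_deleted: Mapped[bool] = mapped_column(default=False)
    deleted_at: Mapped[datetime | None] = mapped_column(nullable=True)

class Student(UUIDMixin, TimestampMixin, SoftDeleteMixin, Base):
    __tablename__ = "students"
    name: Mapped[str]
    group_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("groups.id"))
```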
- Quick Postgres database setup using Docker. I set up a simple database in a container using the Postgres image in Docker to test my app.
- Migrations with Alembic. I set up Alembic to manage database schema changes and apply them to a PostgreSQL instance.
Entity Relationship Diagram (ERD) to demonstrate entities and their relations:
- Data Seeding. I created a seed.py script to populate the database with realistic, randomly generated data using the Faker library. It supports a `--seed <number>` flag for predictable randomness during testing and `--dry-run` to preview the generated results without actually writing to the database.
- Advanced Queries. I implemented 10 SQL queries using SQLAlchemy ORM to extract meaningful insights, such as top-performing students, course averages, and teacher-specific statistics (see the sketch after this list):
- Select 1: Find the 5 students with the highest average grade across all subjects.
- Select 2: Find the student(s) with the highest average grade in a given subject.
- Select 3: Find the average grade for each group in a specific subject.
- Select 4: Find the overall average grade across the entire cohort (all grades).
- Select 5: Find the courses taught by a given teacher.
- Select 6: Find all students in a given group.
- Select 7: Find grades of students in a given group for a specific subject.
- Select 8: Find the overall average grade a given teacher assigns.
- Select 9: Find courses attended by a given student.
- Select 10: Find courses a given teacher teaches to a specific student.
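For example, `select_1` might be built roughly like this (a sketch assuming the model names above; the attribute `Grade.grade` and the `models` module path are assumptions, and the actual implementation in my_select.py may differ):

```python
from sqlalchemy import select, func, desc
from sqlalchemy.orm import Session

from models import Student, Grade  # assumed module path

def select_1(session: Session):
    """Top 5 students with the highest average grade across all subjects."""
    stmt = (
        select(Student.name, func.avg(Grade.grade).label("avg_grade"))
        .join(Grade, Grade.student_id == Student.id)
        .group_by(Student.id, Student.name)
        .order_by(desc("avg_grade"))
        .limit(5)
    )
    return session.execute(stmt).all()
```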
- CLI Interface (not implemented yet). The task was to develop a command-line tool using `argparse` to perform CRUD operations on each model, allowing creation, listing, updating, and deletion of records via terminal commands.
This guide will help you set up the environment and run the project.
Before you begin, make sure you have the following installed:
- Python 3.11.* (tested with Python 3.11.13) — Required to run the application locally (outside Docker, if needed).
- Poetry — To manage dependencies in a virtual environment.
- Docker running PostgreSQL 13+ with `pgcrypto` — Used to run the database in a unified containerized environment using Docker or Docker Compose.
- psycopg2-binary — PostgreSQL database adapter for the Python programming language (on Linux it may require additional packages: `sudo apt install libpq-dev python3-dev`).
- (Optional - for local development) Git — To clone the repository, version control and development.
- (Optional - for local development) VS Code or another IDE — Recommended for browsing and editing the project source code and overall development.
- Clone the repository, or download the ZIP archive from the GitHub Repository and extract it:
git clone https://github.com/oleksandr-romashko/goit-pythonweb-hw-06
cd goit-pythonweb-hw-06
- Install project dependencies:
poetry install
Data is stored in a Postgres database. To easily set up our own instance for testing purposes we may use Docker.
To create a Postgres database in a Docker container, use the following command:
docker run --name pg-db-pythonweb-hw-06 -e POSTGRES_USER=app_user -e POSTGRES_PASSWORD=mysecretpassword -e POSTGRES_DB=lms_db -p 5432:5432 -d postgres
where:
- `pg-db-pythonweb-hw-06` - name of the Docker container (you may set your own container name)
- `app_user` - username, set up with admin privileges
- `mysecretpassword` - password for the user (create your own)
- `lms_db` - database name to connect to
- `5432:5432` - external/internal container ports used to access the postgres database
- `postgres` - name of the image our container is based on
For the app to access data in the database, we need a correctly set up connection configuration. The file config.ini.example serves as a template:
[DB]
USER=app_user
PASSWORD=
HOST=localhost
PORT=5432
DB_NAME=lms_db
Make a copy of config.ini.example and rename it to config.ini (it should be located in the project root directory).
Then, in config.ini, add the password and change the other values if necessary. The values should correspond to those used to set up the Postgres database in the Docker container; otherwise the application won't be able to connect to the database.
Now our app should be set up to connect to our database in the Docker container.
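As a sketch, the app might assemble its connection URL from config.ini with configparser like this (the actual loader in this project may differ):

```python
import configparser

from sqlalchemy import create_engine

# Read connection settings from the [DB] section of config.ini.
config = configparser.ConfigParser()
config.read("config.ini")
db = config["DB"]

url = (
    f"postgresql+psycopg2://{db['USER']}:{db['PASSWORD']}"
    f"@{db['HOST']}:{db['PORT']}/{db['DB_NAME']}"
)
engine = create_engine(url, echo=False)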
Once we can connect to the database, we can perform queries. But the database does not yet have the tables or structure defined in the ORM.
To bring the database in line with the expected structure, execute the following command:
poetry run alembic upgrade head
This will add tables and columns to the database that reflect the latest ORM structure.
Other useful Alembic commands:
- `poetry run alembic downgrade base` - downgrade the database to its base state
- `poetry run alembic revision --autogenerate -m "revision message"` - automatically create a new migration script
- `poetry run alembic history` - show the migration history
- `poetry run alembic current` - show the current revision hash (useful to locate your position in the history)
- `poetry run alembic upgrade <revision hash>` - upgrade to a certain revision
- `poetry run alembic downgrade <revision hash>` - downgrade to a certain revision
- `poetry run alembic downgrade -1` - downgrade to the previous revision
At this stage we have empty tables with no data.
To see the results of the queries in the next steps, let's add some fake (but still realistic) data to the database:
poetry run python ./src/scripts/seed.py [--dry-run] [--seed <value>]
You may add the following flags to the command:
- `--dry-run` - Run the script without saving or modifying any data in the database (shows a summary of the generated data).
- `--seed <value>` - Set a seed for the random number generator for reproducibility.
This will seed the database with random data using the Faker package.
Testing. For the following testing you may use a seed value of 40:
poetry run python ./src/scripts/seed.py --seed 40
Created tables in the database:
According to the task requirements we need to perform 10 queries, located in my_select.py.
Note: My app uses UUIDs generated on the database side and seeds both the random module and Faker for reproducibility. Still, I couldn't manage to recreate the same results on each run and on each machine, so your results may differ and may require tweaking the arguments the select functions are called with in the my_select.py script. Potential future improvements may fix this.
I am considering the following future improvements to add new features and fix some issues:
- Implement the optional (not required) task of additional, more complex queries:
- Find the average grade a specific teacher gives to a specific student.
- Retrieve grades from the last lesson for students in a specific group and subject.
- Implement the CLI interface. The task was to develop a command-line tool using `argparse` to perform CRUD operations on each model, allowing creation, listing, updating, and deletion of records via terminal commands.
- Fix randomness to reproduce test results. Currently seeding may produce different results each time, which introduces additional issues for manual testing and potential automated testing.
- Consider having the select functions accept the `id` of an entity and just return the query results; interpret, format, and print the results separately (see the sketch below).
- As I started to implement a GDPR-ready database structure (a separate table for personal data), add anonymization logic and logic to retrieve all available user data. Investigate methods of personal data handling and protection required by GDPR.
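A sketch of that refactor idea: a select function returns raw rows, and presentation lives in a separate helper (all names here are hypothetical):

```python
import uuid

from sqlalchemy import select
from sqlalchemy.orm import Session

from models import Student  # assumed module path

def select_6(session: Session, group_id: uuid.UUID):
    """Return all students in the given group; no printing here."""
    stmt = select(Student).where(Student.group_id == group_id)
    return session.scalars(stmt).all()

def print_students(students) -> None:
    """Presentation kept separate from the query itself."""
    for student in students:
        print(f"- {student.name}")
```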
This project is licensed under the MIT License. You are free to use, modify, and distribute this software in accordance with the terms of the license.