Major Refactor: Code Modernization, New Features, and Documentation #3

jesper-olsen · 2025-09-05T12:30:49Z

Hi Michael,

Thank you for creating this excellent educational repository for decision trees.

As I was working with the code, I made some updates to modernise it, improve its usability, and add proper documentation. I wanted to offer these changes back to the original project in case they are useful to you or to future learners who find this repo.

Here is a summary of the improvements:

Code Modernization & Refactoring:

- Refactored uniqueCounts, entropy, and gini to use the more efficient and Pythonic collections.Counter and math.log2.
- Added guard clauses to entropy and gini to prevent ZeroDivisionError.
- Improved overall code formatting and readability, and added docstrings to all major functions.
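
As a rough illustration of the Counter-based approach described above, here is a minimal sketch. It is not the exact code in this PR: the snake_case name `unique_counts` and the assumption that the class label sits in the last column of each row are illustrative only.

```python
import math
from collections import Counter


def unique_counts(rows):
    """Count class labels, assuming the label is the last column of each row."""
    return Counter(row[-1] for row in rows)


def entropy(rows):
    """Shannon entropy (in bits) of the class labels in rows."""
    counts = unique_counts(rows)
    total = sum(counts.values())
    if total == 0:  # guard clause: an empty split would divide by zero
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def gini(rows):
    """Gini impurity of the class labels in rows."""
    counts = unique_counts(rows)
    total = sum(counts.values())
    if total == 0:  # guard clause: an empty split would divide by zero
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in counts.values())
```

With two rows carrying two different labels, both functions report maximal impurity for a binary problem (1 bit of entropy, Gini of 0.5), and an empty list returns 0.0 instead of raising.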

New Features & Usability:

- Command-Line Interface: Integrated argparse to allow users to select the dataset and criterion (gini/entropy) from the command line.
- Graphviz Plotting: Added a --plot option to generate a tree visualisation and save it to a file (e.g., in .png format).
- Improved Classification Output: Created a dedicated function to print classification results in a clear, human-readable, and deterministic format.
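
For illustration, a command-line interface along these lines can be sketched with argparse. Only the --plot option and the gini/entropy choice are described above; the positional `dataset` argument, the `--criterion` flag name, the default criterion, the assumption that `--plot` takes an output path, and the example file names are all hypothetical.

```python
import argparse


def build_parser():
    """Build a parser resembling the CLI described above (flag names are assumed)."""
    parser = argparse.ArgumentParser(
        description="Train a decision tree on a CSV dataset."
    )
    # Hypothetical positional argument: path to the dataset.
    parser.add_argument("dataset", help="path to a CSV file")
    # Hypothetical flag name; the two choices come from the PR description.
    parser.add_argument(
        "--criterion",
        choices=["entropy", "gini"],
        default="entropy",
        help="impurity measure used when choosing splits",
    )
    # Assumed here to take an output path; the real option may behave differently.
    parser.add_argument(
        "--plot",
        metavar="FILE",
        default=None,
        help="write a Graphviz rendering of the tree to FILE",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["data/example.csv", "--criterion", "gini", "--plot", "tree.png"]
)
print(args.dataset, args.criterion, args.plot)
```

Restricting `--criterion` with `choices` lets argparse reject unsupported values with a usage message before any training starts.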

Project Structure & Documentation:

- New README: Added a comprehensive README.md with instructions on prerequisites, installation, and usage.
- Better File Naming: Renamed the main script from implementation.py to decision_tree.py to follow standard Python conventions.
- Organized Directory Structure: Moved data files (.csv) into a data/ directory to separate code from data.

No pressure to merge this, as I know the project is not actively maintained, but I wanted to offer these improvements back. Thanks again for the great resource!

The member variable `results` in the `DecisionNode` class was ambiguous and required a comment to explain its purpose. This commit renames it to `class_counts` to be more descriptive and self-documenting, reflecting that it holds the distribution of training samples in a leaf node. The conditional checks have also been updated to be more idiomatic (e.g., `if node.class_counts:`).
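
A minimal sketch of what the renamed attribute might look like. Only `DecisionNode` and `class_counts` come from the commit message; the other field names and the `is_leaf` helper are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DecisionNode:
    # Hypothetical split fields; the real class may name these differently.
    col: int = -1
    value: Any = None
    true_branch: Optional["DecisionNode"] = None
    false_branch: Optional["DecisionNode"] = None
    # Distribution of training samples per class; only set on leaf nodes.
    class_counts: Optional[Counter] = None


def is_leaf(node: DecisionNode) -> bool:
    """Idiomatic truthiness check: None and an empty Counter are both falsy."""
    return bool(node.class_counts)


leaf = DecisionNode(class_counts=Counter({"yes": 3, "no": 1}))
split = DecisionNode(col=0, value=5, true_branch=leaf, false_branch=leaf)
print(is_leaf(leaf), is_leaf(split))
```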

Improves the core algorithm's efficiency by refactoring the impurity and pruning calculations to operate on Counter objects directly, avoiding the creation of large temporary data lists.
- Adopts PEP 8 naming conventions and adds type hints.
- Fixes a potential ZeroDivisionError in missing data classification.
- Simplifies pre-pruning logic in the tree growth algorithm.
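
The idea of working on Counter objects directly can be sketched as follows. This is an assumption-laden illustration, not the commit's code: the function names are hypothetical, and the weighted merge shown for missing values is one common scheme that may differ from what the repository does.

```python
from collections import Counter


def gini_from_counts(counts: Counter) -> float:
    """Gini impurity computed from class counts, with no per-row list needed."""
    total = sum(counts.values())
    if total == 0:  # guard against an empty node
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def merged_impurity(true_counts: Counter, false_counts: Counter) -> float:
    """Impurity of the node obtained by collapsing two leaves (used when pruning).

    Counter addition merges the distributions without rebuilding the rows.
    """
    return gini_from_counts(true_counts + false_counts)


def merge_branch_counts(true_counts: Counter, false_counts: Counter) -> Counter:
    """Combine both branches when the split feature is missing.

    Each branch is weighted by its share of the samples; the zero-total guard
    avoids a ZeroDivisionError when both branches are empty.
    """
    t = sum(true_counts.values())
    f = sum(false_counts.values())
    total = t + f
    if total == 0:
        return Counter()
    merged = Counter()
    for label, n in true_counts.items():
        merged[label] += n * t / total
    for label, n in false_counts.items():
        merged[label] += n * f / total
    return merged
```

Because only the per-class totals are passed around, the cost of an impurity evaluation depends on the number of classes rather than the number of training rows.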

jesper-olsen added 24 commits September 4, 2025 18:59

-tab -plot

a237f43

+argparse

4a9d9ca

-defaultdict

7050dcb

+ criterion flag (entropy/gini)

17f8c55

+graphviz plot, README.md

3da31e4

doc

abf6e36

+assets, data

d810a7c

graphviz: +header, +impurity

17ccd23

classify, prune -> DecisionNode

f3ac593

snake case: grow_decision_tree

59eada7

Refactor decision tree into DecisionTree class, simplify training & e…

aff4034

…xport

new file

8dbb71c

+wine_quality + k-fold

fdb3036

doc

8df9587

doc

d0a6f4a

+ max_dept, min_samples pre-pruning

a2e9cea

+ adult dataset

doc

63e4fb0

doc

6f0cf9d

PEP8

b5509dd

match

3d0f752

PEP8, type hints

276f677

PEP8 + dt size

3879c5f
