# OntoGen

**OntoGen** is a command-line tool for analyzing source spreadsheet data (CSV) and transforming it into an ontology (OWL/XML).

## Version

1.1

## Preliminaries

A source (input) spreadsheet represents a set of entities of the same type in relational form (a subset of the Cartesian product of *K* data domains), where:
1. *Attribute (a column name)* is the name of a data domain in a relation schema;
2. *Metadata (a schema)* is an ordered set of *K* attributes of a relational table;
3. *Tuple (a record)* is an ordered set of *K* atomic values (one for each attribute of the relation);
4. *Data (a recordset)* is a set of tuples of a relational table.
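
As a rough illustration of these definitions (a sketch, not OntoGen's own code), the snippet below reads CSV text with Python's standard `csv` module and splits it into metadata (the header row) and data (the tuples), checking that every tuple has exactly *K* values. The function name `read_relation` and the sample rows are hypothetical.

```python
import csv
import io
from typing import List, Tuple


def read_relation(text: str) -> Tuple[List[str], List[Tuple[str, ...]]]:
    """Split CSV text into metadata (K attributes) and data (K-value tuples).

    Hypothetical helper for illustration; the first row is taken as the header.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        raise ValueError("empty spreadsheet")
    metadata = rows[0]  # ordered attribute (column) names
    k = len(metadata)
    data = []  # the recordset, kept in file order
    for row_no, row in enumerate(rows[1:], start=2):
        # Every tuple must hold exactly one value per attribute.
        if len(row) != k:
            raise ValueError(f"row {row_no} has {len(row)} values, expected {k}")
        data.append(tuple(row))
    return metadata, data


sample = (
    "University,City,Founded\n"
    "University of Oxford,Oxford,1096\n"
    "University of Cambridge,Cambridge,1209\n"
)
attributes, tuples = read_relation(sample)
print(attributes)   # ['University', 'City', 'Founded']
print(len(tuples))  # 2
```

A list is used instead of a Python `set` so that row order is preserved for error messages; duplicate rows are not removed.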

A spreadsheet of entities of the same type (*a canonicalized form*) is a relational table in third normal form (3NF) containing an ordered set of *N* rows and *M* columns.

The columns of such a table are distinguished as follows:
1. *Categorical column, or named entity column (NE-column),* contains names (text mentions) of named entities;
2. *Literal column (L-column)* contains literal values (e.g., dates, numbers);
3. *Subject (thematic) column (S-column)* is an NE-column that serves as a potential primary key and defines the subject of the source table;
4. *Other (non-subject) columns* represent entity properties, including relationships with other entities.
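
The column kinds above can be made concrete with a simple heuristic. This is a hypothetical sketch, not the classification OntoGen implements: a column is labeled an L-column if every cell parses as a number or an ISO date (`YYYY-MM-DD`), otherwise an NE-column, and the S-column is assumed to be the leftmost NE-column whose values are non-empty and unique. The function names and parsing rules are assumptions; the code needs Python 3.7 or later for `date.fromisoformat`.

```python
from datetime import date
from typing import Dict, List, Optional, Sequence


def is_literal(value: str) -> bool:
    """Assumed rule: numbers and ISO dates (YYYY-MM-DD) count as literal values."""
    v = value.strip()
    try:
        float(v)
        return True
    except ValueError:
        pass
    try:
        date.fromisoformat(v)
        return True
    except ValueError:
        return False


def classify_columns(header: Sequence[str], rows: List[Sequence[str]]) -> Dict[str, str]:
    """Label each column 'L' (all cells literal) or 'NE' (named entities)."""
    kinds = {}
    for j, name in enumerate(header):
        cells = [row[j] for row in rows]
        kinds[name] = "L" if cells and all(is_literal(c) for c in cells) else "NE"
    return kinds


def subject_column(header: Sequence[str], rows: List[Sequence[str]],
                   kinds: Dict[str, str]) -> Optional[str]:
    """Leftmost NE-column whose values are non-empty and unique (a key candidate)."""
    for j, name in enumerate(header):
        values = [row[j].strip() for row in rows]
        if kinds[name] == "NE" and all(values) and len(set(values)) == len(values):
            return name
    return None


header = ["University", "City", "Founded"]
rows = [
    ["University of Oxford", "Oxford", "1096"],
    ["University of Cambridge", "Cambridge", "1209"],
]
kinds = classify_columns(header, rows)
print(kinds)                                # {'University': 'NE', 'City': 'NE', 'Founded': 'L'}
print(subject_column(header, rows, kinds))  # University
```

Uniqueness is only a necessary condition for a primary key, so a real system would likely combine it with other evidence; the leftmost-column rule is a common simplification, chosen here for brevity.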

**Assumption 1.** *The first row of a source spreadsheet is a header containing attribute (column) names.*

**Assumption 2.** *All cell values in a column of a source spreadsheet have the same entity type and data type.*

**Assumption 3.** *Source spreadsheets should be provided in CSV format.*
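
Assumptions 1 and 2 can be checked before processing. The sketch below is one hypothetical way to do it (the names `cell_type` and `mixed_type_columns` are illustrative, not part of OntoGen): it reads the first row as the header, assigns each non-empty cell a coarse data type (`number`, `date`, or `str`), and reports the columns in which more than one type occurs. It assumes Python 3.7 or later.

```python
import csv
import io
from datetime import date
from typing import Dict, Optional, Set


def cell_type(value: str) -> Optional[str]:
    """Coarse data type of one cell: 'number', 'date', 'str', or None if empty."""
    v = value.strip()
    if not v:
        return None  # treat empty cells as missing, not as a type
    try:
        float(v)
        return "number"
    except ValueError:
        pass
    try:
        date.fromisoformat(v)
        return "date"
    except ValueError:
        return "str"


def mixed_type_columns(csv_text: str) -> Dict[str, Set[str]]:
    """Map each column that violates Assumption 2 to the data types found in it.

    The first row is read as the header (Assumption 1).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    found: Dict[str, Set[str]] = {name: set() for name in reader.fieldnames or []}
    for record in reader:
        for name in found:
            kind = cell_type(record[name] or "")  # short rows yield None
            if kind is not None:
                found[name].add(kind)
    return {name: kinds for name, kinds in found.items() if len(kinds) > 1}


sample = "University,Founded\nUniversity of Oxford,1096\nUniversity of Cambridge,unknown\n"
report = mixed_type_columns(sample)
print({name: sorted(kinds) for name, kinds in report.items()})  # {'Founded': ['number', 'str']}
```

The types are sorted before printing because Python set ordering of strings is not stable between runs.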

**OntoGen** supports ontology engineering based on the transformation of spreadsheet data.

**Assumption 4.** *The target ontology is represented in [OWL 2 DL](https://www.w3.org/TR/owl2-overview/).*
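
To make the target format concrete, here is a minimal sketch that writes a few axioms in the OWL 2 XML serialization (OWL/XML) using only Python's standard `xml.etree.ElementTree`. The mapping is an assumption for illustration and is not necessarily the transformation OntoGen applies: the subject column becomes a class, and each subject value becomes a named individual asserted to belong to it. The ontology IRI `http://example.org/uk-universities` is a placeholder.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

OWL_NS = "http://www.w3.org/2002/07/owl#"
ET.register_namespace("", OWL_NS)  # write OWL elements without a prefix


def owl(tag: str) -> str:
    """Qualified ElementTree name for an element in the OWL namespace."""
    return f"{{{OWL_NS}}}{tag}"


def to_owl_xml(ontology_iri: str, class_name: str, individuals: Iterable[str]) -> str:
    """Write one class and its member individuals as OWL/XML (hypothetical mapping)."""
    class_iri = f"{ontology_iri}#{class_name}"
    onto = ET.Element(owl("Ontology"), {"ontologyIRI": ontology_iri})
    decl = ET.SubElement(onto, owl("Declaration"))
    ET.SubElement(decl, owl("Class"), {"IRI": class_iri})
    for name in individuals:
        # Naive IRI: spaces become underscores; real data needs proper escaping.
        ind_iri = f"{ontology_iri}#{name.replace(' ', '_')}"
        decl = ET.SubElement(onto, owl("Declaration"))
        ET.SubElement(decl, owl("NamedIndividual"), {"IRI": ind_iri})
        # ClassAssertion: the class expression first, then the individual.
        assertion = ET.SubElement(onto, owl("ClassAssertion"))
        ET.SubElement(assertion, owl("Class"), {"IRI": class_iri})
        ET.SubElement(assertion, owl("NamedIndividual"), {"IRI": ind_iri})
    return ET.tostring(onto, encoding="unicode")


print(to_owl_xml("http://example.org/uk-universities", "University",
                 ["University of Oxford", "University of Cambridge"]))
```

Declaring the class and individuals explicitly keeps the output within what OWL 2 DL expects of typed entities; property columns would map to further axioms (for example, object or data property assertions) in the same way.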

## Installation

First, clone the repository and change into its directory:

```
git clone https://github.com/Lab42-Team/ontogen.git
cd ontogen
```

Next, install the project requirements:

```
pip install -r requirements.txt
```

*We recommend using Python 3.0 or later.*

## Directory Structure

* `datasets` contains datasets of source spreadsheets in CSV format:
  * `tough-tables` contains the [Tough Tables (2T)](https://zenodo.org/record/4246370#.Yf5AO-pBw2w) dataset, with noisy spreadsheets excluded;
  * `wiki-uku-49` contains spreadsheets describing the main concepts and relationships in the field of education, in particular, universities in the United Kingdom (see [wiki-UKU-49: United Kingdom Universities from Wikipedia](https://data.mendeley.com/datasets/33v9tk6jjb/1));
  * `isi-167e` contains spreadsheets describing the main concepts and relationships in the field of Industrial Safety Inspection (see [ISI-167E: Entity spreadsheet tables](https://data.mendeley.com/datasets/3gjy46mx88/1)).
* `examples` contains example spreadsheets for testing.
* `ontogen` contains the software modules (Python scripts), including `main.py`.
* `results` contains processing results (target ontologies).

## Usage

#### Usage: `python main.py [OPTIONS]`

**Options:**
- `--name=c:\userpath`: path to the source spreadsheets (CSV) from which ontologies are created.

#### A simple example

```
python main.py --name=C:/test
```

Alternatively, run `main.py` without options and enter the path when prompted:

```
python main.py
Your path to source spreadsheets: C:/test
```
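
The two invocation styles above (an explicit `--name` option, or an interactive prompt) can be sketched with the standard `argparse` and `pathlib` modules. This is a hypothetical reconstruction of the described behavior, not the contents of `main.py`; the function names are illustrative, and collecting `*.csv` files from the directory is an assumption based on Assumption 3.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_source_dir(argv: Optional[List[str]] = None) -> Path:
    """Read the source directory from --name, or prompt for it when absent."""
    parser = argparse.ArgumentParser(description="Create ontologies from CSV spreadsheets.")
    parser.add_argument("--name", help="path to the source spreadsheets")
    args = parser.parse_args(argv)
    path = args.name if args.name else input("Your path to source spreadsheets: ")
    return Path(path.strip())


def source_spreadsheets(directory: Path) -> List[Path]:
    """CSV files directly inside the directory (Assumption 3), sorted by name."""
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.glob("*.csv") if p.is_file())


# Non-interactive call mirroring `python main.py --name=C:/test`.
folder = parse_source_dir(["--name=C:/test"])
for csv_path in source_spreadsheets(folder):
    print(csv_path.name)
```

Passing an explicit argument list keeps the functions testable; with no list, `argparse` reads `sys.argv` as a command-line script would.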

## Authors

* [Daria A. Denisova](mailto:daryalich@mail.ru)
* [Nikita O. Dorodnykh](mailto:tualatin32@mail.ru)