Merged
Changes from 1 commit
215 commits
281cb57
feat: added entity label file loading function
tieandrews May 26, 2023
79fd36f
feat: script to coordinate training process
tieandrews May 29, 2023
f289c36
feat: bash script to run training and set params
tieandrews May 29, 2023
ca9c5fa
feat: script to generate hf formatted data
tieandrews May 29, 2023
dd3696f
docs: updated ner_training target name
tieandrews May 29, 2023
7f4ac68
docs: model training readme
tieandrews May 29, 2023
c64e778
feat: automated utils to evaluate trained models
tieandrews May 30, 2023
7361e91
feat: bash script to run evaluation on checkpoint
tieandrews May 30, 2023
26b20ce
bug: added _init_ to src to make module
tieandrews May 30, 2023
8d69aa6
bug: token based evaluation same as entity based
tieandrews May 30, 2023
2cb379a
feat: added new return objects
tieandrews May 30, 2023
eff149e
docs: updated docs to match new input format
tieandrews May 30, 2023
df022a7
feat: improved error checking and logging
tieandrews May 30, 2023
234aa3d
feat: cleaned up imports and constants
tieandrews May 30, 2023
a5348f8
feat: last imports cleaned
tieandrews May 30, 2023
0944ea3
feat: added custom metrics to mlflow logging
tieandrews May 30, 2023
9cf2140
feat: model training bash script
tieandrews May 30, 2023
1a18d96
feat: basic notebook to use on colab to train
tieandrews May 30, 2023
2357286
docs: initial commit
tieandrews May 30, 2023
9ea5abe
docs: results and models directory setup
tieandrews May 30, 2023
9192ad8
feat: removed unused label2id object
tieandrews May 31, 2023
0153d73
feat: separated folder location setting
tieandrews May 31, 2023
8e31500
bug: setup individual copied objects for pred/true
tieandrews May 31, 2023
1ac2bbc
bug: added defined max token length
tieandrews May 31, 2023
89be2df
bug: added quotes for paths with spaces in colab
tieandrews May 31, 2023
4534ba2
bug: fixed labelled file location with spaces
tieandrews May 31, 2023
f2bf1ac
Merge remote-tracking branch 'origin/dev' into 22-fine-tune-allenai-s…
tieandrews Jun 1, 2023
4cbfaaf
feat: rename and update standard params
tieandrews Jun 1, 2023
5ca03dd
docs: switch default data folder
tieandrews Jun 1, 2023
e7faccb
docs: clean up and make better defaults
tieandrews Jun 1, 2023
fd60e9b
feat: update to use new train/val/test data struct
tieandrews Jun 1, 2023
24eb8fb
feat: setting up __init__ files
tieandrews Jun 1, 2023
e40751c
feat: added final model name field
tieandrews Jun 1, 2023
00dd4bf
feat: added custom model logging at training end
tieandrews Jun 1, 2023
fb534fa
feat: setup huggingface batch inference
tieandrews Jun 1, 2023
de94f40
docs: init file setup
tieandrews Jun 1, 2023
2578015
bug: fixed batch prediction
tieandrews Jun 1, 2023
3f30515
feat: include max_samples by default for local run
tieandrews Jun 1, 2023
bb14221
feat: added better input checking from tests
tieandrews Jun 2, 2023
8ea406b
tests: fixes to complete test suite
tieandrews Jun 2, 2023
0b43a54
docs: added transformers requirements
tieandrews Jun 2, 2023
31c2fc8
bug: moved docopt to main
tieandrews Jun 3, 2023
c135b09
tests: initial basic tests
tieandrews Jun 3, 2023
92df812
feat: colab train notebook autoreload used
tieandrews Jun 3, 2023
da53411
docs: delete model card of specter2
tieandrews Jun 3, 2023
67088c4
tests: added tests for calculate/plot methods
tieandrews Jun 6, 2023
47372b4
files to be merged added
kellywujy Jun 6, 2023
e4a1653
1.0 notebook final
kellywujy Jun 6, 2023
672f0cd
2.0 rerun
kellywujy Jun 6, 2023
8ac7873
json format fix
kellywujy Jun 7, 2023
f024118
4.0 update model eval
kellywujy Jun 7, 2023
ca10b58
feat: added stride to generating overlap windows
tieandrews Jun 8, 2023
37fefaf
feat: added early stopping with patience 5
tieandrews Jun 8, 2023
5357087
feat: added warmup and seed
tieandrews Jun 8, 2023
9521d9c
bug: added load_best_model_at_end for early stop
tieandrews Jun 8, 2023
2bb2628
gdd query script added
kellywujy Jun 9, 2023
f760187
Usage Update
kellywujy Jun 9, 2023
5487b1d
gddid var name fix
kellywujy Jun 9, 2023
c1236be
col name fixed to match gdd API return
kellywujy Jun 9, 2023
c7e78ab
delete duplicate files
kellywujy Jun 9, 2023
cacf7ef
more log info added
kellywujy Jun 9, 2023
9081dde
test model file added
kellywujy Jun 9, 2023
c962d1e
model file update
kellywujy Jun 9, 2023
521a4c0
logger added to gdd api
kellywujy Jun 9, 2023
81abc0a
updated logging info
kellywujy Jun 9, 2023
4f00209
NaN value converted to Null
kellywujy Jun 9, 2023
2d6114c
Updated to new format of input json
shaunhutch Jun 10, 2023
af2c1dc
Update review tool name
brabbit61 Jun 11, 2023
8572650
Update colors
brabbit61 Jun 11, 2023
4616725
Add tabs for current, completed, irrelevant articles
brabbit61 Jun 11, 2023
5ad7181
Add tabs for current, completed, irrelevant articles
brabbit61 Jun 11, 2023
6d205ef
Fix app redirect
brabbit61 Jun 11, 2023
525d6ac
Fix modals
brabbit61 Jun 11, 2023
e07d67b
Add CSS properties
brabbit61 Jun 11, 2023
bf10b86
Create css file
brabbit61 Jun 11, 2023
d53604b
Update relevance score indicator
brabbit61 Jun 11, 2023
df5aa62
Api fix in progress
kellywujy Jun 11, 2023
7c41967
Add functionality to search by term
kellywujy Jun 12, 2023
84640ed
Fix notifications
brabbit61 Jun 12, 2023
1ad53dd
lang detect test added
kellywujy Jun 12, 2023
c9fbe7e
Minor CSS updates
brabbit61 Jun 12, 2023
c20fc99
Added favicon
shaunhutch Jun 12, 2023
8198904
Add IP address
brabbit61 Jun 13, 2023
9fc8e6a
Add more CSS properties
brabbit61 Jun 13, 2023
dd726d0
Minor Bug fix
brabbit61 Jun 13, 2023
54cb5cb
NaN bug fix
kellywujy Jun 13, 2023
b109d38
Update correct button funtionality
brabbit61 Jun 13, 2023
ff8d40d
gdd parquet added
kellywujy Jun 13, 2023
5a6f850
Bug fixes for the demo
brabbit61 Jun 14, 2023
ddd0f4e
Cleaned up code and added docstring
shaunhutch Jun 14, 2023
0c2b098
Added docstrings for the functions
shaunhutch Jun 14, 2023
ca0ae0d
Added Docstrings for functions
shaunhutch Jun 15, 2023
6e7d10e
Added int conversion for sentid
shaunhutch Jun 15, 2023
21f6068
Version checkpoint - no parquet file set up
kellywujy Jun 16, 2023
85bc107
parquet file
kellywujy Jun 16, 2023
ff5090f
detect bug fixed
kellywujy Jun 16, 2023
53b4ad6
Merge branch 'main' into data-review-tool-shaun
shaunhutch Jun 16, 2023
4fb7fee
Added About Markdown
shaunhutch Jun 17, 2023
ef73219
Made the new sentid negative
shaunhutch Jun 17, 2023
331519f
Updated Home Button
shaunhutch Jun 17, 2023
5e40adc
Added Instructions
shaunhutch Jun 17, 2023
bad92cf
test data_review_tool README.md
shaunhutch Jun 17, 2023
d7065fe
Fixed links
shaunhutch Jun 17, 2023
a07bc77
Added demo
shaunhutch Jun 17, 2023
e43e465
typo fix
shaunhutch Jun 17, 2023
a9572c6
bug: added control for when mlflow diasbled
tieandrews Jun 19, 2023
4d979e9
parquet single run set up
kellywujy Jun 19, 2023
fa048e2
Code clean up
brabbit61 Jun 19, 2023
cf4b408
Docker setup for data review tool
brabbit61 Jun 19, 2023
243ca7d
Merge branch 'data-review-tool-shaun' into data-review-tool-jenit
brabbit61 Jun 19, 2023
d6e561c
Fixed paths
shaunhutch Jun 19, 2023
3f238ec
updated href
shaunhutch Jun 19, 2023
446d386
Merge branch 'data-review-tool-shaun' of github.com:NeotomaDB/MetaExt…
shaunhutch Jun 19, 2023
5e56eca
Updated href round 2
shaunhutch Jun 19, 2023
ed39fb4
Add readme for setting up docker
brabbit61 Jun 19, 2023
8ae5d7b
feat: initial NER pipeline framework
tieandrews Jun 6, 2023
84b13b3
docs: minor reformatting
tieandrews Jun 6, 2023
d22ee40
feat: initial data loading functions complete
tieandrews Jun 7, 2023
4b6762b
feat: initial commit on functioning pipeline
tieandrews Jun 10, 2023
b413ba7
docs: rename of file
tieandrews Jun 10, 2023
3f72466
feat: initial commit for hf extraction module
tieandrews Jun 10, 2023
8e913ed
docs: minor import cleanup
tieandrews Jun 10, 2023
8578c35
tests: initial tests commit
tieandrews Jun 10, 2023
9da09b4
tests: sample gdd data for testing
tieandrews Jun 10, 2023
bb68c41
feat: updated log message format
tieandrews Jun 12, 2023
fef96eb
feat: added max_articles argument for testing
tieandrews Jun 12, 2023
90d3a00
feat: initial spacy integration with pipeline
tieandrews Jun 13, 2023
d5ae2e5
bug: spacy model load not working
tieandrews Jun 16, 2023
294160a
feat: added log file output and location definitin
tieandrews Jun 16, 2023
0ffa1e9
docs: initial dockerignore file
tieandrews Jun 16, 2023
608b242
bug: remove nltk stopwords download in script
tieandrews Jun 16, 2023
09cc278
feat: added env variable model selection
tieandrews Jun 16, 2023
3dbb99a
feat: initial functioning docker image
tieandrews Jun 17, 2023
1364987
feat: docker compose for entity entity extraction
tieandrews Jun 17, 2023
042d955
docs: added comments
tieandrews Jun 17, 2023
d37ddc1
feat: initial entity extraction dockerfile
tieandrews Jun 19, 2023
364c282
feat: initial docker-compose file commit
tieandrews Jun 19, 2023
b56ab5f
docs: entity-extraction pipeline README
tieandrews Jun 19, 2023
cd4b53b
feat: made pipeline run with folder of files
tieandrews Jun 19, 2023
a475077
bug: incorrect reference to df
tieandrews Jun 19, 2023
935f94b
bug: added dotenv to requirements
tieandrews Jun 19, 2023
a59ed6d
bug: tries to load model even if it's not there
tieandrews Jun 19, 2023
19d9ef2
bug: added pytorch and transformers
tieandrews Jun 19, 2023
5a9b3b7
bug: stopwords error, made download quiet
tieandrews Jun 19, 2023
18997e7
bug: load journal articles incorrect replace
tieandrews Jun 19, 2023
3e31033
feat: added non-root accessible file paths
tieandrews Jun 19, 2023
e63a649
feat: created non-root user permissions
tieandrews Jun 19, 2023
9b01ccc
docs: added docker run sample details
tieandrews Jun 19, 2023
e8b1605
bug: removed nltk download call
tieandrews Jun 19, 2023
c5cd5a1
bug: move stopwords download inside function
tieandrews Jun 19, 2023
654478e
Docker setup for data review tool
brabbit61 Jun 19, 2023
e479623
Add readme for setting up docker
brabbit61 Jun 19, 2023
8117c44
bug fix
brabbit61 Jun 19, 2023
c338885
Merge branch 'dev' into data-review-tool-shaun
brabbit61 Jun 19, 2023
769072e
Merge pull request #43 from NeotomaDB/data-review-tool-shaun
brabbit61 Jun 19, 2023
babdb8b
feat: updated preprocessing stride length
tieandrews Jun 19, 2023
50ac915
feat: updated grouped_entities to aggregationstrat
tieandrews Jun 19, 2023
af0e2d4
feat: baseline test results
tieandrews Jun 19, 2023
bc669ec
docs: roberta finetune v3 results
tieandrews Jun 19, 2023
386c058
feat: bert-multilanguage results
tieandrews Jun 20, 2023
b84e7ae
feat: specter2 results
tieandrews Jun 20, 2023
7215a39
docs: removed results folder
tieandrews Jun 20, 2023
796aba7
pipeline fixed
kellywujy Jun 20, 2023
fcf6a9a
Update README
brabbit61 Jun 20, 2023
52aa325
Update path
brabbit61 Jun 20, 2023
2e9ea76
Add video
brabbit61 Jun 20, 2023
db6c143
Added Draft Final Report
shaunhutch Jun 20, 2023
45a99f7
Added Acknowledgements and References Section
shaunhutch Jun 20, 2023
e0d8773
Dockerfile setup
kellywujy Jun 21, 2023
6a65aee
xdd placeholder added
kellywujy Jun 21, 2023
6eb378a
optional argument edits
kellywujy Jun 21, 2023
640afe4
Dockerization done
kellywujy Jun 21, 2023
f61ed6d
readme added
kellywujy Jun 21, 2023
85a9d7a
extra text removed
kellywujy Jun 21, 2023
c7db96d
docs: initial ner model results
tieandrews Jun 21, 2023
6173380
docs: remove ignore of results ner folder
tieandrews Jun 21, 2023
cf04ac4
feat: function to load ner model results json
tieandrews Jun 21, 2023
ce5f7cf
feat: function to plot entity counts
tieandrews Jun 21, 2023
92ead9c
bug: fixed entity extraction pipeline diagram
tieandrews Jun 21, 2023
e73261f
feat: function to plot model results
tieandrews Jun 21, 2023
1257ad1
feat: generate markdown table of NER results
tieandrews Jun 21, 2023
8446dfd
bug: fixed deployment pipeline diagram
tieandrews Jun 21, 2023
6151078
docs: minor formatting fixes of tables
tieandrews Jun 21, 2023
0bfc9e1
docs: initial report pdf
tieandrews Jun 21, 2023
7341c11
docker README updated
kellywujy Jun 21, 2023
129fac0
docs: HF model training readme update
tieandrews Jun 21, 2023
1c617e2
docs: update colab notebook
tieandrews Jun 21, 2023
743780b
feat: final roberta model training script
tieandrews Jun 21, 2023
11ace83
Fixed Markdown Tables and added references
shaunhutch Jun 21, 2023
81bc738
feat: automated baseline evaluation script
tieandrews Jun 21, 2023
8d99f29
docs: labelsetudio setup guide
tieandrews Jun 21, 2023
2aa5079
Updated Video
shaunhutch Jun 21, 2023
9211ba3
Merge branch 'dev' into 22-fine-tune-allenai-specter2-model-for-ner
tieandrews Jun 21, 2023
7329086
Relevance Prediction Notebook Cleaned
kellywujy Jun 21, 2023
aa181eb
bug: docopt outside of main
tieandrews Jun 21, 2023
e554317
redundant notebook clean up
kellywujy Jun 21, 2023
91a40db
Remove redundant script
kellywujy Jun 21, 2023
469ff72
Remove local files from the commit
kellywujy Jun 21, 2023
39573c1
main README updated
kellywujy Jun 21, 2023
f737764
BUG: render error
brabbit61 Jun 21, 2023
dfe4b75
Updated About Page
shaunhutch Jun 21, 2023
d76595f
Added dash-player
shaunhutch Jun 21, 2023
fc50bd7
bug: mermaid diagram not rendering in README
tieandrews Jun 21, 2023
e027c9b
Added Directory Structure
shaunhutch Jun 21, 2023
55bf95d
Merge branch 'dev' into kw-pipeline-notebook-merge
shaunhutch Jun 21, 2023
798f4b8
Merge pull request #46 from NeotomaDB/kw-pipeline-notebook-merge
shaunhutch Jun 21, 2023
75d01a1
Merge pull request #47 from NeotomaDB/data-review-tool-shaun
tieandrews Jun 21, 2023
6d43182
Merge pull request #45 from NeotomaDB/22-fine-tune-allenai-specter2-m…
shaunhutch Jun 21, 2023
442f464
bugs: rendered and rearanged for draft submission
tieandrews Jun 21, 2023
d033041
Merge pull request #48 from NeotomaDB/final-report
tieandrews Jun 21, 2023
be08e65
Added sample JSON
shaunhutch Jun 21, 2023
86c415f
docs: added project links to readme
tieandrews Jun 21, 2023
97f142a
bug: fixed static link
tieandrews Jun 21, 2023
42aa06b
docs: update directory structure
tieandrews Jun 21, 2023
38b4f66
bug: docker bug
tieandrews Jun 21, 2023
Cleaned up code and added docstring
shaunhutch committed Jun 14, 2023
commit ddd0f4ea31ccfada2d13e208f6fe5062dfd74eee
37 changes: 10 additions & 27 deletions src/data_review_tool/pages/navbar.py
@@ -1,9 +1,6 @@
import dash
from dash import dcc, html

from dash import html
import dash_bootstrap_components as dbc
import pandas as pd
import json
import dash_mantine_components as dmc
from pages.config import *

def create_navbar():
@@ -49,31 +46,17 @@ def create_navbar():
     )
     return navbar
 
-# The following two functions were taken from https://stackoverflow.com/questions/54776916/inverse-of-pandas-json-normalize
-def _get_nested_fields(df: pd.DataFrame):
-    """Return a list of nested fields, sorted by the deepest level of nesting first."""
-    nested_fields = [*{field.rsplit(".", 1)[0] for field in df.columns if "." in field}]
-    nested_fields.sort(key=lambda record: len(record.split(".")), reverse=True)
-    return nested_fields
+def find_start_end_char(text, entity):
+    """Find the start and end character of an entity in a text.
 
-def df_denormalize(df: pd.DataFrame) -> pd.DataFrame:
-    """
-    Convert a normalised DataFrame into a nested structure.
+    Args:
+        text (str): Text to search for entity.
+        entity (str): Entity to search for in text.
 
-    Fields separated by '.' are considered part of a nested structure.
+    Returns:
+        start (int): Start character of entity in text.
+        end (int): End character of entity in text.
     """
-    nested_fields = _get_nested_fields(df)
-    for field in nested_fields:
-        list_of_children = [column for column in df.columns if field in column]
-        rename = {
-            field_name: field_name.rsplit(".", 1)[1] for field_name in list_of_children
-        }
-        renamed_fields = df[list_of_children].rename(columns=rename)
-        df[field] = json.loads(renamed_fields.to_json(orient="records"))
-        df.drop(list_of_children, axis=1, inplace=True)
-    return df
-
-def find_start_end_char(text, entity):
     start = text.find(entity)
     if start == -1:
         end = -1
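The diff cuts off partway through `find_start_end_char`, so the rest of the function is not visible here. As a minimal sketch of how it could plausibly continue, the version below keeps the three visible lines and adds an `else` branch and a `return`. Both additions are assumptions, as is the choice of an exclusive end offset (so that `text[start:end] == entity`); the actual code in `navbar.py` may differ.

```python
def find_start_end_char(text, entity):
    """Find the start and end character of an entity in a text.

    Returns (-1, -1) when the entity does not occur in the text.
    """
    # These three lines are the ones visible in the diff.
    start = text.find(entity)
    if start == -1:
        end = -1
    else:
        # Assumption: the end offset is exclusive, so text[start:end] == entity.
        end = start + len(entity)
    # Assumption: the function returns both offsets as a tuple.
    return start, end


if __name__ == "__main__":
    sentence = "Pollen from Lake Ontario"
    start, end = find_start_end_char(sentence, "Lake Ontario")
    print(start, end, sentence[start:end])
```

Note that `str.find` reports only the first occurrence, so an entity that appears several times in a sentence would always resolve to its earliest position under this sketch.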