-
Notifications
You must be signed in to change notification settings - Fork 25
Open
Description
Today simpleflow only logs to stdout and with little contextual information on the performed actions. It falls flat when trying to identify logs from a specific workflow or activity at a given point in time, given that machines often perform different tasks simultaneously.
I'd like to add a second handler, optional, to a remote syslog target (think rsyslog remote server, or a hosted service such as logmatic or papertrail).
It would log things in the following format (please criticize):
# for tasks performed "inside" a workflow
<workflow_id>: <activity|decision>#<started_event_id> <log level> pid=<pid> <message>
# for tasks performed "outside" a workflow (workers initialization, etc.)
<message>
# or (tell me?)
simpleflow: <message>
The result would look like this in papertrail or your favorite syslog server:
May 14 10:22:19 127.0.0.1 logger: INFO pid=3817 using task list basic-bcb8a2b0092072ef-ca7d812541f544fe964ded26ae5cc8db
May 14 10:22:19 127.0.0.1 logger: INFO pid=3823 starting <bound method ActivityPoller.start of ActivityPoller(domain=TestDomain, task_list=basic-bcb8a2b0092072ef-ca7d812541f544fe964ded26ae5cc8db)>
May 14 10:22:19 127.0.0.1 logger: INFO pid=3822 starting <bound method DeciderPoller.start of DeciderPoller(TestDomain, basic-bcb8a2b0092072ef-ca7d812541f544fe964ded26ae5cc8db, basic)>
May 14 10:22:19 127.0.0.1 logger: INFO pid=3824 starting ActivityPoller(task_list=basic-bcb8a2b0092072ef-ca7d812541f544fe964ded26ae5cc8db) on domain TestDomain
May 14 10:22:19 127.0.0.1 logger: INFO pid=3825 starting DeciderPoller(workflow=basic) on domain TestDomain
May 14 10:22:19 127.0.0.1 basic-bcb8a2b0092072ef: decision#3 INFO pid=3826 taking decision for workflow basic-bcb8a2b0092072ef (basic)
May 14 10:22:19 127.0.0.1 basic-bcb8a2b0092072ef: decision#3 INFO pid=3826 execution context from decider: {'workflow_id': u'basic-bcb8a2b0092072ef', 'version': u'example', 'name': u'basic', 'run_id': u'22ukk/DSnnnnnnnnnS5OR7mOxdUg1Y=', 'parent_workflow_id': None, 'tag_list': [u'a=1', u'b=foo'], 'parent_run_id': None, 'domain_name': u'TestDomain', 'continued_execution_run_id': None}
May 14 10:22:19 127.0.0.1 basic-bcb8a2b0092072ef: decision#3 INFO pid=3826 1 open activities (1 decisions)
May 14 10:22:19 127.0.0.1 basic-bcb8a2b0092072ef: decision#3 INFO pid=3826 completing decision for workflow basic-bcb8a2b0092072ef (basic)
May 14 10:22:19 127.0.0.1 basic-bcb8a2b0092072ef: activity#6 INFO pid=3827 activity context: {'activity_id': u'activity-examples.basic.increment-1', 'name': u'examples.basic.increment', 'run_id': u'22ukk/DSnnnnnnnnnS5OR7mOxdUg1Y=', 'domain_name': u'TestDomain', 'workflow_id': u'basic-bcb8a2b0092072ef', 'version': u'example', 'input': u'{"args":[1],"kwargs":{}}'}
May 14 10:22:19 127.0.0.1 basic-bcb8a2b0092072ef: decision#9 INFO pid=3828 taking decision for workflow basic-bcb8a2b0092072ef (basic)
May 14 10:22:19 127.0.0.1 basic-bcb8a2b0092072ef: decision#9 INFO pid=3828 execution context from decider: {'workflow_id': u'basic-bcb8a2b0092072ef', 'version': u'example', 'name': u'basic', 'run_id': u'22ukk/DSnnnnnnnnnS5OR7mOxdUg1Y=', 'parent_workflow_id': None, 'tag_list': [u'a=1', u'b=foo'], 'parent_run_id': None, 'domain_name': u'TestDomain', 'continued_execution_run_id': None}
May 14 10:22:20 127.0.0.1 basic-bcb8a2b0092072ef: decision#9 INFO pid=3828 2 open activities (2 decisions)
May 14 10:22:20 127.0.0.1 basic-bcb8a2b0092072ef: decision#9 INFO pid=3828 completing decision for workflow basic-bcb8a2b0092072ef (basic)
May 14 10:22:20 127.0.0.1 basic-bcb8a2b0092072ef: decision#16 INFO pid=3831 taking decision for workflow basic-bcb8a2b0092072ef (basic)
May 14 10:22:20 127.0.0.1 basic-bcb8a2b0092072ef: decision#16 INFO pid=3831 execution context from decider: {'workflow_id': u'basic-bcb8a2b0092072ef', 'version': u'example', 'name': u'basic', 'run_id': u'22ukk/DSnnnnnnnnnS5OR7mOxdUg1Y=', 'parent_workflow_id': None, 'tag_list': [u'a=1', u'b=foo'], 'parent_run_id': None, 'domain_name': u'TestDomain', 'continued_execution_run_id': None}
May 14 10:22:20 127.0.0.1 basic-bcb8a2b0092072ef: decision#16 INFO pid=3831 result of (1 + 1) * 2 = 4
May 14 10:22:20 127.0.0.1 basic-bcb8a2b0092072ef: decision#16 INFO pid=3831 1 open activities (0 decisions)
May 14 10:22:20 127.0.0.1 basic-bcb8a2b0092072ef: decision#16 INFO pid=3831 completing decision for workflow basic-bcb8a2b0092072ef (basic)
May 14 10:22:50 127.0.0.1 basic-bcb8a2b0092072ef: decision#21 INFO pid=3843 taking decision for workflow basic-bcb8a2b0092072ef (basic)
May 14 10:22:50 127.0.0.1 basic-bcb8a2b0092072ef: decision#21 INFO pid=3843 execution context from decider: {'workflow_id': u'basic-bcb8a2b0092072ef', 'version': u'example', 'name': u'basic', 'run_id': u'22u4Mb2YpWB4lVc4ZabGNkk/DSPnRMocS5OR7mOxdUg1Y=', 'parent_workflow_id': None, 'tag_list': [u'a=1', u'b=foo'], 'parent_run_id': None, 'domain_name': u'TestDomain', 'continued_execution_run_id': None}
May 14 10:22:50 127.0.0.1 basic-bcb8a2b0092072ef: decision#21 INFO pid=3843 result of (1 + 1) * 2 = 4
May 14 10:22:50 127.0.0.1 basic-bcb8a2b0092072ef: decision#21 INFO pid=3843 completing decision for workflow basic-bcb8a2b0092072ef (basic)
May 14 10:22:51 127.0.0.1 logger: INFO pid=3823 process: caught signal signal=SIGTERM pid=3823
May 14 10:22:51 127.0.0.1 logger: INFO pid=3823 process: will stop workers, this might take up several minutes. Please, be patient.
May 14 10:22:51 127.0.0.1 logger: INFO pid=3823 process: sending SIGTERM to pid=3824
May 14 10:22:51 127.0.0.1 logger: INFO pid=3823 process: waiting for proces=psutil.Process(pid=3824, name='Python') to finish.
May 14 10:22:51 127.0.0.1 logger: INFO pid=3824 process: caught signal signal=SIGTERM pid=3824
May 14 10:22:51 127.0.0.1 logger: INFO pid=3824 stopping ActivityPoller(task_list=basic-bcb8a2b0092072ef-ca7d812541f544fe964ded26ae5cc8db)
Remarks:
- here
logger
as the default program name, and some logs for abasic-bcb8a2b0092072ef
workflow - domain is supposed to be obvious ; in Botify case domain matches the "environment" and given a specific machine, we know the SWF domain it uses directly ; YMMV
- we could log a handful of other things but logs become very noisy and hard to read by a human (task list, activity id, activity name, etc.) => better have it in a tool above that, and this can always be changed later by making the logging pattern more flexible
- I didn't test how bare client
logging.getLogger().info("foo")
behave, especially in pypy forks when using simpleflow.execute - I tested it with
logging.handlers.SysLogHandler
which has the following limitations: silent errors when logging with TCP (or maybe papertrail's fault?), explicit errors on UDP when above the max packet size (~9k bytes on my machine, but depends on MTU and remote syslog server I guess) ; no TCP+TLS support - logging with syslog introduce another limitation, that "program name" is the "workflow id" in our case ; in syslog program name is limited to 64 chars, but workflow id in SWF is limited to 256 ; in practice we don't have much workflows over 80 chars at Botify and they're all unique when truncated to 52 chars, but YMMV
- the implementation would rely on environment variables which are 1/ non-intrusive and 2/ easy to retrieve in subprocesses, optionally not from simpleflow: any "poll_for_(activity|decision)_task" call would set a few environment variables, that the logger or any other program could use
Thoughts?
Metadata
Metadata
Assignees
Labels
No labels