Notes for WeeWX developers¶
This guide is intended for developers contributing to the open source project WeeWX.
Goals¶
The primary design goals of WeeWX are:
-
Architectural simplicity. No semaphores, no named pipes, no inter-process communications, no complex multi-threading to manage.
-
Extensibility. Make it easy for the user to add new features or to modify existing features.
-
Fast enough. In any design decision, architectural simplicity and elegance trump speed.
-
One code base. A single code base should be used for all platforms, all weather stations, all reports, and any combination of features. Ample configuration and customization options should be provided so the user does not feel tempted to start hacking code. At worst, the user may have to subclass, which is much easier to port to newer versions of the code base than customizing the base code.
-
Minimal dependencies. The code should rely on a minimal number of external packages, so the user does not have to go chase them down all over the Web before getting started.
-
Simple data model. The implementation should use a very simple data model that is likely to support many different types of hardware.
-
A pythonic code base. The code should be written in a style that others will recognize.
Strategies¶
To meet these goals, the following strategies were used:
-
A micro-kernel design. The WeeWX engine actually does very little. Its primary job is to load and run services at runtime, making it easy for users to add or subtract features.
-
A largely stateless design style. For example, many of the processing routines read the data they need directly from the database, rather than caching it and sharing with other routines. While this means the same data may be read multiple times, it also means the only point of possible cache incoherence is through the database, where transactions are easily controlled. This greatly reduces the chances of corrupting the data, making it much easier to understand and modify the code base.
-
Isolated data collection and archiving. The code for collecting and archiving data runs in a single thread that is simple enough that it is unlikely to crash. Report processing is where most mistakes are likely to happen, so it is isolated in a separate thread. If it crashes, it will not affect the main data thread.
-
A powerful configuration parser. The ConfigObj module, by Michael Foord and Nicola Larosa, was chosen to read the configuration file. This allows many options that might otherwise have to go in the code to be in a configuration file.
-
A powerful templating engine. The Cheetah module was chosen for generating HTML and other types of files from templates. Cheetah allows search list extensions to be defined, making it easy to extend WeeWX with new template tags.
-
Pure Python. The code base is 100% Python — no underlying C libraries need be built to install WeeWX. This also means no Makefiles are needed.
While WeeWX is nowhere near as fast at generating images and HTML as its predecessor, wview (this is partially because WeeWX uses fancier fonts and a much more powerful templating engine), it is fast enough for all platforms but the slowest. I run it regularly on a 500 MHz machine where generating the 9 images used in the Current Conditions page takes just under 2 seconds (compared with 0.4 seconds for wview).
All writes to the databases are protected by transactions. You can kill the program at any time without fear of corrupting the databases.
The code makes ample use of exceptions to ensure graceful recovery from problems such as network outages. It also monitors socket and console timeouts, restarting whatever it was working on several times before giving up. In the case of an unrecoverable console error (such as the console not responding at all), the program waits 60 seconds, then restarts from the top.
Any hard exceptions, that is, those that do not involve network or console timeouts and are most likely due to a logic error, are logged, re-raised, and ultimately cause thread termination. If this happens in the main thread (not likely due to its simplicity), then this causes program termination. If it happens in the report processing thread (much more likely), then only the generation of reports will be affected; the main thread will continue downloading data off the instrument and putting them in the database.
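The split between recoverable timeouts and hard exceptions can be illustrated with a minimal sketch. It is not the WeeWX implementation: TransientError, run_with_retries, and flaky_read are hypothetical names invented for this example, and the retry count and wait time are arbitrary.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a recoverable problem, such as a console timeout."""


def run_with_retries(task, max_tries=3, retry_wait=0.0):
    """Call task() until it succeeds, retrying only on TransientError.

    Any other exception is treated as a logic error and propagates at once.
    """
    for attempt in range(1, max_tries + 1):
        try:
            return task()
        except TransientError:
            if attempt == max_tries:
                # Out of retries: let the caller decide what to do next.
                raise
            time.sleep(retry_wait)


# Usage: a task that times out twice, then succeeds on the third try.
calls = {'count': 0}


def flaky_read():
    calls['count'] += 1
    if calls['count'] < 3:
        raise TransientError("console did not respond")
    return {'outTemp': 56.0}


packet = run_with_retries(flaky_read)
```

Because only the expected exception type is caught, a logic error such as a ValueError is never retried and surfaces immediately.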
Units¶
In general, there are three different areas where the unit system makes a difference:
-
On the weather station hardware. Different manufacturers use different unit systems for their hardware. The Davis Vantage series use U.S. Customary units exclusively, Fine Offset and LaCrosse stations use metric, while Oregon Scientific, Peet Bros, and Hideki stations use a mishmash of US and metric.
-
In the database. Either US or Metric can be used.
-
In the presentation (i.e., html and image files).
The general strategy is that measurements are converted by service StdConvert as they come off the weather station into a target unit system, then stored internally in the database in that unit system. Then, as they come off the database to be used for a report, they are converted into a target unit, specified by a combination of the configuration file weewx.conf and the skin configuration file skin.conf.
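As a rough illustration of converting a record from one unit system to another, here is a minimal sketch. It does not use the WeeWX unit machinery: the conversion table, the function names, and the sample record are all assumptions made for this example.

```python
def fahrenheit_to_celsius(value):
    """Convert degrees Fahrenheit to Celsius, passing None through untouched."""
    return None if value is None else (value - 32.0) * 5.0 / 9.0


def inhg_to_mbar(value):
    """Convert inches of mercury to millibars, passing None through untouched."""
    return None if value is None else value * 33.8639


# Hypothetical conversion table: observation type -> converter function.
US_TO_METRIC = {
    'outTemp': fahrenheit_to_celsius,
    'barometer': inhg_to_mbar,
}


def convert_us_to_metric(record):
    """Return a new record with every known observation type converted.

    Types without an entry in the table are copied unchanged.
    """
    return {
        obs_type: US_TO_METRIC.get(obs_type, lambda v: v)(value)
        for obs_type, value in record.items()
    }


us_record = {'dateTime': 1454695200, 'outTemp': 50.0, 'barometer': None}
metric_record = convert_us_to_metric(us_record)
```

Note that the timestamp is left alone, and that a bad reading (None) stays None rather than raising an error.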
Value None¶
The Python special value None is used throughout to signal an invalid or bad data point. All functions must be written to expect it. Device drivers should be written to emit None if a data value is bad (perhaps because of a failed checksum). If the hardware simply doesn't support a data type, then the driver should not emit a value at all.
The same rule applies to derived values. If the input data for a derived value are missing, then no derived value should be emitted. However, if the input values are present, but have value None, then the derived value should be set to None.
However, the time value must never be None. This is because it is used as the primary key in the SQL database.
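The difference between a missing input and an input with value None can be sketched as follows. The derived type tempSpread and the function calc_temp_spread are hypothetical, chosen only to keep the arithmetic trivial.

```python
def calc_temp_spread(record):
    """Return {'tempSpread': ...} for the record, or {} if inputs are absent.

    'tempSpread' is a hypothetical derived type: inTemp minus outTemp.
    """
    # Input types missing altogether: emit nothing at all.
    if 'inTemp' not in record or 'outTemp' not in record:
        return {}
    # Inputs present but bad: emit the derived type with value None.
    if record['inTemp'] is None or record['outTemp'] is None:
        return {'tempSpread': None}
    return {'tempSpread': record['inTemp'] - record['outTemp']}


good = calc_temp_spread({'dateTime': 1454695200, 'inTemp': 70.0, 'outTemp': 50.0})
bad = calc_temp_spread({'dateTime': 1454695200, 'inTemp': 70.0, 'outTemp': None})
absent = calc_temp_spread({'dateTime': 1454695200, 'inTemp': 70.0})
```

Here good carries a number, bad carries None, and absent is an empty dictionary, so nothing would be added to the record.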
Time¶
WeeWX stores all data in UTC (roughly, Greenwich or Zulu) time. However, one is usually interested in weather events in local time and wants image and HTML generation to reflect that. Furthermore, most weather stations are configured in local time. This requires that many data times be converted back and forth between UTC and local time. To avoid tripping over time zones and daylight saving time (DST), WeeWX generally uses Python routines to do this conversion. Nowhere in the code base is there any explicit recognition of DST. Instead, its presence is implicit in the conversions. At times, this can cause the code to be relatively inefficient.
For example, if one wanted to plot something every 3 hours in UTC time, it would be very simple: to get the next plot point, just add 10,800 to the epoch time:
next_ts = last_ts + 10800
But, if one wanted to plot something for every 3 hours in local time (that is, at 0000, 0300, 0600, etc.), despite a possible DST change in the middle, then things get a bit more complicated. One could modify the above to recognize whether a DST transition occurs sometime between last_ts and the next three hours and, if so, make the necessary adjustments. This is generally what wview does. WeeWX takes a different approach and converts from UTC to local, does the arithmetic, then converts back. This is inefficient, but bulletproof against changes in DST algorithms, etc.:
import datetime
import time

time_dt = datetime.datetime.fromtimestamp(last_ts)
delta = datetime.timedelta(seconds=10800)
next_dt = time_dt + delta
next_ts = int(time.mktime(next_dt.timetuple()))
Other time conversion problems are handled in a similar manner.
For astronomical calculations, WeeWX uses the latitude and longitude specified in the configuration file. If that location does not correspond to the computer's local time, reports with astronomical times will probably be incorrect.
Archive records¶
An archive record's timestamp, whether in software or in the database, represents the end time of the record. For example, a record timestamped 05-Feb-2016 09:35 includes data from an instant after 09:30 through 09:35. Another way to think of it is that it is exclusive on the left, inclusive on the right. Schematically:
09:30 < dateTime <= 09:35
Database queries should reflect this. For example, to find the maximum temperature for the hour between timestamps 1454691600 and 1454695200, the query would be:
SELECT MAX(outTemp) FROM archive
WHERE dateTime > 1454691600 and dateTime <= 1454695200;
This ensures that the record at the beginning of the hour (1454691600) does not get included (it belongs to the previous hour), while the record at the end of the hour (1454695200) does.
One must constantly be aware of this convention when working with timestamped data records.
Better yet, if you need this kind of information, use an xtypes call:
max_temp = weewx.xtypes.get_aggregate('outTemp',
                                      (1454691600, 1454695200),
                                      'max',
                                      db_manager)
It will not only make sure the limits of the query are correct, but will also decide whether the daily summary optimization can be used (details below). If not, it will use the regular archive table.
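To see the "exclusive on the left, inclusive on the right" convention in isolation, the following sketch runs the hourly query against a throwaway in-memory SQLite table. The table holds only two columns and three made-up records; it is an illustration, not the real WeeWX schema.

```python
import sqlite3

# Build a throwaway in-memory table with three made-up records.
connection = sqlite3.connect(':memory:')
connection.execute("CREATE TABLE archive (dateTime INTEGER PRIMARY KEY, outTemp REAL)")
connection.executemany(
    "INSERT INTO archive VALUES (?, ?)",
    [
        (1454691600, 99.0),  # Ends exactly at the start of the hour: previous hour
        (1454691900, 50.0),  # First record that belongs to the hour
        (1454695200, 55.0),  # Ends exactly at the end of the hour: included
    ],
)

start_ts, stop_ts = 1454691600, 1454695200
row = connection.execute(
    "SELECT MAX(outTemp) FROM archive WHERE dateTime > ? AND dateTime <= ?",
    (start_ts, stop_ts),
).fetchone()
max_temp = row[0]
connection.close()
```

The record stamped 1454691600 holds the highest value in the table, but it belongs to the previous hour, so the query returns 55.0 rather than 99.0.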
Internationalization¶
Generally, WeeWX is locale aware. It will emit reports using the local formatting conventions for date, times, and values.
Exceptions¶
In general, your code should not simply swallow an exception. For example, this is bad form:
try:
    os.rename(oldname, newname)
except:
    pass
While the odds are that if an exception happens it will be because the file oldname does not exist, that is not guaranteed. It could be because of a keyboard interrupt, or a corrupted file system, or something else. Instead, you should test explicitly for any expected exception, and let the rest go by:
try:
    os.rename(oldname, newname)
except OSError:
    pass
WeeWX has a few specialized exception types, used to rationalize all the different types of exceptions that could be thrown by the underlying libraries. In particular, low-level I/O code can raise a myriad of exceptions, such as USB errors, serial errors, network connectivity errors, etc. All device drivers should catch these exceptions and convert them into an exception of type WeeWxIOError or one of its subclasses.
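A driver might perform this conversion along the following lines. This is a sketch under stated assumptions: the WeeWxIOError class below is a local stand-in so the example runs without WeeWX installed, and read_raw_packet, its socket transport, and the default byte count are hypothetical.

```python
import socket


class WeeWxIOError(IOError):
    """Local stand-in so the sketch runs without WeeWX installed."""


def read_raw_packet(sock, nbytes=99):
    """Hypothetical driver helper: read bytes, reporting I/O trouble uniformly."""
    try:
        data = sock.recv(nbytes)
    except OSError as e:
        # socket.timeout and other network errors are subclasses of OSError.
        # Translate them, keeping the original exception as the cause.
        raise WeeWxIOError("Error reading from station: %s" % e) from e
    if not data:
        raise WeeWxIOError("Station closed the connection")
    return data
```

Code further up the call chain then needs to handle only one exception family, regardless of whether the hardware is reached over a socket, a serial port, or USB.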
Naming conventions¶
How you name variables makes a big difference in code readability. In general, long names are preferable to short names. Instead of this,
p = 990.1
use this,
pressure = 990.1
or, even better, this:
pressure_mbar = 990.1
WeeWX uses a number of conventions to signal the variable type, although they are not used consistently.
Suffix | Example | Description |
_ts | first_ts | Variable is a timestamp in unix epoch time. |
_dt | start_dt | Variable is an instance of datetime.datetime, usually in local time. |
_d | end_d | Variable is an instance of datetime.date, usually in local time. |
_tt | sod_tt | Variable is an instance of time.struct_time (a time tuple), usually in local time. |
_vh | pressure_vh | Variable is an instance of weewx.units.ValueHelper. |
_vt | speed_vt | Variable is an instance of weewx.units.ValueTuple. |
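The time-related suffixes in the table can be seen side by side in a short sketch. The variable names follow the table; the timestamp is arbitrary, and the plain tuple at the end only stands in for a weewx.units.ValueTuple.

```python
import datetime
import time

first_ts = 1454691600                                  # Unix epoch time
start_dt = datetime.datetime.fromtimestamp(first_ts)   # datetime.datetime, local time
end_d = start_dt.date()                                # datetime.date, local time

# Start of the local day, as a datetime, a time tuple, and a timestamp.
sod_dt = start_dt.replace(hour=0, minute=0, second=0, microsecond=0)
sod_tt = sod_dt.timetuple()                            # time.struct_time, local time
sod_ts = int(time.mktime(sod_tt))                      # back to Unix epoch time

# A plain tuple standing in for a value tuple: (value, unit, unit group).
temperature_vt = (21.2, 'degree_C', 'group_temperature')
```

With the suffix in the name, a reader can tell at a glance that sod_ts can be compared with first_ts, while sod_dt cannot.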
Code style¶
Generally, we try to follow the PEP 8 style guide, but there are many exceptions. In particular, many older WeeWX function names use camelCase, but PEP 8 calls for snake_case. Please use snake_case for new code.
Most modern code editors, such as Eclipse or PyCharm, have the ability to automatically format code. Resist the temptation and don't use this feature! Two reasons:
-
Unless all developers use the same tool, using the same settings, we will just thrash back and forth between slightly different versions.
-
Automatic formatters play a useful role, but some of what they do amounts to trivial changes, such as removing spaces in otherwise blank lines. Now if someone is trying to figure out what real, syntactic changes you have made, they will have to wade through all those extraneous changed lines, trying to find the important stuff.
If you are working with a file where the formatting is so ragged that you really must do a reformat, then do it as a separate commit. This allows the formatting changes to be clearly distinguished from more functional changes.
When invoking functions or instantiating classes, use the fully qualified name. Don't do this:
from datetime import datetime
now = datetime.now()
Instead, do this:
import datetime
now = datetime.datetime.now()
Git work flow¶
We use Git as the source control system. If Git is still mysterious to you, bookmark this: Pro Git, then read the chapter Git Basics. Also recommended is the article How to Write a Git Commit Message.
The code is hosted on GitHub. Their documentation is very extensive and helpful.
We generally follow Vincent Driessen's branching model. Ignore the complicated diagram at the beginning of the article, and just focus on the text. In this model, there are two key branches:
-
'master'. Fixes go into this branch. We tend to use fewer hot fix branches and, instead, just incorporate any fixes directly into the branch. Releases are tagged relative to this branch.
-
'development' (called develop in Vince's article). This is where new features go. Before a release, they will be merged into the master branch.
What this means to you is that if you submit a pull request that includes a new feature, make sure you commit your changes relative to the development branch. If it is just a bug fix, it should be committed against the master branch.
Forking the repository¶
The WeeWX GitHub repository is configured to use GitHub Actions to run Continuous Integration (CI) workflows automatically if certain git operations are done on branches under active development.
This means that CI workflows will also be run on any forks that you may have made if the configured git action is done. This can be confusing if you get an email from GitHub when these tasks fail for some reason on your fork.
To control GitHub Actions for your fork, see the recommended solutions in this GitHub discussion on this topic.
Tools¶
Python¶
JetBrains' PyCharm is excellent, and now there's a free Community Edition. It has many advanced features, yet is structured so that you need not be exposed to them until you need them. Highly recommended.
HTML and Javascript¶
For Javascript, JetBrains' WebStorm is excellent, particularly if you will be using a framework such as Node.js or Express.js.
Daily summaries¶
This section builds on the section The database in the Customization Guide. Read it first.
The big flat table in the database (usually called table archive) is the definitive table of record. While it includes a lot of information, querying it can be slow. For example, to find the maximum temperature of the year would require scanning the entire table, which might include 100,000 or more records. To speed things up, WeeWX includes daily summaries in the database as an optimization.
In the daily summaries, each observation type gets its own table, which holds a statistical summary for the day. For example, for outside temperature observation type outTemp, this table would be named archive_day_outTemp. This is what it would look like:
dateTime | min | mintime | max | maxtime | sum | count | wsum | sumtime |
1652425200 | 44.7 | 1652511600 | 56.0 | 1652477640 | 38297.0 | 763 | 2297820.0 | 45780 |
1652511600 | 44.1 | 1652531280 | 66.7 | 1652572500 | 76674.4 | 1433 | 4600464.0 | 85980 |
1652598000 | 50.3 | 1652615220 | 59.8 | 1652674320 | 32903.0 | 611 | 1974180.0 | 36660 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
This is what the table columns mean:
Name | Meaning |
dateTime | The time of the start of day in unix epoch time. This is the primary key in the database. It must be unique, and it cannot be null. |
min | The minimum temperature seen for the day. The unit is whatever unit system the main archive table uses (generally given by the first record in the table). |
mintime | The time in unix epoch time of the minimum temperature. |
max | The maximum temperature seen for the day. The unit is whatever unit system the main archive table uses (generally given by the first record in the table). |
maxtime | The time in unix epoch time of the maximum temperature. |
sum | The sum of all the temperatures for the day. |
count | The number of records in the day. |
wsum | The weighted sum of all the temperatures for the day. The weight is the archive interval in seconds. That is, for each record, the temperature is multiplied by the length of the archive record in seconds, then summed up. |
sumtime | The sum of all the archive intervals for the day in seconds. If the archive interval didn't change during the day, then this number would be interval * 60 * count. |
Note how the average temperature for the day can be calculated as wsum / sumtime. This will be true even if the archive interval changes during the day.
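A small worked example may help show why the weighted columns matter. The three records below are made up and describe a day on which the archive interval changed from 5 to 10 minutes; the variable names mirror the column names, except that total is used for the sum column to avoid shadowing the Python built-in.

```python
# (outTemp, archive interval in seconds). Values are made up.
records = [(50.0, 300), (52.0, 300), (58.0, 600)]

total = sum(temp for temp, _ in records)                    # column "sum"
count = len(records)                                        # column "count"
wsum = sum(temp * interval for temp, interval in records)   # column "wsum"
sumtime = sum(interval for _, interval in records)          # column "sumtime"

simple_average = total / count       # treats every record equally
weighted_average = wsum / sumtime    # weights each record by its duration
```

The simple average comes out near 53.3, while the weighted average is 54.5, because the 10-minute record covers twice as much of the day as each of the others. Applied to the first row of the table above, 2297820.0 / 45780 gives about 50.2.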
Now consider an extensive variable such as rain. The total rainfall for the day will be given by the field sum. So, calculating the total rainfall for the year can be done by scanning and summing only 365 records, instead of potentially tens, or even hundreds, of thousands of records. This results in a dramatic speed up for report generation, particularly on slower machines such as the Raspberry Pi, working off an SD card.
Wind summaries¶
The daily summary for wind includes six additional fields. This is what they mean:
Name | Meaning |
max_dir | The direction of the maximum wind seen for the day. |
xsum | The sum of the x-component (east-west) of the wind for the day. |
ysum | The sum of the y-component (north-south) of the wind for the day. |
dirsumtime | The sum of all the archive intervals for the day in seconds, which contributed to xsum and ysum. |
squaresum | The sum of the wind speed squared for the day. |
wsquaresum | The sum of the weighted wind speed squared for the day. That is the wind speed is squared, then multiplied by the archive interval in seconds, then summed for the day. This is useful for calculating RMS wind speed. |
Note that the RMS wind speed can be calculated as
math.sqrt(wsquaresum / sumtime)
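The following sketch builds the weighted wind sums from a few made-up records and derives the RMS speed from them. The vector-average direction at the end rests on assumptions not spelled out in the table: that xsum and ysum are weighted by the archive interval, that x points east and y points north, and that direction is a compass bearing in degrees.

```python
import math

# (wind speed, compass direction in degrees, archive interval in seconds).
# Values are made up.
records = [(3.0, 90.0, 300), (4.0, 90.0, 300)]

wsquaresum = sum(speed ** 2 * interval for speed, _, interval in records)
sumtime = sum(interval for _, _, interval in records)
rms_speed = math.sqrt(wsquaresum / sumtime)

# Assumed convention: x points east, y points north, direction is a compass bearing.
xsum = sum(s * math.sin(math.radians(d)) * w for s, d, w in records)
ysum = sum(s * math.cos(math.radians(d)) * w for s, d, w in records)
vector_dir = math.degrees(math.atan2(xsum, ysum)) % 360.0
```

For these two records the RMS speed is the square root of 12.5, about 3.54, which is slightly higher than the arithmetic mean of 3.5 because squaring gives more weight to the stronger reading.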
Glossary¶
This is a glossary of terminology used throughout the code.
Name | Description |
archive interval | WeeWX does not store the raw data that comes off a weather station. Instead, it aggregates the data over a length of time, the archive interval, and then stores that. |
archive record | While packets are raw data that comes off the weather station, records are data aggregated by time. For example, temperature may be the average temperature over an archive interval. These are the data stored in the SQL database. |
config_dict | All configuration information used by WeeWX is stored in the configuration file, usually with the name weewx.conf. By convention, when this file is read into the program, it is called config_dict, an instance of the class configobj.ConfigObj. |
datetime | An instance of the Python object datetime.datetime. Variables of type datetime usually have a suffix _dt. |
db_dict | A dictionary with all the data necessary to bind to a database. An example for SQLite would be { 'driver':'db.sqlite', 'SQLITE_ROOT':'/home/weewx/archive', 'database_name':'weewx.sdb' }. An example for MySQL would be { 'driver':'db.mysql', 'host':'localhost', 'user':'weewx', 'password':'mypassword', 'database_name':'weewx' }. |
epoch time | Sometimes referred to as "unix time," or "unix epoch time." The number of seconds since the epoch, which is 1 Jan 1970 00:00:00 UTC. Hence, it always represents UTC (well... after adding a few leap seconds... but, close enough). This is the time used in the databases and appears as type dateTime in the SQL schema, perhaps an unfortunate name because of the similarity to the completely unrelated Python type datetime. Very easy to manipulate, but it is a big opaque number. |
LOOP packet | The real-time data coming off the weather station. The terminology "LOOP" comes from the Davis series of weather stations. A LOOP packet can contain all observation types, or it may contain only some of them ("Partial packet"). |
observation type | A physical quantity measured by a weather station (e.g., outTemp) or something derived from it (e.g., dewpoint). |
skin_dict | All configuration information used by a particular skin is stored in the skin configuration file, usually with the name skin.conf. By convention, when this file is read into the program, it is called skin_dict, an instance of the class configobj.ConfigObj. |
SQL type | A type that appears in the SQL database. This usually looks something like outTemp, barometer, extraTemp1, and so on. |
standard unit system | A complete set of units used together. Either US, METRIC, or METRICWX. |
time stamp | A variable in unix epoch time. Always in UTC. Variables carrying a time stamp usually have a suffix _ts. |
tuple-time | An instance of the Python object time.struct_time. This is a 9-way tuple that represents a time. It could be in either local time or UTC, though usually the former. See module time for more information. Variables carrying tuple time usually have a suffix _tt. |
value tuple | A 3-way tuple. First element is a value, second element the unit type the value is in, the third the unit group. An example would be (21.2, 'degree_C', 'group_temperature'). |
WEEWX_ROOT | The location of the station data area. Unfortunately, due to legacy reasons, its value can be interpreted two different ways. In the configuration file weewx.conf, its value is relative to the location of the configuration file. In memory, in the data structure config_dict, it is an absolute path. |