Version: 4.9
Notes for Developers of the WeeWX Weather System

This guide is intended for developers contributing to the open source project WeeWX.

Goals

The primary design goals of WeeWX are:

Strategies

To meet these goals, the following strategies were used:

While WeeWX is nowhere near as fast at generating images and HTML as its predecessor, wview (this is partially because WeeWX uses fancier fonts and a much more powerful templating engine), it is fast enough for all platforms but the slowest. I run it regularly on a 500 MHz machine where generating the 9 images used in the "Current Conditions" page takes just under 2 seconds (compared with 0.4 seconds for wview).

All writes to the databases are protected by transactions. You can kill the program at any time (either Control-C if run directly or "/etc/init.d/weewx stop" if run as a daemon) without fear of corrupting the databases.
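
As a minimal illustration of the idea (not WeeWX's actual database code), Python's standard sqlite3 module can wrap a group of writes in a transaction. Using the connection as a context manager commits on success and rolls back if an exception escapes, so an interrupted batch leaves no partial data behind. The table layout and the helper name `write_records` are hypothetical.

```python
import sqlite3


def write_records(conn, records):
    """Insert (dateTime, outTemp) pairs as a single transaction.

    Hypothetical helper. The connection's context manager commits if the
    block finishes normally and rolls back if any exception escapes.
    """
    with conn:
        for date_time, out_temp in records:
            conn.execute("INSERT INTO archive (dateTime, outTemp) VALUES (?, ?)",
                         (date_time, out_temp))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE archive (dateTime INTEGER PRIMARY KEY NOT NULL, outTemp REAL)")

write_records(conn, [(1454691900, 41.2), (1454692200, 41.5)])

try:
    # The duplicate primary key makes the second insert fail, so the whole
    # batch, including its first (valid) row, is rolled back.
    write_records(conn, [(1454692500, 41.7), (1454692200, 99.9)])
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT COUNT(*) FROM archive").fetchone()[0])  # 2
```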

The code makes ample use of exceptions to ensure graceful recovery from problems such as network outages. It also monitors socket and console timeouts, retrying whatever it was working on several times before giving up. In the case of an unrecoverable console error (such as the console not responding at all), the program waits 60 seconds, then restarts from the top.

Any "hard" exceptions, that is, those that do not involve network and console timeouts and are most likely due to a logic error, are logged, re-raised, and ultimately cause thread termination. If this happens in the main thread (unlikely, given its simplicity), the program terminates. If it happens in the report processing thread (much more likely), then only the generation of reports is affected: the main thread will continue downloading data from the instrument and putting them in the database.
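
The retry policy described above can be sketched roughly as follows. This is not WeeWX's engine code: `SoftError` stands in for the network and console timeouts that are worth retrying, `run_with_retries`, `max_tries`, and `wait` are hypothetical names, and any other exception is treated as a "hard" one, logged and re-raised on its first occurrence.

```python
import logging
import time

log = logging.getLogger(__name__)


class SoftError(Exception):
    """Stand-in for a recoverable problem, such as a socket or console timeout."""


def run_with_retries(task, max_tries=3, wait=0.0):
    """Call task(), retrying up to max_tries times if it raises SoftError.

    Hypothetical helper. Any other exception is treated as "hard": it is
    logged and re-raised immediately, without a retry.
    """
    for attempt in range(1, max_tries + 1):
        try:
            return task()
        except SoftError as e:
            log.warning("Attempt %d of %d failed: %s", attempt, max_tries, e)
            if attempt == max_tries:
                raise          # give up; the caller decides what happens next
            time.sleep(wait)
        except Exception:
            log.exception("Unexpected error")
            raise


# Usage: a task that times out twice, then succeeds
calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise SoftError("timeout")
    return "ok"


print(run_with_retries(flaky))  # ok
```

In this sketch the caller, not the helper, would handle the final SoftError, for example by waiting 60 seconds and starting over as described above.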

Units

In general, there are three different areas where the unit system makes a difference:

  1. On the weather station hardware. Different manufacturers use different unit systems for their hardware. The Davis Vantage series uses U.S. Customary units exclusively, Fine Offset and LaCrosse stations use metric, while Oregon Scientific, Peet Bros, and Hideki stations use a mishmash of U.S. and metric.
  2. In the database. Any of the standard unit systems (US, METRIC, or METRICWX) can be used.
  3. In the presentation (i.e., HTML and image files).

The general strategy is that measurements are converted by the service StdConvert into a target unit system as they come off the weather station, then stored in the database in that unit system. Then, as they come off the database to be used for a report, they are converted into a target unit, specified by a combination of the configuration file weewx.conf and the skin configuration file skin.conf.
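
The two conversion stages can be sketched with plain dictionaries. This only illustrates the flow; it is not the StdConvert or weewx.units code. The unit table covers just temperature and pressure, and the names `CONVERSIONS`, `UNIT_SYSTEMS`, `convert_value`, and `convert_packet` are hypothetical.

```python
# Hypothetical conversion table: (from_unit, to_unit) -> function
CONVERSIONS = {
    ("degree_F", "degree_C"): lambda x: (x - 32.0) * 5.0 / 9.0,
    ("degree_C", "degree_F"): lambda x: x * 9.0 / 5.0 + 32.0,
    ("inHg", "mbar"): lambda x: x * 33.8639,
    ("mbar", "inHg"): lambda x: x / 33.8639,
}

# Units each (greatly simplified) unit system uses for each observation type
UNIT_SYSTEMS = {
    "US": {"outTemp": "degree_F", "barometer": "inHg"},
    "METRIC": {"outTemp": "degree_C", "barometer": "mbar"},
}


def convert_value(value, from_unit, to_unit):
    """Convert one value, passing None (bad data) straight through."""
    if value is None or from_unit == to_unit:
        return value
    return CONVERSIONS[(from_unit, to_unit)](value)


def convert_packet(packet, from_system, to_system):
    """Return a copy of packet with every known observation type converted."""
    converted = dict(packet)
    for obs_type, from_unit in UNIT_SYSTEMS[from_system].items():
        if obs_type in packet:
            to_unit = UNIT_SYSTEMS[to_system][obs_type]
            converted[obs_type] = convert_value(packet[obs_type], from_unit, to_unit)
    return converted


# Stage 1: station units (US) -> database unit system (METRIC)
station_packet = {"dateTime": 1454695200, "outTemp": 50.0, "barometer": None}
db_record = convert_packet(station_packet, "US", "METRIC")
print(db_record)  # {'dateTime': 1454695200, 'outTemp': 10.0, 'barometer': None}

# Stage 2: database unit system -> units requested by a report
report_values = convert_packet(db_record, "METRIC", "US")
```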

Value "None"

The Python special value None is used throughout to signal an invalid or bad data point. All functions must be written to expect it.

Device drivers should be written to emit None if a data value is bad (perhaps because of a failed checksum). If the hardware simply doesn't support a data type, then the driver should not emit a value at all.

The same rule applies to derived values. If the input data for a derived value are missing, then no derived value should be emitted. However, if the input values are present, but have value None, then the derived value should be set to None.
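
Here is a minimal sketch of these rules for a derived dewpoint. The function name `add_dewpoint` is hypothetical, temperatures are assumed to be in degree_C, and the Magnus approximation used here (coefficients 17.62 and 243.12) is a common textbook formula, not necessarily the one WeeWX uses. The sketch also treats a humidity of zero or less as bad data, since the logarithm is undefined there.

```python
import math


def add_dewpoint(packet):
    """Add a derived 'dewpoint' to packet (in place), following the None rules.

    Assumes outTemp is in degree_C and outHumidity is in percent.
    """
    # Inputs missing entirely: emit no derived value at all.
    if "outTemp" not in packet or "outHumidity" not in packet:
        return packet
    temp_c = packet["outTemp"]
    humidity = packet["outHumidity"]
    # Inputs present but bad (None): the derived value is None.
    # This sketch also treats humidity <= 0 as bad data.
    if temp_c is None or humidity is None or humidity <= 0:
        packet["dewpoint"] = None
        return packet
    # Magnus approximation
    b, c = 17.62, 243.12
    gamma = math.log(humidity / 100.0) + b * temp_c / (c + temp_c)
    packet["dewpoint"] = c * gamma / (b - gamma)
    return packet


print(add_dewpoint({"dateTime": 1454695200, "outTemp": 20.0}))
# {'dateTime': 1454695200, 'outTemp': 20.0}
print(add_dewpoint({"dateTime": 1454695200, "outTemp": None, "outHumidity": 50.0}))
# {'dateTime': 1454695200, 'outTemp': None, 'outHumidity': 50.0, 'dewpoint': None}
```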

However, the time value must never be None. This is because it is used as the primary key in the SQL database.

Time

WeeWX stores all data in UTC (roughly, "Greenwich" or "Zulu") time. However, one is usually interested in weather events in local time and wants image and HTML generation to reflect that. Furthermore, most weather stations are configured in local time. This requires that many data times be converted back and forth between UTC and local time. To avoid tripping over time zones and daylight saving time (DST), WeeWX generally uses Python routines to do this conversion. Nowhere in the code base is there any explicit recognition of DST. Instead, its presence is implicit in the conversions. At times, this can cause the code to be relatively inefficient.

For example, if one wanted to plot something every 3 hours in UTC time, it would be very simple: to get the next plot point, just add 10,800 to the epoch time:

next_ts = last_ts + 10800 

But, if one wanted to plot something every 3 hours in local time (that is, at 0000, 0300, 0600, etc.), even across a DST change, then things get a bit more complicated. One could modify the above to recognize whether a DST transition occurs sometime between last_ts and the next three hours and, if so, make the necessary adjustments. This is generally what wview does. WeeWX takes a different approach: it converts from UTC to local time, does the arithmetic, then converts back. This is inefficient, but bulletproof against changes in DST algorithms, etc.:

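import datetime
import time
# Convert to local time, add 3 hours on the local clock, then convert back
time_dt = datetime.datetime.fromtimestamp(last_ts)
delta = datetime.timedelta(seconds=10800)
next_dt = time_dt + delta
# mktime() interprets the naive datetime as local time, applying DST rules
next_ts = int(time.mktime(next_dt.timetuple()))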

Other time conversion problems are handled in a similar manner.
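
Extending the snippet above into a loop gives a sketch of how a series of 3-hour plot points in local time might be generated. The generator name `local_time_steps` is hypothetical. Each step converts the previous timestamp to local time, adds the interval on the local clock, and converts back, so the points stay on 00:00, 03:00, 06:00, and so on, local time, even if the gap in epoch seconds changes across a DST transition.

```python
import datetime
import time


def local_time_steps(start_ts, stop_ts, step_hours=3):
    """Yield epoch timestamps from start_ts up to and including stop_ts,
    stepping by step_hours on the local wall clock.

    Hypothetical helper; relies on the platform's local time zone rules.
    """
    delta = datetime.timedelta(hours=step_hours)
    ts = start_ts
    while ts <= stop_ts:
        yield ts
        # UTC -> local, do the arithmetic in local time, then local -> UTC
        next_dt = datetime.datetime.fromtimestamp(ts) + delta
        ts = int(time.mktime(next_dt.timetuple()))


# Local midnight at the start of 15 January 2021, and the following midnight
start_ts = int(time.mktime(datetime.datetime(2021, 1, 15).timetuple()))
stop_ts = int(time.mktime(datetime.datetime(2021, 1, 16).timetuple()))
for ts in local_time_steps(start_ts, stop_ts):
    print(datetime.datetime.fromtimestamp(ts).strftime("%Y-%m-%d %H:%M"))
```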

For astronomical calculations, WeeWX uses the latitude and longitude specified in the configuration file. If that location is not in the same time zone as the computer's local time, reports that show astronomical times (such as sunrise and sunset) will probably be incorrect.

Archive records

An archive record's timestamp, whether in software or in the database, represents the end time of the record. For example, with a five-minute archive interval, a record timestamped 05-Feb-2016 09:35 includes data from an instant after 09:30 through 09:35, inclusive. Another way to think of it is that it is exclusive on the left, inclusive on the right. Schematically:

09:30 < dateTime <= 09:35

Database queries should reflect this. For example, to find the maximum temperature for the hour between timestamps 1454691600 and 1454695200, the query would be:

SELECT MAX(outTemp) FROM archive WHERE dateTime > 1454691600 and dateTime <= 1454695200;

This ensures that the record at the beginning of the hour (1454691600) does not get included (it belongs to the previous hour), while the record at the end of the hour (1454695200) does.

One must be constantly aware of this convention when working with timestamped data records.
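
The convention is easy to check with a throwaway SQLite table. In this sketch, which uses Python's standard sqlite3 module and made-up temperatures, records arrive every 30 minutes. The record stamped exactly at the start of the hour belongs to the previous hour and is excluded, while the one stamped at the end of the hour is included. The helper `hour_of_record` is hypothetical; it maps a record's timestamp to the start of the interval it belongs to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE archive (dateTime INTEGER PRIMARY KEY NOT NULL, outTemp REAL)")
# 30-minute archive records. The first one covers the 30 minutes *before*
# 1454691600, so it belongs to the previous hour.
conn.executemany("INSERT INTO archive VALUES (?, ?)",
                 [(1454691600, 60.0), (1454693400, 40.0), (1454695200, 42.0)])

start_ts, stop_ts = 1454691600, 1454695200
(max_temp,) = conn.execute(
    "SELECT MAX(outTemp) FROM archive WHERE dateTime > ? AND dateTime <= ?",
    (start_ts, stop_ts)).fetchone()
print(max_temp)  # 42.0; the 60.0 record belongs to the previous hour


def hour_of_record(date_time, interval=3600):
    """Start of the interval that a record stamped date_time belongs to.

    Only valid for intervals aligned with the epoch, such as UTC hours.
    """
    # Subtracting 1 makes the left edge exclusive and the right edge inclusive.
    return (date_time - 1) // interval * interval
```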

Better yet, if you need this kind of information, use an xtypes call:

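import weeutil.weeutil
import weewx.xtypes

# db_manager is an open database manager; the result is a ValueTuple
timespan = weeutil.weeutil.TimeSpan(1454691600, 1454695200)
max_temp_vt = weewx.xtypes.get_aggregate('outTemp', timespan, 'max', db_manager)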

It will not only make sure the limits of the query are correct, but will also decide whether or not the daily summary optimization can be used (details below). If not, it will use the regular archive table.

Internationalization

Generally, WeeWX is locale aware. It will emit reports using the local formatting conventions for dates, times, and values.

Exceptions

In general, your code should not simply swallow an exception. For example, this is bad form:

    try:
        os.rename(oldname, newname)
    except:
        pass

If an exception does happen, the odds are that it is because the file oldname does not exist, but that is not guaranteed. It could be because of a keyboard interrupt, a corrupted file system, or something else. Instead, catch only the exceptions you expect, and let the rest propagate:

    try:
        os.rename(oldname, newname)
    except OSError:
        pass

WeeWX has a few specialized exception types, used to rationalize all the different types of exceptions that could be thrown by the underlying libraries. In particular, low-level I/O code can raise a myriad of exceptions, such as USB errors, serial errors, network connectivity errors, etc. All device drivers should catch these exceptions and convert them into an exception of type WeeWxIOError or one of its subclasses.
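
A sketch of that conversion in a driver follows. `WeeWxIOError` is defined locally here only so the example runs without WeeWX installed; in a real driver you would raise `weewx.WeeWxIOError` (or a subclass) instead. The `read_packet` function, its `nbytes` argument, and the fake port classes are hypothetical; the port only needs a `read()` method.

```python
class WeeWxIOError(IOError):
    """Local stand-in for weewx.WeeWxIOError."""


def read_packet(port, nbytes=99):
    """Read one raw packet from port, converting low-level I/O errors.

    OSError covers socket errors and timeouts; add any library-specific
    exception types that do not derive from OSError. Anything else (for
    example, a logic error) is deliberately not caught.
    """
    try:
        return port.read(nbytes)
    except OSError as e:
        raise WeeWxIOError("Unable to read packet: %s" % e) from e


class BrokenPort:
    """Fake port whose read() fails the way a dropped connection would."""

    def read(self, nbytes):
        raise ConnectionResetError("connection reset by peer")


try:
    read_packet(BrokenPort())
except WeeWxIOError as e:
    print("Driver error:", e)  # Driver error: Unable to read packet: connection reset by peer
```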

Naming conventions

How you name variables makes a big difference in code readability. In general, long names are preferable to short names. Instead of this,

p = 990.1

use this,

pressure = 990.1

or, even better, this:

pressure_mbar = 990.1

WeeWX uses a number of conventions to signal the variable type, although they are not used consistently.

Variable suffix conventions
Suffix Example Description
_ts first_ts Variable is a timestamp in unix epoch time.
_dt start_dt Variable is an instance of datetime.datetime, usually in local time.
_d end_d Variable is an instance of datetime.date, usually in local time.
_tt sod_tt Variable is an instance of time.struct_time (a time tuple), usually in local time.
_vh pressure_vh Variable is an instance of weewx.units.ValueHelper.
_vt speed_vt Variable is an instance of weewx.units.ValueTuple.
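
The following sketch shows how one moment moves between the time representations in the table, using only the standard library. The variable names follow the suffix conventions; the timestamp is arbitrary, and the _dt, _d, and _tt values are in the computer's local time zone. The _vh and _vt types need WeeWX itself and are left out.

```python
import datetime
import time

first_ts = 1454695200                                   # _ts: unix epoch time
first_dt = datetime.datetime.fromtimestamp(first_ts)    # _dt: local datetime
first_d = first_dt.date()                               # _d: local date
# _tt: time tuple for the start of that local day
sod_tt = datetime.datetime.combine(first_d, datetime.time()).timetuple()
sod_ts = int(time.mktime(sod_tt))                       # back to epoch time

print(first_dt.isoformat(), first_d.isoformat(), sod_ts)
```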

Code style

Generally, we try to follow the PEP 8 style guide, but there are many exceptions. In particular, many older WeeWX function names use camelCase, but PEP 8 calls for snake_case. Please use snake_case for new code.

Most modern code editors, such as Eclipse or PyCharm, have the ability to automatically format code. Resist the temptation and don't use this feature! Two reasons:

If you are working with a file where the formatting is so ragged that you really must do a reformat, then do it as a separate commit. This allows the formatting changes to be clearly distinguished from more functional changes.

When invoking functions or instantiating classes, use the fully qualified name. Don't do this:

from datetime import datetime
now = datetime.now()

Instead, do this:

import datetime
now = datetime.datetime.now()

Git workflow

We use Git as the source control system. If Git is still mysterious to you, bookmark this: Pro Git, then read the chapter Git Basics. Also recommended is the article How to Write a Git Commit Message.

The code is hosted on GitHub. Their documentation is very extensive and helpful.

We generally follow Vincent Driessen's branching model. Ignore the complicated diagram at the beginning of the article, and just focus on the text. In this model, there are two key branches:

What this means to you is that if you submit a pull request that includes a new feature, make sure you commit your changes relative to the development branch. If it is just a bug fix, it should be committed against the master branch.

Tools

Python

JetBrains' PyCharm is excellent, and now there's a free Community Edition. It has many advanced features, yet is structured so that you need not be exposed to them until you need them. Highly recommended.

HTML and JavaScript

For JavaScript, JetBrains' WebStorm is excellent, particularly if you will be using a framework such as NodeJS or ExpressJS.

Daily summaries

This section builds on the discussion The database in the Customizing Guide. Read it first.

The big flat table in the database (usually called table archive) is the definitive table of record. While it includes a lot of information, querying it can be slow. For example, to find the maximum temperature of the year would require scanning the whole thing, which might include 100,000 or more records. To speed things up, WeeWX includes daily summaries in the database as an optimization.

In the daily summaries, each observation type gets its own table, which holds a statistical summary for the day. For example, for outside temperature (observation type outTemp), this table would be named archive_day_outTemp. Here's what it would look like:

Structure of the archive_day_outTemp daily summary
dateTime    min   mintime     max   maxtime     sum      count  wsum       sumtime
1652425200  44.7  1652511600  56.0  1652477640  38297.0  763    2297820.0  45780
1652511600  44.1  1652531280  66.7  1652572500  76674.4  1433   4600464.0  85980
1652598000  50.3  1652615220  59.8  1652674320  32903.0  611    1974180.0  36660
...         ...   ...         ...   ...         ...      ...    ...        ...

Here's what the table columns mean:

Name Meaning
dateTime The time of the start of day in unix epoch time. This is the primary key in the database. It must be unique, and it cannot be null.
min The minimum temperature seen for the day. The unit is whatever unit system the main archive table uses (generally given by the first record in the table).
mintime The time in unix epoch time of the minimum temperature.
max The maximum temperature seen for the day. The unit is whatever unit system the main archive table uses (generally given by the first record in the table).
maxtime The time in unix epoch time of the maximum temperature.
sum The sum of all the temperatures for the day.
count The number of records in the day with a valid (non-null) temperature.
wsum The weighted sum of all the temperatures for the day. The weight is the archive interval. That is, for each record, the temperature is multiplied by the length of its archive interval in seconds, then summed up.
sumtime The sum of all the archive intervals for the day, in seconds. If the archive interval didn't change during the day, then this number would be interval * count.

Note how the average temperature for the day can be calculated as wsum / sumtime. This will be true even if the archive interval changes during the day.
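
To make the bookkeeping concrete, here is a minimal sketch of how one day's summary could be accumulated from archive records and then used. It is not WeeWX's accumulator code: the function names are hypothetical, records are assumed to be (dateTime, outTemp, interval) tuples with the interval in seconds, and records whose value is None are skipped. The same formula also averages across days: add up wsum and sumtime over the days, then divide.

```python
def summarize_day(records):
    """Build one day's summary from (dateTime, value, interval_seconds) records."""
    summary = {"min": None, "mintime": None, "max": None, "maxtime": None,
               "sum": 0.0, "count": 0, "wsum": 0.0, "sumtime": 0}
    for date_time, value, interval in records:
        if value is None:          # bad data does not contribute
            continue
        if summary["min"] is None or value < summary["min"]:
            summary["min"], summary["mintime"] = value, date_time
        if summary["max"] is None or value > summary["max"]:
            summary["max"], summary["maxtime"] = value, date_time
        summary["sum"] += value
        summary["count"] += 1
        summary["wsum"] += value * interval
        summary["sumtime"] += interval
    return summary


def average(summaries):
    """Time-weighted average over one or more daily summaries."""
    sumtime = sum(s["sumtime"] for s in summaries)
    return sum(s["wsum"] for s in summaries) / sumtime if sumtime else None


# wsum and sumtime from the three rows of the table above (one-minute interval)
rows = [
    {"wsum": 2297820.0, "sumtime": 45780},
    {"wsum": 4600464.0, "sumtime": 85980},
    {"wsum": 1974180.0, "sumtime": 36660},
]
print(round(average(rows[:1]), 2))  # 50.19, the first day's average
print(round(average(rows), 2))      # 52.68, the three-day average
```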

Now consider an extensive variable such as rain. The total rainfall for the day will be given by the field sum. So, calculating the total rainfall for the year can be done by scanning and summing only 365 records, instead of potentially tens, or even hundreds, of thousands of records. This results in a dramatic speedup for report generation, particularly on slower machines such as the Raspberry Pi, working off an SD card.
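
Here is a sketch of that yearly query against a daily summary table, using Python's sqlite3 module and made-up rainfall values. Note the contrast with the archive table: a daily summary row is stamped with the start of its day, so in this sketch the year is selected with a left-inclusive, right-exclusive test. The table name archive_day_rain follows the naming pattern described above; the data and the year boundaries (UTC midnights, chosen for simplicity) are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE archive_day_rain (dateTime INTEGER PRIMARY KEY NOT NULL, sum REAL)")
DAY = 86400
start_of_2022 = 1640995200  # 2022-01-01 00:00 UTC
start_of_2023 = 1672531200  # 2023-01-01 00:00 UTC
conn.executemany("INSERT INTO archive_day_rain VALUES (?, ?)", [
    (start_of_2022 - DAY, 0.50),        # 31 Dec 2021: previous year
    (start_of_2022, 0.10),              # 1 Jan 2022
    (start_of_2022 + 150 * DAY, 0.25),  # 31 May 2022
    (start_of_2023 - DAY, 0.05),        # 31 Dec 2022
    (start_of_2023, 1.00),              # 1 Jan 2023: next year
])

# One row per day, so at most 365 (or 366) rows are summed
(total_rain,) = conn.execute(
    "SELECT SUM(sum) FROM archive_day_rain WHERE dateTime >= ? AND dateTime < ?",
    (start_of_2022, start_of_2023)).fetchone()
print(round(total_rain, 2))  # 0.4
```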

Wind

The daily summary for wind includes six additional fields. Here's what they mean:

Name Meaning
max_dir The direction of the maximum wind seen for the day.
xsum The sum of the x-component (east-west) of the wind for the day.
ysum The sum of the y-component (north-south) of the wind for the day.
dirsumtime The sum of the archive intervals of the records that contributed to xsum and ysum.
squaresum The sum of the wind speed squared for the day.
wsquaresum The weighted sum of the squared wind speed for the day. That is, the wind speed is squared, then multiplied by the archive interval, then summed for the day. This is useful for calculating RMS wind speed.

Note that the RMS wind speed can be calculated as

math.sqrt(wsquaresum / sumtime)
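
The sketch below accumulates a few of these wind fields from (speed, direction, interval) records and derives the RMS speed and a vector-average direction. It is a simplified illustration, not WeeWX's code: the function names are hypothetical, directions are compass degrees (0 = north, 90 = east), and it assumes that each record's contribution to xsum and ysum is weighted by its archive interval, the same way wsum is.

```python
import math


def summarize_wind(records):
    """Accumulate wind fields from (speed, direction_deg, interval_seconds) records."""
    s = {"sumtime": 0, "wsquaresum": 0.0, "xsum": 0.0, "ysum": 0.0, "dirsumtime": 0}
    for speed, direction, interval in records:
        if speed is None:
            continue
        s["sumtime"] += interval
        s["wsquaresum"] += speed ** 2 * interval
        if direction is not None:
            # Compass direction: x is the east-west component, y is north-south
            s["xsum"] += speed * math.sin(math.radians(direction)) * interval
            s["ysum"] += speed * math.cos(math.radians(direction)) * interval
            s["dirsumtime"] += interval
    return s


def rms_speed(s):
    return math.sqrt(s["wsquaresum"] / s["sumtime"]) if s["sumtime"] else None


def vector_direction(s):
    """Direction of the summed wind vector, in compass degrees [0, 360)."""
    if not s["dirsumtime"] or (s["xsum"] == 0 and s["ysum"] == 0):
        return None
    return math.degrees(math.atan2(s["xsum"], s["ysum"])) % 360.0


summary = summarize_wind([(3.0, 0.0, 300), (4.0, 90.0, 300), (5.0, None, 300)])
print(round(rms_speed(summary), 3))         # 4.082
print(round(vector_direction(summary), 1))  # 53.1
```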

Glossary

This is a glossary of terminology used throughout the code.

Terminology used in WeeWX
Name Description
archive interval WeeWX does not store the raw data that comes off a weather station. Instead, it aggregates the data over a length of time, the archive interval, and then stores that.
archive record While packets are raw data that come off the weather station, records are data aggregated by time. For example, temperature may be the average temperature over an archive interval. These are the data stored in the SQL database.
config_dict All configuration information used by WeeWX is stored in the configuration file, usually with the name weewx.conf. By convention, when this file is read into the program, it is called config_dict, an instance of the class configobj.ConfigObj.
datetime An instance of the Python object datetime.datetime. Variables of type datetime usually have a suffix _dt.
db_dict A dictionary with all the data necessary to bind to a database. An example for SQLite would be {'driver':'db.sqlite', 'root':'/home/weewx', 'database_name':'archive/weewx.sdb'}, an example for MySQL would be { 'driver':'db.mysql', 'host':'localhost', 'user':'weewx', 'password':'mypassword', 'database_name':'weewx'}.
epoch time Sometimes referred to as "unix time," or "unix epoch time." The number of seconds since the epoch, which is 1 Jan 1970 00:00:00 UTC. Hence, it always represents UTC (strictly speaking, unix time ignores leap seconds, but it is close enough). This is the time used in the databases and appears as type dateTime in the SQL schema, perhaps an unfortunate name because of the similarity to the completely unrelated Python type datetime. Very easy to manipulate, but it is a big opaque number.
LOOP packet The real-time data coming off the weather station. The terminology "LOOP" comes from the Davis series. A LOOP packet can contain all observation types, or it may contain only some of them ("Partial packet").
observation type A physical quantity measured by a weather station (e.g., outTemp) or something derived from it (e.g., dewpoint).
skin_dict All configuration information used by a particular skin is stored in the skin configuration file, usually with the name skin.conf. By convention, when this file is read into the program, it is called skin_dict, an instance of the class configobj.ConfigObj.
SQL type A type that appears in the SQL database. This usually looks something like outTemp, barometer, extraTemp1, and so on.
standard unit system A complete set of units used together. One of US, METRIC, or METRICWX.
time stamp A variable in unix epoch time. Always in UTC. Variables carrying a time stamp usually have a suffix _ts.
tuple-time An instance of the Python object time.struct_time. This is a 9-element tuple that represents a time. It could be in either local time or UTC, though usually the former. See module time for more information. Variables carrying tuple time usually have a suffix _tt.
value tuple A 3-element tuple. The first element is a value, the second the unit type the value is in, and the third the unit group. An example would be (21.2, 'degree_C', 'group_temperature').