This guide is intended for developers contributing to the open source project WeeWX.
Goals
The primary design goals of WeeWX are:
- Architectural simplicity. No semaphores, no named pipes, no inter-process communications, no complex multi-threading to manage.
- Extensibility. Make it easy for the user to add new features or to modify existing features.
- "Fast enough." In any design decision, architectural simplicity and elegance trump speed.
- One code base. A single code base should be used for all platforms, all weather stations, all reports, and any combination of features. Ample configuration and customization options should be provided so the user does not feel tempted to start hacking code. At worst, the user may have to subclass, which is much easier to port to newer versions of the code base than customizing the base code.
- Minimal dependencies. The code should rely on a minimal number of external packages, so the user does not have to go chase them down all over the Web before getting started.
- Simple data model. The implementation should use a very simple data model that is likely to support many different types of hardware.
- A "pythonic" code base. The code should be written in a style that others will recognize.
Strategies
To meet these goals, the following strategies were used:
- A "micro-kernel" design. The WeeWX engine actually does very little. Its primary job is to load and run services at runtime, making it easy for users to add or subtract features.
- A largely stateless design style. For example, many of the processing routines read the data they need directly from the database, rather than caching it and sharing with other routines. While this means the same data may be read multiple times, it also means the only point of possible cache incoherence is through the database, where transactions are easily controlled. This greatly reduces the chances of corrupting the data, making it much easier to understand and modify the code base.
- Isolated data collection and archiving. The code for collecting and archiving data runs in a single thread that is simple enough that it is unlikely to crash. The report processing is where most mistakes are likely to happen, so it is isolated in a separate thread. If it crashes, it will not affect the main data thread.
- A powerful configuration parser. The ConfigObj module, by Michael Foord and Nicola Larosa, was chosen to read the configuration file. This allows many options that might otherwise have to go in the code to live in a configuration file instead.
- A powerful templating engine. The Cheetah module was chosen for generating html and other types of files from templates. Cheetah allows search list extensions to be defined, making it easy to extend WeeWX with new template tags.
- Pure Python. The code base is 100% Python — no underlying C libraries need be built to install WeeWX. This also means no Makefiles are needed.
While WeeWX is nowhere near as fast at generating images and HTML as its predecessor, wview (this is partially because WeeWX uses fancier fonts and a much more powerful templating engine), it is fast enough for all platforms but the slowest. I run it regularly on a 500 MHz machine where generating the 9 images used in the "Current Conditions" page takes just under 2 seconds (compared with 0.4 seconds for wview).
All writes to the databases are protected by transactions. You can kill the program at any time (either Control-C if run directly or "/etc/init.d/weewx stop" if run as a daemon) without fear of corrupting the databases.
The code makes ample use of exceptions to ensure graceful recovery from problems such as network outages. It also monitors socket and console timeouts, restarting whatever it was working on several times before giving up. In the case of an unrecoverable console error (such as the console not responding at all), the program waits 60 seconds, then restarts from the top.
Any "hard" exceptions, that is those that do not involve network and console timeouts and are most likely due to a logic error, are logged, reraised, and ultimately cause thread termination. If this happens in the main thread (not likely due to its simplicity), then this causes program termination. If it happens in the report processing thread (much more likely), then only the generation of reports will be affected — the main thread will continue downloading data off the instrument and putting them in the database.
Units
In general, there are three different areas where the unit system makes a difference:
- On the weather station hardware. Different manufacturers use different unit systems for their hardware. The Davis Vantage series uses U.S. Customary units exclusively, Fine Offset and LaCrosse stations use metric, while Oregon Scientific, Peet Bros, and Hideki stations use a mishmash of US and metric.
- In the database. Either US or Metric can be used.
- In the presentation (i.e., html and image files).
The general strategy is that measurements are converted by service StdConvert as they come off the weather station into a target unit system, then stored internally in the database in that unit system. Then, as they come off the database to be used for a report, they are converted into a target unit, specified by a combination of the configuration file weewx.conf and the skin configuration file skin.conf.
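The two conversion stages described above can be sketched as follows. This is only an illustration of the idea, not how StdConvert or the report engine is actually implemented: the function `convert`, the `CONVERSIONS` table, and the choice of a metric database are all assumptions made for the example.

```python
# Hypothetical sketch of the two conversion stages; not the WeeWX API.

def f_to_c(temp_f):
    """Convert degrees Fahrenheit to Celsius, passing None through."""
    return None if temp_f is None else (temp_f - 32.0) * 5.0 / 9.0


def c_to_f(temp_c):
    """Convert degrees Celsius to Fahrenheit, passing None through."""
    return None if temp_c is None else temp_c * 9.0 / 5.0 + 32.0


# Conversion functions keyed by (from_unit, to_unit).
CONVERSIONS = {
    ("degree_F", "degree_C"): f_to_c,
    ("degree_C", "degree_F"): c_to_f,
}


def convert(value, from_unit, to_unit):
    """Convert a value between units; a no-op if the units already match."""
    if from_unit == to_unit:
        return value
    return CONVERSIONS[(from_unit, to_unit)](value)


# Stage 1: the hardware reports degree_F, the database is assumed to be metric.
station_temp_f = 68.0
db_temp_c = convert(station_temp_f, "degree_F", "degree_C")

# Stage 2: the report is assumed to ask for degree_F again.
report_temp_f = convert(db_temp_c, "degree_C", "degree_F")
```

Note that the conversion functions pass None through untouched, so a bad data point stays bad at every stage.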
Value "None"
The Python special value None is used throughout to signal an invalid or bad data point. All functions must be written to expect it.
Device drivers should be written to emit None if a data value is bad (perhaps because of a failed checksum). If the hardware simply doesn't support a data type, then the driver should not emit a value at all.
The same rule applies to derived values. If the input data for a derived value are missing, then no derived value should be emitted. However, if the input values are present, but have value None, then the derived value should be set to None.
However, the time value must never be None. This is because it is used as the primary key in the SQL database.
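As a sketch of these rules, here is how a derived value might be added to a record. The helper names `dewpoint_c` and `add_dewpoint` are made up for this example, and the dewpoint formula is the common Magnus approximation, chosen only for illustration. It is not necessarily what WeeWX uses.

```python
import math


def dewpoint_c(temp_c, humidity_pct):
    """Approximate dewpoint in Celsius (Magnus formula). Returns None on bad input."""
    if temp_c is None or humidity_pct is None or humidity_pct <= 0:
        return None
    b, c = 17.62, 243.12
    gamma = math.log(humidity_pct / 100.0) + b * temp_c / (c + temp_c)
    return c * gamma / (b - gamma)


def add_dewpoint(record):
    """Return a copy of record with 'dewpoint' added, following the None rules."""
    new_record = dict(record)
    # Inputs missing altogether: emit no derived value at all.
    if "outTemp" in record and "outHumidity" in record:
        # Inputs present, but possibly None: the derived value is then None.
        new_record["dewpoint"] = dewpoint_c(record["outTemp"], record["outHumidity"])
    return new_record


complete = add_dewpoint({"dateTime": 1454691600, "outTemp": 20.0, "outHumidity": 50.0})
bad_value = add_dewpoint({"dateTime": 1454691600, "outTemp": None, "outHumidity": 50.0})
unsupported = add_dewpoint({"dateTime": 1454691600, "outTemp": 20.0})
```

Here `complete` carries a numeric dewpoint, `bad_value` carries a dewpoint of None, and `unsupported` has no dewpoint key at all.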
Time
WeeWX stores all data in UTC (roughly, "Greenwich" or "Zulu") time. However, one is usually interested in weather events in local time and wants image and HTML generation to reflect that. Furthermore, most weather stations are configured in local time. This requires that many data times be converted back and forth between UTC and local time. To avoid tripping up over time zones and daylight saving time (DST), WeeWX generally uses Python routines to do this conversion. Nowhere in the code base is there any explicit recognition of DST. Instead, its presence is implicit in the conversions. At times, this can cause the code to be relatively inefficient.
For example, if one wanted to plot something every 3 hours in UTC time, it would be very simple: to get the next plot point, just add 10,800 to the epoch time:
next_ts = last_ts + 10800
But if one wanted to plot something every 3 hours in local time (that is, at 0000, 0300, 0600, etc.), despite a possible DST change in the middle, then things get a bit more complicated. One could modify the above to recognize whether a DST transition occurs sometime between last_ts and the next three hours and, if so, make the necessary adjustments. This is generally what wview does. WeeWX takes a different approach: it converts from UTC to local, does the arithmetic, then converts back. This is inefficient, but bulletproof against changes in DST algorithms and the like:
time_dt = datetime.datetime.fromtimestamp(last_ts)
delta = datetime.timedelta(seconds=10800)
next_dt = time_dt + delta
next_ts = int(time.mktime(next_dt.timetuple()))
Other time conversion problems are handled in a similar manner.
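To see the difference across a DST transition, the approach can be wrapped in a small function and run on the day the clocks change. This is a sketch under several assumptions: the function name `add_local_seconds` is made up, the demonstration forces a US Pacific-style time zone through the TZ environment variable, and it relies on `time.tzset()`, which is only available on Unix-like systems.

```python
import datetime
import os
import time


def add_local_seconds(last_ts, seconds):
    """Advance an epoch timestamp by `seconds` of local wall-clock time."""
    time_dt = datetime.datetime.fromtimestamp(last_ts)
    next_dt = time_dt + datetime.timedelta(seconds=seconds)
    # mktime() lets the C library decide whether DST applies to the result.
    return int(time.mktime(next_dt.timetuple()))


if hasattr(time, "tzset"):
    # Demonstration only: force a zone where DST starts on the second
    # Sunday of March (a POSIX TZ string, so no tz database is required).
    os.environ["TZ"] = "PST8PDT,M3.2.0,M11.1.0"
    time.tzset()

    midnight_ts = 1457856000  # 2016-03-13 00:00 local, the day DST starts
    next_ts = add_local_seconds(midnight_ts, 10800)

    # Local 03:00 arrives only two real hours after local midnight,
    # because the hour from 02:00 to 03:00 is skipped.
    print(next_ts - midnight_ts)
```

On that day, simply adding 10,800 to the epoch time would land on 04:00 local time rather than 03:00.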
For astronomical calculations, WeeWX uses the latitude and longitude specified in the configuration file. If that location does not correspond to the computer's local time, reports with astronomical times will probably be incorrect.
Archive records
An archive record's timestamp, whether in software or in the database, represents the end time of the record. For example, a record timestamped 05-Feb-2016 09:35 includes data from an instant after 09:30 through 09:35. Another way to think of it is that it is exclusive on the left, inclusive on the right. Schematically:
09:30 < dateTime <= 09:35
Database queries should reflect this. For example, to find the maximum temperature for the hour between timestamps 1454691600 and 1454695200, the query would be:
SELECT MAX(outTemp) FROM archive WHERE dateTime > 1454691600 AND dateTime <= 1454695200;
This ensures that the record at the beginning of the hour (1454691600) does not get included (it belongs to the previous hour), while the record at the end of the hour (1454695200) does.
One must constantly be aware of this convention when working with timestamped data records.
Better yet, if you need this kind of information, use an xtypes call:
max_temp = weewx.xtypes.get_aggregate('outTemp', (1454691600, 1454695200), 'max', db_manager)
It will not only make sure the limits of the query are correct, but will also decide whether or not the daily summary optimization can be used (details below). If not, it will use the regular archive table.
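If you want to convince yourself of the exclusive-left, inclusive-right convention without a WeeWX installation, the raw SQL query can be tried against an in-memory SQLite database. This is a sketch with made-up data and a drastically simplified, two-column archive table. The helper `max_out_temp` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the archive table; the real schema has many more columns.
conn.execute("CREATE TABLE archive (dateTime INTEGER NOT NULL PRIMARY KEY, outTemp REAL)")
conn.executemany(
    "INSERT INTO archive VALUES (?, ?)",
    [
        (1454691600, 99.0),  # ends exactly at the start of the hour: previous hour
        (1454693400, 51.2),  # inside the hour
        (1454695200, 53.4),  # ends exactly at the end of the hour: included
        (1454695500, 60.0),  # belongs to the next hour
    ],
)


def max_out_temp(conn, start_ts, stop_ts):
    """Maximum outTemp for records with start_ts < dateTime <= stop_ts."""
    row = conn.execute(
        "SELECT MAX(outTemp) FROM archive WHERE dateTime > ? AND dateTime <= ?",
        (start_ts, stop_ts),
    ).fetchone()
    return row[0]


hour_max = max_out_temp(conn, 1454691600, 1454695200)
print(hour_max)
```

The artificially high value of 99.0 at the left boundary is ignored, while the record at the right boundary is included in the result.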
Internationalization
Generally, WeeWX is locale aware. It will emit reports using the local formatting conventions for date, times, and values.
Exceptions
In general, your code should not simply swallow an exception. For example, this is bad form:
try:
    os.rename(oldname, newname)
except:
    pass
While the odds are that if an exception happens it will be because the file oldname does not exist, that is not guaranteed. It could be because of a keyboard interrupt, or a corrupted file system, or something else. Instead, you should test explicitly for any expected exception, and let the rest go by:
try:
    os.rename(oldname, newname)
except OSError:
    pass
WeeWX has a few specialized exception types, used to rationalize all the different types of exceptions that could be thrown by the underlying libraries. In particular, low-level I/O code can raise a myriad of exceptions, such as USB errors, serial errors, network connectivity errors, etc. All device drivers should catch these exceptions and convert them into an exception of type WeeWxIOError or one of its subclasses.
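In a driver, the conversion might look something like the sketch below. So that it runs without WeeWX installed, it defines its own stand-in class named WeeWxIOError. In a real driver you would use the exception supplied by WeeWX instead. The classes `FakeSerialPort` and the function `read_packet` are hypothetical.

```python
class WeeWxIOError(IOError):
    """Stand-in for the WeeWX exception of the same name (an assumption,
    defined here only so the sketch is self-contained)."""


class FakeSerialPort:
    """Hypothetical port that never answers, to exercise the error path."""

    def read(self, nbytes):
        raise TimeoutError("no response from console")


def read_packet(port, nbytes=99):
    """Read raw bytes from the port, converting low-level I/O errors."""
    try:
        return port.read(nbytes)
    except (OSError, EOFError) as e:
        # Catch only the expected I/O failures; anything else propagates.
        raise WeeWxIOError("Unable to read from console: %s" % e) from e


try:
    read_packet(FakeSerialPort())
except WeeWxIOError as e:
    print("Driver reported:", e)
```

Chaining with `from e` keeps the original exception available for the log, while the rest of the program only has to know about one family of exceptions.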
Naming conventions
How you name variables makes a big difference in code readability. In general, long names are preferable to short names. Instead of this,
p = 990.1
use this,
pressure = 990.1
or, even better, this:
pressure_mbar = 990.1
WeeWX uses a number of conventions to signal the variable type, although they are not used consistently.
Suffix | Example | Description |
_ts | first_ts | Variable is a timestamp in unix epoch time. |
_dt | start_dt | Variable is an instance of datetime.datetime, usually in local time. |
_d | end_d | Variable is an instance of datetime.date, usually in local time. |
_tt | sod_tt | Variable is an instance of time.struct_time (a time tuple), usually in local time. |
_vh | pressure_vh | Variable is an instance of weewx.units.ValueHelper. |
_vt | speed_vt | Variable is an instance of weewx.units.ValueTuple. |
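The time-related suffixes can be illustrated with nothing but the standard library. This is just a sketch using the example names from the table. The `_vh` and `_vt` suffixes are left out because they need the weewx.units module.

```python
import datetime
import time

# _ts: a timestamp in unix epoch time
first_ts = 1454691600

# _dt: a datetime.datetime instance, here in local time
start_dt = datetime.datetime.fromtimestamp(first_ts)

# _d: a datetime.date instance
end_d = start_dt.date()

# _tt: a time.struct_time (time tuple), here the local start of day
sod_dt = start_dt.replace(hour=0, minute=0, second=0, microsecond=0)
sod_tt = sod_dt.timetuple()

# Going back from a time tuple to a timestamp
sod_ts = int(time.mktime(sod_tt))
```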
Code style
Generally, we try to follow the PEP 8 style guide, but there are many exceptions. In particular, many older WeeWX function names use camelCase, but PEP 8 calls for snake_case. Please use snake_case for new code.
Most modern code editors, such as Eclipse or PyCharm, have the ability to automatically format code. Resist the temptation and don't use this feature! Two reasons:
- Unless all developers use the same tool, using the same settings, we will just thrash back and forth between slightly different versions.
- Automatic formatters play a useful role, but some of what they do is really trivial, such as removing spaces in otherwise blank lines. Now if someone is trying to figure out what real, syntactic changes you have made, they will have to wade through all those extraneous "changed lines," trying to find the important stuff.
If you are working with a file where the formatting is so ragged that you really must do a reformat, then do it as a separate commit. This allows the formatting changes to be clearly distinguished from more functional changes.
When invoking functions or instantiating classes, use the fully qualified name. Don't do this:
from datetime import datetime
now = datetime.now()
Instead, do this:
import datetime
now = datetime.datetime.now()
Git work flow
We use Git as the source control system. If Git is still mysterious to you, bookmark this: Pro Git, then read the chapter Git Basics. Also recommended is the article How to Write a Git Commit Message.
The code is hosted on GitHub. Their documentation is very extensive and helpful.
We generally follow Vincent Driessen's branching model. Ignore the complicated diagram at the beginning of the article, and just focus on the text. In this model, there are two key branches:
- master. Fixes go into this branch. We tend to use fewer "hot fix" branches and, instead, just incorporate any fixes directly into the branch. Releases are tagged relative to this branch.
- development (called develop in Vince's article). This is where new features go. Before a release, they will be merged into the master branch.
What this means to you is that if you submit a pull request that includes a new feature, make sure you commit your changes relative to the development branch. If it is just a bug fix, it should be committed against the master branch.
Tools
Python
JetBrains' PyCharm is excellent, and now there's a free Community Edition. It has many advanced features, yet is structured so that you need not be exposed to them until you need them. Highly recommended.
HTML and Javascript
For JavaScript, JetBrains' WebStorm is excellent, particularly if you will be using a framework such as NodeJS or ExpressJS.
Daily summaries
This section builds on the discussion The database in the Customizing Guide. Read it first.
The big flat table in the database (usually called table archive) is the definitive table of record. While it includes a lot of information, querying it can be slow. For example, to find the maximum temperature of the year would require scanning the whole thing, which might include 100,000 or more records. To speed things up, WeeWX includes daily summaries in the database as an optimization.
In the daily summaries, each observation type gets its own table, which holds a statistical summary for the day. For example, for outside temperature (observation type outTemp), this table would be named archive_day_outTemp. Here's what it would look like:
dateTime | min | mintime | max | maxtime | sum | count | wsum | sumtime |
1652425200 | 44.7 | 1652511600 | 56.0 | 1652477640 | 38297.0 | 763 | 2297820.0 | 45780 |
1652511600 | 44.1 | 1652531280 | 66.7 | 1652572500 | 76674.4 | 1433 | 4600464.0 | 85980 |
1652598000 | 50.3 | 1652615220 | 59.8 | 1652674320 | 32903.0 | 611 | 1974180.0 | 36660 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
Here's what the table columns mean:
Name | Meaning |
dateTime | The time of the start of day in unix epoch time. This is the primary key in the database. It must be unique, and it cannot be null. |
min | The minimum temperature seen for the day. The unit is whatever unit system the main archive table uses (generally given by the first record in the table). |
mintime | The time in unix epoch time of the minimum temperature. |
max | The maximum temperature seen for the day. The unit is whatever unit system the main archive table uses (generally given by the first record in the table). |
maxtime | The time in unix epoch time of the maximum temperature. |
sum | The sum of all the temperatures for the day. |
count | The number of records in the day. |
wsum | The weighted sum of all the temperatures for the day. The weight is the archive interval. That is, for each record, the temperature is multiplied by the length of the archive record, then summed up. |
sumtime | The sum of all the archive intervals for the day. If the archive interval didn't change during the day, then this number would be interval * count. |
Note how the average temperature for the day can be calculated as wsum / sumtime. This will be true even if the archive interval changes during the day.
Now consider an extensive variable such as rain. The total rainfall for the day will be given by the field sum. So, calculating the total rainfall for the year can be done by scanning and summing only 365 records, instead of potentially tens, or even hundreds, of thousands of records. This results in a dramatic speedup for report generation, particularly on slower machines such as the Raspberry Pi, working off an SD card.
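To make the meaning of the columns concrete, here is a sketch of how one row of a daily summary could be accumulated from archive records, and how the average falls out of it. The functions `summarize_day` and `day_average` are made up for this example, the data are invented, and the archive interval is assumed to be expressed in seconds. The actual WeeWX accumulators are more involved.

```python
def summarize_day(records):
    """Build a daily summary from (dateTime, value, interval_seconds) tuples."""
    summary = {
        "min": None, "mintime": None,
        "max": None, "maxtime": None,
        "sum": 0.0, "count": 0,
        "wsum": 0.0, "sumtime": 0,
    }
    for ts, value, interval in records:
        if value is None:
            continue  # bad data points contribute nothing
        if summary["min"] is None or value < summary["min"]:
            summary["min"], summary["mintime"] = value, ts
        if summary["max"] is None or value > summary["max"]:
            summary["max"], summary["maxtime"] = value, ts
        summary["sum"] += value
        summary["count"] += 1
        summary["wsum"] += value * interval  # weighted by the archive interval
        summary["sumtime"] += interval
    return summary


def day_average(summary):
    """Time-weighted average, or None if there were no valid records."""
    if not summary["sumtime"]:
        return None
    return summary["wsum"] / summary["sumtime"]


# The archive interval changes from 300 s to 600 s during the day.
records = [
    (1652425500, 50.0, 300),
    (1652425800, 52.0, 300),
    (1652426400, 56.0, 600),
    (1652427000, None, 600),
]
summary = summarize_day(records)
average = day_average(summary)
```

Because the last valid record covers twice as much time as the others, the weighted average (wsum / sumtime) comes out higher than the simple sum / count.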
Wind
The daily summary for wind includes six additional fields. Here's what they mean:
Name | Meaning |
max_dir | The direction of the maximum wind seen for the day. |
xsum | The sum of the x-component (east-west) of the wind for the day. |
ysum | The sum of the y-component (north-south) of the wind for the day. |
dirsumtime | The sum of all the archive intervals for the day, which contributed to xsum and ysum. |
squaresum | The sum of the wind speed squared for the day. |
wsquaresum | The sum of the weighted wind speed squared for the day. That is, the wind speed is squared, then multiplied by the archive interval, then summed for the day. This is useful for calculating RMS wind speed. |
Note that the RMS wind speed can be calculated as
math.sqrt(wsquaresum / sumtime)
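The wind fields can be sketched in the same spirit. Everything here is illustrative: the function names are made up, the x- and y-components are assumed to be weighted by the archive interval (in seconds), and direction is assumed to be a compass bearing in degrees, measured clockwise from north.

```python
import math


def summarize_wind(records):
    """Accumulate wind sums from (speed, direction_deg, interval_seconds) tuples."""
    s = {"xsum": 0.0, "ysum": 0.0, "dirsumtime": 0,
         "squaresum": 0.0, "wsquaresum": 0.0, "sumtime": 0}
    for speed, direction, interval in records:
        if speed is None:
            continue
        s["squaresum"] += speed ** 2
        s["wsquaresum"] += speed ** 2 * interval
        s["sumtime"] += interval
        if direction is not None:
            # Assumed convention: x points east, y points north.
            s["xsum"] += interval * speed * math.sin(math.radians(direction))
            s["ysum"] += interval * speed * math.cos(math.radians(direction))
            s["dirsumtime"] += interval
    return s


def rms_speed(s):
    """RMS wind speed, or None if there were no valid records."""
    return math.sqrt(s["wsquaresum"] / s["sumtime"]) if s["sumtime"] else None


def vector_direction(s):
    """Vector-averaged compass bearing in [0, 360), or None if unknown."""
    if not s["dirsumtime"]:
        return None
    return math.degrees(math.atan2(s["xsum"], s["ysum"])) % 360.0


wind = summarize_wind([(3.0, 90.0, 300), (4.0, 90.0, 300), (None, None, 300)])
rms = rms_speed(wind)
direction = vector_direction(wind)
```

Keeping dirsumtime separate from sumtime matters because a record can have a valid speed but no direction, for example when the wind is calm.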
Glossary
This is a glossary of terminology used throughout the code.
Name | Description |
archive interval | WeeWX does not store the raw data that comes off a weather station. Instead, it aggregates the data over a length of time, the archive interval, and then stores that. |
archive record | While packets are raw data that come off the weather station, records are data aggregated by time. For example, temperature may be the average temperature over an archive interval. These are the data stored in the SQL database. |
config_dict | All configuration information used by WeeWX is stored in the configuration file, usually with the name weewx.conf. By convention, when this file is read into the program, it is called config_dict, an instance of the class configobj.ConfigObj. |
datetime | An instance of the Python object datetime.datetime. Variables of type datetime usually have a suffix _dt. |
db_dict | A dictionary with all the data necessary to bind to a database. An example for SQLite would be {'driver':'db.sqlite', 'root':'/home/weewx', 'database_name':'archive/weewx.sdb'}, an example for MySQL would be { 'driver':'db.mysql', 'host':'localhost', 'user':'weewx', 'password':'mypassword', 'database_name':'weewx'}. |
epoch time | Sometimes referred to as "unix time," or "unix epoch time." The number of seconds since the epoch, which is 1 Jan 1970 00:00:00 UTC. Hence, it always represents UTC (well... after adding a few leap seconds. But, close enough). This is the time used in the databases and appears as type dateTime in the SQL schema, perhaps an unfortunate name because of the similarity to the completely unrelated Python type datetime. Very easy to manipulate, but it is a big opaque number. |
LOOP packet | The real-time data coming off the weather station. The terminology "LOOP" comes from the Davis series. A LOOP packet can contain all observation types, or it may contain only some of them ("Partial packet"). |
observation type | A physical quantity measured by a weather station (e.g., outTemp) or something derived from it (e.g., dewpoint). |
skin_dict | All configuration information used by a particular skin is stored in the skin configuration file, usually with the name skin.conf. By convention, when this file is read into the program, it is called skin_dict, an instance of the class configobj.ConfigObj. |
SQL type | A type that appears in the SQL database. This usually looks something like outTemp, barometer, extraTemp1, and so on. |
standard unit system | A complete set of units used together. Either US, METRIC, or METRICWX. |
time stamp | A variable in unix epoch time. Always in UTC. Variables carrying a time stamp usually have a suffix _ts. |
tuple-time | An instance of the Python object time.struct_time. This is a 9-element tuple that represents a time. It could be in either local time or UTC, though usually the former. See module time for more information. Variables carrying tuple time usually have a suffix _tt. |
value tuple | A 3-element tuple. The first element is a value, the second the unit the value is in, and the third the unit group. An example would be (21.2, 'degree_C', 'group_temperature'). |