This guide is intended for developers contributing to the open source project WeeWX.
Goals
The primary design goals of WeeWX are:
- Architectural simplicity. No semaphores, no named pipes, no inter-process communications, no complex multi-threading to manage.
- Extensibility. Make it easy for the user to add new features or to modify existing features.
- "Fast enough." In any design decision, architectural simplicity and elegance trump speed.
- One code base. A single code base should be used for all platforms, all weather stations, all reports, and any combination of features. Ample configuration and customization options should be provided so the user does not feel tempted to start hacking code. At worst, the user may have to subclass, which is much easier to port to newer versions of the code base than customizing the base code.
- Minimal dependencies. The code should rely on a minimal number of external packages, so the user does not have to go chase them down all over the Web before getting started.
- Simple data model. The implementation should use a very simple data model that is likely to support many different types of hardware.
- A "pythonic" code base. The code should be written so others will find idioms that they recognize.
Strategies
To meet these goals, the following strategies were used:
- A "micro-kernel" design. The WeeWX engine actually does very little. Its primary job is to load and run services at runtime, making it easy for users to add or subtract features.
- A largely stateless design style. For example, many of the processing routines read the data they need directly from the database, rather than caching it and sharing with other routines. While this means the same data may be read multiple times, it also means the only point of possible cache incoherence is through the database, where transactions are easily controlled. This greatly reduces the chances of corrupting the data, making it much easier to understand and modify the code base.
- Isolated data collection and archiving. The code for collecting and archiving data runs in a single thread that is simple enough that it is unlikely to crash. The report processing is where most mistakes are likely to happen, so it is isolated in a separate thread. If it crashes, it will not affect the main data thread.
- A powerful configuration parser. The ConfigObj module, by Michael Foord and Nicola Larosa, was chosen to read the configuration file. This allows many options that might otherwise have to go in the code to go instead in a configuration file.
- A powerful templating engine. The Cheetah module was chosen for generating HTML and other types of files from templates. Cheetah allows search list extensions to be defined, making it easy to extend WeeWX with new template tags. Unfortunately, as of 2016, Cheetah has not been updated in many years (indeed, the Cheetah website seems to be dead). Fortunately, Cheetah seems to be very robust, with only a few well-known bugs that are easily worked around, so we will likely continue to use it for the foreseeable future.
- Pure Python. The code base is 100% Python — no underlying C libraries need be built to install WeeWX. This also means no Makefiles are needed.
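The "micro-kernel" strategy above can be illustrated with a minimal sketch. This is not the actual WeeWX engine: the names `Service`, `QCService`, `Engine`, `new_record`, and `dispatch`, as well as the temperature limits, are hypothetical and chosen only for illustration. The point is that the engine itself does almost nothing beyond instantiating a configured list of services and handing each record to them in order.

```python
class Service:
    """Hypothetical base class: a service reacts to new records."""

    def new_record(self, record):
        pass


class QCService(Service):
    """Hypothetical service: replaces implausible temperatures with None."""

    def new_record(self, record):
        temp = record.get('outTemp')
        # The limits below are arbitrary values used for illustration.
        if temp is not None and not -100 <= temp <= 150:
            record['outTemp'] = None


class Engine:
    """Hypothetical micro-kernel: it only loads services and runs them."""

    def __init__(self, service_classes):
        # Adding or removing a feature means changing this list,
        # not the engine.
        self.services = [cls() for cls in service_classes]

    def dispatch(self, record):
        for service in self.services:
            service.new_record(record)
        return record


engine = Engine([QCService])
result = engine.dispatch({'dateTime': 1700000000, 'outTemp': 999.0})
```

With this shape, a new feature is a new `Service` subclass added to the list passed to `Engine`.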
While WeeWX is nowhere near as fast at generating images and HTML as its predecessor, wview (this is partially because WeeWX uses fancier fonts and a much more powerful templating engine), it is fast enough for all platforms but the slowest. I run it regularly on a 500 MHz machine where generating the 9 images used in the "Current Conditions" page takes just under 2 seconds (compared with 0.4 seconds for wview).
All writes to the databases are protected by transactions. You can kill the program at any time (either Control-C if run directly or "/etc/init.d/weewx stop" if run as a daemon) without fear of corrupting the databases.
The code makes ample use of exceptions to ensure graceful recovery from problems such as network outages. It also monitors socket and console timeouts, restarting whatever it was working on several times before giving up. In the case of an unrecoverable console error (such as the console not responding at all), the program waits 60 seconds, then restarts from the top.
Any "hard" exceptions, that is those that do not involve network and console timeouts and are most likely due to a logic error, are logged, reraised, and ultimately cause thread termination. If this happens in the main thread (not likely due to its simplicity), then this causes program termination. If it happens in the report processing thread (much more likely), then only the generation of reports will be affected — the main thread will continue downloading data off the instrument and putting them in the database. You can fix the problem at your leisure, without worrying about losing any data.
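The distinction between recoverable timeouts and "hard" exceptions might be sketched as follows. This is an illustration under assumptions, not the WeeWX implementation: `ConsoleTimeout`, `with_retries`, `flaky_read`, and the default of three tries are all hypothetical. Timeouts are retried a limited number of times, while any other exception propagates immediately.

```python
import time


class ConsoleTimeout(Exception):
    """Hypothetical stand-in for a recoverable timeout error."""


def with_retries(operation, max_tries=3, wait=0.0):
    """Call operation(), retrying on ConsoleTimeout only.

    Any other exception (a likely logic error) is not caught here,
    so it propagates to the caller.
    """
    for attempt in range(1, max_tries + 1):
        try:
            return operation()
        except ConsoleTimeout:
            if attempt == max_tries:
                raise  # out of tries: let the caller decide what to do
            time.sleep(wait)


calls = []


def flaky_read():
    """Hypothetical read that times out twice, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise ConsoleTimeout("no response")
    return {'outTemp': 20.5}


packet = with_retries(flaky_read)
```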
Units
In general, there are three different areas where the unit system makes a difference:
- On the weather station hardware. Different manufacturers use different unit systems for their hardware. The Davis Vantage series use U.S. Customary units exclusively, Fine Offset and LaCrosse stations use metric, while Oregon Scientific, Peet Bros, and Hideki stations use a mishmash of US and metric.
- In the database. Either US or Metric can be used.
- In the presentation (i.e., html and image files).
The general strategy is that measurements are converted by service StdConvert as they come off the weather station into a target unit system, then stored internally in the database in that unit system. Then, as they come off the database to be used for a report, they are converted into a target unit, specified by the skin.
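The two conversion steps can be sketched with a single temperature value. The helper functions `f_to_c` and `c_to_f` and the variable names are hypothetical, and the sketch assumes a station reporting in US units, a metric database, and a skin that asks for degrees Fahrenheit. It does not use the real StdConvert service.

```python
def f_to_c(temp_f):
    """Fahrenheit to Celsius; None (bad data) passes through unchanged."""
    return None if temp_f is None else (temp_f - 32.0) * 5.0 / 9.0


def c_to_f(temp_c):
    """Celsius to Fahrenheit; None (bad data) passes through unchanged."""
    return None if temp_c is None else temp_c * 9.0 / 5.0 + 32.0


# Step 1: a packet comes off the hardware in US units.
packet_us = {'dateTime': 1700000000, 'outTemp': 68.0}

# Step 2: convert it once into the database's unit system (metric here).
record_metric = dict(packet_us, outTemp=f_to_c(packet_us['outTemp']))

# Step 3: when a report needs the value, convert it to the unit
# requested by the skin (assumed here to be degrees Fahrenheit).
display_f = c_to_f(record_metric['outTemp'])
```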
Value "None"
The Python special value None is used throughout to signal an invalid or bad data point. All functions must be written to expect it.
Device drivers should be written to emit None if a data value is bad (perhaps because of a failed checksum). If the hardware simply doesn't support it, then the driver should not emit a value at all.
The same rule applies to derived values. If the input data for a derived value are missing, then no derived value should be emitted. However, if the input values are present, but have value None, then the derived value should be set to None.
However, the time value must never be None. This is because it is used as the primary key in the SQL database.
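The rules above for derived values can be sketched as follows. The function `add_dewpoint` is hypothetical, and the Magnus approximation (with commonly used coefficients 17.62 and 243.12) is used only for illustration; it is not necessarily the formula WeeWX uses. Treating a non-positive humidity as bad data is also an assumption of this sketch. Temperatures are assumed to be in degrees Celsius and humidity in percent.

```python
import math


def add_dewpoint(record):
    """Hypothetical derived-value routine that follows the None rules."""
    if 'outTemp' not in record or 'outHumidity' not in record:
        # An input is missing entirely: emit no derived value at all.
        return record
    temp_c = record['outTemp']
    humidity = record['outHumidity']
    if temp_c is None or humidity is None or humidity <= 0:
        # Inputs are present but bad: the derived value is None.
        record['dewpoint'] = None
    else:
        # Magnus approximation, used here only as an example formula.
        gamma = math.log(humidity / 100.0) + 17.62 * temp_c / (243.12 + temp_c)
        record['dewpoint'] = 243.12 * gamma / (17.62 - gamma)
    return record


missing = add_dewpoint({'dateTime': 1700000000, 'outTemp': 20.0})
bad = add_dewpoint({'dateTime': 1700000000, 'outTemp': None, 'outHumidity': 50.0})
good = add_dewpoint({'dateTime': 1700000000, 'outTemp': 20.0, 'outHumidity': 100.0})
```

Note that `dateTime` is present and not None in every record, in keeping with its role as the primary key.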
Time
WeeWX stores all data in UTC (roughly, "Greenwich" or "Zulu") time. However, usually one is interested in weather events in local time and wants image and HTML generation to reflect that. Furthermore, most weather stations are configured in local time. This requires that many data times be converted back and forth between UTC and local time. To avoid tripping up over time zones and daylight saving time, WeeWX generally uses Python routines to do this conversion. Nowhere in the code base is there any explicit recognition of DST. Instead, its presence is implicit in the conversions. At times, this can cause the code to be relatively inefficient.
For example, if one wanted to plot something every 3 hours in UTC time, it would be very simple: to get the next plot point, just add 10,800 to the epoch time:
next_ts = last_ts + 10800
But, if one wanted to plot something for every 3 hours in local time (that is, at 0000, 0300, 0600, etc.), despite a possible DST change in the middle, then things get a bit more complicated. One could modify the above to recognize whether a DST transition occurs sometime between last_ts and the next three hours and, if so, make the necessary adjustments. This is generally what wview does. WeeWX takes a different approach and converts from UTC to local, does the arithmetic, then converts back. This is inefficient, but bulletproof against changes in DST algorithms, etc.:
import datetime
import time

time_dt = datetime.datetime.fromtimestamp(last_ts)
delta = datetime.timedelta(seconds=10800)
next_dt = time_dt + delta
next_ts = int(time.mktime(next_dt.timetuple()))
Other time conversion problems are handled in a similar manner.
For astronomical calculations, WeeWX uses the latitude and longitude specified in the configuration file. If that location does not correspond to the computer's local time, reports with astronomical times will probably be incorrect.
Internationalization
Generally, WeeWX does not make much use of Unicode. This is because the Python 2.x libraries do not always handle it correctly. In particular, the function time.strftime() completely fails when handed a Unicode string with a non-ASCII character. As this function is often used by extensions, working around this bug is an unfair expectation on extension writers. So, we generally avoid Unicode.
Instead, WeeWX mostly uses regular strings, with any non-ASCII characters encoded as UTF-8.
An exception to this general rule is the image generator, which holds labels internally in Unicode, because that is the encoding expected by most fonts.
The document "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets," by Joel Spolsky, is highly recommended if you are just starting to work with UTF-8 and Unicode.
Exceptions
In general, your code should not simply swallow an exception. For example, this is bad form:
try:
    os.rename(oldname, newname)
except:
    pass
While the odds are that if an exception happens it will be because the file oldname does not exist, that is not guaranteed. It could be because of a keyboard interrupt, or a corrupted file system, or something else. Instead, you should test explicitly for any expected exception, and let the rest go by:
try:
    os.rename(oldname, newname)
except OSError:
    pass
WeeWX has a few specialized exception types, used to rationalize all the different types of exceptions that could be thrown by the underlying libraries. In particular, low-level I/O code can raise a myriad of exceptions, such as USB errors, serial errors, network connectivity errors, etc. All device drivers should catch these exceptions and convert them into an exception of type WeeWxIOError or one of its subclasses.
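A driver might perform this conversion along the following lines. This is a sketch under assumptions: the `WeeWxIOError` class is redefined here as a stand-in so the example runs on its own (a real driver would use the class provided by the WeeWX code base), and `read_raw`, `get_packet`, and the port name are hypothetical.

```python
class WeeWxIOError(Exception):
    """Stand-in for the WeeWX exception type, defined here only so the
    sketch is self-contained."""


def read_raw(port):
    """Hypothetical low-level read that fails with a library-level error."""
    raise OSError("device on %s disconnected" % port)


def get_packet(port):
    """Hypothetical driver method: convert low-level I/O errors into
    the common exception type, and let everything else go by."""
    try:
        return read_raw(port)
    except OSError as e:
        # Chain the original exception so the cause is not lost.
        raise WeeWxIOError("Unable to read from %s: %s" % (port, e)) from e


try:
    get_packet('/dev/ttyUSB0')
    caught = None
except WeeWxIOError as e:
    caught = e
```

Code that calls the driver then only needs to handle one family of exceptions, regardless of which underlying library raised the original error.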
Code style
Generally, we try to follow the PEP 8 style guide, but there are many exceptions. In particular, many older WeeWX function names use camelCase, but PEP 8 calls for snake_case. Please use snake_case for new code.
Most modern code editors, such as Eclipse or PyCharm, have the ability to automatically format code. Resist the temptation and don't use this feature! Two reasons:
- Unless all developers use the same tool, using the same settings, we will just thrash back and forth between slightly different versions.
- Automatic formatters play a useful role, but some of what they do is really trivial, such as removing spaces in otherwise blank lines. If someone is trying to figure out what real, syntactic changes you have made, they will have to wade through all those extraneous "changed lines," trying to find the important stuff.
If you are working with a file where the formatting is so ragged that you really must do a reformat, then do it as a separate commit. This allows the formatting changes to be clearly distinguished from more functional changes.
When invoking functions or instantiating classes, use the fully qualified name. Don't do this:
from datetime import datetime
now = datetime.now()
Instead, do this:
import datetime
now = datetime.datetime.now()
Git work flow
We use git as the source control system.
We generally follow Vincent Driessen's branching model. Ignore the complicated diagram at the beginning of the article, and just focus on the text. In this model, there are two key branches:
- master. Fixes go into this branch. We tend to use fewer "hot fix" branches and, instead, just incorporate any fixes directly into the branch. Releases are tagged relative to this branch.
- development (called develop in Vince's article). This is where new features go. Before a release, they will be merged into the master branch.
What this means to you is that if you submit a pull request that includes a new feature, make sure you commit your changes relative to the development branch.
Tools
Python
Eclipse, with the PyDev Python extension, is highly recommended. It's free, easy to customize and extremely powerful.
JetBrains' PyCharm is also good, and there is now a free Community Edition. Where it really shines is if you use a framework such as Django or Backbone, but WeeWX does not use any of these, so there is no real need for PyCharm's extra functionality when working with WeeWX.
HTML and JavaScript
For HTML, JetBrains' WebStorm used to be the undisputed master. However, in recent years, I've found Eclipse's "Web Development Tools" to be its equal, or even better, particularly when working with long HTML documents like the Customizing Guide.
However, if you are working with JavaScript, particularly if you're using a framework like NodeJS or ExpressJS, there is no contest: WebStorm is the way to go.
Glossary
This is a glossary of terminology used throughout the code.
Name | Description |
archive interval | WeeWX does not store the raw data that comes off a weather station. Instead, it aggregates the data over a length of time, the archive interval, and then stores that. |
archive record | While packets are raw data that come off the weather station, records are data aggregated by time. For example, temperature may be the average temperature over an archive interval. These are the data stored in the SQL database. |
config_dict | All configuration information used by WeeWX is stored in the configuration file, usually with the name weewx.conf. By convention, when this file is read into the program, it is called config_dict, an instance of the class configobj.ConfigObj. |
datetime | An instance of the Python object datetime.datetime. Variables of type datetime usually have a suffix _dt. |
db_dict | A dictionary with all the data necessary to bind to a database. An example for SQLite would be {'driver':'db.sqlite', 'root':'/home/weewx', 'database_name':'archive/weewx.sdb'}, an example for MySQL would be { 'driver':'db.mysql', 'host':'localhost', 'user':'weewx', 'password':'mypassword', 'database_name':'weewx'}. |
epoch time | Sometimes referred to as "unix time," or "unix epoch time." The number of seconds since the epoch, which is 1 Jan 1970 00:00:00 UTC. Hence, it always represents UTC (well... after adding a few leap seconds. But, close enough). This is the time used in the databases and appears as type dateTime in the SQL schema, perhaps an unfortunate name because of the similarity to the completely unrelated Python type datetime. Very easy to manipulate, but it is a big opaque number. |
LOOP packet | The real-time data coming off the weather station. The terminology "LOOP" comes from the Davis series. A LOOP packet can contain all observation types, or it may contain only some of them ("Partial packet"). |
observation type | A physical quantity measured by a weather station (e.g., outTemp) or something derived from it (e.g., dewpoint). |
skin_dict | All configuration information used by a particular skin is stored in the skin configuration file, usually with the name skin.conf. By convention, when this file is read into the program, it is called skin_dict, an instance of the class configobj.ConfigObj. |
SQL type | A type that appears in the SQL database. This usually looks something like outTemp, barometer, extraTemp1, and so on. |
standard unit system | A complete set of units used together. Either US, METRIC, or METRICWX. |
time stamp | A variable in unix epoch time. Always in UTC. Variables carrying a time stamp usually have a suffix _ts. |
tuple-time | An instance of the Python object time.struct_time. This is a 9-element tuple that represents a time. It could be in either local time or UTC, though usually the former. See module time for more information. Variables carrying tuple time usually have a suffix _tt. |
value tuple | A 3-way tuple. First element is a value, second element the unit type the value is in, the third the unit group. An example would be (21.2, 'degree_C', 'group_temperature'). |
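The time-related glossary entries and their naming suffixes can be tied together with a short sketch. The variable names and the sample time stamp are arbitrary choices made for illustration; only standard library calls are used.

```python
import datetime
import time

# time stamp: unix epoch time, always UTC, suffix _ts
start_ts = 1700000000

# datetime: an instance of datetime.datetime (local time here), suffix _dt
start_dt = datetime.datetime.fromtimestamp(start_ts)

# tuple-time: an instance of time.struct_time (local time here), suffix _tt
start_tt = time.localtime(start_ts)

# Converting the tuple-time back gives the original time stamp.
back_ts = int(time.mktime(start_tt))

# value tuple: (value, unit type, unit group)
temperature_vt = (21.2, 'degree_C', 'group_temperature')
value, unit, group = temperature_vt
```

The suffix `_vt` is not taken from the glossary; it is used here only to keep the variable name short.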