ustering was applied to associate each census block with a monitoring station. This clustering analysis grouped census blocks and stations on the basis of similarities within a group and dissimilarities between groups. Using this approach, Airparif assigned each census block to the monitoring station (named the “index” monitor) that best represents overall NO2 air quality within the census block (for more details, see Deguen et al. 2016 and Kihal et al. 2016).
Newborn health data are available from the first birth certificate registered by the Maternal and Child Care department of Paris (named PMI: Protection Maternelle et Infantile). This certificate is completed by the parents and the health professional before discharge from the maternity unit, within the 8 days following birth, and then sent to the local PMI unit. Several mother and newborn characteristics are available from this certificate, including the birth date, the gestational age, and the postal address of the mother's residence at the time the certificate was completed, three crucial data items needed to run the present package. All postal addresses were geocoded at the level of the residential census block.
For the purpose of this article, we illustrate our package on the births that occurred in the city of Paris between 1st January 2011 and 31st December 2011. Since the first birth certificate is a mandatory document handled by public health care services, the data are considered exhaustive and cover nearly all registered births. In 2011 in Paris, 25,915 certificates were transmitted to the PMI, distributed over 936 IRIS. Between 1 and 107 newborns are found in each census block, with an average of 28 babies per census block. Gestational age varies from 23 to 45 weeks, with a mean of 39 weeks.
2.2. Air Pollution Exposure by Pregnancy Trimester
The procedure follows two independent and successive steps:
Step 1: Daily reconstitution of air pollutant concentrations at the census block level.
Daily concentrations of the pollutant in each census block were estimated by combining the annual average concentration modeled in the census block with the daily variations of its index monitor relative to that monitor's annual average (the index monitor is assumed to be representative of the daily variations of air pollution within the census block). For example, if, for a given day, the index monitor measured a daily concentration of nitrogen dioxide 10% lower than the annual average at this same location, the daily concentration in the census block is set 10% lower than its annual average concentration.
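The scaling rule of Step 1 can be illustrated with a minimal R sketch. It is not the package's implementation: the object names and values are hypothetical, and the annual average at the monitor is assumed here to be the mean of its daily values.

```r
# Hypothetical daily NO2 values measured at one index monitor (in ug/m3).
daily_monitor <- c(40, 36, 44, 40)

# Assumption: the monitor's annual average is the mean of its daily values.
annual_monitor <- mean(daily_monitor)

# Hypothetical modeled annual average in a census block attached to this monitor.
annual_block <- 50

# Relative daily variation of the monitor around its own annual average.
relative_variation <- daily_monitor / annual_monitor

# Daily estimate in the census block: its annual average scaled by that ratio.
daily_block <- annual_block * relative_variation
print(daily_block)
```

On the second day of this sketch the monitor reads 10% below its annual average, so the census block estimate is also 10% below the block's annual average.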
Step 2: Calculation of individual air pollutant exposure by pregnancy trimester.
For each birth, individual exposure to air pollution by trimester of pregnancy is estimated from the date of birth, the gestational age, and the census block in which the mother lived during her pregnancy, by averaging the daily concentrations of this census block over each trimester of pregnancy. The trimester divisions used are weeks 1 - 13, weeks 14 - 26, and week 27 to birth. If the gestational age is less than 26 weeks, the second trimester runs from the 14th week to birth. Because a trimester can last from one to 13 weeks or more, the procedure also returns the length of each trimester, which can be used as a weight for the corresponding trimester exposure in a further statistical analysis.
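The trimester divisions of Step 2 can be sketched in R as follows. This is an illustration under stated assumptions, not the code of the package: the helper trimester_windows is hypothetical, gestational age is taken in whole weeks, and the pregnancy is assumed to start 7 × gestational age days before the birth date.

```r
# Hypothetical helper: trimester lengths and date windows for one birth.
trimester_windows <- function(birth_date, gestational_age_weeks) {
  birth_date <- as.Date(birth_date)
  ga <- gestational_age_weeks
  # Assumption: pregnancy starts ga * 7 days before birth.
  start <- birth_date - 7 * ga
  # Weeks 1-13, weeks 14-26, week 27 to birth.
  n_weeks <- c(min(ga, 13), min(max(ga - 13, 0), 13), max(ga - 26, 0))
  first_day <- start + 7 * c(0, 13, 26)
  last_day <- first_day + 7 * n_weeks - 1
  # A trimester that never started has no window.
  first_day[n_weeks == 0] <- NA
  last_day[n_weeks == 0] <- NA
  data.frame(trimester = 1:3, nWeeks = n_weeks,
             first_day = first_day, last_day = last_day)
}

w <- trimester_windows("2011-06-15", 41)
print(w)

# Averaging a made-up daily series over the first trimester window.
dates <- seq(as.Date("2010-01-01"), as.Date("2011-12-31"), by = "day")
conc <- 40 + 10 * cos(2 * pi * as.numeric(format(dates, "%j")) / 365)
in_t1 <- dates >= w$first_day[1] & dates <= w$last_day[1]
print(mean(conc[in_t1]))
```

For a gestational age below 27 weeks, the third row of the result has zero weeks and missing dates, which mirrors the missing third-trimester exposure described in the text.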
3. PregnancyAirExposure Package
The PregnancyAirExposure package is available on the Equit’Area website and the installation is standard.
PregnancyAirExposure is composed of two main functions:
The reconstitution function creates the indicator of daily concentrations of the pollutant in each census block. Two data sets need to be specified: a data.frame containing the daily concentrations measured at each monitoring station, and another with the annual concentrations and the index monitor of each census block throughout the study period (one year in the present case). The result is a data.frame containing the estimates of the daily concentrations of the pollutant in each census block. This function is designed exclusively for air pollution data with no missing values. Yet daily air pollutant concentrations collected at monitoring stations often contain missing values, which must therefore be imputed before using the reconstitution function. To do so, the mtsdi package available on CRAN (https://cran.r-project.org/web/packages/mtsdi/index.html) can be used: the imputation obtained takes into account the temporal dimension, the correlations between measurements at different monitoring stations, and the log-normality of the data (or normality if the initial data distribution is already Gaussian).
The trimesterExposure function calculates individual exposure by pregnancy trimester. It also requires two data sets: a data.frame with daily air pollution concentrations by census block (which can be the one obtained using the reconstitution function), and a data.frame with birth information. In the latter, there must be no missing values for the variables used by the function, i.e. the identification number, birth date, gestational age, and the census block where the mother lived. The result is the data.frame containing the birth data, with 8 new variables. Five of them contain the exposure for trimester 1, trimester 2, trimester 3, trimesters 1 and 2, and the whole pregnancy, respectively. The 3 other variables contain the number of weeks of each trimester of pregnancy.
In both functions, other parameters are available for cases where the variable names are not the default ones (except for annual averages), or where the dates are not given in the standard R format (yyyy-mm-dd). They let the user choose the variable names and the date format, and allow the variables to appear in any order in the data.frames.
4.1. Descriptive Analyses of the NO2 Distribution
Figure 1 represents the spatial distribution of the annual averages of NO2 modeled at the census block level by Airparif for years 2010 and 2011. It shows that the annual concentrations of NO2 are high across the whole city of Paris and are nearly always above the limit of 40 μg/m3 defined by the European directive. It also exhibits the strong spatial heterogeneity of concentrations, with a north/south gradient: NO2 concentrations are generally higher in the north of Paris than in the southern census blocks.
Figure 2 reveals that, as expected, NO2 concentrations measured by the traffic monitoring stations are always higher than those measured by the background stations (two stations have been selected for this illustration). Seasonal variability is also clearly visible, with higher levels in winter and lower levels in summer, a pattern that is more evident for the background stations, which are less influenced by continuous traffic emissions.
4.2. Application of the R Package
The package can be used with different spatial scales and for any air pollutant. Here is a complete example of the use of the package, for the city of Paris, at the census block scale and for nitrogen dioxide (NO2).
Figure 1. Spatial distribution of the annual averages of NO2 modeled at the census block level by Airparif for years 2010 and 2011.
Figure 2. Daily NO2 concentrations over years 2010-2011 in the city of Paris for a background (blue line) and a traffic (red line) monitoring station.
After installing the package, load it in R:
R> library(PregnancyAirExposure)
The first step consists in the reconstitution of daily NO2 concentrations at the census block level.
Daily NO2 concentrations measured by the monitoring stations in Paris should be first loaded:
R> data(dailyNO2MonitoringStations)
To load your own air pollution data, the read.table command can be used. Be aware that the present R package accepts no missing values; each year has to be complete from the 1st January to the 31st December. In the present example (Table 1), the data.frame contains 11 variables (one holding the dates and 10 corresponding to the monitoring stations) and 730 observations (the total number of days for the 2 years, 2010 and 2011).
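Both requirements (no missing values, complete calendar years) can be checked before calling the package. The sketch below uses base R only; the helper check_daily_data, the station names, and the values are hypothetical and are not part of the package.

```r
# Hypothetical helper: checks a daily-concentration data.frame before use.
check_daily_data <- function(daily, date_col = "date", date_format = "%d/%m/%Y") {
  dates <- as.Date(as.character(daily[[date_col]]), format = date_format)
  # Unparseable or missing dates fail both checks.
  if (anyNA(dates)) {
    return(list(no_missing = FALSE, complete_years = FALSE))
  }
  years <- sort(unique(format(dates, "%Y")))
  # Every day from 1st January to 31st December of each year present.
  expected <- do.call(c, lapply(years, function(y) {
    seq(as.Date(paste0(y, "-01-01")), as.Date(paste0(y, "-12-31")), by = "day")
  }))
  list(
    no_missing = !anyNA(daily),
    complete_years = anyDuplicated(dates) == 0 && setequal(dates, expected)
  )
}

# Made-up two-year data set with two stations.
days <- seq(as.Date("2010-01-01"), as.Date("2011-12-31"), by = "day")
daily <- data.frame(date = format(days, "%d/%m/%Y"),
                    station_A = 40, station_B = 55)
print(check_daily_data(daily))
```

Dropping a single row or setting one concentration to NA makes the corresponding check return FALSE, which signals that the data need completing or imputing first.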
Annual averages and index monitors by census block must also be imported:
R> data(annualNO2CensusBlock)
You can load your own air pollution data using the read.table command, but these data must satisfy the following conditions. The variables containing the annual averages of the air pollutant have to be named “Mean_YYYY” (e.g. “Mean_2010”). Each spatial area must appear only once in this data.frame.
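These two conditions are easy to verify on user-supplied data. The following base R sketch is illustrative only: the helper check_annual_data, the column names censusBlock and indexMonitor, and the values are assumptions, not objects provided by the package.

```r
# Hypothetical helper: checks the annual-average data.frame.
check_annual_data <- function(annual, zone_col = "censusBlock") {
  # Columns holding annual averages must be named Mean_YYYY.
  mean_cols <- grep("^Mean_[0-9]{4}$", names(annual), value = TRUE)
  list(
    mean_columns = mean_cols,
    has_mean_columns = length(mean_cols) > 0,
    # Each spatial area must appear only once.
    zones_unique = anyDuplicated(annual[[zone_col]]) == 0
  )
}

# Made-up example with three census blocks and two years.
annual <- data.frame(
  censusBlock = c("A", "B", "C"),
  indexMonitor = c("station_A", "station_A", "station_B"),
  Mean_2010 = c(48.2, 51.0, 39.5),
  Mean_2011 = c(47.1, 50.3, 38.9)
)
print(check_annual_data(annual))
```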
In our example, the data contain 4 variables and 992 observations. Table 2 shows the first and last lines of the data.frame, with the average NO2 concentrations expressed in μg/m³.
Table 1. Example of input data from the monitoring stations.
Table 2. Example of input data modeled by the air quality monitoring network of the Île-de-France region.
Then, the reconstitution at census block level can be done using the reconstitution function:
R> dailyNO2CensusBlock <-
+ reconstitution(dailyConcentrationsbyMonitoringStation =
+ dailyNO2MonitoringStations,
+ annualAverageandIndexMonitorByZone = annualNO2CensusBlock,
+ dateVarName = "date",
+ dateFormat = "%d/%m/%Y",
+ zoneVarName = "censusBlock",
+ indexMonitorVarName = "indexMonitor")
The first two parameters are the two data.frames imported in the previous step. Other parameters are needed if the variable names are not the default ones, or if the dates are not entered in the standard R format (yyyy-mm-dd). For example, here we use a date format (dd/mm/YYYY) and a name for the spatial unit identifier (“censusBlock”) that differ from the defaults, set through the parameters dateFormat and zoneVarName. The names of the date and index monitor variables (parameters dateVarName and indexMonitorVarName) are the default ones in our example (default values are given in the function help in R). The result, named dailyNO2CensusBlock, contains the estimates of the daily NO2 concentrations at the census block level, expressed in μg/m3, in a data.frame with 993 variables (the date plus one per census block) and 730 observations (Table 3).
Now, from the daily NO2 concentrations for each census block, the procedure estimating exposure for each pregnancy trimester can begin.
First, birth data must be imported:
R> data("births")
When importing your own birth data, the input data must contain at least the 4 following variables with no missing values (Table 4): individual identification number, date of birth, gestational age, and identification number of the census block where the mother lives. Our birth data contain only these 4 variables, and 25,914 observations, which represent all births in Paris in 2011.
The first and last observations are the following ones:
Table 3. Extract of the result: each variable (except the date in the first column) corresponding to a census block.
Table 4. Example of input health data.
Finally, trimester exposures can be estimated:
R> exposures <- trimesterExposure(
+ dailyConcentrations = dailyNO2CensusBlock,
+ births = births,
+ idVarName = "id",
+ birthDateVarName = "birthDate",
+ gestationalAgeVarName = "gestationalAge",
+ zoneVarName = "censusBlock",
+ dateVarName = "date",
+ dateFormat = "%Y-%m-%d",
+ birthDateFormat = "%d/%m/%Y")
The first two parameters are the air pollution data and the birth data. Like the previous function, the trimesterExposure function provides other parameters to use variable names or date formats other than the default ones. The formats of the exposure date and birth date variables can both be changed. It is important that the user verifies that all the dates are compatible: the exposure dates must cover the period of the births, and one year before it to cover the period of pregnancy. In this example, we use birth data from 2011; for this reason, the air pollution data must run from 2010 to 2011 in order to estimate the exposure of a birth occurring at the beginning of 2011.
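This compatibility of dates can be verified with a short base R check before estimating exposures. The helper covers_pregnancies below is hypothetical, and it assumes that each pregnancy starts 7 × gestational age (in weeks) days before the birth date; the births shown are made up.

```r
# Hypothetical helper: TRUE when the exposure dates span every pregnancy.
covers_pregnancies <- function(exposure_dates, birth_dates, gestational_age_weeks) {
  # Assumption: pregnancy start = birth date minus gestational age in days.
  pregnancy_start <- birth_dates - 7 * gestational_age_weeks
  min(exposure_dates) <= min(pregnancy_start) &&
    max(exposure_dates) >= max(birth_dates)
}

# Made-up births at the very beginning and end of 2011.
birth_dates <- as.Date(c("03/01/2011", "28/12/2011"), format = "%d/%m/%Y")
gestational_age <- c(40, 38)

two_years <- seq(as.Date("2010-01-01"), as.Date("2011-12-31"), by = "day")
only_2011 <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), by = "day")

print(covers_pregnancies(two_years, birth_dates, gestational_age))
print(covers_pregnancies(only_2011, birth_dates, gestational_age))
```

With pollution data for 2011 only, the birth of early January 2011 is not covered, which is why the example requires both 2010 and 2011.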
The result of this function is the data.frame containing births data, and the 8 new variables described before (Table 5). The first and last rows for these new variables are:
If the pregnancy lasts less than 27 weeks, the observation looks like the following (with “NA” representing a missing value):
The second function, which computes trimester exposures, can be used on its own for different exposure assessments. As an example, the census block attributed to each birth can be replaced by a monitoring station (probably the station closest to where the mother lives), and the pollution data in the first parameter by daily concentrations by monitoring station instead of by census block. In this case, the value given to the dailyConcentrations parameter of the trimesterExposure function would look like the data.frame dailyNO2MonitoringStations. For this purpose, all the variable names can be changed through the parameters of the functions.
Table 5. Example of exposure during different critical windows.
Legend: T1: daily average exposure during the first trimester; T2: daily average exposure during the second trimester; T3: daily average exposure during the third trimester; T12: daily average exposure during the first and second trimesters; T123: daily average exposure during the whole pregnancy; nWeeksT1: number of weeks of the first trimester; nWeeksT2: number of weeks of the second trimester; nWeeksT3: number of weeks of the third trimester. Note: The last newborn presented in this table was exposed during the 15 weeks of his/her last trimester of gestation to an average of 38.80 μg/m3 of NO2 per day. He/she was exposed during the 41 weeks of gestation to an average of 43.59 μg/m3 of NO2 per day.
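As an illustration of how the trimester lengths can serve as weights, the sketch below combines trimester exposures into a week-weighted average over the pregnancy. The column names follow the Table 5 legend, but the rows are made up, and the computation is only an assumption about how such weights might be used; it is not claimed to reproduce the T123 variable returned by the package.

```r
# Made-up output rows; the second birth has no third trimester.
exposures <- data.frame(
  id = c(1, 2),
  T1 = c(45, 50), T2 = c(42, 48), T3 = c(39, NA),
  nWeeksT1 = c(13, 13), nWeeksT2 = c(13, 12), nWeeksT3 = c(15, 0)
)

expo <- as.matrix(exposures[, c("T1", "T2", "T3")])
wts <- as.matrix(exposures[, c("nWeeksT1", "nWeeksT2", "nWeeksT3")])

# A trimester of zero weeks carries no weight, so its missing value is dropped.
expo[wts == 0] <- 0

# Week-weighted mean of the trimester exposures for each birth.
weighted_whole <- rowSums(expo * wts) / rowSums(wts)
print(weighted_whole)
```

The same weights could be passed to a regression model when a single trimester exposure is analysed, so that very short trimesters contribute less than full 13-week ones.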
5. Conclusion and Perspectives
In this article we have presented the PregnancyAirExposure package, designed to ease the estimation of exposure to air pollutants during three windows corresponding to the pregnancy trimesters. To our knowledge, no such reproducible procedure had previously been proposed. One advantage of the package is that the two implemented functions can be used independently. For instance, if the input data are already in the appropriate format, the first function is not needed, and exposure during pregnancy periods can be readily estimated. As a domain of application for future work, the package could be used to extend the estimation of air pollution exposure to the pre-pregnancy period, during conception for instance. Lastly, we expect to extend the package in the future, for example by implementing complementary functions and additional tools that would allow visualization of the results (such as mapping). These improvements will be made in response to user feedback and requirements.
This work was supported by Fondation de France (grant N° 201300040943, 2013-2016, to Séverine Deguen) and the EHESP School of Public Health. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.